How to query Elastic Search in Python

KyeongRok Kim 2020. 10. 27. 13:15

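Overview

Even if you already know SQL, querying Elastic Search means writing the query in Elastic Search's own syntax. This post shows how to translate common SQL statements into Elastic Search queries.

From here on, Elastic Search is abbreviated as es.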

Elastic Search installation (Windows) and commands: krksap.tistory.com/1634

The post above covers installing Elastic Search and the commands used in Dev Tools.

Listing all indices

An es index corresponds to a table in SQL. Before you can run any query, you need to know the table name, right?

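from elasticsearch import Elasticsearch

# With no arguments, the 7.x Python client connects to localhost:9200.
es = Elasticsearch()
# Pass the pattern by keyword; '*' matches every index.
r = es.indices.get_alias(index='*')
print(r)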

Developer Tools:

GET _cat/indices

Result

{'product_list': {'aliases': {}}, 'data_set': {'aliases': {}}, 'user_2019': {'aliases': {}}}

Interpreting the result

There are three tables: product_list, data_set, and user_2019.
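If you only need the names, the dictionary keys are enough. The snippet below is a minimal sketch that works on the response shown above. The helper name list_index_names is my own, and hiding names that start with a dot (the usual naming for system indices such as .kibana) is an assumption about what you want to see.

```python
def list_index_names(alias_response, include_hidden=False):
    """Return sorted index names from an es.indices.get_alias() response."""
    names = list(alias_response.keys())
    if not include_hidden:
        # Assumption: names starting with "." are system indices (e.g. .kibana).
        names = [n for n in names if not n.startswith(".")]
    return sorted(names)


# Sample response copied from the result above.
r = {'product_list': {'aliases': {}}, 'data_set': {'aliases': {}}, 'user_2019': {'aliases': {}}}
print(list_index_names(r))
```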

1. select

from elasticsearch import Elasticsearch

def searchAPI(index_name):
    es = Elasticsearch()
    # match_all returns every document; size caps how many come back.
    body = {
        'size':10000,
        'query':{
            'match_all':{}
        }
    }
    res = es.search(index=index_name, body=body)
    return res

result = searchAPI('users')
hits = result['hits']['hits']  # the matched documents
print(len(hits))

The query above asks for 10,000 documents from users. The users index actually holds about 48,000 documents, but

{
    'size':200000,
    'query':{
        'match_all':{}
    }
}

asking for 200,000 like this raises an error.

'''
elasticsearch.exceptions.RequestError: TransportError(400, 'search_phase_execution_exception', 'Result window is too large, from + size must be less than or equal to: [10000] but was [200000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.')
'''
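The error message means that from + size may be at most 10,000 and the request asked for more. Once you need more than 10,000 documents, it tells you to use a more efficient method (the scroll API).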
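The scroll API that the error message mentions can be used roughly like this. This is only a sketch, assuming the same 7.x-style client as the rest of this post, where es.search() accepts a scroll keep-alive and es.scroll() / es.clear_scroll() take a scroll_id. The function name iter_all_hits, the page size of 1000, and the 2 minute keep-alive are my own choices.

```python
def iter_all_hits(es, index, page_size=1000, keep_alive="2m"):
    """Yield every hit in `index`, fetching `page_size` documents per request."""
    body = {"size": page_size, "query": {"match_all": {}}}
    # The first request opens a scroll context that stays alive for keep_alive.
    res = es.search(index=index, body=body, scroll=keep_alive)
    scroll_id = res.get("_scroll_id")
    try:
        while res["hits"]["hits"]:
            for hit in res["hits"]["hits"]:
                yield hit
            # Each scroll call returns the next page until the hits run out.
            res = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = res.get("_scroll_id", scroll_id)
    finally:
        # Release the scroll context on the server when done.
        if scroll_id is not None:
            es.clear_scroll(scroll_id=scroll_id)


# Usage (needs a running es):
# from elasticsearch import Elasticsearch
# hits = list(iter_all_hits(Elasticsearch(), 'users'))
# print(len(hits))
```

The client library also ships elasticsearch.helpers.scan, which wraps the same idea, so in real code that is probably the easier choice.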

When you run a query directly in Kibana, use "double quotes" instead of 'single quotes'.
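If you already have the query as a Python dict, you do not have to retype the quotes by hand. The standard json module writes the dict as double-quoted JSON that can be pasted into Dev Tools. The users index name comes from the example above; the size of 10 is just an illustration.

```python
import json

body = {
    'size': 10,
    'query': {
        'match_all': {}
    }
}

# json.dumps always emits double-quoted JSON, which is what Dev Tools expects.
kibana_body = json.dumps(body, indent=2)
print("GET users/_search")
print(kibana_body)
```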

{
    'query':{
        'match_all':{}
    }
}

Also, if you leave out size as in the query above, only 10 results come back, because 10 is the default.
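To see the difference between how many documents came back and how many matched, you can compare the length of hits with hits.total. Below is a small sketch. The response is a hypothetical, trimmed example rather than real output, and it assumes the es 7 response shape where total is a dict with value and relation (older versions return a plain integer). When relation is 'gte', the value is only a lower bound.

```python
def summarize_hits(res):
    """Return (returned, total) for a search response."""
    returned = len(res['hits']['hits'])
    total = res['hits']['total']
    # es 7+ reports total as {'value': N, 'relation': ...}; older versions use an int.
    if isinstance(total, dict):
        total = total['value']
    return returned, total


# Hypothetical, trimmed response for a query sent without size.
sample = {
    'hits': {
        'total': {'value': 48542, 'relation': 'eq'},
        'hits': [{'_id': str(i)} for i in range(10)],
    }
}
returned, total = summarize_hits(sample)
print(f"returned {returned} of {total} matching documents")
```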

2. select count(*) from <table_name>

from elasticsearch import Elasticsearch
es = Elasticsearch()
es.count(index='table_name')

GET <Index_name>/_count?pretty

The elasticsearch library provides a count() function. When you only need the number of documents, call es.count(index='table_name').

Result

{'count': 48542, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}}
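The number you want is under the count key. The _shards part tells you whether every shard answered, so a cautious reader might check it before trusting the number. The helper below is only a sketch; read_count is a name I made up, and raising an error on failed shards is one possible policy.

```python
def read_count(res):
    """Return the document count, refusing partial results."""
    shards = res['_shards']
    if shards['failed'] > 0:
        # Some shards did not answer, so the count may be too low.
        raise RuntimeError(f"{shards['failed']} of {shards['total']} shards failed")
    return res['count']


# Response copied from the result above.
res = {'count': 48542, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}}
print(read_count(res))
```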

3. Delete index (drop table)

Use es.indices.delete(index='index_name').
To delete a document (record) inside an index (table), use es.delete() instead.

Deleting an index, which plays the role of a table in ES

es.indices.delete(index='index_name')
res = es.count(index='index_name')
print(res)

Result
elasticsearch.exceptions.NotFoundError: TransportError(404, 'index_not_found_exception', 'no such index [index_name]')

The result is a 404 error. The error appears because the es.indices.delete() call worked and the index really was deleted, so the count that follows cannot find it.
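The same 404 also shows up if you try to delete an index that does not exist. One way to avoid it is to check first with es.indices.exists(). This is a sketch, not the only way; drop_index_if_exists is a hypothetical helper name, and es is the client object created earlier.

```python
def drop_index_if_exists(es, index_name):
    """Delete index_name and return True, or return False if it does not exist."""
    if not es.indices.exists(index=index_name):
        return False
    es.indices.delete(index=index_name)
    return True


# Usage (needs a running es):
# from elasticsearch import Elasticsearch
# print(drop_index_if_exists(Elasticsearch(), 'index_name'))
```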

4. Selecting specific columns

In SQL, to pull only code and year from a table you would write select code, year from table_name limit 10000.

In es, you can use "_source" as shown below to return only the columns you need.

{
    "size": 10000,
    "_source": {
        "includes": [ "code", "year"]
    },
    "query": {
        "match_all": {}
    }
}
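In Python, the selected columns come back inside each hit under _source. The sketch below builds the body above from a list of column names and flattens a response into rows, similar to what a SQL client gives you. The helper names and the sample response are hypothetical; the sample values are made up for illustration only.

```python
def select_body(columns, size=10000):
    """Build a search body that returns only the given fields."""
    return {
        "size": size,
        "_source": {"includes": list(columns)},
        "query": {"match_all": {}},
    }


def hits_to_rows(res):
    """Flatten a search response into a list of row dicts."""
    return [hit["_source"] for hit in res["hits"]["hits"]]


body = select_body(["code", "year"])

# Hypothetical response; in real code this would be es.search(index=..., body=body).
sample = {"hits": {"hits": [
    {"_id": "1", "_source": {"code": "A01", "year": 2019}},
    {"_id": "2", "_source": {"code": "B02", "year": 2020}},
]}}
for row in hits_to_rows(sample):
    print(row["code"], row["year"])
```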

5. Checking the schema (mapping)

Use this when you want to know the structure of an index (table).

You can think of it as the counterpart of looking at a table's create table statement.

from elasticsearch import Elasticsearch
es = Elasticsearch()
print(es.indices.get_mapping(index='table_name'))
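The mapping response is deeply nested, so a column name to type summary is often easier to read. The sketch below assumes the es 7 response shape, where the fields sit under mappings.properties with no type level in between. The sample mapping is hypothetical, and treating a field without a type as an object is my own simplification.

```python
def field_types(mapping_response, index_name):
    """Return {field_name: type} for one index from a get_mapping() response."""
    properties = mapping_response[index_name]["mappings"].get("properties", {})
    # Fields without an explicit type are nested objects; label them "object".
    return {name: spec.get("type", "object") for name, spec in properties.items()}


# Hypothetical mapping for illustration only.
sample = {"table_name": {"mappings": {"properties": {
    "code": {"type": "keyword"},
    "year": {"type": "integer"},
}}}}
print(field_types(sample, "table_name"))
```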

ES installation and command reference

krksap.tistory.com/1634
