ElasticSearch: How to iterate / scroll through all documents in index
In ElasticSearch, you can use the Scroll API to scroll through all documents in an entire index.
In Python you can scroll like this:
def es_iterate_all_documents(es, index, pagesize=250, scroll_timeout="1m", **kwargs):
    """
    Helper to iterate ALL values from a single index.
    Yields the _source of every document.
    """
    is_first = True
    while True:
        # Scroll next
        if is_first:  # Initialize scroll
            # Use scroll_timeout here too, instead of a hardcoded "1m"
            result = es.search(index=index, scroll=scroll_timeout, **kwargs, body={
                "size": pagesize
            })
            is_first = False
        else:
            result = es.scroll(body={
                "scroll_id": scroll_id,
                "scroll": scroll_timeout
            })
        # Each response carries the scroll ID to use for the next request
        scroll_id = result["_scroll_id"]
        hits = result["hits"]["hits"]
        # Stop after no more docs
        if not hits:
            break
        # Yield each entry
        yield from (hit['_source'] for hit in hits)
This function will yield each document encountered in the index.
Example usage for index my_index:
from elasticsearch import Elasticsearch

es = Elasticsearch([{"host": "localhost"}])
for entry in es_iterate_all_documents(es, 'my_index'):
    print(entry)  # Prints the document as stored in the DB
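The helper above leaves its scroll context open on the server until the scroll timeout expires. The following is a minimal sketch of a variant that releases the context with clear_scroll once iteration ends, including when the caller stops early. The names iterate_and_clear and FakeElasticsearch are hypothetical: FakeElasticsearch is only an in-memory stand-in that mimics the three client calls used here, so the paging logic can be tried without a running cluster. It is not part of the elasticsearch package.

```python
def iterate_and_clear(es, index, pagesize=250, scroll_timeout="1m"):
    """Yield the _source of every document, then release the scroll context."""
    scroll_id = None
    try:
        # First page: opens the scroll context
        result = es.search(index=index, scroll=scroll_timeout,
                           body={"size": pagesize})
        while True:
            scroll_id = result["_scroll_id"]
            hits = result["hits"]["hits"]
            if not hits:  # An empty page means the index is exhausted
                break
            yield from (hit["_source"] for hit in hits)
            # Following pages: continue the same scroll
            result = es.scroll(body={"scroll_id": scroll_id,
                                     "scroll": scroll_timeout})
    finally:
        # Runs on exhaustion, on errors, and when the generator is closed early
        if scroll_id is not None:
            es.clear_scroll(scroll_id=scroll_id)


class FakeElasticsearch:
    """Hypothetical in-memory stand-in for the client (offline demo only)."""

    def __init__(self, docs):
        self.docs = docs
        self.offset = 0
        self.size = 0
        self.cleared = []  # Records which scroll IDs were cleared

    def _page(self):
        page = self.docs[self.offset:self.offset + self.size]
        self.offset += len(page)
        return {"_scroll_id": "fake-scroll-id",
                "hits": {"hits": [{"_source": doc} for doc in page]}}

    def search(self, index, scroll, body):
        self.offset = 0
        self.size = body["size"]
        return self._page()

    def scroll(self, body):
        return self._page()

    def clear_scroll(self, scroll_id):
        self.cleared.append(scroll_id)


fake = FakeElasticsearch([{"n": i} for i in range(5)])
docs = list(iterate_and_clear(fake, "my_index", pagesize=2))
print(docs)
```

With a real client, pass the Elasticsearch instance in place of the fake one. Putting the cleanup in a finally block means that breaking out of the for loop early still frees the scroll context once the generator is closed.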