ElasticSearch: How to iterate all documents in index using Python (up to 10000 documents)
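Important Note: This simple approach only works for up to ~10000 documents, because by default Elasticsearch rejects any search where "from" + "size" exceeds 10000 (the index.max_result_window setting). Prefer using our scroll-based solution: See ElasticSearch: How to iterate / scroll through all documents in index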
Use this helper function to iterate over all the documents in an index:
def es_iterate_all_documents(es, index, pagesize=250, **kwargs):
    """
    Helper to iterate ALL values from the given index.
    Yields all the documents.
    """
    offset = 0
    while True:
        # Extra keyword arguments are passed through to es.search()
        result = es.search(index=index, **kwargs, body={
            "size": pagesize,
            "from": offset
        })
        hits = result["hits"]["hits"]
        # Stop after no more docs
        if not hits:
            break
        # Yield each entry
        yield from (hit['_source'] for hit in hits)
        # Continue from there
        offset += pagesize
Usage example:
for entry in es_iterate_all_documents(es, 'my_index'):
    print(entry)  # Prints the document as stored in the DB
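If you want to try the helper without a running cluster, a small in-memory stand-in is enough. The sketch below is only an illustration: FakeES is a hypothetical class (not part of any Elasticsearch library) that mimics just the "from"/"size" behaviour of es.search(), and the helper is repeated so the snippet runs on its own.

```python
class FakeES:
    """Hypothetical stand-in: mimics only from/size paging of es.search()."""

    def __init__(self, docs):
        self.docs = docs

    def search(self, index, body, **kwargs):
        # Return the requested slice in the same nested shape the helper reads
        start = body["from"]
        page = self.docs[start:start + body["size"]]
        return {"hits": {"hits": [{"_source": doc} for doc in page]}}


def es_iterate_all_documents(es, index, pagesize=250, **kwargs):
    # Same helper as above, repeated so that this snippet is self-contained
    offset = 0
    while True:
        result = es.search(index=index, **kwargs, body={
            "size": pagesize,
            "from": offset
        })
        hits = result["hits"]["hits"]
        if not hits:
            break
        yield from (hit["_source"] for hit in hits)
        offset += pagesize


fake = FakeES([{"id": i} for i in range(7)])
entries = list(es_iterate_all_documents(fake, "my_index", pagesize=3))
print(len(entries))  # 7 documents, fetched as pages of 3, 3 and 1
```

The fourth request returns an empty page, which is what ends the loop.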
How it works
You can iterate over all documents in an index in ElasticSearch by using queries like
{
    "size": 250,
    "from": 0
}
and increasing "from" by "size" after each iteration.
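To make the paging scheme and its limit concrete, here is a minimal sketch that only generates the sequence of query bodies, without contacting Elasticsearch. The function name page_bodies and the constant MAX_RESULT_WINDOW are made up for this illustration; the value 10000 assumes the index.max_result_window setting has been left at its default.

```python
from typing import Dict, Iterator

# Assumption: index.max_result_window is left at its default of 10000
MAX_RESULT_WINDOW = 10_000


def page_bodies(pagesize: int = 250,
                window: int = MAX_RESULT_WINDOW) -> Iterator[Dict[str, int]]:
    """Yield successive query bodies while from + size stays within the window."""
    offset = 0
    while offset + pagesize <= window:
        yield {"size": pagesize, "from": offset}
        offset += pagesize


bodies = list(page_bodies())
print(bodies[0])    # {'size': 250, 'from': 0}
print(bodies[-1])   # {'size': 250, 'from': 9750}
print(len(bodies))  # 40
```

With a page size of 250, the 40th query ends exactly at document 10000. The next one would ask for "from": 10000, which is where this simple approach stops working and a scroll-based solution is needed.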