In Elasticsearch, you can use the Scroll API to iterate over every document in an index.
In Python you can scroll like this:
def es_iterate_all_documents(es, index, pagesize=250, scroll_timeout="1m", **kwargs):
    """
    Helper to iterate ALL values from a single index.
    Yields all the documents.
    """
    is_first = True
    while True:
        # Scroll next
        if is_first:  # Initialize scroll
            result = es.search(index=index, scroll=scroll_timeout, **kwargs, body={
                "size": pagesize
            })
            is_first = False
        else:
            result = es.scroll(body={
                "scroll_id": scroll_id,
                "scroll": scroll_timeout
            })
        # Remember the scroll ID for the next request
        scroll_id = result["_scroll_id"]
        hits = result["hits"]["hits"]
        # Stop after no more docs
        if not hits:
            break
        # Yield each entry
        yield from (hit['_source'] for hit in hits)
This function yields the _source of each document in the index, fetching pagesize documents per request.
Example usage for the index my_index:
from elasticsearch import Elasticsearch

es = Elasticsearch([{"host": "localhost"}])
for entry in es_iterate_all_documents(es, 'my_index'):
    print(entry)  # Prints the document as stored in the DB
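If you want to see how the paging behaves without a running cluster, here is a minimal sketch that drives the same search-then-scroll loop against an in-memory stand-in. Note that FakeScrollClient is a hypothetical helper written only for this illustration; it is not part of the elasticsearch library and mimics nothing beyond the search() and scroll() calls and the response keys used above. The iterator is restated in a slightly more compact form so the block runs on its own.

```python
class FakeScrollClient:
    """Hypothetical in-memory stand-in (NOT part of the elasticsearch library).

    It only mimics the two calls the iterator uses, search() and scroll(),
    and returns dicts shaped like the response fields read by the iterator.
    """

    def __init__(self, docs):
        self._docs = [{"_source": doc} for doc in docs]
        self._pos = 0
        self._size = 0
        self.calls = 0  # Number of requests served, for illustration

    def _page(self):
        # Return the next slice of documents and advance the cursor
        self.calls += 1
        hits = self._docs[self._pos:self._pos + self._size]
        self._pos += len(hits)
        return {"_scroll_id": "fake-scroll-id", "hits": {"hits": hits}}

    def search(self, index, scroll, body, **kwargs):
        # The first request fixes the page size and returns the first page
        self._pos = 0
        self._size = body["size"]
        return self._page()

    def scroll(self, body):
        # Every later request returns the next page (empty once exhausted)
        return self._page()


def es_iterate_all_documents(es, index, pagesize=250, scroll_timeout="1m", **kwargs):
    """Compact restatement of the scroll loop: search once, then keep scrolling."""
    result = es.search(index=index, scroll=scroll_timeout,
                       body={"size": pagesize}, **kwargs)
    while result["hits"]["hits"]:
        yield from (hit["_source"] for hit in result["hits"]["hits"])
        result = es.scroll(body={"scroll_id": result["_scroll_id"],
                                 "scroll": scroll_timeout})


# Seven documents with a page size of three: pages of 3, 3 and 1 documents,
# followed by one empty page that ends the loop.
fake = FakeScrollClient([{"n": i} for i in range(7)])
docs = list(es_iterate_all_documents(fake, "my_index", pagesize=3))
print(len(docs), fake.calls)
```

The final empty page is what terminates the iteration, so a full pass always costs one more request than the number of non-empty pages.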