Faster Python Elasticsearch index() by using concurrent.futures ThreadPoolExecutor

In our previous post, Elasticsearch Python minimal index() / insert example, we showed how to insert a document into Elasticsearch.

When inserting a large number of documents into Elasticsearch one at a time, you will notice that it’s very slow to wait for each API call to finish before inserting the next document, since most of that time is spent waiting for the network round trip.

In this post we’ll show a simple way of running many requests in parallel, so that multiple index operations are in flight while your code keeps processing more documents. For this, we’ll use concurrent.futures.ThreadPoolExecutor. After submitting all documents to the executor, we’ll use concurrent.futures.wait to wait for all requests to finish before exiting.

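#!/usr/bin/env python3
from concurrent.futures import ThreadPoolExecutor, wait

from elasticsearch import Elasticsearch

# Up to 64 index requests can be in flight at the same time
index_executor = ThreadPoolExecutor(64)
futures = []

# Connects to the default local node; newer client versions may need an
# explicit URL, e.g. Elasticsearch("http://localhost:9200")
es = Elasticsearch()
for i in range(1000):
    # submit() returns immediately; es.index() runs in a worker thread.
    # Newer client versions expect document= instead of body=
    future = index_executor.submit(es.index, index="test-index", id=i, body={"test": 123})
    futures.append(future)

print("Waiting for requests to complete...")
# Blocks until every request has finished, successfully or with an exception
wait(futures)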
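
Note that concurrent.futures.wait only waits for the futures to finish. It does not re-raise exceptions from failed requests, so a failed index operation would go unnoticed in the script above. The following is a minimal sketch of one way to collect failures using concurrent.futures.as_completed. To keep it runnable without a cluster, it uses a hypothetical stand-in function index_document (not part of the Elasticsearch client) that deliberately fails for one ID; in real code you would submit es.index as in the example above.

```python
#!/usr/bin/env python3
from concurrent.futures import ThreadPoolExecutor, as_completed


def index_document(doc_id, body):
    """Hypothetical stand-in for es.index(); fails for one ID on purpose."""
    if doc_id == 13:
        raise RuntimeError(f"simulated indexing failure for id {doc_id}")
    return {"_id": doc_id, "result": "created"}


def index_all(num_docs, max_workers=8):
    """Submit all documents, then sort the results into successes and failures."""
    succeeded = []
    failed = {}
    # The with block shuts the executor down once all submitted tasks are done
    with ThreadPoolExecutor(max_workers) as executor:
        # Remember which document ID belongs to which future
        future_to_id = {
            executor.submit(index_document, i, {"test": 123}): i
            for i in range(num_docs)
        }
        # as_completed() yields each future as soon as it has finished
        for future in as_completed(future_to_id):
            doc_id = future_to_id[future]
            try:
                # result() re-raises any exception raised in the worker thread
                future.result()
            except Exception as exc:
                failed[doc_id] = exc
            else:
                succeeded.append(doc_id)
    return succeeded, failed


succeeded, failed = index_all(100)
print(f"{len(succeeded)} indexed, {len(failed)} failed")
```

Whether to retry the failed IDs, log them or abort depends on your application; the sketch only collects them so nothing is lost silently.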