ElasticSearch

How to fix elasticsearch.exceptions.RequestError: RequestError(400, ‘resource_already_exists_exception’, ‘index […] already exists’) in Python

Problem:

You want to create an ElasticSearch index in Python using code like

es.indices.create("nodes") # Create an index names "nodes"

but you see the following error message:

Traceback (most recent call last):
  File "estest.py", line 22, in <module>
    es.indices.create("nodes")
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/client/utils.py", line 168, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/client/indices.py", line 123, in create
    return self.transport.perform_request(
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/transport.py", line 415, in perform_request
    raise e
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/transport.py", line 381, in perform_request
    status, headers_response, data = connection.perform_request(
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/connection/http_urllib3.py", line 277, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/connection/base.py", line 330, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
elasticsearch.exceptions.RequestError: RequestError(400, 'resource_already_exists_exception', 'index [nodes/mXAiBt0wTKK4Y31HpshVbw] already exists')

Solution:

The error message tells you that the index you are trying to create already exists!

The simples solution is to use the code from our post on How to create ElasticSearch index if it doesn’t already exist in Python:

def es_create_index_if_not_exists(es, index):
    """Create the given ElasticSearch index and ignore error if it already exists"""
    try:
        es.indices.create(index)
    except elasticsearch.exceptions.RequestError as ex:
        if ex.error == 'resource_already_exists_exception':
            pass # Index already exists. Ignore.
        else: # Other exception - raise it
            raise ex

and use that function to create your index:

es_create_index_if_not_exists(es, "nodes") # Creates the "nodes" index ; doesn't fail if it already exists

 

Posted by Uli Köhler in Databases, ElasticSearch, Python

How to create ElasticSearch index if it doesn’t already exist in Python

The following utility will create an index if it doesn’t exist already by ignoring any resource_already_exists_exception

def es_create_index_if_not_exists(es, index):
    """Create the given ElasticSearch index and ignore error if it already exists"""
    try:
        es.indices.create(index)
    except elasticsearch.exceptions.RequestError as ex:
        if ex.error == 'resource_already_exists_exception':
            pass # Index already exists. Ignore.
        else: # Other exception - raise it
            raise ex

# Example usage: Create "nodes" index
es_create_index_if_not_exists(es, "nodes")

 

Posted by Uli Köhler in Databases, ElasticSearch, Python

What is the ElasticSearch equivalent to an SQL table?

In ElasticSearch, the concept which closely resembles an SQL table is called an index.

Compared to an SQL table, the index does not neccessarily need to have a predefined structure – the ElasticSearch index is more similar to a MongoDB collection.

However, an elasticsearch index has many features similar to SQL tables such as indices (which you typically don’t need to create explicity – ElasticSearch takes care of that for you).

Typically, indices contain lots of similar documents that have (mostly) the same properties.

Posted by Uli Köhler in Databases, ElasticSearch

How to fix Elasticsearch Python elasticsearch.exceptions.NotFoundError: NotFoundError(404, ‘index_not_found_exception’, ‘no such index [node]’, node, index_or_alias)

Problem:

You want to update ElasticSearch index settings in Python using code like

es.indices.put_settings(index="node", body={
    "index.mapping.total_fields.limit": 100000
})

but you see an error message like this one:

Traceback (most recent call last):
  File "estest.py", line 11, in <module>
    es.indices.put_settings(index="node", body={
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/client/utils.py", line 168, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/client/indices.py", line 786, in put_settings
    return self.transport.perform_request(
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/transport.py", line 415, in perform_request
    raise e
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/transport.py", line 381, in perform_request
    status, headers_response, data = connection.perform_request(
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/connection/http_urllib3.py", line 277, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/connection/base.py", line 330, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
elasticsearch.exceptions.NotFoundError: NotFoundError(404, 'index_not_found_exception', 'no such index [node]', node, index_or_alias)

Solution:

The index needs to be created first in order to be able to put settings. First, double-check if you spelled the index name correctly! You can see the index name in the exception: no such index [node] means that the index is called node

The direct way to create an index is

es.indices.create("node")

But note that this will fail if the index already exists. In order to work around this issue, I recommend to use the code from our previous post How to create ElasticSearch index if it doesn’t already exist in Python:

def es_create_index_if_not_exists(es, index):
    """Create the given ElasticSearch index and ignore error if it already exists"""
    try:
        es.indices.create(index)
    except elasticsearch.exceptions.RequestError as ex:
        if ex.error == 'resource_already_exists_exception':
            pass # Index already exists. Ignore.
        else: # Other exception - raise it
            raise exnodes

and use the es_create_index_if_not_exists()  function to create your index:

es_create_index_if_not_exists(es, "node") # Creates the "node" index ; doesn't fail if it already exists

 

Posted by Uli Köhler in Databases, ElasticSearch, Python

How to set index setting using Python ElasticSearch client

You can set index settings using the official ElasticSearch python client library by using:

es.indices.put_settings(index="my-index", body={
    # Put your index settings here
    # Example: "index.mapping.total_fields.limit": 100000
})

Full example:

from elasticsearch import Elasticsearch

es = Elasticsearch()

es.indices.put_settings(index="ways", body={
    "index.mapping.total_fields.limit": 100000
})
Posted by Uli Köhler in Databases, ElasticSearch, Python

How to increase ElasticSearch total field limit using Python API

When using ElasticSearch, you will sometimes encounter an Limit of total fields [1000] has been exceeded when you insert a large document.

One solution that often works for real-world scenarios is to just increase the default limit of 1000 to, for example, 100000 to account for even the largest documents.

How this can be done is, however, not well-documented for the ElasticSearch Python API. Here’s how you can do it:

from elasticsearch import Elasticsearch

es = Elasticsearch()

es.indices.put_settings(index="my-index", body={
    "index.mapping.total_fields.limit": 100000
})

 

Posted by Uli Köhler in Databases, ElasticSearch, Python

Faster Python Elasticsearch index() by using concurrent.futures ThreadPoolExecutor

In our previous post Elasticsearch Python minimal index() / insert example we showed how to insert a document into Elasticsearch.

When inserting a large number of documents into Elasticsearch, you will notice that it’s extremely slow to wait for the API call to finish before trying to insert the document.

In this post we’ll show a simple way of doing many requests in parallel so multiple index operations are running concurrently while your code is processing more documents. For this, we’ll use concurrent.futures.ThreadPoolExecutor and – after inserting all documents into the queue, use concurrent.futures.wait to wait for all requests to finish before we’ll exit.

#!/usr/bin/env python3
from elasticsearch import Elasticsearch
from concurrent.futures import ThreadPoolExecutor
import concurrent.futures

index_executor = ThreadPoolExecutor(64)
futures = []

es = Elasticsearch()
for i in range(1000):
    future = index_executor.submit(es.index, index="test-index", id=i, body={"test": 123})
    futures.append(future)

print("Waiting for requests to complete...")
concurrent.futures.wait(futures)

 

Posted by Uli Köhler in Databases, ElasticSearch, Python

Elasticsearch Python minimal index() / insert example

This minimal example inserts a single document into Elasticsearch running at http://localhost:9200:

#!/usr/bin/env python3
from elasticsearch import Elasticsearch

es = Elasticsearch()
es.index(index="test-index", id=1, body={"test": 123})

 

Posted by Uli Köhler in Databases, ElasticSearch, Python

Simple Elasticsearch setup with docker-compose

The following docker-compose.yml is a simple starting point for using ElasticSearch within a docker-based setup:

version: '2.2'
services:
    elasticsearch1:
        image: docker.elastic.co/elasticsearch/elasticsearch:7.13.4
        container_name: elasticsearch1
        environment:
            - cluster.name=docker-cluster
            - node.name=elasticsearch1
            - cluster.initial_master_nodes=elasticsearch1
            - bootstrap.memory_lock=true
            - http.cors.allow-origin=http://localhost:1358,http://127.0.0.1:1358
            - http.cors.enabled=true
            - http.cors.allow-headers=X-Requested-With,X-Auth-Token,Content-Type,Content-Length,Authorization
            - http.cors.allow-credentials=true
            - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
        ulimits:
            memlock:
                soft: -1
                hard: -1
        volumes:
            - ./esdata1:/usr/share/elasticsearch/data
        ports:
            - 9200:9200
    dejavu:
        image: appbaseio/dejavu
        container_name: dejavu
        ports:
            - 1358:1358

Now create the esdata1 directory with the correct permissions:

sudo mkdir esdata1
sudo chown -R 1000:1000 esdata1

We also need to configure the vm.max_map_count sysctl parameter:

echo -e "\nvm.max_map_count=524288\n" | sudo tee -a /etc/sysctl.conf && sudo sysctl -w vm.max_map_count=524288

 

I recommend to place it in /opt/elasticsearch, but you can place wherever you like.

If you want to autostart it on boot, see Create a systemd service for your docker-compose project in 10 seconds or just use this snippet from said post:

curl -fsSL https://techoverflow.net/scripts/create-docker-compose-service.sh | sudo bash /dev/stdin

This will create a systemd service named elasticsearch (if your directory is named elasticsearch like /opt/elasticsearch) and enable and start it immediately. Hence you can restart using

sudo systemctl restart elasticsearch

and view the logs using

sudo journalctl -xfu elasticsearch

For more complex setup involving more than one node, see our previous post on ElasticSearch docker-compose.yml and systemd service generator

Posted by Uli Köhler in Container, Databases, Docker, ElasticSearch

How to fix ElasticSearch [1]: initial heap size […] not equal to maximum heap size […];

Problem:

Your ElasticSearch server fails to start with an error message like

ERROR: [1] bootstrap checks failed
[1]: initial heap size [536870912] not equal to maximum heap size [2147483648]; this can cause resize pauses and prevents memory locking from locking the entire heap
ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/docker-cluster.log

Solution:

Set the initial heap size equal to the maximum heap size: The -Xms argument and the -Xmx argument must be equal, for example:

-Xms2048m -Xmx2048m

Typically (such as in a docker-based setup) you can set this in ES_JAVA_OPTS:

ES_JAVA_OPTS=-Xms2048m -Xmx2048m

For docker-compose based environments, this is an example configuration that works:

environment:
    - cluster.name=docker-cluster
    - node.name=elasticsearch1
    - cluster.initial_master_nodes=elasticsearch1
    - bootstrap.memory_lock=true
    - http.cors.allow-origin=http://localhost:1358,http://127.0.0.1:1358
    - http.cors.enabled=true
    - http.cors.allow-headers=X-Requested-With,X-Auth-Token,Content-Type,Content-Length,Authorization
    - http.cors.allow-credentials=true
    - "ES_JAVA_OPTS=-Xms2048m -Xmx2048m"

After that, restart your ElasticSearch instance.

Posted by Uli Köhler in ElasticSearch

How to fix ElasticSearch docker AccessDeniedException[/usr/share/elasticsearch/data/nodes];”,

Problem:

You are trying to start a dockerized ElasticSearch instance but you see an error log like

lasticsearch1    | {"type": "server", "timestamp": "2020-04-18T01:17:27,564Z", "level": "ERROR", "component": "o.e.b.ElasticsearchUncaughtExceptionHandler", "cluster.name": "docker-cluster", "node.name": "elasticsearch1", "message": "uncaught exception in thread [main]", 
elasticsearch1    | "stacktrace": ["org.elasticsearch.bootstrap.StartupException: ElasticsearchException[failed to bind service]; nested: AccessDeniedException[/usr/share/elasticsearch/data/nodes];",
elasticsearch1    | "at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:174) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:161) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:125) ~[elasticsearch-cli-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:126) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "Caused by: org.elasticsearch.ElasticsearchException: failed to bind service",
elasticsearch1    | "at org.elasticsearch.node.Node.<init>(Node.java:615) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.node.Node.<init>(Node.java:257) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:221) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:221) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:349) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "... 6 more",
elasticsearch1    | "Caused by: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/data/nodes",
elasticsearch1    | "at sun.nio.fs.UnixException.translateToIOException(UnixException.java:90) ~[?:?]",
elasticsearch1    | "at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]",
elasticsearch1    | "at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) ~[?:?]",
elasticsearch1    | "at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:389) ~[?:?]",
elasticsearch1    | "at java.nio.file.Files.createDirectory(Files.java:693) ~[?:?]",
elasticsearch1    | "at java.nio.file.Files.createAndCheckIsDirectory(Files.java:800) ~[?:?]",
elasticsearch1    | "at java.nio.file.Files.createDirectories(Files.java:786) ~[?:?]",
elasticsearch1    | uncaught exception in thread [main]
elasticsearch1    | "at org.elasticsearch.env.NodeEnvironment.lambda$new$0(NodeEnvironment.java:274) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.env.NodeEnvironment$NodeLock.<init>(NodeEnvironment.java:211) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:271) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.node.Node.<init>(Node.java:277) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.node.Node.<init>(Node.java:257) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:221) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:221) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:349) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "... 6 more"] }
elasticsearch1    | ElasticsearchException[failed to bind service]; nested: AccessDeniedException[/usr/share/elasticsearch/data/nodes];
elasticsearch1    | Likely root cause: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/data/nodes
elasticsearch1    |     at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)
elasticsearch1    |     at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
elasticsearch1    |     at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
elasticsearch1    |     at java.base/sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:389)
elasticsearch1    |     at java.base/java.nio.file.Files.createDirectory(Files.java:693)
elasticsearch1    |     at java.base/java.nio.file.Files.createAndCheckIsDirectory(Files.java:800)
elasticsearch1    |     at java.base/java.nio.file.Files.createDirectories(Files.java:786)
elasticsearch1    |     at org.elasticsearch.env.NodeEnvironment.lambda$new$0(NodeEnvironment.java:274)
elasticsearch1    |     at org.elasticsearch.env.NodeEnvironment$NodeLock.<init>(NodeEnvironment.java:211)
elasticsearch1    |     at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:271)
elasticsearch1    |     at org.elasticsearch.node.Node.<init>(Node.java:277)
elasticsearch1    |     at org.elasticsearch.node.Node.<init>(Node.java:257)
elasticsearch1    |     at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:221)
elasticsearch1    |     at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:221)
elasticsearch1    |     at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:349)
elasticsearch1    |     at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170)
elasticsearch1    |     at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:161)
elasticsearch1    |     at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86)
elasticsearch1    |     at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:125)
elasticsearch1    |     at org.elasticsearch.cli.Command.main(Command.java:90)
elasticsearch1    |     at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:126)
elasticsearch1    |     at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92)
elasticsearch1    | For complete error details, refer to the log at /usr/share/elasticsearch/logs/docker-cluster.log

Solution:

Fix the permissions of the host directory mapped to /usr/share/elasticsearch/data. On my instance that directory is /var/lib/elasticsearch/esdata1.

Run

sudo chown -R 1000:1000 [directory]

e.g.

sudo chown -R 1000:1000 /var/lib/elasticsearch/esdata1

 

Posted by Uli Köhler in ElasticSearch

ElasticSearch: How to iterate / scroll through all documents in index

In ElasticSearch, you can use the Scroll API to scroll through all documents in an entire index.

In Python you can scroll like this:

def es_iterate_all_documents(es, index, pagesize=250, scroll_timeout="1m", **kwargs):
    """
    Helper to iterate ALL values from a single index
    Yields all the documents.
    """
    is_first = True
    while True:
        # Scroll next
        if is_first: # Initialize scroll
            result = es.search(index=index, scroll="1m", **kwargs, body={
                "size": pagesize
            })
            is_first = False
        else:
            result = es.scroll(body={
                "scroll_id": scroll_id,
                "scroll": scroll_timeout
            })
        scroll_id = result["_scroll_id"]
        hits = result["hits"]["hits"]
        # Stop after no more docs
        if not hits:
            break
        # Yield each entry
        yield from (hit['_source'] for hit in hits)

This function will yield each document encountered in the index.

Example usage for index my_index:

es = Elasticsearch([{"host": "localhost"}])

for entry in es_iterate_all_documents(es, 'my_index'):
    print(entry) # Prints the document as stored in the DB

 

Posted by Uli Köhler in Databases, ElasticSearch, Python

ElasticSearch: How to iterate all documents in index using Python (up to 10000 documents)

Important Note: This simple approach only works for up to ~10000 documents. Prefer using our scroll-based solution: See ElasticSearch: How to iterate / scroll through all documents in index

Use this helper function to iterate over all the documens in an index

def es_iterate_all_documents(es, index, pagesize=250, **kwargs):
    """
    Helper to iterate ALL values from
    Yields all the documents.
    """
    offset = 0
    while True:
        result = es.search(index=index, **kwargs, body={
            "size": pagesize,
            "from": offset
        })
        hits = result["hits"]["hits"]
        # Stop after no more docs
        if not hits:
            break
        # Yield each entry
        yield from (hit['_source'] for hit in hits)
        # Continue from there
        offset += pagesize

Usage example:

for entry in es_iterate_all_documents(es, 'my_index'):
    print(entry) # Prints the document as stored in the DB

How it works

You can iterate over all documents in an index in ElasticSearch by using queries like

{
    "size": 250,
    "from": 0
}

and increasing "from" by "size" after each iteration.

Posted by Uli Köhler in Databases, ElasticSearch, Python

Fixing ElasticSearch ‘Unknown key for a VALUE_NUMBER in [offset].’

The error message

Unknown key for a VALUE_NUMBER in [offset].

in ElasticSearch tells you that in the query JSON you have specified an offset (numeric value) but ElasticSearch doesn’t know what to do with offset.

This is easy to fix: In order to specify an offset to start from, use from instead of offset.

Incorrect:

{
    "size": 250,
    "offset": 1000
}

Correct:

{
    "size": 250,
    "from": 1000
}

 

Posted by Uli Köhler in Databases, ElasticSearch

Fixing ElasticSearch ‘Unknown key for a START_ARRAY in […].’

When you get an error message like

elasticsearch.exceptions.RequestError: RequestError(400, 'parsing_exception', 'Unknown key for a START_ARRAY in [size].')

in ElasticSearch, look for the value in the [brackets]. In our example this is [size].

Your query contains an array for that key but an array is not allowed.

For example, this query is malformed:

{
    "size": [10, 11]
}

because size takes a number, not an array.

In Python, if you are programmatically building your array, you might have an extra comma at the end of your line.

For example, if you have a line like

query["size"] = 250,

this is just syntactic sugar for

query["size"] = (250,)

i.e. it will set size to a tuple with one element. In the JSON this will result in

{
    "size": [250]
}

In order to fix that issue, remove the comma from the end of the line

query["size"] = 250

which will result in the correct query JSON

{
    "size": 250
}
Posted by Uli Köhler in Databases, ElasticSearch

ElasticSearch equivalent to MongoDB .count()

The ElasticSearch equivalent to MongoDB’s count() is also called count. It can be used in a similar way

When you have an ElasticSearch query like (example in Python)

result = es.search(index="my_index", body={
    "query": {
        "match": {
            "my_field": "my_value"
        }
    }
})
result_docs = [hit["_source"] for hit in result["hits"]["hits"]]

you can easily change it to a count-only query by replacing search with count:

result = es.count(index="my_index", body={
    "query": {
        "match": {
            "my_field": "my_value"
        }
    }
})
result_count = result["count"] # e.g. 200
Posted by Uli Köhler in Databases, ElasticSearch

Fixing ElasticSearch ‘no [query] registered for [query]’

Problem:

You want to run a query in ElasticSearch, but you get an error message like

elasticsearch.exceptions.RequestError: RequestError(400, 'parsing_exception', 'no [query] registered for [query]')

Solution:

In your query body, you have two "query" objects nested in each other. Remove the outer "query", keeping only the inner one.

Example:

Incorrect:

{
    "query": {
        "query": {
            "match": {
                "my_field": "my_value"
            }
        }
    }
}

Correct:

{
    "query": {
        "match": {
            "my_field": "my_value"
        }
    }
}

 

Posted by Uli Köhler in Databases, ElasticSearch

How to fix ElasticSearch ‘Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true’

Problem:

You want to create a mapping in ElasticSearch but you see an error message like

elasticsearch.exceptions.RequestError: RequestError(400, 'illegal_argument_exception', 'Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true.')

Solution:

As already suggested in the error message, set the include_type_name parameter to True.

With the Python API this is as simple as adding include_type_name=True to the put_mapping(...) call:

es.indices.put_mapping(index='my_index', body=my_mapping, doc_type='_doc', include_type_name=True)

In case you now see an error like

TypeError: put_mapping() got an unexpected keyword argument 'include_type_name'

you need to upgrade your elasticsearch python library, e.g. using

sudo pip3 install --upgrade elasticsearch

 

Posted by Uli Köhler in Databases, ElasticSearch, Python

How to view ElasticSearch cluster health using curl

To view the cluster health of your ElasticSearch cluster use

curl -X GET "http://localhost:9200/_cluster/health?pretty=true"

If your ElasticSearch is not running on localhost, replace localhost by the hostname or IP address ElasticSearch is running on.

Example output:
{
  "cluster_name" : "docker-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

The most important information from this is:

  • "cluster_name" : "docker-cluster" The name you assigned to your cluster. You should verify that you are connecting to the correct cluster. All ElasticSearch nodes from that cluster must have the same cluster name, or they won’t connect!
  • "number_of_nodes" : 1 The number of nodes currently in the cluster. Sometimes some nodes take longer to start up, so if there are some nodes missing, wait a minute and retry
  • "status" : "green" The status or cluster health of your cluster.

The cluster health can take three values:

  • green: Everything is OK with your cluster (like in our example)
  • yellow: Your cluster is mostly OK, but some shards couldn’t be replicated. This is often the case with cluster consisting of one node only (in that case, note that a data loss on the one node can not be recovered)
  • red: Something is wrong with the cluster. Usually that’s some configuration issue, so be sure to check the logs.

Also see the official reference on cluster health

If you are looking for help on how to setup your ElasticSearch cluster using docker and docker-compose, you can generate your config file using our generator at ElasticSearch docker-compose.yml and systemd service generator.

Posted by Uli Köhler in Databases, ElasticSearch