ElasticSearch

How to fix ElasticSearch ‘[match] query doesn’t support multiple fields, found […] and […]’

Problem:

You want to run an ElasticSearch query like

{
    "query": {
        "match" : {
            "one_field" : "one_value",
            "another_field": "another_value"
        }
    }
}

but you only see an error message like

elasticsearch.exceptions.RequestError: RequestError(400, 'parsing_exception', "[match] query doesn't support multiple fields, found [one_field] and [another_field]")

Solution:

Match queries only support one field. You should use a bool query with a must clause containing multiple match queries instead:

{
    "query": {
        "bool": {
            "must": [
                {"match": {"one_field" : "one_value"}},
                {"match": {"another_field" : "another_value"}},
            ]
        }
    }
}

Also see the official docs for the MultiMatch query.

Posted by Uli Köhler in Databases, ElasticSearch

How to fix ElasticSearch ‘no [query] registered for [missing]’

Problem:

You are trying to run an ElasticSearch query like

{
    "query": {
        "missing" : { "field" : "myfield" }
    }
}

to find documents that do not have myfield.

However you only see an error message like this:

elasticsearch.exceptions.RequestError: RequestError(400, 'parsing_exception', 'no [query] registered for [missing]')

Solution:

As the ElasticSearch documentation tells, us, there is no missing query! You instead need to use an exists query inside a must_not clause:

{
    "query": {
        "bool": {
            "must_not": {
                "exists": {
                    "field": "myfield"
                }
            }
        }
    }
}

 

Posted by Uli Köhler in Databases, ElasticSearch

How to fix ElasticSearch ‘Root mapping definition has unsupported parameters’

Problem:

You want to create an ElasticSearch index with a custom mapping or update the mapping of an existing ElasticSearch index but you see an error message like

elasticsearch.exceptions.RequestError: RequestError(400, 'mapper_parsing_exception', 'Root mapping definition has unsupported parameters:  [mappings : {properties={num_total={type=integer}, approved={type=integer}, num_translated={type=integer}, pattern_length={type=integer}, num_unapproved={type=integer}, pattern={type=keyword}, num_approved={type=integer}, translated={type=integer}, untranslated={type=integer}, num_untranslated={type=integer}, group={type=keyword}}}]')

Solution:

This can point to multiple issues. Essentially, ElasticSearch is trying to tell you that the structure of your JSON is not correct.

Often this error is misinterpreted as individual field definitions being wrong, but this is rarely the issue (and only if an individual field definition is completely malformed).

If your message is structured like

... unsupported parameters:  [mappings : ...

then the most likely root cause is that you have mappings nested inside mappings in your JSON. This also applies if you update a mapping (put_mapping) – in this case the outer mapping is implicit!

Example: Your code looks like this:

es.indices.put_mapping(index='my_index, doc_type='_doc', body={
    "mappings": {
        "properties": {
            "pattern": {
                "type":  "keyword"
            }
        }
    }
})

ElasticSearch will internally create a JSON like this internally:

{
    "mappings": {
        "mappings": {
            "properties": {
                "pattern": {
                    "type":  "keyword"
                }
            }
        }
    }
}

See that there are two mappings inside each other? ElasticSearch does not view this as a correctly structured JSON, therefore you need to remove the "mapping": {...} from your code, resulting in

es.indices.put_mapping(index='my_index, doc_type='_doc', body={
    "properties": {
        "pattern": {
            "type":  "keyword"
        }
    }
})
Posted by Uli Köhler in Databases, ElasticSearch, Python

Fixing ElasticSearch ‘No handler for type [int] declared on field …’

Problem:

You want to create an index with a custom mapping in ElasticSearch but you see an error message like this:

elasticsearch.exceptions.RequestError: RequestError(400, 'mapper_parsing_exception', 'No handler for type [int] declared on field [id]')

Solution:

You likely have a mapping like

"id": {
    "type":  "int"
}

in your mapping properties.

The issue here is int: ElasticSearch uses integer as type of integers, not int!

In order to fix the issue, change the property to

"id": {
    "type":  "integer"
}

and retry creating the index.

 

Posted by Uli Köhler in Databases, ElasticSearch

How to fix ElasticSearch [FORBIDDEN/12/index read-only / allow delete (api)]

If you try to index a document in ElasticSearch and you see an error message like this:

elasticsearch.exceptions.AuthorizationException: AuthorizationException(403, 'cluster_block_exception', 'blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];')

you can unlock writes to your cluster (all indexes) using

curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

(thanks to Imran273 on StackOverflow for the original solution)

Note however that often there’s an underlying reason that caused ElasticSearch to lock writes to the index. Most often it is caused by exceeding the disk watermark / quota. See How to disable ElasticSearch disk quota / watermark for details on how to work around that.

Posted by Uli Köhler in Databases, ElasticSearch

How to disable ElasticSearch disk quota / watermark

In its default configuration, ElasticSearch will not allocate any more disk space when more than 90% of the disk are used overall (i.e. by ElasticSearch or other applications).

You can set the watermark extremely low using

curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "30mb",
    "cluster.routing.allocation.disk.watermark.high": "20mb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "10mb",
    "cluster.info.update.interval": "1m"
  }
}
'

After doing that, you might need to unlock your cluster for write accesses if you had already exceeded your watermark before:

curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

See How to fix ElasticSearch [FORBIDDEN/12/index read-only / allow delete (api)] for more details on that.

I do not recommend to set the values to zero (i.e. below 10 Megabytes) because using every byte of available disk space might cause issues on your system since more important applications will not be able to properly allocate disk space any more.

In order to view the current disk usage use

curl -XGET "http://localhost:9200/_cat/allocation?v&pretty"

See How to view & interpret disk space usage of your ElasticSearch cluster for more details.

Posted by Uli Köhler in Databases, ElasticSearch

How to insert test data into ElasticSearch 6.x

If you just want to insert some test documents into ElasticSearch 6.x, you can use this simple command:

curl -X POST "localhost:9200/mydocuments/_doc/" -H 'Content-Type: application/json' -d"
{
    \"test\" : true,
    \"post_date\" : \"$(date -Ins)\"
}"

Run this command multiple times to insert multiple documents!

In case of success, this will output a message like

{"_index":"mydocuments","_type":"_doc","_id":"vCxB82kBn2U9QxlET2aG","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

Also see the official docs on the indexing API.

Posted by Uli Köhler in ElasticSearch

How to view & interpret disk space usage of your ElasticSearch cluster

In order to find out how much disk space every node of your elasticsearch cluster is using and how much disk space is remaining, use

curl -XGET "http://localhost:9200/_cat/allocation?v&pretty"

Example output for one node without any data:

shards disk.indices disk.used disk.avail disk.total disk.percent host       ip         node
     0           0b     2.4gb    200.9gb    203.3gb            1 172.18.0.2 172.18.0.2 TxYuHLF

This example output tells you:

  • shards: 0 The cluster currently has no shards. This means there is no data in the cluster
  • disk.indices: 0b The cluster currently uses 0 bytes of disk space for indexes.
  • disk.used: 2.4gb The disk ElasticSearch will store its data on has 2.4 Gigabytes used spaced. This does not mean that ElasticSearch uses 2.4 Gigabytes, any other application (including the operating system) might also use (part of) that space.
  • disk.avail: 200.9gb The disk ElasticSearch will store its data on has 200.9 Gigabytes of free space. Remember that this will not shrink only if ElasticSearch is using data on said disk, other applications might also consume some of the disk space depending on how you set up ElasticSearch.
  • disk.total: 203.3gb The disk ElasticSearch will store its data on has a total size of 203.3 gigabytes (total size as in available space on the filesystem without any files)
  • disk.percent: 1 Currently 1 % of the total disk space available (disk.total) is used. This value is always rounded to full percents.
  • host, ip, node: Which node this line is referring to.

Example with one node with some test data (see this TechOverflow post on how to generate test data):

shards disk.indices disk.used disk.avail disk.total disk.percent host       ip         node
     5        6.8kb     2.4gb     21.6gb       24gb           10 172.18.0.2 172.18.0.2 J3W5zqj
     5                                                                                 UNASSIGNED

As we can see, ElasticSearch now has 5 shards. Note that the second line tells us that 5 shards are UNASSIGNED. This is because ElasticSearch has been configured to make one replica for each shard and there is no second node where it can put the replica. For development configurations this is usually OK, but production configurations should usually have at least two nodes. See our docker-compose and systemd service generator for ElasticSearch for instructions on how to configure a local multi-node cluster using docker.

Posted by Uli Köhler in ElasticSearch

How to backup all indices from ElasticSearch

You can use elasticdump to backup all indices from your ElasticSearch cluster. Install using

sudo npm install elasticdump -g

If you don’t have npm, see How to install NodeJS 10.x on Ubuntu in 1 minute.

This package installs two binarys: elasticdump (used to dump a single index) and multielasticdump (used to dump multiple indices in parallel)

We can use multielasticdump to dump all indexes:

mkdir -p es_backup
multielasticdump --direction=dump --input=http://localhost:9200 --output=es_backup

Restore using:

multielasticdump --direction=load --input=es_backup --output=http://localhost:9200

 

Posted by Uli Köhler in ElasticSearch, Linux

How to configure Google Cloud Kubernetes Elasticsearch Cluster with internal load balancer

Google Cloud offers a convenient way of installing an ElasticSearch cluster on top of a Google Cloud Kubernetes cluster. However, the documentation tells you to expose the ElasticSearch instance using

kubectl patch service/"elasticsearch-elasticsearch-svc" \
  --namespace "default" \
  --patch '{"spec": {"type": "LoadBalancer"}}'

However this command will expost ElasticSearch to an external IP which will make it publically accessible in the default configuration.

Here’s the equivalent command that will expose ElasticSearch to an internal load balancer with an internal IP address that will only be available from Google Cloud.

kubectl patch service/"elasticsearch-elasticsearch-svc" \
  --namespace "default" \
  --patch '{"spec": {"type": "LoadBalancer"}, "metadata": {"annotations": {"cloud.google.com/load-balancer-type": "Internal"}}}'

You might need to replace the name of your service (elasticsearch-elasticsearch-svc in this example) and possibly your namespace.

 

Posted by Uli Köhler in Cloud, ElasticSearch, Kubernetes

ElasticSearch equivalent to MongoDB .distinct(…)

Let’s say we have an ElasticSearch index called strings with a field pattern of {"type": "keyword"}.

Now we want to do the equivalent of MongoDB db.getCollection('...').distinct('pattern'):

Solution:

In Python you can use the iterate_distinct_field() helper from this previous post on ElasticSearch distinct. Full example:

from elasticsearch import Elasticsearch

es = Elasticsearch()

def iterate_distinct_field(es, fieldname, pagesize=250, **kwargs):
    """
    Helper to get all distinct values from ElasticSearch
    (ordered by number of occurrences)
    """
    compositeQuery = {
        "size": pagesize,
        "sources": [{
                fieldname: {
                    "terms": {
                        "field": fieldname
                    }
                }
            }
        ]
    }
    # Iterate over pages
    while True:
        result = es.search(**kwargs, body={
            "aggs": {
                "values": {
                    "composite": compositeQuery
                }
            }
        })
        # Yield each bucket
        for aggregation in result["aggregations"]["values"]["buckets"]:
            yield aggregation
        # Set "after" field
        if "after_key" in result["aggregations"]["values"]:
            compositeQuery["after"] = \
                result["aggregations"]["values"]["after_key"]
        else: # Finished!
            break

# Usage example
for result in iterate_distinct_field(es, fieldname="pattern.keyword", index="strings"):
    print(result) # e.g. {'key': {'pattern': 'mypattern'}, 'doc_count': 315}
Posted by Uli Köhler in Databases, ElasticSearch, Python

How to query distinct field values in ElasticSearch

Let’s say we have an ElasticSearch index called strings with a field pattern of {"type": "keyword"}.

Get the top N values of the column

If we want to get the top N ( 12 in our example) entries, i.e. the patterns that are present in the most documents, we can use this query:

{
    "aggs" : {
        "patterns" : {
            "terms" : {
                "field" : "pattern.keyword",
                "size": 12
            }
        }
    }
}

Full example in Python:

from elasticsearch import Elasticsearch

es = Elasticsearch()

result = es.search(index="strings", body={
    "aggs" : {
        "patterns" : {
            "terms" : {
                "field" : "pattern.keyword",
                "size": 12
            }
        }
    }
})
for aggregation in result["aggregations"]["patterns"]["buckets"]:
    print(aggregation) # e.g. {'key': 'mypattern, 'doc_count': 2802}

See the terms aggregation documentation for more infos.

Get all the distinct values of the column

Getting all the values is slightly more complicated since we need to use a composite aggregation that returns an after_key to paginate the query.

This Python helper function will automatically paginate the query with configurable page size:

from elasticsearch import Elasticsearch

es = Elasticsearch()

def iterate_distinct_field(es, fieldname, pagesize=250, **kwargs):
    """
    Helper to get all distinct values from ElasticSearch
    (ordered by number of occurrences)
    """
    compositeQuery = {
        "size": pagesize,
        "sources": [{
                fieldname: {
                    "terms": {
                        "field": fieldname
                    }
                }
            }
        ]
    }
    # Iterate over pages
    while True:
        result = es.search(**kwargs, body={
            "aggs": {
                "values": {
                    "composite": compositeQuery
                }
            }
        })
        # Yield each bucket
        for aggregation in result["aggregations"]["values"]["buckets"]:
            yield aggregation
        # Set "after" field
        if "after_key" in result["aggregations"]["values"]:
            compositeQuery["after"] = \
                result["aggregations"]["values"]["after_key"]
        else: # Finished!
            break

# Usage example
for result in iterate_distinct_field(es, fieldname="pattern.keyword", index="strings"):
    print(result) # e.g. {'key': {'pattern': 'mypattern'}, 'doc_count': 315}
Posted by Uli Köhler in Databases, ElasticSearch, Python

How to fix ElasticSearch ‘Fielddata is disabled on text fields by default’ for keyword field

Problem:

You have a field in ElasticSearch named e.g. patterns of type keyword. However, when you query for an aggregation of this field e.g.

es.search(index="strings", body={
    "size": 0,
    "aggs" : {
        "patterns" : {
            "terms" : { "field" : "pattern" }
        }
    }
})

you see this error message:

elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'Fielddata is disabled on text fields by default. Set fielddata=true on [pattern] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.'

Solution:

This error message is confusing since you already have a keyword field. However, the ElasticSearch fielddata documentation tells us that you need to to use pattern.keyword in the query instead of just pattern.

Full example:

es.search(index="strings", body={
    "size": 0,
    "aggs" : {
        "patterns" : {
            "terms" : { "field" : "pattern.keyword" }
        }
    }
})
Posted by Uli Köhler in Databases, ElasticSearch