How to fix ElasticSearch ‘[match] query doesn’t support multiple fields, found […] and […]’

Problem:

You want to run an ElasticSearch query like

{
    "query": {
        "match" : {
            "one_field" : "one_value",
            "another_field": "another_value"
        }
    }
}

but you only see an error message like

elasticsearch.exceptions.RequestError: RequestError(400, 'parsing_exception', "[match] query doesn't support multiple fields, found [one_field] and [another_field]")

Solution:

Match queries only support one field. You should use a bool query with a must clause containing multiple match queries instead:

{
    "query": {
        "bool": {
            "must": [
                {"match": {"one_field" : "one_value"}},
                {"match": {"another_field" : "another_value"}},
            ]
        }
    }
}

Also see the official docs for the MultiMatch query.

How to fix ElasticSearch ‘no [query] registered for [missing]’

Problem:

You are trying to run an ElasticSearch query like

{
    "query": {
        "missing" : { "field" : "myfield" }
    }
}

to find documents that do not have myfield.

However you only see an error message like this:

elasticsearch.exceptions.RequestError: RequestError(400, 'parsing_exception', 'no [query] registered for [missing]')

Solution:

As the ElasticSearch documentation tells, us, there is no missing query! You instead need to use an exists query inside a must_not clause:

{
    "query": {
        "bool": {
            "must_not": {
                "exists": {
                    "field": "myfield"
                }
            }
        }
    }
}

 

How to fix ElasticSearch Root mapping definition has unsupported parameters: [mappings : {properties=…

Problem:

You want to create an ElasticSearch index with a custom mapping or update the mapping of an existing ElasticSearch index but you see an error message like

elasticsearch.exceptions.RequestError: RequestError(400, 'mapper_parsing_exception', 'Root mapping definition has unsupported parameters:  [mappings : {properties={num_total={type=integer}, approved={type=integer}, num_translated={type=integer}, pattern_length={type=integer}, num_unapproved={type=integer}, pattern={type=keyword}, num_approved={type=integer}, translated={type=integer}, untranslated={type=integer}, num_untranslated={type=integer}, group={type=keyword}}}]')

Solution:

This can point to multiple issues. Essentially, ElasticSearch is trying to tell you that the structure of your JSON is not correct.

Often this error is misinterpreted as individual field definitions being wrong, but this is rarely the issue (and only if an individual field definition is completely malformed).

If your message is structured like

... unsupported parameters:  [mappings : ...

then the most likely root cause is that you have mappings nested inside mappings in your JSON. This also applies if you update a mapping (put_mapping) – in this case the outer mapping is implicit!

Example: Your code looks like this:

es.indices.put_mapping(index='my_index, doc_type='_doc', body={
    "mappings": {
        "properties": {
            "pattern": {
                "type":  "keyword"
            }
        }
    }
})

ElasticSearch will internally create a JSON like this internally:

{
    "mappings": {
        "mappings": {
            "properties": {
                "pattern": {
                    "type":  "keyword"
                }
            }
        }
    }
}

See that there are two mappings inside each other? ElasticSearch does not view this as a correctly structured JSON, therefore you need to remove the "mapping": {...} from your code, resulting in

es.indices.put_mapping(index='my_index, doc_type='_doc', body={
    "properties": {
        "pattern": {
            "type":  "keyword"
        }
    }
})

Fixing ElasticSearch ‘No handler for type [int] declared on field …’

Problem:

You want to create an index with a custom mapping in ElasticSearch but you see an error message like this:

elasticsearch.exceptions.RequestError: RequestError(400, 'mapper_parsing_exception', 'No handler for type [int] declared on field [id]')

Solution:

You likely have a mapping like

"id": {
    "type":  "int"
}

in your mapping properties.

The issue here is int: ElasticSearch uses integer as type of integers, not int!

In order to fix the issue, change the property to

"id": {
    "type":  "integer"
}

and retry creating the index.

 

How to fix ElasticSearch [FORBIDDEN/12/index read-only / allow delete (api)]

If you try to index a document in ElasticSearch and you see an error message like this:

elasticsearch.exceptions.AuthorizationException: AuthorizationException(403, 'cluster_block_exception', 'blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];')

you can unlock writes to your cluster (all indexes) using

curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

(thanks to Imran273 on StackOverflow for the original solution)

Note however that often there’s an underlying reason that caused ElasticSearch to lock writes to the index. Most often it is caused by exceeding the disk watermark / quota. See How to disable ElasticSearch disk quota / watermark for details on how to work around that.

How to disable ElasticSearch disk quota / watermark

In its default configuration, ElasticSearch will not allocate any more disk space when more than 90% of the disk are used overall (i.e. by ElasticSearch or other applications).

You can set the watermark extremely low using

curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "30mb",
    "cluster.routing.allocation.disk.watermark.high": "20mb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "10mb",
    "cluster.info.update.interval": "1m"
  }
}
'

After doing that, you might need to unlock your cluster for write accesses if you had already exceeded your watermark before:

curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

See How to fix ElasticSearch [FORBIDDEN/12/index read-only / allow delete (api)] for more details on that.

I do not recommend to set the values to zero (i.e. below 10 Megabytes) because using every byte of available disk space might cause issues on your system since more important applications will not be able to properly allocate disk space any more.

In order to view the current disk usage use

curl -XGET "http://localhost:9200/_cat/allocation?v&pretty"

See How to view & interpret disk space usage of your ElasticSearch cluster for more details.

ElasticSearch equivalent to MongoDB .distinct(…)

Let’s say we have an ElasticSearch index called strings with a field pattern of {"type": "keyword"}.

Now we want to do the equivalent of MongoDB db.getCollection('...').distinct('pattern'):

Solution:

In Python you can use the iterate_distinct_field() helper from this previous post on ElasticSearch distinct. Full example:

from elasticsearch import Elasticsearch

es = Elasticsearch()

def iterate_distinct_field(es, fieldname, pagesize=250, **kwargs):
    """
    Helper to get all distinct values from ElasticSearch
    (ordered by number of occurrences)
    """
    compositeQuery = {
        "size": pagesize,
        "sources": [{
                fieldname: {
                    "terms": {
                        "field": fieldname
                    }
                }
            }
        ]
    }
    # Iterate over pages
    while True:
        result = es.search(**kwargs, body={
            "aggs": {
                "values": {
                    "composite": compositeQuery
                }
            }
        })
        # Yield each bucket
        for aggregation in result["aggregations"]["values"]["buckets"]:
            yield aggregation
        # Set "after" field
        if "after_key" in result["aggregations"]["values"]:
            compositeQuery["after"] = \
                result["aggregations"]["values"]["after_key"]
        else: # Finished!
            break

# Usage example
for result in iterate_distinct_field(es, fieldname="pattern.keyword", index="strings"):
    print(result) # e.g. {'key': {'pattern': 'mypattern'}, 'doc_count': 315}

How to query distinct field values in ElasticSearch

Let’s say we have an ElasticSearch index called strings with a field pattern of {"type": "keyword"}.

Get the top N values of the column

If we want to get the top N ( 12 in our example) entries, i.e. the patterns that are present in the most documents, we can use this query:

{
    "aggs" : {
        "patterns" : {
            "terms" : {
                "field" : "pattern.keyword",
                "size": 12
            }
        }
    }
}

Full example in Python:

from elasticsearch import Elasticsearch

es = Elasticsearch()

result = es.search(index="strings", body={
    "aggs" : {
        "patterns" : {
            "terms" : {
                "field" : "pattern.keyword",
                "size": 12
            }
        }
    }
})
for aggregation in result["aggregations"]["patterns"]["buckets"]:
    print(aggregation) # e.g. {'key': 'mypattern, 'doc_count': 2802}

See the terms aggregation documentation for more infos.

Get all the distinct values of the column

Getting all the values is slightly more complicated since we need to use a composite aggregation that returns an after_key to paginate the query.

This Python helper function will automatically paginate the query with configurable page size:

from elasticsearch import Elasticsearch

es = Elasticsearch()

def iterate_distinct_field(es, fieldname, pagesize=250, **kwargs):
    """
    Helper to get all distinct values from ElasticSearch
    (ordered by number of occurrences)
    """
    compositeQuery = {
        "size": pagesize,
        "sources": [{
                fieldname: {
                    "terms": {
                        "field": fieldname
                    }
                }
            }
        ]
    }
    # Iterate over pages
    while True:
        result = es.search(**kwargs, body={
            "aggs": {
                "values": {
                    "composite": compositeQuery
                }
            }
        })
        # Yield each bucket
        for aggregation in result["aggregations"]["values"]["buckets"]:
            yield aggregation
        # Set "after" field
        if "after_key" in result["aggregations"]["values"]:
            compositeQuery["after"] = \
                result["aggregations"]["values"]["after_key"]
        else: # Finished!
            break

# Usage example
for result in iterate_distinct_field(es, fieldname="pattern.keyword", index="strings"):
    print(result) # e.g. {'key': {'pattern': 'mypattern'}, 'doc_count': 315}

How to fix ElasticSearch ‘Fielddata is disabled on text fields by default’ for keyword field

Problem:

You have a field in ElasticSearch named e.g. patterns of type keyword. However, when you query for an aggregation of this field e.g.

es.search(index="strings", body={
    "size": 0,
    "aggs" : {
        "patterns" : {
            "terms" : { "field" : "pattern" }
        }
    }
})

you see this error message:

elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'Fielddata is disabled on text fields by default. Set fielddata=true on [pattern] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.'

Solution:

This error message is confusing since you already have a keyword field. However, the ElasticSearch fielddata documentation tells us that you need to to use pattern.keyword in the query instead of just pattern.

Full example:

es.search(index="strings", body={
    "size": 0,
    "aggs" : {
        "patterns" : {
            "terms" : { "field" : "pattern.keyword" }
        }
    }
})

How to fix ‘elasticsearch exited with code 78’

Problem:

You want to run ElasticSearch using docker, but the container immediately stops again using this error message

elasticsearch exited with code 78

or

elasticsearch2 exited with code 78

Solution:

If you look through the entire log message, you’ll find lines like

elasticsearch     | [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

Therefore we need to increase the vm.max_map_count limit:

sudo sysctl -w vm.max_map_count=524288

Now we need to edit /etc/sysctl.conf so the setting will also be in effect after a reboot.

Look for any vm.max_map_count line in /etc/sysctl.conf. If you find one, set its value to 524288. If there is no such line present, add the line

vm.max_map_count=524288

to the end of /etc/sysctl.conf

Original source: GitHub

 

MongoDB: How to run db.adminCommand() in NodeJS

Problem:

You want to run a db.adminCommand() in NodeJS using the node-mongodb-native client, e.g. you want to run the NodeJS equivalent of

db.adminCommand({setParameter: 1, internalQueryExecMaxBlockingSortBytes: 100151432});

Solution:

Use conn.executeDbAdminCommand() where db is a MongoDB database object.

db.executeDbAdminCommand({setParameter: 1, internalQueryExecMaxBlockingSortBytes: 100151432});

Full example:

// To install, use npm i --save mongodb
const MongoClient = require('mongodb').MongoClient;

async function configureMongoDB() {
    // Connect to MongoDB
    const conn = await MongoClient.connect('mongodb://localhost:27017/', { useNewUrlParser: true });
    const db = await conn.db('mydb');
    // Configure MongoDB settings
    await db.executeDbAdminCommand({
        setParameter: 1,
        internalQueryExecMaxBlockingSortBytes: 100151432
    });
    // Cleanup
    return conn.close();
}

// Run configureMongoDB()
configureMongoDB().then(() => {}).catch(console.error)

 

How to fix NodeJS MongoDB ‘Cannot read property ‘high_’ of null’

When encountering an error message like

TypeError: Cannot read property 'high_' of null
    at Long.equals (/home/uli/dev/NMUN/node_modules/bson/lib/bson/long.js:236:31)
    at nextFunction (/home/uli/dev/NMUN/node_modules/mongodb-core/lib/cursor.js:473:16)
    at Cursor.next (/home/uli/dev/NMUN/node_modules/mongodb-core/lib/cursor.js:763:3)
    at Cursor._next (/home/uli/dev/NMUN/node_modules/mongodb/lib/cursor.js:211:36)
    at nextObject (/home/uli/dev/NMUN/node_modules/mongodb/lib/operations/cursor_ops.js:192:10)
    at hasNext (/home/uli/dev/NMUN/node_modules/mongodb/lib/operations/cursor_ops.js:135:3)
    (...)

you likely have code like this:

const cursor = db.getCollection('mycollection').find({})
while (cursor.hasNext()) {
    const doc = cursor.next();
    // ... handle doc ...
}

The solution is quite simple: Since find(), cursor.hasNext() and cursor.next() all return Promises, you can’t use their results directly.

This example shows you how to do it properly using async/await:

const cursor = await db.getCollection('mycollection').find({})
while (await cursor.hasNext()) {
    const doc = await cursor.next();
    // ... handle doc ...
}

In order to do this remember that the function containing this code will need to be an async function. See the Mozilla documentation or google for Javascript async tutorial in order to learn about the details!

How to install MongoDB CE on Ubuntu in 1 minute

Run these shell commands on your Ubuntu computer to install the current MongoDB community edition and automatically start it (both instantly and on bootup)

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 9DA31620334BD75D9DCB49F368818C72E52529D4
echo "deb [ arch=amd64 ] https://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.0.list
sudo apt-get update
sudo apt-get install -y mongodb-org
sudo systemctl enable mongod
sudo systemctl start mongod

Source: Official MongoDB documentation

How to make mongodump export JSON instead of BSON

mongodump exports your MongoDB database to a BSON format, but if you want to use JSON instead, you need to use the mongoexport tool.

Examples:

mongoexport -d mydatabase -c mycollection # Prints to stdout
mongoexport -d mydatabase -c mycollection -o mycollections.json # Write to mycollections.json

Note that some datatypes that can be stored in MongoDB can only be represented in BSON – so in some cases, information may be lost by exporting to JSON.

Exporting campaign contact CSVs from VTiger using SQL

In Exporting campaign account CSVs from VTiger using SQL we investigated how to use direct SQL database access to export names and emails for organizations for a given campaign.

This post shows how to do the same thing. Please refer to the previous post for a detailed explanation.

SELECT CONCAT(cont.firstname, " ", cont.lastname) AS "Name", cont.email AS "EMail" FROM vtiger_campaigncontrel AS ccr JOIN vtiger_contactdetails cont ON (cont.contactid = ccr.contactid) WHERE `campaignid` =(SELECT postvalue FROM `vtiger_modtracker_detail` WHERE fieldname = "record_id" AND `id` = (SELECT `id` FROM `vtiger_modtracker_detail` WHERE postvalue = (SELECT campaign_no FROM vtiger_campaign WHERE `campaignname` = "MyCampaign") and fieldname = "campaign_no")) AND cont.email <> ''

Besides joining on vtiger_contactdetails and vtiger_campaigncontrel instead of the account-related equivalent, the only major difference is in assembling the name. For contacts, first name and last name are stored separatedly. Therefore we use CONCAT(cont.firstname, " ", cont.lastname) to get a single name string out of that information.

As shown in the previous post, you can use phpmyadmin to export the result of the query to CSV or to Excel directly. While this might seem like a tedious process, it is much less tedious than copy-and-pasting hundreds of contacts manually.

Exporting campaign account CSVs from VTiger using SQL

Update: Related post for exporting contacts instead of accounts

Although VTiger works great as CRM, its campaign management feature is missing one of most important features: It doesn’t support exporting

This post aims to provide technically oriented users a guide on how to use SQL queries to perform that task. We assume basic knowledge of SQL syntax, MySQL and PHPMyAdmin.

The goal for our example is to export a table of Account name and E-Mail. For the email, the primary email will be used, unless it’s empty in which case the secondary email is used.

The input for the query is the campaign name (in our example, this will be MyCampaign).

Querying the campaign number

The campaign number is a string, e.g. CAM4 or CAM9 which is stored in vtiger_campaign.

SELECT campaign_no FROM vtiger_campaign WHERE `campaignname` = "MyCampaign"

 Finding the campaign record ID

In the campaign account relations n-to-n relations table, the campaign number is not used directly – instead, the campaign record ID is used. We need to query that from vtiger_modtracker_detail, which contain metadata key-value-data about the campaigns (among other things). One of the K-V pairs contain the campaign identifier (key campaign_no) whereas another one contains the record ID (key record_id).

SELECT postvalue FROM `vtiger_modtracker_detail` WHERE `id` = (SELECT `id` FROM `vtiger_modtracker_detail` WHERE postvalue = <<campaign number>> and fieldname = "campaign_no") AND fieldname = "record_id"

Full query:

SELECT postvalue FROM `vtiger_modtracker_detail` WHERE `id` = (SELECT `id` FROM `vtiger_modtracker_detail` WHERE postvalue = (SELECT campaign_no FROM vtiger_campaign WHERE `campaignname` = "MyCampaign") and fieldname = "campaign_no") AND fieldname = "record_id"

The result of this query is the numeric campaign record ID, e.g. 315.

Querying campaign-to-account relations

We can now use the campaign record ID to query the n-to-n table vtiger_campaignaccountrel. There are other, similar tables for leads (vtiger_campaignleadrel) and contacts (vtiger_campaigncontrel).

SELECT accountid FROM vtiger_campaignaccountrel WHERE `campaignid` = <<campaign record ID>>

Full query:

SELECT accountid FROM vtiger_campaignaccountrel WHERE `campaignid` = (SELECT postvalue FROM `vtiger_modtracker_detail` WHERE fieldname = "record_id" AND `id` = (SELECT `id` FROM `vtiger_modtracker_detail` WHERE postvalue = (SELECT campaign_no FROM vtiger_campaign WHERE `campaignname` = "MyCampaign") and fieldname = "campaign_no"))

The result of this query is a list of account IDs related to the campaign. The order is usually the same as displayed in VTiger.

Querying account information

Using the list of account IDs, we can JOIN on vtiger_account to query the account details. Note that you need to use the appropriate table for leads or contacts if you used an alternative relations table in the previous step.

SELECT acc.accountname AS "Name", COALESCE(acc.email1, acc.email2) AS "EMail" FROM vtiger_campaignaccountrel AS car JOIN vtiger_account acc ON (acc.accountid = car.accountid) WHERE `campaignid` =(SELECT postvalue FROM `vtiger_modtracker_detail` WHERE fieldname = "record_id" AND `id` = (SELECT `id` FROM `vtiger_modtracker_detail` WHERE postvalue = (SELECT campaign_no FROM vtiger_campaign WHERE `campaignname` = "MyCampaign") and fieldname = "campaign_no")) AND (acc.email1 <> '' OR acc.email2 <> '')

Note that besides the SELECT column list there are some aspects which are specific to the Name and Email query:

  1. COALESCE(acc.email1, acc.email2) uses email2 if email1 is empty
  2. AND (email1 <> '' OR email2 <> '') ignores accounts that have neither email1 nor email2 set

If you want a custom list of columns, you need to modify that appropriately.

Exporting CSV

When entering the query into a tool like PHPMyAdmin, you can click the Export button in order to export the result to different formats, including CSV and Excel.

While this might not be the most convenient option, it’s a very flexible way of generating and exporting the data. Any more convenient ways are likely to be less flexible.

Compiling LevelDB as LLVM binary on Linux

Some time ago I wrote a guide on how to compile and install LevelDB on Linux.

Recently I’m desperately trying to get into LLVM and a tutorial series on how to use LLVM with C/C++ is coming shortly.

As I’m using LevelDB in many of my projects I’d like a way of generating a LLVM IR (intermediate representation) of the LevelDB C++ source – I could link a LLVM program to the native binary, but in order to profit from LLVMs features I suppose using IRs for as many dependencies as possible is the way to go.

Generally there are two ways to go:

  1. Use the g++ LLVM backend
  2. Use clang++

I usually tend to use clang++ for LLVM tasks because even with colorgcc and some recent improvements in gcc error message generation I prefer the clang++ error messages, even if I have way more experience with gcc error messages. Additionally the g++ with LLVM backend does seem to have some bugs, including interpreting -emit-llvm as -e -m -i …, plus recent distribution versions don’t work too well with the LLVM gold plugin and it has proved difficult to tell GCC reliably that it shall use llvm-ld as linker.

Read more

Compiling & Installing LevelDB on Linux

Update: Please also take a look at this followup article for an automatic compilation script that builds Ubuntu DEB packages!

Problem:

You want to compile and install LevelDB (including development headers) on your Linux computer. ./configure && make && make install does not work so you don’t know how to do this.

or:

You have successfully compiled LevelDB, but make install doesn’t work (there is no official installation procedure yet) and you don’t know how to install it to your system

Read more