Databases

How to list MongoDB databases on command line

Use this command to list the MongoDB databases on the command line:

echo 'show dbs' | mongo

Example output:

MongoDB shell version v4.0.13
connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("5c1e505e-9b05-4270-ab20-c537c0760481") }
MongoDB server version: 4.0.13
admin       0.000GB
config      0.000GB
drawing     0.001GB
order       0.000GB
production  0.001GB
standards   0.001GB
user        0.000GB
bye
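
Note: on newer MongoDB installations (6.0 and later), the legacy mongo shell has been replaced by mongosh; the same approach should work there as well:

echo 'show dbs' | mongosh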


Posted by Uli Köhler in MongoDB

How to get schema of SQLite3 table in Python

Also see How to show table schema for SQLite3 table on the command line

Use this function to find the table schema of a SQLite3 table in Python:

def sqlite_table_schema(conn, name):
    """Return a string representing the table's CREATE"""
    cursor = conn.execute("SELECT sql FROM sqlite_master WHERE name=?;", [name])
    sql = cursor.fetchone()[0]
    cursor.close()
    return sql

Usage example:

print(sqlite_table_schema(conn, 'commands'))

Full example:

#!/usr/bin/env python3
import sqlite3
conn = sqlite3.connect('/usr/share/command-not-found/commands.db')

def sqlite_table_schema(conn, name):
    cursor = conn.execute("SELECT sql FROM sqlite_master WHERE name=?;", [name])
    sql = cursor.fetchone()[0]
    cursor.close()
    return sql

print(sqlite_table_schema(conn, 'commands'))

which prints

CREATE TABLE "commands" 
           (
            [cmdID] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
            [pkgID] INTEGER NOT NULL,
            [command] TEXT,
            FOREIGN KEY ([pkgID]) REFERENCES "pkgs" ([pkgID])
           )
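
Note that cursor.fetchone() returns None if no entry with the given name exists in sqlite_master, so sqlite_table_schema() raises a TypeError for nonexistent tables. In case you need to handle that gracefully, here is a minimal sketch (the name sqlite_table_schema_safe is only for illustration):

def sqlite_table_schema_safe(conn, name):
    """Like sqlite_table_schema(), but return None for nonexistent tables"""
    cursor = conn.execute("SELECT sql FROM sqlite_master WHERE name=?;", [name])
    row = cursor.fetchone() # None if there is no matching row
    cursor.close()
    return row[0] if row is not None else None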


Posted by Uli Köhler in Python, SQLite

How to fix SQLite3 Python ‘Incorrect number of bindings supplied. The current statement uses 1, … supplied’

Problem:

You are trying to run a simple SQL query with placeholders on a SQLite3 database, e.g.:

name = "mytable"
conn.execute("SELECT sql FROM sqlite_master WHERE name=?;", name)

But you get an exception like this:

---------------------------------------------------------------------------
ProgrammingError                          Traceback (most recent call last)
<ipython-input-55-e385cf40fd72> in <module>
      1 name = "mytable"
----> 2 conn.execute("SELECT sql FROM sqlite_master WHERE name=?;", name)

ProgrammingError: Incorrect number of bindings supplied. The current statement uses 1, and there are 7 supplied.

Solution:

You need to use a list as the second argument to conn.execute(...)!

Since you only passed the function a string, the string is being interpreted as a sequence of characters ("mytable" consists of 7 characters, which is why the error message says there are 7 supplied).

In our example from above, you just need to wrap name in square brackets to read [name]:

name = "mytable"
conn.execute("SELECT sql FROM sqlite_master WHERE name=?;", [name])
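
Note that a one-element tuple works just as well, since sqlite3 accepts any sequence of parameters:

conn.execute("SELECT sql FROM sqlite_master WHERE name=?;", (name,))
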
Posted by Uli Köhler in Python, SQLite

How to list SQLite3 database tables on command line

Also see How to list tables in SQLite3 database in Python

Use this command to list all the tables in a SQLite3 database using the sqlite3 command line tool:

sqlite3 [database file] "SELECT name FROM sqlite_master WHERE type='table' AND name != 'sqlite_sequence';"

Example:

$ sqlite3 /usr/share/command-not-found/commands.db "SELECT name FROM sqlite_master WHERE type='table' AND name != 'sqlite_sequence';"
commands
packages
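
Alternatively, the sqlite3 shell provides the .tables dot-command, which lists the tables in the database:

sqlite3 /usr/share/command-not-found/commands.db .tables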


Posted by Uli Köhler in Databases, SQLite

How to list tables in SQLite3 database in Python

Also see How to list SQLite3 database tables on command line

You can use this snippet to list all the SQL tables in your SQLite 3.x database in Python:

def tables_in_sqlite_db(conn):
    cursor = conn.execute("SELECT name FROM sqlite_master WHERE type='table';")
    tables = [
        v[0] for v in cursor.fetchall()
        if v[0] != "sqlite_sequence"
    ]
    cursor.close()
    return tables

Usage example (using the tables_in_sqlite_db() function defined above):

#!/usr/bin/env python3
import sqlite3
# Open database
conn = sqlite3.connect('/usr/share/command-not-found/commands.db')
# List tables
tables = tables_in_sqlite_db(conn)

# Your code goes here!
# Example:
print(tables) # prints ['commands', 'packages']


Posted by Uli Köhler in Databases, Python, SQLite

How to dump SQL from SQLite3 database file

If you have a SQLite3 database file (usually the filename extension is .db) and you want to dump it as SQL code, you can use

sqlite3 [database file] .dump

For example:

$ sqlite3 /usr/share/command-not-found/commands.db .dump
[...]
DELETE FROM sqlite_sequence;
INSERT INTO sqlite_sequence VALUES('packages',14669);
INSERT INTO sqlite_sequence VALUES('commands',47706);
CREATE INDEX idx_commands_command ON commands (command);
CREATE INDEX idx_packages_name ON packages (name);
COMMIT;
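
In case you only want to dump a single table, .dump also accepts table names as arguments:

sqlite3 /usr/share/command-not-found/commands.db ".dump commands"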

Missing the sqlite3 executable?

In case you don’t have the sqlite3 executable, you can install it using

sudo apt -y install sqlite3

on Ubuntu/Debian-based systems.

Posted by Uli Köhler in General, SQLite

How to fix MongoDB ‘exception in initAndListen: IllegalOperation: Attempted to create a lock file on a read-only directory: /var/lib/mongodb, terminating’

Problem:

When trying to start MongoDB e.g. using sudo systemctl start mongod, the process terminates with status code 100 and you see this error message in /var/log/mongodb/mongodb.log:

2019-07-22T17:11:07.858+0200 I STORAGE  [initandlisten] exception in initAndListen: IllegalOperation: Attempted to create a lock file on a read-only directory: /var/lib/mongodb, terminating

Solution:

Fix the permissions of /var/lib/mongodb:

sudo chown -R mongodb: /var/lib/mongodb

then restart MongoDB e.g. using

sudo systemctl restart mongod

In case that does not help, check your error message to see whether you are using a data directory different from /var/lib/mongodb. In that case, run

sudo chown -R mongodb: <insert your data directory here>

In case that does not apply or does not help either, check whether your filesystem is mounted in read-only mode.
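
One way of checking this (assuming findmnt is available, as on most modern Linux distributions) is to query the mount options of the filesystem containing the data directory:

findmnt -n -o OPTIONS --target /var/lib/mongodb

If the output starts with ro, the filesystem is mounted read-only.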

Posted by Uli Köhler in MongoDB

How to fix mongorestore ‘E11000 duplicate key error collection’

Problem:

You are trying to run mongorestore my-backup, but you see large numbers of warning messages like this one:

- E11000 duplicate key error collection: mydb.mycollection index: _id_ dup key: { : "MyKey" }

Solution:

By default, mongorestore does not overwrite or delete any existing documents. You need to tell it to drop each collection immediately before importing it from the backup:

mongorestore --drop my-backup

Any documents not in the backup will be permanently lost after running this!

Note that this will not drop collections that are not present in the backup.

Also see mongodump/mongorestore minimal examples

Posted by Uli Köhler in Databases

mongodump/mongorestore minimal examples

Create & restore a database

mongodump --db mydb --out mydb.mongobackup

This will back up the database mydb from the MongoDB instance running at localhost and store the backup as BSON in the newly created mydb.mongobackup directory.

mongorestore mydb.mongobackup

This will restore the backup to localhost (the database name, mydb, is stored in the backup directory).

It will not overwrite or update existing documents, nor delete documents that are currently present but not present in the backup.
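
In case you want to restore the backup under a different database name, recent mongorestore versions support namespace remapping using --nsFrom/--nsTo (a sketch; mydbcopy is just an example target name):

mongorestore --nsFrom 'mydb.*' --nsTo 'mydbcopy.*' mydb.mongobackup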

Restore backup with drop

mongorestore --drop mydb.mongobackup

This will drop (i.e. delete) each collection before importing from the backup. This means that

  • Existing documents will effectively be overwritten
  • Documents that are currently present but not present in the backup will be deleted

However, note that while the backup is being imported, some documents might be missing from the database until the restore has fully completed.

Note that this will not drop collections that are not present in the backup.

Posted by Uli Köhler in Databases

ElasticSearch: How to iterate / scroll through all documents in index

In ElasticSearch, you can use the Scroll API to scroll through all documents in an entire index.

In Python you can scroll like this:

def es_iterate_all_documents(es, index, pagesize=250, scroll_timeout="1m", **kwargs):
    """
    Helper to iterate ALL values from a single index
    Yields all the documents.
    """
    is_first = True
    while True:
        # Scroll next
        if is_first: # Initialize scroll
            result = es.search(index=index, scroll=scroll_timeout, **kwargs, body={
                "size": pagesize
            })
            is_first = False
        else:
            result = es.scroll(body={
                "scroll_id": scroll_id,
                "scroll": scroll_timeout
            })
        scroll_id = result["_scroll_id"]
        hits = result["hits"]["hits"]
        # Stop after no more docs
        if not hits:
            break
        # Yield each entry
        yield from (hit['_source'] for hit in hits)

This function will yield each document encountered in the index.

Example usage for index my_index:

from elasticsearch import Elasticsearch

es = Elasticsearch([{"host": "localhost"}])

for entry in es_iterate_all_documents(es, 'my_index'):
    print(entry) # Prints the document as stored in the DB
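
Note that scroll contexts consume resources on the server until they expire (after scroll_timeout). In case you abort the iteration early, you can free the context explicitly using es.clear_scroll(), provided you keep track of the scroll ID (which the helper above does not expose):

es.clear_scroll(scroll_id=scroll_id)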


Posted by Uli Köhler in Databases, ElasticSearch, Python

ElasticSearch: How to iterate all documents in index using Python (up to 10000 documents)

Important Note: This simple approach only works for up to ~10000 documents. Prefer using our scroll-based solution: See ElasticSearch: How to iterate / scroll through all documents in index

Use this helper function to iterate over all the documents in an index:

def es_iterate_all_documents(es, index, pagesize=250, **kwargs):
    """
    Helper to iterate ALL values from a single index
    Yields all the documents.
    """
    offset = 0
    while True:
        result = es.search(index=index, **kwargs, body={
            "size": pagesize,
            "from": offset
        })
        hits = result["hits"]["hits"]
        # Stop after no more docs
        if not hits:
            break
        # Yield each entry
        yield from (hit['_source'] for hit in hits)
        # Continue from there
        offset += pagesize

Usage example:

for entry in es_iterate_all_documents(es, 'my_index'):
    print(entry) # Prints the document as stored in the DB

How it works

You can iterate over all documents in an index in ElasticSearch by using queries like

{
    "size": 250,
    "from": 0
}

and increasing "from" by "size" after each iteration.
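
The reason this simple approach stops working at around 10000 documents is the index.max_result_window setting, which defaults to 10000 and limits from + size. If scrolling is not an option, you can raise that limit at the cost of increased memory usage, e.g. using curl (my_index being our example index):

curl -X PUT "http://localhost:9200/my_index/_settings" -H 'Content-Type: application/json' -d '{"index": {"max_result_window": 50000}}'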

Posted by Uli Köhler in Databases, ElasticSearch, Python

Fixing ElasticSearch ‘Unknown key for a VALUE_NUMBER in [offset].’

The error message

Unknown key for a VALUE_NUMBER in [offset].

in ElasticSearch tells you that in the query JSON you have specified an offset (numeric value) but ElasticSearch doesn’t know what to do with offset.

This is easy to fix: In order to specify an offset to start from, use from instead of offset.

Incorrect:

{
    "size": 250,
    "offset": 1000
}

Correct:

{
    "size": 250,
    "from": 1000
}


Posted by Uli Köhler in Databases, ElasticSearch

Fixing ElasticSearch ‘Unknown key for a START_ARRAY in […].’

When you get an error message like

elasticsearch.exceptions.RequestError: RequestError(400, 'parsing_exception', 'Unknown key for a START_ARRAY in [size].')

in ElasticSearch, look for the value in the [brackets]. In our example this is [size].

Your query contains an array for that key but an array is not allowed.

For example, this query is malformed:

{
    "size": [10, 11]
}

because size takes a number, not an array.

In Python, if you are programmatically building your query, you might have an extra comma at the end of your line.

For example, if you have a line like

query["size"] = 250,

this is equivalent to

query["size"] = (250,)

i.e. it will set size to a tuple with one element. In the JSON this will result in

{
    "size": [250]
}

In order to fix that issue, remove the comma from the end of the line

query["size"] = 250

which will result in the correct query JSON

{
    "size": 250
}
Posted by Uli Köhler in Databases, ElasticSearch

ElasticSearch equivalent to MongoDB .count()

The ElasticSearch equivalent to MongoDB’s count() is also called count, and it can be used in a similar way.

When you have an ElasticSearch query like (example in Python)

result = es.search(index="my_index", body={
    "query": {
        "match": {
            "my_field": "my_value"
        }
    }
})
result_docs = [hit["_source"] for hit in result["hits"]["hits"]]

you can easily change it to a count-only query by replacing search with count:

result = es.count(index="my_index", body={
    "query": {
        "match": {
            "my_field": "my_value"
        }
    }
})
result_count = result["count"] # e.g. 200
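
In case you prefer talking to ElasticSearch directly instead of using the Python client, the same count is available via the _count REST endpoint, e.g. using curl:

curl -X GET "http://localhost:9200/my_index/_count" -H 'Content-Type: application/json' -d '{"query": {"match": {"my_field": "my_value"}}}'
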
Posted by Uli Köhler in Databases, ElasticSearch

Fixing ElasticSearch ‘no [query] registered for [query]’

Problem:

You want to run a query in ElasticSearch, but you get an error message like

elasticsearch.exceptions.RequestError: RequestError(400, 'parsing_exception', 'no [query] registered for [query]')

Solution:

In your query body, you have two "query" objects nested in each other. Remove the outer "query", keeping only the inner one.

Example:

Incorrect:

{
    "query": {
        "query": {
            "match": {
                "my_field": "my_value"
            }
        }
    }
}

Correct:

{
    "query": {
        "match": {
            "my_field": "my_value"
        }
    }
}


Posted by Uli Köhler in Databases, ElasticSearch

How to fix ElasticSearch ‘Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true’

Problem:

You want to create a mapping in ElasticSearch but you see an error message like

elasticsearch.exceptions.RequestError: RequestError(400, 'illegal_argument_exception', 'Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true.')

Solution:

As already suggested in the error message, set the include_type_name parameter to True. (The underlying reason for this error is the removal of mapping types in ElasticSearch 7.x.)

With the Python API this is as simple as adding include_type_name=True to the put_mapping(...) call:

es.indices.put_mapping(index='my_index', body=my_mapping, doc_type='_doc', include_type_name=True)

In case you now see an error like

TypeError: put_mapping() got an unexpected keyword argument 'include_type_name'

you need to upgrade your elasticsearch python library, e.g. using

sudo pip3 install --upgrade elasticsearch


Posted by Uli Köhler in Databases, ElasticSearch, Python

How to view ElasticSearch cluster health using curl

To view the cluster health of your ElasticSearch cluster use

curl -X GET "http://localhost:9200/_cluster/health?pretty=true"

If your ElasticSearch instance is not running on localhost, replace localhost with the hostname or IP address that ElasticSearch is running on.

Example output:

{
  "cluster_name" : "docker-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

The most important information from this is:

  • "cluster_name" : "docker-cluster" The name you assigned to your cluster. You should verify that you are connecting to the correct cluster. All ElasticSearch nodes from that cluster must have the same cluster name, or they won’t connect!
  • "number_of_nodes" : 1 The number of nodes currently in the cluster. Sometimes some nodes take longer to start up, so if there are some nodes missing, wait a minute and retry
  • "status" : "green" The status or cluster health of your cluster.

The cluster health can take three values:

  • green: Everything is OK with your cluster (like in our example)
  • yellow: Your cluster is mostly OK, but some shards couldn’t be replicated. This is often the case with clusters consisting of only a single node (note that in this case, data loss on that node cannot be recovered)
  • red: Something is wrong with the cluster. Usually that’s some configuration issue, so be sure to check the logs.
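
In case you want to check the cluster health from Python instead of curl, the elasticsearch library exposes the same API. A minimal sketch, assuming ElasticSearch is running on localhost:

from elasticsearch import Elasticsearch

es = Elasticsearch([{"host": "localhost"}])
print(es.cluster.health()) # Prints a dict with the same fields as the JSON shown above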

Also see the official reference on cluster health

If you are looking for help on how to setup your ElasticSearch cluster using docker and docker-compose, you can generate your config file using our generator at ElasticSearch docker-compose.yml and systemd service generator.

Posted by Uli Köhler in Databases, ElasticSearch

How to fix ElasticSearch ‘[match] query doesn’t support multiple fields, found […] and […]’

Problem:

You want to run an ElasticSearch query like

{
    "query": {
        "match" : {
            "one_field" : "one_value",
            "another_field": "another_value"
        }
    }
}

but you only see an error message like

elasticsearch.exceptions.RequestError: RequestError(400, 'parsing_exception', "[match] query doesn't support multiple fields, found [one_field] and [another_field]")

Solution:

Match queries only support one field. You should use a bool query with a must clause containing multiple match queries instead:

{
    "query": {
        "bool": {
            "must": [
                {"match": {"one_field" : "one_value"}},
                {"match": {"another_field" : "another_value"}},
            ]
        }
    }
}

Also see the official docs for the multi_match query, an alternative for matching the same value against multiple fields.

Posted by Uli Köhler in Databases, ElasticSearch

How to fix ElasticSearch ‘no [query] registered for [missing]’

Problem:

You are trying to run an ElasticSearch query like

{
    "query": {
        "missing" : { "field" : "myfield" }
    }
}

to find documents that do not have myfield.

However you only see an error message like this:

elasticsearch.exceptions.RequestError: RequestError(400, 'parsing_exception', 'no [query] registered for [missing]')

Solution:

As the ElasticSearch documentation tells us, there is no missing query any more (it was removed in ElasticSearch 5.0)! You instead need to use an exists query inside a must_not clause:

{
    "query": {
        "bool": {
            "must_not": {
                "exists": {
                    "field": "myfield"
                }
            }
        }
    }
}


Posted by Uli Köhler in Databases, ElasticSearch