Databases

Best-practice configuration for MongoDB with docker-compose

Create /var/lib/mongodb/docker-compose.yml:

version: '3.1'
services:
  mongo:
    image: mongo
    volumes:
        - ./data:/data/db
    ports:
        - 27017:27017
    command: --serviceExecutor adaptive

This will store the MongoDB data in /var/lib/mongodb/data. I prefer this variant to using docker volumes since this method keeps all MongoDB-related data in the same directory.

Then create a systemd service using

curl -fsSL https://techoverflow.net/scripts/create-docker-compose-service.sh | sudo bash /dev/stdin

See our post on how to Create a systemd service for your docker-compose project in 10 seconds for more details on this method.

You can access MongoDB at localhost:27017! It will autostart after boot

Restart by

sudo systemctl restart mongodb

Stop by

sudo systemctl stop mongodb

View logs:

sudo journalctl -xfu mongodb

View logs in less:

sudo journalctl -xu mongodb

 

Posted by Uli Köhler in Docker, MongoDB

How to fix ElasticSearch [1]: initial heap size […] not equal to maximum heap size […];

Problem:

Your ElasticSearch server fails to start with an error message like

ERROR: [1] bootstrap checks failed
[1]: initial heap size [536870912] not equal to maximum heap size [2147483648]; this can cause resize pauses and prevents memory locking from locking the entire heap
ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/docker-cluster.log

Solution:

Set the initial heap size equal to the maximum heap size: The -Xms argument and the -Xmx argument must be equal, for example:

-Xms2048m -Xmx2048m

Typically (such as in a docker-based setup) you can set this in ES_JAVA_OPTS:

ES_JAVA_OPTS=-Xms2048m -Xmx2048m

For docker-compose based environments, this is an example configuration that works:

environment:
    - cluster.name=docker-cluster
    - node.name=elasticsearch1
    - cluster.initial_master_nodes=elasticsearch1
    - bootstrap.memory_lock=true
    - http.cors.allow-origin=http://localhost:1358,http://127.0.0.1:1358
    - http.cors.enabled=true
    - http.cors.allow-headers=X-Requested-With,X-Auth-Token,Content-Type,Content-Length,Authorization
    - http.cors.allow-credentials=true
    - "ES_JAVA_OPTS=-Xms2048m -Xmx2048m"

After that, restart your ElasticSearch instance.

Posted by Uli Köhler in ElasticSearch

How to run psql in Gitlab Docker image

When using the offical gitlab Docker container, you can use this command to run psql:

docker exec -t -u gitlab-psql [container name] psql -h /var/opt/gitlab/postgresql/ -d gitlabhq_production

In case you’re using a docker-compose based setup, use this command:

docker-compose exec -u gitlab-psql gitlab psql -h /var/opt/gitlab/postgresql/ -d gitlabhq_production

Note that gitlab in this command is the container name.

Posted by Uli Köhler in Databases, Docker, Linux

The security risk of running docker mariadb/mysql with MYSQL_ALLOW_EMPTY_PASSWORD=yes

This is part of a common docker-compose.yml which is frequently seen on the internet

version: '3'
services:
  mariadb:
    image: 'mariadb:latest'
    environment:
      - MYSQL_ALLOW_EMPTY_PASSWORD=yes
      - MYSQL_DATABASE=redmine
    volumes:
      - './mariadb_data:/var/lib/mysql'
 [...]

Simple and secure, right? A no-root-password MariaDB instance that’s running in a separate container and does not have its port 3306 exposed – so only services from the same docker-compose.yml can reach it since docker-compose puts all those services in a separate network.

Wrong.

While the MariaDB instance is not reachable from the internet since no, it can be reached by any process via its internal IP address.

In order to comprehend what’s happening, we shall take a look at docker’s networks. In this case, my docker-compose config is called redmine.

$ docker network ls | grep redmine
ea7ed38f469b        redmine_default           bridge              local

This is the network that docker-compose creates without any explicit network configuration. Let’s inspect the network to show the hosts:

[
    // [...]
        "Containers": {
            "2578fc65b4dab9f204d0a252e421dd4ddd9f41c35642d48350f4e59370581757": {
                "Name": "redmine_mariadb_1",
                "EndpointID": "1e6d81acc096a12fc740173f4e107090333c42e8a86680ac5c9886c148d578e7",
                "MacAddress": "02:42:ac:12:00:02",
                "IPv4Address": "172.18.0.2/16",
                "IPv6Address": ""
            },
            "7867f71d2a36265c34c133b70aea487b90ea68fcf30ecb42d6e7e9a376cf8e07": {
                "Name": "redmine_redmine_1",
                "EndpointID": "f5ac7b3325aa9bde12f0c625c4881f9a6fc9957da4965767563ec9a3b76c19c3",
                "MacAddress": "02:42:ac:12:00:03",
                "IPv4Address": "172.18.0.3/16",
                "IPv6Address": ""
            }
        },
    // [...]
]

We can see that the IP address of the redmine_mariadb_1 container is 172.18.0.2.

Using the internal IP 172.18.0.2, you can access the MySQL server.

Any process on the host (even from unprivileged users) can connect to the container without any password, e.g.

$ mysqldump -uroot -h172.18.0.2 --all-databases
// This will show the dump of the entire MariaDB database

How to mitigate this security risk?

Mitigation is quite easy since we only need to set a root password for the MariaDB instance.

My recommended best practice is to avoid duplicate passwords. In order to do this, create a .env file in the directory where docker-compose.yml is located.

MARIADB_ROOT_PASSWORD=aiPaipei6ookaemue4voo0NooC0AeH

Remember to replace the password by a random password or use this shell script to automatically create it:

echo MARIADB_ROOT_PASSWORD=$(pwgen 30) > .env

Now we can use ${MARIADB_ROOT_PASSWORD} in docker-compose.yml whereever the MariaDB root password is required, for example:

version: '3'
services:
  mariadb:
    image: 'mariadb:latest'
    environment:
      - MYSQL_ROOT_PASSWORD=${MARIADB_ROOT_PASSWORD}
      - MYSQL_DATABASE=redmine
    volumes:
      - './mariadb_data:/var/lib/mysql'
  redmine:
    image: 'redmine:latest'
    environment:
      - REDMINE_USERNAME=admin
      - REDMINE_PASSWORD=redmineadmin
      - [email protected]
      - REDMINE_DB_MYSQL=mariadb
      - REDMINE_DB_USERNAME=root
      - REDMINE_DB_PASSWORD=${MARIADB_ROOT_PASSWORD}
    ports:
      - '3718:3000'
    volumes:
      - './redmine_data/conf:/usr/src/redmine/conf'
      - './redmine_data/files:/usr/src/redmine/files'
      - './redmine_themes:/usr/src/redmine/public/themes'
    depends_on:
      - mariadb

Note that the mariadb docker image will not change the root password if the database directory already exists (mariadb_data in this example).

My recommended best practice for changing the root password is to use mysqldump --all-databases to export the entire database to a SQL file, then backup and delete the data directory, then re-start the container so the new root password will be set. After that, re-import the dump from the SQL file.

Posted by Uli Köhler in Databases, Docker, Linux

How to fix ElasticSearch docker AccessDeniedException[/usr/share/elasticsearch/data/nodes];”,

Problem:

You are trying to start a dockerized ElasticSearch instance but you see an error log like

lasticsearch1    | {"type": "server", "timestamp": "2020-04-18T01:17:27,564Z", "level": "ERROR", "component": "o.e.b.ElasticsearchUncaughtExceptionHandler", "cluster.name": "docker-cluster", "node.name": "elasticsearch1", "message": "uncaught exception in thread [main]", 
elasticsearch1    | "stacktrace": ["org.elasticsearch.bootstrap.StartupException: ElasticsearchException[failed to bind service]; nested: AccessDeniedException[/usr/share/elasticsearch/data/nodes];",
elasticsearch1    | "at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:174) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:161) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:125) ~[elasticsearch-cli-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:126) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "Caused by: org.elasticsearch.ElasticsearchException: failed to bind service",
elasticsearch1    | "at org.elasticsearch.node.Node.<init>(Node.java:615) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.node.Node.<init>(Node.java:257) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:221) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:221) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:349) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "... 6 more",
elasticsearch1    | "Caused by: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/data/nodes",
elasticsearch1    | "at sun.nio.fs.UnixException.translateToIOException(UnixException.java:90) ~[?:?]",
elasticsearch1    | "at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]",
elasticsearch1    | "at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) ~[?:?]",
elasticsearch1    | "at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:389) ~[?:?]",
elasticsearch1    | "at java.nio.file.Files.createDirectory(Files.java:693) ~[?:?]",
elasticsearch1    | "at java.nio.file.Files.createAndCheckIsDirectory(Files.java:800) ~[?:?]",
elasticsearch1    | "at java.nio.file.Files.createDirectories(Files.java:786) ~[?:?]",
elasticsearch1    | uncaught exception in thread [main]
elasticsearch1    | "at org.elasticsearch.env.NodeEnvironment.lambda$new$0(NodeEnvironment.java:274) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.env.NodeEnvironment$NodeLock.<init>(NodeEnvironment.java:211) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:271) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.node.Node.<init>(Node.java:277) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.node.Node.<init>(Node.java:257) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:221) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:221) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:349) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170) ~[elasticsearch-7.6.2.jar:7.6.2]",
elasticsearch1    | "... 6 more"] }
elasticsearch1    | ElasticsearchException[failed to bind service]; nested: AccessDeniedException[/usr/share/elasticsearch/data/nodes];
elasticsearch1    | Likely root cause: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/data/nodes
elasticsearch1    |     at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)
elasticsearch1    |     at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
elasticsearch1    |     at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
elasticsearch1    |     at java.base/sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:389)
elasticsearch1    |     at java.base/java.nio.file.Files.createDirectory(Files.java:693)
elasticsearch1    |     at java.base/java.nio.file.Files.createAndCheckIsDirectory(Files.java:800)
elasticsearch1    |     at java.base/java.nio.file.Files.createDirectories(Files.java:786)
elasticsearch1    |     at org.elasticsearch.env.NodeEnvironment.lambda$new$0(NodeEnvironment.java:274)
elasticsearch1    |     at org.elasticsearch.env.NodeEnvironment$NodeLock.<init>(NodeEnvironment.java:211)
elasticsearch1    |     at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:271)
elasticsearch1    |     at org.elasticsearch.node.Node.<init>(Node.java:277)
elasticsearch1    |     at org.elasticsearch.node.Node.<init>(Node.java:257)
elasticsearch1    |     at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:221)
elasticsearch1    |     at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:221)
elasticsearch1    |     at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:349)
elasticsearch1    |     at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170)
elasticsearch1    |     at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:161)
elasticsearch1    |     at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86)
elasticsearch1    |     at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:125)
elasticsearch1    |     at org.elasticsearch.cli.Command.main(Command.java:90)
elasticsearch1    |     at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:126)
elasticsearch1    |     at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92)
elasticsearch1    | For complete error details, refer to the log at /usr/share/elasticsearch/logs/docker-cluster.log

Solution:

Fix the permissions of the host directory mapped to /usr/share/elasticsearch/data. On my instance that directory is /var/lib/elasticsearch/esdata1.

Run

sudo chown -R 1000:1000 [directory]

e.g.

sudo chown -R 1000:1000 /var/lib/elasticsearch/esdata1

 

Posted by Uli Köhler in ElasticSearch

How to repair docker-compose MariaDB instances (aria_chk -r)

Problem:

You are trying to run a MariaDB container using docker-compose. However, the database container doesn’t start up and you see error messages like these in the logs:

[ERROR] mysqld: Aria recovery failed. Please run aria_chk -r on all Aria tables and delete all aria_log.######## files
[ERROR] Plugin 'Aria' init function returned error.
[ERROR] Plugin 'Aria' registration as a STORAGE ENGINE failed.
....
[ERROR] Could not open mysql.plugin table. Some plugins may be not loaded
[ERROR] Failed to initialize plugins.
[ERROR] Aborting

Solution:

The log messages already tell you what to do – but they don’t tell you how to do it:

Aria recovery failed. Please run aria_chk -r on all Aria tables and delete all aria_log.######## files

First, backup the entire MariaDB data directory: Check onto which host directory the data directory (/var/lib/mysql) of the container is mapped and copy the entire directory to a backup space. This is important in case the repair process fails.

Now let’s run aria_chk -r to check and repair MySQL table files.

docker-compose run my-db bash -c 'aria_chk -r /var/lib/mysql/**/*'

Replace my-db by the name of your database container. This will attempt to repair a lot of non-table-files as well but aria_chk will happily ignore those.

Now we can delete the log files:

docker-compose run my-db bash -c 'rm /var/lib/mysql/aria_log.*'

Again, replace my-db by the name of your database container.

Posted by Uli Köhler in Databases, Docker

RocksDB minimal example in C++

This minimal example shows how to open a RocksDB database, write a key and how to read it.

#include <cassert>
#include <string>
#include <rocksdb/db.h>

using namespace std;

int main(int argc, char** argv) {
    rocksdb::DB* db;
    rocksdb::Options options;
    options.create_if_missing = true;
    rocksdb::Status status =
    rocksdb::DB::Open(options, "/tmp/testdb", &db);
    assert(status.ok());

    // Insert value
    status = db->Put(rocksdb::WriteOptions(), "Test key", "Test value");
    assert(status.ok());

    // Read back value
    std::string value;
    status = db->Get(rocksdb::ReadOptions(), "Test key", &value);
    assert(status.ok());
    assert(!status.IsNotFound());

    // Read key which does not exist
    status = db->Get(rocksdb::ReadOptions(), "This key does not exist", &value);
    assert(status.IsNotFound());
}

Build using this CMakeLists.txt

add_executable(rocksdb-example rocksdb-example.cpp)
target_link_libraries(rocksdb-example rocksdb dl)

Compile using

cmake .
make
./rocksdb-example

 

Posted by Uli Köhler in C/C++, Databases

How to list MongoDB databases on command line

Use this command to list the MongoDB databases on the command line:

echo 'show dbs' | mongo

Example output:

MongoDB shell version v4.0.13
connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("5c1e505e-9b05-4270-ab20-c537c0760481") }
MongoDB server version: 4.0.13
admin       0.000GB
config      0.000GB
drawing     0.001GB
order       0.000GB
production  0.001GB
standards   0.001GB
user        0.000GB
bye

 

Posted by Uli Köhler in MongoDB

How to get schema of SQLite3 table in Python

Also see How to show table schema for SQLite3 table on the command line

Use this function to find the table schema of a SQLite3 table in Python:

def sqlite_table_schema(conn, name):
    """Return a string representing the table's CREATE"""
    cursor = conn.execute("SELECT sql FROM sqlite_master WHERE name=?;", [name])
    sql = cursor.fetchone()[0]
    cursor.close()
    return sql

Usage example:

print(sqlite_table_schema(conn, 'commands'))

Full example:

#!/usr/bin/env python3
import sqlite3
conn = sqlite3.connect('/usr/share/command-not-found/commands.db')

def sqlite_table_schema(conn, name):
    cursor = conn.execute("SELECT sql FROM sqlite_master WHERE name=?;", [name])
    sql = cursor.fetchone()[0]
    cursor.close()
    return sql

print(sqlite_table_schema(conn, 'commands'))

which prints

CREATE TABLE "commands" 
           (
            [cmdID] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
            [pkgID] INTEGER NOT NULL,
            [command] TEXT,
            FOREIGN KEY ([pkgID]) REFERENCES "pkgs" ([pkgID])
           )

 

 

Posted by Uli Köhler in Python, SQLite

How to fix SQLite3 Python ‘Incorrect number of bindings supplied. The current statement uses 1, … supplied’

Problem:

You are trying to run a simple SQL query with placeholders on a SQLite3 database, e.g.:

name = "mytable"
conn.execute("SELECT sql FROM sqlite_master WHERE name=?;", name)

But you get an exception like this:

---------------------------------------------------------------------------
ProgrammingError                          Traceback (most recent call last)
<ipython-input-55-e385cf40fd72> in <module>
      1 name = "mytable"
----> 2 conn.execute("SELECT sql FROM sqlite_master WHERE type='table' AND name=?;", name)

ProgrammingError: Incorrect number of bindings supplied. The current statement uses 1, and there are 7 supplied.

Solution:

You need to use a list as the second argument to conn.execute(...)!

Since you only gave the function a string, the string is being interpreted as list of characters.

In our example from above, you just need to wrap name in square brackets to read [name]:

name = "mytable"
conn.execute("SELECT sql FROM sqlite_master WHERE name=?;", [name])
Posted by Uli Köhler in Python, SQLite

How to list SQLite3 database tables on command line

Also see How to list tables in SQLite3 database in Python

Use this command to list all the tables in a SQLite3 database using the sqlite3 command line tool:

sqlite3 [database file] "SELECT name FROM sqlite_master WHERE type='table' AND name != 'sqlite_sequence';"

Example:

$ sqlite3 /usr/share/command-not-found/commands.db "SELECT name FROM sqlite_master WHERE type='table' AND name != 'sqlite_sequence';"
commands
packages

 

Posted by Uli Köhler in Databases, SQLite

How to list tables in SQLite3 database in Python

Also see How to list SQLite3 database tables on command line

You can use this snippet to list all the SQL tables in your SQLite 3.x database in Python:

def tables_in_sqlite_db(conn):
    cursor = conn.execute("SELECT name FROM sqlite_master WHERE type='table';")
    tables = [
        v[0] for v in cursor.fetchall()
        if v[0] != "sqlite_sequence"
    ]
    cursor.close()
    return tables

Usage example:

#!/usr/bin/env python3
import sqlite3
# Open database
conn = sqlite3.connect('/usr/share/command-not-found/commands.db')
# List tables
tables = tables_in_sqlite_db(conn)

# Your code goes here!
# Example:
print(tables) # prints ['commands', 'packages']

 

Posted by Uli Köhler in Databases, Python, SQLite

How to dump SQL from SQLite3 database file

If you have a SQLite3 database file (usually the filename extension is .db) and you want to dump it as SQL code, you can use

sqlite3 [database file] .dump

For example:

$ sqlite3 /usr/share/command-not-found/commands.db .dump
[...]
DELETE FROM sqlite_sequence;
INSERT INTO sqlite_sequence VALUES('packages',14669);
INSERT INTO sqlite_sequence VALUES('commands',47706);
CREATE INDEX idx_commands_command ON commands (command);
CREATE INDEX idx_packages_name ON packages (name);
COMMIT;

Missing the sqlite3 executable?

In case you don’t have the sqlite3 executable, you can install it using

sudo apt -y install sqlite3

on Ubuntu/Debian-based system

Posted by Uli Köhler in Allgemein, SQLite

How to fix MongoDB ‘exception in initAndListen: IllegalOperation: Attempted to create a lock file on a read-only directory: /var/lib/mongodb, terminating’

Problem:

When trying to start MongoDB e.g. using sudo systemctl start mongod, the process terminates with status code 100 and you see this error message in /var/log/mongodb/mongodb.log:

2019-07-22T17:11:07.858+0200 I STORAGE  [initandlisten] exception in initAndListen: IllegalOperation: Attempted to create a lock file on a read-only directory: /var/lib/mongodb, terminating

Solution:

Fix the permissions of /var/lib/mongodb:

sudo chown -R mongodb: /var/lib/mongodb

then restart MongoDB e.g. using

sudo systemctl restart mongod

In case that does not help, check your error message if you are using a data directory different to /var/lib/mongodb. In that case run

sudo chown -R mongodb: <insert your data directory here>

In case that is not the case or doesn’t help either, check if your filesystem is mounted in read-only mode.

Posted by Uli Köhler in MongoDB

How to fix mongorestore ‘E11000 duplicate key error collection’

Problem:

You are trying to run mongorestore my-backup, but you see large numbers of warning messages like this one:

- E11000 duplicate key error collection: mydb.mycollection index: _id_ dup key: { : "MyKey" }

Solution:

By default, mongorestore does not overwrite or delete any existing documents. You need to tell it to drop each collection immediately before importing it from the backup:

mongorestore --drop my-backup

Any documents not in the backup will be permanently lost after running this!

Note that this will not drop collections that are not present in the backup.

Also see mongodump/mongorestore minimal examples

Posted by Uli Köhler in Databases

mongodump/mongorestore minimal examples

Create & restore a database

mongodump --db mydb --out mydb.mongobackup

This will backup the database mydb from the MongoDB running at localhost and store the backup in the newly created DigiKey.mongobackup directory as BSON.

mongorestore mydb.mongobackup

This will restore the backup to localhost (the database name, mydb, is stored in the backup directory).

It will not overwrite or update existing documents, nor delete documents that are currently present but not present in the backup.

Restore backup with drop

mongorestore --drop mydb.mongobackup

This will drop (i.e. delete) each collection before importing from the backup. This means that

  • Existing documents will effectively be overwritten
  • Documents that are currently present but not present in the backup will be deleted

However note that while importing the backup, some documents might be missing from the database until the backup has been fully restored.

Note that this will not drop collections that are not present in the backup.

Posted by Uli Köhler in Databases

ElasticSearch: How to iterate / scroll through all documents in index

In ElasticSearch, you can use the Scroll API to scroll through all documents in an entire index.

In Python you can scroll like this:

def es_iterate_all_documents(es, index, pagesize=250, scroll_timeout="1m", **kwargs):
    """
    Helper to iterate ALL values from a single index
    Yields all the documents.
    """
    is_first = True
    while True:
        # Scroll next
        if is_first: # Initialize scroll
            result = es.search(index=index, scroll="1m", **kwargs, body={
                "size": pagesize
            })
            is_first = False
        else:
            result = es.scroll(body={
                "scroll_id": scroll_id,
                "scroll": scroll_timeout
            })
        scroll_id = result["_scroll_id"]
        hits = result["hits"]["hits"]
        # Stop after no more docs
        if not hits:
            break
        # Yield each entry
        yield from (hit['_source'] for hit in hits)

This function will yield each document encountered in the index.

Example usage for index my_index:

es = Elasticsearch([{"host": "localhost"}])

for entry in es_iterate_all_documents(es, 'my_index'):
    print(entry) # Prints the document as stored in the DB

 

Posted by Uli Köhler in Databases, ElasticSearch, Python

ElasticSearch: How to iterate all documents in index using Python (up to 10000 documents)

Important Note: This simple approach only works for up to ~10000 documents. Prefer using our scroll-based solution: See ElasticSearch: How to iterate / scroll through all documents in index

Use this helper function to iterate over all the documens in an index

def es_iterate_all_documents(es, index, pagesize=250, **kwargs):
    """
    Helper to iterate ALL values from
    Yields all the documents.
    """
    offset = 0
    while True:
        result = es.search(index=index, **kwargs, body={
            "size": pagesize,
            "from": offset
        })
        hits = result["hits"]["hits"]
        # Stop after no more docs
        if not hits:
            break
        # Yield each entry
        yield from (hit['_source'] for hit in hits)
        # Continue from there
        offset += pagesize

Usage example:

for entry in es_iterate_all_documents(es, 'my_index'):
    print(entry) # Prints the document as stored in the DB

How it works

You can iterate over all documents in an index in ElasticSearch by using queries like

{
    "size": 250,
    "from": 0
}

and increasing "from" by "size" after each iteration.

Posted by Uli Köhler in Databases, ElasticSearch, Python

Fixing ElasticSearch ‘Unknown key for a VALUE_NUMBER in [offset].’

The error message

Unknown key for a VALUE_NUMBER in [offset].

in ElasticSearch tells you that in the query JSON you have specified an offset (numeric value) but ElasticSearch doesn’t know what to do with offset.

This is easy to fix: In order to specify an offset to start from, use from instead of offset.

Incorrect:

{
    "size": 250,
    "offset": 1000
}

Correct:

{
    "size": 250,
    "from": 1000
}

 

Posted by Uli Köhler in Databases, ElasticSearch