Databases

How to test if MongoDB database exists on command line (bash)

Use this command to test if a given MongoDB database exists:

mongo --quiet --eval 'db.getMongo().getDBNames().indexOf("mydb")'

This will return an index such as 0 or 241 if the database is found. On the other hand, it will return -1 if the database does not exist.

docker-compose version:

docker-compose exec mongodb mongo --quiet --eval 'db.getMongo().getDBNames().indexOf("mydb")'

where mongodb is the name of your container.

Now we can put it together in a bash script to test if the database exists:

# Query if DB exists in MongoDB
mongo_indexof_db=$(mongo --quiet --eval 'db.getMongo().getDBNames().indexOf("mydb")')
if [ $mongo_indexof_db -ne "-1" ]; then
    echo "MongoDB database exists"
else
    echo "MongoDB database does not exist"
fi

 

docker-compose variant:

# Query if DB exists in MongoDB
mongo_indexof_db=$(docker-compose -f inspect.yml exec -T mongodb mongo --quiet --eval 'db.getMongo().getDBNames().indexOf("mydb")')
if [ $mongo_indexof_db -ne "-1" ]; then
    echo "MongoDB database exists"
else
    echo "MongoDB database does not exist"
fi

 

Posted by Uli Köhler in MongoDB, Shell

How to fix Python MongoDB TypeError: Object of type ObjectId is not JSON serializable

Problem:

When trying to export data as JSON that has originally been queried from MongoDB using code like

with open("alle.json", "w") as outfile:
    json.dump(alle, outfile)

you see the following error message:

File /usr/lib/python3.9/json/__init__.py:179, in dump(obj, fp, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    173     iterable = cls(skipkeys=skipkeys, ensure_ascii=ensure_ascii,
    174         check_circular=check_circular, allow_nan=allow_nan, indent=indent,
    175         separators=separators,
    176         default=default, sort_keys=sort_keys, **kw).iterencode(obj)
    177 # could accelerate with writelines in some versions of Python, at
    178 # a debuggability cost
--> 179 for chunk in iterable:
    180     fp.write(chunk)

File /usr/lib/python3.9/json/encoder.py:429, in _make_iterencode.<locals>._iterencode(o, _current_indent_level)
    427     yield _floatstr(o)
    428 elif isinstance(o, (list, tuple)):
--> 429     yield from _iterencode_list(o, _current_indent_level)
    430 elif isinstance(o, dict):
    431     yield from _iterencode_dict(o, _current_indent_level)

File /usr/lib/python3.9/json/encoder.py:325, in _make_iterencode.<locals>._iterencode_list(lst, _current_indent_level)
    323         else:
    324             chunks = _iterencode(value, _current_indent_level)
--> 325         yield from chunks
    326 if newline_indent is not None:
    327     _current_indent_level -= 1

File /usr/lib/python3.9/json/encoder.py:405, in _make_iterencode.<locals>._iterencode_dict(dct, _current_indent_level)
    403         else:
    404             chunks = _iterencode(value, _current_indent_level)
--> 405         yield from chunks
    406 if newline_indent is not None:
    407     _current_indent_level -= 1

File /usr/lib/python3.9/json/encoder.py:438, in _make_iterencode.<locals>._iterencode(o, _current_indent_level)
    436         raise ValueError("Circular reference detected")
    437     markers[markerid] = o
--> 438 o = _default(o)
    439 yield from _iterencode(o, _current_indent_level)
    440 if markers is not None:

File /usr/lib/python3.9/json/encoder.py:179, in JSONEncoder.default(self, o)
    160 def default(self, o):
    161     """Implement this method in a subclass such that it returns
    162     a serializable object for ``o``, or calls the base implementation
    163     (to raise a ``TypeError``).
   (...)
    177 
    178     """
--> 179     raise TypeError(f'Object of type {o.__class__.__name__} '
    180                     f'is not JSON serializable')

TypeError: Object of type ObjectId is not JSON serializable

Solution:

This error occurs because objects queried from PyMongo always contain _id which is of type ObjectId and the normal JSON library (or drop-in replacements like simplejson do not know how to create JSON representations of Objects of type ObjectId).

In order to fix this, use pymongo‘s json_util instead of json. Note that the bson.json_util package contains dumps but does not contain dump, so use the following snippet to write to a file:

 

import bson.json_util as json_util

with open("alle.json", "w") as outfile:
    outfile.write(json_util.dumps(alle))

 

Posted by Uli Köhler in MongoDB, Python

How iterate all documents in MongoDB collection using pymongo

This example will connect to the MongoDB running at localhost (on the default port 27017) without any username or password and open the database named mydb (also see Python MongoDB minimal connect example using pymongo), open the collection mycollectionand iterate all the documents in said collection, printing each document.

from pymongo import MongoClient
client = MongoClient("mongodb://localhost")
db = client["mydb"]
mycollection = db["mycollection"]

for doc in mycollection.find():
    print(doc)

This will print, for example,

{'_id': 123, 'name': 'John', 'phone': '+123456789'}

 

Posted by Uli Köhler in Databases, MongoDB

How to list MongoDB collection names in Python using pymongo

This example will connect to the MongoDB running at localhost (on the default port 27017) without any username or password and open the database named mydb (also see Python MongoDB minimal connect example using pymongo) and list all the collection names in mydb:

from pymongo import MongoClient
client = MongoClient("mongodb://localhost")
db = client["mydb"]

print(db.list_collection_names())

This will print, for example,

['people', 'salaries']

 

Posted by Uli Köhler in Databases, MongoDB

Python MongoDB minimal connect example using pymongo

This example will connect to the MongoDB running at localhost (on the default port 27017) without any username or password and open the database named mydb

from pymongo import MongoClient
client = MongoClient("mongodb://localhost")
db = client["mydb"]

 

Posted by Uli Köhler in Databases, MongoDB

Recommended PostgreSQL docker-compose service configuration

I use the following docker-compose.yml service:

version: '3.5'
services:
  postgres:
    image: postgres
    restart: unless-stopped
    volumes:
      - ./pg_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=${POSTGRES_DB}
      - POSTGRES_USER=${POSTGRES_USER}

With the following .env:

POSTGRES_DB=headscale
POSTGRES_USER=headscale
POSTGRES_PASSWORD=vah2phuen3shesahc6Jeenaechecee

Using .env has the huge advantage that other services like my backup script can access the configuration in a standardized manner using environment variables.

Posted by Uli Köhler in Container, Databases, Docker

How I run pg_dump in my docker-compose setup

I have the following docker-compose.yml service:

version: '3.5'
services:
  postgres:
    image: postgres
    restart: unless-stopped
    volumes:
      - ./pg_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=${POSTGRES_DB}
      - POSTGRES_USER=${POSTGRES_USER}

With the following .env:

POSTGRES_DB=headscale
POSTGRES_USER=headscale
POSTGRES_PASSWORD=vah2phuen3shesahc6Jeenaechecee

Given that setup, I run pg_dump like this:

source .env && docker-compose exec postgres pg_dump -U${POSTGRES_USER} > pgdump-$(date +%F_%H-%M-%S).sql
Posted by Uli Köhler in Container, Databases, Docker

pre and post script for veeam MongoDB backups

Create an pre.sh and post.sh script to be run before and after a veeam backup of your ubuntu machine runs and will shutdown MongoDB and your webservice (myservice) using the MongoDB before the backup and start it again after the backup has finished. For example:

systemctl stop myservice
systemctl stop mongod
exit 0

and

systemctl start mongod
systemctl start myservice
exit 0

I highly recommend proper error handling – here handover error codes from systemctl commands.

Select the path to the shell scripts like this:
veeam job setup enable scripts

Posted by Joshua Simon in Linux, MongoDB, veeam

How to optimize MySQL/MariaDB tables in docker-compose

If your MariaDB / MySQL root password is stored in .env , use this command:

source .env && docker-compose exec mariadb mysqlcheck -uroot -p$MARIADB_ROOT_PASSWORD --auto-repair --optimize --all-databases

You can also directly use the root password in the command:

docker-compose exec mariadb mysqlcheck -uroot -phoox8AiFahuniPaivatoh2iexighee --auto-repair --optimize --all-databases

 

Posted by Uli Köhler in Container, Databases, Docker

How to fix Nextcloud 4047 InnoDB refuses to write tables with ROW_FORMAT=COMPRESSED or KEY_BLOCK_SIZE.

Problem:

When trying to maintenance:repair your Nextcloud instance, e.g. during a utf8mb4 upgrade, you see an error message like

ERROR: An exception occurred while executing a query: SQLSTATE[HY000]: General error: 4047 InnoDB refuses to write tables with ROW_FORMAT=COMPRESSED or KEY_BLOCK_SIZE.

Solution:

You need to turn off innodb-read-only-compressed. I do this by starting my MySQL docker with

--skip-innodb-read-only-compressed

Full command line which worked for me:

--transaction-isolation=READ-COMMITTED --binlog-format=ROW --innodb-file-per-table=1 --skip-innodb-read-only-compressed

Full docker-compose.yml nextcloud section:

nextcloud-db:
  image: mariadb
  command: --transaction-isolation=READ-COMMITTED --binlog-format=ROW --innodb-file-per-table=1 --skip-innodb-read-only-compressed
  restart: always
  volumes:
    - ./nextcloud-db:/var/lib/mysql
  environment:
    - MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD}
    - MYSQL_PASSWORD=${MYSQL_PASSWORD}

 

Posted by Uli Köhler in Cloud, Databases, Technologies

How to fix elasticsearch.exceptions.RequestError: RequestError(400, ‘resource_already_exists_exception’, ‘index […] already exists’) in Python

Problem:

You want to create an ElasticSearch index in Python using code like

es.indices.create("nodes") # Create an index names "nodes"

but you see the following error message:

Traceback (most recent call last):
  File "estest.py", line 22, in <module>
    es.indices.create("nodes")
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/client/utils.py", line 168, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/client/indices.py", line 123, in create
    return self.transport.perform_request(
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/transport.py", line 415, in perform_request
    raise e
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/transport.py", line 381, in perform_request
    status, headers_response, data = connection.perform_request(
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/connection/http_urllib3.py", line 277, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/connection/base.py", line 330, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
elasticsearch.exceptions.RequestError: RequestError(400, 'resource_already_exists_exception', 'index [nodes/mXAiBt0wTKK4Y31HpshVbw] already exists')

Solution:

The error message tells you that the index you are trying to create already exists!

The simples solution is to use the code from our post on How to create ElasticSearch index if it doesn’t already exist in Python:

def es_create_index_if_not_exists(es, index):
    """Create the given ElasticSearch index and ignore error if it already exists"""
    try:
        es.indices.create(index)
    except elasticsearch.exceptions.RequestError as ex:
        if ex.error == 'resource_already_exists_exception':
            pass # Index already exists. Ignore.
        else: # Other exception - raise it
            raise ex

and use that function to create your index:

es_create_index_if_not_exists(es, "nodes") # Creates the "nodes" index ; doesn't fail if it already exists

 

Posted by Uli Köhler in Databases, ElasticSearch, Python

How to create ElasticSearch index if it doesn’t already exist in Python

The following utility will create an index if it doesn’t exist already by ignoring any resource_already_exists_exception

def es_create_index_if_not_exists(es, index):
    """Create the given ElasticSearch index and ignore error if it already exists"""
    try:
        es.indices.create(index)
    except elasticsearch.exceptions.RequestError as ex:
        if ex.error == 'resource_already_exists_exception':
            pass # Index already exists. Ignore.
        else: # Other exception - raise it
            raise ex

# Example usage: Create "nodes" index
es_create_index_if_not_exists(es, "nodes")

 

Posted by Uli Köhler in Databases, ElasticSearch, Python

What is the ElasticSearch equivalent to an SQL table?

In ElasticSearch, the concept which closely resembles an SQL table is called an index.

Compared to an SQL table, the index does not neccessarily need to have a predefined structure – the ElasticSearch index is more similar to a MongoDB collection.

However, an elasticsearch index has many features similar to SQL tables such as indices (which you typically don’t need to create explicity – ElasticSearch takes care of that for you).

Typically, indices contain lots of similar documents that have (mostly) the same properties.

Posted by Uli Köhler in Databases, ElasticSearch

How to fix Elasticsearch Python elasticsearch.exceptions.NotFoundError: NotFoundError(404, ‘index_not_found_exception’, ‘no such index [node]’, node, index_or_alias)

Problem:

You want to update ElasticSearch index settings in Python using code like

es.indices.put_settings(index="node", body={
    "index.mapping.total_fields.limit": 100000
})

but you see an error message like this one:

Traceback (most recent call last):
  File "estest.py", line 11, in <module>
    es.indices.put_settings(index="node", body={
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/client/utils.py", line 168, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/client/indices.py", line 786, in put_settings
    return self.transport.perform_request(
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/transport.py", line 415, in perform_request
    raise e
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/transport.py", line 381, in perform_request
    status, headers_response, data = connection.perform_request(
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/connection/http_urllib3.py", line 277, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/connection/base.py", line 330, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
elasticsearch.exceptions.NotFoundError: NotFoundError(404, 'index_not_found_exception', 'no such index [node]', node, index_or_alias)

Solution:

The index needs to be created first in order to be able to put settings. First, double-check if you spelled the index name correctly! You can see the index name in the exception: no such index [node] means that the index is called node

The direct way to create an index is

es.indices.create("node")

But note that this will fail if the index already exists. In order to work around this issue, I recommend to use the code from our previous post How to create ElasticSearch index if it doesn’t already exist in Python:

def es_create_index_if_not_exists(es, index):
    """Create the given ElasticSearch index and ignore error if it already exists"""
    try:
        es.indices.create(index)
    except elasticsearch.exceptions.RequestError as ex:
        if ex.error == 'resource_already_exists_exception':
            pass # Index already exists. Ignore.
        else: # Other exception - raise it
            raise exnodes

and use the es_create_index_if_not_exists()  function to create your index:

es_create_index_if_not_exists(es, "node") # Creates the "node" index ; doesn't fail if it already exists

 

Posted by Uli Köhler in Databases, ElasticSearch, Python

How to set index setting using Python ElasticSearch client

You can set index settings using the official ElasticSearch python client library by using:

es.indices.put_settings(index="my-index", body={
    # Put your index settings here
    # Example: "index.mapping.total_fields.limit": 100000
})

Full example:

from elasticsearch import Elasticsearch

es = Elasticsearch()

es.indices.put_settings(index="ways", body={
    "index.mapping.total_fields.limit": 100000
})
Posted by Uli Köhler in Databases, ElasticSearch, Python

How to get IndicesClient when using ElasticSearch Python API

When using the official ElasticSearch Python client, you can get the IndicesClient for an Elasticsearch instance by using

es.indices

Full example:

from elasticsearch import Elasticsearch

es = Elasticsearch()

indices_client = es.indices

 

Posted by Uli Köhler in Databases, ElasticSearch, Python

How to increase ElasticSearch total field limit using Python API

When using ElasticSearch, you will sometimes encounter an Limit of total fields [1000] has been exceeded when you insert a large document.

One solution that often works for real-world scenarios is to just increase the default limit of 1000 to, for example, 100000 to account for even the largest documents.

How this can be done is, however, not well-documented for the ElasticSearch Python API. Here’s how you can do it:

from elasticsearch import Elasticsearch

es = Elasticsearch()

es.indices.put_settings(index="my-index", body={
    "index.mapping.total_fields.limit": 100000
})

 

Posted by Uli Köhler in Databases, ElasticSearch, Python

Faster Python Elasticsearch index() by using concurrent.futures ThreadPoolExecutor

In our previous post Elasticsearch Python minimal index() / insert example we showed how to insert a document into Elasticsearch.

When inserting a large number of documents into Elasticsearch, you will notice that it’s extremely slow to wait for the API call to finish before trying to insert the document.

In this post we’ll show a simple way of doing many requests in parallel so multiple index operations are running concurrently while your code is processing more documents. For this, we’ll use concurrent.futures.ThreadPoolExecutor and – after inserting all documents into the queue, use concurrent.futures.wait to wait for all requests to finish before we’ll exit.

#!/usr/bin/env python3
from elasticsearch import Elasticsearch
from concurrent.futures import ThreadPoolExecutor
import concurrent.futures

index_executor = ThreadPoolExecutor(64)
futures = []

es = Elasticsearch()
for i in range(1000):
    future = index_executor.submit(es.index, index="test-index", id=i, body={"test": 123})
    futures.append(future)

print("Waiting for requests to complete...")
concurrent.futures.wait(futures)

 

Posted by Uli Köhler in Databases, ElasticSearch, Python

Elasticsearch Python minimal index() / insert example

This minimal example inserts a single document into Elasticsearch running at http://localhost:9200:

#!/usr/bin/env python3
from elasticsearch import Elasticsearch

es = Elasticsearch()
es.index(index="test-index", id=1, body={"test": 123})

 

Posted by Uli Köhler in Databases, ElasticSearch, Python

Simple Elasticsearch setup with docker-compose

The following docker-compose.yml is a simple starting point for using ElasticSearch within a docker-based setup:

version: '2.2'
services:
    elasticsearch1:
        image: docker.elastic.co/elasticsearch/elasticsearch:7.13.4
        container_name: elasticsearch1
        environment:
            - cluster.name=docker-cluster
            - node.name=elasticsearch1
            - cluster.initial_master_nodes=elasticsearch1
            - bootstrap.memory_lock=true
            - http.cors.allow-origin=http://localhost:1358,http://127.0.0.1:1358
            - http.cors.enabled=true
            - http.cors.allow-headers=X-Requested-With,X-Auth-Token,Content-Type,Content-Length,Authorization
            - http.cors.allow-credentials=true
            - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
        ulimits:
            memlock:
                soft: -1
                hard: -1
        volumes:
            - ./esdata1:/usr/share/elasticsearch/data
        ports:
            - 9200:9200
    dejavu:
        image: appbaseio/dejavu
        container_name: dejavu
        ports:
            - 1358:1358

Now create the esdata1 directory with the correct permissions:

sudo mkdir esdata1
sudo chown -R 1000:1000 esdata1

We also need to configure the vm.max_map_count sysctl parameter:

echo -e "\nvm.max_map_count=524288\n" | sudo tee -a /etc/sysctl.conf && sudo sysctl -w vm.max_map_count=524288

 

I recommend to place it in /opt/elasticsearch, but you can place wherever you like.

If you want to autostart it on boot, see Create a systemd service for your docker-compose project in 10 seconds or just use this snippet from said post:

curl -fsSL https://techoverflow.net/scripts/create-docker-compose-service.sh | sudo bash /dev/stdin

This will create a systemd service named elasticsearch (if your directory is named elasticsearch like /opt/elasticsearch) and enable and start it immediately. Hence you can restart using

sudo systemctl restart elasticsearch

and view the logs using

sudo journalctl -xfu elasticsearch

For more complex setup involving more than one node, see our previous post on ElasticSearch docker-compose.yml and systemd service generator

Posted by Uli Köhler in Container, Databases, Docker, ElasticSearch