Technologies

How to backup all indices from ElasticSearch

You can use elasticdump to back up all indices from your ElasticSearch cluster. Install it using

sudo npm install elasticdump -g

If you don’t have npm, see How to install NodeJS 10.x on Ubuntu in 1 minute.

This package installs two binaries: elasticdump (used to dump a single index) and multielasticdump (used to dump multiple indices in parallel).

We can use multielasticdump to dump all indices:

mkdir -p es_backup
multielasticdump --direction=dump --input=http://localhost:9200 --output=es_backup

Restore using:

multielasticdump --direction=load --input=es_backup --output=http://localhost:9200
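
If you want to script the backup (for example from a scheduled job), the two commands above can be wrapped in a small Python helper. This is only a minimal sketch: the function names are hypothetical, it uses just the --direction, --input and --output flags shown above, and it assumes multielasticdump is installed and on your PATH.

```python
import os
import subprocess


def build_multielasticdump_cmd(direction, es_url, backup_dir):
    """Build the multielasticdump argument list for a dump or a load."""
    if direction == "dump":
        source, target = es_url, backup_dir
    elif direction == "load":
        source, target = backup_dir, es_url
    else:
        raise ValueError("direction must be 'dump' or 'load'")
    return [
        "multielasticdump",
        f"--direction={direction}",
        f"--input={source}",
        f"--output={target}",
    ]


def run_backup(es_url="http://localhost:9200", backup_dir="es_backup"):
    """Create the backup directory and dump all indices into it."""
    os.makedirs(backup_dir, exist_ok=True)  # same as mkdir -p
    subprocess.run(build_multielasticdump_cmd("dump", es_url, backup_dir), check=True)
```

Calling run_backup() corresponds to the mkdir and dump commands above; building the argument list separately makes it easy to log or test the command before running it.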

 

Posted by Uli Köhler in ElasticSearch, Linux

How to expand Kubernetes Persistent Volume Claim (PVC)

Important note: By default, volumes will not be resized immediately but instead require a restart of the associated pod.

First, ensure that you have set allowVolumeExpansion: true for the storage class of your PVC. See our previous post on How to allow Physical Volume Claim (PVC) resize for Kubernetes storage class for more details.

We can expand the volume (named myapp-myapp-pvc-myapp-myapp-1 in this example) by running

kubectl patch pvc/"myapp-myapp-pvc-myapp-myapp-1" \
  --namespace "default" \
  --patch '{"spec": {"resources": {"requests": {"storage": "40Gi"}}}}'

Ensure that you have replaced the name of the PVC (myapp-myapp-pvc-myapp-myapp-1 in this example) and the storage size. It’s only possible to increase the size of the volume, not to shrink it. If the new size is less than the previous value, you’ll see this error message:

The PersistentVolumeClaim "myapp-myapp-pvc-myapp-myapp-1" is invalid: spec.resources.requests.storage: Forbidden: field can not be less than previous value
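
To avoid running into this error, you can build the patch in a script and check the sizes before calling kubectl. The following is a minimal sketch under stated assumptions: both helper names are hypothetical, and the parser only understands the binary suffixes Ki, Mi, Gi and Ti, not the full Kubernetes quantity syntax.

```python
import json

# Binary suffixes used by Kubernetes quantities. This sketch ignores
# decimal suffixes (k, M, G, ...) and plain byte counts.
_BINARY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_binary_quantity(quantity):
    """Convert a quantity such as '40Gi' to a number of bytes."""
    number, unit = quantity[:-2], quantity[-2:]
    if unit not in _BINARY_UNITS:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(number) * _BINARY_UNITS[unit]


def build_pvc_resize_patch(current_size, new_size):
    """Return the JSON patch string, refusing to shrink the volume."""
    if parse_binary_quantity(new_size) < parse_binary_quantity(current_size):
        raise ValueError("a PVC can only be expanded, not shrunk")
    patch = {"spec": {"resources": {"requests": {"storage": new_size}}}}
    return json.dumps(patch)


# Prints the patch string to pass to kubectl patch --patch
print(build_pvc_resize_patch("20Gi", "40Gi"))
```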

After running this command, the PVC will be in the FileSystemResizePending state.

For the update to take effect, you’ll need to force Kubernetes to re-create all the pods for your deployment. To find out how to do this, read our post on How to force restarting all Pods in a Kubernetes Deployment.

For reference, see the official documentation on expanding persistent volumes.

Posted by Uli Köhler in Cloud, Kubernetes

How to force restarting all Pods in a Kubernetes Deployment

In contrast to classical deployment managers like systemd or pm2, Kubernetes does not provide a simple restart my application command.

However, there’s an easy workaround: if you change anything in the pod template of your configuration, even innocuous things that don’t have any effect, Kubernetes will restart your pods.

Consider configuring a rolling update strategy before doing this if you are updating a production application that should have minimal downtime.

In this example we’ll assume you have a StatefulSet you want to update and that it’s named elasticsearch-elasticsearch. Be sure to fill in the actual name of your StatefulSet or Deployment here.

kubectl patch statefulset/elasticsearch-elasticsearch -p \
  "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"dummy-date\":\"`date +'%s'`\"}}}}}"

This will just set a dummy-date annotation which does not have any effect.
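
The shell quoting in the command above is easy to get wrong, so here is a minimal Python sketch that builds the same patch with json.dumps instead. The function names are hypothetical; the annotation name dummy-date and the patch structure are the ones used above.

```python
import json
import time


def build_restart_patch(timestamp=None):
    """Return a patch that changes a dummy annotation in the pod template."""
    if timestamp is None:
        timestamp = int(time.time())  # same value as `date +'%s'`
    patch = {
        "spec": {
            "template": {
                "metadata": {"annotations": {"dummy-date": str(timestamp)}}
            }
        }
    }
    return json.dumps(patch)


def build_restart_command(kind, name, timestamp=None):
    """Build the kubectl argument list, e.g. for kind='statefulset'."""
    return ["kubectl", "patch", f"{kind}/{name}", "-p", build_restart_patch(timestamp)]


print(build_restart_command("statefulset", "elasticsearch-elasticsearch"))
```

Passing the resulting list to subprocess.run() avoids the shell entirely, so no escaping of the inner double quotes is needed.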

You can monitor the update using

kubectl rollout status statefulset/elasticsearch-elasticsearch

Credits for the original solution idea to pstadler on GitHub.

Posted by Uli Köhler in Cloud, Kubernetes

How to configure Google Cloud Kubernetes Elasticsearch Cluster with internal load balancer

Google Cloud offers a convenient way of installing an ElasticSearch cluster on top of a Google Cloud Kubernetes cluster. However, the documentation tells you to expose the ElasticSearch instance using

kubectl patch service/"elasticsearch-elasticsearch-svc" \
  --namespace "default" \
  --patch '{"spec": {"type": "LoadBalancer"}}'

But this command will expose ElasticSearch on an external IP, which makes it publicly accessible in the default configuration.

Here’s the equivalent command that will expose ElasticSearch to an internal load balancer with an internal IP address that will only be available from Google Cloud.

kubectl patch service/"elasticsearch-elasticsearch-svc" \
  --namespace "default" \
  --patch '{"spec": {"type": "LoadBalancer"}, "metadata": {"annotations": {"cloud.google.com/load-balancer-type": "Internal"}}}'

You might need to replace the name of your service (elasticsearch-elasticsearch-svc in this example) and possibly your namespace.
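
If you manage several services, it can help to generate the patch in a script and to verify afterwards that a service really carries the internal annotation. This is a minimal sketch: both function names are hypothetical, and is_internal_load_balancer() assumes you pass it the parsed output of kubectl get service ... -o json.

```python
import json

INTERNAL_LB_ANNOTATION = "cloud.google.com/load-balancer-type"


def build_load_balancer_patch(internal=True):
    """Return the patch as a dict; internal=False gives the external variant."""
    patch = {"spec": {"type": "LoadBalancer"}}
    if internal:
        patch["metadata"] = {"annotations": {INTERNAL_LB_ANNOTATION: "Internal"}}
    return patch


def is_internal_load_balancer(service):
    """Check a parsed service manifest for the internal load balancer setup."""
    annotations = service.get("metadata", {}).get("annotations") or {}
    return (
        service.get("spec", {}).get("type") == "LoadBalancer"
        and annotations.get(INTERNAL_LB_ANNOTATION) == "Internal"
    )


print(json.dumps(build_load_balancer_patch()))
```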

 

Posted by Uli Köhler in Cloud, ElasticSearch, Kubernetes

How to install MicroK8S (MicroKubernetes) on Ubuntu in 30 seconds

This set of commands will install & start MicroK8S (MicroKubernetes) on Ubuntu and similar Linux distributions.

sudo snap install microk8s --classic
sudo snap install kubectl --classic
sudo microk8s.enable # Autostart on boot
sudo microk8s.start # Start right now
# Wait until microk8s has started
until microk8s.status ; do sleep 1 ; done
# Enable some standard modules
microk8s.enable dashboard registry istio

For reference see the official quickstart manual.

Posted by Uli Köhler in Allgemein, Cloud, Container, Kubernetes

How to fix kubectl unknown shorthand flag: ‘f’ in -f

Problem:

You want to run a Kubernetes kubectl command like

kubectl -f my-app-deployment.yaml

but you see this error message after kubectl prints its entire help page:

unknown shorthand flag: 'f' in -f

Solution:

You are missing an actual command to kubectl. Most likely you want to create something on your Kubernetes instance, in which case you want to run this instead:

kubectl create -f my-app-deployment.yaml

You might also want to apply or replace your config instead. Note that apply does not automatically restart your Kubernetes Pods. Read How to fix Kubernetes kubectl apply not restarting pods for more information.

Posted by Uli Köhler in Cloud, Container, Kubernetes

How to rsync to Google Cloud VM instance on command line

If you want to copy files to a Google Cloud VM instance (my-instance in this example) from your command line using rsync, use this command:

rsync -Pavz [local file] $(gcloud compute instances list --filter="name=my-instance" --format "get(networkInterfaces[0].accessConfigs[0].natIP)"):

The subcommand (enclosed in $(...)) finds the correct external IP address for your instance (see How to find IP address of Google Cloud VM instance on command line for more details), so this command boils down to, for example:

rsync -Pavz [local file] 35.207.77.101:

Using the -Pavz options is not strictly necessary, but these are the options I regularly use for rsync file transfers. You can use any rsync options; Google Cloud does not impose any specific restrictions here. For reference, see the rsync manpage.

In case you want to use a different username for the SSH login, you can of course prefix the $(...) section like this:

rsync -Pavz [local file] sshuser@$(gcloud compute instances list --filter="name=my-instance" --format "get(networkInterfaces[0].accessConfigs[0].natIP)"):
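
When calling rsync from a script, the destination string (optional user, host, trailing colon) can be assembled in one place. The following is a minimal sketch; the function name and the file name backup.tar.gz are hypothetical, and the host is assumed to be the IP address you obtained from gcloud.

```python
import shlex


def build_rsync_command(local_path, host, user=None, remote_path="", options="-Pavz"):
    """Build the rsync argument list for copying local_path to host."""
    destination = f"{host}:{remote_path}"  # empty remote_path means the home directory
    if user:
        destination = f"{user}@{destination}"
    return ["rsync", options, local_path, destination]


# Show the command line that would be executed
print(shlex.join(build_rsync_command("backup.tar.gz", "35.207.77.101", user="sshuser")))
```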

Posted by Uli Köhler in Cloud

How to SSH to Google Cloud VM instance on command line

If you want to connect to a Google Cloud VM instance (my-instance in this example) from your command line using SSH, you have two options:

Directly connect using gcloud

This will always work if your instance has SSH enabled, even if it does not have an external IP:

gcloud compute ssh my-instance --zone $(gcloud compute instances list --filter="name=my-instance" --format "get(zone)" | awk -F/ '{print $NF}')

Note that you have to replace my-instance with your actual instance name twice in the command above. The subcommand (enclosed in $(...)) finds the correct zone for your instance, since at the time of writing gcloud compute ssh will not work unless you set the correct zone for that instance. See How to find zone of Google Cloud VM instance on command line for more details.

Connect using external IP

You can also use gcloud to get the external IP and connect to it using your standard SSH client.

ssh $(gcloud compute instances list --filter="name=my-instance" --format "get(networkInterfaces[0].accessConfigs[0].natIP)")

This has the added advantage that you can use it in other SSH-based commands such as rsync.

For reference, also see the official manual on Securely Connecting to Instances.

Posted by Uli Köhler in Cloud

How to find zone of Google Cloud VM instance on command line

Problem:

You have a VM instance (my-instance in our example) for which you want to find out the zone it’s residing in using the gcloud command line tool.

Solution:

If you just want to see the zone of the instance (remember to replace my-instance by your instance name!), use

gcloud compute instances list --filter="name=my-instance" --format "[box]"

This will format the output nicely and show you more information about your instance. Example output:

┌─────────────┬────────────────┬─────────────────────────────┬─────────────┬─────────────┬───────────────┬─────────┐
│    NAME     │      ZONE      │         MACHINE_TYPE        │ PREEMPTIBLE │ INTERNAL_IP │  EXTERNAL_IP  │  STATUS │
├─────────────┼────────────────┼─────────────────────────────┼─────────────┼─────────────┼───────────────┼─────────┤
│ my-instance │ europe-west3-c │ custom (16 vCPU, 32.00 GiB) │             │ 10.156.0.1  │ 35.207.77.101 │ RUNNING │
└─────────────┴────────────────┴─────────────────────────────┴─────────────┴─────────────┴───────────────┴─────────┘

In this example, the zone is europe-west3-c.

In case you want to see only the zone, use this command instead:

gcloud compute instances list --filter="name=my-instance" --format "get(zone)" | awk -F/ '{print $NF}'

Example output:

europe-west3-c
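
The awk part is needed because get(zone) prints a full resource URL and only its last path segment is the zone name. If you process the gcloud output in Python instead of the shell, the same step can be sketched like this (the function name and the project name my-project in the sample URL are hypothetical):

```python
def zone_from_url(zone_url):
    """Return the last path segment, like awk -F/ '{print $NF}'."""
    return zone_url.strip().rsplit("/", 1)[-1]


sample = "https://www.googleapis.com/compute/v1/projects/my-project/zones/europe-west3-c"
print(zone_from_url(sample))
```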

Also see our other post How to find IP address of Google Cloud VM instance on command line.

In order to see what other information about instances you can see in a similar fashion, use

gcloud compute instances list --filter="name=my-instance" --format "text"

Posted by Uli Köhler in Cloud

How to find IP address of Google Cloud VM instance on command line

Problem:

You have a VM instance (my-instance in our example) for which you want to get the external or internal IP using the gcloud command line tool.

Solution:

If you just want to see the external IP of the instance (remember to replace my-instance by your instance name!), use

gcloud compute instances list --filter="name=my-instance" --format "[box]"

This will format the output nicely and show you more information about your instance. Example output:

┌─────────────┬────────────────┬─────────────────────────────┬─────────────┬─────────────┬───────────────┬─────────┐
│    NAME     │      ZONE      │         MACHINE_TYPE        │ PREEMPTIBLE │ INTERNAL_IP │  EXTERNAL_IP  │  STATUS │
├─────────────┼────────────────┼─────────────────────────────┼─────────────┼─────────────┼───────────────┼─────────┤
│ my-instance │ europe-west3-c │ custom (16 vCPU, 32.00 GiB) │             │ 10.156.0.1  │ 35.207.77.101 │ RUNNING │
└─────────────┴────────────────┴─────────────────────────────┴─────────────┴─────────────┴───────────────┴─────────┘

In this example, the external IP address is 35.207.77.101.

In case you want to see only the IP address, use this command instead:

gcloud compute instances list --filter="name=my-instance" --format "get(networkInterfaces[0].accessConfigs[0].natIP)"

Example output:

35.207.77.101

In order to see only the internal IP address (accessible only from Google Cloud), use

gcloud compute instances list --filter="name=my-instance" --format "get(networkInterfaces[0].networkIP)"

In the Linux shell, the result of this command can easily be used as input to other commands. For example, to ping my-instance, use

ping $(gcloud compute instances list --filter="name=my-instance" --format "get(networkInterfaces[0].accessConfigs[0].natIP)")

Also see our related post How to find zone of Google Cloud VM instance on command line.

In order to see what other information about instances you can see in a similar fashion, use

gcloud compute instances list --filter="name=my-instance" --format "text"
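
If you need both IP addresses in a script, you can also request --format "json" and read the same fields from the parsed output. This is a minimal sketch assuming the JSON contains the networkInterfaces, accessConfigs, natIP and networkIP fields used in the get(...) expressions above; the function name and the trimmed-down sample data are hypothetical.

```python
import json


def instance_ips(instances_json, name):
    """Return (external_ip, internal_ip) for the named instance."""
    for instance in json.loads(instances_json):
        if instance.get("name") != name:
            continue
        interface = instance["networkInterfaces"][0]
        # Instances without an external IP have no accessConfigs entry
        access_configs = interface.get("accessConfigs") or [{}]
        return access_configs[0].get("natIP"), interface.get("networkIP")
    raise KeyError(f"no instance named {name!r}")


# Trimmed-down sample of the JSON structure (only the fields used here)
SAMPLE = json.dumps([{
    "name": "my-instance",
    "networkInterfaces": [{
        "networkIP": "10.156.0.1",
        "accessConfigs": [{"natIP": "35.207.77.101"}],
    }],
}])

print(instance_ips(SAMPLE, "my-instance"))
```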

Posted by Uli Köhler in Cloud

How to fix Kubernetes kubectl apply not restarting pods

Problem:

You made an update to your Kubernetes YAML configuration which you applied with

kubectl apply -f [YAML filename]

but Kubernetes still keeps the old version of the software running.

Solution:

Instead of kubectl apply -f ... use

kubectl replace --force -f [YAML filename]

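This deletes and re-creates the resources defined in the file, which updates the configuration on the server and also replaces the running pods. Expect a short downtime while the pods are re-created.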

Original answer on StackOverflow

Posted by Uli Köhler in Cloud, Container, Kubernetes

How to fix kubectl Unable to connect to the server: dial tcp …:443: i/o timeout

Problem:

You want to create or edit a Kubernetes service but when running e.g.

kubectl create -f my-service.yml

you see an error message similar to this:

Unable to connect to the server: dial tcp 35.198.129.60:443: i/o timeout

Solution:

There are three common reasons for this issue:

  1. Your Kubernetes cluster is not running. Verify that your cluster has been started, e.g. by pinging the IP address.
  2. There are networking issues that prevent you from accessing the cluster. Verify that you can ping the IP and try to track down whether there is a firewall in place preventing the access.
  3. You have configured a cluster that does not exist any more.
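
Ping uses ICMP, which may be blocked even if the cluster is reachable, so for cases (1) and (2) it can be more telling to test the TCP port from the error message directly. The following is a minimal sketch using only the Python standard library; the function name is hypothetical.

```python
import socket


def can_connect(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refused connections and DNS errors
        return False
```

For the error message above you would call can_connect("35.198.129.60"). A False result points to a stopped cluster, a firewall or a stale cluster configuration, but it cannot tell these cases apart.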

In case of Google Cloud Kubernetes, case (3) can easily be fixed by configuring Kubernetes to use your current cluster:

gcloud container clusters get-credentials [cluster name] --zone [zone]

This will automatically update the default cluster for kubectl.

In case you don’t know the correct cluster name and zone, use

gcloud container clusters list

Posted by Uli Köhler in Cloud, Container, Kubernetes

How to build & upload a Dockerized application to Google Container Registry in 5 minutes

This post provides a simple example of how to build & upload your application to the private Google Container Registry. We assume you have already set up your project and installed Docker. In this example, we’ll build & upload pseudo-perseus v1.0. Since this is a NodeJS-based application, we also assume that you have installed a recent version of NodeJS and NPM (see our previous article on how to do that using Ubuntu).

First we configure docker to be able to authenticate to Google:

gcloud auth configure-docker

Now we can check out the repository and install the NPM packages:

git clone https://github.com/ulikoehler/pseudo-perseus.git
cd pseudo-perseus
git checkout v1.0
npm install

Now we can build the local Docker image. We tag it directly with its registry name so that it can be uploaded to the Google Container Registry; be sure to use the correct Google Cloud project ID:

docker build -t eu.gcr.io/myproject-123456/pseudo-perseus:v1.0 .

The next step is to upload the image:

docker push eu.gcr.io/myproject-123456/pseudo-perseus:v1.0
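
Since the image name has to be identical in the build and push steps, it can be worth generating both commands from one place. This is a minimal sketch; the function names are hypothetical and the registry host eu.gcr.io is simply the one used in this example.

```python
def gcr_image_name(project_id, image, tag, registry="eu.gcr.io"):
    """Return the full image name in the form registry/project/image:tag."""
    return f"{registry}/{project_id}/{image}:{tag}"


def build_and_push_commands(project_id, image, tag, context="."):
    """Return the docker build and docker push argument lists."""
    name = gcr_image_name(project_id, image, tag)
    return [
        ["docker", "build", "-t", name, context],
        ["docker", "push", name],
    ]


for command in build_and_push_commands("myproject-123456", "pseudo-perseus", "v1.0"):
    print(" ".join(command))
```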

For reference see the official Container Registry documentation.

Posted by Uli Köhler in Cloud, Container, Docker

Fixing gcloud WARNING: `docker-credential-gcloud` not in system PATH

Problem:

You want to configure docker to be able to access Google Container Registry using

gcloud auth configure-docker

but you see this warning message:

WARNING: `docker-credential-gcloud` not in system PATH.
gcloud's Docker credential helper can be configured but it will not work until this is corrected.
gcloud credential helpers already registered correctly.

Solution:

Install docker-credential-gcloud using

sudo gcloud components install docker-credential-gcr

In case you see this error message:

ERROR: (gcloud.components.install) You cannot perform this action because this Cloud SDK installation is managed by an external package manager.
Please consider using a separate installation of the Cloud SDK created through the default mechanism described at: https://cloud.google.com/sdk/

use this alternate installation command instead (this command is for Linux, see the official documentation for other operating systems):

VERSION=1.5.0
OS=linux
ARCH=amd64

curl -fsSL "https://github.com/GoogleCloudPlatform/docker-credential-gcr/releases/download/v${VERSION}/docker-credential-gcr_${OS}_${ARCH}-${VERSION}.tar.gz" \
  | tar xz --to-stdout ./docker-credential-gcr \
  | sudo tee /usr/bin/docker-credential-gcr > /dev/null && sudo chmod +x /usr/bin/docker-credential-gcr
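
The download URL is assembled from the three variables VERSION, OS and ARCH. If you want to check the resulting URL before downloading (for example when changing the version), this minimal sketch reproduces the same pattern in Python. The function name is hypothetical, and whether a release exists for a given combination of values is not verified here.

```python
def credential_helper_url(version="1.5.0", os_name="linux", arch="amd64"):
    """Build the release download URL using the pattern from the curl command."""
    return (
        "https://github.com/GoogleCloudPlatform/docker-credential-gcr/releases/"
        f"download/v{version}/docker-credential-gcr_{os_name}_{arch}-{version}.tar.gz"
    )


print(credential_helper_url())
```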

After that, configure docker using

docker-credential-gcr configure-docker

Now you can retry running your original command.

For reference, see the official documentation.

Posted by Uli Köhler in Cloud, Container, Docker, Linux

How to fix kubectl ‘The connection to the server localhost:8080 was refused – did you specify the right host or port?’

Problem:

You want to configure a Kubernetes service with kubectl using a command like

kubectl patch service/"my-elasticsearch-svc" --namespace "default"   --patch '{"spec": {"type": "LoadBalancer"}}'

but you only see this error message:

The connection to the server localhost:8080 was refused - did you specify the right host or port?

Solution:

kubectl does not have the correct credentials to access the cluster, so it falls back to its default server address localhost:8080.

Add the correct credentials to the kubectl config using

gcloud container clusters get-credentials [cluster name] --zone [cluster zone]

e.g.

gcloud container clusters get-credentials cluster-1 --zone europe-west3-c

After that, retry your original command.

In case you don’t know your cluster name or zone, use

gcloud container clusters list

to display the cluster metadata.

Credits to this StackOverflow answer for the original solution.

Posted by Uli Köhler in Allgemein, Cloud, Container, Kubernetes

ElasticSearch equivalent to MongoDB .distinct(…)

Let’s say we have an ElasticSearch index called strings with a field pattern of {"type": "keyword"}.

Now we want to do the equivalent of MongoDB db.getCollection('...').distinct('pattern'):

Solution:

In Python you can use the iterate_distinct_field() helper from this previous post on ElasticSearch distinct. Full example:

from elasticsearch import Elasticsearch

es = Elasticsearch()

def iterate_distinct_field(es, fieldname, pagesize=250, **kwargs):
    """
    Helper to get all distinct values from ElasticSearch
    (composite buckets are returned sorted by key,
    not by number of occurrences)
    """
    compositeQuery = {
        "size": pagesize,
        "sources": [{
                fieldname: {
                    "terms": {
                        "field": fieldname
                    }
                }
            }
        ]
    }
    # Iterate over pages
    while True:
        result = es.search(**kwargs, body={
            "aggs": {
                "values": {
                    "composite": compositeQuery
                }
            }
        })
        # Yield each bucket
        for aggregation in result["aggregations"]["values"]["buckets"]:
            yield aggregation
        # Set "after" field
        if "after_key" in result["aggregations"]["values"]:
            compositeQuery["after"] = \
                result["aggregations"]["values"]["after_key"]
        else: # Finished!
            break

# Usage example
for result in iterate_distinct_field(es, fieldname="pattern.keyword", index="strings"):
    print(result) # e.g. {'key': {'pattern.keyword': 'mypattern'}, 'doc_count': 315}

Posted by Uli Köhler in Databases, ElasticSearch, Python

How to query distinct field values in ElasticSearch

Let’s say we have an ElasticSearch index called strings with a field pattern of {"type": "keyword"}.

Get the top N values of the column

If we want to get the top N ( 12 in our example) entries, i.e. the patterns that are present in the most documents, we can use this query:

{
    "aggs" : {
        "patterns" : {
            "terms" : {
                "field" : "pattern.keyword",
                "size": 12
            }
        }
    }
}

Full example in Python:

from elasticsearch import Elasticsearch

es = Elasticsearch()

result = es.search(index="strings", body={
    "aggs" : {
        "patterns" : {
            "terms" : {
                "field" : "pattern.keyword",
                "size": 12
            }
        }
    }
})
for aggregation in result["aggregations"]["patterns"]["buckets"]:
    print(aggregation) # e.g. {'key': 'mypattern', 'doc_count': 2802}

See the terms aggregation documentation for more information.
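
To make clear what the terms aggregation returns, here is a minimal pure-Python sketch of the same idea on an in-memory list of documents: one bucket per distinct value, sorted by document count and limited to size entries. It is only a conceptual illustration with a hypothetical function name; it ignores the shard-level approximations of a real ElasticSearch cluster.

```python
from collections import Counter


def top_values(documents, fieldname, size):
    """Return the `size` most frequent values of fieldname as ES-like buckets."""
    counts = Counter(doc[fieldname] for doc in documents if fieldname in doc)
    return [
        {"key": key, "doc_count": count}
        for key, count in counts.most_common(size)
    ]


docs = [{"pattern": "a"}, {"pattern": "b"}, {"pattern": "a"},
        {"pattern": "c"}, {"pattern": "a"}, {"pattern": "b"}]
print(top_values(docs, "pattern", size=2))
```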

Get all the distinct values of the column

Getting all the values is slightly more complicated since we need to use a composite aggregation that returns an after_key to paginate the query.

This Python helper function will automatically paginate the query with configurable page size:

from elasticsearch import Elasticsearch

es = Elasticsearch()

def iterate_distinct_field(es, fieldname, pagesize=250, **kwargs):
    """
    Helper to get all distinct values from ElasticSearch
    (composite buckets are returned sorted by key,
    not by number of occurrences)
    """
    compositeQuery = {
        "size": pagesize,
        "sources": [{
                fieldname: {
                    "terms": {
                        "field": fieldname
                    }
                }
            }
        ]
    }
    # Iterate over pages
    while True:
        result = es.search(**kwargs, body={
            "aggs": {
                "values": {
                    "composite": compositeQuery
                }
            }
        })
        # Yield each bucket
        for aggregation in result["aggregations"]["values"]["buckets"]:
            yield aggregation
        # Set "after" field
        if "after_key" in result["aggregations"]["values"]:
            compositeQuery["after"] = \
                result["aggregations"]["values"]["after_key"]
        else: # Finished!
            break

# Usage example
for result in iterate_distinct_field(es, fieldname="pattern.keyword", index="strings"):
    print(result) # e.g. {'key': {'pattern.keyword': 'mypattern'}, 'doc_count': 315}
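
To try out the after_key pagination without a running cluster, the loop can be exercised against a small stand-in for es.search(). This is a minimal sketch under stated assumptions: iterate_composite() is a compact variant of the helper that takes a search callable, and make_fake_search() is a hypothetical stub that only mimics the paging behaviour (sorted keys, after_key present while buckets are returned).

```python
def iterate_composite(search, fieldname, pagesize=2):
    """Yield all composite buckets, following after_key between pages."""
    composite = {
        "size": pagesize,
        "sources": [{fieldname: {"terms": {"field": fieldname}}}],
    }
    while True:
        body = {"size": 0, "aggs": {"values": {"composite": composite}}}
        values = search(body)["aggregations"]["values"]
        yield from values["buckets"]
        if "after_key" not in values:
            break  # no more pages
        composite["after"] = values["after_key"]


def make_fake_search(values, fieldname):
    """Return a stand-in for es.search() that mimics composite paging."""
    counts = {}
    for value in values:
        counts[value] = counts.get(value, 0) + 1
    keys = sorted(counts)  # composite buckets are sorted by key

    def search(body):
        composite = body["aggs"]["values"]["composite"]
        after = composite.get("after", {}).get(fieldname)
        remaining = [k for k in keys if after is None or k > after]
        page = remaining[:composite["size"]]
        result = {
            "buckets": [{"key": {fieldname: k}, "doc_count": counts[k]} for k in page]
        }
        if page:  # an empty page carries no after_key, which ends the loop
            result["after_key"] = {fieldname: page[-1]}
        return {"aggregations": {"values": result}}

    return search


search = make_fake_search(["b", "a", "c", "a", "b", "a"], "pattern.keyword")
for bucket in iterate_composite(search, "pattern.keyword"):
    print(bucket)
```

With a page size of 2 and three distinct values, the loop requests three pages: two with buckets and a final empty one that ends the iteration.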

Posted by Uli Köhler in Databases, ElasticSearch, Python

How to fix ElasticSearch ‘Fielddata is disabled on text fields by default’ for keyword field

Problem:

You have a field in ElasticSearch named e.g. pattern of type keyword. However, when you query for an aggregation of this field e.g.

es.search(index="strings", body={
    "size": 0,
    "aggs" : {
        "patterns" : {
            "terms" : { "field" : "pattern" }
        }
    }
})

you see this error message:

elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'Fielddata is disabled on text fields by default. Set fielddata=true on [pattern] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.'

Solution:

This error message is confusing since you already have a keyword field. However, as the ElasticSearch fielddata documentation explains, a string field is often mapped as text with an additional keyword sub-field, so you need to use pattern.keyword in the query instead of just pattern.

Full example:

es.search(index="strings", body={
    "size": 0,
    "aggs" : {
        "patterns" : {
            "terms" : { "field" : "pattern.keyword" }
        }
    }
})
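
If you are not sure which name to use, you can inspect the index mapping. The following is a minimal sketch with a hypothetical helper name; it assumes a typeless (ElasticSearch 7 style) mapping dictionary with a top-level properties key, and the sample shows the mapping that dynamic mapping typically creates for string fields.

```python
def aggregatable_field(mapping, fieldname):
    """Return the field name to use in a terms aggregation."""
    field = mapping["properties"][fieldname]
    if field.get("type") == "keyword":
        return fieldname  # already a keyword field, use it directly
    if field.get("fields", {}).get("keyword", {}).get("type") == "keyword":
        return f"{fieldname}.keyword"  # text field with keyword sub-field
    raise ValueError(f"{fieldname!r} has no keyword variant to aggregate on")


# Typical result of dynamic mapping for a string field
DYNAMIC_MAPPING = {
    "properties": {
        "pattern": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    }
}

print(aggregatable_field(DYNAMIC_MAPPING, "pattern"))
```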

Posted by Uli Köhler in Databases, ElasticSearch

How to fix ModuleNotFoundError: No module named ‘grpc’ in Python

Problem:

You want to run a Python script that is using some Google Cloud services. However you see an error message similar to this:

[...]
  File "/usr/local/lib/python3.6/dist-packages/google/api_core/gapic_v1/__init__.py", line 16, in <module>
    from google.api_core.gapic_v1 import config
  File "/usr/local/lib/python3.6/dist-packages/google/api_core/gapic_v1/config.py", line 23, in <module>
    import grpc
ModuleNotFoundError: No module named 'grpc'

Solution:

Install the grpcio Python module:

sudo pip3 install grpcio

or, for Python 2.x

sudo pip install grpcio
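
Note that the module is imported as grpc while the package is installed as grpcio. To check whether the module is visible to the Python interpreter that runs your script, you can use this minimal sketch (the function name is hypothetical; it only handles top-level module names):

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# An empty list means that grpc can be imported by this interpreter
print(missing_modules(["grpc"]))
```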

Posted by Uli Köhler in Cloud, Linux, Python

ElasticSearch docker-compose.yml and systemd service generator

Just looking for a simple solution with a single ElasticSearch node? See our new post Simple Elasticsearch setup with docker-compose.

New: Now with ElasticSearch 7.13.4

This generator allows you to generate a systemd service file for a docker-compose setup that is automatically restarted if it fails.

Continue reading →

Posted by Uli Köhler in Container, Databases, Docker, ElasticSearch, Generators, Linux