Python

How to fix Elasticsearch Python elasticsearch.exceptions.NotFoundError: NotFoundError(404, ‘index_not_found_exception’, ‘no such index [node]’, node, index_or_alias)

Problem:

You want to update ElasticSearch index settings in Python using code like

es.indices.put_settings(index="node", body={
    "index.mapping.total_fields.limit": 100000
})

but you see an error message like this one:

Traceback (most recent call last):
  File "estest.py", line 11, in <module>
    es.indices.put_settings(index="node", body={
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/client/utils.py", line 168, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/client/indices.py", line 786, in put_settings
    return self.transport.perform_request(
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/transport.py", line 415, in perform_request
    raise e
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/transport.py", line 381, in perform_request
    status, headers_response, data = connection.perform_request(
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/connection/http_urllib3.py", line 277, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/connection/base.py", line 330, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
elasticsearch.exceptions.NotFoundError: NotFoundError(404, 'index_not_found_exception', 'no such index [node]', node, index_or_alias)

Solution:

The index needs to be created first in order to be able to put settings. First, double-check if you spelled the index name correctly! You can see the index name in the exception: no such index [node] means that the index is called node

The direct way to create an index is

es.indices.create("node")

But note that this will fail if the index already exists. In order to work around this issue, I recommend to use the code from our previous post How to create ElasticSearch index if it doesn’t already exist in Python:

def es_create_index_if_not_exists(es, index):
    """Create the given ElasticSearch index and ignore error if it already exists"""
    try:
        es.indices.create(index)
    except elasticsearch.exceptions.RequestError as ex:
        if ex.error == 'resource_already_exists_exception':
            pass # Index already exists. Ignore.
        else: # Other exception - raise it
            raise exnodes

and use the es_create_index_if_not_exists()  function to create your index:

es_create_index_if_not_exists(es, "node") # Creates the "node" index ; doesn't fail if it already exists

 

Posted by Uli Köhler in Databases, ElasticSearch, Python

How to set index setting using Python ElasticSearch client

You can set index settings using the official ElasticSearch python client library by using:

es.indices.put_settings(index="my-index", body={
    # Put your index settings here
    # Example: "index.mapping.total_fields.limit": 100000
})

Full example:

from elasticsearch import Elasticsearch

es = Elasticsearch()

es.indices.put_settings(index="ways", body={
    "index.mapping.total_fields.limit": 100000
})
Posted by Uli Köhler in Databases, ElasticSearch, Python

How to get IndicesClient when using ElasticSearch Python API

When using the official ElasticSearch Python client, you can get the IndicesClient for an Elasticsearch instance by using

es.indices

Full example:

from elasticsearch import Elasticsearch

es = Elasticsearch()

indices_client = es.indices

 

Posted by Uli Köhler in Databases, ElasticSearch, Python

How to increase ElasticSearch total field limit using Python API

When using ElasticSearch, you will sometimes encounter an Limit of total fields [1000] has been exceeded when you insert a large document.

One solution that often works for real-world scenarios is to just increase the default limit of 1000 to, for example, 100000 to account for even the largest documents.

How this can be done is, however, not well-documented for the ElasticSearch Python API. Here’s how you can do it:

from elasticsearch import Elasticsearch

es = Elasticsearch()

es.indices.put_settings(index="my-index", body={
    "index.mapping.total_fields.limit": 100000
})

 

Posted by Uli Köhler in Databases, ElasticSearch, Python

Faster Python Elasticsearch index() by using concurrent.futures ThreadPoolExecutor

In our previous post Elasticsearch Python minimal index() / insert example we showed how to insert a document into Elasticsearch.

When inserting a large number of documents into Elasticsearch, you will notice that it’s extremely slow to wait for the API call to finish before trying to insert the document.

In this post we’ll show a simple way of doing many requests in parallel so multiple index operations are running concurrently while your code is processing more documents. For this, we’ll use concurrent.futures.ThreadPoolExecutor and – after inserting all documents into the queue, use concurrent.futures.wait to wait for all requests to finish before we’ll exit.

#!/usr/bin/env python3
from elasticsearch import Elasticsearch
from concurrent.futures import ThreadPoolExecutor
import concurrent.futures

index_executor = ThreadPoolExecutor(64)
futures = []

es = Elasticsearch()
for i in range(1000):
    future = index_executor.submit(es.index, index="test-index", id=i, body={"test": 123})
    futures.append(future)

print("Waiting for requests to complete...")
concurrent.futures.wait(futures)

 

Posted by Uli Köhler in Databases, ElasticSearch, Python

Elasticsearch Python minimal index() / insert example

This minimal example inserts a single document into Elasticsearch running at http://localhost:9200:

#!/usr/bin/env python3
from elasticsearch import Elasticsearch

es = Elasticsearch()
es.index(index="test-index", id=1, body={"test": 123})

 

Posted by Uli Köhler in Databases, ElasticSearch, Python

How to use Cloudflare API key instead of token in Python Cloudflare API

Problem:

You want to access Cloudflare using the Cloudflare Python API like this:

#!/usr/bin/env python3
import CloudFlare
cf = CloudFlare.CloudFlare(
    email="[email protected]",
    token="Oochee3_aucho0aiTahc8caVuak6Que_N_Aegi9o"
)
# ...

but when you try to use the key=... argument like this:

cf = CloudFlare.CloudFlare(
    email="[email protected]",
    key="Oochee3_aucho0aiTahc8caVuak6Que_N_Aegi9o"
)

you see this error message:

Traceback (most recent call last):
  File "run.py", line 4, in <module>
    cf = CloudFlare.CloudFlare(
TypeError: __init__() got an unexpected keyword argument 'key'

Solution:

Just use the key in the token=... argument like this:

cf = CloudFlare.CloudFlare(
    email="[email protected]",
    token="[YOUR API KEY]"
)

This usage is officially documented in the README section of the Cloudflare API.

Posted by Uli Köhler in Networking, Python

How to fix Python Cloudflare CloudFlare.exceptions.CloudFlareAPIError: no token defined

Problem:

You want to run a program using the Cloudflare API, e.g. this example code:

#!/usr/bin/env python3
import CloudFlare

cf = CloudFlare.CloudFlare({
    "email": "[email protected]",
    "token": "Oochee3_aucho0aiTahc8caVuak6Que_N_Aegi9o" 
})
zones = cf.zones.get()
for zone in zones:
    zone_id = zone['id']
    zone_name = zone['name']
    print(zone_id, zone_name)

But when trying to run it, you see the following error message:

Traceback (most recent call last):
  File "run.py", line 8, in 
    zones = cf.zones.get()
  File "/usr/local/lib/python3.8/dist-packages/CloudFlare/cloudflare.py", line 672, in get
    return self._base.call_with_auth('GET', self._parts,
  File "/usr/local/lib/python3.8/dist-packages/CloudFlare/cloudflare.py", line 117, in call_with_auth
    self._AddAuthHeaders(headers, method)
  File "/usr/local/lib/python3.8/dist-packages/CloudFlare/cloudflare.py", line 90, in _AddAuthHeaders
    raise CloudFlareAPIError(0, 'no token defined')
CloudFlare.exceptions.CloudFlareAPIError: no token defined

Solution:

You are using the wrong syntax to give arguments to CloudFlare.CloudFlare(), use email=… and token=… arguments directly instead of using a dict!

cf = CloudFlare.CloudFlare(
    email="[email protected]",
    token="Oochee3_aucho0aiTahc8caVuak6Que_N_Aegi9o"
)

Note that you can’t do all operations with all tokens and if you perform an operation that is not possible with your token, you’ll see an error message like CloudFlare.exceptions.CloudFlareAPIError: Invalid request headers

Posted by Uli Köhler in Networking, Python

How to fix Python Cloudflare CloudFlare.exceptions.CloudFlareAPIError: no email and no token defined

Problem:

You want to run a program using the Cloudflare API, e.g. this example code:

#!/usr/bin/env python3
import CloudFlare

cf = CloudFlare.CloudFlare()
zones = cf.zones.get()
for zone in zones:
    zone_id = zone['id']
    zone_name = zone['name']
    print(zone_id, zone_name)

But when trying to run it, you see the following error message:

Traceback (most recent call last):
  File "test-cloudflare-api.py", line 5, in <module>
    zones = cf.zones.get()
  File "/usr/local/lib/python3.8/dist-packages/CloudFlare/cloudflare.py", line 672, in get
    return self._base.call_with_auth('GET', self._parts,
  File "/usr/local/lib/python3.8/dist-packages/CloudFlare/cloudflare.py", line 117, in call_with_auth
    self._AddAuthHeaders(headers, method)
  File "/usr/local/lib/python3.8/dist-packages/CloudFlare/cloudflare.py", line 88, in _AddAuthHeaders
    raise CloudFlareAPIError(0, 'no email and no token defined')
CloudFlare.exceptions.CloudFlareAPIError: no email and no token defined

Solution:

The Cloudflare API is missing the credentials you use to login. The easiest way to call the API with credentials is to initialize CloudFlare.CloudFlare() with the email and token as arguments

cf = CloudFlare.CloudFlare(
    email="[email protected]",
    token="Oochee3_aucho0aiTahc8caVuak6Que_N_Aegi9o"
)

Note that you can’t do all operations with all tokens and if you perform an operation that is not possible with your token, you’ll see an error message like CloudFlare.exceptions.CloudFlareAPIError: Invalid request headers

Posted by Uli Köhler in Networking, Python

How to filter OpenStreetmap nodes by tag from .osm.pbf file using Python & Osmium

In our previous post Minimal example how to read .osm.pbf file using Python & osmium we investigated how to read a .osm.pbf file and count all nodes, ways and relations.

Today, we’ll investigate how to filter for specific nodes by tag using osmium working on .osm.pbf files. In this example, we’ll filter for power: tower and count how many nodes we can find

#!/usr/bin/env python3
import osmium as osm

class FindPowerTowerHandler(osm.SimpleHandler):
    def __init__(self):
        osm.SimpleHandler.__init__(self)
        self.count = 0

    def node(self, node):
        if node.tags.get("power") == "tower":
            self.count += 1

osmhandler = FindPowerTowerHandler()
osmhandler.apply_file("germany-latest.osm.pbf")

print(f'Number of power=tower nodes: {osmhandler.node_count}')

Also see How to filter OpenStreetmap ways by tag from .osm.pbf file using Python & Osmium

Posted by Uli Köhler in Geography, OpenStreetMap, Python

How to read single value from XLS(X) using pandas

pandas can be used conveniently to read a table of values from Excel. When extracting data from real-life Excel sheets, there are often metadata fields which are not structured as a table readable by pandas.

Reading the pandas docs, it is not obvious how we can extract the value of a single cell (without any associated headers) with a fixed position, for example:

In this example, we want to extract cell C3, that is we want to end up with a string of value Test value #123. Since there is no clear table structure in this excel sheet (and other cells might contain other values we are not interested in – for example, headlines or headers), we don’t want to have a pd.DataFrame but simply a string.

This is how you can do it:

 

def read_value_from_excel(filename, column="C", row=3):
    """Read a single cell value from an Excel file"""
    return pd.read_excel(filename, skiprows=row - 1, usecols=column, nrows=1, header=None, names=["Value"]).iloc[0]["Value"]

# Example usage
read_value_from_excel(“Test.xlsx”, “C”, 3) # Prints
Let’s explain the parameters we’re using as arguments to pd.read_excel():

  • Test.xlsx: The filename of the file you want to read
  • skiprows=2: Row number minus one, so the desired row is the first we read
  • usecols="C": Which columns we’re interested in – only one!
  • nrows=1: Read only a single row
  • header=None: Do not assume that the first row we read is a header row
  • names=["Value"]: Set the name for the single column to Value
  • .iloc[0]: From the resulting pd.DataFrame, get the first row ([0]) by index (iloc)
  • ["Value"] From the resulting row, extract the "Value" column – which is the only column available.

In my opinion, using pandas is the best way of extracting for most real-world usecases (i.e. more focused on development speed than on execution speed) because not only does it provide automatic engine selection for .xls and .xlsx files, it’s also present on most Data Science setups anyway and provides a standardized API.

Posted by Uli Köhler in pandas, Python

How to fix pandas pd.read_excel() error XLRDError: Excel xlsx file; not supported

Problem:

When trying to read an .xlsx file using pandas pd.read_excel() you see this error message:

XLRDError: Excel xlsx file; not supported

Solution:

The xlrd library only supports .xls files, not .xlsx files. In order to make pandas able to read .xlsx files, install openpyxl:

sudo pip3 install openpyxl

After that, retry running your script (if you are running a Jupyter Notebook, be sure to restart the notebook to reload pandas!).

If the error still persists, you have two choices:

Choice 1 (preferred): Update pandas

Pandas 1.1.3 doesn’t automatically select the correct XLSX reader engine, but pandas 1.3.1 does:

sudo pip3 install --upgrade pandas

If you are running a Jupyter Notebook, be sure to restart the notebook to load the updated pandas version!

Choice 2: Explicitly set the engine in pd.read_excel()

Add engine='openpyxl' to your pd.read_excel() command, for example:

pd.read_excel('my.xlsx', engine='openpyxl')

 

Posted by Uli Köhler in pandas, Python

How to compute crystal load capacitors using Python

The UliEngineering library provides a convenient way to compute which load capacitors are suitable for a given crystal oscillator circuit.

First, install UliEngineering using

sudo pip3 install UliEngineering

Now you need to find out the following parameters:

  • The load capacitance from the crystal’s datasheet. Typical values would be 7pF or 12.5pF or 20pF
  • The pin capacitance CPin from the datasheet of the IC your crystal is connected to (e.g. your Ethernet PHY or your RTC). Typical values would be between 0.5pF to 5pF
  • Estimate the stray capacitance CStray from the board (that is the capacitance of each crystal oscillator pin to the PCB ground). You typically just take an educated guess here, so here are some values to start at:
    •  Start at 3pF
    • If you have a 4+ layer board (as opposed to a two layer board), add 2pF (since the distance to the ground plane below is much smaller on 4 layer boards
    • If you have long traces >1cm (long traces to the crystal should be avoided at all costs!), add 4pF for each cm of trace beyond 1cm for 2-layer boards or add 6pF for each cm of trace beyond 1cm for 4+ layer boards

Now we’ll plug those values into UliEngineering and compute the load capacitance:

from UliEngineering.Electronics.Crystal import load_capacitors
from UliEngineering.EngineerIO import auto_print

auto_print(load_capacitors, cload="9 pF", cpin="1 pF", cstray="5 pF")

This will print 7.00 pF

If you just want to the the value and not an auto-formatted string, use load_capacitors() directly:

capacitor = load_capacitors(cload="9 pF", cpin="1 pF", cstray="5 pF")

In this example, we’ll get capacitor == 7e-12.

Note that 7pF means that you have to add a 7pF capacitor at each side of the crystal. I recommend to use only NP0/C0G capacitors here since X5R/X7R capacitors and other high-k-dielectric ceramic capacitors not only have a high temperature coefficient but also they are not as suitable for analog applications since their capacitance will change with the voltage being applied.

Note that we used a lot of guesswork in the PCB stray capacitance estimation above, so if you need high accuracy, you need to tune your crystal oscillator to work best with your specific board. See our followup post on How to tune your crystal oscillator to get the best possible frequency accuracy for further reading.

Posted by Uli Köhler in Electronics, Python

How to add pandas pd.Timestamp

You can’t directly add Pandas pd.Timestamp instances:

t1 = pd.Timestamp('now')
t2 = pd.Timestamp('now')

t1 + t2
# TypeError: unsupported operand type(s) for +: 'Timestamp' and 'Timestamp'

But you can convert them to a numpy timestamp using their asm8 attribute, convert that timestamp to an integer, add it and convert it back:

t1 = pd.Timestamp('now')
t2 = pd.Timestamp('now')

tsum = (t1.asm8.astype(np.int64) + t2.asm8.astype(np.int64))
tsum_timestamp = pd.Timestamp(tsum.astype('<M8[ns]'))

 

Posted by Uli Köhler in pandas, Python

How to get average or mean between two pandas pd.Timestamp objects?

In order to compute the mean value between two pd.Timestamp instances, subtract them to obtain a pd.Timedelta and then add said Timedelta object to the first (smaller) timestamp:

t1 = pd.Timestamp('now')
t2 = pd.Timestamp('now')

mean_timestamp = t1 + ((t2 - t1) / 2)

 

Posted by Uli Köhler in pandas, Python

How to display BytesIO containing a PNG image in Jupyter/IPython

In IPython, you can use the Image() function to show an image:

 

from IPython.display import Image
Image(filename="img1.png")

But what if you don’t have the PNG data in a file but in a BytesIO?

Use .getvalue():

from IPython.display import Image

my_bio = ... # Insert your code here
Image(my_bio.getvalue())

 

 

Posted by Uli Köhler in Python

How to export certificates from Traefik certificate store

Traefik stores certificates as base64 encoded X.509 certificates and keys inside the certificate store.

This is a python script to export certificate from Traefik certificate store .json file:

import json
import base64

# Read Traefik ACME JSON
with open("acme.json") as acme_file:
    acme = json.load(acme_file)

# Select certificates from a specific resolver
resolver_name = "my-resolver"
certificates = acme[resolver_name]["Certificates"]

# Find the specific certificate we are looking for
certificate = [certificate for certificate in certificates if "myddomain.com" in certificate["domain"].get("sans", [])][0]

# Extract X.509 certificate data
certificate_data = base64.b64decode(certificate["certificate"])
key_data = base64.b64decode(certificate["key"])

# Export certificate and key to file
with open("certificate.pem", "wb") as certfile:
    certfile.write(certificate_data)

with open("key.pem", "wb") as keyfile:
    keyfile.write(key_data)

Note that depending on what is the primary name for your certificate, you might need to use

if "myddomain.com" == certificate["domain"]["main"]

instead of

if "myddomain.com" in certificate["domain"].get("sans", [])

 

Posted by Uli Köhler in Python, Traefik

How to filter OpenStreetmap ways by tag from .osm.pbf file using Python & Osmium

In our previous post Minimal example how to read .osm.pbf file using Python & osmium we investigated how to read a .osm.pbf file and count all nodes, ways and relations.

Today, we’ll investigate how to filter for specific ways by tag using osmium working on .osm.pbf files. In this example, we’ll filter by highwaytrunk

#!/usr/bin/env python3
import osmium as osm
from collections import namedtuple
import itertools

Way = namedtuple("Way", ["id", "nodes", "tags"])

class FindHighwayTrunks(osm.SimpleHandler):
    """
    Find ways with "highway: trunk" tag
    """

    def __init__(self):
        osm.SimpleHandler.__init__(self)
        self.ways = []

    def way(self, way):
        # If this way is a highway: trunk, ...
        if way.tags.get("highway") == "trunk":
            # ... add it to self.ways
            # NOTE: We may not keep a reference to the way object,
            # so we have to create a new Way object
            nodes = [node.ref for node in way.nodes]
            self.ways.append(Way(way.id, nodes, dict(way.tags)))

# Find ways with the given tag
way_finder = FindHighwayTrunks()
way_finder.apply_file("kenya-latest.osm.pbf")

print(f"Found {len(way_finder.ways)} ways")

This example uses kenya-latest.osm.pbf downloaded from Geofabrik, but you can use any .osm.pbf file. Parsing said file takes about 10 seconds on my Desktop.

Note that we can’t just self.ways.append(way) because way is a temporary/internal Protobuf object and may not be used directly since. Hence, we create our own object that just wraps the extracted contents of way: Its id, its list of nodes (node IDs) and a dictionary containing the tags. Otherwise, you’ll see this error message:

Traceback (most recent call last):
  File "run.py", line 32, in <module>
    trunk_handler.apply_file("kenya-latest.osm.pbf")
RuntimeError: Way callback keeps reference to OSM object. This is not allowed.
Posted by Uli Köhler in Geography, Geoinformatics, OpenStreetMap, Python

How to install cartopy on Ubuntu

First, you need to install libproj (which is a dependency of cartopy) using

sudo apt -y install libproj-dev libgeos-dev

After that, install cartopy using

sudo pip3 install cartopy

 

Posted by Uli Köhler in Cartopy, Geoinformatics, Python

How fix Cartopy pip install error ‘Proj 4.9.0 must be installed.’

Problem:

When trying to install cartopy using e.g. pip3 install cartopy you see this error message:

Collecting cartopy
  Downloading Cartopy-0.19.0.post1.tar.gz (12.1 MB)
     |████████████████████████████████| 12.1 MB 13.2 MB/s 
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python3 /tmp/tmp2q7tvpo8 get_requires_for_build_wheel /tmp/tmpfwd5htse
       cwd: /tmp/pip-install-dg2i313t/cartopy
  Complete output (1 lines):
  Proj 4.9.0 must be installed.
  ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python3 /tmp/tmp2q7tvpo8 get_requires_for_build_wheel /tmp/tmpfwd5htse Check the logs for full command output.

Solution:

Install proj using e.g.

sudo apt -y install libproj-dev

and then install cartopy again using e.g.

sudo pip3 install cartopy

 

Posted by Uli Köhler in Cartopy, Geoinformatics, Python