Python

Computing the CRC8-ATM CRC in Python

The 8-bit CRC8-ATM polynomial is used in many embedded applications, including Trinamic UART-controlled stepper motor drivers like the TMC2209:

\text{CRC} = x^8 + x^2 + x^1 + x^0

The following code provides an example of how to compute this type of CRC in Python:

def compute_crc8_atm(datagram, initial_value=0):
    crc = initial_value
    # Iterate bytes in data
    for byte in datagram:
        # Iterate bits in byte
        for _ in range(0, 8):
            if (crc >> 7) ^ (byte & 0x01):
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
            # Shift to next bit
            byte = byte >> 1
    return crc

This code has been field-verified for the TMC2209.
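As a quick sanity check, here is a minimal usage sketch; the datagram bytes are made up for illustration and do not represent a real TMC2209 register access:

# Hypothetical example datagram (all bytes of the message except the final CRC byte)
datagram = bytes([0x05, 0x00, 0x06, 0x0D])
print(hex(compute_crc8_atm(datagram))) # Append this CRC as the last byte of the datagram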

Posted by Uli Köhler in Algorithms, Embedded, MicroPython, Python

MicroPython ESP32 minimal UART example

This example shows how to use UART on the ESP32 using MicroPython. We use UART1, which is mapped to pins GPIO9 (RX) and GPIO10 (TX) by default.

from machine import UART
uart = UART(1, 115200) # 1st argument: UART number: Hardware UART #1

# Write
uart.write("test")

# Read
print(uart.read()) # Read as much data as available
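If the default pins don’t fit your wiring, the ESP32 port of machine.UART also accepts explicit tx/rx arguments. A minimal sketch; the pin numbers are just examples:

from machine import UART
uart = UART(1, baudrate=115200, tx=17, rx=16) # Adjust tx/rx to your wiring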

Don’t know how to upload the file to MicroPython so it is automatically run on boot? See How to upload files to MicroPython over USB/serial.

Posted by Uli Köhler in Embedded, MicroPython, Python

How to fix ESP32 MicroPython ‘ValueError: pin can only be input’

Problem:

You are trying to initialize an ESP32 pin in MicroPython using

import machine
machine.Pin(34, machine.Pin.OUT)

but you see an error message like

>>> machine.Pin(34, machine.Pin.OUT)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: pin can only be input

Solution:

On the ESP32, pins with numbers >= 34 are input-only pins!

You need to use other pins < 34 if you need output capability!
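A minimal sketch of both options:

import machine
sensor_in = machine.Pin(34, machine.Pin.IN) # GPIO >= 34: input-only, this works
led_out = machine.Pin(2, machine.Pin.OUT)   # GPIO < 34: output capability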

For reference, see the relevant MicroPython source code section:

// configure mode
if (args[ARG_mode].u_obj != mp_const_none) {
    mp_int_t pin_io_mode = mp_obj_get_int(args[ARG_mode].u_obj);
    if (self->id >= 34 && (pin_io_mode & GPIO_MODE_DEF_OUTPUT)) {
        mp_raise_ValueError("pin can only be input");
    } else {
        gpio_set_direction(self->id, pin_io_mode);
    }
}
Posted by Uli Köhler in Embedded, MicroPython, Python

How to upload files to MicroPython over USB/serial

In this post we will investigate how to upload a file to the MicroPython filesystem so it is automatically run on boot.

First, install ampy – a tool to modify the MicroPython filesystem over a serial connection.

sudo pip3 install adafruit-ampy

Now prepare your script – we’ll use main.py in this example.

Upload the file to the board:

ampy -p /dev/ttyUSB* put main.py

This only takes about 2-4 seconds. In case ampy is still running after 10 seconds, you might need to

  • Stop ampy (Ctrl+C), reset the board using the RESET button and retry the command
  • Stop ampy (Ctrl+C). Detach USB and ensure the board is powered off (and not powered externally). Re-Attach USB and retry the command.
  • In case that doesn’t help, try re-flashing your board with the most recent version of MicroPython. See How to flash MicroPython to your ESP32 board in 30 seconds. This will also clear the internal filesystem and hence remove any file that might cause failure to boot properly.
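Once the upload has succeeded, you can verify that the file is actually present on the board’s filesystem:

ampy -p /dev/ttyUSB0 ls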
Posted by Uli Köhler in Embedded, MicroPython, Python

MicroPython ESP32 blink example

This MicroPython code blinks GPIO2, which is connected to the onboard LED on most ESP32 boards.

import machine
import time
led = machine.Pin(2, machine.Pin.OUT)
while True:
    led.value(1)
    time.sleep(1)
    led.value(0)
    time.sleep(1)

Don’t know how to upload the file to MicroPython so it is automatically run on boot? See How to upload files to MicroPython over USB/serial.

Posted by Uli Köhler in Embedded, MicroPython, Python

How to run WebREPL without webrepl_setup in MicroPython

Problem:

You want to enable WebREPL on your MicroPython board using

import webrepl
webrepl.start()

but it is only showing this error message:

WebREPL is not configured, run 'import webrepl_setup'

However, you want to configure WebREPL programmatically instead of manually running it on every single board.

Solution:

Use

import webrepl
webrepl.start(password="Rua8ohje")

This will circumvent webrepl_setup completely and is compatible with an automated setup process.
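For a fully automated setup you can, for example, combine this with wifi autoconnection in your boot.py. A sketch; the wifi credentials are placeholders:

# boot.py
import network
station = network.WLAN(network.STA_IF)
station.active(True)
station.connect("YourWifiName", "EnterYourWifiPasswordHere")

import webrepl
webrepl.start(password="Rua8ohje") # Max. 8 characters, see the note below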

Note: At the time of writing this you can only use passwords with 8 characters max! (see How to fix MicroPython WebREPL ValueError in File “webrepl.py”, line 72, in start)

Posted by Uli Köhler in Embedded, MicroPython, Python

How to fix MicroPython WebREPL ValueError in File “webrepl.py”, line 72, in start

Problem:

You want to configure your MicroPython WebREPL programmatically using webrepl.start(password="...") but you see a stacktrace like

>>> webrepl.start(password="Rua8ohjedo")
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "webrepl.py", line 72, in start
ValueError:

Solution:

Use a shorter password with 8 characters max:

webrepl.start(password="Rua8ohje")

Posted by Uli Köhler in Embedded, MicroPython, Python

How to autoconnect to Wifi using MicroPython on your ESP32 board

In this post we will investigate how to connect to a wireless network on boot

First, install ampy – a tool to modify the MicroPython filesystem over a serial connection.

sudo pip3 install adafruit-ampy

Now save the following as main.py in your current working directory and insert your wifi credentials:

import network
station = network.WLAN(network.STA_IF)
station.active(True)
station.connect("YourWifiName", "EnterYourWifiPasswordHere")
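Optionally you can extend main.py to wait until the connection has been established and print the IP address; a small sketch using the standard network API:

# Optional: block until the connection is established
while not station.isconnected():
    pass
print("Connected, IP address:", station.ifconfig()[0])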

Upload the file to the board:

ampy -p /dev/ttyUSB* put main.py

In case ampy shows no output within 5 seconds, try resetting the board, waiting for 5-10 seconds and retrying the upload using ampy.

Note: You can list the files on the board’s filesystem using

ampy -p /dev/ttyUSB0 ls

You can verify the content of main.py using

ampy -p /dev/ttyUSB0 get main.py
Posted by Uli Köhler in Embedded, MicroPython, Python

How to read IDF diabetes statistics in Python using Pandas

The International Diabetes Federation provides a data portal with various statistics related to diabetes.

In this post we’ll show how to read the Diabetes estimates (20-79 y) / People with diabetes, in 1,000s data export in CSV format using pandas.

First download IDF (people-with-diabetes--in-1-000s).csv from the data page.

Now we can parse the CSV file:

import pandas as pd

# Download at https://www.diabetesatlas.org/data/en/indicators/1/
df = pd.read_csv("IDF (people-with-diabetes--in-1-000s).csv")
# Parse year columns to obtain floats and multiply by thousands factor. Pandas fails to parse values like "12,345.67"
for column in df.columns:
    try:
        int(column)
        df[column] = df[column].apply(lambda s: None if s == "-" else float(s.replace(",", "")) * 1000)
    except:
        pass

As you can see in the postprocessing step, the numbers of diabetes patients are given in 1,000s in the CSV, so we multiply them by 1000 to obtain the actual numbers.

If you want to modify the data columns (i.e. the columns referring to year), you can use this simple template:

for column in df.columns:
    try:
        int(column) # Will raise ValueError() if column is not a year number
        # Whatever you do here will only be applied to year columns
        df[column] = df[column] * 0.75 # Example on how to modify a column
        # But note that if your code raises an Exception, it will be ignored!
    except:
        pass

Let’s plot some data:

regions = df[df["Type"] == "Region"] # Only regions, not individual countries

from matplotlib import pyplot as plt
plt.style.use("ggplot")
plt.gcf().set_size_inches(20,4)
plt.ylabel("Diabetes patients [millions]")
plt.xlabel("Region")
plt.title("Diabetes patients in 2019 by region")
plt.bar(regions["Country/Territory"], regions["2019"] / 1e6)

Note that if you use a more recent dataset than the version I’m using, the 2019 column might not exist in your CSV file. Choose an appropriate column in that case.
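If you don’t want to hardcode the year, a small sketch for picking the most recent year column programmatically, assuming that the year columns are the only purely numeric column names:

year_columns = [column for column in df.columns if column.isdigit()]
most_recent_year = max(year_columns, key=int)
plt.bar(regions["Country/Territory"], regions[most_recent_year] / 1e6)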

Posted by Uli Köhler in Bioinformatics, Python

Parsing World Population Prospects (WPP) XLSX data in Python

The United Nations provides the World Population Prospects (WPP) dataset on the geographic and age distribution of the world population as downloadable XLSX files.

Reading these files in Python is rather easy. First we have to find out how many rows to skip. For the 2019 WPP dataset this value is 16 since row 17 contains all the column headers. The number of rows to skip might be different depending on the dataset. We’re using WPP2019_POP_F07_1_POPULATION_BY_AGE_BOTH_SEXES.xlsx in this example.

We can use Pandas read_excel() function to import the dataset in Python:

import pandas as pd

df = pd.read_excel("WPP2019_POP_F07_1_POPULATION_BY_AGE_BOTH_SEXES.xlsx", skiprows=16, na_values=["..."])

This will take a few seconds until the large dataset has been processed. Now we can check whether skiprows=16 is the correct value: it is correct if pandas recognized the column names correctly:

>>> df.columns
Index(['Index', 'Variant', 'Region, subregion, country or area *', 'Notes',
       'Country code', 'Type', 'Parent code', 'Reference date (as of 1 July)',
       '0-4', '5-9', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39',
       '40-44', '45-49', '50-54', '55-59', '60-64', '65-69', '70-74', '75-79',
       '80-84', '85-89', '90-94', '95-99', '100+'],
      dtype='object')

Now let’s filter for a country:

russia = df[df["Region, subregion, country or area *"] == 'Russian Federation']

This will show us the population data for multiple years in 5-year intervals from 1950 to 2020. Now let’s filter for the most recent year:

russia.loc[russia["Reference date (as of 1 July)"].idxmax()]

This will show us a single dataset:

Index                                                 3255
Variant                                          Estimates
Region, subregion, country or area *    Russian Federation
Notes                                                  NaN
Country code                                           643
Type                                          Country/Area
Parent code                                            923
Reference date (as of 1 July)                         2020
0-4                                                9271.69
5-9                                                9350.92
10-14                                              8174.26
15-19                                              7081.77
20-24                                               6614.7
25-29                                              8993.09
30-34                                              12543.8
35-39                                              11924.7
40-44                                              10604.6
45-49                                              9770.68
50-54                                              8479.65
55-59                                                10418
60-64                                              10073.6
65-69                                              8427.75
70-74                                              5390.38
75-79                                              3159.34
80-84                                              3485.78
85-89                                              1389.64
90-94                                              668.338
95-99                                              102.243
100+                                                 9.407
Name: 3254, dtype: object

How can we plot that data? First, we need to select all the columns that contain age data. We’ll do this by manually inserting the name of the first such column (0-4) into the following code and assuming that there are no columns after the last age column:

>>> df.columns[df.columns.get_loc("0-4"):]
Index(['0-4', '5-9', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39',
       '40-44', '45-49', '50-54', '55-59', '60-64', '65-69', '70-74', '75-79',
       '80-84', '85-89', '90-94', '95-99', '100+'],
      dtype='object')

Now let’s select those columns from the russia dataset:

most_recent_russia = russia.loc[russia["Reference date (as of 1 July)"].idxmax()]
age_columns = df.columns[df.columns.get_loc("0-4"):]

russian_age_data = most_recent_russia[age_columns]

Let’s have a look at the dataset:

>>> russian_age_data
0-4      9271.69
5-9      9350.92
10-14    8174.26
15-19    7081.77
20-24     6614.7
25-29    8993.09
30-34    12543.8
35-39    11924.7
40-44    10604.6
45-49    9770.68
50-54    8479.65
55-59      10418
60-64    10073.6
65-69    8427.75
70-74    5390.38
75-79    3159.34
80-84    3485.78
85-89    1389.64
90-94    668.338
95-99    102.243
100+       9.407

That looks usable; note however that the values are given in thousands, i.e. we have to multiply them by 1000 to obtain the actual population estimates. Let’s plot it:

from matplotlib import pyplot as plt
plt.style.use("ggplot")

plt.title("Age composition of the Russian population (2020)")
plt.ylabel("People in age group [Millions]")
plt.xlabel("Age group")
plt.gcf().set_size_inches(15,5)
# Data is given in thousands => divide by 1000 to obtain millions
plt.plot(russian_age_data.index, russian_age_data.to_numpy() / 1000., lw=3)

The finished plot shows the age composition of the Russian population in 2020.

Here’s our finished script:

#!/usr/bin/env python3
import pandas as pd
df = pd.read_excel("WPP2019_POP_F07_1_POPULATION_BY_AGE_BOTH_SEXES.xlsx", skiprows=16, na_values=["..."])
# Filter only russia
russia = df[df["Region, subregion, country or area *"] == 'Russian Federation']

# Filter only most recent estimate (1 row)
most_recent_russia = russia.loc[russia["Reference date (as of 1 July)"].idxmax()]
# Retain only value columns
age_columns = df.columns[df.columns.get_loc("0-4"):]
russian_age_data = most_recent_russia[age_columns]

# Plot!
from matplotlib import pyplot as plt
plt.style.use("ggplot")

plt.title("Age composition of the Russian population (2020)")
plt.ylabel("People in age group [Millions]")
plt.xlabel("Age group")
plt.gcf().set_size_inches(15,5)
# Data is given in thousands => divide by 1000 to obtain millions
plt.plot(russian_age_data.index, russian_age_data.to_numpy() / 1000., lw=3)

# Export as SVG
plt.savefig("russian-demographics.svg")

Posted by Uli Köhler in Bioinformatics, Data science, Python

How to save Matplotlib plot to string as SVG

You can use StringIO to save a Matplotlib plot to a string without saving it to an intermediate file:

from matplotlib import pyplot as plt
plt.plot([0, 1], [2, 3]) # Just a minimal showcase

# Save plot to StringIO
from io import StringIO
i = StringIO()
plt.savefig(i, format="svg")

# How to access the string
print(i.getvalue())

Note that unless you save to a file, you need to set the format=... parameter when calling plt.savefig(). When saving to a file, Matplotlib will try to derive the format from the filename extension (like .svg).
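For binary formats like PNG the same approach works with BytesIO; a minimal sketch:

from io import BytesIO
buf = BytesIO()
plt.savefig(buf, format="png")
png_bytes = buf.getvalue() # Raw PNG file content as bytes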

Posted by Uli Köhler in Python

How to map MeSH ID to MeSH term using Python

Our MeSH-JSON project provides a pre-compiled MeSH-ID-to-term map as JSON:

Download here or use this command to download:

wget "http://techoverflow.net/downloads/mesh.json.gz"

How to use in Python:

#!/usr/bin/env python3
import json
import gzip

# How to load
with gzip.open("mesh.json.gz", "rb") as infile:
    mesh_id_to_term = json.load(infile)

# Usage example
print(mesh_id_to_term["D059630"]) # Prints 'Mesenchymal Stem Cells'
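If you need the reverse direction (term to MeSH ID), you can simply invert the dictionary. A sketch, assuming that each term maps back to a unique ID:

term_to_mesh_id = {term: mesh_id for mesh_id, term in mesh_id_to_term.items()}
print(term_to_mesh_id["Mesenchymal Stem Cells"]) # Prints 'D059630'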

Posted by Uli Köhler in Bioinformatics, Python

How to parse all PubMed baseline files in parallel using Python

In our previous post How to parse PubMed baseline data using Python we investigated how to use the pubmed_parser library to parse PubMed medline data using Python.

In this follow-up we’ll provide an example of how to use glob to select all PubMed baseline files in a directory and how to use concurrent.futures with tqdm to provide convenient yet easy-to-use process parallelism using ProcessPoolExecutor and a progress bar UI for the command line.

First, install the requirements using

pip install git+https://github.com/titipata/pubmed_parser.git six numpy tqdm

Now download this script, ensure some files like pubmed20n0002.xml.gz or pubmed20n0004.xml.gz are in the same directory and run it:

#!/usr/bin/env python3
import pubmed_parser as pp
import glob
import os
from collections import Counter
import concurrent.futures
from tqdm import tqdm

# Source: https://techoverflow.net/2017/05/18/how-to-use-concurrent-futures-map-with-a-tqdm-progress-bar/
def tqdm_parallel_map(executor, fn, *iterables, **kwargs):
    """
    Equivalent to executor.map(fn, *iterables),
    but displays a tqdm-based progress bar.
    
    Does not support timeout or chunksize as executor.submit is used internally
    
    **kwargs is passed to tqdm.
    """
    futures_list = []
    for iterable in iterables:
        futures_list += [executor.submit(fn, i) for i in iterable]
    for f in tqdm(concurrent.futures.as_completed(futures_list), total=len(futures_list), **kwargs):
        yield f.result()

def parse_and_process_file(filename):
    """
    This function contains our parsing code. Usually, you would only modify this function.
    """
    # Don't parse authors and references for this example, since we don't need it
    dat = pp.parse_medline_xml(filename, author_list=False, reference_list=False)

    # For this example, we'll build a set of count of all MeSH IDs in this file
    ctr = Counter()
    for entry in dat:
        terms = [term.partition(":")[0].strip() for term in entry["mesh_terms"].split(";")]
        for term in terms:
            ctr[term] += 1
    return filename, ctr


if __name__ == "__main__":
    # Find all pubmed files in the current directory
    all_filenames = glob.glob("pubmed*.xml.gz")
    # For some workloads you might want to use a ThreadPoolExecutor,
    # but a ProcessPoolExecutor is a good default
    executor = concurrent.futures.ProcessPoolExecutor(os.cpu_count())
    # Iterate results as they come in (the order is not the same as in the input!)
    for filename, ctr in tqdm_parallel_map(executor, parse_and_process_file, all_filenames):
        # NOTE: If you print() here, this might interfere with the progress bar,
        # but we accept that here since it's just an example
        print(filename, ctr)

Now you can start modifying the example, most notably the parse_and_process_file() function to do whatever processing you intend to do.
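For example, if you want a single global MeSH ID count over all files instead of per-file counts, you can merge the per-file counters. A sketch of the modified loop inside the __main__ block:

total_counts = Counter()
for filename, ctr in tqdm_parallel_map(executor, parse_and_process_file, all_filenames):
    total_counts += ctr # Merge the per-file counts into the global counter
print(total_counts.most_common(10)) # The 10 most frequent MeSH IDs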

Posted by Uli Köhler in Bioinformatics, Python

How to parse PubMed baseline data using Python

Want to parse more than one PubMed file? Also see our follow-up at How to parse all PubMed baseline files in parallel using Python

PubMed provides a data dump of metadata from all PubMed articles on the NCBI Servers.

In this example, we’ll parse one of the compressed metadata XML files using the pubmed_parser Python library.

First, download one of the .xml.gz files. For this example, we’ll use pubmed20n0001.xml.gz.

Now we can install the required libraries:

pip install git+https://github.com/titipata/pubmed_parser.git six numpy

Now you can download & run our script. For this example, we will extract the list of MeSH terms for every article and print the PubMed ID together with a list of MeSH IDs.

#!/usr/bin/env python3
import pubmed_parser as pp

# Don't parse authors and references for this example, since we don't need it
dat = pp.parse_medline_xml("pubmed20n0001.xml.gz", author_list=False, reference_list=False)
# Iterate PubMed entries from that file
for entry in dat:
    # entry["mesh_terms"] is like "D000818:Animal; ..."
    # In this example, we are only interested in the MeSH ID, like D000818.
    # Print the PubMed ID, followed by a list of MeSH terms.
    print(entry["pmid"], [
        term.partition(":")[0].strip() for term in entry["mesh_terms"].split(";")
    ])

Running this script takes 13.3 seconds on my notebook, which is equivalent to about 1.4 MBytes of gzipped input data per second. When running the script, you will see lines like

30957 ['D000319', 'D001794', 'D003864', 'D006801', 'D006973', 'D007676', 'D010869']

which means that the PubMed Article with ID 30957 has the MeSH terms ['D000319', 'D001794', 'D003864', 'D006801', 'D006973', 'D007676', 'D010869'].

See the pubmed_parser documentation for more information on which fields are available in the entries, or just try it out interactively.

Posted by Uli Köhler in Bioinformatics, Python

How to install PyPy3 + virtual environment in 30 seconds

TL;DR:

Run this

wget -qO- https://techoverflow.net/scripts/pypy3-installer.sh | bash

then run vpypy every time you want to activate the virtual environment (you might need to restart your shell first). The script currently assumes you are running Linux x86_64 and have virtualenv installed (sudo apt install virtualenv or similar if you don’t have it installed).

Full description:

PyPy is an alternate Python implementation that can be used to speed up many workloads. However, installing it is a somewhat cumbersome process, especially if you don’t have too much experience with virtual environments and related concepts.

We provide a script that automatically downloads PyPy3, installs it to ~/.pypy3 and creates a virtual environment in ~/.pypy3-virtualenv. After that, it creates a shell alias vpypy that aliases to source ~/.pypy3-virtualenv/bin/activate and hence provides an easily memorizable way of activating the environment without requiring the user to memorize the directory.

Also, since both pypy3 itself and the virtual environment are installed in the user’s home directory, running this script does not require admin permissions.

After running the script using

wget -qO- https://techoverflow.net/scripts/pypy3-installer.sh | bash

you can activate the virtual environment using the vpypy alias that is automatically added to ~/.bashrc and ~/.zshrc. Restart your shell for the alias definition to load, then run vpypy:

uli@uli-laptop ~ % vpypy
(.pypy3-virtualenv) uli@uli-laptop ~ % 

You can see that the prompt has changed. Now you can use pip (which will install packages locally to the PyPy3 virtualenv), python (which maps to pypy3) and other related executables. In order to run a script using PyPy, just run python myscript.py

Full source code:

#!/bin/bash
# TechOverflow's 30-second Pypy3 virtual environment generator
# This script is released under CC0 1.0 Universal
DIRECTORY=~/.pypy3
VENV_DIRECTORY=~/.pypy3-virtualenv
VERSION=pypy3.6-v7.3.0-linux64

# Download (or use existing) pypy3
if [ -d "$DIRECTORY" ]; then
    echo "Skipping PyPy download, already exists"
else
    echo "Downloading PyPy to $DIRECTORY"
    # Download & extract to DIRECTORY
    wget https://techoverflow.net/downloads/${VERSION}.tar.bz2 -O /tmp/${VERSION}.tar.bz2
    bash -c "cd /tmp && tar xjvf ${VERSION}.tar.bz2"
    mv /tmp/${VERSION} $DIRECTORY
    rm /tmp/${VERSION}.tar.bz2
fi

# Create virtualenv
if [ -d "$VENV_DIRECTORY" ]; then
    echo "Skipping to create pypy3 virtualenv, already exists"
else
    echo "Creating PyPy virtual environment in $VENV_DIRECTORY"
    virtualenv -p ${DIRECTORY}/bin/pypy3 ${VENV_DIRECTORY}
fi

# Create "vpypy" shortcut
set -x
vpypy
result="$?"
set +x
if [ "$result" -ne 127 ]; then
    echo "Skipping to create vpypy shortcut, already exists in current shell"
else
    echo "Creating bash/zsh shortcut 'vpypy'"
    if [ -f ~/.bashrc ]; then
        echo -e "\n# TechOverflow PyPy installer\nalias vpypy='source ${VENV_DIRECTORY}/bin/activate'\n" >> ~/.bashrc
    fi
    if [ -f ~/.zshrc ]; then
        echo -e "\n# TechOverflow PyPy installer\nalias vpypy='source ${VENV_DIRECTORY}/bin/activate'\n" >> ~/.zshrc
    fi
    # Activate shortcut in current shell (but do not automatically activate virtual environment)
    alias vpypy='source ${VENV_DIRECTORY}/bin/activate'
fi

echo -e "\n\nPyPy installation finished. Restart your shell, then run 'vpypy' to activate the virtual environment"

Posted by Uli Köhler in Linux, Python

Is pypng 16-bit PNG encoding faster using pypy on the Raspberry Pi?

In our previous post How to save Raspberry Pi raw 10-bit image as 16-bit PNG using pypng we investigated how to use the pypng library to save 10-bit raw Raspberry Pi Camera images to 16-bit PNG files.

However, saving a single image took ~26 seconds using CPython 3.7.3. Since pypy can provide speedups to many Python workloads, we tried using pypy3 7.0.0 (see How to install pypy3 on the Raspberry Pi) to speed up the PNG encoding.

Results

pypng PNG export seems to be one of the workloads that are much slower using pypy3.

  • CPython 3.7.3: Encoding took 24.22 seconds
  • pypy3 7.0.0: Encoding took 266.60 seconds

Encoding is more than 10x slower when using pypy3!

Hence I don’t recommend using pypy3 to speed up pypng encoding workloads, at least not on the Raspberry Pi!

Full example

This example is derived from our full example previously posted on How to save Raspberry Pi raw 10-bit image as 16-bit PNG using pypng:

#!/usr/bin/env python3
import time
import picamera
import picamera.array
import numpy as np
import png

# Capture image
print("Capturing image...")
with picamera.PiCamera() as camera:
    with picamera.array.PiBayerArray(camera) as stream:
        camera.capture(stream, 'jpeg', bayer=True)
        # Demosaic data and write to rawimg
        # (stream.array contains the non-demosaiced data)
        rawimg = stream.demosaic()

# Write to PNG
print("Writing 16-bit PNG...")
t0 = time.time()
with open('16bit.png', 'wb') as outfile:
    writer = png.Writer(width=rawimg.shape[1], height=rawimg.shape[0], bitdepth=16, greyscale=False)
    # rawimg is a (h, w, 3) RGB uint16 array
    # but PyPNG needs a (h, w*3) array
    png_data = np.reshape(rawimg, (-1, rawimg.shape[1]*3))
    # Scale 10 bit data to 16 bit values (else it will appear black)
    # NOTE: Depending on your photo and the settings,
    #  it might still appear quite dark!
    png_data *= int(2**6)
    writer.write(outfile, png_data)
t1 = time.time()

print(f"Encoding took {(t1 - t0):.2f} seconds")

 

Posted by Uli Köhler in Python, Raspberry Pi

How to install pypy3 on the Raspberry Pi

This post shows you an easy way of getting pypy3 running on the Raspberry Pi. I used Raspbian Buster on a Raspberry Pi 3 for this example. On Raspbian Buster this will install pypy3 7.x!

First install pypy3 and virtualenv:

sudo apt update && sudo apt -y install pypy3 pypy3-dev virtualenv

Now we can create a virtualenv to install pypy packages into:

virtualenv -p /usr/bin/pypy3 ~/pypy3-virtualenv

Now we can activate the virtualenv. You need to do this every time you want to use pypy, for each shell / SSH connection separately:

source ~/pypy3-virtualenv/bin/activate

If your shell prompt is now prefixed by (pypy3-virtualenv) you have successfully activated the virtualenv:

(pypy3-virtualenv) uli@raspberrypi:~ $

Now python points to pypy3 and pip will install packages locally to ~/pypy3-virtualenv.

Now you can use e.g.

python3 myscript.py

to run your script (both python and python3 will point to pypy3 if you activated the virtual environment!).

Note: Installing pypy3-dev is not strictly necessary to get pypy3 running, but you need it in order to compile native libraries like numpy.

Posted by Uli Köhler in Python, Raspberry Pi

How to capture RaspberryPi camera 10-bit raw image in Python

You can use the picamera Python library to capture a raw sensor image of a camera attached to the Raspberry Pi via CSI:

#!/usr/bin/env python3
import picamera
import picamera.array
import numpy as np

# Capture image
print("Capturing image...")
with picamera.PiCamera() as camera:
    with picamera.array.PiBayerArray(camera) as stream:
        camera.capture(stream, 'jpeg', bayer=True)
        # Demosaic data and write to rawimg
        # (stream.array contains the non-demosaiced data)
        rawimg = stream.demosaic()

rawimg is a numpy uint16 array of dimensions (h, w, 3), e.g. (1944, 2592, 3), and contains integer values from 0 to 1023.

You can, for example, save it in a NumPy file using

np.save("rawimg.npy", rawimg) # Reload with np.load("rawimg.npy")

or save it in a compressed format using

np.savez_compressed("rawimg.npz", rawimg) # Reload with np.load("rawimg.npz")["arr_0"]
Posted by Uli Köhler in Python, Raspberry Pi

How to fix ModuleNotFoundError: No module named ‘picamera’

Problem:

You want to run a Python script using the Raspberry Pi camera but you see an error message like

Traceback (most recent call last):
  File "mycamera.py", line 2, in <module>
    import picamera
ModuleNotFoundError: No module named 'picamera'

Solution:

You need to install the picamera Python module using pip:

sudo pip3 install picamera

or, if you are still using Python 2.x:

sudo pip install picamera

In case you see

sudo: pip3: command not found

install pip3 using

sudo apt install -y python3-pip

Posted by Uli Köhler in Python, Raspberry Pi

Simulating survival data for Kaplan-Meier plots in Python

Libraries like lifelines provide a plethora of example datasets that one can work with. However, for many tasks you need to simulate specific behaviour in survival curves.

In this post, we demonstrate a simple algorithm to generate survival data in a format comparable to the one used in the lifelines example datasets like load_leukemia().

The generation algorithm is based on the following assumptions:

  • There is a strict survival plateau with a given survival probability starting at a given point in time
  • The progression from 100% survival at t=0 to the survival plateau is approximately linear (i.e. if you would generate an infinite number of datapoints, the survival curve would be linear)
  • No censoring events shall be generated except for censoring all surviving participants at the end point of the timeline.

Code:

import numpy as np
import random
from lifelines import KaplanMeierFitter

def simulate_survival_data_linear(N, survival_plateau, t_plateau, t_end):
    """
    Generate random simulated survival data using a linear model
    
    Keyword parameters
    ------------------
    N : integer
        Number of entries to generate
    survival_plateau : float
        The survival probability of the survival plateau
    t_plateau : float
        The time point where the survival plateau starts
    t_end : float
        The time point where all surviving participants will be censored.
    
    Returns
    -------
    A dict with "Time" and "Event" numpy arrays: 0 is censored, 1 is event
    """
    data = {"Time": np.zeros(N), "Event": np.zeros(N)}

    for i in range(N):
        r = random.random()
        if r <= survival_plateau:
            # Event is censoring at the end of the time period
            data["Time"][i] = t_end
            data["Event"][i] = 0
        else: # Event occurs
            # Normalize where we are between 100% and the survival plateau
            p = (r - survival_plateau) / (1 - survival_plateau)
            # Linear model: Time of event linearly depends on uniformly & randomly chosen position
            #  in range (0...tplateau)
            t = p * t_plateau
            data["Time"][i] = t
            data["Event"][i] = 1
    return data

# Example usage
data1 = simulate_survival_data_linear(250, 0.2, 18, 24)
data2 = simulate_survival_data_linear(250, 0.4, 17.2, 24)

Given data1 and data2 (see the usage example at the end of the code) you can plot them using

# Plot bad subgroup
kmf1 = KaplanMeierFitter()
kmf1.fit(data1["Time"], event_observed=data1["Event"], label="Bad subgroup")
ax = kmf1.plot()

# Plot good subgroup
kmf2 = KaplanMeierFitter()
kmf2.fit(data2["Time"], event_observed=data2["Event"], label="Good subgroup")
ax = kmf2.plot(ax=ax)

# Set Y axis to fixed scale
ax.set_ylim([0.0, 1.0])

Do not want a survival plateau?

Just set t_plateau = t_end:

# Example usage
data1 = simulate_survival_data_linear(250, 0.2, 24, 24)
data2 = simulate_survival_data_linear(250, 0.4, 24, 24)
# Code to plot: See above

What happens if you have a low number of participants?

Let’s use 25 instead of 250 as above:

# Example usage
data1 = simulate_survival_data_linear(25, 0.2, 24, 24)
data2 = simulate_survival_data_linear(25, 0.4, 24, 24)
# Plot code: See above

Although we generated the data with the same parameters, the difference is much less clear in this example, especially towards the end of the timeline (note however that the data is generated randomly, so you might see a different result). You can see that a large portion of the confidence intervals overlaps near t=24. In other words, based on this data it is not clear whether the two groups of patients are significantly different (i.e. P \geq 0.05).
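If you want to quantify this instead of eyeballing the confidence intervals, a sketch using the logrank test from lifelines:

from lifelines.statistics import logrank_test

results = logrank_test(data1["Time"], data2["Time"],
                       event_observed_A=data1["Event"],
                       event_observed_B=data2["Event"])
print(results.p_value) # A large p-value means no significant difference was detected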

Posted by Uli Köhler in Python, Statistics