Python

Is pypng 16-bit PNG encoding faster using pypy on the Raspberry Pi?

In our previous post How to save Raspberry Pi raw 10-bit image as 16-bit PNG using pypng we investigated how to use the pypng library to save 10-bit raw Raspberry Pi Camera images to 16-bit PNG files.

However, saving a single image took ~26 seconds using CPython 3.7.3. Since pypy can provide speedups to many Python workloads, we tried using pypy3 7.0.0 (see How to install pypy3 on the Raspberry Pi) to speed up the PNG encoding.

Results

pypng PNG export seems to be one of the workloads that are much slower using pypy3.

  • CPython 3.7.3: Encoding took 24.22 seconds
  • pypy3 7.0.0: Encoding took 266.60 seconds

Encoding is more that 10x slower when using pypy3!

Hence I don’t recommend using pypy3 to speed up pypng encoding workloads, at least not on the Raspberry Pi!

Full example

This example is derived from our full example previously posted on How to save Raspberry Pi raw 10-bit image as 16-bit PNG using pypng:

#!/usr/bin/env python3
import time
import picamera
import picamera.array
import numpy as np
import png

# Capture image
print("Capturing image...")
with picamera.PiCamera() as camera:
    with picamera.array.PiBayerArray(camera) as stream:
        camera.capture(stream, 'jpeg', bayer=True)
        # Demosaic data and write to rawimg
        # (stream.array contains the non-demosaiced data)
        rawimg = stream.demosaic()

# Write to PNG
print("Writing 16-bit PNG...")
t0 = time.time()
with open('16bit.png', 'wb') as outfile:
    writer = png.Writer(width=rawimg.shape[1], height=rawimg.shape[0], bitdepth=16, greyscale=False)
    # rawimg is a (w, h, 3) RGB uint16 array
    # but PyPNG needs a (w, h*3) array
    png_data = np.reshape(rawimg, (-1, rawimg.shape[1]*3))
    # Scale 10 bit data to 16 bit values (else it will appear black)
    # NOTE: Depending on your photo and the settings,
    #  it might still appear quite dark!
    png_data *= int(2**6)
    writer.write(outfile, png_data)
t1 = time.time()

print(f"Encoding took {(t1 - t0):.2f} seconds")

 

Posted by Uli Köhler in Python, Raspberry Pi

How to install pypy3 on the Raspberry Pi

This post shows you an easy way of getting pypy3 running on the Raspberry Pi. I used Raspbian Buster on a Raspberry Pi 3 for this example. On Raspbian buster this will install pypy3 7.x!

First install pypy3 and virtualenv:

sudo apt update && sudo apt -y install pypy3 pypy3-dev virtualenv

Now we can create a virtualenv to install pypy packages into:

virtualenv -p /usr/bin/pypy3 ~/pypy3-virtualenv

Now we can activate the virtualenv. You need to do this every time you want to use pypy, for each shell / SSH connection separately:

source ~/pypy3-virtualenv/bin/activate

If your shell prompt is now prefixed by (pypy3-virtualenv) you have successfully activated the virtualenv:

(pypy3-virtualenv) uli@raspberrypi:~ $

Now python points to pypy3 and pip will install packages locally to ~/pypy3-virtualenv.

Now you can use e.g.

python3 myscript.py

to run your script (both python and python3 will point to pypy3 if you activated the virtual environment!).

Note: Installing pypy3-dev is not strictly neccessary to get pypy3 running, but you need it in order to compile native librarie like numpy.

Posted by Uli Köhler in Python, Raspberry Pi

How to capture RaspberryPi camera 10-bit raw image in Python

You can use the picamera Python library to capture a raw sensor image of a camera attached to the Raspberry Pi via CSI:

#!/usr/bin/env python3
import picamera
import picamera.array
import numpy as np

# Capture image
print("Capturing image...")
with picamera.PiCamera() as camera:
    with picamera.array.PiBayerArray(camera) as stream:
        camera.capture(stream, 'jpeg', bayer=True)
        # Demosaic data and write to rawimg
        # (stream.array contains the non-demosaiced data)
        rawimg = stream.demosaic()

rawimg is a numpy uint16 array of dimensions (w, h, 3), e.g. (1944, 2592, 3) and contains integer values from 0 to 1023.

You can, for example, save it in a NumPy file using

np.save("rawimg.npy", rawimg) # Reload with np.load("rawimg.npy")

or save it in a compressed format using

np.savez_compressed("rawimg.npz", rawimg) # Reload with np.load("rawimg.npz")
Posted by Uli Köhler in Python, Raspberry Pi

How to fix ModuleNotFoundError: No module named ‘picamera’

Problem:

You want to run a Python script using the Raspberry Pi camera but you see an error message like

Traceback (most recent call last):
  File "mycamera.py", line 2, in <module>
    import picamera
ModuleNotFoundError: No module named 'picamera'

Solution:

You need to install the picamera Python module using pip:

sudo pip3 install picamera

or, if you are still using Python 2.x:

sudo pip install picamera

In case you see

sudo: pip3: command not found

install pip3 using

sudo apt install -y python3-pip

 

Posted by Uli Köhler in Python, Raspberry Pi

Simulating survival data for Kaplan-Meier plots in Python

Libraries like lifelines provide a plethora of example datasets that one can work with. However, for many tasks you need to simulate specific behaviour in survival curves.

In this post, we demonstrate a simple algorithm to generate survival data in a format comparable to the one used in the lifelines example datasets like load_leukemia().

The generation algorithm is based on the following assumptions:

  • There is a strict survival plateau with a given survival probability starting at a given point in time
  • The progression from 100% survival, t=0 to the survival plateau is approximately linear (i.e. if you would generate an infinite number of datapoints, the survival curve would be linear)
  • No censoring events shall be generated except for censoring all surviving participants at the end point of the timeline.

Code:

import numpy as np
import random
from lifelines import KaplanMeierFitter

def simulate_survival_data_linear(N, survival_plateau, t_plateau, t_end):
    """
    Generate random simulated survival data using a linear model
    
    Keyword parameters
    ------------------
    N : integer
        Number of entries to generate
    survival_plateau : float
        The survival probability of the survival plateau
    t_plateau : float
        The time point where the survival plateau starts
    t_end : float
        The time point where all surviving participants will be censored.
    
    Returns
    -------
    A dict with "Time" and "Event" numpy arrays: 0 is censored, 1 is event
    """
    data = {"Time": np.zeros(N), "Event": np.zeros(N)}

    for i in range(N):
        r = random.random()
        if r <= survival_plateau:
            # Event is censoring at the end of the time period
            data["Time"][i] = t_end
            data["Event"][i] = 0
        else: # Event occurs
            # Normalize where we are between 100% and the survival plateau
            p = (r - survival_plateau) / (1 - survival_plateau)
            # Linear model: Time of event linearly depends on uniformly & randomly chosen position
            #  in range (0...tplateau)
            t = p * t_plateau
            data["Time"][i] = t
            data["Event"][i] = 1
    return data

# Example usage
data1 = simulate_survival_data_linear(250, 0.2, 18, 24)
data2 = simulate_survival_data_linear(250, 0.4, 17.2, 24)

Given data1 and data2 (see the usage example at the end of the code) you can plot them using

# Plot bad subgroup
kmf1 = KaplanMeierFitter()
kmf1.fit(data1["Time"], event_observed=data1["Event"], label="Bad subgroup")
ax = kmf1.plot()

# Plot good subgroup
kmf2 = KaplanMeierFitter()
kmf2.fit(data2["Time"], event_observed=data2["Event"], label="Good subgroup")
ax = kmf2.plot(ax=ax)

# Set Y axis to fixed scale
ax.set_ylim([0.0, 1.0])

Thi

Do not want a survival plateau?

Just set t_end = t_survival:

# Example usage
data1 = simulate_survival_data_linear(250, 0.2, 24, 24)
data2 = simulate_survival_data_linear(250, 0.4, 24, 24)
# Code to plot: See above

What happens if you have a low number of participants?

Let’s use 25 instead of 250 as above:

# Example usage
data1 = simulate_survival_data_linear(25, 0.2, 24, 24)
data2 = simulate_survival_data_linear(25, 0.4, 24, 24)
# Plot code: See above

Although we generated the data with the same data, the difference is much less clear in this example, especially towards the end of the timeline (note however that the data is generated randomly, so you might see a different result). You can see a large portion of the confidence intervals overlappings near t=24. In other words, based on this data it is not clear that the two groups of patients are significantly different (in other words, P \geq 0.05)

Posted by Uli Köhler in Python, Statistics

How to use custom legend label in lifelines Kaplan-Meier plot?

Using the lifelines library, you can easily plot Kaplan-Meier plots, e.g. as seen in our previous post Minimal Python Kaplan-Meier Plot example:

from lifelines.datasets import load_leukemia
from lifelines import KaplanMeierFitter
df = load_leukemia()

kmf = KaplanMeierFitter()
kmf.fit(df['t'], df['Rx']) # t = Timepoints, Rx: 0=censored, 1=event
kmf.plot()

What if you want a custom label instead of KM_estimates to appear in the legend?

Use kmf.fit(..., label='Your label'). Since we use the leukemias dataset for this example, we use the label 'Leukemia'

Full example:

from lifelines.datasets import load_leukemia
from lifelines import KaplanMeierFitter
# Load datasets
df_leukemia = load_leukemia()

# Fit & plot leukemia dataset
kmf_leukemia = KaplanMeierFitter()
kmf_leukemia.fit(df_leukemia['t'], df_leukemia['Rx'], label="Leukemia")
ax = kmf_leukemia.plot()
# Set Y axis to fixed scale
ax.set_ylim([0.0, 1.0])

Posted by Uli Köhler in Python, Statistics

How plot multiple Kaplan-Meier curves using lifelines

Using the lifelines library, you can easily plot Kaplan-Meier plots, e.g. as seen in our previous post Minimal Python Kaplan-Meier Plot example:

from lifelines.datasets import load_leukemia
from lifelines import KaplanMeierFitter
df = load_leukemia()

kmf = KaplanMeierFitter()
kmf.fit(df['t'], df['Rx']) # t = Timepoints, Rx: 0=censored, 1=event
kmf.plot()

What if you want to plot multiple survival curves?

The call to kmf.plot() returns a Matplotlib ax object which you can use on a second kmf2.plot() call als argument: kmf2.plot(ax=ax).

Full example (note that we also set a fixed Y range of [0.0, 1.0], see How to make Y axis start from 0 in lifelines Kaplan-Meier plots):

from lifelines.datasets import load_leukemia, load_lymphoma
from lifelines import KaplanMeierFitter
# Load datasets
df_leukemia = load_leukemia()
df_lymphoma = load_lymphoma()

# Fit & plot leukemia dataset
kmf_leukemia = KaplanMeierFitter()
kmf_leukemia.fit(df_leukemia['t'], df_leukemia['Rx'], label="Leukemia")
ax = kmf_leukemia.plot()

# Fit & plot lymphoma dataset
kmf_lymphoma = KaplanMeierFitter()
kmf_lymphoma.fit(df_lymphoma['Time'], df_lymphoma['Censor'], label="Lymphoma")
ax = kmf_lymphoma.plot(ax=ax)

# Set Y axis to fixed scale
ax.set_ylim([0.0, 1.0])

Posted by Uli Köhler in Python, Statistics

How to make Y axis start from 0 in lifelines Kaplan-Meier plots

Using the lifelines library, you can easily plot Kaplan-Meier plots, e.g. as seen in our previous post Minimal Python Kaplan-Meier Plot example:

from lifelines.datasets import load_leukemia
from lifelines import KaplanMeierFitter
df = load_leukemia()

kmf = KaplanMeierFitter()
kmf.fit(df['t'], df['Rx']) # t = Timepoints, Rx: 0=censored, 1=event
kmf.plot()

What if you want to make the Y axis start from 0.0 and not from approx. 0.2 as in this example?

Remember that lifelines just calls matplotlib internally and km.plot() returns the ax object that you can use to manipulate the plot. In this specific case, you can use

ax.set_ylim([0.0, 1.0])

to stop autoranging the Y axis and set its range to fixed [0.0, 1.0].

Full example:

from lifelines.datasets import load_leukemia
from lifelines import KaplanMeierFitter
df = load_leukemia()

kmf = KaplanMeierFitter()
kmf.fit(df['t'], df['Rx']) # t = Timepoints, Rx: 0=censored, 1=event
ax = kmf.plot()
# Set Y axis range to [0.0, 1.0]
ax.set_ylim([0.0, 1.0])

Posted by Uli Köhler in Python, Statistics

How to hide confidence interval in lifelines Kaplan-Meier plots

Using the lifelines library, you can easily plot Kaplan-Meier plots, e.g. as seen in our previous post Minimal Python Kaplan-Meier Plot example:

from lifelines.datasets import load_leukemia
from lifelines import KaplanMeierFitter
df = load_leukemia()

kmf = KaplanMeierFitter()
kmf.fit(df['t'], df['Rx']) # t = Timepoints, Rx: 0=censored, 1=event
kmf.plot()

What if you just want to show the dark blue curve and hide the light-blue confidence interval?

Easy: Just use km.plot(ci_show=False). ci in ci_show means confidence interval. Example:

from lifelines.datasets import load_leukemia
from lifelines import KaplanMeierFitter
df = load_leukemia()

kmf = KaplanMeierFitter()
kmf.fit(df['t'], df['Rx']) # t = Timepoints, Rx: 0=censored, 1=event
kmf.plot(ci_show=False)

Posted by Uli Köhler in Python, Statistics

Minimal Python Kaplan-Meier Plot example

Install the lifelines library:

sudo pip3 install lifelines

Now you can use this snippet to create a basic Kaplan-Meier plot from an example dataset included with the library:

from lifelines.datasets import load_leukemia
from lifelines import KaplanMeierFitter
# Load example dataset
df = load_leukemia()

# Create model from data
kmf = KaplanMeierFitter()
kmf.fit(df['t'], df['Rx']) # t = Timepoints, Rx: 0=censored, 1=event
# Plot model's survival curve
kmf.plot()

Note that different datasets might have different names for the time column (t in this example) and the event/censoring column (Rx in this example)

We exported the plot using

import matplotlib.pyplot as plt 
plt.savefig("Kaplan-Meier-Example-Leukemias.svg")

 

Posted by Uli Köhler in Python, Statistics

How to sort files by modification date in Python

You can use sorted() to sort the files, and use key=lambda t: os.stat(t).st_mtime to sort by modification time. This will sort with ascending modification time (i.e. most recently modified file last).

In case you want to sort with descending modification time (i.e. most recently modified file first), use key=lambda t: -os.stat(t).st_mtime

Full example:

# List all files in the home directory
files = glob.glob(os.path.expanduser("~/*"))

# Sort by modification time (mtime) ascending and descending
sorted_by_mtime_ascending = sorted(files, key=lambda t: os.stat(t).st_mtime)
sorted_by_mtime_descending = sorted(files, key=lambda t: -os.stat(t).st_mtime)

 

Posted by Uli Köhler in Python

How to read str from binary file in Python

When you have a file-like object in Python, .read() will always return bytes. You can use any of the following solutions to get a str from the binary file.

Option 1: Decode the bytes

You can call .decode() on the bytes. Ensure to use the right encoding. utf-8 is often correct if you don’t know what the encoding is, but

binary = myfile.read() # type: bytes
text = binary.decode("utf-8")

# Short version
text = myfile.read().decode("utf-8")

Option 2: Wrap the file so it appears like a file in text mode

Use io.TextIOWrapper like this:

import io

text_file = io.TextIOWrapper(myfile, encoding="utf-8")
text = text_file.read()
Posted by Uli Köhler in Python

Wrap binary file-like object to get decoded text file-like object in Python

Problem:

In Python, you have a file-like object that reads binary data (e.g. if you open a file using open("my-file", "rb"))

You want to pass that file-like object to a function that expects a file-like object that expects a file-like object in text mode (i.e. where you can read str from, not bytes).

Solution:

Use io.TextIOWrapper:

with open("fp-lib-table", "rb") as infile:
    # infile.read() would return bytes
    text_infile = io.TextIOWrapper(infile)
    # text-infile.read() would return a str

In case you need to use a specific encoding, use encoding=...:

text_infile = io.TextIOWrapper(infile, encoding="utf-8")

 

Posted by Uli Köhler in Python

Reading and writing KiCAD libraries in Python

This snippet provides a parser and serializer for KiCAD schematic symbol libraries and DCM symbol documentation libraries. It will parse the KiCAD libraries into a number of records, consisting of a list of lines. An extended version of this snippet has been used in my KiCADSamacSysImporter project

__author__ = "Uli Köhler"
__license__ = "CC0 1.0 Universal"

class KiCADDocLibrary(object):
    def __init__(self, records=[]):
        self.records = records
        
    @staticmethod
    def read(file):
        # A record is everything between '$CMP ...' line and '$ENDCMP' line
        records = []
        current_record = None
        for line in file:
            line = line.strip()
            if line.startswith('$CMP '):
                current_record = []
            # Add line to record if we have any current record
            if current_record is not None:
                current_record.append(line)
            if line.startswith("$ENDCMP"):
                if current_record is not None:
                    records.append(current_record)
                    current_record = None
        return KiCADDocLibrary(records)
        
    def write(self, out):
        # Write header
        out.write("EESchema-DOCLIB  Version 2.0\n")
        # Write records
        for rec in self.records:
            out.write("#\n")
            for line in rec:
                out.write(line)
                out.write("\n")
        # Write footer
        out.write("#\n#End Doc Library\n")        

        
class KiCADSchematicSymbolLibrary(object):
    def __init__(self, records=[]):
        self.records = records
        
    @staticmethod
    def read(file):
        # A record is everything between 'DEF ...' line and 'ENDDEF' line
        records = []
        current_record = None
        current_comment_lines = [] # 
        for line in file:
            line = line.strip()
            if line.startswith("#encoding"):
                continue # Ignore - we always use #encoding utf-8
            # Comment line processing
            if line.startswith("#"):
                current_comment_lines.append(line)
            # Start of record
            if line.startswith('DEF '):
                current_record = current_comment_lines
                current_comment_lines = []
            # Add line to record if we have any current record
            if current_record is not None:
                current_record.append(line)
            if line.startswith("ENDDEF"):
                if current_record is not None:
                    records.append(current_record)
                    current_record = None
            # Clear comment lines
            # We can only do this now to avoid clearing them before
            #  they are used in the DEF clause
            if not line.startswith("#"):
                current_comment_lines = []
        return KiCADSchematicSymbolLibrary(records)
        
    def write(self, out):
        # Write header
        out.write("EESchema-LIBRARY Version 2.4\n#encoding utf-8\n")
        # Write records
        for rec in self.records:
            for line in rec:
                out.write(line)
                out.write("\n")
        # Write footer
        out.write("#\n#End Library\n")        

 

Posted by Uli Köhler in Electronics, KiCAD, Python

Python example: List files in ZIP archive

This example lists all the filenames contained in a ZIP file using Python’s built-in zipfile library.

#!/usr/bin/env python3
import zipfile

def list_files_in_zip(filename):
    """
    List all files inside a ZIP archive
    Yields filename strings.
    """
    with zipfile.ZipFile(filename) as thezip:
        for zipinfo in thezip.infolist():
            yield zipinfo.filename
                
# Usage example
for filename in list_files_in_zip("myarchive.zip"):
    print(filename)

 

Posted by Uli Köhler in Python

How to get milliseconds from pandas.Timedelta object

If you have a pandas.Timedelta object, you can use Timedelta.total_seconds() to get the seconds as a floating-point number with millisecond resolution and then multiply with one billion (1e3, the number of milliseconds in one second) to obtain the number of milliseconds in the Timedelta:

timedelta.total_seconds() * 1e3

In case you want an integer, use

int(round(timedelta.total_seconds() * 1e3))

Note that using round() is required here to avoid errors due to floating point precision.

or use this function definition:

def milliseconds_from_timedelta(timedelta):
    """Compute the milliseconds in a timedelta as floating-point number"""
    return timedelta.total_seconds() * 1e3

def milliseconds_from_timedelta_integer(timedelta):
    """Compute the milliseconds in a timedelta as integer number"""
    return int(round(timedelta.total_seconds() * 1e3))

# Usage example:
ms = milliseconds_from_timedelta(timedelta)
print(ms) # Prints 2000.752
ms = milliseconds_from_timedelta_integer(timedelta)
print(ms) # Prints 2001

 

Posted by Uli Köhler in Python

How to get microseconds from pandas.Timedelta object

If you have a pandas.Timedelta object, you can use Timedelta.total_seconds() to get the seconds as a floating-point number with microsecond resolution and then multiply with one million (1e6, the number of microseconds in one second) to obtain the number of microseconds in the Timedelta:

timedelta.total_seconds() * 1e6

In case you want an integer, use

int(round(timedelta.total_seconds() * 1e6))

Note that using round() is required here to avoid errors due to floating point precision.

or use this function definition:

def microseconds_from_timedelta(timedelta):
    """Compute the microseconds in a timedelta as floating-point number"""
    return timedelta.total_seconds() * 1e6

def microseconds_from_timedelta_integer(timedelta):
    """Compute the microseconds in a timedelta as integer number"""
    return int(round(timedelta.total_seconds() * 1e6))

# Usage example:
us = microseconds_from_timedelta(timedelta)
print(us) # Prints 2000751.9999999998

us = microseconds_from_timedelta_integer(timedelta)
print(us) # Prints 2000752
Posted by Uli Köhler in Python

How to get nanoseconds from pandas.Timedelta object

If you have a pandas.Timedelta object, you can use Timedelta.total_seconds() to get the seconds as a floating-point number with nanosecond resolution and then multiply with one billion (1e9, the number of nanoseconds in one second) to obtain the number of nanoseconds in the Timedelta:

timedelta.total_seconds() * 1e9

In case you want an integer, use

int(round(timedelta.total_seconds() * 1e9))

Note that using round() is required here to avoid errors due to floating point precision.

or use this function definition:

def nanoseconds_from_timedelta(timedelta):
    """Compute the nanoseconds in a timedelta as floating-point number"""
    return timedelta.total_seconds() * 1e9

def nanoseconds_from_timedelta_integer(timedelta):
    """Compute the nanoseconds in a timedelta as integer number"""
    return int(round(timedelta.total_seconds() * 1e9))

# Usage example:
ns = nanoseconds_from_timedelta(timedelta)
print(ns) # Prints 2000751999.9999998

ns = nanoseconds_from_timedelta_integer(timedelta)
print(ns) # Prints 2000752000

 

Posted by Uli Köhler in Python

How to create pandas.Timedelta object from two timestamps

If you have two pandas.Timestamp objects, you can simply substract them using the minus operator (-) in order to obtain a pandas.Timedelta object:

import pandas as pd
import time

# Create two timestamps
ts1 = pd.Timestamp.now()
time.sleep(2)
ts2 = pd.Timestamp.now()

# The difference of these timestamps is a pandas.Timedelta object.
timedelta = ts2 - ts1
print(timedelta) # Prints '0 days 00:00:02.000752'

 

Posted by Uli Köhler in Python