Python

How to save Matplotlib plot to string as SVG

You can use StringIO to save a Matplotlib plot to a string without saving it to an intermediary file:

from matplotlib import pyplot as plt
plt.plot([0, 1], [2, 3]) # Just a minimal showcase

# Save plot to StringIO
from io import StringIO
i = StringIO()
plt.savefig(i, format="svg")

# How to access the string
print(i.getvalue())

Note that unless you save to a file you need to set the format=... parameter when calling plt.savefig(). If saving to a file, Matplotlib will try to derive the format from the filename extension (like .svg)

Posted by Uli Köhler in Python

How to map MeSH ID to MeSH term using Python

Our MeSH-JSON project provides a pre-compiled MeSH-ID-to-term map as JSON:

Download here or use this command to download:

wget "http://techoverflow.net/downloads/mesh.json.gz"

 

How to use in Python:

#!/usr/bin/env python3
import json
import gzip

# How to load
with gzip.open("mesh.json.gz", "rb") as infile:
    mesh_id_to_term = json.load(infile)

# Usage example
print(mesh_id_to_term["D059630"]) # Prints 'Mesenchymal Stem Cells'

 

Posted by Uli Köhler in Bioinformatics, Python

How to parse all PubMed baseline files in parallel using Python

In our previous post How to parse PubMed baseline data using Python we investigate how to use the pubmed_parser library to parse PubMed medline data using Python.

In this follow-up we’ll provide an example of how to use glob to select all PubMed baseline files in a directory and use concurrent.futures with tqdm to provide a convenient yet easy-to-use process parallelism using ProcessPoolExecutor and a progress bar UI for the command line.

First, install the requirements using

pip install git+git://github.com/titipata/pubmed_parser.git six numpy tqdm

Now download this script, ensure some files like pubmed20n0002.xml.gz or pubmed20n0004.xml.gz are in the same directory and run it:

#!/usr/bin/env python3
import pubmed_parser as pp
import glob
import os
from collections import Counter
import concurrent.futures
from tqdm import tqdm

# Source: https://techoverflow.net/2017/05/18/how-to-use-concurrent-futures-map-with-a-tqdm-progress-bar/
def tqdm_parallel_map(executor, fn, *iterables, **kwargs):
    """
    Equivalent to executor.map(fn, *iterables),
    but displays a tqdm-based progress bar.
    
    Does not support timeout or chunksize as executor.submit is used internally
    
    **kwargs is passed to tqdm.
    """
    futures_list = []
    for iterable in iterables:
        futures_list += [executor.submit(fn, i) for i in iterable]
    for f in tqdm(concurrent.futures.as_completed(futures_list), total=len(futures_list), **kwargs):
        yield f.result()

def parse_and_process_file(filename):
    """
    This function contains our parsing code. Usually, you would only modify this function.
    """
    # Don't parse authors and references for this example, since we don't need it
    dat = pp.parse_medline_xml(filename, author_list=False, reference_list=False)

    # For this example, we'll build a set of count of all MeSH IDs in this file
    ctr = Counter()
    for entry in dat:
        terms = [term.partition(":")[0].strip() for term in entry["mesh_terms"].split(";")]
        for term in terms:
            ctr[term] += 1
    return filename, ctr


if __name__ == "__main__":
    # Find all pubmed files in the current directory
    all_filenames = glob.glob("pubmed*.xml.gz")
    # For some workloads you might want to use a ThreadPoolExecutor,
    # but a ProcessPoolExecutor is a good default
    executor = concurrent.futures.ProcessPoolExecutor(os.cpu_count())
    # Iterate results as they come in (the order is not the same as in the input!)
    for filename, ctr in tqdm_parallel_map(executor, parse_and_process_file, all_filenames):
        # NOTE: If you print() here, this might interfere with the progress bar,
        # but we accept that here since it's just an example
        print(filename, ctr)

Now you can start modifying the example, most notably the parse_and_process_file() function to do whatever processing you intend to do.

Posted by Uli Köhler in Bioinformatics, Python

How to parse PubMed baseline data using Python

Want to parse more than one PubMed file? Also see our follow-up at How to parse all PubMed baseline files in parallel using Python

PubMed provides a data dump of metadata from all PubMed articles on the NCBI Servers.

In this example, we’ll parse one of the compressed metadata XML files using the pubmed_parser Python library.

First, download one of the .xml.gz files. For this example, we’ll use pubmed20n0001.xml.gz.

Now we can install the required libraries:

pip install git+git://github.com/titipata/pubmed_parser.git six numpy

Now you can download & run our script. For this example, we will extract a list of MeSH terms for every  and print PubMed ID and a list of MeSH IDs.

#!/usr/bin/env python3
import pubmed_parser as pp

# Don't parse authors and references for this example, since we don't need it
dat = pp.parse_medline_xml("pubmed20n0001.xml.gz", author_list=False, reference_list=False)
# Iterate PubMed entries from that file
for entry in dat:
    # entry["mesh_terms"] is like "D000818:Animal; ..."
    # In this example, we are only interested in the MeSH ID, like D000818.
    # Print the PubMed ID, followed by a list of MeSH terms.
    print(entry["pmid"], [
        term.partition(":")[0].strip() for term in entry["mesh_terms"].split(";")
    ])

Running this script takes 13.3 seconds on my Notebook which is equivalent to about 1.4 MBytes of GZipped input data per second. When running the script, you will see lines like

30957 ['D000319', 'D001794', 'D003864', 'D006801', 'D006973', 'D007676', 'D010869']

which means that the PubMed Article with ID 30957 has the MeSH terms ['D000319', 'D001794', 'D003864', 'D006801', 'D006973', 'D007676', 'D010869'].

See the pubmed_parser documentation, or just try it out interactively for more information on what fields are available in the entries you can iterate.

Posted by Uli Köhler in Bioinformatics, Python

How to install PyPy3 + virtual environment in 30 seconds

TL;DR:

Run this

wget -qO- https://techoverflow.net/scripts/pypy3-installer.sh | bash

then run vpypy every time you want to activate (you might need to restart). The script currently assumes you are running Linux x86_64 and have installed virtualenv (sudo apt install virtualenv or similar if you don’t have it installed)

Full description:

PyPy is an alternate Python implementation that can be used to speed up many workloads. However, installing it is a somewhat cumbersome process, especially if you don’t have too much experience with virtual environments and related concepts.

We provide a script that automatically downloads PyPy3, installs it to ~/.pypy3 and creates a virtual environment in ~/.pypy3-virtualenv. After that, it creates a shell alias vpypy that aliases to source ~/.pypy3-virtualenv/bin/activate and hence provides an easily memoizable way of activating the environment without requiring the user to memoize the directory.

Also, since both pypy3 itself and the virtual environment are  installed in the user’s home directory, running this script does not require admin permissions.

After running the script using

wget -qO- https://techoverflow.net/scripts/pypy3-installer.sh | bash

you can activate the virtual environment using the vpypy alias that is automatically added to ~/.bashrc and ~/.zshrc. Restart your shell for the alias definition to load, then run vpypy:

uli@uli-laptop ~ % vpypy
(.pypy3-virtualenv) uli@uli-laptop ~ % 

You can see that the prompt has changed. Now you can use pip (which will install packages locally to the PyPy3 virtualenv), python (which maps to pypy3) and other related executables. In order to run a script using PyPy, just run python myscript.py

Full source code:

#!/bin/bash
# TechOverflow's 30-second Pypy3 virtual environment generator
# This script is released under CC0 1.0 Universal
DIRECTORY=~/.pypy3
VENV_DIRECTORY=~/.pypy3-virtualenv
VERSION=pypy3.6-v7.3.0-linux64

# Download (or use existing) pypy3
if [ -d "$DIRECTORY" ]; then
    echo "Skipping PyPy download, already exists"
else
    echo "Downloading PyPy to $DIRECTORY"
    # Download & extract to DIRECTORY
    wget https://techoverflow.net/downloads/${VERSION}.tar.bz2 -O /tmp/${VERSION}.tar.bz2
    bash -c "cd /tmp && tar xjvf ${VERSION}.tar.bz2"
    mv /tmp/${VERSION} $DIRECTORY
    rm /tmp/${VERSION}.tar.bz2
fi

# Create virtualenv
if [ -d "$VENV_DIRECTORY" ]; then
    echo "Skipping to create pypy3 virtualenv, already exists"
else
    echo "Creating PyPy virtual environment in $VENV_DIRECTORY"
    virtualenv -p ${DIRECTORY}/bin/pypy3 ${VENV_DIRECTORY}
fi

# Create "vpypy" shortcut
set -x
vpypy
result="$?"
set +x
if [ "$result" -ne 127 ]; then
    echo "Skipping to create vpypy shortcut, already exists in current shell"
else
    echo "Creating bash/zsh shortcut 'vpypy'"
    if [ -f ~/.bashrc ]; then
        echo -e "\n# TechOverflow PyPy installer\nalias vpypy='source ${VENV_DIRECTORY}/bin/activate'\n" >> ~/.bashrc
    fi
    if [ -f ~/.zshrc ]; then
        echo -e "\n# TechOverflow PyPy installer\nalias vpypy='source ${VENV_DIRECTORY}/bin/activate'\n" >> ~/.zshrc
    fi
    # Activate shortcut in current shell (but do not automatically activate virtual environment)
    alias vpypy='source ${VENV_DIRECTORY}/bin/activate'
fi

echo -e "\n\nPyPy installation finished. Restart your shell, then run 'vpypy' to activate the virtual environment"

 

 

Posted by Uli Köhler in Linux, Python

Is pypng 16-bit PNG encoding faster using pypy on the Raspberry Pi?

In our previous post How to save Raspberry Pi raw 10-bit image as 16-bit PNG using pypng we investigated how to use the pypng library to save 10-bit raw Raspberry Pi Camera images to 16-bit PNG files.

However, saving a single image took ~26 seconds using CPython 3.7.3. Since pypy can provide speedups to many Python workloads, we tried using pypy3 7.0.0 (see How to install pypy3 on the Raspberry Pi) to speed up the PNG encoding.

Results

pypng PNG export seems to be one of the workloads that are much slower using pypy3.

  • CPython 3.7.3: Encoding took 24.22 seconds
  • pypy3 7.0.0: Encoding took 266.60 seconds

Encoding is more that 10x slower when using pypy3!

Hence I don’t recommend using pypy3 to speed up pypng encoding workloads, at least not on the Raspberry Pi!

Full example

This example is derived from our full example previously posted on How to save Raspberry Pi raw 10-bit image as 16-bit PNG using pypng:

#!/usr/bin/env python3
import time
import picamera
import picamera.array
import numpy as np
import png

# Capture image
print("Capturing image...")
with picamera.PiCamera() as camera:
    with picamera.array.PiBayerArray(camera) as stream:
        camera.capture(stream, 'jpeg', bayer=True)
        # Demosaic data and write to rawimg
        # (stream.array contains the non-demosaiced data)
        rawimg = stream.demosaic()

# Write to PNG
print("Writing 16-bit PNG...")
t0 = time.time()
with open('16bit.png', 'wb') as outfile:
    writer = png.Writer(width=rawimg.shape[1], height=rawimg.shape[0], bitdepth=16, greyscale=False)
    # rawimg is a (w, h, 3) RGB uint16 array
    # but PyPNG needs a (w, h*3) array
    png_data = np.reshape(rawimg, (-1, rawimg.shape[1]*3))
    # Scale 10 bit data to 16 bit values (else it will appear black)
    # NOTE: Depending on your photo and the settings,
    #  it might still appear quite dark!
    png_data *= int(2**6)
    writer.write(outfile, png_data)
t1 = time.time()

print(f"Encoding took {(t1 - t0):.2f} seconds")

 

Posted by Uli Köhler in Python, Raspberry Pi

How to install pypy3 on the Raspberry Pi

This post shows you an easy way of getting pypy3 running on the Raspberry Pi. I used Raspbian Buster on a Raspberry Pi 3 for this example. On Raspbian buster this will install pypy3 7.x!

First install pypy3 and virtualenv:

sudo apt update && sudo apt -y install pypy3 pypy3-dev virtualenv

Now we can create a virtualenv to install pypy packages into:

virtualenv -p /usr/bin/pypy3 ~/pypy3-virtualenv

Now we can activate the virtualenv. You need to do this every time you want to use pypy, for each shell / SSH connection separately:

source ~/pypy3-virtualenv/bin/activate

If your shell prompt is now prefixed by (pypy3-virtualenv) you have successfully activated the virtualenv:

(pypy3-virtualenv) uli@raspberrypi:~ $

Now python points to pypy3 and pip will install packages locally to ~/pypy3-virtualenv.

Now you can use e.g.

python3 myscript.py

to run your script (both python and python3 will point to pypy3 if you activated the virtual environment!).

Note: Installing pypy3-dev is not strictly neccessary to get pypy3 running, but you need it in order to compile native librarie like numpy.

Posted by Uli Köhler in Python, Raspberry Pi

How to capture RaspberryPi camera 10-bit raw image in Python

You can use the picamera Python library to capture a raw sensor image of a camera attached to the Raspberry Pi via CSI:

#!/usr/bin/env python3
import picamera
import picamera.array
import numpy as np

# Capture image
print("Capturing image...")
with picamera.PiCamera() as camera:
    with picamera.array.PiBayerArray(camera) as stream:
        camera.capture(stream, 'jpeg', bayer=True)
        # Demosaic data and write to rawimg
        # (stream.array contains the non-demosaiced data)
        rawimg = stream.demosaic()

rawimg is a numpy uint16 array of dimensions (w, h, 3), e.g. (1944, 2592, 3) and contains integer values from 0 to 1023.

You can, for example, save it in a NumPy file using

np.save("rawimg.npy", rawimg) # Reload with np.load("rawimg.npy")

or save it in a compressed format using

np.savez_compressed("rawimg.npz", rawimg) # Reload with np.load("rawimg.npz")
Posted by Uli Köhler in Python, Raspberry Pi

How to fix ModuleNotFoundError: No module named ‘picamera’

Problem:

You want to run a Python script using the Raspberry Pi camera but you see an error message like

Traceback (most recent call last):
  File "mycamera.py", line 2, in <module>
    import picamera
ModuleNotFoundError: No module named 'picamera'

Solution:

You need to install the picamera Python module using pip:

sudo pip3 install picamera

or, if you are still using Python 2.x:

sudo pip install picamera

In case you see

sudo: pip3: command not found

install pip3 using

sudo apt install -y python3-pip

 

Posted by Uli Köhler in Python, Raspberry Pi

Simulating survival data for Kaplan-Meier plots in Python

Libraries like lifelines provide a plethora of example datasets that one can work with. However, for many tasks you need to simulate specific behaviour in survival curves.

In this post, we demonstrate a simple algorithm to generate survival data in a format comparable to the one used in the lifelines example datasets like load_leukemia().

The generation algorithm is based on the following assumptions:

  • There is a strict survival plateau with a given survival probability starting at a given point in time
  • The progression from 100% survival, t=0 to the survival plateau is approximately linear (i.e. if you would generate an infinite number of datapoints, the survival curve would be linear)
  • No censoring events shall be generated except for censoring all surviving participants at the end point of the timeline.

Code:

import numpy as np
import random
from lifelines import KaplanMeierFitter

def simulate_survival_data_linear(N, survival_plateau, t_plateau, t_end):
    """
    Generate random simulated survival data using a linear model
    
    Keyword parameters
    ------------------
    N : integer
        Number of entries to generate
    survival_plateau : float
        The survival probability of the survival plateau
    t_plateau : float
        The time point where the survival plateau starts
    t_end : float
        The time point where all surviving participants will be censored.
    
    Returns
    -------
    A dict with "Time" and "Event" numpy arrays: 0 is censored, 1 is event
    """
    data = {"Time": np.zeros(N), "Event": np.zeros(N)}

    for i in range(N):
        r = random.random()
        if r <= survival_plateau:
            # Event is censoring at the end of the time period
            data["Time"][i] = t_end
            data["Event"][i] = 0
        else: # Event occurs
            # Normalize where we are between 100% and the survival plateau
            p = (r - survival_plateau) / (1 - survival_plateau)
            # Linear model: Time of event linearly depends on uniformly & randomly chosen position
            #  in range (0...tplateau)
            t = p * t_plateau
            data["Time"][i] = t
            data["Event"][i] = 1
    return data

# Example usage
data1 = simulate_survival_data_linear(250, 0.2, 18, 24)
data2 = simulate_survival_data_linear(250, 0.4, 17.2, 24)

Given data1 and data2 (see the usage example at the end of the code) you can plot them using

# Plot bad subgroup
kmf1 = KaplanMeierFitter()
kmf1.fit(data1["Time"], event_observed=data1["Event"], label="Bad subgroup")
ax = kmf1.plot()

# Plot good subgroup
kmf2 = KaplanMeierFitter()
kmf2.fit(data2["Time"], event_observed=data2["Event"], label="Good subgroup")
ax = kmf2.plot(ax=ax)

# Set Y axis to fixed scale
ax.set_ylim([0.0, 1.0])

Thi

Do not want a survival plateau?

Just set t_end = t_survival:

# Example usage
data1 = simulate_survival_data_linear(250, 0.2, 24, 24)
data2 = simulate_survival_data_linear(250, 0.4, 24, 24)
# Code to plot: See above

What happens if you have a low number of participants?

Let’s use 25 instead of 250 as above:

# Example usage
data1 = simulate_survival_data_linear(25, 0.2, 24, 24)
data2 = simulate_survival_data_linear(25, 0.4, 24, 24)
# Plot code: See above

Although we generated the data with the same data, the difference is much less clear in this example, especially towards the end of the timeline (note however that the data is generated randomly, so you might see a different result). You can see a large portion of the confidence intervals overlappings near t=24. In other words, based on this data it is not clear that the two groups of patients are significantly different (in other words, P \geq 0.05)

Posted by Uli Köhler in Python, Statistics

How to use custom legend label in lifelines Kaplan-Meier plot?

Using the lifelines library, you can easily plot Kaplan-Meier plots, e.g. as seen in our previous post Minimal Python Kaplan-Meier Plot example:

from lifelines.datasets import load_leukemia
from lifelines import KaplanMeierFitter
df = load_leukemia()

kmf = KaplanMeierFitter()
kmf.fit(df['t'], df['Rx']) # t = Timepoints, Rx: 0=censored, 1=event
kmf.plot()

What if you want a custom label instead of KM_estimates to appear in the legend?

Use kmf.fit(..., label='Your label'). Since we use the leukemias dataset for this example, we use the label 'Leukemia'

Full example:

from lifelines.datasets import load_leukemia
from lifelines import KaplanMeierFitter
# Load datasets
df_leukemia = load_leukemia()

# Fit & plot leukemia dataset
kmf_leukemia = KaplanMeierFitter()
kmf_leukemia.fit(df_leukemia['t'], df_leukemia['Rx'], label="Leukemia")
ax = kmf_leukemia.plot()
# Set Y axis to fixed scale
ax.set_ylim([0.0, 1.0])

Posted by Uli Köhler in Python, Statistics

How plot multiple Kaplan-Meier curves using lifelines

Using the lifelines library, you can easily plot Kaplan-Meier plots, e.g. as seen in our previous post Minimal Python Kaplan-Meier Plot example:

from lifelines.datasets import load_leukemia
from lifelines import KaplanMeierFitter
df = load_leukemia()

kmf = KaplanMeierFitter()
kmf.fit(df['t'], df['Rx']) # t = Timepoints, Rx: 0=censored, 1=event
kmf.plot()

What if you want to plot multiple survival curves?

The call to kmf.plot() returns a Matplotlib ax object which you can use on a second kmf2.plot() call als argument: kmf2.plot(ax=ax).

Full example (note that we also set a fixed Y range of [0.0, 1.0], see How to make Y axis start from 0 in lifelines Kaplan-Meier plots):

from lifelines.datasets import load_leukemia, load_lymphoma
from lifelines import KaplanMeierFitter
# Load datasets
df_leukemia = load_leukemia()
df_lymphoma = load_lymphoma()

# Fit & plot leukemia dataset
kmf_leukemia = KaplanMeierFitter()
kmf_leukemia.fit(df_leukemia['t'], df_leukemia['Rx'], label="Leukemia")
ax = kmf_leukemia.plot()

# Fit & plot lymphoma dataset
kmf_lymphoma = KaplanMeierFitter()
kmf_lymphoma.fit(df_lymphoma['Time'], df_lymphoma['Censor'], label="Lymphoma")
ax = kmf_lymphoma.plot(ax=ax)

# Set Y axis to fixed scale
ax.set_ylim([0.0, 1.0])

Posted by Uli Köhler in Python, Statistics

How to make Y axis start from 0 in lifelines Kaplan-Meier plots

Using the lifelines library, you can easily plot Kaplan-Meier plots, e.g. as seen in our previous post Minimal Python Kaplan-Meier Plot example:

from lifelines.datasets import load_leukemia
from lifelines import KaplanMeierFitter
df = load_leukemia()

kmf = KaplanMeierFitter()
kmf.fit(df['t'], df['Rx']) # t = Timepoints, Rx: 0=censored, 1=event
kmf.plot()

What if you want to make the Y axis start from 0.0 and not from approx. 0.2 as in this example?

Remember that lifelines just calls matplotlib internally and km.plot() returns the ax object that you can use to manipulate the plot. In this specific case, you can use

ax.set_ylim([0.0, 1.0])

to stop autoranging the Y axis and set its range to fixed [0.0, 1.0].

Full example:

from lifelines.datasets import load_leukemia
from lifelines import KaplanMeierFitter
df = load_leukemia()

kmf = KaplanMeierFitter()
kmf.fit(df['t'], df['Rx']) # t = Timepoints, Rx: 0=censored, 1=event
ax = kmf.plot()
# Set Y axis range to [0.0, 1.0]
ax.set_ylim([0.0, 1.0])

Posted by Uli Köhler in Python, Statistics

How to hide confidence interval in lifelines Kaplan-Meier plots

Using the lifelines library, you can easily plot Kaplan-Meier plots, e.g. as seen in our previous post Minimal Python Kaplan-Meier Plot example:

from lifelines.datasets import load_leukemia
from lifelines import KaplanMeierFitter
df = load_leukemia()

kmf = KaplanMeierFitter()
kmf.fit(df['t'], df['Rx']) # t = Timepoints, Rx: 0=censored, 1=event
kmf.plot()

What if you just want to show the dark blue curve and hide the light-blue confidence interval?

Easy: Just use km.plot(ci_show=False). ci in ci_show means confidence interval. Example:

from lifelines.datasets import load_leukemia
from lifelines import KaplanMeierFitter
df = load_leukemia()

kmf = KaplanMeierFitter()
kmf.fit(df['t'], df['Rx']) # t = Timepoints, Rx: 0=censored, 1=event
kmf.plot(ci_show=False)

Posted by Uli Köhler in Python, Statistics

Minimal Python Kaplan-Meier Plot example

Install the lifelines library:

sudo pip3 install lifelines

Now you can use this snippet to create a basic Kaplan-Meier plot from an example dataset included with the library:

from lifelines.datasets import load_leukemia
from lifelines import KaplanMeierFitter
# Load example dataset
df = load_leukemia()

# Create model from data
kmf = KaplanMeierFitter()
kmf.fit(df['t'], df['Rx']) # t = Timepoints, Rx: 0=censored, 1=event
# Plot model's survival curve
kmf.plot()

Note that different datasets might have different names for the time column (t in this example) and the event/censoring column (Rx in this example)

We exported the plot using

import matplotlib.pyplot as plt 
plt.savefig("Kaplan-Meier-Example-Leukemias.svg")

 

Posted by Uli Köhler in Python, Statistics

How to sort files by modification date in Python

You can use sorted() to sort the files, and use key=lambda t: os.stat(t).st_mtime to sort by modification time. This will sort with ascending modification time (i.e. most recently modified file last).

In case you want to sort with descending modification time (i.e. most recently modified file first), use key=lambda t: -os.stat(t).st_mtime

Full example:

# List all files in the home directory
files = glob.glob(os.path.expanduser("~/*"))

# Sort by modification time (mtime) ascending and descending
sorted_by_mtime_ascending = sorted(files, key=lambda t: os.stat(t).st_mtime)
sorted_by_mtime_descending = sorted(files, key=lambda t: -os.stat(t).st_mtime)

 

Posted by Uli Köhler in Python

How to read str from binary file in Python

When you have a file-like object in Python, .read() will always return bytes. You can use any of the following solutions to get a str from the binary file.

Option 1: Decode the bytes

You can call .decode() on the bytes. Ensure to use the right encoding. utf-8 is often correct if you don’t know what the encoding is, but

binary = myfile.read() # type: bytes
text = binary.decode("utf-8")

# Short version
text = myfile.read().decode("utf-8")

Option 2: Wrap the file so it appears like a file in text mode

Use io.TextIOWrapper like this:

import io

text_file = io.TextIOWrapper(myfile, encoding="utf-8")
text = text_file.read()
Posted by Uli Köhler in Python

Wrap binary file-like object to get decoded text file-like object in Python

Problem:

In Python, you have a file-like object that reads binary data (e.g. if you open a file using open("my-file", "rb"))

You want to pass that file-like object to a function that expects a file-like object that expects a file-like object in text mode (i.e. where you can read str from, not bytes).

Solution:

Use io.TextIOWrapper:

with open("fp-lib-table", "rb") as infile:
    # infile.read() would return bytes
    text_infile = io.TextIOWrapper(infile)
    # text-infile.read() would return a str

In case you need to use a specific encoding, use encoding=...:

text_infile = io.TextIOWrapper(infile, encoding="utf-8")

 

Posted by Uli Köhler in Python

Reading and writing KiCAD libraries in Python

This snippet provides a parser and serializer for KiCAD schematic symbol libraries and DCM symbol documentation libraries. It will parse the KiCAD libraries into a number of records, consisting of a list of lines. An extended version of this snippet has been used in my KiCADSamacSysImporter project

__author__ = "Uli Köhler"
__license__ = "CC0 1.0 Universal"

class KiCADDocLibrary(object):
    def __init__(self, records=[]):
        self.records = records
        
    @staticmethod
    def read(file):
        # A record is everything between '$CMP ...' line and '$ENDCMP' line
        records = []
        current_record = None
        for line in file:
            line = line.strip()
            if line.startswith('$CMP '):
                current_record = []
            # Add line to record if we have any current record
            if current_record is not None:
                current_record.append(line)
            if line.startswith("$ENDCMP"):
                if current_record is not None:
                    records.append(current_record)
                    current_record = None
        return KiCADDocLibrary(records)
        
    def write(self, out):
        # Write header
        out.write("EESchema-DOCLIB  Version 2.0\n")
        # Write records
        for rec in self.records:
            out.write("#\n")
            for line in rec:
                out.write(line)
                out.write("\n")
        # Write footer
        out.write("#\n#End Doc Library\n")        

        
class KiCADSchematicSymbolLibrary(object):
    def __init__(self, records=[]):
        self.records = records
        
    @staticmethod
    def read(file):
        # A record is everything between 'DEF ...' line and 'ENDDEF' line
        records = []
        current_record = None
        current_comment_lines = [] # 
        for line in file:
            line = line.strip()
            if line.startswith("#encoding"):
                continue # Ignore - we always use #encoding utf-8
            # Comment line processing
            if line.startswith("#"):
                current_comment_lines.append(line)
            # Start of record
            if line.startswith('DEF '):
                current_record = current_comment_lines
                current_comment_lines = []
            # Add line to record if we have any current record
            if current_record is not None:
                current_record.append(line)
            if line.startswith("ENDDEF"):
                if current_record is not None:
                    records.append(current_record)
                    current_record = None
            # Clear comment lines
            # We can only do this now to avoid clearing them before
            #  they are used in the DEF clause
            if not line.startswith("#"):
                current_comment_lines = []
        return KiCADSchematicSymbolLibrary(records)
        
    def write(self, out):
        # Write header
        out.write("EESchema-LIBRARY Version 2.4\n#encoding utf-8\n")
        # Write records
        for rec in self.records:
            for line in rec:
                out.write(line)
                out.write("\n")
        # Write footer
        out.write("#\n#End Library\n")        

 

Posted by Uli Köhler in Electronics, KiCAD, Python

Python example: List files in ZIP archive

This example lists all the filenames contained in a ZIP file using Python’s built-in zipfile library.

#!/usr/bin/env python3
import zipfile

def list_files_in_zip(filename):
    """
    List all files inside a ZIP archive
    Yields filename strings.
    """
    with zipfile.ZipFile(filename) as thezip:
        for zipinfo in thezip.infolist():
            yield zipinfo.filename
                
# Usage example
for filename in list_files_in_zip("myarchive.zip"):
    print(filename)

 

Posted by Uli Köhler in Python