Graph layouting via Graphviz


You want to display a Graph.


Create a simple text file describing your graph and save it with a .dot file extension:

graph {
node1 -- node2;
node2 -- node3;
node3 -- node4;
node4 -- node1;

Afterwards you can use a program of the graphviz package (sudo apt-get install graphviz) in order to visualize the graph. This package contains different layouting programs like dot, neato, fdp (all from the GraphViz project) etc. Simply call one of these programs in order to visualize the graph:

neato -Tsvg -o outputFile.svg

Output from this command:

Yet another Atom Arduino blinker

While experimenting with Atom I produced a minimal example for making my old Arduino Uno blink its LED periodically.

Although there are plenty of examples out there (some of them even work!) I didn’t want to introduce any dependency to the Arduino libraries. Besides making it harder to build (even if arscons is quite nice) it increases the object size and Arduino only supports a limited range of processors. I need the flexibility to use the code on controlles like the ATTiny45

Mehr lesen

Mapping STRING aliases to UniProt IDs

In a recent project, I needed to compare STRING records to other PPI databases. However, this is not always as easy as it sounds, because STRING uses KEGG protein identifiers. Fortunately, at the STRING download page, a list of alias mappings is freely downloadable.

There’s still one major problem left, though: I couldn’t find any documentation about the format. It seems to be somewhat easy once you’ve figured out the basics, but I created a reusable Python function that filters a given organism and outputs a STRING ID, UniProt ID CSV:

Mehr lesen

Filtering STRING PPI dumps by taxonomy

Recently I needed to filter a STRING protein-view database dump (e.g. protein.links.full.v9.05.txt.gz) by taxonomy ID. The original dataset was way too large (it had more than 670 million records).

In order to filter with constant memory (After all, the full STRING dump is 47GB large), I created this script that allows to filter for binary PPIs both matching the given organism (NCBI taxonomy ID), but also allows to filter for binary PPIs with at least one interacting protein of the given organism. Usually this doesn’t really make a difference for STRING.

Mehr lesen A replacement for

Recently we had to work with the tool to convert CONTRAST and TwinScan GTF output to the GFF format which can be read by many annotation tools.

Working with that script was really hard, it did not report errors at all, plus it is not programmatically reusable at all. There are different versions of the perl script on the internet, but what we needed was a standardized, short, readable version that does proper command line parsing using a standard tool like argparse and a conversion function that is usable from other scripts.

Mehr lesen