A GeneOntology OBO v1.2 parser in Python

The GeneOntology Consortium provides bulk data download for the GO terms in the OBO v1.2 format.

If you Google for a GO OBO parser, something is missing: you can easily find parsers in Perl and in Java, but not even BioPython has a parser in Python. The format itself, however, seems tailor-made for Python’s generator concept. Only a few SLOCs are needed to get it working without storing everything in RAM.
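To illustrate why generators fit the format, here is a minimal sketch (not the parser from the post). It assumes the usual OBO layout of stanza headers such as [Term] followed by "tag: value" lines, and the name parse_obo_terms is hypothetical. Each term is yielded as soon as its stanza ends, so only one term is held in memory at a time.

```python
def parse_obo_terms(lines):
    """Yield one dict per [Term] stanza, mapping each tag to a list of raw values."""
    current = None  # dict for the [Term] stanza being read, or None outside of one
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # A new stanza header closes the previous stanza
            if current is not None:
                yield current
            # Only [Term] stanzas are collected; others (e.g. [Typedef]) are skipped
            current = {} if line == "[Term]" else None
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            # Tags such as is_a may repeat, so values are kept in a list.
            # Trailing "! comment" parts are left untouched in this sketch.
            current.setdefault(tag, []).append(value)
    if current is not None:
        yield current
```

Because any iterable of lines works, an open file object can be passed directly, for example `for term in parse_obo_terms(open("go.obo")): ...`, where the file name is only a placeholder.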

I used this parser in a prototype project that allows searching GO interactively (it’s fast). I’m not sure when/if I’ll publish that, but here is the parser code.

Continue reading →

Posted by Uli Köhler in Bioinformatics, Python

Bash: TCP/Internet connection handling

Many problems today are easier to solve over the internet. In this post I describe how to address them using bash alone.

The standard way to connect to a server on the internet is to attach the connection stream to a file descriptor:

exec 3<>/dev/tcp/$server/$ircPort || echo "Some text or doing, if connecting failed"

Continue reading →

Posted by Yann Spöri in General

Installing Konsole Solarized Theme

Problem:

You’re using the KDE4 Konsole and you want to install the Solarized color scheme plugin. However, you are way too lazy to figure out how to do that manually.

Continue reading →

Posted by Uli Köhler in General

gtf2gff.py: A replacement for gtf2gff.pl

Recently we had to work with the gtf2gff.pl tool to convert CONTRAST and TwinScan GTF output to the GFF format which can be read by many annotation tools.

Working with that script was really hard: it did not report errors at all, and it is not programmatically reusable. There are different versions of the Perl script on the internet, but what we needed was a standardized, short, readable version that does proper command line parsing using a standard tool like argparse and provides a conversion function that is usable from other scripts.

Continue reading →

Posted by Uli Köhler in General

C++: Iterating lines in a GZ file using boost::iostreams

Problem:

You’ve got a gzipped file that you want to decompress using C++. You don’t want to use pipes to gzip in an external process. You don’t want to use zlib and manual buffering either.

Continue reading →

Posted by Uli Köhler in C/C++

Reading QUASAR scoring matrices in C++

Problem:

You want to read alignment matrices like BLOSUM62 in the QUASAR format. The solution needs to be integrable into C++ code easily.

Continue reading →

Posted by Uli Köhler in Bioinformatics, C/C++

A simple tool for FASTA statistics

The issue

It is surprisingly difficult to compute simple statistics of FASTA files using existing software. I recently needed to compute the nucleotide count and relative GC frequency of a single sequence in FASTA format, but unless you install dependency-heavy native software like FASTX or develop it yourself using BioPython or similar, there doesn’t seem to be a simple, dependency-free solution for this simple set of problems.
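As a rough idea of what such a dependency-free computation can look like, here is a minimal sketch using only the Python standard library. It is not the tool from the post; the function name fasta_stats and the returned keys are hypothetical, and all records in the input are counted together.

```python
from collections import Counter

def fasta_stats(lines):
    """Count nucleotides and compute the GC fraction over all sequence lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and FASTA header lines
        if not line or line.startswith(">"):
            continue
        # Count case-insensitively, since soft-masked sequences use lowercase
        counts.update(line.upper())
    total = sum(counts.values())
    gc = counts["G"] + counts["C"]
    return {
        "length": total,
        "counts": dict(counts),
        "gc_fraction": gc / total if total else 0.0,
    }
```

The function accepts any iterable of lines, so it can be fed an open file or a list of strings.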

Continue reading →

Posted by Uli Köhler in Bioinformatics, Python

Docker: Remove all images and containers

Problem:

You use Docker, but working with it created lots of images and containers. You want to remove all of them to save disk space.

Continue reading →

Posted by Uli Köhler in Container, Docker

Java FX: How to set the column constraint property using FXML

Problem:

You have a JavaFX FXML document with a TableView and want to set the columnResizePolicy of this TableView in the FXML document.

Continue reading →

Posted by Yann Spöri in General

A simple mmap() readonly example

Problem:

You want to use mmap() from the sys/mman.h POSIX header to map a file for reading (not writing). You can’t find any simple bare example on the internet.
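The post itself targets C. As a language-neutral illustration of the same idea, here is a minimal sketch using Python's standard mmap module, which wraps the POSIX call on Unix. The function name read_file_mmap is hypothetical, and the file is assumed to be non-empty, since mapping an empty file raises an error.

```python
import mmap

def read_file_mmap(path):
    """Return the contents of a non-empty file via a read-only memory mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ forbids writes through the mapping
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Slicing copies the mapped bytes so they outlive the mapping
            return mapped[:]
```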

Continue reading →

Posted by Uli Köhler in C/C++

How to create msgpack DEB packages

Problem:

You want to create a binary DEB package of the msgpack C++ binding. However, there is no official DEB package available.

Continue reading →

Posted by Uli Köhler in Linux

Checking if Hugepages are enabled in Linux

Problem:

You want to check whether transparent hugepages are enabled on your Linux system.

Solution:

It’s pretty simple:

cat /sys/kernel/mm/transparent_hugepage/enabled

You will get an output like this:

always [madvise] never

You’ll see a list of all possible options (always, madvise, never), with the currently active option enclosed in brackets. madvise is a common default, but this depends on how your kernel was configured.
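If you need the active option in a script, the bracketed entry can be extracted programmatically. The following is a minimal sketch; the function names active_option and read_thp_mode are hypothetical, and the path is the one shown above.

```python
import re

def active_option(text):
    """Return the option enclosed in brackets, or None if there is none."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None

def read_thp_mode(path="/sys/kernel/mm/transparent_hugepage/enabled"):
    """Read the sysfs file and return the currently active hugepage mode."""
    with open(path) as f:
        return active_option(f.read())
```

For the sample output above, active_option("always [madvise] never") returns "madvise".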

Continue reading →

Posted by Uli Köhler in Linux, Performance

How to build libsodium (NaCl) DEB packages

Problem:

You want to create a binary DEB package of libsodium. However, there is no official DEB package available for the latest version.

Continue reading →

Posted by Uli Köhler in Cryptography, Linux

User Mode Linux for Beginners – Setup and first VM

Why another UML tutorial?

This is not the first tutorial on UML — there are hundreds of them publicly available on the internet. However, none of them seems to fulfill my requirements:

  • All my computers and servers (= UML hosts) run on x86_64, not on i386.
  • Use an up-to-date (self-compiled) UML version, so you can use the latest features.
  • One simple set of scripts. Execute them in the correct order to get it up and running, with no config file editing etc.
  • Root should not be required (disregarding debootstrap), so no /mnt mounts or similar.
  • No nasty filesystem image that eats up space on the host and limits resources on the UML VM. HostFS eliminates all those disadvantages; if you want to limit VM space, use quotas.
  • No global state (e.g. mounting something into /mnt). You should be able to get another VM up and running concurrently by executing the setup scripts in another directory.
  • Step-by-step instructions, as automated and portable as possible, but still made to be read by humans.
  • No expansive techtalk. There are reference documentations out there for techtalk. A tutorial should tell you how to get it working, not why it works.

Continue reading →

Posted by Uli Köhler in Container, Linux

How to use mkdir() from sys/stat.h

Problem:

You want to use the mkdir() function from the sys/stat.h POSIX header, but you don’t know what the mode_t argument should look like.
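The mode argument is a bitmask of permission flags. The post covers the C function; as an illustration only, Python's standard stat module exposes flags with the same names as the sys/stat.h constants, and os.mkdir takes the mode in the same way. The names MODE_755 and make_dir in this sketch are hypothetical.

```python
import os
import stat

# rwx for the owner, r-x for group and others: the same value as octal 0755
MODE_755 = (stat.S_IRWXU
            | stat.S_IRGRP | stat.S_IXGRP
            | stat.S_IROTH | stat.S_IXOTH)

def make_dir(path, mode=MODE_755):
    """Create a directory; the process umask may clear some of the requested bits."""
    os.mkdir(path, mode)
```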

Continue reading →

Posted by Uli Köhler in C/C++

Building LevelDB Debian (.deb) packages

Problem:

You intend to install LevelDB, but you don’t want to compile and install it manually as described here.

Instead, you just want to use the debian packaging system and some reproducible method of creating a DEB package from LevelDB.

Reasons for preferring not to compile & install manually could be:

  • You want to deploy LevelDB to one or more environments that don’t have a complete build environment
  • You prefer a clean install-uninstall-purge package lifetime management
  • You need a reproducible process to deploy LevelDB

Continue reading →

Posted by Uli Köhler in General

mmap with Boost IOStreams: A minimalist’s example

The following C++ program uses boost::iostreams to memory-map a file, read its content into a std::string and print it to cout.

It provides a minimal example of how to use the boost::iostreams portable mmap functionality.

//Compile like this: g++ -o mmap mmap.cpp -lboost_iostreams
#include <boost/iostreams/device/mapped_file.hpp>
#include <iostream>
#include <string>
using namespace std;
using namespace boost::iostreams;

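int main(int argc, char** argv) {
   //Require the file to map as the first argument
   if (argc < 2) {
      cerr << "Usage: " << argv[0] << " <file>" << endl;
      return 1;
   }
   //Initialize the memory-mapped file
   mapped_file_source file(argv[1]);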
   //Read the entire file into a string
   string fileContent(file.data(), file.size());
   //Print the string
   cout << fileContent;
   //Cleanup
   file.close();
}

Also see A simple mmap() readonly example

Posted by Uli Köhler in C/C++

Reading TAR files in C++

This article describes a method of reading TAR archives (including .tar.gz and .tar.bz2) in C++ using Boost IOStreams.

You could use libtar for this, but the original version hasn’t been updated since 2003 and doesn’t give you flexibility or insight into the internal structure of a TAR archive.

Continue reading →

Posted by Uli Köhler in Algorithms, C/C++