Programming languages

Center star approximation: Identifying the center string in Python

Problem:

You need to calculate the center star approximation for a given set of sequences. Instead of calculating the sequence distances and center string by hand, you want the computer to do the hard work.
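As a rough idea of what this can look like, here is a minimal sketch, not the code from the full post. It assumes plain Levenshtein edit distance as the sequence distance and picks the input sequence with the smallest summed distance to all others. The function names `edit_distance` and `center_string` are made up for this illustration.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # match or substitution
            ))
        previous = current
    return previous[-1]


def center_string(sequences):
    """Return (center, total) for the sequence with the smallest summed distance."""
    totals = [0] * len(sequences)
    # Each unordered pair is compared once and counted for both sequences
    for (i, a), (j, b) in combinations(enumerate(sequences), 2):
        distance = edit_distance(a, b)
        totals[i] += distance
        totals[j] += distance
    best = min(range(len(sequences)), key=totals.__getitem__)
    return sequences[best], totals[best]
```

Calling `center_string(["ACGT", "ACGA", "ACCT", "TCGT"])` selects `"ACGT"`, which is one edit away from each of the other three sequences. An empty input list raises a `ValueError` in this sketch.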

Continue reading →

Posted by Uli Köhler in Bioinformatics, pandas, Python

Parsing NCBI GeneInfo in Python

Problem:

You need to parse files in the NCBI GeneInfo format, like those that can be downloaded from the NCBI FTP GENE_INFO directory, in Python. You want to avoid any dependencies.
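To illustrate the idea, here is a dependency-free sketch, not the parser from the full post. It assumes the tab-separated gene_info layout in which the first ten columns are tax_id, GeneID, Symbol, LocusTag, Synonyms, dbXrefs, chromosome, map_location, description and type_of_gene, with `-` marking a missing value and `|` separating list entries. The names `GeneInfo` and `parse_gene_info` are hypothetical.

```python
from collections import namedtuple

# Assumption: only the first ten columns are kept, later columns are ignored
GeneInfo = namedtuple("GeneInfo", [
    "tax_id", "gene_id", "symbol", "locus_tag", "synonyms",
    "db_xrefs", "chromosome", "map_location", "description", "type_of_gene",
])


def _split_list(field):
    """Split a '|'-separated field; '-' means the field is empty."""
    return [] if field == "-" else field.split("|")


def parse_gene_info(lines):
    """Lazily yield one GeneInfo tuple per data line."""
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and the '#'-prefixed header line
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 10:
            raise ValueError("Expected at least 10 columns: " + line)
        yield GeneInfo(
            tax_id=int(fields[0]),
            gene_id=int(fields[1]),
            symbol=fields[2],
            locus_tag=None if fields[3] == "-" else fields[3],
            synonyms=_split_list(fields[4]),
            db_xrefs=_split_list(fields[5]),
            chromosome=fields[6],
            map_location=fields[7],
            description=fields[8],
            type_of_gene=fields[9],
        )
```

Because the function is a generator, it can be fed an open file object directly and never holds more than one line in memory.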

Continue reading →

Posted by Uli Köhler in Bioinformatics, Python

A simple GFF3 parser in Python

Problem:

You need to parse a GFF3 file containing information about sequence features. You prefer a minimal, dependency-free solution over importing the GFF3 data into a database right away. However, you need a standards-compliant parser.
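The following is a minimal sketch of such a parser, not the one from the full post. It assumes the nine tab-separated GFF3 columns (seqid, source, type, start, end, score, strand, phase, attributes), `.` for undefined values and URL-style percent-encoding. The names `GFFRecord`, `parse_attributes` and `parse_gff3` are chosen for this example only, and multi-valued attributes are left as a single comma-separated string for simplicity.

```python
from collections import namedtuple
from urllib.parse import unquote

GFFRecord = namedtuple("GFFRecord", [
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
])


def parse_attributes(column):
    """Parse 'key=value;key=value' into a dict; '.' means no attributes."""
    if column == ".":
        return {}
    attributes = {}
    for pair in column.split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        attributes[unquote(key)] = unquote(value)
    return attributes


def parse_gff3(lines):
    """Lazily yield one GFFRecord per feature line."""
    for line in lines:
        line = line.rstrip("\r\n")
        # Everything after the ##FASTA directive is sequence data
        if line.startswith("##FASTA"):
            break
        # Skip blank lines, comments and other directives
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) != 9:
            raise ValueError("Expected 9 columns, got %d" % len(parts))
        seqid, source, ftype, start, end, score, strand, phase, attrs = parts
        yield GFFRecord(
            seqid=unquote(seqid),
            source=None if source == "." else unquote(source),
            type=unquote(ftype),
            start=int(start),
            end=int(end),
            score=None if score == "." else float(score),
            strand=None if strand == "." else strand,
            phase=None if phase == "." else int(phase),
            attributes=parse_attributes(attrs),
        )
```

Usage is a plain loop, for example `for record in parse_gff3(open(path)): ...`, where `path` is whatever GFF3 file you have at hand.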

Continue reading →

Posted by Uli Köhler in Bioinformatics, Python

Haskell: Compress with GZip and write to file

Problem:

In Haskell, you want to gzip-compress a string and write it to a file.

Continue reading →

Posted by Uli Köhler in Haskell

A GeneOntology OBO v1.2 parser in Python

The GeneOntology Consortium provides bulk data download for the GO terms in the OBO v1.2 format.

If you Google GO OBO parser, something is missing: you can easily find parsers in Perl and in Java, but not even BioPython has one in Python. The format itself, however, seems tailor-made for Python’s generator concept. Only a few SLOCs are needed to get it working without storing everything in RAM.

I used this parser in a prototype project that allows searching GO interactively (it’s fast). I’m not sure when/if I’ll publish that, but here is the parser code.
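To show why generators fit so well, here is a small sketch of the approach. It is an illustration under simplifying assumptions, not the parser from the full post: it only yields `[Term]` stanzas, keeps every tag as a list of raw string values (including any trailing `! comment`), and the name `parse_obo_terms` is made up here.

```python
from collections import defaultdict


def parse_obo_terms(lines):
    """Lazily yield one dict (tag -> list of values) per [Term] stanza."""
    current = None  # None while in the header or in a non-Term stanza
    for raw_line in lines:
        line = raw_line.strip()
        # Skip blank lines and whole-line comments
        if not line or line.startswith("!"):
            continue
        # A new stanza header finishes the previous stanza
        if line.startswith("[") and line.endswith("]"):
            if current is not None:
                yield dict(current)
            current = defaultdict(list) if line == "[Term]" else None
            continue
        if current is None:
            continue
        tag, separator, value = line.partition(":")
        if separator:
            current[tag.strip()].append(value.strip())
    # Emit the last stanza of the file
    if current is not None:
        yield dict(current)
```

Since only the stanza currently being read is held in memory, iterating over a large OBO file this way needs very little RAM.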

Continue reading →

Posted by Uli Köhler in Bioinformatics, Python

C++: Iterating lines in a GZ file using boost::iostreams

Problem:

You’ve got a gzipped file that you want to decompress using C++. You don’t want to use pipes to gzip in an external process. You don’t want to use zlib and manual buffering either.

Continue reading →

Posted by Uli Köhler in C/C++

Reading QUASAR scoring matrices in C++

Problem:

You want to read alignment matrices like BLOSUM62 in the QUASAR format. The solution needs to integrate easily into C++ code.

Continue reading →

Posted by Uli Köhler in Bioinformatics, C/C++

A simple tool for FASTA statistics

The issue

It is surprisingly difficult to compute simple statistics of FASTA files using existing software. I recently needed to compute the nucleotide count and relative GC frequency of a single sequence in FASTA format, but unless you install dependency-heavy native software like FASTX or develop it yourself using BioPython or similar, there doesn’t seem to be a simple, dependency-free solution for this simple set of problems.
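For a sense of how little code this needs, here is a minimal sketch that uses only the standard library. It is not the tool from the full post; the function names `fasta_statistics` and `gc_fraction` are chosen for this example, and characters are counted case-insensitively.

```python
from collections import Counter


def fasta_statistics(lines):
    """Return {sequence header: Counter of characters} for FASTA input lines."""
    stats = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: start counting a new sequence
            name = line[1:]
            stats[name] = Counter()
        elif name is not None:
            stats[name].update(line.upper())
    return stats


def gc_fraction(counts):
    """Relative GC frequency: (G + C) divided by all counted characters."""
    total = sum(counts.values())
    return (counts["G"] + counts["C"]) / total if total else 0.0
```

A typical call would be `stats = fasta_statistics(open(path))`, followed by `gc_fraction(stats[header])` for the sequence you are interested in.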

Continue reading →

Posted by Uli Köhler in Bioinformatics, Python

A simple mmap() readonly example

Problem:

You want to use mmap() from the sys/mman.h POSIX header to map a file for reading (not writing). You can’t find any simple bare example on the internet.

Continue reading →

Posted by Uli Köhler in C/C++

How to use mkdir() from sys/stat.h

Problem:

You want to use the mkdir() function from the sys/stat.h POSIX header, but you don’t know what the mode_t argument should look like.

Continue reading →

Posted by Uli Köhler in C/C++

mmap with Boost IOStreams: A minimalist’s example

The following C++ program uses boost::iostreams to memory-map a file, read its content into a std::string and print it to cout.

It provides a minimal example of how to use the boost::iostreams portable mmap functionality.

//Compile like this: g++ -o mmap mmap.cpp -lboost_iostreams
#include <boost/iostreams/device/mapped_file.hpp>
#include <iostream>
#include <string>
using namespace std;
using namespace boost::iostreams;

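int main(int argc, char** argv) {
   //The file to map is expected as the first argument
   if (argc < 2) {
      cerr << "Usage: " << argv[0] << " <file>" << endl;
      return 1;
   }
   //Initialize the memory-mapped file
   mapped_file_source file(argv[1]);
   //Read the entire file into a string
   string fileContent(file.data(), file.size());
   //Print the string
   cout << fileContent;
   //Cleanup
   file.close();
   return 0;
}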

Also see A simple mmap() readonly example

Posted by Uli Köhler in C/C++

Reading TAR files in C++

This article describes a method of reading TAR archives (including .tar.gz and .tar.bz2) in C++ using Boost IOStreams.

You could use libtar for this, but the original version hasn’t been updated since 2003 and doesn’t give you flexibility or insight into the internal structure of a TAR archive.

Continue reading →

Posted by Uli Köhler in Algorithms, C/C++

Simple C++ HTTP download using libcurl easy API

Problem

Using the libcurl easy API, you want to download a file using HTTP GET. No extended features such as authentication shall be used.

The download result shall be stored in a std::string.

Continue reading →

Posted by Uli Köhler in C/C++, Networking

How to compile & install libc++ on Linux

Problem:

You want to compile and install libc++ (sometimes also named libcxx), but CMake complains with this error message:

CMake Error at cmake/Modules/MacroEnsureOutOfSourceBuild.cmake:7 (message):
 libcxx requires an out of source build. Please create a separate
 build directory and run 'cmake /path/to/libcxx [options]' there.
Call Stack (most recent call first):
 CMakeLists.txt:24 (MACRO_ENSURE_OUT_OF_SOURCE_BUILD)
CMake Error at cmake/Modules/MacroEnsureOutOfSourceBuild.cmake:8 (message):
 In-source builds are not allowed.

CMake would overwrite the makefiles distributed with Compiler-RT.
 Please create a directory and run cmake from there, passing the path
 to this source directory as the last argument.
 This process created the file `CMakeCache.txt' and the directory `CMakeFiles'.
 Please delete them.
Call Stack (most recent call first):
 CMakeLists.txt:24 (MACRO_ENSURE_OUT_OF_SOURCE_BUILD)

Continue reading →

Posted by Uli Köhler in Build systems, C/C++, Linux

git svn: Clone latest revision only

Problem:

You want to use git-svn to clone an SVN repository, but instead of cloning the entire history (which can be quite slow), you only want the latest revision.

Continue reading →

Posted by Uli Köhler in git, Shell, Subversion, Version management

SVN: Find last revision number without cloning

Problem:

You want to find out the last revision number of a remote Subversion repository without cloning it (e.g. because cloning takes a looong time with Subversion).

Continue reading →

Posted by Uli Köhler in Shell, Subversion, Version management