Problem:
In Haskell, you want to gzip-compress a string and write it to a file.
The Gene Ontology Consortium provides bulk downloads of the GO terms in the OBO v1.2 format.
If you Google “GO OBO parser”, something is missing. You can easily find parsers in Perl and in Java, but not even BioPython has a parser in Python. The format itself, however, seems tailor-made for Python’s generator concept. Only a few SLOC are needed to get it to work without storing everything in RAM.
I used this parser in a prototype project that allows searching GO interactively (it’s fast). I’m not sure when/if I’ll publish that, but here is the parser code.
You’ve got a gzipped file that you want to decompress using C++. You don’t want to use pipes to gzip in an external process. You don’t want to use zlib and manual buffering either.
You want to read alignment matrices like BLOSUM62 in the QUASAR format. The solution needs to be easy to integrate into C++ code.
It is surprisingly difficult to compute simple statistics of FASTA files using existing software. I recently needed to compute the nucleotide count and relative GC frequency of a single sequence in FASTA format, but unless you install dependency-heavy native software like FASTX or write it yourself using BioPython or similar, there doesn’t seem to be a simple, dependency-free solution for this simple set of problems.
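As an illustration of how small such a tool can be, here is a minimal sketch in standard C++ with no external dependencies. It is not the solution from the original post. The names FastaStats and computeFastaStats are hypothetical, and the sketch assumes that the GC frequency is defined as (G + C) / (A + C + G + T), with any other symbol (such as N) counted separately and left out of the ratio.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Per-nucleotide counts for the sequence data of a FASTA stream.
struct FastaStats {
    std::size_t a = 0, c = 0, g = 0, t = 0, other = 0;

    std::size_t acgt() const { return a + c + g + t; }

    // Assumption: GC frequency = (G + C) / (A + C + G + T).
    double gcFraction() const {
        return acgt() == 0 ? 0.0 : static_cast<double>(g + c) / acgt();
    }
};

// Reads FASTA text line by line, so the sequence is never held in RAM as a whole.
FastaStats computeFastaStats(std::istream& in) {
    FastaStats stats;
    std::string line;
    while (std::getline(in, line)) {
        // Header lines start with '>' and carry no sequence data
        if (!line.empty() && line[0] == '>') continue;
        for (unsigned char ch : line) {
            if (std::isspace(ch)) continue; // also skips a trailing '\r'
            switch (std::toupper(ch)) {
                case 'A': ++stats.a; break;
                case 'C': ++stats.c; break;
                case 'G': ++stats.g; break;
                case 'T': ++stats.t; break;
                default:  ++stats.other; break;
            }
        }
    }
    return stats;
}

// Convenience overload for FASTA data that is already in memory.
FastaStats computeFastaStats(const std::string& fastaText) {
    std::istringstream in(fastaText);
    return computeFastaStats(in);
}
```

To process a file, open it with a std::ifstream and pass the stream to computeFastaStats(). If the file contains several records, this sketch simply sums the counts over all of them.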
You want to use mmap() from the sys/mman.h POSIX header to map a file for reading (not writing). You can’t find any simple bare example on the internet.
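A minimal sketch of such a read-only mapping might look like the following. Note that mmap() itself is declared in sys/mman.h. The function name readFileWithMmap is hypothetical, and copying the mapped bytes into a std::string is only done to keep the example short; real code would usually work on the mapping directly.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cstddef>
#include <stdexcept>
#include <string>

// Maps the file read-only and returns a copy of its contents.
std::string readFileWithMmap(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) throw std::runtime_error("open() failed");

    // mmap() needs the length of the mapping, so ask fstat() for the file size
    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        throw std::runtime_error("fstat() failed");
    }
    // Mapping zero bytes is an error, so handle empty files separately
    if (st.st_size == 0) {
        close(fd);
        return std::string();
    }

    const std::size_t length = static_cast<std::size_t>(st.st_size);
    void* data = mmap(nullptr, length, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); // the mapping stays valid after the descriptor is closed
    if (data == MAP_FAILED) throw std::runtime_error("mmap() failed");

    std::string content(static_cast<const char*>(data), length);
    munmap(data, length);
    return content;
}
```

PROT_READ together with O_RDONLY makes the mapping read-only; any attempt to write through the pointer would crash the program instead of modifying the file.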
You want to use stat() from the sys/stat.h POSIX header in order to get the size of a file.
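One way to do this is sketched below. The helper name fileSize and the convention of returning -1 on failure are assumptions made for this illustration.

```cpp
#include <sys/stat.h>
#include <sys/types.h>

#include <cstdio>

// Returns the size of the file in bytes, or -1 if stat() fails
// (for example because the file does not exist).
long long fileSize(const char* path) {
    struct stat st;
    if (stat(path, &st) != 0) {
        std::perror("stat"); // prints the reason to stderr
        return -1;
    }
    // st_size has type off_t; widen it for a stable return type
    return static_cast<long long>(st.st_size);
}
```

For symbolic links, stat() reports the size of the link target; lstat() would report the size of the link itself.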
You want to use the mkdir() function from the sys/stat.h POSIX header, but you don’t know what the mode_t argument should look like.
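The mode_t argument is a bitmask of permission flags that are also defined in sys/stat.h. The following sketch builds the common 0755 mode (rwxr-xr-x) from those flags. The wrapper name makeDirectory is hypothetical, and 0755 is only one reasonable choice.

```cpp
#include <sys/stat.h>
#include <sys/types.h>

#include <cstdio>

// Creates a directory with rwxr-xr-x permissions.
// Returns false if mkdir() fails, e.g. because the path already exists.
bool makeDirectory(const char* path) {
    // Owner: read/write/execute; group and others: read/execute.
    // This is the same value as the octal literal 0755.
    const mode_t mode = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;
    if (mkdir(path, mode) != 0) {
        std::perror("mkdir"); // prints the reason to stderr
        return false;
    }
    return true;
}
```

The permissions that actually end up on the directory are the requested mode with the bits of the process umask removed, so with a umask of 022 the result is still 0755, while a umask of 077 yields 0700.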
The following C++ program uses boost::iostreams to memory-map a file, read its contents into a std::string and print them to cout.
It provides a minimal example of how to use the portable boost::iostreams mmap functionality.
//Compile like this: g++ -o mmap mmap.cpp -lboost_iostreams
#include <boost/iostreams/device/mapped_file.hpp>
#include <iostream>
#include <string>
using namespace std;
using namespace boost::iostreams;

int main(int argc, char** argv) {
    //The file to map is expected as the first argument
    if(argc < 2) {
        cerr << "Usage: " << argv[0] << " <file>" << endl;
        return 1;
    }
    //Initialize the memory-mapped file
    mapped_file_source file(argv[1]);
    //Read the entire file into a string
    string fileContent(file.data(), file.size());
    //Print the string
    cout << fileContent;
    //Cleanup
    file.close();
    return 0;
}
Also see A simple mmap() readonly example
This article describes a method of reading TAR archives (including .tar.gz and .tar.bz2) in C++ using Boost IOStreams.
You could use libtar for this, but the original version hasn’t been updated since 2003 and gives you neither flexibility nor insight into the internal structure of a TAR archive.
Using the libcurl easy API you want to download a file using HTTP GET. No extended features such as authentication shall be used.
The download result shall be stored in a std::string.
You want to compile and install libc++ (sometimes also named libcxx), but CMake complains with this error message:
CMake Error at cmake/Modules/MacroEnsureOutOfSourceBuild.cmake:7 (message):
libcxx requires an out of source build. Please create a separate
build directory and run 'cmake /path/to/libcxx [options]' there.
Call Stack (most recent call first):
CMakeLists.txt:24 (MACRO_ENSURE_OUT_OF_SOURCE_BUILD)
CMake Error at cmake/Modules/MacroEnsureOutOfSourceBuild.cmake:8 (message):
In-source builds are not allowed.
CMake would overwrite the makefiles distributed with Compiler-RT.
Please create a directory and run cmake from there, passing the path
to this source directory as the last argument.
This process created the file `CMakeCache.txt' and the directory `CMakeFiles'.
Please delete them.
Call Stack (most recent call first):
CMakeLists.txt:24 (MACRO_ENSURE_OUT_OF_SOURCE_BUILD)
You want to use git-svn to clone an SVN repository, but you don’t want to clone the entire history (which can be quite slow), only the latest revision.
You want to find out what the last revision number of a remote subversion repository is without cloning it (e.g. because cloning takes a looong time with subversion).
Some time ago I wrote a guide on how to compile and install LevelDB on Linux.
Recently I’ve been trying hard to get into LLVM, and a tutorial series on how to use LLVM with C/C++ is coming shortly.
As I’m using LevelDB in many of my projects, I’d like a way of generating an LLVM IR (intermediate representation) of the LevelDB C++ source. I could link an LLVM program to the native binary, but in order to profit from LLVM’s features I suppose using IRs for as many dependencies as possible is the way to go.
Generally there are two ways to go:
I usually use clang++ for LLVM tasks because, even with colorgcc and some recent improvements in gcc error message generation, I prefer the clang++ error messages, although I have far more experience with gcc error messages. Additionally, g++ with the LLVM backend seems to have some bugs, including interpreting -emit-llvm as -e -m -i …. Recent distribution versions also don’t work too well with the LLVM gold plugin, and it has proved difficult to reliably tell GCC to use llvm-ld as the linker.
You encounter a GCC error message of the form
error: default argument for parameter of type <some type> has type <some other type>
In C++ you want to check if a given file exists, but you can’t use stat() because your code needs to work cross-platform.
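One portable approach, sketched here with nothing but the standard library, is to try opening the file for reading. The function name fileExists is hypothetical. Strictly speaking this checks whether the file can be opened, so a file that exists but is not readable is reported as missing.

```cpp
#include <fstream>
#include <string>

// Returns true if the file can be opened for reading.
// Uses only the C++ standard library, so it works on any platform.
bool fileExists(const std::string& path) {
    std::ifstream file(path.c_str());
    return file.good();
}
```

If a C++17 compiler is available, std::filesystem::exists() from the filesystem header is an alternative that tests for existence without opening the file.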
You have installed zsh but you don’t see the menu-style autocompletion (where you can navigate the suggestions using the arrow keys on the keyboard)
The following small ANSI C99 program reads a string from stdin and prints out a Brainfuck program that prints the same string on stdout.
Compile using gcc -o bf bf.c
Use it like this:
cat my.txt | ./bf > my.bf
Source code:
#include <stdio.h>

int main(int argc, char** argv) {
    int c; //int, so that EOF can be told apart from any byte value
    unsigned char curval = 0; //Current value of the single cell we use
    while((c = getchar()) != EOF) {
        //Step the cell up or down until it holds the next byte
        while(curval != c) {
            if(curval < c) {
                putchar('+');
                curval++;
            } else {
                putchar('-');
                curval--;
            }
        }
        //Emit the Brainfuck instruction that prints the cell
        putchar('.');
    }
    return 0;
}
How does it work?
Basically it uses just one of the registers of the Brainfuck Turing machine and increments or decrements that register until it holds the next byte to print. It doesn’t use any of the more ‘advanced’ features in Brainfuck like loops.
In the Linux shell, you have a file path and you want to strip everything but the filename. For example, you have the path ../images/photo.jpg and want only photo.jpg.