C/C++

How to download & sync PubMed baseline + updates

In our previous post How to download PubMed baseline data using rsync we showed how to download PubMed’s baseline data. This dataset is updated only once a year. However, you can also download the updatefiles, which are typically published once per day.

The commands to download & sync both sets of files into the PubMed directory:

rsync -Pav --delete ftp.ncbi.nlm.nih.gov::pubmed/baseline/\*.xml.gz PubMed/
rsync -Pav --delete ftp.ncbi.nlm.nih.gov::pubmed/updatefiles/\*.xml.gz PubMed/

The --delete option will ensure that files that are deleted on the server will also be deleted locally. For example, when a new baseline dataset is being published, you need to delete the old year’s files to avoid having to process duplicate data.

Posted by Uli Köhler in Bioinformatics, C/C++

How to control boost::iostreams gzip compression level

In our previous post How to gzip-compress on-the-fly in C++ using boost::iostreams we showed how to create a gzip-compressing output stream using the boost::iostreams library.

This example shows how to control the compression level of gzip_compressor:

Instead of constructing boost::iostreams::gzip_compressor() without arguments, use boost::iostreams::gzip_params(level) as the argument, where level (1..9) represents the compression level with 9 representing the highest compression level and 1 representing the lowest compression level. Higher levels of compression lead to reduced filesizes but are slower (i.e. consume more CPU time) during compression.

If filesize matters to you, I recommend choosing level 9 since compression even with the high level is extremely fast on modern computers.

Full example:

#include <fstream>
#include <iostream>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/gzip.hpp>
using namespace std;

int main(int argc, char** argv) {
    if(argc < 2) {
        cerr << "Usage: " << argv[0] << " <output .gz file>" << endl;
        return 1; // Don't continue without a filename
    }
    //Read filename from the first command line argument
    ofstream file(argv[1], ios_base::out | ios_base::binary);
    boost::iostreams::filtering_streambuf<boost::iostreams::output> outbuf;
    outbuf.push(boost::iostreams::gzip_compressor(
        boost::iostreams::gzip_params(9)
    ));
    outbuf.push(file);
    //Convert streambuf to ostream
    ostream out(&outbuf);
    //Write some test data
    out << "This is a test text!\n";
    //Cleanup
    boost::iostreams::close(outbuf); // Don't forget this!
    file.close();
}

CMakeLists.txt:

cmake_minimum_required(VERSION 3.0)
find_package(Boost 1.36.0 COMPONENTS iostreams)

include_directories(${Boost_INCLUDE_DIRS})
add_executable(iostreams-gz-compress iostreams-gz-compress.cpp)
target_link_libraries(iostreams-gz-compress ${Boost_LIBRARIES})

 

Posted by Uli Köhler in C/C++

How to check if XML element exists in PugiXML

Checking if an element exists in PugiXML is simple: Just call bool(element) or use the element directly inside an if clause:

// Example on using bool(element)
cout << "<root-element> exists: " << std::boolalpha
     << bool(doc.child("root-element")) << endl;
cout << "<not-root-element> exists: " << std::boolalpha
     << bool(doc.child("not-root-element")) << endl;

// Example on using the element directly inside an if clause
if(doc.child("root-element")) {
    cout << "Yes, <root-element> exists!" << endl;
}

Full example:

#include <iostream>
#include <pugixml.hpp>
using namespace std;
using namespace pugi;

int main() {
    xml_document doc;
    xml_parse_result result = doc.load_file("test.xml");

    // Example on using bool(element)
    cout << "<root-element> exists: " << std::boolalpha
         << bool(doc.child("root-element")) << endl;
    cout << "<not-root-element> exists: " << std::boolalpha
         << bool(doc.child("not-root-element")) << endl;

    // Example on using the element directly inside an if clause
    if(doc.child("root-element")) {
        cout << "Yes, <root-element> exists!" << endl;
    }
}

CMakeLists.txt:

add_executable(pugixml-example pugixml-example.cpp)
target_link_libraries(pugixml-example pugixml)

test.xml:

<?xml version="1.0" encoding="UTF-8"?>
<root-element>Test text</root-element>

Compile using

cmake .
make
Posted by Uli Köhler in C/C++

How to gzip-compress on-the-fly in C++ using boost::iostreams

This minimal example shows you how to write data to a .gz file in C++, compressing the data on-the-fly using boost::iostreams. Using the iostreams layer, as opposed to a block-based approach like using zlib directly, gives you the full power and ease of use of std::ostream.

#include <fstream>
#include <iostream>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/gzip.hpp>
using namespace std;

int main(int argc, char** argv) {
    if(argc < 2) {
        cerr << "Usage: " << argv[0] << " <output .gz file>" << endl;
        return 1; // Don't continue without a filename
    }
    //Read filename from the first command line argument
    ofstream file(argv[1], ios_base::out | ios_base::binary);
    boost::iostreams::filtering_streambuf<boost::iostreams::output> outbuf;
    outbuf.push(boost::iostreams::gzip_compressor());
    outbuf.push(file);
    //Convert streambuf to ostream
    ostream out(&outbuf);
    //Write some test data
    out << "This is a test text!\n";
    //Cleanup
    boost::iostreams::close(outbuf); // Don't forget this!
    file.close();
}

CMakeLists.txt:

cmake_minimum_required(VERSION 3.0)
find_package(Boost 1.36.0 COMPONENTS iostreams)

include_directories(${Boost_INCLUDE_DIRS})
add_executable(iostreams-gz-compress iostreams-gz-compress.cpp)
target_link_libraries(iostreams-gz-compress ${Boost_LIBRARIES})

 

Posted by Uli Köhler in Boost, C/C++

How to decompress GZ files on-the-fly in C++ using boost::iostreams

This minimal example shows you how to open a .gz file in C++, decompress it on-the-fly using boost::iostreams and then copy its contents to stdout:

#include <fstream>
#include <iostream>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/gzip.hpp>
using namespace std;

int main(int argc, char** argv) {
    if(argc < 2) {
        cerr << "Usage: " << argv[0] << " <gzipped input file>" << endl;
        return 1; // Don't continue without a filename
    }
    //Read from the first command line argument, assume it's gzipped
    ifstream file(argv[1], ios_base::in | ios_base::binary);
    boost::iostreams::filtering_streambuf<boost::iostreams::input> inbuf;
    inbuf.push(boost::iostreams::gzip_decompressor());
    inbuf.push(file);
    //Convert streambuf to istream
    istream instream(&inbuf);
    //Copy everything from instream to stdout
    cout << instream.rdbuf();
    //Cleanup
    file.close();
}

CMakeLists.txt:

cmake_minimum_required(VERSION 3.0)
find_package(Boost 1.36.0 COMPONENTS iostreams)

include_directories(${Boost_INCLUDE_DIRS})
add_executable(iostreams-gz-decompress iostreams-gz-decompress.cpp)
target_link_libraries(iostreams-gz-decompress ${Boost_LIBRARIES})
Posted by Uli Köhler in Boost, C/C++

How to parse .xml.gz using PugiXML and boost::iostreams

In our previous post Minimal PugiXML file reader example we provided a short example of how to read from an uncompressed XML file using PugiXML. In practice, many large XML files are distributed as gzip-compressed .xml.gz files.

Since you can use boost::iostreams to decompress gzipped data on the fly and pipe it directly into PugiXML, you don’t need to store the uncompressed data on your hard drive.

#include <iostream>
#include <fstream>
#include <pugixml.hpp>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/gzip.hpp>
using namespace std;
using namespace pugi;

int main() {
    // Open "raw" gzipped data stream
    ifstream file("test.xml.gz", ios_base::in | ios_base::binary);
    // Configure decompressor filter
    boost::iostreams::filtering_streambuf<boost::iostreams::input> inbuf;
    inbuf.push(boost::iostreams::gzip_decompressor());
    inbuf.push(file);
    //Convert streambuf to istream
    istream instream(&inbuf);
    // Parse from stream
    xml_document doc;
    xml_parse_result result = doc.load(instream);
    // Print content of root element
    cout << "Load result: " << result.description() << "\n"
         << doc.child("root-element").child_value() // "Test text"
         << endl;
}

CMakeLists.txt:

cmake_minimum_required(VERSION 3.0)
find_package(Boost 1.36.0 COMPONENTS iostreams)

include_directories(${Boost_INCLUDE_DIRS})
add_executable(pugixml-example pugixml-example.cpp)
target_link_libraries(pugixml-example pugixml ${Boost_LIBRARIES})

test.xml:

<?xml version="1.0" encoding="UTF-8"?>
<root-element>Test text</root-element>

Download all three files and then run

gzip test.xml
cmake .
make
./pugixml-example

You should see an output like

Load result: No error
Test text

 

Posted by Uli Köhler in Boost, C/C++

How to fix RapidJSON segmentation faults when building nested Documents

Problem:

You want to build a RapidJSON application that creates JSON from scratch and nests Documents inside other Documents, but when you run it, you see an error message like

zsh: segmentation fault (core dumped)  ./rapidjson-example

Solution:

Segmentation faults (i.e. illegal memory accesses) can have many causes, but the most common one in this situation is the use of local allocators.

In order to fix the issue, use one allocator for your entire application.

MemoryPoolAllocator<> jsonAlloc; // I recommend declaring this statically

// ...
doc.AddMember("text", Value().SetString("Hello JSON!"), jsonAlloc);

Note that MemoryPoolAllocator never releases any memory from its memory pool.

Posted by Uli Köhler in C/C++

How to fix RapidJSON Assertion `!hasRoot_' failed.

Problem:

Your program is using RapidJSON but when running it you see an error message like

rapidjson-example: /usr/include/rapidjson/writer.h:452: void rapidjson::Writer<OutputStream, SourceEncoding, TargetEncoding, StackAllocator, writeFlags>::Prefix(rapidjson::Type) [with OutputStream = rapidjson::BasicOStreamWrapper<std::basic_ostream<char> >; SourceEncoding = rapidjson::UTF8<>; TargetEncoding = rapidjson::UTF8<>; StackAllocator = rapidjson::CrtAllocator; unsigned int writeFlags = 0]: Assertion `!hasRoot_' failed.

Solution:

You are using a Writer for more than one Document. While you can use the Stream backing the Writer for any number of documents, each Writer must only be used once!

To fix the issue, create a Writer instance (on the same output Stream) for each document you intend to write.

Posted by Uli Köhler in C/C++

How to create and serialize a document in RapidJSON

RapidJSON is a JSON library optimized for speed – hence it lacks some convenience features and easy-to-use documentation on how to create JSON documents from scratch.

Here’s how you can create a Document:

// Generate document: {"text": "Hello JSON!"}
Document doc;
doc.SetObject(); // Make doc an object !
doc.AddMember("text", "Hello JSON!", doc.GetAllocator());

Full example, which prints to cout:

#include <iostream>
#include <rapidjson/document.h>
#include <rapidjson/writer.h>
#include <rapidjson/ostreamwrapper.h>
using namespace rapidjson;
using namespace std;

int main() {
    // Generate document: {"text": "Hello JSON!"}
    Document doc;
    doc.SetObject(); // Make doc an object !
    doc.AddMember("text", "Hello JSON!", doc.GetAllocator());
    // Write to stdout
    OStreamWrapper out(cout);
    Writer<OStreamWrapper> writer(out);
    doc.Accept(writer);
}

 

Posted by Uli Köhler in C/C++

How to write JSON to cout in RapidJSON

RapidJSON does not provide a straightforward way of serializing JSON to cout (= stdout), but you can use OStreamWrapper to do that:

#include <rapidjson/writer.h>
#include <rapidjson/ostreamwrapper.h>
// ... 
OStreamWrapper out(cout);
Writer<OStreamWrapper> writer(out);
doc.Accept(writer);

Full example:

#include <iostream>
#include <rapidjson/document.h>
#include <rapidjson/writer.h>
#include <rapidjson/ostreamwrapper.h>
using namespace rapidjson;
using namespace std;

int main() {
    // Generate document: {"text": "Hello JSON!"}
    Document doc;
    doc.SetObject(); // Make doc an object !
    doc.AddMember("text", "Hello JSON!", doc.GetAllocator());
    // Write to stdout
    OStreamWrapper out(cout);
    Writer<OStreamWrapper> writer(out);
    doc.Accept(writer);
}

 

Posted by Uli Köhler in C/C++

How to iterate PugiXML children using C++11 foreach-loop

PugiXML allows you to use the C++11 for loop (also known as range-based for loop or foreach loop) to iterate the children of a node easily:

<?xml version="1.0" encoding="UTF-8"?>
<root-element>
    <sub>A</sub>
    <sub>B</sub>
    <sub>C</sub>
</root-element>

C++:

#include <iostream>
#include <pugixml.hpp>
using namespace std;
using namespace pugi;

int main() {
    xml_document doc;
    xml_parse_result result = doc.load_file("test.xml");

    for(const auto& child : doc.child("root-element")) {
        cout << child.child_value() << endl;
    }
}

CMakeLists.txt:

add_executable(pugixml-for pugixml-for.cpp)
target_link_libraries(pugixml-for pugixml)

Running the example will print

A
B
C
Posted by Uli Köhler in C/C++

Minimal PugiXML file reader example

XML:

<?xml version="1.0" encoding="UTF-8"?>
<root-element>Test text</root-element>

C++:

#include <iostream>
#include <pugixml.hpp>
using namespace std;
using namespace pugi;

int main() {
    xml_document doc;
    xml_parse_result result = doc.load_file("test.xml");
    
    cout << "Load result: " << result.description() << "\n"
         << doc.child("root-element").child_value() // "Test text"
         << endl;
}

Build configuration

add_executable(pugixml-example pugixml-example.cpp)
target_link_libraries(pugixml-example pugixml)

Compile using

cmake .
make

 

Posted by Uli Köhler in C/C++

How to decompress gzip files using zlib – a minimal example

zlib is a great library in use by hundreds of thousands of programs (including Python). However, it lacks documentation.

This minimal example shows you how to create a simple zcat-like program that decompresses a gzipped input file and prints its contents to stdout.

/**
 * zzcat.c -- Minimal gzip decompression example using zlib
 * Written by Uli Köhler (techoverflow.net).
   Not copyrighted -- provided to the public domain
 */
#include <stdio.h>
#include <zlib.h>

#define BUFSIZE 16384

/* decompress the gzipped input file (command line argument) to stdout */
int main(int argc, char **argv)
{
    if(argc <= 1) { // <= (number of expected CLI arguments)
        fprintf(stderr, "Usage: %s <input file>\n", argv[0]);
        return -1;
    }

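    gzFile fin = gzopen(argv[1], "rb");
    if(fin == NULL) { // gzopen returns NULL if the file cannot be opened
        fprintf(stderr, "Failed to open %s\n", argv[1]);
        return -1;
    }

    char buf[BUFSIZE];
    int n;

    // gzread returns the number of uncompressed bytes read, 0 at end of file
    while((n = gzread(fin, buf, BUFSIZE)) > 0) {
        fwrite(buf, 1, n, stdout);
    }

    gzclose(fin);
    return 0;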
}

Compile using

g++ -o zzcat zzcat.c -lz

Usage example:

# Create test file
echo "foo" | gzip -c > test.txt.gz
# Uncompress using zzcat! 
./zzcat test.txt.gz # This will print "foo"

 

Posted by Uli Köhler in C/C++

C/C++ program template with one CLI argument

This template shows a CLI executable in C/C++ that takes one command line argument and exits with a usage message if that argument is not present:

#include <stdio.h>

int main(int argc, char **argv)
{
    if(argc <= 1) { // <= (number of expected CLI arguments)
        fprintf(stderr, "Usage: %s <input file>\n", argv[0]);
        return -1;
    }
    
    // TODO Your code goes here!
    printf("Input file: %s\n", argv[1]);
    return 0;
}

Compile with

g++ -o cli-onearg cli-onearg.cpp

Usage example:

$ ./cli-onearg  
Usage: ./cli-onearg <input file>
$ ./cli-onearg my-input-file.txt
Input file: my-input-file.txt

 

Posted by Uli Köhler in C/C++

How to fix /usr/bin/ld: cannot find -lzlib

Problem:

You want to compile a C++ executable that uses zlib, but you see an error message like

/usr/bin/ld: cannot find -lzlib
collect2: error: ld returned 1 exit status

Solution:

Use -lz instead of -lzlib. The zlib library is named libz.so, not libzlib.so!

Example:

g++ -o ztest zpipe.c -lz

 

Posted by Uli Köhler in C/C++

How to make strcmp/strncmp case-insensitive in C/C++

To make strcmp case-insensitive, use strcasecmp from #include <strings.h>. strcasecmp can be used in exactly the same way as strcmp.

To make strncmp case-insensitive, use strncasecmp from #include <strings.h>. strncasecmp can be used in exactly the same way as strncmp.

Note that neither function handles Unicode characters correctly, but both will work just fine in most applications.

Posted by Uli Köhler in C/C++

boost::lexical_cast minimal example

#include <boost/lexical_cast.hpp>
#include <iostream>

int main() {
    int a = boost::lexical_cast<int>("123");
    int b = boost::lexical_cast<int>("456");
    
    int c = a + b;
    std::cout << c << std::endl; //Prints 579
}

 

Posted by Uli Köhler in Boost, C/C++

Computing distance between gp_Pnt and gp_Ax1 in OpenCASCADE

OCCUtils provides a convenience function for computing the distance between a point and an axis:

#include <occutils/Axis.hxx>
using namespace OCCUtils;

gp_Ax1 axis = /* ... */;
gp_Pnt pnt = /* ... */;
double distance = Axis::Distance(axis, pnt);

Alternatively, you can also use Point::Distance() which internally just calls Axis::Distance() but might make your code more readable under some circumstances. Note that the argument order is inverted!

#include <occutils/Point.hxx>
using namespace OCCUtils;

gp_Ax1 axis = /* ... */;
gp_Pnt pnt = /* ... */;
double distance = Point::Distance(pnt, axis);

In case you can’t use OCCUtils, here’s the code to do it manually:

double distance = gp_Lin(axis).Distance(pnt);
Posted by Uli Köhler in C/C++, OpenCASCADE

How to check if gp_Ax1 contains gp_Pnt in OpenCASCADE

OCCUtils provides a convenience function for checking whether an axis contains a point:

#include <occutils/Axis.hxx>
using namespace OCCUtils;

gp_Ax1 axis = /* ... */;
gp_Pnt point = /* ... */;
bool pointIsOnAxis = Axis::Contains(axis, point);

In case you can’t use OCCUtils, here’s the code to do it manually:

bool pointIsOnAxis = gp_Lin(axis).Contains(point, Precision::Confusion());
Posted by Uli Köhler in C/C++, OpenCASCADE