AutoBenchmark: Automatic multi-interval benchmarking in C++ using std::chrono


Some part of your C++ code is suffering from performance issues. You are looking for a lightweight solution that allows you to easily record different time points and adaptively print the results (i.e. you don’t want to know something ran for 1102564643 nanoseconds, you just want to now that it took 1.102 seconds)


I wrote AutoBenchmark so you can have the most hassle-free C++11 experience possible for your micro-benchmarking needs.

AutoBenchmark allows you to record different points in time, each with a label. The first time point is recorded when this instance is constructed. AutoBenchmark supports an arbitrary number of time points.
When an instance of this class is destructed, it will automatically print all the benchmark results, but only if a configurable amount of time has passed since its construction – this is extremely handy especially if you have multiple exit points in your function that would otherwise require calling Print() multiple times.
It allows you to ignore the benchmark when some performance goal is passed (e.g. if you have a for loop that is slow only for some datapoints, you can configure AutoBenchmark to print infos only for the slow runs).
The default behaviour (i.e. constructor with default parameters) is to disable automatic printing – in that case, you can call Print() yourself.

Header (AutoBenchmark.hpp):

 * AutoBenchmark v1.0
 * Written by Uli Köhler
 * Published under CC0 1.0 Universal
#pragma once

#include <chrono>
#include <string>
#include <limits>
#include <vector>

using namespace std;

 * Automatic benchmark: Allows you to record different points in time,
 * each with a label. The first time point is recorded when this instance
 * is constructed. This class supports an arbitrary number of time points.
 * When an instance of this class is destructed, it will automatically
 * print all the benchmark results, but only if a configurable amount
 * of time has passed since its construction, allowing you to automatically
class AutoBenchmark {
     * Initialize a benchmark that automatically prints its records
     * on destruction if the total time consumed is >= autoPrintThreshold
     * at the time of destruction. Only the time up until the last Record()ed
     * label is printed.
     * @param autoPrintThreshold: How many seconds will need to have passed
     * so that the destructor will automatically print. Default is to never print.
     * @param benchmarkLabel: A label that will be printed once, before all the results
     * @param lineLabel: A label (e.g. indent) that will be printed before each result line
    AutoBenchmark(double autoPrintThreshold=std::numeric_limits::max(), const std::string& benchmarkLabel = "", const std::string& lineLabel = "    ");
     * Record a datapoint
    void Record(const std::string& label = "");
    void Record(const char *label = "");
     * Print all time deltas
    void Print();
     * Reset the benchmark, as if it were a new instance.
    void Reset();

     * Return now() - first timepoint in seconds.
    double TotalSeconds();

    vector times;
    vector labels;
    double autoPrintThreshold;
    std::string benchmarkLabel;
    std::string lineLabel;

Source (AutoBenchmark.cpp):

#include "AutoBenchmark.hpp"

#include <iostream>

AutoBenchmark::AutoBenchmark(double autoPrintThreshold, const std::string& benchmarkLabel, const std::string& lineLabel)
    : autoPrintThreshold(autoPrintThreshold), benchmarkLabel(benchmarkLabel), lineLabel(lineLabel) {
    labels.emplace_back("Begin"); // Just to keep indices the same

AutoBenchmark::~AutoBenchmark() {
    if(TotalSeconds() >= autoPrintThreshold) {

void AutoBenchmark::Record(const std::string &label) {
    labels.emplace_back(label); // Just to keep indices the same

void AutoBenchmark::Record(const char *label) {

void AutoBenchmark::Print() {
    if(benchmarkLabel.length()) {
        cout << benchmarkLabel << '\n';
    for (size_t i = 1; i < times.size(); i++) {
        // Compute time interval for size comparison
        chrono::duration<double, std::nano> ns = times[i] - times[i - 1];
        chrono::duration<double, std::micro> us = times[i] - times[i - 1];
        chrono::duration<double, std::milli> ms = times[i] - times[i - 1];
        chrono::duration s = times[i] - times[i - 1];
        chrono::duration<double, std::ratio<60>> min = times[i] - times[i - 1];
        chrono::duration<double, std::ratio<3600>> hrs = times[i] - times[i - 1];
        // Print
        if(ns.count() < 1000.0) {
            cout << lineLabel << labels[i] << " took " << ns.count() << " ns\n";
        } else if(us.count() < 1000.0) {
            cout << lineLabel << labels[i] << " took " << us.count() << " μs\n";
        } else if (ms.count() < 1000.0) {
            cout << lineLabel << labels[i] << " took " << ms.count() << " ms\n";
        } else if (s.count() < 60.0) {
            cout << lineLabel << labels[i] << " took " << s.count() << " seconds\n";
        } else if (min.count() < 1000.0) {
            cout << lineLabel << labels[i] << " took " << min.count() << " minutes\n";
        } else {
            cout << lineLabel << labels[i] << " took " << hrs.count() << " hours\n";
    cout << flush;

void AutoBenchmark::Reset() {

double AutoBenchmark::TotalSeconds() {
    chrono::duration s = chrono::system_clock::now() - times[0];
    return s.count();

Usage example:

#include "AutoBenchmark.hpp"

void MySlowFunction() {
    // Every run that takes >= 0.3 seconds will auto-print
    AutoBenchmark myBenchmark(0.3, "Results of running MySlowFunction():");
    // .. do task 1 ...
    myBenchmark.Record("Running task 1"); // will print as: Running task 1 took 1.2ms
    // .. do task 2 ...
    myBenchmark.Record("Running task 2");

    // Loop example
    for(size_t i = 0; ....) {
        // ... do loop iteration task here ...
        myBenchmark.Record("Loop iteration " + std::to_string(i));

    // myBenchmark will be destructed here, so if MySlowFunction() took
    // more than 0.3s to run until it returned, the result will be printed
    // to cout automatically.

If MySlowFunction() took more than 0.3s to run overall, AutoBenchmark will print the results when it is destructed – i.e. when MySlowFunction( ) returns:

Results of running MySlowFunction():
    Running task 1 took 260.826 ms
    Running task 2 took 36.148 μs
    Loop iteration 0 took 2.5522 seconds
    Loop iteration 1 took 664.059 ms
    Loop iteration 2 took 22.2772 ms
    Loop iteration 3 took 57.4024 ms
    Loop iteration 4 took 16.9928 ms
    Loop iteration 5 took 14.0497 ms
    Loop iteration 6 took 62.5218 ms


ISO8601 UTC time as std::string using C++11 chrono

You want to use the C++11 standard’s chrono library to generate a ISO8601-formatted timestamp as a std::string, e.g. 2018-03-30T16:51:00Z


You can use this function which uses std::put_time with a std::ostringstream to generate the resulting std::string.

#include <iostream>
#include <chrono>
#include <iomanip>
#include <sstream>

 * Generate a UTC ISO8601-formatted timestamp
 * and return as std::string
std::string currentISO8601TimeUTC() {
  auto now = std::chrono::system_clock::now();
  auto itt = std::chrono::system_clock::to_time_t(now);

  std::ostringstream ss;
  ss << std::put_time(gmtime(&itt), "%FT%TZ");
  return ss.str();

// Usage example
int main() {
    std::cout << currentISO8601TimeUTC() << std::endl;


How to fix GCC error ‘the lambda has no capture-default’

When encountering a GCC error like this:

error: the lambda has no capture-default

fixing it is usually quite easy. Look for a Lambda function that captures some variable like this

[&myVar] (/* ... */) {/* ... */}

&myVar means “capture myVar by reference”.

In most cases you can just capture all local variables by using a capture default:

[&] (/* ... */) {/* ... */}

In rare cases this will have unintended side-effects as you now are capturing all variables by reference where you might want to capture some by copy – so be sure to check your code.

Note that this error is GCC version-dependent. For me using GCC 7.2 fixed the error.

How to fix GCC error: invalid use of incomplete type ‘class …’


You are compiling a C/C++ program using GCC. You get an error message similar to this:

error: invalid use of incomplete type ‘class SomeType’


There are multiple possible issues, but in general this error means that GCC can’t find the full declaration of the given class or struct.

The most common issue is that you are missing an #include clause. Find out in which header file the declaration resides, i.e. if the error message mentions class Map, look for something like

class Map {
   // ...

Usually the classes reside in header files that are similar to their name, e.g. MyClass might reside in a header file that is called MyClass.h, MyClass.hpp or MyClass.hxx, so be sure to look for those files first. Note that you might also be looking for a type from a library. Often the best approach is to google C++ <insert the missing type here> to find out where it might be located.

Another possible reason is that you have your #include clause after the line where the error occurs. If this is the case, ensure that all required types are included before they are used.

For other reasons, see StackOverflow, e.g. this post

Advantages and disadvantages of hugepages

In a previous post, I’ve written about how to check and enable transparent hugepages in Linux globally.

Although this post is important if you actually have a usecase for hugepages, I’ve seen multiple people getting fooled by the prospect that hugepages will magically increase performance. However, hugepaging is a complex topic and, if used in the wrong way, might easily decrease overall performance. Read more

In-place trimming/stripping in C

For an explanation of in-place algorithms see my previous post on zero-copy in-place splitting

The problem

You have a C string possibly containing whitespace at the beginning and/or the end.

char* s = " abc   \n\r";

Using an in-place algorithm, you want to remove the whitespace from this string.

Doing this is also possible using boost::algorithm::trim, but it has the same caveats as boost::algorithm::split as discussed in my previous post about C splitting Read more

Using Arduino Leonardo as an USB/UART adapter

In contrasts to older designs like the Arduino Uno, the Arduino Leonardo features a separate connection Serial1 for TTLUART whereas Serial is used for the USB CDC UART interface.

This allows one to use the Leonardo as an USB/UART bridge without having to resort to more expensive boards like the Arduino Mega 2560. In order to do this, use this sketch which can also be modified to provide an intelligent UART bridge.

Remember to adjust the baudrate for your application. This version of the sketch does not support automatic baudrate selection via the CDC peripheral.

Read more

Reading the STM32 unique device ID in C

All STM32 microcontrollers feature a 96-bit factory-programmed unique device ID. However, for me it was hard to find an adequately licensed example on how to read it in a manner compatible with different families and compilers.

Here’s a simple header that defines a macro for the device ID address. While I checked the address for both STM32F4 and STM32F0 families, other families might have slightly different addresses for the device ID. Check the reference manual corresponding to your STM32 family if errors occur.

Read more

Reading STM32F0 internal temperature and voltage using ChibiOS

The STM32F0 series of 32-bit ARM Cortex M0 microcontrollers contain a huge number of internal peripherals despite their low price starting at 0,32€ @1pc. Amongst them is an internal, factory-calibrated temperature sensor and a supply voltage sensor (that specifically senses VDDA, the analog supply voltage rail) connect to channels 16 and 17 of the internal ADC.

While I usually like the STM32 documentation, it was quite hard to implement code that produced realistic values. While the STM32F0 reference manual contains both formulas and a short section of example code, I believe that some aspects of the calculation are understated in the computation:

Section 13.9 in RM0091 provides a formula for computing the temperature from the raw temperature sensor output and the factory calibration values. However it is not stated anywhere (at least in Rev7, the current RM0091 revision) that this formula is only correct for a VDDA of exactly 3.30V.

Read more

Using the lwIP SNTP client with ChibiOS

A common task with embedded systems is to use the RTC to timestamp events. However, the system architect needs to find a way of synchronizing the devices RTC time with an external time source. Additionally, the designer needs to deal with the problem of drifting RTC clocks, especially for long-running devices. This article discusses an lwIP+SNTP-based approach for STM32 devices using the ChibiOS RTOS. The lwIP-specific part of this article is also applicable to other types of microcontrollers.

For high-accuracy or long-running applications, RTC clock drift also has to be taken into account. Depending on the clock source in use, the clock frequency can deviate significantly from the nominal value.

On the STM32F4 for example, you can derive the RTC clock from: The HSE/HSI main oscillator The LSI oscillator * The LSE oscillator, i.e. a 32.768 kHz external crystal.

Read more