In-place trimming/stripping in C

For an explanation of in-place algorithms, see my previous post on zero-copy in-place splitting.

The problem

You have a C string possibly containing whitespace at the beginning and/or the end.

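char s[] = " abc   \n\r"; //an array, not a pointer to a literal: modifying a string literal in place is undefined behaviour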

Using an in-place algorithm, you want to remove the leading and trailing whitespace from this string without copying it.

In C++ this is also possible using boost::algorithm::trim, but it has the same caveats as boost::algorithm::split, as discussed in my previous post about C splitting.

What is whitespace

For the scope of this post, we define whitespace as characters for which

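isspace(int c)

from ctype.h returns a nonzero value; in the default "C" locale these are space, \t, \n, \v, \f and \r. Since isspace() is only defined for values representable as unsigned char (and EOF), a plain char should be cast to unsigned char before being passed in. This definition can be adapted to user-specific needs.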
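As an illustration of adapting the definition, here is a minimal sketch of two predicates the trim functions could call instead of isspace() directly. Both names, isTrimChar() and isDefaultTrimChar(), are hypothetical; the first is an assumed custom rule that keeps line breaks, the second wraps isspace() with the unsigned char cast.

```c
#include <ctype.h>
#include <stdbool.h>

//Hypothetical custom rule: only spaces and tabs count as whitespace,
//so trailing line breaks would be kept by the trim functions
bool isTrimChar(char c) {
    return c == ' ' || c == '\t';
}

//Hypothetical wrapper for the default rule used in this post.
//The cast matters: on platforms where char is signed, bytes >= 0x80
//become negative values, and passing those to isspace() is undefined behaviour
bool isDefaultTrimChar(char c) {
    return isspace((unsigned char)c) != 0;
}
```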

Removing whitespace from the beginning

Removing whitespace at the start of the string is quite easy, but it has an important caveat.

#include <ctype.h>

char* trimLeft(char* s) {
    while(isspace((unsigned char)*s)) { //cast avoids UB for negative char values
        s++;
    }
    return s;
}

The idea is to return a pointer into the same buffer, advanced to the first non-whitespace character; nothing is moved or copied. Now to the caveat.
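A minimal sketch showing that trimLeft() only hands back an offset into the original buffer. It repeats trimLeft() with the unsigned char cast so it compiles on its own; skippedChars() is a hypothetical helper for illustration.

```c
#include <ctype.h>
#include <stddef.h>

char* trimLeft(char* s) {
    //isspace('\0') is false, so the loop always stops at the terminator
    while(isspace((unsigned char)*s)) {
        s++;
    }
    return s;
}

//Hypothetical helper: how many leading characters trimLeft() skipped.
//The result is pure pointer arithmetic; no characters are copied.
ptrdiff_t skippedChars(char* s) {
    return trimLeft(s) - s;
}
```

For " \t\n" the returned pointer lands on the terminating '\0', i.e. an empty string inside the same buffer.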

Let’s assume your code block looks like this:

char* s = strdup(/*...*/);
char* sLeftTrimmed = trimLeft(s);
// ... do sth with sLeftTrimmed
free(s);

This code will work just fine. But remember you’ll always need to free s, not sLeftTrimmed. The following code leads to undefined behaviour:

char* s = strdup(/*...*/);
s = trimLeft(s); //s may no longer point to the start of the allocation
// ... do sth with s
free(s);

This caveat is dangerous because the bug stays hidden as long as s does not start with whitespace: in that case trimLeft() returns s unchanged and free() receives the original pointer. Once leading whitespace is stripped, however, free() receives a pointer that was never returned by malloc() or strdup(), which is undefined behaviour. free() might then do anything, e.g. nothing at all (leaking the buffer), silently corrupt other parts of your program (making it nearly impossible to debug) or crash at random. You should read What Every C Programmer Should Know About Undefined Behavior to get to know this type of issue.
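If keeping the original pointer around is impractical, one alternative (a different technique from the pointer-advancing trimLeft() above) is to shift the remaining characters to the front of the buffer with memmove(), so the returned pointer is always s itself. A minimal sketch; trimLeftInPlace() is a hypothetical name.

```c
#include <ctype.h>
#include <string.h>

//Hypothetical variant: moves the text instead of advancing the pointer,
//so free(s) stays valid no matter how much whitespace was removed
char* trimLeftInPlace(char* s) {
    char* start = s;
    while(isspace((unsigned char)*start)) {
        start++;
    }
    if(start != s) {
        //Source and destination overlap, so memmove() is required, not memcpy().
        //The +1 also moves the terminating '\0'.
        memmove(s, start, strlen(start) + 1);
    }
    return s;
}
```

The trade-off is an O(n) copy of the remaining characters, which the pointer-advancing trimLeft() avoids.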

Removing whitespace from the end

This part of the algorithm is a bit more complex, but it does not suffer from the free() caveat of trimLeft(), because it always returns the pointer it was given.

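#include <ctype.h>
#include <string.h> //strlen()

char* trimRight(char* s) {
    //Safeguard against empty strings (size_t matches strlen()'s return type)
    size_t len = strlen(s);
    if(len == 0) {
        return s;
    }
    //Actual algorithm: walk backwards from the last character
    char* pos = s + len - 1;
    while(isspace((unsigned char)*pos)) {
        *pos = '\0';
        if(pos == s) {
            //Whole string was whitespace; stepping to s - 1 would be undefined behaviour
            break;
        }
        pos--;
    }
    return s;
}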

The idea is to replace, in place, every whitespace character at the end of the string with \0.
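Overwriting every trailing whitespace character is not strictly necessary: a single \0 right after the last non-whitespace character gives the same visible string. A minimal sketch of that variant; trimRightSingleWrite() is a hypothetical name, and working with an index instead of a pointer also avoids ever stepping before the start of the string.

```c
#include <ctype.h>
#include <string.h>

//Hypothetical variant: first find the new end, then write one terminator
char* trimRightSingleWrite(char* s) {
    size_t len = strlen(s);
    //len > 0 also covers empty and all-whitespace strings
    while(len > 0 && isspace((unsigned char)s[len - 1])) {
        len--;
    }
    s[len] = '\0'; //for a string without trailing whitespace this rewrites the existing terminator
    return s;
}
```

The bytes after the new terminator keep their old values, which is harmless because nothing reads past the first \0.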

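Note that the go-backwards strategy works well for single-byte encodings such as ASCII, and with little modification for UTF-16 and UTF-32 (operating on 16- or 32-bit code units instead of bytes). For UTF-8, scanning byte by byte still removes ASCII whitespace correctly, because bytes below 0x80 never occur inside a multi-byte sequence. Recognising non-ASCII whitespace such as U+00A0, however, requires backtracking over continuation bytes until you find the first byte of the codepoint.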
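A minimal sketch of that UTF-8 backtracking, assuming valid UTF-8 input. Which code points count as whitespace is an assumption made for illustration: ASCII whitespace plus U+00A0 NO-BREAK SPACE (encoded as 0xC2 0xA0). Both function names are hypothetical, and isspace() is deliberately not used so the result does not depend on the current locale.

```c
#include <stdbool.h>
#include <string.h>

//Hypothetical predicate: is the n-byte code point starting at p whitespace?
//Assumed set: ' ', \t, \n, \v, \f, \r and U+00A0.
static bool isUtf8Space(const unsigned char* p, size_t n) {
    if(n == 1) {
        return *p == ' ' || (*p >= '\t' && *p <= '\r');
    }
    return n == 2 && p[0] == 0xC2 && p[1] == 0xA0;
}

//Hypothetical UTF-8 aware trimRight(): trims whole code points, not bytes
char* trimRightUtf8(char* s) {
    unsigned char* u = (unsigned char*)s;
    size_t end = strlen(s); //one past the last byte that is kept
    while(end > 0) {
        size_t start = end - 1;
        //Continuation bytes look like 10xxxxxx; step back to the lead byte
        while(start > 0 && (u[start] & 0xC0) == 0x80) {
            start--;
        }
        if(!isUtf8Space(u + start, end - start)) {
            break;
        }
        end = start;
    }
    s[end] = '\0';
    return s;
}
```

A trailing "é" (0xC3 0xA9) is examined as one two-byte code point and kept, while a trailing no-break space is removed as a unit.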

Both-sided trimming

Given the functions for trimming each side, trimming both sides is trivial:

char* trim(char* s) {
    return trimRight(trimLeft(s));
}

The rationale behind trimming the left side first is that the strlen() call in trimRight() then scans a shorter string and might be slightly faster. For real-world use cases, however, this should make little difference.

Note that the free() caveat as discussed in the trimLeft() section also applies to trim().
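A minimal sketch tying the pieces together with correct ownership. It repeats trimLeft(), trimRight() and trim() (with the unsigned char cast and a guard against stepping before the start of the string) so the block compiles on its own. trimmedLength() is a hypothetical helper, and it uses malloc() plus memcpy() where the post uses strdup(), so it builds without POSIX extensions.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

char* trimLeft(char* s) {
    while(isspace((unsigned char)*s)) {
        s++;
    }
    return s;
}

char* trimRight(char* s) {
    size_t len = strlen(s);
    if(len == 0) {
        return s;
    }
    char* pos = s + len - 1;
    while(isspace((unsigned char)*pos)) {
        *pos = '\0';
        if(pos == s) {
            break; //never step before the start of the string
        }
        pos--;
    }
    return s;
}

char* trim(char* s) {
    return trimRight(trimLeft(s));
}

//Hypothetical helper: length of input after trimming, computed on a heap copy.
//Only the owning pointer s is ever passed to free(); t may point past it.
size_t trimmedLength(const char* input) {
    size_t n = strlen(input) + 1;
    char* s = malloc(n); //strdup(input) would do the same where available
    if(s == NULL) {
        return 0;
    }
    memcpy(s, input, n);
    char* t = trim(s);   //view into the same buffer
    size_t len = strlen(t);
    free(s);             //free(t) would be undefined behaviour whenever t != s
    return len;
}
```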

Also see the full trim example.