In-place trimming/stripping in C

For an explanation of in-place algorithms, see my previous post on zero-copy in-place splitting.

The problem

You have a C string possibly containing whitespace at the beginning and/or the end.

trim_sample_string.c
// An array, not a pointer to a string literal: literals must not be modified
char s[] = " abc   \n\r";

Using an in-place algorithm, you want to remove the whitespace from this string.

In C++, this is also possible using boost::algorithm::trim, but it has the same caveats as boost::algorithm::split, as discussed in my previous post about C splitting.

What is whitespace

For the scope of this post, we define whitespace as characters for which

isspace-prototype.c
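int isspace(int c);

from ctype.h returns a nonzero value. Note that the argument must be representable as an unsigned char (or be EOF), so a plain char should be cast to unsigned char before the call. The definition of whitespace can be adapted to user-specific needs.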
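
As a rough sketch of such an adaptation, the trimming loops could call a custom predicate instead of isspace(). The name isTrimChar and the choice of '_' as an extra trimmable character are assumptions made purely for illustration.

```c
#include <ctype.h>

// Hypothetical predicate: standard whitespace plus '_' counts as trimmable.
// The cast to unsigned char avoids undefined behaviour for negative char values.
int isTrimChar(char c) {
    return isspace((unsigned char)c) || c == '_';
}
```

Replacing the isspace() calls in the functions below with isTrimChar() would then strip underscores as well.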

Removing whitespace from the beginning

Removing whitespace at the start of the string is quite easy, but it has an important caveat.

trim_left.c
#include <ctype.h>

char* trimLeft(char* s) {
    //The cast avoids undefined behaviour for negative char values
    while(isspace((unsigned char)*s)) {
        s++;
    }
    return s;
}

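The idea is to return a pointer that is advanced to the first non-whitespace character; no bytes are moved or copied. The loop cannot run past the end of the string because isspace() returns zero for the terminating \0. Now to the caveat.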

Let’s assume your code block looks like this:

trimleft_usage_good.c
char* s = strdup(/*...*/);
char* sLeftTrimmed = trimLeft(s);
// ... do something with sLeftTrimmed
free(s);

This code will work just fine. But remember you’ll always need to free s, not sLeftTrimmed. The following code leads to undefined behaviour:

trimleft_usage_bad.c
char* s = strdup(/*...*/);
s = trimLeft(s); // BUG: the pointer returned by strdup() is lost
// ... do something with s
free(s);

This caveat is dangerous because the code works just fine as long as s does not contain whitespace at the start. When whitespace is stripped, however, free() receives a pointer that was not returned by an allocation function, which is undefined behaviour: it might do nothing at all (so the memory is never freed), corrupt other parts of your program (making it nearly impossible to debug) or just crash randomly. You should read What Every C Programmer Should Know About Undefined Behavior to get to know this type of issue.
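
One way to sidestep this caveat, sketched here as an alternative rather than as part of the approach above, is to move the trimmed content to the front of the buffer with memmove(). The pointer you allocated then stays the pointer you use and free. The function name trimLeftInPlace is made up for this example, and the price is one extra copy of the remaining characters.

```c
#include <ctype.h>
#include <string.h>

// Hypothetical variant of trimLeft() that keeps the original pointer valid.
char* trimLeftInPlace(char* s) {
    char* start = s;
    // Skip leading whitespace; the cast avoids UB for negative char values
    while (isspace((unsigned char)*start)) {
        start++;
    }
    if (start != s) {
        // Shift the remaining characters, including the terminating '\0'
        memmove(s, start, strlen(start) + 1);
    }
    return s; // same address that was passed in, so free(s) stays correct
}
```

memmove() is used instead of memcpy() because source and destination overlap.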

Removing whitespace from the end

This part of the algorithm is a bit more complex, but it doesn’t suffer from the free() caveat that affects trimLeft().

trim_right.c
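#include <ctype.h>
#include <string.h>

char* trimRight(char* s) {
    //Safeguard against empty strings
    size_t len = strlen(s);
    if(len == 0) {
        return s;
    }
    //Actual algorithm: walk backwards, overwriting whitespace with '\0'
    //pos never moves before s, so no out-of-bounds pointer is formed
    char* pos = s + len;
    while(pos > s && isspace((unsigned char)pos[-1])) {
        pos--;
        *pos = '\0';
    }
    return s;
}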

The idea is to replace every whitespace character at the end of the string with \0, in place.

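Note that the go-backwards strategy works well for ASCII and, with little modification (iterating over 16-bit or 32-bit code units instead of bytes), for UTF-16 and UTF-32. For UTF-8 the byte-wise loop is still sufficient as long as only ASCII whitespace has to be removed and isspace() runs in the default "C" locale, because no byte of a multi-byte sequence lies in the ASCII range. Recognizing non-ASCII whitespace in UTF-8, however, requires more extensive backtracking until you find the first byte of the code point.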
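
To illustrate the UTF-8 backtracking mentioned above, the following sketch steps back from the end of a string to the first byte of its last code point. It assumes well-formed UTF-8 and relies on the fact that continuation bytes have the bit pattern 10xxxxxx. The function name utf8LastCodepoint is invented for this example.

```c
#include <string.h>

// Hypothetical helper: returns a pointer to the first byte of the last
// code point in s, or s itself if the string is empty.
// Assumes s contains well-formed UTF-8.
const char* utf8LastCodepoint(const char* s) {
    size_t len = strlen(s);
    if (len == 0) {
        return s;
    }
    const char* pos = s + len - 1;
    // Continuation bytes look like 10xxxxxx; step back over them
    while (pos > s && ((unsigned char)*pos & 0xC0) == 0x80) {
        pos--;
    }
    return pos;
}
```

A trimming loop could then decode the code point at the returned position and decide whether it counts as whitespace, for example U+00A0 (no-break space), which UTF-8 encodes as the two bytes 0xC2 0xA0.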

Both-sided trimming

Given the functions for trimming each side, trimming both sides is trivial:

trim.c
char* trim(char* s) {
    return trimRight(trimLeft(s));
}

The rationale behind trimming left first is that searching for the end of the string in trimRight() via strlen() might be slightly faster due to the shorter string. For real-world use cases, however, this should make little difference.

Note that the free() caveat as discussed in the trimLeft() section also applies to trim().
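
A minimal usage sketch for that caveat: keep the pointer returned by the allocation separate from the trimmed pointer. The helper name trimmedLength and the use of malloc() are choices made for this example only, and the trim functions are repeated in condensed form so the block compiles on its own.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

char* trimLeft(char* s) {
    while (isspace((unsigned char)*s)) {
        s++;
    }
    return s;
}

char* trimRight(char* s) {
    size_t len = strlen(s);
    // Overwrite trailing whitespace with '\0', moving backwards
    while (len > 0 && isspace((unsigned char)s[len - 1])) {
        s[--len] = '\0';
    }
    return s;
}

char* trim(char* s) {
    return trimRight(trimLeft(s));
}

// Hypothetical example: copies input to the heap, trims the copy and returns
// the length of the trimmed text, or (size_t)-1 if the allocation fails
size_t trimmedLength(const char* input) {
    size_t size = strlen(input) + 1;
    char* buffer = malloc(size);       // this is the pointer that must be freed
    if (buffer == NULL) {
        return (size_t)-1;
    }
    memcpy(buffer, input, size);
    char* trimmed = trim(buffer);      // may point into the middle of buffer
    size_t result = strlen(trimmed);
    free(buffer);                      // free the original pointer, never trimmed
    return result;
}
```

Because buffer and trimmed are separate variables, the allocation can always be released correctly, no matter how much leading whitespace was skipped.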

Also see the full trim example

