In-place trimming/stripping in C
For an explanation of in-place algorithms, see my previous post on zero-copy in-place splitting.
The problem
You have a C string possibly containing whitespace at the beginning and/or the end.
char s[] = " abc \n\r"; // an array, not a pointer to a string literal: literals must not be modified
Using an in-place algorithm, you want to remove the whitespace from this string.
Doing this is also possible using boost::algorithm::trim, but it has the same caveats as boost::algorithm::split, as discussed in my previous post about C splitting.
What is whitespace
For the scope of this post, we define whitespace as any character for which isspace(int c) from ctype.h returns a nonzero value. This can be adapted to user-specific needs.
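As a rough illustration of such an adaptation, the sketch below defines a predicate that could be called in place of isspace(). The name isTrimChar and the choice of characters are assumptions made for this example only; here only spaces and tabs count as whitespace, so newlines would survive trimming.

```c
/* Hypothetical predicate with the same shape as isspace():
 * a nonzero return value means "treat this character as whitespace". */
int isTrimChar(int c) {
    return c == ' ' || c == '\t';
}
```

Since the terminating '\0' is never reported as whitespace, a loop that advances while the predicate holds still stops at the end of the string.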
Removing whitespace from the beginning
Removing whitespace at the start of the string is quite easy, but it has an important caveat.
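#include <ctype.h>

char* trimLeft(char* s) {
    // Cast to unsigned char: passing a negative char value to isspace() is undefined behaviour
    while(isspace((unsigned char)*s)) {
        s++;
    }
    return s;
}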
The idea is to create a new pointer that is advanced to the first non-whitespace character. Now to the caveat.
Let’s assume your code block looks like this:
char* s = strdup(/*...*/);
char* sLeftTrimmed = trimLeft(s);
// ... do sth with sLeftTrimmed
free(s);
This code will work just fine. But remember you’ll always need to free s, not sLeftTrimmed. The following code leads to undefined behaviour:
char* s = strdup(/*...*/);
s = trimLeft(s); // the original pointer returned by strdup() is lost here
// ... do sth with s
free(s); // undefined behaviour if trimLeft() advanced the pointer
This caveat is dangerous because if s does not contain whitespace at the start, the code will work just fine. When whitespace is stripped, however, free() might do anything, e.g. nothing at all (leading to s not being freed), corrupt other parts of your program (making it nearly impossible to debug) or just crash randomly. You should read What Every C Programmer Should Know About Undefined Behavior to get to know this type of issue.
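If keeping two pointers around is inconvenient, one possible alternative is to shift the remaining characters to the front of the buffer, so the returned pointer is always the original one. This is only a sketch and not the approach used in the rest of this post; the name trimLeftShift is made up for this example, and the price is one memmove() over the remaining string.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical variant: trims in place and always returns s itself,
 * so the pointer passed in stays valid for free(). */
char* trimLeftShift(char* s) {
    char* start = s;
    while (isspace((unsigned char)*start)) {
        start++;
    }
    if (start != s) {
        /* +1 also copies the terminating '\0'; memmove() permits overlapping ranges */
        memmove(s, start, strlen(start) + 1);
    }
    return s;
}
```

With this variant, overwriting the pointer with the function's return value is harmless, because the value never changes.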
Removing whitespace from the end
This part of the algorithm is a bit more complex, but it doesn’t suffer from the wrong free() usage caveat of trimLeft().
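#include <ctype.h>
#include <string.h>

char* trimRight(char* s) {
    //Safeguard against empty strings
    size_t len = strlen(s);
    if(len == 0) {
        return s;
    }
    //Actual algorithm: pos points one past the last character not yet trimmed,
    //so it never moves before the start of the buffer
    char* pos = s + len;
    while(pos > s && isspace((unsigned char)pos[-1])) {
        pos--;
        *pos = '\0';
    }
    return s;
}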
The idea is to in-place-replace every whitespace character at the end by \0.
Note that the go-backwards strategy works well for ASCII encodings and, with little modification, for UTF-16 and UTF-32 encodings. Using it for UTF-8 when multi-byte whitespace has to be recognized requires more extensive backtracking until you find the first byte of the code point.
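The backtracking step for UTF-8 could look roughly like the sketch below. It assumes the buffer holds valid UTF-8, and the function name utf8CodepointStart is hypothetical. Continuation bytes always have the bit pattern 10xxxxxx, so stepping back until a byte without that pattern is found yields the first byte of the code point.

```c
/* Hypothetical helper, assuming valid UTF-8: given a position pos inside the
 * string starting at s, step back to the first byte of the code point that
 * contains pos. */
const char* utf8CodepointStart(const char* s, const char* pos) {
    /* Continuation bytes match 10xxxxxx, i.e. (byte & 0xC0) == 0x80 */
    while (pos > s && ((unsigned char)*pos & 0xC0) == 0x80) {
        pos--;
    }
    return pos;
}
```

Deciding whether the code point found this way counts as whitespace still needs a decoder and a Unicode-aware predicate, which this sketch leaves out.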
Both-sided trimming
Given the functions for trimming each side, trimming both sides is trivial:
char* trim(char* s) {
    return trimRight(trimLeft(s));
}
The rationale behind first trimming left is that searching for the end of the string in trimRight() via strlen() might be slightly faster due to the shorter string. For real-world use cases, however, this should make little difference.
Note that the free() caveat discussed in the trimLeft() section also applies to trim().
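To make the caveat concrete, here is a minimal sketch that combines the three functions and frees the buffer correctly. The helper trimmedLength is hypothetical and exists only for this illustration; it uses malloc() and memcpy() instead of strdup() so it does not depend on POSIX. The casts to unsigned char guard against undefined behaviour in isspace() for negative char values.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

char* trimLeft(char* s) {
    while (isspace((unsigned char)*s)) {
        s++;
    }
    return s;
}

char* trimRight(char* s) {
    /* pos points one past the last character not yet trimmed */
    char* pos = s + strlen(s);
    while (pos > s && isspace((unsigned char)pos[-1])) {
        pos--;
        *pos = '\0';
    }
    return s;
}

char* trim(char* s) {
    return trimRight(trimLeft(s));
}

/* Hypothetical helper: length of input after trimming both sides.
 * Returns 0 if the allocation fails. */
size_t trimmedLength(const char* input) {
    size_t size = strlen(input) + 1;
    char* owned = malloc(size);
    if (owned == NULL) {
        return 0;
    }
    memcpy(owned, input, size);
    char* view = trim(owned);  /* may point into the middle of owned */
    size_t len = strlen(view);
    free(owned);               /* free the original pointer, never view */
    return len;
}
```

The key point is that owned and view are kept as separate variables: view is used for reading the trimmed string, while owned is the only pointer ever passed to free().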
Also see the full trim example