Using fixed width integers is space-inefficient in many cases, especially if the majority of values are low and only use the less-significant bytes.
This guide describes the basics of varint (varying-length integer) encoding while focusing on C++ as programming language, but the basic concepts apply to any language.
Varint encodings use only the bytes that are needed to represent you integer value appropriately. A varint algorithm can represent the number 10 in only one byte while using 4 bytes to encode 800000000 (800 million). In many application this yields a significant overhead reduction since you would need to use larger integers if there is a slight change that your values grow beyond the boundary of the integer type that is applicable for the majority of your values. Additionally, you usually can only use 8,16,32 or 64 bit integers while 48 bit integers need to be coded manually in most languages. For example, if most of your values are between 0 and 100, but a few might be larger than 16384 (for unsigned integers), you would usually use a full 32-bit integer, even if most values could be represented by a single byte.
Continue reading →