Primitive sum of values in a 3D array vs Numba JIT Benchmark in Python

This function sums up all values from the given 3D array:

def primitive_pixel_sum(frame):
    result = 0.0
    for x in range(frame.shape[0]):
        for y in range(frame.shape[1]):
            for z in range(frame.shape[2]):
                result += frame[x,y,z]
    return result

The following function implements exactly the same algorithm, but is compiled with numba.jit:

import numba

@numba.jit
def numba_pixel_sum(frame):
    result = 0.0
    for x in range(frame.shape[0]):
        for y in range(frame.shape[1]):
            for z in range(frame.shape[2]):
                result += frame[x,y,z]
    return result

We can benchmark them in Jupyter using

%%timeit
primitive_pixel_sum(frame)

and

%%timeit
numba_pixel_sum(frame)

respectively.

Results

We tested this with a random camera image of shape (480, 640, 3), captured with OpenCV.

primitive_pixel_sum():

1.78 s ± 253 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

numba_pixel_sum():

4.06 ms ± 413 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

These results show that the Numba version is about 438 times faster than the primitive version (1.78 s / 4.06 ms ≈ 438).

Note that Numba compiles a function the first time it is called. For complex functions, this compilation can take many milliseconds or even seconds, possibly longer than the plain Python function would take to run.

Since Numba is so simple to use, my recommendation is to just try it on every function you suspect will use a lot of CPU time. Over time you will develop an intuition for which functions benefit from Numba, which ones it can't compile at all, and which ones end up slower overall than plain Python.

Remember that you can often use NumPy functions to achieve the same result. In our example, you could compute the same sum using

np.sum(frame)

which is even faster than Numba:

%%timeit
np.sum(frame)

Result:

2.5 ms ± 7.17 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
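To confirm that `np.sum` computes the same value as the loop, here is a small check on a synthetic `uint8` frame. This is a sketch under assumptions: the frame is random, and a smaller shape than the camera image is used so the pure-Python loop finishes quickly. For integer inputs narrower than the platform integer, `np.sum` accumulates in a platform-sized integer (an unsigned one for unsigned input), so 8-bit pixel values do not wrap around at 255.

```python
import numpy as np


def primitive_pixel_sum(frame):
    result = 0.0
    for x in range(frame.shape[0]):
        for y in range(frame.shape[1]):
            for z in range(frame.shape[2]):
                result += frame[x, y, z]
    return result


# Smaller synthetic uint8 frame so the pure-Python loop runs quickly.
rng = np.random.default_rng(42)
frame = rng.integers(0, 256, size=(48, 64, 3), dtype=np.uint8)

total = np.sum(frame)
# Accumulator dtype chosen by NumPy for uint8 input (platform-sized unsigned integer).
print(total.dtype)
# The integer total is far below 2**53, so the float loop result is exact.
print(float(total) == primitive_pixel_sum(frame))  # True
```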