Scalar vs packed operations in SSE

If you look at any SSE instruction table, you might notice that most operations come in two versions, one packed and one scalar.

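What’s the difference between them? It’s pretty simple: a packed operation (for example ADDPS) works on every element in the register at once, whereas a scalar operation (for example ADDSS) works only on the lowest element and passes the remaining elements of the first operand through unchanged.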

SSE gains its performance from packed operations, which implement the SIMD paradigm (a single instruction processes multiple values). However, it is occasionally useful to avoid expensive copying by using scalar operations that operate directly on the SSE registers.
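
As a rough illustration of the two flavours, here is a minimal sketch in C using the SSE intrinsics from `<xmmintrin.h>`. The helper names `add_packed` and `add_scalar` are made up for this example; `_mm_add_ps` and `_mm_add_ss` are the intrinsics that typically compile to ADDPS and ADDSS. It assumes an x86 target with SSE enabled.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, _mm_add_ss, ... */

/* Hypothetical helper: packed add. All four float lanes are summed. */
static void add_packed(const float a[4], const float b[4], float out[4]) {
    __m128 va = _mm_loadu_ps(a);             /* unaligned load of 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  /* out[i] = a[i] + b[i] for i = 0..3 */
}

/* Hypothetical helper: scalar add. Only lane 0 is summed;
   lanes 1..3 of the result are copied from the first operand (a). */
static void add_scalar(const float a[4], const float b[4], float out[4]) {
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ss(va, vb));  /* out[0] = a[0] + b[0]; out[1..3] = a[1..3] */
}
```

With `a = {1, 2, 3, 4}` and `b = {10, 20, 30, 40}`, the packed version yields `{11, 22, 33, 44}`, while the scalar version yields `{11, 2, 3, 4}`. In both cases the value never leaves the SSE register between the load and the store.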

Also see the Original source

