Scalar vs packed operations in SSE

If you look at any SSE instruction table, you might notice that most operations come in two versions: one packed and one scalar.

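What’s the difference between them? It’s pretty simple: a packed operation works on all elements of an SSE register at once (for example, ADDPS adds four pairs of single-precision floats), while a scalar operation works only on the lowest element (ADDSS adds just the lowest pair and leaves the upper elements of the destination unchanged).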
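To make the difference concrete, here is a minimal sketch in C using the SSE intrinsics from xmmintrin.h, assuming an x86 or x86-64 compiler. The function names packed_add and scalar_add are made up for this illustration; _mm_add_ps and _mm_add_ss are the intrinsics that correspond to ADDPS and ADDSS.

```c
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86/x86-64 target */

/* Packed add (ADDPS): all four float lanes of a and b are added.
   packed_add is a hypothetical helper name used only in this sketch. */
void packed_add(const float a[4], const float b[4], float out[4]) {
    __m128 va = _mm_loadu_ps(a);            /* unaligned load of 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb)); /* out[i] = a[i] + b[i] */
}

/* Scalar add (ADDSS): only the lowest lane is added;
   the upper three lanes are passed through from a unchanged.
   scalar_add is likewise a hypothetical helper name. */
void scalar_add(const float a[4], const float b[4], float out[4]) {
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ss(va, vb)); /* out[0] = a[0] + b[0] */
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, packed_add yields {11, 22, 33, 44}, while scalar_add yields {11, 2, 3, 4}.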

SSE gains its performance from packed operations, which implement the SIMD paradigm (a single instruction processes multiple values). However, it is occasionally useful to use scalar operations that operate directly on the SSE registers, because this avoids expensive copying of values out of and back into those registers.
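One situation where this can come up is the tail end of a packed computation. The sketch below is only an illustration under assumptions (an x86 or x86-64 compiler, and a hypothetical function name dot4): it multiplies four pairs of floats with a packed operation, then finishes the reduction with a scalar add on the register instead of storing all four lanes to memory first.

```c
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86/x86-64 target */

/* Dot product of two 4-float vectors.
   dot4 is a hypothetical name chosen for this sketch. */
float dot4(const float a[4], const float b[4]) {
    /* Packed multiply: prod = (p0, p1, p2, p3) with pi = a[i] * b[i] */
    __m128 prod = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));

    /* Swap neighbours within each pair: shuf = (p1, p0, p3, p2) */
    __m128 shuf = _mm_shuffle_ps(prod, prod, _MM_SHUFFLE(2, 3, 0, 1));

    /* Packed add: sums = (p0+p1, p0+p1, p2+p3, p2+p3) */
    __m128 sums = _mm_add_ps(prod, shuf);

    /* Bring the upper half of sums down: lane 0 of hi is p2+p3 */
    __m128 hi = _mm_movehl_ps(shuf, sums);

    /* Scalar add (ADDSS): only lane 0 matters from here on,
       so the last step stays in the register as a scalar operation. */
    __m128 total = _mm_add_ss(sums, hi);

    /* Extract the lowest lane as an ordinary float */
    return _mm_cvtss_f32(total);
}
```

For a = {1, 2, 3, 4} and b = {5, 6, 7, 8} this returns 70. Whether this is actually faster than a plain scalar loop depends on the compiler and the CPU, so it is worth measuring rather than assuming.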

Also see the Original source