Scalar vs packed operations in SSE

If you look at any SSE instruction table, you might notice that there are two basic types of operations:

  • Packed instructions (the assembly mnemonic ends with PS, for "packed single-precision", e.g. ADDPS)
  • Scalar instructions (the assembly mnemonic ends with SS, for "scalar single-precision", e.g. ADDSS)

For most operations, there are two versions, one packed and one scalar.

What’s the difference between them? It’s pretty simple:

  • Scalar operations operate on only one element, namely the lowest 32-bit float in the register.
  • Packed operations operate on every element in the vector in parallel, e.g. they multiply four pairs of 32-bit floats in a single instruction.
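
To make the difference concrete, here is a minimal sketch in C using the compiler intrinsics from xmmintrin.h, assuming an x86 target with SSE enabled. The helper names add_packed and add_scalar are made up for this illustration; _mm_add_ps and _mm_add_ss are the intrinsics that map to ADDPS and ADDSS.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, _mm_add_ss, ... */

/* Hypothetical helper: packed add (ADDPS).
 * All four lanes are added: out[i] = a[i] + b[i] for i = 0..3. */
void add_packed(const float a[4], const float b[4], float out[4]) {
    __m128 va = _mm_loadu_ps(a);              /* unaligned load of 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}

/* Hypothetical helper: scalar add (ADDSS).
 * Only the lowest lane is added: out[0] = a[0] + b[0].
 * The upper three lanes are passed through unchanged from a. */
void add_scalar(const float a[4], const float b[4], float out[4]) {
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ss(va, vb));
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, the packed version yields {11, 22, 33, 44}, while the scalar version yields {11, 2, 3, 4}.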

SSE gains its performance from packed operations, which implement the SIMD paradigm (a single instruction processes multiple values). However, it is occasionally useful to use scalar operations that work directly on the SSE registers, because this avoids expensive copying of data out of those registers.
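
One situation where this can come up is a horizontal sum, where the last step only needs a single lane. The following is a sketch of one possible approach rather than a canonical implementation, assuming SSE1 only; the function name hsum_ps is hypothetical. The final addition uses the scalar _mm_add_ss, so the value stays in an SSE register until the very end.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: sum the four floats of v = [v0, v1, v2, v3]. */
float hsum_ps(__m128 v) {
    /* Copy the upper two lanes down: hi = [v2, v3, v2, v3]. */
    __m128 hi = _mm_movehl_ps(v, v);
    /* Packed add: lane 0 = v0 + v2, lane 1 = v1 + v3. */
    __m128 sum2 = _mm_add_ps(v, hi);
    /* Broadcast lane 1 of sum2 so that it also sits in lane 0. */
    __m128 odd = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(1, 1, 1, 1));
    /* Scalar add: only lane 0 is needed for the final result. */
    __m128 total = _mm_add_ss(sum2, odd);
    /* Extract lane 0 as a plain float. */
    return _mm_cvtss_f32(total);
}
```

The first addition is packed because two partial sums are computed at once; the second is scalar because only one value remains to be computed.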

Also see the Original source