Scalar vs packed operations in SSE
If you look at any SSE instruction table, you might notice that there are two basic types of operations:
- Packed instructions (the assembly mnemonic ends with PS, for "packed single", e.g. ADDPS)
- Scalar instructions (the assembly mnemonic ends with SS, for "scalar single", e.g. ADDSS)
For most operations, there are two versions, one packed and one scalar.
What’s the difference between them? It’s pretty simple:
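- Scalar operations operate on only one element: the lowest 32-bit single-precision float in the register. The other three elements of the destination are left unchanged.
- Packed operations operate on all elements of the vector in parallel, e.g. MULPS multiplies four 32-bit single-precision floats in a single instruction.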
SSE gets its performance from packed operations, which implement the SIMD paradigm (single instruction, multiple data: one instruction processes several values). However, scalar operations on the SSE registers are occasionally useful, because they let you work on a single value without expensive copying of data out of those registers.