Scalar vs packed operations in SSE

If you look at any SSE instruction table, you might notice that there are two basic types of operations:

Packed instructions (the assembly instruction ends with PS)
Scalar instructions (the assembly instruction ends with SS)

For most operations, there are two versions, one packed and one scalar.

What’s the difference between them? It’s pretty simple:

Scalar operations operate on only one element, for example a single integer.
Packed operations operate on any element in the vector in parallel, e.g. they multiply 4 32-bit integers in a single instruction.

SSE gains it performance from using packed operations implementing the SIMD paradigm (using a single instruction, multiple values are processed). However, it is occasionally useful to avoid expensive copying by using scalar operations operation on the SSE registers.

Also see the Original source

If this post helped you, please consider buying me a coffee or donating via PayPal to support research & publishing of new posts on TechOverflow

Buy me a coffee