Example 4-1 – Intel ARCHITECTURE IA-32 User Manual

Optimizing for SIMD Integer Applications
• Don’t empty when already empty: If the next instruction uses an
  MMX register, _mm_empty() incurs a cost with no benefit.
• Group instructions: Try to partition regions that use x87 FP
  instructions from those that use 64-bit SIMD integer instructions.
  This eliminates the need for an emms instruction within the body of
  a critical loop.
• Runtime initialization: Use _mm_empty() during runtime
  initialization of __m64 and x87 FP data types. This ensures that the
  register state is reset between data type transitions. See
  Example 4-1 for coding usage.
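The grouping guideline above can be sketched as follows. This is a minimal illustration, not code from the manual: the helper name mean_i32 and its even-length requirement are assumptions made for the example. All 64-bit SIMD integer work happens inside the loop, _mm_empty() is called once after the loop, and only then does floating-point arithmetic begin.

```c
#include <mmintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the manual): returns the mean of n 32-bit
   integers. Assumes n is even and nonzero and that the lane sums do not
   overflow 32 bits. */
static double mean_i32(const int32_t *v, size_t n)
{
    __m64 acc = _mm_setzero_si64();

    /* 64-bit SIMD integer region: no floating-point code in the loop,
       so no emms is needed inside it. */
    for (size_t i = 0; i < n; i += 2) {
        __m64 pair = _mm_set_pi32(v[i + 1], v[i]);
        acc = _mm_add_pi32(acc, pair);
    }

    /* Move both lanes out to ordinary integers while still in the
       SIMD integer region. */
    int32_t lo = _mm_cvtsi64_si32(acc);
    int32_t hi = _mm_cvtsi64_si32(_mm_srli_si64(acc, 32));

    /* Single transition point: clear the MMX state once, after the loop. */
    _mm_empty();

    /* Floating-point region. */
    return ((double)lo + (double)hi) / (double)n;
}
```

Calling mean_i32 on the four values 1, 2, 3, 4 accumulates the lane sums 4 and 6 and returns 2.5. The point of the layout is that the cost of the transition is paid once per call rather than once per loop iteration.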
Further, be aware that, with the Intel C++ Compiler, your code
generates an MMX instruction, which uses the MMX registers, in the
following situations:
• when using a 64-bit SIMD integer intrinsic from MMX technology,
  SSE, or SSE2
• when using a 64-bit SIMD integer instruction from MMX technology,
  SSE, or SSE2 through inline assembly
• when referencing an __m64 data type variable
Additional information on the x87 floating-point programming model
can be found in the IA-32 Intel® Architecture Software Developer’s
Manual, Volume 1. For more documentation on emms, visit
Example 4-1  Resetting the Register between __m64 and FP Data Types

Incorrect Usage                 Correct Usage
__m64 x = _m_paddd(y, z);       __m64 x = _m_paddd(y, z);
float f = init();               float f = (_mm_empty(), init());
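
For readers who want to compile the correct-usage column, the fragment can be wrapped as below. This is a sketch under stated assumptions: init() is a hypothetical stand-in that returns a constant, and the wrapper add_then_init with its parameters is invented for the example. Only _m_paddd, _mm_empty, and the comma expression come from Example 4-1.

```c
#include <mmintrin.h>

/* Hypothetical stand-in for the init() call in Example 4-1. */
static float init(void)
{
    return 1.5f;
}

/* Hypothetical wrapper: adds two pairs of 32-bit integers, stores the low
   lane of the sum in *low_sum, then clears the MMX state before the
   floating-point initialization. */
static float add_then_init(int y_hi, int y_lo, int z_hi, int z_lo,
                           int *low_sum)
{
    __m64 y = _mm_set_pi32(y_hi, y_lo);
    __m64 z = _mm_set_pi32(z_hi, z_lo);

    __m64 x = _m_paddd(y, z);          /* 64-bit SIMD integer add */
    *low_sum = _mm_cvtsi64_si32(x);    /* last use of an MMX value */

    /* The comma expression evaluates _mm_empty() first, then init(). */
    float f = (_mm_empty(), init());
    return f;
}
```

The comma operator guarantees left-to-right evaluation, so the MMX state is cleared before init() runs.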