Example 4-1 – Intel ARCHITECTURE IA-32 User Manual

Optimizing for SIMD Integer Applications

• Don't empty when already empty: If the next instruction uses an MMX register, _mm_empty() incurs a cost with no benefit.

• Group instructions: Separate regions that use x87 FP instructions from regions that use 64-bit SIMD integer instructions. This eliminates the need for an emms instruction within the body of a critical loop.

• Runtime initialization: Use _mm_empty() during runtime initialization of __m64 and x87 FP data types. This ensures that the register state is reset between data type transitions. See Example 4-1 for coding usage.
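The grouping guideline can be sketched as follows. This is a minimal illustration, not code from the manual: the helper names sum_pairs and mean_of_lanes are hypothetical, the intrinsics are the standard ones declared in mmintrin.h, and a scalar fallback is assumed for targets where the compiler does not define __MMX__. All 64-bit SIMD integer work stays in one region, a single _mm_empty() follows the loop, and the floating-point work starts only afterwards.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__MMX__)
#include <mmintrin.h>   /* __m64, _mm_add_pi32, _mm_empty */
#endif

/* Hypothetical helper: lane-wise sum of n_pairs pairs of 32-bit ints.
   out[0] receives the sum of even-indexed elements, out[1] the odd ones. */
static void sum_pairs(const int *data, size_t n_pairs, int out[2])
{
    assert(data != NULL && out != NULL);
#if defined(__MMX__)
    __m64 acc = _mm_setzero_si64();
    for (size_t i = 0; i < n_pairs; ++i) {
        /* _mm_set_pi32 takes the high element first, then the low one. */
        __m64 v = _mm_set_pi32(data[2 * i + 1], data[2 * i]);
        acc = _mm_add_pi32(acc, v);          /* paddd; no emms in the loop */
    }
    out[0] = _mm_cvtsi64_si32(acc);                     /* low 32 bits  */
    out[1] = _mm_cvtsi64_si32(_mm_srli_si64(acc, 32));  /* high 32 bits */
    _mm_empty();   /* one emms, issued once after the MMX region */
#else
    /* Assumed scalar fallback for targets without MMX. */
    out[0] = 0;
    out[1] = 0;
    for (size_t i = 0; i < n_pairs; ++i) {
        out[0] += data[2 * i];
        out[1] += data[2 * i + 1];
    }
#endif
}

/* Floating-point region: runs only after the MMX state has been emptied. */
static double mean_of_lanes(const int *data, size_t n_pairs)
{
    int sums[2];
    sum_pairs(data, n_pairs, sums);
    return ((double)sums[0] + (double)sums[1]) / (2.0 * (double)n_pairs);
}
```

For the six values 1 through 6 (three pairs), sum_pairs yields 9 and 12, and mean_of_lanes returns 3.5. On 64-bit targets compilers typically perform float and double arithmetic in SSE registers, so the x87 aliasing hazard mainly concerns 32-bit x87 code and long double. Emptying the state once after the region is correct in either case.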

Be aware that, with the Intel C++ Compiler, your code generates an MMX instruction, and therefore uses the MMX registers, in the following situations:

• when using a 64-bit SIMD integer intrinsic from MMX technology, SSE, or SSE2

• when using a 64-bit SIMD integer instruction from MMX technology, SSE, or SSE2 through inline assembly

• when referencing an __m64 data type variable
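The first and third situations can be illustrated with a 64-bit SIMD integer intrinsic that was introduced with SSE rather than with MMX technology. The sketch below is an assumption-laden illustration, not code from the manual: avg_bytes8 is a hypothetical helper, _mm_avg_pu8 is the standard intrinsic declared in xmmintrin.h, and a scalar fallback is assumed when the compiler does not define __MMX__ and __SSE__. Because the intrinsic operates on __m64 values, the function empties the MMX state before returning.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__MMX__) && defined(__SSE__)
#include <xmmintrin.h>   /* _mm_avg_pu8; also declares the mmintrin.h types */
#endif

/* Hypothetical helper: rounded average of eight unsigned bytes,
   out[i] = (a[i] + b[i] + 1) >> 1. */
static void avg_bytes8(const uint8_t a[8], const uint8_t b[8], uint8_t out[8])
{
    assert(a != NULL && b != NULL && out != NULL);
#if defined(__MMX__) && defined(__SSE__)
    __m64 va, vb, avg;          /* referencing __m64 variables */
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    avg = _mm_avg_pu8(va, vb);  /* SSE intrinsic on 64-bit operands (pavgb) */
    memcpy(out, &avg, sizeof avg);
    _mm_empty();                /* clear MMX state before any x87 FP code */
#else
    /* Assumed scalar fallback with the same rounding rule. */
    for (int i = 0; i < 8; ++i)
        out[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
#endif
}
```

A caller can follow avg_bytes8 with floating-point code without issuing its own _mm_empty(), since the helper restores the state itself. If several such helpers run back to back, moving the single _mm_empty() to the end of the whole group avoids emptying an already empty state.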

Additional information on the x87 floating-point programming model can be found in the IA-32 Intel® Architecture Software Developer's Manual, Volume 1. For more documentation on emms, visit http://developer.intel.com.

Example 4-1 Resetting the Register between __m64 and FP Data Types

Incorrect Usage                  Correct Usage

__m64 x = _m_paddd(y, z);        __m64 x = _m_paddd(y, z);
float f = init();                float f = (_mm_empty(), init());