5 optimizing for simd floating-point applications, General rules for simd floating-point code, Chapter 5 – Intel ARCHITECTURE IA-32 User Manual

5 Optimizing for SIMD Floating-point Applications
This chapter discusses general rules for optimizing code that uses the
single-instruction, multiple-data (SIMD) floating-point instructions
available in Streaming SIMD Extensions (SSE), Streaming SIMD
Extensions 2 (SSE2), and Streaming SIMD Extensions 3 (SSE3). This
chapter also provides examples that illustrate the optimization
techniques for single-precision and double-precision SIMD
floating-point applications.
General Rules for SIMD Floating-point Code
The rules and suggestions listed in this section help optimize
floating-point code containing SIMD floating-point instructions.
Generally, it is important to understand and balance port utilization to
create efficient SIMD floating-point code. The basic rules and
suggestions include the following:
• Follow all guidelines in Chapter 2 and Chapter 3.
• Exceptions: mask exceptions to achieve higher performance. Code
  runs slower when exceptions are unmasked.
• Use the flush-to-zero and denormals-are-zero modes to avoid the
  performance penalty of handling denormals and underflows.
• Incorporate the prefetch instruction where appropriate (for details,
  refer to Chapter 6, “Optimizing Cache Usage”).
• Use MMX technology instructions and registers if the computations
  can be done in SIMD integer for shuffling data.
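
The exception-masking and flush-to-zero/denormals-are-zero rules above both come down to bits in the MXCSR control register. The following is a minimal sketch in C, assuming a compiler that provides the standard SSE intrinsic headers (xmmintrin.h and pmmintrin.h). The helper names simd_fp_fast_mode and sse_mul are hypothetical and used only for illustration. Note that both modes trade strict IEEE 754 behavior for speed, that MXCSR is per-thread state, and that older SSE-only processors may not implement denormals-are-zero, so production code would normally check for support before setting that bit.

```c
#include <xmmintrin.h>  /* SSE: MXCSR access, exception mask, flush-to-zero macros */
#include <pmmintrin.h>  /* denormals-are-zero macros */

/* Hypothetical helper: configure MXCSR of the calling thread for speed.
 * Assumes the processor supports the denormals-are-zero bit. */
static void simd_fp_fast_mode(void)
{
    /* Mask all six SIMD floating-point exceptions. */
    _MM_SET_EXCEPTION_MASK(_MM_MASK_MASK);
    /* Flush-to-zero: results that would be denormal become zero. */
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    /* Denormals-are-zero: denormal source operands are treated as zero. */
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
}

/* Hypothetical helper: multiply two floats with a scalar SSE instruction,
 * so that the result is governed by the MXCSR settings above. */
static float sse_mul(float a, float b)
{
    return _mm_cvtss_f32(_mm_mul_ss(_mm_set_ss(a), _mm_set_ss(b)));
}
```

A program would typically call simd_fp_fast_mode() once in each thread before entering its SIMD floating-point loops. After that call, a product such as sse_mul(x, y) whose exact result falls in the denormal range is returned as zero instead of taking the slower denormal-handling path.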