5 optimizing for simd floating-point applications, General rules for simd floating-point code, Chapter 5 – Intel ARCHITECTURE IA-32 User Manual

Optimizing for SIMD
Floating-point Applications

This chapter discusses general rules for optimizing code that uses the
single-instruction, multiple-data (SIMD) floating-point instructions
available in Streaming SIMD Extensions (SSE), Streaming SIMD
Extensions 2 (SSE2), and Streaming SIMD Extensions 3 (SSE3). This
chapter also provides examples that illustrate the optimization
techniques for single-precision and double-precision SIMD
floating-point applications.

General Rules for SIMD Floating-point Code

The rules and suggestions listed in this section help optimize
floating-point code containing SIMD floating-point instructions.
Generally, it is important to understand and balance port utilization to
create efficient SIMD floating-point code. The basic rules and
suggestions include the following:

• Follow all guidelines in Chapter 2 and Chapter 3.

• Exceptions: mask exceptions to achieve higher performance. When
  exceptions are unmasked, software performance is slower.

• Utilize the flush-to-zero and denormals-are-zero modes for higher
  performance to avoid the penalty of dealing with denormals and
  underflows.

• Incorporate the prefetch instruction where appropriate (for details,
  refer to Chapter 6, “Optimizing Cache Usage”).

• Use MMX technology instructions and registers if the computations
  can be done in SIMD integer for shuffling data.
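
The rules above on masking exceptions and on the flush-to-zero (FTZ) and denormals-are-zero (DAZ) modes all come down to bits in the MXCSR control register. The following is a minimal sketch of one way to set them from C with the compiler intrinsics in xmmintrin.h and pmmintrin.h. The helper names (enable_fast_simd_fp_mode, restore_simd_fp_mode, halve_with_sse) are hypothetical and are not taken from this manual. The sketch assumes an x86 target whose processor supports the DAZ bit, which early SSE implementations do not.

```c
#include <assert.h>
#include <float.h>
#include <xmmintrin.h>  /* SSE: MXCSR access, exception masks, flush-to-zero */
#include <pmmintrin.h>  /* SSE3 header: denormals-are-zero macros */

/* Hypothetical helper: mask all SIMD floating-point exceptions and turn on
 * FTZ and DAZ. Returns the previous MXCSR value so it can be restored.
 * Assumption: the processor supports the DAZ bit. */
static unsigned int enable_fast_simd_fp_mode(void)
{
    unsigned int saved = _mm_getcsr();

    _MM_SET_EXCEPTION_MASK(_MM_MASK_MASK);               /* mask all exceptions */
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);          /* denormal results -> 0 */
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);  /* denormal inputs -> 0 */

    /* Debug-build sanity check that the control register took the setting. */
    assert(_MM_GET_FLUSH_ZERO_MODE() == _MM_FLUSH_ZERO_ON);
    return saved;
}

/* Hypothetical helper: put MXCSR back to a previously saved value. */
static void restore_simd_fp_mode(unsigned int saved)
{
    _mm_setcsr(saved);
}

/* Multiply x by 0.5 with a scalar SSE multiply, so the result is subject
 * to the MXCSR settings above. */
static float halve_with_sse(float x)
{
    return _mm_cvtss_f32(_mm_mul_ss(_mm_set_ss(x), _mm_set_ss(0.5f)));
}
```

With the mode enabled, halving FLT_MIN (the smallest normalized single-precision value) returns zero instead of a denormal, which is the behavior the FTZ rule relies on. Because MXCSR is per-thread state and these modes depart from IEEE 754 behavior, saving and restoring the register around the performance-critical region, as the sketch does, is one reasonable design choice rather than a requirement.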