Optimizing Cache Usage
• Facilitate compiler optimization (a brief C sketch follows this list):
  — Minimize use of global variables and pointers.
  — Minimize use of complex control flow.
  — Use the const modifier; avoid the register modifier.
  — Choose data types carefully (see below) and avoid type casting.
• Use cache blocking techniques (for example, strip mining):
  — Improve the cache hit rate with cache blocking techniques such as strip mining (one-dimensional arrays) or loop blocking (two-dimensional arrays); a loop-blocking sketch appears after this list.
  — Explore using the hardware prefetching mechanism if your data access pattern is regular enough to allow alternate sequencing of data accesses (for example, tiling) for improved spatial locality; otherwise use prefetchnta.
• Balance single-pass versus multi-pass execution:
  — An algorithm can use single-pass or multi-pass execution, defined as follows: single-pass, or unlayered, execution passes a single data element through the entire computation pipeline; multi-pass, or layered, execution performs a single stage of the pipeline on a batch of data elements before passing the entire batch on to the next stage.
  — As a general guideline to minimize cache pollution: if your algorithm is single-pass, use prefetchnta; if it is multi-pass, use prefetcht0 (see the prefetch sketch after this list).
• Resolve memory bank conflict issues:
  — Minimize memory bank conflicts by applying array grouping to keep contiguously used data together, or by allocating data within 4 KB memory pages (an array-grouping sketch appears after this list).
• Resolve cache management issues:
  — Minimize disturbance of temporal data held within the processor's caches by using streaming store instructions, as appropriate (a streaming-store sketch appears after this list).
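
The following is a minimal C sketch of the compiler-friendliness guidelines above; the function and variable names are illustrative and not taken from this manual. Declaring read-only inputs const and accumulating into a local, rather than a global, gives the compiler more freedom to keep values in registers.

    /* Read-only input is declared const; the accumulator is a local,
       not a global, so the compiler can keep it in a register. */
    float sum_scaled(const float *src, int n, float scale)
    {
        float sum = 0.0f;
        for (int i = 0; i < n; i++)
            sum += scale * src[i];
        return sum;
    }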
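As a sketch of loop blocking for a two-dimensional array, the transpose below is processed one BLOCK x BLOCK tile at a time so that each tile's working set fits in the cache. N, BLOCK, and the function and array names are illustrative and should be tuned to the target cache size.

    #define N     1024
    #define BLOCK 64    /* chosen so one tile of src and dst fits in cache */

    void transpose_blocked(float dst[N][N], const float src[N][N])
    {
        for (int ii = 0; ii < N; ii += BLOCK)
            for (int jj = 0; jj < N; jj += BLOCK)
                /* visit one BLOCK x BLOCK tile at a time */
                for (int i = ii; i < ii + BLOCK; i++)
                    for (int j = jj; j < jj + BLOCK; j++)
                        dst[j][i] = src[i][j];
    }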
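The single-pass/multi-pass prefetch guideline can be expressed with the _mm_prefetch intrinsic (declared in <xmmintrin.h>), which compiles to the prefetchnta and prefetcht0 instructions. This is only a sketch: the kernels, names, and the PREFETCH_DIST look-ahead distance are illustrative and must be tuned to the loop's compute latency.

    #include <xmmintrin.h>

    #define PREFETCH_DIST 16   /* illustrative look-ahead, in elements */

    /* Single-pass kernel: each element is touched once, so prefetch it
       non-temporally (prefetchnta) to limit cache pollution. */
    void scale_single_pass(float *dst, const float *src, int n, float k)
    {
        for (int i = 0; i < n; i++) {
            _mm_prefetch((const char *)&src[i + PREFETCH_DIST], _MM_HINT_NTA);
            dst[i] = k * src[i];
        }
    }

    /* Multi-pass stage: the batch is reused by a later stage, so
       prefetch into the cache hierarchy (prefetcht0). */
    void stage_multi_pass(float *buf, int n, float k)
    {
        for (int i = 0; i < n; i++) {
            _mm_prefetch((const char *)&buf[i + PREFETCH_DIST], _MM_HINT_T0);
            buf[i] *= k;
        }
    }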
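A sketch of array grouping: fields that are always used together are interleaved in one structure, so each iteration touches a single contiguous stream instead of several parallel arrays whose elements may map to conflicting banks or cache sets. The names and sizes are illustrative.

    /* Before: three parallel arrays produce three separate access streams. */
    float pos_x[4096], pos_y[4096], pos_z[4096];

    /* After: grouping the fields keeps each element's data contiguous. */
    typedef struct {
        float x, y, z;
    } Point;
    Point points[4096];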
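A minimal sketch of a streaming store, assuming the output buffer is written once and not read again soon, that both buffers are 16-byte aligned, and that n is a multiple of 4. The _mm_stream_ps and _mm_sfence intrinsics (declared in <xmmintrin.h>) compile to movntps and sfence; the function and parameter names are illustrative.

    #include <xmmintrin.h>

    void scale_to_output(float *dst, const float *src, int n, float k)
    {
        __m128 scale = _mm_set1_ps(k);
        for (int i = 0; i < n; i += 4) {
            __m128 v = _mm_mul_ps(_mm_load_ps(&src[i]), scale);
            _mm_stream_ps(&dst[i], v);   /* non-temporal store: writes around the caches */
        }
        _mm_sfence();   /* order the streaming stores before later stores */
    }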