Intel Architecture IA-32 User Manual: Optimizing Cache Usage

Currently, the prefetch instruction provides a greater performance gain than preloading because it:

• has no destination register; it only updates cache lines.
• does not stall the normal instruction retirement.
• does not affect the functional behavior of the program.
• has no cache line split accesses.
• does not cause exceptions except when the LOCK prefix is used. LOCK is not a valid prefix for the prefetch instructions and should not be used.
• does not complete its own execution if that would cause a fault.

The current advantages of prefetch over preloading instructions are processor-specific. Their nature and extent may change in future processors.
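The following C sketch contrasts the two approaches. It assumes GCC or Clang: `__builtin_prefetch` is a compiler builtin that typically compiles to a PREFETCHh instruction on IA-32 targets that support one. The function names and the prefetch distance `PREFETCH_DIST` are hypothetical and are chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in elements. A real value would be tuned
   for the target processor and the loop body. */
#define PREFETCH_DIST 16

/* Sums the array while hinting that data PREFETCH_DIST elements ahead will
   be read soon. The prefetch writes no register and does not change the
   result; it only influences when the line arrives in the cache. */
long sum_with_prefetch(const int *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Keep the index in bounds: forming an out-of-range pointer is
           undefined behaviour in C, even though the prefetch would not fault. */
        if (i + PREFETCH_DIST < n)
            __builtin_prefetch(&a[i + PREFETCH_DIST], 0 /* read */, 3 /* keep in all cache levels */);
        sum += a[i];
    }
    return sum;
}

/* Same loop using a "preload": an ordinary load whose value is discarded.
   Unlike a prefetch, this load produces a value that must be written
   somewhere, and it would fault if the address were invalid. */
long sum_with_preload(const int *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DIST < n) {
            volatile int sink = a[i + PREFETCH_DIST]; /* load cannot be optimized away */
            (void)sink;
        }
        sum += a[i];
    }
    return sum;
}
```

Both functions return the same sum for the same input. The prefetch hint changes only the timing of memory traffic, not what the program computes.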

In addition, there are cases where a prefetch instruction will not perform
the data prefetch. These include:

• the prefetch causes a DTLB (Data Translation Lookaside Buffer) miss. This applies to Pentium 4 processors with a CPUID signature corresponding to family 15, model 0, 1 or 2. On Pentium 4 processors with a CPUID signature corresponding to family 15, model 3, the prefetch instruction resolves the DTLB miss and fetches the data.
• an access to the specified address causes a fault or exception.
• the memory subsystem runs out of request buffers between the first-level cache and the second-level cache.
• the prefetch targets an uncacheable memory region, such as USWC or UC.
• the LOCK prefix is used. This causes an invalid-opcode exception.
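Whether a prefetch that misses the DTLB is dropped depends on the processor's CPUID signature. The following minimal C sketch encodes only the rule stated above. The enum and the function `classify_dtlb_prefetch` are hypothetical names. Any signature other than family 15, models 0 to 3, is reported as not covered, because this text does not describe those processors.

```c
#include <stdint.h>

/* Behavior of a prefetch that misses the DTLB, per the list above. */
enum dtlb_prefetch_behavior {
    DTLB_MISS_PREFETCH_DROPPED,   /* family 15, model 0, 1 or 2 */
    DTLB_MISS_PREFETCH_RESOLVED,  /* family 15, model 3 */
    DTLB_MISS_BEHAVIOR_UNLISTED   /* any other signature: not covered here */
};

/* Hypothetical helper: classify a CPUID leaf 1 EAX signature.
   EAX bits 7:4 hold the base model, 11:8 the base family,
   19:16 the extended model and 27:20 the extended family. */
enum dtlb_prefetch_behavior classify_dtlb_prefetch(uint32_t eax) {
    uint32_t family = (eax >> 8) & 0xFu;
    uint32_t model = (eax >> 4) & 0xFu;
    if (family == 0xFu) {
        family += (eax >> 20) & 0xFFu;        /* add extended family */
        model |= ((eax >> 16) & 0xFu) << 4;   /* prepend extended model */
    }
    if (family != 15u)
        return DTLB_MISS_BEHAVIOR_UNLISTED;
    if (model <= 2u)
        return DTLB_MISS_PREFETCH_DROPPED;
    if (model == 3u)
        return DTLB_MISS_PREFETCH_RESOLVED;
    return DTLB_MISS_BEHAVIOR_UNLISTED;
}
```

With GCC or Clang on an x86 target, the EAX signature can be read with `__get_cpuid(1, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`. That call returns 0 if leaf 1 is unsupported. Prefetch placement for the affected models can then account for dropped prefetches, for example by not relying on a prefetch to touch a new page.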

Cacheability Control

This section covers the mechanics of the cacheability control
instructions.