Cacheability Control – Intel Architecture IA-32 User Manual

Optimizing Cache Usage
Currently, the prefetch instruction provides a greater performance gain
than preloading because it:
• has no destination register; it only updates cache lines.
• does not stall the normal instruction retirement.
• does not affect the functional behavior of the program.
• has no cache line split accesses.
• does not cause exceptions except when the LOCK prefix is used; the
  LOCK prefix is not a valid prefix for the prefetch instructions and
  should not be used.
• does not complete its own execution if that would cause a fault.
The current advantages of the prefetch instruction over preloading are
processor-specific. The nature and extent of the advantages may change
in the future.
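The difference between preloading and prefetching can be sketched in C.
This is a minimal illustration, not code from the manual: it assumes a
GCC- or Clang-compatible compiler, whose __builtin_prefetch builtin
typically compiles to a prefetch instruction on IA-32 targets. The
function names and the AHEAD distance are hypothetical and would need
tuning for a real workload.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in elements; tune per platform. */
#define AHEAD 16

/* Preloading: an ordinary load touches a future element. The load needs
   a destination (the volatile sink stands in for a register) and it is
   a real memory access, so it must stay within the array bounds. */
long sum_with_preload(const int *a, size_t n)
{
    volatile int sink = 0;
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + AHEAD < n)
            sink = a[i + AHEAD];   /* dummy load, value is discarded */
        total += a[i];
    }
    (void)sink;
    return total;
}

/* Prefetching: a hint with no destination register. It only asks for
   the cache line to be brought closer; the computed sum is the same
   whether or not the hardware honors the hint. */
long sum_with_prefetch(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + AHEAD < n)
            __builtin_prefetch(&a[i + AHEAD], 0 /* read */, 3 /* keep in cache */);
        total += a[i];
    }
    return total;
}
```

Both functions return the same sum for the same input. Only the way the
future cache line is requested differs, which is why replacing a preload
with a prefetch does not change program results.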
In addition, there are cases where a prefetch instruction will not perform
the data prefetch. These include:
• the prefetch causes a DTLB (Data Translation Lookaside Buffer) miss.
  This applies to Pentium 4 processors with CPUID signature
  corresponding to family 15, model 0, 1 or 2. On Pentium 4 processors
  with CPUID signature corresponding to family 15, model 3, the
  prefetch instruction instead resolves the DTLB miss and fetches the
  data.
• an access to the specified address causes a fault/exception.
• the memory subsystem runs out of request buffers between the
  first-level cache and the second-level cache.
• the prefetch targets an uncacheable memory region, for example, USWC
  or UC memory.
• a LOCK prefix is used. This causes an invalid opcode exception.
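Because a prefetch may be dropped silently, code must stay correct
whether or not the data actually arrives in the cache. The following
sketch illustrates only the faulting-address case, using a linked-list
traversal. It is an illustration under assumptions, not code from the
manual: it relies on the GCC/Clang __builtin_prefetch builtin, which is
documented not to fault on an invalid address, and the struct node type
and sum_list function are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical list node used only for this illustration. */
struct node {
    int value;
    struct node *next;
};

/* Sums a singly linked list while hinting at the next node.
   At the tail, p->next is NULL. A real load from NULL would fault,
   but the prefetch hint is simply not performed, so no check is
   needed before issuing it. */
long sum_list(const struct node *head)
{
    long total = 0;
    for (const struct node *p = head; p != NULL; p = p->next) {
        __builtin_prefetch(p->next, 0 /* read */, 3 /* keep in cache */);
        total += p->value;
    }
    return total;
}
```

The result does not depend on the hint: the traversal still reads every
node through ordinary loads, so a prefetch skipped for any of the reasons
listed above costs performance at most, never correctness.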
Cacheability Control
This section covers the mechanics of the cacheability control
instructions.