Intel IA-32 User Manual
Page 278
7-10 Vol. 3A
MULTIPLE-PROCESSOR MANAGEMENT
7.2.3 Out-of-Order Stores For String Operations in Pentium 4, Intel Xeon, and P6 Family Processors
The Pentium 4, Intel Xeon, and P6 family processors modify the processor's operation during
string store operations (initiated with the MOVS and STOS instructions) to maximize performance.
Once the initial conditions for “fast string” operation are met (as described below), the
processor essentially operates on the string, from an external perspective, one cache line at a
time. For the length of the string, the processor loops, issuing a cache-line read for the source
address and an invalidation on the external bus for the destination address. It can invalidate
the destination line rather than read it because it knows that all bytes in that line will be
modified. In this mode, the processor accepts interrupts only on cache-line boundaries. It is
also possible in this mode for the destination line invalidations, and therefore the stores, to
be issued on the external bus out of order.
Code that depends on sequential store ordering should not use string operations for the entire
data structure to be stored. Data and semaphores should be separated. Order-dependent code
should write to a discrete semaphore variable after any string operations, so that all
processors see correctly ordered data.
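
The separation of data and semaphore described above might be sketched in C11 as follows. This is only an illustration under stated assumptions: the `mailbox` structure, its field names, and the `mailbox_*` functions are hypothetical and do not come from this manual, and whether `memcpy` is actually compiled to a string instruction depends on the compiler and library.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define PAYLOAD_MAX 256

/* Hypothetical shared record. The ready flag is a separate object from the
   payload, so it is never written as part of the bulk copy. */
struct mailbox {
    unsigned char payload[PAYLOAD_MAX];
    size_t        len;
    atomic_int    ready;              /* the discrete "semaphore" */
};

static void mailbox_init(struct mailbox *mb)
{
    mb->len = 0;
    atomic_init(&mb->ready, 0);
}

/* Producer: bulk-copy the data (memcpy may be implemented with REP MOVS),
   then publish it with one separate store to the flag. */
static void mailbox_publish(struct mailbox *mb, const void *src, size_t len)
{
    assert(len <= PAYLOAD_MAX);
    memcpy(mb->payload, src, len);
    mb->len = len;
    /* Release ordering also keeps the compiler from moving this store
       ahead of the copy. */
    atomic_store_explicit(&mb->ready, 1, memory_order_release);
}

/* Consumer: returns the number of bytes copied out, or -1 if the flag has
   not been set yet. */
static long mailbox_try_read(struct mailbox *mb, void *dst, size_t cap)
{
    if (atomic_load_explicit(&mb->ready, memory_order_acquire) == 0)
        return -1;
    size_t n = mb->len < cap ? mb->len : cap;
    memcpy(dst, mb->payload, n);
    return (long)n;
}
```

The point of the design is that the consumer never infers readiness from the copied bytes themselves (for example, from the last word of the buffer), since stores within the string operation may become visible out of order.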
Initial conditions for “fast string” operations:
• EDI and ESI must be 8-byte aligned for the Pentium III processor. EDI must be 8-byte
aligned for the Pentium 4 processor.
• String operation must be performed in ascending address order.
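
As a rough sketch, the two conditions listed above can be expressed as a predicate. The enum, the function names, and the parameter layout are assumptions made for illustration; the check covers only the conditions stated in this list and says nothing about any other requirements a particular processor may impose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the two cases the list distinguishes. */
enum cpu_model { CPU_PENTIUM_III, CPU_PENTIUM_4 };

static bool is_8_byte_aligned(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

/* src stands for ESI, dst for EDI. direction_flag mirrors EFLAGS.DF:
   false means ascending addresses, true means descending. */
static bool fast_string_conditions_met(enum cpu_model cpu,
                                       const void *src,
                                       const void *dst,
                                       bool direction_flag)
{
    assert(src != NULL && dst != NULL);

    if (direction_flag)                 /* must run in ascending order */
        return false;
    if (!is_8_byte_aligned(dst))        /* EDI aligned in both cases */
        return false;
    if (cpu == CPU_PENTIUM_III && !is_8_byte_aligned(src))
        return false;                   /* ESI aligned on Pentium III only */
    return true;
}
```

For example, a copy from an address ending in 4 to an 8-byte-aligned destination passes this check for `CPU_PENTIUM_4` but not for `CPU_PENTIUM_III`.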
Figure 7-1. Example of Write Ordering in Multiple-Processor Systems

Order of writes from individual processors (each processor is guaranteed to perform writes in
program order):
Processor #1: Write A.1, Write A.2, Write A.3
Processor #2: Write B.1, Write B.2, Write B.3
Processor #3: Write C.1, Write C.2, Write C.3

Example of order of actual writes from all processors to memory (writes from all processors are
not guaranteed to occur in a particular order; writes are in order with respect to individual
processes):
Write A.1, Write B.1, Write A.2, Write A.3, Write C.1, Write B.2, Write C.2, Write B.3, Write C.3
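
The property the figure illustrates can be checked mechanically: an interleaving of writes is consistent with the figure's guarantee if each processor's own writes still appear in program order. The following is a minimal sketch; the `write_event` type and the function name are hypothetical, and processors are identified by the letters used in the figure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PROCESSORS 8

/* One write as labeled in the figure, e.g. "Write B.2" is {'B', 2}. */
struct write_event {
    char proc;   /* 'A', 'B', 'C', ... */
    int  seq;    /* position in that processor's program order, from 1 */
};

/* Returns true if every processor's writes appear in the global order with
   strictly increasing sequence numbers. The relative order of writes from
   different processors is deliberately not constrained. */
static bool preserves_program_order(const struct write_event *order, size_t n)
{
    int last_seq[MAX_PROCESSORS] = {0};

    for (size_t i = 0; i < n; i++) {
        int p = order[i].proc - 'A';
        assert(p >= 0 && p < MAX_PROCESSORS);
        if (order[i].seq <= last_seq[p])
            return false;               /* an earlier write came later */
        last_seq[p] = order[i].seq;
    }
    return true;
}
```

Applied to the example order in the figure (A.1, B.1, A.2, A.3, C.1, B.2, C.2, B.3, C.3), the check succeeds; an order that places A.2 before A.1 fails it.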