Intel IA-32 User Manual
Page 279
Vol. 3A 7-11
MULTIPLE-PROCESSOR MANAGEMENT
•
The initial operation counter (ECX) must be equal to or greater than 64.
•
Source and destination must not overlap by less than a cache line (64 bytes, Pentium 4 and
Intel Xeon processors; 32 bytes P6 family and Pentium processors).
•
The memory type for both source and destination addresses must be either WB or WC.
7.2.4
Strengthening or Weakening the Memory Ordering Model
The IA-32 architecture provides several mechanisms for strengthening or weakening the
memory ordering model to handle special programming situations. These mechanisms include:
•
The I/O instructions, locking instructions, the LOCK prefix, and serializing instructions
force stronger ordering on the processor.
•
The SFENCE instruction (introduced to the IA-32 architecture in the Pentium III
processor) and the LFENCE and MFENCE instructions (introduced in the Pentium 4 and
Intel Xeon processors) provide memory ordering and serialization capability for specific
types of memory operations.
•
The memory type range registers (MTRRs) can be used to strengthen or weaken memory
ordering for specific area of physical memory (see Section 10.11, “Memory Type Range
Registers (MTRRs)”). MTRRs are available only in the Pentium 4, Intel Xeon, and P6
family processors.
•
The page attribute table (PAT) can be used to strengthen memory ordering for a specific
page or group of pages (see Section 10.12, “Page Attribute Table (PAT)”). The PAT is
available only in the Pentium 4, Intel Xeon, and Pentium III processors.
These mechanisms can be used as follows.
Memory mapped devices and other I/O devices on the bus are often sensitive to the order of
writes to their I/O buffers. I/O instructions can be used to (the IN and OUT instructions) impose
strong write ordering on such accesses as follows. Prior to executing an I/O instruction, the
processor waits for all previous instructions in the program to complete and for all buffered
writes to drain to memory. Only instruction fetch and page tables walks can pass I/O instruc-
tions. Execution of subsequent instructions do not begin until the processor determines that the
I/O instruction has been completed.
Synchronization mechanisms in multiple-processor systems may depend upon a strong
memory-ordering model. Here, a program can use a locking instruction such as the XCHG
instruction or the LOCK prefix to insure that a read-modify-write operation on memory is
carried out atomically. Locking operations typically operate like I/O operations in that they wait
for all previous instructions to complete and for all buffered writes to drain to memory (see
Section 7.1.2, “Bus Locking”).