Store buffers and memory ordering – Intel IA-32 User Manual

IA-32 ARCHITECTURE COMPATIBILITY

An exception to this behavior occurs when a stack access is data aligned and the stack pointer
points to the last aligned piece of data of that size at the top of the stack (ESP is FFFFFFFCH).
When this data is popped, no segment limit violation occurs and the stack pointer wraps
around to 0.
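
The wraparound described above is ordinary 32-bit unsigned arithmetic. The following C fragment is a minimal sketch of that arithmetic only; the helper name esp_after_pop is hypothetical, and the code does not model segment limit checking.

```c
#include <stdint.h>

/* Hypothetical helper: value of ESP after popping `size` bytes.
   uint32_t arithmetic is modulo 2^32, so popping 4 bytes at
   FFFFFFFCH yields 0, matching the wraparound in the text. */
uint32_t esp_after_pop(uint32_t esp, uint32_t size)
{
    return esp + size;
}
```

Calling esp_after_pop(0xFFFFFFFC, 4) returns 0, which is the case the paragraph singles out.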

The address space of the P6 family, Pentium, and Intel486 processors may wrap around at
1 MByte in real-address mode. An external A20M# pin forces wraparound if enabled. On Intel
8086 processors, it is possible to specify addresses greater than 1 MByte. For example, with a
selector value of FFFFH and an offset of FFFFH, the effective address would be 10FFEFH
(1 MByte plus 65519 bytes). The 8086 processor, which can form addresses only up to 20 bits
long, truncates the uppermost bit, which “wraps” this address to FFEFH. However, the P6 family,
Pentium, and Intel486 processors do not truncate this bit if A20M# is not enabled.
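
The address arithmetic in the example above can be checked with a short C sketch. The function name real_mode_address and the wrap_at_1mb flag are assumptions made for illustration: the flag stands in for either an 8086 (20 address bits) or a later processor with A20M# enabled, and the code models nothing beyond the selector-times-16-plus-offset calculation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: real-address-mode address formation.
   address = selector * 16 + offset
   When wrap_at_1mb is true, bit 20 is cleared, which models the
   1-MByte wraparound described in the text. */
uint32_t real_mode_address(uint16_t selector, uint16_t offset, bool wrap_at_1mb)
{
    uint32_t address = ((uint32_t)selector << 4) + offset;

    if (wrap_at_1mb) {
        address &= ~(UINT32_C(1) << 20);  /* drop address bit 20 */
    }
    return address;
}
```

With selector FFFFH and offset FFFFH, the function returns 10FFEFH when wrap_at_1mb is false and FFEFH when it is true, the two values given in the paragraph.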

If a stack operation wraps around the address limit, shutdown occurs. (The 8086 processor does
not have a shutdown mode or a limit.)

The behavior when executing near the limit of a 4-GByte selector (limit=0xFFFFFFFF) differs
between the Pentium Pro and the Pentium 4 family of processors. On the Pentium Pro, an
instruction that crosses the limit (for example, a two-byte instruction such as INC EAX,
encoded as 0xFF 0xC0, that starts exactly at the limit) faults with a segment violation; a
one-byte instruction at 0xFFFFFFFF does not cause an exception. On the Pentium 4
microprocessor family, neither of these situations causes a fault.
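
The two cases in this paragraph differ only in whether the last byte of the instruction lies beyond the limit. The C fragment below is a simplified sketch of that test; crosses_limit is a hypothetical name, and the code says nothing about how either processor implements its limit checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does an instruction of `length` bytes that
   starts at offset `start` extend past the segment limit?
   The sum is formed in 64 bits so it cannot wrap at 4 GBytes. */
bool crosses_limit(uint32_t start, uint32_t length, uint32_t limit)
{
    if (length == 0) {
        return false;  /* nothing is fetched */
    }
    uint64_t last_byte = (uint64_t)start + length - 1u;
    return last_byte > (uint64_t)limit;
}
```

For limit 0xFFFFFFFF, a two-byte instruction at 0xFFFFFFFF crosses the limit (the case that faults on the Pentium Pro), while a one-byte instruction at the same offset does not.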

17.33. STORE BUFFERS AND MEMORY ORDERING

The Pentium 4, Intel Xeon, and P6 family processors provide a store buffer for temporary
storage of writes (stores) to memory (see Section 10.10, “Store Buffer”). Writes stored in the
store buffer(s) are always written to memory in program order, with the exception of “fast
string” store operations (see Section 7.2.3, “Out-of-Order Stores For String Operations in
Pentium 4, Intel Xeon, and P6 Family Processors”).

The Pentium processor has two store buffers, one corresponding to each of the pipelines. Writes
in these buffers are always written to memory in the order they were generated by the processor
core.

Only memory writes are buffered; I/O writes are not. The Pentium 4, Intel Xeon, P6 family,
Pentium, and Intel486 processors do not synchronize the completion of memory writes on the
bus with the execution of instructions that follow the write. An I/O, locked, or serializing
instruction must be executed to synchronize writes with the next instruction (see Section 7.4,
“Serializing Instructions”).
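
From C, one portable way to request this kind of synchronization point is a sequentially consistent fence. The fragment below is a sketch under stated assumptions: store_then_drain is a hypothetical name, and the C standard guarantees only the ordering semantics of the fence, not a particular instruction. Compilers targeting IA-32 commonly implement it with MFENCE or a locked read-modify-write instruction, but that is a property of the compiler and target, not of this code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: perform a memory write, then issue a full
   fence before any later instruction is allowed to proceed past it.
   Assumption: on IA-32 targets the fence is typically lowered to
   MFENCE or a locked instruction, which drains buffered writes. */
void store_then_drain(volatile uint32_t *destination, uint32_t value)
{
    *destination = value;                       /* may sit in a store buffer */
    atomic_thread_fence(memory_order_seq_cst);  /* ordering point            */
}
```

Code that needs a specific instruction (for example, a serializing instruction) would have to use inline assembly or a compiler intrinsic instead; the fence above only expresses the ordering requirement.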

The Pentium 4, Intel Xeon, and P6 family processors use processor ordering to maintain
consistency in the order that data is read (loaded) and written (stored) in a program and the
order the processor actually carries out the reads and writes. With this type of ordering, reads
can be carried out speculatively and in any order, reads can pass buffered writes, and writes to
memory are always carried out in program order. (See Section 7.2, “Memory Ordering,” for more
information about processor ordering.) The Pentium III processor introduced a new instruction to
serialize writes and make them globally visible. Memory ordering issues can arise between a
producer and a consumer of data. The SFENCE instruction provides a performance-efficient