1 load-load order trap, 2 store-load order trap, 2 other mbox replay traps – Compaq 21264 User Manual
Page 60: 12 i/o write buffer and the wmb instruction, 1 memory barrier (mb/wmb/tb fill flow), Load-load order trap, Store-load order trap, Other mbox replay traps, I/o write buffer and the wmb instruction, Memory barrier (mb/wmb/tb fill flow)
2–32
Internal Architecture
Alpha 21264/EV67 Hardware Reference Manual
I/O Write Buffer and the WMB Instruction
2.11.1.1 Load-Load Order Trap
The Mbox ensures that load instructions that read the same physical byte(s) ultimately
issue in correct order by using the load-load order trap. The Mbox compares the
address of each load instruction, as it is issued, to the address of all load instructions in
the load queue. If the Mbox finds a newer load instruction in the load queue, it invokes
a load-load order trap on the newer instruction. This is a replay trap that aborts the tar-
get of the trap and all newer instructions from the machine and refetches instructions
starting at the target of the trap.
2.11.1.2 Store-Load Order Trap
The Mbox ensures that a load instruction ultimately issues after an older store instruc-
tion that writes some portion of its memory operand by using the store-load order trap.
The Mbox compares the address of each store instruction, as it is issued, to the address
of all load instructions in the load queue. If the Mbox finds a newer load instruction in
the load queue, it invokes a store-load order trap on the load instruction. This is a replay
trap. It functions like the load-load order trap.
The Ibox contains extra hardware to reduce the frequency of the store-load trap. There
is a 1-bit by 1024-entry VPC-indexed table in the Ibox called the stWait table. When an
Icache instruction is fetched, the associated stWait table entry is fetched along with the
Icache instruction. The stWait table produces 1 bit for each instruction accessed from
the Icache. When a load instruction gets a store-load order replay trap, its associated bit
in the stWait table is set during the cycle that the load is refetched. Hence, the trapping
load instruction’s stWait bit will be set the next time it is fetched.
The IQ will not issue load instructions whose stWait bit is set while there are older unis-
sued store instructions in the queue. A load instruction whose stWait bit is set can be
issued the cycle immediately after the last older store instruction is issued from the
queue. All the bits in the stWait table are unconditionally cleared every 16384 cycles, or
every 65536 cycles if I_CTL[ST_WAIT_64K] is set.
2.11.2 Other Mbox Replay Traps
The Mbox also uses replay traps to control the flow of the load queue and store queue,
and to ensure that there are never multiple outstanding misses to different physical
addresses that map to the same Dcache or Bcache line. Unlike the order traps, however,
these replay traps are invoked on the incoming instruction that triggered the condition.
2.12 I/O Write Buffer and the WMB Instruction
The I/O write buffer (IOWB) consists of four 64-byte entries with the associated
address and control logic used to buffer I/O write data between the store queue (SQ)
and the system port.
2.12.1 Memory Barrier (MB/WMB/TB Fill Flow)
The Cbox CSR SYSBUS_MB_ENABLE bit determines if MB instructions produce
external system port transactions. When the SYSBUS_MB_ENABLE bit equals 0, the
Cbox CSR MB_CNT[3:0] field contains the number of pending uncommitted transac-
tions. The counter will increment for each of the following commands:
•
RdBlk, RdBlkMod, RdBlkI