A.2.1, A.2.2 – AMD ATHLON 64 User Manual
Page 40

Appendix A
Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ ccNUMA Multiprocessor Systems
40555 Rev. 3.00, June 2006
Likewise, packets to be transmitted from the MCT to the XBar are queued in the “MCT-to-XBar” 
buffers. The buffers in the SRI, XBar, and MCT can be viewed as staggered queues on the various 
units.
A.2 Why Is the Crossfire Case Slower Than the No Crossfire Case on an Idle System?
The following analysis highlights some of the important characteristics of the underlying resources 
that come into play when there is crossfire versus no crossfire.
A.2.1 What Resources Are Used When a Single Read-Only or Write-Only Thread Accesses Remote Data?
When a thread running on node 0 reads data from node 1, on an otherwise idle system, there is traffic 
on both the incoming and outgoing links.
When a node issues a memory read request, it first sends the request to the memory controller that 
owns the memory, which can be local or remote. That memory controller then sends probes to all 
other nodes in the system to see whether they hold the memory in their caches. Once it receives the 
responses from those nodes, it sends a response to the requesting node. Finally, it sends the read 
data to the requesting node.
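
The read sequence described above can be sketched as an ordered list of messages. This is only an illustrative model of the steps as stated in the text, not a description of the actual HyperTransport packet types; the function name `remote_read_messages` and the message labels are hypothetical.

```python
def remote_read_messages(requester, home, num_nodes):
    """Return (source, destination, kind) tuples for one read, in order.

    `home` is the node whose memory controller owns the data.
    Labels and ordering follow the prose description only (an assumption),
    not the real protocol encoding.
    """
    # The home memory controller probes every other node in the system.
    probed = [n for n in range(num_nodes) if n != home]

    messages = [(requester, home, "read request")]
    messages += [(home, n, "probe") for n in probed]
    messages += [(n, home, "probe response") for n in probed]
    messages.append((home, requester, "read response"))
    messages.append((home, requester, "read data"))
    return messages


# A thread on node 0 reading from node 1 in a four-node system.
msgs = remote_read_messages(requester=0, home=1, num_nodes=4)
for src, dst, kind in msgs:
    print(f"node {src} -> node {dst}: {kind}")
```

In this model only the final message carries data. The request, probes, and responses are all non-data traffic, and the probes also reach nodes other than the requester and the home node.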
When a thread running on node 0 reads data from node 1, node 0's outgoing link carries only 
non-data traffic (a load of 752 MB/s), while its incoming link carries both data and non-data traffic 
(2.2 GB/s). There is also some non-data traffic on the coherent HyperTransport links that connect 
nodes other than nodes 0 and 1, because of the probes and the probe responses.
When a thread running on node 0 writes data to node 1, it sees as much data traffic on the incoming 
link as it does on the outgoing link (incoming and outgoing links each at 2.2 GB/s). In this synthetic 
test case, the thread writes successively to consecutive cache-line elements of a 64 MB array. This 
results in a steady-state condition of one cache line eviction (write-back) for each write access. 
Each write access from node 0 to node 1 therefore triggers a data read from node 1 and then a data 
write to node 1.
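
The symmetry between incoming and outgoing data traffic in the write case can be checked with a small counting sketch. It assumes a 64-byte cache line and exactly one write access per cache line, neither of which is stated in this section, and it ignores the warm-up phase before the cache fills and evictions begin.

```python
CACHE_LINE_BYTES = 64            # assumption: cache line size, not given in this section
ARRAY_BYTES = 64 * 1024 * 1024   # the 64 MB array from the synthetic test


def steady_state_write_traffic(array_bytes, line_bytes=CACHE_LINE_BYTES):
    """Count the data moved in each direction for one pass over the array.

    Steady-state model only: each write access fetches one line from the
    remote node (incoming data) and evicts one line back to it (outgoing data).
    """
    lines = array_bytes // line_bytes
    return {
        "write_accesses": lines,
        "incoming_bytes": lines * line_bytes,  # data reads from node 1
        "outgoing_bytes": lines * line_bytes,  # write-backs to node 1
    }


traffic = steady_state_write_traffic(ARRAY_BYTES)
print(traffic)
```

Under these assumptions the incoming and outgoing byte counts are identical, which is consistent with both links showing the same load in the write-only case.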
A.2.2 What Resources Are Used When Two Write-Only Threads Fire at Each Other (Crossfire) on an Idle System?
If the coherent HyperTransport links between node 0 and node 1 had infinite throughput capacity, 
the throughput on each of these links when the two write-only threads fire at each other would be 
expected to be twice that observed when a single write-only thread running on node 0 writes to 
node 1, i.e., 2 × 2.2 GB/s = 4.4 GB/s.
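
The arithmetic behind this expectation is a simple doubling, sketched below with the figures quoted in this appendix. The 4 GB/s limit is the theoretical link maximum given in the text that follows; the names are hypothetical and the model is deliberately naive.

```python
SINGLE_WRITER_LINK_GBPS = 2.2  # per-link load with one write-only thread (from the text)
LINK_MAX_GBPS = 4.0            # theoretical per-link maximum (from the text)


def crossfire_demand(single_writer_gbps, writers=2):
    """Naive per-link demand if each writer's traffic simply adds up."""
    return single_writer_gbps * writers


demand = crossfire_demand(SINGLE_WRITER_LINK_GBPS)
exceeds_link = demand > LINK_MAX_GBPS
print(f"expected demand per link: {demand:.1f} GB/s, exceeds link maximum: {exceeds_link}")
```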
The theoretical maximum bandwidth of each coherent HyperTransport link between node 0 and 
node 1 is 4 GB/s. Hence, we cannot expect the HyperTransport bandwidth to reach the 
