AMD ATHLON 64 User Manual

Appendix A
Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ ccNUMA Multiprocessor Systems
40555 Rev. 3.00 June 2006
A.5 Why Is the 0 Hop-1 Hop Case Slower Than the 0 Hop-0 Hop Case on a System under High Background Load (High Subscription) for Write-Only Threads?
When the 0 hop-0 hop scenario runs under a very high background load, the system sees the following traffic pattern, where each node receives memory requests from the threads listed:
• Node 0: 2 foreground threads.
• Node 1: 1 background thread.
• Node 3: 1 background thread.
• Node 2: 1 background thread.
In the 0 hop-1 hop case, the system sees the following traffic pattern:
• Node 0: 1 foreground thread.
• Node 1: 1 foreground thread and 1 background thread.
• Node 3: 1 background thread.
• Node 2: 1 background thread.
The 0 hop-1 hop case has a greater load imbalance than the 0 hop-0 hop case. Node 1 is hit hardest by this imbalance because it serves both a foreground thread and a background thread.
As before, each background thread requests data at a rate of 4 GB/s and each foreground thread requests data at a rate of 2.98 GB/s.
Measurements show a total memory access rate of only 4.78 GB/s on node 1, even though the two threads on that node together request 4 + 2.98 = 6.98 GB/s. Several buffer queues on node 1 are saturated and cannot absorb the data provided by the memory controller any faster.
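The two thread placements and the quoted request rates can be tallied per node to see where the imbalance lands. The following is a minimal Python sketch, assuming that each node's memory demand is simply the sum of the request rates of the threads whose memory it holds; the names `PLACEMENTS`, `requested_rates` and `busiest_node` are illustrative and do not come from this guide.

```python
# Illustrative tally of per-node memory demand for the two placements in A.5.
# Assumption: a node's demand is the plain sum of its threads' request rates.
BACKGROUND_GBPS = 4.0   # request rate of each background thread (from the text)
FOREGROUND_GBPS = 2.98  # request rate of each foreground thread (from the text)

# Hypothetical encoding of the traffic patterns:
# node -> (foreground threads, background threads)
PLACEMENTS = {
    "0 hop-0 hop": {0: (2, 0), 1: (0, 1), 3: (0, 1), 2: (0, 1)},
    "0 hop-1 hop": {0: (1, 0), 1: (1, 1), 3: (0, 1), 2: (0, 1)},
}


def requested_rates(placement):
    """Return the aggregate request rate (GB/s) directed at each node."""
    return {
        node: round(fg * FOREGROUND_GBPS + bg * BACKGROUND_GBPS, 2)
        for node, (fg, bg) in placement.items()
    }


def busiest_node(rates):
    """Return (node, rate) for the node with the highest requested rate."""
    node = max(rates, key=rates.get)
    return node, rates[node]


if __name__ == "__main__":
    for name, placement in PLACEMENTS.items():
        rates = requested_rates(placement)
        node, peak = busiest_node(rates)
        print(f"{name}: busiest node {node} is asked for {peak} GB/s")

    # Measured total on node 1 in the 0 hop-1 hop case, as quoted in the text.
    measured_node1 = 4.78
    requested_node1 = requested_rates(PLACEMENTS["0 hop-1 hop"])[1]
    print(f"node 1 unmet demand: {requested_node1 - measured_node1:.2f} GB/s")
```

Under this simple model, the busiest node in the 0 hop-0 hop case (node 0) is asked for 5.96 GB/s, while node 1 in the 0 hop-1 hop case is asked for 6.98 GB/s. Against the measured 4.78 GB/s, that leaves roughly 2.2 GB/s of requests on node 1 that its saturated queues cannot absorb. The model ignores link traffic and controller efficiency, so treat it as bookkeeping rather than a performance prediction.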
A.6 Support for a ccNUMA-Aware Scheduler for AMD64 ccNUMA Multiprocessor Systems
Developers should ensure that the OS is properly configured to support ccNUMA. All versions of Microsoft® Windows® XP for AMD64 and Windows Server for AMD64 support ccNUMA without any configuration changes. The 32-bit versions of Windows Server 2003, Enterprise Edition and Windows Server 2003, Datacenter Edition require the /PAE boot parameter to support ccNUMA. For 64-bit Linux®
