AMD ATHLON 64 User Manual
Table of contents
- Contents
- List of Figures
- Revision History
- Chapter 1 Introduction
- Chapter 2 Experimental Setup
- Chapter 3 Analysis and Recommendations
- 3.1 Scheduling Threads
- 3.2 Data Locality Considerations
- Figure 4. Read-Only Thread Running on Node 0, Accessing Data from 0, 1 and 2 Hops Away on an Idle System
- Figure 5. Write-Only Thread Running on Node 0, Accessing Data from 0, 1 and 2 Hops Away on an Idle System
- 3.2.1 Keeping Data Local by Virtue of First Touch
- 3.2.2 Data Placement Techniques to Alleviate Unnecessary Data Sharing Between Nodes Due to First Touch
- 3.3 Avoid Cache Line Sharing
- 3.4 Common Hop Myths Debunked
- 3.4.1 Myth: All Equal Hop Cases Take Equal Time.
- Figure 6. Crossfire 1 Hop-1 Hop Case vs No Crossfire 1 Hop-1 Hop Case on an Idle System
- Figure 7. Crossfire 1 Hop-1 Hop Case vs No Crossfire 1 Hop-1 Hop Case under a Low Background Load (High Subscription)
- Figure 8. Crossfire 1 Hop-1 Hop Case vs No Crossfire 1 Hop-1 Hop Case under a Very High Background Load (High Subscription)
- Figure 9. Crossfire 1 Hop-1 Hop Case vs No Crossfire 1 Hop-1 Hop Case under a Very High Background Load (Full Subscription)
- 3.4.2 Myth: Greater Hop Distance Always Means Slower Time.
- Figure 10. Both Read-Only Threads Running on Node 0 (Different Cores) on an Idle System
- Figure 11. Both Write-Only Threads Running on Node 0 (Different Cores) on an Idle System
- Figure 12. Both Write-Only Threads Running on Node 0 (Different Cores) under Low Background Load (High Subscription)
- Figure 13. Both Write-Only Threads Running on Node 0 (Different Cores) under Medium Background Load (High Subscription)
- Figure 14. Both Write-Only Threads Running on Node 0 (Different Cores) under High Background Load (High Subscription)
- Figure 15. Both Write-Only Threads Running on Node 0 (Different Cores) under Very High Background Load (High Subscription)
- 3.5 Locks
- 3.6 Parallelism Exposed by Compilers on AMD ccNUMA Multiprocessor Systems
- Chapter 4 Conclusions
- Appendix A
- A.1 Description of the Buffer Queues
- A.2 Why Is the Crossfire Case Slower Than the No Crossfire Case on an Idle System?
- A.2.1 What Resources Are Used When a Single Read-Only or Write-Only Thread Accesses Remote Data?
- A.2.2 What Resources Are Used When Two Write-Only Threads Fire at Each Other (Crossfire) on an Idle System?
- A.2.3 What Role Do Buffers Play in the Throughput Observed?
- A.2.4 What Resources Are Used When Write-Only Threads Do Not Fire at Each Other (No Crossfire) on an Idle System?
- A.3 Why Is the No Crossfire Case Slower Than the Crossfire Case on a System under a Very High Background Load (Full Subscription)?
- A.4 Why Is 0 Hop-0 Hop Case Slower Than the 0 Hop-1 Hop Case on an Idle System for Write-Only Threads?
- A.5 Why Is 0 Hop-1 Hop Case Slower Than 0 Hop-0 Hop Case on a System under High Background Load (High Subscription) for Write-Only Threads?
- A.6 Support for a ccNUMA-Aware Scheduler for AMD64 ccNUMA Multiprocessor Systems
- A.7 Tools and APIs for Thread/Process and Memory Placement (Affinity) for AMD64 ccNUMA Multiprocessor Systems
- A.8 Tools and APIs for Node Interleaving in Various OSs for AMD64 ccNUMA Multiprocessor Systems