Intel ARCHITECTURE IA-32 User Manual

Key Practices of System Bus Optimization .................................................................... 7-17
Key Practices of Memory Optimization .......................................................................... 7-17
Key Practices of Front-end Optimization ........................................................................ 7-18
Key Practices of Execution Resource Optimization ....................................................... 7-18
Generality and Performance Impact ............................................................................... 7-19
Choice of Synchronization Primitives ............................................................................. 7-20
Synchronization for Short Periods .................................................................................. 7-22
Optimization with Spin-Locks ......................................................................................... 7-25
Synchronization for Longer Periods ............................................................................... 7-26
Prevent Sharing of Modified Data and False-Sharing .................................................... 7-30
Placement of Shared Synchronization Variable ............................................................. 7-31
Conserve Bus Bandwidth ............................................................................................... 7-34
Understand the Bus and Cache Interactions .................................................................. 7-35
Avoid Excessive Software Prefetches ............................................................................ 7-36
Improve Effective Latency of Cache Misses ................................................................... 7-36
Use Full Write Transactions to Achieve Higher Data Rate ............................................. 7-37
Cache Blocking Technique ............................................................................................. 7-38
Shared-Memory Optimization ......................................................................................... 7-39
Minimize Sharing of Data between Physical Processors .......................................... 7-39
Batched Producer-Consumer Model ........................................................................ 7-40
Eliminate 64-KByte Aliased Data Accesses ................................................................... 7-42
Preventing Excessive Evictions in First-Level Data Cache ............................................ 7-43
Per-thread Stack Offset ............................................................................................ 7-44
Per-instance Stack Offset ......................................................................................... 7-46
Avoid Excessive Loop Unrolling ..................................................................................... 7-48
Optimization for Code Size ............................................................................................. 7-49
Use Legacy 32-Bit Instructions When The Data Size Is 32 Bits ....................................... 8-1
Use Extra Registers to Reduce Register Pressure .......................................................... 8-2
Use 64-Bit by 64-Bit Multiplies That Produce 128-Bit Results Only When Necessary..... 8-2