Memory accesses, Alignment, Memory accesses -29 – Intel ARCHITECTURE IA-32 User Manual

General Optimization Guidelines
Memory Accesses
This section discusses guidelines for optimizing code and data memory
accesses. The most important recommendations are:
• align data, paying attention to data layout and stack alignment
• enable store forwarding
• place code and data on separate pages
• enhance data locality
• use prefetching and cacheability control instructions
• enhance code locality and align branch targets
• take advantage of write combining
Alignment and forwarding problems are among the most common
sources of large delays on the Pentium 4 processor.
Alignment
Alignment of data concerns all kinds of variables:
• dynamically allocated variables
• members of a data structure
• global or local variables
• parameters passed on the stack
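As an illustration of the first three cases, the following is a minimal C11 sketch, not code from this manual. The names CACHE_LINE_SIZE, sample_record, global_table, alloc_cache_aligned and is_cache_aligned are hypothetical, and the sketch assumes a toolchain that provides the C11 alignas specifier and aligned_alloc (some C libraries do not). Parameters passed on the stack depend on the compiler's calling convention and stack alignment settings, so they are not shown.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 64-byte alignment is chosen only for illustration. */
#define CACHE_LINE_SIZE 64

/* Member of a data structure: place the array on a 64-byte boundary
 * inside the struct. The compiler inserts padding after `tag`. */
struct sample_record {
    char tag;
    alignas(CACHE_LINE_SIZE) double values[8];
};

/* Global variable aligned to a 64-byte boundary. */
static alignas(CACHE_LINE_SIZE) int32_t global_table[16];

/* Dynamically allocated data. C11 aligned_alloc expects the size to be
 * a multiple of the alignment, so the request is rounded up first.
 * Release the result with free(). */
static void *alloc_cache_aligned(size_t bytes)
{
    if (bytes == 0) {
        return NULL;
    }
    size_t rounded = (bytes + CACHE_LINE_SIZE - 1)
                     / CACHE_LINE_SIZE * CACHE_LINE_SIZE;
    return aligned_alloc(CACHE_LINE_SIZE, rounded);
}

/* Returns 1 if the pointer value is a multiple of CACHE_LINE_SIZE. */
static int is_cache_aligned(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE_SIZE) == 0;
}
```

A caller would typically check is_cache_aligned() in a debug build on buffers that feed hot loops. Note that aligning a struct member also increases the size of the struct, which trades memory footprint for alignment.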
Misaligned data access can incur significant performance penalties. This
is particularly true for cache line splits. The size of a cache line is
64 bytes in the Pentium 4, Intel Xeon, and Pentium M processors.
On the Pentium 4 processor, an access to data that is split across a
64-byte boundary leads to two memory accesses and requires several µops
to be executed (instead of one). Accesses that span 64-byte boundaries
are likely to incur a large performance penalty, since they are executed
near retirement and can incur stalls on the order of the depth of the
pipeline.
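Whether a given access spans a 64-byte boundary follows directly from its start address and size. The check below is a small C sketch under that definition. The helper name spans_cache_line is hypothetical, and the constant reflects the 64-byte line size stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cache line size stated in the text for the Pentium 4, Intel Xeon,
 * and Pentium M processors. */
#define CACHE_LINE_SIZE 64u

/* Returns 1 if an access of `size` bytes starting at `addr` touches
 * two different cache lines, 0 otherwise. */
static int spans_cache_line(uintptr_t addr, size_t size)
{
    if (size == 0) {
        return 0;
    }
    uintptr_t first_line = addr / CACHE_LINE_SIZE;
    uintptr_t last_line  = (addr + size - 1) / CACHE_LINE_SIZE;
    return first_line != last_line;
}
```

For example, an 8-byte access at offset 60 covers bytes 60 through 67 and therefore splits across two lines, while the same access at offset 56 covers bytes 56 through 63 and stays within one line.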