Performance comparisons of memory copy routines – Intel ARCHITECTURE IA-32 User Manual
IA-32 Intel® Architecture Optimization
Performance Comparisons of Memory Copy Routines
The throughput of a large-region memory copy routine depends on
several factors:
• the coding technique that implements the memory copy task
• the characteristics of the system bus (speed, peak bandwidth, and overhead
  in read/write transaction protocols)
• the microarchitecture of the processor
A comparison of the two coding techniques discussed above and two
un-optimized techniques is shown in Table 6-2.
        add esi, ecx
        add edi, ecx
        sub edx, ecx
        jnz main_loop
        sfence
    }
}
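
The loop tail above ends with an sfence after the copy loop, and Table 6-2 names "16 byte streaming stores" as part of the fastest technique. As a rough illustration of just that piece, the sketch below copies a buffer with 16-byte non-temporal stores and a trailing store fence, written with SSE2 intrinsics instead of inline assembly. It does not reproduce the 4KB-block hardware prefetch scheme. The function name copy_stream16 and the alignment and length requirements are assumptions made for this sketch, not part of the manual's listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE:  _mm_sfence */
#include <emmintrin.h>  /* SSE2: _mm_load_si128, _mm_stream_si128 */

/*
 * Hypothetical helper: copy len bytes from src to dst using 16-byte
 * streaming (non-temporal) stores.
 *
 * Assumptions for this sketch:
 *   - the target is an x86 processor with SSE2
 *   - dst and src are both 16-byte aligned
 *   - len is a multiple of 16
 */
void copy_stream16(void *dst, const void *src, size_t len)
{
    assert(len % 16 == 0);
    assert((uintptr_t)dst % 16 == 0);
    assert((uintptr_t)src % 16 == 0);

    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;

    for (size_t i = 0; i < len / 16; i++) {
        __m128i v = _mm_load_si128(s + i); /* aligned 16-byte load */
        _mm_stream_si128(d + i, v);        /* non-temporal 16-byte store */
    }

    /* Order the streaming stores before any stores that follow,
       mirroring the sfence that closes the assembly loop. */
    _mm_sfence();
}
```

Streaming stores write around the cache hierarchy, which is why they suit large regions that will not be read again soon; for small copies that fit in cache, ordinary stores are usually the better choice.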
Table 6-2  Relative Performance of Memory Copy Routines

Processor, CPUID Signature and FSB Speed (MHz)             Byte         DWORD        SW prefetch +       4KB-Block HW prefetch
                                                           Sequential   Sequential   8 byte streaming    + 16 byte streaming
                                                                                     store               stores
---------------------------------------------------------  -----------  -----------  ------------------  ---------------------
Pentium M processor, 0x6Dn, 400                            1.3X         1.2X         1.6X                2.5X
Intel Core Solo and Intel Core Duo processors, 0x6En, 667  3.3X         3.5X         2.1X                4.7X
Pentium D processor, 0xF4n, 800                            3.4X         3.3X         4.9X                5.7X
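
The first two columns of Table 6-2 refer to the un-optimized techniques. The manual's own versions of these routines are not shown in this section, so the following is only a minimal sketch of what a byte-sequential and a DWORD-sequential copy might look like in C. The function names copy_bytes and copy_dwords, and the alignment and length preconditions on the DWORD version, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-sequential copy: one 1-byte load and one 1-byte store per iteration. */
void copy_bytes(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < len; i++)
        d[i] = s[i];
}

/*
 * DWORD-sequential copy: one 4-byte load and one 4-byte store per iteration.
 *
 * Assumptions for this sketch: len is a multiple of 4, both pointers are
 * 4-byte aligned, and the buffers hold 32-bit data.
 */
void copy_dwords(void *dst, const void *src, size_t len)
{
    assert(len % 4 == 0);
    assert((uintptr_t)dst % 4 == 0);
    assert((uintptr_t)src % 4 == 0);

    uint32_t *d = dst;
    const uint32_t *s = src;

    for (size_t i = 0; i < len / 4; i++)
        d[i] = s[i];
}
```

When timing loops like these, keep in mind that an optimizing compiler may vectorize them or replace them with a library copy call, so the generated code should be checked before comparing results against the table.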