Intel ARCHITECTURE IA-32 User Manual, page 281: Example 5-9 and Figure 5-3, Horizontal Add Using movhlps/movlhps

Optimizing for SIMD Floating-point Applications (Chapter 5, page 5-19)
Figure 5-3  Horizontal Add Using movhlps/movlhps
Example 5-9  Horizontal Add Using movhlps/movlhps
void horiz_add(Vertex_soa *in, float *out) {
    __asm {
        mov     ecx, in             // load structure addresses
        mov     edx, out
        movaps  xmm0, [ecx]         // load A1 A2 A3 A4 => xmm0
        movaps  xmm1, [ecx+16]      // load B1 B2 B3 B4 => xmm1
        movaps  xmm2, [ecx+32]      // load C1 C2 C3 C4 => xmm2
        movaps  xmm3, [ecx+48]      // load D1 D2 D3 D4 => xmm3
(continued)
[Figure 5-3: data flow of the horizontal add, top to bottom]
Inputs:            xmm0 = A1 A2 A3 A4    xmm1 = B1 B2 B3 B4    xmm2 = C1 C2 C3 C4    xmm3 = D1 D2 D3 D4
MOVLHPS / MOVHLPS: A1 A2 B1 B2 and A3 A4 B3 B4;  C1 C2 D1 D2 and C3 C4 D3 D4
ADDPS:             A1+A3 A2+A4 B1+B3 B2+B4;  C1+C3 C2+C4 D1+D3 D2+D4
SHUFPS:            A1+A3 B1+B3 C1+C3 D1+D3;  A2+A4 B2+B4 C2+C4 D2+D4
ADDPS:             A1+A2+A3+A4  B1+B2+B3+B4  C1+C2+C3+C4  D1+D2+D3+D4
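The data flow of Figure 5-3 can also be sketched with SSE compiler intrinsics instead of inline assembly. The following is a minimal illustration, not the manual's listing: the function name horiz_add_sse is hypothetical, the input is assumed to be a plain array of 16 floats laid out as A1..A4, B1..B4, C1..C4, D1..D4 (the Vertex_soa type is not defined in this excerpt), and unaligned loads and stores are used so that the caller need not guarantee the 16-byte alignment that movaps requires.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, _mm_shuffle_ps, ... */

/* Hypothetical helper: sums each of four 4-float rows (A, B, C, D)
 * and writes the four sums to out[0..3].
 * Assumption: in points to 16 contiguous floats, A1..A4 B1..B4 C1..C4 D1..D4. */
void horiz_add_sse(const float *in, float *out)
{
    __m128 a = _mm_loadu_ps(in);        /* A1 A2 A3 A4 */
    __m128 b = _mm_loadu_ps(in + 4);    /* B1 B2 B3 B4 */
    __m128 c = _mm_loadu_ps(in + 8);    /* C1 C2 C3 C4 */
    __m128 d = _mm_loadu_ps(in + 12);   /* D1 D2 D3 D4 */

    /* MOVLHPS / MOVHLPS stage: pair low halves and high halves. */
    __m128 ab_lo = _mm_movelh_ps(a, b); /* A1 A2 B1 B2 */
    __m128 ab_hi = _mm_movehl_ps(b, a); /* A3 A4 B3 B4 */
    __m128 cd_lo = _mm_movelh_ps(c, d); /* C1 C2 D1 D2 */
    __m128 cd_hi = _mm_movehl_ps(d, c); /* C3 C4 D3 D4 */

    /* First ADDPS stage: two partial sums per row. */
    __m128 ab = _mm_add_ps(ab_lo, ab_hi); /* A1+A3 A2+A4 B1+B3 B2+B4 */
    __m128 cd = _mm_add_ps(cd_lo, cd_hi); /* C1+C3 C2+C4 D1+D3 D2+D4 */

    /* SHUFPS stage: gather even lanes and odd lanes of the partial sums. */
    __m128 even = _mm_shuffle_ps(ab, cd, _MM_SHUFFLE(2, 0, 2, 0)); /* A1+A3 B1+B3 C1+C3 D1+D3 */
    __m128 odd  = _mm_shuffle_ps(ab, cd, _MM_SHUFFLE(3, 1, 3, 1)); /* A2+A4 B2+B4 C2+C4 D2+D4 */

    /* Final ADDPS stage: one full row sum per lane. */
    _mm_storeu_ps(out, _mm_add_ps(even, odd));
}
```

Note the operand order of _mm_movehl_ps: its first argument supplies the upper half of the result and its second argument supplies the lower half, which mirrors movhlps writing into its destination register. If the data is known to be 16-byte aligned, _mm_load_ps and _mm_store_ps could replace the unaligned variants.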