Intel ARCHITECTURE IA-32 User Manual
Page 527

IA-32 Instruction Latency and Throughput (Appendix C, page C-13)
Table C-4  Streaming SIMD Extension Single-precision Floating-point Instructions (continued)

Instruction             Latency¹       Throughput  Execution Unit²
----------------------  -------------  ----------  ------------------
MOVLHPS³ xmm, xmm       4, 4           2, 2        MMX_SHFT
MOVMSKPS r32, xmm       6, 6           2, 2        FP_MISC
MOVSS xmm, xmm          4, 4           2, 2        MMX_SHFT
MOVUPS xmm, xmm         6, 6           1, 1        FP_MOVE
MULPS xmm, xmm          7, 6, 4+1      2, 2, 2     FP_MUL
MULSS xmm, xmm          7, 6           2, 2        FP_MUL
ORPS³ xmm, xmm          4, 4, 2        2, 2, 2     MMX_ALU
RCPPS³ xmm, xmm         6, 6, 2        4, 4, 2     MMX_MISC
RCPSS³ xmm, xmm         6, 6, 1        2, 2, 1     MMX_MISC, MMX_SHFT
RSQRTPS³ xmm, xmm       6, 6, 2        4, 4, 2     MMX_MISC
RSQRTSS³ xmm, xmm       6, 6           4, 4, 1     MMX_MISC, MMX_SHFT
SHUFPS³ xmm, xmm, imm8  6, 6, 2        2, 2, 2     MMX_SHFT
SQRTPS xmm, xmm         40, 39, 29+28  40, 39, 58  FP_DIV
SQRTSS xmm, xmm         32, 23, 30     32, 23, 29  FP_DIV
SUBPS xmm, xmm          5, 4, 4        2, 2, 2     FP_ADD
SUBSS xmm, xmm          5, 4, 3        2, 2, 1     FP_ADD
UCOMISS xmm, xmm        7, 6, 1        2, 2, 1     FP_ADD, FP_MISC
UNPCKHPS³ xmm, xmm      6, 6, 3        2, 2, 2     MMX_SHFT
UNPCKLPS³ xmm, xmm      4, 4, 3        2, 2, 2     MMX_SHFT
XORPS³ xmm, xmm         4, 4, 2        2, 2, 2     MMX_ALU
FXRSTOR                 150
FXSAVE                  100
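As a rough illustration of how the latency and throughput figures in Table C-4 might be used, the sketch below estimates cycle counts for two simple cases: a chain of dependent instructions, and a run of independent copies of one instruction. It assumes that throughput means the number of clock cycles before the same instruction can issue again, and it uses only the first value listed for each instruction. The names `TABLE_C4`, `dependent_chain_cycles`, and `independent_stream_cycles` are hypothetical, and the model ignores decode limits, port sharing between different instructions, and memory effects, so the results are estimates only.

```python
# Illustrative subset of Table C-4: name -> (latency, throughput) in clock cycles.
# Assumption: only the first value listed for each instruction is used.
TABLE_C4 = {
    "MULPS": (7, 2),
    "SUBPS": (5, 2),
    "SQRTPS": (40, 40),
    "XORPS": (4, 2),
}


def dependent_chain_cycles(instructions):
    """Estimate cycles for a chain where each instruction needs the previous result.

    With a strict dependency, nothing overlaps, so the latencies simply add up.
    """
    return sum(TABLE_C4[name][0] for name in instructions)


def independent_stream_cycles(name, count):
    """Estimate cycles for `count` independent copies of one instruction.

    Assumption: a new copy can issue every `throughput` cycles, and the last
    copy still needs its full latency to produce a result.
    """
    latency, throughput = TABLE_C4[name]
    return (count - 1) * throughput + latency


if __name__ == "__main__":
    print(dependent_chain_cycles(["MULPS", "SUBPS", "MULPS"]))
    print(independent_stream_cycles("MULPS", 4))
```

Under this simplified model, four independent MULPS instructions finish sooner than a three-instruction dependent chain, which is the usual reason for interleaving independent operations between long-latency instructions such as SQRTPS.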