Use of the inc and dec instructions, Use of the shift and rotate instructions, Flag register accesses – Intel ARCHITECTURE IA-32 User Manual
General Optimization Guidelines
Use of the inc and dec Instructions
The inc and dec instructions modify only a subset of the bits in the flag
register. This creates a dependence on all previous writes of the flag
register. This is especially problematic when these instructions are on
the critical path because they are used to change an address for a load on
which many other instructions depend.
Assembly/Compiler Coding Rule 42. (M impact, H generality) Replace inc
and dec instructions with add or sub instructions. The add and sub
instructions overwrite all flags, whereas inc and dec do not and
therefore create false dependencies on earlier instructions that set the
flags.
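The difference behind Rule 42 can be sketched with a small flag model in C. This is an illustration only, not processor-accurate code: the Flags struct and the functions model_add1 and model_inc are hypothetical names, and only two flags are tracked. It relies on the documented behavior that inc and dec leave the carry flag (CF) unchanged, while add and sub write it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-flag model of the flag register: only CF and ZF. */
typedef struct {
    bool cf; /* carry flag */
    bool zf; /* zero flag  */
} Flags;

/* Models `add reg, 1`: every tracked flag is written, so the outgoing
   flag state does not depend on the incoming one. */
static uint32_t model_add1(uint32_t x, Flags *f) {
    uint32_t r = x + 1u;
    f->cf = (r < x);   /* carry out of bit 31 */
    f->zf = (r == 0u);
    return r;
}

/* Models `inc reg`: ZF is written but CF is left untouched, so the
   outgoing flag state still depends on whichever earlier instruction
   last wrote CF. */
static uint32_t model_inc(uint32_t x, Flags *f) {
    uint32_t r = x + 1u;
    f->zf = (r == 0u); /* CF is deliberately not written */
    return r;
}
```

In this model, the CF value seen after model_inc is inherited from an earlier writer, which is the dependence the rule aims to remove; after model_add1 the flags are determined by the add alone.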
Use of the shift and rotate Instructions
The shift and rotate instructions have a longer latency on Pentium 4
processors whose CPUID signature corresponds to family 15 and model
encoding 0, 1, or 2. On these processors, for left shifts of three or
less, a sequence of add instructions has a shorter latency than the
shift. Fixed and variable shifts have the same latency.
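The arithmetic behind replacing a short left shift with adds can be sketched in C as follows. The function names are hypothetical, and a compiler is free to choose its own instruction for each line, so this shows only the equivalence (each doubling corresponds to one add of a register to itself), not guaranteed code generation.

```c
#include <stdint.h>

/* x << 1 expressed as one add (x + x). */
static uint32_t shl1_by_adds(uint32_t x) {
    return x + x;
}

/* x << 2 expressed as two dependent adds. */
static uint32_t shl2_by_adds(uint32_t x) {
    uint32_t t = x + x; /* 2x */
    return t + t;       /* 4x */
}

/* x << 3 expressed as three dependent adds. */
static uint32_t shl3_by_adds(uint32_t x) {
    uint32_t t = x + x; /* 2x */
    t = t + t;          /* 4x */
    return t + t;       /* 8x */
}
```

Unsigned addition wraps modulo 2^32, so the results match the corresponding shifts even when high bits are shifted out.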
The rotate by immediate and rotate by register instructions are more
expensive than a shift. The rotate by 1 instruction has the same
latency as a shift.
Assembly/Compiler Coding Rule 43. (ML impact, L generality) Avoid
rotate by register or rotate by immediate instructions. If possible,
replace them with a rotate by 1 instruction.
Flag Register Accesses
A ‘partial flag register stall’ happens when an instruction modifies a part
of the flag register and the following instruction is dependent on the
outcome of the flags. This happens most often with shift instructions
(sar, sal, shr, shl). The flags are not modified when the shift count is
zero, but the shift count is usually known only at execution time.
Therefore, the front-end stalls until the instruction is retired. Other
instructions that can modify some part of the flag register include