Table 14-26 – ARM Cortex R4F User Manual

Cycle Timings and Interlock Behavior
ARM DDI 0363E
Copyright © 2009 ARM Limited. All rights reserved.
ID013010
Non-Confidential, Unrestricted Access
14.21 Floating-point single-precision data processing instructions
This section describes the cycle timing behavior for all single-precision VFP CDP instructions. These include arithmetic instructions such as VMUL.F32, data and immediate move instructions such as VMOV.F32, VABS.F32, VNEG.F32, and VMOV, and comparison and conversion instructions.

Table 14-26 shows the cycle timing behavior of the floating-point single-precision data processing instructions.
Table 14-26 Floating-point single-precision data processing instructions cycle timing behavior

| Example instruction | Cycles | Early Reg | Result latency |
|---------------------|--------|-----------|----------------|
| VMLA.F32 [a]        | 1 [b]  | ,         | 5 [c]          |
| VADD.F32 [d]        | 1      | ,         | 2              |
| VDIV.F32            | 2      | ,         | 16             |
| VSQRT.F32           | 2      |           | 16             |
| VMOV.F32            | 1      | -         | 1              |
| VMOV.F32 [e]        | 1      | -         | 1              |
| VCMP.F32 [f]        | 1      | ,         | -              |
| VCMPE.F32           | 1      |           | -              |
| VCVT.F32.U32 [g]    | 1      |           | 2              |
| VCVT.F32.U32 [h]    | 1      |           | 2              |
| VCVTR.U32.F32 [i]   | 1      |           | 2              |
| VCVT.U32.F32 [j]    | 1      |           | 2              |
| VCVT.F64.F32        | 3      |           | 5              |

a. Also VMLS.F32, VNMLS.F32, and VNMLA.F32.
b. VMLA.F32 completes out-of-order, and can take an extra cycle (two in total) if an add instruction (VADD) or certain dual-issued instruction pairs are in the iss-stage when the instruction completes.
c. Except when the instruction dependent on the result is another VMLA.F32 instruction, and the dependent operand is the accumulate operand. In this case, the result latency is reduced to 3 cycles.
d. Also VSUB.F32, VMUL.F32, and VNMUL.F32.
e. Also VABS.F32 and VNEG.F32.
f. Also VCMPE.F32.
g. Also VCVT.F32.S32.
h. Also VCVT.F32.U16, VCVT.F32.S32, and VCVT.F32.S16.
i. Also VCVT.U32.F32, VCVTR.S32.F32, and VCVT.S32.F32.
j. Also VCVT.U16.F32, VCVT.S32.F32, and VCVT.S16.F32.
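
As an illustration of how the Cycles and Result latency columns of Table 14-26 might be used, the following Python sketch encodes them as a lookup table and applies the footnote c exception for back-to-back VMLA.F32 accumulation. The names TIMINGS, ALIASES, result_latency, and dependent_chain_latency are hypothetical and not part of any ARM tool. The chain estimate is a deliberate simplification: it assumes each instruction waits for the full result latency of the one before it, and it ignores Early Reg requirements, dual issue, and the possible extra cycle described in footnote b. It also assumes that footnote c applies only when the dependent instruction is literally VMLA.F32, because the table does not say whether VMLS.F32 and the other footnote a variants qualify.

```python
# (cycles, result latency) for each row of Table 14-26.
# None stands for a "-" entry (no result latency is listed for compares).
TIMINGS = {
    "VMLA.F32": (1, 5),
    "VADD.F32": (1, 2),
    "VDIV.F32": (2, 16),
    "VSQRT.F32": (2, 16),
    "VMOV.F32": (1, 1),
    "VCMP.F32": (1, None),
    "VCVT.F32.U32": (1, 2),
    "VCVTR.U32.F32": (1, 2),
    "VCVT.U32.F32": (1, 2),
    "VCVT.F64.F32": (3, 5),
}

# Mnemonics that the "Also ..." footnotes assign to another row's timing.
ALIASES = {
    "VMLS.F32": "VMLA.F32", "VNMLS.F32": "VMLA.F32", "VNMLA.F32": "VMLA.F32",
    "VSUB.F32": "VADD.F32", "VMUL.F32": "VADD.F32", "VNMUL.F32": "VADD.F32",
    "VABS.F32": "VMOV.F32", "VNEG.F32": "VMOV.F32",
    "VCMPE.F32": "VCMP.F32",
    "VCVT.F32.S32": "VCVT.F32.U32", "VCVT.F32.U16": "VCVT.F32.U32",
    "VCVT.F32.S16": "VCVT.F32.U32",
    "VCVTR.S32.F32": "VCVTR.U32.F32",
    "VCVT.S32.F32": "VCVT.U32.F32", "VCVT.U16.F32": "VCVT.U32.F32",
    "VCVT.S16.F32": "VCVT.U32.F32",
}


def result_latency(producer, consumer=None, into_accumulate=False):
    """Return the tabulated result latency of `producer`, or None for "-".

    Footnote c: when the dependent instruction is another VMLA.F32 and the
    dependency is on its accumulate operand, the latency is 3 instead of 5.
    Assumption: only a consumer spelled exactly "VMLA.F32" qualifies.
    """
    row = ALIASES.get(producer, producer)
    _cycles, latency = TIMINGS[row]
    if row == "VMLA.F32" and consumer == "VMLA.F32" and into_accumulate:
        return 3
    return latency


def dependent_chain_latency(mnemonics, into_accumulate=False):
    """Sum result latencies along a chain where each instruction consumes
    the result of the previous one (simplified model, see assumptions)."""
    total = 0
    for producer, consumer in zip(mnemonics, mnemonics[1:]):
        latency = result_latency(producer, consumer, into_accumulate)
        if latency is None:
            raise ValueError(f"{producer} has no result latency in the table")
        total += latency
    return total
```

Under these assumptions, a chain of four VMLA.F32 instructions accumulating into the same register has three dependent links of 3 cycles each, so dependent_chain_latency(["VMLA.F32"] * 4, into_accumulate=True) evaluates to 9. The same chain with the dependency on a non-accumulate operand uses the 5-cycle latency for each link and evaluates to 15.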