ALTERA_FP_MATRIX_MULT Functional Description – Altera Floating-Point User Manual

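Table 3-1: ALTERA_FP_MATRIX_MULT Resource Utilization and Performance for the Arria 10 and Stratix V Devices

Family                       Data    Matrix A  Matrix B  Vector  Memory  ALMs   M20Ks  DSP     FMax   Latency
                             Format  Size      Size      Size    Blocks                Blocks  (MHz)  (cycles) (1)
---------------------------  ------  --------  --------  ------  ------  -----  -----  ------  -----  ------------
Arria 10 (10AX066H2F34I2LP)  Single  8x8       8x8       8       2       979    12     8       409    131
Arria 10 (10AX066H2F34I2LP)  Single  16x16     16x16     8       2       1052   12     8       408    595
Arria 10 (10AX066H2F34I2LP)  Single  32x32     32x32     16      4       1579   25     16      373    2155
Arria 10 (10AX066H2F34I2LP)  Single  64x64     64x64     32      8       2677   49     32      379    8339
Stratix V (5SGXEA7K2F40C2)   Single  8x8       8x8       8       2       2637   14     8       404    125
Stratix V (5SGXEA7K2F40C2)   Single  16x16     16x16     8       2       2868   15     8       367    588
Stratix V (5SGXEA7K2F40C2)   Single  32x32     32x32     16      4       5427   27     16      356    2146
Stratix V (5SGXEA7K2F40C2)   Single  64x64     64x64     32      8       10311  51     32      348    8328
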
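The latency figures in Table 3-1 are given in clock cycles. As a rough aid for comparing configurations, the sketch below converts them to microseconds by dividing by FMax. It assumes the design is actually clocked at the reported FMax, which a real system may not achieve, and it inherits the table's caveat that latency excludes the time taken to load the input matrices. The names ROWS and latency_us are illustrative only.

```python
# (family, matrix size, FMax in MHz, latency in cycles), copied from Table 3-1
ROWS = [
    ("Arria 10", "8x8", 409, 131),
    ("Arria 10", "16x16", 408, 595),
    ("Arria 10", "32x32", 373, 2155),
    ("Arria 10", "64x64", 379, 8339),
    ("Stratix V", "8x8", 404, 125),
    ("Stratix V", "16x16", 367, 588),
    ("Stratix V", "32x32", 356, 2146),
    ("Stratix V", "64x64", 348, 8328),
]


def latency_us(cycles, fmax_mhz):
    """Cycles divided by a clock rate in MHz gives a time in microseconds."""
    return cycles / fmax_mhz


if __name__ == "__main__":
    for family, size, fmax, cycles in ROWS:
        # Assumes the core runs at the reported FMax.
        print(f"{family:10s} {size:6s} {latency_us(cycles, fmax):8.2f} us")
```

For example, the Arria 10 8x8 row works out to 131 / 409, or roughly 0.32 microseconds.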
ALTERA_FP_MATRIX_MULT Functional Description
The matrix multiplier in the ALTERA_FP_MATRIX_MULT IP core multiplies matrix A and matrix B to
generate the output matrix C.
The following figure shows the equation:
Figure 3-1: ALTERA_FP_MATRIX_MULT Equation
C = A · B
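As a software reference for the equation, the following minimal sketch computes each element of C as the dot product of a row of A and a column of B, accumulating vector_size elements at a time. The chunked accumulation is only an assumption about how the Vector Size parameter might partition the work; this section does not describe the core's internal ordering. Python floats are double precision, so results can differ in the low bits from the core's single-precision output. The function names are hypothetical.

```python
def dot_in_chunks(row, col, vector_size):
    """Accumulate a dot product vector_size elements at a time."""
    assert len(row) == len(col)
    total = 0.0
    for start in range(0, len(row), vector_size):
        pairs = zip(row[start:start + vector_size], col[start:start + vector_size])
        total += sum(a * b for a, b in pairs)
    return total


def matrix_mult(a, b, vector_size=8):
    """Reference model of C = A . B for matrices stored as lists of rows."""
    inner = len(b)
    assert all(len(r) == inner for r in a), "columns of A must match rows of B"
    # Pre-extract the columns of B so each output element is a row/column dot product.
    b_cols = [[b[k][j] for k in range(inner)] for j in range(len(b[0]))]
    return [[dot_in_chunks(row, col, vector_size) for col in b_cols] for row in a]


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    print(matrix_mult(a, b, vector_size=1))
```

A model like this can serve as a golden reference when checking simulation output, provided the comparison tolerates single-precision rounding.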
Matrices A and B can be loaded when the ready signal on their respective interfaces is asserted. Once
the input matrices are loaded, the core starts computing the output. The valid signal on the output
interface is asserted to indicate valid output data. Input data may be loaded whenever the ready
signal is asserted, even while previously loaded data is still being computed.
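The load, compute, and output flow described above can be illustrated with a toy transaction-level model. This is a sketch under stated assumptions, not a description of the core's interfaces: it moves whole matrices rather than streaming elements, it is not cycle accurate, and the class name, method names, and the depth parameter (how many matrices each input can hold at once) are all hypothetical.

```python
from collections import deque


def matmul(a, b):
    """Plain reference product C = A . B for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class MatrixMultModel:
    """Toy model of the ready/valid behaviour; whole matrices, no clock."""

    def __init__(self, depth=2):
        self.depth = depth      # assumed capacity of each input side
        self.queue_a = deque()
        self.queue_b = deque()
        self.outputs = deque()

    @property
    def a_ready(self):
        return len(self.queue_a) < self.depth

    @property
    def b_ready(self):
        return len(self.queue_b) < self.depth

    @property
    def out_valid(self):
        return len(self.outputs) > 0

    def load_a(self, matrix):
        """Accept matrix A only while the A-side ready flag is set."""
        if not self.a_ready:
            return False
        self.queue_a.append(matrix)
        return True

    def load_b(self, matrix):
        """Accept matrix B only while the B-side ready flag is set."""
        if not self.b_ready:
            return False
        self.queue_b.append(matrix)
        return True

    def compute(self):
        """Compute the oldest loaded A/B pair once both matrices are present."""
        if self.queue_a and self.queue_b:
            self.outputs.append(matmul(self.queue_a.popleft(), self.queue_b.popleft()))

    def read(self):
        """Return the next result if the output is valid, else None."""
        return self.outputs.popleft() if self.out_valid else None
```

In this model a second matrix A can be loaded before the first product has been computed, mirroring the statement that new input may be loaded while earlier data is still being processed.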
(1) Latency is the time taken to compute a dot product and does not include the time taken to load the
input matrices.
UG-01058
2014.12.19
Altera Corporation
ALTERA_FP_MATRIX_MULT IP Core