Altera Integer Arithmetic IP User Manual

Systolic Delay Register
In a systolic architecture, the input data is fed into a cascade of registers that acts as a data buffer.
Each register delivers an input sample to a multiplier, where it is multiplied by the respective coefficient.
The chain adder combines the multiplier result with the previously registered result from the chainin[]
input port and registers the sum, so the final result builds up gradually along the chain. Each multiply-add
element must be delayed by a single cycle relative to the preceding element so that the results synchronize
appropriately when added together. Each successive delay is used to address both the coefficient memory and
the data buffer of its respective multiply-add element: a single delay for the second multiply-add element,
two delays for the third multiply-add element, and so on.
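The delay scheme described above can be illustrated with a small cycle-by-cycle behavioral model. This is only a sketch of a textbook systolic FIR structure, not the ALTMULT_ADD implementation. The function name systolic_fir and the register placement are assumptions made for illustration: element k sees its input after 2*k clock cycles (k from the data buffer plus k systolic delays), and each element has one registered chain-adder output.

```python
def systolic_fir(samples, coeffs):
    """Behavioral model of a systolic multiply-add chain (illustrative only).

    Assumed structure: element k reads the input delayed by 2*k cycles and
    adds its product to the registered sum of element k-1.
    """
    n = len(coeffs)
    # One input delay line per element; depth 2*k for element k (assumption).
    delay_lines = [[0] * (2 * k) for k in range(n)]
    # One chain-adder register per element, cleared at reset.
    sum_regs = [0] * n
    out = []

    # Feed n extra zero samples so the last results leave the pipeline.
    for x in list(samples) + [0] * n:
        # Operand of each multiplier this cycle: oldest entry of its delay line.
        operands = [line[-1] if line else x for line in delay_lines]
        # Combinational next-state values, computed from current registers.
        next_sums = [
            (sum_regs[k - 1] if k > 0 else 0) + coeffs[k] * operands[k]
            for k in range(n)
        ]
        # The value visible at the end of the chain during this cycle.
        out.append(sum_regs[-1])
        # Clock edge: all registers update together.
        sum_regs = next_sums
        delay_lines = [[x] + line[:-1] if line else line for line in delay_lines]

    # The first n outputs are pipeline latency, not filter results.
    return out[n:]


print(systolic_fir([1, 2, 3, 4], [10, 20, 30]))  # [10, 40, 100, 160]
```

In this model the result for a given input sample appears N cycles after that sample enters the chain, where N is the number of multiply-add elements. The extra delay on each element's input is what keeps its product aligned with the partial sum arriving from the previous element.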
Figure 9-7: Systolic Registers
[Figure not reproduced. Labels: input x(t); coefficients c(0), c(1), c(2), ..., c(N-1); output y(t); systolic registers S^-1.]
x(t) represents a continuous stream of input samples, and y(t) represents the summation of a set of input
samples taken at successive points in time, each multiplied by its respective coefficient. Both the input
and output results flow from left to right. c(0) to c(N-1) denote the coefficients. The systolic delay
registers are denoted by S^-1, where the -1 represents a single clock delay. Systolic delay registers are
added at the inputs and outputs for pipelining in a way that ensures the results from the multiplier
operand and the accumulated sums stay in sync. This processing element is replicated to form a circuit
that computes the filtering function. This function is expressed in the following equation.
\[ y(t) = \sum_{i=0}^{N-1} B(i)\, A(t-i) \]
N represents the number of cycles of data that have entered the accumulator, y(t) represents the output
at time t, A(t) represents the input at time t, and B(i) are the coefficients. The t and i in the equation
correspond to particular instants in time: to compute the output sample y(t) at time t, a group of input
samples at N different points in time, A(t), A(t-1), A(t-2), … A(t-N+1), is required. The group of N
input samples is multiplied by the N coefficients and summed together to form the final result y(t).
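As a worked illustration of the sum described above, the following sketch evaluates the filtering function directly, assuming it has the standard FIR form y(t) = sum of B(i) * A(t-i) for i = 0 to N-1. The function name fir_direct and the treatment of samples before t = 0 as zero are assumptions made for this example.

```python
def fir_direct(samples, coeffs):
    """Evaluate y(t) = sum_{i=0}^{N-1} B(i) * A(t - i) for every t.

    samples holds A(0), A(1), ...; coeffs holds B(0) ... B(N-1).
    Samples before t = 0 are treated as zero (assumption).
    """
    n_taps = len(coeffs)
    outputs = []
    for t in range(len(samples)):
        acc = 0
        for i in range(n_taps):
            if t - i >= 0:  # skip samples that have not arrived yet
                acc += coeffs[i] * samples[t - i]
        outputs.append(acc)
    return outputs


A = [1, 2, 3, 4]
B = [10, 20, 30]
print(fir_direct(A, B))  # [10, 40, 100, 160]
```

For example, with N = 3 the output at t = 2 is B(0)*A(2) + B(1)*A(1) + B(2)*A(0) = 10*3 + 20*2 + 30*1 = 100.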
The systolic register architecture is available only for sum-of-2 and sum-of-4 modes.
The following figure shows the systolic delay register implementation of 2 multipliers.
UG-01063, 2014.12.19, ALTMULT_ADD (Multiply-Adder), Altera Corporation