FC100 - Floating Point Fast Fourier Transform, Radix-32 vs Radix-2 – Sundance FC100 v.2.3 User Manual
FC100 - Floating Point Fast Fourier Transform
v2.3
Fast Fourier Transform product manual
October 2005
www.sundance.com
Memory latency
The FFT core generates the addresses for the twiddle factors, the data input and the data
output. The memory latency is the number of clock cycles between the address becoming
valid on the core address bus and the corresponding twiddle factors or data becoming
available at the input of the FFT core. This latency can be up to 15 clock cycles. The FFT
core expects the latency to be the same for the twiddle factors and the data input, and to
remain constant throughout the transform computation. The core calculates this latency
automatically by monitoring the tw_din_valid signal, which the user drives high a few
clock cycles after tw_din_addr_valid goes high.
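The latency measurement described above can be illustrated with a small software model. This is only a sketch, not the core's actual logic: it assumes each signal is sampled once per clock cycle into a list of 0/1 values, and that the latency is the number of cycles between the first high sample of tw_din_addr_valid and the first high sample of tw_din_valid. The function name measure_latency and the sample traces are hypothetical.

```python
MAX_LATENCY = 15  # upper bound on memory latency stated in this manual


def measure_latency(addr_valid, data_valid, max_latency=MAX_LATENCY):
    """Count clock cycles from addr_valid going high to data_valid going high.

    Both arguments are lists of 0/1 samples, one per clock cycle
    (an assumption made for this illustration).
    """
    start = addr_valid.index(1)        # first cycle the address is valid
    end = data_valid.index(1, start)   # first cycle the data is valid
    latency = end - start
    if latency > max_latency:
        raise ValueError(f"latency of {latency} cycles exceeds {max_latency}")
    return latency


# Hypothetical traces: the memory answers 4 cycles after the address is valid.
tw_din_addr_valid = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
tw_din_valid      = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

latency = measure_latency(tw_din_addr_valid, tw_din_valid)
print(latency)  # 4
```

Because the core expects the same latency for twiddle factors and data input, a test bench could run a check of this kind on both paths and compare the two results.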
Radix-32 vs Radix-2
Sundance’s radix-32 butterfly architecture allows the core to be connected to much less
memory, for the same processing performance, than designs with radix-2 butterflies
implemented in parallel. The following table shows how much memory is required to
perform an FFT in each configuration.
FFT length    Radix-32 memory required    Radix-2 memory required
              (in Mbytes)                 (in Mbytes)
256           0.02                        0.08
512           0.04                        0.18
1024          0.08                        0.39
2048          0.23                        0.86
4096          0.47                        1.88
8192          0.94                        4.06
16384         1.88                        8.75
32768         3.75                        18.75
65536         10.00                       40.00
131072        20.00                       85.00
262144        40.00                       180.00
524288        80.00                       380.00
1048576       160.00                      800.00
Table 5: Radix-32 vs Radix-2 memory usage
Data throughput = maximum data throughput, as shown in Table 7.
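The figures in Table 5 can be compared directly by dividing the radix-2 requirement by the radix-32 requirement for each FFT length. The short script below is an illustrative calculation only. The values are copied from Table 5, while the names TABLE_5 and memory_ratio are introduced here for the example. With these values the radix-2 configuration needs roughly 3.7 to 5 times as much memory as the radix-32 configuration.

```python
# (FFT length, radix-32 memory in Mbytes, radix-2 memory in Mbytes), from Table 5
TABLE_5 = [
    (256, 0.02, 0.08),
    (512, 0.04, 0.18),
    (1024, 0.08, 0.39),
    (2048, 0.23, 0.86),
    (4096, 0.47, 1.88),
    (8192, 0.94, 4.06),
    (16384, 1.88, 8.75),
    (32768, 3.75, 18.75),
    (65536, 10.00, 40.00),
    (131072, 20.00, 85.00),
    (262144, 40.00, 180.00),
    (524288, 80.00, 380.00),
    (1048576, 160.00, 800.00),
]


def memory_ratio(radix32_mb, radix2_mb):
    """Return how many times more memory the radix-2 configuration needs."""
    return radix2_mb / radix32_mb


ratios = {n: memory_ratio(r32, r2) for n, r32, r2 in TABLE_5}

for n, ratio in ratios.items():
    print(f"{n:>8}  radix-2 / radix-32 = {ratio:.2f}")
```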
Using a radix-32 architecture substantially reduces the memory resources required. The
main benefit is seen at the system level: a single-width PMC module performing long
transforms with Sundance’s FFT core achieves the same processing performance as a
radix-2 implementation spread over two 6U CompactPCI boards populated with multiple
FPGAs and memory devices.