Texas Instruments TMS320C64X User Manual
DSP_fft16x16t
There is one slight break in the flow of packed processing. The real part of each
complex number is in the lower half of a word, and the imaginary part is in the upper half.
The flow breaks where “xl0” and “xl1” are combined with “xl20” and “xl21”, because the
multiplication by “j” requires a real part to be combined with an imaginary part. This
requires a packed quantity like “xl21_xl20” to be rotated to “xl20_xl21” so that
it can be combined using ADD2 and SUB2. Hence, the natural C code shown first
below is transformed using packed data processing into the intrinsic form that follows it:
xl0 = x[2 * i0] - x[2 * i2];
xl1 = x[2 * i0 + 1] - x[2 * i2 + 1];
xl20 = x[2 * i1] - x[2 * i3];
xl21 = x[2 * i1 + 1] - x[2 * i3 + 1];
xt1 = xl0 + xl21;
yt2 = xl1 + xl20;
xt2 = xl0 - xl21;
yt1 = xl1 - xl20;
/* x_iN denotes the packed word x[2 * iN + 1]:x[2 * iN] (imaginary:real) */
xl1_xl0 = _sub2(x_i0, x_i2)
xl21_xl20 = _sub2(x_i1, x_i3)
xl20_xl21 = _rotl(xl21_xl20, 16)
yt2_xt1 = _add2(xl1_xl0, xl20_xl21)
yt1_xt2 = _sub2(xl1_xl0, xl20_xl21)
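The packed form can be checked against the scalar form on a host machine. The following is a minimal sketch, not the library's code: the `emu_` functions are hypothetical portable stand-ins for the `_add2`, `_sub2` and `_rotl` intrinsics, the operand names `x_i0` to `x_i3` are assumed names for the packed imaginary:real input words, and the sample values are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Pack two 16-bit values: hi goes to bits 31..16, lo to bits 15..0. */
uint32_t pack16(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

int16_t hi16(uint32_t w) { return (int16_t)(uint16_t)(w >> 16); }
int16_t lo16(uint32_t w) { return (int16_t)(uint16_t)(w & 0xFFFFu); }

/* Stand-in for _add2: two independent 16-bit additions, no carry between halves. */
static uint32_t emu_add2(uint32_t a, uint32_t b)
{
    uint32_t hi = ((a >> 16) + (b >> 16)) & 0xFFFFu;
    uint32_t lo = ((a & 0xFFFFu) + (b & 0xFFFFu)) & 0xFFFFu;
    return (hi << 16) | lo;
}

/* Stand-in for _sub2: two independent 16-bit subtractions, no borrow between halves. */
static uint32_t emu_sub2(uint32_t a, uint32_t b)
{
    uint32_t hi = ((a >> 16) - (b >> 16)) & 0xFFFFu;
    uint32_t lo = ((a & 0xFFFFu) - (b & 0xFFFFu)) & 0xFFFFu;
    return (hi << 16) | lo;
}

/* Stand-in for _rotl(w, 16): swaps the two halfwords. */
static uint32_t emu_rotl16(uint32_t w) { return (w << 16) | (w >> 16); }

/* Packed butterfly step; every input word holds imaginary:real. */
void packed_step(uint32_t x_i0, uint32_t x_i1, uint32_t x_i2, uint32_t x_i3,
                 uint32_t *yt2_xt1, uint32_t *yt1_xt2)
{
    uint32_t xl1_xl0   = emu_sub2(x_i0, x_i2);
    uint32_t xl21_xl20 = emu_sub2(x_i1, x_i3);
    uint32_t xl20_xl21 = emu_rotl16(xl21_xl20);

    *yt2_xt1 = emu_add2(xl1_xl0, xl20_xl21);
    *yt1_xt2 = emu_sub2(xl1_xl0, xl20_xl21);
}

/* Compare the packed step with the scalar C code on one arbitrary sample set. */
void check_packed_step(void)
{
    const int16_t x[8] = { 100, -20, 7, 300, -45, 60, 11, -9 };
    const int i0 = 0, i1 = 1, i2 = 2, i3 = 3;

    int xl0  = x[2 * i0]     - x[2 * i2];
    int xl1  = x[2 * i0 + 1] - x[2 * i2 + 1];
    int xl20 = x[2 * i1]     - x[2 * i3];
    int xl21 = x[2 * i1 + 1] - x[2 * i3 + 1];
    uint32_t yt2_xt1, yt1_xt2;

    packed_step(pack16(x[2 * i0 + 1], x[2 * i0]), pack16(x[2 * i1 + 1], x[2 * i1]),
                pack16(x[2 * i2 + 1], x[2 * i2]), pack16(x[2 * i3 + 1], x[2 * i3]),
                &yt2_xt1, &yt1_xt2);

    assert(lo16(yt2_xt1) == xl0 + xl21);  /* xt1 */
    assert(hi16(yt2_xt1) == xl1 + xl20);  /* yt2 */
    assert(lo16(yt1_xt2) == xl0 - xl21);  /* xt2 */
    assert(hi16(yt1_xt2) == xl1 - xl20);  /* yt1 */
}
```

Calling `check_packed_step()` from a host test program exercises the comparison. The sketch relies on 16-bit wraparound, so it does not model any saturation or scaling that a real kernel may apply.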
Also notice that xt1 and yt1 end up in separate words. They need to be packed
together to take advantage of the packed twiddle factors that have been
loaded. To achieve this, they are realigned as follows:
yt1_xt1 = _packhl2(yt1_xt2, yt2_xt1)
yt2_xt2 = _packhl2(yt2_xt1, yt1_xt2)
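The realignment can be illustrated with a small host-side sketch. `emu_packhl2` is a hypothetical stand-in for the `_packhl2` intrinsic, assumed here to take the upper halfword of its first operand and the lower halfword of its second; `realign` and the test values are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for _packhl2(a, b): upper halfword of a, lower halfword of b. */
static uint32_t emu_packhl2(uint32_t a, uint32_t b)
{
    return (a & 0xFFFF0000u) | (b & 0x0000FFFFu);
}

/* Regroup the butterfly outputs so that each word holds one complex
 * value as imaginary:real. */
void realign(uint32_t yt2_xt1, uint32_t yt1_xt2,
             uint32_t *yt1_xt1, uint32_t *yt2_xt2)
{
    *yt1_xt1 = emu_packhl2(yt1_xt2, yt2_xt1);
    *yt2_xt2 = emu_packhl2(yt2_xt1, yt1_xt2);
}

/* Use recognizable bit patterns so the movement of each halfword is visible. */
void check_realign(void)
{
    uint32_t yt1_xt1, yt2_xt2;

    /* yt2 = 0xAAAA, xt1 = 0x1111, yt1 = 0xBBBB, xt2 = 0x2222 */
    realign(0xAAAA1111u, 0xBBBB2222u, &yt1_xt1, &yt2_xt2);

    assert(yt1_xt1 == 0xBBBB1111u);
    assert(yt2_xt2 == 0xAAAA2222u);
}
```

Each output word costs a single pack operation, which is why the two halves are taken from alternating sources rather than shifted and masked separately.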
The packed word “yt1_xt1” allows the loaded “sc” twiddle factor to be used for
the complex multiply. The real part of the complex multiply is implemented
using DOTP2. The imaginary part of the complex multiply is implemented
using DOTPN2 after the two halfwords of the twiddle factor are swapped within the word.
(X + jY)(C − jS) = (XC + YS) + j(YC − XS)
The actual twiddle factors for the FFT are cosine and −sine. The twiddle factors
stored in the table are cosine and sine, so the sign of the “sine” term is
accounted for during the multiplication, as shown above.
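The two dot products can be sketched on a host as follows. This is an illustration under stated assumptions, not the kernel itself: `emu_dotp2` and `emu_dotpn2` are hypothetical stand-ins assumed to compute hi*hi + lo*lo and hi*hi - lo*lo on signed halfwords, `cmpy_packed` is an illustrative name, and no scaling or rounding is modeled.

```c
#include <assert.h>
#include <stdint.h>

static int16_t hi16(uint32_t w) { return (int16_t)(uint16_t)(w >> 16); }
static int16_t lo16(uint32_t w) { return (int16_t)(uint16_t)(w & 0xFFFFu); }

/* Stand-in for _dotp2: a_hi * b_hi + a_lo * b_lo on signed halfwords. */
static int32_t emu_dotp2(uint32_t a, uint32_t b)
{
    return (int32_t)hi16(a) * hi16(b) + (int32_t)lo16(a) * lo16(b);
}

/* Stand-in for _dotpn2: a_hi * b_hi - a_lo * b_lo on signed halfwords. */
static int32_t emu_dotpn2(uint32_t a, uint32_t b)
{
    return (int32_t)hi16(a) * hi16(b) - (int32_t)lo16(a) * lo16(b);
}

/* Swap the two halfwords of a word (S:C becomes C:S). */
static uint32_t emu_swap16(uint32_t w) { return (w << 16) | (w >> 16); }

/* y_x holds Y:X and s_c holds S:C (sine in the upper half, cosine in the
 * lower half). Produces (XC + YS) and (YC - XS) as unscaled 32-bit values. */
void cmpy_packed(uint32_t y_x, uint32_t s_c, int32_t *re, int32_t *im)
{
    uint32_t c_s = emu_swap16(s_c);

    *re = emu_dotp2(y_x, s_c);   /* Y*S + X*C */
    *im = emu_dotpn2(y_x, c_s);  /* Y*C - X*S */
}

void check_cmpy_packed(void)
{
    int32_t re, im;

    /* X = 3, Y = 4, C = 5, S = 6 */
    cmpy_packed(0x00040003u, 0x00060005u, &re, &im);
    assert(re == 39);
    assert(im == 2);

    /* X = -3, Y = 4, C = 5, S = 6 */
    cmpy_packed(0x0004FFFDu, 0x00060005u, &re, &im);
    assert(re == 9);
    assert(im == 38);
}
```

Because the subtraction in the imaginary part comes from the instruction itself, the table can store the positive sine and no separate negation is needed.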
Benchmarks
Cycles
(10 * nx/8 + 19) * ceil[log_4(nx) − 1] + (nx/8 + 2) * 7 + 28 + BC
where BC = N/8, the number of bank conflicts.
Codesize
1004 bytes
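For a quick estimate, the cycle formula above can be evaluated with integer arithmetic. This sketch assumes that nx is a power of two of at least 16 and that the N in BC = N/8 is the same quantity as nx; the function names are hypothetical.

```c
#include <assert.h>

/* ceil(log_4(nx) - 1) for a power-of-two nx, using integers only:
 * ceil(log_4(nx)) equals (log2(nx) + 1) / 2 with integer division. */
static int stage_term(int nx)
{
    int log2n = 0;

    while ((1 << (log2n + 1)) <= nx)
        log2n++;
    return (log2n + 1) / 2 - 1;
}

/* Evaluate the benchmark formula for one transform size. */
int estimate_fft16x16t_cycles(int nx)
{
    int bc = nx / 8;  /* assumption: N in "BC = N/8" is nx */

    return (10 * nx / 8 + 19) * stage_term(nx) + (nx / 8 + 2) * 7 + 28 + bc;
}

void check_cycle_estimate(void)
{
    assert(estimate_fft16x16t_cycles(16) == 97);
    assert(estimate_fft16x16t_cycles(64) == 304);
    assert(estimate_fft16x16t_cycles(128) == 707);
    assert(estimate_fft16x16t_cycles(256) == 1315);
}
```

Sizes that are not a power of four, such as 128, round the stage term up, so they pay for the same number of stage passes as the next power of four.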