Motorola DSP96002 User Manual
;
; Faster FFT using Programming Tricks found in Typical FORTRAN Libraries
;
; First two passes combined as a four butterfly loop since
; multiplies are trivial.
; 2.25 cycles internal (4 cycles external) per Radix 2
; butterfly.
; Middle passes performed with traditional, triple-nested DO loop.
; 4 cycles internal (8 cycles external) per Radix 2 butterfly
; plus overhead. Note that a new pipelining technique is
; being used to minimize overhead.
; Next to last pass performed with double butterfly loop.
; 4.5 cycles internal (8.5 cycles external) per Radix 2
; butterfly.
; Last pass has separate single butterfly loop.
; 5 cycles internal (9 cycles external) per Radix 2
; butterfly.
;
; For 1024 complex points, average Radix 2 butterfly = 3.8 cycles
; internal and 7.35 cycles external, assuming a single external
; data bus.
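;
; Worked check of the averages, assuming 10 passes of 512 butterflies
; each for 1024 points (2 first passes, 6 middle passes, next to last,
; last) and ignoring loop overhead:
;   internal: (2*2.25 + 6*4 + 4.5 + 5) / 10 = 38.0 / 10 = 3.8  cycles
;   external: (2*4    + 6*8 + 8.5 + 9) / 10 = 73.5 / 10 = 7.35 cycles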
;
; Because the passes are coded separately, the minimum FFT size with
; these optimizations is 32 points. Approximately 150 program words
; are required.
; Uses internal X and Y Data ROMs for twiddle factor coefficients
; for any size FFT up to 1024 complex points.
;
; Assuming internal program and internal data memory (or two
; external data buses), 1024 point complex FFT is 1.57 msec at
; 75 nsec instruction rate. Assuming internal program and
; external data memory, 1024 point complex FFT is 2.94 msec
; at 75 nsec instruction rate.
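;
; Rough check of these times, assuming one cycle = one 75 nsec
; instruction cycle and 512 * 10 = 5120 Radix 2 butterflies:
;   internal: 5120 * 3.8  = 19456 cycles * 75 nsec = approx. 1.46 msec
;   external: 5120 * 7.35 = 37632 cycles * 75 nsec = approx. 2.82 msec
; The quoted 1.57 and 2.94 msec figures are about 0.11 to 0.12 msec
; higher, presumably reflecting loop and pass overhead not counted in
; the per-butterfly averages.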
;
; First two passes
;
; 9 cycles internal, 1.78X faster than 4 cycle Radix 2 bfy
; 16 cycles external, 2.0X faster than 4 cycle Radix 2 bfy
;
; r0 = a pointer in and out
; r6 = a pointer in