Motorola DSP96002 User Manual

        move    d2.s,x:(r4)+                    ;save lower 2, point to next
_bfly
        move    x:(r0)+n0,d0.s  y:(r4)+n4,d1.s  ;adjust r0,r4
_grp
        lsr     d6      d6.l,n0                 ;bflys/2, make old value new offset
        lsl     d7      n0,n4                   ;ngroups*2, move new offset
        lea     (r0)+n0,r4                      ;new lower leg pointer
_stage
        move    #3,n0                           ;offset between 2 butterflies-1
        move    n0,n4                           ;same
        move    (r4)+                           ;point r4 to second bfly
        do      #n/4,_laststage                 ;do last stage, 2 bflys at a time
        move    x:(r0)+,d0.s                    ;get upper of bfly 1
        move    x:(r0)-,d1.s                    ;get lower of bfly 1, point to upper
        faddsub.s d0,d1 x:(r4)+,d2.s            ;get upper of bfly 2
        move    x:(r4)-,d3.s                    ;get lower of bfly 2, point to upper
        faddsub.s d2,d3 d1.s,x:(r0)+            ;save upper 1
        move    d0.s,x:(r0)+n0                  ;save lower 1, point to next group
        move    d3.s,x:(r4)+                    ;save upper 2
        move    d2.s,x:(r4)+n4                  ;save lower 2, point to next group
_laststage
        end
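
The last-stage loop in the listing can be read as the following host-side C sketch. It is an illustration only, not part of the manual's code: the function name `wht_last_stage` is hypothetical, and the sign convention (upper leg receives the sum, lower leg receives upper minus lower) is an assumption, since the operand ordering of `faddsub` cannot be confirmed from this listing alone.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the last WHT stage: each butterfly combines two adjacent
 * samples, and two butterflies (four samples) are handled per loop
 * pass, mirroring "do #n/4" in the listing.
 * ASSUMPTION: upper = a + b, lower = a - b. */
void wht_last_stage(float *x, size_t n)
{
    assert(n % 4 == 0);               /* loop runs n/4 times */
    for (size_t i = 0; i < n; i += 4) {
        float u1 = x[i],     l1 = x[i + 1];   /* butterfly 1 legs */
        float u2 = x[i + 2], l2 = x[i + 3];   /* butterfly 2 legs */
        x[i]     = u1 + l1;           /* save upper 1 */
        x[i + 1] = u1 - l1;           /* save lower 1 */
        x[i + 2] = u2 + l2;           /* save upper 2 */
        x[i + 3] = u2 - l2;           /* save lower 2 */
    }
}
```

The step of 4 per pass corresponds to the offset of 3 loaded into n0 and n4 plus the post-increment on the preceding store.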

B.1.45.2 Out-of-place WHT

Since the WHT requires 2 loads and 2 stores per butterfly, a WHT butterfly takes at least 4 cycles when all data is in one memory. However, if the data is split between two memories, the 2 loads and 2 stores can be performed in 2 cycles, so each butterfly can execute in 2 cycles. This implementation takes the input data in a single memory space and, on the first stage of the transform, splits the data into X and Y memory. The middle stages then perform 4 WHT butterflies in 8 cycles. The last stage is split out and also performs 4 WHT butterflies in 8 cycles. Thus, except for the first stage, all WHT butterflies are performed in 2 cycles.

In this example, a 16-point transform is performed. The input data are in x:0-f and the output is split between X and Y memory. The first 8 output values are at x:0-7 and the next 8 output values are at y:0-7, in bit-reversed order starting at x:0. To increase execution speed, an extra block of memory is used at y:0-7. Thus, this algorithm requires an extra block of Y memory equal to one-half of the transform data size in X memory.
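
When checking the results of a 16-point run, a plain host-side reference transform can be useful. The sketch below is a generic radix-2, in-place, unnormalized WHT written for that purpose; it is not the manual's routine, and the name `wht_reference` is hypothetical. It assumes the conventional butterfly (sum to the first leg, difference to the second) and produces its outputs in Hadamard order in a single array, so the values would need to be matched against the bit-reversed, split X/Y layout described above before comparing.

```c
#include <assert.h>
#include <stddef.h>

/* Generic in-place Walsh-Hadamard transform, unnormalized.
 * n must be a power of two. Output is in Hadamard (natural) order,
 * which is NOT the bit-reversed, X/Y-split layout of the DSP routine. */
void wht_reference(float *x, size_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    /* half = distance between the two legs of a butterfly */
    for (size_t half = 1; half < n; half <<= 1) {
        for (size_t base = 0; base < n; base += 2 * half) {
            for (size_t k = 0; k < half; ++k) {
                float a = x[base + k];
                float b = x[base + k + half];
                x[base + k]        = a + b;   /* sum leg */
                x[base + k + half] = a - b;   /* difference leg */
            }
        }
    }
}
```

Two quick sanity checks for any WHT implementation: a unit impulse at index 0 transforms to all ones, and a constant input of ones transforms to the value n at index 0 with zeros elsewhere.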

If both X and Y memory are on the same port (A or B), then all X and Y memory references are performed on that port, and the WHT butterfly executes in 4 cycles. This gives an execution time of 1.64 milliseconds at 13.5 MIPS. However, if X memory is on port A and Y memory is on port B, the memory bandwidth is doubled and an X memory access and a Y memory access can occur in a single cycle. This gives an execution time of 0.939 milliseconds at 13.5 MIPS.
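
The effect of the port arrangement on butterfly cycles can be estimated with a small C model. This is a rough sketch under stated assumptions, not a reproduction of the manual's timing: the function names are hypothetical, the per-butterfly cost of the first stage is passed in as a parameter because this section does not give it, one cycle is taken to be one instruction cycle, and loop and setup overhead are ignored. The transform length behind the 1.64 ms and 0.939 ms figures is not stated in this section, so the model does not attempt to reproduce them.

```c
#include <assert.h>

/* Butterfly cycles for an n-point radix-2 WHT (n a power of two):
 * n/2 butterflies per stage, log2(n) stages.
 * ASSUMPTION: first_cyc and other_cyc are cycles per butterfly in the
 * first stage and in all later stages; overhead is not counted. */
unsigned long wht_butterfly_cycles(unsigned n, unsigned first_cyc,
                                   unsigned other_cyc)
{
    assert(n >= 2 && (n & (n - 1)) == 0);
    unsigned stages = 0;
    for (unsigned m = n; m > 1; m >>= 1)
        ++stages;                             /* stages = log2(n) */
    unsigned long per_stage = n / 2;          /* butterflies per stage */
    return per_stage * first_cyc
         + per_stage * (stages - 1) * other_cyc;
}

/* Convert instruction cycles to milliseconds at a given MIPS rate.
 * ASSUMPTION: one cycle equals one instruction cycle. */
double cycles_to_ms(unsigned long cycles, double mips)
{
    return (double)cycles / (mips * 1e3);
}
```

For the 16-point example, `wht_butterfly_cycles(16, 4, 4)` gives 128 cycles for the single-port case, and `wht_butterfly_cycles(16, 4, 2)` gives 80 cycles for the split-port case if the first stage is assumed to cost 4 cycles per butterfly. Because the first stage does not speed up, the gain from the second port is less than a factor of two, which is consistent with the ratio of the two quoted execution times.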