Examples, Figure 2 : fir optimization example, An253 – Cirrus Logic AN253 User Manual

Page 8

AN253

4. Examples

This example illustrates the process of optimizing code for the MaverickCrunch coprocessor at the archi-
tecture level. The following source code is part of a simple real-time implementation of an FIR filter (Figure
2). There is a data dependency in the accumulate portion of the basic source code (on the left). This data
dependency stalls the MaverickCrunch CDP pipeline for 11 cycles. To reduce this penalty, it is possible
to re-arrange the source code to use the ARM cycles in the shadow of the stall.

Figure 2: FIR Optimization Example

To accomplish this optimization (see Figure 2): 1.) Move the index initialization/update code behind the
cfmuld operation; 2.) Add index initialization code just above the loop construct. This optimization results

// Unoptimized FIR

// Optimized FIR

acc = 0;

add r0,pc,#0x80 ; #0x8544

cfldrd c4, [r0]

for (i = 0; i < n; i++){

// ** initialize array ptrs here **

mov r4,#0

mov r1, r5

b 0x8508 ; (fir + 0x6c)

mov r0, r7

acc += h[i] * z[i];

for (i = 0; i < n; i++){

add r1,r5,r4,lsl #3 // Ptr

mov r4,#0

cfldrd c0, [r1]

b 0x8508 ; (fir + 0x6c)

add r0,r7,r4,lsl #3 // Ptr

cfldrd c1, [r0]

acc += h[i] * z[i];

cfmuld c0, c0, c1 // <-- 11-cycles

cfldrd c0, [r1]

cfaddd c4, c4, c0 // <-- stalled

cfldrd c1, [r0]

cfmuld c0, c0, c1

add r4,r4,#1

cmp r4,r6

// ** increment ptrs here **

blt 0x84d8 ; (fir + 0x3c)

add r1,r5,r4,lsl #3

add r0,r7,r4,lsl #3

}

cfaddd c4, c4, c0

add r4,r4,#1

cmp r4,r6

blt 0x84d8 ; (fir + 0x3c)

}