An253 – Cirrus Logic AN253 User Manual


fmuls c3, c1, c2

fadds c0, c0, c3

Note: Please be aware that some sequences of MaverickCrunch instructions are not supported.

For an up-to-date list of these instruction sequences, please see the appropriate Errata
Sheet. The Errata Sheets are available at www.cirrus.com. Furthermore, a parsing tool is
available that can identify illegal sequences of MaverickCrunch instructions. This tool is
also available from Cirrus Logic.

Attempt to maximize the throughput of the ARM920T and MaverickCrunch CDP pipelines by interleaving
independent ARM and CDP instructions. The maximum throughput of the CDP pipeline is one CDP
instruction every other ARM clock cycle, because the E1, E2, E3, and W stages of the CDP pipeline each
take two ARM cycles to complete. Please note that MaverickCrunch must be operating in asynchronous
mode to realize this optimization.
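
The arithmetic behind this guideline can be sketched with a toy cycle count. The model below is an illustration only, not a cycle-accurate description of the hardware. It assumes that every CDP instruction occupies two ARM clocks, that the second clock is wasted unless an independent ARM instruction is issued into it, that every ARM instruction takes one clock, and that there are no data dependencies or memory stalls. The function names are hypothetical.

```c
/* Toy model (assumption, not cycle-accurate): each CDP instruction
 * occupies two ARM clocks; the second clock stalls the ARM pipeline
 * unless an independent single-cycle ARM instruction fills it. */

/* All CDP instructions issued first, then all ARM instructions:
 * every CDP instruction pays its stall clock. */
unsigned cycles_back_to_back(unsigned n_cdp, unsigned n_arm)
{
    return 2u * n_cdp + n_arm;
}

/* CDP and ARM instructions alternated: each CDP/ARM pair shares
 * two clocks, and only the unpaired leftovers cost extra. */
unsigned cycles_interleaved(unsigned n_cdp, unsigned n_arm)
{
    unsigned pairs = (n_cdp < n_arm) ? n_cdp : n_arm;

    return 2u * pairs              /* paired CDP + ARM instructions */
         + 2u * (n_cdp - pairs)    /* leftover CDP, stall not hidden */
         + (n_arm - pairs);        /* leftover ARM instructions */
}
```

Under these assumptions, three CDP instructions followed by three ARM instructions cost 9 clocks, while the same six instructions interleaved cost 6 clocks, which matches the stated ceiling of one CDP instruction every other ARM clock.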

Consider the following code sequence:

The above code is inefficient because MaverickCrunch instructions A, B, and C will stall the ARM’s
pipeline during the second cycle of their E1, E2, E3, and W stages. The following code interleaves the
MaverickCrunch and ARM instructions, which removes the stalls and maximizes the throughput of both
pipelines.

Utilize both the ARM and coprocessor registers, and the data caching capabilities of the ARM core, to
reduce the latency of fetching data. For example, FIR filters typically have more filter coefficients than
there are available registers, so many costly memory accesses are executed to load the coefficients. One
solution to this problem is to load and lock the filter coefficients into the data cache at start-up. This
increases performance by reducing the pipeline stalls incurred while waiting for data to transfer from
(slower) external memory to the coprocessor.
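
To make the memory-access pattern concrete, here is a minimal direct-form FIR sketch in C. It is an illustration under stated assumptions, not code from this application note: the function name and signature are hypothetical, samples before x[0] are treated as zero, and the cache lock-down step itself is not shown because it is specific to the core's cache controller.

```c
#include <stddef.h>

/* Hypothetical helper: computes one FIR output sample
 *   y[n] = sum over k of coeff[k] * x[n - k]
 * Samples before x[0] are treated as zero (assumption). */
float fir_sample(const float *coeff, size_t n_taps,
                 const float *x, size_t n)
{
    float acc = 0.0f;

    /* Every tap reads one coefficient and one input sample.
     * The coeff[] array is the data that would benefit from being
     * loaded and locked into the data cache at start-up. */
    for (size_t k = 0; k < n_taps && k <= n; ++k) {
        acc += coeff[k] * x[n - k];
    }
    return acc;
}
```

Because the whole coefficient array is re-read for every output sample, a filter with more taps than registers reloads the same coefficients over and over, which is why keeping them resident in the data cache pays off.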