beautypg.com

Pipeline operation, 2 pipeline operation – Texas Instruments TMS320C2XX User Manual

Page 106

background image

Pipeline Operation

5-7

Program Control

5.2

Pipeline Operation

Instruction pipelining consists of a sequence of bus operations that occur dur-
ing the execution of an instruction. The ’C2xx pipeline has four independent
stages: instruction-fetch, instruction-decode, operand-fetch, and instruction-
execute. Because the four stages are independent, these operations can
overlap. During any given cycle, one to four different instructions can be active,
each at a different stage of completion. Figure 5–4 shows the operation of the
4-level-deep pipeline for single-word, single-cycle instructions executing with
no wait states.

The pipeline is essentially invisible to you except in the following cases:

-

A single-word, single-cycle instruction immediately following a modifica-
tion of the global-memory allocation register (GREG) uses the previous
global map.

-

The NORM instruction modifies the auxiliary register pointer (ARP) and
uses the current auxiliary register (the one pointed to by the ARP) during
the execute phase of the pipeline. If the next two instruction words change
the values in the current auxiliary register or the ARP, they will do so during
the instruction decode phase of the pipeline (before the execution of
NORM). This would cause NORM to use the wrong auxiliary register value
and the following instructions to use the wrong ARP value.

Figure 5–4. 4-Level Pipeline Operation

N – 2

N – 3

N – 2

N – 1

N – 1

N

N

N + 1

N + 1

N + 2

N

N – 1

N + 3

N + 2

N + 1

N

Execute

Operand

Decode

Fetch

CLKOUT1

The CPU is implemented using 2-phase static logic. The 2-phase operation
of the ’C2xx CPU consists of a master phase in which all commutation logic
is executed, and a slave phase in which results are latched. Therefore, se-
quential operations require sequential master cycles. Although sequential op-
erations require a deeper pipeline, 2-phase operation provides more time for
the computational logic to execute. This allows the ’C2xx to run at faster clock
rates despite having a deeper pipeline that imposes a penalty on branches and
subroutine calls.