Renesas SH7641 User Manual
Page 95
Section 2 CPU
Rev. 4.00 Sep. 14, 2005 Page 45 of 982
REJ09B0023-0400
Table 2.5  Word Data Sign Extension

This LSI's CPU            Description                           Example of Other CPU
------------------------  ------------------------------------  --------------------
MOV.W   @(disp,PC),R1     Sign-extended to 32 bits, R1          ADD.W   #H'1234,R0
ADD     R1,R0             becomes H'00001234, and is then
  ........                operated on by the ADD instruction.
.DATA.W H'1234

Note: Immediate data is referenced by @(disp,PC).
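The sign extension described in Table 2.5 can be sketched in C as follows. This is only a behavioral model, not code for this LSI: the function names sign_extend16 and add_word_constant are hypothetical, and the second function assumes that the register add wraps modulo 2^32.

```c
#include <stdint.h>

/* Model of a 16-bit word that is sign-extended when loaded into a
 * 32-bit register. Bit 15 of the word is treated as the sign bit.
 * The XOR/subtract form avoids implementation-defined conversions. */
int32_t sign_extend16(uint16_t word)
{
    return (int32_t)(word ^ 0x8000) - 0x8000;
}

/* Model of the sequence  MOV.W @(disp,PC),R1 ; ADD R1,R0.
 * The constant plays the role of the .DATA.W entry (hypothetical helper). */
uint32_t add_word_constant(uint32_t r0, uint16_t constant)
{
    uint32_t r1 = (uint32_t)sign_extend16(constant); /* R1 after MOV.W */
    return r0 + r1;                                  /* R0 after ADD   */
}
```

With the word H'1234 the model leaves H'00001234 in R1, as in the table. A word with bit 15 set, such as H'8000, would instead be extended with ones in the upper 16 bits.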
Load/Store Architecture: Basic operations are executed between registers. For operations that
involve memory, the data is first loaded into a register. However, bit manipulation instructions
such as AND are executed directly on memory.
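The difference between the two forms can be sketched in C as follows. This is a behavioral model under stated assumptions: the function names are hypothetical, memory is modelled as a plain byte array, and the memory form is patterned on the SH family's AND.B #imm,@(R0,GBR) instruction, which the text above does not name explicitly.

```c
#include <stdint.h>

/* Register form: both operands are already in registers. */
uint32_t and_registers(uint32_t rn, uint32_t rm)
{
    return rn & rm;
}

/* Memory form (assumed addressing: GBR + R0): the byte is read,
 * ANDed with the immediate, and written back in place, without
 * the program first loading it into a general register. */
void and_byte_in_memory(uint8_t *mem, uint32_t gbr, uint32_t r0, uint8_t imm)
{
    uint32_t addr = gbr + r0;
    mem[addr] = (uint8_t)(mem[addr] & imm);
}
```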
Delayed Branching: Unconditional branch instructions, etc., are executed as delayed branches.
With a delayed branch instruction, the branch is made after execution of the instruction (called the
slot instruction) immediately following the delayed branch instruction. This minimizes disruption
of the pipeline when a branch is made.
With a delayed branch, the actual branch operation occurs after execution of the slot instruction.
However, everything other than the branch operation itself, such as register updating, is
performed in the order delayed branch instruction → delay slot instruction. For example, even if
the delay slot instruction changes the contents of the register holding the branch destination
address, the branch is still made to the address the register held before the change.
Table 2.6  Delayed Branch Instructions

This LSI's CPU     Description                        Example of Other CPU
-----------------  ---------------------------------  --------------------
BRA     TRGET      ADD is executed before branch to   ADD.W   R1,R0
ADD     R1,R0      TRGET.                             BRA     TRGET
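The ordering rules for delayed branches can be sketched in C as follows. This is a simplified behavioral model, not a description of the actual pipeline: the Cpu structure, the SlotFn type, and all function names are hypothetical, and the branch is modelled as a register-indirect branch so that the effect of changing the target register in the delay slot is visible.

```c
#include <stdint.h>

typedef struct {
    uint32_t r[16]; /* general registers R0 to R15 */
    uint32_t pc;    /* program counter */
} Cpu;

/* A delay slot instruction, modelled as a function acting on the CPU. */
typedef void (*SlotFn)(Cpu *cpu);

/* Slot instruction from Table 2.6: ADD R1,R0 */
void slot_add_r1_r0(Cpu *cpu)
{
    cpu->r[0] += cpu->r[1];
}

/* A slot instruction that overwrites R2, used here as the target register. */
void slot_clobber_r2(Cpu *cpu)
{
    cpu->r[2] = 0;
}

/* Delayed branch to the address held in R[n]. */
void delayed_branch_reg(Cpu *cpu, int n, SlotFn slot)
{
    uint32_t target = cpu->r[n]; /* 1. branch instruction reads its target */
    slot(cpu);                   /* 2. slot instruction executes           */
    cpu->pc = target;            /* 3. the branch is actually made         */
}
```

In this model, running slot_add_r1_r0 in the slot updates R0 before control reaches the target, and running slot_clobber_r2 in the slot of a branch through R2 still branches to the value R2 held beforehand.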
Multiply/Multiply-and-Accumulate Operations: A 16 × 16 → 32 multiply operation is executed in
1 to 2 states, and a 16 × 16 + 64 → 64 multiply-and-accumulate operation in 2 states.
A 32 × 32 → 64 multiply operation and a 32 × 32 + 64 → 64 multiply-and-accumulate operation
are each executed in 2 to 3 states.
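The operand and result widths of these operations can be sketched in C as follows. This models only the arithmetic, not the execution time in states. The function names are hypothetical, the operands are assumed to be signed, and accumulator overflow or saturation is not modelled.

```c
#include <stdint.h>

/* 16 x 16 -> 32 multiply (signed operands assumed). */
int32_t mul16(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* 16 x 16 + 64 -> 64 multiply-and-accumulate.
 * The caller is assumed to keep the sum within the 64-bit range. */
int64_t mac16(int64_t acc, int16_t a, int16_t b)
{
    return acc + (int64_t)a * (int64_t)b;
}

/* 32 x 32 -> 64 multiply (signed operands assumed). */
int64_t mul32(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}

/* 32 x 32 + 64 -> 64 multiply-and-accumulate. */
int64_t mac32(int64_t acc, int32_t a, int32_t b)
{
    return acc + (int64_t)a * (int64_t)b;
}
```

Widening each operand before the multiplication is what keeps the full-width product: multiplying two 32-bit values directly in C would produce only a 32-bit result.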
T Bit: The result of a comparison is indicated by the T bit in the status register (SR), and a
conditional branch is performed according to whether the result is True or False. Processing speed
has been improved by keeping the number of instructions that modify the T bit to a minimum.
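The compare-then-branch pattern can be sketched in C as follows. This is a behavioral model with hypothetical function names. The mnemonics in the comments (CMP/EQ, BT, BF) are taken from the SH family instruction set rather than from the text above, and the addresses passed in are arbitrary illustration values.

```c
#include <stdint.h>
#include <stdbool.h>

/* Model of CMP/EQ Rm,Rn: the T bit becomes 1 when the registers are equal. */
bool cmp_eq(uint32_t rm, uint32_t rn)
{
    return rn == rm;
}

/* Model of BT: branch to target when T is 1, otherwise continue at next. */
uint32_t branch_if_true(bool t, uint32_t target, uint32_t next)
{
    return t ? target : next;
}

/* Model of BF: branch to target when T is 0, otherwise continue at next. */
uint32_t branch_if_false(bool t, uint32_t target, uint32_t next)
{
    return t ? next : target;
}
```

Only the comparison writes the T bit in this model. The branch functions read it and leave it unchanged, so one comparison result can steer more than one following decision.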