Eight double-precision floating-point units, Four velocity engine units – Apple Power Mac G5 (Late 2005) User Manual
Technology Overview
Power Mac G5
Eight Double-Precision Floating-Point Units
The PowerPC G5 core contains two double-precision floating-point units, each capable
of performing a multiply and an add at the same time. This means a Power Mac G5
Quad, with four processor cores and a total of eight floating-point units, can complete
up to sixteen 64-bit floating-point operations in a single cycle.
Such immense 64-bit computational power accelerates applications in many fields,
including audio creation, 3D content creation, and scientific visualization and analysis,
resulting in performance levels far beyond those of previous Power Mac generations.
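The sixteen-operations figure follows from simple multiplication, and a short C sketch makes the arithmetic explicit. The function and parameter names are illustrative only. The 2.5GHz clock is taken from the Power Mac G5 Quad configuration named in the benchmark results. The outcome is a theoretical ceiling, not a measured score.

```c
/* Peak floating-point operations per clock cycle for the whole machine.
   cores, fpus_per_core and ops_per_fpu come from the text: four cores,
   two floating-point units per core, one multiply plus one add per unit. */
int peak_ops_per_cycle(int cores, int fpus_per_core, int ops_per_fpu)
{
    return cores * fpus_per_core * ops_per_fpu;
}

/* Theoretical peak in gigaflops: operations per cycle times clock in GHz. */
double peak_gigaflops(int ops_per_cycle, double clock_ghz)
{
    return ops_per_cycle * clock_ghz;
}
```

Calling peak_ops_per_cycle(4, 2, 2) gives 16, and peak_gigaflops(16, 2.5) gives 40. Measured benchmark results fall below this ceiling because real workloads rarely keep every unit busy on every cycle.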
Fused multiply-add example
The floating-point units in the PowerPC G5 can complete both a multiply and an add
operation as part of the same machine instruction, accelerating matrix multiplication,
vector dot products, and other scientific computations. Referred to as fused multiply-add,
or “fmadd,” this instruction is considered a building block for data-intensive
floating-point computation.
The following computation can be completed by a fused multiply-add instruction in
one pass through either of the two floating-point units in a PowerPC G5 core:
T = (a * b) + c
On other processors, two instructions are required. The first is a multiply instruction:
U = (a * b)
The product “U” is used by a second instruction, an addition, to complete the
computation:
V = U + c
In processors with comparable clock speeds, the computation of “(a * b) + c” is completed
twice as fast using fused multiply-add. It also delivers a more accurate result,
because round-off occurs just once in the computation of “T,” while on other processors
round-off occurs twice: in the computation of “U” and in the computation of “V.”
Four Velocity Engine Units
A dual-pipelined Velocity Engine in each processor core is optimized with two independent
queues and dedicated 128-bit registers and data paths for efficient instruction
and data flow. This 128-bit vector processing unit accelerates data manipulation by
applying a single instruction to multiple data at the same time, known as SIMD processing.
Originally implemented in the PowerPC G4, the Velocity Engine in the PowerPC
G5 uses the same set of 162 instructions, enabling it to accelerate existing Mac OS X
applications that have been optimized for the Velocity Engine.
Vector processing is useful for transforming large sets of data, such as manipulating an
image or rendering a video effect. For example, when a designer uses a filter to apply
a motion blur to an image, each pixel of the image must be changed according to
the same set of instructions, a highly repetitive processing task. Each Velocity Engine
pipeline speeds up this task by processing up to 128 bits of data, in four 32-bit integers,
eight 16-bit integers, sixteen 8-bit integers, or four 32-bit single-precision floating-point
values, in a single clock cycle. That works out to 16 simultaneous 32-bit floating-point
operations on a Power Mac G5 Quad.
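The idea of one instruction acting on four 32-bit values at once can be sketched in portable C. This example uses the GCC/Clang vector_size extension as a stand-in, not the Velocity Engine's own programming interface. Code written for the Velocity Engine would typically use the intrinsics in <altivec.h> instead. The function name, parameters, and the gain-and-bias operation are hypothetical, and whether the compiler emits true SIMD instructions depends on the target.

```c
#include <stddef.h>
#include <string.h>

/* Four 32-bit floats packed into one 128-bit value (GCC/Clang extension). */
typedef float v4f __attribute__((vector_size(16)));

/* Applies out[i] = in[i] * gain + bias to n floats, four lanes at a time. */
void scale_and_offset(const float *in, float *out, size_t n,
                      float gain, float bias)
{
    const v4f vgain = { gain, gain, gain, gain };
    const v4f vbias = { bias, bias, bias, bias };
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        v4f v;
        memcpy(&v, in + i, sizeof v);    /* load four values */
        v = v * vgain + vbias;           /* multiply and add on all four lanes */
        memcpy(out + i, &v, sizeof v);   /* store four results */
    }
    for (; i < n; i++)                   /* scalar tail for leftover elements */
        out[i] = in[i] * gain + bias;
}
```

An image filter follows the same pattern: the same short sequence of operations is applied to every group of pixels, so each pass through the loop does the work of four scalar iterations.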
Linpack
A measure of a computer’s floating-point
execution performance, the Linpack
benchmark solves a dense system of linear
equations. The Power Mac G5 Quad executed
the double-precision equations 88 percent
faster than the dual 2.7GHz Power Mac G5
and an amazing 626 percent faster than the
dual 1.42GHz Power Mac G4.
Gigaflops
The gigaflops test indicates a system’s
vector processing capability by measuring
the maximum number of floating-point
operations it can perform. With four Velocity
Engine units, the Power Mac G5 Quad
completed the test 85 percent faster than the
dual 2.7GHz Power Mac G5 and 260 percent
faster than the dual 1.42GHz Power Mac G4.
Linpack results
Power Mac G5 Quad 2.5GHz: 21 gigaflops
Dual 2.7GHz Power Mac G5: 11.1 gigaflops
Dual 1.42GHz Power Mac G4: 2.9 gigaflops
Gigaflops test results
Power Mac G5 Quad 2.5GHz: 76.6 gigaflops
Dual 2.7GHz Power Mac G5: 41.1 gigaflops
Dual 1.42GHz Power Mac G4: 21.3 gigaflops
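The "percent faster" comparisons can be checked against the chart values with one line of arithmetic. The sketch below assumes "X percent faster" means the ratio of the two scores minus one, times 100. It also assumes the Power Mac G5 Quad scored 21 gigaflops on Linpack and 76.6 on the gigaflops test, which is the pairing the quoted percentages imply. The function name is hypothetical.

```c
/* Percentage by which the faster score exceeds the slower one. */
double percent_faster(double faster_score, double slower_score)
{
    return (faster_score / slower_score - 1.0) * 100.0;
}
```

With the rounded chart values, percent_faster(21.0, 11.1) is about 89 and percent_faster(76.6, 21.3) is about 260. The first is close to the quoted 88 percent; the small gap presumably comes from rounding in the published scores.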