Superscalar architecture, Floating point unit, Dynamic branch prediction – HP Vectra 500 Series User Manual
2 System Board - (SiS Chipset) (Part Number: D4051-63001)
Devices on the Processor Local Bus (D4051-63001)
Superscalar Architecture
The Pentium processor’s superscalar architecture has two instruction
pipelines and a floating-point unit, each capable of independent operation.
The two pipelines allow the Pentium to execute two integer instructions in
parallel, in a single clock cycle. When both pipelines are in use, instruction
throughput doubles, so the processor can deliver almost twice the integer
performance of an Intel486 microprocessor of the same frequency.
Frequently, the microprocessor can issue two instructions at once (one
instruction to each pipeline). This is called instruction pairing. To be
paired, both instructions must be simple, and the second pipeline always
receives the instruction that immediately follows, in program order, the
one issued to the first pipeline.
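The pairing rule above can be illustrated with a toy issue model. This is a minimal sketch, not a description of the real hardware: the `Instr` type, the `simple` flag, and the register-dependency check are assumptions introduced here for illustration, and the sample instructions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    simple: bool       # hypothetical flag: True if the instruction may be paired
    dest: str = ""     # register written (empty if none)
    srcs: tuple = ()   # registers read


def can_pair(first, second):
    """Toy rule: both instructions are simple and the second does not
    read or overwrite the register written by the first (an assumption)."""
    if not (first.simple and second.simple):
        return False
    if first.dest and (first.dest in second.srcs or first.dest == second.dest):
        return False
    return True


def count_cycles(program):
    """Issue in program order: two adjacent instructions per cycle when they
    can pair, otherwise one instruction per cycle."""
    cycles = 0
    i = 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            i += 2
        else:
            i += 1
        cycles += 1
    return cycles


program = [
    Instr("mov eax, 1", True, "eax"),
    Instr("mov ebx, 2", True, "ebx"),
    Instr("add eax, ebx", True, "eax", ("eax", "ebx")),
    Instr("inc ecx", True, "ecx", ("ecx",)),
    Instr("imul edx, esi", False, "edx", ("edx", "esi")),
]
cycles = count_cycles(program)
```

In this model the five instructions issue in three cycles: two pairs, then the non-simple instruction on its own. Only adjacent instructions are ever considered for pairing, which mirrors the rule that the second pipeline receives the next sequential instruction.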
Floating Point Unit
The Floating Point Unit (FPU) incorporates optimized algorithms and
dedicated hardware for multiply, divide, and add functions. This increases
the processing speed of common operations by a factor of three.
Dynamic Branch Prediction
The Pentium processor uses dynamic branch prediction. To dynamically
predict instruction branches, the processor uses two prefetch buffers. One
buffer prefetches code sequentially; the other prefetches code from the
target address predicted by the Branch Target Buffer (BTB). The BTB is a
small cache that records recently executed branch instructions and their
target addresses. The processor uses this history to predict which way a
branch will go the next time it is executed. When the prediction is correct,
the branch is executed without delay, thereby enhancing performance.
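The idea of predicting a branch from its recorded history can be sketched as follows. This is a simplified model under stated assumptions: the class name, the unbounded dictionary, and the two-bit saturating counter are illustrative choices made here, and the addresses are hypothetical. It is not a description of the Pentium's actual BTB organization.

```python
class BranchTargetBuffer:
    """Toy BTB: maps a branch address to [target address, 2-bit counter]."""

    def __init__(self):
        self.entries = {}

    def predict(self, branch_addr):
        """Return the predicted target, or None to predict 'not taken'."""
        entry = self.entries.get(branch_addr)
        if entry is not None and entry[1] >= 2:
            return entry[0]
        return None

    def update(self, branch_addr, taken, target):
        """Record the actual outcome of the branch."""
        entry = self.entries.get(branch_addr)
        if entry is None:
            if taken:
                # Allocate on the first taken branch, starting at "weakly taken".
                self.entries[branch_addr] = [target, 2]
            return
        if taken:
            entry[0] = target
            entry[1] = min(3, entry[1] + 1)
        else:
            entry[1] = max(0, entry[1] - 1)


def count_correct(btb, branch_addr, target, outcomes):
    """Run a sequence of branch outcomes and count correct predictions."""
    correct = 0
    for taken in outcomes:
        predicted = btb.predict(branch_addr)
        predicted_taken = predicted is not None
        if predicted_taken == taken and (not taken or predicted == target):
            correct += 1
        btb.update(branch_addr, taken, target)
    return correct


# A loop-closing branch at hypothetical address 0x1000 that jumps back to
# 0x0F00 nine times and then falls through.
outcomes = [True] * 9 + [False]
correct = count_correct(BranchTargetBuffer(), 0x1000, 0x0F00, outcomes)
```

In this model the branch is mispredicted on its first execution (no history yet) and on the final fall-through, and predicted correctly on the eight iterations in between.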
Instruction and Data Cache
The Pentium processor has separate on-chip instruction (code) and data
caches. Each cache is 8 KB in size with a 32-byte line. Each cache acts as
temporary storage for data or instructions from the main memory. As the
system is likely to use the same data several times, it is faster to get it from
the on-chip cache than from the main memory.
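As a worked example of why line-based caching helps, the sketch below assumes an 8 KB cache with a 32-byte line (the line size documented for the Pentium) and counts how many accesses find their line already loaded. The function names and the starting address are hypothetical, and the model ignores capacity and replacement.

```python
CACHE_SIZE_BYTES = 8 * 1024   # 8 KB per cache, as stated in the text
LINE_SIZE_BYTES = 32          # assumed line size of 32 bytes

# Number of lines one cache can hold: 8192 / 32 = 256
NUM_LINES = CACHE_SIZE_BYTES // LINE_SIZE_BYTES


def split_address(addr):
    """Return (base address of the line, byte offset within the line)."""
    offset = addr % LINE_SIZE_BYTES
    return addr - offset, offset


def count_hits(addresses):
    """Count accesses whose line was already loaded by an earlier access."""
    loaded = set()
    hits = 0
    for addr in addresses:
        base, _ = split_address(addr)
        if base in loaded:
            hits += 1
        else:
            loaded.add(base)   # first touch: the line is fetched from memory
    return hits


# Sixteen consecutive 4-byte reads starting at hypothetical address 0x2000
addresses = [0x2000 + 4 * i for i in range(16)]
hits = count_hits(addresses)
```

The sixteen reads span 64 bytes, which is two lines, so only two accesses go to main memory and the remaining fourteen are served from the cache.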
Each cache has a dedicated Translation Lookaside Buffer (TLB). The TLB is
a cache of the address translations for the most recently accessed memory
pages. The data cache is configured to be Write-Back on a line-by-line basis
(a line is a fixed-size block of memory).
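The benefit of a write-back line can be shown with a small count of main-memory writes. This is a minimal sketch: the function name is hypothetical, and write-through (every store goes straight to memory) is included here only as the assumed alternative policy for comparison.

```python
def writes_to_memory(num_stores, write_back):
    """Count main-memory writes for repeated stores to one cached line,
    followed by a single eviction of that line."""
    memory_writes = 0
    dirty = False
    for _ in range(num_stores):
        if write_back:
            dirty = True          # store stays in the cache; line marked modified
        else:
            memory_writes += 1    # write-through: each store also updates memory
    if dirty:
        memory_writes += 1        # modified line is written back once on eviction
    return memory_writes


write_back_count = writes_to_memory(10, write_back=True)
write_through_count = writes_to_memory(10, write_back=False)
```

In this model ten stores to the same line cost one memory write under write-back and ten under write-through.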