
3 profileme counting mode description, 1 cycle counting, 2 inum retire delay cycles – Compaq 21264 User Manual


Alpha 21264/EV67 Hardware Reference Manual

Privileged Architecture Library Code


Performance Counter Support

6.10.3.3 ProfileMe Counting Mode Description

6.10.3.3.1 Cycle counting

In ProfileMe mode, either counter counts cycles during the window of the profiled
instruction.

6.10.3.3.2 Inum retire delay cycles

This input is used to measure a lower bound on the inum retire delay of the profiled
instruction. The maximum final value of PCTR1 is the length of the ProfileMe window
minus 2.

This input counts the cycles for which a profiled instruction delayed the advance of
the retire pointer during the ProfileMe window. The 21264/EV67 tracks instructions in
the pipeline by allocating them "inums" near the front of the pipeline. All inums are
retired at the end of the pipeline, in the order in which they were allocated.

Inums are allocated in batches of four, so there may be more inums allocated than there
are program instructions in flight. Every inum is retired in order, including those for
aborted instructions.

The "retire pointer" points to the next inum to be retired. An inum retires in the cycle
that the retire pointer advances past the inum.

Let X and Y be consecutive inums in the allocation order. The "inum retire delay" of Y
is [(cycle in which Y retired) – (cycle in which X retired)]. A large inum retire delay
indicates a possible performance bottleneck (for example, an instruction stalled on a
data cache miss).
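
The definition above can be illustrated with a minimal Python sketch. The function name and the trace values are hypothetical and do not come from the manual. The sketch also assumes that several inums may retire in the same cycle, so the retire cycles are non-decreasing rather than strictly increasing.

```python
def inum_retire_delays(retire_cycles):
    """Return the inum retire delay of every inum after the first.

    retire_cycles[i] is the cycle in which the i-th allocated inum retired
    (hypothetical input format). Inums retire in allocation order, so the
    list must be non-decreasing.
    """
    pairs = list(zip(retire_cycles, retire_cycles[1:]))
    for x_cycle, y_cycle in pairs:
        if y_cycle < x_cycle:
            raise ValueError("inums must retire in allocation order")
    # Delay of Y = (cycle in which Y retired) - (cycle in which X retired)
    return [y_cycle - x_cycle for x_cycle, y_cycle in pairs]


# Made-up trace: the fourth inum holds up the retire pointer.
trace = [100, 100, 101, 140, 140, 141]
print(inum_retire_delays(trace))  # [0, 1, 39, 0, 1]
```

In this made-up trace, the delay of 39 cycles on the fourth inum is the kind of value that would point to a possible bottleneck.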

6.10.3.3.3 Retired instructions cycles

When counting retired instructions in ProfileMe mode, the final count in PCTR0 may
include instructions that retired before the ProfileMe window and may exclude
instructions that retired near the end of the ProfileMe window. These discrepancies
are caused by a variable delay between the time that an instruction retires and the
time that PCTR0 is incremented for that retire. The discrepancy is in the range of
plus or minus 4 retired instructions.
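
One way to read a final PCTR0 value in light of this discrepancy is to treat it as the midpoint of a range. The Python sketch below assumes only the plus or minus 4 figure stated above; the function name and sample values are illustrative and are not part of the manual.

```python
MAX_SKEW = 4  # from the text: plus or minus 4 retired instructions


def retired_count_bounds(pctr0_final):
    """Return (low, high) bounds on the instructions retired in the window.

    pctr0_final is the value read from PCTR0 after the ProfileMe window
    (hypothetical helper; the lower bound is clamped at zero).
    """
    low = max(0, pctr0_final - MAX_SKEW)
    high = pctr0_final + MAX_SKEW
    return low, high


print(retired_count_bounds(17))  # (13, 21)
print(retired_count_bounds(2))   # (0, 6)
```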

6.10.3.3.4 Bcache miss or long latency probes cycles

This input counts the number of times the Bcache result was a miss.

Essentially, a long latency probe is a data request from other processes that cause
Bcache misses in a system.

This count is phase-shifted three cycles early. It therefore covers events that occur
from three cycles before the start of the ProfileMe window until three cycles before
its end.

6.10.3.3.5 Mbox replay traps cycles

This input counts Mbox replay traps.

PCTR1 is enabled to count Mbox replay traps that occur during a window that is the
ProfileMe window phase-shifted one cycle later. The first replay trap that can be
counted occurs in the 7th cycle after the profiled instruction is fetched.
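
The phase shifts described in 6.10.3.3.4 and 6.10.3.3.5 can be pictured as moving the counting window relative to the ProfileMe window. The following Python sketch is illustrative only. It assumes the window can be modeled as an inclusive range of cycle numbers, and the function names, constants, and event cycles are all hypothetical.

```python
BCACHE_SHIFT = -3  # Bcache miss count: window shifted three cycles early
MBOX_SHIFT = +1    # Mbox replay trap count: window shifted one cycle later


def counting_window(start, end, shift):
    """Return the (first, last) cycles of the shifted counting window."""
    return start + shift, end + shift


def count_events(event_cycles, start, end, shift):
    """Count events whose cycle falls inside the shifted window (inclusive)."""
    first, last = counting_window(start, end, shift)
    return sum(1 for cycle in event_cycles if first <= cycle <= last)


# Made-up ProfileMe window covering cycles 10..20 and made-up event cycles.
events = [7, 8, 18, 20, 21]
print(counting_window(10, 20, BCACHE_SHIFT))       # (7, 17)
print(count_events(events, 10, 20, BCACHE_SHIFT))  # 2
print(counting_window(10, 20, MBOX_SHIFT))         # (11, 21)
print(count_events(events, 10, 20, MBOX_SHIFT))    # 3
```

With these made-up values, the early-shifted window picks up the events at cycles 7 and 8 that precede the ProfileMe window and misses those at cycles 18 and 20 near its end, while the late-shifted window picks up the event at cycle 21 that follows it.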