Dell PowerEdge 7250 User Manual: Error Reporting, Table 1. SAL 3.0 MCA Records

SR870BN4 Machine Check Error Handling

SR870BN4 Error Reference Guide, Revision 1.0

There are two types of machine check events: local and global. A local machine check (MCA)
occurs when an individual processor enters machine check. Examples of local machine checks
include a Data Translation Lookaside Buffer (DTLB) data parity error and the processor
consuming data with an uncorrectable error.

A machine check is global when all processors enter machine check. On the
SR870BN4 platform, the BINIT# and BERR# signals are used to bring all processors into
machine check. When a processor takes a local machine check, it may
escalate the error to a global machine check to transition other processors to a known
state and/or for error containment. For example, the processor may assert BINIT# in
response to a transaction time-out event.

The SR870BN4 platform does not assert BINIT#, only BERR#. BERR# is asserted for
platform fatal errors and when an uncorrectable error is detected on outbound data.
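
The local/global distinction and the platform's restriction to BERR# can be summarized in a small model. This is an illustrative Python sketch only: the names `machine_check_scope`, `platform_may_assert`, and the two signal sets are hypothetical and are not part of the SR870BN4 firmware. Treating any case short of "all processors" as local is a simplification.

```python
# Illustrative model only; all names here are hypothetical, not firmware APIs.

# Signals the text names as the means of bringing all processors into machine check.
GLOBAL_MCA_SIGNALS = frozenset({"BINIT#", "BERR#"})

# Per the text, the SR870BN4 platform itself asserts only BERR#, never BINIT#.
PLATFORM_ASSERTED_SIGNALS = frozenset({"BERR#"})


def machine_check_scope(processors_in_mca, total_processors):
    """Return 'global' if every processor entered machine check, else 'local'."""
    if not 0 < processors_in_mca <= total_processors:
        raise ValueError("processors_in_mca must be between 1 and total_processors")
    return "global" if processors_in_mca == total_processors else "local"


def platform_may_assert(signal):
    """Return True if the platform (as opposed to a processor) may assert the signal."""
    if signal not in GLOBAL_MCA_SIGNALS:
        raise ValueError(f"unknown signal: {signal}")
    return signal in PLATFORM_ASSERTED_SIGNALS
```

For example, `machine_check_scope(1, 4)` classifies a single processor in machine check on a four-processor system as local, and `platform_may_assert("BINIT#")` is false because only processors assert BINIT# on this platform.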

For more information on the SR870BN4 implementation of machine check error
handling, refer to the SR870BN4 SAL Error Handling Specification.

4.2 Error Reporting

SR870BN4 machine check error handling allows enhanced reporting of processor and
platform errors. These errors are prioritized and signaled to system hardware and software.
System software (PAL/SAL) provides well-defined APIs through which application software can
acquire information about system errors in the form of standard data structures. The errors
are logged to non-volatile storage and/or made available to application software at runtime.
They are reported in MCA records, whose format is based on the Itanium™ System Abstraction
Layer Specification, Rev 3.0.

On the SR870BN4, system events related to Field Replaceable Units (FRUs) are logged in the
BMC SEL based on the MCA records. Each MCA record generates one or more corresponding BMC SEL
events. In addition, one auxiliary log entry event is logged for each MCA record. The SEL
messages are IPMI 1.5-compliant platform event messages.
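
The logging rule described here (one or more SEL events per MCA record, plus one auxiliary log entry per record) can be sketched as follows. This is a hypothetical Python model: the `McaRecord` type, its `fru_events` field, and the dictionary layout of the returned entries are assumptions made for illustration and do not reflect the BMC's actual data format.

```python
from dataclasses import dataclass, field


@dataclass
class McaRecord:
    """Hypothetical stand-in for a SAL MCA record."""
    record_id: int
    # Assumed: descriptions of the FRU-related events found in the record.
    fru_events: list = field(default_factory=list)


def log_mca_record(record):
    """Return (sel_events, aux_entry) for a single MCA record."""
    if not record.fru_events:
        # The text states that each record yields at least one SEL event.
        raise ValueError("an MCA record yields at least one SEL event")
    # One SEL event per FRU-related event in the record (an assumption).
    sel_events = [
        {"record_id": record.record_id, "event": event}
        for event in record.fru_events
    ]
    # Exactly one auxiliary log entry per MCA record.
    aux_entry = {"record_id": record.record_id, "kind": "auxiliary log entry"}
    return sel_events, aux_entry
```

A record carrying two FRU-related events would, under these assumptions, produce two SEL events and a single auxiliary log entry.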

The following rules are applied to the translation of SAL 3.0 MCA records to IPMI 1.5-compliant
platform event messages:

Table 1. SAL 3.0 MCA Records

MCA SAL Record Section Type   SEL Event: Sensor Type       SEL Event: Event Data Bytes
---------------------------   --------------------------   ---------------------------
Processor                     Processor IERR               SMBIOS Type 4 0-based index
                                                           Error Severity

PCI Bus PERR/SERR             Critical Interrupt           PCI Bus number
                                PERR
                                SERR

PCI Bus Other Errors          Critical Interrupt           None
                                Bus Correctable error
                                Bus Uncorrectable error
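
The rows of Table 1 shown here can be expressed as a simple lookup. The following Python sketch is illustrative only: `TABLE_1` and `translate` are hypothetical names, the string values are taken from the table rather than from IPMI byte encodings, and reading "Processor IERR" as sensor type Processor with event IERR is an interpretation. Only the rows visible in this excerpt are included.

```python
# Hypothetical lookup built from the Table 1 rows in this excerpt.
# Values are descriptive strings, not IPMI byte encodings.
TABLE_1 = {
    "Processor": {
        "sensor_type": "Processor",
        "events": ("IERR",),
        "event_data": ("SMBIOS Type 4 0-based index", "Error Severity"),
    },
    "PCI Bus PERR/SERR": {
        "sensor_type": "Critical Interrupt",
        "events": ("PERR", "SERR"),
        "event_data": ("PCI Bus number",),
    },
    "PCI Bus Other Errors": {
        "sensor_type": "Critical Interrupt",
        "events": ("Bus Correctable error", "Bus Uncorrectable error"),
        "event_data": (),  # the table lists "None" for this row
    },
}


def translate(section_type, event):
    """Map a SAL record section type and event name to a SEL event description."""
    rule = TABLE_1.get(section_type)
    if rule is None:
        raise KeyError(f"no Table 1 rule for section type: {section_type}")
    if event not in rule["events"]:
        raise ValueError(f"{event!r} is not listed for {section_type!r}")
    return {
        "sensor_type": rule["sensor_type"],
        "event": event,
        "event_data_fields": rule["event_data"],
    }
```

For instance, `translate("PCI Bus PERR/SERR", "SERR")` yields a Critical Interrupt event whose data field is the PCI bus number, while an unlisted section type raises `KeyError` rather than guessing a mapping.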