
Kontron S4600 SEL Troubleshooting User Manual



System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel® Xeon® Processor E5 4600/2600/2400/1600/1400 Product Families

Processor Subsystem

Revision 1.1

Intel order number G90620-002


Byte   Field          Description
                      [5:4] – 00b = Unspecified Event Data 3
                      [3:0] – Event Trigger Offset
                          0h = Illegal inbound request
                          1h = IIO Write Cache Uncorrectable Data ECC Error
                          2h = IIO CSR crossing 32-bit boundary Error
                          3h = IIO Received XPF physical/logical redirect interrupt inbound
                          4h = IIO Illegal SAD or Illegal or non-existent address or memory
                          5h = IIO Write Cache Coherency Violation
15     Event Data 2   0-3 = CPU1-4
16     Event Data 3   Not used
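
The bit fields in the table above can be decoded mechanically. The following is a minimal Python sketch, not part of the original guide. The function name decode_iio_event and its parameters are hypothetical; offset_byte stands for the byte that carries the [5:4] and [3:0] fields, whose byte number is not shown in this part of the table.

```python
# Illustrative sketch only: names and input layout are assumptions,
# not an API of the BMC or of any SEL tool.
from typing import Optional

# Event trigger offsets, copied from the table above.
IIO_TRIGGER_OFFSETS = {
    0x0: "Illegal inbound request",
    0x1: "IIO Write Cache Uncorrectable Data ECC Error",
    0x2: "IIO CSR crossing 32-bit boundary Error",
    0x3: "IIO Received XPF physical/logical redirect interrupt inbound",
    0x4: "IIO Illegal SAD or Illegal or non-existent address or memory",
    0x5: "IIO Write Cache Coherency Violation",
}


def decode_iio_event(offset_byte: int, event_data2: int) -> dict:
    """Decode the offset byte and Event Data 2 (byte 15) of one event."""
    offset = offset_byte & 0x0F             # bits [3:0]: event trigger offset
    ed3_field = (offset_byte >> 4) & 0x03   # bits [5:4]: 00b = unspecified Event Data 3
    # Event Data 2 values 0-3 map to CPU1-4; anything else is left undecoded.
    cpu: Optional[int] = event_data2 + 1 if 0 <= event_data2 <= 3 else None
    return {
        "offset": offset,
        "description": IIO_TRIGGER_OFFSETS.get(offset, "Unknown offset"),
        "event_data3_unspecified": ed3_field == 0,
        "cpu": cpu,
    }


if __name__ == "__main__":
    print(decode_iio_event(0x05, 0x01))
```

Offsets outside 0h-5h are reported as unknown rather than guessed, because the table defines no other values.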

6.4.3.1 QPI Fatal Error and Fatal Error #2 – Next Steps

This is an Informational event only. Correctable errors are acceptable and normal at a low rate of occurrence. If the error continues:

1. Check that the processor is installed correctly.
2. Inspect the socket for bent pins.
3. Cross-test the processor. If the issue stays with the processor socket, replace the main board; otherwise, replace the processor.

6.5 Processor ERR2 Timeout Sensor

The BMC supports an ERR2 Timeout Sensor (one per CPU) that asserts if a CPU’s ERR2 signal has been asserted for longer than a fixed time period (> 90 seconds). ERR[2] is a processor signal that indicates when the IIO (the integrated I/O module in the processor) has a fatal error that could not be communicated to the core to trigger an SMI. ERR[2] events are fatal error conditions: the BIOS and OS will attempt to handle the error gracefully, but may not always do so reliably. A continuously asserted ERR2 signal indicates that the BIOS cannot service the condition that caused the error, usually because that condition prevents the BIOS from running.
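
The timeout rule described above can be pictured as a small per-CPU state tracker. The sketch below is an illustration in Python and not the BMC firmware's actual implementation. The class Err2TimeoutSensor, its fields, and the sampling interface are hypothetical; only the 90-second threshold comes from the text.

```python
# Illustrative sketch only: a model of the timeout rule, not BMC firmware code.
from dataclasses import dataclass
from typing import Optional

ERR2_TIMEOUT_SECONDS = 90.0  # from the text: asserted for longer than 90 seconds


@dataclass
class Err2TimeoutSensor:
    """Tracks how long one CPU's ERR2 signal has been continuously asserted."""
    cpu: int
    asserted_since: Optional[float] = None  # time at which ERR2 went active
    timed_out: bool = False                 # True while the sensor is asserted

    def sample(self, err2_asserted: bool, now: float) -> bool:
        """Feed one signal sample; return True only when the sensor newly asserts."""
        if not err2_asserted:
            # Signal released: clear the timer and deassert the sensor.
            self.asserted_since = None
            self.timed_out = False
            return False
        if self.asserted_since is None:
            self.asserted_since = now
        if not self.timed_out and now - self.asserted_since > ERR2_TIMEOUT_SECONDS:
            self.timed_out = True
            return True
        return False


if __name__ == "__main__":
    sensor = Err2TimeoutSensor(cpu=1)
    for t in (0.0, 45.0, 90.0, 90.5):
        print(t, sensor.sample(True, t))
```

The comparison is strict, so a signal asserted for exactly 90 seconds does not yet trip the sensor in this model.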

When an ERR2 timeout occurs, the BMC asserts/deasserts the ERR2 Timeout Sensor, and logs a SEL event for that sensor. The
default behavior for BMC core firmware is to initiate a system reset upon detection of an ERR2 timeout. The BIOS setup utility
provides an option to disable or enable system reset by the BMC on detection of this condition.
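
The response to a timeout (log a SEL event, then reset unless the BIOS setup option disables it) can be sketched as follows. This is a hedged Python illustration only. The function handle_err2_timeout, the list used as a stand-in for the SEL, and the reset_system callback are all hypothetical names, and the message text is not the real SEL record format.

```python
# Illustrative sketch only: hypothetical names, not the BMC firmware interface.
from typing import Callable, List


def handle_err2_timeout(
    cpu: int,
    sel_log: List[str],
    reset_system: Callable[[], None],
    reset_enabled: bool = True,  # default mirrors the text: reset on timeout
) -> bool:
    """Log the timeout and reset if allowed; return True if a reset was requested."""
    # A SEL event is logged for the sensor regardless of the reset setting.
    sel_log.append(f"ERR2 Timeout Sensor asserted: CPU{cpu}")
    if reset_enabled:
        reset_system()
        return True
    return False


if __name__ == "__main__":
    log: List[str] = []
    did_reset = handle_err2_timeout(2, log, lambda: print("system reset requested"))
    print(did_reset, log)
```

Passing reset_enabled=False models the BIOS setup option that disables the reset; the event is still logged in that case.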
