Kontron S4600 SEL Troubleshooting User Manual

System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel® Xeon® Processor E5 4600/2600/2400/1600/1400 Product Families
Processor Subsystem
Revision 1.1
Intel order number G90620-002
Byte     Field          Description
(cont.)                 [5:4] – 00b = Unspecified Event Data 3
                        [3:0] – Event Trigger Offset
                          0h = Illegal inbound request
                          1h = IIO Write Cache Uncorrectable Data ECC Error
                          2h = IIO CSR crossing 32-bit boundary Error
                          3h = IIO Received XPF physical/logical redirect interrupt inbound
                          4h = IIO Illegal SAD or Illegal or non-existent address or memory
                          5h = IIO Write Cache Coherency Violation
15       Event Data 2   0-3 = CPU1-4
16       Event Data 3   Not used
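The bit fields in the table above can be decoded with a few mask operations. The following is a minimal sketch, not vendor code. It assumes that the [5:4] and [3:0] fields at the top of the table belong to the Event Data 1 byte, as in the standard IPMI event record layout, and that Event Data 2 carries the CPU index as a whole-byte value. The function name `decode_event_data` and the dictionary keys are hypothetical.

```python
from typing import Any, Dict, Optional

# Event Trigger Offset values exactly as listed in the table above.
TRIGGER_OFFSETS = {
    0x0: "Illegal inbound request",
    0x1: "IIO Write Cache Uncorrectable Data ECC Error",
    0x2: "IIO CSR crossing 32-bit boundary Error",
    0x3: "IIO Received XPF physical/logical redirect interrupt inbound",
    0x4: "IIO Illegal SAD or Illegal or non-existent address or memory",
    0x5: "IIO Write Cache Coherency Violation",
}


def decode_event_data(event_data1: int, event_data2: int) -> Dict[str, Any]:
    """Decode the fields described in the table (hypothetical helper).

    Assumption: event_data1 is the byte holding bits [5:4] and [3:0].
    """
    offset = event_data1 & 0x0F           # bits [3:0]: Event Trigger Offset
    ed3_type = (event_data1 >> 4) & 0x03  # bits [5:4]: 00b = unspecified Event Data 3
    # Event Data 2: 0-3 map to CPU1-4; anything else is reported as unknown.
    cpu: Optional[int] = event_data2 + 1 if 0 <= event_data2 <= 3 else None
    return {
        "trigger_offset": offset,
        "description": TRIGGER_OFFSETS.get(offset, "Unknown offset"),
        "event_data3_type": ed3_type,
        "cpu": cpu,
    }


if __name__ == "__main__":
    # Example: offset 1h reported against the third processor (Event Data 2 = 2).
    print(decode_event_data(0x01, 0x02))
```

Offsets that the table does not list are reported as "Unknown offset" rather than guessed.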
6.4.3.1 QPI Fatal Error and Fatal Error #2 – Next Steps
This is an Informational event only. Correctable errors are acceptable and normal at a low rate of occurrence. If the error continues:
1. Check that the processor is installed correctly.
2. Inspect the socket for bent pins.
3. Cross-test the processor. If the issue stays with the processor socket, replace the main board; otherwise, replace the processor.
6.5 Processor ERR2 Timeout Sensor
The BMC supports an ERR2 Timeout Sensor (one per CPU) that asserts if a CPU's ERR[2] signal has been asserted for longer than a fixed time period (> 90 seconds). ERR[2] is a processor signal that indicates when the IIO (the Integrated I/O module in the processor) has a fatal error that could not be communicated to the core to trigger an SMI. ERR[2] events are fatal error conditions that the BIOS and OS attempt to handle gracefully, although they may not always do so reliably. A continuously asserted ERR[2] signal indicates that the BIOS cannot service the condition that caused the error, usually because that condition prevents the BIOS from running.
When an ERR2 timeout occurs, the BMC asserts/deasserts the ERR2 Timeout Sensor, and logs a SEL event for that sensor. The
default behavior for BMC core firmware is to initiate a system reset upon detection of an ERR2 timeout. The BIOS setup utility
provides an option to disable or enable system reset by the BMC on detection of this condition.
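The timeout behavior described in this section can be modeled as a small state machine. The sketch below is illustrative only and is not the BMC firmware implementation. The class name `Err2TimeoutMonitor`, the action strings, and the polling interface are hypothetical, and logging a separate SEL event when the sensor deasserts is an assumption rather than something this guide states.

```python
from dataclasses import dataclass
from typing import List, Optional

ERR2_TIMEOUT_SECONDS = 90.0  # fixed period from the text (> 90 seconds)


@dataclass
class Err2TimeoutMonitor:
    """Hypothetical per-CPU model of the ERR2 Timeout Sensor."""

    reset_on_timeout: bool = True           # mirrors the BIOS setup option
    asserted_since: Optional[float] = None  # time the signal was first seen asserted
    sensor_asserted: bool = False

    def sample(self, err2_asserted: bool, now: float) -> List[str]:
        """Process one poll of the ERR[2] signal and return the actions to take."""
        actions: List[str] = []
        if not err2_asserted:
            # Signal cleared: restart the timer and deassert the sensor if needed.
            self.asserted_since = None
            if self.sensor_asserted:
                self.sensor_asserted = False
                actions.append("log_sel_deassert")  # assumption, see lead-in
            return actions
        if self.asserted_since is None:
            self.asserted_since = now
        held_for = now - self.asserted_since
        if not self.sensor_asserted and held_for > ERR2_TIMEOUT_SECONDS:
            self.sensor_asserted = True
            actions.append("log_sel_assert")
            if self.reset_on_timeout:
                actions.append("system_reset")  # default behavior per the text
        return actions


if __name__ == "__main__":
    monitor = Err2TimeoutMonitor()
    for t in (0.0, 45.0, 90.0, 91.0, 120.0):
        print(t, monitor.sample(True, t))
```

Setting `reset_on_timeout=False` models the case where the BIOS setup option disables the reset: the timeout is still logged, but no reset action is returned.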