Next Steps – Kontron S5500 SEL Troubleshooting User Manual

System Event Log Troubleshooting Guide for Intel® S5500/S3420 series Server Boards
Memory subsystem
Revision 1.0
Intel order number G74211-001
Byte   Field          Description
16     Event Data 3   [7:5] – Indicates the Processor Socket to which the DDR3 DIMM having the ECC error is attached:
                          000b = Processor Socket 1
                          001b = Processor Socket 2
                          All other values are reserved.
                      [4:3] – Indicates the processor Memory Channel to which the failing DDR3 DIMM is attached:
                          00b = Channel A
                          01b = Channel B
                          10b = Channel C
                          11b is reserved.
                      [2:0] – Indicates the DIMM Socket on the channel to which the failing DDR3 DIMM is attached:
                          000b = DIMM Socket 1
                          001b = DIMM Socket 2
                          All other values are reserved.
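
The bit fields above can be unpacked with shifts and masks. The following is a minimal Python sketch, not part of the original guide: the function name `decode_event_data_3` and the lookup tables are hypothetical, and it assumes Event Data 3 (SEL byte 16) has already been extracted as an integer.

```python
# Hypothetical lookup tables built from the Event Data 3 bit-field definitions.
SOCKETS = {0b000: "Processor Socket 1", 0b001: "Processor Socket 2"}
CHANNELS = {0b00: "Channel A", 0b01: "Channel B", 0b10: "Channel C"}
DIMMS = {0b000: "DIMM Socket 1", 0b001: "DIMM Socket 2"}


def decode_event_data_3(value: int) -> dict:
    """Split Event Data 3 (SEL byte 16) into socket, channel, and DIMM fields."""
    if not 0 <= value <= 0xFF:
        raise ValueError("Event Data 3 must be a single byte (0x00-0xFF)")
    socket_bits = (value >> 5) & 0b111   # bits [7:5]
    channel_bits = (value >> 3) & 0b11   # bits [4:3]
    dimm_bits = value & 0b111            # bits [2:0]
    # Any encoding not listed in the table is reported as "reserved".
    return {
        "socket": SOCKETS.get(socket_bits, "reserved"),
        "channel": CHANNELS.get(channel_bits, "reserved"),
        "dimm": DIMMS.get(dimm_bits, "reserved"),
    }
```

For example, `decode_event_data_3(0x29)` (binary 001 01 001) yields Processor Socket 2, Channel B, DIMM Socket 2.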
Table 61: Correctable and Uncorrectable ECC Error Sensor Event Trigger Offset – Next Steps
Event Trigger Offset (Hex): 01h
Event Trigger Offset (Description): Uncorrectable ECC Error.

Description:
An uncorrectable (multi-bit) ECC error has occurred. This is a fatal issue that will typically lead to an OS crash (unless memory has been configured in a RAS mode). The system will generate a CATERR# (catastrophic error) and an MCE (Machine Check Exception).
While the error may be due to a failing DRAM chip on the DIMM, it could also be caused by incorrect seating or improper contact between the socket and the DIMM, or by bent pins in the processor socket.

Next Steps:
1. If needed, decode the DIMM location from the hex version of the SEL.
2. Verify the DIMM is seated properly.
3. Examine the gold fingers on the edge of the DIMM to verify the contacts are clean.
4. Inspect the processor socket this DIMM is connected to for bent pins, and if found, replace the board.
5. Consider replacing the DIMM as a preventative measure. For multiple occurrences, replace the DIMM.
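
Step 5 distinguishes a single occurrence from multiple occurrences on the same DIMM. As a rough illustration only, the Python sketch below tallies Event Data 3 bytes taken from SEL entries that have already been filtered to offset 01h events. The function name `dimms_to_replace` and the default threshold of 2 are assumptions, not values from this guide. Because all eight bits of Event Data 3 encode the location, the byte itself is used as the per-DIMM key.

```python
from collections import Counter


def dimms_to_replace(event_data_3_bytes, threshold=2):
    """Return Event Data 3 location values seen at least `threshold` times.

    `threshold` is an assumed stand-in for "multiple occurrences".
    """
    counts = Counter(b & 0xFF for b in event_data_3_bytes)
    return sorted(loc for loc, n in counts.items() if n >= threshold)
```

For instance, passing `[0x29, 0x00, 0x29]` flags only `0x29`, the location that logged two uncorrectable errors.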