
Next Steps – Kontron S5500 SEL Troubleshooting User Manual

System Event Log Troubleshooting Guide for Intel® S5500/S3420 series Server Boards

Memory subsystem

Revision 1.0

Intel order number G74211-001


Byte: 16
Field: Event Data 3
Description:

[7:5]  Indicates the Processor Socket to which the DDR3 DIMM having the ECC error is attached:
       000b = Processor Socket 1
       001b = Processor Socket 2
       All other values are reserved.

[4:3]  Indicates the processor Memory Channel to which the failing DDR3 DIMM is attached:
       00b = Channel A
       01b = Channel B
       10b = Channel C
       11b is reserved.

[2:0]  Indicates the DIMM Socket on the channel to which the failing DDR3 DIMM is attached:
       000b = DIMM Socket 1
       001b = DIMM Socket 2
       All other values are reserved.
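
The bit fields of Event Data 3 can be decoded with a few shifts and masks. The following is a minimal Python sketch, not part of the Intel guide. The function names decode_event_data3 and event_data3_from_record are hypothetical, and the second helper assumes the "hex version" of the SEL entry is the full 16-byte record written as hex digits in record order, so that Event Data 3 is the 16th byte.

```python
# Lookup tables taken from the Event Data 3 bit-field description.
PROCESSOR_SOCKETS = {0b000: "Processor Socket 1", 0b001: "Processor Socket 2"}
MEMORY_CHANNELS = {0b00: "Channel A", 0b01: "Channel B", 0b10: "Channel C"}
DIMM_SOCKETS = {0b000: "DIMM Socket 1", 0b001: "DIMM Socket 2"}


def decode_event_data3(value):
    """Split Event Data 3 into processor socket, memory channel and DIMM socket."""
    if not 0 <= value <= 0xFF:
        raise ValueError("Event Data 3 must be a single byte (0-255)")
    cpu = (value >> 5) & 0b111      # bits [7:5]
    channel = (value >> 3) & 0b11   # bits [4:3]
    dimm = value & 0b111            # bits [2:0]
    return {
        "processor_socket": PROCESSOR_SOCKETS.get(cpu, "reserved"),
        "memory_channel": MEMORY_CHANNELS.get(channel, "reserved"),
        "dimm_socket": DIMM_SOCKETS.get(dimm, "reserved"),
    }


def event_data3_from_record(record_hex):
    """Return byte 16 of a SEL record given as hex text (assumed layout)."""
    data = bytes.fromhex(record_hex)
    if len(data) != 16:
        raise ValueError("expected a 16-byte SEL record")
    return data[15]  # byte 16 in the guide's 1-based numbering


if __name__ == "__main__":
    # 29h = 001 01 001b: Processor Socket 2, Channel B, DIMM Socket 2
    print(decode_event_data3(0x29))
```

Unlisted bit patterns are reported as "reserved" rather than guessed, which matches the table's wording.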

Table 61: Correctable and Uncorrectable ECC Error Sensor Event Trigger Offset – Next Steps

Event Trigger Offset (Hex): 01h
Event Trigger Offset (Description): Uncorrectable ECC Error.

Description:

An uncorrectable (multi-bit) ECC error has occurred. This is a fatal issue that will typically lead to an OS crash (unless memory has been configured in a RAS mode). The system will generate a CATERR# (catastrophic error) and an MCE (Machine Check Exception Error).

While the error may be due to a failing DRAM chip on the DIMM, it could also be caused by incorrect seating or improper contact between socket and DIMM, or by bent pins in the processor socket.

Next Steps:

1. If needed, decode the DIMM location from the hex version of the SEL.
2. Verify that the DIMM is seated properly.
3. Examine the gold fingers on the edge of the DIMM to verify that the contacts are clean.
4. Inspect the processor socket this DIMM is connected to for bent pins, and if any are found, replace the board.
5. Consider replacing the DIMM as a preventative measure. For multiple occurrences, replace the DIMM.