Memory address parity error sensor next steps – Kontron S5500 SEL Troubleshooting User Manual

Memory subsystem

System Event Log Troubleshooting Guide for Intel® S5500/S3420 series Server Boards

Intel order number G74211-001

Revision 1.0

Byte  Field         Description
16    Event Data 3  [7:5] – Indicates the Processor Socket to which the DDR3 DIMM having the ECC error is attached:
                      000b = Processor Socket 1
                      001b = Processor Socket 2
                      All other values are reserved.
                    [4:3] – Channel Number (if valid) on which the Parity Error occurred. This value will be indeterminate and should be ignored if ED2 Bit [4] is 0b.
                      00b = Channel A
                      01b = Channel B
                      10b = Channel C
                      11b = reserved
                    [2:0] – DIMM Slot ID (if valid) of the specific DIMM that was involved in the transaction that led to the parity error. This value will be indeterminate and should be ignored if ED2 Bit [3] is 0b.
                      000b = DIMM Socket 1
                      001b = DIMM Socket 2
                      All other values are reserved.
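
The bit layout above can be turned into a small decoder. The following is a minimal sketch, not part of the original guide: the function name `decode_event_data_3`, the `DimmLocation` type, and the choice to return `None` for reserved or invalid fields are all assumptions made for illustration. It takes the Event Data 2 and Event Data 3 bytes as integers and applies the validity rules stated in the table (ED2 bit 4 for the channel, ED2 bit 3 for the DIMM slot).

```python
from typing import NamedTuple, Optional

# Channel encodings from the table; 11b is reserved and deliberately absent.
CHANNELS = {0b00: "A", 0b01: "B", 0b10: "C"}


class DimmLocation(NamedTuple):
    """Hypothetical result type; None means reserved or not valid."""
    processor_socket: Optional[int]
    channel: Optional[str]
    dimm_socket: Optional[int]


def decode_event_data_3(ed2: int, ed3: int) -> DimmLocation:
    """Decode Event Data 3 (byte 16) using the validity bits in Event Data 2."""
    socket_bits = (ed3 >> 5) & 0b111   # ED3 bits [7:5]
    channel_bits = (ed3 >> 3) & 0b11   # ED3 bits [4:3]
    dimm_bits = ed3 & 0b111            # ED3 bits [2:0]

    # Only 000b and 001b are defined for the processor socket.
    processor_socket = socket_bits + 1 if socket_bits in (0, 1) else None

    # The channel is meaningful only if ED2 bit 4 is set.
    channel_valid = bool(ed2 & 0x10)
    channel = CHANNELS.get(channel_bits) if channel_valid else None

    # The DIMM slot is meaningful only if ED2 bit 3 is set;
    # only 000b and 001b are defined.
    dimm_valid = bool(ed2 & 0x08)
    dimm_socket = dimm_bits + 1 if dimm_valid and dimm_bits in (0, 1) else None

    return DimmLocation(processor_socket, channel, dimm_socket)
```

For example, `decode_event_data_3(0x18, 0x29)` returns `DimmLocation(processor_socket=2, channel='B', dimm_socket=2)`, while the same Event Data 3 byte with `ed2=0x00` yields `None` for both the channel and the DIMM socket, because neither field is flagged as valid.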

9.2.2.1 Memory Address Parity Error Sensor Next Steps

Address Parity Errors are bit errors detected in the memory addressing hardware. An Address Parity Error implies that the memory address transmitted to the DIMM addressing circuitry has been compromised, so any data read or written is compromised in turn. An Address Parity Error is logged as such in the SEL, but in all other ways it is treated the same as an Uncorrectable ECC Error.

While the error may be due to a failing DRAM chip on the DIMM, it could also be caused by incorrect seating or improper contact between the socket and the DIMM, or by bent pins in the processor socket.

1. If needed, decode the DIMM location from the hex version of the SEL.
2. Verify that the DIMM is seated properly.
3. Examine the gold fingers on the edge of the DIMM to verify that the contacts are clean.
4. Inspect the processor socket that this DIMM is connected to for bent pins. If any are found, replace the board.
5. Consider replacing the DIMM as a preventative measure. For multiple occurrences, replace the DIMM.