
Correctable DIMM Errors – Sun Microsystems Sun Fire X4240 User Manual


Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008

The lines in the display start with event numbers (in hex), followed by a description
of the event.

TABLE 3-1 describes the contents of the display.

Correctable DIMM Errors

If a DIMM has 24 or more correctable errors in 24 hours, it is considered defective
and should be replaced.
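
The replacement rule above can be sketched as a sliding-window check. This is a minimal illustration, not a Sun-provided tool: the function name `dimm_is_defective` and the idea of feeding it a list of error timestamps for one DIMM are assumptions, and the window is treated as half-open (two errors exactly 24 hours apart do not share a window), which the guide does not specify.

```python
from collections import deque
from datetime import datetime, timedelta

CE_THRESHOLD = 24                 # from the guide: 24 or more correctable errors
CE_WINDOW = timedelta(hours=24)   # from the guide: within 24 hours


def dimm_is_defective(timestamps, threshold=CE_THRESHOLD, window=CE_WINDOW):
    """Return True if any `window`-long span holds `threshold` or more errors.

    `timestamps` is an iterable of datetime objects, one per correctable
    error observed on a single DIMM (hypothetical input; how the timestamps
    are collected depends on the operating system).
    """
    recent = deque()
    for ts in sorted(timestamps):
        recent.append(ts)
        # Drop errors that are a full window (or more) older than this one.
        while ts - recent[0] >= window:
            recent.popleft()
        if len(recent) >= threshold:
            return True
    return False


if __name__ == "__main__":
    start = datetime(2008, 8, 1)
    # 24 errors spaced 30 minutes apart all fall inside one 24-hour window.
    burst = [start + timedelta(minutes=30 * i) for i in range(24)]
    print(dimm_is_defective(burst))
```

Sorting first means the timestamps can be passed in any order, and the deque keeps only the errors that can still count toward the current window.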

At this time, correctable errors (CEs) are not logged in the server's system
event logs. They are reported or handled in the supported operating systems as
follows:

Windows Server:

a. A Machine Check error-message bubble appears on the task bar.

b. The user must manually open Event Viewer to view errors. Access Event
Viewer through this menu path:

Start-->Administrative Tools-->Event Viewer

c. The user can then view individual errors (by time) to see details of the error.

Solaris:

The Solaris Fault Management Architecture (FMA) reports and (sometimes)
retires memory with correctable Error Correction Code (ECC) errors. See your
Solaris Operating System documentation for details. To view the logged error
events in full detail, use the command:

fmdump -eV

TABLE 3-1   Lines in IPMI Output

Event (hex)   Description
-----------   ---------------------------------------------------------------
8             UCE caused a HyperTransport sync flood, which led to the
              system's warm reset. #0x02 refers to a reboot count maintained
              since the last AC power reset.

9             BIOS detected and initialized 4 processors in the system.

a             BIOS detected that a Sync Flood caused this reboot.

b             BIOS detected that a hardware error caused the Sync Flood.

c to 1e       BIOS retrieved and reported some hardware evidence, including
              all processors' Machine Check Error registers (events 14 to 18).

1f            After BIOS detected that a UCE had occurred, it located the
              DIMM and reset. 0x03 refers to the reboot count.

21 to 25      BIOS off-lined faulty DIMMs from system memory space and
              reported them. Each DIMM of a pair is reported, since hardware
              UCE evidence cannot lead BIOS any further than detection of a
              faulty pair.
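
Because the event numbers are hexadecimal and some rows of TABLE 3-1 cover a range, matching a displayed event number against the table is easy to get wrong by eye. The following is a minimal sketch of that lookup, assuming the caller has already extracted the leading event number from a display line. The names `TABLE_3_1` and `describe_event` are hypothetical, and the summaries are shortened paraphrases of the table rows.

```python
# Rows of TABLE 3-1 as (first event, last event, short summary).
# Event numbers are hexadecimal, exactly as they appear in the display.
TABLE_3_1 = [
    (0x08, 0x08, "UCE caused a HyperTransport sync flood and a warm reset"),
    (0x09, 0x09, "BIOS detected the processors in the system"),
    (0x0A, 0x0A, "BIOS detected that a sync flood caused this reboot"),
    (0x0B, 0x0B, "BIOS detected that a hardware error caused the sync flood"),
    (0x0C, 0x1E, "BIOS reported hardware evidence (Machine Check Error registers)"),
    (0x1F, 0x1F, "BIOS located the DIMM after the UCE and reset"),
    (0x21, 0x25, "BIOS off-lined and reported the faulty DIMM pair"),
]


def describe_event(event_hex):
    """Map a hex event number string (e.g. "1e") to its TABLE 3-1 summary.

    Returns None when the table has no row for that event number.
    """
    number = int(event_hex, 16)  # "1e" -> 30; accepts upper or lower case
    for first, last, summary in TABLE_3_1:
        if first <= number <= last:
            return summary
    return None


if __name__ == "__main__":
    for event in ("8", "1e", "23"):
        print(event, "->", describe_event(event))
```

Storing each row as an inclusive range keeps single events and ranges such as "c to 1e" in one structure, and an event number the table does not list simply yields `None`.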
