Using the Solaris Predictive Self-Healing Feature – Sun Microsystems Sun Fire T1000 User Manual

Chapter 2: Sun Fire T1000 Server Diagnostics

Example:

In this example, MB/CMP0/CH2/R0/D0 (DIMM 0 at J0701) is disabled. Until the
faulty component is replaced, the system can boot using memory that was not
disabled.

Note – You can use ASR commands to display and control disabled components.
See “Managing System Components with Automatic System Recovery Commands”
on page 40.

Using the Solaris Predictive Self-Healing Feature

The Solaris OS predictive self-healing technology enables the Sun Fire T1000 server
to diagnose problems while the Solaris OS is running, and to mitigate many serious
problems before they occur.

The Solaris OS uses the fault manager daemon, fmd(1M), which starts at boot time
and runs in the background to monitor the system. If a component generates an
error, the daemon correlates the error with data from previous errors and other
related information to diagnose the problem. Once the problem is diagnosed, the
fault manager daemon assigns it a unique identifier (UUID) that distinguishes the
problem across any set of systems. When possible, the fault manager daemon
initiates steps to self-heal the failed component and take the component offline.
The daemon also logs the fault to the syslogd daemon and provides a fault
notification with a message ID (MSGID). You can use the message ID to get
additional information about the problem from Sun’s knowledge article database.

The predictive self-healing technology covers the following Sun Fire T1000 server
components:

UltraSPARC T1 multicore processor

Memory

I/O bus

ok #.
sc> showfaults -v
   ID  Time             FRU                Fault
    1  APR 24 12:47:27  MB/CMP0/CH2/R0/D0  MB/CMP0/CH2/R0/D0 deemed faulty and disabled
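
As a minimal sketch, the data row of the showfaults -v output above could be split into its four columns with a few lines of Python. The FaultRecord type and the parse_showfaults_line function are hypothetical names introduced here for illustration. The column layout (an integer ID, a three-token timestamp, a FRU path without spaces, then a free-text fault description) is assumed from this single example, not taken from any ALOM specification.

```python
from typing import NamedTuple


class FaultRecord(NamedTuple):
    """One data row of `showfaults -v` output (hypothetical helper type)."""
    fault_id: int
    time: str
    fru: str
    fault: str


def parse_showfaults_line(line: str) -> FaultRecord:
    """Split one data row into ID, Time, FRU, and Fault.

    Assumes the layout seen in the example row: integer ID, then
    month, day, and hh:mm:ss, then a FRU path with no spaces, then
    the fault description as the remainder of the line.
    """
    tokens = line.split()
    if len(tokens) < 6:
        raise ValueError(f"unexpected showfaults row: {line!r}")
    return FaultRecord(
        fault_id=int(tokens[0]),
        time=" ".join(tokens[1:4]),
        fru=tokens[4],
        fault=" ".join(tokens[5:]),
    )


# The data row from the example, joined onto a single line.
row = ("1 APR 24 12:47:27 MB/CMP0/CH2/R0/D0 "
       "MB/CMP0/CH2/R0/D0 deemed faulty and disabled")
record = parse_showfaults_line(row)
print(record.fru)    # MB/CMP0/CH2/R0/D0
print(record.fault)  # MB/CMP0/CH2/R0/D0 deemed faulty and disabled
```

Rows that do not match the assumed layout raise ValueError, so a change in the output format is noticed rather than silently misparsed.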