SPARC Enterprise T5120 and T5220 Servers Service Manual • July 2009

Solaris PSH Feature Overview

The Solaris OS uses the Fault Manager daemon, fmd(1M), which starts at boot time
and runs in the background to monitor the system. If a component generates an
error, the daemon handles the error by correlating the error with data from previous
errors and other related information to diagnose the problem. Once diagnosed, the
Fault Manager daemon assigns the problem a Universally Unique Identifier (UUID)
that distinguishes the problem across any set of systems. When possible, the Fault
Manager daemon initiates steps to self-heal the failed component and take the
component offline. The daemon also logs the fault to the syslogd daemon and
provides a fault notification with a message ID (MSGID). You can use the message ID
to get additional information about the problem from the knowledge article database.
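The UUID that the Fault Manager daemon assigns is reported as the EVENT-ID field of the console message. As an illustration only, the following Python sketch checks whether a string has the canonical 8-4-4-4-12 hexadecimal UUID layout; the function name is_valid_event_id is hypothetical and is not part of Solaris or fmd.

```python
import uuid


def is_valid_event_id(event_id: str) -> bool:
    """Return True if event_id is a UUID in canonical 8-4-4-4-12 hex form.

    Hypothetical helper for illustration; not an fmd or Solaris API.
    """
    try:
        parsed = uuid.UUID(event_id)
    except ValueError:
        # Wrong length or non-hexadecimal characters.
        return False
    # uuid.UUID also accepts braces, a "urn:uuid:" prefix, or 32 bare hex
    # digits; only accept the hyphenated form that str() produces
    # (compared case-insensitively).
    return str(parsed) == event_id.lower()


print(is_valid_event_id("f92e9fbe-735e-c218-cf87-9e1720a28004"))  # True
print(is_valid_event_id("f92e9fbe735ec218cf879e1720a28004"))      # False
```

The check relies on Python's standard uuid module for parsing and adds only the layout comparison, so it does not attempt to validate UUID version or variant bits.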

The Predictive Self-Healing (PSH) technology covers the following server components:

Multicore processor

Memory

I/O subsystem

The PSH console message provides the following information about each detected
fault:

Type

Severity

Description

Automated response

Impact

Suggested action
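The Type and Severity items are reported as comma-separated KEY: value pairs on the first line of the console message (TYPE and SEVERITY in the sample message later in this section). The following Python sketch, offered as an illustration rather than a supported interface, splits such a header line into a dictionary. The name parse_header_line is hypothetical, and the splitting rule is an assumption based only on the sample lines shown in this section.

```python
from typing import Dict


def parse_header_line(line: str) -> Dict[str, str]:
    """Split a 'KEY: value, KEY: value' console header line into a dict.

    Assumes pairs are separated by a comma followed by a space, so a value
    such as 'SUNW,system_name' (comma with no space) is kept intact.
    Free-text fields such as DESC may contain ", " and are not handled here.
    """
    fields: Dict[str, str] = {}
    for pair in line.split(", "):
        # Split on the first ": " only, so times such as 10:09:46 survive.
        key, sep, value = pair.partition(": ")
        if not sep:
            raise ValueError(f"not a 'KEY: value' pair: {pair!r}")
        fields[key.strip()] = value.strip()
    return fields


header = parse_header_line(
    "SUNW-MSG-ID: SUN4V-8000-DX, TYPE: Fault, VER: 1, SEVERITY: Minor"
)
print(header["TYPE"], header["SEVERITY"])  # Fault Minor
```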

If the Solaris PSH facility detects a faulty component, use the fmdump command to
identify the fault. Faulty field-replaceable units (FRUs) are identified in fault
messages by FRU name.
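The fmdump(1M) command can select fault-log entries by UUID with its -u option, and its -v option adds verbose event detail. The following Python sketch only assembles that command line and, where an fmdump binary is on the PATH, runs it. The helper names fmdump_command and run_fmdump are hypothetical, and on a real system reading the fault log may require superuser privileges.

```python
import shutil
import subprocess
from typing import List, Optional


def fmdump_command(event_uuid: str, verbose: bool = True) -> List[str]:
    """Build an fmdump argument list that selects events by UUID."""
    cmd = ["fmdump"]
    if verbose:
        cmd.append("-v")  # verbose event detail
    cmd += ["-u", event_uuid]  # match the UUID reported as EVENT-ID
    return cmd


def run_fmdump(event_uuid: str) -> Optional[str]:
    """Run fmdump and return its output, or None if fmdump is not installed.

    Raises subprocess.CalledProcessError if fmdump exits with a nonzero status.
    """
    if shutil.which("fmdump") is None:
        return None
    result = subprocess.run(
        fmdump_command(event_uuid),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


print(fmdump_command("f92e9fbe-735e-c218-cf87-9e1720a28004"))
# ['fmdump', '-v', '-u', 'f92e9fbe-735e-c218-cf87-9e1720a28004']
```

Keeping command construction separate from execution lets the argument list be checked on any host, while only run_fmdump depends on a Solaris system.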

PSH-Detected Fault Console Message

When a PSH fault is detected, a Solaris console message is displayed. The following
example illustrates the type of information contained in a console message generated
when a PSH fault is detected.

SUNW-MSG-ID: SUN4V-8000-DX, TYPE: Fault, VER: 1, SEVERITY: Minor
EVENT-TIME: Wed Sep 14 10:09:46 EDT 2005
PLATFORM: SUNW,system_name, CSN: -, HOSTNAME: wgs48-37
SOURCE: cpumem-diagnosis, REV: 1.5
EVENT-ID: f92e9fbe-735e-c218-cf87-9e1720a28004
DESC: The number of errors associated with this memory module has exceeded
