ILOM Troubleshooting Overview – FUJITSU SPARC T5120 User Manual
Detecting and Managing Faults
ILOM Troubleshooting Overview
ILOM enables you to remotely run diagnostics, such as power-on self-test (POST),
that would otherwise require physical proximity to the server’s serial port. You can
also configure ILOM to send email alerts of hardware failures, hardware warnings,
and other events related to the server or to ILOM.
The service processor runs independently of the server, using the server’s standby
power. Therefore, ILOM firmware and software continue to function when the server
OS goes offline or when the server is powered off.
Faults detected by ILOM, POST, and the Solaris Predictive Self-Healing (PSH)
technology are forwarded to ILOM for fault handling.
FIGURE: Fault Reporting Through the ILOM Fault Manager
In the event of a system fault, ILOM ensures that the Service Required LED is turned
on, FRUID PROMs are updated, the fault is logged, and alerts are displayed. Faulty
FRUs are identified in fault messages using the FRU name.
The service processor can detect when a fault is no longer present. When this
happens, it clears the fault state in the FRU PROM and extinguishes the Service
Required LED.
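The fault handling described above can be pictured with a small model. The following Python sketch is illustrative only and is not ILOM firmware or an ILOM API: the class FaultManagerModel, its methods, and the FRU name in the usage note are hypothetical. It assumes the Service Required LED is lit whenever at least one fault is active.

```python
from dataclasses import dataclass, field


@dataclass
class FaultManagerModel:
    """Toy model of the fault handling described above (not ILOM code)."""

    # Hypothetical stand-in for the fault state kept in each FRU PROM.
    fru_fault_state: dict = field(default_factory=dict)
    # Hypothetical stand-in for the fault log and displayed alerts.
    log: list = field(default_factory=list)

    @property
    def service_required_led(self) -> bool:
        # Assumption: the LED is on while any FRU has an active fault.
        return any(self.fru_fault_state.values())

    def report_fault(self, source: str, fru: str) -> None:
        """Record a fault forwarded by ILOM, POST, or PSH."""
        self.fru_fault_state[fru] = True
        self.log.append(f"fault detected by {source} on {fru}")

    def clear_fault(self, fru: str) -> None:
        """Clear the fault state once the fault is no longer present."""
        if self.fru_fault_state.get(fru):
            self.fru_fault_state[fru] = False
            self.log.append(f"fault cleared on {fru}")
```

For example, after report_fault("POST", "EXAMPLE_FRU") the model's service_required_led is true, and after clear_fault("EXAMPLE_FRU") it is false again.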
A fault condition can be removed in two ways:
■ Unaided recovery – Faults caused by environmental conditions can clear
automatically if the condition responsible for the fault improves over time.
■ Repaired fault – When a fault is repaired by human intervention, such as a FRU
replacement, the service processor will usually detect the repair automatically and
extinguish the Service Required LED.
Many environmental faults can recover automatically. For example, a temporary
condition may cause the computer room temperature to rise above the maximum
threshold, producing an overtemperature fault in the server. If the computer room
temperature then returns to the normal range and the server’s internal temperature
also drops back to an acceptable level, the service processor will detect the new
fault-free condition. It will extinguish the Service Required LED and clear the fault
state from the FRU PROM.
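The temperature example can be sketched as a simple threshold check. This is a minimal Python illustration, not the service processor's actual algorithm: the function name update_fault_state and the 35 °C threshold are assumptions chosen for the example, and a single temperature reading stands in for both the room and internal temperatures.

```python
MAX_TEMP_C = 35.0  # hypothetical threshold, for illustration only


def update_fault_state(readings_c, max_temp_c=MAX_TEMP_C):
    """Return the fault state after each temperature reading.

    True means a temperature fault is active (Service Required LED on);
    False means the fault-free condition has been detected (LED off).
    """
    states = []
    for temp in readings_c:
        # The fault is active only while the reading exceeds the threshold,
        # so it clears on its own once the temperature returns to range.
        states.append(temp > max_temp_c)
    return states
```

Under these assumptions, update_fault_state([30.0, 38.0, 33.0]) returns [False, True, False]: the fault appears when the reading exceeds the threshold and clears without intervention once the reading drops back into range.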