
NEC Express5800/A1160 User Manual



Section 7

Setting Up the System to Handle Faults Automatically

A fault is a failure of a hardware component, a software component, or some
combination of components. A fault typically results in both an event and an alert.

An event is a condition that is detected (for example, by a sensor) and reported to a
monitoring entity. The condition can be a system error, a change in an environmental
condition, a system resource that is outside currently accepted limits, or some other
system status that is no longer within specification. Typically, the event is written to an
event log. Some events result in an alert being issued. Some events require additional
troubleshooting.

An alert is a notification that a system event occurred that requires attention. An alert is
always the result of an event, but not all events result in alerts. Some alerts require
additional troubleshooting.
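The relationship between events and alerts described above can be sketched as a small data model. This is a hypothetical illustration, not the Express5800/A1160 implementation: the `Event`, `Alert`, and `Monitor` classes are invented for this example, and the severity threshold is an assumed stand-in for the site-specific alert definitions that actually decide which events raise alerts. The model keeps the two rules from the text: every event is written to the event log, and an alert is only ever created from an event.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    # Hypothetical severity levels used only for this illustration.
    INFO = 1
    WARNING = 2
    CRITICAL = 3


@dataclass
class Event:
    """A detected condition reported to a monitoring entity (hypothetical model)."""
    source: str          # sensor or component that detected the condition
    description: str
    severity: Severity


@dataclass
class Alert:
    """A notification that an event requires attention; always tied to an event."""
    event: Event


@dataclass
class Monitor:
    # Assumption: a simple severity threshold stands in for real alert definitions.
    alert_threshold: Severity = Severity.WARNING
    event_log: list[Event] = field(default_factory=list)
    alerts: list[Alert] = field(default_factory=list)

    def report(self, event: Event) -> Alert | None:
        # Every event is written to the event log.
        self.event_log.append(event)
        # Only some events result in an alert.
        if event.severity.value >= self.alert_threshold.value:
            alert = Alert(event)
            self.alerts.append(alert)
            return alert
        return None


if __name__ == "__main__":
    monitor = Monitor()
    monitor.report(Event("fan 2", "fan speed below limit", Severity.WARNING))
    monitor.report(Event("PSU 1", "power supply inserted", Severity.INFO))
    print(len(monitor.event_log), len(monitor.alerts))  # prints: 2 1
```

Both events end up in the log, but only the one that matches the (assumed) alert definition produces an alert, which mirrors "an alert is always the result of an event, but not all events result in alerts."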

The Express5800/A1160 system is self-healing, with built-in capabilities to recognize
and analyze errors and take corrective actions automatically. In many cases, these
actions resolve the error and restore health to the system without user intervention. To
ensure that the system takes the appropriate actions for your site, you need to select the
preferred behavior for the following fault recovery and troubleshooting attributes:

• Fault settings

• Alert definitions and notification strategy

• SMTP server

This section describes the procedures for setting these attributes for the Service
Processor and partitions, using either the remote console interface or the Server
Management software Operator Console. For related information, refer to:

• Section 9 Managing Partitions, Virtual Machine Monitors, and Virtual Machines, for
information about how to monitor alerts when you receive them.

• Section 10 Troubleshooting Hardware Problems, for information about how to analyze
alerts and solve problems that the system does not heal automatically.

• 3.5.13 Customer Data Settings, for information about setting customer data for remote
maintenance notification.