Serviceability – FUJITSU SPARC ENTERPRISE SERVER M9000 User Manual

Chapter 2

System Features

Hardware and software faults in the system cannot be completely eliminated. To
provide high availability, the system must include mechanisms that keep it
operating even if a failure occurs in hardware (components and devices) or in
software (the OS or application software).

M8000/M9000 servers provide the functions listed below to achieve high
availability. Even higher availability can be achieved by combining the server
with clustering software or management software.

■ Redundant configurations and active (hot) replacement of power supply units
  and FAN units

■ Redundant configuration of hard disk drives, with software mirroring and
  active replacement

■ Extended range of automatic correction of temporary faults in memory, system
  buses, and LSI internal data

■ Enhanced retry function and degradation function for detected faults

■ Shorter downtime through automatic system reboot

■ Shorter system startup time

■ XSCF collection of fault information, and preventive maintenance using
  different types of warnings

■ Chipkill function in the memory subsystem, which applies single-bit error
  correction so that processing continues despite continuous burst read errors
  caused by the failure of a memory device

■ Memory mirroring function, which continues normal data processing through
  the other memory bus and thereby prevents a system failure when an error
  occurs on a memory bus or on a device connected to it

■ Memory patrol function, which is implemented in hardware and therefore does
  not affect the workload of software operation
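As a generic illustration of the single-bit error correction mentioned in the list above, the following minimal sketch uses a textbook Hamming(7,4) code. It is not the Chipkill algorithm and not the ECC scheme used in M8000/M9000 memory, which this manual does not describe; the function names are hypothetical and chosen only for this example.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit Hamming codeword.

    Codeword layout (positions 1..7): p1 p2 d1 p3 d2 d3 d4
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Correct at most one flipped bit.

    Returns (data_bits, error_position); error_position is 0 when the
    codeword was read without error, otherwise the 1-based bit position
    that was corrected.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # syndrome names the bad bit
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], error_position


# Simulate a temporary fault: one bit flips between write and read.
stored = hamming74_encode([1, 0, 1, 1])
stored[4] ^= 1
data, position = hamming74_correct(stored)
```

In this sketch the read still returns the original data bits, and the reported position identifies which bit was corrected, which is the kind of information a fault log could record for preventive maintenance.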

2.4.3 Serviceability

Serviceability is characterized by how easily a server fault can be diagnosed, and
how quickly the server can be recovered from the fault or how easily the fault can be
corrected.

To achieve high serviceability, it must be possible to identify the cause of a
component or device failure. To facilitate recovery from a failure, the system
must determine the cause of the failure and isolate the faulty component for
replacement. The system must also notify the system administrator and/or field
engineer of the event and the situation in an easy-to-understand format that
prevents misunderstandings.
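The requirements above (determine the cause, isolate the faulty component, notify the administrator in a clear format) can be sketched as a small model. This is an illustrative assumption only: the class names, the unit names such as "PSU#1", and the message layout are hypothetical and do not represent XSCF behavior or its actual message format.

```python
from dataclasses import dataclass, field


@dataclass
class FaultReport:
    component: str  # hypothetical unit name, for example "PSU#1"
    cause: str      # diagnosed cause of the failure
    action: str     # what the administrator or field engineer should do


@dataclass
class Inventory:
    active: set = field(default_factory=set)
    isolated: set = field(default_factory=set)

    def isolate(self, component):
        """Take a faulty component out of service so it can be replaced."""
        self.active.discard(component)
        self.isolated.add(component)


def handle_fault(inventory, report):
    """Isolate the faulty component and build a plain-text notification."""
    inventory.isolate(report.component)
    return (
        f"FAULT: {report.component}\n"
        f"Cause: {report.cause}\n"
        f"Action: {report.action}"
    )


inventory = Inventory(active={"PSU#0", "PSU#1"})
message = handle_fault(
    inventory,
    FaultReport("PSU#1", "output voltage out of range", "replace PSU#1"),
)
```

The notification names the component, the cause, and the required action on separate lines, so the reader does not have to interpret a raw error code.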

This manual is related to the following products:

SPARC ENTERPRISE SERVER M8000