Serviceability – FUJITSU SPARC ENTERPRISE SERVER M9000 User Manual

Chapter 2 System Features
Hardware and software faults in the system cannot be completely eliminated. To 
provide high availability, the system must include mechanisms that enable 
continuous system operation even if a failure occurs in hardware, such as 
components and devices, or in software, such as the OS or application software.
M8000/M9000 servers provide the functions listed below to achieve high availability. 
Higher availability can also be obtained by combining the server with clustering 
software or management software.
■ Supporting redundant configurations and active (hot) replacement of power 
supply units and FAN units
■ Supporting redundant configuration of hard disk drives, mirroring by software, 
and active replacement
■ Extended range of automatic correction of temporary faults in memory, system 
buses, and LSI internal data
■ Supporting an enhanced retry function and degradation function for detected 
faults
■ Shortening the downtime by using automatic system reboot
■ Shortening the time taken for system startup
■ XSCF collection of fault information, and preventive maintenance using different 
types of warnings
■ Supporting the Chipkill function in the memory subsystem, which uses single-bit 
error correction to continue processing when a memory device failure causes 
continuous burst read errors
■ Supporting the memory mirroring function, which enables normal data processing 
to continue through the other memory bus, thereby preventing system failure 
when an error occurs on a memory bus or on a device connected to it
■ Supporting a memory patrol function that does not affect the software workload 
because it is implemented in hardware
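
As an illustration of what single-bit error correction means in general, the 
following is a minimal sketch of a textbook Hamming(7,4) code. It is not the ECC 
or Chipkill logic of the M8000/M9000 servers, which is implemented in hardware 
and is not described in this section. The function names are hypothetical and 
chosen only for this example.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (bit positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Correct at most one flipped bit.

    Returns (corrected_codeword, position), where position is the 1-based
    index of the corrected bit, or 0 if no error was detected.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # the syndrome names the faulty position
    if position:
        c[position - 1] ^= 1
    return c, position


def hamming74_decode(codeword):
    """Extract the 4 data bits from a (corrected) codeword."""
    return [codeword[2], codeword[4], codeword[5], codeword[6]]


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    damaged = list(word)
    damaged[4] ^= 1  # simulate a single-bit read error at position 5
    repaired, position = hamming74_correct(damaged)
    print(position, hamming74_decode(repaired))
```

In this sketch the read still returns the original data after one bit is 
flipped, which is the sense in which processing can continue despite the error.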
2.4.3 Serviceability
Serviceability is characterized by how easily a server fault can be diagnosed, and 
by how quickly the server can be recovered from the fault or how easily the fault 
can be corrected.
To achieve high serviceability, it must be possible to identify the causes of 
component or device failure. To facilitate recovery from failure, the system must 
determine the cause of the failure and isolate the faulty component for replacement. 
The system must also notify the system administrator and/or field engineer of the 
event and situation in an easy-to-understand format that prevents 
misunderstandings.
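
The three requirements above (determine the cause, isolate the faulty component, 
and notify in an unambiguous format) can be sketched as follows. This is only an 
illustrative model, not the server's or XSCF's actual fault-reporting format. The 
class, the function names, and the component identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FaultReport:
    """Hypothetical record of a diagnosed fault (illustration only)."""
    component: str  # which unit failed
    cause: str      # what was determined to be wrong
    action: str     # what the administrator or field engineer should do


def isolate(active_components, faulty_component):
    """Return the components that stay in service once the faulty one is isolated."""
    return [c for c in active_components if c != faulty_component]


def format_notification(report):
    """Render the report with one labeled field per line to avoid ambiguity."""
    return (
        f"FAULT: {report.component}\n"
        f"Cause: {report.cause}\n"
        f"Action: {report.action}"
    )


if __name__ == "__main__":
    # Hypothetical component names, not real unit identifiers.
    report = FaultReport("fan-unit-1", "rotation speed below threshold",
                         "replace the unit")
    print(isolate(["fan-unit-0", "fan-unit-1"], report.component))
    print(format_notification(report))
```

Keeping the component, cause, and required action as separate labeled fields is 
one simple way to make a notification hard to misread.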
