IBM 755 User Manual

IBM United States Hardware Announcement

110-008

IBM is a registered trademark of International Business Machines Corporation

PCI extended error handling

Adapters enabled for PCI extended error handling (EEH) respond to a special data packet generated by the affected PCI slot hardware by calling system firmware, which examines the affected bus, allows the device driver to reset it, and continues without a system reboot. For Linux, EEH support extends to the majority of frequently used devices, although some third-party PCI devices may not provide native EEH support.
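The recovery sequence described above can be pictured with a small model. This is a minimal sketch, not firmware or driver code: the names `PciSlot`, `Driver`, `firmware_examine`, and `recover` are hypothetical, and the real mechanism runs in system firmware and the operating system kernel rather than in a user-level program.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class SlotState(Enum):
    NORMAL = auto()
    FROZEN = auto()  # slot isolated after the hardware reported an error


@dataclass
class PciSlot:
    name: str
    state: SlotState = SlotState.NORMAL
    log: list = field(default_factory=list)


@dataclass
class Driver:
    # Assumption for illustration: a flag marks whether the driver is EEH-aware.
    supports_eeh: bool = True

    def reset(self, slot: PciSlot) -> None:
        slot.log.append("driver: reset device")
        slot.state = SlotState.NORMAL


def firmware_examine(slot: PciSlot) -> bool:
    """Hypothetical firmware check: report whether the slot is frozen."""
    slot.log.append("firmware: examined bus")
    return slot.state is SlotState.FROZEN


def recover(slot: PciSlot, driver: Driver) -> bool:
    """Return True if the slot is usable without a system reboot."""
    if not firmware_examine(slot):
        return True  # nothing to recover
    if not driver.supports_eeh:
        slot.log.append("driver: no EEH support, slot stays frozen")
        return False
    driver.reset(slot)
    return slot.state is SlotState.NORMAL


slot = PciSlot("slot-1", state=SlotState.FROZEN)
recovered = recover(slot, Driver())
print(recovered, slot.log)
```

The `supports_eeh` flag stands in for the caveat that some third-party devices may lack native EEH support; in this model such a slot simply stays frozen.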

Predictive failure and dynamic component deallocation

Servers with POWER processors have long had the capability to perform predictive failure analysis on certain critical components such as processors and memory. When these components exhibit symptoms that indicate a failure is imminent, the system can dynamically deallocate the failing part and call home about it before the error is propagated system-wide. In many cases, the system will first attempt to reallocate resources in a way that avoids unplanned outages. If insufficient resources exist to maintain full system availability, these servers will attempt to maintain partition availability by user-defined priority.
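The priority rule in the last sentence can be illustrated with a short sketch. It is only a model under stated assumptions: the `Partition` type, the `keep_available` function, the convention that a lower number means higher priority, and the use of processor cores as the only resource are all hypothetical and are not taken from the system's actual allocation logic.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    priority: int      # assumption: lower number = higher priority
    cores_needed: int


def keep_available(partitions, healthy_cores):
    """Return names of partitions that can stay up, highest priority first."""
    kept = []
    remaining = healthy_cores
    for part in sorted(partitions, key=lambda p: p.priority):
        # Keep the partition only if the remaining healthy cores cover it.
        if part.cores_needed <= remaining:
            kept.append(part.name)
            remaining -= part.cores_needed
    return kept


partitions = [
    Partition("batch", priority=3, cores_needed=4),
    Partition("db", priority=1, cores_needed=4),
    Partition("web", priority=2, cores_needed=2),
]

# After deallocating failing cores, only 8 healthy cores are left.
print(keep_available(partitions, healthy_cores=8))
```

This sketch is greedy: a partition that does not fit is skipped and lower-priority partitions are still considered. A stricter policy could stop at the first partition that does not fit.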

Uncorrectable error recovery

When the auto-restart option is enabled, the system can automatically restart following an unrecoverable software error, hardware failure, or environmentally induced (AC power) failure.

Serviceability

The purpose of serviceability is to repair the system while minimizing or eliminating service cost (within budget objectives) and maintaining high customer satisfaction. Serviceability includes system installation, MES (system upgrades/downgrades), and system maintenance/repair. Depending upon the system and warranty contract, service may be performed by the customer, an IBM representative, or an authorized warranty service provider.

The serviceability features delivered in this system provide a highly efficient service environment by incorporating the following attributes:

• Design for Customer Set Up (CSU), Customer Installed Features (CIF), and Customer Replaceable Units (CRU)

• Error Detection and Fault Isolation (ED/FI)

• First Failure Data Capture (FFDC)

• Converged service approach across multiple IBM server platforms

Service environments

The Hardware Management Console (HMC) is a dedicated server that provides functions for configuring and managing servers, in either partitioned or full-system partition mode, using a GUI or command-line interface (CLI). An HMC attached to the system allows support personnel (with client authorization) to remotely log in to review error logs and perform remote maintenance if required.

The POWER7 processor-based platforms support two main service environments:

• Attachment to one or more HMCs. This is the default configuration for servers supporting logical partitions with dedicated or virtual I/O. In this case, all servers have at least one logical partition.

• No HMC.

    • Full system partition: A single partition owns all the server resources and only one operating system may be installed.