Failure indicators – HP Insight Management Agents User Manual

Agent information


Check the physical proximity of the system to other electrical devices. Because electrical noise may cause a Bus Fault error, check whether other electrical devices share the same AC circuit.

Ensure that the system temperature is within specified limits, and that the fans are operating and are not blocked.

SCSI Bus Faults can be caused when two or more drives are set to the same SCSI ID. Ensure that the SCSI IDs of the storage system and the system do not conflict.

In some instances, drive failure can cause SCSI Bus Faults. If you continue to receive many of these errors, replace the drive.

IRQ Deglitch—Displays the number of times that a glitch has been detected on the drive interface cable. Because the controller retries the affected operation, glitches can reduce performance or, in some cases, cause data corruption. Glitches indicate electrical noise on the drive cable or an intermittent failure of the drive electronics.
This item is considered a Problem Indicator that may be correctable without replacing the drive. Verify the status of the drive by checking the following:

Ensure that all system and storage system cables are intact and properly seated. You may need to replace cables.

Check the physical proximity of the system to other electrical devices. Because electrical noise may cause a glitch error, check whether other electrical devices share the same AC circuit.

If you continue to receive many of these errors, replace the drive.
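The SCSI ID check from the Bus Fault steps above amounts to a search for duplicate IDs on one bus. The Python sketch below assumes you have already gathered a list of (device name, SCSI ID) pairs for a single bus by whatever means your environment provides; the function `find_scsi_id_conflicts` and the sample device names are hypothetical and are not part of the Insight Management Agents.

```python
from collections import defaultdict


def find_scsi_id_conflicts(devices):
    """Return {scsi_id: [device names]} for every ID claimed by more than one device.

    devices: iterable of (name, scsi_id) pairs for a single SCSI bus.
    Hypothetical helper for illustration; not an Insight Management Agents API.
    """
    by_id = defaultdict(list)
    for name, scsi_id in devices:
        by_id[scsi_id].append(name)
    # Only IDs used by two or more devices are conflicts.
    return {scsi_id: names for scsi_id, names in by_id.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical inventory: drive_b and drive_c were both set to ID 1.
    bus = [("controller", 7), ("drive_a", 0), ("drive_b", 1), ("drive_c", 1)]
    print(find_scsi_id_conflicts(bus))  # {1: ['drive_b', 'drive_c']}
```

An empty result means no two devices on that bus share an ID; any entry names the devices whose IDs must be changed.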

Failure Indicators

Use the Failure Indicators to determine the cause of a drive failure. Typically, each failure count is zero while the drive is operating normally. If a counter is not zero and the drive has not failed, there could be an intermittent problem that may require the drive to be replaced.
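The guidance that this section gives for each Failure Indicator reduces to a small decision rule: a zero count is normal, a nonzero count on a failed drive means replace it, and a nonzero count on an OK drive means replace it if the count keeps increasing. The Python sketch below assumes you already have one counter's readings in chronological order plus the drive's failed/OK status; `recommend` and `Action` are hypothetical names, collecting the values from the agents is not shown, and "increasing" is simplified to "latest reading is higher than the first" on the assumption that the counter is never reset.

```python
from enum import Enum


class Action(Enum):
    COUNTER_CLEAR = "counter is zero; this indicator shows no problem"
    MONITOR = "possible intermittent problem; keep watching the counter"
    REPLACE = "replace the drive"


def recommend(readings, drive_failed):
    """Apply the Failure Indicator guidance to one counter (hypothetical helper).

    readings: successive values of one counter, oldest first.
    drive_failed: True if the drive status is Failed, False if it is OK.
    """
    if not readings:
        raise ValueError("need at least one counter reading")
    latest = readings[-1]
    if latest == 0:
        # Zero is the normal value for a Failure Indicator counter.
        return Action.COUNTER_CLEAR
    if drive_failed:
        # Nonzero count on a failed drive: replace the drive.
        return Action.REPLACE
    if latest > readings[0]:
        # Drive is OK, but the count grew over the observed period
        # (assumes the counter is never reset between readings).
        return Action.REPLACE
    # Drive is OK and the nonzero count is stable: possibly intermittent.
    return Action.MONITOR


if __name__ == "__main__":
    print(recommend([0, 0, 0], drive_failed=False))  # Action.COUNTER_CLEAR
    print(recommend([2, 2, 2], drive_failed=False))  # Action.MONITOR
    print(recommend([2, 3, 5], drive_failed=False))  # Action.REPLACE
    print(recommend([1], drive_failed=True))         # Action.REPLACE
```

The same function applies unchanged to each counter listed below, because every indicator follows the same zero / failed / increasing pattern.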
The Failure Indicators are:

Spinup Errors—A Spinup Error occurs when the physical drive fails because a spin-up command failed. If the failure count is not zero and the drive has failed, replace the drive.
If the counter is not zero and the drive is OK (has not failed), there may be an intermittent problem that requires drive replacement. If you observe that the count is increasing over time, replace the drive.

Aborted Commands—The Aborted Commands counter records the number of times that a physical SCSI drive returned an Aborted Command status when a SCSI command was attempted. A nonzero count indicates that SCSI commands terminated unsuccessfully. An Aborted Commands error occurs when the physical drive is failed because of aborted commands that could not be successfully retried. If the number of errors is not zero and the drive has failed, replace the drive.
If the counter is not zero and the drive is OK (has not failed), there may be an intermittent problem that requires drive replacement. If you observe that the count is increasing over time, replace the drive.

Reallocation Aborts—A Reallocation Abort error occurs when the physical drive is failed because of an error that occurred while the controller was trying to reallocate a bad sector.
Because of the nature of magnetic disks, certain sectors on a drive may have media defects. A reallocation area of the drive is set aside to compensate for these defects: the array controller writes data addressed to unusable sectors into available sectors in the reallocation area.
If the number of reallocation abort errors is not zero and the drive has failed, replace the drive. If the counter is not zero and the drive is OK (has not failed), there may be an intermittent problem that requires drive replacement. If you observe that the count is increasing over time, replace the drive.

Media Failures—A Media Failure occurs when the physical drive fails because of unrecoverable media errors.
If the number of media failure errors is not zero and the drive has failed, replace the drive. If the counter is not zero and the drive is OK (has not failed), there may be an intermittent problem that requires drive replacement. If you observe that the count is increasing over time, replace the drive.

Format Errors—A Format Error occurs when a format operation fails because the controller was unable to remap a bad sector.
If the number of format errors is not zero and the drive has failed, replace the drive. If the counter is not zero and the drive is OK (has not failed), there may be an intermittent problem that requires drive replacement. If you observe that the count is increasing over time, replace the drive.

Hardware Errors—The Hardware Errors counter records the number of times that a physical SCSI drive returned a Hardware Error status when a SCSI command was attempted. This error status indicates unsuccessful termination of the SCSI command. The controller typically retries this command several times before failing the drive.
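The reallocation area described under Reallocation Aborts behaves like a lookup table that redirects data addressed to a defective sector to a spare sector. The Python class below is a deliberately simplified, hypothetical model: `ReallocationArea` is not an HP interface, real remapping is done by controller and drive firmware, and the idea that running out of spare sectors makes a reallocation attempt fail is an assumption of this model, used only to illustrate one way such an attempt could abort.

```python
class ReallocationArea:
    """Toy model of a reallocation area (hypothetical; not an HP interface).

    Defective sectors are redirected to spare sectors set aside in advance.
    Real controllers and drives do this in firmware.
    """

    def __init__(self, spare_sectors):
        self._free = list(spare_sectors)  # spare sectors not yet in use
        self._remap = {}                  # defective sector -> spare sector

    def reallocate(self, bad_sector):
        """Redirect bad_sector to a spare sector and return the spare's address."""
        if bad_sector in self._remap:
            return self._remap[bad_sector]  # already reallocated
        if not self._free:
            # Modeling assumption: with no spares left, the reallocation
            # attempt fails, loosely analogous to a Reallocation Abort.
            raise RuntimeError(f"cannot reallocate sector {bad_sector}: no spare sectors left")
        spare = self._free.pop(0)
        self._remap[bad_sector] = spare
        return spare

    def resolve(self, sector):
        """Return the sector that actually holds the data addressed to `sector`."""
        return self._remap.get(sector, sector)


if __name__ == "__main__":
    area = ReallocationArea(spare_sectors=[10000, 10001])
    print(area.reallocate(42))  # 10000
    print(area.resolve(42))     # 10000
    print(area.resolve(43))     # 43 (never reallocated)
```

Reads and writes go through `resolve`, so once a sector is reallocated its data is transparently served from the spare sector.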