1 disks showing the removed/failed state, 2 disks showing the predict fail state, 3 disks showing the logging errors state – HP StorageWorks Scalable File Share User Manual


Troubleshooting


(For information on configuring email alerts, see Section 6.2.)

If disk errors occur, take action as described in the following sections:

• Disks showing the removed/failed state (Section 9.34.1)
• Disks showing the predict fail state (Section 9.34.2)
• Disks showing the logging errors state (Section 9.34.3)
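The three states can be told apart by the wording of the alert. The following Python sketch is illustrative only and is not part of the HP SFS software: it assumes alerts arrive as single lines worded like the examples in Sections 9.34.1 to 9.34.3, and the names `STATE_PATTERNS` and `classify_alert` are hypothetical.

```python
import re

# Phrases copied from the example alerts in Sections 9.34.1 to 9.34.3.
# Assumption: real alerts use the same wording; adjust if yours differ.
STATE_PATTERNS = [
    ("removed/failed",
     re.compile(r"has been removed or failed"),
     "replace the disk as soon as possible (Section 8.1.10)"),
    ("predict fail",
     re.compile(r"SMART predicts failure"),
     "replace the disk as soon as possible (Section 8.1.10)"),
    ("logging errors",
     re.compile(r"is logging errors"),
     "investigate with hpls_cciss_info (Section 9.34.3)"),
]


def classify_alert(line):
    """Return (state, action) for an alert line, or None if nothing matches."""
    for state, pattern, action in STATE_PATTERNS:
        if pattern.search(line):
            return state, action
    return None


if __name__ == "__main__":
    alert = "array 1: disk bay 12 disk P6C8CX7 SMART predicts failure (was online)"
    print(classify_alert(alert))
```

A line that matches none of the phrases returns `None`, so unrecognized alerts can be passed on for manual review rather than silently misclassified.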

9.34.1 Disks showing the removed/failed state

The following is an example of an alert generated when a disk shows the removed/failed state:

array 4: disk Y69BMY3E has been removed or failed (was online)

In such cases, you must replace the failed disk as soon as possible. See Section 8.1.10 for information on replacing a disk in an SFS20 array.

9.34.2 Disks showing the predict fail state

The following is an example of an alert generated when a disk shows the predict fail state:

array 1: disk bay 12 disk P6C8CX7 SMART predicts failure (was online)

In such cases, you must replace the disk as soon as possible, because predict fail errors indicate that a disk is on the verge of failure. See Section 8.1.10 for information on replacing a disk in an SFS20 array.

9.34.3 Disks showing the logging errors state

The following is an example of an alert generated when a disk shows the logging errors state:

array 3: disk bay 7: disk Y69BLLYE is logging errors (was online)

In such cases, it is normally not necessary to replace the disk (a disk should not be replaced unless it is generating a large number of errors). You should, however, investigate the matter further by entering the hpls_cciss_info command on a server attached to the array, using the following syntax:

hpls_cciss_info -E -D controller,port,0,drive_number

For example:

# hpls_cciss_info -E -D 1,1,0,134
Disk 1,1,0,134: logged 32 start 12
[4] type 1 scsi_op 40 sense 0x3 qual 0 sense_code 0x11 (6146 mins ago)
Medium Error: Unrecovered read error
[5] type 2 scsi_op 40 sense 0x0 qual 0 sense_code 0x0 (6146 mins ago)
[6] type 2 scsi_op 40 sense 0x0 qual 0 sense_code 0x0 (6146 mins ago)
[7] type 2 scsi_op 42 sense 0x0 qual 0 sense_code 0x0 (6146 mins ago)
[8] type 1 scsi_op 40 sense 0x3 qual 0 sense_code 0x11 (6146 mins ago)
Medium Error: Unrecovered read error
[9] type 2 scsi_op 40 sense 0x0 qual 0 sense_code 0x0 (6146 mins ago)
[10] type 2 scsi_op 40 sense 0x0 qual 0 sense_code 0x0 (6146 mins ago)
[11] type 2 scsi_op 42 sense 0x0 qual 0 sense_code 0x0 (6146 mins ago)
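If you need to process this output in a script, the entries can be split into fields. The following Python sketch is illustrative only: it assumes every entry line follows exactly the layout shown in the example above, and the names `ENTRY_RE` and `parse_error_log` are hypothetical, not part of the HP SFS software.

```python
import re

# Layout inferred from the example output above (an assumption, not a
# documented format): [index] type N scsi_op N sense 0xN qual N
# sense_code 0xN (N mins ago)
ENTRY_RE = re.compile(
    r"^\[(?P<index>\d+)\] type (?P<type>\d+) scsi_op (?P<scsi_op>\w+) "
    r"sense (?P<sense>0x[0-9a-fA-F]+) qual (?P<qual>\w+) "
    r"sense_code (?P<sense_code>0x[0-9a-fA-F]+) "
    r"\((?P<mins_ago>\d+) mins ago\)$"
)


def parse_error_log(text):
    """Parse hpls_cciss_info -E output into a list of entry dictionaries."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        match = ENTRY_RE.match(line)
        if match:
            entry = match.groupdict()
            entry["index"] = int(entry["index"])
            entry["mins_ago"] = int(entry["mins_ago"])
            entry["description"] = None
            entries.append(entry)
        elif entries and line:
            # A free-text line (for example "Medium Error: ...") describes
            # the entry immediately before it.
            entries[-1]["description"] = line
    return entries
```

Lines that appear before the first entry, such as the command prompt and the `Disk ...` header line, are skipped because there is no preceding entry to attach them to.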

When you are examining the output from the hpls_cciss_info command, note the following points:

• The sense 0x0 errors are transient, and can be ignored.

• Although the Unrecovered read error (URE) messages are nontrivial, they are to be expected on SATA drives; if the RAID functionality is operating normally on the array, the affected data is reconstructed from the RAID parity stripe and rewritten.

  Continue to monitor disks that are logging UREs; if a particular disk (or disks) in an array frequently logs several hundred UREs, the performance of the array may be adversely affected.
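The points above can be applied to captured command output with a simple count. The following Python sketch is illustrative only: the name `summarize_errors` is hypothetical, and the threshold of 200 is an assumed stand-in for "several hundred", not a value taken from this manual. A single count does not show how frequently errors are logged, so compare results from repeated runs.

```python
# Assumption: 200 stands in for "several hundred"; choose a value that
# suits your site.
URE_THRESHOLD = 200


def summarize_errors(output, ure_threshold=URE_THRESHOLD):
    """Count transient (sense 0x0) entries and URE messages in the output."""
    transient = 0
    ure = 0
    for line in output.splitlines():
        if " sense 0x0 " in line:
            # Transient error: can be ignored.
            transient += 1
        elif "Unrecovered read error" in line:
            ure += 1
    return {
        "transient_ignored": transient,
        "ure": ure,
        "needs_attention": ure >= ure_threshold,
    }


if __name__ == "__main__":
    captured = (
        "[4] type 1 scsi_op 40 sense 0x3 qual 0 sense_code 0x11 (6146 mins ago)\n"
        "Medium Error: Unrecovered read error\n"
        "[5] type 2 scsi_op 40 sense 0x0 qual 0 sense_code 0x0 (6146 mins ago)\n"
    )
    print(summarize_errors(captured))
```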