HP StorageWorks Scalable File Share User Manual

When the resynchronization is complete, the status information changes, as shown in the following example:

# mdadm --detail /dev/md0
/dev/md0:
.
.
.
State : clean
.
.
.
    Number   Major   Minor   RaidDevice   State
       0      105      96        0        active sync   /dev/cciss/c1d6
       1      105      32        1        active sync   /dev/cciss/c1d2
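
The check above can also be scripted. The following is a minimal sketch, not part of the product, that reads captured `mdadm --detail` text and reports whether the array state is `clean` and every listed member is `active sync`. The function names and the `SAMPLE_DETAIL` text are illustrative assumptions; the sample repeats only the lines shown in the example above.

```python
import re

# Abbreviated sample, copied from the example output above.
SAMPLE_DETAIL = """/dev/md0:
State : clean
Number Major Minor RaidDevice State
0 105 96 0 active sync /dev/cciss/c1d6
1 105 32 1 active sync /dev/cciss/c1d2
"""

# Member rows: Number Major Minor RaidDevice <state words> /dev/...
MEMBER_RE = re.compile(r"(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(.*?)\s+(/dev/\S+)$")


def summarize_mdadm_detail(text):
    """Return (array_state, members) parsed from `mdadm --detail` text."""
    state = None
    members = []
    for raw in text.splitlines():
        line = raw.strip()
        # Array-level line such as "State : clean".
        m = re.match(r"State\s*:\s*(.+)$", line)
        if m:
            state = m.group(1).strip()
            continue
        m = MEMBER_RE.match(line)
        if m:
            members.append({
                "raid_device": int(m.group(4)),
                "state": m.group(5),
                "device": m.group(6),
            })
    return state, members


def is_resynchronized(text):
    """True when the array is clean and all listed members are in sync."""
    state, members = summarize_mdadm_detail(text)
    return (
        state == "clean"
        and bool(members)
        and all(m["state"] == "active sync" for m in members)
    )


if __name__ == "__main__":
    print(is_resynchronized(SAMPLE_DETAIL))
```

Rows that carry no `/dev/...` path are not counted as members in this sketch, so the array-level `State` line remains the primary signal.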

You can check the progress of the resynchronization process by examining the event log as follows:

sfs> show log facility=storage && age < "5m"
.
.
.
2004/11/02 10:28:56 storage n south2: mds8: /proc/mdstat:
md0 : active raid1 cciss/c1d2[2] cciss/c1d6[0]
      10485504 blocks [2/1] [U_]
      [=>...................] recovery = 6.8% (721344/10485504) finish=2.9min speed=55488K/sec
----
.
.
.
When the resynchronization is complete, the /proc/mdstat output recorded in the event log indicates this by listing both members as active ([2/2] [UU]), as shown in the following example:

sfs> show log facility=storage && age < "5m"
.
.
.
2004/11/02 10:56:41 storage n south2: mds8: /proc/mdstat:
md0 : active raid1 cciss/c1d2[1] cciss/c1d6[0]
      10485504 blocks [2/2] [UU]
----
.
.
.
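
The difference between the two log excerpts is in the bracketed status fields: `[2/1] [U_]` during recovery and `[2/2] [UU]` afterwards. A minimal sketch of reading those fields follows; the function name and the dictionary keys are assumptions made for illustration.

```python
import re

# Matches the "[n/m] [UU]" pair on the blocks line of /proc/mdstat.
# n is the number of devices in the array, m the number currently active.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def mirror_status(text):
    """Return the member counts and flags, or None if no status pair is found."""
    m = STATUS_RE.search(text)
    if m is None:
        return None
    total, active, flags = int(m.group(1)), int(m.group(2)), m.group(3)
    return {
        "total": total,
        "active": active,
        "flags": flags,
        # Healthy only when every member is active and flagged "U".
        "healthy": total == active and "_" not in flags,
    }


if __name__ == "__main__":
    during = "md0 : active raid1 cciss/c1d2[2] cciss/c1d6[0]\n10485504 blocks [2/1] [U_]"
    after = "md0 : active raid1 cciss/c1d2[1] cciss/c1d6[0]\n10485504 blocks [2/2] [UU]"
    print(mirror_status(during)["healthy"], mirror_status(after)["healthy"])
```

The pattern requires a slash inside the first pair of brackets, so the per-device indexes such as `cciss/c1d2[1]` are not mistaken for the status pair.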

9.34 Handling Disk Errors on SFS20 storage

The sfsmgr show array array_number command displays one of the following states for each of the bays/disks on an SFS20 array:

ok

removed/failed

predict fail

logging errors

See Section 4.5 for more information on these states.

The system log records disk issues, as shown in the following example:

sfs> show log data contains "disk bay" && facility=storage && severity>notice
2006/01/05 13:40:44 storage !! south_test5: P92CB0AMQRA684: array 4: disk bay 1: disk Y69BMY3E has been removed or failed (was online)
2006/01/06 09:32:04 storage !! south_test5: P92CB0AMQRA684: array 4: disk bay 1: disk Y69BMY3E is logging errors (was removed or failed)
2006/01/10 10:43:25 storage !! south_test2: P92CB0AMQR2618: array 1: disk bay 12: disk Y69CHCDE has been removed or failed (was online)
2006/01/26 07:11:35 storage !! south_test5: P92CB0AMQRA683: array 3: disk bay 7: disk Y69BLLYE is logging errors (was online)
sfs>
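
If the log output is captured to a file, entries of this form can be split into fields for reporting. The sketch below is one possible approach, based only on the entry layout visible in the example above. The field names (in particular `unit_id` for the second colon-delimited field) are assumptions, not terms from the manual. Whitespace is normalized first so that entries wrapped across lines still match.

```python
import re

# Two entries from the example above, wrapped as they appear in print.
SAMPLE_LOG = """2006/01/05 13:40:44 storage !! south_test5: P92CB0AMQRA684: array 4: disk bay
1: disk Y69BMY3E has been removed or failed (was online)
2006/01/06 09:32:04 storage !! south_test5: P92CB0AMQRA684: array 4: disk bay
1: disk Y69BMY3E is logging errors (was removed or failed)
"""

ENTRY_RE = re.compile(
    r"(?P<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<facility>\S+)\s+(?P<severity>\S+)\s+"
    r"(?P<host>[^:\s]+):\s+(?P<unit_id>[^:\s]+):\s+"
    r"array\s+(?P<array>\d+):\s+disk\s+bay\s+(?P<bay>\d+):\s+"
    r"disk\s+(?P<disk>\S+)\s+(?:has been|is)\s+(?P<state>.+?)\s+"
    r"\(was\s+(?P<previous>[^)]+)\)"
)


def parse_disk_events(text):
    """Return one dict per disk-bay entry found in the captured log text."""
    # Collapse line wraps and repeated spaces into single spaces.
    flat = " ".join(text.split())
    events = []
    for m in ENTRY_RE.finditer(flat):
        event = m.groupdict()
        event["array"] = int(event["array"])
        event["bay"] = int(event["bay"])
        events.append(event)
    return events


if __name__ == "__main__":
    for event in parse_disk_events(SAMPLE_LOG):
        print(event["timestamp"], event["host"], event["array"],
              event["bay"], event["disk"], event["state"])
```

Grouping the resulting records by `disk` gives the history of state changes for a single drive, such as the transition from "removed or failed" to "logging errors" in the example.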

In addition, if email alerts are configured on the system, disk errors trigger the default disk_errors alert to send email to the configured recipients. The filter for the default disk_errors alert is as follows:

facility=storage && severity>notice && data contains "disk bay"
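
The filter combines three conditions, all of which must hold for an event to raise the alert. The sketch below restates that logic in Python for readers who want to apply the same test to exported events. It does not describe how the product evaluates filters. The record keys, the syslog-style severity ordering, and the case-sensitive substring match are all assumptions.

```python
# Assumed syslog-style ordering, least to most severe. The manual excerpt
# does not list the severity levels, so this ordering is an assumption.
SEVERITIES = ["debug", "info", "notice", "warning", "err", "crit", "alert", "emerg"]


def matches_disk_errors(record):
    """Mirror of: facility=storage && severity>notice && data contains "disk bay".

    `record` is a hypothetical dict with "facility", "severity" and "data" keys.
    """
    return (
        record["facility"] == "storage"
        and SEVERITIES.index(record["severity"]) > SEVERITIES.index("notice")
        # Case-sensitive substring match (assumed).
        and "disk bay" in record["data"]
    )


if __name__ == "__main__":
    event = {
        "facility": "storage",
        "severity": "warning",
        "data": "array 3: disk bay 7: disk Y69BLLYE is logging errors (was online)",
    }
    print(matches_disk_errors(event))
```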