Recovering from compromised fault tolerance, Factors to consider before replacing drives – HP D2220sb-Storage-Blade User Manual

If more drives fail than the fault-tolerance method can manage, fault tolerance is compromised, and the
logical drive fails. If this failure occurs, the operating system rejects all requests and indicates unrecoverable
errors.
For example, fault tolerance might be compromised when a drive in an array fails while another drive in
the array is being rebuilt.
Compromised fault tolerance can also be caused by problems unrelated to drives. In such cases, replacing
the physical drives is not required.
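The condition described above can be illustrated with a minimal Python sketch. It is not the controller's firmware logic. The function name, the state names, and the tolerance table are assumptions introduced here for illustration. The RAID 1 and RAID 5 values are general RAID properties rather than statements from this manual, and RAID 1+0 is left out because its tolerance depends on which mirrored pairs are affected.

```python
from enum import Enum


class LogicalDriveState(Enum):
    OK = "ok"
    DEGRADED = "degraded"
    FAILED = "failed"


# Assumed worst-case number of unavailable drives each level survives.
WORST_CASE_TOLERANCE = {"RAID 1": 1, "RAID 5": 1, "RAID 6": 2}


def logical_drive_state(raid_level: str, failed: int, rebuilding: int = 0) -> LogicalDriveState:
    """Classify a logical drive from its count of failed and rebuilding drives."""
    if raid_level not in WORST_CASE_TOLERANCE:
        raise ValueError(f"unsupported RAID level: {raid_level}")
    # A drive that is still rebuilding does not yet hold a complete copy of
    # its data, so it is counted as unavailable alongside the failed drives.
    unavailable = failed + rebuilding
    if unavailable == 0:
        return LogicalDriveState.OK
    if unavailable > WORST_CASE_TOLERANCE[raid_level]:
        return LogicalDriveState.FAILED
    return LogicalDriveState.DEGRADED


# The manual's example: a drive fails while another drive is being rebuilt.
print(logical_drive_state("RAID 5", failed=1, rebuilding=1))
```

With RAID 6, the same two unavailable drives would leave the logical drive degraded rather than failed.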
Recovering from compromised fault tolerance
If fault tolerance is compromised, inserting replacement drives does not improve the condition of the logical
volume. Perform the following procedure to recover data:
1. Power down the D2220sb.
2. Power up the D2220sb. In some cases, a marginal drive is operational long enough to allow backup of
important files.
3. Make copies of important data, if possible.
4. Replace any failed drives.
Factors to consider before replacing drives
Be sure that the server blade is the first unit to be powered down and the last to be powered back up. Taking
this precaution ensures that the system does not erroneously mark the drives as failed when the server blade
is powered up.
Before replacing a degraded drive:
• Open HP SIM and inspect the Error Counter window for each physical drive in the same array to
confirm that no other drives have any errors. (For details, refer to the HP SIM documentation on the
Management CD.)
• Be sure that the array has a current, valid backup.
• Use replacement drives that have a capacity at least as great as that of the smallest drive in the array.
The controller immediately fails drives that have insufficient capacity.
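The capacity rule can be checked before a drive is inserted. The following Python sketch is only an illustration: the function name is hypothetical, and it assumes capacities are given in bytes as reported for each drive. The controller's exact comparison is not described in this manual.

```python
from typing import Iterable


def replacement_is_large_enough(replacement_bytes: int, array_drive_bytes: Iterable[int]) -> bool:
    """Return True if the replacement is at least as large as the smallest array drive."""
    capacities = list(array_drive_bytes)
    if not capacities:
        raise ValueError("the array has no drives to compare against")
    # The rule compares against the smallest drive in the array, not the largest.
    return replacement_bytes >= min(capacities)


GB = 10**9
print(replacement_is_large_enough(300 * GB, [300 * GB, 450 * GB]))
print(replacement_is_large_enough(146 * GB, [300 * GB, 450 * GB]))
```

Drives with the same nominal size can report slightly different byte counts, so comparing the reported capacities is safer than comparing label sizes.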
To minimize the likelihood of fatal system errors, take these precautions when removing failed drives:
• Do not remove a degraded drive if any other drive in the array is offline (the online LED is off). In this
situation, no other drive in the array can be removed without data loss.
Exceptions:
  o When RAID 1+0 is used, drives are mirrored in pairs. Several drives can be in a failed condition
  simultaneously (and they can all be replaced simultaneously) without data loss, as long as no two
  failed drives belong to the same mirrored pair.
  o When RAID 6 is used, two drives can fail simultaneously (and be replaced simultaneously) without
  data loss.
  o If the offline drive is a spare, the degraded drive can be replaced.
• Do not remove a second drive from an array until the first failed or missing drive has been replaced and
the rebuild process is complete. (The rebuild is complete when the online LED on the front of the drive
stops blinking.)
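The removal precautions above, including the RAID 1+0, RAID 6, and spare exceptions, can be expressed as a small decision function. This Python sketch is an illustration under stated assumptions, not a tool provided by HP. The function name, the bay identifiers, and the RAID 1 and RAID 5 limits are introduced here. Spares are excluded from the offline set, and a drive that is still rebuilding should be listed as offline.

```python
from typing import Iterable, Set, Tuple

# Assumed worst-case number of unavailable data drives each level survives.
# The RAID 6 value comes from the text; RAID 1 and RAID 5 are general RAID facts.
MAX_UNAVAILABLE = {"RAID 1": 1, "RAID 5": 1, "RAID 6": 2}


def can_remove_drive(
    raid_level: str,
    drive: str,
    offline_data_drives: Iterable[str],
    mirror_pairs: Iterable[Tuple[str, str]] = (),
) -> bool:
    """Return True if pulling `drive` should not cause data loss.

    `offline_data_drives` lists other data drives that are offline or still
    rebuilding. Offline spares are left out because they hold no array data.
    """
    # Every data drive that would be unavailable once `drive` is pulled.
    unavailable: Set[str] = set(offline_data_drives) | {drive}

    if raid_level == "RAID 1+0":
        pairs = list(mirror_pairs)
        if not pairs:
            raise ValueError("RAID 1+0 needs the mirrored-pair layout")
        # Safe only while no mirrored pair has both members unavailable.
        return not any(a in unavailable and b in unavailable for a, b in pairs)

    if raid_level not in MAX_UNAVAILABLE:
        raise ValueError(f"unsupported RAID level: {raid_level}")
    return len(unavailable) <= MAX_UNAVAILABLE[raid_level]


pairs = [("bay1", "bay2"), ("bay3", "bay4")]
print(can_remove_drive("RAID 1+0", "bay1", {"bay3"}, pairs))  # different pairs
print(can_remove_drive("RAID 1+0", "bay1", {"bay2"}, pairs))  # same pair
print(can_remove_drive("RAID 5", "bay1", {"bay2"}))
```

Treating a rebuilding drive as offline also covers the last precaution: a second drive is not cleared for removal until the first rebuild has finished.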