Before replacing drives, Automatic data recovery (rebuild) – HP Smart Array P731m Controller User Manual

Before replacing drives

• Open Systems Insight Manager, and inspect the Error Counter window for each physical drive in the same array to confirm that no other drives have any errors. For more information about Systems Insight Manager, see the documentation on the Insight Management DVD or on the HP website (http://www8.hp.com/us/en/products/server-software/product-detail.html?oid=489496#!tab=features).

• Be sure that the array has a current, valid backup.

• Confirm that the replacement drive is of the same type as the degraded drive (either SAS or SATA and either hard drive or solid state drive).

• Use replacement drives that have a capacity equal to or larger than the capacity of the smallest drive in the array. The controller immediately fails drives that have insufficient capacity.
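
The type and capacity checks above can be sketched as a small pre-replacement check. This is an illustration only: the `Drive` record and the `replacement_problems` function are hypothetical names invented for this example, not part of any HP tool or API, and the capacities are made-up values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drive:
    """Hypothetical drive record for illustration; not an HP data structure."""
    interface: str    # "SAS" or "SATA"
    media: str        # "HDD" or "SSD"
    capacity_gb: int


def replacement_problems(replacement, degraded, array_drives):
    """Return the reasons a replacement drive is unsuitable (empty list if none)."""
    problems = []
    # The replacement must match the degraded drive's interface and media type.
    if replacement.interface != degraded.interface:
        problems.append("interface differs from degraded drive")
    if replacement.media != degraded.media:
        problems.append("media type differs from degraded drive")
    # Capacity must be at least that of the smallest drive in the array.
    smallest = min(d.capacity_gb for d in array_drives)
    if replacement.capacity_gb < smallest:
        problems.append("capacity below smallest drive in array")
    return problems


# Example with made-up values: a 600 GB SAS hard drive replacing a degraded
# 900 GB drive is acceptable because the smallest drive in the array is 600 GB.
array = [Drive("SAS", "HDD", 600), Drive("SAS", "HDD", 900)]
degraded = array[1]
ok_candidate = Drive("SAS", "HDD", 600)
bad_candidate = Drive("SATA", "SSD", 300)
print(replacement_problems(ok_candidate, degraded, array))
print(replacement_problems(bad_candidate, degraded, array))
```

An empty list means that none of the checks listed above failed; it does not replace the error-counter inspection or the backup check.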

In systems that use external data storage, be sure that the server is the first unit to be powered down and the last unit to be powered up. Taking this precaution ensures that the system does not erroneously mark the drives as failed when the server is powered up.

In some situations, you can replace more than one drive at a time without data loss. For example:

• In RAID 10 configurations, drives are mirrored in pairs. You can replace several drives simultaneously if they are not mirrored to other removed or failed drives.

• In RAID 50 configurations, drives are arranged in parity groups. You can replace several drives simultaneously if the drives belong to different parity groups. If two drives belong to the same parity group, replace those drives one at a time.

• In RAID 6 configurations, you can replace any two drives simultaneously.

• In RAID 60 configurations, drives are arranged in parity groups. You can replace several drives simultaneously if no more than two of the drives being replaced belong to the same parity group.
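
The four rules above share one pattern: each group of drives tolerates a fixed number of missing members. A minimal sketch of that pattern follows. The function name, the drive labels, and the group layout are assumptions made for this example; the controller does not expose such a function, and RAID 6 is modeled here as a single group that contains every drive in the array.

```python
# Maximum number of drives that may be out of service per group, as stated
# in the list above. RAID 10 groups are mirrored pairs, RAID 50 and RAID 60
# groups are parity groups, and RAID 6 is modeled as one group of all drives.
MAX_MISSING_PER_GROUP = {"10": 1, "50": 1, "6": 2, "60": 2}


def can_replace_together(raid_level, groups, to_replace, already_failed=frozenset()):
    """Return True if the drives in to_replace can be pulled at the same time.

    groups is a list of sets of hypothetical drive labels. Drives that have
    already failed or been removed count against the same per-group limit.
    """
    limit = MAX_MISSING_PER_GROUP[raid_level]
    unavailable = set(to_replace) | set(already_failed)
    return all(len(unavailable & set(group)) <= limit for group in groups)


# RAID 10 with two mirrored pairs: one drive from each pair is acceptable,
# both drives of the same pair are not.
pairs = [{"d1", "d2"}, {"d3", "d4"}]
print(can_replace_together("10", pairs, {"d1", "d3"}))
print(can_replace_together("10", pairs, {"d1", "d2"}))
```

When the check fails, the text that follows applies: replace the drives in stages and wait for each rebuild to complete before removing more.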

To remove more drives from an array than the fault tolerance method can support, follow the previous guidelines for removing several drives simultaneously, and then wait until rebuild is complete (as indicated by the drive LEDs) before removing additional drives.

However, if fault tolerance has been compromised, and you must replace more drives than the fault tolerance method can support, delay drive replacement until after you attempt to recover the data (refer to "Recovering from compromised fault tolerance").

Automatic data recovery (rebuild)

When you replace a drive in an array, the controller uses the fault-tolerance information on the remaining drives in the array to reconstruct the missing data (the data that was originally on the replaced drive) and then write the data to the replacement drive. This process is called automatic data recovery or rebuild. If fault tolerance is compromised, the controller cannot reconstruct the data, and the data is likely lost permanently.
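
To make the idea of reconstructing data from fault-tolerance information concrete, here is a minimal sketch that uses single XOR parity, the general technique behind RAID 5-style arrays. The manual does not describe the controller's internal algorithm, so this is an assumption for illustration only: RAID 6 and RAID 60 keep additional parity information, mirrored configurations copy data from the surviving mirror, and the block contents below are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Made-up blocks from one stripe across three data drives.
data = [b"\x11\x22", b"\x0f\xf0", b"\xaa\x55"]
# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data)

# The second drive is replaced: its block is recomputed from the surviving
# data blocks and the parity block, then written to the replacement drive.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
```

With single parity, losing a second block from the same stripe leaves too little information to recompute either one, which is why compromised fault tolerance prevents a rebuild.
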

If another drive in the array fails while fault tolerance is unavailable during rebuild, a fatal system error can occur, and all data on the array can be lost. However, failure of another drive does not always lead to a fatal system error in the following exceptional cases:

• Failure after activation of a spare drive

• Failure of a drive that is not mirrored to any other failed drives (in a RAID 10 configuration)

• Failure of a second drive in a RAID 50 or RAID 60 configuration if the two failed drives are in different parity groups