B.8 proactive data protection – Accusys ExaRAID GUI User Manual

B.8 Proactive Data Protection

The most fundamental requirement for a storage system is to protect
data from all kinds of failures. The RAID controller firmware supports versatile
RAID configurations for different levels of reliability requirements, including
RAID 6 to tolerate double-drive failures and Triple Parity for extreme data
availability. It provides online utilities for proactive data protection to
monitor disk health, minimize the risk of data loss, and avoid RAID
degradation. RAID configurations can be recovered and imported even
if the RAID is corrupted.

• Online disk scrubbing

Bad sectors on a hard disk can be detected only when they are accessed,
so they may go undetected for a long time if the disk access pattern is
unevenly distributed and the sectors reside in seldom-accessed areas. During
a disk rebuild, all data on the surviving hard disks is needed to regenerate
the data of the failed disk; if there are bad sectors on the surviving disks,
the affected data cannot be regenerated and is lost permanently. As the number
of sectors per disk increases, this becomes a very common issue for any
disk-based storage system. The firmware provides an online disk scrubbing
utility that tests the entire disk surface in a background task and recovers
any bad sectors detected.
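
The rebuild problem described above can be illustrated with a minimal sketch.
It assumes a single-parity (XOR) stripe, which is a simplification; the manual
does not describe the firmware's actual rebuild algorithm, and the names
xor_blocks and rebuild_block are hypothetical.

```python
from functools import reduce
from typing import Optional, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equally sized blocks byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_block(surviving: Sequence[Optional[bytes]]) -> Optional[bytes]:
    """Regenerate the failed disk's block from the surviving blocks.

    None stands for an unreadable (bad) sector. With single parity
    (an assumption of this sketch), one unreadable survivor makes the
    stripe unrecoverable, so the function returns None.
    """
    if any(block is None for block in surviving):
        return None
    return xor_blocks(surviving)


# Two data blocks and their parity block form one stripe.
d0 = bytes([0x01, 0x02, 0x03, 0x04])
d1 = bytes([0x10, 0x20, 0x30, 0x40])
parity = xor_blocks([d0, d1])
```

Here rebuild_block([d0, parity]) regenerates d1, while
rebuild_block([None, parity]) returns None because a surviving block is
unreadable. Scrubbing aims to find and repair such sectors before a rebuild
depends on them.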

• Online parity consistency check and recovery

The ability to protect data in parity-based RAID relies on the correctness of
the parity information. Under certain conditions the parity consistency
might be corrupted, for example by internal errors of hard drives or by an
abnormal power-off of the system while the cache of the hard drives is
enabled. To ensure higher data reliability, the administrator can instruct
the controller to conduct a parity check and recovery during disk scrubbing.
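
As a rough illustration of a parity consistency check, the sketch below
recomputes XOR parity for each stripe and compares it with the stored parity.
It assumes single-parity stripes and assumes that recovery means rewriting the
parity from the data blocks; the manual does not state how the controller
performs recovery. The function names and the stripe layout are hypothetical.

```python
from functools import reduce
from typing import Dict, List, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equally sized blocks byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def check_parity(stripes: List[Dict], repair: bool = False) -> List[int]:
    """Return the indices of stripes whose stored parity is inconsistent.

    With repair=True the stored parity is overwritten with the value
    recomputed from the data blocks (an assumption of this sketch: the
    data blocks are trusted and the parity is considered stale).
    """
    inconsistent = []
    for index, stripe in enumerate(stripes):
        expected = xor_blocks(stripe["data"])
        if stripe["parity"] != expected:
            inconsistent.append(index)
            if repair:
                stripe["parity"] = expected
    return inconsistent


stripes = [
    {"data": [b"\x01\x02", b"\x04\x08"], "parity": b"\x05\x0a"},  # consistent
    {"data": [b"\xff\x00", b"\x0f\x0f"], "parity": b"\x00\x00"},  # stale parity
]
```

Calling check_parity(stripes) reports only the second stripe; calling it again
with repair=True rewrites that stripe's parity so that a later check finds no
inconsistencies.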

• S.M.A.R.T. drive health monitoring and self-test

S.M.A.R.T. stands for Self-Monitoring, Analysis and Reporting Technology, by
which a hard disk can continuously monitor its key components and collect
statistics as indicators of its health. The hard disks are periodically
polled, and the controller alerts the administrator and starts disk cloning
when a disk reports warnings. The firmware can also instruct the disk drives
to execute their embedded device self-test routines, which helps users
identify defective disk drives.
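
The polling step can be sketched as follows. This is a simulation only: it
does not talk to real drives, and the SmartAttribute type, the attribute names
in the sample data, and the returned action strings are hypothetical. It relies
on the common S.M.A.R.T. convention that an attribute is failing when its
normalized value is at or below its threshold, and it treats a threshold of 0
as "never fails", which is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SmartAttribute:
    name: str
    value: int      # normalized value reported by the drive
    threshold: int  # failure threshold reported by the drive


def failing_attributes(attributes: List[SmartAttribute]) -> List[str]:
    """Names of attributes whose normalized value is at or below threshold."""
    return [
        a.name for a in attributes
        if a.threshold > 0 and a.value <= a.threshold
    ]


def poll_disks(disks: Dict[str, List[SmartAttribute]]) -> Dict[str, List[str]]:
    """One polling pass: map each warning disk to the actions to take."""
    actions = {}
    for disk_id, attributes in disks.items():
        if failing_attributes(attributes):
            # Mirrors the text above: alert first, then start cloning.
            actions[disk_id] = ["alert_administrator", "start_disk_cloning"]
    return actions


sample_disks = {
    "disk0": [SmartAttribute("Reallocated_Sector_Ct", 100, 36)],
    "disk1": [SmartAttribute("Reallocated_Sector_Ct", 30, 36)],
}
```

With this sample data, poll_disks(sample_disks) returns actions for disk1
only, because its normalized value has dropped below the threshold.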

• Online bad sector reallocation and recovery with over-threshold alert

Hard disks are likely to develop more and more bad sectors while they are in
service. When a host computer accesses a bad sector, the controller rebuilds
the data and responds to the host. In addition to relying on the drive's own
reserved space for bad block reallocation, the controller uses reserved space
on the hard disks to reallocate the data of bad sectors. If the number of bad
sectors exceeds the threshold specified by the administrator, alerts are sent
to the administrator, and disk cloning is started automatically.
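
The over-threshold behavior can be sketched with a small bookkeeping class.
The BadSectorTracker name, its fields, and the event strings are hypothetical,
and the sketch assumes that "over the threshold" means strictly greater than
the configured value; it does not model how the firmware stores its
reallocation table.

```python
from typing import Dict, List


class BadSectorTracker:
    """Tracks reallocated bad sectors for one disk (illustrative only)."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold          # set by the administrator
        self.remapped: Dict[int, int] = {}  # bad LBA -> reserved-space slot
        self.events: List[str] = []
        self.cloning_started = False

    def report_bad_sector(self, lba: int) -> int:
        """Reallocate a bad sector and return its reserved-space slot."""
        if lba in self.remapped:
            return self.remapped[lba]       # already reallocated
        slot = len(self.remapped)
        self.remapped[lba] = slot
        if len(self.remapped) > self.threshold and not self.cloning_started:
            # Alert and start cloning once, when the count first exceeds
            # the threshold.
            self.events += ["alert", "start_clone"]
            self.cloning_started = True
        return slot
```

For example, with threshold=2 the first two distinct bad sectors are only
reallocated; the third one also records the alert and the start of disk
cloning, and later reports do not trigger them again.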