Monitoring IO Accelerator health, NAND flash and component failure, Health metrics – HP PCIe IO Accelerators for ProLiant Servers User Manual
Monitoring IO Accelerator health
NAND flash and component failure
The IO Accelerator is a highly fault-tolerant storage subsystem that provides many levels of protection
against component failure and the lossy nature of solid state storage. However, as in all storage
subsystems, component failures might occur.
When a large enough number of data blocks is retired due to error, the NAND flash media is considered
worn out. By properly selecting NAND flash media for the hosted application and proactively monitoring
device age and health, you can help ensure reliable performance over the intended product life.
Health metrics
The IO Accelerator driver manages LEB retirement using predetermined retirement thresholds. The IO
Accelerator Management Tool and the fio-status utility show a health indicator that starts at 100% and
counts down to 0%. As certain thresholds are crossed, various actions are taken.
At the 10% healthy threshold, a one-time warning is issued. See "Health monitoring techniques" for
methods of capturing this alarm event.
At 0%, the device is considered unhealthy. It enters write-reduced mode, which somewhat prolongs its
lifespan so data can be safely migrated. In this state, the IO Accelerator behaves normally except for the
reduced write performance.
At some point after the 0% threshold, the device enters read-only mode. Any attempt to write to the IO
Accelerator causes an error. Some file systems might require special mount options to mount a read-only
block device, beyond specifying that the mount should be read-only. For example, under Linux, ext3
requires that -o ro,noload is used. The noload option tells the file system not to try to replay the
journal.
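As a minimal sketch of the mount options described above, the following Python helper assembles a mount command for a read-only ext3 volume. The helper name, the device path /dev/fioa, and the mount point /mnt/iodrive are placeholders assumed for illustration; the command is only built and printed, not executed.

```python
import shlex


def readonly_mount_command(device, mount_point, fs_type="ext3"):
    """Build a mount command for a block device that is in read-only mode.

    For ext3, 'noload' is added so the file system does not try to
    replay its journal, which would require writing to the device.
    """
    options = ["ro"]
    if fs_type == "ext3":
        options.append("noload")
    return ["mount", "-t", fs_type, "-o", ",".join(options), device, mount_point]


# Hypothetical device path and mount point, chosen only for this example.
cmd = readonly_mount_command("/dev/fioa", "/mnt/iodrive")
print(shlex.join(cmd))  # mount -t ext3 -o ro,noload /dev/fioa /mnt/iodrive
```

Other file systems might need different options; the sketch adds noload only for ext3 because that is the only case the text covers.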
Read-only mode should be considered a final opportunity to migrate data off the device since device
failure is more likely with continued use.
The IO Accelerator might enter failure mode. In this case, the device is offline and inaccessible. This can
be caused by an internal catastrophic failure, improper firmware upgrade procedures, or device
wear-out.
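The percentage-based thresholds above can be sketched as a small tracker. This is an illustrative model only, not the driver's actual logic: the class name, method name, and event strings are assumptions. Read-only mode and failure mode are left out because the text does not tie them to a specific health percentage.

```python
WARNING_THRESHOLD = 10  # percent healthy at which the one-time warning is issued


class HealthTracker:
    """Illustrative model of the health thresholds described above."""

    def __init__(self):
        self.warned = False  # the 10% warning is issued only once

    def update(self, health_percent):
        """Return the event triggered by a new health reading."""
        if not 0 <= health_percent <= 100:
            raise ValueError("health must be between 0 and 100")
        if health_percent == 0:
            # Device is considered unhealthy and enters write-reduced mode.
            return "write-reduced"
        if health_percent <= WARNING_THRESHOLD and not self.warned:
            self.warned = True
            return "warning"
        return "none"


tracker = HealthTracker()
events = [tracker.update(p) for p in (55, 10, 9, 0)]
print(events)  # ['none', 'warning', 'none', 'write-reduced']
```

A monitoring script built along these lines would treat the "warning" event as the cue to plan data migration, and "write-reduced" as the cue to carry it out.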
Health monitoring techniques
fio-status
Output from the fio-status utility shows the health percentage and drive state. These items are bold in
the following sample output.
Found 1 ioDrive in this system
Fusion-io driver version: 2.1.0 build 19032
Adapter: ioDrive