Address/control parity, Protection for i/o, Reliability in the cabinet infrastructure – HP RX8620-32 User Manual
Address/Control parity
The address/control path of the memory system is protected so that a spurious bit flip in that
path cannot cause correct data to be written to the wrong location, which would result in
data corruption. HP is the leader in delivering this functionality to the mission-critical
marketplace.
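As a rough illustration of the idea, the sketch below attaches a single even-parity bit to an address/control pair and rejects the command when the parity no longer matches. This is a minimal sketch only: the manual does not describe the actual protection scheme, and the function names, field widths, and example values here are hypothetical.

```python
def even_parity(value: int) -> int:
    """Return 1 if value has an odd number of set bits, else 0."""
    return bin(value).count("1") & 1


def send_command(address: int, control: int) -> tuple:
    """Attach one parity bit covering both the address and control fields."""
    return address, control, even_parity(address) ^ even_parity(control)


def accept_command(address: int, control: int, parity: int) -> bool:
    """Accept the command only if the recomputed parity matches the sent bit."""
    return (even_parity(address) ^ even_parity(control)) == parity


# Hypothetical write command: address 0x1F40, control code 0b01.
addr, ctrl, par = send_command(0x1F40, 0b01)

# One address bit flips in transit, so the write would land at the wrong location.
corrupted_addr = addr ^ (1 << 6)

ok_clean = accept_command(addr, ctrl, par)              # intact command is accepted
ok_corrupt = accept_command(corrupted_addr, ctrl, par)  # flipped bit is detected
```

A single parity bit detects any odd number of flipped bits but cannot say which bit flipped, so in this sketch the receiver can only refuse the write rather than repair it.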
Protection for I/O
I/O errors are another significant cause of hardware downtime because a typical system
contains a large number of I/O cards, and those cards are among the parts of the system
most exposed to frequent human interaction in the data center.
To prevent downtime due to I/O errors, HP has designed the following features into the
Integrity rx7620-16 and rx8620-32 Servers:
• Online replacement of PCI-X cards
• Hardware “firewall” of I/O errors to cell
• High mean time between failures (MTBF) for I/O cards
• Separate PCI-X buses for each I/O card
Taken together, these features reduce hardware downtime by at least 20% compared with similar servers.
Integrity rx8620-32 Server crossbar backplane protection
The backplane of the Integrity rx8620-32 Server ties CPU and memory together. Because all partitions
share the backplane, high reliability and true domain isolation are very important. The specific
features that address these areas are as follows:
• Highly reliable ASICs—The backplane ASIC is manufactured and tested with a process that results
in 10X demonstrated reliability over comparable chips. This reliability results in virtually zero
backplane ASIC failures in the field.
• Redundant DC–DC converters—The DC–DC converters that power the backplane chips are fully
redundant, reducing downtime associated with power conversion. (Power conversion is normally a
significant contributor to failure rate.)
• Full end-to-end error correction and independent-partition design—The backplane is built from a
single crossbar with point-to-point connections. Traffic within a partition is contained in that
partition, so there is no sharing of links in a properly configured system. Each port of the crossbar
chip is fully independent, allowing cells of different partitions to coexist without affecting each other
in any way. In bus-based systems, by contrast, all domains participate in the coherency scheme
and share address buses. In those systems all domains are therefore linked in some fashion,
resulting in shared failure modes that might crash multiple partitions.
Also, unlike snoopy coherency systems, which must accept and respond to all coherency requests
from all domains, Integrity rx8620-32 Server partitions have hardware firewalls dedicated to
guarding partitions from errant transactions generated on failing partitions. A failure in one
Integrity rx8620-32 Server partition will not affect any other partition.
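The firewall behavior described above can be pictured as a per-port filter that admits a transaction only when its source and target cells belong to the same partition. The sketch below is a conceptual model only, not HP's hardware design: the `Transaction` and `PortFirewall` names, the cell numbering, and the partition labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    source_cell: int
    target_cell: int
    payload: str


class PortFirewall:
    """Hypothetical model of a crossbar port that isolates partitions."""

    def __init__(self, partition_of: dict):
        self.partition_of = partition_of  # cell id -> partition label
        self.dropped = []                 # errant transactions that were blocked

    def admit(self, txn: Transaction) -> bool:
        src = self.partition_of.get(txn.source_cell)
        dst = self.partition_of.get(txn.target_cell)
        # Block anything from an unknown cell or from a different partition.
        if src is None or dst is None or src != dst:
            self.dropped.append(txn)
            return False
        return True


# Assumed layout: cells 0-1 form partition "A", cells 2-3 form partition "B".
firewall = PortFirewall({0: "A", 1: "A", 2: "B", 3: "B"})

same_partition = firewall.admit(Transaction(0, 1, "read"))           # admitted
cross_partition = firewall.admit(Transaction(2, 1, "errant write"))  # blocked
```

In this model a failing cell in partition "B" can generate any traffic it likes, but nothing it sends is admitted at a port that serves partition "A".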
Finally, all data paths in the fabric are resistant to both random single-bit errors and persistent
single-wire “stuck-at” faults. Therefore, the fabric is resilient to any single-bit failure, including pin,
connector, or solder problems.
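To show why a single-error-correcting code covers both random flips and a stuck-at wire, the sketch below uses a textbook Hamming(7,4) code and assumes each of the seven code bits travels on its own wire. The manual does not say which code the fabric actually uses, so the code choice, the wire numbering, and the function names are assumptions made for illustration.

```python
def hamming74_encode(nibble: int) -> list:
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8                             # index 0 unused
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]      # parity over positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]      # parity over positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]      # parity over positions 4, 5, 6, 7
    return code[1:]


def hamming74_decode(bits: list) -> int:
    """Correct at most one wrong bit, then return the 4 data bits."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                    # XOR of the positions of set bits
    if syndrome:
        code[syndrome] ^= 1                    # nonzero syndrome names the bad position
    d = [code[3], code[5], code[6], code[7]]
    return sum(bit << i for i, bit in enumerate(d))


def stuck_at(bits: list, wire: int, value: int) -> list:
    """Model a wire (0-based index) that always carries the same value."""
    faulty = list(bits)
    faulty[wire] = value
    return faulty


# Persistent fault: wire 4 is stuck at 0 for every transfer.
stuck_results = [
    hamming74_decode(stuck_at(hamming74_encode(n), wire=4, value=0))
    for n in range(16)
]

# Random fault: flip each of the 7 bits in turn for one sample value.
sample = 0b1011
flip_results = []
for wire in range(7):
    word = hamming74_encode(sample)
    word[wire] ^= 1
    flip_results.append(hamming74_decode(word))
```

Because a stuck wire can corrupt at most one bit of each codeword, every transfer is still decoded correctly, which is the same property that covers a bad pin, connector, or solder joint on a single signal.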
Reliability in the cabinet infrastructure
In keeping with their focus on high availability (HA), the Integrity rx7620-16 and rx8620-32
Servers include protection against failures within the cabinet infrastructure. The HA features in this
area include true dual AC line cord support and complete resilience to service processor failures.