Preparing Heartbeat – HP StorageWorks Scalable File Share User Manual

how Heartbeat is configured. Manual fail back can prevent system oscillation if, for example, a
bad node reboots continuously.

Heartbeat nodes send messages over the network interfaces to exchange status information and
determine whether the other member of the failover pair is alive. The HP SFS G3.0-0
implementation sends these messages using IP multicast. Each failover pair uses a different IP
multicast group.
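
As a rough illustration of the multicast mechanism described above, the following Python sketch shows how a process could join an IP multicast group and how one might check that every failover pair has been given its own group. It is not Heartbeat's implementation or wire format; the node names, group addresses, and function names are assumptions made for this example.

```python
import ipaddress
import socket


def membership_request(group: str, local_ip: str = "0.0.0.0") -> bytes:
    """Build the 8-byte ip_mreq payload (group address + local interface)."""
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    return socket.inet_aton(group) + socket.inet_aton(local_ip)


def groups_are_distinct(pair_groups: dict) -> bool:
    """Return True if no two failover pairs share a multicast group."""
    groups = list(pair_groups.values())
    return len(groups) == len(set(groups))


def open_listener(group: str, port: int, local_ip: str = "0.0.0.0") -> socket.socket:
    """Open a UDP socket that receives datagrams sent to the given group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, local_ip))
    return sock


# Hypothetical assignment: one multicast group per failover pair.
PAIR_GROUPS = {
    ("oss1", "oss2"): "239.0.0.3",
    ("oss3", "oss4"): "239.0.0.4",
}
```

Calling `groups_are_distinct(PAIR_GROUPS)` before deployment is one simple way to catch two pairs that were accidentally assigned the same group.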

When a node determines that its partner has failed, it must ensure that the other node in the pair
cannot access the shared disk before it takes over. Heartbeat can usually determine whether the
other node in a pair has been shut down or powered off. When the status is uncertain, it may be
necessary to power-cycle the partner node to ensure that it cannot access the shared disk. This is
referred to as STONITH (Shoot The Other Node In The Head). HP SFS G3.0-0 uses iLO, rather than
remote power controllers, for STONITH.
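
The takeover rule in the preceding paragraph can be sketched as a small decision function. This is only an illustration of the ordering (fence first when the partner's state is uncertain, then take over), not Heartbeat's actual logic; the state names and the `power_cycle` and `take_over` callbacks are hypothetical.

```python
from enum import Enum
from typing import Callable


class PartnerState(Enum):
    ALIVE = "alive"
    CONFIRMED_DOWN = "confirmed down"   # known to be shut down or powered off
    UNCERTAIN = "uncertain"             # no messages, but state is unknown


def handle_partner(state: PartnerState,
                   power_cycle: Callable[[], bool],
                   take_over: Callable[[], None]) -> str:
    """Take over shared storage only once the partner cannot access it."""
    if state is PartnerState.ALIVE:
        return "no action"
    if state is PartnerState.UNCERTAIN:
        # Fence the partner (for example through its iLO) before touching the disk.
        if not power_cycle():
            return "fencing failed; resources not taken over"
    take_over()
    return "resources taken over"
```

The important property is that `take_over` is never reached while the partner might still be writing to the shared disk.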

5.2.1 Preparing Heartbeat

1. Verify that the Heartbeat RPMs are installed:

   libnet-1.1.2.1-2.2.el5.rf.x86_64.rpm
   pils-2.1.3-1.x86_64.rpm
   stonith-2.1.3-1.x86_64.rpm
   heartbeat-2.1.3-1.x86_64.rpm

2. Obtain the failover pair information from the overall Lustre configuration.

3. Heartbeat uses one or more of the network interfaces to send Heartbeat messages using IP
   multicast. Each failover pair of nodes must have IP multicast connectivity over those
   interfaces. HP SFS G3.0-0 uses eth0 and ib0.

4. Each node of a failover pair must have mount-points for all the Lustre servers that might
   be run on that node: both the ones it is primarily responsible for and those that might fail
   over to it. Ensure that all the mount-points are present on all nodes.

5. Heartbeat uses iLO for STONITH and requires the iLO IP address or name, and the iLO login
   and password, for each node. Each node in a failover pair must be able to reach the iLO
   interface of its peer over the network.
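
Steps 1 and 4 above lend themselves to a scripted check. The following Python sketch is one possible way to do it, assuming that `rpm -q` is available on the node; the function names and the mount-point paths in the usage note are hypothetical and not part of the HP SFS software.

```python
import os
import subprocess

# RPM file names as listed in step 1.
REQUIRED_RPMS = [
    "libnet-1.1.2.1-2.2.el5.rf.x86_64.rpm",
    "pils-2.1.3-1.x86_64.rpm",
    "stonith-2.1.3-1.x86_64.rpm",
    "heartbeat-2.1.3-1.x86_64.rpm",
]


def rpm_installed(package: str) -> bool:
    """Return True if `rpm -q` reports the package as installed."""
    result = subprocess.run(["rpm", "-q", package],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def missing_rpms(rpm_files, is_installed=rpm_installed):
    """Return the RPM file names whose packages are not installed."""
    missing = []
    for filename in rpm_files:
        # `rpm -q` expects name-version-release.arch, without the .rpm suffix.
        package = filename[:-len(".rpm")] if filename.endswith(".rpm") else filename
        if not is_installed(package):
            missing.append(filename)
    return missing


def missing_mount_points(paths):
    """Return the mount-point directories that do not exist on this node."""
    return [path for path in paths if not os.path.isdir(path)]
```

On each node of a pair, one might call `missing_rpms(REQUIRED_RPMS)` and `missing_mount_points([...])` with the mount-points of both the node's own Lustre servers and those of its partner, and proceed only when both lists are empty.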

5.2.2 Generating Heartbeat Configuration Files Automatically

Because the version of lustre_config contained in Lustre 1.6 does not produce correct
Heartbeat V2.1.3 configurations, do not use the -t hbv2 option. The lustre_config
script does, however, correctly add failover information to the mkfs.lustre parameters (allowing
clients to fail over to a different OSS) if the failover NIDs are specified in the CSV file.
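
To illustrate what adding failover information to the mkfs.lustre parameters means, the sketch below builds an argument list that uses the mkfs.lustre `--failnode` option. It does not reproduce what lustre_config actually generates; the function name, file system name, NIDs, and device path are hypothetical values chosen for the example.

```python
def mkfs_lustre_ost_args(fsname, mgs_nid, device, failover_nids=()):
    """Build a mkfs.lustre argument list for an OST with failover NIDs."""
    args = ["mkfs.lustre", f"--fsname={fsname}", "--ost", f"--mgsnode={mgs_nid}"]
    for nid in failover_nids:
        # Each --failnode names a partner that can serve this target.
        args.append(f"--failnode={nid}")
    args.append(device)
    return args


# Hypothetical values, for illustration only.
example = mkfs_lustre_ost_args(
    fsname="testfs",
    mgs_nid="172.31.80.1@o2ib",
    device="/dev/mapper/example_ost",
    failover_nids=["172.31.80.4@o2ib"],
)
```

The sketch only constructs the argument list and never runs the command, since formatting a device is destructive.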

The HP SFS G3.0-0 Software tarball includes the /opt/hp/sfs/scripts/gen_hb_config_files.pl
script, which may be used to generate Heartbeat configuration files for all the nodes from the
lustre_config CSV file. The gen_hb_config_files.pl script must be run on a node where Heartbeat
is installed. An additional CSV file of iLO and other information must be provided. A sample is
included in the HP SFS G3.0-0 Software tarball at /opt/hp/sfs/scripts/ilos.csv. For more
information, run gen_hb_config_files.pl with the -h switch. The Text::CSV Perl module is required
by gen_hb_config_files.pl.
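
Because gen_hb_config_files.pl depends on Text::CSV, it can be useful to confirm that the module loads before running the script. The Python sketch below does this by asking Perl to load the module with `perl -MText::CSV -e 1`; the function name and the injectable `run` parameter are assumptions made for this example.

```python
import subprocess


def perl_module_available(module: str, run=subprocess.run) -> bool:
    """Return True if `perl -M<module> -e 1` exits successfully."""
    try:
        result = run(["perl", f"-M{module}", "-e", "1"],
                     stdout=subprocess.DEVNULL,
                     stderr=subprocess.DEVNULL)
    except FileNotFoundError:
        # Perl itself is not installed on this node.
        return False
    return result.returncode == 0
```

If `perl_module_available("Text::CSV")` returns False, install the module before running gen_hb_config_files.pl.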
