Resource failure, Adjusting the poll intervals, Adjusting the threshold and period values – Dell PowerVault 775N (Rackmount NAS Appliance) User Manual


an existing cluster, MSCS can retrieve the data from the other active nodes. However, when a node forms a cluster, no other

node is available. MSCS uses the quorum disk's recovery logs to update the node's cluster database, thereby maintaining the

correct version of the cluster database and ensuring that the cluster is intact.

For example, if node 1 fails, node 2 continues to operate, writing changes to the cluster database. Before you can restart

node 1, node 2 fails. When node 1 becomes active, it updates its private copy of the cluster database with the changes made

by node 2, using the quorum disk's recovery logs.
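The recovery-log update in this example can be pictured with a small sketch. It is a simplified model, not the MSCS log format or API: the LogEntry and NodeDatabase names, the sequence numbers, and the sample keys and values are all assumptions made for illustration.

```python
# Hypothetical model: this is not the MSCS on-disk format or API.
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    sequence: int   # monotonically increasing change number (assumed)
    key: str        # configuration item that changed
    value: str      # new value for that item


@dataclass
class NodeDatabase:
    last_applied: int = 0                      # highest sequence already applied
    data: dict = field(default_factory=dict)   # private copy of the cluster database

    def replay(self, recovery_log):
        """Apply every logged change newer than this node's private copy."""
        for entry in sorted(recovery_log, key=lambda e: e.sequence):
            if entry.sequence > self.last_applied:
                self.data[entry.key] = entry.value
                self.last_applied = entry.sequence


# Node 1 went down after change 1; node 2 logged changes 2 and 3 before failing.
quorum_log = [
    LogEntry(1, "share_name", "data"),
    LogEntry(2, "share_name", "archive"),
    LogEntry(3, "ip_address", "10.0.0.5"),
]
node1 = NodeDatabase(last_applied=1, data={"share_name": "data"})
node1.replay(quorum_log)
```

After the replay, node 1's private copy holds the changes that node 2 made while node 1 was down.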

To maintain cluster unity, the operating system uses the quorum disk to ensure that only one set of active, communicating

nodes is allowed to operate as a cluster. A node can form a cluster only if it can gain control of the quorum disk. A node can

join a cluster or remain in an existing cluster only if it can communicate with the node that controls the quorum disk.

For example, if the private network (cluster interconnect) between cluster nodes 1 and 2 fails, each node assumes that the

other node has failed, causing both nodes to continue operating as the cluster. If both nodes were allowed to operate as the

cluster, the result would be two separate clusters using the same cluster name and competing for the same resources. To

solve this problem, MSCS uses the node that owns the quorum disk to maintain cluster unity. In this

scenario, the node that gains control of the quorum disk is allowed to form a cluster, and the other fails over its resources

and becomes inactive.
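The arbitration in this scenario can be sketched as a race for a single lock. This is a toy model under stated assumptions: the QuorumDisk class, the try_reserve method, and the node names are hypothetical, and the real reservation is made on the shared disk, not on an in-process lock.

```python
# Hypothetical model of quorum arbitration; names are illustrative only.
import threading


class QuorumDisk:
    """Stands in for the shared quorum disk: only one node may own it at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.owner = None

    def try_reserve(self, node_name):
        # Non-blocking attempt: the first caller wins, later callers are refused.
        if self._lock.acquire(blocking=False):
            self.owner = node_name
            return True
        return False


def on_interconnect_failure(node_name, disk):
    """Decide what a node does when it can no longer reach its peer."""
    if disk.try_reserve(node_name):
        return "forms cluster"
    return "fails over resources and becomes inactive"


disk = QuorumDisk()
outcomes = {name: on_interconnect_failure(name, disk) for name in ("node1", "node2")}
```

Whichever node reaches the disk first keeps the cluster. In this run node1 is tried first, so node2 becomes inactive.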

Resource Failure

A failed resource is not operational on the current host node. MSCS periodically invokes the Resource Monitor to check whether the resource appears

operational. The Resource Monitor uses the resource DLL for each resource to

detect if the resource is functioning properly. The resource DLL communicates the results back through the Resource Monitor

to MSCS.

Adjusting the Poll Intervals

You can specify how frequently MSCS checks for failed resources by setting the Looks Alive (general resource check) and

Is Alive (detailed resource check) poll intervals. MSCS requests a more thorough check of the resource's state at each Is

Alive interval than it does at each Looks Alive interval; therefore, the Is Alive poll interval is typically longer than the

Looks Alive poll interval.

NOTE:

Do not adjust the Looks Alive and Is Alive settings unless instructed by technical support.
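To illustrate how the two poll intervals interleave, the following sketch lists which check would run at each second of a simulated run. The poll_schedule function and the interval values are assumptions chosen for illustration, not recommended or default settings. The sketch also assumes that a detailed check replaces the quick check when both fall at the same instant.

```python
def poll_schedule(looks_alive_s, is_alive_s, duration_s):
    """Return (time, check) pairs for a simulated run of duration_s seconds."""
    schedule = []
    for t in range(1, duration_s + 1):
        if t % is_alive_s == 0:
            # The detailed check subsumes the quick one at the same instant (assumed).
            schedule.append((t, "is_alive"))
        elif t % looks_alive_s == 0:
            schedule.append((t, "looks_alive"))
    return schedule


# Illustrative values only, not recommended settings.
schedule = poll_schedule(looks_alive_s=5, is_alive_s=20, duration_s=40)
quick = [t for t, kind in schedule if kind == "looks_alive"]
detailed = [t for t, kind in schedule if kind == "is_alive"]
```

With these values the quick check runs six times in 40 seconds and the detailed check runs twice, which reflects why the Is Alive interval is typically the longer of the two.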

Adjusting the Threshold and Period Values

If the resource DLL reports that the resource is not operational, MSCS attempts to restart the resource. You can specify the

number of times MSCS can attempt to restart a resource in a given time interval. If MSCS exceeds the maximum number of

restart attempts (Threshold value) within the specified time period (Period value), and the resource is still not operational,

MSCS considers the resource to be failed.

NOTE:

See "

" to configure the Looks Alive, Is Alive, Threshold, and Period

values for a particular resource.

NOTE:

Do not adjust the Threshold and Period values unless instructed by technical support.
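The relationship between the Threshold and Period values can be sketched as a counter over a time window. The RestartPolicy class and the numbers used are hypothetical, and the sliding-window bookkeeping is an assumption: how MSCS actually tracks the period is not described here.

```python
from collections import deque


class RestartPolicy:
    """Sliding-window counter: at most `threshold` restarts per `period_s` seconds."""

    def __init__(self, threshold, period_s):
        self.threshold = threshold
        self.period_s = period_s
        self._attempts = deque()   # timestamps of recent restart attempts

    def on_not_operational(self, now_s):
        """Return 'restart' if another attempt is allowed, else 'failed'."""
        # Forget attempts that fall outside the current period.
        while self._attempts and now_s - self._attempts[0] >= self.period_s:
            self._attempts.popleft()
        if len(self._attempts) < self.threshold:
            self._attempts.append(now_s)
            return "restart"
        return "failed"


# Four failures in quick succession: the fourth exceeds a threshold of 3.
policy = RestartPolicy(threshold=3, period_s=900)
decisions = [policy.on_not_operational(t) for t in (0, 100, 200, 300)]

# The same four failures spread out never exceed the threshold within one period.
later = RestartPolicy(threshold=3, period_s=900)
spread = [later.on_not_operational(t) for t in (0, 500, 1000, 1500)]
```

In the first run the resource is treated as failed on the fourth report. In the second run every report leads to a restart attempt because older attempts age out of the period.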

Configuring Failover

You can configure a resource to fail over an entire group to another node when a resource in that group fails for any reason.

If the failed resource is configured to cause the group that contains the resource to fail over to another node, MSCS

will attempt a failover. If the number of failover attempts exceeds the group's threshold and the resource is still in a failed

state, MSCS will attempt to restart the resource. The restart attempt will be made after a period of time specified by the

resource's Retry Period On Failure property, a property common to all resources.
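The failover decision described above can be summarized in a short sketch. The next_action function, its parameter names, and the sample values are hypothetical. The branch for a resource that is not configured to affect its group is an assumption, because that case is not covered in this section.

```python
def next_action(affects_group, failover_attempts, group_threshold, retry_period_s):
    """Return (action, delay_seconds) for a resource that is in a failed state."""
    if not affects_group:
        # Not covered by the text above; assumed here that the group stays put.
        return ("leave group on current node", 0)
    if failover_attempts < group_threshold:
        return ("fail over group", 0)
    # Group threshold reached: wait, then try restarting the resource.
    return ("restart resource", retry_period_s)


first = next_action(True, failover_attempts=0, group_threshold=2, retry_period_s=3600)
exhausted = next_action(True, failover_attempts=2, group_threshold=2, retry_period_s=3600)
isolated = next_action(False, failover_attempts=0, group_threshold=2, retry_period_s=3600)
```

Here the first failure triggers a group failover. Once the group's threshold is used up, the only remaining action is a restart attempt after the retry period.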