Fault detection and recovery, Fault recovery – Allied Telesis AlliedWare Plus Operating System Version 5.4.4C (x310-26FT,x310-26FP,x310-50FT,x310-50FP) User Manual
EPSR Introduction and Configuration
Software Reference for x310 Series Switches, AlliedWare Plus™ Operating System, Version 5.4.4C, C613-50046-01 REV A
Fault Detection and Recovery
EPSR uses the following methods to detect outages in a node or a link in the ring:
■ Master node polling fault detection
■ Transit node unsolicited fault detection
Master node polling
The master node issues healthcheck messages from its primary port to check the
condition of the EPSR network ring. These messages are sent at regular intervals,
controlled by the hellotime parameter. A failover timer is set each time a healthcheck
message leaves the master node’s primary port. The timeout value for this timer is set
by the failover parameter. If the failover timer expires before the transmitted healthcheck
message is received by the master node’s secondary port, the master node assumes that
there is a fault in the ring, and implements its fault recovery procedures. Because this
method relies on a timer expiry, it is inherently slower than the transit node
unsolicited detection method described next.
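The timer relationship described above can be illustrated with a minimal Python sketch. It is not the switch's implementation: the function names `send_times` and `ring_fault_detected` are hypothetical, times are plain seconds, and each healthcheck is checked against its own deadline for simplicity.

```python
from typing import List, Optional


def send_times(hellotime: float, count: int) -> List[float]:
    """Times at which the master sends healthchecks from its primary port,
    one every `hellotime` seconds (hypothetical helper)."""
    return [i * hellotime for i in range(count)]


def ring_fault_detected(sent_at: float,
                        received_at: Optional[float],
                        failovertime: float) -> bool:
    """Return True if the failover timer set at `sent_at` expires before the
    healthcheck reaches the secondary port.

    `received_at` is None when the healthcheck never returns (ring broken).
    """
    deadline = sent_at + failovertime  # when the failover timer expires
    return received_at is None or received_at > deadline
```

For example, with a failover value of 2 seconds, a healthcheck sent at t=0 and received at t=0.5 indicates a healthy ring, while one that never returns causes the master to assume a fault once the timer expires.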
Transit node unsolicited
Transit node unsolicited fault detection relies on transit nodes detecting faults at their
interfaces, and immediately notifying master nodes about the break. When a transit node
detects a connectivity loss, it sends a “links down” message over its good link. Because a
link spans two nodes, both nodes send the “links down” message back to the master node.
These nodes also change their state from “links up” to “links down,” and change the state
of the port connecting to the broken link, from “forwarding” to “blocking.”
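The state changes a transit node makes on a link loss can be sketched as follows. This is an illustrative model only: the class `TransitNode`, the port names `port_a` and `port_b`, and the `sent` log are assumptions introduced for the example, not part of the product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TransitNode:
    """Toy transit node with exactly two ring ports (hypothetical names)."""
    name: str
    state: str = "links up"
    ports: Dict[str, str] = field(
        default_factory=lambda: {"port_a": "forwarding", "port_b": "forwarding"}
    )
    # Log of (egress port, message) pairs, standing in for real control frames.
    sent: List[Tuple[str, str]] = field(default_factory=list)

    def on_link_loss(self, broken_port: str) -> None:
        if broken_port not in self.ports:
            raise KeyError(broken_port)
        # The remaining ring port is the "good link" toward the master.
        good_port = next(p for p in self.ports if p != broken_port)
        self.ports[broken_port] = "blocking"   # forwarding -> blocking
        self.state = "links down"              # links up -> links down
        self.sent.append((good_port, "links down"))
```

Because a link spans two nodes, the same handler would run on the node at each end of the broken link, so the master receives a "links down" message from both directions.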
Fault Recovery
When the master node detects an outage in the ring by using its detection methods, it
does the following:
1. Declares the ring to be in a “failed” state.
2. Unblocks its secondary port to enable the data VLAN traffic to pass between its
   primary and secondary ports.
3. Flushes its own forwarding database (FDB) for (only) the two ring ports.
4. Sends an EPSR Ring-Down-Flush-FDB control message to all the transit nodes, via
   both its primary and secondary ports.
Transit nodes respond to the Ring-Down-Flush-FDB message by flushing their forwarding
databases for each of their ring ports. As the data starts to flow in the ring’s new
configuration, each of the nodes (master and transit) re-learns its Layer 2 addresses.
During this period, the master node continues to send healthcheck messages over the
control VLAN. This situation continues until the faulty link or node is repaired. For a
multi-domain ring, this process occurs separately for each domain within the ring.
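The recovery sequence can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the actual firmware logic: the classes `RingNode` and `MasterNode`, the port labels, and the initial ring state name "complete" are introduced for illustration, and delivery of the control message is simulated by a direct method call.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

RING_DOWN_FLUSH_FDB = "Ring-Down-Flush-FDB"


@dataclass
class RingNode:
    name: str
    ring_ports: Tuple[str, str] = ("primary", "secondary")
    # Forwarding database: port name -> set of learned MAC addresses.
    fdb: Dict[str, Set[str]] = field(default_factory=dict)

    def flush_ring_ports(self) -> None:
        # Only entries learned on the two ring ports are flushed.
        for port in self.ring_ports:
            self.fdb.pop(port, None)


@dataclass
class MasterNode(RingNode):
    ring_state: str = "complete"           # assumed name for the healthy state
    secondary_port_state: str = "blocking"

    def recover(self, transit_nodes: List[RingNode]) -> List[Tuple[str, str]]:
        self.ring_state = "failed"                 # step 1
        self.secondary_port_state = "forwarding"   # step 2: unblock secondary
        self.flush_ring_ports()                    # step 3
        # Step 4: send the control message out of both ring ports.
        sent = [(port, RING_DOWN_FLUSH_FDB) for port in self.ring_ports]
        for node in transit_nodes:
            node.flush_ring_ports()  # transit nodes react to the message
        return sent
```

Entries learned on non-ring ports are left in place, which matches the "(only) the two ring ports" wording in step 3; the flushed entries are re-learned as traffic flows over the new path.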
The following figure shows the flow of control frames under fault conditions.
Note
When VCStack is used with EPSR, the EPSR failovertime must be set to at least 5
seconds to avoid broadcast storms during failover. Broadcast storms may
occur if the switch cannot fail over quickly enough before the EPSR failovertime
expires. See the command reference for further information about the EPSR
failovertime and about VCStack failover.
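The constraint in this note can be expressed as a simple pre-deployment check. The function name `failovertime_ok` and the constant are hypothetical, and the sketch encodes only the rule stated in the note; any other limits on the failovertime value are not modelled.

```python
MIN_FAILOVERTIME_WITH_VCSTACK = 5  # seconds, from the note above


def failovertime_ok(failovertime_s: int, vcstack_enabled: bool) -> bool:
    """Return True if the planned EPSR failovertime satisfies the VCStack rule."""
    if vcstack_enabled:
        return failovertime_s >= MIN_FAILOVERTIME_WITH_VCSTACK
    # This note imposes no additional limit when VCStack is not in use.
    return True
```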