Network protection is key to successful Ethernet deployments

Parthasarathy Varadharajan

19 July 2023

Ethernet networks are relied upon to transport real-time data and critical business information, and they provide critical end-to-end services to subscribers and enterprise users. They must therefore be reliable and resilient: network designers must build them to be highly fault tolerant and capable of rapid recovery, so that service downtime is close to zero. This blog details the various transport protocols implemented in the routers and switches used by service providers to achieve a high degree of network protection and, therefore, service availability.

Protection broadly involves three phases: i) establishment of the service together with alternate/backup services, ii) failure detection, and iii) repair and recovery. A failure event in a network can be caused by link flapping, a node failure or a path failure. The protocols required for protection, monitoring and repair vary depending on the transport mechanism used (e.g. Ethernet vs IP).

Protection techniques must also restore service within 50 milliseconds, which has become the de facto industry standard. To meet such stringent performance requirements, software implementations should take advantage of the hardware assist techniques supported by silicon vendors.

One such hardware assist technique is ‘offloading’, in which the hardware, rather than the software, generates the monitoring packets and processes the received ones at a very high frequency. If expected packets stop arriving, the offload mechanism reports the error back to the software, which then switches traffic to the protected path.

To achieve protection, resources must be provisioned so that when the protected resource becomes unavailable, the backup resource takes over. Backup resources can be kept in standby mode or can actively carry traffic in a load-balanced fashion. Standby mode is easier to implement but consumes more resources, because the backup resources are used only when the primary goes down. Active mode is more difficult to implement but conserves resources, since traffic shares the protection path.
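As an illustration of the offload-and-switch loop described above, here is a minimal Python sketch. It models an offload engine that tracks when the last monitoring packet arrived on each session and, once nothing has been received for `interval × detect_multiplier`, reports a fault to software, which moves the service to its backup. The class names, the 3.33 ms interval and the multiplier of 3 (roughly 10 ms detection, in line with the figure quoted later in this blog) are assumptions for illustration only; this is not any silicon vendor's API, and real hardware performs this check in the datapath.

```python
from dataclasses import dataclass, field


@dataclass
class MonitoredSession:
    """A monitoring session (BFD- or CCM-like) tracked by a hypothetical offload engine."""
    name: str
    interval_ms: float        # expected packet interval (assumed value below)
    detect_multiplier: int    # missed intervals tolerated before declaring a fault
    last_rx_ms: float = 0.0
    up: bool = True

    @property
    def detection_time_ms(self) -> float:
        return self.interval_ms * self.detect_multiplier


@dataclass
class OffloadEngine:
    """Simulated offload: software hears only about sessions that time out."""
    sessions: dict = field(default_factory=dict)

    def add(self, session: MonitoredSession) -> None:
        self.sessions[session.name] = session

    def receive(self, name: str, now_ms: float) -> None:
        # Every received monitoring packet refreshes the session's timer.
        self.sessions[name].last_rx_ms = now_ms

    def poll(self, now_ms: float) -> list:
        """Return the sessions that failed since the last poll (the fault report)."""
        failed = []
        for s in self.sessions.values():
            if s.up and now_ms - s.last_rx_ms > s.detection_time_ms:
                s.up = False
                failed.append(s.name)
        return failed


class ProtectedService:
    """Software side: move the service to the backup path when the primary fails."""

    def __init__(self, primary: str, backup: str):
        self.primary, self.backup = primary, backup
        self.active = primary
        self.switched_at_ms = None

    def on_fault(self, failed: list, now_ms: float) -> None:
        if self.active == self.primary and self.primary in failed:
            self.active = self.backup
            self.switched_at_ms = now_ms


def simulate(primary_silent_after_ms: float = 20.0, end_ms: int = 40):
    engine = OffloadEngine()
    engine.add(MonitoredSession("primary", interval_ms=3.33, detect_multiplier=3))
    engine.add(MonitoredSession("backup", interval_ms=3.33, detect_multiplier=3))
    service = ProtectedService("primary", "backup")
    for t in range(end_ms + 1):            # 1 ms ticks; packets arrive every tick for simplicity
        engine.receive("backup", t)
        if t <= primary_silent_after_ms:   # after this point the primary delivers nothing
            engine.receive("primary", t)
        service.on_fault(engine.poll(t), t)
    return service.active, service.switched_at_ms


print(simulate())  # ('backup', 30)
```

The primary's last packet arrives at 20 ms, so the fault is reported 10 ms later and the service switches at 30 ms. Keeping this timer in hardware is what frees the software from having to process every monitoring packet itself.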

Given this background, let us examine the plethora of protocols available for protection, starting with Layer 2 Ethernet traffic and moving on to IP traffic and MPLS transport.

Ethernet links carrying Layer 2 traffic can be connected in a mesh to protect against failures. STP (Spanning Tree Protocol) blocks the redundant links from forwarding traffic, preventing loops, and makes them available again when a primary link or node fails. Because STP does not meet the convergence time requirements, its evolutions are deployed instead: RSTP (Rapid Spanning Tree Protocol), which converges much faster, and MSTP (Multiple Spanning Tree Protocol), which builds on RSTP and adds load balancing by mapping groups of VLANs to separate spanning tree instances.
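To show what "blocking redundant links" means in practice, here is a minimal Python model of the tree that STP converges to: the bridge with the lowest ID becomes the root, every other bridge keeps its shortest path towards the root, and any link not on those paths is blocked. It is a simplified sketch under stated assumptions: all links have equal cost, ties go to the lower bridge ID, and BPDUs, port costs, port IDs and protocol timers (the source of STP's slow convergence) are not modelled. The function name is hypothetical.

```python
def spanning_tree(bridges, links):
    """Simplified STP: return (root, forwarding_links, blocked_links).

    bridges: iterable of bridge IDs; the lowest ID becomes the root bridge.
    links: set of frozenset({a, b}) pairs, all assumed to have equal path cost.
    """
    root = min(bridges)
    adj = {b: set() for b in bridges}
    for link in links:
        a, b = tuple(link)
        adj[a].add(b)
        adj[b].add(a)

    # Expand outward from the root one hop at a time. Each bridge attaches to
    # the first neighbour that reaches it at the lowest cost; because each level
    # is visited in bridge-ID order, equal-cost ties go to the lower bridge ID.
    parent = {root: None}
    frontier = [root]
    while frontier:
        next_frontier = []
        for node in sorted(frontier):
            for nbr in sorted(adj[node]):
                if nbr not in parent:
                    parent[nbr] = node
                    next_frontier.append(nbr)
        frontier = next_frontier

    forwarding = {frozenset({child, up}) for child, up in parent.items() if up is not None}
    blocked = set(links) - forwarding
    return root, forwarding, blocked


bridges = [1, 2, 3, 4]
mesh = {frozenset(p) for p in [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]}

root, fwd, blocked = spanning_tree(bridges, mesh)
print(root, sorted(tuple(sorted(l)) for l in blocked))   # 1 [(2, 3), (3, 4)]

# Link 2-4 fails: recomputing the tree puts the redundant link 3-4 into forwarding.
root, fwd, blocked = spanning_tree(bridges, mesh - {frozenset({2, 4})})
print(sorted(tuple(sorted(l)) for l in blocked))          # [(2, 3)]
```

The outcome is the same whichever protocol variant is used; what STP, RSTP and MSTP differ in is how quickly the bridges agree on the new tree after a failure.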

To ensure predictable topology convergence, operators now use ring topologies instead of mesh networks. The elusive 50 ms convergence with load balancing was finally achieved with the advent of ERPS (Ethernet Ring Protection Switching). ERPS uses ECFM (Ethernet Connectivity Fault Management) as its monitoring mechanism for faster failure detection. With hardware offload support, ECFM detects failures within 10 msec, leaving ERPS a 40 msec window for switchover. Hardware-aided MAC flushing and hardware-assisted failover make 50 msec service protection via ERPS a reality. Operators can also aggregate links using LACP (Link Aggregation Control Protocol) for higher bandwidth and availability. Running micro-BFD (Bidirectional Forwarding Detection) on the member links of an LACP bundle enables faster failure detection and faster convergence of the aggregated link.
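The ERPS behaviour described above can be sketched as a small state model. In the idle state exactly one span, the Ring Protection Link (RPL), is blocked, so the ring forwards as a loop-free chain. When ECFM reports a failure, the failed span is blocked instead, the RPL is opened and MAC tables are flushed. This is a highly simplified, single-instance sketch in the spirit of ITU-T G.8032; the class, method and span names are hypothetical, R-APS messaging is collapsed into one method call, and the 10 ms + 40 ms budget check simply restates the arithmetic given in the text.

```python
from dataclasses import dataclass, field


@dataclass
class ErpsRing:
    """Highly simplified single-instance ERPS ring (G.8032 style), for illustration."""
    links: list                 # ring spans in order, e.g. ["A-B", "B-C", "C-D", "D-A"]
    rpl: str                    # Ring Protection Link, blocked by the RPL owner when idle
    state: str = "IDLE"
    blocked: set = field(default_factory=set)
    fdb_flushes: int = 0

    def __post_init__(self):
        # Idle: only the RPL is blocked, so the ring forwards as a loop-free chain.
        self.blocked = {self.rpl}

    def signal_fail(self, span: str) -> None:
        """ECFM reports loss of continuity on `span`."""
        # Nodes adjacent to the failure block it and send R-APS(SF); the RPL owner
        # then unblocks the RPL, and all nodes flush their learned MAC addresses.
        self.blocked = {span}
        self.state = "PROTECTION"
        self.fdb_flushes += 1

    def revert(self) -> None:
        """Revertive mode: after the wait-to-restore timer, block the RPL again."""
        self.blocked = {self.rpl}
        self.state = "IDLE"
        self.fdb_flushes += 1

    def forwarding_links(self) -> list:
        return [span for span in self.links if span not in self.blocked]


def within_budget(detect_ms: float, switchover_ms: float, budget_ms: float = 50.0) -> bool:
    """Detection plus switchover must fit inside the 50 ms protection budget."""
    return detect_ms + switchover_ms <= budget_ms


ring = ErpsRing(["A-B", "B-C", "C-D", "D-A"], rpl="D-A")
print(ring.forwarding_links())               # ['A-B', 'B-C', 'C-D']
ring.signal_fail("B-C")
print(ring.state, ring.forwarding_links())   # PROTECTION ['A-B', 'C-D', 'D-A']
print(within_budget(10, 40))                 # True
```

Because exactly one span is blocked in either state, the ring never forms a loop. This predictability is what makes rings easier to engineer for sub-50 ms recovery than an arbitrary mesh.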

Routing in IP networks inherently supports redundancy: the best path is used for packet forwarding, and the alternate (inferior) path, though established, is used only when the best path goes down. Because convergence is slow with plain vanilla routing, IP networks combine BFD as a monitoring protocol with fast reroute support in the routing protocols. This provides link and node protection and faster convergence. BFD sessions are offloaded to hardware after the initial packet exchange and, just like ECFM, can detect failures within 10 msec. BFD sessions associated with routing protocols speed up routing protocol convergence. To meet the 50 msec industry benchmark, backup paths are computed using the LFA (Loop-Free Alternate) mechanism and installed in the forwarding path before any failure occurs. Protection support in the hardware then switches traffic to these pre-installed backup paths on failure, preserving service availability even in deployments with large route scale. IP networks also support ECMP (Equal Cost Multi-Path), which provides both protection and load balancing. For node redundancy, IP networks can alternatively use VRRP (Virtual Router Redundancy Protocol), with BFD for monitoring.
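The LFA precomputation mentioned above can be sketched in a few lines. A router S runs shortest-path calculations and, for a destination D, accepts a neighbour N as a loop-free alternate when N's own shortest path to D cannot lead back through S, i.e. dist(N, D) < dist(N, S) + dist(S, D), the basic loop-free condition from RFC 5286. The sketch below covers only that basic link-protecting check (node-protecting LFAs need an additional condition); the function names and the four-router topology are hypothetical. Neighbours that are themselves on a shortest path are reported as primary next hops, which is also how ECMP sets arise.

```python
import heapq


def dijkstra(graph, src):
    """Shortest-path cost from src to every reachable node; graph: node -> {nbr: cost}."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue                      # stale heap entry
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def lfa_next_hops(graph, s, d):
    """Return (primary_next_hops, loop_free_alternates) at router s toward d.

    A neighbour n qualifies as an LFA when dist(n, d) < dist(n, s) + dist(s, d):
    n's own shortest path to d cannot loop back through s.
    """
    inf = float("inf")
    dist = {n: dijkstra(graph, n) for n in graph}
    best = dist[s].get(d, inf)
    # Every neighbour on a shortest path is a primary next hop (more than one = ECMP).
    primary = {n for n, c in graph[s].items() if c + dist[n].get(d, inf) == best}
    lfas = {
        n for n in graph[s]
        if n not in primary and dist[n].get(d, inf) < dist[n].get(s, inf) + best
    }
    return primary, lfas


# Hypothetical topology with symmetric link costs.
graph = {
    "S": {"A": 1, "B": 1, "C": 1},
    "A": {"S": 1, "D": 1},
    "B": {"S": 1, "D": 2},
    "C": {"S": 1},
    "D": {"A": 1, "B": 2},
}
print(lfa_next_hops(graph, "S", "D"))  # ({'A'}, {'B'})
```

B qualifies because its path to D (cost 2) is shorter than going back through S (cost 3). C is rejected because its only route to D runs through S, so sending traffic to C after a failure would loop. In a real router the LFA would be installed alongside the primary next hop so that the hardware can switch to it without waiting for the routing protocol to reconverge.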

IP/MPLS networks support a wide variety of protection schemes, again with BFD as the monitoring mechanism. RSVP-TE (Resource Reservation Protocol-Traffic Engineering) has built-in FRR (Fast Reroute) support and also allows backup tunnels to be provisioned to protect traffic-engineered tunnels. LDP (Label Distribution Protocol) supports FRR techniques, including LFA and RLFA (Remote LFA), for protection. Segment-routed LSPs (Label Switched Paths) support not only LFA and RLFA but also TI-LFA (Topology Independent LFA), which provides complete protection coverage.
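To illustrate how an RSVP-TE backup (bypass) tunnel protects traffic, the sketch below models the label operations at the point of local repair (PLR) in facility backup, as defined for RSVP-TE fast reroute in RFC 4090. Normally the PLR swaps the incoming label for the one advertised by the protected next hop. After a failure, it swaps to the label the merge point expects and pushes the bypass tunnel's label on top, so the packet reaches the merge point over the bypass and carries on along the original LSP. The entry layout, label values and interface names are hypothetical. The example uses link protection, where the merge point is the next hop itself; for node protection the merge point is the next-next hop and its label would generally differ.

```python
from dataclasses import dataclass


@dataclass
class ProtectedLfibEntry:
    """Hypothetical LFIB entry at a point of local repair (PLR) for one protected LSP."""
    in_label: int
    out_label: int           # label advertised by the protected next hop
    out_interface: str
    merge_point_label: int   # label the merge point expects for this LSP
    bypass_label: int        # label of the bypass tunnel leading to the merge point
    bypass_interface: str
    primary_up: bool = True  # cleared when BFD or link-down reports the failure


def forward(entry: ProtectedLfibEntry, stack: list) -> tuple:
    """Return (outgoing interface, outgoing label stack), top of stack first."""
    if stack[0] != entry.in_label:
        raise ValueError("label does not match this LFIB entry")
    rest = stack[1:]
    if entry.primary_up:
        # Normal operation: swap to the primary next hop's label.
        return entry.out_interface, [entry.out_label] + rest
    # Facility backup: swap to the merge point's label, then push the bypass label.
    return entry.bypass_interface, [entry.bypass_label, entry.merge_point_label] + rest


# Link protection (next-hop bypass): the merge point is the next hop itself,
# so its label equals the primary out label. All values are made up.
entry = ProtectedLfibEntry(in_label=100, out_label=200, out_interface="eth1",
                           merge_point_label=200, bypass_label=900,
                           bypass_interface="eth2")
print(forward(entry, [100]))   # ('eth1', [200])
entry.primary_up = False       # failure detected on the protected link
print(forward(entry, [100]))   # ('eth2', [900, 200])
```

Because the backup label stack is prepared in advance, the repair at failure time comes down to flipping a single flag in the forwarding entry, which is what allows switchover well within the 50 msec budget.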

Conclusion

Service providers design the edge and core of their networks with service protection and high availability as key objectives. Redundancy at every level, including link, node and path, is the cornerstone of such networks. The transport protocols and network topology vary depending on the services and applications supported. Capgemini’s ISS switching solution supports a rich set of protocols with full support for redundancy, allowing equipment manufacturers to build devices for high service scale and availability.

Author

Varadharajan is an IP/MPLS Architect with 25 years of experience in the datacom and telecommunications industries. He has been involved in developing frameworks for mobile backhaul gateways, data center switches, secure routers, Metro Ethernet devices and industrial switches.
