IEEE 802.1aq: Difference between revisions

Revision as of 15:06, 16 June 2009

802.1aq (Shortest Path Bridging and Shortest Path Backbone Bridging) combines an Ethernet data path, either IEEE 802.1ah Provider Backbone Bridges (PBB) for Shortest Path Backbone Bridging (SPBB) or IEEE 802.1Q for Shortest Path Bridging (SPB), with an IS-IS link state control protocol running between shortest path bridges (NNI links). The link state protocol is used to discover and advertise the network topology and to compute shortest path trees from all bridges in the SPB Region. In SPBB the Backbone MAC (B-MAC) addresses of the participating nodes are distributed, along with the service membership information for interfaces to non participating devices (UNI ports). Topology data is the input to a calculation engine which computes symmetric shortest path trees based on minimum cost from each participating node to all other participating nodes. In SPB these trees provide a shortest path tree on which individual MAC addresses can be learned and group address membership can be distributed. In SPBB the shortest path trees are then used to populate forwarding tables for each participating node's individual B-MAC addresses and for group addresses. Depending on the topology, several different equal cost multi path trees are possible, and SPB and SPBB support up to two algorithms per IS-IS instance. In shortest path bridging, multicast trees are sub trees of the default shortest path tree formed by (source, group) pairing.

In SPB/SPBB, as with other link state based protocols, the computations are done in a distributed fashion. Each node computes the Ethernet compliant forwarding behavior independently, based on a normally synchronized common view of the network (at scales of about 1000 nodes or less) and the service attachment points (UNI ports). Ethernet filtering database (or forwarding) tables are populated locally so that each node independently and deterministically implements its portion of the network forwarding behavior.

The two different flavors of data path give rise to two slightly different versions of this protocol. One (SPBB) is intended where isolation of non participating device MAC addresses is desired and therefore uses a full encapsulation (mac-in-mac, a.k.a. IEEE 802.1ah). The other (SPB) is intended where isolation of non participating device MAC addresses is not required and reuses only the existing VLAN tag (a.k.a. IEEE 802.1Q) on participating (NNI) links.

Chronologically Shortest Path Bridging came first with the project originally being conceived to address scalability and convergence of MSTP.

At the time the specification of Provider Backbone bridging was progressing and a group at Nortel were exploring the benefits of leveraging the PBB data plane and a link state control plane. Provider Link State Bridging (PLSB) was a strawman proposal brought by Nortel to the IEEE 802.1aq Shortest Path Bridging Working Group, in order to provide a concrete example of such a system. As IEEE 802.1aq standardisation has progressed, some of the detailed mechanisms adopted by PLSB have been replaced by functional equivalents, but all of the key concepts embodied in PLSB are being carried forward into the standard.

The two flavors (SPBB and SPB) will be described separately although the differences are almost entirely in the data plane.

Shortest Path Bridging - SPB

Shortest Path Bridging enables shortest path trees for VLAN bridges on all IEEE 802.1 data planes, and SPB is the term used in general. Recently there has been a lot of focus on SPBB, as explained, due to its ability to control the new PBB data plane and leverage certain capabilities such as removing the need to do B-MAC learning and automatically creating individual (unicast) and group (multicast) trees. SPB was actually the original project that endeavored to enable Ethernet VLANs to better utilize mesh networks.

A primary feature of Shortest Path Bridging is the ability to use link state IS-IS to learn network topology. In SPB the mechanism used to identify the tree is a different Shortest Path VLAN ID (SPVID) for each source bridge. The IS-IS topology is leveraged both to allocate unique SPVIDs and to enable shortest path forwarding for individual and group addresses. Originally targeted at small, low configuration networks, SPB grew into a larger project encompassing the latest provider control plane for SPB and harmonizing the concepts of the Ethernet data plane. Proponents of SPB believe that Ethernet can leverage link state and maintain the attributes that have made Ethernet one of the most encompassing data plane transport technologies. When we refer to Ethernet, we mean the layer 2 frame format defined by IEEE 802.3 and IEEE 802.1. Ethernet VLAN bridging (IEEE 802.1Q) is the frame forwarding paradigm that fully supports higher level protocols such as IP.

SPB defines a shortest path Region, which is the boundary between the shortest path topology and the rest of the VLAN topology (which may be any number of legacy bridges). SPB operates by learning the SPB capable bridges and growing the Region to include the SPB capable bridges that have the same Base VID and MSTID configuration digest (allocation of VIDs for SPB purposes).

SPB builds shortest path trees that support loop prevention and optionally support loop mitigation on the SPVID. SPB still allows learning of Ethernet MAC addresses, but it can also distribute multicast addresses that are used to prune the shortest path trees according to multicast membership, either through MMRP or directly using IS-IS distribution of multicast membership.

SPB builds shortest path trees but also interworks with legacy bridges running Rapid Spanning Tree Protocol and Multiple Spanning Tree Protocol. SPB uses techniques from MSTP Regions to interwork with non-SPB regions behaving logically as a large distributed bridge as viewed from outside the region.

SPB supports shortest path trees but SPB also builds a spanning tree which is computed from the link state database and uses the Base VID. This means that SPB can use this traditional spanning tree for computation of the Common and Internal Spanning Tree (CIST). The CIST is the default tree used to interwork with other legacy bridges. It also serves as a fall back spanning tree if there are configuration problems with SPB.

SPB has been designed to manage a moderate number of bridges. SPB differs from SPBB in that MAC addresses are learned on all bridges that lie on the shortest path and a shared VLAN learning is used since destination MACs may be associated with multiple SPVIDs. SPB learns all MACs it forwards even outside the SPB region.

Shortest Path Backbone Bridging - SPBB

SPBB reuses the PBB data plane which does not require that the Backbone Core Bridges (BCB) learn encapsulated Edge addresses. At the edge of the network the Edge destination MAC addresses are learned. SPBB is very similar to PLSB using the same data and control planes but the format and contents of the control messages in PLSB are not compatible.

Individual MAC frames (unicast traffic) from an Ethernet attached device that are received at the SPBB edge are encapsulated in a PBB (mac-in-mac) IEEE 802.1ah header and then traverse the IEEE 802.1aq network unchanged until they are stripped of the encapsulation as they egress back to the non participating attached network at the far side of the participating network.

Ethernet destination addresses (from UNI port attached devices) are learned over the logical LAN, and frames are forwarded to the appropriate participating B-MAC address to reach the far end Ethernet destination. In this manner Ethernet MAC addresses are never looked up in the core of an IEEE 802.1aq network. When comparing SPBB to PBB, the behavior is almost identical to a PBB IEEE 802.1ah network. PBB does not specify how B-MAC addresses are learned, and PBB may use spanning tree to control the B-VLAN. In SPBB the main difference is that B-MAC addresses are distributed or computed in the control plane, removing the B-MAC layer of learning in PBB. SPBB also ensures that the route followed is a shortest path tree.

The forward and reverse paths used for unicast and multicast traffic in an IEEE 802.1aq network are symmetric. This symmetry permits the normal Ethernet Connectivity Fault Management (CFM) of IEEE 802.1ag to operate unchanged for SPB and SPBB and has desirable properties with respect to time distribution protocols such as IEEE 1588v2. Existing Ethernet loop prevention is also augmented by loop mitigation to provide fast data plane convergence.

Group address and unknown destination individual frames are optimally transmitted to only members of the same Ethernet service. IEEE 802.1aq supports the creation of thousands of logical Ethernet services in the form of E-LINE, E-LAN or E-TREE constructs which are formed between non participating logical ports of the IEEE 802.1aq network. These group address packets are encapsulated with a PBB header which indicates the source participating address in the SA, while the DA indicates the locally significant group address this frame should be forwarded on and which source bridge originated the frame. The IEEE 802.1aq multicast forwarding tables are created based on computations such that every bridge which is on the shortest path between a pair of bridges that are members of the same service group will create the proper FDB state to forward or replicate the frames it receives to the members of that service group. Since the group address computation produces shortest path trees, there is only ever one copy of a multicast packet on any given link. Since only bridges on a shortest path between participating logical ports create FDB state, the multicast makes efficient use of network resources.

The actual group address forwarding operation is more or less identical to classical Ethernet: the B-DA+B-VID combination is looked up to find the egress set of next hops. The only difference from classical Ethernet is that reverse learning is disabled for participating bridge B-MAC addresses and is replaced with an ingress check and discard (when the frame arrives on an incoming interface from an unexpected source). Learning is however implemented at the edges of the SPBB multicast tree to learn the B-MAC to MAC address relationship for correct individual frame encapsulation in the reverse direction (as packets arrive over the interface).

Properly implemented, an IEEE 802.1aq network can support up to 1000 participating bridges and provide tens of thousands of layer 2 E-LAN services to Ethernet devices. This can be done by simply configuring the ports facing the Ethernet devices to indicate they are members of a given service. As new members come and go, the IS-IS protocol will advertise the I-SID membership changes and the computations will grow or shrink the trees in the participating node network as necessary to maintain the efficient multicast property for that service.

IEEE 802.1aq has the property that only the point of attachment of a service needs configuration when a new attachment point comes or goes. The trees produced by the computations will automatically be extended or pruned as necessary to maintain connectivity. In some existing implementations this property is used to automatically (as opposed to through configuration) add or remove attachment points for dual homed technologies such as rings to maintain optimum packet flow between a non participating ring protocol and the IEEE 802.1aq network by activating a secondary attachment point and deactivating a primary attachment point.

Failure Recovery

Failure recovery is as per normal IS-IS with the link failure being advertised and new computations being performed, resulting in new FDB tables. Since no Ethernet addresses are advertised or known by this protocol, there is no re-learning required by the SPBB core and its learned encapsulations are unaffected by a transit node or link failure.

Fast link failure detection may be performed using IEEE 802.1ag Continuity Check Messages (CCMs) which test link status and report a failure to the IS-IS protocol. This allows much faster failure detection than is possible using the IS-IS hello message loss mechanisms.

Operations and Management (OA&M)

See IEEE 802.1ag and ITU-recommendation Y.1731 (external link below)

Equal Cost Multi Tree - ECMT

Two ECMT paths are initially defined; however, many more than two are possible. ECMT in an IEEE 802.1aq network is more predictable than with IP or MPLS because of symmetry between the forward and reverse paths. The choice as to which ECMT path will be used is therefore an operator assigned head end decision, while it is a local / hashing decision with IP/MPLS.

IEEE 802.1aq when faced with a choice between two equal link cost paths uses the following logic for its first ECMT tie breaking algorithm. First if one path is shorter than the other in terms of hops, the shorter path is chosen, otherwise, the path with the minimum node identifier (IS-IS SysID) is chosen. Other ECMT algorithms are created by simply using known permutations of the SysIds. For example the second defined ECMT algorithm uses the path with the minimum of the inverse of the SysID (~SysID) and can be thought of as taking the path with the maximum node identifier. There are an infinite number of other permutations but only two are defined at present.
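The tie-breaking rule above can be sketched as a ranking function over candidate paths. This is only an illustration under stated assumptions: the text says "minimum node identifier" without saying how remaining ties are resolved, so the sketch compares the sorted list of SysIDs on each path, and the names `low_key`, `high_key` and `pick` are hypothetical. The candidate paths use small integers as stand-in SysIDs.

```python
from typing import Callable, Iterable, Sequence, Tuple

SYSID_BITS = 48  # an IS-IS SysID is 6 octets


def low_key(path: Sequence[int]) -> Tuple[int, list]:
    """Rank by hop count first, then by the sorted SysIDs on the path.

    Comparing the whole sorted list (not just the single smallest ID) is an
    assumption of this sketch; it resolves ties when two paths share a node.
    """
    return (len(path) - 1, sorted(path))


def high_key(path: Sequence[int]) -> Tuple[int, list]:
    """Same ranking, applied after bit-inverting every SysID (~SysID)."""
    mask = (1 << SYSID_BITS) - 1
    return (len(path) - 1, sorted(sysid ^ mask for sysid in path))


def pick(paths: Iterable[Sequence[int]], key: Callable) -> Sequence[int]:
    """Return the path that wins the tie-break under the given ranking."""
    return min(paths, key=key)


# Four illustrative three-hop candidates between nodes 7 and 5.
candidates = [(7, 0, 1, 5), (7, 0, 3, 5), (7, 2, 1, 5), (7, 2, 3, 5)]
low = pick(candidates, low_key)    # path favouring small identifiers
high = pick(candidates, high_key)  # path favouring large identifiers
print(low, high)
```

Because every node applies the same deterministic ranking to the same link state database, all nodes arrive at the same choice without exchanging any extra messages.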

A service is assigned to a given ECMT B-VID at the edge of the network by configuration. As a result non participating packets associated with that service are encapsulated with the VID associated with the desired ECMT end to end path. All individual and group address traffic associated with this service will therefore use the proper ECMT B-VID and be carried symmetrically end to end on the proper equal cost multi path. Essentially the operator decides which services go in which ECMT paths, unlike a hashing solution used in other systems such as IP/MPLS. Trees can support LAG groups within a tree "branch" segment where some form of hashing occurs.

This symmetric and end to end ECMT behavior gives IEEE 802.1aq a highly predictable behavior and off line engineering tools can accurately model exact data flows. The behavior is also advantageous to networks where one way delay measurements are important. This is because the one way delay can be accurately computed as 1/2 the round trip delay. Such computations are used by time distribution protocols such as IEEE 1588 for frequency and time of day synchronization as required between precision clock sources and wireless base stations.

SPBB Example

We will work through SPBB behavior on a small example, with emphasis on the shortest path trees for unicast and multicast.

The network shown below [in Figure 1] consists of 8 participating nodes numbered 0 through 7. These would be switches or routers running the IEEE 802.1aq protocol. Each of the 8 participating nodes has a number of adjacencies numbered 1..5. These would likely correspond to interface indexes, or possibly port numbers. Since 802.1aq does not support parallel interfaces, each interface corresponds to an adjacency. The port / interface index numbers are of course local and are shown because the output of the computations is an interface index (in the case of unicast) or a set of interface indexes (in the case of multicast), which is part of the forwarding information base (FIB) together with a destination MAC address and backbone VID.

Figure 1 - example nodes, links and interface indexes

The network above has a fully meshed inner core of 4 nodes 0..3 and then four outer nodes 4,5,6 and 7 each dual homed onto a pair of inner core nodes.

Normally when nodes come from the factory they have a MAC address assigned which becomes a node identifier, but for the purpose of this example we will assume that the nodes have MAC addresses of the form 00:00:00:00:N:00 where N is the node id (0..7) from Figure 1. Therefore node 2 has a MAC address of 00:00:00:00:02:00. Node 2 is connected to node 7 (00:00:00:00:07:00) via interface/5.

The IS-IS protocol runs on all the links shown since they are between participating nodes. The IS-IS hello protocol has a few additions for 802.1aq including information about backbone VIDs to be used by the protocol. We will assume that the operator has chosen to use backbone VIDs 101 and 102 for this instance of 802.1aq on this network.

The nodes will use their MAC addresses as the IS-IS SysId, join a single IS-IS level and exchange link state packets (LSPs in IS-IS terminology). The LSPs will contain node information and link information such that every node will learn the full topology of the network. Since we have not specified any link weights in this example, the IS-IS protocol will pick a default link metric for all links, and therefore all routing will be minimum hop count.
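As a rough illustration of what each node can do once it holds the full topology, the sketch below runs a breadth-first search (equivalent to shortest path first when all link metrics are equal) and lists every minimum hop path between two nodes. The link list contains only links that the text states or implies: the full mesh among nodes 0..3, node 7 attached to nodes 0 and 2, and node 5 attached to nodes 1 and 3. Nodes 4 and 6 are omitted because their attachments appear only in Figure 1. The function names are hypothetical.

```python
from collections import deque

# Links stated or implied by the text (nodes 4 and 6 deliberately omitted).
LINKS = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (7, 0), (7, 2), (5, 1), (5, 3)]


def build_adjacency(links):
    """Turn an undirected link list into a node -> neighbours mapping."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def hop_counts(adj, source):
    """Breadth-first search: with equal metrics, cost equals hop count."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def equal_cost_paths(adj, source, target):
    """Return every minimum hop path from source to target, sorted."""
    dist = hop_counts(adj, source)
    if target not in dist:
        return []
    paths = []

    def walk(node, suffix):
        # Walk backwards from the target through strictly closer nodes.
        if node == source:
            paths.append((node,) + suffix)
            return
        for prev in adj[node]:
            if dist.get(prev) == dist[node] - 1:
                walk(prev, (node,) + suffix)

    walk(target, ())
    return sorted(paths)


adj = build_adjacency(LINKS)
paths = equal_cost_paths(adj, 7, 5)
print(paths)
```

On this partial topology the search finds four equal cost paths between nodes 7 and 5, which is why a deterministic tie-breaking rule is needed before forwarding tables can be populated.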

After topology discovery the next step is distributed calculation of the unicast routes for both ECMP VIDs and population of the unicast forwarding tables (FIBs).

Figure 2 - two ECMP paths between nodes 7 and 5

Consider the route from node 7 to node 5. There are a number of equal cost paths. 802.1aq specifies how to choose two of them. The first is referred to as the Low PATH ID path. This is the path which has the minimum node id on it. In this case the Low PATH ID path is the 7->0->1->5 path (as shown in red in Figure 2). Therefore each node on that path will create a forwarding entry toward the MAC address of node 5 using the first ECMP VID 101. Conversely, 802.1aq specifies a second ECMP tie breaking algorithm called High PATH ID. This is the path with the maximum node identifier on it and in the example is the 7->2->3->5 path (shown in blue in Figure 2).

Node 7 will therefore have a FIB that among other things indicates:

MAC 00:00:00:00:05:00 / vid 101 the next hop is interface/1.
MAC 00:00:00:00:05:00 / vid 102 the next hop is interface/2.

Node 5 will have exactly the inverse in its FIB.

MAC 00:00:00:00:07:00 / vid 101 the next hop is interface/1.
MAC 00:00:00:00:07:00 / vid 102 the next hop is interface/2.

The intermediate nodes will also produce consistent results so for example node 1 will have the following entries.

MAC 00:00:00:00:07:00 / vid 101 the next hop is interface/5.
MAC 00:00:00:00:07:00 / vid 102 the next hop is interface/4.
MAC 00:00:00:00:05:00 / vid 101 the next hop is interface/2.
MAC 00:00:00:00:05:00 / vid 102 the next hop is interface/2.

And node 2 will have entries as follows:

MAC 00:00:00:00:05:00 / vid 101 the next hop is interface/2.
MAC 00:00:00:00:05:00 / vid 102 the next hop is interface/3.
MAC 00:00:00:00:07:00 / vid 101 the next hop is interface/5.
MAC 00:00:00:00:07:00 / vid 102 the next hop is interface/5.

If we had an attached non participating device at node 7 talking to a non participating device at node 5 (for example device A talks to device C in Figure 3), they would communicate over one of these shortest paths with a mac-in-mac encapsulated frame. The MAC header on any of the NNI links would show an outer source address of 00:00:00:00:07:00, an outer destination address of 00:00:00:00:05:00 and a B-VID of either 101 or 102 depending on which has been chosen for this set of non participating ports/VIDs. The header, once inserted at node 7 when the frame is received from device A, would not change on any of the links until the frame egressed back to non participating device C at node 5. All participating devices would do a simple DA+VID lookup to determine the outgoing interface, and would also check that the incoming interface is the proper next hop for the packet's SA+VID. The addresses of the participating nodes 00:00:00:00:00:00 ... 00:00:00:00:07:00 are never learned but are advertised by IS-IS as the node's SysId.
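The DA+VID lookup and the SA+VID ingress check can be sketched in a few lines. The table below holds node 1's entries from the example, written with full six octet addresses; the function name `forward` and its argument names are hypothetical, and real bridges do this in hardware rather than with a dictionary.

```python
from typing import Dict, Optional, Tuple

Fib = Dict[Tuple[str, int], int]  # (B-MAC, B-VID) -> interface index

# Node 1's unicast entries from the example.
NODE1_FIB: Fib = {
    ("00:00:00:00:07:00", 101): 5,
    ("00:00:00:00:07:00", 102): 4,
    ("00:00:00:00:05:00", 101): 2,
    ("00:00:00:00:05:00", 102): 2,
}


def forward(fib: Fib, b_sa: str, b_da: str, b_vid: int,
            in_port: int) -> Optional[int]:
    """Return the egress interface, or None if the frame must be discarded."""
    # Ingress check: the frame must arrive on the interface this node
    # would itself use to reach the source on the same B-VID.
    if fib.get((b_sa, b_vid)) != in_port:
        return None
    # Plain DA+VID lookup; an unknown destination is dropped, not flooded.
    return fib.get((b_da, b_vid))


# A frame from node 7 to node 5 on B-VID 101 arriving on interface/5.
print(forward(NODE1_FIB, "00:00:00:00:07:00", "00:00:00:00:05:00", 101, 5))
```

The same frame arriving on any other interface fails the ingress check and is discarded, which is how the check takes the place of reverse learning for participating B-MAC addresses.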

Unicast forwarding to a non participating (client, e.g. A, B, C, D from Figure 3) address is of course only possible when the first hop participating node (e.g. 7) is able to know which last hop participating node (e.g. 5) is attached to the desired non participating node (e.g. C). Since this information is not advertised by IEEE 802.1aq it has to be learned. The mechanism for learning is identical to IEEE 802.1ah. In short, if the corresponding outer MAC unicast DA is not known, it is replaced by a multicast DA, and when a response is received, the SA of that response tells us the DA to use to reach the non participating node that sourced the response. For example, node 7 learns that C is reached via node 5.
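A minimal sketch of this edge learning follows, assuming a much simplified frame model. The classes `ClientFrame`, `BackboneFrame` and `EdgeNode` are hypothetical, the group address strings are placeholders rather than real encoded addresses, and the I-SID and B-VID fields of a real 802.1ah header are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientFrame:
    c_da: str    # client destination MAC
    c_sa: str    # client source MAC
    data: bytes


@dataclass(frozen=True)
class BackboneFrame:
    b_da: str    # outer (backbone) destination address
    b_sa: str    # outer (backbone) source address
    inner: ClientFrame


class EdgeNode:
    """Edge bridge that learns client MAC -> remote B-MAC bindings."""

    def __init__(self, b_mac: str, group_da: str) -> None:
        self.b_mac = b_mac
        self.group_da = group_da   # placeholder service multicast DA
        self.learned = {}          # client MAC -> B-MAC of remote edge

    def encapsulate(self, frame: ClientFrame) -> BackboneFrame:
        # Unknown client destination: fall back to the multicast DA.
        b_da = self.learned.get(frame.c_da, self.group_da)
        return BackboneFrame(b_da=b_da, b_sa=self.b_mac, inner=frame)

    def decapsulate(self, frame: BackboneFrame) -> ClientFrame:
        # Learn which edge node the client source sits behind.
        self.learned[frame.inner.c_sa] = frame.b_sa
        return frame.inner


node7 = EdgeNode("00:00:00:00:07:00", "group-from-7")
node5 = EdgeNode("00:00:00:00:05:00", "group-from-5")

first = node7.encapsulate(ClientFrame("C", "A", b"hello"))   # C unknown
node5.decapsulate(first)                                     # learns A
reply = node5.encapsulate(ClientFrame("A", "C", b"hi"))      # unicast
node7.decapsulate(reply)                                     # learns C
second = node7.encapsulate(ClientFrame("C", "A", b"again"))  # unicast
print(first.b_da, reply.b_da, second.b_da)
```

Only the two edge nodes hold client addresses here; the transit nodes between them see nothing but the outer addresses.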

Figure 3 - per source, per service multicast for E-LAN

Since we wish to group/scope sets of non participating ports into services and prevent them from multicasting to each other IEEE 802.1aq provides for per source, per service multicast forwarding and a special multicast destination address format to provide this. Since the multicast address must uniquely identify the tree, and because there is a tree per source per unique service, the multicast address contains two components, a service component in the low order 24 bits and a network wide unique identifier in the upper 22 bits. Since this is a multicast address the multicast bit is set, and since we are not using the standard OUI space for these manufactured addresses, the Local 'L' bit is set to disambiguate these addresses. In Figure 3 above, this is represented with the DA=[7,O] where the 7 represents packets originating from node 7 and the colored O represents the E-LAN service we are scoped within.
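To make the bit budget concrete (22 + 24 bits, plus the multicast and local bits, fill the 48 bit address), here is a sketch that packs such an address. The exact placement of the 22 bit identifier around the two flag bits is an assumption of this sketch and is not taken from the standard: the six high bits of the identifier share the first octet with the flags, and the remaining 16 bits fill the next two octets. The function name `group_address` is hypothetical.

```python
def group_address(source_id: int, service_id: int) -> str:
    """Pack a per source, per service multicast DA (hypothetical layout)."""
    if not 0 <= source_id < (1 << 22):
        raise ValueError("source_id must fit in 22 bits")
    if not 0 <= service_id < (1 << 24):
        raise ValueError("service_id must fit in 24 bits")
    # First octet: six high bits of the source identifier, then the
    # local (0x02) and multicast (0x01) bits, both set.
    first_octet = ((source_id >> 16) << 2) | 0x02 | 0x01
    upper24 = (first_octet << 16) | (source_id & 0xFFFF)
    value = (upper24 << 24) | service_id
    return ":".join(f"{octet:02x}" for octet in value.to_bytes(6, "big"))


# Tree rooted at node 7 for service 200 (0xc8).
print(group_address(7, 200))  # 03:00:07:00:00:c8
```

Whatever the exact layout, the property that matters is that every node can construct the same address from the advertised source identifier and service identifier, so no multicast address needs to be signalled explicitly.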

Prior to creating multicast forwarding for a service, nodes with ports that face that service must be told they are members. For example, nodes 7, 4, 5 and 6 are told they are members of a given service, for example service 200, and further that they should be using B-VID 101. This is advertised by IS-IS and all nodes then do the SPBB computation to determine if they are participating either as a head end or tail end, or as a tandem point between other head and tail ends in the service. Since node 0 is a tandem between nodes 7 and 5, it creates a forwarding entry for packets from node 7 on this service to node 5. Likewise, since it is a tandem between nodes 7 and 4, it creates forwarding state from node 7 for packets in this service to node 4. This results in a true multicast entry where the DA/VID has outputs on two interfaces, 1 and 2. Node 2 on the other hand is only on one shortest path in this service and only creates a single forwarding entry from node 7 to node 6 for packets in this service.
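The tandem computation can be illustrated by taking the shortest paths from the source to each other member and collecting, per node, the set of next hops it must replicate to. The three paths below are the ones the text describes for the tree rooted at node 7 in service 200 on the first B-VID. The sketch records next hop node ids rather than interface indexes, since those come from Figure 1, and `multicast_state` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Set


def multicast_state(paths: Iterable[Sequence[int]]) -> Dict[int, Set[int]]:
    """Map each node on the tree to the next hops it replicates to."""
    egress = defaultdict(set)
    for path in paths:
        # Each consecutive pair on a path is one branch of the tree.
        for node, next_hop in zip(path, path[1:]):
            egress[node].add(next_hop)
    return dict(egress)


# Shortest paths from source node 7 to the other members (5, 4 and 6).
PATHS_FROM_7 = [(7, 0, 1, 5), (7, 0, 4), (7, 2, 6)]

state = multicast_state(PATHS_FROM_7)
for node in sorted(state):
    print(node, sorted(state[node]))
```

Nodes that do not appear in the result, such as node 3, install no state for this tree. Node 0 ends up with two next hops, which is the "true multicast entry" described above, while node 2 has a single next hop.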

Figure 3 only shows a single E-LAN service and only the tree from one of the members, however very large numbers of E-LAN services with membership from 2 to every node in the network can be supported by advertising the membership, computing the tandem behaviors, manufacturing the known multicast addresses and populating the FIBs. The only real limiting factors are the FIB table sizes and computational power of the individual devices both of which are growing yearly in leaps and bounds.

External links

Sources

@@ Line 51: / Line 51: @@
 Failure recovery is as per normal [[IS-IS]] with the link failure being advertised and new computations being performed, resulting in new FDB tables. Since no Ethernet addresses are advertised or known by this protocol, there is no re-learning required by the SPBB core and its learned encapsulations are unaffected by a transit node or link failure.
-Fast link failure detection may be performed using [[IEEE 802.ag]] Continuity Check Messages (CCMs) which test link status and report a failure to the IS-IS protocol. This allows much faster failure detection than is possible using the IS-IS hello message loss mechanisms.
+Fast link failure detection may be performed using [[IEEE 802.1ag]] Continuity Check Messages (CCMs) which test link status and report a failure to the IS-IS protocol. This allows much faster failure detection than is possible using the IS-IS hello message loss mechanisms.
 == Operations and Management (OA&M) ==