Comware7 IRF MAD LACP : New selection method !

Most people working with IRF will be aware of the MAD (Multiple Active (Master) Detection) process.

In this article we will review the operation of MAD LACP on Comware5 and the changes in the implementation for Comware7 devices.

IRF MAD

MAD is required to fix a split-stack situation, where a failure of the stacking network links would result in 2 or more masters on the network, which all claim to have the same MAC and IP address (which results in unpredictable topologies for LACP, xSTP, OSPF, etc.).

MAD will try to detect multiple masters and keep only 1 master online. The others will shut down their interfaces (effectively removing themselves from the network).

In Comware5, there were several MAD detection processes:

  • MAD LACP : Based on a proprietary LACP extension. Can use existing link aggregation links (in-band), but the peer must be a Comware device (recent Provision software also has MAD LACP support !). This requirement only applies to the Bridge Aggregation link used for split-brain detection of course, so all other peer devices can be Cisco, Avaya, ESX host etc.
  • MAD BFD : Based on the BFD IP protocol. Requires a dedicated link between the devices.
  • MAD ARP : Based on IPv4 ARP. (I have not used this in an implementation yet).
  • MAD ND : Based on IPv6 ND. Same principle as ARP method.

Basic principle of MAD

When there is a split stack (the stacking network links have failed), there will be 2 masters. These 2 masters should be able to reach each other via either:

  • Peer devices link aggregation (BAGG) to process MAD LACP
  • Directly connected dedicated links to process MAD BFD (direct links should follow a different physical path from the stack links for obvious reasons). An intermediate L2 switch can be used as well to save ports on the devices. Just make sure the ports are forwarding (no xSTP or other protocols which could block the link).

When the masters can reach each other through LACP or BFD, they will be able to exchange their Member ID (unit ID). The lowest Member ID will win, and the other one will shut down all its local interfaces (except the MAD-excluded interfaces).

This will result in a stable network, since only 1 master remains online.

MAD LACP on Comware5

MAD LACP on Comware5 will include the domain ID and Master Unit ID in each LACP packet. When there is a split stack, there will be 2 IRF sections, each with its own master.

Assume this starting situation, with an IRF system of 4 switches and a neighbor device to support the MAD LACP (this could be an IRF system as well, it is just simplified in this diagram):

cmw7-mad-lacp-1

Next a double link failure occurs (link from unit1 to unit2 and link from unit1 to unit4).

This will immediately result in 2 IRF systems, each with its own master. Master election on the 2,3,4 side will be done based on configured priority (highest wins), uptime and unit MAC (in this order). Assume here that unit2 had the best priority configured:

cmw7-mad-lacp-2
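
The election order can be sketched in Python as a sort key. This is only an illustration: the `Member` class, its field names and the sample values are hypothetical. The tie-break directions for uptime (longer wins) and MAC (lower wins) are assumptions, since the text above only gives the order of the criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    unit_id: int
    priority: int   # configured IRF priority, highest wins
    uptime_s: int   # assumption: longer uptime wins
    mac: str        # assumption: lower MAC wins (same format, lowercase hex)


def elect_master(members):
    """Return the member that wins the election among `members`."""
    # Highest priority first, then longest uptime, then lowest MAC.
    return min(members, key=lambda m: (-m.priority, -m.uptime_s, m.mac))


# Hypothetical right-side fragment after the split: units 2, 3 and 4.
right_side = [
    Member(unit_id=2, priority=30, uptime_s=5000, mac="00e0-fc00-0002"),
    Member(unit_id=3, priority=10, uptime_s=9000, mac="00e0-fc00-0003"),
    Member(unit_id=4, priority=10, uptime_s=9000, mac="00e0-fc00-0004"),
]

print(elect_master(right_side).unit_id)  # → 2
```

With these sample values unit2 wins on priority alone, so uptime and MAC are never consulted.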

Now, each master will send LACP packets carrying its own Unit ID, which the peer device will relay to the other member ports of the Link Aggregation group. These packets are sent over each member port, so the right-side IRF would transmit the LACP packet 3 times, while the left-side IRF would transmit it once:

cmw7-mad-lacp-3

The neighbor switch does not know where the split-stack has occurred, so whenever it receives an LACP packet on a BAGG member port, it will replicate the information to all other member ports. This means the original single (1) LACP packet from Unit1 would arrive 3 times (through unit2, unit3 and unit4, which forward the LACP packet inside the IRF stack to the master unit2):

cmw7-mad-lacp-7-2
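
The relay behavior of the neighbor switch can be modeled in a few lines of Python. This is a rough sketch, not the actual implementation: the port names and the `relay` helper are made up for illustration, and the payload is reduced to the master Unit ID only.

```python
def relay(member_ports, ingress_port, payload):
    """Replicate the MAD payload received on ingress_port to every other member port."""
    return {port: payload for port in member_ports if port != ingress_port}


# Hypothetical BAGG on the neighbor switch: one member port towards each IRF unit.
bagg_ports = ["to-unit1", "to-unit2", "to-unit3", "to-unit4"]

# The single packet from Unit1 is copied to the 3 ports facing the other fragment.
copies = relay(bagg_ports, "to-unit1", {"master_unit_id": 1})
print(sorted(copies))  # → ['to-unit2', 'to-unit3', 'to-unit4']
```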

When the other master receives the LACP packet with the remote master Unit ID, it will compare it to its own Unit ID. The lowest Unit ID will win, so:

  • If the remote Unit ID is higher : write a message in the log file about MAD master conflict, but do nothing else (this device will remain online)
  • If the remote Unit ID is lower : write a message in the log file about MAD master conflict, then shut down all local interfaces, except the configured MAD excluded interfaces (this device will be offline, removed from the network)
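
As a minimal sketch, the comparison a Comware5 master makes could be written like this in Python. The function name and the returned strings are invented for illustration. The "same ID" branch covers the normal case where the master simply receives its own relayed packet back.

```python
def comware5_mad_action(local_master_id, remote_master_id):
    """Decide what a Comware5 master does with a received master Unit ID."""
    if remote_master_id == local_master_id:
        # Our own packet, relayed back by the neighbor: no conflict.
        return "none"
    if remote_master_id > local_master_id:
        # We have the lowest ID: log the conflict and stay online.
        return "log-and-stay-online"
    # The remote master has the lowest ID: log and shut down local interfaces.
    return "log-and-shutdown"


print(comware5_mad_action(1, 2))  # unit1 sees unit2 → log-and-stay-online
print(comware5_mad_action(2, 1))  # unit2 sees unit1 → log-and-shutdown
```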

cmw7-mad-lacp-5

Resulting shutdown ports situation:

cmw7-mad-lacp-6

Although this is a very predictable mechanism, the diagrams also show that the side with the lowest master unit ID always wins. In this example, only 1 switch remains online, and the other 3 devices are shut down. The same could happen with an IRF of 8 systems of course.

MAD LACP on Comware7 : Side with most online members wins !

In Comware7 switches, the LACP proprietary TLVs have been extended with an additional field to exchange the current online members in the IRF system.

This online member count information will be included in the selection process that decides which Master will win/lose the MAD process.

The new selection process would be:

  1. Side with most online members will win
  2. If equal members, use classic method : lowest master unit ID wins
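
The two steps above can be sketched as a single sort key in Python. The `IrfSide` class and its field names are hypothetical. Only the ordering of the two criteria comes from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IrfSide:
    master_unit_id: int
    online_members: int


def comware7_mad_winner(a, b):
    """Return the side that stays online under the Comware7 MAD LACP rules."""
    # Step 1: most online members wins. Step 2: lowest master unit ID wins.
    return min((a, b), key=lambda s: (-s.online_members, s.master_unit_id))


left = IrfSide(master_unit_id=1, online_members=1)
right = IrfSide(master_unit_id=2, online_members=3)
print(comware7_mad_winner(left, right).master_unit_id)  # → 2

# Equal member counts fall back to the classic rule.
print(comware7_mad_winner(IrfSide(1, 2), IrfSide(3, 2)).master_unit_id)  # → 1
```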

So, we start from the same setup as the Comware5 example, but this time we have an IRF system with 4 Comware7 devices. The same IRF links fail, and the MAD LACP packets are exchanged. However, this time, the online member count is included:

cmw7-mad-lacp-7-1

So when this LACP packet is relayed by the neighbor switch to the remote IRF system:

cmw7-mad-lacp-7-2

The new selection process will result in Unit1 shutting down its local ports:

cmw7-mad-lacp-7-3

And the resulting final topology has only unit1 down:

cmw7-mad-lacp-7-4

So thanks to the added step in Comware7, the side with the most units online will remain online. This results in units 2, 3 and 4 remaining online, while unit 1 shuts down its interfaces.

Note1: When the reported online members is equal for both sides, the lowest master unit ID will win again.

Note2: In the diagrams, the LACP exchange is shown after the split stack failure, but this was actually already running before the split-stack. So under normal conditions, the Master will send out the extended LACP information over all the BAGG member ports, the neighbor switch will relay each packet back over the member ports, and the Master will receive its own packet information back. Since the received LACP information contains its own Master ID, nothing needs to be done.

Be careful when combining MAD methods

In Comware5, it did not matter whether you selected the LACP, BFD, etc. method or any combination of these (you could have MAD LACP and BFD running at the same time), since the outcome would always be the same.

As of Comware7, you should realize that MAD BFD, ARP and ND are still following the classic (Comware5) rules, only the MAD LACP selection process was changed.

So do not mix MAD LACP with MAD BFD methods on Comware7 devices, since this would lead to unpredictable results, depending on which method detects the split brain first.
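
To make the conflict concrete, here is a small Python sketch that applies both rule sets to the 1-versus-3 split from the example. The dictionary layout and function names are illustrative assumptions. The two rules themselves are the ones described in this article.

```python
def classic_winner(sides):
    """Classic rule (MAD BFD, ARP, ND): lowest master unit ID wins."""
    return min(sides, key=lambda s: s["master"])


def comware7_lacp_winner(sides):
    """Comware7 MAD LACP: most online members wins, then lowest master unit ID."""
    return min(sides, key=lambda s: (-len(s["members"]), s["master"]))


# The split from the example: unit1 alone versus units 2, 3 and 4.
split = [
    {"master": 1, "members": [1]},
    {"master": 2, "members": [2, 3, 4]},
]

print(classic_winner(split)["members"])        # → [1]
print(comware7_lacp_winner(split)["members"])  # → [2, 3, 4]
```

The two methods keep different sides online for the same failure, so the surviving side would depend on which method detects the split first.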


19 Responses to Comware7 IRF MAD LACP : New selection method !

  1. vmrulz says:


    Hey I just found your blog. Great to see somebody sharing comware information… it is like trying to find a needle in a haystack compared to Cisco IOS.
    Question regarding MAD in v5. We had a VAR set up an IRF with 3 7506’s. Two of them have a MAD BFD linkage while the 3rd (in a separate building) has no MAD linkage. To me this is a mistake but I can’t find anything to back up this conclusion. What do you think?

    dis mad verb
    Current MAD status: Detect
    Excluded ports(configurable):
    Excluded ports(can not be configured):
    Ten-GigabitEthernet1/2/0/1
    Ten-GigabitEthernet1/2/0/2
    Ten-GigabitEthernet1/3/0/1
    Ten-GigabitEthernet1/3/0/2
    Ten-GigabitEthernet2/2/0/1
    Ten-GigabitEthernet2/2/0/2
    Ten-GigabitEthernet2/3/0/1
    Ten-GigabitEthernet2/3/0/2
    Ten-GigabitEthernet3/8/0/1
    Ten-GigabitEthernet3/8/0/2
    Ten-GigabitEthernet3/9/0/1
    Ten-GigabitEthernet3/9/0/2
    MAD LACP disabled.
    MAD BFD enabled interface:
    Vlan-interface500
    mad ip address 172.16.254.9 255.255.255.252 member 1
    mad ip address 172.16.254.10 255.255.255.252 member 2

    Regards
    Ron

    • Hi Ron,

      The principle is that all members should be participating in the MAD process, since you do not know in advance where the split brain will occur.
      In your example with the 2 buildings, the link between the 2 buildings would be more likely to go down as opposed to links directly connected between 2 nearby devices.
      This means that MAD would be essential to operate with the remote building switch as well.

      However, I cannot just call this a mistake, since I do not know the original design requirements.
      For example if it is accepted that split brain can occur and no access devices are dual-homed to both 7500 in building1 AND building2, it could be considered as acceptable.
      When the inter-building link would go down, you just get 2 network islands, so there is no confusion on the network (you do not have a single network with 2 independent devices operating with the same IP/MAC, each island would see a device with the same IP/MAC, but since the islands are totally isolated, this is not a problem). When the link is restored, the remote site 7500 will need to be rebooted to join the IRF (can be automatic if configured), but then the problem is solved.

      It is also easy for me to say “just enable MAD”, but you have to do it properly, otherwise you are better off not configuring it, to avoid a false feeling of split brain security.
      This means for instance : MAD is just a protocol (L2 LACP or L3 BFD) which needs an ethernet link. However, the link used to transport the MAD traffic SHOULD NOT use the same physical path as the links which are used for IRF.
      Suppose you have 2 physical paths between the 2 buildings, and each path has e.g. 6 fibers.
      With this setup, you can build an IRF (using links from both paths), so the IRF will remain intact when either of the 2 paths would fail.
      However, when you would use one (or even 2 fibers, 1 from each path) of the remaining fibers for the MAD traffic, it would not make any sense.

      In case 1 path fails (e.g. road works typically do not cut a single fiber, but all fibers 😦 ), IRF would still work on the remaining path, and MAD would also still work on the remaining path.
      When the second path would fail as well, your IRF would be split, so you will get a new master in the remote site. But since the last MAD transport link is now broken as well, MAD cannot detect the 2 Masters anymore, so you are still left with 2 active Masters …

      I hope this example shows that in some scenarios it just does not make sense to configure MAD, since it does not bring any added value to the table.

      best regards,Peter

  2. vmrulz says:

    Hello Peter,
    Thank you for the detailed reply. I’m a long time server/storage guy new to comware this year and frankly a junior network guy in general forced into yet another role since we don’t have budget for a real network eng.
    What you said makes sense I think. In our scenario the lone building 2 switch feeds downstream edge switches through BAGG’s and some lone server devices. None of these are homed to the switches in the other building. My concern was that with IRF links broken between the buildings we’d have two islands representing gateway 10.1.1.1.. but if the end devices don’t have a path to both islands then no big deal… I think I just worked this out in my pea brain.

  3. Pingback: IRF MAD Detection - Flomain Networking

  4. Léo says:

    Is MAD required even if the switch has no self IP address ?

    • Yes, since it may be involved in a Layer2 protocol (STP, LACP, …) which is using the switch MAC Address as identifier. In case you have a split brain and no MAD configured, there would be 2 switches on the network with the same STP BridgeID or LACP ID. That can be fun to troubleshoot…

  5. Mostafa says:

    Hello Peter. I hope you are doing well. I have a question regarding MAD LACP. Do we need to enter the command “mad enable” under the bridge aggregation on the intermediate switch ? Or is it only required on the IRF fabric we want to detect the split brain on ? And what if the intermediate device was an IRF fabric as well, how will a split brain be detected by the MAD LACP process ? Do we need to configure something on the upstream switch ?

    • Hi Mostafa, nice to hear you again 🙂

      1/ The command “mad enable” is only required on the bridge-aggregation of the IRF device.
      When the intermediate device is a Comware based system, it will automatically “proxy” the received extended LACP information over the other member ports of its link-aggregation. So no config required on the intermediate Comware switch (both 5/7).
      When the intermediate device is a Provision (ArubaOS-Switch) based system, you must manually enable the “mad lacp” pass-through (see https://abouthpnetworking.com/2014/11/08/provision-support-for-irf-mad-lacp-split-brain-detection/) .

      2/ When the other system is also an IRF fabric, a false-positive could occur (on paper). In reality however, the extended LACP does not just contain the master id, but also the IRF domain ID. Since the domain ID must be unique per IRF system (this is the real reason why you must set a domain ID), the IRF systems will be able to distinguish in the LACP packets their own Master ID information and the “proxied” Master ID information, since the domain IDs for those 2 sections in the LACP packet will be different. So you would simply have 2 extension TLVs in the LACP packet when you have IRF – IRF, as opposed to 1 when a single IRF fabric would be monitored.

      I hope this clarifies the options!

      • Léo Scharf says:

        Hello there,

        If I may jump in and ask a related question.

        What happens when I enable MAD LACP on an aggregate going to a Flex10 Virtual Connect module ?
        Will it work ?

        Thanks !

      • Hi Léo, sure you can!
        Flex10 runs a different OS, so it does not recognize the MAD LACP extension and it will not proxy/mirror the information back over the other ports to the IRF, so your MAD will effectively not work.
        Most network devices I have seen do not freak out when they get the extended LACP information (which they do not understand), so it will typically not ‘break’ the normal LACP behavior on the link-aggregation either. But better leave it disabled on these types of link-aggregations, since it will never work anyway.
        In case you would be running the 6125XG/6125XLG type of blade switches (these run Comware7), it would work fine.

      • Mostafa says:

        Thanks Peter. So we only need to change the domain ID on the intermediate IRF fabric. Another quick one 🙂 Must the bridge aggregation be dedicated only to MAD LACP ? Or can we use this bridge aggregation as a trunk as well ? I have a scenario where we have 2x 5900 (RJ-45 ports) in an IRF fabric for management and it is connected to an IRF fabric for production (SFP+ ports) and then the uplink is to a Nexus switch. Can we utilize the uplink between the management and the production fabrics for MAD LACP ? Or must there be a dedicated bridge aggregation for MAD LACP only ?

      • Hi Mostafa,
        The whole point of the MAD LACP is that an existing “production” link can be used for the split-brain detection, so yes, this link-aggregation can be a normal VLAN trunk, no dedicated bridge-aggregation required.

  6. Mostafa says:

    Thank you very much for your information Peter 🙂
    Are there any verification commands to ensure that MAD LACP is configured correctly ? Or will we only know when a split-brain occurs and we check whether it behaves as expected ?

    • Mostafa Hijazi says:

      Hello Peter,

      I hope all is well. I just want to revisit this question I asked some time ago. It’s the same scenario as before: 2 IRF fabrics, each fabric is 2 switches. They share a bridge aggregation. I configured MAD LACP on this bridge aggregation on both IRF fabrics, and each IRF has a unique domain ID, 1 and 2. When I tested the MAD detection by disconnecting the IRF ports, it was able to detect it on fabric 1. But when I tried to disconnect the IRF ports on fabric 2, it did not work. Both switches were still online. Will it work this way ? Or do we need another bridge aggregation for MAD LACP on fabric 2 ?

  7. John says:

    Hi, great write-up, thanks for providing such detailed, clear advice. I have a question however.
    If I have a distribution switch with say 10 BAGG’s to various access switch IRF clusters, should I enable MAD on all of them? How many would be the ideal number, to provide a stable MAD environment on the DS switch, is one enough?
    Thanks
    John

    • Hi John,
      No, there is no real need. Just ask yourself: “How many different detection systems do I need for a split brain?”.
      Split brain typically only happens after a double-failure (2 links in the stack down), which should be a very rare condition anyway. And only under these circumstances you will need the MAD detection. So if you want to be really sure, you can enable it on 2 link-aggregations, but normally 1 is already sufficient.

  8. Rob says:

    Hi,
    If I have a standalone IRF pair of switches, do I have to use MAD BFD? There are several bridge aggregation interfaces using LACP, but these connect to servers.
    Thanks,
    Rob

  9. Seb says:

    Hey Peter,
    how do I recover from a recovery state if a Multi Active Master has happened?
    Does it do it automatically if the IRF links are restored or do I need to do something?
    Thanks,
    Seb
