When you use IRF to group multiple Comware switches into one logical device, it is generally recommended to enable some form of split-brain detection (a split brain happens when all the stacking links are down).
Until now, only a Comware switch could act as the peer for the MAD LACP method. The Provision switch firmware has now been updated, so an LACP link between a Provision switch and a Comware IRF can also be used for MAD LACP.
Background
The split brain detection mechanism, which is known as Multiple Active Detection (MAD), is available through LACP, BFD, ARP and ND. I personally only use the LACP and BFD methods.
The LACP method is easy, since you can use an existing link-aggregation to a peer switch for the MAD detection. However, this uses an extended LACP PDU, using an additional TLV in the LACP packet. The TLV contains the active master ID of the IRF system.
When the peer LACP device receives this information, it should proxy this information back to the original IRF system over all the other ports of the link-aggregation.
As a result, the IRF system receives its own ID information back on the other ports of the link-aggregation.
When the ID is the same, everything is OK. When the IDs are different however, it means there is a split brain.
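The comparison described above can be sketched in a few lines of Python. This is only a simplified model, not the actual Comware or Provision implementation: the names MadTlv, relay and split_brain_detected are hypothetical, and carrying the domain ID in the TLV is an assumption based on the domain ID prompt shown in the configuration further down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MadTlv:
    """Hypothetical stand-in for the extra TLV in the extended LACP PDU."""
    domain_id: int  # assumption: the IRF domain ID travels with the TLV
    active_id: int  # ID of the master that sent the PDU


def relay(tlv, ingress_port, lag_ports):
    """Peer switch: proxy the TLV back out of every other LAG member port."""
    return {port: tlv for port in lag_ports if port != ingress_port}


def split_brain_detected(own, received):
    """IRF side: same domain but a different active ID means a conflict."""
    return received.domain_id == own.domain_id and received.active_id != own.active_id


lag_ports = ["23", "24"]  # peer ports facing IRF member 1 and member 2

# Healthy IRF: a single master (ID 1) is seen on both links.
master = MadTlv(domain_id=1, active_id=1)
echoed = relay(master, "23", lag_ports)
print(split_brain_detected(master, echoed["24"]))  # False

# Split brain: member 2 has elected itself master (ID 2).
member2 = MadTlv(domain_id=1, active_id=2)
echoed = relay(member2, "24", lag_ports)
print(split_brain_detected(master, echoed["23"]))  # True
```

The point of the model is that the peer switch never interprets the TLV; it only has to send it back out of the other member ports, which is exactly the part other LACP implementations were missing.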
Provision support
So far, only Comware devices could be used as the peer detection device, since any other vendor's LACP implementation would not proxy the additional TLV back over the other link-aggregation ports.
Now the Provision firmware has received an update and Provision switches can be used to provide the MAD LACP support.
This is good news for any mixed Provision-Comware networks, where MAD BFD may not have been possible for whatever reason.
Example configuration
The example uses a 3500 running K.15.16.0004, so all 5400/3500/3800/2920 switches support it. The 2620 can also be used with current firmware (check the release notes).
The Comware IRF is based on a 3600, a 100Mbit Comware switch (ports used are Ex/0/x, as opposed to Gx/0/x for a Gigabit switch)
The example assumes an IRF system is running already, so it shows only the steps to enable MAD LACP between the Comware IRF and Provision switch.
Steps:
- On Comware IRF, define an LACP link-aggregation to Provision
- On Comware IRF, enable the link-aggregation for MAD LACP
- On Provision, define an LACP link-aggregation to Comware IRF
- On Provision, enable the link-aggregation to perform MAD LACP pass-through
IRF: define LACP link-aggregation to Provision
# Define Bridge Aggregation 24
[switch-irf] interface bridge 24
# Enable LACP
[switch-irf-Bridge-Aggregation24] link-aggregation mode dynamic
[switch-irf-Bridge-Aggregation24] quit
# Assign 2 physical interfaces to BAGG 24
[switch-irf] int range e1/0/24 e2/0/24
[switch-irf-if-range] port link-aggregation group 24
%Jan 1 00:23:34:889 2010 switch-irf LAGG/5/LAGG_ACTIVE: Member port Ethernet1/0/24 of aggregation group BAGG24 becomes ACTIVE.
%Jan 1 00:23:34:919 2010 switch-irf IFNET/3/LINK_UPDOWN: Bridge-Aggregation24 link status is UP.
[switch-irf-if-range]
%Jan 1 00:23:36:479 2010 switch-irf LAGG/5/LAGG_ACTIVE: Member port Ethernet2/0/24 of aggregation group BAGG24 becomes ACTIVE.
[switch-irf-if-range] quit
[switch-irf]
IRF: enable link-aggregation for MAD LACP
# Enter the BAGG
[switch-irf] int bridge 24
# Enable MAD LACP, assign a domain ID (should be unique per IRF system in your network)
[switch-irf-Bridge-Aggregation24] mad enable
You need to assign a domain ID (range: 0-4294967295)
[Current domain is: 0]: 1
The assigned domain ID is: 1
Info: MAD LACP only enable on dynamic aggregation interface.
[switch-irf-Bridge-Aggregation24]quit
# Review MAD Configured methods
[switch-irf] display mad
MAD ARP disabled.
MAD LACP enabled.
MAD BFD disabled.
# Review MAD verbose configuration
[switch-irf] display mad verbose
Current MAD status: Detect
Excluded ports(configurable):
Excluded ports(can not be configured):
  GigabitEthernet1/0/25
  GigabitEthernet1/0/26
  GigabitEthernet2/0/25
  GigabitEthernet2/0/26
MAD ARP disabled.
MAD enabled aggregation port:
  Bridge-Aggregation24
MAD BFD disabled.
[switch-irf]
Provision: define LACP link-aggregation to Comware
# Create trk object with LACP protocol enabled
HP-3500-24(config)# trunk 23,24 trk1 lacp
# Review LACP status
HP-3500-24(config)# show lacp
                         LACP
        LACP     Trunk    Port               LACP     Admin   Oper
 Port   Enabled  Group    Status   Partner   Status   Key     Key
 -----  -------  -------  -------  -------   -------  ------  ------
 23     Active   Trk1     Up       Yes       Success  0       290
 24     Active   Trk1     Up       Yes       Success  0       290
# Review LACP detailed peer information
HP-3500-24(config)# show lacp peer
LACP Peer Information.
System ID: 2c27d7-79dc80
 Local   Local                           Port       Oper     LACP      Tx
 Port    Trunk   System ID       Port    Priority   Key      Mode      Timer
 ------  ------  --------------  -----   ---------  -------  --------  -----
 23      Trk1    b8af67-38764b   24      32768      1        Active    Slow
 24      Trk1    b8af67-38764b   54      32768      1        Active    Slow
HP-3500-24(config)#
# On Comware, verify LACP detailed peer information
[switch-irf] display link-aggregation verbose Bridge-Aggregation 24
Loadsharing Type: Shar -- Loadsharing, NonS -- Non-Loadsharing
Port Status: S -- Selected, U -- Unselected
Flags: A -- LACP_Activity, B -- LACP_Timeout, C -- Aggregation,
       D -- Synchronization, E -- Collecting, F -- Distributing,
       G -- Defaulted, H -- Expired
Aggregation Interface: Bridge-Aggregation24
Aggregation Mode: Dynamic
Loadsharing Type: Shar
System ID: 0x8000, b8af-6738-764b
Local:
  Port        Status  Priority  Oper-Key  Flag
--------------------------------------------------------------------------------
  Eth1/0/24   S       32768     1         {ACDEF}
  Eth2/0/24   S       32768     1         {ACDEF}
Remote:
  Actor       Partner  Priority  Oper-Key  SystemID                 Flag
--------------------------------------------------------------------------------
  Eth1/0/24   23       0         290       0xdc80, 2c27-d779-dc80   {ACDEF}
  Eth2/0/24   24       0         290       0xdc80, 2c27-d779-dc80   {ACDEF}
[switch-irf]
Provision: enable MAD LACP Pass-through
# Enable MAD LACP TLV pass-through (not enabled by default)
HP-3500-24(config)# interface trk1 lacp mad-passthrough enable
# Review MAD LACP configuration
HP-3500-24(config)# show lacp mad-passthrough
 Trunk-Group    LACP-MAD-PASSTHROUGH
 ------------   ---------------------
 Trk1           Enabled
# Review MAD LACP counters
HP-3500-24(config)# show lacp mad-passthrough counters
                  MAD Passthrough    MAD Passthrough    MAD Passthrough
 Port    Trunk    PDUs Tx            PDUs Rx            PDUs Dropped
 ------  ------   ----------------   ----------------   ----------------
 23      Trk1     4                  10                 6
 24      Trk1     4                  11                 7
HP-3500-24(config)#
Validation
The setup is validated by shutting down the IRF links to force a split brain. The commands are executed on the IRF member 2 console, which will shut down its ports as a result of the detection.
# Forced shutdown of IRF stacking links
[switch-irf] int range g1/0/25 g1/0/26
[switch-irf-if-range] shutdown
# Console is logged out, since new Master is selected for this partition
<switch-irf>
#Jan 1 00:39:15:197 2010 switch-irf SHELL/4/LOGIN: Trap 1.3.6.1.4.1.25506.2.2.1.1.3.0.1: login from Console
%Jan 1 00:39:15:353 2010 switch-irf SHELL/5/SHELL_LOGIN: Console logged in from aux1.
# Initial ENTER commands need to wait for the Management to become available again
System is busy in recovering configuration, please wait a moment...
System is busy in recovering configuration, please wait a moment...
# New console login is now effective
<switch-irf>
# No console messages have been seen, since the console was not active yet
# so review the log file
<switch-irf> dis logbuffer reverse
Logging buffer configuration and contents:enabled
Allowed max buffer size : 1024
Actual buffer size : 512
Channel number : 4 , Channel name : logbuffer
Dropped messages : 0
Overwritten messages : 0
Current messages : 10
%Jan 1 00:39:30:497 2010 switch-irf SHELL/6/SHELL_CMD: -Task=au1-IPAddr=**-User=**; Command is dis logbuffer reverse
%Jan 1 00:39:15:513 2010 switch-irf SHELL/5/SHELL_LOGIN: Console logged in from aux1.
%Jan 1 00:39:14:878 2010 switch-irf IFNET/3/LINK_UPDOWN: Bridge-Aggregation24 link status is DOWN.
%Jan 1 00:39:14:863 2010 switch-irf LAGG/5/LAGG_INACTIVE_PHYSTATE: Member port Ethernet2/0/24 of aggregation group BAGG24 becomes INACTIVE because the port's physical state (down) is improper for being attached.
%Jan 1 00:39:14:862 2010 switch-irf LAGG/5/LAGG_INACTIVE_CONFIGURATION: Member port Ethernet1/0/24 of aggregation group BAGG24 becomes INACTIVE because the port's configuration is improper for being attached.
%Jan 1 00:39:14:862 2010 switch-irf IFNET/3/LINK_UPDOWN: Ethernet2/0/24 link status is DOWN.
%Jan 1 00:39:14:862 2010 switch-irf MAD/1/MAD_COLLISION_DETECTED: Multi-active devices detected, please fix it.
%Jan 1 00:39:14:607 2010 switch-irf STM/3/STM_LINK_STATUS_DOWN: IRF port 2 is down.
%Jan 1 00:39:14:607 2010 switch-irf HA/5/HA_SLAVE_TO_MASTER: Slave board in slot 2 changes to master.
%Jan 1 00:00:33:320 2010 switch-irf IC/6/SYS_RESTART: -Slot=1; System restarted -- HP Platform Software.
<switch-irf>
# Verify MAD Process has shutdown all interfaces,
# so only the other IRF Member remains online on the network
<switch-irf> dis interface brief down
The brief information of interface(s) under bridge mode:
Link: ADM - administratively down; Stby - standby
Interface    Link  Cause
BAGG24       DOWN  MAD ShutDown
Eth2/0/1     DOWN  MAD ShutDown
Eth2/0/2     DOWN  MAD ShutDown
Eth2/0/3     DOWN  MAD ShutDown
Eth2/0/4     DOWN  MAD ShutDown
Eth2/0/5     DOWN  MAD ShutDown
Eth2/0/6     DOWN  MAD ShutDown
Eth2/0/7     DOWN  MAD ShutDown
Eth2/0/8     DOWN  MAD ShutDown
Eth2/0/9     DOWN  MAD ShutDown
Eth2/0/10    DOWN  MAD ShutDown
Eth2/0/11    DOWN  MAD ShutDown
Eth2/0/12    DOWN  MAD ShutDown
Eth2/0/13    DOWN  MAD ShutDown
Eth2/0/14    DOWN  MAD ShutDown
Eth2/0/15    DOWN  MAD ShutDown
Eth2/0/16    DOWN  MAD ShutDown
Eth2/0/17    DOWN  MAD ShutDown
Eth2/0/18    DOWN  MAD ShutDown
Eth2/0/19    DOWN  MAD ShutDown
Eth2/0/20    DOWN  MAD ShutDown
Eth2/0/21    DOWN  MAD ShutDown
Eth2/0/22    DOWN  MAD ShutDown
Eth2/0/23    DOWN  MAD ShutDown
Eth2/0/24    DOWN  Link-Aggregation interface down
GE2/0/25     DOWN  Not connected
GE2/0/26     DOWN  Not connected
GE2/0/27     DOWN  MAD ShutDown
GE2/0/28     DOWN  MAD ShutDown
<switch-irf>
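With many ports, checking the Cause column by eye gets tedious. As a rough sketch, a few lines of Python can pull the MAD-disabled interfaces out of captured "dis interface brief down" output. The function name mad_shutdown_ports is hypothetical, and the parsing assumes each row has the shape "interface, DOWN, cause" as in the listing above; other Comware releases may format this differently.

```python
def mad_shutdown_ports(output):
    """Return interface names whose Cause column reads 'MAD ShutDown'."""
    ports = []
    for line in output.splitlines():
        fields = line.split()
        # Assumed row shape: <interface> DOWN <cause words...>
        if len(fields) >= 3 and fields[1] == "DOWN":
            if " ".join(fields[2:]) == "MAD ShutDown":
                ports.append(fields[0])
    return ports


# Shortened sample taken from the listing above.
sample = """\
Interface Link Cause
BAGG24 DOWN MAD ShutDown
Eth2/0/1 DOWN MAD ShutDown
Eth2/0/24 DOWN Link-Aggregation interface down
GE2/0/25 DOWN Not connected
GE2/0/27 DOWN MAD ShutDown
"""

print(mad_shutdown_ports(sample))  # ['BAGG24', 'Eth2/0/1', 'GE2/0/27']
```

The IRF ports (GE2/0/25 and GE2/0/26) and the link-aggregation member port show a different cause, so they are not in the result.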
On the Provision side, verify that the link-aggregation has only 1 active port remaining:
HP-3500-24(config)# show lacp
                         LACP
        LACP     Trunk    Port               LACP     Admin   Oper
 Port   Enabled  Group    Status   Partner   Status   Key     Key
 -----  -------  -------  -------  -------   -------  ------  ------
 23     Active   Trk1     Up       Yes       Success  0       290
 24     Active   Trk1     Down     No        Success  0       290
Supplemental validation
Run another split-brain check, to see the console output. This can be forced using the mad restore command.
The original mad restore command is intended to be used in this rare occasion:
* IRF configured between switches (example SW1/SW2)
* Split brain occurs, SW2 MAD detects it and shuts down all interfaces
* SW1 is the only surviving node, network still ok
* SW1 encounters a power failure, so the network is down (SW2 has all ports down, so no more network)
* Instead of performing a full reboot of SW2 to get it online again, the admin can use mad restore on SW2 to enable the interfaces again. Network will be back online after this command, since there is no more split brain condition (SW1 is powered down).
You can abuse this functionality to run multiple split-brain tests without a full reboot of the switches; this is what is done in this example.
Since the mad restore will be done on member2, the interfaces will come UP, MAD LACP will detect the split brain again, and all interfaces will be SHUTDOWN again. But this time you can follow the process on the console log output as well.
[switch-irf] mad restore
This command will restore the device from multi-active conflict state. Continue? [Y/N]:y
Restoring from multi-active conflict state, please wait...
[switch-irf]
%Jan 1 00:52:42:448 2010 switch-irf IFNET/3/LINK_UPDOWN: Ethernet2/0/24 link status is UP.
%Jan 1 00:52:42:568 2010 switch-irf MAD/1/MAD_COLLISION_DETECTED: Multi-active devices detected, please fix it.
%Jan 1 00:52:42:709 2010 switch-irf IFNET/3/LINK_UPDOWN: Ethernet2/0/24 link status is DOWN.
[switch-irf]
Conclusion
This example shows how a Provision LACP link-aggregation can be used to assist a Comware IRF system for the split brain detection.
Hello, quick question about enabling LACP MAD on a bridge aggregation link: it prompts you for the domain ID, which I leave as the ID of the IRF pair I am on. But why does it prompt for this, and in what scenarios would you choose a different domain ID from that of the switch you are on?
OK, figured it out. I thought I was setting a domain per bridge-aggregation interface, but it seems to change the whole switch's IRF domain.
Hi, I have a couple of questions.
1. I am using route aggregation interfaces exclusively between the core/distribution layers. Can MAD LACP be enabled on a route aggregation interface in the same way as a bridge aggregation interface? The command is accepted, but I cannot find a reference to a MAD/RAGG config example.
2. Some references state that MAD needs to be enabled on both ends of a link and others do not. Is this necessary for the comware LACP extensions to work properly?
Hi Rob,
1. The MAD information is part of the LACP packet exchange, so if LACP is OK, MAD is OK. In fact, from a link-aggregation point of view, there is no difference between a BAGG with LACP and a RAGG with LACP. It is only a switch-local config difference with regard to routing/switching over the link-aggregation, so it has no impact on the MAD LACP process.
2. Both ends must understand the MAD LACP extensions. Comware switches understand these LACP extensions by default, no config required except for enabling LACP (so target does not need to be an IRF, just a Comware device). When you have a non-Comware device, such as the ArubaOS-switch (Provision), you need to enable support for the MAD LACP extensions manually.