Comware5 ISSU: Unknown (manual procedure)

Remember to read the basic ISSU article first:

https://abouthpnetworking.com/2014/03/24/comware5-issu-basics/

In the previous post we have seen the:

In this article, we will look at the detailed steps of the Unknown ISSU.

Unknown ISSU – Disclaimer

Just to make it clear: when the switch reports the Unknown ISSU state, HP says ISSU is not possible. Period.

So the procedure below is just my own way to get an ISSU-like firmware update with minimal downtime. I have used it many times, but use it at your own risk!

My view: if the normal HP rules require a full reboot anyway, you may as well use the procedure below, but it is still your call!

Make sure you have console access to both units during the process.

Note: I did have the procedure reviewed by HP, and an HP engineer confirmed it seems fine, but that is not an official statement.

Unknown ISSU – Overview

This is exactly the same procedure as the Incompatible ISSU, but with manual steps instead of the “issu” CLI command series. Again, MAD is a core technology of this update procedure, so make sure IRF MAD is properly configured and operational before you start.

I highly recommend using MAD BFD (rather than MAD LACP) for this procedure, for 2 reasons:

  • Management: it is very easy to temporarily disable MAD BFD by shutting down the MAD BFD VLAN interface (this is needed later in this procedure).
  • Version stability: I am (ab)using split brain detection for a firmware update. Keep in mind that MAD split brain detection is designed to detect a split in a previously running IRF system (meaning: all units running the same software version). I once had an issue where the MAD LACP TLV format (which is proprietary) seemed to have changed between versions. MAD LACP worked with firmware version 1 and it worked with firmware version 2, but it did not detect a split brain between version 1 and version 2 😦. BFD is a standard protocol, so MAD BFD just works, whether you run version 1, version 2 or a combination of both.

Since I cannot use the issu CLI commands, there are more steps, but the flow is very similar to the Incompatible upgrade method.

And I cannot repeat this enough: do not save the configuration at any point during this procedure! (The same applies to the other ISSU procedures.)

Detailed steps for Unknown ISSU

These are the steps for the Unknown ISSU / Manual procedure:

  1. Save configuration and check device
  2. Download software
  3. Update unit 1 and 2 boot-loader
  4. Unit 2 reboot
  5. Unit 1 force split brain
  6. Verify unit 2 is operational but disconnected from network
  7. Disable MAD processing to prevent a false-positive split brain
  8. Synchronous commands: reboot and mad restore
  9. Verify IRF is operational again
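The step list, together with the "save only at the start and at the end" rule, can be sketched as a small Python check. This is a hypothetical helper, not an HP tool: `PLAN` is simply the commands used in this article, typed in by hand in execution order across the two consoles, and `save_positions_ok` only checks where `save` appears in that list.

```python
from typing import List

# Commands from this article in execution order (hypothetical, hand-typed plan).
PLAN: List[str] = [
    "save",                                    # save configuration and check device
    "display device",
    "display irf",
    "display mad verbose",
    "boot-loader file R2500.bin slot 1 main",  # update boot-loader
    "boot-loader file R2500.bin slot 2 main",
    "display boot-loader",
    "reboot slot 2",                           # unit 2 reboot
    "display irf configuration",               # force split brain (unit 1 console)
    "interface range ten 1/0/25 ten 1/0/26",
    "shutdown",
    "quit",
    "display version",                         # verify unit 2 (unit 2 console)
    "display interface brief down",
    "display mad verbose",                     # disable MAD BFD (unit 1 console)
    "interface vlan 4001",
    "shutdown",
    "quit",
    "reboot",                                  # unit 1 console: answer N to saving
    "system-view",                             # unit 2 console
    "mad restore",
    "display irf",                             # verify IRF
    "display version",
    "display device",
    "save safely force",                       # only once the network is stable
]


def save_positions_ok(plan: List[str]) -> bool:
    """True if every 'save' command is the first or the last entry of the plan."""
    last = len(plan) - 1
    save_indices = [i for i, cmd in enumerate(plan) if cmd.split()[:1] == ["save"]]
    return all(i in (0, last) for i in save_indices)


if __name__ == "__main__":
    print(save_positions_ok(PLAN))  # True
    # Sneaking a save in right after the IRF links are shut down breaks the rule:
    print(save_positions_ok(PLAN[:12] + ["save"] + PLAN[12:]))  # False
```

The check is deliberately dumb: it does not know which console a command runs on, it only makes an accidental mid-procedure `save` stand out when you write the plan down beforehand.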

This is the sample base diagram for 2 core switches that will be updated from version R2200 to R2500:

[Figure: base diagram]

Save configuration and check device

Save the configuration at the start of the procedure (no more saving during the procedure).
Verify that all units are online and run the same software version.
Verify that MAD is configured, so that the external ports are properly shut down during the two-master phase.

save
display device
display irf
display mad verbose
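The "all units online, same software version" check can be expressed as a small function. This is a sketch under the assumption that you transcribe member ID, online state and version by hand from `display device` / `display version`; it does not parse real CLI output, and `irf_problems` is a hypothetical name. The optional `expected` argument lets you reuse it for the final check, when every member should run R2500.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, hand-transcribed input: member ID -> (online?, software version).
Members = Dict[int, Tuple[bool, str]]


def irf_problems(members: Members, expected: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means the check passed."""
    problems: List[str] = []
    for member_id, (online, _version) in sorted(members.items()):
        if not online:
            problems.append(f"member {member_id} is not online")
    versions = {version for _online, version in members.values()}
    if len(versions) > 1:
        problems.append(f"mixed versions: {sorted(versions)}")
    if expected is not None and versions != {expected}:
        problems.append(f"expected {expected}, found {sorted(versions)}")
    return problems


if __name__ == "__main__":
    # Before the upgrade: both members on R2200.
    print(irf_problems({1: (True, "R2200"), 2: (True, "R2200")}))  # []
    # After the upgrade: both members should be on R2500.
    print(irf_problems({1: (True, "R2500"), 2: (True, "R2500")}, expected="R2500"))  # []
```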

Download software

Use FTP/TFTP to get the new image onto all units. Make sure you copy the file to the local flash of each member.

Update unit 1 and 2 boot-loader

The next sections match the “ISSU load” section (update the boot-loader, reboot the new unit and keep the new unit's external interfaces down).

Configure the boot-loader of unit 1 and unit 2 to use the new image:

boot-loader file R2500.bin slot 1 main
boot-loader file R2500.bin slot 2 main

Note: depending on the model (chassis or top-of-rack), you may need to put the flash location in front of the file name.
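Because the same file must be present in every member's flash and referenced once per slot, generating those lines avoids a typo on one unit. A minimal sketch: `members_missing_image` works on a hand-typed listing of each member's flash (hypothetical input), and any `location` prefix such as `flash:/` is only an example of the flash location mentioned in the note; check the correct prefix for your model.

```python
from typing import Dict, List, Set


def members_missing_image(image: str, flash_files: Dict[int, Set[str]]) -> List[int]:
    """Member IDs whose (hand-typed) flash listing does not contain the image."""
    return sorted(m for m, files in flash_files.items() if image not in files)


def boot_loader_commands(image: str, slots: List[int], location: str = "") -> List[str]:
    """One 'boot-loader file ... slot N main' line per member, as used above.

    `location` is prepended to the file name for models that need it.
    """
    return [f"boot-loader file {location}{image} slot {slot} main" for slot in slots]


if __name__ == "__main__":
    flash = {1: {"R2200.bin", "R2500.bin"}, 2: {"R2200.bin", "R2500.bin"}}
    assert members_missing_image("R2500.bin", flash) == []
    for line in boot_loader_commands("R2500.bin", sorted(flash)):
        print(line)
    # boot-loader file R2500.bin slot 1 main
    # boot-loader file R2500.bin slot 2 main
```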

[Figure: boot-loader updated on unit 1 and unit 2]

Verify:

display boot-loader

Unit 2 reboot

In this step, unit 2 is rebooted so that it comes online with the new software version. Once you have executed the reboot of unit 2, move on to the next step immediately: it should be done within the next minute.

reboot slot 2

[Figure: unit 2 reboot]

Unit 1 force split brain

When a switch boots with a different software version than the IRF master, it discovers this during the boot process.

During the boot process, the booting switch will enable the IRF links and it will try to discover any existing masters. If a master is found, it will check the firmware version and either update itself to the same version as the master or stop the boot process (if auto firmware update is not enabled).

Although this behavior is very good under normal conditions, it would break this procedure: I want unit 2 to boot with a different version, and I do not want it to stop the boot process.

Therefore we need to ensure that unit 2 does not find a master during the boot process, so that it assumes the master role itself. That would cause problems on its own (two masters on the network), but it is handled by MAD split brain detection, which disables the external-facing interfaces of unit 2.
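The boot-time decision described above can be summarised as a small decision function. This is a simplified model of the behaviour as described in this article, not Comware code; the names `boot_outcome` and `auto_update` are hypothetical. The last call shows why the procedure shuts down the IRF links: with no master visible, unit 2 takes the master role instead of halting or reverting to the old version.

```python
from enum import Enum
from typing import Optional


class BootOutcome(Enum):
    JOIN = "join the existing master"
    AUTO_UPDATE = "update to the master's version, then join"
    HALT = "stop the boot process"
    BECOME_MASTER = "no master found, become master"


def boot_outcome(own_version: str, master_version: Optional[str],
                 auto_update: bool) -> BootOutcome:
    """Simplified model: master_version is None when no master is discovered
    over the IRF links (for example because they are shut down)."""
    if master_version is None:
        return BootOutcome.BECOME_MASTER
    if master_version == own_version:
        return BootOutcome.JOIN
    return BootOutcome.AUTO_UPDATE if auto_update else BootOutcome.HALT


if __name__ == "__main__":
    # Unit 2 boots R2500 while unit 1 is still master on R2200:
    print(boot_outcome("R2500", "R2200", auto_update=False))  # BootOutcome.HALT
    print(boot_outcome("R2500", "R2200", auto_update=True))   # BootOutcome.AUTO_UPDATE
    # Same boot, but unit 1 has shut down its IRF links:
    print(boot_outcome("R2500", None, auto_update=False))     # BootOutcome.BECOME_MASTER
```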

So to force the split brain, shut down the IRF links on unit 1.

Review the interfaces currently used for IRF:

display irf configuration

Next, shut down these interfaces (ten1/0/25 and ten1/0/26 in this example):

interface range ten 1/0/25 ten 1/0/26
 shutdown
 quit

Note: DO NOT SAVE the configuration, this is only an operational change!

[Figure: unit 1 forces the split brain]

Verify unit 2 is operational, but disconnected from network

Once unit 2 has booted as master with the new release, it enables all its interfaces, including the MAD detection interfaces. This triggers MAD split brain detection. This two-master situation only lasts 1-2 seconds.

[Figure: brief split brain, 2 masters]

At this point, MAD deactivates the unit with the highest member ID, in this case unit 2.

Note: make sure to start the process with unit 2, since this is the unit that will lose the MAD election!
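The MAD decision used here can be modelled in a few lines, under the rule stated above that the unit with the lowest member ID stays active and the others are shut down. It is a sketch for reasoning about the order of the steps, with a hypothetical function name; it does not model the Comware recovery state itself.

```python
from typing import List, Tuple


def mad_resolve(masters: List[int]) -> Tuple[int, List[int]]:
    """Given the member IDs that each claim the master role, return
    (member that stays active, members whose external ports MAD shuts down)."""
    if not masters:
        raise ValueError("MAD needs at least one master")
    keep = min(masters)
    return keep, sorted(m for m in masters if m != keep)


if __name__ == "__main__":
    # Correct order: unit 2 boots the new release, unit 1 keeps forwarding.
    print(mad_resolve([1, 2]))  # (1, [2])
```

Note that the result does not depend on which unit was rebooted: it is always (1, [2]). If unit 1 were rebooted first instead, the freshly booted unit 1 would stay active and the unit 2 that is still carrying traffic would be shut down, which is exactly what starting with unit 2 avoids.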

[Figure: split brain resolved by MAD]

On the console of unit 2, verify the new version and the result of the MAD process:

display version
display interface brief down

Disable MAD processing to prevent a false-positive split brain

In the next step, unit 1 will be rebooted (taken offline) and unit 2 will be re-activated (mad restore, brought online). These 2 steps need to be executed as close together as possible to minimize network downtime.

So the steps will be:

  • through the console of unit 1: reboot slot 1
  • wait 1 second
  • through the console of unit 2: mad restore

However, you must realize that:

  • executing reboot slot 1 and waiting 1 second does not guarantee that the switch is really down; it may take, for instance, 1.5 seconds for the interfaces to go down.
  • performing mad restore starts activating the interfaces, but this can take anywhere between 0 and 2 seconds.

So it is quite possible, even when you wait 1 second between the commands, that unit 1 is still online (it will reboot in a few moments) while unit 2 is already online (assuming mad restore worked very fast). This means you have 2 masters… and if MAD is fast enough (it usually is), it detects the 2 masters and performs a MAD shutdown of unit 2 again. The result is a rebooting unit 1 and a MAD-shutdown unit 2: no core switch at all.
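The race described above is easier to see as two points on a timeline. In this sketch (hypothetical function; the timings are the example figures from the text, not measurements), t = 0 is `reboot slot 1`, `mad restore` is typed `wait_s` seconds later, unit 1 stops forwarding `unit1_down_after_s` after its reboot, and unit 2 starts forwarding `unit2_up_after_s` after `mad restore`. A positive overlap means two masters on the network; a positive outage means neither unit forwards.

```python
from typing import Tuple


def switchover_window(wait_s: float, unit1_down_after_s: float,
                      unit2_up_after_s: float) -> Tuple[float, float]:
    """Return (two_master_overlap_s, outage_s) for one switchover attempt."""
    unit1_down_at = unit1_down_after_s          # seconds after 'reboot slot 1'
    unit2_up_at = wait_s + unit2_up_after_s     # 'mad restore' issued at wait_s
    overlap = max(0.0, unit1_down_at - unit2_up_at)
    outage = max(0.0, unit2_up_at - unit1_down_at)
    return overlap, outage


if __name__ == "__main__":
    # Wait 1 s between the commands; unit 1 needs 1.5 s to go down.
    print(switchover_window(1.0, 1.5, 0.0))  # (0.5, 0.0): fast restore, 2 masters
    print(switchover_window(1.0, 1.5, 2.0))  # (0.0, 1.5): slow restore, short outage
```

With these example timings, no single wait value avoids both cases across the whole 0-2 second range of mad restore, which is why it is better to tolerate a brief two-master window than to let MAD act on it.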

You can easily fix this situation by executing mad restore again, but precious seconds will have passed before you realize it.

To prevent this from happening, we choose to accept the (unlikely and very brief) situation of 2 masters. This is achieved by shutting down the MAD BFD VLAN interface on unit 1.
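The effect of shutting down the MAD BFD VLAN interface on unit 1 can be captured as a tiny truth table. This is a sketch of the reasoning in this section only (hypothetical function and result strings), assuming MAD behaves as described above: when it sees two masters, it keeps unit 1 active and shuts down unit 2.

```python
def switchover_result(two_masters_seen: bool, mad_bfd_up_on_unit1: bool) -> str:
    """State of the core right after 'reboot' on unit 1 and 'mad restore' on unit 2."""
    if two_masters_seen and mad_bfd_up_on_unit1:
        # MAD detects the overlap and shuts down unit 2 again, while unit 1
        # is already rebooting: no core switch until 'mad restore' is repeated.
        return "no core switch"
    # Either there was no overlap, or MAD cannot see it because the MAD BFD
    # VLAN interface on unit 1 is down: unit 2 simply stays active.
    return "unit 2 active"


if __name__ == "__main__":
    for overlap in (False, True):
        for mad_up in (True, False):
            print(f"overlap={overlap!s:5} mad_bfd_up={mad_up!s:5} -> "
                  f"{switchover_result(overlap, mad_up)}")
```

Only one of the four combinations ends without a core switch, and shutting down the MAD BFD VLAN interface on unit 1 removes exactly that combination.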

On unit 1, look up the MAD BFD VLAN ID, and verify that MAD LACP is not configured as well:

display mad verbose

Assuming vlan 4001 is used for MAD BFD:

interface vlan 4001
 shutdown
 quit

[Figure: preventing a false split brain detection]

Synchronous commands: reboot and mad restore

At this point, unit 2 is ready to become the new active switch on the network, and unit 1 is ready to be rebooted (it will then join the unit 2 master as a slave device).

This section matches the “ISSU run switchover” section.

Open a console connection to both unit 1 and unit 2.

Note: make sure you do not save the configuration when rebooting!!

On unit 1, prepare the reboot command (do not execute it yet):

reboot

On unit 2, prepare the mad restore command (do not execute it yet):

system-view
 mad restore

[Figure: failover prepared]

Once both are prepared, run both commands. You will need to press N (do not save) and Y (reboot), so be ready for that.

[Figure: synchronized failover]

Unit 2 is now active on the network, and unit 1 reboots with R2500 and joins the IRF system.

Verify IRF is operational again

Once unit 1 is back online, verify that both switches are up in the same IRF system:

display irf
display version
display device

Once you have verified that the network is stable (or are confident it is), make sure to save the configuration:

save safely force

Final situation:

[Figure: final situation]

Conclusion

Although it looks like a long procedure, it is actually not very different from using the normal ISSU commands, and it gives you an alternative when the normal ISSU commands are not available due to the Unknown status.

You will also need to understand this procedure before we move on to a more challenging ISSU setup: ISSU on a 4-unit chassis core located in 2 data rooms, with local DC switches connected to the 2 local chassis units (not to all 4 units).

But that is for another article 🙂

This entry was posted in Comware5, IRF.