You never want to see that, but especially not on a core switch that is responsible for routing 72 sites! I opened a TAC case with Cisco and was told to run these four commands and send the output back to the TAC engineer:
show module
show version
show system internal raid (Hidden Command)
slot x show system internal raid (x = slot of the standby supervisor)
Note: you can run slot x show system internal raid with x set to the slot of any supervisor, whether or not it's the standby. For example, with only one supervisor, slot 1 show system internal raid gives the same output as show system internal raid.
The key output was from the command:
MY-MDF-DC1# show system internal raid
Current RAID status info:
RAID data from CMOS = 0xa5 0xc3   <----------- Both primary and alternate failed.
and from the show module command:
Mod Online Diag Status
--- ------------------
1 Pass
3 Pass
4 Fail
TAC said this meant that both eUSB flash memory cards had failed. Since we didn't have a redundant supervisor, the only way to recover was to reboot the switch. The "failed" eUSB memory cards haven't failed in the sense that they stop working; they are full. The References section below has a link to the actual bug report (CSCus22805), which explains in detail how to recover if only one eUSB has failed or if you have a redundant supervisor.
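If you want to check these two outputs without eyeballing them, here is a minimal Python sketch. It only knows the two RAID values that appear in this post (0xc3, which TAC described as both eUSBs failed, and 0xf0, which I saw after the reload); any other value is reported as unknown, so check the CSCus22805 bug notes for those. The function names are my own, and the module parser assumes the two-column "Mod Online Diag Status" layout shown above.

```python
import re

# Only the values seen in this post; anything else is reported as unknown.
RAID_STATUS = {
    0xF0: "eUSB flash working correctly",
    0xC3: "both primary and alternate eUSB failed",
}

# Captures the second byte of a line like "RAID data from CMOS = 0xa5 0xc3".
RAID_LINE = re.compile(
    r"RAID data from CMOS\s*=\s*0x[0-9a-fA-F]+\s+0x([0-9a-fA-F]+)"
)


def decode_raid(output: str) -> str:
    """Describe the second byte of the 'RAID data from CMOS' line."""
    match = RAID_LINE.search(output)
    if match is None:
        raise ValueError("no 'RAID data from CMOS' line found")
    status = int(match.group(1), 16)
    return RAID_STATUS.get(status, f"unknown status 0x{status:02x}")


def failed_modules(output: str) -> list[int]:
    """Module numbers whose Online Diag Status column reads 'Fail'."""
    failed = []
    for line in output.splitlines():
        parts = line.split()
        # Data rows look like "4 Fail": a module number, then a status word.
        if len(parts) == 2 and parts[0].isdigit() and parts[1].lower() == "fail":
            failed.append(int(parts[0]))
    return failed


raid_output = "Current RAID status info:\nRAID data from CMOS = 0xa5 0xc3"
module_output = "Mod Online Diag Status\n--- ------------------\n1 Pass\n3 Pass\n4 Fail"
print(decode_raid(raid_output))       # both primary and alternate eUSB failed
print(failed_modules(module_output))  # [4]
```

Running the same two functions against the output collected after the reload is a quick way to confirm the status changed.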
The Problem
The customer had made several configuration changes and wasn't able to save the running configuration. Obviously, all of those changes would be lost during the reload.
The Solution
The Nexus switch has a couple of USB slots and a command that backs up the running configuration of all Virtual Device Contexts (VDCs) to the USB stick:
copy running-config usb1:MY-N7K.txt vdc-all
Once the configurations were backed up, I put the USB stick into my laptop and verified that the backup was good.
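A quick automated check can complement opening the file by hand. This is a minimal Python sketch that confirms the file is not empty and counts lines starting with a marker. The default marker "hostname" is only a guess at a line that appears in each VDC's configuration; open your own backup and pick a line you know appears once per VDC. The function names and the expected VDC count are hypothetical.

```python
from pathlib import Path


def count_marker_lines(config_text: str, marker: str = "hostname") -> int:
    """Count lines that begin with `marker` (leading spaces ignored)."""
    return sum(
        1 for line in config_text.splitlines()
        if line.lstrip().startswith(marker)
    )


def check_backup(path: str, expected_vdcs: int, marker: str = "hostname") -> bool:
    """Rough sanity check of a vdc-all backup copied off the USB stick.

    Assumption: each VDC section contributes one `marker` line, so a
    complete backup should contain at least `expected_vdcs` of them.
    """
    text = Path(path).read_text(errors="replace")
    if not text.strip():
        return False  # empty file: the copy did not work
    return count_marker_lines(text, marker) >= expected_vdcs
```

For example, check_backup("MY-N7K.txt", expected_vdcs=3) would return False if the file is empty or has fewer than three marker lines. It is only a sanity check; it cannot tell you that every line of every VDC made it into the file.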
Since this switch carries so many routes, and some of the changes were routing-related, I wanted to make sure all routes came back after the reboot. I saved the output from:
show ip route summary
Number of routes per mask-length:
/0 : 1 /8 : 2 /16: 82 /23: 2 /24: 113
/25: 2 /26: 1 /27: 5 /28: 1 /29: 2
/30: 1 /32: 788
to a text file so that I could compare it after the reboot.
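Saving the text is enough for a manual comparison, but the numbers can also be compared with a short script. Here is a minimal Python sketch, assuming you paste in just the "Number of routes per mask-length" section: it parses each /len: count pair into a dictionary and reports which mask lengths changed. Summing the counts above gives 1,000 routes before the reboot. The function names are hypothetical.

```python
import re

# Matches entries such as "/24: 113" or "/0 : 1".
ENTRY = re.compile(r"/(\d+)\s*:\s*(\d+)")


def routes_per_mask(summary: str) -> dict[int, int]:
    """Map prefix length -> route count from the mask-length section."""
    return {int(mask): int(count) for mask, count in ENTRY.findall(summary)}


def changed_masks(before: dict[int, int], after: dict[int, int]) -> dict[int, int]:
    """Mask lengths whose route count changed, as (after - before)."""
    masks = sorted(before.keys() | after.keys())
    return {
        m: after.get(m, 0) - before.get(m, 0)
        for m in masks
        if after.get(m, 0) != before.get(m, 0)
    }


before_text = """/0 : 1 /8 : 2 /16: 82 /23: 2 /24: 113
/25: 2 /26: 1 /27: 5 /28: 1 /29: 2
/30: 1 /32: 788"""

before = routes_per_mask(before_text)
print(sum(before.values()))  # 1000
```

After the reload, run routes_per_mask on the new output and pass both dictionaries to changed_masks; an empty result means every mask length has the same count as before, and a negative number shows where routes went missing.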
I also saved the output from:
show interface status | i connected
show cdp ne det | i Dev
These two commands gave me a quick summary of the interfaces that were up and the neighboring switches.
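The same before/after idea works for these two outputs. Below is a minimal Python sketch that turns each saved output into a set (interface names from the first column of the interface status lines, neighbor names from the text after "Device ID:") so the sets can be subtracted after the reload. The column position and the "Device ID:" prefix are assumptions based on typical NX-OS output, and the sample interface names are made up.

```python
def connected_interfaces(status_output: str) -> set[str]:
    """First column of each line from 'show interface status | i connected'."""
    names = set()
    for line in status_output.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            names.add(fields[0])
    return names


def cdp_neighbors(cdp_output: str) -> set[str]:
    """Neighbor names from lines such as 'Device ID:IDF-SW1'."""
    names = set()
    for line in cdp_output.splitlines():
        if "Device ID" in line and ":" in line:
            names.add(line.split(":", 1)[1].strip())
    return names


# Hypothetical sample data, not taken from the real switch.
before_if = (
    "Eth1/1  uplink  connected 1 full 10G 10Gbase-SR\n"
    "Eth1/2  idf-2  connected trunk full 10G 10Gbase-SR"
)
after_if = "Eth1/1  uplink  connected 1 full 10G 10Gbase-SR"
print(sorted(connected_interfaces(before_if) - connected_interfaces(after_if)))  # ['Eth1/2']
```

Anything left after the subtraction is an interface or neighbor that was up before the reboot and is not up now.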
Finally, I copied all the license files and the vlan.dat file to a TFTP server.
The Reload
The maintenance window arrived and I had a plan in place. All that was left was to reload. I consoled in and entered reload. The switch came back up, and I reran the four commands. Show module was all "Pass" and the RAID report was 0xa5 0xf0; the 0xf0 means the eUSB memory was working correctly.
The Clean Up
I reran the "show ip route summary" command, and some routes were missing. Some interface configurations were missing as well. This was to be expected, since the unsaved changes had been lost. I ran "copy running-config usb1:MY-N7K1.txt vdc-all" and inserted the USB stick into my laptop. I use a great file diff program called MELD (there's a link to it in the References). I opened both files in MELD, and it instantly highlighted the differences between the current running configuration and the backup I made before the reboot. From there it was a simple task to add the changes back, and all routes came up.
Comparing two files in MELD
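MELD is the easiest way to do this interactively. If you would rather script it, Python's standard difflib module does a similar line-by-line comparison. This minimal sketch is a swapped-in technique, not what I used: it lists the lines that exist in the pre-reboot backup but not in the current config, which are the candidates to add back. The file names come from this post, but the helper names are hypothetical.

```python
import difflib
from pathlib import Path


def config_diff(before_text: str, after_text: str) -> list[str]:
    """Unified diff between two configs ('-' = only in before, '+' = only in after)."""
    return list(difflib.unified_diff(
        before_text.splitlines(), after_text.splitlines(),
        fromfile="before", tofile="after", lineterm=""))


def missing_lines(before_text: str, after_text: str) -> list[str]:
    """Lines in the pre-reboot backup that are absent from the current config."""
    body = config_diff(before_text, after_text)[2:]  # drop the '---'/'+++' header
    return [line[1:] for line in body if line.startswith("-")]


def missing_from_files(before_path: str, after_path: str) -> list[str]:
    """Same as missing_lines, reading the two backups from disk."""
    return missing_lines(Path(before_path).read_text(), Path(after_path).read_text())


before = "interface Ethernet1/1\n  description uplink\n  no shutdown\n"
after = "interface Ethernet1/1\n  no shutdown\n"
print(missing_lines(before, after))  # ['  description uplink']
```

For the files in this post that would be missing_from_files("MY-N7K.txt", "MY-N7K1.txt"). Note that missing_lines drops the surrounding context, so a line like "  description uplink" still has to be entered under the right interface; that is where a side-by-side view like MELD helps.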
References
N7K-SUP2/E: eUSB Flash Failure or Unable to Save Configuration CSCus22805
Meld - Open source file diff tool
Write Command On Nexus Switches - How to create an alias for copy run start