Saturday, 16 February 2013

How to replace a broken node in a Juniper SRX HA cluster

A couple of weeks ago, one of our Juniper SRXs failed. Just as well I had invested the money and time in setting up an HA cluster.
I think these devices are very robust, so it came as a big surprise to find this one completely dead for no apparent reason.
There were no lights on the front panel and no access via the console. I power-cycled it by pulling the plug, and even then it did not come back to life.

The replacement is very easy. I logged a call with Juniper and received the replacement SRX the following day.
One thing I would like to point out: Juniper support has been fantastic so far, and their engineers are highly knowledgeable. I am very happy not only to use Juniper devices but also to recommend them.

Replacement How To

  1. Check the OS version on your existing (surviving) node, from configuration mode:
          root@srx100-01# show version (hit ENTER)
     ## Last changed: 2013-02-16 05:13:37 GMT
     version 11.2R4.3;

    2.  Connect the replacement SRX to your laptop or PC via the console port using the console cable.

    3.  Power the Juniper replacement on and check its version:
         root@srx100-01# show version (hit ENTER)
         Note: both nodes must be running the same OS version. 

    4.  From here on, I am going to assume both devices are on the same OS version.
         Note: I'll write another how-to later about upgrading the OS on a Juniper SRX.

    5. Delete the configuration on your replacement device:
        root@srx100-01# delete (hit ENTER)
        Note: this command will delete the whole configuration and leave your device blank
   
    6. Set the root password:
        root@srx100-01# set system root-authentication plain-text-password  (hit ENTER)

    7. Commit the changes: 
        root@srx100-01# commit

    8.  The following steps require two details of how your existing cluster is set up:
                 * Cluster-id (ID)
                 * Node number (No.)

    9. Once you have your cluster ID and node number at hand, exit configuration mode and run the following
        command from operational mode, using the node number of the failed unit:
        root@srx100-01> set chassis cluster cluster-id 1 node 0 reboot
         Note: the command above makes your new Juniper a member of cluster 1 as node 0, and then
         reboots it.

10. Log in to the existing Juniper SRX and save the configuration to a file in /root:
      root@srx100-01# save config_Juniper_SRX

11. Copy the config file config_Juniper_SRX to the new SRX.
    Note: You can use a USB memory stick to transfer the file.

12. Once you have copied the file config_Juniper_SRX to /root,
    run the following command:
    root@srx100-01# load override /root/config_Juniper_SRX

13. Commit the changes and review the result:
    root@srx100-01# commit
    root@srx100-01# run show configuration | display set
    Note: I always compare both devices and make sure the config
    is the same.

14. Exit configuration mode:
    root@srx100-01# exit

15. Halt the system with the command below:
    root@srx100-01> request system halt

16. Connect the fabric and control ports to the corresponding ports on the existing node.

17. Power your new SRX on

18. Once the new firewall has booted, check the cluster status from operational mode:
    root@srx100-01> show chassis cluster status
    root@srx100-01> show chassis fpc pic-status
    
19. Check if there are any alarms on your new SRX:
    root@srx100-01> show system alarms


Note: There might be two minor alarms; I will talk about them in a future post.
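
The comparison in step 13 can be tedious to do by eye on a long config. Below is a minimal Python sketch of one way to do it off-box, assuming you have pasted the "display set" output of each device into a string or text file on your laptop. The function names (set_lines, config_diff) and the sample lines are my own illustration, not anything provided by Junos.

```python
def set_lines(text):
    """Return the unique `set`/`deactivate` lines found in `display set` output."""
    wanted = ("set ", "deactivate ")
    return {ln.strip() for ln in text.splitlines() if ln.strip().startswith(wanted)}


def config_diff(existing, replacement):
    """Return (only_on_existing, only_on_replacement) as two sorted lists."""
    a, b = set_lines(existing), set_lines(replacement)
    return sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    # Hypothetical sample output, for illustration only.
    existing = "set system host-name fw1\nset chassis cluster reth-count 2\n"
    replacement = "set system host-name fw1\n"
    missing, extra = config_diff(existing, replacement)
    for line in missing:
        print("missing on replacement:", line)
    for line in extra:
        print("only on replacement:", line)
```

Two empty lists mean both devices carry the same set of statements. The comparison ignores line order and comment lines such as "## Last changed".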

Your cluster should be fully online and operating well by now.
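
For anyone who keeps console captures, the version match required in steps 1 and 3 can also be checked with a few lines of Python. This is only a sketch: it assumes the captured text looks like the configuration-mode "show version" output shown in step 1 (a line of the form "version 11.2R4.3;"), and the function names are hypothetical.

```python
import re


def junos_version(show_output):
    """Extract the release string from config-mode `show version` output."""
    match = re.search(r"^\s*version\s+([^;\s]+);", show_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'version ...;' statement found")
    return match.group(1)


def same_version(existing_output, replacement_output):
    """True when both captures report exactly the same release string."""
    return junos_version(existing_output) == junos_version(replacement_output)


if __name__ == "__main__":
    # Capture taken from step 1 of this post.
    existing = "## Last changed: 2013-02-16 05:13:37 GMT\nversion 11.2R4.3;\n"
    print(junos_version(existing))
```

The check is a plain string comparison, so any difference in the release string counts as a mismatch.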

By Renato


1 comment:

  1. Hi,
    Is there any downtime, and approximately how long is traffic down while replacing the broken node?
    Will traffic be down for more than 5 minutes?
