6. IPU-M2000 software and firmware upgrade
This section describes how to upgrade the software and firmware on the IPU-M2000.
You can check the version of the software components currently running on the IPU-M2000 with the --show-running-version option (Section 5.4, Rack tool).
6.1. What happens during the upgrade
When you upgrade the software and firmware on the IPU-M2000, the following components are upgraded:
- The OpenBMC software. There is an OpenBMC User Guide available.
- The firmware of the IPU Control Unit and the system FPGA.
Note: The version of the V-IPU agent must match the version of the V-IPU server that runs on the management server.
Graphcore has only qualified the IPU-M2000 software release with the versions of the software sub-components listed in the release notes. Other combinations of software component versions are not guaranteed to work.
The IPU-Gateway and IPU Control Unit (ICU) on the IPU-M2000 support booting from one of two persistent software images, the active image or the standby image. The active image is the running image. Upgrading only affects the standby image, which is the image that is not running.
When the upgrade of the standby image completes successfully, the IPU‑POD64 is immediately instructed to switch to the upgraded standby image, making it the active image. The previously running image now becomes the standby image.
To revert to the previous software version, perform the “upgrade” using the previous version of the IPU-M software.
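The active/standby scheme, and why a downgrade is just another upgrade, can be pictured with a toy model. This is an illustration only, assuming each image slot is represented by a version string; the function name `upgrade` and the version numbers are hypothetical, and nothing here touches a real IPU-M2000.

```bash
#!/usr/bin/env bash
# Toy model of the active/standby image scheme (illustration only).
# Each slot holds the version string of the image stored in it; versions are hypothetical.
ACTIVE="2.0.0"    # image currently running
STANDBY="1.9.0"   # image that is not running

# upgrade VERSION [WRITE_OK]
# Writes VERSION into the standby slot only. If the write succeeds (WRITE_OK=yes,
# the default), the standby image is booted and the two slots swap roles.
upgrade() {
  local new_version="$1" write_ok="${2:-yes}"
  STANDBY="$new_version"          # the running image is never overwritten
  if [[ "$write_ok" != yes ]]; then
    return 1                      # failed write: keep running the current active image
  fi
  local previous="$ACTIVE"
  ACTIVE="$STANDBY"               # the upgraded standby image becomes active
  STANDBY="$previous"             # the previously running image becomes standby
}
```

Calling `upgrade 2.1.0` leaves 2.1.0 active and 2.0.0 on standby; a later `upgrade 2.0.0` is the “downgrade” and swaps them back. A failed write (`upgrade 2.2.0 no`) leaves the active image unchanged.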
We recommend keeping at least the previous version of the IPU-M software on the management server in case a downgrade is required.
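To see which releases are kept for a possible downgrade, a small helper can list the release directories on the management server. This sketch assumes the `$HOME/IPU-M_releases/IPU-M_<release version>` layout used in Section 6.2 and a `sort` that supports `-V` (version sort); the function names are hypothetical.

```bash
#!/usr/bin/env bash
# list_releases [DIR]
# Prints the IPU-M release versions kept under DIR (default: $HOME/IPU-M_releases),
# oldest first, assuming one IPU-M_<release version> directory per release.
list_releases() {
  local dir="${1:-$HOME/IPU-M_releases}" path name
  for path in "$dir"/IPU-M_*; do
    [[ -d "$path" ]] || continue          # skip plain files and an unmatched glob
    name="${path##*/}"                    # e.g. IPU-M_2.1.0
    printf '%s\n' "${name#IPU-M_}"        # e.g. 2.1.0
  done | sort -V                          # version-aware sort: 2.9.0 before 2.10.0
}

# previous_release DIR VERSION
# Prints the newest kept release older than VERSION (nothing if there is none).
previous_release() {
  list_releases "$1" | awk -v cur="$2" '$0 == cur { if (prev != "") print prev; exit } { prev = $0 }'
}
```

For example, `previous_release "$HOME/IPU-M_releases" <release version>` prints the release you would “upgrade” to in order to revert.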
Upgrading the IPU-Gateway will overwrite all files in the standby image with default OS files. This means that any site-specific config files on the IPU-Gateway are also replaced with default config files during the upgrade.
For this reason, overlay config files are used. These are site-specific config files that are copied (“overlaid”) onto the IPU-Gateway root file system after an upgrade, so a site-specific configuration is preserved on the IPU-Gateway across upgrades. Overlay config files are stored on the management server.
In an installation with multiple IPU‑POD64 racks, the overlay config files are copied to all IPU-M2000s on all racks (as defined in the JSON rack_tool configuration file). This ensures that all IPU-M2000s are identically configured, as required.
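The overlay idea can be sketched locally: each target directory below stands in for an IPU-Gateway root file system, and files from the overlay directory replace the defaults at the same relative path. This is an illustration under that assumption only; the function name is hypothetical, and it does not reproduce how rack_tool transfers files to the IPU-M2000s.

```bash
#!/usr/bin/env bash
# apply_overlay OVERLAY_DIR ROOT...
# Copies every file under OVERLAY_DIR onto each ROOT at the same relative path.
# Files present in the overlay replace the defaults; everything else under ROOT
# is left untouched. In practice rack_tool does this for you.
apply_overlay() {
  local overlay_dir="$1" root
  shift
  [[ -d "$overlay_dir" ]] || { echo "overlay directory not found: $overlay_dir" >&2; return 1; }
  for root in "$@"; do
    # "dir/." copies the contents of dir, merging into existing subdirectories.
    cp -R "$overlay_dir"/. "$root"/ || return 1
  done
}
```

Passing one root per IPU-M2000 mirrors the point made above: every machine receives the same overlay and ends up with the same site-specific configuration.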
6.2. Upgrade instructions
Refer to the rack_tool man page for details on the specific upgrade commands.
You cannot upgrade the IPU-M2000 while ML jobs are running, because one of the upgrade steps reboots the IPU-M2000.
Log in to the management server as
Ensure you have downloaded and installed the latest version of the IPU-M software onto the management server (Section 5.3, IPU-M software).
Change to the directory that contains the specific IPU-M release version you are upgrading to:
$ cd $HOME/IPU-M_releases/IPU-M_<release version>
Run the commands to do the upgrade:
$ virtualenv -p python3 venv
$ source venv/bin/activate
$ pip3 install -r requirements.txt
$ ./rack_tool.py upgrade
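The same four commands can be wrapped in a function that stops at the first failing step. This is a convenience sketch, not part of the IPU-M release; `run_upgrade` and the `DRY_RUN` variable are hypothetical names.

```bash
#!/usr/bin/env bash
# run_upgrade RELEASE_DIR
# Runs the upgrade commands in order and stops at the first failure.
# Set DRY_RUN=1 to print each command instead of executing it.
# The function body runs in a subshell, so the cd and the venv activation
# do not leak into the caller's shell.
run_upgrade() (
  run() {
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      echo "+ $*"
    else
      "$@"
    fi
  }
  cd "$1" || exit 1
  run virtualenv -p python3 venv || exit 1
  run source venv/bin/activate || exit 1
  run pip3 install -r requirements.txt || exit 1
  run ./rack_tool.py upgrade
)
```

Usage: `DRY_RUN=1 run_upgrade "$HOME/IPU-M_releases/IPU-M_<release version>"` previews the steps; dropping `DRY_RUN=1` runs them. Explicit `|| exit 1` checks are used instead of `set -e`, because `set -e` is ignored when a function is called from a condition such as `run_upgrade ... || echo failed`.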
rack_tool uses a default config file containing information on how to access the IPU-M2000s. The default location and name of this config file is:
You can specify your own JSON configuration file with the --config-file option.
A site administrator who integrates the IPU‑POD64 into the site network can edit rack_config.json if the default IPU‑POD64 IP address plan collides with the site network. In that case, the DHCP server configuration must match the IP addresses used in the JSON configuration file.
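Before editing rack_config.json, it can help to check whether an address range from the default IPU‑POD64 plan overlaps a site subnet. The sketch below compares two IPv4 CIDR ranges in plain Bash. The default plan is not given in this section, so any addresses you pass in (such as those in the usage line) are hypothetical; the function names are also hypothetical.

```bash
#!/usr/bin/env bash
# cidr_bounds A.B.C.D/PREFIX -> prints "<first> <last>" address of the network as integers.
cidr_bounds() {
  local ip="${1%/*}" prefix="${1#*/}" a b c d
  IFS=. read -r a b c d <<< "$ip"
  # 10# forces base 10 so octets such as 08 are not read as octal.
  local addr=$(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
  local size=$(( 1 << (32 - 10#$prefix) ))
  local first=$(( addr & ~(size - 1) ))       # clear the host bits
  echo "$first $(( first + size - 1 ))"
}

# cidrs_overlap NET1 NET2 -> exit status 0 if the two ranges share any address.
cidrs_overlap() {
  local f1 l1 f2 l2
  read -r f1 l1 <<< "$(cidr_bounds "$1")"
  read -r f2 l2 <<< "$(cidr_bounds "$2")"
  (( f1 <= l2 && f2 <= l1 ))
}
```

Usage: `cidrs_overlap 10.1.0.0/16 10.1.5.0/24 && echo "collision: adjust rack_config.json"`. Two ranges overlap exactly when each one starts no later than the other ends.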
The upgrade process takes several minutes; all the IPU-M2000s are upgraded in parallel to keep this time as short as possible. As part of the upgrade, the IPU-M2000s are rebooted a few times to activate the new software.
After the reboots, rack_tool verifies that the upgrade has completed successfully by checking that all sub-components have been upgraded to the same version and that this version corresponds to what is defined in the release notes.
If this verification fails because the reported versions do not match the expected versions, run the upgrade again.
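The verification step comes down to comparing each reported sub-component version with the version expected from the release notes. A minimal sketch, assuming both lists have been saved as plain text files with one `<component> <version>` pair per line; the file format, component names and function name are hypothetical and do not describe how rack_tool itself reports versions.

```bash
#!/usr/bin/env bash
# verify_versions EXPECTED_FILE REPORTED_FILE
# Both files hold one "<component> <version>" pair per line (hypothetical format).
# Prints one line per missing or mismatched component; returns 1 if there is any.
verify_versions() {
  # An empty first file would break the NR == FNR trick below, so reject it.
  [[ -s "$1" ]] || { echo "no expected versions in $1" >&2; return 2; }
  awk '
    NR == FNR { want[$1] = $2; next }   # first file: versions from the release notes
    { got[$1] = $2 }                    # second file: versions reported after the upgrade
    END {
      bad = 0
      for (c in want) {
        if (!(c in got)) {
          printf "%s: missing (expected %s)\n", c, want[c]; bad = 1
        } else if ((got[c] "") != (want[c] "")) {   # force string compare: "2.1" != "2.10"
          printf "%s: %s (expected %s)\n", c, got[c], want[c]; bad = 1
        }
      }
      exit bad
    }' "$1" "$2"
}
```

A non-zero exit status corresponds to the mismatch case described above, where the advice is to run the upgrade again.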