6. System platform software configuration
The IPU-POD DA product implements a close-to-zero configuration experience when it comes to server setup and IPU-M2000 software configuration and installation. Network setup between servers and IPU-M2000s is also automated, both for the management network and the data-plane RoCE network (server RNIC to IPU-M2000 directly cabled network). No other changes are required to any existing server networking.
The system platform includes both the server and the IPU-M2000s and their software and networking configuration.
Three software installation and configuration steps are required for bringing the system up to be a fully functional machine learning (ML) system.
Steps 1+2: IPU-POD DA system platform software installation and configuration
Step 1: Install the release bundle to the
Step 2: Install server packages, upgrade IPU-M2000 software and configure networking
Step 3: Poplar SDK and tools installation
This chapter covers the first two steps. Step 3, the installation of Poplar SDK and its accompanying tools, is covered in separate Poplar documentation which can be found in the SDK installation section of the Getting Started with IPU-POD4 DA and IPU-POD16 DA guide.
The system platform installation handles all aspects of the software installation, as well as upgrade and configuration, to bring the system up to a functional ML platform on which the Poplar SDK can be installed. It also leaves system-wide configuration files available for all Poplar user accounts, allowing the IPU-POD to be shared between users (as far as ML model size allows sharing of the available IPUs).
The system platform installation step will install all the required software dependencies for software upgrade and network setup between all parts of the system platform. Once the IPU-POD is powered on, this step will automatically detect the number of IPU-M2000s in the IPU-POD and configure the server accordingly.
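Before performing the system platform installation step, you must create an account for IPU-M2000 administration (Section 6.1) and download the IPU-M2000 system software release (Section 6.2).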
Then the software installation and configuration of both the server and direct attached IPU-M2000s can be carried out (see System platform installation and configuration). This involves running two scripts: the bundle install script and the IPU-POD DA install script. It is important to run through the checklist (see Checklist before running the IPU-POD DA install script) before running the IPU-POD DA install script.
6.1. Create an account for IPU-M2000 administration
First, an IPU-M2000 admin account named ipuuser needs to be created. This account is used as a trusted admin account for IPU-M2000 installation and management tasks.
This user account needs sudo rights for running the IPU-POD DA install script. The other administration tools only need network access, so sudo rights are not required for running these tools.
6.2. Download the IPU-M2000 system software release
You need to download the correct IPU-M2000 system software release before any software upgrade of the IPU-M2000s and the server OS configuration can be performed.
To perform the download, follow these steps:
Log in with the IPU-M2000 admin account.
Go to the Graphcore download portal https://downloads.graphcore.ai and download the required IPU-M SOFTWARE release (normally the latest), which you can find under the IPU-POD Systems tab; land it in the server’s
Extract the tarball, which contains the installation scripts, with:
$ cd /tmp
$ tar xvfz <downloaded-tar-ball.tar.gz>
The install.sh script will copy the IPU-M2000 software into the IPU-M2000 admin user’s home directory under IPU-M_releases and will automatically create a directory tree with a leading directory name equal to the release package name and version number. This allows several releases to be kept on the server, in case you need to revert to a previous release of the IPU-M2000 software. If this is not needed, the older releases (both the unpacked files and the downloaded tar file) can be removed from the server. The current installation or the release to be installed needs to be present.
6.3. System platform installation and configuration
The software installation and configuration of both the server and direct attached IPU-M2000s is carried out by running two scripts – the bundle install script (Installing the IPU-M2000 software release) and the IPU-POD DA install script (System configuration including network setup).
6.3.1. Installing the IPU-M2000 software release
The first script to run is the bundle install script that installs the server tools for IPU-M2000 management, as well as the binaries used for a later software upgrade of IPU-M2000s.
This initial install doesn’t start an upgrade of the IPU-M2000s, but instead prepares the current user to be able to run the necessary commands for later management tasks.
If you are installing IPU-M software version 2.3.2, you need to perform two extra steps before running the install script (Listing 6.1).
Run the following:
$ mkdir $HOME/.rack_tool
Set the environment variable POD_SIZE to the appropriate value for the IPU-POD you are targeting. You can also pass this as an argument to the install.sh script. For example, if you are targeting an IPU-POD16, then you would run the following (instead of the command in Listing 6.1):
$ POD_SIZE=16 ./install.sh
$ cd /tmp/IPU_M_SW-<release>
$ ./install.sh
$ exit # and re-login to have an updated user environment
The following tasks are performed:
Loads the release file tree onto a well-defined location in the home directory of the current user performing the installation. The directory is
Installs an admin tool called rack_tool and sets up a config file that controls its behaviour. Also creates a symbolic link to rack_tool.py in the release file tree.
See rack_tool for more details on rack_tool and how this is used for IPU-M2000 administration.
6.3.2. Checklist before running the IPU-POD DA install script
From version 2.5.0, the release notes for the IPU-M software on the Graphcore downloads portal specify the ‘IPU-M Software Upgrade Path’, listing the valid initial versions which can be upgraded to the release under consideration. Upgrades are only supported from these versions, meaning that it may be required to upgrade from the currently running software release to an intermediate supported initial version, in order to then complete the upgrade to the desired target version.
The current version may be confirmed by running:
$ cd ~/IPU-M_releases/IPU_M_SW-<release>
$ ./rack_tool status --show-json | jq -r '..expectedVersions.ipumSoftware' | sort | uniq
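The upgrade path rule described above can be checked mechanically before an upgrade is started. The following is a minimal sketch and not part of the release tooling: the function name is_supported_start_version and the version numbers in the example are hypothetical, and the real list of valid initial versions must be copied from the release notes of the target release.

```shell
# Return 0 if the currently installed version appears in the list of valid
# initial versions taken from the release notes, 1 otherwise.
# Usage: is_supported_start_version <current> "<v1> <v2> ..."
is_supported_start_version() {
    current="$1"
    for candidate in $2; do
        if [ "$candidate" = "$current" ]; then
            return 0
        fi
    done
    return 1
}

# Hypothetical example values; replace them with the versions from the release notes.
if is_supported_start_version "2.4.4" "2.3.5 2.4.4"; then
    echo "direct upgrade supported"
else
    echo "upgrade to an intermediate version first"
fi
```

The comparison is an exact string match, so a partial version such as 2.4 does not match 2.4.4.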
Before running the IPU-POD DA install script, you must know the names of the network devices that will be used in the network connected to the IPU-POD.
These devices must be “available” or “unmanaged” for the IPU-POD DA installer to detect them. For example, if they have previously been configured using netplan, removing these interfaces from the /etc/netplan files and running netplan try will make them “unmanaged”.
Be aware that removing the wrong interface from netplan will make it impossible to log in to the server again using SSH.
You can also provide the network interfaces via command-line options if you run the IPU-POD DA installer in non-interactive mode. For help finding the network interfaces, refer to Section 6.3.7, Discovering network interfaces for non-interactive mode.
6.3.3. System configuration including network setup
The next step is to run the IPU-POD DA install script as root by using sudo.
There is no need to do an uninstall before upgrading to a new release.
If user input to the script during installation was incorrect, then an uninstall will be required, followed by a new install.
$ cd ~/IPU-M_releases/IPU_M_SW-<release>
$ sudo ./direct-attach-setup install
During installation and configuration, the IPU-POD DA install script will communicate with the IPU-M2000 system software that needs to be running on the IPU-M2000. For this reason, the installation script is bundled with a matching IPU-M2000 system software release package that will be installed on the IPU-M2000s if not already installed.
The IPU-POD DA install script performs the following tasks:
Identifies and requests installation of the required server OS packages. You can abort and install these manually if required.
Installs and activates a Linux systemd service gc-ipu-network for correct network setup for communication with IPU-M2000s, both for Poplar SDK users (RDMA data-plane) as well as for IPU-M2000 hardware and software management (management-plane).
Proposes the use of a default IP subnet 10.44.44.0/24 – you can change the subnet to be used during installation if required. This subnet will be further divided into smaller subnets for the point-to-point RDMA links, IPU-Gateway and BMC network management. These IP addresses are local addresses that are not shared outside the single server system. This means the same subnet can be reused by other servers for their directly attached IPU-M2000s.
Checks if IPU-M2000 software upgrade is required and if yes, installs and activates the IPU-M2000 system software release on all IPU-M2000s so that they are all running the same software. This ensures that the IPU-POD DA install script has matching software running on the IPU-M2000s for the remainder of the configuration steps.
Performs the V-IPU configuration of the IPU-M2000s by selecting a master IPU-M2000 to run the V-IPU controller.
Activates the V-IPU controller for auto-discovery of other IPU-M2000s, referred to as discovery of V-IPU agents in the text output from the IPU-POD DA install script.
Creates a V-IPU cluster from the discovered V-IPU agents.
Tests the V-IPU cluster by checking if cabling and link quality is as expected and shows the cluster test results.
Creates a V-IPU partition with all IPUs in the discovered IPU-M2000s. The partition information is stored in a file in the server system-wide directory /etc/ipuof.conf.d. The single file in this directory is picked up by Poplar and is used for connecting to the IPU-M2000s. This file is shared by all users running Poplar sharing the same IPU-POD DA system. When a Poplar instance is starting, Poplar will select a free IPU in this partition and generate an error if there are no more free IPUs to be allocated.
Stores a log of actions taken during the installation and configuration process in the log directory
Asks if the IPU-POD DA install script should set up port forwarding to allow remote web browsers to access the V-IPU controller’s web interface running on the master IPU-M2000.
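Two of the outcomes listed above can be verified after the script has finished: the gc-ipu-network service should be active, and /etc/ipuof.conf.d should contain a single partition file. The following is a minimal sketch of such a check. The function names are hypothetical and are not provided by the release; systemctl is-active is a standard systemd command.

```shell
# Count the regular files in a directory (default: /etc/ipuof.conf.d).
count_partition_files() {
    dir="${1:-/etc/ipuof.conf.d}"
    count=0
    for entry in "$dir"/*; do
        if [ -f "$entry" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Report whether exactly one partition file is present.
check_partition_dir() {
    found=$(count_partition_files "$1")
    if [ "$found" -eq 1 ]; then
        echo "ok: one partition file"
    else
        echo "unexpected: $found partition files"
        return 1
    fi
}

# On the server itself you could then run, for example:
#   systemctl is-active gc-ipu-network
#   check_partition_dir /etc/ipuof.conf.d
```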
6.3.4. Output from bundle install script
From running the bundle install script:
The output should be as follows (example given for software release 2.0.0):
## Copying /tmp/IPU_M_SW-2.0.0 to /home/ipuuser/IPU-M_releases ##
## Symlinking /home/ipuuser/IPU-M_releases/IPU_M_SW-2.0.0/maintenance_tools/rack_tool.py to /home/ipuuser/.local/bin/rack_tool ##
## No default rack_config.json specified. Set GC_RACK_TOOL_DEFAULT_CONFIG_PATH to point to default config ##
## Installing pip dependencies ##
## Install completed ##
Note that the comment No default rack_config.json specified can be ignored since this will be resolved by the IPU-POD DA install script setting up the rack_config.json configuration file (see below).
6.3.5. Example output from IPU-POD DA install script
From running the IPU-POD DA install script:
$ cd ~/IPU-M_releases/IPU_M_SW-<release>
$ sudo ./direct-attach-setup install
The output should be as follows for an IPU‑POD16 DA system (example given for software release 2.0.0):
Graphcore IPU-POD Direct Attach installer for SW version: 2.0.0
Please make sure all cables are properly attached according to documentation, and that all IPU-M2000(s) are powered on, before continuing installation.
NOTE: Running this installer will interrupt any ongoing Poplar workloads.
Do you want to continue? (y/n): y
All required packages already installed
How many IPU-M2000 are attached? 4
A /24 subnet is required, is 10.44.44.0/24 okay? (y/n): y
Management device configuration:
A dedicated network device on this host needs to be connected to the upper RJ-45 network port of the top IPU-M2000, and then cabled downwards in a daisy-chain fashion (see docs for diagram).
Note that only unmanaged devices that are not configured by netplan or NetworkManager will be available.
Available network devices:
1 - eno1 Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
2 - br-c2f59026c83d
3 - veth9f42328
Input management network device number or name: 1
eno1 selected as management network device.
Found exactly 4 RNIC devices:
- enp129s0f0 Mellanox Technologies MT28800 Family [ConnectX-5 Ex] (cable detected)
- enp161s0f0 Mellanox Technologies MT28800 Family [ConnectX-5 Ex] (cable detected)
- enp129s0f1 Mellanox Technologies MT28800 Family [ConnectX-5 Ex] (cable detected)
- enp161s0f1 Mellanox Technologies MT28800 Family [ConnectX-5 Ex] (cable detected)
Are all these network interfaces connected to IPU Machines? (y/n): y
enp129s0f0, enp161s0f0, enp129s0f1, enp161s0f1 selected as RNIC network device(s).
Port forwarding and V-IPU web interface:
The V-IPU provides a web interface on port 8092. This web interface requires an envoy proxy server running on port 9900. In order to provide external access to the web interface both ports must be forwarded from the current machine
Port forwarding and BMC web interfaces:
Each one of the connected BMCs provides a web interface on port 443.
These ports could be externally accessed through ports 7800, 7801, 7802, 7803 if wanted
Do you allow the forward of ports 7800, 7801, 7802, 7803, 8092 and 9900 using the detected external network interface eno2? (y/n): y
Installing and starting service gc-ipu-network. This service is responsible for setting up networking and starting up the DHCP servers that will assign IP addresses to IPU-M2000(s).
Created symlink /etc/systemd/system/multi-user.target.wants/gc-ipu-network.service → /etc/systemd/system/gc-ipu-network.service.
Scanning for 4 IPU-M2000. Note that this may take some minutes...
| | # | 0 Elapsed Time: 0:02:40
Four (4) IPU-M2000(s) found, in stacking order:
Unit 4: serial 8204721-0071 - BMC 10.44.44.92 GW mgmt 10.44.44.93 GW rnic (via enp161s0f1) 10.44.44.226
Unit 3: serial 8204721-0084 - BMC 10.44.44.98 GW mgmt 10.44.44.99 GW rnic (via enp161s0f0) 10.44.44.245
Unit 2: serial 8204721-0092 - BMC 10.44.44.61 GW mgmt 10.44.44.62 GW rnic (via enp129s0f1) 10.44.44.237
Unit 1: serial 8204721-0065 - BMC 10.44.44.74 GW mgmt 10.44.44.75 GW rnic (via enp129s0f0) 10.44.44.253
All IPU-M2000(s) have release IPU-M SW 2.0.0 installed, no upgrade needed.
Stopping all existing V-IPU servers...
\ | # | 0 Elapsed Time: 0:00:25
Restarting all V-IPU agents...
\ | # | 0 Elapsed Time: 0:00:29
Removing any existing V-IPU partitions, clusters and agents...
Creating new V-IPU agents and cluster...
Testing cluster, this will take some time...
- | # | 0 Elapsed Time: 0:02:11
Showing test results for cluster da-cluster
Type      | Duration | Passed | Summary
-------------------------------------------------------------------------
Sync-Link | 0.42s    | 6/6    | Sync Link test passed
Cabling   | 0.79s    | 12/12  | All cables connected as expected
IPU-Link  | 139.99s  | 76/76  | All Links Passed
Traffic   | 203.18s  | 1/1    | Traffic test passed
Version   | 0.01s    | 6/6    | All component versions are consistent
-------------------------------------------------------------------------
Adding V-IPU partition with 16 IPUs...
| | | 0 Elapsed Time: 0:02:26
Please find installation logs at: /home/ipuuser/IPU-M_releases/IPU_M_SW-2.0.0/maintenance_tools/logs/direct-attach-20210304T2105.log
V-IPU web server running at: http://10.129.96.107:8092
BMC-1 web server running at: https://10.129.96.107:7800
BMC-2 web server running at: https://10.129.96.107:7801
BMC-3 web server running at: https://10.129.96.107:7802
BMC-4 web server running at: https://10.129.96.107:7803
Installation completed!!
If you run the IPU-POD DA install script again on an already configured system, it will start by asking you to accept the previous input given:
Graphcore IPU-POD Direct Attach installer for SW version: 2.0.0
Found existing Direct Attach configuration with the following parameters:
count : 4
subnet: 10.44.44.0/24
mgmt : eno1
rnic : ['enp129s0f0', 'enp161s0f0', 'enp129s0f1', 'enp161s0f1']
allow_port_forwarding : True
ext_net_if : eno2
master_gw_ip : 10.44.44.75
Do you want to continue with these parameters? If you have experienced troubles with the install, choose No (y/n): y
The installation will continue in the same way as the first-time installation, but without the questions.
6.3.6. Running IPU-POD DA install in non-interactive mode
You can execute an unattended IPU-POD DA install by including the option --non-interactive. If you use this option, the following arguments are required:
--count: Indicate the number of IPU-M2000 machines connected
--subnet: Provide a subnet using CIDR a.b.c.d/n to setup the IPU-POD system
--mgmt: Indicate the management network interface connected to IPU-M2000s
--rnic: Specify each RNIC device connected to an IPU-M2000
$ sudo ./direct-attach-setup install --non-interactive --count 2 --subnet 10.44.44.0/24 --mgmt dev0 --rnic dev1 --rnic dev2
For help finding the network devices to use with the previous arguments, refer to Discovering network interfaces for non-interactive mode.
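When the install is scripted, it can help to assemble the command line from variables and to check the arguments before anything is run. The following is a minimal sketch. The function name build_da_install_cmd is hypothetical, and the check that the number of RNIC devices equals --count is an assumption based on the examples in this chapter (one RNIC per IPU-M2000), not a documented rule of the installer. The function only prints the command; it does not execute it.

```shell
# Print the non-interactive install command for the given parameters.
# Usage: build_da_install_cmd <count> <subnet> <mgmt-dev> <rnic-dev>...
# Assumption (not stated by the installer): one RNIC device per IPU-M2000.
build_da_install_cmd() {
    count="$1"; subnet="$2"; mgmt="$3"
    shift 3
    if [ "$#" -ne "$count" ]; then
        echo "expected $count RNIC devices, got $#" >&2
        return 1
    fi
    cmd="sudo ./direct-attach-setup install --non-interactive"
    cmd="$cmd --count $count --subnet $subnet --mgmt $mgmt"
    for dev in "$@"; do
        cmd="$cmd --rnic $dev"
    done
    echo "$cmd"
}

# Prints the command line used in the example above (dev0..dev2 are placeholders).
build_da_install_cmd 2 10.44.44.0/24 dev0 dev1 dev2
```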
6.3.7. Discovering network interfaces for non-interactive mode
Finding the network management service in use
Different Linux distributions come with different network management services. You will need to know which network manager is being used in order to identify the available network interfaces. Typically, Linux distributions that include a desktop environment (Gnome, Unity, KDE or others) will be shipped with a service named NetworkManager. For Linux distributions intended for servers, that is, without a desktop environment, the service is usually networkd. It is also possible that both services, or neither of them, are present on the system.
You can do a quick check to identify the network service in use by running:
nmcli: To test if NetworkManager is in use
networkctl: To test if networkd is in use
If the service is not active or installed, the previous commands will display errors or warnings indicating that the service is not in use. If both commands fail, it means that another network manager service (for example, ifupdown) or no service is in use.
The IPU-POD DA install script will not support cases when NetworkManager and networkd are found to be in use simultaneously.
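The decision described above can be written down as a small helper. The following is a minimal sketch, not part of the installer: the function name classify_network_service is hypothetical, and the usage comment queries systemd with systemctl is-active instead of running nmcli and networkctl, which assumes a systemd-based server where the units are named NetworkManager and systemd-networkd.

```shell
# Map the two service checks to a verdict.
# Arguments are exit codes: 0 means the service is active.
classify_network_service() {
    nm_status="$1"
    networkd_status="$2"
    if [ "$nm_status" -eq 0 ] && [ "$networkd_status" -eq 0 ]; then
        echo "both (not supported by the install script)"
    elif [ "$nm_status" -eq 0 ]; then
        echo "NetworkManager"
    elif [ "$networkd_status" -eq 0 ]; then
        echo "networkd"
    else
        echo "other or none"
    fi
}

# On a systemd-based server the exit codes could be gathered like this:
#   systemctl is-active --quiet NetworkManager;   nm=$?
#   systemctl is-active --quiet systemd-networkd; nd=$?
#   classify_network_service "$nm" "$nd"
```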
Finding available network interfaces with NetworkManager
List all devices with the following command:
$ nmcli device
NetworkManager might try to automatically connect new network devices, switching the devices’ states between connecting and disconnected. The IPU-POD DA installer will deem network interfaces as available if they are unmanaged or if they are in connecting or disconnected states. After the IPU-POD DA install script is run, the selected network interfaces will be blacklisted from NetworkManager, leaving them unmanaged, which will prevent these devices from being automatically managed again by NetworkManager.
$ nmcli device
DEVICE  TYPE      STATE         CONNECTION
eth0    ethernet  connected     System eth0
eth1    ethernet  connecting    --
eth2    ethernet  disconnected  --
eth3    ethernet  disconnected  --
eth4    ethernet  disconnected  --
eth5    ethernet  disconnected  --
lo      loopback  unmanaged     --
In the example, all the network interfaces except eth0 will be regarded as available.
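The availability rule for NetworkManager can be applied to the device list with a short filter. The following is a minimal sketch under stated assumptions: the function name filter_available_nm_devices is hypothetical, and it expects the terse DEVICE:STATE format that nmcli produces with the -t and -f options rather than the table shown above. It only mirrors the rule as described here and is not the installer's own detection code.

```shell
# Read "DEVICE:STATE" lines (nmcli terse output) on stdin and print the
# devices whose state is unmanaged, connecting or disconnected.
filter_available_nm_devices() {
    awk -F: '$2 == "unmanaged" || $2 == "disconnected" || $2 ~ /^connecting/ { print $1 }'
}

# On the server:  nmcli -t -f DEVICE,STATE device | filter_available_nm_devices
# Demonstration using states from the example above:
printf '%s\n' 'eth0:connected' 'eth1:connecting' 'eth2:disconnected' 'lo:unmanaged' |
    filter_available_nm_devices
```

The connecting state is matched as a prefix because nmcli can append a detail in parentheses to it.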
Finding network interfaces with networkd
When using networkd, devices will be regarded as available if they are listed as unmanaged (see the SETUP column). List all devices with the following command:
$ networkctl
IDX LINK        TYPE      OPERATIONAL  SETUP
  1 lo          loopback  carrier      unmanaged
  2 eno1        ether     routable     unmanaged
  3 eno2        ether     routable     configured
  4 enp129s0f0  ether     routable     unmanaged
  5 enp129s0f1  ether     routable     unmanaged
  6 enp161s0f0  ether     routable     unmanaged
  7 enp161s0f1  ether     routable     unmanaged
In the example, eno1 and the four RNIC interfaces (enp*) are in a state that the IPU-POD DA install script considers available.
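The same selection can be made from the networkctl listing with a one-line filter. The following is a minimal sketch: the function name filter_unmanaged_links is hypothetical, and it assumes the five-column layout shown above (IDX, LINK, TYPE, OPERATIONAL, SETUP).

```shell
# Read networkctl rows (IDX LINK TYPE OPERATIONAL SETUP) on stdin and print
# the LINK name of every row whose SETUP column is "unmanaged".
filter_unmanaged_links() {
    awk '$5 == "unmanaged" { print $2 }'
}

# On the server:  networkctl list --no-legend --no-pager | filter_unmanaged_links
# Demonstration using rows from the example above:
printf '%s\n' \
    '1 lo loopback carrier unmanaged' \
    '2 eno1 ether routable unmanaged' \
    '3 eno2 ether routable configured' \
    '4 enp129s0f0 ether routable unmanaged' |
    filter_unmanaged_links
```

Note that the loopback device lo is also reported as unmanaged, so it appears in the output and has to be ignored when choosing devices.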
Finding network interfaces with another network manager
List all devices with one of the following commands:
netstat -i
ip link
ifconfig -a
In this case there are no further checks to determine whether the devices are managed, so any listed network interface will be regarded as available. The IPU-POD DA install will fail at a later point if it cannot assign an IP address or cannot establish a connection with an IPU-M2000.
Finding RNIC devices
List devices with InfiniBand support with the following command:
$ find /sys/class/net/*/device/ -type d -name "infiniband"
/sys/class/net/eth2/device/infiniband
/sys/class/net/eth3/device/infiniband
/sys/class/net/eth4/device/infiniband
/sys/class/net/eth5/device/infiniband
In the example, eth2, eth3, eth4 and eth5 can be used as RNIC devices.
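To feed the result into the --rnic arguments, only the interface names are needed rather than the full sysfs paths. The following is a minimal sketch that extracts them. The function name list_rnic_candidates is hypothetical, and the optional root argument exists only so that the function can be tried against a test directory.

```shell
# Print the names of network interfaces that expose an "infiniband"
# directory under <root>/<iface>/device (default root: /sys/class/net).
list_rnic_candidates() {
    root="${1:-/sys/class/net}"
    for path in "$root"/*/device/infiniband; do
        if [ -d "$path" ]; then
            # <root>/<iface>/device/infiniband -> <iface>
            basename "$(dirname "$(dirname "$path")")"
        fi
    done
}

list_rnic_candidates
```

On a machine without such devices the function prints nothing.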
6.4. IPU-M2000 maintenance, management and control
6.4.1. IPU-POD DA system software upgrades
The IPU-POD DA install script will, the first time it is run, install and upgrade the software required on the server as well as on the IPU-M2000s if needed.
When running the IPU-POD DA install script in later releases, the install script will:
Make any necessary changes to the server setup, and install and upgrade server packages
Upgrade all IPU-M2000s with the new IPU-M2000 software release
Be aware that running a software install and upgrade will require a maintenance window of up to 1.5 hours (IPU‑POD16 DA system) where no Poplar workloads are allowed to run. All ongoing Poplar workloads will fail due to the IPU-M2000s going through a reset cycle as well as a system test cycle. Note that future releases of the IPU-M2000 upgrade tools will significantly reduce this time.
6.4.2. Hardware management of IPU-M2000 via BMC management interface
BMC management and operation are available via:
SSH login to the BMC command line on each IPU-M2000
openbmctool.py. This tool is found here:
BMC web GUI
The URLs and IP addresses of each BMC are provided by the IPU-POD DA install script and are stored in the /var/lib/gc-ipu-network/config.json file. They can also be found in the installation log, but the configuration file is usually the easier place to look.
If you have a browser running locally on the direct attach server, the BMC web GUI can be reached by opening the URL
BMC_IP_address can be found by running the following command - note this is only available if you enabled port forwarding during installation:
$ grep "Unit " ~/IPU-M_releases/IPU_M_SW-<release_version>/maintenance_tools/logs/direct-attach-<date&time>.log | cut -d " " -f 7-8,13-14
This example is for an IPU‑POD16 DA system so there are four BMC IP addresses, one per IPU-M2000. Unit 1 (IPU-M2000 #1) is shown at the bottom which reflects the physical placement of the IPU-M2000s in the rack:
Unit 4: BMC 10.44.44.92
Unit 3: BMC 10.44.44.98
Unit 2: BMC 10.44.44.61
Unit 1: BMC 10.44.44.74
The URLs are also listed in the installation logs, so if you disabled port forwarding during installation then you can use the following method instead.
They are listed as lines beginning with “BMC-<n> web server running at” in the latest installation logs, which can be found here: ~/IPU-M_releases/IPU_M_SW-<release_version>/maintenance_tools/logs/direct-attach-<date&time>.log. In this case, however, Unit 1 (IPU-M2000 #1) is at the top of the list, which is the reverse order of the physical IPU-M2000 placement in the rack:
BMC-1 web server running at: https://10.44.44.74
BMC-2 web server running at: https://10.44.44.61
BMC-3 web server running at: https://10.44.44.98
BMC-4 web server running at: https://10.44.44.92
The standard BMC web server port 443 is mapped to unique port numbers as follows:
Unit 4: BMC web server: https://<direct-attach-server>:7803
Unit 3: BMC web server: https://<direct-attach-server>:7802
Unit 2: BMC web server: https://<direct-attach-server>:7801
Unit 1: BMC web server: https://<direct-attach-server>:7800
Refer to the BMC user guide for more details about openbmctool and the BMC web GUI.
6.4.3. V-IPU management via IPU-Gateway command line
The IPU-M2000 offers an SSH-based itadmin account on the IPU-Gateway, allowing access to the vipu-admin command line interface (CLI). This CLI is not intended for daily operation of the IPU-M2000s but can be used for troubleshooting.
The V-IPU Admin Guide contains the required information and should be consulted for details – it is available here. Make sure the selected document version matches the first two components (major.minor) of the version number of the V-IPU software running on the master IPU-M2000. To find the version number, run the command