6. System platform software configuration

The IPU-POD DA product provides a close-to-zero-configuration experience for server setup and for IPU-M2000 software installation and configuration. Network setup between the servers and the IPU-M2000s is also automated, both for the management network and for the data-plane RoCE network (the directly cabled network between the server RNIC and the IPU-M2000s). No other changes are required to any existing server networking.

The system platform includes both the server and the IPU-M2000s and their software and networking configuration.

Three software installation and configuration steps are required to bring the system up as a fully functional machine learning (ML) system.

  • Steps 1+2: IPU-POD DA system platform software installation and configuration

    • Step 1: Install the release bundle to the ipuuser admin account

    • Step 2: Install server packages, upgrade IPU-M2000 software and configure networking

  • Step 3: Poplar SDK and tools installation

This chapter covers the first two steps. Step 3, the installation of the Poplar SDK and its accompanying tools, is covered in the separate Poplar documentation.

The system platform installation handles all aspects of the software installation, as well as upgrade and configuration, to bring the system up to a functional ML platform on which the Poplar SDK can be installed. It also leaves system wide configuration files available for all Poplar user accounts, allowing the IPU-POD to be shared between users (as far as ML model size allows sharing of the available IPUs).

The system platform installation step will install all the required software dependencies for software upgrade and network setup between all parts of the system platform. Once the IPU-POD is powered on, this step will automatically detect the number of IPU-M2000s in the IPU-POD and configure the server accordingly.

Before performing the system platform installation step, you must:

  1. Create an account dedicated to administration of the IPU-M2000s (see Create an account for IPU-M2000 administration)

  2. Download the correct IPU-M2000 system software release from the Graphcore support portal (see Download the IPU-M2000 system software release)

Then the software installation and configuration of both the server and the direct attached IPU-M2000s can be carried out (see System platform installation and configuration). This involves running two scripts: the bundle install script and the IPU-POD DA install script. Before running the IPU-POD DA install script, it is important to work through the checklist in Checklist before running the IPU-POD DA install script.

6.1. Create an account for IPU-M2000 administration

First, an IPU-M2000 admin account named ipuuser needs to be created. This account is used as a trusted admin account for IPU-M2000 installation and management tasks.

This user account needs sudo rights to run the IPU-POD DA install script. The other administration tools only need network access, so sudo rights are not required to run them.

6.2. Download the IPU-M2000 system software release

You need to download the correct IPU-M2000 system software release before any software upgrade of the IPU-M2000s and the server OS configuration can be performed.

To perform the download, follow these steps:

  1. Login with the IPU-M2000 admin account ipuuser

  2. Go to the Graphcore download portal https://downloads.graphcore.ai and download the required IPU-M SOFTWARE release (normally the latest), which you can find under the IPU-POD Systems tab. Save the download to the server’s /tmp directory

  3. Extract the tarball, which contains the installation scripts, with:

    $ cd /tmp
    $ tar xvfz <downloaded-tar-ball.tar.gz>
    

Running the install.sh script copies the IPU-M2000 software into the IPU-M2000 admin user’s home directory under IPU-M_releases, automatically creating a directory tree whose leading directory name is the release package name and version number. This allows several releases to be kept on the server, in case you need to revert to a previous release of the IPU-M2000 software. If this is not needed, older releases (both the unpacked files and the downloaded tar file) can be removed from the server; however, the currently installed release, and any release you intend to install, must remain present.
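The housekeeping described above can be sketched as follows. This is an illustrative sketch only: it builds a mock release tree in a temporary directory rather than touching ~/IPU-M_releases, and assumes the IPU_M_SW-<version> directory naming pattern shown later in this chapter.

```shell
# Illustrative sketch: list installed releases in version order and show
# which older trees could be removed. A mock tree is used here; on a
# real server the tree lives under ~/IPU-M_releases.
releases=$(mktemp -d)
mkdir -p "$releases/IPU_M_SW-1.9.0" "$releases/IPU_M_SW-2.0.0" "$releases/IPU_M_SW-2.3.2"

ls "$releases" | sort -V                      # oldest release first
old=$(ls "$releases" | sort -V | head -n -1)  # all but the newest (GNU head)
echo "removal candidates: $old"
rm -rf "$releases"
```

Version-aware sorting (`sort -V`) matters here because plain lexical sorting would order, for example, 2.10.0 before 2.3.2.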

6.3. System platform installation and configuration

The software installation and configuration of both the server and direct attached IPU-M2000s is carried out by running two scripts – the bundle install script (Installing the IPU-M2000 software release) and the IPU-POD DA install script (System configuration including network setup).

6.3.1. Installing the IPU-M2000 software release

The first script to run is the bundle install script that installs the server tools for IPU-M2000 management, as well as the binaries used for a later software upgrade of IPU-M2000s.

This initial install doesn’t start an upgrade of the IPU-M2000s, but instead prepares the current user to be able to run the necessary commands for later management tasks.

Warning

If you are installing IPU-M software version 2.3.2, you need to perform two extra steps before running the install script (Listing 6.1).

  1. Run the following

    $ mkdir $HOME/.rack_tool
    
  2. Set the environment variable POD_SIZE to the appropriate value for the IPU-POD you are targeting. You can also pass this as an argument to the install.sh script. For example, if you are targeting an IPU-POD16, then you would run the following (instead of the command in Listing 6.1):

    $ POD_SIZE=16 ./install.sh
    
Listing 6.1 Command to run IPU-M software install script
$ cd /tmp/IPU-M_SW-<release>
$ ./install.sh
$ exit    # and re-login to have an updated user environment

The following tasks are performed:

  1. Loads the release file tree into a well-defined location in the home directory of the user performing the installation. The directory is ~/IPU-M_releases.

  2. Installs an admin tool called rack_tool and sets up a config file that controls its behaviour. Also creates a symbolic link, ~/.local/bin/rack_tool, pointing to rack_tool.py in the release file tree.

See rack_tool for more details on rack_tool and how this is used for IPU-M2000 administration.

6.3.2. Checklist before running the IPU-POD DA install script

From version 2.5.0, the release notes for the IPU-M software on the Graphcore downloads portal specify the ‘IPU-M Software Upgrade Path’, which lists the valid initial versions that can be upgraded to the release under consideration. Upgrades are only supported from these versions, so you may first need to upgrade from the currently running software release to an intermediate supported version before completing the upgrade to the desired target version.

The current version may be confirmed by running:

$ cd  ~/IPU-M_releases/IPU_M_SW-<release>
$ ./rack_tool status --show-json | jq -r '.[].expectedVersions.ipumSoftware' | sort | uniq

Before running the IPU-POD DA install script, you must know the names of the network devices that will be used in the network connected to the IPU-POD. These devices must be “available” or “unmanaged” for the IPU-POD DA installer to detect them. For example, if they have previously been configured using netplan, removing these interfaces from the /etc/netplan files and running netplan try will make them “unmanaged”.

Note

Be aware that removing the wrong interface from netplan will make it impossible to log in to the server again using SSH.
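As an illustration, if the RNIC interfaces had previously been given addresses in a netplan file, the relevant stanzas would need to be deleted. The file name and interface names below are hypothetical; only remove stanzas for interfaces cabled to IPU-M2000s, never the interface carrying your SSH session.

```yaml
# /etc/netplan/01-example.yaml (hypothetical) -- before cleanup
network:
  version: 2
  ethernets:
    eno2:                 # external interface: KEEP (carries SSH access)
      dhcp4: true
    enp129s0f0:           # cabled to an IPU-M2000: REMOVE this stanza
      dhcp4: true
```

After removing the IPU-M2000-facing stanzas, netplan try applies the change and automatically rolls it back if you do not confirm within the timeout, which guards against losing connectivity.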

You can also provide the network interfaces via command-line options if you run the IPU-POD DA installer in non-interactive mode. For help finding the network interfaces, refer to Section 6.3.7, Discovering network interfaces for non-interactive mode.

6.3.3. System configuration including network setup

The next step is to run the IPU-POD DA install script as root by using sudo. There is no need to do an uninstall before upgrading to a new release.

Note

If you provided incorrect input to the script during installation, an uninstall will be required, followed by a new install.

$ cd  ~/IPU-M_releases/IPU_M_SW-<release>
$ sudo ./direct-attach-setup install

During installation and configuration, the IPU-POD DA install script will communicate with the IPU-M2000 system software that needs to be running on the IPU-M2000. For this reason, the installation script is bundled with a matching IPU-M2000 system software release package that will be installed on the IPU-M2000s if not already installed.

The IPU-POD DA install script performs the following tasks:

  1. Identifies and requests installation of the required server OS packages. You can abort and install these manually if required.

  2. Installs and activates a Linux systemd service gc-ipu-network for correct network setup for communication with IPU-M2000s, both for Poplar SDK users (RDMA data-plane) as well as for IPU-M2000 hardware and software management (management-plane).

  3. Proposes the use of a default IP subnet 10.44.44.0/24 – you can change the subnet to be used during installation if required. This subnet will be further divided into smaller subnets for the point-to-point RDMA links, IPU-Gateway and BMC network management. These IP addresses are local addresses that are not shared outside the single server system. This means the same subnet can be reused by other servers for their directly attached IPU-M2000s.

  4. Checks if an IPU-M2000 software upgrade is required and, if so, installs and activates the IPU-M2000 system software release on all IPU-M2000s so that they are all running the same software. This ensures that the IPU-POD DA install script has matching software running on the IPU-M2000s for the remainder of the configuration steps.

  5. Performs the V-IPU configuration of the IPU-M2000s by selecting a master IPU-M2000 to run the V-IPU controller.

  6. Activates the V-IPU controller for auto-discovery of other IPU-M2000s, referred to as discovery of V-IPU agents in the text output from the IPU-POD DA install script.

  7. Creates a V-IPU cluster from the discovered V-IPU agents.

  8. Tests the V-IPU cluster by checking if cabling and link quality is as expected and shows the cluster test results.

  9. Creates a V-IPU partition with all IPUs in the discovered IPU-M2000s. The partition information is stored in a file in the server system-wide directory /etc/ipuof.conf.d. The single file in this directory is picked up by Poplar and is used for connecting to the IPU-M2000s. This file is shared by all users running Poplar sharing the same IPU-POD DA system. When a Poplar instance is starting, Poplar will select a free IPU in this partition and generate an error if there are no more free IPUs to be allocated.

  10. Stores a log of actions taken during the installation and configuration process in the log directory IPU_M_SW/maintenance_tools/logs/.

  11. Asks if the IPU-POD DA install script should set up port forwarding to allow remote web browsers to access the V-IPU controller’s web interface running on the master IPU-M2000.
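The subnet division mentioned in step 3 can be pictured with a short sketch. This is illustrative only: the actual carving of 10.44.44.0/24 into management and RDMA subnets is chosen by the install script; the /30 point-to-point layout shown here is an assumption for demonstration.

```shell
# Illustrative only: carving the start of a /24 into /30 point-to-point
# subnets (4 addresses each, 2 usable hosts per link). The real layout
# is chosen by the IPU-POD DA install script.
base="10.44.44"
for i in 0 1 2 3; do
    net=$((i * 4))
    echo "link $i: $base.$net/30 (hosts $base.$((net + 1)) and $base.$((net + 2)))"
done
```

A /30 is the classic choice for point-to-point links because it leaves exactly two host addresses, one for each end of the cable.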

6.3.4. Output from bundle install script

From running the bundle install script:

$ ./install.sh

The output should be as follows (example given for software release 2.0.0):

## Copying /tmp/IPU_M_SW-2.0.0 to /home/ipuuser/IPU-M_releases ##
## Symlinking /home/ipuuser/IPU-M_releases/IPU_M_SW-2.0.0/maintenance_tools/
rack_tool.py to /home/ipuuser/.local/bin/rack_tool ##
## No default rack_config.json specified. Set GC_RACK_TOOL_DEFAULT_CONFIG_PATH
to point to default config ##
## Installing pip dependencies ##
## Install completed ##

Note that the message No default rack_config.json specified can be ignored, since it is resolved when the IPU-POD DA install script sets up the rack_config.json configuration file (see below).

6.3.5. Example output from IPU-POD DA install script

From running the IPU-POD DA install script:

$ cd  ~/IPU-M_releases/IPU_M_SW-<release>
$ sudo ./direct-attach-setup install

The output should be as follows for an IPU‑POD16 DA system (example given for software release 2.0.0):

Graphcore IPU-POD Direct Attach installer for SW version: 2.0.0
Please make sure all cables are properly attached according to documentation, and that all IPU-M2000(s)
are powered on, before continuing installation.
NOTE: Running this installer will interrupt any ongoing Poplar workloads.
Do you want to continue? (y/n): y

All required packages already installed
How many IPU-M2000 are attached? 4
A /24 subnet is required, is 10.44.44.0/24 okay?  (y/n): y
Management device configuration:

A dedicated network device on this host needs to be connected to the upper RJ-45 network port
of the top IPU-M2000, and then cabled downwards in a daisy-chain fashion (see docs for diagram).

Note that only unmanaged devices that are not configured by netplan or NetworkManager will
be available.

Available network devices:
1 - eno1 Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
2 - br-c2f59026c83d
3 - veth9f42328
Input management network device number or name: 1
eno1 selected as management network device.

Found exactly 4 RNIC devices:
- enp129s0f0 Mellanox Technologies MT28800 Family [ConnectX-5 Ex] (cable detected)
- enp161s0f0 Mellanox Technologies MT28800 Family [ConnectX-5 Ex] (cable detected)
- enp129s0f1 Mellanox Technologies MT28800 Family [ConnectX-5 Ex] (cable detected)
- enp161s0f1 Mellanox Technologies MT28800 Family [ConnectX-5 Ex] (cable detected)
Are all these network interfaces connected to IPU Machines?  (y/n): y
enp129s0f0, enp161s0f0, enp129s0f1, enp161s0f1 selected as RNIC network device(s).

Port forwarding and V-IPU web interface:
The V-IPU provides a web interface on port 8092. This web interface requires
an envoy proxy server running on port 9900. In order to provide external access
to the web interface both ports must be forwarded from the current machine

Port forwarding and BMC web interfaces:
Each one of the connected BMCs provides a web interface on port 443. These ports could be
externally accessed through ports 7800, 7801, 7802, 7803 if wanted

Do you allow the forward of ports 7800, 7801, 7802, 7803, 8092 and 9900 using the detected
external network interface eno2? (y/n): y
Installing and starting service gc-ipu-network.
This service is responsible for setting up networking and starting up the DHCP servers that
will assign IP addresses to IPU-M2000(s).
Created symlink /etc/systemd/system/multi-user.target.wants/gc-ipu-network.service → /etc/systemd/system/gc-ipu-network.service.
Scanning for 4 IPU-M2000. Note that this may take some minutes...
| |                              #                 | 0 Elapsed Time: 0:02:40
Four (4) IPU-M2000(s) found, in stacking order:
Unit 4:  serial 8204721-0071 - BMC 10.44.44.92 GW mgmt 10.44.44.93 GW rnic (via enp161s0f1) 10.44.44.226
Unit 3:  serial 8204721-0084 - BMC 10.44.44.98 GW mgmt 10.44.44.99 GW rnic (via enp161s0f0) 10.44.44.245
Unit 2:  serial 8204721-0092 - BMC 10.44.44.61 GW mgmt 10.44.44.62 GW rnic (via enp129s0f1) 10.44.44.237
Unit 1:  serial 8204721-0065 - BMC 10.44.44.74 GW mgmt 10.44.44.75 GW rnic (via enp129s0f0) 10.44.44.253
All IPU-M2000(s) have release IPU-M SW 2.0.0 installed, no upgrade needed.
Stopping all existing V-IPU servers...
\ |   #                                           | 0 Elapsed Time: 0:00:25
Restarting all V-IPU agents...
\ |                                    #          | 0 Elapsed Time: 0:00:29
Removing any existing V-IPU partitions, clusters and agents...
Creating new V-IPU agents and cluster...
Testing cluster, this will take some time...
- |         #                                     | 0 Elapsed Time: 0:02:11

Showing test results for cluster da-cluster
Type      | Duration | Passed | Summary
-------------------------------------------------------------------------
Sync-Link | 0.42s    | 6/6    | Sync Link test passed
Cabling   | 0.79s    | 12/12  | All cables connected as expected
IPU-Link  | 139.99s  | 76/76  | All Links Passed
Traffic   | 203.18s  | 1/1    | Traffic test passed
Version   | 0.01s    | 6/6    | All component versions are consistent
-------------------------------------------------------------------------
Adding V-IPU partition with 16 IPUs...
| |                                               | 0 Elapsed Time: 0:02:26

Please find installation logs at: /home/ipuuser/IPU-M_releases/IPU_M_SW-2.0.0/maintenance_tools/logs/direct-attach-20210304T2105.log

V-IPU web server running at: http://10.129.96.107:8092
BMC-1 web server running at: https://10.129.96.107:7800
BMC-2 web server running at: https://10.129.96.107:7801
BMC-3 web server running at: https://10.129.96.107:7802
BMC-4 web server running at: https://10.129.96.107:7803

Installation completed!!

If you run the IPU-POD DA install script again on an already configured system, it will start by asking you to accept the previous input given:

Graphcore IPU-POD Direct Attach installer for SW version: 2.0.0

Found existing Direct Attach configuration with the following parameters:
    count :  4
    subnet:  10.44.44.0/24
    mgmt  :  eno1
    rnic  :  ['enp129s0f0', 'enp161s0f0', 'enp129s0f1', 'enp161s0f1']
    allow_port_forwarding  :  True
    ext_net_if  :  eno2
    master_gw_ip  :  10.44.44.75

Do you want to continue with these parameters? If you have experienced
troubles with the install, choose No (y/n): y

The installation will continue in the same way as a first-time installation, but without the questions.

6.3.6. Running IPU-POD DA install in non-interactive mode

You can execute an unattended IPU-POD DA install by including the option --non-interactive. If you use this option, the following arguments are required:

  1. --count: Indicate the number of IPU-M2000 machines connected

  2. --subnet: Provide a subnet using CIDR a.b.c.d/n to set up the IPU-POD system

  3. --mgmt: Indicate the management network interface connected to IPU-M2000s

  4. --rnic: Specify each RNIC device connected to an IPU-M2000

Example

$ sudo ./direct-attach-setup install --non-interactive --count 2 --subnet 10.44.44.0/24 --mgmt dev0  --rnic dev1 --rnic dev2

For help finding the network devices to use with the previous arguments, refer to Discovering network interfaces for non-interactive mode.

6.3.7. Discovering network interfaces for non-interactive mode

Finding the network management service in use

Different Linux distributions come with different network management services. You will need to know which network manager is in use in order to identify the available network interfaces. Typically, Linux distributions that include a desktop environment (GNOME, Unity, KDE or others) ship with a service named NetworkManager. Linux distributions intended for servers, that is, without a desktop environment, usually use networkd. It is also possible that both services, or neither, are present on the system.

You can do a quick check to identify the network service in use by running:

  • nmcli: To test if NetworkManager is in use

  • networkctl: To test if networkd is in use

If the service is not active or installed, these commands will display errors or warnings indicating that the service is not in use. If both commands fail, either another network manager service (for example, ifupdown) or no service at all is in use.

The IPU-POD DA install script does not support cases where NetworkManager and networkd are in use simultaneously.
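The decision process above can be sketched as a small shell function. Only the interpretation of the two status codes comes from this section; the systemctl commands shown in the comment are an assumption about a systemd-based server and are not part of the product tooling.

```shell
# detect_manager NM_STATUS ND_STATUS
# Interprets the exit statuses (0 = active) of checks for NetworkManager
# and systemd-networkd, mirroring the cases described above.
detect_manager() {
    nm=$1; nd=$2
    if [ "$nm" -eq 0 ] && [ "$nd" -eq 0 ]; then
        echo "both (unsupported by the IPU-POD DA install script)"
    elif [ "$nm" -eq 0 ]; then
        echo "NetworkManager"
    elif [ "$nd" -eq 0 ]; then
        echo "networkd"
    else
        echo "other or none (for example, ifupdown)"
    fi
}

# On a systemd host the statuses could come from:
#   systemctl is-active --quiet NetworkManager;   nm=$?
#   systemctl is-active --quiet systemd-networkd; nd=$?
detect_manager 0 1   # prints: NetworkManager
```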

Finding available network interfaces with NetworkManager

List all devices with the following command

$ nmcli devices

NetworkManager might try to automatically connect new network devices, switching the devices’ states between connecting and disconnected. The IPU-POD DA installer will deem network interfaces available if they are unmanaged, or if they are in the connecting or disconnected states. After the IPU-POD DA install script has run, the selected network interfaces will be blacklisted from NetworkManager, leaving them unmanaged and preventing them from being automatically managed again.

Example output

$ nmcli devices
DEVICE  TYPE      STATE         CONNECTION
eth0    ethernet  connected     System eth0
eth1    ethernet  connecting    --
eth2    ethernet  disconnected  --
eth3    ethernet  disconnected  --
eth4    ethernet  disconnected  --
eth5    ethernet  disconnected  --
lo      loopback  unmanaged     --

In the example all the network interfaces eth*, except eth0, will be regarded as available.
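The availability rule can be applied mechanically to the nmcli output. The awk filter below is a sketch that assumes the plain four-column layout shown above and keeps ethernet devices whose STATE is unmanaged, connecting or disconnected:

```shell
# Filter ethernet devices in installer-available states from sample
# `nmcli devices` output (column layout assumed as shown above).
nmcli_output='DEVICE  TYPE      STATE         CONNECTION
eth0    ethernet  connected     System eth0
eth1    ethernet  connecting    --
eth2    ethernet  disconnected  --
lo      loopback  unmanaged     --'

echo "$nmcli_output" | awk 'NR > 1 && $2 == "ethernet" &&
    ($3 == "unmanaged" || $3 == "connecting" || $3 == "disconnected") { print $1 }'
```

On a live server you would pipe the real `nmcli devices` output into the same awk program instead of the sample string.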

Finding network interfaces with networkd

List all devices with the following command

$ networkctl

When using networkd, devices will be regarded as available if they are listed as unmanaged (see the SETUP column).

Example output

$ networkctl
IDX LINK             TYPE               OPERATIONAL SETUP
  1 lo               loopback           carrier     unmanaged
  2 eno1             ether              routable    unmanaged
  3 eno2             ether              routable    configured
  4 enp129s0f0       ether              routable    unmanaged
  5 enp129s0f1       ether              routable    unmanaged
  6 enp161s0f0       ether              routable    unmanaged
  7 enp161s0f1       ether              routable    unmanaged

In this example, eno1 and the four RNIC interfaces (enp*) are in a state that the IPU-POD DA install script considers available.
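The same kind of filter works for networkctl output: keep ether links whose SETUP column reads unmanaged. Again, the column layout is an assumption based on the sample output above:

```shell
# Filter ether links with SETUP == unmanaged from sample `networkctl`
# output (column layout assumed as shown above).
networkctl_output='IDX LINK             TYPE               OPERATIONAL SETUP
1 lo               loopback           carrier     unmanaged
2 eno1             ether              routable    unmanaged
3 eno2             ether              routable    configured
4 enp129s0f0       ether              routable    unmanaged'

echo "$networkctl_output" | awk 'NR > 1 && $3 == "ether" && $5 == "unmanaged" { print $2 }'
```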

Finding network interfaces with another network manager

List all devices with one of the following commands

$ netstat -i
$ ip link
$ ifconfig -a

In this case there are no further checks to determine whether the devices are managed, so any listed network interface will be regarded as available. The IPU-POD DA install will then fail at a later point if it cannot assign an IP address or establish a connection with an IPU-M2000.

Finding RNIC devices

List devices with InfiniBand support with the following command

$ find /sys/class/net/*/device/ -type d -name "infiniband"
/sys/class/net/eth2/device/infiniband
/sys/class/net/eth3/device/infiniband
/sys/class/net/eth4/device/infiniband
/sys/class/net/eth5/device/infiniband

In the example eth2, eth3, eth4 and eth5 can be used as RNIC devices.
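The find command above can also be written as a loop. The sketch below builds a mock sysfs tree in a temporary directory so the logic can be demonstrated without RNIC hardware; on a live server you would walk /sys/class/net directly:

```shell
# Build a mock sysfs tree: eth2/eth3 expose an infiniband directory
# (RNIC-capable), eth0 does not.
sysroot=$(mktemp -d)
mkdir -p "$sysroot/eth2/device/infiniband" "$sysroot/eth3/device/infiniband"
mkdir -p "$sysroot/eth0/device"

# Print interfaces whose device exposes InfiniBand support.
for d in "$sysroot"/*/device/infiniband; do
    [ -d "$d" ] || continue
    basename "$(dirname "$(dirname "$d")")"
done
rm -rf "$sysroot"
```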

6.4. IPU-M2000 maintenance, management and control

6.4.1. IPU-POD DA system software upgrades

The first time it is run, the IPU-POD DA install script will install and upgrade the required software on the server, as well as on the IPU-M2000s if needed.

When running the IPU-POD DA install script in later releases, the install script will:

  1. Make any necessary changes to the server setup, and install and upgrade server packages

  2. Upgrade all IPU-M2000s with the new IPU-M2000 software release

Note

Be aware that running a software install and upgrade will require a maintenance window of up to 1.5 hours (IPU‑POD16 DA system) where no Poplar workloads are allowed to run. All ongoing Poplar workloads will fail due to the IPU-M2000s going through a reset cycle as well as a system test cycle. Note that future releases of the IPU-M2000 upgrade tools will significantly reduce this time.

6.4.2. Hardware management of IPU-M2000 via BMC management interface

BMC management and operation are available via:

  • SSH login to the BMC command line on each IPU-M2000

  • OpenBMC tool openbmctool.py. This tool is found here: ~/IPU-M_releases/IPU_M_SW_<release_version>/bmc/bmc_software/openbmctool.py

  • BMC web GUI

The URLs and IP addresses of each BMC are reported by the IPU-POD DA install script and stored in the /var/lib/gc-ipu-network/config.json file. Reading this file is usually more convenient than searching the installation log, where they are also recorded.

If you have a browser running locally on the direct attach server, the BMC web GUI can be reached by opening the URL https://<BMC_IP_address>, where BMC_IP_address can be found by running the following command. Note that this is only available if you enabled port forwarding during installation:

$ grep "Unit " ~/IPU-M_releases/IPU_M_SW_<release_version>/maintenance_tools/logs/direct-attach-<date&time>.log | cut -d " " -f 7-8,13-14

This example is for an IPU‑POD16 DA system, so there are four BMC IP addresses, one per IPU-M2000. Unit 1 (IPU-M2000 #1) is shown at the bottom, which reflects the physical placement of the IPU-M2000s in the rack:

Unit 4: BMC 10.44.44.92
Unit 3: BMC 10.44.44.98
Unit 2: BMC 10.44.44.61
Unit 1: BMC 10.44.44.74
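If the whitespace in your log shifts the cut field numbers, a whitespace-tolerant awk variant can help. The log line below is copied from the example output in Section 6.3.5; the awk program is an illustrative alternative, not part of the product tooling:

```shell
# Extract "Unit N: BMC <ip>" from an installer log line, splitting on
# whitespace runs so spacing differences do not matter.
log_line='Unit 4:  serial 8204721-0071 - BMC 10.44.44.92 GW mgmt 10.44.44.93 GW rnic (via enp161s0f1) 10.44.44.226'
echo "$log_line" | awk '/^Unit / {
    for (i = 1; i <= NF; i++)
        if ($i == "BMC") print $1, $2, $i, $(i + 1)
}'
```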

The URLs are also listed in the installation logs, so if you disabled port forwarding during installation you can use the following method instead.

The URLs are listed as lines beginning with “BMC-<n> web server running at” in the latest installation log, which can be found here: ~/IPU-M_releases/IPU_M_SW_<release_version>/maintenance_tools/logs/direct-attach-<date&time>.log. In this case, however, Unit 1 (IPU-M2000 #1) is at the top of the list, which is the reverse of the physical IPU-M2000 placement in the rack:

BMC-1 web server running at: https://10.44.44.74
BMC-2 web server running at: https://10.44.44.61
BMC-3 web server running at: https://10.44.44.98
BMC-4 web server running at: https://10.44.44.92

The standard BMC web server port 443 is mapped to unique port numbers as follows:

Unit 4: BMC web server: https://<direct-attach-server>:7803
Unit 3: BMC web server: https://<direct-attach-server>:7802
Unit 2: BMC web server: https://<direct-attach-server>:7801
Unit 1: BMC web server: https://<direct-attach-server>:7800

Refer to the BMC user guide for more details about openbmctool and the BMC web GUI.

6.4.3. V-IPU management via IPU-Gateway command line

The IPU-M2000 offers an SSH-based itadmin account on the IPU-Gateway allowing access to the vipu-admin command line interface (CLI). This CLI is not intended for daily operation of the IPU-M2000s but can be used for troubleshooting.

The V-IPU Admin Guide contains the required information and should be consulted for details. Make sure the selected document version matches the first two (major.minor) version numbers of the V-IPU software running on the master IPU-M2000. To find the version number, run the command vipu-admin --version.