2. Installation

2.1. Prerequisites

Before you can use IPUs from your Kubernetes workloads, you need to meet the following conditions:

  1. Have access to one or more IPU-PODs.

  2. Have a V-IPU Controller installed on your IPU-POD in a version that is compatible with the IPU Operator: IPU Operator 1.1.0 or later should work with V-IPU Controller 1.17 or later. You can check the V-IPU Controller version using this command:

    $ vipu --server-version
    
  3. Create a Kubernetes cluster (Kubernetes versions 1.21, 1.22, 1.23 and 1.24 are currently supported). At least one of the worker nodes in the cluster must run on the head node (Poplar server) of the IPU-POD. See Section 7, Known limitations for more information.

    Optionally, you may configure a MACVLAN to avoid the use of host networking. See Using MACVLAN with Kubernetes for more information.

  4. Have the kubectl and Helm (v3.0.0 or later) command-line tools installed on your machine.

  5. Label the Kubernetes nodes that run on Poplar servers, so that they can be discovered and used when assigning workload Pods. You can do this with the following command:

    $ kubectl label nodes <someworkernode> vipu-ctrl=vipu-controller
    

    For more details, see Section 2.5, Basic installation and Section 2.6, Multiple V-IPU Controllers.
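The version check in step 2 can also be scripted. The following is a minimal bash sketch under stated assumptions, not part of the IPU Operator tooling: it assumes that vipu --server-version prints a line of the form version: X.Y.Z (as in the example output in Section 2.5, Basic installation) and that GNU sort with the -V option is available. The function names are hypothetical.

```shell
# Hypothetical helpers (not part of the IPU Operator tooling).

# version_ge A B: succeed if version A is greater than or equal to version B.
# Relies on GNU `sort -V` for version-aware ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# vipu_server_version: read `vipu --server-version` output on stdin and print
# the value of the "version:" line (assumed output format).
vipu_server_version() {
  sed -n 's/^version: *//p'
}

# check_vipu_version [MIN]: succeed if the version read from stdin is >= MIN
# (default 1.17, the minimum V-IPU Controller version mentioned above).
check_vipu_version() {
  local min="${1:-1.17}" current
  current="$(vipu_server_version)"
  [ -n "${current}" ] && version_ge "${current}" "${min}"
}
```

For example, the following succeeds only when the reported V-IPU Controller version is 1.17 or later:

$ vipu --server-version | check_vipu_version 1.17 && echo "V-IPU Controller version OK"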

2.2. Installation methods

There are two installation methods:

  1. Installing using the IPU Operator Helm chart hosted on GitHub Pages (requires an internet connection)

  2. Installing using an IPU Operator tarball downloaded from Graphcore’s Software Download Portal (the installation itself can be done without an internet connection)

In both cases the installation process requires two steps:

  1. IPUJob CRD installation

  2. IPU Operator installation

2.3. Installation using Helm chart hosted on GitHub

2.3.1. Installing the IPUJob CRD

For the IPU Operator to work properly, you must install the IPUJob CRD specification first. You can download it from the GitHub releases page, or fetch the latest release with the following command:

$ curl -s https://api.github.com/repos/graphcore/helm-charts/releases/latest \
  | grep -wo "https.*ipujobs.*yaml" | wget -qi -

To install the IPUJob CRD:

$ kubectl apply -f graphcore_ai_ipujobs_<version>.yaml

Then check that it is installed:

$ kubectl get crd ipujobs.graphcore.ai
NAME                   CREATED AT
ipujobs.graphcore.ai   2022-11-14T11:05:06Z
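kubectl get only shows that the CRD object exists. A minimal sketch, assuming you want a script to block until the API server has actually accepted the CRD, is to wait for its Established condition with kubectl wait. The function name and the 60-second default timeout are arbitrary choices for illustration.

```shell
# Hypothetical helper: wait until the IPUJob CRD reports the Established
# condition, so that IPUJob resources can be created straight afterwards.
# Optional argument: a kubectl-style timeout such as 30s (default 60s).
wait_for_ipujob_crd() {
  local timeout="${1:-60s}"
  kubectl wait --for=condition=established --timeout="${timeout}" \
    crd/ipujobs.graphcore.ai
}
```

For example, run wait_for_ipujob_crd 30s immediately after the kubectl apply command above.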

2.3.2. Installing the IPU Operator

Add the Helm charts repository as follows:

$ helm repo add <repo-name> https://helm-charts.graphcore.ai/

If you have already added this repository, run helm repo update to retrieve the latest versions of the charts. You can then run helm search repo <repo-name> to list the available charts.

To install the ipu-operator chart, run:

$ helm install [RELEASE_NAME] <repo-name>/ipu-operator --version [VERSION] [CUSTOM_PARAMETERS]

where:

  • [RELEASE_NAME] is the name you choose for this Helm installation, for example ipu-operator;

  • <repo-name>/ipu-operator is the name of the IPU Operator Helm chart in that repository;

  • [VERSION] is the exact chart version to install, for example 1.1.0. If you don’t specify the --version switch then the latest version is used;

  • [CUSTOM_PARAMETERS] is where you customise the installation (see: Section 2.5, Basic installation).
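The steps in this section can be combined into a single bash function, sketched below. The repository alias graphcore and the function name are hypothetical; the repository URL and the ipu-operator chart name are the ones given above. Only the required global.vipuControllers parameter is set.

```shell
# Hypothetical helper: add the chart repository, refresh it, and install the
# IPU Operator chart at a fixed version.
# Arguments: RELEASE_NAME VERSION VIPU_CONTROLLERS
install_ipu_operator_from_repo() {
  local release="$1" version="$2" vipu_controllers="$3"
  # "graphcore" is an assumed local alias for the repository.
  helm repo add graphcore https://helm-charts.graphcore.ai/ || return 1
  helm repo update || return 1
  helm install "${release}" graphcore/ipu-operator \
    --version "${version}" \
    --set global.vipuControllers="${vipu_controllers}"
}
```

For example:

$ install_ipu_operator_from_repo ipu-operator 1.1.0 "vipu-controller:8090:vipu-ctrl=vipu-controller"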

2.4. Installation using Helm chart from Graphcore’s SDP

2.4.1. Download package

The IPU Operator package can be downloaded from Graphcore’s Software Download Portal (SDP). Navigate to the IPU-POD Systems tab and then click the Load More Packages button as necessary until the IPU Kubernetes Integration package is visible. This package contains the IPU Operator. Click the Download button on the right to retrieve the package.

The software is delivered as a single tarball containing the following files:

File                                   Description
CRDs/graphcore.ai_ipujobs.yaml         the IPUJob CRD specification
licenses/license_3rd_party_deps.html   the list of 3rd party components used in the IPU Operator and their licenses
ipu-operator-helm-chart-VERSION.tgz    the Helm chart
ipu-operator-images.tar.gz             the IPU Operator Controller, Launcher and V-IPU Proxy Docker images
README.md                              a readme that describes the IPU Operator Helm chart
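A minimal bash sketch for unpacking the package and checking that the files listed above are present is shown below. The tarball file name is not given in this guide, so it is passed as an argument. The function name is hypothetical, and the sketch assumes the files sit at the top level of the archive; adjust the paths if your copy unpacks into a subdirectory.

```shell
# Hypothetical helper: unpack the SDP tarball into DIR and check its contents.
# Arguments: TARBALL DIR
unpack_ipu_operator() {
  local tarball="$1" dir="$2" f
  mkdir -p "${dir}" || return 1
  # tar -x detects the compression format automatically when reading a file.
  tar -xf "${tarball}" -C "${dir}" || return 1
  for f in CRDs/graphcore.ai_ipujobs.yaml ipu-operator-images.tar.gz README.md; do
    if [ ! -e "${dir}/${f}" ]; then
      echo "missing: ${f}" >&2
      return 1
    fi
  done
  # The chart file name contains the version, so match it with a glob.
  ls "${dir}"/ipu-operator-helm-chart-*.tgz >/dev/null 2>&1 || {
    echo "missing: ipu-operator-helm-chart-VERSION.tgz" >&2
    return 1
  }
}
```

For example:

$ unpack_ipu_operator <downloaded-tarball> ./ipu-operator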

2.4.2. Installing the IPUJob CRD

To install the IPUJob CRD, run the following command:

$ kubectl apply -f <dir>/CRDs/graphcore.ai_ipujobs.yaml

where <dir> is the directory into which the IPU Operator tarball was unpacked.

2.4.3. Installing the IPU Operator from a local container repository

This step uses Helm to install the IPU Operator Helm chart unpacked from the tarball, without access to cloud repositories.

Firstly, ensure that you have access to a local container repository: details of how to run a basic Docker repository are available from docs.docker.com.

Next, load the images from the tarball into your local Docker instance:

$ docker image load -i ipu-operator-images.tar.gz

Then tag the images and push them to the local repository:

$ VERSION='1.1.0'
$ REGISTRY='localhost:5000'
$ for image in launcher controller vipu-proxy; do
>   docker tag "ipu-operator-${image}:${VERSION}" "${REGISTRY}/ipu-operator-${image}:${VERSION}"
>   docker push "${REGISTRY}/ipu-operator-${image}:${VERSION}"
> done
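Before installing the chart, you may want to confirm that the push succeeded. The sketch below queries the tag list of each repository through the Docker Registry HTTP API V2 endpoint GET /v2/<name>/tags/list. It assumes a basic local registry served over plain HTTP without authentication, such as the localhost:5000 example above; the function name is hypothetical.

```shell
# Hypothetical helper: check that each IPU Operator image tag is visible in
# the registry. Assumes plain HTTP and no authentication.
# Arguments: REGISTRY VERSION
check_pushed_images() {
  local registry="$1" version="$2" image tags
  for image in launcher controller vipu-proxy; do
    tags="$(curl -fsS "http://${registry}/v2/ipu-operator-${image}/tags/list")" || return 1
    # The response is JSON such as {"name":"...","tags":["1.1.0"]}.
    case "${tags}" in
      *"\"${version}\""*) echo "ok: ipu-operator-${image}:${version}" ;;
      *) echo "missing: ipu-operator-${image}:${version}" >&2; return 1 ;;
    esac
  done
}
```

For example, run check_pushed_images "${REGISTRY}" "${VERSION}" in the same shell as the loop above.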

Then install the Helm chart, pointing it at the images in the local repository:

$ RELEASE_NAME='ipu-operator'
$ VERSION='1.1.0'
$ VIPU_CONTROLLERS='172.20.224.3:8090:vipu-ctrl=vipu-controller'
$ REGISTRY='localhost:5000'
$ helm install "${RELEASE_NAME}" "ipu-operator-helm-chart-${VERSION}.tgz" \
    --set global.vipuControllers="${VIPU_CONTROLLERS}" \
    --set-string global.launcherImage="${REGISTRY}/ipu-operator-launcher" \
    --set controller.image.repository="${REGISTRY}/ipu-operator-controller" \
    --set vipuProxy.image.repository="${REGISTRY}/ipu-operator-vipu-proxy" \
    [CUSTOM_PARAMETERS]

where:

  • RELEASE_NAME is the name you choose for this Helm installation, for example ipu-operator;

  • VIPU_CONTROLLERS is the V-IPU Controller definition in the form address:port:key=value (see: Section 2.5, Basic installation);

  • REGISTRY is the address of the local repository the images were pushed to;

  • ipu-operator-helm-chart-${VERSION}.tgz is the tarball containing the IPU Operator Helm chart;

  • [CUSTOM_PARAMETERS] is where you can further customise the installation (see: Section 2.5, Basic installation).

If the image tags in the local repository differ from those of the loaded images (which is not recommended), also add the following parameters, replacing <tag> with the tag you used:

--set-string global.launcherImageTag="<tag>" \
--set-string controller.image.tag="<tag>" \
--set-string vipuProxy.image.tag="<tag>" \

2.5. Basic installation

The [CUSTOM_PARAMETERS] are the same regardless of the installation method used. The following command shows a basic installation of the IPU Operator to the Kubernetes cluster in the default configuration. The only required parameter is global.vipuControllers, which specifies the location of the V-IPU Controller that provides access to IPUs.

$ cd ipu-operator
$ helm install ipu-operator ipu-operator-helm-chart-1.1.0.tgz \
  --set global.vipuControllers="vipu-controller:8090:vipu-ctrl=vipu-controller"

You can either use multiple --set key=value arguments, or put your customization in a YAML file and use the --values your-values.yaml argument.
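As an illustration of the --values approach, the bash sketch below writes a values file with the same effect as setting global.vipuControllers with --set. The file and function names are hypothetical. Inside a values file, commas between controller definitions are ordinary YAML string content and need no escaping, unlike on the --set command line.

```shell
# Hypothetical helper: write a minimal values file that sets
# global.vipuControllers. Arguments: FILE CONTROLLERS
write_ipu_operator_values() {
  local file="$1" controllers="$2"
  cat > "${file}" <<EOF
global:
  vipuControllers: "${controllers}"
EOF
}
```

For example:

$ write_ipu_operator_values my-values.yaml "vipu-controller:8090:vipu-ctrl=vipu-controller"
$ helm install ipu-operator ipu-operator-helm-chart-1.1.0.tgz --values my-values.yaml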

See Section 3, Configurations below for more information.

The global.vipuControllers parameter provides a comma-separated list of V-IPU Controller definitions. Each definition consists of three elements separated by a colon (:):

  1. IP address or DNS name of the V-IPU Controller

  2. V-IPU Controller listening port (most often 8090)

  3. node selector in the form key=value that selects nodes that run on Poplar servers associated with this V-IPU Controller

In the example above, vipu-controller is the DNS name of the V-IPU Controller and 8090 is its listening port. You can find these values with the following vipu command:

$ vipu --server-version
version: 1.18.0
host: 172.20.224.3:8090

This shows that the V-IPU Controller address is 172.20.224.3 and its listening port is 8090.

The last element, vipu-ctrl=vipu-controller, is a label set on nodes running on Poplar servers associated with this V-IPU Controller.
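Assembling a definition from these three elements can be scripted. The bash sketch below builds one global.vipuControllers entry from the output of vipu --server-version and a node label. It assumes the output contains a line of the form host: ADDRESS:PORT, as in the example above; the function name is hypothetical.

```shell
# Hypothetical helper: read `vipu --server-version` output on stdin and print
# "ADDRESS:PORT:LABEL". Argument: LABEL in key=value form.
vipu_controller_def() {
  local label="$1" hostport
  # Keep everything after "host: ", which includes the port.
  hostport="$(sed -n 's/^host: *//p' | head -n 1)"
  [ -n "${hostport}" ] || return 1
  printf '%s:%s\n' "${hostport}" "${label}"
}
```

For the example output above, the following sets VIPU_CONTROLLERS to 172.20.224.3:8090:vipu-ctrl=vipu-controller:

$ VIPU_CONTROLLERS="$(vipu --server-version | vipu_controller_def vipu-ctrl=vipu-controller)"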

These settings are stored in the ipu-operator-vipu-controllers ConfigMap. More details about defining V-IPU Controllers can be found in Section 2.6, Multiple V-IPU Controllers below.

The helm install command creates the following resources. Namespaced resources are created in the namespace where the Helm release is installed (the default namespace unless you pass --namespace to helm install):

  • Deployments: ipu-operator-controller-manager and ipu-operator-vipu-proxy (list them with kubectl get deployments).

  • Services of type ClusterIP: ipu-operator-vipu-proxy, ipu-operator-webhook-svc and ipu-operator-controller-manager-metrics (list them with kubectl get svc).

  • RBAC: a ServiceAccount and a ClusterRole to manage Pods and ConfigMaps.

  • ipu-partitions-tracker ConfigMap for tracking created IPU partitions.

  • Configuration objects for the mutation and validation webhooks.
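To review these resources after installation, a minimal bash sketch is shown below. It assumes the ConfigMap name listed above and takes the namespace as an argument; the function name is hypothetical. Webhook configurations are cluster-scoped, so they are listed without a namespace, and the output may include webhook configurations belonging to other software in the cluster.

```shell
# Hypothetical helper: list the resources created by the IPU Operator chart.
# Argument: NAMESPACE (default: default)
show_ipu_operator_resources() {
  local ns="${1:-default}"
  # Namespaced resources: Deployments, Services and the partition tracker.
  kubectl get deployments,services -n "${ns}" || return 1
  kubectl get configmap ipu-partitions-tracker -n "${ns}" || return 1
  # Webhook configurations are cluster-scoped, so no namespace is given.
  kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations
}
```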

You can read more about installing Helm charts in the Helm documentation.

You can see all the customisation options in the README.md for the Helm charts included in the IPU Operator package.

2.6. Multiple V-IPU Controllers

The IPU Operator can communicate with multiple V-IPU Controllers.

You can specify multiple V-IPU Controllers during installation by setting the global.vipuControllers option on the helm install command line. For example:

--set global.vipuControllers="vctl1:8090:vipu-ctrl=vctl1\,vctl2:8090:vipu-ctrl=vctl2"

where:

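  • vctl1 and vctl2 are the DNS names of two V-IPU Controllers;

  • 8090 is the listening port (the same for both);

  • the vipu-ctrl=vctl1 label selects nodes associated with V-IPU Controller 1, while vipu-ctrl=vctl2 selects nodes associated with V-IPU Controller 2.

The comma separating the two definitions is escaped (\,) because helm --set otherwise treats an unescaped comma as a separator between key=value pairs.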
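Helm's --set treats an unescaped comma as a separator between key=value pairs, so the commas between controller definitions must be written as \, on the command line. When the list is built in a script, this escaping is easy to get wrong; the bash sketch below joins any number of definitions with \, separators. The function name is hypothetical.

```shell
# Hypothetical helper: join V-IPU Controller definitions for use with
# `--set global.vipuControllers=...`, escaping the separating commas.
join_vipu_controllers() {
  local out="" def
  for def in "$@"; do
    if [ -z "${out}" ]; then
      out="${def}"
    else
      # "\\," inside double quotes produces the two characters \,
      out="${out}\\,${def}"
    fi
  done
  printf '%s\n' "${out}"
}
```

For example:

$ helm install ipu-operator ipu-operator-helm-chart-1.1.0.tgz \
    --set global.vipuControllers="$(join_vipu_controllers vctl1:8090:vipu-ctrl=vctl1 vctl2:8090:vipu-ctrl=vctl2)"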

Alternatively, after installation you can edit the ipu-operator-vipu-controllers ConfigMap and update the value:

$ kubectl edit configmap ipu-operator-vipu-controllers

Each V-IPU Controller is specified with a colon-separated list of three values:

  • V-IPU Controller host address

  • V-IPU Controller port

  • A label defined by key=value.

The same label must be added to each node where the containers using that V-IPU Controller will run. Label a node with the following command:

$ kubectl label nodes <someworkernode> <key>=<value>
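The bash sketch below labels a node and then lists the nodes carrying that label, so you can confirm which nodes will be associated with the V-IPU Controller. The function name is hypothetical. It passes --overwrite, so an existing value for the same key is replaced instead of causing an error; drop that flag if you prefer kubectl label to refuse the change.

```shell
# Hypothetical helper: label a node and show all nodes with that label.
# Arguments: NODE LABEL (LABEL in key=value form)
label_vipu_node() {
  local node="$1" label="$2"
  kubectl label nodes "${node}" "${label}" --overwrite || return 1
  kubectl get nodes -l "${label}"
}
```

For example:

$ label_vipu_node <someworkernode> vipu-ctrl=vctl1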

The ConfigMap can be modified at any time and the IPU Operator automatically adds the new V-IPU Controller to its internal list. It can take up to 60 seconds for the new V-IPU Controller to be added. When a partition is created, the IPU Operator goes through the list sequentially until it finds space for the requested number of IPUs.

2.7. Verify the installation is successful

When the installation is complete, you can verify that it worked correctly by running the following commands and checking that the output is similar to that shown:

$ kubectl get crd
NAME                   CREATED AT
ipujobs.graphcore.ai   2022-10-19T12:20:04Z
...

$ helm ls -n <the-namespace-where-you-deployed-the-operator>
NAME          NAMESPACE  REVISION  UPDATED                              STATUS    CHART               APP VERSION
ipu-operator  default    1         2022-10-19 17:49:52.98211 +0000 UTC  deployed  ipu-operator-0.0.1  1.1.0

$ kubectl get pods -n <the-namespace-where-you-deployed-the-operator>
NAME                                               READY   STATUS    RESTARTS   AGE
ipu-operator-controller-manager-54766f7f7b-x5wtr   2/2     Running   0          5d23h
ipu-operator-vipu-proxy-844c7d6b7f-88bqr           1/1     Running   1          5d23h
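Rather than re-running kubectl get pods until both Pods are ready, you can wait for the Deployments to finish rolling out. The bash sketch below uses kubectl rollout status and assumes the Deployment names shown above; the function name and the default timeout are arbitrary.

```shell
# Hypothetical helper: wait for both IPU Operator Deployments to roll out.
# Arguments: [NAMESPACE] [TIMEOUT] (defaults: default, 120s)
wait_for_ipu_operator() {
  local ns="${1:-default}" timeout="${2:-120s}" d
  for d in ipu-operator-controller-manager ipu-operator-vipu-proxy; do
    kubectl rollout status "deployment/${d}" -n "${ns}" --timeout="${timeout}" || return 1
  done
}
```

For example:

$ wait_for_ipu_operator <the-namespace-where-you-deployed-the-operator>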

2.8. Uninstall the IPU Operator

To uninstall the IPU Operator:

$ helm uninstall [RELEASE_NAME]

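This removes the Kubernetes resources associated with the chart (except the ConfigMap described in the note below) and deletes the release. The IPUJob CRD, which was installed separately with kubectl apply, is not removed.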

See helm uninstall for command documentation.

Note

The partition tracker ConfigMap ipu-partitions-tracker does not get deleted when you uninstall the Helm release. This is so that when the V-IPU Proxy is deployed again, it can pick up from where it was uninstalled before (in terms of managing the created partitions). If you wish to remove that ConfigMap, you can run: kubectl delete configmap ipu-partitions-tracker -n <namespace>
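The uninstall steps, including the optional ConfigMap removal described in the note above, can be combined in a bash sketch such as the following. The function name and its yes/no purge argument are hypothetical. Only remove the ConfigMap if you do not intend to redeploy the IPU Operator and resume managing the existing partitions.

```shell
# Hypothetical helper: uninstall the release and optionally delete the
# partition tracker ConfigMap.
# Arguments: RELEASE_NAME [NAMESPACE] [PURGE] (PURGE=yes deletes the ConfigMap)
uninstall_ipu_operator() {
  local release="$1" ns="${2:-default}" purge="${3:-no}"
  helm uninstall "${release}" -n "${ns}" || return 1
  if [ "${purge}" = "yes" ]; then
    kubectl delete configmap ipu-partitions-tracker -n "${ns}"
  fi
}
```

For example:

$ uninstall_ipu_operator ipu-operator default yes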

2.9. Upgrade the IPU Operator

To upgrade the IPU Operator:

$ helm upgrade [RELEASE_NAME] [CHART]

See helm upgrade for command documentation.
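The IPUJob CRD is installed separately with kubectl apply, so helm upgrade is not expected to change it. The bash sketch below assumes that, when a new release ships an updated CRD file, you want to apply it before upgrading the chart; this ordering is a suggestion, not a documented requirement. The function name is hypothetical, an empty third argument skips the CRD step, and any further arguments (for example --values my-values.yaml) are passed to helm upgrade.

```shell
# Hypothetical helper: optionally re-apply the IPUJob CRD, then upgrade.
# Arguments: RELEASE_NAME CHART CRD_FILE [extra helm upgrade arguments...]
upgrade_ipu_operator() {
  if [ "$#" -lt 3 ]; then
    echo "usage: upgrade_ipu_operator RELEASE CHART CRD_FILE [ARGS...]" >&2
    return 2
  fi
  local release="$1" chart="$2" crd_file="$3"
  shift 3
  if [ -n "${crd_file}" ]; then
    kubectl apply -f "${crd_file}" || return 1
  fi
  helm upgrade "${release}" "${chart}" "$@"
}
```

For example:

$ upgrade_ipu_operator ipu-operator ipu-operator-helm-chart-<version>.tgz <dir>/CRDs/graphcore.ai_ipujobs.yaml --values my-values.yaml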