2. Installation
2.1. Prerequisites
Before you can use IPUs from your Kubernetes workloads, you need to meet the following conditions:
Have access to one or more IPU-PODs.
Have a version of the V-IPU Controller installed on your IPU-POD that is compatible with the IPU Operator: IPU Operator >= 1.1.0 should work with V-IPU Controller >= 1.17. You can check the V-IPU Controller version using this command:
$ vipu --server-version
Create a Kubernetes cluster (we currently support versions 1.21, 1.22, 1.23 and 1.24). At least one of the worker nodes in the cluster must run on the head node (Poplar server) of the Graphcore Pod. See Section 7, Known limitations for more information.
Optionally, you may configure a MACVLAN to avoid the use of host networking. See Using MACVLAN with Kubernetes for more information.
Have the kubectl and Helm (v3.0.0 or later) command-line tools installed on your machine.
Nodes in the Kubernetes cluster that run on a Poplar server must be marked with a label so they can be discovered and used for scheduling workload Pods. This can be done with the following command:
$ kubectl label nodes <someworkernode> vipu-ctrl=vipu-controller
More details can be found in Section 2.5, Basic installation and Section 2.6, Multiple V-IPU Controllers.
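For example, to confirm that the label has been applied, you can list the nodes that match it (using the label key and value from the command above):
$ kubectl get nodes -l vipu-ctrl=vipu-controller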
2.2. Installation methods
There are two installation methods:
Installing using the IPU Operator Helm chart hosted on GitHub Pages (see Section 2.3, Installation using Helm chart hosted on GitHub)
Installing using an IPU Operator tarball downloaded from Graphcore’s downloads page (see Section 2.4, Installation using Helm chart from Graphcore’s downloads page)
2.3. Installation using Helm chart hosted on GitHub
Add the Helm charts repository as follows:
$ helm repo add <repo-name> https://helm-charts.graphcore.ai/
If you have already added this repo, run helm repo update to retrieve the latest versions of the packages. You can then run helm search repo <repo-name> to see the charts.
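For example, using graphcore as the repository name (the name itself is your choice):
$ helm repo add graphcore https://helm-charts.graphcore.ai/
$ helm repo update
$ helm search repo graphcore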
To install the ipu-operator chart, run:
$ helm install [RELEASE_NAME] <repo-name>/ipu-operator --version [VERSION] [CUSTOM_PARAMETERS]
where:
[RELEASE_NAME] is the name you choose for this Helm installation, for example ipu-operator;
<repo-name>/ipu-operator is the name of the IPU Operator Helm chart in that repository;
[VERSION] is the exact chart version to install, for example 1.1.0. If you don't specify the --version switch then the latest version is used;
[CUSTOM_PARAMETERS] is where you customise the installation (see Section 2.5, Basic installation).
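Putting this together, a complete installation command might look like the following (the release name, repository name, version and V-IPU Controller definition are illustrative; see Section 2.5, Basic installation for the global.vipuControllers parameter):
$ helm install ipu-operator graphcore/ipu-operator --version 1.1.0 \
    --set global.vipuControllers="vipu-controller:8090:vipu-ctrl=vipu-controller"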
2.4. Installation using Helm chart from Graphcore’s downloads page
2.4.1. Download package
The IPU Operator package can be downloaded from Graphcore’s downloads page under the Kubernetes Integration tab.
The software is delivered as a single tarball containing the following files:
File | Description
---|---
ipu-operator-controller.tar | the IPU Operator controller container image
ipu-operator-launcher.tar | the IPU Operator launcher container image
ipu-operator-vipu-proxy.tar | the IPU Operator V-IPU proxy container image
 | the list of 3rd party components used in the IPU Operator and their licenses
ipu-operator-helm-chart-<version>.tgz | the Helm Chart package
README.md | a readme that describes the IPU Operator Helm chart
2.4.2. Installing the IPU Operator from a local container repository
This method uses the Helm tool to install the IPU Operator Helm chart unpacked from the tarball, without access to cloud repositories.
Firstly, ensure that you have access to a local container repository: details of how to run a basic Docker repository are available from docs.docker.com.
Next, load the images from the tarball into your local Docker instance:
$ docker image load -i ipu-operator-{controller,launcher,vipu-proxy}.tar
And then push these to the local repository:
$ VERSION='1.1.0'
$ REGISTRY='localhost:5000'
$ for image in launcher controller vipu-proxy; do
> docker tag "ipu-operator-${image}:${VERSION}" "${REGISTRY}/ipu-operator-${image}:${VERSION}"
> docker push "${REGISTRY}/ipu-operator-${image}:${VERSION}"
> done
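Optionally, you can confirm that the images were pushed by querying the registry's standard v2 catalog endpoint (assuming a plain Docker registry such as the one at localhost:5000 above), which should return output similar to:
$ curl "http://${REGISTRY}/v2/_catalog"
{"repositories":["ipu-operator-controller","ipu-operator-launcher","ipu-operator-vipu-proxy"]}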
Finally, run the following command to install the chart:
$ RELEASE_NAME='ipu-operator'
$ VERSION='1.1.0'
$ REGISTRY='localhost:5000'
$ helm install "${RELEASE_NAME}" "ipu-operator-helm-chart-${VERSION}.tgz" \
--set-string global.launcherImage="${REGISTRY}/ipu-operator-launcher" \
--set controller.image.repository="${REGISTRY}/ipu-operator-controller" \
--set vipuProxy.image.repository="${REGISTRY}/ipu-operator-vipu-proxy" \
[CUSTOM_PARAMETERS]
where:
RELEASE_NAME is the name you choose for this Helm installation, for example ipu-operator;
ipu-operator-helm-chart-${VERSION}.tgz is the tarball containing the IPU Operator Helm chart;
[CUSTOM_PARAMETERS] is where you can further customise the installation (see Section 2.5, Basic installation).
If the version tags in the local repository differ from the original tags of the loaded images (which is not recommended), then also add the following parameters:
--set-string global.launcherImageTag="${VERSION}" \
--set-string controller.image.tag="${VERSION}" \
--set-string vipuProxy.image.tag="${VERSION}" \
2.5. Basic installation
Regardless of the installation method used, the [CUSTOM_PARAMETERS] are the same. The following command shows a basic installation of the IPU Operator into the Kubernetes cluster with the default configuration. The only required parameter is global.vipuControllers; it indicates the location of the V-IPU Controller that provides access to IPUs.
$ cd ipu-operator
$ helm install ipu-operator ipu-operator-helm-chart-1.1.0.tgz \
--set global.vipuControllers="vipu-controller:8090:vipu-ctrl=vipu-controller"
You can either use multiple --set key=value arguments, or put your customisation in a YAML file and pass it with the --values your-values.yaml argument. See Section 3, Configurations below for more information.
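For example, a minimal values file equivalent to the --set argument shown above might look like this (the file name my-values.yaml is just an illustration):
$ cat > my-values.yaml <<'EOF'
global:
  vipuControllers: "vipu-controller:8090:vipu-ctrl=vipu-controller"
EOF
$ helm install ipu-operator ipu-operator-helm-chart-1.1.0.tgz --values my-values.yaml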
The global.vipuControllers parameter provides a comma-separated list of V-IPU Controller definitions. These definitions are built from three elements separated by a colon (:):
IP address or DNS name of the V-IPU Controller
V-IPU Controller listening port (most often 8090)
node selector in the form key=value that selects nodes that run on Poplar servers associated with this V-IPU Controller
In the example above, vipu-controller is the DNS name of the V-IPU Controller and 8090 is its listening port. These can be found using this vipu command:
$ vipu --server-version
version: 1.18.0
host: 172.20.224.3:8090
This shows that the address of the V-IPU Controller is 172.20.224.3 and the listening port is 8090.
The last element, vipu-ctrl=vipu-controller, is a label set on nodes running on Poplar servers associated with this V-IPU Controller. These settings are stored in the ipu-operator-vipu-controllers ConfigMap. More details about defining V-IPU Controllers can be found in Section 2.6, Multiple V-IPU Controllers below.
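You can inspect the stored definitions at any time with the following command (the exact layout of the ConfigMap data may vary between versions):
$ kubectl get configmap ipu-operator-vipu-controllers -o yaml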
The helm install command installs and creates the following resources in the default namespace where the Helm release was installed:
ipu-operator-controller-manager and ipu-operator-vipu-proxy as Deployments (they can be listed with the kubectl get deployments command).
ipu-operator-vipu-proxy, ipu-operator-webhook-svc and ipu-operator-controller-manager-metrics as Services of type ClusterIP (they can be listed with the kubectl get svc command).
RBAC objects: a ServiceAccount and a ClusterRole to manage Pods and ConfigMaps.
The ipu-partitions-tracker ConfigMap for tracking created IPU partitions.
Configuration objects for the mutation and validation webhooks.
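As a quick sanity check, you can list these resources in the namespace where the release was installed, for example:
$ kubectl get deployments,services,configmaps -n default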
You can read more about installing Helm charts in the Helm documentation.
You can see all the customisation options in the README.md for the Helm charts included in the IPU Operator package.
2.6. Multiple V-IPU Controllers
The IPU Operator can communicate with multiple V-IPU Controllers.
You can specify multiple V-IPU Controllers during installation by setting the global.vipuControllers option on the helm install command line. For example:
--set global.vipuControllers="vctl1:8090:vipu-ctrl=vctl1\,vctl2:8090:vipu-ctrl=vctl2"
where:
vctl1 and vctl2 are DNS names of two V-IPU Controllers;
8090 is the listening port (the same in both cases);
the vipu-ctrl=vctl1 label allows you to select nodes associated with V-IPU Controller 1, while vipu-ctrl=vctl2 selects nodes of V-IPU Controller 2.
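For example, assuming worker nodes named worker-a and worker-b (hypothetical names) run on Poplar servers associated with vctl1 and vctl2 respectively, you would label them as follows:
$ kubectl label nodes worker-a vipu-ctrl=vctl1
$ kubectl label nodes worker-b vipu-ctrl=vctl2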
Alternatively, after installation you can edit the ConfigMap, as shown below, and update the value.
$ kubectl edit configmap ipu-operator-vipu-controllers
Each V-IPU Controller is specified with a colon-separated list of three values:
V-IPU Controller host address
V-IPU Controller port
A label defined by key=value.
The same label must be added to the node where the containers corresponding to that V-IPU Controller will run. Labeling the node is done with the following command:
$ kubectl label nodes <someworkernode> <key>=<value>
The ConfigMap can be modified at any time and the IPU Operator automatically adds the new V-IPU Controller to its internal list. It can take up to 60 seconds for the new V-IPU Controller to be added. When a partition is created, the IPU Operator goes through the list sequentially until it finds space for the requested number of IPUs.
2.7. Verify the installation is successful
When the installation is complete, you can verify that it worked correctly by running the following commands and checking that you see similar output:
$ kubectl get crd
NAME CREATED AT
ipujobs.graphcore.ai 2022-10-19T12:20:04Z
...
$ helm ls -n <the-namespace-where-you-deployed-the-operator>
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
ipu-operator default 1 2022-10-19 17:49:52.98211 +0000 UTC deployed ipu-operator-0.0.1 1.1.0
$ kubectl get pods -n <the-namespace-where-you-deployed-the-operator>
NAME READY STATUS RESTARTS AGE
ipu-operator-controller-manager-54766f7f7b-x5wtr 2/2 Running 0 5d23h
ipu-operator-vipu-proxy-844c7d6b7f-88bqr 1/1 Running 1 5d23h
2.8. Uninstall the IPU Operator
To uninstall the IPU Operator:
$ helm uninstall [RELEASE_NAME]
This removes the Kubernetes components associated with the chart and deletes the release. See helm uninstall for command documentation.
Currently there is no support for deleting CRDs using Helm, so the ipujobs.graphcore.ai CRD does not get removed by this step. If you want to remove it as well, you will need to do so manually by running:
$ kubectl delete crd ipujobs.graphcore.ai
Refer to Helm documentation for more explanation.
Note
The partition tracker ConfigMap ipu-partitions-tracker does not get deleted when you uninstall the Helm release. This is so that when the V-IPU Proxy is deployed again, it can pick up from where it left off (in terms of managing the created partitions). If you wish to remove that ConfigMap, you can run:
$ kubectl delete configmap ipu-partitions-tracker -n <namespace>
2.9. Upgrade the IPU Operator
To upgrade the IPU Operator:
$ helm upgrade [RELEASE_NAME] [CHART]
See helm upgrade for command documentation.
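For example, to upgrade a release named ipu-operator to a newer chart version from the GitHub-hosted repository while keeping your existing values (the repository name and version are illustrative):
$ helm upgrade ipu-operator graphcore/ipu-operator --version 1.2.0 --reuse-values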
Note
Currently there is no support for upgrading CRDs using Helm. If a CRD upgrade is required, you will first need to delete the old CRDs before installing the new Helm chart version, as described in Section 2.8, Uninstall the IPU Operator.