2. Installation
2.1. Prerequisites
Before you can use IPUs from your Kubernetes workloads, you need to meet the following conditions:
Have access to one or more IPU-PODs.
Have a compatible version of the IPU Operator with V-IPU Controller installed on your IPU-POD. IPU Operator >= 1.1.0 should work with V-IPU Controller >= 1.17. You can check the V-IPU Controller version using this command:
$ vipu --server-version
Create a Kubernetes cluster (we currently support version 1.21, 1.22, 1.23 and 1.24). At least one of the worker nodes in the cluster must be on the head node (Poplar server) of the Graphcore Pod. See Section 7, Known limitations for more information.
Optionally, you may configure a MACVLAN to avoid the use of host networking. See Using MACVLAN with Kubernetes for more information.
Have the kubectl and Helm (v3.0.0 or later) command-line tools installed on your machine.
Nodes in the Kubernetes cluster that are running on a Poplar server should be marked with a label so that they can be discovered and used for scheduling workload Pods. This can be done with the following command:
$ kubectl label nodes <someworkernode> vipu-ctrl=vipu-controller
More details about that can be found in the sections: Section 2.5, Basic installation and Section 2.6, Multiple V-IPU Controllers.
2.2. Installation methods
There are two installation methods:
Installing using the IPU Operator Helm chart hosted on GitHub Pages (requires internet connection)
Installing using an IPU Operator tarball downloaded from Graphcore’s Software Download Portal (may not require any internet connection)
In both cases the installation process requires two steps:
IPUJob CRD installation
IPU Operator installation
2.3. Installation using Helm chart hosted on GitHub
2.3.1. Installing the IPUJob CRD
For the IPU Operator to work properly you must install the IPUJob CRD specification first. You can obtain this directly from the GitHub releases or using the following command:
$ curl -s https://api.github.com/repos/graphcore/helm-charts/releases/latest \
| grep -wo "https.*ipujobs.*yaml" | wget -qi -
To install the IPUJob CRD:
$ kubectl apply -f graphcore_ai_ipujobs_<version>.yaml
and check if it is installed:
$ kubectl get crd ipujobs.graphcore.ai
NAME                   CREATED AT
ipujobs.graphcore.ai   2022-11-14T11:05:06Z
2.3.2. Installing the IPU Operator
Add the Helm charts repository as follows:
$ helm repo add <repo-name> https://helm-charts.graphcore.ai/
If you have already added this repo earlier, run helm repo update to retrieve the latest versions of the packages. You can then run helm search repo <repo-name> to see the charts.
To install the ipu-operator chart, run:
$ helm install [RELEASE_NAME] <repo-name>/ipu-operator --version [VERSION] [CUSTOM_PARAMETERS]
where:
[RELEASE_NAME] is the name you choose for this Helm installation, for example ipu-operator;
<repo-name>/ipu-operator is the name of the IPU Operator Helm chart in that repository;
[VERSION] is the exact chart version to install, for example 1.1.0. If you don't specify the --version switch then the latest version is used;
[CUSTOM_PARAMETERS] is where you customise the installation (see Section 2.5, Basic installation).
2.4. Installation using Helm chart from Graphcore’s SDP
2.4.1. Download package
The IPU Operator package can be downloaded from Graphcore’s Software Download Portal (SDP). Navigate to the IPU-POD Systems tab and then click the Load More Packages button as necessary until the IPU Kubernetes Integration package is visible. This package contains the IPU Operator. Click the Download button on the right to retrieve the package.
The software is delivered as a single tarball containing the following files:
File | Description
---|---
CRDs/graphcore.ai_ipujobs.yaml | the IPUJob CRD
 | the list of 3rd party components used in the IPU Operator and their licenses
ipu-operator-helm-chart-<version>.tgz | the Helm chart
ipu-operator-images.tar.gz | the IPU Operator container images
README.md | a readme that describes the IPU Operator Helm chart
2.4.2. Installing the IPUJob CRD
To install the IPUJob CRD, run the following command:
$ kubectl apply -f <dir>/CRDs/graphcore.ai_ipujobs.yaml
where <dir> is the directory where the IPU Operator tarball has been unpacked.
2.4.3. Installing the IPU Operator from a local container repository
This step uses the Helm tool to install the IPU Operator Helm chart that was unpacked from the tarball, without access to cloud repositories.
Firstly, ensure that you have access to a local container registry: details of how to run a basic Docker registry are available from docs.docker.com.
Next, load the images from the tarball into your local Docker instance:
$ docker image load -i ipu-operator-images.tar.gz
Then push these images to the local registry:
$ VERSION='1.1.0'
$ REGISTRY='localhost:5000'
$ for image in launcher controller vipu-proxy; do
> docker tag "ipu-operator-${image}:${VERSION}" "${REGISTRY}/ipu-operator-${image}:${VERSION}"
> docker push "${REGISTRY}/ipu-operator-${image}:${VERSION}"
> done
Run the following command:
$ RELEASE_NAME='ipu-operator'
$ VERSION='1.1.0'
$ VIPU_CONTROLLERS='172.20.224.3:8090'
$ REGISTRY='localhost:5000'
$ helm install "${RELEASE_NAME}" "ipu-operator-helm-chart-${VERSION}.tgz" \
--set global.vipuControllers="${VIPU_CONTROLLERS}:vipu-ctrl=vipu-controller" \
--set-string global.launcherImage="${REGISTRY}/ipu-operator-launcher" \
--set controller.image.repository="${REGISTRY}/ipu-operator-controller" \
--set vipuProxy.image.repository="${REGISTRY}/ipu-operator-vipu-proxy" \
[CUSTOM_PARAMETERS]
where:
RELEASE_NAME is the name you choose for this Helm installation, for example ipu-operator;
ipu-operator-helm-chart-${VERSION}.tgz is the tarball containing the IPU Operator Helm chart;
[CUSTOM_PARAMETERS] is where you can further customise the installation (see Section 2.5, Basic installation).
If the version tags within the local repository are different to the originals from the loaded images — which is not recommended — then also add the following parameters:
--set-string global.launcherImageTag="${VERSION}" \
--set-string controller.image.tag="${VERSION}" \
--set-string vipuProxy.image.tag="${VERSION}" \
2.5. Basic installation
Regardless of the installation method used, the [CUSTOM_PARAMETERS] are the same. The following command shows a basic installation of the IPU Operator into the Kubernetes cluster with the default configuration. The only required parameter is global.vipuControllers: it indicates the location of the V-IPU Controller that provides access to IPUs.
$ cd ipu-operator
$ helm install ipu-operator ipu-operator-helm-chart-1.1.0.tgz \
--set global.vipuControllers="vipu-controller:8090:vipu-ctrl=vipu-controller"
You can either use multiple --set key=value arguments, or put your customisation in a YAML file and use the --values your-values.yaml argument. See Section 3, Configurations below for more information.
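For instance, the single --set flag used above could instead live in a values file. This is a minimal sketch; the file name your-values.yaml and the controller address repeat the earlier example:

```shell
# Keep the customisation in a YAML file instead of repeating --set flags.
# The controller address repeats the basic installation example above.
cat > your-values.yaml <<'EOF'
global:
  vipuControllers: "vipu-controller:8090:vipu-ctrl=vipu-controller"
EOF
# Then install with:
#   helm install ipu-operator ipu-operator-helm-chart-1.1.0.tgz --values your-values.yaml
```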
The global.vipuControllers parameter provides a comma-separated list of V-IPU Controller definitions. Each definition is built from three elements separated by a colon (:):
IP address or DNS name of the V-IPU Controller
V-IPU Controller listening port (most often 8090)
node selector in the form key=value that selects nodes that run on Poplar servers associated with this V-IPU Controller
In the example above, vipu-controller is the DNS name of the V-IPU Controller and 8090 is its listening port. These can be found using this vipu command:
$ vipu --server-version
version: 1.18.0
host: 172.20.224.3:8090
This shows that the address of the V-IPU Controller is 172.20.224.3 and the listening port is 8090.
The last element, vipu-ctrl=vipu-controller, is a label set on nodes running on Poplar servers associated with this V-IPU Controller. These settings are stored in the ipu-operator-vipu-controllers ConfigMap. More details about defining V-IPU Controllers can be found in Section 2.6, Multiple V-IPU Controllers below.
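As a quick illustration, the three elements of a definition can be recovered by splitting on the colons. This is a bash sketch using the example values from the basic installation command above:

```shell
# Split one V-IPU Controller definition (host:port:label) into its elements.
# The value repeats the example from the helm install command above.
def="vipu-controller:8090:vipu-ctrl=vipu-controller"
IFS=':' read -r host port label <<< "$def"
echo "host=$host"    # vipu-controller
echo "port=$port"    # 8090
echo "label=$label"  # vipu-ctrl=vipu-controller
```

Note that the label element itself contains an equals sign, not a colon, so it survives the split intact.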
The helm install command installs and creates the following resources in the default namespace where the Helm release was installed:
ipu-operator-controller-manager and ipu-operator-vipu-proxy as deployments (they can be listed with the kubectl get deployments command).
ipu-operator-vipu-proxy, ipu-operator-webhook-svc and ipu-operator-controller-manager-metrics as services of type ClusterIP (they can be listed with the kubectl get svc command).
RBAC: ServiceAccount and ClusterRole to manage Pods and ConfigMaps.
ipu-partitions-tracker ConfigMap for tracking created IPU partitions.
Configuration objects for the mutation and validation webhooks.
You can read more about installing Helm charts in the Helm documentation.
You can see all the customisation options in the README.md for the Helm charts included in the IPU Operator package.
2.6. Multiple V-IPU Controllers
The IPU Operator can communicate with multiple V-IPU Controllers.
You can specify multiple V-IPU Controllers during installation by setting the global.vipuControllers option on the helm install command line. For example:
--set global.vipuControllers="vctl1:8090:vipu-ctrl=vctl1\,vctl2:8090:vipu-ctrl=vctl2"
where:
vctl1 and vctl2 are the DNS names of two V-IPU Controllers;
8090 is the listening port (the same in both cases);
the vipu-ctrl=vctl1 label allows you to select nodes associated with V-IPU Controller 1, while vipu-ctrl=vctl2 selects the nodes of V-IPU Controller 2.
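Note the backslash before the comma: helm install --set treats a bare comma as a separator between key=value pairs, so it must be escaped for the whole list to reach the chart as one value. Once Helm has unescaped it, the operator sees a plain comma-separated list, which this bash sketch splits using the example names from above:

```shell
# The value as the chart receives it, after Helm unescapes the \, sequences.
controllers="vctl1:8090:vipu-ctrl=vctl1,vctl2:8090:vipu-ctrl=vctl2"
# Split the list into individual controller definitions.
IFS=',' read -ra defs <<< "$controllers"
echo "${#defs[@]} controller definitions"   # 2 controller definitions
echo "${defs[0]}"                           # vctl1:8090:vipu-ctrl=vctl1
```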
Alternatively, after installation you can edit the ConfigMap, as shown below, and update the value.
$ kubectl edit configmap ipu-operator-vipu-controllers
Each V-IPU Controller is specified with a colon-separated list of three values:
V-IPU Controller host address
V-IPU Controller port
A label defined by key=value
The same label must be added to the node where the containers corresponding to that V-IPU Controller will run. Labeling the node is done with the following command:
$ kubectl label nodes <someworkernode> <key>=<value>
The ConfigMap can be modified at any time and the IPU Operator automatically adds the new V-IPU Controller to its internal list. It can take up to 60 seconds for the new V-IPU Controller to be added. When a partition is created, the IPU Operator goes through the list sequentially until it finds space for the requested number of IPUs.
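For example, registering a third controller (the name vctl3 and its label are hypothetical) means appending one more colon-separated definition to the existing comma-separated value:

```shell
# Append a hypothetical third V-IPU Controller definition to the stored value.
current="vctl1:8090:vipu-ctrl=vctl1,vctl2:8090:vipu-ctrl=vctl2"
updated="${current},vctl3:8090:vipu-ctrl=vctl3"
echo "$updated"
```

Remember to also label the nodes associated with the new controller (kubectl label nodes <someworkernode> vipu-ctrl=vctl3) so that its node selector matches.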
2.7. Verify the installation is successful
When the installation is complete, you can verify that it worked correctly by running the following commands and seeing similar output:
$ kubectl get crd
NAME                   CREATED AT
ipujobs.graphcore.ai   2022-10-19T12:20:04Z
...
$ helm ls -n <the-namespace-where-you-deployed-the-operator>
NAME           NAMESPACE   REVISION   UPDATED                               STATUS     CHART                APP VERSION
ipu-operator   default     1          2022-10-19 17:49:52.98211 +0000 UTC   deployed   ipu-operator-0.0.1   1.1.0
$ kubectl get pods -n <the-namespace-where-you-deployed-the-operator>
NAME                                               READY   STATUS    RESTARTS   AGE
ipu-operator-controller-manager-54766f7f7b-x5wtr   2/2     Running   0          5d23h
ipu-operator-vipu-proxy-844c7d6b7f-88bqr           1/1     Running   1          5d23h
2.8. Uninstall the IPU Operator
To uninstall the IPU Operator:
$ helm uninstall [RELEASE_NAME]
This removes all the Kubernetes components associated with the chart and deletes the release.
See helm uninstall for command documentation.
Note
The partition tracker ConfigMap ipu-partitions-tracker does not get deleted when you uninstall the Helm release. This is so that when the V-IPU Proxy is deployed again, it can pick up from where it left off (in terms of managing the created partitions). If you wish to remove that ConfigMap, you can run:
$ kubectl delete configmap ipu-partitions-tracker -n <namespace>
2.9. Upgrade the IPU Operator
To upgrade the IPU Operator:
$ helm upgrade [RELEASE_NAME] [CHART]
See helm upgrade for command documentation.