6. Integration with Kubernetes
Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.
The Kubernetes plugin for V-IPU provides an easy way to specify the number of IPUs required for your Kubernetes workload using annotations. This chapter outlines the plugin design, installation steps and usage.
A preview of the Kubernetes plugin is available by request. Please contact Graphcore support if you are interested.
Note
Kubernetes uses the word “Pod” to refer to the smallest deployable units of computing that you can create and manage in Kubernetes. This is not to be confused with the Graphcore IPU-POD, which is a rack-based system of IPUs.
6.1. Components and design
The plugin contains two components:

- The ipu-controller, which communicates with the V-IPU controller (vipu-server)
- A Kubernetes admission controller, gc-webhook
The ipu-controller
is responsible for:
Managing the IPU resources by communicating with the V-IPU controller
Watching and releasing IPU resources on Pod termination
Running the REST API server to serve requests from
init-containers
for partition creationEmbedding the V-IPU command line binary to communicate with V-IPU controller
gc-webhook is a Kubernetes dynamic admission controller application. It mutates the Pod spec by:

- Mounting the ConfigMap containing the IPUoF configurations as a volume in both the init-container and the Poplar container
- Injecting the init-container that creates partitions via curl commands and writes the IPUoF config result to the ConfigMap volume mount
- Adding a watcher label graphcore.ai/pod for the ipu-controller to manage the lifecycle of the Pod
- Setting hostNetwork and securityContext/privileged to true
- Setting the Pod dnsPolicy to ClusterFirstWithHostNet
- Adding the annotation graphcore.ai/status: injected, marking the Pod as mutated by the gc-webhook
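Put together, a Pod that requested IPUs might look roughly like the following after mutation. This is an illustrative sketch based only on the list above; the names of the injected init-container, its image, the volume and the ConfigMap are assumptions, not the plugin's actual output:

apiVersion: v1
kind: Pod
metadata:
  name: sharding-app
  labels:
    graphcore.ai/pod: "true"              # watcher label added for the ipu-controller (value assumed)
  annotations:
    graphcore.ai/ipus: "2"
    graphcore.ai/status: injected         # marks the Pod as mutated by gc-webhook
spec:
  hostNetwork: true                       # set to true by the webhook
  dnsPolicy: ClusterFirstWithHostNet      # set by the webhook
  initContainers:
  - name: create-partition                # hypothetical name for the injected init-container
    image: curlimages/curl:latest         # hypothetical image; issues the partition-creation curl commands
    volumeMounts:
    - name: ipuof-config                  # hypothetical volume holding the IPUoF ConfigMap
      mountPath: /etc/ipuof
  containers:
  - name: demo
    image: localhost:5000/sharding:latest
    securityContext:
      privileged: true                    # set to true by the webhook
    volumeMounts:
    - name: ipuof-config
      mountPath: /etc/ipuof
  volumes:
  - name: ipuof-config
    configMap:
      name: ipuof-config                  # hypothetical ConfigMap name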
The following components are installed by the Kubernetes plugin:

- ipu-controller and gc-webhook, run as Deployments
- An ipu-cluster service of type ClusterIP
- A V-IPU controller service of type ExternalName, used by the ipu-controller to communicate with the V-IPU controller
- RBAC objects: a ServiceAccount and a ClusterRole to manage Pods and ConfigMaps
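Once the chart is installed, you can confirm that these objects exist with kubectl. This is only a sanity check; the exact resource names depend on your Helm release name:

$ kubectl get deployments,services
$ kubectl get serviceaccounts
$ kubectl get clusterroles | grep -i ipu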
6.2. Package contents
The software is delivered in two tarballs: one containing the Helm Chart and one containing the Docker containers.
The Helm Chart contains a collection of files that deploy the ipu-controller and gc-webhook to the cluster.
The Docker container zip contains the gc-webhook and ipu-controller Docker images as tar files. Unzip and push them to a registry that is accessible from the Kubernetes cluster. The ipu-controller Docker image embeds the V-IPU command line tools.
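For example, assuming the extracted tar files are named ipu-controller.tar and gc-webhook.tar and your registry is registry.example.com (all placeholder names), the images can be loaded and pushed like this:

$ unzip [DOCKER_IMAGES_ZIP]
$ docker load -i ipu-controller.tar
$ docker tag ipu-controller:latest registry.example.com/ipu-controller:latest
$ docker push registry.example.com/ipu-controller:latest
$ docker load -i gc-webhook.tar
$ docker tag gc-webhook:latest registry.example.com/gc-webhook:latest
$ docker push registry.example.com/gc-webhook:latest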
6.3. Deploying the software
6.3.1. Prerequisites
- The Helm tool is already installed
- Kubernetes cluster 1.14+
- V-IPU controller installed and running
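You can quickly check the first two prerequisites from a machine with access to the cluster:

$ helm version
$ kubectl version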
6.3.2. Install
$ cd ipu-controller
$ helm install [RELEASE_NAME] . --set vipuServerAddr=[host] --set vipuServerPort=[port] \
--set image.repository=[controller-image-repo] --set image.tag=[controller-image-tag] \
--set admissionWebhooks.image.repository=[webhook-image-repo] \
--set admissionWebhooks.image.tag=[webhook-image-tag]
The command above deploys the software to the Kubernetes cluster in the default configuration. See Section 6.3.5, Configurations below.
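The same values can also be kept in a file and passed with --values instead of repeating --set flags. For example (all values below are placeholders for your environment):

$ cat > values.yaml <<EOF
vipuServerAddr: vipu.example.com
vipuServerPort: 8191
image:
  repository: registry.example.com/ipu-controller
  tag: v18
admissionWebhooks:
  image:
    repository: registry.example.com/gc-webhook
    tag: v18
EOF
$ helm install [RELEASE_NAME] . --values values.yaml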
6.3.3. Uninstall
$ helm uninstall [RELEASE_NAME]
This removes all the Kubernetes components associated with the chart and deletes the release.
See helm uninstall for command documentation.
6.3.4. Upgrading the chart
$ helm upgrade [RELEASE_NAME] [CHART] --install
See helm upgrade for command documentation.
6.3.5. Configurations
The following table lists the configurable parameters of the chart and their default values.
| Parameter | Description | Default |
|---|---|---|
| image.pullPolicy | IPU controller image pullPolicy | IfNotPresent |
| imagePullSecrets | A list of image pull secret names | [] |
| nameOverride | Override the chart name | "" |
| fullNameOverride | Override the generated full resource name | "" |
| serviceAccount.create | Whether to create a service account for the IPU controller | true |
| serviceAccount.annotations | IPU controller service account annotations | {} |
| serviceAccount.name | The name of the service account to use. If not set and create is true, a name is generated using the "fullname" template | "" |
| rbac.create | Whether to create RBAC ClusterRole and ClusterRoleBinding and attach them to the service account | true |
| vipuServerAddr | V-IPU controller host address | example.com |
| vipuServerPort | V-IPU controller port | 8191 |
| controllerPort | IPU controller port | 8080 |
| podAnnotations | V-IPU controller Pod annotations | {} |
| podSecurityPolicyContext | | {} |
| securityContext | Security context | {} |
| service.type | IPU controller service type | clusterIP |
| service.port | IPU controller service port | 80 |
| resources | V-IPU controller compute resources | {} |
| autoscaling.enabled | Whether autoscaling is enabled for the ipu-controller | false |
| admissionWebhooks.scope | Comma-separated list of namespaces where the webhook should perform mutations. Leaving this empty/unset means mutation is performed on all namespaces. | |
| admissionWebhooks.image.repository | Admission webhook image repository | |
| admissionWebhooks.image.tag | Admission webhook image tag | v18 |
| admissionWebhooks.image.pullPolicy | Admission webhook image pullPolicy | IfNotPresent |
| admissionWebhooks.failurePolicy | Admission webhook failure policy | Fail |
| admissionWebhooks.port | Admission webhook Pod port | 8443 |
| admissionWebhooks.service.annotations | Admission webhook service annotations | {} |
| admissionWebhooks.service.servicePort | Admission webhook service port | 443 |
| admissionWebhooks.service.type | Admission webhook service type | clusterIP |
| admissionWebhooks.patch.enabled | Create and configure the admission webhook TLS secret | true |
| admissionWebhooks.patch.image.repository | Admission webhook TLS patch image repository | |
| admissionWebhooks.patch.image.tag | Admission webhook TLS patch image tag | v1.3.0 |
| admissionWebhooks.patch.image.pullPolicy | Admission webhook TLS patch image pull policy | |
| admissionWebhooks.patch.priorityClassName | Admission webhook TLS patch jobs priority class | "" |
| admissionWebhooks.patch.podAnnotations | Admission webhook TLS patch jobs Pod annotations | {} |
| admissionWebhooks.patch.runAsUser | Admission webhook TLS patch jobs run as user | 2000 |
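Any of these parameters can be overridden at install or upgrade time. For example, to restrict the webhook mutations to specific namespaces (the namespace names here are placeholders):

$ helm upgrade [RELEASE_NAME] [CHART] --install --set admissionWebhooks.scope="team-a,team-b"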
6.4. Usage
To schedule a workload on IPUs, request the number of IPUs required via the graphcore.ai/ipus annotation. For example:
apiVersion: v1
kind: Pod
metadata:
  name: sharding-app
  annotations:
    graphcore.ai/ipus: "2"
spec:
  containers:
  - name: demo
    image: localhost:5000/sharding:latest
Only Pod specifications are supported.
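Saving the manifest above to a file (the file name is arbitrary), you can apply it and confirm that the webhook has mutated the Pod by looking for the graphcore.ai/status: injected annotation described in Section 6.1:

$ kubectl apply -f sharding-app.yaml
$ kubectl get pod sharding-app -o jsonpath='{.metadata.annotations}'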