6. Integration with Kubernetes

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.

The Kubernetes plugin for V-IPU will provide an easy way to specify the number of IPUs required for your Kubernetes workload using annotations. This chapter outlines the plugin design, installation steps and usage.

A preview of the Kubernetes plugin is available by request. Please contact Graphcore support if you are interested.

Note

Kubernetes uses the word “Pod” to refer to the smallest deployable units of computing that you can create and manage in Kubernetes. This is not to be confused with the Graphcore IPU-POD, which is a rack-based system of IPUs.

6.1. Components and design

This plugin contains two components:

  • The ipu-controller that communicates with the V-IPU controller (vipu-server)

  • A Kubernetes admission controller, gc-webhook

The ipu-controller is responsible for:

  • Managing the IPU resources by communicating with the V-IPU controller

  • Watching and releasing IPU resources on Pod termination

  • Running the REST API server to serve requests from init-containers for partition creation

  • Embedding the V-IPU command line binary to communicate with the V-IPU controller

gc-webhook is a Kubernetes dynamic admission controller application. It mutates the Pod spec by:

  • Mounting the ConfigMap containing the IPUoF configurations as a volume to both the init-container and the Poplar container

  • Injecting the init-container, which creates partitions via curl commands and writes the resulting IPUoF config to the ConfigMap volume mount

  • Adding a watcher label graphcore.ai/pod for the ipu-controller to manage the lifecycle of the Pod

  • Setting the hostNetwork and securityContext/privileged to true

  • Setting the Pod dnsPolicy to ClusterFirstWithHostNet

  • Adding the annotation graphcore.ai/status: injected marking the Pod as mutated by the gc-webhook
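The mutations listed above can be pictured with a sketch of what a Pod might look like after passing through gc-webhook. This is an illustration only, not the actual output of the webhook: the init-container name and image, the volume name, the mount path, the ConfigMap name and the value of the graphcore.ai/pod label are all assumptions, and the real init-container command (a curl request to the ipu-controller REST API) is replaced by a placeholder because its endpoint is not described in this chapter.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sharding-app
  annotations:
    graphcore.ai/ipus: "2"              # set by the user
    graphcore.ai/status: injected       # added by gc-webhook
  labels:
    graphcore.ai/pod: "sharding-app"    # watcher label; the value is an assumption
spec:
  hostNetwork: true                     # set by gc-webhook
  dnsPolicy: ClusterFirstWithHostNet    # set by gc-webhook
  initContainers:
    - name: ipu-partition-init          # hypothetical name
      image: registry.example.com/ipu-init:latest   # hypothetical image
      # Placeholder: the real init-container requests a partition from the
      # ipu-controller via curl and stores the resulting IPUoF config.
      command: ["sh", "-c", "echo 'partition request placeholder'"]
      volumeMounts:
        - name: ipuof-config            # hypothetical volume name
          mountPath: /etc/ipuof         # hypothetical path
  containers:
    - name: demo
      image: localhost:5000/sharding:latest
      securityContext:
        privileged: true                # privileged is a container-level field
      volumeMounts:
        - name: ipuof-config
          mountPath: /etc/ipuof
  volumes:
    - name: ipuof-config
      configMap:
        name: sharding-app-ipuof        # hypothetical ConfigMap name
```

Because the mutated Pod runs privileged and on the host network, it may be worth reviewing the cluster's security policies before enabling the webhook in shared namespaces.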

The following components are installed by the Kubernetes plugin:

  • ipu-controller and gc-webhook, each run as a Deployment

  • The ipu-cluster Service, of type ClusterIP

  • A V-IPU controller Service of type ExternalName, used by the ipu-controller to communicate with the V-IPU controller

  • RBAC resources: a ServiceAccount and a ClusterRole to manage Pods and ConfigMaps
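For readers unfamiliar with ExternalName Services, the following is a minimal sketch of what such a Service could look like. The Service name is hypothetical and the manifest generated by the chart may differ; the externalName value is assumed to correspond to the vipuServerAddr chart parameter.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: vipu-server            # hypothetical name
spec:
  type: ExternalName
  # DNS name of the V-IPU controller host; assumed to come from vipuServerAddr
  externalName: example.com
```

An ExternalName Service only creates a DNS alias inside the cluster, so the V-IPU controller itself can run outside Kubernetes.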

6.2. Package contents

The software is delivered in two tarballs: one containing the Helm Chart and one containing the Docker images.

The Helm Chart contains a collection of files that will deploy the ipu-controller and gc-webhook to the cluster.

The Docker image archive contains the gc-webhook and ipu-controller Docker images as tar files. Extract them and push them to a registry that is accessible from the Kubernetes cluster. The ipu-controller Docker image embeds the V-IPU command line tools.

6.3. Deploying the software

6.3.1. Prerequisites

  • The Helm tool is already installed

  • Kubernetes cluster 1.14+

  • V-IPU controller installed and running

6.3.2. Install

$ cd ipu-controller
$ helm install [RELEASE_NAME] . --set vipuServerAddr=[host] --set vipuServerPort=[port] \
--set image.repository=[controller-image-repo] --set image.tag=[controller-image-tag] \
--set admissionWebhooks.image.repository=[webhook-image-repo] \
--set admissionWebhooks.image.tag=[webhook-image-tag]

The command above deploys the software to the Kubernetes cluster in the default configuration. See Section 6.3.5, Configurations below.
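As an alternative to repeating --set on the command line, the same parameters could be collected in a values file. The sketch below assumes a file named my-values.yaml; the host name, registry paths and tags are placeholders, not values shipped with the plugin.

```yaml
# my-values.yaml (hypothetical file name)
vipuServerAddr: vipu.example.com        # placeholder V-IPU controller host
vipuServerPort: 8191
image:
  repository: registry.example.com/ipu-controller   # placeholder
  tag: "1.0.0"                                       # placeholder
admissionWebhooks:
  image:
    repository: registry.example.com/gc-webhook     # placeholder
    tag: "1.0.0"                                     # placeholder
```

The file would then be passed to Helm with the standard -f option, for example: helm install [RELEASE_NAME] . -f my-values.yaml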

6.3.3. Uninstall

$ helm uninstall [RELEASE_NAME]

This removes all the Kubernetes components associated with the chart and deletes the release.

See helm uninstall for command documentation.

6.3.4. Upgrading the chart

$ helm upgrade [RELEASE_NAME] [CHART] --install

See helm upgrade for command documentation.

6.3.5. Configurations

The following table lists the configurable parameters of the chart and their default values.

| Parameter | Description | Default |
|---|---|---|
| image.pullPolicy | ipu-controller image pull policy | IfNotPresent |
| imagePullSecrets | A list of image pull secret names | [] |
| nameOverride | | "" |
| fullNameOverride | | "" |
| serviceAccount.create | Whether to create a service account for the ipu-controller | true |
| serviceAccount.annotations | ipu-controller service account annotations | {} |
| serviceAccount.name | The name of the service account to use. If not set and create is true, a name is generated using the "fullname" template | "" |
| rbac.create | Whether to create the RBAC ClusterRole and ClusterRoleBinding and attach them to the service account | true |
| vipuServerAddr | V-IPU controller address | example.com |
| vipuServerPort | V-IPU controller port | 8191 |
| controllerPort | ipu-controller port | 8080 |
| podAnnotations | ipu-controller Pod annotations | {} |
| podSecurityPolicyContext | ipu-controller Pod security policy context | {} |
| securityContext | Security context | {} |
| service.type | ipu-controller Kubernetes service type | clusterIP |
| service.port | ipu-controller Kubernetes service port | 80 |
| resources | ipu-controller compute resources | {} |
| autoscaling.enabled | Whether the ipu-controller replicas are autoscaled | false |
| admissionWebhooks.scope | Comma-separated list of namespaces where the webhook should perform mutations. If empty or unset, mutation is performed in all namespaces. | |
| admissionWebhooks.image.repository | Admission webhook image repository | localhost:5000/gc-webhook |
| admissionWebhooks.image.tag | Admission webhook image tag | v18 |
| admissionWebhooks.image.pullPolicy | Admission webhook image pull policy | IfNotPresent |
| admissionWebhooks.failurePolicy | Admission webhook failure policy | Fail |
| admissionWebhooks.port | Admission webhook Pod port | 8443 |
| admissionWebhooks.service.annotations | Admission webhook service annotations | {} |
| admissionWebhooks.service.servicePort | Admission webhook service port | 443 |
| admissionWebhooks.service.type | Admission webhook service type | clusterIP |
| admissionWebhooks.patch.enabled | Create and configure the admission webhook TLS secret | true |
| admissionWebhooks.patch.image.repository | Admission webhook TLS patch image repository | docker.io/jettech/kube-webhook-certgen |
| admissionWebhooks.patch.image.tag | Admission webhook TLS patch image tag | v1.3.0 |
| admissionWebhooks.patch.image.pullPolicy | Admission webhook TLS patch image pull policy | IfNotPresent |
| admissionWebhooks.patch.priorityClassName | Admission webhook TLS patch jobs priority class | "" |
| admissionWebhooks.patch.podAnnotations | Admission webhook TLS patch jobs Pod annotations | {} |
| admissionWebhooks.patch.runAsUser | Admission webhook TLS patch jobs run-as user | 2000 |
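To illustrate how these parameters nest in a values file, the fragment below sketches an override that limits the webhook to two namespaces and relaxes its failure policy. The namespace names are hypothetical; Ignore is the standard Kubernetes alternative to the default Fail policy, and whether it suits a given cluster is a judgment call.

```yaml
admissionWebhooks:
  # Comma-separated list; the namespace names are hypothetical
  scope: "ml-team-a,ml-team-b"
  # With Ignore, Pods are admitted unmodified if the webhook is unreachable
  failurePolicy: Ignore
```

With Ignore, a Pod that requests IPUs while the webhook is unavailable would be created without the injected init-container, so the default Fail policy is probably the safer choice for most deployments.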

6.4. Usage

To schedule a workload on IPUs, request the number of IPUs it requires with the graphcore.ai/ipus annotation. For example:

apiVersion: v1
kind: Pod
metadata:
  name: sharding-app
  annotations:
    graphcore.ai/ipus: "2"
spec:
  containers:
    - name: demo
      image: localhost:5000/sharding:latest

Only Pod specifications are supported.