1. Overview
A model inference service is usually deployed to a Kubernetes cluster to provide a scalable and highly available service. This document describes how Kubernetes manages IPU resources with Graphcore’s Kubernetes IPU device plugin. The device plugin is deployed as a Kubernetes DaemonSet that:
- Exposes the number of IPUs on each node in the cluster
- Allocates one or more IPUs to a Pod (see the sketch after this list)
- Monitors the health of the IPUs
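As an illustration of the allocation step, the minimal Pod manifest below requests an IPU as an extended resource once the device plugin is running. The resource name `gc.graphcore.ai/ipu`, the Pod name, and the container image are assumptions used for the sketch, not values defined in this document; use the resource name that the device plugin actually registers in your cluster and your own inference image.

```yaml
# Minimal sketch of a Pod requesting IPUs from the device plugin.
# Assumptions: the resource name "gc.graphcore.ai/ipu" and the container
# image are placeholders; substitute the values used in your cluster.
apiVersion: v1
kind: Pod
metadata:
  name: ipu-inference-example
spec:
  containers:
    - name: inference-server
      image: example.com/ipu-inference-server:latest  # placeholder image
      resources:
        limits:
          gc.graphcore.ai/ipu: 1  # request one IPU for this container
```

With a request like this, Kubernetes schedules the Pod onto a node that advertises enough free IPUs, and the device plugin allocates the requested devices to the Pod's containers.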
For more information about compiling models to run on an IPU, refer to the IPU Inference Toolkit User Guide.