1. Overview

A model inference service is typically deployed to a Kubernetes cluster so that it is scalable and highly available. This document describes how Kubernetes manages IPU resources using Graphcore’s Kubernetes IPU device plugin. The plugin runs as a Kubernetes DaemonSet and:

  • Exposes the number of IPUs on each node in the cluster

  • Allocates one or more IPUs for a Pod

  • Inspects the health of the IPUs
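
Kubernetes device plugins generally advertise hardware as an extended resource, which a Pod then requests as a whole number under its container resource limits. The sketch below builds such a Pod manifest and prints it as JSON, which kubectl accepts. It is an illustration only: the resource name graphcore.ai/ipu, the image name, and the helper build_ipu_pod_manifest are assumptions made for this example, not values taken from this document, so check them against your installation.

```python
import json


def build_ipu_pod_manifest(name, image, ipu_count,
                           resource_name="graphcore.ai/ipu"):
    """Return a Pod manifest (as a dict) that requests `ipu_count` IPUs.

    `resource_name` is an ASSUMED extended-resource name; use whatever
    name the device plugin advertises on your nodes.
    """
    # Extended resources must be requested in whole units.
    if not isinstance(ipu_count, int) or ipu_count < 1:
        raise ValueError("ipu_count must be a positive integer")

    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,  # hypothetical image name
                    # Extended resources go under `limits`; Kubernetes
                    # treats the request as equal to the limit.
                    "resources": {"limits": {resource_name: ipu_count}},
                }
            ],
            "restartPolicy": "Never",
        },
    }


if __name__ == "__main__":
    manifest = build_ipu_pod_manifest(
        "ipu-inference-demo", "example.com/inference:latest", 2
    )
    print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and applying it with kubectl apply -f would ask the scheduler to place the Pod on a node that still has the requested number of IPUs available.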

For more information about compiling models to run on an IPU, refer to the IPU Inference Toolkit User Guide.