1. Overview

The IPU Inference Toolkit enables you to deploy trained models to Graphcore IPU products conveniently and quickly, giving you a low-latency, high-performance end-to-end inference solution. The Toolkit allows you to convert, compile and deploy your model.

This user guide will cover the following:

  • IPU Inference Toolkit architecture

    Introduces the Graphcore software stack, as well as the model compilation and runtime architecture of the Toolkit.

  • Environment preparation

    Describes the hardware and operating system requirements for the host server, how to start an IPU environment with a Docker container, and how to verify that the IPU hardware can be accessed.

  • Model compilation

    Describes how to convert models exported from different frameworks (PyTorch, ONNX and TensorFlow) into PopEF models that can run on the IPU; a brief sketch of exporting a PyTorch model to ONNX follows this list.

  • Model runtime

    Describes how to run PopEF models with the PopRT Runtime API, and how to deploy them using the Triton Inference Server and TensorFlow Serving.
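
As a preview of the model compilation chapter, the sketch below shows a typical first step for a PyTorch model: exporting it to ONNX with torch.onnx.export, so that it can then be converted and compiled into a PopEF model. The model, file names and tensor names are illustrative placeholders only, not part of the Toolkit.

    # Minimal sketch: export a PyTorch model to ONNX as the first step towards PopEF.
    # The model (an untrained torchvision ResNet-50) and the file/tensor names are
    # illustrative examples; substitute your own trained model and names.
    import torch
    import torchvision

    model = torchvision.models.resnet50()
    model.eval()

    # A fixed-shape example input; IPU inference is typically compiled for static
    # shapes, so choose the batch size you intend to serve.
    dummy_input = torch.randn(1, 3, 224, 224)

    torch.onnx.export(
        model,
        dummy_input,
        "resnet50.onnx",
        input_names=["input"],
        output_names=["output"],
        opset_version=11,
    )

The resulting ONNX file is then used as the input to the conversion and compilation flow described in the model compilation chapter, which produces a PopEF model for the IPU.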

For more information about deploying models in Kubernetes, refer to the Kubernetes IPU Device Plugin User Guide.

Note

Please contact Graphcore Sales for details on how to access the container images used in this guide.