1. Overview
The IPU Inference Toolkit enables you to deploy trained models to Graphcore IPU products quickly and conveniently, providing a low-latency, high-performance, end-to-end inference solution. The Toolkit covers converting, compiling and deploying your model.
This user guide covers the following topics:
- IPU Inference Toolkit architecture: introduces the Graphcore software stack, as well as the compilation and runtime architecture of the model.
- Environment setup: introduces the hardware and operating system required for the host server, describes how to start an IPU environment with a Docker container, and explains how to verify that the IPU hardware can be accessed.
- Model conversion: describes how to convert PyTorch, ONNX and TensorFlow models exported from different frameworks into PopEF models for running on the IPU (a short export sketch follows this list).
- Model runtime and deployment: describes how to run PopEF models using the PopRT Runtime API and how to deploy PopEF models using the Triton Inference Server and TensorFlow Serving (an indicative runtime sketch also follows this list).
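As a preview of the model conversion flow, the sketch below shows the typical first step for a PyTorch model: exporting it to ONNX so that it can then be converted and compiled into a PopEF model. The model choice, file name and input shape here are placeholders for illustration only; the model conversion chapter describes the actual conversion commands.

```python
# Minimal sketch: export a placeholder PyTorch model to ONNX.
# The resulting .onnx file is the input to the PopEF conversion step
# described in the model conversion chapter.
import torch
import torchvision

# Any trained model works here; a torchvision ResNet-50 is used purely as an example.
model = torchvision.models.resnet50(weights=None).eval()

# Example input that fixes the batch size and input shape for inference.
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    "resnet50.onnx",          # placeholder output path
    input_names=["input"],
    output_names=["output"],
    opset_version=11,
)
```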
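The model runtime and deployment chapter then shows how a compiled PopEF model is executed on the IPU with the PopRT Runtime API. The sketch below is indicative only: the class and method names (`ModelRunner`, `execute`), the tensor names and the shapes are assumptions based on a typical PopRT Runtime workflow, so check the PopRT Runtime documentation for the exact API.

```python
# Indicative sketch only: the poprt.runtime class and method names below
# are assumptions; consult the PopRT Runtime documentation for the exact API.
import numpy as np
from poprt import runtime

# Load a previously compiled PopEF executable (placeholder path).
runner = runtime.ModelRunner("executable.popef")

# Input/output tensor names, shapes and dtypes must match the compiled model.
inputs = {"input": np.random.rand(1, 3, 224, 224).astype(np.float16)}
outputs = {"output": np.zeros((1, 1000), dtype=np.float16)}

# Run one inference; results are written into the output buffers.
runner.execute(inputs, outputs)
print(outputs["output"].argmax())
```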
For more information about deploying models in Kubernetes, refer to the Kubernetes IPU Device Plugin User Guide.
Note
Please contact Graphcore Sales for details on how to access the container images used in this guide.