IPU Inference Toolkit User Guide
Version: latest
1. Overview
2. IPU Inference Toolkit architecture
2.1. Model servers
2.2. Graphcore Poplar software stack
2.2.1. PopART
2.2.2. PopEF and PopRT Runtime
2.3. Using the IPU Inference Toolkit
2.3.1. Model compilation overview
Model export
Batch size selection
Precision selection
Model conversion
Model compilation
2.3.2. Model runtime overview
3. Environment preparation
3.1. Host server CPU architecture
3.2. Host server operating system
3.3. Docker
3.4. Poplar SDK
3.4.1. Installing the Poplar SDK on the host server
3.5. Inspection of IPU hardware
3.6. Install PopRT
3.6.1. Installation with a Docker container
3.6.2. Installation with pip
3.7. Docker containers for the Poplar SDK
3.7.1. gc-docker
3.7.2. Run a Docker container
3.7.3. Query the IPU status from a Docker container
4. Model compilation
4.1. ONNX model
4.1.1. Model exporting
4.1.2. Batch size selection
4.1.3. Precision selection
4.1.4. Model conversion and compilation
4.2. TensorFlow model
4.2.1. Model exporting
4.2.2. Model conversion and compilation
4.3. PyTorch model
4.3.1. Model exporting
4.3.2. Model conversion and compilation
5. Model runtime
5.1. Run with PopRT Runtime
5.1.1. Environment preparation
5.1.2. Run with PopRT Runtime Python API
5.1.3. Run with PopRT Runtime C++ API
5.2. Deploy to Triton Inference Server
5.2.1. Environment preparation
5.2.2. Configuration of generated model
Model name
Backend
Batching
Input and output
5.2.3. Start model service
Verify the service with gRPC
Verify the service with HTTP
5.3. Deploy to TensorFlow Serving
5.3.1. Environment preparation
5.3.2. Generate SavedModel model
5.3.3. Start model service
Running with and without batching
5.3.4. Verify the service with HTTP
6. Container release notes
6.1. Triton Inference Server
6.1.1. New features
6.1.2. Bug fixes
6.1.3. Other improvements
6.1.4. Known issues
6.1.5. Compatibility changes
6.2. TensorFlow Serving
6.2.1. New features
6.2.2. Bug fixes
6.2.3. Other improvements
6.2.4. Known issues
6.2.5. Compatibility changes
7. Trademarks & copyright