IPU Inference Toolkit User Guide
Version: latest
  • 1. Overview
  • 2. IPU Inference Toolkit architecture
    • 2.1. Model servers
    • 2.2. Graphcore Poplar software stack
      • 2.2.1. PopART
      • 2.2.2. PopEF and PopRT Runtime
    • 2.3. Using the IPU Inference Toolkit
      • 2.3.1. Model compilation overview
        • Model export
        • Batch size selection
        • Precision selection
        • Model conversion
        • Model compilation
      • 2.3.2. Model runtime overview
  • 3. Environment preparation
    • 3.1. Host server CPU architecture
    • 3.2. Host server operating system
    • 3.3. Docker
    • 3.4. Poplar SDK
      • 3.4.1. Installing the Poplar SDK on the host server
    • 3.5. Inspection of IPU hardware
    • 3.6. Install PopRT
      • 3.6.1. Installation with a Docker container
      • 3.6.2. Installation with pip
    • 3.7. Docker containers for the Poplar SDK
      • 3.7.1. gc-docker
      • 3.7.2. Run a Docker container
      • 3.7.3. Query the IPU status from a Docker container
  • 4. Model compilation
    • 4.1. ONNX model
      • 4.1.1. Model exporting
      • 4.1.2. Batch size selection
      • 4.1.3. Precision selection
      • 4.1.4. Model conversion and compilation
    • 4.2. TensorFlow model
      • 4.2.1. Model exporting
      • 4.2.2. Model conversion and compilation
    • 4.3. PyTorch model
      • 4.3.1. Model exporting
      • 4.3.2. Model conversion and compilation
  • 5. Model runtime
    • 5.1. Run with PopRT Runtime
      • 5.1.1. Environment preparation
      • 5.1.2. Run with PopRT Runtime Python API
      • 5.1.3. Run with PopRT Runtime C++ API
    • 5.2. Deploy to Triton Inference Server
      • 5.2.1. Environment preparation
      • 5.2.2. Configuration of generated model
        • Model name
        • Backend
        • Batching
        • Input and output
      • 5.2.3. Start model service
        • Verify the service with gRPC
        • Verify the service with HTTP
    • 5.3. Deploy to TensorFlow Serving
      • 5.3.1. Environment preparation
      • 5.3.2. Generate SavedModel model
      • 5.3.3. Start model service
        • Running with and without batching
      • 5.3.4. Verify the service with HTTP
  • 6. Container release notes
    • 6.1. Triton Inference Server
      • 6.1.1. New features
      • 6.1.2. Bug fixes
      • 6.1.3. Other improvements
      • 6.1.4. Known issues
      • 6.1.5. Compatibility changes
    • 6.2. TensorFlow Serving
      • 6.2.1. New features
      • 6.2.2. Bug fixes
      • 6.2.3. Other improvements
      • 6.2.4. Known issues
      • 6.2.5. Compatibility changes
  • 7. Trademarks & copyright