IPU Inference Toolkit User Guide
  • 1. Overview
  • 2. IPU Inference Toolkit architecture
    • 2.1. Model servers
    • 2.2. Graphcore Poplar software stack
      • 2.2.1. PopART
      • 2.2.2. PopEF and PopRT Runtime
    • 2.3. Using the IPU Inference Toolkit
      • 2.3.1. Model compilation overview
        • Model export
        • Batch size selection
        • Precision selection
        • Model conversion
        • Model compilation
      • 2.3.2. Model runtime overview
  • 3. Environment preparation
    • 3.1. Host server CPU architecture
    • 3.2. Host server operating system
    • 3.3. Docker
    • 3.4. Poplar SDK
      • 3.4.1. Installing the Poplar SDK on the host server
    • 3.5. Inspection of IPU hardware
    • 3.6. Install PopRT
      • 3.6.1. Installation with a Docker container
      • 3.6.2. Installation with pip
    • 3.7. Docker containers for the Poplar SDK
      • 3.7.1. gc-docker
      • 3.7.2. Run a Docker container
      • 3.7.3. Query the IPU status from a Docker container
  • 4. Model compilation
    • 4.1. ONNX model
      • 4.1.1. Model exporting
      • 4.1.2. Batch size selection
      • 4.1.3. Precision selection
      • 4.1.4. Model conversion and compilation
    • 4.2. TensorFlow model
      • 4.2.1. Model exporting
      • 4.2.2. Model conversion and compilation
    • 4.3. PyTorch model
      • 4.3.1. Model exporting
      • 4.3.2. Model conversion and compilation
  • 5. Model runtime
    • 5.1. Run with PopRT Runtime
      • 5.1.1. Environment preparation
      • 5.1.2. Run with PopRT Runtime Python API
      • 5.1.3. Run with PopRT Runtime C++ API
    • 5.2. Deploy to Triton Inference Server
      • 5.2.1. Environment preparation
      • 5.2.2. Configuration of generated model
        • Model name
        • Backend
        • Batching
        • Input and output
      • 5.2.3. Start model service
        • Verify the service with gRPC
        • Verify the service with HTTP
    • 5.3. Deploy to TensorFlow Serving
      • 5.3.1. Environment preparation
      • 5.3.2. Generate SavedModel model
      • 5.3.3. Start model service
        • Running with and without batching
      • 5.3.4. Verify the service with HTTP
  • 6. Container release notes
    • 6.1. Triton Inference Server
      • 6.1.1. New features
      • 6.1.2. Bug fixes
      • 6.1.3. Other improvements
      • 6.1.4. Known issues
      • 6.1.5. Compatibility changes
    • 6.2. TensorFlow Serving
      • 6.2.1. New features
      • 6.2.2. Bug fixes
      • 6.2.3. Other improvements
      • 6.2.4. Known issues
      • 6.2.5. Compatibility changes
  • 7. Trademarks & copyright

Search help

Note: Searching from the top-level index page will search all documents. Searching from a specific document will search only that document.

  • Find an exact phrase: Wrap your search phrase in "" (double quotes) to only get results where the phrase is exactly matched. For example "PyTorch for the IPU" or "replicated tensor sharding"
  • Prefix query: Add an * (asterisk) at the end of any word to indicate a prefix query. This will return results containing words that start with that prefix. For example tensor*
  • Fuzzy search: Use ~N (tilde followed by a number) at the end of any word for a fuzzy search. This will return results that are similar to the search word. N specifies the “edit distance” (fuzziness) of the match. For example Polibs~1
  • Words close to each other: ~N (tilde followed by a number) after a phrase (in quotes) returns results where the words are close to each other. N is the maximum number of positions allowed between matching words. For example "ipu version"~2
  • Logical operators: You can use the following logical operators in a search:
    • + signifies AND operation
    • | signifies OR operation
    • - negates a single word or phrase (returns results without that word or phrase)
    • () controls operator precedence
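
These operators can also be combined in a single query. As an illustrative example (assuming the operators listed above may be used together), "model compilation" +ONNX -TensorFlow returns results that contain the exact phrase "model compilation" and the word ONNX, while excluding results that mention TensorFlow.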
