Kubernetes IPU Operator User Guide
Version: latest
Kubernetes (K8s) is an open-source container orchestration and management system. Kubernetes Operators extend the Kubernetes API with custom objects and implement the control logic for those objects.
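As a rough illustration of what "control logic for a custom object" means, the sketch below compares a declared (desired) state with an observed state and works out the actions needed to close the gap. It is a generic, simplified picture of the Operator pattern and not the IPU Operator's actual code: `ExampleJob`, `desired_workers` and `reconcile` are hypothetical names invented for this example, and no Kubernetes API is called.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExampleJob:
    """Hypothetical custom object: the user declares how many workers they want."""
    name: str
    desired_workers: int


def reconcile(job: ExampleJob, running_workers: int) -> List[str]:
    """Return the actions that move the observed state towards the desired state."""
    diff = job.desired_workers - running_workers
    if diff > 0:
        # Too few workers are running: create the missing ones.
        return [f"create worker for {job.name}"] * diff
    if diff < 0:
        # Too many workers are running: delete the surplus.
        return [f"delete worker for {job.name}"] * (-diff)
    # Observed state already matches the declared state.
    return []


if __name__ == "__main__":
    job = ExampleJob(name="demo", desired_workers=3)
    for action in reconcile(job, running_workers=1):
        print(action)
```

A real Operator runs this kind of comparison repeatedly, whenever the custom object or the resources it owns change, so that the cluster converges on what the user declared.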

This document describes the components, installation, and use of the IPU Operator for systems based on the IPU-M2000 and Bow 2000. If you are using the C600 PCIe card, refer to the Kubernetes IPU Device Plugin User Guide instead.

  • 1. The IPU Operator
    • 1.1. Components and design
  • 2. Installation
    • 2.1. Prerequisites
    • 2.2. Installation methods
    • 2.3. Installation using Helm chart hosted on GitHub
    • 2.4. Installation using Helm chart from Graphcore’s downloads page
    • 2.5. Basic installation
    • 2.6. Multiple V-IPU Controllers
    • 2.7. Verify the installation is successful
    • 2.8. Uninstall the IPU Operator
    • 2.9. Upgrade the IPU Operator
  • 3. Configurations
  • 4. Creating an IPUJob
    • 4.1. Training job
    • 4.2. Inference job
    • 4.3. Automatic restarts
    • 4.4. Clean up resources and IPU partitions
  • 5. Debugging problems
    • 5.1. How does the IPU Operator work?
    • 5.2. Debugging
  • 6. IPU usage statistics
    • 6.1. Operator metrics
  • 7. Known limitations
  • 8. Release notes
    • 8.1. Version 1.2.0
    • 8.2. Version 1.1.0
  • 9. Legal notices
Revision 042979c2.