Kubernetes IPU Operator User Guide
Version: 1.1.0
  • 1. The IPU Operator
    • 1.1. Components and design
  • 2. Installation
    • 2.1. Prerequisites
    • 2.2. Installation methods
    • 2.3. Installation using Helm chart hosted on GitHub
      • 2.3.1. Installing the IPUJob CRD
      • 2.3.2. Installing the IPU Operator
    • 2.4. Installation using Helm chart from Graphcore’s SDP
      • 2.4.1. Download package
      • 2.4.2. Installing the IPUJob CRD
      • 2.4.3. Installing the IPU Operator from a local container repository
    • 2.5. Basic installation
    • 2.6. Multiple V-IPU Controllers
    • 2.7. Verify the installation is successful
    • 2.8. Uninstall the IPU Operator
    • 2.9. Upgrade the IPU Operator
  • 3. Configurations
  • 4. Creating an IPUJob
    • 4.1. Training job
      • 4.1.1. Simple training
      • 4.1.2. Distributed training
    • 4.2. Inference job
      • 4.2.1. Scale up or down operations
    • 4.3. Automatic restarts
    • 4.4. Clean up resources and IPU partitions
  • 5. Debugging problems
    • 5.1. How does the IPU Operator work?
    • 5.2. Debugging
  • 6. IPU usage statistics
    • 6.1. Operator metrics
  • 7. Known limitations
  • 8. Release notes
    • 8.1. Version 1.1.0
      • 8.1.1. New features
      • 8.1.2. Bug fixes
      • 8.1.3. Other improvements
      • 8.1.4. Known issues
      • 8.1.5. Compatibility changes
  • 9. Legal notices

Kubernetes (K8s) is an open-source container orchestration and management system. Kubernetes Operators extend the Kubernetes API with custom objects and implement the control logic for those objects.

This document describes the components, installation, and use of the IPU Operator on systems based on the IPU-M2000 and Bow 2000. If you are using the C600 PCIe card, see the Kubernetes IPU Device Plugin User Guide instead.


Revision 2b547de8.