Kubernetes IPU Operator User Guide
Version: 1.1.0
  • 1. The IPU Operator
    • 1.1. Components and design
  • 2. Installation
    • 2.1. Prerequisites
    • 2.2. Installation methods
    • 2.3. Installation using Helm chart hosted on GitHub
      • 2.3.1. Installing the IPUJob CRD
      • 2.3.2. Installing the IPU Operator
    • 2.4. Installation using Helm chart from Graphcore’s SDP
      • 2.4.1. Download package
      • 2.4.2. Installing the IPUJob CRD
      • 2.4.3. Installing the IPU Operator from a local container repository
    • 2.5. Basic installation
    • 2.6. Multiple V-IPU Controllers
    • 2.7. Verify the installation is successful
    • 2.8. Uninstall the IPU Operator
    • 2.9. Upgrade the IPU Operator
  • 3. Configurations
  • 4. Creating an IPUJob
    • 4.1. Training job
      • 4.1.1. Simple training
      • 4.1.2. Distributed training
    • 4.2. Inference job
      • 4.2.1. Scale up or down operations
    • 4.3. Automatic restarts
    • 4.4. Clean up resources and IPU partitions
  • 5. Debugging problems
    • 5.1. How does the IPU Operator work?
    • 5.2. Debugging
  • 6. IPU usage statistics
    • 6.1. Operator metrics
  • 7. Known limitations
  • 8. Release notes
    • 8.1. Version 1.1.0
      • 8.1.1. New features
      • 8.1.2. Bug fixes
      • 8.1.3. Other improvements
      • 8.1.4. Known issues
      • 8.1.5. Compatibility changes
  • 9. Legal notices

Revision 2b547de8.