Kubernetes IPU Operator User Guide
Version: latest


9. Legal notices

Use of the Graphcore IPU Operator for Kubernetes is subject to the terms of the Graphcore End User License Agreement and Graphcore Container License Agreement. For details on all licenses related to the IPU Operator, see Kubernetes IPU Operator License.

Use of the information included in this document is at your own risk, and subject to the terms of the Graphcore End User License Agreement or Graphcore Container License Agreement.

All Graphcore products, including hardware and software, described in the License are subject to change at any time, without notice.

Graphcloud®, Graphcore®, Poplar® and PopVision® are registered trademarks of Graphcore Ltd.

Bow™, Bow-2000™, Bow Pod™, Colossus™, In-Processor-Memory™, IPU-Core™, IPU-Exchange™, IPU-Fabric™, IPU-Link™, IPU-M2000™, IPU-Machine™, IPU-POD™, IPU-Tile™, PopART™, PopDist™, PopLibs™, PopRun™, PopTorch™, Streaming Memory™ and Virtual-IPU™ are trademarks of Graphcore Ltd.

All other trademarks are the property of their respective owners.

Copyright © 2022 Graphcore Ltd. All rights reserved.


Revision 042979c2.