V-IPU Administrator Guide

11. Trademarks & copyright

Graphcore® and Poplar® are registered trademarks of Graphcore Ltd.

AI-Float™, Colossus™, Exchange Memory™, In-Processor-Memory™, IPU-Core™, IPU-Exchange™, IPU-Fabric™, IPU-Link™, IPU-M2000™, IPU-Machine™, IPU-POD™, IPU-Tile™, PopART™, PopLibs™, PopVision™, PyTorch™, Streaming Memory™ and Virtual-IPU™ are trademarks of Graphcore Ltd.

All other trademarks are the property of their respective owners.

Copyright © 2020–2022 Graphcore Ltd. All rights reserved.


Revision 40e6b64f.