IPU Programmer's Guide
Version: 3.1.0
  • 1. Introduction
  • 2. IPU hardware overview
    • 2.1. Memory architecture
    • 2.2. Execution
    • 2.3. Tile architecture
      • 2.3.1. On-tile memory
    • 2.4. Host/device communication
  • 3. Programming model
    • 3.1. The Poplar graph library
    • 3.2. Programs
      • 3.2.1. Data variables
      • 3.2.2. Copying data and executing compute sets
      • 3.2.3. Control flow: sequences, conditionals and loops
      • 3.2.4. Compute sets
      • 3.2.5. The computational graph
      • 3.2.6. Data streams
      • 3.2.7. IPU-level task parallelism
        • 3.2.7.1. Overlapping I/O within the IPU
    • 3.3. Loading and running programs
    • 3.4. The implementation of ML frameworks using IPU programs
    • 3.5. The compilation and execution of IPU programs
      • 3.5.1. Variable liveness
  • 4. Programming tools
    • 4.1. Using a machine learning framework
    • 4.2. Writing IPU programs directly
    • 4.3. Adding custom operations to ML frameworks
    • 4.4. Compilation
    • 4.5. Executing programs
    • 4.6. Profiling and analysing programs
    • 4.7. Further reading
  • 5. Common algorithmic techniques for IPUs
    • 5.1. Replication
      • 5.1.1. Replication in ML training
      • 5.1.2. Replication in ML inference
      • 5.1.3. Using multiple processes on the host or multiple hosts
        • 5.1.3.1. Replicated tensor sharding
    • 5.2. Gradient accumulation
    • 5.3. Recomputation
    • 5.4. Model parallelism and pipelining
      • 5.4.1. A simple example
      • 5.4.2. Efficient pipelining
      • 5.4.3. Memory stashes and recomputation
      • 5.4.4. Interleaved schedule pipelining
      • 5.4.5. Further reading on pipelining
    • 5.5. Machine learning techniques on IPU hardware
      • 5.5.1. Batch size terminology
  • 6. Trademarks & copyright


Revision 41aa41ac.