IPU Programmer's Guide
Version: 3.1.0
1. Introduction
2. IPU hardware overview
2.1. Memory architecture
2.2. Execution
2.3. Tile architecture
2.3.1. On-tile memory
2.4. Host/device communication
3. Programming model
3.1. The Poplar graph library
3.2. Programs
3.2.1. Data variables
3.2.2. Copying data and executing compute sets
3.2.3. Control flow: sequences, conditionals and loops
3.2.4. Compute sets
3.2.5. The computational graph
3.2.6. Data streams
3.2.7. IPU-level task parallelism
3.2.7.1. Overlapping I/O within the IPU
3.3. Loading and running programs
3.4. The implementation of ML frameworks using IPU programs
3.5. The compilation and execution of IPU programs
3.5.1. Variable liveness
4. Programming tools
4.1. Using a machine learning framework
4.2. Writing IPU programs directly
4.3. Adding custom operations to ML frameworks
4.4. Compilation
4.5. Executing programs
4.6. Profiling and analysing programs
4.7. Further reading
5. Common algorithmic techniques for IPUs
5.1. Replication
5.1.1. Replication in ML training
5.1.2. Replication in ML inference
5.1.3. Using multiple processes on the host or multiple hosts
5.1.3.1. Replicated tensor sharding
5.2. Gradient accumulation
5.3. Recomputation
5.4. Model parallelism and pipelining
5.4.1. A simple example
5.4.2. Efficient pipelining
5.4.3. Memory stashes and recomputation
5.4.4. Interleaved schedule pipelining
5.4.5. Further reading on pipelining
5.5. Machine learning techniques on IPU hardware
5.5.1. Batch size terminology
6. Trademarks & copyright