Memory and Performance Optimisation on the IPU
This document helps Graphcore AI engineers and customers develop high-performance machine learning models running on the IPU, covering the general topic of performance optimisation.
Contents
- 1. Overview
- 2. Understanding the IPU programming model
- 3. Mapping a model to an IPU system
- 4. Optimising for performance
- 5. Common memory optimisations
- 6. Debugging an out-of-memory exception
- 7. Scaling an application over multiple replicas
- 7.1. Quick guide to scaling
- 7.2. Analyse your scaling behaviour
- 7.3. Constant or slowed-down processes (Amdahl’s law)
- 7.4. Graph compilation and executable loading
- 7.5. Host-I/O optimisation
- 7.6. Batch size and gradient accumulation count
- 7.7. Memory optimisation for more replicas
- 7.8. Pipeline optimisation and replicated tensor sharding
- 7.9. Technical background
- 8. Reducing graph compilation time
- 9. Trademarks & copyright