1. Model parallelism

If a deep learning network has too many layers and parameters to fit on one IPU, we need to divide it into pieces and distribute those pieces across multiple IPUs. This is called the model parallelism approach, and it enables us to train large models that exceed the memory capacity of a single IPU accelerator. Currently, we support two types of model parallelism: sharding and pipelining.
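As an illustration, the sketch below shows how a model's layers might be placed on two IPUs using the sharding scope from Graphcore's TensorFlow port. This is a minimal sketch, assuming the `tensorflow.python.ipu` module from the Poplar SDK is installed; the configuration API (here `IPUConfig` and `auto_select_ipus`) varies between SDK versions, so treat it as illustrative rather than a definitive recipe.

```python
# Minimal sharding sketch, assuming Graphcore's TensorFlow port is installed.
# API details (IPUConfig, auto_select_ipus) may differ between SDK versions.
import tensorflow as tf
from tensorflow.python import ipu

# Request two IPUs; the system must be configured before executing the graph.
cfg = ipu.config.IPUConfig()
cfg.auto_select_ipus = 2
cfg.configure_ipu_system()

def sharded_model(x):
    # First stage of the model is placed on IPU 0 ...
    with ipu.scopes.ipu_shard(0):
        x = tf.keras.layers.Dense(256, activation="relu")(x)
    # ... and the remaining layers on IPU 1.
    with ipu.scopes.ipu_shard(1):
        x = tf.keras.layers.Dense(10)(x)
    return x
```

Pipelining follows the same placement idea, but additionally splits execution into stages that process micro-batches concurrently, so that all IPUs stay busy rather than waiting on one another.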

For more information about the IPU architecture, the abstract programming model and tools, and algorithmic techniques, refer to the IPU Programmer's Guide. The Memory and Performance Optimisation on the IPU guide contains guidelines for optimising the performance of machine learning models running on the IPU.