Software Documents

Licensed Software

This software is made available under the terms of the Graphcore End User License Agreement (EULA). Please ensure you have read and accepted the terms of the license before using the software.

There are release notes for each software release.

TensorFlow

Targeting the IPU from TensorFlow 1

User guide and API reference for the IPU implementation of TensorFlow 1.

Targeting the IPU from TensorFlow 2

User guide and API reference for the IPU implementation of TensorFlow 2.

Keras with IPUs

The Graphcore implementation of TensorFlow includes Keras support for IPUs. This document describes how to use Keras for the IPU.
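
As a brief illustration of the pattern described in that document, the sketch below builds and compiles a small Keras model inside an IPUStrategy scope. This is a minimal sketch based on the TensorFlow 2 IPU API; the layer sizes, loss and optimiser are arbitrary choices for illustration.

    import tensorflow as tf
    from tensorflow.python import ipu

    # Configure the IPU system to use one IPU.
    cfg = ipu.config.IPUConfig()
    cfg.auto_select_ipus = 1
    cfg.configure_ipu_system()

    # Keras models created inside an IPUStrategy scope are placed on the IPU.
    strategy = ipu.ipu_strategy.IPUStrategy()
    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(10),
        ])
        model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")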

Porting TensorFlow models to the IPU (technical note)

This document is a practical guide to porting TensorFlow models to the IPU using the Poplar SDK. It assumes you are already familiar with Targeting the IPU from TensorFlow 1, which serves as the primary introduction to developing TensorFlow models for the IPU: that document provides a conceptual introduction to developing models at the framework level and describes a number of specific facets of the TensorFlow-to-Poplar API that are pivotal to running models on the IPU.

This document focuses on practical considerations for developing a model for the IPU and provides guidance on best practices, identifying the key elements that help developers transition to using TensorFlow on the IPU.

Model parallelism with TensorFlow: sharding and pipelining (technical note)

This technical note describes how to parallelise TensorFlow models on IPU hardware.

If a deep learning network has too many layers and parameters to fit on one IPU, we need to divide it into pieces and distribute those pieces across multiple IPUs. This is the model parallelism approach, and it enables us to train large models that exceed the memory capacity of a single IPU. Currently, we support two types of model parallelism: sharding and pipelining.
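
For a flavour of the simpler of the two approaches, the sketch below manually shards a model across two IPUs using the TensorFlow 1 ipu.scopes.ipu_shard annotation. It is a minimal sketch with arbitrary layer sizes, not a complete program; the pipelining API is covered in the technical note itself.

    import tensorflow.compat.v1 as tf
    from tensorflow.python import ipu

    def sharded_model(x):
        # The first part of the model is placed on shard (IPU) 0...
        with ipu.scopes.ipu_shard(0):
            h = tf.layers.dense(x, 256, activation=tf.nn.relu)
        # ...and the remainder on shard 1.
        with ipu.scopes.ipu_shard(1):
            return tf.layers.dense(h, 10)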

Optimising for the IPU: Computational Graph Recompilation and Executable Switching in TensorFlow (technical note)

When code is executed on an IPU, a multi-operation computational graph is compiled to run efficiently on the device. This technical note describes how to minimise recompilation.

This compilation ensures that the code running on the IPU is optimal: as many tiles as possible are used, as little device memory as possible is used, and the number of execution cycles is kept short. Note that, in contrast to some other platforms, the graph to be compiled isn’t just a single matmul operation but many consecutive operations, so almost every graph is different and will need to be compiled and optimised.

The compilation process performs many optimisations and so can take some time. It is therefore important to know when compilation of the graph will happen, and to avoid it occurring at inconvenient times or too often. This is especially relevant when running benchmarks, since recompilation can add significant overhead.

As a result, you should avoid recompilation as far as possible; this technical note provides some strategies that can help with this.
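
One common mitigation, for example, is to enable the on-disk executable cache, so that a graph compiled in a previous run is reloaded rather than recompiled. The snippet below sets the relevant TF_POPLAR_FLAGS option from Python; the cache directory is an arbitrary example path, and the variable must be set before the IPU system is initialised.

    import os

    # Reuse previously compiled Poplar executables where possible.
    os.environ["TF_POPLAR_FLAGS"] = "--executable_cache_path=/tmp/ipu_executable_cache"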

Creating Custom Operations for the IPU (technical note)

This technical note provides an overview of the steps for implementing a custom op in each of the frameworks available in the Poplar SDK, with links to sources of more information.
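
In TensorFlow, for instance, a custom op compiled into a shared library can be invoked with ipu.custom_ops.precompiled_user_op. The sketch below is a minimal illustration; the library name, shapes and types are hypothetical.

    import tensorflow as tf
    from tensorflow.python import ipu

    x = tf.random.uniform((4, 4))

    # "libcustom_op.so" is a hypothetical shared library built from a
    # Poplar C++ implementation of the op.
    (y,) = ipu.custom_ops.precompiled_user_op(
        [x],
        library_path="libcustom_op.so",
        outs={
            "output_types": [tf.float32],
            "output_shapes": [tf.TensorShape((4, 4))],
        },
    )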

Memory and Performance Optimisation (technical note)

The goal of this document is to help Graphcore AI engineers and customers optimise machine learning models for high performance on the IPU.

There are many factors that affect model performance; this document covers memory optimisation, execution schemes, and optimisations specific to the IPU and Poplar.

Although this document focuses on performance, it is worth bearing in mind that numerical stability and convergence properties may limit the design options when optimising a model.

Optimising Temporary Memory Usage for Convolutions and Matmuls on the IPU (technical note)

In many of our example applications for the IPU, you will see an option called availableMemoryProportion. This technical note describes what this option does and when you may need to tune it in order to make a model fit onto the IPU or to optimise its performance.

All of the frameworks for the IPU, such as TensorFlow and PyTorch, make use of the facilities provided by the Poplar and PopLibs libraries. So, for example, when a TensorFlow program needs to perform a matrix multiply (matmul), it will call the matmul functions in PopLibs.

The availableMemoryProportion option is used by PopLibs when deciding how to implement operations on the IPU; in other words, how to convert the framework-level operations into the low-level functions that execute on the IPU.

This document discusses availableMemoryProportion in relation to convolutions and matmuls, which are the most common use cases at the time of writing, but the option may also apply to other PopLibs functions not covered here.
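
In the TensorFlow 2 API, for example, the option can be set through the IPU configuration; the value below is purely illustrative (the default is 0.6).

    from tensorflow.python import ipu

    cfg = ipu.config.IPUConfig()
    # Ask the planner to target roughly 30% of tile memory for the
    # temporary values used by convolutions and matmuls.
    cfg.convolutions.poplar_options = {"availableMemoryProportion": "0.3"}
    cfg.matmuls.poplar_options = {"availableMemoryProportion": "0.3"}
    cfg.configure_ipu_system()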

PyTorch

PyTorch for the IPU: User Guide

User guide and API reference for PyTorch on the IPU.
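
As a quick taste of the API, the sketch below wraps a standard PyTorch module with poptorch.inferenceModel so that its forward pass runs on the IPU. The model and input sizes are arbitrary.

    import torch
    import poptorch

    model = torch.nn.Linear(128, 10)

    # Compile the model for the IPU; the first call triggers compilation.
    opts = poptorch.Options()
    ipu_model = poptorch.inferenceModel(model, opts)

    out = ipu_model(torch.randn(4, 128))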

Creating Custom Operations for the IPU (technical note)

This technical note provides an overview of the steps for implementing a custom op in each of the frameworks available in the Poplar SDK, with links to sources of more information.
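
In PopTorch, for example, a custom op that has been registered with PopART can be called from a model's forward pass using poptorch.custom_op. In the sketch below, the op name and domain are hypothetical.

    import torch
    import poptorch

    class ModelWithCustomOp(torch.nn.Module):
        def forward(self, x):
            # "MyCustomOp" in domain "com.example" is a hypothetical op
            # implemented in a shared library loaded at runtime.
            (y,) = poptorch.custom_op(
                [x],
                "MyCustomOp",
                "com.example",
                1,
                example_outputs=[x],  # used to infer output shape and type
            )
            return y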

Memory and Performance Optimisation (technical note)

The goal of this document is to help Graphcore AI engineers and customers optimise machine learning models for high performance on the IPU.

There are many factors that affect model performance; this document covers memory optimisation, execution schemes, and optimisations specific to the IPU and Poplar.

Although this document focuses on performance, it is worth bearing in mind that numerical stability and convergence properties may limit the design options when optimising a model.

Optimising Temporary Memory Usage for Convolutions and Matmuls on the IPU (technical note)

In many of our example applications for the IPU, you will see an option called availableMemoryProportion. This technical note describes what this option does and when you may need to tune it in order to make a model fit onto the IPU or to optimise its performance.

All of the frameworks for the IPU, such as TensorFlow and PyTorch, make use of the facilities provided by the Poplar and PopLibs libraries. So, for example, when a TensorFlow program needs to perform a matrix multiply (matmul), it will call the matmul functions in PopLibs.

The availableMemoryProportion option is used by PopLibs when deciding how to implement operations on the IPU; in other words, how to convert the framework-level operations into the low-level functions that execute on the IPU.

This document discusses availableMemoryProportion in relation to convolutions and matmuls, which are the most common use cases at the time of writing, but the option may also apply to other PopLibs functions not covered here.
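
In PopTorch, for instance, the option can be set per IPU through poptorch.Options; the value below is purely illustrative (the default is 0.6).

    import poptorch

    opts = poptorch.Options()
    # Target roughly 30% of tile memory for temporary values on IPU 0.
    opts.setAvailableMemoryProportion({"IPU0": 0.3})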

PopART

Documents for the Poplar Advanced Runtime (PopART), which you can use to import and execute models from industry-standard ML frameworks using the ONNX format.

PopART User Guide

PopART Python API Reference

PopART C++ API Reference
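
As a small illustration of the runtime, the sketch below builds a trivial graph with the PopART Python Builder and prepares an inference session on one IPU. It is a minimal sketch; in practice you might instead pass the path of an ONNX file to popart.Builder.

    import numpy as np
    import popart

    builder = popart.Builder()
    x = builder.addInputTensor(popart.TensorInfo("FLOAT", [1, 4]))
    w = builder.addInitializedInputTensor(np.ones([4, 4], np.float32))
    y = builder.aiOnnx.matmul([x, w])
    builder.addOutputTensor(y)

    session = popart.InferenceSession(
        fnModel=builder.getModelProto(),
        dataFlow=popart.DataFlow(1, {y: popart.AnchorReturnType("ALL")}),
        deviceInfo=popart.DeviceManager().acquireAvailableDevice(1),
    )
    session.prepareDevice()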

Creating Custom Operations for the IPU (technical note)

This technical note provides an overview of the steps for implementing a custom op in each of the frameworks available in the Poplar SDK, with links to sources of more information.
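
In PopART, for example, a custom op whose implementation has been loaded from a shared library can be added to a graph with the Builder's customOp method. The op name and domain in this sketch are hypothetical.

    import popart

    builder = popart.Builder()
    x = builder.addInputTensor(popart.TensorInfo("FLOAT", [1, 4]))

    # "MyCustomOp" in domain "com.example" is a hypothetical op.
    (y,) = builder.customOp(
        opName="MyCustomOp",
        opVersion=1,
        domain="com.example",
        inputs=[x],
        attributes={},
    )
    builder.addOutputTensor(y)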

Memory and Performance Optimisation (technical note)

The goal of this document is to help Graphcore AI engineers and customers optimise machine learning models for high performance on the IPU.

There are many factors that affect model performance; this document covers memory optimisation, execution schemes, and optimisations specific to the IPU and Poplar.

Although this document focuses on performance, it is worth bearing in mind that numerical stability and convergence properties may limit the design options when optimising a model.

Optimising Temporary Memory Usage for Convolutions and Matmuls on the IPU (technical note)

In many of our example applications for the IPU, you will see an option called availableMemoryProportion. This technical note describes what this option does and when you may need to tune it in order to make a model fit onto the IPU or to optimise its performance.

All of the frameworks for the IPU, such as TensorFlow and PyTorch, make use of the facilities provided by the Poplar and PopLibs libraries. So, for example, when a TensorFlow program needs to perform a matrix multiply (matmul), it will call the matmul functions in PopLibs.

The availableMemoryProportion option is used by PopLibs when deciding how to implement operations on the IPU; in other words, how to convert the framework-level operations into the low-level functions that execute on the IPU.

This document discusses availableMemoryProportion in relation to convolutions and matmuls, which are the most common use cases at the time of writing, but the option may also apply to other PopLibs functions not covered here.
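
In PopART, for instance, the option can be set per operation on the Builder. The sketch below assumes y is the output of a matmul added with the Builder; the value is purely illustrative (the default is 0.6).

    import numpy as np
    import popart

    builder = popart.Builder()
    x = builder.addInputTensor(popart.TensorInfo("FLOAT", [1, 4]))
    w = builder.addInitializedInputTensor(np.ones([4, 4], np.float32))
    y = builder.aiOnnx.matmul([x, w])

    # Target roughly 30% of tile memory for this matmul's temporaries.
    builder.setAvailableMemoryProportion(y, 0.3)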

Poplar Graph Programming Framework

Poplar and PopLibs User Guide

Information on how to use the Poplar graph programming tools to write code for the IPU.

Poplar and PopLibs API Reference

Details of the functions in the Poplar and PopLibs libraries provided in the Poplar SDK.

Memory and Performance Optimisation (technical note)

The goal of this document is to help Graphcore AI engineers and customers optimise machine learning models for high performance on the IPU.

There are many factors that affect model performance; this document covers memory optimisation, execution schemes, and optimisations specific to the IPU and Poplar.

Although this document focuses on performance, it is worth bearing in mind that numerical stability and convergence properties may limit the design options when optimising a model.

Optimising Temporary Memory Usage for Convolutions and Matmuls on the IPU (technical note)

In many of our example applications for the IPU, you will see an option called availableMemoryProportion. This technical note describes what this option does and when you may need to tune it in order to make a model fit onto the IPU or to optimise its performance.

All of the frameworks for the IPU, such as TensorFlow and PyTorch, make use of the facilities provided by the Poplar and PopLibs libraries. So, for example, when a TensorFlow program needs to perform a matrix multiply (matmul), it will call the matmul functions in PopLibs.

The availableMemoryProportion option is used by PopLibs when deciding how to implement operations on the IPU; in other words, how to convert the framework-level operations into the low-level functions that execute on the IPU.

This document discusses availableMemoryProportion in relation to convolutions and matmuls, which are the most common use cases at the time of writing, but the option may also apply to other PopLibs functions not covered here.

Running Code on the IPU

PopDist and PopRun: User Guide

PopRun and PopDist support running applications across multiple IPUs.

PopEF: User Guide

PopEF is an exchange format used by Graphcore’s frameworks (TensorFlow, PopART, PopTorch) to store compiled Poplar executables and their associated metadata.

Graphcore Command Line Tools

Tools to monitor and control the IPU hardware (see also Supporting Tools).

Using IPUs from Docker

User guide for the pre-built Graphcore Docker packages containing components of the Poplar SDK.

Profiling and Debugging

PopVision Graph Analyser User Guide

Documentation for the PopVision Graph Analyser. This information is also available as context-sensitive help in the tool.

The Graph Analyser can be downloaded from the PopVision tools web page (https://www.graphcore.ai/developer/popvision-tools).

PopVision System Analyser User Guide

Documentation for the PopVision System Analyser. This information is also available as context-sensitive help in the tool.

The System Analyser can be downloaded from the PopVision tools web page (https://www.graphcore.ai/developer/popvision-tools).

PopVision Analysis Library (libpva) User Guide

The PopVision analysis library (libpva) can be used for programmatic analysis of Poplar profiling information.
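
For example, a report can be opened and queried from Python as in the sketch below; "profile.pop" is an example filename for a report produced by a profiled Poplar program.

    import pva

    # Open a report produced by enabling profiling in Poplar.
    report = pva.openReport("profile.pop")

    print("Poplar version:", report.poplarVersion.string)
    print("Total tile memory (bytes):",
          sum(t.memory.total.includingGaps for t in report.compilation.tiles))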

PopVision Trace Instrumentation Library

The PopVision trace instrumentation library (libpvti) provides functions to control the capture of profiling information for the host-code of your IPU application. This data can then be explored with the PopVision System Analyser.
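
For example, host-side phases can be instrumented as in the sketch below; the channel name is arbitrary and load_data is a hypothetical function standing in for real host-side work.

    import libpvti as pvti

    channel = pvti.createTraceChannel("Application")

    # Mark the beginning and end of a host-side phase so that it appears
    # as a named span in the PopVision System Analyser.
    pvti.Tracepoint.begin(channel, "data-loading")
    data = load_data()  # hypothetical host-side work
    pvti.Tracepoint.end(channel, "data-loading")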

Open Source Software

The following software is available as open source:

See also the Examples and Tutorials.

License Agreements