4. Support for TensorFlow 2
In TensorFlow version 2, eager mode is enabled by default and Keras is the main API for constructing models. Distribution strategies are the new way of targeting different hardware.
The Graphcore implementation of TensorFlow includes IPU-specific implementations
of the Model and Sequential classes, and adds PipelineModel and
PipelineSequential classes for running a model on multiple IPUs. It
also makes efficient use of the IPU by fusing operations into a single kernel
that is executed repeatedly, amortising the cost of control and I/O.
4.1. Function annotation with @tf.function
The function annotation @tf.function is well documented in the standard
TensorFlow documentation. It converts the body of the annotated function into
a fused set of operations that are executed as a group, in the same way as a
whole graph would have been in TensorFlow version 1. In addition, a library
called AutoGraph will convert Python flow-control constructs into TensorFlow
graph operations.
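For instance, AutoGraph will turn a data-dependent Python conditional inside an annotated function into graph control flow. A minimal illustration (the function and its logic are placeholders):

import tensorflow as tf

@tf.function
def clip_or_double(x):
    # AutoGraph converts this Python `if` on a tensor into a tf.cond node.
    if tf.reduce_sum(x) > 1.0:
        return x
    return x * 2.0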
Best practice is to ensure that anything which is intended to be executed on
the IPU is placed into a function and annotated with @tf.function. This
does not apply to constructing a Keras model or using the Keras Model.fit()
API. See below for details on Keras.
A function marked with @tf.function should not be called directly from within
a distribution strategy such as IPUStrategy. Instead, call it with the
strategy's experimental_run_v2 method, as in the sketch below.
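As a sketch of the pattern (the layer, optimiser and random data here are placeholders, and the IPU configuration calls follow the utils API of contemporary SDK releases, so check them against your version):

import tensorflow as tf
from tensorflow.python import ipu

# Configure an IPU device before creating the strategy.
cfg = ipu.utils.create_ipu_config()
cfg = ipu.utils.auto_select_ipus(cfg, 1)
ipu.utils.configure_ipu_system(cfg)

strategy = ipu.ipu_strategy.IPUStrategy()
with strategy.scope():
    dense = tf.keras.layers.Dense(10)
    opt = tf.keras.optimizers.SGD(0.01)

    @tf.function
    def training_step(x, y):
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(tf.keras.losses.mse(y, dense(x)))
        grads = tape.gradient(loss, dense.trainable_variables)
        opt.apply_gradients(zip(grads, dense.trainable_variables))
        return loss

    x = tf.random.normal([4, 32])
    y = tf.random.normal([4, 10])
    # Run through the strategy rather than calling training_step(x, y) directly.
    loss = strategy.experimental_run_v2(training_step, args=(x, y))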
4.2. IPUStrategy
tf.distribute.Strategy is an API for distributing training across multiple
devices. IPUStrategy is a subclass which targets a system with one or more
IPUs attached. Another subclass, IPUMultiWorkerStrategy, targets
configurations with multiple host machines.
Use the strategy.scope() context to ensure that everything within that
context will be compiled for the IPU device. You should do this instead
of using the tf.device context.
from tensorflow.python import ipu
# Create an IPU distribution strategy
strategy = ipu.ipu_strategy.IPUStrategy()
with strategy.scope():
...
It is important to construct a Keras model within the scope of the
IPUStrategy, because Keras may create some parts of the model at
construction time, and some other parts at execution time.
See the TensorFlow documentation for more details: https://www.tensorflow.org/guide/distributed_training
4.3. Keras
The Graphcore implementation of TensorFlow includes a port of Keras for the IPU,
available as tensorflow.python.ipu.keras.
The Keras API is used for constructing models using a set of high-level Layer
objects. See https://www.tensorflow.org/guide/keras for more information.
IPU-optimised replacements for the Keras Model and Sequential classes are
available. These have the following features:
On-device training loop for reduction of communication overhead.
Gradient accumulation for simulating larger batch sizes.
Automatic data-parallelisation of the model when placed on a multi-IPU device. This means that during training the gradients will be reduced across replicas.
These are described in more detail below.
Note
The model must be both instantiated and called from within an IPUStrategy
context.
See https://www.tensorflow.org/guide/keras/train_and_evaluate for more background.
4.3.1. Model class
An IPU port of the standard Keras Model is
available as tensorflow.python.ipu.keras.Model.
This is a substitute for the standard Keras Model class, using only a single
IPU for training. Unlike the standard Keras Model class, it cannot
be called directly. You must use the fit(), evaluate() and
predict() methods for training, evaluation and making predictions.
For a high-performance, multi-IPU solution use the PipelineModel class.
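As a minimal sketch (assuming an IPU system configured as in the earlier example, assuming the constructor mirrors the functional Keras Model API, and using random placeholder data):

import tensorflow as tf
from tensorflow.python import ipu
from tensorflow.python.ipu import keras as ipu_keras

strategy = ipu.ipu_strategy.IPUStrategy()
with strategy.scope():
    inputs = tf.keras.Input(shape=(32,))
    h = tf.keras.layers.Dense(16, activation='relu')(inputs)
    outputs = tf.keras.layers.Dense(10)(h)
    model = ipu_keras.Model(inputs, outputs)
    model.compile(optimizer='sgd', loss='mse')

    # The IPU requires static tensor shapes, hence drop_remainder=True.
    ds = tf.data.Dataset.from_tensor_slices(
        (tf.random.normal([128, 32]), tf.random.normal([128, 10])))
    ds = ds.batch(16, drop_remainder=True)

    model.fit(ds, epochs=2)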
4.3.2. Sequential class
An implementation of the Keras Sequential class is
available as tensorflow.python.ipu.keras.Sequential.
This is a substitute for the standard Keras Sequential class, using only a
single IPU for training. For a high-performance, multi-IPU solution use
the PipelineSequential class.
Unlike the standard Keras Sequential class, it cannot be
called directly. You must use the fit(), evaluate() and
predict() methods for training, evaluation and making predictions.
Similarly, you cannot get the list of trainable variables before you have
executed the model.
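A comparable sketch for Sequential, continuing from the previous example (strategy and ds as defined there):

with strategy.scope():
    model = ipu_keras.Sequential([
        tf.keras.layers.Dense(16, activation='relu'),
        tf.keras.layers.Dense(10),
    ])
    model.compile(optimizer='sgd', loss='mse')
    model.fit(ds, epochs=2)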
4.3.3. PipelineModel class
PipelineModel is an alternative to the
Keras Model class, with support for multi-device IPU pipelines. Using
pipelined execution allows the IPU to achieve high compute efficiency while
utilising multiple devices.
PipelineModel has the same API as the standard Keras Model class,
but will train the model on multiple IPUs and stream the data into the devices
using an infeed queue which is created automatically.
When defining a model for use with PipelineModel, the pipeline stage at
which a layer is to be executed is given by the PipelineStage context in
which it is called.
In a machine learning model, a “step” is often considered to be one pass through the model, in which the forward pass is done, the gradients are calculated and the parameters are updated. Since a pipeline accumulates multiple gradients before applying them collectively to the parameters, we call each of those pipeline operations a “step”. The number of data samples processed per step is therefore the batch size multiplied by the pipeline depth; for example, a batch size of 16 and a pipeline depth of 8 gives 128 samples per step.
This will be reflected in the rate at which the progress bar advances and in the entries of the Keras history.
Like the Sequential class, PipelineModel also supports automatic
data-parallelism.
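A sketch of a two-stage pipeline, continuing from the earlier examples (strategy and ds as defined there, and assuming a device with two IPUs was configured, for example with auto_select_ipus(cfg, 2)). The name of the gradient accumulation argument has varied between SDK releases (pipeline_depth in some versions), so treat gradient_accumulation_count as an assumption to check against your API reference:

with strategy.scope():
    inputs = tf.keras.Input(shape=(32,))
    with ipu_keras.PipelineStage(0):    # layers for the first IPU
        h = tf.keras.layers.Dense(16, activation='relu')(inputs)
    with ipu_keras.PipelineStage(1):    # layers for the second IPU
        outputs = tf.keras.layers.Dense(10)(h)
    model = ipu_keras.PipelineModel(
        inputs, outputs,
        gradient_accumulation_count=8)  # the pipeline depth
    model.compile(optimizer='sgd', loss='mse')
    model.fit(ds, epochs=2)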
4.3.4. PipelineSequential class
PipelineSequential is the pipelined counterpart of the Keras Sequential
class, just as PipelineModel is for the Model class.
Like the constructor for the standard Keras Sequential model,
PipelineSequential takes a list of lists of layers, where each list of
layers is assigned to an IPU pipeline stage. See the TensorFlow 2 examples
for how the API is used.
Like the Sequential class, PipelineSequential also supports
automatic data-parallelism.
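The equivalent list-of-lists form, under the same assumptions as the PipelineModel sketch above:

with strategy.scope():
    model = ipu_keras.PipelineSequential(
        [[tf.keras.layers.Dense(16, activation='relu')],  # stage 0
         [tf.keras.layers.Dense(10)]],                    # stage 1
        gradient_accumulation_count=8)
    model.compile(optimizer='sgd', loss='mse')
    model.fit(ds, epochs=2)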
4.3.5. Custom training loops
If a more sophisticated training loop is required, it can be written inside
a function marked with @tf.function. See the TensorFlow 2 examples for an
example.
The outer training function should be called using the experimental_run_v2
method on the IPUStrategy object, to ensure that it is executed using the
strategy’s configuration.
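Reusing the training_step function and dataset from the earlier sketches, such a loop could be as simple as:

with strategy.scope():
    for epoch in range(2):
        for x, y in ds:
            # Each iteration dispatches one step through the strategy.
            loss = strategy.experimental_run_v2(training_step, args=(x, y))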
Note
It is not possible to use either PipelineModel or
PipelineSequential in a custom training loop.
For more information on the @tf.function annotation, see the
TensorFlow function documentation.