4. Support for TensorFlow 2
In TensorFlow version 2, eager mode is enabled by default and Keras is the main API for constructing models. Distribution strategies are the new way of targeting different pieces of hardware.
The Graphcore implementation of TensorFlow includes IPU-specific implementations
of the Model and Sequential classes, and adds PipelineModel and
PipelineSequential classes for running a model on multiple IPUs. It
also makes efficient use of the IPU by fusing operations into a single kernel
that is executed repeatedly, amortising the cost of control and I/O.
4.1. Function annotation with @tf.function
The function annotation @tf.function is well documented in the standard
TensorFlow documentation. It converts the body of the annotated function into
a fused set of operations that are executed as a group, in the same way as a
whole graph would have been in TensorFlow version 1. In addition, a library
called autograph will convert Python flow control constructs into TensorFlow
graph operations.
Best practice is to ensure that anything which is intended to be executed on
the IPU is placed into a function and annotated with @tf.function. This
does not apply to constructing a Keras model or using the Keras Model.fit()
API. See below for details on Keras.
When calling a function that is marked with @tf.function from within a
distribution strategy like IPUStrategy, you should not call it directly,
but instead use the strategy's experimental_run_v2 method.
See the following online resources for more information:
4.2. IPUStrategy
The tf.distribute.Strategy class is an API for distributing training across
multiple devices. IPUStrategy is a subclass which targets a system with one or
more IPUs attached. Another subclass, IPUMultiWorkerStrategy, targets a
multi-system configuration.
Use the strategy.scope() context to ensure that everything within that
context will be compiled for the IPU device. You should do this instead
of using the tf.device context.
from tensorflow.python import ipu

# Create an IPU distribution strategy
strategy = ipu.ipu_strategy.IPUStrategy()

with strategy.scope():
    ...
It is important to construct a Keras model within the scope of the
IPUStrategy, because Keras may create some parts of the model at
construction time, and some other parts at execution time.
See the TensorFlow documentation for more details: https://www.tensorflow.org/guide/distributed_training
4.3. Keras
The Graphcore implementation of TensorFlow includes a port of Keras for the IPU,
available as tensorflow.python.ipu.keras.
The Keras API is used for constructing models using a set of high-level Layer
objects. See https://www.tensorflow.org/guide/keras for more information.
IPU-optimised replacements for the Keras Model and Sequential classes are
available. These have the following features:
On-device training loop for reduction of communication overhead.
Gradient accumulation for simulating larger batch sizes.
Automatic data-parallelisation of the model when placed on a multi-IPU device. This means that during training the gradients will be reduced across replicas.
These are described in more detail below.
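The gradient accumulation feature listed above can be illustrated without any IPU code. The following is a minimal pure-Python sketch, not the Graphcore implementation: the one-parameter model, the data values and the function name mse_gradient are all hypothetical. It assumes equal-sized mini-batches and a mean-squared-error loss, in which case averaging the gradients of several small batches gives the same result as the gradient of one large batch.

```python
# Pure-Python sketch of gradient accumulation for a one-parameter model
# y = w * x trained with mean squared error. Illustrative only; this is
# not IPU or Keras code and all names here are hypothetical.

def mse_gradient(w, batch):
    """Gradient of mean((w * x - y) ** 2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

data = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5)]
w = 0.5

# Gradient computed from one large batch of four samples.
large_batch_grad = mse_gradient(w, data)

# The same four samples processed as two mini-batches of two samples each.
# The gradients are accumulated and only then combined for a single update.
mini_batches = [data[0:2], data[2:4]]
accumulated = 0.0
for batch in mini_batches:
    accumulated += mse_gradient(w, batch)
accumulated_grad = accumulated / len(mini_batches)

print(large_batch_grad, accumulated_grad)
```

Both values are identical here because the mini-batches are the same size. With unequal mini-batch sizes a weighted average would be needed.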
Note
The model must be both instantiated and called from within an IPUStrategy
context.
See https://www.tensorflow.org/guide/keras/train_and_evaluate for more background.
4.3.1. Model class
An IPU port of the standard Keras Model is available as
tensorflow.python.ipu.keras.Model.
This is a substitute for the standard Keras Model class, using only a single
IPU for training. Unlike the standard Keras Model class, it cannot
be called directly. You must use the fit(), evaluate() and predict()
methods for training, evaluation and making predictions.
For a high-performance, multi-IPU solution use the PipelineModel class.
4.3.2. Sequential class
An implementation of the Keras Sequential class is available as
tensorflow.python.ipu.keras.Sequential.
This is a substitute for the standard Keras Sequential class, using only a
single IPU for training. For a high-performance, multi-IPU solution use
the PipelineSequential class.
Unlike the standard Keras Model class, it cannot be called directly. You must
use the fit(), evaluate() and predict() methods for training, evaluation and
making predictions.
Similarly, you cannot get the list of trainable variables before you have
executed the model.
4.3.3. PipelineModel class
PipelineModel is an alternative to the Keras Model class, with support for
multi-device IPU pipelines. Using pipelined execution allows the IPU to achieve
high compute efficiency while utilising multiple devices.
The PipelineModel has the same API as the standard Keras Model class,
but will train the model on multiple IPUs and stream the data into the devices
using an Infeed queue which is created automatically.
When defining a model for use with PipelineModel, the pipeline stage at
which a Layer is to be executed is given by the PipelineModel context in which
it is called.
In a machine learning model, a “step” is often considered to be one pass through the model, in which the forward pass is done, the gradients are calculated and then the parameters are updated. Since a pipeline accumulates multiple gradients before applying them collectively to the parameters, we call each of those pipeline operations a “step”. So the number of data samples processed per step is equal to the batch size multiplied by the pipeline depth.
This will be reflected in the rate at which the progress bar advances, and the entries in the Keras history.
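As a worked example of this definition of a step, the sketch below computes the number of samples consumed per step and the resulting number of steps per epoch. The batch size, pipeline depth and dataset size are hypothetical values chosen for illustration, the function names are not part of any API, and dropping a final partial step is an assumption rather than documented behaviour.

```python
# Arithmetic sketch of the "step" definition for a pipelined model.
# All numbers and function names are hypothetical.

def samples_per_step(batch_size, pipeline_depth):
    """Samples consumed by one pipeline step: batch size times pipeline depth."""
    return batch_size * pipeline_depth

def steps_per_epoch(dataset_size, batch_size, pipeline_depth):
    """Whole pipeline steps in one pass over the data.

    Assumption: a final partial step is dropped.
    """
    return dataset_size // samples_per_step(batch_size, pipeline_depth)

per_step = samples_per_step(batch_size=32, pipeline_depth=8)
steps = steps_per_epoch(dataset_size=60000, batch_size=32, pipeline_depth=8)

print(per_step)  # 256 samples per step
print(steps)     # 234 steps per epoch
```

With these values the progress bar would advance once per 256 samples rather than once per 32, which is why a pipelined model reports fewer steps than a non-pipelined model trained on the same data.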
Like the Sequential class, PipelineModel also supports automatic
data-parallelism.
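Data-parallel training means each replica computes gradients on its own share of the data, and the gradients are then reduced across replicas before the parameters are updated. The sketch below shows that reduction in pure Python. It is a conceptual illustration only: the function name and gradient values are hypothetical, and it assumes the reduction is a mean, which the text above does not specify.

```python
# Conceptual sketch of reducing gradients across data-parallel replicas.
# Not IPU code. Assumption: the reduction is an element-wise mean.

def reduce_across_replicas(per_replica_grads):
    """Average each parameter's gradient over all replicas."""
    num_replicas = len(per_replica_grads)
    return [sum(values) / num_replicas for values in zip(*per_replica_grads)]

# One list of gradients per replica, one entry per model parameter.
replica_grads = [
    [0.5, -1.0],  # replica 0
    [1.5, -3.0],  # replica 1
]

reduced = reduce_across_replicas(replica_grads)
print(reduced)  # [1.0, -2.0]
```

Every replica then applies the same reduced gradients, so the copies of the model stay identical after each update.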
4.3.4. PipelineSequential class
PipelineSequential is the pipelined equivalent of the Keras Sequential class,
in the same way that PipelineModel is the pipelined equivalent of the Keras
Model class.
The constructor for the standard Keras Sequential model takes a list of layers.
PipelineSequential instead takes a list of lists of layers, where each list of
layers is assigned to an IPU pipeline stage. See TensorFlow 2 examples to
see how the API is used.
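The list-of-lists structure can be sketched without the IPU libraries. In the example below, plain strings stand in for Keras Layer objects, and the layer names and the helper stage_of_each_layer are hypothetical. It assumes that the position of each inner list determines its pipeline stage number, which is an illustration of the idea rather than a statement about the actual constructor.

```python
# Sketch of the nested list passed to a pipelined sequential model.
# Strings stand in for Keras Layer objects; this is not IPU code.

stages = [
    ["flatten", "dense_256"],  # layers for pipeline stage 0
    ["dense_128"],             # layers for pipeline stage 1
    ["dense_10"],              # layers for pipeline stage 2
]

def stage_of_each_layer(stages):
    """Map each layer to the index of the inner list that contains it."""
    return {
        layer: index
        for index, group in enumerate(stages)
        for layer in group
    }

assignment = stage_of_each_layer(stages)
print(assignment)
```

Flattening the nested list in order recovers the layer list that a standard Sequential model would take, so the outer list only adds the grouping into stages.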
Like the Sequential class, PipelineSequential also supports
automatic data-parallelism.
4.3.5. Custom training loops
If a more sophisticated training loop is required, then it can be described
inside a function which is marked as a @tf.function. See TensorFlow 2 examples
for an example.
The outer training function should be called using the experimental_run_v2
method on the IPUStrategy object, to ensure that it is executed using the
strategy’s configuration.
Note
It is not possible to use either PipelineModel or PipelineSequential in a
custom training loop.
For more information on the @tf.function annotation, see the
TensorFlow function documentation.