3. Support for TensorFlow 2

In TensorFlow version 2, eager mode is enabled by default and Keras is the main API for constructing models. Distribution strategies are the new way of targeting different pieces of hardware.

3.1. IPUStrategy

tf.distribute.Strategy is an API for distributing training across multiple devices. IPUStrategy is a subclass that targets a single system with one or more IPUs attached. Another subclass, IPUMultiWorkerStrategyV1, targets a distributed system with multiple machines (workers). For more information, see the distributed training section.

Use the strategy.scope() context to ensure that everything within that context will be compiled for the IPU device. You should do this instead of using the tf.device context:

from tensorflow.python import ipu

# Create an IPU distribution strategy
strategy = ipu.ipu_strategy.IPUStrategy()

with strategy.scope():
  ...

It is important to construct a Keras model within the scope of the IPUStrategy, because Keras may create some parts of the model at construction time, and some other parts at execution time.
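The deferred creation described above can be seen with stock TensorFlow. The following is a minimal sketch, not IPU-specific code: it assumes the default strategy returned by tf.distribute.get_strategy() as a stand-in so that it runs without IPU hardware. On an IPU system you would use ipu.ipu_strategy.IPUStrategy() as shown earlier. It shows that a Keras layer owns no variables when it is constructed and only creates them on its first call, which is why both steps should happen inside the scope.

```python
import tensorflow as tf

# Assumption: the default (no-op) strategy stands in for IPUStrategy so the
# sketch runs on any machine. Replace with ipu.ipu_strategy.IPUStrategy()
# on a system with IPUs attached.
strategy = tf.distribute.get_strategy()

with strategy.scope():
  # Constructing the layer does not create its variables yet.
  layer = tf.keras.layers.Dense(4)
  weights_at_construction = len(layer.weights)

  # The kernel and bias are created on the first call, once the input
  # shape is known.
  outputs = layer(tf.zeros([2, 8]))
  weights_after_first_call = len(layer.weights)

print(weights_at_construction, weights_after_first_call)  # 0 2
```

Because the variables appear at two different times, keeping both construction and the first call inside the same scope avoids creating part of the model outside the strategy.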

See the TensorFlow documentation for more detail on distribution strategies.

3.2. Execution modes

TensorFlow operations can be executed in either graph mode or eager mode. Both modes are supported on IPUs, but graph mode is much more efficient. It is therefore important to understand the difference between them, and to know how to write TensorFlow programs that fully utilize the IPU devices.
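As a rough illustration of the difference between the two modes, the sketch below uses stock TensorFlow only (no IPU configuration, and no jit_compile argument, so that it runs on any machine). The name modes is a hypothetical helper list used to record whether the Python body of the function runs eagerly or is being traced into a graph.

```python
import tensorflow as tf

a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Hypothetical helper: records tf.executing_eagerly() each time the Python
# body of matmul_fn actually runs.
modes = []


def matmul_fn(x, y):
  modes.append(tf.executing_eagerly())
  return tf.matmul(x, y)


# Eager mode: the Python body runs and the op executes immediately.
eager_result = matmul_fn(a, b)

# Graph mode: the Python body runs once to trace a graph, then the graph
# is executed. A second call with the same input shapes reuses the trace,
# so the Python body does not run again.
graph_fn = tf.function(matmul_fn)
graph_result = graph_fn(a, b)
graph_fn(a, b)

print(modes)                 # [True, False]
print(graph_result.numpy())  # [[22. 28.] [49. 64.]]
```

Both calls return the same values. The difference is in how the work is dispatched: one operation at a time in eager mode, or as a single traced unit in graph mode.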

3.2.1. Graph mode with @tf.function

The TensorFlow function annotation @tf.function converts the body of the annotated function into a fused set of operations that are executed as a group, in the same way as a whole graph would have been in TensorFlow version 1. In addition, a library called autograph will convert Python control flow constructs into TensorFlow graph operations.
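The control-flow conversion can be observed with stock TensorFlow. This is a minimal sketch under the assumption that plain tf.function (without jit_compile and without an IPU) is enough to show the behaviour. The function name sum_below is hypothetical. tf.autograph.to_code returns the source that AutoGraph generates, which you can print to see how the Python for loop has been rewritten.

```python
import tensorflow as tf


def sum_below(n):
  """Adds the integers 0, 1, ..., n - 1 using a Python for loop."""
  total = tf.constant(0)
  # Looping over tf.range lets AutoGraph turn this into a graph loop
  # rather than unrolling it in Python.
  for i in tf.range(n):
    total += i
  return total


sum_below_graph = tf.function(sum_below)
result = sum_below_graph(tf.constant(5))
print(int(result))  # 0 + 1 + 2 + 3 + 4 = 10

# Show the code that AutoGraph generates for the function.
generated = tf.autograph.to_code(sum_below)
print(generated)
```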

It is best practice to place anything that is intended to be executed on the IPU into a Python function annotated with @tf.function(jit_compile=True). Note that this does not apply to constructing a Keras model or using the Keras Model.fit() API. See Section 19, Keras with IPUs, for details on Keras.

When calling a function which is marked with a @tf.function(jit_compile=True) annotation from within a distribution strategy such as IPUStrategy, you should not call it directly, but instead use the run method. For example:

import tensorflow as tf
from tensorflow.python import ipu

# Configure the IPU device.
config = ipu.config.IPUConfig()
config.auto_select_ipus = 1
# Apply the configuration to the IPU system.
config.configure_ipu_system()

a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])


# Compile the function body as a single graph for the IPU.
@tf.function(jit_compile=True)
def matmul_fn(x, y):
  z = tf.matmul(x, y)
  return z


strategy = ipu.ipu_strategy.IPUStrategy()
with strategy.scope():
  # Call the function through the strategy rather than directly.
  c = strategy.run(matmul_fn, args=(a, b))


When using the @tf.function annotation, it is important to set the jit_compile=True argument to ensure best performance.

For more information about tf.function and examples, see the TensorFlow documentation.

3.2.2. Eager mode

Eager mode is the default execution mode for TensorFlow operations. This mode is supported on IPUs, but it does not perform as well as graph mode and we do not recommend using it.

For example, the code below executes the tf.matmul immediately on an IPU device and returns a tf.Tensor object containing the result:

import tensorflow as tf
from tensorflow.python import ipu

# Configure the IPU device.
config = ipu.config.IPUConfig()
config.auto_select_ipus = 1
# Apply the configuration to the IPU system.
config.configure_ipu_system()

a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Execute the matmul eagerly on the first IPU.
with tf.device('/device:IPU:0'):
  c = tf.matmul(a, b)

3.3. On-device loops

In Section 19, Keras with IPUs, we describe how to use Keras to perform training, testing and prediction. However, sometimes a more sophisticated loop is required. You can train, test and run inference on your models using a loop created inside a tf.function. This is commonly known as an on-device loop.

By executing multiple steps of the model with an on-device loop, you can improve the performance of your model. This is achieved by creating a for loop using tf.range inside tf.function; AutoGraph will convert this to a tf.while_loop for you.

For example, the code below creates a custom training loop using an on-device loop to train a simple model. It uses the syntactical shorthand for infeed creation and defines an iterator over the outfeed, as described in Section 6.2, Optional simplification of infeeds and outfeeds.

import tensorflow as tf
from tensorflow.python import ipu
from tensorflow.python.keras.datasets import mnist

# Configure the IPU device.
config = ipu.config.IPUConfig()
config.auto_select_ipus = 1
# Apply the configuration to the IPU system.
config.configure_ipu_system()


# Create a simple model.
def create_model():
  return tf.keras.Sequential([
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(256, activation='relu'),
      tf.keras.layers.Dense(128, activation='relu'),
      tf.keras.layers.Dense(10)
  ])


# Create a dataset for the model.
def create_dataset():
  (x_train, y_train), (_, _) = mnist.load_data()
  x_train = x_train / 255.0

  train_ds = tf.data.Dataset.from_tensor_slices(
      (x_train, y_train)).shuffle(10000).batch(32, drop_remainder=True)
  train_ds = train_ds.map(lambda d, l:
                          (tf.cast(d, tf.float32), tf.cast(l, tf.int32)))

  return train_ds.repeat().prefetch(16)


# Define a function which performs a single training step of a model.
def training_step(features, labels, model, optimizer):
  # Execute the model and calculate the loss.
  with tf.GradientTape() as tape:
    predictions = model(features, training=True)
    prediction_loss = tf.keras.losses.sparse_categorical_crossentropy(
        labels, predictions)
    loss = tf.reduce_mean(prediction_loss)

  # Apply the gradients.
  grads = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(grads, model.trainable_variables))
  return loss


# Create a loop which performs ``steps_per_execution`` iterations of
# ``training_step`` every time this function is executed.
@tf.function(jit_compile=True)
def training_loop(iterator, steps_per_execution, outfeed, model, optimizer):
  # Create an on device loop.
  for _ in tf.range(steps_per_execution):
    # Get the next input.
    features, labels = next(iterator)

    # Perform the training step.
    loss = training_step(features, labels, model, optimizer)

    # Enqueue the loss after each step to the outfeed queue. This is then read
    # back on the host for monitoring the model performance.
    outfeed.enqueue(loss)


# Create a strategy for execution on the IPU.
strategy = ipu.ipu_strategy.IPUStrategy()
with strategy.scope():
  # Create a Keras model.
  model = create_model()

  # Create an optimizer.
  opt = tf.keras.optimizers.SGD(0.01)

  # Create an iterator inside the strategy for the dataset the model will be
  # trained on.
  iterator = iter(create_dataset())

  # Create an IPUOutfeedQueue to collect results from each step.
  outfeed_queue = ipu.ipu_outfeed_queue.IPUOutfeedQueue()

  # Total number of steps (batches) to run.
  total_steps = 100

  # How many steps (batches) to execute each time the device executes.
  steps_per_execution = 10

  for begin_step in range(0, total_steps, steps_per_execution):
    # Run the training loop.
    strategy.run(training_loop,
                 args=(iterator, steps_per_execution, outfeed_queue, model,
                       opt))
    # Calculate the mean loss.
    mean_loss = sum(outfeed_queue) / steps_per_execution
    print(f"Current step: {begin_step}, training loss: {mean_loss}")
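The relationship between total_steps and steps_per_execution in the host loop above can be checked with plain Python. The sketch below is only an illustration of the arithmetic, not part of the IPU API: plan_executions is a hypothetical helper that mirrors the range(0, total_steps, steps_per_execution) loop and assumes, as in the example, that every device execution runs exactly steps_per_execution training steps.

```python
def plan_executions(total_steps, steps_per_execution):
  """Returns (host_calls, steps_run) for the host-side loop."""
  # One host-to-device call per iteration of the host loop.
  host_calls = len(range(0, total_steps, steps_per_execution))
  # Each call always runs the full on-device loop.
  steps_run = host_calls * steps_per_execution
  return host_calls, steps_run


print(plan_executions(100, 10))  # (10, 100)
print(plan_executions(100, 1))   # (100, 100)
print(plan_executions(105, 10))  # (11, 110)
```

With the values used in the example, the device is invoked 10 times instead of 100. The last case shows that when total_steps is not a multiple of steps_per_execution, the loop as written runs more steps than requested, so it is simplest to choose values that divide evenly.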
