3. Support for TensorFlow 2

In TensorFlow version 2, eager mode is enabled by default and Keras is the main API for constructing models. Distribution strategies are the new way of targeting different pieces of hardware.

3.1. IPUStrategy

The tf.distribute.Strategy is an API to distribute training across multiple devices. IPUStrategy is a subclass which targets a single system with one or more IPUs attached. Another subclass, IPUMultiWorkerStrategyV1, targets a distributed system with multiple machines (workers). For more information, see the distributed training section.

Use the strategy.scope() context to ensure that everything within that context will be compiled for the IPU device. You should do this instead of using the tf.device context:

from tensorflow.python import ipu

# Create an IPU distribution strategy
strategy = ipu.ipu_strategy.IPUStrategy()

with strategy.scope():
    ...

Note

It is important to construct a Keras model within the scope of the IPUStrategy, because Keras may create some parts of the model at construction time, and some other parts at execution time.
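For example, a minimal sketch of constructing a Keras model inside the strategy scope (the layer sizes here are arbitrary and purely illustrative):

import tensorflow as tf
from tensorflow.python import ipu

# Configure the IPU device.
config = ipu.config.IPUConfig()
config.auto_select_ipus = 1
config.configure_ipu_system()

# Create an IPU distribution strategy.
strategy = ipu.ipu_strategy.IPUStrategy()

with strategy.scope():
  # Construct the model inside the scope so that the parts Keras creates at
  # construction time, as well as those created at execution time, target
  # the IPU.
  model = tf.keras.Sequential([
      tf.keras.layers.Dense(64, activation='relu'),
      tf.keras.layers.Dense(10)
  ])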

See the TensorFlow documentation for more details on distribution strategies: https://www.tensorflow.org/guide/distributed_training

3.2. Execution modes

TensorFlow operations can be executed in either graph mode or eager mode. Both modes are supported on IPUs; however, graph mode is much more efficient. It is therefore important to understand the difference between them and how to write TensorFlow programs that fully utilize the IPU devices.

3.2.1. Graph mode with @tf.function

The TensorFlow function annotation @tf.function converts the body of the annotated function into a fused set of operations that are executed as a group, in the same way as a whole graph would have been in TensorFlow version 1. In addition, a library called autograph will convert Python control flow constructs into TensorFlow graph operations.
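As a brief sketch of what autograph does (the function below is purely illustrative and is not part of the IPU API), a Python for loop written inside an annotated function is converted into a tf.while_loop graph operation:

import tensorflow as tf


@tf.function(experimental_compile=True)
def sum_of_squares(n):
  total = tf.constant(0.0)
  # AutoGraph converts this Python for loop over tf.range into a
  # tf.while_loop graph operation.
  for i in tf.range(n):
    total += tf.cast(i, tf.float32) ** 2
  return total

On the IPU, such a function would be called through the strategy's run method, as described below.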

It is best practice to ensure that anything which is intended to be executed on the IPU is placed into a Python function which is annotated with @tf.function(experimental_compile=True). Note that this does not apply to constructing a Keras model or using the Keras Model.fit() API. See the Keras with IPUs section for details on Keras.

When calling a function marked with the @tf.function(experimental_compile=True) annotation from within a distribution strategy such as IPUStrategy, you should not call it directly, but instead use the strategy's run method. For example:

import tensorflow as tf
from tensorflow.python import ipu

# Configure the IPU device.
config = ipu.config.IPUConfig()
config.auto_select_ipus = 1
config.configure_ipu_system()

a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])


@tf.function(experimental_compile=True)
def matmul_fn(x, y):
  z = tf.matmul(x, y)
  return z


strategy = ipu.ipu_strategy.IPUStrategy()
with strategy.scope():
  c = strategy.run(matmul_fn, args=(a, b))
print(c)

Note

When using the @tf.function annotation, it is important to set the experimental_compile=True argument to ensure best performance.

For more information about tf.function and examples, see the TensorFlow documentation at https://www.tensorflow.org/guide/function.

3.2.2. Eager mode

Eager mode is the default execution mode for TensorFlow operations. This mode is supported on IPUs; however, it is not as performant as graph mode and we do not recommend using it.

For example, the code below executes the tf.matmul immediately on an IPU device and returns a tf.Tensor object containing the result:

import tensorflow as tf
from tensorflow.python import ipu

# Configure the IPU device.
config = ipu.config.IPUConfig()
config.auto_select_ipus = 1
config.configure_ipu_system()

a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
with tf.device('/device:IPU:0'):
  c = tf.matmul(a, b)

print(c)

3.3. On-device loops

In the Keras with IPUs section, we describe how to use Keras to perform training, testing and prediction. However, sometimes a more sophisticated loop is required. You can train, test and run inference on your models using a loop created inside a tf.function; this is commonly known as an on-device loop.

Executing multiple steps of the model in an on-device loop improves performance because control does not need to return to the host after every step. To create one, write a for loop using tf.range inside the tf.function; AutoGraph will convert this to a tf.while_loop for you.
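As a minimal sketch of this pattern (the computation inside the loop and the step count are arbitrary placeholders), an on-device loop that enqueues one result per iteration to an outfeed queue might look like this:

import tensorflow as tf
from tensorflow.python import ipu

# Configure the IPU device.
config = ipu.config.IPUConfig()
config.auto_select_ipus = 1
config.configure_ipu_system()


@tf.function(experimental_compile=True)
def device_loop(steps, outfeed):
  # AutoGraph converts this for loop over tf.range into a tf.while_loop, so
  # all ``steps`` iterations run in a single execution on the IPU.
  for i in tf.range(steps):
    outfeed.enqueue(tf.cast(i, tf.float32) * 2.0)


strategy = ipu.ipu_strategy.IPUStrategy()
with strategy.scope():
  # Create an IPUOutfeedQueue to collect a result from each iteration.
  outfeed_queue = ipu.ipu_outfeed_queue.IPUOutfeedQueue()

  strategy.run(device_loop, args=(10, outfeed_queue))

  # Dequeue the results on the host.
  print(list(outfeed_queue))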

For example, the code below creates a custom training loop using an on-device loop to train a simple model:

import tensorflow as tf
from tensorflow.python import ipu

# Configure the IPU device.
config = ipu.config.IPUConfig()
config.auto_select_ipus = 1
config.configure_ipu_system()


# Create a simple model.
def create_model():
  return tf.keras.Sequential([
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(256, activation='relu'),
      tf.keras.layers.Dense(128, activation='relu'),
      tf.keras.layers.Dense(10)
  ])


# Create a dataset for the model.
def create_dataset():
  mnist = tf.keras.datasets.mnist

  (x_train, y_train), (_, _) = mnist.load_data()
  x_train = x_train / 255.0

  train_ds = tf.data.Dataset.from_tensor_slices(
      (x_train, y_train)).shuffle(10000).batch(32, drop_remainder=True)
  train_ds = train_ds.map(lambda d, l:
                          (tf.cast(d, tf.float32), tf.cast(l, tf.int32)))

  return train_ds.repeat().prefetch(16)


# Define a function which performs a single training step of a model.
def training_step(features, labels, model, optimizer):
  # Execute the model and calculate the loss.
  with tf.GradientTape() as tape:
    predictions = model(features, training=True)
    prediction_loss = tf.keras.losses.sparse_categorical_crossentropy(
        labels, predictions)
    loss = tf.reduce_mean(prediction_loss)

  # Apply the gradients.
  grads = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(grads, model.trainable_variables))
  return loss


# Create a loop which performs ``steps_per_execution`` iterations of
# ``training_step`` every time this function is executed.
@tf.function(experimental_compile=True)
def training_loop(iterator, steps_per_execution, outfeed, model, optimizer):
  # Create an on device loop.
  for _ in tf.range(steps_per_execution):
    # Get the next input.
    features, labels = next(iterator)

    # Perform the training step.
    loss = training_step(features, labels, model, optimizer)

    # Enqueue the loss after each step to the outfeed queue. This is then read
    # back on the host for monitoring the model performance.
    outfeed.enqueue(loss)


# Create a strategy for execution on the IPU.
strategy = ipu.ipu_strategy.IPUStrategy()
with strategy.scope():
  # Create a Keras model.
  model = create_model()

  # Create an optimizer.
  opt = tf.keras.optimizers.SGD(0.01)

  # Create an iterator inside the strategy for the dataset the model will be
  # trained on.
  iterator = iter(create_dataset())

  # Create an IPUOutfeedQueue to collect results from each step.
  outfeed_queue = ipu.ipu_outfeed_queue.IPUOutfeedQueue()

  # Total number of steps (batches) to run.
  total_steps = 100

  # How many steps (batches) to execute each time the device executes.
  steps_per_execution = 10

  for begin_step in range(0, total_steps, steps_per_execution):
    # Run the training loop.
    strategy.run(training_loop,
                 args=(iterator, steps_per_execution, outfeed_queue, model,
                       opt))
    # Calculate the mean loss.
    mean_loss = sum(outfeed_queue) / steps_per_execution
    print(f"Current step: {begin_step}, training loss: {mean_loss}")