3. Support for TensorFlow 2
In TensorFlow version 2, eager mode is enabled by default and Keras is the main API for constructing models. Distribution strategies are the new way of targeting different pieces of hardware.
3.1. IPUStrategy
The tf.distribute.Strategy is an API to distribute training across multiple devices. IPUStrategy is a subclass which targets a single system with one or more IPUs attached. Another subclass, IPUMultiWorkerStrategyV1, targets a distributed system with multiple machines (workers). For more information, see the distributed training section.
Use the strategy.scope() context to ensure that everything within that context will be compiled for the IPU device. You should do this instead of using the tf.device context:
from tensorflow.python import ipu
# Create an IPU distribution strategy
strategy = ipu.ipu_strategy.IPUStrategy()
with strategy.scope():
...
Note
It is important to construct a Keras model within the scope of the IPUStrategy, because Keras may create some parts of the model at construction time, and some other parts at execution time.
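For illustration, the sketch below constructs a small Sequential model entirely inside strategy.scope(); the layer sizes are arbitrary and chosen only for illustration:
import tensorflow as tf
from tensorflow.python import ipu

# Configure the IPU device.
config = ipu.config.IPUConfig()
config.auto_select_ipus = 1
config.configure_ipu_system()

# Create an IPU distribution strategy.
strategy = ipu.ipu_strategy.IPUStrategy()
with strategy.scope():
  # Construct the model inside the strategy scope so that all of its parts,
  # whether created now or at execution time, target the IPU device.
  model = tf.keras.Sequential([
      tf.keras.layers.Dense(16, activation='relu'),
      tf.keras.layers.Dense(1)
  ])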
See the TensorFlow documentation for more detail on distribution strategies.
3.2. Execution modes
TensorFlow operations can be executed in either graph mode or eager mode. Both of these modes are supported on IPUs; however, graph mode is much more efficient. It is therefore important to understand the difference between them and how to write TensorFlow programs which will fully utilize the IPU devices.
3.2.1. Graph mode with @tf.function
The TensorFlow function annotation @tf.function converts the body of the annotated function into a fused set of operations that are executed as a group, in the same way as a whole graph would have been in TensorFlow version 1. In addition, a library called autograph will convert Python control flow constructs into TensorFlow graph operations.
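As a minimal sketch of this conversion (relu_like is a made-up function used only for illustration), a data-dependent Python if inside a @tf.function becomes a tf.cond in the compiled graph:
import tensorflow as tf


@tf.function(jit_compile=True)
def relu_like(x):
  # AutoGraph converts this data-dependent Python ``if`` into a ``tf.cond``
  # so the whole body can be compiled as a single graph.
  if tf.reduce_sum(x) > 0:
    y = x
  else:
    y = tf.zeros_like(x)
  return y
As described in the next paragraph, a function like this should be called through the run method of an IPUStrategy rather than directly.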
It is best practice to ensure that anything which is intended to be executed on the IPU is placed into a Python function which is annotated with @tf.function(jit_compile=True). Note that this does not apply to constructing a Keras model or using the Keras Model.fit() API. See Section 19, Keras with IPUs for details on Keras.
When calling a function which is marked with a @tf.function(jit_compile=True) annotation from within a distribution strategy such as IPUStrategy, you should not call it directly, but instead use the run method. For example:
import tensorflow as tf
from tensorflow.python import ipu

# Configure the IPU device.
config = ipu.config.IPUConfig()
config.auto_select_ipus = 1
config.configure_ipu_system()

a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])


@tf.function(jit_compile=True)
def matmul_fn(x, y):
  z = tf.matmul(x, y)
  return z


strategy = ipu.ipu_strategy.IPUStrategy()
with strategy.scope():
  c = strategy.run(matmul_fn, args=(a, b))
print(c)
Note
When using the @tf.function annotation, it is important to set the jit_compile=True argument to ensure best performance.
For more information about tf.function and examples, see the TensorFlow documentation.
3.2.2. Eager mode
Eager mode is the default execution mode for TensorFlow operations. This mode is supported on IPUs; however, it is not as performant as graph mode and we do not recommend using it.
For example, the code below executes the tf.matmul immediately on an IPU device and returns a tf.Tensor object containing the result:
import tensorflow as tf
from tensorflow.python import ipu

# Configure the IPU device.
config = ipu.config.IPUConfig()
config.auto_select_ipus = 1
config.configure_ipu_system()

a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
with tf.device('/device:IPU:0'):
  c = tf.matmul(a, b)

print(c)
3.3. On-device loops
In Section 19, Keras with IPUs, we describe how to use Keras to perform training, testing and prediction. However, sometimes a more sophisticated loop is required. You can create loops to train, test and run inference on your models inside a tf.function; this is commonly known as an on-device loop.
By executing multiple steps of the model with an on-device loop, you can improve the performance of your model. This is achieved by creating a for loop using tf.range inside the tf.function; AutoGraph will convert this to a tf.while_loop for you.
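As a minimal sketch of the pattern (sum_of_squares is a made-up function used only for illustration), the loop below runs entirely on the device and returns a single result to the host:
import tensorflow as tf
from tensorflow.python import ipu

# Configure the IPU device.
config = ipu.config.IPUConfig()
config.auto_select_ipus = 1
config.configure_ipu_system()


@tf.function(jit_compile=True)
def sum_of_squares(n):
  # AutoGraph converts this Python ``for`` over ``tf.range`` into a
  # ``tf.while_loop`` which executes on the IPU without returning to the host.
  total = tf.constant(0.0)
  for i in tf.range(n):
    total += tf.cast(i, tf.float32) ** 2
  return total


strategy = ipu.ipu_strategy.IPUStrategy()
with strategy.scope():
  result = strategy.run(sum_of_squares, args=(tf.constant(10),))
print(result)
In this sketch only the final value is returned to the host; the training example that follows instead enqueues a result on every iteration through an outfeed queue.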
For example, the code below creates a custom training loop using an on-device loop to train a simple model. It uses the syntactical shorthand for infeed creation and defines an iterator over the outfeed, as described in Section 6.2, Optional simplification of infeeds and outfeeds.
import tensorflow as tf
from tensorflow.python import ipu
from tensorflow.python.keras.datasets import mnist

# Configure the IPU device.
config = ipu.config.IPUConfig()
config.auto_select_ipus = 1
config.configure_ipu_system()


# Create a simple model.
def create_model():
  return tf.keras.Sequential([
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(256, activation='relu'),
      tf.keras.layers.Dense(128, activation='relu'),
      tf.keras.layers.Dense(10)
  ])


# Create a dataset for the model.
def create_dataset():
  (x_train, y_train), (_, _) = mnist.load_data()
  x_train = x_train / 255.0

  train_ds = tf.data.Dataset.from_tensor_slices(
      (x_train, y_train)).shuffle(10000).batch(32, drop_remainder=True)
  train_ds = train_ds.map(lambda d, l:
                          (tf.cast(d, tf.float32), tf.cast(l, tf.int32)))

  return train_ds.repeat().prefetch(16)


# Define a function which performs a single training step of a model.
def training_step(features, labels, model, optimizer):
  # Execute the model and calculate the loss.
  with tf.GradientTape() as tape:
    predictions = model(features, training=True)
    prediction_loss = tf.keras.losses.sparse_categorical_crossentropy(
        labels, predictions)
    loss = tf.reduce_mean(prediction_loss)

  # Apply the gradients.
  grads = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(grads, model.trainable_variables))
  return loss


# Create a loop which performs ``steps_per_execution`` iterations of
# ``training_step`` every time this function is executed.
@tf.function(jit_compile=True)
def training_loop(iterator, steps_per_execution, outfeed, model, optimizer):
  # Create an on device loop.
  for _ in tf.range(steps_per_execution):
    # Get the next input.
    features, labels = next(iterator)

    # Perform the training step.
    loss = training_step(features, labels, model, optimizer)

    # Enqueue the loss after each step to the outfeed queue. This is then read
    # back on the host for monitoring the model performance.
    outfeed.enqueue(loss)


# Create a strategy for execution on the IPU.
strategy = ipu.ipu_strategy.IPUStrategy()
with strategy.scope():
  # Create a Keras model.
  model = create_model()

  # Create an optimizer.
  opt = tf.keras.optimizers.SGD(0.01)

  # Create an iterator inside the strategy for the dataset the model will be
  # trained on.
  iterator = iter(create_dataset())

  # Create an IPUOutfeedQueue to collect results from each step.
  outfeed_queue = ipu.ipu_outfeed_queue.IPUOutfeedQueue()

  # Total number of steps (batches) to run.
  total_steps = 100

  # How many steps (batches) to execute each time the device executes.
  steps_per_execution = 10

  for begin_step in range(0, total_steps, steps_per_execution):
    # Run the training loop.
    strategy.run(training_loop,
                 args=(iterator, steps_per_execution, outfeed_queue, model,
                       opt))
    # Calculate the mean loss.
    mean_loss = sum(outfeed_queue) / steps_per_execution
    print(f"Current step: {begin_step}, training loss: {mean_loss}")
Download targeting_tf2_example3.py