4. Compiling and pre-compiling executables

4.1. Caching of compiled executables

Compiling a large TensorFlow graph into an executable suitable for the IPU can take a long time. To avoid recompiling the same graphs every time a TensorFlow process is started, you can enable an executable cache.

To enable the cache, use the option --executable_cache_path to specify a directory where the compiled executables for TensorFlow graphs will be stored. For example:

TF_POPLAR_FLAGS='--executable_cache_path=/tmp/cachedir'
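The flag is read from the environment when the IPU backend initializes. As a sketch (setting it from Python before TensorFlow initializes the backend, and the example path, are assumptions, not a requirement of the API), this could look like:

```python
import os

# Example cache directory; any writable path works. This must be set
# before TensorFlow initializes the Poplar backend.
os.environ["TF_POPLAR_FLAGS"] = "--executable_cache_path=/tmp/cachedir"
```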

An executable binary file with the extension .poplar_exec will be saved for each XLA/Poplar graph required to execute a TensorFlow graph.

The cache does not manage the files within the directory; it is your responsibility to delete old files. No index is kept of the files, so they can be deleted without risk.
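A minimal housekeeping sketch (the cache path and the stand-in file name are illustrative) that clears the cache directory:

```python
import pathlib

cache_dir = pathlib.Path("/tmp/cachedir")  # example cache location
cache_dir.mkdir(parents=True, exist_ok=True)
# Stand-in for an executable written by TensorFlow.
(cache_dir / "example.poplar_exec").touch()

# Deleting cache files is always safe: there is no index to corrupt,
# and TensorFlow simply recompiles on the next cache miss.
for cached in cache_dir.glob("*.poplar_exec"):
    cached.unlink()
```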

4.2. Pre-compiling executables

If you are using a machine that is not attached to any IPU devices but would still like to pre-compile your TensorFlow graphs, you can do so by enabling the pre-compile mode. In this mode your TensorFlow program is traced as if it were executing on IPU devices, to identify which programs need to be compiled and which tf.Variables are used.

During tracing in the pre-compile mode, your TensorFlow program is executed as if it were attached to IPU devices; however, any numerical results returned are set to zero. This means that if any operations in your TensorFlow program are executed conditionally, depending on a previous output, they might not be pre-compiled.
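This caveat can be illustrated with a plain-Python sketch (run_model and its return values are hypothetical, not part of the IPU API): because all results are zero during tracing, a host-side branch gated on a non-zero result is never taken, so the graph behind it is never seen by the compiler.

```python
def run_model(precompile_mode):
    # Under pre-compile mode, numerical results returned to the host
    # are all zeros rather than real computed values.
    loss = 0.0 if precompile_mode else 1.5

    # Host-side control flow conditioned on a device result: the branch
    # below is only traced (and hence only pre-compiled) if it is taken.
    if loss > 1.0:
        return "extra_graph_compiled"
    return "main_graph_only"

# During pre-compile tracing the extra graph is never reached:
print(run_model(precompile_mode=True))   # main_graph_only
```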

To enable the pre-compile mode, use the option --executable_cache_path to specify a directory where the compiled executables for TensorFlow graphs will be placed. For example:

TF_POPLAR_FLAGS='--executable_cache_path=/tmp/executables'

Then in your TensorFlow program you need to modify your IPU system configuration to use the pre-compile mode. For example:

from tensorflow.python.ipu import ipu_compiler
from tensorflow.python.ipu import ipu_infeed_queue
from tensorflow.python.ipu import ipu_outfeed_queue
from tensorflow.python.ipu import loops
from tensorflow.python.ipu import nn_ops
from tensorflow.python.ipu import normalization_ops
from tensorflow.python.ipu import scopes
from tensorflow.python.ipu import utils
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# Create a configuration for a single IPU.
cfg = utils.create_ipu_config()
cfg = utils.auto_select_ipus(cfg, num_ipus=1)

# Enable the pre-compile mode for IPU version 2 with remote buffers enabled.
cfg = utils.set_ipu_connection_type(
    cfg,
    connection_type=utils.DeviceConnectionType.PRE_COMPILE,
    ipu_version=2,
    enable_remote_buffers=True)

utils.configure_ipu_system(cfg)

# The dataset for feeding the graphs.
ds = tf.data.Dataset.from_tensors(tf.constant(1.0, shape=[64, 64]))
ds = ds.repeat()

# The host-side queues.
infeed_queue = ipu_infeed_queue.IPUInfeedQueue(ds, feed_name="infeed")
outfeed_queue = ipu_outfeed_queue.IPUOutfeedQueue(feed_name="outfeed")


# The device-side main function.
def body(x):
  w1 = tf.get_variable(
      "w1",
      shape=[64, 64],
      initializer=tf.glorot_uniform_initializer(dtype=tf.float32))
  w2 = tf.get_variable(
      "w2",
      shape=[64, 64],
      initializer=tf.glorot_uniform_initializer(dtype=tf.float32))

  def func(a, b):
    x = tf.matmul(a, b)
    x = normalization_ops.layer_norm(x)
    x = nn_ops.gelu(x)
    return x

  x = func(x, w1)
  x = func(x, w2)
  outfeed = outfeed_queue.enqueue(x)
  return outfeed


def my_net():
  r = loops.repeat(10, body, [], infeed_queue)
  return r


with scopes.ipu_scope('/device:IPU:0'):
  run_loop = ipu_compiler.compile(my_net, inputs=[])

# The outfeed dequeue has to happen after the outfeed enqueue.
dequeue_outfeed = outfeed_queue.dequeue()

with tf.Session() as sess:
  sess.run(infeed_queue.initializer)
  sess.run(tf.global_variables_initializer())
  sess.run(run_loop)
  print(sess.run(dequeue_outfeed))

In the example above, we create an IPU system configuration using the pre-compile mode for a single IPU device (IPU version 2) with remote buffers enabled; the rest of the program is unchanged.

Note

It is important to check whether the system you are pre-compiling for supports remote buffers, as these are required for features such as optimizer state offloading.

During the execution of the program, messages will be displayed with information about which executables have been compiled and where they have been saved. For example:

A pre-compiled Poplar program has been saved to /tmp/executables/277a08fe4c20b50.poplar_exec

Once your program has finished executing, copy all the executables to a machine with IPUs. On that machine, set --executable_cache_path to the directory the executables were copied to, then run your TensorFlow program (without enabling the pre-compile mode).
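As a sketch of this hand-off (the directory is the one used earlier, the file name is taken from the example log message above, and the assumption that the cache is keyed on the hashed file names, so they must be preserved when copying, is ours):

```python
import pathlib

exe_dir = pathlib.Path("/tmp/executables")
exe_dir.mkdir(parents=True, exist_ok=True)
# Stand-in for a file produced during pre-compilation (name from the
# example log message).
(exe_dir / "277a08fe4c20b50.poplar_exec").touch()

# Enumerate the executables to transfer (e.g. with scp or rsync) to the
# IPU machine; keep the file names intact, since the cache appears to be
# keyed on them.
to_copy = sorted(p.name for p in exe_dir.glob("*.poplar_exec"))
print(to_copy)
```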

4.2.1. Unsupported Operations

TensorFlow programs which contain the following cannot be pre-compiled:

  • Custom user operations for which is_hashable has not been set to True (see Metadata).

  • Programs containing tensorflow.python.ipu.scopes.outside_compilation_scope.