16. Exporting precompiled models for TensorFlow Serving

TensorFlow applications compiled for the IPU can be exported to standard TensorFlow SavedModel format and deployed to a TensorFlow Serving instance. The exported SavedModel contains the executable compiled for the IPU and a simple TensorFlow graph with IPU embedded application runtime operations which allow you to run the executable as part of the TensorFlow graph.

The Graphcore TensorFlow API for exporting models for TensorFlow Serving supports two different use cases:

  1. Models defined inside a function without using pipelining can be exported using the tensorflow.python.ipu.serving.export_single_step() function.

  2. Pipelined models defined as a list of functions can be exported using the tensorflow.python.ipu.serving.export_pipeline() function.

General notes about using the Graphcore TensorFlow API for exporting models with TensorFlow Serving:

  1. Since the exported SavedModel contains custom IPU embedded application runtime operations, it can be used only with the Graphcore distribution of TensorFlow Serving.

  2. The exported SavedModel cannot be loaded back into a TensorFlow script and used as a regular model because, in the export stage, the model is compiled into an IPU executable. The exported TensorFlow graph contains only IPU embedded application runtime operations and has no information about specific layers, and so on.

  3. TensorFlow and TensorFlow Serving versions must always match. This means you must use the same version of TensorFlow Serving as the version of TensorFlow that was used to export the model. Moreover, the Poplar versions used by the TensorFlow Serving instance and by the version of TensorFlow that exported the model must also match.

16.1. Exporting non-pipelined models defined inside a function

Exporting the forward pass of a non-pipelined model can be done with the tensorflow.python.ipu.serving.export_single_step() function. A function that defines the forward pass of the model is required as the first argument. Under the hood, export_single_step() wraps that function in a while loop optimized for the IPU, with the iterations parameter denoting the number of loop iterations. You can use this parameter to tune the model’s latency; its optimal value is use-case specific. The function also adds the infeed and outfeed queues, so you do not have to create them yourself. The model is then compiled into an executable and included as an asset in the SavedModel stored at the export_dir location.

To export such a model, the function’s input signature has to be defined. This can be accomplished in one of three ways:

All of the above methods are functionally equivalent; use whichever you find most convenient. A sketch of the decorator-based approach is shown below.
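
For illustration, the following is a minimal sketch of the decorator-based approach; the shape and dtype are placeholders chosen for this sketch only, and the full example in Section 16.1.1 instead passes the input_signature argument directly to export_single_step().

import tensorflow as tf

# Declare the input signature on the decorator so that export_single_step()
# can infer the shapes and dtypes of the model's inputs.
@tf.function(input_signature=[tf.TensorSpec(shape=(4,), dtype=tf.float32)])
def my_net(x):
  # Double the input - replace this with the application body.
  return x * 2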

You can also specify a variable_initializer function that performs an initialization of all the graph’s variables. This function takes a tf.Session instance as the only argument. The example below shows how it can be used for restoring values of variables from a checkpoint.

import tensorflow as tf
from tensorflow.python import ipu


def variable_initializer(session):
  saver = tf.train.Saver()
  # Place variable initialization on the host.
  ipu.utils.move_variable_initialization_to_cpu()
  init = tf.global_variables_initializer()
  session.run(init)
  # Overwrite the freshly initialized values with those from the checkpoint.
  saver.restore(session, 'path/to/checkpoint')
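
The initializer is then passed to the export function. A minimal sketch, reusing the names from the example in Section 16.1.1 and assuming the initializer is accepted through a variable_initializer keyword argument:

runtime_func = serving.export_single_step(
    my_net,
    saved_model_directory,
    iterations,
    input_signature,
    # Assumed keyword argument; runs during export to initialize the
    # graph's variables before the executable is compiled.
    variable_initializer=variable_initializer)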

16.1.1. Example of exporting a non-pipelined model defined inside a function

This example exports a very simple model whose embedded IPU program doubles the input tensor.

import os
import shutil

import numpy as np

import tensorflow as tf
from tensorflow.python.ipu import serving
from tensorflow.python.ipu import config

# Directory where SavedModel will be written.
saved_model_directory = './my_saved_model_ipu/001'
# Directory should be empty or should not exist.
if os.path.exists(saved_model_directory):
  shutil.rmtree(saved_model_directory)


# The function to export.
@tf.function
def my_net(x):
  # Double the input - replace this with application body.
  result = x * 2
  return result


# Configure the IPU for compilation.
cfg = config.IPUConfig()
cfg.auto_select_ipus = 1
cfg.device_connection.enable_remote_buffers = True
cfg.device_connection.type = config.DeviceConnectionType.ON_DEMAND
cfg.configure_ipu_system()

input_shape = (4,)
# Prepare the input signature.
input_signature = (tf.TensorSpec(shape=input_shape, dtype=np.float32),)
# Number of loop iterations - tweak this to tune the model's latency.
iterations = 16

# Export as a SavedModel.
runtime_func = serving.export_single_step(my_net, saved_model_directory,
                                          iterations, input_signature)
print(f"SavedModel written to {saved_model_directory}")

# You can test the exported executable using the returned `runtime_func`.
# This should print the even numbers 0 to 30.
input_placeholder = tf.placeholder(dtype=np.float32, shape=input_shape)
result_op = runtime_func(input_placeholder)

with tf.Session() as sess:
  for i in range(iterations):
    input_data = np.ones(input_shape, dtype=np.float32) * i
    print(sess.run(result_op, {input_placeholder: input_data}))

16.2. Exporting pipelined models defined as a list of functions

Exporting the forward pass of a pipelined model can be accomplished using the tensorflow.python.ipu.serving.export_pipeline() function.

The use of that function is very similar to the creation of a pipeline op using the tensorflow.python.ipu.pipelining_ops.pipeline() function. You have to provide a list of functions that represent the pipeline’s computational stages.

The tensorflow.python.ipu.serving.export_pipeline() function also has an iterations argument, which denotes the number of times each pipeline stage is executed before the pipeline is restarted. Again, you can use it to tune the model’s latency. This argument is sometimes called steps_per_execution.

Similarly to non-pipelined models, to export a pipelined model the input signature of the first computational stage has to be known. You can define it in the same three ways as for non-pipelined models. Note that for the first option, passing the input signature to the @tf.function decorator, you only need to do this for the first computational stage.
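
As a sketch of that decorator-based option (shapes and dtypes are placeholders chosen for illustration), only the first stage carries the signature:

import tensorflow as tf

# Only the first computational stage needs the input signature on its decorator.
@tf.function(input_signature=[tf.TensorSpec(shape=(4,), dtype=tf.float32)])
def stage1(x):
  return x * 2

# Later stages do not need an explicit signature.
@tf.function
def stage2(x):
  return x + 3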

For pipelined models you can also specify a variable_initializer function. It works exactly as it does for non-pipelined models.

16.2.1. Pipeline example

This example exports a simple pipelined IPU program that computes 2x + 3 on the input.

import os
import shutil

import numpy as np

import tensorflow as tf
from tensorflow.python.ipu import serving
from tensorflow.python.ipu import config

# Directory where SavedModel will be written.
saved_model_directory = './my_saved_model_ipu/002'
# Directory should be empty or should not exist.
if os.path.exists(saved_model_directory):
  shutil.rmtree(saved_model_directory)


# The pipeline's stages to export.
@tf.function
def stage1(x):
  # Double the input - replace this with 1st stage body.
  output = x * 2
  return output


@tf.function
def stage2(x):
  # Add 3 to the input - replace this with 2nd stage body.
  output = x + 3
  return output


# Configure the IPU for compilation.
cfg = config.IPUConfig()
cfg.auto_select_ipus = 2
cfg.device_connection.enable_remote_buffers = True
cfg.device_connection.type = config.DeviceConnectionType.ON_DEMAND
cfg.configure_ipu_system()

input_shape = (4,)
# Prepare the input signature.
input_signature = (tf.TensorSpec(shape=input_shape, dtype=np.float32),)
# Number of times each pipeline stage is executed.
iterations = 16

# Export as a SavedModel.
runtime_func = serving.export_pipeline([stage1, stage2],
                                       saved_model_directory,
                                       iterations=iterations,
                                       device_mapping=[0, 1],
                                       input_signature=input_signature)
print(f"SavedModel written to {saved_model_directory}")

# You can test the exported executable using returned `runtime_func`.
# This should print numbers from 3 to 33.
input_placeholder = tf.placeholder(dtype=np.float32, shape=input_shape)
result_op = runtime_func(input_placeholder)

with tf.Session() as sess:
  for i in range(iterations):
    input_data = np.ones(input_shape, dtype=np.float32) * i
    print(sess.run(result_op, {input_placeholder: input_data}))

16.3. Running the model in TensorFlow Serving

To test the exported SavedModel you can start a TensorFlow Serving instance and point it at the model’s location. Graphcore’s distribution of TensorFlow Serving can be run directly on the host system:

$ tensorflow_model_server --rest_api_port=8501 --model_name=my_model \
      --model_base_path="$(pwd)/my_saved_model_ipu"

You can then start sending inference requests, for example:

$ curl -d '{"instances": [1.0, 2.0, 5.0, 7.0]}'   \
    -X POST http://localhost:8501/v1/models/my_model:predict

Graphcore does not distribute the TensorFlow Serving API package. If you want to use it, you need to install it from the official distribution using pip.
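
For example, after installing the package with pip (pip install tensorflow-serving-api, matching your TensorFlow version), you can send gRPC requests. The sketch below assumes the server was also started with a gRPC port (for example --port=8500) and that the exported model’s input tensor is named x, matching the argument name of the exported function; both are assumptions for this sketch rather than guarantees.

import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

# Connect to the assumed gRPC endpoint of the TensorFlow Serving instance.
channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Build a request for the model served above.
request = predict_pb2.PredictRequest()
request.model_spec.name = 'my_model'
# 'x' is assumed to match the input name of the exported model.
request.inputs['x'].CopyFrom(
    tf.make_tensor_proto(np.array([1.0, 2.0, 5.0, 7.0], dtype=np.float32)))

# Send the request with a 10 second timeout and print the response.
result = stub.Predict(request, 10.0)
print(result)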