16. Exporting precompiled models for TensorFlow Serving

TensorFlow applications compiled for the IPU can be exported to standard TensorFlow SavedModel format and deployed to a TensorFlow Serving instance. The exported SavedModel contains the executable compiled for the IPU and a simple TensorFlow graph with IPU embedded application runtime operations which allow you to run the executable as part of the TensorFlow graph.

The Graphcore TensorFlow API for exporting models for TensorFlow Serving supports two different use cases:

  1. Models defined inside a function without using pipelining can be exported using the tensorflow.python.ipu.serving.export_single_step() function.

  2. Pipelined models defined as a list of functions can be exported using the tensorflow.python.ipu.serving.export_pipeline() function.

General notes about using the Graphcore TensorFlow API for exporting models with TensorFlow Serving:

  1. Since the exported SavedModel contains custom IPU embedded application runtime operations, it can be used only with the Graphcore distribution of TensorFlow Serving.

  2. The exported SavedModel cannot be loaded back into a TensorFlow script and used as a regular model because, in the export stage, the model is compiled into an IPU executable. The exported TensorFlow graph contains only IPU embedded application runtime operations and has no information about specific layers, and so on.

  3. TensorFlow and TensorFlow Serving versions must always match. This means you must use the same version of TensorFlow Serving as the version of TensorFlow that was used to export the model. Moreover, the Poplar versions used by the TensorFlow Serving instance and by the version of TensorFlow that exported the model must also match.

16.1. Exporting non-pipelined models defined inside a function

Exporting the forward pass of a non-pipelined model can be done with the tensorflow.python.ipu.serving.export_single_step() function. A function that defines the forward pass of the model is required as the first argument. Under the hood, export_single_step() wraps that function in a while loop optimized for the IPU, with the iterations parameter denoting the number of loop iterations. You can use this parameter to tune the model’s latency; its optimal value is use-case specific. The function also adds the infeed and outfeed queues, so you do not have to create them yourself. The model is then compiled into an executable and included as an asset in the SavedModel stored at the export_dir location.

To export such a model, the function’s input signature has to be defined. This can be accomplished in one of three ways:

All of the above methods are functionally equivalent; use whichever you find most convenient. A sketch of the decorator-based approach is shown below.
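
For illustration, the following is a minimal sketch of the decorator-based approach; the shape and dtype are placeholders chosen for this sketch only, and the full example in Section 16.1.1 instead passes the input_signature argument directly to export_single_step().

import tensorflow as tf

# Declare the input signature on the decorator so that export_single_step()
# can infer the shapes and dtypes of the model's inputs.
@tf.function(input_signature=[tf.TensorSpec(shape=(4,), dtype=tf.float32)])
def my_net(x):
  # Double the input - replace this with the application body.
  return x * 2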

You can also specify a variable_initializer function that performs an initialization of all the graph’s variables. This function takes a tf.Session instance as the only argument. The example below shows how it can be used for restoring values of variables from a checkpoint.

import tensorflow as tf
from tensorflow.python import ipu


def variable_initializer(session):
  saver = tf.train.Saver()
  # Place variable initialization on the host.
  ipu.utils.move_variable_initialization_to_cpu()
  init = tf.global_variables_initializer()
  session.run(init)
  # Overwrite the freshly initialized values with those from the checkpoint.
  saver.restore(session, 'path/to/checkpoint')
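
The initializer is then passed to the export function. A minimal sketch, reusing the names from the example in Section 16.1.1 and assuming the initializer is accepted through a variable_initializer keyword argument:

runtime_func = serving.export_single_step(
    my_net,
    saved_model_directory,
    iterations,
    input_signature,
    # Assumed keyword argument; runs during export to initialize the
    # graph's variables before the executable is compiled.
    variable_initializer=variable_initializer)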

16.1.1. Example of exporting a non-pipelined model defined inside a function

This example exports a very simple model whose embedded IPU program doubles the input tensor.

import os
import shutil

import numpy as np

import tensorflow as tf
from tensorflow.python.ipu import serving
from tensorflow.python.ipu import config

# Directory where SavedModel will be written.
saved_model_directory = './my_saved_model_ipu/001'
# Directory should be empty or should not exist.
if os.path.exists(saved_model_directory):
  shutil.rmtree(saved_model_directory)


# The function to export.
@tf.function
def my_net(x):
  # Double the input - replace this with application body.
  result = x * 2
  return result


# Configure the IPU for compilation.
cfg = config.IPUConfig()
cfg.auto_select_ipus = 1
cfg.device_connection.enable_remote_buffers = True
cfg.device_connection.type = config.DeviceConnectionType.ON_DEMAND
cfg.configure_ipu_system()

input_shape = (4,)
# Prepare the input signature.
input_signature = (tf.TensorSpec(shape=input_shape, dtype=np.float32),)
# Number of loop iterations - tweak this to tune the model's latency.
iterations = 16

# Export as a SavedModel.
runtime_func = serving.export_single_step(my_net, saved_model_directory,
                                          iterations, input_signature)
print(f"SavedModel written to {saved_model_directory}")

# You can test the exported executable using the returned `runtime_func`.
# This should print the even numbers 0 to 30.
input_placeholder = tf.placeholder(dtype=np.float32, shape=input_shape)
result_op = runtime_func(input_placeholder)

with tf.Session() as sess:
  for i in range(iterations):
    input_data = np.ones(input_shape, dtype=np.float32) * i
    print(sess.run(result_op, {input_placeholder: input_data}))

16.2. Exporting pipelined models defined as a list of functions

Exporting the forward pass of a pipelined model can be accomplished using the tensorflow.python.ipu.serving.export_pipeline() function.

The use of that function is very similar to the creation of a pipeline op using the tensorflow.python.ipu.pipelining_ops.pipeline() function. You have to provide a list of functions that represent the pipeline’s computational stages.

The tensorflow.python.ipu.serving.export_pipeline() function also has an iterations argument, which denotes the number of times each pipeline stage is executed before the pipeline is restarted. Again, you can use it to tune the model’s latency. This argument is sometimes called steps_per_execution.

Similarly to non-pipelined models, to export a pipelined model the input signature of the first computational stage has to be known. You can define it in the same three ways as for non-pipelined models. Note that for the first option, passing the input signature to the @tf.function decorator, you only need to do this for the first computational stage.
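
As a sketch of that decorator-based option (shapes and dtypes are placeholders chosen for illustration), only the first stage carries the signature:

import tensorflow as tf

# Only the first computational stage needs the input signature on its decorator.
@tf.function(input_signature=[tf.TensorSpec(shape=(4,), dtype=tf.float32)])
def stage1(x):
  return x * 2

# Later stages do not need an explicit signature.
@tf.function
def stage2(x):
  return x + 3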

For pipelined models you can also specify a variable_initializer function. It works exactly as it does for non-pipelined models.

16.2.1. Pipeline example

This example exports a simple pipelined IPU program that computes 2x + 3 on the input.

import os
import shutil

import numpy as np

import tensorflow as tf
from tensorflow.python.ipu import serving
from tensorflow.python.ipu import config

# Directory where SavedModel will be written.
saved_model_directory = './my_saved_model_ipu/002'
# Directory should be empty or should not exist.
if os.path.exists(saved_model_directory):
  shutil.rmtree(saved_model_directory)


# The pipeline's stages to export.
@tf.function
def stage1(x):
  # Double the input - replace this with 1st stage body.
  output = x * 2
  return output


@tf.function
def stage2(x):
  # Add 3 to the input - replace this with 2nd stage body.
  output = x + 3
  return output


# Configure the IPU for compilation.
cfg = config.IPUConfig()
cfg.auto_select_ipus = 2
cfg.device_connection.enable_remote_buffers = True
cfg.device_connection.type = config.DeviceConnectionType.ON_DEMAND
cfg.configure_ipu_system()

input_shape = (4,)
# Prepare the input signature.
input_signature = (tf.TensorSpec(shape=input_shape, dtype=np.float32),)
# Number of times each pipeline stage is executed.
iterations = 16

# Export as a SavedModel.
runtime_func = serving.export_pipeline([stage1, stage2],
                                       saved_model_directory,
                                       iterations=iterations,
                                       device_mapping=[0, 1],
                                       input_signature=input_signature)
print(f"SavedModel written to {saved_model_directory}")

# You can test the exported executable using returned `runtime_func`.
# This should print numbers from 3 to 33.
input_placeholder = tf.placeholder(dtype=np.float32, shape=input_shape)
result_op = runtime_func(input_placeholder)

with tf.Session() as sess:
  for i in range(iterations):
    input_data = np.ones(input_shape, dtype=np.float32) * i
    print(sess.run(result_op, {input_placeholder: input_data}))

16.3. Running the model in TensorFlow Serving

To test the exported SavedModel you can start a TensorFlow Serving instance and point it at the model’s location. Graphcore’s distribution of TensorFlow Serving can be run directly on the host system:

$ tensorflow_model_server --rest_api_port=8501 --model_name=my_model \
      --model_base_path="$(pwd)/my_saved_model_ipu"

You can then start sending inference requests, for example:

$ curl -d '{"instances": [1.0, 2.0, 5.0, 7.0]}'   \
    -X POST http://localhost:8501/v1/models/my_model:predict

Graphcore does not distribute the TensorFlow Serving API package. If you want to use it, you need to install it from the official distribution using pip.
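
For example, after installing the package with pip (pip install tensorflow-serving-api, matching your TensorFlow version), you can send gRPC requests. The sketch below assumes the server was also started with a gRPC port (for example --port=8500) and that the exported model’s input tensor is named x, matching the argument name of the exported function; both are assumptions for this sketch rather than guarantees.

import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

# Connect to the assumed gRPC endpoint of the TensorFlow Serving instance.
channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Build a request for the model served above.
request = predict_pb2.PredictRequest()
request.model_spec.name = 'my_model'
# 'x' is assumed to match the input name of the exported model.
request.inputs['x'].CopyFrom(
    tf.make_tensor_proto(np.array([1.0, 2.0, 5.0, 7.0], dtype=np.float32)))

# Send the request with a 10 second timeout and print the response.
result = stub.Predict(request, 10.0)
print(result)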