3. Exporting and deploying a model

The Graphcore distribution of TensorFlow allows you to export a precompiled model to the standard SavedModel format. Such a model can later be deployed for inference using the Graphcore distribution of TensorFlow Serving. The documentation for the Graphcore distribution of TensorFlow 1 shows how to export both pipelined and non-pipelined models for serving. We recommend you read that documentation to familiarize yourself with the details of the exporting procedure.

This section shows all the steps required to export and deploy a simple model for the recognition of handwritten digits using the MNIST dataset.

3.1. Creating and training a model

First, we create a very simple Keras model consisting of a Flatten layer and two Dense layers. The second Dense layer uses the softmax activation function, so the model returns a vector of per-class probabilities.

The code below shows a function that creates a model:

import tensorflow.compat.v1 as tf
...
tf.disable_eager_execution()
tf.disable_v2_behavior()
...
def create_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')])
    return model

To use the IPU, we must create an IPU configuration. In this example we use the auto_select_ipus option to automatically select a single IPU.

from tensorflow.python import ipu
...
cfg = ipu.config.IPUConfig()
cfg.auto_select_ipus = 1
cfg.configure_ipu_system()

In this example we first train the model on the IPU using the MNIST dataset. The code below creates and preprocesses the dataset. Note that the dataset is created inside a training graph.

Note

You have to use separate graphs for training and exporting models so there are no conflicts in variable names during variable initialization.

import tensorflow as tf
...
TRAIN_BATCH_SIZE = 32
...
train_graph = tf.Graph()
with train_graph.as_default():
    mnist = tf.keras.datasets.mnist
    (x_train, y_train), _ = mnist.load_data()
    x_train = x_train / 255.0

    train_ds = tf.data.Dataset.from_tensor_slices(
        (x_train, y_train)).shuffle(10000).batch(
        TRAIN_BATCH_SIZE, drop_remainder=True)
    train_ds = train_ds.map(
        lambda d, l: (tf.cast(d, tf.float32), tf.cast(l, tf.int32)))
    train_ds = train_ds.repeat()

    train_ds_iterator = tf.data.make_initializable_iterator(train_ds)
    (x, y) = train_ds_iterator.get_next()

Then you have to construct the model and create a function that represents a single training iteration. This function then has to be compiled for the IPU inside an IPU scope:

from tensorflow.python import ipu
...
with train_graph.as_default():
    model = create_model()

    def training_loop_body(x, y):
        logits = model(x, training=True)
        loss = tf.losses.sparse_softmax_cross_entropy(
            labels=y, logits=logits)
        train_op = tf.train.AdamOptimizer(
            learning_rate=0.01).minimize(
            loss=loss)

        return [loss, train_op]

    with ipu.scopes.ipu_scope('/device:IPU:0'):
        training_loop_body_on_ipu = ipu.ipu_compiler.compile(
            computation=training_loop_body, inputs=[x, y])

The code below trains the model for four epochs and then saves its variables:

TRAIN_EPOCHS = 4
CHECKPOINT_PATH = "model_checkpoint/model.ckpt"
...
with train_graph.as_default():
    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(train_ds_iterator.initializer)

        batches_per_epoch = len(x_train) // TRAIN_BATCH_SIZE
        for epoch in range(TRAIN_EPOCHS):
            loss_running_total = 0.0
            for batch in range(batches_per_epoch):
                loss = sess.run(training_loop_body_on_ipu)
                loss_running_total += loss[0]

            print(f"Loss: {loss_running_total / batches_per_epoch}")

        save_path = saver.save(sess, CHECKPOINT_PATH)

3.2. Exporting the model for TensorFlow Serving

There are two ways to export a model from TensorFlow to the SavedModel format so that it can be used in TensorFlow Serving. We recommend reading the Exporting precompiled models for TensorFlow Serving section of the documentation for the Graphcore distribution of TensorFlow.

Below, we show a simple way to export a non-pipelined model defined as a predict function.

After the model is trained, it can be exported for TensorFlow Serving. The export step consists of compiling the model into an executable and storing it in a PopEF file (see the PopEF: User Guide for details), creating a small TensorFlow graph that uses the IPU embedded application runtime, and saving that graph in the standard SavedModel format.

PopEF is a universal file format used by the Poplar SDK, mainly for exporting and importing models.

There are two ways to export models depending on whether they are pipelined or not.

Note

The model is compiled and exported for a specific batch size value, so this value cannot be changed after the model has been exported.

Both export functions accept either an input_signature or an input_dataset parameter, so you have to either set the batch size in the shapes of the tf.TensorSpec objects or set the batch size of the dataset by calling its batch() method.
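
As an illustration, the sketch below shows both ways of fixing the batch size to 1 for the MNIST-shaped inputs used in this section. This is only a sketch: the placeholder data and variable names are not part of the tutorial, and only one of the two objects would be passed to the export function.

import tensorflow as tf

BATCH_SIZE = 1

# Option 1: encode the batch size in the input signature.
input_signature = (
    tf.TensorSpec(shape=(BATCH_SIZE, 28, 28), dtype=tf.float32),
)

# Option 2: encode the batch size in the dataset passed as input_dataset.
images = tf.zeros((100, 28, 28), dtype=tf.float32)  # placeholder data
input_dataset = tf.data.Dataset.from_tensor_slices(images).batch(
    BATCH_SIZE, drop_remainder=True)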

All the export methods take a path to the directory where the SavedModel should be written (the export_dir argument of any of the export functions; see Serving utilities in the TensorFlow API).

Note

The directory where the SavedModel is written has to either be empty or not exist at all. If the directory is not empty, a ValueError is raised.

The directory structure used for the saved model is described in Section 3.3, Structure of the SavedModel directory.

Note

TensorFlow Serving expects that the model directory structure is <model_name>/<model_version>, where <model_name> is the name of a model being served and <model_version> is a number indicating the version of the model. TensorFlow Serving always loads the latest model version—the one with the highest version number inside the <model_name> directory. See Section 3.3, Structure of the SavedModel directory for details of the directory structure.

To export a model with the correct variable values, you have to provide a variable initializer to the export function. The variable initializer is a function that takes a tf.Session instance as its only argument and performs an initialization of all the graph’s variables. The code below creates a var_initializer function that restores the variables from the previously created checkpoint.

def var_initializer(session):
    saver = tf.train.Saver()
    # Run variable initialization on the CPU rather than on the IPU.
    ipu.utils.move_variable_initialization_to_cpu()
    init = tf.global_variables_initializer()
    session.run(init)
    # Overwrite the freshly initialized values with the trained checkpoint.
    saver.restore(session, CHECKPOINT_PATH)

To export a non-pipelined model you need to use the tensorflow.python.ipu.serving.export_single_step() function. It takes a predict function as an argument. Note that the export function also takes an iterations argument, which represents the number of iterations of the underlying Poplar loop executed in a single run of the Poplar program. It can be used to tune the model’s performance; the optimal value is use-case specific.

Note

The names of the predict function’s arguments are used as the names of the inputs in the exported SavedModel.

Remember that you can always check input names using the SavedModel Command Line Interface. This is described in more detail in Section 3.3, Structure of the SavedModel directory.
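
For example, after the export step below completes, you could list the inputs of the default serving signature with a command like the following (assuming the model is exported to my_model/1, as in the code that follows):

$ saved_model_cli show --dir my_model/1 --tag_set serve --signature_def serving_default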

The code below makes sure the export directory is empty, creates a new graph with a predict function that wraps the previously created Keras model and then exports the model for serving.

import os
import shutil
...
EXPORT_BATCH_SIZE = 1
MODEL_VERSION = 1
MODEL_NAME = "my_model"
SAVED_MODEL_PATH = f"{MODEL_NAME}/{MODEL_VERSION}"
OUTPUT_NAME = "probabilities"
...
if os.path.exists(SAVED_MODEL_PATH):
    shutil.rmtree(SAVED_MODEL_PATH)

export_graph = tf.Graph()
with export_graph.as_default():
    model = create_model()

    def predict(image_data):
        return model(image_data)

    input_signature = (
        tf.TensorSpec(
            shape=(EXPORT_BATCH_SIZE, 28, 28),
            dtype=tf.float32),
    )
    runtime_func = ipu.serving.export_single_step(
        predict,
        SAVED_MODEL_PATH,
        iterations=10,
        output_names=[OUTPUT_NAME],
        variable_initializer=var_initializer,
        input_signature=input_signature)

All the export methods return a function that can be used to test the created PopEF file locally. This is the same predict function that was exported into the SavedModel.

The code below uses the returned function to make a prediction using a sample image of the digit “7”. Recall that the model returns probabilities, so you have to add the argmax operation to get the predicted class.

import numpy as np
from PIL import Image
...
IMAGE_EXAMPLE_CLASS = 7
IMAGE_EXAMPLE_PATH = "handwritten_7.png"
...
image = Image.open(IMAGE_EXAMPLE_PATH)
image = np.array(image, dtype=np.float32) / 255.0  # scale to [0, 1] to match the training preprocessing
image = np.expand_dims(image, axis=0)

with tf.Session() as sess:
    input_placeholder = tf.placeholder(dtype=np.float32, shape=(1, 28, 28))
    result_op = runtime_func(input_placeholder)
    result = sess.run(result_op, feed_dict={input_placeholder: image})
    probabilities = result[0]
    predicted_category = np.argmax(probabilities, axis=1)[0]
    print(f"Predicted category: {predicted_category}, "
          f"actual: {IMAGE_EXAMPLE_CLASS}")

3.3. Structure of the SavedModel directory

Models are exported into a standard SavedModel format. The created PopEF file is stored inside the assets subdirectory. The structure of the SavedModel directory should look as follows:

my_model/
└── 1/ (model version)
    ├── assets/
    │   └── application_{random UUID}.popef (compiled Poplar model)
    ├── saved_model.pb
    └── variables/
        ├── variables.index
        └── variables.data-x-of-y

The Poplar executable is stored inside the PopEF file.

You can analyze the executable using the popef_dump application distributed in the Poplar SDK. By running the command:

$ popef_dump my_model/1/assets/application_{random UUID}.popef

you can check the executable’s metadata such as:

  • the number of IPUs the model was compiled for

  • the model’s replication factor

  • the names of the model’s inputs and outputs (these are the names visible to Poplar, not TensorFlow Serving)

  • the datatypes of the model’s inputs and outputs

  • the shapes of the model’s inputs and outputs

Moreover, you can use the saved_model_cli tool to analyze the SavedModel:

$ saved_model_cli show --all --dir my_model/1

In this way, you can check the signature name of the exported SavedModel, the datatypes, shapes and names visible to TensorFlow Serving, and the signatures of the exported functions with their names and arguments.

3.4. Launching TensorFlow Serving for IPU

TensorFlow Serving for IPU is released as part of the Poplar SDK and can be found in the root of the SDK directory. It has a name of the form:

tensorflow_model_server-r1-<version>

where <version> includes the SDK version and build information.
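
For example, from the root of the SDK directory you can list the available server binaries with:

$ ls tensorflow_model_server-r1-*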

To launch TensorFlow Serving for IPU with a model, serving both gRPC requests (on port 8500) and REST requests (on port 8501), you need to run:

$ ${POPLAR_SDK_ENABLED?}/../tensorflow_model_server-r1-<version> --port=8500 --rest_api_port=8501 --model_name=[name-of-model] --model_base_path=[absolute-path-to-model]

where [name-of-model] is the name of the model and [absolute-path-to-model] is the absolute path to the model directory. POPLAR_SDK_ENABLED is the location of the Poplar SDK, defined when the SDK was enabled. The ? ensures that an error message is displayed if the Poplar SDK has not been enabled.
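
For example, to serve the model exported earlier in this section under the name my_model, with both the gRPC and REST APIs enabled, the command could look like this (assuming the my_model directory was created in the current working directory):

$ ${POPLAR_SDK_ENABLED?}/../tensorflow_model_server-r1-<version> \
      --port=8500 \
      --rest_api_port=8501 \
      --model_name=my_model \
      --model_base_path="$PWD/my_model"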

You can run:

$ tensorflow_model_server-r1-<version> --help

to see more command line options.

Note

You can enable the gRPC API by setting the --port option, and the REST API by setting the --rest_api_port option. Note that if both are enabled, they must use different ports.

3.5. Sending requests to TensorFlow Serving

To send requests to TensorFlow Serving, we use the TensorFlow Serving API. See Section 2.4, Install TensorFlow Serving API for installation instructions.

You can send requests to TensorFlow Serving in two ways—using either the REST or gRPC API.

The REST API is the most popular way of designing APIs. Requests are usually serialized using the text-based, human-readable JSON format. gRPC requests, on the other hand, use the highly efficient and compact Protobuf (protocol buffer) format for message serialization. We highly recommend using gRPC for communication with TensorFlow Serving, as it helps to minimize latency and generally improves overall performance.

3.5.1. Using gRPC API

The code below sends a request to TensorFlow Serving using the gRPC API. The request contains an image of a handwritten digit “7”, the same image that was used to test the function returned by the export method. The TensorFlow Serving API is used to create and send a request.

First, you need to create a gRPC channel with the server name and port. In this example we use localhost and port 8500, which was set when the TensorFlow Serving instance was launched. You also need to create a prediction service stub.

import grpc
from tensorflow_serving.apis import prediction_service_pb2_grpc
...
PORT = 8500
SERVER = f"localhost:{PORT}"
...
channel = grpc.insecure_channel(SERVER)
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

Then the request has to be created; we load the same image file that was used before. You also need to specify:

  • the model name that was passed with the --model_name option when running the TensorFlow Serving instance

  • the signature name, which can be read using the saved_model_cli tool; the default name is serving_default

  • the names of all inputs, which were either explicitly set or automatically generated; they can also be checked with the saved_model_cli tool

import numpy as np
from PIL import Image
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
...
IMAGE_EXAMPLE_CLASS = 7
IMAGE_EXAMPLE_PATH = "handwritten_7.png"
MODEL_NAME = "my_model"
INPUT_NAME = "image_data"
MODEL_SIGNATURE_NAME = "serving_default"
...
image = Image.open(IMAGE_EXAMPLE_PATH)
image = np.array(image, dtype=np.float32) / 255.0  # scale to [0, 1] to match the training preprocessing
image = np.expand_dims(image, axis=0)

request = predict_pb2.PredictRequest()
request.model_spec.name = MODEL_NAME
request.model_spec.signature_name = MODEL_SIGNATURE_NAME
request.inputs[INPUT_NAME].CopyFrom(
    tf.make_tensor_proto(image, shape=image.shape, dtype=tf.float32))

Finally, you can send a request using the previously created stub and parse the result:

OUTPUT_NAME = "probabilities"
...
result = stub.Predict(request, 10.0)  # 10 second timeout
probs = tf.make_ndarray(result.outputs[OUTPUT_NAME])
predicted_category = np.argmax(probs, axis=1)[0]
print(f"Predicted category: {predicted_category}, "
      f"actual: {IMAGE_EXAMPLE_CLASS}")

3.5.2. Using REST API

The code below sends a similar request to the one for the gRPC API, but uses the REST API.

First, you need to serialize the image into a JSON-formatted string:

import json
import numpy as np
from PIL import Image
...
IMAGE_EXAMPLE_PATH = "handwritten_7.png"
MODEL_SIGNATURE_NAME = "serving_default"
...
image = Image.open(IMAGE_EXAMPLE_PATH)
image = np.array(image, dtype=np.float32) / 255.0  # scale to [0, 1] to match the training preprocessing
image = np.expand_dims(image, axis=0)
request_data = image.tolist()
predict_request = json.dumps(
    {"signature_name": MODEL_SIGNATURE_NAME, "instances": request_data})

Then you can use the requests package to send a POST request to TensorFlow Serving. In this example the model server is hosted on localhost and the REST API was enabled on port 8501.

import numpy as np
import requests
...
IMAGE_EXAMPLE_CLASS = 7
MODEL_NAME = "my_model"
PORT = 8501
SERVER = f"http://localhost:{PORT}/v1/models/{MODEL_NAME}:predict"
...
response = requests.post(SERVER, data=predict_request)
response.raise_for_status()
probs = response.json()['predictions']
predicted_category = np.argmax(probs, axis=1)[0]
print(f"Predicted category: {predicted_category}, "
      f"actual: {IMAGE_EXAMPLE_CLASS}")