3. Exporting and deploying a model

Note

Ensure you have set up your environment as described in Section 2, Environment setup.

The Graphcore distribution of TensorFlow allows you to export a precompiled model to the standard SavedModel format. Such a model can later be deployed for inference using the Graphcore distribution of TensorFlow Serving. The documentation of the Graphcore distribution of TensorFlow 2 shows how to export both pipelined and non-pipelined, Keras and non-Keras models for serving. We recommend reading that documentation to familiarize yourself with the details of the exporting procedure.

This section shows all the steps required to export and deploy a simple model for the recognition of handwritten digits using the MNIST dataset.

3.1. Creating and training a model

First, we create a very simple Keras model consisting of a Flatten and two Dense layers. The second Dense layer uses the softmax activation function, so the model returns a vector of per-class probabilities.

Note

For Keras models, the names of the input layers are used to create the names of the inputs in the SavedModel. Note that TensorFlow adds an “_input” suffix to these names when creating the signature of the SavedModel. You can set the names of the input layers manually by passing the name parameter to the layer constructors, or leave them unset and rely on the names automatically generated by TensorFlow.

For non-Keras models implemented as a function, the names of the function arguments are used as the names of the inputs in the SavedModel.

Remember that you can always check input names using the SavedModel Command Line Interface; this is described in more detail in Section 3.3, Structure of the SavedModel directory.
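For example, with the layer name used in this guide (see the model below), the relationship between the layer name and the SavedModel input name is:

keras.layers.Flatten(name="image_data")  # layer name set explicitly
# resulting input name in the SavedModel signature: "image_data_input"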

The code below shows a function that creates a model:

from tensorflow import keras
...
INPUT_NAME = "image_data"
...
def create_model():
    model = keras.Sequential([
        keras.layers.Flatten(name=INPUT_NAME),
        keras.layers.Dense(128, activation='relu'),
        keras.layers.Dense(10, activation='softmax')])
    return model

In this example we first train the model on the IPU using the MNIST dataset. The code below shows a function that creates and preprocesses the training data:

import tensorflow as tf
...
TRAIN_BATCH_SIZE = 32
...
def create_dataset():
    mnist = keras.datasets.mnist
    (x_train, y_train), _ = mnist.load_data()
    x_train = x_train / 255.0

    train_ds = tf.data.Dataset.from_tensor_slices(
        (x_train, y_train)).shuffle(10000).batch(
        TRAIN_BATCH_SIZE, drop_remainder=True)
    train_ds = train_ds.map(
        lambda d, l: (tf.cast(d, tf.float32), tf.cast(l, tf.float32)))
    train_ds = train_ds.repeat()
    return train_ds

To use the IPU, we must create an IPU configuration. In this example we use auto_select_ipus to automatically select a single IPU.

from tensorflow.python import ipu
...
cfg = ipu.config.IPUConfig()
cfg.auto_select_ipus = 1
cfg.configure_ipu_system()

Then, you should create an IPUStrategy and run the code within its scope. Creating a model inside an IPU strategy scope ensures that all the variables are placed on the IPU, while their initialization is performed on the CPU device. Creating the model in this scope also ensures that the Keras model will contain the IPU-specific methods, such as export_for_ipu_serving().

Note

The steps_per_execution argument in the model’s compile() method sets the number of iterations of the underlying Poplar loop executed in a single run of the Poplar program. It can be used to tune the model’s performance; the optimal value is use-case specific.

In this example we set steps_per_execution for training to one hundred.

strategy = ipu.ipu_strategy.IPUStrategy()
with strategy.scope():
    model = create_model()
    train_ds = create_dataset()

    model.compile(loss=keras.losses.SparseCategoricalCrossentropy(),
                  optimizer=keras.optimizers.SGD(),
                  steps_per_execution=100,
                  metrics=["accuracy"])

    model.fit(train_ds, steps_per_epoch=2000, epochs=4)

3.2. Exporting model for TensorFlow Serving

There are multiple ways to export a model from TensorFlow to the SavedModel format so that it can be used in TensorFlow Serving. We recommend reading the Exporting precompiled models for TensorFlow Serving section of the documentation for the Graphcore distribution of TensorFlow.

Below we show simple ways to export both a Keras model and a model defined as a TensorFlow function.

3.2.1. Exporting Keras model

After the model is trained, it can be exported for TensorFlow Serving. The export steps consist of compiling the model into an executable, storing it in a PopEF file, creating a small TensorFlow graph that uses the IPU embedded application runtime, and saving that graph in the standard SavedModel format.

PopEF is the universal file format used by the Poplar SDK, mainly for exporting and importing models; see the PopEF: User Guide for more information.

There are several ways to export models depending on whether they are Keras models, or non-Keras models defined as a function or a list of functions (in the case of pipelined models).

Note that you can call the model’s compile() function again to set a different steps_per_execution value for inference.

Note

The model is compiled and exported for a specific batch size value, so this value cannot be changed after the model has been exported.

For Keras models, you can set the batch size used in the SavedModel by passing it directly to the export function (the batch_size argument of either the tensorflow.python.ipu.serving.export_keras() function or the model’s export_for_ipu_serving() method). This way you can change the batch size after training. If you do not specify the batch size in the export method, the batch size used for training (or for model creation, if there was no training) is used.

For non-Keras models, the export method does not accept a batch-size parameter. Instead, it accepts either an input_signature or an input_dataset, so you have to either set the batch size in the shapes of the tf.TensorSpec objects or set the batch size of the dataset by calling its batch() method.
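As an illustration, the sketch below shows both ways of fixing the batch size for a non-Keras model. The constants and the 28x28 MNIST input shape are assumptions matching the example in this guide, not part of the export API itself.

import tensorflow as tf

BATCH_SIZE = 1  # assumed value for this sketch

# Option 1: fix the batch size in the shapes of the input signature.
input_signature = (
    tf.TensorSpec(shape=(BATCH_SIZE, 28, 28), dtype=tf.float32),)

# Option 2: fix the batch size of the dataset passed to the export function.
input_dataset = tf.data.Dataset.from_tensor_slices(
    tf.zeros((100, 28, 28), dtype=tf.float32)).batch(
    BATCH_SIZE, drop_remainder=True)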

All the export methods take a path to the directory where the SavedModel should be written (the export_dir argument of any of the export functions; see Serving utilities in the TensorFlow API).

Note

The directory where the SavedModel is written has to either be empty or not exist at all. If the directory is not empty, a ValueError is raised.

The directory structure used for the saved model is described in Section 3.3, Structure of the SavedModel directory.

Note

TensorFlow Serving expects that the model directory structure is <model_name>/<model_version>, where <model_name> is the name of a model being served and <model_version> is a number indicating the version of the model. TensorFlow Serving always loads the latest model version—the one with the highest version number inside the <model_name> directory. See Section 3.3, Structure of the SavedModel directory for details of the directory structure.
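For example, if two versions of the Keras model from this guide had been exported (a hypothetical layout, shown here for illustration), TensorFlow Serving would load version 2:

my_keras_model/
├── 1/ (older version, ignored once a higher version exists)
└── 2/ (highest version number: the version TensorFlow Serving loads)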

The code below makes sure the export directory does not exist, changes the steps_per_execution value used in the SavedModel, and exports the model with its batch size changed to 1.

import os
import shutil
...
EXPORT_BATCH_SIZE = 1
OUTPUT_NAME = "probabilities"
KERAS_SAVED_MODEL_PATH = "my_keras_model/1"
if os.path.exists(KERAS_SAVED_MODEL_PATH):
    shutil.rmtree(KERAS_SAVED_MODEL_PATH)
...
with strategy.scope():
    model.compile(steps_per_execution=50)
    runtime_func = model.export_for_ipu_serving(KERAS_SAVED_MODEL_PATH,
                                                batch_size=EXPORT_BATCH_SIZE,
                                                output_names=[OUTPUT_NAME])

All the export methods return a function that can be used to locally test the created PopEF file. The function is the same predict function that was exported into the SavedModel.

The code below uses the returned function to make a prediction using a sample image of the digit “7”. Recall that the model returns probabilities, so you have to add the argmax operation to get the predicted class.

import numpy as np
from PIL import Image
...
IMAGE_EXAMPLE_CLASS = 7
IMAGE_EXAMPLE_PATH = "handwritten_7.png"
OUTPUT_NAME = "probabilities"
...
with strategy.scope():
    image = Image.open(IMAGE_EXAMPLE_PATH)
    image = np.expand_dims(np.array(image), axis=0)

    result = strategy.run(runtime_func,
                          args=(tf.constant(image, dtype=tf.float32),))
    probabilities = result[OUTPUT_NAME]
    predicted_category = np.argmax(probabilities, axis=1)[0]
    print(f"Predicted category: {predicted_category}, "
          f"actual: {IMAGE_EXAMPLE_CLASS}")

3.2.2. Exporting model implemented as a TensorFlow function

You can also export non-Keras models defined as a TensorFlow function (or a list of functions in the case of pipelined models; here we focus on non-pipelined models). The code below shows an example TensorFlow function that extends a previously created Keras model with TensorFlow’s argmax operation.

Note

You have to specify an input signature when exporting non-Keras models. It can either be specified as a parameter of the tf.function decorator of the function to export (in the case of pipelined models, only the first function needs to have the input signature specified) or passed as an argument to the export function.

@tf.function(input_signature=(
    tf.TensorSpec(shape=(EXPORT_BATCH_SIZE, 28, 28), dtype=tf.float32),))
def predict(predict_input):
    probabilities = model(predict_input)
    prediction = tf.math.argmax(probabilities, axis=1, output_type=tf.int32)
    return prediction

To export such a model you need to use the tensorflow.python.ipu.serving.export_single_step() function. It returns a similar runtime function that can be used to test the model locally.

Note that the export function takes an iterations argument; its meaning is the same as steps_per_execution in the case of Keras models.

with strategy.scope():
    SAVED_MODEL_PATH = "my_model/1"
    ...
    runtime_func = ipu.serving.export_single_step(
        predict, SAVED_MODEL_PATH, iterations=10)
    result = strategy.run(runtime_func,
                          args=(tf.constant(image, dtype=tf.float32),))[0][0]
    print(f"Predicted category: {result}, "
        f"actual: {IMAGE_EXAMPLE_CLASS}")

3.3. Structure of the SavedModel directory

Models are exported into a standard SavedModel format. The created PopEF file is stored inside the assets subdirectory. The structure of the SavedModel directory should look as follows:

my_keras_model/
└── 1/ (model version)
    ├── assets/
    │   └── application_{random UUID}.popef (compiled Poplar model)
    ├── saved_model.pb
    └── variables/
        ├── variables.index
        └── variables.data-x-of-y

The Poplar executable is stored inside the PopEF file.

You can analyze the executable using the popef_dump application distributed in the Poplar SDK. By running the command:

$ popef_dump my_keras_model/1/assets/application_{random UUID}.popef

you can check the executable’s metadata, such as:

  • the number of IPUs the model was compiled for,

  • the model’s replication factor,

  • the names of the model’s inputs and outputs (these are the names visible to Poplar, not to TensorFlow Serving),

  • the data types of the model’s inputs and outputs,

  • the shapes of the model’s inputs and outputs.

Moreover, you can use the saved_model_cli tool to analyze the SavedModel:

$ saved_model_cli show --all --dir my_keras_model/1

In this way, you can check the signature name of the exported SavedModel; the data types, shapes and names visible to TensorFlow Serving; and the signatures of the exported functions with their names and arguments.
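For example, to inspect only the default serving signature (assuming the standard serve tag set; both the tag set and the signature name can be confirmed in the --all output above), you can run:

$ saved_model_cli show --dir my_keras_model/1 --tag_set serve --signature_def serving_default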

3.4. Launching TensorFlow Serving for IPU

TensorFlow Serving for IPU is released as part of the Poplar SDK and can be found in the root of the SDK directory. It has a name of the form:

tensorflow_model_server-r2-<version>

where <version> includes SDK version and build information.

To launch TensorFlow Serving for IPU with a model, for both gRPC (on port 8500) and REST requests (on port 8501), you need to run:

$ ${POPLAR_SDK_ENABLED?}/../tensorflow_model_server-r2-<version> --port=8500 --rest_api_port=8501 --model_name=[name-of-model] --model_base_path=[absolute-path-to-model]

where [name-of-model] is the name of the model and [absolute-path-to-model] is the absolute path to the model directory. POPLAR_SDK_ENABLED is the location of the Poplar SDK, defined when the SDK was enabled. The ? ensures that an error message is displayed if the Poplar SDK has not been enabled.
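For example, to serve the Keras model exported in Section 3.2.1 (assuming, for illustration, that the my_keras_model directory is in the current working directory), the command might look like this:

$ ${POPLAR_SDK_ENABLED?}/../tensorflow_model_server-r2-<version> --port=8500 --rest_api_port=8501 --model_name=my_keras_model --model_base_path="$PWD/my_keras_model"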

You can run:

$ tensorflow_model_server-r2-<version> --help

to see more command line options.

Note

You can enable the gRPC API by setting the --port option, and the REST API by setting the --rest_api_port option. Note that if both are enabled, they must use different ports.

3.5. Sending requests to TensorFlow Serving

To send requests to TensorFlow Serving, we use the TensorFlow Serving API. See Section 2.4, Install TensorFlow Serving API for installation instructions.

You can send requests to TensorFlow Serving in two ways—using either the REST or gRPC API.

The REST API is the most popular way of designing APIs. Requests are usually serialized using the text-based, human-readable JSON format. gRPC requests, on the other hand, use the highly efficient and compact Protobuf (protocol buffer) format for message serialization. We highly recommend using gRPC for communication with TensorFlow Serving, as it helps minimize latency and generally improves overall performance.

3.5.1. Using the gRPC API

The code below sends a request to TensorFlow Serving using the gRPC API. The request contains an image of a handwritten digit “7”, the same image that was used to test the function returned by the export method. The TensorFlow Serving API is used to create and send a request.

First, you need to create a gRPC channel with the server name and port. In this example we use localhost and port 8500, which was set previously when launching the TensorFlow Serving instance. You also need to create a prediction service stub.

import grpc
from tensorflow_serving.apis import prediction_service_pb2_grpc
...
PORT = 8500
SERVER = f"localhost:{PORT}"
...
channel = grpc.insecure_channel(SERVER)
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

Then, a request has to be created, so we load the same image file that was used before. You also need to specify:

  • the model name that was passed with the --model_name option when running the TensorFlow Serving instance

  • the signature name, which can be read using the saved_model_cli tool; the default name is serving_default

  • the names of all inputs, which were either explicitly set or automatically generated; they can also be checked with the saved_model_cli tool

import numpy as np
from PIL import Image
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
...
IMAGE_EXAMPLE_CLASS = 7
IMAGE_EXAMPLE_PATH = "handwritten_7.png"
KERAS_MODEL_NAME = "my_keras_model"
INPUT_NAME = "image_data_input"
MODEL_SIGNATURE_NAME = "serving_default"
...
image = Image.open(IMAGE_EXAMPLE_PATH)
image = np.expand_dims(np.array(image), axis=0)

request = predict_pb2.PredictRequest()
request.model_spec.name = KERAS_MODEL_NAME
request.model_spec.signature_name = MODEL_SIGNATURE_NAME
request.inputs[INPUT_NAME].CopyFrom(
    tf.make_tensor_proto(image, shape=image.shape, dtype=tf.float32))

Finally, you can send a request using the previously created stub and parse the result:

OUTPUT_NAME = "probabilities"
...
result = stub.Predict(request, 10.0)  # the second argument is the request timeout in seconds
probs = tf.make_ndarray(result.outputs[OUTPUT_NAME])
predicted_category = np.argmax(probs, axis=1)[0]
print(f"Predicted category: {predicted_category}, "
      f"actual: {IMAGE_EXAMPLE_CLASS}")

3.5.2. Using the REST API

The code below sends a similar request to the one for the gRPC API, but uses the REST API.

First, you need to serialize an image to a JSON formatted string:

import json
from PIL import Image
...
IMAGE_EXAMPLE_PATH = "handwritten_7.png"
MODEL_SIGNATURE_NAME = "serving_default"
...
image = Image.open(IMAGE_EXAMPLE_PATH)
image = np.expand_dims(np.array(image), axis=0)
request_data = image.tolist()
predict_request = json.dumps(
    {"signature_name": MODEL_SIGNATURE_NAME, "instances": request_data})

Then you can use the requests package to send a POST request to TensorFlow Serving. In this example the serving server is hosted on localhost and the REST API was enabled on port 8501.

import numpy as np
import requests
...
IMAGE_EXAMPLE_CLASS = 7
KERAS_MODEL_NAME = "my_keras_model"
PORT = 8501
SERVER = f"http://localhost:{PORT}/v1/models/{KERAS_MODEL_NAME}:predict"
...
response = requests.post(SERVER, data=predict_request)
response.raise_for_status()
probs = response.json()['predictions']
predicted_category = np.argmax(probs, axis=1)[0]
print(f"Predicted category: {predicted_category}, "
      f"actual: {IMAGE_EXAMPLE_CLASS}")