3. Quick start
This chapter uses an ONNX model as an example to explain how to perform model conversion and rapid deployment with PopRT.
Note
PopRT is required to run the examples in this chapter. If PopRT is not installed, refer to the Installation chapter.
PopRT supports converting and running ONNX models, and can export PopEF files for model deployment. PopEF is a unified file format provided by the Poplar SDK for importing and exporting models. A PopEF file contains the binary code compiled by the Poplar compiler that can run on the IPU, the input and output information of the model, and the hardware information required to run the model. For more information, refer to the PopEF: User Guide.
The examples in this chapter use the PopRT command-line interface.
3.1. CLI parameters
PopRT is easy to use from the command line. The following are some of the CLI parameters:
--input_model
(Required) Path to the original model (for example, an ONNX file), including the model name.

--show
(Optional) Only print the input and output information of the model.

--input_shape
(Optional) Specify the model input shape.

--output_model
(Optional) The name of the converted model. The default storage path is the current directory. If this parameter is not specified, the converted model is saved with a default name.

--precision
(Optional) Specify the precision of the converted model. If this parameter is not specified, the precision of the original model is used by default.

--run
(Optional) Run the converted model with random input.

--export_popef
(Optional) Export the PopEF model generated by compilation. The file is saved with the default name executable.popef.
For full details of all CLI parameters, refer to the Command line interface chapter.
Note
For --input_shape, only the variable dimensions in the input shape can be specified; the size of known dimensions cannot be changed.

The IPU only supports static graphs. If the input shape of the model is variable, --input_shape must be used to specify the input shape.

For more configuration options, refer to the Command line interface chapter.
3.2. Convert and run model
This section uses the BERT-Squad model from the ONNX Model Zoo as an example to explain how to perform model conversion.
3.2.1. Download ONNX model
Download the ONNX model with:
wget https://github.com/onnx/models/raw/main/text/machine_comprehension/bert-squad/model/bertsquad-12.onnx
3.2.2. Obtain input and output information for ONNX model
Obtain the input and output information for the ONNX model with:
poprt \
--input_model bertsquad-12.onnx \
--show
# The Input/Output Information of the Model
2023-01-06 02:59:32,897 INFO cli.py:327] Input unique_ids_raw_output___9:0: dtype - int64, shape - [0]
2023-01-06 02:59:32,897 INFO cli.py:327] Input segment_ids:0: dtype - int64, shape - [0, 256]
2023-01-06 02:59:32,897 INFO cli.py:327] Input input_mask:0: dtype - int64, shape - [0, 256]
2023-01-06 02:59:32,898 INFO cli.py:327] Input input_ids:0: dtype - int64, shape - [0, 256]
2023-01-06 02:59:32,898 INFO cli.py:334] Output unstack:1: dtype - float32, shape - [0, 256]
2023-01-06 02:59:32,898 INFO cli.py:334] Output unstack:0: dtype - float32, shape - [0, 256]
2023-01-06 02:59:32,898 INFO cli.py:334] Output unique_ids:0: dtype - int64, shape - [0]
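If you prefer to inspect the model programmatically, the same information can be read with the onnx Python package. The following is a minimal sketch (assuming the onnx package is available in your environment); dimensions reported as 0 or as a name correspond to the variable dimensions shown in the output above.

import onnx

model = onnx.load("bertsquad-12.onnx")
for tensor in list(model.graph.input) + list(model.graph.output):
    # Collect each dimension: a fixed size (dim_value) or a symbolic name (dim_param)
    shape = [
        d.dim_value if d.HasField("dim_value") else d.dim_param
        for d in tensor.type.tensor_type.shape.dim
    ]
    dtype = onnx.TensorProto.DataType.Name(tensor.type.tensor_type.elem_type)
    print(tensor.name, dtype, shape)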
3.2.3. Specify input shape
According to the input and output information of the original model, the input shape is variable. This means that --input_shape must be used to specify the input shape. In this example, a batch size of 2 is used.
Note
The original model input is int64. Since the IPU does not support int64, the converted model input will be int32.
poprt \
--input_model bertsquad-12.onnx \
--input_shape unique_ids_raw_output___9:0=2 segment_ids:0=2,256 input_mask:0=2,256 input_ids:0=2,256 \
--output_model bertsquad-12_fp32_bs2.onnx
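To confirm that the converted model now has static input shapes, you can run --show again on the generated model:

poprt \
--input_model bertsquad-12_fp32_bs2.onnx \
--show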
3.2.4. Specify model precision
According to the input and output information of the original model, the precision is fp32. In this example, we will use a precision of fp16.
poprt \
--input_model bertsquad-12.onnx \
--input_shape unique_ids_raw_output___9:0=2 segment_ids:0=2,256 input_mask:0=2,256 input_ids:0=2,256 \
--output_model bertsquad-12_fp16_bs2.onnx \
--precision fp16
3.2.5. Run model
Note
--run
can be used to quickly verify whether the converted model can be compiled properly and run on an IPU.

The data shown in this example does not represent optimal performance and only demonstrates the default conversion process.
# Convert, compile and run the fp32 ONNX model
poprt \
--input_model bertsquad-12.onnx \
--input_shape unique_ids_raw_output___9:0=2 segment_ids:0=2,256 input_mask:0=2,256 input_ids:0=2,256 \
--output_model bertsquad-12_fp32_bs2.onnx \
--run
# Random number run results
2023-01-06 05:58:14,209 INFO cli.py:452] Bs: 2
2023-01-06 05:58:14,209 INFO cli.py:455] Latency: 4.58ms
2023-01-06 05:58:14,210 INFO cli.py:456] Tput: 436
# Convert, compile and run the fp16 ONNX model
poprt \
--input_model bertsquad-12.onnx \
--input_shape unique_ids_raw_output___9:0=2 segment_ids:0=2,256 input_mask:0=2,256 input_ids:0=2,256 \
--output_model bertsquad-12_fp16_bs2.onnx \
--precision fp16 \
--run
# Random number run results
2023-01-06 06:00:59,283 INFO cli.py:452] Bs: 2
2023-01-06 06:00:59,283 INFO cli.py:455] Latency: 2.23ms
2023-01-06 06:00:59,283 INFO cli.py:456] Tput: 896
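The reported throughput follows directly from the batch size and latency: with a batch size of 2, 2 / 4.58 ms ≈ 436 samples/s for fp32 and 2 / 2.23 ms ≈ 896 samples/s for fp16.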
3.2.6. Export PopEF model
The exported PopEF model can be used for offline deployment.
Note
The --export_popef parameter uses the default name executable.popef to save the PopEF model to a file. Note that multiple exports will overwrite the executable.popef file.
# Convert, compile and export the fp32 model as PopEF
poprt \
--input_model bertsquad-12.onnx \
--input_shape unique_ids_raw_output___9:0=2 segment_ids:0=2,256 input_mask:0=2,256 input_ids:0=2,256 \
--export_popef
# Convert, compile and export the fp16 model as PopEF
poprt \
--input_model bertsquad-12.onnx \
--input_shape unique_ids_raw_output___9:0=2 segment_ids:0=2,256 input_mask:0=2,256 input_ids:0=2,256 \
--precision fp16 \
--export_popef
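After exporting, you can check the generated file in the current directory. If the popef_dump tool from the Poplar SDK is available in your environment (an assumption; see the PopEF: User Guide for details), it can also be used to inspect the contents of the file:

# List the exported file
ls -lh executable.popef
# Inspect the PopEF metadata (popef_dump availability is an assumption; see the PopEF: User Guide)
popef_dump executable.popef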
3.3. Quick deployment
This section uses the ONNX model and the PopEF model converted in the previous section as examples to explain how to use the PopRT Python API for quick deployment.
3.3.1. Run exported PopEF model
To run the exported PopEF model:
from poprt import runtime
import numpy as np
# Create the Runner instance and load the PopEF file
runner = runtime.Runner('executable.popef')
# Obtain the model output information
outputs = runner.get_execute_outputs()
# Create the input and output data
input_dict = {
"unique_ids_raw_output___9:0": np.ones([2]).astype(np.int32),
"segment_ids:0": np.ones([2, 256]).astype(np.int32),
"input_mask:0": np.ones([2, 256]).astype(np.int32),
"input_ids:0": np.ones([2, 256]).astype(np.int32),
}
output_dict = {x.name: np.zeros(x.shape).astype(x.numpy_data_type()) for x in outputs}
# Run PopEF
runner.execute(input_dict, output_dict)
print(output_dict)
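The Runner class also supports asynchronous execution, which is used in the complete example in Section 3.4. The following is a minimal sketch based on the same input_dict and output_dict as above:

# Submit the request asynchronously, then wait for it to complete
future = runner.execute_async(input_dict, output_dict)
future.wait()
print(output_dict)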
3.3.2. Run converted ONNX model
You can compile the converted ONNX model directly with the PopRT Compiler class, and then use the Runner class to run the PopEF executable generated by compilation.
import onnx
import numpy as np
from poprt import runtime
from poprt.compiler import Compiler
# Import the ONNX model
model = onnx.load("bertsquad-12_fp32_bs2.onnx")
model_bytes = model.SerializeToString()
output_names = [o.name for o in model.graph.output]
# Compile ONNX, and generate the PopEF instance
executable = Compiler.compile(model_bytes, output_names)
# Create the Runner instance and load the compiled PopEF executable
runner = runtime.Runner(executable)
# Obtain the model output information
outputs = runner.get_execute_outputs()
# Create the input and output data
input_dict = {
"unique_ids_raw_output___9:0": np.ones([2]).astype(np.int32),
"segment_ids:0": np.ones([2, 256]).astype(np.int32),
"input_mask:0": np.ones([2, 256]).astype(np.int32),
"input_ids:0": np.ones([2, 256]).astype(np.int32),
}
output_dict = {x.name: np.zeros(x.shape).astype(x.numpy_data_type()) for x in outputs}
# Run PopEF
runner.execute(input_dict, output_dict)
print(output_dict)
3.4. Python API example
The following is an example of converting, compiling and running the ONNX model entirely with the Python API:
# Copyright (c) 2022 Graphcore Ltd. All rights reserved.
import argparse
import time

from typing import Dict

import numpy as np
import onnx

from onnx import helper

from poprt import Pass, runtime
from poprt.compiler import Compiler, CompilerOptions
from poprt.converter import Converter

RuntimeInput = Dict[str, np.ndarray]


def convert(model_proto: onnx.ModelProto, args) -> onnx.ModelProto:
    """Convert ONNX model to a new optimized ONNX model."""
    converter = Converter(convert_version=11, precision='fp16')
    converted_model = converter.convert(model_proto)
    # Add other passes here
    converted_model = Pass.get_pass('int64_to_int32')(converted_model)
    converted_model = Pass.get_pass('gelu_pattern')(converted_model)

    return converted_model


def compile(model: onnx.ModelProto, args):
    """Compile ONNX to PopEF."""
    model_bytes = model.SerializeToString()
    outputs = [o.name for o in model.graph.output]

    options = CompilerOptions()
    options.num_ipus = 1
    options.ipu_version = runtime.DeviceManager().ipu_hardware_version()
    options.batches_per_step = args.batches_per_step
    options.partials_type = 'half'
    executable = Compiler.compile(model_bytes, outputs, options)

    return executable


def run_synchronous(
    model_runner: runtime.Runner,
    input: RuntimeInput,
    output: RuntimeInput,
    iterations: int,
) -> None:
    deltas = []
    sess_start = time.time()
    for _ in range(iterations):
        start = time.time()
        model_runner.execute(input, output)
        end = time.time()
        deltas.append(end - start)
    sess_end = time.time()

    latency = sum(deltas) / len(deltas) * 1000
    print(f'Latency : {latency:.3f}ms')
    avg_sess_time = (sess_end - sess_start) / iterations * 1000
    print(f'Synchronous avg Session Time : {avg_sess_time:.3f}ms')


def run_asynchronous(
    model_runner: runtime.Runner,
    input: RuntimeInput,
    output: RuntimeInput,
    iterations: int,
) -> None:
    # Pre-create the inputs and outputs for each request
    async_inputs = [input] * iterations
    async_outputs = [output] * iterations
    futures = []

    sess_start = time.time()
    for i in range(iterations):
        f = model_runner.execute_async(async_inputs[i], async_outputs[i])
        futures.append(f)

    # Wait for all executions to finish
    for i, future in enumerate(futures):
        future.wait()
    sess_end = time.time()

    avg_sess_time = (sess_end - sess_start) / iterations * 1000
    print(f'Asynchronous avg Session Time : {avg_sess_time:.3f}ms')


def run(executable, args):
    """Run PopEF."""
    # Create model runner
    model_runner = runtime.Runner(executable)

    # Fix random number generation
    np.random.seed(2022)

    # Prepare inputs and outputs
    inputs = {}
    inputs_info = model_runner.get_execute_inputs()
    for input in inputs_info:
        inputs[input.name] = np.random.uniform(0, 1, input.shape).astype(
            input.numpy_data_type()
        )

    outputs = {}
    outputs_info = model_runner.get_execute_outputs()
    for output in outputs_info:
        outputs[output.name] = np.zeros(output.shape, dtype=output.numpy_data_type())

    # Run
    # To correctly generate the PopVision report, iteration must be a
    # multiple of batches_per_step and greater than 2 * batches_per_step
    iteration = args.batches_per_step * 10

    # Warm up the device
    for _ in range(10):
        model_runner.execute(inputs, outputs)

    run_synchronous(model_runner, inputs, outputs, iteration)
    run_asynchronous(model_runner, inputs, outputs, iteration)


def default_model():
    TensorProto = onnx.TensorProto
    matmul = helper.make_node("MatMul", ["X", "Y"], ["Z"])
    add = helper.make_node("Add", ["Z", "A"], ["B"])
    graph = helper.make_graph(
        [matmul, add],
        "test",
        [
            helper.make_tensor_value_info("X", TensorProto.FLOAT, (4, 4, 8, 16)),
            helper.make_tensor_value_info("Y", TensorProto.FLOAT, (4, 4, 16, 8)),
            helper.make_tensor_value_info("A", TensorProto.FLOAT, (4, 4, 8, 8)),
        ],
        [helper.make_tensor_value_info("B", TensorProto.FLOAT, (4, 4, 8, 8))],
    )
    opset_imports = [helper.make_opsetid("", 11)]
    original_model = helper.make_model(graph, opset_imports=opset_imports)
    return original_model


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description='Convert onnx model and run it on IPU.'
    )
    parser.add_argument('--onnx_model', type=str, help="Full path of the onnx model.")
    parser.add_argument(
        '--batches_per_step',
        type=int,
        default=100,
        help="The on-chip loop count.",
    )
    parser.add_argument('--popef', type=str, help="Full path of the popef file")
    args = parser.parse_args()

    if args.popef:
        run(args.popef, args)
    else:
        if not args.onnx_model:
            print("No onnx model provided, run default model.")
            model = default_model()
        else:
            print(f"Run onnx model {args.onnx_model}")
            model = onnx.load(args.onnx_model)

        converted_model = convert(model, args)
        executable = compile(converted_model, args)
        run(executable, args)
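Assuming the example above is saved as quick_start.py (a hypothetical file name), it could be run as follows:

# Convert, compile and run the built-in default model
python quick_start.py
# Run a previously exported PopEF file
python quick_start.py --popef executable.popef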