3. Quick start

This chapter uses an ONNX model as an example to explain how to perform model conversion and rapid deployment with PopRT.

Note

  • PopRT is required to run the examples in this chapter. If PopRT is not installed, refer to the Installation chapter.

  • PopRT supports converting and running ONNX models, and exporting them as PopEF files for model deployment. PopEF is the unified file format provided by the Poplar SDK for importing and exporting models. A PopEF file contains the binary code compiled by the Poplar compiler to run on the IPU, the inputs and outputs of the model, and the hardware information required to run the model. For more information, refer to the PopEF: User Guide.

  • The examples in this chapter use the PopRT command-line interface.

3.1. CLI parameters

PopRT is easy to use from the command line. The following are some of the CLI parameters:

  • --input_model (Required) Path to, and filename of, the original model (for example, an ONNX model).

  • --show (Optional) Print only the input and output information of the model.

  • --input_shape (Optional) Specify the model input shape.

  • --output_model (Optional) The name of the converted model. The default storage path is the current directory. If this parameter is not specified, the converted model will be saved with the default name.

  • --precision (Optional) Specify the precision of the converted model (for example, fp16). If this parameter is not specified, the precision of the original model is kept.

  • --run (Optional) Run the converted model using random input.

  • --export_popef (Optional) Export the PopEF model generated by compilation. The file is saved with a default name of executable.popef.

For full details of all CLI parameters, refer to the Command line interface chapter.

Note

  • For --input_shape, only the variable dimensions of an input can be set; the sizes of dimensions that are already fixed in the model cannot be changed (see the sketch after this note).

  • The IPU only supports static graphs. If the input shape of the model is variable, --input_shape must be used to specify the input shape.

  • For more configurations, please refer to the Command line interface chapter.
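If you are not sure which dimensions of a model's inputs are variable, you can inspect them with the onnx Python package before calling poprt. The following is a minimal sketch; the filename bertsquad-12.onnx anticipates the model downloaded in Section 3.2:

import onnx

# List the model inputs and flag the dimensions that are variable
# (these are the dimensions that must be fixed with --input_shape).
model = onnx.load("bertsquad-12.onnx")
for inp in model.graph.input:
    dims = []
    for d in inp.type.tensor_type.shape.dim:
        # A dimension is variable if it carries a symbolic name
        # (dim_param) or no fixed value at all.
        if d.HasField("dim_value"):
            dims.append(str(d.dim_value))
        else:
            dims.append(d.dim_param or "?")
    print(f"{inp.name}: [{', '.join(dims)}]")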

3.2. Convert and run model

This section uses the BERT-Squad model from the ONNX Model Zoo as an example to show how to perform model conversion.

3.2.1. Download ONNX model

Download the ONNX model with:

wget https://github.com/onnx/models/raw/main/text/machine_comprehension/bert-squad/model/bertsquad-12.onnx

3.2.2. Obtain input and output information for ONNX model

Obtain the input and output information for the ONNX model with:

poprt \
    --input_model bertsquad-12.onnx \
    --show

# The Input/Output Information of the Model
2023-01-06 02:59:32,897 INFO cli.py:327] Input unique_ids_raw_output___9:0: dtype - int64, shape - [0]
2023-01-06 02:59:32,897 INFO cli.py:327] Input segment_ids:0: dtype - int64, shape - [0, 256]
2023-01-06 02:59:32,897 INFO cli.py:327] Input input_mask:0: dtype - int64, shape - [0, 256]
2023-01-06 02:59:32,898 INFO cli.py:327] Input input_ids:0: dtype - int64, shape - [0, 256]
2023-01-06 02:59:32,898 INFO cli.py:334] Output unstack:1: dtype - float32, shape - [0, 256]
2023-01-06 02:59:32,898 INFO cli.py:334] Output unstack:0: dtype - float32, shape - [0, 256]
2023-01-06 02:59:32,898 INFO cli.py:334] Output unique_ids:0: dtype - int64, shape - [0]

3.2.3. Specify input shape

According to the input and output information of the original model, the input shapes are variable (a dimension printed as 0 here indicates a variable dimension). This means that --input_shape must be used to fix the input shapes. In this example, batch size = 2 is used.

Note

  • The original model inputs are int64. Since the IPU does not support int64, the inputs of the converted model will be int32.

poprt \
    --input_model bertsquad-12.onnx \
    --input_shape unique_ids_raw_output___9:0=2 segment_ids:0=2,256 input_mask:0=2,256 input_ids:0=2,256 \
    --output_model bertsquad-12_fp32_bs2.onnx
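
As an optional sanity check (not part of the PopRT workflow itself), you can load the converted model with the onnx package and confirm that the input shapes are now fixed and, as noted above, that the int64 inputs have become int32:

import onnx

model = onnx.load("bertsquad-12_fp32_bs2.onnx")
for inp in model.graph.input:
    tt = inp.type.tensor_type
    shape = [d.dim_value for d in tt.shape.dim]
    dtype = onnx.TensorProto.DataType.Name(tt.elem_type)
    print(f"{inp.name}: dtype - {dtype}, shape - {shape}")

# Expected for the conversion above (input order may differ):
#   unique_ids_raw_output___9:0: dtype - INT32, shape - [2]
#   segment_ids:0: dtype - INT32, shape - [2, 256]
#   input_mask:0: dtype - INT32, shape - [2, 256]
#   input_ids:0: dtype - INT32, shape - [2, 256]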

3.2.4. Specify model precision

According to the input and output information of the original model, the precision is fp32. In this example, we will convert the model to fp16.

poprt \
    --input_model bertsquad-12.onnx \
    --input_shape unique_ids_raw_output___9:0=2 segment_ids:0=2,256 input_mask:0=2,256 input_ids:0=2,256 \
    --output_model bertsquad-12_fp16_bs2.onnx \
    --precision fp16
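
The fp16 model can be inspected in the same way as the sketch above; after conversion with --precision fp16, you would expect the floating-point tensors of the model to be float16, while the integer inputs remain int32.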

3.2.5. Run model

Note

  • --run can be used to quickly verify whether the converted model can be compiled properly and run on an IPU.

  • The figures shown in this example do not represent optimal performance; they only demonstrate the default conversion flow.

# Convert, compile and run the fp32 ONNX model
poprt \
    --input_model bertsquad-12.onnx \
    --input_shape unique_ids_raw_output___9:0=2 segment_ids:0=2,256 input_mask:0=2,256 input_ids:0=2,256 \
    --output_model bertsquad-12_fp32_bs2.onnx \
    --run

# Results of running with random inputs
2023-01-06 05:58:14,209 INFO cli.py:452] Bs: 2
2023-01-06 05:58:14,209 INFO cli.py:455] Latency: 4.58ms
2023-01-06 05:58:14,210 INFO cli.py:456] Tput: 436

# Convert, compile and run the fp16 ONNX model
poprt \
    --input_model bertsquad-12.onnx \
    --input_shape unique_ids_raw_output___9:0=2 segment_ids:0=2,256 input_mask:0=2,256 input_ids:0=2,256 \
    --output_model bertsquad-12_fp16_bs2.onnx \
    --precision fp16 \
    --run

# Results of running with random inputs
2023-01-06 06:00:59,283 INFO cli.py:452] Bs: 2
2023-01-06 06:00:59,283 INFO cli.py:455] Latency: 2.23ms
2023-01-06 06:00:59,283 INFO cli.py:456] Tput: 896
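
The reported Tput (throughput, in samples per second) corresponds to batch size divided by latency: at batch size 2, a latency of 4.58 ms gives 2 / 0.00458 ≈ 436, and a latency of 2.23 ms gives 2 / 0.00223 ≈ 896.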

3.2.6. Export PopEF model

The exported PopEF model can be used for offline deployment.

Note

  • The --export_popef parameter saves the PopEF model to a file with the default name executable.popef. Note that repeated exports will overwrite this file.

# Convert, compile and export the PopEF file for the fp32 model
poprt \
    --input_model bertsquad-12.onnx \
    --input_shape unique_ids_raw_output___9:0=2 segment_ids:0=2,256 input_mask:0=2,256 input_ids:0=2,256 \
    --export_popef

# Convert, compile and export the PopEF file for the fp16 model
poprt \
    --input_model bertsquad-12.onnx \
    --input_shape unique_ids_raw_output___9:0=2 segment_ids:0=2,256 input_mask:0=2,256 input_ids:0=2,256 \
    --precision fp16 \
    --export_popef

3.3. Quick deployment

This section takes the ONNX model and the PopEF file converted in the previous section as examples to explain how to use the PopRT Python API for quick deployment.

3.3.1. Run exported PopEF model

To run the exported PopEF model:

from poprt import runtime
import numpy as np

# Create the Runner instance and load the PopEF file
runner = runtime.Runner('executable.popef')

# Obtain the model output information
outputs = runner.get_execute_outputs()

# Create the input and output data
input_dict = {
    "unique_ids_raw_output___9:0": np.ones([2]).astype(np.int32),
    "segment_ids:0": np.ones([2, 256]).astype(np.int32),
    "input_mask:0": np.ones([2, 256]).astype(np.int32),
    "input_ids:0": np.ones([2, 256]).astype(np.int32),
}
output_dict = {x.name: np.zeros(x.shape).astype(x.numpy_data_type()) for x in outputs}

# Run PopEF
runner.execute(input_dict, output_dict)
print(output_dict)
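
Note that runner.execute() writes the results into the preallocated arrays in output_dict in place, which is why the output buffers are created with np.zeros before the call and printed afterwards.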

3.3.2. Run converted ONNX model

You can compile the converted ONNX model directly with the PopRT Compiler class. Then, you can use the Runner class to run the PopEF executable generated by compilation.

import onnx
import numpy as np
from poprt import runtime
from poprt.compiler import Compiler

# Import the ONNX model
model = onnx.load("bertsquad-12_fp32_bs2.onnx")
model_bytes = model.SerializeToString()
output_names = [o.name for o in model.graph.output]

# Compile ONNX, and generate the PopEF instance
executable = Compiler.compile(model_bytes, output_names)

# Create the Runner instance and load the compiled PopEF executable
runner = runtime.Runner(executable)

# Obtain the model output information
outputs = runner.get_execute_outputs()

# Create the input and output data
input_dict = {
    "unique_ids_raw_output___9:0": np.ones([2]).astype(np.int32),
    "segment_ids:0": np.ones([2, 256]).astype(np.int32),
    "input_mask:0": np.ones([2, 256]).astype(np.int32),
    "input_ids:0": np.ones([2, 256]).astype(np.int32),
}
output_dict = {x.name: np.zeros(x.shape).astype(x.numpy_data_type()) for x in outputs}

# Run PopEF
runner.execute(input_dict, output_dict)
print(output_dict)
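
Note that Compiler.compile() also accepts an optional CompilerOptions argument to control compilation, for example the number of IPUs to use or the number of batches per step; the full example in Section 3.4 shows this.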

3.4. Python API example

The following is an example of converting, compiling and running the ONNX model entirely with the Python API:

Listing 3.1 Example showing how to convert, compile and run an ONNX model using the PopRT Python API
# Copyright (c) 2022 Graphcore Ltd. All rights reserved.
import argparse
import time

from typing import Dict

import numpy as np
import onnx

from onnx import helper

from poprt import Pass, runtime
from poprt.compiler import Compiler, CompilerOptions
from poprt.converter import Converter

RuntimeInput = Dict[str, np.ndarray]


def convert(model_proto: onnx.ModelProto, args) -> onnx.ModelProto:
    """Convert ONNX model to a new optimized ONNX model."""
    converter = Converter(convert_version=11, precision='fp16')
    converted_model = converter.convert(model_proto)
    # Add other passes here
    converted_model = Pass.get_pass('int64_to_int32')(converted_model)
    converted_model = Pass.get_pass('gelu_pattern')(converted_model)

    return converted_model


def compile(model: onnx.ModelProto, args):
    """Compile ONNX to PopEF."""
    model_bytes = model.SerializeToString()
    outputs = [o.name for o in model.graph.output]

    options = CompilerOptions()
    options.num_ipus = 1
    options.ipu_version = runtime.DeviceManager().ipu_hardware_version()
    options.batches_per_step = args.batches_per_step
    options.partials_type = 'half'
    executable = Compiler.compile(model_bytes, outputs, options)

    return executable


def run_synchronous(
    model_runner: runtime.Runner,
    input: RuntimeInput,
    output: RuntimeInput,
    iterations: int,
) -> None:
    deltas = []
    sess_start = time.time()
    for _ in range(iterations):
        start = time.time()
        model_runner.execute(input, output)
        end = time.time()
        deltas.append(end - start)
    sess_end = time.time()

    latency = sum(deltas) / len(deltas) * 1000
    print(f'Latency : {latency:.3f}ms')
    avg_sess_time = (sess_end - sess_start) / iterations * 1000
    print(f'Synchronous avg Session Time : {avg_sess_time:.3f}ms')


def run_asynchronous(
    model_runner: runtime.Runner,
    input: RuntimeInput,
    output: RuntimeInput,
    iterations: int,
) -> None:
    # Pre-create the inputs and outputs for each iteration
    async_inputs = [input] * iterations
    async_outputs = [output] * iterations
    futures = []

    sess_start = time.time()
    for i in range(iterations):
        f = model_runner.execute_async(async_inputs[i], async_outputs[i])
        futures.append(f)

    # Wait until all executions have finished
    for i, future in enumerate(futures):
        future.wait()
    sess_end = time.time()

    avg_sess_time = (sess_end - sess_start) / iterations * 1000
    print(f'Asynchronous avg Session Time : {avg_sess_time:.3f}ms')


def run(executable, args):
    """Run PopEF."""
    # Create model runner
    model_runner = runtime.Runner(executable)

    # Fix the random seed for reproducible inputs
    np.random.seed(2022)

    # Prepare inputs and outputs
    inputs = {}
    inputs_info = model_runner.get_execute_inputs()
    for input in inputs_info:
        inputs[input.name] = np.random.uniform(0, 1, input.shape).astype(
            input.numpy_data_type()
        )

    outputs = {}
    outputs_info = model_runner.get_execute_outputs()
    for output in outputs_info:
        outputs[output.name] = np.zeros(output.shape, dtype=output.numpy_data_type())

    # Run
    # To correctly generate the PopVision report, the iteration count must be a
    # multiple of batches_per_step and greater than 2 * batches_per_step
    iteration = args.batches_per_step * 10

    # Warm up the device
    for _ in range(10):
        model_runner.execute(inputs, outputs)

    run_synchronous(model_runner, inputs, outputs, iteration)
    run_asynchronous(model_runner, inputs, outputs, iteration)


def default_model():
    TensorProto = onnx.TensorProto
    matmul = helper.make_node("MatMul", ["X", "Y"], ["Z"])
    add = helper.make_node("Add", ["Z", "A"], ["B"])
    graph = helper.make_graph(
        [matmul, add],
        "test",
        [
            helper.make_tensor_value_info("X", TensorProto.FLOAT, (4, 4, 8, 16)),
            helper.make_tensor_value_info("Y", TensorProto.FLOAT, (4, 4, 16, 8)),
            helper.make_tensor_value_info("A", TensorProto.FLOAT, (4, 4, 8, 8)),
        ],
        [helper.make_tensor_value_info("B", TensorProto.FLOAT, (4, 4, 8, 8))],
    )
    opset_imports = [helper.make_opsetid("", 11)]
    original_model = helper.make_model(graph, opset_imports=opset_imports)
    return original_model


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description='Convert onnx model and run it on IPU.'
    )
    parser.add_argument('--onnx_model', type=str, help="Full path of the onnx model.")
    parser.add_argument(
        '--batches_per_step',
        type=int,
        default=100,
        help="The number of on-chip loop iterations.",
    )
    parser.add_argument('--popef', type=str, help="Full path of the popef file")
    args = parser.parse_args()

    if args.popef:
        run(args.popef, args)
    else:
        if not args.onnx_model:
            print("No onnx model provided, run default model.")
            model = default_model()
        else:
            print(f"Run onnx model {args.onnx_model}")
            model = onnx.load(args.onnx_model)

        converted_model = convert(model, args)
        executable = compile(converted_model, args)
        run(executable, args)

Download convert_compile_and_run.py
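
As the argument parser in Listing 3.1 shows, running the script with no arguments converts, compiles and runs the built-in default model; pass --onnx_model to convert and run your own ONNX model, or --popef to directly run a previously exported PopEF file.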