3. Quick start
This chapter uses an ONNX model as an example to explain how to perform model conversion and rapid deployment with PopRT.
Note
PopRT is required to run the examples in this chapter. If PopRT is not installed, refer to the Installation chapter.
PopRT supports converting and running ONNX models, and can export PopEF files for model deployment. PopEF is a unified file format provided by the Poplar SDK for importing and exporting models. A PopEF file contains the binary code compiled by the Poplar compiler that can run on the IPU, the input and output information of the model, and the hardware information required to run the model. For more information, refer to the PopEF: User Guide.
The examples in this chapter use the PopRT command-line interface.
3.1. CLI parameters
PopRT is easy to use from the command line. The following are some of the CLI parameters:
--input_model
(Required) Path to the original model (for example, an ONNX file), including the model name.

--show
(Optional) Only print the input and output information of the model.

--input_shape
(Optional) Specify the model input shape.

--output_model
(Optional) The name of the converted model. The default storage path is the current directory. If this parameter is not specified, the converted model is saved with a default name.

--precision
(Optional) Specify the precision of the converted model. If this parameter is not specified, the precision of the original model is used by default.

--run
(Optional) Run the converted model with random input.

--export_popef
(Optional) Export the PopEF model generated by compilation. The file is saved with the default name executable.popef.
For full details of all CLI parameters, refer to the Command line interface chapter.
Note
For --input_shape, only the variable dimensions in the input shape can be specified; the size of known dimensions cannot be changed.

The IPU only supports static graphs. If the input shape of the model is variable, --input_shape must be used to specify the input shape.

For more configuration options, refer to the Command line interface chapter.
3.2. Convert and run model
This section uses the BERT-Squad model from the ONNX Model Zoo as an example to explain how to perform model conversion.
3.2.1. Download ONNX model
Download the ONNX model with:
wget https://github.com/onnx/models/raw/main/text/machine_comprehension/bert-squad/model/bertsquad-12.onnx
3.2.2. Obtain input and output information for ONNX model
Obtain the input and output information for the ONNX model with:
poprt \
--input_model bertsquad-12.onnx \
--show
# The Input/Output Information of the Model
2023-01-06 02:59:32,897 INFO cli.py:327] Input unique_ids_raw_output___9:0: dtype - int64, shape - [0]
2023-01-06 02:59:32,897 INFO cli.py:327] Input segment_ids:0: dtype - int64, shape - [0, 256]
2023-01-06 02:59:32,897 INFO cli.py:327] Input input_mask:0: dtype - int64, shape - [0, 256]
2023-01-06 02:59:32,898 INFO cli.py:327] Input input_ids:0: dtype - int64, shape - [0, 256]
2023-01-06 02:59:32,898 INFO cli.py:334] Output unstack:1: dtype - float32, shape - [0, 256]
2023-01-06 02:59:32,898 INFO cli.py:334] Output unstack:0: dtype - float32, shape - [0, 256]
2023-01-06 02:59:32,898 INFO cli.py:334] Output unique_ids:0: dtype - int64, shape - [0]
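If you prefer to inspect the model programmatically, the same information can be read with the onnx Python package. The following is a minimal sketch (assuming the onnx package is available in your environment); dimensions reported as 0 or as a name correspond to the variable dimensions shown in the output above.

import onnx

model = onnx.load("bertsquad-12.onnx")
for tensor in list(model.graph.input) + list(model.graph.output):
    # Collect each dimension: a fixed size (dim_value) or a symbolic name (dim_param)
    shape = [
        d.dim_value if d.HasField("dim_value") else d.dim_param
        for d in tensor.type.tensor_type.shape.dim
    ]
    dtype = onnx.TensorProto.DataType.Name(tensor.type.tensor_type.elem_type)
    print(tensor.name, dtype, shape)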
3.2.3. Specify input shape
According to the input and output information of the original model, the input shape is variable. This means that --input_shape must be used to specify the input shape. In this example, a batch size of 2 is used.
Note
The original model input is int64. Since the IPU does not support int64, the converted model input will be int32.
poprt \
--input_model bertsquad-12.onnx \
--input_shape unique_ids_raw_output___9:0=2 segment_ids:0=2,256 input_mask:0=2,256 input_ids:0=2,256 \
--output_model bertsquad-12_fp32_bs2.onnx
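To confirm that the converted model now has static input shapes, you can run --show again on the generated model:

poprt \
--input_model bertsquad-12_fp32_bs2.onnx \
--show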
3.2.4. Specify model precision
According to the input and output information of the original model, the precision is fp32. In this example, we will use a precision of fp16.
poprt \
--input_model bertsquad-12.onnx \
--input_shape unique_ids_raw_output___9:0=2 segment_ids:0=2,256 input_mask:0=2,256 input_ids:0=2,256 \
--output_model bertsquad-12_fp16_bs2.onnx \
--precision fp16
3.2.5. Run model
Note
--run
can be used to quickly verify whether the converted model can be compiled properly and run on an IPU.

The data shown in this example does not represent optimal performance and only demonstrates the default conversion process.
# Convert, compile and run the fp32 ONNX model
poprt \
--input_model bertsquad-12.onnx \
--input_shape unique_ids_raw_output___9:0=2 segment_ids:0=2,256 input_mask:0=2,256 input_ids:0=2,256 \
--output_model bertsquad-12_fp32_bs2.onnx \
--run
# Random number run results
2023-01-06 05:58:14,209 INFO cli.py:452] Bs: 2
2023-01-06 05:58:14,209 INFO cli.py:455] Latency: 4.58ms
2023-01-06 05:58:14,210 INFO cli.py:456] Tput: 436
# Convert, compile and run the fp16 ONNX model
poprt \
--input_model bertsquad-12.onnx \
--input_shape unique_ids_raw_output___9:0=2 segment_ids:0=2,256 input_mask:0=2,256 input_ids:0=2,256 \
--output_model bertsquad-12_fp16_bs2.onnx \
--precision fp16 \
--run
# Random number run results
2023-01-06 06:00:59,283 INFO cli.py:452] Bs: 2
2023-01-06 06:00:59,283 INFO cli.py:455] Latency: 2.23ms
2023-01-06 06:00:59,283 INFO cli.py:456] Tput: 896
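The reported throughput follows directly from the batch size and latency: with a batch size of 2, 2 / 4.58 ms ≈ 436 samples/s for fp32 and 2 / 2.23 ms ≈ 896 samples/s for fp16.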
3.2.6. Export PopEF model
The exported PopEF model can be used for offline deployment.
Note
The --export_popef parameter uses the default name executable.popef to save the PopEF model to a file. Note that multiple exports will overwrite the executable.popef file.
# Convert, compile and export the fp32 model as PopEF
poprt \
--input_model bertsquad-12.onnx \
--input_shape unique_ids_raw_output___9:0=2 segment_ids:0=2,256 input_mask:0=2,256 input_ids:0=2,256 \
--export_popef
# Convert, compile and export the fp16 model as PopEF
poprt \
--input_model bertsquad-12.onnx \
--input_shape unique_ids_raw_output___9:0=2 segment_ids:0=2,256 input_mask:0=2,256 input_ids:0=2,256 \
--precision fp16 \
--export_popef
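After exporting, you can check the generated file in the current directory. If the popef_dump tool from the Poplar SDK is available in your environment (an assumption; see the PopEF: User Guide for details), it can also be used to inspect the contents of the file:

# List the exported file
ls -lh executable.popef
# Inspect the PopEF metadata (popef_dump availability is an assumption; see the PopEF: User Guide)
popef_dump executable.popef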
3.3. Quick deployment
This section uses the ONNX model and the PopEF model converted in the previous section as examples to explain how to use the PopRT Python API for quick deployment.
3.3.1. Run exported PopEF model
To run the exported PopEF model:
from poprt import runtime
import numpy as np
# Create the Runner instance and load the PopEF file
runner = runtime.Runner('executable.popef')
# Obtain the model output information
outputs = runner.get_execute_outputs()
# Create the input and output data
input_dict = {
"unique_ids_raw_output___9:0": np.ones([2]).astype(np.int32),
"segment_ids:0": np.ones([2, 256]).astype(np.int32),
"input_mask:0": np.ones([2, 256]).astype(np.int32),
"input_ids:0": np.ones([2, 256]).astype(np.int32),
}
output_dict = {x.name: np.zeros(x.shape).astype(x.numpy_data_type()) for x in outputs}
# Run PopEF
runner.execute(input_dict, output_dict)
print(output_dict)
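The Runner class also supports asynchronous execution, which is used in the complete example in Section 3.4. The following is a minimal sketch based on the same input_dict and output_dict as above:

# Submit the request asynchronously, then wait for it to complete
future = runner.execute_async(input_dict, output_dict)
future.wait()
print(output_dict)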
3.3.2. Run converted ONNX model
You can compile the converted ONNX model directly with the PopRT Compiler class, and then use the Runner class to run the PopEF executable generated by compilation.
import onnx
import numpy as np
from poprt import runtime
from poprt.compiler import Compiler
# Import the ONNX model
model = onnx.load("bertsquad-12_fp32_bs2.onnx")
model_bytes = model.SerializeToString()
output_names = [o.name for o in model.graph.output]
# Compile ONNX, and generate the PopEF instance
executable = Compiler.compile(model_bytes, output_names)
# Create the Runner instance and load the compiled PopEF executable
runner = runtime.Runner(executable)
# Obtain the model output information
outputs = runner.get_execute_outputs()
# Create the input and output data
input_dict = {
"unique_ids_raw_output___9:0": np.ones([2]).astype(np.int32),
"segment_ids:0": np.ones([2, 256]).astype(np.int32),
"input_mask:0": np.ones([2, 256]).astype(np.int32),
"input_ids:0": np.ones([2, 256]).astype(np.int32),
}
output_dict = {x.name: np.zeros(x.shape).astype(x.numpy_data_type()) for x in outputs}
# Run PopEF
runner.execute(input_dict, output_dict)
print(output_dict)
3.4. Python API example
The following is an example of converting, compiling and running the ONNX model entirely with the Python API:
# Copyright (c) 2022 Graphcore Ltd. All rights reserved.
import argparse
import time

from typing import Dict

import numpy as np
import onnx

from onnx import helper

from poprt import Pass, runtime
from poprt.compiler import Compiler, CompilerOptions
from poprt.converter import Converter

RuntimeInput = Dict[str, np.ndarray]


def convert(model_proto: onnx.ModelProto, args) -> onnx.ModelProto:
    """Convert ONNX model to a new optimized ONNX model."""
    converter = Converter(convert_version=11, precision='fp16')
    converted_model = converter.convert(model_proto)
    # Add other passes here
    converted_model = Pass.get_pass('int64_to_int32')(converted_model)
    converted_model = Pass.get_pass('gelu_pattern')(converted_model)

    return converted_model


def compile(model: onnx.ModelProto, args):
    """Compile ONNX to PopEF."""
    model_bytes = model.SerializeToString()
    outputs = [o.name for o in model.graph.output]

    options = CompilerOptions()
    options.num_ipus = 1
    options.ipu_version = runtime.DeviceManager().ipu_hardware_version()
    options.batches_per_step = args.batches_per_step
    options.partials_type = 'half'
    executable = Compiler.compile(model_bytes, outputs, options)

    return executable


def run_synchronous(
    model_runner: runtime.Runner,
    input: RuntimeInput,
    output: RuntimeInput,
    iterations: int,
) -> None:
    deltas = []
    sess_start = time.time()
    for _ in range(iterations):
        start = time.time()
        model_runner.execute(input, output)
        end = time.time()
        deltas.append(end - start)
    sess_end = time.time()

    latency = sum(deltas) / len(deltas) * 1000
    print(f'Latency : {latency:.3f}ms')
    avg_sess_time = (sess_end - sess_start) / iterations * 1000
    print(f'Synchronous avg Session Time : {avg_sess_time:.3f}ms')


def run_asynchronous(
    model_runner: runtime.Runner,
    input: RuntimeInput,
    output: RuntimeInput,
    iterations: int,
) -> None:
    # Pre-create the inputs and outputs for each request
    async_inputs = [input] * iterations
    async_outputs = [output] * iterations
    futures = []

    sess_start = time.time()
    for i in range(iterations):
        f = model_runner.execute_async(async_inputs[i], async_outputs[i])
        futures.append(f)

    # Wait for all executions to finish
    for i, future in enumerate(futures):
        future.wait()
    sess_end = time.time()

    avg_sess_time = (sess_end - sess_start) / iterations * 1000
    print(f'Asynchronous avg Session Time : {avg_sess_time:.3f}ms')


def run(executable, args):
    """Run PopEF."""
    # Create model runner
    model_runner = runtime.Runner(executable)

    # Fix random number generation
    np.random.seed(2022)

    # Prepare inputs and outputs
    inputs = {}
    inputs_info = model_runner.get_execute_inputs()
    for input in inputs_info:
        inputs[input.name] = np.random.uniform(0, 1, input.shape).astype(
            input.numpy_data_type()
        )

    outputs = {}
    outputs_info = model_runner.get_execute_outputs()
    for output in outputs_info:
        outputs[output.name] = np.zeros(output.shape, dtype=output.numpy_data_type())

    # Run
    # To correctly generate the PopVision report, iteration must be a
    # multiple of batches_per_step and greater than 2 * batches_per_step
    iteration = args.batches_per_step * 10

    # Warm up the device
    for _ in range(10):
        model_runner.execute(inputs, outputs)

    run_synchronous(model_runner, inputs, outputs, iteration)
    run_asynchronous(model_runner, inputs, outputs, iteration)


def default_model():
    TensorProto = onnx.TensorProto
    matmul = helper.make_node("MatMul", ["X", "Y"], ["Z"])
    add = helper.make_node("Add", ["Z", "A"], ["B"])
    graph = helper.make_graph(
        [matmul, add],
        "test",
        [
            helper.make_tensor_value_info("X", TensorProto.FLOAT, (4, 4, 8, 16)),
            helper.make_tensor_value_info("Y", TensorProto.FLOAT, (4, 4, 16, 8)),
            helper.make_tensor_value_info("A", TensorProto.FLOAT, (4, 4, 8, 8)),
        ],
        [helper.make_tensor_value_info("B", TensorProto.FLOAT, (4, 4, 8, 8))],
    )
    opset_imports = [helper.make_opsetid("", 11)]
    original_model = helper.make_model(graph, opset_imports=opset_imports)
    return original_model


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description='Convert onnx model and run it on IPU.'
    )
    parser.add_argument('--onnx_model', type=str, help="Full path of the onnx model.")
    parser.add_argument(
        '--batches_per_step',
        type=int,
        default=100,
        help="The on-chip loop count.",
    )
    parser.add_argument('--popef', type=str, help="Full path of the popef file")
    args = parser.parse_args()

    if args.popef:
        run(args.popef, args)
    else:
        if not args.onnx_model:
            print("No onnx model provided, run default model.")
            model = default_model()
        else:
            print(f"Run onnx model {args.onnx_model}")
            model = onnx.load(args.onnx_model)

        converted_model = convert(model, args)
        executable = compile(converted_model, args)
        run(executable, args)
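Assuming the example above is saved as quick_start.py (a hypothetical file name), it could be run as follows:

# Convert, compile and run the built-in default model
python quick_start.py
# Run a previously exported PopEF file
python quick_start.py --popef executable.popef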