3. Model Runner deep dive through examples
The ModelRunner class is a lightweight wrapper around all the classes and mechanisms provided by the Model Runtime library. It allows a model stored in a PopEF file to be loaded and executed with minimal client effort.
Running a PopEF model with the ModelRunner class consists of two steps:
1. Create a ModelRunner object by providing either a list of PopEF files or an instance of Model. In this step, the IPU device partition is acquired, the given model is loaded onto a device, and all necessary threads and classes are created and stored inside the ModelRunner internal state. The user is able to pass a ModelRunnerConfig during ModelRunner object construction and set up several configuration options, for example the replication factor (replication_factor).
2. Use one of the available execution modes to send the request to the IPU (a minimal sketch of both steps follows the notes below).
Note
The ModelRunner object instance must be kept alive until the last send request has returned its result. Destroying the object stops execution and unloads the model from the IPU device; the state of any requests still being processed at the time of destruction is undefined.
Note
The files included by the examples presented in this chapter can be found in Section 9, Appendix. They contain helper functions, for example for processing command-line arguments.
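Putting the two steps together, a minimal Python sketch of this flow might look as follows. The PopEF path is a placeholder and the inputs are zero-filled; the examples later in this chapter show the complete flow, including output processing.

import numpy as np
import model_runtime

# Step 1: create the ModelRunner. This acquires an IPU partition and loads
# the model from the given PopEF file(s).
config = model_runtime.ModelRunnerConfig()
runner = model_runtime.ModelRunner(
    model_runtime.PopefPaths(["my_model.popef"]), config=config)

# Step 2: send a request using one of the execution modes (synchronous here).
input_descriptions = runner.getExecuteInputs()
input_tensors = [
    np.zeros(desc.shape, dtype=desc.numpy_data_type())
    for desc in input_descriptions
]
input_view = model_runtime.InputMemoryView()
for desc, tensor in zip(input_descriptions, input_tensors):
    input_view[desc.name] = tensor
result = runner.execute(input_view)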
3.1. Execution modes
The ModelRunner class provides two execution modes: synchronous (execute()) and asynchronous (executeAsync()). In the synchronous mode, the working thread is blocked until the result is available. In the asynchronous mode, the request is queued and a std::future object is returned. The result can be accessed as soon as the IPU finishes its computations; unlike the synchronous mode, the working thread is not blocked. In both modes, the user is responsible for the memory allocation of the input tensors.
All execute() and executeAsync() overloads take as a parameter an InputMemoryView that contains pointers to all input data. The user has to ensure that the input data exists and that the passed pointers remain valid until a result is returned. Each of execute() and executeAsync() comes in two flavors:
execute(const InputMemoryView &, unsigned),
execute(const InputMemoryView &, const OutputMemoryView &output_data, unsigned),
executeAsync(const InputMemoryView &, unsigned),
executeAsync(const InputMemoryView &, const OutputMemoryView &output_data, unsigned).
The difference between the overloads comes down to who is responsible for the memory allocation of the output tensors. There are two options available (illustrated in the sketch after this list):
ModelRunner allocates memory for the output and returns a TensorMemory instance for each output tensor.
The user allocates the output tensor memory and passes an OutputMemoryView to the chosen execution method; ModelRunner will place the result in the memory provided by the user.
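For instance, in the Python API the two options look roughly like this. This is a sketch assuming model_runner and input_view have already been prepared as in the full examples below.

import numpy as np
import model_runtime

# Option 1: ModelRunner allocates the output memory and returns it.
result = model_runner.execute(input_view)

# Option 2: the user allocates the output memory and ModelRunner fills it.
output_descriptions = model_runner.getExecuteOutputs()
output_tensors = [
    np.zeros(desc.shape, dtype=desc.numpy_data_type())
    for desc in output_descriptions
]
output_view = model_runtime.OutputMemoryView()
for desc, tensor in zip(output_descriptions, output_tensors):
    output_view[desc.name] = tensor
model_runner.execute(input_view, output_view)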
The client can find out which tensors the model accepts and returns by calling the getExecuteInputs() and getExecuteOutputs() methods of the ModelRunner class. These methods return a collection of DataDesc objects which contain basic information about each tensor: its name, shape, data type and size in bytes.
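For example, in Python the descriptions can be listed like this. This sketch assumes model_runner has been constructed as in the examples below; only the DataDesc fields used elsewhere in this chapter are printed.

for desc in model_runner.getExecuteInputs():
    print("input: ", desc.name, desc.shape, desc.numpy_data_type())
for desc in model_runner.getExecuteOutputs():
    print("output:", desc.name, desc.shape, desc.numpy_data_type())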
This C++ example sends inference requests to the IPU using all available execution modes.
1// Copyright (c) 2022 Graphcore Ltd. All rights reserved.
2#include <string>
3#include <vector>
4
5#include <boost/program_options.hpp>
6
7#include "model_runtime/ModelRunner.hpp"
8#include "model_runtime/Tensor.hpp"
9#include "utils.hpp"
10
11namespace examples {
12
13void synchronousExecutionModeLibraryAllocatedOutput(
14 model_runtime::ModelRunner &model_runner);
15void synchronousExecutionModeUserAllocatedOutput(
16 model_runtime::ModelRunner &model_runner);
17void asynchronousExecutionModeLibraryAllocatedOutput(
18 model_runtime::ModelRunner &model_runner);
19void asynchronousExecutionModeUserAllocatedOutput(
20 model_runtime::ModelRunner &model_runner);
21
22} // namespace examples
23
24/* The example shows loading a model from PopEF files and sending
25 * inference requests using all available ModelRunner execution modes.
26 */
27int main(int argc, char *argv[]) {
28 using namespace std::chrono_literals;
29 static const char *example_desc = "Model runner execution modes example.";
30 const boost::program_options::variables_map vm =
31 examples::parsePopefProgramOptions(example_desc, argc, argv);
32 const auto popef_paths = vm["popef"].as<std::vector<std::string>>();
33
34 model_runtime::ModelRunnerConfig config;
35 config.device_wait_config =
36 model_runtime::DeviceWaitConfig{600s /*timeout*/, 1s /*sleep_time*/};
37 model_runtime::ModelRunner model_runner(popef_paths, config);
38
39 examples::print("Running synchronous execution mode. The memory of the "
40 "output tensors is allocated by the ModelRunner object.");
41 examples::synchronousExecutionModeLibraryAllocatedOutput(model_runner);
42
43 examples::print("Running synchronous execution mode. The memory of the "
44 "output tensors is allocated by the user.");
45 examples::synchronousExecutionModeUserAllocatedOutput(model_runner);
46
47 examples::print("Running asynchronous execution mode. The memory of the "
48 "output tensors is allocated by the ModelRunner object.");
49 examples::asynchronousExecutionModeLibraryAllocatedOutput(model_runner);
50
51 examples::print("Running asynchronous execution mode. The memory of the "
52 "output tensors is allocated by the user.");
53 examples::asynchronousExecutionModeUserAllocatedOutput(model_runner);
54
55 examples::print("Success: exiting");
56 return EXIT_SUCCESS;
57}
58
59namespace examples {
60
61void synchronousExecutionModeLibraryAllocatedOutput(
62 model_runtime::ModelRunner &model_runner) {
63 examples::print("Allocating input tensors");
64 const model_runtime::InputMemory input_memory =
65 examples::allocateHostInputData(model_runner.getExecuteInputs());
66
67 examples::printInputMemory(input_memory);
68
69 examples::print("Sending single synchronous request with empty data. Output "
70 "allocated by ModelRunner.");
71
72 const model_runtime::OutputMemory output_memory =
73 model_runner.execute(examples::toInputMemoryView(input_memory));
74
75 examples::print("Received output allocated by ModelRunner:");
76 using ValueType = std::pair<const std::string, model_runtime::TensorMemory>;
77
78 for (const ValueType &name_with_memory : output_memory) {
79 auto &&[name, memory] = name_with_memory;
80 examples::print(fmt::format("Output tensor {}, {} bytes", name,
81 memory.data_size_bytes));
82 }
83}
84
85void synchronousExecutionModeUserAllocatedOutput(
86 model_runtime::ModelRunner &model_runner) {
87 examples::print("Allocating input tensors");
88 const model_runtime::InputMemory input_memory =
89 examples::allocateHostInputData(model_runner.getExecuteInputs());
90
91 examples::printInputMemory(input_memory);
92
93 examples::print("Allocating output tensors");
94 model_runtime::OutputMemory output_memory =
95 examples::allocateHostOutputData(model_runner.getExecuteOutputs());
96
97 examples::print("Sending single synchronous request with empty data.");
98
99 model_runner.execute(examples::toInputMemoryView(input_memory),
100 examples::toOutputMemoryView(output_memory));
101
102 examples::print("Received output allocated by the user:");
103
104 using ValueType = std::pair<const std::string, model_runtime::TensorMemory>;
105
106 for (const ValueType &name_with_memory : output_memory) {
107 auto &&[name, memory] = name_with_memory;
108 examples::print(fmt::format("Output tensor {}, {} bytes", name,
109 memory.data_size_bytes));
110 }
111}
112
113void asynchronousExecutionModeLibraryAllocatedOutput(
114 model_runtime::ModelRunner &model_runner) {
115 examples::print("Allocating input tensors");
116 const model_runtime::InputMemory input_memory =
117 examples::allocateHostInputData(model_runner.getExecuteInputs());
118
119 examples::printInputMemory(input_memory);
120
121 examples::print("Sending single asynchronous request with empty data. Output "
122 "allocated by ModelRunner.");
123
124 const model_runtime::OutputFutureMemory output_future_memory =
125 model_runner.executeAsync(examples::toInputMemoryView(input_memory));
126
127 examples::print("Waiting for output allocated by ModelRunner:");
128
129 using ValueType = std::pair<const std::string,
130 std::shared_future<model_runtime::TensorMemory>>;
131
132 for (const ValueType &name_with_future_memory : output_future_memory) {
133 auto &&[name, future_memory] = name_with_future_memory;
134 examples::print(fmt::format("Waiting for the result: tensor {}", name));
135 future_memory.wait();
136 const model_runtime::TensorMemory &memory = future_memory.get();
137 examples::print(fmt::format("Output tensor {} available, received {} bytes",
138 name, memory.data_size_bytes));
139 }
140}
141
142void asynchronousExecutionModeUserAllocatedOutput(
143 model_runtime::ModelRunner &model_runner) {
144 examples::print("Allocating input tensors");
145 const model_runtime::InputMemory input_memory =
146 examples::allocateHostInputData(model_runner.getExecuteInputs());
147
148 examples::printInputMemory(input_memory);
149
150 examples::print("Allocating output tensors");
151 model_runtime::OutputMemory output_memory =
152 examples::allocateHostOutputData(model_runner.getExecuteOutputs());
153
154 examples::print("Sending single asynchronous request with empty data.");
155
156 const model_runtime::OutputFutureMemoryView output_future_memory_view =
157 model_runner.executeAsync(examples::toInputMemoryView(input_memory),
158 examples::toOutputMemoryView(output_memory));
159
160 examples::print("Waiting for the output");
161
162 using ValueType =
163 std::pair<const std::string,
164 std::shared_future<model_runtime::TensorMemoryView>>;
165
166 for (const ValueType &name_with_future_memory_view :
167 output_future_memory_view) {
168 auto &&[name, future_memory_view] = name_with_future_memory_view;
169 examples::print(fmt::format("Waiting for the result: tensor {}", name));
170 future_memory_view.wait();
171 const model_runtime::TensorMemoryView &memory_view =
172 future_memory_view.get();
173 examples::print(fmt::format("Output tensor {} available, received {} bytes",
174 name, memory_view.data_size_bytes));
175 }
176}
177
178} // namespace examples
Download model_runner_execution_modes.cpp
This Python example sends inference requests to the IPU using all available execution modes.
1#!/usr/bin/env python3
2# Copyright (c) 2022 Graphcore Ltd. All rights reserved.
3
4import argparse
5from datetime import timedelta
6
7import numpy as np
8import model_runtime
9import popef
10"""
11The example shows loading a model from PopEF files and sending
12inference requests using all available ModelRunner execution modes.
13"""
14
15
16def main():
17 parser = argparse.ArgumentParser("Model runner simple example.")
18 parser.add_argument(
19 "-p",
20 "--popef",
21 type=str,
22 metavar='popef_file_path',
23 help="A collection of PopEF files containing the model.",
24 nargs='+',
25 required=True)
26 args = parser.parse_args()
27
28 # Create model runner
29 config = model_runtime.ModelRunnerConfig()
30 config.device_wait_config = model_runtime.DeviceWaitConfig(
31 model_runtime.DeviceWaitStrategy.WAIT_WITH_TIMEOUT,
32 timeout=timedelta(seconds=600),
33 sleepTime=timedelta(seconds=1))
34
35 print("Creating ModelRunner with", config)
36 model_runner = model_runtime.ModelRunner(model_runtime.PopefPaths(
37 args.popef),
38 config=config)
39
40 print("Preparing input tensors:")
41 input_descriptions = model_runner.getExecuteInputs()
42 input_tensors = [
43 np.random.randn(*input_desc.shape).astype(input_desc.numpy_data_type())
44 for input_desc in input_descriptions
45 ]
46 input_view = model_runtime.InputMemoryView()
47
48 for input_desc, input_tensor in zip(input_descriptions, input_tensors):
49 print("\tname:", input_desc.name, "shape:", input_tensor.shape,
50 "dtype:", input_tensor.dtype)
51 input_view[input_desc.name] = input_tensor
52
53 print("Running synchronous execution mode. The memory of the output "
54 "tensors is allocated by the ModelRunner object.")
55 synchronousExecutionModeLibraryAllocatedOutput(model_runner, input_view)
56
57 print("Running synchronous execution mode. The memory of the output "
58 "tensors is allocated by the user.")
59 synchronousExecutionModeUserAllocatedOutput(model_runner, input_view)
60
61 print("Running asynchronous execution mode. The memory of the output "
62 "tensors is allocated by the ModelRunner object.")
63 asynchronousExecutionModeLibraryAllocatedOutput(model_runner, input_view)
64
65 print("Running asynchronous execution mode. The memory of the output "
66 "tensors is allocated by the user.")
67 asynchronousExecutionModeUserAllocatedOutput(model_runner, input_view)
68
69 input_numpy = dict()
70 for input_desc, input_tensor in zip(input_descriptions, input_tensors):
71 input_numpy[input_desc.name] = input_tensor
72
73 print("Running synchronous execution mode. The input is a numpy array. "
74 "The memory of the output tensors is allocated by the ModelRunner "
75 "object.")
76 synchronousExecutionModeLibraryAllocatedNumpyInputOutput(
77 model_runner, input_numpy)
78
79 print("Running synchronous execution mode. The input and the output are "
80 "numpy arrays. The memory of the output tensors is allocated by the "
81 "user. ")
82 synchronousExecutionModeUserAllocatedNumpyInputOutput(
83 model_runner, input_numpy)
84
85 print(
86 "Running asynchronous execution mode. The input and the output are "
87 "numpy arrays . The memory of the output tensors is allocated by the "
88 "ModelRunner object.")
89 asynchronousExecutionModeLibraryAllocatedNumpyOutput(
90 model_runner, input_numpy)
91
92 print(
93 "Running asynchronous execution mode. The input and the output are "
94 "numpy arrays . The memory of the output tensors is allocated by the "
95 "user.")
96 asynchronousExecutionModeUserAllocatedNumpyOutput(model_runner,
97 input_numpy)
98
99 print("Success: exiting")
100 return 0
101
102
103def synchronousExecutionModeLibraryAllocatedOutput(model_runner, input_view):
104 print("Sending single synchronous request with random data. Output "
105 "allocated by ModelRunner.")
106 result = model_runner.execute(input_view)
107
108 output_descriptions = model_runner.getExecuteOutputs()
109 print("Processing output tensors:")
110 for output_desc in output_descriptions:
111 output_tensor = np.frombuffer(
112 result[output_desc.name],
113 dtype=output_desc.numpy_data_type()).reshape(output_desc.shape)
114 print("\tname:", output_desc.name, "shape:", output_tensor.shape,
115 "dtype:", output_tensor.dtype, "\n", output_tensor)
116
117
118def synchronousExecutionModeUserAllocatedOutput(model_runner, input_view):
119
120 output_descriptions = model_runner.getExecuteOutputs()
121 print("Preparing memory for output tensors")
122 output_tensors = [
123 np.zeros(output_desc.shape, dtype=output_desc.numpy_data_type())
124 for output_desc in output_descriptions
125 ]
126
127 print("Creating model_runtime.OutputMemoryView()")
128 output_view = model_runtime.OutputMemoryView()
129 for desc, tensor in zip(output_descriptions, output_tensors):
130 print("\tname:", desc.name, "shape:", tensor.shape, "dtype:",
131 tensor.dtype)
132 output_view[desc.name] = tensor
133
134 print("Sending single synchronous request with random data")
135 model_runner.execute(input_view, output_view)
136 print("Processing output tensors:")
137 for desc, tensor in zip(output_descriptions, output_tensors):
138 print("\tname:", desc.name, "shape", tensor.shape, "dtype",
139 tensor.dtype, "\n", tensor)
140
141
142def synchronousExecutionModeLibraryAllocatedNumpyInputOutput(
143 model_runner, numpy_input):
144
145 output_descriptions = model_runner.getExecuteOutputs()
146
147 print("Sending single synchronous request random data (numpy array)")
148 output_tensors = model_runner.execute(numpy_input)
149 print("Processing output tensors (numpy dict):")
150 for desc in output_descriptions:
151 tensor = output_tensors[desc.name]
152 print("\tname:", desc.name, "shape", tensor.shape, "dtype",
153 tensor.dtype, "\n", tensor)
154
155
156def synchronousExecutionModeUserAllocatedNumpyInputOutput(
157 model_runner, numpy_input):
158
159 output_descriptions = model_runner.getExecuteOutputs()
160 print("Preparing memory for output tensors")
161 numpy_output = {}
162 for output_desc in output_descriptions:
163 numpy_output[output_desc.name] = np.zeros(
164 output_desc.shape, dtype=output_desc.numpy_data_type())
165
166 print("Sending single synchronous request with random data")
167 model_runner.execute(numpy_input, numpy_output)
168 print("Processing output tensors (numpy dict):")
169 for desc in output_descriptions:
170 tensor = numpy_output[desc.name]
171 print("\tname:", desc.name, "shape", tensor.shape, "dtype",
172 tensor.dtype, "\n", tensor)
173
174
175def asynchronousExecutionModeLibraryAllocatedOutput(model_runner, input_view):
176
177 print("Sending single asynchronous request with random data. Output "
178 "allocated by ModelRunner.")
179 result = model_runner.executeAsync(input_view)
180
181 print("Waiting for output allocated by ModelRunner:")
182 result.wait()
183 print("Results available")
184
185 output_descriptions = model_runner.getExecuteOutputs()
186 print("Processing output tensors:")
187 for output_desc in output_descriptions:
188 output_tensor = np.frombuffer(
189 result[output_desc.name],
190 dtype=output_desc.numpy_data_type()).reshape(output_desc.shape)
191 print("\tname:", output_desc.name, "shape:", output_tensor.shape,
192 "dtype:", output_tensor.dtype, "\n", output_tensor)
193
194
195def asynchronousExecutionModeUserAllocatedOutput(model_runner, input_view):
196 output_descriptions = model_runner.getExecuteOutputs()
197 print("Preparing memory for output tensors")
198 output_tensors = [
199 np.zeros(output_desc.shape, dtype=output_desc.numpy_data_type())
200 for output_desc in output_descriptions
201 ]
202
203 print("Creating model_runtime.OutputMemoryView()")
204 output_view = model_runtime.OutputMemoryView()
205 for desc, tensor in zip(output_descriptions, output_tensors):
206 print("\tname:", desc.name, "shape:", tensor.shape, "dtype:",
207 tensor.dtype)
208 output_view[desc.name] = tensor
209
210 print("Sending single asynchronous request with random data")
211 future = model_runner.executeAsync(input_view, output_view)
212
213 print("Waiting for the output.")
214 future.wait()
215 print("Results available.")
216 print("Processing output tensors:")
217 for desc, tensor in zip(output_descriptions, output_tensors):
218 print("\tname:", desc.name, "shape", tensor.shape, "dtype",
219 tensor.dtype, "\n", tensor)
220
221
222def asynchronousExecutionModeLibraryAllocatedNumpyOutput(
223 model_runner, numpy_input):
224 print("Sending single asynchronous request with random data")
225 future = model_runner.executeAsync(numpy_input)
226
227 print("Waiting for the output.")
228 future.wait()
229 for desc in model_runner.getExecuteOutputs():
230 future_py_array = future[desc.name]
231
232 # Create a np.array copy from the future_py_array buffer
233 # using numpy() method.
234 tensor = future_py_array.numpy()
235 print("\tname:", desc.name, "shape", tensor.shape, "dtype",
236 tensor.dtype, "tensor id", id(tensor), "\n", tensor)
237
238 # Create a np.array copy from the future_py_array buffer
239 # (allocated by ModelRunner instance).
240 tensor_copy = np.array(future_py_array, copy=True)
241 print("Tensor copy", tensor_copy, "tensor id", id(tensor_copy))
242
243 # Avoid copying. Create a np.array view from the future_py_array buffer
244 # (allocated by ModelRunner instance).
245 tensor_view = np.array(future_py_array, copy=False)
246 print("Tensor view", tensor_view, "tensor id", id(tensor_view))
247
248 assert not np.shares_memory(tensor_view, tensor_copy)
249 assert not np.shares_memory(tensor, tensor_copy)
250 assert not np.shares_memory(tensor, tensor_view)
251
252
253def asynchronousExecutionModeUserAllocatedNumpyOutput(model_runner,
254 numpy_input):
255
256 output_descriptions = model_runner.getExecuteOutputs()
257 print("Preparing memory for output tensors")
258 numpy_output = {}
259 for output_desc in output_descriptions:
260 numpy_output[output_desc.name] = np.zeros(
261 output_desc.shape, dtype=output_desc.numpy_data_type())
262
263 print("Sending single asynchronous request with random data")
264 future = model_runner.executeAsync(numpy_input, numpy_output)
265
266 print("Waiting for the output.")
267 future.wait()
268 print("Results available.")
269 print("Processing output tensors:")
270 for desc in output_descriptions:
271 output_tensor = numpy_output[desc.name]
272 future_py_array_view = future[desc.name]
273
274 # Create a np.array view from the future_py_array_view using numpy()
275 # method, view points to np.array present in numpy_output dict
276 tensor_from_future_object = future_py_array_view.numpy()
277 print("\tname:", desc.name, "shape", tensor_from_future_object.shape,
278 "dtype", tensor_from_future_object.dtype, "\n",
279 tensor_from_future_object)
280 assert np.shares_memory(output_tensor, tensor_from_future_object)
281
282 # Create a np.array view from the future_py_array_view buffer, view
283 # points to np.array present in numpy_output dict
284 tensor_view = np.array(future_py_array_view, copy=False)
285 assert np.shares_memory(output_tensor, tensor_view)
286 assert np.shares_memory(tensor_from_future_object, tensor_view)
287
288 # Create a np.array copy from the future_py_array_view buffer
289 tensor_copy = np.array(future_py_array_view, copy=True)
290 assert not np.shares_memory(tensor_from_future_object, tensor_copy)
291 assert not np.shares_memory(output_tensor, tensor_copy)
292
293
294if __name__ == "__main__":
295 main()
3.2. Replication
The ModelRunner class allows you to specify the replication factor in the ModelRunnerConfig passed to its constructor. When this option is set, the ModelRunner object creates as many IPU model replicas as requested, provided that the required number of devices is available. Each execution method accepts an unsigned replica_id as its last parameter, which determines the replica to which the request is sent.
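A minimal Python sketch of this, assuming popef_paths and input_view have been prepared as in the earlier examples:

num_replicas = 2
config = model_runtime.ModelRunnerConfig()
config.replication_factor = num_replicas
runner = model_runtime.ModelRunner(model_runtime.PopefPaths(popef_paths),
                                   config=config)

# Send one synchronous request to each replica.
for replica_id in range(num_replicas):
    result = runner.execute(input_view, replica_id=replica_id)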
This example creates two replicas and sends inference requests to each of them using the C++ API.
1// Copyright (c) 2022 Graphcore Ltd. All rights reserved.
2#include <string>
3#include <unordered_map>
4#include <vector>
5
6#include <boost/program_options.hpp>
7
8#include "model_runtime/ModelRunner.hpp"
9#include "model_runtime/Tensor.hpp"
10#include "utils.hpp"
11
12/* The example shows loading a model from PopEF files, creating 2 model replicas
13 * and sending inference requests to each of them.
14 */
15int main(int argc, char *argv[]) {
16 static constexpr unsigned num_replicas = 2;
17
18 using namespace std::chrono_literals;
19 static const char *example_desc = "Model runner simple example.";
20 const boost::program_options::variables_map vm =
21 examples::parsePopefProgramOptions(example_desc, argc, argv);
22 const auto popef_paths = vm["popef"].as<std::vector<std::string>>();
23
24 model_runtime::ModelRunnerConfig config;
25 config.device_wait_config =
26 model_runtime::DeviceWaitConfig{600s /*timeout*/, 1s /*sleep_time*/};
27 examples::print(fmt::format(
28 "Setting model_runtime::ModelRunnerConfig replication_factor=",
29 num_replicas));
30
31 config.replication_factor = num_replicas;
32 model_runtime::ModelRunner model_runner(popef_paths, config);
33
34 for (unsigned replica_id = 0; replica_id < num_replicas; ++replica_id) {
35 examples::print("Allocating input tensors");
36 const model_runtime::InputMemory input_memory =
37 examples::allocateHostInputData(model_runner.getExecuteInputs());
38 examples::printInputMemory(input_memory);
39
40 examples::print(fmt::format(
41 "Sending single synchronous request with empty data - replica {}",
42 replica_id));
43
44 const model_runtime::OutputMemory output_memory = model_runner.execute(
45 examples::toInputMemoryView(input_memory), replica_id);
46
47 examples::print(fmt::format("Received output - replica {}", replica_id));
48
49 using OutputValueType =
50 std::pair<const std::string, model_runtime::TensorMemory>;
51
52 for (const OutputValueType &name_with_memory : output_memory) {
53 auto &&[name, memory] = name_with_memory;
54 examples::print(fmt::format("Output tensor {}, {} bytes", name,
55 memory.data_size_bytes));
56 }
57 }
58 examples::print("Success: exiting");
59 return EXIT_SUCCESS;
60}
Download model_runner_replication.cpp
This example creates two replicas and sends inference requests to each of them using the Python API.
1#!/usr/bin/env python3
2# Copyright (c) 2022 Graphcore Ltd. All rights reserved.
3
4import argparse
5from datetime import timedelta
6import numpy as np
7import model_runtime
8import popef
9"""
10The example shows loading a model from PopEF files, creating 2 model replicas
11and sending inference requests to each of them.
12"""
13
14
15def main():
16 parser = argparse.ArgumentParser("Model runner simple example.")
17 parser.add_argument(
18 "-p",
19 "--popef",
20 type=str,
21 metavar='popef_file_path',
22 help="A collection of PopEF files containing the model.",
23 nargs='+',
24 required=True)
25 args = parser.parse_args()
26
27 num_replicas = 2
28 # Create model runner
29 config = model_runtime.ModelRunnerConfig()
30 config.replication_factor = num_replicas
31 config.device_wait_config = model_runtime.DeviceWaitConfig(
32 model_runtime.DeviceWaitStrategy.WAIT_WITH_TIMEOUT,
33 timeout=timedelta(seconds=600),
34 sleepTime=timedelta(seconds=1))
35
36 print("Creating ModelRunner with", config)
37 runner = model_runtime.ModelRunner(model_runtime.PopefPaths(args.popef),
38 config=config)
39
40 input_descriptions = runner.getExecuteInputs()
41
42 input = model_runtime.InputMemoryView()
43
44 print("Preparing input tensors:")
45 input_descriptions = runner.getExecuteInputs()
46 input_tensors = [
47 np.random.randn(*input_desc.shape).astype(input_desc.numpy_data_type())
48 for input_desc in input_descriptions
49 ]
50 input_view = model_runtime.InputMemoryView()
51
52 for input_desc, input_tensor in zip(input_descriptions, input_tensors):
53 print("\tname:", input_desc.name, "shape:", input_tensor.shape,
54 "dtype:", input_tensor.dtype)
55 input_view[input_desc.name] = input_tensor
56
57 for replica_id in range(num_replicas):
58 print("Sending single synchronous request with empty data - replica",
59 replica_id, ".")
60 result = runner.execute(input_view, replica_id=replica_id)
61 output_descriptions = runner.getExecuteOutputs()
62
63 print("Processing output tensors - replica", replica_id, ":")
64 for output_desc in output_descriptions:
65 output_tensor = np.frombuffer(
66 result[output_desc.name],
67 dtype=output_desc.numpy_data_type()).reshape(output_desc.shape)
68 print("\tname:", output_desc.name, "shape:", output_tensor.shape,
69 "dtype:", output_tensor.dtype, "\n", output_tensor)
70
71 print("Success: exiting")
72 return 0
73
74
75if __name__ == "__main__":
76 main()
3.3. Multithreading
By default, ModelRunner is not thread-safe: if multiple threads call execute() or executeAsync() concurrently, this leads to race conditions and undefined behaviour. To use ModelRunner safely in a multithreaded environment, you must either apply appropriate synchronization mechanisms between the threads yourself, or set thread_safe in ModelRunnerConfig to true. With thread_safe enabled, every call to execute() or executeAsync() locks an internal std::mutex instance.
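If you prefer to keep thread_safe disabled, an external lock around the send calls achieves a similar effect. A minimal Python sketch; the lock and helper function here are illustrative, not part of the Model Runtime API:

import threading

runner_lock = threading.Lock()

def send_request(model_runner, input_view):
    # Serialise access to the non-thread-safe ModelRunner; only the call that
    # enqueues the request needs to be protected.
    with runner_lock:
        future = model_runner.executeAsync(input_view)
    # Waiting on the returned future can happen outside the lock, as in the
    # full examples where each worker waits on its own futures.
    future.wait()
    return future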
This example creates several threads and each one sends inference requests to the IPU using the C++ API.
1// Copyright (c) 2022 Graphcore Ltd. All rights reserved.
2#include <array>
3#include <string>
4#include <vector>
5
6#include <boost/program_options.hpp>
7
8#include "model_runtime/ModelRunner.hpp"
9#include "model_runtime/Tensor.hpp"
10#include "utils.hpp"
11
12namespace examples {
13
14void workerMain(model_runtime::ModelRunner &model_runner);
15
16} // namespace examples
17
18/* The example shows loading a model from PopEF files and sending inference
19 * requests to the same model by multiple threads.
20 */
21int main(int argc, char *argv[]) {
22 using namespace std::chrono_literals;
23 static const char *example_desc =
24 "Model runner multithreading client example.";
25 const boost::program_options::variables_map vm =
26 examples::parsePopefProgramOptions(example_desc, argc, argv);
27 const auto popef_paths = vm["popef"].as<std::vector<std::string>>();
28
29 model_runtime::ModelRunnerConfig config;
30 config.device_wait_config =
31 model_runtime::DeviceWaitConfig{600s /*timeout*/, 1s /*sleep_time*/};
32 examples::print(
33 "Setting model_runtime::ModelRunnerConfig: thread safe = true");
34 config.thread_safe = true;
35 model_runtime::ModelRunner model_runner(popef_paths, config);
36
37 static constexpr unsigned num_workers = 4;
38 std::vector<std::thread> threads;
39 threads.reserve(num_workers);
40
41 examples::print(fmt::format("Starting {} worker threads", num_workers));
42 for (unsigned i = 0; i < num_workers; i++) {
43 threads.emplace_back(examples::workerMain, std::ref(model_runner));
44 }
45
46 for (auto &worker : threads) {
47 worker.join();
48 };
49
50 examples::print("Success: exiting");
51 return EXIT_SUCCESS;
52}
53
54namespace examples {
55
56void workerMain(model_runtime::ModelRunner &model_runner) {
57 examples::print("Starting workerMain()");
58
59 static constexpr unsigned num_requests = 5;
60 std::array<model_runtime::InputMemory, num_requests> requests_input_data;
61
62 for (unsigned req_id = 0; req_id < num_requests; req_id++) {
63 examples::print(
64 fmt::format("Allocating input tensors - request id {}", req_id));
65 requests_input_data[req_id] =
66 examples::allocateHostInputData(model_runner.getExecuteInputs());
67 }
68
69 std::vector<model_runtime::OutputFutureMemory> results;
70
71 for (unsigned req_id = 0; req_id < num_requests; req_id++) {
72 examples::print(
73 fmt::format("Sending asynchronous request. Request id {}", req_id));
74 results.emplace_back(model_runner.executeAsync(
75 examples::toInputMemoryView(requests_input_data[req_id])));
76 }
77
78 examples::print("Waiting for output:");
79 for (unsigned req_id = 0; req_id < num_requests; req_id++) {
80 auto &output_future_memory = results[req_id];
81
82 using OutputValueType =
83 std::pair<const std::string,
84 std::shared_future<model_runtime::TensorMemory>>;
85 for (const OutputValueType &name_with_future_memory :
86 output_future_memory) {
87 auto &&[name, future_memory] = name_with_future_memory;
88 examples::print(fmt::format(
89 "Waiting for the result: tensor {}, request_id {}", name, req_id));
90 future_memory.wait();
91 const model_runtime::TensorMemory &memory = future_memory.get();
92 examples::print(fmt::format(
93 "Output tensor {} available, request_id {} received {} bytes", name,
94 req_id, memory.data_size_bytes));
95 }
96 }
97}
98
99} // namespace examples
Download model_runner_multithreading.cpp
This example creates several threads and each one sends inference requests to the IPU using the Python API.
1#!/usr/bin/env python3
2# Copyright (c) 2022 Graphcore Ltd. All rights reserved.
3
4import argparse
5import threading
6from datetime import timedelta
7import numpy as np
8import model_runtime
9import popef
10"""
11The example shows loading a model from PopEF files and sending inference
12requests to the same model by multiple threads.
13"""
14
15
16def main():
17 parser = argparse.ArgumentParser("Model runner simple example.")
18 parser.add_argument(
19 "-p",
20 "--popef",
21 type=str,
22 metavar='popef_file_path',
23 help="A collection of PopEF files containing the model.",
24 nargs='+',
25 required=True)
26 args = parser.parse_args()
27
28 config = model_runtime.ModelRunnerConfig()
29 config.thread_safe = True
30 config.device_wait_config = model_runtime.DeviceWaitConfig(
31 model_runtime.DeviceWaitStrategy.WAIT_WITH_TIMEOUT,
32 timeout=timedelta(seconds=600),
33 sleepTime=timedelta(seconds=1))
34
35 print("Creating ModelRunner with", config)
36 model_runner = model_runtime.ModelRunner(model_runtime.PopefPaths(
37 args.popef),
38 config=config)
39 num_workers = 4
40 print("Starting", num_workers, "worker threads.")
41 threads = [
42 threading.Thread(target=workerMain, args=(model_runner, worker_id))
43 for worker_id in range(num_workers)
44 ]
45
46 for thread in threads:
47 thread.start()
48
49 for thread in threads:
50 thread.join()
51
52 print("Success: exiting")
53 return 0
54
55
56def workerMain(model_runner, worker_id):
57 print("Worker", worker_id, "Starting workerMain()")
58 num_requests = 5
59
60 input_descriptions = model_runner.getExecuteInputs()
61 input_requests = []
62
63 print("Worker", worker_id, "Allocating input tensors for", num_requests,
64 "requests", input_descriptions)
65 for _ in range(num_requests):
66 input_requests.append([
67 np.random.randn(*input_desc.shape).astype(
68 input_desc.numpy_data_type())
69 for input_desc in input_descriptions
70 ])
71
72 futures = []
73
74 for req_id in range(num_requests):
75 print("Worker", worker_id, "Sending asynchronous request. Request id",
76 req_id)
77 input_view = model_runtime.InputMemoryView()
78 for input_desc, input_tensor in zip(input_descriptions,
79 input_requests[req_id]):
80 input_view[input_desc.name] = input_tensor
81 futures.append(model_runner.executeAsync(input_view))
82
83 print("Worker", worker_id, "Processing outputs.")
84 for req_id, future in enumerate(futures):
85 print("Worker", worker_id, "Waiting for the result - request", req_id)
86 future.wait()
87 print("Worker", worker_id, "Result available - request", req_id)
88
89
90if __name__ == "__main__":
91 main()
3.4. Frozen inputs
The ModelRunner class allows you to bind constant tensors by setting frozen_inputs in ModelRunnerConfig. frozen_inputs is an instance of InputMemoryView: the user allocates the data for the selected input tensors and passes pointers to it. If a frozen tensor is an input that would normally be required in each execution call, it is no longer required; the tensor from frozen_inputs is added to every request instead. If a frozen tensor is an input stored as PopEF tensor data or feed data, it is overridden by the tensor from frozen_inputs.
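A minimal Python sketch of freezing one input. It assumes the model contains an input named "tensor_B" (as in the examples below), that frozen_data is a NumPy array matching that tensor's shape and data type, and that model is a popef.Model loaded as in the Python example below.

frozen_inputs = model_runtime.InputMemoryView()
frozen_inputs["tensor_B"] = frozen_data  # user-allocated constant data

config = model_runtime.ModelRunnerConfig()
config.frozen_inputs = frozen_inputs

# tensor_B is no longer required in each execute() call; the frozen value is
# sent with every request instead.
model_runner = model_runtime.ModelRunner(model, config=config)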
Note
The examples in Listing 3.7 and Listing 3.8 rely on a PopEF file generated by the code in Listing 9.2.
This example binds a constant value to one of the inputs and sends inference requests to the IPU using the C++ API.
1// Copyright (c) 2022 Graphcore Ltd. All rights reserved.
2#include <algorithm>
3#include <string>
4#include <vector>
5
6#include <boost/program_options.hpp>
7
8#include <popef/Model.hpp>
9#include <popef/Reader.hpp>
10#include <popef/Types.hpp>
11
12#include "model_runtime/ModelRunner.hpp"
13#include "model_runtime/Tensor.hpp"
14#include "utils.hpp"
15
16namespace examples {
17
18std::shared_ptr<popef::Model>
19createPopefModel(const std::vector<std::string> &popef_paths);
20const popef::Anchor *findAnchor(const std::string &name, popef::Model *model);
21std::vector<float> createFrozenTensorData(const popef::Anchor *anchor);
22
23} // namespace examples
24
25/* The example shows loading a model from a PopEF file and binding a constant
26 * tensor value to one of the inputs. The example is based on the PopEF file
27 * generated by the `model_runtime_example_generate_simple_popef` example. The
28 * generated PopEF file contains a simple model:
29 *
30 * output = (A * weights) + B
31 *
32 * where A and B are stream inputs, weights is a tensor saved as
33 * popef::TensorData and output is the resulting stream output tensor.
34 */
35int main(int argc, char *argv[]) {
36 using namespace std::chrono_literals;
37 static const char *example_desc = "Model runner frozen inputs example.";
38 const boost::program_options::variables_map vm =
39 examples::parsePopefProgramOptions(example_desc, argc, argv);
40 const auto popef_paths = vm["popef"].as<std::vector<std::string>>();
41
42 std::shared_ptr<popef::Model> model = examples::createPopefModel(popef_paths);
43
44 static const std::string frozen_input_name = "tensor_B";
45
46 examples::print(fmt::format("Looking for tensor {} inside PopEF model.",
47 frozen_input_name));
48 const popef::Anchor *tensor_b_anchor =
49 examples::findAnchor(frozen_input_name, model.get());
50 examples::print(fmt::format("Found {}.", *tensor_b_anchor));
51
52 examples::print("Creating frozen input tensor data.");
53 const std::vector<float> tensor_b_data =
54 examples::createFrozenTensorData(tensor_b_anchor);
55
56 examples::print("Creating ModelRunnerConfig.");
57 model_runtime::ModelRunnerConfig config;
58
59 examples::print(fmt::format("Tensor {} is frozen - will be treated as "
60 "constant in each execution request.",
61 frozen_input_name));
62 const uint64_t tensor_b_size_in_bytes =
63 tensor_b_anchor->tensorInfo().sizeInBytes();
64
65 config.frozen_inputs = {
66 {frozen_input_name, model_runtime::ConstTensorMemoryView{
67 tensor_b_data.data(), tensor_b_size_in_bytes}}};
68
69 config.device_wait_config =
70 model_runtime::DeviceWaitConfig{600s /*timeout*/, 1s /*sleep_time*/};
71
72 model_runtime::ModelRunner model_runner(model, config);
73
74 examples::print("Allocating input tensors");
75
76 const model_runtime::InputMemory input_memory =
77 examples::allocateHostInputData(model_runner.getExecuteInputs());
78
79 examples::printInputMemory(input_memory);
80
81 examples::print("Sending single synchronous request with empty data.");
82 const model_runtime::OutputMemory output_memory =
83 model_runner.execute(examples::toInputMemoryView(input_memory));
84
85 examples::print("Received output:");
86
87 using ValueType = std::pair<const std::string, model_runtime::TensorMemory>;
88
89 for (const ValueType &name_with_memory : output_memory) {
90 auto &&[name, memory] = name_with_memory;
91 examples::print(fmt::format("Output tensor {}, {} bytes", name,
92 memory.data_size_bytes));
93 }
94
95 examples::print("Success: exiting");
96 return EXIT_SUCCESS;
97}
98
99namespace examples {
100
101std::shared_ptr<popef::Model>
102createPopefModel(const std::vector<std::string> &popef_paths) {
103 auto reader = std::make_shared<popef::Reader>();
104 for (const auto &path : popef_paths)
105 reader->parseFile(path);
106
107 return popef::ModelBuilder(reader).createModel();
108}
109
110const popef::Anchor *findAnchor(const std::string &name, popef::Model *model) {
111 const auto &anchors = model->metadata.anchors();
112
113 const auto anchor_it = std::find_if(
114 anchors.cbegin(), anchors.cend(),
115 [&](const popef::Anchor &anchor) { return anchor.name() == name; });
116
117 if (anchor_it == anchors.cend()) {
118 throw std::runtime_error(fmt::format(
119 "Anchor {} not found in given model. Please make sure that PopEF was "
120 "generated by `model_runtime_example_generate_simple_popef`.",
121 name));
122 }
123
124 if (auto anchorDataType = anchor_it->tensorInfo().dataType();
125 anchorDataType != popef::DataType::F32) {
126 throw std::runtime_error(fmt::format(
127 "Example expects anchor {} with popef::DataType::F32. Received {}",
128 name, anchorDataType));
129 }
130
131 return &(*anchor_it);
132}
133
134std::vector<float> createFrozenTensorData(const popef::Anchor *anchor) {
135 const auto size_in_bytes = anchor->tensorInfo().sizeInBytes();
136 const auto num_elements = size_in_bytes / sizeof(float);
137
138 return std::vector<float>(num_elements, 11.0f);
139}
140
141} // namespace examples
Download model_runner_frozen_inputs.cpp
This example binds a constant value to one of the inputs and sends inference requests to the IPU using the Python API.
1#!/usr/bin/env python3
2# Copyright (c) 2022 Graphcore Ltd. All rights reserved.
3
4import os
5import argparse
6from datetime import timedelta
7import numpy as np
8import model_runtime
9import popef
10"""
11The example shows loading a model from a PopEF file and binding a constant
12tensor value to one of the inputs. The example is based on the PopEF file
13generated by the `model_runtime_example_generate_simple_popef` example. The
14generated PopEF file contains a simple model:
15
16output = (A * weights) + B
17
18where A and B are stream inputs, weights is a tensor saved as popef::TensorData
19and output is the resulting stream output tensor.
20"""
21
22
23def main():
24 parser = argparse.ArgumentParser("Model runner simple example.")
25 parser.add_argument(
26 "-p",
27 "--popef",
28 type=str,
29 metavar='popef_file_path',
30 help="A collection of PopEF files containing the model.",
31 nargs='+',
32 required=True)
33 args = parser.parse_args()
34 model = load_model(args.popef)
35
36 frozen_input_name = "tensor_B"
37 print("Looking for tensor", frozen_input_name, "inside PopEF model.")
38 tensor_b_anchor = popef.Anchor()
39
40 for anchor in model.metadata.anchors():
41 if anchor.name() == frozen_input_name:
42 tensor_b_anchor = anchor
43 break
44 else:
45 raise Exception(f'Anchor {frozen_input_name} not found inside given '
46 'model. Please make sure that PopEF was generated by '
47 '`model_runtime_example_generate_simple_popef`')
48
49 print("Generating", frozen_input_name, "random values")
50 tensor_b_info = tensor_b_anchor.tensorInfo()
51 tensor_b = np.random.randn(*tensor_b_info.shape()).astype(
52 tensor_b_info.numpyDType())
53
54 config = model_runtime.ModelRunnerConfig()
55
56 frozen_inputs = model_runtime.InputMemoryView()
57 frozen_inputs[frozen_input_name] = tensor_b
58 config.frozen_inputs = frozen_inputs
59
60 print(
61 "Tensor", frozen_input_name, "is frozen - will be treated as "
62 "constant in each execution request.")
63 config.device_wait_config = model_runtime.DeviceWaitConfig(
64 model_runtime.DeviceWaitStrategy.WAIT_WITH_TIMEOUT,
65 timeout=timedelta(seconds=600),
66 sleepTime=timedelta(seconds=1))
67
68 model_runner = model_runtime.ModelRunner(model, config=config)
69
70 print("Preparing input tensors:")
71 input_descriptions = model_runner.getExecuteInputs()
72 input_tensors = [
73 np.random.randn(*input_desc.shape).astype(input_desc.numpy_data_type())
74 for input_desc in input_descriptions
75 ]
76 input_view = model_runtime.InputMemoryView()
77
78 for input_desc, input_tensor in zip(input_descriptions, input_tensors):
79 print("\tname:", input_desc.name, "shape:", input_tensor.shape,
80 "dtype:", input_tensor.dtype)
81 input_view[input_desc.name] = input_tensor
82
83 print("Sending single synchronous request with empty data.")
84 result = model_runner.execute(input_view)
85 output_descriptions = model_runner.getExecuteOutputs()
86
87 print("Processing output tensors:")
88 for output_desc in output_descriptions:
89 output_tensor = np.frombuffer(
90 result[output_desc.name],
91 dtype=output_desc.numpy_data_type()).reshape(output_desc.shape)
92 print("\tname:", output_desc.name, "shape:", output_tensor.shape,
93 "dtype:", output_tensor.dtype, "\n", output_tensor)
94
95 print("Success: exiting")
96
97 return 0
98
99
100def load_model(popef_paths):
101 for model_file in popef_paths:
102 assert os.path.isfile(model_file) is True
103 reader = popef.Reader()
104 reader.parseFile(model_file)
105
106 meta = reader.metadata()
107 exec = reader.executables()
108 return popef.ModelBuilder(reader).createModel()
109
110
111if __name__ == "__main__":
112 main()