10. C++ API reference
10.1. High-level API
10.1.1. Device management
-
enum class model_runtime::DeviceWaitStrategy
Defines the different options for waiting for a device to become available.
Values:
-
enumerator NO_WAIT
An exception will be thrown if no IPU device is immediately available.
-
enumerator WAIT_WITH_TIMEOUT
The device manager will wait for a specified amount of time for an IPU device to become available.
The device manager will try to attach to the required device at a specified interval.
-
enumerator WAIT_FOREVER
The device manager will wait until an IPU device is available.
The device manager will try to attach to the required device at a specified interval.
-
struct DeviceWaitConfig
The configuration of how to wait for the expected device.
Public Functions
-
constexpr DeviceWaitConfig() = default
Constructor with default values.
-
inline constexpr DeviceWaitConfig(DeviceWaitStrategy p_strategy, std::chrono::seconds p_timeout = std::chrono::seconds{0}, std::chrono::seconds p_sleepTime = std::chrono::seconds{15})
Constructor with specified device waiting strategy.
- Parameters
p_strategy – [in] The device waiting strategy.
p_timeout – [in] The time in seconds to wait for a device. Only required if p_strategy is DeviceWaitStrategy::WAIT_WITH_TIMEOUT.
p_sleepTime – [in] The time in seconds between attach attempts. Required if p_strategy is DeviceWaitStrategy::WAIT_WITH_TIMEOUT or DeviceWaitStrategy::WAIT_FOREVER.
-
inline explicit constexpr DeviceWaitConfig(std::chrono::seconds p_timeout, std::chrono::seconds p_sleepTime = std::chrono::seconds{15})
Constructor with device waiting strategy DeviceWaitStrategy::WAIT_WITH_TIMEOUT.
This means that the device manager will wait for a finite amount of time, p_timeout, for a device to become available.
- Parameters
p_timeout – [in] The time in seconds to wait for a device.
p_sleepTime – [in] The time in seconds between attach attempts.
Public Members
-
DeviceWaitStrategy strategy = {DeviceWaitStrategy::NO_WAIT}
The device waiting strategy.
-
std::chrono::seconds timeout = {0}
The time in seconds to wait for if no device is currently available.
-
std::chrono::seconds sleepTime = {15}
The time in seconds between attach attempts.
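The interaction between strategy, timeout and sleepTime can be sketched as follows. This is a minimal illustration, not the library's implementation: the types are local stand-ins that mirror the declarations above, attachWithWait and its tryAttach argument are hypothetical, and the exact moment at which the timeout expires is an assumption. The sketch does not sleep; it only accounts for the time that would have been spent waiting.

```cpp
#include <chrono>
#include <functional>
#include <stdexcept>

// Local stand-ins mirroring the documented declarations (not the real headers).
enum class DeviceWaitStrategy { NO_WAIT, WAIT_WITH_TIMEOUT, WAIT_FOREVER };

struct DeviceWaitConfig {
  constexpr DeviceWaitConfig() = default;
  constexpr DeviceWaitConfig(DeviceWaitStrategy p_strategy,
                             std::chrono::seconds p_timeout = std::chrono::seconds{0},
                             std::chrono::seconds p_sleepTime = std::chrono::seconds{15})
      : strategy(p_strategy), timeout(p_timeout), sleepTime(p_sleepTime) {}
  explicit constexpr DeviceWaitConfig(std::chrono::seconds p_timeout,
                                      std::chrono::seconds p_sleepTime = std::chrono::seconds{15})
      : strategy(DeviceWaitStrategy::WAIT_WITH_TIMEOUT),
        timeout(p_timeout), sleepTime(p_sleepTime) {}

  DeviceWaitStrategy strategy = DeviceWaitStrategy::NO_WAIT;
  std::chrono::seconds timeout{0};
  std::chrono::seconds sleepTime{15};
};

// Hypothetical helper: calls tryAttach until it succeeds or the strategy gives up.
// Returns the number of attach attempts that were made.
int attachWithWait(const DeviceWaitConfig &cfg, const std::function<bool()> &tryAttach) {
  if (cfg.strategy != DeviceWaitStrategy::NO_WAIT &&
      cfg.sleepTime <= std::chrono::seconds{0}) {
    throw std::invalid_argument("sleepTime must be positive when waiting");
  }
  std::chrono::seconds waited{0};
  for (int attempts = 1;; ++attempts) {
    if (tryAttach()) return attempts;
    if (cfg.strategy == DeviceWaitStrategy::NO_WAIT) {
      throw std::runtime_error("no device immediately available");
    }
    // Assumption: give up once the next sleep would exceed the timeout.
    if (cfg.strategy == DeviceWaitStrategy::WAIT_WITH_TIMEOUT &&
        waited + cfg.sleepTime > cfg.timeout) {
      throw std::runtime_error("timed out waiting for a device");
    }
    waited += cfg.sleepTime;  // a real implementation would sleep here
  }
}
```

With a 30 second timeout and the default 15 second sleep time, this model makes at most three attach attempts before it gives up.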
-
struct DeviceConstraints
Requirements for the device that the user wants to connect to.
Public Functions
-
constexpr DeviceConstraints() = default
Constructor with default values.
-
inline explicit constexpr DeviceConstraints(bool p_requiresRemoteBuffersSupport)
Constructor with specified values.
- Parameters
p_requiresRemoteBuffersSupport – [in] If true, the device has to support remote buffers. If false, the device does not have to support remote buffers. Default: false.
-
inline constexpr operator bool() const
Check whether the selection of the device has any constraints.
If true, the device has constraints; if false, it does not.
Public Members
-
bool requiresRemoteBuffersSupport = false
Specify that the device has to support remote buffers.
If true, the device has to support remote buffers. If false, the device does not have to support remote buffers. Default: false.
-
class Device
Create a device.
This is a wrapper around a Poplar device.
Public Functions
-
Device(poplar::Device device, int64_t ipu_version)
Constructor with specified values.
- Parameters
device – [in] The Poplar device. This is a device that can execute code.
ipu_version – [in] The architecture version of the IPU.
-
int64_t ipuVersion() const
Get the architecture version of the device.
- Returns
The architecture version of the IPU or -1 if unknown.
Protected Functions
-
bool isActiveSession(const Session *session) const
Get whether a session is the active session for this device.
- Returns
True if the session is the active session for this device, false otherwise.
-
void bindToSession(Session *session)
Bind the device to a session.
If the device is bound to a different session then unbindSession() is called on that session first.
- Parameters
session – [in] The new session to bind the device to.
-
void unbindSession()
Unbind the device from the active session.
Friends
- friend class Session
-
class DeviceManager
Select which device to run on.
Public Functions
-
DeviceManager &operator=(const DeviceManager &other) = delete
-
DeviceManager &operator=(DeviceManager&&) = delete
-
DeviceManager()
Constructor with default values.
-
DeviceManager(const DeviceManager&) = default
Default copy constructor.
-
DeviceManager(DeviceManager&&) = default
Default move constructor.
-
int64_t ipuHardwareVersion()
- Returns
Either:
The architecture version of the IPU in the system. For example, 2 for the Mk2 IPU (GC200 or Bow) or 21 for the Mk2 IPU with FP8 support (C600).
-1 if there is an IPU but the architecture is unknown.
0 if there is no IPU in the system.
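The return codes listed above can be turned into readable text with a small helper. This is only a sketch: describeIpuHardwareVersion is a hypothetical function, not part of the library, and it handles only the values named in this section.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: maps the documented return codes to a description.
std::string describeIpuHardwareVersion(std::int64_t version) {
  switch (version) {
    case 0:
      return "no IPU in the system";
    case -1:
      return "IPU present, architecture unknown";
    case 2:
      return "Mk2 IPU (GC200 or Bow)";
    case 21:
      return "Mk2 IPU with FP8 support (C600)";
    default:
      // Any other value is reported as-is; its meaning is not documented here.
      return "IPU architecture version " + std::to_string(version);
  }
}
```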
Get a device matching the configuration needed by the given model.
- Parameters
model – [in] A PopEF model. This method gets the number of IPUs and the option flags from the model.
wait_config – [in] The configuration of how to wait for the requested device.
- Returns
The device, if attachment is successful, otherwise throws an error.
Try to get a device matching the configuration needed by the given model.
- Parameters
model – [in] A PopEF model. This method gets the number of IPUs and the option flags from the model.
wait_config – [in] The configuration of how to wait for the requested device.
- Returns
The device, if attachment is successful, otherwise a nullptr.
-
std::shared_ptr<Device> getDevice(int64_t num_ipus, const poplar::OptionFlags &device_options = {}, const DeviceWaitConfig &wait_config = {}, const DeviceConstraints &constrains = {})
Get a device matching the requested configuration.
- Parameters
num_ipus – [in] The number of IPUs the device must contain.
device_options – [in] The device options.
wait_config – [in] The configuration of how to wait for the expected device.
constrains – [in] The set of constraints that the device must meet.
- Returns
The device, if attachment is successful, otherwise throws an error.
-
std::shared_ptr<Device> tryGetDevice(int64_t num_ipus, const poplar::OptionFlags &device_options = {}, const DeviceWaitConfig &wait_config = {}, const DeviceConstraints &constrains = {})
Try to get a device matching the requested configuration.
- Parameters
num_ipus – [in] The number of IPUs the device must contain.
device_options – [in] The device options.
wait_config – [in] The configuration of how to wait for the expected device.
constrains – [in] The set of constraints that the device must meet.
- Returns
The device, if attachment is successful, otherwise a nullptr.
Get a specific device matching the requested configuration.
- Parameters
device_id – [in] The ID of the device to acquire.
model – [in] A PopEF model. This method gets the option flags from the model.
wait_config – [in] The configuration of how to wait for the expected device.
- Returns
The device, if attachment is successful, otherwise throws an error.
-
std::shared_ptr<Device> getSpecificDevice(int64_t device_id, const poplar::OptionFlags &device_options = {}, const DeviceWaitConfig &wait_config = {})
Get a specific device matching the requested configuration.
- Parameters
device_id – [in] The ID of the device to acquire.
device_options – [in] The device options.
wait_config – [in] The configuration of how to wait for the expected device.
- Returns
The device, if attachment is successful, otherwise throws an error.
Try to get a specific device matching the requested configuration.
- Parameters
device_id – [in] The ID of the device to acquire.
model – [in] A PopEF model. This method gets the option flags from the model.
wait_config – [in] The configuration of how to wait for the expected device.
- Returns
The device, if attachment is successful, otherwise a nullptr.
-
std::shared_ptr<Device> tryGetSpecificDevice(int64_t device_id, const poplar::OptionFlags &device_options = {}, const DeviceWaitConfig &wait_config = {})
Try to get a specific device matching the requested configuration.
- Parameters
device_id – [in] The ID of the device to acquire.
device_options – [in] The device options.
wait_config – [in] The configuration of how to wait for the expected device.
- Returns
The device, if attachment is successful, otherwise a nullptr.
-
std::shared_ptr<Device> createIpuModelDevice(int64_t num_ipus, int64_t ipu_version = 2, int64_t tiles_per_ipu = 0)
Create a model of the device.
- Parameters
num_ipus – [in] The number of IPUs the device must contain.
ipu_version – [in] The target architecture version of the IPU.
tiles_per_ipu – [in] The number of tiles per IPU the model will have. If 0, defaults to the number of tiles for the IPU architecture.
- Returns
The model of the device.
Create a model of the device.
- Parameters
model – [in] A PopEF model. This method gets the number of IPUs and the IPU architecture version from the model.
tiles_per_ipu – [in] The number of tiles per IPU the model will have. If 0, defaults to the number of tiles for the chosen IPU architecture.
- Returns
The model of the device.
-
std::shared_ptr<Device> createSmallIpuModelDevice(int64_t num_ipus, int64_t ipu_version = 2)
Create a small IPU model of the device.
Small IPU models only have 4 tiles: they are much quicker to create and run than a full size model but can only run small programs.
- Parameters
num_ipus – [in] The number of IPUs the device must contain.
ipu_version – [in] The target architecture version of the IPU.
- Returns
A device with the model that was created.
Create a small IPU model of the device.
Small IPU models only have 4 tiles: they are much quicker to create and run than a full size model but can only run small programs.
- Parameters
model – [in] A PopEF model.
- Returns
A device with the model attached.
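The difference between the get* and tryGet* families is how failure is reported: an exception versus a null pointer. The sketch below illustrates that contract with local stand-ins. The Device struct, the available_ipus parameter and the idea that one function wraps the other are assumptions made for illustration, not the library's implementation.

```cpp
#include <cstdint>
#include <memory>
#include <stdexcept>

// Stand-in device type; the real Device wraps a Poplar device.
struct Device {
  std::int64_t numIpus;
};

// Stand-in for a tryGet*-style call: returns nullptr when nothing can be attached.
// available_ipus is a hypothetical parameter that replaces real device discovery.
std::shared_ptr<Device> tryGetDevice(std::int64_t num_ipus, std::int64_t available_ipus) {
  if (num_ipus <= 0 || num_ipus > available_ipus) return nullptr;
  return std::make_shared<Device>(Device{num_ipus});
}

// Stand-in for a get*-style call: same request, but failure becomes an exception.
std::shared_ptr<Device> getDevice(std::int64_t num_ipus, std::int64_t available_ipus) {
  auto device = tryGetDevice(num_ipus, available_ipus);
  if (!device) throw std::runtime_error("could not attach to a device");
  return device;
}
```

Callers that can fall back to another configuration may prefer the tryGet* form and test the returned pointer; callers that cannot proceed without a device may prefer the throwing form.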
10.1.2. Tensor memory representation
-
struct TensorMemoryView
Mutable view to already allocated memory.
-
struct ConstTensorMemoryView
Immutable view to already allocated memory.
Public Functions
-
ConstTensorMemoryView() = default
Default constructor.
-
ConstTensorMemoryView(const TensorMemoryView &other)
Converting constructor: creates an immutable view of the memory referenced by a mutable view.
-
ConstTensorMemoryView(const void *data, uint64_t data_size_bytes)
Immutable view to const memory.
- Parameters
data – [in] The pointer to the allocated memory.
data_size_bytes – [in] The size of the memory block, in bytes.
-
struct TensorMemory
Tensor memory manager responsible for allocating, storing, sharing and releasing tensor memory.
Public Functions
-
TensorMemory() = default
Default constructor.
-
TensorMemory(int64_t data_size_bytes)
Allocate the user-requested memory block and store it in a shared_ptr.
- Parameters
data_size_bytes – [in] The size of the memory block, in bytes.
-
TensorMemoryView getView()
Get the mutable memory view.
-
ConstTensorMemoryView getConstView()
Get the immutable memory view.
-
ConstTensorMemoryView getView() const
Get the immutable memory view.
10.1.3. ModelRunner
-
using model_runtime::InputMemoryView = std::unordered_map<std::string, ConstTensorMemoryView>
Mapping between a tensor name and an immutable memory view.
Used as input to ModelRunner::execute.
-
using model_runtime::OutputMemoryView = std::unordered_map<std::string, TensorMemoryView>
Mapping between a tensor name and a memory view.
Used as output from ModelRunner::execute, when the output memory is allocated and managed by the ModelRunner client.
-
using model_runtime::OutputMemory = std::unordered_map<std::string, TensorMemory>
Mapping between a tensor name and memory.
Used as output from ModelRunner::execute when memory is allocated during execution by the library.
-
using model_runtime::OutputFutureMemoryView = std::unordered_map<std::string, std::shared_future<TensorMemoryView>>
Mapping between a tensor name and a future memory view.
Used as output from the asynchronous ModelRunner::executeAsync when the output memory is allocated and managed by the ModelRunner client.
-
using model_runtime::OutputFutureMemory = std::unordered_map<std::string, std::shared_future<TensorMemory>>
Mapping between a tensor name and a future memory.
Used as output from the asynchronous ModelRunner::executeAsync when memory is allocated during execution by the library.
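The name-to-view maps above do not own any memory; they only point at buffers that the caller keeps alive. The following sketch shows how such maps might be built from std::vector buffers. The view structs are local stand-ins, their member names (data, data_size_bytes) are assumed from the constructor parameters, and constViewOf and viewOf are hypothetical helpers.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-ins for the documented view types: a pointer plus a size in bytes.
struct TensorMemoryView {
  void *data;
  std::uint64_t data_size_bytes;
};

struct ConstTensorMemoryView {
  ConstTensorMemoryView() = default;
  ConstTensorMemoryView(const void *p_data, std::uint64_t p_size)
      : data(p_data), data_size_bytes(p_size) {}
  // Converting constructor from a mutable view.
  ConstTensorMemoryView(const TensorMemoryView &other)
      : data(other.data), data_size_bytes(other.data_size_bytes) {}

  const void *data = nullptr;
  std::uint64_t data_size_bytes = 0;
};

using InputMemoryView = std::unordered_map<std::string, ConstTensorMemoryView>;
using OutputMemoryView = std::unordered_map<std::string, TensorMemoryView>;

// Hypothetical helpers: wrap a vector without copying it.
template <typename T>
ConstTensorMemoryView constViewOf(const std::vector<T> &buffer) {
  return ConstTensorMemoryView(buffer.data(), buffer.size() * sizeof(T));
}

template <typename T>
TensorMemoryView viewOf(std::vector<T> &buffer) {
  return TensorMemoryView{buffer.data(), buffer.size() * sizeof(T)};
}
```

Because the views are non-owning, the vectors must outlive every map entry that refers to them, and must not be resized while a view is in use.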
-
struct DataDesc
The description of the data used by ModelRunner.
Public Functions
-
DataDesc(std::string name, int64_t size_in_bytes, std::vector<int64_t> shape, popef::DataType data_type, bool popef_contains_tensor_data = false)
Create a description of input or output data.
- Parameters
name – [in] The name of the input or output tensor.
size_in_bytes – [in] The size of the tensor measured in bytes.
shape – [in] A vector defining the shape of the tensor. The size of the vector is equal to the number of tensor dimensions. Each element of the vector indicates the size of the corresponding dimension.
data_type – [in] The data type of a single tensor element.
popef_contains_tensor_data – [in] If true, the model has a tensor data blob associated with the tensor. If false, the model does not have a tensor data blob associated with the tensor. Default: false.
Public Members
-
std::string name
The name of the input or output tensor.
-
int64_t size_in_bytes
The size of the tensor measured in bytes.
-
std::vector<int64_t> shape
A vector defining the shape of the tensor.
The size of the vector is equal to the number of tensor dimensions. Each element of the vector indicates the size of the corresponding dimension.
-
bool popef_contains_tensor_data
If true, the model has a tensor data blob associated with the tensor.
If false, the model does not have a tensor data blob associated with the tensor.
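For a densely packed tensor, size_in_bytes is expected to equal the product of the shape entries multiplied by the size of one element. The sketch below computes that value. denseSizeInBytes is a hypothetical helper, and dense packing without padding is an assumption.

```cpp
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: bytes needed for a densely packed tensor.
// An empty shape is treated as a scalar, so the result is one element.
std::int64_t denseSizeInBytes(const std::vector<std::int64_t> &shape,
                              std::int64_t element_size_bytes) {
  return std::accumulate(shape.begin(), shape.end(), element_size_bytes,
                         std::multiplies<std::int64_t>());
}
```

For example, a tensor of shape {2, 3, 4} with 4-byte elements needs 2 * 3 * 4 * 4 = 96 bytes.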
-
using model_runtime::InputDesc = DataDesc
Description of input data required by ModelRunner.
-
using model_runtime::OutputDesc = DataDesc
Description of output data required by ModelRunner.
-
using model_runtime::ReplicaIdToOutputMemoryView = std::unordered_map<unsigned, OutputMemoryView>
Mapping of replicas to the memory allocated for the output tensors required by ModelRunner.
-
using model_runtime::ReplicaIdToDevice = std::unordered_map<unsigned, std::shared_ptr<Device>>
Mapping of replicas to physical entities that can execute the IPU programs.
-
struct ModelRunnerConfig
ModelRunner configuration options.
Public Members
-
unsigned replication_factor = 1
Number of replicas to be created.
-
bool run_save_programs = false
If true, “save” programs will be called on ModelRunner instance destruction.
If false, “save” programs will not be called. Default: false.
-
bool thread_safe = false
If true, the mutex will be locked on each execution call.
If false, the mutex will not be locked. By default the model runner is not thread-safe and each replica has an independent mutex. Default: false.
-
InputMemoryView frozen_inputs = {}
Map of the user-data required by the model when it is loaded onto hardware as well as any data that will be considered as constant during model execution.
This allows the overwriting of tensor data saved inside a PopEF file.
-
ReplicaIdToOutputMemoryView replica_to_save_programs_outputs = {}
Mapping between replica ID and OutputMemoryView.
The PopEF data format allows for the creation of “save” programs that will be executed on ModelRunner instance destruction, if required. This option allows you to get the data returned by these “save” programs.
-
ReplicaIdToDevice replica_to_device = {}
Mapping between replica ID and Device.
This allows you to set a specific device for a given replica. By default, the model runner assigns devices automatically.
-
DeviceWaitConfig device_wait_config = {}
By default, the model runner throws an exception when it is not able to attach to any device required by the given model.
This behaviour can be changed by setting a custom DeviceWaitConfig.
-
bool check_package_hash = true
If true, the Poplar hash will be checked before the executable is loaded onto the device.
If false, this check is not done. Default: true.
-
std::chrono::nanoseconds timeout_ns = std::chrono::seconds(5)
Duration in nanoseconds before the timeout callback is called when the IPU is waiting for input data that is not available.
If 0, the timeout callback is never called; in other words, the IPU waits forever for the data.
-
bool validate_io_params = true
If true, the I/O parameters will be checked when the ModelRunner “execute” functions are run.
If false, this check is not done. Default: true.
-
unsigned batching_dim = std::numeric_limits<unsigned>::max()
Enables dynamic batch sizing and specifies the dimension of the input and output data that contains the dynamic batch size.
By default, dynamic batch sizing is disabled and ModelRunner can only accept inputs and outputs where the batch size is an integer multiple of the batch size defined in the PopEF model.
Dynamic batch sizing is disabled when batching_dim is set to 0xFFFFFFFF.
To enable dynamic batch sizing, set batching_dim to the value of the dimension that contains the dynamic batch size. This value can be any positive integer less than the maximum dimension of the model. ModelRunner will accept the batch size specified in the tensor dimension specified by batching_dim. The batch size can now be any value.
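The batch-size rule described above can be sketched as a pair of checks. The helper names are hypothetical, the sketch assumes a 32-bit unsigned type (so that the default value equals 0xFFFFFFFF), and it models only the rule stated in this section, not the validation the library actually performs.

```cpp
#include <cstdint>
#include <limits>

// Default value of batching_dim; equal to 0xFFFFFFFF when unsigned is 32 bits wide.
constexpr unsigned kBatchingDisabled = std::numeric_limits<unsigned>::max();

// Hypothetical helper: dynamic batch sizing is on for any value other than the default.
bool dynamicBatchingEnabled(unsigned batching_dim) {
  return batching_dim != kBatchingDisabled;
}

// Hypothetical helper: with dynamic batching disabled, the request batch size
// must be an integer multiple of the batch size stored in the PopEF model.
bool batchSizeAccepted(std::int64_t batch, std::int64_t model_batch, unsigned batching_dim) {
  if (batch <= 0 || model_batch <= 0) return false;
  if (dynamicBatchingEnabled(batching_dim)) return true;
  return batch % model_batch == 0;
}
```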
-
bool auto_reset = false
If true, the IPU will be reset automatically before the next inference when:
an application runtime error occurs, or
a recoverable error occurs and the RecoveryAction is IPU_RESET.
If false, an exception will be raised and no action will be taken. Default: false.
-
std::string max_look_ahead = "unlimited"
The number of host synchronization points that Model Runtime is allowed to prepare in advance.
This prevents the IPU from being idle.
Possible values:
"unlimited": Model Runtime decides the best value. (Default)
"x": An unsigned integer value.
The default value of “unlimited” usually offers the best performance. If there are deadlocks or other timeouts which are not the result of other errors, set this value to 0.
Possible use case for max_look_ahead set to 0:
The model is compiled with prefetch enabled.
The number of device iterations is greater than 1.
A single input to the execute function is not enough to run a number of iterations equal to the device iterations.
An alternative to setting max_look_ahead to 0 is to use executeAsync to handle requests. In this case, the application must know when the data will be ready to read.
-
unsigned ring_buffer_size_multiplier = 2
The multiplier used to determine the size of the ring buffer.
The ring buffer size is given by ring_buffer_size_multiplier * batch size.
-
bool flush_on_waiting_outputs = false
If true, dummy data is flushed only when a user request is waiting for outputs from the queues; otherwise, the callback waits forever for user-requested data.
If false, the dummy data is flushed regardless.
-
std::chrono::nanoseconds batch_size_timeout_ns = std::chrono::nanoseconds::max()
Duration in nanoseconds before the timeout callback is called when the IPU is waiting for input data to compose a batch to load.
If max(), the callback will never be called, in other words, the model will wait forever for the data.
-
std::chrono::nanoseconds data_parallel_timeout_ns = std::chrono::nanoseconds::max()
Duration in nanoseconds before the timeout callback is called when the IPU is waiting in parallel for input data.
If max(), the callback will never be called, in other words, the model will wait forever for the data.
-
bool is_batch_size_timeout_enabled = false
If true, check the batch_size_timeout_ns and data_parallel_timeout_ns in the timeout callback when the IPU is waiting for input data which doesn’t arrive.
-
size_t request_tracepoints_buffer_size = 1000
Size of the request tracepoint buffer.
If set to 0, the request tracepoint buffer size will be infinite. Note: if flush_on_waiting_outputs is set to false, no request tracepoints will be recorded.
-
std::function<void(const std::string &tensor_id, const InputMemoryView *&inputs, const OutputMemoryView *&outputs)> flush_callback = nullptr
The config option that specifies the pointer to the callback function for flushing data that is not available.
flush_callback indicates to Model Runtime which data to flush.
If specified, the flush_callback function will be called once when the IPU has timed out after waiting for input data that is not available.
The application that defines the flush_callback function must ensure that the inputs and outputs it specifies for Model Runtime to flush are correct. Otherwise the results of this and following inferences will be corrupted.
- Param tensor_id
[in] The ID of the tensor that Model Runtime was waiting for when the timeout occurred.
- Param inputs
[in] The pointer to the input data structure to be flushed. Model Runtime uses this to determine which inputs should be flushed after the flush_callback function is called.
- Param outputs
[in] The pointer to the output data structure to be flushed. Model Runtime uses this to determine which outputs should be flushed after the flush_callback function is called.
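A callback with the flush_callback signature might look like the sketch below. The view and map types are local stand-ins, makeFlushCallback is a hypothetical factory, and the idea that the callback reports its choice by assigning the two pointer references is an assumption drawn from the signature, not confirmed behaviour.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Stand-ins for the documented view and map types.
struct ConstTensorMemoryView {
  const void *data;
  std::uint64_t data_size_bytes;
};
struct TensorMemoryView {
  void *data;
  std::uint64_t data_size_bytes;
};
using InputMemoryView = std::unordered_map<std::string, ConstTensorMemoryView>;
using OutputMemoryView = std::unordered_map<std::string, TensorMemoryView>;

// Same shape as the documented flush_callback member.
using FlushCallback = std::function<void(const std::string &tensor_id,
                                         const InputMemoryView *&inputs,
                                         const OutputMemoryView *&outputs)>;

// Hypothetical factory: the returned callback always hands back the same
// pre-allocated dummy request, whichever tensor timed out.
FlushCallback makeFlushCallback(const InputMemoryView &dummy_inputs,
                                const OutputMemoryView &dummy_outputs) {
  return [&dummy_inputs, &dummy_outputs](const std::string & /*tensor_id*/,
                                         const InputMemoryView *&inputs,
                                         const OutputMemoryView *&outputs) {
    inputs = &dummy_inputs;    // assumed: tells the runtime which inputs to flush
    outputs = &dummy_outputs;  // assumed: tells the runtime which outputs to flush
  };
}
```

The callback captures the dummy maps by reference, so they must stay alive for as long as the callback can be invoked.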
-
class ModelRunner
Inference model abstraction.
The model runner creates a session, manages queues, runs Poplar executable programs and allows execution of inference models synchronously and asynchronously.
Public Functions
-
ModelRunner(const ModelRunner&) = delete
-
ModelRunner &operator=(const ModelRunner &other) = delete
-
ModelRunner(ModelRunner&&) = default
Default move constructor.
-
ModelRunner &operator=(ModelRunner&&) = default
Default move assignment operator.
-
explicit ModelRunner(const std::string &popef_path, const ModelRunnerConfig &config = ModelRunnerConfig{})
Create a new ModelRunner object.
- Parameters
popef_path – The path to the PopEF file from which the model will be loaded.
config – The model runner configuration.
-
explicit ModelRunner(const std::vector<std::string> &popef_paths, const ModelRunnerConfig &config = ModelRunnerConfig{})
Create a new ModelRunner object.
- Parameters
popef_paths – Paths to PopEF files from which the model will be loaded.
config – The model runner configuration.
Create a new ModelRunner object.
- Parameters
model – The model which will be loaded and run.
config – The model runner configuration.
-
~ModelRunner()
Default destructor.
-
OutputMemory execute(const InputMemoryView &input_data, unsigned replica_id = 0)
Run a model synchronously.
This will allocate output memory internally.
- Parameters
input_data – [in] The user-allocated tensor buffer for all executable input tensors.
replica_id – [in] The user-selected replica that will execute computations. Must be less than replication_factor provided in ModelRunnerConfig.
- Returns
The output memory allocated by Model Runtime for all executable output tensors.
-
void execute(const InputMemoryView &input_data, const OutputMemoryView &output_data, unsigned replica_id = 0)
Run a model synchronously.
This uses output memory that you allocate and pass pointers to.
- Parameters
input_data – [in] The user-allocated tensor buffer for all executable input tensors.
output_data – [in] The user-allocated tensor buffer for all executable output tensors.
replica_id – [in] The user-selected replica that will execute computations. Must be less than replication_factor provided in ModelRunnerConfig.
-
OutputFutureMemory executeAsync(const InputMemoryView &input_data, unsigned replica_id = 0)
Run a model asynchronously.
This will allocate output memory internally.
- Parameters
input_data – [in] The user-allocated tensor buffer for all executable input tensors.
replica_id – [in] The user-selected replica that will execute computations. Must be less than replication_factor provided in ModelRunnerConfig.
- Returns
The future result of an asynchronous call for all executable output tensors.
-
OutputFutureMemoryView executeAsync(const InputMemoryView &input_data, const OutputMemoryView &output_data, unsigned replica_id = 0)
Run a model asynchronously.
This uses output memory that you allocate and pass pointers to.
- Parameters
input_data – [in] The user-allocated tensor buffer for all executable input tensors.
output_data – [in] The user-allocated tensor buffer for all executable output tensors.
replica_id – [in] The user-selected replica that will execute computations. Must be less than replication_factor provided in ModelRunnerConfig.
- Returns
The future result of an asynchronous call for all executable output tensors.
-
std::vector<InputDesc> getExecuteInputs() const
Get a description of the input data required in the execute class methods.
- Returns
A vector of DataDesc instances.
-
std::vector<OutputDesc> getExecuteOutputs() const
Get a description of the output data required in the execute class methods.
- Returns
A vector of DataDesc instances.
-
std::vector<InputDesc> getModelInputs() const
Get a description of all the user-provided input data.
In addition to the data used by the execute calls, this will return a description of all tensors used by the model which must be provided when loading the model onto the device. The data required for the additional tensors may be included in PopEF files. In this case, the data for the additional tensors is loaded automatically by ModelRunner.
- Returns
A vector of DataDesc instances.
-
std::vector<OutputDesc> getModelOutputs() const
Get a description of all the user-provided output data.
In addition to the data used by the execute calls, this will return a list of descriptions of all tensors used by the model that the loading phase requires (weights tensors as an example). The data for these additional tensors can be included in PopEF files that are loaded automatically by the ModelRunner.
- Returns
The vector of DataDesc instances.
-
const ModelRunnerConfig &config() const
The configuration associated with this model runner.
-
inline void getTimeTrace(std::map<std::string, float> &time_info, unsigned replica_id = 0, int cur = -1)
Get the time taken for different phases of the last request.
- Parameters
time_info – [in] The time trace information is written to this map. An empty std::map<std::string, float> should be passed in.
replica_id – [in] The ID of the replica. Default: 0.
cur – [in] The .
- Returns
The time trace of the last execute function called. The time trace information is returned in time_info as a std::map<std::string, float> containing:
request_duration_us: time taken (in microseconds) for the last request from the point it was received to the point that the computation was complete.
read_preparation_duration_us: time taken (in microseconds) for the preparation of the last request before it was added to the queue.
read_queue_duration_us: time (in microseconds) the last task spent in the queue.
computation_duration_us: time taken (in microseconds) for the computation to complete.
-
inline void getMonitoringStatisticsPercentile(std::map<std::string, double> &monitoring_info, double quantile = 0.9, unsigned replica_id = 0)
Get the percentile of latencies.
- Parameters
monitoring_info – [in] The monitoring information is written to this map. An empty std::map<std::string, double> should be passed in.
quantile – [in] The percentile to use. Default: 0.9 returns the P90 latency.
replica_id – [in] The ID of the replica. Default: 0.
- Returns
The percentile of the latencies is returned in monitoring_info as a std::map<std::string, double> containing:
request_monitoring_statistics_percentile_us: percentile of the latency (in microseconds) across the entire request.
read_preparation_monitoring_statistics_percentile_us: percentile of the latency (in microseconds) in the read preparation phase.
read_queue_monitoring_statistics_percentile_us: percentile of the latency (in microseconds) in the read queue phase.
computation_monitoring_statistics_percentile_us: percentile of the latency (in microseconds) in the computation phase.
-
inline void getMonitoringStatisticsMean(std::map<std::string, double> &monitoring_info, unsigned replica_id = 0)
Get the mean value of the latencies.
- Parameters
monitoring_info – [in] The monitoring information is written to this map. An empty std::map<std::string, double> should be passed in.
replica_id – [in] The ID of the replica. Default: 0.
- Returns
The mean value of the latencies is returned in monitoring_info as a std::map<std::string, double> containing:
request_monitoring_statistics_mean_us: mean latency (in microseconds) across the entire request.
read_preparation_monitoring_statistics_mean_us: mean latency (in microseconds) in the read preparation phase.
read_queue_monitoring_statistics_mean_us: mean latency (in microseconds) in the read queue phase.
computation_monitoring_statistics_mean_us: mean latency (in microseconds) in the computation phase.
-
inline void getMonitoringStatisticsTotalCount(std::map<std::string, double> &monitoring_info, unsigned replica_id = 0)
Get the total number of requests.
- Parameters
monitoring_info – [in] The monitoring information is written to this map. An empty std::map<std::string, double> should be passed in.
replica_id – [in] The ID of the replica. Default: 0.
- Returns
The total number of requests is returned in monitoring_info as a std::map<std::string, double> containing:
request_monitoring_statistics_count: number of requests in the request phase.
read_preparation_monitoring_statistics_count: number of requests in the read preparation phase.
read_queue_monitoring_statistics_count: number of requests in the read queue phase.
computation_monitoring_statistics_count: number of requests in the computation phase.
10.2. Low-level API
10.2.1. Anchor callback management
-
struct CallbackInfo
Information passed to CallbackFactory.
-
using model_runtime::CallbackHandle = std::function<void(void*)>
The callback function called whenever the stream will be read from or written to by the device.
The memory location will only be valid for reading or writing for the duration of the callback.
-
using model_runtime::CallbackFactory = std::function<poplar::StreamCallbackHandle(const CallbackInfo &info)>
Factory to create a callback for the given callback information.
-
enum class model_runtime::PopefDataUsagePolicy
The policy for using PopEF TensorData and TensorFeed when creating anchor callbacks.
Values:
-
enumerator USE_POPEF_DATA_IF_ANY = 0
Use the TensorData and the TensorFeed stored in the model’s PopEF data to implicitly create callbacks for the anchors for which the data exists.
-
enumerator USE_USER_DATA
Don’t use the data stored in the PopEF.
This allows you to bind your own data source.
-
using model_runtime::PopefDataUsagePredicate = std::function<PopefDataUsagePolicy(const popef::Anchor&)>
PopefDataUsagePredicates are used to control the use of PopEF tensor or feed data when creating callbacks for Anchors.
For more information, see the description of anchors in the PopEF User Guide.
-
static const PopefDataUsagePredicate model_runtime::null_popef_data_usage_predicate = {}
-
enum class model_runtime::AnchorCallbackPolicy
Policy to handle anchor callbacks.
Values:
-
enumerator BIND_USER_CB = 0
Bind user callback to anchor.
-
enumerator BIND_EMPTY_CB
Bind empty (dummy) callback to anchor.
-
enumerator SKIP_CB
Skip binding a callback to the anchor.
-
using model_runtime::AnchorCallbackPredicate = std::function<AnchorCallbackPolicy(const popef::Anchor&)>
AnchorCallbackPredicates are used to control the callback creation policy for an individual Anchor.
For more information, see the description of anchors in the PopEF User Guide.
-
static const AnchorCallbackPredicate model_runtime::null_anchor_callback_predicate = {}
-
namespace predicate_factory
Predefined callback predicates.
Set of basic predicates to control handling of PopEF data use or anchor callbacks.
-
template<typename Policy>
class AnchorWithPolicy - #include <SessionUtils.hpp>
The callback-handling policy to be used with an anchor.
- Template Parameters
Policy – The policy to be bound to an anchor.
-
template<typename Policy>
class ProgramsWithPolicy - #include <SessionUtils.hpp>
The callback-handling policies to be used for each program.
- Template Parameters
Policy – The policy to be bound to the programsIndexes.
Public Members
-
const std::vector<popef::ProgramFlow::ProgramIndexType> &programsIndexes
A set of indexes to named programs.
-
namespace anchor_callbacks
Functions
-
AnchorCallbackPredicate predProgramFlowLoad(const popef::ProgramFlow &flow, AnchorCallbackPolicy accept_policy = AnchorCallbackPolicy::BIND_USER_CB, AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)
Callback predicate to filter all anchors owned by any “load” programs.
- Parameters
flow – [in] The user-model PopEF program flow (to read “load” program numbers from).
accept_policy – [in] The anchor acceptance policy.
reject_policy – [in] The anchor rejection policy.
- Returns
The anchor callback predicate.
-
AnchorCallbackPredicate predProgramFlowMain(const popef::ProgramFlow &flow, AnchorCallbackPolicy accept_policy = AnchorCallbackPolicy::BIND_USER_CB, AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)
Callback predicate to filter all anchors owned by the main program.
- Parameters
flow – [in] The user-model PopEF program flow (to read main program number from).
accept_policy – [in] The anchor acceptance policy.
reject_policy – [in] The anchor rejection policy.
- Returns
The anchor callback predicate.
-
AnchorCallbackPredicate predProgramFlowSave(const popef::ProgramFlow &flow, AnchorCallbackPolicy accept_policy = AnchorCallbackPolicy::BIND_USER_CB, AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)
Callback predicate to filter all anchors owned by any save programs.
- Parameters
flow – [in] The user-model PopEF program flow (to read “save” program numbers from).
accept_policy – [in] The anchor acceptance policy.
reject_policy – [in] The anchor rejection policy.
- Returns
The anchor callback predicate.
-
AnchorCallbackPredicate predProgramNotAssigned(AnchorCallbackPolicy accept_policy = AnchorCallbackPolicy::BIND_USER_CB, AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)
Callback predicate to filter all anchors that are not assigned to any programs.
- Parameters
accept_policy – [in] The anchor acceptance policy.
reject_policy – [in] The anchor rejection policy.
- Returns
The anchor callback predicate.
-
AnchorCallbackPredicate predNonScalarType(AnchorCallbackPolicy accept_policy = AnchorCallbackPolicy::BIND_USER_CB, AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)
Callback predicate to filter all anchors that are not scalars.
- Parameters
accept_policy – [in] The anchor acceptance policy.
reject_policy – [in] The anchor rejection policy.
- Returns
The anchor callback predicate.
-
AnchorCallbackPredicate predProgramIndexes(const std::vector<popef::ProgramFlow::ProgramIndexType> &program_indexes, AnchorCallbackPolicy accept_policy = AnchorCallbackPolicy::BIND_USER_CB, AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)
Callback predicate to filter all anchors owned by any of the programs passed in by the user.
- Parameters
program_indexes – [in] The program indices to filter.
accept_policy – [in] The anchor acceptance policy.
reject_policy – [in] The anchor rejection policy.
- Returns
The anchor callback predicate.
-
AnchorCallbackPredicate predProgramIndexes(const std::vector<ProgramsWithPolicy<AnchorCallbackPolicy>> &accepted_programs_policies, const AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)
Callback predicate to apply defined anchor callback handling policies to grouped program indices.
- Parameters
accepted_programs_policies – The program indices with defined anchor callback handling policies.
reject_policy – The anchor rejection policy.
- Returns
The anchor callback predicate.
-
AnchorCallbackPredicate predAnchorsPolicies(const std::vector<AnchorWithPolicy<AnchorCallbackPolicy>> &accepted_anchors_policies, const AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)
Callback predicate to apply anchor handling policies.
- Parameters
accepted_anchors_policies – [in] The anchor indices with handling policies.
reject_policy – [in] The anchor rejection policy.
- Returns
The anchor callback predicate.
-
template<typename ...Args>
AnchorCallbackPredicate andBind(AnchorCallbackPolicy accept_policy, AnchorCallbackPolicy reject_policy, Args&&... pred)
Conjunction operator.
Combines multiple predicates into one predicate.
- Parameters
accept_policy – [in] The anchor acceptance policy.
reject_policy – [in] The anchor rejection policy.
pred – [in] The predicates to be combined by the operator.
- Returns
The anchor callback predicate, which returns accept_policy when all of the passed predicates return accept_policy; otherwise it returns reject_policy.
-
template<typename ...Args>
AnchorCallbackPredicate orBind(AnchorCallbackPolicy accept_policy, AnchorCallbackPolicy reject_policy, Args&&... pred)
Disjunction operator.
Combines multiple predicates into one predicate.
- Parameters
accept_policy – [in] The anchor acceptance policy.
reject_policy – [in] The anchor rejection policy.
pred – [in] The predicates to be combined by the operator.
- Returns
The anchor callback predicate, which returns accept_policy when at least one of the passed predicates returns accept_policy; otherwise it returns reject_policy.
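The way the combinators behave can be illustrated with a minimal, self-contained sketch. This is not the library implementation: SketchAnchor, Policy, Predicate, ownedBy, allOf and anyOf are hypothetical stand-ins for popef::Anchor, AnchorCallbackPolicy, AnchorCallbackPredicate, predProgramIndexes, andBind and orBind. It assumes that a predicate "accepts" an anchor when it returns the accept policy given to the combinator.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for AnchorCallbackPolicy and popef::Anchor.
enum class Policy { BIND_USER_CB, BIND_EMPTY_CB, SKIP_CB };

struct SketchAnchor {
  std::string name;
  std::vector<int> programs; // indices of the programs that use this anchor
};

using Predicate = std::function<Policy(const SketchAnchor &)>;

// Accept anchors used by at least one of the given programs.
Predicate ownedBy(std::vector<int> wanted, Policy accept, Policy reject) {
  return [wanted = std::move(wanted), accept, reject](const SketchAnchor &a) {
    for (int p : a.programs)
      for (int w : wanted)
        if (p == w) return accept;
    return reject;
  };
}

// Conjunction: accept only if every predicate returns `accept`.
Predicate allOf(Policy accept, Policy reject, std::vector<Predicate> preds) {
  return [=](const SketchAnchor &a) {
    for (const auto &pred : preds)
      if (pred(a) != accept) return reject;
    return accept;
  };
}

// Disjunction: accept if at least one predicate returns `accept`.
Predicate anyOf(Policy accept, Policy reject, std::vector<Predicate> preds) {
  return [=](const SketchAnchor &a) {
    for (const auto &pred : preds)
      if (pred(a) == accept) return accept;
    return reject;
  };
}
```

In this sketch an anchor used only by program 1 is rejected by allOf over "owned by program 0" and "owned by program 0 or 1", but accepted by anyOf over the same two predicates.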
-
namespace popef_data_usage
Functions
-
PopefDataUsagePredicate predProgramFlowLoad(const popef::ProgramFlow &flow, PopefDataUsagePolicy accept_policy = PopefDataUsagePolicy::USE_USER_DATA, PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)
Callback predicate to filter all anchors owned by any “load” programs.
- Parameters
flow – [in] The user-model PopEF program flow (to read “load” program numbers from).
accept_policy – [in] The anchor acceptance policy.
reject_policy – [in] The anchor rejection policy.
- Returns
The PopEF data usage predicate.
-
PopefDataUsagePredicate predProgramFlowMain(const popef::ProgramFlow &flow, PopefDataUsagePolicy accept_policy = PopefDataUsagePolicy::USE_USER_DATA, PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)
Callback predicate to filter all anchors owned by the main program.
- Parameters
flow – [in] The user-model PopEF program flow (to read main program number from).
accept_policy – [in] The anchor acceptance policy.
reject_policy – [in] The anchor rejection policy.
- Returns
The PopEF data usage predicate.
-
PopefDataUsagePredicate predProgramFlowSave(const popef::ProgramFlow &flow, PopefDataUsagePolicy accept_policy = PopefDataUsagePolicy::USE_USER_DATA, PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)
Callback predicate to filter all anchors owned by any save programs.
- Parameters
flow – [in] The user-model PopEF program flow (to read “save” program numbers from).
accept_policy – [in] The anchor acceptance policy.
reject_policy – [in] The anchor rejection policy.
- Returns
The PopEF data usage predicate.
-
PopefDataUsagePredicate predProgramNotAssigned(PopefDataUsagePolicy accept_policy = PopefDataUsagePolicy::USE_USER_DATA, PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)
Callback predicate to filter all anchors that are not assigned to any programs.
- Parameters
accept_policy – [in] The anchor acceptance policy.
reject_policy – [in] The anchor rejection policy.
- Returns
The PopEF data usage predicate.
-
PopefDataUsagePredicate predProgramIndexes(const std::vector<popef::ProgramFlow::ProgramIndexType> &program_indexes, PopefDataUsagePolicy accept_policy = PopefDataUsagePolicy::USE_USER_DATA, PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)
Callback predicate to filter all anchors owned by any of the programs passed in by the user.
- Parameters
program_indexes – [in] The program indices to filter.
accept_policy – [in] The anchor acceptance policy.
reject_policy – [in] The anchor rejection policy.
- Returns
The PopEF data usage predicate.
-
PopefDataUsagePredicate predProgramIndexes(const std::vector<ProgramsWithPolicy<PopefDataUsagePolicy>> &accepted_programs_policies, const PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)
Callback predicate to apply defined anchor callback handling policies to grouped program indices.
- Parameters
accepted_programs_policies – The program indices with defined anchor callback handling policies.
reject_policy – The anchor rejection policy.
- Returns
The PopEF data usage predicate.
-
PopefDataUsagePredicate predAnchorsPolicies(const std::vector<AnchorWithPolicy<PopefDataUsagePolicy>> &accepted_anchors_policies, const PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)
Callback predicate to apply anchor handling policies.
- Parameters
accepted_anchors_policies – [in] The anchor indices with handling policies.
reject_policy – [in] The anchor rejection policy.
- Returns
The PopEF data usage predicate.
-
template<typename ...Args>
PopefDataUsagePredicate orBind(PopefDataUsagePolicy accept_policy, PopefDataUsagePolicy reject_policy, Args&&... pred)
Disjunction operator.
Combines multiple predicates into one predicate.
- Parameters
accept_policy – [in] The anchor acceptance policy.
reject_policy – [in] The anchor rejection policy.
pred – [in] The predicates to be combined by the operator.
- Returns
The PopEF data usage predicate, which returns accept_policy when at least one of the passed predicates returns accept_policy; otherwise it returns reject_policy.
10.2.2. Queue memory management
-
class IMemoryPool
A common interface for all memory allocators.
Subclassed by model_runtime::RingMemoryPool
-
class RingMemoryPool : public model_runtime::IMemoryPool
Memory pool of fixed-size blobs.
Allocates the requested number of blobs at construction time and cycles through the blobs each time getMemoryBlob() is called.
Public Functions
-
RingMemoryPool(int64_t blob_size, int64_t num_blobs)
Create a ring memory pool.
This allocates memory of size num_blobs * blob_size under the hood.
- Parameters
blob_size – [in] The size of a single memory blob.
num_blobs – [in] The number of memory blobs.
-
int64_t numBlobs() const
Get the number of memory blobs.
-
virtual int64_t blobSize() const override
Get the size of a memory blob.
-
virtual void *getMemoryBlob() override
Get a pointer to the next blob.
Note
When the end of the memory pool is reached, the iteration starts from the beginning again.
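The wrap-around behaviour can be pictured with a small stand-alone sketch. SketchRingPool is a hypothetical class, not the model_runtime::RingMemoryPool implementation, and it assumes that blob_size is expressed in bytes and that the pool is backed by one contiguous allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a ring memory pool of fixed-size blobs.
class SketchRingPool {
public:
  SketchRingPool(std::int64_t blob_size, std::int64_t num_blobs)
      : blob_size_(blob_size), num_blobs_(num_blobs),
        storage_(static_cast<std::size_t>(blob_size * num_blobs)) {
    assert(blob_size > 0 && num_blobs > 0);
  }

  std::int64_t numBlobs() const { return num_blobs_; }
  std::int64_t blobSize() const { return blob_size_; }

  // Return the next blob and advance; after the last blob, wrap to the first.
  void *getMemoryBlob() {
    void *blob = storage_.data() + next_ * blob_size_;
    next_ = (next_ + 1) % num_blobs_;
    return blob;
  }

private:
  std::int64_t blob_size_;
  std::int64_t num_blobs_;
  std::vector<std::uint8_t> storage_; // num_blobs * blob_size bytes (assumed)
  std::int64_t next_ = 0;
};
```

With three blobs, the fourth call to getMemoryBlob() in this sketch returns the same address as the first, so a caller must have finished with a blob before the pool cycles back to it.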
10.2.3. Queue management
-
template<typename BufferType>
class SpscRingBuffer
A lock-free, fixed-size, single-producer, single-consumer ring buffer implementation.
    auto dst = rb.writeLock();
    if (!rb.isValid()) {
      // dst is not valid: don't dereference it.
      return;
    }
    *dst = obj;
    rb.writeComplete();
Note
writeLock() and readLock() are blocking calls which might return early if the ring buffer gets invalidated, so you should always use isValid() after locking a buffer to check whether the returned buffer is safe to use or not.
Public Types
-
using ReadTimeoutCallback = std::function<void(SpscRingBuffer*)>
Signature of the function to call when readLock() times out.
Public Functions
-
SpscRingBuffer(const SpscRingBuffer &other) = delete
-
SpscRingBuffer(const SpscRingBuffer &&other) noexcept = delete
-
SpscRingBuffer &operator=(const SpscRingBuffer &other) = delete
-
SpscRingBuffer &operator=(SpscRingBuffer &&other) noexcept = delete
-
SpscRingBuffer(std::size_t num_buffers, const std::string &label, ReadTimeoutCallback timeout_cb = nullptr, std::chrono::nanoseconds timeout_ns = std::chrono::nanoseconds::zero())
Create a single-producer, single-consumer ring buffer.
- Parameters
num_buffers – [in] The number of buffers to use in the ring buffer.
label – [in] The debug string to use in printState().
timeout_cb – [in] The function to call when a read times out.
timeout_ns – [in] The duration in nanoseconds before the timeout callback is called when no read input is available. If 0, never call the callback.
-
~SpscRingBuffer()
Default destructor.
-
void write(const BufferType &obj)
Lock the ring buffer, write to it and unlock it.
- Parameters
obj – [in] The object to write into the ring buffer.
-
BufferType *writeLock()
Lock and return a buffer for writing.
Only one buffer can be locked for writing at any time. Calling writeLock() when a buffer is already locked will return the same buffer.
If no buffer is available, then the function will block until either:
An existing buffer becomes available.
The ring buffer is invalidated.
Note
isValid() must be used to determine whether the returned buffer is valid or not.
- Returns
A buffer to write to.
-
void writeComplete()
Unlock the currently write-locked buffer.
- Pre
A buffer must have been locked for writing using writeLock().
- Post
The next time writeLock() is called, a different buffer will be returned.
-
const BufferType &readLock()
Lock a buffer for read access.
If no buffer is available the function will block until either:
A buffer becomes available (writeComplete() is called from another thread)
The ring buffer is invalidated.
Some buffers are read-locked and readReset() is called.
Several buffers can be locked in reading mode and each call to readLock() will return a new buffer.
If timeout_ns is greater than zero, you have provided a timeout callback, and readLock() has been waiting for a buffer for longer than timeout_ns, then the callback function is called until a new read buffer becomes available.
Note
isValid() must be used to determine whether the returned buffer is valid or not.
-
void readComplete()
Unlock the oldest read-locked buffer.
- Pre
A buffer must have been locked for reading using readLock().
-
void readReset()
All the buffers currently locked for reading are unlocked and placed back at the front of the reading queue.
-
bool readAvailable() const
Check whether any buffer is available to be read-locked.
- Returns
True if there is at least one buffer available to be read-locked, false if there are no buffers available.
-
void invalidate()
Invalidate the ring buffer.
All calls made after this one become non-blocking.
All the objects returned by calls on an invalidated ring buffer are invalid and should be discarded or ignored.
-
void reset()
Reset all ring buffer values to the initial state.
-
bool isValid() const
Check the state of the ring buffer.
- Returns
True if the ring buffer is in a valid state, or false if it was invalidated.
-
std::string getState(const std::string &prefix) const
Debug function to print the current state of the ring buffer.
- Parameters
prefix – [in] The string to prefix the state with.
- Returns
The current state of the ring buffer
-
std::size_t numBuffers() const
Return the maximum number of elements the ring buffer can store.
-
const std::string &label() const
Label associated with this ring buffer.
-
class IQueue
Common interface implemented by various queues.
The interface describes the memory requirements of the queue and provides an interface to disconnect the queue from its data source.
Subclassed by model_runtime::InputQueue, model_runtime::OutputQueue
Public Functions
-
virtual ~IQueue() = default
Default destructor.
-
virtual const popef::TensorInfo &tensorInfo() const = 0
Get the shape and data type of a tensor.
Each buffer in the queue has the same type and shape.
- Returns
The structure encapsulating the shape and data type of a tensor.
-
virtual int64_t numBuffers() const = 0
Get the number of buffers this queue can store.
-
virtual void disconnect() = 0
Disconnect the queue ring buffers: no longer wait for data.
All queue ring buffers are invalidated and immediately return from any blocking calls.
Disconnected queues can no longer be used to feed real data.
Typically, disconnect() is used at shutdown to feed dummy data to the executable until it returns from its run() method and can be safely destroyed.
-
RingMemoryPool model_runtime::allocateQueueStorage(const IQueue &queue, int64_t extra_buffers = 0)
Allocate a memory pool large enough to back the given queue.
- Parameters
queue – [in] The queue the memory pool will be used to feed.
extra_buffers – [in] The number of extra buffers to allocate in addition to the queue’s requirements.
- Returns
A memory pool.
-
using model_runtime::ReadStartCallback = std::function<void(void)>
Signature of the function called just before the first chunk of data is about to be transferred to Poplar to build a complete input for the executable.
-
using model_runtime::ReadCompleteCallback = ReadStartCallback
Signature of the function called when the data for a complete model input is about to be consumed by the executable.
-
using model_runtime::WriteCompleteCallback = std::function<void(void)>
Signature of the function called after the data has been written.
-
struct InputData
Structure representing queue input data.
Public Members
-
const uint8_t *data = {nullptr}
Pointer to the buffer containing the data to read.
-
int64_t data_size = {0}
Size in bytes of the data.
-
ReadStartCallback readStartCallback = {nullptr}
Optional function to call just before the first chunk of data is about to be fetched to finally build a complete model input.
Note
The callback might be called more than once if the data is prefetched, then discarded and fetched again.
-
ReadCompleteCallback readCompleteCallback = {nullptr}
Optional function to call when the data for a complete model input is about to be consumed by the executable.
Note
The callback might be called more than once if the data is prefetched, then discarded and fetched again.
-
struct OutputData
Structure representing queue output data.
Public Members
-
uint8_t *data = {nullptr}
Where to write the output.
-
int64_t data_size = {0}
Amount of data to write in bytes.
-
WriteCompleteCallback writeCompleteCallback = {nullptr}
Optional function to call after the data has been written.
Note
The callback might be called more than once if the data is prefetched then discarded and fetched again.
-
using model_runtime::InputRingBuffer = model_runtime::SpscRingBuffer<InputData>
Fixed-size, single-producer, single-consumer ring buffer for input data.
-
using model_runtime::OutputRingBuffer = model_runtime::SpscRingBuffer<OutputData>
Fixed-size, single-producer, single-consumer ring buffer for output data.
-
class InputQueue : public model_runtime::IQueue
Pack or split the data that you pass to match the amount of data expected by the executable.
For example, if the model was compiled to process an input tensor of size [256, 48, 48], which means samples of 48x48 across a batch size of 256, then the application can enqueue 48x48 tensors of any batch size. This queue will take care of either regrouping the inputs into a single run or splitting them across several runs.
Note
It is your responsibility to ensure the data size enqueued is a multiple of a single sample size.
Note
InputQueue cannot be instantiated directly; it is created by QueueManager.
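The regrouping and splitting described above comes down to simple arithmetic, sketched below. The helper estimateRuns and the RunEstimate struct are hypothetical and are not part of the library. The sketch also assumes that a sample which does not fill a complete run stays pending as a partial input, which is an interpretation of the flush() description rather than documented behaviour.

```cpp
#include <cassert>
#include <cstdint>

struct RunEstimate {
  std::int64_t samples;   // whole samples contained in the enqueued data
  std::int64_t full_runs; // runs that can be filled completely
  std::int64_t leftover;  // samples left over for a partial run
};

// Hypothetical helper: sample_bytes is the size of a single sample and
// model_batch is the batch size the model was compiled for.
RunEstimate estimateRuns(std::int64_t data_size, std::int64_t sample_bytes,
                         std::int64_t model_batch) {
  assert(sample_bytes > 0 && model_batch > 0);
  // The enqueued size must be a whole number of samples.
  assert(data_size % sample_bytes == 0);
  const std::int64_t samples = data_size / sample_bytes;
  return {samples, samples / model_batch, samples % model_batch};
}
```

For the [256, 48, 48] example, and assuming 4-byte elements (the element type is not stated in the text), one sample is 48 * 48 * 4 bytes. Enqueuing 600 samples would then fill two complete runs of 256 and leave 88 samples over.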
Public Functions
Create an input queue.
- Parameters
buffer – [in] The target ring buffer connected to a Poplar callback.
info – [in] The buffer description.
-
InputData *enqueueLock()
Lock a buffer for writing.
This is a blocking call and only one buffer can be locked for writing at any time. Calling enqueueLock() when a buffer is already locked will return the same buffer.
- Returns
A pointer to an InputData object to fill.
-
void enqueueComplete()
Unlock the current write-locked buffer.
- Pre
A buffer must have been locked for writing using enqueueLock().
- Post
The next time enqueueLock() is called, a different buffer will be returned.
-
void enqueue(const void *input, int64_t data_size, ReadStartCallback read_start_callback = nullptr, ReadCompleteCallback read_complete_callback = nullptr)
Convenience method to lock a buffer for writing, fill InputData and unlock the buffer.
Note
The callback might be called more than once if the data is prefetched, then discarded and fetched again.
- Parameters
input – [in] The address of the buffer containing the data. The data is not copied to a buffer, the pointer must remain valid until the data has been used by the ring buffer consumer.
data_size – [in] The number of bytes to use from the input. Must be a multiple of single sample size.
read_start_callback – [in] An optional callback to call when the data starts being read.
read_complete_callback – [in] An optional callback to call when the data read is complete.
-
void flush()
Flush any partial input still being created or enqueue a dummy batch.
-
void reset()
Reset underlying ring buffer.
-
virtual void disconnect() override
Parent interface method.
-
virtual const popef::TensorInfo &tensorInfo() const override
Parent interface method.
-
virtual int64_t numBuffers() const override
Parent interface method.
-
class OutputQueue : public model_runtime::IQueue
Pack or split the data returned by the executable to the application’s batches.
Note
It is your responsibility to ensure the data size enqueued is a multiple of the output sample size.
Note
The batch size used in OutputQueue must match the one enqueued to InputQueue.
Note
OutputQueue cannot be instantiated directly; it is created by QueueManager.
Public Functions
Create an output queue.
- Parameters
buffer – [in] The source ring buffer connected to a Poplar callback.
info – [in] The buffer description
-
OutputData *enqueueLock()
Blocking call to lock a buffer for reading.
Only one buffer can be locked for reading at any time. Calling enqueueLock() when a buffer is already locked will return the same buffer.
Note
This is a queue of pointers or addresses (not data). The content of the buffer will be overwritten after readComplete() is called.
- Returns
A pointer to the buffer containing the data to be read.
-
void enqueueComplete()
Unlock the current read-locked buffer.
- Pre
A buffer must have been locked for reading using enqueueLock().
- Post
The next time enqueueLock() is called, a different buffer will be returned.
-
void enqueue(void *output, int64_t data_size, WriteCompleteCallback write_complete_callback = nullptr)
Convenience method to lock a buffer for reading, fill OutputData and unlock the buffer.
- Parameters
output – [in] The address of the buffer to be read. The pointer must remain valid until the data has been copied by the ring buffer producer.
data_size – [in] The number of bytes to copy to the output. Must be a multiple of the single sample size.
write_complete_callback – [in] An optional callback to call when the output has been filled.
-
void flush()
Flush any partial output still being created or, if there isn’t any, enqueue a handler to handle the dummy output produced by the dummy input enqueued by the corresponding InputQueue flush() methods.
-
void reset()
Reset underlying ring buffer.
-
virtual void disconnect() override
Parent interface method.
-
virtual const popef::TensorInfo &tensorInfo() const override
Parent interface method.
-
virtual int64_t numBuffers() const override
Parent interface method.
-
using model_runtime::RingSizeMultiplierProdType = std::function<int64_t(const popef::Anchor&)>
The type of a function that produces a ring size multiplier.
A factory function type used to produce a ring size multiplier for a specific popef::Anchor. You can define this kind of functor to control the size of the QueueManager ring buffers (see the SpscRingBuffer class). For example, for tensors that the program loads only once (such as weights or other tensors fetched from the host in a “load” program), the ring buffer size can be set to 1, as there is no need to prefetch more values from the user.
-
class QueueManager
Create and manage queues related to a session.
Note
Queues currently only work for models with replica == 1.
Public Functions
-
QueueManager(const QueueManager&) = delete
-
QueueManager(QueueManager&&) = delete
-
QueueManager &operator=(const QueueManager &other) = delete
-
QueueManager &operator=(QueueManager&&) = delete
-
~QueueManager() = default
Default destructor.
-
InputQueue &inputQueue(const std::string &name)
Get the input queue of the named tensor or stream.
- Parameters
name – [in] The name of the tensor or stream.
-
OutputQueue &outputQueue(const std::string &name)
Get the output queue of the named tensor or stream.
- Parameters
name – [in] The name of the tensor or stream.
-
void flushAll()
Call flush() on all the queues.
-
void disconnectAll()
Disconnect all the queues from their ring buffers.
-
void resetAll()
Reset all the queues after session stop.
Public Members
-
std::map<std::string, InputQueue> inputs
Map the user input streams to their input queues.
-
std::map<std::string, OutputQueue> outputs
Map the user output streams to their output queues.
-
namespace ring_size_multiplier_factory
Functions
-
RingSizeMultiplierProdType ringSizeMultForProgs(const std::vector<popef::ProgramFlow::ProgramIndexType> &programs, int64_t selected_ring_size_multiplier, int64_t others_ring_size_multiplier = 1)
Factory function returning the selected ring buffer size multiplier for Anchors “owned” by the set of programs (programs that fetch Anchor data from the host).
For the remaining anchors the other value is returned. This function can be passed to the QueueManager constructor to control the sizes of its internal ring buffers created for model anchors.
- Parameters
programs – [in] The programs “owning” the anchors of interest.
selected_ring_size_multiplier – [in] The ring buffer size multiplier for anchors owned by the programs.
others_ring_size_multiplier – [in] The ring buffer size multiplier for remaining anchors. Note: It is preferred to set the ring buffer multiplier to 1 for anchors from the “load” or “save” ProgramFlow to reduce memory.
-
RingSizeMultiplierProdType ringSizeMultForMainProgs(const popef::Model &model, int64_t main_ring_size_multiplier, int64_t load_save_ring_size_multiplier = 1)
Factory function returning the selected ring buffer size multiplier for anchors “owned” by the main programs (programs of the main ProgramFlow that fetch Anchor data from the host).
For the remaining anchors (“owned” by “load” or “save” ProgramFlow) the other value is returned. This function can be passed to the QueueManager constructor to control the sizes of its internal ring buffers created for model Anchor objects.
Note
It is preferred to set the ring buffer multiplier to 1 for anchors from the “load” or “save” ProgramFlow to reduce memory.
- Parameters
model – [in] A combination of PopEF blobs representing a single model.
main_ring_size_multiplier – [in] The ring buffer size multiplier for anchors owned by the main programs.
load_save_ring_size_multiplier – [in] The ring buffer size multiplier for remaining anchors.
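The factory pattern used by the functions in this namespace can be sketched as follows. AnchorStub, RingSizeMultiplier and multiplierForPrograms are hypothetical stand-ins for popef::Anchor, RingSizeMultiplierProdType and ringSizeMultForProgs. The sketch assumes that an anchor is "owned" by a program when that program appears in the anchor's program list.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for popef::Anchor: it only records which programs use it.
struct AnchorStub {
  std::vector<int> programs;
};

using RingSizeMultiplier = std::function<std::int64_t(const AnchorStub &)>;

// Build a functor that returns `selected` for anchors used by any of the
// given programs and `others` for every remaining anchor.
RingSizeMultiplier multiplierForPrograms(std::vector<int> programs,
                                         std::int64_t selected,
                                         std::int64_t others = 1) {
  return [programs = std::move(programs), selected, others](const AnchorStub &a) {
    for (int p : a.programs)
      for (int q : programs)
        if (p == q) return selected;
    return others;
  };
}
```

Keeping the default of 1 for the remaining anchors follows the recommendation above for anchors that are only transferred in “load” or “save” programs.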
10.2.4. Runtime management
-
struct SessionConfig
Session configuration.
Public Members
-
LaunchPolicy policy = LaunchPolicy::Deferred
Session creation policy that is associated with acquiring a device.
Default: LaunchPolicy::Deferred.
-
PopefDataUsagePredicate pred_tensor_data = null_popef_data_usage_predicate
Predicate for anchor callback.
This controls user callback handling for the anchor. It is not used by default.
-
bool check_package_hash = true
If true, the Poplar hash will be checked before the executable is loaded onto the device.
If false, this check is not done. Default: true.
-
DeviceWaitConfig wait_config = {}
By default Session throws an exception when it is not able to attach any device needed by the given model.
This behaviour can be changed by setting a custom device wait config.
-
std::string max_look_ahead = "unlimited"
Limits the number of host synchronisation points the runtime library is allowed to prepare in advance.
This helps to prevent the IPU from being idle. The string value “unlimited” (the default) removes this restriction completely.
Possible values:
“unlimited”: the runtime library will decide the value.
“x”: where x is an unsigned integer value.
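Because max_look_ahead is a string, an application may want to validate a value before placing it in the configuration. The helper below is a hypothetical sketch of such a check (C++17). It is not part of the library, and how the runtime itself parses or rejects the string is not described here.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns std::nullopt for "unlimited", otherwise the
// parsed unsigned value. Throws std::invalid_argument for anything else.
std::optional<std::uint64_t> parseMaxLookAhead(const std::string &value) {
  if (value == "unlimited") return std::nullopt;
  if (value.empty()) throw std::invalid_argument("empty max_look_ahead");
  for (char c : value) {
    if (c < '0' || c > '9') {
      throw std::invalid_argument(
          "max_look_ahead must be \"unlimited\" or an unsigned integer");
    }
  }
  return std::stoull(value); // may throw std::out_of_range for huge values
}
```

A value that passes this check could then be assigned to the max_look_ahead member unchanged, since the member stores the string itself.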
-
class Session
Link a model to a device.
Note
If two or more sessions share a device, runLoadPrograms() and runSavePrograms() will implicitly be called when the device is bound to or unbound from this session.
Public Types
-
using ProgIdxType = popef::ProgramFlow::ProgramIndexType
The index for a runnable program available in the executable.
-
using ProgramsAndAnchorsMap = std::map<ProgIdxType, std::vector<const popef::Anchor*>>
Mapping between program index and a vector of popef::Anchor objects appearing in that program that are available in the executable.
Public Functions
-
explicit Session(const std::vector<std::string> &popef_paths, const SessionConfig &config = {})
Create a new Session object.
- Parameters
popef_paths – The paths to PopEF files from which the model will be loaded.
config – The session configuration.
Create a Session object.
- Parameters
model – The model which will be loaded and executed on the IPU.
config – The session configuration.
-
~Session()
Default destructor.
Bind the session to a device and load the executable onto it.
If the session is already bound to a device, this method first unbinds the current device before binding to the new device.
- Parameters
device – [in] The wrapper around a Poplar device.
-
void runLoadPrograms()
Run the programs to copy the data to the device.
Note
This method is implicitly called before the first call to runMainPrograms() after the device was bound to this session.
- Pre
The session must be bound to a device.
-
void runMainPrograms()
Run the main programs.
Note
If the device was last used by a different session, this method first unbinds the device from that session, then binds it to this session and calls runLoadPrograms() before actually running the main programs.
- Pre
The session must be bound to a device.
-
void runSavePrograms()
Run the programs to copy the data back to the host.
Note
This method is implicitly called when the device bound to this session gets unbound.
- Pre
The session must be bound to a device.
-
void runPrograms(const std::vector<ProgIdxType> &progs)
Run your own set of programs.
Each program will run once.
If you run programs from the main set of programs and you did not previously run programs to load the data, you might get incorrect results. The same applies if you run programs to save data before running programs from the load and main set.
Therefore, the order of the programs in the vector is important.
Note
This function is for advanced users who understand what the programs do during the execution, and what the result is. Remember that the order of the programs in the vector matters. Programs will be run in sequence based on their position in the vector.
- Parameters
progs – [in] The set of program indices which you would like to run. Indices need to be present in the loaded popef::Model.
- Pre
The session must be bound to a device.
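The ordering requirement for runPrograms() can be sketched without the library. In this minimal illustration, `ProgIdx` is a hypothetical stand-in for ProgIdxType (assumed here to be an integral type) and `makeRunOrder` is a helper invented for this example, not part of the Model Runtime API. It only shows how to build a vector in which load programs precede main programs, which precede save programs.

```cpp
#include <vector>

// Hypothetical stand-in for model_runtime's ProgIdxType (assumed integral).
using ProgIdx = int;

// Illustrative helper, not part of the API: concatenates program indices so
// that load programs come first, then main programs, then save programs.
std::vector<ProgIdx> makeRunOrder(const std::vector<ProgIdx> &load,
                                  const std::vector<ProgIdx> &main_progs,
                                  const std::vector<ProgIdx> &save) {
  std::vector<ProgIdx> order;
  order.reserve(load.size() + main_progs.size() + save.size());
  order.insert(order.end(), load.begin(), load.end());
  order.insert(order.end(), main_progs.begin(), main_progs.end());
  order.insert(order.end(), save.begin(), save.end());
  return order;
}
```

A vector built this way could then be passed to runPrograms(). The index values themselves must come from the loaded popef::Model.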
-
void unloadFromDevice()
Unload the session from the device it is currently bound to.
- Pre
The session must be bound to a device.
-
void setCallbackForAnchor(const std::string &anchor_handle, CallbackHandle callback)
Set the callback (data source or destination buffer and a way of managing it) for a popef::Anchor (an input or output tensor).
- Parameters
anchor_handle – [in] The anchor handle to which the callback will be assigned. Each popef::Anchor has a unique handle.
callback – [in] The callback to be called whenever the stream is to be read or was written to by the device. This depends on whether the callback is assigned to an input tensor or an output tensor.
- Pre
The session must be bound to a device.
-
void setUserOutputHandler(CallbackFactory factory, const AnchorCallbackPredicate &anchor_callback_predicate = null_anchor_callback_predicate, bool skip_connected = false)
Set up handlers for output tensors.
If the factory returns nullptr for a tensor then the existing callback remains in place.
If the factory returns a callback for a tensor which already had a callback associated with it then the existing callback is discarded and the new one is used instead.
- Parameters
factory – [in] The factory that will be called once per output tensor.
anchor_callback_predicate – [in] The functor controlling user callback binding.
skip_connected – [in] If true, call the factory only for the streams which are not connected. If false, call the factory for all the streams that are required.
- Pre
The session must be bound to a device.
-
void setUserInputHandler(CallbackFactory factory, const AnchorCallbackPredicate &anchor_callback_predicate = null_anchor_callback_predicate, bool skip_connected = false)
Set up handlers for input tensors.
If the factory returns nullptr for a tensor then the existing callback remains in place.
If the factory returns a callback for a tensor which already had a callback associated with it then the existing callback is discarded and the new one is used instead.
- Parameters
factory – [in] The factory that will be called once per input tensor.
anchor_callback_predicate – [in] The functor controlling user callback binding.
skip_connected – [in] If true, call the factory only for the streams which are not connected. If false, call the factory for all the streams that are required.
- Pre
The session must be bound to a device.
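The replacement rule shared by setUserOutputHandler() and setUserInputHandler() can be modelled on a plain map. This is a sketch under stated assumptions: `Callback`, `Factory` and `applyFactory` are hypothetical names, an empty `std::function` plays the role of a nullptr result, and "connected" simply means that the map already holds an entry for the anchor. The real CallbackHandle and CallbackFactory types may differ.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real CallbackHandle and CallbackFactory types
// are defined by the library and may differ.
using Callback = std::function<void()>;
using Factory = std::function<Callback(const std::string &anchor_name)>;

// Applies the documented rule to a map of anchor name -> callback:
// an empty result keeps the existing entry, a non-empty result replaces it.
void applyFactory(std::map<std::string, Callback> &callbacks,
                  const std::vector<std::string> &anchors,
                  const Factory &factory, bool skip_connected) {
  for (const std::string &name : anchors) {
    const bool connected = callbacks.count(name) != 0;
    if (skip_connected && connected) continue;  // factory is not called
    Callback cb = factory(name);
    if (cb) callbacks[name] = std::move(cb);  // install or replace
    // Empty result: any existing callback is left untouched.
  }
}
```

With skip_connected set to true, anchors that already have a callback are never offered to the factory, so their callbacks cannot be replaced.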
-
ProgramsAndAnchorsMap anchorsNotConnectedToCallbacks(const std::vector<ProgIdxType> &progs)
Returns anchors that are not connected to callbacks for particular programs.
- Parameters
progs – [in] The list of program indices.
- Returns
A map in which each key is a program index and each value is the vector of anchors that have no callback linked for that program.
-
void errorIfAnchorsAreNotConnectedToCallbacks(const std::vector<ProgIdxType> &progs)
Check that every anchor required by the given programs is connected to a callback.
If any program has anchors that are not connected to callbacks, this method throws an error whose message lists these programs.
- Parameters
progs – [in] List of program indices.
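The relationship between the map returned by anchorsNotConnectedToCallbacks() and the error raised by errorIfAnchorsAreNotConnectedToCallbacks() might look roughly like the following. This is an assumption-laden sketch: `UnconnectedMap` is a hypothetical stand-in for ProgramsAndAnchorsMap that stores anchor names as strings, `throwIfUnconnected` is invented for this example, and the exception type and message format of the real method are not stated in this reference.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for ProgramsAndAnchorsMap: program index -> names of
// anchors with no callback. The real map stores library-defined types.
using UnconnectedMap = std::map<int, std::vector<std::string>>;

// Throws std::runtime_error listing every program that still has
// unconnected anchors; returns normally if there are none.
void throwIfUnconnected(const UnconnectedMap &unconnected) {
  std::ostringstream msg;
  bool any = false;
  for (const auto &entry : unconnected) {
    if (entry.second.empty()) continue;  // nothing missing for this program
    any = true;
    msg << "program " << entry.first << ":";
    for (const std::string &anchor : entry.second) msg << ' ' << anchor;
    msg << '\n';
  }
  if (any) throw std::runtime_error("Unconnected anchors:\n" + msg.str());
}
```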
-
void stop()
Stop the working session.
Send the stop signal to the executable. Disconnect queues if QueueManager is bound. The device will be left in an undefined state and no more programs can be run until reload() is called.
-
void reload()
Load the executable onto the bound device again.
-
bool isStopped()
Return true if the executable has stopped.
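The interplay of stop(), reload() and isStopped() can be pictured as a small state model. The class below is hypothetical and never touches a device. In particular, throwing std::logic_error when a program is run while stopped is an assumption made for illustration, since the reference only says that no more programs can be run until reload() is called.

```cpp
#include <stdexcept>

// Hypothetical model of the documented lifecycle; it does not touch a device.
class SessionStateModel {
public:
  void stop() { stopped_ = true; }     // executable receives the stop signal
  void reload() { stopped_ = false; }  // executable is loaded again
  bool isStopped() const { return stopped_; }

  // Assumed behaviour: refuse to run while stopped.
  void runMainPrograms() {
    if (stopped_) {
      throw std::logic_error("session stopped: call reload() first");
    }
    ++runs_;
  }
  int runs() const { return runs_; }

private:
  bool stopped_ = false;
  int runs_ = 0;
};
```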
-
template<typename ...T>
inline QueueManager *createQueueManager(T&&... args)
Create a QueueManager.
Session takes full ownership of the created QueueManager object. The lifetime of the created QueueManager object is strictly linked with the Session lifetime.
Parameters must be passed in the same order as in the QueueManager constructors. See the model_runtime::QueueManager class.
- Returns
A non-owning pointer to the created QueueManager.
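The ownership model described for createQueueManager() resembles a common C++ pattern: a variadic factory that forwards its arguments to a constructor, stores the object in an owning smart pointer, and hands back a raw, non-owning pointer. The sketch below uses hypothetical types (`FakeQueueManager`, `Owner`) and requires C++14 for std::make_unique. How the real Session stores the QueueManager is not stated in this reference.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for a queue manager with a two-argument constructor.
struct FakeQueueManager {
  FakeQueueManager(int capacity, bool blocking)
      : capacity(capacity), blocking(blocking) {}
  int capacity;
  bool blocking;
};

// Hypothetical owner illustrating the documented ownership model.
class Owner {
public:
  template <typename... T>
  FakeQueueManager *createQueueManager(T &&...args) {
    // Arguments are forwarded unchanged, so their order must match a
    // FakeQueueManager constructor.
    manager_ = std::make_unique<FakeQueueManager>(std::forward<T>(args)...);
    return manager_.get();  // non-owning pointer; Owner keeps ownership
  }

private:
  std::unique_ptr<FakeQueueManager> manager_;
};
```

Because the owner destroys the object in its own destructor, the returned pointer must not be deleted by the caller or used after the owner has gone out of scope.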
-
std::vector<const popef::Anchor*> getInputAnchors() const
Returns all inputs.
This includes inputs that need a user-defined callback or inputs that already have a callback defined based on data from the popef::Model.
- Returns
A vector of pointers to popef::Anchor objects.
-
using ProgIdxType = popef::ProgramFlow::ProgramIndexType