10. C++ API reference

10.1. High-level API

10.1.1. Device management

enum class model_runtime::DeviceWaitStrategy

Defines the different options for waiting for a device to become available.

Values:

enumerator NO_WAIT

An exception will be thrown if no IPU device is immediately available.

enumerator WAIT_WITH_TIMEOUT

The device manager will wait for a specified amount of time for an IPU device to become available.

The device manager will try to attach to the required device at a specified interval.

enumerator WAIT_FOREVER

The device manager will wait until an IPU device is available.

The device manager will try to attach to the required device at a specified interval.

struct DeviceWaitConfig

The configuration of how to wait for the expected device.

Public Functions

constexpr DeviceWaitConfig() = default

Constructor with default values.

inline constexpr DeviceWaitConfig(DeviceWaitStrategy p_strategy, std::chrono::seconds p_timeout = std::chrono::seconds{0}, std::chrono::seconds p_sleepTime = std::chrono::seconds{15})

Constructor with specified device waiting strategy.

Parameters
  • p_strategy[in] The device waiting strategy.

  • p_timeout[in] The time in seconds to wait for a device. Default: 0.

  • p_sleepTime[in] The time in seconds between attach attempts. Default: 15.

inline explicit constexpr DeviceWaitConfig(std::chrono::seconds p_timeout, std::chrono::seconds p_sleepTime = std::chrono::seconds{15})

Constructor with device waiting strategy DeviceWaitStrategy::WAIT_WITH_TIMEOUT.

This means that the device manager will wait for a finite amount of time, p_timeout, for a device to become available.

Parameters
  • p_timeout[in] The time in seconds to wait for a device.

  • p_sleepTime[in] The time in seconds between attach attempts.

Public Members

DeviceWaitStrategy strategy = {DeviceWaitStrategy::NO_WAIT}

The device waiting strategy.

std::chrono::seconds timeout = {0}

The time in seconds to wait for if no device is currently available.

std::chrono::seconds sleepTime = {15}

The time in seconds between attach attempts.
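
For illustration, a minimal sketch of building a wait configuration (the timeout and retry values are arbitrary examples; the relevant Model Runtime header is assumed to be on the include path):

#include <chrono>

// Wait up to 5 minutes for an IPU, retrying the attach every 10 seconds.
model_runtime::DeviceWaitConfig wait_config{
    model_runtime::DeviceWaitStrategy::WAIT_WITH_TIMEOUT,
    /*p_timeout=*/std::chrono::seconds{300},
    /*p_sleepTime=*/std::chrono::seconds{10}};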

struct DeviceConstraints

Requirements for the device that the user wants to connect to.

Public Functions

constexpr DeviceConstraints() = default

Constructor with default values.

inline explicit constexpr DeviceConstraints(bool p_requiresRemoteBuffersSupport)

Constructor with specified values.

Parameters

p_requiresRemoteBuffersSupport[in] If true, the device has to support remote buffers. If false, the device does not have to support remote buffers. Default: false.

inline constexpr operator bool() const

Check whether the selection of the device has any constraints.

If true, the device has constraints; if false, it does not.

Public Members

bool requiresRemoteBuffersSupport = false

Specify that the device has to support remote buffers.

If true, the device has to support remote buffers. If false, the device does not have to support remote buffers. Default: false.

class Device

Create a device.

This is a wrapper around a Poplar device.

Public Functions

Device(const Device&) = delete
Device &operator=(const Device &other) = delete
Device(Device&&) = default

Default move constructor.

Device &operator=(Device&&) = default

Default move assignment operator.

Device(poplar::Device device, int64_t ipu_version)

Constructor with specified values.

Parameters
  • device[in] The Poplar device. This is a device that can execute code.

  • ipu_version[in] The architecture version of the IPU.

poplar::Device &device()

Get the underlying Poplar device.

const poplar::Device &device() const

Get the underlying Poplar device.

int64_t ipuVersion() const

Get the architecture version of the device.

Returns

The architecture version of the IPU or -1 if unknown.

Protected Functions

bool isActiveSession(const Session *session) const

Get whether a session is the active session for this device.

Returns

True if the session is the active session for this device, false otherwise.

void bindToSession(Session *session)

Bind the device to a session.

If the device is bound to a different session then unbindSession() is called on that session first.

Parameters

session[in] The new session to bind the device to.

void unbindSession()

Unbind the device from the active session.

Friends

friend class Session

class DeviceManager

Select which device to run on.

Public Functions

DeviceManager &operator=(const DeviceManager &other) = delete
DeviceManager &operator=(DeviceManager&&) = delete
DeviceManager()

Constructor with default values.

DeviceManager(const DeviceManager&) = default

Default copy constructor.

DeviceManager(DeviceManager&&) = default

Default move constructor.

int64_t ipuHardwareVersion()

Get the architecture version of the IPU hardware available in the system.

Returns

Either:

  • The architecture version of the IPU in the system. For example, 2 for the Mk2 IPU (GC200 or Bow) or 21 for the Mk2 IPU with FP8 support (C600).

  • -1 if there is an IPU but the architecture is unknown.

  • 0 if there is no IPU in the system.

std::shared_ptr<Device> getDevice(std::shared_ptr<popef::Model> model, const DeviceWaitConfig &wait_config = {})

Get a device matching the configuration needed by the given model.

Parameters
  • model[in] A PopEF model. This method gets the number of IPUs and the option flags from the model.

  • wait_config[in] The configuration of how to wait for the requested device.

Returns

The device, if attachment is successful, otherwise throws an error.

std::shared_ptr<Device> tryGetDevice(std::shared_ptr<popef::Model> model, const DeviceWaitConfig &wait_config = {})

Try to get a device matching the configuration needed by the given model.

Parameters
  • model[in] A PopEF model. This method gets the number of IPUs and the option flags from the model.

  • wait_config[in] The configuration of how to wait for the requested device.

Returns

The device, if attachment is successful, otherwise a nullptr.

std::shared_ptr<Device> getDevice(int64_t num_ipus, const poplar::OptionFlags &device_options = {}, const DeviceWaitConfig &wait_config = {}, const DeviceConstraints &constrains = {})

Get a device matching the requested configuration.

Parameters
  • num_ipus[in] The number of IPUs the device must contain.

  • device_options[in] The device options.

  • wait_config[in] The configuration of how to wait for the expected device.

  • constrains[in] The set of constraints that the device must meet.

Returns

The device, if attachment is successful, otherwise throws an error.

std::shared_ptr<Device> tryGetDevice(int64_t num_ipus, const poplar::OptionFlags &device_options = {}, const DeviceWaitConfig &wait_config = {}, const DeviceConstraints &constrains = {})

Try to get a device matching the requested configuration.

Parameters
  • num_ipus[in] The number of IPUs the device must contain.

  • device_options[in] The device options.

  • wait_config[in] The configuration of how to wait for the expected device.

  • constrains[in] The set of constraints that the device must meet.

Returns

The device, if attachment is successful, otherwise a nullptr.

std::shared_ptr<Device> getSpecificDevice(int64_t device_id, std::shared_ptr<popef::Model> model, const DeviceWaitConfig &wait_config = {})

Get a specific device matching the requested configuration.

Parameters
  • device_id[in] The ID of the device to acquire.

  • model[in] A PopEF model. This method gets the option flags from the model.

  • wait_config[in] The configuration of how to wait for the expected device.

Returns

The device, if attachment is successful, otherwise throws an error.

std::shared_ptr<Device> getSpecificDevice(int64_t device_id, const poplar::OptionFlags &device_options = {}, const DeviceWaitConfig &wait_config = {})

Get a specific device matching the requested configuration.

Parameters
  • device_id[in] The ID of the device to acquire.

  • device_options[in] The device options.

  • wait_config[in] The configuration of how to wait for the expected device.

Returns

The device, if attachment is successful, otherwise throws an error.

std::shared_ptr<Device> tryGetSpecificDevice(int64_t device_id, std::shared_ptr<popef::Model> model, const DeviceWaitConfig &wait_config = {})

Try to get a specific device matching the requested configuration.

Parameters
  • device_id[in] The ID of the device to acquire.

  • model[in] A PopEF model. This method gets the option flags from the model.

  • wait_config[in] The configuration of how to wait for the expected device.

Returns

The device, if attachment is successful, otherwise a nullptr.

std::shared_ptr<Device> tryGetSpecificDevice(int64_t device_id, const poplar::OptionFlags &device_options = {}, const DeviceWaitConfig &wait_config = {})

Try to get a specific device matching the requested configuration.

Parameters
  • device_id[in] The ID of the device to acquire.

  • device_options[in] The device options.

  • wait_config[in] The configuration of how to wait for the expected device.

Returns

The device, if attachment is successful, otherwise a nullptr.

std::shared_ptr<Device> createIpuModelDevice(int64_t num_ipus, int64_t ipu_version = 2, int64_t tiles_per_ipu = 0)

Create a model of the device.

Parameters
  • num_ipus[in] The number of IPUs the device must contain.

  • ipu_version[in] The target architecture version of the IPU.

  • tiles_per_ipu[in] The number of tiles per IPU the model will have. If 0, defaults to the number of tiles for the IPU architecture.

Returns

The model of the device.

std::shared_ptr<Device> createIpuModelDevice(std::shared_ptr<popef::Model> model, int64_t tiles_per_ipu = 0)

Create a model of the device.

Parameters
  • model[in] A PopEF model. This method gets the number of IPUs and the IPU architecture version from the model.

  • tiles_per_ipu[in] The number of tiles per IPU the model will have. If 0, defaults to the number of tiles for the chosen IPU architecture.

Returns

The model of the device.

std::shared_ptr<Device> createSmallIpuModelDevice(int64_t num_ipus, int64_t ipu_version = 2)

Create a small IPU model of the device.

Small IPU models only have 4 tiles: they are much quicker to create and run than a full size model but can only run small programs.

Parameters
  • num_ipus[in] The number of IPUs the device must contain.

  • ipu_version[in] The target architecture version of the IPU.

Returns

A device with the model that was created.

std::shared_ptr<Device> createSmallIpuModelDevice(std::shared_ptr<popef::Model> model)

Create a small IPU model of the device.

Small IPU models only have 4 tiles: they are much quicker to create and run than a full size model but can only run small programs.

Parameters

model[in] A PopEF model.

Returns

A device with the model attached.
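
For illustration, a minimal sketch of acquiring a two-IPU device that supports remote buffers (the IPU count, error handling and the helper function name acquireDevice are hypothetical; includes are omitted):

std::shared_ptr<model_runtime::Device> acquireDevice() {
  model_runtime::DeviceManager manager;

  // ipuHardwareVersion() returns 0 when no IPU is present in the system.
  if (manager.ipuHardwareVersion() == 0) {
    return nullptr;
  }

  model_runtime::DeviceWaitConfig wait_config{
      model_runtime::DeviceWaitStrategy::WAIT_FOREVER};
  model_runtime::DeviceConstraints constraints{
      /*p_requiresRemoteBuffersSupport=*/true};

  // tryGetDevice() returns nullptr instead of throwing if attachment fails.
  return manager.tryGetDevice(/*num_ipus=*/2, /*device_options=*/{},
                              wait_config, constraints);
}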

10.1.2. Tensor memory representation

struct TensorMemoryView

Mutable view to already allocated memory.

Public Functions

TensorMemoryView() = default

Default constructor.

TensorMemoryView(void *data, uint64_t data_size_bytes)

Mutable view to memory.

Parameters
  • data[in] The pointer to the allocated memory.

  • data_size_bytes[in] The size of the memory block, in bytes.

Public Members

void *data = nullptr

The pointer to the allocated memory.

uint64_t data_size_bytes = 0

The size of the memory block, in bytes.

struct ConstTensorMemoryView

Immutable view to already allocated memory.

Public Functions

ConstTensorMemoryView() = default

Default constructor.

ConstTensorMemoryView(const TensorMemoryView &other)

Converting constructor from a mutable TensorMemoryView.

ConstTensorMemoryView(const void *data, uint64_t data_size_bytes)

Immutable view to const memory.

Parameters
  • data[in] The pointer to the allocated memory.

  • data_size_bytes[in] The size of the memory block, in bytes.

Public Members

const void *data = nullptr

Pointer to the allocated memory.

uint64_t data_size_bytes = 0

The size of the memory block, in bytes.

struct TensorMemory

Tensor memory manager responsible for allocating, storing, sharing and releasing tensor memory.

Public Functions

TensorMemory() = default

Default constructor.

TensorMemory(int64_t data_size_bytes)

Allocate the user-requested memory block and store it in a shared_ptr.

Parameters

data_size_bytes[in] The size of the memory block, in bytes.

TensorMemoryView getView()

Get the mutable memory view.

ConstTensorMemoryView getConstView()

Get the immutable memory view.

ConstTensorMemoryView getView() const

Get the immutable memory view.

Public Members

uint64_t data_size_bytes = 0

The size of the memory block, in bytes.

std::shared_ptr<void> data = nullptr

Pointer to the allocated memory.
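
For illustration, a short sketch of allocating a tensor buffer and taking mutable and immutable views of it (the 1024-byte size is arbitrary; <cstring> is needed for std::memset):

model_runtime::TensorMemory memory(/*data_size_bytes=*/1024);

// Fill the buffer through a mutable view ...
model_runtime::TensorMemoryView view = memory.getView();
std::memset(view.data, 0, view.data_size_bytes);

// ... and hand out a read-only alias of the same shared allocation.
model_runtime::ConstTensorMemoryView const_view = memory.getConstView();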

10.1.3. ModelRunner

using model_runtime::InputMemoryView = std::unordered_map<std::string, ConstTensorMemoryView>

Mapping between a tensor name and an immutable memory view.

Used as input to ModelRunner::execute.

using model_runtime::OutputMemoryView = std::unordered_map<std::string, TensorMemoryView>

Mapping between a tensor name and a memory view.

Used as output from ModelRunner::execute, when the output memory is allocated and managed by the ModelRunner client.

using model_runtime::OutputMemory = std::unordered_map<std::string, TensorMemory>

Mapping between a tensor name and memory.

Used as output from ModelRunner::execute when the output memory is allocated during execution by the library.

using model_runtime::OutputFutureMemoryView = std::unordered_map<std::string, std::shared_future<TensorMemoryView>>

Mapping between a tensor name and a future memory view.

Used as output from the asynchronous ModelRunner::execute when the output memory is allocated and managed by the ModelRunner client.

using model_runtime::OutputFutureMemory = std::unordered_map<std::string, std::shared_future<TensorMemory>>

Mapping between a tensor name and a future memory.

Used as output from the asynchronous ModelRunner::execute when the output memory is allocated during execution by the library.

struct DataDesc

The description of the data used by ModelRunner.

Public Functions

DataDesc(std::string name, int64_t size_in_bytes, std::vector<int64_t> shape, popef::DataType data_type, bool popef_contains_tensor_data = false)

Create a description of input or output data.

Parameters
  • name[in] The name of the input or output tensor.

  • size_in_bytes[in] The size of the tensor measured in bytes.

  • shape[in] A vector defining the shape of the tensor. The size of the vector is equal to the number of tensor dimensions. Each element of the vector indicates the size of the corresponding dimension.

  • data_type[in] The data type of a single tensor element.

  • popef_contains_tensor_data[in] If true, the model has a tensor data blob associated with the tensor. If false, the model does not have a tensor data blob associated with the tensor. Default: false.

Public Members

std::string name

The name of the input or output tensor.

int64_t size_in_bytes

The size of the tensor measured in bytes.

std::vector<int64_t> shape

A vector defining the shape of the tensor.

The size of the vector is equal to the number of tensor dimensions. Each element of the vector indicates the size of the corresponding dimension.

popef::DataType data_type

The data type of a single tensor element.

bool popef_contains_tensor_data

If true, the model has a tensor data blob associated with the tensor.

If false, the model does not have a tensor data blob associated with the tensor.

using model_runtime::InputDesc = DataDesc

Description of input data required by ModelRunner.

using model_runtime::OutputDesc = DataDesc

Description of output data required by ModelRunner.

using model_runtime::ReplicaIdToOutputMemoryView = std::unordered_map<unsigned, OutputMemoryView>

Mapping of replicas to the memory allocated for the output tensors required by ModelRunner.

using model_runtime::ReplicaIdToDevice = std::unordered_map<unsigned, std::shared_ptr<Device>>

Mapping of replicas to physical entities that can execute the IPU programs.

struct ModelRunnerConfig

ModelRunner configuration options.

Public Members

unsigned replication_factor = 1

Number of replicas to be created.

bool run_save_programs = false

If true, “save” programs will be called on ModelRunner instance destruction.

If false, “save” programs will not be called. Default: false.

bool thread_safe = false

If true, the mutex will be locked on each execution call.

If false, the mutex will not be locked. By default the model runner is not thread-safe and each replica has an independent mutex. Default: false.

InputMemoryView frozen_inputs = {}

Map of the user-data required by the model when it is loaded onto hardware as well as any data that will be considered as constant during model execution.

This allows the overwriting of tensor data saved inside a PopEF file.

ReplicaIdToOutputMemoryView replica_to_save_programs_outputs = {}

Mapping between replica ID and OutputMemoryView.

The PopEF data format allows for the creation of “save” programs that will be executed on ModelRunner instance destruction, if required. This option allows you to capture the data returned by these “save” programs.

ReplicaIdToDevice replica_to_device = {}

Mapping between replica ID and Device.

This allows you to set a specific device for a given replica. By default, the model runner assigns devices automatically.

DeviceWaitConfig device_wait_config = {}

By default, the model runner throws an exception when it is not able to attach to any device required by the given model.

This behaviour can be changed by setting a custom DeviceWaitConfig.

bool check_package_hash = true

If true, the Poplar hash will be checked before the executable is loaded onto the device.

If false, this check is not done. Default: true.

std::chrono::nanoseconds timeout_ns = std::chrono::seconds(5)

Duration in nanoseconds before the timeout callback is called when the IPU is waiting for input data which is not available.

If 0, the timeout callback is never called; in other words, the model waits forever for the data.

bool validate_io_params = true

If true, the I/O parameters will be checked when the ModelRunner “execute” functions are run.

If false, this check is not done. Default: true.

unsigned batching_dim = std::numeric_limits<unsigned>::max()

Enables dynamic batch sizing and specifies the dimension of the input and output data that contains the dynamic batch size.

By default, dynamic batch sizing is disabled and ModelRunner can only accept inputs and outputs where the batch size is an integer multiple of the batch size defined in the PopEF model.

Dynamic batch sizing is disabled when batching_dim is set to 0xFFFFFFFF.

To enable dynamic batch sizing, set batching_dim to the index of the dimension that contains the dynamic batch size. This value can be any positive integer less than the number of dimensions of the model’s input and output tensors. ModelRunner will then accept the batch size given by the tensor dimension specified by batching_dim, and the batch size can be any value.

bool auto_reset = false

If true, the IPU will be reset automatically before the next inference when:

  • an application runtime error occurs, or

  • a recoverable error occurs and the RecoveryAction is IPU_RESET.

If false, an exception will be raised and no action will be taken. Default: false.

std::string max_look_ahead = "unlimited"

The number of host synchronization points that Model Runtime is allowed to prepare in advance.

This prevents the IPU from being idle.

Possible values:

  • ”unlimited”: Model Runtime decides the best value. (Default)

  • ”x”: An unsigned integer value.

The default value of “unlimited” usually offers the best performance. If there are deadlocks or other timeouts which are not the result of other errors, set this value to 0.

Possible use case for max_look_ahead set to 0:

  • The model is compiled with prefetch enabled.

  • Device iterations is greater than 1.

  • A single input to the execute function is not enough to run a number of iterations equal to the device iterations.

An alternative to setting max_look_ahead to 0 is to use executeAsync to handle requests. In this case, the application must know when the data will be ready to read.

unsigned ring_buffer_size_multiplier = 2

The multiplier used to determine the size of the ring buffer.

The ring buffer size is given by ring_buffer_size_multiplier * batch size.

bool flush_on_waiting_outputs = false

If true, only flush the dummy data when there is a user request waiting for outputs from the queues; otherwise, wait forever for user-requested data in the callback.

If false, always flush the dummy data. Default: false.

std::chrono::nanoseconds batch_size_timeout_ns = std::chrono::nanoseconds::max()

Duration in nanoseconds before the timeout callback is called when the IPU is waiting for input data to compose a batch to load.

If max(), the callback will never be called, in other words, the model will wait forever for the data.

std::chrono::nanoseconds data_parallel_timeout_ns = std::chrono::nanoseconds::max()

Duration in nanoseconds before the timeout callback is called when the IPU is waiting in parallel for input data.

If max(), the callback will never be called, in other words, the model will wait forever for the data.

bool is_batch_size_timeout_enabled = false

If true, check the batch_size_timeout_ns and data_parallel_timeout_ns in the timeout callback when the IPU is waiting for input data which doesn’t arrive.

size_t request_tracepoints_buffer_size = 1000

Size of the request tracepoint buffer.

If set to 0, the request tracepoint buffer size will be infinite. Note: if flush_on_waiting_outputs is set to false, no request tracepoints will be recorded.

std::function<void(const std::string &tensor_id, const InputMemoryView *&inputs, const OutputMemoryView *&outputs)> flush_callback = nullptr

The callback function used to flush data that is not available.

flush_callback indicates to Model Runtime which data to flush.

If specified, the flush_callback function will be called once when the IPU has timed out after waiting for input data, which is not available.

The application which defines the flush_callback function must ensure that the inputs and outputs it specifies for Model Runtime to flush are correct. Otherwise the results of this and following inferences will be corrupted.

Param tensor_id

[in] The ID of the tensor that Model Runtime was waiting for when the timeout occurred.

Param inputs

[in] The pointer to the input data structure to be flushed. Model Runtime uses this to determine which inputs should be flushed after the flush_callback function is called.

Param outputs

[in] The pointer to the output data structure to be flushed. Model Runtime uses this to determine which outputs should be flushed after the flush_callback function is called.
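
For illustration, a sketch of a configuration with two replicas, thread-safe execution and a custom device wait policy (all values are arbitrary examples, not recommendations; <chrono> is assumed to be included):

model_runtime::ModelRunnerConfig config;
config.replication_factor = 2;          // run two replicas
config.thread_safe = true;              // lock a mutex on each execute call
config.device_wait_config = model_runtime::DeviceWaitConfig{
    model_runtime::DeviceWaitStrategy::WAIT_WITH_TIMEOUT,
    std::chrono::seconds{600}};         // wait up to 10 minutes for devices
config.timeout_ns = std::chrono::seconds{10};  // input-data timeout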

class ModelRunner

Inference model abstraction.

The model runner creates a session, manages queues, runs Poplar executable programs and allows execution of inference models synchronously and asynchronously.

Public Functions

ModelRunner(const ModelRunner&) = delete
ModelRunner &operator=(const ModelRunner &other) = delete
ModelRunner(ModelRunner&&) = default

Default move constructor.

ModelRunner &operator=(ModelRunner&&) = default

Default move assignment operator.

explicit ModelRunner(const std::string &popef_path, const ModelRunnerConfig &config = ModelRunnerConfig{})

Create a new ModelRunner object.

Parameters
  • popef_path – The path to PopEF files from which the model will be loaded.

  • config – The model runner configuration.

explicit ModelRunner(const std::vector<std::string> &popef_paths, const ModelRunnerConfig &config = ModelRunnerConfig{})

Create a new ModelRunner object.

Parameters
  • popef_paths – Paths to PopEF files from which the model will be loaded.

  • config – The model runner configuration.

explicit ModelRunner(std::shared_ptr<popef::Model> model, const ModelRunnerConfig &config = ModelRunnerConfig{})

Create a new ModelRunner object.

Parameters
  • model – The model which will be loaded and run.

  • config – The model runner configuration.

~ModelRunner()

Default destructor.

OutputMemory execute(const InputMemoryView &input_data, unsigned replica_id = 0)

Run model synchronously.

This will allocate output memory internally.

Parameters
  • input_data[in] The user-allocated tensor buffer for all executable input tensors.

  • replica_id[in] The user-selected replica that will execute computations. Must be less than replication_factor provided in ModelRunnerConfig.

Returns

The output memory allocated by Model Runtime for all executable output tensors.
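
For illustration, a minimal sketch of synchronous execution with outputs allocated by the library. The PopEF path, tensor names and input size are hypothetical placeholders:

// "/path/to/model.popef", "input_tensor" and the input size are placeholders.
model_runtime::ModelRunner runner("/path/to/model.popef");

std::vector<float> input(256 * 3 * 224 * 224);   // filled by the application
model_runtime::InputMemoryView inputs;
inputs.emplace("input_tensor",
               model_runtime::ConstTensorMemoryView(
                   input.data(), input.size() * sizeof(float)));

model_runtime::OutputMemory outputs = runner.execute(inputs);
// outputs["output_tensor"].getConstView() now refers to the result data.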

void execute(const InputMemoryView &input_data, const OutputMemoryView &output_data, unsigned replica_id = 0)

Run a model synchronously.

This uses output memory that you allocate and pass pointers to.

Parameters
  • input_data[in] The user-allocated tensor buffer for all executable input tensors.

  • output_data[in] The user-allocated tensor buffer for all executable output tensors.

  • replica_id[in] The user-selected replica that will execute computations. Must be less than replication_factor provided in ModelRunnerConfig.

OutputFutureMemory executeAsync(const InputMemoryView &input_data, unsigned replica_id = 0)

Run a model asynchronously.

This will allocate output memory internally.

Parameters
  • input_data[in] The user-allocated tensor buffer for all executable input tensors.

  • replica_id[in] The user-selected replica that will execute computations. Must be less than replication_factor provided in ModelRunnerConfig.

Returns

The future result of an asynchronous call for all executable output tensors.

OutputFutureMemoryView executeAsync(const InputMemoryView &input_data, const OutputMemoryView &output_data, unsigned replica_id = 0)

Run a model asynchronously.

This uses output memory that you allocate and pass pointers to.

Parameters
  • input_data[in] The user-allocated tensor buffer for all executable input tensors.

  • output_data[in] The user-allocated tensor buffer for all executable output tensors.

  • replica_id[in] The user-selected replica that will execute computations. Must be less than replication_factor provided in ModelRunnerConfig.

Returns

The future result of an asynchronous call for all executable output tensors.

std::vector<InputDesc> getExecuteInputs() const

Get a description of the input data required in the execute class methods.

Returns

A vector of DataDesc instances.

std::vector<OutputDesc> getExecuteOutputs() const

Get a description of the output data required in the execute class methods.

Returns

A vector of DataDesc instances.
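
For illustration, a sketch that sizes user-owned buffers from these descriptions and runs asynchronously. It assumes a ModelRunner instance named runner is already in scope; buffer handling is a placeholder:

// `runner` is a ModelRunner created earlier (assumed in scope).
std::unordered_map<std::string, std::vector<uint8_t>> in_buffers, out_buffers;
model_runtime::InputMemoryView inputs;
model_runtime::OutputMemoryView outputs;

for (const model_runtime::InputDesc &desc : runner.getExecuteInputs()) {
  std::vector<uint8_t> &buf = in_buffers[desc.name];
  buf.resize(desc.size_in_bytes);            // filled by the application
  inputs.emplace(desc.name, model_runtime::ConstTensorMemoryView(
                                buf.data(), buf.size()));
}
for (const model_runtime::OutputDesc &desc : runner.getExecuteOutputs()) {
  std::vector<uint8_t> &buf = out_buffers[desc.name];
  buf.resize(desc.size_in_bytes);
  outputs.emplace(desc.name, model_runtime::TensorMemoryView(
                                 buf.data(), buf.size()));
}

model_runtime::OutputFutureMemoryView futures =
    runner.executeAsync(inputs, outputs);
for (auto &named_future : futures) {
  named_future.second.wait();                // results land in out_buffers
}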

std::vector<InputDesc> getModelInputs() const

Get a description of all the user-provided input data.

In addition to the data used by the execute calls, this will return a description of all tensors used by the model which must be provided when loading the model onto the device. The data required for the additional tensors may be included in PopEF files. In this case, the data for the additional tensors is loaded automatically by ModelRunner.

Returns

A vector of DataDesc instances.

std::vector<OutputDesc> getModelOutputs() const

Get a description of all the user-provided output data.

In addition to the data used by the execute calls, this will return descriptions of all tensors used by the model that the loading phase requires (for example, weight tensors). The data for these additional tensors can be included in PopEF files, in which case it is loaded automatically by the ModelRunner.

Returns

The vector of DataDesc instances.

std::shared_ptr<popef::Model> model() const

The model associated with this model runner.

const ModelRunnerConfig &config() const

The configuration associated with this model runner.

inline void getTimeTrace(std::map<std::string, float> &time_info, unsigned replica_id = 0, int cur = -1)

Get the time taken for different phases of the last request.

Parameters
  • time_info[in] The time trace information is written to this map. An empty std::map<std::string, float> should be passed in.

  • replica_id[in] The ID of the replica. Default: 0.

  • cur[in] The .

Returns

The time trace of the last execute function called, written to time_info as a std::map<std::string, float> containing:

  • request_duration_us: time taken (in microseconds) for the last request from the point it was received to the point that the computation was complete.

  • read_preparation_duration_us: time taken (in microseconds) for the preparation of the last request before it was added to the queue.

  • read_queue_duration_us: time (in microseconds) the last task spent in the queue.

  • computation_duration_us: time taken (in microseconds) for the computation to complete.

inline void getMonitoringStatisticsPercentile(std::map<std::string, double> &monitoring_info, double quantile = 0.9, unsigned replica_id = 0)

Get the percentile of latencies.

Parameters
  • monitoring_info[in] The monitoring information is written to this map. An empty std::map<std::string, double> should be passed in.

  • quantile[in] The percentile to use. Default: 0.9 returns the P90 latency.

  • replica_id[in] The ID of the replica. Default: 0.

Returns

The percentile of the latencies is returned in monitoring_info as a std::map<std::string, double> containing:

  • request_monitoring_statistics_percentile_us: percentile of the latency (in microseconds) across the entire request.

  • read_preparation_monitoring_statistics_percentile_us: percentile of the latency (in microseconds) in the read preparation phase.

  • read_queue_monitoring_statistics_percentile_us: percentile of the latency (in microseconds) in the read queue phase.

  • computation_monitoring_statistics_percentile_us: percentile of the latency (in microseconds) in the computation phase.

inline void getMonitoringStatisticsMean(std::map<std::string, double> &monitoring_info, unsigned replica_id = 0)

Get the mean value of the latencies.

Parameters
  • monitoring_info[in] The monitoring information is written to this map. An empty std::map<std::string, double> should be passed in.

  • replica_id[in] The ID of the replica. Default: 0.

Returns

The mean value of the latencies is returned in monitoring_info as a std::map<std::string, double> containing:

  • request_monitoring_statistics_mean_us: mean latency (in microseconds) across the entire request.

  • read_preparation_monitoring_statistics_mean_us: mean latency (in microseconds) in the read preparation phase.

  • read_queue_monitoring_statistics_mean_us: mean latency (in microseconds) in the read queue phase.

  • computation_monitoring_statistics_mean_us: mean latency (in microseconds) in the computation phase.

inline void getMonitoringStatisticsTotalCount(std::map<std::string, double> &monitoring_info, unsigned replica_id = 0)

Get the total number of requests.

Parameters
  • monitoring_info[in] The monitoring information is written to this map. An empty std::map<std::string, double> should be passed in.

  • replica_id[in] The ID of the replica. Default: 0.

Returns

The total number of requests is returned in monitoring_info as a std::map<std::string, double> containing:

  • request_monitoring_statistics_count: number of requests in the request phase.

  • read_preparation_monitoring_statistics_count: number of requests in the read preparation phase.

  • read_queue_monitoring_statistics_count: number of requests in the read queue phase.

  • computation_monitoring_statistics_count: number of requests in the computation phase.
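
For illustration, a short sketch of reading the mean latency statistics for replica 0 from a ModelRunner instance named runner (as in the earlier sketches; <map> is assumed to be included):

// `runner` is a ModelRunner created earlier (assumed in scope).
std::map<std::string, double> stats;
runner.getMonitoringStatisticsMean(stats);
const double mean_request_latency_us =
    stats["request_monitoring_statistics_mean_us"];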

10.2. Low-level API

10.2.1. Anchor callback management

struct CallbackInfo

Information passed to CallbackFactory.

Public Members

const popef::Anchor &anchor

Input or output tensors of the model expected by the device.

using model_runtime::CallbackHandle = std::function<void(void*)>

The callback function called whenever the stream will be read from or written to by the device.

The memory location will only be valid for reading or writing for the duration of the callback.

using model_runtime::CallbackFactory = std::function<poplar::StreamCallbackHandle(const CallbackInfo &info)>

Factory to create a callback for the given callback information.

enum class model_runtime::PopefDataUsagePolicy

The policy for using PopEF TensorData and TensorFeed when creating anchor callbacks.

Values:

enumerator USE_POPEF_DATA_IF_ANY = 0

Use the TensorData and the TensorFeed stored in the model’s PopEF data to implicitly create callbacks for the Anchors for which the data exists.

enumerator USE_USER_DATA

Don’t use the data stored in the PopEF.

This allows you to bind your own data source.

using model_runtime::PopefDataUsagePredicate = std::function<PopefDataUsagePolicy(const popef::Anchor&)>

PopefDataUsagePredicates are used to control the use of PopEF tensor or feed data when creating callbacks for Anchors.

For more information, see the description of anchors in the PopEF User Guide.

static const PopefDataUsagePredicate model_runtime::null_popef_data_usage_predicate = {}

enum class model_runtime::AnchorCallbackPolicy

Policy to handle anchor callbacks.

Values:

enumerator BIND_USER_CB = 0

Bind user callback to anchor.

enumerator BIND_EMPTY_CB

Bind empty (dummy) callback to anchor.

enumerator SKIP_CB

Skip binding a callback to the anchor.

using model_runtime::AnchorCallbackPredicate = std::function<AnchorCallbackPolicy(const popef::Anchor&)>

AnchorCallbackPredicates are used to control the callback creation policy for an individual Anchor.

For more information, see the description of anchors in the PopEF User Guide.

static const AnchorCallbackPredicate model_runtime::null_anchor_callback_predicate = {}

namespace predicate_factory

Predefined callback predicates.

Set of basic predicates to control handling of PopEF data use or anchor callbacks.

template<typename Policy>
class AnchorWithPolicy
#include <SessionUtils.hpp>

The callback-handling policy to be used with an anchor.

Template Parameters

Policy – The policy to be bound to an anchor.

Public Members

const popef::Anchor &anchor

Anchor with input or output data to a program.

Policy policy

The callback-handling policy.

template<typename Policy>
class ProgramsWithPolicy
#include <SessionUtils.hpp>

The callback-handling policies to be used for each program.

Template Parameters

Policy – The policy to be bound to the programsIndexes.

Public Members

const std::vector<popef::ProgramFlow::ProgramIndexType> &programsIndexes

A set of indexes to named programs.

Policy policy

The callback-handling policy.

namespace anchor_callbacks

Functions

AnchorCallbackPredicate predProgramFlowLoad(const popef::ProgramFlow &flow, AnchorCallbackPolicy accept_policy = AnchorCallbackPolicy::BIND_USER_CB, AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)

Callback predicate to filter all anchors owned by any “load” programs.

Parameters
  • flow[in] The user-model PopEF program flow (to read “load” program numbers from).

  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

Returns

The anchor callback predicate.

AnchorCallbackPredicate predProgramFlowMain(const popef::ProgramFlow &flow, AnchorCallbackPolicy accept_policy = AnchorCallbackPolicy::BIND_USER_CB, AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)

Callback predicate to filter all anchors owned by the main program.

Parameters
  • flow[in] The user-model PopEF program flow (to read main program number from).

  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

Returns

The anchor callback predicate.

AnchorCallbackPredicate predProgramFlowSave(const popef::ProgramFlow &flow, AnchorCallbackPolicy accept_policy = AnchorCallbackPolicy::BIND_USER_CB, AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)

Callback predicate to filter all anchors owned by any save programs.

Parameters
  • flow[in] The user-model PopEF program flow (to read “save” program numbers from).

  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

Returns

The anchor callback predicate.

AnchorCallbackPredicate predProgramNotAssigned(AnchorCallbackPolicy accept_policy = AnchorCallbackPolicy::BIND_USER_CB, AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)

Callback predicate to filter all anchors that are not assigned to any programs.

Parameters
  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

Returns

The anchor callback predicate.

AnchorCallbackPredicate predNonScalarType(AnchorCallbackPolicy accept_policy = AnchorCallbackPolicy::BIND_USER_CB, AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)

Callback predicate to filter all anchors that are not scalars.

Parameters
  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

Returns

The anchor callback predicate.

AnchorCallbackPredicate predProgramIndexes(const std::vector<popef::ProgramFlow::ProgramIndexType> &program_indexes, AnchorCallbackPolicy accept_policy = AnchorCallbackPolicy::BIND_USER_CB, AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)

Callback predicate to filter all anchors owned by any of the programs passed in by the user.

Parameters
  • program_indexes[in] The program indices to filter.

  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

Returns

The anchor callback predicate.

AnchorCallbackPredicate predProgramIndexes(const std::vector<ProgramsWithPolicy<AnchorCallbackPolicy>> &accepted_programs_policies, const AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)

Callback predicate to apply defined anchor callback handling policies to grouped program indices.

Parameters
  • accepted_programs_policies – The program indices with defined anchor callback handling policies.

  • reject_policy – The anchor rejection policy.

Returns

The anchor callback predicate.

AnchorCallbackPredicate predAnchorsPolicies(const std::vector<AnchorWithPolicy<AnchorCallbackPolicy>> &accepted_anchors_policies, const AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)

Callback predicate to apply anchor handling policies.

Parameters
  • accepted_anchors_policies[in] The anchor indices with handling policies.

  • reject_policy[in] The anchor rejection policy.

Returns

The anchor callback predicate.

template<typename ...Args>
AnchorCallbackPredicate andBind(AnchorCallbackPolicy accept_policy, AnchorCallbackPolicy reject_policy, Args&&... pred)

Conjunction operator.

Combines multiple predicates into one predicate.

Parameters
  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

  • pred[in] The predicates which will be combined by operator.

Returns

The anchor callback predicate, which returns accept_policy when all of the passed predicates return accept_policy; otherwise it returns reject_policy.

template<typename ...Args>
AnchorCallbackPredicate orBind(AnchorCallbackPolicy accept_policy, AnchorCallbackPolicy reject_policy, Args&&... pred)

Disjunction operator.

Combines multiple predicates into one predicate.

Parameters
  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

  • pred[in] The predicates which will be combined by operator.

Returns

The anchor callback predicate, which returns accept_policy when any of the passed predicates returns accept_policy; otherwise it returns reject_policy.
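
For illustration, a sketch that binds user callbacks to anchors owned by either the main or the “load” programs and dummy callbacks to everything else. The exact namespace qualification and the origin of flow (a popef::ProgramFlow taken from the model’s metadata) are assumptions:

// `flow` is a popef::ProgramFlow obtained from the model's metadata (assumed).
using model_runtime::AnchorCallbackPolicy;
namespace cb = model_runtime::predicate_factory::anchor_callbacks;

model_runtime::AnchorCallbackPredicate pred =
    cb::orBind(AnchorCallbackPolicy::BIND_USER_CB,
               AnchorCallbackPolicy::BIND_EMPTY_CB,
               cb::predProgramFlowMain(flow),
               cb::predProgramFlowLoad(flow));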

namespace popef_data_usage

Functions

PopefDataUsagePredicate predProgramFlowLoad(const popef::ProgramFlow &flow, PopefDataUsagePolicy accept_policy = PopefDataUsagePolicy::USE_USER_DATA, PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)

Callback predicate to filter all anchors owned by any “load” programs.

Parameters
  • flow[in] The user-model PopEF program flow (to read “load” program numbers from).

  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

Returns

The anchor callback predicate.

PopefDataUsagePredicate predProgramFlowMain(const popef::ProgramFlow &flow, PopefDataUsagePolicy accept_policy = PopefDataUsagePolicy::USE_USER_DATA, PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)

Callback predicate to filter all anchors owned by the main program.

Parameters
  • flow[in] The user-model PopEF program flow (to read main program number from).

  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

Returns

The anchor callback predicate.

PopefDataUsagePredicate predProgramFlowSave(const popef::ProgramFlow &flow, PopefDataUsagePolicy accept_policy = PopefDataUsagePolicy::USE_USER_DATA, PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)

Callback predicate to filter all anchors owned by any save programs.

Parameters
  • flow[in] The user-model PopEF program flow (to read “save” program numbers from).

  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

Returns

The anchor callback predicate.

PopefDataUsagePredicate predProgramNotAssigned(PopefDataUsagePolicy accept_policy = PopefDataUsagePolicy::USE_USER_DATA, PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)

Callback predicate to filter all anchors that are not assigned to any programs.

Parameters
  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

Returns

The anchor callback predicate.

PopefDataUsagePredicate predProgramIndexes(const std::vector<popef::ProgramFlow::ProgramIndexType> &program_indexes, PopefDataUsagePolicy accept_policy = PopefDataUsagePolicy::USE_USER_DATA, PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)

Callback predicate to filter all anchors owned by any of the programs passed in by the user.

Parameters
  • program_indexes[in] The program indices to filter.

  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

Returns

The anchor callback predicate.

PopefDataUsagePredicate predProgramIndexes(const std::vector<ProgramsWithPolicy<PopefDataUsagePolicy>> &accepted_programs_policies, const PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)

Callback predicate to apply defined anchor callback handling policies to grouped program indices.

Parameters
  • accepted_programs_policies – The program indices with defined anchor callback handling policies.

  • reject_policy – The anchor rejection policy.

Returns

The anchor callback predicate.

PopefDataUsagePredicate predAnchorsPolicies(const std::vector<AnchorWithPolicy<PopefDataUsagePolicy>> &accepted_anchors_policies, const PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)

Callback predicate to apply anchor handling policies.

Parameters
  • accepted_anchors_policies[in] The anchor indices with handling policies.

  • reject_policy[in] The anchor rejection policy.

Returns

The anchor callback predicate.

template<typename ...Args>
PopefDataUsagePredicate orBind(PopefDataUsagePolicy accept_policy, PopefDataUsagePolicy reject_policy, Args&&... pred)

Disjunction operator.

Combines multiple predicates into one predicate.

Parameters
  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

  • pred[in] The predicates which will be combined by operator.

Returns

The anchor callback predicate, which returns accept_policy when any of the passed predicates returns accept_policy; otherwise it returns reject_policy.

10.2.2. Queue memory management

class IMemoryPool

A common interface for all memory allocators.

Subclassed by model_runtime::RingMemoryPool

Public Functions

virtual void *getMemoryBlob() = 0

Get a writable memory blob.

Note

The memory blob retains ownership of the blob’s memory and is responsible for freeing the memory.

virtual int64_t blobSize() const = 0

Get the size of the memory blob.

class RingMemoryPool : public model_runtime::IMemoryPool

Memory pool of fixed size blobs.

Allocate the requested number of blobs at construction time and loop over the blobs every time getMemoryBlob() is called.

Public Functions

RingMemoryPool(int64_t blob_size, int64_t num_blobs)

Create a ring memory pool.

This allocates num_blobs * blob_size bytes of memory under the hood.

Parameters
  • blob_size[in] The size of a single memory blob.

  • num_blobs[in] The number of memory blobs.

int64_t numBlobs() const

Get the number of memory blobs.

virtual int64_t blobSize() const override

Get the size of a memory blob.

virtual void *getMemoryBlob() override

Get a pointer to the next blob.

Note

When the end of the memory pool is reached, the iteration starts from the beginning again.
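
For illustration, a short sketch of a pool of four 1 KiB blobs; the sizes are arbitrary and <cstring> is needed for std::memset:

model_runtime::RingMemoryPool pool(/*blob_size=*/1024, /*num_blobs=*/4);

for (int i = 0; i < 8; ++i) {
  // After the fourth call, getMemoryBlob() wraps around to the first blob.
  void *blob = pool.getMemoryBlob();
  std::memset(blob, 0, pool.blobSize());
}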

10.2.3. Queue management

template<typename BufferType>
class SpscRingBuffer

A lock-free, fixed-size, single-producer, single-consumer ring buffer implementation.

auto dst = rb.writeLock();
if (!rb.isValid()) {
  // dst is not valid: don't dereference it.
  return;
}
*dst = obj;
rb.writeComplete();

Note

writeLock() and readLock() are blocking calls which might return early if the ring buffer gets invalidated, so you should always use isValid() after locking a buffer to check whether the returned buffer is safe to use or not.

Public Types

using ReadTimeoutCallback = std::function<void(SpscRingBuffer*)>

Signature of the function to call when readLock() times out.

Public Functions

SpscRingBuffer(const SpscRingBuffer &other) = delete
SpscRingBuffer(const SpscRingBuffer &&other) noexcept = delete
SpscRingBuffer &operator=(const SpscRingBuffer &other) = delete
SpscRingBuffer &operator=(SpscRingBuffer &&other) noexcept = delete
SpscRingBuffer(std::size_t num_buffers, const std::string &label, ReadTimeoutCallback timeout_cb = nullptr, std::chrono::nanoseconds timeout_ns = std::chrono::nanoseconds::zero())

Create a single-producer, single-consumer ring buffer.

Parameters
  • num_buffers[in] The number of buffers to use in the ring buffer.

  • label[in] The debug string to use in printState().

  • timeout_cb[in] The function to call when a read times out.

  • timeout_ns[in] The duration in nanoseconds before the timeout callback is called when no read input is available. If 0, never call the callback.

~SpscRingBuffer()

Default destructor.

void write(const BufferType &obj)

Lock the ring buffer, write to it and unlock it.

Parameters

obj[in] The buffer to be written to.

BufferType *writeLock()

Lock and return a buffer for writing.

Only one buffer can be locked for writing at any time. Calling writeLock() when a buffer is already locked will return the same buffer.

If no buffer is available, then the function will block until either:

  • An existing buffer becomes available.

  • The ring buffer is invalidated.

Note

isValid() must be used to determine whether the returned buffer is valid or not.

Returns

A buffer to write to.

void writeComplete()

Unlock the currently write-locked buffer.

Pre

A buffer must have been locked for writing using writeLock().

Post

The next time writeLock() is called, a different buffer will be returned.

const BufferType &readLock()

Lock a buffer for read access.

If no buffer is available the function will block until either:

  • A buffer becomes available (writeComplete() is called from another thread)

  • The ring buffer is invalidated.

  • Some buffers are read-locked and readReset() is called.

Several buffers can be locked in reading mode and each call to readLock() will return a new buffer.

If timeout_ns is greater than zero, a timeout callback was provided, and readLock() has been waiting for a buffer for longer than timeout_ns, then the callback is called repeatedly until a new read buffer becomes available.

Note

isValid() must be used to determine whether the returned buffer is valid or not.
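
The consumer side follows the same pattern as the write example above; consume() below is a placeholder for application code:

const auto &src = rb.readLock();
if (!rb.isValid()) {
  // src is not valid: don't use it.
  return;
}
consume(src);  // `consume` is a hypothetical consumer-side function
rb.readComplete();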

void readComplete()

Unlock the oldest read-locked buffer.

Pre

A buffer must have been locked for reading using readLock().

void readReset()

All the buffers currently locked for reading are unlocked and placed back at the front of the reading queue.

bool readAvailable() const

Check whether any buffer is available to be read-locked.

Returns

True if there is at least one buffer available to be read-locked, false if there are no buffers available.

void invalidate()

Invalidate the ring buffer.

All the calls after this call will become non-blocking.

All the objects returned by calls on an invalidated ring buffer are invalid and should be discarded or ignored.

void reset()

Reset all ring buffer values to the initial state.

bool isValid() const

Check the state of the ring buffer.

Returns

True if the ring buffer is in a valid state, or false if it was invalidated.

std::string getState(const std::string &prefix) const

Debug function to print the current state of the ring buffer.

Parameters

prefix[in] The string to prefix the state with.

Returns

The current state of the ring buffer.

std::size_t numBuffers() const

Return the maximum number of elements the ring buffer can store.

const std::string &label() const

Label associated with this ring buffer.

class IQueue

Common interface implemented by various queues.

The interface describes the memory requirements of the queue and provides an interface to disconnect the queue from its data source.

Subclassed by model_runtime::InputQueue, model_runtime::OutputQueue

Public Functions

virtual ~IQueue() = default

Default destructor.

virtual const popef::TensorInfo &tensorInfo() const = 0

Get the shape and data type of a tensor.

Each buffer in the queue has the same type and shape.

Returns

The structure encapsulating the shape and data type of a tensor.

virtual int64_t numBuffers() const = 0

Get the number of buffers this queue can store.

virtual void disconnect() = 0

Disconnect the queue ring buffers: no longer wait for data.

All queue ring buffers are invalidated and immediately return from any blocking calls.

Disconnected queues can no longer be used to feed real data.

Typically, disconnect() is used at shutdown to feed dummy data to the executable until it returns from its run() method and can be safely destroyed.

RingMemoryPool model_runtime::allocateQueueStorage(const IQueue &queue, int64_t extra_buffers = 0)

Allocate a memory pool large enough to back the given queue.

Parameters
  • queue[in] The queue the memory pool will be used to feed.

  • extra_buffers[in] The number of extra buffers to allocate in addition to the queue’s requirements.

Returns

A memory pool.
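
For illustration, a one-line sketch assuming an InputQueue reference named queue obtained from a QueueManager (see below):

// Allocate one spare buffer on top of the queue's own requirements.
model_runtime::RingMemoryPool pool =
    model_runtime::allocateQueueStorage(queue, /*extra_buffers=*/1);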

using model_runtime::ReadStartCallback = std::function<void(void)>

Signature of the function called just before the first chunk of data is about to be transferred to Poplar to build a complete input for the executable.

using model_runtime::ReadCompleteCallback = ReadStartCallback

Signature of the function called when the data for a complete model input is about to be consumed by the executable.

using model_runtime::WriteCompleteCallback = std::function<void(void)>

Signature of the function called after the data has been written.

struct InputData

Structure representing queue input data.

Public Members

const uint8_t *data = {nullptr}

Pointer to the buffer containing the data to read.

int64_t data_size = {0}

Size in bytes of the data.

ReadStartCallback readStartCallback = {nullptr}

Optional function to call just before the first chunk of data is about to be fetched to build a complete model input.

Note

The callback might be called more than once if the data is prefetched, then discarded and fetched again.

ReadCompleteCallback readCompleteCallback = {nullptr}

Optional function to call when the data for a complete model input is about to be consumed by the executable.

Note

The callback might be called more than once if the data is prefetched, then discarded and fetched again.

struct OutputData

Structure representing queue output data.

Public Members

uint8_t *data = {nullptr}

Where to write the output.

int64_t data_size = {0}

Amount of data to write in bytes.

WriteCompleteCallback writeCompleteCallback = {nullptr}

Optional function to call after the data has been written.

Note

The callback might be called more than once if the data is prefetched then discarded and fetched again.

using model_runtime::InputRingBuffer = model_runtime::SpscRingBuffer<InputData>

Fixed size, single producer, single consumer ring buffer for input data.

using model_runtime::OutputRingBuffer = model_runtime::SpscRingBuffer<OutputData>

Fixed size, single producer, single consumer ring buffer for output data.

class InputQueue : public model_runtime::IQueue

Pack or split the data that you pass to match the amount of data expected by the executable.

For example, if the model was compiled to process an input tensor of size [256, 48, 48], which means samples of 48x48 across a batch size of 256, then the application can enqueue 48x48 tensors of any batch size. This queue will take care of either regrouping the inputs into a single run or splitting them across several runs.

Note

It is your responsibility to ensure the data size enqueued is a multiple of a single sample size.

Note

InputQueue cannot be instantiated directly; it is created by QueueManager.

Public Functions

InputQueue(std::shared_ptr<InputRingBuffer> buffer, const popef::TensorInfo &info)

Create an input queue.

Parameters
  • buffer[in] The target ring buffer connected to a Poplar callback.

  • info[in] The buffer description.

InputData *enqueueLock()

Lock a buffer for writing.

This is a blocking call and only one buffer can be locked for writing at any time. Calling enqueueLock() when a buffer is already locked will return the same buffer.

Returns

A pointer to an InputData object to fill.

void enqueueComplete()

Unlock the current write-locked buffer.

Pre

A buffer must have been locked for writing using enqueueLock().

Post

The next time enqueueLock() is called, a different buffer will be returned.

void enqueue(const void *input, int64_t data_size, ReadStartCallback read_start_callback = nullptr, ReadCompleteCallback read_complete_callback = nullptr)

Convenience method to lock a buffer for writing, fill InputData and unlock the buffer.

Note

The callback might be called more than once if the data is prefetched, then discarded and fetched again.

Parameters
  • input[in] The address of the buffer containing the data. The data is not copied to a buffer, the pointer must remain valid until the data has been used by the ring buffer consumer.

  • data_size[in] The number of bytes to use from the input. Must be a multiple of single sample size.

  • read_start_callback[in] An optional callback to call when the data starts being read.

  • read_complete_callback[in] An optional callback to call when the data read is complete.
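
For illustration, a sketch of enqueueing one application batch through an InputQueue reference named in, obtained from a QueueManager (described later in this section); the buffer size is a placeholder:

// `in` is an InputQueue obtained from a QueueManager (assumed in scope).
std::vector<float> batch(48 * 48 * 64);   // 64 samples of 48x48, filled by the application
in.enqueue(batch.data(), batch.size() * sizeof(float),
           /*read_start_callback=*/nullptr,
           /*read_complete_callback=*/[] {
             // A complete model input containing this data is about to be consumed.
           });
// `batch` must stay alive until the data has been consumed by the ring buffer.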

void flush()

Flush any partial input still being created or enqueue a dummy batch.

void reset()

Reset the underlying ring buffer.

virtual void disconnect() override

Parent interface method.

virtual const popef::TensorInfo &tensorInfo() const override

Parent interface method.

virtual int64_t numBuffers() const override

Parent interface method.

class OutputQueue : public model_runtime::IQueue

Pack or split the data returned by the executable to the application’s batches.

See also

InputQueue

Note

It is your responsibility to ensure the data size enqueued is a multiple of the output sample size.

Note

The batch size used in OutputQueue must match the one enqueued to InputQueue.

Note

OutputQueue cannot be instantiated directly; it is created by QueueManager.

Public Functions

OutputQueue(std::shared_ptr<OutputRingBuffer> buffer, const popef::TensorInfo &info)

Create an output queue.

Parameters
  • buffer[in] The source ring buffer connected to a Poplar callback.

  • info[in] The buffer description.

OutputData *enqueueLock()

Blocking call to lock a buffer for reading.

Only one buffer can be locked for reading at any time. Calling enqueueLock() when a buffer is already locked will return the same buffer.

Note

This is a queue of pointers or addresses (not data). The content of the buffer will be overwritten after readComplete() is called.

Returns

A pointer to the buffer containing the data to be read.

void enqueueComplete()

Unlock the current read-locked buffer.

Pre

A buffer must have been locked for reading using enqueueLock().

Post

The next time enqueueLock() is called, a different buffer will be returned.

void enqueue(void *output, int64_t data_size, WriteCompleteCallback write_complete_callback = nullptr)

Convenience method to lock a buffer for reading, fill OutputData and unlock the buffer.

Parameters
  • output[in] The address of the buffer that will receive the data. The pointer must remain valid until the data has been copied into it by the ring buffer producer.

  • data_size[in] The number of bytes to copy to the output. Must be a multiple of the single sample size.

  • write_complete_callback[in] An optional callback to call when the output has been filled.
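
The sketch below mirrors the input-side example. It assumes an existing QueueManager and a hypothetical stream name “output_0”; the host buffer is assumed to be sized as a multiple of the single sample size.

    #include <cstdint>
    #include <vector>

    // Sketch only: "output_0" is a hypothetical stream name.
    void collectOneBatch(model_runtime::QueueManager &qm,
                         std::vector<char> &host_out) // size: multiple of the single sample size
    {
        auto &out = qm.outputQueue("output_0");

        // host_out must stay alive until the ring buffer producer has copied
        // the results into it. The optional WriteCompleteCallback keeps its
        // nullptr default.
        out.enqueue(host_out.data(), static_cast<int64_t>(host_out.size()));
    }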

void flush()

Flush any partial output still being created or, if there isn’t any, enqueue a handler for the dummy output produced by the dummy input enqueued by the corresponding InputQueue flush() method.

void reset()

Reset the underlying ring buffer.

virtual void disconnect() override

Parent interface method.

virtual const popef::TensorInfo &tensorInfo() const override

Parent interface method.

virtual int64_t numBuffers() const override

Parent interface method.

using model_runtime::RingSizeMultiplierProdType = std::function<int64_t(const popef::Anchor&)>

A producer of ring buffer size multipliers.

A factory function type used to produce a ring size multiplier for a specific popef::Anchor. You can define this kind of functor to control the size of the QueueManager ring buffers (see the SpscRingBuffer class). For example, for tensors that the program loads only once (such as weights or other tensors fetched from the host in the “load” program), the ring buffer size multiplier can be set to 1, as there is no need to prefetch more values from the user.
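
As a minimal sketch, the functor below requests a ring size multiplier of 1 for every anchor; the factories in the ring_size_multiplier_factory namespace (documented later in this section) are usually more convenient.

    #include <cstdint>

    // Sketch: a constant multiplier of 1 for every anchor.
    const model_runtime::RingSizeMultiplierProdType single_buffer_per_anchor =
        [](const popef::Anchor & /*anchor*/) -> int64_t { return 1; };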

class QueueManager

Create and manage queues related to a session.

Note

Queues currently only work for models with replica == 1.

Public Functions

QueueManager(const QueueManager&) = delete
QueueManager(QueueManager&&) = delete
QueueManager &operator=(const QueueManager &other) = delete
QueueManager &operator=(QueueManager&&) = delete
~QueueManager() = default

Default destructor.

InputQueue &inputQueue(const std::string &name)

Get the input queue of the named tensor or stream.

Parameters

name[in] The name of the tensor or stream.

OutputQueue &outputQueue(const std::string &name)

Get the output queue of the named tensor or stream.

Parameters

name[in] The name of the tensor or stream.

void flushAll()

Call flush() on all the queues.

void disconnectAll()

Disconnect all the queues from their ring buffers.

void resetAll()

Reset all the queues after the session has stopped.

Public Members

std::map<std::string, InputQueue> inputs

Map the user input streams to their input queues.

std::map<std::string, OutputQueue> outputs

Map the user output streams to their output queues.
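
A short sketch of working with a QueueManager follows. It assumes the object was obtained from Session::createQueueManager() (described with the Session class below) and only lists the stream names before flushing.

    #include <iostream>

    // Sketch: list the managed streams, then flush every queue.
    void listStreamsAndFlush(model_runtime::QueueManager &qm)
    {
        for (const auto &named_queue : qm.inputs) {
            std::cout << "input stream:  " << named_queue.first << '\n';
        }
        for (const auto &named_queue : qm.outputs) {
            std::cout << "output stream: " << named_queue.first << '\n';
        }
        qm.flushAll(); // calls flush() on every input and output queue
    }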

namespace ring_size_multiplier_factory

Functions

RingSizeMultiplierProdType ringSizeMultForProgs(const std::vector<popef::ProgramFlow::ProgramIndexType> &programs, int64_t selected_ring_size_multiplier, int64_t others_ring_size_multiplier = 1)

Factory function returning the selected ring buffer size multiplier for anchors “owned” by the given set of programs (programs that fetch Anchor data from the host).

For the remaining anchors, others_ring_size_multiplier is returned. This function can be passed to the QueueManager constructor to control the sizes of its internal ring buffers created for model anchors.

Parameters
  • programs[in] The programs “owning” the anchors of interest.

  • selected_ring_size_multiplier[in] The ring buffer size multiplier for anchors owned by the programs.

  • others_ring_size_multiplier[in] The ring buffer size multiplier for remaining anchors. Note: It is preferred to set the ring buffer multiplier to 1 for anchors from the “load” or “save” ProgramFlow to reduce memory.

RingSizeMultiplierProdType ringSizeMultForMainProgs(const popef::Model &model, int64_t main_ring_size_multiplier, int64_t load_save_ring_size_multiplier = 1)

Factory function returning the selected ring buffer size multiplier for anchors “owned” by the main programs (programs of the main ProgramFlow that fetch Anchor data from the host).

For the remaining anchors (“owned” by the “load” or “save” ProgramFlow), load_save_ring_size_multiplier is returned. This function can be passed to the QueueManager constructor to control the sizes of its internal ring buffers created for model Anchor objects.

Note

It is preferred to set the ring buffer multiplier to 1 for anchors from the “load” or “save” ProgramFlow to reduce memory.

Parameters
  • model[in] A combination of PopEF blobs representing a single model.

  • main_ring_size_multiplier[in] The ring buffer size multiplier for anchors owned by the main programs.

  • load_save_ring_size_multiplier[in] The ring buffer size multiplier for remaining anchors.
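
A sketch of how this factory might be used is shown below. It assumes the namespace is nested inside model_runtime and that the caller already holds a loaded popef::Model; the multiplier values 4 and 1 are arbitrary illustrations.

    model_runtime::RingSizeMultiplierProdType makeMultiplier(const popef::Model &model)
    {
        // Keep 4 ring buffer slots per anchor for the main programs and only 1
        // for anchors owned by the "load"/"save" programs (values are illustrative).
        return model_runtime::ring_size_multiplier_factory::ringSizeMultForMainProgs(
            model, /*main_ring_size_multiplier=*/4, /*load_save_ring_size_multiplier=*/1);
    }

The returned functor can then be passed to the QueueManager constructor to size its internal ring buffers.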

10.2.4. Runtime management

enum class model_runtime::LaunchPolicy

Session creation policy.

Values:

enumerator Immediate

Acquire a device and load the executable during Session construction.

enumerator Deferred

Acquire a device outside the Session object and load the executable inside the bindToDevice() method.

struct SessionConfig

Session configuration.

Public Members

LaunchPolicy policy = LaunchPolicy::Deferred

Session creation policy that is associated with acquiring a device.

Default: LaunchPolicy::Deferred.

PopefDataUsagePredicate pred_tensor_data = null_popef_data_usage_predicate

Predicate for anchor callback.

This controls user callback handling for the anchor. It is not used by default.

bool check_package_hash = true

If true, the Poplar hash will be checked before the executable is loaded onto the device.

If false, this check is not done. Default: true.

DeviceWaitConfig wait_config = {}

By default, Session throws an exception when it is not able to attach to a device required by the given model.

This behaviour can be changed by setting a custom device wait configuration.

std::string max_look_ahead = "unlimited"

Limits the number of host synchronisation points the runtime library is allowed to prepare in advance.

This helps prevent the IPU from being idle. The string value “unlimited” (the default) removes this restriction completely.

Possible values:

  • “unlimited”: the runtime library will decide the value.

  • “x”: where x is an unsigned integer value.
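
The sketch below fills in a SessionConfig with non-default values, assuming these types live in the model_runtime namespace; the 60-second wait and the look-ahead limit of 2 are arbitrary illustrations.

    #include <chrono>

    model_runtime::SessionConfig makeConfig()
    {
        model_runtime::SessionConfig config;
        config.policy             = model_runtime::LaunchPolicy::Deferred;
        config.check_package_hash = true;
        // Wait up to 60 seconds for a device instead of throwing immediately.
        config.wait_config        = model_runtime::DeviceWaitConfig{std::chrono::seconds{60}};
        // Allow at most 2 host synchronisation points to be prepared in advance.
        config.max_look_ahead     = "2";
        return config;
    }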

class Session

Link a model to a device.

Note

If two or more sessions share a device, runLoadPrograms() and runSavePrograms() will be called implicitly when the device is bound to or unbound from this session.

Public Types

using ProgIdxType = popef::ProgramFlow::ProgramIndexType

The index for a runnable program available in the executable.

using ProgramsAndAnchorsMap = std::map<ProgIdxType, std::vector<const popef::Anchor*>>

Mapping between a program index and the vector of popef::Anchor objects that appear in that program and are available in the executable.

Public Functions

Session(const Session&) = delete
Session &operator=(const Session &other) = delete
Session(Session&&) = default

Default move constructor.

Session &operator=(Session&&) = default

Default move assignment operator.

explicit Session(const std::vector<std::string> &popef_paths, const SessionConfig &config = {})

Create a new Session object.

Parameters
  • popef_paths – The paths to PopEF files from which the model will be loaded.

  • config – The session configuration.

explicit Session(std::shared_ptr<popef::Model> model, const SessionConfig &config = {})

Create a Session object.

Parameters
  • model – The model which will be loaded and executed on the IPU.

  • config – The session configuration.

~Session()

Default destructor.

void bindToDevice(std::shared_ptr<Device> device)

Bind the session to a device and load the executable onto it.

If the session is already bound to a device, this method first unbinds the current device before binding to the new device.

Parameters

device[in] The wrapper around a Poplar device.
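
A sketch of the deferred-launch flow is shown below. The PopEF path is a placeholder, and the std::shared_ptr<Device> is assumed to have been acquired through the device-management API (Section 10.1.1).

    #include <memory>
    #include <string>
    #include <vector>

    void loadModel(std::shared_ptr<model_runtime::Device> device)
    {
        model_runtime::SessionConfig config;
        config.policy = model_runtime::LaunchPolicy::Deferred;

        // "my_model.popef" is a placeholder path.
        model_runtime::Session session(std::vector<std::string>{"my_model.popef"}, config);

        // With LaunchPolicy::Deferred the executable is loaded here rather
        // than in the Session constructor.
        session.bindToDevice(std::move(device));
    }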

void runLoadPrograms()

Run the programs to copy the data to the device.

Note

This method is implicitly called before the first call to runMainPrograms() after the device has been bound to this session.

Pre

The session must be bound to a device.

void runMainPrograms()

Run the main programs.

Note

If the device was last used by a different session, this method first unbinds the device from that session, then binds it to this session and calls runLoadPrograms() before actually running the main programs.

Pre

The session must be bound to a device.

void runSavePrograms()

Run the programs to copy the data back to the host.

Note

This method is implicitly called when the device bound to this session gets unbound.

Pre

The session must be bound to a device.
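
Taken together, the three methods above form the typical execution flow once a device is bound, as in this minimal sketch.

    void runOnce(model_runtime::Session &session)
    {
        session.runLoadPrograms();  // copy initial data (for example, weights) to the device
        session.runMainPrograms();  // execute the main programs
        session.runSavePrograms();  // copy data back to the host
    }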

void runPrograms(const std::vector<ProgIdxType> &progs)

Run your own set of programs.

Each program will run once.

If you run programs from the main set without having previously run the programs that load the data, you might get incorrect results. The same applies if you run programs that save data before running the load and main programs.

Therefore, the order of the programs in the vector is important.

Note

This function is for advanced users who understand what the programs do during execution and what results they produce. Programs are run in sequence, in the order in which they appear in the vector.

Parameters

progs[in] The set of program indices which you would like to run. Indices need to be present in the loaded popef::Model.

Pre

The session must be bound to a device.
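
A minimal sketch follows; the program indices are hypothetical and must correspond to programs present in the loaded popef::Model.

    #include <vector>

    void runCustomPrograms(model_runtime::Session &session)
    {
        // Hypothetical indices; programs run in the order given.
        const std::vector<model_runtime::Session::ProgIdxType> progs = {0, 2, 3};
        session.runPrograms(progs);
    }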

std::shared_ptr<popef::Model> model() const

Model associated with this session.

void unloadFromDevice()

Unload the session from the device it is currently bound to.

Pre

The session must be bound to a device.

void setCallbackForAnchor(const std::string &anchor_handle, CallbackHandle callback)

Set the callback (data source or destination buffer and a way of managing it) for a popef::Anchor (an input or output tensor).

Parameters
  • anchor_handle[in] The anchor handle to which the callback will be assigned. Each popef::Anchor has a unique handle.

  • callback[in] The callback to be called whenever the stream is about to be read, or has been written to, by the device. Which of these applies depends on whether the callback is assigned to an input tensor or an output tensor.

Pre

The session must be bound to a device.

void setUserOutputHandler(CallbackFactory factory, const AnchorCallbackPredicate &anchor_callback_predicate = null_anchor_callback_predicate, bool skip_connected = false)

Set up handlers for output tensors.

If the factory returns nullptr for a tensor then the existing callback remains in place.

If the factory returns a callback for a tensor which already had a callback associated with it then the existing callback is discarded and the new one is used instead.

Parameters
  • factory[in] The factory that will be called once per output tensor.

  • anchor_callback_predicate[in] The functor controlling user callback binding.

  • skip_connected[in] If true, call the factory only for the streams which are not connected. If false, call the factory for all the streams that are required.

Pre

The session must be bound to a device.

void setUserInputHandler(CallbackFactory factory, const AnchorCallbackPredicate &anchor_callback_predicate = null_anchor_callback_predicate, bool skip_connected = false)

Set up handlers for input tensors.

If the factory returns nullptr for a tensor then the existing callback remains in place.

If the factory returns a callback for a tensor which already had a callback associated with it then the existing callback is discarded and the new one is used instead.

Parameters
  • factory[in] The factory that will be called once per input tensor.

  • anchor_callback_predicate[in] The functor controlling user callback binding.

  • skip_connected[in] If true, call the factory only for the streams which are not connected. If false, call the factory for all the streams that are required.

Pre

The session must be bound to a device.

ProgramsAndAnchorsMap anchorsNotConnectedToCallbacks(const std::vector<ProgIdxType> &progs)

Returns the anchors that are not connected to callbacks for the given programs.

Parameters

progs[in] The list of program indices.

Returns

A map in which the key is the program index and the value is the vector of anchors that have no linked callbacks for that program.
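
A sketch of inspecting the returned map is shown below; only the per-program counts are printed, and the program index is assumed to be a printable integer type.

    #include <iostream>
    #include <vector>

    void reportUnconnectedAnchors(model_runtime::Session &session,
                                  const std::vector<model_runtime::Session::ProgIdxType> &progs)
    {
        const auto unconnected = session.anchorsNotConnectedToCallbacks(progs);
        for (const auto &entry : unconnected) {
            std::cout << "program " << entry.first << ": "
                      << entry.second.size() << " anchor(s) without a callback\n";
        }
    }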

void errorIfAnchorsAreNotConnectedToCallbacks(const std::vector<ProgIdxType> &progs)

Check that all of the given programs have all their required anchors connected to callbacks.

If any program has anchors that are not connected to callbacks, this method throws an error that lists those programs in the error message.

Parameters

progs[in] List of program indices.

void stop()

Stop the running session.

Send the stop signal to the executable and disconnect the queues if a QueueManager is bound. The device will be left in an undefined state and no more programs can be run until reload() is called.

void reload()

Load the executable onto the bound device again.

bool isStopped()

Return true if the executable has stopped.
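
A minimal sketch combining stop(), isStopped() and reload():

    void restart(model_runtime::Session &session)
    {
        session.stop();    // stop the executable; bound queues are disconnected
        if (session.isStopped()) {
            session.reload();  // load the executable again on the bound device
        }
    }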

template<typename ...T>
inline QueueManager *createQueueManager(T&&... args)

Create QueueManager.

The Session takes full ownership of the created QueueManager object; its lifetime is strictly tied to the Session lifetime.

Arguments should be passed in the same order as in the QueueManager constructors. See the model_runtime::QueueManager class.

Returns

A pointer to the created QueueManager (owned by the Session).

std::vector<const popef::Anchor*> getInputAnchors() const

Returns all inputs.

This includes inputs that need a user-defined callback and inputs that already have a callback defined based on data from the popef::Model.

Returns

A vector of pointers to popef::Anchor objects.

std::vector<const popef::Anchor*> getUserInputAnchors() const

Returns the user inputs, which are the inputs that need a user-defined callback.

Returns

A vector of pointers to popef::Anchor objects.

std::vector<const popef::Anchor*> getUserOutputAnchors() const

Returns the user outputs, which are the outputs that need a user-defined callback.

Returns

A vector of pointers to popef::Anchor objects.
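
A minimal sketch that summarises the anchors returned by the three methods above:

    #include <iostream>

    void summariseAnchors(const model_runtime::Session &session)
    {
        std::cout << "all inputs:   " << session.getInputAnchors().size() << '\n'
                  << "user inputs:  " << session.getUserInputAnchors().size() << '\n'
                  << "user outputs: " << session.getUserOutputAnchors().size() << '\n';
    }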