10. C++ API reference

10.1. High-level API

10.1.1. Device management

enum class model_runtime::DeviceWaitStrategy

Defines the different options for waiting for a device to become available.

Values:

enumerator NO_WAIT

An exception will be thrown if no IPU device is immediately available.

enumerator WAIT_WITH_TIMEOUT

The device manager will wait for a specified amount of time for an IPU device to become available.

The device manager will try to attach to the required device at a specified interval.

enumerator WAIT_FOREVER

The device manager will wait until an IPU device is available.

The device manager will try to attach to the required device at a specified interval.

struct DeviceWaitConfig

The configuration of how to wait for the expected device.

Public Functions

constexpr DeviceWaitConfig() = default

Constructor with default values.

inline constexpr DeviceWaitConfig(DeviceWaitStrategy p_strategy, std::chrono::seconds p_timeout = std::chrono::seconds{0}, std::chrono::seconds p_sleepTime = std::chrono::seconds{15})

Constructor with specified device waiting strategy.

Parameters
  • p_strategy[in] The device waiting strategy.

  • p_timeout[in] The time in seconds to wait for a device. Default: 0.

  • p_sleepTime[in] The time in seconds between attach attempts. Default: 15.

inline explicit constexpr DeviceWaitConfig(std::chrono::seconds p_timeout, std::chrono::seconds p_sleepTime = std::chrono::seconds{15})

Constructor with device waiting strategy DeviceWaitStrategy::WAIT_WITH_TIMEOUT.

This means that the device manager will wait for a finite amount of time, p_timeout, for a device to become available.

Parameters
  • p_timeout[in] The time in seconds to wait for a device.

  • p_sleepTime[in] The time in seconds between attach attempts.

Public Members

DeviceWaitStrategy strategy = {DeviceWaitStrategy::NO_WAIT}

The device waiting strategy.

std::chrono::seconds timeout = {0}

The time in seconds to wait for if no device is currently available.

std::chrono::seconds sleepTime = {15}

The time in seconds between attach attempts.
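
For illustration, a minimal sketch of building a wait configuration (the timeout and retry values are arbitrary examples; the relevant Model Runtime header is assumed to be on the include path):

#include <chrono>

// Wait up to 5 minutes for an IPU, retrying the attach every 10 seconds.
model_runtime::DeviceWaitConfig wait_config{
    model_runtime::DeviceWaitStrategy::WAIT_WITH_TIMEOUT,
    /*p_timeout=*/std::chrono::seconds{300},
    /*p_sleepTime=*/std::chrono::seconds{10}};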

struct DeviceConstraints

Requirements for the device that the user wants to connect to.

Public Functions

constexpr DeviceConstraints() = default

Constructor with default values.

inline explicit constexpr DeviceConstraints(bool p_requiresRemoteBuffersSupport)

Constructor with specified values.

Parameters

p_requiresRemoteBuffersSupport[in] If true, the device has to support remote buffers. If false, the device does not have to support remote buffers. Default: false.

inline constexpr operator bool() const

Check whether the selection of the device has any constraints.

If true, the device has constraints; if false, it does not.

Public Members

bool requiresRemoteBuffersSupport = false

Specify that the device has to support remote buffers.

If true, the device has to support remote buffers. If false, the device does not have to support remote buffers. Default: false.

class Device

Create a device.

This is a wrapper around a Poplar device.

Public Functions

Device(const Device&) = delete
Device &operator=(const Device &other) = delete
Device(Device&&) = default

Default move constructor.

Device &operator=(Device&&) = default

Default move assignment operator.

Device(poplar::Device device, int64_t ipu_version)

Constructor with specified values.

Parameters
  • device[in] The Poplar device. This is a device that can execute code.

  • ipu_version[in] The architecture version of the IPU.

poplar::Device &device()

Get the underlying Poplar device.

const poplar::Device &device() const

Get the underlying Poplar device.

int64_t ipuVersion() const

Get the architecture version of the device.

Returns

The architecture version of the IPU or -1 if unknown.

Protected Functions

bool isActiveSession(const Session *session) const

Get whether a session is the active session for this device.

Returns

True if the session is the active session for this device, false otherwise.

void bindToSession(Session *session)

Bind the device to a session.

If the device is bound to a different session then unbindSession() is called on that session first.

Parameters

session[in] The new session to bind the device to.

void unbindSession()

Unbind the device from the active session.

Friends

friend class Session

class DeviceManager

Select which device to run on.

Public Functions

DeviceManager &operator=(const DeviceManager &other) = delete
DeviceManager &operator=(DeviceManager&&) = delete
DeviceManager()

Constructor with default values.

DeviceManager(const DeviceManager&) = default

Default copy constructor.

DeviceManager(DeviceManager&&) = default

Default move constructor.

int64_t ipuHardwareVersion()

Get the architecture version of the IPU hardware available in the system.

Returns

Either:

  • The architecture version of the IPU in the system. For example, 2 for the Mk2 IPU (GC200 or Bow) or 21 for the Mk2 IPU with FP8 support (C600).

  • -1 if there is an IPU but the architecture is unknown.

  • 0 if there is no IPU in the system.

std::shared_ptr<Device> getDevice(std::shared_ptr<popef::Model> model, const DeviceWaitConfig &wait_config = {})

Get a device matching the configuration needed by the given model.

Parameters
  • model[in] A PopEF model. This method gets the number of IPUs and the option flags from the model.

  • wait_config[in] The configuration of how to wait for the requested device.

Returns

The device, if attachment is successful, otherwise throws an error.

std::shared_ptr<Device> tryGetDevice(std::shared_ptr<popef::Model> model, const DeviceWaitConfig &wait_config = {})

Try to get a device matching the configuration needed by the given model.

Parameters
  • model[in] A PopEF model. This method gets the number of IPUs and the option flags from the model.

  • wait_config[in] The configuration of how to wait for the requested device.

Returns

The device, if attachment is successful, otherwise a nullptr.

std::shared_ptr<Device> getDevice(int64_t num_ipus, const poplar::OptionFlags &device_options = {}, const DeviceWaitConfig &wait_config = {}, const DeviceConstraints &constrains = {})

Get a device matching the requested configuration.

Parameters
  • num_ipus[in] The number of IPUs the device must contain.

  • device_options[in] The device options.

  • wait_config[in] The configuration of how to wait for the expected device.

  • constrains[in] The set of constraints that the device must meet.

Returns

The device, if attachment is successful, otherwise throws an error.

std::shared_ptr<Device> tryGetDevice(int64_t num_ipus, const poplar::OptionFlags &device_options = {}, const DeviceWaitConfig &wait_config = {}, const DeviceConstraints &constrains = {})

Try to get a device matching the requested configuration.

Parameters
  • num_ipus[in] The number of IPUs the device must contain.

  • device_options[in] The device options.

  • wait_config[in] The configuration of how to wait for the expected device.

  • constrains[in] The set of constraints that the device must meet.

Returns

The device, if attachment is successful, otherwise a nullptr.

std::shared_ptr<Device> getSpecificDevice(int64_t device_id, std::shared_ptr<popef::Model> model, const DeviceWaitConfig &wait_config = {})

Get a specific device matching the requested configuration.

Parameters
  • device_id[in] The ID of the device to acquire.

  • model[in] A PopEF model. This method gets the option flags from the model.

  • wait_config[in] The configuration of how to wait for the expected device.

Returns

The device, if attachment is successful, otherwise throws an error.

std::shared_ptr<Device> getSpecificDevice(int64_t device_id, const poplar::OptionFlags &device_options = {}, const DeviceWaitConfig &wait_config = {})

Get a specific device matching the requested configuration.

Parameters
  • device_id[in] The ID of the device to acquire.

  • device_options[in] The device options.

  • wait_config[in] The configuration of how to wait for the expected device.

Returns

The device, if attachment is successful, otherwise throws an error.

std::shared_ptr<Device> tryGetSpecificDevice(int64_t device_id, std::shared_ptr<popef::Model> model, const DeviceWaitConfig &wait_config = {})

Try to get a specific device matching the requested configuration.

Parameters
  • device_id[in] The ID of the device to acquire.

  • model[in] A PopEF model. This method gets the option flags from the model.

  • wait_config[in] The configuration of how to wait for the expected device.

Returns

The device, if attachment is successful, otherwise a nullptr.

std::shared_ptr<Device> tryGetSpecificDevice(int64_t device_id, const poplar::OptionFlags &device_options = {}, const DeviceWaitConfig &wait_config = {})

Try to get a specific device matching the requested configuration.

Parameters
  • device_id[in] The ID of the device to acquire.

  • device_options[in] The device options.

  • wait_config[in] The configuration of how to wait for the expected device.

Returns

The device, if attachment is successful, otherwise a nullptr.

std::shared_ptr<Device> createIpuModelDevice(int64_t num_ipus, int64_t ipu_version = 2, int64_t tiles_per_ipu = 0)

Create a model of the device.

Parameters
  • num_ipus[in] The number of IPUs the device must contain.

  • ipu_version[in] The target architecture version of the IPU.

  • tiles_per_ipu[in] The number of tiles per IPU the model will have. If 0, defaults to the number of tiles for the IPU architecture.

Returns

The model of the device.

std::shared_ptr<Device> createIpuModelDevice(std::shared_ptr<popef::Model> model, int64_t tiles_per_ipu = 0)

Create a model of the device.

Parameters
  • model[in] A PopEF model. This method gets the number of IPUs and the IPU architecture version from the model.

  • tiles_per_ipu[in] The number of tiles per IPU the model will have. If 0, defaults to the number of tiles for the chosen IPU architecture.

Returns

The model of the device.

std::shared_ptr<Device> createSmallIpuModelDevice(int64_t num_ipus, int64_t ipu_version = 2)

Create a small IPU model of the device.

Small IPU models only have 4 tiles: they are much quicker to create and run than a full size model but can only run small programs.

Parameters
  • num_ipus[in] The number of IPUs the device must contain.

  • ipu_version[in] The target architecture version of the IPU.

Returns

A device with the model that was created.

std::shared_ptr<Device> createSmallIpuModelDevice(std::shared_ptr<popef::Model> model)

Create a small IPU model of the device.

Small IPU models only have 4 tiles: they are much quicker to create and run than a full size model but can only run small programs.

Parameters

model[in] A PopEF model.

Returns

A device with the model attached.
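
For illustration, a minimal sketch of acquiring a two-IPU device that supports remote buffers (the IPU count, error handling and the helper function name acquireDevice are hypothetical; includes are omitted):

std::shared_ptr<model_runtime::Device> acquireDevice() {
  model_runtime::DeviceManager manager;

  // ipuHardwareVersion() returns 0 when no IPU is present in the system.
  if (manager.ipuHardwareVersion() == 0) {
    return nullptr;
  }

  model_runtime::DeviceWaitConfig wait_config{
      model_runtime::DeviceWaitStrategy::WAIT_FOREVER};
  model_runtime::DeviceConstraints constraints{
      /*p_requiresRemoteBuffersSupport=*/true};

  // tryGetDevice() returns nullptr instead of throwing if attachment fails.
  return manager.tryGetDevice(/*num_ipus=*/2, /*device_options=*/{},
                              wait_config, constraints);
}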

10.1.2. Tensor memory representation

struct TensorMemoryView

Mutable view to already allocated memory.

Public Functions

TensorMemoryView() = default

Default constructor.

TensorMemoryView(void *data, uint64_t data_size_bytes)

Mutable view to memory.

Parameters
  • data[in] The pointer to the allocated memory.

  • data_size_bytes[in] The size of the memory block, in bytes.

Public Members

void *data = nullptr

The pointer to the allocated memory.

uint64_t data_size_bytes = 0

The size of the memory block, in bytes.

struct ConstTensorMemoryView

Immutable view to already allocated memory.

Public Functions

ConstTensorMemoryView() = default

Default constructor.

ConstTensorMemoryView(const TensorMemoryView &other)

Converting constructor from a mutable TensorMemoryView.

ConstTensorMemoryView(const void *data, uint64_t data_size_bytes)

Immutable view to const memory.

Parameters
  • data[in] The pointer to the allocated memory.

  • data_size_bytes[in] The size of the memory block, in bytes.

Public Members

const void *data = nullptr

Pointer to the allocated memory.

uint64_t data_size_bytes = 0

The size of the memory block, in bytes.

struct TensorMemory

Tensor memory manager responsible for allocating, storing, sharing and releasing tensor memory.

Public Functions

TensorMemory() = default

Default constructor.

TensorMemory(int64_t data_size_bytes)

Allocate the user-requested memory block and store it in a shared_ptr.

Parameters

data_size_bytes[in] The size of the memory block, in bytes.

TensorMemoryView getView()

Get the mutable memory view.

ConstTensorMemoryView getConstView()

Get the immutable memory view.

ConstTensorMemoryView getView() const

Get the immutable memory view.

Public Members

uint64_t data_size_bytes = 0

The size of the memory block, in bytes.

std::shared_ptr<void> data = nullptr

Pointer to the allocated memory.
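
For illustration, a short sketch of allocating a tensor buffer and taking mutable and immutable views of it (the 1024-byte size is arbitrary; <cstring> is needed for std::memset):

model_runtime::TensorMemory memory(/*data_size_bytes=*/1024);

// Fill the buffer through a mutable view ...
model_runtime::TensorMemoryView view = memory.getView();
std::memset(view.data, 0, view.data_size_bytes);

// ... and hand out a read-only alias of the same shared allocation.
model_runtime::ConstTensorMemoryView const_view = memory.getConstView();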

10.1.3. ModelRunner

using model_runtime::InputMemoryView = std::unordered_map<std::string, ConstTensorMemoryView>

Mapping between a tensor name and an immutable memory view.

Used as input to ModelRunner::execute.

using model_runtime::OutputMemoryView = std::unordered_map<std::string, TensorMemoryView>

Mapping between a tensor name and a memory view.

Used as output from ModelRunner::execute, when the output memory is allocated and managed by the ModelRunner client.

using model_runtime::OutputMemory = std::unordered_map<std::string, TensorMemory>

Mapping between a tensor name and memory.

Used as output from ModelRunner::execute when the output memory is allocated during execution by the library.

using model_runtime::OutputFutureMemoryView = std::unordered_map<std::string, std::shared_future<TensorMemoryView>>

Mapping between a tensor name and a future memory view.

Used as output from the asynchronous ModelRunner::execute when the output memory is allocated and managed by the ModelRunner client.

using model_runtime::OutputFutureMemory = std::unordered_map<std::string, std::shared_future<TensorMemory>>

Mapping between a tensor name and a future memory.

Used as output from the asynchronous ModelRunner::execute when the output memory is allocated during execution by the library.

struct DataDesc

The description of the data used by ModelRunner.

Public Functions

DataDesc(std::string name, int64_t size_in_bytes, std::vector<int64_t> shape, popef::DataType data_type, bool popef_contains_tensor_data = false)

Create a description of input or output data.

Parameters
  • name[in] The name of the input or output tensor.

  • size_in_bytes[in] The size of the tensor measured in bytes.

  • shape[in] A vector defining the shape of the tensor. The size of the vector is equal to the number of tensor dimensions. Each element of the vector indicates the size of the corresponding dimension.

  • data_type[in] The data type of a single tensor element.

  • popef_contains_tensor_data[in] If true, the model has a tensor data blob associated with the tensor. If false, the model does not have a tensor data blob associated with the tensor. Default: false.

Public Members

std::string name

The name of the input or output tensor.

int64_t size_in_bytes

The size of the tensor measured in bytes.

std::vector<int64_t> shape

A vector defining the shape of the tensor.

The size of the vector is equal to the number of tensor dimensions. Each element of the vector indicates the size of the corresponding dimension.

popef::DataType data_type

The data type of a single tensor element.

bool popef_contains_tensor_data

If true, the model has a tensor data blob associated with the tensor.

If false, the model does not have a tensor data blob associated with the tensor.

using model_runtime::InputDesc = DataDesc

Description of input data required by ModelRunner.

using model_runtime::OutputDesc = DataDesc

Description of output data required by ModelRunner.

using model_runtime::ReplicaIdToOutputMemoryView = std::unordered_map<unsigned, OutputMemoryView>

Mapping of replicas to the memory allocated for the output tensors required by ModelRunner.

using model_runtime::ReplicaIdToDevice = std::unordered_map<unsigned, std::shared_ptr<Device>>

Mapping of replicas to physical entities that can execute the IPU programs.

struct ModelRunnerConfig

ModelRunner configuration options.

Public Members

unsigned replication_factor = 1

Number of replicas to be created.

bool run_save_programs = false

If true, “save” programs will be called on ModelRunner instance destruction.

If false, “save” programs will not be called. Default: false.

bool thread_safe = false

If true, the mutex will be locked on each execution call.

If false, the mutex will not be locked. By default the model runner is not thread-safe and each replica has an independent mutex. Default: false.

InputMemoryView frozen_inputs = {}

Map of the user-data required by the model when it is loaded onto hardware as well as any data that will be considered as constant during model execution.

This allows the overwriting of tensor data saved inside a PopEF file.

ReplicaIdToOutputMemoryView replica_to_save_programs_outputs = {}

Mapping between replica ID and OutputMemoryView.

The PopEF data format allows for the creation of “save” programs that will be executed on ModelRunner instance destruction, if required. This option allows you to capture the data returned by these “save” programs.

ReplicaIdToDevice replica_to_device = {}

Mapping between replica ID and Device.

This allows you to set a specific device for a given replica. By default, the model runner assigns devices automatically.

DeviceWaitConfig device_wait_config = {}

By default, the model runner throws an exception when it is not able to attach to any device required by the given model.

This behaviour can be changed by setting a custom DeviceWaitConfig.

bool check_package_hash = true

If true, the Poplar hash will be checked before the executable is loaded onto the device.

If false, this check is not done. Default: true.

std::chrono::nanoseconds timeout_ns = std::chrono::seconds(5)

Duration in nanoseconds before the timeout callback is called when the IPU is waiting for input data which is not available.

If 0, the timeout callback is never called; in other words, the model waits forever for the data.

bool validate_io_params = true

If true, the I/O parameters will be checked when the ModelRunner “execute” functions are run.

If false, this check is not done. Default: true.

unsigned batching_dim = std::numeric_limits<unsigned>::max()

Enables dynamic batch sizing and specifies the dimension of the input and output data that contains the dynamic batch size.

By default, dynamic batch sizing is disabled and ModelRunner can only accept inputs and outputs where the batch size is an integer multiple of the batch size defined in the PopEF model.

Dynamic batch sizing is disabled when batching_dim is set to 0xFFFFFFFF.

To enable dynamic batch sizing, set batching_dim to the index of the dimension that contains the dynamic batch size. This value can be any positive integer less than the number of dimensions of the model’s input and output tensors. ModelRunner will then accept the batch size given by the tensor dimension specified by batching_dim, and the batch size can be any value.

bool auto_reset = false

If true, the IPU will be reset automatically before the next inference when:

  • an application runtime error occurs, or

  • a recoverable error occurs and the RecoveryAction is IPU_RESET.

If false, an exception will be raised and no action will be taken. Default: false.

std::string max_look_ahead = "unlimited"

The number of host synchronization points that Model Runtime is allowed to prepare in advance.

This prevents the IPU from being idle.

Possible values:

  • ”unlimited”: Model Runtime decides the best value. (Default)

  • ”x”: An unsigned integer value.

The default value of “unlimited” usually offers the best performance. If there are deadlocks or other timeouts which are not the result of other errors, set this value to 0.

Possible use case for max_look_ahead set to 0:

  • The model is compiled with prefetch enabled.

  • Device iterations is greater than 1.

  • A single input to the execute function is not enough to run a number of iterations equal to the device iterations.

An alternative to setting max_look_ahead to 0 is to use executeAsync to handle requests. In this case, the application must know when the data will be ready to read.

unsigned ring_buffer_size_multiplier = 2

The multiplier used to determine the size of the ring buffer.

The ring buffer size is given by ring_buffer_size_multiplier * batch size.

bool flush_on_waiting_outputs = false

If true, only flush the dummy data when there is a user request waiting for outputs from the queues; otherwise, wait forever for user-requested data in the callback.

If false, always flush the dummy data. Default: false.

std::chrono::nanoseconds batch_size_timeout_ns = std::chrono::nanoseconds::max()

Duration in nanoseconds before the timeout callback is called when the IPU is waiting for input data to compose a batch to load.

If max(), the callback will never be called, in other words, the model will wait forever for the data.

std::chrono::nanoseconds data_parallel_timeout_ns = std::chrono::nanoseconds::max()

Duration in nanoseconds before the timeout callback is called when the IPU is waiting in parallel for input data.

If max(), the callback will never be called, in other words, the model will wait forever for the data.

bool is_batch_size_timeout_enabled = false

If true, check the batch_size_timeout_ns and data_parallel_timeout_ns in the timeout callback when the IPU is waiting for input data which doesn’t arrive.

size_t request_tracepoints_buffer_size = 1000

Size of the request tracepoint buffer.

If set to 0, the request tracepoint buffer size will be infinite. Note: if flush_on_waiting_outputs is set to false, no request tracepoints will be recorded.

std::function<void(const std::string &tensor_id, const InputMemoryView *&inputs, const OutputMemoryView *&outputs)> flush_callback = nullptr

The callback function used to flush data that is not available.

flush_callback indicates to Model Runtime which data to flush.

If specified, the flush_callback function will be called once when the IPU has timed out after waiting for input data, which is not available.

The application which defines the flush_callback function must ensure that the inputs and outputs it specifies for Model Runtime to flush are correct. Otherwise the results of this and following inferences will be corrupted.

Param tensor_id

[in] The ID of the tensor that Model Runtime was waiting for when the timeout occurred.

Param inputs

[in] The pointer to the input data structure to be flushed. Model Runtime uses this to determine which inputs should be flushed after the flush_callback function is called.

Param outputs

[in] The pointer to the output data structure to be flushed. Model Runtime uses this to determine which outputs should be flushed after the flush_callback function is called.
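
For illustration, a sketch of a configuration with two replicas, thread-safe execution and a custom device wait policy (all values are arbitrary examples, not recommendations; <chrono> is assumed to be included):

model_runtime::ModelRunnerConfig config;
config.replication_factor = 2;          // run two replicas
config.thread_safe = true;              // lock a mutex on each execute call
config.device_wait_config = model_runtime::DeviceWaitConfig{
    model_runtime::DeviceWaitStrategy::WAIT_WITH_TIMEOUT,
    std::chrono::seconds{600}};         // wait up to 10 minutes for devices
config.timeout_ns = std::chrono::seconds{10};  // input-data timeout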

class ModelRunner

Inference model abstraction.

The model runner creates a session, manages queues, runs Poplar executable programs and allows execution of inference models synchronously and asynchronously.

Public Functions

ModelRunner(const ModelRunner&) = delete
ModelRunner &operator=(const ModelRunner &other) = delete
ModelRunner(ModelRunner&&) = default

Default move constructor.

ModelRunner &operator=(ModelRunner&&) = default

Default move assignment operator.

explicit ModelRunner(const std::string &popef_path, const ModelRunnerConfig &config = ModelRunnerConfig{})

Create a new ModelRunner object.

Parameters
  • popef_path – The path to PopEF files from which the model will be loaded.

  • config – The model runner configuration.

explicit ModelRunner(const std::vector<std::string> &popef_paths, const ModelRunnerConfig &config = ModelRunnerConfig{})

Create a new ModelRunner object.

Parameters
  • popef_paths – Paths to PopEF files from which the model will be loaded.

  • config – The model runner configuration.

explicit ModelRunner(std::shared_ptr<popef::Model> model, const ModelRunnerConfig &config = ModelRunnerConfig{})

Create a new ModelRunner object.

Parameters
  • model – The model which will be loaded and run.

  • config – The model runner configuration.

~ModelRunner()

Default destructor.

OutputMemory execute(const InputMemoryView &input_data, unsigned replica_id = 0)

Run model synchronously.

This will allocate output memory internally.

Parameters
  • input_data[in] The user-allocated tensor buffer for all executable input tensors.

  • replica_id[in] The user-selected replica that will execute computations. Must be less than replication_factor provided in ModelRunnerConfig.

Returns

The output memory allocated by Model Runtime for all executable output tensors.
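
For illustration, a minimal sketch of synchronous execution with outputs allocated by the library. The PopEF path, tensor names and input size are hypothetical placeholders:

// "/path/to/model.popef", "input_tensor" and the input size are placeholders.
model_runtime::ModelRunner runner("/path/to/model.popef");

std::vector<float> input(256 * 3 * 224 * 224);   // filled by the application
model_runtime::InputMemoryView inputs;
inputs.emplace("input_tensor",
               model_runtime::ConstTensorMemoryView(
                   input.data(), input.size() * sizeof(float)));

model_runtime::OutputMemory outputs = runner.execute(inputs);
// outputs["output_tensor"].getConstView() now refers to the result data.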

void execute(const InputMemoryView &input_data, const OutputMemoryView &output_data, unsigned replica_id = 0)

Run a model synchronously.

This uses output memory that you allocate and pass pointers to.

Parameters
  • input_data[in] The user-allocated tensor buffer for all executable input tensors.

  • output_data[in] The user-allocated tensor buffer for all executable output tensors.

  • replica_id[in] The user-selected replica that will execute computations. Must be less than replication_factor provided in ModelRunnerConfig.

OutputFutureMemory executeAsync(const InputMemoryView &input_data, unsigned replica_id = 0)

Run a model asynchronously.

This will allocate output memory internally.

Parameters
  • input_data[in] The user-allocated tensor buffer for all executable input tensors.

  • replica_id[in] The user-selected replica that will execute computations. Must be less than replication_factor provided in ModelRunnerConfig.

Returns

The future result of an asynchronous call for all executable output tensors.

OutputFutureMemoryView executeAsync(const InputMemoryView &input_data, const OutputMemoryView &output_data, unsigned replica_id = 0)

Run a model asynchronously.

This uses output memory that you allocate and pass pointers to.

Parameters
  • input_data[in] The user-allocated tensor buffer for all executable input tensors.

  • output_data[in] The user-allocated tensor buffer for all executable output tensors.

  • replica_id[in] The user-selected replica that will execute computations. Must be less than replication_factor provided in ModelRunnerConfig.

Returns

The future result of an asynchronous call for all executable output tensors.

std::vector<InputDesc> getExecuteInputs() const

Get a description of the input data required in the execute class methods.

Returns

A vector of DataDesc instances.

std::vector<OutputDesc> getExecuteOutputs() const

Get a description of the output data required in the execute class methods.

Returns

A vector of DataDesc instances.
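
For illustration, a sketch that sizes user-owned buffers from these descriptions and runs asynchronously. It assumes a ModelRunner instance named runner is already in scope; buffer handling is a placeholder:

// `runner` is a ModelRunner created earlier (assumed in scope).
std::unordered_map<std::string, std::vector<uint8_t>> in_buffers, out_buffers;
model_runtime::InputMemoryView inputs;
model_runtime::OutputMemoryView outputs;

for (const model_runtime::InputDesc &desc : runner.getExecuteInputs()) {
  std::vector<uint8_t> &buf = in_buffers[desc.name];
  buf.resize(desc.size_in_bytes);            // filled by the application
  inputs.emplace(desc.name, model_runtime::ConstTensorMemoryView(
                                buf.data(), buf.size()));
}
for (const model_runtime::OutputDesc &desc : runner.getExecuteOutputs()) {
  std::vector<uint8_t> &buf = out_buffers[desc.name];
  buf.resize(desc.size_in_bytes);
  outputs.emplace(desc.name, model_runtime::TensorMemoryView(
                                 buf.data(), buf.size()));
}

model_runtime::OutputFutureMemoryView futures =
    runner.executeAsync(inputs, outputs);
for (auto &named_future : futures) {
  named_future.second.wait();                // results land in out_buffers
}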

std::vector<InputDesc> getModelInputs() const

Get a description of all the user-provided input data.

In addition to the data used by the execute calls, this will return a description of all tensors used by the model which must be provided when loading the model onto the device. The data required for the additional tensors may be included in PopEF files. In this case, the data for the additional tensors is loaded automatically by ModelRunner.

Returns

A vector of DataDesc instances.

std::vector<OutputDesc> getModelOutputs() const

Get a description of all the user-provided output data.

In addition to the data used by the execute calls, this will return descriptions of all tensors used by the model that the loading phase requires (for example, weight tensors). The data for these additional tensors can be included in PopEF files, in which case it is loaded automatically by the ModelRunner.

Returns

The vector of DataDesc instances.

std::shared_ptr<popef::Model> model() const

The model associated with this model runner.

const ModelRunnerConfig &config() const

The configuration associated with this model runner.

inline void getTimeTrace(std::map<std::string, float> &time_info, unsigned replica_id = 0, int cur = -1)

Get the time taken for different phases of the last request.

Parameters
  • time_info[in] The time trace information is written to this map. An empty std::map<std::string, float> should be passed in.

  • replica_id[in] The ID of the replica. Default: 0.

  • cur[in] The .

Returns

The time trace of the last execute function called, written to time_info as a std::map<std::string, float> containing:

  • request_duration_us: time taken (in microseconds) for the last request from the point it was received to the point that the computation was complete.

  • read_preparation_duration_us: time taken (in microseconds) for the preparation of the last request before it was added to the queue.

  • read_queue_duration_us: time (in microseconds) the last task spent in the queue.

  • computation_duration_us: time taken (in microseconds) for the computation to complete.

inline void getMonitoringStatisticsPercentile(std::map<std::string, double> &monitoring_info, double quantile = 0.9, unsigned replica_id = 0)

Get the percentile of latencies.

Parameters
  • monitoring_info[in] The monitoring information is written to this map. An empty std::map<std::string, double> should be passed in.

  • quantile[in] The percentile to use. Default: 0.9 returns the P90 latency.

  • replica_id[in] The ID of the replica. Default: 0.

Returns

The percentile of the latencies is returned in monitoring_info as a std::map<std::string, double> containing:

  • request_monitoring_statistics_percentile_us: percentile of the latency (in microseconds) across the entire request.

  • read_preparation_monitoring_statistics_percentile_us: percentile of the latency (in microseconds) in the read preparation phase.

  • read_queue_monitoring_statistics_percentile_us: percentile of the latency (in microseconds) in the read queue phase.

  • computation_monitoring_statistics_percentile_us: percentile of the latency (in microseconds) in the computation phase.

inline void getMonitoringStatisticsMean(std::map<std::string, double> &monitoring_info, unsigned replica_id = 0)

Get the mean value of the latencies.

Parameters
  • monitoring_info[in] The monitoring information is written to this map. An empty std::map<std::string, double> should be passed in.

  • replica_id[in] The ID of the replica. Default: 0.

Returns

The mean value of the latencies is returned in monitoring_info as a std::map<std::string, double> containing:

  • request_monitoring_statistics_mean_us: mean latency (in microseconds) across the entire request.

  • read_preparation_monitoring_statistics_mean_us: mean latency (in microseconds) in the read preparation phase.

  • read_queue_monitoring_statistics_mean_us: mean latency (in microseconds) in the read queue phase.

  • computation_monitoring_statistics_mean_us: mean latency (in microseconds) in the computation phase.

inline void getMonitoringStatisticsTotalCount(std::map<std::string, double> &monitoring_info, unsigned replica_id = 0)

Get the total number of requests.

Parameters
  • monitoring_info[in] The monitoring information is written to this map. An empty std::map<std::string, double> should be passed in.

  • replica_id[in] The ID of the replica. Default: 0.

Returns

The total number of requests is returned in monitoring_info as a std::map<std::string, double> containing:

  • request_monitoring_statistics_count: number of requests in the request phase.

  • read_preparation_monitoring_statistics_count: number of requests in the read preparation phase.

  • read_queue_monitoring_statistics_count: number of requests in the read queue phase.

  • computation_monitoring_statistics_count: number of requests in the computation phase.
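
For illustration, a short sketch of reading the mean latency statistics for replica 0 from a ModelRunner instance named runner (as in the earlier sketches; <map> is assumed to be included):

// `runner` is a ModelRunner created earlier (assumed in scope).
std::map<std::string, double> stats;
runner.getMonitoringStatisticsMean(stats);
const double mean_request_latency_us =
    stats["request_monitoring_statistics_mean_us"];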

10.2. Low-level API

10.2.1. Anchor callback management

struct CallbackInfo

Information passed to CallbackFactory.

Public Members

const popef::Anchor &anchor

Input or output tensors of the model expected by the device.

using model_runtime::CallbackHandle = std::function<void(void*)>

The callback function called whenever the stream will be read from or written to by the device.

The memory location will only be valid for reading or writing for the duration of the callback.

using model_runtime::CallbackFactory = std::function<poplar::StreamCallbackHandle(const CallbackInfo &info)>

Factory to create a callback for the given callback information.

enum class model_runtime::PopefDataUsagePolicy

The policy for using PopEF TensorData and TensorFeed when creating anchor callbacks.

Values:

enumerator USE_POPEF_DATA_IF_ANY = 0

Use the TensorData and the TensorFeed stored in the model’s PopEF data to implicitly create callbacks for the Anchors for which the data exists.

enumerator USE_USER_DATA

Don’t use the data stored in the PopEF.

This allows you to bind your own data source.

using model_runtime::PopefDataUsagePredicate = std::function<PopefDataUsagePolicy(const popef::Anchor&)>

PopefDataUsagePredicates are used to control the use of PopEF tensor or feed data when creating callbacks for Anchors.

For more information, see the description of anchors in the PopEF User Guide.

static const PopefDataUsagePredicate model_runtime::null_popef_data_usage_predicate = {}

enum class model_runtime::AnchorCallbackPolicy

Policy to handle anchor callbacks.

Values:

enumerator BIND_USER_CB = 0

Bind user callback to anchor.

enumerator BIND_EMPTY_CB

Bind empty (dummy) callback to anchor.

enumerator SKIP_CB

Skip binding a callback to the anchor.

using model_runtime::AnchorCallbackPredicate = std::function<AnchorCallbackPolicy(const popef::Anchor&)>

AnchorCallbackPredicates are used to control the callback creation policy for an individual Anchor.

For more information, see the description of anchors in the PopEF User Guide.

static const AnchorCallbackPredicate model_runtime::null_anchor_callback_predicate = {}

namespace predicate_factory

Predefined callback predicates.

Set of basic predicates to control handling of PopEF data use or anchor callbacks.

template<typename Policy>
class AnchorWithPolicy
#include <SessionUtils.hpp>

The callback-handling policy to be used with an anchor.

Template Parameters

Policy – The policy to be bound to an anchor.

Public Members

const popef::Anchor &anchor

Anchor with input or output data to a program.

Policy policy

The callback-handling policy.

template<typename Policy>
class ProgramsWithPolicy
#include <SessionUtils.hpp>

The callback-handling policies to be used for each program.

Template Parameters

Policy – The policy to be bound to the programsIndexes.

Public Members

const std::vector<popef::ProgramFlow::ProgramIndexType> &programsIndexes

A set of indexes to named programs.

Policy policy

The callback-handling policy.

namespace anchor_callbacks

Functions

AnchorCallbackPredicate predProgramFlowLoad(const popef::ProgramFlow &flow, AnchorCallbackPolicy accept_policy = AnchorCallbackPolicy::BIND_USER_CB, AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)

Callback predicate to filter all anchors owned by any “load” programs.

Parameters
  • flow[in] The user-model PopEF program flow (to read “load” program numbers from).

  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

Returns

The anchor callback predicate.

AnchorCallbackPredicate predProgramFlowMain(const popef::ProgramFlow &flow, AnchorCallbackPolicy accept_policy = AnchorCallbackPolicy::BIND_USER_CB, AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)

Callback predicate to filter all anchors owned by the main program.

Parameters
  • flow[in] The user-model PopEF program flow (to read main program number from).

  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

Returns

The anchor callback predicate.

AnchorCallbackPredicate predProgramFlowSave(const popef::ProgramFlow &flow, AnchorCallbackPolicy accept_policy = AnchorCallbackPolicy::BIND_USER_CB, AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)

Callback predicate to filter all anchors owned by any save programs.

Parameters
  • flow[in] The user-model PopEF program flow (to read “save” program numbers from).

  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

Returns

The anchor callback predicate.

AnchorCallbackPredicate predProgramNotAssigned(AnchorCallbackPolicy accept_policy = AnchorCallbackPolicy::BIND_USER_CB, AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)

Callback predicate to filter all anchors that are not assigned to any programs.

Parameters
  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

Returns

The anchor callback predicate.

AnchorCallbackPredicate predNonScalarType(AnchorCallbackPolicy accept_policy = AnchorCallbackPolicy::BIND_USER_CB, AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)

Callback predicate to filter all anchors that are not scalars.

Parameters
  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

Returns

The anchor callback predicate.

AnchorCallbackPredicate predProgramIndexes(const std::vector<popef::ProgramFlow::ProgramIndexType> &program_indexes, AnchorCallbackPolicy accept_policy = AnchorCallbackPolicy::BIND_USER_CB, AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)

Callback predicate to filter all anchors owned by any of the programs passed in by the user.

Parameters
  • program_indexes[in] The program indices to filter.

  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

Returns

The anchor callback predicate.

AnchorCallbackPredicate predProgramIndexes(const std::vector<ProgramsWithPolicy<AnchorCallbackPolicy>> &accepted_programs_policies, const AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)

Callback predicate to apply defined anchor callback handling policies to grouped program indices.

Parameters
  • accepted_programs_policies – The program indices with defined anchor callback handling policies.

  • reject_policy – The anchor rejection policy.

Returns

The anchor callback predicate.

AnchorCallbackPredicate predAnchorsPolicies(const std::vector<AnchorWithPolicy<AnchorCallbackPolicy>> &accepted_anchors_policies, const AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)

Callback predicate to apply anchor handling policies.

Parameters
  • accepted_anchors_policies[in] The anchor indices with handling policies.

  • reject_policy[in] The anchor rejection policy.

Returns

The anchor callback predicate.

template<typename ...Args>
AnchorCallbackPredicate andBind(AnchorCallbackPolicy accept_policy, AnchorCallbackPolicy reject_policy, Args&&... pred)

Conjunction operator.

Combines multiple predicates into one predicate.

Parameters
  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

  • pred[in] The predicates which will be combined by operator.

Returns

The anchor callback predicate, which returns accept_policy when all of the passed predicates return accept_policy; otherwise it returns reject_policy.

template<typename ...Args>
AnchorCallbackPredicate orBind(AnchorCallbackPolicy accept_policy, AnchorCallbackPolicy reject_policy, Args&&... pred)

Disjunction operator.

Combines multiple predicates into one predicate.

Parameters
  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

  • pred[in] The predicates which will be combined by operator.

Returns

The anchor callback predicate, which returns accept_policy when any of the passed predicates returns accept_policy; otherwise it returns reject_policy.
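
For illustration, a sketch that binds user callbacks to anchors owned by either the main or the “load” programs and dummy callbacks to everything else. The exact namespace qualification and the origin of flow (a popef::ProgramFlow taken from the model’s metadata) are assumptions:

// `flow` is a popef::ProgramFlow obtained from the model's metadata (assumed).
using model_runtime::AnchorCallbackPolicy;
namespace cb = model_runtime::predicate_factory::anchor_callbacks;

model_runtime::AnchorCallbackPredicate pred =
    cb::orBind(AnchorCallbackPolicy::BIND_USER_CB,
               AnchorCallbackPolicy::BIND_EMPTY_CB,
               cb::predProgramFlowMain(flow),
               cb::predProgramFlowLoad(flow));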

namespace popef_data_usage

Functions

PopefDataUsagePredicate predProgramFlowLoad(const popef::ProgramFlow &flow, PopefDataUsagePolicy accept_policy = PopefDataUsagePolicy::USE_USER_DATA, PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)

Callback predicate to filter all anchors owned by any “load” programs.

Parameters
  • flow[in] The user-model PopEF program flow (to read “load” program numbers from).

  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

Returns

The anchor callback predicate.

PopefDataUsagePredicate predProgramFlowMain(const popef::ProgramFlow &flow, PopefDataUsagePolicy accept_policy = PopefDataUsagePolicy::USE_USER_DATA, PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)

Callback predicate to filter all anchors owned by the main program.

Parameters
  • flow[in] The user-model PopEF program flow (to read main program number from).

  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

Returns

The anchor callback predicate.

PopefDataUsagePredicate predProgramFlowSave(const popef::ProgramFlow &flow, PopefDataUsagePolicy accept_policy = PopefDataUsagePolicy::USE_USER_DATA, PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)

Callback predicate to filter all anchors owned by any save programs.

Parameters
  • flow[in] The user-model PopEF program flow (to read “save” program numbers from).

  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

Returns

The anchor callback predicate.

PopefDataUsagePredicate predProgramNotAssigned(PopefDataUsagePolicy accept_policy = PopefDataUsagePolicy::USE_USER_DATA, PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)

Callback predicate to filter all anchors that are not assigned to any programs.

Parameters
  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

Returns

The anchor callback predicate.

PopefDataUsagePredicate predProgramIndexes(const std::vector<popef::ProgramFlow::ProgramIndexType> &program_indexes, PopefDataUsagePolicy accept_policy = PopefDataUsagePolicy::USE_USER_DATA, PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)

Callback predicate to filter all anchors owned by any of the programs passed in by the user.

Parameters
  • program_indexes[in] The program indices to filter.

  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

Returns

The anchor callback predicate.

PopefDataUsagePredicate predProgramIndexes(const std::vector<ProgramsWithPolicy<PopefDataUsagePolicy>> &accepted_programs_policies, const PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)

Callback predicate to apply defined anchor callback handling policies to grouped program indices.

Parameters
  • accepted_programs_policies – The program indices with defined anchor callback handling policies.

  • reject_policy – The anchor rejection policy.

Returns

The anchor callback predicate.

PopefDataUsagePredicate predAnchorsPolicies(const std::vector<AnchorWithPolicy<PopefDataUsagePolicy>> &accepted_anchors_policies, const PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)

Callback predicate to apply anchor handling policies.

Parameters
  • accepted_anchors_policies[in] The anchor indices with handling policies.

  • reject_policy[in] The anchor rejection policy.

Returns

The anchor callback predicate.

template<typename ...Args>
PopefDataUsagePredicate orBind(PopefDataUsagePolicy accept_policy, PopefDataUsagePolicy reject_policy, Args&&... pred)

Disjunction operator.

Combines multiple predicates into one predicate.

Parameters
  • accept_policy[in] The anchor acceptance policy.

  • reject_policy[in] The anchor rejection policy.

  • pred[in] The predicates which will be combined by operator.

Returns

The anchor callback predicate, which returns accept_policy when any of the passed predicates returns accept_policy; otherwise it returns reject_policy.

10.2.2. Queue memory management

class IMemoryPool

A common interface for all memory allocators.

Subclassed by model_runtime::RingMemoryPool

Public Functions

virtual void *getMemoryBlob() = 0

Get a writable memory blob.

Note

The memory blob retains ownership of the blob’s memory and is responsible for freeing the memory.

virtual int64_t blobSize() const = 0

Get the size of the memory blob.

class RingMemoryPool : public model_runtime::IMemoryPool

Memory pool of fixed size blobs.

Allocate the requested number of blobs at construction time and loop over the blobs every time getMemoryBlob() is called.

Public Functions

RingMemoryPool(int64_t blob_size, int64_t num_blobs)

Create a ring memory pool.

This allocates num_blobs * blob_size bytes of memory under the hood.

Parameters
  • blob_size[in] The size of a single memory blob.

  • num_blobs[in] The number of memory blobs.

int64_t numBlobs() const

Get the number of memory blobs.

virtual int64_t blobSize() const override

Get the size of a memory blob.

virtual void *getMemoryBlob() override

Get a pointer to the next blob.

Note

When the end of the memory pool is reached, the iteration starts from the beginning again.
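
For illustration, a short sketch of a pool of four 1 KiB blobs; the sizes are arbitrary and <cstring> is needed for std::memset:

model_runtime::RingMemoryPool pool(/*blob_size=*/1024, /*num_blobs=*/4);

for (int i = 0; i < 8; ++i) {
  // After the fourth call, getMemoryBlob() wraps around to the first blob.
  void *blob = pool.getMemoryBlob();
  std::memset(blob, 0, pool.blobSize());
}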

10.2.3. Queue management

template<typename BufferType>
class SpscRingBuffer

A lock-free, fixed-size, single-producer, single-consumer ring buffer implementation.

auto dst = rb.writeLock();
if (!rb.isValid()) {
  // dst is not valid: don't dereference it.
  return;
}
*dst = obj;
rb.writeComplete();

Note

writeLock() and readLock() are blocking calls which might return early if the ring buffer gets invalidated, so you should always use isValid() after locking a buffer to check whether the returned buffer is safe to use or not.

Public Types

using ReadTimeoutCallback = std::function<void(SpscRingBuffer*)>

Signature of the function to call when readLock() times out.

Public Functions

SpscRingBuffer(const SpscRingBuffer &other) = delete
SpscRingBuffer(const SpscRingBuffer &&other) noexcept = delete
SpscRingBuffer &operator=(const SpscRingBuffer &other) = delete
SpscRingBuffer &operator=(SpscRingBuffer &&other) noexcept = delete
SpscRingBuffer(std::size_t num_buffers, const std::string &label, ReadTimeoutCallback timeout_cb = nullptr, std::chrono::nanoseconds timeout_ns = std::chrono::nanoseconds::zero())

Create a single-producer, single-consumer ring buffer.

Parameters
  • num_buffers[in] The number of buffers to use in the ring buffer.

  • label[in] The debug string to use in printState().

  • timeout_cb[in] The function to call when a read times out.

  • timeout_ns[in] The duration in nanoseconds before the timeout callback is called when no read input is available. If 0, never call the callback.

~SpscRingBuffer()

Default destructor.

void write(const BufferType &obj)

Lock the ring buffer, write to it and unlock it.

Parameters

obj[in] The buffer to be written to.

BufferType *writeLock()

Lock and return a buffer for writing.

Only one buffer can be locked for writing at any time. Calling writeLock() when a buffer is already locked will return the same buffer.

If no buffer is available, then the function will block until either:

  • An existing buffer becomes available.

  • The ring buffer is invalidated.

Note

isValid() must be used to determine whether the returned buffer is valid or not.

Returns

A buffer to write to.

void writeComplete()

Unlock the currently write-locked buffer.

Pre

A buffer must have been locked for writing using writeLock().

Post

The next time writeLock() is called, a different buffer will be returned.

const BufferType &readLock()

Lock a buffer for read access.

If no buffer is available the function will block until either:

  • A buffer becomes available (writeComplete() is called from another thread)

  • The ring buffer is invalidated.

  • Some buffers are read-locked and readReset() is called.

Several buffers can be locked in reading mode and each call to readLock() will return a new buffer.

If timeout_ns is greater than zero, a timeout callback was provided, and readLock() has been waiting for a buffer for longer than timeout_ns, then the callback is called repeatedly until a new read buffer becomes available.

Note

isValid() must be used to determine whether the returned buffer is valid or not.
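
The consumer side follows the same pattern as the write example above; consume() below is a placeholder for application code:

const auto &src = rb.readLock();
if (!rb.isValid()) {
  // src is not valid: don't use it.
  return;
}
consume(src);  // `consume` is a hypothetical consumer-side function
rb.readComplete();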

void readComplete()

Unlock the oldest read-locked buffer.

Pre

A buffer must have been locked for reading using readLock().

void readReset()

All the buffers currently locked for reading are unlocked and placed back at the front of the reading queue.

bool readAvailable() const

Check whether any buffer is available to be read-locked.

Returns

True if there is at least one buffer available to be read-locked, false if there are no buffers available.

void invalidate()

Invalidate the ring buffer.

All the calls after this call will become non-blocking.

All the objects returned by calls on an invalidated ring buffer are invalid and should be discarded or ignored.

void reset()

Reset all ring buffer values to the initial state.

bool isValid() const

Check the state of the ring buffer.

Returns

True if the ring buffer is in a valid state, or false if it was invalidated.

std::string getState(const std::string &prefix) const

Debug function to print the current state of the ring buffer.

Parameters

prefix[in] The string to prefix the state with.

Returns

The current state of the ring buffer.

std::size_t numBuffers() const

Return the maximum number of elements the ring buffer can store.

const std::string &label() const

Label associated with this ring buffer.

class IQueue

Common interface implemented by various queues.

The interface describes the memory requirements of the queue and provides an interface to disconnect the queue from its data source.

Subclassed by model_runtime::InputQueue, model_runtime::OutputQueue

Public Functions

virtual ~IQueue() = default

Default destructor.

virtual const popef::TensorInfo &tensorInfo() const = 0

Get the shape and data type of a tensor.

Each buffer in the queue has the same type and shape.

Returns

The structure encapsulating the shape and data type of a tensor.

virtual int64_t numBuffers() const = 0

Get the number of buffers this queue can store.

virtual void disconnect() = 0

Disconnect the queue ring buffers: no longer wait for data.

All queue ring buffers are invalidated and immediately return from any blocking calls.

Disconnected queues can no longer be used to feed real data.

Typically, disconnect() is used at shutdown to feed dummy data to the executable until it returns from its run() method and can be safely destroyed.

RingMemoryPool model_runtime::allocateQueueStorage(const IQueue &queue, int64_t extra_buffers = 0)

Allocate a memory pool large enough to back the given queue.

Parameters
  • queue[in] The queue the memory pool will be used to feed.

  • extra_buffers[in] The number of extra buffers to allocate in addition to the queue’s requirements.

Returns

A memory pool.
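
For illustration, a one-line sketch assuming an InputQueue reference named queue obtained from a QueueManager (see below):

// Allocate one spare buffer on top of the queue's own requirements.
model_runtime::RingMemoryPool pool =
    model_runtime::allocateQueueStorage(queue, /*extra_buffers=*/1);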

using model_runtime::ReadStartCallback = std::function<void(void)>

Signature of the function called just before the first chunk of data is about to be transferred to Poplar to build a complete input for the executable.

using model_runtime::ReadCompleteCallback = ReadStartCallback

Signature of the function called when the data for a complete model input is about to be consumed by the executable.

using model_runtime::WriteCompleteCallback = std::function<void(void)>

Signature of the function called after the data has been written.

struct InputData

Structure representing queue input data.

Public Members

const uint8_t *data = {nullptr}

Pointer to the buffer containing the data to read.

int64_t data_size = {0}

Size in bytes of the data.

ReadStartCallback readStartCallback = {nullptr}

Optional function to call just before the first chunk of data is about to be fetched to build a complete model input.

Note

The callback might be called more than once if the data is prefetched, then discarded and fetched again.

ReadCompleteCallback readCompleteCallback = {nullptr}

Optional function to call when the data for a complete model input is about to be consumed by the executable.

Note

The callback might be called more than once if the data is prefetched, then discarded and fetched again.

struct OutputData

Structure representing queue output data.

Public Members

uint8_t *data = {nullptr}

Where to write the output.

int64_t data_size = {0}

Amount of data to write in bytes.

WriteCompleteCallback writeCompleteCallback = {nullptr}

Optional function to call after the data has been written.

Note

The callback might be called more than once if the data is prefetched then discarded and fetched again.

using model_runtime::InputRingBuffer = model_runtime::SpscRingBuffer<InputData>

Fixed size, single producer, single consumer ring buffer for input data.

using model_runtime::OutputRingBuffer = model_runtime::SpscRingBuffer<OutputData>

Fixed size, single producer, single consumer ring buffer for output data.

class InputQueue : public model_runtime::IQueue

Pack or split the data that you pass to match the amount of data expected by the executable.

For example, if the model was compiled to process an input tensor of size [256, 48, 48], which means samples of 48x48 across a batch size of 256, then the application can enqueue 48x48 tensors of any batch size. This queue will take care of either regrouping the inputs into a single run or splitting them across several runs.

Note

It is your responsibility to ensure the data size enqueued is a multiple of a single sample size.

Note

InputQueue cannot be instantiated directly; it is created by QueueManager.

Public Functions

InputQueue(std::shared_ptr<InputRingBuffer> buffer, const popef::TensorInfo &info)

Create an input queue.

Parameters
  • buffer[in] The target ring buffer connected to a Poplar callback.

  • info[in] The buffer description.

InputData *enqueueLock()

Lock a buffer for writing.

This is a blocking call and only one buffer can be locked for writing at any time. Calling enqueueLock() when a buffer is already locked will return the same buffer.

Returns

A pointer to an InputData object to fill.

void enqueueComplete()

Unlock the current write-locked buffer.

Pre

A buffer must have been locked for writing using enqueueLock().

Post

The next time enqueueLock() is called, a different buffer will be returned.

void enqueue(const void *input, int64_t data_size, ReadStartCallback read_start_callback = nullptr, ReadCompleteCallback read_complete_callback = nullptr)

Convenience method to lock a buffer for writing, fill InputData and unlock the buffer.

Note

The callback might be called more than once if the data is prefetched, then discarded and fetched again.

Parameters
  • input[in] The address of the buffer containing the data. The data is not copied to a buffer, the pointer must remain valid until the data has been used by the ring buffer consumer.

  • data_size[in] The number of bytes to use from the input. Must be a multiple of single sample size.

  • read_start_callback[in] An optional callback to call when the data starts being read.

  • read_complete_callback[in] An optional callback to call when the data read is complete.
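
For illustration, a sketch of enqueueing one application batch through an InputQueue reference named in, obtained from a QueueManager (described later in this section); the buffer size is a placeholder:

// `in` is an InputQueue obtained from a QueueManager (assumed in scope).
std::vector<float> batch(48 * 48 * 64);   // 64 samples of 48x48, filled by the application
in.enqueue(batch.data(), batch.size() * sizeof(float),
           /*read_start_callback=*/nullptr,
           /*read_complete_callback=*/[] {
             // A complete model input containing this data is about to be consumed.
           });
// `batch` must stay alive until the data has been consumed by the ring buffer.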

void flush()

Flush any partial input still being created or enqueue a dummy batch.

void reset()

Reset the underlying ring buffer.

virtual void disconnect() override

Parent interface method.

virtual const popef::TensorInfo &tensorInfo() const override

Parent interface method.

virtual int64_t numBuffers() const override

Parent interface method.

class OutputQueue : public model_runtime::IQueue

Pack or split the data returned by the executable to the application’s batches.

See also

InputQueue

Note

It is your responsibility to ensure the data size enqueued is a multiple of the output sample size.

Note

The batch size used in OutputQueue must match the one enqueued to InputQueue.

Note

OutputQueue cannot be instantiated directly; it is created by QueueManager.

Public Functions

OutputQueue(std::shared_ptr<OutputRingBuffer> buffer, const popef::TensorInfo &info)

Create an output queue.

Parameters
  • buffer[in] The source ring buffer connected to a Poplar callback.

  • info[in] The buffer description.

OutputData *enqueueLock()

Blocking call to lock a buffer for reading.

Only one buffer can be locked for reading at any time. Calling enqueueLock() when a buffer is already locked will return the same buffer.

Note

This is a queue of pointers or addresses (not data). The content of the buffer will be overwritten after readComplete() is called.

Returns

A pointer to the buffer containing the data to be read.

void enqueueComplete()

Unlock the current read-locked buffer.

Pre

A buffer must have been locked for reading using enqueueLock().

Post

The next time enqueueLock() is called, a different buffer will be returned.

void enqueue(void *output, int64_t data_size, WriteCompleteCallback write_complete_callback = nullptr)

Convenience method to lock a buffer for reading, fill OutputData and unlock the buffer.

Parameters
  • output[in] The address of the buffer that will receive the data. The pointer must remain valid until the data has been copied into it by the ring buffer producer.

  • data_size[in] The number of bytes to copy to the output. Must be a multiple of the single sample size.

  • write_complete_callback[in] An optional callback to call when the output has been filled.
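
The sketch below mirrors the input-side example. It assumes an existing QueueManager and a hypothetical stream name “output_0”; the host buffer is assumed to be sized as a multiple of the single sample size.

    #include <cstdint>
    #include <vector>

    // Sketch only: "output_0" is a hypothetical stream name.
    void collectOneBatch(model_runtime::QueueManager &qm,
                         std::vector<char> &host_out) // size: multiple of the single sample size
    {
        auto &out = qm.outputQueue("output_0");

        // host_out must stay alive until the ring buffer producer has copied
        // the results into it. The optional WriteCompleteCallback keeps its
        // nullptr default.
        out.enqueue(host_out.data(), static_cast<int64_t>(host_out.size()));
    }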

void flush()

Flush any partial output still being created or, if there isn’t any, enqueue a handler for the dummy output produced by the dummy input enqueued by the corresponding InputQueue flush() method.

void reset()

Reset the underlying ring buffer.

virtual void disconnect() override

Parent interface method.

virtual const popef::TensorInfo &tensorInfo() const override

Parent interface method.

virtual int64_t numBuffers() const override

Parent interface method.

using model_runtime::RingSizeMultiplierProdType = std::function<int64_t(const popef::Anchor&)>

A producer of ring buffer size multipliers.

A factory function type used to produce a ring size multiplier for a specific popef::Anchor. You can define this kind of functor to control the size of the QueueManager ring buffers (see the SpscRingBuffer class). For example, for tensors that the program loads only once (such as weights or other tensors fetched from the host in the “load” program), the ring buffer size multiplier can be set to 1, as there is no need to prefetch more values from the user.
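
As a minimal sketch, the functor below requests a ring size multiplier of 1 for every anchor; the factories in the ring_size_multiplier_factory namespace (documented later in this section) are usually more convenient.

    #include <cstdint>

    // Sketch: a constant multiplier of 1 for every anchor.
    const model_runtime::RingSizeMultiplierProdType single_buffer_per_anchor =
        [](const popef::Anchor & /*anchor*/) -> int64_t { return 1; };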

class QueueManager

Create and manage queues related to a session.

Note

Queues currently only work for models with replica == 1.

Public Functions

QueueManager(const QueueManager&) = delete
QueueManager(QueueManager&&) = delete
QueueManager &operator=(const QueueManager &other) = delete
QueueManager &operator=(QueueManager&&) = delete
~QueueManager() = default

Default destructor.

InputQueue &inputQueue(const std::string &name)

Get the input queue of the named tensor or stream.

Parameters

name[in] The name of the tensor or stream.

OutputQueue &outputQueue(const std::string &name)

Get the output queue of the named tensor or stream.

Parameters

name[in] The name of the tensor or stream.

void flushAll()

Call flush() on all the queues.

void disconnectAll()

Disconnect all the queues from their ring buffers.

void resetAll()

Reset all the queues after the session has stopped.

Public Members

std::map<std::string, InputQueue> inputs

Map the user input streams to their input queues.

std::map<std::string, OutputQueue> outputs

Map the user output streams to their output queues.
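
A short sketch of working with a QueueManager follows. It assumes the object was obtained from Session::createQueueManager() (described with the Session class below) and only lists the stream names before flushing.

    #include <iostream>

    // Sketch: list the managed streams, then flush every queue.
    void listStreamsAndFlush(model_runtime::QueueManager &qm)
    {
        for (const auto &named_queue : qm.inputs) {
            std::cout << "input stream:  " << named_queue.first << '\n';
        }
        for (const auto &named_queue : qm.outputs) {
            std::cout << "output stream: " << named_queue.first << '\n';
        }
        qm.flushAll(); // calls flush() on every input and output queue
    }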

namespace ring_size_multiplier_factory

Functions

RingSizeMultiplierProdType ringSizeMultForProgs(const std::vector<popef::ProgramFlow::ProgramIndexType> &programs, int64_t selected_ring_size_multiplier, int64_t others_ring_size_multiplier = 1)

Factory function returning the selected ring buffer size multiplier for anchors “owned” by the given set of programs (programs that fetch Anchor data from the host).

For the remaining anchors, others_ring_size_multiplier is returned. This function can be passed to the QueueManager constructor to control the sizes of its internal ring buffers created for model anchors.

Parameters
  • programs[in] The programs “owning” the anchors of interest.

  • selected_ring_size_multiplier[in] The ring buffer size multiplier for anchors owned by the programs.

  • others_ring_size_multiplier[in] The ring buffer size multiplier for remaining anchors. Note: It is preferred to set the ring buffer multiplier to 1 for anchors from the “load” or “save” ProgramFlow to reduce memory.

RingSizeMultiplierProdType ringSizeMultForMainProgs(const popef::Model &model, int64_t main_ring_size_multiplier, int64_t load_save_ring_size_multiplier = 1)

Factory function returning the selected ring buffer size multiplier for anchors “owned” by the main programs (programs of the main ProgramFlow that fetch Anchor data from the host).

For the remaining anchors (“owned” by the “load” or “save” ProgramFlow), load_save_ring_size_multiplier is returned. This function can be passed to the QueueManager constructor to control the sizes of its internal ring buffers created for model Anchor objects.

Note

It is preferred to set the ring buffer multiplier to 1 for anchors from the “load” or “save” ProgramFlow to reduce memory.

Parameters
  • model[in] A combination of PopEF blobs representing a single model.

  • main_ring_size_multiplier[in] The ring buffer size multiplier for anchors owned by the main programs.

  • load_save_ring_size_multiplier[in] The ring buffer size multiplier for remaining anchors.
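
A sketch of how this factory might be used is shown below. It assumes the namespace is nested inside model_runtime and that the caller already holds a loaded popef::Model; the multiplier values 4 and 1 are arbitrary illustrations.

    model_runtime::RingSizeMultiplierProdType makeMultiplier(const popef::Model &model)
    {
        // Keep 4 ring buffer slots per anchor for the main programs and only 1
        // for anchors owned by the "load"/"save" programs (values are illustrative).
        return model_runtime::ring_size_multiplier_factory::ringSizeMultForMainProgs(
            model, /*main_ring_size_multiplier=*/4, /*load_save_ring_size_multiplier=*/1);
    }

The returned functor can then be passed to the QueueManager constructor to size its internal ring buffers.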

10.2.4. Runtime management

enum class model_runtime::LaunchPolicy

Session creation policy.

Values:

enumerator Immediate

Acquire a device and load the executable during Session construction.

enumerator Deferred

Acquire a device outside the Session object and load the executable inside the bindToDevice() method.

struct SessionConfig

Session configuration.

Public Members

LaunchPolicy policy = LaunchPolicy::Deferred

Session creation policy that is associated with acquiring a device.

Default: LaunchPolicy::Deferred.

PopefDataUsagePredicate pred_tensor_data = null_popef_data_usage_predicate

Predicate for anchor callback.

This controls user callback handling for the anchor. It is not used by default.

bool check_package_hash = true

If true, the Poplar hash will be checked before the executable is loaded onto the device.

If false, this check is not done. Default: true.

DeviceWaitConfig wait_config = {}

By default, Session throws an exception when it is not able to attach to a device required by the given model.

This behaviour can be changed by setting a custom device wait configuration.

std::string max_look_ahead = "unlimited"

Limits the number of host synchronisation points the runtime library is allowed to prepare in advance.

This helps prevent the IPU from being idle. The string value “unlimited” (the default) removes this restriction completely.

Possible values:

  • “unlimited”: the runtime library will decide the value.

  • “x”: where x is an unsigned integer value.
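
The sketch below fills in a SessionConfig with non-default values, assuming these types live in the model_runtime namespace; the 60-second wait and the look-ahead limit of 2 are arbitrary illustrations.

    #include <chrono>

    model_runtime::SessionConfig makeConfig()
    {
        model_runtime::SessionConfig config;
        config.policy             = model_runtime::LaunchPolicy::Deferred;
        config.check_package_hash = true;
        // Wait up to 60 seconds for a device instead of throwing immediately.
        config.wait_config        = model_runtime::DeviceWaitConfig{std::chrono::seconds{60}};
        // Allow at most 2 host synchronisation points to be prepared in advance.
        config.max_look_ahead     = "2";
        return config;
    }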

class Session

Link a model to a device.

Note

If two or more sessions share a device, runLoadPrograms() and runSavePrograms() will be called implicitly when the device is bound to or unbound from this session.

Public Types

using ProgIdxType = popef::ProgramFlow::ProgramIndexType

The index for a runnable program available in the executable.

using ProgramsAndAnchorsMap = std::map<ProgIdxType, std::vector<const popef::Anchor*>>

Mapping between a program index and the vector of popef::Anchor objects that appear in that program and are available in the executable.

Public Functions

Session(const Session&) = delete
Session &operator=(const Session &other) = delete
Session(Session&&) = default

Default move constructor.

Session &operator=(Session&&) = default

Default move assignment operator.

explicit Session(const std::vector<std::string> &popef_paths, const SessionConfig &config = {})

Create a new Session object.

Parameters
  • popef_paths – The paths to PopEF files from which the model will be loaded.

  • config – The session configuration.

explicit Session(std::shared_ptr<popef::Model> model, const SessionConfig &config = {})

Create a Session object.

Parameters
  • model – The model which will be loaded and executed on the IPU.

  • config – The session configuration.

~Session()

Default destructor.

void bindToDevice(std::shared_ptr<Device> device)

Bind the session to a device and load the executable onto it.

If the session is already bound to a device, this method first unbinds the current device before binding to the new device.

Parameters

device[in] The wrapper around a Poplar device.
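
A sketch of the deferred-launch flow is shown below. The PopEF path is a placeholder, and the std::shared_ptr<Device> is assumed to have been acquired through the device-management API (Section 10.1.1).

    #include <memory>
    #include <string>
    #include <vector>

    void loadModel(std::shared_ptr<model_runtime::Device> device)
    {
        model_runtime::SessionConfig config;
        config.policy = model_runtime::LaunchPolicy::Deferred;

        // "my_model.popef" is a placeholder path.
        model_runtime::Session session(std::vector<std::string>{"my_model.popef"}, config);

        // With LaunchPolicy::Deferred the executable is loaded here rather
        // than in the Session constructor.
        session.bindToDevice(std::move(device));
    }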

void runLoadPrograms()

Run the programs to copy the data to the device.

Note

This method is implicitly called before the first call to runMainPrograms() after the device has been bound to this session.

Pre

The session must be bound to a device.

void runMainPrograms()

Run the main programs.

Note

If the device was last used by a different session, this method first unbinds the device from that session, then binds it to this session and calls runLoadPrograms() before actually running the main programs.

Pre

The session must be bound to a device.

void runSavePrograms()

Run the programs to copy the data back to the host.

Note

This method is implicitly called when the device bound to this session gets unbound.

Pre

The session must be bound to a device.
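
Taken together, the three methods above form the typical execution flow once a device is bound, as in this minimal sketch.

    void runOnce(model_runtime::Session &session)
    {
        session.runLoadPrograms();  // copy initial data (for example, weights) to the device
        session.runMainPrograms();  // execute the main programs
        session.runSavePrograms();  // copy data back to the host
    }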

void runPrograms(const std::vector<ProgIdxType> &progs)

Run your own set of programs.

Each program will run once.

If you run programs from the main set without having previously run the programs that load the data, you might get incorrect results. The same applies if you run programs that save data before running the load and main programs.

Therefore, the order of the programs in the vector is important.

Note

This function is for advanced users who understand what the programs do during execution and what results they produce. Programs are run in sequence, in the order in which they appear in the vector.

Parameters

progs[in] The set of program indices which you would like to run. Indices need to be present in the loaded popef::Model.

Pre

The session must be bound to a device.
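
A minimal sketch follows; the program indices are hypothetical and must correspond to programs present in the loaded popef::Model.

    #include <vector>

    void runCustomPrograms(model_runtime::Session &session)
    {
        // Hypothetical indices; programs run in the order given.
        const std::vector<model_runtime::Session::ProgIdxType> progs = {0, 2, 3};
        session.runPrograms(progs);
    }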

std::shared_ptr<popef::Model> model() const

Model associated with this session.

void unloadFromDevice()

Unload the session from the device it is currently bound to.

Pre

The session must be bound to a device.

void setCallbackForAnchor(const std::string &anchor_handle, CallbackHandle callback)

Set the callback (data source or destination buffer and a way of managing it) for a popef::Anchor (an input or output tensor).

Parameters
  • anchor_handle[in] The anchor handle to which the callback will be assigned. Each popef::Anchor has a unique handle.

  • callback[in] The callback to be called whenever the stream is about to be read, or has been written to, by the device. Which of these applies depends on whether the callback is assigned to an input tensor or an output tensor.

Pre

The session must be bound to a device.

void setUserOutputHandler(CallbackFactory factory, const AnchorCallbackPredicate &anchor_callback_predicate = null_anchor_callback_predicate, bool skip_connected = false)

Set up handlers for output tensors.

If the factory returns nullptr for a tensor then the existing callback remains in place.

If the factory returns a callback for a tensor which already had a callback associated with it then the existing callback is discarded and the new one is used instead.

Parameters
  • factory[in] The factory that will be called once per output tensor.

  • anchor_callback_predicate[in] The functor controlling user callback binding.

  • skip_connected[in] If true, call the factory only for the streams which are not connected. If false, call the factory for all the streams that are required.

Pre

The session must be bound to a device.

void setUserInputHandler(CallbackFactory factory, const AnchorCallbackPredicate &anchor_callback_predicate = null_anchor_callback_predicate, bool skip_connected = false)

Set up handlers for input tensors.

If the factory returns nullptr for a tensor then the existing callback remains in place.

If the factory returns a callback for a tensor which already had a callback associated with it then the existing callback is discarded and the new one is used instead.

Parameters
  • factory[in] The factory that will be called once per input tensor.

  • anchor_callback_predicate[in] The functor controlling user callback binding.

  • skip_connected[in] If true, call the factory only for the streams which are not connected. If false, call the factory for all the streams that are required.

Pre

The session must be bound to a device.

ProgramsAndAnchorsMap anchorsNotConnectedToCallbacks(const std::vector<ProgIdxType> &progs)

Returns the anchors that are not connected to callbacks for the given programs.

Parameters

progs[in] The list of program indices.

Returns

A map in which the key is the program index and the value is the vector of anchors that have no linked callbacks for that program.
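
A sketch of inspecting the returned map is shown below; only the per-program counts are printed, and the program index is assumed to be a printable integer type.

    #include <iostream>
    #include <vector>

    void reportUnconnectedAnchors(model_runtime::Session &session,
                                  const std::vector<model_runtime::Session::ProgIdxType> &progs)
    {
        const auto unconnected = session.anchorsNotConnectedToCallbacks(progs);
        for (const auto &entry : unconnected) {
            std::cout << "program " << entry.first << ": "
                      << entry.second.size() << " anchor(s) without a callback\n";
        }
    }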

void errorIfAnchorsAreNotConnectedToCallbacks(const std::vector<ProgIdxType> &progs)

Check that all of the given programs have all their required anchors connected to callbacks.

If any program has anchors that are not connected to callbacks, this method throws an error that lists those programs in the error message.

Parameters

progs[in] List of program indices.

void stop()

Stop the running session.

Send the stop signal to the executable and disconnect the queues if a QueueManager is bound. The device will be left in an undefined state and no more programs can be run until reload() is called.

void reload()

Load the executable onto the bound device again.

bool isStopped()

Return true if the executable has stopped.
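
A minimal sketch combining stop(), isStopped() and reload():

    void restart(model_runtime::Session &session)
    {
        session.stop();    // stop the executable; bound queues are disconnected
        if (session.isStopped()) {
            session.reload();  // load the executable again on the bound device
        }
    }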

template<typename ...T>
inline QueueManager *createQueueManager(T&&... args)

Create QueueManager.

The Session takes full ownership of the created QueueManager object; its lifetime is strictly tied to the Session lifetime.

Arguments should be passed in the same order as in the QueueManager constructors. See the model_runtime::QueueManager class.

Returns

A pointer to the created QueueManager (owned by the Session).

std::vector<const popef::Anchor*> getInputAnchors() const

Returns all inputs.

This includes inputs that need a user-defined callback and inputs that already have a callback defined based on data from the popef::Model.

Returns

A vector of pointers to popef::Anchor objects.

std::vector<const popef::Anchor*> getUserInputAnchors() const

Returns the user inputs, which are the inputs that need a user-defined callback.

Returns

A vector of pointers to popef::Anchor objects.

std::vector<const popef::Anchor*> getUserOutputAnchors() const

Returns the user outputs, which are the outputs that need a user-defined callback.

Returns

A vector of pointers to popef::Anchor objects.
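
A minimal sketch that summarises the anchors returned by the three methods above:

    #include <iostream>

    void summariseAnchors(const model_runtime::Session &session)
    {
        std::cout << "all inputs:   " << session.getInputAnchors().size() << '\n'
                  << "user inputs:  " << session.getUserInputAnchors().size() << '\n'
                  << "user outputs: " << session.getUserOutputAnchors().size() << '\n';
    }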