10. Model Runtime C++ API reference

10.1. High level API

10.1.1. Device management

enum class model_runtime::DeviceWaitStrategy

Defines the different options for waiting for a device to become available.

Values:

enumerator NO_WAIT: An exception will be thrown if no IPU device is immediately available.

enumerator WAIT_WITH_TIMEOUT

The device manager will wait for a specified amount of time for an IPU device to become available.

The device manager will try to attach to the required device at a specified interval.

enumerator WAIT_FOREVER

The device manager will wait until an IPU device is available.

The device manager will try to attach to the required device at a specified interval.

struct DeviceWaitConfig

The configuration of how to wait for the expected device.

Public Functions

constexpr DeviceWaitConfig() = default: Constructor with default values.

inline constexpr DeviceWaitConfig(DeviceWaitStrategy p_strategy, std::chrono::seconds p_timeout = std::chrono::seconds{0}, std::chrono::seconds p_sleepTime = std::chrono::seconds{15})

Constructor with specified device waiting strategy.

Parameters

p_strategy – [in] The device waiting strategy.
p_timeout – [in] The time in seconds to wait for a device. Only required if p_strategy is DeviceWaitStrategy::WAIT_WITH_TIMEOUT.
p_sleepTime – [in] The time in seconds between attach attempts. Required if p_strategy is DeviceWaitStrategy::WAIT_WITH_TIMEOUT or DeviceWaitStrategy::WAIT_FOREVER.

inline explicit constexpr DeviceWaitConfig(std::chrono::seconds p_timeout, std::chrono::seconds p_sleepTime = std::chrono::seconds{15})

Constructor with device waiting strategy DeviceWaitStrategy::WAIT_WITH_TIMEOUT.

This means that the device manager will wait for a finite amount of time, p_timeout, for a device to become available.

Parameters

p_timeout – [in] The time in seconds to wait for a device.
p_sleepTime – [in] The time in seconds between attach attempts.

Public Members

DeviceWaitStrategy strategy = {DeviceWaitStrategy::NO_WAIT}: The device waiting strategy.

std::chrono::seconds timeout = {0}: The time in seconds to wait if no device is currently available.

std::chrono::seconds sleepTime = {15}: The time in seconds between attach attempts.
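
The three strategies can be read as a retry loop around a single attach attempt. The following is a minimal sketch of that reading, not the device manager's actual implementation: the types are stand-ins that mirror the declarations above, and attachWithWait, try_attach and sleep_for are hypothetical names introduced for illustration. Elapsed time is approximated by summing the sleep intervals.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Stand-in types mirroring the documented declarations; the real ones
// come from the Model Runtime headers.
enum class DeviceWaitStrategy { NO_WAIT, WAIT_WITH_TIMEOUT, WAIT_FOREVER };

struct DeviceWaitConfig {
  constexpr DeviceWaitConfig() = default;
  constexpr DeviceWaitConfig(
      DeviceWaitStrategy p_strategy,
      std::chrono::seconds p_timeout = std::chrono::seconds{0},
      std::chrono::seconds p_sleepTime = std::chrono::seconds{15})
      : strategy(p_strategy), timeout(p_timeout), sleepTime(p_sleepTime) {}
  explicit constexpr DeviceWaitConfig(
      std::chrono::seconds p_timeout,
      std::chrono::seconds p_sleepTime = std::chrono::seconds{15})
      : strategy(DeviceWaitStrategy::WAIT_WITH_TIMEOUT),
        timeout(p_timeout), sleepTime(p_sleepTime) {}

  DeviceWaitStrategy strategy = DeviceWaitStrategy::NO_WAIT;
  std::chrono::seconds timeout{0};
  std::chrono::seconds sleepTime{15};
};

// Hypothetical helper: returns true once try_attach succeeds and false when
// the strategy gives up. sleep_for is injected so that the loop can be
// exercised without real delays.
bool attachWithWait(const DeviceWaitConfig &config,
                    const std::function<bool()> &try_attach,
                    const std::function<void(std::chrono::seconds)> &sleep_for) {
  std::chrono::seconds waited{0};
  while (true) {
    if (try_attach()) return true;
    // NO_WAIT: a single failed attempt ends the wait.
    if (config.strategy == DeviceWaitStrategy::NO_WAIT) return false;
    // WAIT_WITH_TIMEOUT: stop once the accumulated sleep reaches the timeout.
    if (config.strategy == DeviceWaitStrategy::WAIT_WITH_TIMEOUT &&
        waited >= config.timeout) {
      return false;
    }
    // WAIT_FOREVER falls through and keeps retrying.
    sleep_for(config.sleepTime);
    waited += config.sleepTime;
  }
}
```

In this sketch, a timeout of 10 seconds with the default 15-second sleep time results in two attach attempts before giving up, which shows why sleepTime is best chosen smaller than the timeout.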

struct DeviceConstraints

Requirements for the device that the user wants to connect to.

Public Functions

constexpr DeviceConstraints() = default: Constructor with default values.

inline explicit constexpr DeviceConstraints(bool p_requiresRemoteBuffersSupport)

Constructor with specified values.

Parameters: p_requiresRemoteBuffersSupport – [in] If true, sets that the device has to support remote buffers. If false, sets that the device does not have to support remote buffers. Default: false.

inline constexpr operator bool() const

Check whether the selection of the machine has any constraints.

If true, the device has constraints; if false, it does not.

Public Members

bool requiresRemoteBuffersSupport = false

Set that the device has to support remote buffers.

If true, sets that the device has to support remote buffers. If false, sets that the device does not have to support remote buffers. Default: false.

class Device

Create a device.

This is a wrapper around a Poplar device.

Public Functions

Device(const Device&) = delete

Device &operator=(const Device &other) = delete

Device(Device&&) = default: Default move constructor.

Device &operator=(Device&&) = default: Default move assignment operator.

Device(poplar::Device device, int64_t ipu_version)

Constructor with specified values.

Parameters

device – [in] The Poplar device. This is a device that can execute code.
ipu_version – [in] The version number of the IPU.

poplar::Device &device(): Get the underlying Poplar device.

const poplar::Device &device() const: Get the underlying Poplar device.

int64_t ipuVersion() const

Get the version of the device.

Returns: The version number of the device or -1 if unknown.

Protected Functions

bool isActiveSession(const Session *session) const

Get whether a session is the active session for this device.

Returns: True if the session is the active session for this device, false otherwise.

void bindToSession(Session *session)

Bind the device to a session.

If the device is bound to a different session then unbind() is called on that session first.

Parameters: session – [in] The new session to bind the device to.

void unbindSession(): Unbind the device from the active session.

Friends

friend class Session

class DeviceManager

Select which device to run on.

Public Functions

DeviceManager &operator=(const DeviceManager &other) = delete

DeviceManager &operator=(DeviceManager&&) = delete

DeviceManager(): Constructor with default values.

DeviceManager(const DeviceManager&) = default: Default copy constructor.

DeviceManager(DeviceManager&&) = default: Default move constructor.

int64_t ipuHardwareVersion()

Returns

Either:

The version of the IPU on the physical system.
-1 if there is an IPU but the version is unknown.
0 if there is no IPU on the system.
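
The three documented return values can be told apart with a small helper. This is an illustrative sketch only: describeIpuHardwareVersion is a hypothetical function, not part of the library, and it takes the value that ipuHardwareVersion() would return as a plain integer. Values other than 0 and -1 are treated as version numbers.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: turns the documented return values of
// DeviceManager::ipuHardwareVersion() into a readable message.
std::string describeIpuHardwareVersion(std::int64_t version) {
  if (version == 0) return "no IPU on this system";
  if (version == -1) return "IPU present, version unknown";
  return "IPU hardware version " + std::to_string(version);
}
```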

std::shared_ptr<Device> getDevice(std::shared_ptr<popef::Model> model, const DeviceWaitConfig &wait_config = {})

Get a device matching the configuration needed by the given model.

Parameters

model – [in] A PopEF model. This method gets the number of IPUs and the option flags from the model.
wait_config – [in] The configuration of how to wait for the expected device.

Returns

The device if attachment is successful; otherwise an exception is thrown.

std::shared_ptr<Device> tryGetDevice(std::shared_ptr<popef::Model> model, const DeviceWaitConfig &wait_config = {})

Try to get a device matching the configuration needed by the given model.

Parameters

model – [in] A PopEF model. This method gets the number of IPUs and the option flags from the model.
wait_config – [in] The configuration of how to wait for the expected device.

Returns

The device if attachment is successful, otherwise a nullptr.

std::shared_ptr<Device> getDevice(int64_t num_ipus, const poplar::OptionFlags &device_options = {}, const DeviceWaitConfig &wait_config = {}, const DeviceConstraints &constrains = {})

Get a device matching the requested configuration.

Parameters

num_ipus – [in] The number of IPUs the device must contain.
device_options – [in] The device options.
wait_config – [in] The configuration of how to wait for the expected device.
constrains – [in] The set of constraints that the device must meet.

Returns

The device if attachment is successful; otherwise an exception is thrown.

std::shared_ptr<Device> tryGetDevice(int64_t num_ipus, const poplar::OptionFlags &device_options = {}, const DeviceWaitConfig &wait_config = {}, const DeviceConstraints &constrains = {})

Try to get a device matching the requested configuration.

Parameters

num_ipus – [in] The number of IPUs the device must contain.
device_options – [in] The device options.
wait_config – [in] The configuration of how to wait for the expected device.
constrains – [in] The set of constraints that the device must meet.

Returns

The device if attachment is successful, otherwise a nullptr.
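
The getDevice and tryGetDevice overloads differ only in how failure is reported: an exception in the first case, a nullptr in the second. The sketch below illustrates that relationship with stand-in types so that it compiles without the SDK. FakeDevice, FakeDeviceManager and acquireWithFallback are hypothetical, and std::runtime_error is an assumption, since this reference does not name the exception type.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <stdexcept>

// Stand-in for model_runtime::Device; it only records the IPU count.
struct FakeDevice {
  std::int64_t num_ipus;
};

// Stand-in device manager with a fixed pool of free IPUs.
class FakeDeviceManager {
public:
  explicit FakeDeviceManager(std::int64_t free_ipus) : free_ipus_(free_ipus) {}

  // Non-throwing variant: nullptr signals that no device could be attached.
  std::shared_ptr<FakeDevice> tryGetDevice(std::int64_t num_ipus) const {
    if (num_ipus < 1 || num_ipus > free_ipus_) return nullptr;
    return std::make_shared<FakeDevice>(FakeDevice{num_ipus});
  }

  // Throwing variant, expressed in terms of the non-throwing one.
  // The exception type is an assumption made for this sketch.
  std::shared_ptr<FakeDevice> getDevice(std::int64_t num_ipus) const {
    auto device = tryGetDevice(num_ipus);
    if (!device) throw std::runtime_error("no matching device available");
    return device;
  }

private:
  std::int64_t free_ipus_;
};

// Caller-side pattern: probe for the preferred size without exceptions,
// then insist on the fallback size and let a failure propagate.
std::shared_ptr<FakeDevice> acquireWithFallback(const FakeDeviceManager &manager,
                                                std::int64_t preferred_ipus,
                                                std::int64_t fallback_ipus) {
  if (auto device = manager.tryGetDevice(preferred_ipus)) return device;
  return manager.getDevice(fallback_ipus);
}
```

The try variants suit code that has a sensible alternative when no device is free; the throwing variants suit code that cannot continue without one.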

std::shared_ptr<Device> getSpecificDevice(int64_t device_id, std::shared_ptr<popef::Model> model, const DeviceWaitConfig &wait_config = {})

Get a specific device matching the requested configuration.

Parameters

device_id – [in] The ID of the device to acquire.
model – [in] A PopEF model. This method gets the option flags from the model.
wait_config – [in] The configuration of how to wait for the expected device.

Returns

The device if attachment is successful; otherwise an exception is thrown.

std::shared_ptr<Device> getSpecificDevice(int64_t device_id, const poplar::OptionFlags &device_options = {}, const DeviceWaitConfig &wait_config = {})

Get a specific device matching the requested configuration.

Parameters

device_id – [in] The ID of the device to acquire.
device_options – [in] The device options.
wait_config – [in] The configuration of how to wait for the expected device.

Returns

The device if attachment is successful; otherwise an exception is thrown.

std::shared_ptr<Device> tryGetSpecificDevice(int64_t device_id, std::shared_ptr<popef::Model> model, const DeviceWaitConfig &wait_config = {})

Try to get a specific device matching the requested configuration.

Parameters

device_id – [in] The ID of the device to acquire.
model – [in] A PopEF model. This method gets the option flags from the model.
wait_config – [in] The configuration of how to wait for the expected device.

Returns

The device if attachment is successful, otherwise a nullptr.

std::shared_ptr<Device> tryGetSpecificDevice(int64_t device_id, const poplar::OptionFlags &device_options = {}, const DeviceWaitConfig &wait_config = {})

Try to get a specific device matching the requested configuration.

Parameters

device_id – [in] The ID of the device to acquire.
device_options – [in] The device options.
wait_config – [in] The configuration of how to wait for the expected device.

Returns

The device if attachment is successful, otherwise a nullptr.

std::shared_ptr<Device> createIpuModelDevice(int64_t num_ipus, int64_t ipu_version = 2, int64_t tiles_per_ipu = 0)

Create a model of the device.

Parameters

num_ipus – [in] The number of IPUs the device must contain.
ipu_version – [in] The target architecture of the IPU.
tiles_per_ipu – [in] The number of tiles per IPU the model will have. If 0, defaults to the number of tiles for the chosen IPU version.

Returns

The model of the device.

std::shared_ptr<Device> createIpuModelDevice(std::shared_ptr<popef::Model> model, int64_t tiles_per_ipu = 0)

Create a model of the device.

Parameters

model – [in] A PopEF model. This method gets the number of IPUs and the IPU version from the model.
tiles_per_ipu – [in] The number of tiles per IPU the model will have. If 0, defaults to the number of tiles for the chosen IPU version.

Returns

The model of the device.

std::shared_ptr<Device> createSmallIpuModelDevice(int64_t num_ipus, int64_t ipu_version = 2)

Create a small IPU model of the device.

Small IPU models only have 4 tiles: they are much quicker to create and run than a full size model but can only run small programs.

Parameters

num_ipus – [in] The number of IPUs the device must contain.
ipu_version – [in] The target architecture of the IPU.

Returns

The model of the device.

std::shared_ptr<Device> createSmallIpuModelDevice(std::shared_ptr<popef::Model> model)

Create a small IPU model of the device.

Small IPU models only have 4 tiles: they are much quicker to create and run than a full size model but can only run small programs.

Parameters: model – [in] A PopEF model. This method gets the number of IPUs and the IPU version from the model.
Returns: The model of the device.

10.1.2. Tensor memory representation

struct TensorMemoryView

Mutable view to already allocated memory.

Public Functions

TensorMemoryView() = default: Default constructor.

TensorMemoryView(void *data, uint64_t data_size_bytes)

Mutable view to memory.

Parameters

data – [in] The pointer to the allocated memory.
data_size_bytes – [in] The size of the memory block, in bytes.

Public Members

void *data = nullptr: The pointer to the allocated memory.

uint64_t data_size_bytes = 0: The size of the memory block, in bytes.

struct ConstTensorMemoryView

Immutable view to already allocated memory.

Public Functions

ConstTensorMemoryView() = default: Default constructor.

ConstTensorMemoryView(const TensorMemoryView &other): Converting constructor that creates an immutable view from a mutable TensorMemoryView.

ConstTensorMemoryView(const void *data, uint64_t data_size_bytes)

Immutable view to const memory.

Parameters

data – [in] The pointer to the allocated memory.
data_size_bytes – [in] The size of the memory block, in bytes.

Public Members

const void *data = nullptr: Pointer to the allocated memory.

uint64_t data_size_bytes = 0: The size of the memory block, in bytes.

struct TensorMemory

Tensor memory manager responsible for allocating, storing, sharing and releasing tensor memory.

Public Functions

TensorMemory() = default: Default constructor.

TensorMemory(int64_t data_size_bytes)

Allocate user requested memory block and store it in a shared_ptr.

Parameters: data_size_bytes – [in] The size of the memory block, in bytes.

TensorMemoryView getView(): Get the mutable memory view.

ConstTensorMemoryView getConstView(): Get the immutable memory view.

ConstTensorMemoryView getView() const: Get the immutable memory view.

Public Members

uint64_t data_size_bytes = 0: The size of the memory block, in bytes.

std::shared_ptr<void> data = nullptr: Pointer to the allocated memory.
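
The views are non-owning: they carry a pointer and a byte count, and the memory itself belongs to someone else. As a minimal sketch of how a caller-owned buffer might be wrapped, the following uses stand-in structs that mirror the declarations above. mutableViewOf and constViewOf are hypothetical helpers, not library functions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-ins mirroring the documented view structs.
struct TensorMemoryView {
  TensorMemoryView() = default;
  TensorMemoryView(void *p_data, std::uint64_t p_size)
      : data(p_data), data_size_bytes(p_size) {}
  void *data = nullptr;
  std::uint64_t data_size_bytes = 0;
};

struct ConstTensorMemoryView {
  ConstTensorMemoryView() = default;
  ConstTensorMemoryView(const TensorMemoryView &other)
      : data(other.data), data_size_bytes(other.data_size_bytes) {}
  ConstTensorMemoryView(const void *p_data, std::uint64_t p_size)
      : data(p_data), data_size_bytes(p_size) {}
  const void *data = nullptr;
  std::uint64_t data_size_bytes = 0;
};

// Hypothetical helper: mutable, non-owning view over a caller-owned vector.
template <typename T>
TensorMemoryView mutableViewOf(std::vector<T> &buffer) {
  return TensorMemoryView(buffer.data(), buffer.size() * sizeof(T));
}

// Hypothetical helper: read-only, non-owning view over a caller-owned vector.
template <typename T>
ConstTensorMemoryView constViewOf(const std::vector<T> &buffer) {
  return ConstTensorMemoryView(buffer.data(), buffer.size() * sizeof(T));
}
```

Because a view does not own its memory, the vector must outlive the view and must not be resized while the view is in use.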

10.1.3. Model Runner

using model_runtime::InputMemoryView = std::unordered_map<std::string, ConstTensorMemoryView>

Mapping between a tensor name and an immutable memory view.

Used as input to ModelRunner::execute.
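
As an illustration of the shape of this map, the sketch below builds an InputMemoryView over caller-owned float buffers. The struct and alias are stand-ins mirroring the declarations in this reference; makeInputs and the tensor name used in any call to it are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in mirroring the documented immutable view.
struct ConstTensorMemoryView {
  ConstTensorMemoryView() = default;
  ConstTensorMemoryView(const void *p_data, std::uint64_t p_size)
      : data(p_data), data_size_bytes(p_size) {}
  const void *data = nullptr;
  std::uint64_t data_size_bytes = 0;
};

using InputMemoryView = std::unordered_map<std::string, ConstTensorMemoryView>;
using NamedBuffers = std::unordered_map<std::string, std::vector<float>>;

// Hypothetical helper: one read-only view per named buffer. The buffers
// must outlive the returned map because the views do not own any memory.
InputMemoryView makeInputs(const NamedBuffers &buffers) {
  InputMemoryView inputs;
  for (const auto &entry : buffers) {
    inputs.emplace(entry.first,
                   ConstTensorMemoryView(entry.second.data(),
                                         entry.second.size() * sizeof(float)));
  }
  return inputs;
}
```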

using model_runtime::OutputMemoryView = std::unordered_map<std::string, TensorMemoryView>

Mapping between a tensor name and a memory view.

Used as output from ModelRunner::execute, when the output memory is allocated and managed by the ModelRunner client.

using model_runtime::OutputMemory = std::unordered_map<std::string, TensorMemory>

Mapping between a tensor name and memory.

Used as output from ModelRunner::execute when memory is allocated during execution by the library.

using model_runtime::OutputFutureMemoryView = std::unordered_map<std::string, std::shared_future<TensorMemoryView>>

Mapping between a tensor name and a future memory view.

Used as output from the asynchronous ModelRunner::executeAsync when the output memory is allocated and managed by the ModelRunner client.

using model_runtime::OutputFutureMemory = std::unordered_map<std::string, std::shared_future<TensorMemory>>

Mapping between a tensor name and a future memory.

Used as output from the asynchronous ModelRunner::executeAsync when memory is allocated during execution by the library.

struct DataDesc

The description of data used by ModelRunner.

Public Functions

DataDesc(std::string name, int64_t size_in_bytes, std::vector<int64_t> shape, popef::DataType data_type, bool popef_contains_tensor_data = false)

Create description of input/output data.

Parameters

name – [in] The name of the input/output tensor.
size_in_bytes – [in] The size of the tensor measured in bytes.
shape – [in] A vector defining the shape of the tensor. The size of the vector is equal to the number of tensor dimensions. Each element of the vector indicates the size of a single dimension.
data_type – [in] The data type of a single tensor element.
popef_contains_tensor_data – [in] If true, the model has a tensor data blob associated with the tensor. If false, the model does not have a tensor data blob associated with the tensor. Default: false.

Public Members

std::string name: The name of the input/output tensor.

int64_t size_in_bytes: The size of the tensor measured in bytes.

std::vector<int64_t> shape

A vector defining the shape of the tensor.

The size of the vector is equal to the number of tensor dimensions. Each element of the vector indicates the size of a single dimension.

popef::DataType data_type: The data type of a single tensor element.

bool popef_contains_tensor_data

If true, the model has a tensor data blob associated with the tensor.

If false, the model does not have a tensor data blob associated with the tensor.
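
The shape and the element type together determine how many bytes a tensor occupies, which is what size_in_bytes reports. A minimal sketch of that arithmetic follows. expectedSizeInBytes is a hypothetical helper, element_size_bytes stands in for the size implied by popef::DataType, and treating an empty shape as a single element is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: bytes implied by a shape and an element size.
// The element count is the product of all dimensions; an empty shape is
// treated as a scalar with one element (an assumption of this sketch).
std::int64_t expectedSizeInBytes(const std::vector<std::int64_t> &shape,
                                 std::int64_t element_size_bytes) {
  std::int64_t elements = 1;
  for (std::int64_t dim : shape) {
    assert(dim >= 0);  // negative dimensions are not meaningful here
    elements *= dim;
  }
  return elements * element_size_bytes;
}
```

For example, a tensor of shape {2, 3, 4} with 4-byte elements occupies 2 × 3 × 4 × 4 = 96 bytes.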

using model_runtime::InputDesc = DataDesc : Description of input data required by ModelRunner.

using model_runtime::OutputDesc = DataDesc : Description of output data required by ModelRunner.

using model_runtime::ReplicaIdToOutputMemoryView = std::unordered_map<unsigned, OutputMemoryView>: Mapping replicas to the allocated memory defined for the output tensors required by ModelRunner.

using model_runtime::ReplicaIdToDevice = std::unordered_map<unsigned, std::shared_ptr<Device>>: Mapping replicas to physical entities that can execute the IPU programs.

struct ModelRunnerConfig

ModelRunner configuration options.

Public Members

unsigned replication_factor = 1: Number of replicas to be created.

bool run_save_programs = false

If true, save programs will be called during ModelRunner instance destruction.

If false, save programs will not be called. Default: false.

bool thread_safe = false

If true, the mutex will be locked on each execution call.

If false, the mutex will not be locked. By default the model runner is not thread-safe and each replica has an independent mutex. Default: false.

InputMemoryView frozen_inputs = {}

Map of the user-data required by the model during the loading onto hardware phase as well as any data that will be considered as constant during model execution.

It allows for overwriting of tensor data saved inside a PopEF file.

ReplicaIdToOutputMemoryView replica_to_save_programs_outputs = {}

Mapping between replica ID and OutputMemoryView.

Allows the user to get data returned by save programs. The PopEF data format allows for the creation of “save programs” that will be executed during ModelRunner instance destruction, if required.

ReplicaIdToDevice replica_to_device = {}

Mapping between replica ID and Device.

Allows the user to set a specific device for a given replica. By default, the model runner assigns devices automatically.

DeviceWaitConfig device_wait_config = {}

By default, the model runner throws an exception when it is not able to attach to any device required by the given model.

This behavior can be changed by setting a custom DeviceWaitConfig.

bool check_package_hash = true

If true, the Poplar hash will be checked before the executable is loaded onto the device.

If false, this check is not done. Default: true.

std::chrono::nanoseconds timeout_ns = std::chrono::seconds(5)

Duration in nanoseconds to wait before calling the timeout callback when the IPU is waiting for input data that is not yet available.

If 0, the timeout callback is never called; in other words, the wait for the data never times out.

bool validate_io_params = true

If true, the I/O parameters will be checked during execution of the ModelRunner execute functions.

If false, this check is not done. Default: true.
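
Two details of this configuration are easy to miss: timeout_ns is stored in nanoseconds even though its default is written in seconds, and replication_factor bounds the replica_id values accepted by the execute functions. The sketch below shows both with a stand-in struct that contains only a subset of the documented members. waitsForeverForInput and isValidReplicaId are hypothetical helpers.

```cpp
#include <cassert>
#include <chrono>

// Stand-in with a subset of the documented ModelRunnerConfig members.
struct ModelRunnerConfig {
  unsigned replication_factor = 1;
  bool thread_safe = false;
  // std::chrono converts seconds to nanoseconds implicitly (lossless).
  std::chrono::nanoseconds timeout_ns = std::chrono::seconds(5);
};

// Hypothetical helper: a zero timeout means the timeout callback is never
// called, so the wait for input data does not time out.
bool waitsForeverForInput(const ModelRunnerConfig &config) {
  return config.timeout_ns == std::chrono::nanoseconds::zero();
}

// Hypothetical helper: replica IDs must be less than replication_factor.
bool isValidReplicaId(const ModelRunnerConfig &config, unsigned replica_id) {
  return replica_id < config.replication_factor;
}
```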

class ModelRunner

Inference model abstraction.

The model runner creates a session, manages queues, runs Poplar executable programs, and allows inference models to be executed synchronously and asynchronously.

Public Functions

ModelRunner(const ModelRunner&) = delete

ModelRunner &operator=(const ModelRunner &other) = delete

ModelRunner(ModelRunner&&) = default: Default move constructor.

ModelRunner &operator=(ModelRunner&&) = default: Default move assignment operator.

explicit ModelRunner(const std::string &popef_path, const ModelRunnerConfig &config = ModelRunnerConfig{})

Create a new ModelRunner object.

Parameters

popef_path – The path to the PopEF file from which the model will be loaded.
config – The model runner configuration.

explicit ModelRunner(const std::vector<std::string> &popef_paths, const ModelRunnerConfig &config = ModelRunnerConfig{})

Create a new ModelRunner object.

Parameters

popef_paths – Paths to PopEF files from which the model will be loaded.
config – The model runner configuration.

explicit ModelRunner(std::shared_ptr<popef::Model> model, const ModelRunnerConfig &config = ModelRunnerConfig{})

Create a new ModelRunner object.

Parameters

model – The model which will be loaded and run.
config – The model runner configuration.

~ModelRunner(): Default destructor.

OutputMemory execute(const InputMemoryView &input_data, unsigned replica_id = 0)

Run a model synchronously.

The output memory is allocated internally.

Parameters

input_data – [in] The user-allocated tensor buffer for all executable input tensors.
replica_id – [in] The user-selected replica that will execute computations. Must be less than replication_factor provided in ModelRunnerConfig.

Returns

The output memory allocated by the library for all executable output tensors.

void execute(const InputMemoryView &input_data, const OutputMemoryView &output_data, unsigned replica_id = 0)

Run a model synchronously.

The user allocates and passes pointers to output memory.

Parameters

input_data – [in] The user-allocated tensor buffer for all executable input tensors.
output_data – [in] The user-allocated tensor buffer for all executable output tensors.
replica_id – [in] The user-selected replica that will execute computations. Must be less than replication_factor provided in ModelRunnerConfig.

OutputFutureMemory executeAsync(const InputMemoryView &input_data, unsigned replica_id = 0)

Run a model asynchronously.

The output memory is allocated internally.

Parameters

input_data – [in] The user-allocated tensor buffer for all executable input tensors.
replica_id – [in] The user-selected replica that will execute computations. Must be less than replication_factor provided in ModelRunnerConfig.

Returns

The future result of an asynchronous call for all executable output tensors.

OutputFutureMemoryView executeAsync(const InputMemoryView &input_data, const OutputMemoryView &output_data, unsigned replica_id = 0)

Run a model asynchronously.

The user allocates and passes pointers to output memory.

Parameters

input_data – [in] The user-allocated tensor buffer for all executable input tensors.
output_data – [in] The user-allocated tensor buffer for all executable output tensors.
replica_id – [in] The user-selected replica that will execute computations. Must be less than replication_factor provided in ModelRunnerConfig.

Returns

The future result of an asynchronous call for all executable output tensors.

std::vector<InputDesc> getExecuteInputs() const

Get a description of the input data required in the execute class methods.

Returns: A vector of DataDesc instances.

std::vector<OutputDesc> getExecuteOutputs() const

Get a description of the output data required in the execute class methods.

Returns: A vector of DataDesc instances.
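
The returned descriptions carry enough information to allocate buffers for the execute overloads that take user-allocated output memory. The following sketch shows one way this could be done, using stand-in types that mirror the declarations in this reference. allocateOutputs and OutputStorage are hypothetical, and the DataDesc stand-in keeps only the two members the sketch needs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in mirroring the documented mutable view.
struct TensorMemoryView {
  TensorMemoryView() = default;
  TensorMemoryView(void *p_data, std::uint64_t p_size)
      : data(p_data), data_size_bytes(p_size) {}
  void *data = nullptr;
  std::uint64_t data_size_bytes = 0;
};
using OutputMemoryView = std::unordered_map<std::string, TensorMemoryView>;

// Stand-in with the two DataDesc members this sketch uses.
struct DataDesc {
  std::string name;
  std::int64_t size_in_bytes;
};

// Hypothetical owner of the output bytes, keyed by tensor name.
using OutputStorage = std::unordered_map<std::string, std::vector<unsigned char>>;

// Hypothetical helper: allocates one zero-filled buffer per description in
// storage and returns views over them. storage owns the memory, so it
// must outlive the returned views.
OutputMemoryView allocateOutputs(const std::vector<DataDesc> &descs,
                                 OutputStorage &storage) {
  OutputMemoryView views;
  for (const DataDesc &desc : descs) {
    assert(desc.size_in_bytes >= 0);
    std::vector<unsigned char> &buffer = storage[desc.name];
    buffer.assign(static_cast<std::size_t>(desc.size_in_bytes), 0);
    views[desc.name] = TensorMemoryView(buffer.data(), buffer.size());
  }
  return views;
}
```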

std::vector<InputDesc> getModelInputs() const

Get a description of all the user-provided input data.

In addition to the data used by the execute calls, it will return a description of all tensors used by the model which must be provided during the phase of loading the model onto the device. The data required for the additional tensors may be included in PopEF files. In this case, the data for the additional tensors is loaded automatically by ModelRunner.

Returns: A vector of DataDesc instances.

std::vector<OutputDesc> getModelOutputs() const

Get a description of all the user-provided output data.

In addition to the data used by the execute calls, it will return a list of descriptions of all tensors used by the model that the loading phase requires (weights tensors as an example). The data for these additional tensors can be included in PopEF files that are loaded automatically by the ModelRunner.

Returns: A vector of DataDesc instances.

std::shared_ptr<popef::Model> model() const: The model associated with this model runner.

10.2. Low level API

10.2.1. Anchor callback management

struct CallbackInfo

Information passed to CallbackFactory.

Public Members

const popef::Anchor &anchor: Input/Output tensors of the model expected by the device.

using model_runtime::CallbackHandle = std::function<void(void*)>

The callback function that is called whenever the device reads from or writes to the stream.

The memory location will only be valid for reading or writing for the duration of the callback.

using model_runtime::CallbackFactory = std::function<poplar::StreamCallbackHandle(const CallbackInfo &info)>: Factory to create a callback for the given callback information.

enum class model_runtime::PopefDataUsagePolicy

The policy for using PopEF TensorData and TensorFeed when creating anchor callbacks.

Values:

enumerator USE_POPEF_DATA_IF_ANY = 0: Utilize the TensorData and the TensorFeed stored in the model’s PopEF to implicitly create callbacks for the Anchors, for which the data exists.

enumerator USE_USER_DATA

Don’t use the data stored in the PopEF file.

Let the user bind their own data source.

using model_runtime::PopefDataUsagePredicate = std::function<PopefDataUsagePolicy(const popef::Anchor&)>

PopefDataUsagePredicates are used to control the use of PopEF tensor or feed data when creating callbacks for Anchors.

For more information, see the description of anchors in the PopEF User Guide.

static const PopefDataUsagePredicate model_runtime::null_popef_data_usage_predicate = {}

enum class model_runtime::AnchorCallbackPolicy

Policy to handle anchor callback.

Values:

enumerator BIND_USER_CB = 0: Bind user callback to anchor.

enumerator BIND_EMPTY_CB: Bind empty (dummy) callback to anchor.

enumerator SKIP_CB: Skip binding a callback to the anchor.

using model_runtime::AnchorCallbackPredicate = std::function<AnchorCallbackPolicy(const popef::Anchor&)>

AnchorCallbackPredicates are used to control the callback creation policy for an individual Anchor.

For more information, see the description of anchors in the PopEF User Guide.

static const AnchorCallbackPredicate model_runtime::null_anchor_callback_predicate = {}
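
Since AnchorCallbackPredicate is a std::function, a custom predicate is any callable that maps an anchor to a policy. The sketch below illustrates the idea with stand-in types so that it compiles without PopEF. The Anchor struct with a single name field and the bindUserCallbackFor factory are hypothetical and do not reflect the real popef::Anchor interface.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>

// Stand-in for popef::Anchor; only a name is modelled in this sketch.
struct Anchor {
  std::string name;
};

// Stand-ins mirroring the documented enum and alias.
enum class AnchorCallbackPolicy { BIND_USER_CB = 0, BIND_EMPTY_CB, SKIP_CB };
using AnchorCallbackPredicate =
    std::function<AnchorCallbackPolicy(const Anchor &)>;

// Hypothetical factory: bind the user callback to anchors whose name is in
// names and an empty callback to every other anchor.
AnchorCallbackPredicate bindUserCallbackFor(std::set<std::string> names) {
  return [names](const Anchor &anchor) {
    return names.count(anchor.name) != 0
               ? AnchorCallbackPolicy::BIND_USER_CB
               : AnchorCallbackPolicy::BIND_EMPTY_CB;
  };
}
```

A default-constructed predicate, such as the null predicate declared above, holds no callable and converts to false in a boolean context.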

namespace predicate_factory

Predefined callback predicates.

Set of basic predicates to control handling of PopEF data usage or anchor callbacks.

template<typename Policy> class AnchorWithPolicy

The desired anchor with callback handling policy.

Template Parameters: Policy – The policy to be bound to an anchor.

Public Members

const popef::Anchor &anchor: Input/Output data to a named program.

Policy policy: The callback handling policy.

template<typename Policy> class ProgramsWithPolicy

Program indices with desired callback handling policy.

Template Parameters: Policy – The policy to be bound to the programsIndexes.

Public Members

const std::vector<popef::ProgramFlow::ProgramIndexType> &programsIndexes: A set of indices to named programs.

Policy policy: The callback handling policy.

namespace anchor_callbacks

Functions

AnchorCallbackPredicate predProgramFlowLoad(const popef::ProgramFlow &flow, AnchorCallbackPolicy accept_policy = AnchorCallbackPolicy::BIND_USER_CB, AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)

Callback predicate to filter all anchors owned by any load programs.

Parameters

flow – [in] The user model PopEF program flow (to read load program numbers from).
accept_policy – [in] The anchor acceptance policy.
reject_policy – [in] The anchor rejection policy.

Returns

The anchor callback predicate.

AnchorCallbackPredicate predProgramFlowMain(const popef::ProgramFlow &flow, AnchorCallbackPolicy accept_policy = AnchorCallbackPolicy::BIND_USER_CB, AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)

Callback predicate to filter all anchors owned by the main program.

Parameters

flow – [in] The user model PopEF program flow (to read main program number from).
accept_policy – [in] The anchor acceptance policy.
reject_policy – [in] The anchor rejection policy.

Returns

The anchor callback predicate.

AnchorCallbackPredicate predProgramFlowSave(const popef::ProgramFlow &flow, AnchorCallbackPolicy accept_policy = AnchorCallbackPolicy::BIND_USER_CB, AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)

Callback predicate to filter all anchors owned by any save programs.

Parameters

flow – [in] The user model PopEF program flow (to read save program numbers from).
accept_policy – [in] The anchor acceptance policy.
reject_policy – [in] The anchor rejection policy.

Returns

The anchor callback predicate.

AnchorCallbackPredicate predProgramNotAssigned(AnchorCallbackPolicy accept_policy = AnchorCallbackPolicy::BIND_USER_CB, AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)

Callback predicate to filter all anchors that are not assigned to any programs.

Parameters

accept_policy – [in] The anchor acceptance policy.
reject_policy – [in] The anchor rejection policy.

Returns

The anchor callback predicate.

AnchorCallbackPredicate predProgramIndexes(const std::vector<popef::ProgramFlow::ProgramIndexType> &program_indexes, AnchorCallbackPolicy accept_policy = AnchorCallbackPolicy::BIND_USER_CB, AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)

Callback predicate to filter all anchors owned by any of the programs passed in by the user.

Parameters

program_indexes – [in] The program indices to filter.
accept_policy – [in] The anchor acceptance policy.
reject_policy – [in] The anchor rejection policy.

Returns

The anchor callback predicate.

AnchorCallbackPredicate predProgramIndexes(const std::vector<ProgramsWithPolicy<AnchorCallbackPolicy>> &accepted_programs_policies, const AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)

Callback predicate to apply defined anchor callback handling policies to grouped program indices.

Parameters

accepted_programs_policies – The program indices with defined anchor callback handling policies.
reject_policy – The anchor rejection policy.

Returns

The anchor callback predicate.

AnchorCallbackPredicate predAnchorsPolicies(const std::vector<AnchorWithPolicy<AnchorCallbackPolicy>> &accepted_anchors_policies, const AnchorCallbackPolicy reject_policy = AnchorCallbackPolicy::BIND_EMPTY_CB)

Callback predicate to apply anchor handling policies.

Parameters

accepted_anchors_policies – [in] The anchors with handling policies.
reject_policy – [in] The anchor rejection policy.

Returns

The anchor callback predicate.

template<typename ...Args> AnchorCallbackPredicate orBind(AnchorCallbackPolicy accept_policy, AnchorCallbackPolicy reject_policy, Args&&... pred)

Disjunction operator.

Combines multiple predicates into one Predicate.

Parameters

accept_policy – [in] The anchor acceptance policy.
reject_policy – [in] The anchor rejection policy.
pred – [in] The predicates that will be combined by the operator.

Returns

The anchor callback predicate, which returns accept_policy when at least one of the passed predicates returns accept_policy, and reject_policy otherwise.
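
The disjunction described above can be sketched as follows. This is one possible reading of the documented behaviour, written against stand-in types, and not the library's implementation. orBindSketch, ownedByProgram and the Anchor struct with a single program field are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Stand-in for popef::Anchor; only an owning program index is modelled.
struct Anchor {
  int program;
};

// Stand-ins mirroring the documented enum and alias.
enum class AnchorCallbackPolicy { BIND_USER_CB = 0, BIND_EMPTY_CB, SKIP_CB };
using AnchorCallbackPredicate =
    std::function<AnchorCallbackPolicy(const Anchor &)>;

// Hypothetical basic predicate: accepts anchors owned by one program.
AnchorCallbackPredicate ownedByProgram(int wanted) {
  return [wanted](const Anchor &anchor) {
    return anchor.program == wanted ? AnchorCallbackPolicy::BIND_USER_CB
                                    : AnchorCallbackPolicy::BIND_EMPTY_CB;
  };
}

// Sketch of the disjunction: the result is accept_policy if at least one
// predicate returns accept_policy, and reject_policy otherwise.
template <typename... Args>
AnchorCallbackPredicate orBindSketch(AnchorCallbackPolicy accept_policy,
                                     AnchorCallbackPolicy reject_policy,
                                     Args &&...pred) {
  std::vector<AnchorCallbackPredicate> preds{
      AnchorCallbackPredicate(std::forward<Args>(pred))...};
  return [accept_policy, reject_policy,
          preds](const Anchor &anchor) -> AnchorCallbackPolicy {
    for (const AnchorCallbackPredicate &p : preds) {
      if (p(anchor) == accept_policy) return accept_policy;
    }
    return reject_policy;
  };
}
```

Note that the combined predicate compares each result against accept_policy, so the individual predicates need to use the same acceptance policy as the one passed to the combinator.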

namespace popef_data_usage

Functions

PopefDataUsagePredicate predProgramFlowLoad(const popef::ProgramFlow &flow, PopefDataUsagePolicy accept_policy = PopefDataUsagePolicy::USE_USER_DATA, PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)

Callback predicate to filter all anchors owned by any load programs.

Parameters

flow – [in] The user model PopEF program flow (to read load program numbers from).
accept_policy – [in] The anchor acceptance policy.
reject_policy – [in] The anchor rejection policy.

Returns

The PopEF data usage predicate.

PopefDataUsagePredicate predProgramFlowMain(const popef::ProgramFlow &flow, PopefDataUsagePolicy accept_policy = PopefDataUsagePolicy::USE_USER_DATA, PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)

Callback predicate to filter all anchors owned by the main program.

Parameters

flow – [in] The user model PopEF program flow (to read main program number from).
accept_policy – [in] The anchor acceptance policy.
reject_policy – [in] The anchor rejection policy.

Returns

The PopEF data usage predicate.

PopefDataUsagePredicate predProgramFlowSave(const popef::ProgramFlow &flow, PopefDataUsagePolicy accept_policy = PopefDataUsagePolicy::USE_USER_DATA, PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)

Callback predicate to filter all anchors owned by any save programs.

Parameters

flow – [in] The user model PopEF program flow (to read save program numbers from).
accept_policy – [in] The anchor acceptance policy.
reject_policy – [in] The anchor rejection policy.

Returns

The PopEF data usage predicate.

PopefDataUsagePredicate predProgramNotAssigned(PopefDataUsagePolicy accept_policy = PopefDataUsagePolicy::USE_USER_DATA, PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)

Callback predicate to filter all anchors that are not assigned to any programs.

Parameters

accept_policy – [in] The anchor acceptance policy.
reject_policy – [in] The anchor rejection policy.

Returns

The anchor callback predicate.

PopefDataUsagePredicate predProgramIndexes(const std::vector<popef::ProgramFlow::ProgramIndexType> &program_indexes, PopefDataUsagePolicy accept_policy = PopefDataUsagePolicy::USE_USER_DATA, PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)

Callback predicate to filter all anchors owned by any of the programs passed in by the user.

Parameters

program_indexes – [in] The program indices to filter.
accept_policy – [in] The anchor acceptance policy.
reject_policy – [in] The anchor rejection policy.

Returns

The anchor callback predicate.

PopefDataUsagePredicate predProgramIndexes(const std::vector<ProgramsWithPolicy<PopefDataUsagePolicy>> &accepted_programs_policies, const PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)

Callback predicate to apply defined anchor callback handling policies to grouped program indices.

Parameters

accepted_programs_policies – [in] The program indices with defined anchor callback handling policies.
reject_policy – [in] The anchor rejection policy.

Returns

The anchor callback predicate.

PopefDataUsagePredicate predAnchorsPolicies(const std::vector<AnchorWithPolicy<PopefDataUsagePolicy>> &accepted_anchors_policies, const PopefDataUsagePolicy reject_policy = PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY)

Callback predicate to apply anchor handling policies.

Parameters

accepted_anchors_policies – [in] The anchor indices with handling policies.
reject_policy – [in] The anchor rejection policy.

Returns

The anchor callback predicate.

template<typename ...Args> PopefDataUsagePredicate orBind(PopefDataUsagePolicy accept_policy, PopefDataUsagePolicy reject_policy, Args&&... pred)

Disjunction operator.

Combines multiple predicates into one Predicate.

Parameters

accept_policy – [in] The anchor acceptance policy.
reject_policy – [in] The anchor rejection policy.
pred – [in] The predicates to be combined by the operator.

Returns

The anchor callback predicate, which returns accept_policy if any of the passed predicates returns accept_policy, and reject_policy otherwise.
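The disjunction described above can be sketched with stand-in types. The following is a minimal illustration, not the model_runtime implementation: Policy, Predicate, the integer anchor identifier and orBindSketch are all hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; not the model_runtime types.
enum class Policy { UseUserData, UsePopefDataIfAny };
using Predicate = std::function<Policy(int /*anchor id*/)>;

// Combine predicates: the result yields `accept` if any of `preds` yields
// `accept` for the given anchor, and `reject` otherwise.
template <typename... Preds>
Predicate orBindSketch(Policy accept, Policy reject, Preds&&... preds) {
  std::vector<Predicate> all{Predicate(std::forward<Preds>(preds))...};
  return [accept, reject, all](int anchor) -> Policy {
    for (const auto& pred : all) {
      if (pred(anchor) == accept) {
        return accept;
      }
    }
    return reject;
  };
}
```

In this sketch the combined predicate stops at the first predicate that accepts, so the order of the arguments affects only how much work is done, not the result.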

10.2.2. Queue memory management

class IMemoryPool

A common interface for all memory allocators.

Subclassed by model_runtime::RingMemoryPool

Public Functions

virtual void *getMemoryBlob() = 0: Get a writable memory blob.

Note

The memory pool retains ownership of the blob’s memory and is responsible for freeing it.

virtual int64_t blobSize() const = 0: Get the size of the memory blob.

class RingMemoryPool : public model_runtime::IMemoryPool 

Memory pool of fixed size blobs.

Allocate the requested number of blobs at construction time and loop over the blobs every time getMemoryBlob() is called.

Public Functions

RingMemoryPool(int64_t blob_size, int64_t num_blobs)

Create a ring memory pool.

It allocates a single block of memory of size num_blobs * blob_size under the hood.

Parameters

blob_size – [in] The size of a single memory blob.
num_blobs – [in] The number of memory blobs.

int64_t numBlobs() const: Get the number of memory blobs.

virtual int64_t blobSize() const override: Get the size of a memory blob.

virtual void *getMemoryBlob() override: Get a pointer to the next blob.

Note

When the end of the memory pool is reached, the iteration starts from the beginning again.
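The wrap-around behaviour can be illustrated with a small stand-in class. This is a sketch under stated assumptions, not the RingMemoryPool implementation: RingPoolSketch is a hypothetical name, and the sketch assumes that blob_size is expressed in bytes and that the blobs are laid out contiguously.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative stand-in only; not the model_runtime::RingMemoryPool code.
class RingPoolSketch {
public:
  // Assumption: blob_size is in bytes and blobs are stored back to back.
  RingPoolSketch(int64_t blob_size, int64_t num_blobs)
      : blob_size_(blob_size),
        num_blobs_(num_blobs),
        storage_(static_cast<std::size_t>(blob_size * num_blobs)) {
    assert(blob_size > 0 && num_blobs > 0);
  }

  // Return the next blob, wrapping back to the first one after the last.
  void* getMemoryBlob() {
    uint8_t* blob = storage_.data() + next_ * blob_size_;
    next_ = (next_ + 1) % num_blobs_;
    return blob;
  }

  int64_t blobSize() const { return blob_size_; }
  int64_t numBlobs() const { return num_blobs_; }

private:
  int64_t blob_size_;
  int64_t num_blobs_;
  std::vector<uint8_t> storage_;  // The pool owns all blob memory.
  int64_t next_ = 0;              // Index of the blob handed out next.
};
```

Because the blobs are reused in a cycle, a caller of such a pool has to be finished with a blob before the pool has handed out num_blobs further blobs.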

10.2.3. Queue management

template<typename BufferType> class SpscRingBuffer

A lock-free, fixed-size, single-producer, single-consumer ring buffer implementation.

auto dst = rb.writeLock();
if (!rb.isValid()) {
 // dst is not valid: don't dereference it.
 return;
}
*dst = obj;
rb.writeComplete();

Note

writeLock() and readLock() are blocking calls that might return early if the ring buffer is invalidated. Always call isValid() after locking a buffer to check whether the returned buffer is safe to use.

Public Types

using ReadTimeoutCallback = std::function<void(SpscRingBuffer*)>: Signature of the function to call when readLock() times out.

Public Functions

SpscRingBuffer(const SpscRingBuffer &other) = delete

SpscRingBuffer(const SpscRingBuffer &&other) noexcept = delete

SpscRingBuffer &operator=(const SpscRingBuffer &other) = delete

SpscRingBuffer &operator=(SpscRingBuffer &&other) noexcept = delete

SpscRingBuffer(std::size_t num_buffers, const std::string &label, ReadTimeoutCallback timeout_cb = nullptr, std::chrono::nanoseconds timeout_ns = std::chrono::nanoseconds::zero())

Create a single-producer, single-consumer ring buffer.

Parameters

num_buffers – [in] The number of buffers to use in the ring buffer.
label – [in] The debug string to use in getState().
timeout_cb – [in] The function to call when a read times out.
timeout_ns – [in] The duration in nanoseconds to wait before calling the timeout callback when no read input is available. If 0, never call the callback.

~SpscRingBuffer(): Default destructor.

void write(const BufferType &obj)

Lock the ring buffer, write to it and unlock it.

Parameters: obj – [in] The object to write to the ring buffer.

BufferType *writeLock()

Lock and return a buffer for writing.

Only one buffer can be locked for writing at any time. Calling writeLock() when a buffer is already locked will return the same buffer.

If no buffer is available, then the function will block until either:

An existing buffer becomes available.
The ring buffer is invalidated.

Note

isValid() must be used to determine whether the returned buffer is valid or not.

Returns: A buffer to write to.

void writeComplete()

Unlock the currently write-locked buffer.

Pre: A buffer must have been locked for writing using writeLock().
Post: The next time writeLock() is called, a different buffer will be returned.

const BufferType &readLock()

Lock a buffer for read access.

If no buffer is available the function will block until either:

A buffer becomes available (writeComplete() is called from another thread)
The ring buffer is invalidated.
Some buffers are read-locked and readReset() is called.

Several buffers can be locked in reading mode and each call to readLock() will return a new buffer.

If timeout_ns is greater than zero and a timeout callback was provided, then once readLock() has been waiting for a buffer for longer than timeout_ns, the callback function is called until a new read buffer becomes available.

Note

isValid() must be used to determine whether the returned buffer is valid or not.

void readComplete()

Unlock the oldest read-locked buffer.

Pre: A buffer must have been locked for reading using readLock().

void readReset(): All the buffers currently locked for reading are unlocked and placed back at the front of the reading queue.

bool readAvailable() const

Check whether any buffer is available to be read-locked.

Returns: True if at least one buffer is available to be read-locked, false otherwise.

void invalidate()

Invalidate the ring buffer.

All subsequent calls become non-blocking.

All the objects returned by calls on an invalidated ring buffer are invalid and should be discarded / ignored.

void reset(): Reset all ring buffer values to initial.

bool isValid() const

Check the state of the ring buffer.

Returns: True if the ring buffer is in a valid state, or false if it was invalidated.

std::string getState(const std::string &prefix) const

Debug function to get the current state of the ring buffer as a string.

Parameters: prefix – [in] The string to prefix the state with.
Returns: The current state of the ring buffer.

std::size_t numBuffers() const: Return the maximum number of elements the ring buffer can store.

const std::string &label() const: Label associated with this ring buffer.

class IQueue

Common interface implemented by various queues.

The interface describes the memory requirements of the queue and provides an interface to disconnect the queue from its data source.

Subclassed by model_runtime::InputQueue, model_runtime::OutputQueue

Public Functions

virtual ~IQueue() = default: Default destructor.

virtual const popef::TensorInfo &tensorInfo() const = 0

Get the shape and data type of a tensor.

Each buffer in the queue has the same type and shape.

Returns: The structure encapsulating the shape and data type of a tensor.

virtual int64_t numBuffers() const = 0: Get the number of buffers this queue can store.

virtual void disconnect() = 0

Disconnect the queue ring buffers: no longer wait for data.

All queue ring buffers are invalidated and immediately return from any blocking calls.

Disconnected queues can no longer be used to feed real data.

Typically disconnect() is used at shutdown to feed dummy data to the executable until it returns from its run() method and can be safely destroyed.

RingMemoryPool model_runtime::allocateQueueStorage(const IQueue &queue, int64_t extra_buffers = 0)

Allocate a memory pool large enough to back the given queue.

Parameters

queue – [in] The queue the memory pool will be used to feed.
extra_buffers – [in] The number of extra buffers to allocate in addition to the queue’s requirements.

Returns

A memory pool.
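As a rough illustration of how such a pool could be sized, the sketch below derives the pool dimensions from a tensor description and a buffer count. The sizing rule (one blob per queue buffer plus the extra buffers, each blob the size of one tensor) is an assumption made for this example and is not confirmed by this reference; TensorInfoSketch, PoolDims, tensorBytes and poolDimsFor are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the shape and element size of a tensor.
struct TensorInfoSketch {
  std::vector<int64_t> shape;
  int64_t element_bytes;
};

// Size in bytes of one tensor: product of the dimensions times element size.
int64_t tensorBytes(const TensorInfoSketch& info) {
  int64_t bytes = info.element_bytes;
  for (int64_t dim : info.shape) {
    bytes *= dim;
  }
  return bytes;
}

struct PoolDims {
  int64_t blob_size;
  int64_t num_blobs;
};

// Assumed rule: one blob per queue buffer, plus the requested extra buffers.
PoolDims poolDimsFor(const TensorInfoSketch& info, int64_t num_buffers,
                     int64_t extra_buffers = 0) {
  assert(num_buffers > 0 && extra_buffers >= 0);
  return PoolDims{tensorBytes(info), num_buffers + extra_buffers};
}
```

Under this assumption, extra buffers give the application room to prepare data while every buffer of the queue is still in use.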

using model_runtime::ReadStartCallback = std::function<void(void)>: Signature of the function called just before the first chunk of data is about to be transferred to Poplar to build a complete input for the executable.

using model_runtime::ReadCompleteCallback = ReadStartCallback : Signature of the function called when the data for a complete model input is about to be consumed by the executable.

using model_runtime::WriteCompleteCallback = std::function<void(void)>: Signature of the function called after the data has been written.

struct InputData

Structure representing queue input data.

Public Members

const uint8_t *data = {nullptr}: Pointer to the buffer containing the data to read.

int64_t data_size = {0}: Size in bytes of the data.

ReadStartCallback readStartCallback = {nullptr}: Optional function to call just before the first chunk of data is about to be fetched to finally build a complete model input.

Note

The callback might be called more than once if the data is prefetched, then discarded and fetched again.

ReadCompleteCallback readCompleteCallback = {nullptr}: Optional function to call when the data for a complete model input is about to be consumed by the executable.

Note

The callback might be called more than once if the data is prefetched, then discarded and fetched again.

struct OutputData

Structure representing queue output data.

Public Members

uint8_t *data = {nullptr}: Where to write the output.

int64_t data_size = {0}: Amount of data to write in bytes.

WriteCompleteCallback writeCompleteCallback = {nullptr}: Optional function to call after the data has been written.

Note

The callback might be called more than once if the data is prefetched, then discarded and fetched again.

using model_runtime::InputRingBuffer = model_runtime::SpscRingBuffer<InputData>: Fixed size, single producer, single consumer ring buffer for input data.

using model_runtime::OutputRingBuffer = model_runtime::SpscRingBuffer<OutputData>: Fixed size, single producer, single consumer ring buffer for output data.

class InputQueue : public model_runtime::IQueue 

Pack or split the data passed by the user to match the amount of data expected by the executable.

For example if the model was compiled to process an input tensor of size [256, 48, 48], which means samples of 48x48 across a batch size of 256, then the application can enqueue 48x48 tensors of any batch size and this queue will take care of either regrouping the inputs into a single run or splitting them across several runs.

Note

It is the user’s responsibility to ensure the data size enqueued is a multiple of a single sample size.

Note

InputQueue cannot be instantiated directly; it is created by QueueManager.
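The regrouping and splitting arithmetic for the [256, 48, 48] example can be sketched as follows. This models only the sample counting, not the InputQueue implementation: PackState and enqueueSketch are hypothetical names, and the 4-byte element size is an assumption, since the reference does not state the data type.

```cpp
#include <cassert>
#include <cstdint>

// Bytes in one 48x48 sample, assuming 4-byte elements (an assumption).
constexpr int64_t kSampleBytes = 48 * 48 * 4;
// Number of samples the model consumes per run in the example above.
constexpr int64_t kBatch = 256;

struct PackState {
  int64_t pending_samples = 0;  // Samples waiting for a run to fill up.
  int64_t complete_runs = 0;    // Full model inputs assembled so far.
};

// Account for one enqueue of `data_size` bytes.
void enqueueSketch(PackState& state, int64_t data_size) {
  // The enqueued size must be a whole number of samples.
  assert(data_size % kSampleBytes == 0);
  state.pending_samples += data_size / kSampleBytes;
  state.complete_runs += state.pending_samples / kBatch;
  state.pending_samples %= kBatch;
}
```

For example, enqueueing 100 samples and then 200 samples yields one complete run with 44 samples left pending, and a single enqueue of 600 further samples is split across runs, leaving 132 samples pending.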

Public Functions

InputQueue(std::shared_ptr<InputRingBuffer> buffer, const popef::TensorInfo &info)

Create an input queue.

Parameters

buffer – [in] The target ring buffer connected to a Poplar callback.
info – [in] The buffer description.

InputData *enqueueLock()

Lock a buffer for writing.

This is a blocking call and only one buffer can be locked for writing at any time. Calling enqueueLock() when a buffer is already locked will return the same buffer.

Returns: A pointer to an InputData object to fill.

void enqueueComplete()

Unlock the current write-locked buffer.

Pre: A buffer must have been locked for writing using enqueueLock().
Post: The next time enqueueLock() is called, a different buffer will be returned.

void enqueue(const void *input, int64_t data_size, ReadStartCallback read_start_callback = nullptr, ReadCompleteCallback read_complete_callback = nullptr)

Convenience method to lock a buffer for writing, fill InputData and unlock the buffer.

Note

The callbacks might be called more than once if the data is prefetched, then discarded and fetched again.

Parameters

input – [in] The address of the buffer containing the data. The data is not copied, so the pointer must remain valid until the data has been used by the ring buffer consumer.
data_size – [in] The size in bytes to use from the input. Must be a multiple of the size of a single sample.
read_start_callback – [in] An optional callback to call when the data starts being read.
read_complete_callback – [in] An optional callback to call when the data read is complete.

void flush(): Flush any partial input still being created or enqueue a dummy batch.

void reset(): Reset underlying ring buffer.

virtual void disconnect() override: Parent interface method.

virtual const popef::TensorInfo &tensorInfo() const override: Parent interface method.

virtual int64_t numBuffers() const override: Parent interface method.

class OutputQueue : public model_runtime::IQueue 

Pack or split the data returned by the executable to the application’s batches.

10.2.4. Runtime management

enum class model_runtime::LaunchPolicy

Session creation policy.

Values:

enumerator Immediate: Acquire device, load executable during Session construction.

enumerator Deferred: Acquire device outside Session object, load executable inside bindToDevice() method.

struct SessionConfig

Session configuration.

Public Members

LaunchPolicy policy = LaunchPolicy::Deferred 

Session creation policy that is associated with acquiring a device.

Default: LaunchPolicy::Deferred.

PopefDataUsagePredicate pred_tensor_data = null_popef_data_usage_predicate 

Predicate for anchor callback.

This controls user callback handling for the anchor. It is not used by default.

bool check_package_hash = true

If true, the Poplar hash will be checked before the executable is loaded onto the device.

If false, this check is not done. Default: true.

DeviceWaitConfig wait_config = {}

By default, Session throws an exception when it cannot attach to a device required by the given model.

This behavior can be changed by setting a custom device wait config.

class Session

Link a model to a device.

Note

If two or more sessions share a device, runLoadPrograms() and runSavePrograms() will implicitly be called when the device is bound to or unbound from this session.

Public Types

using ProgIdxType = popef::ProgramFlow::ProgramIndexType : The index for a runnable program available in the executable.

using ProgramsAndAnchorsMap = std::map<ProgIdxType, std::vector<const popef::Anchor*>>: Mapping between program index and a vector of popef::Anchor objects appearing in that program that are available in the executable.

Public Functions

Session(const Session&) = delete

Session &operator=(const Session &other) = delete

Session(Session&&) = default: Default move constructor.

Session &operator=(Session&&) = default: Default move assignment operator.

explicit Session(const std::vector<std::string> &popef_paths, const SessionConfig &config = {})

Create a new Session object.

Parameters

popef_paths – The paths to PopEF files from which the model will be loaded.
config – The session configuration.

explicit Session(std::shared_ptr<popef::Model> model, const SessionConfig &config = {})

Create a Session object.

Parameters

model – The model which will be loaded and executed on the IPU.
config – The session configuration.

~Session(): Default destructor.

void bindToDevice(std::shared_ptr<Device> device)

Bind the session to a device and load the executable onto it.

If the session is already bound to a device, this method first unbinds the current device before binding to the new device.

Parameters: device – [in] The wrapper around a Poplar device.

void runLoadPrograms()

Run the programs to copy the data to the device.

Note

This method is implicitly called before the first call to runMainPrograms() after the device was bound to this session.

Pre: The session must be bound to a device.

void runMainPrograms()

Run the main programs.

Note

If the device was last used by a different session, this method first unbinds the device from that session, then binds it to this session and calls runLoadPrograms() before actually running the main programs.

Pre: The session must be bound to a device.

void runSavePrograms()

Run the programs to copy the data back to the host.

Note

This method is implicitly called when the device bound to this session gets unbound.

Pre: The session must be bound to a device.

void runPrograms(const std::vector<ProgIdxType> &progs)

Run your own set of programs.

Each program will run once.

Note

This function is for advanced users who understand what each program does during execution and what its result is. Programs are run in sequence, in the order in which they appear in the vector, so the order of the indices matters. If you run main programs without first running the load programs, you might observe incorrect results. The same applies if you run save programs before the load and main programs.

Parameters: progs – [in] The set of program indices which you would like to run. Indices need to be present in the loaded popef::Model.
Pre: The session must be bound to a device.
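The ordering requirement can be illustrated by building a program list in load, main, save order. This is a sketch with stand-in types only: FlowSketch and its members are hypothetical and merely mimic a program flow that groups program indices by phase, and int stands in for the program index type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a program flow: program indices for each phase.
struct FlowSketch {
  std::vector<int> load_programs;
  std::vector<int> main_programs;
  std::vector<int> save_programs;
};

// Build a run order in which load programs precede main programs, and main
// programs precede save programs.
std::vector<int> fullRunOrder(const FlowSketch& flow) {
  const std::size_t total = flow.load_programs.size() +
                            flow.main_programs.size() +
                            flow.save_programs.size();
  std::vector<int> order;
  order.reserve(total);
  order.insert(order.end(), flow.load_programs.begin(), flow.load_programs.end());
  order.insert(order.end(), flow.main_programs.begin(), flow.main_programs.end());
  order.insert(order.end(), flow.save_programs.begin(), flow.save_programs.end());
  assert(order.size() == total);
  return order;
}
```

A vector built this way could then be passed as the progs argument, provided that every index exists in the loaded model.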

std::shared_ptr<popef::Model> model() const: Model associated with this session.

void unloadFromDevice()

Unload the session from the device it is currently bound to.

Pre: The session must be bound to a device.

void setCallbackForAnchor(const std::string &anchor_handle, CallbackHandle callback)

Set the callback (data source/destination buffer and way of managing it) for popef::Anchor (input/output tensor).

Parameters

anchor_handle – [in] The anchor handle to which the callback will be assigned. Each popef::Anchor has a unique handle.
callback – [in] The callback to be called whenever the stream is to be read or was written to by the device. This depends on whether the callback is assigned to an input tensor or an output tensor.

Pre

The session must be bound to a device.

void setUserOutputHandler(CallbackFactory factory, const AnchorCallbackPredicate &anchor_callback_predicate = null_anchor_callback_predicate, bool skip_connected = false)

Set up handlers for output tensors.

If the factory returns nullptr for a tensor then the existing callback remains in place.

If the factory returns a callback for a tensor which already had a callback associated with it then the existing callback is discarded and the new one is used instead.

Parameters

factory – [in] The factory that will be called once per output tensor.
anchor_callback_predicate – [in] The functor controlling user callback binding.
skip_connected – [in] If true, call the factory only for the streams which are not connected. If false, call the factory for all the streams that are required.

Pre

The session must be bound to a device.

void setUserInputHandler(CallbackFactory factory, const AnchorCallbackPredicate &anchor_callback_predicate = null_anchor_callback_predicate, bool skip_connected = false)

Set up handlers for input tensors.

If the factory returns nullptr for a tensor then the existing callback remains in place.

If the factory returns a callback for a tensor which already had a callback associated with it then the existing callback is discarded and the new one is used instead.

Parameters

factory – [in] The factory that will be called once per input tensor.
anchor_callback_predicate – [in] The functor controlling user callback binding.
skip_connected – [in] If true, call the factory only for the streams which are not connected. If false, call the factory for all the streams that are required.

Pre

The session must be bound to a device.

ProgramsAndAnchorsMap anchorsNotConnectedToCallbacks(const std::vector<ProgIdxType> &progs)

Returns anchors that are not connected to callbacks for particular programs.

Parameters: progs – [in] The list of program indices.
Returns: A map where the key is the program index and the value is the vector of anchors that have no linked callbacks for that program.

void errorIfAnchorsAreNotConnectedToCallbacks(const std::vector<ProgIdxType> &progs)

Check if all programs have connected all required anchors to the callbacks.

If there are any programs that are not connected to callbacks, then this method throws an error that lists these programs in the error message.

Parameters: progs – [in] List of program indices.

void stop()

Stop the working session.

Send the stop signal to the executable. Disconnect queues if QueueManager is bound. The device will be left in an undefined state and no more programs can be run until reload() is called.

void reload(): Load executable again on the bound device.

template<typename ...T> inline QueueManager *createQueueManager(T&&... args)

Create a QueueManager.

The Session takes full ownership of the created QueueManager object, whose lifetime is tied to the lifetime of the Session.

Parameters must be passed in the same order as in the QueueManager constructors. See the model_runtime::QueueManager class.

Returns: A non-owning pointer to the created QueueManager.

std::vector<const popef::Anchor*> getInputAnchors() const

Returns all inputs.

This includes inputs that need a user-defined callback or inputs that already have a callback defined based on data from the popef::Model.

Returns: A vector of pointers to popef::Anchor objects.

std::vector<const popef::Anchor*> getUserInputAnchors() const

Returns the user inputs, that is, the inputs that need a user-defined callback.

Returns: A vector of pointers to popef::Anchor objects.

std::vector<const popef::Anchor*> getUserOutputAnchors() const

Returns the user outputs, that is, the outputs that need a user-defined callback.

Returns: A vector of pointers to popef::Anchor objects.