7. C++ API

7.1. PopRT Compiler

class Compiler

Public Static Functions

static void compileAndExport(const std::string &model, const std::vector<std::string> &outputs, std::ostream &out, const CompilerOptions &options = CompilerOptions())

Compile model and export PopEF model to stream.

Parameters

model – [in] An ONNX model protobuf, or the name of a file containing an ONNX model protobuf.
outputs – [in] The names of output tensors.
out – [out] The stream that the compiled PopEF model will be written to.
options – [in] The user configuration options for the Compiler class. Default: CompilerOptions().

static void compileAndExport(const std::string &model, const std::vector<std::string> &outputs, const std::string &fileName, const CompilerOptions &options = CompilerOptions())

Compile model and export PopEF model to file.

Parameters

model – [in] An ONNX model protobuf, or the name of a file containing an ONNX model protobuf.
outputs – [in] The names of the output tensors.
fileName – [out] The filename that the compiled PopEF model will be written to.
options – [in] The user configuration options for the Compiler class. Default: CompilerOptions().

static std::shared_ptr<Executable> compile(const std::string &model, const std::vector<std::string> &outputs, const CompilerOptions &options = CompilerOptions())

Compile and return an executable object.

Parameters

model – [in] An ONNX model protobuf, or the name of a file containing an ONNX model protobuf.
outputs – [in] The names of the output tensors.
options – [in] The user configuration options for the Compiler class. Default: CompilerOptions().

static std::string compileAndGetSummaryReport(const std::string &model, const std::vector<std::string> &outputs, const CompilerOptions &options, bool resetProfile = true)

Compile the model and return a summary report.

Parameters

model – [in] An ONNX model protobuf, or the name of a file containing an ONNX model protobuf.
outputs – [in] The names of the output tensors.
options – [in] The user configuration options for the Compiler class.
resetProfile – [in] If true, resets the execution profile. Default: true.

Returns

A string containing the report.

struct CompilerOptions

Public Members

int64_t numIpus = 1: The number of IPUs to select.

std::string ipuVersion = ""

IPU version.

Auto detect if empty.

int64_t batchesPerStep = 1: The number of batches to run on the chip before returning.

std::string partialsType = "half"

The partials type to set globally for MatMuls and convolutions.

Valid values are "float" and "half".

float availableMemoryProportion = 0.6

The value to set globally for the available memory proportion for MatMuls and convolutions.

Valid values are between 0 and 1 (inclusive). Default: 0.6.
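
The documented value ranges for partialsType and availableMemoryProportion can be checked before compiling. The following is a minimal sketch, not part of PopRT: OptionsSketch is a local stand-in that copies only these two members, and hasValidRanges is a hypothetical helper name.

```cpp
#include <string>

// Local stand-in that mirrors two documented CompilerOptions members.
// It is NOT the PopRT struct; it exists only so the sketch compiles alone.
struct OptionsSketch {
  std::string partialsType = "half";
  float availableMemoryProportion = 0.6f;
};

// Hypothetical helper: true when both fields hold documented values.
bool hasValidRanges(const OptionsSketch &o) {
  // partialsType accepts exactly "float" or "half".
  const bool partialsOk =
      (o.partialsType == "half" || o.partialsType == "float");
  // availableMemoryProportion must lie in the closed interval [0, 1].
  const bool proportionOk = (o.availableMemoryProportion >= 0.0f &&
                             o.availableMemoryProportion <= 1.0f);
  return partialsOk && proportionOk;
}
```

With the defaults shown above, hasValidRanges returns true.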

int64_t numIOTiles = 0: The number of IPU tiles dedicated to I/O.

bool enableModelFusion = false

Enable model fusion.

By default, model fusion is disabled.

bool enablePrefetchDatastreams = true

Enable prefetching for input data streams.

Prefetching is enabled by default.

Poplar will speculatively read data for a stream before it is required in order to allow the ‘preparation’ of the data to occur in parallel with compute. Enabled when True. Default: True.

unsigned streamBufferingDepth = 1: Specify the default buffering depth value used for streams.

bool enablePadConvChannel = false: Custom patterns.

bool serializeIr = false

Enable (set to True) to serialize the IR.

Disabled by default.

std::string serializedIrDest = "": Destination to dump IR serialization stream.

bool enableGatherSimplifier = true

Enable (set to True) to simplify Gather op.

Enabled by default.

std::map<std::string, std::string> engineOptions: Poplar engine options.

bool showCompilationProgressBar = true: Show progress bar when compiling.

std::function<void(int, int)> compilationProgressLogger

Callback function used to indicate PopART compilation progress.

The function should not block. All calls to the callback function will be made from the main thread so blocking in the callback will block compilation from progressing.

If this logger is not set then compilation progress will be printed on the info channel.

Param int: The progress value.
Param int: The maximum value for the progress.
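
Because the callback must not block, one option is to have it only record the progress and leave any slow output to other code. The sketch below uses the documented std::function<void(int, int)> signature; ProgressLogger and makePercentRecorder are hypothetical names introduced for illustration.

```cpp
#include <functional>
#include <vector>

// Same signature as the documented compilationProgressLogger member.
using ProgressLogger = std::function<void(int, int)>;

// Hypothetical helper: builds a logger that only appends a percentage to a
// caller-owned vector, so every call returns immediately.
// The vector must outlive the returned logger.
ProgressLogger makePercentRecorder(std::vector<int> &percentages) {
  return [&percentages](int progress, int total) {
    // Guard against a zero or negative maximum to avoid dividing by zero.
    const int percent =
        (total > 0)
            ? static_cast<int>((static_cast<long long>(progress) * 100) / total)
            : 0;
    percentages.push_back(percent);
  };
}
```

The returned object could then be assigned to the compilationProgressLogger member of a CompilerOptions instance.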

std::vector<std::string> customPatterns: Specify custom patterns.

std::map<std::string, std::vector<std::string>> customTransformApplierSettings: Specify custom transforms.

std::map<std::string, std::string> opaqueBlobs: Specify opaque blob messages.

bool use128BitConvUnitLoad = false: Bit-width of the conv load.

bool enableMultiStageReduce = true

If true, perform the reduction following the convolution in multiple stages if it would significantly reduce code size.

This comes at the cost of increasing the number of cycles.

bool enableFastReduce = false: Enable fast reduce.

bool enableOutlining = true

Enable (set to True) outlining.

Enabled by default.

bool groupHostSync = false

Specify to group the h2d streams at the beginning of the schedule and the d2h streams at the end of the schedule.

When True, tensors will stay live for longer.

Default: False.

bool rearrangeStreamsOnHost = false

Enable rearrangement of h2d tensors to be done on the host.

Default: False (Rearrangement done on device).

bool rearrangeAnchorsOnHost = true

Enable rearrangement of d2h tensors to be done on the host.

Default: True (Rearrangement done on host to save device memory).

float outlineThreshold = 1.0f

Specify the incremental value that a subgraph requires, relative to its nested subgraphs (if any), to be eligible for outlining.

Default: 1.0f.

bool enableNonStableSoftmax = false

Enable the non-stable softmax Poplar function.

Default: False (not enabled).

bool enablePipelining = false

Enable pipelining of virtual graphs.

Default: False (not enabled).

bool enableEngineCaching = false

Enable Poplar executable caching.

The file is saved to the location defined with cachePath.

Default: False (not enabled).

std::string cachePath = ""

Folder to save the Poplar executable to.

Default: “”.

bool constantWeights = true

Specify an optimization for an inference session to have constant weights.

Default: True.

uint64_t subgraphCopyingStrategy = 0

Specify how copies for inputs and outputs for subgraphs are lowered.

Default: popart::OnEnterAndExit.

std::string virtualGraphMode = "off"

Specify how to place ops on virtual graphs to achieve model parallelism, either manually using model annotations, or automatically.

Default: popart::VirtualGraphMode::Off.

std::vector<float> virtualGraphSplitRatios

Specify split ratios when VirtualGraphMode::Auto is enabled.

These values represent split ratios in each device and each of the values is in the range (0, 1). For example, to uniformly split the whole graph on 4 IPUs, the values should be [0.25, 0.25, 0.25, 0.25].
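
The uniform split from the example above can be generated rather than typed by hand. This is a small sketch; uniformSplitRatios and ratiosInOpenUnitInterval are hypothetical helper names, not PopRT functions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: one equal ratio per IPU, for example four entries of
// 0.25 for 4 IPUs.
std::vector<float> uniformSplitRatios(std::size_t numIpus) {
  if (numIpus == 0) {
    return {};
  }
  return std::vector<float>(numIpus, 1.0f / static_cast<float>(numIpus));
}

// Checks the documented constraint that every ratio lies in the open
// interval (0, 1).
bool ratiosInOpenUnitInterval(const std::vector<float> &ratios) {
  for (float r : ratios) {
    if (!(r > 0.0f && r < 1.0f)) {
      return false;
    }
  }
  return true;
}
```

The resulting vector has the same type as the virtualGraphSplitRatios member.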

double timeLimitScheduler = 1e9: The maximum allowed time (in seconds) that can be spent searching for a good graph schedule before a solution must be returned.

int64_t swapLimitScheduler = static_cast<int64_t>(1e9): The maximum number of improving steps allowed by the scheduling algorithm before a solution must be returned.

int transitiveClosureOptimizationThreshold = 100000

Specify the transitive closure optimization threshold.

The transitive closure optimization pass can significantly accelerate the scheduler. It does not, in general, affect the final schedule returned. It is run between initialization with Kahn’s algorithm and the shifting swaps. The transitive closure optimization pass is O(nOps^2) and so should not be used for extremely large graphs. If a graph is above this threshold, the transitive closure optimization pass is not run.

bool createHostTransferableTensorWithOffset = false

Accumulate the byte sizes of the created tensors and rotate the start tile of the next tensor to balance the tile mapping.

This is especially useful when there are many small input tensors, because enabling it avoids always mapping them to tile 0.

Default: false.

bool enableEfficientOverlapIOTopoCons = false

Enable simplified and equivalent overlapIO constraints.

Suppose there are N bins in each of the three stages (8 before the loop, 7 inside the loop, 6 after the loop) and L ops in each bin. The vanilla implementation of overlapIO creates topological constraints of complexity O(N*N*L*L).

To make sure InitOps in each step are scheduled before HostLoadOps, it is enough to keep the topological constraints within each bin and to schedule the last op of each bin Bin0 before the first op of the next bin Bin1. The total complexity is then reduced from O(N*N*L*L) to O(N*L).

Default: false (not enabled).
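
The difference between the two complexities can be made concrete by counting constraints under a simple model. The counting rules below are assumptions made for illustration and are not taken from the PopRT implementation: the all-pairs variant orders every op of a bin before every op of every later bin, and the chained variant keeps a chain of L - 1 constraints inside each bin plus one link between consecutive bins.

```cpp
#include <cstdint>

// Assumed all-pairs model: for each ordered pair of bins (earlier, later),
// every op of the earlier bin is constrained before every op of the later one.
std::uint64_t allPairsConstraints(std::uint64_t bins, std::uint64_t opsPerBin) {
  if (bins < 2) {
    return 0;
  }
  return (bins * (bins - 1) / 2) * opsPerBin * opsPerBin;
}

// Assumed chained model: (opsPerBin - 1) constraints inside each bin, plus one
// constraint linking the last op of a bin to the first op of the next bin.
std::uint64_t chainedConstraints(std::uint64_t bins, std::uint64_t opsPerBin) {
  if (bins == 0 || opsPerBin == 0) {
    return 0;
  }
  return bins * (opsPerBin - 1) + (bins - 1);
}
```

Under these assumptions, 8 bins of 10 ops need 2800 constraints in the all-pairs model and 79 in the chained model.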

7.2. Executable

class Executable

Executable object of the PopEF model.

Public Functions

Executable() = delete

~Executable() = default

Executable(const Executable&) = delete

Executable &operator=(const Executable &other) = delete

Executable(Executable&&) = default

Executable &operator=(Executable&&) = default

explicit Executable(const std::string &popefPath)

Create a new Executable object.

Parameters: popefPath – [in] The PopEF model file.

explicit Executable(const std::vector<std::string> &popefPaths)

Create a new Executable object.

Parameters: popefPaths – [in] The PopEF model files.

explicit Executable(const std::shared_ptr<std::istream> &in, std::streamoff size = 0)

Create a new Executable object.

Parameters

in – [in] The stream to parse.
size – [in] The number of bytes to read from the stream. If 0, parse.

Executable(std::shared_ptr<popef::Model> popefModel)

Create a new Executable object.

Parameters: popefModel – [in] The PopEF model object.

void appendOpaqueBlobs(const std::map<std::string, std::string> &opaqueBlobs): Append opaque blobs.

void appendOpaqueBlobs(const std::pair<std::string, std::string> &opaqueBlob): Append opaque blobs.

std::shared_ptr<popef::Model> getPopefModel(): Get the PopEF Model.

const std::map<std::string, std::string> &getOpaqueBlobs(): Get the opaque blobs.

7.3. PopRT Runtime

class BaseRunner

Subclassed by poprt::runtime::LightRunner, poprt::runtime::ModelRunner, poprt::runtime::PackRunner

Public Functions

BaseRunner() = default

virtual ~BaseRunner() = default

BaseRunner &operator=(const BaseRunner&) = delete

BaseRunner(BaseRunner&&) = default

virtual BaseRunner &operator=(BaseRunner&&) = default

virtual void execute(InputMemoryView &inputData, OutputMemoryView &outputData) = 0

Run a model synchronously.

The user allocates and passes pointers to output memory. Non-const MemoryView is used here to avoid possible copy by some runners.

Parameters

inputData – [in] The user-allocated tensor buffer for all executable input tensors.
outputData – [in] The user-allocated tensor buffer for all executable output tensors.

virtual OutputFutureMemoryView executeAsync(InputMemoryView &inputData, OutputMemoryView &outputData) = 0

Run a model asynchronously.

The user allocates and passes pointers to output memory. Non-const MemoryView is used here to avoid possible copy by some runners.

Parameters

inputData – [in] The user-allocated tensor buffer for all executable input tensors.
outputData – [in] The user-allocated tensor buffer for all executable output tensors.

Returns

The future result of an asynchronous call for all executable output tensors.

virtual std::vector<InputDesc> getExecuteInputs() const = 0

Get a description of the input data required in the execute class methods.

Returns: A vector of DataDesc instances.

virtual std::vector<OutputDesc> getExecuteOutputs() const = 0

Get a description of the output data required in the execute class methods.

Returns: A vector of DataDesc instances.

enum class poprt::runtime::RunnerType

Values:

enumerator ModelRunner = 0

enumerator PackRunner

enumerator ModelFusionRunner

class ModelRunner : public poprt::runtime::BaseRunner 

Subclassed by poprt::runtime::ModelFusionRunner

Public Functions

ModelRunner(const ModelRunner&) = delete

ModelRunner &operator=(const ModelRunner&) = delete

ModelRunner(ModelRunner&&) = default

ModelRunner &operator=(ModelRunner&&) = default

~ModelRunner() override

ModelRunner(const std::string &popefPath, const RuntimeConfig &config = RuntimeConfig())

Create a new ModelRunner object.

Parameters

popefPath – [in] The path to PopEF files from which the model will be loaded.
config – [in] The runtime configuration.

ModelRunner(const std::shared_ptr<Executable> &executable, const RuntimeConfig &config = RuntimeConfig())

Create a new ModelRunner object.

Parameters

executable – [in] The executable which will load the model.
config – [in] The runtime configuration.

void execute(InputMemoryView &inputData, OutputMemoryView &outputData) override

OutputFutureMemoryView executeAsync(InputMemoryView &inputData, OutputMemoryView &outputData) override

std::vector<InputDesc> getExecuteInputs() const override

std::vector<OutputDesc> getExecuteOutputs() const override

class PackRunner : public poprt::runtime::BaseRunner 

Public Functions

PackRunner(const PackRunner&) = delete

PackRunner &operator=(const PackRunner&) = delete

PackRunner(PackRunner&&) = default

PackRunner &operator=(PackRunner&&) = default

~PackRunner() override

PackRunner(const std::string &popefPath, const PackRunnerConfig &config)

Create a new PackRunner object.

Parameters

popefPath – [in] The path to PopEF files from which the model will be loaded.
config – [in] The pack runner configuration.

PackRunner(const std::shared_ptr<Executable> &executable, const PackRunnerConfig &config)

Create a new PackRunner object.

Parameters

executable – [in] The Executable which will load the model.
config – [in] The pack runner configuration.

void execute(InputMemoryView &inputData, OutputMemoryView &outputData) override

OutputFutureMemoryView executeAsync(InputMemoryView &inputData, OutputMemoryView &outputData) override

std::vector<InputDesc> getExecuteInputs() const override

std::vector<OutputDesc> getExecuteOutputs() const override

struct RuntimeConfig

Public Members

RunnerType runnerType = RunnerType::ModelRunner : The type of the target runner.

DeviceWaitConfig deviceWaitConfig

By default, the model runner throws an exception when it is not able to attach to any device required by the given model.

This behavior can be changed by setting a custom DeviceWaitConfig.

bool threadSafe = true

If true, the mutex will be locked on each execution call.

If false, the mutex will not be locked. By default the model runner is thread-safe and each replica has an independent mutex. Default: true.

std::chrono::nanoseconds timeoutNS = std::chrono::milliseconds(10)

Duration in nanoseconds to wait before calling the timeout callback when the IPU is waiting for input data that is not available.

If 0, the timeout callback is never called; in other words, the runner waits forever for the data.
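
The default is written in milliseconds but stored in nanoseconds, which relies on the implicit, lossless conversion that std::chrono provides from a coarser to a finer unit. A short sketch follows; kDefaultTimeout and waitsForever are hypothetical names used only here.

```cpp
#include <chrono>

// Mirrors the documented default: a millisecond value stored as nanoseconds.
constexpr std::chrono::nanoseconds kDefaultTimeout =
    std::chrono::milliseconds(10);

// Hypothetical helper encoding the documented rule that a value of 0 means
// the data is waited for indefinitely.
constexpr bool waitsForever(std::chrono::nanoseconds timeout) {
  return timeout.count() == 0;
}
```

Here kDefaultTimeout.count() is 10000000, that is, 10 ms expressed in nanoseconds.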

bool validateIOParams = true

If true, the I/O parameters will be checked during execution of the ModelRunner “execute” functions.

If false, this check is not done. Default: true.

uint32_t batchingDim = std::numeric_limits<uint32_t>::max()

The dimension along which the input data is extended by the batch size.

For example, a PopEF model with shape [4, 4, 3, 3] and batchingDim=0 means the batch size is extended on dimension 0. Input data with shapes [?, 4, 3, 3] will then be allowed, where ? can be [1, 2, …, N].

The default value is std::numeric_limits<uint32_t>::max(), which means dynamic batch sizing is disabled, and the input data can only be N * batch_size_of_popef_model, for example [n * 4, 4, 3, 3] for the above model, where n can be [1, 2, …, N].
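
The two cases described above can be expressed as a shape check. This is a sketch of the rule as stated in the text, not the check PopRT performs internally; isInputShapeAllowed and kBatchingDisabled are hypothetical names, and the sketch assumes that the batch lives on dimension 0 when dynamic batch sizing is disabled, as in the [n * 4, 4, 3, 3] example.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// The documented default value, meaning dynamic batch sizing is disabled.
constexpr std::uint32_t kBatchingDisabled =
    std::numeric_limits<std::uint32_t>::max();

bool isInputShapeAllowed(const std::vector<std::int64_t> &modelShape,
                         const std::vector<std::int64_t> &inputShape,
                         std::uint32_t batchingDim) {
  if (modelShape.empty() || modelShape.size() != inputShape.size()) {
    return false;
  }
  const bool dynamic = (batchingDim != kBatchingDisabled);
  // Assumption: with dynamic batching disabled the batch is on dimension 0.
  const std::size_t batchAxis = dynamic ? batchingDim : 0;
  if (batchAxis >= modelShape.size()) {
    return false;
  }
  // Every dimension other than the batch axis must match the model exactly.
  for (std::size_t i = 0; i < modelShape.size(); ++i) {
    if (i != batchAxis && modelShape[i] != inputShape[i]) {
      return false;
    }
  }
  const std::int64_t batch = inputShape[batchAxis];
  if (batch < 1) {
    return false;
  }
  // Dynamic: any batch >= 1. Disabled: a whole multiple of the model batch.
  return dynamic ||
         (modelShape[batchAxis] > 0 && batch % modelShape[batchAxis] == 0);
}
```

For the model shape [4, 4, 3, 3], an input of [7, 4, 3, 3] passes with batchingDim = 0 but fails with the default value, while [8, 4, 3, 3] passes in both cases.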

bool checkPackageHash = true

If true, the Poplar hash will be checked before the executable is loaded onto the device.

If false, this check is not done. Default: true.

int64_t ringBufferSizeMultiplier = 2

The multiplier used to determine the size of the ring buffer.

The ring buffer size is given by ringBufferSizeMultiplier * batch size.

bool autoReset = false

If true, the IPU will be reset automatically before the next inference when an application runtime error occurs, or when a recoverable error occurs whose recovery action is IPU_RESET.

If false, the error will be raised and no action will be taken. Default: false.

bool flushOnWaitingOutputs = false

If true, only flush the dummy data when there is a user request waiting for outputs from queues.

Otherwise, wait forever for user-requested data in the callback. If false, flush the dummy data anyway.

std::chrono::nanoseconds batchSizeTimeoutNS = std::chrono::nanoseconds::max()

Duration in nanoseconds before the timeout callback is called when the IPU is waiting for input data to compose a batch to load.

If max(), the callback will never be called, in other words, the model will wait forever for the data.

std::chrono::nanoseconds dataParallelTimeoutNS = std::chrono::nanoseconds::max()

Duration in nanoseconds before the timeout callback is called when the IPU is waiting in parallel for input data.

If max(), the callback will never be called, in other words, the model will wait forever for the data.

bool isBatchSizeTimeoutEnabled = false

Indicates whether the batch size timeout is enabled.

If true, then check batchSizeTimeoutNS and dataParallelTimeoutNS in the timeout callback when the IPU is waiting for input data which doesn’t arrive.

size_t requestTracepointsBufferSize = 1000

Size of the request tracepoint buffer.

If set to 0, the request tracepoint buffer size will be infinite. Note: if flushOnWaitingOutputs is set to false, no request tracepoints will be recorded.

ReplicaIdToDevice replicaIdToDevice = {}

Mapping between replica ID and Device.

This allows you to set a specific device for a given replica. By default, the model runner assigns devices automatically.

std::pair<InputMemoryView, OutputMemoryView> flushByName

The config option that specifies the data to the callback function for flushing data that is not available.

flushByName indicates to runtime which data to flush.

If specified, a flush_callback function will be called once when the IPU has timed out after waiting for input data, which is not available.

The application that sets flushByName must ensure that the inputs and outputs it specifies for runtime to flush are correct. Otherwise the results of this and following inferences will be corrupted.

struct PackRunnerConfig

Public Functions

inline explicit PackRunnerConfig(int timeoutInMicroSeconds = 0, int maxValidNum = 0, std::string dynamicInputName = "", std::string unpackInfoInputName = "")

void enablePaddingRemovePattern(std::string maskName, std::vector<std::string> dynamicGroup): Remove padding from user input based on the mask.

void enableSingleRowMode(std::string maskName, std::string unpackInfoName = "", int delimiterNum = 0): Enable the pack mode in which data cannot span rows.

Public Members

RunnerType runnerType = RunnerType::PackRunner : The type of the target runner.

DeviceWaitConfig deviceWaitConfig

By default, the model runner throws an exception when it is not able to attach to any device required by the given model.

This behavior can be changed by setting a custom DeviceWaitConfig.

int timeoutInMicroSeconds = 0

Used to determine when to force to push the user input data into the queue, even if the PackRunner can receive more data.

The value of timeoutInMicroSeconds should be greater than 0.

int compileBatchSize = -1: Used for notifying the pack runner about the model batch size during compile time, as it may not always be obtainable from the model.

int maxValidNum = 0

maxValidNum is the maximum number of samples that PackRunner can reach.

PackRunner stops packing when it has reached maxValidNum samples, the maximum space allowed by the user, or the time limit.

std::string dynamicInputName = "": Dynamic sequence input name.

std::string unpackInfoInputName = "": Unpack info input name.

std::string maskName = "": Attention mask name, used to remove padding of user input.

std::vector<std::string> dynamicGroup

Used to specify the group of dynamic inputs when removing padding (for example, {input_ids, mask, token_type, position_ids}).

Names of fixed-size inputs should not be in dynamicGroup.

PackAlgorithm algo = PackAlgorithm::NextFit

bool disableDataAcrossRows = false: In this pack mode, user input cannot span rows.

bool enablePaddingRemove = false: Remove padding from user input.

int delimiterNum = 0: Used to insert delimiters before packing.

bool enablePrefetchDatastreams = true: Whether to prefetch data streams.

struct DeviceWaitConfig

Public Functions

DeviceWaitConfig() = default

inline DeviceWaitConfig(int timeoutSec, int sleepTimeSec)

Public Members

std::chrono::seconds timeoutSec = std::chrono::seconds(1): The time in seconds to wait for a device.

std::chrono::seconds sleepTimeSec = std::chrono::seconds(6): The time in seconds between attach attempts.
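
The two fields suggest a retry loop: try to attach, and on failure sleep for sleepTimeSec until timeoutSec has elapsed. The sketch below shows one way such a loop could consume the values; it is an assumption about usage, not PopRT's implementation. WaitConfigSketch is a local stand-in with the documented members and defaults, and attachWithRetry, tryAttach, and sleepFor are hypothetical names.

```cpp
#include <chrono>

// Local stand-in mirroring the documented DeviceWaitConfig members/defaults.
struct WaitConfigSketch {
  std::chrono::seconds timeoutSec{1};
  std::chrono::seconds sleepTimeSec{6};
};

// Hypothetical retry loop. tryAttach returns true on success. sleepFor is
// injected so the loop can be exercised without really sleeping.
template <typename Attach, typename Sleep>
bool attachWithRetry(const WaitConfigSketch &cfg, Attach tryAttach,
                     Sleep sleepFor) {
  std::chrono::seconds waited{0};
  for (;;) {
    if (tryAttach()) {
      return true;
    }
    // Give up once the timeout is used up, or if sleeping cannot make progress.
    if (waited >= cfg.timeoutSec || cfg.sleepTimeSec.count() <= 0) {
      return false;
    }
    sleepFor(cfg.sleepTimeSec);
    waited += cfg.sleepTimeSec;
  }
}
```

In real use, sleepFor could be a lambda that calls std::this_thread::sleep_for with the duration it receives.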

struct DataDesc

The description of data used by ModelRunner.

Public Functions

DataDesc(std::string name, int64_t sizeInBytes, std::vector<int64_t> shape, popef::DataType dataType, bool popefContainsTensorData = false)

Create description of input/output data.

Parameters

name – [in] The name of the input/output tensor.
sizeInBytes – [in] The size of the tensor measured in bytes.
shape – [in] A vector defining the shape of the tensor. The size of the vector is equal to the number of tensor dimensions. Each element of the vector indicates the size of a single dimension.
dataType – [in] The data type of a single tensor element.
popefContainsTensorData – [in] If true, the model has a tensor data blob associated with the tensor. If false, the model does not have a tensor data blob associated with the tensor. Default: false.

Public Members

std::string name: The name of the input/output tensor.

int64_t sizeInBytes: The size of the tensor measured in bytes.

std::vector<int64_t> shape

A vector defining the shape of the tensor.

The size of the vector is equal to the number of tensor dimensions. Each element of the vector indicates the size of a single dimension.

popef::DataType dataType: The data type of a single tensor element.

bool popefContainsTensorData

If true, the model has a tensor data blob associated with the tensor.

If false, the model does not have a tensor data blob associated with the tensor.
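
The shape and sizeInBytes fields are related for densely stored tensors: the element count is the product of the dimensions, and the byte size is that count times the element size. The sketch below assumes dense storage and takes the element size from the caller instead of deriving it from popef::DataType; elementCount and denseSizeInBytes are hypothetical helper names.

```cpp
#include <cstdint>
#include <vector>

// Number of elements: the product of all dimension sizes.
// An empty shape is treated as a scalar with one element.
std::int64_t elementCount(const std::vector<std::int64_t> &shape) {
  std::int64_t count = 1;
  for (std::int64_t dim : shape) {
    count *= dim;
  }
  return count;
}

// Hypothetical helper: bytes needed for a densely stored tensor.
// elementSizeBytes is supplied by the caller (for example 4 for 32-bit float).
std::int64_t denseSizeInBytes(const std::vector<std::int64_t> &shape,
                              std::int64_t elementSizeBytes) {
  return elementCount(shape) * elementSizeBytes;
}
```

For the shape [4, 4, 3, 3] this gives 144 elements, or 576 bytes at 4 bytes per element.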

using poprt::runtime::InputDesc = DataDesc : Description of input data required by ModelRunner.

using poprt::runtime::OutputDesc = DataDesc : Description of output data required by ModelRunner.

using poprt::runtime::InputMemoryView = std::unordered_map<std::string, ConstTensorMemoryView>

Mapping between a tensor name and an immutable memory view.

Used as input to ModelRunner::execute.

using poprt::runtime::OutputMemoryView = std::unordered_map<std::string, TensorMemoryView>

Mapping between a tensor name and a memory view.

Used as output from ModelRunner::execute, when the output memory is allocated and managed by the ModelRunner client.

using poprt::runtime::OutputFutureMemoryView = std::unordered_map<std::string, std::shared_future<TensorMemoryView>>

Mapping between a tensor name and a future memory view.

Used as output from ModelRunner::executeAsync, when the output memory is allocated and managed by the ModelRunner client.
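
A map of shared futures can be drained by calling get() on each entry, which blocks until that tensor is ready. The following sketch uses only the standard library. ViewSketch and FutureViewsSketch are local stand-ins shaped like TensorMemoryView and OutputFutureMemoryView, and waitForAll is a hypothetical helper.

```cpp
#include <cstdint>
#include <future>
#include <string>
#include <unordered_map>

// Local stand-in with the same members as the documented TensorMemoryView.
struct ViewSketch {
  void *data = nullptr;
  std::uint64_t dataSizeBytes = 0;
};

// Same shape as the documented OutputFutureMemoryView alias.
using FutureViewsSketch =
    std::unordered_map<std::string, std::shared_future<ViewSketch>>;

// Blocks until every future is ready and returns the resolved views.
std::unordered_map<std::string, ViewSketch> waitForAll(
    const FutureViewsSketch &futures) {
  std::unordered_map<std::string, ViewSketch> ready;
  for (const auto &entry : futures) {
    // shared_future::get() is const and may be called more than once.
    ready[entry.first] = entry.second.get();
  }
  return ready;
}
```

The views only point at memory owned by the caller, so the output buffers must stay alive until the results have been read.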

struct ConstTensorMemoryView

Immutable view to already allocated memory.

Public Functions

ConstTensorMemoryView() = default: Default constructor.

ConstTensorMemoryView(const TensorMemoryView &other): Construct an immutable view from a mutable TensorMemoryView.

ConstTensorMemoryView(const void *data, uint64_t dataSizeBytes)

Immutable view to const memory.

Parameters

data – [in] The pointer to the allocated memory.
dataSizeBytes – [in] The size of the memory block, in bytes.

Public Members

const void *data = nullptr: Pointer to the allocated memory.

uint64_t dataSizeBytes = 0: The size of the memory block, in bytes.

struct TensorMemoryView

Mutable view to already allocated memory.

Public Functions

TensorMemoryView() = default: Default constructor.

TensorMemoryView(void *data, uint64_t dataSizeBytes)

Mutable view to memory.

Parameters

data – [in] The pointer to the allocated memory.
dataSizeBytes – [in] The size of the memory block, in bytes.

Public Members

void *data = nullptr: The pointer to the allocated memory.

uint64_t dataSizeBytes = 0: The size of the memory block, in bytes.
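
A view stores only a pointer and a byte count, so building the name-to-view map amounts to pointing each entry at a buffer the caller already owns. The sketch below shows this for inputs. ConstViewSketch and InputViewsSketch are local stand-ins shaped like ConstTensorMemoryView and InputMemoryView, and addInput is a hypothetical helper.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Local stand-in with the same members as the documented
// ConstTensorMemoryView.
struct ConstViewSketch {
  ConstViewSketch() = default;
  ConstViewSketch(const void *d, std::uint64_t n) : data(d), dataSizeBytes(n) {}
  const void *data = nullptr;
  std::uint64_t dataSizeBytes = 0;
};

// Same shape as the documented InputMemoryView alias.
using InputViewsSketch = std::unordered_map<std::string, ConstViewSketch>;

// Hypothetical helper: registers a caller-owned float buffer under a tensor
// name. The view neither copies nor owns the memory.
void addInput(InputViewsSketch &views, const std::string &name,
              const std::vector<float> &buffer) {
  views[name] = ConstViewSketch(buffer.data(), buffer.size() * sizeof(float));
}
```

The buffer must outlive every view that refers to it, and it must not be resized while a view is in use.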

struct TensorMemory

Tensor memory manager responsible for allocating, storing, sharing and releasing tensor memory.

Public Functions

TensorMemory() = default: Default constructor.

TensorMemory(int64_t dataSizeBytes)

Allocate the user-requested memory block and store it in a shared_ptr.

Parameters: dataSizeBytes – [in] The size of the memory block, in bytes.

TensorMemoryView getView(): Get the mutable memory view.

ConstTensorMemoryView getConstView(): Get the immutable memory view.

ConstTensorMemoryView getView() const: Get the immutable memory view.

Public Members

uint64_t dataSizeBytes = 0: The size of the memory block, in bytes.

std::shared_ptr<void> data = nullptr: Pointer to the allocated memory.
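
Storing the block in a std::shared_ptr<void> lets copies of the manager share one allocation that is released when the last owner goes away. The following is a minimal sketch of that idea and not the PopRT implementation: TensorMemorySketch and MutableViewSketch are local stand-ins, and the use of std::malloc and std::free is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// Local stand-in with the same members as the documented TensorMemoryView.
struct MutableViewSketch {
  void *data = nullptr;
  std::uint64_t dataSizeBytes = 0;
};

// Local stand-in shaped like the documented TensorMemory struct.
struct TensorMemorySketch {
  TensorMemorySketch() = default;

  // Allocates the block and hands ownership to a shared_ptr whose deleter
  // is std::free, matching the std::malloc allocation.
  explicit TensorMemorySketch(std::int64_t bytes)
      : dataSizeBytes(static_cast<std::uint64_t>(bytes)),
        data(std::malloc(static_cast<std::size_t>(bytes)), std::free) {}

  // Non-owning mutable view over the managed block.
  MutableViewSketch getView() {
    MutableViewSketch view;
    view.data = data.get();
    view.dataSizeBytes = dataSizeBytes;
    return view;
  }

  std::uint64_t dataSizeBytes = 0;
  std::shared_ptr<void> data;
};
```

Copying a TensorMemorySketch copies the shared_ptr, so both copies refer to the same block.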

7.3.1. DeviceManager

class Device

Public Functions

Device(const Device&) = delete

Device &operator=(const Device &other) = delete

Device(Device&&) = default

Device &operator=(Device&&) = default

Device(std::shared_ptr<model_runtime::Device>)

~Device()

std::string ipuVersion() const: Get the version of the device.

const std::shared_ptr<model_runtime::Device> device() const: Get model_runtime::Device.

class DeviceManager

Public Functions

DeviceManager(const DeviceManager&) = delete

DeviceManager &operator=(const DeviceManager &other) = delete

DeviceManager(DeviceManager&&) = default

DeviceManager &operator=(DeviceManager&&) = delete

DeviceManager()

~DeviceManager()

std::string ipuHardwareVersion(): Get the version of the IPU on the physical system.

std::size_t getNumDevices() const: Get the number of devices attached to this host.

std::shared_ptr<Device> getDevice(int64_t numIpus)

Get a device matching the requested configuration.

Parameters: numIpus – [in] The number of IPUs the device must contain.
Returns: The device, if attachment is successful. Otherwise, an error is thrown.

std::shared_ptr<Device> getSpecificDevice(int64_t deviceId)

Get a specific device matching the requested configuration.

Parameters: deviceId – [in] The ID of the device to acquire.
Returns: The device, if attachment is successful. Otherwise, an error is thrown.
