2. API reference

A working knowledge of the Triton backend API is crucial to understanding the comments and implementation below.

2.1. BackendState

class triton::backend::poplar::BackendState

Backend state: this state is shared amongst all the models and model states.

Public Functions

BackendState() = default
BackendState(const char *options_buf, size_t options_len)

Public Members

std::mutex poplar_init_mutex

The mutex that the model states need to acquire before using the non-thread safe parts of Poplar (for example, poplar::Engine creation).

bool use_sim_if_no_hw = false

Flag indicating that the simulator may be used if no IPU hardware is present in the system.

unsigned sim_num_IPUs = 1

Simulator: number of IPUs.

unsigned sim_tiles_per_IPU = 1

Simulator: number of tiles per IPU.

unsigned sim_ipu_version = 2

Simulator: IPU version.
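The `poplar_init_mutex` member above is intended to serialise the non-thread-safe parts of Poplar across model states. A minimal sketch of that locking pattern, using a hypothetical stand-in `FakeBackendState` and a placeholder `createEngine` (neither is the real backend type or a Poplar call):

```cpp
#include <mutex>
#include <string>

// Hypothetical stand-in for BackendState; only the shared mutex matters here.
struct FakeBackendState {
    std::mutex poplar_init_mutex;
};

// Hypothetical placeholder for a non-thread-safe Poplar operation
// (for example, poplar::Engine creation).
std::string createEngine() { return "engine"; }

// Serialise engine creation across model states by holding the shared mutex.
std::string createEngineLocked(FakeBackendState& backend) {
    std::lock_guard<std::mutex> guard(backend.poplar_init_mutex);
    return createEngine();  // only one thread runs this at a time
}
```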

2.2. ModelState

class triton::backend::poplar::ModelState : public BackendModel

State associated with a model that is using this backend.

An object of this class is created and associated with each TRITONBACKEND_Model.

A model state is shared among all the instances of a model.

Public Functions

~ModelState()
void ValidateModelConfig()

Validate that model configuration is supported by this backend.

Note

This method must be called before calling any of the other methods from this class.

inline uint64_t ExecDelay() const

For Triton testing: delay in milliseconds to wait before processing the requests.

Value comes from the Triton ‘execute_delay_ms’ parameter.

Returns

Execution delay in milliseconds (ms)

Pre

Only available after ValidateModelConfig() has been called.

inline uint64_t DelayMultiplier() const

For Triton testing: delay multiplier per instance.

That is, total_delay = max(DelayMultiplier() * InstanceId(), 1) * ExecDelay().

Value comes from the Triton ‘instance_wise_delay_multiplier’ parameter.

Returns

Delay multiplier value

Pre

Only available after ValidateModelConfig() has been called.
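The delay formula above can be computed directly; `total_delay_ms` below is a hypothetical helper written for illustration, not part of the backend API:

```cpp
#include <algorithm>
#include <cstdint>

// total_delay = max(DelayMultiplier() * InstanceId(), 1) * ExecDelay()
// (all durations in milliseconds; the function name is hypothetical).
uint64_t total_delay_ms(uint64_t exec_delay, uint64_t multiplier, int instance_id) {
    return std::max<uint64_t>(multiplier * static_cast<uint64_t>(instance_id), 1) *
           exec_delay;
}
// e.g. exec_delay = 10, multiplier = 2, instance_id = 3 -> 60 ms;
// for instance 0 the max(..., 1) clamp keeps the delay at 10 ms.
```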

const std::map<std::string, popef::TensorInfo> &ModelInputs() const

Map of input tensor details.

Returns

Map:

  • key (std::string): input tensor name

  • value (popef::TensorInfo): input tensor details

Pre

Only available after ValidateModelConfig() has been called.

const std::map<std::string, popef::TensorInfo> &ModelOutputs() const

Map of output tensor details.

Returns

Map:

  • key (std::string): output tensor name

  • value (popef::TensorInfo): output tensor details

Pre

Only available after ValidateModelConfig() has been called.
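Both methods return ordinary `std::map`s keyed by tensor name, so callers can iterate them directly. A sketch, where `TensorInfo` is a minimal hypothetical stand-in for `popef::TensorInfo`:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical minimal stand-in for popef::TensorInfo.
struct TensorInfo {
    std::vector<int64_t> shape;
};

// Collect the tensor names from a ModelInputs()/ModelOutputs()-style map.
std::vector<std::string> tensorNames(const std::map<std::string, TensorInfo>& tensors) {
    std::vector<std::string> names;
    for (const auto& [name, info] : tensors) {
        (void)info;  // only the key is needed here
        names.push_back(name);
    }
    return names;
}
```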

const std::set<std::string> &OutputNames() const

Model outputs’ names.

Returns

Set of output name strings

Pre

Only available after ValidateModelConfig() has been called.

std::unique_ptr<popef::Model> LoadModel() const

Create a new instance of the popef::Model.

Note

PopEF models are not thread-safe; each model instance should own its own model.

Returns

Pointer (unique_ptr) to model

Pre

Only available after ValidateModelConfig() has been called.

void CreationDelay()

For Triton testing: block the thread based on the value of the Triton ‘creation_delay_sec’ parameter.

BackendState *Backend()

Return the backend associated with this model.

pvti::Graph &GetGraph(const std::string &name, const std::string &unit = "")

Get or create a pvti graph.

Parameters
  • name[in] The graph name. This name will be prefixed by the model name.

  • unit[in] The graph unit.

Returns

Reference to Graph

int64_t TimeoutInNanoseconds() const

Time to wait before flushing the pipeline when there are no inputs left in the queues but some requests are still in flight.

Returns

Timeout in nanoseconds (ns)

void IncrementInstanceCount()

Increment the number of instances.

int InstanceCount() const

Return the current number of instances.

Returns

Number of instances

bool UseSynchronousExecution() const

If true, then the backend should wait until all the requests' responses have been sent before returning from TRITONBACKEND_ModelInstanceExecute.

Returns

Synchronous execution usage flag

Public Static Functions

static void Create(TRITONBACKEND_Model *triton_model, BackendState *backend, ModelState **state)

Static method used to create a ModelState.

Parameters
  • triton_model[in] Model the state is associated with.

  • backend[in] The backend the state is associated with.

  • state[out] The state to initialise.

2.3. InputBuffers

using triton::backend::poplar::InputBuffers = std::map<std::string, const void*>

Map input names to data buffers.

2.4. OutputBuffers

using triton::backend::poplar::OutputBuffers = std::map<std::string, void*>

Map output names to data buffers.
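Since both aliases are plain `std::map` typedefs, populating them is ordinary map usage. A sketch with hypothetical tensor names ("image", "logits"):

```cpp
#include <map>
#include <string>
#include <vector>

using InputBuffers  = std::map<std::string, const void*>;  // as in the backend
using OutputBuffers = std::map<std::string, void*>;

// Point each (hypothetical) input tensor name at its host buffer.
InputBuffers makeInputs(const std::vector<float>& image) {
    return {{"image", image.data()}};
}

// Likewise for outputs; per SetRequestOutputs(), a nullptr value would mean
// "discard this output".
OutputBuffers makeOutputs(std::vector<float>& logits) {
    return {{"logits", logits.data()}};
}
```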

2.5. ModelInstanceState

class triton::backend::poplar::ModelInstanceState : public BackendModelInstance

State associated with an instance of a model.

An object of this class is created and associated with each TRITONBACKEND_ModelInstance.

Public Functions

~ModelInstanceState()
inline ModelState *StateForModel() const

Get the state of the model that corresponds to this instance.

Returns

Pointer to model’s state

inline int InstanceId() const

Get the ID of the instance.

Returns

ID

void EnqueueRequest(const InputBuffers &inputs, size_t batch_size, const std::function<void(void)> &callback = nullptr)

Enqueue a request.

Note

The callback might be called multiple times (if the input is prefetched and the prefetch is later discarded).

Parameters
  • inputs[in] Buffers to use for the inputs. (Must contain a buffer for each input.)

  • batch_size[in] Batch size of the request.

  • callback[in] Optional callback to call when the first input is being read.

Pre

Acquire the EnqueueRequest lock.

Post

For each call to EnqueueRequest() a corresponding call to SetRequestOutputs() must be made.

std::mutex &QueuesMutex()

Mutex protecting the queues: must be acquired before calling EnqueueRequest().

Returns

Mutex

void SetRequestOutputs(const OutputBuffers &outputs, size_t batch_size, const std::function<void(void)> &callback = nullptr)

Set the corresponding output buffers’ pointers for the last enqueued request.

Parameters
  • outputs[in] Buffers to use for the outputs. (Must contain a pointer for each output, if a pointer is nullptr then that output will be discarded.)

  • batch_size[in] Batch size of the request.

  • callback[in] Optional callback to call once after the last output has been written.
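The intended call sequence — acquire QueuesMutex(), call EnqueueRequest(), then pair it with SetRequestOutputs() — can be sketched against a stub. `StubInstance` below is a hypothetical stand-in that only mimics the call shape of ModelInstanceState:

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <string>

using InputBuffers  = std::map<std::string, const void*>;
using OutputBuffers = std::map<std::string, void*>;

// Hypothetical stub with the same call shape as ModelInstanceState.
struct StubInstance {
    std::mutex mutex_;
    size_t enqueued = 0, outputs_set = 0;
    std::mutex& QueuesMutex() { return mutex_; }
    void EnqueueRequest(const InputBuffers&, size_t,
                        const std::function<void()>& = nullptr) { ++enqueued; }
    void SetRequestOutputs(const OutputBuffers&, size_t,
                           const std::function<void()>& = nullptr) { ++outputs_set; }
};

// One request: lock the queues first (the EnqueueRequest precondition), then
// satisfy the postcondition by making the matching SetRequestOutputs() call.
void submit(StubInstance& inst, const InputBuffers& in, OutputBuffers& out,
            size_t batch_size) {
    std::lock_guard<std::mutex> lock(inst.QueuesMutex());
    inst.EnqueueRequest(in, batch_size);
    inst.SetRequestOutputs(out, batch_size);
}
```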

void AddValuesToGraph(int64_t batch_size, double exec_ns, double compute_ns)

Add a set of measurements for a collection of requests to the PVTI graph.

Parameters
  • batch_size[in] Total batch size of the collection.

  • exec_ns[in] Average execution time for a single batch.

  • compute_ns[in] Average compute time for a single batch.

void UnlockSessionThread()

Unlock the session thread, which gets blocked after the timeout callback.

void UpdateLastCollection(std::weak_ptr<RequestCollection> requests)

Keep track of the last collection: as long as this weak_ptr is valid it means some requests are still in flight.
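The "still in flight" check is ordinary `std::weak_ptr` semantics: the pointer stays valid only while some request holds a shared reference. A self-contained sketch with an illustrative stand-in type:

```cpp
#include <memory>

struct FakeCollection {};  // hypothetical stand-in for RequestCollection

// True while any request still holds a reference to the last collection.
bool requestsInFlight(const std::weak_ptr<FakeCollection>& last) {
    return !last.expired();
}
```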

Public Static Functions

static void CreateAndRun(ModelState *model_state, TRITONBACKEND_ModelInstance *triton_model_instance, ModelInstanceState **state)

Allocate a model instance state and start SessionRunThread.

Parameters
  • model_state[in] The model that the instance is associated with.

  • triton_model_instance[in] Triton instance of the model.

  • state[out] Backend instance of the model.

static void CreateEmptyState(ModelState *model_state, TRITONBACKEND_ModelInstance *triton_model_instance, ModelInstanceState **state)

Allocate an empty model instance state; "empty" means that no SessionRunThread is created.

Parameters
  • model_state[in] The model that the instance is associated with.

  • triton_model_instance[in] Triton instance of the model.

  • state[out] Backend instance of the model.

2.6. RequestCollection

class triton::backend::poplar::RequestCollection

Represents a collection of requests received by one call to TRITONBACKEND_ModelInstanceExecute.

Each lambda object pushed to the Session queue to execute a request holds a reference to the collection and releases it after the request has sent a response.

Once all the requests are completed, the collection releases all the memory associated with them and reports statistics for the entire batch to the server.

Public Functions

explicit RequestCollection(uint64_t request_count, uint64_t exec_start_ns, ModelInstanceState *instance_state)

Initialise a collection of requests.

Parameters
  • request_count[in] Reserve memory for this number of requests.

  • exec_start_ns[in] Timestamp for when the backend received this collection of requests. If 0: don’t report stats for this collection.

  • instance_state[in] Instance state this collection of requests is associated with.

Post

addRequest() and addBatch() must both be called request_count times.

template<typename ...Args>
inline Request &addRequest(Args&&... args)
void addBatch(uint64_t batch_size) noexcept

Must be called once per request in the collection to indicate the batch size of each individual request.

~RequestCollection()

Implicitly called when the last request in the collection releases its reference to the collection.

Release the memory used by the requests and responses in the collection and report the statistics for this collection of requests to the server.
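The lifetime model described above — each queued lambda holds a reference, and the destructor runs when the last one releases it — can be illustrated with `std::shared_ptr`. The types and flag below are hypothetical and exist only to make the release observable:

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-in: sets a flag from its destructor so we can observe
// when the last reference has been released.
struct FakeCollection {
    bool* destroyed;
    explicit FakeCollection(bool* d) : destroyed(d) {}
    ~FakeCollection() { *destroyed = true; }
};

// Each "request lambda" captures a shared reference to the collection; the
// collection is destroyed only after every lambda has run and been dropped.
bool runAll(size_t request_count) {
    bool destroyed = false;
    {
        auto collection = std::make_shared<FakeCollection>(&destroyed);
        std::vector<std::function<void()>> queue;
        for (size_t i = 0; i < request_count; ++i)
            queue.push_back([collection] { /* send the response here */ });
        for (auto& job : queue) job();
    }  // local reference and all lambdas go out of scope here
    return destroyed;
}
```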

2.7. Request

class triton::backend::poplar::Request

Store a Triton request and its associated response.

Note

The ownership of both the Triton request and the Triton response are transferred to this object which will take care of releasing them once the response has been sent.

Public Functions

Request(TRITONBACKEND_Response *response, TRITONBACKEND_Request *request, uint32_t request_index, Latch *latch = nullptr)

Constructor

Parameters
  • response[in] Triton response associated with this request. Ownership is transferred to this object.

  • request[in] Incoming Triton request. Ownership is transferred to this object.

  • request_index[in] Index of this request in the batch received by the backend.

  • latch[in] Optional latch to notify once all the requests’ responses have been sent.

~Request()
Request(const Request&) = delete

Prevent requests from being copied as they contain raw pointers which are deleted on destruction.

Request &operator=(const Request &other) = delete
Request &operator=(Request &&other)
Request(Request&&)
bool errorResponseSent() const noexcept

Return true if an error response has been sent.

void getInput(const std::string &name, const popef::TensorInfo &info, int64_t *batch_size, const void **input_buffer)

Get the buffer and batch size of the requested input.

Parameters
  • name[in] Name of the input to retrieve.

  • info[in] Tensor shape of the requested input.

  • batch_size[inout] Batch size: if the pointed-to value is 0, it will be populated with the input buffer’s batch size; otherwise an error will be sent if it doesn’t match the input buffer’s batch size.

  • input_buffer[out] Will be set to the buffer containing the input data.

Post

errorResponseSent() will return true if an error occurred.
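The in/out batch-size convention can be sketched as follows; `checkBatchSize` is a hypothetical helper mirroring the contract, not the real getInput() implementation (it returns false where the real method would send an error response):

```cpp
#include <cstdint>

// Mirror of getInput()'s batch-size contract: populate when the caller passes
// 0, otherwise require a match with the input buffer's batch size.
// (Hypothetical helper, not the backend implementation.)
bool checkBatchSize(int64_t* batch_size, int64_t buffer_batch_size) {
    if (*batch_size == 0) {
        *batch_size = buffer_batch_size;  // first input seen: adopt its size
        return true;
    }
    return *batch_size == buffer_batch_size;  // later inputs must agree
}
```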

void parseRequestInfo(uint32_t num_inputs_expected, const std::set<std::string> &allowed_output_names)

Parse the request and ensure it has the correct number of inputs / outputs.

Parameters
  • num_inputs_expected[in] Number of inputs the model expects.

  • allowed_output_names[in] The list of outputs the user is allowed to request.

Post

errorResponseSent() will return true if the number of inputs or the requested outputs are invalid.

int64_t batchSize() const

Return the request’s batch size.

Returns

Batch size

void *getOutputBuffer(const std::string &name, const popef::TensorInfo &info, int64_t batch_size)

Allocate and return a pointer to a buffer for the given request.

Note

The response owns the returned buffer and therefore the returned pointer becomes invalid once the response is sent.

Parameters
  • name[in] Name of the output we need a buffer for.

  • info[in] Tensor shape of the output to allocate.

  • batch_size[in] Batch size of the output to allocate.

Returns

A pointer to a buffer or nullptr if an error occurred.

void sendResponseSuccess()

Send a response indicating the request was successfully processed.

void startComputeTimer()

Start the compute timer.

Must be called just before the IPU starts processing the request.

void stopComputeTimer()

Stop the compute timer.

Must be called immediately after the IPU is done processing the request.

Pre

startComputeTimer() must have been called.

void reportStatistics(TRITONBACKEND_ModelInstance *instance)

Report the statistics related to this request.

Pre

startComputeTimer(), stopComputeTimer() and sendResponseSuccess() must have been called.

uint64_t getStartComputeTime() const noexcept

Return the start compute timestamp.

Returns

Start compute timestamp

Pre

startComputeTimer() must have been called.

uint64_t getEndComputeTime() const noexcept

Return the end compute timestamp.

Returns

End compute timestamp

Pre

stopComputeTimer() must have been called.

2.8. Tracepoint

class triton::backend::poplar::Tracepoint : public Tracepoint

RAII class to create pvti tracepoints.

Public Functions

explicit Tracepoint(const std::string &label)
~Tracepoint() = default
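An RAII tracepoint marks scope entry on construction and scope exit on destruction. This stand-alone sketch mimics the pattern with a hypothetical `ScopedTracepoint` that appends to a log instead of emitting pvti events:

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in that records begin/end instead of emitting pvti events.
struct ScopedTracepoint {
    std::vector<std::string>& log;
    std::string label;
    ScopedTracepoint(std::vector<std::string>& l, std::string lb)
        : log(l), label(std::move(lb)) { log.push_back("begin:" + label); }
    ~ScopedTracepoint() { log.push_back("end:" + label); }
};

std::vector<std::string> traceScope() {
    std::vector<std::string> log;
    {
        ScopedTracepoint tp(log, "execute");  // begin on construction
        // ... work being traced ...
    }  // end on destruction, even on early return or exception
    return log;
}
```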

2.9. Latch

class triton::backend::poplar::Latch

Public Functions

explicit Latch(int count)
Latch(const Latch&) = delete
Latch(Latch&&) = delete
Latch &operator=(const Latch &other) = delete
Latch &operator=(Latch &&other) = delete
~Latch() = default
void notify()
void wait()
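The interface matches a countdown latch (akin to C++20 std::latch): assuming wait() blocks until notify() has been called count times, a self-contained sketch looks like this. The implementation below is illustrative; the backend's actual Latch may differ internally:

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Minimal countdown-latch sketch with the same notify()/wait() surface
// (assumed semantics; the backend's implementation may differ).
class CountdownLatch {
public:
    explicit CountdownLatch(int count) : count_(count) {}
    void notify() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (--count_ == 0) cv_.notify_all();
    }
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return count_ <= 0; });
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int count_;
};

// Example use: block until `workers` threads have each sent their response.
bool waitForResponses(int workers) {
    CountdownLatch latch(workers);
    std::vector<std::thread> threads;
    for (int i = 0; i < workers; ++i)
        threads.emplace_back([&latch] { latch.notify(); });
    latch.wait();  // returns once every worker has notified
    for (auto& t : threads) t.join();
    return true;
}
```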