4. Sessions
The Session
class forms the basis of the low-level Model Runtime API, which gives you more flexibility but requires more knowledge about PopEF files, the Poplar runtime and
the IPU hardware.
Note
The term “device” is used to indicate an abstraction of one or more physical IPUs that can execute code. See Section 5, Managing devices for more information.
The Session
class is a composition of objects that:
binds to a device (or unbinds from it on demand) (Section 4.1, Creating a session),
uploads the user model to the device (also re-uploads and unloads from the device) (Section 4.2, Uploading your model to the IPU),
sets up handlers for input and output tensors used by the uploaded model (Section 4.3, Handlers for model tensors),
runs programs defined in the executable representing the model (Section 4.4, Running programs),
stops the executable currently running on the device (Section 4.4, Running programs),
explicitly checks if all Anchor objects defined for the model have their callbacks connected (Section 4.5, Retrieving information from a session),
Note
The PopEF Anchor class defines named model entry and exit points representing model tensors, used to transfer data from the host to the IPU and back.
provides a factory method (createQueueManager()) to create an internal QueueManager that simplifies and optimizes management of data transfers to and from the device (Section 4.6, Managing queues of tensor data).
4.1. Creating a session
To create a Session
object, you need a model
(created by your program or manually loaded from a PopEF file):
auto reader = std::make_shared<popef::Reader>();
popef::ModelBuilder builder(reader);
std::shared_ptr<popef::Model> model = builder.createModel();
model_runtime::Session session(model);
You can also use a set of paths to PopEF files
storing the model (and its metadata) — in this case Session
will handle loading the PopEF model internally:
const std::vector<std::string> popef_paths { paths, to, popef, files };
model_runtime::Session session(popef_paths);
4.1.1. Session configuration options
Session
class constructors also accept a second
argument (config
) that lets you set up the following creation and runtime
options:
check_package_hash
: Flag to indicate whether or not to compare your model version and the active Poplar runtime version.
Note
By default, the check_package_hash configuration option is set to true and this is the recommended setting. If there is a mismatch between the versions of your model and the active Poplar runtime, an exception is thrown from the Session constructor (unless you set check_package_hash to false). By default, Model Runtime does not provide compatibility if the active version of Poplar in your system is different from the version used to compile and store the model in PopEF.
policy
: The policy associated with the device acquisition step. Session needs a Device object to communicate with the IPU. This object may get created in different ways, depending on the policy set up in the configuration:
Immediate
: The model_runtime::Device object gets created automatically in the Session constructor with the help of the Session object’s model_runtime::DeviceManager object. The selected device is the physical IPU or group of IPUs suitable for running the model. To fine-tune the Device creation process, another SessionConfig field may be set up, wait_config.
Note
If there is no IPU hardware suitable for running the model in your system, an exception is thrown and the Session object is not constructed.
Deferred
(default): Session does not create a Device object. You need to do it explicitly and bind the created Session object to the device by calling the bindToDevice() method. Session needs a bound Device object to operate.
Note
If a Device is successfully acquired with the Immediate creation policy, the Device is owned by the Session object that created it. Session controls the lifetime of the Device object, including its proper destruction. In the Deferred mode, you are responsible for the proper destruction of the Device object.
wait_config
: By default, Session throws an exception if it is not able to attach to any device suitable for the given model. This behaviour can be changed by setting DeviceWaitConfig:
strategy
: Controls how the session waits for device availability; it may be NO_WAIT (the default setting), WAIT_WITH_TIMEOUT, which throws an exception once the timeout has elapsed, or WAIT_FOREVER.
timeout
: The WAIT_WITH_TIMEOUT parameter, in seconds.
sleepTime
: The sleep time between consecutive device attach attempts, in seconds.
For example, if you wish to wait up to 5 seconds for the device to become available (timeout) and you wish to check its availability every second (sleepTime), you can set the following configuration:
using namespace std::chrono_literals;
const DeviceWaitConfig wait_up_to_5s_config = {
    DeviceWaitStrategy::WAIT_WITH_TIMEOUT, // device waiting strategy
    5s,                                    // timeout
    1s};                                   // sleep time
pred_tensor_data
: A control mechanism for the use of tensor data sources. More details on tensor data transfer can be found in Section 4.3, Handlers for model tensors.
There are two policies available (see PopefDataUsagePolicy):
USE_POPEF_DATA_IF_ANY
: Bind the tensor or feed data to Anchor if it exists in the PopEF files you have specified.
USE_USER_DATA
: Do not bind the tensor or feed data to Anchor. Enforce binding of the user-defined callback.
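The three waiting strategies of wait_config can be pictured as a retry loop. The following is a minimal standalone sketch of that behaviour and does not use the Model Runtime API: WaitStrategy, WaitConfig and acquireWithWait are hypothetical names, and the try_attach callable stands in for a single device attach attempt.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <stdexcept>
#include <thread>

// Hypothetical stand-ins for illustration; not Model Runtime types.
enum class WaitStrategy { NO_WAIT, WAIT_WITH_TIMEOUT, WAIT_FOREVER };

struct WaitConfig {
  WaitStrategy strategy = WaitStrategy::NO_WAIT;
  std::chrono::milliseconds timeout{0};    // used by WAIT_WITH_TIMEOUT only
  std::chrono::milliseconds sleep_time{0}; // pause between attach attempts
};

// Calls try_attach until it succeeds or the strategy gives up.
// Returns the number of attempts made; throws if no device was attached.
inline int acquireWithWait(const WaitConfig &cfg,
                           const std::function<bool()> &try_attach) {
  const auto start = std::chrono::steady_clock::now();
  int attempts = 0;
  while (true) {
    ++attempts;
    if (try_attach()) {
      return attempts;
    }
    if (cfg.strategy == WaitStrategy::NO_WAIT) {
      throw std::runtime_error("no suitable device available");
    }
    if (cfg.strategy == WaitStrategy::WAIT_WITH_TIMEOUT &&
        std::chrono::steady_clock::now() - start >= cfg.timeout) {
      throw std::runtime_error("timed out waiting for a device");
    }
    std::this_thread::sleep_for(cfg.sleep_time);
  }
}
```

With NO_WAIT the first failed attempt raises an exception, which mirrors the default behaviour described for Session; the other two strategies only differ in whether the loop is bounded by the timeout.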
4.1.2. Session predicates
To simplify the use of predicates, Model Runtime has a set of
predicate producer functions in the
predicate_factory
namespace.
For example, if you want to provide your own values for the tensors loaded during the execution of Load programs instead of any tensor data that PopEF makes available for them (for example to load your own model weights), the following predicate factory method may be used to create the required predicate:
// Assuming there is a model (popef::Model) prepared/loaded in advance
const auto &programFlow = model.metadata.programFlow();
const auto accept_policy =
PopefDataUsagePolicy::USE_USER_DATA; // bind a user callback to every Anchor
// that is "owned" by Load programs
const auto reject_policy =
PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY; // skip assigning callback to remaining Anchors
const auto user_data_for_load_program_tensors_predicate =
model_runtime::predicate_factory::popef_data_usage::predProgramFlowLoad(
programFlow, accept_policy, reject_policy);
The complete setup of SessionConfig
may look as follows:
// Assuming we have wait_up_to_5s_config (DeviceWaitConfig) defined before
model_runtime::SessionConfig session_config; // instantiate SessionConfig object
session_config.policy =
model_runtime::LaunchPolicy::Immediate; // acquire device immediately
session_config.pred_tensor_data =
user_data_for_load_program_tensors_predicate;// use the predicate created
session_config.check_package_hash = true; // perform version compatibility check
session_config.wait_config = wait_up_to_5s_config;
4.2. Uploading your model to the IPU
With the Device
and poplar::Engine
objects, Session
has all it needs to reserve the
necessary IPUs, upload the model code to that device, and set up all the
transfer channels for exchange of model tensor data.
In the Immediate
mode, all these steps take place in the
Session
constructor (assuming the
Device
object was created properly) and all you
have to do is ensure the availability of suitable and properly configured IPUs
in the system.
Note
As Immediate
is not the default
SessionConfig
mode, you have to
explicitly configure Session
to use it.
// Assuming there is a model (popef::Model) prepared/loaded in advance
model_runtime::SessionConfig config{model_runtime::LaunchPolicy::Immediate};
model_runtime::Session session(model, config);
In the Deferred
mode, you are responsible for creating the Device
object (using
DeviceManager
) and for binding the Session
instance to the device using the
bindToDevice()
method.
model_runtime::DeviceManager devices;
// Acquire a device suitable for the model
auto device = devices.getDevice(model);
// Create Session object, by default in deferred device creation mode
model_runtime::Session session(model);
// Explicitly bind the session to the device
session.bindToDevice(device);
Note
If the session is already connected to a device and running a Main program, then attempting to bind it to a new device will start the execution of the Save program. The session will wait until the Save program execution ends before binding to the new device.
Note
You can call bindToDevice()
with the same device as is currently being used by the session. To successfully bind to a device, all the steps, including reloading the executable onto the
device, have to take place.
You can also explicitly unload the session from the Device
it is bound to using the unloadFromDevice()
method, which throws an exception if the session isn’t bound to a device. As with rebinding to another device, the Save program may be run (see Section 4.7, Verification for more details).
To bind the session again to the previously set up Device
object and upload the
poplar::Executable
to it, you can use reload()
.
// Assuming there is a session (Session) already created and the user provided device
session.bindToDevice(device);
// ...
session.unloadFromDevice(); // unload the session from the device
// ...
session.reload(); // rebind the session to the device and reload the executable
4.3. Handlers for model tensors
User models operate on two types of tensors: state and input/output.
State tensors are, for example, model weights or other model parameters that have to be transferred to the IPU once (conceptually during Load program execution) before the execution of any computation.
Tensors transferred out of the IPU in Save programs also fall into the state tensor category.
Input or output tensors have to be transferred to the IPU from the host for a computational round to take place. They are fetched during the execution of the Main program.
Note
Refer to PopEF file analysis for how to display information about model tensors.
One possible way of storing state tensor data is to compile them
as constants into the IPU code representing your model. In this case, when
the Session
object uploads the executable onto its bound device, the tensor
data is also transferred to the IPU memory, so no other transfers are needed.
Another way is to save the tensor data inside the model’s PopEF file. By default, if there is tensor or feed data
for a particular tensor in PopEF, when binding to a device, Session
sets up a
transfer stream between the tensor and its tensor or feed data. When the running
IPU program requests the transfer of the tensor data from the host, it gets
exactly the bytes stored in the connected tensor or feed data.
To override this default behaviour and select which tensors the
corresponding tensor or feed data is to be used for (or not used), you can
prepare a predicate: a function returning the desired anchor callback policy for the Anchor
passed in.
There are three policies defined in AnchorCallbackPolicy
:
BIND_USER_CB
: Do not bind the tensor or feed data to Anchor. Enforce binding of the user-defined callback.
BIND_EMPTY_CB
: Bind an empty callback. No tensor or feed data or user data will be transferred to the IPU. In the case of an input tensor, the Poplar runtime transfers random bytes from its memory. For an output tensor, the data transferred from the IPU is not accessible by your model. This callback policy may be especially helpful in the case of tensors that are not of interest to you, for example tensors returned during the execution of the Save program (see Section 4.4, Running programs for more about Save programs).
SKIP_CB
: Skip without changing the Anchor object’s callback binding.
// Assuming there is a model (popef::Model) prepared/loaded in advance
const auto &programFlow = model.metadata.programFlow();
const auto accept_policy =
AnchorCallbackPolicy::BIND_EMPTY_CB; // bind an empty callback to every Anchor
// that is "owned" by Save programs
const auto reject_policy =
AnchorCallbackPolicy::SKIP_CB; // skip assigning callback to remaining Anchors
const auto empty_cb_for_save_program_tensors_predicate =
model_runtime::predicate_factory::anchor_callbacks::predProgramFlowSave(
programFlow, accept_policy, reject_policy);
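To illustrate what such a predicate does, the following standalone sketch builds a function that maps an anchor to a policy depending on which program transfers it. It is only an analogy under stated assumptions: CallbackPolicy, AnchorStub and predForPrograms are hypothetical stand-ins and not the Model Runtime or PopEF types, and a real Anchor may belong to several programs.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration; not the Model Runtime types.
enum class CallbackPolicy { BIND_USER_CB, BIND_EMPTY_CB, SKIP_CB };

struct AnchorStub {
  std::string name;
  int program; // index of the program that transfers this tensor
};

using AnchorPredicate = std::function<CallbackPolicy(const AnchorStub &)>;

// Returns a predicate that yields `accept` for anchors transferred by one of
// the given programs and `reject` for every other anchor.
inline AnchorPredicate predForPrograms(std::set<int> programs,
                                       CallbackPolicy accept,
                                       CallbackPolicy reject) {
  return [programs = std::move(programs), accept,
          reject](const AnchorStub &anchor) {
    return programs.count(anchor.program) > 0 ? accept : reject;
  };
}
```

The predicate owns a copy of the program set, so it stays valid after the caller's data goes out of scope.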
The last option for feeding data to or from model tensors is to set up user
callbacks (CallbackHandle
). A tensor callback is a
functor that accepts a pointer to the memory on which it is to operate. In the
case of tensors that are to be uploaded to the IPU, the functor’s task is to copy the
tensor data to the address in the pointer. For tensors transferred from the IPU
to the host, the functor’s task is to copy the data from the memory to your
model’s structure (or generally to do any other operation on the data it has
access to).
Note
As the tensor data of each model may be transferred to and from different
memory locations, a separate callback is needed for each
Anchor
representing a particular tensor.
To assign a callback to an Anchor
, you can use the
setCallbackForAnchor()
method.
// Assuming there is a session (Session) already created
const std::string input_anchor_handle =
"some_input_anchor_handle"; // input anchor handle string - can be read from PopEF
const auto input_callback = // input tensor data callback
[&user_space_for_input_tensor](void *dest) {
auto &io = user_space_for_input_tensor;
std::memcpy(dest, io.data(), io.size() * sizeof(float)); // copy data from user memory location to dest
printf("Copying input from host to IPU"); // perform extra operations - for example print to console
};
const std::string output_anchor_handle =
"some_output_anchor_handle"; // output anchor handle string - can be read from PopEF
const auto output_callback = [&user_space_for_output_tensor](void *src) {
auto &io = user_space_for_output_tensor;
std::memcpy(io.data(), src, io.size() * sizeof(float)); // copy data from src to the user memory location
printf("Copying output from IPU to host");
};
// Assign callbacks for the selected Anchors
session.setCallbackForAnchor(input_anchor_handle, input_callback);
session.setCallbackForAnchor(output_anchor_handle, output_callback);
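The direction of the copy in each callback can be checked without any IPU. The sketch below is a standalone illustration in which a plain host buffer plays the role of the memory handed over by the runtime; TensorCallback, makeInputCallback and makeOutputCallback are hypothetical helper names, not part of Model Runtime.

```cpp
#include <cassert>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical alias: a callback receives the address it has to operate on.
using TensorCallback = std::function<void(void *)>;

// Host-to-device direction: copy the user data into the destination buffer.
inline TensorCallback makeInputCallback(const std::vector<float> &user_input) {
  return [&user_input](void *dest) {
    std::memcpy(dest, user_input.data(), user_input.size() * sizeof(float));
  };
}

// Device-to-host direction: copy from the source buffer into user memory.
inline TensorCallback makeOutputCallback(std::vector<float> &user_output) {
  return [&user_output](void *src) {
    std::memcpy(user_output.data(), src, user_output.size() * sizeof(float));
  };
}
```

Both callbacks capture the user vectors by reference, so those vectors must outlive every invocation of the callbacks and must already have the size of the tensor being transferred.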
CallbackFactory
gathers all the callbacks and
dispatches them per Anchor
. This is a functor that accepts
CallbackInfo
(that stores an Anchor
)
and returns a callback specific to that Anchor
.
To assign a user CallbackFactory
to handle model tensor data transfers,
you have to use setUserInputHandler()
(for tensors transferred from host to IPU) or
setUserOutputHandler()
(for tensors transferred from IPU to host). Both accept two extra arguments —
anchorCallbackPredicate
(a predicate described in Section 4.1, Creating a session) and skip_connected
(simplifies skipping Anchor
objects with their handlers already set up).
// Assuming there is a session (Session) already created
const auto output_factory = [](const model_runtime::CallbackInfo &ci) -> model_runtime::CallbackHandle { // one return type for all branches
const std::string& name = ci.anchor.name();
if (name == "some_anchor_handle") {
const auto some_anchor_handle_callback = [](void *src) {
// do some operations on src
};
return some_anchor_handle_callback;
} else if (name == "some_other_anchor_handle") {
const auto some_other_anchor_handle_callback = [](void *src) {
// do some operations on src
};
return some_other_anchor_handle_callback;
} else {
const auto just_print_out_callback = [name](void *) { // copy the name so the callback owns it
std::cout << "Callback for " << name << " called";
};
return just_print_out_callback;
}
};
const auto &predicate = empty_cb_for_save_program_tensors_predicate;
const bool skip_connected = true; // default value for skip_connected parameter
session.setUserOutputHandler(output_factory, predicate, skip_connected);
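The factory pattern itself can be tried in isolation. The following standalone sketch shows a factory that returns one callback per anchor name, with a single common return type so that every branch yields the same kind of object. The names TensorCallback, CallbackInfoStub, Factory and makeCountingFactory are hypothetical and only mimic the roles of CallbackHandle, CallbackInfo and CallbackFactory.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-ins for illustration; not the Model Runtime types.
using TensorCallback = std::function<void(void *)>;

struct CallbackInfoStub {
  std::string name; // name of the anchor the callback is requested for
};

using Factory = std::function<TensorCallback(const CallbackInfoStub &)>;

// Builds a factory whose callbacks count how often each anchor is serviced.
inline Factory makeCountingFactory(std::map<std::string, int> &call_counts) {
  return [&call_counts](const CallbackInfoStub &ci) -> TensorCallback {
    const std::string name = ci.name; // copy: the callback may outlive ci
    return [&call_counts, name](void *) { ++call_counts[name]; };
  };
}
```

Copying the anchor name into the returned callback avoids holding a reference to an object whose lifetime the factory does not control.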
Note
It is worth emphasising that Model Runtime supports Poplar remote memory buffers without you having to do anything extra. If the model has any Anchor objects configured to use remote buffers, Session will configure them properly. You simply refer to such Anchor objects in the same way as any other anchors.
4.4. Running programs
After creating a properly configured Session
object, binding a proper Device
object to it and
setting up model tensor data handlers, everything is ready to run the first
program.
PopEF groups Poplar programs into three types: Load, Main and Save. Each group can consist of several Poplar programs that will be executed in the programmed sequence when the group is run.
The purpose of the groups is as follows:
Programs in the Load group realize all the pre-computational tasks that are supposed to be executed before the model computation. For example, triggering transfer of state tensor data from the host (see Section 4.3, Handlers for model tensors) or generating tensor data directly on the IPU (for example RandomGen).
Programs in the Main group (in most cases there is a single Main program) perform the chain of computations representing the computational model. During execution, input and output tensors are transferred between the host and the IPU.
Programs in the Save group are very rarely used in the case of inference models. Their purpose is to transfer the model state tensor data from the IPU to the host (not to be confused with model outputs).
Note
Save programs are normally executed during model training to download the updated weights and cycle counters. They are rarely used in inference, but are still handled by Model Runtime.
By using the Session
class, you can easily run a
program group or a single program on the device bound to the session:
// Assuming there is a session (Session) already created
session.runLoadPrograms();
session.runMainPrograms();
session.runSavePrograms();
const std::vector<int> program_numbers = {1, 2, 3}; // to be read from PopEF
session.runPrograms(program_numbers);
Note
Refer to PopEF file analysis for how to display information about model programs.
To stop a running program, you can call the
stop()
method. However, it is worth noting
that this method does more than just stop execution of the program.
Internally, poplar::Engine::stop()
is
called which leaves the device in an undefined state. Also, if there is a
QueueManager
bound to the session, all its queues get disconnected.
4.5. Retrieving information from a session
To help you prepare callbacks for tensors defined in the model, Session
provides a set of getter functions which return Anchor
objects that refer to your model’s tensors.
Note
The functions getUserInputAnchors()
and
getUserOutputAnchors()
return
Anchor
objects that you have to define callbacks for.
getInputAnchors()
returns all
Anchor
objects, including those bound to a data source of
PopEF origin (Section 4.3, Handlers for model tensors).
// Assuming there is a session already created
constexpr std::size_t MAX_SIZE = 1024; // max size of a single model output, in bytes
std::vector<const popef::Anchor *> output_anchors = session.getUserOutputAnchors();
const size_t number_of_outputs = output_anchors.size();
// prepare outputs buffer, based on the number of the model outputs
std::vector<std::array<std::byte, MAX_SIZE>> outputs(number_of_outputs);
// prepare a helper object for mapping anchor names to the user memory slots
std::map<std::string, std::byte *> handles_to_outputs_map;
for (size_t idx = 0; idx < number_of_outputs; idx++) {
  const std::string &anchor_name = output_anchors[idx]->name();
  handles_to_outputs_map.emplace(anchor_name, outputs[idx].data());
}
// create a callback factory; the map is captured by reference and must outlive it
const auto output_factory = [&handles_to_outputs_map](const model_runtime::CallbackInfo &ci) {
  const std::string &name = ci.anchor.name();
  const popef::TensorInfo::ShapeDimType size = ci.anchor.tensorInfo().sizeInBytes();
  // based on the tensor name, the proper slot in the user buffer is used
  std::byte *dst = handles_to_outputs_map.at(name);
  const auto cb = [dst, size](void *src) {
    std::memcpy(dst, src, size); // size must not exceed MAX_SIZE
  };
  return cb;
};
The Session
class also provides the anchorsNotConnectedToCallbacks()
and
errorIfAnchorsAreNotConnectedToCallbacks()
functions that let you manually check if all the model-defined anchors (“pointing” to the model tensors) are connected to callbacks.
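The idea behind these two checks can be sketched in standalone form: one helper lists the anchors that still lack a callback, and the other turns a non-empty list into an exception. The helper names and the use of plain strings for anchors are assumptions made for this illustration; the real functions operate on the session's own Anchor objects.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: returns the anchors that have no callback connected.
inline std::vector<std::string>
anchorsWithoutCallbacks(const std::vector<std::string> &model_anchors,
                        const std::set<std::string> &connected) {
  std::vector<std::string> missing;
  for (const auto &name : model_anchors) {
    if (connected.count(name) == 0) {
      missing.push_back(name);
    }
  }
  return missing;
}

// Hypothetical helper: throws if any anchor has no callback connected.
inline void
errorIfAnchorsWithoutCallbacks(const std::vector<std::string> &model_anchors,
                               const std::set<std::string> &connected) {
  const auto missing = anchorsWithoutCallbacks(model_anchors, connected);
  if (missing.empty()) {
    return;
  }
  std::string message = "anchors without callbacks:";
  for (const auto &name : missing) {
    message += " " + name;
  }
  throw std::runtime_error(message);
}
```

Running such a check before the first program is executed surfaces a missing callback as a clear error rather than as a failure inside a run method.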
4.6. Managing queues of tensor data
An important function provided by the Session
class
is the builder function:
createQueueManager()
. It lets you initiate a
Session
managed
QueueManager
object that creates and controls a set
of tensor data queues, storing pointers to the dedicated user-memory slots. Because
the management of the input and output buffers is driven asynchronously by the
IPU program execution flow and completely managed by
QueueManager
, you can focus on other operations
(like data pre- or post-processing) and enqueue inputs and outputs in the
preferred way.
// Assuming there is a session (Session) already created
model_runtime::QueueManager *queue_manager = session.createQueueManager();
A detailed description of QueueManager
can be found in Section 4.3, Handlers for model tensors.
Note
Session
takes full ownership of the created QueueManager
object. The lifetime of the created QueueManager
is closely related to the Session
lifetime.
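As a rough mental model of a queue that stores pointers to user-memory slots, consider the standalone sketch below. It is not the QueueManager API: InputSlotQueue, enqueue and transferNext are hypothetical names, and the sketch returns false on an empty queue where a real implementation would more likely wait for data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <deque>
#include <mutex>

// Hypothetical illustration of a queue of pointers to user-memory slots.
class InputSlotQueue {
public:
  explicit InputSlotQueue(std::size_t bytes_per_slot)
      : bytes_(bytes_per_slot) {}

  // The user thread registers a slot; only the pointer is stored, so the
  // slot must stay valid until it has been transferred.
  void enqueue(const void *slot) {
    std::lock_guard<std::mutex> lock(mutex_);
    slots_.push_back(slot);
  }

  // Called when the device requests data: copies the oldest slot into dest.
  // Returns false if no slot is queued.
  bool transferNext(void *dest) {
    const void *slot = nullptr;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (slots_.empty()) {
        return false;
      }
      slot = slots_.front();
      slots_.pop_front();
    }
    std::memcpy(dest, slot, bytes_);
    return true;
  }

private:
  std::size_t bytes_;
  std::mutex mutex_;
  std::deque<const void *> slots_;
};
```

Because the queue holds pointers and not copies, enqueueing is cheap, and the producer can prepare the next slot while an earlier one is being transferred.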
4.7. Verification
Session
performs parameter and state verification in all its functions.
For example, on creation, Session
checks if the model you passed was
saved with popef::Metadata::replicationFactor()
greater than 1 (see Replicated graphs),
which is not handled directly. See Section 3.2, Replication for more details on how ModelRunner
handles replication. If the replication factor is not equal to 1, Session
throws an exception.
The correctness of the model being loaded is also checked. For example, if the model contains tensors which are configured to use remote buffers, but there is no tensor or feed data for these tensors present in the model’s PopEF file, an exception is thrown.
Another set of verification steps takes place in the “run program” methods:
runLoadPrograms()
checks if the Executable has already been loaded, which means the selected device has been bound to a session and the Executable has been loaded onto it. If this condition is not satisfied, runLoadPrograms() performs the proper steps to reach the correct state. In addition, runLoadPrograms() verifies if you have connected callbacks to all the model Anchor objects. If not, it throws an exception with details of the exact error.
runMainPrograms()
performs the same verification steps as runLoadPrograms() and additionally checks if runLoadPrograms() has already been executed. If not, runLoadPrograms() gets called.
Note
Calling runMainPrograms() has a side effect that impacts on the Session closing phase. It calls the runSavePrograms() method before unloading the session from the device (see unloadFromDevice()) if you did not explicitly call this method before.
runSavePrograms() and runPrograms()
perform the same verifications as runLoadPrograms().
Note
Session methods runLoadPrograms(), runMainPrograms(), runSavePrograms() and runPrograms() may throw an exception if there are still model Anchor objects without connected callbacks.
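The implicit sequencing described above (Load before Main, Save before unloading) can be summarised as a small state machine. The class below is a standalone sketch under stated assumptions: SessionFlowSketch is a hypothetical name, it only records the order of steps, and it ignores the device binding and callback checks that the real Session also performs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the implicit program ordering; not the Session class.
class SessionFlowSketch {
public:
  void runLoadPrograms() {
    record("load");
    load_done_ = true;
  }

  void runMainPrograms() {
    if (!load_done_) {
      runLoadPrograms(); // Load is triggered implicitly if it has not run
    }
    record("main");
    main_done_ = true;
  }

  void runSavePrograms() {
    record("save");
    save_done_ = true;
  }

  void unloadFromDevice() {
    if (main_done_ && !save_done_) {
      runSavePrograms(); // Save is triggered implicitly after Main
    }
    record("unload");
    load_done_ = main_done_ = save_done_ = false;
  }

  const std::vector<std::string> &log() const { return log_; }

private:
  void record(const char *step) { log_.emplace_back(step); }

  bool load_done_ = false;
  bool main_done_ = false;
  bool save_done_ = false;
  std::vector<std::string> log_;
};
```

Calling only runMainPrograms() followed by unloadFromDevice() on this sketch records the steps load, main, save and unload, in that order.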
The methods setUserInputHandler()
and
setUserOutputHandler()
check if the executable was already loaded, and load it if
needed.
An important verification step is also performed in the
stop()
method. It checks if your model
called stop after the session was activated (the device is ready and an executable was loaded onto it), and only then triggers stop on Executable
.
It also disconnects all the tensor data queues managed by QueueManager
if
any were bound to the session.