4. Session
The Session
class is a composition of objects that
provide the set of functionalities needed to:
bind to a given IPU device (or unbind from it on demand) (Section 4.1, Creating a session),
upload the user model onto the device (and also re-upload it or unload it from the device) (Section 4.2, Uploading user model onto IPU),
set up handlers for the input and output tensors used by the uploaded model (Section 4.3, Handlers for model tensors),
run programs defined in the executable representing the user model (Section 4.4, Running programs),
stop the executable already running on the device (Section 4.4, Running programs),
explicitly check if all Anchor objects defined for the model have their callbacks connected (Section 4.5, Retrieving information from Session),
provide the factory method (createQueueManager()) to create an internal QueueManager that simplifies and optimizes management of data transfers to and from the device (Section 4.6, Managing queues of tensor data).
Note
The PopEF Anchor class defines named model entry and exit points, used to transfer data from the host to the IPU and back, representing model tensors.
4.1. Creating a session
To create a Session
class object, you need a model (created by your program
or manually loaded from a PopEF file):
auto reader = std::make_shared<popef::Reader>();
// (load your PopEF file(s) into the reader before building the model)
popef::ModelBuilder builder(reader);
std::shared_ptr<popef::Model> model = builder.createModel();
model_runtime::Session session(model);
You can also use a set of paths to PopEF files
storing the model (and its metadata); in this case, Session
will handle loading the PopEF model internally:
const std::vector<std::string> popef_paths{"path/to/model.popef"}; // hypothetical path(s) to your PopEF files
model_runtime::Session session(popef_paths);
Session
class constructors also accept a second
argument (config
) that lets you set up the following creation and runtime
options:
check_package_hash: flag to indicate whether or not to compare the user model version and the active Poplar runtime version.
Note
By default, the check_package_hash configuration option is set to true and this is the recommended setting. If your model and Poplar runtime versions mismatch, an exception is thrown from the Session constructor (unless check_package_hash was set to false). Model Runtime does not guarantee compatibility if the version of Poplar in your system is different from the version used to compile the model and store it in the PopEF file.
policy: the policy associated with the device acquisition step. Session needs a Device object to “talk” to the IPU. This object may get created in different ways, depending on the policy set up in the configuration:
Immediate: the model_runtime::Device gets created automatically in the Session constructor with the help of the Session object’s model_runtime::DeviceManager object. The selected device is a physical IPU partition suitable for running the model. To fine-tune the Device creation process, another SessionConfig field, wait_config, may be set up.
Note
If there is no IPU device suitable to run the model in your system, an exception is thrown and the Session object is not constructed.
Deferred (default): Session does not create a Device. You need to create it explicitly and bind the Session to the device by calling the bindToDevice() method. Without a device bound to it, the Session cannot operate.
Note
If Device is successfully acquired with the Immediate creation policy, the Device is owned by the Session object it was created by. Session controls the lifetime of the device, including the proper destruction of the Device object.
Note
In the Deferred mode, you are responsible for proper destruction of the Device object.
wait_config: by default, Session throws an exception if it is not able to attach to any device suitable for the given model. This behavior can be changed by setting the DeviceWaitConfig fields:
strategy: controls how the session waits for device availability; it may be NO_WAIT (the default setting), WAIT_WITH_TIMEOUT (throws an exception after the timeout elapses) or WAIT_FOREVER.
timeout: the WAIT_WITH_TIMEOUT parameter, in seconds.
sleepTime: the sleep time between consecutive device attach attempts, in seconds.
For example, if you wish to wait up to 5 seconds for the device to become available (timeout) and you wish to check its availability every second (sleepTime), you can set the following configuration:
using namespace std::chrono_literals;
const DeviceWaitConfig wait_up_to_5s_config = {
    DeviceWaitStrategy::WAIT_WITH_TIMEOUT, // device waiting strategy
    5s,                                    // timeout
    1s};                                   // sleep time
pred_tensor_data: a predicate controlling which tensor data source is used for each Anchor. More details on tensor data transfer can be found in Section 4.3, Handlers for model tensors. There are two policies available (see PopefDataUsagePolicy):
USE_POPEF_DATA_IF_ANY: bind the tensor or feed data to the Anchor if it exists in the PopEF files specified by the user.
USE_USER_DATA: do not bind the tensor or feed data to the Anchor. Enforce binding of the user-defined callback.
To simplify the use of predicates, Model Runtime delivers a set of
predicate producer functions gathered under the
predicate_factory
namespace.
For example, if you want to provide your own values for the tensors loaded during the execution of Load programs, instead of any tensor data that PopEF delivers for them (for example, to load your own model weights), the following predicate factory method may be used to create the required predicate:
// Assuming there is a model (popef::Model) prepared/loaded in advance
const auto &programFlow = model.metadata.programFlow();
const auto accept_policy =
    PopefDataUsagePolicy::USE_USER_DATA; // require a user callback for every Anchor
                                         // that is "owned" by Load programs
const auto reject_policy =
    PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY; // use PopEF data, if any, for the remaining Anchors
const auto user_data_for_load_program_tensors_predicate =
    model_runtime::predicate_factory::popef_data_usage::predProgramFlowLoad(
        programFlow, accept_policy, reject_policy);
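To make the predicate idea concrete, the sketch below builds a predicate of the same shape: it maps each anchor to one of two data-usage policies depending on whether the anchor belongs to a selected set (for example, the anchors used by Load programs). It does not use Model Runtime; DataPolicy, AnchorStub and makeSelectionPredicate() are hypothetical stand-ins, not the PopefDataUsagePolicy or predicate_factory API.

```cpp
#include <functional>
#include <set>
#include <string>
#include <utility>

// Hypothetical mirror of the two PopefDataUsagePolicy values described above.
enum class DataPolicy { USE_POPEF_DATA_IF_ANY, USE_USER_DATA };

// Hypothetical minimal anchor: only the name matters for this sketch.
struct AnchorStub {
  std::string name;
};

using DataPredicate = std::function<DataPolicy(const AnchorStub &)>;

// Builds a predicate returning `accept` for anchors whose names are in
// `selected` and `reject` for every other anchor.
DataPredicate makeSelectionPredicate(std::set<std::string> selected,
                                     DataPolicy accept, DataPolicy reject) {
  return [selected = std::move(selected), accept, reject](const AnchorStub &anchor) {
    return selected.count(anchor.name) != 0 ? accept : reject;
  };
}
```

Returning accept for the selected anchors and reject for the rest mirrors the accept_policy/reject_policy pair passed to predProgramFlowLoad() above.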
The complete setup of SessionConfig
may look as follows:
// Assuming wait_up_to_5s_config (DeviceWaitConfig) was defined before
model_runtime::SessionConfig session_config; // instantiate SessionConfig object
session_config.policy =
model_runtime::LaunchPolicy::Immediate; // acquire device immediately
session_config.pred_tensor_data =
user_data_for_load_program_tensors_predicate;// use the predicate created
session_config.check_package_hash = true; // perform version compatibility check
session_config.wait_config = wait_up_to_5s_config;
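The interplay of the strategy, timeout and sleepTime fields can be pictured as a polling loop. The following is a minimal sketch under that assumption, not the Model Runtime implementation: WaitStrategy, WaitConfig and waitForDevice() are hypothetical names, try_attach stands in for a real device-attach attempt, and milliseconds are used instead of seconds only to keep the example quick. It returns std::nullopt where a session would throw.

```cpp
#include <chrono>
#include <functional>
#include <optional>
#include <thread>

// Hypothetical stand-ins mirroring the DeviceWaitConfig fields described above.
enum class WaitStrategy { NO_WAIT, WAIT_WITH_TIMEOUT, WAIT_FOREVER };

struct WaitConfig {
  WaitStrategy strategy = WaitStrategy::NO_WAIT;
  std::chrono::milliseconds timeout{0};
  std::chrono::milliseconds sleep_time{0};
};

// Calls try_attach() until it succeeds or the strategy gives up.
// Returns the number of attempts on success, std::nullopt on failure.
std::optional<int> waitForDevice(const WaitConfig &cfg,
                                 const std::function<bool()> &try_attach) {
  const auto start = std::chrono::steady_clock::now();
  int attempts = 0;
  while (true) {
    ++attempts;
    if (try_attach()) return attempts;
    if (cfg.strategy == WaitStrategy::NO_WAIT) return std::nullopt;
    if (cfg.strategy == WaitStrategy::WAIT_WITH_TIMEOUT &&
        std::chrono::steady_clock::now() - start >= cfg.timeout)
      return std::nullopt;
    std::this_thread::sleep_for(cfg.sleep_time); // pause between attempts
  }
}
```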
4.2. Uploading user model onto IPU
With the Device
and Poplar Engine
objects, Session
has all the tools to reserve the IPU device for its usage,
upload the code representing the user model onto the reserved IPU and set up all
the transfer channels for exchange of model tensor data.
In the case of the Immediate
mode, all these steps take place in the
Session
constructor (assuming the
Device
object was created properly) and all you
have to ensure is availability of suitable and properly configured IPU devices
in the system.
Note
As Immediate
is not the default
SessionConfig
policy, you have to
explicitly configure Session
to use it.
// Assuming there is a model (popef::Model) prepared/loaded in advance
model_runtime::SessionConfig config;
config.policy = model_runtime::LaunchPolicy::Immediate; // acquire the device in the Session constructor
model_runtime::Session session(model, config);
In the Deferred
mode, you are responsible for creating the Device
(using
DeviceManager
) and binding the Session
to the device using the
bindToDevice()
method.
model_runtime::DeviceManager devices;
auto device = devices.getDevice(model); // acquire device suitable for the model
model_runtime::Session session(model); // create Session object, by default in Deferred Device creation mode
session.bindToDevice(device); // explicitly bind the session to the device
Note
Binding has a consequence: if the Session
is already bound to a device and has run Main programs, the Save
programs are run first. Session waits until the Save program execution ends
before binding to the new device.
Note
Calling bindToDevice()
again with the device that the session instance is currently bound to is not
a shortcut: to perform a proper binding, all the steps, including reloading
the executable onto the device, take place again.
You can also explicitly unload the Session
from the Device
it was bound to, using the unloadFromDevice()
method (if there was no device bound, unloadFromDevice()
throws an exception). As with rebinding to another device, the Save programs may be run (see Section 4.7, Verification for more details).
To bind the session again to the previously set up Device
and upload the
poplar::Executable
onto it, you can use reload()
.
// Assuming there is a session (Session) already created and the user provided device
session.bindToDevice(device);
// ...
session.unloadFromDevice(); // unload the session from the device
// ...
session.reload(); // rebind the session to the device and reload the executable
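The binding rules above (rebinding unloads first, Save programs may run on unload after Main programs have run, and reload() rebinds to the previously used device) can be modelled as a small state machine. The class below is a hypothetical sketch that only records the steps it would take; it is not the Session implementation and does not talk to any device.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of the binding rules; it records steps instead of
// performing them.
class SessionLifecycleSketch {
public:
  void bindToDevice(int device_id) {
    if (bound_) unloadFromDevice(); // leave the current device first
    device_id_ = device_id;
    has_device_ = true;
    bound_ = true;
    main_ran_ = false;
    // the executable is always reloaded, even for the same device
    events.push_back("load executable on device " + std::to_string(device_id));
  }
  void runMainPrograms() {
    if (!bound_) throw std::runtime_error("no device bound");
    main_ran_ = true;
    events.push_back("main");
  }
  void unloadFromDevice() {
    if (!bound_) throw std::runtime_error("no device bound");
    if (main_ran_) events.push_back("save"); // Save programs before unloading
    bound_ = false;
    events.push_back("unload");
  }
  void reload() {
    if (!has_device_) throw std::runtime_error("no device set up");
    bindToDevice(device_id_); // rebind to the previously used device
  }
  std::vector<std::string> events;

private:
  int device_id_ = -1;
  bool has_device_ = false;
  bool bound_ = false;
  bool main_ran_ = false;
};
```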
4.3. Handlers for model tensors
User models operate on two types of tensors: state and input/output.
State tensors are, for example, model weights or other model parameters that have to be put on the IPU once (conceptually during Load program execution) before the execution of computations.
Note
Tensors transferred out of the IPU in Save programs also fall in the state tensors category.
Input tensors have to be transferred from the host to the IPU, and output tensors from the IPU back to the host, for each computational round. These transfers take place during the execution of the Main program.
Note
For instructions on how to find information about model tensors, refer to PopEF file analysis.
One possible way of storing state tensor data is to compile them
as constants into the IPU assembler code representing the user model. In this case, when
the Session
object uploads the executable onto its bound Device
, the tensor
data is also transferred to the IPU memory, so no other transfers are needed.
Another way is to save the tensor data inside the model’s PopEF file. By default, if there is tensor or feed data
for a particular tensor in PopEF, then, when binding to a device, Session sets up a
transfer stream between the tensor and its tensor or feed data. When the running
IPU program requests the transfer of the tensor data from the host, it gets
exactly the bytes stored in the connected tensor or feed data.
To override this default behavior and select the tensors for which the
corresponding tensor or feed data is (or is not) used, you can
prepare a predicate: a function returning the desired
AnchorCallbackPolicy
for the Anchor
passed in.
There are three policies available (see AnchorCallbackPolicy):
BIND_USER_CB: do not bind the tensor or feed data to the Anchor. Enforce binding of the user-defined callback.
BIND_EMPTY_CB: bind an empty callback. No tensor or feed data or user data is transferred to the IPU. The Poplar runtime just transfers arbitrary bytes from its memory (in the case of an input tensor; if the tensor is an output, the data transferred from the IPU is not accessible by the user). This option may be especially helpful in the case of tensors that are not of interest to the user, for example tensors returned during the execution of the Save programs (for more details about Save programs refer to Section 4.4, Running programs).
SKIP_CB: skip without changing the Anchor object’s callback binding.
// Assuming there is a model (popef::Model) prepared/loaded in advance
const auto &programFlow = model.metadata.programFlow();
const auto accept_policy =
AnchorCallbackPolicy::BIND_EMPTY_CB; // bind an empty callback to every Anchor
// that is "owned" by Save programs
const auto reject_policy =
AnchorCallbackPolicy::SKIP_CB; // skip assigning callback to remaining Anchors
const auto empty_cb_for_save_program_tensors_predicate =
model_runtime::predicate_factory::anchor_callbacks::predProgramFlowSave(
programFlow, accept_policy, reject_policy);
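The effect of the three AnchorCallbackPolicy values can be sketched as a single pass over the anchors. CbPolicy, Callback and applyCallbackPolicies() are hypothetical stand-ins that model only the rules listed above, assuming anchors are identified by name; this is not how Session is implemented.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical mirror of the three AnchorCallbackPolicy values.
enum class CbPolicy { BIND_USER_CB, BIND_EMPTY_CB, SKIP_CB };

using Callback = std::function<void(void *)>;

// Applies `policy_for` to every anchor name and updates `bindings`:
// BIND_USER_CB installs the user callback, BIND_EMPTY_CB installs a no-op,
// SKIP_CB leaves whatever binding (if any) the anchor already has.
void applyCallbackPolicies(
    const std::vector<std::string> &anchors,
    const std::function<CbPolicy(const std::string &)> &policy_for,
    const Callback &user_cb, std::map<std::string, Callback> &bindings) {
  for (const auto &name : anchors) {
    switch (policy_for(name)) {
    case CbPolicy::BIND_USER_CB:
      bindings[name] = user_cb;
      break;
    case CbPolicy::BIND_EMPTY_CB:
      bindings[name] = [](void *) {}; // transfer happens, data is ignored
      break;
    case CbPolicy::SKIP_CB:
      break;
    }
  }
}
```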
The last option for feeding data to or from model tensors is to set up user
callbacks (CallbackHandle
). A tensor callback is a functor that
accepts a pointer to memory on which it is to operate. In the case of tensors
that are to be uploaded onto the IPU, its task is to copy the tensor data to
the address in the pointer. For tensors transferred from the IPU to the host,
the functor’s task is to copy the data from the memory to the
user structure (or generally to do any other operation on the data it has access
to).
Note
As the data of each model tensor may be transferred to and from a different
memory location, a separate callback is needed for each
Anchor
representing a particular tensor.
To assign a callback to an Anchor
, you can use the
setCallbackForAnchor()
method.
// Assuming there is a session (Session) already created and that
// user_space_for_input_tensor and user_space_for_output_tensor are
// std::vector<float> objects sized to match the model tensors (uses <cstring>, <cstdio>)
const std::string input_anchor_handle =
    "some_input_anchor_handle"; // input anchor handle string - can be read from PopEF
const auto input_callback = // input tensor data callback
    [&user_space_for_input_tensor](void *dest) {
      auto &io = user_space_for_input_tensor;
      std::memcpy(dest, io.data(), io.size() * sizeof(float)); // copy data from user memory location to dest
      std::printf("Copying input from host to IPU\n"); // perform extra operations - for example print to console
    };
const std::string output_anchor_handle =
    "some_output_anchor_handle"; // output anchor handle string - can be read from PopEF
const auto output_callback = [&user_space_for_output_tensor](void *src) {
  auto &io = user_space_for_output_tensor;
  std::memcpy(io.data(), src, io.size() * sizeof(float)); // copy data from src to the user memory location
  std::printf("Copying output from IPU to host\n");
};
// Assign callbacks for the selected Anchors
session.setCallbackForAnchor(input_anchor_handle, input_callback);
session.setCallbackForAnchor(output_anchor_handle, output_callback);
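The callback pattern itself does not depend on Model Runtime, so it can be tried out in isolation. In the sketch below (hypothetical helpers makeInputCallback() and makeOutputCallback()), a plain host vector stands in for the runtime-provided buffer. It assumes the user vector holds exactly as many floats as the tensor does; otherwise memcpy would read or write out of bounds.

```cpp
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical stand-in for a tensor callback: it receives a raw pointer to
// the memory the runtime transfers to (input) or from (output).
using TensorCallback = std::function<void(void *)>;

// Input callback: copy the user's data into the runtime-provided buffer.
TensorCallback makeInputCallback(const std::vector<float> &host) {
  return [&host](void *dest) {
    std::memcpy(dest, host.data(), host.size() * sizeof(float));
  };
}

// Output callback: copy the runtime-provided buffer into the user's storage.
TensorCallback makeOutputCallback(std::vector<float> &host) {
  return [&host](void *src) {
    std::memcpy(host.data(), src, host.size() * sizeof(float));
  };
}
```

Because both callbacks capture the user vectors by reference, those vectors must outlive every transfer that can invoke the callbacks.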
CallbackFactory
gathers all the callbacks and
dispatches them per Anchor
. This is a functor that accepts
CallbackInfo
(that stores an Anchor
)
and returns a callback specific to that Anchor
.
To assign a user CallbackFactory
to handle model tensor data transfers,
you have to use setUserInputHandler()
(for tensors transferred from host to IPU) or
setUserOutputHandler()
(for tensors transferred from IPU to host). Both accept two extra arguments:
anchorCallbackPredicate
(a predicate returning AnchorCallbackPolicy, as described earlier in this section; see also the predicate_factory functions in Section 4.1, Creating a session) and skip_connected,
which simplifies skipping Anchor
objects that already have their handlers set up.
// Assuming there is a session (Session) already created
// Each branch returns a different lambda type, so the return type is stated explicitly
const auto output_factory =
    [](const model_runtime::CallbackInfo &ci) -> model_runtime::CallbackHandle {
  const std::string &name = ci.anchor.name();
  if (name == "some_anchor_handle") {
    const auto some_anchor_handle_callback = [](void *src) {
      // do some operations on src
    };
    return some_anchor_handle_callback;
  } else if (name == "some_other_anchor_handle") {
    const auto some_other_anchor_handle_callback = [](void *src) {
      // do some operations on src
    };
    return some_other_anchor_handle_callback;
  } else {
    // capture the name by value: the callback outlives this factory call
    const auto just_print_out_callback = [name](void *) {
      std::cout << "Callback for " << name << " called\n";
    };
    return just_print_out_callback;
  }
};
const auto &predicate = empty_cb_for_save_program_tensors_predicate;
const bool skip_connected = true; // default value for skip_connected parameter
session.setUserOutputHandler(output_factory, predicate, skip_connected);
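A factory returns a different lambda for each anchor, and distinct lambdas have distinct types, so the factory's return type has to be a common wrapper such as std::function. The hypothetical sketch below shows that pattern with stand-in types (CallbackInfoStub, Callback, Factory, makeOutputFactory()) instead of the Model Runtime ones; the anchor name "logits" is purely illustrative.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins: CallbackInfoStub carries only the anchor name.
struct CallbackInfoStub {
  std::string anchor_name;
};
using Callback = std::function<void(void *)>;
using Factory = std::function<Callback(const CallbackInfoStub &)>;

// Builds a factory that returns a dedicated callback for one known anchor
// and a logging fallback for any other anchor. The fallback captures the
// name by value, so it stays valid after the factory call returns.
Factory makeOutputFactory(std::vector<std::string> &events) {
  return [&events](const CallbackInfoStub &ci) -> Callback {
    const std::string name = ci.anchor_name;
    if (name == "logits") {
      return [&events](void *) { events.push_back("logits handled"); };
    }
    return [&events, name](void *) { events.push_back("unhandled: " + name); };
  };
}
```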
Note
It is worth emphasising that Model Runtime supports Poplar Remote
memory buffers.
From the user perspective, this feature is transparent. If the model has any Anchor
objects configured to use the remote buffers, Session
will properly
configure them under the hood. You refer to such Anchor
objects in the same way as to regular ones.
4.4. Running programs
After creating a properly configured Session
object, binding a proper
Device
to it and setting up model tensor data handlers, everything is ready
to run the first program.
PopEF groups Poplar programs into three groups: Load, Main and Save. Each group can consist of several Poplar programs that are executed in the programmed sequence when the group is run. The purpose of each group is as follows:
Load programs perform all the pre-computational tasks that have to be executed before the model computation, such as triggering the transfer of state tensor data from the host (see Section 4.3, Handlers for model tensors) or generating tensor data directly on the IPU (for example, see Randomgen).
Main programs (in most cases there is a single Main program) perform the chain of computations representing the user computational model. During the execution, input and output tensors are transferred between the host and the IPU.
Save programs are very rarely used in the case of inference models. Their main purpose is to transfer the model state tensor data from the IPU to the host (not to be confused with model outputs).
Note
Save programs are normally executed during model training, to download the updated weights and cycle counters. They are rarely used in inference, but still handled by Model Runtime.
By using the Session
API, you can easily run a
Program group or a single Program on the device bound to the session:
// Assuming there is a session (Session) already created
session.runLoadPrograms();
session.runMainPrograms();
session.runSavePrograms();
const std::vector program_numbers = {1, 2, 3}; // to be read from PopEF
session.runPrograms(program_numbers);
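As a rough mental model of program groups, the sketch below flattens a hypothetical program flow into the sequence of program indices a typical session would execute, assuming Load programs run once, Main programs run once per computational round and Save programs run once at the end. ProgramFlowSketch and executionOrder() are illustrative names, not part of PopEF or Model Runtime.

```cpp
#include <vector>

// Hypothetical stand-in for a program flow: each group lists the Poplar
// program indices it runs, in order.
struct ProgramFlowSketch {
  std::vector<int> load_programs;
  std::vector<int> main_programs;
  std::vector<int> save_programs;
};

// Returns the program indices in execution order under the assumption
// Load once, Main `rounds` times, Save once.
std::vector<int> executionOrder(const ProgramFlowSketch &flow, int rounds) {
  std::vector<int> order(flow.load_programs.begin(), flow.load_programs.end());
  for (int r = 0; r < rounds; ++r)
    order.insert(order.end(), flow.main_programs.begin(), flow.main_programs.end());
  order.insert(order.end(), flow.save_programs.begin(), flow.save_programs.end());
  return order;
}
```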
Note
For instructions on how to determine the information about model programs, refer to PopEF file analysis.
To stop the running program, you can call the
stop()
method. Note, however,
that stop() does more than just stop the program execution.
Internally, poplar::Engine::stop()
is called, which leaves the device in an undefined state. Also, if there is any
QueueManager
bound to the session, all its queues get disconnected.
4.5. Retrieving information from Session
To help you prepare callbacks for in-model defined tensors, Session
provides a set of getter functions, returning Anchor
objects that refer to the user model tensors.
Note
The functions getUserInputAnchors()
and
getUserOutputAnchors()
return the
Anchor
objects that you have to define callbacks for,
while getInputAnchors()
returns all the input
Anchor
objects, including those bound to a data source of
PopEF origin (Section 4.3, Handlers for model tensors).
// Assuming there is a session (Session) already created (uses <array>, <cstddef>, <cstring>, <map>)
constexpr std::size_t MAX_SIZE = 1024; // max size (in bytes) of a single model output
std::vector<const popef::Anchor *> output_anchors = session.getUserOutputAnchors();
const std::size_t number_of_outputs = output_anchors.size();
// prepare outputs buffer, based on the number of the model outputs
std::vector<std::array<std::byte, MAX_SIZE>> outputs(number_of_outputs);
// prepare a helper object for mapping anchor names to the user memory slots
std::map<std::string, std::byte *> handles_to_outputs_map;
for (std::size_t idx = 0; idx < number_of_outputs; idx++) {
  const std::string &anchor_name = output_anchors[idx]->name();
  handles_to_outputs_map.emplace(anchor_name, outputs[idx].data());
}
// create a callback factory; the map and the buffers must outlive the callbacks
const auto output_factory = [&handles_to_outputs_map](const model_runtime::CallbackInfo &ci) {
  const std::string &name = ci.anchor.name();
  const auto size = ci.anchor.tensorInfo().sizeInBytes(); // assumed not to exceed MAX_SIZE
  std::byte *dst = handles_to_outputs_map.at(name);
  // based on the tensor name, the proper slot in the user buffer is used
  const auto cb = [dst, size](void *src) {
    std::memcpy(dst, src, size);
  };
  return cb;
};
The Session
class also provides functions
that let you manually check if all the model-defined anchors (“pointing” to the model tensors) were connected to callbacks:
anchorsNotConnectedToCallbacks()
and
errorIfAnchorsAreNotConnectedToCallbacks()
.
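The check performed by anchorsNotConnectedToCallbacks() and errorIfAnchorsAreNotConnectedToCallbacks() amounts to a set difference between the model anchors and the anchors that have callbacks. The sketch below reproduces that idea with hypothetical helpers over anchor names; it is not the Session implementation, and the real error message will differ.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using Callback = std::function<void(void *)>;

// Returns the anchor names that have no callback in `bindings`.
std::vector<std::string> anchorsNotConnected(
    const std::vector<std::string> &anchors,
    const std::map<std::string, Callback> &bindings) {
  std::vector<std::string> missing;
  for (const auto &name : anchors)
    if (bindings.find(name) == bindings.end()) missing.push_back(name);
  return missing;
}

// Throws, listing every unconnected anchor, if any anchor lacks a callback.
void errorIfAnchorsNotConnected(const std::vector<std::string> &anchors,
                                const std::map<std::string, Callback> &bindings) {
  const auto missing = anchorsNotConnected(anchors, bindings);
  if (missing.empty()) return;
  std::string msg = "Anchors without callbacks:";
  for (const auto &name : missing) msg += " " + name;
  throw std::runtime_error(msg);
}
```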
4.6. Managing queues of tensor data
An important function provided by the Session
class
is the builder function:
createQueueManager()
. It lets you create a
Session-managed
QueueManager
object that creates and controls a set
of tensor data queues, storing pointers to the dedicated user-memory slots. As
the management of the input and output buffers is driven asynchronously by the
IPU program execution flow and completely managed by
QueueManager
, the user can focus on other operations
(like data pre- or postprocessing) and enqueue inputs and outputs in the
preferred way.
// Assuming there is a session (Session) already created
model_runtime::QueueManager *queue_manager = session.createQueueManager();
A detailed description of QueueManager
can be found in Section 4.3, Handlers for model tensors.
Note
Session
takes full ownership of the created QueueManager
object. The lifetime of the created QueueManager
is tied to the Session
lifetime.
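The core idea of a queue of pointers to user-memory slots can be sketched with standard containers. InputSlotQueue below is hypothetical and much simpler than QueueManager: it handles a single input tensor, and it throws when empty, whereas a real queue manager could instead wait for the user to enqueue more data. The mutex reflects the assumption that the user code and the transfer callbacks may run on different threads.

```cpp
#include <cstddef>
#include <cstring>
#include <mutex>
#include <queue>
#include <stdexcept>

// Hypothetical single-tensor input queue: the user enqueues pointers to
// memory slots, and the transfer callback consumes them in FIFO order.
class InputSlotQueue {
public:
  explicit InputSlotQueue(std::size_t tensor_bytes) : bytes_(tensor_bytes) {}

  // User side: register the next slot holding one tensor's worth of data.
  void enqueue(const void *slot) {
    std::lock_guard<std::mutex> lock(mutex_);
    slots_.push(slot);
  }

  // Runtime side: what a transfer callback could do with the next slot.
  void consumeInto(void *dest) {
    const void *slot = nullptr;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (slots_.empty()) throw std::runtime_error("input queue is empty");
      slot = slots_.front();
      slots_.pop();
    }
    std::memcpy(dest, slot, bytes_); // copy outside the lock
  }

private:
  std::size_t bytes_;
  std::mutex mutex_;
  std::queue<const void *> slots_;
};
```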
4.7. Verification
Session
performs parameter and state verification in all its functions.
For example, on creation, Session
checks whether the model passed by the user was
saved with popef::Metadata::replicationFactor()
greater than 1 (see Replicated graphs),
which Session does not handle directly (see model_runtime::ModelRunner
replication handling for more details). If it was, an exception is thrown.
The correctness of the model loaded is checked as well. If, for example, any of the model tensors are configured to use remote buffers, but there is no tensor or feed data for the tensor present in the model’s PopEF file, an appropriate exception is thrown.
Another set of verification steps takes place in the “run program” methods:
runLoadPrograms() checks if the Executable was already loaded, which means the Session is bound to the selected Device and the Executable has been loaded onto it. If this condition is not satisfied, runLoadPrograms() performs the proper steps to reach the correct state. In addition, the method verifies if the user has connected callbacks to all the model Anchor objects. If not, it throws an exception with details of the exact error.
runMainPrograms() performs the same verification steps as runLoadPrograms(), plus it checks if runLoadPrograms() was already executed. If not, runLoadPrograms() gets called.
Note
Calling runMainPrograms() has a side effect impacting the Session closing phase: it makes Session call the runSavePrograms() method before unloading from the Device (see unloadFromDevice()) if runSavePrograms() was not called explicitly by the user before.
runSavePrograms() and runPrograms() perform the same verifications as runLoadPrograms().
Note
The Session methods runLoadPrograms(), runMainPrograms(), runSavePrograms() and runPrograms() may throw an exception if there are any model Anchor objects without connected callbacks left.
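The verification rules for the “run program” methods can be summarized as a small guard object. The class below is a hypothetical model of those rules (upload the executable on demand, refuse to run with unconnected anchors, run Load programs before Main programs, and run Save programs on unload if Main programs ran and Save programs were not called explicitly); it records steps instead of executing them.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of the run-program checks; it records steps only.
class RunChecksSketch {
public:
  explicit RunChecksSketch(bool all_anchors_connected)
      : connected_(all_anchors_connected) {}

  void runLoadPrograms() {
    verify();
    load_ran_ = true;
    events.push_back("load");
  }
  void runMainPrograms() {
    verify();
    if (!load_ran_) runLoadPrograms(); // Load programs first, if needed
    main_ran_ = true;
    events.push_back("main");
  }
  void runSavePrograms() {
    verify();
    save_ran_ = true;
    events.push_back("save");
  }
  void unloadFromDevice() {
    if (main_ran_ && !save_ran_) runSavePrograms(); // implicit Save
    executable_loaded_ = false;
    events.push_back("unload");
  }
  std::vector<std::string> events;

private:
  void verify() {
    if (!executable_loaded_) { // load the executable on demand
      executable_loaded_ = true;
      events.push_back("upload executable");
    }
    if (!connected_) throw std::runtime_error("anchor without callback");
  }
  bool connected_;
  bool executable_loaded_ = false;
  bool load_ran_ = false;
  bool main_ran_ = false;
  bool save_ran_ = false;
};
```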
For the setUserInputHandler()
and
setUserOutputHandler()
methods, the check is
simple: they verify if the executable was loaded before and load it if
needed.
An important verification step is also performed in the
stop()
method. It checks that the user
calls stop() after the Session
was activated (that is, the Device
is ready and
an executable was loaded onto it), and only then triggers stop on the Executable.
It also disconnects all the tensor data queues managed by QueueManager
if any were bound to the Session.