4. Session

The Session class is a composition of objects that provide the set of functionalities needed to:

bind to a given IPU device (or unbind from it on demand) (Section 4.1, Creating a session),
upload the user model onto the device (also re-upload it and unload it from the device) (Section 4.2, Uploading user model onto IPU),
set up handlers for input and output tensors used by the uploaded model (Section 4.3, Handlers for model tensors),
run programs defined in the executable representing the user model (Section 4.4, Running programs),
stop the executable already running on the device (Section 4.4, Running programs),
explicitly check if all Anchor objects defined for the model have their callbacks connected (Section 4.5, Retrieving information from Session),
provide the factory method (createQueueManager()) to create an internal QueueManager that simplifies and optimizes management of data transfers to and from the device (Section 4.6, Managing queues of tensor data).

Note

The PopEF Anchor class defines named model entry and exit points, used to transfer data from the host to the IPU and back, representing model tensors.

4.1. Creating a session

To create a Session class object, you need a model (created by your program or manually loaded from a PopEF file):

auto reader = std::make_shared<popef::Reader>();
popef::ModelBuilder builder(reader);
std::shared_ptr<popef::Model> model = builder.createModel();

model_runtime::Session session(model);

You can also use a set of paths to PopEF files storing the model (and its metadata) — in this case Session will handle loading the PopEF model internally:

const std::vector<std::string> popef_paths { paths, to, popef, files };

model_runtime::Session session(popef_paths);

Session class constructors also accept a second argument (config) that lets you set up the following creation and runtime options:

check_package_hash: flag to indicate whether or not to compare the user model version and the active Poplar runtime version.

Note

By default, the check_package_hash configuration option is set to true and this is the recommended setting. If your model and Poplar runtime versions mismatch, an exception is thrown from the Session constructor (unless check_package_hash was set to false). By default, Model Runtime does not provide compatibility if the version of Poplar in your system is different from the version used to compile and store a model to the PopEF.
policy: the policy associated with the device acquisition step.

Session needs a Device object to “talk” to the IPU. This object may get created in different ways, depending on the policy set up in the configuration:
- Immediate: the model_runtime::Device gets created automatically in the Session constructor with the help of the Session object’s model_runtime::DeviceManager object. The selected device is the physical IPU partition suitable for running the model. To fine-tune the Device creation process, another SessionConfig field may be set up, wait_config.
Note

If there is no IPU device suitable to run the model in your system, an exception is thrown and the Session object is not constructed.
- Deferred (default): Session does not create a Device. You need to do it explicitly and bind the created Session to the device by calling the bindToDevice() method. Without the device being bound to the Session, it cannot operate.
Note

If Device is successfully acquired with the Immediate creation policy, the Device is owned by the Session object it was created by. Session controls the lifetime of the device, including the proper destruction of the Device object.

Note

In the Deferred mode, you are responsible for proper destruction of the Device object.

wait_config: by default, Session throws an exception if it is not able to attach to any device suitable for the given model. This behavior can be changed by setting DeviceWaitConfig:
- strategy: controls how the session waits for device availability. It may be NO_WAIT (the default setting), WAIT_WITH_TIMEOUT (throws an exception once the timeout has elapsed) or WAIT_FOREVER.
- timeout: the WAIT_WITH_TIMEOUT parameter, in seconds.
- sleepTime: the sleep time between consecutive device attach attempts, in seconds.
For example, if you wish to wait up to 5 seconds for the device to become available (timeout) and you wish to check its availability every second (sleepTime), you can set the following configuration:
```
using namespace std::chrono_literals;
const DeviceWaitConfig wait_up_to_5s_config = {
  DeviceWaitStrategy::WAIT_WITH_TIMEOUT, // device waiting strategy.
  5s,                                    // timeout
  1s};                                   // sleep time
```
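
The interaction between strategy, timeout and sleepTime can be pictured as a simple retry loop. The sketch below is only an illustration of the behavior described above, not the Model Runtime implementation: WaitStrategy, WaitConfig and attachWithWait() are hypothetical stand-ins, and the attach attempt is passed in as a plain function.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <stdexcept>
#include <thread>

// Hypothetical stand-ins that mirror the names used in the text; they are
// not the Model Runtime types themselves.
enum class WaitStrategy { NO_WAIT, WAIT_WITH_TIMEOUT, WAIT_FOREVER };

struct WaitConfig {
  WaitStrategy strategy;
  std::chrono::milliseconds timeout;    // only used by WAIT_WITH_TIMEOUT
  std::chrono::milliseconds sleepTime;  // pause between attach attempts
};

// Calls tryAttach() until it succeeds or the strategy gives up.
// Returns the number of attempts made; throws if no device was attached.
int attachWithWait(const std::function<bool()> &tryAttach,
                   const WaitConfig &config) {
  assert(tryAttach && "an attach function is required");
  const auto start = std::chrono::steady_clock::now();
  int attempts = 0;
  for (;;) {
    ++attempts;
    if (tryAttach()) {
      return attempts;
    }
    if (config.strategy == WaitStrategy::NO_WAIT) {
      throw std::runtime_error("no suitable device available");
    }
    if (config.strategy == WaitStrategy::WAIT_WITH_TIMEOUT &&
        std::chrono::steady_clock::now() - start >= config.timeout) {
      throw std::runtime_error("timed out waiting for a device");
    }
    std::this_thread::sleep_for(config.sleepTime);
  }
}
```

In this sketch the timeout is checked after each failed attempt, so a short sleepTime gives more attempts within the same timeout at the cost of more frequent polling.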

pred_tensor_data: a control mechanism for the use of tensor data sources. More details on tensor data transfer can be found in Section 4.3, Handlers for model tensors.

There are two policies available (see: PopefDataUsagePolicy):
- USE_POPEF_DATA_IF_ANY: bind the tensor or feed data to Anchor if it exists in the PopEF files specified by the user.
- USE_USER_DATA: do not bind the tensor or feed data to Anchor. Enforce binding of the user-defined callback.

To simplify the use of predicates, Model Runtime delivers a set of predicate producer functions gathered under the predicate_factory namespace.

For example, if you want to provide your own values for the tensors loaded during Load programs execution instead of any tensor data that PopEF delivers for them (for example to load your own model weights), the following predicate factory method may be used to create the required predicate:

// Assuming there is a model (popef::Model) prepared/loaded in advance
const auto &programFlow = model.metadata.programFlow();
const auto accept_policy =
  PopefDataUsagePolicy::USE_USER_DATA;               // bind a user callback to every Anchor
                                                     // that is "owned" by Load programs
const auto reject_policy =
  PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY;       // use PopEF data, if any, for the remaining Anchors

const auto user_data_for_load_program_tensors_predicate =
    model_runtime::predicate_factory::popef_data_usage::predProgramFlowLoad(
        programFlow, accept_policy, reject_policy);
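
Conceptually, a predicate of this kind is a function that maps an Anchor to a policy, and a predicate factory builds such a function from a rule. The following sketch shows the idea with stand-in types only: DataPolicy, AnchorStub and makeLoadPredicate() are hypothetical names, and the set of Load-program anchor names is supplied by hand rather than derived from a program flow.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>

// Hypothetical stand-ins; the real Anchor and policy types live in PopEF and
// Model Runtime and carry more information than shown here.
enum class DataPolicy { USE_POPEF_DATA_IF_ANY, USE_USER_DATA };

struct AnchorStub {
  std::string name;
};

using DataPredicate = std::function<DataPolicy(const AnchorStub &)>;

// Builds a predicate that returns acceptPolicy for anchors named in
// loadAnchors and rejectPolicy for every other anchor.
DataPredicate makeLoadPredicate(const std::set<std::string> &loadAnchors,
                                DataPolicy acceptPolicy,
                                DataPolicy rejectPolicy) {
  assert(!loadAnchors.empty() && "expected at least one Load-program anchor");
  // The set is copied into the lambda so the predicate stays valid after
  // the caller's set goes out of scope.
  return [loadAnchors, acceptPolicy, rejectPolicy](const AnchorStub &anchor) {
    return loadAnchors.count(anchor.name) != 0 ? acceptPolicy : rejectPolicy;
  };
}
```

Because the predicate is an ordinary callable, a hand-written lambda with the same shape could be used wherever a factory-made one is not flexible enough.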

The complete setup of SessionConfig may look as follows:

// Assuming wait_up_to_5s_config (DeviceWaitConfig) was defined before
model_runtime::SessionConfig session_config;   // instantiate SessionConfig object
session_config.policy =
  model_runtime::LaunchPolicy::Immediate;      // acquire device immediately
session_config.pred_tensor_data =
  user_data_for_load_program_tensors_predicate;// use the predicate created
session_config.check_package_hash = true;      // perform version compatibility check
session_config.wait_config = wait_up_to_5s_config;

4.2. Uploading user model onto IPU

With the Device and Poplar Engine objects, Session has all the tools to reserve the IPU device for its usage, upload the code representing the user model onto the reserved IPU and set up all the transfer channels for exchange of model tensor data.

In the case of the Immediate mode, all these steps take place in the Session constructor (assuming the Device object was created properly) and all you have to ensure is availability of suitable and properly configured IPU devices in the system.

Note

As Immediate is not the default SessionConfig mode, you have to explicitly configure Session to use it.

// Assuming there is a model (popef::Model) prepared/loaded in advance
model_runtime::SessionConfig config{model_runtime::LaunchPolicy::Immediate};
model_runtime::Session session(model, config);

In the Deferred mode, you are responsible for creating the Device (using DeviceManager) and binding the Session to the device using the bindToDevice() method.

model_runtime::DeviceManager devices;
auto device = devices.getDevice(model); // acquire device suitable for the model

model_runtime::Session session(model);  // create Session object, by default in Deferred Device creation mode

session.bindToDevice(device);           // explicitly bind the session to the device

Note

Calling bindToDevice() on a Session that is already bound has a consequence. If the Session is already connected to a device and is running a Main program, the Save programs are triggered. Session waits until Save program execution ends before binding to the new device.

Note

It makes no difference whether the device passed to a repeated bindToDevice() call is the same as the one currently used by the session instance. To perform a proper binding, all the steps, including reloading the executable onto the device, have to take place.

You can also explicitly unload the Session from the Device it was bound to, using the unloadFromDevice() method (if there was no device bound, unloadFromDevice() throws an error). As with rebinding to another device, the Save program may be run (see Section 4.7, Verification for more details).

To bind the session again to the previously set up Device and upload the poplar::Executable onto it, you can use reload().

// Assuming there is a session (Session) already created and the user provided device
session.bindToDevice(device);
// ...
session.unloadFromDevice(); // unload the session from the device
// ...
session.reload();           // rebind the session to the device and reload the executable
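
The binding rules above can be summarized as a small state machine. The toy class below is a sketch under stated assumptions, not the Session implementation: SessionSketch and its string device identifier are hypothetical, and the behavior of reload() when no device was ever set is an assumption made for the sake of the example.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Toy model of the binding states described above. SessionSketch and its
// string device id are hypothetical; they only mimic the documented rules.
class SessionSketch {
public:
  void bindToDevice(const std::string &deviceId) {
    assert(!deviceId.empty() && "a device id is required");
    device_ = deviceId;  // remember the device for a later reload()
    loaded_ = true;      // binding always (re)loads the executable,
    ++loadCount_;        // even if the device is the one already in use
  }

  void unloadFromDevice() {
    if (device_.empty()) {
      throw std::logic_error("no device bound");
    }
    loaded_ = false;  // the device is remembered but no longer in use
  }

  void reload() {
    if (device_.empty()) {
      // Assumption of this sketch: reload() needs a previously set device.
      throw std::logic_error("no previously set up device");
    }
    loaded_ = true;
    ++loadCount_;
  }

  bool isLoaded() const { return loaded_; }
  int loadCount() const { return loadCount_; }

private:
  std::string device_;
  bool loaded_ = false;
  int loadCount_ = 0;
};
```

Binding twice to the same device increments loadCount() twice in this model, which mirrors the point that a repeated bindToDevice() still reloads the executable.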

4.3. Handlers for model tensors

User models operate on two types of tensors: state and input/output.

State tensors are, for example, model weights or other model parameters that have to be put on the IPU once (conceptually during Load program execution) before the execution of computations.

Note

Tensors transferred out of the IPU in Save programs also fall in the state tensors category.

Input/output tensors have to be transferred to the IPU from the host for a computational round to take place. They are fetched during the execution of the Main program.

Note

For instructions on how to determine the information about model tensors, refer to PopEF file analysis.

One possible way of storing state tensor data is to compile them as constants into the IPU assembler code representing the user model. In this case, when the Session object uploads the executable onto its bound Device, the tensor data is also transferred to the IPU memory, so no other transfers are needed.

Another way is to save the tensor data inside the model’s PopEF file. By default, if there is tensor or feed data for a particular tensor in PopEF, when binding to a device, Session sets up a transfer stream between the tensor and its tensor or feed data. When the running IPU program requests the transfer of the tensor data from the host, it gets exactly the bytes stored in the connected tensor or feed data.

To override this default behavior and select which tensors the corresponding tensor or feed data is to be used for (or not used), you can prepare a predicate: a function returning the desired AnchorCallbackPolicy for the Anchor passed in.

There are three policies available (see: AnchorCallbackPolicy):

BIND_USER_CB: do not bind the tensor or feed data to Anchor. Enforce binding of the user-defined callback.

BIND_EMPTY_CB: bind an empty callback. No tensor or feed data or user data will be transferred to the IPU. The Poplar runtime just transfers random bytes from its memory (in the case of an input tensor; if the tensor is an output, the data transferred from the IPU is not accessible by the user). This option may be especially helpful in the case of tensors that are not of interest to the user, for example tensors returned during the execution of the Save program (for more details about Save programs refer to Section 4.4, Running programs).

SKIP_CB: skip without changing the Anchor object’s callback binding.

// Assuming there is a model (popef::Model) prepared/loaded in advance
const auto &programFlow = model.metadata.programFlow();
const auto accept_policy =
  AnchorCallbackPolicy::BIND_EMPTY_CB; // bind an empty callback to every Anchor
                                       // that is "owned" by Save programs
const auto reject_policy =
  AnchorCallbackPolicy::SKIP_CB;       // skip assigning callback to remaining Anchors

const auto empty_cb_for_save_program_tensors_predicate =
  model_runtime::predicate_factory::anchor_callbacks::predProgramFlowSave(
    programFlow, accept_policy, reject_policy);

The last option for feeding data to or from model tensors is to set up user callbacks (CallbackHandle). A tensor callback is a functor that accepts a pointer to memory on which it is to operate. In the case of tensors that are to be uploaded onto the IPU, its task is to copy the tensor data to the address in the pointer. For tensors transferred from the IPU to the host, the functor’s task is to copy the data from the memory to the user structure (or generally to do any other operation on the data it has access to).

Note

As the tensor data of each model may be transferred to and from different memory locations, there is a separate callback needed for each Anchor representing a particular tensor.

To assign a callback to an Anchor, you can use the setCallbackForAnchor() method.

// Assuming there is a session (Session) already created
const std::string input_anchor_handle =
  "some_input_anchor_handle";  // input anchor handle string - can be read from PopEF
const auto input_callback =    // input tensor data callback
[&user_space_for_input_tensor](void *dest) {
  auto &io = user_space_for_input_tensor;
  std::memcpy(dest, io.data(), io.size() * sizeof(float)); // copy data from user memory location to dest
  printf("Copying input from host to IPU"); // perform extra operations - for example print to console
};
const std::string output_anchor_handle =
  "some_output_anchor_handle";  // output anchor handle string - can be read from PopEF
const auto output_callback = [&user_space_for_output_tensor](void *src) {
  auto &io = user_space_for_output_tensor;
  std::memcpy(io.data(), src, io.size() * sizeof(float));  // copy data from src to the user memory location
  printf("Copying output from IPU to host");
};

// Assign callbacks for the selected Anchors
session.setCallbackForAnchor(input_anchor_handle, input_callback);
session.setCallbackForAnchor(output_anchor_handle, output_callback);
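
To see the two callback directions in isolation, the following self-contained sketch runs one simulated transfer round without any IPU or Session. It assumes a callback shape of std::function<void(void *)>; TensorCallback, makeInputCallback(), makeOutputCallback() and simulateRound() are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

using TensorCallback = std::function<void(void *)>;  // assumed callback shape

// Input callback: copies the user's data into the address it is handed.
TensorCallback makeInputCallback(const std::vector<float> &source) {
  return [&source](void *dest) {
    std::memcpy(dest, source.data(), source.size() * sizeof(float));
  };
}

// Output callback: copies from the address it is handed into user memory.
TensorCallback makeOutputCallback(std::vector<float> &destination) {
  return [&destination](void *src) {
    std::memcpy(destination.data(), src, destination.size() * sizeof(float));
  };
}

// Stand-in for one transfer round: the "device" buffer doubles each element.
void simulateRound(const TensorCallback &input, const TensorCallback &output,
                   std::size_t elementCount) {
  assert(input && output && "both callbacks must be set");
  std::vector<float> deviceBuffer(elementCount);
  input(deviceBuffer.data());   // host -> "device"
  for (float &value : deviceBuffer) {
    value *= 2.0f;
  }
  output(deviceBuffer.data());  // "device" -> host
}
```

Both callbacks capture the user buffers by reference, so those buffers must outlive every call to the callbacks.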

CallbackFactory gathers all the callbacks and dispatches them per Anchor. This is a functor that accepts CallbackInfo (that stores an Anchor) and returns a callback specific to that Anchor.

To assign a user CallbackFactory to handle model tensor data transfers, you have to use setUserInputHandler() (for tensors transferred from host to IPU) or setUserOutputHandler() (for tensors transferred from IPU to host). Both accept two extra arguments — anchorCallbackPredicate (which is a predicate described in Section 4.1, Creating a session) and skip_connected that simplifies skipping Anchor objects with their handlers already set up.

// Assuming there is a session (Session) already created
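// The explicit return type is needed because each branch returns a different
// lambda type; std::function<void(void *)> is assumed here to be compatible
// with the callback type the Session expects.
const auto output_factory =
    [](const model_runtime::CallbackInfo &ci) -> std::function<void(void *)> {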
    const std::string& name = ci.anchor.name();
    if (name == "some_anchor_handle") {
      const auto some_anchor_handle_callback = [](void *src) {
        // do some operations on src
      };
      return some_anchor_handle_callback;
    } else if (name == "some_other_anchor_handle") {
      const auto some_other_anchor_handle_callback = [](void *src) {
        // do some operations on src
      };
      return some_other_anchor_handle_callback;
    } else {
      const auto just_print_out_callback = [&name](void *) {
        std::cout << "Callback for " << name << " called";
      };
      return just_print_out_callback;
    }
};
const auto &predicate = empty_cb_for_save_program_tensors_predicate;
const bool skip_connected = true;  // default value for skip_connected parameter
session.setUserOutputHandler(output_factory, predicate, skip_connected);
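
The effect of skip_connected can be illustrated with a small stand-alone model of the dispatch step. This is a sketch under assumptions, not the Session logic: CallbackInfoStub, Factory, makeRecordingFactory() and connectAnchors() are hypothetical, and anchors are represented by their names only.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

using TensorCallback = std::function<void(void *)>;

// Hypothetical stand-in for CallbackInfo: only the anchor name is modelled.
struct CallbackInfoStub {
  std::string anchorName;
};
using Factory = std::function<TensorCallback(const CallbackInfoStub &)>;

// Example factory: each callback records the name of the anchor it serves.
Factory makeRecordingFactory(std::vector<std::string> &log) {
  return [&log](const CallbackInfoStub &info) -> TensorCallback {
    const std::string name = info.anchorName;  // copy: outlives this call
    return [&log, name](void *) { log.push_back(name); };
  };
}

// Asks the factory for a callback per anchor. With skipConnected set,
// anchors that already have a callback keep it. Returns how many were set.
int connectAnchors(const std::vector<std::string> &anchorNames,
                   const Factory &factory, bool skipConnected,
                   std::map<std::string, TensorCallback> &connected) {
  assert(factory && "a callback factory is required");
  int assigned = 0;
  for (const std::string &name : anchorNames) {
    if (skipConnected && connected.count(name) != 0) {
      continue;  // leave the existing handler untouched
    }
    connected[name] = factory(CallbackInfoStub{name});
    ++assigned;
  }
  return assigned;
}
```

Returning std::function from the factory gives every branch a common type, which is what lets one factory hand out differently implemented callbacks.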

Note

It is worth emphasising that Model Runtime supports Poplar Remote memory buffers. From the user perspective, this feature is transparent. If the model has any Anchor objects configured to use the remote buffers, Session will properly configure them under the hood. The user refers to such Anchor objects just as for regular ones.

4.4. Running programs

After creating a properly configured Session object, binding a proper Device to it and setting up model tensor data handlers, everything is ready to run the first program.

PopEF groups Poplar Programs into three groups: Load, Main and Save. Each group can consist of several Poplar programs that will be executed in the programmed sequence when the group is run. The purpose of each group is as follows:

Load programs carry out all the tasks that are supposed to be executed before the model computation, such as triggering the transfer of state tensor data from the host (see Section 4.3, Handlers for model tensors) or generating tensor data directly on the IPU (for example, see Randomgen).
Main programs (in most cases there is a single Main program) perform the chain of computations representing the user computational model. During the execution, input and output tensors are transferred between the host and the IPU.
Save programs are very rarely used in the case of inference models. Their main purpose is to transfer the model state tensor data from the IPU to the host (not to be confused with model outputs).

Note

Save programs are normally executed during model training, to download the updated weights and cycle counters. They are rarely used in inference, but still handled by Model Runtime.

By using the Session API, you can easily run a Program group or a single Program on the device bound to the session:

// Assuming there is a session (Session) already created
session.runLoadPrograms();
session.runMainPrograms();
session.runSavePrograms();

const std::vector program_numbers = {1, 2, 3}; // to be read from PopEF
session.runPrograms(program_numbers);

Note

For instructions on how to determine the information about model programs, refer to PopEF file analysis.

To stop the running program, you can call the stop() method. Note, however, that stopping does more than halt program execution. Internally, poplar::Engine::stop() is called, which leaves the device in an undefined state. Also, if there is any QueueManager bound to the session, all its queues get disconnected.

Note

To reinstate the session after stop(), you have to call reload() before any further actions.

4.5. Retrieving information from Session

To help you prepare callbacks for in-model defined tensors, Session provides a set of getter functions, returning Anchor objects that refer to the user model tensors.

Note

The functions getUserInputAnchors() and getUserOutputAnchors() return Anchor objects that you have to define callbacks for, while getInputAnchors() returns all Anchor objects, including those bound to a data source of PopEF origin (Section 4.3, Handlers for model tensors).

// Assuming there is a session (Session) already created
const int MAX_SIZE = 1024;        // max size of a single model output, in bytes
std::vector<const popef::Anchor *> output_anchors = session.getUserOutputAnchors();
const size_t number_of_outputs = output_anchors.size();

// prepare outputs buffer, based on the number of the model outputs
std::vector<std::array<std::byte, MAX_SIZE>> outputs(number_of_outputs);

// prepare a helper object for mapping anchor names to the user memory slots
std::map<std::string, std::byte *> handles_to_outputs_map;
for (size_t idx = 0; idx < number_of_outputs; idx++) {
  const std::string &anchor_name = output_anchors[idx]->name();  // anchors are held by pointer
  handles_to_outputs_map.emplace(anchor_name, outputs[idx].data());
}

// create a callback factory; the map is captured by reference, so it has to
// outlive the factory
const auto output_factory = [&handles_to_outputs_map](const model_runtime::CallbackInfo &ci) {
    const std::string &name = ci.anchor.name();
    const popef::TensorInfo::ShapeDimType size = ci.anchor.tensorInfo().sizeInBytes();
    std::byte *dst = handles_to_outputs_map.at(name);  // writable slot for this anchor

    // based on the tensor name, the proper slot in the user buffer is used
    const auto cb = [dst, size](void *src) {
      std::memcpy(dst, src, size);
    };
    return cb;
};

The Session class also provides functions that let you manually check if all the model-defined anchors (“pointing” to the model tensors) were connected to callbacks: anchorsNotConnectedToCallbacks() and errorIfAnchorsAreNotConnectedToCallbacks().
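
The idea behind such a check is a comparison between the anchors the model defines and the anchors that already have a callback. The sketch below models it with plain strings; findUnconnected() and throwIfUnconnected() are hypothetical helpers written for this illustration and do not reproduce the signatures of the Session methods.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Returns the anchors from allAnchors that have no entry in connected,
// preserving their original order.
std::vector<std::string> findUnconnected(
    const std::vector<std::string> &allAnchors,
    const std::set<std::string> &connected) {
  std::vector<std::string> missing;
  for (const std::string &name : allAnchors) {
    if (connected.count(name) == 0) {
      missing.push_back(name);
    }
  }
  assert(missing.size() <= allAnchors.size());
  return missing;
}

// Throws with the list of offending anchor names if any anchor is left
// without a callback; does nothing otherwise.
void throwIfUnconnected(const std::vector<std::string> &allAnchors,
                        const std::set<std::string> &connected) {
  const std::vector<std::string> missing =
      findUnconnected(allAnchors, connected);
  if (missing.empty()) {
    return;
  }
  std::string message = "anchors without callbacks:";
  for (const std::string &name : missing) {
    message += " " + name;
  }
  throw std::runtime_error(message);
}
```

Listing the offending names in the exception message makes it easier to see which callbacks still have to be provided.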

4.6. Managing queues of tensor data

An important function provided by the Session class is the builder function createQueueManager(). It lets you create a Session-managed QueueManager object that creates and controls a set of tensor data queues, storing pointers to the dedicated user-memory slots. As the management of the input and output buffers is driven asynchronously by the IPU program execution flow and completely managed by QueueManager, you can focus on other operations (like data pre- or postprocessing) and enqueue inputs and outputs in the preferred way.

// Assuming there is a session (Session) already created
model_runtime::QueueManager *queue_manager = session.createQueueManager();

A detailed description of QueueManager can be found in Section 4.3, Handlers for model tensors.

Note

Session takes full ownership of the created QueueManager object. The lifetime of the created QueueManager is closely related to the Session lifetime.
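
The notion of a queue that stores pointers to user-memory slots, rather than copies of the data, can be sketched as follows. This is a deliberately simplified, single-threaded illustration with hypothetical names (Slot, InputQueueSketch); it says nothing about the QueueManager API and leaves out the asynchronous, device-driven side entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <deque>

// A slot only refers to user memory; the queue never owns or copies it
// until the transfer actually happens.
struct Slot {
  const void *data;
  std::size_t sizeInBytes;
};

class InputQueueSketch {
public:
  // Registers a user-memory slot for a future transfer.
  void enqueue(const void *data, std::size_t sizeInBytes) {
    assert(data != nullptr && "a slot must point at user memory");
    slots_.push_back(Slot{data, sizeInBytes});
  }

  // Copies the oldest pending slot into dest, in the way an input callback
  // would. Returns false when nothing is queued.
  bool transferNext(void *dest) {
    if (slots_.empty()) {
      return false;
    }
    const Slot slot = slots_.front();
    slots_.pop_front();
    std::memcpy(dest, slot.data, slot.sizeInBytes);
    return true;
  }

  std::size_t pending() const { return slots_.size(); }

private:
  std::deque<Slot> slots_;
};
```

Since only pointers are stored, the user memory behind each slot has to stay valid until the corresponding transfer has taken place.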

4.7. Verification

Session performs parameter and state verification in all its functions.

For example, on creation, Session checks whether the model passed by the user was saved with popef::Metadata::replicationFactor() greater than 1 (see Replicated graphs). Session does not handle replication directly (see model_runtime::ModelRunner replication handling for more details), so it throws an exception for such a model.

The correctness of the model loaded is checked as well. If, for example, any of the model tensors are configured to use remote buffers, but there is no tensor or feed data for the tensor present in the model’s PopEF file, an appropriate exception is thrown.

Another set of verification steps takes place in the “run program” methods:

runLoadPrograms() checks if the Executable was already loaded, which means the selected Device has a Session bound and the Executable has been loaded onto it. If this condition is not satisfied, runLoadPrograms() performs the proper steps to reach the correct state. In addition, the method verifies if the user has connected callbacks to all the model Anchor objects. If not, it throws an exception with details of the exact error.
runMainPrograms() performs the same verification steps as runLoadPrograms(), plus it checks if runLoadPrograms() was already executed. If not, runLoadPrograms() gets called.

Note

Calling runMainPrograms() has a side effect impacting the Session closing phase: runSavePrograms() is called before the Session is unloaded from the Device (see unloadFromDevice()), unless you have already called it explicitly.

runSavePrograms() and runPrograms() perform the same verifications as runLoadPrograms().

Note

The Session methods runLoadPrograms(), runMainPrograms(), runSavePrograms() and runPrograms() may throw an exception if any model Anchor objects are left without connected callbacks.
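
The run-program checks described above can be condensed into a toy model. RunChecksSketch is a hypothetical class that only mimics the documented ordering rules (callbacks verified first, Load run before Main if needed, Save run on unload after Main); it is a sketch, not the Session implementation.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model of the checks described above; not the Session implementation.
class RunChecksSketch {
public:
  explicit RunChecksSketch(int unconnectedAnchors)
      : unconnected_(unconnectedAnchors) {
    assert(unconnectedAnchors >= 0 && "anchor count cannot be negative");
  }

  void connectAll() { unconnected_ = 0; }

  void runLoadPrograms() {
    verify();
    loadRan_ = true;
    log_.push_back("load");
  }

  void runMainPrograms() {
    verify();
    if (!loadRan_) {
      runLoadPrograms();  // Load has to run before Main
    }
    mainRan_ = true;
    log_.push_back("main");
  }

  void runSavePrograms() {
    verify();
    saveRan_ = true;
    log_.push_back("save");
  }

  void unloadFromDevice() {
    if (mainRan_ && !saveRan_) {
      runSavePrograms();  // side effect of having run Main
    }
    log_.push_back("unload");
  }

  const std::vector<std::string> &log() const { return log_; }

private:
  // Throws if any anchor is still without a callback.
  void verify() const {
    if (unconnected_ > 0) {
      throw std::runtime_error("anchors without callbacks: " +
                               std::to_string(unconnected_));
    }
  }

  int unconnected_;
  bool loadRan_ = false;
  bool mainRan_ = false;
  bool saveRan_ = false;
  std::vector<std::string> log_;
};
```

In this model, calling runMainPrograms() on a fresh object and then unloadFromDevice() produces the sequence load, main, save, unload.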

For setUserInputHandler() and setUserOutputHandler(), the check is simple: the methods verify whether the executable was loaded before, and load it if needed.

An important verification step is also performed in the stop() method. It checks that stop() is called after the Session was activated (that is, the Device is ready and an executable was loaded onto it) and only then triggers stop on the Executable. It also disconnects all the tensor data queues managed by QueueManager if any were bound to Session.
