4. Sessions

The Session class forms the basis of the low-level Model Runtime API, which gives you more flexibility but requires more knowledge about PopEF files, the Poplar runtime and the IPU hardware.

Note

The term “device” is used to indicate an abstraction of one or more physical IPUs that can execute code. See Section 5, Managing devices for more information.

The Session class is a composition of objects that:

4.1. Creating a session

To create a Session object, you need a model (created by your program or manually loaded from a PopEF file):

auto reader = std::make_shared<popef::Reader>();
popef::ModelBuilder builder(reader);
std::shared_ptr<popef::Model> model = builder.createModel();

model_runtime::Session session(model);

You can also use a set of paths to PopEF files storing the model (and its metadata) — in this case Session will handle loading the PopEF model internally:

const std::vector<std::string> popef_paths { paths, to, popef, files };

model_runtime::Session session(popef_paths);

4.1.1. Session configuration options

Session class constructors also accept a second argument (config) that lets you set up the following creation and runtime options:

  • check_package_hash: Flag to indicate whether or not to compare your model version and the active Poplar runtime version.

    Note

    By default, the check_package_hash configuration option is set to true and this is the recommended setting. If there is a mismatch between the versions of your model and the active Poplar runtime, an exception is thrown from the Session constructor (unless you set check_package_hash to false). By default, Model Runtime does not provide compatibility if the active version of Poplar in your system is different from the version used to compile and store the model in PopEF.

  • policy: The policy associated with the device acquisition step.

    Session needs a Device object to communicate with the IPU. This object may get created in different ways, depending on the policy set up in the configuration:

    • Immediate: The model_runtime::Device object gets created automatically in the Session constructor with the help of the Session object’s model_runtime::DeviceManager object. The selected device is the physical IPU or group of IPUs suitable for running the model. To fine-tune the Device creation process, another SessionConfig field may be set up, wait_config.

      Note

      If there is no IPU hardware suitable for running the model in your system, an exception is thrown and the Session object is not constructed.

    • Deferred (default): Session does not create a Device object. You need to do it explicitly and bind the created Session object to the device by calling the bindToDevice() method. Session needs a bound Device object to operate.

      Note

      If a Device is successfully acquired with the Immediate creation policy, the Device is owned by the Session object that created it. Session controls the lifetime of the Device object, including its proper destruction. In the Deferred mode, you are responsible for the proper destruction of the Device object.

  • wait_config: By default, Session throws an exception if it is not able to attach to any device suitable for the given model. This behaviour can be changed by setting DeviceWaitConfig:

    • strategy: Controls how the session waits for device availability. It may be NO_WAIT (the default setting), WAIT_WITH_TIMEOUT (which throws an exception once the timeout has elapsed) or WAIT_FOREVER.

    • timeout: The WAIT_WITH_TIMEOUT parameter, in seconds.

    • sleepTime: The sleep time between consecutive device attach attempts, in seconds.

    For example, if you wish to wait up to 5 seconds for the device to become available (timeout) and you wish to check its availability every second (sleepTime), you can set the following configuration:

    using namespace std::chrono_literals;
    const DeviceWaitConfig wait_up_to_5s_config = {
      DeviceWaitStrategy::WAIT_WITH_TIMEOUT, // device waiting strategy.
      5s,                                    // timeout
      1s};                                   // sleep time
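
The waiting behaviour described by these options can be pictured as a retry loop. The following is a minimal, standard-library-only sketch and not the Model Runtime implementation: WaitStrategy, WaitConfig and acquireWithWait are hypothetical stand-ins for DeviceWaitStrategy, DeviceWaitConfig and the internal attach logic, and try_attach stands in for a single device attach attempt.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-ins named after the options described above.
enum class WaitStrategy { NO_WAIT, WAIT_WITH_TIMEOUT, WAIT_FOREVER };

struct WaitConfig {
  WaitStrategy strategy = WaitStrategy::NO_WAIT;
  std::chrono::milliseconds timeout{0};
  std::chrono::milliseconds sleepTime{0};
};

// Calls try_attach until it succeeds or the strategy gives up.
// Returns true on success; a real session would throw instead of returning false.
inline bool acquireWithWait(const WaitConfig &cfg,
                            const std::function<bool()> &try_attach) {
  using Clock = std::chrono::steady_clock;
  const auto deadline = Clock::now() + cfg.timeout;
  while (true) {
    if (try_attach()) return true;
    if (cfg.strategy == WaitStrategy::NO_WAIT) return false;
    if (cfg.strategy == WaitStrategy::WAIT_WITH_TIMEOUT &&
        Clock::now() >= deadline) {
      return false;
    }
    std::this_thread::sleep_for(cfg.sleepTime); // pause between attempts
  }
}
```

With NO_WAIT the callable is tried exactly once; with WAIT_WITH_TIMEOUT it is retried every sleepTime until the deadline passes.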
    

4.1.2. Session predicates

To simplify the use of predicates, Model Runtime has a set of predicate producer functions in the predicate_factory namespace.

For example, if you want to provide your own values for the tensors loaded during the execution of Load programs instead of any tensor data that PopEF makes available for them (for example to load your own model weights), the following predicate factory method may be used to create the required predicate:

// Assuming there is a model (popef::Model) prepared/loaded in advance
const auto &programFlow = model.metadata.programFlow();
const auto accept_policy =
  PopefDataUsagePolicy::USE_USER_DATA;               // bind a user callback to every Anchor
                                                     // that is "owned" by Load programs
const auto reject_policy =
  PopefDataUsagePolicy::USE_POPEF_DATA_IF_ANY;       // skip assigning callback to remaining Anchors

const auto user_data_for_load_program_tensors_predicate =
    model_runtime::predicate_factory::popef_data_usage::predProgramFlowLoad(
        programFlow, accept_policy, reject_policy);

The complete setup of SessionConfig may look as follows:

// Assuming we have wait_up_to_5s_config (DeviceWaitConfig) defined before
model_runtime::SessionConfig session_config;   // instantiate SessionConfig object
session_config.policy =
  model_runtime::LaunchPolicy::Immediate;      // acquire device immediately
session_config.pred_tensor_data =
  user_data_for_load_program_tensors_predicate;// use the predicate created
session_config.check_package_hash = true;      // perform version compatibility check
session_config.wait_config = wait_up_to_5s_config;

4.2. Uploading your model to the IPU

With the Device and poplar::Engine objects, Session has all it needs to reserve the necessary IPUs, upload the model code to that device, and set up all the transfer channels for exchange of model tensor data.

In the Immediate mode, all these steps take place in the Session constructor (assuming the Device object was created properly) and all you have to do is ensure the availability of suitable and properly configured IPUs in the system.

Note

As Immediate is not the default SessionConfig mode, you have to explicitly configure Session to use it.

// Assuming there is a model (popef::Model) prepared/loaded in advance
model_runtime::SessionConfig config{model_runtime::LaunchPolicy::Immediate};
model_runtime::Session session(model, config);

In the Deferred mode, you are responsible for creating the Device object (using DeviceManager) and for binding the Session instance to the device using the bindToDevice() method.

model_runtime::DeviceManager devices;

// Acquire a device suitable for the model
auto device = devices.getDevice(model);

// Create Session object, by default in deferred device creation mode
model_runtime::Session session(model);

// Explicitly bind the session to the device
session.bindToDevice(device);

Note

If the session is already connected to a device and running a Main program, then attempting to bind it to a new device will start the execution of the Save program. The session will wait until the Save program execution ends before binding to the new device.

Note

You can call bindToDevice() with the same device as is currently being used by the session. To successfully bind to a device, all the steps, including reloading the executable onto the device, have to take place.

You can also explicitly unload the session from the Device it is bound to using the unloadFromDevice() method, which throws an exception if the session isn’t bound to a device. As with rebinding to another device, the Save program may be run (see Section 4.7, Verification for more details).

To bind the session again to the previously set up Device object and upload the poplar::Executable to it, you can use reload().

// Assuming there is a session (Session) already created and the user provided device
session.bindToDevice(device);
// ...
session.unloadFromDevice(); // unload the session from the device
// ...
session.reload();           // rebind the session to the device and reload the executable

4.3. Handlers for model tensors

User models operate on two types of tensors: state and input/output.

State tensors are, for example, model weights or other model parameters that have to be transferred to the IPU once (conceptually during Load program execution) before the execution of any computation.

Tensors transferred out of the IPU in Save programs also fall into the state tensor category.

Input or output tensors have to be transferred to the IPU from the host for a computational round to take place. They are fetched during the execution of the Main program.

Note

Refer to PopEF file analysis for how to display information about model tensors.

One possible way of storing state tensor data is to compile them as constants into the IPU code representing your model. In this case, when the Session object uploads the executable onto its bound device, the tensor data is also transferred to the IPU memory, so no other transfers are needed.

Another way is to save the tensor data inside the model’s PopEF file. By default, if there is tensor or feed data for a particular tensor in PopEF, when binding to a device, Session sets up a transfer stream between the tensor and its tensor or feed data. When the running IPU program requests the transfer of the tensor data from the host, it gets exactly the bytes stored in the connected tensor or feed data.

To override this default behaviour and select which tensors the corresponding tensor or feed data is to be used for (or not used), you can prepare a predicate: a function returning the desired anchor callback policy for the Anchor passed in.

There are three policies defined in AnchorCallbackPolicy:

  • BIND_USER_CB: Do not bind the tensor or feed data to Anchor. Enforce binding of the user-defined callback.

  • BIND_EMPTY_CB: Bind an empty callback. No tensor or feed data or user data will be transferred to the IPU. In the case of an input tensor, the Poplar runtime transfers random bytes from its memory. For an output tensor, the data transferred from the IPU is not accessible by your model. This callback policy may be especially helpful in the case of tensors that are not of interest to you, for example tensors returned during the execution of the Save program (see Section 4.4, Running programs for more about Save programs).

  • SKIP_CB: Skip without changing the Anchor object’s callback binding.

// Assuming there is a model (popef::Model) prepared/loaded in advance
const auto &programFlow = model.metadata.programFlow();
const auto accept_policy =
  AnchorCallbackPolicy::BIND_EMPTY_CB; // bind an empty callback to every Anchor
                                       // that is "owned" by Save programs
const auto reject_policy =
  AnchorCallbackPolicy::SKIP_CB;       // skip assigning callback to remaining Anchors

const auto empty_cb_for_save_program_tensors_predicate =
  model_runtime::predicate_factory::anchor_callbacks::predProgramFlowSave(
    programFlow, accept_policy, reject_policy);
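
To make the idea of a predicate concrete, here is a minimal sketch of what a predicate factory of this kind might do conceptually. It is not the library code: CallbackPolicy, AnchorInfo and makeSavePredicate are hypothetical stand-ins, and the assumption that each anchor records the indices of the programs that use it is made purely for illustration.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real types live in popef and model_runtime.
enum class CallbackPolicy { BIND_USER_CB, BIND_EMPTY_CB, SKIP_CB };

struct AnchorInfo {
  std::string name;
  std::vector<int> programs; // indices of programs using this tensor (assumed)
};

using AnchorPredicate = std::function<CallbackPolicy(const AnchorInfo &)>;

// Returns a predicate yielding `accept` for anchors used by any program in
// `save_programs` and `reject` for every other anchor.
inline AnchorPredicate makeSavePredicate(std::vector<int> save_programs,
                                         CallbackPolicy accept,
                                         CallbackPolicy reject) {
  return [save_programs = std::move(save_programs), accept,
          reject](const AnchorInfo &anchor) {
    const bool used_by_save = std::any_of(
        anchor.programs.begin(), anchor.programs.end(), [&](int program) {
          return std::find(save_programs.begin(), save_programs.end(),
                           program) != save_programs.end();
        });
    return used_by_save ? accept : reject;
  };
}
```

The returned predicate is an ordinary callable, so it can be stored and evaluated once per anchor.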

The last option for feeding data to or from model tensors is to set up user callbacks (CallbackHandle). A tensor callback is a functor that accepts a pointer to the memory on which it is to operate. In the case of tensors that are to be uploaded to the IPU, the functor’s task is to copy the tensor data to the address in the pointer. For tensors transferred from the IPU to the host, the functor’s task is to copy the data from the memory to your model’s structure (or generally to do any other operation on the data it has access to).

Note

As the tensor data of each model may be transferred to and from different memory locations, a separate callback is needed for each Anchor representing a particular tensor.

To assign a callback to an Anchor, you can use the setCallbackForAnchor() method.

// Assuming there is a session (Session) already created
const std::string input_anchor_handle =
  "some_input_anchor_handle";  // input anchor handle string - can be read from PopEF

const auto input_callback =    // input tensor data callback
  [&user_space_for_input_tensor](void *dest) {
    auto &io = user_space_for_input_tensor;
    std::memcpy(dest, io.data(), io.size() * sizeof(float)); // copy data from user memory location to dest
    printf("Copying input from host to IPU"); // perform extra operations - for example print to console
  };

const std::string output_anchor_handle =
  "some_output_anchor_handle";  // output anchor handle string - can be read from PopEF

const auto output_callback = [&user_space_for_output_tensor](void *src) {
  auto &io = user_space_for_output_tensor;
  std::memcpy(io.data(), src, io.size() * sizeof(float));  // copy data from src to the user memory location
  printf("Copying output from IPU to host");
};

// Assign callbacks for the selected Anchors
session.setCallbackForAnchor(input_anchor_handle, input_callback);
session.setCallbackForAnchor(output_anchor_handle, output_callback);
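
The direction of each copy is easy to get wrong, so the sketch below simulates one computational round without any IPU. It assumes a callback has the shape void(void *), as in the examples above. TensorCallback, simulateRound and runExample are hypothetical names, and the "computation" (doubling every element) is invented for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

using TensorCallback = std::function<void(void *)>; // assumed callback shape

// Stand-in for one round: fetch the input through its callback, "compute"
// (double every element), then hand the result to the output callback.
inline void simulateRound(const TensorCallback &input_cb,
                          const TensorCallback &output_cb,
                          std::size_t element_count) {
  std::vector<float> device_buffer(element_count);
  input_cb(device_buffer.data());  // host -> "device"
  for (float &value : device_buffer) value *= 2.0f;
  output_cb(device_buffer.data()); // "device" -> host
}

inline std::vector<float> runExample(const std::vector<float> &input) {
  std::vector<float> output(input.size());
  // Input callback: copy from user memory into the destination pointer.
  const TensorCallback input_cb = [&input](void *dest) {
    std::memcpy(dest, input.data(), input.size() * sizeof(float));
  };
  // Output callback: copy from the source pointer into user memory.
  const TensorCallback output_cb = [&output](void *src) {
    std::memcpy(output.data(), src, output.size() * sizeof(float));
  };
  simulateRound(input_cb, output_cb, input.size());
  return output;
}
```

Note that the input callback writes to the pointer it receives, while the output callback reads from it.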

CallbackFactory gathers all the callbacks and dispatches them per Anchor. This is a functor that accepts CallbackInfo (that stores an Anchor) and returns a callback specific to that Anchor.

To assign a user CallbackFactory to handle model tensor data transfers, you have to use setUserInputHandler() (for tensors transferred from host to IPU) or setUserOutputHandler() (for tensors transferred from IPU to host). Both accept two extra arguments — anchorCallbackPredicate (a predicate described in Section 4.1, Creating a session) and skip_connected (simplifies skipping Anchor objects with their handlers already set up).

// Assuming there is a session (Session) already created
const auto output_factory = [](const model_runtime::CallbackInfo &ci) {
    const std::string& name = ci.anchor.name();
    if (name == "some_anchor_handle") {
      const auto some_anchor_handle_callback = [](void *src) {
        // do some operations on src
      };
      return some_anchor_handle_callback;
    } else if (name == "some_other_anchor_handle") {
      const auto some_other_anchor_handle_callback = [](void *src) {
        // do some operations on src
      };
      return some_other_anchor_handle_callback;
    } else {
      const auto just_print_out_callback = [&name](void *) {
        std::cout << "Callback for " << name << " called";
      };
      return just_print_out_callback;
    }
};
const auto &predicate = empty_cb_for_save_program_tensors_predicate;
const bool skip_connected = true;  // default value for skip_connected parameter
session.setUserOutputHandler(output_factory, predicate, skip_connected);
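
In standard C++, a lambda whose branches return different lambdas needs a common return type, because every closure has its own distinct type. The sketch below shows one way to write such a factory with std::function. TensorCallback, CallbackFactoryFn and makeLoggingFactory are hypothetical stand-ins (the anchor is reduced to its name), and the anchor name "logits" is made up.

```cpp
#include <functional>
#include <string>
#include <vector>

using TensorCallback = std::function<void(void *)>; // assumed callback shape
using CallbackFactoryFn =
    std::function<TensorCallback(const std::string &)>; // stand-in factory type

// Builds a factory that records which anchors were serviced. The explicit
// TensorCallback return type lets each branch return a different lambda.
inline CallbackFactoryFn makeLoggingFactory(std::vector<std::string> &log) {
  return [&log](const std::string &anchor_name) -> TensorCallback {
    if (anchor_name == "logits") {
      return [&log](void *) { log.push_back("logits handled"); };
    }
    // Capture the name by value so the callback stays valid after this call.
    return [&log, anchor_name](void *) {
      log.push_back("default for " + anchor_name);
    };
  };
}
```

The log vector must outlive both the factory and every callback it produces, since they hold a reference to it.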

Note

It is worth emphasising that Model Runtime supports Poplar remote memory buffers without you having to do anything extra. If the model has any Anchor objects configured to use remote buffers, Session will configure them properly. You simply refer to such Anchor objects in the same way as any other anchors.

4.4. Running programs

After creating a properly configured Session object, binding a proper Device object to it and setting up model tensor data handlers, everything is ready to run the first program.

PopEF groups Poplar programs into three types: Load, Main and Save. Each group can consist of several Poplar programs that will be executed in the programmed sequence when the group is run.

The purpose of the groups is as follows:

  • Programs in the Load group realize all the pre-computational tasks that are supposed to be executed before the model computation. For example, triggering transfer of state tensor data from the host (see Section 4.3, Handlers for model tensors) or generating tensor data directly on the IPU (for example RandomGen).

  • Programs in the Main group (in most cases there is a single Main program) perform the chain of computations representing the computational model. During execution, input and output tensors are transferred between the host and the IPU.

  • Programs in the Save group are very rarely used in the case of inference models. Their purpose is to transfer the model state tensor data from the IPU to the host (not to be confused with model outputs).
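
The grouping can be pictured as three ordered lists of program indices. The sketch below is a stand-in and does not use the PopEF types: ProgramGroups, runGroup and traceInference are hypothetical, the indices are invented, and the order shown (Load once, then Main once per round) is an assumption about a typical inference run.

```cpp
#include <functional>
#include <vector>

// Stand-in for the Load/Main/Save grouping; indices are for illustration only.
struct ProgramGroups {
  std::vector<int> load_programs;
  std::vector<int> main_programs;
  std::vector<int> save_programs;
};

// Runs every program of a group in its stored order.
inline void runGroup(const std::vector<int> &group,
                     const std::function<void(int)> &run_program) {
  for (int program : group) run_program(program);
}

// Records the order in which programs would execute: the Load group once,
// then the Main group once per round. The Save group is not run here.
inline std::vector<int> traceInference(const ProgramGroups &groups,
                                       int rounds) {
  std::vector<int> executed;
  const auto record = [&executed](int program) { executed.push_back(program); };
  runGroup(groups.load_programs, record);
  for (int round = 0; round < rounds; ++round) {
    runGroup(groups.main_programs, record);
  }
  return executed;
}
```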

Note

Save programs are normally executed during model training to download the updated weights and cycle counters. They are rarely used in inference, but are still handled by Model Runtime.

By using the Session class, you can easily run a program group or a single program on the device bound to the session:

// Assuming there is a session (Session) already created
session.runLoadPrograms();
session.runMainPrograms();
session.runSavePrograms();

const std::vector program_numbers = {1, 2, 3}; // to be read from PopEF
session.runPrograms(program_numbers);

Note

Refer to PopEF file analysis for how to display information about model programs.

To stop a running program, you can call the stop() method. However, it is worth noting that this method does more than just stop execution of the program. Internally, poplar::Engine::stop() is called which leaves the device in an undefined state. Also, if there is a QueueManager bound to the session, all its queues get disconnected.

Note

To reinstate the session after stop(), you have to first call reload().

4.5. Retrieving information from a session

To help you prepare callbacks for tensors defined in the model, Session provides a set of getter functions which return Anchor objects that refer to your model’s tensors.

Note

The functions getUserInputAnchors() and getUserOutputAnchors() return Anchor objects that you have to define callbacks for. getInputAnchors() returns all Anchor objects, including those bound to a data source of PopEF origin (Section 4.3, Handlers for model tensors).

// Assuming there is a session already created
const int MAX_SIZE = 1024;        // max size of a single model output
std::vector<const popef::Anchor *> output_anchors = session.getUserOutputAnchors();
const size_t number_of_outputs = output_anchors.size();

// prepare outputs buffer, based on the number of the model outputs
std::vector<std::array<std::byte, MAX_SIZE>> outputs(number_of_outputs);

// prepare a helper object for mapping anchor names to the user memory slots
std::map<std::string, std::byte *> handles_to_outputs_map;
for (size_t idx = 0; idx < number_of_outputs; idx++) {
  const std::string &anchor_name = output_anchors[idx]->name(); // anchors are stored as pointers
  handles_to_outputs_map.emplace(anchor_name, outputs[idx].data());
}

// create a callback factory; the map is captured by reference and must outlive the factory
const auto output_factory = [&handles_to_outputs_map](const model_runtime::CallbackInfo &ci) {
    const std::string &name = ci.anchor.name();
    const popef::TensorInfo::ShapeDimType size = ci.anchor.tensorInfo().sizeInBytes();
    std::byte *dst = handles_to_outputs_map.at(name); // writable slot for this anchor

    // based on Tensor name the proper slot in the user buffer is used
    const auto cb = [dst, size](void *src) {
      std::memcpy(dst, src, size);
    };
    return cb;
};

The Session class also provides the anchorsNotConnectedToCallbacks() and errorIfAnchorsAreNotConnectedToCallbacks() functions that let you manually check if all the model-defined anchors (“pointing” to the model tensors) are connected to callbacks.
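
Conceptually, such a check is a lookup of every anchor in the set of registered callbacks. The following sketch illustrates the idea with standard containers only. notConnected and errorIfNotConnected are hypothetical helpers that mirror the intent of the two Session functions, not their actual signatures or behaviour.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using TensorCallback = std::function<void(void *)>; // assumed callback shape

// Returns the names of anchors that have no (non-empty) callback registered.
inline std::vector<std::string>
notConnected(const std::vector<std::string> &anchors,
             const std::map<std::string, TensorCallback> &callbacks) {
  std::vector<std::string> missing;
  for (const auto &name : anchors) {
    const auto it = callbacks.find(name);
    if (it == callbacks.end() || !it->second) missing.push_back(name);
  }
  return missing;
}

// Throws when at least one anchor is left without a callback.
inline void
errorIfNotConnected(const std::vector<std::string> &anchors,
                    const std::map<std::string, TensorCallback> &callbacks) {
  const auto missing = notConnected(anchors, callbacks);
  if (!missing.empty()) {
    throw std::runtime_error("Anchor without callback: " + missing.front());
  }
}
```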

4.6. Managing queues of tensor data

An important function provided by the Session class is the builder function createQueueManager(). It lets you create a Session-managed QueueManager object that creates and controls a set of tensor data queues, storing pointers to the dedicated user-memory slots. Because the management of the input and output buffers is driven asynchronously by the IPU program execution flow and completely managed by QueueManager, you can focus on other operations (like data pre- or post-processing) and enqueue inputs and outputs in the preferred way.

// Assuming there is a session (Session) already created
model_runtime::QueueManager *queue_manager = session.createQueueManager();
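
The phrase "storing pointers to the dedicated user-memory slots" can be illustrated with a small queue that holds pointers and never owns the data. This is a single-threaded sketch under that assumption only. PointerQueue is a hypothetical class, and the real QueueManager is asynchronous and has a different interface.

```cpp
#include <cstddef>
#include <cstring>
#include <deque>
#include <stdexcept>

// Single-threaded stand-in for one input queue: it stores pointers to
// user-owned buffers and never copies or owns the data itself.
class PointerQueue {
public:
  explicit PointerQueue(std::size_t bytes_per_tensor)
      : bytes_(bytes_per_tensor) {}

  // The caller must keep `slot` alive until it has been consumed.
  void enqueue(const void *slot) { slots_.push_back(slot); }

  // Plays the role of an input callback: copy the oldest slot into `dest`.
  void consumeInto(void *dest) {
    if (slots_.empty()) throw std::runtime_error("no input enqueued");
    std::memcpy(dest, slots_.front(), bytes_);
    slots_.pop_front();
  }

  std::size_t pending() const { return slots_.size(); }

private:
  std::size_t bytes_;
  std::deque<const void *> slots_;
};
```

Because only pointers are queued, the user buffers must remain valid and unchanged until the corresponding transfer has taken place.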

A detailed description of QueueManager can be found in Section 4.3, Handlers for model tensors.

Note

Session takes full ownership of the created QueueManager object. The lifetime of the created QueueManager is closely related to the Session lifetime.

4.7. Verification

Session performs parameter and state verification in all its functions.

For example, on creation, Session checks whether the model you passed was saved with popef::Metadata::replicationFactor() greater than 1 (see Replicated graphs). Session does not handle replication directly, so it throws an exception if the replication factor is not equal to 1. See Section 3.2, Replication for more details on how ModelRunner handles replication.

The correctness of the model being loaded is also checked. For example, if the model contains tensors which are configured to use remote buffers, but there is no tensor or feed data for these tensors present in the model’s PopEF file, an exception is thrown.

Another set of verification steps takes place in the “run program” methods:

The methods setUserInputHandler() and setUserOutputHandler() check if the executable was already loaded, and load it if needed.

An important verification step is also performed in the stop() method. It checks whether stop() was called after the session was activated (the device is ready and an executable has been loaded onto it), and only then triggers the stop. It also disconnects all the tensor data queues managed by QueueManager if any were bound to the session.