11. Session

The Session class represents the PopART runtime session and lets you execute a PopXL graph of operations. You create a session as follows:

Listing 11.1 Example of Session construction
# Construct an Ir `ir`...

ir.num_host_transfers = 1

with popxl.Session(ir, "ipu_model") as session:
    outputs = session.run()

Download tensor_addition.py

where ir is the Ir object you have created.

Warning

The session takes ownership of the Ir object from creation onwards; the Ir object cannot be changed after this point.

The session will prepare and compile the IR that your Python Ir object represents. A popart_exception error will be thrown if the Ir object is not valid. This can happen if there are cycles in the graph or if the configuration for the device(s) you have specified is invalid.

11.1. Running a session

At this point, the IR is compiled, but we must perform some more steps before executing it on the device:

  • Attach to the device

  • Initialise the weight tensors on the device (from the values stored on the host when building the IR)

  • Connect the popxl.HostToDeviceStream streams (that is, the inputs of your model) to buffers of the appropriate size, shape and dtype

popxl.Session should be used as a context manager to achieve the first two steps; it will attach to the device and load the weights from host onto device. Then, to execute the Ir on given input data, call popxl.Session.run() inside the context, passing a Mapping from popxl.HostToDeviceStream to the relevant input data buffers. Note that calling popxl.Session.run() before attaching to the device will result in an error. Finally, on exiting the context, the weights will be loaded from device back onto host, and the session will detach from the device.

Listing 11.2 Example of running with popxl.Session.run()
# run the model
with session:
    outputs = session.run(inputs)

print(f"Input a is {data_a}")
print(f"Input b is {data_b}")
print(f"Result is {outputs[o_d2h]}")

Once you have constructed your session, you can run your model with the relevant inputs to return your outputs. You can do this in two ways:

  1. outputs = session.run(inputs)

This runs the session with inputs and constructs the outputs as NumPy ndarray objects to return to the caller. Input shapes will be validated, and outputs will be returned in the shape inferred by the IR. The device used is determined by the device_desc string passed when constructing the session, as described in Section 11.6, Device Types.

Note

It is important that the context manager keeps the host weights in sync with the device, because repeatedly attaching and detaching can invalidate the weights on device. This is because Poplar may zero the memory on attach or detach, or another process may have used the IPU while you were detached and overwritten that memory.

popxl.Session.run() will validate that all the required input streams have been passed, and that the input buffers are of the correct shape. See Section 11.5, Data input shape, for what the shapes should be. It will also internally create the output buffers for you, as a Mapping from popxl.DeviceToHostStream to np.ndarray; the correct shape and dtype will be inferred from the Ir.

Alternatively, you can create the output buffers yourself and pass them to popxl.Session.run_with_outputs() to fill in for you. The Mapping you pass will be validated in the same way as the inputs.

  2. session.run_with_outputs(inputs, outputs)

If you want to write to part of a larger array, or you already have output arrays constructed, use run_with_outputs(). You must first construct the output arrays with the necessary shape, then pass them to the session. The session runs the model and writes the values to the output arrays. The shapes of the output arrays will be validated against the inferred shape and an error is thrown if the shapes do not correspond.

Listing 11.3 Example of running with popxl.Session.run_with_outputs()
# Alternatively:
run_output = {o_d2h: np.empty(shape=[1]).astype(np.float32)}

with session:
    session.run_with_outputs(inputs, run_output)

print(f"Input a is {data_a}")
print(f"Input b is {data_b}")
print(f"Result is {run_output[o_d2h]}")

Finally, there is also popxl.Session.create_host_outputs() that will create the Mapping for you, with each stream mapped to an empty np.ndarray. This is the method used internally in popxl.Session.run() and provides a shortcut to constructing the output arrays required for popxl.Session.run_with_outputs().
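For example, a minimal sketch combining the two (assuming the session, inputs and o_d2h from the listings above):

# Pre-allocate one correctly-shaped, empty np.ndarray per DeviceToHostStream.
outputs = session.create_host_outputs()

with session:
    # Fill the pre-allocated buffers in place.
    session.run_with_outputs(inputs, outputs)

print(f"Result is {outputs[o_d2h]}")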

11.2. Getting and setting tensor data

Once you have created a session, you can write to variable tensors, and read variable and constant tensors from the device. This is useful if you want to compare trained weights against a reference, update or reset weights for debugging, or save progress on your model by storing and reloading the variable tensors.

Listing 11.4 Example of getting and setting tensor data
with popxl.Session(ir, "ipu_model") as session:
    outputs = session.run()

    print(f"Result is {outputs[o_d2h]}")

    # Takes a `popxl.Constant` or `popxl.Variable` tensor
    # Retrieves the data for the specified tensor
    a_data = session.get_tensor_data(a)
    b_data = session.get_tensor_data(b)

    # Check the device values of 'a', 'b'
    assert a_data == np.array(3)
    assert b_data == np.array(1)

    # Takes a `popxl.Variable`
    # Writes a new value for `a` on device
    session.write_variable_data(a, np.array(5).astype(np.int32))

    # Variable now updated on device
    assert session.get_tensor_data(a) == np.array(5)

Download tensor_get_write.py

You can get or set the data for multiple tensors at once with popxl.Session.get_tensors_data() and popxl.Session.write_variables_data() respectively. These methods defer the transfer to the host or device until the end, so that all the reads or writes are performed in one operation.
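For example, a minimal sketch (assuming the tensors a and b from the listing above, and assuming write_variables_data() accepts a mapping from each variable to its new value):

with session:
    # One device-to-host transfer covers both reads.
    datas = session.get_tensors_data([a, b])
    print(datas[a], datas[b])

    # Writes are batched the same way; only variables may be written.
    session.write_variables_data({a: np.array(7).astype(np.int32)})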

11.2.1. When transfers will occur between host and device

If attached to the device, popxl.Session.write_variable_data() will update both the host and device weights. If not attached, only the host weights are updated; the device weights will be updated on the next weights_from_host, which happens on the next popxl.Session context entry or when you call popxl.Session.weights_from_host() manually.

Similarly, popxl.Session.get_tensor_data() will only ensure the most up-to-date weights are fetched from the device if attached. If not attached, the current host weights are returned.

Furthermore, popxl.Session.get_tensor_data() treats the host weights as a cache of the device weights, and so only performs a weights_to_host if the host weights are possibly out of date. The host weights are considered out of date if a runtime function that can mutate the weights on device (such as run) has been called since the last weights_to_host; a weights_to_host occurs when you call popxl.Session.weights_to_host() manually, when exiting the popxl.Session context, or when get_tensor_data performs one for you. Note that popxl.Session.write_variable_data() does not invalidate the host weights, as it updates them too.

If you only require the data of popxl.Constant tensors, there will never be a device-to-host transfer, so you can perform this operation when not attached to a device.

The above points are demonstrated in the following example:

Listing 11.5 Demonstration of exactly when host-device transfers occur during tensor reading and writing.
session = popxl.Session(ir, "ipu_model")

# Allowed when detached, device weights will be updated on next context enter
session.write_variable_data(a, some_new_np_data)

with session:
    # No weights_to_host as no `run` has invalidated the host weights yet
    a_data = session.get_tensor_data(a)

    outputs = session.run()

    # No weights_to_host, is constant
    b_data = session.get_tensor_data(b)
    # Causes weights_to_host as host weights currently out-of-sync
    datas = session.get_tensors_data([a, b])

    # Performs weights_from_host
    session.write_variable_data(a, some_new_np_data)

    # No weights_to_host, write_variable_data did not invalidate
    a_data = session.get_tensor_data(a)

    outputs = session.run()

# Leave context, causes detach and weights_to_host

# No weights_to_host as detached. Latest weights were already fetched due to
# leaving context
tensor_datas = session.get_tensors_data([a, b])

Download tensor_get_write_adv.py

11.3. Nested Session Contexts

It is possible to nest popxl.Session contexts. A weights_from_host occurs every time entering a context takes you from detached to attached. When you leave a context, a weights_to_host and detach occur only if you were detached when entering that context; in other words, only the outermost context that caused the attach will sync the weights and detach on exit.

The following code demonstrates the semantics:

Listing 11.6 Demonstration of semantics of nested Session contexts
# Enter Ctxt 1, causes attach and weights_from_host
with popxl.Session(ir, "ipu_model") as session:
    assert session.is_attached

    # Enter Ctxt 2, still attached, no weights_from_host again
    with session:
        assert session.is_attached

    # Exit Ctxt 2, no detach or weights_to_host, as attached on enter
    assert session.is_attached

# Exit Ctxt 1, causes detach and weights_to_host, as detached on enter
assert not session.is_attached

Download nested_session_contexts.py

11.4. Number of host transfers

The num_host_transfers property of the Ir class determines the number of iterations required for each session.run call. For each time a host_load() operation runs (per tensor) in your model, you will need to increment num_host_transfers by one. This includes host_load() operations inside called subgraphs and repeated subgraphs. For example, if you have two host_load() ops for tensors x and label in the main graph, num_host_transfers will be 1. However, if you put these ops inside a repeat op that repeats 10 times, you will need to set num_host_transfers to 10.

If you have different numbers of host_load() ops for different tensors in your graph, some streams will exhaust their data before others, resulting in the exhausted stream looping around its data buffer before Python is able to provide more data. For example, assume you have two repeat() ops containing host-loaded tensors, where stream A repeats three times and stream B repeats five times, and you provide three batches of data for stream A and five for stream B. This will result in stream A exhausting its data: on every model run, both streams advance by one batch, so A hits the end of its allotted data before B.

In this case, set num_host_transfers to 2 * ceil(number_of_host_load_runs) and provide ceil(number_of_host_load_runs) batches for each stream. In the example, this would mean ir.num_host_transfers = 5 * 2 = 10, and you would need to provide five batches each for streams A and B. You will need to keep track of how many batches the model has consumed, and perhaps carry unconsumed data over to the next model run. For example, stream A would need to move its last two unused batches to the next model run's data. Alternatively, pad the last two batches of data for stream A with zeros on every model run.
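As an illustration of the zero-padding option, here is a minimal sketch. The stream handles and the per-batch shape are hypothetical; stream A runs three times and stream B five times per model run, as in the example above.

import numpy as np

a_runs, b_runs = 3, 5            # host_load runs per model run
max_runs = max(a_runs, b_runs)   # provide this many batches per stream
batch_shape = (2, 2)             # hypothetical per-batch shape

# Pad stream A's last two batches with zeros on every model run.
a_batches = np.zeros((max_runs, *batch_shape), dtype=np.float32)
a_batches[:a_runs] = np.random.random((a_runs, *batch_shape)).astype(np.float32)
b_batches = np.random.random((max_runs, *batch_shape)).astype(np.float32)

# inputs = {a_h2d: a_batches, b_h2d: b_batches}  # hypothetical streams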

Note

This behaviour will likely change in the future so that the correct number of batches of data are required per stream.

11.5. Data input shape

When providing data for the session, you also need to ensure that you provide enough data for replication_factor as well as for num_host_transfers. The input data will need to have a shape as follows:

[num_host_transfers, replication_factor, *device_data_shape]

For example, with:

device_data_shape = (5, 9, 9)
num_host_transfers = 7
replication_factor = 2

then:

input_shape = (7,) + (2,) + (5, 9, 9) = (7, 2, 5, 9, 9)

Note that replication_factor and num_host_transfers are independent and need separate dimensions in the input data; otherwise, the data will be consumed out of order.
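For example, a minimal sketch of constructing an input buffer with this layout, using the values above:

import numpy as np

num_host_transfers = 7
replication_factor = 2
device_data_shape = (5, 9, 9)

x_data = np.random.random(
    (num_host_transfers, replication_factor, *device_data_shape)
).astype(np.float32)
assert x_data.shape == (7, 2, 5, 9, 9)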

Listing 11.7 Example of num_host_transfers with a repeat op.
import numpy as np
from typing import Tuple

import popxl
import popxl.ops as ops

# Creating a model with popxl
ir = popxl.Ir()
main = ir.main_graph

_INPUT_SHAPE = [2, 2]
_REPEAT_COUNT = 8


class Linear(popxl.Module):
    def __init__(
        self, x_h2d_: popxl.HostToDeviceStream, y_d2h_: popxl.DeviceToHostStream
    ):
        self.x_h2d = x_h2d_
        self.y_d2h = y_d2h_
        self.W: popxl.Tensor = None
        self.b: popxl.Tensor = None

    def build(
        self, x: popxl.Tensor, out_features: int, bias: bool = True
    ) -> Tuple[popxl.Tensor, ...]:

        x = ops.host_load(self.x_h2d, "x")
        self.W = popxl.graph_input((x.shape[-1], out_features), popxl.float32, "W")
        y = x @ self.W
        if bias:
            self.b = popxl.graph_input((out_features,), popxl.float32, "b")
            y = y + self.b

        ops.host_store(self.y_d2h, y)
        return y


with main:
    # Host-load and host-store streams
    x_h2d = popxl.h2d_stream(_INPUT_SHAPE, popxl.float32, name="x_stream")
    y_d2h = popxl.d2h_stream(_INPUT_SHAPE, popxl.float32, name="y_stream")
    W_data = np.ones([2, 2], np.float32)
    b_data = np.ones([2], np.float32)
    W = popxl.variable(W_data, name="W")
    b = popxl.variable(b_data, name="b")

    # This is the loop-carried input.
    x = ops.init([2, 2], popxl.float32, "init")

    # Create the graph, passing in the streams
    linear = Linear(x_h2d, y_d2h)
    linear_graph = ir.create_graph(linear, x, out_features=2)

    # Call the graph in a loop.
    # `x`, `W` and `b` will be copied to the inputs of `linear_graph` before
    # the first iteration. The outputs of each iteration are copied to the
    # inputs of the next iteration, and the outputs of the last iteration
    # serve as the outputs of the `repeat` op.
    # Note the repeat count of 8, which we will also use as num_host_transfers.
    (o,) = ops.repeat(
        linear_graph, _REPEAT_COUNT, x, inputs_dict={linear.W: W, linear.b: b}
    )

# The host_load and host_store ops each run _REPEAT_COUNT times, so set
# num_host_transfers to _REPEAT_COUNT.
ir.num_host_transfers = _REPEAT_COUNT

# Note the input shape here: (_REPEAT_COUNT, *data_shape)
x_data = np.random.random([_REPEAT_COUNT, 2, 2]).astype(np.float32)
input_ = {x_h2d: x_data}

with popxl.Session(ir, "ipu_model") as session:
    outputs = session.run(input_)

Download repeat_graph_2.py

11.6. Device Types

When creating a session, you need to describe the device you are using with device_desc. Possible values are:

  1. ipu_hw

This indicates that you are using physical IPU hardware.

  2. ipu_model

This indicates that you are using the IPU Model. The IPU Model is a simulation of the behaviour of the IPU hardware, but it does not completely implement every aspect of a real IPU. For example, the IPU Model does not fully support replicated graphs, nor the same random number generation as the IPU hardware. Its arithmetic results may differ from those obtained on IPU hardware. It also does not support remote storing and loading of variable tensors.

  3. cpu

This indicates that you are using a CPU. In some use cases it is faster to use a CPU than the IPU Model. The cpu device type does not support remote storing and loading of variable tensors. The cpu device type also does not support replication in any use case.
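The same Ir can be run on any of these device types simply by changing the device_desc string when constructing the session (a minimal sketch, assuming ir has been built as in the earlier listings):

session = popxl.Session(ir, "ipu_hw")       # physical IPU hardware
# session = popxl.Session(ir, "ipu_model")  # simulated IPU
# session = popxl.Session(ir, "cpu")        # CPU backend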

Note

You do not need to set the number of devices, as this is calculated automatically from the number of virtual graphs used and the replication factor. An error will be thrown if the number of devices required exceeds the number available.
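For example, a minimal sketch (assuming ir uses a single virtual graph, and assuming the popxl.Ir.replication_factor property): with a replication factor of 2, the session requests two IPUs without any device count being set explicitly.

ir.replication_factor = 2              # two replicas of the graph
session = popxl.Session(ir, "ipu_hw")  # automatically requests 2 IPUs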