11. Session
The Session class represents the PopART runtime session and lets you execute a PopXL graph of operations. You create a session as follows:
# Construct an Ir `ir`...

ir.num_host_transfers = 1

with popxl.Session(ir, "ipu_model") as session:
    outputs = session.run()
where ir is the Ir object you have created, and device_desc is a string describing the type of device on which you will run the session, as described in Section 11.6, Device Types.
Warning
The session takes ownership of the Ir object from creation onwards; the Ir object cannot be changed after this point.
The session will prepare and compile the IR that your Python Ir object represents. A popart_exception error will be thrown if the Ir object is not valid. This can happen if there are cycles in the graph or if the configuration for the device(s) you have specified is invalid.
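If you want to handle an invalid IR yourself, you can catch this error when constructing the session. A minimal sketch, assuming the exception class is exposed as popart.popart_exception in your PopART version:

import popart

try:
    session = popxl.Session(ir, "ipu_model")
except popart.popart_exception as e:
    # The IR was invalid (for example, a cycle in the graph), or the
    # requested device configuration could not be satisfied
    print(f"Failed to prepare session: {e}")
    raise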
11.1. Running a session
At this point, the IR is compiled, but we must perform some more steps before executing it on the device:

- Attach to the device.
- Initialise the weight tensors on the device (from the values stored on the host when building the IR).
- Connect the popxl.HostToDeviceStream streams (that is, the inputs of your model) to buffers of the appropriate size, shape and dtype.
popxl.Session should be used as a context manager to achieve the first two steps; it will attach to the device and load the weights from host onto the device.
Then, to execute the Ir on given input data, call popxl.Session.run() inside the context, passing a Mapping from popxl.HostToDeviceStream to the relevant input data buffers.
Note
Calling popxl.Session.run() before attaching to the device will result in an error.
Finally, on exiting the context, the weights will be loaded from device back onto host, and the session will detach from the device.
# Run the model
with session:
    outputs = session.run(inputs)

print(f"Input a is {data_a}")
print(f"Input b is {data_b}")
print(f"Result is {outputs[o_d2h]}")
Once you have constructed your session, you can run your model with the relevant inputs to return your outputs. You can do this in two ways:

1. popxl.Session.run()
This runs the session with inputs and constructs outputs in the form of NumPy ndarray objects to return to the caller. Input shapes will be validated and outputs will be returned in the shape inferred by the IR.
Note
It is important that the context manager keeps the host weights in sync with the device, as attaching, detaching and re-attaching can invalidate the weights on the device. This is because Poplar may zero the memory on attach or detach, or another process may have used the IPU while you were detached and overwritten that memory.
popxl.Session.run() will validate that all the required input streams have been passed, and that the input buffers are of the correct shape. See Section 11.5, Data input shape for what the shapes should be. It will also internally create the output buffers for you, as a Mapping from popxl.DeviceToHostStream to np.ndarray. The correct shape and dtype will be inferred from the Ir.
Alternatively, you can create the output buffers yourself and pass them to popxl.Session.run_with_outputs() to fill in for you. The Mapping you pass will be validated similarly to the inputs.
2. popxl.Session.run_with_outputs()
If you want to write to part of a larger array, or you already have output arrays constructed, use run_with_outputs(). You must first construct the output arrays with the necessary shape, then pass them to the session. The session runs the model and writes the values to the output arrays. The shapes of the output arrays will be validated against the inferred shape and an error is thrown if the shapes do not correspond.

# Alternatively:
run_output = {o_d2h: np.empty(shape=[1]).astype(np.float32)}

with session:
    session.run_with_outputs(inputs, run_output)

print(f"Input a is {data_a}")
print(f"Input b is {data_b}")
print(f"Result is {run_output[o_d2h]}")
Finally, there is also popxl.Session.create_host_outputs() that will create the Mapping for you, with each stream mapped to an empty np.ndarray. This is the method used internally in popxl.Session.run() and provides a shortcut to constructing the output arrays required for popxl.Session.run_with_outputs().
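For example, a minimal sketch combining the two, using the same session and inputs as above:

# Pre-allocate one correctly-shaped empty ndarray per DeviceToHostStream
outputs = session.create_host_outputs()

with session:
    # Fill the pre-allocated buffers in place
    session.run_with_outputs(inputs, outputs)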
11.2. Getting and setting tensor data
Once you have created a session, you can write to variable tensors, and read variable and constant tensors from the device. This is useful for comparing trained weights against a reference, updating or resetting weights for debugging, or saving progress on your model by storing and reloading the variable tensors.
with popxl.Session(ir, "ipu_model") as session:
    outputs = session.run()

    print(f"Result is {outputs[o_d2h]}")

    # Takes a `popxl.Constant` or `popxl.Variable` tensor
    # Retrieves the data for the specified tensor
    a_data = session.get_tensor_data(a)
    b_data = session.get_tensor_data(b)

    # Check the device values of 'a', 'b'
    assert a_data == np.array(3)
    assert b_data == np.array(1)

    # Takes a `popxl.Variable`
    # Writes a new value for `a` on device
    session.write_variable_data(a, np.array(5).astype(np.int32))

    # Variable now updated on device
    assert session.get_tensor_data(a) == np.array(5)
You can get or set data for multiple tensors via the methods popxl.Session.get_tensors_data() and popxl.Session.write_variables_data() respectively. These defer the transfer to or from the host until the end, so that all the reads or writes are performed in one operation.
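A minimal sketch of the batched variants, assuming a is a variable and b a constant as in the example above, and that popxl.Session.write_variables_data() accepts a mapping from tensor to new value:

# One combined device-to-host transfer for both reads
datas = session.get_tensors_data([a, b])

# One combined host-to-device transfer for all writes (variables only)
session.write_variables_data({a: np.array(7).astype(np.int32)})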
11.2.1. When transfers will occur between host and device
If attached to the device, popxl.Session.write_variable_data() will update both the host and device weights. If not attached, only the host weights will be updated. The device weights will be updated on the next weights_from_host. This will happen on the next popxl.Session context enter, or when you call popxl.Session.weights_from_host() manually.
Similarly, popxl.Session.get_tensor_data() will only ensure the most up-to-date weights from the device are returned if attached to the device. If not attached, the current host weights will be returned.
Furthermore, popxl.Session.get_tensor_data() treats the host weights as a cache of the device weights, so it will only perform the weights_to_host if the host weights are possibly out of date. This is the case if a runtime function that can mutate the weights on the device (such as run) has been called since the last weights_to_host, whether that was invoked manually, on exiting the popxl.Session context, or by a previous get_tensor_data call. Note that popxl.Session.write_variable_data() does not invalidate the host weights, as it updates them too.
If only the data for popxl.Constant tensors is required, then there will never be a device-to-host transfer. You can perform this operation when not attached to a device.
The above points are demonstrated in the following example:
session = popxl.Session(ir, "ipu_model")

# Allowed when detached; device weights will be updated on next context enter
session.write_variable_data(a, some_new_np_data)

with session:
    # No weights_to_host as no `run` has invalidated the host weights yet
    a_data = session.get_tensor_data(a)

    outputs = session.run()

    # No weights_to_host, `b` is a constant
    b_data = session.get_tensor_data(b)
    # Causes weights_to_host as host weights currently out-of-sync
    datas = session.get_tensors_data([a, b])

    # Performs weights_from_host
    session.write_variable_data(a, some_new_np_data)

    # No weights_to_host, write_variable_data did not invalidate
    a_data = session.get_tensor_data(a)

    outputs = session.run()

# Leave context, causes detach and weights_to_host

# No weights_to_host as detached. Latest weights were already fetched due to
# leaving context
tensor_datas = session.get_tensors_data([a, b])
11.3. Nested session contexts
It is possible to nest popxl.Session contexts. Every time you go from detached to attached on entering a context, a weights_from_host will occur. When you leave a context, a weights_to_host and detach will occur only if you were detached when entering that context.
The following code demonstrates the semantics:
# Enter Ctxt 1, causes attach, weights_from_host
with popxl.Session(ir, "ipu_model") as session:
    assert session.is_attached

    # Enter Ctxt 2, still attached, no weights_from_host again
    with session:
        assert session.is_attached

    # Exit Ctxt 2, no detach or weights_to_host, as attached on enter
    assert session.is_attached

# Exit Ctxt 1, causes detach and weights_to_host as detached on enter
assert not session.is_attached
11.4. Number of host transfers
The num_host_transfers property of the Ir class determines the number of iterations required for each session.run call. For each host_load() operation (per tensor) in your model to run, you will need to increment num_host_transfers by one. This includes host_load() operations inside called subgraphs and repeated subgraphs. For example, if you have two host_load() ops for tensors x and label in the main graph, num_host_transfers will be 1. However, if you put these ops inside a repeat op that repeats 10 times, you will need to set num_host_transfers to 10.
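As a sketch of the second case (the stream handles and data shapes here are hypothetical):

# `x` and `label` are each host-loaded inside a subgraph repeated 10 times,
# so each stream is read 10 times per session.run call
ir.num_host_transfers = 10

# Provide one batch per transfer, per stream
x_data = np.random.random((10, *x_shape)).astype(np.float32)
label_data = np.random.randint(0, 10, (10, *label_shape)).astype(np.int32)

with popxl.Session(ir, "ipu_model") as session:
    outputs = session.run({x_h2d: x_data, label_h2d: label_data})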
If you have different numbers of host_load() ops for different tensors in your graph, you will find that some streams will exhaust their data before others, resulting in the exhausted stream looping around the data buffer before Python is able to provide more data. For example, assume you have two repeat() ops with host-loaded tensors inside, where stream A repeats three times and stream B repeats five times, and you provide three batches of data for stream A and five for stream B. This will result in stream A exhausting its data: for every model run, both streams will advance by one batch, leading to A hitting the end of its allotted data before B.
In this case, set num_host_transfers to 2 * ceil(number_of_host_load_runs) and provide ceil(number_of_host_load_runs) batches for each stream. In the example, this would mean ir.num_host_transfers = 5 * 2 = 10 and you would need to provide five batches for each of streams A and B. You will need to keep track of how many batches have been consumed by the model, and perhaps move data to the next model run if it was not consumed. For example, stream A would need to move the last two unused batches to the next model run's data. Alternatively, pad the last two batches of data for stream A with zeros on every model run.
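A sketch of the padding approach for this example (the stream handles and shapes are hypothetical):

# Stream A is only consumed 3 times per model run, but must be given 5
# batches; pad the unused last 2 batches with zeros
a_data = np.zeros((5, *a_shape), dtype=np.float32)
a_data[:3] = real_a_batches  # only the first 3 batches are consumed

b_data = real_b_batches  # all 5 batches of stream B are consumed

outputs = session.run({a_h2d: a_data, b_h2d: b_data})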
Note
This behaviour will likely change in the future so that the correct number of batches of data is required per stream.
11.5. Data input shape
When providing data for the session, you also need to ensure that you provide enough data for replication_factor as well as for num_host_transfers. The input data will need to have the following shape:

[num_host_transfers, replication_factor, *device_data_shape]
For example, with:

device_data_shape = (5, 9, 9)
num_host_transfers = 7
replication_factor = 2

then:

input_shape = (7,) + (2,) + (5, 9, 9) = (7, 2, 5, 9, 9)
Note that replication_factor and num_host_transfers are independent and need to have separate dimensions in the input data, or you will find that the data will be consumed out of order.
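A minimal sketch of constructing such a buffer, assuming a stream x_h2d with the device data shape above and that replication has been configured on the Ir:

device_data_shape = (5, 9, 9)
input_shape = (ir.num_host_transfers, ir.replication_factor) + device_data_shape

# (7, 2, 5, 9, 9) with the values above
x_data = np.random.random(input_shape).astype(np.float32)
inputs = {x_h2d: x_data}

The complete example below puts these pieces together in the context of a full model: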
# Creating a model with popxl
import numpy as np
from typing import Tuple

import popxl
import popxl.ops as ops

ir = popxl.Ir()
main = ir.main_graph

_INPUT_SHAPE = [2, 2]
_REPEAT_COUNT = 8


class Linear(popxl.Module):
    def __init__(
        self, x_h2d_: popxl.HostToDeviceStream, y_d2h_: popxl.DeviceToHostStream
    ):
        self.x_h2d = x_h2d_
        self.y_d2h = y_d2h_
        self.W: popxl.Tensor = None
        self.b: popxl.Tensor = None

    def build(
        self, x: popxl.Tensor, out_features: int, bias: bool = True
    ) -> Tuple[popxl.Tensor, ...]:

        x = ops.host_load(self.x_h2d, "x")
        self.W = popxl.graph_input((x.shape[-1], out_features), popxl.float32, "W")
        y = x @ self.W
        if bias:
            self.b = popxl.graph_input((out_features,), popxl.float32, "b")
            y = y + self.b

        ops.host_store(self.y_d2h, y)
        return y


with main:
    # host load
    x_h2d = popxl.h2d_stream(_INPUT_SHAPE, popxl.float32, name="x_stream")
    y_d2h = popxl.d2h_stream(_INPUT_SHAPE, popxl.float32, name="y_stream")
    W_data = np.ones([2, 2], np.float32)
    b_data = np.ones([2], np.float32)
    W = popxl.variable(W_data, name="W")
    b = popxl.variable(b_data, name="b")

    # This is the loop-carried input.
    x = ops.init([2, 2], popxl.float32, "init")

    # Create graph, pass in streams
    linear = Linear(x_h2d, y_d2h)
    linear_graph = ir.create_graph(linear, x, out_features=2)

    # Call the graph in a loop.
    # x, W and b will be copied to the inputs of `linear_graph` before the
    # first iteration; the outputs of each iteration will be copied to the
    # inputs of the next iteration, and the outputs of the last iteration
    # serve as the output of the `repeat` op.
    # Note the repeat count of 8, which we will also use as num_host_transfers.
    (o,) = ops.repeat(
        linear_graph, _REPEAT_COUNT, x, inputs_dict={linear.W: W, linear.b: b}
    )

# The host_load and host_store ops are both run _REPEAT_COUNT times, so set
# num_host_transfers to _REPEAT_COUNT.
ir.num_host_transfers = _REPEAT_COUNT

# Note the input shape here (_REPEAT_COUNT, *data_shape):
x_data = np.random.random([_REPEAT_COUNT, 2, 2]).astype(np.float32)
input_ = {x_h2d: x_data}

with popxl.Session(ir, "ipu_model") as session:
    outputs = session.run(input_)
11.6. Device Types
When creating a session, you need to describe the device you are using with device_desc. Possible values are:

ipu_hw
This indicates that you are using physical IPU hardware.

ipu_model
This indicates that you are using the IPU Model. The IPU Model is a simulation of the behaviour of the IPU hardware, but it does not completely implement every aspect of a real IPU. For example, the IPU Model does not fully support replicated graphs nor the same random number generation as the IPU hardware. Its arithmetic results may differ from what would be obtained by using the IPU hardware. It also does not support remote storing and loading of variable tensors.

cpu
This indicates that you are using a CPU. In some use cases it is faster to use a CPU than the IPU Model. The cpu device type does not support remote storing and loading of variable tensors. The cpu device type also does not support replication in any use case.
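For example, the same model can be targeted at any of these device types by changing only the device_desc string:

# Choose one, depending on where you want to run:
session = popxl.Session(ir, "ipu_hw")  # physical IPU hardware
# session = popxl.Session(ir, "ipu_model")  # simulated IPU
# session = popxl.Session(ir, "cpu")  # CPU backend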
Note
You do not need to set the number of devices, as this is calculated automatically from the number of virtual graphs used and the replication factor. An error will be thrown if the number of devices required exceeds the number available.