19.1. Ir

class popxl.Ir(replication=1)
Parameters

replication (Union[int, Literal['popdist']]) –

__init__(replication=1)

PopXL intermediate representation (IR).

An IR contains a main graph (property main_graph) and can create additional graphs using member methods such as create_graph() and create_empty_graph().

Parameters

replication (Union[int, Literal['popdist']], optional) – Set the replication_factor of the IR. A value of ‘popdist’ configures the IR with settings from popdist/poprun. Defaults to 1.

create_empty_graph(name=None)

Create a new graph.

Parameters

name (Optional[str]) – Name of the graph. Defaults to “graph”.

Returns

An empty graph.

Return type

Graph

create_graph(fn, *args, **kwargs)

Create a graph from a Python callable fn or the build method of a Module. The graph inputs are determined using the signature of the function fn and the supplied arguments args and kwargs. Tensors or TensorSpecs passed via the arguments are used to determine the shape and dtype of the graph inputs (the tensors are not actually passed to the graph). The graph outputs are determined using the outputs of the function when called.

The order of inputs in the returned graph will be the same as the order of the tensor inputs in the function signature, the order of the kwargs and the order of called popxl.graph_inputs. This determines the order in which you pass the parent tensors as inputs at the callsite.

The function fn can take any arguments. Any Tensor arguments are automatically detected. Any Tensor arguments inside a tuple, list, *args or **kwargs are also detected. *args, **kwargs and lists cannot contain a mixture of tensors and other types. Nested lists or dicts of tensors are not supported.

If an input is type hinted with TensorByRef or List[TensorByRef] where appropriate in the signature of fn then the corresponding inputs will be passed by reference instead of by value when the graph is called.

The output of fn must be either None, a Tensor or an iterable of Tensors.

Parameters
  • fn (Callable[..., Any]) – The Python function that defines the graph. The signature of fn with its arguments is used to determine the inputs of the graph.

  • args (Any) – Arguments passed to the Python function that defines the graph that can be a mixture of tensors and other types. Tensors are used to determine the tensor info of the inputs.

  • kwargs (Any) – Keyword arguments passed to the Python function that defines the graph that can be a mixture of tensors and other types. Tensors are used to determine the tensor info of the inputs.

Raises
  • TypeError – If fn is not a callable or an extension of popxl.Module, or if any of the arguments listed in *args mixes Tensors with other types.

  • ValueError – If the *args and **kwargs don’t match the signature or if the output of a subgraph is not a Tensor, an iterable of Tensors or None.

Returns

A graph that corresponds to the input Python function.

Return type

Graph
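
The rule that *args, **kwargs and lists cannot mix tensors with other types can be pictured with a small check. This is only an illustrative sketch and not PopXL's implementation: the Tensor class below is a stand-in so that the snippet runs without PopXL installed, and check_not_mixed is a hypothetical helper name.

```python
class Tensor:
    """Stand-in for popxl.Tensor (assumption: used only to make this sketch runnable)."""


def check_not_mixed(name, values):
    """Return True if `values` holds only tensors, False if it holds none.

    Raises TypeError when tensors and other types are mixed, mirroring the
    TypeError documented for create_graph().
    """
    flags = [isinstance(v, Tensor) for v in values]
    if any(flags) and not all(flags):
        raise TypeError(f"{name} mixes tensors with other types")
    return bool(flags) and all(flags)


t = Tensor()
print(check_not_mixed("weights", [t, t]))  # True: a list of tensors only
print(check_not_mixed("sizes", [1, 2]))    # False: no tensors at all
try:
    check_not_mixed("bad", [t, 3])         # tensors mixed with an int
except TypeError as err:
    print("rejected:", err)
```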

dot_checkpoint(check, save_dir=None)

Output a graphical representation of the graph in Graphviz DOT format.

Checkpoints can be activated by setting either the dotChecks option in session options or the POPART_DOT_CHECKS environment variable. These should be set to the list of the checks to be activated. Note that if either dotChecks or POPART_DOT_CHECKS is set to ALL, all checkpoints will be activated. See the PopART User Guide for more information.

If no checkpoints are activated, this function will activate them all by setting the dotChecks option to ALL.

Parameters
  • check (str) – Name of this checkpoint.

  • save_dir (Optional[Union[Path, str]]) – Directory to store the dot files in. Note that this will set the save directory for all dot checkpoints in the graph.

Return type

None
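
As a minimal sketch of the environment-variable route, the snippet below activates every checkpoint by setting POPART_DOT_CHECKS to ALL from within Python. Setting it before the IR is built is an assumption made here for safety. The checkpoint name "after_build" and the directory "dot_files" in the comment are hypothetical.

```python
import os

# Activate all dot checkpoints. Assumption: this is done before the IR is built.
os.environ["POPART_DOT_CHECKS"] = "ALL"

# With PopXL installed, a checkpoint would then be requested on an Ir, for example:
#   ir.dot_checkpoint("after_build", save_dir="dot_files")
print(os.environ["POPART_DOT_CHECKS"])
```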

get_all_d2h_streams()

Return all DeviceToHostStream objects in the IR which have a host_store op that streams along them.

Return type

Set[DeviceToHostStream]

get_all_h2d_streams()

Return all HostToDeviceStream objects in the IR which have a host_load op that streams along them.

Return type

Set[HostToDeviceStream]

property id: int
property instance_replication_factor: int
property main_graph: popxl.graph.Graph

Every IR is initialised with a main graph. This property returns this graph.

Returns

The main graph of the IR.

Return type

Graph

property num_host_transfers: int

Return the number of fwd-bwd iterations of the model that your Ir computes.

This property MUST be set before creating a popxl.Session.

More concretely, if your Ir contains an input tensor x with shape (2, 5) and you expect your Ir to stream this tensor a total of 4 times, then you need to pass a buffer with shape (4, 2, 5) to each session.run() call, and ir.num_host_transfers should equal 4. Note that there will also be a replica dimension if using replication.

Note that there are no separate values for “batches per step” and “gradient accumulation”, as they are known in PopART’s ONNX API, and no separate setters for them. If your Ir represents a batches per step of bps and a gradient accumulation factor of af, then you should set num_host_transfers to bps * af. There will only be a single “num_host_transfers” dimension in the buffer passed to session.run.
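
The arithmetic above can be sketched in plain Python. Both helper names are hypothetical, and the replica dimension is deliberately left out because its position in the buffer is not stated here.

```python
def combined_host_transfers(batches_per_step, accumulation_factor):
    """Value to assign to ir.num_host_transfers for the given bps and af."""
    return batches_per_step * accumulation_factor


def host_buffer_shape(tensor_shape, num_host_transfers):
    """Shape of the buffer passed to session.run() for one streamed tensor.

    Assumption: no replication, so there is no replica dimension.
    """
    return (num_host_transfers, *tensor_shape)


# The example from the text: x has shape (2, 5) and is streamed 4 times.
print(host_buffer_shape((2, 5), 4))   # (4, 2, 5)

# A batches per step of 2 with a gradient accumulation factor of 2 also gives 4.
print(combined_host_transfers(2, 2))  # 4
```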

replica_grouping(stride=1, group_size=None)

Create a ReplicaGrouping object.

A ReplicaGrouping object represents a way in which replicas are grouped for the purpose of getting and setting variable values and collective operations.

A grouping always exactly partitions a set of replicas, so every replica is exactly in one group. We specify these partitions with a stride and group_size argument. The stride specifies the offset between replicas within a group and the group_size specifies the number of replicas within a group.

Group with stride 1 and group_size 2 for 8 replicas:

ir.replica_grouping(1, 2).assignment
[0, 0, 1, 1, 2, 2, 3, 3]

Group with stride 1 and group_size 4 for 8 replicas:

ir.replica_grouping(1, 4).assignment
[0, 0, 0, 0, 1, 1, 1, 1]

Group with stride 2 and group_size 4 for 8 replicas:

ir.replica_grouping(2, 4).assignment
[0, 1, 0, 1, 0, 1, 0, 1]

Group with stride 4 and group_size 2 for 8 replicas:

ir.replica_grouping(4, 2).assignment
[0, 1, 2, 3, 0, 1, 2, 3]

Parameters
  • stride (int) – The offset between elements in a replica group. Defaults to 1.

  • group_size (Optional[int]) – The number of replicas in each replica group. If not provided, group_size defaults to ir.replication_factor // stride.

Returns

An object describing the replica grouping.

Return type

ReplicaGrouping
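
The four assignments shown above can be reproduced without PopXL. The sketch below uses a hypothetical helper, grouping_assignment, whose formula was chosen to agree with those examples; it is not taken from the PopXL source, so treat it as an illustration of how stride and group_size partition the replicas.

```python
def grouping_assignment(num_replicas, stride=1, group_size=None):
    """Return the group index of each replica for a strided grouping."""
    if group_size is None:
        # Documented default: replication_factor // stride.
        group_size = num_replicas // stride
    block = stride * group_size  # replicas covered by `stride` interleaved groups
    if block == 0 or num_replicas % block != 0:
        raise ValueError("stride * group_size must evenly divide the replica count")
    # Within each block, replicas that share the same (r % stride) form one group.
    return [(r // block) * stride + (r % stride) for r in range(num_replicas)]


print(grouping_assignment(8, 1, 2))  # [0, 0, 1, 1, 2, 2, 3, 3]
print(grouping_assignment(8, 1, 4))  # [0, 0, 0, 0, 1, 1, 1, 1]
print(grouping_assignment(8, 2, 4))  # [0, 1, 0, 1, 0, 1, 0, 1]
print(grouping_assignment(8, 4, 2))  # [0, 1, 2, 3, 0, 1, 2, 3]
```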

replica_grouping_from_assignments(assignment)

Create a ReplicaGrouping object with an arbitrary replica group assignment. If a non-constant stride is provided, the ReplicaGrouping can be used for variable settings but cannot be used with GCL operations.

An example of a non-constant stride is the assignment [0, 1, 0, 0, 1, 1], as the strides for group 0 are [2, 1].

For more information about replica groupings, see the docstring for Ir.replica_grouping().

ir.replica_grouping_from_assignments([0, 0, 1, 1, 0, 0, 1, 1]).assignment
[0, 0, 1, 1, 0, 0, 1, 1]

Parameters

assignment (List[int]) – The group each replica is assigned to.

Returns

An object describing the replica grouping.

Return type

ReplicaGrouping
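
To see why [0, 1, 0, 0, 1, 1] has a non-constant stride, the sketch below lists the gaps between consecutive replicas of each group. Both helper names are hypothetical, and the per-group check is a simplification: the criterion PopXL actually applies may be stricter.

```python
from collections import defaultdict


def group_strides(assignment):
    """Map each group to the gaps between its consecutive replica indices."""
    members = defaultdict(list)
    for replica, group in enumerate(assignment):
        members[group].append(replica)
    return {
        group: [b - a for a, b in zip(replicas, replicas[1:])]
        for group, replicas in members.items()
    }


def has_constant_stride(assignment):
    """True if, within every group, all gaps between replicas are equal."""
    return all(len(set(gaps)) <= 1 for gaps in group_strides(assignment).values())


print(group_strides([0, 1, 0, 0, 1, 1]))        # {0: [2, 1], 1: [3, 1]}
print(has_constant_stride([0, 1, 0, 0, 1, 1]))  # False
print(has_constant_stride([0, 1, 0, 1, 0, 1]))  # True
```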

property replication_factor: int

Set the number of model replications.

For example, if your model requires 1 IPU, a replication_factor of 2 will replicate your model so that 2 IPUs are used. If your model is pipelined across 4 IPUs, a replication_factor of 4 will use 16 IPUs total. If the training is done across multiple instances then the replication_factor is the number of replicas for this instance.

When using distributed replication this will return the global replication factor.
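
The IPU counts in the example above follow from a single multiplication, sketched here with a hypothetical helper. It assumes a single instance, so that replication_factor counts all replicas.

```python
def total_ipus(ipus_per_replica, replication_factor):
    """Number of IPUs used when a model is replicated (single instance assumed)."""
    return ipus_per_replica * replication_factor


print(total_ipus(1, 2))  # 2: a 1-IPU model replicated twice
print(total_ipus(4, 4))  # 16: a model pipelined over 4 IPUs, replicated 4 times
```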