18.1. Ir
- class popxl.Ir(replication=1)
- __init__(replication=1)
PopXL intermediate representation (IR).
An IR contains a main graph (property main_graph) and can create additional graphs using member methods such as create_graph() and create_empty_graph().
- Parameters
replication (Union[int, Literal['popdist']], optional) – Set the replication_factor of the IR. A value of 'popdist' configures the IR with settings from popdist/poprun. Defaults to 1.
- create_empty_graph(name=None)
Create a new graph.
- create_graph(fn, *args, **kwargs)
Create a graph from a Python callable fn or the build method of a Module.
The graph inputs are determined using the signature of the function fn and the supplied arguments args and kwargs. Tensors or TensorSpecs passed via the arguments are used to determine the shape and dtype of the graph inputs (the tensors are not actually passed to the graph). The graph outputs are determined using the outputs of the function when called.
The order of inputs in the returned graph will be the same as the order of the tensor inputs in the function signature, the order of the kwargs and the order in which popxl.graph_inputs are called. This determines the order in which you pass the parent tensors as inputs at the callsite.
The function fn can take any arguments. Any Tensor arguments are automatically detected. Any Tensor arguments inside a tuple, list, *args or **kwargs are also detected. *args, **kwargs and lists cannot contain a mixture of tensors and other types. Nested lists or dicts of tensors are not supported.
If an input is type hinted with TensorByRef or List[TensorByRef] where appropriate in the signature of fn, then the corresponding inputs will be passed by reference instead of by value when the graph is called.
The output of fn must be either None, a Tensor or an iterable of Tensors.
- Parameters
fn (Callable[..., Any]) – The Python function that defines the graph. The signature of fn with its arguments is used to determine the inputs of the graph.
args (Any) – Arguments passed to the Python function that defines the graph that can be a mixture of tensors and other types. Tensors are used to determine the tensor info of the inputs.
kwargs (Any) – Keyword arguments passed to the Python function that defines the graph that can be a mixture of tensors and other types. Tensors are used to determine the tensor info of the inputs.
- Raises
TypeError – If fn is not a callable extending the popxl.Module or if any of the arguments listed in *args mixes Tensors with other types.
ValueError – If the *args and **kwargs don't match the signature or if the output of a subgraph is not a Tensor, an iterable of Tensors or None.
- Returns
A graph that corresponds to the input Python function.
- Return type
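As a rough illustration of the input-detection rules described for create_graph, the following pure-Python sketch models how tensor inputs could be collected in signature order. It is a toy model and not PopXL's implementation: FakeTensor, _tensors_in and collect_tensor_inputs are hypothetical names, and applying the "no mixing" rule to tuples as well as lists is an assumption made for simplicity.

```python
import inspect
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical stand-in for a tensor: only a name, shape and dtype."""
    name: str
    shape: Tuple[int, ...]
    dtype: str = "float32"


def _tensors_in(name: str, value: Any) -> List[FakeTensor]:
    """Return the tensors found in one bound argument, in order."""
    if isinstance(value, FakeTensor):
        return [value]
    if isinstance(value, dict):            # contents of **kwargs
        items = list(value.values())
    elif isinstance(value, (list, tuple)):  # lists, tuples and *args
        items = list(value)
    else:
        return []                           # non-tensor argument, ignored
    tensors = [v for v in items if isinstance(v, FakeTensor)]
    # Assumption: any container holding tensors must hold only tensors.
    if tensors and len(tensors) != len(items):
        raise TypeError(f"argument '{name}' mixes tensors with other types")
    return tensors


def collect_tensor_inputs(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> List[FakeTensor]:
    """Collect tensor arguments in the order of fn's signature."""
    try:
        bound = inspect.signature(fn).bind(*args, **kwargs)
    except TypeError as err:
        # Mirrors the documented ValueError for a signature mismatch.
        raise ValueError(f"arguments do not match the signature: {err}") from err
    inputs: List[FakeTensor] = []
    for name, value in bound.arguments.items():
        inputs.extend(_tensors_in(name, value))
    return inputs


if __name__ == "__main__":
    def scaled_sum(x, scale, weights):
        return None

    x = FakeTensor("x", (2, 5))
    ws = [FakeTensor("w0", (5,)), FakeTensor("w1", (5,))]
    print([t.name for t in collect_tensor_inputs(scaled_sum, x, 0.5, ws)])
```

The non-tensor argument (scale) is skipped, which corresponds to the statement that only tensors determine the graph inputs.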
- dot_checkpoint(check, save_dir=None)
Output a graphical representation of the graph in Graphviz DOT format.
Checkpoints can be activated by either setting the dotChecks option in session options or the POPART_DOT_CHECKS environment variable. These should be set to the list of the checks to be activated. Note that if either dotChecks or POPART_DOT_CHECKS is set to ALL, all checkpoints will be activated. See the PopART User Guide for more information.
If no checkpoints are activated, this function will activate them all by setting the dotChecks option to ALL.
- get_all_d2h_streams()
Return all DeviceToHostStream objects in the IR which have a host_store op that streams along them.
- Return type
- get_all_h2d_streams()
Return all HostToDeviceStream objects in the IR which have a host_load op that streams along them.
- Return type
- property main_graph: popxl.graph.Graph
Every IR is initialised with a main graph. This property returns this graph.
- Returns
The main graph of the IR.
- Return type
- property num_host_transfers: int
Return the number of fwd-bwd iterations of the model that your Ir computes.
This property MUST be set before creating a popxl.Session.
More concretely, if your Ir contains an input tensor x with shape (2, 5) and you expect your Ir to stream this tensor a total of 4 times, then you need to pass a buffer with shape (4, 2, 5) to each session.run() call, and ir.num_host_transfers should equal 4. Note there will also be a replica dimension if using replication.
Note there are no separate values for "batches per step" and "gradient accumulation", as they are known in PopART's ONNX API. If your Ir represents a batches per step of bps and a gradient accumulation factor of af, then you should set num_host_transfers to bps * af. There are no separate setters for the two values. There will only be a single "num_host_transfers" dimension in the buffer passed to session.run.
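The arithmetic for num_host_transfers can be sketched in plain Python. The helper name host_buffer_shape is hypothetical, and the sketch assumes no replication, so it leaves out the replica dimension mentioned in the description.

```python
from typing import Tuple


def host_buffer_shape(
    tensor_shape: Tuple[int, ...],
    batches_per_step: int,
    accumulation_factor: int,
) -> Tuple[int, Tuple[int, ...]]:
    """Return (num_host_transfers, buffer shape) for one input tensor.

    Assumes a single replica, so no replica dimension is included.
    """
    if batches_per_step < 1 or accumulation_factor < 1:
        raise ValueError("both factors must be positive integers")
    # A single combined dimension replaces the two separate PopART values.
    num_host_transfers = batches_per_step * accumulation_factor
    return num_host_transfers, (num_host_transfers, *tensor_shape)


n, shape = host_buffer_shape((2, 5), batches_per_step=2, accumulation_factor=2)
print(n, shape)  # 4 (4, 2, 5)
```

With bps = 2 and af = 2 this reproduces the (4, 2, 5) buffer from the description of the property.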
- replica_grouping(stride=1, group_size=None)
Create a ReplicaGrouping object.
A ReplicaGrouping object represents a way in which replicas are grouped for the purpose of getting and setting variable values and collective operations.
A grouping always exactly partitions a set of replicas, so every replica is in exactly one group. We specify these partitions with a stride and group_size argument. The stride specifies the offset between replicas within a group and the group_size specifies the number of replicas within a group.
Group with stride 1 and group_size 2 for 8 replicas:
ir.replica_grouping(1, 2).assignment
[0, 0, 1, 1, 2, 2, 3, 3]
Group with stride 1 and group_size 4 for 8 replicas:
ir.replica_grouping(1, 4).assignment
[0, 0, 0, 0, 1, 1, 1, 1]
Group with stride 2 and group_size 4 for 8 replicas:
ir.replica_grouping(2, 4).assignment
[0, 1, 0, 1, 0, 1, 0, 1]
Group with stride 4 and group_size 2 for 8 replicas:
ir.replica_grouping(4, 2).assignment
[0, 1, 2, 3, 0, 1, 2, 3]
- Parameters
- Returns
An object describing the replica grouping.
- Return type
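The relationship between stride, group_size and the resulting assignment can be reproduced with a short pure-Python sketch. The function grouping_assignment is a hypothetical helper, not part of PopXL. It assumes the replica count is a multiple of stride * group_size, and it requires an explicit group_size because the behaviour of the default (None) is not described here.

```python
from typing import List


def grouping_assignment(num_replicas: int, stride: int, group_size: int) -> List[int]:
    """Return the group index of each replica for a strided grouping."""
    block = stride * group_size  # replicas covered by one set of interleaved groups
    if stride < 1 or group_size < 1:
        raise ValueError("stride and group_size must be positive")
    if num_replicas % block != 0:
        # Assumption: the grouping must partition the replicas exactly.
        raise ValueError("num_replicas must be a multiple of stride * group_size")
    # Each block holds `stride` groups; a replica's position modulo the
    # stride selects which of those groups it belongs to.
    return [(r // block) * stride + (r % stride) for r in range(num_replicas)]


for stride, group_size in [(1, 2), (1, 4), (2, 4), (4, 2)]:
    print(stride, group_size, grouping_assignment(8, stride, group_size))
```

For 8 replicas the four (stride, group_size) pairs in the loop give the same assignments as the examples listed for this method.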
- replica_grouping_from_assignments(assignment)
Create a ReplicaGrouping object with an arbitrary replica group assignment. If a non-constant stride is provided, the ReplicaGrouping can be used for variable settings but cannot be used with GCL operations.
An example of a non-constant stride is the assignment [0, 1, 0, 0, 1, 1], as the strides for group 0 are [2, 1].
For more information about replica groupings see the docstring for Ir.replica_grouping.
ir.replica_grouping_from_assignments([0, 0, 1, 1, 0, 0, 1, 1]).assignment
[0, 0, 1, 1, 0, 0, 1, 1]
- Parameters
assignment (List[int]) – The group each replica is assigned to.
- Returns
An object describing the replica grouping.
- Return type
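To make the notion of a non-constant stride concrete, the sketch below computes the gaps between consecutive members of each group. The helpers group_strides and has_constant_stride are hypothetical. Treating "constant" as "every gap in every group is the same value" is an assumption about the intended meaning, not a statement of how PopXL checks it.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def group_strides(assignment: Sequence[int]) -> Dict[int, List[int]]:
    """Map each group to the gaps between its consecutive replica indices."""
    members: Dict[int, List[int]] = defaultdict(list)
    for replica, group in enumerate(assignment):
        members[group].append(replica)
    return {
        group: [b - a for a, b in zip(idx, idx[1:])]
        for group, idx in members.items()
    }


def has_constant_stride(assignment: Sequence[int]) -> bool:
    """True if every gap, across all groups, is the same value (assumed definition)."""
    gaps = {gap for per_group in group_strides(assignment).values() for gap in per_group}
    return len(gaps) <= 1


print(group_strides([0, 1, 0, 0, 1, 1])[0])  # [2, 1]
print(has_constant_stride([0, 1, 0, 1, 0, 1, 0, 1]))  # True
```

Group 0 of [0, 1, 0, 0, 1, 1] contains replicas 0, 2 and 3, which gives the strides [2, 1] quoted in the description.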
- property replication_factor: int
Set the number of model replications.
For example, if your model requires 1 IPU, a replication_factor of 2 will replicate your model so that 2 IPUs are used. If your model is pipelined across 4 IPUs, a replication_factor of 4 will use 16 IPUs total. If the training is done across multiple instances then the replication_factor is the number of replicas for this instance.
When using distributed replication this will return the global replication factor.