19.1. Ir
- class popxl.Ir(replication=1)
- __init__(replication=1)
PopXL intermediate representation (IR).
An IR contains a main graph (property main_graph) and can create additional graphs using member methods such as create_graph() and create_empty_graph().
- Parameters
replication (Union[int, Literal['popdist']], optional) – Set the replication_factor of the IR. A value of ‘popdist’ configures the IR with settings from popdist/poprun. Defaults to 1.
- create_empty_graph(name=None)
Create a new graph.
- create_graph(fn, *args, **kwargs)
Create a graph from a Python callable fn or the build method of a Module. The graph inputs are determined using the signature of the function fn and the supplied arguments args and kwargs. Tensors or TensorSpecs passed via the arguments are used to determine the shape and dtype of the graph inputs (the tensors are not actually passed to the graph). The graph outputs are determined using the outputs of the function when called.
The order of inputs in the returned graph will be the same as the order of the tensor inputs in the function signature, the order of the kwargs and the order of called popxl.graph_inputs. This determines the order in which you pass the parent tensors as inputs at the call site.
The function fn can take any arguments. Any Tensor arguments are automatically detected. Any Tensor arguments inside a tuple, list, *args or **kwargs are also detected. *args, **kwargs and lists cannot contain a mixture of tensors and other types. Nested lists or dicts of tensors are not supported.
If an input is type hinted with TensorByRef or List[TensorByRef] where appropriate in the signature of fn, then the corresponding inputs will be passed by reference instead of by value when the graph is called.
The output of fn must be either None, a Tensor or an iterable of Tensors.
- Parameters
fn (Callable[..., Any]) – The Python function that defines the graph. The signature of fn with its arguments is used to determine the inputs of the graph.
args (Any) – Arguments passed to the Python function that defines the graph that can be a mixture of tensors and other types. Tensors are used to determine the tensor info of the inputs.
kwargs (Any) – Keyword arguments passed to the Python function that defines the graph that can be a mixture of tensors and other types. Tensors are used to determine the tensor info of the inputs.
- Raises
TypeError – If fn is neither a callable nor an extension of popxl.Module, or if any of the arguments listed in *args mixes Tensors with other types.
ValueError – If the *args and **kwargs don’t match the signature or if the output of a subgraph is not a Tensor, an iterable of Tensors or None.
- Returns
A graph that corresponds to the input Python function.
- Return type
popxl.graph.Graph
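The input-detection rules described for create_graph can be illustrated without PopXL. The following is a minimal plain-Python sketch, not the library's implementation: FakeTensor and collect_tensor_inputs are hypothetical names introduced here, and inputs created inside the function body (the popxl.graph_inputs mentioned above) are not modelled. It shows tensors being collected in signature order, a mixed list being rejected with TypeError, and a signature mismatch being reported as ValueError.

```python
import inspect
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical stand-in for a tensor: just a name, shape and dtype."""
    name: str
    shape: tuple
    dtype: str = "float32"


def collect_tensor_inputs(fn, *args, **kwargs):
    """Return the tensors found in the bound arguments, in signature order.

    Hypothetical helper: lists, tuples, *args and **kwargs must hold either
    only tensors or no tensors at all; nested containers are not searched.
    """
    sig = inspect.signature(fn)
    try:
        bound = sig.bind(*args, **kwargs)
    except TypeError as exc:
        raise ValueError(f"Arguments do not match the signature: {exc}") from exc

    inputs = []
    for name, value in bound.arguments.items():
        if isinstance(value, FakeTensor):
            inputs.append(value)
            continue
        if sig.parameters[name].kind is inspect.Parameter.VAR_KEYWORD:
            items = list(value.values())  # the **kwargs dict
        elif isinstance(value, (list, tuple)):
            items = list(value)  # a list, a tuple, or the *args tuple
        else:
            continue  # a non-tensor argument such as a float or a string
        is_tensor = [isinstance(item, FakeTensor) for item in items]
        if any(is_tensor) and not all(is_tensor):
            raise TypeError(f"Argument '{name}' mixes tensors with other types")
        inputs.extend(item for item in items if isinstance(item, FakeTensor))
    return inputs


def add_scaled(x, scale, ys, **named):
    """Example function whose signature defines the input order."""
    return None


x = FakeTensor("x", (2, 5))
y0 = FakeTensor("y0", (2, 5))
y1 = FakeTensor("y1", (2, 5))
bias = FakeTensor("bias", (5,))
found = collect_tensor_inputs(add_scaled, x, 0.5, [y0, y1], bias=bias)
print([t.name for t in found])  # ['x', 'y0', 'y1', 'bias']
```

In this sketch the float 0.5 is skipped because it is not a tensor, which mirrors the statement that arguments can be a mixture of tensors and other types while a single list cannot.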
- dot_checkpoint(check, save_dir=None)
Output a graphical representation of the graph in Graphviz DOT format.
Checkpoints can be activated by either setting the dotChecks option in session options or the POPART_DOT_CHECKS environment variable. These should be set to the list of the checks to be activated. Note that if either dotChecks or POPART_DOT_CHECKS is set to ALL, all checkpoints will be activated. See the PopART User Guide for more information.
If no checkpoints are activated, this function will activate them all by setting the dotChecks option to ALL.
- get_all_d2h_streams()
Return all DeviceToHostStream objects in the IR which have a host_store op that streams along them.
- Return type
- get_all_h2d_streams()
Return all HostToDeviceStream objects in the IR which have a host_load op that streams along them.
- Return type
- property main_graph: popxl.graph.Graph
Every IR is initialised with a main graph. This property returns that graph.
- Returns
The main graph of the IR.
- Return type
popxl.graph.Graph
- property num_host_transfers: int
Return the number of forward-backward iterations of the model that your Ir computes.
This property MUST be set before creating a popxl.Session.
More concretely, if your Ir contains an input tensor x with shape (2, 5), and you expect that your Ir will stream this tensor a total of 4 times, so that you need to pass a buffer with shape (4, 2, 5) to each session.run() call, then ir.num_host_transfers should equal 4. Note there will also be a replica dimension if using replication.
Note there are no separate values for “batches per step” and “gradient accumulation”, as they are known in PopART’s ONNX API. If your Ir represents a batches per step of bps and a gradient accumulation factor of af, then you should set num_host_transfers to bps * af. There are no separate setters for the two values. There will only be a single “num_host_transfers” dimension in the buffer passed to session.run.
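The arithmetic above can be checked with a small plain-Python sketch. It does not call PopXL: host_buffer_shape is a hypothetical helper, and it assumes no replication, so the replica dimension mentioned above is left out.

```python
def host_buffer_shape(tensor_shape, batches_per_step, accumulation_factor):
    """Return (num_host_transfers, buffer_shape) for one streamed input.

    Hypothetical helper. Assumes no replication, so the buffer has no
    replica dimension.
    """
    if batches_per_step < 1 or accumulation_factor < 1:
        raise ValueError("both factors must be positive integers")
    # A single combined value: there is no separate dimension per factor.
    num_host_transfers = batches_per_step * accumulation_factor
    return num_host_transfers, (num_host_transfers, *tensor_shape)


transfers, shape = host_buffer_shape((2, 5), batches_per_step=2, accumulation_factor=2)
print(transfers, shape)  # 4 (4, 2, 5)
```

The returned count is the value you would assign to ir.num_host_transfers, and the returned shape is the shape of the buffer passed for x in that case.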
- replica_grouping(stride=1, group_size=None)
Create a ReplicaGrouping object.
A ReplicaGrouping object represents a way in which replicas are grouped for the purpose of getting and setting variable values and collective operations.
A grouping always exactly partitions a set of replicas, so every replica is in exactly one group. We specify these partitions with a stride and group_size argument. The stride specifies the offset between replicas within a group and the group_size specifies the number of replicas within a group.
Group with stride 1 and group_size 2 for 8 replicas:
ir.replica_grouping(1, 2).assignment
[0, 0, 1, 1, 2, 2, 3, 3]
Group with stride 1 and group_size 4 for 8 replicas:
ir.replica_grouping(1, 4).assignment
[0, 0, 0, 0, 1, 1, 1, 1]
Group with stride 2 and group_size 4 for 8 replicas:
ir.replica_grouping(2, 4).assignment
[0, 1, 0, 1, 0, 1, 0, 1]
Group with stride 4 and group_size 2 for 8 replicas:
ir.replica_grouping(4, 2).assignment
[0, 1, 2, 3, 0, 1, 2, 3]
- Parameters
- Returns
An object describing the replica grouping.
- Return type
ReplicaGrouping
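The four assignments listed for replica_grouping can be reproduced with a short plain-Python sketch. This is an illustration only, not PopXL code: grouping_assignment is a hypothetical helper, and the formula was chosen to match the documented examples rather than taken from the library.

```python
def grouping_assignment(num_replicas, stride, group_size):
    """Return the group index of each replica for a strided grouping.

    Hypothetical helper. Replicas are split into blocks of
    stride * group_size; inside a block, replica r joins the group
    selected by r % stride.
    """
    if stride < 1 or group_size < 1:
        raise ValueError("stride and group_size must be positive")
    block = stride * group_size
    if num_replicas % block != 0:
        raise ValueError("stride * group_size must divide num_replicas")
    return [(r // block) * stride + (r % stride) for r in range(num_replicas)]


print(grouping_assignment(8, 1, 2))  # [0, 0, 1, 1, 2, 2, 3, 3]
print(grouping_assignment(8, 1, 4))  # [0, 0, 0, 0, 1, 1, 1, 1]
print(grouping_assignment(8, 2, 4))  # [0, 1, 0, 1, 0, 1, 0, 1]
print(grouping_assignment(8, 4, 2))  # [0, 1, 2, 3, 0, 1, 2, 3]
```

Each group index appears exactly group_size times in the result, which matches the statement that a grouping exactly partitions the replicas.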
- replica_grouping_from_assignments(assignment)
Create a ReplicaGrouping object with an arbitrary replica group assignment. If a non-constant stride is provided, the ReplicaGrouping can be used for variable settings but cannot be used with GCL operations.
An example of a non-constant stride is the assignment [0, 1, 0, 0, 1, 1], as the strides for group 0 are [2, 1].
For more information about replica groupings, see the docstring for Ir.replica_grouping.
ir.replica_grouping_from_assignments([0, 0, 1, 1, 0, 0, 1, 1]).assignment
[0, 0, 1, 1, 0, 0, 1, 1]
- Parameters
assignment (List[int]) – The group each replica is assigned to.
- Returns
An object describing the replica grouping.
- Return type
ReplicaGrouping
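To make the notion of a non-constant stride concrete, here is a plain-Python sketch that lists the gaps between consecutive replicas of each group. group_strides and has_constant_stride are hypothetical helpers, and treating a stride as constant only when every gap in every group is equal is an assumption of this sketch, not a statement about how PopXL decides.

```python
from collections import defaultdict


def group_strides(assignment):
    """Map each group to the gaps between its consecutive replica indices."""
    members = defaultdict(list)
    for replica, group in enumerate(assignment):
        members[group].append(replica)
    return {
        group: [b - a for a, b in zip(replicas, replicas[1:])]
        for group, replicas in members.items()
    }


def has_constant_stride(assignment):
    """True if all gaps across all groups share one value (sketch's criterion)."""
    gaps = {gap for strides in group_strides(assignment).values() for gap in strides}
    return len(gaps) <= 1


# Group 0 holds replicas 0, 2 and 3, so its strides are [2, 1].
print(group_strides([0, 1, 0, 0, 1, 1])[0])              # [2, 1]
print(has_constant_stride([0, 1, 0, 0, 1, 1]))           # False
print(has_constant_stride([0, 1, 0, 1, 0, 1, 0, 1]))     # True
```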
- property replication_factor: int
Set the number of model replications.
For example, if your model requires 1 IPU, a replication_factor of 2 will replicate your model so that 2 IPUs are used. If your model is pipelined across 4 IPUs, a replication_factor of 4 will use 16 IPUs total. If the training is done across multiple instances then the replication_factor is the number of replicas for this instance.
When using distributed replication this will return the global replication factor.
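The IPU counts in the examples above follow from a single multiplication. A minimal sketch, using a hypothetical helper name and covering only the single-instance case:

```python
def total_ipus(ipus_per_replica, replication_factor):
    """Number of IPUs used when a model is replicated (hypothetical helper)."""
    if ipus_per_replica < 1 or replication_factor < 1:
        raise ValueError("both values must be positive integers")
    return ipus_per_replica * replication_factor


print(total_ipus(1, 2))  # 2
print(total_ipus(4, 4))  # 16
```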