- class popxl.Ir(replication=1)
PopXL intermediate representation (IR).
replication (Union[int, Literal['popdist']], optional) – Set the replication_factor of the IR. A value of 'popdist' configures the IR with settings from popdist/poprun. Defaults to 1.
- create_graph(fn, *args, **kwargs)
Create a new graph from a Python callable fn or the build method of a Module. The graph inputs are determined using the signature of the function fn and the supplied arguments args and kwargs. Tensors or TensorSpecs passed via the arguments are used to determine the shape and dtype of the graph inputs (the tensors are not actually passed to the graph). The graph outputs are determined using the outputs of the function when called.
The order of inputs in the returned graph will be the same as the order of the tensor inputs in the function signature, followed by the order of the kwargs and the order of calls to popxl.graph_input. This determines the order in which you pass the parent tensors as inputs at the call site.
fn can take any arguments. Any Tensor arguments are automatically detected, as are Tensor arguments inside a tuple, list or **kwargs. However, tuples, lists and **kwargs cannot contain a mixture of tensors and other types, and nested lists or dicts of tensors are not supported.
If an input is type hinted with TensorByRef or List[TensorByRef] where appropriate in the signature of fn, then the corresponding inputs will be passed by reference instead of by value when the graph is called.
The output of fn must be either None, a Tensor or an iterable of Tensors.
fn (Callable[..., Any]) – The Python function that defines the graph. The signature of fn with its arguments is used to determine the inputs of the graph.
args (Any) – Arguments passed to the Python function that defines the graph that can be a mixture of tensors and other types. Tensors are used to determine the tensor info of the inputs.
kwargs (Any) – Keyword arguments passed to the Python function that defines the graph that can be a mixture of tensors and other types. Tensors are used to determine the tensor info of the inputs.
A graph that corresponds to the input Python function.
- Return type
Graph
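The input-ordering rules above can be illustrated with a toy sketch in plain Python. Note that FakeTensor and ordered_tensor_inputs are illustrative stand-ins written for this example, not popxl APIs; they only model how tensors are collected from positional arguments (including inside lists and tuples) and then from keyword arguments, in order:

```python
from typing import Any, Dict, List, Tuple

class FakeTensor:
    """Stand-in for popxl.Tensor; used only to illustrate input detection."""
    def __init__(self, name: str):
        self.name = name

def ordered_tensor_inputs(args: Tuple[Any, ...], kwargs: Dict[str, Any]) -> List[str]:
    """Collect tensor arguments in signature order: positional arguments first
    (flattening tensors found in lists/tuples), then keyword arguments in the
    order they were passed."""
    found: List[str] = []

    def visit(value: Any) -> None:
        if isinstance(value, FakeTensor):
            found.append(value.name)
        elif isinstance(value, (list, tuple)):
            for item in value:
                visit(item)

    for a in args:
        visit(a)
    for v in kwargs.values():  # dicts preserve insertion order in Python 3.7+
        visit(v)
    return found

x, w, b = FakeTensor("x"), FakeTensor("w"), FakeTensor("b")
print(ordered_tensor_inputs((x, [w]), {"bias": b}))  # ['x', 'w', 'b']
```

The same ordering (positional tensors, then keyword tensors) determines the order of the parent tensors you pass at the call site.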
- dot_checkpoint(check, save_dir=None)
Output a graphical representation of the graph in Graphviz DOT format.
Checkpoints can be activated by either setting the dotChecks option in session options or the POPART_DOT_CHECKS environment variable. These should be set to the list of the checks to be activated. Note that if either is set to ALL, all checkpoints will be activated. See the PopART User Guide for more information.
If no checkpoints are activated, this function will activate them all by setting the dotChecks option to ALL.
- get_all_d2h_streams()
Return all DeviceToHostStream objects in the IR. A DeviceToHostStream is any stream in the IR that has a host_store op that streams along it.
- get_all_h2d_streams()
Return all HostToDeviceStream objects in the IR that have a host_load op that streams along it.
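For example, all DOT checkpoints can be activated from Python before the session is built by setting the POPART_DOT_CHECKS environment variable named above (ALL is the value described in the text; for the names of individual checks and how to list them, see the PopART User Guide):

```python
import os

# Activate all DOT checkpoints via the environment variable.
# Individual check names can be listed instead of ALL; see the
# PopART User Guide for the available names.
os.environ["POPART_DOT_CHECKS"] = "ALL"
print(os.environ["POPART_DOT_CHECKS"])  # ALL
```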
- property main_graph: popxl.graph.Graph
Every IR is initialised with a main graph. This property returns that graph.
The main graph of the IR.
- Return type
Graph
- property num_host_transfers: int
Return the number of fwd-bwd iterations of the model that your Ir computes.
This property MUST be set before creating a popxl.Session.
More concretely, if your Ir contains an input tensor x with shape (2, 5), and you expect that your Ir will stream this tensor a total of 4 times, then you need to pass a buffer with shape (4, 2, 5) to each session.run() call, and ir.num_host_transfers should equal 4. Note there will also be a replica dimension if using replication.
Note there are no separate values for “batches per step” and “gradient accumulation”, as they are known in PopART’s ONNX API. If your Ir represents a batches per step of bps and a gradient accumulation factor of af, then you should set num_host_transfers to bps * af. There are no separate setters for the two values; there will only be a single “num_host_transfers” dimension in the buffer passed to session.run().
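The relationship between batches per step, gradient accumulation and the host buffer shape can be sketched in plain Python (no popxl required; bps and af are the names used in the text above, and (2, 5) is the input tensor shape from the example):

```python
bps = 2  # batches per step, as known in PopART's ONNX API
af = 2   # gradient accumulation factor

# popxl uses a single combined value instead of two separate settings.
num_host_transfers = bps * af

tensor_shape = (2, 5)  # shape of input tensor x in the example above
# The buffer passed to session.run() gains one leading dimension of
# size num_host_transfers (plus a replica dimension when replicating).
host_buffer_shape = (num_host_transfers,) + tensor_shape
print(host_buffer_shape)  # (4, 2, 5)
```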
- replica_grouping(stride=1, group_size=None)
A grouping always exactly partitions a set of replicas, so every replica is exactly in one group. We specify these partitions with a
stridespecifies the offset between replicas within a group and the
group_sizespecifies the number of replicas within a group.
group_size2 for 8 replicas):
ir.replica_grouping(1, 2).assignment [0, 0, 1, 1, 2, 2, 3, 3]
group_size4 for 8 replicas:
ir.replica_grouping(1, 4).assignment [0, 0, 0, 0, 1, 1, 1, 1]
group_size4 for 8 replicas:
ir.replica_grouping(2, 4).assignment [0, 1, 0, 1, 0, 1, 0, 1]
group_size2 for 8 replicas:
ir.replica_grouping(4, 2).assignment [0, 1, 2, 3, 0, 1, 2, 3]
An object describing the replica grouping.
- Return type
ReplicaGrouping
- replica_grouping_from_assignments(assignment)
Create a ReplicaGrouping object with an arbitrary replica group assignment. If a non-constant stride is provided, the ReplicaGrouping can be used for variable settings but cannot be used with GCL operations.
An example of a non-constant stride is the assignment [0, 1, 0, 0, 1, 1], as the strides within group 0 are [2, 1] and within group 1 are [3, 1].
For more information about replica groupings, see the docstring for ReplicaGrouping.
For example, ir.replica_grouping_from_assignments([0, 0, 1, 1, 0, 0, 1, 1]).assignment gives [0, 0, 1, 1, 0, 0, 1, 1].
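How a (stride, group_size) pair determines the assignment can be reproduced in plain Python. The helper grouping_assignment below is written for this illustration (it is not part of popxl); it mirrors the semantics described above and reproduces the documented example assignments:

```python
from typing import List

def grouping_assignment(stride: int, group_size: int, num_replicas: int) -> List[int]:
    """Compute the group index of each replica for a (stride, group_size)
    replica grouping, mirroring ReplicaGrouping.assignment."""
    assignment = [-1] * num_replicas
    group = 0
    for start in range(num_replicas):
        if assignment[start] != -1:
            continue  # replica already placed in a group
        # Each group contains group_size replicas, stride apart.
        for i in range(group_size):
            assignment[start + i * stride] = group
        group += 1
    return assignment

print(grouping_assignment(1, 2, 8))  # [0, 0, 1, 1, 2, 2, 3, 3]
print(grouping_assignment(2, 4, 8))  # [0, 1, 0, 1, 0, 1, 0, 1]
print(grouping_assignment(4, 2, 8))  # [0, 1, 2, 3, 0, 1, 2, 3]
```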
- property replication_factor: int
Set the number of model replications.
For example, if your model requires 1 IPU, a replication_factor of 2 will replicate your model so that 2 IPUs are used. If your model is pipelined across 4 IPUs, a replication_factor of 4 will use 16 IPUs in total. If the training is done across multiple instances, then the replication_factor is the number of replicas for this instance.
When using distributed replication this will return the global replication factor.
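The IPU count arithmetic in the pipelined example above is simply the per-replica IPU count multiplied by the replication factor:

```python
ipus_per_replica = 4     # e.g. a model pipelined across 4 IPUs
replication_factor = 4   # number of model replications

# Each replica needs its own set of IPUs.
total_ipus = ipus_per_replica * replication_factor
print(total_ipus)  # 16
```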