19.13. Transforms
- class popxl.transforms.ExpectedConnection
Create a connection between tensors and gradient graph inputs or outputs.
`GradGraphInfo` contains the expected inputs and outputs for the gradient graph. The expected inputs and outputs are associated either with tensors in the forward graph or with gradients of tensors in the forward graph.
This class should not be constructed directly; a `TypeError` exception is raised if it is constructed directly.
- Return type
None
- __init__()
Construct the `ExpectedConnection` class.
- Return type
None
- property connection_type
Get the type of the connection.
- property fwd_tensor: popxl.tensor.Tensor
Get the tensor this connection applies to.
- class popxl.transforms.ExpectedConnectionType(value)
Expected connection type of gradient graph inputs and outputs.
The expected inputs and expected outputs of the gradient graph are associated either with tensors in the forward graph or with gradients of tensors in the forward graph.
- Fwd = 'Fwd'
Expected input/output is a tensor of the forward graph.
- FwdGrad = 'FwdGrad'
Expected input/output is the gradient of a tensor from the forward graph.
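For illustration, a minimal sketch of inspecting these connection types on the result of `autodiff` (assuming `graph` is an existing forward graph; the variable names are hypothetical):
```python
import popxl

grads_info = popxl.transforms.autodiff(graph)

for ec in grads_info.expected_inputs:
    if ec.connection_type == popxl.transforms.ExpectedConnectionType.FwdGrad:
        # This gradient graph input carries the gradient of a forward tensor.
        print("gradient of", ec.fwd_tensor)
    else:  # popxl.transforms.ExpectedConnectionType.Fwd
        # This input is a forward tensor itself, for example an activation
        # required to compute the gradients.
        print("forward tensor", ec.fwd_tensor)
```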
- class popxl.transforms.GradGraphInfo
The result of the `autodiff` transform.
`fwd_graph` is the forward graph. `expected_inputs` are tensors from the forward graph that are required as inputs to the gradient graph. `expected_outputs` are tensors from the forward graph that have gradients as outputs of the gradient graph.
This class should not be constructed directly; a `TypeError` exception is raised if it is constructed directly.
- Return type
None
- __init__()
Construct the `GradGraphInfo` class.
- Return type
None
- property expected_inputs: Tuple[popxl.transforms.autodiff.ExpectedConnection, ...]
Get information about all expected inputs of the gradient graph.
Inputs are tensors or their gradients from the forward graph that are required as inputs to the gradient graph.
- property expected_outputs: Tuple[popxl.transforms.autodiff.ExpectedConnection, ...]
Get information about all expected outputs of the gradient graph.
Outputs are tensors from the forward graph that have gradients as outputs of the gradient graph.
- property forward_graph
Get the forward graph that was differentiated.
- fwd_graph_ins_to_grad_parent_outs(grad_call_info)
Return mapping between forward graph inputs and outputs of the parent of the gradient graph.
`autodiff` is applied to the forward graph. This method returns the mapping between the input tensors of the forward graph and the output tensors of the parent of the gradient graph.
Example:
```python
# `module`: subgraph module, `x`: parent graph inputs,
# `x_dash`: gradient parent graph input, `main`: the parent (main) graph
graph = ir.create_graph(module, x, out_features=16)  # Forward graph
call_info = ops.call_with_info(graph, x, inputs_dict={module.W: W, module.b: b})

grads_graph = popxl.transforms.autodiff(graph)
activations = grads_graph.inputs_dict(call_info)
grads_call_info = ops.call_with_info(
    grads_graph.graph, x_dash, inputs_dict=activations
)

# Obtain a mapping between the subgraph tensors that correspond to `x`, `W`
# and `b` and the corresponding parent gradient tensor outputs
grad_tensor_map = grads_graph.fwd_graph_ins_to_grad_parent_outs(grads_call_info)
assert all(t in graph for t in grad_tensor_map.keys())
assert all(t in main for t in grad_tensor_map.values())
assert [t.id for t in grad_tensor_map.keys()] == [
    "Module_subgraph(0)/x",
    "Module_subgraph(0)/W",
    "Module_subgraph(0)/b",
]
assert [t.id for t in grad_tensor_map.values()] == [
    "Gradient___x",
    "Gradient___W",
    "Gradient___b",
]
```
- Parameters
grad_call_info (CallSiteInfo) – Call site information of the gradient graph produced by applying `autodiff` to the forward graph. This can be accessed with `popxl.ops.call_with_info()`.
- Returns
A dictionary that maps from an input tensor in the forward graph to an output tensor in the parent of the gradient graph.
- Return type
Dict[Tensor, Tensor]
- fwd_parent_ins_to_grad_parent_outs(fwd_call_info, grad_call_info)
Return mapping between forward’s parent graph inputs and gradient’s parent graph outputs.
`autodiff` is applied to the forward graph. This method returns the mapping between the input tensors from the parent of the forward graph and the output tensors of the parent of the gradient graph.
Example:
```python
# `module`: subgraph module, `x`: graph inputs, `x_dash`: gradient graph input
graph = ir.create_graph(module, x, out_features=16)  # Forward graph
call_info = ops.call_with_info(graph, x, inputs_dict={module.W: W, module.b: b})

grads_graph = popxl.transforms.autodiff(graph)
activations = grads_graph.inputs_dict(call_info)
grads_call_info = ops.call_with_info(
    grads_graph.graph, x_dash, inputs_dict=activations
)

# Obtain a mapping between the input tensors `x`, `W` and `b`
# and the corresponding gradient tensors
grad_tensor_map = grads_graph.fwd_parent_ins_to_grad_parent_outs(
    call_info, grads_call_info
)
assert [t.id for t in grad_tensor_map.keys()] == ["x", "W", "b"]
```
- Parameters
fwd_call_info (CallSiteInfo) – Call site information of the forward graph that was differentiated. This can be accessed with `popxl.ops.call_with_info()`.
grad_call_info (CallSiteInfo) – Call site information of the associated gradient graph. This can be accessed with `popxl.ops.call_with_info()`.
- Returns
A dictionary that maps from an input tensor in the parent of the forward graph to an output tensor in the parent of the gradient graph.
- Return type
Dict[Tensor, Tensor]
- property graph
Get the gradient graph.
- property inputs: Tuple[popxl.tensor.Tensor, ...]
Get a tuple of all expected inputs for the gradient graph.
- inputs_dict(fwd_call_info)
Create a mapping from gradient graph inputs to parent graph inputs.
The mapping between the inputs to the gradient graph and the inputs to the parent graph is created from the forward graph’s call site information.
Note
This provides an easy way to pass activations to the gradient graph. It does not handle the gradient inputs of the gradient graph.
Example:
```python
# `module`: subgraph module, `x`: parent graph inputs,
# `x_dash`: gradient graph parent input
graph = ir.create_graph(module, x, out_features=16)  # Forward graph
call_info = ops.call_with_info(graph, x, inputs_dict={module.W: W, module.b: b})

grads_graph = popxl.transforms.autodiff(graph)
activations = grads_graph.inputs_dict(call_info)
grads_call_info = ops.call_with_info(
    grads_graph.graph, x_dash, inputs_dict=activations
)
```
This method raises a `TypeError` exception if the forward graph does not match the graph that was differentiated.
- Parameters
fwd_call_info (CallSiteInfo) – Call site information of a call to the forward graph that was differentiated. This can be accessed with `popxl.ops.call_with_info()`.
- Raises
TypeError – If the called graph does not match the graph that was auto-differentiated.
- Returns
A dictionary that maps from a tensor in the gradient graph to an input or output parent graph tensor at a call site of the corresponding forward graph.
- Return type
Dict[Tensor, Tensor]
- property outputs: Tuple[popxl.tensor.Tensor, ...]
Get a tuple of all expected outputs of the gradient graph.
- popxl.transforms.autodiff(graph, grads_provided=None, grads_required=None, called_graphs_grad_info=None, return_all_grad_graphs=False)
Perform automatic differentiation of a graph.
`graph` will be differentiated using the chain rule starting from `grads_provided`. The outputs of the returned graph will be the gradients of the tensors in `grads_required`. By default, `grads_provided` will be all of the outputs of the forward graph and `grads_required` will be all of the inputs to the forward graph.
Any tensors in the forward graph that are needed to compute the gradients will be added as outputs of the forward graph (if not already an input or output).
The returned `popxl.GradGraphInfo` object contains information about the inputs (`expected_inputs`) and outputs (`expected_outputs`) of the gradient graph. These are lists of tuples where the first element is either `popxl.ExpectedConnectionType.Fwd` or `popxl.ExpectedConnectionType.FwdGrad`, meaning the input or output is associated with a tensor in the forward graph or with the gradient of a tensor in the forward graph, respectively. The second element is a tensor of the forward graph itself. These tensors are guaranteed to be either inputs or outputs of the forward graph.
The `expected_inputs` list that describes the gradient graph's inputs is guaranteed to start with `popxl.ExpectedConnectionType.FwdGrad` entries that exactly match the order of the entries in `grads_provided`. `expected_outputs` describes the gradient graph's outputs and is guaranteed to comprise only `popxl.ExpectedConnectionType.FwdGrad` entries, with entries that exactly match the size and order of `grads_required`.
Any subgraphs called from the forward graph will recursively have `autodiff` applied to them. `return_all_grad_graphs` can be set to `True` to return information on all graphs that the `autodiff` transform has been recursively applied to; `called_graphs_grad_info` can then be used to pass this previously calculated gradient graph information to `autodiff` for those subgraphs. By default, `popxl.GradGraphInfo` will only be returned for the input forward graph.
- Parameters
graph (Graph) – The graph to differentiate.
grads_provided (Optional[Iterable[popxl.Tensor]], optional) – The list of gradients that are available for `autodiff` to use. Defaults to all outputs of the graph.
grads_required (Optional[Iterable[popxl.Tensor]], optional) – The list of inputs of the forward graph for which gradients are required. Defaults to all inputs of the graph.
called_graphs_grad_info (Optional[Mapping[Graph, GradGraphInfo]], optional) – The gradient graph information for the subgraphs that the `autodiff` transform has been recursively applied to. Defaults to None.
return_all_grad_graphs (bool, optional) – Indicates whether to return the gradient graph information for all the subgraphs that the `autodiff` transform has been recursively applied to (`True`) or only for `graph` (`False`). Defaults to `False`.
- Returns
Information about the gradient graph.
- Return type
popxl.GradGraphInfo
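As a sketch of the ordering guarantees described above (assuming `graph` is an existing forward graph and that its input and output tensors are available via its `inputs` and `outputs` properties):
```python
import popxl

grads_info = popxl.transforms.autodiff(
    graph,
    grads_provided=graph.outputs,  # the defaults, spelled out
    grads_required=graph.inputs,
)

# expected_inputs starts with one FwdGrad entry per tensor in grads_provided,
# in the same order.
for ec, fwd_out in zip(grads_info.expected_inputs, graph.outputs):
    assert ec.connection_type == popxl.transforms.ExpectedConnectionType.FwdGrad
    assert ec.fwd_tensor == fwd_out

# expected_outputs comprises only FwdGrad entries, matching the size and
# order of grads_required.
for ec, fwd_in in zip(grads_info.expected_outputs, graph.inputs):
    assert ec.connection_type == popxl.transforms.ExpectedConnectionType.FwdGrad
    assert ec.fwd_tensor == fwd_in
```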
- popxl.transforms.decompose_sum(graph)
Transform the input `Graph` by decomposing Sum operations with more than two inputs into a liveness-optimal tree of additions.
- Parameters
graph (Graph) – The graph in which to decompose Sum operations.
- Returns
The graph with the Sum operations decomposed.
- Return type
popxl.Graph
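A minimal sketch of applying the transform to a gradient graph, assuming `grads_info` is the `GradGraphInfo` returned by `autodiff`:
```python
import popxl

# Decompose Sum ops with more than two inputs in the gradient graph into
# a tree of pairwise additions, reducing peak liveness.
decomposed_graph = popxl.transforms.decompose_sum(grads_info.graph)
```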
- popxl.transforms.io_tile_exchange(verify_overlap=True)
Combine `io_tiles`, `merge_exchange` and `in_sequence(False)`.
- Parameters
verify_overlap (bool, optional) – Verify only one Operation remains after the context closes. This is an important requirement for overlapping IO and Compute. Defaults to True.
- Raises
RuntimeError – If more than one Op remains after io_tile_exchange.
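A sketch of typical usage, assuming `remote_buffer`, `y` and the offsets are defined elsewhere (they are hypothetical here):
```python
import popxl
import popxl.ops as ops

with popxl.transforms.io_tile_exchange():
    # Exchanges opened in this context are placed on IO tiles and merged,
    # so a single operation remains that can overlap with compute.
    x = ops.remote_load(remote_buffer, 0)
    ops.remote_store(remote_buffer, 1, y)
```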
- popxl.transforms.merge_exchange()
Combine RemoteLoad/RemoteStore/HostLoad/HostStore operations into a single MergeExchange operation. This guarantees that any external synchronisation for these operations is merged, allowing the operations to execute in parallel.
Only applies to operations on the current graph. Used as a context manager:
```python
with popxl.merge_exchange():
    ops.host_load(...)
    ops.host_store(...)
```
Note: Operations must be able to be scheduled in any order to be merged. For this reason, it is recommended to combine this with `popxl.in_sequence(False)` to avoid topological constraints that would prevent merging. Related: `io_tile_exchange()`.
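For example, a sketch of combining the two contexts as recommended (the stream names `x_stream`, `out_stream` and the tensor `y` are hypothetical):
```python
import popxl
import popxl.ops as ops

with popxl.in_sequence(False), popxl.merge_exchange():
    # With no ordering constraints between them, these exchanges can be
    # merged into a single MergeExchange and executed in parallel.
    x = ops.host_load(x_stream)
    ops.host_store(out_stream, y)
```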