18.13. Transforms

class popxl.transforms.ExpectedConnection

A connection between a tensor in the forward graph and an input or output of the gradient graph.

GradGraphInfo contains the expected inputs and outputs for the gradient graph. The expected inputs and outputs are associated either with tensors in the forward graph or with gradients of tensors in the forward graph.

This class should not be constructed directly; doing so raises a TypeError exception.

Return type

None

__init__()

Construct the ExpectedConnection class.

Return type

None

property connection_type

Get the type of the connection (an ExpectedConnectionType).

property fwd_tensor: popxl.tensor.Tensor

Get the tensor this connection applies to.

class popxl.transforms.ExpectedConnectionType(value)

Expected connection type of gradient graph inputs and outputs.

The expected inputs and expected outputs of the gradient graph are associated either with tensors in the forward graph or with gradients of tensors in the forward graph.

Fwd = 'Fwd'

Expected input/output is a tensor of the forward graph.

FwdGrad = 'FwdGrad'

Expected input/output is the gradient of a tensor from the forward graph.
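
For illustration, a minimal sketch of inspecting these connections; `grad_info` is assumed here to be a GradGraphInfo returned by popxl.transforms.autodiff() (described below):

from popxl.transforms import ExpectedConnectionType

# `grad_info` is a hypothetical GradGraphInfo produced by autodiff.
for ec in grad_info.expected_inputs:
    if ec.connection_type == ExpectedConnectionType.FwdGrad:
        # This input carries the gradient of a forward graph tensor.
        print(f"gradient of {ec.fwd_tensor.id}")
    else:  # ExpectedConnectionType.Fwd
        # This input carries a forward graph tensor (an activation).
        print(f"activation {ec.fwd_tensor.id}")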

class popxl.transforms.GradGraphInfo

The result of the autodiff transform.

  • fwd_graph is the forward graph.

  • expected_inputs are tensors from the forward graph that are required as inputs to the gradient graph.

  • expected_outputs are tensors from the forward graph that have gradients as outputs of the gradient graph.

This class should not be constructed directly; doing so raises a TypeError exception.

Return type

None

__init__()

Construct the GradGraphInfo class.

Return type

None

property expected_inputs: Tuple[popxl.transforms.autodiff.ExpectedConnection, ...]

Get information about all expected inputs of the gradient graph.

Inputs are tensors or their gradients from the forward graph that are required as inputs to the gradient graph.

property expected_outputs: Tuple[popxl.transforms.autodiff.ExpectedConnection, ...]

Get information about all expected outputs of the gradient graph.

Outputs are tensors from the forward graph that have gradients as outputs of the gradient graph.

property forward_graph

Get the forward graph that was differentiated.

fwd_graph_ins_to_grad_parent_outs(grad_call_info)

Return mapping between forward graph inputs and outputs of the parent of the gradient graph.

autodiff is applied to the forward graph. This method returns the mapping between the input tensors of the forward graph and the output tensors of the parent of the gradient graph.

Example:

# `module`: subgraph module, `x`: parent graph input,
# `x_dash`: seed gradient (parent graph tensor), `main`: parent (main) graph
graph = ir.create_graph(module, x, out_features=16)  # Forward graph
call_info = ops.call_with_info(graph, x, inputs_dict={module.W: W, module.b: b})

grads_graph = popxl.transforms.autodiff(graph)
activations = grads_graph.inputs_dict(call_info)
grads_call_info = ops.call_with_info(grads_graph.graph, x_dash, inputs_dict=activations)

# Obtain a mapping between subgraph tensors that correspond to `x`, `W` and `b` and the corresponding parent gradient tensors outputs
grad_tensor_map = grads_graph.fwd_graph_ins_to_grad_parent_outs(grads_call_info)
assert all(t in graph for t in grad_tensor_map.keys())
assert all(t in main for t in grad_tensor_map.values())
assert [t.id for t in grad_tensor_map.keys()] == [
    "Module_subgraph(0)/x",
    "Module_subgraph(0)/W",
    "Module_subgraph(0)/b",
]
assert [t.id for t in grad_tensor_map.values()] == [
    "Gradient___x",
    "Gradient___W",
    "Gradient___b",
]
Parameters

grad_call_info (CallSiteInfo) – Call site information of the gradient graph. This can be accessed with popxl.ops.call_with_info().

Returns

A dictionary that maps from an input tensor in the forward graph to an output tensor in the parent of the gradient graph.

Return type

Dict[Tensor, Tensor]

fwd_parent_ins_to_grad_parent_outs(fwd_call_info, grad_call_info)

Return mapping between forward’s parent graph inputs and gradient’s parent graph outputs.

autodiff is applied to the forward graph. This method returns the mapping between the input tensors from the parent of the forward graph and the output tensors of the parent of the gradient graph.

Example:

# `module`: subgraph module, `x`: parent graph input,
# `x_dash`: seed gradient (parent graph tensor)
graph = ir.create_graph(module, x, out_features=16)  # Forward graph
call_info = ops.call_with_info(graph, x, inputs_dict={module.W: W, module.b: b})

grads_graph = popxl.transforms.autodiff(graph)
activations = grads_graph.inputs_dict(call_info)
grads_call_info = ops.call_with_info(grads_graph.graph, x_dash, inputs_dict=activations)

# Obtain a mapping between input tensors `x`, `W` and `b` and the corresponding gradient tensors
grad_tensor_map = grads_graph.fwd_parent_ins_to_grad_parent_outs(
    call_info, grads_call_info
)
assert [t.id for t in grad_tensor_map.keys()] == ["x", "W", "b"]
Parameters

  • fwd_call_info (CallSiteInfo) – Call site information of the forward graph. This can be accessed with popxl.ops.call_with_info().

  • grad_call_info (CallSiteInfo) – Call site information of the gradient graph. This can be accessed with popxl.ops.call_with_info().

Returns

A dictionary that maps from an input tensor in the parent of the forward graph to an output tensor in the parent of the gradient graph.

Return type

Dict[Tensor, Tensor]

property graph

Get the gradient graph.

property inputs: Tuple[popxl.tensor.Tensor, ...]

Get a tuple of all expected inputs for the gradient graph.

inputs_dict(fwd_call_info)

Create a mapping from gradient graph inputs to parent graph tensors.

The mapping is created from the forward graph's call site information: each activation required by the gradient graph is matched to the corresponding input or output tensor in the parent graph at that call site.

Note

This provides an easy way to pass activations to the gradient graph. It does not handle the gradient inputs of the gradient graph.

Example:

# `module`: subgraph module, `x`: parent graph input,
# `x_dash`: seed gradient (parent graph tensor)
graph = ir.create_graph(module, x, out_features=16)  # Forward graph
call_info = ops.call_with_info(graph, x, inputs_dict={module.W: W, module.b: b})

grads_graph = popxl.transforms.autodiff(graph)
activations = grads_graph.inputs_dict(call_info)
grads_call_info = ops.call_with_info(grads_graph.graph, x_dash, inputs_dict=activations)

This method raises a TypeError exception if the forward graph does not match the graph that was differentiated.

Parameters

fwd_call_info (CallSiteInfo) – Call site information of a call to the forward graph that was differentiated. This can be accessed with popxl.ops.call_with_info().

Raises

TypeError – If the called graph does not match the graph that was auto-differentiated.

Returns

A dictionary that maps from a tensor in the gradient graph to an input or output parent graph tensor at a call site of the corresponding forward graph.

Return type

Dict[Tensor, Tensor]

property outputs: Tuple[popxl.tensor.Tensor, ...]

Get a tuple of all expected outputs of the gradient graph.

popxl.transforms.autodiff(graph, grads_provided=None, grads_required=None, called_graphs_grad_info=None, return_all_grad_graphs=False)

Perform automatic differentiation of a graph.

graph will be differentiated using the chain rule starting from grads_provided. The outputs of the returned graph will be the gradients of the tensors in grads_required. By default, grads_provided will be all of the outputs of the forward graph and grads_required will be all of the inputs to the forward graph.

Any tensors in the forward graph that are needed to compute the gradients will be added as outputs to the forward graph (if not already an input or output).

The returned popxl.GradGraphInfo object contains information about the inputs (expected_inputs) and outputs (expected_outputs) of the gradient graph. Both are tuples of ExpectedConnection objects. Each connection's connection_type is either popxl.ExpectedConnectionType.Fwd or popxl.ExpectedConnectionType.FwdGrad, meaning that the input or output is associated with a tensor in the forward graph or with the gradient of a tensor in the forward graph, respectively. Each connection's fwd_tensor is the forward graph tensor itself; these tensors are guaranteed to be either inputs or outputs of the forward graph.

The expected_inputs list that describes the gradient graph’s inputs is guaranteed to start with popxl.ExpectedConnectionType.FwdGrad entries that exactly match the order of the entries in grads_provided.

expected_outputs describes the gradient graph's outputs and is guaranteed to comprise only popxl.ExpectedConnectionType.FwdGrad entries, with entries that exactly match the size and order of grads_required.

Any subgraphs called from the forward graph will recursively have autodiff applied to them. Set return_all_grad_graphs to True to return information about all the graphs that the autodiff transform has been recursively applied to; called_graphs_grad_info can then be used to pass this previously calculated gradient graph information to a later autodiff call, so those subgraphs are not differentiated again. By default, popxl.GradGraphInfo is returned only for the input forward graph.
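
For illustration, a minimal end-to-end sketch, assuming a hypothetical Linear popxl.Module with graph inputs W and b (mirroring the examples above):

import numpy as np
import popxl
import popxl.ops as ops

class Linear(popxl.Module):
    def __init__(self):
        self.W: popxl.Tensor = None
        self.b: popxl.Tensor = None

    def build(self, x, out_features):
        self.W = popxl.graph_input((x.shape[-1], out_features), popxl.float32, "W")
        self.b = popxl.graph_input((out_features,), popxl.float32, "b")
        return x @ self.W + self.b

ir = popxl.Ir()
with ir.main_graph:
    x = popxl.variable(np.ones((2, 4), np.float32), name="x")
    W = popxl.variable(np.ones((4, 16), np.float32), name="W")
    b = popxl.variable(np.zeros(16, np.float32), name="b")

    module = Linear()
    graph = ir.create_graph(module, x, out_features=16)  # Forward graph
    call_info = ops.call_with_info(graph, x, inputs_dict={module.W: W, module.b: b})

    grad_info = popxl.transforms.autodiff(graph)

    # expected_inputs starts with the FwdGrad entries for grads_provided
    # (by default, the gradients of all forward graph outputs).
    assert (grad_info.expected_inputs[0].connection_type
            == popxl.transforms.ExpectedConnectionType.FwdGrad)

    # Call the gradient graph: the seed gradient first, then the activations.
    seed = popxl.constant(np.ones((2, 16), np.float32))
    activations = grad_info.inputs_dict(call_info)
    dx, dW, db = ops.call(grad_info.graph, seed, inputs_dict=activations)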

Parameters
  • graph (Graph) – The graph to differentiate.

  • grads_provided (Optional[Iterable[popxl.Tensor]], optional) – The list of outputs of the forward graph for which gradients are provided, that is, available for autodiff to use. Defaults to all outputs of the graph.

  • grads_required (Optional[Iterable[popxl.Tensor]], optional) – The list of inputs of the forward graph for which gradients are required. Defaults to all inputs of the graph.

  • called_graphs_grad_info (Optional[Mapping[Graph,GradGraphInfo]], optional) – The gradient graph information for the subgraphs that the autodiff transform has been recursively applied to. Defaults to None.

  • return_all_grad_graphs (bool, optional) – Indicates whether to return the gradient graph information for all the subgraphs that the autodiff transform has been recursively applied to (True) or to only return the gradient graph information for graph (False). Defaults to False.

Returns

Information about the gradient graph of graph or, if return_all_grad_graphs is True, information about all the graphs that the autodiff transform has been recursively applied to.

Return type

popxl.GradGraphInfo, or a mapping from Graph to popxl.GradGraphInfo if return_all_grad_graphs is True

popxl.transforms.decompose_sum(graph)

Transform the input Graph by decomposing Sum operations with more than two inputs into a liveness-optimal tree of additions.

Parameters

graph (Graph) – The graph in which to decompose Sum operations.

Returns

The transformed graph, with Sum operations decomposed.

Return type

decomposed_graph (popxl.Graph)
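
For illustration, a brief sketch; `grad_info` is assumed here to be a GradGraphInfo produced by autodiff() (see above). Gradient graphs commonly contain Sum operations with many inputs when a forward tensor has many consumers:

# Decompose any Sum operations with more than two inputs in the gradient
# graph into a tree of additions that is optimal with respect to liveness.
decomposed = popxl.transforms.decompose_sum(grad_info.graph)
# `decomposed` can then be called in place of the original gradient graph.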

popxl.transforms.io_tile_exchange(verify_overlap=True)

Combine the io_tiles, merge_exchange and in_sequence(False) contexts. Used as a context manager (see the sketch below).

Parameters

verify_overlap (bool, optional) – Verify that only one operation remains after the context closes. This is an important requirement for overlapping IO and compute. Defaults to True.

Raises

RuntimeError – If more than one Op remains after io_tile_exchange.
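
A minimal usage sketch as a context manager; the stream shapes and names are illustrative, and the IR is assumed to be configured with IO tiles:

import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    x_h2d = popxl.h2d_stream((2,), popxl.float32, name="x_stream")
    y_d2h = popxl.d2h_stream((2,), popxl.float32, name="y_stream")

    # The load and store are placed on IO tiles and merged into a single
    # exchange, so that they can overlap with compute.
    with popxl.transforms.io_tile_exchange():
        x = ops.host_load(x_h2d, "x")
        ops.host_store(y_d2h, x)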

popxl.transforms.merge_exchange()

Combine RemoteLoad, RemoteStore, HostLoad and HostStore operations into a single MergeExchange operation. This guarantees that any external synchronisation for these operations is merged, allowing the operations to execute in parallel.

Only applies to operations on the current graph. Use as a context manager:

with popxl.merge_exchange():
    ops.host_load(...)
    ops.host_store(...)

Note: Operations must be able to be scheduled in any order to be merged. For this reason it is recommended to combine merge_exchange with popxl.in_sequence(False), to avoid topological constraints that would prevent merging, as in the sketch below. Related: io_tile_exchange().
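
For instance, a sketch of the recommended combination; the streams `x_h2d` and `y_d2h` are assumed to have been created with popxl.h2d_stream() and popxl.d2h_stream():

# Lift ordering constraints so the exchange operations can be merged.
with popxl.in_sequence(False):
    with popxl.merge_exchange():
        x = ops.host_load(x_h2d, "x")
        ops.host_store(y_d2h, x)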