18.15. Ops available in PopXL
- class popxl.ops.CallSiteInfo(subgraph_op)
Information relating to a parent graph calling a subgraph, for example using a call op or repeat op.
This is a convenience class for extracting information about the callsite and its subgraph.
- Parameters
subgraph_op (Union[CallOp, LoopOp]) – The call or loop op that invokes the subgraph.
- property called_graph: popxl.graph.Graph
Get the called graph.
- graph_to_parent(graph_tensor)
Get the tensor in the parent graph using the tensor in the called graph.
Both input and output tensors can be used
- Parameters
graph_tensor (Tensor) – The tensor in the called graph.
- Raises
ValueError – If graph_tensor is not an input or output of the called graph.
- Returns
The associated input or output tensor on the CallOp.
- Return type
Tensor
- graph_to_parent_input_index(idx)
Get the parent graph input tensor index given the graph input tensor index.
- graph_to_parent_output_index(idx)
Get the parent graph output tensor index given the graph output tensor index.
- property inputs: Tuple[popxl.tensor.Tensor, ...]
Get the parent graph inputs.
- Returns
Tuple[Tensor, …]
- property outputs: Tuple[popxl.tensor.Tensor, ...]
Get the parent graph outputs.
- Returns
Tuple[Tensor, …]
- parent_input(idx)
Get the parent graph input tensor at a given index.
- parent_output(idx)
Get the parent graph output tensor at a given index.
- parent_to_graph(parent_tensor)
Get the input tensor in the called graph using the input tensor in the parent graph.
If the parent_tensor has been used multiple times as an input, only the first instance is returned.
- Parameters
parent_tensor (Tensor) – The tensor from the parent graph.
- Raises
ValueError – If parent_tensor is not an input to the CallOp.
- Returns
The tensor in the called_graph.
- Return type
Tensor
- parent_to_graph_input_index(idx)
Get the graph input tensor index given the parent graph input tensor index.
- parent_to_graph_output_index(idx)
Get the graph output tensor index given the parent graph output tensor index.
- set_parent_input_modified(parent_tensor, infer_modified_regions=False)
Specify that the parent graph's input tensor parent_tensor is modified by the call op. This will guarantee that any modification to the graph input during the execution of the called graph will also change parent_tensor.
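For illustration, here is a minimal sketch of retrieving a CallSiteInfo via call_with_info and mapping tensors between the called graph and the parent graph. It assumes the usual PopXL workflow (popxl.Ir, ir.main_graph, ir.create_graph) and that Graph.inputs exposes the graph's input tensors in signature order.
import popxl
import popxl.ops as ops

ir = popxl.Ir()

def increment(x: popxl.Tensor) -> popxl.Tensor:
    return x + 1

with ir.main_graph:
    x = popxl.variable(1.0, popxl.float32, name="x")
    g = ir.create_graph(increment, x)             # graph inputs are taken from the signature
    info = ops.call_with_info(g, x)               # CallSiteInfo for this call site
    y = info.outputs[0]                           # output of the call in the parent graph
    parent_x = info.graph_to_parent(g.inputs[0])  # called-graph input -> parent tensor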
- popxl.ops.abs(t)
Compute the absolute value of each element of the input tensor.
See also PyTorch Tensor.abs.
- popxl.ops.add(lhs, rhs)
Add two tensors elementwise.
Follows NumPy broadcasting rules. Arguments must have the same dtype.
See also PyTorch Tensor.add, NumPy add, ONNX Add.
- popxl.ops.add_(lhs, rhs)
Add two tensors elementwise in place, in the lhs tensor. Follows NumPy broadcasting rules. Arguments must have the same dtype.
Note: There is no operation that adds to the rhs tensor in place. Use add_(rhs, lhs) or rhs += lhs for the same functionality.
See also PyTorch Tensor.add_.
- popxl.ops.argmax(t, dim=0, keepdim=False)
Compute the argmax of a tensor.
Compute the indices of the maximum elements of the input tensor along the provided axis. The resulting tensor has the same rank as the input if keepdim is True. If keepdim is False, then the resulting tensor has the reduced dimension pruned.
See also PyTorch Tensor.argmax, NumPy argmax, ONNX ArgMax.
- popxl.ops.argmin(t, dim=0, keepdim=False)
Compute the argmin of a tensor.
Compute the indices of the minimum elements of the input tensor along the provided axis. The resulting tensor has the same rank as the input if keepdim is True. If keepdim is False, then the resulting tensor has the reduced dimension pruned.
See also PyTorch Tensor.argmin, NumPy argmin, ONNX ArgMin.
- popxl.ops.average_pool(t, kernel_size, stride=None, padding=None, out_pads=None, dilation=None, in_dilations=None, auto_pad='not_set', ceil_mode=False)
Average pool a tensor.
average_pool consumes an input tensor t and applies average pooling across the tensor according to kernel sizes, stride sizes, and pad lengths. Average pooling consists of computing the average of all values in a subset of the input tensor according to the kernel size and downsampling the data into the output tensor Y for further processing.
- Parameters
t (Tensor) –
Input data tensor from previous layer.
If the input is a 3D tensor, the size is (N, C, L), where:
N is the batch size,
C is the number of channels,
L is the length.
If the input is a 2D image, the size is (N, C, H, W), where:
N is the batch size,
C is the number of channels,
H and W are the height and width.
If the input is a 3D image, the size is (N, C, D, H, W), where:
N is the batch size,
C is the number of channels, D is the depth,
H and W are the height and width.
kernel_size (Tuple[int]) – The size of the kernel along each axis.
stride (Tuple[int]) – Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
padding (Tuple[int]) – Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. The values represent the number of pixels added to the beginning and end part of the corresponding axis. The padding format should be as follows: [x1_begin, x2_begin, ..., x1_end, x2_end, ...], where xi_begin is the number of pixels added at the beginning of axis i and xi_end is the number of pixels added at the end of axis i.
out_pads (Tuple[int]) – The output padding for pooling.
dilation (Tuple[int]) – dilation value along each spatial axis of the filter.
in_dilations (Tuple[int]) – The input dilations attributes along each spatial axis of the filter.
auto_pad (Literal) – auto_pad must be one of “not_set”, “same_upper”, “same_lower” or “valid”. The default value is “not_set”, which means explicit padding is used. “same_upper” or “same_lower” mean pad the input so that output_shape[i] = ceil(input_shape[i] / strides[i]) for each axis i. The padding is split between the two sides equally or almost equally (depending on whether it is even or odd). If the padding is an odd number, the extra padding is added at the end for “same_upper” and at the beginning for “same_lower”.
ceil_mode (bool) – When True, will use ceil instead of floor to compute the output shape.
- Returns
Output data tensor from average pooling across the input tensor. Dimensions will vary based on various kernel, stride, and pad sizes. Floor value of the dimension is used.
- Return type
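As an illustrative sketch of average_pool on an (N, C, H, W) input (the shapes and the Ir/variable setup are example choices, not requirements):
import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    t = popxl.variable(np.ones((1, 3, 8, 8), dtype=np.float32))   # (N, C, H, W)
    # a 2x2 kernel with stride 2 halves each spatial dimension: output shape (1, 3, 4, 4)
    y = ops.average_pool(t, kernel_size=(2, 2), stride=(2, 2))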
- popxl.ops.batch_norm_inference(t, scale, bias, mean, var, epsilon=1e-05, momentum=0.9)
Apply batch normalisation to a tensor in an inference setting.
For more details, refer to the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
- Parameters
t (Tensor) – Tensor to be normalized.
scale (Tensor) – Tensor used to scale the result of normalisation.
bias (Tensor) – Tensor used to shift the result of normalisation.
mean (Tensor) – Mean estimate.
var (Tensor) – Variance estimate.
epsilon (float) – small quantity for avoidance of div-by-zero when variance is zero.
momentum (float) – coefficient for the exponential moving average (not used in inference).
- Returns
The batch normalised tensor.
- Return type
- popxl.ops.call(graph, *inputs, inputs_dict=None)
Call a graph.
The inputs and inputs_dict tensors are passed as graph inputs. You can specify an input either positionally using inputs or via a tensor map using inputs_dict.
Graph inputs are determined when the graph is created using ir.create_graph(callable, ...). The order of inputs will be the same as the order of the tensor inputs in the function signature and the order of called popxl.graph_inputs.
See create_graph() for more information.
- Parameters
graph (Graph) – The graph to call.
*inputs (Union[TensorLike, Iterable[TensorLike]]) – Provide inputs via position.
inputs_dict (Optional[Mapping[Tensor, TensorLike]]) – Provide inputs via graph tensor. Mapping of graph tensor -> parent tensor.
inputs (Union[Tensor, int, float, bool, ndarray, Iterable[Union[int, float, bool]], Iterable[Union[Tensor, int, float, bool, ndarray, Iterable[Union[int, float, bool]]]]]) –
- Returns
Tuple of the output tensors of the call in the parent graph.
- Return type
Tuple[Tensor, …]
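A minimal sketch of creating a subgraph and calling it, both positionally and via inputs_dict. The tensors passed to ir.create_graph only provide the shape/dtype specification for the graph inputs; Graph.inputs is assumed to expose those inputs in signature order.
import popxl
import popxl.ops as ops

ir = popxl.Ir()

def scale_and_shift(x: popxl.Tensor, b: popxl.Tensor) -> popxl.Tensor:
    return x * 2 + b

with ir.main_graph:
    x = popxl.variable(3.0, popxl.float32, name="x")
    b = popxl.variable(1.0, popxl.float32, name="b")
    g = ir.create_graph(scale_and_shift, x, b)
    y, = ops.call(g, x, b)                                            # inputs by position
    y2, = ops.call(g, inputs_dict={g.inputs[0]: x, g.inputs[1]: b})   # inputs by tensor map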
- popxl.ops.call_with_info(graph, *inputs, inputs_dict=None, check_inputs=True)
Call a graph and return information about the call site.
The inputs and inputs_dict tensors are passed as graph inputs. You can specify an input either positionally using inputs or via a tensor map using inputs_dict. This op returns CallSiteInfo, which can be used to inspect call site inputs and outputs.
Graph inputs are determined when the graph is created using ir.create_graph(callable, ...). The order of inputs will be the same as the order of the tensor inputs in the function signature and the order of called popxl.graph_inputs.
See create_graph() for more information.
- Parameters
graph (Graph) – The graph to call.
*inputs (Union[TensorLike, Iterable[TensorLike]]) – Provide inputs via position.
inputs_dict (Optional[Mapping[Tensor, TensorLike]]) – Provide inputs via graph tensor. Mapping of graph tensor -> parent tensor.
check_inputs (bool) – Check when called if all inputs have been provided. Defaults to True.
inputs (Union[Tensor, int, float, bool, ndarray, Iterable[Union[int, float, bool]], Iterable[Union[Tensor, int, float, bool, ndarray, Iterable[Union[int, float, bool]]]]]) –
- Raises
ValueError – A ValueError will be raised if: an incorrect number of inputs have been provided; a parent input tensor is not in the parent graph; or a graph input tensor is specified twice.
TypeError – A TypeError will be raised if: a graph input tensor is specified twice, or a graph input cannot be coerced into a tensor.
- Returns
Information on the created call site.
- Return type
CallSiteInfo
- popxl.ops.cast(t, data_type)
Cast a tensor to a specific data type.
This operation casts tensor t to data type data_type.
See also ONNX Cast.
- Parameters
t (Tensor) – The tensor to be cast.
data_type (popxl.dtypes.dtype) – The dtype to cast to.
- Returns
The tensor cast to the specified type.
- Return type
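For example, a short sketch of casting a float32 tensor to float16 (the surrounding Ir setup follows the standard PopXL pattern):
import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    x = popxl.variable(np.arange(4, dtype=np.float32))
    x16 = ops.cast(x, popxl.float16)   # same shape as x, dtype popxl.float16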
- popxl.ops.cast_then_pow2scale(t, log2_scale, data_type)
Add a fused operation cast(X, dtype) * pow2(log2_scale) to cast from a floating point 8 type.
See the PopXL documentation on floating point 8 types for more details.
- Parameters
- Raises
TypeError – If data_type is not of type float16 or float32.
- Returns
The converted float16 or float32 tensor.
- Return type
- popxl.ops.concat(ts, axis=0)
Concatenate tensors along an axis. The result will be copied to a new tensor.
See also ONNX Concat.
- popxl.ops.concat_(ts, axis=0)
Concatenate tensors along an axis.
The result will alias both of the input tensors.
- popxl.ops.conditional(cond, then_branch, else_branch, then_inputs=None, else_inputs=None, then_inputs_dict=None, else_inputs_dict=None)
Execute then_branch or else_branch according to the value of tensor cond at runtime.
The then/else_inputs and then/else_inputs_dict tensors are passed as then_branch/else_branch inputs. You can specify a then/else input either positionally using then/else_inputs or via a tensor map using then/else_inputs_dict.
Graph inputs are determined when the graph is created using ir.create_graph(callable, ...). The order of inputs will be the same as the order of the tensor inputs in the function signature and the order of called popxl.graph_inputs.
See create_graph() for more information.
- Parameters
cond (Tensor) – A boolean single-value tensor. If true the then_branch is executed otherwise the else_branch is executed.
then_branch (Graph) – Graph to run if condition is true.
else_branch (Graph) – Graph to run if condition is false.
then_inputs (Optional[Iterable[Union[Tensor, Iterable[Tensor]]]]) – Provide inputs to then_branch via position. then_inputs follow the same rules as inputs in the call and repeat ops.
else_inputs (Optional[Iterable[Union[Tensor, Iterable[Tensor]]]]) – Provide inputs to else_branch via position. else_inputs follow the same rules as inputs in the call and repeat ops.
then_inputs_dict (Optional[Mapping[Tensor, Tensor]]) – Provide inputs to then_branch via a tensor map. Mapping of graph tensor -> parent tensor. then_inputs_dict follows the same rules as inputs_dict in the call and repeat ops.
else_inputs_dict (Optional[Mapping[Tensor, Tensor]]) – Provide inputs to else_branch via a tensor map. else_inputs_dict follows the same rules as inputs_dict in the call and repeat ops.
- Raises
ValueError – If an incorrect number of inputs have been provided, a parent input tensor is not in the parent graph, or a graph input tensor is specified twice.
TypeError – If a graph input tensor is specified twice, or a graph input cannot be coerced into a tensor.
- Returns
The values that are live after the execution of the conditional. The return values in then_branch and else_branch must be of the same data type. The number of outputs in then_branch and else_branch must be equal. The shapes of the inputs and outputs in then_branch and else_branch must also be the same.
- Return type
Tuple[Tensor, …]
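A minimal sketch of a runtime branch with conditional; the two branch graphs are built with ir.create_graph and must produce outputs with matching shapes and dtypes, as noted above:
import popxl
import popxl.ops as ops

ir = popxl.Ir()

def then_fn(x: popxl.Tensor) -> popxl.Tensor:
    return x + 1

def else_fn(x: popxl.Tensor) -> popxl.Tensor:
    return x - 1

with ir.main_graph:
    x = popxl.variable(5.0, popxl.float32)
    cond = popxl.variable(True, popxl.bool)
    then_g = ir.create_graph(then_fn, x)
    else_g = ir.create_graph(else_fn, x)
    # one output, selected at runtime according to cond
    y, = ops.conditional(cond, then_g, else_g, then_inputs=[x], else_inputs=[x])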
- popxl.ops.conditional_with_info(cond, then_branch, else_branch, then_inputs=None, else_inputs=None, then_inputs_dict=None, else_inputs_dict=None, check_inputs=True)
Execute then_branch or else_branch according to the value of tensor cond at runtime, and return the call site info.
The then/else_inputs and then/else_inputs_dict tensors are passed as then_branch/else_branch inputs. You can specify a then/else input either positionally using then/else_inputs or via a tensor map using then/else_inputs_dict.
Graph inputs are determined when the graph is created using ir.create_graph(callable, ...). The order of inputs will be the same as the order of the tensor inputs in the function signature and the order of called popxl.graph_inputs.
See create_graph() for more information.
- Parameters
cond (Tensor) – A boolean single-value tensor. If true the then_branch is executed otherwise the else_branch is executed.
then_branch (Graph) – Graph to run if condition is true.
else_branch (Graph) – Graph to run if condition is false.
then_inputs (Optional[Iterable[Union[Tensor, Iterable[Tensor]]]]) – Provide inputs to then_branch via position. then_inputs follow the same rules as inputs in the call and repeat ops.
else_inputs (Optional[Iterable[Union[Tensor, Iterable[Tensor]]]]) – Provide inputs to else_branch via position. else_inputs follow the same rules as inputs in the call and repeat ops.
then_inputs_dict (Optional[Mapping[Tensor, Tensor]]) – Provide inputs to then_branch via a tensor map. Mapping of graph tensor -> parent tensor. then_inputs_dict follows the same rules as inputs_dict in the call and repeat ops.
else_inputs_dict (Optional[Mapping[Tensor, Tensor]]) – Provide inputs to else_branch via a tensor map. else_inputs_dict follows the same rules as inputs_dict in the call and repeat ops.
check_inputs (bool) – Check when called if all inputs have been provided to both graphs. Defaults to True.
- Raises
ValueError – If an incorrect number of inputs have been provided, a parent input tensor is not in the parent graph, or a graph input tensor is specified twice.
TypeError – If a graph input tensor is specified twice, or a graph input cannot be coerced into a tensor.
- Returns
Information on the created conditional site.
- Return type
ConditionalSiteInfo
- popxl.ops.conv(t, weight, stride=(1, 1), padding=(0, 0, 0, 0), dilation=(1, 1), groups=1, pad_type='not_set', available_memory_proportions=None, partials_types=None, enable_conv_dithering=None)
Use the convolution operator on a tensor.
The convolution operator consumes an input tensor and a filter, and computes the output.
- Parameters
t (Tensor) – Input data tensor from previous layer. If the input is a 3D tensor, the size is (N, C, L), where N is the batch size, C is the number of channels, L is the length; if the input is a 2D image, the size is (N, C, H, W), where N is the batch size, C is the number of channels, H and W are the height and width; if the input is a 3D image, the size is (N, C, D, H, W), where N is the batch size, C is the number of channels, D is the depth, H and W are the height and width.
weight (Tensor) – The weight tensor that will be used in the convolutions; If the input is a 3D tensor, the weight size is (M, C/group, k), where C is the number of channels, k is the length of the kernel, M is the number of feature maps. If the input is a 2D image, the weight size is (M, C/group, kH, kW), where C is the number of channels, kH and kW are the height and width of the kernel, M is the number of feature maps. If the input is a 3D image, the weight size is (M, C/group, kD, kH, kW), where C is the number of channels, kD, kH and kW are the depth, height and width of the kernel, M is the number of feature maps.
stride (Tuple[int]) – Stride along each spatial axis.
padding (Tuple[int]) – Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. The values represent the number of pixels added to the beginning and end part of the corresponding axis. The pads format should be as follows: [x1_begin, x2_begin, ..., x1_end, x2_end, ...], where xi_begin is the number of pixels added at the beginning of axis i and xi_end is the number of pixels added at the end of axis i.
dilation (Tuple[int]) – Dilation value along each spatial axis of the filter.
groups (int (default is 1)) – Number of groups that input channels and output channels are divided into.
pad_type (PadType (default is not_set)) – pad_type must be one of “not_set”, “same_upper”, “same_lower” or “valid”. The default value is “not_set”, which means explicit padding is used. “same_upper” or “same_lower” mean pad the input so that output_shape[i] = ceil(input_shape[i] / strides[i]) for each axis i. The padding is split between the two sides equally or almost equally (depending on whether it is even or odd). If the padding is an odd number, the extra padding is added at the end for “same_upper” and at the beginning for “same_lower”.
available_memory_proportions (List[float]) – The available memory proportions per conv, each [0, 1).
partials_types (List[str]) – The partials type per convolution, choose between half and float.
enable_conv_dithering (List[int]) – Enable convolution dithering per convolution. If true, then convolutions with different parameters will be laid out from different tiles in an effort to improve tile balance in models.
- Returns
A tensor that contains the result of the convolution. The output dimensions are functions of the kernel size, stride size, and pad lengths.
- Return type
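An illustrative sketch of a 2D convolution; the shapes follow the (N, C, H, W) and (M, C/group, kH, kW) conventions described above and are otherwise arbitrary:
import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    t = popxl.variable(np.random.rand(1, 4, 8, 8).astype(np.float32))   # (N, C, H, W)
    w = popxl.variable(np.random.rand(8, 4, 3, 3).astype(np.float32))   # (M, C, kH, kW)
    # padding of 1 on every side with a 3x3 kernel keeps the spatial size: output (1, 8, 8, 8)
    y = ops.conv(t, w, stride=(1, 1), padding=(1, 1, 1, 1))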
- popxl.ops.conv_pow2scaled(t, weight, log2_scale, stride=(1, 1), padding=(0, 0, 0, 0), dilation=(1, 1), groups=1, pad_type='not_set', available_memory_proportions=None, enable_conv_dithering=None)
Perform a scaled convolution on a float8 tensor.
The convolution operator consumes an input tensor and a filter, and computes the output. The dtype of the input tensor and filter must be one of popxl.float8_143 or popxl.float8_152.
The result of the convolution is scaled by pow2(log2_scale) before it is converted to float16.
The log2_scale must be a scalar tensor of type popxl.int32 and contain a runtime value in the range [-32, 32).
- Parameters
t (Tensor) – Input data tensor from previous layer, of type either popxl.float8_143 or popxl.float8_152. If the input is a 3D tensor, the size is (N, C, L), where N is the batch size, C is the number of channels, L is the length; if the input is a 2D image, the size is (N, C, H, W), where N is the batch size, C is the number of channels, H and W are the height and width; if the input is a 3D image, the size is (N, C, D, H, W), where N is the batch size, C is the number of channels, D is the depth, H and W are the height and width.
weight (Tensor) – The weight tensor that will be used in the convolutions, of type either popxl.float8_143 or popxl.float8_152. If the input is a 3D tensor, the weight size is (M, C/group, k), where C is the number of channels, k is the length of the kernel, M is the number of feature maps. If the input is a 2D image, the weight size is (M, C/group, kH, kW), where C is the number of channels, kH and kW are the height and width of the kernel, M is the number of feature maps. If the input is a 3D image, the weight size is (M, C/group, kD, kH, kW), where C is the number of channels, kD, kH and kW are the depth, height and width of the kernel, M is the number of feature maps.
log2_scale (Tensor) – 32-bit integer power-of-two exponent, where the convolution output is multiplied by pow2(log2_scale) before conversion to float16.
stride (Tuple[int]) – Stride along each spatial axis.
padding (Tuple[int]) – Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. The values represent the number of pixels added to the beginning and end part of the corresponding axis. The pads format should be as follows: [x1_begin, x2_begin, ..., x1_end, x2_end, ...], where xi_begin is the number of pixels added at the beginning of axis i and xi_end is the number of pixels added at the end of axis i.
dilation (Tuple[int]) – Dilation value along each spatial axis of the filter.
groups (int (default is 1)) – Number of groups that input channels and output channels are divided into.
pad_type (PadType (default is not_set)) – pad_type must be one of “not_set”, “same_upper”, “same_lower” or “valid”. The default value is “not_set”, which means explicit padding is used. “same_upper” or “same_lower” mean pad the input so that output_shape[i] = ceil(input_shape[i] / strides[i]) for each axis i. The padding is split between the two sides equally or almost equally (depending on whether it is even or odd). If the padding is an odd number, the extra padding is added at the end for “same_upper” and at the beginning for “same_lower”.
available_memory_proportions (List[float]) – The available memory proportions per conv, each [0, 1).
enable_conv_dithering (List[int]) – Enable convolution dithering per convolution. If true, then convolutions with different parameters will be laid out from different tiles in an effort to improve tile balance in models.
- Returns
A tensor that contains the result of the convolution, of type popxl.float16. The output dimensions are functions of the kernel size, stride size, and pad lengths.
- Return type
- Raises
TypeError – If the input or weight tensors do not have a dtype in {popxl.float8_143, popxl.float8_152}, or if the log2_scale tensor does not have dtype popxl.int32.
ValueError – If log2_scale is not a scalar tensor.
- popxl.ops.conv_transpose(t, weight, stride=(1, 1), padding=(0, 0, 0, 0), dilation=(1, 1), groups=1, pad_type='not_set', output_padding=(), output_shape=(), available_memory_proportions=None, partials_types=None, enable_conv_dithering=None)
Perform a convolution transpose operation on a tensor.
The convolution transpose operator consumes an input tensor and a filter, and computes the output.
If the padding parameter is provided, the shape of the output is auto-generated. output_shape can also be explicitly specified, in which case padding values are auto-generated. See attribute descriptions for more details.
See also PyTorch ConvTranspose2d, ONNX ConvTranspose.
- t
Input data tensor from a previous layer. If the input is a 3D tensor, the size is (N, C, L), where N is the batch size, C is the number of channels, L is the length; If the input is a 2D image, the size is (N, C, H, W), where N is the batch size, C is the number of channels, H and W are the height and width; If the input is a 3D image, the size is (N, C, D, H, W), where N is the batch size, C is the number of channels, D is the depth, H and W are the height and width.
- Type
- weight
The weight tensor that will be used in the convolutions. If the input is a 3D tensor, the weight size is (M, C/group, k), where C is the number of channels, k is the length of the kernel, M is the number of feature maps. If the input is a 2D image, the weight size is (M, C/group, kH, kW), where C is the number of channels, kH and kW are the height and width of the kernel, M is the number of feature maps. If the input is a 3D image, the weight size is (M, C/group, kD, kH, kW), where C is the number of channels, kD, kH and kW are the depth, height and width of the kernel, M is the number of feature maps.
- Type
- padding
Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. The values represent the number of pixels added to the beginning and end part of the corresponding axis. The pads format should be [x1_begin, x2_begin, ..., x1_end, x2_end, ...], where xi_begin is the number of pixels added at the beginning of axis i and xi_end is the number of pixels added at the end of axis i. If the pads parameter is provided, the shape of the output is auto-generated. See ONNX ConvTranspose for details.
- Type
Tuple[int]
- groups
Number of groups input channels and output channels are divided into.
- Type
int(default is 1)
- pad_type
The pad_type must be one of “not_set”, “same_upper”, “same_lower” or “valid”. The default value is “not_set”, which means explicit padding is used. “same_upper” or “same_lower” mean pad the input such that output_shape[i] = ceil(input_shape[i] / strides[i]) for each axis i. The padding is split between the two sides equally or almost equally (depending on whether it is even or odd). If the padding is an odd number, the extra padding is added at the end for “same_upper” and at the beginning for “same_lower”.
- Type
PadType(default is not_set)
- output_padding
Additional elements added to the side with higher coordinate indices in the output. Each padding value in output_padding must be strictly less than the corresponding stride/dilation dimension. Note that this attribute doesn't directly affect the computed output values. It only controls the selection of the computed values, so changing this attribute only adds or removes output elements. If output_shape is explicitly provided, output_padding does not contribute additional size to output_shape but participates in the computation of the needed padding amount.
- Type
Tuple[int]
- output_shape
The shape of the output can be explicitly set which will cause padding values to be auto generated. If output_shape is specified pads values are ignored. See ONNX Conv Transpose for details on how padding is generated.
- Type
Tuple[int]
- available_memory_proportions
The available memory proportions per conv, each [0, 1).
- partials_types (List[str]):
The partials type per convolution, choose between half and float.
- Type
List[float]
- enable_conv_dithering
Enable convolution dithering per convolution. If true, then convolutions with different parameters will be laid out from different tiles in an effort to improve tile balance in models.
- Type
List[int]
- Returns
Output data tensor that contains the result of the convolution. The output dimensions are functions of the kernel size, stride size, pad lengths and group count.
- Return type
Tensor
- popxl.ops.conv_transpose_pow2scaled(t, weight, log2_scale, stride=(1, 1), padding=(0, 0, 0, 0), dilation=(1, 1), groups=1, pad_type='not_set', output_padding=(), output_shape=(), available_memory_proportions=None, enable_conv_dithering=None)
Perform a single transposed and scaled convolution operation on a tensor.
This operator consumes an input, weight, and log2 scale tensor to compute a transposed convolution, then scales the convolution output by pow2(log2_scale) before converting to float16.
The dtype of the input t and weight tensor must be one of popxl.float8_143 or popxl.float8_152. The log2_scale must be a scalar tensor of type popxl.int32 and contain a runtime value in the range [-32, 32).
If the padding parameter is provided, the shape of the output is auto-generated. output_shape can also be explicitly specified, in which case padding values are auto-generated. See attribute descriptions for more details.
See also PyTorch ConvTranspose2d, ONNX ConvTranspose.
- t
Input data tensor from previous layer, of type either popxl.float8_143 or popxl.float8_152. If the input is a 3D tensor, the size is (N, C, L), where N is the batch size, C is the number of channels, L is the length; if the input is a 2D image, the size is (N, C, H, W), where N is the batch size, C is the number of channels, H and W are the height and width; if the input is a 3D image, the size is (N, C, D, H, W), where N is the batch size, C is the number of channels, D is the depth, H and W are the height and width.
- Type
- weight
The weight tensor that will be used as a kernel in the convolution, of dtype either popxl.float8_143 or popxl.float8_152. If the input is a 3D tensor, the weight size is (M, C/group, k), where C is the number of channels, k is the length of the kernel, M is the number of feature maps. If the input is a 2D image, the weight size is (M, C/group, kH, kW), where C is the number of channels, kH and kW are the height and width of the kernel, M is the number of feature maps. If the input is a 3D image, the weight size is (M, C/group, kD, kH, kW), where C is the number of channels, kD, kH and kW are the depth, height and width of the kernel, M is the number of feature maps.
- Type
- log2_scale
32-bit integer power-of-two exponent, where the convolution output is multiplied by pow2(log2_scale) before conversion to float16. Must be of dtype popxl.int32.
- Type
- padding
Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. The values represent the number of pixels added to the beginning and end part of the corresponding axis. The pads format should be [x1_begin, x2_begin, ..., x1_end, x2_end, ...], where xi_begin is the number of pixels added at the beginning of axis i and xi_end is the number of pixels added at the end of axis i. If the pads parameter is provided, the shape of the output is auto-generated. See ONNX ConvTranspose for details.
- Type
Tuple[int]
- groups
Number of groups input channels and output channels are divided into.
- Type
int(default is 1)
- pad_type
The pad_type must be one of “not_set”, “same_upper”, “same_lower” or “valid”. The default value is “not_set”, which means explicit padding is used. “same_upper” or “same_lower” mean pad the input such that output_shape[i] = ceil(input_shape[i] / strides[i]) for each axis i. The padding is split between the two sides equally or almost equally (depending on whether it is even or odd). If the padding is an odd number, the extra padding is added at the end for “same_upper” and at the beginning for “same_lower”.
- Type
PadType(default is not_set)
- output_padding
Additional elements added to the side with higher coordinate indices in the output. Each padding value in output_padding must be strictly less than the corresponding stride/dilation dimension. Note that this attribute doesn't directly affect the computed output values. It only controls the selection of the computed values, so changing this attribute only adds or removes output elements. If output_shape is explicitly provided, output_padding does not contribute additional size to output_shape but participates in the computation of the needed padding amount.
- Type
Tuple[int]
- output_shape
The shape of the output can be explicitly set which will cause padding values to be auto generated. If output_shape is specified pads values are ignored. See ONNX Conv Transpose for details on how padding is generated.
- Type
Tuple[int]
- available_memory_proportions
The available memory proportions per conv, each [0, 1).
- Type
List[float]
- enable_conv_dithering
Enable convolution dithering per convolution. If true, then convolutions with different parameters will be laid out from different tiles in an effort to improve tile balance in models.
- Type
List[int]
- Returns
Output data tensor that contains the result of the convolution. The output dimensions are functions of the kernel size, stride size, pad lengths and group count.
- Return type
Tensor
- Raises
TypeError – If the input or weight tensors do not have a dtype in {popxl.float8_143, popxl.float8_152}, or if the log2_scale tensor does not have dtype popxl.int32.
ValueError – If log2_scale is not a scalar tensor.
- popxl.ops.cos(t)
Compute the cosine of each element of the input tensor.
See also PyTorch Tensor.cos.
- popxl.ops.cumsum(t, dim=0)
Perform the cumulative sum of the input elements along the given dimension dim.
See also PyTorch Tensor.cumsum, NumPy cumsum.
- popxl.ops.detach(t)
Prevent gradient computation of this tensor.
This operation is numerically equivalent to the identity op.
See also PyTorch Tensor.detach.
- popxl.ops.detach_(t)
Prevent gradient computation of this tensor (in-place).
The in-place version of detach(). The behaviour is the same: it blocks gradient propagation on the input tensor but does not make a copy of the input tensor.
See also PyTorch Tensor.detach_.
- popxl.ops.div(lhs, rhs)
Divide two tensors elementwise.
Follows NumPy broadcasting rules. The arguments must have the same dtype. The output will be the same dtype as the inputs. Floor division is used with integer values.
See also PyTorch Tensor.div, ONNX Div.
- popxl.ops.dropout(t, seed_tensor, p)
Randomly set elements of the input tensor to zero.
This operation will zero elements of tensor t with a probability of p. The dropout mask is created using samples from a Bernoulli distribution seeded with the seed_tensor.
You need to manage updating the seed_tensor for each forward pass and replica.
See also ONNX Dropout.
- Parameters
- Returns
A new tensor with the dropout applied.
- Return type
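A sketch of dropout with an explicitly managed seed tensor. The (2,)-shaped uint32 seed used here is an assumption about the expected seed format; producing and advancing the seed per step and per replica is left to the user, as noted above.
import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    x = popxl.variable(np.ones((4, 4), dtype=np.float32))
    seed = popxl.variable(np.zeros(2, dtype=np.uint32), name="seed")   # user-managed seed
    y = ops.dropout(x, seed, p=0.5)   # roughly half of the elements are zeroed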
- popxl.ops.dynamic_slice(t, index, axes, sizes, no_overlap)
Return a cloned slice of the input tensor.
The name “dynamic” refers to the fact that the index can be specified at runtime.
A slice along an axis can be defined by the tuple (start, stop, step) where:
start is the index for the respective axis
stop is index + size for the respective axis
step equals 1
Limitations:
Assuming we would like to slice t with dimension [4, 3]:
A step other than 1 is not supported (that is, t[::2,:] is not supported)
Negative slicing is not supported (that is, t[:-1,:] is not supported)
A stop value greater than the size of the axis is not supported (that is, t[:5,:] is not supported)
- Parameters
t (Tensor) – The input tensor.
index (Tensor) – The indices to start the slice from.
axes (List[int]) – The axes to slice from.
sizes (List[int]) – The sizes of the slices for the specified axes. For example: if index = [1, 2], axes = [0, 3] and sizes = [2, 4], then the tensor will be sliced as t[1:3, :, :, 2:6].
no_overlap (bool) – If set to true, then correct gradient backpropagation is only guaranteed if each region in the output tensor has exactly one populator (operation that writes data to this region). There are no run-time or compile-time checks possible to ensure this.
- Returns
A clone (not a view) of the sliced input tensor.
- Return type
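For example, slicing two rows starting at a runtime index (the uint32 index dtype is an assumption; the start index here happens to come from a variable, but any runtime tensor works):
import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    t = popxl.variable(np.arange(12).reshape(4, 3).astype(np.float32))
    index = popxl.variable(np.array([1], dtype=np.uint32))
    # a (2, 3) clone equivalent to t[1:3, :]
    s = ops.dynamic_slice(t, index, axes=[0], sizes=[2], no_overlap=True)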
- popxl.ops.dynamic_update(t, index, t_update, axes, sizes, no_overlap)
Update a slice of a tensor.
The name “dynamic” refers to the fact that the index can be specified at runtime.
index, axes and sizes determine the slice of t which will be updated. The dimensions of this slice and t_update must match. A slice along an axis can be defined by the tuple (start, stop, step) where:
start is the index for the respective axis
stop is index + size for the respective axis
step equals 1
Limitations:
Assuming we would like to update t with dimension [4, 3], the slicing of t will have the following limitations:
A step other than 1 is not supported (that is, t[::2,:] is not supported)
Negative slicing is not supported (that is, t[:-1,:] is not supported)
A value of stop larger than the size of the axis is not supported (for example, t[:5,:] is not supported)
- Parameters
t (Tensor) – The tensor to update.
index (Tensor) – The indices to start the slice from.
t_update (Tensor) – The tensor to update t with.
axes (Iterable[int]) – The axes of t to make the update on.
sizes (Iterable[int]) – The sizes of the updates along the specified axes. For example, if index = [1, 2], axes = [0, 3] and sizes = [2, 4], then the tensor will be updated at t[1:3, :, :, 2:6].
no_overlap (bool) – If set to true, then correct gradient backpropagation is only guaranteed if each region in the output tensor has exactly one populator (operation that writes data to this region). There are no run-time or compile-time checks possible to ensure this.
- Returns
The updated tensor.
- Return type
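A companion sketch to the dynamic_slice example above: writing a (2, 3) update into rows [1:3] of a (4, 3) tensor at a runtime index (again, the uint32 index dtype is an assumption):
import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    t = popxl.variable(np.zeros((4, 3), dtype=np.float32))
    t_update = popxl.variable(np.ones((2, 3), dtype=np.float32))
    index = popxl.variable(np.array([1], dtype=np.uint32))
    out = ops.dynamic_update(t, index, t_update, axes=[0], sizes=[2], no_overlap=True)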
- popxl.ops.dynamic_update_(t, index, t_update, axes, sizes, no_overlap)
Update a slice of a tensor in place.
Dynamically updates tensor t in place. The name “dynamic” refers to the fact that the index can be specified during runtime.
index, axes and sizes determine the slice of t which will be updated. The dimensions of this slice and t_update must match. A slice along an axis can be defined by the tuple (start, stop, step) where:
start is the index for the respective axis
stop is index + size for the respective axis
step equals 1
Limitations:
Assuming we would like to update t with dimension [4, 3], the slicing of t will have the following limitations:
A step value other than 1 is not supported (that is, t[::2,:] is not supported)
Negative slicing is not supported (that is, t[:-1,:] is not supported)
A stop value larger than the size of the axis is not supported (for example, t[:5,:] is not supported)
- Parameters
t (Tensor) – Tensor to update.
index (Tensor) – The indices to start the slice from.
t_update (Tensor) – The tensor to update t with.
axes (List[int]) – The axes of t to make the update on.
sizes (List[int]) – The sizes of the updates along the specified axes. For example, if index = [1, 2], axes = [0, 3] and sizes = [2, 4], the tensor will be updated at t[1:3, :, :, 2:6].
no_overlap (bool) – If set to true, then correct gradient backpropagation is only guaranteed if each region in the output tensor has exactly one populator (operation that writes data to this region). There are no run-time or compile-time checks possible to ensure this.
- Returns
The updated tensor.
- Return type
- popxl.ops.equal(lhs, rhs)
Apply an elementwise equality operation.
Follows NumPy broadcasting rules.
See also PyTorch Tensor.equal, NumPy equal, ONNX Equal.
- popxl.ops.exp(t)
Compute the exponential of the elements of input tensor.
See also PyTorch Tensor.exp, NumPy exp, ONNX Exp.
- popxl.ops.exp_(t)
Compute the exponential of the elements of input tensor (in-place).
See also PyTorch Tensor.exp_.
- popxl.ops.flatten(t)
Flatten a tensor.
Internally this uses reshape().
See also PyTorch Tensor.flatten, ONNX Flatten.
- popxl.ops.flatten_(t)
Flatten a tensor in place.
Internally this uses reshape_().
This is the in-place version of flatten().
- popxl.ops.fmod(lhs, rhs)
Compute the elementwise remainder after division (modulo operation).
Follows NumPy broadcasting rules. Arguments must have the same dtype.
See also PyTorch Tensor.fmod, NumPy fmod.
- popxl.ops.gather(t, indices, axis=0, available_memory_proportion=None, zero_OOR=False)
Select multiple elements from a tensor along specified axes.
Elements are specified via indices, along a specified axis. Equivalent to numpy.take(). Note that this is different from torch.gather().
Examples:
x = popxl.variable(np.arange(16).reshape(4, 4))
# [[ 0,  1,  2,  3],
#  [ 4,  5,  6,  7],
#  [ 8,  9, 10, 11],
#  [12, 13, 14, 15]]
gather(x, [3, 1, 2]) == Tensor([x[3], x[1], x[2]])
# [[12, 13, 14, 15],
#  [ 4,  5,  6,  7],
#  [ 8,  9, 10, 11]]
gather(x, [[0, 1], [1, 2]]) == gather(x, [0, 1, 1, 2]).reshape(2, 2, 4)
# [[[ 0,  1,  2,  3],
#   [ 4,  5,  6,  7]],
#  [[ 4,  5,  6,  7],
#   [ 8,  9, 10, 11]]]
See also PyTorch Tensor.gather, ONNX Gather.
- Parameters
t (Tensor) – The input tensor.
indices (Tensor) – The indices of the elements to extract.
axis (int) – The axis to gather on. The default is 0.
available_memory_proportion (Optional[float]) – The maximum proportion of available memory on each tile that this layer should consume temporarily during the course of the operation. Defaults to 1.0 if not set globally.
zero_OOR (bool) – If False, out of range (OOR) indices will produce undefined data. If True, out of range indices will produce zeros.
- Returns
The gathered elements concatenated.
- Return type
- popxl.ops.gelu(t)
Compute the GELU activation on a tensor.
For more details, refer to the paper Gaussian Error Linear Units
- popxl.ops.gelu_(t)
Compute the GELU activation on a tensor (in-place).
For more details, refer to the paper Gaussian Error Linear Units
- popxl.ops.greater(input, other)
Compute where the first tensor is greater than the second tensor.
This is an element-wise operation (with NumPy-style broadcasting support).
See also PyTorch greater, NumPy greater.
- popxl.ops.group_norm(t, weight, bias, num_groups, eps=1e-05)
Apply group normalisation to a tensor.
For more details, refer to the paper Group Normalization.
- Parameters
t (Tensor) – Tensor to be normalized.
weight (Tensor) – Tensor used to scale the result of normalisation.
bias (Tensor) – Tensor used to shift the result of normalisation.
num_groups (int) – Number of groups to separate the channels into.
eps (float) – The small value to use to avoid division by zero.
- Returns
The group normalised tensor.
- Return type
- popxl.ops.histogram(t, levels, absolute_of_input)
Compute the histogram of the input tensor.
See also PyTorch torch.histc, NumPy histogram.
- Parameters
- Returns
Output counts.
- Return type
- popxl.ops.host_load(h2d_stream, name=None)
Transfer a tensor from the host to the IPU.
This operation represents the transfer of data from the host to the IPU. It uses the existing host to IPU transfers created when building the IR, but defers the actual poplar::Copy until the op itself runs. This allows the copy to be scheduled as part of the normal op scheduling.
Data is sent from the host via the IStepIO object passed to session.run().
- Parameters
h2d_stream (HostToDeviceStream) – Stream to load from.
name (str) – Name to use for the returned tensor.
- Returns
The output tensor streamed from the host.
- Return type
- popxl.ops.host_store(d2h_stream, t)
Transfer a tensor from the IPU to the host.
This operation represents the transfer of data from the IPU to the host. It uses the existing device to host transfers created when building the IR, but defers the actual poplar::Copy until the op itself runs. This allows the copy to be scheduled as part of the normal op scheduling.
Data is received on the host via the IStepIO object passed to session.run().
- Raises
ValueError – If the stream shape or dtype doesn’t match the tensor shape.
- Parameters
d2h_stream (DeviceToHostStream) – The stream to use for the host store.
t (Tensor) – The input tensor to copy to host.
- Return type
None
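A sketch of the full host-to-IPU round trip using host_load and host_store. The stream creation, num_host_transfers setting and Session usage follow the standard PopXL pattern, but the "ipu_model" device string and the single-transfer setting are assumptions made for this example.
import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    x_h2d = popxl.h2d_stream((2, 2), popxl.float32, name="x_stream")
    x = ops.host_load(x_h2d, "x")              # host -> IPU
    y = ops.mul(x, x)
    y_d2h = popxl.d2h_stream(y.shape, y.dtype, name="y_stream")
    ops.host_store(y_d2h, y)                   # IPU -> host

ir.num_host_transfers = 1                      # one host_load execution per stream per run (assumed)
with popxl.Session(ir, "ipu_model") as session:
    outputs = session.run({x_h2d: np.full((2, 2), 3.0, dtype=np.float32)})
# outputs[y_d2h] now holds the squared input values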
- popxl.ops.increment_mod(t, increment, modulus)
Increment the elements of a tensor using modulo arithmetic.
- popxl.ops.increment_mod_(t, increment, modulus)
Increment the elements of a tensor using modulo arithmetic in place.
- popxl.ops.init(shape, dtype, name=None, init_type='zero', meta_shape=None)
Create a tensor that is initialised with zero or undefined values.
The returned tensor is not considered a variable. A variable must be created in the main graph; it can be initialised to arbitrary values and can be read/written with session methods.
In contrast, init can be executed anywhere, so it can return an initialised tensor in non-main graphs.
The tensor can only be initialised to zero or undefined values.
- Parameters
dtype (dtypes.dtype) – Data type of the output tensor.
shape (Tuple[int]) – Shape of the output tensor.
name (str) – Name of the output tensor.
init_type (Union[Literal["zero"], Literal["undef"]]) – Initialisation of the output tensor.
meta_shape (Tuple[int]) – Meta shape of the output tensor.
- Raises
ValueError – If the init_type is unknown.
- Returns
An initialised tensor.
- Return type
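For example, a zero-initialised accumulator created with init; unlike popxl.variable, this could equally be created inside a subgraph rather than the main graph:
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    acc = ops.init((4, 4), popxl.float32, name="acc", init_type="zero")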
- popxl.ops.interpolate(t, scale_factor=(1.0, 1.0, 1.0, 1.0), mode='nearest', nearest_mode='round_prefer_floor', coordinate_transformation_mode='half_pixel')
Interpolate the input tensor. Each dimension value of the output tensor is: output_dimension = floor(input_dimension * scale_factor).
- Parameters
t (Tensor) – Input data tensor from previous layer.
scale_factor (Tuple[float]) – The scale array along each dimension. It takes values greater than or equal to 1. The number of elements of 'scales' should be the same as the rank of input 't'.
mode (InterpolateType) – The interpolation algorithm. Three interpolation modes: nearest (default), linear and cubic.
nearest_mode (InterpolateNearestType) – Four modes: round_prefer_floor (default, also known as round half down), round_prefer_ceil (also known as round half up), floor, ceil. Only used by nearest interpolation. It indicates how to get the "nearest" pixel in the input tensor from x_original, so this attribute is valid only if "mode" is "nearest".
coordinate_transformation_mode (InterpolateCoordinateTransformationType) –
This attribute describes how to transform the coordinate in the interpolated tensor to the coordinate in the original tensor. The coordinate of each dimension is transformed individually. Let’s describe a case using axis x as an example.
Some variables are defined as follows:
x_interpolated: the coordinate of axis x in the interpolated tensor.
x_original: the coordinate of axis x in the original tensor.
length_original: the length of the original tensor in axis x.
length_interpolated: the length of the interpolated tensor in axis x.
roi_x: roi_x = (start_x, end_x) of the axis x in input “roi”.
scale: scale = length_interpolated / length_original.
Then:
if coordinate_transformation_mode is “half_pixel”, x_original = (x_interpolated + 0.5) / scale - 0.5,
if coordinate_transformation_mode is “pytorch_half_pixel”, x_original = length_interpolated > 1 ? (x_interpolated + 0.5) / scale - 0.5 : 0,
if coordinate_transformation_mode is “align_corners”, x_original = x_interpolated * (length_original - 1) / (length_interpolated - 1),
if coordinate_transformation_mode is “asymmetric”, x_original = x_interpolated / scale,
if coordinate_transformation_mode is “tf_crop_and_resize”, x_original = length_interpolated > 1 ? start_x * (length_original - 1) + x_interpolated * (end_x - start_x) * (length_original - 1) / (length_interpolated - 1) : 0.5 * (start_x + end_x) * (length_original - 1).
- Returns
Output data tensor after interpolation.
- Return type
- popxl.ops.io_tile_copy(t)
Copy a tensor to or from I/O tiles on the current IPU.
- popxl.ops.ipu_copy(t, destination, source=None)
Copy a tensor to an IPU.
- Parameters
- Raises
ValueError – If the source IPU could not be inferred and the source is not specified.
- Returns
The copied tensor.
- Return type
- popxl.ops.l1(t, axis=None, keepdims=False)
Compute the sum of the magnitudes of the elements in a tensor (L1 norm) along specified axes.
- Parameters
t (Tensor) – Tensor to compute the L1 norm of.
axis (int or list) – Axis or axes to compute L1 norm along. If none is specified then all elements will be normalised. If an axis is negative then it indexes from the last to the first axis.
keepdims (bool) – Keep the axis that is being reduced (True) or not (False).
- Returns
The reduced tensor containing the L1 norm of elements along the specified axes.
- Return type
- popxl.ops.l2(t, axis=None, keepdims=False)
Compute the square root of the sum of the squares of the elements in a tensor (L2 norm) along specified axes.
- Parameters
- Returns
The reduced tensor containing the L2 norm of elements along the specified axes.
- Return type
- popxl.ops.lamb_square(t)
Square each element before applying an add reduction.
Used in the LAMB optimizer: https://arxiv.org/abs/1904.00962
- popxl.ops.layer_norm(t, weight, bias, eps=1e-05)
Apply layer normalisation to a tensor.
Uses group_norm under the hood.
- Parameters
- Returns
The layer normalised tensor.
- Return type
- popxl.ops.log(t)
Compute the log of the elements of input tensor.
See also PyTorch torch.log, NumPy log, ONNX Log.
- popxl.ops.logical_and(lhs, rhs)
Compute the elementwise logical and of two tensors.
Follows NumPy broadcasting rules. Inputs will be cast to bool if necessary.
See also PyTorch Tensor.logical_and, NumPy logical_and.
- popxl.ops.logical_not(t)
Compute the elementwise not of a tensor.
Inputs will be cast to bool if necessary.
See also PyTorch Tensor.logical_not, NumPy logical_not.
- popxl.ops.logical_or(lhs, rhs)
Compute the elementwise logical or of the input tensors.
Follows NumPy broadcasting rules. Inputs will be cast to bool if necessary.
See also PyTorch Tensor.logical_or, NumPy logical_or.
- popxl.ops.logsum(t, axis=None, keepdims=False)
Compute the log of summed elements of a tensor along specified axes.
Supported dtypes: float.
- Parameters
t (Tensor) – Tensor to compute the log of the sum of elements.
axis (int or list) – Axis or axes to compute the log of the sum along. If none is specified all axes will be summed. If an axis is negative it indexes from the last to the first axis.
keepdims (bool) – Keep the axis that is being computed (True) or not (False).
- Returns
A new tensor containing the log of the summed elements along the specified axes.
- Return type
- popxl.ops.logsumexp(t, axis=None, keepdims=False)
Compute the log of the summed exponentials of elements in a tensor, along specified axes.
Supported dtypes: floats.
See also PyTorch Tensor.logsumexp.
- Parameters
t (Tensor) – Tensor to compute the log of the summed exponentials of the elements.
axis (int or list) – Axis or axes to compute the log of the summed exponentials along. If none is specified all axes will be reduced. If axis is negative it indexes from the last to the first axis.
keepdims (bool) – Keep the axis that is being computed (True) or not (False).
- Returns
A new tensor containing the log of the summed exponentials of the elements along the specified axes.
- Return type
- popxl.ops.matmul(lhs, rhs, available_memory_proportion=None, output_type=None, partials_type=None)
Perform matrix multiplication of two tensors.
Follows NumPy matrix multiplication rules for N-D tensors, see numpy.matmul().
Arguments must have the same dtype. Shapes must be compatible as defined by the NumPy matrix multiplication rules.
See also PyTorch Tensor.matmul, NumPy matmul, ONNX MatMul.
- Parameters
lhs (Tensor) – Left hand side of matrix multiplication.
rhs (Tensor) – Right hand side of matrix multiplication.
available_memory_proportion (Optional[float]) – The maximum proportion of available memory on each tile that this layer should consume temporarily during the course of the operation. Defaults to 1.0.
output_type (Optional[dtypes.dtype], optional) – Output datatype to enforce. Defaults to the dtype of lhs/rhs.
partials_type (dtypes.dtype, optional) – The type to use for partial results (float16, float32). Defaults to dtypes.float32.
- Returns
The matrix product of lhs and rhs.
- Return type
Tensor
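A short sketch of a plain matrix multiplication (shapes are arbitrary; partials default to float32 as described above):
import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    a = popxl.variable(np.random.rand(4, 8).astype(np.float32))
    b = popxl.variable(np.random.rand(8, 2).astype(np.float32))
    c = ops.matmul(a, b)   # shape (4, 2)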
- popxl.ops.matmul_pow2scaled(lhs, rhs, log2_scale, available_memory_proportion=None)
Perform a scaled matrix multiplication between two tensors.
Compute a matrix multiplication between lhs and rhs, then multiply the result by pow2(log2_scale).
The matrix multiply arguments must have either popxl.float8_143 or popxl.float8_152 dtype. The log2_scale argument must be of type popxl.int8 and be in the range [-32, 32).
Follows NumPy matrix multiplication rules for N-D tensors, see numpy.matmul().
- Parameters
lhs (Tensor) – Left hand side of matrix multiplication.
rhs (Tensor) – Right hand side of matrix multiplication.
log2_scale (Tensor) – integer power-of-two exponent, where the matrix multiplication output is multiplied by pow2(log2_scale).
available_memory_proportion (Optional[float]) – The maximum proportion of available memory on each tile that this layer should consume temporarily during the course of the operation. Defaults to 1.0.
- Raises
TypeError – If the matrix multiply operand tensors do not have a dtype in {popxl.float8_143, popxl.float8_152}, or if the log2_scale tensor does not have dtype popxl.int32.
ValueError – If log2_scale is not a scalar tensor.
- Return type
- popxl.ops.max(t, axis=None, keepdims=False)
Compute the maximum value of the elements in a tensor along specified axes.
See also PyTorch Tensor.max, ONNX Max.
- Parameters
- Returns
The reduced tensor containing the maximum of elements computed along the specified axes.
- Return type
- popxl.ops.max_pool(t, kernel_size, stride=None, padding=None, out_pads=None, dilation=None, in_dilations=None, auto_pad='not_set', ceil_mode=False, storage_order='row')
Max pool a tensor.
This consumes an input tensor t and applies max pooling across the tensor according to kernel sizes, stride sizes, and pad lengths. Max pooling consists of computing the max of all values in a subset of the input tensor according to the kernel size and down-sampling the data into the output tensor Y for further processing.
- Parameters
t (Tensor) –
Input data tensor from the previous layer.
If the input is a 3D tensor, the size is (N, C, L), where N is the batch size, C is the number of channels, L is the length;
If the input is a 2D image, the size is (N, C, H, W), where N is the batch size, C is the number of channels, H and W are the height and width;
If the input is a 3D image, the size is (N, C, D, H, W), where N is the batch size, C is the number of channels, D is the depth, H and W are the height and width.
kernel_size (Tuple[int]) – The size of the kernel along each axis.
stride (Tuple[int]) – Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
padding (Tuple[int]) – Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. The values represent the number of pixels added to the beginning and end part of the corresponding axis. The padding format should be as follows: [x1_begin, x2_begin, ..., x1_end, x2_end, ...], where xi_begin is the number of pixels added at the beginning of axis i and xi_end is the number of pixels added at the end of axis i.
out_pads (Tuple[int]) – The output padding for pooling.
dilation (Tuple[int]) – Dilation value along each spatial axis of the filter.
in_dilations (Tuple[int]) – The input dilations attributes along each spatial axis of the filter.
auto_pad (Literal) – auto_pad must be one of “not_set”, “same_upper”, “same_lower” or “valid”. The default value is “not_set”, which means explicit padding is used. “same_upper” or “same_lower” mean pad the input so that output_shape[i] = ceil(input_shape[i] / strides[i]) for each axis i. The padding is split between the two sides equally or almost equally (depending on whether it is even or odd). If the padding is an odd number, the extra padding is added at the end for “same_upper” and at the beginning for “same_lower”.
ceil_mode (bool) – When True, will use ceil instead of floor to compute the output shape.
storage_order (Literal['row', 'column']) – The storage order of the tensor. Default is row.
- Returns
Output data tensor from max pooling across the input tensor. Dimensions will vary based on various kernel, stride, and pad sizes. Floor value of the dimension is used.
- Return type
- popxl.ops.maximum(*ts)
Compute the elementwise maximum of N tensors.
Follows NumPy broadcasting rules. Arguments must have the same dtype.
- popxl.ops.mean(t, axis=None, keepdims=False)
Compute the arithmetic mean of elements in a tensor along the specified axes.
See also PyTorch Tensor.mean, NumPy mean, ONNX Mean.
- Parameters
t (Tensor) – Tensor to compute the mean of elements.
axis (int or list) – Axis or axes to compute the mean along. If none is provided all axes will be reduced. If axis is negative it indexes from the last to the first axis.
keepdims (bool) – Keep the axis that is being reduced (True) or not (False).
- Returns
The reduced tensor containing the arithmetic means computed along the specified axes.
- Return type
- popxl.ops.median(t, axis=None, keepdims=False)
Compute the median of elements in a tensor along axes.
See also PyTorch Tensor.median, NumPy median.
- Parameters
- Returns
The reduced tensor.
- Return type
- popxl.ops.min(t, axis=None, keepdims=False)
Compute the minimum of the elements of a tensor along axes.
See also PyTorch Tensor.min, ONNX Min.
- Parameters
- Returns
The reduced tensor containing the minimum of the elements along the axes.
- Return type
- popxl.ops.mul(lhs, rhs)
Multiply two tensors elementwise.
Follows NumPy broadcasting rules. Arguments must have the same dtype.
See also PyTorch Tensor.mul, ONNX Mul.
- popxl.ops.negate(t)
Perform elementwise negation (two’s complement) of a tensor.
- popxl.ops.nll_loss(probs, labels, ignore_index=None, reduction='mean', log_prob=False)
Compute the negative log likelihood loss.
Compute the negative log likelihood loss l where probs = softmax(x). The returned loss will be reduced by reduction (default mean) across items in labels. Any item in labels equal to ignore_index will not contribute to l or dl/dx.
See also PyTorch nll_loss, ONNX NegativeLogLikelihoodLoss.
- Parameters
probs (Tensor) – The probabilities. Expected to be the output of softmax().
labels (Tensor) – The labels. Target values for the probabilities.
ignore_index (Optional[int], optional) – Specify label values that should not contribute to the loss.
reduction (str) – Specify how to reduce the loss. Defaults to mean. Options: mean, sum and none.
log_prob (bool) – If True, the input probabilities are log probabilities.
- Returns
The calculated negative log likelihood loss.
- Return type
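A hedged usage sketch (illustrative; it assumes logits x and integer labels already exist in a graph context):
probs = ops.softmax(x, axis=1)                  # class probabilities
loss = ops.nll_loss(probs, labels)              # mean-reduced negative log likelihood
loss_sum = ops.nll_loss(probs, labels, reduction='sum', ignore_index=0)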
- popxl.ops.nll_loss_with_softmax_grad(probs, labels, loss_grad=1, ignore_index=None, reduction='mean')
Compute the negative log likelihood loss.
Compute the negative log likelihood loss l and return the gradient dE/dx, where probs = softmax(x). loss_grad should be the gradient dE/dl, where E is the error from which back propagation is initialised. Typically E = l, so in order to return dl/dx the loss_grad should be dl/dl, which would be 1.
- Parameters
probs (Tensor) – The probabilities. Expected to be the output of softmax().
labels (Tensor) – The labels. Target values for the probabilities.
loss_grad (Tensor) – The gradient, dE/dl. Supports a float32 dtype with float16 probs.
reduction (ReductionType) – Specify how to reduce the loss. Defaults to mean. Options: mean, sum and none.
ignore_index (Optional[int]) – Specify label values that should not contribute to l or dE/dx. Defaults to None.
- Returns
A tuple of the loss and the gradient: (l, dE/dx).
- Return type
- popxl.ops.onehot(t, num_classes, values, axis)
Produce a one-hot tensor based on inputs.
See also ONNX OneHot.
- Parameters
- Returns
Output tensor.
- Return type
- popxl.ops.pow2scale_then_cast(t, log2_scale, data_type)
Add a fused operation cast(src * pow2(log2_scale), dtype) to cast to a floating point 8 data type.
See the PopXL documentation on floating point 8 types for more details.
- Parameters
- Raises
TypeError – If data_type is not of type float8_143 or float8_152.
- Returns
The converted float8 tensor.
- Return type
- popxl.ops.print_tensor(t, title=None, print_self=True, print_gradient=False, summarise_threshold=1000, edge_items=3, max_line_width=75, digits=8, float_format='auto', separator=' ', open_bracket='[', close_bracket=']')
Print a tensor.
The output tensor of this op must be consumed if you want to print the gradient tensor. If the output is not consumed, this op does not get pruned when running removeIsolatedTensors.
The default output format will split large lines, print all elements in the same format, pad elements so that they align, and summarise large tensors.
- Parameters
t (Tensor) – The tensor to print.
title (str, optional) – Title to print. Defaults to None.
print_self (bool, optional) – Print the tensor itself. Defaults to True.
print_gradient (bool, optional) – Indicates if the associated gradient tensor of t is also printed (True) or not (False). Defaults to False.
summarise_threshold (int) – Default 1000. If the number of elements of the tensor exceeds this threshold the output will be summarised. Only the edge elements will be displayed with an ellipsis indicating skipped elements. A value of 0 will disable summarisation.
edge_items (int) – Default 3. Number of edge elements to include at the beginning and end when summarisation is enabled.
max_line_width (int) – Default 75. Lines longer than this limit will be split across multiple lines. A value of 0 will disable line splitting.
digits (int) – Default 8. Number of digits to display. For integers this limit can be exceeded if any number is large enough. For floating points this does not include the exponent. The number of digits is used in conjunction with an analysis of the tensor to determine the width of each element, so that all elements are aligned when printed. A value of 0 disables this analysis and each element will be printed in an unaligned format.
float_format (str) – Default ‘auto’. Determines the floating point format to use. Options: ‘auto’, ‘fixed’, ‘scientific’ and ‘none’. ‘auto’ mode determines the appropriate format based on the data. ‘fixed’ uses fixed point format, e.g. -100.00. ‘scientific’ uses scientific notation, e.g. -1.123e+10. ‘none’ does not take care to display numbers in the same format. If digits == 0 this option is disregarded and the float_format is set to ‘none’.
separator (str) – Default ‘,’. Character used to delineate values.
open_bracket (str) – Default ‘[’. Character used to open a tensor.
close_bracket (str) – default ‘]’. Character used to close a tensor.
- Raises
ValueError – if separator, open_bracket or close_bracket are not a single character.
KeyError – if float_format is not one of the available options (see the parameter description above).
- Returns
The input tensor, unchanged.
- Return type
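A short sketch of typical use (illustrative; assumes a tensor t in a graph context):
t = ops.print_tensor(t, title="activations")              # print with a title
t = ops.print_tensor(t, print_gradient=True)               # also print the gradient tensor
t = ops.print_tensor(t, summarise_threshold=0, digits=4)   # print every element with 4 digits
# Keep using the returned tensor so the op is not pruned.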
- popxl.ops.prod(t, axis=None, keepdims=False)
Compute the product of elements along an axis.
See also PyTorch Tensor.prod, NumPy prod.
- Parameters
t (Tensor) – Tensor to compute product of.
axis (int or list) – Axis or axes to compute product along. If none is provided, all axes will be reduced. If the axis is negative, the product is computed from the last to the first axis.
keepdims (bool) – Keep the axis that is being reduced (True) or not (False).
- Returns
The reduced tensor.
- Return type
- popxl.ops.random_normal(seed_tensor, shape, mean=0.0, std=1.0, dtype=popxl.dtypes.float32)
Randomly sample from a normal distribution.
The mean and standard deviation of the distribution are specified by mean and std respectively.
Note: not compatible with the IPU Model.
- Parameters
seed_tensor (Tensor) – A tensor used to seed the probability distribution. Must have data type uint32 and shape (2,).
shape (Tuple[int, ...]) – The shape of the output tensor.
mean (float, optional) – Mean of the distribution. Defaults to 0.0.
std (float, optional) – Standard deviation of the distribution. Defaults to 1.0.
dtype (dtypes.dtype, optional) – Data type of output tensor. Defaults to dtypes.float32.
- Returns
A new tensor with elements sampled from a normal distribution.
- Return type
- popxl.ops.random_uniform(seed_tensor, shape, low=0.0, high=1.0, dtype=popxl.dtypes.float32)
Randomly sample from a uniform distribution.
This operation will sample uniformly from a range with minimum value low and maximum value high.
Note: not compatible with the IPU Model.
- Parameters
seed_tensor (Tensor) – A tensor used to seed the probability distribution. Must have data type uint32 and shape (2,).
shape (Tuple[int, ...]) – The shape of the output tensor.
low (float, optional) – Minimum value. Defaults to 0.0.
high (float, optional) – Maximum value. Defaults to 1.0.
dtype (dtypes.dtype, optional) – Data type of output tensor. Defaults to dtypes.float32.
- Returns
A new tensor with element values sampled from a uniform distribution.
- Return type
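A minimal sketch of seeding and sampling (illustrative; the seed tensor must be uint32 with shape (2,) as stated above, and the setup assumes the usual Ir/main graph pattern):
import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    seed = popxl.variable(np.array([0, 42], dtype=np.uint32), name="seed")
    noise = ops.random_normal(seed, (4, 4), mean=0.0, std=1.0)
    seed, s1 = ops.split_random_seed(seed)   # derive a fresh seed for the next op
    mask = ops.random_uniform(s1, (4, 4), low=0.0, high=1.0)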
- popxl.ops.relu(t)
Compute the ReLU activation of a tensor.
For more details, refer to Rectifier (neural networks).
See also ONNX Relu.
- popxl.ops.relu_(t)
Compute the ReLU activation of a tensor in place.
For more details, refer to Rectifier (neural networks).
- popxl.ops.remote_load(remote_buffer, offset, name=None)
Load a tensor from Streaming Memory.
This operation loads a tensor from the remote buffer residing in Streaming Memory.
The tensor will be loaded from the memory location corresponding to remote_buffer_id (specified in remote_buffer).
The value of offset must be >= 0.
The relationship between offset and remote_buffer_id is described in remote_store().
Note
There is no data dependency in the graph between remote store and remote load. Thus, the remote load operator may end up before the remote store operator in the serialized graph. One way to avoid this is by using with popxl.in_sequence(True).
See also
- Parameters
remote_buffer (RemoteBuffer) – The handle to the remote buffer.
offset (Union[int, Tensor]) – Integer or rank-0 tensor indicating which entry in the remote buffer to load from.
name (str) – Name to use for the returned tensor.
- Returns
A new tensor loaded from the remote buffer.
- Return type
- popxl.ops.remote_load_(remote_buffer, offset, t)
Load from Streaming Memory into a specified tensor.
This operation loads from the remote buffer in Streaming Memory into an existing tensor.
This op is identical to remote_load, except that the data loaded from the remote buffer will be written to the tensor t.
Note
There is no data dependency (in the graph) between remote store and remote load. Thus, the remote load operator may end up before the remote store operator in the serialized graph. One way to avoid this is by using with popxl.in_sequence(True).
See also
- Parameters
remote_buffer (RemoteBuffer) – The handle to the remote buffer.
offset (Union[int, Tensor]) – Integer or rank-0 tensor indicating which entry in the remote buffer to load from.
t (Tensor) – The tensor the loaded data will be written to.
- Returns
The tensor loaded from the remote buffer.
- Return type
- popxl.ops.remote_store(remote_buffer, offset, t)
Store a tensor in Streaming Memory.
This operation stores the input tensor in the remote buffer residing in Streaming Memory.
This op is typically used to store different, identically-shaped tensors to the same remote buffer by specifying the offset.
Instances of the op with matching remote_buffer_id (specified in remote_buffer) will outline together, meaning that if different tensors are to be stored under the same remote buffer ID, a different offset value has to be supplied for each tensor.
The remote_buffer handles the relationship between remote_buffer_id, shape and dtype, because shape and dtype need to be fixed for each remote_buffer_id.
The value of offset must be >= 0.
If t is of rank x, the remote buffer with remote_buffer_id will be of rank x+1, where the new dimension (the row) will be of size entries.
Note
There is no data dependency (in the graph) between remote store and remote load. Thus, the remote load operator may end up before the remote store operator in the serialized graph. One way to avoid this is by using with popxl.in_sequence(True).
See also
- Parameters
remote_buffer (RemoteBuffer) – The handle to the remote buffer.
offset (Union[int, Tensor]) – Integer or rank-0 tensor indicating which entry in the remote buffer to store to.
t (Tensor) – Tensor to copy and store in the remote buffer.
- Return type
None
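A hedged sketch of pairing remote_store with remote_load (illustrative; buf stands for a RemoteBuffer handle created elsewhere, and in_sequence keeps the store before the load, as noted above):
with popxl.in_sequence(True):
    ops.remote_store(buf, 0, w)                       # store w at entry 0 of the buffer
    w_loaded = ops.remote_load(buf, 0, name="w_loaded")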
- popxl.ops.repeat(graph, repeat_count, *inputs, inputs_dict=None)
Repeatedly call a graph.
This operation repeatedly executes a graph repeat_count times. The input tensors are provided as graph inputs for the first iteration.
The inputs and inputs_dict tensors are passed as graph inputs. You can specify an input either positionally using inputs, or via a tensor map using inputs_dict.
Graph inputs are determined when the graph is created using create_graph(callable, ...). The order of inputs will be the same as the order of the tensor inputs in the function signature and the order of called popxl.graph_inputs. See create_graph() for more information.
Between each execution of the subgraph, the N outputs of the subgraph will be copied to the first N inputs. These are called loop carried inputs. The number of outputs must be less than or equal to the number of inputs. The remaining inputs will be unchanged throughout the loop iterations (unless modified in place).
Example:
# popxl.Module to repeat
class AddWeight(popxl.Module):
    def __init__(self):
        self.w: popxl.Tensor = None

    def build(self, x):
        self.w = popxl.graph_input(x.shape, x.dtype, "w")
        return self.w + x, self.w

with g:  # a graph
    add_weight0 = AddWeight()
    add_weight_graph0 = ir.create_graph(add_weight0, x0)

    # repeat 8 times
    y0, w0 = ops.repeat(add_weight_graph0, 8, x0, inputs_dict={add_weight0.w: w0})
See also PyTorch Tensor.repeat, NumPy repeat.
- Parameters
graph (Graph) – User defined graph to repeat repeat_count times.
repeat_count (int) – Number of times to repeat calling the graph.
*inputs (Tensor, List[Tensor], int, float) – Provide inputs via position.
inputs_dict (Optional[Mapping[Tensor, Tensor]]) – Provide inputs via a tensor map. Mapping of graph tensor -> parent tensor.
check_inputs (bool = True) – If True, then check when called that all inputs have been provided.
- Return type
- Raises
ValueError – If repeat_count < 0.
ValueError – If the number of subgraph inputs < subgraph outputs.
- popxl.ops.repeat_with_info(graph, repeat_count, *inputs, inputs_dict=None, check_inputs=True)
Repeatedly call a graph and return information about the call site.
This operation repeatedly executes a graph repeat_count times. The input tensors are provided as graph inputs for the first iteration.
Returns CallSiteInfo that can be used to inspect callsite inputs and outputs.
The inputs and inputs_dict tensors are passed as graph inputs. You can specify an input either positionally using inputs or via a tensor map using inputs_dict.
Graph inputs are determined when the graph is created using ir.create_graph(callable, ...). The order of inputs will be the same as the order of the tensor inputs in the function signature and the order of called popxl.graph_inputs. See create_graph() for more information.
Between each execution of the subgraph, the N outputs of the subgraph will be copied to the first N inputs. These are called loop carried inputs. The number of outputs must be less than or equal to the number of inputs.
Implementation detail: In order to maintain the input / output indices of the subgraph, we must call the user provided subgraph, and create a “middle” subgraph to repeat the user provided subgraph inside:
[ASCII diagram in the original docs: the parent graph's LoopOp passes the iterator, the keep-going flag, the loop carried inputs and any implicit inputs into the wrapper subgraph, which calls the user-provided subgraph via a CallOp and returns the keep-going flag and loop carried outputs to the parent graph.]
Example:
# popxl.Module to repeat
class AddWeight(popxl.Module):
    def __init__(self):
        self.w: popxl.Tensor = None

    def build(self, x):
        self.w = popxl.graph_input(x.shape, x.dtype, "w")
        return self.w + x, self.w

with g:  # a graph
    add_weight0 = AddWeight()
    add_weight_graph0 = ir.create_graph(add_weight0, x0)

    # repeat 8 times
    call_info = ops.repeat_with_info(
        add_weight_graph0, 8, x0, inputs_dict={add_weight0.w: w0}
    )
    y0, w0 = call_info.outputs
- Parameters
graph (Graph) – User defined graph to repeat repeat_count times.
repeat_count (int) – Number of times to repeat calling the graph.
*inputs (Tensor, List[Tensor], int, float) – Provide inputs via position.
inputs_dict (Optional[Mapping[Tensor, Tensor]]) – Provide inputs via a tensor map. Mapping of graph tensor -> parent tensor.
check_inputs (bool) – Check when called if all inputs have been provided. Defaults to True.
- Raises
ValueError – If repeat_count < 0.
ValueError – If the number of explicitly passed inputs + the number of loop created inputs != the number of outputs.
- Returns
Information on the created callsite for the repeat op.
- Return type
- popxl.ops.reshape(t, shape)
Reshape a tensor.
See also PyTorch Tensor.reshape, NumPy reshape, ONNX Reshape.
- Parameters
- Raises
ValueError – A ValueError will be raised if an invalid value is encountered in the shape, or if more than one -1 is given in the shape.
- Returns
The reshaped tensor.
- Return type
- popxl.ops.reshape_(t, shape)
Reshape a tensor (in-place).
This is the in-place version of reshape().
- Parameters
- Raises
ValueError – A ValueError will be raised if an invalid value is encountered in the shape, or if more than one -1 is given in the shape.
- Returns
An alias of the input tensor, reshaped.
- Return type
- popxl.ops.roi_align(t, rois, batch_index, output_size, spatial_scale, sampling_ratio)
Apply pooling across each region of interest.
This consumes an input tensor t and regions of interest (ROIs) to apply pooling across each ROI. Only average pooling is supported; max pooling is not.
- Parameters
t (Tensor) – Input data tensor from the previous operator; 4-D feature map of shape (N, C, H, W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data.
rois (Tensor) – ROIs to pool over. rois is a 2-D input of shape (numRois, 4) given as [[x1, y1, x2, y2], …], where numRois is the number of ROIs. The ROI coordinates are in the coordinate system of the input image. Each coordinate set has a 1:1 correspondence with the batch_index input.
batch_index (Tensor) – 1-D tensor of shape [numRois,] with each element denoting the index of the corresponding image in the batch.
output_size (Tuple[int]) – Pooled output height and width.
spatial_scale (float) – Multiplicative spatial scale factor to translate ROI coordinates from their input spatial scale to the scale used when pooling; that is, the spatial scale of the input feature map t relative to the input image.
sampling_ratio (int) – Number of sampling points in the interpolation grid used to compute the output value of each pooled output bin.
- Returns
ROI pooled output Y, a 4-D tensor of shape (numRois, channels, aligned_height, aligned_width), where aligned_height is the output height and aligned_width is the output width. The r-th batch element Y[r-1] is a pooled feature map corresponding to the r-th ROI t[r-1].
- Return type
- popxl.ops.scaled_add(X, Y, a=1.0, b=1.0)
Perform a scaled addition of two tensors.
Compute the sum of X scaled by a and Y scaled by b, which means aX + bY.
Does not apply NumPy broadcasting. Uses mixed precision poplibs operations.
X and Y must be the same shape, but can be different types. a and b must be scalars.
- Parameters
- Returns
A tensor containing aX + bY.
- Return type
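For example (illustrative; running and update are assumed tensors of the same shape in a graph context):
# z = 0.9 * running + 0.1 * update, without NumPy broadcasting
z = ops.scaled_add(running, update, a=0.9, b=0.1)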
- popxl.ops.scaled_add_(X, Y, a=1.0, b=1.0)
Perform a scaled addition of two tensors (in-place).
Compute the sum of X scaled by a and Y scaled by b. This is performed in place on X, which means that X = aX + bY.
Does not apply NumPy broadcasting. Uses mixed precision poplibs operations.
X and Y must be the same shape, but can be different types.
- Parameters
- Returns
The X tensor containing aX + bY.
- Return type
- popxl.ops.scatter(t, indices, values, axis=0, available_memory_proportion=None)
Update the values of multiple elements in a tensor.
The elements specified by indices are updated with the values in values.
scatter requires the three input tensors to be of the same rank r >= 1. The optional attribute axis identifies the axis of the tensor along which the update will be performed. By default, the outer-most axis, axis 0, is used. The output of the operation is produced by creating a copy of the input tensor, t, and then updating its elements to the values specified by values at the index positions specified by indices. The output shape is the same as the shape of the input tensor.
For each entry in values, the target index in t is obtained by combining the corresponding entry in indices with the index of the entry itself: the index-value for dimension = axis is obtained from the value of the corresponding entry in indices, and the index-value for dimension != axis is obtained from the index of the entry itself.
Pseudo-code example:
x1 = x.copy()
scatter(x1, [1, 2, 3], [-1, -2, -3])
x2 = x.copy()
x2[1] = -1
x2[2] = -2
x2[3] = -3
x1 == x2
See also PyTorch Tensor.scatter.
- Parameters
t (Tensor) – The input tensor.
indices (Tensor) – The indices of the elements to update.
values (Tensor) – The values to update the tensor with.
axis (int) – Which axis to set on. Default is 0.
available_memory_proportion (Optional[float]) – The maximum proportion of available memory on each tile that this layer should consume temporarily during the course of the operation. Defaults to 1.0 if not set globally.
- Returns
The tensor with updated values.
- Return type
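A small popxl sketch corresponding to the pseudo-code above (illustrative names; indices and values are tensors of the same rank as t):
import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    t = popxl.variable(np.zeros((4, 4), dtype=np.float32), name="t")
    indices = popxl.constant(np.array([[1], [3]], dtype=np.int32))
    values = popxl.constant(np.array([[-1.0], [-3.0]], dtype=np.float32))
    out = ops.scatter(t, indices, values, axis=0)   # out[1, 0] == -1, out[3, 0] == -3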
- popxl.ops.scatter_reduce(data, indices, reduction, initial_values=None, axis=0, axis_size=None, available_memory_proportion=None)
- popxl.ops.shaped_dropout(t, seed_tensor, shape, ratio)
Add a shaped dropout operation to the input tensor.
Applies a shaped dropout to the input tensor t. This operator requires a shape parameter that is used to define the shape of the dropout mask so that strongly correlated features in the input tensor t can be preserved. The shape parameter must be broadcastable to the input tensor t. The dropout mask is created using samples from a Bernoulli distribution seeded with the seed tensor seed_tensor.
- Parameters
t (Tensor) – The Tensor to apply the shaped dropout operation to.
seed_tensor (Tensor) – The Tensor used to seed the probability distribution which generates the dropout mask. Must have data type uint32 and shape [2,].
shape (Iterable[int]) – The shape of the dropout mask. This must be broadcastable to the input tensor.
ratio (float) – The probability of dropping an input feature. Default = 0.5.
- Returns
A new tensor with the shaped dropout applied.
- Return type
- popxl.ops.sin(t)
Compute the sine of each element of the input tensor.
See also PyTorch Tensor.sin.
- popxl.ops.slice(t, start=None, stop=None, step=None, axis=None)
Select elements from a tensor using a slice or multiple slices.
A slice specifies the start (inclusive) and stop (exclusive) index of elements to select. Multiple slices can be specified using a list of items for each parameter (start, stop, step). If step is -1, the slice is performed backwards.
If axis is not specified, each slice will correspond to dimensions 0 to N, where N is the number of slices.
Examples:
t == slice(t) == slice(t, axis=1)
slice(t, start=1)  # Slice axis 0 from start index 1
slice(t, start=[1, 2]) == slice(t, start=[1, 2], axis=[0, 1])
slice(t, stop=-2)  # Slice axis 0 up to second last element (exclusive)
slice(t, stop=3, step=-1)  # Slice backwards from last element (inclusive) to third last element (exclusive)
See also ONNX Slice.
- Parameters
t (Tensor) – Tensor to slice.
start (Optional[Union[int, List[Optional[int]]]]) – Index of first element (inclusive) or None which defaults to 0.
stop (Optional[Union[int, List[Optional[int]]]]) – Index of last element (exclusive) or None which defaults to the last element (inclusive) if step is forward, or the first element (inclusive) if step is backwards.
step (Optional[Union[int, List[Optional[int]]]]) – 1 for forward or -1 for backwards.
axis (Optional[Union[int, List[int]]]) – Axis of tensor to slice on, or None which defaults to each axis sequentially.
- Returns
A tensor containing the selected slices.
- Return type
- popxl.ops.slice_(t, start=None, stop=None, step=None, axis=None)
Select elements from a tensor, in place, using a slice or multiple slices.
This is the in-place version of slice(). The functionality is the same, but the tensor is sliced in place.
A slice specifies the start (inclusive) and stop (exclusive) index of elements to select. Multiple slices can be specified using a list of items for each parameter (start, stop, step). If step is -1, the slice is performed backwards.
If axis is not specified, each slice will correspond to dimensions 0 to N, where N is the number of slices.
- Parameters
t (Tensor) – Tensor to slice
start (Optional[Union[int, List[Optional[int]]]]) – Index of first element (inclusive) or None which defaults to 0.
stop (Optional[Union[int, List[Optional[int]]]]) – Index of last element (exclusive) or None which defaults to the last element (inclusive) if step is forward, or the first element (inclusive) if step is backwards.
step (Optional[Union[int, List[Optional[int]]]]) – 1 for forward or -1 for backwards.
axis (Optional[Union[int, List[int]]]) – Axis of tensor to slice on, or None which defaults to each axis sequentially.
- Returns
An alias of the input tensor containing the selected slices.
- Return type
- popxl.ops.softmax(t, axis)
Normalize the elements of a tensor along specified axes.
This rescales the slices of axis such that all elements are within the range [0, 1] and sum to 1. The output shape and dtype match the input.
See also ONNX Softmax.
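For example (illustrative), normalising scores over the class axis:
probs = ops.softmax(logits, axis=1)   # each row of probs sums to 1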
- popxl.ops.split(t, splits, axis=0)
Split a tensor along an axis into a list of tensors.
See also PyTorch Tensor.split, NumPy split, ONNX Split.
- Parameters
- Raises
ValueError – If the split doesn’t equally divide the tensor.
- Returns
A list of tensors.
- Return type
List[Tensor]
- popxl.ops.split_random_seed(seed, n=2)
Produce n random seeds from an initial seed.
Chaining calls to split_random_seed can be used to ensure unique random behaviour across a program. For example:
seed, s1 = ops.split_random_seed(seed)
y = ops.dropout(x, s1)
seed, s2 = ops.split_random_seed(seed)
z = ops.dropout(y, s2)
- popxl.ops.sqrt(t)
Compute the square root of the elements of a tensor.
If t is negative, then this will return NaN.
- popxl.ops.squeeze(t, axes=None)
Remove axes of length one from the tensor.
Takes an input axes with a list of axes to squeeze. If axes is not provided, all the single dimensions will be removed from the shape. If an axis is selected whose shape entry is not equal to one, an error is raised. Implemented using reshape under the hood.
See also PyTorch Tensor.squeeze, NumPy squeeze, ONNX Squeeze.
- Parameters
- Raises
ValueError – A ValueError is raised if the axes contain duplicates, or if an axis cannot be squeezed.
- Returns
The squeezed tensor.
- Return type
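For example (illustrative), for a tensor x of shape (1, 4, 1, 3):
y = ops.squeeze(x)            # shape (4, 3): all size-1 axes removed
z = ops.squeeze(x, axes=[0])  # shape (4, 1, 3): only axis 0 removed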
- popxl.ops.sub(lhs, rhs)
Subtract two tensors elementwise.
Follows NumPy broadcasting rules. Arguments must have the same dtype.
See also PyTorch Tensor.sub, ONNX Sub.
- popxl.ops.subsample(t, strides)
Subsample a tensor by selecting every nth element from each dimension. The subsample count N is provided for each dimension.
- Parameters
- Returns
A subsampled output tensor.
- Return type
- Raises
ValueError – Thrown if the length of the strides list is larger than the rank of the input tensor.
- popxl.ops.sum(t, axis=None, keepdims=False)
Sum elements over an axis.
See also PyTorch Tensor.sum, NumPy sum, ONNX Sum.
- Parameters
- Returns
The reduced tensor.
- Return type
- popxl.ops.sumsquare(t, axis=None, keepdims=False)
Compute the sum of the squares of tensor elements over an axis.
- Parameters
t (Tensor) – Tensor to compute the sum of squares from.
axis (int or list) – Axis or axes over which to compute the sum of squares. If none is provided all axes will be reduced. If axis is negative it counts from the last to the first axis.
keepdims (bool) – Keep the axis that is being reduced or not.
- Returns
The reduced tensor.
- Return type
- popxl.ops.swish(t)
Compute the Swish activation of a tensor.
For more details, refer to Rectifier (neural networks).
- popxl.ops.swish_(t)
Compute the Swish activation of a tensor in place.
For more details, refer to Rectifier (neural networks).
- popxl.ops.tanh(t)
Compute the hyperbolic tangent function elementwise on a tensor.
See also PyTorch Tensor.tanh, NumPy tanh, ONNX Tanh.
- popxl.ops.tied_gather(t, indices, axis=0, available_memory_proportion=None, zero_OOR=False)
Select multiple elements from an array.
Elements are specified by indices, along a specified axis. Equivalent to numpy.take(). Note that this is different from torch.gather().
Numerically the same as the gather op, but does not specify the tile layout of the indices tensor. When preceding a matmul op, the tile layout of the indices is determined by the matmul, not the tied_gather. This has a lower memory footprint but costs extra cycles due to the exchange.
Examples:
x = popxl.variable(np.arange(16).reshape(4, 4))
# [[ 0,  1,  2,  3],
#  [ 4,  5,  6,  7],
#  [ 8,  9, 10, 11],
#  [12, 13, 14, 15]]
gather(x, [3, 1, 2]) == Tensor([x[3], x[1], x[2]])
# [[12, 13, 14, 15],
#  [ 4,  5,  6,  7],
#  [ 8,  9, 10, 11]]
gather(x, [[0, 1], [1, 2]]) == gather(x, [0, 1, 1, 2]).reshape(2, 2, 4)
# [[[ 0,  1,  2,  3],
#   [ 4,  5,  6,  7]],
#  [[ 4,  5,  6,  7],
#   [ 8,  9, 10, 11]]]
- Parameters
t (Tensor) – The input tensor.
indices (Tensor) – The indices of the elements to extract.
axis (int) – The axis to gather on. The default is 0.
available_memory_proportion (Optional[float]) – The maximum proportion of available memory on each tile that this layer should consume temporarily during the course of the operation. Defaults to 1.0 if not set globally.
zero_OOR (bool) – If False, out of range (OOR) indices will produce garbage data. If True, OOR indices will produce zeros.
- Returns
The gathered elements concatenated.
- Return type
- popxl.ops.topk(t, k, axis, largest, sorted)
Retrieve the top-K largest or smallest elements along a specified axis.
See also PyTorch torch.topk, ONNX TopK.
- Parameters
- Returns
A tuple of output values and indices.
- Return type
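A short sketch (illustrative), taking the two largest values per row of a tensor x:
values, indices = ops.topk(x, k=2, axis=1, largest=True, sorted=True)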
- popxl.ops.transpose(t, permutation=None)
Permute the axes of a tensor.
By default this operation reverses the axes of t.
See also PyTorch Tensor.transpose, NumPy transpose, ONNX Transpose.
- popxl.ops.transpose_(t, permutation=None)
Permute the axes of a tensor in place.
By default this operation reverses the axes of t.
This is the in-place version of transpose(). The behaviour is the same, but it modifies the tensor in place.
See also PyTorch Tensor.transpose_.
- popxl.ops.where(condition, lhs, rhs)
Elementwise selection based on satisfying a condition.
Choose elements from lhs or rhs depending on whether the corresponding element in condition is satisfied or not. The operator supports multi-directional broadcasting (NumPy style).
See also PyTorch Tensor.where, NumPy where, ONNX Where.
- Parameters
- Returns
The tensor containing elementwise lhs if condition else rhs.
- Return type
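For example (illustrative; mask, x and y are assumed tensors in a graph context):
# Elementwise: keep x where mask is true, otherwise use y
out = ops.where(mask, x, y)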
- class popxl.ops.collectives.CommGroup
Class to specify sub-groups of replicas.
Examples of derived sub-groups:
- IPU-link domain sub-rack, where N is a power of two and replicaGroupSize > 1.
- Complete IPU-link domain / full rack.
- Using GW-links only.
- __init__(*args, **kwargs)
Overloaded function.
__init__(self: popart_internal_ir.CommGroup) -> None
__init__(self: popart_internal_ir.CommGroup, type: popart_internal_ir.CommGroupType, replicaGroupSize: int) -> None
__init__(self: popart_internal_ir.CommGroup, grouping: popart_internal_ir.ReplicaGrouping) -> None
- property replicaGroupSize
Replica group size.
- toReplicaGrouping(self: popart_internal_ir.CommGroup, numReplicas: int) -> popart_internal_ir.ReplicaGrouping
- property type
Replica group type.
- class popxl.ops.collectives.CommGroupType
PopART equivalent of GCL CommGroupType. Each of these enumeration constants have a corresponding GCL CommGroupType value.
Members:
All : All replicas viewed as one group; replica group size is ignored.
Consecutive : Groups are consecutive in replica.
If there are N replicas denoted {0, … N-1} and group size is k, then there are N/k groups of size k:
{0, 1, … k-1}, {k, … 2k-1} … {N-k-1, … N-1}
Orthogonal : Groups are sliced orthogonal to the replica ordering.
If there are N replicas denoted {0, … N-1} and group size is k, then there are m = N/k groups of size k:
{0, m, 2m, …}, {1, m+1, 2m+1, …} … {m-1, 2m-1, … N-1}
Ungrouped : Each replica is in its own group; replica group size is ignored.
- All = <CommGroupType.All: 0>
- Consecutive = <CommGroupType.Consecutive: 1>
- Orthogonal = <CommGroupType.Orthogonal: 2>
- Ungrouped = <CommGroupType.Ungrouped: 3>
- __init__(self: popart_internal_ir.CommGroupType, value: int) -> None
- property name
- property value
- popxl.ops.collectives.all_reduce(ts, ipus=None, op='add')
Allreduce tensors across IPUs within a replica.
Currently only the add reduce op is supported by autodiff.
- Parameters
- Returns
Output tensors. The data of each tensor is identical on the IPUs corresponding to ipus.
- Return type
List[Tensor]
- popxl.ops.collectives.all_reduce_identical_grad_inputs(ts, ipus=None, op='add')
Allreduce tensors across IPUs within a replica where the grad tensors of the corresponding grad op are identical.
This means that this op is an all-reduce and the corresponding grad op an identity.
Currently only the add reduce op is supported by autodiff.
The AllReduceToIdentityPattern pattern must be run for this op to function correctly.
- Parameters
- Returns
Output tensors. Each tensor's data is identical on the IPUs corresponding to ipus.
- Return type
List[Tensor]
- popxl.ops.collectives.all_reduce_identical_inputs(ts, ipus=None, op='add')
Allreduce tensors across IPUs within a replica where the input tensors are identical.
This means the op is an identity but the corresponding grad op is an allreduce.
Currently only the add reduce op is supported by autodiff.
The AllReduceToIdentityPattern pattern must be run for this op to function correctly.
- Parameters
- Returns
Output tensors. Each tensor's data is identical on the IPUs corresponding to ipus.
- Return type
List[Tensor]
- popxl.ops.collectives.replica_sharded_slice(t, group=None)
Take the replicated tensor sharded slice of a Tensor.
- popxl.ops.collectives.replicated_all_gather(t, axis=0, group=None, output_shape='auto')
Gather a tensor across replicas such that the output tensor contains the values of the tensor from each replica.
The shape of the output tensor is determined by the value of output_shape:
new_axis: the output shape is (group.size, *t.shape)
concat: the output shape has the same behaviour as concat on axis
meta_shape: the output shape is t.meta_shape
auto: if the input has a meta-shape, meta_shape is chosen, otherwise concat
This op is auto-differentiable and its corresponding grad op is a replicated_slice (except when output_shape == meta_shape).
- Parameters
t (Tensor) – Tensor to be gathered.
axis (int) – Axis to gather and concatenate values when using ‘concat’ mode
group (Optional[ReplicaGrouping]) – Replicas to gather from. Defaults to All replicas.
output_shape (str) – see above for details. Choose ‘new_axis’, ‘concat’, ‘meta_shape’ or ‘auto’.
- Returns
Gathered tensor.
- Return type
- Raises
ValueError – If output_shape is not one of ‘new_axis’, ‘concat’, ‘meta_shape’ or ‘auto’.
- popxl.ops.collectives.replicated_all_reduce(t, op='add', group=None)
Reduce a tensor across replicas.
- Parameters
t (Tensor) – Tensor to be reduced
op (str, optional) – Operation to reduce with. Defaults to ‘add’. Options: ‘add’, ‘mean’, ‘mul’, ‘min’, ‘max’, ‘and’, ‘or’, ‘square_add’.
group (Optional[ReplicaGrouping]) – Replicas to reduce across. Defaults to All replicas.
- Returns
Reduced tensor
- Return type
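A hedged sketch (illustrative; it assumes an Ir configured with multiple replicas and a tensor grad present on each replica):
grad_summed = ops.collectives.replicated_all_reduce(grad, op='add')
grad_mean = ops.collectives.replicated_all_reduce(grad, op='mean')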
- popxl.ops.collectives.replicated_all_reduce_(t, op='add', group=None)
Reduce tensor t across replicas, in place on t.
- Parameters
t (Tensor) – Tensor to be reduced
op (str, optional) – Operation to reduce with. Defaults to ‘add’. Options: ‘add’, ‘mean’, ‘mul’, ‘min’, ‘max’, ‘and’, ‘or’, ‘square_add’.
group (Optional[ReplicaGrouping]) – Replicas to reduce across. Defaults to All replicas.
- Returns
Reduced tensor
- Return type
- popxl.ops.collectives.replicated_reduce_scatter(t, op='add', group=None, configure_output_for_replicated_tensor_sharding=False)
Reduce a tensor across replicas with each replica receiving a unique slice of the tensor.
- Parameters
t (Tensor) – Tensor to be reduced. Inputs will be flattened.
op (str, optional) – Operation to reduce with. Defaults to ‘add’. Options: ‘add’, ‘mean’, ‘mul’, ‘min’, ‘max’, ‘and’, ‘or’, ‘square_add’.
group (Optional[CommGroup]) – Replicas to reduce across. Defaults to All replicas.
configure_output_for_replicated_tensor_sharding (Optional[bool]) – Configures the output to be a replica sharded tensor. Defaults to false. Replicated tensor sharded tensors do not follow the data element order of the original tensor, and can only be used in operations that belong to the same replicated tensor sharding group, where all tensor inputs follow the same data order.
- Returns
A slice of the reduced tensor. Always a 1D tensor.
- Return type
- popxl.ops.collectives.replicated_slice(t, axis=0, group=None)
Each replica takes an equal slice of t split along axis axis. For example, if t has shape (2,4), there are two replicas and axis == 0, the first replica will output [0:1, ...] and the second replica [1:2, ...].
This op is similar to replica_sharded_slice but differs in that it maintains the output shape and does not configure the output for replicated tensor sharding.
This op is auto-differentiable and its corresponding grad op is a replicated_all_gather.
- Parameters
t (Tensor) – Tensor to split.
axis (int) – Axis to slice along
group (Optional[ReplicaGrouping]) – Replica grouping that determines group of replicas
- Returns
A slice of the tensor.
- Return type
- Raises
ValueError – if the group size does not equally divide the axis size
- popxl.ops.var_updates.accumulate_(t, X, f=None)
Update (in-place) tensor t given updater values X and a factor f according to t = t + (f * X).
Does not apply NumPy broadcasting. Uses mixed precision PopLibs operations. t and X must have the same shape, but can be different types. f must be scalar.
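For example (illustrative), accumulating scaled gradients into an accumulator tensor in place:
ops.var_updates.accumulate_(accum, grad)          # accum += grad
ops.var_updates.accumulate_(accum, grad, f=0.5)   # accum += 0.5 * grad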
- popxl.ops.var_updates.accumulate_mean_(t, X, step)
Update (in-place) tensor t given updater values X and a step counter step according to t = (step/(step+1)) * t + (1/(step+1)) * X.
Intended to be used to keep track of the mean of a series of values.
For example:
with g:
    accum = popxl.variable(0, dtype=popxl.float32)
    a = popxl.variable(1, dtype=popxl.float32)
    b = popxl.variable(2, dtype=popxl.float32)
    accumulate_mean_(accum, a, 0.0)
    accumulate_mean_(accum, b, 1.0)
will result in accum having the value (a+b)/2 = 1.5.
Does not apply NumPy broadcasting. Uses mixed precision PopLibs operations. t and X must have the same shape, but can be different types. step must be scalar.
- popxl.ops.var_updates.accumulate_moving_average_(t, X, f)
Update (in-place) tensor t given updater values X and a factor f according to t = (f * t) + ((1-f) * X).
Does not apply NumPy broadcasting. Uses mixed precision PopLibs operations. t and X must have the same shape, but can be different types. f must be scalar.
- popxl.ops.var_updates.accumulate_moving_average_square_(t, X, f)
Update (in-place) tensor t given updater values X and a factor f according to t = (f * t) + ((1-f) * X^2).
Does not apply NumPy broadcasting. Uses mixed precision PopLibs operations. t and X must have the same shape, but can be different types. f must be scalar.
- popxl.ops.var_updates.accumulate_square_(t, X, f=1.0)
Update (in-place) tensor t given updater values X and a factor f according to t = t + (f * X^2).
Does not apply NumPy broadcasting. Uses mixed precision PopLibs operations. t and X must have the same shape, but can be different types. f must be scalar.
- popxl.ops.var_updates.accumulator_scale_(t, f)
Scale a tensor in-place.
This op will directly zero the input tensor if the factor is const and 0.
Does not apply NumPy broadcasting. Uses mixed precision PopLibs operations.
- Parameters
- Returns
An alias to the updated tensor.
- Return type
- popxl.ops.var_updates.accumulator_zero_(t)
Zero the input tensor.
This is an AccumulatorScaleOp with a factor of 0, and this zeroes the input tensor.
- popxl.ops.var_updates.adam_updater(acc_first_order, acc_second_order, weight=None, time_step=None, weight_decay=None, beta1=None, beta2=None, epsilon=1e-07)
Calculate an updater term to update the weights for Adam.
Accumulated bias corrected first order momentum (FP16/FP32) mc: mc = m / (1 - b1 ** t). Without correction: mc = m.
Accumulated bias corrected second order momentum (FP16/FP32) vc: vc = v / (1 - b2 ** t). Without correction: vc = v.
Updater term (FP16/FP32, with weight decay mode: decay > 0.0 and wd > 0.0) x: x = mc / (sqrt(vc) + eps) + wd * w
Updater term (FP16/FP32, without weight decay mode) x: x = mc / (sqrt(vc) + eps)
Note
time_step will be incremented by 1.
- Parameters
acc_first_order (Tensor) – First order momentum (m) (FP16/FP32).
acc_second_order (Tensor) – Second order momentum (v) (FP16/FP32).
weight (Optional[Tensor]) – Weight (w). Only required for weight_decay.
time_step (Optional[Tensor]) – Time step (t). Providing this tensor enables bias correction.
weight_decay (Optional[Union[float, Tensor]]) – Optional scalar to apply weight decay. Defaults to None.
beta1 (Optional[Union[float, Tensor]]) – Only required in bias correction for m. Defaults to None.
beta2 (Optional[Union[float, Tensor]]) – Only required in bias correction for v. Defaults to None.
epsilon (Union[float, Tensor]) – Scalar to calculate updater. Defaults to 1e-07.
- Raises
ValueError – If weight_decay is set and weight is None.
ValueError – If time_step is set to None and beta1 and beta2 are not set (no bias correction can take place).
- Returns
An updater to update the weight for Adam.
- Return type
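A hedged sketch of how the pieces above might be composed into one Adam step (this composition is an assumption for illustration, not the library's prescribed optimiser step; m, v, w, t_step and grad are assumed existing tensors in the graph):
# First and second order momentum updates (m = b1*m + (1-b1)*g, v = b2*v + (1-b2)*g^2)
ops.var_updates.accumulate_moving_average_(m, grad, f=0.9)
ops.var_updates.accumulate_moving_average_square_(v, grad, f=0.999)
# Bias-corrected updater term
x = ops.var_updates.adam_updater(m, v, weight=w, time_step=t_step,
                                 weight_decay=0.01, beta1=0.9, beta2=0.999)
# Apply the updater term to the weight, e.g. w = w - lr * x
ops.scaled_add_(w, x, b=-1e-3)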
- popxl.ops.var_updates.adam_var_update(t, x, r1, r2, learning_rate=None, max_weight_norm=None)
Calculate the updated weight tensor for Adam or LAMB.
x = updater term (see adam_updater())
lr = learning rate
max_weight_norm = max weight norm (c.f. \(\phi\) or scaling function in the Lamb paper)
r1 = (Lamb) L2 norm of the weight (w)
r2 = (Lamb) L2 norm of the updater term (x)
Lamb r1 (FP32): \(r1 = ||w||_2\) (without Lamb or \(\phi (r1) == 0: r1/r2 = 1\))
Special case: replicated weight sharding; every replica only stores a shard of w, therefore the sum-of-squares is computed replicated, and thereafter all-reduced before every replica takes the square root of r1sq.
Lamb r2 (FP32): \(r2 = ||x||_2\) (without Lamb or \(r2 == 0: r1/r2 = 1\))
Special case: replicated weight sharding; every replica only stores a shard of x, therefore the sum-of-squares is computed replicated, and thereafter all-reduced before every replica takes the square root of r2sq.
Scale factor: \(\phi (r1) = min(r1, max\_weight\_norm)\)
Variable update: \(w -= (\phi (r1) / r2) * lr * x\), where \(\phi (r1) / r2\) is the Lamb trust ratio.
- Parameters
t (Tensor) – The weight to update.
x (Tensor) – The updater term.
r1 (Tensor) – The r1 squared input tensor.
r2 (Tensor) – The r2 squared input tensor.
learning_rate (Optional[Union[float, Tensor]]) – Optional learning rate tensor to use. Will be constant if this argument is a float or None. Defaults to None.
max_weight_norm (Optional[Union[float, Tensor]]) – Optional max weight tensor to use. Will be constant if this argument is a float or None. Defaults to None.
- Returns
The updated weight tensor.
- Return type
- popxl.ops.var_updates.adamax_updater(acc_first_order, acc_second_order, weight=None, time_step=None, weight_decay=None, beta1=0.9, epsilon=1e-07)
Calculate an updater term to update the weights for Adamax.
Accumulated bias corrected first order momentum (FP16/FP32) mc: mc = m / (1 - b1 ** t)
Updater term (FP16/FP32, with weight decay mode: decay > 0.0 and wd > 0.0) x: x = mc / (vc + eps) + wd * w
Updater term (FP16/FP32, without weight decay mode) x: x = mc / (vc + eps)
Note
time_step will be incremented by 1.
- Parameters
acc_first_order (Tensor) – First order momentum (FP16/FP32) (m).
acc_second_order (Tensor) – Second order momentum (FP16/FP32) (v).
weight (Optional[Tensor]) – Weight (w). Only required for weight_decay.
time_step (Tensor) – Time step (t).
weight_decay (Optional[Union[float, Tensor]]) – Optional scalar to apply weight decay. Defaults to None.
beta1 (Union[float, Tensor]) – Scalar to do bias correction for m. Defaults to 0.9.
epsilon (Union[float, Tensor]) – Scalar to calculate updater. Defaults to 1e-07.
- Raises
ValueError – If weight_decay is set and weight is None.
ValueError – If time_step is None.
- Returns
An updater to update the weight for Adamax.
- Return type
- popxl.ops.var_updates.copy_var_update_(t, X)
Update a tensor in-place by copying the tensor containing the updater values.
- popxl.ops.var_updates.lamb_updater(acc_first_order, acc_second_order, weight=None, time_step=None, weight_decay=None, beta1=None, beta2=None, epsilon=1e-07)
Calculate an updater term to update the weights for LAMB.
Accumulated bias corrected first order momentum (FP16/FP32) mc: mc = m / (1 - b1 ** t) (without correction: mc = m)
Accumulated bias corrected second order momentum (FP16/FP32) vc: vc = v / (1 - b2 ** t) (without correction: vc = v)
Updater term (FP16/FP32, with weight decay mode: decay > 0.0 and wd > 0.0) x: x = mc / (sqrt(vc) + eps) + wd * w
Updater term (FP16/FP32, without weight decay mode) x: x = mc / (sqrt(vc) + eps)
Note
time_step will be incremented by 1.
- Parameters
acc_first_order (Tensor) – First order momentum (FP16/FP32) (m).
acc_second_order (Tensor) – Second order momentum (FP16/FP32) (v).
weight (Optional[Tensor], optional) – Weight (w). Only required for weight_decay. Defaults to None.
time_step (Optional[Tensor], optional) – Time step (t). Providing this tensor enables bias correction. Defaults to None.
weight_decay (Optional[Union[float, Tensor]], optional) – Optional scalar to apply weight decay. Defaults to None.
beta1 (Optional[Union[float, Tensor]], optional) – Only required in bias correction for m. Defaults to None.
beta2 (Optional[Union[float, Tensor]], optional) – Only required in bias correction for v. Defaults to None.
epsilon (Union[float, Tensor], optional) – Scalar to calculate updater. Defaults to 1e-07.
- Raises
ValueError – If weight_decay is set and weight is None.
ValueError – If time_step is set to None and beta1 and beta2 are not set (no bias correction can take place).
- Returns
An updater to update the weight for LAMB.
- Return type
- popxl.ops.var_updates.sparse_accumulate_(t, X, indices, axis=0, f=None, W=None)
Apply a sparse accumulate operation to a tensor.
Does not apply NumPy broadcasting. Uses mixed precision PopLibs operations. t and X must have the same shape, but can be different types.
Detail:
Assume you have:
w -> Gather -> x
and when the optimiser step is grown:
dW <- GatherGrad <- x
dW, accum -> Accumulate -> accum'
GatherGrad is essentially a scatter operation. Then we Accumulate the resultant dW on accum. This involves creating an extra dW tensor, so we can do the following instead:
x, accum -> SparseAccumulate -> accum'
SparseAccumulate can accumulate the slices of x into accum as required, in one operation, without requiring extra memory.
When calling this op, the input tensor W is an optional input. This can be used when two different views of the weight are consumed in the forward pass, and one of those ops is a Gather, thus requiring a SparseAccumulate in the weight update step.
We connect the op to the other view of the weight instead of the view this SparseAccumulate is for. Then, the lowering will clone that tensor (and its layout) when creating accum.
- Parameters
t (Tensor) – Tensor to be updated.
X (Tensor) – Value to update the tensor with.
indices (Tensor) – The indices of the scatter operation.
axis (int, optional) – Which axis to set on. Default is 0.
f (Optional[Union[float, Tensor]], optional) – Optional scalar to apply to update before the addition. Defaults to None.
W (Optional[Tensor], optional) – Tile mapping reference tensor for t to be cloned from.
- Returns
An alias to the updated tensor.
- Return type