19.15. Ops available in PopXL

class popxl.ops.CallSiteInfo(subgraph_op)

Information relating to a parent graph calling a subgraph, for example using a call op or repeat op.

This is a convenience class for extracting information about the callsite and its subgraph.

Parameters

subgraph_op (Union[CallOp, LoopOp]) –

__init__(subgraph_op)
Parameters

subgraph_op (Union[CallOp, LoopOp]) –

property called_graph: popxl.graph.Graph

Get the called graph.

graph_to_parent(graph_tensor)

Get the tensor in the parent graph using the tensor in the called graph.

Both input and output tensors can be used.

Parameters

graph_tensor (Tensor) – The tensor in the called graph.

Raises

ValueError – If graph_tensor is not an input or output of the called graph.

Returns

The associated input or output tensor on the CallOp.

Return type

Tensor

graph_to_parent_input_index(idx)

Get the parent graph input tensor index given the graph input tensor index.

Parameters

idx (int) –

Return type

int

graph_to_parent_output_index(idx)

Get the parent graph output tensor index given the graph output tensor index.

Parameters

idx (int) –

Return type

int

property inputs: Tuple[popxl.tensor.Tensor, ...]

Get the parent graph inputs.

Returns

Tuple[Tensor, …]

property outputs: Tuple[popxl.tensor.Tensor, ...]

Get the parent graph outputs.

Returns

Tuple[Tensor, …]

parent_input(idx)

Get the parent graph input tensor at a given index.

Parameters

idx (int) –

Return type

Tensor

parent_output(idx)

Get the parent graph output tensor at a given index.

Parameters

idx (int) –

Return type

Tensor

parent_to_graph(parent_tensor)

Get the input tensor in the called graph using the input tensor in the parent graph.

If the parent_tensor has been used multiple times as an input, only the first instance is returned.

Parameters

parent_tensor (Tensor) – The tensor from the parent graph.

Raises

ValueError – If parent_tensor is not an input to the CallOp.

Returns

The tensor in the called_graph.

Return type

Tensor

parent_to_graph_input_index(idx)

Get the graph input tensor index given the parent graph input tensor index.

Parameters

idx (int) –

Return type

int

parent_to_graph_output_index(idx)

Get the graph output tensor index given the parent graph output tensor index.

Parameters

idx (int) –

Return type

int

set_parent_input_modified(parent_tensor, infer_modified_regions=False)

Specify that the parent graph’s input tensor parent_tensor is modified by the call op.

This will guarantee that any modification to the graph input during the execution of the called graph will also change parent_tensor.

Parameters
  • parent_tensor (Tensor) – Input tensor in parent graph to be modified.

  • infer_modified_regions (bool) – Whether or not to infer the modified regions.

  • op_tensor (Tensor) – Tensor to be modified.

  • infer_modified_regions – Set the modified regions from the Ops in the called graph.
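
For illustration, a minimal sketch of inspecting a call site via call_with_info (the subgraph body and tensor names here are hypothetical):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()

def double(x: popxl.Tensor) -> popxl.Tensor:
    return x + x

with ir.main_graph:
    a = popxl.variable(np.ones((2, 2), np.float32), name="a")
    g = ir.create_graph(double, a)    # build the subgraph once
    info = ops.call_with_info(g, a)   # call it, keeping the call site info

    sub_in = info.parent_to_graph(a)  # graph input corresponding to `a`
    out = info.parent_output(0)       # the call's output in the parent graph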

popxl.ops.abs(t)

Compute the absolute value of each element of the input tensor.

See also PyTorch Tensor.abs.

Parameters

t (Tensor) – Input tensor

Returns

Output tensor

Return type

Tensor

popxl.ops.add(lhs, rhs)

Add two tensors elementwise.

Follows NumPy broadcasting rules. Arguments must have the same dtype.

See also PyTorch Tensor.add, NumPy add, ONNX Add.

Parameters
  • lhs (Tensor) – Tensor to be added.

  • rhs (Tensor) – Tensor to be added.

Returns

The sum of the input tensors.

Return type

Tensor

popxl.ops.add_(lhs, rhs)

Add two tensors elementwise in place, in the lhs tensor. Follows NumPy broadcasting rules. Arguments must have the same dtype.

Note: There is no operation that adds to the rhs tensor in place. Use add_(rhs, lhs) or rhs += lhs for the same functionality.

See also PyTorch Tensor.add_.

Parameters
  • lhs (Tensor) – Tensor to be added.

  • rhs (Tensor) – Tensor to be added.

Returns

The lhs tensor with rhs added in place.

Return type

Tensor
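
For example, a minimal sketch contrasting the out-of-place and in-place variants (assuming the usual popxl.Ir() setup; the values are arbitrary):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    a = popxl.variable(np.ones((2,), np.float32), name="a")
    b = popxl.constant(np.full((2,), 3.0, dtype=np.float32))
    c = ops.add(a, b)    # new tensor holding a + b
    a2 = ops.add_(a, b)  # writes a + b into `a`; returns an alias of `a`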

popxl.ops.argmax(t, dim=0, keepdim=False)

Compute the argmax of a tensor.

Compute the indices of the maximum elements of the input tensor along the provided axis. The resulting tensor has the same rank as the input if keepdim is True. If keepdim is False, the reduced dimension is pruned.

See also PyTorch Tensor.argmax, NumPy argmax, ONNX ArgMax.

Parameters
  • t (Tensor) – Input data tensor.

  • dim (int) – The axis in which to compute the arg indices.

  • keepdim (bool) – Whether to keep the reduced dimension. If True, the reduced dimension is retained with size 1.

Returns

The indices of the maximum values of a tensor across a dimension.

Return type

Tensor

popxl.ops.argmin(t, dim=0, keepdim=False)

Compute the argmin of a tensor.

Compute the indices of the minimum elements of the input tensor along the provided axis. The resulting tensor has the same rank as the input if keepdim is True. If keepdim is False, the reduced dimension is pruned.

See also PyTorch Tensor.argmin, NumPy argmin, ONNX ArgMin.

Parameters
  • t (Tensor) – Input data tensor.

  • dim (int) – The axis in which to compute the arg indices.

  • keepdim (bool) – Whether to keep the reduced dimension. If True, the reduced dimension is retained with size 1.

Returns

The indices of the minimum values of a tensor across a dimension.

Return type

Tensor
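
A small sketch of both reductions; the values in the comments follow from the example data:

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    x = popxl.variable(np.array([[1, 9, 3], [7, 2, 5]], np.float32))
    hi = ops.argmax(x, dim=1)                # shape (2,): [1, 0]
    lo = ops.argmin(x, dim=1, keepdim=True)  # shape (2, 1): [[0], [1]]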

popxl.ops.average_pool(t, kernel_size, stride=None, padding=None, out_pads=None, dilation=None, in_dilations=None, auto_pad='not_set', ceil_mode=False)

Average pool a tensor.

average_pool consumes an input tensor t and applies average pooling across it according to the kernel size, stride and padding. Average pooling consists of computing the average over all values in a subset of the input tensor according to the kernel size, and downsampling the data into the output tensor Y for further processing.

Parameters
  • t (Tensor) –

    Input data tensor from previous layer.

    • If the input is a 3D tensor, the size is (N, C, L), where:

      • N is the batch size,

      • C is the number of channels,

      • L is the length.

    • If the input is a 2D image, the size is (N, C, H, W), where:

      • N is the batch size,

      • C is the number of channels,

      • H and W are the height and width.

    • If the input is a 3D image, the size is (N, C, D, H, W), where:

      • N is the batch size,

      • C is the number of channels,

      • D is the depth,

      • H and W are the height and width.

  • kernel_size (Tuple[int]) – The size of the kernel along each axis.

  • stride (Tuple[int]) – Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.

  • padding (Tuple[int]) –

    Padding for the beginning and ending along each spatial axis, it can take any value greater than or equal to 0.

The value represents the number of pixels added to the beginning and end of the corresponding axis. The padding format should be as follows: [x1_begin, x2_begin…x1_end, x2_end,…], where xi_begin is the number of pixels added at the beginning of axis i and xi_end is the number of pixels added at the end of axis i.

  • out_pads (Tuple[int]) – The output padding for pooling.

  • dilation (Tuple[int]) – dilation value along each spatial axis of the filter.

  • in_dilations (Tuple[int]) – The input dilations attributes along each spatial axis of the filter.

  • auto_pad (Literal) – auto_pad must be either “not_set”, “same_upper”, “same_lower” or “valid”. The default value is “not_set”, which means explicit padding is used. “same_upper” or “same_lower” mean pad the input so that output_shape[i] = ceil(input_shape[i] / strides[i]) for each axis i. The padding is split between the two sides equally or almost equally (depending on whether it is even or odd). In the case that the padding is an odd number, the extra padding is added at the end for “same_upper” and at the beginning for “same_lower”.

  • ceil_mode (bool) – When True, will use ceil instead of floor to compute the output shape.

Returns

Output data tensor from average pooling across the input tensor. Dimensions will vary based on various kernel, stride, and pad sizes. Floor value of the dimension is used.

Return type

Tensor
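
For example, a 2x2 pooling window with stride 2 halves each spatial dimension (a sketch; the input values are arbitrary):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    x = popxl.variable(np.arange(16, dtype=np.float32).reshape(1, 1, 4, 4))
    y = ops.average_pool(x, kernel_size=(2, 2), stride=(2, 2))
    # y has shape (1, 1, 2, 2); each element is the mean of a 2x2 patch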

popxl.ops.batch_norm_inference(t, scale, bias, mean, var, epsilon=1e-05, momentum=0.9)

Apply batch normalisation to a tensor in an inference setting.

For more details, refer to the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.

Parameters
  • t (Tensor) – Tensor to be normalized.

  • scale (Tensor) – Tensor used to scale the result of normalisation.

  • bias (Tensor) – Tensor used to shift the result of normalisation.

  • mean (Tensor) – Mean estimate.

  • var (Tensor) – Variance estimate.

  • epsilon (float) – A small value used to avoid division by zero when the variance is zero.

  • momentum (float) – Coefficient for the exponential moving average (not used in inference).

Returns

The batch normalised tensor.

Return type

Tensor

popxl.ops.call(graph, *inputs, inputs_dict=None)

Call a graph.

The inputs and inputs_dict tensors are passed as graph inputs. You can specify an input either positionally using inputs or via a tensor map using inputs_dict.

Graph inputs are determined when the graph was created using ir.create_graph(callable, ...). The order of inputs will be the same as the order of the tensor inputs in the function signature and the order of the popxl.graph_input calls.

See create_graph() for more information.

Parameters
  • graph (Graph) – The graph to call.

  • inputs (Tensor) – Parent tensors passed positionally as graph inputs.

  • inputs_dict (Optional[Mapping[Tensor, Tensor]]) – Mapping of graph tensor -> parent tensor.
Returns

Tuple of the output tensors of the call in the parent graph.

Return type

Tuple[Tensor, …]
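
A minimal sketch of creating and calling a graph (the subgraph function add_one is hypothetical):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()

def add_one(x: popxl.Tensor) -> popxl.Tensor:
    return x + 1

with ir.main_graph:
    a = popxl.variable(np.zeros((2,), np.float32), name="a")
    g = ir.create_graph(add_one, a)  # graph inputs mirror the signature
    (out,) = ops.call(g, a)          # positional parent input; tuple of outputs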

popxl.ops.call_with_info(graph, *inputs, inputs_dict=None, check_inputs=True)

Call a graph and return information about the call site.

The inputs and inputs_dict tensors are passed as graph inputs. You can specify an input either positionally using inputs or via a tensor map using inputs_dict. This op returns CallSiteInfo that can be used to inspect call site inputs/outputs.

Graph inputs are determined when the graph was created using ir.create_graph(callable, ...).

The order of inputs will be the same as the order of the tensor inputs in the function signature and the order of the popxl.graph_input calls.

See create_graph() for more information.

Parameters
  • graph (Graph) – The graph to call.

  • inputs (Tensor) – Parent tensors passed positionally as graph inputs.

  • inputs_dict (Optional[Mapping[Tensor, Tensor]]) – Mapping of graph tensor -> parent tensor.

  • check_inputs (bool) – Check when called if all inputs have been provided. Defaults to True.

Raises
  • ValueError – If an incorrect number of inputs have been provided, a parent input tensor is not in the parent graph, or a graph input tensor is specified twice.

  • TypeError – If a graph input tensor is specified twice, or a graph input cannot be coerced into a tensor.

Returns

Information on the created callsite.

Return type

CallSiteInfo

popxl.ops.cast(t, data_type)

Cast a tensor to a specific data type.

This operation casts tensor t to data type data_type.

See also ONNX Cast.

Parameters
  • t (Tensor) – The tensor to be cast.

  • data_type (dtype) – The dtype to cast to.
Raises

TypeError – If data_type is of type float8_143 or float8_152.

Returns

The tensor cast to the specified type.

Return type

Tensor
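
For example (a sketch; any non-float8 dtype pair works the same way):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    x = popxl.variable(np.arange(4, dtype=np.float32))
    xi = ops.cast(x, popxl.int32)  # float32 -> int32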

popxl.ops.ceil(t)

Compute the ceil of the elements of the input tensor. NaN values are propagated.

Parameters

t (Tensor) – Input tensor.

Returns

The ceil of the input tensor.

Return type

Tensor

popxl.ops.clamp(t, min=-inf, max=inf)

Clip all elements so they are within the range [min, max]. NaN values are propagated.

Parameters
  • t (Tensor) – Input tensor.

  • min (float) – Minimum value of the output range.

  • max (float) – Maximum value of the output range.

Returns

The clipped tensor.

Return type

Tensor

popxl.ops.clip(t, min=-inf, max=inf)

Clip all elements so they are within the range [min, max]. NaN values are propagated.

Parameters
  • t (Tensor) – Input tensor.

  • min (float) – Minimum value of the output range.

  • max (float) – Maximum value of the output range.

Returns

The clipped tensor.

Return type

Tensor

popxl.ops.concat(ts, axis=0)

Concatenate tensors along an axis. The result will be copied to a new tensor.

See also ONNX Concat.

Parameters
  • ts (Iterable[Tensor]) – Tensors to be concatenated.

  • axis (int) – Axis of ts to concatenate along.

Returns

The concatenated tensors.

Return type

Tensor

popxl.ops.concat_(ts, axis=0)

Concatenate tensors along an axis.

The result will alias the input tensors.

Parameters
  • ts (Iterable[Tensor]) – Tensors to be concatenated.

  • axis (int) – Axis of ts to concatenate along.

Returns

The concatenated tensors.

Return type

Tensor
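
A short sketch contrasting the copying and aliasing variants:

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    a = popxl.variable(np.zeros((2, 3), np.float32))
    b = popxl.variable(np.ones((2, 3), np.float32))
    c = ops.concat((a, b), axis=0)   # shape (4, 3); result is a copy
    d = ops.concat_((a, b), axis=1)  # shape (2, 6); result aliases a and b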

popxl.ops.conditional(cond, then_branch, else_branch, then_inputs=None, else_inputs=None, then_inputs_dict=None, else_inputs_dict=None)

Execute then_branch or else_branch according to the value of tensor cond at runtime.

The then/else_inputs and then/else_inputs_dict tensors are passed as then/else_branch inputs. You can specify a then/else_input either positionally using then/else_inputs or via a tensor map using then/else_inputs_dict.

Graph inputs are determined when the graph was created using ir.create_graph(callable, ...).

The order of inputs will be the same as the order of the tensor inputs in the function signature and the order of the popxl.graph_input calls.

See create_graph() for more information.

Parameters
  • cond (Tensor) – A boolean single-value tensor. If true the then_branch is executed otherwise the else_branch is executed.

  • then_branch (Graph) – Graph to run if condition is true.

  • else_branch (Graph) – Graph to run if condition is false.

  • then_inputs (Optional[Iterable[Union[Tensor, Iterable[Tensor]]]]) – Provide inputs to then_branch via position, then_inputs follow the same rules as inputs in call and repeat op.

  • else_inputs (Optional[Iterable[Union[Tensor, Iterable[Tensor]]]]) – Provide inputs to else_branch via position, else_inputs follow the same rules as inputs in call and repeat op.

  • then_inputs_dict (Optional[Mapping[Tensor, Tensor]]) – Provide inputs to then_branch via a tensor map. Mapping of graph tensor -> parent tensor, then_inputs_dict follow the same rules as inputs_dict in call and repeat op.

  • else_inputs_dict (Optional[Mapping[Tensor, Tensor]]) – Provide inputs to else_branch via a tensor map. Mapping of graph tensor -> parent tensor; else_inputs_dict follows the same rules as inputs_dict in call and repeat op.

Raises
  • ValueError – If an incorrect number of inputs have been provided, a parent input tensor is not in the parent graph, or a graph input tensor is specified twice.

  • TypeError – If a graph input tensor is specified twice, or a graph input cannot be coerced into a tensor.

Returns

The values that are live after the execution of the conditional. The return values in then_branch and else_branch must be of the same data type. The number of the outputs in then_branch and else_branch must be equal. The shape of the input and outputs in then_branch and else_branch must also be the same.

Return type

Tuple[Tensor, …]
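
A minimal sketch of a runtime branch (assuming both branch graphs take a single input; the branch bodies are hypothetical):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()

def then_fn(x: popxl.Tensor) -> popxl.Tensor:
    return x + 1

def else_fn(x: popxl.Tensor) -> popxl.Tensor:
    return x - 1

with ir.main_graph:
    x = popxl.variable(np.zeros((2,), np.float32))
    cond = popxl.variable(np.array(True))  # boolean single-value tensor
    then_g = ir.create_graph(then_fn, x)
    else_g = ir.create_graph(else_fn, x)
    (out,) = ops.conditional(cond, then_g, else_g,
                             then_inputs=[x], else_inputs=[x])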

popxl.ops.conditional_with_info(cond, then_branch, else_branch, then_inputs=None, else_inputs=None, then_inputs_dict=None, else_inputs_dict=None, check_inputs=True)

Execute then_branch or else_branch according to the value of tensor cond at runtime and return the call site info.

The then/else_inputs and then/else_inputs_dict tensors are passed as then/else_branch inputs. You can specify a then/else_input either positionally using then/else_inputs or via a tensor map using then/else_inputs_dict.

Graph inputs are determined when the graph was created using ir.create_graph(callable, ...).

The order of inputs will be the same as the order of the tensor inputs in the function signature and the order of the popxl.graph_input calls.

See create_graph() for more information.

Parameters
  • cond (Tensor) – A boolean single-value tensor. If true the then_branch is executed otherwise the else_branch is executed.

  • then_branch (Graph) – Graph to run if condition is true.

  • else_branch (Graph) – Graph to run if condition is false.

  • then_inputs (Optional[Iterable[Union[Tensor, Iterable[Tensor]]]]) – Provide inputs to then_branch via position, then_inputs follow the same rules as inputs in call and repeat op.

  • else_inputs (Optional[Iterable[Union[Tensor, Iterable[Tensor]]]]) – Provide inputs to else_branch via position, else_inputs follow the same rules as inputs in call and repeat op.

  • then_inputs_dict (Optional[Mapping[Tensor, Tensor]]) – Provide inputs to then_branch via a tensor map. Mapping of graph tensor -> parent tensor, then_inputs_dict follow the same rules as inputs_dict in call and repeat op.

  • else_inputs_dict (Optional[Mapping[Tensor, Tensor]]) – Provide inputs to else_branch via a tensor map. Mapping of graph tensor -> parent tensor; else_inputs_dict follows the same rules as inputs_dict in call and repeat op.

  • check_inputs (bool) – Check when called if all inputs have been provided to both graphs. Defaults to True.

Raises
  • ValueError – If an incorrect number of inputs have been provided, a parent input tensor is not in the parent graph, or a graph input tensor is specified twice.

  • TypeError – If a graph input tensor is specified twice, or a graph input cannot be coerced into a tensor.

Returns

Information on the created conditional site.

Return type

ConditionalSiteInfo

popxl.ops.conv(t, weight, stride=(1, 1), padding=(0, 0, 0, 0), dilation=(1, 1), groups=1, pad_type='not_set', available_memory_proportions=None, partials_types=None, enable_conv_dithering=None)

Use the convolution operator on a tensor.

The convolution operator consumes an input tensor and a filter, and computes the output.

Parameters
  • t (Tensor) – Input data tensor from previous layer. If the input is a 3D tensor, the size is (N, C, L), where N is the batch size, C is the number of channels and L is the length. If the input is a 2D image, the size is (N, C, H, W), where N is the batch size, C is the number of channels, and H and W are the height and width. If the input is a 3D image, the size is (N, C, D, H, W), where N is the batch size, C is the number of channels, D is the depth, and H and W are the height and width.

  • weight (Tensor) – The weight tensor that will be used in the convolutions; If the input is a 3D tensor, the weight size is (M, C/group, k), where C is the number of channels, k is the length of the kernel, M is the number of feature maps. If the input is a 2D image, the weight size is (M, C/group, kH, kW), where C is the number of channels, kH and kW are the height and width of the kernel, M is the number of feature maps. If the input is a 3D image, the weight size is (M, C/group, kD, kH, kW), where C is the number of channels, kD, kH and kW are the depth, height and width of the kernel, M is the number of feature maps.

  • stride (Tuple[int]) – Stride along each spatial axis.

  • padding (Tuple[int]) – Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. The value represents the number of pixels added to the beginning and end of the corresponding axis. The padding format should be as follows: [x1_begin, x2_begin…x1_end, x2_end,…], where xi_begin is the number of pixels added at the beginning of axis i and xi_end is the number of pixels added at the end of axis i.

  • dilation (Tuple[int]) – dilation value along each spatial axis of the filter.

  • groups (int) – Number of groups that input channels and output channels are divided into. Default is 1.

  • pad_type (PadType(default is not_set)) – pad_type must be either “not_set”, “same_upper”, “same_lower” or “valid”. The default value is “not_set”, which means explicit padding is used. “same_upper” or “same_lower” mean pad the input so that output_shape[i] = ceil(input_shape[i] / strides[i]) for each axis i. The padding is split between the two sides equally or almost equally (depending on whether it is even or odd). In the case that the padding is an odd number, the extra padding is added at the end for “same_upper” and at the beginning for “same_lower”.

  • available_memory_proportions (List[float]) – The available memory proportion per convolution, each in the range [0, 1).

  • partials_types (List[str]) – The partials type per convolution, choose between half and float.

  • enable_conv_dithering (List[int]) – Enable convolution dithering per convolution. If true, then convolutions with different parameters will be laid out from different tiles in an effort to improve tile balance in models.

Returns

A tensor that contains the result of the convolution. The output dimensions are functions of the kernel size, stride size, and pad lengths.

Return type

Tensor
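
For example, a 3x3 convolution with symmetric padding preserves the spatial size (a sketch with arbitrary weights):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    x = popxl.variable(np.ones((1, 4, 8, 8), np.float32))  # (N, C, H, W)
    w = popxl.variable(np.ones((8, 4, 3, 3), np.float32))  # (M, C/group, kH, kW)
    y = ops.conv(x, w, stride=(1, 1), padding=(1, 1, 1, 1))
    # y has shape (1, 8, 8, 8)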

popxl.ops.conv_pow2scaled(t, weight, log2_scale, stride=(1, 1), padding=(0, 0, 0, 0), dilation=(1, 1), groups=1, pad_type='not_set', available_memory_proportions=None, enable_conv_dithering=None)

Perform a scaled convolution on a float8 tensor.

The convolution operator consumes an input tensor, a filter and computes the output. The dtype of the input tensor and filter must be one of popxl.float8_143 or popxl.float8_152.

The result of the convolution is scaled by pow2(log2_scale) before it is converted to float16.

The log2_scale must be a scalar tensor of type popxl.int32 and contain a runtime value in the range [-32, 32).

Parameters
  • t (Tensor) – Input data tensor from previous layer, of type either popxl.float8_143 or popxl.float8_152. If the input is a 3D tensor, the size is (N, C, L), where N is the batch size, C is the number of channels and L is the length. If the input is a 2D image, the size is (N, C, H, W), where N is the batch size, C is the number of channels, and H and W are the height and width. If the input is a 3D image, the size is (N, C, D, H, W), where N is the batch size, C is the number of channels, D is the depth, and H and W are the height and width.

  • weight (Tensor) – The weight tensor that will be used in the convolutions of type either popxl.float8_143 or popxl.float8_152; If the input is a 3D tensor, the weight size is (M, C/group, k), where C is the number of channels, k is the length of the kernel, M is the number of feature maps. If the input is a 2D image, the weight size is (M, C/group, kH, kW), where C is the number of channels, kH and kW are the height and width of the kernel, M is the number of feature maps. If the input is a 3D image, the weight size is (M, C/group, kD, kH, kW), where C is the number of channels, kD, kH and kW are the depth, height and width of the kernel, M is the number of feature maps.

  • log2_scale (Tensor) – 32-bit integer power-of-two exponent, where the convolution output is multiplied by pow2(log2_scale) before conversion to float16.

  • stride (Tuple[int]) – Stride along each spatial axis.

  • padding (Tuple[int]) – Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. The value represents the number of pixels added to the beginning and end of the corresponding axis. The padding format should be as follows: [x1_begin, x2_begin…x1_end, x2_end,…], where xi_begin is the number of pixels added at the beginning of axis i and xi_end is the number of pixels added at the end of axis i.

  • dilation (Tuple[int]) – dilation value along each spatial axis of the filter.

  • groups (int) – Number of groups that input channels and output channels are divided into. Default is 1.

  • pad_type (PadType(default is not_set)) – pad_type must be either “not_set”, “same_upper”, “same_lower” or “valid”. The default value is “not_set”, which means explicit padding is used. “same_upper” or “same_lower” mean pad the input so that output_shape[i] = ceil(input_shape[i] / strides[i]) for each axis i. The padding is split between the two sides equally or almost equally (depending on whether it is even or odd). In the case that the padding is an odd number, the extra padding is added at the end for “same_upper” and at the beginning for “same_lower”.

  • available_memory_proportions (List[float]) – The available memory proportion per convolution, each in the range [0, 1).

  • enable_conv_dithering (List[int]) – Enable convolution dithering per convolution. If true, then convolutions with different parameters will be laid out from different tiles in an effort to improve tile balance in models.

Returns

A tensor that contains the result of the convolution of type popxl.float16. The output dimensions are functions of the kernel size, stride size, and pad lengths.

Return type

Tensor

Raises
  • TypeError – If the input or weight tensors do not have a dtype in {popxl.float8_143, popxl.float8_152}, or if the log2_scale tensor does not have dtype popxl.int32.

  • ValueError – If log2_scale is not a scalar tensor.

popxl.ops.conv_transpose(t, weight, stride=(1, 1), padding=(0, 0, 0, 0), dilation=(1, 1), groups=1, pad_type='not_set', output_padding=(), output_shape=(), available_memory_proportions=None, partials_types=None, enable_conv_dithering=None)

Perform a convolution transpose operation on a tensor.

The convolution transpose operator consumes an input tensor and a filter, and computes the output.

If the padding parameter is provided, the shape of the output is automatically generated. Alternatively, output_shape can be specified explicitly, in which case the padding values are automatically generated. See the parameter descriptions for more details.

See also PyTorch Tensor.ConvTranspose2d, ONNX ConvTranspose.

Parameters
  • t (Tensor) – Input data tensor from a previous layer. If the input is a 3D tensor, the size is (N, C, L), where N is the batch size, C is the number of channels and L is the length. If the input is a 2D image, the size is (N, C, H, W), where N is the batch size, C is the number of channels, and H and W are the height and width. If the input is a 3D image, the size is (N, C, D, H, W), where N is the batch size, C is the number of channels, D is the depth, and H and W are the height and width.

  • weight (Tensor) – The weight tensor that will be used in the convolutions. If the input is a 3D tensor, the weight size is (M, C/group, k), where C is the number of channels, k is the length of the kernel and M is the number of feature maps. If the input is a 2D image, the weight size is (M, C/group, kH, kW), where C is the number of channels, kH and kW are the height and width of the kernel, and M is the number of feature maps. If the input is a 3D image, the weight size is (M, C/group, kD, kH, kW), where C is the number of channels, kD, kH and kW are the depth, height and width of the kernel, and M is the number of feature maps.

  • stride (Tuple[int]) – Stride along each spatial axis.

  • padding (Tuple[int]) – Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. The value represents the number of pixels added to the beginning and end of the corresponding axis. The padding format should be [x1_begin, x2_begin...x1_end, x2_end,...], where xi_begin is the number of pixels added at the beginning of axis i and xi_end is the number of pixels added at the end of axis i. If the pads parameter is provided the shape of the output is automatically generated. See ONNX ConvTranspose for details.

  • dilation (Tuple[int]) – Dilation value along each spatial axis of the filter.

  • groups (int) – Number of groups that input channels and output channels are divided into. Default is 1.

  • pad_type (PadType) – The pad_type must be either “not_set”, “same_upper”, “same_lower” or “valid”. The default value is “not_set”, which means explicit padding is used. “same_upper” or “same_lower” mean pad the input such that output_shape[i] = ceil(input_shape[i] / strides[i]) for each axis i. The padding is split between the two sides equally or almost equally (depending on whether it is even or odd). In the case that the padding is an odd number, the extra padding is added at the end for “same_upper” and at the beginning for “same_lower”.

  • output_padding (Tuple[int]) – Additional elements added to the side with higher coordinate indices in the output. Each padding value in output_padding must be strictly less than the corresponding stride/dilation dimension. Note that this attribute doesn’t directly affect the computed output values. It only controls the selection of the computed values, so changing this attribute only adds or removes output elements. If output_shape is explicitly provided, output_padding does not contribute additional size to output_shape but participates in the computation of the needed padding amount.

  • output_shape (Tuple[int]) – The shape of the output can be explicitly set, which will cause padding values to be automatically generated. If output_shape is specified, pads values are ignored. See ONNX ConvTranspose for details on how padding is generated.

  • available_memory_proportions (List[float]) – The available memory proportion per convolution, each in the range [0, 1).

  • partials_types (List[str]) – The partials type per convolution; choose between half and float.

  • enable_conv_dithering (List[int]) – Enable convolution dithering per convolution. If true, then convolutions with different parameters will be laid out from different tiles in an effort to improve tile balance in models.

Returns

Output data tensor that contains the result of the convolution. The output dimensions are functions of the kernel size, stride size, pad lengths and group count.

Return type

Tensor

popxl.ops.conv_transpose_pow2scaled(t, weight, log2_scale, stride=(1, 1), padding=(0, 0, 0, 0), dilation=(1, 1), groups=1, pad_type='not_set', output_padding=(), output_shape=(), available_memory_proportions=None, enable_conv_dithering=None)

Perform a single transposed and scaled convolution operation on a tensor.

This operator consumes an input, weight, and log2 scale tensor to compute a transposed convolution, then scales the convolution output by pow2(log2_scale) before converting to float16.

The dtype of the input t and weight tensor must be one of popxl.float8_143 or popxl.float8_152. The log2_scale must be a scalar tensor of type popxl.int32 and contain a runtime value in the range [-32, 32).

If the padding parameter is provided, the shape of the output is automatically generated. Alternatively, output_shape can be specified explicitly, in which case the padding values are automatically generated. See the parameter descriptions for more details.

See also PyTorch Tensor.ConvTranspose2d, ONNX ConvTranspose.

Parameters
  • t (Tensor) – Input data tensor from previous layer, of type either popxl.float8_143 or popxl.float8_152. If the input is a 3D tensor, the size is (N, C, L), where N is the batch size, C is the number of channels and L is the length. If the input is a 2D image, the size is (N, C, H, W), where N is the batch size, C is the number of channels, and H and W are the height and width. If the input is a 3D image, the size is (N, C, D, H, W), where N is the batch size, C is the number of channels, D is the depth, and H and W are the height and width.

  • weight (Tensor) – The weight tensor that will be used as a kernel in the convolution, of dtype either popxl.float8_143 or popxl.float8_152. If the input is a 3D tensor, the weight size is (M, C/group, k), where C is the number of channels, k is the length of the kernel and M is the number of feature maps. If the input is a 2D image, the weight size is (M, C/group, kH, kW), where C is the number of channels, kH and kW are the height and width of the kernel, and M is the number of feature maps. If the input is a 3D image, the weight size is (M, C/group, kD, kH, kW), where C is the number of channels, kD, kH and kW are the depth, height and width of the kernel, and M is the number of feature maps.

  • log2_scale (Tensor) – 32-bit integer power-of-two exponent, where the convolution output is multiplied by pow2(log2_scale) before conversion to float16. Must be of dtype popxl.int32.

  • stride (Tuple[int]) – Stride along each spatial axis.

  • padding (Tuple[int]) – Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. The value represents the number of pixels added to the beginning and end of the corresponding axis. The padding format should be [x1_begin, x2_begin...x1_end, x2_end,...], where xi_begin is the number of pixels added at the beginning of axis i and xi_end is the number of pixels added at the end of axis i. If the pads parameter is provided the shape of the output is automatically generated. See ONNX ConvTranspose for details.

  • dilation (Tuple[int]) – Dilation value along each spatial axis of the filter.

  • groups (int) – Number of groups that input channels and output channels are divided into. Default is 1.

  • pad_type (PadType) – The pad_type must be either “not_set”, “same_upper”, “same_lower” or “valid”. The default value is “not_set”, which means explicit padding is used. “same_upper” or “same_lower” mean pad the input such that output_shape[i] = ceil(input_shape[i] / strides[i]) for each axis i. The padding is split between the two sides equally or almost equally (depending on whether it is even or odd). In the case that the padding is an odd number, the extra padding is added at the end for “same_upper” and at the beginning for “same_lower”.

  • output_padding (Tuple[int]) – Additional elements added to the side with higher coordinate indices in the output. Each padding value in output_padding must be strictly less than the corresponding stride/dilation dimension. Note that this attribute doesn’t directly affect the computed output values. It only controls the selection of the computed values, so changing this attribute only adds or removes output elements. If output_shape is explicitly provided, output_padding does not contribute additional size to output_shape but participates in the computation of the needed padding amount.

  • output_shape (Tuple[int]) – The shape of the output can be explicitly set, which will cause padding values to be automatically generated. If output_shape is specified, pads values are ignored. See ONNX ConvTranspose for details on how padding is generated.

  • available_memory_proportions (List[float]) – The available memory proportion per convolution, each in the range [0, 1).

  • enable_conv_dithering (List[int]) – Enable convolution dithering per convolution. If true, then convolutions with different parameters will be laid out from different tiles in an effort to improve tile balance in models.

Returns

Output data tensor that contains the result of the convolution. The output dimensions are functions of the kernel size, stride size, pad lengths and group count.

Return type

Tensor

Raises
  • TypeError – If the input or weight tensors do not have a dtype in {popxl.float8_143, popxl.float8_152}, or if the log2_scale tensor does not have dtype popxl.int32.

  • ValueError – If log2_scale is not a scalar tensor.

popxl.ops.cos(t)

Compute the cosine of each element of the input tensor.

See also PyTorch Tensor.cos.

Parameters

t (Tensor) – Input tensor

Returns

Output tensor

Return type

Tensor

popxl.ops.cumsum(t, dim=0)

Compute the cumulative sum of the input elements along the given dimension dim.

See also PyTorch Tensor.cumsum, NumPy cumsum.

Parameters
  • t (Tensor) – The input tensor.

  • dim (int) – The dimension to perform the operation over.

Returns

The result contains the cumulative sum of the elements of the input tensor along the dimension dim.

Return type

Tensor

popxl.ops.detach(t)

Prevent gradient computation of this tensor.

This operation is numerically equivalent to the identity op.

See also PyTorch Tensor.detach.

Parameters

t (Tensor) – Input tensor.

Returns

The input tensor.

Return type

Tensor

popxl.ops.detach_(t)

Prevent gradient computation of this tensor, in place.

The in-place version of detach(). The behaviour is the same: it blocks gradient propagation on the input tensor, but does not make a copy of the input tensor.

See also PyTorch Tensor.detach_.

Parameters

t (Tensor) – Input tensor.

Returns

The input tensor.

Return type

Tensor

popxl.ops.div(lhs, rhs)

Divide two tensors elementwise.

Follows NumPy broadcasting rules. The arguments must have the same dtype. The output will be the same dtype as the inputs. Floor division is used with integer values.

See also PyTorch Tensor.div, ONNX Div.

Parameters
  • lhs (Tensor) – Dividend tensor.

  • rhs (Tensor) – Divisor tensor.
Returns

The result of dividing lhs by rhs.

Return type

Tensor

popxl.ops.dropout(t, seed_tensor, p)

Randomly set elements of the input tensor to zero.

This operation will zero elements of tensor t with a probability of p. The dropout mask is created using samples from a Bernoulli distribution seeded with the seed_tensor.

You need to manage updating the seed_tensor for each forward pass and replica.

See also ONNX Dropout.

Parameters
  • t (Tensor) – Tensor to apply the dropout to.

  • seed_tensor (Tensor) – Used to seed the probability distribution which generates the dropout mask. Must have data type uint32 and shape [2,].

  • p (float) – Probability that an element will be zeroed.

Returns

A new tensor with the dropout applied.

Return type

Tensor
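
A minimal sketch; the seed values here are arbitrary and, as noted above, should be updated per forward pass and replica:

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    seed = popxl.variable(np.array([0, 0], np.uint32), name="seed")  # shape [2,]
    x = popxl.variable(np.ones((4,), np.float32))
    y = ops.dropout(x, seed, p=0.5)  # roughly half the elements zeroed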

popxl.ops.dynamic_slice(t, index, axes, sizes, no_overlap)

Return a cloned slice of the input tensor.

The name “dynamic” refers to the fact that the index can be specified at runtime.

A slice along an axis can be defined by the tuple (start, stop, step) where:

  • start is the index for the respective axis

  • stop is index + size for the respective axis

  • step equals 1

Limitations:

Assuming we would like to slice t with dimension [4, 3]:

  • A step other than 1 is not supported (that is, t[::2,:] is not supported)

  • Negative slicing is not supported (that is, t[:-1,:] is not supported)

  • A stop value greater than the size of the axis is not supported (that is, t[:5,:] is not supported)

Parameters
  • t (Tensor) – The input tensor.

  • index (Tensor) – The indices to start the slice from.

  • axes (List[int]) – The axes to slice from.

  • sizes (List[int]) –

    The sizes of the slices for the specified axes. For example:

    If index = [1, 2], axes = [0, 3] and sizes = [2, 4], then the tensor will be sliced as t[1:2, :, :, 2:4].

  • no_overlap (bool) – If set to true, then correct gradient backpropagation is only guaranteed if each region in the output tensor has exactly one populator (operation that writes data to this region). There are no run-time or compile-time checks possible to ensure this.

Returns

A clone (not a view) of the sliced input tensor.

Return type

Tensor
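
For example, slicing two rows starting at a runtime index (a sketch; an integer dtype is assumed for the index tensor):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    t = popxl.variable(np.arange(12, dtype=np.float32).reshape(4, 3))
    index = popxl.variable(np.array([1], np.int32), name="offset")
    s = ops.dynamic_slice(t, index, axes=[0], sizes=[2], no_overlap=True)
    # s is a clone equivalent to t[1:3, :] for this index value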

popxl.ops.dynamic_update(t, index, t_update, axes, sizes, no_overlap)

Update a slice of a tensor.

The name “dynamic” refers to the fact that the index can be specified at runtime.

index, axes and sizes determine the slice of t which will be updated. The dimensions of this slice and t_update must match. A slice along an axis can be defined by the tuple (start, stop, step) where:

  • start is the index for the respective axis

  • stop is index + size for the respective axis

  • step equals 1

Limitations:

Assuming we would like to update t with dimension [4, 3], the slicing of t will have the following limitations:

  • A step other than 1 is not supported (that is, t[::2,:] is not supported)

  • Negative slicing is not supported (that is, t[:-1,:] is not supported)

  • A value of stop larger than the size of the axis is not supported (for example, t[:5,:] is not supported)

Parameters
  • t (Tensor) – The tensor to update.

  • index (Tensor) – The indices to start the slice from.

  • t_update (Tensor) – The tensor to update t with.

  • axes (Iterable[int]) – The axes of t to make the update on.

  • sizes (Iterable[int]) – The sizes of the updates along the specified axes. For example, if index = [1, 2], axes = [0, 3] and sizes = [2, 4], then the tensor will be updated at t[1:2, :, :, 2:4].

  • no_overlap (bool) – If set to true, then correct gradient backpropagation is only guaranteed if each region in the output tensor has exactly one populator (operation that writes data to this region). There are no run-time or compile-time checks possible to ensure this.

Returns

The updated tensor.

Return type

Tensor

popxl.ops.dynamic_update_(t, index, t_update, axes, sizes, no_overlap)

Update a slice of a tensor in place.

Dynamically updates tensor t in place. The name “dynamic” refers to the fact that the index can be specified during runtime.

index, axes and sizes determine the slice of t which will be updated. The dimensions of this slice and t_update must match. A slice along an axis can be defined by the tuple (start, stop, step) where:

  • start is the index for the respective axis

  • stop is index + size for the respective axis

  • step equals 1

Limitations:

Assuming we would like to update t with dimension [4, 3], the slicing of t will have the following limitations:

  • A step value other than 1 is not supported (that is, t[::2,:] is not supported)

  • Negative slicing is not supported (that is, t[:-1,:] is not supported)

  • A stop value larger than the size of the axis is not supported (for example, t[:5,:] is not supported)

Parameters
  • t (Tensor) – Tensor to update.

  • index (Tensor) – The indices to start the slice from.

  • t_update (Tensor) – The tensor to update t with.

  • axes (List[int]) – The axes of t to make the update on.

  • sizes (List[int]) – The sizes of the updates along the specified axes. For example, if index = [1, 2], axes = [0, 3] and sizes = [2, 4], the tensor will be updated at t[1:2, :, :, 2:4].

  • no_overlap (bool) – If set to true, then correct gradient backpropagation is only guaranteed if each region in the output tensor has exactly one populator (operation that writes data to this region). There are no run-time or compile-time checks possible to ensure this.

Returns

The updated tensor.

Return type

Tensor

popxl.ops.equal(lhs, rhs)

Apply an elementwise equality operation.

Follows NumPy broadcasting rules.

See also PyTorch Tensor.equal, NumPy equal, ONNX Equal.

Parameters
  • lhs (Tensor) – Tensor to be compared.

  • rhs (Tensor) – Tensor to be compared.

Returns

The value (lhs == rhs)

Return type

Tensor

popxl.ops.exp(t)

Compute the exponential of the elements of input tensor.

See also PyTorch Tensor.exp, NumPy exp, ONNX Exp.

Parameters

t (Tensor) – Input tensor.

Returns

Output tensor.

Return type

Tensor

popxl.ops.exp_(t)

Compute the exponential of the elements of input tensor (in-place).

See also PyTorch Tensor.exp_.

Parameters

t (Tensor) – Input tensor.

Returns

The input tensor with the exp applied to it in place.

Return type

Tensor

popxl.ops.flatten(t)

Flatten a tensor.

Internally this uses reshape().

See also PyTorch Tensor.flatten, ONNX Flatten.

Parameters

t (Tensor) – The tensor to be flattened.

Returns

Tensor with 1-D shape.

Return type

Tensor

popxl.ops.flatten_(t)

Flatten a tensor in place.

Internally this uses reshape_().

This is the in-place version of flatten().

Parameters

t (Tensor) – The tensor to be flattened.

Returns

An alias of the input tensor with a 1-D shape.

Return type

Tensor

popxl.ops.fmod(lhs, rhs)

Compute the elementwise remainder after division (modulo operation).

Follows NumPy broadcasting rules. Arguments must have the same dtype.

See also PyTorch Tensor.fmod, NumPy fmod.

Parameters
  • lhs (Tensor) – Tensor to be modded.

  • rhs (Tensor) – Tensor to be modded.

Returns

The fmod of lhs and rhs

Return type

Tensor

popxl.ops.gather(t, indices, axis=0, available_memory_proportion=None, zero_OOR=False)

Select multiple elements from a tensor along specified axes.

Elements are specified via indices, along a specified axis. Equivalent to numpy.take(). Note that this is different from torch.gather().

Examples:

x = popxl.variable(np.arange(16).reshape(4, 4))
# [[ 0,  1,  2,  3],
#  [ 4,  5,  6,  7],
#  [ 8,  9, 10, 11],
#  [12, 13, 14, 15]]

gather(x, [3, 1, 2]) == Tensor([x[3], x[1], x[2]])
# [[12, 13, 14, 15],
#  [ 4,  5,  6,  7],
#  [ 8,  9, 10, 11]]

gather(x, [[0, 1], [1, 2]]) == gather(x, [0, 1, 1, 2]).reshape(2, 2, 4)
#  [[[ 0,  1,  2,  3],
#    [ 4,  5,  6,  7]],
#   [[ 4,  5,  6,  7],
#    [ 8,  9, 10, 11]]]

See also PyTorch Tensor.gather, ONNX Gather.

Parameters
  • t (Tensor) – The input tensor.

  • indices (Tensor) – The indices of the elements to extract.

  • axis (int) – The axis to gather on. The default is 0.

  • available_memory_proportion (Optional[float]) – The maximum proportion of available memory on each tile that this layer should consume temporarily during the course of the operation. Defaults to 1.0 if not set globally.

  • zero_OOR (bool) – If False, out of range (OOR) indices will produce undefined data. If True, out of range indices will produce zeros.

Returns

The gathered elements concatenated.

Return type

Tensor

popxl.ops.gelu(t)

Compute the GELU activation on a tensor.

For more details, refer to the paper Gaussian Error Linear Units

Parameters

t (Tensor) – The input tensor.

Returns

A new tensor with the GELU activation applied to it.

Return type

Tensor

popxl.ops.gelu_(t)

Compute the GELU activation on a tensor (in-place).

For more details, refer to the paper Gaussian Error Linear Units

Parameters

t (Tensor) – The input tensor.

Returns

The input tensor with the GELU activation applied to it.

Return type

Tensor

popxl.ops.geluerf(t)

Compute the accurate GELU activation on a tensor.

For more details, refer to the paper Gaussian Error Linear Units

Parameters

t (Tensor) – The input tensor.

Returns

A new tensor with the accurate GELU activation applied to it.

Return type

Tensor

popxl.ops.geluerf_(t)

Compute the accurate GELU activation on a tensor (in-place).

For more details, refer to the paper Gaussian Error Linear Units

Parameters

t (Tensor) – The input tensor.

Returns

The input tensor with the accurate GELU activation applied to it.

Return type

Tensor

popxl.ops.greater(input, other)

Computes where the first tensor is greater than the second tensor.

This is an element-wise operation (with NumPy-style broadcasting support).

See also Pytorch greater, NumPy greater.

Parameters
  • input (Tensor) – The first input operand for the logical operator.

  • other (Tensor) – The second input operand for the logical operator.

Returns

A tensor with true if the corresponding element of input is greater than other and false otherwise.

Return type

Tensor

popxl.ops.group_norm(t, weight, bias, num_groups, eps=1e-05)

Apply group normalisation to a tensor.

For more details, refer to the paper Group Normalization.

Parameters
  • t (Tensor) – Tensor to be normalized.

  • weight (Tensor) – Tensor used to scale the result of normalisation.

  • bias (Tensor) – Tensor used to shift the result of normalisation.

  • num_groups (int) – Number of groups to separate the channels into.

  • eps (float) – The small value to use to avoid division by zero.

Returns

The group normalised tensor.

Return type

Tensor

popxl.ops.groupedgather(t, indices, axis=0, group_size=1, available_memory_proportion=None, zero_OOR=False)

Select multiple elements from a tensor along specified axes.

Elements are specified via indices, along a specified axis for each group.

Examples:

x = popxl.variable(np.arange(16).reshape(2, 2, 4))
# [[[ 0,  1,  2,  3],
#   [ 4,  5,  6,  7]],
#  [[ 8,  9, 10, 11],
#   [12, 13, 14, 15]]]

groupedgather(x, [[0, 1, 0], [1, 0, 1]]) == Tensor(
    [[x[0][0], x[0][1], x[0][0]], [x[1][1], x[1][0], x[1][1]]]
)
# [[[ 0,  1,  2,  3],
#   [ 4,  5,  6,  7],
#   [ 0,  1,  2,  3]],
#  [[12, 13, 14, 15],
#   [ 8,  9, 10, 11],
#   [12, 13, 14, 15]]]
Parameters
  • t (Tensor) – The input tensor.

  • indices (Tensor) – The indices of the elements to extract.

  • axis (int) – The axis to gather on. The default is 0.

  • group_size (int) – The group size of the data. The default is 1.

  • available_memory_proportion (Optional[float]) – The maximum proportion of available memory on each tile that this layer should consume temporarily during the course of the operation. Defaults to 1.0 if not set globally.

  • zero_OOR (bool) – If False, out of range (OOR) indices will produce undefined data. If True, out of range indices will produce zeros.

Returns

The gathered elements concatenated.

Return type

Tensor

popxl.ops.histogram(t, levels, absolute_of_input)

Compute the histogram of the input tensor.

All but the last bin are half-open. In other words, if levels is:

[1, 2, 3, 4]

then the first bin is [1, 2) (including 1, but excluding 2) and the second [2, 3). The last bin, however, is [3, 4], which includes 4.

See also PyTorch torch.histc, NumPy histogram.

Parameters
  • t (Tensor) – Input tensor.

  • levels (List[float]) – A monotonically increasing list of bin edges.

  • absolute_of_input (bool) – If True, the absolute value of each input is calculated before comparison to the levels data.

Returns

Counts of the number of values in each bin.

Return type

Tensor
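
For example, with the bin edges from above (a sketch; note that 0.5 lies below the first edge and is not counted):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    t = popxl.variable(np.array([0.5, 1.5, 2.5, 3.5], np.float32))
    counts = ops.histogram(t, levels=[1.0, 2.0, 3.0, 4.0],
                           absolute_of_input=False)
    # bins [1, 2), [2, 3), [3, 4] -> counts [1, 1, 1]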

popxl.ops.host_load(h2d_stream, name=None)

Transfer a tensor from the host to the IPU.

This operation represents the transfer of data from the host to the IPU. It uses the existing host to IPU transfers created when building the IR, but defers the actual poplar::Copy until the op itself runs. This allows the copy to be scheduled as part of the normal op scheduling.

Data is sent from the host via the IStepIO object passed to session.run().

Parameters
  • h2d_stream (HostToDeviceStream) – Stream to load from.

  • name (str) – Name to use for the returned tensor.

Returns

The output tensor streamed from the host.

Return type

Tensor

popxl.ops.host_store(d2h_stream, t)

Transfer a tensor from the IPU to the host.

This operation represents the transfer of data from the IPU to the host. It uses the existing device to host transfers created when building the IR, but defers the actual poplar::Copy until the op itself runs. This allows the copy to be scheduled as part of the normal op scheduling.

Data is received on the host via the IStepIO object passed to session.run().

Raises

ValueError – If the stream shape or dtype doesn’t match the tensor shape.

Parameters
  • d2h_stream (DeviceToHostStream) – The stream to use for the host store.

  • t (Tensor) – The input tensor to copy to host.

Return type

None
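
A minimal end-to-end sketch of streaming data in and out (assuming a single host transfer and the IPU Model device):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    x_stream = popxl.h2d_stream((2,), popxl.float32, name="x_in")
    x = ops.host_load(x_stream, "x")
    y = x + 1
    y_stream = popxl.d2h_stream(y.shape, y.dtype, name="y_out")
    ops.host_store(y_stream, y)

with popxl.Session(ir, "ipu_model") as session:
    outputs = session.run({x_stream: np.ones((2,), np.float32)})
    # outputs[y_stream] -> array([2., 2.], dtype=float32)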

popxl.ops.identity(t, output_name=None)

Return the input tensor unchanged. This can also be used to rename a tensor.

Parameters
  • t (Tensor) – Tensor to provide as an output.

  • output_name (str) – Name of output tensor

Returns

A tensor equal to the input.

Return type

Tensor

popxl.ops.increment_mod(t, increment, modulus)

Increment the elements of a tensor using modulo arithmetic.

Parameters
  • t (Tensor) – Tensor to increment (modulo).

  • increment (float) – How much to increment the input tensor by.

  • modulus (float) – The modulo operand.

Returns

A new tensor with result of (t + increment) % modulus.

Return type

Tensor

popxl.ops.increment_mod_(t, increment, modulus)

Increment the elements of a tensor using modulo arithmetic in place.

Parameters
  • t (Tensor) – Tensor to increment (modulo).

  • increment (float) – How much to increment the input tensor by.

  • modulus (float) – The modulo operand.

Returns

Alias of the input tensor with result of (t + increment) % modulus.

Return type

Tensor

popxl.ops.init(shape, dtype, name=None, init_type='zero', meta_shape=None)

Create a tensor that is initialised with zero or undefined values.

The returned tensor is not considered a variable. A variable must be created in the main graph; it can be initialised to arbitrary values and can be read/written with session methods.

In contrast, init can be executed anywhere so it can return an initialised tensor in non-main graphs.

The tensor can only be initialised to zero or undefined values.

Parameters
  • dtype (dtypes.dtype) – Data type of the output tensor.

  • shape (Tuple[int]) – Shape of the output tensor.

  • name (str) – Name of the output tensor.

  • init_type (Union[Literal["zero"], Literal["undef"]]) – Initialisation of the output tensor.

  • meta_shape (Tuple[int]) – Meta shape of the output tensor.

Raises

ValueError – If the init_type is unknown.

Returns

An initialised tensor.

Return type

Tensor
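
For example, init can provide a zero-initialised accumulator inside a subgraph, where popxl.variable is not available (a sketch; running_sum is hypothetical and would be built with ir.create_graph):

import popxl
import popxl.ops as ops

def running_sum(x: popxl.Tensor) -> popxl.Tensor:
    # Zero-initialised buffer created inside the (non-main) graph.
    acc = ops.init(x.shape, x.dtype, name="acc", init_type="zero")
    return ops.add_(acc, x)  # accumulate into the buffer in place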

popxl.ops.interpolate(t, scale_factor=(1.0, 1.0, 1.0, 1.0), mode='nearest', nearest_mode='round_prefer_floor', coordinate_transformation_mode='half_pixel')

Interpolate the input tensor. Each dimension value of the output tensor is: output_dimension = floor(input_dimension * scale_factor).

Parameters
  • t (Tensor) – Input data tensor from previous layer.

  • scale_factor (Tuple[float]) – The scale array along each dimension. Each value must be greater than or equal to 1. The number of elements of scale_factor should be the same as the rank of the input t.

  • mode (InterpolateType) – The interpolation algorithm. One of three modes: nearest (default), linear and cubic.

  • nearest_mode (InterpolateNearestType) – Four modes: round_prefer_floor (default, also known as round half down), round_prefer_ceil (also known as round half up), floor, ceil. Only used by nearest interpolation. It indicates how to get the “nearest” pixel in the input tensor from x_original, so this attribute is valid only if mode is “nearest”.

  • coordinate_transformation_mode (InterpolateCoordinateTransformationType) –

    This attribute describes how to transform the coordinate in the interpolated tensor to the coordinate in the original tensor. The coordinate of each dimension is transformed individually. Let’s describe a case using axis x as an example.

    Some variables are defined as follows:

    • x_interpolated: the coordinate of axis x in the interpolated tensor.

    • x_original: the coordinate of axis x in the original tensor.

    • length_original: the length of the original tensor in axis x.

    • length_interpolated: the length of the interpolated tensor in axis x.

    • roi_x: roi_x = (start_x, end_x) of the axis x in input “roi”.

    • scale: scale = length_interpolated / length_original.

    Then:

    • if coordinate_transformation_mode is “half_pixel”, x_original = (x_interpolated + 0.5) / scale - 0.5,

    • if coordinate_transformation_mode is “pytorch_half_pixel”, x_original = length_interpolated > 1 ? (x_interpolated + 0.5) / scale - 0.5 : 0,

    • if coordinate_transformation_mode is “align_corners”, x_original = x_interpolated * (length_original - 1) / (length_interpolated - 1),

    • if coordinate_transformation_mode is “asymmetric”, x_original = x_interpolated / scale,

    • if coordinate_transformation_mode is “tf_crop_and_resize”, x_original = length_interpolated > 1 ? start_x * (length_original - 1) + x_interpolated * (end_x - start_x) * (length_original - 1) / (length_interpolated - 1) : 0.5 * (start_x + end_x) * (length_original - 1).

Returns

Output data tensor after interpolation.

Return type

Tensor

popxl.ops.io_tile_copy(t)

Copy a tensor to or from I/O tiles on the current IPU.

Parameters

t (Tensor) – The tensor to be copied.

Returns

A copy of the input tensor.

Return type

Tensor

popxl.ops.ipu_copy(t, destination, source=None)

Copy a tensor to an IPU.

Parameters
  • t (Tensor) – Tensor to be copied.

  • destination (int) – IPU to copy the tensor to.

  • source (Optional[int]) – IPU to copy the tensor from. By default, the source IPU will be taken from the operation that produces t. If t does not have a producer then a source must be specified.

Raises

ValueError – If the source IPU could not be inferred and the source is not specified.

Returns

The copied tensor.

Return type

Tensor

popxl.ops.isfinite(t)

Return a boolean tensor of the same shape indicating which elements are finite (not NaN or infinity).

Parameters

t (Tensor) – Tensor to check.

Returns

boolean tensor of the same shape indicating which elements are finite (not NaN or infinity).

Return type

Tensor

popxl.ops.isinf(t)

Return a boolean tensor of the same shape indicating which elements are positive or negative infinity.

Parameters

t (Tensor) – Tensor to check.

Returns

boolean tensor of the same shape indicating which elements are positive or negative infinity.

Return type

Tensor

popxl.ops.isnan(t)

Return a boolean tensor of the same shape indicating which elements are NaN.

Parameters

t (Tensor) – Tensor to check.

Returns

boolean tensor of the same shape indicating which elements are NaN.

Return type

Tensor

popxl.ops.l1(t, axis=None, keepdims=False)

Compute the sum of the magnitudes of the elements in a tensor (L1 norm) along specified axes.

Parameters
  • t (Tensor) – Tensor to compute the L1 norm of.

  • axis (int or list) – Axis or axes to compute L1 norm along. If none is specified then all elements will be normalised. If an axis is negative then it indexes from the last to the first axis.

  • keepdims (bool) – Keep the axis that is being reduced (True) or not (False).

Returns

The reduced tensor containing the L1 norm of elements along the specified axes.

Return type

Tensor

popxl.ops.l2(t, axis=None, keepdims=False)

Compute the square root of the sum of the squares of the elements in a tensor (L2 norm) along specified axes.

Parameters
  • t (Tensor) – Tensor to compute the L2 norm of.

  • axis (int or list) – Axis or axes to compute L2 norm along. If none is provided all elements will be normalised. If axis is negative it counts from the last to the first axis.

  • keepdims (bool) – Keep the axis that is being reduced (True) or not (False).

Returns

The reduced tensor containing the L2 norm of elements along the specified axes.

Return type

Tensor
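
For example, reducing along the last axis (a minimal sketch; values chosen so the norms are easy to verify):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    t = popxl.variable(np.array([[3.0, 4.0], [5.0, 12.0]], dtype=np.float32))
    n1 = ops.l1(t, axis=-1)                  # [7.0, 17.0]
    n2 = ops.l2(t, axis=-1, keepdims=True)   # [[5.0], [13.0]]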

popxl.ops.lamb_square(t)

Square each element before applying an add reduction.

Used in the LAMB optimizer: https://arxiv.org/abs/1904.00962

Parameters

t (Tensor) – The input tensor.

Returns

A new tensor containing the squared values of the input tensor.

Return type

Tensor

popxl.ops.layer_norm(t, weight, bias, eps=1e-05)

Apply layer normalisation to a tensor.

Uses group_norm under the hood.

Parameters
  • t (Tensor) – The tensor to be normalized.

  • weight (Tensor) – Tensor used to scale the result of normalisation.

  • bias (Tensor) – Tensor used to shift result of normalisation.

  • eps (float) – The small value to use to avoid division by zero

Returns

The layer normalised tensor.

Return type

Tensor
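
For example, normalising a (batch, hidden) activation (a minimal sketch; it assumes weight and bias have the shape of the normalised axis):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    x = popxl.variable(np.random.rand(8, 64).astype(np.float32), name="x")
    w = popxl.variable(np.ones(64, dtype=np.float32), name="weight")
    b = popxl.variable(np.zeros(64, dtype=np.float32), name="bias")
    y = ops.layer_norm(x, w, b, eps=1e-5)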

popxl.ops.log(t)

Compute the log of the elements of input tensor.

See also PyTorch torch.log, NumPy log, ONNX Log.

Parameters

t (Tensor) – Input tensor.

Returns

Output tensor.

Return type

Tensor

popxl.ops.logical_and(lhs, rhs)

Compute the elementwise logical and of two tensors.

Follows NumPy broadcasting rules. Inputs will be cast to bool if necessary.

See also PyTorch Tensor.logical_and, NumPy logical_and.

Parameters
  • lhs (Tensor) – Tensor to be combined.

  • rhs (Tensor) – Tensor to be combined.

Returns

A new tensor with the result of elementwise lhs and rhs.

Return type

Tensor

popxl.ops.logical_not(t)

Compute the elementwise not of a tensor.

Inputs will be cast to bool if necessary.

See also PyTorch Tensor.logical_not, NumPy logical_not.

Parameters

t (Tensor) – The input tensor.

Returns

A new tensor with the elementwise logical not of the input.

Return type

Tensor

popxl.ops.logical_or(lhs, rhs)

Compute the elementwise logical or of the input tensors.

Follows NumPy broadcasting rules. Inputs will be cast to bool if necessary.

See also PyTorch Tensor.logical_or, NumPy logical_or.

Parameters
  • lhs (Tensor) – Tensor to be combined.

  • rhs (Tensor) – Tensor to be combined.

Returns

A new tensor with the result of elementwise lhs or rhs.

Return type

Tensor

popxl.ops.logsum(t, axis=None, keepdims=False)

Compute the log of summed elements of a tensor along specified axes.

Supported dtypes: float.

Parameters
  • t (Tensor) – Tensor to compute the log of the sum of elements.

  • axis (int or list) – Axis or axes to compute the log of the sum along. If none is specified all axes will be summed. If an axis is negative it indexes from the last to the first axis.

  • keepdims (bool) – Keep the axis that is being computed (True) or not (False).

Returns

A new tensor containing the log of the summed elements along the specified axes.

Return type

Tensor

popxl.ops.logsumexp(t, axis=None, keepdims=False)

Compute the log of the summed exponentials of elements in a tensor, along specified axes.

Supported dtypes: floats.

See also PyTorch Tensor.logsumexp.

Parameters
  • t (Tensor) – Tensor to compute the log of the summed exponentials of the elements.

  • axis (int or list) – Axis or axes to compute the log of the summed exponentials along. If none is specified all axes will be reduced. If axis is negative it indexes from the last to the first axis.

  • keepdims (bool) – Keep the axis that is being computed (True) or not (False).

Returns

A new tensor containing the log of the summed exponentials of the elements along the specified axes.

Return type

Tensor

popxl.ops.matmul(lhs, rhs, available_memory_proportion=None, output_type=None, partials_type=None)

Perform matrix multiplication of two tensors.

Follows NumPy matrix multiplication rules for N-D tensors, see numpy.matmul().

Arguments must have the same dtype. Shapes must be compatible as defined by the NumPy matrix multiplication rules.

See also PyTorch Tensor.matmul, NumPy matmul, ONNX MatMul.

Parameters
  • lhs (Tensor) – Left hand side of matrix multiplication.

  • rhs (Tensor) – Right hand side of matrix multiplication.

  • available_memory_proportion (Optional[float]) – The maximum proportion of available memory on each tile that this layer should consume temporarily during the course of the operation. Defaults to 1.0.

  • output_type (Optional[dtypes.dtype], optional) – Output datatype to enforce. Defaults to the dtype of lhs/rhs.

  • partials_type (dtypes.dtype, optional) – The type to use for partial results (float16, float32). Defaults to dtypes.float32.

Returns

The matrix product of lhs and rhs.

Return type

Tensor
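
For example, a float16 matmul that accumulates partials in float32 and caps temporary memory per tile (a minimal sketch; the shapes and the 0.2 proportion are illustrative):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    a = popxl.variable(np.random.rand(4, 16).astype(np.float16), name="a")
    b = popxl.variable(np.random.rand(16, 8).astype(np.float16), name="b")
    y = ops.matmul(a, b,
                   available_memory_proportion=0.2,
                   partials_type=popxl.float32)  # y has shape (4, 8)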

popxl.ops.matmul_pow2scaled(lhs, rhs, log2_scale, available_memory_proportion=None)

Perform a scaled matrix multiplication between two tensors.

Compute a matrix multiplication between lhs and rhs, then multiply the result by pow2(log2_scale).

The matrix multiply arguments must have either popxl.float8_143 or popxl.float8_152 dtype. The log2_scale argument must be of type popxl.int32 and be in the range [-32, 32).

Follows NumPy matrix multiplication rules for N-D tensors, see numpy.matmul().

Parameters
  • lhs (Tensor) – Left hand side of matrix multiplication.

  • rhs (Tensor) – Right hand side of matrix multiplication.

  • log2_scale (Tensor) – integer power-of-two exponent, where the matrix multiplication output is multiplied by pow2(log2_scale).

  • available_memory_proportion (Optional[float]) – The maximum proportion of available memory on each tile that this layer should consume temporarily during the course of the operation. Defaults to 1.0.

Raises
  • TypeError – If the matrix multiply operand tensors do not have a dtype in {popxl.float8_143, popxl.float8_152}, or if the log2_scale tensor does not have dtype popxl.int32

  • ValueError – If log2_scale is not a scalar tensor.

Return type

Tensor

popxl.ops.max(t, axis=None, keepdims=False)

Compute the maximum value of the elements in a tensor along specified axes.

See also PyTorch Tensor.max, ONNX Max.

Parameters
  • t (Tensor) – Tensor to compute maximum of.

  • axis (int or list) – Axis or axes to compute maximum along. If none is provided all axes will be reduced. If an axis is negative it indexes from the last to the first axis.

  • keepdims (bool) – Keep the axis that is being reduced (True) or not (False).

Returns

The reduced tensor containing the maximum of elements computed along the specified axes.

Return type

Tensor

popxl.ops.max_pool(t, kernel_size, stride=None, padding=None, out_pads=None, dilation=None, in_dilations=None, auto_pad='not_set', ceil_mode=False, storage_order='row')

Max pool a tensor.

This consumes an input tensor t and applies max pooling across the tensor according to kernel sizes, stride sizes, and pad lengths. Max pooling consists of computing the max on all values of a subset of the input tensor according to the kernel size and down-sampling the data into the output tensor Y for further processing.

Parameters
  • t (Tensor) –

    Input data tensor from the previous layer.

    • If the input is a 3D tensor, the size is (N, C, L), where N is the batch size, C is the number of channels and L is the length;

    • If the input is a 2D image, the size is (N, C, H, W), where N is the batch size, C is the number of channels and H and W are the height and width;

    • If the input is a 3D image, the size is (N, C, D, H, W), where N is the batch size, C is the number of channels, D is the depth and H and W are the height and width.

  • kernel_size (Tuple[int]) – The size of the kernel along each axis.

  • stride (Tuple[int]) – Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.

  • padding (Tuple[int]) – Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. The values represent the number of pixels added to the beginning and end of the corresponding axis. The padding format should be as follows: [x1_begin, x2_begin, …, x1_end, x2_end, …], where xi_begin is the number of pixels added at the beginning of axis i and xi_end is the number of pixels added at the end of axis i.

  • out_pads (Tuple[int]) – The output padding for pooling.

  • dilation (Tuple[int]) – dilation value along each spatial axis of the filter.

  • in_dilations (Tuple[int]) – The input dilations attributes along each spatial axis of the filter.

  • auto_pad (Literal) – auto_pad must be either “not_set”, “same_upper”, “same_lower” or “valid”. The default value is “not_set”, which means explicit padding is used. “same_upper” or “same_lower” mean pad the input so that output_shape[i] = ceil(input_shape[i] / strides[i]) for each axis i. The padding is split between the two sides equally or almost equally (depending on whether it is even or odd). In the case that the padding is an odd number, the extra padding is added at the end for “same_upper” and at the beginning for “same_lower”.

  • ceil_mode (bool) – When True, will use ceil instead of floor to compute the output shape.

  • storage_order (Literal['row', 'column']) – The storage order of the tensor. Default is row.

Returns

Output data tensor from max pooling across the input tensor. Dimensions will vary based on various kernel, stride, and pad sizes. Floor value of the dimension is used.

Return type

Tensor
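
For example, a 2×2 window with stride 2 and no padding on an (N, C, H, W) input (a minimal sketch):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    x = popxl.variable(np.random.rand(1, 3, 8, 8).astype(np.float32))
    # padding is [h_begin, w_begin, h_end, w_end]; output shape is (1, 3, 4, 4)
    y = ops.max_pool(x, kernel_size=(2, 2), stride=(2, 2), padding=(0, 0, 0, 0))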

popxl.ops.maximum(*ts)

Compute the elementwise maximum of N tensors.

Follows NumPy broadcasting rules. Arguments must have the same dtype.

Parameters

ts (Tensor) – Tensors to compute the elementwise maximum of.

Returns

A tensor with the maximum elements from the input tensors.

Return type

Tensor

popxl.ops.mean(t, axis=None, keepdims=False)

Compute the arithmetic mean of elements in a tensor along the specified axes.

See also PyTorch Tensor.mean, NumPy mean, ONNX Mean.

Parameters
  • t (Tensor) – Tensor to compute the mean of elements.

  • axis (int or list) – Axis or axes to compute the mean along. If none is provided all axes will be reduced. If axis is negative it indexes from the last to the first axis.

  • keepdims (bool) – Keep the axis that is being reduced (True) or not (False).

Returns

The reduced tensor containing the arithmetic means computed along the specified axes.

Return type

Tensor

popxl.ops.median(t, axis=None, keepdims=False)

Compute the median of elements in a tensor along axes.

See also PyTorch Tensor.median, NumPy median.

Parameters
  • t (Tensor) – Tensor to compute median of.

  • axis (int or list) – Axis or axes to compute the median along. If none is provided all axes will be reduced. If axis is negative it indexes from the last to the first axis.

  • keepdims (bool) – Keep the axis that is being reduced (True) or not (False).

Returns

The reduced tensor.

Return type

Tensor

popxl.ops.min(t, axis=None, keepdims=False)

Compute the minimum of the elements of a tensor along axes.

See also PyTorch Tensor.min, ONNX Min.

Parameters
  • t (Tensor) – Tensor to compute minimum of.

  • axis (int or list) – Axis or axes to compute minimum along. If none is provided, all axes will be reduced. If the axis is negative, it indexes from the last to the first axis.

  • keepdims (bool) – Keep the axis that is being reduced (True) or not (False).

Returns

The reduced tensor containing the minimum of the elements along the axes.

Return type

Tensor

popxl.ops.mul(lhs, rhs)

Multiply two tensors elementwise.

Follows NumPy broadcasting rules. Arguments must have the same dtype.

See also PyTorch Tensor.mul, ONNX Mul.

Parameters
  • lhs (Tensor) – Tensor to be multiplied.

  • rhs (Tensor) – Tensor to be multiplied.

Returns

The product of lhs and rhs.

Return type

Tensor

popxl.ops.negate(t)

Perform elementwise negation (two’s complement) of a tensor.

Parameters

t (Tensor) – The input tensor.

Returns

The output tensor that is the elementwise negation of the input.

Return type

Tensor

popxl.ops.nll_loss(probs, labels, ignore_index=None, reduction='mean', log_prob=False)

Compute the negative log likelihood loss.

Compute the negative log likelihood loss l where probs = softmax(x). The returned loss will be reduced by reduction (default mean) across items in targets. Any item in target equal to ignore_index will not contribute to l or dl/dx.

See also PyTorch nll_loss, ONNX NegativeLogLikelihoodLoss.

Parameters
  • probs (Tensor) – The probabilities. Expected to be the output of softmax().

  • labels (Tensor) – The labels. Target values for the probabilities.

  • ignore_index (Optional[int], optional) – Specify label values that should not contribute to the loss

  • reduction (str) – Specify how to reduce the loss. Defaults to mean. Options mean, sum and none

  • log_prob (bool) – If True, the input probs contains log-probabilities.

Returns

The calculated negative log likelihood loss.

Return type

Tensor
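
For example, pairing softmax() with nll_loss (a minimal sketch; it assumes uint32 class labels):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    logits = popxl.variable(np.random.rand(4, 10).astype(np.float32))
    labels = popxl.variable(np.array([1, 0, 9, 3], dtype=np.uint32))
    probs = ops.softmax(logits, axis=1)
    loss = ops.nll_loss(probs, labels, reduction="mean")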

popxl.ops.nll_loss_with_softmax_grad(probs, labels, loss_grad=1, ignore_index=None, reduction='mean')

Compute the negative log likelihood loss.

Compute the negative log likelihood loss l and return the gradient dE/dx, where probs = softmax(x). loss_grad should be the gradient dE/dl, where E is the error from which backpropagation is initialised. Typically E = l, so in order to return dl/dx, loss_grad should be dl/dl, which is 1.

Parameters
  • probs (Tensor) – The probabilities. Expected to be the output of softmax().

  • labels (Tensor) – The labels. Target values for the probabilities.

  • loss_grad (Tensor) – The gradient, dE/dl. Supports float32 dtypes with float16 probs

  • reduction (ReductionType) – Specify how to reduce the loss. Defaults to mean. Options mean, sum and none

  • ignore_index (Optional[int]) – Specify label values that should not contribute to l or dE/dx. Defaults to None.

Returns

A tuple of the loss and the gradient: (l, dE/dx).

Return type

Tuple[Tensor, Tensor]

popxl.ops.onehot(t, num_classes, values, axis)

Produce a one-hot tensor based on inputs.

See also ONNX OneHot.

Parameters
  • t (Tensor) – Input tensor containing indices.

  • num_classes (Tensor) – Scalar specifying the number of classes in one-hot tensor.

  • values (Tensor) – The value used for filling locations specified in ‘t’ input tensor

  • axis (int) – Axis along which the one-hot representation is added.

Returns

Output tensor.

Return type

Tensor

popxl.ops.pow(t, e)

Raise the elements of t to the power of e.

If e is TensorLike, then t[i] will be raised to the power of e[i]. If e is a float or int, all elements will be raised to the power of e. Follows NumPy broadcasting rules.

Parameters
  • t (Tensor) – Input tensor.

  • e (Union[float, int, TensorLike]) – Exponent tensor.

Returns

Output tensor containing the result of t raised to the power of e.

Return type

Tensor

popxl.ops.pow2scale_cast_from_fp8(t, log2_scale, data_type)

Add a fused operation cast(X, dtype) * pow2(log2_scale) to cast from floating point 8 type.

See the PopXL documentation on floating point 8 types for more details.

Parameters
  • t (Tensor) – Tensor to convert.

  • log2_scale (Tensor) – Scalar Tensor to use as the log 2 scale.

  • data_type (dtype) – Data type to convert to. Must be float16 or float32.

Raises

TypeError – If data_type is not of type float16 or float32.

Returns

The converted float16 or float32 tensor.

Return type

Tensor

popxl.ops.pow2scale_cast_to_fp8(t, log2_scale, data_type)

Add a fused operation cast(src * pow2(log2_scale), dtype) to cast to floating point 8 data type.

See the PopXL documentation on floating point 8 types for more details.

Parameters
  • t (Tensor) – Tensor to convert.

  • log2_scale (Tensor) – Scalar Tensor to use as the log 2 scale.

  • data_type (dtype) – Data type to convert to. Must be float8_143 or float8_152

Raises

TypeError – If data_type is not of type float8_143 or float8_152.

Returns

The converted float8 tensor.

Return type

Tensor

popxl.ops.print_tensor(t, title=None, print_self=True, print_gradient=False, summarise_threshold=1000, edge_items=3, max_line_width=75, digits=8, float_format='auto', separator=' ', open_bracket='[', close_bracket=']')

Print a tensor.

The output tensor of this op must be consumed if you want to print the gradient tensor. If the output is not consumed this op does not get pruned when running removeIsolatedTensors.

The default output format will split large lines, print all elements in the same format, pad elements so that they align and summarise large tensors.

Parameters
  • t (Tensor) – The tensor to print.

  • title (str, optional) – Title to print. Defaults to None.

  • print_self (bool, optional) – Print the tensor itself. Defaults to True.

  • print_gradient (bool, optional) – Indicates if the associated gradient tensor of t is also printed (True) or not (False). Defaults to False.

  • summarise_threshold (int) – default 1000. If the number of elements of the tensor exceeds this threshold the output will be summarised. Only the edge elements will be displayed with an ellipsis indicating skipped elements. A value of 0 will disable summarisation.

  • edge_items (int) – default 3. Number of edge elements to include at the beginning and end when summarisation is enabled.

  • max_line_width (int) – default 75. lines longer than this limit will be split across multiple lines. A value of 0 will disable line splitting.

  • digits (int) – default 8. Number of digits to display. For integers this limit can be exceeded if any number is large enough. For floating-point numbers this does not include the exponent. The number of digits is used in conjunction with an analysis of the tensor to determine the width of each element, so that all elements align when printed. A value of 0 disables this analysis and each element will be printed in an unaligned format.

  • float_format (str) – default ‘auto’. Determines the floating-point format to use. Options: ‘auto’, ‘fixed’, ‘scientific’ and ‘none’. ‘auto’ mode determines the appropriate format based on the data. ‘fixed’ uses fixed-point format, e.g. -100.00. ‘scientific’ uses scientific notation, e.g. -1.123e+10. ‘none’ does not take care to display numbers in the same format. If digits == 0 this option is disregarded and float_format is set to ‘none’.

  • separator (str) – default ‘,’. Character used to delimit values.

  • open_bracket (str) – default ‘[’. character used to open a tensor.

  • close_bracket (str) – default ‘]’. Character used to close a tensor.

Raises
  • ValueError – if separator, open_bracket or close_bracket are not a single character.

  • KeyError – if float_format is not one of the available options (see the parameter description above)

Returns

The input tensor, unchanged.

Return type

Tensor
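
For example, printing a tensor and then consuming the op's output so it is not pruned (a minimal sketch):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    x = popxl.variable(np.arange(6, dtype=np.float32).reshape(2, 3), name="x")
    x = ops.print_tensor(x, title="x", float_format="fixed")
    y = ops.abs(x)  # consume the printed tensor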

popxl.ops.prod(t, axis=None, keepdims=False)

Compute the product of elements along an axis.

See also PyTorch Tensor.prod, NumPy prod.

Parameters
  • t (Tensor) – Tensor to compute product of.

  • axis (int or list) – Axis or axes to compute product along. If none is provided, all axes will be reduced. If the axis is negative, the product is computed from the last to the first axis.

  • keepdims (bool) – Keep the axis that is being reduced (True) or not (False).

Returns

The reduced tensor.

Return type

Tensor

popxl.ops.random_normal(seed_tensor, shape, mean=0.0, std=1.0, dtype=popxl.dtypes.float32)

Randomly sample from a normal distribution.

The mean and standard deviation of the distribution are specified by mean and std respectively.

Note: not compatible with IPU Model.

Parameters
  • seed_tensor (Tensor) – A tensor used to seed the probability distribution. Must have data type uint32 and shape (2,).

  • shape (Tuple[int, ...]) – The shape of the output tensor.

  • mean (float, optional) – Mean of the distribution. Defaults to 0.0.

  • std (float, optional) – Standard deviation of the distribution. Defaults to 1.0.

  • dtype (dtypes.dtype, optional) – Data type of output tensor. Defaults to dtypes.float32.

Returns

A new tensor with elements sampled from a normal distribution.

Return type

Tensor

popxl.ops.random_uniform(seed_tensor, shape, low=0.0, high=1.0, dtype=popxl.dtypes.float32)

Randomly sample from a uniform distribution.

This operation will sample uniformly from a range with minimum value low and maximum value high.

Note: not compatible with IPU Model.

Parameters
  • seed_tensor (Tensor) – A tensor used to seed the probability distribution. Must have data type uint32 and shape (2,).

  • shape (Tuple[int, ...]) – The shape of the output tensor.

  • low (float, optional) – Minimum value. Defaults to 0.0.

  • high (float, optional) – Maximum value. Defaults to 1.0.

  • dtype (dtypes.dtype, optional) – Data type of output tensor. Defaults to dtypes.float32.

Returns

A new tensor with element values sampled from a uniform distribution.

Return type

Tensor
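
For example, deriving a separate seed for each random op with split_random_seed() (a minimal sketch; note these ops are not compatible with the IPU Model):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    seed = popxl.variable(np.array([0, 42], dtype=np.uint32), name="seed")
    seed, s1, s2 = ops.split_random_seed(seed, 3)
    u = ops.random_uniform(s1, (4,), low=-1.0, high=1.0)
    n = ops.random_normal(s2, (4,), mean=0.0, std=0.02)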

popxl.ops.relu(t)

Compute the ReLU activation of a tensor.

For more details, refer to Rectifier (neural networks).

See also ONNX Relu.

Parameters

t (Tensor) – The input tensor to calculate the activation of.

Returns

A tensor containing the activations.

Return type

Tensor

popxl.ops.relu_(t)

Compute the ReLU activation of a tensor in place.

For more details, refer to Rectifier (neural networks).

Parameters

t (Tensor) – The input tensor to calculate the activation of.

Returns

The input tensor with the ReLU activation applied to it.

Return type

Tensor

popxl.ops.remote_load(remote_buffer, offset, name=None)

Load a tensor from Streaming Memory.

This operation loads a tensor from the remote buffer residing in Streaming Memory.

The tensor will be loaded from the memory location corresponding to remote_buffer_id (specified in remote_buffer).

The value of offset must be >= 0.

The relationship between offset and remote_buffer_id is described in remote_store().

Note

There is no data dependency in the graph between remote store and remote load. Thus, the remote load operator may end up before the remote store operator in the serialized graph. One way to avoid this is by using with popxl.in_sequence(True).

Parameters
  • remote_buffer (RemoteBuffer) – The handle to the remote buffer.

  • offset (Union[int, Tensor]) – Integer or rank-0 tensor indicating which entry in the remote buffer to load from.

  • name (str) – Name to use for the returned tensor.

Returns

A new tensor loaded from the remote buffer.

Return type

Tensor

popxl.ops.remote_load_(remote_buffer, offset, t)

Load from Streaming Memory into a specified tensor.

This operation loads from the remote buffer in Streaming Memory into an existing tensor.

This op is identical to remote_load, except that the data loaded from the remote buffer will be written to the tensor t.

Note

There is no data dependency (in the graph) between remote store and remote load. Thus, the remote load operator may end up before the remote store operator in the serialized graph. One way to avoid this is by using with popxl.in_sequence(True).

Parameters
  • remote_buffer (RemoteBuffer) – The handle to the remote buffer.

  • offset (Union[int, Tensor]) – Integer or rank-0 tensor indicating which entry in the remote buffer to load from.

  • t (Tensor) – The tensor that the loaded data will be written to.

Returns

The tensor loaded from the remote buffer

Return type

Tensor

popxl.ops.remote_store(remote_buffer, offset, t)

Store a tensor in Streaming Memory.

This operation stores the input tensor in the remote buffer residing in Streaming Memory.

This op is typically used to store different, identically-shaped tensors to the same remote buffer by specifying the offset.

Instances of the op with matching remote_buffer_id (specified in remote_buffer) will outline together, meaning that if different tensors are to be stored under the same remote buffer ID, a different offset value has to be supplied for each tensor.

remote_buffer handles the relationship between remote_buffer_id, shape and dtype because shape and dtype needs to be fixed for each remote_buffer_id.

The value of offset must be >= 0.

If t is of rank x, the remote buffer with remote_buffer_id will be of rank x+1, where the new dimension (the row) will be of size entries.

Note

There is no data dependency (in the graph) between remote store and remote load. Thus, the remote load operator may end up before the remote store operator in the serialized graph. One way to avoid this is by using with popxl.in_sequence(True).

Parameters
  • remote_buffer (RemoteBuffer) – The handle to the remote buffer.

  • offset (Union[int, Tensor]) – Integer or rank-0 tensor indicating which entry in the remote buffer to store to.

  • t (Tensor) – Tensor to copy and store in the remote buffer.

Return type

None
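
For example, a store/load round trip through one remote buffer (a minimal sketch; it assumes popxl.remote_buffer for creating the buffer, and uses in_sequence to order the two ops):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    x = popxl.variable(np.ones(32, dtype=np.float32), name="x")
    buffer = popxl.remote_buffer((32,), popxl.float32, entries=2)
    with popxl.in_sequence(True):  # keep the store ordered before the load
        ops.remote_store(buffer, 0, x)
        y = ops.remote_load(buffer, 0, name="y")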

popxl.ops.rename(t, output_name=None)

Return the input tensor unchanged. This can also be used to rename a tensor.

Parameters
  • t (Tensor) – Tensor to provide as an output.

  • output_name (str) – Name of output tensor

Returns

A tensor equal to the input.

Return type

Tensor

popxl.ops.repeat(graph, repeat_count, *inputs, inputs_dict=None)

Repeatedly call a graph.

This operation repeatedly executes a graph repeat_count times. The input tensors are provided as graph inputs for the first iteration.

The inputs and inputs_dict tensors are passed as graph inputs. You can specify an input either positionally using inputs, or via a tensor map using inputs_dict.

Graph inputs are determined when the graph is created using create_graph(callable, ...). The order of inputs will be the same as the order of the tensor inputs in the function signature and the order of called popxl.graph_inputs. See create_graph() for more information.

Between each execution of the subgraph, the N outputs of subgraph will be copied to the first N inputs. These are called loop carried inputs. The number of outputs must be less than or equal to the number of inputs. The remaining inputs will be unchanged throughout the loop iterations (unless modified in place).

Example:

# popxl.Module to repeat
class AddWeight(popxl.Module):
    def __init__(self):
        self.w: popxl.Tensor = None

    def build(self, x):
        self.w = popxl.graph_input(x.shape, x.dtype, "w")
        return self.w + x, self.w


with g:  # a graph
    add_weight0 = AddWeight()
    add_weight_graph0 = ir.create_graph(add_weight0, x0)

    # repeat 8 times
    y0, w0 = ops.repeat(add_weight_graph0, 8, x0, inputs_dict={add_weight0.w: w0})

See also PyTorch Tensor.repeat, NumPy repeat.

Parameters
  • graph (Graph) – User defined graph to repeat repeat_count times.

  • repeat_count (int) – Number of times to repeat calling the graph.

  • *inputs (Tensor, List[Tensor], int, float) – Provide inputs via position.

  • inputs_dict (Optional[Mapping[Tensor, Tensor]]) – Provide inputs via a tensor map. Mapping of graph tensor -> parent tensor.

  • check_inputs (bool) – If True, check when called that all inputs have been provided. Defaults to True.

Raises
  • ValueError – If repeat_count < 0.

  • ValueError – If the number of subgraph inputs < subgraph outputs.

Returns

Tuple of the output tensors of the call in the parent graph.

Return type

Tuple[Tensor, …]

popxl.ops.repeat_with_info(graph, repeat_count, *inputs, inputs_dict=None, check_inputs=True)

Repeatedly call a graph and return information about the call site.

This operation repeatedly executes a graph repeat_count number of times. The input tensors are provided as graph inputs for the first iteration.

Returns CallSiteInfo that can be used to inspect callsite inputs/outputs.

The inputs and inputs_dict tensors are passed as graph inputs. You can specify an input either positionally using inputs or via a tensor map using inputs_dict.

Graph inputs are determined when the graph is created using ir.create_graph(callable, ...). The order of inputs will be the same as the order of the tensor inputs in the function signature and the order of called popxl.graph_inputs. See create_graph() for more information.

Between each execution of the subgraph, the N outputs of subgraph will be copied to the first N inputs. These are called loop carried inputs. The number of outputs must be less than or equal to the number of inputs.

Implementation detail: In order to maintain the input / output indices of the subgraph, we must call the user provided subgraph, and create a “middle” subgraph to repeat the user provided subgraph inside:

            LoopOp         Keep
                |          Going  Loop Carried
                |  Iterator  |      Inputs
                |     |      |      | | |    |- Implicit Inputs
                V     V      V      V V V    V
                .-Wrapper_subgraph--+-+-+----+-----.
Parent graph    |                   | | |    |     |
                |                   | | |    |     |
                |                   | | |    |     |
                |                   V V V    |     |
                | CallOp .-Loop_subgraph---. |     |
                |    |   | (user provided) |<-     |
                |    '-->|                 |       |
                |        |     (Ops)       |       |
                |        |                 |       |
                |        |                 |       |
                |        '----------+-+-+--'       |
                |                   | | |          |
                |                   V V V          |
                '---+---------------+-+-+----------'
                    |               | | |
                    |               | | |
                    V               V V V
                    Keep         Loop Carried
                   Going           Outputs

Example:

# popxl.Module to repeat
class AddWeight(popxl.Module):
    def __init__(self):
        self.w: popxl.Tensor = None

    def build(self, x):
        self.w = popxl.graph_input(x.shape, x.dtype, "w")
        return self.w + x, self.w


with g:  # a graph
    add_weight0 = AddWeight()
    add_weight_graph0 = ir.create_graph(add_weight0, x0)

    # repeat 8 times
    call_info = ops.repeat_with_info(
        add_weight_graph0, 8, x0, inputs_dict={add_weight0.w: w0}
    )
    y0, w0 = call_info.outputs

Parameters
  • graph (Graph) – User defined graph to repeat repeat_count times.

  • repeat_count (int) – Number of times to repeat calling the graph.

  • *inputs (Tensor, List[Tensor], int, float) – Provide inputs via position.

  • inputs_dict (Optional[Mapping[Tensor, Tensor]]) – Provide inputs via a tensor map. Mapping of graph tensor -> parent tensor.

  • check_inputs (bool) – Check when called if all inputs have been provided. Defaults to True.

Raises
  • ValueError – If repeat_count < 0.

  • ValueError – If the number of explicitly passed inputs + the number of loop created inputs != the number of outputs.

Returns

Information on the created callsite for the repeat op.

Return type

CallSiteInfo

popxl.ops.reshape(t, shape)

Reshape a tensor.

See also PyTorch Tensor.reshape, NumPy reshape, ONNX Reshape.

Parameters
  • t (Tensor) – The tensor to be reshaped.

  • shape (Tuple[int, ...]) – Tuple containing the shape of the output.

Raises

ValueError – A ValueError will be raised if an invalid value is encountered in the shape, or if more than one -1 is given in the shape.

Returns

The reshaped tensor.

Return type

Tensor
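
For example (a minimal sketch; a single -1 infers the remaining dimension):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    x = popxl.variable(np.arange(12, dtype=np.float32).reshape(3, 4))
    y = ops.reshape(x, (2, 6))   # shape (2, 6)
    z = ops.reshape(x, (6, -1))  # -1 is inferred as 2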

popxl.ops.reshape_(t, shape)

Reshape a tensor (in-place).

This is the in-place version of reshape().

Parameters
  • t (Tensor) – The tensor to be reshaped.

  • shape (Tuple[int, ...]) – Tuple containing the shape of the output.

Raises

ValueError – A ValueError will be raised if an invalid value is encountered in the shape, or if more than one -1 is given in the shape.

Returns

An alias of the input tensor, reshaped.

Return type

Tensor

popxl.ops.roi_align(t, rois, batch_index, output_size, spatial_scale, sampling_ratio)

Apply pooling across each region of interest.

This consumes an input tensor t and regions of interest (ROIs) to apply pooling across each ROI. Only supports average pooling. Max pooling is not supported.

Parameters
  • t (Tensor) – Input data tensor from the previous operator; 4-D feature map of shape (N, C, H, W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data.

  • rois (Tensor) – ROIs to pool over. rois is 2-D input of shape (numRois, 4) given as [[x1, y1, x2, y2], …], where numRois is the number of ROIs. The ROI coordinates are in the coordinate system of the input image. Each coordinate set has a 1:1 correspondence with the batch_index input.

  • batch_index (Tensor) – 1-D tensor of shape [numRois,] with each element denoting the index of the corresponding image in the batch.

  • output_size (Tuple[int]) – Pooled output height and width.

  • spatial_scale (float) – Multiplicative spatial scale factor to translate ROI coordinates from their input spatial scale to the scale used when pooling; that is, the spatial scale of the input feature map t relative to the input image.

  • sampling_ratio (int) – Number of sampling points in the interpolation grid used to compute the output value of each pooled output bin.

Returns

ROI pooled output Y, a 4-D tensor of shape (numRois, channels, aligned_height, aligned_width), where aligned_height is the output height and aligned_width is the output width. The r-th batch element Y[r-1] is a pooled feature map corresponding to the r-th ROI t[r-1].

Return type

Tensor

popxl.ops.scaled_add(X, Y, a=1.0, b=1.0)

Perform a scaled addition of two tensors.

Compute the sum of X scaled by a and Y scaled by b; that is, aX + bY.

Does not apply NumPy broadcasting. Uses mixed precision poplibs operations.

X and Y must be the same shape, but can be different types.

a and b must be scalars.

Parameters
  • X (Tensor) – The tensor to be added.

  • Y (Tensor) – The tensor to be added.

  • a (Union[float, Tensor]) – Scalar that will be multiplied by X before performing the addition.

  • b (Union[float, Tensor]) – Scalar that will be multiplied by Y before performing the addition.

Returns

A tensor containing aX + bY.

Return type

Tensor
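
For example, the out-of-place and in-place forms side by side (a minimal sketch with illustrative values):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    x = popxl.variable(np.ones(4, dtype=np.float32), name="x")
    y = popxl.variable(np.full(4, 2.0, dtype=np.float32), name="y")
    z = ops.scaled_add(x, y, a=0.5, b=0.25)  # z == 0.5*x + 0.25*y == [1.0, ...]
    ops.scaled_add_(x, y, b=0.1)             # in place: x = x + 0.1*y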

popxl.ops.scaled_add_(X, Y, a=1.0, b=1.0)

Perform a scaled addition of two tensors (in-place).

Compute the sum of X scaled by a and Y scaled by b. This is performed in place on X, which means that X = aX + bY.

Does not apply NumPy broadcasting. Uses mixed precision poplibs operations.

X and Y must be the same shape, but can be different types.

Parameters
  • X (Tensor) – The tensor to be added.

  • Y (Tensor) – The tensor to be added.

  • a (Union[float, Tensor]) – Scalar that will be multiplied by X before performing the addition.

  • b (Union[float, Tensor]) – Scalar that will be multiplied by Y before performing the addition.

Returns

The X tensor containing aX + bY.

Return type

Tensor

popxl.ops.scatter(t, indices, values, axis=0, available_memory_proportion=None)

Update the values of multiple elements in a tensor.

The elements specified by indices are updated with the values in values.

scatter requires the three input tensors to be of the same rank r >= 1. The optional attribute axis identifies the axis of the tensor along which the update will be performed. By default, the outer-most axis, axis 0, is used. The output of the operation is produced by creating a copy of the input tensor, t, and then updating its elements to the values specified by values at the index positions specified by indices. The output shape is the same as the shape of the input tensor.

For each entry in values, the target index in t is obtained by combining the corresponding entry in indices with the index of the entry itself: the index-value for dimension = axis is obtained from the value of the corresponding entry in indices and the index-value for dimension != axis is obtained from the index of the entry itself.

Pseudo-code example:

x1 = scatter(x, [1, 2, 3], [-1, -2, -3])
x2 = x.copy()
x2[1] = -1
x2[2] = -2
x2[3] = -3
x1 == x2

See also PyTorch Tensor.scatter.

Parameters
  • t (Tensor) – The input tensor.

  • indices (Tensor) – The indices of the elements to update.

  • values (Tensor) – The values to update the tensor with.

  • axis (int) – Which axis to set on. Default is 0.

  • available_memory_proportion (Optional[float]) – The maximum proportion of available memory on each tile that this layer should consume temporarily during the course of the operation. Defaults to 1.0 if not set globally.

Returns

The tensor with updated values.

Return type

Tensor
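
For example, a rank-1 update along axis 0 (a minimal sketch; note the input x itself is unchanged):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    x = popxl.variable(np.zeros(5, dtype=np.float32))
    idx = popxl.variable(np.array([1, 3], dtype=np.uint32))
    val = popxl.variable(np.array([-1.0, -3.0], dtype=np.float32))
    y = ops.scatter(x, idx, val)  # y == [0, -1, 0, -3, 0]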

popxl.ops.scatter_reduce(data, indices, reduction, initial_values=None, axis=0, axis_size=None, enable_index_broadcast=True, available_memory_proportion=None)
Parameters
Return type

Tensor

popxl.ops.shaped_dropout(t, seed_tensor, shape, ratio)

Add a shaped dropout operation to the input tensor.

Applies a shaped dropout to the input tensor t. This operator requires a shape parameter that is used to define the shape of the dropout mask so that strongly correlated features in the input tensor t can be preserved. The shape parameter must be broadcastable to the input tensor t. The dropout mask is created using samples from a Bernoulli distribution seeded with a seed tensor seed_tensor.

Parameters
  • t (Tensor) – The Tensor to apply the shaped dropout operation to.

  • seed_tensor (Tensor) – The Tensor used to seed the probability distribution which generates the dropout mask. Must have data type uint32 and shape [2,].

  • shape (Iterable[int]) – The shape of the dropout mask. This must be broadcastable to the input tensor.

  • ratio (float) – The probability of dropping an input feature. Default = 0.5.

Returns

A new tensor with the shaped dropout applied.

Return type

Tensor

popxl.ops.sign(t)

Return the sign of each element in the Tensor (-1, 0 or 1). NaN values have a sign of 0.

Parameters

t (Tensor) – Input Tensor.

Returns

The elementwise sign of the input.

Return type

Tensor

popxl.ops.sin(t)

Compute the sine of each element of the input tensor.

See also PyTorch Tensor.sin.

Parameters

t (Tensor) – Input tensor

Returns

Output tensor

Return type

Tensor

popxl.ops.slice(t, start=None, stop=None, step=None, axis=None)

Select elements from a tensor using a slice or multiple slices.

A slice specifies the start (inclusive) and stop (exclusive) index of elements to select. Multiple slices can be specified using a list of items for each parameter (start, stop, step). If step is -1, the slice is performed backwards.

If axis is not specified, each slice will correspond to dimensions 0 to N where N is the number of slices.

Examples:

t == slice(t) == slice(t, axis=1)
slice(t, start=1)  # Slice axis 0 from start index 1
slice(t, start=[1, 2]) == slice(t, start=[1, 2], axis=[0, 1])
slice(t, stop=-2)  # Slice axis 0 up to second last element (exclusive)
slice(
    t, stop=3, step=-1
)  # Slice backwards from last element (inclusive) to third last element (exclusive)

See also ONNX Slice.

Parameters
Returns

A tensor containing the selected slices.

Return type

Tensor

popxl.ops.slice_(t, start=None, stop=None, step=None, axis=None)

Select elements from a tensor, in place, using a slice or multiple slices.

This is the in-place version of slice(). The functionality is the same, but the tensor is sliced in place.

A slice specifies the start (inclusive) and stop (exclusive) index of elements to select. Multiple slices can be specified using a list of items for each parameter (start, stop, step). If step is -1, the slice is performed backwards.

If axis is not specified, each slice will correspond to dimensions 0 to N where N is the number of slices.

Parameters
Returns

An alias of the input tensor containing the selected slices.

Return type

Tensor

popxl.ops.softmax(t, axis)

Normalize the elements of a tensor along specified axes.

This rescales the slices of axis such that all elements are within the range [0, 1] and sum to 1. The output shape and dtype match the input.

See also ONNX Softmax.

Parameters
  • t (Tensor) – Tensor to be normalized.

  • axis (int) – The axis along which the normalization will be computed.

Returns

The normalized tensor.

Return type

Tensor

popxl.ops.split(t, splits, axis=0)

Split a tensor along an axis into a list of tensors.

See also PyTorch Tensor.split, NumPy split, ONNX Split.

Parameters
  • t (Tensor) – Tensor to be split.

  • splits (Union[int, List[int]]) – Either an int which specifies the number of splits or a list of ints specifying the length of each output tensor.

  • axis (int, optional) – Axis to split along. Defaults to 0.

Raises

ValueError – If the split doesn’t equally divide the tensor.

Returns

A list of tensors.

Return type

List[Tensor]
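
For example, both forms of splits (a minimal sketch):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    x = popxl.variable(np.arange(6, dtype=np.float32))
    a, b, c = ops.split(x, 3)            # three tensors of shape (2,)
    d, e = ops.split(x, [2, 4], axis=0)  # shapes (2,) and (4,)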

popxl.ops.split_random_seed(seed, n=2)

Produce n random seeds from an initial seed.

Chaining calls to split_random_seed can be used to ensure unique random behaviour across a program. For example:

seed, s1 = ops.split_random_seed(seed)
y = ops.dropout(x, s1)
seed, s2 = ops.split_random_seed(seed)
z = ops.dropout(y, s2)
Parameters
  • seed (Tensor) – Seed tensor used to be produce new seeds. Must have shape=(2,) and dtype=uint32.

  • n (int, optional) – Number of new seeds to produce. Defaults to 2.

Returns

New random seeds.

Return type

Tuple[Tensor, …]

popxl.ops.sqrt(t)

Compute the square root of the elements of a tensor.

If an element of t is negative, then this will return NaN for that element.

Parameters

t (Tensor) – Input tensor.

Returns

Output tensor containing the elementwise square roots of the input tensor.

Return type

Tensor

popxl.ops.squeeze(t, axes=None)

Remove axes of length one from the tensor.

Takes an optional axes argument listing the axes to squeeze. If axes is not provided, all dimensions of size one will be removed from the shape. If a selected axis has a shape entry not equal to one, an error is raised. Implemented using reshape under the hood.

See also PyTorch Tensor.squeeze, NumPy squeeze, ONNX Squeeze.

Parameters
  • t (Tensor) – Tensor to be squeezed.

  • axes (List[int]) – List of integers indicating the dimensions to squeeze. Negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(t).

Raises

ValueError – A ValueError is raised if the axes contain duplicates or if an axis cannot be squeezed.

Returns

The squeezed tensor.

Return type

Tensor

popxl.ops.sub(lhs, rhs)

Subtract two tensors elementwise.

Follows NumPy broadcasting rules. Arguments must have the same dtype.

See also PyTorch Tensor.sub, ONNX Sub.

Parameters
  • lhs (Tensor) – Tensor to be subtracted from.

  • rhs (Tensor) – Tensor to subtract.

Returns

A tensor containing (lhs - rhs).

Return type

Tensor

popxl.ops.subsample(t, strides)

Subsample a tensor by selecting every nth element along each dimension, where the stride n is provided for each dimension.

Parameters
  • t (Tensor) – The input tensor to subsample.

  • strides (Union[int, Iterable[int]]) – A list of strides for each dimension of the input tensor. If len(strides) < t.rank, the list is padded with ones.

Returns

A subsampled output tensor.

Return type

Tensor

Raises

ValueError – Thrown if the length of the strides list is larger than the rank of the input tensor.

popxl.ops.sum(t, axis=None, keepdims=False)

Sum elements over an axis.

See also PyTorch Tensor.sum, NumPy sum, ONNX Sum.

Parameters
  • t (Tensor) – Tensor to be summed.

  • axis (int or list) – Axis or axes to sum. If none is provided all axes will be reduced. If axis is negative it counts from the last to the first axis.

  • keepdims (bool) – Keep the axis that is being reduced or not.

Returns

The reduced tensor.

Return type

Tensor

popxl.ops.sumsquare(t, axis=None, keepdims=False)

Compute the sum of the squares of tensor elements over an axis.

Parameters
  • t (Tensor) – Tensor to compute the sum of squares from.

  • axis (int or list) – Axis or axes over which to compute the sum of squares. If none is provided all axes will be reduced. If axis is negative it counts from the last to the first axis.

  • keepdims (bool) – Keep the axis that is being reduced or not.

Returns

The reduced tensor.

Return type

Tensor

popxl.ops.swish(t)

Compute the Swish activation of a tensor.

For more details, refer to Rectifier (neural networks).

Parameters

t (Tensor) – The input tensor to calculate the activation of.

Returns

A tensor containing the activations.

Return type

Tensor

popxl.ops.swish_(t)

Compute the Swish activation of a tensor in place.

For more details, refer to Rectifier (neural networks).

Parameters

t (Tensor) – The input tensor to calculate the activation of.

Returns

The input tensor with the Swish activation applied to it.

Return type

Tensor

popxl.ops.tanh(t)

Compute the hyperbolic tangent function elementwise on a tensor.

See also PyTorch Tensor.tanh, NumPy tanh, ONNX Tanh.

Parameters

t (Tensor) – The input tensor.

Returns

A tensor containing the hyperbolic tangent of the elements of the input tensor.

Return type

Tensor

popxl.ops.tied_gather(t, indices, axis=0, available_memory_proportion=None, zero_OOR=False)

Select multiple elements from an array.

Elements are specified by indices, along a specified axis. Equivalent to numpy.take(). Note that this is different from torch.gather().

Numerically the same as the gather op, but does not specify the tile layout of the indices tensor. When preceding a matmul op the tile layout of the indices is determined by the matmul, not the tied_gather. This has a lower memory footprint but costs extra cycles due to the exchange.

Examples:

x = popxl.variable(np.arange(16).reshape(4, 4))
# [[ 0,  1,  2,  3],
#  [ 4,  5,  6,  7],
#  [ 8,  9, 10, 11],
#  [12, 13, 14, 15]]

tied_gather(x, [3, 1, 2]) == Tensor([x[3], x[1], x[2]])
# [[12, 13, 14, 15],
#  [ 4,  5,  6,  7],
#  [ 8,  9, 10, 11]]

tied_gather(x, [[0, 1], [1, 2]]) == tied_gather(x, [0, 1, 1, 2]).reshape(2, 2, 4)
#  [[[ 0,  1,  2,  3],
#    [ 4,  5,  6,  7]],
#   [[ 4,  5,  6,  7],
#    [ 8,  9, 10, 11]]]
Parameters
  • t (Tensor) – The input tensor.

  • indices (Tensor) – The indices of the elements to extract.

  • axis (int) – The axis to gather on. The default is 0.

  • available_memory_proportion (Optional[float]) – The maximum proportion of available memory on each tile that this layer should consume temporarily during the course of the operation. Defaults to 1.0 if not set globally.

  • zero_OOR (bool) – If False, out of range (OOR) indices will produce garbage data. If True, OOR indices will produce zeros.

Returns

The gathered elements concatenated.

Return type

Tensor

popxl.ops.topk(t, k, axis, largest, sorted, available_memory_proportion=None)

Retrieve the top-K largest or smallest elements along a specified axis.

See also PyTorch torch.topk, ONNX TopK.

Parameters
  • t (Tensor) – Input tensor.

  • k (int) – The number of top elements to retrieve

  • axis (int) – Dimension on which to do the sort.

  • largest (bool) – Whether to return the top-K largest or smallest elements.

  • sorted (bool) – Whether to return the elements in sorted order.

  • available_memory_proportion (Optional[float]) – The maximum proportion of available memory on each tile that this layer should consume temporarily during the course of the operation. This value is used by the grad operator only; it is irrelevant for inference. Defaults to 1.0 if not set globally.

Returns

A tuple of output values and indices.

Return type

Tuple[Tensor, Tensor]
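
For example, retrieving the two largest elements of a vector (a minimal sketch):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    x = popxl.variable(np.array([1.0, 4.0, 2.0, 3.0], dtype=np.float32))
    values, indices = ops.topk(x, k=2, axis=0, largest=True, sorted=True)
    # values == [4.0, 3.0], indices == [1, 3]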

popxl.ops.transpose(t, permutation=None)

Permute the axes of a tensor.

By default this operation reverses the axes of t.

See also PyTorch Tensor.transpose, NumPy transpose, ONNX Transpose.

Parameters
  • t (Tensor) – Tensor to be transposed.

  • permutation (Optional[Iterable[int]]) – Iterable containing the permutation of [0, N-1] where N is the rank of input t. If not specified, the axes will be reversed.

Returns

The transposed tensor.

Return type

Tensor

popxl.ops.transpose_(t, permutation=None)

Permute the axes of a tensor in place.

By default this operation reverses the axes of t.

This is the in-place version of transpose(). The behaviour is the same, but it modifies the tensor in place.

See also PyTorch Tensor.transpose_.

Parameters
  • t (Tensor) – Tensor to be transposed.

  • permutation (Optional[Tuple[int, ...]]) – Tuple containing the permutation of [0, N-1] where N is the rank of input t. If not provided, the axes will be reversed.

Returns

The transposed input tensor.

Return type

Tensor

popxl.ops.where(condition, lhs, rhs)

Elementwise selection based on satisfying a condition.

Choose elements from lhs or rhs depending on whether the corresponding element in condition is satisfied or not. The operator supports multi-directional broadcasting (NumPy style).

See also PyTorch Tensor.where, NumPy where, ONNX Where.

Parameters
  • condition (Tensor) – A boolean tensor where True indicates the lhs element and False the rhs element. The tensor will be cast to a bool if necessary.

  • lhs (Tensor) – The left hand side operand.

  • rhs (Tensor) – The right hand side operand.

Returns

The tensor containing elementwise lhs if condition else rhs.

Return type

Tensor

class popxl.ops.collectives.CommGroup

Class to specify sub-groups of replicas.

Examples of derived sub-groups:

  • IPU-link domain sub-rack, where N is a power of two and replicaGroupSize > 1.

  • Complete IPU-link domain / full rack.

  • Using GW-links only.

__init__(*args, **kwargs)

Overloaded function.

  1. __init__(self: popart_internal_ir.CommGroup) -> None

  2. __init__(self: popart_internal_ir.CommGroup, type: popart_internal_ir.CommGroupType, replicaGroupSize: int) -> None

  3. __init__(self: popart_internal_ir.CommGroup, grouping: popart_internal_ir.ReplicaGrouping) -> None

property replicaGroupSize

Replica group size.

toReplicaGrouping(self: popart_internal_ir.CommGroup, numReplicas: int) → popart_internal_ir.ReplicaGrouping
property type

Replica group type.

class popxl.ops.collectives.CommGroupType

PopART equivalent of the GCL CommGroupType. Each of these enumeration constants has a corresponding GCL CommGroupType value.

Members:

All : All replicas viewed as one group; replica group size is ignored.

Consecutive : Groups are consecutive in replica.

If there are N replicas denoted {0, … N-1} and group size is k, then there are N/k groups of size k:

{0, 1, … k-1}, {k, … 2k-1} … {N-k, … N-1}

Orthogonal : Groups are sliced orthogonal to the replica ordering.

If there are N replicas denoted {0, … N-1} and group size is k, then there are m = N/k groups of size k:

{0, m, 2m, …}, {1, m+1, 2m+1, …} … {m-1, 2m-1, … N-1}

Ungrouped : Each replica is in its own group; replica group size is ignored.

All = <CommGroupType.All: 0>
Consecutive = <CommGroupType.Consecutive: 1>
Orthogonal = <CommGroupType.Orthogonal: 2>
Ungrouped = <CommGroupType.Ungrouped: 3>
__init__(self: popart_internal_ir.CommGroupType, value: int) → None
property name
property value
popxl.ops.collectives.all_reduce(ts, ipus=None, op='add')

Allreduce tensors across IPUs within a replica.

Currently only the add reduce op is supported by autodiff.

Parameters
  • ts (List[Tensor]) – Tensors to reduce

  • ipus (Optional[List[int]]) – IPUs the tensors are located on. If None, the op will try to infer them.

  • op (str) – The reducing op. Options: add, mean, mul, min, max, and, or, square_add, local.

Returns

Output tensors. The data of each tensor is identical on the IPUs corresponding to ipus.

Return type

List[Tensor]

popxl.ops.collectives.all_reduce_identical_grad_inputs(ts, ipus=None, op='add')

Allreduce tensors across IPUs within a replica where the grad tensors of the corresponding grad op are identical.

This means that this op is an all-reduce and the corresponding grad op an identity.

Currently only the add reduce op is supported by autodiff.

The AllReduceToIdentityPattern pattern must be run for this op to function correctly.

Parameters
  • ts (List[Tensor]) – Tensors to reduce

  • ipus (Optional[List[int]]) – IPUs the tensors are located on. If None, the op will try to infer them.

  • op (str) – The reducing op. Options: add, mean, mul, min, max, and, or, square_add, local.

Returns

Output tensors. Each tensor's data is identical on the IPUs corresponding to ipus.

Return type

List[Tensor]

popxl.ops.collectives.all_reduce_identical_inputs(ts, ipus=None, op='add')

Allreduce tensors across IPUs within a replica where the input tensors are identical.

This means the op is an identity but the corresponding grad op is an allreduce.

Currently only the add reduce op is supported by autodiff.

The AllReduceToIdentityPattern pattern must be run for this op to function correctly.

Parameters
  • ts (List[Tensor]) – Tensors to reduce

  • ipus (Optional[List[int]]) – IPUs the tensors are located on. If None, the op will try to infer them.

  • op (str) – The reducing op. Options: add, mean, mul, min, max, and, or, square_add, local.

Returns

Output tensors. Each tensor's data is identical on the IPUs corresponding to ipus.

Return type

List[Tensor]

popxl.ops.collectives.replica_sharded_slice(t, group=None)

Take the replicated tensor sharded slice of a Tensor.

Parameters
  • t (Tensor) – Tensor to be sliced. Inputs will be flattened.

  • group (Optional[CommGroup]) – Replicas to shard across. Defaults to All replicas.

Returns

A slice of the tensor. Always a 1D tensor.

Return type

Tensor

popxl.ops.collectives.replicated_all_gather(t, axis=0, group=None, output_shape='auto')

Gather a tensor across replicas such that the output tensor contains the values of the tensor from each replica.

The shape of the output tensor is determined by the value of output_shape:

  • new_axis: the output shape is (group.size, *t.shape)

  • concat: the output is the concatenation of the gathered values along axis

  • meta_shape: the output shape is t.meta_shape

  • auto: if the input has a meta-shape, meta_shape is chosen; otherwise concat

This op is auto-differentiable and its corresponding grad op is a replicated_slice (except when output_shape == meta_shape).

Parameters
  • t (Tensor) – Tensor to be gathered.

  • axis (int) – Axis to gather and concatenate values when using ‘concat’ mode

  • group (Optional[ReplicaGrouping]) – Replicas to gather from. Defaults to All replicas.

  • output_shape (str) – see above for details. Choose ‘new_axis’, ‘concat’, ‘meta_shape’ or ‘auto’.

Returns

Gathered tensor.

Return type

Tensor

Raises

ValueError – if output_shape is not one of ‘new_axis’, ‘concat’, ‘meta_shape’ or ‘auto’.

popxl.ops.collectives.replicated_all_reduce(t, op='add', group=None)

Reduce a tensor across replicas.

Parameters
  • t (Tensor) – Tensor to be reduced

  • op (str, optional) – Operation to reduce with. Defaults to ‘add’. Options: ‘add’, ‘mean’, ‘mul’, ‘min’, ‘max’, ‘and’, ‘or’, ‘square_add’.

  • group (Optional[ReplicaGrouping]) – Replicas to reduce across. Defaults to All replicas.

Returns

Reduced tensor

Return type

Tensor
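
A common use is averaging a gradient across data-parallel replicas (a minimal sketch; it assumes the IR's replication factor has been set):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
ir.replication_factor = 4
with ir.main_graph:
    grad = popxl.variable(np.ones(8, dtype=np.float32), name="grad")
    grad_mean = ops.collectives.replicated_all_reduce(grad, op='mean')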

popxl.ops.collectives.replicated_all_reduce_(t, op='add', group=None)

Reduce a tensor across replicas, in place on t.

Parameters
  • t (Tensor) – Tensor to be reduced

  • op (str, optional) – Operation to reduce with. Defaults to ‘add’. Options: ‘add’, ‘mean’, ‘mul’, ‘min’, ‘max’, ‘and’, ‘or’, ‘square_add’.

  • group (Optional[ReplicaGrouping]) – Replicas to reduce across. Defaults to All replicas.

Returns

Reduced tensor

Return type

Tensor

popxl.ops.collectives.replicated_reduce_scatter(t, op='add', group=None, configure_output_for_replicated_tensor_sharding=False)

Reduce a tensor across replicas with each replica receiving a unique slice of the tensor.

Parameters
  • t (Tensor) – Tensor to be reduced. Inputs will be flattened.

  • op (str, optional) – Operation to reduce with. Defaults to ‘add’. Options: ‘add’, ‘mean’, ‘mul’, ‘min’, ‘max’, ‘and’, ‘or’, ‘square_add’.

  • group (Optional[CommGroup]) – Replicas to reduce across. Defaults to All replicas.

  • configure_output_for_replicated_tensor_sharding (Optional[bool]) – Configures the output to be a replica sharded tensor. Defaults to false. Replicated tensor sharded tensors do not follow the data element order of the original tensor, and can only be used in operations that belong to the same replicated tensor sharding group, where all tensor inputs follow the same data order.

Returns

A slice of the reduced tensor. Always a 1D tensor.

Return type

Tensor

popxl.ops.collectives.replicated_slice(t, axis=0, group=None)

Each replica takes an equal slice of t, split along axis axis. For example, if t has shape (2, 4), there are two replicas and axis == 0, then the first replica will output t[0:1, ...] and the second replica t[1:2, ...].

This op is similar to replica_sharded_slice but differs in that it maintains the output shape and does not configure the output for replicated tensor sharding.

This op is auto-differentiable and its corresponding grad op is a replicated_all_gather.

Parameters
  • t (Tensor) – Tensor to split.

  • axis (int) – Axis to slice along.

  • group (Optional[ReplicaGrouping]) – Replica grouping that determines the group of replicas that slice t.

Returns

A slice of the tensor.

Return type

Tensor

Raises

ValueError – if the group size does not equally divide the axis size
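A sketch mirroring the example in the description, under the two-replica setup above:

with ir.main_graph:
    t = popxl.variable([[0.0, 1.0, 2.0, 3.0],
                        [4.0, 5.0, 6.0, 7.0]], dtype=popxl.float32, name="t")
    # s has shape (1, 4): replica 0 holds t[0:1, ...] and replica 1
    # holds t[1:2, ...].
    s = ops.collectives.replicated_slice(t, axis=0)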

popxl.ops.var_updates.accumulate_(t, X, f=None)

Update (in-place) tensor t given updater values X and a factor f according to t = t + (f * X).

Does not apply NumPy broadcasting. Uses mixed precision PopLibs operations. t and X must have the same shape, but can be different types. f must be scalar.

Parameters
  • t (Tensor) – Tensor to be updated.

  • X (Tensor) – Value to update the tensor with.

  • f (Optional[Union[float, Tensor]]) – Optional scalar to apply to X before the addition.

Returns

An alias to the updated tensor.

Return type

Tensor
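For example, a minimal sketch of accumulating a scaled update, assuming the imports and IR setup from the sketches above:

with ir.main_graph:
    accum = popxl.variable([0.0, 0.0], dtype=popxl.float32, name="accum")
    g = popxl.constant([1.0, 2.0], dtype=popxl.float32)
    # accum = accum + 0.5 * g, so accum becomes [0.5, 1.0].
    ops.var_updates.accumulate_(accum, g, f=0.5)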

popxl.ops.var_updates.accumulate_mean_(t, X, step)

Update (in-place) tensor t given updater values X and a step count step according to t = (step/(step+1)) * t + (1/(step+1)) * X.

Intended to be used to keep track of the mean of a series of values.

For example:

with g:
    accum = popxl.variable(0, dtype=popxl.float32)
    a = popxl.variable(1, dtype=popxl.float32)
    b = popxl.variable(2, dtype=popxl.float32)
    accumulate_mean_(accum, a, 0.0)
    accumulate_mean_(accum, b, 1.0)

will result in accum having the value (a+b)/2 = 1.5.

Does not apply NumPy broadcasting. Uses mixed precision PopLibs operations. t and X must have the same shape, but can be different types. step must be scalar.

Parameters
  • t (Tensor) – Tensor to be updated.

  • X (Tensor) – Value to update the tensor with.

  • step (Union[float, Tensor]) – The number of previously accumulated values.

Returns

An alias to the updated tensor.

Return type

Tensor

popxl.ops.var_updates.accumulate_moving_average_(t, X, f)

Update (in-place) tensor t given updater values X and a factor f according to t = (f * t) + ((1-f) * X).

Does not apply NumPy broadcasting. Uses mixed precision PopLibs operations. t and X must have the same shape, but can be different types. f must be scalar.

Parameters
  • t (Tensor) – Tensor to be updated.

  • X (Tensor) – Value to update the tensor with.

  • f (Union[float, Tensor]) – Scalar to apply before the addition.

Returns

An alias to the updated tensor.

Return type

Tensor
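This op can maintain an exponential moving average such as Adam’s first-order momentum; a sketch, where m and grad are assumed tensors of the same shape:

# m = 0.9 * m + 0.1 * grad
ops.var_updates.accumulate_moving_average_(m, grad, f=0.9)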

popxl.ops.var_updates.accumulate_moving_average_square_(t, X, f)

Update (in-place) tensor t given updater values X and a factor f according to t = (f * t) + ((1-f) * X^2).

Does not apply NumPy broadcasting. Uses mixed precision PopLibs operations. t and X must have the same shape, but can be different types. f must be scalar.

Parameters
  • t (Tensor) – Tensor to be updated.

  • X (Tensor) – Value to update the tensor with.

  • f (Union[float, Tensor]) – Scalar to apply before the addition.

Returns

An alias to the updated tensor.

Return type

Tensor

popxl.ops.var_updates.accumulate_square_(t, X, f=1.0)

Update (in-place) tensor t given updater values X and a factor f according to t = t + (f * X^2).

Does not apply NumPy broadcasting. Uses mixed precision PopLibs operations. t and X must have the same shape, but can be different types. f must be scalar.

Parameters
  • t (Tensor) – Tensor to be updated.

  • X (Tensor) – Value to update the tensor with.

  • f (Union[float, Tensor]) – Optional scalar to apply to X^2 before the addition. Defaults to 1.0.

Returns

An alias to the updated tensor.

Return type

Tensor

popxl.ops.var_updates.accumulator_scale_(t, f)

Scale a tensor in-place.

This op will directly zero the input tensor if the factor is const and 0.

Does not apply NumPy broadcasting. Uses mixed precision PopLibs operations.

Parameters
  • t (Tensor) – Tensor to be updated.

  • f (Union[float, Tensor]) – The scalar to multiply t by. If f is a float this will be a constant multiplication, otherwise t is multiplied elementwise by the values of the (non-const) tensor f.

Returns

An alias to the updated tensor.

Return type

Tensor
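A sketch, where accum is an assumed accumulator tensor:

# Halve the accumulator in place. A float factor compiles to a constant
# multiplication; a Tensor factor is read at runtime.
ops.var_updates.accumulator_scale_(accum, 0.5)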

popxl.ops.var_updates.accumulator_zero_(t)

Zero the input tensor.

This is implemented as an AccumulatorScaleOp with a factor of 0, which zeroes the input tensor.

Parameters

t (Tensor) – Tensor to be zeroed.

Returns

An alias to the input tensor.

Return type

Tensor

popxl.ops.var_updates.adam_updater(acc_first_order, acc_second_order, weight=None, time_step=None, weight_decay=None, beta1=None, beta2=None, epsilon=1e-07)

Calculate an updater term to update the weights for Adam.

Accumulated bias corrected first order momentum (FP16/FP32) mc:

mc = m / (1 - b1 ** t)

Without correction:

mc = m

Accumulated bias corrected second order momentum (FP16/FP32) vc:

vc = v / (1 - b2 ** t)

Without correction:

vc = v

Updater term (FP16/FP32, with weight decay mode ‘decay’ and wd > 0.0) x:

x = mc / (sqrt(vc) + eps) + wd * w

Updater term (FP16/FP32, without weight decay) x:

x = mc / (sqrt(vc) + eps)

Note

time_step will be incremented by 1.

Parameters
  • acc_first_order (Tensor) – First order momentum (FP16/FP32) (m).

  • acc_second_order (Tensor) – Second order momentum (FP16/FP32) (v).

  • weight (Optional[Tensor]) – Weight (w). Only required for weight_decay.

  • time_step (Optional[Tensor]) – Time step (t). Providing this tensor enables bias correction.

  • weight_decay (Optional[Union[float, Tensor]]) – Optional scalar to apply weight decay. Defaults to None.

  • beta1 (Optional[Union[float, Tensor]]) – Only required in bias correction for m. Defaults to None.

  • beta2 (Optional[Union[float, Tensor]]) – Only required in bias correction for v. Defaults to None.

  • epsilon (Union[float, Tensor]) – Scalar to calculate updater. Defaults to 1e-07.

Raises
  • ValueError – If weight_decay is set and weight is None.

  • ValueError – If time_step is set to None and beta1 and beta2 are not set (no bias correction can take place).

Returns

An updater to update the weight for Adam.

Return type

Tensor
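The updater term composes with the accumulation ops above and scaled_add_ into a full Adam step. A sketch, where w, grad, m, v and t are assumed tensors (weight, gradient, FP32 momenta and time step) and the hyperparameters are illustrative:

# One Adam step for weight w with gradient grad.
ops.var_updates.accumulate_moving_average_(m, grad, f=0.9)  # m = b1*m + (1-b1)*grad
ops.var_updates.accumulate_moving_average_square_(v, grad, f=0.999)  # v = b2*v + (1-b2)*grad^2
x = ops.var_updates.adam_updater(
    m, v, weight=w, time_step=t,
    weight_decay=1e-2, beta1=0.9, beta2=0.999, epsilon=1e-7)
ops.scaled_add_(w, x, b=-1e-3)  # w -= lr * x, with lr = 1e-3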

popxl.ops.var_updates.adam_var_update(t, x, r1, r2, learning_rate=None, max_weight_norm=None)

Calculate the updated weight tensor for Adam or LAMB.

  • x = updater term (see adam_updater())

  • lr = learning rate

  • max_weight_norm = maximum weight norm (cf. \(\phi\), the scaling function in the LAMB paper)

  • r1 = (LAMB) L2 norm of the weight (w)

  • r2 = (LAMB) L2 norm of the updater term (x)

Lamb r1 (FP32): \(r1 = \|w\|_2\) (without LAMB, or if \(\phi(r1) = 0\), the trust ratio \(r1/r2\) is set to 1).

Special case for replicated weight sharding: every replica stores only a shard of w, so the sum of squares is computed on each replica and then all-reduced before every replica takes the square root of the squared r1 value.

Lamb r2 (FP32): \(r2 = \|x\|_2\) (without LAMB, or if \(r2 = 0\), the trust ratio \(r1/r2\) is set to 1).

Special case for replicated weight sharding: every replica stores only a shard of x, so the sum of squares is computed on each replica and then all-reduced before every replica takes the square root of the squared r2 value.

Scale factor: \(\phi(r1) = \min(r1, \text{max\_weight\_norm})\)

Variable update: \(w \leftarrow w - (\phi(r1) / r2) \cdot lr \cdot x\), where \(\phi(r1) / r2\) is the LAMB trust ratio.

Parameters
  • t (Tensor) – The weight to update.

  • x (Tensor) – The updater term.

  • r1 (Tensor) – The r1 squared input tensor.

  • r2 (Tensor) – The r2 squared input tensor.

  • learning_rate (Optional[Union[float, Tensor]]) – Optional learning rate tensor to use. Will be constant if this argument is a float or None. Defaults to None.

  • max_weight_norm (Optional[Union[float, Tensor]]) – Optional max weight norm tensor to use. Will be constant if this argument is a float or None. Defaults to None.

Returns

The updated weight tensor.

Return type

Tensor
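Note that r1 and r2 are supplied as squared L2 norms. A sketch of a LAMB-style update, where w is the weight, x is the updater term from lamb_updater(), and ops.sum is assumed to reduce over all elements:

r1 = ops.sum(w * w)  # squared L2 norm of the weight
r2 = ops.sum(x * x)  # squared L2 norm of the updater term
ops.var_updates.adam_var_update(w, x, r1, r2,
                                learning_rate=1e-3, max_weight_norm=10.0)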

popxl.ops.var_updates.adamax_updater(acc_first_order, acc_second_order, weight=None, time_step=None, weight_decay=None, beta1=0.9, epsilon=1e-07)

Calculate an updater term to update the weights for Adamax.

Accumulated bias corrected first order momentum (FP16/FP32) mc:

mc = m / (1 - b1 ** t)

Updater term (FP16/FP32, with weight decay mode ‘decay’ and wd > 0.0) x:

x = mc / (vc + eps) + wd * w

Updater term (FP16/FP32, without weight decay) x:

x = mc / (vc + eps)

Note

time_step will be incremented by 1.

Parameters
  • acc_first_order (Tensor) – First order momentum (FP16/FP32) (m).

  • acc_second_order (Tensor) – Second order momentum (FP16/FP32) (v).

  • weight (Optional[Tensor]) – Weight (w). Only required for weight_decay.

  • time_step (Optional[Tensor]) – Time step (t).

  • weight_decay (Optional[Union[float, Tensor]]) – Optional scalar to apply weight decay. Defaults to None.

  • beta1 (Union[float, Tensor]) – Scalar to do bias correction for m. Defaults to 0.9.

  • epsilon (Union[float, Tensor]) – Scalar to calculate updater. Defaults to 1e-07.

Returns

An updater to update the weight for Adamax.

Return type

Tensor

popxl.ops.var_updates.copy_var_update_(t, X)

Update a tensor in-place by copying the tensor containing the updater values.

Parameters
  • t (Tensor) – Tensor to be updated.

  • X (Tensor) – Value to update the tensor with.

Returns

An alias to the updated variable.

Return type

Tensor

popxl.ops.var_updates.lamb_updater(acc_first_order, acc_second_order, weight=None, time_step=None, weight_decay=None, beta1=None, beta2=None, epsilon=1e-07)

Calculate an updater term to update the weights for LAMB.

Accumulated bias corrected first order momentum (FP16/FP32) mc:

mc = m / (1 - b1 ** t) (without correction: mc = m)

Accumulated bias corrected second order momentum (FP16/FP32) vc:

vc = v / (1 - b2 ** t) (without correction: vc = v)

Updater term (FP16/FP32, with weight decay mode ‘decay’ and wd > 0.0) x:

x = mc / (sqrt(vc) + eps) + wd * w

Updater term (FP16/FP32, without weight decay) x:

x = mc / (sqrt(vc) + eps)

Note

time_step will be incremented by 1.

Parameters
  • acc_first_order (Tensor) – First order momentum (FP16/FP32) (m).

  • acc_second_order (Tensor) – Second order momentum (FP16/FP32) (v).

  • weight (Optional[Tensor], optional) – Weight (w). Only required for weight_decay. Defaults to None.

  • time_step (Optional[Tensor], optional) – Time step (t). Providing this tensor enables bias correction. Defaults to None.

  • weight_decay (Optional[Union[float, Tensor]], optional) – Optional scalar to apply weight decay. Defaults to None.

  • beta1 (Optional[Union[float, Tensor]], optional) – Only required in bias correction for m. Defaults to None.

  • beta2 (Optional[Union[float, Tensor]], optional) – Only required in bias correction for v. Defaults to None.

  • epsilon (Union[float, Tensor], optional) – Scalar to calculate updater. Defaults to 1e-07.

Raises
  • ValueError – If weight_decay is set and weight is None.

  • ValueError – If time_step is set to None and beta1 and beta2 are not set (no bias correction can take place).

Returns

An updater to update the weight for LAMB.

Return type

Tensor

popxl.ops.var_updates.sparse_accumulate_(t, X, indices, axis=0, f=None, W=None)

Apply a sparse accumulate operation to a tensor.

Does not apply NumPy broadcasting. Uses mixed precision PopLibs operations. t and X must have the same shape, but can be different types.

Detail:

Assume you have:

w -> Gather -> x

and when the optimiser step is grown:

dW <- GatherGrad <- x
   \
    Accumulate -> accum'
   /
accum

GatherGrad is essentially a scatter operation. We then Accumulate the resultant dW into accum. This involves creating an extra dW tensor, so we can instead do the following:

            x
            |
            V
accum -> SparseAccumulate -> accum'

SparseAccumulate accumulates the slices of x into accum as required, in one operation, without requiring extra memory.

When calling this op, the input tensor W is optional. It can be used when two different views of the weight are consumed in the forward pass, and one of those ops is a Gather, thus requiring a SparseAccumulate in the weight update step.

We connect the op to the other view of the weight instead of the view this SparseAccumulate is for. The lowering will then clone that tensor (and its tile layout) when creating accum.

Parameters
  • t (Tensor) – Tensor to be updated.

  • X (Tensor) – Value to update the tensor with.

  • indices (Tensor) – The indices of the scatter operation.

  • axis (int, optional) – The axis to scatter along. Defaults to 0.

  • f (Optional[Union[float, Tensor]], optional) – Optional scalar to apply to X before the addition. Defaults to None.

  • W (Optional[Tensor], optional) – Tile-mapping reference tensor for t to be cloned from. Defaults to None.

Returns

An alias to the updated tensor.

Return type

Tensor
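A sketch of accumulating an embedding gradient without materialising the dense dW, where accum matches the embedding table’s shape, dx holds the gradients of the gathered rows, and indices holds the row indices used by the forward-pass Gather:

# Accumulate the row gradients dx into the matching rows of accum.
ops.var_updates.sparse_accumulate_(accum, dx, indices, axis=0)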