17.11. Tensors
- class popxl.Tensor
- property T: popxl.tensor.Tensor
Return the tensor transposed with reversed axes.
- property T_: popxl.tensor.Tensor
Return the tensor transposed with reversed axes in-place.
- __init__()
Representation of a tensor.
- copy_to_ipu(destination, source=None)
Copy a tensor to an IPU.
- property dtype: popxl.dtypes.dtype
- property in_sync_with_ipu: bool
Check whether the host-side buffer data is in sync with the data on the IPU device.
This only applies to variable tensors, which can become out of sync if
session.weights_from_host
and session.weights_to_host
are not called. Without a transfer from device to host and vice versa, the host buffers and the data on the IPU can fall out of sync after either is updated.
- property ipu: int
Return the IPU that the tensor is assigned to.
- Raises
UndefinedValue – If the IPU is undefined.
- property location_info
- property meta_shape: Tuple[int, ...]
Return the meta shape of the tensor.
The meta shape of the tensor can be used, for example, to store the original tensor shape before replicated tensor sharding was applied.
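For example (a NumPy sketch, not popxl API): with replicated tensor sharding over four replicas, a (4, 4) weight may be held as flat (4,) shards on each replica, while the meta shape records the original (4, 4):

```python
import numpy as np

replicas = 4
w = np.arange(16, dtype=np.float32).reshape(4, 4)  # original weight
meta_shape = w.shape                               # (4, 4), kept as metadata
shards = np.split(w.reshape(-1), replicas)         # one flat shard per replica
print(shards[0].shape)  # (4,)
print(meta_shape)       # (4, 4)
```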
- reshape(shape)
Return ops.reshape(self, shape).
- reshape_(shape)
Return ops.reshape_(self, shape), the in-place version of reshape().
- property spec: popxl.tensor.TensorSpec
Return a TensorSpec instance using the properties of this tensor.
- strides(shape=None)
Get the strides of the tensor.
The strides of the tensor are the number of bytes to step in each dimension when traversing an array in memory. See numpy.ndarray.strides.
- Returns
The strides of the tensor.
- Return type
List[int]
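The computation matches numpy.ndarray.strides, referenced above; for instance:

```python
import numpy as np

# A 2x3 float32 array: each element occupies 4 bytes.
a = np.zeros((2, 3), dtype=np.float32)
# Stepping one row crosses 3 elements (12 bytes); one column, 4 bytes.
print(a.strides)  # (12, 4)
```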
- property tile_set: Literal['compute', 'io']
Return the tile set (compute or io) that the tensor is assigned to.
- Raises
UndefinedValue – If the tile set is undefined.
- transpose(permutation=None)
Permute the axes of a tensor.
By default this operation reverses the axes of the tensor.
- transpose_(permutation=None)
Permute the axes of a tensor in place.
By default this operation reverses the axes of the tensor.
This is the in-place version of transpose(). The behaviour is the same, but it modifies the tensor in place.
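The default axis reversal and explicit permutations follow the same convention as numpy.transpose; a NumPy sketch (not popxl code) of the shapes involved:

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)
# Default: axes reversed, (2, 3, 4) -> (4, 3, 2).
print(x.transpose().shape)         # (4, 3, 2)
# Explicit permutation (2, 0, 1): new shape is (4, 2, 3).
print(x.transpose(2, 0, 1).shape)  # (4, 2, 3)
```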
- popxl.constant(data, dtype=None, name=None, downcast=True)
Return a constant tensor.
A constant tensor that is initialised with data during graph creation.
This tensor cannot change during the runtime of a model. The intended use of this class is when doing operations between popxl.Tensor instances and other types, such as numpy.ndarray objects, numbers, or lists or tuples of numbers.
Example:
import popxl

ir = popxl.Ir()
with ir.main_graph:
    a = popxl.constant(0)
    # The `1` will be implicitly converted to a `Constant`.
    b = a + 1
- Parameters
data (HostScalarTensor) – The data used to initialise the tensor. This can be an np.ndarray, or a value NumPy can use to construct an np.ndarray.
dtype (Optional[dtype]) – The data type of the tensor to be created. If not specified, NumPy will infer the data type and downcast to 32 bits if necessary.
name (Optional[str]) – The name of the tensor. Defaults to None.
downcast (bool) – Whether or not to downcast the data to the given dtype.
- Return type
Constant
- popxl.remote_replica_sharded_variable(data, remote_buffer, offset=0, dtype=None, name=None, downcast=True, replica_grouping=None, retrieval_mode='one_per_group')
Create a variable Tensor that is stored in remote memory.
The variable is scattered in equal shards across replicas (replicated tensor sharding (RTS) data parallelism) of the same model/graph. This eliminates redundant data storage when the full (un-sharded) tensor does not need to be present on each IPU. The full tensor is stored in remote memory (usually DDR memory).
Replicated tensors for which each replica needs a full copy need to be recombined with a replicated AllGather operation.
Fully updated tensors that need to be sharded and/or reduced again require a replicated ReduceScatter operation.
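As an illustrative NumPy sketch (not popxl API) of the scatter/gather relationship described above:

```python
import numpy as np

replicas = 4
full = np.arange(8, dtype=np.float32)   # full (un-sharded) variable
shards = np.split(full, replicas)       # one equal shard per replica (RTS)
# A replicated AllGather recombines the shards into a full copy
# on every replica.
gathered = np.concatenate(shards)
print(np.array_equal(gathered, full))   # True
```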
- Parameters
data (HostScalarTensor) – The data used to initialise the tensor. This can be an np.ndarray, or a value numpy can use to construct an np.ndarray.
remote_buffer (RemoteBuffer) – The handle to the remote buffer.
offset (int) – The offset to index the tensor shard in the remote tensor.
dtype (Optional[dtype]) – The data type of the tensor to be created. If not specified, NumPy will infer the data type and downcast to 32 bits if necessary.
name (Optional[str]) – The name of the tensor. Defaults to None.
downcast (bool) – If no dtype is provided, 64-bit floats/ints will be downcast to 32-bit variants. Defaults to True.
replica_grouping (Optional[ReplicaGrouping]) – The grouping of replicas to use when getting and setting variable values. Generally it makes sense to group replicas together that are guaranteed to agree on value based on the collective operations you add to the IR. Replicas within a group are always initialised with a shared value and by default, when retrieving values from replicas, only one value is returned per group. By default all replicas are in one group.
retrieval_mode (Optional[Literal["one_per_group", "all_replicas"]]) –
One of:
"one_per_group": Return only the first replica's variable per group; this is the default behaviour.
"all_replicas": Return all replicas' variables in every group.
Defaults to "one_per_group".
- Raises
RuntimeError – If a non-default replica group is used.
ValueError – If replication has not been enabled.
ValueError – If the number of elements of var is not divisible by the number of replicas.
ValueError – If the variable shape or dtype does not match the remote buffer's.
- Returns
The remote sharded variable.
- Return type
Variable
- popxl.remote_variable(data, remote_buffer, offset=0, dtype=None, name=None, downcast=True, replica_grouping=None, retrieval_mode='one_per_group')
Create a variable Tensor that is stored in remote memory.
- Parameters
data (HostScalarTensor) – The data used to initialise the tensor. This can be an np.ndarray, or a value numpy can use to construct an np.ndarray.
remote_buffer (RemoteBuffer) – The remote buffer to store the variable.
offset (int) – The offset into the entries of the remote buffer to store the variable. Defaults to 0.
dtype (Optional[dtype]) – The data type of the tensor to be created. If not specified, NumPy will infer the data type and downcast to 32 bits if necessary.
name (Optional[str]) – The name of the tensor. Defaults to None.
downcast (bool) – If no dtype is provided, 64-bit floats/ints will be downcast to 32-bit variants. Defaults to True.
replica_grouping (Optional[ReplicaGrouping]) – The grouping of replicas to use when getting and setting variable values. Generally it makes sense to group replicas together that are guaranteed to agree on value based on the collective operations you add to the IR. Replicas within a group are always initialised with a shared value and by default, when retrieving values from replicas, only one value is returned per group. By default all replicas are in one group.
retrieval_mode (Optional[Literal["one_per_group", "all_replicas"]]) –
One of:
"one_per_group": Return only the first replica's variable per group; this is the default behaviour.
"all_replicas": Return all replicas' variables in every group.
Defaults to "one_per_group".
- Raises
RuntimeError – If a non-default replica group is used.
ValueError – If the variable shape or dtype does not match remote buffer’s.
- Returns
The remote variable.
- Return type
Variable
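Conceptually, the remote buffer behaves like a table of equally shaped entries, with offset selecting the entry that holds this variable. A NumPy sketch (not popxl API; the sizes are illustrative):

```python
import numpy as np

entries, entry_size = 4, 8  # hypothetical remote buffer layout
remote_buffer = np.zeros((entries, entry_size), dtype=np.float32)
var = np.ones(entry_size, dtype=np.float32)
offset = 2
remote_buffer[offset] = var      # the variable occupies entry 2
print(remote_buffer[offset, 0])  # 1.0
```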
- popxl.replica_sharded_variable(data, dtype=None, name=None, downcast=True, replica_grouping=None, shard_over=None, retrieval_mode='one_per_group')
Scatter a tensor in equal shards across replicas (data parallelism) of the same model/graph.
Eliminates redundant data storage when the full (un-sharded) tensor does not need to be present on each IPU. Does not store the full tensor in remote memory.
- Parameters
data (HostScalarTensor) – The data used to initialise the tensor. This can be an np.ndarray, or a value numpy can use to construct an np.ndarray.
dtype (Optional[dtype]) – The data type of the tensor to be created. If not specified, NumPy will infer the data type and downcast to 32 bits if necessary.
name (Optional[str]) – The name of the tensor. Defaults to None.
downcast (bool) – If no dtype is provided, 64-bit floats/ints will be downcast to 32-bit variants. Defaults to True.
replica_grouping (Optional[ReplicaGrouping]) – The grouping of replicas to use when getting and setting variable values. Generally it makes sense to group replicas together that are guaranteed to agree on value based on the collective operations you add to the IR. Replicas within a group are always initialised with a shared value and by default, when retrieving values from replicas, only one value is returned per group. By default all replicas are in one group.
shard_over (Optional[int], optional) – The number of replicas in each replica group to shard over. Defaults to all replicas in the group. Note, when there are multiple instances, this group can span instances. If the replica grouping size is 4, and shard_over is 4, the value of the variable for each group is sharded over all 4 replicas in that group. If the replica grouping size is 4, and shard_over is 2, the value of each group will be sharded once over group members 0 and 1, and once over group members 2 and 3. The replica grouping size must be divisible by shard_over.
retrieval_mode (Optional[Literal["one_per_group", "all_replicas"]]) –
One of:
"one_per_group": Return only the first replica's variable per group; this is the default behaviour.
"all_replicas": Return all replicas' variables in every group.
Defaults to "one_per_group".
- Returns
A tuple of tensors:
The full variable. This should NOT be used directly. It can be used to interact with Session's get/set data methods.
The sharded variable.
- Return type
Tuple[Variable, Tensor]
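The shard_over parameter above can be pictured with a NumPy sketch (not popxl API): with a replica group of size 4 and shard_over=2, the group's value is split into two shards, and each half of the group holds one copy of each shard:

```python
import numpy as np

value = np.arange(8, dtype=np.float32)  # the group's variable value
shard_over = 2
shards = np.split(value, shard_over)    # two equal shards
# Group members 0..3 hold: shard 0, shard 1, shard 0, shard 1.
per_member = shards * 2
print(np.array_equal(per_member[0], per_member[2]))  # True
```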
- popxl.graph_input(shape, dtype, name=None, by_ref=False, meta_shape=None)
Create a new input tensor to the current graph.
You can use this function when defining a graph to create a new input tensor. When you call that graph, you will have to pass a tensor to the graph for this input.
Example:
import popxl
from popxl import ops

def add_w(x):
    w = popxl.graph_input(x.shape, x.dtype, "w")
    return w + x

ir = popxl.Ir()
with ir.main_graph:
    w = popxl.variable(1)
    x = popxl.variable(3)
    add_w_graph = ir.create_graph(add_w, x)
    (y,) = ops.call(add_w_graph, x, w)
- Parameters
shape (Iterable[int]) – The shape of the input tensor.
dtype (dtype) – The data type of the input tensor.
name (Optional[str]) – The name of the input tensor. Defaults to None.
by_ref (bool) – Whether the input should be passed by reference. Defaults to False.
meta_shape (Optional[Iterable[int]]) – The meta shape of the input tensor.
- Returns
The created input tensor.
- Return type
Tensor
- popxl.graph_output(t)
Mark a tensor as an output in the current graph.
You can use this function when defining a graph to mark an existing tensor in the graph as an output. When you call that graph, it will return that tensor in the parent graph.
Example:
import popxl
from popxl import ops

def add_w(x):
    w = popxl.graph_input(x.shape, x.dtype, "w")
    y = w + x
    popxl.graph_output(y)

ir = popxl.Ir()
with ir.main_graph:
    w = popxl.variable(1)
    x = popxl.variable(3)
    add_w_graph = ir.create_graph(add_w, x)
    (y,) = ops.call(add_w_graph, x, w)
- Parameters
t (Tensor) – The graph tensor to mark as an output in the current graph.
- Raises
ValueError – If the tensor is not in the current graph.
- Return type
None
- popxl.variable(data, dtype=None, name=None, downcast=True, replica_grouping=None, retrieval_mode=None)
Create a variable tensor that is initialised with data during graph creation.
This tensor can be used to represent a model weight or any other parameter that can change while running a model.
Must be created in the main graph scope. Example:
import popxl

with popxl.Ir().main_graph:
    a = popxl.variable(0)
- Parameters
data (HostScalarTensor) – The data used to initialise the tensor. This can be an np.ndarray, or a value NumPy can use to construct an np.ndarray.
dtype (Optional[dtype]) – The data type of the tensor to be created. If not specified NumPy will infer the data type and downcast to 32 bits if necessary.
name (Optional[str]) – The name of the tensor. Defaults to None.
downcast (bool) – If True and no dtype is provided, 64-bit floats/ints will be downcast to 32-bit variants. Defaults to True.
replica_grouping (Optional[ReplicaGrouping]) – The grouping of replicas to use when getting and setting variable values. Generally it makes sense to group replicas together that are guaranteed to agree on value based on the collective operations you add to the IR. Replicas within a group are always initialised with a shared value and by default, when retrieving values from replicas, only one value is returned per group. By default all replicas are in one group.
retrieval_mode (Optional[Literal["one_per_group", "all_replicas"]]) –
One of:
"one_per_group": Return only the first replica's variable per group.
"all_replicas": Return all replicas' variables in every group.
Defaults to None.
- Raises
RuntimeError – If a non-default replica group is used.
ValueError – If the variable is initialised within a graph other than the main graph.
- Returns
The desired variable.
- Return type
Variable