17.11. Tensors
- class popxl.Tensor
- property T: popxl.tensor.Tensor
Return the tensor transposed with reversed axes.
- property T_: popxl.tensor.Tensor
Return the tensor transposed with reversed axes in-place.
- __init__()
Representation of a tensor.
- copy_to_ipu(destination, source=None)
Copy a tensor to an IPU.
- property dtype: popxl.dtypes.dtype
- property in_sync_with_ipu: bool
Check whether the host-side buffer data is in sync with the data on the IPU device.
This only applies to variable tensors, which can become out of sync if
session.weights_from_host
and session.weights_to_host
are not called. Without a transfer from device to host and vice versa, the host buffers and the data on the IPU can fall out of sync after either is updated.
- property ipu: int
Return the IPU that the tensor is assigned to.
- Raises
UndefinedValue – If the IPU is undefined.
- property location_info
- property meta_shape: Tuple[int, ...]
Return the meta shape of the tensor.
The meta shape of the tensor can be used, for example, to store the original tensor shape before replicated tensor sharding was applied.
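For example (a NumPy sketch, not popxl API): with replicated tensor sharding over four replicas, a (4, 4) weight may be held as flat (4,) shards on each replica, while the meta shape records the original (4, 4):

```python
import numpy as np

replicas = 4
w = np.arange(16, dtype=np.float32).reshape(4, 4)  # original weight
meta_shape = w.shape                               # (4, 4), kept as metadata
shards = np.split(w.reshape(-1), replicas)         # one flat shard per replica
print(shards[0].shape)  # (4,)
print(meta_shape)       # (4, 4)
```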
- reshape(shape)
Return ops.reshape(self, shape).
- reshape_(shape)
Return ops.reshape_(self, shape), the in-place version of reshape().
- property spec: popxl.tensor.TensorSpec
Return a TensorSpec instance using the properties of this tensor.
- strides(shape=None)
Get the strides of the tensor.
The strides of the tensor are the number of bytes to step in each dimension when traversing an array in memory. See numpy.ndarray.strides.
- Returns
The strides of the tensor.
- Return type
List[int]
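The computation matches numpy.ndarray.strides, referenced above; for instance:

```python
import numpy as np

# A 2x3 float32 array: each element occupies 4 bytes.
a = np.zeros((2, 3), dtype=np.float32)
# Stepping one row crosses 3 elements (12 bytes); one column, 4 bytes.
print(a.strides)  # (12, 4)
```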
- property tile_set: Literal['compute', 'io']
Return the tile set (compute or io) that the tensor is assigned to.
- Raises
UndefinedValue – If the tile set is undefined.
- transpose(permutation=None)
Permute the axes of a tensor.
By default this operation reverses the axes of the tensor.
- transpose_(permutation=None)
Permute the axes of a tensor in place.
By default this operation reverses the axes of the tensor.
This is the in-place version of transpose(). The behaviour is the same, but it modifies the tensor in place.
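The default axis reversal and explicit permutations follow the same convention as numpy.transpose; a NumPy sketch (not popxl code) of the shapes involved:

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)
# Default: axes reversed, (2, 3, 4) -> (4, 3, 2).
print(x.transpose().shape)         # (4, 3, 2)
# Explicit permutation (2, 0, 1): new shape is (4, 2, 3).
print(x.transpose(2, 0, 1).shape)  # (4, 2, 3)
```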
- popxl.constant(data, dtype=None, name=None, downcast=True)
Return a constant tensor.
A constant tensor that is initialised with data during graph creation.
This tensor cannot change during the runtime of a model. The intended use of this class is when doing operations between popxl.Tensor instances and other types, such as numpy.ndarray objects, numbers, or lists or tuples of numbers.
Example:
import popxl

ir = popxl.Ir()
with ir.main_graph:
    a = popxl.constant(0)
    # The `1` will be implicitly converted to a `Constant`.
    b = a + 1
- Parameters
data (HostScalarTensor) – The data used to initialise the tensor. This can be an np.ndarray, or a value NumPy can use to construct an np.ndarray.
dtype (Optional[dtype]) – The data type of the tensor to be created. If not specified, NumPy will infer the data type and downcast to 32 bits if necessary.
name (Optional[str]) – The name of the tensor. Defaults to None.
downcast (bool) – Whether or not to downcast the data to the given dtype.
- Return type
Constant
- popxl.remote_replica_sharded_variable(data, remote_buffer, offset=0, dtype=None, name=None, downcast=True, replica_grouping=None, retrieval_mode='one_per_group')
Create a variable Tensor that is stored in remote memory.
The variable is scattered in equal shards across replicas (replicated tensor sharding (RTS) data parallelism) of the same model/graph. This eliminates redundant data storage when the full (un-sharded) tensor does not need to be present on each IPU. The full tensor is stored in remote memory (usually DDR memory).
Replicated tensors for which each replica needs a full copy need to be recombined with a replicated AllGather operation.
Fully updated tensors that need to be sharded and/or reduced again require a replicated ReduceScatter operation.
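As an illustrative NumPy sketch (not popxl API) of the scatter/gather relationship described above:

```python
import numpy as np

replicas = 4
full = np.arange(8, dtype=np.float32)   # full (un-sharded) variable
shards = np.split(full, replicas)       # one equal shard per replica (RTS)
# A replicated AllGather recombines the shards into a full copy
# on every replica.
gathered = np.concatenate(shards)
print(np.array_equal(gathered, full))   # True
```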
- Parameters
data (HostScalarTensor) – The data used to initialise the tensor. This can be an np.ndarray, or a value numpy can use to construct an np.ndarray.
remote_buffer (RemoteBuffer) – The handle to the remote buffer.
offset (int) – The offset to index the tensor shard in the remote tensor.
dtype (Optional[dtype]) – The data type of the tensor to be created. If not specified, NumPy will infer the data type and downcast to 32 bits if necessary.
name (Optional[str]) – The name of the tensor. Defaults to None.
downcast (bool) – If no dtype is provided, 64-bit floats/ints will be downcast to 32-bit variants. Defaults to True.
replica_grouping (Optional[ReplicaGrouping]) – The grouping of replicas to use when getting and setting variable values. Generally it makes sense to group replicas together that are guaranteed to agree on value based on the collective operations you add to the IR. Replicas within a group are always initialised with a shared value and by default, when retrieving values from replicas, only one value is returned per group. By default all replicas are in one group.
retrieval_mode (Optional[Literal["one_per_group", "all_replicas"]]) –
One of:
"one_per_group": Return only the first replica's variable per group; this is the default behaviour.
"all_replicas": Return all replicas' variables in every group.
Defaults to "one_per_group".
- Raises
RuntimeError – If a non-default replica group is used.
ValueError – If replication has not been enabled.
ValueError – If the number of elements of var is not divisible by the number of replicas.
ValueError – If the variable shape or dtype does not match the remote buffer's.
- Returns
The remote sharded variable.
- Return type
Variable
- popxl.remote_variable(data, remote_buffer, offset=0, dtype=None, name=None, downcast=True, replica_grouping=None, retrieval_mode='one_per_group')
Create a variable Tensor that is stored in remote memory.
- Parameters
data (HostScalarTensor) – The data used to initialise the tensor. This can be an np.ndarray, or a value numpy can use to construct an np.ndarray.
remote_buffer (RemoteBuffer) – The remote buffer to store the variable.
offset (int) – The offset into the entries of the remote buffer to store the variable. Defaults to 0.
dtype (Optional[dtype]) – The data type of the tensor to be created. If not specified, NumPy will infer the data type and downcast to 32 bits if necessary.
name (Optional[str]) – The name of the tensor. Defaults to None.
downcast (bool) – If no dtype is provided, 64-bit floats/ints will be downcast to 32-bit variants. Defaults to True.
replica_grouping (Optional[ReplicaGrouping]) – The grouping of replicas to use when getting and setting variable values. Generally it makes sense to group replicas together that are guaranteed to agree on value based on the collective operations you add to the IR. Replicas within a group are always initialised with a shared value and by default, when retrieving values from replicas, only one value is returned per group. By default all replicas are in one group.
retrieval_mode (Optional[Literal["one_per_group", "all_replicas"]]) –
One of:
"one_per_group": Return only the first replica's variable per group; this is the default behaviour.
"all_replicas": Return all replicas' variables in every group.
Defaults to "one_per_group".
- Raises
RuntimeError – If a non-default replica group is used.
ValueError – If the variable shape or dtype does not match remote buffer’s.
- Returns
The remote variable.
- Return type
Variable
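Conceptually, the remote buffer behaves like a table of equally shaped entries, with offset selecting the entry that holds this variable. A NumPy sketch (not popxl API; the sizes are illustrative):

```python
import numpy as np

entries, entry_size = 4, 8  # hypothetical remote buffer layout
remote_buffer = np.zeros((entries, entry_size), dtype=np.float32)
var = np.ones(entry_size, dtype=np.float32)
offset = 2
remote_buffer[offset] = var      # the variable occupies entry 2
print(remote_buffer[offset, 0])  # 1.0
```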
- popxl.replica_sharded_variable(data, dtype=None, name=None, downcast=True, replica_grouping=None, shard_over=None, retrieval_mode='one_per_group')
Scatter a tensor in equal shards across replicas (data parallelism) of the same model/graph.
Eliminates redundant data storage when the full (un-sharded) tensor does not need to be present on each IPU. Does not store the full tensor in remote memory.
- Parameters
data (HostScalarTensor) – The data used to initialise the tensor. This can be an np.ndarray, or a value numpy can use to construct an np.ndarray.
dtype (Optional[dtype]) – The data type of the tensor to be created. If not specified, NumPy will infer the data type and downcast to 32 bits if necessary.
name (Optional[str]) – The name of the tensor. Defaults to None.
downcast (bool) – If no dtype is provided, 64-bit floats/ints will be downcast to 32-bit variants. Defaults to True.
replica_grouping (Optional[ReplicaGrouping]) – The grouping of replicas to use when getting and setting variable values. Generally it makes sense to group replicas together that are guaranteed to agree on value based on the collective operations you add to the IR. Replicas within a group are always initialised with a shared value and by default, when retrieving values from replicas, only one value is returned per group. By default all replicas are in one group.
shard_over (Optional[int], optional) – The number of replicas in each replica group to shard over. Defaults to all replicas in the group. Note, when there are multiple instances, this group can span instances. If the replica grouping size is 4, and shard_over is 4, the value of the variable for each group is sharded over all 4 replicas in that group. If the replica grouping size is 4, and shard_over is 2, the value of each group will be sharded once over group members 0 and 1, and once over group members 2 and 3. The replica grouping size must be divisible by shard_over.
retrieval_mode (Optional[Literal["one_per_group", "all_replicas"]]) –
One of:
"one_per_group": Return only the first replica's variable per group; this is the default behaviour.
"all_replicas": Return all replicas' variables in every group.
Defaults to "one_per_group".
- Returns
A tuple of tensors:
The full variable. This should NOT be used directly. It can be used to interact with Session's get/set data methods.
The sharded variable.
- Return type
Tuple[Variable, Tensor]
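The shard_over parameter above can be pictured with a NumPy sketch (not popxl API): with a replica group of size 4 and shard_over=2, the group's value is split into two shards, and each half of the group holds one copy of each shard:

```python
import numpy as np

value = np.arange(8, dtype=np.float32)  # the group's variable value
shard_over = 2
shards = np.split(value, shard_over)    # two equal shards
# Group members 0..3 hold: shard 0, shard 1, shard 0, shard 1.
per_member = shards * 2
print(np.array_equal(per_member[0], per_member[2]))  # True
```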
- popxl.graph_input(shape, dtype, name=None, by_ref=False, meta_shape=None)
Create a new input tensor to the current graph.
You can use this function when defining a graph to create a new input tensor. When you call that graph, you will have to pass a tensor to the graph for this input.
Example:
import popxl
from popxl import ops

def add_w(x):
    w = popxl.graph_input(x.shape, x.dtype, "w")
    return w + x

ir = popxl.Ir()
with ir.main_graph:
    w = popxl.variable(1)
    x = popxl.variable(3)
    add_w_graph = ir.create_graph(add_w, x)
    (y,) = ops.call(add_w_graph, x, w)
- Parameters
shape (Iterable[int]) – The shape of the input tensor.
dtype (dtype) – The data type of the input tensor.
name (Optional[str]) – The name of the input tensor. Defaults to None.
by_ref (bool) – Whether the input should be passed by reference. Defaults to False.
meta_shape (Optional[Iterable[int]]) – The meta shape of the input tensor.
- Returns
The created input tensor.
- Return type
Tensor
- popxl.graph_output(t)
Mark a tensor as an output in the current graph.
You can use this function when defining a graph to mark an existing tensor in the graph as an output. When you call that graph, it will return that tensor in the parent graph.
Example:
import popxl
from popxl import ops

def add_w(x):
    w = popxl.graph_input(x.shape, x.dtype, "w")
    y = w + x
    popxl.graph_output(y)

ir = popxl.Ir()
with ir.main_graph:
    w = popxl.variable(1)
    x = popxl.variable(3)
    add_w_graph = ir.create_graph(add_w, x)
    (y,) = ops.call(add_w_graph, x, w)
- Parameters
t (Tensor) – The graph tensor to mark as an output in the current graph.
- Raises
ValueError – If the tensor is not in the current graph.
- Return type
None
- popxl.variable(data, dtype=None, name=None, downcast=True, replica_grouping=None, retrieval_mode=None)
Create a variable tensor that is initialised with data during graph creation.
This tensor can be used to represent a model weight or any other parameter that can change while running a model.
Must be created in the main graph scope. Example:
import popxl

with popxl.Ir().main_graph:
    a = popxl.variable(0)
- Parameters
data (HostScalarTensor) – The data used to initialise the tensor. This can be an np.ndarray, or a value NumPy can use to construct an np.ndarray.
dtype (Optional[dtype]) – The data type of the tensor to be created. If not specified NumPy will infer the data type and downcast to 32 bits if necessary.
name (Optional[str]) – The name of the tensor. Defaults to None.
downcast (bool) – If True and no dtype is provided, 64-bit floats/ints will be downcast to 32-bit variants. Defaults to True.
replica_grouping (Optional[ReplicaGrouping]) – The grouping of replicas to use when getting and setting variable values. Generally it makes sense to group replicas together that are guaranteed to agree on value based on the collective operations you add to the IR. Replicas within a group are always initialised with a shared value and by default, when retrieving values from replicas, only one value is returned per group. By default all replicas are in one group.
retrieval_mode (Optional[Literal["one_per_group", "all_replicas"]]) –
One of:
"one_per_group": Return only the first replica's variable per group.
"all_replicas": Return all replicas' variables in every group.
Defaults to None.
- Raises
RuntimeError – If a non-default replica group is used.
ValueError – If the variable is initialised within a graph other than the main graph.
- Returns
The desired variable.
- Return type
Variable