19.11. Tensors
- class popxl.tensor.Tensor
- property T: popxl.tensor.Tensor
Return the tensor transposed with reversed axes.
- property T_: popxl.tensor.Tensor
Return the tensor transposed with reversed axes in-place.
- __init__()
Representation of a tensor.
- copy_to_ipu(destination, source=None)
Copy a tensor to an IPU.
- diag()
Return the diagonal of a 2D tensor.
- Raises
ValueError – If the tensor is not 2-dimensional.
- Return type
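For intuition, diag() follows the same convention as NumPy's diagonal extraction for 2D inputs; the following is an illustrative NumPy analogue, not popxl itself:

```python
import numpy as np

# popxl's Tensor.diag mirrors np.diag for 2D inputs (illustrative analogue).
x = np.arange(9).reshape(3, 3)
d = np.diag(x)
print(d)  # [0 4 8]
```

A non-2D input is the error case the ValueError above guards against.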
- property dtype: popxl.dtypes.dtype
- property in_sync_with_ipu: bool
Check whether the host side buffer data is in sync with the data on the IPU device.
This only applies to variable tensors, which can become out of sync if session.weights_from_host and session.weights_to_host are not called. Without a transfer from device to host and vice versa, the buffers and the data on the IPU can fall out of sync after either is updated.
- property ipu: int
Return the IPU that the tensor is assigned to.
- Raises
UndefinedValue – If the IPU is undefined.
- property location_info
- property meta_shape: Tuple[int, ...]
Return the meta shape of the tensor.
The meta shape of the tensor can be used, for example, to store the original tensor shape before replicated tensor sharding was applied.
- reshape(shape)
Return ops.reshape(self, shape).
- reshape_(shape)
Return ops.reshape_(self, shape), the in-place version of reshape().
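Reshaping follows row-major semantics, as in NumPy; a short NumPy sketch (not popxl) of the shapes involved:

```python
import numpy as np

# Row-major reshape semantics, matching the behaviour described above
# (NumPy analogue, not popxl).
x = np.arange(6)           # shape (6,)
y = x.reshape((2, 3))      # shape (2, 3); same elements, row-major order
print(y.tolist())  # [[0, 1, 2], [3, 4, 5]]
```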
- property spec: popxl.tensor.TensorSpec
Return a TensorSpec instance using the properties of this tensor.
- strides(shape=None)
Get the strides of the tensor.
The strides of a tensor are the number of bytes to step in each dimension when traversing an array in memory. See numpy.ndarray.strides.
- Returns
The strides of the tensor.
- Return type
List[int]
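Since the semantics match numpy.ndarray.strides, a NumPy example illustrates the byte counts involved:

```python
import numpy as np

# Strides are the bytes to step per dimension (see numpy.ndarray.strides).
# For a C-ordered (2, 3) float32 array: one row is 3 * 4 = 12 bytes,
# one column step is 4 bytes.
x = np.zeros((2, 3), dtype=np.float32)
print(x.strides)  # (12, 4)
```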
- property tile_set: Literal['compute', 'io']
Return the tile set (compute or io) that the tensor is assigned to.
- Raises
UndefinedValue – If the tile set is undefined.
- transpose(permutation=None)
Permute the axes of a tensor.
By default this operation reverses the axes of the tensor.
- transpose_(permutation=None)
Permute the axes of a tensor in place.
By default this operation reverses the axes of the tensor.
This is the in-place version of
transpose()
. The behaviour is the same, but it modifies the tensor in place.
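The default reversed-axes behaviour matches numpy.transpose; a NumPy sketch of the same semantics (not popxl):

```python
import numpy as np

# With permutation=None the axes are reversed, as Tensor.transpose() does.
x = np.zeros((2, 3, 4))
print(np.transpose(x).shape)             # (4, 3, 2)  default: reversed axes
print(np.transpose(x, (0, 2, 1)).shape)  # (2, 4, 3)  explicit permutation
```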
- class popxl.tensor.Constant
A constant tensor. This tensor cannot change during the runtime of a model.
- class popxl.tensor.Variable
A variable tensor. This tensor can be used to represent a model weight or any other parameter that can change while running a model.
- __init__()
Representation of a tensor.
- copy_to_ipu(dst, src)
Return ops.ipu_copy(self, dst, src). Must provide a src value.
- property replica_grouping: popxl.replica_grouping.ReplicaGrouping
Return the ReplicaGrouping settings for this tensor.
- Returns
The ReplicaGrouping object, if set.
- Return type
- property retrieval_mode: Literal['one_per_group', 'all_replicas']
Return the string representation of the retrieval mode.
One of:
“one_per_group”: Return only the first replica’s variable per group.
“all_replicas”: Return all replicas’ variables in every group.
- Raises
ValueError – If an unsupported VariableRetrievalMode is present on the popart tensor.
- Returns
The string representing the retrieval_mode.
- Return type
Literal[“one_per_group”, “all_replicas”]
- property shape_on_host
Return the full tensor shape on the host.
The full shape on host may have an outer group_num dimension, depending on the replica_grouping argument. This function takes the reduced on-replica shape and adds the outer dimension safely (i.e., it checks that the outer dimension matches the expected outer dimension).
- property shape_on_replica
Return the reduced shape on an individual replica.
The full shape on host may have an outer group_num dimension, depending on the replica_grouping argument. This function takes the full shape and removes the outer dimension safely (i.e., it checks that the outer dimension matches the expected outer dimension).
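A plain-Python sketch of the relationship between the two shapes; num_groups is a hypothetical group count used for illustration, not a popxl API (popxl derives it from the replica_grouping):

```python
# Sketch only: popxl computes these shapes internally from the replica_grouping.
def shape_on_host(replica_shape, num_groups):
    # Prepend the outer group dimension when there is more than one group.
    return (num_groups, *replica_shape) if num_groups > 1 else tuple(replica_shape)

def shape_on_replica(host_shape, num_groups):
    # Remove the outer group dimension, checking it matches the group count.
    if num_groups > 1:
        assert host_shape[0] == num_groups, "unexpected outer dimension"
        return tuple(host_shape[1:])
    return tuple(host_shape)

print(shape_on_host((4, 4), 2))        # (2, 4, 4)
print(shape_on_replica((2, 4, 4), 2))  # (4, 4)
```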
- popxl.constant(data, dtype=None, name=None, downcast=True, log2_scale=None, nan_on_overflow=None)
Return a constant tensor.
A constant tensor that is initialised with data during graph creation.
This tensor cannot change during the runtime of a model. The intended use of this class is for operations between popxl.Tensor instances and other types, such as numpy.ndarray objects, numbers, or lists or tuples of numbers.
Example:

import popxl

ir = popxl.Ir()
with ir.main_graph:
    a = popxl.constant(0)
    # The `1` will be implicitly converted to a `Constant`.
    b = a + 1
- Parameters
data (Union[int, float, bool, ndarray, Iterable[Union[int, float, bool]]]) – The data used to initialise the tensor. This can be an np.array, or a value numpy can use to construct an np.ndarray.
dtype (Optional[dtype]) – The data type of the tensor to be created. If not specified, NumPy will infer the data type and downcast to 32 bits if necessary. For float8 dtypes automatic inference of dtype is not currently possible, please explicitly specify the dtype.
name (Optional[str]) – The name of the tensor. Defaults to None.
downcast (bool) – Whether or not to downcast the data to the given dtype. For float8 dtypes automatic inference of dtype is not currently possible; please explicitly specify the dtype.
log2_scale (Optional[int]) – The user’s data is multiplied by pow2(log2Scale) before casting. Only applicable when using float8 data types.
nan_on_overflow (Optional[bool]) – If True, produce NaN when the input values exceed the numeric range of the destination type; if False, saturate the results. Only applicable when using float8 data types.
- Raises
TypeError – If a float8 tensor is passed without a corresponding dtype.
- Returns
The constant required.
- Return type
- popxl.variable(data, dtype=None, name=None, downcast=True, replica_grouping=None, retrieval_mode=None, log2_scale=None, nan_on_overflow=None)
Create a variable tensor that is initialised with data during graph creation.
This tensor can be used to represent a model weight or any other parameter that can change while running a model.
Must be created in the main graph scope. Example:

import popxl

with popxl.Ir().main_graph:
    a = popxl.variable(0)
To optimise the host memory used by compilation/runtime, you can pass an np.memmap as the data parameter.
Note, if you do this, PopXL will not internally copy data into a buffer it solely owns, but instead takes joint ownership of the object you passed in. This means it is up to you not to clobber the contents of data. Letting it go out of scope is OK, because PopXL maintains a reference to it.
Sometimes, PopXL has to internally make a copy of data into a buffer with a layout and dtype that it can handle natively. Doing this on an np.memmap would defeat the point of the memory-mapping. Consequently, if data is an np.memmap, ALL of the following conditions must hold, or an error is thrown:
- The data array must be a C-array
- No downcasting should be required to a dtype natively supported by PopXL
- The dtype parameter must be None or exactly the same as data.dtype
Furthermore, the implementation of non-const replica groupings requires making copies of various slices within data. Therefore, if you pass a non-const replica grouping with an np.memmap, you will get a warning. See popxl.Ir.replica_grouping_from_assignment() for how to create such groupings.
If the np.memmap is read-only, using the Session disables the weights-to-host program for the entire Ir.
- Parameters
data (Union[int, float, bool, ndarray, Iterable[Union[int, float, bool]]]) – The data used to initialise the tensor. This can be an np.ndarray, or a value NumPy can use to construct an np.ndarray. This can also be an np.memmap.
dtype (Optional[dtype]) – The data type of the tensor to be created. If not specified NumPy will infer the data type and downcast to 32 bits if necessary. For float8 dtypes automatic inference of dtype is not currently possible, please explicitly specify the dtype.
name (Optional[str]) – The name of the tensor. Defaults to None.
downcast (bool) – If True and no dtype is provided, 64-bit floats/ints will be downcast to 32-bit variants. Defaults to True.
replica_grouping (Optional[ReplicaGrouping]) – The grouping of replicas to use when getting and setting variable values. Generally it makes sense to group replicas together that are guaranteed to agree on value based on the collective operations you add to the IR. Replicas within a group are always initialised with a shared value and by default, when retrieving values from replicas, only one value is returned per group. By default all replicas are in one group.
retrieval_mode (Optional[Literal['one_per_group', 'all_replicas']]) –
One of:
”one_per_group”: Return only the first replica’s variable per group.
”all_replicas”: Return all replicas’ variables in every group.
Defaults to None.
log2_scale (Optional[int]) – If dtype is either popxl.float8_143 or popxl.float8_152 then multiply the incoming data by pow2(log2_scale) before casting.
nan_on_overflow (Optional[bool]) – If dtype is either popxl.float8_143 or popxl.float8_152 and this flag is set then replace values that cannot be represented by the requested dtype with np.nan values.
- Raises
RuntimeError – If a non-default replica group is used
ValueError – If the tensor is initialised within a graph other than the main graph
ValueError – If the data parameter is an np.memmap and any of the following is true:
- It is not a C-array
- It requires downcasting to a dtype natively supported by PopXL
- The dtype parameter is not None and conflicts with data.dtype
- Returns
The desired variable.
- Return type
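To illustrate the np.memmap conditions listed above, here is a sketch of a memmap that satisfies all three (C-contiguous, a natively supported dtype so no downcast is needed, and dtype left as None when passed in). The popxl.variable call is shown commented out since it requires an attached Ir:

```python
import os
import tempfile
import numpy as np

# Build a memmap meeting the conditions for popxl.variable's data parameter.
path = os.path.join(tempfile.mkdtemp(), "weights.dat")
w = np.memmap(path, dtype=np.float32, mode="w+", shape=(4, 4), order="C")
w[:] = 0.0

assert w.flags["C_CONTIGUOUS"]  # condition 1: C-array
assert w.dtype == np.float32    # condition 2: no downcasting required

# With an Ir in scope you could then pass it directly (condition 3: dtype=None):
# with ir.main_graph:
#     v = popxl.variable(w, name="w")
```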
- popxl.remote_variable(data, remote_buffer, offset=0, dtype=None, name=None, downcast=True, replica_grouping=None, retrieval_mode='one_per_group', log2_scale=None, nan_on_overflow=None)
Create a variable Tensor that is stored in remote memory.
- Parameters
data (Union[int, float, bool, ndarray, Iterable[Union[int, float, bool]]]) – The data used to initialise the tensor. This can be an np.ndarray, or a value NumPy can use to construct an np.ndarray. This can also be an np.memmap; see Variable().
remote_buffer (RemoteBuffer) – The remote buffer to store the variable.
offset (int) – The offset into the entries of the remote buffer to store the variable. Defaults to 0
dtype (Optional[dtype]) – The data type of the tensor to be created. If not specified, NumPy will infer the data type and downcast to 32 bits if necessary. For float8 dtypes automatic inference of dtype is not currently possible; please explicitly specify the dtype.
name (Optional[str]) – The name of the tensor. Defaults to None.
downcast (bool) – If no dtype is provided, 64-bit floats/ints will be downcast to 32-bit variants. Defaults to True.
replica_grouping (Optional[ReplicaGrouping]) – The grouping of replicas to use when getting and setting variable values. Generally it makes sense to group replicas together that are guaranteed to agree on value based on the collective operations you add to the IR. Replicas within a group are always initialised with a shared value and by default, when retrieving values from replicas, only one value is returned per group. By default all replicas are in one group.
retrieval_mode (Optional[Literal['one_per_group', 'all_replicas']]) –
One of:
”one_per_group”: Return only the first replica’s variable per group; this is the default behaviour.
”all_replicas”: Return all replicas’ variables in every group.
Defaults to “one_per_group”.
log2_scale (int) – If dtype is either popxl.float8_143 or popxl.float8_152 then multiply the incoming data by pow2(log2_scale) before casting.
nan_on_overflow (bool) – If dtype is either popxl.float8_143 or popxl.float8_152 and this flag is set then replace values that cannot be represented by the requested dtype with np.nan values.
- Raises
RuntimeError – If a non-default replica group is used.
ValueError – If the variable shape or dtype does not match remote buffer’s.
ValueError – If the data parameter is an np.memmap and any of the following is true:
- It is not a C-array
- It requires downcasting to a dtype natively supported by PopXL
- The dtype parameter is not None and conflicts with data.dtype
- Returns
The remote variable.
- Return type
- popxl.replica_sharded_buffer(shape, dtype, replica_grouping=None, shard_over=None, entries=1)
Create a RemoteBuffer for use with replicated tensor sharded variables.
The tensor_shape and meta_shape properties of the returned RemoteBuffer will be a flattened one-dimensional shape. This is because the data of sharded tensors in PopXL reside in CBR-rearranged form. This means the original ordering of the data you provide is not preserved inside the RemoteBuffer, and so the original axes are meaningless.
- Parameters
shape (Tuple[int, ...]) – Shape of the variable tensor (including any replica grouping dimensions).
dtype (dtypes.dtype) – Dtype of the variable tensor.
replica_grouping (Optional[ReplicaGrouping], optional) – ReplicaGrouping of the variable tensor. Defaults to All replicas.
shard_over (Optional[int], optional) – The number of replicas in each replica group to shard over. Defaults to all replicas in the group. Note, when there are multiple instances, this group can span instances. If the replica grouping size is 4, and shard_over is 4, the value of the variable for each group is sharded over all 4 replicas in that group. If the replica grouping size is 4, and shard_over is 2, the value of each group will be sharded once over group members 0 and 1, and once over group members 2 and 3. The replica grouping size must be divisible by shard_over.
entries (int) – Number of entries in the RemoteBuffer.
- Raises
ValueError – If replica_grouping is not None and shard_grouping.stride != replica_grouping.stride
ValueError – If replica_grouping is not None and shard_grouping.group_size != replica_grouping.stride
ValueError – If replica_grouping is None and shard_grouping.stride != 1 (the default replica_grouping)
ValueError – If replica_grouping is None and shard_grouping.group_size != the replication_factor
- Returns
RemoteBuffer
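A plain-Python sketch (not the popxl implementation) of how shard_over partitions the members of one replica group, following the group-size-4 example in the shard_over description:

```python
# Illustrative sketch of the shard_over partitioning described above.
def shard_partitions(group_size, shard_over):
    # The replica grouping size must be divisible by shard_over.
    assert group_size % shard_over == 0
    # Each contiguous run of `shard_over` members holds one sharded copy.
    return [list(range(i, i + shard_over))
            for i in range(0, group_size, shard_over)]

print(shard_partitions(4, 4))  # [[0, 1, 2, 3]]   sharded over all 4 members
print(shard_partitions(4, 2))  # [[0, 1], [2, 3]] members 0-1 and 2-3 each hold a copy
```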
- popxl.remote_replica_sharded_variable(data, remote_buffer, offset=0, dtype=None, name=None, downcast=True, replica_grouping=None, retrieval_mode='one_per_group', log2_scale=None, nan_on_overflow=None)
Create a variable Tensor that is stored in remote memory.
The variable is scattered in equal shards across replicas (replicated tensor sharding (RTS) data parallelism) of the same model/graph. Eliminates redundant data storage when the full (un-sharded) tensor does not need to be present on each IPU. Stores the full tensor in remote memory (usually DDR memory).
Replicated tensors for which each replica needs a full copy, need to be recombined with a replicated AllGather operation.
Fully updated tensors that need to be sharded and/or reduced again require a replicated ReduceScatter operation.
- Parameters
data (Union[int, float, bool, ndarray, Iterable[Union[int, float, bool]]]) – The data used to initialise the tensor. This can be an np.ndarray, or a value NumPy can use to construct an np.ndarray. This can also be an np.memmap; see Variable().
remote_buffer (RemoteBuffer) – The handle to the remote buffer.
offset (int) – The offset to index the tensor shard in the remote tensor.
dtype (Optional[dtype]) – The data type of the tensor to be created. If not specified, NumPy will infer the data type and downcast to 32 bits if necessary.
name (Optional[str]) – The name of the tensor. Defaults to None.
downcast (bool) – If no dtype is provided, 64-bit floats/ints will be downcast to 32-bit variants. Defaults to True.
replica_grouping (Optional[ReplicaGrouping]) – The grouping of replicas to use when getting and setting variable values. Generally it makes sense to group replicas together that are guaranteed to agree on value based on the collective operations you add to the IR. Replicas within a group are always initialised with a shared value and by default, when retrieving values from replicas, only one value is returned per group. By default all replicas are in one group.
retrieval_mode (Optional[Literal['one_per_group', 'all_replicas']]) –
One of:
”one_per_group”: Return only the first replica’s variable per group; this is the default behaviour.
”all_replicas”: Return all replicas’ variables in every group.
Defaults to “one_per_group”.
log2_scale (int) – If dtype is either popxl.float8_143 or popxl.float8_152 then multiply the incoming data by pow2(log2_scale) before casting.
nan_on_overflow (bool) – If dtype is either popxl.float8_143 or popxl.float8_152 and this flag is set then replace values that cannot be represented by the requested dtype with np.nan values.
- Raises
RuntimeError – If a non-default replica group is used.
ValueError – If replication has not been enabled.
ValueError – If the number of elements of var is not divisible by the number of replicas
ValueError – If the variable shape or dtype does not match the remote buffer’s
ValueError – If the data parameter is an np.memmap and any of the following is true:
- It is not a C-array
- It requires downcasting to a dtype natively supported by PopXL
- The dtype parameter is not None and conflicts with data.dtype
- Returns
The remote sharded variable.
- Return type
- popxl.replica_sharded_variable(data, dtype=None, name=None, downcast=True, replica_grouping=None, shard_over=None, retrieval_mode='one_per_group', log2_scale=None, nan_on_overflow=None)
Scatter a tensor in equal shards across replicas (data parallelism) of the same model/graph.
Eliminates redundant data storage when the full (un-sharded) tensor does not need to be present on each IPU. Does not store the full tensor in remote memory.
- Parameters
data (Union[int, float, bool, ndarray, Iterable[Union[int, float, bool]]]) – The data used to initialise the tensor. This can be an np.ndarray, or a value NumPy can use to construct an np.ndarray. This can also be an np.memmap; see Variable().
dtype (Optional[dtype]) – The data type of the tensor to be created. If not specified, NumPy will infer the data type and downcast to 32 bits if necessary. For float8 dtypes automatic inference of dtype is not currently possible; please explicitly specify the dtype.
name (Optional[str]) – The name of the tensor. Defaults to None.
downcast (bool) – If no dtype is provided, 64-bit floats/ints will be downcast to 32-bit variants. Defaults to True.
replica_grouping (Optional[ReplicaGrouping]) – The grouping of replicas to use when getting and setting variable values. Generally it makes sense to group replicas together that are guaranteed to agree on value based on the collective operations you add to the IR. Replicas within a group are always initialised with a shared value and by default, when retrieving values from replicas, only one value is returned per group. By default all replicas are in one group.
shard_over (Optional[int]) – The number of replicas in each replica group to shard over. Defaults to all replicas in the group. Note, when there are multiple instances, this group can span instances. If the replica grouping size is 4, and shard_over is 4, the value of the variable for each group is sharded over all 4 replicas in that group. If the replica grouping size is 4, and shard_over is 2, the value of each group will be sharded once over group members 0 and 1, and once over group members 2 and 3. The replica grouping size must be divisible by shard_over.
retrieval_mode (Optional[Literal["one_per_group", "all_replicas"]]) –
One of:
”one_per_group”: Return only the first replica’s variable per group; this is the default behaviour.
”all_replicas”: Return all replicas’ variables in every group.
Defaults to “one_per_group”.
log2_scale (Optional[int]) – If dtype is either popxl.float8_143 or popxl.float8_152 then multiply the incoming data by pow2(log2_scale) before casting.
nan_on_overflow (Optional[bool]) – If dtype is either popxl.float8_143 or popxl.float8_152 and this flag is set then replace values that cannot be represented by the requested dtype with np.nan values.
- Raises
ValueError – If the data parameter is an np.memmap and any of the following is true:
- It is not a C-array
- It requires downcasting to a dtype natively supported by PopXL
- The dtype parameter is not None and conflicts with data.dtype
- Returns
A tuple of tensors:
- The full variable. This should NOT be used directly. It can be used to interact with Session’s get/set data methods.
- The sharded variable.
- Return type
- popxl.graph_input(shape, dtype, name=None, by_ref=False, meta_shape=None)
Create a new input tensor to the current graph.
You can use this function when defining a graph to create a new input tensor. When you call that graph, you will have to pass a tensor to the graph for this input.
Example:
import popxl
import popxl.ops as ops

def add_w(x):
    w = popxl.graph_input(x.shape, x.dtype, "w")
    return w + x

ir = popxl.Ir()
with ir.main_graph:
    w = popxl.variable(1)
    x = popxl.variable(3)
    add_w_graph = ir.create_graph(add_w, x, w)
    (y,) = ops.call(add_w_graph, x, w)
- Parameters
- Returns
The created input tensor.
- Return type
- popxl.graph_output(t)
Mark a tensor as an output in the current graph.
You can use this function when defining a graph to mark an existing tensor in the graph as an output. When you call that graph, it will return that tensor in the parent graph.
Example:
import popxl
import popxl.ops as ops

def add_w(x):
    w = popxl.graph_input(x.shape, x.dtype, "w")
    y = w + x
    popxl.graph_output(y)

ir = popxl.Ir()
with ir.main_graph:
    w = popxl.variable(1)
    x = popxl.variable(3)
    add_w_graph = ir.create_graph(add_w, x, w)
    (y,) = ops.call(add_w_graph, x, w)
- Parameters
t (Tensor) – The graph tensor to mark as an output in the current graph.
- Raises
ValueError – If the tensor is not in the current graph.
- Return type
None
- popxl.tensor.HostScalarTensor
Container and scalar types that can be coerced into a Tensor.
alias of Union[int, float, bool, ndarray, Iterable[Union[int, float, bool]]]
- popxl.tensor.HostTensor
Container types that can be coerced into a Tensor
- popxl.tensor.ScalarType
Scalar types that can be coerced into a Tensor