19.11. Tensors

class popxl.tensor.Tensor
property T: popxl.tensor.Tensor

Return the tensor transposed with reversed axes.

property T_: popxl.tensor.Tensor

Return the tensor transposed with reversed axes in-place.

__init__()

Representation of a tensor.

copy_to_ipu(destination, source=None)

Copy a tensor to an IPU.

Parameters
  • destination (int) – ID of the IPU to copy the tensor to.

  • source (Optional[int]) – ID of the IPU to copy the tensor from. By default, the source will be taken from the producer of the tensor. If the tensor does not have a producer a source must be provided.

Return type

Tensor
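
Example (a minimal sketch; the variable has no producer, so the source IPU is given explicitly, and the Ir is assumed to ultimately run on a device with at least two IPUs):

import popxl

ir = popxl.Ir()
with ir.main_graph:
    x = popxl.variable(1.0, popxl.float32, name="x")
    # x has no producer, so the source IPU must be provided.
    x_on_1 = x.copy_to_ipu(1, source=0)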

detach()

Return the detached tensor.

Return type

Tensor

detach_()

Return this tensor detached inplace.

Return type

Tensor

diag()

Return the diagonal of a 2D tensor.

Raises

ValueError – If the tensor is not 2-dimensional.

Return type

Tensor

property dtype: popxl.dtypes.dtype
flatten()

Return ops.flatten(self).

Return type

Tensor

flatten_()

Return ops.flatten_(self) inplace.

Return type

Tensor

property id: str

Fully-qualified identifier of the tensor (for example, ‘graph1/Gradient___x’).

property in_sync_with_ipu: bool

Check whether the host side buffer data is in sync with the data on the IPU device.

This only applies to variable tensors, which can become out of sync if session.weights_from_host and session.weights_to_host are not called. Without a transfer from device to host (and vice versa), the host buffers and the data on the IPU can fall out of sync after either is updated.

property ipu: int

Return the IPU that the tensor is assigned to.

Raises

UndefinedValue – If the IPU is undefined.

property ir: Ir

Return the Ir that the tensor is a member of.

property location_info
property meta_shape: Tuple[int, ...]

Return the meta shape of the tensor.

The meta shape of the tensor can be used, for example, to store the original tensor shape before replicated tensor sharding was applied.

property name: str

Id of the tensor with the graph scope removed (for example, ‘Gradient___x’).

property nelms: int

Return the total number of elements in this tensor.

property rank: int

Return the total number of dimensions in this tensor.
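
Example (a small sketch illustrating the shape-related properties dtype, nelms, rank and shape):

import numpy as np
import popxl

ir = popxl.Ir()
with ir.main_graph:
    x = popxl.variable(np.zeros((2, 3, 4), dtype=np.float32), name="x")
    assert x.shape == (2, 3, 4)  # tuple of dimension sizes
    assert x.rank == 3           # number of dimensions
    assert x.nelms == 24         # total number of elements
    assert x.dtype == popxl.float32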

reshape(shape)

Return ops.reshape(self, shape).

Parameters

shape (Iterable[int]) –

Return type

Tensor

reshape_(shape)

Return ops.reshape_(self, shape) inplace.

Parameters

shape (Iterable[int]) –

Return type

Tensor
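
Example (a sketch combining flatten() and reshape(); the trailing-underscore variants behave the same but modify the tensor in place):

import numpy as np
import popxl

ir = popxl.Ir()
with ir.main_graph:
    x = popxl.variable(np.arange(6, dtype=np.float32).reshape(2, 3))
    y = x.reshape((3, 2))  # equivalent to ops.reshape(x, (3, 2))
    flat = x.flatten()     # equivalent to ops.flatten(x)
    assert y.shape == (3, 2)
    assert flat.nelms == x.nelms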

property scope: str

Graph scope component of the tensor’s identifier (for example, ‘graph1’).

property shape: Tuple[int, ...]

Return a tuple of the shape of the tensor.

property spec: popxl.tensor.TensorSpec

Return a TensorSpec instance using the properties of this tensor.

strides(shape=None)

Get the strides of the tensor.

The strides of the tensor are the number of bytes to step in each dimension when traversing an array in memory. See numpy.ndarray.strides.

Returns

The strides of the tensor.

Return type

List[int]
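
Example (a sketch assuming a contiguous float32 tensor, whose strides follow the NumPy convention):

import numpy as np
import popxl

ir = popxl.Ir()
with ir.main_graph:
    x = popxl.variable(np.zeros((2, 3), dtype=np.float32))
    # For a contiguous float32 tensor of shape (2, 3) the expected result
    # is [12, 4]: 3 * 4 bytes along axis 0 and 4 bytes along axis 1.
    print(x.strides())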

property tile_set: Literal['compute', 'io']

Return the tile set (compute or io) that the tensor is assigned to.

Raises

UndefinedValue – If the tile set is undefined.

transpose(permutation=None)

Permute the axes of a tensor.

By default this operation reverses the axes of the tensor.

Parameters

permutation (Optional[Iterable[int]]) – Iterable containing the permutation of [0, N-1] where N is the rank of the tensor. If not provided, the axes will be reversed.

Returns

The transposed tensor.

Return type

Tensor

transpose_(permutation=None)

Permute the axes of a tensor in place.

By default this operation reverses the axes of the tensor.

This is the in-place version of transpose(). The behaviour is the same, but it modifies the tensor in place.

Parameters

permutation (Optional[Tuple[int, ...]]) – Tuple containing a permutation of [0, N-1] where N is the rank of the tensor. If not provided, the axes will be reversed.

Returns

The transposed tensor.

Return type

Tensor
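
Example (a sketch of the default reversed-axes behaviour and an explicit permutation):

import numpy as np
import popxl

ir = popxl.Ir()
with ir.main_graph:
    x = popxl.variable(np.zeros((2, 3, 4), dtype=np.float32))
    y = x.transpose()           # reversed axes, shape (4, 3, 2); same as x.T
    z = x.transpose((0, 2, 1))  # explicit permutation, shape (2, 4, 3)
    assert y.shape == (4, 3, 2)
    assert z.shape == (2, 4, 3)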

class popxl.tensor.Constant

A constant tensor. This tensor cannot change during the runtime of a model.

copy_to_ipu(dst, src)

Return ops.ipu_copy(self, dst, src).

A src value must be provided.

Parameters
  • dst (int) – ID of the IPU to copy the tensor to.

  • src (int) – ID of the IPU to copy the tensor from.

Return type

Tensor

class popxl.tensor.Variable

A variable tensor. This tensor can be used to represent a model weight or any other parameter that can change while running a model.

__init__()

Representation of a tensor.

copy_to_ipu(dst, src)

Return ops.ipu_copy(self, dst, src).

A src value must be provided.

Parameters
  • dst (int) – ID of the IPU to copy the tensor to.

  • src (int) – ID of the IPU to copy the tensor from.

Return type

Tensor

property replica_grouping: popxl.replica_grouping.ReplicaGrouping

Return the ReplicaGrouping settings for this tensor.

Returns

The ReplicaGrouping object, if set.

Return type

ReplicaGrouping

property retrieval_mode: Literal['one_per_group', 'all_replicas']

Return the string representation of the retrieval mode.

One of:

  • “one_per_group”: Return only the first replica’s variable per group.

  • “all_replicas”: Return all replicas’ variables in every group.

Raises

ValueError – If an unsupported VariableRetrievalMode is present on the PopART tensor.

Returns

The string representing the retrieval_mode.

Return type

Literal[“one_per_group”, “all_replicas”]

property shape_on_host

Return the full tensor shape on the host.

The full shape on the host may have an outer group_num dimension, depending on the replica_grouping argument. This function takes the reduced on-replica shape and adds the outer dimension safely (i.e. it checks that the outer dimension matches the expected outer dimension).

property shape_on_replica

Return the reduced shape on an individual replica.

The full shape on the host may have an outer group_num dimension, depending on the replica_grouping argument. This function takes the full shape and removes the outer dimension safely (i.e. it checks that the outer dimension matches the expected outer dimension).

popxl.constant(data, dtype=None, name=None, downcast=True, log2_scale=None, nan_on_overflow=None)

Return a constant tensor.

A constant tensor that is initialised with data during graph creation.

This tensor cannot change during the runtime of a model. The intended use is for operations between popxl.Tensor instances and other types, such as numpy.ndarray objects, numbers, or lists or tuples of numbers.

Example:

import popxl

ir = popxl.Ir()
with ir.main_graph:
    a = popxl.constant(0)
    # The `1` will be implicitly converted to a `Constant`.
    b = a + 1
Parameters
  • data (Union[int, float, bool, ndarray, Iterable[Union[int, float, bool]]]) – The data used to initialise the tensor. This can be an np.array, or a value numpy can use to construct an np.ndarray.

  • dtype (Optional[dtype]) – The data type of the tensor to be created. If not specified, NumPy will infer the data type and downcast to 32 bits if necessary. For float8 dtypes, automatic inference of the dtype is not currently possible; please specify the dtype explicitly.

  • name (Optional[str]) – The name of the tensor. Defaults to None.

  • downcast (bool) – Whether or not to downcast the data to the given dtype. For float8 dtypes, automatic inference of the dtype is not currently possible; please specify the dtype explicitly.

  • log2_scale (Optional[int]) – The user’s data is multiplied by pow2(log2_scale) before casting. Only applicable when using float8 data types.

  • nan_on_overflow (Optional[bool]) – If True produce NaN when the input values exceed the numeric range of the destination type selected. If False saturate the results. Only applicable when using float8 data types.

Raises

TypeError – If a float8 tensor is passed without a corresponding dtype.

Returns

The constant required.

Return type

Tensor

popxl.variable(data, dtype=None, name=None, downcast=True, replica_grouping=None, retrieval_mode=None, log2_scale=None, nan_on_overflow=None)

Create a variable tensor that is initialised with data during graph creation.

This tensor can be used to represent a model weight or any other parameter that can change while running a model.

Must be created in the main graph scope. Example:

import popxl

with popxl.Ir().main_graph:
    a = popxl.variable(0)

To optimise the host memory used by compilation/runtime, you can pass an np.memmap as the data parameter.

Note that, if you do this, PopXL will not internally copy data into a buffer it solely owns, but will instead take joint ownership of the object you passed in. This means it is up to you not to clobber the contents of data. Letting it go out of scope is fine, because PopXL maintains a reference to it.

Sometimes, PopXL has to internally copy data into a buffer with a layout and dtype that it can handle natively. Doing this on an np.memmap would defeat the point of memory-mapping. Consequently, if data is an np.memmap, ALL of the following conditions must hold, or an error is raised.

  • The data array must be a C-array

  • No downcasting should be required to a dtype natively supported by PopXL

  • The dtype parameter must be None or exactly the same as data.dtype

Furthermore, the implementation of non-const replica groupings requires making copies of various slices within data. Therefore, if you pass a non-const replica grouping with an np.memmap, you will get a warning. See popxl.Ir.replica_grouping_from_assignment() for how to create such groupings.

If the np.memmap is read-only, using it disables the weights-to-host program for the entire Ir when running a Session.
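
Example (a sketch of the memory-mapped path described above; the file name and shape are illustrative):

import numpy as np
import popxl

# A C-contiguous float32 array backed by a file on disk.
data = np.memmap("weights.bin", dtype=np.float32, mode="w+", shape=(1024, 1024))

ir = popxl.Ir()
with ir.main_graph:
    # dtype is left as None and no downcast is needed, so PopXL can keep
    # using the memory-mapped buffer instead of copying it.
    w = popxl.variable(data, name="w")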

Parameters
  • data (Union[int, float, bool, ndarray, Iterable[Union[int, float, bool]]]) – The data used to initialise the tensor. This can be an np.ndarray, or a value NumPy can use to construct an np.ndarray. This can also be an np.memmap.

  • dtype (Optional[dtype]) – The data type of the tensor to be created. If not specified, NumPy will infer the data type and downcast to 32 bits if necessary. For float8 dtypes, automatic inference of the dtype is not currently possible; please specify the dtype explicitly.

  • name (Optional[str]) – The name of the tensor. Defaults to None.

  • downcast (bool) – If True and no dtype is provided, 64-bit float/ints will be downcast to 32-bit variants. Defaults to True.

  • replica_grouping (Optional[ReplicaGrouping]) – The grouping of replicas to use when getting and setting variable values. Generally it makes sense to group replicas together that are guaranteed to agree on value based on the collective operations you add to the IR. Replicas within a group are always initialised with a shared value and by default, when retrieving values from replicas, only one value is returned per group. By default all replicas are in one group.

  • retrieval_mode (Optional[Literal['one_per_group', 'all_replicas']]) –

    One of:

    • “one_per_group”: Return only the first replica’s variable per group.

    • “all_replicas”: Return all replicas’ variables in every group.

    Defaults to None.

  • log2_scale (Optional[int]) – If dtype is either popxl.float8_143 or popxl.float8_152 then multiply the incoming data by pow2(log2_scale) before casting.

  • nan_on_overflow (Optional[bool]) – If dtype is either popxl.float8_143 or popxl.float8_152 and this flag is set then replace values that cannot be represented by the requested dtype with np.nan values.

Raises
  • RuntimeError – If a non-default replica group is used.

  • ValueError – If the tensor is initialised within a graph other than the main graph.

  • ValueError – If the data parameter is an np.memmap and any of the following is true:

    - It is not a C-array,

    - It requires downcasting to a dtype natively supported by PopXL,

    - The dtype parameter is not None and conflicts with data.dtype.

Returns

The desired variable.

Return type

Variable
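
Example (a sketch of grouped variables; it assumes an Ir configured for 4 replicas via Ir.replication_factor and uses Ir.replica_grouping() to build the grouping; the shapes and grouping arguments are illustrative):

import numpy as np
import popxl

ir = popxl.Ir()
ir.replication_factor = 4
with ir.main_graph:
    # Two groups of two replicas each.
    grouping = ir.replica_grouping(stride=1, group_size=2)
    # The leading dimension of 2 provides one (3,) slice per group; with
    # the default retrieval mode, one value is returned per group.
    w = popxl.variable(
        np.zeros((2, 3), dtype=np.float32),
        replica_grouping=grouping,
        retrieval_mode="one_per_group",
        name="w",
    )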

popxl.remote_variable(data, remote_buffer, offset=0, dtype=None, name=None, downcast=True, replica_grouping=None, retrieval_mode='one_per_group', log2_scale=None, nan_on_overflow=None)

Create a variable Tensor that is stored in remote memory.

Parameters
  • data (Union[int, float, bool, ndarray, Iterable[Union[int, float, bool]]]) – The data used to initialise the tensor. This can be an np.ndarray, or a value numpy can use to construct an np.ndarray. This can also be an np.memmap, see Variable().

  • remote_buffer (RemoteBuffer) – The remote buffer to store the variable.

  • offset (int) – The offset into the entries of the remote buffer to store the variable. Defaults to 0.

  • dtype (Optional[dtype]) – The data type of the tensor to be created. If not specified, NumPy will infer the data type and downcast to 32 bits if necessary. For float8 dtypes, automatic inference of the dtype is not currently possible; please specify the dtype explicitly.

  • name (Optional[str]) – The name of the tensor. Defaults to None.

  • downcast (bool) – If no dtype is provided, 64-bit float/ints will be downcast to 32-bit variants. Defaults to True.

  • replica_grouping (Optional[ReplicaGrouping]) – The grouping of replicas to use when getting and setting variable values. Generally it makes sense to group replicas together that are guaranteed to agree on value based on the collective operations you add to the IR. Replicas within a group are always initialised with a shared value and by default, when retrieving values from replicas, only one value is returned per group. By default all replicas are in one group.

  • retrieval_mode (Optional[Literal['one_per_group', 'all_replicas']]) –

    One of:

    • “one_per_group”: Return only the first replica’s variable per group; this is the default behaviour.

    • “all_replicas”: Return all replicas’ variables in every group.

    Defaults to “one_per_group”.

  • log2_scale (int) – If dtype is either popxl.float8_143 or popxl.float8_152 then multiply the incoming data by pow2(log2_scale) before casting.

  • nan_on_overflow (bool) – If dtype is either popxl.float8_143 or popxl.float8_152 and this flag is set then replace values that cannot be represented by the requested dtype with np.nan values.

Raises
  • RuntimeError – If a non-default replica group is used.

  • ValueError – If the variable shape or dtype does not match the remote buffer’s.

  • ValueError – If the data parameter is an np.memmap and any of the following is true:

    - It is not a C-array,

    - It requires downcasting to a dtype natively supported by PopXL,

    - The dtype parameter is not None and conflicts with data.dtype.

Returns

The remote variable.

Return type

Variable
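
Example (a sketch that assumes popxl.remote_buffer() and ops.remote_load() for creating the remote buffer and loading the variable onto the device; names and shapes are illustrative):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    # One remote-buffer entry holding a (4,) float32 tensor.
    buffer = popxl.remote_buffer((4,), popxl.float32, entries=1)
    w = popxl.remote_variable(np.zeros(4, dtype=np.float32), buffer, offset=0)
    # Load the variable from remote memory before using it in compute ops.
    w_loaded = ops.remote_load(buffer, 0)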

popxl.replica_sharded_buffer(shape, dtype, replica_grouping=None, shard_over=None, entries=1)

Create a RemoteBuffer for use with replicated tensor sharded variables.

The tensor_shape and meta_shape properties of the returned RemoteBuffer will be a flattened one-dimensional shape. This is because the data of sharded tensors in PopXL reside in CBR-rearranged form. This means the original ordering of the data you provide is not preserved inside the RemoteBuffer, and so the original axes are meaningless.

Parameters
  • shape (Tuple[int, ...]) – Shape of the variable tensor (including any replica grouping dimensions).

  • dtype (dtypes.dtype) – Dtype of the variable tensor.

  • replica_grouping (Optional[ReplicaGrouping], optional) – ReplicaGrouping of the variable tensor. Defaults to All replicas.

  • shard_over (Optional[int], optional) – The number of replicas in each replica group to shard over. Defaults to all replicas in the group. Note, when there are multiple instances, this group can span instances. If the replica grouping size is 4, and shard_over is 4, the value of the variable for each group is sharded over all 4 replicas in that group. If the replica grouping size is 4, and shard_over is 2, the value of each group will be sharded once over group members 0 and 1, and once over group members 2 and 3. The replica grouping size must be divisible by shard_over.

  • entries (int) – Number of entries in the RemoteBuffer.

Raises
  • ValueError – If replica_grouping is not None and shard_grouping.stride != replica_grouping.stride.

  • ValueError – If replica_grouping is not None and shard_grouping.group_size is greater than replica_grouping.stride.

  • ValueError – If replica_grouping is None and shard_grouping.stride != 1 (the default replica grouping).

  • ValueError – If replica_grouping is None and shard_grouping.group_size is greater than the replication_factor.

Returns

RemoteBuffer
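
Example (a sketch; it assumes the replication factor is set via Ir.replication_factor):

import popxl

ir = popxl.Ir()
ir.replication_factor = 4
with ir.main_graph:
    # Shard an (8, 4) float32 variable over all 4 replicas in the single
    # (default) group; the buffer stores a flattened, CBR-rearranged shape.
    buffer = popxl.replica_sharded_buffer((8, 4), popxl.float32, shard_over=4)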

popxl.remote_replica_sharded_variable(data, remote_buffer, offset=0, dtype=None, name=None, downcast=True, replica_grouping=None, retrieval_mode='one_per_group', log2_scale=None, nan_on_overflow=None)

Create a variable Tensor that is stored in remote memory.

The variable is scattered in equal shards across replicas (replicated tensor sharding (RTS) data parallelism) of the same model/graph. Eliminates redundant data storage when the full (un-sharded) tensor does not need to be present on each IPU. Stores the full tensor in remote memory (usually DDR memory).

Replicated tensors for which each replica needs a full copy need to be recombined with a replicated AllGather operation.

Fully updated tensors that need to be sharded and/or reduced again require a replicated ReduceScatter operation.

Parameters
  • data (Union[int, float, bool, ndarray, Iterable[Union[int, float, bool]]]) – The data used to initialise the tensor. This can be an np.ndarray, or a value numpy can use to construct an np.ndarray. This can also be an np.memmap, see Variable().

  • remote_buffer (RemoteBuffer) – The handle to the remote buffer.

  • offset (int) – The offset to index the tensor shard in the remote tensor.

  • dtype (Optional[dtype]) – The data type of the tensor to be created. If not specified, NumPy will infer the data type and downcast to 32 bits if necessary.

  • name (Optional[str]) – The name of the tensor. Defaults to None.

  • downcast (bool) – If no dtype is provided, 64-bit float/ints will be downcast to 32-bit variants. Defaults to True.

  • replica_grouping (Optional[ReplicaGrouping]) – The grouping of replicas to use when getting and setting variable values. Generally it makes sense to group replicas together that are guaranteed to agree on value based on the collective operations you add to the IR. Replicas within a group are always initialised with a shared value and by default, when retrieving values from replicas, only one value is returned per group. By default all replicas are in one group.

  • retrieval_mode (Optional[Literal['one_per_group', 'all_replicas']]) –

    One of:

    • “one_per_group”: Return only the first replica’s variable per group; this is the default behaviour.

    • “all_replicas”: Return all replicas’ variables in every group.

    Defaults to “one_per_group”.

  • log2_scale (int) – If dtype is either popxl.float8_143 or popxl.float8_152 then multiply the incoming data by pow2(log2_scale) before casting.

  • nan_on_overflow (bool) – If dtype is either popxl.float8_143 or popxl.float8_152 and this flag is set then replace values that cannot be represented by the requested dtype with np.nan values.

Raises
  • RuntimeError – If a non-default replica group is used.

  • ValueError – If replication has not been enabled.

  • ValueError – If the number of elements of var is not divisible by the number of replicas.

  • ValueError – If the variable shape or dtype does not match the remote buffer’s.

  • ValueError – If the data parameter is an np.memmap and any of the following is true:

    - It is not a C-array,

    - It requires downcasting to a dtype natively supported by PopXL,

    - The dtype parameter is not None and conflicts with data.dtype.

Returns

The remote sharded variable.

Return type

Variable
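
Example (a sketch that assumes ops.remote_load() and ops.collectives.replicated_all_gather() for loading and recombining the shards, and Ir.replication_factor for configuring replication):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
ir.replication_factor = 4
with ir.main_graph:
    buffer = popxl.replica_sharded_buffer((8, 4), popxl.float32)
    w = popxl.remote_replica_sharded_variable(
        np.zeros((8, 4), dtype=np.float32), buffer, offset=0
    )
    # Each replica loads only its shard from remote memory ...
    w_shard = ops.remote_load(buffer, 0)
    # ... and gathers the full tensor when a complete copy is needed.
    w_full = ops.collectives.replicated_all_gather(w_shard)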

popxl.replica_sharded_variable(data, dtype=None, name=None, downcast=True, replica_grouping=None, shard_over=None, retrieval_mode='one_per_group', log2_scale=None, nan_on_overflow=None)

Scatter a tensor in equal shards across replicas (data parallelism) of the same model/graph.

Eliminates redundant data storage when the full (un-sharded) tensor does not need to be present on each IPU. Does not store the full tensor in remote memory.

Parameters
  • data (Union[int, float, bool, ndarray, Iterable[Union[int, float, bool]]]) – The data used to initialise the tensor. This can be an np.ndarray, or a value numpy can use to construct an np.ndarray. This can also be an np.memmap, see Variable().

  • dtype (Optional[dtype]) – The data type of the tensor to be created. If not specified, NumPy will infer the data type and downcast to 32 bits if necessary. For float8 dtypes, automatic inference of the dtype is not currently possible; please specify the dtype explicitly.

  • name (Optional[str]) – The name of the tensor. Defaults to None.

  • downcast (bool) – If no dtype is provided, 64-bit float/ints will be downcast to 32-bit variants. Defaults to True.

  • replica_grouping (Optional[ReplicaGrouping]) – The grouping of replicas to use when getting and setting variable values. Generally it makes sense to group replicas together that are guaranteed to agree on value based on the collective operations you add to the IR. Replicas within a group are always initialised with a shared value and by default, when retrieving values from replicas, only one value is returned per group. By default all replicas are in one group.

  • shard_over (Optional[int]) – The number of replicas in each replica group to shard over. Defaults to all replicas in the group. Note, when there are multiple instances, this group can span instances. If the replica grouping size is 4, and shard_over is 4, the value of the variable for each group is sharded over all 4 replicas in that group. If the replica grouping size is 4, and shard_over is 2, the value of each group will be sharded once over group members 0 and 1, and once over group members 2 and 3. The replica grouping size must be divisible by shard_over.

  • retrieval_mode (Optional[Literal["one_per_group", "all_replicas"]]) –

    One of:

    • “one_per_group”: Return only the first replica’s variable per group; this is the default behaviour.

    • “all_replicas”: Return all replicas’ variables in every group.

    Defaults to “one_per_group”.

  • log2_scale (Optional[int]) – If dtype is either popxl.float8_143 or popxl.float8_152 then multiply the incoming data by pow2(log2_scale) before casting.

  • nan_on_overflow (Optional[bool]) – If dtype is either popxl.float8_143 or popxl.float8_152 and this flag is set then replace values that cannot be represented by the requested dtype with np.nan values.

Raises
  • ValueError – If the data parameter is an np.memmap and any of the following is true:

    - It is not a C-array,

    - It requires downcasting to a dtype natively supported by PopXL,

    - The dtype parameter is not None and conflicts with data.dtype.

Returns

A tuple of tensors:

  1. The full variable. This should NOT be used directly in the graph; it is only for interacting with the Session’s get/set data methods.

  2. The sharded variable.

Return type

Tuple[Variable, Tensor]
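
Example (a sketch; ops.collectives.replicated_all_gather() is assumed to be the collective used to recombine the shards, and replication is assumed to be configured via Ir.replication_factor):

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
ir.replication_factor = 4
with ir.main_graph:
    # `var` is only for the Session's get/set data methods; `shard` is the
    # tensor to use in the graph (each replica holds a quarter of it).
    var, shard = popxl.replica_sharded_variable(
        np.zeros((8, 4), dtype=np.float32), popxl.float32, name="w"
    )
    w_full = ops.collectives.replicated_all_gather(shard)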

popxl.graph_input(shape, dtype, name=None, by_ref=False, meta_shape=None)

Create a new input tensor to the current graph.

You can use this function when defining a graph to create a new input tensor. When you call that graph, you will have to pass a tensor to the graph for this input.

Example:

import popxl
import popxl.ops as ops


def add_w(x):
    w = popxl.graph_input(x.shape, x.dtype, "w")
    return w + x


ir = popxl.Ir()
with ir.main_graph:
    w = popxl.variable(1)
    x = popxl.variable(3)
    add_w_graph = ir.create_graph(add_w, x)
    (y,) = ops.call(add_w_graph, x, w)
Parameters
  • shape (Iterable[int]) – The shape of the tensor.

  • dtype (dtype) – The data type of the tensor.

  • name (Optional[str]) – The name of the tensor.

  • by_ref (bool) – Whether the tensor should be added by reference. Defaults to False.

  • meta_shape (Optional[Iterable[int]]) – The meta shape of the tensor.

Returns

The created input tensor.

Return type

Tensor

popxl.graph_output(t)

Mark a tensor as an output in the current graph.

You can use this function when defining a graph to mark an existing tensor in the graph as an output. When you call that graph, it will return that tensor in the parent graph.

Example:

import popxl
import popxl.ops as ops


def add_w(x):
    w = popxl.graph_input(x.shape, x.dtype, "w")
    y = w + x
    popxl.graph_output(y)


ir = popxl.Ir()
with ir.main_graph:
    w = popxl.variable(1)
    x = popxl.variable(3)
    add_w_graph = ir.create_graph(add_w, x)
    (y,) = ops.call(add_w_graph, x, w)
Parameters

t (Tensor) – The graph tensor to mark as an output in the current graph.

Raises

ValueError – If the tensor is not in the current graph.

Return type

None

popxl.tensor.HostScalarTensor

Container and scalar types that can be coerced into a Tensor

alias of Union[int, float, bool, ndarray, Iterable[Union[int, float, bool]]]

popxl.tensor.HostTensor

Container types that can be coerced into a Tensor

alias of Union[ndarray, Iterable[Union[int, float, bool]]]

popxl.tensor.ScalarType

Scalar types that can be coerced into a Tensor

alias of Union[int, float, bool]

popxl.tensor.TensorLike

Tensors and types that can be coerced into a Tensor

alias of Union[Tensor, int, float, bool, ndarray, Iterable[Union[int, float, bool]]]