13. Variables in Streaming Memory

When IPU memory is insufficient, you can use remote buffers to store data in, and load data from, Streaming Memory. Remote buffers are often used for variable tensors and for intermediate tensors. This section shows how to use the following in PopXL:

remote buffers
remote variable tensors
replicated tensor sharding variables

13.1. Remote buffers

In PopXL, you can create a remote buffer in the IR by using remote_buffer(tensor_shape, tensor_dtype, entries). The remote buffer contains a number of slots for tensors (entries) with the same shape (tensor_shape) and data type (tensor_dtype).

You can then store a tensor t at index offset of a remote buffer remote_buffer by using the operation remote_store(remote_buffer, offset, t). To load the tensor at index offset of remote_buffer, use remote_load(remote_buffer, offset, name), where the optional argument name sets the name of the returned tensor.
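The slot semantics described above can be pictured with a small host-side model. This is a sketch in plain NumPy, not PopXL code: the class ToyRemoteBuffer and its store and load methods are hypothetical names chosen for illustration, and the model assumes that a slot only accepts a tensor whose shape and data type match those of the buffer exactly.

```python
import numpy as np


class ToyRemoteBuffer:
    """Host-side stand-in for a remote buffer (hypothetical, not a PopXL class)."""

    def __init__(self, tensor_shape, tensor_dtype, entries):
        self.tensor_shape = tuple(tensor_shape)
        self.tensor_dtype = np.dtype(tensor_dtype)
        # One slot per entry; every slot has the same shape and data type.
        self.slots = np.zeros((entries, *self.tensor_shape), dtype=self.tensor_dtype)

    def store(self, offset, t):
        t = np.asarray(t)
        # Assumption of this model: shape and dtype must match exactly.
        if t.shape != self.tensor_shape or t.dtype != self.tensor_dtype:
            raise ValueError("tensor does not match the buffer's shape or dtype")
        self.slots[offset] = t

    def load(self, offset):
        # Return a copy so that later in-place edits do not touch the buffer.
        return self.slots[offset].copy()


toy_buffer = ToyRemoteBuffer((2,), np.int32, entries=3)
toy_buffer.store(1, np.array([7, 8], dtype=np.int32))
loaded = toy_buffer.load(1)
```

In this model, loading from a slot gives back whatever was last stored at that index, and the other slots are unaffected.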

13.2. Remote variable tensors

Similarly to creating a variable tensor (Section 6.2, Variable tensors), you can also create a variable tensor located in Streaming Memory by using remote_variable():

remote_variable(data: Union[HostTensor, float, int],
                remote_buffer: RemoteBuffer,
                offset: int = 0,
                dtype: Optional[dtypes.dtype] = None,
                name: Optional[str] = None,
                downcast: bool = True)

The returned variable tensor, with value data, is put at index offset of the remote buffer remote_buffer. The data type and shape of this variable tensor need to be compatible with those of the remote buffer.

Listing 13.1 shows how to use remote buffers and remote variable tensors. First, a remote buffer, buffer, is created with only one entry. Then a remote variable tensor, remote_x, is created with the value 1 and placed at index 0 of the buffer. The value is then loaded from the remote buffer into the IPU tensor loaded_x. Next, loaded_x is updated in place by adding y, which has the value 2. The new value of loaded_x is then stored at index 0 of buffer, the same place as remote_x. After you run a session, you can check the value of remote_x by using session.get_tensor_data(remote_x). Both loaded_x and remote_x have the value 3 in this example.

Listing 13.1 Example of using a remote buffer and a remote variable

import numpy as np
import popxl
import popxl.ops as ops
from popxl import dtypes

ir = popxl.Ir()
main = ir.main_graph

with main, popxl.in_sequence():
    x = np.array(1).astype(np.int32)

    # Create a remote buffer
    buffer = popxl.remote_buffer(x.shape, dtypes.int32, 1)

    # Create a remote variable and locate it to the buffer at index 0
    remote_x = popxl.remote_variable(x, buffer, 0)

    # Load the remote variable
    loaded_x = ops.remote_load(buffer, 0)

    # Calculation on IPU to update the loaded variable
    y = popxl.variable(2)
    ops.var_updates.accumulate_(loaded_x, y)

    # Store the updated value back to the remote buffer
    ops.remote_store(buffer, 0, loaded_x)

Download remote_variable.py
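The sequence of values in Listing 13.1 can be traced on the host with plain NumPy. This is only a sketch of the arithmetic, not PopXL code: modelling the one-entry remote buffer as a Python dict is an assumption made for illustration. It highlights that the remote copy only changes when the updated value is stored back.

```python
import numpy as np

# One-entry "remote buffer" modelled as a dict from index to array
# (an assumption of this sketch, not how PopXL stores data).
remote = {0: np.array(1, dtype=np.int32)}  # remote_x placed at index 0

loaded_x = remote[0].copy()                # remote load: copy into "IPU" memory
y = np.array(2, dtype=np.int32)
loaded_x += y                              # in-place accumulate on the loaded copy

value_before_store = int(remote[0])        # the remote copy is still unchanged
remote[0] = loaded_x.copy()                # remote store back to index 0
value_after_store = int(remote[0])         # the remote copy now matches loaded_x
```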

13.3. Variable tensors for replicated tensor sharding

You can also create a variable tensor for replicated tensor sharding (RTS) that is split in equal shards across replicas. See the PopART User Guide for more information. Together with the allGather operation replicated_all_gather(), RTS avoids storing the same tensor for each replica. The full tensor is stored in Streaming Memory. After the full tensor is updated on the IPU, it needs to be sharded and/or reduced again to each replica by using the reduceScatter operation replicated_reduce_scatter().

In PopXL, each shard of an RTS variable tensor is stored in its own remote buffer. To simplify the use of replication, each shard shares the same representation of its remote buffer. As shown in Fig. 13.1, each buffer has the same tensor type and tensor shape in each shard. The number of shards is the same as the number of replicas.

Fig. 13.1 An RTS variable tensor in PopXL

Note that replication must be enabled before you can create an RTS variable tensor. You can enable replication by setting replication_factor to a value greater than 1 (Section 12.5, Data input shape).
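The relationship between the full tensor, its shards and the replicas can be illustrated on the host. The sketch below uses plain NumPy rather than PopXL, and the tensor values are arbitrary. It assumes that the number of elements divides evenly by the replication factor and that shards are gathered in replica order.

```python
import numpy as np

replication_factor = 2                # must be greater than 1 for RTS
full = np.arange(6, dtype=np.int32)   # illustrative full tensor

# Split into equal shards, one per replica (requires an even division here).
shards = np.split(full, replication_factor)

# Every shard has the same shape and data type.
shard_shapes = {s.shape for s in shards}

# Gathering the shards in replica order rebuilds the full tensor.
gathered = np.concatenate(shards)
```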

There are two ways to create an RTS variable tensor:

Store the full variable tensor in Streaming Memory. You can access the variable tensor through remote_buffer.

remote_replica_sharded_variable(data: Union[HostTensor, float, int],
                                remote_buffer: RemoteBuffer,
                                offset: int = 0,
                                dtype: Optional[dtypes.dtype] = None,
                                name: Optional[str] = None,
                                downcast: bool = True) -> Variable

remote_replica_sharded_variable() returns an RTS variable tensor that has value data at the index offset of remote buffer remote_buffer. You need to use remote_load() and remote_store() operations to load and store the variable tensor data to and from the IPU.

Store the full variable tensor in Streaming Memory, along with another tensor to represent its shards. The tensor representing the shards can be used without remote_load() and remote_store() since it is automatically loaded from or stored to Streaming Memory.

replica_sharded_variable(data: Union[HostTensor, float, int],
                         dtype: Optional[dtypes.dtype] = None,
                         name: Optional[str] = None,
                         downcast: bool = True) -> Tuple[Variable, Tensor]

In replica_sharded_variable(), the variable tensor is still created with a remote buffer, as for remote_replica_sharded_variable(). The number of entries in this buffer is the number of elements in the data divided by the number of replicas. Each shard is then automatically loaded or stored according to the execution context. However, the remote buffer is hidden to provide an easier interface. You can use remote_replica_sharded_variable() to have more flexibility.

Listing 13.2 shows how to update the value of a remote RTS variable tensor created with remote_replica_sharded_variable():

The remote RTS variable tensor remote_x is created with a remote buffer buffer.

Remote load tensor remote_x to tensor loaded_x.

Gather the shards of the tensor loaded_x to tensor full_x.

Update the tensor full_x in place by adding tensor y.

Shard tensor full_x across replicas to tensor updated_shard.

Remote store tensor updated_shard to index 0, the same place as tensor remote_x, in the remote buffer buffer.
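The six steps above can be traced on the host with plain NumPy. This is a sketch of the data movement only, not PopXL code. It assumes two replicas, the values x = [1, 2] and y = [3, 4], one dict per replica standing in for that replica's remote buffer, and that each replica keeps the slice of the full tensor matching its replica index.

```python
import numpy as np

replicas = 2
x = np.array([1, 2], dtype=np.int32)
y = np.array([3, 4], dtype=np.int32)

# Step 1: one "remote buffer" per replica, each holding one shard at index 0.
buffers = [{0: shard.copy()} for shard in np.split(x, replicas)]

# Step 2: every replica loads its own shard.
loaded = [buf[0].copy() for buf in buffers]

# Step 3: all-gather, so that every replica sees the full tensor.
full = [np.concatenate(loaded) for _ in range(replicas)]

# Step 4: in-place update of the full tensor on each replica.
for f in full:
    f += y

# Step 5: each replica keeps only its own slice of the updated tensor.
shard_len = x.size // replicas
updated = [f[r * shard_len:(r + 1) * shard_len] for r, f in enumerate(full)]

# Step 6: store each slice back at index 0 of that replica's buffer.
for buf, shard in zip(buffers, updated):
    buf[0] = shard.copy()

final_x = np.concatenate([buf[0] for buf in buffers])
```

In this model, the shards stored after step 6 together hold x + y.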

Listing 13.3 shows how to update an RTS variable tensor created with replica_sharded_variable(). In this example, the remote store and load operations are hidden:

A remote RTS variable tensor remote_x and its shards loaded_x are created without specifying a buffer.

Then, the shards loaded_x are updated by adding the sharded tensor y.
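The shard-wise update can also be traced on the host. The NumPy sketch below is not PopXL code. It assumes two replicas with x = [1, 2] and y = [3, 4], and checks that adding matching shards gives the same result as updating the gathered tensor and slicing it again.

```python
import numpy as np

replicas = 2
x = np.array([1, 2], dtype=np.int32)
y = np.array([3, 4], dtype=np.int32)

# Each replica holds one shard of x and one shard of y.
x_shards = [s.copy() for s in np.split(x, replicas)]
y_shards = np.split(y, replicas)

# Shard-wise in-place update: no gather is needed for an elementwise add.
for xs, ys in zip(x_shards, y_shards):
    xs += ys

# Reassemble the shards in replica order to inspect the full value.
final_x = np.concatenate(x_shards)
same_as_full_update = np.array_equal(final_x, x + y)
```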

Listing 13.2 Example of using a remote RTS variable tensor

import numpy as np
import popxl
import popxl.ops as ops
from popxl import dtypes

ir = popxl.Ir()
ir.replication_factor = 2
ir.num_host_transfers = 1

with ir.main_graph, popxl.in_sequence():
    # Create an RTS variable remote_x with buffer
    x = np.array([1, 2]).astype(np.int32)
    buffer = popxl.remote_buffer((x.size // 2,), dtypes.int32, 1)
    remote_x = popxl.remote_replica_sharded_variable(x, buffer, 0)
    # Load remote_x to loaded_x
    loaded_x = ops.remote_load(buffer, 0)

    # Create a variable y
    y = popxl.variable([3, 4])

    # Add y to the all gathered full x
    full_x = ops.collectives.replicated_all_gather(loaded_x)
    ops.var_updates.accumulate_(full_x, y)

    # Scatter the updated full x to each buffer across replicas
    updated_shard = ops.collectives.replica_sharded_slice(full_x)
    ops.remote_store(buffer, 0, updated_shard)

# Execute the ir
with popxl.Session(ir, "ipu_model") as session:
    outputs = session.run({})
    # Get the updated x value
    final_weight = session.get_tensor_data(remote_x)

Download remote_rts_var.py

Listing 13.3 Example of using an RTS variable tensor

import numpy as np
import popxl
import popxl.ops as ops
from popxl import dtypes

ir = popxl.Ir()
ir.replication_factor = 2
ir.num_host_transfers = 1

with ir.main_graph, popxl.in_sequence():
    # Create an RTS variable remote_x and its shards loaded_x
    x = np.array([1, 2]).astype(np.int32)
    remote_x, loaded_x = popxl.replica_sharded_variable(x, dtypes.int32)

    # Create a variable and shard it across replicas
    y = popxl.variable([3, 4])
    sharded_y = ops.collectives.replica_sharded_slice(y)

    # Add each shard of y to each shard of x
    ops.var_updates.accumulate_(loaded_x, sharded_y)

# Execute the ir
with popxl.Session(ir, "ipu_model") as session:
    outputs = session.run({})

# Get the updated x value
final_weight = session.get_tensor_data(remote_x)

Download rts_var.py
