13. Replication
This chapter describes how to use replication in PopXL.
13.1. Graph replication
PopXL has the ability to run multiple copies of your model in parallel. This is called graph replication. Replication is a means of parallelising your inference or training workloads. We call each instance of the graph a replica. The replication factor is the number of replicas in total across all replica groups (see Section 13.2, Replica grouping).
This can be set through popxl.Ir.replication_factor.
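For example, a minimal sketch (assuming you want a replication factor of 4):

import popxl

# Construct an IR that runs four replicas of the model in parallel.
ir = popxl.Ir(replication=4)
assert ir.replication_factor == 4

# Alternatively, the factor can be set through the property after construction.
ir.replication_factor = 4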
13.2. Replica grouping
In PopXL you can define a grouping of replicas when you create variables. This grouping is used when you initialise or read the variable: typically, variables are initialised and read on a per-group basis. The default behaviour is that all replicas belong to a single group.
The grouping is defined by a ReplicaGrouping object, which is instantiated with replica_grouping(). A ReplicaGrouping is initialised with a group_size and a stride. The group_size parameter sets the number of replicas to be grouped together, and the stride parameter sets the replica index difference between two consecutive members of a group.
Warning

Limitations:

- When stride == 1, it is required that replication_factor modulo group_size equals 0.
- When stride != 1, it is required that stride times group_size equals replication_factor.
Table 13.1, Table 13.2 and Table 13.3 show some of the different ways in which group_size and stride partition the replicas into groups.
Table 13.1 Groups with group_size=4 and stride=1 (16 replicas)

Group | Replicas
---|---
0 | 0, 1, 2, 3
1 | 4, 5, 6, 7
2 | 8, 9, 10, 11
3 | 12, 13, 14, 15
Table 13.2 Groups with group_size=4 and stride=4 (16 replicas)

Group | Replicas
---|---
0 | 0, 4, 8, 12
1 | 1, 5, 9, 13
2 | 2, 6, 10, 14
3 | 3, 7, 11, 15
Table 13.3 Groups with group_size=1 (16 replicas)

Group | Replicas
---|---
0 | 0
1 | 1
2 | 2
… | …
14 | 14
15 | 15
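As a sketch, the groupings shown in the tables above correspond to calls such as the following (assuming an IR with 16 replicas; the variable names are illustrative):

import popxl

ir = popxl.Ir(replication=16)

# Table 13.1: consecutive replicas are grouped together.
rg_table_1 = ir.replica_grouping(group_size=4, stride=1)

# Table 13.2: a group is formed from every fourth replica.
rg_table_2 = ir.replica_grouping(group_size=4, stride=4)

# Table 13.3: every replica is its own group.
rg_table_3 = ir.replica_grouping(group_size=1)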
13.3. Code examples
Listing 13.1 shows a simple example of the initialisation of a few different groupings.
# Copyright (c) 2022 Graphcore Ltd. All rights reserved.
import popxl
import numpy as np

replication_factor = 8
ir = popxl.Ir(replication=replication_factor)

with ir.main_graph:

    base_shape = [3, 3]

    # Create a tensor with default settings, that is: load the same value to all replicas.
    tensor_1 = popxl.variable(np.ndarray(base_shape))

    # Create a tensor with one variable on each of the replicas:
    tensor_2 = popxl.variable(
        np.ndarray([replication_factor] + base_shape),
        replica_grouping=ir.replica_grouping(group_size=1),
    )

    # Create a tensor where replicas are grouped together in pairs:
    group_size = 2
    tensor_3 = popxl.variable(
        np.ndarray([replication_factor // group_size] + base_shape),
        replica_grouping=ir.replica_grouping(group_size=2),
    )

    # Create a tensor where each replica is grouped with an orthogonal replica:
    tensor_3 = popxl.variable(
        np.ndarray([replication_factor // group_size] + base_shape),
        replica_grouping=ir.replica_grouping(stride=4),
    )
Listing 13.2 shows an example of using a replica grouping on a remote variable. The IR has two replicas, and each is its own group.
import numpy as np

import popxl
import popxl.ops as ops
from popxl import dtypes

ir = popxl.Ir(replication=2)

num_groups = 2
v_h = np.arange(0, num_groups * 32).reshape((num_groups, 32))

rg = ir.replica_grouping(group_size=ir.replication_factor // num_groups)

with ir.main_graph, popxl.in_sequence():
    remote_buffer = popxl.remote_buffer((32,), dtypes.int32)
    remote_v = popxl.remote_variable(v_h, remote_buffer, replica_grouping=rg)

    v = ops.remote_load(remote_buffer, 0)

    v += 1

    ops.remote_store(remote_buffer, 0, v)
There are a couple of specifics to note here. Firstly, you need the in_sequence context because there is no data-flow dependence between the inplace add op and the remote_store op on the same tensor. Secondly, we manually pass the correct per-replica shape to popxl.remote_buffer; this shape does not have the group dimension.
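To inspect the per-group values on the host, you could extend Listing 13.2 along the following lines. This is only a sketch, assuming a machine with at least two IPUs is available; popxl.Session and get_tensor_data() are used here for illustration:

with popxl.Session(ir, "ipu_hw") as session:
    session.run()
    # Each replica increments its own group's slice, so reading the variable
    # back per group should yield v_h + 1, with shape (num_groups, 32).
    data = session.get_tensor_data(remote_v)
    assert np.array_equal(data, v_h + 1)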
Note

If you consider v_h to be the data for a single variable, this is akin to sharding the variable over two replicas. In fact, unless you need to AllGather your shards and cannot forgo the CBR optimisation, it is advisable to just use replica groupings as shown to achieve sharding. This is because the API is much less brittle with respect to what you can do without errors or undefined behaviour.
13.4. Retrieval modes
By default, only one replica per group is returned. Usually this is sufficient as all replicas within
a group should be identical. However, if you wish to return all replicas within a group
(for example to test all grouped replicas are the same), set the retrieval_mode
parameter to
"all_replicas"
when constructing your variable:
    # Create a tensor which is grouped across sequential replicas (0 and 1, 2 and 3) and
    # return all the group's variables when requested. The returned array will be of shape
    # [replication_factor] + base_shape
    group_size = 2
    tensor_4 = popxl.variable(
        np.ndarray([replication_factor // group_size] + base_shape),
        replica_grouping=ir.replica_grouping(group_size=2),
        retrieval_mode="all_replicas",
    )

    # Create a tensor which is grouped across orthogonal replicas (0 and 2, 1 and 3)
    # and return all the group's variables when requested. The returned array will be of shape
    # [replication_factor] + base_shape