8. Graphcore Communication Library (GCL)
The Graphcore Communication Library (GCL) enables high-performance scale-out for IPU systems. GCL utilises the IPU's built-in hardware support for transferring data directly from the memory of one IPU to another via the IPU-Fabric. The result is a low-overhead, high-throughput communication library, specifically targeted at systems such as the IPU-POD128.
GCL is used by frameworks such as TensorFlow to implement operations such as data-parallel gradient reductions using all-reduce.
8.1. Example
A full example of an all-reduce operation using GCL is available. The graph creation code is shown in Listing 8.1.
// Main program
program::Sequence prog;
prog.add(program::Copy(inStream, data));
gcl::allReduceInPlaceCrossReplica(graph, data,
                                  popops::CollectiveOperator::ADD, prog);
prog.add(program::Copy(data, outStream));
You can download the complete code and compile it with the command:
$ g++ gcl_allreduce_example.cpp -lpoplar -lgcl_ct -lpopops \
-o gcl_allreduce_example && ./gcl_allreduce_example
Download gcl_allreduce_example.cpp
For more information see the GCL API reference.
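To make the semantics of the all-reduce step concrete, the following is a minimal host-only sketch of what an in-place all-reduce with the ADD operator leaves in each replica. It does not use Poplar or GCL: the function name `simulateAllReduceAdd` and the nested-vector representation of replicas are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only model of an in-place all-reduce with ADD across replicas.
// replicas[r] stands in for the tensor held by replica r.
// This is an illustration of the result only, not the GCL API.
void simulateAllReduceAdd(std::vector<std::vector<float>> &replicas) {
  if (replicas.empty()) return;
  const std::size_t n = replicas.front().size();

  // Reduce: element-wise sum over all replicas.
  std::vector<float> sum(n, 0.0f);
  for (const auto &r : replicas) {
    assert(r.size() == n && "all replicas must hold tensors of equal size");
    for (std::size_t i = 0; i < n; ++i) sum[i] += r[i];
  }

  // Broadcast: every replica ends up with the same reduced tensor.
  for (auto &r : replicas) r = sum;
}
```

In a data-parallel training setting, each inner vector would correspond to one replica's local gradients, and after the call every replica holds the summed gradients.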
8.2. Topologies
8.2.1. Physical topologies
There are two ways of connecting IPU-Links and sync signals: in a mesh or as a torus. The mesh structure is similar to a ladder, where pairs of IPUs form each rung. In a torus, the ends of the “ladder” loop round to form a closed loop. See Fig. 8.1.
GCL supports both topologies, with different restrictions on traffic flow and replica size, as described in the following sections.

Fig. 8.1 Ladder and torus topologies used by GCL
8.2.2. Logical topologies
GCL supports a number of logical topologies that describe the traffic flow in the physical topology. Fig. 8.2 illustrates these topologies.

Fig. 8.2 Logical topologies used by GCL
The following logical topologies are supported:
peripheral-ring is only relevant for replica size 1. The traffic follows a single ring on the periphery of the IPU-Link mesh. Assuming replica numbers are assigned linearly from the bottom and the communication-group size is even, the communication follows this pattern:
0 - 1 - 3 - ... - <comm_size-3> - <comm_size-1> - <comm_size-2> - <comm_size-4> - ... - 4 - 2 - 0
barley-twist is only relevant for replica size 1 on an IPU-Link torus (that is, with loop-back cables). The traffic is split over two concurrent rings, forming a dual-serpent-like pattern through the IPU-Link torus. In this way, all eight IPU-Links are used for communication, giving the best available bandwidth. The communication follows this pseudocode pattern:
int next_addr = (stream == barley_twist_X) ? 0 : 1;
for (int duo_step = 0; duo_step < comm_size / 2; duo_step++) {
  next_addr = next_addr ^ 1;               // Go sideways
  next_addr = (next_addr + 2) % comm_size; // Go up
}
ring-on-line is relevant for replica size 2. The traffic follows one of two virtual rings, one on each side of the IPU-Link mesh. Assuming replica numbers are assigned linearly from the bottom and the communication-group size is even, the communication follows this pattern:
0 - 1 - 3 - ... - <comm_size-3> - <comm_size-1> - <comm_size-2> - <comm_size-4> - ... - 4 - 2 - 0
rung-ring-[2,4,8] is relevant for replica sizes 2, 4 and 8. The traffic follows one of two physical rings, one on each side of the IPU-Link mesh, by moving straight up and assuming wrap-around at the top for a torus. The communication follows this pattern, looping back to rank 0:
0 - 1 - 2 - ... - <comm_size-2> - <comm_size-1> - 0
rung-ring-[4,8] is also a valid topology for up to 16 IPUs per IPU-Link domain (ILD), on a mesh or torus physical topology with DNC routing.
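The traversal patterns listed above can be checked by generating the visit order for a given communication-group size. The sketch below is a host-only illustration and not part of GCL: the function names and the `start` parameter are hypothetical, and the barley-twist trace additionally assumes that `comm_size` is a multiple of 4, which is the case in which the path produced by the pseudocode returns to its starting rank.

```cpp
#include <cassert>
#include <vector>

// Visit order for the peripheral-ring and ring-on-line patterns:
// rank 0, then odd ranks ascending, then even ranks descending.
// The returned order omits the final hop back to rank 0.
std::vector<int> peripheralRingOrder(int comm_size) {
  assert(comm_size >= 2 && comm_size % 2 == 0);
  std::vector<int> order{0};
  for (int r = 1; r < comm_size; r += 2) order.push_back(r);
  for (int r = comm_size - 2; r >= 2; r -= 2) order.push_back(r);
  return order;
}

// Visit order for one of the two barley-twist rings, tracing the
// pseudocode hop by hop. start is 0 for one stream and 1 for the other.
// Assumption of this sketch: comm_size is a multiple of 4.
std::vector<int> barleyTwistOrder(int comm_size, int start) {
  assert(comm_size >= 4 && comm_size % 4 == 0);
  assert(start == 0 || start == 1);
  std::vector<int> order{start};
  int next_addr = start;
  for (int duo_step = 0; duo_step < comm_size / 2; ++duo_step) {
    next_addr ^= 1;                           // go sideways
    order.push_back(next_addr);
    next_addr = (next_addr + 2) % comm_size;  // go up
    order.push_back(next_addr);
  }
  order.pop_back();  // drop the final hop, which lands back on start
  return order;
}

// Next rank in a rung-ring: straight up, wrapping to rank 0 at the top.
int rungRingNext(int rank, int comm_size) {
  assert(comm_size > 0 && rank >= 0 && rank < comm_size);
  return (rank + 1) % comm_size;
}
```

For a group of eight, `peripheralRingOrder(8)` yields 0, 1, 3, 5, 7, 6, 4, 2, while the two barley-twist rings start at ranks 0 and 1 and each visit all eight ranks in a different serpentine order.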
8.2.3. Relationship between logical and physical topologies
Table 8.1 lists the different relationships between logical and physical topologies, depending on the size of the replica and the IPU-Link routing.
| Replica size | Logical topology | Physical topology | IPU-Link routing |
|---|---|---|---|
| 1 | peripheral-ring | mesh, torus | DNC, SWNC, RINGSWNC |
| 1 | barley-twist | torus | BTNC |
| 2 | ring-on-line | mesh | DNC, SWNC |
| 2 | rung-ring-2 | torus | DNC, RINGSWNC |
| 4 | rung-ring-4 | mesh, torus | DNC |
| 4 | rung-ring-4 | torus | RINGSWNC |
| 8 | rung-ring-8 | mesh, torus | DNC |
| 8 | rung-ring-8 | torus | RINGSWNC |
Key: IPU-Link routing options

| Abbreviation | Meaning |
|---|---|
| BTNC | Barley-twist network configuration |
| DNC | Default network configuration |
| SWNC | Sliding-window network configuration |
| RINGSWNC | Ring with sliding-window network configuration |
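The rows of Table 8.1 can also be expressed as a small lookup, which may be convenient when validating a configuration before launching a job. The following is only a sketch: the types `Physical` and `TopologyRow` and the function `logicalTopologyFor` are hypothetical names, not part of the GCL API, and the data is transcribed directly from the table.

```cpp
#include <algorithm>
#include <string>
#include <vector>

enum class Physical { Mesh, Torus };

// One row of Table 8.1 (hypothetical representation).
struct TopologyRow {
  int replicaSize;
  std::string logical;
  std::vector<Physical> physical;
  std::vector<std::string> routing;
};

// Rows transcribed from Table 8.1.
const std::vector<TopologyRow> &topologyTable() {
  static const std::vector<TopologyRow> rows = {
      {1, "peripheral-ring", {Physical::Mesh, Physical::Torus}, {"DNC", "SWNC", "RINGSWNC"}},
      {1, "barley-twist", {Physical::Torus}, {"BTNC"}},
      {2, "ring-on-line", {Physical::Mesh}, {"DNC", "SWNC"}},
      {2, "rung-ring-2", {Physical::Torus}, {"DNC", "RINGSWNC"}},
      {4, "rung-ring-4", {Physical::Mesh, Physical::Torus}, {"DNC"}},
      {4, "rung-ring-4", {Physical::Torus}, {"RINGSWNC"}},
      {8, "rung-ring-8", {Physical::Mesh, Physical::Torus}, {"DNC"}},
      {8, "rung-ring-8", {Physical::Torus}, {"RINGSWNC"}},
  };
  return rows;
}

// Returns the logical topology listed for the given combination,
// or an empty string if the table has no matching row.
std::string logicalTopologyFor(int replicaSize, Physical phys,
                               const std::string &routing) {
  for (const auto &row : topologyTable()) {
    if (row.replicaSize != replicaSize) continue;
    const bool physOk = std::find(row.physical.begin(), row.physical.end(),
                                  phys) != row.physical.end();
    const bool routeOk = std::find(row.routing.begin(), row.routing.end(),
                                   routing) != row.routing.end();
    if (physOk && routeOk) return row.logical;
  }
  return "";
}
```

For example, `logicalTopologyFor(2, Physical::Torus, "DNC")` returns `"rung-ring-2"`, whereas a combination that does not appear in the table, such as replica size 8 on a mesh with RINGSWNC routing, returns an empty string.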