5. GCL API reference
The Graphcore Communication Library (GCL) provides application-level functions that can be used in Poplar programs for the IPU.
5.1. gcl/TileAllocation.hpp
namespace gcl
Functions
std::vector<unsigned> perIPUTiles(const poplar::Graph &graph, unsigned offset, unsigned count, bool sorted = true)
Return a list of tile IDs optimal for GCL collective operations.
To use GCL collectives, the number of I/O tiles set in GCL_NUM_IO_TILES should be excluded from the compute-tile graph using the offset parameter. GCL then picks up the I/O tiles starting from tile 0.
- Parameters
graph – The graph on which to allocate tiles.
offset – Number of tiles to skip; tiles are allocated starting from this offset.
count – Number of tile IDs to return.
sorted – If true, the returned list of IDs is sorted. This should normally be true, and is therefore the default.
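A minimal usage sketch follows. The helper name addComputeVariable and the include path gcl/TileAllocation.hpp are assumptions for illustration, not part of the API: reserve the first tiles of each IPU for GCL I/O and map a variable onto the remaining compute tiles.

#include <gcl/TileAllocation.hpp>
#include <poplar/Graph.hpp>
#include <vector>

// Map a 1D variable over the compute tiles only, keeping the first
// numIoTiles tiles of each IPU free so GCL can use them for I/O.
poplar::Tensor addComputeVariable(poplar::Graph &graph) {
  const unsigned numIoTiles = gcl::MIN_IO_TILES; // must match GCL_NUM_IO_TILES
  const unsigned tilesPerIpu = graph.getTarget().getTilesPerIPU();
  std::vector<unsigned> computeTiles =
      gcl::perIPUTiles(graph, numIoTiles, tilesPerIpu - numIoTiles);
  poplar::Tensor t =
      graph.addVariable(poplar::FLOAT, {computeTiles.size()}, "t");
  for (unsigned i = 0; i < computeTiles.size(); ++i)
    graph.setTileMapping(t[i], computeTiles[i]);
  return t;
}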
Variables
constexpr auto MIN_IO_TILES = 32
The lowest number of I/O tiles currently supported.
5.2. gcl/Collectives.hpp
namespace gcl
Enums
enum CommGroupType
Enum to define the communication group specification type. Assumption: replica groups are uniform in size and layout across IPUs.
Values:
enumerator ALL
All replicas viewed as one group; the replica group size is ignored.
enumerator CONSECUTIVE
Groups are consecutive in replica ID.
If there are N replicas denoted {0, …, N-1} and the group size is k, then the groups are: {0, 1, …, k-1}, {k, …, 2k-1}, …, {N-k, …, N-1}.
enumerator ORTHOGONAL
Groups are sliced orthogonal to the replica ordering.
If there are N replicas denoted {0, …, N-1} and the group size is k, then there are m = N/k groups: {0, m, 2m, …}, {1, m+1, 2m+1, …}, …, {m-1, 2m-1, …, N-1}.
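For example, with N = 8 replicas and group size k = 2, CONSECUTIVE yields the groups {0, 1}, {2, 3}, {4, 5}, {6, 7}, while ORTHOGONAL yields {0, 4}, {1, 5}, {2, 6}, {3, 7}.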
Functions
poplar::Tensor allReduce(poplar::Graph &graph, const poplar::Tensor &data, popops::Operation op, poplar::program::Sequence &prog, const CommGroup &group, const std::string &debugPrefix = "", const poplar::OptionFlags &options = {})
Perform an all-reduce operation.
The operation is performed on the provided tensor over the replicas specified by the group argument. This operation reduces across the tensors that the replicated tensor is a handle for. The result is returned as a replicated tensor.
- Parameters
graph – The replicated graph the input tensor belongs to.
data – The replicated tensor to reduce.
op – The reduction operator (for example, popops::Operation::ADD).
prog – The program sequence to add operations to.
group – The subset of replicas for the collective operation.
debugPrefix – String used as a prefix for compute sets.
options – Collective options.
- Returns
A replicated tensor with the reduction of data.
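A minimal sketch of an all-reduce over all replicas follows. The wrapper name sumAcrossReplicas is illustrative, and it is assumed that gcl/Collectives.hpp brings in the popops::Operation type used in these signatures.

#include <gcl/Collectives.hpp>
#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>

// Sum grads across every replica of a replicated graph; afterwards each
// replica holds the same reduced values.
poplar::Tensor sumAcrossReplicas(poplar::Graph &graph,
                                 const poplar::Tensor &grads,
                                 poplar::program::Sequence &prog) {
  return gcl::allReduce(graph, grads, popops::Operation::ADD, prog,
                        "sumGrads");
}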
poplar::Tensor allReduce(poplar::Graph &graph, const poplar::Tensor &data, popops::Operation op, poplar::program::Sequence &prog, const std::string &debugPrefix = "", const poplar::OptionFlags &options = {})
As allReduce() without the group argument (for all replicas).
void allReduceToDestination(poplar::Graph &graph, const poplar::Tensor &data, poplar::Tensor &destination, popops::Operation op, poplar::program::Sequence &prog, const CommGroup &group, const std::string &debugPrefix = "", const poplar::OptionFlags &options = {})
As allReduce(), but writes the result to the destination tensor.
void allReduceToDestination(poplar::Graph &graph, const poplar::Tensor &data, poplar::Tensor &destination, popops::Operation op, poplar::program::Sequence &prog, const std::string &debugPrefix = "", const poplar::OptionFlags &options = {})
As allReduceToDestination() without the group argument (for all replicas).
void allReduceInPlace(poplar::Graph &graph, poplar::Tensor &data, popops::Operation op, poplar::program::Sequence &prog, const CommGroup &group, const std::string &debugPrefix = "", const poplar::OptionFlags &options = {})
As allReduce(), but writes the result back to the input data tensor.
void allReduceInPlace(poplar::Graph &graph, poplar::Tensor &data, popops::Operation op, poplar::program::Sequence &prog, const std::string &debugPrefix = "", const poplar::OptionFlags &options = {})
As allReduceInPlace() without the group argument (for all replicas).
poplar::Tensor reduceScatter(poplar::Graph &graph, const poplar::Tensor &data, popops::Operation op, poplar::program::Sequence &prog, const CommGroup &group, const std::string &debugPrefix = "", const poplar::OptionFlags &options = {})
Reduce the replicated rank-1 tensor data, with the result scattered across the replicas.
For an input of shape [numElements] mapped to a single IPU per replica, the output will have shape [ceil(numElements / replicationFactor)]. If replicationFactor does not evenly divide numElements, the result is zero-padded. For instance:
Before:
Replica0: data[x0, y0, z0]
Replica1: data[x1, y1, z1]
After:
Replica0: result[op(x0, x1), op(y0, y1)]
Replica1: result[op(z0, z1), 0]
For an input of shape [numElementsIPU0 + numElementsIPU1 + …] mapped to multiple IPUs per replica, the output will have shape: [ceil(numElementsIPU0 / replicationFactor) + ceil(numElementsIPU1 / replicationFactor) + …] with the result grouped per IPU. If replicationFactor does not evenly divide the number of elements on an IPU, the result is zero-padded per IPU. For instance:
Before:
Replica0: data[x0, y0, z0, w0]
Replica1: data[x1, y1, z1, w1]
Replica2: data[x2, y2, z2, w2]
Replica3: data[x3, y3, z3, w3]
Mapping: data[IPU0, IPU0, IPU0, IPU1]
After:
Replica0: result[op(x0, x1, x2, x3), op(w0, w1, w2, w3)]
Replica1: result[op(y0, y1, y2, y3), 0]
Replica2: result[op(z0, z1, z2, z3), 0]
Replica3: result[0, 0]
Mapping: result[IPU0, IPU1]
- Parameters
graph – The replicated graph the input tensor belongs to.
data – The replicated tensor to reduce scatter.
op – The reduction operator (for example, popops::Operation::ADD).
prog – The program sequence to add operations to.
group – The subset of replicas for the collective operation.
debugPrefix – String used as a prefix for compute sets.
options – Collective options.
- Returns
The output tensor, with the content described above.
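A minimal sketch using the overload without a group argument; the wrapper name scatterSum is illustrative.

#include <gcl/Collectives.hpp>
#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>

// Each replica receives one reduced shard of shape
// [ceil(numElements / replicationFactor)], zero-padded if necessary.
poplar::Tensor scatterSum(poplar::Graph &graph, const poplar::Tensor &data,
                          poplar::program::Sequence &prog) {
  return gcl::reduceScatter(graph, data, popops::Operation::ADD, prog,
                            "scatterSum");
}

Note that, for commutative operators, a reduceScatter() followed by an allGather() of the shards yields the same values as allReduce() (up to the zero padding); this is the usual decomposition of a ring all-reduce.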
poplar::Tensor reduceScatter(poplar::Graph &graph, const poplar::Tensor &data, popops::Operation op, poplar::program::Sequence &prog, const std::string &debugPrefix = "", const poplar::OptionFlags &options = {})
As reduceScatter() without the group argument (for all replicas).
poplar::Tensor allGather(poplar::Graph &graph, const poplar::Tensor &data, poplar::program::Sequence &prog, const CommGroup &group, const std::string &debugPrefix = "", const poplar::OptionFlags &options = {})
Gather the replicated tensor data and return the result so that each replica has a copy of all replicas' data tensors. For instance:
Before:
Replica0: data[x0, y0]
Replica1: data[x1, y1]
Replica2: data[x2, y2]
After allGather:
Replica0: result[x0, y0, x1, y1, x2, y2]
Replica1: result[x0, y0, x1, y1, x2, y2]
Replica2: result[x0, y0, x1, y1, x2, y2]
For an input of shape [incomingShape] the output will be [replicationFactor][incomingShape].
- Parameters
graph – The replicated graph the input tensor belongs to.
data – The replicated tensor to gather.
prog – The program sequence to add operations to.
group – The subset of replicas for the collective operation.
debugPrefix – String used as a prefix for compute sets.
options – Collective options.
- Returns
The output tensor, with the content described above.
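A minimal sketch using the overload without a group argument; the wrapper name gatherAll is illustrative.

#include <gcl/Collectives.hpp>
#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>

// For an input of shape [incomingShape], the result has shape
// [replicationFactor][incomingShape] on every replica.
poplar::Tensor gatherAll(poplar::Graph &graph, const poplar::Tensor &stats,
                         poplar::program::Sequence &prog) {
  return gcl::allGather(graph, stats, prog, "gatherStats");
}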
poplar::Tensor allGather(poplar::Graph &graph, const poplar::Tensor &data, poplar::program::Sequence &prog, const std::string &debugPrefix = "", const poplar::OptionFlags &options = {})
As allGather() without the group argument (for all replicas).
poplar::Tensor allToAll(poplar::Graph &graph, const poplar::Tensor &data, poplar::program::Sequence &prog, const CommGroup &group, const std::string &debugPrefix = "", const poplar::OptionFlags &options = {})
Perform an all-to-all exchange of the elements of the input tensor based on replica ID.
The shape of the input must have the number of replicas in the graph as its first or only dimension. That dimension will be used to split up the tensor being sent, with each replica sending all splits except for the split index which matches its replica ID. That is, replica 2 will not send input[2] and so on.
The replica receiving the slice will copy that incoming slice into the output at the index which matches the replica ID of the replica which sent it. For instance:
Input tensor:
Replica0: Tensor T[x0,x1,x2]
Replica1: Tensor T[y0,y1,y2]
Replica2: Tensor T[z0,z1,z2]
Output tensor:
Replica0: Tensor T[x0,y0,z0]
Replica1: Tensor T[x1,y1,z1]
Replica2: Tensor T[x2,y2,z2]
- Parameters
graph – The replicated graph the input tensor belongs to.
data – The replicated tensor to exchange.
prog – The program sequence to add operations to.
group – The subset of replicas for the collective operation.
debugPrefix – String used as a prefix for compute sets.
options – Collective options.
- Returns
The output tensor, with the content described above.
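A minimal sketch follows. Since only the overload taking a CommGroup is listed, a group covering all replicas is constructed explicitly; the wrapper name exchangeSlices is illustrative.

#include <gcl/Collectives.hpp>
#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>

// data must have the replication factor as its first dimension. On
// replica r, the output at index s is the slice data[r] sent by replica s.
poplar::Tensor exchangeSlices(poplar::Graph &graph,
                              const poplar::Tensor &data,
                              poplar::program::Sequence &prog) {
  gcl::CommGroup all(gcl::CommGroupType::ALL, 0); // size ignored for ALL
  return gcl::allToAll(graph, data, prog, all, "exchange");
}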
struct CommGroup
- #include <Collectives.hpp>
Struct to specify sub-groups of replicas. Examples of derived sub-groups:
IPU-link domain sub-rack: type==CONSECUTIVE && replicaGroupSize==64/replica-size/N, where N is a power of two and replicaGroupSize > 1.
Complete IPU-link domain / full rack: type==CONSECUTIVE && replicaGroupSize==64/replica-size.
Using GW-links only: type==ORTHOGONAL && replicaGroupSize==64/replica-size.
Public Functions
CommGroup() = default
inline CommGroup(const CommGroupType &groupType, unsigned groupSize)
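A minimal sketch of restricting a collective to sub-groups; the group size of 4 and the wrapper name groupSum are illustrative.

#include <gcl/Collectives.hpp>
#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>

// With 16 replicas, CONSECUTIVE with size 4 forms the groups {0..3},
// {4..7}, {8..11}, {12..15}; each group computes its own independent sum.
poplar::Tensor groupSum(poplar::Graph &graph, const poplar::Tensor &grads,
                        poplar::program::Sequence &prog) {
  gcl::CommGroup group(gcl::CommGroupType::CONSECUTIVE, 4);
  return gcl::allReduce(graph, grads, popops::Operation::ADD, prog, group,
                        "groupSum");
}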