5. GCL API reference

The Graphcore Communication Library (GCL) provides application-level functions that can be used in Poplar programs for the IPU.

5.1. gcl/TileAllocation.hpp

namespace gcl

Functions

std::vector<unsigned> perIPUTiles(const poplar::Graph &graph, unsigned offset, unsigned count, bool sorted = true)

Return a list of tile ids that are optimal for GCL collective operations.

To use GCL collectives, the number of I/O tiles set in GCL_NUM_IO_TILES should be excluded from the compute tile graph using the offset parameter. GCL will then pick up the I/O tiles starting from tile 0.

Parameters
  • graph – The graph on which to allocate tiles.

  • offset – Number of tiles to skip; allocation starts from this offset.

  • count – Number of tile ids to return.

  • sorted – If true, the returned list of ids is sorted. This should normally be true, and is therefore the default.
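The intended split between I/O and compute tiles can be sketched in plain C++. This is a mock, not the real GCL function: perIPUTilesMock, the tile count of 1472, and the ascending ordering are illustrative assumptions (the real perIPUTiles may order tiles differently before optionally sorting).

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for gcl::perIPUTiles (illustrative only): return
// `count` tile ids starting at `offset`, in ascending order.
std::vector<unsigned> perIPUTilesMock(unsigned offset, unsigned count) {
  std::vector<unsigned> tiles(count);
  for (unsigned i = 0; i < count; ++i)
    tiles[i] = offset + i;
  return tiles;
}

// With GCL_NUM_IO_TILES=32, the compute graph excludes the first
// `numIoTiles` tiles; GCL then uses tiles [0, numIoTiles) for I/O.
std::vector<unsigned> computeTiles(unsigned tilesPerIpu, unsigned numIoTiles) {
  return perIPUTilesMock(numIoTiles, tilesPerIpu - numIoTiles);
}
```

For example, on a hypothetical 1472-tile IPU with 32 I/O tiles, computeTiles(1472, 32) yields the 1440 tile ids 32 through 1471.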

Variables

constexpr auto MIN_IO_TILES = 32

The minimum number of I/O tiles currently supported.

5.2. gcl/Collectives.hpp

namespace gcl

Enums

enum CommGroupType

Enum defining the communication group specification type. It is assumed that replica groups are uniform in size and layout across IPUs.

Values:

enumerator ALL

All replicas viewed as one group; the replica group size is ignored.

enumerator CONSECUTIVE

Groups are consecutive in replica index.

If there are N replicas denoted {0, …, N-1} and the group size is k, then the groups are: {0, 1, …, k-1}, {k, …, 2k-1}, …, {N-k, …, N-1}

enumerator ORTHOGONAL

Groups are sliced orthogonally to the replica ordering.

If there are N replicas denoted {0, …, N-1} and the group size is k, then there are m = N/k groups: {0, m, 2m, …}, {1, m+1, 2m+1, …}, …, {m-1, 2m-1, …, N-1}
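The two non-trivial grouping schemes can be enumerated in plain C++. This sketch is not part of GCL; it interprets "group size k" as k members per group (so the stride between members of an orthogonal group is the number of groups, N/k), and assumes N is a multiple of k.

```cpp
#include <cassert>
#include <vector>

// Replica ids in each CONSECUTIVE group: {0..k-1}, {k..2k-1}, ...
std::vector<std::vector<unsigned>> consecutiveGroups(unsigned N, unsigned k) {
  std::vector<std::vector<unsigned>> groups(N / k);
  for (unsigned r = 0; r < N; ++r)
    groups[r / k].push_back(r);
  return groups;
}

// Replica ids in each ORTHOGONAL group: {0, m, 2m, ...}, {1, m+1, ...}, ...
// where m = N/k is both the number of groups and the member stride.
std::vector<std::vector<unsigned>> orthogonalGroups(unsigned N, unsigned k) {
  const unsigned m = N / k;
  std::vector<std::vector<unsigned>> groups(m);
  for (unsigned r = 0; r < N; ++r)
    groups[r % m].push_back(r);
  return groups;
}
```

With N = 8 and k = 4, consecutiveGroups gives {0,1,2,3} and {4,5,6,7}, while orthogonalGroups gives {0,2,4,6} and {1,3,5,7}.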

Functions

poplar::Tensor allReduce(poplar::Graph &graph, const poplar::Tensor &data, popops::Operation op, poplar::program::Sequence &prog, const CommGroup &group, const std::string &debugPrefix = "", const poplar::OptionFlags &options = {})

Perform an all-reduce operation.

The operation is performed on the provided tensor over replicas as specified by the group argument. This operation reduces across the tensors that the replicated tensor is a handle for. The result is returned as a replicated tensor.

Parameters
  • graph – The replicated graph the input tensor belongs to.

  • data – The replicated tensor to reduce.

  • op – The reduction operator (for example, popops::Operation::ADD).

  • prog – The program sequence to add operations to.

  • group – The subset of replicas for the collective operation.

  • debugPrefix – String used as a prefix for compute sets.

  • options – Collective options.

Returns

A replicated tensor with the reduction of data.
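The all-reduce semantics described above can be illustrated with a standard-C++ simulation (not GCL itself): every replica holds a tensor of the same shape, and after the operation every replica holds the element-wise reduction across all replicas.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Illustrative stand-in for the all-reduce semantics: perReplica[r] is
// replica r's input tensor; the result returned to every replica is the
// element-wise reduction of all replicas' inputs under `op`.
std::vector<std::vector<float>>
allReduceSim(const std::vector<std::vector<float>> &perReplica,
             const std::function<float(float, float)> &op) {
  std::vector<float> reduced = perReplica.at(0);
  for (size_t r = 1; r < perReplica.size(); ++r)
    for (size_t i = 0; i < reduced.size(); ++i)
      reduced[i] = op(reduced[i], perReplica[r][i]);
  // Each replica receives a copy of the reduced tensor.
  return std::vector<std::vector<float>>(perReplica.size(), reduced);
}
```

For example, with inputs {1, 2} on replica 0 and {3, 4} on replica 1 and an ADD operator, both replicas end up holding {4, 6}.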

poplar::Tensor allReduce(poplar::Graph &graph, const poplar::Tensor &data, popops::Operation op, poplar::program::Sequence &prog, const std::string &debugPrefix = "", const poplar::OptionFlags &options = {})

As allReduce() without the group arg (for all replicas).

void allReduceToDestination(poplar::Graph &graph, const poplar::Tensor &data, poplar::Tensor &destination, popops::Operation op, poplar::program::Sequence &prog, const CommGroup &group, const std::string &debugPrefix = "", const poplar::OptionFlags &options = {})

As allReduce() but writes the result to the destination tensor.

void allReduceToDestination(poplar::Graph &graph, const poplar::Tensor &data, poplar::Tensor &destination, popops::Operation op, poplar::program::Sequence &prog, const std::string &debugPrefix = "", const poplar::OptionFlags &options = {})

As allReduceToDestination() without the group arg (for all replicas).

void allReduceInPlace(poplar::Graph &graph, poplar::Tensor &data, popops::Operation op, poplar::program::Sequence &prog, const CommGroup &group, const std::string &debugPrefix = "", const poplar::OptionFlags &options = {})

As allReduce() but writes result back to the input data tensor.

void allReduceInPlace(poplar::Graph &graph, poplar::Tensor &data, popops::Operation op, poplar::program::Sequence &prog, const std::string &debugPrefix = "", const poplar::OptionFlags &options = {})

As allReduceInPlace() without group arg (for all replicas).

poplar::Tensor reduceScatter(poplar::Graph &graph, const poplar::Tensor &data, popops::Operation op, poplar::program::Sequence &prog, const CommGroup &group, const std::string &debugPrefix = "", const poplar::OptionFlags &options = {})

Reduce the replicated rank-1 tensor data, with the result scattered across the replicas.

For an input of shape [numElements] mapped to a single IPU per replica, the output will have shape [ceil(numElements / replicationFactor)]. If replicationFactor does not evenly divide numElements, the result is zero-padded. For instance:

  • Before:

    • Replica0: toReduce[x0, y0, z0]

    • Replica1: toReduce[x1, y1, z1]

  • After:

    • Replica0: result[op(x0, x1), op(y0, y1)]

    • Replica1: result[op(z0, z1), 0]

For an input of shape [numElementsIPU0 + numElementsIPU1 + …] mapped to multiple IPUs per replica, the output will have shape: [ceil(numElementsIPU0 / replicationFactor) + ceil(numElementsIPU1 / replicationFactor) + …] with the result grouped per IPU. If replicationFactor does not evenly divide the number of elements on an IPU, the result is zero-padded per IPU. For instance:

  • Before:

    • Replica0: toReduce[x0, y0, z0, w0]

    • Replica1: toReduce[x1, y1, z1, w1]

    • Replica2: toReduce[x2, y2, z2, w2]

    • Replica3: toReduce[x3, y3, z3, w3]

    • Mapping: toReduce[IPU0, IPU0, IPU0, IPU1]

  • After:

    • Replica0: result[op(x0, x1, x2, x3), op(w0, w1, w2, w3)]

    • Replica1: result[op(y0, y1, y2, y3), 0]

    • Replica2: result[op(z0, z1, z2, z3), 0]

    • Replica3: result[0, 0]

    • Mapping: result[IPU0, IPU1]
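The per-IPU chunking and zero-padding described above can be simulated in standard C++ (a sketch, not GCL itself; the function name and signature are illustrative). Each IPU's segment of the input is reduced across replicas, split into ceil(n / R) element chunks per replica, and zero-padded where the replication factor R does not divide the segment length n.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Illustrative stand-in for the reduce-scatter layout: perReplica[r] is
// replica r's input, laid out IPU by IPU as described by elemsPerIpu.
// Returns each replica's scattered output, zero-padded per IPU.
std::vector<std::vector<float>>
reduceScatterSim(const std::vector<std::vector<float>> &perReplica,
                 const std::vector<size_t> &elemsPerIpu,
                 const std::function<float(float, float)> &op) {
  const size_t R = perReplica.size();
  std::vector<std::vector<float>> out(R);
  size_t base = 0; // start of the current IPU's segment in the input
  for (size_t n : elemsPerIpu) {
    const size_t chunk = (n + R - 1) / R; // ceil(n / R)
    for (size_t r = 0; r < R; ++r) {
      for (size_t c = 0; c < chunk; ++c) {
        const size_t idx = r * chunk + c;
        if (idx >= n) {
          out[r].push_back(0.0f); // zero padding
          continue;
        }
        float acc = perReplica[0][base + idx];
        for (size_t q = 1; q < R; ++q)
          acc = op(acc, perReplica[q][base + idx]);
        out[r].push_back(acc);
      }
    }
    base += n;
  }
  return out;
}
```

Running this with four replicas, three elements mapped to IPU0 and one to IPU1, reproduces the layout in the example: replica 0 receives the reductions of the first and fourth elements, replicas 1 and 2 the second and third (padded with a zero), and replica 3 only zeros.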

Parameters
  • graph – The replicated graph the input tensor belongs to.

  • data – The replicated tensor to reduce scatter.

  • op – The reduction operator (for example, popops::Operation::ADD).

  • prog – The program sequence to add operations to.

  • group – The subset of replicas for the collective operation.

  • debugPrefix – String used as a prefix for compute sets.

  • options – Collective options.

Returns

The output tensor, with the content described above.

poplar::Tensor reduceScatter(poplar::Graph &graph, const poplar::Tensor &data, popops::Operation op, poplar::program::Sequence &prog, const std::string &debugPrefix = "", const poplar::OptionFlags &options = {})

As reduceScatter() without group arg (for all replicas).

poplar::Tensor allGather(poplar::Graph &graph, const poplar::Tensor &data, poplar::program::Sequence &prog, const CommGroup &group, const std::string &debugPrefix = "", const poplar::OptionFlags &options = {})

Gather the replicated tensor data and return the result, so that each replica has a copy of every replica’s data tensor.

For instance:

  • Before:

    • Replica0: toGather[x,y]

    • Replica1: toGather[z,w]

    • Replica2: toGather[x1, y1]

  • After allGather:

    • Replica0: result[x,y,z,w,x1,y1]

    • Replica1: result[x,y,z,w,x1,y1]

    • Replica2: result[x,y,z,w,x1,y1]

For an input of shape [incomingShape], the output will have shape [replicationFactor][incomingShape].
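The gather semantics can be illustrated with a standard-C++ simulation (a sketch, not GCL itself): every replica ends up with the concatenation of all replicas' inputs in replica order, matching the [replicationFactor][incomingShape] output shape.

```cpp
#include <cassert>
#include <vector>

// Illustrative stand-in for the all-gather semantics: perReplica[r] is
// replica r's input; every replica receives the concatenation of all
// inputs in replica order.
std::vector<std::vector<float>>
allGatherSim(const std::vector<std::vector<float>> &perReplica) {
  std::vector<float> gathered;
  for (const auto &t : perReplica)
    gathered.insert(gathered.end(), t.begin(), t.end());
  return std::vector<std::vector<float>>(perReplica.size(), gathered);
}
```

With inputs {x, y}, {z, w} and {x1, y1} on replicas 0, 1 and 2, each replica receives {x, y, z, w, x1, y1}, as in the example above.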

Parameters
  • graph – The replicated graph the input tensor belongs to.

  • data – The replicated tensor to gather.

  • prog – The program sequence to add operations to.

  • group – The subset of replicas for the collective operation.

  • debugPrefix – String used as a prefix for compute sets.

  • options – Collective options.

Returns

The output tensor, with the content described above.

poplar::Tensor allGather(poplar::Graph &graph, const poplar::Tensor &data, poplar::program::Sequence &prog, const std::string &debugPrefix = "", const poplar::OptionFlags &options = {})

As allGather() without group arg (for all replicas).

poplar::Tensor allToAll(poplar::Graph &graph, const poplar::Tensor &data, poplar::program::Sequence &prog, const CommGroup &group, const std::string &debugPrefix = "", const poplar::OptionFlags &options = {})

Perform an all-to-all exchange of the elements of the input tensor based on replica ID.

The shape of the input must have the number of replicas in the graph as its first or only dimension. That dimension is used to split up the tensor being sent, with each replica sending all splits except for the split whose index matches its own replica ID. That is, replica 2 will not send input[2], and so on.

The replica receiving the slice will copy that incoming slice into the output at the index which matches the replica ID of the replica which sent it. For instance:

  • Input tensor:

    • Replica0: Tensor T[x0,x1,x2]

    • Replica1: Tensor T[y0,y1,y2]

    • Replica2: Tensor T[z0,z1,z2]

  • Output tensor:

    • Replica0: Tensor T[x0,y0,z0]

    • Replica1: Tensor T[x1,y1,z1]

    • Replica2: Tensor T[x2,y2,z2]
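The exchange pattern amounts to a transpose across replicas, which can be simulated in standard C++ (a sketch, not GCL itself). For simplicity it assumes each of the R slices holds a single element, as in the example above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative stand-in for the all-to-all exchange: perReplica[s] is
// replica s's input, whose first dimension equals the replica count R.
// After the exchange, out[r][s] holds the slice that replica s held at
// index r, i.e. the exchange transposes the (replica, slice) indices.
std::vector<std::vector<float>>
allToAllSim(const std::vector<std::vector<float>> &perReplica) {
  const size_t R = perReplica.size();
  std::vector<std::vector<float>> out(R, std::vector<float>(R));
  for (size_t r = 0; r < R; ++r)
    for (size_t s = 0; s < R; ++s)
      out[r][s] = perReplica[s][r]; // replica s's slice r goes to replica r
  return out;
}
```

With inputs {x0, x1, x2}, {y0, y1, y2} and {z0, z1, z2}, replica 0 ends up with {x0, y0, z0}, replica 1 with {x1, y1, z1}, and replica 2 with {x2, y2, z2}.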

Parameters
  • graph – The replicated graph the input tensor belongs to.

  • data – The replicated tensor to exchange.

  • prog – The program sequence to add operations to.

  • group – The subset of replicas for the collective operation.

  • debugPrefix – String used as a prefix for compute sets.

  • options – Collective options.

Returns

The output tensor, with the content described above.

poplar::Tensor allToAll(poplar::Graph &graph, const poplar::Tensor &data, poplar::program::Sequence &prog, const std::string &debugPrefix = "", const poplar::OptionFlags &options = {})

As allToAll() without group arg (for all replicas).

struct CommGroup
#include <Collectives.hpp>

Struct to specify sub-groups of replicas.

Examples of derived sub-groups:

  • IPU-link domain sub-rack: type==CONSECUTIVE && replicaGroupSize==64/replica-size/N, where N is a power of two and replicaGroupSize > 1.

  • Complete IPU-link domain / full rack: type==CONSECUTIVE && replicaGroupSize==64/replica-size

  • Using GW-links only: type==ORTHOGONAL && replicaGroupSize==64/replica-size

Public Functions

CommGroup() = default
inline CommGroup(const CommGroupType &groupType, unsigned groupSize)

Public Members

CommGroupType type = CommGroupType::ALL
unsigned replicaGroupSize = 0
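The derived sub-groups listed above can be sketched in plain C++. The enum and struct below mirror the declarations in this file rather than including the real header, and the helper names, the 64-IPU domain size, and the replica sizes are illustrative assumptions.

```cpp
#include <cassert>

// Mirrors the declarations above (illustrative; not the real header).
enum class CommGroupType { ALL, CONSECUTIVE, ORTHOGONAL };

struct CommGroup {
  CommGroupType type = CommGroupType::ALL;
  unsigned replicaGroupSize = 0;
};

// Complete IPU-link domain / full rack: assuming 64 IPUs per IPU-link
// domain, a replica of `ipusPerReplica` IPUs gives a group spanning
// 64 / ipusPerReplica consecutive replicas.
CommGroup fullRack(unsigned ipusPerReplica) {
  return {CommGroupType::CONSECUTIVE, 64u / ipusPerReplica};
}

// Using GW-links only: same group size, but sliced orthogonally to the
// replica ordering.
CommGroup gwLinksOnly(unsigned ipusPerReplica) {
  return {CommGroupType::ORTHOGONAL, 64u / ipusPerReplica};
}
```

For instance, with 4 IPUs per replica a full-rack group covers 64 / 4 = 16 consecutive replicas.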