Poplar and PopLibs
gcl Namespace Reference

Graphcore Communications Library.

Classes

struct  Chunk
 Represents a section of a tensor mapped to an IPU.
 
struct  Chunks
 A vector of Chunk data.
 
class  CollectiveBalancedHostRearrangement
 This class contains functions and data necessary to rearrange tensors on the host side at runtime.
 
class  CollectiveBalancedReorder
 Helper class to reorder a tensor in a per-tile-balanced fashion such that each replica obtains (for inputs to AllGather or outputs of ReduceScatter) an equally sized 1D tensor with equally sized regions.
 
struct  CommGroup
 Struct to specify sub-groups of replicas.
 

Enumerations

enum class  CommGroupType { ALL , CONSECUTIVE , ORTHOGONAL }
 Enum to define communication group specification type.
 
enum class  CollectiveOperator { ADD , MEAN , MUL , MIN , MAX , LOGICAL_AND , LOGICAL_OR , SQUARE_ADD , LOCAL }
 Supported collective operators.
 

Functions

std::istream & operator>> (std::istream &is, CollectiveOperator &op)
 Parse a token from input stream is into op.
 
std::ostream & operator<< (std::ostream &os, const CollectiveOperator &op)
 Write op to output stream os.
 
unsigned getMinIoTiles (const poplar::Graph &graph)
 Get the minimum number of IO tiles required for GCL operations.
 
std::vector< unsigned > perIPUTiles (const poplar::Graph &graph, unsigned offset, unsigned count, bool sorted=true, bool tilePairs=true)
 Return a list of tile IDs optimal for GCL collective operations.
 
CrossReplica functions

Collective operations working across replicas.

poplar::Tensor allReduceCrossReplica (poplar::Graph &graph, const poplar::Tensor &data, CollectiveOperator op, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 Perform an all-reduce operation.
 
std::vector< poplar::Tensor > allReduceCrossReplica (poplar::Graph &graph, const std::vector< poplar::Tensor > &datas, CollectiveOperator op, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 Perform an all-reduce operation on multiple tensors.
 
poplar::Tensor allReduceCrossReplica (poplar::Graph &graph, const poplar::Tensor &data, CollectiveOperator op, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 As allReduceCrossReplica() without the group arg (for all replicas).
 
std::vector< poplar::Tensor > allReduceCrossReplica (poplar::Graph &graph, const std::vector< poplar::Tensor > &datas, CollectiveOperator op, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 As allReduceCrossReplica() with multiple input tensors and without the group arg.
 
void allReduceToDestinationCrossReplica (poplar::Graph &graph, const poplar::Tensor &data, poplar::Tensor &destination, CollectiveOperator op, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 As allReduceCrossReplica() but writes the result to the destination tensor.
 
void allReduceToDestinationCrossReplica (poplar::Graph &graph, const std::vector< poplar::Tensor > &datas, const std::vector< poplar::Tensor > &destinations, CollectiveOperator op, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 As allReduceToDestinationCrossReplica() with multiple input and output tensors.
 
void allReduceToDestinationCrossReplica (poplar::Graph &graph, const poplar::Tensor &data, poplar::Tensor &destination, CollectiveOperator op, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 As allReduceToDestinationCrossReplica() without group arg.
 
void allReduceInPlaceCrossReplica (poplar::Graph &graph, poplar::Tensor &data, CollectiveOperator op, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 As allReduceCrossReplica() but writes result back to the input data tensor.
 
void allReduceInPlaceCrossReplica (poplar::Graph &graph, std::vector< poplar::Tensor > &datas, CollectiveOperator op, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 Perform an all-reduce operation on multiple tensors, writing the results back to the input datas tensors.
 
void allReduceInPlaceCrossReplica (poplar::Graph &graph, poplar::Tensor &data, CollectiveOperator op, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 As allReduceInPlaceCrossReplica() without group arg.
 
void allReduceInPlaceCrossReplica (poplar::Graph &graph, std::vector< poplar::Tensor > &datas, CollectiveOperator op, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 As allReduceInPlaceCrossReplica() with multiple input tensors and without group arg.
 
poplar::Tensor reduceScatterCrossReplica (poplar::Graph &graph, const poplar::Tensor &data, CollectiveOperator op, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 Reduce the replicated rank-1 tensor data with the result scattered across the replicas.
 
std::vector< poplar::Tensor > reduceScatterCrossReplica (poplar::Graph &graph, const std::vector< poplar::Tensor > &datas, CollectiveOperator op, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 As reduceScatterCrossReplica() but with vector input argument and vector output as return value.
 
void reduceScatterToDestinationCrossReplica (poplar::Graph &graph, const std::vector< poplar::Tensor > &datas, const std::vector< poplar::Tensor > &destinations, CollectiveOperator op, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 As reduceScatterCrossReplica() but with vector input/output arguments.
 
poplar::Tensor reduceScatterCrossReplica (poplar::Graph &graph, const poplar::Tensor &data, CollectiveOperator op, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 As reduceScatterCrossReplica() without group arg.
 
poplar::Tensor allGatherCrossReplica (poplar::Graph &graph, const poplar::Tensor &data, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 Gather the replicated tensor data.
 
std::vector< poplar::Tensor > allGatherCrossReplica (poplar::Graph &graph, const std::vector< poplar::Tensor > &datas, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 As allGatherCrossReplica() but with vector input argument and vector output as return value.
 
void allGatherToDestinationCrossReplica (poplar::Graph &graph, const std::vector< poplar::Tensor > &datas, const std::vector< poplar::Tensor > &destinations, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 As allGatherCrossReplica() but with vector input/output arguments.
 
poplar::Tensor allGatherCrossReplica (poplar::Graph &graph, const poplar::Tensor &data, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 As allGatherCrossReplica() without group arg.
 
poplar::Tensor allToAllCrossReplica (poplar::Graph &graph, const poplar::Tensor &data, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 Perform an all-to-all exchange of the elements of the input tensor based on replica ID.
 
poplar::Tensor allToAllCrossReplica (poplar::Graph &graph, const poplar::Tensor &data, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 As allToAllCrossReplica() without group arg.
 
poplar::Tensor broadcastCrossReplica (poplar::Graph &graph, const poplar::Tensor &data, poplar::program::Sequence &prog, const CommGroup &group={}, unsigned rootReplica=0, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 Perform a broadcast from one replica to all other replicas.
 
WithinReplica functions

Collective operations working within replicas.

poplar::Tensor concatChunks (const Chunks &chunks)
 Concatenates chunks.
 
Chunks reduceScatterWithinReplica (poplar::Graph &graph, const poplar::Tensor &toReduce, CollectiveOperator op, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 Reduce a rank 2 tensor.
 
poplar::Tensor allGatherWithinReplica (poplar::Graph &graph, const Chunks &toGather, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 Broadcast data distributed over all IPUs.
 
poplar::Tensor allReduceWithinReplica (poplar::Graph &graph, const poplar::Tensor &toReduce, CollectiveOperator op, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 Perform an all-reduce operation on the specified tensor.
 

Detailed Description

Graphcore Communications Library.

Enumeration Type Documentation

◆ CollectiveOperator

enum class gcl::CollectiveOperator

Supported collective operators.

Enumerator
LOGICAL_AND 

Only supports boolean operands.

LOGICAL_OR 

Only supports boolean operands.

SQUARE_ADD 

Squares each element before applying ADD reduction.

LOCAL 

Do nothing and keep the local value.

◆ CommGroupType

enum class gcl::CommGroupType

Enum to define communication group specification type.

Assumption: replica groups are uniform in size and layout on IPUs.

Enumerator
ALL 

All replicas viewed as one group.

CONSECUTIVE 

Groups are consecutive in replica index.

If there are N replicas denoted {0, ... N-1} and the group size is k, then there are N/k groups of size k: {0, 1, ... k-1}, {k, ... 2k-1} ... {N-k, ... N-1}

ORTHOGONAL 

Groups are sliced orthogonal to the replica ordering.

If there are N replicas denoted {0, ... N-1} and group size is k, then there are m = N/k groups of size k: {0, m, 2m, ...}, {1, m+1, 2m+1, ...} ... {m-1, 2m-1, ... N-1}
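
For illustration, the sketch below builds a CommGroup for each mode with N = 8 replicas and group size k = 4. It is a hedged sketch: this page does not show CommGroup's constructor, so the (type, groupSize) constructor and the gcl/Collectives.hpp header name are assumptions.

    #include <gcl/Collectives.hpp> // assumed header for CommGroup/CommGroupType

    int main() {
      // With N = 8 replicas and group size k = 4:
      // CONSECUTIVE -> N/k = 2 groups: {0,1,2,3} and {4,5,6,7}.
      gcl::CommGroup consecutive(gcl::CommGroupType::CONSECUTIVE, 4);
      // ORTHOGONAL -> m = N/k = 2 groups: {0,2,4,6} and {1,3,5,7}.
      gcl::CommGroup orthogonal(gcl::CommGroupType::ORTHOGONAL, 4);
      // ALL -> one group {0,...,7}; the size argument is assumed unused.
      gcl::CommGroup all(gcl::CommGroupType::ALL, 0);
      return 0;
    }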

Function Documentation

◆ allGatherCrossReplica() [1/3]

poplar::Tensor gcl::allGatherCrossReplica ( poplar::Graph &graph,
const poplar::Tensor &data,
poplar::program::Sequence &prog,
const CommGroup &group,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {} 
)

Gather the replicated tensor data.

The result is returned so that each replica holds a copy of every replica's data tensor. For instance:

Before:

Replica0: data[s,t]
Replica1: data[u,v]
Replica2: data[w,x]
Replica3: data[y,z]

After:

Replica0: result[[s,t], [u,v], [w,x], [y,z]]
Replica1: result[[s,t], [u,v], [w,x], [y,z]]
Replica2: result[[s,t], [u,v], [w,x], [y,z]]
Replica3: result[[s,t], [u,v], [w,x], [y,z]]

For an input of shape [incomingShape] the output will be [replicationFactor][incomingShape].

Parameters
graph: The replicated graph the input tensor belongs to.
data: The replicated tensor to gather.
prog: The program sequence to add operations to.
group: The subset of replicas for the collective operation.
debugContext: Optional debug context.
options: See OptionFlags.
Returns
The output tensor, with the content described above.
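
For illustration, a minimal caller sketch, assuming an existing replicated graph and program sequence. The gcl/Collectives.hpp header name and the CommGroup (type, size) constructor are assumptions not shown on this page.

    #include <gcl/Collectives.hpp> // assumed header name
    #include <poplar/Graph.hpp>
    #include <poplar/Program.hpp>

    // Gather a per-replica tensor of shape [2] so that every replica ends up
    // with a [replicationFactor][2] copy of all the data.
    poplar::Tensor gatherExample(poplar::Graph &graph,
                                 const poplar::Tensor &data, // shape [2]
                                 poplar::program::Sequence &prog) {
      gcl::CommGroup all(gcl::CommGroupType::ALL, 0); // assumed constructor
      return gcl::allGatherCrossReplica(graph, data, prog, all,
                                        {"allGatherExample"});
    }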

◆ allGatherCrossReplica() [2/3]

poplar::Tensor gcl::allGatherCrossReplica ( poplar::Graph &graph,
const poplar::Tensor &data,
poplar::program::Sequence &prog,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {} 
)

As allGatherCrossReplica() without group arg.

Parameters
graph: The replicated graph the input tensor belongs to.
data: The replicated tensor to gather.
prog: The program sequence to add operations to.
debugContext: Optional debug context.
options: See OptionFlags.
Returns
The output tensor, with the content described above.

◆ allGatherCrossReplica() [3/3]

std::vector< poplar::Tensor > gcl::allGatherCrossReplica ( poplar::Graph &graph,
const std::vector< poplar::Tensor > &datas,
poplar::program::Sequence &prog,
const CommGroup &group,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {} 
)

As allGatherCrossReplica() but with vector input argument and vector output as return value.

Parameters
graph: The replicated graph the input tensors belong to.
datas: The replicated tensors to gather.
prog: The program sequence to add operations to.
group: The subset of replicas for the collective operation.
debugContext: Optional debug context.
options: See OptionFlags.
Returns
The output tensors, with the content described above.

◆ allGatherToDestinationCrossReplica()

void gcl::allGatherToDestinationCrossReplica ( poplar::Graph &graph,
const std::vector< poplar::Tensor > &datas,
const std::vector< poplar::Tensor > &destinations,
poplar::program::Sequence &prog,
const CommGroup &group,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {} 
)

As allGatherCrossReplica() but with vector input/output arguments.

Note
It's important that the destination tensors are mapped to IPUs in the same way as the data tensors.
Parameters
graph: The replicated graph the input tensors belong to.
datas: The replicated tensors to gather.
destinations: Output tensors, which must have the correct type/shape.
prog: The program sequence to add operations to.
group: The subset of replicas for the collective operation.
debugContext: Optional debug context.
options: See OptionFlags.

◆ allGatherWithinReplica()

poplar::Tensor gcl::allGatherWithinReplica ( poplar::Graph &graph,
const Chunks &toGather,
poplar::program::Sequence &prog,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {} 
)

Broadcast data distributed over all IPUs.

This function assumes chunk i is mapped to IPU i.

Before:

Chunks = [
           [ ], // IPU0 (index=0, offset=0)
           [z], // IPU1 (index=3, offset=0)
           [x], // IPU2 (index=1, offset=0)
           [y]  // IPU3 (index=2, offset=0)
         ]

After:

result = [
           [x,y,z], // IPU0
           [x,y,z], // IPU1
           [x,y,z], // IPU2
           [x,y,z]  // IPU3
         ]
Note
Multi-IPU ranks (more than one IPU per rank) are not yet supported.
Parameters
graph: The graph.
toGather: The chunks to gather.
prog: The program sequence to add operations to.
debugContext: Optional debug information.
options: See OptionFlags.
Returns
A 2D tensor that contains a copy of the data for each IPU. Index i in the outermost dimension of the result is mapped to IPU i.

◆ allReduceCrossReplica() [1/4]

poplar::Tensor gcl::allReduceCrossReplica ( poplar::Graph &graph,
const poplar::Tensor &data,
CollectiveOperator op,
poplar::program::Sequence &prog,
const CommGroup &group,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {} 
)

Perform an all-reduce operation.

The operation is performed on the provided tensor over the replicas specified by the group argument. The reduction is applied across the per-replica tensors that the replicated tensor is a handle for. The result is returned as a replicated tensor of the same shape as the input, where every replica's output tensor contains the same data. For instance:

Before:

Replica0: data[x0,y0]
Replica1: data[x1,y1]
Replica2: data[x2,y2]
Replica3: data[x3,y3]

After:

Replica0: result[op(x0,x1,x2,x3), op(y0,y1,y2,y3)]
Replica1: result[op(x0,x1,x2,x3), op(y0,y1,y2,y3)]
Replica2: result[op(x0,x1,x2,x3), op(y0,y1,y2,y3)]
Replica3: result[op(x0,x1,x2,x3), op(y0,y1,y2,y3)]
Parameters
graph: The replicated graph the input tensor belongs to.
data: The replicated tensor to reduce.
op: The reduction operator (for example, gcl::CollectiveOperator::ADD).
prog: The program sequence to add operations to.
group: The subset of replicas for the collective operation.
debugContext: Optional debug context.
options: See OptionFlags.
Returns
A replicated tensor with the reduction of data.
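
For illustration, a hedged caller sketch (the gcl/Collectives.hpp header name and the CommGroup (type, size) constructor are assumptions, as above):

    #include <gcl/Collectives.hpp> // assumed header name
    #include <poplar/Graph.hpp>
    #include <poplar/Program.hpp>

    // ADD-reduce a gradient tensor within consecutive groups of two replicas,
    // so replicas {0,1} share one result and {2,3} another.
    poplar::Tensor reduceGrads(poplar::Graph &graph,
                               const poplar::Tensor &grads,
                               poplar::program::Sequence &prog) {
      gcl::CommGroup pairs(gcl::CommGroupType::CONSECUTIVE, 2); // assumed ctor
      return gcl::allReduceCrossReplica(graph, grads,
                                        gcl::CollectiveOperator::ADD, prog,
                                        pairs, {"allReduceExample"});
    }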

◆ allReduceCrossReplica() [2/4]

poplar::Tensor gcl::allReduceCrossReplica ( poplar::Graph &graph,
const poplar::Tensor &data,
CollectiveOperator op,
poplar::program::Sequence &prog,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {} 
)

As allReduceCrossReplica() without the group arg (for all replicas).

Parameters
graph: The replicated graph the input tensor belongs to.
data: The replicated tensor to reduce.
op: The reduction operator (for example, gcl::CollectiveOperator::ADD).
prog: The program sequence to add operations to.
debugContext: Optional debug context.
options: See OptionFlags.
Returns
A replicated tensor with the reduction of data.

◆ allReduceCrossReplica() [3/4]

std::vector< poplar::Tensor > gcl::allReduceCrossReplica ( poplar::Graph &graph,
const std::vector< poplar::Tensor > &datas,
CollectiveOperator op,
poplar::program::Sequence &prog,
const CommGroup &group,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {} 
)

Perform an all-reduce operation on multiple tensors.

As allReduceCrossReplica(), but batches up multiple tensors to be executed as a single collective operation. This gives a performance improvement over sequentially reducing one tensor per operation: for short tensors, the latency can potentially be reduced to 1/(number of tensors) of the sequential cost.

Parameters
graph: The replicated graph the input tensors belong to.
datas: The vector of replicated tensors to reduce.
op: The reduction operator (for example, gcl::CollectiveOperator::ADD).
prog: The program sequence to add operations to.
group: The subset of replicas for the collective operation.
debugContext: Optional debug context.
options: See OptionFlags.
Returns
A vector of replicated tensors, each containing the reduction of the corresponding tensor in datas across all the replicas in the group.
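
A hedged sketch of the batched form (same assumptions as the earlier sketches): several tensors are reduced in one collective call rather than one call each.

    #include <gcl/Collectives.hpp> // assumed header name
    #include <poplar/Graph.hpp>
    #include <poplar/Program.hpp>
    #include <vector>

    // Batch several gradient tensors into a single all-reduce.
    std::vector<poplar::Tensor> reduceAll(poplar::Graph &graph,
                                          const std::vector<poplar::Tensor> &grads,
                                          poplar::program::Sequence &prog) {
      gcl::CommGroup all(gcl::CommGroupType::ALL, 0); // assumed constructor
      return gcl::allReduceCrossReplica(graph, grads,
                                        gcl::CollectiveOperator::ADD, prog,
                                        all, {"batchedAllReduce"});
    }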

◆ allReduceCrossReplica() [4/4]

std::vector< poplar::Tensor > gcl::allReduceCrossReplica ( poplar::Graph &graph,
const std::vector< poplar::Tensor > &datas,
CollectiveOperator op,
poplar::program::Sequence &prog,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {} 
)

As allReduceCrossReplica() with multiple input tensors and without the group arg.

Parameters
graph: The replicated graph the input tensors belong to.
datas: A vector of replicated tensors to reduce.
op: The reduction operator (for example, gcl::CollectiveOperator::ADD).
prog: The program sequence to add operations to.
debugContext: Optional debug context.
options: See OptionFlags.
Returns
A vector of replicated tensors, each containing the reduction of the corresponding tensor in datas across all replicas.

◆ allReduceInPlaceCrossReplica() [1/4]

void gcl::allReduceInPlaceCrossReplica ( poplar::Graph &graph,
poplar::Tensor &data,
CollectiveOperator op,
poplar::program::Sequence &prog,
const CommGroup &group,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {} 
)

As allReduceCrossReplica() but writes result back to the input data tensor.

Parameters
graph: The replicated graph the input tensor belongs to.
data: The replicated tensor to reduce.
op: The reduction operator (for example, gcl::CollectiveOperator::ADD).
prog: The program sequence to add operations to.
group: The subset of replicas for the collective operation.
debugContext: Optional debug context.
options: See OptionFlags.

◆ allReduceInPlaceCrossReplica() [2/4]

void gcl::allReduceInPlaceCrossReplica ( poplar::Graph &graph,
poplar::Tensor &data,
CollectiveOperator op,
poplar::program::Sequence &prog,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {} 
)

As allReduceInPlaceCrossReplica() without group arg.

Parameters
graph: The replicated graph the input tensor belongs to.
data: The replicated tensor to reduce.
op: The reduction operator (for example, gcl::CollectiveOperator::ADD).
prog: The program sequence to add operations to.
debugContext: Optional debug context.
options: See OptionFlags.

◆ allReduceInPlaceCrossReplica() [3/4]

void gcl::allReduceInPlaceCrossReplica ( poplar::Graph &graph,
std::vector< poplar::Tensor > &datas,
CollectiveOperator op,
poplar::program::Sequence &prog,
const CommGroup &group,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {} 
)

Perform an all-reduce operation on multiple tensors writing the results back to the input datas tensors.

As allReduceInPlaceCrossReplica(), but batches up multiple tensors to be executed as a single collective operation. This gives a performance improvement over sequentially reducing one tensor per operation: for short tensors, the latency can potentially be reduced to 1/(number of tensors) of the sequential cost.

Parameters
graph: The replicated graph the input tensors belong to.
datas: Vector of replicated tensors to reduce.
op: The reduction operator (for example, gcl::CollectiveOperator::ADD).
prog: The program sequence to add operations to.
group: The subset of replicas for the collective operation.
debugContext: Optional debug context.
options: See OptionFlags.

◆ allReduceInPlaceCrossReplica() [4/4]

void gcl::allReduceInPlaceCrossReplica ( poplar::Graph &graph,
std::vector< poplar::Tensor > &datas,
CollectiveOperator op,
poplar::program::Sequence &prog,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {} 
)

As allReduceInPlaceCrossReplica() with multiple input tensors and without group arg.

Parameters
graph: The replicated graph the input tensors belong to.
datas: Vector of replicated tensors to reduce.
op: The reduction operator (for example, gcl::CollectiveOperator::ADD).
prog: The program sequence to add operations to.
debugContext: Optional debug context.
options: See OptionFlags.

◆ allReduceToDestinationCrossReplica() [1/3]

void gcl::allReduceToDestinationCrossReplica ( poplar::Graph &graph,
const poplar::Tensor &data,
poplar::Tensor &destination,
CollectiveOperator op,
poplar::program::Sequence &prog,
const CommGroup &group,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {} 
)

As allReduceCrossReplica() but writes the result to the destination tensor.

Parameters
graph: The replicated graph the input tensor belongs to.
data: The replicated tensor to reduce.
destination: Tensor to write the result to.
op: The reduction operator (for example, gcl::CollectiveOperator::ADD).
prog: The program sequence to add operations to.
group: The subset of replicas for the collective operation.
debugContext: Optional debug context.
options: See OptionFlags.

◆ allReduceToDestinationCrossReplica() [2/3]

void gcl::allReduceToDestinationCrossReplica ( poplar::Graph &graph,
const poplar::Tensor &data,
poplar::Tensor &destination,
CollectiveOperator op,
poplar::program::Sequence &prog,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {} 
)

As allReduceToDestinationCrossReplica() without group arg.

Parameters
graph: The replicated graph the input tensor belongs to.
data: The replicated tensor to reduce.
destination: Tensor to write the result to.
op: The reduction operator (for example, gcl::CollectiveOperator::ADD).
prog: The program sequence to add operations to.
debugContext: Optional debug context.
options: See OptionFlags.

◆ allReduceToDestinationCrossReplica() [3/3]

void gcl::allReduceToDestinationCrossReplica ( poplar::Graph &graph,
const std::vector< poplar::Tensor > &datas,
const std::vector< poplar::Tensor > &destinations,
CollectiveOperator op,
poplar::program::Sequence &prog,
const CommGroup &group,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {} 
)

As allReduceToDestinationCrossReplica() with multiple input and output tensors.

Parameters
graph: The replicated graph the input tensors belong to.
datas: Vector of replicated tensors to reduce.
destinations: Vector of replicated tensors to write the result to.
op: The reduction operator (for example, gcl::CollectiveOperator::ADD).
prog: The program sequence to add operations to.
group: The subset of replicas for the collective operation.
debugContext: Optional debug context.
options: See OptionFlags.

◆ allReduceWithinReplica()

poplar::Tensor gcl::allReduceWithinReplica ( poplar::Graph &graph,
const poplar::Tensor &toReduce,
CollectiveOperator op,
poplar::program::Sequence &prog,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {} 
)

Perform an all-reduce operation on the specified tensor.

This operation reduces across the outermost dimension of the input and produces a tensor of the same shape, where the innermost dimension holds the result of the reduction and the outermost dimension holds copies of that result.

This function assumes index i in the outermost dimension of the input is mapped to IPU i. Index i in the outermost dimension of the result is mapped to IPU i.

Before:

toReduce = [
             [x0,y0], // IPU0
             [x1,y1], // IPU1
             [x2,y2], // IPU2
             [x3,y3], // IPU3
           ]

After:

result = [
           [op(x0,x1,x2,x3), op(y0,y1,y2,y3)], // IPU0
           [op(x0,x1,x2,x3), op(y0,y1,y2,y3)], // IPU1
           [op(x0,x1,x2,x3), op(y0,y1,y2,y3)], // IPU2
           [op(x0,x1,x2,x3), op(y0,y1,y2,y3)]  // IPU3
         ]
Parameters
graph: The graph.
toReduce: The tensor to reduce. Each partial should be mapped identically to the others across the IPUs within the rank.
op: The reduction operator (for example, gcl::CollectiveOperator::ADD).
prog: The program sequence to add operations to.
debugContext: Optional debug information.
options: See OptionFlags.
Returns
A tensor with the same shape as toReduce, where the innermost dimension is the result of the reduction and the outermost dimension has a number of copies of the result.

◆ allToAllCrossReplica() [1/2]

poplar::Tensor gcl::allToAllCrossReplica ( poplar::Graph &graph,
const poplar::Tensor &data,
poplar::program::Sequence &prog,
const CommGroup &group,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {} 
)

Perform an all-to-all exchange of the elements of the input tensor based on replica ID.

The shape of the input must have the number of replicas in the graph as its first or only dimension. That dimension is used to split up the tensor being sent: each replica sends every split except the one whose index matches its own replica ID. That is, replica 2 will not send input[2], and so on.

The replica receiving the slice will copy that incoming slice into the output at the index which matches the replica ID of the replica which sent it. For instance:

Before:

Replica0: data[x0,x1,x2]
Replica1: data[y0,y1,y2]
Replica2: data[z0,z1,z2]

After:

Replica0: result[x0,y0,z0]
Replica1: result[x1,y1,z1]
Replica2: result[x2,y2,z2]
Parameters
graph: The replicated graph the input tensor belongs to.
data: The replicated tensor to aggregate.
prog: The program sequence to add operations to.
group: The subset of replicas for the collective operation.
debugContext: Optional debug context.
options: See OptionFlags.
Returns
The output tensor, with the content described above.
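
For illustration, a hedged caller sketch (header name and CommGroup constructor assumed, as in the earlier sketches):

    #include <gcl/Collectives.hpp> // assumed header name
    #include <poplar/Graph.hpp>
    #include <poplar/Program.hpp>

    // With R replicas, `data` must have shape [R, ...]; afterwards, element j
    // of the output on replica r holds element r of the input from replica j.
    poplar::Tensor allToAllExample(poplar::Graph &graph,
                                   const poplar::Tensor &data,
                                   poplar::program::Sequence &prog) {
      gcl::CommGroup all(gcl::CommGroupType::ALL, 0); // assumed constructor
      return gcl::allToAllCrossReplica(graph, data, prog, all,
                                       {"allToAllExample"});
    }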

◆ allToAllCrossReplica() [2/2]

poplar::Tensor gcl::allToAllCrossReplica ( poplar::Graph &graph,
const poplar::Tensor &data,
poplar::program::Sequence &prog,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {} 
)

As allToAllCrossReplica() without group arg.

Parameters
graph: The replicated graph the input tensor belongs to.
data: The replicated tensor to aggregate.
prog: The program sequence to add operations to.
debugContext: Optional debug context.
options: See gcl::allReduceCrossReplica().
Returns
The output tensor, with the content described above.

◆ broadcastCrossReplica()

poplar::Tensor gcl::broadcastCrossReplica ( poplar::Graph &graph,
const poplar::Tensor &data,
poplar::program::Sequence &prog,
const CommGroup &group = {},
unsigned rootReplica = 0,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {} 
)

Perform a broadcast from one replica to all other replicas.

Before:

Replica0: data[x0,x1,x2] // <-- rootReplica
Replica1: data[y0,y1,y2]
Replica2: data[z0,z1,z2]

After:

Replica0: result[x0,x1,x2]
Replica1: result[x0,x1,x2]
Replica2: result[x0,x1,x2]
Parameters
graph: The replicated graph the input tensor belongs to.
data: The replicated tensor to broadcast.
prog: The program sequence to add operations to.
group: The subset of replicas for the collective operation.
rootReplica: The replica ID to use as the source for the broadcast.
debugContext: Optional debug context.
options: See gcl::allReduceCrossReplica().
Returns
The output tensor, with the content described above.
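
For illustration, a hedged caller sketch (header name assumed; the default-constructed group spans all replicas, per the signature above):

    #include <gcl/Collectives.hpp> // assumed header name
    #include <poplar/Graph.hpp>
    #include <poplar/Program.hpp>

    // Broadcast replica 0's copy of `data` to every replica.
    poplar::Tensor broadcastExample(poplar::Graph &graph,
                                    const poplar::Tensor &data,
                                    poplar::program::Sequence &prog) {
      return gcl::broadcastCrossReplica(graph, data, prog, {},
                                        /*rootReplica=*/0,
                                        {"broadcastExample"});
    }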

◆ concatChunks()

poplar::Tensor gcl::concatChunks ( const Chunks &chunks )

Concatenates chunks.

Given a vector of Chunk data, its elements are sorted according to their offset or index, and a tensor consisting of the sorted, concatenated Chunk elements is returned. This operation is performed on the output of the reduceScatterWithinReplica operation and on the input of the allGatherWithinReplica operation.

Parameters
chunks: A structure containing a vector of Chunk data.
Returns
A tensor consisting of the sorted, concatenated Chunk elements.

◆ getMinIoTiles()

unsigned gcl::getMinIoTiles ( const poplar::Graph &graph )

Get the minimum number of IO tiles required for GCL operations.

This is the minimum number of tiles you must ask for in perIPUTiles() for it to be a valid graph for GCL operations.

Parameters
graph: The graph to check. It assumes that all tiles in the graph will be used for IO operations.
Returns
The smallest number of IO tiles that must be allocated for GCL operations.

◆ operator<<()

std::ostream & gcl::operator<< ( std::ostream &os,
const CollectiveOperator &op 
)

Write op to output stream os.

The value written is the stringified enumeration, for example "ADD" or "MUL".

Parameters
os: ostream output destination.
op: gcl::CollectiveOperator to represent as a string.
Returns
The original output stream.

◆ operator>>()

std::istream & gcl::operator>> ( std::istream &is,
CollectiveOperator &op 
)

Parse a token from input stream is into op.

Valid input values are the stringified enumerations, for example "ADD" or "MUL".

Parameters
is: The stream to read from.
op: gcl::CollectiveOperator parsed from the input stream.
Returns
The original input stream.
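
Taken together, the two operators round-trip the enum through its string form. A small sketch (the gcl/Collectives.hpp header name is an assumption):

    #include <gcl/Collectives.hpp> // assumed header name
    #include <iostream>
    #include <sstream>

    int main() {
      gcl::CollectiveOperator op = gcl::CollectiveOperator::ADD;
      std::cout << op << '\n';      // prints "ADD"
      std::istringstream in("MUL");
      in >> op;                     // parses back into the enum
      std::cout << op << '\n';      // prints "MUL"
      return 0;
    }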

◆ perIPUTiles()

std::vector< unsigned > gcl::perIPUTiles ( const poplar::Graph &graph,
unsigned offset,
unsigned count,
bool sorted = true,
bool tilePairs = true 
)

Return a list of tile IDs optimal for GCL collective operations.

A set of tiles is chosen with an optimal distribution across exchange blocks and exchange-block contexts. (Exchange blocks provide the hardware interface to off-chip IO.)

Parameters
graph: The graph on which to allocate tiles.
offset: Skip a number of tiles and allocate from an offset. This is useful if you want to call the function multiple times, for example to get one set of tiles for IO and another set for compute. In that case, you could call it twice: first with an offset of zero to get the IO tiles, and then with the offset equal to the number of IO tiles in order to get the compute tiles.
count: Number of tile IDs to return.
sorted: If true, the function will sort the returned list of IDs. This is normally required and is the default.
tilePairs: Override the default behaviour and return tile pairs. Tiles used for host IO must be allocated in pairs.
Returns
A vector of tile IDs.
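
A hedged sketch of the two-call pattern described for offset, splitting each IPU's tiles into an IO set and a compute set. The gcl/TileAllocation.hpp header name is an assumption; getMinIoTiles() and perIPUTiles() are used as documented on this page.

    #include <gcl/TileAllocation.hpp> // assumed header name
    #include <poplar/Graph.hpp>
    #include <vector>

    void splitTiles(const poplar::Graph &graph) {
      // Smallest IO-tile count that is valid for GCL operations.
      const unsigned numIoTiles = gcl::getMinIoTiles(graph);
      // First the IO tiles, allocated from offset 0...
      std::vector<unsigned> ioTiles =
          gcl::perIPUTiles(graph, /*offset=*/0, /*count=*/numIoTiles);
      // ...then the remainder of each IPU for compute.
      const unsigned tilesPerIpu = graph.getTarget().getTilesPerIPU();
      std::vector<unsigned> computeTiles = gcl::perIPUTiles(
          graph, /*offset=*/numIoTiles, /*count=*/tilesPerIpu - numIoTiles);
      (void)ioTiles;
      (void)computeTiles;
    }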

◆ reduceScatterCrossReplica() [1/3]

poplar::Tensor gcl::reduceScatterCrossReplica ( poplar::Graph &graph,
const poplar::Tensor &data,
CollectiveOperator op,
poplar::program::Sequence &prog,
const CommGroup &group,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {} 
)

Reduce the replicated rank-1 tensor data with the result scattered across the replicas.

For an input of shape [numElements] mapped to a single IPU per replica, the output will have shape [ceil(numElements / replicationFactor)]. If replicationFactor does not evenly divide numElements, the result is zero-padded. For instance:

Before:

Replica0: toReduce[x0, y0, z0]
Replica1: toReduce[x1, y1, z1]

After:

Replica0: result[op(x0, x1), op(y0, y1)]
Replica1: result[op(z0, z1), 0]

Multi-IPU mapped input

For the syncful implementation, an input of shape [numElementsIPU0 + numElementsIPU1 + ...] mapped to multiple IPUs per replica produces an output of shape [ceil(numElementsIPU0 / replicationFactor) + ceil(numElementsIPU1 / replicationFactor) + ...], with the result grouped per IPU. If replicationFactor does not evenly divide the number of elements on an IPU, the result is zero-padded per IPU. For instance:

Before:

Replica0: toReduce[  x0,   y0,   z0,   w0]
Replica1: toReduce[  x1,   y1,   z1,   w1]
Replica2: toReduce[  x2,   y2,   z2,   w2]
Replica3: toReduce[  x3,   y3,   z3,   w3]
Mapping:  toReduce[IPU0, IPU0, IPU0, IPU1]

After:

Replica0: result[op(x0, x1, x2, x3), op(w0, w1, w2, w3)]
Replica1: result[op(y0, y1, y2, y3),                  0]
Replica2: result[op(z0, z1, z2, z3),                  0]
Replica3: result[                 0,                  0]
Mapping:  result[              IPU0,               IPU1]
Note
Only flat input tensors are currently supported.
Parameters
graph: The replicated graph the input tensor belongs to.
data: The replicated tensor to reduce scatter.
op: The reduction operator (for example, gcl::CollectiveOperator::ADD).
prog: The program sequence to add operations to.
group: The subset of replicas for the collective operation.
debugContext: Optional debug context.
options: See OptionFlags.
Returns
The output tensor, with the content described above.
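
For illustration, a hedged caller sketch (header name and CommGroup constructor assumed, as in the earlier sketches):

    #include <gcl/Collectives.hpp> // assumed header name
    #include <poplar/Graph.hpp>
    #include <poplar/Program.hpp>

    // Each replica contributes a flat tensor of N elements and receives its
    // own shard of ceil(N / replicationFactor) reduced elements.
    poplar::Tensor reduceScatterExample(poplar::Graph &graph,
                                        const poplar::Tensor &flatData,
                                        poplar::program::Sequence &prog) {
      gcl::CommGroup all(gcl::CommGroupType::ALL, 0); // assumed constructor
      return gcl::reduceScatterCrossReplica(graph, flatData,
                                            gcl::CollectiveOperator::ADD,
                                            prog, all,
                                            {"reduceScatterExample"});
    }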

◆ reduceScatterCrossReplica() [2/3]

poplar::Tensor gcl::reduceScatterCrossReplica ( poplar::Graph &graph,
const poplar::Tensor &data,
CollectiveOperator op,
poplar::program::Sequence &prog,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {} 
)

As reduceScatterCrossReplica() without group arg.

Parameters
graph: The replicated graph the input tensor belongs to.
data: The replicated tensor to reduce scatter.
op: The reduction operator (for example, gcl::CollectiveOperator::ADD).
prog: The program sequence to add operations to.
debugContext: Optional debug context.
options: See OptionFlags.
Returns
The output tensor, with the content described above.

◆ reduceScatterCrossReplica() [3/3]

std::vector< poplar::Tensor > gcl::reduceScatterCrossReplica ( poplar::Graph &graph,
const std::vector< poplar::Tensor > &datas,
CollectiveOperator op,
poplar::program::Sequence &prog,
const CommGroup &group,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {} 
)

As reduceScatterCrossReplica() but with vector input argument and vector output as return value.

Parameters
graph: The replicated graph the input tensors belong to.
datas: The replicated tensors to reduce scatter.
op: The reduction operator (for example, gcl::CollectiveOperator::ADD).
prog: The program sequence to add operations to.
group: The subset of replicas for the collective operation.
debugContext: Optional debug context.
options: See OptionFlags.
Returns
The output tensors, with the content described above.

◆ reduceScatterToDestinationCrossReplica()

void gcl::reduceScatterToDestinationCrossReplica ( poplar::Graph &graph,
const std::vector< poplar::Tensor > &datas,
const std::vector< poplar::Tensor > &destinations,
CollectiveOperator op,
poplar::program::Sequence &prog,
const CommGroup &group,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {} 
)

As reduceScatterCrossReplica() but with vector input/output arguments.

Parameters
graph: The replicated graph the input tensors belong to.
datas: The replicated tensors to reduce scatter.
destinations: Output tensors, which must have the correct type/shape.
op: The reduction operator (for example, gcl::CollectiveOperator::ADD).
prog: The program sequence to add operations to.
group: The subset of replicas for the collective operation.
debugContext: Optional debug context.
options: See OptionFlags.

◆ reduceScatterWithinReplica()

Chunks gcl::reduceScatterWithinReplica ( poplar::Graph &graph,
const poplar::Tensor &toReduce,
CollectiveOperator op,
poplar::program::Sequence &prog,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {} 
)

Reduce a rank 2 tensor.

Given a tensor of rank 2, reduce across the outermost dimension using the specified reduction operator. This function assumes index i in the outermost dimension is mapped to IPU i. The result is distributed over IPUs such that each IPU has a slice of the final result.

Before:

data = [
         [x0,y0,z0], // IPU0
         [x1,y1,z1], // IPU1
         [x2,y2,z2], // IPU2
         [x3,y3,z3]  // IPU3
       ]

After:

Chunks = [
           [],                // IPU0 (index=0, offset=0)
           [op(z0,z1,z2,z3)], // IPU1 (index=3, offset=0)
           [op(x0,x1,x2,x3)], // IPU2 (index=1, offset=0)
           [op(y0,y1,y2,y3)]  // IPU3 (index=2, offset=0)
         ]
Note
Multi-IPU ranks (more than one IPU per rank) are not yet supported.
Parameters
graph: The graph.
toReduce: The tensor to reduce. Each partial should be mapped identically to the others across the IPUs within the rank.
op: The reduction operator (for example, gcl::CollectiveOperator::ADD).
prog: The program sequence to add operations to.
debugContext: Optional debug information.
options: See OptionFlags.
Returns
A vector of chunks, where chunk i resides on IPU i. The chunks may have different numbers of elements (for example, when the number of IPUs does not exactly divide the number of elements).
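
The three WithinReplica helpers compose naturally: reduce-scatter, optionally flatten the Chunks with concatChunks(), then all-gather, which together reproduce allReduceWithinReplica(). A hedged sketch (the gcl/Collectives.hpp header name is an assumption):

    #include <gcl/Collectives.hpp> // assumed header name
    #include <poplar/Graph.hpp>
    #include <poplar/Program.hpp>

    poplar::Tensor scatterThenGather(poplar::Graph &graph,
                                     const poplar::Tensor &toReduce, // rank 2
                                     poplar::program::Sequence &prog) {
      gcl::Chunks scattered = gcl::reduceScatterWithinReplica(
          graph, toReduce, gcl::CollectiveOperator::ADD, prog, {"scatter"});
      // One flat copy of the reduced data, ordered by chunk index.
      poplar::Tensor flat = gcl::concatChunks(scattered);
      (void)flat;
      return gcl::allGatherWithinReplica(graph, scattered, prog, {"gather"});
    }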