Poplar and PopLibs
Graphcore Communications Library. More...
Classes

struct Chunk
    Represents a section of a tensor mapped to an IPU.
struct Chunks
    A vector of Chunk data.
class CollectiveBalancedHostRearrangement
    This class contains functions and data necessary to rearrange tensors on the host side at runtime.
class CollectiveBalancedReorder
    Helper class to reorder a tensor in a per-tile-balanced fashion such that each replica obtains (for inputs to AllGather or outputs of ReduceScatter) an equally sized 1D tensor with equally sized regions.
struct CommGroup
    Struct to specify sub-groups of replicas.
Enumerations

enum class CommGroupType { ALL, CONSECUTIVE, ORTHOGONAL }
    Enum to define communication group specification type.
enum class CollectiveOperator { }
    Supported collective operators.
Functions

std::istream &operator>>(std::istream &is, CollectiveOperator &op)
    Parse a token from input stream is to op.
std::ostream &operator<<(std::ostream &os, const CollectiveOperator &op)
    Write op to output stream os.
unsigned getMinIoTiles(const poplar::Graph &graph)
    Get the minimum number of IO tiles required for GCL operations.
std::vector<unsigned> perIPUTiles(const poplar::Graph &graph, unsigned offset, unsigned count, bool sorted = true, bool tilePairs = true)
    Return a list of tile IDs optimal for GCL collective operations.
CrossReplica functions

Collective operations working across replicas.

poplar::Tensor allReduceCrossReplica(poplar::Graph &graph, const poplar::Tensor &data, CollectiveOperator op, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
    Perform an all-reduce operation.
std::vector<poplar::Tensor> allReduceCrossReplica(poplar::Graph &graph, const std::vector<poplar::Tensor> &datas, CollectiveOperator op, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
    Perform an all-reduce operation on multiple tensors.
poplar::Tensor allReduceCrossReplica(poplar::Graph &graph, const poplar::Tensor &data, CollectiveOperator op, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
    As allReduceCrossReplica() without the group arg (for all replicas).
std::vector<poplar::Tensor> allReduceCrossReplica(poplar::Graph &graph, const std::vector<poplar::Tensor> &datas, CollectiveOperator op, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
    As allReduceCrossReplica() with multiple input tensors and without the group arg.
void allReduceToDestinationCrossReplica(poplar::Graph &graph, const poplar::Tensor &data, poplar::Tensor &destination, CollectiveOperator op, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
    As allReduceCrossReplica() but writes the result to the destination tensor.
void allReduceToDestinationCrossReplica(poplar::Graph &graph, const std::vector<poplar::Tensor> &datas, const std::vector<poplar::Tensor> &destinations, CollectiveOperator op, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
    As allReduceToDestinationCrossReplica() with multiple input and output tensors.
void allReduceToDestinationCrossReplica(poplar::Graph &graph, const poplar::Tensor &data, poplar::Tensor &destination, CollectiveOperator op, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
    As allReduceToDestinationCrossReplica() without the group arg.
void allReduceInPlaceCrossReplica(poplar::Graph &graph, poplar::Tensor &data, CollectiveOperator op, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
    As allReduceCrossReplica() but writes the result back to the input data tensor.
void allReduceInPlaceCrossReplica(poplar::Graph &graph, std::vector<poplar::Tensor> &datas, CollectiveOperator op, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
    Perform an all-reduce operation on multiple tensors, writing the results back to the input datas tensors.
void allReduceInPlaceCrossReplica(poplar::Graph &graph, poplar::Tensor &data, CollectiveOperator op, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
    As allReduceInPlaceCrossReplica() without the group arg.
void allReduceInPlaceCrossReplica(poplar::Graph &graph, std::vector<poplar::Tensor> &datas, CollectiveOperator op, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
    As allReduceInPlaceCrossReplica() with multiple input tensors and without the group arg.
poplar::Tensor reduceScatterCrossReplica(poplar::Graph &graph, const poplar::Tensor &data, CollectiveOperator op, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
    Reduce the replicated rank-1 tensor data with the result scattered across the replicas.
std::vector<poplar::Tensor> reduceScatterCrossReplica(poplar::Graph &graph, const std::vector<poplar::Tensor> &datas, CollectiveOperator op, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
    As reduceScatterCrossReplica() but with vector input argument and vector output as return value.
void reduceScatterToDestinationCrossReplica(poplar::Graph &graph, const std::vector<poplar::Tensor> &datas, const std::vector<poplar::Tensor> &destinations, CollectiveOperator op, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
    As reduceScatterCrossReplica() but with vector input/output arguments.
poplar::Tensor reduceScatterCrossReplica(poplar::Graph &graph, const poplar::Tensor &data, CollectiveOperator op, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
    As reduceScatterCrossReplica() without the group arg.
poplar::Tensor allGatherCrossReplica(poplar::Graph &graph, const poplar::Tensor &data, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
    Gather the replicated tensor data.
std::vector<poplar::Tensor> allGatherCrossReplica(poplar::Graph &graph, const std::vector<poplar::Tensor> &datas, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
    As allGatherCrossReplica() but with vector input argument and vector output as return value.
void allGatherToDestinationCrossReplica(poplar::Graph &graph, const std::vector<poplar::Tensor> &datas, const std::vector<poplar::Tensor> &destinations, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
    As allGatherCrossReplica() but with vector input/output arguments.
poplar::Tensor allGatherCrossReplica(poplar::Graph &graph, const poplar::Tensor &data, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
    As allGatherCrossReplica() without the group arg.
poplar::Tensor allToAllCrossReplica(poplar::Graph &graph, const poplar::Tensor &data, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
    Perform an all-to-all exchange of the elements of the input tensor based on replica ID.
poplar::Tensor allToAllCrossReplica(poplar::Graph &graph, const poplar::Tensor &data, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
    As allToAllCrossReplica() without the group arg.
poplar::Tensor broadcastCrossReplica(poplar::Graph &graph, const poplar::Tensor &data, poplar::program::Sequence &prog, const CommGroup &group = {}, unsigned rootReplica = 0, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
    Perform a broadcast from one replica to all other replicas.
WithinReplica functions

Collective operations working within replicas.

poplar::Tensor concatChunks(const Chunks &chunks)
    Concatenates chunks.
Chunks reduceScatterWithinReplica(poplar::Graph &graph, const poplar::Tensor &toReduce, CollectiveOperator op, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
    Reduce a rank-2 tensor.
poplar::Tensor allGatherWithinReplica(poplar::Graph &graph, const Chunks &toGather, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
    Broadcast data distributed over all IPUs.
poplar::Tensor allReduceWithinReplica(poplar::Graph &graph, const poplar::Tensor &toReduce, CollectiveOperator op, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
    Perform an all-reduce operation on the specified tensor.
CommGroupType
    Enum to define communication group specification type. Assumption: replica groups are uniform in size and layout on IPUs.
poplar::Tensor gcl::allGatherCrossReplica(poplar::Graph &graph, const poplar::Tensor &data, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
Gather the replicated tensor data. Return the result so each replica will have a copy of all other replicas' data tensors. For instance:
Before:
    Replica0: data[s,t]
    Replica1: data[u,v]
    Replica2: data[w,x]
    Replica3: data[y,z]
After:
    Replica0: result[[s,t], [u,v], [w,x], [y,z]]
    Replica1: result[[s,t], [u,v], [w,x], [y,z]]
    Replica2: result[[s,t], [u,v], [w,x], [y,z]]
    Replica3: result[[s,t], [u,v], [w,x], [y,z]]
For an input of shape [incomingShape] the output will be [replicationFactor][incomingShape].
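The gather behaviour described above can be sketched with plain Python lists. This is only an illustration of the documented semantics, not GCL code; all_gather is a hypothetical helper standing in for allGatherCrossReplica().

```python
# Simulate the all-gather semantics: every replica ends up with a copy of
# every replica's data, stacked along a new outermost dimension.
def all_gather(per_replica_data):
    gathered = [list(d) for d in per_replica_data]  # [replicationFactor][incomingShape]
    return [list(gathered) for _ in per_replica_data]  # one identical copy per replica

# Four replicas, each holding a length-2 tensor (matches the example above).
data = [["s", "t"], ["u", "v"], ["w", "x"], ["y", "z"]]
result = all_gather(data)
assert result[0] == [["s", "t"], ["u", "v"], ["w", "x"], ["y", "z"]]
assert all(r == result[0] for r in result)  # every replica sees the same output
```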
graph | The replicated graph the input tensor belongs to. |
data | The replicated tensor to gather. |
prog | The program sequence to add operations to. |
group | The subset of replicas for the collective operation. |
debugContext | Optional debug context. |
options | See OptionFlags. |
poplar::Tensor gcl::allGatherCrossReplica(poplar::Graph &graph, const poplar::Tensor &data, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
As allGatherCrossReplica() without the group arg.
graph | The replicated graph the input tensor belongs to. |
data | The replicated tensor to gather. |
prog | The program sequence to add operations to. |
debugContext | Optional debug context. |
options | See OptionFlags. |
std::vector<poplar::Tensor> gcl::allGatherCrossReplica(poplar::Graph &graph, const std::vector<poplar::Tensor> &datas, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
As allGatherCrossReplica() but with vector input argument and vector output as return value.
graph | The replicated graph the input tensor belongs to. |
datas | The replicated tensors to gather. |
prog | The program sequence to add operations to. |
group | The subset of replicas for the collective operation. |
debugContext | Optional debug context. |
options | See OptionFlags. |
void gcl::allGatherToDestinationCrossReplica(poplar::Graph &graph, const std::vector<poplar::Tensor> &datas, const std::vector<poplar::Tensor> &destinations, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
As allGatherCrossReplica() but with vector input/output arguments.
graph | The replicated graph the input tensor belongs to. |
datas | The replicated tensors to gather. |
destinations | Output tensors which must have correct type/shape. |
prog | The program sequence to add operations to. |
group | The subset of replicas for the collective operation. |
debugContext | Optional debug context. |
options | See OptionFlags. |
poplar::Tensor gcl::allGatherWithinReplica(poplar::Graph &graph, const Chunks &toGather, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
Broadcast data distributed over all IPUs.
This function assumes chunk i is mapped to IPU i.
Before:
    Chunks = [
        [ ], // IPU0 (index=2, offset=0)
        [z], // IPU1 (index=1, offset=0)
        [x], // IPU2 (index=3, offset=0)
        [y]  // IPU3 (index=0, offset=0)
    ]
After:
    result = [
        [x,y,z], // IPU0
        [x,y,z], // IPU1
        [x,y,z], // IPU2
        [x,y,z]  // IPU3
    ]
graph | The graph. |
toGather | The chunks to gather. |
prog | The program sequence to add operations to. |
debugContext | Optional debug information. |
options | See OptionFlags. |
Returns: The gathered tensor; index i in the outermost dimension of the result is mapped to IPU i.

poplar::Tensor gcl::allReduceCrossReplica(poplar::Graph &graph, const poplar::Tensor &data, CollectiveOperator op, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
Perform an all-reduce operation.
The operation is performed on the provided tensor over replicas as specified by the group argument. This operation reduces across the tensors that the replicated tensor is a handle for. The result is returned as a replicated tensor with the same shape as the input, where all replicas' output tensors have the same data. For instance:
Before:
    Replica0: data[x0,y0]
    Replica1: data[x1,y1]
    Replica2: data[x2,y2]
    Replica3: data[x3,y3]
After:
    Replica0: result[op(x0,x1,x2,x3), op(y0,y1,y2,y3)]
    Replica1: result[op(x0,x1,x2,x3), op(y0,y1,y2,y3)]
    Replica2: result[op(x0,x1,x2,x3), op(y0,y1,y2,y3)]
    Replica3: result[op(x0,x1,x2,x3), op(y0,y1,y2,y3)]
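The Before/After transformation above can be simulated with plain Python (an illustration of the documented semantics, not GCL code; all_reduce is a hypothetical helper, shown here with op = ADD):

```python
# Simulate all-reduce: reduce element-wise across replicas, then give
# every replica a copy of the reduced tensor.
def all_reduce(per_replica_data, op):
    reduced = [op(elems) for elems in zip(*per_replica_data)]
    return [list(reduced) for _ in per_replica_data]

# Four replicas, each holding a length-2 tensor; op = ADD.
data = [[1, 10], [2, 20], [3, 30], [4, 40]]
result = all_reduce(data, sum)
assert result == [[10, 100]] * 4  # op(x0..x3) = 10, op(y0..y3) = 100, on every replica
```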
graph | The replicated graph the input tensor belongs to. |
data | The replicated tensor to reduce. |
op | The reduction operator (for example, gcl::CollectiveOperator::ADD). |
prog | The program sequence to add operations to. |
group | The subset of replicas for the collective operation. |
debugContext | Optional debug context. |
options | See OptionFlags. |
Returns: A replicated tensor with the same shape as data.

poplar::Tensor gcl::allReduceCrossReplica(poplar::Graph &graph, const poplar::Tensor &data, CollectiveOperator op, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
As allReduceCrossReplica() without the group arg (for all replicas).
graph | The replicated graph the input tensors belong to. |
data | The replicated tensor to reduce. |
op | The reduction operator (for example, gcl::CollectiveOperator::ADD). |
prog | The program sequence to add operations to. |
debugContext | Optional debug context. |
options | See OptionFlags. |
Returns: A replicated tensor with the same shape as data.

std::vector<poplar::Tensor> gcl::allReduceCrossReplica(poplar::Graph &graph, const std::vector<poplar::Tensor> &datas, CollectiveOperator op, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
Perform an all-reduce operation on multiple tensors.
As allReduceCrossReplica(), but batches up multiple tensors to be executed as a single collective operation. This gives a performance improvement over sequentially reducing one tensor per operation; for short tensors the potential latency reduction is a factor of 1/(number-of-tensors).
graph | The replicated graph the input tensors belong to. |
datas | The vector of replicated tensors to reduce. |
op | The reduction operator (for example, gcl::CollectiveOperator::ADD). |
prog | The program sequence to add operations to. |
group | The subset of replicas for the collective operation. |
debugContext | Optional debug context. |
options | See OptionFlags. |
Returns: A vector of tensors with the reduced data from datas across all replicas.

std::vector<poplar::Tensor> gcl::allReduceCrossReplica(poplar::Graph &graph, const std::vector<poplar::Tensor> &datas, CollectiveOperator op, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
As allReduceCrossReplica() with multiple input tensors and without the group arg.
graph | The replicated graph the input tensors belong to. |
datas | A vector of replicated tensors to reduce. |
op | The reduction operator (for example, gcl::CollectiveOperator::ADD). |
prog | The program sequence to add operations to. |
debugContext | Optional debug context. |
options | See OptionFlags. |
Returns: A vector of tensors with the reduced data from datas across all replicas.

void gcl::allReduceInPlaceCrossReplica(poplar::Graph &graph, poplar::Tensor &data, CollectiveOperator op, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
As allReduceCrossReplica() but writes the result back to the input data tensor.
graph | The replicated graph the input tensor belongs to. |
data | The replicated tensor to reduce. |
op | The reduction operator (for example, gcl::CollectiveOperator::ADD). |
prog | The program sequence to add operations to. |
group | The subset of replicas for the collective operation. |
debugContext | Optional debug context. |
options | See OptionFlags. |
void gcl::allReduceInPlaceCrossReplica(poplar::Graph &graph, poplar::Tensor &data, CollectiveOperator op, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
As allReduceInPlaceCrossReplica() without the group arg.
graph | The replicated graph the input tensor belongs to. |
data | The replicated tensor to reduce. |
op | The reduction operator (for example, gcl::CollectiveOperator::ADD). |
prog | The program sequence to add operations to. |
debugContext | Optional debug context. |
options | See OptionFlags. |
void gcl::allReduceInPlaceCrossReplica(poplar::Graph &graph, std::vector<poplar::Tensor> &datas, CollectiveOperator op, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
Perform an all-reduce operation on multiple tensors, writing the results back to the input datas tensors.
As allReduceInPlaceCrossReplica(), but batches up multiple tensors to be executed as a single collective operation. This gives a performance improvement over sequentially reducing one tensor per operation; for short tensors the potential latency reduction is a factor of 1/(number-of-tensors).
graph | The replicated graph the input tensor belongs to. |
datas | Vector of replicated tensors to reduce. |
op | The reduction operator (for example, gcl::CollectiveOperator::ADD). |
prog | The program sequence to add operations to. |
group | The subset of replicas for the collective operation. |
debugContext | Optional debug context. |
options | See OptionFlags. |
void gcl::allReduceInPlaceCrossReplica(poplar::Graph &graph, std::vector<poplar::Tensor> &datas, CollectiveOperator op, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
As allReduceInPlaceCrossReplica() with multiple input tensors and without the group arg.
graph | The replicated graph the input tensor belongs to. |
datas | Vector of replicated tensors to reduce. |
op | The reduction operator (for example, gcl::CollectiveOperator::ADD). |
prog | The program sequence to add operations to. |
debugContext | Optional debug context. |
options | See OptionFlags. |
void gcl::allReduceToDestinationCrossReplica(poplar::Graph &graph, const poplar::Tensor &data, poplar::Tensor &destination, CollectiveOperator op, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
As allReduceCrossReplica() but writes the result to the destination tensor.
graph | The replicated graph the input tensor belongs to. |
data | The replicated tensor to reduce. |
destination | Tensor to write the result to. |
op | The reduction operator (for example, gcl::CollectiveOperator::ADD). |
prog | The program sequence to add operations to. |
group | The subset of replicas for the collective operation. |
debugContext | Optional debug context. |
options | See OptionFlags. |
void gcl::allReduceToDestinationCrossReplica(poplar::Graph &graph, const poplar::Tensor &data, poplar::Tensor &destination, CollectiveOperator op, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
As allReduceToDestinationCrossReplica() without the group arg.
graph | The replicated graph the input tensor belongs to. |
data | The replicated tensor to reduce. |
destination | Tensor to write the result to. |
op | The reduction operator (for example, gcl::CollectiveOperator::ADD). |
prog | The program sequence to add operations to. |
debugContext | Optional debug context. |
options | See OptionFlags. |
void gcl::allReduceToDestinationCrossReplica(poplar::Graph &graph, const std::vector<poplar::Tensor> &datas, const std::vector<poplar::Tensor> &destinations, CollectiveOperator op, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
As allReduceToDestinationCrossReplica() with multiple input and output tensors.
graph | The replicated graph the input tensors belong to. |
datas | Vector of replicated tensors to reduce. |
destinations | Vector of replicated tensors to write the result to. |
op | The reduction operator (for example, gcl::CollectiveOperator::ADD). |
prog | The program sequence to add operations to. |
group | The subset of replicas for the collective operation. |
debugContext | Optional debug context. |
options | See OptionFlags. |
poplar::Tensor gcl::allReduceWithinReplica(poplar::Graph &graph, const poplar::Tensor &toReduce, CollectiveOperator op, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
Perform an all-reduce operation on the specified tensor.
This operation reduces across the outermost dimension of the input and produces a tensor with the same shape where the innermost dimension is the result of the reduction and the outermost dimension is a number of copies of the result.
This function assumes index i in the outermost dimension of the input is mapped to IPU i, and index i in the outermost dimension of the result is mapped to IPU i.
Before:
    toReduce = [
        [x0,y0], // IPU0
        [x1,y1], // IPU1
        [x2,y2], // IPU2
        [x3,y3]  // IPU3
    ]
After:
    result = [
        [op(x0,x1,x2,x3), op(y0,y1,y2,y3)], // IPU0
        [op(x0,x1,x2,x3), op(y0,y1,y2,y3)], // IPU1
        [op(x0,x1,x2,x3), op(y0,y1,y2,y3)], // IPU2
        [op(x0,x1,x2,x3), op(y0,y1,y2,y3)]  // IPU3
    ]
graph | The graph. |
toReduce | The tensor to reduce. Each partial should be mapped identically to the others across the IPUs within the rank. |
op | The reduction operator (for example, gcl::CollectiveOperator::ADD). |
prog | The program sequence to add operations to. |
debugContext | Optional debug information. |
options | See OptionFlags. |
Returns: A tensor with the same shape as toReduce, where the innermost dimension is the result of the reduction and the outermost dimension has a number of copies of the result.

poplar::Tensor gcl::allToAllCrossReplica(poplar::Graph &graph, const poplar::Tensor &data, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
Perform an all-to-all exchange of the elements of the input tensor based on replica ID.
The shape of the input must have the number of replicas in the graph as its first or only dimension. That dimension will be used to split up the tensor being sent, with each replica sending all splits except for the split index which matches its replica ID. That is, replica 2 will not send input[2] and so on.
The replica receiving the slice will copy that incoming slice into the output at the index which matches the replica ID of the replica which sent it. For instance:
Before:
    Replica0: data[x0,x1,x2]
    Replica1: data[y0,y1,y2]
    Replica2: data[z0,z1,z2]
After:
    Replica0: result[x0,y0,z0]
    Replica1: result[x1,y1,z1]
    Replica2: result[x2,y2,z2]
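The exchange above is effectively a transpose across the replica dimension: slice i of replica r's input becomes element r of replica i's output. A plain Python simulation (illustrative only, not GCL code; all_to_all is a hypothetical helper):

```python
# Simulate all-to-all: output[dst][src] = input[src][dst].
def all_to_all(per_replica_data):
    n = len(per_replica_data)
    return [[per_replica_data[src][dst] for src in range(n)]
            for dst in range(n)]

# Three replicas, each with three slices (matches the example above).
data = [["x0", "x1", "x2"], ["y0", "y1", "y2"], ["z0", "z1", "z2"]]
result = all_to_all(data)
assert result[0] == ["x0", "y0", "z0"]
assert result[1] == ["x1", "y1", "z1"]
assert result[2] == ["x2", "y2", "z2"]
```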
graph | The replicated graph the input tensor belongs to. |
data | The replicated tensor to aggregate. |
prog | The program sequence to add operations to. |
group | The subset of replicas for the collective operation. |
debugContext | Optional debug context. |
options | See OptionFlags. |
poplar::Tensor gcl::allToAllCrossReplica(poplar::Graph &graph, const poplar::Tensor &data, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
As allToAllCrossReplica() without the group arg.
graph | The replicated graph the input tensor belongs to. |
data | The replicated tensor to aggregate. |
prog | The program sequence to add operations to. |
debugContext | Optional debug context. |
options | See gcl::allReduceCrossReplica(). |
poplar::Tensor gcl::broadcastCrossReplica(poplar::Graph &graph, const poplar::Tensor &data, poplar::program::Sequence &prog, const CommGroup &group = {}, unsigned rootReplica = 0, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
Perform a broadcast from one replica to all other replicas.
Before:
    Replica0: data[x0,x1,x2]  // <-- rootReplica
    Replica1: data[y0,y1,y2]
    Replica2: data[z0,z1,z2]
After:
    Replica0: result[x0,x1,x2]
    Replica1: result[x0,x1,x2]
    Replica2: result[x0,x1,x2]
graph | The replicated graph the input tensor belongs to. |
data | The replicated tensor to broadcast. |
prog | The program sequence to add operations to. |
group | The subset of replicas for the collective operation. |
rootReplica | The ID of the replica to use as the source of the broadcast. |
debugContext | Optional debug context. |
options | See gcl::allReduceCrossReplica(). |
poplar::Tensor gcl::concatChunks(const Chunks &chunks)
Concatenates chunks.
Given a vector of Chunk data, its elements are sorted according to their offset or index, and a tensor is returned consisting of the sorted, concatenated Chunk elements. This operation is performed on the output of the reduceScatterWithinReplica operation and on the input of the allGatherWithinReplica operation.
chunks | A structure containing a vector of Chunk data. |
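The sort-then-concatenate step can be sketched in plain Python. This is an illustration only, not the GCL implementation; the Chunk class below is a hypothetical stand-in mirroring the index/offset attributes described above, with made-up example data.

```python
from dataclasses import dataclass

# Hypothetical stand-in for gcl::Chunk: payload plus the (index, offset)
# pair that determines its position in the concatenated result.
@dataclass
class Chunk:
    data: list
    index: int   # ordering of this chunk in the whole tensor
    offset: int  # offset among chunks sharing an index

def concat_chunks(chunks):
    # Sort by (index, offset), then concatenate the payloads.
    ordered = sorted(chunks, key=lambda c: (c.index, c.offset))
    out = []
    for c in ordered:
        out.extend(c.data)
    return out

chunks = [Chunk(["c"], 2, 0), Chunk(["a"], 0, 0), Chunk(["b"], 1, 0)]
assert concat_chunks(chunks) == ["a", "b", "c"]
```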
unsigned gcl::getMinIoTiles(const poplar::Graph &graph)
Get the minimum number of IO tiles required for GCL operations.
This is the minimum number of tiles you must request in perIPUTiles() for the graph to be valid for GCL operations.
graph | The graph to check. It assumes that all tiles in the graph will be used for IO operations. |
std::ostream &gcl::operator<<(std::ostream &os, const CollectiveOperator &op)
Write op to the output stream os.
The value written is the stringified enumeration, for example "ADD" or "MUL".
os | ostream output destination. |
op | gcl::CollectiveOperator to represent as a string. |
std::istream &gcl::operator>>(std::istream &is, CollectiveOperator &op)
Parse a token from the input stream is to op.
Valid input values are the stringified enumerations, for example "ADD" or "MUL".
is | The stream to read from. |
op | The gcl::CollectiveOperator parsed from the input stream. |
std::vector<unsigned> gcl::perIPUTiles(const poplar::Graph &graph, unsigned offset, unsigned count, bool sorted = true, bool tilePairs = true)
Return a list of tile IDs optimal for GCL collective operations.
A set of tiles are chosen with an optimal distribution across exchange blocks and exchange-block contexts. (Exchange blocks provide the hardware interface to off-chip IO.)
graph | The graph on which to allocate tiles. |
offset | Skip a number of tiles and allocate from an offset. This is useful if you want to call the function multiple times, for example to get a set of tiles for IO and another set of tiles for compute. In that case, you could call it twice: first with an offset of zero to get the IO tiles, and then with the offset equal to the number of IO tiles in order to get the compute tiles. |
count | Number of tiles IDs to return. |
sorted | If true the function will sort the returned list of IDs. This is normally required and is the default. |
tilePairs | Override the default behaviour and return tile pairs. Tiles used for host IO must be allocated in pairs. |
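The two-call offset pattern described for the offset parameter can be sketched in Python. Everything here is illustrative: fake_per_ipu_tiles is a made-up stand-in for gcl::perIPUTiles, and the tile and IO-tile counts are assumptions, not values the library mandates.

```python
# Hypothetical stand-in for perIPUTiles: hand out `count` tile IDs,
# skipping the first `offset` tiles, from a flat pool of per-IPU tile IDs.
def fake_per_ipu_tiles(tiles_per_ipu, offset, count):
    pool = list(range(tiles_per_ipu))
    return sorted(pool[offset:offset + count])

TILES_PER_IPU = 1472   # assumed tile count for illustration
NUM_IO_TILES = 128     # assumed IO-tile budget for illustration

# First call: offset 0 -> the IO tiles.
io_tiles = fake_per_ipu_tiles(TILES_PER_IPU, 0, NUM_IO_TILES)
# Second call: offset = number of IO tiles -> the compute tiles.
compute_tiles = fake_per_ipu_tiles(TILES_PER_IPU, NUM_IO_TILES,
                                   TILES_PER_IPU - NUM_IO_TILES)

assert len(io_tiles) == NUM_IO_TILES
assert not set(io_tiles) & set(compute_tiles)  # the two sets are disjoint
```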
poplar::Tensor gcl::reduceScatterCrossReplica(poplar::Graph &graph, const poplar::Tensor &data, CollectiveOperator op, poplar::program::Sequence &prog, const CommGroup &group, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
Reduce the replicated rank-1 tensor data with the result scattered across the replicas.
For an input of shape [numElements] mapped to a single IPU per replica, the output will have shape [ceil(numElements / replicationFactor)]. If replicationFactor does not evenly divide numElements, the result is zero-padded. For instance:
Before:
    Replica0: toReduce[x0, y0, z0]
    Replica1: toReduce[x1, y1, z1]
After:
    Replica0: result[op(x0, x1), op(y0, y1)]
    Replica1: result[op(z0, z1), 0]
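The single-IPU case, including the zero padding when replicationFactor does not evenly divide the element count, can be simulated in plain Python (illustrative only, not GCL code; reduce_scatter is a hypothetical helper, shown with op = ADD):

```python
import math

# Simulate reduce-scatter (single-IPU case): reduce across replicas, then
# scatter ceil(n / R) elements to each replica, zero-padding the tail.
def reduce_scatter(per_replica_data, op):
    r = len(per_replica_data)
    reduced = [op(elems) for elems in zip(*per_replica_data)]
    chunk = math.ceil(len(reduced) / r)
    reduced += [0] * (chunk * r - len(reduced))  # zero padding
    return [reduced[i * chunk:(i + 1) * chunk] for i in range(r)]

# Two replicas, three elements each; 2 does not divide 3, so the last
# replica's chunk is padded with a zero (matches the example above).
data = [[1, 10, 100], [2, 20, 200]]
result = reduce_scatter(data, sum)
assert result == [[3, 30], [300, 0]]  # ceil(3/2) = 2 elements per replica
```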
Multi-IPU mapped input: For the syncful implementation, for an input of shape [numElementsIPU0 + numElementsIPU1 + ...] mapped to multiple IPUs per replica, the output will have shape [ceil(numElementsIPU0 / replicationFactor) + ceil(numElementsIPU1 / replicationFactor) + ...] with the result grouped per IPU. If replicationFactor does not evenly divide the number of elements on an IPU, the result is zero-padded per IPU. For instance:
Before:
  Replica0: toReduce[ x0, y0, z0, w0]
  Replica1: toReduce[ x1, y1, z1, w1]
  Replica2: toReduce[ x2, y2, z2, w2]
  Replica3: toReduce[ x3, y3, z3, w3]
  Mapping:  toReduce[IPU0, IPU0, IPU0, IPU1]
After:
  Replica0: result[op(x0, x1, x2, x3), op(w0, w1, w2, w3)]
  Replica1: result[op(y0, y1, y2, y3), 0]
  Replica2: result[op(z0, z1, z2, z3), 0]
  Replica3: result[ 0, 0]
  Mapping:  result[ IPU0, IPU1]
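The per-IPU output-shape rule above is just ceiling division summed over the IPUs. The following is an illustration of that arithmetic, not a GCL API; `reduceScatterOutputElems` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <vector>

// Compute the multi-IPU reduce-scatter output length: each IPU's element
// count is divided by the replication factor, rounded up, and the per-IPU
// chunk lengths are summed.
std::size_t reduceScatterOutputElems(const std::vector<std::size_t> &elemsPerIpu,
                                     std::size_t replicationFactor) {
  std::size_t total = 0;
  for (std::size_t n : elemsPerIpu)
    total += (n + replicationFactor - 1) / replicationFactor; // ceil(n / rf)
  return total;
}
```

In the example above, with 4 replicas, 3 elements on IPU0 and 1 on IPU1, the output has ceil(3/4) + ceil(1/4) = 1 + 1 = 2 elements, matching the two-element result shown.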
graph | The replicated graph the input tensor belongs to. |
data | The replicated tensor to reduce scatter. |
op | The reduction operator (for example, gcl::CollectiveOperator::ADD). |
prog | The program sequence to add operations to. |
group | The subset of replicas for the collective operation. |
debugContext | Optional debug context. |
options | See OptionFlags. |
poplar::Tensor gcl::reduceScatterCrossReplica(poplar::Graph &graph,
        const poplar::Tensor &data,
        CollectiveOperator op,
        poplar::program::Sequence &prog,
        const poplar::DebugContext &debugContext = {},
        const poplar::OptionFlags &options = {})
As reduceScatterCrossReplica(), but without the group argument.
graph | The replicated graph the input tensor belongs to. |
data | The replicated tensor to reduce scatter. |
op | The reduction operator (for example, gcl::CollectiveOperator::ADD). |
prog | The program sequence to add operations to. |
debugContext | Optional debug context. |
options | See OptionFlags. |
std::vector<poplar::Tensor> gcl::reduceScatterCrossReplica(poplar::Graph &graph,
        const std::vector<poplar::Tensor> &datas,
        CollectiveOperator op,
        poplar::program::Sequence &prog,
        const CommGroup &group,
        const poplar::DebugContext &debugContext = {},
        const poplar::OptionFlags &options = {})
As reduceScatterCrossReplica(), but with a vector input argument and a vector output as the return value.
graph | The replicated graph the input tensor belongs to. |
datas | The replicated tensors to reduce scatter. |
op | The reduction operator (for example, gcl::CollectiveOperator::ADD). |
prog | The program sequence to add operations to. |
group | The subset of replicas for the collective operation. |
debugContext | Optional debug context. |
options | See OptionFlags. |
void gcl::reduceScatterToDestinationCrossReplica(poplar::Graph &graph,
        const std::vector<poplar::Tensor> &datas,
        const std::vector<poplar::Tensor> &destinations,
        CollectiveOperator op,
        poplar::program::Sequence &prog,
        const CommGroup &group,
        const poplar::DebugContext &debugContext = {},
        const poplar::OptionFlags &options = {})
As reduceScatterCrossReplica() but with vector input/output arguments.
graph | The replicated graph the input tensor belongs to. |
datas | The replicated tensors to reduce scatter. |
destinations | Output tensors which must have correct type/shape. |
op | The reduction operator (for example, gcl::CollectiveOperator::ADD). |
prog | The program sequence to add operations to. |
group | The subset of replicas for the collective operation. |
debugContext | Optional debug context. |
options | See OptionFlags. |
Chunks gcl::reduceScatterWithinReplica(poplar::Graph &graph,
        const poplar::Tensor &toReduce,
        CollectiveOperator op,
        poplar::program::Sequence &prog,
        const poplar::DebugContext &debugContext = {},
        const poplar::OptionFlags &options = {})
Reduce a rank-2 tensor.
Given a tensor of rank 2, reduce across the outermost dimension using the specified reduction operator. This function assumes index i in the outermost dimension is mapped to IPU i. The result is distributed over the IPUs such that each IPU has a slice of the final result.
Before:
  data = [
    [x0, y0, z0],  // IPU0
    [x1, y1, z1],  // IPU1
    [x2, y2, z2],  // IPU2
    [x3, y3, z3]   // IPU3
  ]
After:
  Chunks = [
    [],                  // IPU0 (index=0, offset=0)
    [op(z0,z1,z2,z3)],   // IPU1 (index=3, offset=0)
    [op(x0,x1,x2,x3)],   // IPU2 (index=1, offset=0)
    [op(y0,y1,y2,y3)]    // IPU3 (index=2, offset=0)
  ]
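The within-replica reduction above can be simulated in plain standard C++. This is a sketch only, with `int` elements and no Poplar; `simulateReduceScatterWithinReplica` is a hypothetical helper name, and the real operation may assign chunks to IPUs in a different order than this round-robin split, which is what the (index, offset) fields of Chunk record.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Reduce a rank-2 tensor across its outermost dimension, where row i is
// assumed to live on IPU i, then split the reduced result into one chunk
// per IPU. Chunks may differ in size when the number of IPUs does not
// exactly divide the number of elements.
std::vector<std::vector<int>> simulateReduceScatterWithinReplica(
    const std::vector<std::vector<int>> &data,
    const std::function<int(int, int)> &op) {
  const std::size_t numIpus = data.size();
  const std::size_t numElements = data[0].size();

  // Reduce across the outermost dimension.
  std::vector<int> reduced(data[0]);
  for (std::size_t i = 1; i < numIpus; ++i)
    for (std::size_t e = 0; e < numElements; ++e)
      reduced[e] = op(reduced[e], data[i][e]);

  // Distribute slices of the result round-robin over the IPUs.
  std::vector<std::vector<int>> chunks(numIpus);
  for (std::size_t e = 0; e < numElements; ++e)
    chunks[e % numIpus].push_back(reduced[e]);
  return chunks;
}
```

With four IPUs and three elements per row, as in the example above, one IPU ends up with an empty chunk, mirroring the empty entry shown in the Chunks result.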
graph | The graph. |
toReduce | The tensor to reduce. Each partial should be mapped identically to the others across the IPUs within the rank. |
op | The reduction operator (for example, gcl::CollectiveOperator::ADD). |
prog | The program sequence to add operations to. |
debugContext | Optional debug information. |
options | See OptionFlags. |
Returns
The reduced chunks, where chunk i resides on IPU i. The chunks may have different numbers of elements (for example, when the number of IPUs does not exactly divide the number of elements).