CollectiveBalancedReorder
#include <gcl/CollectiveBalancedReorder.hpp>
-
namespace gcl
Graphcore Communications Library.
CrossReplica functions
Collective operations working across replicas.
WithinReplica functions
Collective operations working within replicas.
-
class CollectiveBalancedHostRearrangement
- #include <CollectiveBalancedReorder.hpp>
This class contains functions and data necessary to rearrange tensors on the host side at runtime.
The separation is made so that we can serialize the state and restore it without having to create a poplar::Graph.
Public Functions
-
void rearrangeForCollective(const void *in, void *out, int64_t elemByteSize) const
Reorder the tensor in a balanced, collective-friendly manner (host-side).
- Parameters
in – Pointer to the input buffer.
out – Pointer to the output buffer.
elemByteSize – The byte size of the elements.
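The host-side routines are type-erased: the buffers are passed as void pointers, and elemByteSize tells the routine how many bytes make up one element. The minimal sketch below illustrates that idea. It assumes the layout is described by a precomputed element map, similar in spirit to the 1-to-1 element map mentioned under allowElementMap, in which entry i names the reference element that lands at rearranged position i. A hypothetical sentinel, kPadding, marks padding slots. Both rearrangeWithElementMap and kPadding are illustrative names and are not part of the GCL API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sentinel marking a padding slot in the rearranged layout.
constexpr std::int64_t kPadding = -1;

// Forward (reference -> rearranged) copy driven by an element map: rearranged
// element i is copied from reference element elementMap[i], or zero-filled when
// elementMap[i] is kPadding. Elements are opaque blobs of elemByteSize bytes,
// so one routine serves every element type.
void rearrangeWithElementMap(const void *in, void *out, std::int64_t elemByteSize,
                             const std::vector<std::int64_t> &elementMap) {
  assert(elemByteSize > 0);
  const auto *src = static_cast<const unsigned char *>(in);
  auto *dst = static_cast<unsigned char *>(out);
  const auto size = static_cast<std::size_t>(elemByteSize);
  for (std::size_t i = 0; i < elementMap.size(); ++i) {
    unsigned char *slot = dst + i * size;
    if (elementMap[i] == kPadding) {
      std::memset(slot, 0, size);
    } else {
      std::memcpy(slot, src + static_cast<std::size_t>(elementMap[i]) * size, size);
    }
  }
}
```

The output buffer must hold elementMap.size() elements, which is the padded, rearranged length rather than the reference length.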
-
void undoRearrangeForCollective(const void *in, void *out, int64_t elemByteSize) const
Reorder the tensor back into the expected IR tensor shape and order (host-side).
- Parameters
in – Pointer to the input buffer.
out – Pointer to the output buffer.
elemByteSize – The byte size of the elements.
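Undoing the rearrangement amounts to the inverse scatter. Each non-padding element of the rearranged buffer is written back to its reference position, and padding slots are dropped. This sketch uses the same hypothetical element-map representation: entry i is the reference index held at rearranged position i, and kPadding marks padding. undoRearrangeWithElementMap is an illustrative name, not the GCL implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::int64_t kPadding = -1;  // hypothetical padding marker

// Inverse (rearranged -> reference) copy: rearranged element i is written back
// to reference position elementMap[i]; padding slots carry no data and are skipped.
void undoRearrangeWithElementMap(const void *in, void *out, std::int64_t elemByteSize,
                                 const std::vector<std::int64_t> &elementMap) {
  assert(elemByteSize > 0);
  const auto *src = static_cast<const unsigned char *>(in);
  auto *dst = static_cast<unsigned char *>(out);
  const auto size = static_cast<std::size_t>(elemByteSize);
  for (std::size_t i = 0; i < elementMap.size(); ++i) {
    if (elementMap[i] != kPadding) {
      std::memcpy(dst + static_cast<std::size_t>(elementMap[i]) * size, src + i * size, size);
    }
  }
}
```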
-
size_t getNumRearrangedTensorElems() const
Get the number of elements in the collective-balanced (reordered) tensor.
- Returns
The number of elements.
-
void rearrange(const void *in, void *out, int64_t elemByteSize, bool refToGathered) const
Host tensor rearrangement routine.
- Parameters
in – Pointer to the input buffer.
out – Pointer to the output buffer.
elemByteSize – The byte size of the elements.
refToGathered – Whether to rearrange from the reference layout to the gathered layout (true) or in the opposite direction (false).
Public Members
-
unsigned replicationFactor = 0
The graph’s replication factor.
Private Functions
-
template<typename ElementType>
void rearrangeImpl(const ElementType *in, ElementType *out, bool refToGathered) const
Host tensor rearrangement routine.
- Parameters
in – Pointer to the input buffer.
out – Pointer to the output buffer.
refToGathered – Whether to rearrange from the reference layout to the gathered layout (true) or in the opposite direction (false).
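The public rearrange takes elemByteSize at runtime, while rearrangeImpl is templated on ElementType, which suggests a dispatch from the runtime byte size to a compile-time type. The sketch below shows one way to perform such a dispatch. As a variation on the documented signature, it templates the kernel on the element width instead of on an element type, and it copies with std::memcpy. This way, any trivially copyable element (float, int32_t, half stored as uint16_t, and so on) is handled without type punning. rearrangeSketch, rearrangeImplSketch and the plain index-map argument are hypothetical, and padding is omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical kernel specialised on the element width. With a compile-time
// size, each std::memcpy typically compiles to a single load/store pair.
// refToGathered == true gathers (out[i] = in[map[i]]); false scatters back.
template <std::size_t ElemBytes>
void rearrangeImplSketch(const unsigned char *in, unsigned char *out,
                         const std::vector<std::size_t> &map, bool refToGathered) {
  for (std::size_t i = 0; i < map.size(); ++i) {
    const std::size_t src = refToGathered ? map[i] : i;
    const std::size_t dst = refToGathered ? i : map[i];
    std::memcpy(out + dst * ElemBytes, in + src * ElemBytes, ElemBytes);
  }
}

// Type-erased entry point: turns the runtime elemByteSize into a template argument.
void rearrangeSketch(const void *in, void *out, std::int64_t elemByteSize,
                     const std::vector<std::size_t> &map, bool refToGathered) {
  assert(in != nullptr && out != nullptr);
  const auto *src = static_cast<const unsigned char *>(in);
  auto *dst = static_cast<unsigned char *>(out);
  switch (elemByteSize) {
  case 1: rearrangeImplSketch<1>(src, dst, map, refToGathered); break;
  case 2: rearrangeImplSketch<2>(src, dst, map, refToGathered); break;
  case 4: rearrangeImplSketch<4>(src, dst, map, refToGathered); break;
  case 8: rearrangeImplSketch<8>(src, dst, map, refToGathered); break;
  default: throw std::invalid_argument("unsupported element byte size");
  }
}
```

Calling rearrangeSketch with refToGathered set to true and then with false, using the same map, restores the original buffer.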
-
void rearrangeForCollective(const void *in, void *out, int64_t elemByteSize) const
-
class CollectiveBalancedReorder
- #include <CollectiveBalancedReorder.hpp>
Helper class to reorder a tensor in a per-tile-balanced fashion such that each replica obtains (for inputs to AllGather or outputs of ReduceScatter) an equally sized 1D tensor with equally sized regions.
This helper class reduces the memory used by the syncful collective. The reordering process:
Flattens the input tensor
Analyses the tile mapping
Determines reordering strategy and required internal padding
Can rearrange and undo the rearrangement on any tensor that has the same tile mapping
Can rearrange and undo the rearrangement on host tensors that are to be copied into CBR-rearranged RemoteBuffers
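The steps above can be illustrated with a deliberately simplified model. The sketch assumes that the tile mapping is given as a list of flattened reference indices per tile. It pads each tile's list to a multiple of the replication factor and then hands replica r the r-th equal chunk of every tile. It ignores grain size and interval merging, and buildGatheredOrder and kPadding are hypothetical names. The goal is to show why every replica ends up with an equally sized 1D slice, not to reproduce how GCL computes its reordering strategy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

constexpr std::int64_t kPadding = -1;  // hypothetical marker for padding slots

// tileElements[t] lists the flattened reference indices mapped to tile t.
// Returns, for each position of the gathered tensor, the reference index it
// holds, or kPadding.
std::vector<std::int64_t> buildGatheredOrder(
    const std::vector<std::vector<std::int64_t>> &tileElements,
    unsigned replicationFactor) {
  assert(replicationFactor > 0);
  const std::size_t r = replicationFactor;

  // Pad every tile's element list up to a multiple of the replication factor.
  std::vector<std::vector<std::int64_t>> padded;
  padded.reserve(tileElements.size());
  for (std::vector<std::int64_t> elems : tileElements) {
    const std::size_t size = (elems.size() + r - 1) / r * r;
    elems.resize(size, kPadding);
    padded.push_back(std::move(elems));
  }

  // Replica `rep` takes the rep-th equal chunk of every tile, in tile order,
  // so all replicas receive slices of the same length and structure.
  std::vector<std::int64_t> order;
  for (std::size_t rep = 0; rep < r; ++rep) {
    for (const auto &elems : padded) {
      const auto chunk = static_cast<std::ptrdiff_t>(elems.size() / r);
      const auto first = elems.begin() + static_cast<std::ptrdiff_t>(rep) * chunk;
      order.insert(order.end(), first, first + chunk);
    }
  }
  return order;
}
```

For example, with two tiles holding reference elements {0, 1, 2} and {3, 4} and a replication factor of 2, the first tile is padded to four slots. Each replica then receives three elements: two from the first tile and one from the second.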
Public Functions
-
CollectiveBalancedReorder(poplar::Graph &graph_, poplar::Tensor tensor_, unsigned replicationFactor_, const poplar::DebugNameAndId &dnai_, bool allowElementMap = false, unsigned grainSize = 1)
Constructor.
- Parameters
graph_ – The poplar graph.
tensor_ – The reference tensor to rearrange.
replicationFactor_ – The replication factor of the graph.
dnai_ – Debug name and id.
allowElementMap – Allow an alternative representation of the host rearrangements. Sometimes it is beneficial to collapse all intervals into a simple 1-to-1 element map. This flag should be set to true in all new code, and should be deprecated once all frameworks implement serialisation of the newly added elementMap field.
grainSize – The grain size to use when padding the tensor.
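One plausible reading of the grainSize parameter is that each replica's chunk should contain a whole number of grains. Under that reading, the padded length becomes a multiple of replicationFactor * grainSize. The helper below illustrates only that assumed rule. paddedElems is a hypothetical name, and the padding GCL actually applies may differ; for example, it may be applied per tile region rather than to the whole tensor.

```cpp
#include <cassert>
#include <cstddef>

// Round numElems up so it splits into replicationFactor equal chunks, each a
// whole number of grains (assumed padding rule, for illustration only).
std::size_t paddedElems(std::size_t numElems, unsigned replicationFactor,
                        unsigned grainSize) {
  assert(replicationFactor > 0 && grainSize > 0);
  const std::size_t unit = static_cast<std::size_t>(replicationFactor) * grainSize;
  return (numElems + unit - 1) / unit * unit;
}
```

With 10 elements and a replication factor of 4, a grain size of 1 gives 12 elements (3 per replica). A grain size of 2 gives 16 elements (two grains of 2 per replica).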
-
poplar::Tensor createReplicaSlice(const poplar::Type &type)
Create a tensor mapped efficiently over the same tiles as the reference tensor.
The returned tensor has the size of the result of the reduce-scatter, which is also the size of the input to the all-gather.
- Parameters
type – The type to use when creating the tensor.
- Returns
The efficient tensor created from the reference.
-
poplar::Tensor createCollectivesTensor(const poplar::Type &type, const std::string &debugPrefix)
Create a tensor mapped efficiently over the same tiles as the reference tensor.
The returned tensor has the size of the input to the reduce-scatter, which is also the size of the result of the all-gather.
- Parameters
type – The type to use when creating the tensor.
debugPrefix – The debug prefix.
- Returns
The efficient tensor created from the reference.
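Because each replica obtains an equally sized slice, a tensor from createCollectivesTensor holds replicationFactor times as many elements as one from createReplicaSlice. The sketch below locates one replica's share under the usual convention that replica r owns the r-th contiguous chunk of the padded 1D collectives tensor. replicaSliceRange is a hypothetical helper and not a GCL function.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Returns the half-open range [first, second) of collectives-tensor elements
// owned by `replica`, assuming replica r owns the r-th contiguous chunk.
std::pair<std::size_t, std::size_t> replicaSliceRange(std::size_t collectivesElems,
                                                      unsigned replicationFactor,
                                                      unsigned replica) {
  assert(replicationFactor > 0 && replica < replicationFactor);
  // The balanced reorder pads the tensor, so the length divides evenly.
  assert(collectivesElems % replicationFactor == 0);
  const std::size_t sliceElems = collectivesElems / replicationFactor;
  return {replica * sliceElems, (replica + 1) * sliceElems};
}
```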
-
poplar::Tensor undoRearrangeForCollective(const poplar::Tensor &tensor) const
Reorder the tensor back into the expected IR tensor shape and order.
- Parameters
tensor – The tensor to rearrange.
- Returns
The tensor with the rearrangement undone.
-
inline std::vector<std::size_t> getReferenceShape() const
Get the shape of the reference tensor.
- Returns
The shape of the reference tensor.
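On the host, the reference shape is useful for sizing the buffer that receives the output of the host-side undoRearrangeForCollective, because that buffer holds the unpadded reference tensor. The sketch below uses the std::vector&lt;std::size_t&gt; shape returned by getReferenceShape. The helper names numElements and hostBufferBytes are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Product of the dimensions; an empty shape (a scalar) has one element.
std::size_t numElements(const std::vector<std::size_t> &shape) {
  return std::accumulate(shape.begin(), shape.end(), std::size_t{1},
                         std::multiplies<std::size_t>());
}

// Bytes needed for a host buffer holding the unpadded reference tensor.
std::size_t hostBufferBytes(const std::vector<std::size_t> &shape,
                            std::int64_t elemByteSize) {
  assert(elemByteSize > 0);
  return numElements(shape) * static_cast<std::size_t>(elemByteSize);
}
```

The buffer passed to the host-side rearrangeForCollective as output is instead sized from getNumRearrangedTensorElems(), which includes any padding.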
-
inline const CollectiveBalancedHostRearrangement &getHostRearrangement() const
Get the helper object that applies the rearrangement on the host.
- Returns
The helper class for host rearrangement.
Private Functions
-
void rearrange(const void *in, void *out, int64_t elemByteSize, bool refToGathered) const
Host tensor rearrangement routine.
Private Members
-
unsigned replicationFactor
-
poplar::TensorRearranger simplifier
-
CollectiveBalancedHostRearrangement hostRearrangement
-
const poplar::DebugNameAndId dnai
-
class CollectiveBalancedHostRearrangement