CollectiveBalancedReorder
#include <gcl/CollectiveBalancedReorder.hpp>
Defines
-
GCL_NO_DISCARD
Produce compile time warning for unused return values.
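For readers unfamiliar with this kind of macro, the following is a minimal sketch of how a "no discard" macro is commonly defined on top of the standard `[[nodiscard]]` attribute. The names `EXAMPLE_NO_DISCARD` and `computeChecksum` are hypothetical; this is not GCL's actual definition of GCL_NO_DISCARD.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a macro such as GCL_NO_DISCARD (an assumption,
// not GCL's definition): expand to [[nodiscard]] when the compiler has it.
#if defined(__has_cpp_attribute)
#if __has_cpp_attribute(nodiscard)
#define EXAMPLE_NO_DISCARD [[nodiscard]]
#endif
#endif
#ifndef EXAMPLE_NO_DISCARD
#define EXAMPLE_NO_DISCARD
#endif

// Hypothetical function: calling it and ignoring the result produces a
// compile-time warning on compilers that support [[nodiscard]].
EXAMPLE_NO_DISCARD inline std::uint64_t
computeChecksum(const std::vector<std::uint32_t> &data) {
  std::uint64_t sum = 0;
  for (std::uint32_t v : data) {
    sum += v;
  }
  return sum;
}
```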
-
namespace gcl
Graphcore Communications Library.
CrossReplica functions
Collective operations working across replicas.
WithinReplica functions
Collective operations working within replicas.
-
class CollectiveBalancedHostRearrangement
- #include <CollectiveBalancedReorder.hpp>
This class contains functions and data necessary to rearrange tensors on the host side at runtime.
The separation is made so that we can serialize the state and restore it without having to create a poplar::Graph.
Public Functions
-
CollectiveBalancedHostRearrangement() = default
-
~CollectiveBalancedHostRearrangement() = default
-
CollectiveBalancedHostRearrangement(const CollectiveBalancedHostRearrangement&) = default
Defaulted to avoid warnings in deprecation period.
-
CollectiveBalancedHostRearrangement(CollectiveBalancedHostRearrangement&&) noexcept = default
Defaulted to avoid warnings in deprecation period.
-
CollectiveBalancedHostRearrangement &operator=(const CollectiveBalancedHostRearrangement&) = default
Defaulted to avoid warnings in deprecation period.
- Returns
The CollectiveBalancedHostRearrangement that is assigned to.
-
CollectiveBalancedHostRearrangement &operator=(CollectiveBalancedHostRearrangement&&) noexcept = default
Defaulted to avoid warnings in deprecation period.
- Returns
The CollectiveBalancedHostRearrangement that is assigned to.
-
void rearrangeForCollective(const void *in, void *out, int64_t elemByteSize) const
Balanced reorder the tensor in a collective-friendly manner (host-side).
- Parameters
in – Pointer to the input buffer.
out – Pointer to the output buffer.
elemByteSize – The byte size of the elements.
-
void rearrangeForCollective(const void *in, std::size_t inSize, void *out, std::size_t outSize, std::size_t elemByteSize) const
Balanced reorder the tensor in a collective-friendly manner (host-side).
- Parameters
in – Pointer to the input buffer.
inSize – The size of the in buffer in bytes.
out – Pointer to the output buffer.
outSize – The size of the out buffer in bytes.
elemByteSize – The byte size of the elements.
-
template<typename T>
inline void rearrangeForCollective(const std::vector<T> &in, std::vector<T> &out, std::size_t elemByteSize = sizeof(T)) const
Balanced reorder the tensor in a collective-friendly manner (host-side).
- Parameters
in – Input buffer.
out – Output buffer.
elemByteSize – The byte size of the elements.
-
void undoRearrangeForCollective(const void *in, void *out, int64_t elemByteSize) const
Reorder tensor back into the expected IR tensor shape and order (host-side).
- Parameters
in – Pointer to the input buffer.
out – Pointer to the output buffer.
elemByteSize – The byte size of the elements.
-
template<typename T>
inline void undoRearrangeForCollective(const std::vector<T> &in, std::vector<T> &out, std::size_t elemByteSize = sizeof(T)) const
Reorder tensor back into the expected IR tensor shape and order (host-side).
- Parameters
in – Input buffer.
out – Output buffer.
elemByteSize – The byte size of the elements.
-
void undoRearrangeForCollective(const void *in, std::size_t inSize, void *out, std::size_t outSize, std::size_t elemByteSize) const
Reorder tensor back into the expected IR tensor shape and order (host-side).
- Parameters
in – Pointer to the input buffer.
inSize – The size of the in buffer in bytes.
out – Pointer to the output buffer.
outSize – The size of the out buffer in bytes.
elemByteSize – The byte size of the elements.
-
size_t getNumRearrangedTensorElems() const
Number of elements in the collective balanced (reordered) tensor.
- Returns
The number of elements.
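When using the overloads that take explicit buffer sizes, the host buffers have to be sized in bytes. The sketch below shows one way to do that arithmetic. It assumes that the rearranged tensor holds one equally sized fragment per replica, so that its element count is the replication factor multiplied by the elements per replica; that relationship is an assumption here, and both helper names are hypothetical, not part of GCL.

```cpp
#include <cstddef>

// Hypothetical helper, not part of GCL. Assumes the rearranged tensor holds
// one equally sized fragment per replica.
inline std::size_t rearrangedElems(unsigned replicationFactor,
                                   std::size_t totalElementsPerReplica) {
  return static_cast<std::size_t>(replicationFactor) * totalElementsPerReplica;
}

// Hypothetical helper: byte size of a host buffer that holds `numElems`
// elements of `elemByteSize` bytes each.
inline std::size_t bufferBytes(std::size_t numElems, std::size_t elemByteSize) {
  return numElems * elemByteSize;
}
```

In real code the element count would come from getNumRearrangedTensorElems() rather than being recomputed.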
-
void rearrange(const void *in, void *out, int64_t elemByteSize, bool refToGathered) const
Host tensor rearrangement routine.
- Parameters
in – Pointer to the input buffer.
out – Pointer to the output buffer.
elemByteSize – The byte size of the elements.
refToGathered – Whether to rearrange from reference to gathered or the other way.
-
unsigned getReplicationFactor() const
Get the graph’s replication factor.
- Returns
The replication factor.
-
void setReplicationFactor(unsigned replicationFactor)
Set the graph’s replication factor.
- Parameters
replicationFactor – The replication factor to set.
-
std::size_t getTotalElementsPerReplica() const
Get the total number of elements in one replica’s fragment.
- Returns
The number of elements per replica.
-
void setTotalElementsPerReplica(std::size_t totalElementsPerReplica)
Set the total number of elements in one replica’s fragment.
- Parameters
totalElementsPerReplica – The number of elements per replica to set.
-
const std::vector<poplar::Interval> &getGatheredToRefSlices() const
The mapping from the gathered tensor back to the reference tensor.
- Returns
mapping
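To illustrate how a slice mapping can drive a host-side rearrangement in both directions, here is a minimal sketch. It does not use `poplar::Interval`; the `Slice` struct, `rearrangeSketch` and `roundTripSketch` are hypothetical simplifications, and the way the real class interprets its slices may differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical slice descriptor (not poplar::Interval): `size` elements
// starting at `refBegin` in the reference buffer correspond to `size`
// elements starting at `gatheredBegin` in the gathered buffer.
struct Slice {
  std::size_t gatheredBegin;
  std::size_t refBegin;
  std::size_t size;
};

// Copy elements between the two layouts. Elements of `out` that no slice
// covers (for example padding) are left untouched.
template <typename T>
void rearrangeSketch(const std::vector<T> &in, std::vector<T> &out,
                     const std::vector<Slice> &slices, bool refToGathered) {
  for (const Slice &s : slices) {
    for (std::size_t k = 0; k < s.size; ++k) {
      if (refToGathered) {
        out.at(s.gatheredBegin + k) = in.at(s.refBegin + k);
      } else {
        out.at(s.refBegin + k) = in.at(s.gatheredBegin + k);
      }
    }
  }
}

// Round trip: reference -> gathered -> reference.
template <typename T>
std::vector<T> roundTripSketch(const std::vector<T> &ref,
                               const std::vector<Slice> &slices,
                               std::size_t gatheredElems) {
  std::vector<T> gathered(gatheredElems, T{});
  rearrangeSketch(ref, gathered, slices, true);
  std::vector<T> restored(ref.size(), T{});
  rearrangeSketch(gathered, restored, slices, false);
  return restored;
}
```

The `refToGathered` flag plays the same role as in rearrange(): `true` copies from the reference layout into the gathered layout, `false` copies back. Using `at()` keeps the sketch bounds-checked at the cost of speed.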
Public Members
-
unsigned replicationFactor = 0
The graph’s replication factor.
Private Functions
-
template<typename ElementType>
void rearrangeImpl(const ElementType *in, std::size_t inLen, ElementType *out, std::size_t outLen, bool refToGathered) const
Host tensor rearrangement routine.
- Parameters
in – Pointer to the input buffer.
inLen – Length of input buffer in number of elements.
out – Pointer to the output buffer.
outLen – Length of output buffer in number of elements.
refToGathered – Whether to rearrange from reference to gathered or the other way.
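The public functions take untyped pointers plus an element byte size, while this private routine is templated and works in element counts. The sketch below shows one possible way to bridge a runtime element size to a templated implementation. It is an assumption about the general pattern, not GCL's code: `reverseElems` and `reverseElemsImpl` are hypothetical, the operation is a simple reversal so the effect is easy to see, and the template is parameterised on byte width rather than on an element type so that all access goes through `unsigned char`.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical typed routine: writes the input elements to the output in
// reverse order. Lengths are in elements, each ElemBytes wide.
template <std::size_t ElemBytes>
void reverseElemsImpl(const unsigned char *in, std::size_t inLen,
                      unsigned char *out, std::size_t outLen) {
  if (outLen < inLen) {
    throw std::invalid_argument("output buffer too small");
  }
  for (std::size_t i = 0; i < inLen; ++i) {
    std::memcpy(out + i * ElemBytes, in + (inLen - 1 - i) * ElemBytes,
                ElemBytes);
  }
}

// Hypothetical type-erased entry point: converts byte sizes to element
// counts and dispatches on the runtime element size.
inline void reverseElems(const void *in, std::size_t inSize, void *out,
                         std::size_t outSize, std::size_t elemByteSize) {
  if (elemByteSize == 0 || inSize % elemByteSize != 0 ||
      outSize % elemByteSize != 0) {
    throw std::invalid_argument("buffer size is not a multiple of element size");
  }
  const auto *src = static_cast<const unsigned char *>(in);
  auto *dst = static_cast<unsigned char *>(out);
  const std::size_t inLen = inSize / elemByteSize;
  const std::size_t outLen = outSize / elemByteSize;
  switch (elemByteSize) {
  case 1: reverseElemsImpl<1>(src, inLen, dst, outLen); break;
  case 2: reverseElemsImpl<2>(src, inLen, dst, outLen); break;
  case 4: reverseElemsImpl<4>(src, inLen, dst, outLen); break;
  case 8: reverseElemsImpl<8>(src, inLen, dst, outLen); break;
  default: throw std::invalid_argument("unsupported element size");
  }
}
```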
Friends
- friend class CollectiveBalancedReorder
-
class CollectiveBalancedReorder
- #include <CollectiveBalancedReorder.hpp>
Helper class to reorder a tensor in a per-tile-balanced fashion such that each replica obtains (for inputs to AllGather or outputs of ReduceScatter) an equally sized 1D tensor with equally sized regions.
This helper class reduces the memory used by the syncful collective. The reordering process:
Flattens the input tensor
Analyses the tile mapping
Determines reordering strategy and required internal padding
Can rearrange and undo the rearrangement on any tensor that has the same tile mapping
Can rearrange and undo the rearrangement on host tensors that are to be copied into CBR-rearranged RemoteBuffers
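The padding step above can be illustrated with a small arithmetic sketch. It is a deliberate simplification: it pads the flattened tensor as a whole so that every replica receives the same number of elements, rounded up to a grain size, whereas the class described here balances per tile. The helper names are hypothetical and not part of GCL.

```cpp
#include <cstddef>
#include <stdexcept>

// Integer division rounding up; `b` must be non-zero.
inline std::size_t ceilDiv(std::size_t a, std::size_t b) {
  return (a + b - 1) / b;
}

// Hypothetical helper: elements each replica receives when `numElems`
// elements are split evenly and each fragment is rounded up to a multiple
// of `grainSize`.
inline std::size_t paddedElemsPerReplica(std::size_t numElems,
                                         unsigned replicationFactor,
                                         unsigned grainSize) {
  if (replicationFactor == 0 || grainSize == 0) {
    throw std::invalid_argument("replicationFactor and grainSize must be > 0");
  }
  const std::size_t perReplica = ceilDiv(numElems, replicationFactor);
  return ceilDiv(perReplica, grainSize) * grainSize;
}

// Hypothetical helper: total number of padding elements added across all
// replicas under the same simplification.
inline std::size_t paddingElems(std::size_t numElems,
                                unsigned replicationFactor,
                                unsigned grainSize) {
  return paddedElemsPerReplica(numElems, replicationFactor, grainSize) *
             replicationFactor -
         numElems;
}
```

For example, 10 elements over 4 replicas with a grain size of 1 gives 3 elements per replica and 2 padding elements in total.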
Public Functions
-
CollectiveBalancedReorder(poplar::Graph &graph_, poplar::Tensor tensor_, unsigned replicationFactor_, const poplar::DebugNameAndId &dnai_, bool allowElementMap = false, unsigned grainSize = 1)
Constructor.
- Parameters
graph_ – The poplar graph.
tensor_ – The reference tensor to rearrange.
replicationFactor_ – The replication factor of the graph.
dnai_ – Debug name and id.
allowElementMap – Allow an alternative representation of the host rearrangements. Sometimes it is beneficial to collapse all intervals into a simple 1-to-1 element map. This flag should be set to true in all new code and deprecated when all frameworks implement serialisation of the newly added elementMap field.
grainSize – The grain size to use when padding the tensor.
-
poplar::Tensor createReplicaSlice(const poplar::Type &type)
Create a tensor mapped efficiently over the same tiles as the reference tensor.
The returned tensor has the size of the result of the reduce scatter and of the input of the all gather.
- Parameters
type – The type to use when creating the tensor.
- Returns
The efficient tensor created from the reference.
-
poplar::Tensor createCollectivesTensor(const poplar::Type &type, const std::string &debugPrefix)
Create a tensor mapped efficiently over the same tiles as the reference tensor.
The returned tensor has the size of the input of the reduce scatter and of the result of the all gather.
- Parameters
type – The type to use when creating the tensor.
debugPrefix – The debug prefix.
- Returns
The efficient tensor created from the reference.
-
poplar::Tensor undoRearrangeForCollective(const poplar::Tensor &tensor) const
Reorder tensor back into the expected IR tensor shape and order.
- Parameters
tensor – The tensor to rearrange.
- Returns
The tensor with the rearrangement undone.
-
inline std::vector<std::size_t> getReferenceShape() const
Get the shape of the reference tensor.
- Returns
The shape of the reference tensor.
-
inline const CollectiveBalancedHostRearrangement &getHostRearrangement() const
Get a helper class that allows the rearrangement to be applied on the host.
- Returns
The helper class for host rearrangement.
Private Functions
Private Members
-
unsigned mReplicationFactor
-
poplar::TensorRearranger mSimplifier
-
CollectiveBalancedHostRearrangement mHostRearrangement
-
const poplar::DebugNameAndId mDnai