6.1. CollectiveBalancedReorder
#include <gcl/CollectiveBalancedReorder.hpp>
-
class CollectiveBalancedHostRearrangement
Functions and data for rearranging tensors on the host side at runtime.
The host side is implemented separately so that its state can be serialized and restored without having to create a poplar::Graph.
Public Functions
-
void rearrangeForCollective(const void *inBuf, std::size_t inLen, void *outBuf, std::size_t outLen, std::size_t elemByteSize) const
Reorder the tensor in a balanced and collective-friendly manner on the host.
- Parameters
inBuf – Pointer to the input buffer.
inLen – The length of the input buffer in bytes.
outBuf – Pointer to the output buffer.
outLen – The length of the output buffer in bytes.
elemByteSize – The size of each element, in bytes.
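To show how the byte lengths relate to `elemByteSize`, here is a minimal host-side sketch. It assumes the rearrangement is described by a hypothetical element map `collectiveToRef`, with one entry per output element giving the source element index, or -1 for padding. The function name `hostRearrangeBytes` and the map format are illustrative only and are not part of the GCL API. Zero-filling the padding is also an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical sketch, not GCL code: collectiveToRef[i] is the index of the
// input element copied to output element i, or -1 for a padding element.
void hostRearrangeBytes(const void *inBuf, std::size_t inLen, void *outBuf,
                        std::size_t outLen, std::size_t elemByteSize,
                        const std::vector<std::int64_t> &collectiveToRef) {
  assert(elemByteSize > 0);
  // Both lengths are in bytes; the map works in elements of elemByteSize.
  if (outLen != collectiveToRef.size() * elemByteSize)
    throw std::invalid_argument("outLen does not match the rearranged size");
  const std::size_t inElems = inLen / elemByteSize;
  const auto *in = static_cast<const unsigned char *>(inBuf);
  auto *out = static_cast<unsigned char *>(outBuf);
  for (std::size_t i = 0; i < collectiveToRef.size(); ++i) {
    unsigned char *dst = out + i * elemByteSize;
    const std::int64_t src = collectiveToRef[i];
    if (src < 0) {
      std::memset(dst, 0, elemByteSize); // padding: no source element
    } else {
      if (static_cast<std::size_t>(src) >= inElems)
        throw std::out_of_range("map refers past the end of inBuf");
      std::memcpy(dst, in + static_cast<std::size_t>(src) * elemByteSize,
                  elemByteSize);
    }
  }
}
```

For example, with three `std::uint16_t` inputs `{10, 11, 12}` and the map `{2, -1, 0, 1}`, the output is `{12, 0, 10, 11}`. The output is one element longer than the input because it contains one padding slot.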
-
template<typename T>
inline void rearrangeForCollective(const std::vector<T> &in, std::vector<T> &out, std::size_t elemByteSize = sizeof(T)) const
Reorder the tensor in a balanced and collective-friendly manner on the host.
- Parameters
in – Input buffer.
out – Output buffer.
elemByteSize – The size of each element, in bytes.
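The templated overload defaults `elemByteSize` to `sizeof(T)`, which suggests it can forward to the raw-buffer overload by converting vector sizes into byte lengths. The sketch below shows that arithmetic under that assumption. `rawRearrange` is a hypothetical stand-in for the raw-buffer member function that simply copies bytes so the example stays self-contained. The caller sizes `out`; in real use it may be larger than `in` because of padding. Passing an explicit `elemByteSize` lets a byte vector carry, for instance, 2-byte elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the raw-buffer overload: it only copies bytes,
// but it receives the same kind of arguments (byte lengths + element size).
void rawRearrange(const void *inBuf, std::size_t inLen, void *outBuf,
                  std::size_t outLen, std::size_t elemByteSize) {
  assert(inLen == outLen && inLen % elemByteSize == 0);
  std::memcpy(outBuf, inBuf, inLen);
}

// Sketch of a vector wrapper: byte lengths come from sizeof(T), while
// elemByteSize describes the logical element size inside the buffer.
template <typename T>
void rearrangeVector(const std::vector<T> &in, std::vector<T> &out,
                     std::size_t elemByteSize = sizeof(T)) {
  // Assumption: the caller has already sized `out` appropriately.
  rawRearrange(in.data(), in.size() * sizeof(T), out.data(),
               out.size() * sizeof(T), elemByteSize);
}
```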
-
template<typename T>
inline void undoRearrangeForCollective(const std::vector<T> &in, std::vector<T> &out, std::size_t elemByteSize = sizeof(T)) const
Reorder the tensor back into the original contiguous order on the host.
- Parameters
in – Input buffer.
out – Output buffer.
elemByteSize – The size of each element, in bytes.
-
void undoRearrangeForCollective(const void *inBuf, std::size_t inLen, void *outBuf, std::size_t outLen, std::size_t elemByteSize) const
Reorder the tensor back into the original contiguous order on the host.
- Parameters
inBuf – Pointer to the input buffer.
inLen – The length of the input buffer in bytes.
outBuf – Pointer to the output buffer.
outLen – The length of the output buffer in bytes.
elemByteSize – The size of each element, in bytes.
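Undoing the rearrangement can be sketched as the inverse of a hypothetical element map, where `collectiveToRef` maps each rearranged position to a reference index and uses -1 for padding. Padding elements have no place in the original tensor, so they are dropped. The function `hostUndoRearrangeBytes` and the map format are illustrative assumptions, not the GCL implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch, not GCL code: scatter each rearranged element back to
// its reference position; entries of -1 (padding) are skipped.
void hostUndoRearrangeBytes(const void *inBuf, std::size_t inLen, void *outBuf,
                            std::size_t outLen, std::size_t elemByteSize,
                            const std::vector<std::int64_t> &collectiveToRef) {
  assert(elemByteSize > 0);
  assert(inLen == collectiveToRef.size() * elemByteSize);
  const auto *in = static_cast<const unsigned char *>(inBuf);
  auto *out = static_cast<unsigned char *>(outBuf);
  for (std::size_t i = 0; i < collectiveToRef.size(); ++i) {
    const std::int64_t ref = collectiveToRef[i];
    if (ref < 0)
      continue; // padding has no place in the original tensor
    const std::size_t dstOffset = static_cast<std::size_t>(ref) * elemByteSize;
    assert(dstOffset + elemByteSize <= outLen);
    std::memcpy(out + dstOffset, in + i * elemByteSize, elemByteSize);
  }
}
```

With the map `{2, -1, 0, 1}`, the rearranged buffer `{12, 0, 10, 11}` is restored to `{10, 11, 12}`.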
-
size_t getNumRearrangedTensorElems() const
Number of elements in the reordered tensor.
This may differ from the number of elements in the original tensor because padding is added.
- Returns
The number of elements.
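A short worked example shows why the count can grow. It assumes each tile's element count is padded up to a multiple of N = replicationFactor * grainSize, the padding rule described for CollectiveBalancedReorder. `paddedElementCount` is a hypothetical helper, not a GCL function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: pad each tile's element count up to a multiple of
// N = replicationFactor * grainSize and sum the padded counts.
std::size_t paddedElementCount(const std::vector<std::size_t> &elemsPerTile,
                               std::size_t replicationFactor,
                               std::size_t grainSize) {
  const std::size_t n = replicationFactor * grainSize;
  assert(n > 0);
  std::size_t total = 0;
  for (std::size_t count : elemsPerTile)
    total += (count + n - 1) / n * n; // round up to a multiple of n
  return total;
}
```

For two tiles holding 3 and 5 elements with a replication factor of 2, the 8 original elements become 4 + 6 = 10. With a grain size of 2 (N = 4), they become 4 + 8 = 12.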
-
unsigned getReplicationFactor() const
Get the graph’s replication factor.
- Returns
The replication factor of the graph.
-
void setReplicationFactor(unsigned replicationFactor)
Set the graph’s replication factor.
- Parameters
replicationFactor – The replication factor.
-
std::size_t getTotalElementsPerReplica() const
Get the number of elements in each replica.
- Returns
The number of elements per replica.
-
void setTotalElementsPerReplica(std::size_t totalElementsPerReplica)
Set the number of elements in each replica.
- Parameters
totalElementsPerReplica – The number of elements per replica.
-
const std::vector<poplar::Interval> &getGatheredToRefSlices() const
Get the mapping from the gathered tensor back to the reference tensor.
- Returns
The mapping to the reference tensor.
-
void setGatheredToRefSlices(std::vector<poplar::Interval> slices)
Set the mapping from the gathered tensor back to the reference tensor.
- Parameters
slices – The mapping of tensor indexes.
-
const std::vector<uint32_t> &getElementMap() const
A simple map of indexes for mapping individual elements one by one.
This is used instead of gatheredToRefSlices() for short intervals.
- Returns
The map of element indexes.
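A rough storage estimate can motivate using an element map for short intervals. The sketch assumes a hypothetical `Interval` stand-in for `poplar::Interval` that holds two `std::size_t` values describing a half-open `[begin, end)` range. It also assumes the element map lists indexes in interval order. The actual threshold GCL uses to choose between the two representations is not documented here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for poplar::Interval: a half-open range [begin, end).
struct Interval {
  std::size_t begin;
  std::size_t end;
};

// Expand a list of intervals into a one-to-one element map.
std::vector<std::uint32_t> toElementMap(const std::vector<Interval> &slices) {
  std::vector<std::uint32_t> map;
  for (const Interval &s : slices)
    for (std::size_t i = s.begin; i < s.end; ++i)
      map.push_back(static_cast<std::uint32_t>(i));
  return map;
}

// Compare storage: each interval costs two size_t values, while the element
// map costs one uint32_t per element covered by the intervals.
bool elementMapIsSmaller(const std::vector<Interval> &slices) {
  std::size_t elems = 0;
  for (const Interval &s : slices)
    elems += s.end - s.begin;
  return elems * sizeof(std::uint32_t) < slices.size() * sizeof(Interval);
}
```

Many one-element intervals are cheaper as a map, while a single long interval is cheaper as an interval.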
-
void setElementMap(std::vector<uint32_t> elementMap)
Set the map of indexes for mapping individual elements.
- Parameters
elementMap – The element map.
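The class description says the host side is kept separate so that its state can be serialized and restored without a poplar::Graph. The getters and setters above expose that state. The following sketch round-trips the state through a flat `std::vector<std::uint64_t>`. `HostState`, `serialize`, and `deserialize` are hypothetical, and the encoding is an assumption for illustration, not GCL's serialization format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical mirror of the documented host-side state.
struct HostState {
  unsigned replicationFactor = 0;
  std::size_t totalElementsPerReplica = 0;
  std::vector<std::pair<std::size_t, std::size_t>> gatheredToRefSlices;
  std::vector<std::uint32_t> elementMap;
};

// Encoding: rf, perReplica, #slices, (begin, end)..., #map, map...
std::vector<std::uint64_t> serialize(const HostState &s) {
  std::vector<std::uint64_t> out{s.replicationFactor,
                                 s.totalElementsPerReplica,
                                 s.gatheredToRefSlices.size()};
  for (const auto &slice : s.gatheredToRefSlices) {
    out.push_back(slice.first);
    out.push_back(slice.second);
  }
  out.push_back(s.elementMap.size());
  out.insert(out.end(), s.elementMap.begin(), s.elementMap.end());
  return out;
}

// Rebuild the state; no graph is needed, only the serialized values.
HostState deserialize(const std::vector<std::uint64_t> &in) {
  HostState s;
  std::size_t pos = 0;
  s.replicationFactor = static_cast<unsigned>(in.at(pos++));
  s.totalElementsPerReplica = static_cast<std::size_t>(in.at(pos++));
  const std::size_t numSlices = static_cast<std::size_t>(in.at(pos++));
  for (std::size_t i = 0; i < numSlices; ++i) {
    const std::size_t begin = static_cast<std::size_t>(in.at(pos++));
    const std::size_t end = static_cast<std::size_t>(in.at(pos++));
    s.gatheredToRefSlices.emplace_back(begin, end);
  }
  const std::size_t mapSize = static_cast<std::size_t>(in.at(pos++));
  for (std::size_t i = 0; i < mapSize; ++i)
    s.elementMap.push_back(static_cast<std::uint32_t>(in.at(pos++)));
  return s;
}
```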
-
class CollectiveBalancedReorder
Helper class to reorder a tensor in a per-tile-balanced fashion.
The reordering is done so that each replica obtains (for inputs to AllGather or outputs of ReduceScatter) an equally sized 1D tensor with equally sized regions. Using this helper class reduces the memory used by the collective.
The class pads a tensor, t, so that each tile's data is divisible by N (where N is replicationFactor * grainSize). It then reorders the elements so that the tile mappings of t.reshape(N, -1)[i] are the same for all i. This means that performing an operation, such as gcl::CollectiveOperator::ADD, between the slices does not require any rearrangement.
The reordering process does the following:
Flatten the input tensor
Analyse the tile mapping
Determine the reordering strategy and required internal padding
It can also:
rearrange and undo the rearrangement on any tensor that has the same tile mapping
rearrange and undo the rearrangement on host tensors that are to be copied into CollectiveBalancedReorder-rearranged RemoteBuffers
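The pad-then-reorder steps above can be sketched on the host as follows. This is one plausible balanced layout under stated assumptions, not necessarily the one GCL produces. The tile mapping is given as a hypothetical `tileElems` list of flattened reference indexes per tile. Each tile is padded to a multiple of N = replicationFactor * grainSize. Row r of the result (viewed as `replicationFactor` rows) holds the r-th equal share of every tile's padded data. `balancedLayout` is a hypothetical name, and -1 marks padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not the GCL implementation. tileElems[t] lists the
// flattened reference-tensor indexes mapped to tile t. Returns the reference
// index stored at each position of the collective tensor (-1 = padding).
std::vector<std::int64_t>
balancedLayout(const std::vector<std::vector<std::size_t>> &tileElems,
               std::size_t replicationFactor, std::size_t grainSize) {
  assert(replicationFactor > 0 && grainSize > 0);
  const std::size_t n = replicationFactor * grainSize; // N from the text
  // Pad each tile to a multiple of N; each replica gets an equal share.
  std::vector<std::size_t> sharePerTile;
  for (const auto &elems : tileElems) {
    const std::size_t padded = (elems.size() + n - 1) / n * n;
    sharePerTile.push_back(padded / replicationFactor);
  }
  std::vector<std::int64_t> layout;
  // Every row has the same per-tile structure: share[0] elements from
  // tile 0, then share[1] from tile 1, and so on.
  for (std::size_t r = 0; r < replicationFactor; ++r) {
    for (std::size_t t = 0; t < tileElems.size(); ++t) {
      for (std::size_t k = 0; k < sharePerTile[t]; ++k) {
        const std::size_t pos = r * sharePerTile[t] + k;
        layout.push_back(pos < tileElems[t].size()
                             ? static_cast<std::int64_t>(tileElems[t][pos])
                             : std::int64_t{-1});
      }
    }
  }
  return layout;
}
```

For tiles holding `{0, 1, 2}` and `{3, 4, 5, 6, 7}` with a replication factor of 2, the layout is `{0, 1, 3, 4, 5, 2, -1, 6, 7, -1}`. Both rows of five elements take two elements from tile 0 and three from tile 1, so an element-wise operation between rows needs no further rearrangement.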
Public Functions
-
CollectiveBalancedReorder(poplar::Graph &graph, poplar::Tensor tensor, unsigned replicationFactor, const poplar::DebugNameAndId &dnai, bool allowElementMap = false, unsigned grainSize = 1)
Constructor.
- Parameters
graph – The Poplar graph.
tensor – The reference tensor to rearrange.
replicationFactor – The replication factor of the graph.
dnai – Debug name and ID.
allowElementMap – Allow an alternative representation of the host rearrangements. Sometimes it is beneficial to collapse all intervals into a simple one-to-one element map. This flag should be set to true in all new code, and will be deprecated when all frameworks implement serialisation of the newly added elementMap field.
grainSize – The grain size to use when padding the tensor.
-
poplar::Tensor createReplicaSlice(const poplar::Type &type, const std::string &debugPrefix = "")
Create a tensor mapped efficiently over the same tiles as the reference tensor.
The returned tensor has the same size as the output of a ReduceScatter or the input to an AllGather.
- Parameters
type – The type to use when creating the tensor.
debugPrefix – The debug prefix.
- Returns
The efficient tensor created from the reference.
-
poplar::Tensor createCollectivesTensor(const poplar::Type &type, const std::string &debugPrefix = "")
Create a tensor mapped efficiently over the same tiles as the reference tensor.
The returned tensor has the same size as the input to a ReduceScatter or the output of an AllGather.
- Parameters
type – The type to use when creating the tensor.
debugPrefix – The debug prefix.
- Returns
The efficient tensor created from the reference.
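The two factory functions differ in size. Under the usual ReduceScatter and AllGather semantics, a collectives tensor holds replicationFactor times as many elements as a replica slice. The plain C++ model below illustrates this by simulating a ReduceScatter with ADD semantics on the host. It is not a GCL call, and `reduceScatterAdd` is a hypothetical name. It assumes every replica's input is already in the balanced order, so slice r of each input lines up element for element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model of ReduceScatter with ADD: each replica contributes a
// collectives-sized buffer (replicas * sliceSize elements) and receives one
// replica-slice-sized result (sliceSize elements).
std::vector<std::vector<float>>
reduceScatterAdd(const std::vector<std::vector<float>> &perReplicaInputs) {
  const std::size_t replicas = perReplicaInputs.size();
  assert(replicas > 0 && perReplicaInputs[0].size() % replicas == 0);
  const std::size_t sliceSize = perReplicaInputs[0].size() / replicas;
  std::vector<std::vector<float>> outputs(replicas,
                                          std::vector<float>(sliceSize, 0.0f));
  for (std::size_t r = 0; r < replicas; ++r) {   // destination replica
    for (const auto &input : perReplicaInputs) { // every contributor
      assert(input.size() == replicas * sliceSize);
      for (std::size_t k = 0; k < sliceSize; ++k)
        outputs[r][k] += input[r * sliceSize + k]; // slice r of each input
    }
  }
  return outputs;
}
```

With two replicas contributing `{1, 2, 3, 4}` and `{10, 20, 30, 40}`, replica 0 receives `{11, 22}` and replica 1 receives `{33, 44}`. Each input has the size of a collectives tensor, and each output has the size of a replica slice.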
-
poplar::Tensor undoRearrangeForCollective(const poplar::Tensor &tensor) const
Reorder the tensor back into the original contiguous order.
- Parameters
tensor – The tensor to rearrange.
- Returns
The tensor with the rearrangement undone.
-
inline std::vector<std::size_t> getReferenceShape() const
Get the shape of the reference tensor.
- Returns
The shape of the reference tensor.
-
inline const CollectiveBalancedHostRearrangement &getHostRearrangement() const
Get a helper class that implements the rearrangement on the host.
- Returns
The helper class for host rearrangement.
-
void zeroPaddingInCollectiveTensor(const poplar::Tensor &collectiveTensor, poplar::program::Sequence &prog) const
Zero the padding elements of a collective-friendly tensor.
- Parameters
collectiveTensor – The collective tensor to zero the padding of.
prog – The program sequence to add the zeroing program to.
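As a host-side analogue, zeroing the padding can be sketched with the same hypothetical element map used in the host examples, where -1 marks a padding position. One possible motivation, which is an assumption and is not stated in this reference, is that reductions over the whole collective tensor (for example a sum) are then unaffected by whatever the padding previously held. `zeroPadding` is a hypothetical name, not part of GCL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side analogue: set every element whose map entry marks
// it as padding (-1) to a value-initialised T, leaving real data untouched.
template <typename T>
void zeroPadding(std::vector<T> &collective,
                 const std::vector<std::int64_t> &collectiveToRef) {
  assert(collective.size() == collectiveToRef.size());
  for (std::size_t i = 0; i < collective.size(); ++i)
    if (collectiveToRef[i] < 0)
      collective[i] = T{};
}
```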