6.1. CollectiveBalancedReorder
#include <gcl/CollectiveBalancedReorder.hpp>
- 
class CollectiveBalancedHostRearrangement
 Functions and data for rearranging tensors on the host side at runtime.
The host side is implemented separately so that its state can be serialized and restored without having to create a poplar::Graph.
Public Functions
- 
void rearrangeForCollective(const void *inBuf, std::size_t inLen, void *outBuf, std::size_t outLen, std::size_t elemByteSize) const
 Reorder the tensor in a balanced and collective-friendly manner on the host.
- Parameters
 inBuf – Pointer to the input buffer.
inLen – The length of the input buffer in bytes.
outBuf – Pointer to the output buffer.
outLen – The length of the output buffer in bytes.
elemByteSize – The size of each element, in bytes.
- 
template<typename T>
inline void rearrangeForCollective(const std::vector<T> &in, std::vector<T> &out, std::size_t elemByteSize = sizeof(T)) const
 Reorder the tensor in a balanced and collective-friendly manner on the host.
- Parameters
 in – Input buffer.
out – Output buffer.
elemByteSize – The size of each element, in bytes.
- 
template<typename T>
inline void undoRearrangeForCollective(const std::vector<T> &in, std::vector<T> &out, std::size_t elemByteSize = sizeof(T)) const
 Reorder the tensor back into the original contiguous order on the host.
- Parameters
 in – Input buffer.
out – Output buffer.
elemByteSize – The size of each element, in bytes.
- 
void undoRearrangeForCollective(const void *inBuf, std::size_t inLen, void *outBuf, std::size_t outLen, std::size_t elemByteSize) const
 Reorder the tensor back into the original contiguous order on the host.
- Parameters
 inBuf – Pointer to the input buffer.
inLen – The length of the input buffer in bytes.
outBuf – Pointer to the output buffer.
outLen – The length of the output buffer in bytes.
elemByteSize – The size of each element, in bytes.
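The templated overloads above take element counts implicitly through the vectors, while the pointer-based overloads take lengths in bytes. A minimal sketch of how a typed wrapper could forward to a byte-oriented routine is shown here. The names reorderBytes and reorderVector are hypothetical, and the stand-in permutation (reversing the element order) is an assumption chosen only because it is easy to check and is its own inverse; it is not the reordering GCL performs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a pointer-based overload. It reverses the
// element order, which is NOT GCL's reordering; it only provides a simple
// permutation that undoes itself when applied twice.
inline void reorderBytes(const void *inBuf, std::size_t inLen, void *outBuf,
                         std::size_t outLen, std::size_t elemByteSize) {
  // Lengths are in bytes, so they must be whole multiples of the element size.
  assert(elemByteSize > 0 && inLen % elemByteSize == 0 && outLen >= inLen);
  const auto *src = static_cast<const unsigned char *>(inBuf);
  auto *dst = static_cast<unsigned char *>(outBuf);
  const std::size_t n = inLen / elemByteSize;
  for (std::size_t i = 0; i < n; ++i) {
    std::memcpy(dst + (n - 1 - i) * elemByteSize, src + i * elemByteSize,
                elemByteSize);
  }
}

// Hypothetical typed wrapper: converts element counts to byte lengths and
// forwards to the byte-oriented routine.
template <typename T>
inline void reorderVector(const std::vector<T> &in, std::vector<T> &out,
                          std::size_t elemByteSize = sizeof(T)) {
  reorderBytes(in.data(), in.size() * sizeof(T), out.data(),
               out.size() * sizeof(T), elemByteSize);
}
```

In this sketch the caller sizes the output vector before the call, which mirrors the explicit outLen parameter of the pointer-based signatures.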
- 
size_t getNumRearrangedTensorElems() const
 Get the number of elements in the reordered tensor.
This may not be the same as the original tensor because of the addition of padding.
- Returns
 The number of elements.
- 
unsigned getReplicationFactor() const
 Get the graph’s replication factor.
- Returns
 The replication factor of the graph.
- 
void setReplicationFactor(unsigned replicationFactor)
 Set the graph’s replication factor.
- Parameters
 replicationFactor – The replication factor.
- 
std::size_t getTotalElementsPerReplica() const
 Get the number of elements in each replica.
- Returns
 The number of elements per replica.
- 
void setTotalElementsPerReplica(std::size_t totalElementsPerReplica)
 Set the number of elements in each replica.
- Parameters
 totalElementsPerReplica – The number of elements per replica.
- 
const std::vector<poplar::Interval> &getGatheredToRefSlices() const
 Get the mapping from the gathered tensor back to the reference tensor.
- Returns
 The mapping to the reference tensor.
- 
void setGatheredToRefSlices(std::vector<poplar::Interval> slices)
 Set the mapping from the gathered tensor back to the reference tensor.
- Parameters
 slices – The mapping of tensor indexes.
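To illustrate how a list of intervals can describe a host-side rearrangement, here is a simplified sketch. It assumes that the slices, read in order, say where each consecutive run of gathered elements lives in the reference buffer. Slice is a hypothetical stand-in for poplar::Interval, copyBySlices is an illustrative name, and padding is ignored; this is a model of the idea, not GCL's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for poplar::Interval: the half-open range [begin, end).
struct Slice {
  std::size_t begin;
  std::size_t end;
  std::size_t size() const { return end - begin; }
};

// Copy between a reference-ordered buffer and a gathered-ordered buffer.
// Walking the slices in order, each slice names the reference positions of
// the next run of gathered elements. refToGathered selects the direction.
template <typename T>
void copyBySlices(const std::vector<Slice> &slices, const std::vector<T> &in,
                  std::vector<T> &out, bool refToGathered) {
  std::size_t gatheredPos = 0;
  for (const Slice &s : slices) {
    for (std::size_t k = 0; k < s.size(); ++k, ++gatheredPos) {
      const std::size_t refPos = s.begin + k;
      if (refToGathered) {
        assert(refPos < in.size() && gatheredPos < out.size());
        out[gatheredPos] = in[refPos]; // rearrange
      } else {
        assert(gatheredPos < in.size() && refPos < out.size());
        out[refPos] = in[gatheredPos]; // undo the rearrangement
      }
    }
  }
}
```

Using the same slice list for both directions keeps the rearrangement and its inverse consistent by construction.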
- 
const std::vector<uint32_t> &getElementMap() const
 A simple map of indexes for mapping individual elements one by one.
This is used instead of the slices returned by getGatheredToRefSlices() when the intervals are short.
- Returns
 The map of element indexes.
- 
void setElementMap(std::vector<uint32_t> elementMap)
 Set the map of indexes for mapping individual elements.
- Parameters
 elementMap – The element map.
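When intervals are very short, a flat per-element index map can be simpler than a list of intervals. The sketch below assumes that entry g of the map holds the reference index of gathered element g; that direction is an assumption, as are the function names rearrangeByElementMap and undoByElementMap. Gathered positions beyond the end of the map are left untouched, standing in for padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed convention: elementMap[g] is the reference index of gathered
// element g. Positions past elementMap.size() are treated as padding.
template <typename T>
void rearrangeByElementMap(const std::vector<std::uint32_t> &elementMap,
                           const std::vector<T> &ref,
                           std::vector<T> &gathered) {
  assert(gathered.size() >= elementMap.size());
  for (std::size_t g = 0; g < elementMap.size(); ++g) {
    assert(elementMap[g] < ref.size());
    gathered[g] = ref[elementMap[g]];
  }
}

// Inverse copy: scatter gathered elements back to their reference positions.
template <typename T>
void undoByElementMap(const std::vector<std::uint32_t> &elementMap,
                      const std::vector<T> &gathered, std::vector<T> &ref) {
  assert(gathered.size() >= elementMap.size());
  for (std::size_t g = 0; g < elementMap.size(); ++g) {
    assert(elementMap[g] < ref.size());
    ref[elementMap[g]] = gathered[g];
  }
}
```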
- 
class CollectiveBalancedReorder
 Helper class to reorder a tensor in a per-tile-balanced fashion.
The reordering will be done such that each replica obtains (for inputs to AllGather or outputs of ReduceScatter) an equally sized 1D tensor with equally sized regions. This helper class reduces the memory used by the collective.
This pads a tensor, t, so that each tile’s data is divisible by N (where N is replicationFactor * grainSize). It then reorders the elements so that the tile mappings of t.reshape(N, -1)[i] are the same for all i. This means that performing an operation, such as gcl::CollectiveOperator::ADD, between the slices will not require any rearrangement.
The reordering process does the following:
Flatten the input tensor
Analyse the tile mapping
Determine the reordering strategy and required internal padding
It can also:
rearrange and undo the rearrangement on any tensor that has the same tile mapping
rearrange and undo the rearrangement on host tensors that are to be copied into CollectiveBalancedReorder-rearranged RemoteBuffers
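The padding rule described above can be sketched as a round-up of each tile's element count to a multiple of N. The sketch assumes N is the product of the replication factor and the grain size; the function names paddedTileElems and perReplicaTileElems are hypothetical and are not part of GCL.

```cpp
#include <cassert>
#include <cstddef>

// Round a tile's element count up to the next multiple of
// N = replicationFactor * grainSize (assumed definition of N).
inline std::size_t paddedTileElems(std::size_t elemsOnTile,
                                   unsigned replicationFactor,
                                   unsigned grainSize) {
  assert(replicationFactor > 0 && grainSize > 0);
  const std::size_t n = static_cast<std::size_t>(replicationFactor) * grainSize;
  return (elemsOnTile + n - 1) / n * n;
}

// Elements each replica holds for that tile once the padded data is split
// evenly between replicas; always a multiple of grainSize.
inline std::size_t perReplicaTileElems(std::size_t elemsOnTile,
                                       unsigned replicationFactor,
                                       unsigned grainSize) {
  return paddedTileElems(elemsOnTile, replicationFactor, grainSize) /
         replicationFactor;
}
```

For example, with 10 elements on a tile, a replication factor of 4, and a grain size of 2, the tile is padded to 16 elements and each replica receives 4 of them.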
Public Functions
- 
CollectiveBalancedReorder(poplar::Graph &graph, poplar::Tensor tensor, unsigned replicationFactor, const poplar::DebugNameAndId &dnai, bool allowElementMap = false, unsigned grainSize = 1)
 Constructor.
- Parameters
 graph – The Poplar graph.
tensor – The reference tensor to rearrange.
replicationFactor – The replication factor of the graph.
dnai – Debug name and ID.
allowElementMap – Allow an alternative representation of the host rearrangements. Sometimes it is beneficial to collapse all intervals into a simple 1-to-1 element map. This flag should be set to true in all new code, and will be deprecated once all frameworks implement serialisation of the newly added elementMap field.
grainSize – The grain size to use when padding the tensor.
- 
poplar::Tensor createReplicaSlice(const poplar::Type &type, const std::string &debugPrefix = "")
 Create a tensor mapped efficiently over the same tiles as the reference tensor.
The returned tensor has the same size as the output of a ReduceScatter or the input to an AllGather.
- Parameters
 type – The type to use when creating the tensor.
debugPrefix – The debug prefix.
- Returns
 The efficient tensor created from the reference.
- 
poplar::Tensor createCollectivesTensor(const poplar::Type &type, const std::string &debugPrefix = "")
 Create a tensor mapped efficiently over the same tiles as the reference tensor.
The returned tensor has the same size as the input of a ReduceScatter or the output of an AllGather.
- Parameters
 type – The type to use when creating the tensor.
debugPrefix – The debug prefix.
- Returns
 The efficient tensor created from the reference.
- 
poplar::Tensor undoRearrangeForCollective(const poplar::Tensor &tensor) const
 Reorder the tensor back into the original contiguous order.
- Parameters
 tensor – The tensor to rearrange.
- Returns
 The tensor with the rearrangement undone.
- 
inline std::vector<std::size_t> getReferenceShape() const
 Get the shape of the reference tensor.
- Returns
 The shape of the reference tensor.
- 
inline const CollectiveBalancedHostRearrangement &getHostRearrangement() const
 Get a helper class that implements the rearrangement on the host.
- Returns
 The helper class for host rearrangement.
- 
void zeroPaddingInCollectiveTensor(const poplar::Tensor &collectiveTensor, poplar::program::Sequence &prog) const
 Zero the padding elements of a collective-friendly tensor.
- Parameters
 collectiveTensor – The collective tensor to zero the padding of.
prog – The program sequence to add the zeroing program to.