6.1. CollectiveBalancedReorder

#include <gcl/CollectiveBalancedReorder.hpp>

class CollectiveBalancedHostRearrangement

Functions and data for rearranging tensors on the host side at runtime.

The host-side rearrangement is implemented separately so that its state can be serialized and restored without having to create a poplar::Graph.
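The state that needs to survive serialization appears to be the set of values exposed by the getters and setters of this class: the replication factor, the number of elements per replica, the gathered-to-reference slices and the element map. The following is a minimal sketch, assuming a hypothetical plain struct HostRearrangementState (with Interval standing in for poplar::Interval) and an ad hoc whitespace-separated text format. It is not the format used by any framework.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for poplar::Interval: the half-open range [begin, end).
struct Interval {
  std::size_t begin;
  std::size_t end;
};

// Hypothetical plain-data mirror of the host rearrangement state.
struct HostRearrangementState {
  unsigned replicationFactor = 1;
  std::size_t totalElementsPerReplica = 0;
  std::vector<Interval> gatheredToRefSlices;
  std::vector<std::uint32_t> elementMap;
};

// Write each field, prefixing every vector with its length.
void save(std::ostream &os, const HostRearrangementState &s) {
  os << s.replicationFactor << ' ' << s.totalElementsPerReplica << ' '
     << s.gatheredToRefSlices.size() << ' ';
  for (const Interval &i : s.gatheredToRefSlices) {
    os << i.begin << ' ' << i.end << ' ';
  }
  os << s.elementMap.size() << ' ';
  for (std::uint32_t e : s.elementMap) {
    os << e << ' ';
  }
}

// Read the fields back in the same order; no poplar::Graph is involved.
HostRearrangementState load(std::istream &is) {
  HostRearrangementState s;
  std::size_t n = 0;
  is >> s.replicationFactor >> s.totalElementsPerReplica >> n;
  s.gatheredToRefSlices.resize(n);
  for (Interval &i : s.gatheredToRefSlices) {
    is >> i.begin >> i.end;
  }
  is >> n;
  s.elementMap.resize(n);
  for (std::uint32_t &e : s.elementMap) {
    is >> e;
  }
  if (!is) {
    throw std::runtime_error("malformed host rearrangement state");
  }
  return s;
}
```

After loading, the values could be handed to setReplicationFactor(), setTotalElementsPerReplica(), setGatheredToRefSlices() and setElementMap().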

Public Functions

void rearrangeForCollective(const void *inBuf, std::size_t inLen, void *outBuf, std::size_t outLen, std::size_t elemByteSize) const

Reorder the tensor in a balanced and collective-friendly manner on the host.

Parameters

inBuf – Pointer to the input buffer.
inLen – The length of the input buffer in bytes.
outBuf – Pointer to the output buffer.
outLen – The length of the output buffer in bytes.
elemByteSize – The size of each element, in bytes.
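The raw-pointer overload works in bytes, so elemByteSize is what turns element indexes into byte offsets. The sketch below illustrates that bookkeeping under an assumed model: the rearranged buffer is built by copying, in order, the reference elements named by a list of intervals, and any output left over is padding filled with zero bytes. Interval (standing in for poplar::Interval) and rearrangeBytes are hypothetical names, and this is not the library's implementation.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for poplar::Interval: the half-open range [begin, end).
struct Interval {
  std::size_t begin;
  std::size_t end;
};

// Hypothetical byte-level host rearrangement. The output receives the
// reference elements of each interval back to back; any output elements
// left over afterwards are treated as padding and zeroed.
// All lengths are in bytes; elemByteSize converts element indexes to offsets.
void rearrangeBytes(const void *inBuf, std::size_t inLen, void *outBuf,
                    std::size_t outLen, std::size_t elemByteSize,
                    const std::vector<Interval> &slices) {
  if (elemByteSize == 0 || inLen % elemByteSize != 0 ||
      outLen % elemByteSize != 0) {
    throw std::invalid_argument("lengths must be multiples of elemByteSize");
  }
  const auto *in = static_cast<const unsigned char *>(inBuf);
  auto *out = static_cast<unsigned char *>(outBuf);
  std::memset(out, 0, outLen); // padding defaults to zero bytes
  std::size_t outElem = 0;     // next free element slot in the output
  for (const Interval &s : slices) {
    if (s.end < s.begin || s.end * elemByteSize > inLen) {
      throw std::out_of_range("interval outside the input buffer");
    }
    const std::size_t n = s.end - s.begin;
    if ((outElem + n) * elemByteSize > outLen) {
      throw std::out_of_range("output buffer too small");
    }
    std::memcpy(out + outElem * elemByteSize, in + s.begin * elemByteSize,
                n * elemByteSize);
    outElem += n;
  }
}
```

For a five-element reference {10, 11, 12, 13, 14} and intervals [3, 5) and [0, 3), a six-element output becomes {13, 14, 10, 11, 12, 0}, with the last element acting as padding.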

template<typename T> inline void rearrangeForCollective(const std::vector<T> &in, std::vector<T> &out, std::size_t elemByteSize = sizeof(T)) const

Reorder the tensor in a balanced and collective-friendly manner on the host.

Parameters

in – Input buffer.
out – Output buffer.
elemByteSize – The size of each element, in bytes.

template<typename T> inline void undoRearrangeForCollective(const std::vector<T> &in, std::vector<T> &out, std::size_t elemByteSize = sizeof(T)) const

Reorder the tensor back into the original contiguous order on the host.

Parameters

in – Input buffer.
out – Output buffer.
elemByteSize – The size of each element, in bytes.

void undoRearrangeForCollective(const void *inBuf, std::size_t inLen, void *outBuf, std::size_t outLen, std::size_t elemByteSize) const

Reorder the tensor back into the original contiguous order on the host.

Parameters

inBuf – Pointer to the input buffer.
inLen – The length of the input buffer in bytes.
outBuf – Pointer to the output buffer.
outLen – The length of the output buffer in bytes.
elemByteSize – The size of each element, in bytes.
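Undoing the rearrangement is the inverse copy: elements are read back to back from the rearranged buffer and written to their original positions, and trailing padding is dropped. Because the std::vector overloads default elemByteSize to sizeof(T), this sketch assumes they forward in.size() * sizeof(T) bytes to the raw-pointer form. Interval, undoRearrangeBytes, undoRearrange and the refElems parameter are all hypothetical; refElems is only needed because the sketch has no stored state.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for poplar::Interval: the half-open range [begin, end).
struct Interval {
  std::size_t begin;
  std::size_t end;
};

// Hypothetical inverse of an interval-based rearrangement. The input holds
// the elements of each interval back to back; they are copied back to their
// original positions. Input elements after the last interval are padding.
void undoRearrangeBytes(const void *inBuf, std::size_t inLen, void *outBuf,
                        std::size_t outLen, std::size_t elemByteSize,
                        const std::vector<Interval> &slices) {
  if (elemByteSize == 0 || inLen % elemByteSize != 0 ||
      outLen % elemByteSize != 0) {
    throw std::invalid_argument("lengths must be multiples of elemByteSize");
  }
  const auto *in = static_cast<const unsigned char *>(inBuf);
  auto *out = static_cast<unsigned char *>(outBuf);
  std::size_t inElem = 0; // next element to read from the rearranged input
  for (const Interval &s : slices) {
    if (s.end < s.begin || s.end * elemByteSize > outLen) {
      throw std::out_of_range("interval outside the output buffer");
    }
    const std::size_t n = s.end - s.begin;
    if ((inElem + n) * elemByteSize > inLen) {
      throw std::out_of_range("input buffer too small");
    }
    std::memcpy(out + s.begin * elemByteSize, in + inElem * elemByteSize,
                n * elemByteSize);
    inElem += n;
  }
}

// Vector wrapper: buffer lengths are the vector sizes in bytes, and the
// output is resized to hold refElems elements of elemByteSize bytes each.
template <typename T>
void undoRearrange(const std::vector<T> &in, std::vector<T> &out,
                   std::size_t refElems, const std::vector<Interval> &slices,
                   std::size_t elemByteSize = sizeof(T)) {
  const std::size_t outBytes = refElems * elemByteSize;
  if (outBytes % sizeof(T) != 0) {
    throw std::invalid_argument("output does not fill whole elements of T");
  }
  out.assign(outBytes / sizeof(T), T{});
  undoRearrangeBytes(in.data(), in.size() * sizeof(T), out.data(), outBytes,
                     elemByteSize, slices);
}
```

Applied to {13, 14, 10, 11, 12, 0} with intervals [3, 5) and [0, 3), this restores {10, 11, 12, 13, 14}.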

size_t getNumRearrangedTensorElems() const

Number of elements in the reordered tensor.

This may differ from the number of elements in the original tensor because of the added padding.

Returns: The number of elements.
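If, as the class description of CollectiveBalancedReorder states, each tile's share of the tensor is padded up to a multiple of N, the rearranged element count can be estimated from the per-tile element counts. This is a minimal arithmetic sketch; roundUp and paddedElems are hypothetical helpers, not part of the API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Round x up to the next multiple of n.
std::size_t roundUp(std::size_t x, std::size_t n) {
  if (n == 0) throw std::invalid_argument("n must be positive");
  return (x + n - 1) / n * n;
}

// Hypothetical: total element count once each tile's share of the flattened
// tensor has been padded up to a multiple of n.
std::size_t paddedElems(const std::vector<std::size_t> &elemsPerTile,
                        std::size_t n) {
  std::size_t total = 0;
  for (std::size_t count : elemsPerTile) {
    total += roundUp(count, n);
  }
  return total;
}
```

For example, tiles holding 5, 8 and 3 elements with N = 4 give 8 + 8 + 4 = 20 rearranged elements instead of 16. When N is a multiple of the replication factor, such a total always divides evenly among the replicas.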

unsigned getReplicationFactor() const

Get the graph’s replication factor.

Returns: The replication factor of the graph.

void setReplicationFactor(unsigned replicationFactor)

Set the graph’s replication factor.

Parameters: replicationFactor – The replication factor.

std::size_t getTotalElementsPerReplica() const

Get the number of elements in each replica.

Returns: The number of elements per replica.

void setTotalElementsPerReplica(std::size_t totalElementsPerReplica)

Set the number of elements in each replica.

Parameters: totalElementsPerReplica – The number of elements per replica.

const std::vector<poplar::Interval> &getGatheredToRefSlices() const

Get the mapping from the gathered tensor back to the reference tensor.

Returns: The mapping to the reference tensor.

void setGatheredToRefSlices(std::vector<poplar::Interval> slices)

Set the mapping from the gathered tensor back to the reference tensor.

Parameters: slices – The mapping of tensor indexes.

const std::vector<uint32_t> &getElementMap() const

Get a simple map of indexes for mapping individual elements one by one.

This is used instead of the slices returned by getGatheredToRefSlices() when the intervals are short.

Returns: The map of element indexes.

void setElementMap(std::vector<uint32_t> elementMap)

Set the map of indexes for mapping individual elements.

Parameters: elementMap – The element map.
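The element map is an alternative to the interval list. Assuming that entry i of the map holds the reference-tensor index of rearranged element i (the same direction as the gathered-to-reference slices), a set of intervals can be expanded into an element map and applied one element at a time. Interval, toElementMap, gatherWithMap and scatterWithMap are hypothetical names, and padding is left out for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for poplar::Interval: the half-open range [begin, end).
struct Interval {
  std::size_t begin;
  std::size_t end;
};

// Expand intervals into one reference index per rearranged element.
std::vector<std::uint32_t> toElementMap(const std::vector<Interval> &slices) {
  std::vector<std::uint32_t> map;
  for (const Interval &s : slices) {
    for (std::size_t i = s.begin; i < s.end; ++i) {
      map.push_back(static_cast<std::uint32_t>(i));
    }
  }
  return map;
}

// Rearrange: rearranged element i comes from reference element map[i].
template <typename T>
std::vector<T> gatherWithMap(const std::vector<T> &ref,
                             const std::vector<std::uint32_t> &map) {
  std::vector<T> out(map.size());
  for (std::size_t i = 0; i < map.size(); ++i) {
    out[i] = ref.at(map[i]);
  }
  return out;
}

// Undo: rearranged element i goes back to reference position map[i].
template <typename T>
std::vector<T> scatterWithMap(const std::vector<T> &gathered,
                              const std::vector<std::uint32_t> &map,
                              std::size_t refElems) {
  std::vector<T> ref(refElems);
  for (std::size_t i = 0; i < map.size(); ++i) {
    ref.at(map[i]) = gathered.at(i);
  }
  return ref;
}
```

An element map costs one uint32_t per element, while an interval costs two indexes regardless of its length, which suggests why the map only pays off when most intervals are very short.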

class CollectiveBalancedReorder

Helper class to reorder a tensor in a per-tile-balanced fashion.

The reordering will be done such that each replica obtains (for inputs to AllGather or outputs of ReduceScatter) an equally sized 1D tensor with equally sized regions. This helper class reduces the memory used by the collective.

This pads a tensor, t, so that the number of elements on each tile is divisible by N (where N is the replication factor multiplied by grainSize). It then reorders the elements so that the tile mappings of t.reshape(N, -1)[i] are the same for all i. This means that performing an operation, such as gcl::CollectiveOperator::ADD, between the slices will not require any rearrangement.

The reordering process does the following:

Flatten the input tensor
Analyse the tile mapping
Determine the reordering strategy and required internal padding
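The padding rule and the steps above can be illustrated on a toy model. This is a minimal sketch of one possible strategy, not the library's algorithm: the tile mapping of the flattened tensor is given as a hypothetical list of element indexes per tile, each tile's share is padded at its end, and row i of reshape(N, -1) takes the i-th equal chunk from every tile in tile order. kPad, Reordering and balancedReorder are hypothetical names.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical marker for a padding position in the reordered tensor.
constexpr std::size_t kPad = std::numeric_limits<std::size_t>::max();

struct Reordering {
  std::vector<std::size_t> order;  // reference index at each position, or kPad
  std::vector<std::size_t> tileOf; // tile that holds each position
};

// elemsOnTile[t] lists the flattened reference indexes mapped to tile t.
Reordering balancedReorder(
    const std::vector<std::vector<std::size_t>> &elemsOnTile, std::size_t n) {
  if (n == 0) throw std::invalid_argument("n must be positive");
  // Pad each tile's share up to a multiple of n; chunk[t] is the share that
  // tile t contributes to every one of the n rows.
  std::vector<std::size_t> chunk(elemsOnTile.size());
  for (std::size_t t = 0; t < elemsOnTile.size(); ++t) {
    chunk[t] = (elemsOnTile[t].size() + n - 1) / n;
  }
  Reordering r;
  // Row i of reshape(n, -1) is built from chunk i of every tile, in tile
  // order, so every row has the same tile layout.
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t t = 0; t < elemsOnTile.size(); ++t) {
      for (std::size_t j = 0; j < chunk[t]; ++j) {
        const std::size_t src = i * chunk[t] + j;
        r.order.push_back(src < elemsOnTile[t].size() ? elemsOnTile[t][src]
                                                      : kPad);
        r.tileOf.push_back(t);
      }
    }
  }
  return r;
}
```

With two tiles holding five and three elements and N = 2, each row has length 5, and positions j and 5 + j always sit on the same tile. That shared layout is what lets an element-wise operation between rows run without moving data between tiles.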

It can also:

rearrange and undo the rearrangement on any tensor that has the same tile mapping
rearrange and undo the rearrangement on host tensors that are to be copied into CollectiveBalancedReorder-rearranged RemoteBuffers

Public Functions

CollectiveBalancedReorder(poplar::Graph &graph, poplar::Tensor tensor, unsigned replicationFactor, const poplar::DebugNameAndId &dnai, bool allowElementMap = false, unsigned grainSize = 1)

Constructor.

Parameters

graph – The Poplar graph.
tensor – The reference tensor to rearrange.
replicationFactor – The replication factor of the graph.
dnai – Debug name and ID.
allowElementMap – Allow an alternative representation of the host rearrangement. Sometimes it is beneficial to collapse all intervals into a simple 1-to-1 element map. This flag should be set to true in all new code, and should be deprecated once all frameworks implement serialisation of the newly added elementMap field.
grainSize – The grain size to use when padding the tensor.

poplar::Tensor createReplicaSlice(const poplar::Type &type, const std::string &debugPrefix = "")

Create a tensor mapped efficiently over the same tiles as the reference tensor.

The returned tensor has the same size as the output of a ReduceScatter or the input to an AllGather.

Parameters

type – The type to use when creating the tensor.
debugPrefix – The debug prefix.

Returns

The efficient tensor created from the reference.

poplar::Tensor createCollectivesTensor(const poplar::Type &type, const std::string &debugPrefix = "")

Create a tensor mapped efficiently over the same tiles as the reference tensor.

The returned tensor has the same size as the input to a ReduceScatter or the output of an AllGather.

Parameters

type – The type to use when creating the tensor.
debugPrefix – The debug prefix.

Returns

The efficient tensor created from the reference.

poplar::Tensor undoRearrangeForCollective(const poplar::Tensor &tensor) const

Reorder the tensor back into the original contiguous order.

Parameters: tensor – The tensor to rearrange.
Returns: The tensor with the rearrangement undone.

inline std::vector<std::size_t> getReferenceShape() const

Get the shape of the reference tensor.

Returns: The shape of the reference tensor.

inline const CollectiveBalancedHostRearrangement &getHostRearrangement() const

Get a helper class that implements the rearrangement on the host.

Returns: The helper class for host rearrangement.

void zeroPaddingInCollectiveTensor(const poplar::Tensor &collectiveTensor, poplar::program::Sequence &prog) const

Zero the padding elements of a collective-friendly tensor.

Parameters

collectiveTensor – The collective tensor to zero the padding of.
prog – The program sequence to add the zeroing program to.
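On the device, zeroPaddingInCollectiveTensor() adds the zeroing to prog. As a host-side analogue only, the sketch below zeroes every position that a hypothetical order vector marks as padding, where order[k] holds either a reference index or the hypothetical marker kPad. The names zeroPadding and kPad are illustrative and not part of the API.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical marker for a padding position in the collective tensor.
constexpr std::size_t kPad = std::numeric_limits<std::size_t>::max();

// Host-side analogue: reset every padding position to a value-initialised T
// (zero for arithmetic types), leaving real elements untouched.
template <typename T>
void zeroPadding(std::vector<T> &collective,
                 const std::vector<std::size_t> &order) {
  if (collective.size() != order.size()) {
    throw std::invalid_argument("buffer and order sizes differ");
  }
  for (std::size_t k = 0; k < order.size(); ++k) {
    if (order[k] == kPad) {
      collective[k] = T{};
    }
  }
}
```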
