6.1. CollectiveBalancedReorder

#include <gcl/CollectiveBalancedReorder.hpp>
class CollectiveBalancedHostRearrangement

Functions and data for rearranging tensors on the host side at runtime.

The host-side rearrangement is implemented separately so that its state can be serialized and restored without having to create a poplar::Graph.

Public Functions

void rearrangeForCollective(const void *inBuf, std::size_t inLen, void *outBuf, std::size_t outLen, std::size_t elemByteSize) const

Reorder the tensor in a balanced and collective-friendly manner on the host.

Parameters
  • inBuf – Pointer to the input buffer.

  • inLen – The length of the input buffer in bytes.

  • outBuf – Pointer to the output buffer.

  • outLen – The length of the output buffer in bytes.

  • elemByteSize – The size of each element, in bytes.

template<typename T>
inline void rearrangeForCollective(const std::vector<T> &in, std::vector<T> &out, std::size_t elemByteSize = sizeof(T)) const

Reorder the tensor in a balanced and collective-friendly manner on the host.

Parameters
  • in – Input buffer.

  • out – Output buffer.

  • elemByteSize – The size of each element, in bytes.
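The sketch below is a minimal host-side model of what the two rearrangeForCollective overloads plausibly do. It assumes the rearrangement is driven by an interval list like the one described under getGatheredToRefSlices(). Each interval of the reference (original-order) buffer is copied, in turn, to the next free position of the output buffer, with offsets scaled by elemByteSize.

The Interval struct, rearrangeBytes and rearrangeVector are hypothetical stand-ins. The real methods read the slice map from the object's own state rather than taking it as a parameter. The sketch also ignores padding, and it resizes out itself, which the real typed overload may not do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for poplar::Interval: the half-open range [begin, end),
// measured in elements of elemByteSize bytes.
struct Interval {
  std::size_t begin;
  std::size_t end;
  std::size_t size() const { return end - begin; }
};

// Byte-level gather: copy each reference-order interval, in turn, into the
// next free position of the gathered (collective-order) output buffer.
void rearrangeBytes(const std::vector<Interval> &gatheredToRefSlices,
                    const void *inBuf, std::size_t inLen, void *outBuf,
                    std::size_t outLen, std::size_t elemByteSize) {
  const auto *in = static_cast<const unsigned char *>(inBuf);
  auto *out = static_cast<unsigned char *>(outBuf);
  std::size_t outOffset = 0;
  for (const Interval &slice : gatheredToRefSlices) {
    const std::size_t bytes = slice.size() * elemByteSize;
    // Both buffers are described by their length in bytes.
    assert(slice.end * elemByteSize <= inLen && outOffset + bytes <= outLen);
    std::memcpy(out + outOffset, in + slice.begin * elemByteSize, bytes);
    outOffset += bytes;
  }
}

// Typed convenience wrapper (hypothetical): derives the byte lengths from the
// vectors and sizes `out` to hold every element named by the slice list.
template <typename T>
void rearrangeVector(const std::vector<Interval> &slices,
                     const std::vector<T> &in, std::vector<T> &out,
                     std::size_t elemByteSize = sizeof(T)) {
  std::size_t numElems = 0;
  for (const Interval &s : slices) numElems += s.size();
  out.resize(numElems * elemByteSize / sizeof(T));
  rearrangeBytes(slices, in.data(), in.size() * sizeof(T), out.data(),
                 out.size() * sizeof(T), elemByteSize);
}
```

Offsets are scaled by elemByteSize rather than by sizeof(T). This lets a byte vector carry elements of another width, for example std::vector<unsigned char> with elemByteSize = 4.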

template<typename T>
inline void undoRearrangeForCollective(const std::vector<T> &in, std::vector<T> &out, std::size_t elemByteSize = sizeof(T)) const

Reorder the tensor back into the original contiguous order on the host.

Parameters
  • in – Input buffer.

  • out – Output buffer.

  • elemByteSize – The size of each element, in bytes.

void undoRearrangeForCollective(const void *inBuf, std::size_t inLen, void *outBuf, std::size_t outLen, std::size_t elemByteSize) const

Reorder the tensor back into the original contiguous order on the host.

Parameters
  • inBuf – Pointer to the input buffer.

  • inLen – The length of the input buffer in bytes.

  • outBuf – Pointer to the output buffer.

  • outLen – The length of the output buffer in bytes.

  • elemByteSize – The size of each element, in bytes.
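This sketch assumes the rearrangement copies reference intervals into consecutive positions of the gathered buffer, using a slice list like the one returned by getGatheredToRefSlices(). Under that assumption, undoing it is the inverse scatter: walk the gathered buffer front to back and copy each run back to its reference interval.

Interval and undoRearrangeBytes are hypothetical names. The sketch also assumes every reference element appears exactly once in the slice list, so it does not handle padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for poplar::Interval: the half-open range [begin, end).
struct Interval {
  std::size_t begin;
  std::size_t end;
  std::size_t size() const { return end - begin; }
};

// Inverse of the gather: read the gathered buffer sequentially and scatter
// each run of elements back to its interval in reference (original) order.
void undoRearrangeBytes(const std::vector<Interval> &gatheredToRefSlices,
                        const void *inBuf, std::size_t inLen, void *outBuf,
                        std::size_t outLen, std::size_t elemByteSize) {
  const auto *in = static_cast<const unsigned char *>(inBuf);
  auto *out = static_cast<unsigned char *>(outBuf);
  std::size_t inOffset = 0;
  for (const Interval &slice : gatheredToRefSlices) {
    const std::size_t bytes = slice.size() * elemByteSize;
    assert(inOffset + bytes <= inLen && slice.end * elemByteSize <= outLen);
    std::memcpy(out + slice.begin * elemByteSize, in + inOffset, bytes);
    inOffset += bytes;
  }
}
```

Applying the gather and then this scatter with the same slice list returns the original buffer, which is the round-trip property the two undoRearrangeForCollective overloads describe.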

size_t getNumRearrangedTensorElems() const

Number of elements in the reordered tensor.

This may differ from the number of elements in the original tensor because padding is added.

Returns

The number of elements.
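The following arithmetic sketch shows why the count can grow. It assumes, as the CollectiveBalancedReorder description says, that each tile's data is padded to be divisible by some N. The per-tile counts and the helper names are hypothetical, and the real padding strategy is chosen internally and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Round n up to the next multiple of m.
std::size_t roundUp(std::size_t n, std::size_t m) {
  assert(m > 0);
  return (n + m - 1) / m * m;
}

// Hypothetical model: each tile's element count is padded up to a multiple of
// `multiple`, and the rearranged tensor holds all the padded tile chunks.
std::size_t paddedElementCount(const std::vector<std::size_t> &elemsPerTile,
                               std::size_t multiple) {
  std::size_t total = 0;
  for (std::size_t n : elemsPerTile) total += roundUp(n, multiple);
  return total;
}
```

Consider tiles holding 5, 8 and 3 elements (16 in total) with N = 4. They give 8 + 8 + 4 = 20 rearranged elements. If that total were split evenly across a replication factor of 4, each replica would hold 5 elements.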

unsigned getReplicationFactor() const

Get the graph’s replication factor.

Returns

The replication factor of the graph.

void setReplicationFactor(unsigned replicationFactor)

Set the graph’s replication factor.

Parameters

replicationFactor – The replication factor.

std::size_t getTotalElementsPerReplica() const

Get the number of elements in each replica.

Returns

The number of elements per replica.

void setTotalElementsPerReplica(std::size_t totalElementsPerReplica)

Set the number of elements in each replica.

Parameters

totalElementsPerReplica – The number of elements per replica.

const std::vector<poplar::Interval> &getGatheredToRefSlices() const

Get the mapping from the gathered tensor back to the reference tensor.

Returns

The mapping to the reference tensor.

void setGatheredToRefSlices(std::vector<poplar::Interval> slices)

Set the mapping from the gathered tensor back to the reference tensor.

Parameters

slices – The mapping of tensor indexes.

const std::vector<uint32_t> &getElementMap() const

A simple map of indexes for mapping individual elements one by one.

This is used instead of the interval mapping returned by getGatheredToRefSlices() when the intervals are short.

Returns

The map of element indexes.
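The sketch below relates the interval mapping to the element map. By analogy with getGatheredToRefSlices(), it assumes that entry k of the element map holds the reference-tensor index of the k-th element of the gathered tensor. toElementMap and gatherWithElementMap are hypothetical names.

An interval costs two integers however long it is, whereas the element map costs one uint32_t per element. That is a plausible reason the element map is preferred only when intervals are short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for poplar::Interval: the half-open range [begin, end).
struct Interval {
  std::size_t begin, end;
};

// Expand an interval list into a one-to-one element map: entry k is the
// reference index of the k-th element of the gathered tensor.
std::vector<std::uint32_t> toElementMap(const std::vector<Interval> &slices) {
  std::vector<std::uint32_t> map;
  for (const Interval &s : slices)
    for (std::size_t i = s.begin; i < s.end; ++i)
      map.push_back(static_cast<std::uint32_t>(i));
  return map;
}

// Gather using the element map, copying one element at a time.
template <typename T>
std::vector<T> gatherWithElementMap(const std::vector<T> &ref,
                                    const std::vector<std::uint32_t> &elementMap) {
  std::vector<T> out;
  out.reserve(elementMap.size());
  for (std::uint32_t idx : elementMap) {
    assert(idx < ref.size());
    out.push_back(ref[idx]);
  }
  return out;
}
```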

void setElementMap(std::vector<uint32_t> elementMap)

Set the map of indexes for mapping individual elements.

Parameters

elementMap – The element map.
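The getter/setter pairs above are what let the host-side state be saved and restored without a poplar::Graph. Below is a minimal sketch of such a round trip. The plain-data struct and the whitespace-separated text format are hypothetical and are not the library's serialization format; the allowElementMap parameter description notes that frameworks implement serialisation themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for poplar::Interval: the half-open range [begin, end).
struct Interval {
  std::size_t begin, end;
};

// Hypothetical plain-data mirror of the host rearrangement state; each field
// corresponds to one getter/setter pair of CollectiveBalancedHostRearrangement.
struct HostRearrangementState {
  unsigned replicationFactor = 1;
  std::size_t totalElementsPerReplica = 0;
  std::vector<Interval> gatheredToRefSlices;
  std::vector<std::uint32_t> elementMap;
};

// Write the scalars, then each vector prefixed by its length.
std::string serialize(const HostRearrangementState &s) {
  std::ostringstream os;
  os << s.replicationFactor << ' ' << s.totalElementsPerReplica << ' '
     << s.gatheredToRefSlices.size();
  for (const Interval &i : s.gatheredToRefSlices) os << ' ' << i.begin << ' ' << i.end;
  os << ' ' << s.elementMap.size();
  for (std::uint32_t e : s.elementMap) os << ' ' << e;
  return os.str();
}

// Read the fields back in the same order; no graph is needed.
HostRearrangementState deserialize(const std::string &text) {
  std::istringstream is(text);
  HostRearrangementState s;
  std::size_t n = 0;
  is >> s.replicationFactor >> s.totalElementsPerReplica >> n;
  s.gatheredToRefSlices.resize(n);
  for (Interval &i : s.gatheredToRefSlices) is >> i.begin >> i.end;
  is >> n;
  s.elementMap.resize(n);
  for (std::uint32_t &e : s.elementMap) is >> e;
  assert(!is.fail());
  return s;
}
```

In a framework, the restored values would then be passed to setReplicationFactor(), setTotalElementsPerReplica(), setGatheredToRefSlices() and setElementMap().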

class CollectiveBalancedReorder

Helper class to reorder a tensor in a per-tile-balanced fashion.

The reordering is done so that each replica obtains (for inputs to AllGather or outputs of ReduceScatter) an equally sized 1D tensor with equally sized regions. This reduces the memory used by the collective.

This pads a tensor, t, so that each tile's data is divisible by N (where N is the replication factor multiplied by grainSize). It then reorders the elements so that the tile mappings of t.reshape(N, -1)[i] are the same for all i. This means that performing an operation, such as gcl::CollectiveOperator::ADD, between the slices does not require any rearrangement.
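The tile-mapping property can be checked with a small model. It assumes that each tile's (already padded) data is split into N equal shares, and that row r of the N-row view takes the r-th share of every tile, in tile order. The function names are hypothetical, and only tile ownership is modelled, not the element order the library actually chooses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return the tile that owns each element of the reordered flat tensor, given
// the padded element count of each tile. Row r of the N-row view is built
// from the r-th equal share of every tile, visited in tile order.
std::vector<std::size_t> reorderedTileMapping(
    const std::vector<std::size_t> &paddedElemsPerTile, std::size_t N) {
  assert(N > 0);
  std::size_t total = 0;
  for (std::size_t c : paddedElemsPerTile) {
    assert(c % N == 0);  // padding must make each tile's data divisible by N
    total += c;
  }
  std::vector<std::size_t> tileOf;
  tileOf.reserve(total);
  for (std::size_t r = 0; r < N; ++r)
    for (std::size_t tile = 0; tile < paddedElemsPerTile.size(); ++tile)
      for (std::size_t k = 0; k < paddedElemsPerTile[tile] / N; ++k)
        tileOf.push_back(tile);
  return tileOf;
}

// True if every row of the N-row view has the same tile mapping as row 0.
bool rowsShareTileMapping(const std::vector<std::size_t> &tileOf, std::size_t N) {
  assert(N > 0 && tileOf.size() % N == 0);
  const std::size_t rowLen = tileOf.size() / N;
  for (std::size_t r = 1; r < N; ++r)
    for (std::size_t j = 0; j < rowLen; ++j)
      if (tileOf[r * rowLen + j] != tileOf[j]) return false;
  return true;
}
```

With padded tile counts of 4, 8 and 4 and N = 2, both rows map to tiles 0, 0, 1, 1, 1, 1, 2, 2, so an element-wise ADD between the rows stays tile-local. The unreordered order (tile 0 four times, then tile 1 eight times, then tile 2 four times) does not have this property.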

The reordering process does the following:

  • Flatten the input tensor

  • Analyse the tile mapping

  • Determine the reordering strategy and required internal padding

It can also:

  • rearrange and undo the rearrangement on any tensor that has the same tile mapping as the reference tensor

  • rearrange and undo the rearrangement on host tensors that are to be copied into CollectiveBalancedReorder-rearranged RemoteBuffers

Public Functions

CollectiveBalancedReorder(poplar::Graph &graph, poplar::Tensor tensor, unsigned replicationFactor, const poplar::DebugNameAndId &dnai, bool allowElementMap = false, unsigned grainSize = 1)

Constructor.

Parameters
  • graph – The Poplar graph.

  • tensor – The reference tensor to rearrange.

  • replicationFactor – The replication factor of the graph.

  • dnai – Debug name and ID.

  • allowElementMap – Allow an alternative representation of the host rearrangement. Sometimes it is beneficial to collapse all intervals into a simple one-to-one element map. This flag should be set to true in all new code; it will be deprecated once all frameworks implement serialisation of the newly added elementMap field.

  • grainSize – The grain size to use when padding the tensor.

poplar::Tensor createReplicaSlice(const poplar::Type &type, const std::string &debugPrefix = "")

Create a tensor mapped efficiently over the same tiles as the reference tensor.

The returned tensor has the same size as the output of a ReduceScatter or the input to an AllGather.

Parameters
  • type – The type to use when creating the tensor.

  • debugPrefix – The debug prefix.

Returns

The efficient tensor created from the reference.

poplar::Tensor createCollectivesTensor(const poplar::Type &type, const std::string &debugPrefix = "")

Create a tensor mapped efficiently over the same tiles as the reference tensor.

The returned tensor has the same size as the input to a ReduceScatter or the output of an AllGather.

Parameters
  • type – The type to use when creating the tensor.

  • debugPrefix – The debug prefix.

Returns

The efficient tensor created from the reference.
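The size relationship between createReplicaSlice() and createCollectivesTensor() can be illustrated with a host-side model of the collectives themselves. With R replicas, the collectives tensor holds R * S elements and each replica slice holds S. This sketch does not use GCL: reduceScatterAdd and allGather are hypothetical helpers that only model the data movement of an ADD ReduceScatter and an AllGather.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Model of a ReduceScatter with an ADD operator over R replicas: each input
// has R * S elements, and replica r receives the element-wise sum of row r
// (elements [r*S, (r+1)*S)) across all replicas.
std::vector<std::vector<float>> reduceScatterAdd(
    const std::vector<std::vector<float>> &inputs) {
  const std::size_t R = inputs.size();
  assert(R > 0 && inputs[0].size() % R == 0);
  const std::size_t S = inputs[0].size() / R;
  std::vector<std::vector<float>> slices(R, std::vector<float>(S, 0.0f));
  for (const auto &in : inputs) {
    assert(in.size() == R * S);
    for (std::size_t r = 0; r < R; ++r)
      for (std::size_t j = 0; j < S; ++j) slices[r][j] += in[r * S + j];
  }
  return slices;
}

// Model of an AllGather: concatenating the R slices of S elements gives the
// R * S element tensor that every replica receives.
std::vector<float> allGather(const std::vector<std::vector<float>> &slices) {
  std::vector<float> out;
  for (const auto &s : slices) out.insert(out.end(), s.begin(), s.end());
  return out;
}
```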

poplar::Tensor undoRearrangeForCollective(const poplar::Tensor &tensor) const

Reorder a tensor back into the original contiguous order.

Parameters

tensor – The tensor to rearrange.

Returns

The tensor with the rearrangement undone.

inline std::vector<std::size_t> getReferenceShape() const

Get the shape of the reference tensor.

Returns

The shape of the reference tensor.

inline const CollectiveBalancedHostRearrangement &getHostRearrangement() const

Get a helper class that implements the rearrangement on the host.

Returns

The helper class for host rearrangement.

void zeroPaddingInCollectiveTensor(const poplar::Tensor &collectiveTensor, poplar::program::Sequence &prog) const

Zero the padding elements of a collective-friendly tensor.

Parameters
  • collectiveTensor – The collective tensor to zero the padding of.

  • prog – The program sequence to add the zeroing program to.
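The effect of zeroing the padding can be sketched on the host as shown below. Where the padding sits in the collective tensor is determined internally, so the padding intervals here are a hypothetical input, as is the zeroPadding helper. One plausible reason to zero the padding is that padding elements take part in collectives, and in any whole-tensor computation such as a norm, exactly like real data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for poplar::Interval: the half-open range [begin, end).
struct Interval {
  std::size_t begin, end;
};

// Set every element inside the given padding intervals to a zero value,
// leaving the real data untouched.
template <typename T>
void zeroPadding(std::vector<T> &collective,
                 const std::vector<Interval> &paddingIntervals) {
  for (const Interval &p : paddingIntervals) {
    assert(p.begin <= p.end && p.end <= collective.size());
    for (std::size_t i = p.begin; i < p.end; ++i) collective[i] = T{};
  }
}
```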