CollectiveBalancedReorder

#include <gcl/CollectiveBalancedReorder.hpp>

Defines

GCL_NO_DISCARD

Produces a compile-time warning when a return value is unused.

namespace gcl

Graphcore Communications Library.

CrossReplica functions

Collective operations working across replicas.

WithinReplica functions

Collective operations working within replicas.

class CollectiveBalancedHostRearrangement
#include <CollectiveBalancedReorder.hpp>

This class contains functions and data necessary to rearrange tensors on the host side at runtime.

The separation is made so that we can serialize the state and restore it without having to create a poplar::Graph.
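The state involved is plain data, which is why it can be saved and restored without a graph. The following is a minimal sketch of that idea, not GCL's serialization format. The struct `HostRearrangementState` and the functions `serializeState` and `deserializeState` are hypothetical names, and intervals are modelled as `[begin, end)` pairs so that the example compiles without the Poplar SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical plain-data mirror of the host rearrangement state.
// Intervals are [begin, end) pairs standing in for poplar::Interval.
struct HostRearrangementState {
  unsigned replicationFactor = 0;
  std::size_t totalElementsPerReplica = 0;
  std::vector<std::pair<std::size_t, std::size_t>> gatheredToRefSlices;
  std::vector<std::uint32_t> elementMap;
};

// Write the state as whitespace-separated integers (format is an assumption).
inline std::string serializeState(const HostRearrangementState &s) {
  std::ostringstream os;
  os << s.replicationFactor << ' ' << s.totalElementsPerReplica << ' '
     << s.gatheredToRefSlices.size();
  for (const auto &slice : s.gatheredToRefSlices)
    os << ' ' << slice.first << ' ' << slice.second;
  os << ' ' << s.elementMap.size();
  for (auto index : s.elementMap)
    os << ' ' << index;
  return os.str();
}

// Read the state back; the inverse of serializeState.
inline HostRearrangementState deserializeState(const std::string &text) {
  std::istringstream is(text);
  HostRearrangementState s;
  std::size_t numSlices = 0, numIndices = 0;
  is >> s.replicationFactor >> s.totalElementsPerReplica >> numSlices;
  s.gatheredToRefSlices.resize(numSlices);
  for (auto &slice : s.gatheredToRefSlices)
    is >> slice.first >> slice.second;
  is >> numIndices;
  s.elementMap.resize(numIndices);
  for (auto &index : s.elementMap)
    is >> index;
  assert(!is.fail() && "malformed serialized state");
  return s;
}
```

A framework would typically store such a blob next to its compiled executable and rebuild the host-side helper from it at load time.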

Public Functions

CollectiveBalancedHostRearrangement() = default
~CollectiveBalancedHostRearrangement() = default
CollectiveBalancedHostRearrangement(const CollectiveBalancedHostRearrangement&) = default

Defaulted to avoid warnings during the deprecation period.

CollectiveBalancedHostRearrangement(CollectiveBalancedHostRearrangement&&) noexcept = default

Defaulted to avoid warnings during the deprecation period.

CollectiveBalancedHostRearrangement &operator=(const CollectiveBalancedHostRearrangement&) = default

Defaulted to avoid warnings during the deprecation period.

Returns

The CollectiveBalancedHostRearrangement that is assigned to.

CollectiveBalancedHostRearrangement &operator=(CollectiveBalancedHostRearrangement&&) noexcept = default

Defaulted to avoid warnings during the deprecation period.

Returns

The CollectiveBalancedHostRearrangement that is assigned to.

void rearrangeForCollective(const void *in, void *out, int64_t elemByteSize) const

Reorder the tensor into the balanced, collective-friendly layout (host-side).

Parameters
  • in – Pointer to the input buffer.

  • out – Pointer to the output buffer.

  • elemByteSize – The byte size of the elements.

void rearrangeForCollective(const void *in, std::size_t inSize, void *out, std::size_t outSize, std::size_t elemByteSize) const

Reorder the tensor into the balanced, collective-friendly layout (host-side).

Parameters
  • in – Pointer to the input buffer.

  • inSize – The size of the in buffer in bytes.

  • out – Pointer to the output buffer.

  • outSize – The size of the out buffer in bytes.

  • elemByteSize – The byte size of the elements.

template<typename T>
inline void rearrangeForCollective(const std::vector<T> &in, std::vector<T> &out, std::size_t elemByteSize = sizeof(T)) const

Reorder the tensor into the balanced, collective-friendly layout (host-side).

Parameters
  • in – Input buffer.

  • out – Output buffer.

  • elemByteSize – The byte size of the elements.

void undoRearrangeForCollective(const void *in, void *out, int64_t elemByteSize) const

Reorder tensor back into the expected IR tensor shape and order (host-side).

Parameters
  • in – Pointer to the input buffer.

  • out – Pointer to the output buffer.

  • elemByteSize – The byte size of the elements.

template<typename T>
inline void undoRearrangeForCollective(const std::vector<T> &in, std::vector<T> &out, std::size_t elemByteSize = sizeof(T)) const

Reorder tensor back into the expected IR tensor shape and order (host-side).

Parameters
  • in – Input buffer.

  • out – Output buffer.

  • elemByteSize – The byte size of the elements.

void undoRearrangeForCollective(const void *in, std::size_t inSize, void *out, std::size_t outSize, std::size_t elemByteSize) const

Reorder tensor back into the expected IR tensor shape and order (host-side).

Parameters
  • in – Pointer to the input buffer.

  • inSize – The size of the in buffer in bytes.

  • out – Pointer to the output buffer.

  • outSize – The size of the out buffer in bytes.

  • elemByteSize – The byte size of the elements.

size_t getNumRearrangedTensorElems() const

Number of elements in the collective balanced (reordered) tensor.

Returns

The number of elements.
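A caller needs this count to size the output buffer of rearrangeForCollective. The sketch below shows one way to compute buffer sizes, assuming that the rearranged tensor holds exactly one fragment of totalElementsPerReplica elements per replica. That relationship is an assumption made for illustration, and the helper names `numRearrangedElemsSketch` and `rearrangedBufferBytesSketch` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: the rearranged tensor consists of one equally sized
// fragment per replica, padding included.
inline std::size_t numRearrangedElemsSketch(unsigned replicationFactor,
                                            std::size_t totalElementsPerReplica) {
  return static_cast<std::size_t>(replicationFactor) * totalElementsPerReplica;
}

// Byte size of a host buffer large enough for the rearranged tensor.
inline std::size_t rearrangedBufferBytesSketch(unsigned replicationFactor,
                                               std::size_t totalElementsPerReplica,
                                               std::size_t elemByteSize) {
  assert(elemByteSize > 0);
  return numRearrangedElemsSketch(replicationFactor, totalElementsPerReplica) *
         elemByteSize;
}
```

With real GCL objects, prefer the value returned by getNumRearrangedTensorElems() over recomputing it.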

void rearrange(const void *in, void *out, int64_t elemByteSize, bool refToGathered) const

Host tensor rearrangement routine.

Parameters
  • in – Pointer to the input buffer.

  • out – Pointer to the output buffer.

  • elemByteSize – The byte size of the elements.

  • refToGathered – If true, rearrange from the reference layout to the gathered layout; if false, rearrange from the gathered layout back to the reference layout.
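To illustrate the two directions selected by refToGathered, here is a simplified model of an interval-driven rearrangement. It is a sketch under stated assumptions, not the GCL implementation: `Slice` is a hypothetical stand-in for poplar::Interval, slices are assumed to be listed in gathered order, and padding is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for poplar::Interval: a half-open range
// [begin, end) of element indices in the reference tensor.
using Slice = std::pair<std::size_t, std::size_t>;

// Assumption: slices are listed in gathered order, so slice k covers the
// next (end - begin) consecutive elements of the gathered buffer.
template <typename T>
void rearrangeSketch(const std::vector<T> &in, std::vector<T> &out,
                     const std::vector<Slice> &gatheredToRefSlices,
                     bool refToGathered) {
  std::size_t gatheredPos = 0;
  for (const auto &slice : gatheredToRefSlices) {
    assert(slice.first <= slice.second);
    for (std::size_t ref = slice.first; ref < slice.second;
         ++ref, ++gatheredPos) {
      if (refToGathered) {
        // Reference layout -> gathered layout.
        assert(ref < in.size() && gatheredPos < out.size());
        out[gatheredPos] = in[ref];
      } else {
        // Gathered layout -> reference layout.
        assert(gatheredPos < in.size() && ref < out.size());
        out[ref] = in[gatheredPos];
      }
    }
  }
}
```

Calling the function with refToGathered set to true and then to false with the same slices restores the original buffer, which mirrors the relationship between rearrangeForCollective and undoRearrangeForCollective.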

unsigned getReplicationFactor() const

Get the graph’s replication factor.

Returns

The replication factor.

void setReplicationFactor(unsigned replicationFactor)

Set the graph’s replication factor.

Parameters

replicationFactor – The replication factor to set.

std::size_t getTotalElementsPerReplica() const

Get the total number of elements in one replica’s fragment.

Returns

The number of elements per replica.

void setTotalElementsPerReplica(std::size_t totalElementsPerReplica)

Set the total number of elements in one replica’s fragment.

Parameters

totalElementsPerReplica – The number of elements per replica to set.

const std::vector<poplar::Interval> &getGatheredToRefSlices() const

Get the mapping from the gathered tensor back to the reference tensor.

Returns

The mapping.

void setGatheredToRefSlices(std::vector<poplar::Interval> slices)

Set the mapping from the gathered tensor back to the reference tensor.

Parameters

slices – The mapping to set.

const std::vector<uint32_t> &getElementMap() const

Get the simple index map, which maps individual elements one by one.

It is used instead of gatheredToRefSlices for short intervals.

Returns

The element map.
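When the mapping consists of many very short intervals, a flat list of indices can be simpler to store and apply. The sketch below shows that idea under two assumptions: entry g of the map holds the reference index of gathered element g, and intervals are listed in gathered order. The names `RefInterval`, `toElementMapSketch` and `applyElementMapSketch` are hypothetical and not part of GCL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for poplar::Interval: [begin, end) in the
// reference tensor.
using RefInterval = std::pair<std::size_t, std::size_t>;

// Collapse intervals into a one-to-one map: map[g] is the reference
// index of gathered element g (direction is an assumption).
inline std::vector<std::uint32_t>
toElementMapSketch(const std::vector<RefInterval> &slices) {
  std::vector<std::uint32_t> map;
  for (const auto &s : slices)
    for (std::size_t ref = s.first; ref < s.second; ++ref)
      map.push_back(static_cast<std::uint32_t>(ref));
  return map;
}

// Apply the map element by element in either direction.
template <typename T>
void applyElementMapSketch(const std::vector<T> &in, std::vector<T> &out,
                           const std::vector<std::uint32_t> &elementMap,
                           bool refToGathered) {
  for (std::size_t g = 0; g < elementMap.size(); ++g) {
    const std::size_t ref = elementMap[g];
    assert(g < (refToGathered ? out.size() : in.size()));
    assert(ref < (refToGathered ? in.size() : out.size()));
    if (refToGathered)
      out[g] = in[ref];
    else
      out[ref] = in[g];
  }
}
```

For single-element intervals the flat map needs one 32-bit index per element, whereas an interval list needs a begin and an end per element, which is one plausible reason to prefer the map in that case.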

Public Members

unsigned replicationFactor = 0

The graph’s replication factor.

std::size_t totalElementsPerReplica = 0

The total number of elements in one replica’s fragment.

std::vector<poplar::Interval> gatheredToRefSlices

The mapping from the gathered tensor back to the reference tensor.

std::vector<uint32_t> elementMap

Simple index map for mapping individual elements one by one.

It is used instead of gatheredToRefSlices for short intervals.

Private Functions

template<typename ElementType>
void rearrangeImpl(const ElementType *in, std::size_t inLen, ElementType *out, std::size_t outLen, bool refToGathered) const

Host tensor rearrangement routine.

Parameters
  • in – Pointer to the input buffer.

  • inLen – Length of input buffer in number of elements.

  • out – Pointer to the output buffer.

  • outLen – Length of output buffer in number of elements.

  • refToGathered – If true, rearrange from the reference layout to the gathered layout; if false, rearrange from the gathered layout back to the reference layout.

void rearrange(const void *in, std::size_t inSize, void *out, std::size_t outSize, std::size_t elemByteSize, bool refToGathered) const
void update(bool allowElementMap, size_t numGatheredToRefSlicesVecEntries, const std::vector<std::vector<poplar::Interval>> &gatheredToRefSlicesVec)

Friends

friend class CollectiveBalancedReorder
class CollectiveBalancedReorder
#include <CollectiveBalancedReorder.hpp>

Helper class to reorder a tensor in a per-tile-balanced fashion such that each replica obtains (for inputs to AllGather or outputs of ReduceScatter) an equally sized 1D tensor with equally sized regions.

This helper class reduces the memory used by the syncful collective. The reordering process:

  • Flattens the input tensor

  • Analyses the tile mapping

  • Determines reordering strategy and required internal padding

  • Can rearrange and undo the rearrangement on any tensor that has the same tile mapping

  • Can rearrange and undo the rearrangement on host tensors that are to be copied into CBR-rearranged RemoteBuffers
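The padding step can be pictured with a small planning routine. This is a sketch of one plausible balancing rule, assumed for illustration rather than taken from the GCL source: each tile's element count is divided among the replicas with rounding up, and each share is then rounded up to a multiple of the grain size. The names `BalancedLayoutSketch` and `planBalancedLayoutSketch` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result of planning a per-tile-balanced layout.
struct BalancedLayoutSketch {
  std::vector<std::size_t> replicaElementsPerTile; // one replica's share per tile
  std::size_t totalElementsPerReplica = 0;         // sum of the shares
  std::size_t paddingElements = 0;                 // padding across all replicas
};

// Assumed rule: share = ceil(n / replicationFactor), rounded up to grainSize.
inline BalancedLayoutSketch
planBalancedLayoutSketch(const std::vector<std::size_t> &elementsPerTile,
                         unsigned replicationFactor, unsigned grainSize = 1) {
  assert(replicationFactor > 0 && grainSize > 0);
  BalancedLayoutSketch plan;
  std::size_t realElements = 0;
  for (std::size_t n : elementsPerTile) {
    std::size_t share = (n + replicationFactor - 1) / replicationFactor;
    share = (share + grainSize - 1) / grainSize * grainSize;
    plan.replicaElementsPerTile.push_back(share);
    plan.totalElementsPerReplica += share;
    realElements += n;
  }
  plan.paddingElements =
      plan.totalElementsPerReplica * replicationFactor - realElements;
  return plan;
}
```

For example, tiles holding 5, 8, 0 and 3 elements with a replication factor of 4 give per-replica shares of 2, 2, 0 and 1, so every replica owns 5 elements and the gathered tensor carries 4 padding elements in total.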

Public Functions

CollectiveBalancedReorder(poplar::Graph &graph_, poplar::Tensor tensor_, unsigned replicationFactor_, const poplar::DebugNameAndId &dnai_, bool allowElementMap = false, unsigned grainSize = 1)

Constructor.

Parameters
  • graph_ – The poplar graph.

  • tensor_ – The reference tensor to rearrange.

  • replicationFactor_ – The replication factor of the graph.

  • dnai_ – Debug name and id.

  • allowElementMap – Allow an alternative representation of the host rearrangements. It is sometimes beneficial to collapse all intervals into a simple one-to-one element map. This flag should be set to true in all new code, and will be deprecated once all frameworks implement serialisation of the newly added elementMap field.

  • grainSize – The grain size to use when padding the tensor.

poplar::Tensor createReplicaSlice(const poplar::Type &type)

Create a tensor mapped efficiently over the same tiles as the reference tensor.

The returned tensor has the size of the result of the reduce scatter and of the input of the all gather.

Parameters

type – The type to use when creating the tensor.

Returns

The efficient tensor created from the reference.

poplar::Tensor createCollectivesTensor(const poplar::Type &type, const std::string &debugPrefix)

Create a tensor mapped efficiently over the same tiles as the reference tensor.

The returned tensor has the size of the input of the reduce scatter and of the result of the all gather.

Parameters
  • type – The type to use when creating the tensor.

  • debugPrefix – The debug prefix.

Returns

The efficient tensor created from the reference.

poplar::Tensor undoRearrangeForCollective(const poplar::Tensor &tensor) const

Reorder tensor back into the expected IR tensor shape and order.

Parameters

tensor – The tensor to rearrange.

Returns

The tensor with the rearrangement undone.

inline std::vector<std::size_t> getReferenceShape() const

Get the shape of the reference tensor.

Returns

The shape of the reference tensor.

inline const CollectiveBalancedHostRearrangement &getHostRearrangement() const

Get a helper object that allows the rearrangement to be applied on the host.

Returns

The helper class for host rearrangement.

void zeroPaddingInCollectiveTensor(poplar::Tensor &collectiveTensor, poplar::program::Sequence &prog) const

Zero the padding of a collective-friendly tensor.

Parameters
  • collectiveTensor – The collective tensor to zero the padding of.

  • prog – The sequence to add the zeroing program to.

Private Functions

void rearrange(const void *in, void *out, std::size_t elemByteSize, bool refToGathered) const

Host tensor rearrangement routine.

Private Members

poplar::Graph &mGraph

Graph or subgraph on which the tensor and reordered tensor are allocated.

unsigned mReplicationFactor
std::vector<std::size_t> mNumReplicaElementsPerTile
std::vector<std::size_t> mElementsPerTile
std::vector<poplar::Interval> mGatheredToSimplifiedRefSlices
poplar::Tensor mReferenceTensor
poplar::TensorRearranger mSimplifier
CollectiveBalancedHostRearrangement mHostRearrangement
const poplar::DebugNameAndId mDnai