CollectiveBalancedReorder

#include <gcl/CollectiveBalancedReorder.hpp>
namespace gcl

Graphcore Communications Library.

CrossReplica functions

Collective operations working across replicas.

WithinReplica functions

Collective operations working within replicas.

class CollectiveBalancedHostRearrangement
#include <CollectiveBalancedReorder.hpp>

This class contains functions and data necessary to rearrange tensors on the host side at runtime.

It is kept separate from CollectiveBalancedReorder so that the rearrangement state can be serialized and restored without having to create a poplar::Graph.

Public Functions

void rearrangeForCollective(const void *in, void *out, int64_t elemByteSize) const

Reorder a tensor into the balanced, collective-friendly layout (host-side).

Parameters
  • in – Pointer to the input buffer.

  • out – Pointer to the output buffer.

  • elemByteSize – The byte size of the elements.

void undoRearrangeForCollective(const void *in, void *out, int64_t elemByteSize) const

Reorder tensor back into the expected IR tensor shape and order (host-side).

Parameters
  • in – Pointer to the input buffer.

  • out – Pointer to the output buffer.

  • elemByteSize – The byte size of the elements.

size_t getNumRearrangedTensorElems() const

Number of elements in the collective balanced (reordered) tensor.

Returns

The number of elements.
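The output buffer passed to rearrangeForCollective must hold getNumRearrangedTensorElems() elements. The sketch below assumes, without confirmation from this page, that this count equals replicationFactor × totalElementsPerReplica (one equally sized fragment per replica, padding included). HostRearrangementSizes and allocateRearrangedBuffer are hypothetical names, not part of GCL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the two public members that (by assumption)
// determine the size of the rearranged tensor.
struct HostRearrangementSizes {
  unsigned replicationFactor = 0;
  std::size_t totalElementsPerReplica = 0;

  // Assumed equivalent of getNumRearrangedTensorElems(): one fragment of
  // totalElementsPerReplica elements for every replica.
  std::size_t numRearrangedElems() const {
    return static_cast<std::size_t>(replicationFactor) * totalElementsPerReplica;
  }
};

// Allocate a zero-initialised byte buffer large enough to receive the
// output of a host-side rearrangement for elements of elemByteSize bytes.
std::vector<std::uint8_t> allocateRearrangedBuffer(const HostRearrangementSizes &s,
                                                   std::int64_t elemByteSize) {
  assert(elemByteSize > 0);
  return std::vector<std::uint8_t>(s.numRearrangedElems() *
                                   static_cast<std::size_t>(elemByteSize));
}
```

With the real class, the equivalent allocation for float data would be a std::vector<float> of getNumRearrangedTensorElems() elements, passed to rearrangeForCollective together with sizeof(float).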

void rearrange(const void *in, void *out, int64_t elemByteSize, bool refToGathered) const

Host tensor rearrangement routine.

Parameters
  • in – Pointer to the input buffer.

  • out – Pointer to the output buffer.

  • elemByteSize – The byte size of the elements.

  • refToGathered – If true, rearrange from the reference layout to the gathered layout; if false, rearrange from the gathered layout back to the reference layout.
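The following is a minimal sketch of how a byte-level host rearrangement can be driven by a gathered-to-reference slice list such as gatheredToRefSlices. It assumes that walking the slices in order enumerates the gathered tensor from position 0 upwards and that each slice names a half-open range of reference elements; GCL's actual implementation is not shown on this page. Interval is a hypothetical stand-in for poplar::Interval, and rearrangeBytes is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for poplar::Interval: reference element indices [begin, end).
struct Interval {
  std::size_t begin;
  std::size_t end;
  std::size_t size() const { return end - begin; }
};

// Copy whole slices of elemByteSize-byte elements between the reference
// layout and the gathered layout. The buffers must not overlap.
void rearrangeBytes(const void *in, void *out, std::int64_t elemByteSize,
                    bool refToGathered,
                    const std::vector<Interval> &gatheredToRefSlices) {
  assert(elemByteSize > 0);
  const auto *src = static_cast<const std::uint8_t *>(in);
  auto *dst = static_cast<std::uint8_t *>(out);
  const auto elemBytes = static_cast<std::size_t>(elemByteSize);
  std::size_t gatheredOffset = 0; // running byte offset into the gathered buffer
  for (const Interval &slice : gatheredToRefSlices) {
    const std::size_t refOffset = slice.begin * elemBytes;
    const std::size_t numBytes = slice.size() * elemBytes;
    if (refToGathered) {
      // Reference slice -> next contiguous run of the gathered buffer.
      std::memcpy(dst + gatheredOffset, src + refOffset, numBytes);
    } else {
      // Next contiguous run of the gathered buffer -> reference slice.
      std::memcpy(dst + refOffset, src + gatheredOffset, numBytes);
    }
    gatheredOffset += numBytes;
  }
}
```

Copying whole slices with memcpy keeps the loop count proportional to the number of intervals rather than the number of elements. The sketch ignores padding: in the real class the gathered tensor may be larger than the reference tensor, so positions not covered by any slice would need separate handling.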

Public Members

unsigned replicationFactor = 0

The graph’s replication factor.

std::size_t totalElementsPerReplica = 0

The total number of elements in one replica’s fragment.

std::vector<poplar::Interval> gatheredToRefSlices

The mapping from the gathered tensor back to the reference tensor.

std::vector<uint32_t> elementMap

A simple index map that maps individual elements one by one.

It is used instead of gatheredToRefSlices when the intervals are short.
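Because the public members are plain data, the host rearrangement can be saved and restored without a poplar::Graph, which is the motivation given for separating this class. The sketch below shows one hypothetical way a framework could serialize those four members into a byte buffer. The binary format, HostRearrangementState, serialize and deserialize are all assumptions for illustration; this page does not document any serialization format for GCL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for poplar::Interval: reference element indices [begin, end).
struct Interval {
  std::uint64_t begin;
  std::uint64_t end;
};

// Hypothetical plain-data copy of the public members of
// CollectiveBalancedHostRearrangement.
struct HostRearrangementState {
  unsigned replicationFactor = 0;
  std::size_t totalElementsPerReplica = 0;
  std::vector<Interval> gatheredToRefSlices;
  std::vector<std::uint32_t> elementMap;
};

namespace {
// Append a 64-bit value in host byte order.
void putU64(std::vector<std::uint8_t> &buf, std::uint64_t v) {
  std::uint8_t bytes[sizeof v];
  std::memcpy(bytes, &v, sizeof v);
  buf.insert(buf.end(), bytes, bytes + sizeof v);
}

// Read a 64-bit value in host byte order and advance pos.
std::uint64_t getU64(const std::vector<std::uint8_t> &buf, std::size_t &pos) {
  assert(pos + sizeof(std::uint64_t) <= buf.size());
  std::uint64_t v;
  std::memcpy(&v, buf.data() + pos, sizeof v);
  pos += sizeof v;
  return v;
}
} // namespace

std::vector<std::uint8_t> serialize(const HostRearrangementState &s) {
  std::vector<std::uint8_t> buf;
  putU64(buf, s.replicationFactor);
  putU64(buf, s.totalElementsPerReplica);
  putU64(buf, s.gatheredToRefSlices.size());
  for (const Interval &i : s.gatheredToRefSlices) {
    putU64(buf, i.begin);
    putU64(buf, i.end);
  }
  // elementMap must be serialized too, or hosts relying on it cannot restore.
  putU64(buf, s.elementMap.size());
  for (std::uint32_t e : s.elementMap) putU64(buf, e);
  return buf;
}

HostRearrangementState deserialize(const std::vector<std::uint8_t> &buf) {
  HostRearrangementState s;
  std::size_t pos = 0;
  s.replicationFactor = static_cast<unsigned>(getU64(buf, pos));
  s.totalElementsPerReplica = static_cast<std::size_t>(getU64(buf, pos));
  s.gatheredToRefSlices.resize(static_cast<std::size_t>(getU64(buf, pos)));
  for (Interval &i : s.gatheredToRefSlices) {
    i.begin = getU64(buf, pos);
    i.end = getU64(buf, pos);
  }
  s.elementMap.resize(static_cast<std::size_t>(getU64(buf, pos)));
  for (std::uint32_t &e : s.elementMap) e = static_cast<std::uint32_t>(getU64(buf, pos));
  return s;
}
```

Writing every field as a fixed 64-bit word keeps the format simple at the cost of size. Since values are stored in host byte order, the buffer is only portable between hosts of the same endianness.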

Private Functions

template<typename ElementType>
void rearrangeImpl(const ElementType *in, ElementType *out, bool refToGathered) const

Host tensor rearrangement routine.

Parameters
  • in – Pointer to the input buffer.

  • out – Pointer to the output buffer.

  • refToGathered – If true, rearrange from the reference layout to the gathered layout; if false, rearrange from the gathered layout back to the reference layout.
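The public rearrange takes a void pointer and a runtime elemByteSize, while rearrangeImpl is a template on the element type, which suggests a runtime-to-compile-time dispatch. The sketch below illustrates that pattern under stated assumptions. Instead of templating on an element type (which would mean reading, for example, float data through a uint32_t pointer), it templates on the element width and copies each element with memcpy, which is valid for any trivially copyable data. rearrangeFixedWidth, rearrangeDispatch and the stand-in Interval are hypothetical; the set of supported widths is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Stand-in for poplar::Interval: reference element indices [begin, end).
struct Interval {
  std::size_t begin;
  std::size_t end;
};

// Element-by-element copy for a fixed, compile-time element width.
template <std::size_t ElemBytes>
void rearrangeFixedWidth(const unsigned char *in, unsigned char *out,
                         bool refToGathered,
                         const std::vector<Interval> &gatheredToRefSlices) {
  std::size_t gathered = 0; // position in the gathered layout
  for (const Interval &slice : gatheredToRefSlices) {
    for (std::size_t ref = slice.begin; ref < slice.end; ++ref, ++gathered) {
      const std::size_t srcIndex = refToGathered ? ref : gathered;
      const std::size_t dstIndex = refToGathered ? gathered : ref;
      // ElemBytes is a constant, so compilers can usually turn this memcpy
      // into a single load and store.
      std::memcpy(out + dstIndex * ElemBytes, in + srcIndex * ElemBytes, ElemBytes);
    }
  }
}

// Map the runtime element size onto one of the fixed-width instantiations.
void rearrangeDispatch(const void *in, void *out, std::int64_t elemByteSize,
                       bool refToGathered,
                       const std::vector<Interval> &gatheredToRefSlices) {
  assert(in != out && "in-place rearrangement is not supported");
  const auto *src = static_cast<const unsigned char *>(in);
  auto *dst = static_cast<unsigned char *>(out);
  switch (elemByteSize) {
  case 1: rearrangeFixedWidth<1>(src, dst, refToGathered, gatheredToRefSlices); break;
  case 2: rearrangeFixedWidth<2>(src, dst, refToGathered, gatheredToRefSlices); break;
  case 4: rearrangeFixedWidth<4>(src, dst, refToGathered, gatheredToRefSlices); break;
  case 8: rearrangeFixedWidth<8>(src, dst, refToGathered, gatheredToRefSlices); break;
  default:
    throw std::invalid_argument("unsupported element byte size");
  }
}
```

Only the bytes are moved, never interpreted, so the same instantiation serves float, int32 and any other 4-byte type.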

class CollectiveBalancedReorder
#include <CollectiveBalancedReorder.hpp>

Helper class to reorder a tensor in a per-tile-balanced fashion such that each replica obtains (for inputs to AllGather or outputs of ReduceScatter) an equally sized 1D tensor with equally sized regions.

This helper class reduces the memory used by the syncful collective. The reordering process:

  • Flattens the input tensor

  • Analyses the tile mapping

  • Determines the reordering strategy and the required internal padding

  • Can rearrange and undo the rearrangement on any tensor that has the same tile mapping

  • Can rearrange and undo the rearrangement on host tensors that are to be copied into CBR-rearranged RemoteBuffers
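This page does not state the exact padding rule. A plausible model of the "analyse the tile mapping, then pad" steps is that each tile's elements are split evenly across replicas, rounded up, and then rounded up again to a multiple of grainSize. The sketch below computes per-tile replica counts, the per-replica total and the resulting padding under that assumption. The field names echo the private members numReplicaElementsPerTile and totalElementsPerReplica, but BalancedLayout and computeBalancedLayout are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of a balanced layout.
struct BalancedLayout {
  std::vector<std::size_t> numReplicaElementsPerTile; // slots per replica, per tile
  std::size_t totalElementsPerReplica = 0;            // sum over all tiles
  std::size_t paddingElements = 0;                    // slots holding no data
};

// Assumed rule: ceil(elements / replicationFactor), rounded up to grainSize.
BalancedLayout computeBalancedLayout(const std::vector<std::size_t> &elementsPerTile,
                                     unsigned replicationFactor, unsigned grainSize) {
  assert(replicationFactor > 0 && grainSize > 0);
  BalancedLayout layout;
  std::size_t totalElements = 0;
  for (std::size_t n : elementsPerTile) {
    std::size_t perReplica = (n + replicationFactor - 1) / replicationFactor;
    perReplica = (perReplica + grainSize - 1) / grainSize * grainSize;
    layout.numReplicaElementsPerTile.push_back(perReplica);
    layout.totalElementsPerReplica += perReplica;
    totalElements += n;
  }
  // Every replica gets the same fragment size, so the gathered tensor has
  // replicationFactor * totalElementsPerReplica slots in total.
  layout.paddingElements = layout.totalElementsPerReplica * replicationFactor - totalElements;
  return layout;
}
```

For example, tiles holding 10, 4 and 0 elements with a replication factor of 4 and grainSize 1 give 3 + 1 + 0 = 4 elements per replica, so 16 slots hold 14 real elements and 2 are padding. Raising grainSize to 2 increases the fragment to 6 elements and the padding to 10.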

Public Functions

CollectiveBalancedReorder(poplar::Graph &graph_, poplar::Tensor tensor_, unsigned replicationFactor_, const poplar::DebugNameAndId &dnai_, bool allowElementMap = false, unsigned grainSize = 1)

Constructor.

Parameters
  • graph_ – The poplar graph.

  • tensor_ – The reference tensor to rearrange.

  • replicationFactor_ – The replication factor of the graph.

  • dnai_ – Debug name and id.

  • allowElementMap – Allow an alternative representation of the host rearrangements. Sometimes it is beneficial to collapse all intervals into a simple one-to-one element map. This flag should be set to true in all new code; it will be deprecated once all frameworks implement serialisation of the newly added elementMap field.

  • grainSize – The grain size to use when padding the tensor.
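The allowElementMap option refers to collapsing the interval list into a one-to-one element map. The sketch below shows that conversion and how such a map could be applied, assuming entry g of the map holds the reference index of gathered position g. toElementMap, rearrangeWithElementMap and the stand-in Interval are hypothetical names, not GCL API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Stand-in for poplar::Interval: reference element indices [begin, end).
struct Interval {
  std::size_t begin;
  std::size_t end;
};

// Expand every slice into one uint32_t entry per element, in gathered order.
std::vector<std::uint32_t> toElementMap(const std::vector<Interval> &gatheredToRefSlices) {
  std::vector<std::uint32_t> elementMap;
  for (const Interval &slice : gatheredToRefSlices) {
    for (std::size_t ref = slice.begin; ref < slice.end; ++ref) {
      // 32-bit entries limit the reference tensor to 2^32 elements.
      assert(ref <= std::numeric_limits<std::uint32_t>::max());
      elementMap.push_back(static_cast<std::uint32_t>(ref));
    }
  }
  return elementMap;
}

// gathered[g] <-> reference[elementMap[g]]; buffers must not overlap.
template <typename T>
void rearrangeWithElementMap(const T *in, T *out, bool refToGathered,
                             const std::vector<std::uint32_t> &elementMap) {
  assert(in != out);
  for (std::size_t g = 0; g < elementMap.size(); ++g) {
    if (refToGathered) {
      out[g] = in[elementMap[g]];
    } else {
      out[elementMap[g]] = in[g];
    }
  }
}
```

The trade-off is memory. Assuming poplar::Interval stores two 64-bit indices (16 bytes) and each map entry is 4 bytes, the element map is the smaller representation whenever slices average fewer than four elements. This is consistent with the element map being used for short intervals.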

poplar::Tensor createReplicaSlice(const poplar::Type &type)

Create a tensor mapped efficiently over the same tiles as the reference tensor.

The returned tensor has the size of the output of the ReduceScatter and of the input of the AllGather.

Parameters
  • type – The type to use when creating the tensor.

Returns

The efficient tensor created from the reference.

poplar::Tensor createCollectivesTensor(const poplar::Type &type, const std::string &debugPrefix)

Create a tensor mapped efficiently over the same tiles as the reference tensor.

The returned tensor has the size of the input of the ReduceScatter and of the output of the AllGather.

Parameters
  • type – The type to use when creating the tensor.

  • debugPrefix – The debug prefix.

Returns

The efficient tensor created from the reference.

poplar::Tensor undoRearrangeForCollective(const poplar::Tensor &tensor) const

Reorder tensor back into the expected IR tensor shape and order.

Parameters
  • tensor – The tensor to rearrange.

Returns

The tensor with the rearrangement undone.

inline std::vector<std::size_t> getReferenceShape() const

Get the shape of the reference tensor.

Returns

The shape of the reference tensor.

inline const CollectiveBalancedHostRearrangement &getHostRearrangement() const

Get the helper object that applies the rearrangement on the host.

Returns

The helper class for host rearrangement.

void zeroPaddingInCollectiveTensor(poplar::Tensor &collectiveTensor, poplar::program::Sequence &prog) const

Zero the padding of a collective friendly tensor.

Parameters
  • collectiveTensor – The collective tensor to zero the padding of.

  • prog – The sequence to add the zeroing program to.

Private Functions

void rearrange(const void *in, void *out, int64_t elemByteSize, bool refToGathered) const

Host tensor rearrangement routine.

Private Members

poplar::Graph &graph

Graph or subgraph on which the tensor and reordered tensor are allocated.

unsigned replicationFactor
std::vector<std::size_t> numReplicaElementsPerTile
std::vector<std::size_t> elementsPerTile
std::vector<poplar::Interval> gatheredToSimplifiedRefSlices
poplar::Tensor referenceTensor
poplar::TensorRearranger simplifier
CollectiveBalancedHostRearrangement hostRearrangement
const poplar::DebugNameAndId dnai