TileMapping

#include <poputil/TileMapping.hpp>

Functions for handling the mapping of tensors to tiles.

namespace poputil

General utility functions for building graphs.

Functions

std::vector<std::vector<poplar::Interval>> calcLinearTileMapping(const poplar::Graph &graph, std::vector<std::size_t> shape, unsigned minElementsPerTile, unsigned grainSize, unsigned offset = 0, bool ascendingOrder = true)

Calculate a tile mapping that spreads the tensor evenly over the tiles in a graph.

By default the indices of the resulting mapping go from from low to high tile numbers, however offset and direction can be specified.

Parameters
  • graph – The graph to calculate the mapping for.

  • shape – The shape of the tensor to be mapped: a vector containing the size of each dimension of the tensor.

  • minElementsPerTile – The minimum number of tensor elements to be allocated to a tile.

  • grainSize – The number of elements mapped to each tile will be an integer multiple of the grain size.

  • offset – The offset to the first tile used for mapping

  • ascendingOrder – If true, the first tile used = offset and tiles are allocated in increasing order If false, the first tile used = (number of device tiles -1 - offset) and tiles are allocated in decreasing order

Returns

A vector specifying the mapping

std::vector<std::vector<poplar::Interval>> calcLinearTileMapping(const poplar::Graph &graph, const poplar::Tensor &t, unsigned offset = 0, bool ascendingOrder = true)

Calculate a tile mapping that spreads the tensor evenly over the tiles in a graph.

By default the indices of the resulting mapping go from from low to high tile numbers, however offset and direction can be specified.

In this case the elements are distributed so that groups of elements of the device’s natural vector width will not be split. It effectively sets the grain size to the natural vector width for the data type. This means the number of elements on each tile will be a multiple of the natural vector width and the index of the first element is aligned to the natural vector width.

The natural vector width is the largest vector width supported in hardware for arithmetic operations on that data type.

It will also try to keep at least 128 bytes of data on each tile to avoid high exchange costs.

Parameters
  • graph – The graph to add the operation to.

  • t – The tensor to be mapped

  • offset – The offset to the first tile used for mapping

  • ascendingOrder – If true, the first tile used = offset and tiles are allocated in increasing order If false, the first tile used = (number of device tiles - 1 - offset) and tiles are allocated in decreasing order

Returns

A vector specifying the mapping

void mapTensorLinearly(poplar::Graph &graph, const poplar::Tensor &t, unsigned minElementsPerTile, unsigned grainSize)

Map the specified tensor, spreading the tensor evenly over the tiles in a graph.

The indices of the flattened tensor are mapped from low to high tile numbers, however offset and direction can be specified.

Parameters
  • graph – The graph to calculate the mapping for.

  • t – The tensor to be mapped.

  • minElementsPerTile – The minimum number of tensor elements to be allocated to a tile.

  • grainSize – The number of elements mapped to each tile will be an integer multiple of the grain size.

  • offset – The offset to the first tile used for mapping

  • ascendingOrder – If true, the first tile used = offset and tiles are allocated in increasing order If false, the first tile used = (number of device tiles -1 - offset) and tiles are allocated in decreasing order

void mapTensorLinearlyWithOffset(poplar::Graph &graph, const poplar::Tensor &t, unsigned minElementsPerTile, unsigned grainSize, unsigned offset, bool ascendingOrder = true)
void mapTensorLinearly(poplar::Graph &graph, const poplar::Tensor &t)

Map the specified tensor, spreading the tensor evenly over the tiles in a graph.

The indices of the flattened tensor are mapped from low to high tile numbers, however offset and direction can be specified.

In this case the elements are distributed so that groups of elements of the device’s natural vector width will not be split. It effectively sets the grain size to the natural vector width for the data type. This means the number of elements on each tile will be a multiple of the natural vector width and the index of the first element is aligned to the natural vector width.

The natural vector width is the largest vector width supported in hardware for arithmetic operations on that data type.

It will also try to keep at least 128 bytes of data on each tile to avoid high exchange costs.

Parameters
  • graph – The graph to add the operation to.

  • t – The tensor to be mapped.

  • offset – The offset to the first tile used for mapping

  • ascendingOrder – If true, the first tile used = offset and tiles are allocated in increasing order If false, the first tile used = (number of device tiles -1 - offset) and tiles are allocated in decreasing order

void mapTensorLinearlyWithOffset(poplar::Graph &graph, const poplar::Tensor &t, unsigned startTile = 0, bool ascendingOrder = true)
std::size_t chooseMappingOffset(std::size_t numTiles, const std::vector<std::size_t> &shape)

Choose a offset for use with tensor mapping functions using a hash of the shape provided with the seed provided.

Parameters
  • numTiles – The number of tiles of the intended target device.

  • shape – The shape to produce a hash of.

  • seed – Optional seed to use in producing the hash.

Returns

The selected offset in the range 0 to numTiles - 1

std::size_t chooseMappingOffset(std::size_t numTiles, const std::vector<std::size_t> &shape, std::size_t seed)
unsigned getTileImbalance(const poplar::Graph::TileToTensorMapping &mapping, unsigned minElementsPerTile = 0, unsigned grainSize = 1)

Determine how unbalanced a tensor is when mapped over tiles in a graph.

This reports how well a tensor mapping compares with the mapping based on a given number of elements per tile.

Parameters
  • mapping – The current tile mapping of the tensor.

  • minElementsPerTile – The suggested minimum number of elements per tile.

  • grainSize – The number of elements mapped to each tile would be an integer multiple of the suggested grain size.

Returns

The maximum number of elements greater than expected on any tile.

unsigned getTileImbalance(const poplar::Graph &graph, const poplar::Tensor &t, unsigned minElementsPerTile = 0, unsigned grainSize = 1)

Determine how unbalanced a tensor is mapped over tiles.

This compares the way a tensor is mapped to a set of tiles to the mapping based on a given number of elements per tile.

Parameters
  • graph – The graph containing the mapped tensor.

  • mapping – The tensor currently mapped to tiles in the graph.

  • minElementsPerTile – The suggested minimum number of elements per tile.

  • grainSize – The number of elements mapped to each tile would be an integer multiple of the suggested grain size.

Returns

The maximum number of elements greater than expected on any tile.

poplar::Tensor cloneToIpu(poplar::Graph &graph, const poplar::Tensor &t, unsigned dstIPU, const poplar::DebugContext &debugContext = {}, poplar::TensorCloneMethod method = poplar::TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES)

Create a clone of the specified tensor on the specified IPU.

The cloned tensor is mapped to the IPU in such a way that the mapping of tensor elements to tiles is preserved.

Parameters
  • graph – The graph representing the entire multi-IPU device.

  • t – The tensor to clone.

  • dstIPU – The index of the IPU to clone the tensor onto.

  • name – A debug name to give to any new tensors allocated in the graph during the clone. If this is empty then the debug names will be derived from existing tensor debug names.

  • method – The method to use for the cloning.

Returns

The cloned tensor.

poplar::Tensor cloneToGraph(poplar::Graph &srcGraph, poplar::Graph &dstGraph, const poplar::Tensor &t, const poplar::DebugContext &debugContext = {}, poplar::TensorCloneMethod method = poplar::TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES)

Create a clone of the specified tensor on the specified graph.

The cloned tensor is mapped to the destination graph in such a way that the mapping of tensor elements to tiles is preserved.

Note

It is assumed that the destination graph has enough tiles to clone the input tensor. This includes any gaps in the tile mapping. This means the maximum mapped tile of t in the source graph must be less than dstGraph.getTarget().getNumTiles().

Parameters
  • srcGraph – The graph representing the source tiles.

  • dstGraph – The graph representing the destination tiles.

  • t – The tensor to clone.

  • debugContext – Optional debug information

  • method – The method to use for the cloning.

Returns

The cloned tensor.

poplar::Tensor copyToIpu(poplar::Graph &masterGraph, const poplar::Tensor &t, poplar::program::Sequence &prog, unsigned dstIPU, const poplar::DebugContext &debugContext = {}, poplar::TensorCloneMethod method = poplar::TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES)

Move a tensor from one IPU to another.

The tensor is moved by duplicating it, mapping the clone onto another IPU, and copying the original tensor values to the new one.

Parameters
  • masterGraph – The graph representing the entire multi-IPU device.

  • t – The tensor to move from one IPU to another.

  • prog – A program sequence to add the Copy to.

  • dstIPU – The index of the IPU onto which the tensor will be moved.

  • debugContext – A debug name to give to the tensor created on dstIPU. If this is empty then the debug names will be derived from existing tensor debug names.

  • method – The method to use for cloning of the tensor on the destination IPU.

Returns

The new tensor on the specified IPU.

poplar::Tensor createIpuCopy(poplar::Graph &graph, const poplar::Tensor &t, unsigned dstIpu, poplar::Tensor &copySrc, poplar::Tensor &copyDst, const poplar::DebugContext &debugContext = {}, poplar::TensorCloneMethod method = poplar::TensorCloneMethod::PRESERVE_ORDER_AND_ALIASES)

Prepare to move a tensor from one IPU to another.

The tensor is duplicated and the clone is mapped onto another IPU. References to source and destination tensors are provided for use by an inter-IPU copy.

The necessary copy operation is not added to the program.

Parameters
  • masterGraph – The graph representing the entire multi-IPU device.

  • t – The tensor to move from one IPU to another.

  • dstIPU – The index of the IPU onto which the tensor will be moved.

  • copySrc – A tensor that can be used as the source to do the copy.

  • copyDst – A tensor that can be used as the destination of the copy.

  • debugContext – A debug name to give to the tensor created on dstIPU. If this is empty then the debug names will be derived from existing tensor debug names.

  • method – The method to use for cloning of the tensor on the destination IPU.

Returns

The new tensor on the specified IPU.

bool dimIsSplitOverTiles(const poplar::Graph &graph, const poplar::Tensor &t, unsigned dimension)

Check if a dimension of a tensor is split over more than one tile.

Examines the mapping of the specified tensor to see if the specified dimension is split over more than one tile.

Parameters
  • graph – The graph to examine.

  • t – The tensor to check.

  • dimension – The dimension to check.

Returns

True if elements of the given dimension are spread over more than one tile.

bool dimIsSplitOverIPUs(const poplar::Graph &graph, const poplar::Tensor &t, unsigned dimension)

Check if a dimension of a tensor is split over more than one IPU.

Examines the mapping of the specified tensor to see if the specified dimension is split over more than one IPU.

Parameters
  • graph – The graph to examine.

  • t – The tensor to check.

  • dimension – The dimension to check.

Returns

True if elements of the given dimension are spread over more than one IPU.

poplar::Tensor createBroadcastOperand(poplar::Graph &graph, const poplar::Tensor &fullTensor, const poplar::Type &type, unsigned dim, bool ditherMapping = false, const poplar::DebugContext &debugContext = {})

Create a simpler tensor that is mapped in the same way as another, full, tensor.

The full tensor is typically a left hand side operand of an operation while the created tensor is the right hand side. The created tensor has one dimension, which is the same size as the specified dimension of the full tensor.

Because the created tensor has the same mapping as the full tensor, it reduces the amount of data exchange or copies that are required for an operation using the two tensors.

Parameters
  • graph – The graph which the output tensor is added to.

  • fullTensor – The tensor mapping for the output tensor is copied from this tensor.

  • type – The type of the output tensor.

  • dim – The dimension of the input tensor which is the size of the created tensor.

  • ditherMapping – Enable dithering to be applied to the mapping of the output tensor.

  • debugContext – Optional debug information.

Returns

The created output tensor.

unsigned transformTileIndex(unsigned tile, unsigned numTiles, unsigned offset, bool ascending)

Transform a tile index such that the result begins at zero and increments.

Parameters
  • tile – The tile number to transform.

  • numTiles – The number of tiles on the target device.

  • offset – The offset to the first tile used for the mapping before the transform takes place.

  • ascendingOrder – Mapping order before the transform takes place: If true, the first tile used = offset and tiles are allocated in increasing order If false, the first tile used = (number of device tiles -1 - offset) and tiles are allocated in decreasing order

Returns

Transformed tile number

unsigned invTransformTileIndex(unsigned tile, unsigned numTiles, unsigned offset, bool ascending)

Transform a tile index such that the result begins at an offset and increments or decrements.

Parameters
  • tile – The tile number to transform.

  • numTiles – The number of tiles on the target device.

  • offset – The offset to the first tile used for the mapping after the transform takes place.

  • ascendingOrder – Mapping order after the transform takes place: If true, the first tile used = offset and tiles are allocated in increasing order If false, the first tile used = (number of device tiles -1 - offset) and tiles are allocated in decreasing order

Returns

Transformed tile number

class TensorUseTracker
#include <TileMapping.hpp>

Class that tracks the usage of data on different tiles.

If data is broadcast to many tiles, it is sometimes efficient to map the data so that it is spread evenly amongst the tiles that use it.

This class can collect information about the use of data and then calculate a suitable tile mapping.

Public Types

enum MappingMethod

Values:

enumerator OptimizeHaloRegions

Map “halo regions” to single tiles.

These are regions that are used by multiple tiles but have neighbouring regions used by subsets of those tiles.

enumerator ConstrainMappingToUsedTiles

Mapping of elements is constrained to be only on tiles that use them.

Otherwise, to meet grain size constraints, elements may be mapped to tiles which do not use them.

enumerator None

No mapping method used.

Public Functions

TensorUseTracker(unsigned numTiles, unsigned startTile = 0, bool ascendingMappingOrder = true)
TensorUseTracker(const TensorUseTracker &other)
TensorUseTracker(TensorUseTracker &&other)
TensorUseTracker &operator=(const TensorUseTracker &other)
TensorUseTracker &operator=(TensorUseTracker &&other)
~TensorUseTracker()
void add(const poplar::Graph &graph, unsigned tile, const poplar::Tensor &t)

Add a data use case.

Parameters
  • graph – The Poplar graph being tracked.

  • tile – The tile that the use occurs on.

  • t – The tensor representing the data being used.

void add(TensorUseTracker other)

Add data use cases from another tracker.

Parameters

other – The TensorUseTracker to merge data use information from.

void resolve(const poplar::Graph &graph, unsigned grainSize, unsigned minElementsPerTile, bool extendPartialUsage = false, TensorUseTracker::MappingMethod mappingMethod = TensorUseTracker::MappingMethod::None)

Resolve data uses for mapping.

Data used on multiple tiles will have their uses spread across those tiles.

Parameters
  • graph – The Poplar graph being tracked.

  • grainSize – The number of elements mapped to each tile will be an integer multiple of the grain size.

  • minElementsPerTile – The minimum number of elements that must be mapped to a tile.

  • extendPartialUsage – When set, partial uses of tensors will be extended to cover the entire tensor, based on the usage of neighbouring regions.

  • mappingMethod – Method used for mapping elements.

void mapTensorsByUse(poplar::Graph &graph, unsigned grainSize, unsigned minElementsPerTile, bool extendPartialUsage = false, TensorUseTracker::MappingMethod mappingMethod = TensorUseTracker::MappingMethod::None)

Map data according to use.

This function will set the tile mapping of variable regions based on tracked data uses. Variable regions with uses on multiple tiles will have their elements spread across those tiles.

Parameters
  • graph – The Poplar graph being tracked.

  • grainSize – The number of elements mapped to each tile will be an integer multiple of the grain size.

  • minElementsPerTile – The minimum number of elements that must be mapped to a tile.

  • extendPartialUsage – When set, partial uses of tensors will be extended to cover the entire tensor, based on the usage of neighbouring regions before mapping.

  • mappingMethod – Method used for mapping elements.

bool empty() const

Have any use cases been registered.

Returns

True if no data use cases, false otherwise

Private Members

std::unique_ptr<TensorUseTrackerState> st