TileMapping

#include <poputil/TileMapping.hpp>

Functions for handling the mapping of tensors to tiles.

namespace poputil

General utility functions for building graphs.

Functions

std::vector<std::vector<poplar::Interval>> calcLinearTileMapping(const poplar::Graph &graph, std::vector<std::size_t> shape, unsigned minElementsPerTile, unsigned grainSize, unsigned offset = 0, bool ascendingOrder = true)

Calculate a tile mapping that spreads the tensor evenly over the tiles in a graph.

By default the indices of the resulting mapping go from from low to high tile numbers, however offset and direction can be specified.

Parameters
  • graph – The graph to calculate the mapping for.

  • shape – The shape of the tensor to be mapped: a vector containing the size of each dimension of the tensor.

  • minElementsPerTile – The minimum number of tensor elements to be allocated to a tile.

  • grainSize – The number of elements mapped to each tile will be an integer multiple of the grain size.

  • offset – The offset to the first tile used for mapping

  • ascendingOrder – If true, the first tile used = offset and tiles are allocated in increasing order If false, the first tile used = (number of device tiles -1 - offset) and tiles are allocated in decreasing order

Returns

A vector specifying the mapping

std::vector<std::vector<poplar::Interval>> calcLinearTileMapping(const poplar::Graph &graph, const poplar::Tensor &t, unsigned offset = 0, bool ascendingOrder = true)

Calculate a tile mapping that spreads the tensor evenly over the tiles in a graph.

By default the indices of the resulting mapping go from from low to high tile numbers, however offset and direction can be specified.

In this case the elements are distributed so that groups of elements of the device’s natural vector width will not be split. It effectively sets the grain size to the natural vector width for the data type. This means the number of elements on each tile will be a multiple of the natural vector width and the index of the first element is aligned to the natural vector width.

The natural vector width is the largest vector width supported in hardware for arithmetic operations on that data type.

It will also try to keep at least 128 bytes of data on each tile to avoid high exchange costs.

Parameters
  • graph – The graph to add the operation to.

  • t – The tensor to be mapped

  • offset – The offset to the first tile used for mapping

  • ascendingOrder – If true, the first tile used = offset and tiles are allocated in increasing order If false, the first tile used = (number of device tiles - 1 - offset) and tiles are allocated in decreasing order

Returns

A vector specifying the mapping

std::pair<poplar::Graph::TileToTensorMapping, unsigned> calcLinearTileMappingAndNewOffset(const poplar::Graph &graph, const poplar::Tensor &t, unsigned offset = 0)

Calculate a tile mapping that spreads the tensor evenly over the tiles in a graph.

This function is similar to poputil::calcLinearTileMapping but with an additional “new offset” output equal to the last plus one tile used for the mapping. For example, consider a target with 8 tiles and a resulting mapping over 4 tiles. The value of the returned offset will be:

  • 6 if offset = 2.

  • 2 if offset = 6.

Parameters
  • graph – The graph to add the operation to.

  • t – The tensor to be mapped

  • offset – The offset to the first tile used for mapping

Returns

A pair consisting of a vector specifying the mapping and the new advanced offset.

void mapTensorLinearly(poplar::Graph &graph, const poplar::Tensor &t, unsigned minElementsPerTile, unsigned grainSize)

Map the specified tensor, spreading the tensor evenly over the tiles in a graph.

The indices of the flattened tensor are mapped from low to high tile numbers.

Parameters
  • graph – The graph to calculate the mapping for.

  • t – The tensor to be mapped.

  • minElementsPerTile – The minimum number of tensor elements to be allocated to a tile.

  • grainSize – The number of elements mapped to each tile will be an integer multiple of the grain size.

void mapTensorLinearlyWithOffset(poplar::Graph &graph, const poplar::Tensor &t, unsigned minElementsPerTile, unsigned grainSize, unsigned offset, bool ascendingOrder = true)

Map the specified tensor, spreading the tensor evenly over the tiles in a graph.

The indices of the flattened tensor are mapped from low to high tile numbers, however offset and direction can be specified.

Parameters
  • graph – The graph to calculate the mapping for.

  • t – The tensor to be mapped.

  • minElementsPerTile – The minimum number of tensor elements to be allocated to a tile.

  • grainSize – The number of elements mapped to each tile will be an integer multiple of the grain size.

  • offset – The offset to the first tile used for mapping

  • ascendingOrder – If true, the first tile used = offset and tiles are allocated in increasing order. If false, the first tile used = (number of device tiles -1 - offset) and tiles are allocated in decreasing order.

void mapTensorLinearly(poplar::Graph &graph, const poplar::Tensor &t)

Map the specified tensor, spreading the tensor evenly over the tiles in a graph.

The indices of the flattened tensor are mapped from low to high tile numbers.

In this case the elements are distributed so that groups of elements of the device’s natural vector width will not be split. It effectively sets the grain size to the natural vector width for the data type. This means the number of elements on each tile will be a multiple of the natural vector width and the index of the first element is aligned to the natural vector width.

The natural vector width is the largest vector width supported in hardware for arithmetic operations on that data type.

It will also try to keep at least 128 bytes of data on each tile to avoid high exchange costs.

Parameters
  • graph – The graph to add the operation to.

  • t – The tensor to be mapped.

void mapTensorLinearlyWithOffset(poplar::Graph &graph, const poplar::Tensor &t, unsigned offset = 0, bool ascendingOrder = true)

Map the specified tensor, spreading the tensor evenly over the tiles in a graph.

The indices of the flattened tensor are mapped from low to high tile numbers, however offset and direction can be specified. In this case the elements are distributed so that groups of elements of the device’s natural vector width will not be split. It effectively sets the grain size to the natural vector width for the data type. This means the number of elements on each tile will be a multiple of the natural vector width and the index of the first element is aligned to the natural vector width.

The natural vector width is the largest vector width supported in hardware for arithmetic operations on that data type.

It will also try to keep at least 128 bytes of data on each tile to avoid high exchange costs.

Parameters
  • graph – The graph to calculate the mapping for.

  • t – The tensor to be mapped.

  • offset – The offset to the first tile used for mapping.

  • ascendingOrder – If true, the first tile used = offset and tiles are allocated in increasing order. If false, the first tile used = (number of device tiles -1 - offset) and tiles are allocated in decreasing order.

std::size_t chooseMappingOffset(std::size_t numTiles, const std::vector<std::size_t> &shape)

Choose an offset for use with tensor mapping functions using a hash of the shape provided.

Parameters
  • numTiles – The number of tiles of the intended target device.

  • shape – The shape to produce a hash of.

Returns

The selected offset in the range 0 to numTiles - 1

std::size_t chooseMappingOffset(std::size_t numTiles, const std::vector<std::size_t> &shape, std::size_t seed)

Choose an offset for use with tensor mapping functions using a hash of the shape, and a seed.

Parameters
  • numTiles – The number of tiles of the intended target device.

  • shape – The shape to produce a hash of.

  • seed – Optional seed to use in producing the hash.

Returns

The selected offset in the range 0 to numTiles - 1

unsigned getTileImbalance(const poplar::Graph::TileToTensorMapping &mapping, unsigned minElementsPerTile = 0, unsigned grainSize = 1)

Determine how unbalanced a tensor is when mapped over tiles in a graph.

This reports how well a tensor mapping compares with the mapping based on a given number of elements per tile.

Parameters
  • mapping – The current tile mapping of the tensor.

  • minElementsPerTile – The suggested minimum number of elements per tile.

  • grainSize – The number of elements mapped to each tile would be an integer multiple of the suggested grain size.

Returns

The maximum number of elements greater than expected on any tile.

unsigned getTileImbalance(const poplar::Graph &graph, const poplar::Tensor &t, unsigned minElementsPerTile = 0, unsigned grainSize = 1)

Determine how unbalanced a tensor is mapped over tiles.

This compares the way a tensor is mapped to a set of tiles to the mapping based on a given number of elements per tile.

Parameters
  • graph – The graph containing the mapped tensor.

  • mapping – The tensor currently mapped to tiles in the graph.

  • minElementsPerTile – The suggested minimum number of elements per tile.

  • grainSize – The number of elements mapped to each tile would be an integer multiple of the suggested grain size.

Returns

The maximum number of elements greater than expected on any tile.

poplar::Tensor cloneToIpu(poplar::Graph &graph, const poplar::Tensor &t, unsigned dstIPU, const poplar::DebugContext &debugContext = {}, poplar::TensorCloneMethod method = poplar::TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES)

Create a clone of the specified tensor on the specified IPU.

The cloned tensor is mapped to the IPU in such a way that the mapping of tensor elements to tiles is preserved.

Parameters
  • graph – The graph representing the entire multi-IPU device.

  • t – The tensor to clone.

  • dstIPU – The index of the IPU to clone the tensor onto.

  • name – A debug name to give to any new tensors allocated in the graph during the clone. If this is empty then the debug names will be derived from existing tensor debug names.

  • method – The method to use for the cloning.

Throws

poputil::poplibs_error – If dstIPU is greater than or equal to the number of IPUs targeted by the graph.

Returns

The cloned tensor.

poplar::Tensor cloneToGraph(poplar::Graph &srcGraph, poplar::Graph &dstGraph, const poplar::Tensor &t, const poplar::DebugContext &debugContext = {}, poplar::TensorCloneMethod method = poplar::TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES)

Create a clone of the specified tensor on the specified graph.

The cloned tensor is mapped to the destination graph in such a way that the mapping of tensor elements to tiles is preserved.

Note

It is assumed that the destination graph has enough tiles to clone the input tensor. This includes any gaps in the tile mapping. This means the maximum mapped tile of t in the source graph must be less than dstGraph.getTarget().getNumTiles().

Parameters
  • srcGraph – The graph representing the source tiles.

  • dstGraph – The graph representing the destination tiles.

  • t – The tensor to clone.

  • debugContext – Optional debug information

  • method – The method to use for the cloning.

Returns

The cloned tensor.

std::pair<poplar::Tensor, unsigned> cloneAndExpandAliasing(poplar::Graph &graph, const poplar::Tensor &t, unsigned offset = 0, const poplar::DebugContext &debugContext = {})

Create a clone of the specified tensor on the specified graph.

The cloned tensor is mapped to the graph in such a way that the mapping of tensor elements to tiles is preserved. If the source tensor consists of aliasing intervals, these will be made non-aliasing in the cloned tensor and mapped linearly accross the tiles with the specified tile offset. The remapping is done as a precautionary measure to reduce the chance of getting out of memory issues on a tile which has many aliasing elements.

In addition to the cloned tensor, this function returns “new offset” output equal to the last plus one tile used for the mapping of the expanded aliasing elements. See poputil::calcLinearTileMappingAndNewOffset for more details.

Parameters
  • graph – The graph to add the operation to.

  • t – The tensor to clone.

  • offset – The offset to the first tile used for mapping the elements of the resulting tensor corresponding to aliasing elements of the source tensor.

  • debugContext – Optional debug information

Returns

A pair consisting of the cloned tensor and the new advanced offset.

poplar::Tensor copyToIpu(poplar::Graph &masterGraph, const poplar::Tensor &t, poplar::program::Sequence &prog, unsigned dstIPU, const poplar::DebugContext &debugContext = {}, poplar::TensorCloneMethod method = poplar::TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES)

Move a tensor from one IPU to another.

The tensor is moved by duplicating it, mapping the clone onto another IPU, and copying the original tensor values to the new one.

Parameters
  • masterGraph – The graph representing the entire multi-IPU device.

  • t – The tensor to move from one IPU to another.

  • prog – A program sequence to add the Copy to.

  • dstIPU – The index of the IPU onto which the tensor will be moved.

  • debugContext – A debug name to give to the tensor created on dstIPU. If this is empty then the debug names will be derived from existing tensor debug names.

  • method – The method to use for cloning of the tensor on the destination IPU.

Throws

poputil::poplibs_error – If dstIPU is greater than or equal to the number of IPUs targeted by the graph.

Returns

The new tensor on the specified IPU.

poplar::Tensor createIpuCopy(poplar::Graph &graph, const poplar::Tensor &t, unsigned dstIpu, poplar::Tensor &copySrc, poplar::Tensor &copyDst, const poplar::DebugContext &debugContext = {}, poplar::TensorCloneMethod method = poplar::TensorCloneMethod::PRESERVE_ORDER_AND_ALIASES)

Prepare to move a tensor from one IPU to another.

The tensor is duplicated and the clone is mapped onto another IPU. References to source and destination tensors are provided for use by an inter-IPU copy.

The necessary copy operation is not added to the program.

Parameters
  • masterGraph – The graph representing the entire multi-IPU device.

  • t – The tensor to move from one IPU to another.

  • dstIPU – The index of the IPU onto which the tensor will be moved.

  • copySrc – A tensor that can be used as the source to do the copy.

  • copyDst – A tensor that can be used as the destination of the copy.

  • debugContext – A debug name to give to the tensor created on dstIPU. If this is empty then the debug names will be derived from existing tensor debug names.

  • method – The method to use for cloning of the tensor on the destination IPU.

Throws

poputil::poplibs_error – If dstIPU is greater than or equal to the number of IPUs targeted by the graph.

Returns

The new tensor on the specified IPU.

bool dimIsSplitOverTiles(const poplar::Graph &graph, const poplar::Tensor &t, unsigned dimension)

Check if a dimension of a tensor is split over more than one tile.

Examines the mapping of the specified tensor to see if the specified dimension is split over more than one tile.

Parameters
  • graph – The graph to examine.

  • t – The tensor to check.

  • dimension – The dimension to check.

Returns

True if elements of the given dimension are spread over more than one tile.

bool dimIsSplitOverIPUs(const poplar::Graph &graph, const poplar::Tensor &t, unsigned dimension)

Check if a dimension of a tensor is split over more than one IPU.

Examines the mapping of the specified tensor to see if the specified dimension is split over more than one IPU.

Parameters
  • graph – The graph to examine.

  • t – The tensor to check.

  • dimension – The dimension to check.

Returns

True if elements of the given dimension are spread over more than one IPU.

poplar::Tensor createBroadcastOperand(poplar::Graph &graph, const poplar::Tensor &fullTensor, const poplar::Type &type, unsigned dim, bool ditherMapping = false, const poplar::DebugContext &debugContext = {})

Create a simpler tensor that is mapped in the same way as another, full, tensor.

The full tensor is typically a left hand side operand of an operation while the created tensor is the right hand side. The created tensor has one dimension, which is the same size as the specified dimension of the full tensor.

Because the created tensor has the same mapping as the full tensor, it reduces the amount of data exchange or copies that are required for an operation using the two tensors.

Parameters
  • graph – The graph which the output tensor is added to.

  • fullTensor – The tensor mapping for the output tensor is copied from this tensor.

  • type – The type of the output tensor.

  • dim – The dimension of the input tensor which is the size of the created tensor.

  • ditherMapping – Enable dithering to be applied to the mapping of the output tensor.

  • debugContext – Optional debug information.

Returns

The created output tensor.

unsigned transformTileIndex(unsigned tile, unsigned numTiles, unsigned offset, bool ascending)

Transform a tile index such that the result begins at zero and increments.

Parameters
  • tile – The tile number to transform.

  • numTiles – The number of tiles on the target device.

  • offset – The offset to the first tile used for the mapping before the transform takes place.

  • ascendingOrder – Mapping order before the transform takes place: If true, the first tile used = offset and tiles are allocated in increasing order. If false, the first tile used = (number of device tiles -1 - offset) and tiles are allocated in decreasing order.

Returns

Transformed tile number.

unsigned invTransformTileIndex(unsigned tile, unsigned numTiles, unsigned offset, bool ascending)

Transform a tile index such that the result begins at an offset and increments or decrements.

Parameters
  • tile – The tile number to transform.

  • numTiles – The number of tiles on the target device.

  • offset – The offset to the first tile used for the mapping after the transform takes place.

  • ascendingOrder – Mapping order after the transform takes place: If true, the first tile used = offset and tiles are allocated in increasing order. If false, the first tile used = (number of device tiles -1 - offset) and tiles are allocated in decreasing order.

Returns

Transformed tile number

class TensorUseTracker
#include <TileMapping.hpp>

Class that tracks the usage of data on different tiles.

If data is broadcast to many tiles, it is sometimes efficient to map the data so that it is spread evenly amongst the tiles that use it.

This class can collect information about the use of data and then calculate a suitable tile mapping.

Public Types

enum MappingMethod

Values:

enumerator OptimizeHaloRegions

Map “halo regions” to single tiles.

These are regions that are used by multiple tiles but have neighbouring regions used by subsets of those tiles.

enumerator ConstrainMappingToUsedTiles

Mapping of elements is constrained to be only on tiles that use them.

Otherwise, to meet grain size constraints, elements may be mapped to tiles which do not use them.

enumerator None

No mapping method used.

Public Functions

TensorUseTracker(unsigned numTiles, unsigned startTile = 0, bool ascendingMappingOrder = true)

Constructor for the TensorUseTracker class.

Parameters
  • numTiles – The number of tiles to track data use of.

  • startTile – The tile to start tracking data use on.

  • ascendingMappingOrder – If true, the first tile used = startTile and tiles are allocated in increasing order. If false, the first tile used = (number of device tiles -1

    • startTile) and tiles are allocated in decreasing order.

TensorUseTracker(const TensorUseTracker &other)

Constructor for the TensorUseTracker class.

TensorUseTracker(TensorUseTracker &&other)

Default constructor for the TensorUseTracker class.

TensorUseTracker &operator=(const TensorUseTracker &other)

Assignment operator for the TensorUseTracker class.

TensorUseTracker &operator=(TensorUseTracker &&other)

Default assignment operator for the TensorUseTracker class.

~TensorUseTracker()

Destructor for the TensorUserTracker class.

void add(const poplar::Graph &graph, unsigned tile, const poplar::Tensor &t)

Add a case of data usage.

Parameters
  • graph – The Poplar graph being tracked.

  • tile – The tile that the use occurs on.

  • t – The tensor representing the data being used.

void add(TensorUseTracker other)

Add cases of data usage from another tracker.

Parameters

other – The TensorUseTracker to merge data usage information from.

void resolve(const poplar::Graph &graph, unsigned grainSize, unsigned minElementsPerTile, bool extendPartialUsage = false, TensorUseTracker::MappingMethod mappingMethod = TensorUseTracker::MappingMethod::None)

Resolve data usage for mapping.

Data used on multiple tiles will have their usage spread across those tiles.

Parameters
  • graph – The Poplar graph being tracked.

  • grainSize – The number of elements mapped to each tile will be an integer multiple of the grain size.

  • minElementsPerTile – The minimum number of elements that must be mapped to a tile.

  • extendPartialUsage – When set, partial usage of tensors will be extended to cover the entire tensor, based on the usage of neighbouring regions.

  • mappingMethod – Method used for mapping elements.

void mapTensorsByUse(poplar::Graph &graph, unsigned grainSize, unsigned minElementsPerTile, bool extendPartialUsage = false, TensorUseTracker::MappingMethod mappingMethod = TensorUseTracker::MappingMethod::None)

Map data according to use.

This function will set the tile mapping of variable regions based on tracked data use. Variable regions with usage on multiple tiles will have their elements spread across those tiles.

Parameters
  • graph – The Poplar graph being tracked.

  • grainSize – The number of elements mapped to each tile will be an integer multiple of the grain size.

  • minElementsPerTile – The minimum number of elements that must be mapped to a tile.

  • extendPartialUsage – When set, partial use of tensors will be extended to cover the entire tensor, based on the usage of neighbouring regions before mapping.

  • mappingMethod – Method used for mapping elements.

bool empty() const

Check if any cases of data usage have been registered.

Returns

If true, no cases have been registered. If false, cases have been registered.

Private Members

std::unique_ptr<TensorUseTrackerState> st