TileMapping
#include <poputil/TileMapping.hpp>
Functions for handling the mapping of tensors to tiles.
-
namespace poputil
General utility functions for building graphs.
Functions
-
std::vector<std::vector<poplar::Interval>> calcLinearTileMapping(const poplar::Graph &graph, std::vector<std::size_t> shape, unsigned minElementsPerTile, unsigned grainSize, unsigned offset = 0, bool ascendingOrder = true)
Calculate a tile mapping that spreads the tensor evenly over the tiles in a graph.
By default the indices of the resulting mapping go from from low to high tile numbers, however offset and direction can be specified.
- Parameters
graph – The graph to calculate the mapping for.
shape – The shape of the tensor to be mapped: a vector containing the size of each dimension of the tensor.
minElementsPerTile – The minimum number of tensor elements to be allocated to a tile.
grainSize – The number of elements mapped to each tile will be an integer multiple of the grain size.
offset – The offset to the first tile used for mapping
ascendingOrder – If true, the first tile used = offset and tiles are allocated in increasing order If false, the first tile used = (number of device tiles -1 - offset) and tiles are allocated in decreasing order
- Returns
A vector specifying the mapping
-
std::vector<std::vector<poplar::Interval>> calcLinearTileMapping(const poplar::Graph &graph, const poplar::Tensor &t, unsigned offset = 0, bool ascendingOrder = true)
Calculate a tile mapping that spreads the tensor evenly over the tiles in a graph.
By default the indices of the resulting mapping go from from low to high tile numbers, however offset and direction can be specified.
In this case the elements are distributed so that groups of elements of the device’s natural vector width will not be split. It effectively sets the grain size to the natural vector width for the data type. This means the number of elements on each tile will be a multiple of the natural vector width and the index of the first element is aligned to the natural vector width.
The natural vector width is the largest vector width supported in hardware for arithmetic operations on that data type.
It will also try to keep at least 128 bytes of data on each tile to avoid high exchange costs.
- Parameters
graph – The graph to add the operation to.
t – The tensor to be mapped
offset – The offset to the first tile used for mapping
ascendingOrder – If true, the first tile used = offset and tiles are allocated in increasing order If false, the first tile used = (number of device tiles - 1 - offset) and tiles are allocated in decreasing order
- Returns
A vector specifying the mapping
-
void mapTensorLinearly(poplar::Graph &graph, const poplar::Tensor &t, unsigned minElementsPerTile, unsigned grainSize)
Map the specified tensor, spreading the tensor evenly over the tiles in a graph.
The indices of the flattened tensor are mapped from low to high tile numbers, however offset and direction can be specified.
- Parameters
graph – The graph to calculate the mapping for.
t – The tensor to be mapped.
minElementsPerTile – The minimum number of tensor elements to be allocated to a tile.
grainSize – The number of elements mapped to each tile will be an integer multiple of the grain size.
offset – The offset to the first tile used for mapping
ascendingOrder – If true, the first tile used = offset and tiles are allocated in increasing order If false, the first tile used = (number of device tiles -1 - offset) and tiles are allocated in decreasing order
-
void mapTensorLinearlyWithOffset(poplar::Graph &graph, const poplar::Tensor &t, unsigned minElementsPerTile, unsigned grainSize, unsigned offset, bool ascendingOrder = true)
-
void mapTensorLinearly(poplar::Graph &graph, const poplar::Tensor &t)
Map the specified tensor, spreading the tensor evenly over the tiles in a graph.
The indices of the flattened tensor are mapped from low to high tile numbers, however offset and direction can be specified.
In this case the elements are distributed so that groups of elements of the device’s natural vector width will not be split. It effectively sets the grain size to the natural vector width for the data type. This means the number of elements on each tile will be a multiple of the natural vector width and the index of the first element is aligned to the natural vector width.
The natural vector width is the largest vector width supported in hardware for arithmetic operations on that data type.
It will also try to keep at least 128 bytes of data on each tile to avoid high exchange costs.
- Parameters
graph – The graph to add the operation to.
t – The tensor to be mapped.
offset – The offset to the first tile used for mapping
ascendingOrder – If true, the first tile used = offset and tiles are allocated in increasing order If false, the first tile used = (number of device tiles -1 - offset) and tiles are allocated in decreasing order
-
void mapTensorLinearlyWithOffset(poplar::Graph &graph, const poplar::Tensor &t, unsigned startTile = 0, bool ascendingOrder = true)
-
std::size_t chooseMappingOffset(std::size_t numTiles, const std::vector<std::size_t> &shape)
Choose a offset for use with tensor mapping functions using a hash of the shape provided with the seed provided.
- Parameters
numTiles – The number of tiles of the intended target device.
shape – The shape to produce a hash of.
seed – Optional seed to use in producing the hash.
- Returns
The selected offset in the range 0 to numTiles - 1
-
std::size_t chooseMappingOffset(std::size_t numTiles, const std::vector<std::size_t> &shape, std::size_t seed)
-
unsigned getTileImbalance(const poplar::Graph::TileToTensorMapping &mapping, unsigned minElementsPerTile = 0, unsigned grainSize = 1)
Determine how unbalanced a tensor is when mapped over tiles in a graph.
This reports how well a tensor mapping compares with the mapping based on a given number of elements per tile.
- Parameters
mapping – The current tile mapping of the tensor.
minElementsPerTile – The suggested minimum number of elements per tile.
grainSize – The number of elements mapped to each tile would be an integer multiple of the suggested grain size.
- Returns
The maximum number of elements greater than expected on any tile.
-
unsigned getTileImbalance(const poplar::Graph &graph, const poplar::Tensor &t, unsigned minElementsPerTile = 0, unsigned grainSize = 1)
Determine how unbalanced a tensor is mapped over tiles.
This compares the way a tensor is mapped to a set of tiles to the mapping based on a given number of elements per tile.
- Parameters
graph – The graph containing the mapped tensor.
mapping – The tensor currently mapped to tiles in the graph.
minElementsPerTile – The suggested minimum number of elements per tile.
grainSize – The number of elements mapped to each tile would be an integer multiple of the suggested grain size.
- Returns
The maximum number of elements greater than expected on any tile.
-
poplar::Tensor cloneToIpu(poplar::Graph &graph, const poplar::Tensor &t, unsigned dstIPU, const poplar::DebugContext &debugContext = {}, poplar::TensorCloneMethod method = poplar::TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES)
Create a clone of the specified tensor on the specified IPU.
The cloned tensor is mapped to the IPU in such a way that the mapping of tensor elements to tiles is preserved.
- Parameters
graph – The graph representing the entire multi-IPU device.
t – The tensor to clone.
dstIPU – The index of the IPU to clone the tensor onto.
name – A debug name to give to any new tensors allocated in the graph during the clone. If this is empty then the debug names will be derived from existing tensor debug names.
method – The method to use for the cloning.
- Returns
The cloned tensor.
-
poplar::Tensor cloneToGraph(poplar::Graph &srcGraph, poplar::Graph &dstGraph, const poplar::Tensor &t, const poplar::DebugContext &debugContext = {}, poplar::TensorCloneMethod method = poplar::TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES)
Create a clone of the specified tensor on the specified graph.
The cloned tensor is mapped to the destination graph in such a way that the mapping of tensor elements to tiles is preserved.
Note
It is assumed that the destination graph has enough tiles to clone the input tensor. This includes any gaps in the tile mapping. This means the maximum mapped tile of
t
in the source graph must be less thandstGraph.getTarget().getNumTiles()
.- Parameters
srcGraph – The graph representing the source tiles.
dstGraph – The graph representing the destination tiles.
t – The tensor to clone.
debugContext – Optional debug information
method – The method to use for the cloning.
- Returns
The cloned tensor.
-
poplar::Tensor copyToIpu(poplar::Graph &masterGraph, const poplar::Tensor &t, poplar::program::Sequence &prog, unsigned dstIPU, const poplar::DebugContext &debugContext = {}, poplar::TensorCloneMethod method = poplar::TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES)
Move a tensor from one IPU to another.
The tensor is moved by duplicating it, mapping the clone onto another IPU, and copying the original tensor values to the new one.
- Parameters
masterGraph – The graph representing the entire multi-IPU device.
t – The tensor to move from one IPU to another.
prog – A program sequence to add the Copy to.
dstIPU – The index of the IPU onto which the tensor will be moved.
debugContext – A debug name to give to the tensor created on dstIPU. If this is empty then the debug names will be derived from existing tensor debug names.
method – The method to use for cloning of the tensor on the destination IPU.
- Returns
The new tensor on the specified IPU.
-
poplar::Tensor createIpuCopy(poplar::Graph &graph, const poplar::Tensor &t, unsigned dstIpu, poplar::Tensor ©Src, poplar::Tensor ©Dst, const poplar::DebugContext &debugContext = {}, poplar::TensorCloneMethod method = poplar::TensorCloneMethod::PRESERVE_ORDER_AND_ALIASES)
Prepare to move a tensor from one IPU to another.
The tensor is duplicated and the clone is mapped onto another IPU. References to source and destination tensors are provided for use by an inter-IPU copy.
The necessary copy operation is not added to the program.
- Parameters
masterGraph – The graph representing the entire multi-IPU device.
t – The tensor to move from one IPU to another.
dstIPU – The index of the IPU onto which the tensor will be moved.
copySrc – A tensor that can be used as the source to do the copy.
copyDst – A tensor that can be used as the destination of the copy.
debugContext – A debug name to give to the tensor created on dstIPU. If this is empty then the debug names will be derived from existing tensor debug names.
method – The method to use for cloning of the tensor on the destination IPU.
- Returns
The new tensor on the specified IPU.
-
bool dimIsSplitOverTiles(const poplar::Graph &graph, const poplar::Tensor &t, unsigned dimension)
Check if a dimension of a tensor is split over more than one tile.
Examines the mapping of the specified tensor to see if the specified dimension is split over more than one tile.
- Parameters
graph – The graph to examine.
t – The tensor to check.
dimension – The dimension to check.
- Returns
True if elements of the given dimension are spread over more than one tile.
-
bool dimIsSplitOverIPUs(const poplar::Graph &graph, const poplar::Tensor &t, unsigned dimension)
Check if a dimension of a tensor is split over more than one IPU.
Examines the mapping of the specified tensor to see if the specified dimension is split over more than one IPU.
- Parameters
graph – The graph to examine.
t – The tensor to check.
dimension – The dimension to check.
- Returns
True if elements of the given dimension are spread over more than one IPU.
-
poplar::Tensor createBroadcastOperand(poplar::Graph &graph, const poplar::Tensor &fullTensor, const poplar::Type &type, unsigned dim, bool ditherMapping = false, const poplar::DebugContext &debugContext = {})
Create a simpler tensor that is mapped in the same way as another, full, tensor.
The full tensor is typically a left hand side operand of an operation while the created tensor is the right hand side. The created tensor has one dimension, which is the same size as the specified dimension of the full tensor.
Because the created tensor has the same mapping as the full tensor, it reduces the amount of data exchange or copies that are required for an operation using the two tensors.
- Parameters
graph – The graph which the output tensor is added to.
fullTensor – The tensor mapping for the output tensor is copied from this tensor.
type – The type of the output tensor.
dim – The dimension of the input tensor which is the size of the created tensor.
ditherMapping – Enable dithering to be applied to the mapping of the output tensor.
debugContext – Optional debug information.
- Returns
The created output tensor.
-
unsigned transformTileIndex(unsigned tile, unsigned numTiles, unsigned offset, bool ascending)
Transform a tile index such that the result begins at zero and increments.
- Parameters
tile – The tile number to transform.
numTiles – The number of tiles on the target device.
offset – The offset to the first tile used for the mapping before the transform takes place.
ascendingOrder – Mapping order before the transform takes place: If true, the first tile used = offset and tiles are allocated in increasing order If false, the first tile used = (number of device tiles -1 - offset) and tiles are allocated in decreasing order
- Returns
Transformed tile number
-
unsigned invTransformTileIndex(unsigned tile, unsigned numTiles, unsigned offset, bool ascending)
Transform a tile index such that the result begins at an offset and increments or decrements.
- Parameters
tile – The tile number to transform.
numTiles – The number of tiles on the target device.
offset – The offset to the first tile used for the mapping after the transform takes place.
ascendingOrder – Mapping order after the transform takes place: If true, the first tile used = offset and tiles are allocated in increasing order If false, the first tile used = (number of device tiles -1 - offset) and tiles are allocated in decreasing order
- Returns
Transformed tile number
-
class TensorUseTracker
- #include <TileMapping.hpp>
Class that tracks the usage of data on different tiles.
If data is broadcast to many tiles, it is sometimes efficient to map the data so that it is spread evenly amongst the tiles that use it.
This class can collect information about the use of data and then calculate a suitable tile mapping.
Public Types
-
enum MappingMethod
Values:
-
enumerator OptimizeHaloRegions
Map “halo regions” to single tiles.
These are regions that are used by multiple tiles but have neighbouring regions used by subsets of those tiles.
-
enumerator ConstrainMappingToUsedTiles
Mapping of elements is constrained to be only on tiles that use them.
Otherwise, to meet grain size constraints, elements may be mapped to tiles which do not use them.
-
enumerator None
No mapping method used.
-
enumerator OptimizeHaloRegions
Public Functions
-
TensorUseTracker(unsigned numTiles, unsigned startTile = 0, bool ascendingMappingOrder = true)
-
TensorUseTracker(const TensorUseTracker &other)
-
TensorUseTracker(TensorUseTracker &&other)
-
TensorUseTracker &operator=(const TensorUseTracker &other)
-
TensorUseTracker &operator=(TensorUseTracker &&other)
-
~TensorUseTracker()
-
void add(const poplar::Graph &graph, unsigned tile, const poplar::Tensor &t)
Add a data use case.
- Parameters
graph – The Poplar graph being tracked.
tile – The tile that the use occurs on.
t – The tensor representing the data being used.
-
void add(TensorUseTracker other)
Add data use cases from another tracker.
- Parameters
other – The
TensorUseTracker
to merge data use information from.
-
void resolve(const poplar::Graph &graph, unsigned grainSize, unsigned minElementsPerTile, bool extendPartialUsage = false, TensorUseTracker::MappingMethod mappingMethod = TensorUseTracker::MappingMethod::None)
Resolve data uses for mapping.
Data used on multiple tiles will have their uses spread across those tiles.
- Parameters
graph – The Poplar graph being tracked.
grainSize – The number of elements mapped to each tile will be an integer multiple of the grain size.
minElementsPerTile – The minimum number of elements that must be mapped to a tile.
extendPartialUsage – When set, partial uses of tensors will be extended to cover the entire tensor, based on the usage of neighbouring regions.
mappingMethod – Method used for mapping elements.
-
void mapTensorsByUse(poplar::Graph &graph, unsigned grainSize, unsigned minElementsPerTile, bool extendPartialUsage = false, TensorUseTracker::MappingMethod mappingMethod = TensorUseTracker::MappingMethod::None)
Map data according to use.
This function will set the tile mapping of variable regions based on tracked data uses. Variable regions with uses on multiple tiles will have their elements spread across those tiles.
- Parameters
graph – The Poplar graph being tracked.
grainSize – The number of elements mapped to each tile will be an integer multiple of the grain size.
minElementsPerTile – The minimum number of elements that must be mapped to a tile.
extendPartialUsage – When set, partial uses of tensors will be extended to cover the entire tensor, based on the usage of neighbouring regions before mapping.
mappingMethod – Method used for mapping elements.
-
bool empty() const
Have any use cases been registered.
- Returns
True if no data use cases, false otherwise
-
enum MappingMethod
-
std::vector<std::vector<poplar::Interval>> calcLinearTileMapping(const poplar::Graph &graph, std::vector<std::size_t> shape, unsigned minElementsPerTile, unsigned grainSize, unsigned offset = 0, bool ascendingOrder = true)