Gather

#include <popops/Gather.hpp>

Support for gather operations.

namespace popops

Common functions, such as elementwise and reductions.

map

Map an expression across tensors.

Elementwise Options

enableGenerateCodelet (true, false) [=true]

When true, Poplar generates a codelet to execute the map operation, provided all of the conditions below are met; otherwise it sequences PopLibs codelets to create the expression. The conditions are:
- All of the inputs are of the same size
- Inputs do not alias
- Multiple operations are being performed

mapInPlace

Update the input tensors with the result of map().

mapWithOutput

Write the result of map() to the given output tensor.
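
The same three forms (a function returning a new tensor, an InPlace form and a WithOutput form) recur for the named operations that follow, for example add(), addInPlace() and addWithOutput(). As a rough host-side analogy only, the sketch below uses std::vector in place of poplar::Tensor. The helper names addModel, addInPlaceModel and addWithOutputModel are hypothetical and are not part of popops, and the sketch requires equally sized operands for simplicity.

```cpp
#include <algorithm>
#include <functional>
#include <stdexcept>
#include <vector>

// Host-side analogy only: std::vector stands in for poplar::Tensor.
// These helper names are hypothetical and are not part of popops.

// Like add(): the result is returned in a newly allocated container.
std::vector<float> addModel(const std::vector<float> &a,
                            const std::vector<float> &b) {
  if (a.size() != b.size())
    throw std::invalid_argument("operands must have the same size");
  std::vector<float> out(a.size());
  std::transform(a.begin(), a.end(), b.begin(), out.begin(),
                 std::plus<float>());
  return out;
}

// Like addInPlace(): the first operand is overwritten with the result.
void addInPlaceModel(std::vector<float> &a, const std::vector<float> &b) {
  if (a.size() != b.size())
    throw std::invalid_argument("operands must have the same size");
  std::transform(a.begin(), a.end(), b.begin(), a.begin(),
                 std::plus<float>());
}

// Like addWithOutput(): the result is written to a caller-provided output.
void addWithOutputModel(const std::vector<float> &a,
                        const std::vector<float> &b,
                        std::vector<float> &out) {
  if (a.size() != b.size() || out.size() != a.size())
    throw std::invalid_argument("operands and output must have the same size");
  std::transform(a.begin(), a.end(), b.begin(), out.begin(),
                 std::plus<float>());
}
```

The returning form leaves both operands untouched, the in-place form reuses the storage of the first operand, and the output form lets the caller decide where the result lives.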

checkTypes

Check that the host compile-time type constType is compatible with the run-time IPU type elementType.

param elementType: The run-time IPU type.
param constant: Unused.

tparam constType: The host compile-time type.

throws std::runtime_error: If the types are not compatible.

varianceToInvStdDev

Convert variance to inverse standard deviation.

Each element in the output tensor is the result of 1 / sqrt(variance_value + epsilon), where variance_value is the corresponding element in variance.

Warning

If variance_value + epsilon is zero then the result will be invalid and this operation could generate a divide-by-zero floating-point exception (if enabled).

param graph: The graph to update.
param variance: A tensor of variance values.
param epsilon: A (typically small) scalar to add to the variance values, to avoid numerical issues (for example, divide by zero).
param prog: The sequence of programs to append this conversion operation to.
param debugContext: Optional debug information.

return: A tensor where each element is the inverse standard deviation.
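
To make the formula concrete, here is a minimal host-side reference model of the per-element calculation. It operates on std::vector rather than poplar::Tensor, and the name varianceToInvStdDevModel is hypothetical, not part of popops.

```cpp
#include <cmath>
#include <vector>

// Host-side reference model of the documented formula; not the popops API.
// result[i] = 1 / sqrt(variance[i] + epsilon)
std::vector<float> varianceToInvStdDevModel(const std::vector<float> &variance,
                                            float epsilon) {
  std::vector<float> result;
  result.reserve(variance.size());
  for (float v : variance)
    result.push_back(1.0f / std::sqrt(v + epsilon));
  return result;
}
```

With epsilon set to 0, a variance of 4 maps to 0.5 and a variance of 0.25 maps to 2. A non-zero epsilon keeps the result finite when a variance value is 0.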

invStdDevToVariance

Convert inverse standard deviation to variance.

Each element in the output tensor is the result of 1 / (invStdDev_value + epsilon)^2, where invStdDev_value is the corresponding element in invStdDev.

Warning

If invStdDev_value + epsilon is zero then the result will be invalid and this operation could generate a divide-by-zero floating-point exception (if enabled).

param graph: The graph to update.
param invStdDev: A tensor of inverse standard deviation values.
param epsilon: A (typically small) scalar to add to the inverse standard deviation values, to avoid numerical issues (for example, divide by zero).
param prog: The sequence of programs to append this conversion operation to.
param debugContext: Optional debug information.
param options: A list of flags to pass to the expression evaluator.

return: A tensor where each element is the variance.
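
As a sketch of the formula only, the per-element calculation can be modelled on the host as follows. The name invStdDevToVarianceModel is hypothetical and std::vector stands in for poplar::Tensor.

```cpp
#include <vector>

// Host-side reference model of the documented formula; not the popops API.
// result[i] = 1 / (invStdDev[i] + epsilon)^2
std::vector<float> invStdDevToVarianceModel(const std::vector<float> &invStdDev,
                                            float epsilon) {
  std::vector<float> result;
  result.reserve(invStdDev.size());
  for (float s : invStdDev) {
    const float shifted = s + epsilon;
    result.push_back(1.0f / (shifted * shifted));
  }
  return result;
}
```

Note that epsilon is added before squaring, so with a non-zero epsilon this is not an exact inverse of the 1 / sqrt(variance_value + epsilon) conversion.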

add

Add each element in A to the corresponding element in B.

param graph: The graph to update.
param A: A tensor of elements.
param B: A tensor of elements.
param prog: The sequence to extend with the execution of the expression evaluation.
param debugContext: Optional debug information.
param options: Element-wise options. See map().

return: A tensor where each element is the result of a + b, where a and b are the corresponding elements of A and B tensors respectively.

addInPlace

Update the tensor A with the result of add().

See add() for parameter descriptions.

addWithOutput

Write the result of add() to the given output tensor, out.

See add() for the remaining parameter descriptions.

param out: The tensor to write the results to.

sub

Subtract the elements of B from A and return the result in a new tensor.

param graph: The graph to update.
param A: The tensor of elements which will be subtracted from.
param B: The tensor of elements to subtract from A.
param prog: The sequence to extend with the execution of the expression evaluation.
param debugContext: Optional debug information.
param options: Element-wise options. See map().

return: A tensor where each element is equal to a - b, where a and b are the corresponding elements of A and B tensors respectively.

subInPlace

Update the tensor A with the result of sub().

See sub() for parameter descriptions.

subWithOutput

Write the result of sub() to the given output tensor, out.

See sub() for the remaining parameter descriptions.

param out: The tensor to write the results to.

mul

Multiply each element in A by the corresponding element in B.

param graph: The graph to update.
param A: A tensor of elements.
param B: A tensor of elements.
param prog: The sequence to extend with the execution of the expression evaluation.
param debugContext: Optional debug information.
param options: Element-wise options. See map().

return: A tensor where each element is the result of a * b, where a and b are the corresponding elements of A and B tensors respectively.

mulInPlace

Update the tensor A with the result of mul().

See mul() for parameter descriptions.

mulWithOutput

Write the result of mul() to the given output tensor, out.

See mul() for the remaining parameter descriptions.

param out: The tensor to write the results to.

div

Divide each element in A by the corresponding element in B.

param graph: The graph to update.
param A: The tensor of dividends.
param B: The tensor of divisors.
param prog: The sequence to extend with the execution of the expression evaluation.
param debugContext: Optional debug information.
param options: Element-wise options. See map().

return: A tensor where each element is the result of a / b, where a and b are the corresponding elements of A and B tensors respectively.

divInPlace

Update the tensor A with the result of div().

See div() for parameter descriptions.

divWithOutput

Write the result of div() to the given output tensor, out.

See div() for the remaining parameter descriptions.

param out: The tensor to write the results to.

pow

Compute each element in A to the power of the corresponding element in B.

param graph: The graph to update.
param A: The tensor of bases.
param B: The tensor of exponents.
param prog: The sequence to extend with the execution of the expression evaluation.
param debugContext: Optional debug information.
param options: Element-wise options. See map().

return: A tensor where each element is equal to pow(a, b), where a and b are the corresponding elements of A and B tensors respectively.

powInPlace

Update the tensor A with the result of pow().

See pow() for parameter descriptions.

powWithOutput

Write the result of pow() to the given output tensor, out.

See pow() for the remaining parameter descriptions.

param out: The tensor to write the results to.

rem

Compute the remainder of each element in A divided by the corresponding element in B.

param graph: The graph to update.
param A: The tensor of dividends.
param B: The tensor of divisors.
param prog: The sequence to extend with the execution of the expression evaluation.
param debugContext: Optional debug information.
param options: Element-wise options. See map().

return: A tensor where each element is equal to a % b, where a and b are the corresponding elements of A and B tensors respectively.

remInPlace

Update the tensor A with the result of rem().

See rem() for parameter descriptions.

remWithOutput

Write the result of rem() to the given output tensor, out.

See rem() for the remaining parameter descriptions.

param out: The tensor to write the results to.

bitwiseAnd

Compute the bitwise AND of each element in A with the corresponding element in B.

param graph: The graph to update.
param A: A tensor of elements.
param B: A tensor of elements.
param prog: The sequence to extend with the execution of the expression evaluation.
param debugContext: Optional debug information.
param options: Element-wise options. See map().

return: A tensor where each element is the result of a & b, where a and b are the corresponding elements of A and B tensors respectively.

bitwiseAndInPlace

Update the tensor A with the result of bitwiseAnd().

See bitwiseAnd() for parameter descriptions.

bitwiseAndWithOutput

Write the result of bitwiseAnd() to the given output tensor, out.

See bitwiseAnd() for the remaining parameter descriptions.

param out: The tensor to write the results to.

bitwiseOr

Compute the bitwise OR of each element in A with the corresponding element in B.

param graph: The graph to update.
param A: A tensor of elements.
param B: A tensor of elements.
param prog: The sequence to extend with the execution of the expression evaluation.
param debugContext: Optional debug information.
param options: Element-wise options. See map().

return: A tensor where each element is the result of a | b, where a and b are the corresponding elements of A and B tensors respectively.

bitwiseOrInPlace

Update the tensor A with the result of bitwiseOr().

See bitwiseOr() for parameter descriptions.

bitwiseOrWithOutput

Write the result of bitwiseOr() to the given output tensor, out.

See bitwiseOr() for the remaining parameter descriptions.

param out: The tensor to write the results to.

bitwiseXor

Compute the bitwise XOR of each element in A with the corresponding element in B.

param graph: The graph to update.
param A: A tensor of elements.
param B: A tensor of elements.
param prog: The sequence to extend with the execution of the expression evaluation.
param debugContext: Optional debug information.
param options: Element-wise options. See map().

return: A tensor where each element is the result of a ^ b, where a and b are the corresponding elements of A and B tensors respectively.

bitwiseXorInPlace

Update the tensor A with the result of bitwiseXor().

See bitwiseXor() for parameter descriptions.

bitwiseXorWithOutput

Write the result of bitwiseXor() to the given output tensor, out.

See bitwiseXor() for the remaining parameter descriptions.

param out: The tensor to write the results to.

bitwiseXnor

Compute the bitwise XNOR of each element in A with the corresponding element in B.

param graph: The graph to update.
param A: A tensor of elements.
param B: A tensor of elements.
param prog: The sequence to extend with the execution of the expression evaluation.
param debugContext: Optional debug information.
param options: Element-wise options. See map().

return: A tensor where each element is the result of ~(a ^ b), where a and b are the corresponding elements of A and B tensors respectively.
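
Bitwise XNOR is conventionally the bitwise complement of XOR, written ~(a ^ b) in C++, rather than a logical negation (which would collapse the value to 0 or 1). A minimal host-side model for a single 32-bit element, with the hypothetical name bitwiseXnorModel, could look like this:

```cpp
#include <cstdint>

// Host-side model for a single 32-bit element; not the popops API.
// Each result bit is set where the corresponding bits of a and b are equal.
std::uint32_t bitwiseXnorModel(std::uint32_t a, std::uint32_t b) {
  return ~(a ^ b);
}
```

For example, XNOR of 0xC and 0xA gives 0xFFFFFFF9: only bits 1 and 2 differ between the operands, so only those bits are clear in the result.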

bitwiseXnorInPlace

Update the tensor A with the result of bitwiseXnor().

See bitwiseXnor() for parameter descriptions.

bitwiseXnorWithOutput

Write the result of bitwiseXnor() to the given output tensor, out.

See bitwiseXnor() for the remaining parameter descriptions.

param out: The tensor to write the results to.

shiftLeft

Shift the elements of A left by the corresponding elements of B.

param graph: The graph to update.
param A: The tensor of elements to left-shift.
param B: The tensor of elements that describe the amount to left-shift A by.
param prog: The sequence to extend with the execution of the expression evaluation.
param debugContext: Optional debug information.
param options: Element-wise options. See map().

return: A tensor where each element is equal to a << b, where a and b are the corresponding elements of A and B tensors respectively.

shiftLeftInPlace

Update the tensor A with the result of shiftLeft().

See shiftLeft() for parameter descriptions.

shiftLeftWithOutput

Write the result of shiftLeft() to the given output tensor, out.

See shiftLeft() for the remaining parameter descriptions.

param out: The tensor to write the results to.

shiftRight

Shift the elements of A right by the corresponding elements of B.

param graph: The graph to update.
param A: The tensor of elements to right-shift.
param B: The tensor of elements that describe the amount to right-shift A by.
param prog: The sequence to extend with the execution of the expression evaluation.
param debugContext: Optional debug information.
param options: Element-wise options. See map().

return: A tensor where each element is equal to a >> b (without sign extension), where a and b are the corresponding elements of A and B tensors respectively.

shiftRightInPlace

Update the tensor A with the result of shiftRight().

See shiftRight() for parameter descriptions.

shiftRightWithOutput

Write the result of shiftRight() to the given output tensor, out.

See shiftRight() for the remaining parameter descriptions.

param out: The tensor to write the results to.

shiftRightSignExtend

Shift the elements of A right with sign extension by the corresponding elements of B.

param graph: The graph to update.
param A: The tensor of elements to right-shift.
param B: The tensor of elements that describe the amount to right-shift A by.
param prog: The sequence to extend with the execution of the expression evaluation.
param debugContext: Optional debug information.
param options: Element-wise options. See map().

return: A tensor where each element is equal to a >> b with sign extension, where a and b are the corresponding elements of A and B tensors respectively.
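
The difference between shiftRight() (no sign extension) and shiftRightSignExtend() only shows up for negative values. The sketch below models both on the host for a single 32-bit signed element. The function names are hypothetical, the shift amount is assumed to be less than 32, and the code avoids relying on the built-in >> for negative operands, whose result was implementation-defined before C++20.

```cpp
#include <cstdint>

// Host-side models for a single 32-bit element; not the popops API.
// Both functions assume b < 32.

// Logical shift: vacated high bits are filled with zeros.
std::int32_t shiftRightModel(std::int32_t a, unsigned b) {
  return static_cast<std::int32_t>(static_cast<std::uint32_t>(a) >> b);
}

// Arithmetic shift: vacated high bits are filled with copies of the sign bit.
std::int32_t shiftRightSignExtendModel(std::int32_t a, unsigned b) {
  const auto ua = static_cast<std::uint32_t>(a);
  // For negative values, shift the complement and complement back so that
  // ones rather than zeros enter from the left.
  const std::uint32_t shifted = a < 0 ? ~(~ua >> b) : (ua >> b);
  return static_cast<std::int32_t>(shifted);
}
```

Shifting -8 right by 1 gives 2147483644 without sign extension and -4 with sign extension; for non-negative inputs the two models agree.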

shiftRightSignExtendInPlace

Update the tensor A with the result of shiftRightSignExtend().

See shiftRightSignExtend() for parameter descriptions.

shiftRightSignExtendWithOutput

Write the result of shiftRightSignExtend() to the given output tensor, out.

See shiftRightSignExtend() for the remaining parameter descriptions.

param out: The tensor to write the results to.

logicalAnd

Compute the logical AND (&&) of each element in A with the corresponding element in B.

param graph: The graph to update.
param A: A tensor of elements.
param B: A tensor of elements.
param prog: The sequence to extend with the execution of the expression evaluation.
param debugContext: Optional debug information.
param options: Element-wise options. See map().

return: A tensor where each element is the result of a && b, where a and b are the corresponding elements of A and B tensors respectively.

logicalAndInPlace

Update the tensor A with the result of logicalAnd().

See logicalAnd() for parameter descriptions.

logicalAndWithOutput

Write the result of logicalAnd() to the given output tensor, out.

See logicalAnd() for the remaining parameter descriptions.

param out: The tensor to write the booleans to.

logicalOr

Compute the logical OR (||) of each element in A with the corresponding element in B.

param graph: The graph to update.
param A: A tensor of elements.
param B: A tensor of elements.
param prog: The sequence to extend with the execution of the expression evaluation.
param debugContext: Optional debug information.
param options: Element-wise options. See map().

return: A tensor where each element is the result of a || b, where a and b are the corresponding elements of A and B tensors respectively.

logicalOrInPlace

Update the tensor A with the result of logicalOr().

See logicalOr() for parameter descriptions.

logicalOrWithOutput

Write the result of logicalOr() to the given output tensor, out.

See logicalOr() for the remaining parameter descriptions.

param out: The tensor to write the booleans to.

eq

Check if each element in A is equal to the corresponding element in B.

param graph: The graph to update.
param A: A tensor of elements.
param B: A tensor of elements.
param prog: The sequence to extend with the execution of the expression evaluation.
param debugContext: Optional debug information.
param options: Element-wise options. See map().

return: A tensor where each element is the result of a == b, where a and b are the corresponding elements of A and B tensors respectively.

eqInPlace

Update the tensor A with the result of eq().

See eq() for parameter descriptions.

eqWithOutput

Write the result of eq() to the given output tensor, out.

See eq() for the remaining parameter descriptions.

param out: The tensor to write the booleans to.

neq

Check if each element in A is not equal to the corresponding element in B.

param graph: The graph to update.
param A: A tensor of elements.
param B: A tensor of elements.
param prog: The sequence to extend with the execution of the expression evaluation.
param debugContext: Optional debug information.
param options: Element-wise options. See map().

return: A tensor where each element is the result of a != b, where a and b are the corresponding elements of A and B tensors respectively.

neqInPlace

Update the tensor A with the result of neq().

See neq() for parameter descriptions.

neqWithOutput

Write the result of neq() to the given output tensor, out.

See neq() for the remaining parameter descriptions.

param out: The tensor to write the booleans to.

gteq

Check if each element in A is greater than or equal to the corresponding element in B.

param graph: The graph to update.
param A: A tensor of elements.
param B: A tensor of elements.
param prog: The sequence to extend with the execution of the expression evaluation.
param debugContext: Optional debug information.
param options: Element-wise options. See map().

return: A tensor where each element is the result of a >= b, where a and b are the corresponding elements of A and B tensors respectively.

gteqInPlace

Update the tensor A with the result of gteq().

See gteq() for parameter descriptions.

gteqWithOutput

Write the result of gteq() to the given output tensor, out.

See gteq() for the remaining parameter descriptions.

param out: The tensor to write the booleans to.

gt

Check if each element in A is greater than the corresponding element in B.

param graph: The graph to update.
param A: A tensor of elements.
param B: A tensor of elements.
param prog: The sequence to extend with the execution of the expression evaluation.
param debugContext: Optional debug information.
param options: Element-wise options. See map().

return: A tensor where each element is the result of a > b, where a and b are the corresponding elements of A and B tensors respectively.

gtInPlace

Update the tensor A with the result of gt().

See gt() for parameter descriptions.

gtWithOutput

Write the result of gt() to the given output tensor, out.

See gt() for the remaining parameter descriptions.

param out: The tensor to write the booleans to.

lteq

Check if each element in A is less than or equal to the corresponding element in B.

param graph: The graph to update.
param A: A tensor of elements.
param B: A tensor of elements.
param prog: The sequence to extend with the execution of the expression evaluation.
param debugContext: Optional debug information.
param options: Element-wise options. See map().

return: A tensor where each element is the result of a <= b, where a and b are the corresponding elements of A and B tensors respectively.

lteqInPlace

Update the tensor A with the result of lteq().

See lteq() for parameter descriptions.

lteqWithOutput

Write the result of lteq() to the given output tensor, out.

See lteq() for the remaining parameter descriptions.

param out: The tensor to write the booleans to.

lt

Check if each element in A is less than the corresponding element in B.

param graph: The graph to update.
param A: A tensor of elements.
param B: A tensor of elements.
param prog: The sequence to extend with the execution of the expression evaluation.
param debugContext: Optional debug information.
param options: Element-wise options. See map().

return: A tensor where each element is the result of a < b, where a and b are the corresponding elements of A and B tensors respectively.

ltInPlace

Update the tensor A with the result of lt().

See lt() for parameter descriptions.

ltWithOutput

Write the result of lt() to the given output tensor, out.

See lt() for the remaining parameter descriptions.

param out: The tensor to write the boolean results to.

max

Compute the maximum of each element in A with the corresponding element in B.

param graph: The graph to update.
param A: A tensor of elements.
param B: A tensor of elements.
param prog: The sequence to extend with the execution of the expression evaluation.
param debugContext: Optional debug information.
param options: Element-wise options. See map().

return: A tensor where each element is the result of max(a, b), where a and b are the corresponding elements of A and B tensors respectively.

maxInPlace

Update the tensor A with the result of max().

See max() for parameter descriptions.

maxWithOutput

Write the result of max() to the given output tensor, out.

See max() for the remaining parameter descriptions.

param out: The tensor to write the maximums to.

min

Compute the minimum of each element in A with the corresponding element in B.

param graph: The graph to update.
param A: A tensor of elements.
param B: A tensor of elements.
param prog: The sequence to extend with the execution of the expression evaluation.
param debugContext: Optional debug information.
param options: Element-wise options. See map().

return: A tensor where each element is the result of min(a, b), where a and b are the corresponding elements of A and B tensors respectively.

minInPlace

Update the tensor A with the result of min().

See min() for parameter descriptions.

minWithOutput

Write the result of min() to the given output tensor, out.

See min() for the remaining parameter descriptions.

param out: The tensor to write the minimums to.

atan2

Compute the element-wise arctangent of A / B.

param graph: The graph to update.
param A: A tensor of elements.
param B: A tensor of elements.
param prog: The sequence to extend with the execution of the expression evaluation.
param debugContext: Optional debug information.
param options: Element-wise options. See map().

return: A tensor where each element is the result of arctan(a / b), where a and b are the corresponding elements of A and B tensors respectively.

atan2InPlace

Update the tensor A with the result of atan2().

See atan2() for parameter descriptions.

atan2WithOutput

Write the result of atan2() to the given output tensor, out.

See atan2() for the remaining parameter descriptions.

param out: The tensor to write the result to.

invStdDevToVarianceInPlace

Update the invStdDev tensor with the result of invStdDevToVariance().

See invStdDevToVariance() for parameter descriptions.

invStdDevToVarianceWithOutput

Write the result of invStdDevToVariance() to the given output tensor, out.

See invStdDevToVariance() for the remaining parameter descriptions.

param out: The tensor to write the variance to.

varianceToInvStdDevInPlace

Update the variance tensor with the result of varianceToInvStdDev().

See varianceToInvStdDev() for parameter descriptions.

varianceToInvStdDevWithOutput

Write the result of varianceToInvStdDev() to the given output tensor, out.

See varianceToInvStdDev() for the remaining parameter descriptions.

param out: The tensor to write the inverse standard deviation to.

Functions

poplar::Tensor createGatherInput(poplar::Graph &graph, const poplar::Type &type, const std::vector<std::size_t> &operandShape, unsigned axis, GatherParams params = {}, const poplar::DebugContext &debugContext = {})

Create the input of the gather with only a single gather axis.

This is designed to spread the gather, and each dynamic slice within the gather, across the tiles evenly.

Parameters

graph – The Poplar graph.
type – The data type of the required tensor.
operandShape – The desired shape of the input.
axis – The axis that will be gathered on.
params – The same parameters as used by gather().
debugContext – Optional debug information.

Returns

A tensor with the desired shape.

poplar::Tensor gather(poplar::Graph &graph, const poplar::Tensor &input, const poplar::Tensor &indices, unsigned axis, poplar::program::Sequence &prog, GatherParams params, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &optionFlags = {})

The gather operation stitches together several slices (each slice at a potentially different runtime offset) of an input tensor.

To achieve the best performance, the input tensor should be created with createGatherInput().

Gather Options

remapOutOfBoundIndices (true, false) [=false]

Out of bounds indices are mapped to index 0.

paddingIndexUsed (true, false) [=false]

Padding index equal to the size of the slice dimension of tensor input may be used in the indices. The actual padding values returned for a padding index are zeros.

Note

The indices are treated as offsets along the chosen axis. At this offset a slice of depth 1 in the axis dimension is taken.

Parameters

graph – The Poplar graph.
input – The tensor of rank x that we are gathering from.
indices – A tensor of rank y containing the indices of the slices we gather.
axis – The axis to gather on. The axis must be less than x.
prog – The program sequence to add this operation to.
params – Parameters for the form of the gather.
debugContext – Optional debug information.
optionFlags – Option flags.

Returns

The gathered slices from the input with rank y + (x - 1).
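
As an illustration of the semantics only, the sketch below models a gather on axis 0 of a rank-2 input with rank-1 indices on the host, so the output has rank 1 + (2 - 1) = 2. The name gatherRowsModel is hypothetical, std::vector stands in for poplar::Tensor, and the handling of the two options is an interpretation of the option descriptions. In particular, throwing on an index that neither option covers is an assumption of this model, not documented library behaviour.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Host-side model of a gather on axis 0 of a rank-2 input with rank-1
// indices; not the popops API. Option handling follows the option text.
Matrix gatherRowsModel(const Matrix &input,
                       const std::vector<std::size_t> &indices,
                       bool remapOutOfBoundIndices = false,
                       bool paddingIndexUsed = false) {
  const std::size_t numRows = input.size();
  const std::size_t numCols = numRows == 0 ? 0 : input[0].size();
  Matrix output;
  output.reserve(indices.size());
  for (std::size_t index : indices) {
    if (paddingIndexUsed && index == numRows) {
      // Padding index: the returned slice is all zeros.
      output.emplace_back(numCols, 0.0f);
    } else if (index < numRows) {
      // In-range index: take a slice of depth 1 at this offset.
      output.push_back(input[index]);
    } else if (remapOutOfBoundIndices) {
      // Out-of-bounds indices are mapped to index 0.
      output.push_back(input.at(0));
    } else {
      // Assumption: the model rejects indices the options do not cover.
      throw std::out_of_range("gather index out of range");
    }
  }
  return output;
}
```

For a 3x3 input, indices {2, 0} select the third and first rows in that order, giving a 2x3 result.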

poplar::Tensor createGatherInput(poplar::Graph &graph, const poplar::Type &type, const std::vector<std::size_t> &inputShape, const std::vector<std::size_t> &sliceSizes, std::vector<unsigned> startIndexMap, const poplar::DebugContext &debugContext = {})

Create the input of the gather given a start index map.

This is designed to spread the gather, and each dynamic slice within the gather, across the tiles evenly.

Parameters

graph – The Poplar graph.
type – The data type of the required tensor.
inputShape – The desired shape of the input.
sliceSizes – sliceSizes[i] is the bounds for the slice on dimension i.
startIndexMap – A map that describes how to map the values in the indices tensor passed to gather() to legal indices into the input.
debugContext – Optional debug information.

Returns

A tensor with the desired shape.

poplar::Tensor gather(poplar::Graph &graph, const poplar::Tensor &input, const poplar::Tensor &indices, std::size_t indexVectorDim, const std::vector<std::size_t> &offsetDims, const std::vector<std::size_t> &sliceSizes, const std::vector<std::size_t> &collapsedSliceDims, const std::vector<unsigned> &startIndexMap, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &optionFlags = {})

The gather operation stitches together several slices (each slice at a potentially different runtime offset) of an input tensor.

To achieve the best performance, the input tensor should be created with createGatherInput().

See the single-axis overload of gather() for information on optionFlags.

Example use where we want to take 2 elements from a given tensor:

// The runtime defined input tensor
input = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}; // shape = {3, 3}

// The runtime defined indices tensor containing the coords we want to
// extract
indices = {{1, 1}, {2, 1}}; // shape = {2, 2}

// We want to extract elems at [1, 1] and [2, 1] from the input
// To achieve this we need to define the other parameters correctly

// We want to treat the rows of indices as coords into the input tensor
indexVectorDim = 1;

// None of the output dims will correspond to any of the input dims
offsetDims = {};

// We will be taking 1x1 slices to pick single elements
sliceSizes = {1, 1};

// We will collapse both dims of the input slices
collapsedSliceDims = {0, 1};

// An identity mapping between the indices coords and the input dims
startIndexMap = {0, 1};

// Perform the desired gather
result = gather(input,
                indices,
                indexVectorDim,
                offsetDims,
                sliceSizes,
                collapsedSliceDims,
                startIndexMap); // result = {5, 8}, shape = {2}
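
The worked example above can be checked with a small host-side model. This sketch covers only that special case (rank-2 input, indexVectorDim = 1, offsetDims = {}, sliceSizes = {1, 1}, collapsedSliceDims = {0, 1}) and is not a general implementation of gather(). The names gatherElementsModel, Matrix and Coord are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;
using Coord = std::array<std::size_t, 2>;

// Host-side model of the special case in the example: every row of indices
// is a coordinate that picks one element of a rank-2 input.
// Not the popops API.
std::vector<int> gatherElementsModel(const Matrix &input,
                                     const std::vector<Coord> &indices,
                                     const std::array<unsigned, 2> &startIndexMap) {
  std::vector<int> result;
  result.reserve(indices.size());
  for (const Coord &indexVector : indices) {
    // startIndexMap[k] names the input dimension that component k of the
    // index vector addresses.
    Coord coord{};
    for (std::size_t k = 0; k < 2; ++k)
      coord.at(startIndexMap[k]) = indexVector[k];
    // A 1x1 slice with both dimensions collapsed is a single element.
    result.push_back(input.at(coord[0]).at(coord[1]));
  }
  return result;
}
```

With the identity mapping {0, 1}, the indices {1, 1} and {2, 1} select 5 and 8. Swapping the mapping to {1, 0} makes the second index vector address row 1, column 2 instead, which selects 6.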

Note

When indexVectorDim == indices.rank(), the indices are interpreted as scalar values.

Note

This is a near direct port of the XLA gather operation described at https://www.tensorflow.org/xla/operation_semantics#gather, taken from tensorflow/compiler/xla/service/gather_expander.cc.

Parameters

graph – The Poplar graph.
input – The tensor we are gathering from.
indices – Tensor containing the starting indices of the slices we gather.
indexVectorDim – The dimension in indices that “contains” the starting indices.
offsetDims – The set of dimensions in the output shape that offset into a tensor sliced from input.
sliceSizes – sliceSizes[i] is the bounds for the slice on dimension i.
collapsedSliceDims – The set of dimensions in each slice that are collapsed away. These dimensions must have size 1.
startIndexMap – A map that describes how to map the values in the indices tensor to legal indices into input.
prog – The program sequence to add this operation to.
debugContext – Optional debug information.
optionFlags – Option flags.

Returns

The gathered slices from the input.

struct GatherParams

#include <popops/Gather.hpp>

Defines the parameters to a gather operation.

Public Functions

GatherParams() = default

inline GatherParams(std::size_t maxElementsPerTile_)

Public Members

std::size_t maxElementsPerTile = 65535

Suggested maximum number of elements to place on a tile.

This can be used to balance the gather across the IPUs.