4. PopLibs API reference

The PopLibs libraries provide application-level functions that can be used in Poplar programs for the IPU.

poplin (depends on popops, poputil)
Linear algebra functions (matrix multiplications, convolutions)

popnn (depends on poplin, poputil)
Functions used in neural networks (for example, non-linearities, pooling and loss functions)

popops (depends on poputil)
Operations on tensors in control programs (elementwise functions and reductions)

poprand (depends on poputil)
Functions for populating tensors with random numbers

popsparse
Functions for operating on sparse tensors

poputil
General utility functions for building graphs

Utility functions (poputil)

General utility functions for building graphs.

poputil/Broadcast.hpp

Functions to provide numpy-like tensor matching and broadcasting.

namespace poputil

General utility functions for building graphs.

Functions

void expandToMatchRanks(poplar::Tensor &a, poplar::Tensor &b)

Match dimensions of two tensors using numpy-style expansion rules.

Insert singleton dimensions into either of the two tensors so that their ranks match, following numpy-style expansion rules. The tensor with the lower rank has singleton dimensions inserted as the outermost dimensions.

Parameters
  • a – First tensor to match.

  • b – Second tensor to match.

void broadcastToMatch(poplar::Tensor &a, const std::vector<std::size_t> &shape)

Match dimensions of a tensor to a shape using numpy-style broadcast rules:

1) If the rank of the tensor is less than the required shape then expand to the left by adding dimensions of size 1 to match the rank required.

2) For each dimension, the size of the dimension in the tensor must be the same as the required shape or must be 1. In the case where it is of size 1, the tensor is broadcast in that dimension to match the shape. If neither of these conditions hold then an exception is thrown.

Parameters
  • a – The tensor to broadcast to match the shape. This will be updated in place with broadcast dimensions.

  • shape – The shape to match.

Throws

poputil::poplibs_error – If a cannot be broadcast to match shape.

void broadcastToMatch(poplar::Tensor &a, poplar::Tensor &b)

Match dimensions of two tensors using numpy-style broadcast rules:

1) If the rank of one tensor is less than the other then extend the dimensions to the left with dimensions of size 1 to match the rank required.

2) For each dimension, the size of each dimension in both tensors must be the same or one of them must have size 1. In the case where one is of size 1, the tensor is broadcast in that dimension to match the other. If neither of these conditions hold then an exception is thrown.

Parameters
  • a – First tensor to match. This will be updated in place with broadcast dimensions.

  • b – Second tensor to match. This will be updated in place with broadcast dimensions.

Throws

poputil::poplibs_error – If a cannot be broadcast to match a dimension.

void broadcastToMatch(poplar::Tensor &a, poplar::Tensor &b, poplar::Tensor &c)

Match dimensions of three tensors using numpy-style broadcast rules:

1) If the rank of one tensor is less than the other then extend the dimensions to the left with dimensions of size 1 to match the rank required.

2) For each dimension, the size of each dimension in both tensors must be the same or one of them must have size 1. In the case where one is of size 1, the tensor is broadcast in that dimension to match the other. If neither of these conditions hold then an exception is thrown.

Parameters
  • a – First tensor to match. This will be updated in place with broadcast dimensions.

  • b – Second tensor to match. This will be updated in place with broadcast dimensions.

  • c – Third tensor to match. This will be updated in place with broadcast dimensions.

Throws

poputil::poplibs_error – If a cannot be broadcast to match a dimension.

bool canBroadcastToMatch(const poplar::Tensor &a, const poplar::Tensor &b)

Test if the given tensors can be broadcast to match one another using the rules for broadcastToMatch().

Parameters
  • a – First tensor to match.

  • b – Second tensor to match.

Returns

True if the two tensors may be broadcast to match one another and false if they cannot be matched with the broadcastToMatch() rules.
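
For illustration, a minimal sketch (not part of the original reference), assuming an existing poplar::Graph named graph; both tensors are modified in place to the common broadcast shape:

#include <poputil/Broadcast.hpp>

// a has shape {4, 1, 3} and b has shape {3}.
poplar::Tensor a = graph.addVariable(poplar::FLOAT, {4, 1, 3});
poplar::Tensor b = graph.addVariable(poplar::FLOAT, {3});

if (poputil::canBroadcastToMatch(a, b)) {
  // After the call both a and b have shape {4, 1, 3}; b is expanded and
  // broadcast as a view, so no element data is copied.
  poputil::broadcastToMatch(a, b);
}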

poputil/GraphFunction.hpp

namespace poputil

General utility functions for building graphs.

namespace graphfn

Support for using poplar::Program objects like function calls.

Typedefs

using Signature = std::vector<ArgSig>

Enums

enum ArgType

Type of argument to function program.

Values:

enumerator InputArg
enumerator OutputArg
enumerator InOutArg
enumerator CreatedArg

Functions

inline ArgSig input(poplar::Tensor similar, std::string debugName = "")
inline ArgSig inout(poplar::Tensor similar, std::string debugName = "")
inline ArgSig output(poplar::Tensor similar, std::string debugName = "")
inline ArgSig created(std::string debugName = "")
struct ArgSig

Public Functions

inline ArgSig(ArgType type, poplar::Tensor tensor, std::string debugName)

Public Members

ArgType type
poplar::Tensor similarTensor
std::string debugName
class ProgramFunction

Public Functions

ProgramFunction(poplar::Graph &graph, Signature sig, std::function<poplar::program::Program(std::vector<poplar::Tensor>&)> f, bool inlined = false, const poplar::DebugContext &debugContext = {})
ProgramFunction(poplar::Graph &graph, Signature sig, std::function<poplar::program::Program(std::vector<poplar::Tensor>&, const poplar::DebugNameAndId&)> f, bool inlined = false, const poplar::DebugContext &debugContext = {})
poplar::program::Program operator()(std::vector<poplar::Tensor> &args, const poplar::DebugContext &debugContext = {})

Private Members

VoidFunction voidFunc
class TensorFunction

Public Functions

TensorFunction(poplar::Graph &graph, Signature sig, std::function<poplar::Tensor(std::vector<poplar::Tensor>&, poplar::program::Sequence&)> f, bool inlined = false, const poplar::DebugContext &debugContext = {})
TensorFunction(poplar::Graph &graph, Signature sig, std::function<poplar::Tensor(std::vector<poplar::Tensor>&, poplar::program::Sequence&, const poplar::DebugNameAndId&)> f, bool inlined = false, const poplar::DebugContext &debugContext = {})
poplar::Tensor operator()(std::vector<poplar::Tensor> &args, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Private Members

VoidFunction voidFunc
class VoidFunction

Public Functions

VoidFunction(VoidFunction &&fn)
VoidFunction(poplar::Graph &graph, Signature sig, std::function<void(std::vector<poplar::Tensor>&, poplar::program::Sequence&)> f, bool inlined = false, const poplar::DebugContext &debugContext = {})
VoidFunction(poplar::Graph &graph, Signature sig, std::function<void(std::vector<poplar::Tensor>&, poplar::program::Sequence&, const poplar::DebugNameAndId&)> f, bool inlined = false, const poplar::DebugContext &debugContext = {})
void operator()(std::vector<poplar::Tensor> &args, poplar::program::Sequence &seq, const poplar::DebugContext &dc = {})

Private Members

poplar::Graph &graph
Signature sig
bool inlined
poplar::program::Sequence prog
poplar::Function func
std::vector<poplar::Tensor> params
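
As an illustration, a hedged sketch of how these classes might be used, assuming an existing graph, a program Sequence prog, and mapped tensors x, x2 and y of matching shape and type; popops::addInPlace is used only as an arbitrary body:

#include <poputil/GraphFunction.hpp>
#include <popops/ElementWise.hpp>

using namespace poputil::graphfn;

// Build a reusable graph function computing y += x. The Signature tensors
// describe the shape, type and layout expected for each argument.
VoidFunction addTo(
    graph, {input(x, "x"), inout(y, "y")},
    [&](std::vector<poplar::Tensor> &args, poplar::program::Sequence &seq) {
      popops::addInPlace(graph, args[1], args[0], seq);
    });

// The same compiled body can then be applied at several call sites.
std::vector<poplar::Tensor> call1 = {x, y};
addTo(call1, prog);
std::vector<poplar::Tensor> call2 = {x2, y};
addTo(call2, prog);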

poputil/OptionParsing.hpp

OptionSpec and OptionHandler are used to build up a specification of the expected options and their values, and to translate the value strings into typed values.

namespace poplibs

PopLibs classes and functions.

class OptionHandler
#include <OptionParsing.hpp>

Represents the various options types.

Public Functions

template<typename T>
inline OptionHandler(T &&valueHandler)
inline void parseValue(poplar::StringRef value) const

Public Static Functions

template<typename T>
static inline std::string describeEnumValues(const std::map<std::string, T> &valueMap)
template<typename T, typename ValueMapT = std::map<std::string, T>>
static inline OptionHandler createWithEnum(T &output, ValueMapT &&valueMap)
template<typename T>
static inline OptionHandler createWithInteger(T &output)
template<typename T>
static inline OptionHandler createWithBool(T &output)
template<typename T>
static inline OptionHandler createWithDouble(T &output)
static inline OptionHandler createWithString(std::string &output)
template<typename T>
static inline OptionHandler createWithList(std::vector<T> &output)

Private Members

std::function<void(poplar::StringRef)> valueHandler
class OptionSpec
#include <OptionParsing.hpp>

Represents a set of options and their values.

Public Functions

inline OptionSpec(initializer_list_t &&handlers)
inline void parse(poplar::StringRef option, poplar::StringRef value, bool ignoreUnknown = false) const

Private Types

using value_type = std::pair<const std::string, OptionHandler>
using map_type = std::map<const std::string, OptionHandler>
using initializer_list_t = std::initializer_list<value_type>

Private Members

map_type handlers
namespace parse

Functions

template<typename T>
T asInteger(const poplar::StringRef &str)
template<typename T>
T asFloatingPoint(const poplar::StringRef &str)
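
A minimal sketch of the intended usage (the option names here are hypothetical, and an existing set of poplar::OptionFlags named options is assumed):

#include <poplar/OptionFlags.hpp>
#include <poputil/OptionParsing.hpp>

// Typed variables that the parsed option values are written into.
bool useFastPath = false;
double availableMemoryProportion = 0.6;
std::string method = "auto";

const poplibs::OptionSpec spec{
    {"useFastPath", poplibs::OptionHandler::createWithBool(useFastPath)},
    {"availableMemoryProportion",
     poplibs::OptionHandler::createWithDouble(availableMemoryProportion)},
    {"method", poplibs::OptionHandler::createWithString(method)}};

// Translate the user-supplied string values into the variables above.
for (const auto &entry : options)
  spec.parse(entry.first, entry.second);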

poputil/TensorMetaData.hpp

Class to allow extra data to be associated with a tensor.

namespace poputil

General utility functions for building graphs.

class TensorMetaData
#include <TensorMetaData.hpp>

Class used to represent some unspecified form of meta-data for a tensor.

Public Functions

TensorMetaData()
TensorMetaData(const TensorMetaData &other)
TensorMetaData(TensorMetaData &&other)
TensorMetaData &operator=(const TensorMetaData &other)
TensorMetaData &operator=(TensorMetaData &&other)
TensorMetaData(std::unique_ptr<TensorMetaDataBase> data)
~TensorMetaData()
inline const TensorMetaDataBase *getData() const

Private Members

std::unique_ptr<TensorMetaDataBase> data

poputil/TileMapping.hpp

Functions for handling the mapping of tensors to tiles.

namespace poputil

General utility functions for building graphs.

Functions

std::vector<std::vector<poplar::Interval>> calcLinearTileMapping(const poplar::Graph &graph, std::vector<std::size_t> shape, unsigned minElementsPerTile, unsigned grainSize)

Calculate a tile mapping that spreads the tensor evenly over the tiles in a graph.

The indices of the flattened tensor are mapped from low to high tile numbers.

Parameters
  • graph – The graph to calculate the mapping for.

  • shape – The shape of the tensor to be mapped: a vector containing the size of each dimension of the tensor.

  • minElementsPerTile – The minimum number of tensor elements to be allocated to a tile.

  • grainSize – The number of elements mapped to each tile will be an integer multiple of the grain size.

Returns

A vector containing, for each tile, the intervals of the flattened tensor that are mapped to that tile.

std::vector<std::vector<poplar::Interval>> calcLinearTileMapping(const poplar::Graph &graph, const poplar::Tensor &t)

Calculate a tile mapping that spreads the tensor evenly over the tiles in a graph.

The indices of the flattened tensor are mapped from low to high tile numbers.

In this case the elements are distributed so that groups of elements of the device’s natural vector width will not be split. It effectively sets the grain size to the natural vector width for the data type. This means the number of elements on each tile will be a multiple of the natural vector width and the index of the first element is aligned to the natural vector width.

The natural vector width is the largest vector width supported in hardware for arithmetic operations on that data type.

It will also try to keep at least 128 bytes of data on each tile to avoid high exchange costs.

Parameters
  • graph – The graph to calculate the mapping for.

  • t – The tensor to be mapped.

void mapTensorLinearly(poplar::Graph &graph, const poplar::Tensor &t, unsigned minElementsPerTile, unsigned grainSize)
void mapTensorLinearly(poplar::Graph &graph, const poplar::Tensor &t)
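
For example, a short sketch (assuming an existing poplar::Graph named graph) that maps a new variable linearly over the available tiles:

#include <poputil/TileMapping.hpp>

poplar::Tensor t = graph.addVariable(poplar::FLOAT, {1024, 512}, "t");

// Explicit grain size: each tile gets a multiple of 8 elements, at least 32.
poputil::mapTensorLinearly(graph, t, /*minElementsPerTile=*/32, /*grainSize=*/8);

// Or let the library derive the grain size from the type's natural vector width.
poputil::mapTensorLinearly(graph, t);
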
unsigned getTileImbalance(const poplar::Graph::TileToTensorMapping &mapping, unsigned minElementsPerTile = 0, unsigned grainSize = 1)

Determine how unbalanced a tensor is when mapped over tiles in a graph.

This reports how well a tensor mapping compares with the mapping based on a given number of elements per tile.

Parameters
  • mapping – The current tile mapping of the tensor.

  • minElementsPerTile – The suggested minimum number of elements per tile.

  • grainSize – The number of elements mapped to each tile would be an integer multiple of the suggested grain size.

Returns

The maximum number of elements greater than expected on any tile.

unsigned getTileImbalance(const poplar::Graph &graph, const poplar::Tensor &t, unsigned minElementsPerTile = 0, unsigned grainSize = 1)

Determine how unbalanced a tensor is mapped over tiles.

This compares the way a tensor is mapped to a set of tiles to the mapping based on a given number of elements per tile.

Parameters
  • graph – The graph containing the mapped tensor.

  • mapping – The tensor currently mapped to tiles in the graph.

  • minElementsPerTile – The suggested minimum number of elements per tile.

  • grainSize – The number of elements mapped to each tile would be an integer multiple of the suggested grain size.

Returns

The maximum number of elements greater than expected on any tile.

poplar::Tensor cloneToIpu(poplar::Graph &graph, const poplar::Tensor &t, unsigned dstIPU, const poplar::DebugContext &debugContext = {}, poplar::TensorCloneMethod method = poplar::TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES)

Create a clone of the specified tensor on the specified IPU.

The cloned tensor is mapped to the IPU in such a way that the mapping of tensor elements to tiles is preserved.

Parameters
  • graph – The graph representing the entire multi-IPU device.

  • t – The tensor to clone.

  • dstIPU – The index of the IPU to clone the tensor onto.

  • debugContext – A debug name to give to any new tensors allocated in the graph during the clone. If this is empty then the debug names will be derived from existing tensor debug names.

  • method – The method to use for the cloning.

Returns

The cloned tensor.

poplar::Tensor cloneToGraph(poplar::Graph &srcGraph, poplar::Graph &dstGraph, const poplar::Tensor &t, const poplar::DebugContext &debugContext = {}, poplar::TensorCloneMethod method = poplar::TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES)

Create a clone of the specified tensor on the specified graph.

The cloned tensor is mapped to the destination graph in such a way that the mapping of tensor elements to tiles is preserved.

Note

It is assumed that the destination graph has enough tiles to clone the input tensor. This includes any gaps in the tile mapping. This means the maximum mapped tile of t in the source graph must be less than dstGraph.getTarget().getNumTiles().

Parameters
  • srcGraph – The graph representing the source tiles.

  • dstGraph – The graph representing the destination tiles.

  • t – The tensor to clone.

  • debugContext – Optional debug information

  • method – The method to use for the cloning.

Returns

The cloned tensor.

poplar::Tensor copyToIpu(poplar::Graph &masterGraph, const poplar::Tensor &t, poplar::program::Sequence &prog, unsigned dstIPU, const poplar::DebugContext &debugContext = {}, poplar::TensorCloneMethod method = poplar::TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES)

Move a tensor from one IPU to another.

The tensor is moved by duplicating it, mapping the clone onto another IPU, and copying the original tensor values to the new one.

Parameters
  • masterGraph – The graph representing the entire multi-IPU device.

  • t – The tensor to move from one IPU to another.

  • prog – A program sequence to add the Copy to.

  • dstIPU – The index of the IPU onto which the tensor will be moved.

  • debugContext – A debug name to give to the tensor created on dstIPU. If this is empty then the debug names will be derived from existing tensor debug names.

  • method – The method to use for cloning of the tensor on the destination IPU.

Returns

The new tensor on the specified IPU.
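
A brief sketch (assuming graph spans multiple IPUs, t is a mapped tensor and prog is a poplar::program::Sequence):

#include <poputil/TileMapping.hpp>

// Clone t onto IPU 1 and add a copy of its contents to the program.
poplar::Tensor tOnIpu1 = poputil::copyToIpu(graph, t, prog, /*dstIPU=*/1, "toIpu1");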

poplar::Tensor createIpuCopy(poplar::Graph &graph, const poplar::Tensor &t, unsigned dstIpu, poplar::Tensor &copySrc, poplar::Tensor &copyDst, const poplar::DebugContext &debugContext = {}, poplar::TensorCloneMethod method = poplar::TensorCloneMethod::PRESERVE_ORDER_AND_ALIASES)

Prepare to move a tensor from one IPU to another.

The tensor is duplicated and the clone is mapped onto another IPU. References to source and destination tensors are provided for use by an inter-IPU copy.

The necessary copy operation is not added to the program.

Parameters
  • graph – The graph representing the entire multi-IPU device.

  • t – The tensor to move from one IPU to another.

  • dstIpu – The index of the IPU onto which the tensor will be moved.

  • copySrc – A tensor that can be used as the source to do the copy.

  • copyDst – A tensor that can be used as the destination of the copy.

  • debugContext – A debug name to give to the tensor created on dstIPU. If this is empty then the debug names will be derived from existing tensor debug names.

  • method – The method to use for cloning of the tensor on the destination IPU.

Returns

The new tensor on the specified IPU.

bool dimIsSplitOverTiles(const poplar::Graph &graph, const poplar::Tensor &t, unsigned dimension)

Check if a dimension of a tensor is split over more than one tile.

Examines the mapping of the specified tensor to see if the specified dimension is split over more than one tile.

Parameters
  • graph – The graph to examine.

  • t – The tensor to check.

  • dimension – The dimension to check.

Returns

True if elements of the given dimension are spread over more than one tile.

bool dimIsSplitOverIPUs(const poplar::Graph &graph, const poplar::Tensor &t, unsigned dimension)

Check if a dimension of a tensor is split over more than one IPU.

Examines the mapping of the specified tensor to see if the specified dimension is split over more than one IPU.

Parameters
  • graph – The graph to examine.

  • t – The tensor to check.

  • dimension – The dimension to check.

Returns

True if elements of the given dimension are spread over more than one IPU.

poplar::Tensor createBroadcastOperand(poplar::Graph &graph, const poplar::Tensor &fullTensor, const poplar::Type &type, unsigned dim, bool ditherMapping = false, const poplar::DebugContext &debugContext = {})

Create a simpler tensor that is mapped in the same way as another, full, tensor.

The full tensor is typically a left hand side operand of an operation while the created tensor is the right hand side. The created tensor has one dimension, which is the same size as the specified dimension of the full tensor.

Because the created tensor has the same mapping as the full tensor, it reduces the amount of data exchange or copies that are required for an operation using the two tensors.

Parameters
  • graph – The graph which the output tensor is added to.

  • fullTensor – The tensor mapping for the output tensor is copied from this tensor.

  • type – The type of the output tensor.

  • dim – The dimension of the input tensor which is the size of the created tensor.

  • ditherMapping – Enable dithering to be applied to the mapping of the output tensor.

  • debugContext – Optional debug information.

Returns

The created output tensor.

class TensorUseTracker
#include <TileMapping.hpp>

Class that tracks the usage of data on different tiles.

If data is broadcast to many tiles, it is sometimes efficient to map the data so that it is spread evenly amongst the tiles that use it.

This class can collect information about the use of data and then calculate a suitable tile mapping.

Public Types

enum MappingMethod

Values:

enumerator OptimizeHaloRegions

Map “halo regions” to single tiles.

These are regions that are used by multiple tiles but have neighbouring regions used by subsets of those tiles.

enumerator ConstrainMappingToUsedTiles

Mapping of elements is constrained to be only on tiles that use them.

Otherwise, to meet grain size constraints, elements may be mapped to tiles which do not use them.

enumerator None

No mapping method used.

Public Functions

TensorUseTracker(unsigned numTiles)
TensorUseTracker(const TensorUseTracker &other)
TensorUseTracker(TensorUseTracker &&other)
TensorUseTracker &operator=(const TensorUseTracker &other)
TensorUseTracker &operator=(TensorUseTracker &&other)
~TensorUseTracker()
void add(const poplar::Graph &graph, unsigned tile, const poplar::Tensor &t)

Add a data use case.

Parameters
  • graph – The Poplar graph being tracked.

  • tile – The tile that the use occurs on.

  • t – The tensor representing the data being used.

void add(TensorUseTracker other)

Add data use cases from another tracker.

Parameters

other – The TensorUseTracker to merge data use information from.

void resolve(const poplar::Graph &graph, unsigned grainSize, unsigned minElementsPerTile, bool extendPartialUsage = false, TensorUseTracker::MappingMethod mappingMethod = TensorUseTracker::MappingMethod::None)

Resolve data uses for mapping.

Data used on multiple tiles will have their uses spread across those tiles.

Parameters
  • graph – The Poplar graph being tracked.

  • grainSize – The number of elements mapped to each tile will be an integer multiple of the grain size.

  • minElementsPerTile – The minimum number of elements that must be mapped to a tile.

  • extendPartialUsage – When set, partial uses of tensors will be extended to cover the entire tensor, based on the usage of neighbouring regions.

  • mappingMethod – Method used for mapping elements.

void mapTensorsByUse(poplar::Graph &graph, unsigned grainSize, unsigned minElementsPerTile, bool extendPartialUsage = false, TensorUseTracker::MappingMethod mappingMethod = TensorUseTracker::MappingMethod::None)

Map data according to use.

This function will set the tile mapping of variable regions based on tracked data uses. Variable regions with uses on multiple tiles will have their elements spread across those tiles.

Parameters
  • graph – The Poplar graph being tracked.

  • grainSize – The number of elements mapped to each tile will be an integer multiple of the grain size.

  • minElementsPerTile – The minimum number of elements that must be mapped to a tile.

  • extendPartialUsage – When set, partial uses of tensors will be extended to cover the entire tensor, based on the usage of neighbouring regions before mapping.

  • mappingMethod – Method used for mapping elements.

bool empty() const

Check whether any use cases have been registered.

Returns

True if no data use cases have been registered, false otherwise.
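
A minimal sketch of the intended workflow (assuming an existing graph; the per-tile use pattern here is purely illustrative):

#include <poputil/TileMapping.hpp>

const unsigned numTiles = graph.getTarget().getNumTiles();
poplar::Tensor weights = graph.addVariable(poplar::FLOAT, {numTiles, 64});

// Record that each tile consumes one row of the variable.
poputil::TensorUseTracker tracker(numTiles);
for (unsigned tile = 0; tile < numTiles; ++tile)
  tracker.add(graph, tile, weights[tile]);

// Set the tile mapping of the tracked regions according to those uses.
if (!tracker.empty())
  tracker.mapTensorsByUse(graph, /*grainSize=*/4, /*minElementsPerTile=*/16);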

Private Members

std::unique_ptr<TensorUseTrackerState> st

poputil/Util.hpp

General operations on tensors.

namespace poputil

General utility functions for building graphs.

Functions

void mergeAdjacentRegions(std::vector<poplar::Interval> &regions)
void mergeAdjacentRegions(std::vector<std::vector<poplar::Interval>> &mapping)
std::vector<poplar::Interval> flattenIntervals(const std::vector<std::vector<poplar::Interval>> &intervals)

Flatten a vector of vectors of intervals to a vector, maintaining ordering.

std::vector<std::vector<poplar::Interval>> splitRegions(const std::vector<poplar::Interval> &regions, unsigned grainSize, unsigned maxPartitions, unsigned minElementsPerPartition = 0, unsigned maxElementsPerPartition = UINT_MAX, unsigned maxElementsPerRegion = UINT_MAX)

Given a set of contiguous regions, partition these regions while trying to balance the number of elements in each partition and respecting the specified grain size.

At most maxPartitions partitions are created. Regions may be split to achieve a better balance.

std::vector<std::vector<poplar::Interval>> splitRegionsBetweenWorkers(const poplar::Target &target, const std::vector<poplar::Interval> &regions, unsigned grainSize, unsigned minElementsPerPartition = 0, unsigned maxElementsPerPartition = UINT_MAX, unsigned maxElementsPerRegion = UINT_MAX)

Given a set of contiguous regions per tile, partition these regions between workers on that tile while respecting the specified grain size.

Regions may be split to balance the work across workers.

std::vector<std::vector<std::vector<poplar::Interval>>> splitRegions(const std::vector<std::vector<poplar::Interval>> &regions, unsigned grainSize, unsigned maxPartitions, unsigned minElementsPerPartition = 0, unsigned maxElementsPerPartition = UINT_MAX, unsigned maxElementsPerRegion = UINT_MAX)

Given a set of sequences of regions, partition these sequences while trying to balance the number of elements in each partition and respecting the specified grain size.

At most maxPartitions partitions are created. Sequences, and regions within them, may be split to achieve a better balance.

std::vector<std::vector<std::vector<poplar::Interval>>> splitRegionsBetweenWorkers(const poplar::Target &target, const std::vector<std::vector<poplar::Interval>> &regions, unsigned grainSize, unsigned minElementsPerPartition = 0, unsigned maxElementsPerPartition = UINT_MAX, unsigned maxElementsPerRegion = UINT_MAX)

Given a set of sequences of regions per tile, partition these sequences between workers on that tile while respecting the specified grain size.

Regions may be split to balance the work across workers.

template<class T>
std::vector<T> unflattenIndex(const std::vector<T> &shape, std::size_t index)

Given an index into a flattened tensor, returns the indices into the dimensions of the original tensor.

template<class T>
std::size_t flattenIndex(const std::vector<T> &shape, const std::vector<T> &indices)

Given a list of indices into a tensor, return the corresponding index in a flattened version of the tensor.

std::size_t intervalSequenceNumElements(const std::vector<std::vector<poplar::Interval>> &seq)

Return the total number of elements in the interval sequence.

poplar::Tensor duplicate(poplar::Graph &graph, const poplar::Tensor &in, poplar::program::Sequence &p, const poplar::DebugContext &debugContext = {}, poplar::TensorCloneMethod method = poplar::TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES)

Copy a tensor’s data to a new tensor.

The duplicated tensor has the same tile mapping as the original tensor.

poplar::Tensor cloneN(poplar::Graph &graph, const poplar::Tensor &t, unsigned N, const poplar::DebugContext &debugContext = {}, poplar::TensorCloneMethod method = poplar::TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES)

Clone a tensor N times.

Given a tensor of shape [D1, D2, … Dn], this function will create a new tensor of shape [N, D1, D2, …, Dn] where each of the N sub-tensors is a clone of the original tensor (that is, it has the same layout).

Parameters
  • graph – The Poplar graph.

  • t – The tensor to clone.

  • N – The replication factor to clone with.

  • name – The name for the new variables created.

  • method – The tensor cloning method (see Graph::clone()).

std::vector<int> balancedPartition(int rangeUpperBound, int splitCount)

Split a range.

Utility function to split a range [0, rangeUpperBound] into splitCount slices as evenly as possible. If splitCount does not divide rangeUpperBound evenly, the remaining units are assigned to the output slices one each, in round-robin order.
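
For example, splitting 10 units into 3 partitions yields {4, 3, 3}:

#include <poputil/Util.hpp>

std::vector<int> sizes = poputil::balancedPartition(10, 3); // {4, 3, 3}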

double castToDeviceHalfValue(const poplar::Target &target, double input)

Cast a double precision value to a value exactly representable in device HALF type.

Parameters
  • target – The target device that the cast will be performed on.

  • input – Input value.

Returns

Value cast to HALF type on device.

bool checkAccuracyWhenCast(const poplar::Target &target, double input, poplar::Type inputType, poplar::Type outputType, double tolerance)

Check accuracy of a cast operation.

Utility function to check if input can be cast from inputType to outputType without an error in its accuracy, or causing an overflow.

Parameters
  • target – The target device that the cast will be performed on.

  • input – Input value.

  • inputType – Input type before the cast operation.

  • outputType – Output type after the cast operation.

  • tolerance – Allowed tolerance in error from cast operation.

Throws

poputil::poplibs_error – If either inputType or outputType are not either half or float.

Returns

True if the error from the cast will be less than the tolerance, false otherwise.

poplar::Tensor factorDims(const poplar::Tensor &t, const std::vector<std::size_t> &factors, unsigned startDim = 0)

Factors the outermost dimensions of tensor t by the values given in factors.

For each value f in factors, the corresponding outer dimension is split into two parts of sizes size(dim)/f and f. The second of these becomes a dimension inside all the factored dimensions. For example, given a tensor with shape [4,6,4] and factors [1,2], we first divide the shape into [4/1,1,6/2,2,4] and then shuffle it to [4/1,6/2,1,2,4].

Parameters
  • t – The tensor to be factored.

  • factors – The values to factor each dimension by.

  • startDim – The outermost dimension to start at.

Returns

The refactored tensor.

poplar::Tensor unfactorDims(const poplar::Tensor &t, unsigned numDims, unsigned startDim = 0)

The opposite of factorDims().

This does not need information for each dimension because that is present in the tensor. It just needs the number of dimensions.

Parameters
  • t – The tensor to be refactored.

  • numDims – The number of dimensions to be refactored.

  • startDim – The outermost dimension to start at.

Returns

The refactored tensor.
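
A short sketch mirroring the example above (assuming an existing graph):

#include <poputil/Util.hpp>

poplar::Tensor t = graph.addVariable(poplar::HALF, {4, 6, 4});
poplar::Tensor f = poputil::factorDims(t, {1, 2}); // shape {4, 3, 1, 2, 4}
poplar::Tensor u = poputil::unfactorDims(f, 2);    // shape {4, 6, 4} again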

poputil/VarStructure.hpp

Manage partitioning and grouping in tensors.

namespace poputil

General utility functions for building graphs.

Typedefs

using GroupingInfo = std::pair<unsigned, unsigned>

Functions

unsigned detectInnermostGrouping(const poplar::Graph &graph, const poplar::Tensor &t)

Detect if the tensor t has a grouping in its innermost dimension.

Parameters
  • graph – The graph to add the function to.

  • t – The tensor to check for grouping.

Throws

poputil::poplibs_error – If the rank of t is zero.

Returns

The size of the group. Zero if there is no grouping.

std::vector<GroupingInfo> detectDimGroupings(const poplar::Graph &graph, const poplar::Tensor &t)

Find all grouped dimensions from the innermost grouped dimension moving outwards, returning groupings for each.

The same dimension may appear more than once. This uses detectInnermostGrouping() iteratively.

Parameters
  • graph – The graph to add the function to.

  • t – The tensor to check for grouping.

Throws

poputil::poplibs_error – If the rank of t is zero.

Returns

A list of the grouped dimensions starting with the innermost.

poplar::Tensor createPartitionableTensor(poplar::Graph &graph, const poplar::Type &type, const std::vector<std::size_t> &shape, const std::vector<std::size_t> &nPartitions, const poplar::DebugContext &debugContext = {})

Create a tensor with the given shape, so that when it is partitioned into slices according to the given number of partitions in each dimension, each slice is a single contiguous region.

This partitions the tensor so that the maximum number of elements in each partition of a dimension is minimised as well as the number of partitions. That is, if a dimension has n elements, and the number of partitions in that dimension is d then:

a * ceil(n/d) + b * floor(n/d) = n

There will be a partitions with ceil(n/d) elements followed by b partitions with floor(n/d) elements, and possibly some number of partitions with 0 elements.

The returned tensor has no tile mapping set.

Parameters
  • graph – The graph to add the variable to.

  • type – The type of the elements in the returned tensor.

  • shape – The shape of the returned tensor.

  • nPartitions – The number of partitions the shape will be partitioned into in each dimension.

  • debugContext – Optional debug information.

Throws

poputil::poplibs_error – If the size of shape and nPartitions are not equal.

Returns

A tensor with the given shape where each partition is contiguous.

void iterateTensorPartitions(const poplar::Tensor &t, const std::vector<std::size_t> &nPartitions, const std::function<void(const std::vector<std::size_t> &i, const poplar::Tensor &s)> &f)

Iterate a function over the partitions of a tensor.

Partitioning follows the same definition as described for createPartitionableTensor().

Parameters
  • t – The tensor to iterate over.

  • nPartitions – The number of partitions the tensor is partitioned into in each dimension.

  • f – A function taking the indices i of the partition in the range [0, nPartitions[d]) for each dimension d of the tensor, as well as the slice s of the tensor corresponding to that partition.

Throws

poputil::poplibs_error – If the rank of t and the size of nPartitions are not equal.
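
A minimal sketch combining the two functions (assuming an existing graph with at least 12 tiles; the one-partition-per-tile mapping is only illustrative):

#include <poputil/VarStructure.hpp>

// A [100, 60] tensor split 4 ways in dimension 0 and 3 ways in dimension 1;
// each of the 12 partitions is a single contiguous region. No mapping is set.
poplar::Tensor t = poputil::createPartitionableTensor(
    graph, poplar::FLOAT, {100, 60}, /*nPartitions=*/{4, 3}, "t");

unsigned tile = 0;
poputil::iterateTensorPartitions(
    t, {4, 3},
    [&](const std::vector<std::size_t> &i, const poplar::Tensor &s) {
      graph.setTileMapping(s, tile++); // map each partition to its own tile
    });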

poputil/VertexTemplates.hpp

template<>
struct poputil::VertexTemplateToString<poplar::StringRef>

Public Static Functions

static inline std::string to_string(const poplar::StringRef &ref)
namespace poputil

General utility functions for building graphs.

Functions

inline std::string templateVertexParams(bool first)
template<typename ...Args>
inline std::string templateVertexParams(bool first, const std::string &val, Args&&... args)
template<typename ...Args>
inline std::string templateVertexParams(bool first, const char *val, Args&&... args)
template<typename ...Args>
inline std::string templateVertexParams(bool first, const poplar::Type &type, Args&&... args)
template<typename ...Args>
inline std::string templateVertexParams(bool first, bool b, Args&&... args)
template<typename T, typename ...Args>
inline std::string templateVertexParams(bool first, const T &val, Args&&... args)
template<typename ...Args>
inline std::string templateVertex(const std::string &name, Args&&... args)

Generate a string representation of a Vertex type for use by poplar::Graph::addVertex().

Parameters
  • name – The name of the vertex.

  • args – The types of the arguments to the vertex.

Returns

A string representation of the vertex type.
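
For illustration, a sketch using a hypothetical templated codelet called MyScale (assuming an existing graph):

#include <poputil/VertexTemplates.hpp>

poplar::ComputeSet cs = graph.addComputeSet("scale");

// templateVertex("MyScale", poplar::FLOAT, true) yields "MyScale<float,true>".
auto v = graph.addVertex(cs, poputil::templateVertex("MyScale", poplar::FLOAT, true));
graph.setTileMapping(v, 0);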

template<typename T>
struct VertexTemplateToString

Public Static Functions

static inline std::string to_string(const T &x)
template<> struct VertexTemplateToString<poplar::StringRef>

Public Static Functions

static inline std::string to_string(const poplar::StringRef &ref)

Tensor operations (popops)

Functions for building operations on tensors in control programs (such as element-wise functions and reductions).

popops/AllTrue.hpp

Perform logical AND of tensor elements.

namespace popops

Common functions, such as elementwise and reductions.

Functions

poplar::Tensor allTrue(poplar::Graph &graph, poplar::Tensor A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Given a boolean tensor, compute the logical AND of all its elements.

A new variable is created to store the result.

Parameters
  • graph – The Poplar graph.

  • A – The boolean tensor.

  • prog – The program sequence to add this operation to.

  • debugContext – Optional debug information.

Throws

poputil::poplibs_error – If the elements of A are not booleans.

Returns

A variable that holds the result of the operation.
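
A one-line usage sketch (assuming a mapped boolean tensor flags, an existing graph and a Sequence prog):

#include <popops/AllTrue.hpp>

// result is a single-element BOOL tensor: true only if every element of flags is true.
poplar::Tensor result = popops::allTrue(graph, flags, prog, "allTrue");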

popops/Cast.hpp

Casts between tensor types.

namespace popops

Common functions, such as elementwise and reductions.

Functions

poplar::Tensor cast(poplar::Graph &graph, const poplar::Tensor &src, const poplar::Type &dstType, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Cast elements of the specified src tensor to dstType, returning the result as a new tensor.

Note: If dstType == src.elementType(), then the operation is a copy.

Parameters
  • graph – The graph that the operation will be added to.

  • src – Source tensor to cast.

  • dstType – Type of the destination tensor.

  • prog – Program to add the cast operation to.

  • debugContext – Optional debug information.

Returns

The resultant cast tensor.
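
For example, a short sketch (assuming an existing graph and Sequence prog) casting a float tensor to half:

#include <popops/Cast.hpp>
#include <poputil/TileMapping.hpp>

poplar::Tensor src = graph.addVariable(poplar::FLOAT, {128}, "src");
poputil::mapTensorLinearly(graph, src);

poplar::Tensor dst = popops::cast(graph, src, poplar::HALF, prog, "toHalf");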

poplar::program::Program cast(poplar::Graph &graph, poplar::Tensor src, poplar::Tensor dst, const poplar::DebugContext &debugContext = {})

Create a program to copy a tensor, casting between types (for example, half->float).

Precondition: src.shape() == dst.shape()

Note: If dst.elementType() == src.elementType(), then the operation is just a copy.

Parameters
  • graph – The graph that the operation will be added to.

  • src – Source tensor.

  • dst – Destination tensor.

  • debugContext – Optional debug information.

Returns

The program to perform this operation.

void cast(poplar::Graph &graph, poplar::Tensor src, poplar::Tensor dst, poplar::ComputeSet cs)

Create vertices to copy, element-wise, from the src tensor to the dst tensor, casting between types (for example, half->float).

The vertices are added to the specified compute set.

Precondition: src.shape() == dst.shape()

Parameters
  • graph – The graph that the operation will be added to.

  • src – Source tensor.

  • dst – Destination tensor.

  • cs – Compute set to add the vertices to.

poplar::Tensor cast(poplar::Graph &graph, poplar::Tensor src, const poplar::Type &dstType, poplar::ComputeSet cs, const poplar::DebugContext &debugContext = {})

Create vertices to cast elements of the specified src tensor to dstType, returning the result as a new tensor.

The vertices are added to the specified compute set.

Parameters
  • graph – The graph that the operation will be added to.

  • src – Source tensor.

  • dstType – Destination type.

  • cs – Compute set to add the vertices to.

  • debugContext – Optional debug information.

Returns

Resultant destination tensor.

poplar::Tensor checkAccuracyWhenCast(poplar::Graph &graph, const poplar::Tensor &input, poplar::Type outputType, double tolerance, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Helper function which checks the relative error in the tensor input when casting it to type outputType.

The result is a single element bool tensor which is set to true if the error is less than tolerance.

Preconditions:

  • input.elementType() == FLOAT

  • outputType == HALF

  • input.numElements() == 1

Parameters
  • graph – The graph that the operation will be added to.

  • input – Input tensor.

  • outputType – Output type after the cast operation.

  • tolerance – Allowed tolerance in error from cast operation.

  • prog – Program to add the check onto.

  • debugContext – Optional debug information.

Throws

poputil::poplibs_error – If either input or outputType are not either half or float.

Returns

Boolean tensor indicating that the error is less than tolerance.

popops/CircBuf.hpp

Circular buffer support.

namespace popops

Common functions, such as elementwise and reductions.

class CircBuf

Public Functions

CircBuf(poplar::Graph &graph, const poplar::Type &dataType, unsigned size, const std::vector<std::size_t> &shape, const poplar::DebugContext &debugContext = {})

CircBuf represents a circular buffer of tensors which can be indexed using prev().

Each call to add() will add the given tensor to the circular buffer with the potential to overwrite a previous element if the buffer is full.

Parameters
  • graph – Graph to add the circular buffer to.

  • dataType – Datatype of the tensor elements in buffer.

  • size – Size of the circular buffer.

  • shape – Shape of the tensor elements in buffer.

  • debugContext – Optional debug information.

poplar::Tensor prev(unsigned i, poplar::program::Sequence &seq, const poplar::DebugContext &debugContext = {})

Return elements i entries old.

i must be less than size.

Parameters
  • i – Index into the circular buffer.

  • seq – Program to add the operation to.

  • debugContext – Optional debug information.

Returns

Tensor returned from the circular buffer.

void add(poplar::Tensor t, poplar::program::Sequence &seq, const poplar::DebugContext &debugContext = {})

Append an element to the end of the circular buffer.

Parameters
  • t – Tensor to append to the circular buffer

  • seq – Program to add the operation to.

  • debugContext – Optional debug information.

poplar::Tensor getIndex() const

Tensor representing the index into the circular buffer.

unsigned size() const

Size of the circular buffer.

poplar::Graph::TileToTensorMapping getTileMapping()
Returns

Tensor mapping of the tensor returned by indexing into a circular buffer.
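
A brief usage sketch (assuming an existing graph, a Sequence prog and a mapped tensor activation of shape {16, 16}):

#include <popops/CircBuf.hpp>

// A circular buffer holding the last 8 tensors of shape {16, 16}.
popops::CircBuf history(graph, poplar::HALF, /*size=*/8, /*shape=*/{16, 16});

history.add(activation, prog);                  // push the newest value
poplar::Tensor delayed = history.prev(4, prog); // read the value from 4 steps ago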

Private Members

poplar::Graph &graph
unsigned size_
poplar::Tensor index
std::vector<std::size_t> shape
unsigned padElements
poplar::Tensor hist

popops/CollectiveTypes.hpp

Support types for replicated and non-replicated collectives.

namespace popops

Common functions, such as elementwise and reductions.

Enums

enum CollectiveOperator

Supported collective operators.

Values:

enumerator ADD
enumerator MEAN
enumerator MUL
enumerator MIN
enumerator MAX
enumerator LOGICAL_AND

Only supports boolean operands.

enumerator LOGICAL_OR

Only supports boolean operands.

enumerator SQUARE_ADD

Squares each element before applying ADD reduction.

enumerator LOCAL

Do nothing and keep the local value.

struct Chunk
#include <Collectives.hpp>
struct Chunks
#include <Collectives.hpp>

popops/DynamicSlice.hpp

Support for dynamic slices.

namespace poplar

Poplar classes and functions.

namespace popops

Common functions, such as elementwise and reductions.

Functions

bool operator<(const SlicePlan &a, const SlicePlan &b) noexcept
bool operator==(const SlicePlan &a, const SlicePlan &b) noexcept
bool operator!=(const SlicePlan &a, const SlicePlan &b) noexcept
poplar::Tensor createSliceableTensor(poplar::Graph &graph, const poplar::Type &type, const std::vector<size_t> &shape, const std::vector<size_t> &dims, const std::vector<size_t> &sizes, std::size_t minGrainSize = 0, const poplar::DebugContext &debugContext = {})

Create and map a tensor to be sliced/updated efficiently.

The returned tensor will be spread over as many tiles as possible while respecting the minimum number of elements per tile (minGrainSize) and still being in a form that can be sliced/updated efficiently.

Parameters
  • graph – The Poplar graph.

  • type – The type of the elements.

  • shape – The shape of the tensor to be slice/updated.

  • dims – The dimensions of the tensor that will be slice/updated.

  • sizes – The size of the slice in each of the dimensions.

  • minGrainSize – The minimum elements per slice mapped to each tile

  • debugContext – Optional debug information.

Returns

A tensor with shape shape that is suitably mapped.

poplar::Tensor createSliceableTensor(poplar::Graph &graph, const poplar::Type &type, const std::vector<size_t> &shape, const std::vector<size_t> &dims, const std::vector<size_t> &sizes, const SlicePlan &plan, const poplar::OptionFlags &options, const poplar::DebugContext &debugContext = {})

Create and map a tensor to be sliced/updated efficiently.

The returned tensor will be laid out according to the plan.

Parameters
  • graph – The Poplar graph.

  • type – The type of the elements.

  • shape – The shape of the tensor to be slice/updated.

  • dims – The dimensions of the tensor that will be slice/updated.

  • sizes – The size of the slice in each of the dimensions.

  • plan – Plan describing how the slicing/updating operation will be implemented.

  • options – Flags controlling how the operation will be implemented.

  • debugContext – Optional debug information.

Returns

A tensor with shape shape that is suitably mapped.

poplar::Tensor createSliceTensor(poplar::Graph &graph, const poplar::Tensor &t, const std::vector<size_t> &dims, const std::vector<size_t> &sizes, std::size_t numIndices, const poplar::DebugContext &debugContext = {})

Create and map a tensor to be sliced into or updated from efficiently.

Introspection on the tensor t is used to lay out the created tensor such that it can be used to efficiently update t.

Parameters
  • graph – The Poplar graph.

  • t – The tensor to be updated.

  • dims – The dimensions of the tensor that will be sliced/updated.

  • sizes – The number of elements of each dimension in dims that will be sliced/updated.

  • numIndices – The number of slices this tensor should contain.

  • plan – Plan describing how the slicing/updating operation will be implemented.

  • options – Flags controlling how the operation will be implemented.

  • debugContext – Optional debug information.

Returns

A tensor with shape [numIndices, shape…] mapped appropriately to be sliced into/updated from.

poplar::Tensor createSliceTensor(poplar::Graph &graph, const poplar::Type &type, const std::vector<std::size_t> &shape, const std::vector<std::size_t> &dims, const std::vector<std::size_t> &sizes, std::size_t numIndices, const SlicePlan &plan, const poplar::OptionFlags &options, const poplar::DebugContext &debugContext = {})

Create and map a tensor to be sliced into or updated from efficiently.

The returned tensor is laid out according to the plan for the slice/update operation.

Parameters
  • graph – The Poplar graph.

  • type – The type of the elements.

  • shape – The shape of the tensor to be slice/updated.

  • dims – The dimensions of the tensor that will be sliced/updated.

  • sizes – The number of elements of each dimension in dims that will be sliced/updated.

  • numIndices – The number of slices this tensor should contain.

  • plan – Plan describing how the slicing/updating operation will be implemented.

  • options – Flags controlling how the operation will be implemented.

  • debugContext – Optional debug information.

Returns

A tensor with shape [numIndices, shape…] mapped appropriately to be sliced into/updated from.

poplar::Tensor createIndicesTensor(poplar::Graph &graph, const std::vector<std::size_t> &dims, std::size_t numIndices, const SlicePlan &plan, const poplar::OptionFlags &options, const poplar::DebugContext &debugContext = {})

Create and map a tensor to contain indices for slicing or updating a tensor efficiently.

Parameters
  • graph – The Poplar graph.

  • dims – The dimensions of a tensor to be sliced/updated that will be sliced/updated using these indices.

  • numIndices – The number of indices this tensor should contain

  • plan – Plan describing how the slicing/updating operation will be implemented.

  • options – Flags controlling how the operation will be implemented.

  • debugContext – Optional debug information.

Returns

A tensor of shape [numIndices, dims.size()] mapped appropriately to be used as the indices for a slice/update operation. Element type is always UNSIGNED_INT.

poplar::Tensor createSliceableTensorFromSlice(poplar::Graph &graph, const poplar::Tensor &s, const std::vector<std::size_t> &dims, const std::vector<std::size_t> &numSlices, const poplar::DebugContext &debugContext = {})

Create and map a tensor to be sliced/updated.

The tensor is mapped in a way that can be efficiently sliced and updated to/from the given slice tensor. It will be distributed across as many tiles as the given slice and with the same contiguous regions on each tile. The tensor’s shape and mapping are derived from the reference slice tensor.

Parameters
  • graph – The Poplar graph.

  • s – The reference slice.

  • dims – The dimensions of the returned tensor that will be sliced.

  • numSlices – The number of independent slices in each sliced dimension.

  • debugContext – Optional debug information.

Returns

A tensor to be sliced/updated.

poplar::Tensor dynamicSlice(poplar::Graph &graph, const poplar::Tensor &t, const poplar::Tensor &offset, const std::vector<std::size_t> &dims, const std::vector<std::size_t> &sizes, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Slice a tensor based on offsets specified by a tensor.

dims gives the dimensions to slice, sizes defines the size of the slice in those dimensions and offset gives the base offsets on each execution.

offset[0], dims and sizes must have the same size. offset may have a second dimension with an element per tile, which can eliminate exchange.

Parameters
  • graph – The Poplar graph.

  • t – The source tensor.

  • offset – A tensor of offsets at which the output is extracted.

  • dims – The dimensions of t to slice.

  • sizes – The size of the slice in each of the dimensions in dims.

  • prog – The program to be extended

  • debugContext – Optional debug information.

Returns

The specified subtensor
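
A minimal sketch (assuming an existing graph and Sequence prog) that slices one row out of a [1000, 200] tensor at a runtime offset:

#include <popops/DynamicSlice.hpp>

poplar::Tensor t = popops::createSliceableTensor(
    graph, poplar::FLOAT, {1000, 200}, /*dims=*/{0}, /*sizes=*/{1}, 0, "t");

poplar::Tensor offset = graph.addVariable(poplar::UNSIGNED_INT, {1}, "offset");
graph.setTileMapping(offset, 0);

// row has shape {1, 200}; the row index is read from offset at runtime.
poplar::Tensor row = popops::dynamicSlice(graph, t, offset, /*dims=*/{0},
                                          /*sizes=*/{1}, prog, "sliceRow");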

poplar::Graph::TileToTensorMapping getSliceMapping(poplar::Graph &graph, const poplar::Tensor &t, const std::vector<std::size_t> &dims, const std::vector<std::size_t> &sizes)

Get the tile mapping for a slice of a tensor.

dims gives the dimensions to slice, sizes defines the size of the slice in those dimensions.

Parameters
  • graph – The Poplar graph.

  • t – The source tensor.

  • dims – The dimensions of t to slice.

  • sizes – The size of the slice in each of the dimensions in dims.

void dynamicUpdate(poplar::Graph &graph, const poplar::Tensor &t, const poplar::Tensor &s, const poplar::Tensor &offset, const std::vector<std::size_t> &dims, const std::vector<std::size_t> &sizes, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Update a subtensor at offsets read from a tensor.

dims gives the dimensions that are partially updated, by sizes elements, at offsets offset. Unspecified dimensions are copied in full with zero offset.

offset[0], dims and sizes must have the same size. offset may have a second dimension with an element per tile, which can eliminate exchange.

Parameters
  • graph – The Poplar graph.

  • t – The tensor to update.

  • s – The updates.

  • offset – The offset within t to be updated.

  • dims – The dimensions to be dynamically updated.

  • sizes – The size of the update in each of the dimensions in dims.

  • prog – The program to be extended.

  • debugContext – Optional debug information.

poplar::Tensor multiSlice(poplar::Graph &graph, const poplar::Tensor &t, const poplar::Tensor &offsets, const std::vector<std::size_t> &dims, const std::vector<std::size_t> &sizes, poplar::program::Sequence &prog, const SlicePlan &plan, const poplar::OptionFlags &options, const poplar::DebugContext &debugContext = {})

Take multiple slices from a base tensor.

The returned tensor will have a rank one greater than t. Its outer dimension will be offsets.dim(0). Note that dims refers to the dimensions of t. t can be created using createSliceableTensor() to ensure efficient mapping.

Parameters
  • graph – The Poplar graph.

  • t – The tensor being sliced.

  • offsets – The offsets within t to be sliced.

  • dims – The dimensions of t to be sliced.

  • sizes – The size of the update in each of the dimensions in dims.

  • prog – The program to be extended.

  • plan – Plan describing how the operation will be implemented.

  • options – Flags controlling how the operation will be implemented.

  • debugContext – Optional debug information.

void multiUpdate(poplar::Graph &graph, const poplar::Tensor &t, const poplar::Tensor &s, const poplar::Tensor &offsets, const std::vector<std::size_t> &dims, const std::vector<std::size_t> &sizes, poplar::program::Sequence &prog, const SlicePlan &plan, const poplar::OptionFlags &options, const poplar::DebugContext &debugContext = {})

Update multiple slices in a tensor.

Parameters
  • graph – The Poplar graph.

  • t – The tensor being updated.

  • s – The slices to insert.

  • offsets – The offsets within t to be updated.

  • dims – The dimensions of t to be updated.

  • sizes – The size of the update in each of the dimensions in dims.

  • prog – The program to be extended.

  • plan – Plan describing how the operation will be implemented.

  • options – Flags controlling how the operation will be implemented.

  • debugContext – Optional debug information.

void multiUpdateAdd(poplar::Graph &graph, const poplar::Tensor &t, const poplar::Tensor &s, const poplar::Tensor &offsets, const poplar::Tensor &scale, const std::vector<std::size_t> &dims, const std::vector<std::size_t> &sizes, poplar::program::Sequence &prog, const SlicePlan &plan, const poplar::OptionFlags &options, const poplar::DebugContext &debugContext = {})

Accumulate multiple slices in a tensor. For each of the offsets i: t[offsets[i]] += scale * s[i]. t, s and scale must have the same element type.

Parameters
  • graph – The Poplar graph.

  • t – The tensor being updated (must be rank 2).

  • s – The slices to accumulate.

  • offsets – The offsets within t to be accumulated.

  • scale – The scaling to apply to the update.

  • dims – The dimensions of t to be accumulated (must be rank 1).

  • sizes – The size of the accumulate in each of the dimensions in dims.

  • prog – The program to be extended.

  • plan – Plan describing how the operation will be implemented.

  • options – Flags controlling how the operation will be implemented.

  • debugContext – Optional debug information.

void multiUpdateMax(poplar::Graph &graph, const poplar::Tensor &t, const poplar::Tensor &s, const poplar::Tensor &offsets, const std::vector<std::size_t> &dims, const std::vector<std::size_t> &sizes, poplar::program::Sequence &prog, const SlicePlan &plan, const poplar::OptionFlags &options, const poplar::DebugContext &debugContext = {})

Find the maximum over multiple slices in a tensor. For each of the offsets i: t[offsets[i]] = max(t[offsets[i]], s[i]). t and s must have the same element type. Offsets where offsets[i] >= t.dim(0) are ignored.

Parameters
  • graph – The Poplar graph.

  • t – The tensor being updated (must be rank 2).

  • s – The slices to find maximum over.

  • offsets – The offsets within t to find maximum over.

  • dims – The dimensions of t to find maximum over (must be rank 1).

  • sizes – The size of the update in each of the dimensions in dims.

  • prog – The program to be extended.

  • plan – Plan describing how the operation will be implemented.

  • options – Flags controlling how the operation will be implemented.

  • debugContext – Optional debug information.

class SlicePlan
#include <DynamicSlice.hpp>

An object representing a plan that describes how to implement a slice or update.

This can be used as a parameter to a function that will slice or update a tensor.

Public Functions

SlicePlan()
~SlicePlan()
SlicePlan(const SlicePlan &other)
SlicePlan(SlicePlan &&other)
SlicePlan &operator=(const SlicePlan &other)
SlicePlan &operator=(SlicePlan &&other)
SlicePlan(std::unique_ptr<SlicePlanInternal> internal)
inline SlicePlanInternal &getImpl() const

Private Members

std::unique_ptr<SlicePlanInternal> internal

Friends

friend std::ostream &operator<<(std::ostream &o, const SlicePlan &p)
friend bool operator<(const SlicePlan &a, const SlicePlan &b) noexcept
friend bool operator==(const SlicePlan &a, const SlicePlan &b) noexcept
friend poplar::ProfileValue toProfileValue(const SlicePlan &p)
namespace embedding

Functions

SlicePlan plan(const poplar::Graph &graph, const poplar::Type &dataType, const std::size_t numEntries, const std::size_t outputSize, const std::vector<std::size_t> &numLookups, const poplar::OptionFlags &options)

Create a plan for implementing a set of operations on an embedding matrix.

Embedding plan options:

  • usedForUpdate (true, false) [=true]

    If true, you intend to use this embedding plan for both a multiSlice and multiUpdate* operation and the plan returned accounts for the costs of both operations. If false, only the costs of a multiSlice are accounted for.

  • availableMemoryProportion Positive decimal

    If set, gives the proportion of tile memory made available for temporary variables (variables that become live and die during the operation) for this operation. If not set, the operation has the freedom to use unlimited temporary memory.

  • indicesDistribution (uniform, onePoint) [=uniform]

    A description of the statistical distribution of the indices that will be sliced/updated over the input size (numEntries) of the operation. This is used when estimating the runtime of the multiSlice and multiUpdate* operations.

    • uniform Indices are assumed to be uniformly distributed over the input size of the embedding.

    • onePoint Indices are assumed to all be equal.

  • planMinimisationTarget (memory, cycles) [=memory]

    Select what should be minimised when planning this operation.

    • memory Minimise a weighted combination of estimated maximum tile memory needed for code, for input/indices/output operands, and temporary variables for the operation.

    • cycles Minimise estimated total cycles for the operation.

Parameters
  • graph – The graph the operation will be added to.

  • dataType – The data type of the entries in the embedding matrix and the resulting lookups from the matrix.

  • numEntries – Input size of embedding matrix.

  • outputSize – Output size of embedding matrix lookup.

  • numLookups – Vector of numbers of indices which will be looked up in the embedding matrix.

  • options – Set of option flags controlling how the operation will be implemented.

Returns

A plan which describes how the embedding matrix lookup/update operations should be implemented.
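
As an illustration only (the numeric values, option choices and function name are arbitrary), a plan for a half-precision embedding with 10000 entries of 64 elements each, looked up 128 indices at a time, might be created as follows:

    #include <poplar/Graph.hpp>
    #include <poplar/OptionFlags.hpp>
    #include <poplar/Type.hpp>
    #include <popops/DynamicSlice.hpp>

    popops::SlicePlan makeEmbeddingPlan(const poplar::Graph &graph) {
      poplar::OptionFlags options{
          {"usedForUpdate", "true"},            // plan for multiSlice and multiUpdate*
          {"availableMemoryProportion", "0.4"}, // cap temporary memory per tile
          {"indicesDistribution", "uniform"}};  // indices assumed uniformly distributed
      return popops::embedding::plan(graph, poplar::HALF,
                                     /*numEntries=*/10000,
                                     /*outputSize=*/64,
                                     /*numLookups=*/{128}, options);
    }

The returned SlicePlan can then be passed to the multiSlice and multiUpdate* functions that operate on the embedding matrix.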

popops/ElementWise.hpp

These functions perform the same operation on each element of one or more tensors.

Every function has an in-place overload that writes the result to its first tensor argument.

The functions that perform operations on two tensors also have overloads in which one of the tensors is a constant scalar. These perform the same operation on each element of the remaining tensor, using the scalar as the other operand for every element.
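
A minimal sketch of the three calling styles (out-of-place, in-place and constant-scalar), assuming a and b are existing tensors of suitable type and broadcast-compatible shape; the function name and debug strings are illustrative:

    #include <poplar/Graph.hpp>
    #include <poplar/Program.hpp>
    #include <popops/ElementWise.hpp>

    void elementwiseStyles(poplar::Graph &graph,
                           const poplar::Tensor &a, const poplar::Tensor &b,
                           poplar::program::Sequence &prog) {
      // Out-of-place: allocates and returns a new tensor c = a + b.
      poplar::Tensor c = popops::add(graph, a, b, prog, {"add"});

      // In-place: overwrites the first tensor argument, so a becomes a + b.
      popops::addInPlace(graph, a, b, prog, {"addInPlace"});

      // Constant-scalar overload: every element of a is multiplied by 2.0f.
      poplar::Tensor d = popops::mul(graph, a, 2.0f, prog, {"mulByConst"});
    }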

namespace poputil

General utility functions for building graphs.

Functions

template<>
poplar::ProfileValue toProfileValue(const popops::expr::UnaryOpType &op)
template<>
poplar::ProfileValue toProfileValue(const popops::expr::BinaryOpType &op)
template<>
poplar::ProfileValue toProfileValue(const popops::expr::TernaryOpType &op)
namespace popops

Common functions, such as elementwise and reductions.

Unnamed Group

poplar::Tensor varianceToInvStdDev(poplar::Graph &graph, const poplar::Tensor &src, const poplar::Tensor &epsilon, poplar::program::Sequence &prog, const poplar::Type dstType = poplar::HALF, const poplar::DebugContext &debugContext = {})

Convert variance to inverse standard deviation.

Parameters
  • graph – The graph to update.

  • src – The source tensor.

  • epsilon – A tensor initialised with the epsilon parameter used in conversion. Must have a single element and have the same type as the input type. Alternatively a float value can be used and the appropriate tensor will be created.

  • prog – The sequence to extend with the execution of conversion.

  • dstType – The type of the tensor to be output. Must be HALF or equal to the input type.

  • debugContext – Optional debug information

Returns

A tensor where each element is the inverse of the standard deviation. Each element is the result of b = sqrt(1 / a), where a and b are the corresponding elements of src and the result tensor respectively.

poplar::Tensor varianceToInvStdDev(poplar::Graph &graph, const poplar::Tensor &src, const float epsilon, poplar::program::Sequence &prog, const poplar::Type dstType = poplar::HALF, const poplar::DebugContext &debugContext = {})

Unnamed Group

poplar::Tensor invStdDevToVariance(poplar::Graph &graph, const poplar::Tensor &src, const poplar::Tensor &epsilon, poplar::program::Sequence &prog, const poplar::Type dstType = poplar::FLOAT, const poplar::DebugContext &debugContext = {})

Convert inverse standard deviation to variance.

Parameters
  • graph – The graph to update.

  • src – The source tensor.

  • epsilon – A tensor initialised with the epsilon parameter used in conversion. Must have a single element and have the same type as the input type. Alternatively, a float value can be used and the appropriate tensor will be created.

  • prog – The sequence to extend with the execution of conversion.

  • dstType – The type of the tensor to be output. Must be FLOAT or equal to the input type.

  • debugContext – Optional debug information

Returns

A tensor where each element is the variance. Each element is the result of b = (1 / a) ^ 2, where a and b are the corresponding elements of src and the result tensor respectively.

poplar::Tensor invStdDevToVariance(poplar::Graph &graph, const poplar::Tensor &src, const float epsilon, poplar::program::Sequence &prog, const poplar::Type dstType = poplar::FLOAT, const poplar::DebugContext &debugContext = {})
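
A minimal sketch of the float-epsilon overloads above, assuming variance is an existing FLOAT tensor; the epsilon value 1e-5f and the function name are purely illustrative:

    #include <poplar/Graph.hpp>
    #include <poplar/Program.hpp>
    #include <popops/ElementWise.hpp>

    void convertStatistics(poplar::Graph &graph, const poplar::Tensor &variance,
                           poplar::program::Sequence &prog) {
      // Variance -> inverse standard deviation (output type defaults to HALF).
      poplar::Tensor invStdDev =
          popops::varianceToInvStdDev(graph, variance, /*epsilon=*/1e-5f, prog);

      // Inverse standard deviation -> variance (output type defaults to FLOAT).
      poplar::Tensor varianceAgain =
          popops::invStdDevToVariance(graph, invStdDev, /*epsilon=*/1e-5f, prog);
    }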

Unnamed Group

poplar::Tensor map(poplar::Graph &graph, const expr::Expr &expr, const std::vector<poplar::Tensor> &ts, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Map an expression across tensors.

Element Wise Options

  • enableGenerateCodelet (true, false) [=true]

    If true (and all of the inputs are the same size and do not alias), a codelet is generated to execute this map operation. A codelet will not be generated if there is only a single operation unless forceGenerateCodelet is true.

Parameters
  • graph – The graph to update.

  • expr – The expression to map across the tensors. The placeholders in the expressions will be substituted with corresponding elements from the tensors in ts.

  • ts – The list of tensors to map the expression across. If elements from these tensors are used in binary/ternary operations in the expression, the numpy-style broadcast rules are used to match the shapes of the tensors (see poputil::broadcastToMatch()).

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – A list of flags to pass to the expression evaluator.

Returns

A tensor containing the elements resulting from the application of the expression across the tensors.

inline poplar::Tensor map(poplar::Graph &graph, expr::UnaryOpType op, const poplar::Tensor &t, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
inline poplar::Tensor map(poplar::Graph &graph, expr::BinaryOpType op, const poplar::Tensor &a, const poplar::Tensor &b, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
inline poplar::Tensor map(poplar::Graph &graph, expr::TernaryOpType op, const poplar::Tensor &a, const poplar::Tensor &b, const poplar::Tensor &c, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
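
As a sketch, the expression (a * b) + 1 can be mapped across two tensors using the placeholder and expression classes assumed to be provided by popops/Expr.hpp (_1 and _2 are substituted with ts[0] and ts[1]); mapInPlace() takes the same arguments but writes the result back into the first tensor in ts. The function name and debug string are illustrative:

    #include <poplar/Graph.hpp>
    #include <poplar/Program.hpp>
    #include <popops/ElementWise.hpp>
    #include <popops/Expr.hpp>

    namespace pe = popops::expr;

    poplar::Tensor fusedMulAdd(poplar::Graph &graph,
                               const poplar::Tensor &a, const poplar::Tensor &b,
                               poplar::program::Sequence &prog) {
      // Evaluates (a * b) + 1 element-wise in a single map operation.
      return popops::map(graph,
                         pe::Add(pe::Mul(pe::_1, pe::_2), pe::Const(1.0f)),
                         {a, b}, prog, {"fusedMulAdd"});
    }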

Unnamed Group

void mapInPlace(poplar::Graph &graph, const expr::Expr &expr, const std::vector<poplar::Tensor> &ts, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensors with the result of map().

inline void mapInPlace(poplar::Graph &graph, expr::UnaryOpType op, const poplar::Tensor &t, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
inline void mapInPlace(poplar::Graph &graph, expr::BinaryOpType op, const poplar::Tensor &a, const poplar::Tensor &b, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
inline void mapInPlace(poplar::Graph &graph, expr::TernaryOpType op, const poplar::Tensor &a, const poplar::Tensor &b, const poplar::Tensor &c, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline poplar::Tensor abs(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the absolute value of each element in A.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is equivalent to the result of std::abs(a), where a is an element of A.

inline void absInPlace(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of abs().
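
A minimal sketch of the unary out-of-place and in-place forms, assuming t is an existing tensor of a signed type:

    #include <poplar/Graph.hpp>
    #include <poplar/Program.hpp>
    #include <popops/ElementWise.hpp>

    void absoluteValues(poplar::Graph &graph, const poplar::Tensor &t,
                        poplar::program::Sequence &prog) {
      // New tensor holding the absolute value of every element of t.
      poplar::Tensor magnitude = popops::abs(graph, t, prog, {"abs"});

      // Overwrite t with its absolute values.
      popops::absInPlace(graph, t, prog, {"absInPlace"});
    }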

Unnamed Group

template<typename constType>
inline void checkTypes(poplar::Type elementType, constType constant)

Check that the host compile-time type constType is compatible with the run-time IPU type elementType.

Parameters
  • elementType – The run-time IPU type.

  • constant – Unused.

Template Parameters

constType – The host compile-time type.

Throws

std::runtime_error – If the types are not compatible.

template<>
inline void checkTypes<float>(poplar::Type elementType, float constant)
template<>
inline void checkTypes<double>(poplar::Type elementType, double constant)

Unnamed Group

inline poplar::Tensor add(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Add each element in A to the corresponding element in B.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • B – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is the result of a + b, where a and b are the corresponding elements of A and B tensors respectively.

template<typename constType>
inline poplar::Tensor add(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
template<typename constType>
inline poplar::Tensor add(poplar::Graph &graph, const constType A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline void addInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of add().

template<typename constType>
inline void addInPlace(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline poplar::Tensor atan2(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the two argument arctangent of each element in A with the corresponding element in B.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • B – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is the result of atan2(a, b), where a and b are the corresponding elements of A and B tensors respectively.

template<typename constType>
inline poplar::Tensor atan2(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
template<typename constType>
inline poplar::Tensor atan2(poplar::Graph &graph, const constType A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline void atan2InPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of atan2().

template<typename constType>
inline void atan2InPlace(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline poplar::Tensor bitwiseAnd(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the bitwise AND of each element in A with the corresponding element in B.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • B – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is the result of a & b, where a and b are the corresponding elements of A and B tensors respectively.

template<typename constType>
inline poplar::Tensor bitwiseAnd(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
template<typename constType>
inline poplar::Tensor bitwiseAnd(poplar::Graph &graph, const constType A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline void bitwiseAndInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of bitwiseAnd().

template<typename constType>
inline void bitwiseAndInPlace(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline poplar::Tensor bitwiseOr(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the bitwise OR of each element in A with the corresponding element in B.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • B – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is the result of a | b, where a and b are the corresponding elements of A and B tensors respectively.

template<typename constType>
inline poplar::Tensor bitwiseOr(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
template<typename constType>
inline poplar::Tensor bitwiseOr(poplar::Graph &graph, const constType A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline void bitwiseOrInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of bitwiseOr().

template<typename constType>
inline void bitwiseOrInPlace(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline poplar::Tensor bitwiseXor(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the bitwise XOR of each element in A with the corresponding element in B.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • B – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is the result of a ^ b, where a and b are the corresponding elements of A and B tensors respectively.

template<typename constType>
inline poplar::Tensor bitwiseXor(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
template<typename constType>
inline poplar::Tensor bitwiseXor(poplar::Graph &graph, const constType A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline void bitwiseXorInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of bitwiseXor().

template<typename constType>
inline void bitwiseXorInPlace(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline poplar::Tensor bitwiseXnor(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the bitwise XNOR of each element in A with the corresponding element in B.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • B – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is the result of !(a ^ b), where a and b are the corresponding elements of A and B tensors respectively.

template<typename constType>
inline poplar::Tensor bitwiseXnor(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
template<typename constType>
inline poplar::Tensor bitwiseXnor(poplar::Graph &graph, const constType A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline void bitwiseXnorInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of bitwiseXnor().

template<typename constType>
inline void bitwiseXnorInPlace(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline poplar::Tensor div(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Divide each element in A by the corresponding element in B.

Parameters
  • graph – The graph to update.

  • A – The tensor of dividends.

  • B – The tensor of divisors.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is the result of a / b, where a and b are the corresponding elements of A and B tensors respectively.

template<typename constType>
inline poplar::Tensor div(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
template<typename constType>
inline poplar::Tensor div(poplar::Graph &graph, const constType A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline void divInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of div().

template<typename constType>
inline void divInPlace(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline poplar::Tensor eq(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Check if each element in A is equal to the corresponding element in B.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • B – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is the result of a == b, where a and b are the corresponding elements of A and B tensors respectively.

template<typename constType>
inline poplar::Tensor eq(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
template<typename constType>
inline poplar::Tensor eq(poplar::Graph &graph, const constType A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline void eqInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of eq().

template<typename constType>
inline void eqInPlace(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline poplar::Tensor gteq(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Check if each element in A is greater than or equal to the corresponding element in B.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • B – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is the result of a >= b, where a and b are the corresponding elements of A and B tensors respectively.

template<typename constType>
inline poplar::Tensor gteq(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
template<typename constType>
inline poplar::Tensor gteq(poplar::Graph &graph, const constType A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline void gteqInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of gteq().

template<typename constType>
inline void gteqInPlace(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline poplar::Tensor gt(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Check if each element in A is greater than the corresponding element in B.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • B – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is the result of a > b, where a and b are the corresponding elements of A and B tensors respectively.

template<typename constType>
inline poplar::Tensor gt(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
template<typename constType>
inline poplar::Tensor gt(poplar::Graph &graph, const constType A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline void gtInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of gt().

template<typename constType>
inline void gtInPlace(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
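
A sketch of comparing a tensor against a constant; the result is assumed to be a tensor of boolean elements, one per element of a, and the function name and debug string are illustrative:

    #include <poplar/Graph.hpp>
    #include <poplar/Program.hpp>
    #include <popops/ElementWise.hpp>

    poplar::Tensor positiveMask(poplar::Graph &graph, const poplar::Tensor &a,
                                poplar::program::Sequence &prog) {
      // mask[i] = (a[i] > 0.0f) for every element of a.
      return popops::gt(graph, a, 0.0f, prog, {"positiveMask"});
    }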

Unnamed Group

inline poplar::Tensor invStdDevToVariance(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Convert the inverse standard deviation to variance.

Parameters
  • graph – The graph to update.

  • A – The source tensor.

  • B – The destination tensor.

  • prog – The sequence to extend with the execution of conversion.

  • debugContext – Optional debug information.

  • options – A list of flags to pass to the expression evaluator.

Returns

A tensor where each element is the variance. Each element is the result of b = (1 / a) ^ 2, where a and b are the corresponding elements of A and B tensors respectively, and where A represents the inverse standard deviation and B the variance.

template<typename constType>
inline poplar::Tensor invStdDevToVariance(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
template<typename constType>
inline poplar::Tensor invStdDevToVariance(poplar::Graph &graph, const constType A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline void invStdDevToVarianceInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of invStdDevToVariance().

template<typename constType>
inline void invStdDevToVarianceInPlace(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline poplar::Tensor lteq(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Check if each element in A is less than or equal to the corresponding element in B.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • B – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information.

  • options – Element-wise options. See map().

Returns

A tensor where each element is the result of a <= b, where a and b are the corresponding elements of A and B tensors respectively.

template<typename constType>
inline poplar::Tensor lteq(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
template<typename constType>
inline poplar::Tensor lteq(poplar::Graph &graph, const constType A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline void lteqInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of lteq().

template<typename constType>
inline void lteqInPlace(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline poplar::Tensor logicalAnd(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the logical AND (&&) of each element in A with the corresponding element in B.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • B – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information.

  • options – Element-wise options. See map().

Returns

A tensor where each element is the result of a && b, where a and b are the corresponding elements of A and B tensors respectively.

template<typename constType>
inline poplar::Tensor logicalAnd(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
template<typename constType>
inline poplar::Tensor logicalAnd(poplar::Graph &graph, const constType A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline void logicalAndInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of logicalAnd().

template<typename constType>
inline void logicalAndInPlace(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline poplar::Tensor logicalOr(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the logical OR (||) of each element in A with the corresponding element in B.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • B – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information.

  • options – Element-wise options. See map().

Returns

A tensor where each element is the result of a || b, where a and b are the corresponding elements of A and B tensors respectively.

template<typename constType>
inline poplar::Tensor logicalOr(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
template<typename constType>
inline poplar::Tensor logicalOr(poplar::Graph &graph, const constType A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline void logicalOrInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of logicalOr().

template<typename constType>
inline void logicalOrInPlace(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline poplar::Tensor lt(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Check if each element in A is less than the corresponding element in B.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • B – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information.

  • options – Element-wise options. See map().

Returns

A tensor where each element is the result of a < b, where a and b are the corresponding elements of A and B tensors respectively.

template<typename constType>
inline poplar::Tensor lt(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
template<typename constType>
inline poplar::Tensor lt(poplar::Graph &graph, const constType A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline void ltInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of lt().

template<typename constType>
inline void ltInPlace(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline poplar::Tensor max(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the maximum of each element in A with the corresponding element in B.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • B – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information.

  • options – Element-wise options. See map().

Returns

A tensor where each element is the result of max(a, b), where a and b are the corresponding elements of A and B tensors respectively.

template<typename constType>
inline poplar::Tensor max(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
template<typename constType>
inline poplar::Tensor max(poplar::Graph &graph, const constType A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline void maxInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of max().

template<typename constType>
inline void maxInPlace(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline poplar::Tensor min(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the minimum of each element in A with the corresponding element in B.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • B – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information.

  • options – Element-wise options. See map().

Returns

A tensor where each element is the result of min(a, b), where a and b are the corresponding elements of A and B tensors respectively.

template<typename constType>
inline poplar::Tensor min(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
template<typename constType>
inline poplar::Tensor min(poplar::Graph &graph, const constType A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline void minInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of min().

template<typename constType>
inline void minInPlace(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline poplar::Tensor mul(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Multiply each element in A by the corresponding element in B.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • B – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information.

  • options – Element-wise options. See map().

Returns

A tensor where each element is the result of a * b, where a and b are the corresponding elements of A and B tensors respectively.

template<typename constType>
inline poplar::Tensor mul(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
template<typename constType>
inline poplar::Tensor mul(poplar::Graph &graph, const constType A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline void mulInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of mul().

template<typename constType>
inline void mulInPlace(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline poplar::Tensor neq(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Check if each element in A is not equal to the corresponding element in B.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • B – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information.

  • options – Element-wise options. See map().

Returns

A tensor where each element is the result of a != b, where a and b are the corresponding elements of A and B tensors respectively.

template<typename constType>
inline poplar::Tensor neq(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
template<typename constType>
inline poplar::Tensor neq(poplar::Graph &graph, const constType A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline void neqInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of neq().

template<typename constType>
inline void neqInPlace(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline poplar::Tensor pow(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute each element in A to the power of the corresponding element in B.

Parameters
  • graph – The graph to update.

  • A – The tensor of bases.

  • B – The tensor of exponents.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information.

  • options – Element-wise options. See map().

Returns

A tensor where each element is equal to pow(a, b), where a and b are the corresponding elements of A and B tensors respectively.

template<typename constType>
inline poplar::Tensor pow(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
template<typename constType>
inline poplar::Tensor pow(poplar::Graph &graph, const constType A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline void powInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of pow().

template<typename constType>
inline void powInPlace(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline poplar::Tensor rem(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the remainder of each element in A divided by the corresponding element in B.

Parameters
  • graph – The graph to update.

  • A – The tensor of dividends.

  • B – The tensor of divisors.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information.

  • options – Element-wise options. See map().

Returns

A tensor where each element is equal to a % b, where a and b are the corresponding elements of A and B tensors respectively.

template<typename constType>
inline poplar::Tensor rem(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
template<typename constType>
inline poplar::Tensor rem(poplar::Graph &graph, const constType A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline void remInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of rem().

template<typename constType>
inline void remInPlace(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline poplar::Tensor shiftLeft(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Shift the elements of A left by the corresponding elements of B.

Parameters
  • graph – The graph to update.

  • A – The tensor of elements to left-shift.

  • B – The tensor of elements that describe the amount to left-shift A by.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information.

  • options – Element-wise options. See map().

Returns

A tensor where each element is equal to a << b, where a and b are the corresponding elements of A and B tensors respectively.

template<typename constType>
inline poplar::Tensor shiftLeft(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
template<typename constType>
inline poplar::Tensor shiftLeft(poplar::Graph &graph, const constType A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline void shiftLeftInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of shiftLeft().

template<typename constType>
inline void shiftLeftInPlace(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline poplar::Tensor shiftRight(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Shift the elements of A right by the corresponding elements of B.

Parameters
  • graph – The graph to update.

  • A – The tensor of elements to right-shift.

  • B – The tensor of elements that describe the amount to right-shift A by.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information.

  • options – Element-wise options. See map().

Returns

A tensor where each element is equal to a >> b (without sign extension), where a and b are the corresponding elements of A and B tensors respectively.

template<typename constType>
inline poplar::Tensor shiftRight(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
template<typename constType>
inline poplar::Tensor shiftRight(poplar::Graph &graph, const constType A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline void shiftRightInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of shiftRight().

template<typename constType>
inline void shiftRightInPlace(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline poplar::Tensor shiftRightSignExtend(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Shift the elements of A right with sign extension by the corresponding elements of B.

Parameters
  • graph – The graph to update.

  • A – The tensor of elements to right-shift.

  • B – The tensor of elements that describe the amount to right-shift A by.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information.

  • options – Element-wise options. See map().

Returns

A tensor where each element is equal to a >> b with sign extension, where a and b are the corresponding elements of A and B tensors respectively.

template<typename constType>
inline poplar::Tensor shiftRightSignExtend(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
template<typename constType>
inline poplar::Tensor shiftRightSignExtend(poplar::Graph &graph, const constType A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline void shiftRightSignExtendInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of shiftRightSignExtend().

template<typename constType>
inline void shiftRightSignExtendInPlace(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
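
A sketch contrasting the two right-shift variants on a signed integer tensor a, with shift amounts taken from b; the function name and debug strings are illustrative:

    #include <poplar/Graph.hpp>
    #include <poplar/Program.hpp>
    #include <popops/ElementWise.hpp>

    void rightShifts(poplar::Graph &graph, const poplar::Tensor &a,
                     const poplar::Tensor &b, poplar::program::Sequence &prog) {
      // Logical shift: a >> b with vacated bits filled with zeros.
      poplar::Tensor logical = popops::shiftRight(graph, a, b, prog, {"lsr"});

      // Arithmetic shift: a >> b with vacated bits filled with the sign bit.
      poplar::Tensor arithmetic =
          popops::shiftRightSignExtend(graph, a, b, prog, {"asr"});
    }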

Unnamed Group

inline poplar::Tensor sub(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Subtract the elements of B from A and return the result in a new tensor.

Parameters
  • graph – The graph to update.

  • A – The tensor of elements which will be subtracted from.

  • B – The tensor of elements to subtract from A.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information.

  • options – Element-wise options. See map().

Returns

A tensor where each element is equal to a - b, where a and b are the corresponding elements of A and B tensors respectively.

template<typename constType>
inline poplar::Tensor sub(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
template<typename constType>
inline poplar::Tensor sub(poplar::Graph &graph, const constType A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline void subInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of sub().

template<typename constType>
inline void subInPlace(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline poplar::Tensor varianceToInvStdDev(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Convert variance to inverse standard deviation.

Parameters
  • graph – The graph to update.

  • A – The source tensor.

  • B – The destination tensor.

  • prog – The sequence to extend with the execution of conversion.

  • debugContext – Optional debug information.

  • options – A list of flags to pass to the expression evaluator.

Returns

A tensor where each element is the inverse of the standard deviation. Each element is the result of b = sqrt(1 / a), where a and b are the corresponding elements of A and B tensors respectively, and where A represents the variance and B the inverse standard deviation.

template<typename constType>
inline poplar::Tensor varianceToInvStdDev(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
template<typename constType>
inline poplar::Tensor varianceToInvStdDev(poplar::Graph &graph, const constType A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Unnamed Group

inline void varianceToInvStdDevInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of varianceToInvStdDev().

template<typename constType>
inline void varianceToInvStdDevInPlace(poplar::Graph &graph, const poplar::Tensor &A, const constType B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Functions

inline poplar::Tensor asin(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the arc-sine of each element in A.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is equivalent to the result of std::asin(a), where a is an element of A.

inline void asinInPlace(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of asin().

inline poplar::Tensor bitwiseNot(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the bitwise NOT operation for each element in A.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is equivalent to the result of ~a, where a is an element of A.

inline void bitwiseNotInPlace(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of bitwiseNot().

inline poplar::Tensor cbrt(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the cube-root for each element in A.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is equivalent to the result of std::cbrt(a), where a is an element of A.

inline void cbrtInPlace(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of cbrt().

inline poplar::Tensor ceil(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the ceiling of each element in A.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is equivalent to the result of std::ceil(a), where a is an element of A.

inline void ceilInPlace(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of ceil().

inline poplar::Tensor countLeadingZeros(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the number of binary leading zeros of each element in A.

Note

If the element is zero then it is treated as 32 leading zeros.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is equivalent to the result of a ? __builtin_clz(a) : 32, where a is an element of A.

inline void countLeadingZerosInPlace(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of countLeadingZeros().

inline poplar::Tensor cos(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the cosine of each element in A.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is equivalent to the result of std::cos(a), where a is an element of A.

inline void cosInPlace(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of cos().

inline poplar::Tensor erf(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the error function of each element in A.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is equivalent to the result of std::erf(a), where a is an element of A.

inline void erfInPlace(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of erf().

inline poplar::Tensor exp(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the exponential of each element in A.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is equivalent to the result of std::exp(a), where a is an element of A.

inline void expInPlace(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of exp().

inline poplar::Tensor expm1(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the exponential of each element in A minus one.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is equivalent to the result of std::expm1(a), where a is an element of A.

inline void expm1InPlace(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of expm1().

inline poplar::Tensor floor(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the floor of each element in A.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is equivalent to the result of std::floor(a), where a is an element of A.

inline void floorInPlace(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of floor().

inline poplar::Tensor inv(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the inverse of each element in A.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is equivalent to the result of 1 / a, where a is an element of A.

inline void invInPlace(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of inv().

inline poplar::Tensor log(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the log base-e of each element in A.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is equivalent to the result of std::log(a), where a is an element of A.

inline void logInPlace(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of log().

inline poplar::Tensor log1p(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the log base-e of each element in A plus one.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is equivalent to the result of std::log1p(a), where a is an element of A.

inline void log1pInPlace(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of log1p().

inline poplar::Tensor logicalNot(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the logical NOT of each element in A.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is equivalent to the result of !a, where a is an element of A.

inline void logicalNotInPlace(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of logicalNot().

inline poplar::Tensor neg(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the negation of each element in A.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is equivalent to the result of -1 * a, where a is an element of A.

inline void negInPlace(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of neg().

inline poplar::Tensor popcount(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the number of 1 bits in each element of A.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is equivalent to the result of std::popcount(a), where a is an element of A.

inline void popcountInPlace(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of popcount().

inline poplar::Tensor signum(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the signum of each element in A.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is one of -1, 0 or +1 if the corresponding element in A was less than, equal to or greater than 0 respectively.

inline void signumInPlace(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of signum().

inline poplar::Tensor sin(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the sine of each element in A.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is equivalent to the result of std::sin(a), where a is an element of A.

inline void sinInPlace(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of sin().

inline poplar::Tensor tan(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the tangent of each element in A.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is equivalent to the result of std::tan(a), where a is an element of A.

inline void tanInPlace(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of tan().

inline poplar::Tensor tanh(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the hyperbolic tangent of each element in A.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is equivalent to the result of std::tanh(a), where a is an element of A.

inline void tanhInPlace(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of tanh().

inline poplar::Tensor round(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Round each element in A.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is equivalent to the result of std::round(a), where a is an element of A.

inline void roundInPlace(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of round().

inline poplar::Tensor sqrt(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the square-root for each element in A.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is equivalent to the result of std::sqrt(a), where a is an element of A.

inline void sqrtInPlace(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of sqrt().

inline poplar::Tensor square(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the square for each element in A.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is equivalent to the result of a * a, where a is an element of A.

inline void squareInPlace(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of square().

inline poplar::Tensor sigmoid(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the sigmoid (logistic) function for each element in A.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is equivalent to the result of 1 / (1 + exp(-a)), where a is an element of A.

inline void sigmoidInPlace(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of sigmoid().

inline poplar::Tensor rsqrt(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the reciprocal square root for each element in A.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is equivalent to the result of 1 / sqrt(a), where a is an element of A.

inline void rsqrtInPlace(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of rsqrt().

inline poplar::Tensor isFinite(poplar::Graph &graph, const poplar::Tensor &A, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Check if each element in A is finite.

Parameters
  • graph – The graph to update.

  • A – A tensor of elements.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information

  • options – Element-wise options. See map().

Returns

A tensor where each element is equivalent to the result of std::isfinite(a), where a is an element of A.

inline poplar::Tensor select(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, const poplar::Tensor &C, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Populate the returned tensor with elements from A or B depending on the corresponding element of C.

That is, for each element in the output compute c ? a : b, where a, b and c are the corresponding elements in the tensors A, B, C respectively.

Parameters
  • graph – The graph to update.

  • A – One of the tensors containing the elements to select from.

  • B – One of the tensors containing the elements to select from.

  • C – The tensor containing the elements to use as predicates.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information.

  • options – Element-wise options. See map().

Returns

A tensor containing the elements from A where the corresponding elements in C were not equal to zero and containing the elements from B where the corresponding elements in C were zero.

inline void selectInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, const poplar::Tensor &C, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of select().
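
A usage sketch for select() (assuming graph, prog and three mapped tensors of the same shape: a and b of type FLOAT, and a predicate tensor pred, here assumed to be of type BOOL; names are illustrative):

// out[i] = pred[i] ? a[i] : b[i]
poplar::Tensor out = popops::select(graph, a, b, pred, prog, "select");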

inline poplar::Tensor clamp(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, const poplar::Tensor &C, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Populate the returned tensor with elements from A but clamp them such that each element is greater than or equal to the corresponding element in B and less than or equal to the corresponding element in C.

That is, for each element in the returned tensor compute: min(max(a, b), c), where a, b and c are the corresponding elements in the tensors A, B and C respectively.

Parameters
  • graph – The graph to update.

  • A – The tensor containing the elements to clamp.

  • B – The tensor containing the elements to use as minimums.

  • C – The tensor containing the elements to use as maximums.

  • prog – The sequence to extend with the execution of the expression evaluation.

  • debugContext – Optional debug information.

  • options – Element-wise options. See map().

Returns

A tensor containing the elements resulting from the application of the expression across the tensors.

inline void clampInPlace(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, const poplar::Tensor &C, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the input tensor with the result of clamp().
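
A usage sketch for clamp() (assuming graph, prog and three mapped FLOAT tensors of the same shape; lo and hi are illustrative names for the minimum and maximum tensors):

// out[i] = min(max(x[i], lo[i]), hi[i])
poplar::Tensor out = popops::clamp(graph, x, lo, hi, prog, "clamp");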

popops/ElementWiseUtil.hpp

Supporting functions for element-wise operations.

namespace popops

Common functions, such as elementwise and reductions.

Functions

poplar::Tensor createOutputForElementWiseOp(poplar::Graph &graph, const std::vector<poplar::Tensor> &inputs, const poplar::Type &outputType, const poplar::DebugContext &debugContext = {})

Create a tensor for use as the output of an element-wise operation (operation with no dependency between more than one element of the output and any given element of any input tensor).

Use the mapping of this tensor to map element-wise operations to tiles to produce an operation that is computationally balanced across tiles and which minimises exchange.

All input tensors must have the same shape.

Parameters
  • graph – A graph to add the tensor to and which the inputs belong to.

  • inputs – List of input tensors for the element-wise operation.

  • outputType – The element type of the tensor.

  • debugContext – Optional debug information.

Returns

A tensor with the same shape as the given inputs, with a complete tile mapping.
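
A usage sketch (assuming a graph and two mapped FLOAT tensors a and b of the same shape; the debug string is illustrative):

#include <popops/ElementWiseUtil.hpp>

// Create an output tensor whose tile mapping balances an element-wise
// operation over the inputs a and b.
poplar::Tensor out =
    popops::createOutputForElementWiseOp(graph, {a, b}, poplar::FLOAT, "out");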

popops/Encoding.hpp

Encoding and generating ranges of integers.

namespace popops

Common functions, such as elementwise and reductions.

Functions

void encodeOneHot(poplar::Graph &graph, const poplar::Tensor &indices, const poplar::Tensor &encoded, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Encode a given set of indices as a set of one-hot vectors per-index with a hot element at that index.

That is, given a one-dimensional indices tensor of length N and a two-dimensional encoded tensor of shape N * x, each row of encoded contains a single element equal to 1 with all other elements equal to 0. The position of the hot element in each row is given by the corresponding index in indices.

Parameters
  • graph – The graph to add the tensor and any vertices needed for the encoding to.

  • encoded – Tensor to encode output to.

  • indices – 1-dimensional tensor containing indices to encode as one-hot vectors. The code point MASKED_LABEL_CODE is reserved to indicate that no encoding is done for that index.

  • prog – Sequence which the programs that perform the encoding are added to.

  • debugContext – Optional debug information.

Throws
  • poputil::poplibs_error – If encoded is not two dimensional.

  • poputil::poplibs_error – If indices and encoded do not have the same number of rows.

  • poputil::poplibs_error – If elements of indices are not an integer type.
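
A usage sketch of this overload (assuming graph and prog, with N = 4 samples and 10 classes; the tensor names, shapes and element types are illustrative):

#include <popops/Encoding.hpp>
#include <poputil/TileMapping.hpp>

// 1-D tensor of N integer indices and an N x 10 tensor to hold the encoding.
poplar::Tensor indices = graph.addVariable(poplar::UNSIGNED_INT, {4}, "indices");
poplar::Tensor encoded = graph.addVariable(poplar::FLOAT, {4, 10}, "encoded");
poputil::mapTensorLinearly(graph, indices);
poputil::mapTensorLinearly(graph, encoded);

// Each row of encoded gets a 1 at the position given by the corresponding
// index, and 0 everywhere else.
popops::encodeOneHot(graph, indices, encoded, prog, "oneHot");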

void encodeOneHot(poplar::Graph &graph, const poplar::Tensor &indices, const poplar::Tensor &encoded, poplar::program::Sequence &prog, const poplar::Tensor &on, const poplar::Tensor &off, const poplar::DebugContext &debugContext = {})

Encode a given set of indices as a set of one-hot vectors per-index with a hot element at that index.

That is, given a one-dimensional indices tensor of length N and a two-dimensional encoded tensor of shape N * x, each row of encoded contains a single element equal to on with all other elements equal to off, as given by the user. The position of the hot element in each row is given by the corresponding index in indices.

Parameters
  • graph – The graph to add the tensor and any vertices needed for the encoding to.

  • encoded – Tensor to encode output to.

  • indices – 1-dimensional tensor containing indices to encode as one-hot vectors.

  • prog – Sequence which the programs that perform the encoding are added to.

  • debugContext – Optional debug information.

  • on – Value which represents the “On” state in the one hot encoded output.

  • off – Value which represents the “Off” state.

Throws
  • poputil::poplibs_error – If encoded is not two dimensional.

  • poputil::poplibs_error – If indices and encoded do not have the same number of rows.

  • poputil::poplibs_error – If elements of indices are not an integer type.

void iota(poplar::Graph &graph, const poplar::Tensor &t, unsigned startInteger, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Fill a tensor with a right-open range of unsigned integers: [startInteger, startInteger + length), where length is the number of elements in the mapped 1-D output tensor t.

The output tensor t must be of type UNSIGNED_INT.

Parameters
  • graph – The graph to add the tensor and any vertices needed for the operation.

  • t – 1-D tensor to write the encoded output to. The tensor must be mapped.

  • startInteger – The start value in the output range.

  • prog – Sequence which the programs that perform the encoding are added to.

  • debugContext – Optional debug information.

Throws
  • poputil::poplibs_error – If the rank of t is greater than 1.

  • poputil::poplibs_error – If the type of t is not UNSIGNED_INT.

void iota(poplar::Graph &graph, const poplar::Tensor &t, int startInteger, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Fill a tensor with a right-open range of signed integers: [startInteger, startInteger + length), where length is the number of elements in the mapped 1-D output tensor t.

The output tensor t must be of type INT.

Parameters
  • graph – The graph to add the tensor and any vertices needed for the operation.

  • t – 1-D tensor to write the encoded output to. The tensor must be mapped.

  • startInteger – The start value in the output range.

  • prog – Sequence which the programs that perform the encoding are added to.

  • debugContext – Optional debug information.

Throws
  • poputil::poplibs_error – If the rank of t is greater than 1.

  • poputil::poplibs_error – If the type of t is not INT.
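
A usage sketch of the unsigned overload (assuming graph and prog; the tensor name and length are illustrative):

#include <popops/Encoding.hpp>
#include <poputil/TileMapping.hpp>

// 1-D UNSIGNED_INT tensor to fill with the range [10, 10 + 8).
poplar::Tensor t = graph.addVariable(poplar::UNSIGNED_INT, {8}, "t");
poputil::mapTensorLinearly(graph, t);
popops::iota(graph, t, 10u, prog, "iota");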

popops/EncodingConstants.hpp

Constants used by encoding functions.

Unnamed Group

EPS_LOG_N_FLOAT

Small constant used in natural logarithm computation.

EPS_LOG_N_HALF

Defines

MASKED_LABEL_CODE

Code point for masked index (an index to be ignored).

popops/Expr.hpp

Expressions with elements of tensors.

Defines

POPLIBS_DEFINE_EXPR_UNARY_OP(Name, Op)
POPLIBS_DEFINE_EXPR_UNARY_OP_AND_SYMBOL(Name, Op, Sym)
POPLIBS_DEFINE_EXPR_BINARY_OP(Name, Op)
POPLIBS_DEFINE_EXPR_BINARY_OP_AND_SYMBOL(Name, Op, Sym)
POPLIBS_DEFINE_EXPR_TERNARY_OP(Name, Op)
namespace popops

Common functions, such as elementwise and reductions.

namespace expr

Functions

std::ostream &operator<<(std::ostream &os, const Expr &expr)
bool deepEquals(const Expr &a, const Expr &b)
inline ConstHalf operator""_half(long double x)
const PlaceHolder _1 (1)
const PlaceHolder _2 (2)
const PlaceHolder _3 (3)
const PlaceHolder _4 (4)
const PlaceHolder _5 (5)
const PlaceHolder _6 (6)
const PlaceHolder _7 (7)
const PlaceHolder _8 (8)
const PlaceHolder _9 (9)
const PlaceHolder _10 (10)
const PlaceHolder _11 (11)
const PlaceHolder _12 (12)
const PlaceHolder _13 (13)
const PlaceHolder _14 (14)
const PlaceHolder _15 (15)
const PlaceHolder _16 (16)
const PlaceHolder _17 (17)
const PlaceHolder _18 (18)
const PlaceHolder _19 (19)
const PlaceHolder _20 (20)
inline BitwiseNot operator~(const Expr &a)
inline Not operator!(const Expr &a)
inline Neg operator-(const Expr &a)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), Add>::type operator+(const T &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), Add>::type operator+(const Expr &a, const T &b)
inline Add operator+(const Expr &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), BitwiseAnd>::type operator&(const T &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), BitwiseAnd>::type operator&(const Expr &a, const T &b)
inline BitwiseAnd operator&(const Expr &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), BitwiseOr>::type operator|(const T &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), BitwiseOr>::type operator|(const Expr &a, const T &b)
inline BitwiseOr operator|(const Expr &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), BitwiseXor>::type operator^(const T &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), BitwiseXor>::type operator^(const Expr &a, const T &b)
inline BitwiseXor operator^(const Expr &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), Divide>::type operator/(const T &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), Divide>::type operator/(const Expr &a, const T &b)
inline Divide operator/(const Expr &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), Equal>::type operator==(const T &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), Equal>::type operator==(const Expr &a, const T &b)
inline Equal operator==(const Expr &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), Gte>::type operator>=(const T &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), Gte>::type operator>=(const Expr &a, const T &b)
inline Gte operator>=(const Expr &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), Gt>::type operator>(const T &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), Gt>::type operator>(const Expr &a, const T &b)
inline Gt operator>(const Expr &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), Lte>::type operator<=(const T &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), Lte>::type operator<=(const Expr &a, const T &b)
inline Lte operator<=(const Expr &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), And>::type operator&&(const T &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), And>::type operator&&(const Expr &a, const T &b)
inline And operator&&(const Expr &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), Or>::type operator||(const T &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), Or>::type operator||(const Expr &a, const T &b)
inline Or operator||(const Expr &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), Lt>::type operator<(const T &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), Lt>::type operator<(const Expr &a, const T &b)
inline Lt operator<(const Expr &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), Mul>::type operator*(const T &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), Mul>::type operator*(const Expr &a, const T &b)
inline Mul operator*(const Expr &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), NotEqual>::type operator!=(const T &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), NotEqual>::type operator!=(const Expr &a, const T &b)
inline NotEqual operator!=(const Expr &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), Rem>::type operator%(const T &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), Rem>::type operator%(const Expr &a, const T &b)
inline Rem operator%(const Expr &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), Shl>::type operator<<(const T &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), Shl>::type operator<<(const Expr &a, const T &b)
inline Shl operator<<(const Expr &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), Shr>::type operator>>(const T &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), Shr>::type operator>>(const Expr &a, const T &b)
inline Shr operator>>(const Expr &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), Sub>::type operator-(const T &a, const Expr &b)
template<typename T> inline std::enable_if<!std::is_base_of<Expr, T>::value && poplar::TypeTraits::isSimpleType<T>(), Sub>::type operator-(const Expr &a, const T &b)
inline Sub operator-(const Expr &a, const Expr &b)
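
These overloads mean expression trees can be written with ordinary operator syntax rather than explicit constructor calls. A brief illustrative sketch, using the placeholders defined above and the Any wrapper to hold the resulting expression:

using namespace popops::expr;

// Equivalent to Add(Mul(_1, _1), Const(3.0f)): square the first input and add 3.
Any e = _1 * _1 + 3.0f;

// Comparison operators build relational nodes, here an Equal expression.
Any isEqual = _1 == _2;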
class Abs : public popops::expr::UnaryOp

Public Functions

inline Abs(const Expr &a)
class Add : public popops::expr::BinaryOp

Public Functions

inline Add(const Expr &a, const Expr &b)
class And : public popops::expr::BinaryOp

Public Functions

inline And(const Expr &a, const Expr &b)
class Any
#include <Expr.hpp>

A class that can contain any expression, useful for building up expression trees dynamically where the type of the outermost expression may change.

Public Functions

inline Any(const Expr &expr)
inline operator Expr&()
inline operator const Expr&() const
inline std::string name(const std::vector<poplar::Tensor> &inputs) const

Private Members

std::unique_ptr<Expr> expr
class Asin : public popops::expr::UnaryOp

Public Functions

inline Asin(const Expr &a)
class Atan2 : public popops::expr::BinaryOp

Public Functions

inline Atan2(const Expr &a, const Expr &b)
class BinaryOp : public popops::expr::ExprType<BinaryOp>
#include <Expr.hpp>

A class to represent expressions with binary operators.

Subclassed by popops::expr::Add, popops::expr::And, popops::expr::Atan2, popops::expr::BitwiseAnd, popops::expr::BitwiseOr, popops::expr::BitwiseXnor, popops::expr::BitwiseXor, popops::expr::Divide, popops::expr::Equal, popops::expr::Gt, popops::expr::Gte, popops::expr::InvStdDevToVariance, popops::expr::Lt, popops::expr::Lte, popops::expr::Max, popops::expr::Min, popops::expr::Mul, popops::expr::NotEqual, popops::expr::Or, popops::expr::Pow, popops::expr::Rem, popops::expr::Shl, popops::expr::Shr, popops::expr::ShrSE, popops::expr::Sub, popops::expr::VarianceToInvStdDev

Public Functions

inline BinaryOp(BinaryOpType type, const Expr &a, const Expr &b)
inline BinaryOpType getOpType() const
inline const Expr &getLHS() const
inline const Expr &getRHS() const
inline virtual std::unique_ptr<Expr> clone() const override
virtual std::string name(const std::vector<poplar::Tensor> &inputs) const override
inline std::string exprName(const std::vector<poplar::Tensor> &inputs) const
virtual bool deepEquals(const Expr &other) const override
virtual void print(std::ostream &os, unsigned indent = 0, bool prettyPrint = true) const override

Private Members

BinaryOpType type
std::unique_ptr<Expr> a
std::unique_ptr<Expr> b
class BitwiseAnd : public popops::expr::BinaryOp

Public Functions

inline BitwiseAnd(const Expr &a, const Expr &b)
class BitwiseNot : public popops::expr::UnaryOp

Public Functions

inline BitwiseNot(const Expr &a)
class BitwiseOr : public popops::expr::BinaryOp

Public Functions

inline BitwiseOr(const Expr &a, const Expr &b)
class BitwiseXnor : public popops::expr::BinaryOp

Public Functions

inline BitwiseXnor(const Expr &a, const Expr &b)
class BitwiseXor : public popops::expr::BinaryOp

Public Functions

inline BitwiseXor(const Expr &a, const Expr &b)
class Cast : public popops::expr::ExprType<Cast>
#include <Expr.hpp>

A class to represent cast expressions.

Public Functions

inline Cast(const Expr &a_, const poplar::Type bType_)
inline const Expr &getLHS() const
inline const poplar::Type &getRHSType() const
inline virtual std::unique_ptr<Expr> clone() const override
virtual std::string name(const std::vector<poplar::Tensor> &inputs) const override
virtual bool deepEquals(const Expr &other) const override
virtual void print(std::ostream &os, unsigned indent = 0, bool prettyPrint = true) const override

Private Members

std::unique_ptr<Expr> a
poplar::Type bType
class Cbrt : public popops::expr::UnaryOp

Public Functions

inline Cbrt(const Expr &a)
class Ceil : public popops::expr::UnaryOp

Public Functions

inline Ceil(const Expr &a)
class Clamp : public popops::expr::TernaryOp

Public Functions

inline Clamp(const Expr &a, const Expr &b, const Expr &c)
class Const : public popops::expr::ExprType<Const>
#include <Expr.hpp>

A class to represent constant expressions.

Subclassed by popops::expr::ConstHalf

Public Functions

template<typename T>
inline Const(T x)
inline Const(poplar::TypeTraits typeTraits_, poplar::Type type_, const char *data_)
inline char *getData() const
inline const poplar::TypeTraits &getTypeTraits() const
inline const poplar::Type &getType() const
std::string printValue() const
double getDataAsDouble() const
std::uint64_t getDataForUnsignedIntegral() const
inline virtual std::unique_ptr<Expr> clone() const override
virtual std::string name(const std::vector<poplar::Tensor>&) const override
virtual bool deepEquals(const Expr &other) const override
virtual void print(std::ostream &os, unsigned indent = 0, bool prettyPrint = true) const override

Protected Functions

template<typename T>
inline Const(T x, bool isHalfType)

Private Members

poplar::TypeTraits typeTraits
poplar::Type type
std::unique_ptr<char[]> data
class ConstHalf : public popops::expr::Const
#include <Expr.hpp>

A class to represent constant expressions of type half.

Public Functions

inline ConstHalf(float x)
class Cos : public popops::expr::UnaryOp

Public Functions

inline Cos(const Expr &a)
class Divide : public popops::expr::BinaryOp

Public Functions

inline Divide(const Expr &a, const Expr &b)
class Equal : public popops::expr::BinaryOp

Public Functions

inline Equal(const Expr &a, const Expr &b)
class Erf : public popops::expr::UnaryOp

Public Functions

inline Erf(const Expr &a)
class Exp : public popops::expr::UnaryOp

Public Functions

inline Exp(const Expr &a)
class Expm1 : public popops::expr::UnaryOp

Public Functions

inline Expm1(const Expr &a)
class Expr
#include <Expr.hpp>

Type to represent element expressions.

This class represents an expression that can be applied to elements of tensors.

The Expr type is an abstract type which can be instantiated by its sub-classes to build up expressions, for example: Tanh(Add(Square(_1), Const(3))).

Expressions can be applied to tensors with the popops::map() and popops::mapInPlace() functions.
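
For example, a minimal sketch of applying an expression to a tensor (assuming a graph with the popops codelets added, a mapped FLOAT tensor t, and a program sequence prog; map() is declared in popops/ElementWise.hpp):

#include <popops/ElementWise.hpp>
#include <popops/Expr.hpp>

using namespace popops::expr;

// r[i] = tanh(t[i] * t[i] + 3)
poplar::Tensor r =
    popops::map(graph, Tanh(Add(Square(_1), Const(3.0f))), {t}, prog, "expr");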

Subclassed by popops::expr::ExprType< BinaryOp >, popops::expr::ExprType< Cast >, popops::expr::ExprType< Const >, popops::expr::ExprType< PlaceHolder >, popops::expr::ExprType< TernaryOp >, popops::expr::ExprType< UnaryOp >, popops::expr::ExprType< T >

Public Functions

virtual ~Expr()
template<class T>
inline bool isA() const
template<class T>
inline T *getAs()
template<class T>
inline const T *getAs() const
virtual std::unique_ptr<Expr> clone() const = 0
virtual std::string name(const std::vector<poplar::Tensor>&) const = 0
virtual bool deepEquals(const Expr &other) const = 0
virtual void print(std::ostream &os, unsigned indent = 0, bool prettyPrint = true) const = 0

Protected Types

using ExprClassID = void (*)(void)

Protected Functions

inline Expr(ExprClassID classId)

Protected Attributes

ExprClassID classId
template<class T>
class ExprType : public popops::expr::Expr

Subclassed by popops::expr::BinaryOp, popops::expr::Cast, popops::expr::Const, popops::expr::PlaceHolder, popops::expr::TernaryOp, popops::expr::UnaryOp

Public Functions

inline ExprType()

Private Static Functions

static void loc()
static inline ExprClassID getClassId()

Friends

friend class Expr
class Floor : public popops::expr::UnaryOp

Public Functions

inline Floor(const Expr &a)
class Gt : public popops::expr::BinaryOp

Public Functions

inline Gt(const Expr &a, const Expr &b)
class Gte : public popops::expr::BinaryOp

Public Functions

inline Gte(const Expr &a, const Expr &b)
class Inv : public popops::expr::UnaryOp

Public Functions

inline Inv(const Expr &a)
class InvStdDevToVariance : public popops::expr::BinaryOp

Public Functions

inline InvStdDevToVariance(const Expr &a, const Expr &b)
class IsFinite : public popops::expr::UnaryOp

Public Functions

inline IsFinite(const Expr &a)
class IsInf : public popops::expr::UnaryOp

Public Functions

inline IsInf(const Expr &a)
class IsNaN : public popops::expr::UnaryOp

Public Functions

inline IsNaN(const Expr &a)
class Log : public popops::expr::UnaryOp

Public Functions

inline Log(const Expr &a)
class Log1p : public popops::expr::UnaryOp

Public Functions

inline Log1p(const Expr &a)
class Lt : public popops::expr::BinaryOp

Public Functions

inline Lt(const Expr &a, const Expr &b)
class Lte : public popops::expr::BinaryOp

Public Functions

inline Lte(const Expr &a, const Expr &b)
class Max : public popops::expr::BinaryOp

Public Functions

inline Max(const Expr &a, const Expr &b)
class Min : public popops::expr::BinaryOp

Public Functions

inline Min(const Expr &a, const Expr &b)
class Mul : public popops::expr::BinaryOp

Public Functions

inline Mul(const Expr &a, const Expr &b)
class Neg : public popops::expr::UnaryOp

Public Functions

inline Neg(const Expr &a)
class Not : public popops::expr::UnaryOp

Public Functions

inline Not(const Expr &a)
class NotEqual : public popops::expr::BinaryOp

Public Functions

inline NotEqual(const Expr &a, const Expr &b)
class Or : public popops::expr::BinaryOp

Public Functions

inline Or(const Expr &a, const Expr &b)
class PlaceHolder : public popops::expr::ExprType<PlaceHolder>

Public Functions

inline PlaceHolder(unsigned index)
inline unsigned getIndex() const
inline std::unique_ptr<Expr> clone() const override
std::string name(const std::vector<poplar::Tensor> &inputs) const override
bool deepEquals(const Expr &other) const override
void print(std::ostream &os, unsigned indent = 0, bool prettyPrint = true) const override

Private Members

unsigned index
class Pow : public popops::expr::BinaryOp

Public Functions

inline Pow(const Expr &a, const Expr &b)
class Rem : public popops::expr::BinaryOp

Public Functions

inline Rem(const Expr &a, const Expr &b)
class Round : public popops::expr::UnaryOp

Public Functions

inline Round(const Expr &a)
class Rsqrt : public popops::expr::UnaryOp

Public Functions

inline Rsqrt(const Expr &a)
class Select : public popops::expr::TernaryOp

Public Functions

inline Select(const Expr &a, const Expr &b, const Expr &c)
class Shl : public popops::expr::BinaryOp

Public Functions

inline Shl(const Expr &a, const Expr &b)
class Shr : public popops::expr::BinaryOp

Public Functions

inline Shr(const Expr &a, const Expr &b)
class ShrSE : public popops::expr::BinaryOp

Public Functions

inline ShrSE(const Expr &a, const Expr &b)
class Sigmoid : public popops::expr::UnaryOp

Public Functions

inline Sigmoid(const Expr &a)
class Signum : public popops::expr::UnaryOp

Public Functions

inline Signum(const Expr &a)
class Sin : public popops::expr::UnaryOp

Public Functions

inline Sin(const Expr &a)
class Sqrt : public popops::expr::UnaryOp

Public Functions

inline Sqrt(const Expr &a)
class Square : public popops::expr::UnaryOp

Public Functions

inline Square(const Expr &a)
class Sub : public popops::expr::BinaryOp

Public Functions

inline Sub(const Expr &a, const Expr &b)
class Tan : public popops::expr::UnaryOp

Public Functions

inline Tan(const Expr &a)
class Tanh : public popops::expr::UnaryOp

Public Functions

inline Tanh(const Expr &a)
class TernaryOp : public popops::expr::ExprType<TernaryOp>
#include <Expr.hpp>

A class to represent expressions with ternary operators.

Subclassed by popops::expr::Clamp, popops::expr::Select

Public Functions

inline TernaryOp(TernaryOpType type, const Expr &a, const Expr &b, const Expr &c)
inline TernaryOpType getOpType() const
inline const Expr &getArg0() const
inline const Expr &getArg1() const
inline const Expr &getArg2() const
inline virtual std::unique_ptr<Expr> clone() const override
virtual std::string name(const std::vector<poplar::Tensor> &inputs) const override
inline std::string exprName(const std::vector<poplar::Tensor> &inputs) const
virtual bool deepEquals(const Expr &other) const override
virtual void print(std::ostream &os, unsigned indent = 0, bool prettyPrint = true) const override

Private Members

TernaryOpType type
std::unique_ptr<Expr> a
std::unique_ptr<Expr> b
std::unique_ptr<Expr> c
class UnaryOp : public popops::expr::ExprType<UnaryOp>
#include <Expr.hpp>

A class to represent expressions with unary operators.

Subclassed by popops::expr::Abs, popops::expr::Asin, popops::expr::BitwiseNot, popops::expr::Cbrt, popops::expr::Ceil, popops::expr::Cos, popops::expr::Erf, popops::expr::Exp, popops::expr::Expm1, popops::expr::Floor, popops::expr::Inv, popops::expr::IsFinite, popops::expr::IsInf, popops::expr::IsNaN, popops::expr::Log, popops::expr::Log1p, popops::expr::Neg, popops::expr::Not, popops::expr::Round, popops::expr::Rsqrt, popops::expr::Sigmoid, popops::expr::Signum, popops::expr::Sin, popops::expr::Sqrt, popops::expr::Square, popops::expr::Tan, popops::expr::Tanh

Public Functions

inline UnaryOp(UnaryOpType type, const Expr &a)
inline UnaryOpType getOpType() const
inline const Expr &getArg() const
inline virtual std::unique_ptr<Expr> clone() const override
virtual std::string name(const std::vector<poplar::Tensor> &inputs) const override
inline std::string exprName(const std::vector<poplar::Tensor> &inputs) const
virtual bool deepEquals(const Expr &other) const override
virtual void print(std::ostream &os, unsigned indent = 0, bool prettyPrint = true) const override

Private Members

UnaryOpType type
std::unique_ptr<Expr> a
class VarianceToInvStdDev : public popops::expr::BinaryOp

Public Functions

inline VarianceToInvStdDev(const Expr &a, const Expr &b)

popops/ExprOp.hpp

Operators used in expressions with elements of tensors.

namespace popops

Common functions, such as elementwise and reductions.

namespace expr

Unnamed Group

enum TernaryOpType

Enumeration defining operators used by Expr for building expressions.

Values:

enumerator CLAMP
enumerator SELECT
enum BinaryOpType

Values:

enumerator ADD
enumerator ATAN2
enumerator BITWISE_AND
enumerator BITWISE_OR
enumerator BITWISE_XOR
enumerator BITWISE_XNOR
enumerator DIVIDE
enumerator EQUAL
enumerator GREATER_THAN_EQUAL
enumerator GREATER_THAN
enumerator INV_STD_DEV_TO_VARIANCE
enumerator LESS_THAN_EQUAL
enumerator LOGICAL_AND
enumerator LOGICAL_OR
enumerator LESS_THAN
enumerator MAXIMUM
enumerator MINIMUM
enumerator MULTIPLY
enumerator NOT_EQUAL
enumerator POWER
enumerator REMAINDER
enumerator SHIFT_LEFT
enumerator SHIFT_RIGHT
enumerator SHIFT_RIGHT_SIGN_EXTEND
enumerator SUBTRACT
enumerator VARIANCE_TO_INV_STD_DEV
enum UnaryOpType

Values:

enumerator ABSOLUTE
enumerator ASIN
enumerator BITWISE_NOT
enumerator CBRT
enumerator CEIL
enumerator COS
enumerator COUNT_LEADING_ZEROS
enumerator ERF
enumerator EXPONENT
enumerator EXPONENT_MINUS_ONE
enumerator FLOOR
enumerator INVERSE
enumerator IS_FINITE
enumerator IS_INF
enumerator IS_NAN
enumerator LOGARITHM
enumerator LOGARITHM_ONE_PLUS
enumerator LOGICAL_NOT
enumerator NEGATE
enumerator POPCOUNT
enumerator RELU
enumerator SIGNUM
enumerator SIN
enumerator TAN
enumerator TANH
enumerator ROUND
enumerator SQRT
enumerator SQUARE
enumerator SIGMOID
enumerator RSQRT

popops/Fill.hpp

Functions to fill tensors with values.

Supported fillValue types are FLOAT, HALF, INT and UNSIGNED_INT.

namespace popops

Common functions, such as elementwise and reductions.

Functions

template<typename FillValueType>
void fill(poplar::Graph &graph, poplar::Tensor t, const std::vector<poplar::Interval> &tileRegions, unsigned tile, poplar::ComputeSet fillCS, FillValueType fillValue)

Appends vertices to fillCS that fill the elements of t in tileRegions which reside on tile tile.

Parameters
  • graph – The graph that the operation will be added to.

  • t – The tensor whose elements are to be filled.

  • tileRegions – Region mapping of the tensor on tile.

  • tile – Tile which the regions relate to.

  • fillCS – Compute set to add the operation into.

  • fillValue – The value to fill t with.

template<typename FillValueType>
void fill(poplar::Graph &graph, const poplar::Tensor &t, unsigned tile, poplar::ComputeSet fillCS, FillValueType fillValue)

Appends vertices to fillCS that fill all the elements of t which reside on tile tile.

Parameters
  • graph – The graph that the operation will be added to.

  • t – The tensor whose elements are to be filled.

  • tile – Tile to which the tensor is mapped.

  • fillCS – Compute set to add the operation into.

  • fillValue – The value to fill t with.

template<typename FillValueType>
void fill(poplar::Graph &graph, const poplar::Tensor &t, const std::vector<std::vector<poplar::Interval>> &mapping, poplar::ComputeSet fillCS, FillValueType fillValue)

Appends vertices to fillCS that fill the elements of t according to the per-tile region mapping given in mapping.

Parameters
  • graph – The graph that the operation will be added to.

  • t – The tensor whose elements are to be filled.

  • mapping – The tensor’s region mapping per tile. Each element describes a region mapping of a tile (ordered). That is, mapping[0] is the region of t mapped onto tile 0.

  • fillCS – Compute set to add the operation into.

  • fillValue – The value to fill t with.

template<typename FillValueType>
void fill(poplar::Graph &graph, const poplar::Tensor &t, poplar::program::Sequence &prog, FillValueType fillValue, const poplar::DebugContext &debugContext = {})

Appends programs to prog that fill all elements of the tensor t with the value fillValue.

Note

The type of fillValue must be compatible with the element type of t.

Parameters
  • graph – The graph that the operation will be added to.

  • t – The tensor whose elements are to be filled.

  • prog – Poplar program sequence to append the operation onto.

  • fillValue – The value to fill t with.

  • debugContext – Optional debug information.
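
A usage sketch of this program-level overload (assuming graph, prog and a mapped FLOAT tensor t; names are illustrative):

#include <popops/Fill.hpp>

// Fill every element of t with 1.5; the fill value type must be
// compatible with the element type of t.
popops::fill(graph, t, prog, 1.5f, "fill");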

popops/Gather.hpp

Support for gather operations.

namespace popops

Common functions, such as elementwise and reductions.

Functions

poplar::Tensor createGatherInput(poplar::Graph &graph, const poplar::Type &type, const std::vector<std::size_t> &operandShape, unsigned axis, GatherParams params = {}, const poplar::DebugContext &debugContext = {})

Create the input of the gather with only a single gather axis.

This is designed to spread the gather, and each dynamic slice within the gather, across the tiles evenly.

Parameters
  • graph – The Poplar graph.

  • type – The data type of the required tensor.

  • operandShape – The desired shape of the input.

  • axis – The axis that will be gathered on.

  • params – The same parameters as used by the gather().

  • debugContext – Optional debug information.

Returns

A tensor with the desired shape.

poplar::Tensor gather(poplar::Graph &graph, const poplar::Tensor &input, const poplar::Tensor &indices, unsigned axis, poplar::program::Sequence &prog, GatherParams params, const poplar::DebugContext &debugContext = {})

The gather operation stitches together several slices (each slice at a potentially different runtime offset) of an input tensor.

To achieve the best performance, the input tensor should be created with createGatherInput().

Note

The indices are treated as offsets along the chosen axis. At this offset a slice of depth 1 in the axis dimension is taken.

Parameters
  • graph – The Poplar graph.

  • input – The tensor we are gathering from of rank x.

  • indices – Tensor containing the indices of the slices we gather of rank y.

  • axis – The axis to gather on. The axis must be less than x.

  • prog – The program sequence to add this operation to.

  • params – Parameters for the form of the gather.

  • debugContext – Optional debug information.

Returns

The gathered slices from the input with rank y + (x - 1).
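
A usage sketch of this single-axis form (assuming graph and prog; the shapes, names and the UNSIGNED_INT index type are illustrative):

#include <popops/Gather.hpp>
#include <poputil/TileMapping.hpp>

// Create the gather input with a layout suited to gathering on axis 0.
poplar::Tensor table =
    popops::createGatherInput(graph, poplar::FLOAT, {100, 16}, /*axis=*/0,
                              popops::GatherParams{}, "table");

// Four offsets along axis 0.
poplar::Tensor idx = graph.addVariable(poplar::UNSIGNED_INT, {4}, "idx");
poputil::mapTensorLinearly(graph, idx);

// Result rank is y + (x - 1) = 1 + (2 - 1) = 2, so the shape is {4, 16}.
poplar::Tensor rows =
    popops::gather(graph, table, idx, /*axis=*/0, prog, popops::GatherParams{});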

poplar::Tensor createGatherInput(poplar::Graph &graph, const poplar::Type &type, const std::vector<std::size_t> &inputShape, const std::vector<std::size_t> &sliceSizes, std::vector<unsigned> startIndexMap, const poplar::DebugContext &debugContext = {})

Create the input of the gather given a start index map.

This is designed to spread the gather, and each dynamic slice within the gather, across the tiles evenly.

Parameters
  • graph – The Poplar graph.

  • type – The data type of the required tensor.

  • inputShape – The desired shape of the input.

  • sliceSizes – sliceSizes[i] is the bounds for the slice on dimension i.

  • startIndexMap – A map that describes how to map the indices in the indices tensor passed to gather() to legal indices into the input.

  • debugContext – Optional debug information.

Returns

A tensor with the desired shape.

poplar::Tensor gather(poplar::Graph &graph, const poplar::Tensor &input, const poplar::Tensor &indices, std::size_t indexVectorDim, const std::vector<std::size_t> &offsetDims, const std::vector<std::size_t> &sliceSizes, const std::vector<std::size_t> &collapsedSliceDims, const std::vector<unsigned> &startIndexMap, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

The gather operation stitches together several slices (each slice at a potentially different runtime offset) of an input tensor.

To achieve the best performance, the input tensor should be created with createGatherInput().

Example use where we want to take 2 elements from a given tensor:

// The runtime defined input tensor
input = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}; // shape = {3, 3}

// The runtime defined indices tensor containing the coords we want to
// extract
indices = {{1, 1}, {2, 1}}; // shape = {2, 2}

// We want to extract elems at [1, 1] and [2, 1] from the input
// To achieve this we need to define the other parameters correctly

// We want to treat the rows of indices as coords into the input tensor
indexVectorDim = 1;

// None of the output dims will correspond to any of the input dims
offsetDims = {};

// We will be taking 1x1 slices to pick single elements
sliceSizes = {1, 1};

// We will collapse both dims of the input slices
collapsedSliceDims = {0, 1};

// An identity mapping between the indices coords and the input dims
startIndexMap = {0, 1};

// Perform the desired gather
result = gather(input,
                indices,
                indexVectorDim,
                offsetDims,
                sliceSizes,
                collapsedSliceDims,
                startIndexMap); // result = {5, 8}; shape = {2}

Note

When indexVectorDim == indices.rank(), the indices are interpreted as scalar values.

Note

This is a near direct port of https://www.tensorflow.org/xla/operation_semantics#gather from tensorflow/compiler/xla/service/gather_expander.cc

Parameters
  • graph – The Poplar graph.

  • input – The tensor we are gathering from.

  • indices – Tensor containing the starting indices of the slices we gather.

  • indexVectorDim – The dimension in indices that “contains” the starting indices.

  • offsetDims – The set of dimensions in the output shape that offset into a tensor sliced from input.

  • sliceSizes – sliceSizes[i] is the bounds for the slice on dimension i.

  • collapsedSliceDims – The set of dimensions in each slice that are collapsed away. These dimensions must have size 1.

  • startIndexMap – A map that describes how to map indices in indices to legal indices into input.

  • prog – The program sequence to add this operation to.

  • debugContext – Optional debug information.

Returns

The gathered slices from the input.

struct GatherParams
#include <Gather.hpp>

Defines the parameters to a gather operation.

Public Functions

GatherParams() = default
inline GatherParams(std::size_t maxElementsPerTile_)

Public Members

std::size_t maxElementsPerTile = 65535

Suggested maximum number of elements to place on a tile.

This can be used to balance the gather across the IPUs.

popops/GatherStatistics.hpp

Functions to generate histograms of data.

namespace popops

Common functions, such as elementwise and reductions.

Functions

poplar::Tensor histogram(poplar::Graph &graph, const poplar::Tensor &input, const poplar::Tensor &levels, bool absoluteOfInput, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Gather a histogram representing the statistics of the input tensor.

Compare each element of input to each value in the levels tensor. Where levels[N-1] <= input and input < levels[N], the histogram entry for that range will be incremented by 1. The lowest and highest histogram entries are bounded only by levels[0] and levels[N-1], respectively. The function returns a histogram tensor with a size one greater than the size of the levels tensor.

Histogram options

  • useFloatArithmetic (true, false) [=false]

    If true, use float arithmetic internally and return a float result rather than an unsigned int result. This has the benefit of simplicity and speed, but integer accuracy is limited by the 32-bit float data format (integers > 16,777,216 are not all exactly represented).

Parameters
  • graph – The Poplar graph.

  • input – The input tensor on which to gather histogram statistics.

  • levels – The levels defining the comparisons to carry out in generating the histogram output.

  • absoluteOfInput – If true, the absolute value of each input is calculated before comparison to the levels data.

  • prog – A sequence program to which the code performing the histogram will be appended.

  • debugContext – Optional debug information.

  • options – A list of options to control the operation of the histogram function.

Returns

A tensor of type unsigned int containing the (number of levels + 1) histogram results. If the option useFloatArithmetic is “true”, the returned tensor will have type float.
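
A minimal sketch (assuming popops codelets are added; the level values are illustrative) of gathering a four-entry histogram:

#include <poplar/ArrayRef.hpp>
#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <popops/GatherStatistics.hpp>
#include <vector>

// Build a histogram of 'input' using three levels, giving four counts.
poplar::Tensor buildHistogram(poplar::Graph &graph, const poplar::Tensor &input,
                              poplar::program::Sequence &prog) {
  std::vector<float> levelValues = {0.1f, 0.5f, 1.0f};
  poplar::Tensor levels = graph.addConstant(
      poplar::FLOAT, {3}, poplar::ArrayRef<float>(levelValues));
  graph.setTileMapping(levels, 0);
  // absoluteOfInput = false: compare the raw values against the levels.
  return popops::histogram(graph, input, levels, /*absoluteOfInput=*/false, prog);
}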

void histogram(poplar::Graph &graph, const poplar::Tensor &input, poplar::Tensor &output, bool updateOutput, const poplar::Tensor &levels, bool absoluteOfInput, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Fill a tensor with a histogram representing the statistics of the input tensor.

Performs the same function as histogram() but writes the output to output. This must be one element larger than the levels tensor and have elements of type float or unsigned integer. The type of the output tensor will determine the type of arithmetic used internally, as described above.

This function allows histogram results to be accumulated over a number of calls using the updateOutput parameter.

Parameters
  • graph – The Poplar graph.

  • input – The input tensor on which to gather histogram statistics.

  • output – The output tensor which will store the histogram results.

  • updateOutput – If true, the histogram counts will be added to the values already in output.

  • levels – The levels defining the comparisons to carry out in generating the histogram output.

  • absoluteOfInput – If true, the absolute value of each input is calculated before comparison to the levels data.

  • prog – A sequence program to which the code performing the histogram will be appended.

  • debugContext – Optional debug information.

popops/HostSliceTensor.hpp

Create tensor layouts that are optimised for host transfers.

namespace poplar

Poplar classes and functions.

namespace popops

Common functions, such as elementwise and reductions.

Functions

IndicesAndTensor createHostSliceableTensor(poplar::Graph &graph, const poplar::Type &type, const std::vector<size_t> &shape, const bool isRead, const poplar::DebugContext &debugContext = {})

Create a Tensor that is well laid out for a host exchange copy and at the same time create the index tensor for the copy.

The shape must have rank 2. dim(1) must be the size of the data stream or remote buffer. If copying from a remote buffer with multiple slice indices, dim(0) must be the number of slice indices; otherwise dim(0) is 1.

Parameters
  • graph – The Poplar graph to add the tensor to.

  • type – The element type of the tensor created.

  • shape – The shape of the created tensor.

  • isRead – If true, the tensor will be read by the host. If false, the tensor data will be written to the host. If isRead is true, tile imbalance is likely to be greater.

Returns

Two tensors: the indices, which will have size shape[0] and the tensor that will be written to.

poplar::Tensor createHostTransferableTensor(poplar::Graph &graph, const poplar::Type &type, const std::vector<size_t> &shape, bool isRead, const poplar::DebugContext &debugContext = {})

Create a tensor that is well laid out for a host exchange copy.

Parameters
  • graph – The graph to add the tensor to.

  • type – The element type of the tensor created.

  • shape – The shape of the tensor created.

  • isRead – If true, the tensor will be read by the host. If false, the tensor data will be written to the host. Setting isRead to true is likely to make the read operation faster without affecting the write, but is also likely to cause greater tile imbalance.

Returns

The tensor created.

struct IndicesAndTensor
#include <HostSliceTensor.hpp>

The pair of values returned by createHostSliceableTensor().

Public Members

poplar::Tensor indices
poplar::Tensor tensor

popops/Loop.hpp

The loop functions provide a more flexible interface to the Poplar Repeat* classes. They allow you to specify the start, end and increment values of the count, and also make the count variable available to the program in the body of the loop.

Functions to provide counted loops of programs.

namespace popops

Common functions, such as elementwise and reductions.

Typedefs

using CountedLoopBodyType = std::function<poplar::program::Program(const poplar::Tensor&)>

Functions

poplar::program::Sequence countedLoop(poplar::Graph &graph, std::size_t begin, std::size_t end, size_t step, const CountedLoopBodyType &body, const poplar::DebugContext &debugContext = {})

Create a loop program with constant initial count, increment and end value.

The loop count is passed to the body program.

The program is equivalent to:

for(unsigned i = begin; i != end; i += step){
  body;
}

Parameters
  • graph – The graph the loop program will be added to.

  • begin – Initial counter value.

  • end – Counter end value (exclusive).

  • step – The increment added on each loop pass (must be greater than zero).

  • body – The loop body program to run on each loop pass.

  • debugContext – Optional debug information.

Returns

A program providing the above loop function.
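
A sketch of typical usage (the body shown here simply prints the counter; a real body would normally use it, for example as a dynamic slice offset):

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <popops/Loop.hpp>

// Build a loop that runs for i = 0, 2, 4, 6, 8.
poplar::program::Sequence buildLoop(poplar::Graph &graph) {
  return popops::countedLoop(
      graph, /*begin=*/0, /*end=*/10, /*step=*/2,
      [&](const poplar::Tensor &i) {
        // 'i' holds the current loop count and is available to the body.
        poplar::program::Sequence body;
        body.add(poplar::program::PrintTensor("i", i));
        return body;
      });
}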

poplar::program::Sequence countedLoop(poplar::Graph &graph, std::size_t count, const CountedLoopBodyType &body, const poplar::DebugContext &debugContext = {})

Create a loop program which executes count times.

The loop count is passed to the body program.

The program is equivalent to:

for(unsigned i = 0; i != count; i += 1){
  body;
}

This is equivalent to poplar::program::Repeat but with a loop counter that is passed to the body program. (It is actually implemented using poplar::program::RepeatWhileTrue with a test for the count variable reaching count.)

Parameters
  • graph – The graph the loop program will be added to.

  • count – Number of loop iterations to execute.

  • body – The loop body program to run on each loop pass.

  • debugContext – Optional debug information.

Returns

A program providing the above loop function.

poplar::Tensor addForLoopCounterVertex(poplar::Graph &graph, const poplar::Tensor &count, const poplar::Tensor &countLimit, int countStep, unsigned tile, poplar::program::Sequence &prog, const poplar::DebugContext &di)
poplar::program::Sequence countedForLoop(poplar::Graph &graph, const poplar::Tensor &count, int initialCount, const poplar::Tensor &countLimit, int countStep, const poplar::program::Program &body, const poplar::DebugContext &debugContext = {})

Create a for-loop program with constant initial count and increment, and a tensor as the end value.

The use of a tensor as the loop end value means that the number of iterations can be calculated at run time. The loop count variable count is provided by the program that calls the loop program so it can be passed to the body program.

The program is equivalent to:

for(unsigned count = initialCount; count != countLimit; count += countStep){
  body;
}

Parameters
  • graph – The graph the loop program will be added to.

  • count – The loop count tensor, available to the body program with element type INT or UNSIGNED_INT. Value initialised by this function.

  • initialCount – Initial counter value.

  • countLimit – Count limit tensor.

  • countStep – The increment added to the count tensor on each loop pass.

  • body – The loop body program to run on each loop pass.

  • debugContext – Optional debug information.

Returns

A program providing the above loop function.

poplar::program::Sequence countedForLoop(poplar::Graph &graph, int initialCount, const poplar::Tensor &countLimit, int countStep, const poplar::program::Program &body, const poplar::DebugContext &debugContext = {})

Create a for loop program with constant initial count and increment and a tensor as the end value.

The use of a tensor as the loop end value means that the number of iterations can be calculated at run time. The count tensor is created internally and is not available to the body program.

The program is equivalent to:

for(unsigned count = initialCount; count != countLimit; count += countStep){
  body;
}

Parameters
  • graph – The graph the loop program will be added to.

  • initialCount – Initial counter value.

  • countLimit – Count limit tensor.

  • countStep – The increment added to the count tensor on each loop pass.

  • body – The loop body program to run on each loop pass.

  • debugContext – Optional debug information.

Returns

A program providing the above loop function.

popops/NaN.hpp

Test for NaN values in a tensor.

namespace popops

Common functions, such as elementwise and reductions.

Functions

poplar::Tensor hasNaN(poplar::Graph &graph, const poplar::Tensor &src, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Test for NaN values in a tensor.

Takes a tensor of any shape and type float or half and returns a new scalar bool tensor whose only element is true if any of the elements of the src tensor contained a NaN.

Parameters
  • graph – The graph to add the tensor and any vertices to.

  • src – The input tensor, the type must be floating point.

  • prog – Sequence to add programs to, which perform the check.

  • debugContext – Optional debug information.
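
For instance, a brief sketch (assuming popops codelets are added) that appends the check and prints the resulting flag:

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <popops/NaN.hpp>

// Append a NaN check on 'activations' to 'prog' and print the result.
void checkForNaN(poplar::Graph &graph, const poplar::Tensor &activations,
                 poplar::program::Sequence &prog) {
  poplar::Tensor flag = popops::hasNaN(graph, activations, prog);
  prog.add(poplar::program::PrintTensor("hasNaN", flag));
}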

poplar::Tensor hasNaNOrInf(poplar::Graph &graph, const poplar::Tensor &src, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Test for NaN or Inf values in a tensor.

Takes a tensor of any shape and type float or half and returns a new scalar bool tensor whose only element is true if any of the elements of the src tensor contained a NaN or an Inf.

Parameters
  • graph – The graph to add the tensor and any vertices to.

  • src – The input tensor, the type must be floating point.

  • prog – Sequence to add programs to, which perform the check.

  • debugContext – Optional debug information.

popops/NormaliseImage.hpp

Functions for padding and normalising image tensors.

namespace popops

Common functions, such as elementwise and reductions.

Functions

poplar::Tensor createNormaliseImageInput(poplar::Graph &graph, const poplar::Type &type, const poplar::ArrayRef<std::size_t> shape, const poplar::DebugContext &debugContext = {})

Add a tensor for a 3-channel image suitable for padding to 4 channels.

Parameters
  • graph – The graph to which the tensor will be added.

  • type – The type of the elements. Must be UNSIGNED_CHAR, HALF or FLOAT.

  • shape – Required tensor shape. Must have an inner dimension of three.

  • debugContext – Debugging context.

poplar::Tensor normaliseImage(poplar::Graph &graph, poplar::program::Sequence &seq, poplar::Tensor tIn, float inScale, poplar::Tensor offsets, poplar::Tensor scales, const poplar::DebugContext &debugContext = {})

Pad a tensor to have 4 channel dimensions.

Each channel is normalised via:

 (tIn[c] * inScale - offsets[c]) * scales[c]

UINT8 inputs are cast to HALF. Otherwise the output tensor follows the input type.

Parameters
  • graph – The graph containing the tensor.

  • seq – The sequence to which the normalisation programs will be added.

  • tIn – Input image. It must have an inner dimension of 3 and be UNSIGNED_CHAR, HALF or FLOAT.

  • inScale – Input scaling.

  • offsets – Offset for each channel. Must be shape {3} and must match the output type.

  • scales – Scaling factor for each channel. Must be shape {3} and must match the output type.

  • debugContext – Debugging context.
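
As a sketch of the intended flow (assuming popops codelets are added; the shapes and scaling values are illustrative):

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <popops/NormaliseImage.hpp>

// Create a {N, H, W, 3} UNSIGNED_CHAR image input, then normalise it and
// pad it to 4 channels. UNSIGNED_CHAR input produces a HALF output, so the
// per-channel offsets and scales must also be HALF.
poplar::Tensor buildNormalise(poplar::Graph &graph,
                              poplar::program::Sequence &seq) {
  poplar::Tensor img = popops::createNormaliseImageInput(
      graph, poplar::UNSIGNED_CHAR, {8, 224, 224, 3});
  poplar::Tensor offsets = graph.addVariable(poplar::HALF, {3}, "offsets");
  poplar::Tensor scales = graph.addVariable(poplar::HALF, {3}, "scales");
  graph.setTileMapping(offsets, 0);
  graph.setTileMapping(scales, 0);
  // inScale = 1/255 maps 8-bit pixel values into [0, 1] before offset/scale.
  return popops::normaliseImage(graph, seq, img, 1.0f / 255.0f, offsets, scales);
}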

popops/Operation.hpp

Define types of operations used in a reduce.

namespace popops

Common functions, such as elementwise and reductions.

Functions

std::istream &operator>>(std::istream &is, Operation &op)

Parse a token from the input stream is into the operation op.

Valid input values are the stringified enumerations, for example “ADD” or “MUL”.

Returns

The original input stream.

std::ostream &operator<<(std::ostream &os, const Operation &op)

Write the operation op to the output stream os.

The value written is the stringified enumeration, for example “ADD” or “MUL”.

Returns

The original output stream.
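
A short sketch of using these operators to parse and print an Operation value:

#include <iostream>
#include <sstream>
#include <popops/Operation.hpp>

int main() {
  popops::Operation op;
  std::istringstream in("ADD");
  in >> op;                 // parse the token "ADD"
  std::cout << op << "\n";  // write it back out as "ADD"
  return 0;
}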

popops/Pad.hpp

Functions for padding a tensor.

namespace popops

Common functions, such as elementwise and reductions.

Unnamed Group

poplar::Tensor pad(poplar::Graph &graph, const poplar::Tensor &t, const std::vector<std::ptrdiff_t> &paddingLower, const std::vector<std::ptrdiff_t> &paddingUpper, float val = 0.0f, padding::MappingMethod mappingMethod = padding::MappingMethod::ZERO)

Return a tensor with constant padding added.

Parameters
  • graph – The graph containing the tensor.

  • t – The tensor to pad.

  • paddingLower – A vector specifying the amount of padding to add at the start of each dimension. Negative padding truncates.

  • paddingUpper – A vector specifying the amount of padding to add at the end of each dimension. Negative padding truncates.

  • val – The input tensor will be padded with this value.

  • mappingMethod – The method that should be used to map added padding elements.

Returns

The tensor with padding added.

poplar::Tensor pad(poplar::Graph &graph, const poplar::Tensor &t, const std::vector<std::ptrdiff_t> &paddingLower, const std::vector<std::ptrdiff_t> &paddingUpper, int val, padding::MappingMethod mappingMethod = padding::MappingMethod::ZERO)
poplar::Tensor pad(poplar::Graph &graph, const poplar::Tensor &t, const std::vector<std::ptrdiff_t> &paddingLower, const std::vector<std::ptrdiff_t> &paddingUpper, const poplar::Tensor &val, padding::MappingMethod mappingMethod = padding::MappingMethod::ZERO)
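
For example, a minimal sketch of padding a 2D tensor with a single row and column of constant values on each side (the helper name is illustrative):

#include <poplar/Graph.hpp>
#include <popops/Pad.hpp>

// Pad a {H, W} tensor to {H + 2, W + 2} with zeros. EDGE mapping requires
// the input tensor to already have a complete tile mapping.
poplar::Tensor padBorder(poplar::Graph &graph, const poplar::Tensor &t) {
  return popops::pad(graph, t,
                     /*paddingLower=*/{1, 1},
                     /*paddingUpper=*/{1, 1},
                     /*val=*/0.0f,
                     popops::padding::MappingMethod::EDGE);
}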

Unnamed Group

poplar::Tensor pad(poplar::Graph &graph, const poplar::Tensor &t, std::ptrdiff_t paddingLower, std::ptrdiff_t paddingUpper, unsigned dim, float val = 0.0f, padding::MappingMethod mappingMethod = padding::MappingMethod::ZERO)

Return a tensor with constant padding added to one dimension.

Parameters
  • t – The tensor to pad.

  • paddingLower – The amount of padding to add at the start of the dimension. Negative padding truncates.

  • paddingUpper – The amount of padding to add at the end of the dimension. Negative padding truncates.

  • dim – The dimension to pad.

  • val – The input tensor will be padded with this value.

  • mappingMethod – The method that should be used to map added padding elements.

Returns

The tensor with padding added.

poplar::Tensor pad(poplar::Graph &graph, const poplar::Tensor &t, std::ptrdiff_t paddingLower, std::ptrdiff_t paddingUpper, unsigned dim, int val, padding::MappingMethod mappingMethod = padding::MappingMethod::ZERO)
poplar::Tensor pad(poplar::Graph &graph, const poplar::Tensor &t, std::ptrdiff_t paddingLower, std::ptrdiff_t paddingUpper, unsigned dim, const poplar::Tensor &val, padding::MappingMethod mappingMethod = padding::MappingMethod::ZERO)

Functions

poplar::Tensor pad(const poplar::Tensor &t, const std::vector<std::ptrdiff_t> &paddingLower, const std::vector<std::ptrdiff_t> &paddingUpper, padding::Type type)

Return a tensor with numpy-style padding added.

Parameters
  • t – The tensor to pad.

  • paddingLower – A vector specifying the amount of padding to add at the start of each dimension. Negative padding truncates.

  • paddingUpper – A vector specifying the amount of padding to add at the end of each dimension. Negative padding truncates.

  • type – The type of padding.

Returns

The tensor with padding added.

poplar::Tensor pad(const poplar::Tensor &t, std::ptrdiff_t paddingLower, std::ptrdiff_t paddingUpper, unsigned dim, padding::Type type)

Return a tensor with numpy-style padding added to one dimension.

Parameters
  • t – The tensor to pad.

  • paddingLower – The amount of padding to add at the start of the dimension. Negative padding truncates.

  • paddingUpper – The amount of padding to add at the end of the dimension. Negative padding truncates.

  • dim – The dimension to pad.

Returns

The tensor with padding added.

namespace padding

Enums

enum Type

Padding types as per numpy.pad.

Values:

enumerator EDGE

Also known as nearest-neighbour padding, each new pad element has its value set to that of the pre-padded element nearest to it.

Any such nearest neighbour lies on the edge of the pre-padded tensor, hence the name.

enumerator REFLECT

The tensor is reflected outwards.

Specifically, a new pad element takes the value of the element that lies the same distance from the pad element’s nearest edge element, but on the opposite side (inside the pre-padded tensor).

enum MappingMethod

Methods to map added padding elements to tiles.

Values:

enumerator NONE

Padding won’t be mapped.

enumerator ZERO

Set tile mapping of padding element to tile 0 for the graph.

enumerator EDGE

Set tile mapping of padding elements to match the nearest-neighbour element which lies on the edge of the tensor prior to padding.

Requires a non-empty tensor to be padded with a complete tile mapping.

popops/Rearrange.hpp

Operations to rearrange tensors on tiles.

namespace popops

Common functions, such as elementwise and reductions.

namespace rearrange

Functions

bool canUseFastTranspose(const poplar::Target &target, const poplar::Type &type, unsigned numRows, unsigned numColumns, unsigned numTranspositions)

Determine if a fast transposition codelet may be used, based on the given target, data type, number of rows and number of columns.

Parameters
  • target – The target the operation will be targeted at.

  • type – The data type of the tensor to transpose.

  • numRows – The no. of rows in each transposition to perform.

  • numColumns – The no. of columns in each transposition to perform.

Returns

A boolean indicating whether or not the fast transposition codelets can be targeted based on the given parameters.

void addTransposeVertices(poplar::Graph &graph, const poplar::ComputeSet &cs, const poplar::Type &dType, unsigned rows, unsigned cols, const poplar::Graph::TileToTensorMapping &mapping, std::function<std::pair<const poplar::Tensor, const poplar::Tensor>(size_t)> getInOut)

Transpose a set of matrices stored on multiple tiles.

This adds all the needed vertices on the graph.

Parameters
  • graph, cs – The graph and compute set to add the vertices to.

  • dType, rows, cols – The type and dimensions of the matrices to be transposed, the same for all of them.

  • mapping – A vector with ‘number of tiles’ elements, where each element is a vector of intervals indicating which matrices to be transposed are mapped (possibly partially) on each tile.

  • getInOut – A function: pair<Tensor, Tensor> getInOut(size_t index), which, given as input an index inside the intervals specified in ‘mapping’, returns a std::pair of Tensors (in, out) which are the input and output matrix for the ‘index’ transposition. The ‘in’ and ‘out’ return values are 2D matrices, but they must be flattened to a single dimension.

poplar::Tensor partialTranspose(poplar::Graph &graph, const poplar::Tensor &in, const poplar::ComputeSet &cs, const poplar::DebugContext &debugContext = {})

Transpose the innermost pair of dimensions of the specified tensor, writing the results to a new tensor.

This function assumes the order of the underlying storage matches the order of the elements in the tensor. This function is optimized for group sizes that are typical of the underlying memory layout of convolution activations / weights - it may be inefficient for other group sizes.

unsigned getMinimumRegroupGrainSize(const poplar::Type &type)

Get the smallest grouping we can transpose between for the given type using fast transposition codelets.

Parameters

type – The data type to be transposed.

Returns

The smallest size of grouping that can be efficiently transposed for the given type.

poplar::Tensor regroupTensor(poplar::Graph &graph, const poplar::Tensor &t, poplar::program::Sequence &copies, const poplar::ComputeSet &transposeCS, const poputil::GroupingInfo &from, const poputil::GroupingInfo &to, const poplar::DebugContext &debugContext = {})

Insert copies or other operations into the given programs/compute sets to transform the grouping found on the given tensor from from to to.

This is a no-op for a one-dimensional tensor.

Parameters
  • graph – The graph to add the operation to.

  • t – The tensor to regroup.

  • copies – A poplar sequence to add pre-arranging copies to.

  • transposeCS – A compute set that may or may not have vertices added to it to perform the regrouping operation.

  • from – A grouping that is applied to the given tensor t to rearrange from.

  • to – A grouping wanted on the returned tensor.

  • debugContext – Optional debug information.

Returns

A tensor with the contents of t but laid out such that it has the grouping specified in to.

poplar::Tensor regroupTensor(poplar::Graph &graph, const poplar::Tensor &t, std::vector<poplar::program::Copy> &copies, const poplar::ComputeSet &transposeCS, const poputil::GroupingInfo &from, const poputil::GroupingInfo &to, const poplar::DebugContext &debugContext = {})

Insert copies or other operations into the given programs/compute sets to transform the grouping found on the given tensor from from to to.

This is a no-op for a one-dimensional tensor.

Overload that takes a vector of Copy programs instead of a Sequence.

Parameters
  • graph – The graph to add the operation to.

  • t – The tensor to regroup.

  • copies – A vector to add pre-arranging copies to.

  • transposeCS – A compute set that may or may not have vertices added to it to perform the regrouping operation.

  • from – A grouping that is applied to the given tensor t to rearrange from.

  • to – A grouping wanted on the returned tensor.

  • debugContext – Optional debug information.

Returns

A tensor with the contents of t but laid out such that it has the grouping specified in to.

poplar::Tensor regroupIfBeneficial(poplar::Graph &graph, const poplar::Tensor &in, const poplar::Tensor &ref, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

If possible and runtime efficient, add an operation to rearrange the given tensor in memory such that the grouping of the resulting tensor matches that of the reference tensor, or a factor of that grouping if it balances memory usage across the target better.

Parameters
  • graph – The graph to add the operation to.

  • in – The tensor to maybe regroup.

  • ref – A reference tensor which will be introspected to find a grouping to apply to the returned tensor.

  • prog – A poplar sequence to add the regrouping operation to.

  • debugContext – Optional debug information.

Returns

A tensor with the contents of the given tensor, rearranged in memory to have a grouping matching ref.

poplar::Tensor regroupIfBeneficial(poplar::Graph &graph, const poplar::Tensor &in, const poplar::Tensor &ref, std::vector<poplar::program::Copy> &copies, poplar::ComputeSet transposeCS, const poplar::DebugContext &debugContext = {})

If possible and runtime efficient, add an operation to rearrange the given tensor in memory such that the grouping of the resulting tensor matches that of the reference tensor, or a factor of that grouping if it balances memory usage across the target better.

Overload that takes a vector of Copy programs instead of a Sequence.

Parameters
  • graph – The graph to add the operation to.

  • in – The tensor to maybe regroup.

  • ref – A reference tensor which will be introspected to find a grouping to apply to the returned tensor.

  • copies – A vector to add pre-arranging copies to.

  • debugContext – Optional debug information.

Returns

A tensor with the contents of the given tensor, rearranged in memory to have a grouping matching ref.

poplar::Tensor regroupIfBeneficial(poplar::Graph &graph, const poplar::Tensor &in, std::size_t preferredGrouping, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

If possible and runtime efficient, add an operation to rearrange the given tensor in memory such that the resulting tensor has a grouping in the innermost dimension equivalent to, or a factor of the given preferred grouping if it balances memory usage across the target better.

Parameters
  • graph – The graph to add the operation to.

  • in – The tensor to maybe regroup.

  • preferredGrouping – A size of grouping of the innermost dimension of the given tensor to regroup to.

  • prog – A poplar sequence to add the regrouping operation to.

  • debugContext – Optional debug information.

Returns

A tensor with the contents of the given tensor, rearranged in memory to have a grouping matching preferredGrouping.

popops/Reduce.hpp

Define types of operations used in a reduce.

namespace popops

Common functions, such as elementwise and reductions.

Unnamed Group

poplar::Tensor reduce(poplar::Graph &graph, const poplar::Tensor &in, const poplar::Type &outType, const std::vector<std::size_t> &dims, ReduceParams params, std::vector<poplar::ComputeSet> &css, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Apply a reduction operation to a tensor.

Deprecated:

The reduce overloads that expect a vector of compute sets are deprecated. Please use the reduceMany() function instead.

These are alternate forms that add their vertices to a vector of compute sets instead of a poplar::program::Sequence. The caller is expected to add each compute set to a poplar::program::Sequence (in a poplar::program::Execute) themselves, like this:

Sequence seq;
std::vector<ComputeSet> css;
auto A = reduce(..., css);
auto B = reduce(..., css);
for (const auto &cs : css) {
  seq.add(Execute(cs));
}

This allows you to do multiple reductions in parallel. Note that the reductions are not aware of each other, so it may be more efficient to concatenate tensors and do a single reduction instead if they have the same shape, operation, and input and output types.

scale and update are only valid with the ADD , SQUARE_ADD or LOG_ADD operations. LOG_ADD performs all arithmetic consistent with the input and output being log probabilities. In other words, the update is another log add operation and the scale is a log multiply operation.

Internally, this creates a new variable for the output then calls reduceWithOutput(). The type of the output will be outType.

The options parameter accepts the following:

  • accumType.interTile (float, half)

    The type to use for intermediate values between tiles.

  • accumType.inVertex (float, half)

    The type to use for intermediate values within a vertex.

If either of the above options are not set then the intermediate type will default to either the input tensor element type or float if the input is of type half and the reduction operation benefits from higher precision (for example, add).

The input and output types that are supported depend on the operation:

  • ADD, SQUARE_ADD, MUL: float->float, half->half, int->int, float->half, half->float

  • LOG_ADD : float->float, half->half, float->half, half->float

  • MAX, MIN: float->float, half->half, int->int

  • LOGICAL_AND, LOGICAL_OR: bool->bool

Parameters
  • graph – The graph to add the operation to.

  • in – The tensor to be reduced.

  • outType – The output type of the reduce operation.

  • dims – The dimensions to reduce in.

  • prog – The program sequence to add the operation to.

  • debugContext – Optional debug information.

poplar::Tensor reduce(poplar::Graph &graph, const poplar::Tensor &in, const std::vector<std::size_t> &dims, ReduceParams params, std::vector<poplar::ComputeSet> &css, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
void reduceWithOutput(poplar::Graph &graph, const poplar::Tensor &in, const poplar::Tensor &out, const std::vector<std::size_t> &dims, ReduceParams params, std::vector<poplar::ComputeSet> &css, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Functions

poplar::Tensor reduce(poplar::Graph &graph, const poplar::Tensor &in, const poplar::Type &outType, const std::vector<std::size_t> &dims, ReduceParams params, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Apply a reduction operation to a tensor.

scale and update are only valid with the ADD , SQUARE_ADD or LOG_ADD operations. LOG_ADD performs all arithmetic consistent with the input and output being log probabilities. In other words, the update is another log add operation and the scale is a log multiply operation.

Internally, this creates a new variable for the output then calls reduceWithOutput(). The type of the output will be outType.

The options parameter accepts the following:

  • accumType.interTile (float, half)

    The type to use for intermediate values between tiles.

  • accumType.inVertex (float, half)

    The type to use for intermediate values within a vertex.

If either of the above options are not set then the intermediate type will default to either the input tensor element type or float if the input is of type half and the reduction operation benefits from higher precision (for example, add).

The input and output types that are supported depend on the operation:

  • ADD, SQUARE_ADD, MUL: float->float, half->half, int->int, float->half, half->float

  • LOG_ADD : float->float, half->half, float->half, half->float

  • MAX, MIN: float->float, half->half, int->int

  • LOGICAL_AND, LOGICAL_OR: bool->bool

Parameters
  • graph – The graph to add the operation to.

  • in – The tensor to be reduced.

  • outType – The output type of the reduce operation.

  • dims – The dimensions to reduce in.

  • prog – The program sequence to add the operation to.

  • debugContext – Optional debug information.
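
A minimal sketch (assuming popops codelets are added) of an ADD reduction over the first dimension:

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <popops/Reduce.hpp>

// Reduce a {N, C} float tensor over dimension 0, producing a {C} tensor.
poplar::Tensor sumOverRows(poplar::Graph &graph, const poplar::Tensor &in,
                           poplar::program::Sequence &prog) {
  return popops::reduce(graph, in, poplar::FLOAT, /*dims=*/{0},
                        popops::ReduceParams(popops::Operation::ADD), prog);
}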

poplar::Tensor reduce(poplar::Graph &graph, const poplar::Tensor &in, const std::vector<std::size_t> &dims, ReduceParams params, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Apply a reduction operation to a tensor.

An alias for reduce(graph, in, in.elementType(), …)

scale and update are only valid with the ADD , SQUARE_ADD or LOG_ADD operations. LOG_ADD performs all arithmetic consistent with the input and output being log probabilities. In other words, the update is another log add operation and the scale is a log multiply operation.

Internally, this creates a new variable for the output then calls reduceWithOutput(). The type of the output will be outType.

The options parameter accepts the following:

  • accumType.interTile (float, half)

    The type to use for intermediate values between tiles.

  • accumType.inVertex (float, half)

    The type to use for intermediate values within a vertex.

If either of the above options are not set then the intermediate type will default to either the input tensor element type or float if the input is of type half and the reduction operation benefits from higher precision (for example, add).

The input and output types that are supported depend on the operation:

  • ADD, SQUARE_ADD, MUL: float->float, half->half, int->int, float->half, half->float

  • LOG_ADD : float->float, half->half, float->half, half->float

  • MAX, MIN: float->float, half->half, int->int

  • LOGICAL_AND, LOGICAL_OR: bool->bool

Parameters
  • graph – The graph to add the operation to.

  • in – The tensor to be reduced.

  • outType – The output type of the reduce operation.

  • dims – The dimensions to reduce in.

  • prog – The program sequence to add the operation to.

  • debugContext – Optional debug information.

void reduceWithOutput(poplar::Graph &graph, const poplar::Tensor &in, const poplar::Tensor &out, const std::vector<std::size_t> &dims, ReduceParams params, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Apply a reduction operation to a tensor.

This is similar to reduce() but allows you to specify the output. If the tile mapping of out is not complete it will be set. Otherwise it won’t be changed.

scale and update are only valid with the ADD , SQUARE_ADD or LOG_ADD operations. LOG_ADD performs all arithmetic consistent with the input and output being log probabilities. In other words, the update is another log add operation and the scale is a log multiply operation.

Unlike reduce(), no new output variable is created; the result is written to out, whose element type determines the output type.

The options parameter accepts the following:

  • accumType.interTile (float, half)

    The type to use for intermediate values between tiles.

  • accumType.inVertex (float, half)

    The type to use for intermediate values within a vertex.

If either of the above options are not set then the intermediate type will default to either the input tensor element type or float if the input is of type half and the reduction operation benefits from higher precision (for example, add).

The input and output types that are supported depend on the operation:

  • ADD, SQUARE_ADD, MUL: float->float, half->half, int->int, float->half, half->float

  • LOG_ADD : float->float, half->half, float->half, half->float

  • MAX, MIN: float->float, half->half, int->int

  • LOGICAL_AND, LOGICAL_OR: bool->bool

Parameters
  • graph – The graph to add the operation to.

  • in – The tensor to be reduced.

  • outType – The output type of the reduce operation.

  • dims – The dimensions to reduce in.

  • prog – The program sequence to add the operation to.

  • debugContext – Optional debug information.

void reduceMany(poplar::Graph &graph, const std::vector<SingleReduceOp> &reductions, std::vector<poplar::Tensor> &outputs, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Perform many reductions (in parallel if possible).

Please see the documentation for reduce() for details of the common inputs.

Parameters
  • reductions – The inputs to each reduction to perform. The outType attribute controls the element type of the output tensor if outputs is empty, otherwise it is ignored. If outputs is empty and useOutType is false then the output element type will be set to the same element type as the corresponding in tensor.

  • outputs – The tensors to store the output of the reductions. This may be empty in which case reduceMany will create the tensors. If the tile mapping is not set or not complete it will be set completely by this function.

Throws
  • poputil::poplibs_error – If outputs is not empty, its size must exactly match the size of reductions, otherwise an exception will be thrown.

  • poputil::poplibs_error – If outputs is empty and any reduction has params.update set to true, an exception will be thrown. outputs is required to perform an update reduction.
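
As a sketch (assuming popops codelets are added), performing two unrelated reductions in one call and letting reduceMany() create the output tensors:

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <popops/Reduce.hpp>
#include <vector>

// Reduce two independent tensors; the reductions may run in parallel.
std::vector<poplar::Tensor> reduceBoth(poplar::Graph &graph,
                                       const poplar::Tensor &a,
                                       const poplar::Tensor &b,
                                       poplar::program::Sequence &prog) {
  std::vector<popops::SingleReduceOp> reductions;
  reductions.emplace_back(a, std::vector<std::size_t>{0},
                          popops::ReduceParams(popops::Operation::ADD), "sumA");
  reductions.emplace_back(b, std::vector<std::size_t>{1},
                          popops::ReduceParams(popops::Operation::MAX), "maxB");
  std::vector<poplar::Tensor> outputs;  // empty: reduceMany creates them
  popops::reduceMany(graph, reductions, outputs, prog);
  return outputs;
}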

struct ReduceParams
#include <Reduce.hpp>

Stores parameters for the reduce operation, as well as the basic operation being performed (for example, add or mul).

Public Functions

ReduceParams() = default
inline ReduceParams(popops::Operation op, bool update = false)
inline ReduceParams(popops::Operation op, bool update, poplar::Tensor scale)

Define the details of the reduce operation that will be performed by the reduce() and reduceWithOutput() functions.

Parameters
  • op – The reduce operation to use.

  • scale – Can (optionally) scale the output.

  • update – Specify that the output should be updated, where out += reduce(in) rather than out = reduce(in).

ReduceParams(popops::Operation op, float constantScale, bool update = false) = delete

Public Members

popops::Operation op
bool update
bool useScale
poplar::Tensor scale
struct SingleReduceOp
#include <Reduce.hpp>

The parameterisation of the inputs to a single reduction for the reduceMany() function.

Please see the documentation for reduce() for a description of the struct members.

Public Functions

inline SingleReduceOp(poplar::Tensor in, std::vector<std::size_t> dims, ReduceParams params, poplar::Type outType, std::string debugName = "")
inline SingleReduceOp(poplar::Tensor in, std::vector<std::size_t> dims, ReduceParams params, std::string debugName = "")

Public Members

poplar::Tensor in
std::vector<std::size_t> dims
ReduceParams params
bool useOutType

Note that if useOutType is false then the element type of in is used.

Also note that outType is ignored if the outputs vector is not empty when calling reduceMany().

poplar::Type outType
std::string debugName

popops/ScaledAdd.hpp

Functions for scaling and adding tensors.

namespace popops

Common functions, such as elementwise and reductions.

Enums

enum ScaledAddSpecialisation

Values:

enumerator DEFAULT
enumerator X_MINUS_AX_PLUS_BY

Functions

void scaledAddTo(poplar::Graph &graph, poplar::Tensor A, poplar::Tensor B, float scaleB, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Add the elements of one tensor multiplied by a scalar to another tensor.

Performs the calculations A += scaleB * B

The operation is performed after casting B to the type of A.

Scaled add options

  • optimizeForSpeed (true, false) [=false]

    The scaledAdd vertices default to being optimized to aid memory allocation. To optimise them for speed instead, set this option to true.

  • scaleFloatToHalfTolerance (double) [=1e-6]

    Where the tensors A and B are of type half and scaleB is provided as a float or a tensor of type float, it is possible to implement the scaledAddTo in half precision if scaleB can be cast to half precision with acceptable accuracy. Otherwise full precision arithmetic can be used internally, but at the cost of speed. Floating point arithmetic will be selected if the relative error in casting is greater than the relative tolerance.

    Only applies to scaledAddTo() with scaleB.

Parameters
  • graph – The Poplar graph.

  • A – The destination tensor.

  • B – The second tensor to add elements from (must be of the same shape as A).

  • scaleB – The scalar to multiply elements of B with before addition.

  • prog – A sequence program to which the code performing the add will be appended.

  • debugContext – Optional debug information.

  • options – A list of flags to control optimizations.
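
A brief sketch (assuming popops codelets are added; names and the learning-rate value are illustrative) of a weight-update style use:

#include <poplar/Graph.hpp>
#include <poplar/OptionFlags.hpp>
#include <poplar/Program.hpp>
#include <popops/ScaledAdd.hpp>

// weights += (-learningRate) * gradients
void applyUpdate(poplar::Graph &graph, poplar::Tensor weights,
                 poplar::Tensor gradients, float learningRate,
                 poplar::program::Sequence &prog) {
  poplar::OptionFlags options{{"optimizeForSpeed", "true"}};
  popops::scaledAddTo(graph, weights, gradients, -learningRate, prog,
                      "applyUpdate", options);
}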

void scaledAddTo(poplar::Graph &graph, poplar::Tensor A, poplar::Tensor B, poplar::Tensor scaleB, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Add the elements of one tensor each multiplied by a (scalar) tensor to another tensor.

Performs the calculations A += scaleB * B

The operation is performed after casting scaleB and B to the type of A.

Parameters
  • graph – The Poplar graph.

  • A – The destination tensor.

  • B – The second tensor to add elements from (must be of the same shape as A).

  • scaleB – The scalar tensor to multiply elements of B with before addition.

  • prog – A sequence program to which the code performing the add will be appended.

  • debugContext – Optional debug information.

  • options – A list of flags to control optimizations. See scaledAddTo().

void scaledSubtractFrom(poplar::Graph &graph, poplar::Tensor A, poplar::Tensor B, float scaleB, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Subtract the elements of one tensor multiplied by a scalar from another tensor.

Performs the calculations A -= scaleB * B

The operation is performed after casting B to the type of A.

Parameters
  • graph – The Poplar graph.

  • A – The destination tensor.

  • B – The second tensor providing the elements to subtract (must be of the same shape as A).

  • scaleB – The scalar to multiply elements of B with before subtraction.

  • prog – A sequence program to which the code performing the add will be appended.

  • debugContext – Optional debug information.

  • options – A list of flags to control optimizations. See scaledAddTo().

void scaledSubtractFrom(poplar::Graph &graph, poplar::Tensor A, poplar::Tensor B, poplar::Tensor scaleB, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Subtract the elements of one tensor each multiplied by a (scalar) tensor from another tensor.

Performs the calculations A -= scaleB * B

The operation is performed after casting scaleB, and B to the type of A.

Parameters
  • graph – The Poplar graph.

  • A – The destination tensor.

  • B – The second tensor providing the elements to subtract (must be of the same shape as A).

  • scaleB – The scalar tensor to multiply elements of B with before subtraction.

  • prog – A sequence program to which the code performing the add will be appended.

  • debugContext – Optional debug information.

  • options – A list of flags to control optimizations. See scaledAddTo().

void scaledAddTo(poplar::Graph &graph, poplar::Tensor A, poplar::Tensor scaleA, poplar::Tensor B, poplar::Tensor scaleB, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Scale the elements of one tensor and add the scaled elements of another tensor to it.

The two scaling factors are (scalar) tensors.

Performs the calculations A = scaleA * A + scaleB * B

The operation is performed after casting scaleA, scaleB and B to the type of A.

Parameters
  • graph – The Poplar graph.

  • A – The destination tensor.

  • scaleA – The scalar tensor to multiply elements of A with before addition.

  • B – The second tensor to add elements from (must be of the same shape as A).

  • scaleB – The scalar tensor to multiply elements of B with before addition.

  • prog – A sequence program to which the code performing the add will be appended.

  • debugContext – Optional debug information.

  • options – A list of flags to control optimizations. See scaledAddTo().

void scaledAddTo(poplar::Graph &graph, poplar::Tensor A, poplar::Tensor scaleA, poplar::Tensor B, poplar::Tensor scaleB, poplar::program::Sequence &prog, const ScaledAddSpecialisation speciality, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Scale the elements of one tensor and add the scaled elements of another tensor to it.

The two scaling factors are (scalar) tensors.

Performs the calculations A = scaleA' * A + scaleB * B where scaleA’ is a function of scaleA specified by the “speciality” option.

The operation is performed after casting scaleA, scaleB and B to the type of A.

Parameters
  • graph – The Poplar graph.

  • A – The destination tensor.

  • scaleA – The scalar tensor to multiply elements of A with before addition.

  • B – The second tensor to add elements from (must be of the same shape as A).

  • scaleB – The scalar tensor to multiply elements of B with before addition.

  • prog – A sequence program to which the code performing the add will be appended.

  • speciality – Choice of ScaledAdd expression formulation

  • debugContext – Optional debug information.

  • options – A list of flags to control optimizations. See scaledAddTo().

void scaledAddTo(poplar::Graph &graph, poplar::Tensor A, float scaleA, poplar::Tensor B, float scaleB, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Scale the elements of one tensor and add the scaled elements of another tensor to it.

The two scaling factors are constants.

Performs the calculations A = scaleA * A + scaleB * B

If A and B are of different types, B is first cast to the type of A and the operation performed.

Parameters
  • graph – The Poplar graph.

  • A – The destination tensor.

  • scaleA – The constant to multiply elements of A with before addition.

  • B – The second tensor to add elements from (must be of the same shape as A).

  • scaleB – The constant to multiply elements of B with before addition.

  • prog – A sequence program to which the code performing the add will be appended.

  • debugContext – Optional debug information.

  • options – A list of flags to control optimizations. See scaledAddTo().

void scaledAddTo(poplar::Graph &graph, poplar::Tensor A, float scaleA, poplar::Tensor B, float scaleB, poplar::program::Sequence &prog, const ScaledAddSpecialisation speciality, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Scale the elements of one tensor and add the scaled elements of another tensor to it.

The two scaling factors are constants.

Performs the calculations A = scaleA' * A + scaleB * B where scaleA’ is a function of scaleA specified by the “speciality” option.

If A and B are of different types, B is first cast to the type of A and the operation performed.

Parameters
  • graph – The Poplar graph.

  • A – The destination tensor.

  • scaleA – The constant to multiply elements of A with before addition.

  • B – The second tensor to add elements from (must be of the same shape as A).

  • scaleB – The constant to multiply elements of B with before addition.

  • prog – A sequence program to which the code performing the add will be appended.

  • speciality – Choice of ScaledAdd expression formulation

  • debugContext – Optional debug information.

  • options – A list of flags to control optimizations. See scaledAddTo().

void scaledSubtractFrom(poplar::Graph &graph, poplar::Tensor A, poplar::Tensor scaleA, poplar::Tensor B, poplar::Tensor scaleB, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Scale the elements of one tensor and subtract the scaled elements of another tensor from it.

The two scaling factors are (scalar) tensors.

Performs the calculations A = scaleA * A - scaleB * B

The operation is performed after casting scaleA, scaleB and B to the type of A.

Parameters
  • graph – The Poplar graph.

  • A – The destination tensor.

  • scaleA – The scalar tensor to multiply elements of A with before subtraction.

  • B – The second tensor providing the elements to subtract (must be of the same shape as A).

  • scaleB – The scalar tensor to multiply elements of B with before subtraction.

  • prog – A sequence program to which the code performing the subtract will be appended.

  • debugContext – Optional debug information.

  • options – A list of flags to control optimizations. See scaledAddTo().

void scaledSubtractFrom(poplar::Graph &graph, poplar::Tensor A, float scaleA, poplar::Tensor B, float scaleB, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Scale the elements of one tensor and subtract the scaled elements of another tensor from it.

The two scaling factors are constants.

Performs the calculations A = scaleA * A - scaleB * B

If A and B are of different types, B is first cast to the type of A and the operation performed.

Parameters
  • graph – The Poplar graph.

  • A – The destination tensor.

  • scaleA – The constant to multiply elements of A with before subtraction.

  • B – The second tensor providing the elements to subtract (must be of the same shape as A).

  • scaleB – The constant to multiply elements of B with before subtraction.

  • prog – A sequence program to which the code performing the subtract will be appended.

  • debugContext – Optional debug information.

  • options – A list of flags to control optimizations. See scaledAddTo().

popops/Scatter.hpp

Scatter operations.

namespace popops

Common functions, such as elementwise and reductions.

Typedefs

using UpdateComputationFunc = std::function<poplar::Tensor(poplar::Graph&, poplar::Tensor&, poplar::Tensor&, poplar::program::Sequence&)>

Functions

void scatter(poplar::Graph &graph, const poplar::Tensor &operand, const poplar::Tensor &indices, const poplar::Tensor &updates, std::size_t indexVectorDim, std::vector<unsigned> updateWindowDims, std::vector<std::size_t> insertWindowDims, std::vector<unsigned> scatterDimsToOperandDims, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

The scatter operation generates a result which is the value of the input array operand, with several slices (at indices specified by indices) updated with the values in updates.

Note

This is a near direct port of https://www.tensorflow.org/xla/operation_semantics#scatter from tensorflow/compiler/xla/service/scatter_expander.cc

Parameters
  • graph – The Poplar graph.

  • operand – Array to be scattered into.

  • indices – Array containing the starting indices of the slices that must be scattered to.

  • updates – Array containing the values that must be used for scattering.

  • indexVectorDim – The dimension in indices that contains the starting indices.

  • updateWindowDims – The set of dimensions in updates shape that are window dimensions.

  • insertWindowDims – The set of window dimensions that must be inserted into updates shape.

  • scatterDimsToOperandDims – A dimensions map from the scatter indices to the operand index space. This array is interpreted as mapping i to scatterDimsToOperandDims[i]. It has to be one-to-one and total.

  • prog – The program to be extended.

  • debugContext – Optional debug information.

void scatter(poplar::Graph &graph, const poplar::Tensor &operand, const poplar::Tensor &indices, const poplar::Tensor &updates, std::size_t indexVectorDim, std::vector<unsigned> updateWindowDims, std::vector<std::size_t> insertWindowDims, std::vector<unsigned> scatterDimsToOperandDims, UpdateComputationFunc &updateComputation, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Similar to the above scatter(), but allows for a user defined update computation.

This computation is used to combine the existing values in the input tensor and the updates during the scatter.

Note

The first tensor parameter that is passed into the updateComputation will always be the current value from the operand tensor and the second parameter will always be the value from the updates tensor. This is important specifically for cases when the updateComputation is not commutative.

Parameters
  • graph – The Poplar graph.

  • operand – Array to be scattered into.

  • indices – Array containing the starting indices of the slices that must be scattered to.

  • updates – Array containing the values that must be used for scattering.

  • indexVectorDim – The dimension in indices that contains the starting indices.

  • updateWindowDims – The set of dimensions in updates shape that are window dimensions.

  • insertWindowDims – The set of window dimensions that must be inserted into updates shape.

  • scatterDimsToOperandDims – A map of dimensions from the scatter indices to the operand index space. This array is interpreted as mapping i to scatterDimsToOperandDims[i]. It has to be one-to-one and total.

  • updateComputation – Computation to be used for combining the existing values in the input tensor and the updates during scatter.

  • prog – The program to be extended.

  • debugContext – Optional debug information.

popops/SelectScalarFromRows.hpp

Select values from rows of a tensor.

namespace popops

Common functions, such as elementwise and reductions.

Functions

poplar::Tensor selectScalarFromRows(poplar::Graph &graph, const poplar::Tensor &params, const poplar::Tensor &indices, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

For each row in the 2D tensor params, select a single scalar value.

Aggregate the resulting scalars into a 1D tensor.

The size of the indices tensor must be equal to the size of dimension 0 of params. The ith element of indices represents an index in the ith row of the params tensor.

If the ith element of the indices tensor is less than 0 or greater than the width of params, then a NaN is stored into the ith element of the output. If the ith element of the indices tensor is equal to MASKED_LABEL_CODE then zero is stored into the ith element of the output.

Parameters
  • graph – The Poplar graph.

  • params – A 2D tensor, the element type must be either float or half.

  • indices – A 1D tensor, the element type must be unsigned integer.

  • prog – The program to be extended.

  • debugContext – Optional debug information.

Returns

A 1D tensor whose ith element is the scalar params[i][indices[i]].
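
As an illustration, the sketch below selects one logit per row (for example, the logit of a target class). The shapes, names and helper function are assumptions, and the usual graph setup and codelet registration are omitted.

#include <popops/SelectScalarFromRows.hpp>
#include <poputil/TileMapping.hpp>

// Sketch: result[i] == logits[i][labels[i]] for each of the 8 rows.
poplar::Tensor pickPerRow(poplar::Graph &graph, poplar::program::Sequence &prog) {
  auto logits = graph.addVariable(poplar::FLOAT, {8, 10}, "logits");
  auto labels = graph.addVariable(poplar::UNSIGNED_INT, {8}, "labels");
  poputil::mapTensorLinearly(graph, logits);
  poputil::mapTensorLinearly(graph, labels);
  return popops::selectScalarFromRows(graph, logits, labels, prog, "select");
}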

popops/SequenceSlice.hpp

Support for dynamic slices.

namespace poplar

Poplar classes and functions.

namespace popops

Common functions, such as elementwise and reductions.

Functions

void sequenceSlice(poplar::Graph &graph, const poplar::Tensor &tIn, const poplar::Tensor &tOut, const poplar::Tensor &tN, const poplar::Tensor &tInOffset, const poplar::Tensor &tOutOffset, bool zeroUnused, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Slice a 2D tensor based on offsets specified by tensors.

Typically this is used to copy subsequences of one tensor to another. The outermost dimension is sliced: tOut[tOutOffset:tOutOffset+tN][…] = tIn[tInOffset:tInOffset+tN][…] for each entry in tN/tInOffset/tOutOffset. Entries after the first tN==0 may be ignored. Unreferenced elements of tOut are zeroed if zeroUnused is set. The same output element should not be written by multiple inputs.

tIn and tOut must have rank >= 2. The outer dimension is sliced; the product of the inner dimensions must match. tInOffset, tOutOffset and tN must be 1D and of the same size.

Parameters
  • graph – The Poplar graph.

  • tIn – The source tensor.

  • tOut – The destination tensor.

  • tN – The number of elements to copy.

  • tInOffset – First element read from tIn.

  • tOutOffset – First element written in tOut.

  • zeroUnused – Whether to zero unreferenced tOut elements.

  • prog – The program to be extended.

  • debugContext – Optional debug information.
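
A minimal sketch of sequenceSlice() follows, copying two sub-sequences of a [16, 4] tensor into another [16, 4] tensor. The shapes, the element types of the offset/length tensors and the helper function are assumptions; in a real program the offsets and lengths would normally be written at runtime.

#include <popops/SequenceSlice.hpp>
#include <poputil/TileMapping.hpp>

void copySubsequences(poplar::Graph &graph, poplar::program::Sequence &prog) {
  auto tIn  = graph.addVariable(poplar::HALF, {16, 4}, "tIn");
  auto tOut = graph.addVariable(poplar::HALF, {16, 4}, "tOut");
  // One entry per sub-sequence to copy (element type assumed here).
  auto tN         = graph.addVariable(poplar::UNSIGNED_INT, {2}, "tN");
  auto tInOffset  = graph.addVariable(poplar::UNSIGNED_INT, {2}, "tInOffset");
  auto tOutOffset = graph.addVariable(poplar::UNSIGNED_INT, {2}, "tOutOffset");
  for (const auto &t : {tIn, tOut, tN, tInOffset, tOutOffset})
    poputil::mapTensorLinearly(graph, t);

  // Rows of tOut not written by any slice are zeroed.
  popops::sequenceSlice(graph, tIn, tOut, tN, tInOffset, tOutOffset,
                        /*zeroUnused=*/true, prog, "seq-slice");
}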

popops/Sort.hpp

Functions for sorting tensors.

namespace popops

Common functions, such as elementwise and reductions.

Functions

poplar::Tensor sort(poplar::Graph &graph, const poplar::Tensor &t, unsigned dim, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Sort a tensor along the given dimension.

This will return a tensor that is a permutation of the input tensor t with all the elements of the 1D slices in the chosen dimension in ascending order.

This aims to match TensorFlow’s XLA sort: https://www.tensorflow.org/xla/operation_semantics#sort

Parameters
  • graph – The Poplar graph.

  • t – The source tensor.

  • dim – The dimension to sort on.

  • prog – The program to be extended.

  • debugContext – Optional debug information.

Throws

poputil::poplibs_error – If dim is not a valid dimension of t.

Returns

A tensor which is a permutation of t such that all elements in the given dimension are in order.
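
For example, the following sketch sorts each row of a 2D tensor into ascending order along dimension 1. The shapes, names and helper function are assumptions, and graph setup is omitted.

#include <popops/Sort.hpp>
#include <poputil/TileMapping.hpp>

poplar::Tensor sortRows(poplar::Graph &graph, poplar::program::Sequence &prog) {
  auto t = graph.addVariable(poplar::FLOAT, {4, 32}, "t");
  poputil::mapTensorLinearly(graph, t);
  // Each 1D slice along dimension 1 (each row) is returned in ascending order.
  return popops::sort(graph, t, /*dim=*/1, prog, "sort-rows");
}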

void sortInPlace(poplar::Graph &graph, const poplar::Tensor &t, unsigned dim, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

In-place sort a tensor along the given dimension.

This will permute the input tensor so that all the elements of 1D slices in the chosen dimension are in ascending order.

Parameters
  • graph – The Poplar graph.

  • t – The source tensor to be sorted.

  • dim – The dimension to sort on.

  • prog – The program to be extended.

  • debugContext – Optional debug information.

Throws

poputil::poplibs_error – If dim is not a valid dimension of t.

poplar::Tensor sortKeyValue(poplar::Graph &graph, const poplar::Tensor &k, const poplar::Tensor &v, unsigned dim, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Sort a tensor by a key tensor along the given dimension.

This will return a tensor that is a permutation of the input tensor v with the property that all 1D slices in the chosen dimension are in ascending order with respect to the key tensor k.

This aims to match TensorFlow’s XLA sort: https://www.tensorflow.org/xla/operation_semantics#sort

Note

If k and v alias, the result is undefined.

Parameters
  • graph – The Poplar graph.

  • k – The key tensor to sort on.

  • v – The value tensor to be sorted.

  • dim – The dimension to sort on.

  • prog – The program to be extended.

  • debugContext – Optional debug information.

Throws
  • poputil::poplibs_error – If dim is not a valid dimension of v.

  • poputil::poplibs_error – If v and k are not the same shape.

Returns

A tensor which is a permutation of v such that it is in order with respect to the tensor k in the given dimension.
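
The sketch below uses sortKeyValue() to reorder a tensor of indices by a tensor of scores, row by row (an argsort-style use). The shapes, names and helper function are assumptions for illustration.

#include <popops/Sort.hpp>
#include <poputil/TileMapping.hpp>

poplar::Tensor argsortRows(poplar::Graph &graph, poplar::program::Sequence &prog) {
  auto keys   = graph.addVariable(poplar::FLOAT, {4, 32}, "keys");
  auto values = graph.addVariable(poplar::UNSIGNED_INT, {4, 32}, "values");
  poputil::mapTensorLinearly(graph, keys);
  poputil::mapTensorLinearly(graph, values);
  // Returns values permuted so that each row follows the ascending order of
  // the corresponding row of keys. keys itself is not modified by this overload.
  return popops::sortKeyValue(graph, keys, values, /*dim=*/1, prog, "argsort");
}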

void sortKeyValueInPlace(poplar::Graph &graph, const poplar::Tensor &k, const poplar::Tensor &v, unsigned dim, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

In-place sort a given tensor by a key tensor along the given dimension.

This will permute the key and value tensors so that all the elements of the 1D slices in the chosen dimension are in ascending order with respect to the key tensor.

Note

The k tensor is also sorted by this in-place operation.

Note

If the k tensor and the v tensor alias, the result is undefined.

Parameters
  • graph – The Poplar graph.

  • k – The key tensor to sort on.

  • v – The value tensor to be sorted.

  • dim – The dimension to sort on.

  • prog – The program to be extended.

  • debugContext – Optional debug information.

Throws
  • poputil::poplibs_error – If dim is not a valid dimension of v.

  • poputil::poplibs_error – If v and k are not the same shape.

popops/SortOrder.hpp

namespace popops

Common functions, such as elementwise and reductions.

Enums

enum SortOrder

Defines a required order for sorting operations.

Values:

enumerator NONE

No ordering is required.

enumerator ASCENDING

Sort in ascending order.

enumerator DESCENDING

Sort in descending order.

Functions

std::ostream &operator<<(std::ostream &os, const SortOrder &o)

popops/TensorCollectives.hpp

Support for collectives.

namespace popops

Common functions, such as elementwise and reductions.

popops/TopK.hpp

namespace popops

Common functions, such as elementwise and reductions.

Functions

std::ostream &operator<<(std::ostream &os, const TopKParams &p)
poplar::Tensor createTopKInput(poplar::Graph &graph, const poplar::Type &type, const std::vector<std::size_t> &shape, const TopKParams &params, const poplar::DebugContext &debugContext = {})

Create and return a new tensor laid out optimally to be used as an input to a topK operation with the given parameters.

Parameters
  • graph – The Poplar graph to add the tensor to.

  • type – The Poplar type of elements in the returned tensor.

  • shape – The shape of the returned tensor.

  • params – The parameters of the top k that the returned tensor will be used as input to.

  • debugContext – Optional debug information.

Returns

A newly created tensor with shape shape and full tile mapping.

poplar::Tensor topK(poplar::Graph &graph, poplar::program::Sequence &prog, const poplar::Tensor &t, const TopKParams &params, const poplar::DebugContext &debugContext = {})

Return the top k values in the innermost dimension of a tensor.

Parameters
  • graph – The Poplar graph to add the operation to.

  • prog – The Poplar sequence to add the operation to.

  • t – The tensor in which to find the top-k values in the innermost dimension.

  • params – The parameters of the top k.

  • debugContext – Optional debug information.

Returns

A tensor with the top k values found in the innermost dimension of t.

std::pair<poplar::Tensor, poplar::Tensor> topKKeyValue(poplar::Graph &graph, poplar::program::Sequence &prog, const poplar::Tensor &keys, const poplar::Tensor &values, const TopKParams &params, const poplar::DebugContext &debugContext = {})

Return the top k values in the innermost dimension of a tensor along with the permutation of another tensor with respect to the values.

Parameters
  • graph – The Poplar graph to add the operation to.

  • prog – The Poplar sequence to add the operation to.

  • keys – The tensor in which to find the top-k values in the innermost dimension.

  • values – A tensor with the same shape as keys for which to get the permutation with respect to keys.

  • params – The parameters of the top k.

  • debugContext – Optional debug information.

Returns

A pair of tensors. The first contains the top k values found in the innermost dimension of keys. The second contains the permutation of the tensor values with respect to the tensor keys.

std::pair<poplar::Tensor, poplar::Tensor> topKWithPermutation(poplar::Graph &graph, poplar::program::Sequence &prog, const poplar::Tensor &t, const TopKParams &params, const poplar::DebugContext &debugContext = {})

Return the top k values in the innermost dimension of a tensor along with the indices of those values in the input tensor in the innermost dimension.

Parameters
  • graph – The Poplar graph to add the operation to.

  • prog – The Poplar sequence to add the operation to.

  • t – The tensor in which to find the top-k values in the innermost dimension.

  • params – The parameters of the top k.

  • debugContext – Optional debug information.

Returns

A pair of tensors. The first contains the top k values found in the innermost dimension of t. The second contains the indices of those values in the innermost dimension of t in the original input.

struct TopKParams
#include <TopK.hpp>

Parameters for topK* APIs.

Public Functions

TopKParams(unsigned k, bool largest, SortOrder sortOrder) noexcept

Public Members

unsigned k

The number of outputs from the top k operation.

This must be less than or equal to the number of elements in the innermost dimension of the tensor used as input to the operation.

bool largest

If true, return the top k largest elements.

Otherwise return the top k smallest elements.

SortOrder sortOrder

The required ordering of elements in the resulting tensor.
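
Putting the pieces of this header together, the sketch below finds the five largest values in each row, together with their positions, using createTopKInput() and topKWithPermutation(). The shapes and the helper function are assumptions for illustration; graph setup is omitted.

#include <popops/TopK.hpp>
#include <popops/SortOrder.hpp>

std::pair<poplar::Tensor, poplar::Tensor>
top5PerRow(poplar::Graph &graph, poplar::program::Sequence &prog) {
  // Five largest elements of each row, returned in descending order.
  popops::TopKParams params(/*k=*/5, /*largest=*/true,
                            popops::SortOrder::DESCENDING);
  // Allocate the input with a layout suited to the top-k operation.
  auto in = popops::createTopKInput(graph, poplar::FLOAT, {8, 100}, params,
                                    "topk-in");
  // First tensor: the top-5 values per row; second: their indices within `in`.
  return popops::topKWithPermutation(graph, prog, in, params, "topk");
}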

popops/UpdateScalarInRows.hpp

Functions for updating values in tensors.

namespace popops

Common functions, such as elementwise and reductions.

Functions

void updateScalarInRows(poplar::Graph &graph, const poplar::Tensor &params, const poplar::Tensor &indices, poplar::program::Sequence &program, const poplar::DebugContext &debugContext = {})

Update in-place one scalar per row of the tensor params.

For each row, the index of the value to update is specified by the tensor indices. If the index from indices is equal to MASKED_LABEL_CODE then no update is carried out.

Pseudo-code:

for each row r
  if indices[r] != MASKED_LABEL_CODE
    params[r][indices[r]] = params[r][indices[r]] - 1.f

If the ith index is less than 0 or greater than the size of the row then the whole row of the params tensor is set to NaN. This is to match the interface of the backward phase of tf.nn.sparse_softmax_cross_entropy_with_logits, see https://www.tensorflow.org/api_docs/python/tf/nn/sparse_softmax_cross_entropy_with_logits

Parameters
  • graph – The Poplar graph.

  • params – The 2D tensor to be updated, the element type must be either float or half.

  • indices – A 1D tensor; the element type must be unsigned integer.

  • program – The program to be extended.

  • debugContext – Optional debug information.
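
A minimal sketch of this update (the backward step described above) follows; the shapes, names and helper function are assumptions, and graph setup is omitted.

#include <popops/UpdateScalarInRows.hpp>
#include <poputil/TileMapping.hpp>

// Subtract 1 in place from probs[r][labels[r]] for every row r whose label is
// not MASKED_LABEL_CODE.
void subtractOneAtLabels(poplar::Graph &graph, poplar::program::Sequence &prog) {
  auto probs  = graph.addVariable(poplar::FLOAT, {8, 10}, "probs");
  auto labels = graph.addVariable(poplar::UNSIGNED_INT, {8}, "labels");
  poputil::mapTensorLinearly(graph, probs);
  poputil::mapTensorLinearly(graph, labels);
  popops::updateScalarInRows(graph, probs, labels, prog, "update-rows");
}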

popops/Zero.hpp

Set elements of tensor to zero.

namespace popops

Common functions, such as elementwise and reductions.

Functions

void zero(poplar::Graph &graph, poplar::Tensor t, const std::vector<poplar::Interval> &tileRegions, unsigned tile, poplar::ComputeSet zeroCS)

Appends vertices to zeroCS which zero the elements of t in tileRegions that reside on tile tile.

Parameters
  • graph – The graph that the operation will be added to.

  • t – The tensor whose elements are to be set to zero.

  • tileRegions – Region mapping of the tensor on tile.

  • tile – Tile which the regions relate to.

  • zeroCS – Compute set to add the operation into.

void zero(poplar::Graph &graph, const poplar::Tensor &t, unsigned tile, poplar::ComputeSet zeroCS)

Appends vertices to zeroCS which zero all elements of t that reside on tile tile.

Parameters
  • graph – The graph that the operation will be added to.

  • t – The tensor whose elements are to be set to zero.

  • tile – Tile on which the tensor is mapped to.

  • zeroCS – Compute set to add the operation into.

void zero(poplar::Graph &graph, const poplar::Tensor &t, const std::vector<std::vector<poplar::Interval>> &mapping, poplar::ComputeSet zeroCS)

Appends vertices to zeroCS which zero the elements of t described by mapping, on the tiles given by that mapping.

Parameters
  • graph – The graph that the operation will be added to.

  • t – The tensor whose elements are to be set to zero.

  • mapping – The tensor’s region mapping per tile. Each element describes the region mapping for one tile, in tile order: mapping[0] is tile 0’s region mapping for t.

  • zeroCS – Compute set to add the operation into.

void zero(poplar::Graph &graph, const poplar::Tensor &t, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Appends programs to prog which zero all elements of the tensor t.

Parameters
  • graph – The graph that the operation will be added to.

  • t – The tensor whose elements are to be set to zero.

  • prog – Poplar program sequence to append the operation onto.

  • debugContext – Optional debug information.
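
For illustration, this overload can be used as a one-liner once a tensor has been created and mapped. The shape, names and helper function below are assumptions.

#include <popops/Zero.hpp>
#include <poputil/TileMapping.hpp>

void zeroAll(poplar::Graph &graph, poplar::program::Sequence &prog) {
  auto t = graph.addVariable(poplar::HALF, {128, 64}, "t");
  poputil::mapTensorLinearly(graph, t);
  // Appends the programs needed to set every element of t to zero.
  popops::zero(graph, t, prog, "zero-t");
}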

Linear algebra functions (poplin)

Linear algebra functions (matrix multiplications, convolutions).

poplin/Cholesky.hpp

Factorise a positive definite matrix using Cholesky decomposition.

namespace poplin

Linear algebra functions.

Functions

std::vector<std::pair<MatMulParams, poplar::OptionFlags>> getCholeskyMatMulPrePlanParameters(const poplar::Type &type, const std::vector<std::size_t> &shape, bool lower, poplar::OptionFlags options)

Plan matrix multiplication for the Cholesky factoriser.

Supported options:

  • blockSize

    A hint for the size of block to be used.

See matMul() for additional options.

Parameters
  • type – The data type of the input tensor.

  • shape – The shape of the input tensor.

  • lower – Lower triangular matrix if true, else upper triangular.

  • options – A structure describing options on how the decomposition should be implemented.

Returns

Preplan parameters for matMul().

poplar::Tensor createCholeskyInput(poplar::Graph &graph, const poplar::Type &type, const std::vector<std::size_t> &shape, bool lower, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, matmul::PlanningCache *cache = nullptr)

Create a tensor that is used as the input for the Cholesky factoriser.

Supported options:

  • blockSize

    A hint for the size of block to be used.

See matMul() for additional options.

This will create a 2D/3D tensor in the graph. The ordering and tile mapping of the tensor will be set to make a triangular factoriser with this tensor as the left argument efficient.

Parameters
  • graph – The Poplar graph.

  • type – The input data type.

  • shape – The shape of the tensor.

  • debugContext – Debug information.

  • options – A structure describing options on how the decomposition should be implemented.

  • cache – Optional pointer to a planning cache to use.

Returns

A matrix of type type and shape shape. The tensor will have been mapped to tiles.

poplar::Tensor cholesky(poplar::Graph &graph, const poplar::Tensor &a, bool lower, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, poplar::OptionFlags options = {}, matmul::PlanningCache *cache = nullptr)

Computes Cholesky factor for a symmetric positive definite matrix.

Supported options:

  • blockSize

    A hint for the size of block to be used.

See matMul() for additional options.

Parameters
  • graph – The Poplar graph.

  • a – Tensor of floating-point type with shape […, N,N].

  • lower – If true, return a lower triangular matrix, else return an upper triangular matrix.

  • prog – A reference to a program sequence which the code to perform the arrangement will be appended to.

  • debugContext – Optional debug information.

  • options – A structure describing options on how the decomposition should be implemented.

  • cache – Optional pointer to a planning cache to use.

Returns

A tensor with the same shape as a, containing the triangular factor.
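
A minimal sketch combining createCholeskyInput() and cholesky() follows. The matrix size, the blockSize value and the helper function are assumptions; in a real program the input would be written with data before the factorisation runs.

#include <poplin/Cholesky.hpp>

poplar::Tensor lowerFactor(poplar::Graph &graph, poplar::program::Sequence &prog) {
  // blockSize is the block-size hint described above (value assumed here).
  poplar::OptionFlags options{{"blockSize", "16"}};
  // Allocate the input with a layout suited to the factoriser.
  auto a = poplin::createCholeskyInput(graph, poplar::FLOAT, {64, 64},
                                       /*lower=*/true, "chol-in", options);
  // Returns the lower triangular factor L with L * L^T == a.
  return poplin::cholesky(graph, a, /*lower=*/true, prog, "chol", options);
}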

void choleskyInPlace(poplar::Graph &graph, const poplar::Tensor &a, bool lower, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, poplar::OptionFlags options = {}, matmul::PlanningCache *cache = nullptr)

Computes Cholesky factor in place for a symmetric positive definite matrix.

Supported options:

  • blockSize

    A hint for the size of block to be used.

See matMul() for additional options.

Parameters
  • graph – The Poplar graph.

  • a – Tensor of floating-point type with shape […, N,N].

  • lower – If true, return a lower triangular matrix, else return an upper triangular matrix.

  • prog – A reference to a program sequence which the code to perform the arrangement will be appended to.

  • debugContext – Optional debug information.

  • options – A structure describing options on how the decomposition should be implemented.

  • cache – Optional pointer to a planning cache to use.

Returns

None

namespace matmul

poplin/ConvParams.hpp

Data types for convolution parameters.

template<>
struct std::hash<poplin::ConvParams::InputTransform>

Public Functions

std::size_t operator()(const poplin::ConvParams::InputTransform &it) const
template<>
struct std::hash<poplin::ConvParams::OutputTransform>

Public Functions

std::size_t operator()(const poplin::ConvParams::OutputTransform &ot) const
template<>
struct std::hash<poplin::ConvParams>

Public Functions

std::size_t operator()(const poplin::ConvParams &params) const
namespace poplin

Linear algebra functions.

Functions

std::ostream &operator<<(std::ostream &os, const ConvParams &p)
std::istream &operator>>(std::istream &is, ConvParams &p)
std::size_t hash_value(const ConvParams::InputTransform &it)
std::size_t hash_value(const ConvParams::OutputTransform &ot)
struct ConvParams

Public Functions

ConvParams() = default
ConvParams(poplar::Type dataType, std::size_t batchSize, std::vector<std::size_t> inputFieldShape, std::vector<std::size_t> kernelShape, std::size_t inputChannels, std::size_t outputChannels, std::size_t numConvGroups)
ConvParams(poplar::Type inputType, poplar::Type outputType, std::size_t batchSize, std::vector<std::size_t> inputFieldShape, std::vector<std::size_t> kernelShape, std::size_t inputChannels, std::size_t outputChannels, std::size_t numConvGroups)
ConvParams(poplar::Type inputType, poplar::Type outputType, std::size_t batchSize, std::vector<std::size_t> inputFieldShape, std::vector<std::size_t> kernelShape, std::size_t inputChannels, std::size_t outputChannels, std::size_t numConvGroups, InputTransform inputTransform, InputTransform kernelTransform, OutputTransform outputTransform)
std::size_t getUntransformedOutputSize(unsigned dim) const

Return the size of the output of the convolution operation, before output transformations are applied.

std::size_t getOutputSize(unsigned dim) const

Return the size of the output.

inline std::size_t getNumOutputChansPerConvGroup() const

Return the number of output channels per group.

inline std::size_t getNumOutputChans() const

Return the number of output channels.

inline std::size_t getInputSize(unsigned dim) const

Return the input size.

inline std::size_t getNumInputChansPerConvGroup() const

Return the number of input channels per group.

inline std::size_t getNumInputChans() const

Return the number of input channels.

inline std::size_t getNumConvGroups() const

Return the number of convolution groups.

inline std::size_t getNumFieldDims() const

Return the number of dimensions of the input field.

inline std::vector<std::size_t> getInputFieldShape() const

Return the shape of the input field.

inline std::vector<std::size_t> getKernelShape() const

Return the shape of the kernel.

inline std::size_t getBatchSize() const

Return the batch size.

unsigned getTruncatedInputSize(unsigned dim) const

Return the size of input in the specified dimension after truncation.

unsigned getTruncatedKernelSize(unsigned dim) const

Return the size of kernel in the specified dimension after truncation.

unsigned getTransformedInputSize(unsigned dim) const

Return the size of input in the specified dimension after applying the input transforms.

unsigned getTransformedKernelSize(unsigned dim) const

Return the size of kernel in the specified dimension after applying the kernel transforms.

std::vector<size_t> getOutputFieldShape() const

Returns the shape of the output field.

void validate() const
ConvParams canonicalize() const

Public Members

poplar::Type inputType
poplar::Type outputType
std::size_t batchSize

Batch size (B).

std::vector<std::size_t> inputFieldShape

Input field shape for each channel in a batch.

std::vector<std::size_t> kernelShape

Kernel shape for each channel.

std::size_t inputChannelsPerConvGroup

Input channels per conv group (Ci).

std::size_t outputChannelsPerConvGroup

Output channels per group (Co).

std::size_t numConvGroups

Number of groups in a grouped convolution (G).

The input and output channels are divided by G such that G kernels are applied to input tensors of size [B, O{dims}, Ci/G] to produce output tensors of size [B, O{dims}, Co/G], where O{dims} are the output field dimensions.

InputTransform inputTransform

The transform applied to the input.

InputTransform kernelTransform

The transform applied to the kernel.

OutputTransform outputTransform

The transform applied to the output.

Friends

friend bool operator<(const ConvParams &a, const ConvParams &b)
friend bool operator==(const ConvParams &a, const ConvParams &b)
friend bool operator!=(const ConvParams &a, const ConvParams &b)
struct InputTransform

Public Functions

InputTransform() = default
InputTransform(const std::size_t size)
InputTransform(std::vector<unsigned> truncationLower, std::vector<unsigned> truncationUpper, std::vector<unsigned> dilation, std::vector<unsigned> paddingLower, std::vector<unsigned> paddingUpper, std::vector<bool> flip)
Parameters
  • truncationLower – Where to truncate the lower end of each dimension.

  • truncationUpper – Where to truncate the upper end of each dimension.

  • dilation – Dilation to apply to each dimension.

  • paddingLower – How much to pad the lower end of each dimension.

  • paddingUpper – How much to pad the upper end of each dimension.

  • flip – If true, each spatial dimension is flipped after being padded.

Public Members

std::vector<unsigned> truncationLower

The position where the lower end of each spatial dimension is truncated before dilation.

std::vector<unsigned> truncationUpper

The position where the upper end of each spatial dimension is truncated before dilation.

std::vector<unsigned> dilation

Dilation applied to each spatial dimension after truncation and before padding.

Dilation is performed by placing a number of zeroed elements between the elements of the field.

std::vector<unsigned> paddingLower

Padding applied to each spatial dimension after dilation and before flipping.

std::vector<unsigned> paddingUpper

Padding applied to each spatial dimension after dilation and before flipping.

std::vector<bool> flip

If true, each spatial dimension is flipped after being padded.

Friends

friend bool operator<(const InputTransform &a, const InputTransform &b)
friend bool operator==(const InputTransform &a, const InputTransform &b)
friend bool operator!=(const InputTransform &a, const InputTransform &b)
struct OutputTransform

Public Functions

OutputTransform() = default
OutputTransform(const std::size_t size)
OutputTransform(std::vector<unsigned> truncationLower, std::vector<unsigned> truncationUpper, std::vector<unsigned> striding, std::vector<unsigned> paddingLower, std::vector<unsigned> paddingUpper)
Parameters
  • truncationLower – Where to truncate the lower end of each dimension.

  • truncationUpper – Where to truncate the upper end of each dimension.

  • striding – Stride to use in convolution.

  • paddingLower – How much to pad the lower end of each dimension.

  • paddingUpper – How much to pad the upper end of each dimension.

Public Members

std::vector<unsigned> truncationLower

The position where the lower end of each spatial dimension is truncated before dilation.

std::vector<unsigned> truncationUpper

The position where the upper end of each spatial dimension is truncated before dilation.

std::vector<unsigned> stride

Striding applied to each spatial dimension after truncation and before padding.

std::vector<unsigned> paddingLower

Padding applied to lower end of each spatial dimension after striding.

std::vector<unsigned> paddingUpper

Padding applied to upper end of each spatial dimension after striding.

Friends

friend bool operator<(const OutputTransform &a, const OutputTransform &b)
friend bool operator==(const OutputTransform &a, const OutputTransform &b)
friend bool operator!=(const OutputTransform &a, const OutputTransform &b)
namespace std
template<>
struct hash<poplin::ConvParams>

Public Functions

std::size_t operator()(const poplin::ConvParams &params) const
template<>
struct hash<poplin::ConvParams::InputTransform>

Public Functions

std::size_t operator()(const poplin::ConvParams::InputTransform &it) const
template<>
struct hash<poplin::ConvParams::OutputTransform>

Public Functions

std::size_t operator()(const poplin::ConvParams::OutputTransform &ot) const
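
As an illustration of the constructors above, the sketch below builds a ConvParams for a simple 2D convolution. All values are assumptions chosen for the example, and the default input, kernel and output transforms (no truncation, padding, dilation or flipping) are used.

#include <poplin/ConvParams.hpp>

poplin::ConvParams makeParams() {
  // Batch 1, 224x224 input field, 3x3 kernel, 16 -> 32 channels, 1 group.
  return poplin::ConvParams(poplar::HALF,
                            /*batchSize=*/1,
                            /*inputFieldShape=*/{224, 224},
                            /*kernelShape=*/{3, 3},
                            /*inputChannels=*/16,
                            /*outputChannels=*/32,
                            /*numConvGroups=*/1);
}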

poplin/ConvPreplan.hpp

Functions and data types to support performing convolution preplanning.

namespace poplin

Linear algebra functions.

Functions

void preplan(const std::set<ConvPlanParams> &convs, const std::set<MatMulPlanParams> &matmuls, PlanningCache &cache)

Plan the specified convolutions and matrix multiplications.

All entries must have matching machine parameters.

Parameters
  • convs – A set of tuples of:

    • conv-specific target for tile / IPU sizing

    • convolution parameters

    • implementation options. See createWeights().

  • matmuls – A set of tuples of:

    • matmul-specific target for tile / IPU sizing

    • matmul parameters

    • implementation options. See createWeights().

  • cache – The planning cache to update.

poplin/ConvUtil.hpp

A collection of utility functions to assist calculation of input/output ranges when moving a 2-dimensional kernel over a larger 2-dimensional space (for example, in convolution or pooling layers).

namespace poplin

Linear algebra functions.

Functions

unsigned getDilatedSize(unsigned size, unsigned dilation)

Return the output size when the specified dilation is applied to an input of the specified size.
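
For example, assuming the usual definition of dilation used elsewhere in this library (a dilation of d places d - 1 zeros between neighbouring elements), a size-5 input dilated by 2 spans 9 elements:

#include <poplin/ConvUtil.hpp>
#include <cassert>

void dilatedSizeExample() {
  unsigned dilated = poplin::getDilatedSize(/*size=*/5, /*dilation=*/2);
  assert(dilated == 9); // (5 - 1) * 2 + 1
}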

unsigned getInputIndex(unsigned dim, unsigned outputIndex, unsigned kernelIndex, const ConvParams &params)

Return the index of the input element that is multiplied by the specified kernel index to produce the specified output.

Return ~0U if there is no such input element.

unsigned getKernelIndex(unsigned dim, unsigned outputIndex, unsigned inputIndex, const ConvParams &params)

Return the index of the kernel element that is multiplied by the specified input index to produce the specified output.

Return ~0U if there is no such kernel element.

std::pair<unsigned, unsigned> getOutputRangeForKernelIndex(unsigned dim, std::pair<unsigned, unsigned> outputRange, unsigned kernelIndex, const ConvParams &params)

Given an output range, return the subset whose calculation involves the specified kernel index.

std::pair<unsigned, unsigned> getOutputRangeForInputIndex(unsigned dim, std::pair<unsigned, unsigned> outputRange, unsigned inputIndex, const ConvParams &params)

Given an output range, return the subset whose calculation involves the specified input.

std::pair<unsigned, unsigned> getOutputRangeForKernelRange(unsigned dim, std::pair<unsigned, unsigned> outputRange, std::pair<unsigned, unsigned> kernelIndexRange, const ConvParams &params)

Given an output range, return the subset whose calculation involves the specified range of kernel indices.

std::pair<unsigned, unsigned> getOutputRangeForInputRange(unsigned dim, std::pair<unsigned, unsigned> outputRange, std::pair<unsigned, unsigned> inputRange, const ConvParams &params)

Given an output range, return the subset whose calculation involves the specified range of input indices.

std::pair<unsigned, unsigned> getInputRange(unsigned dim, std::pair<unsigned, unsigned> outputRange, unsigned kernelIndex, const ConvParams &params)

Return the input range that is associated with the specified kernel index when calculating the specified output range.

std::pair<unsigned, unsigned> getKernelRange(unsigned dim, std::pair<unsigned, unsigned> outputRange, unsigned inputIndex, const ConvParams &params)

Return the kernel range that is associated with the specified input index when calculating the specified output range.

std::pair<unsigned, unsigned> getInputRange(unsigned dim, std::pair<unsigned, unsigned> outputRange, std::pair<unsigned, unsigned> kernelIndexRange, const ConvParams &params)

Return the input range that is associated with the specified kernel index range when calculating the specified output range.

std::pair<unsigned, unsigned> getKernelRange(unsigned dim, std::pair<unsigned, unsigned> outputRange, std::pair<unsigned, unsigned> inputRange, const ConvParams &params)

Return the kernel range that is associated with the specified input index range when calculating the specified output range.

inline std::pair<unsigned, unsigned> getInputRange(unsigned dim, unsigned outputIndex, std::pair<unsigned, unsigned> kernelIndexRange, const ConvParams &params)
inline std::pair<unsigned, unsigned> getInputRange(unsigned dim, unsigned outputIndex, const ConvParams &params)
inline std::pair<unsigned, unsigned> getInputRange(unsigned dim, std::pair<unsigned, unsigned> outputRange, const ConvParams &params)
ConvParams getGradientParams(const ConvParams &params)

Given a set of parameters, return the set of params that represent the convolution to be applied to the output gradients to get the input gradients (provided the weights have been transposed in the channel axes and flipped in the spatial axes).

ConvParams getWeightUpdateParams(const ConvParams &fwdParams)

Given a set of convolution parameters, return the set of params that represent the convolution to be applied to the output gradients to get the weight update gradients.

poplin/Convolution.hpp

Functions and data types to support performing convolutions.

namespace poplin

Linear algebra functions.

Typedefs

using ConvPlanParams = std::tuple<const poplar::Target*, const ConvParams, const poplar::OptionFlags*>

Functions

uint64_t getFwdFlops(const ConvParams &params)

Calculate the minimum number of floating point operations required to perform the forward pass convolution given a set of params.

uint64_t getBwdFlops(const ConvParams &params)

Calculate the minimum number of floating point operations required to perform the backward pass convolution given a set of params.

uint64_t getWuFlops(const ConvParams &params)

Calculate the minimum number of floating point operations required to perform the weight update pass convolution given a set of params.

double getFwdPerfectCycleCount(const poplar::Graph &graph, const ConvParams &params)

Calculate the number of cycles to perform the forward pass assuming maximal utilisation of target hardware performing the minimum number of floating point operations.

This takes into account the number of tiles available and vectorization support on the target.

This is an optimistic number useful for estimating efficiency: cycleCount = getFwdFlops() / maximumHardwareVectorization.

Parameters
  • graph – Provides target the convolution will run on.

  • params – Description of convolution.

Returns

Estimated number of cycles to perform the forward pass.

double getBwdPerfectCycleCount(const poplar::Graph &graph, const ConvParams &params)

Calculate the number of cycles to perform the backward pass assuming maximal utilisation of the target hardware, performing the minimum number of floating point operations.

This takes into account the number of tiles available and vectorization support on the target.

This is an optimistic number useful for estimating efficiency: cycleCount = getBwdFlops() / maximumHardwareVectorization.

Parameters
  • graph – Provides target the convolution will run on.

  • params – Description of convolution.

Returns

Estimated number of cycles to perform the backward pass.

double getWuPerfectCycleCount(const poplar::Graph &graph, const ConvParams &params)

Calculate the number of cycles to perform the weight update pass assuming maximal utilisation of the target hardware, performing the minimum number of floating point operations.

This takes into account the number of tiles available and vectorization support on the target.

This is an optimistic number useful for estimating efficiency: cycleCount = getWuFlops() / maximumHardwareVectorization.

Parameters
  • graph – Provides target the convolution will run on.

  • params – Description of convolution.

Returns

Estimated number of cycles to perform the weight update pass.

poplar::Tensor createWeights(poplar::Graph &graph, const ConvParams &params, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Create a weight tensor suitable for use with convolution().

The shape of the tensor will be [convGroups x outChansPerConvGroup x inChansPerConvGroup x H x W].

Convolution options

  • availableMemoryProportion Decimal between 0 and 1 (inclusive) [=0.6]

    The proportion of tile memory to be made available as temporary memory for this convolution. This constraint will be ignored (with a warning) if a conforming plan cannot be found and then the planner will replan for the smallest memory usage possible. Less temporary memory will generally result in a convolution that takes more cycles to complete. However, because always live memory (like code and vertex state) is not tracked by the planner, a convolution using less temporary memory may use more memory overall due to an increase of always live memory.

    Note: We recommend using a value greater than 0.05. Below this value the volume of always live memory quickly increases and can result in out of memory errors.

  • partialsType (half, float) [=float]

    Data type used for intermediate calculations. If the type specified is smaller than the output type then the option is ignored and the output type is used instead.

  • pass (NONE, INFERENCE_FWD, TRAINING_FWD, TRAINING_BWD, TRAINING_WU, FC_INFERENCE_FWD, FC_TRAINING_FWD, FC_TRAINING_BWD, FC_TRAINING_WU) [=NONE]

  • use128BitConvUnitLoad (true, false) [=false]

    If true, convolution weights are loaded 128-bits at a time. Otherwise, they are loaded 64-bits at a time. Not all codelets support 128-bit loads. This option affects memory usage and cycle count.

  • enableMultiStageReduce (true, false) [=true]

    If true, perform the reduction following the convolution in multiple stages if it would significantly reduce code size. This comes at the cost of increasing the number of cycles.

  • enableFastReduce (true, false) [=false]

    If true, use a faster reduction vertex if the data types and widths allow it. This comes at the cost of further constraints on memory allocation

  • enableConvDithering (true, false) [=false]

    If true, then convolutions with different parameters will be laid out from different tiles in an effort to improve tile balance in models.

Parameters
  • graph – The graph that the tensor will be added to.

  • params – The same parameters as used by the convolution().

  • debugContext – Debugging name for the tensor.

  • options – Options controlling the implementation.

  • cache – Optional pointer to planning cache to use.

Returns

The weights tensor suitable for use with convolution().

poplar::Tensor createBiases(poplar::Graph &graph, const poplar::Tensor &activations, const poplar::DebugContext &debugContext = {"biases"})

Create a bias tensor suitable for input to the addBias() function.

The tensor will have the shape [outChans]

Parameters
  • graph – The graph that the tensor will be added to.

  • activations – The activation tensor which is output from the convolution.

  • debugContext – Debugging name for the tensor.

Returns

The tensor of biases.

poplar::Tensor createInput(poplar::Graph &graph, const ConvParams &params, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Create an input tensor for a convolution.

Use this when you need to create an input data tensor for a convolution. The same set of parameters which will be passed to the convolution() should also be passed to createInput().

The returned tensor has the shape [B x inChans x H x W].

Parameters
  • graph – The tensor will be added to this graph.

  • params – Parameters as passed to the target convolution.

  • debugContext – Debugging name for the tensor.

  • options – Options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

Returns

The allocated input tensor.

poplar::Tensor convolution(poplar::Graph &graph, const poplar::Tensor &in, const poplar::Tensor &weights, const ConvParams &params, bool transposeAndFlipWeights, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Convolve an input with a set of weights.

The input tensor is in the form [B x inChans x H x W], and can be allocated using createInput(). The weights tensor is in the form [convGroups x outChansPerConvGroup x inChansPerConvGroup x H x W], and can be allocated using createWeights().

The returned tensor has the shape [B x outChans x H x W]

Padding and striding are specified in the ConvParams structure.

Parameters
  • graph – The graph that the operation will be added to.

  • in – Input data tensor.

  • weights – Weights tensor.

  • params – Parameters for the form of the convolution.

  • transposeAndFlipWeights – For the weight update pass.

  • prog – Poplar program sequence to append the operation onto.

  • debugContext – Optional debug information.

  • options – Options that control the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

Returns

The convolved output tensor.
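
The allocation helpers and convolution() are typically used together. The sketch below builds a forward-pass convolution; the parameter values, option flags and helper function are assumptions for illustration, and device/codelet setup is omitted.

#include <poplin/Convolution.hpp>
#include <poplin/ConvParams.hpp>

poplar::Tensor convForward(poplar::Graph &graph,
                           poplar::program::Sequence &prog,
                           poplin::PlanningCache &cache) {
  // Eight 32x32 half-precision images, 3x3 kernel, 16 -> 32 channels, 1 group.
  poplin::ConvParams params(poplar::HALF, /*batchSize=*/8,
                            /*inputFieldShape=*/{32, 32},
                            /*kernelShape=*/{3, 3},
                            /*inputChannels=*/16, /*outputChannels=*/32,
                            /*numConvGroups=*/1);
  poplar::OptionFlags options{{"pass", "TRAINING_FWD"},
                              {"availableMemoryProportion", "0.6"}};

  // Allocate input and weights with layouts chosen by the planner.
  auto in = poplin::createInput(graph, params, "conv-in", options, &cache);
  auto weights = poplin::createWeights(graph, params, "conv-w", options, &cache);

  // Output has shape [B x outChans x H x W]; transposeAndFlipWeights is false
  // for the forward pass.
  return poplin::convolution(graph, in, weights, params,
                             /*transposeAndFlipWeights=*/false, prog,
                             "conv-fwd", options, &cache);
}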

void preplanConvolutions(const std::set<ConvPlanParams> &convs, PlanningCache &cache)

Deprecated:

Use preplan() instead.

Plan the specified convolutions.

All entries must have matching machine parameters.

Parameters
  • convs – A set of tuples of:

    • conv-specific target for tile / IPU sizing

    • convolution parameters

    • implementation options. See createWeights().

  • cache – The planning cache to update.

void preplanConvolutions(poplar::Graph &graph, const std::set<ConvPlanParams> &convs, PlanningCache &cache)

Deprecated:

Use preplan() instead.

Plan the specified convolutions.

All entries must have matching machine parameters.

Parameters
  • graph – The graph the convolutions will belong to

  • convs – A set of tuples of:

    • conv-specific target for tile / IPU sizing

    • convolution parameters

    • implementation options. See createWeights().

  • cache – The planning cache to update.

void weightsTransposeChansFlipXY(poplar::Graph &graph, const poplar::Tensor &weightsIn, const poplar::Tensor &weightsOut, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Copy the weights in weightsIn into weightsOut such that each element of the kernel is transposed with respect to the input and output channels and flip each spatial dimension of the kernel.

See the transposeAndFlipWeights parameter in convolution().

Parameters
  • graph – The graph that the operation will be added to.

  • weightsIn – The input weights tensor.

  • weightsOut – The output weights tensor.

  • prog – Poplar program sequence to append the operation onto.

  • debugContext – Optional debug information.

  • options – Options controlling the implementation. See createWeights().

poplar::Tensor calculateWeightDeltas(poplar::Graph &graph, const poplar::Tensor &zDeltas, const poplar::Tensor &activations, const ConvParams &params, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Append an operation to a poplar::Program to generate the tensor of weight deltas.

Parameters
  • graph – The tensor will be added to this graph.

  • zDeltas – Tensor containing the gradients with respect to the output of the convolution.

  • activations – Tensor containing the inputs to the convolution in the forward pass.

  • params – Parameters of the convolution.

  • prog – Poplar program sequence to append the operation onto.

  • debugContext – Optional debug information.

  • options – Options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

Returns

The weight deltas are the gradients with respect to the weights of the convolution. These are populated when the operation runs.

void convolutionWeightUpdate(poplar::Graph &graph, const poplar::Tensor &zDeltas, const poplar::Tensor &weights, const poplar::Tensor &activations, ConvParams params, const poplar::Tensor &scale, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Append operations to a poplar::Program to generate and apply the weight update.

Parameters
  • graph – The graph that the operation will be added to.

  • zDeltas – Tensor containing the gradients with respect to the output of the convolution.

  • weights – Weights tensor.

  • activations – Tensor containing the inputs to the convolution in the forward pass.

  • params – Parameters of the convolution.

  • scale – Scale to apply to the zDeltas.

  • prog – Poplar program sequence to append the operations onto.

  • debugContext – Optional debug information.

  • options – Options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

void convolutionWeightUpdate(poplar::Graph &graph, const poplar::Tensor &zDeltas, const poplar::Tensor &weights, const poplar::Tensor &activations, ConvParams params, float scale, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Append operations to a poplar::Program to generate and apply the weight update.

Parameters
  • graph – The graph that the operation will be added to.

  • zDeltas – Tensor containing the gradients with respect to the output of the convolution.

  • weights – Weights tensor.

  • activations – Tensor containing the inputs to the convolution in the forward pass.

  • params – Parameters of the convolution.

  • scale – Scale to apply to the zDeltas.

  • prog – Poplar program sequence to append the operations onto.

  • debugContext – Optional debug information.

  • options – Options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

void convolutionBiasUpdate(poplar::Graph &graph, const poplar::Tensor &zDeltas, const poplar::Tensor &biases, const poplar::Tensor &scale, const poplar::OptionFlags &options, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Add a program to update biases tensor with the gradients derived from the zDeltas tensor.

Parameters
  • graph – The graph that the operation will be added to.

  • zDeltas – Tensor containing the gradients with respect to the output of the convolution.

  • biases – Biases tensor to update.

  • scale – Scale to apply to the zDeltas tensor.

  • options – Options controlling the implementation. See createWeights().

  • prog – Poplar program sequence to append the operation onto.

  • debugContext – Optional debug information.

void convolutionBiasUpdate(poplar::Graph &graph, const poplar::Tensor &zDeltas, const poplar::Tensor &biases, float scale, const poplar::OptionFlags &options, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Add a program to update biases tensor with the gradients derived from the zDeltas tensor.

Parameters
  • graph – The graph that the operation will be added to.

  • zDeltas – Tensor containing the gradients with respect to the output of the convolution.

  • biases – Biases tensor to update.

  • scale – Scale to apply to the zDeltas tensor.

  • options – Options controlling the implementation. See createWeights().

  • prog – Poplar program sequence to append the operation onto.

  • debugContext – Optional debug information.

void addBias(poplar::Graph &graph, const poplar::Tensor &in, const poplar::Tensor &biases, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Adds a program to prog which adds the biases to the activations tensor.

Parameters
  • graph – The graph that the operation will be added to.

  • in – Tensor containing the values to which the biases are added.

  • biases – Biases to add to the input tensor.

  • prog – Poplar program sequence to append the operation onto.

  • debugContext – Optional debug information.

void reportPlanInfo(std::ostream &out, const poplar::Graph &graph, const ConvParams &params, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Report the convolution plan corresponding to the params and options provided.

Parameters
  • out – Output stream to report the plan to.

  • graph – The graph that the convolution is planned with.

  • params – The same parameters as used by the convolution().

  • options – Options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

PlanCosts reportPlanEstimatedCosts(const poplar::Graph &graph, const ConvParams &params, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Report the estimated cycles and memory costs of the convolution plan corresponding to the params and options provided.

Parameters
  • graph – The graph that the convolution is planned with.

  • params – The same parameters as used by the convolution().

  • options – Options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

Returns

Cycles and memory cost estimates for the planned convolution.

void reportWeightUpdatePlanInfo(std::ostream &out, const poplar::Graph &graph, const ConvParams &fwdParams, const poplar::OptionFlags &fwdOptions = {}, PlanningCache *cache = nullptr)

Report the convolution plan corresponding to the weight update pass given the forward pass params and options.

Parameters
  • out – ostream to report the plan to.

  • graph – The graph that the convolution is planned with.

  • fwdParams – Forward pass parameters as used by the convolution().

  • fwdOptions – Forward pass options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

poplar::Tensor fullyConnectedWeightTranspose(poplar::Graph &graph, poplar::Tensor weights, const ConvParams &params, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Arranges the weights (activations) such that they are suited for the backward pass in a fully connected layer.

Parameters
  • graph – The graph that the operation will be added to.

  • weights – Tensor containing the weights to be rearranged.

  • params – Parameters of the convolution.

  • prog – Poplar program sequence to append the operation onto.

  • debugContext – Optional debug information.

  • options – Options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

Returns

A tensor with the weights suitably arranged.

struct PlanCosts
#include <Convolution.hpp>

Structure for estimated costs returned by reportPlanEstimatedCosts()

Public Members

std::size_t cycles
std::size_t memory
class PlanningCache

Public Functions

PlanningCache()
~PlanningCache()
std::size_t size() const

Returns the number of entries currently stored in the cache.

Public Members

std::unique_ptr<PlanningCacheImpl> impl

poplin/FullyConnected.hpp

Functions and data types for performing operations on fully-connected layers.

namespace poplin

Linear algebra functions.

namespace fc

Functions

std::vector<std::pair<MatMulParams, poplar::OptionFlags>> getMatMulPrePlanParameters(FullyConnectedParams parameters, poplar::OptionFlags matmulOptions, poplar::Type type, bool inferenceOnly)

Predict what matrix multiplications will be needed for the given parameters and return a list of corresponding matmul() parameters and options.

Parameters
  • parameters – Parameters for the fully-connected layer.

  • matmulOptions – Option flags are the same as those from matmul(). They are passed through to the underlying matmul, updating the fullyConnectedPass option only.

  • type – Input and output datatype.

  • inferenceOnly – Whether the fully-connected layer is for inference only. If true, we can ignore backwards and weight-update passes.

Returns

Vector of pairs of [MatMulParams, OptionFlags] representing the complete set of matmul parameters for planning.

struct FullyConnectedParams
#include <FullyConnected.hpp>

Parameters to describe a fully-connected layer.

Public Members

std::size_t numGroups

The number of groups (where each group represents a fully-connected layer of the same shape) that operate in one layer.

Each group is totally independent of the others and so numGroups is a common dimension among the inputs, weights, and outputs for the layer.

std::size_t batchSize

Number of samples in the input to the layer.

std::size_t inputSize

Size of the input in each batch into the layer.

std::size_t outputSize

Size of the output in each batch from the layer.

poplin/MatMul.hpp

Functions and data types for performing matrix multiplies on the IPU.

namespace poplin

Linear algebra functions.

Typedefs

using MatMulPlanParams = std::tuple<const poplar::Target*, const MatMulParams, const poplar::OptionFlags*>

A tuple containing the required parameters to preplan a matmul:

  • matmul-specific target for tile / IPU sizing

  • matmul parameters

  • implementation options (see matMul() above)

All entries must have matching machine parameters.

using MatMulToConvOptions = std::unordered_map<const poplar::OptionFlags*, poplar::OptionFlags>

Mapping of pointers to matrix multiplication option flags to the corresponding convolution option flags.

Functions

poplar::Tensor matMul(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::Type &outputType, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, matmul::PlanningCache *cache = nullptr)

Multiply two matrices.

Calculates C = A * B where A and B are matrices.

Matrix multiply options

  • availableMemoryProportion Decimal between 0 and 1 (inclusive) [=0.6]

    See createWeights().

  • fullyConnectedPass (NONE, INFERENCE_FWD, TRAINING_FWD, TRAINING_BWD, TRAINING_WU) [=NONE]

    Optimize the plan for the specified type of pass. Note the abbreviations: FWD (forward), BWD (backward), WU (weight-update).

  • inputRHSIsPreArranged (true, false) [=false]

    Indicates to matMul functions whether the input data has already been re-arranged (using preArrangeMatMulInputRHS()). This allows data to be re-arranged once then used many times.

  • use128BitConvUnitLoad (true, false) [=false]

    If true, weights are loaded into the convolution unit 128-bits at a time. Otherwise, they are loaded 64-bits at a time. Not all codelets support 128-bit loads. This option affects memory usage and cycle count.

  • enableMultiStageReduce (true, false) [=true]

    If true, perform the reduction following the matrix multiplication in multiple stages if it would significantly reduce code size. This comes at the cost of increasing the number of cycles.

  • enableFastReduce (true, false) [=false]

    If true, use a faster reduction vertex if the data types and widths allow it. This comes at the cost of further constraints on memory allocation

  • remapOutputTensor (true, false) [=true]

    If true, the output of the convolution is remapped if the output is detected to have a poor layout.

  • partialsType (half, float) [=float]

    See createWeights().

Parameters
  • graph – The Poplar graph.

  • A – The left argument to the multiplication. This 2D tensor must be already mapped to tiles.

  • B – The right argument to the multiplication. This 2D tensor must be already mapped to tiles.

  • prog – A reference to a program sequence which will be appended with the code to perform the multiplication.

  • outputType – Optional via overloaded function. Element type of returned tensor. The default is A.elementType() if omitted.

  • debugContext – Optional debug information.

  • options – The structure describing options on how the multiplication should be implemented.

  • cache – Optional pointer to a planning cache to use.

Returns

The tensor holding the result of the multiplication. This tensor will be created, added to the graph and mapped to tiles.

poplar::Tensor matMul(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, matmul::PlanningCache *cache = nullptr)

Matrix multiply where output type is the same as input A.
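
A minimal sketch of this simpler overload follows, multiplying a [64 x 128] matrix by a [128 x 32] matrix. The shapes, option values and helper function are assumptions; mapTensorLinearly() is used only to keep the example self-contained, whereas in practice the operands are usually allocated with the matmul-specific creation functions so they get a layout suited to the plan.

#include <poplin/MatMul.hpp>
#include <poputil/TileMapping.hpp>

poplar::Tensor multiply(poplar::Graph &graph, poplar::program::Sequence &prog,
                        poplin::matmul::PlanningCache &cache) {
  auto A = graph.addVariable(poplar::FLOAT, {64, 128}, "A");
  auto B = graph.addVariable(poplar::FLOAT, {128, 32}, "B");
  poputil::mapTensorLinearly(graph, A);
  poputil::mapTensorLinearly(graph, B);

  poplar::OptionFlags options{{"fullyConnectedPass", "INFERENCE_FWD"}};
  // Output element type defaults to A.elementType() for this overload.
  return poplin::matMul(graph, A, B, prog, "matmul", options, &cache);
}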

void matMulReportPlan(std::ostream &out, const poplar::Graph &graph, const poplar::Type &inputType, const poplar::Type &outputType, const std::vector<std::size_t> &aShape, const std::vector<std::size_t> &bShape, const poplar::OptionFlags &options = {}, matmul::PlanningCache *cache = nullptr)

Report the convolution plan corresponding to the parameters and options provided.

Parameters
  • out – Stream to write report to.

  • graph – The Poplar graph.

  • inputType – Element type of input tensors.

  • outputType – Element type of output tensor.

  • aShape – Shape of input tensor A.

  • bShape – Shape of input tensor B.

  • options – The structure describing options on how the multiplication should be implemented.

  • cache – Optional pointer to a planning cache to use.

poplar::Tensor matMulGrouped(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::Type &outputType, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, matmul::PlanningCache *cache = nullptr)

Multiply two grouped matrices.

Calculates C[g] = A[g] * B[g] where A[g] and B[g] are matrices for each element in the group, and g is an element of the set {0, 1, …, G-1}.

The multiplication is done for every element in the group. The first dimension of the matrices is the group dimension with value equal to G.

Parameters
  • graph – The Poplar graph.

  • A – The left argument to the grouped multiplication. This 3D tensor must be already mapped to tiles.

  • B – The right argument to the grouped multiplication. This 3D tensor must be already mapped to tiles.

  • prog – A reference to a program sequence which will be appended with the code to perform the multiplication.

  • outputType – Data type to be used for the returned tensor.

  • debugContext – Optional debug information.

  • options – The structure describing options on how the grouped multiplication should be implemented. See matMul().

  • cache – Optional pointer to a planning cache to use.

Returns

The tensor holding the result of the grouped multiplication. This tensor will be created, added to the graph and mapped to tiles.

void matMulGroupedReportPlan(std::ostream &out, const poplar::Graph &graph, const poplar::Type &inputType, const poplar::Type &outputType, const std::vector<std::size_t> &aShape, const std::vector<std::size_t> &bShape, const poplar::OptionFlags &options = {}, matmul::PlanningCache *cache = nullptr)

Report the convolution plan corresponding to the params and options provided.

Parameters
  • out – Stream to write report to.

  • graph – The Poplar graph.

  • inputType – Element type of input tensors.

  • outputType – Element type of output tensor.

  • aShape – Shape of input tensor A.

  • bShape – Shape of input tensor B.

  • options – The structure describing options on how the multiplication should be implemented.

  • cache – Optional pointer to a planning cache to use.

void matMulAcc(poplar::Graph &graph, const poplar::Tensor &C, float k, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, matmul::PlanningCache *cache = nullptr)

Multiply two matrices and add to a third (with a scaling factor).

Calculates C += k * A * B where A, B are matrices and k is a constant scalar.

Parameters
  • graph – The Poplar graph.

  • C – The matrix to add to. This 2D tensor must be already mapped to tiles.

  • k – The constant, or a single-element tensor, by which the result of the multiplication is scaled. If k is a tensor, it must be of the same type as A.

  • A – The left argument to the multiplication. This 2D tensor must be already mapped to tiles.

  • B – The right argument to the multiplication. This 2D tensor must be already mapped to tiles.

  • prog – A reference to a program sequence which will be appended with the code to perform the multiplication and add.

  • debugContext – Optional debug information.

  • options – The structure describing options on how the multiplication should be implemented. See matMul().

  • cache – Optional pointer to a planning cache to use.

Matrix multiply and accumulate with a scalar scaling factor.
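
For illustration only, a sketch of the scalar-scaled accumulate; the operand tensors are assumed to be created and tile-mapped elsewhere (for example with the createMatMulInput functions below), and the helper name is hypothetical.

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <poplin/MatMul.hpp>

// Appends C += 0.5f * A * B to the program; C is [M, N], A is [M, K], B is [K, N].
void matMulAccSketch(poplar::Graph &graph, poplar::program::Sequence &prog,
                     const poplar::Tensor &C, const poplar::Tensor &A,
                     const poplar::Tensor &B) {
  poplin::matMulAcc(graph, C, 0.5f, A, B, prog, "accumulate");
}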

void matMulAcc(poplar::Graph &graph, const poplar::Tensor &C, const poplar::Tensor &k, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, matmul::PlanningCache *cache = nullptr)

Matrix multiply and accumulate with a single-element scaling factor.

void matMulGroupedAcc(poplar::Graph &graph, const poplar::Tensor &C, float k, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, matmul::PlanningCache *cache = nullptr)

Grouped matrix multiply and accumulate.

Multiply two grouped matrices and add to a third (with a scaling factor).

Calculates C[g] += k * A[g] * B[g] where A[g] and B[g] are matrices and k is a constant scalar, with g an element of the set {0, 1, …, G-1}.

The multiplication is done for every element in the group. The first dimension of the matrices is the group dimension with value equal to G.

Parameters
  • graph – The Poplar graph.

  • C – The matrix to add to. This 3D tensor must be already mapped to tiles.

  • k – The constant, or a single-element tensor, by which the result of the multiplication is scaled. If k is a tensor, it must be of the same type as A.

  • A – The left argument to the grouped multiplication. This 3D tensor must be already mapped to tiles.

  • B – The right argument to the grouped multiplication. This 3D tensor must be already mapped to tiles.

  • prog – A reference to a program sequence which will be appended with the code to perform the grouped multiplication and add.

  • debugContext – Optional debug information.

  • options – The structure describing options on how the multiplication should be implemented. See matMul().

  • cache – Optional pointer to a planning cache to use.

Grouped matrix multiply and accumulate with a scalar scaling factor.

void matMulGroupedAcc(poplar::Graph &graph, const poplar::Tensor &C, const poplar::Tensor &k, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, matmul::PlanningCache *cache = nullptr)

Grouped matrix multiply and accumulate with a single-element scaling factor.

poplar::Tensor createMatMulInputLHS(poplar::Graph &graph, const poplar::Type &inputType, const poplar::Type &outputType, const std::vector<std::size_t> &aShape, const std::vector<std::size_t> &bShape, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, matmul::PlanningCache *cache = nullptr)

Create a tensor that is used as the left operand of matrix multiplication.

The types of the input and output tensors are specified separately. This will create a 2D tensor in the graph. The ordering and tile mapping of the tensor will be set to make a matrix multiplication with this tensor as the left argument efficient.

Parameters
  • graph – The Poplar graph.

  • inputType – The input data type.

  • outputType – The data type of the returned tensor.

  • aShape – The shape of the required matrix.

  • bShape – The shape of the matrix that the required matrix will be multiplied by.

  • debugContext – Debug information.

  • options – The implementation options of the multiplication. See matMul().

  • cache – Optional pointer to a planning cache to use.

Returns

A matrix of the specified type and shape aShape. The tensor will have been mapped to tiles.

poplar::Tensor createMatMulInputLHS(poplar::Graph &graph, const poplar::Type &dataType, const std::vector<std::size_t> &aShape, const std::vector<std::size_t> &bShape, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, matmul::PlanningCache *cache = nullptr)

Create a tensor that is used as the left operand of matrix multiplication.

The type of both input and output tensors is specified by dataType. This will create a 2D tensor in the graph. The ordering and tile mapping of the tensor will be set to make a matrix multiplication with this tensor as the left argument efficient.

Parameters
  • graph – The Poplar graph.

  • dataType – The data type of both the input and output tensors.

  • aShape – The shape of the required matrix.

  • bShape – The shape of the matrix that the required matrix will be multiplied by.

  • debugContext – Debug information.

  • options – The implementation options of the multiplication. See matMul().

  • cache – Optional pointer to a planning cache to use.

Returns

A matrix of type dataType and shape aShape. The tensor will have been mapped to tiles.

poplar::Tensor createMatMulGroupedInputLHS(poplar::Graph &graph, const poplar::Type &inputType, const poplar::Type &outputType, const std::vector<std::size_t> &aShape, const std::vector<std::size_t> &bShape, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, matmul::PlanningCache *cache = nullptr)

Create a tensor that is used as the left operand of a grouped matrix multiplication.

This will create a 3D tensor in the graph. The ordering and tile mapping of the tensor will be set to make a grouped matrix multiplication with this tensor as the left argument efficient.

The first dimension of the required matrix and the matrix it multiplies by must be the number of groups.

Parameters
  • graph – The Poplar graph.

  • inputType – The input data type.

  • outputType – The data type of the returned tensor.

  • aShape – The grouped shape [g, r, c] of the required matrix.

  • bShape – The grouped shape [g, r, c] of the matrix that the required matrix will be multiplied by.

  • debugContext – Debug information.

  • options – The implementation options of the multiplication. See matMul().

  • cache – Optional pointer to a planning cache to use.

Returns

A matrix of the specified type and grouped shape aShape. The tensor will have been mapped to tiles.

poplar::Tensor createMatMulInputRHS(poplar::Graph &graph, const poplar::Type &inputType, const poplar::Type &outputType, const std::vector<std::size_t> &aShape, const std::vector<std::size_t> &bShape, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, matmul::PlanningCache *cache = nullptr)

Create a tensor that is used as the right operand of matrix multiplication.

This will create a 2D tensor in the graph. The ordering and tile mapping of the tensor will be set to make a matrix multiplication with this tensor as the right argument efficient.

Parameters
  • graph – The Poplar graph.

  • inputType – The input data type.

  • outputType – The data type of the returned tensor.

  • aShape – The shape of the matrix that the required matrix will be multiplied by.

  • bShape – The shape of the required matrix.

  • debugContext – Debug information.

  • options – The implementation options of the multiplication. See matMul().

  • cache – Optional pointer to a planning cache to use.

Returns

A matrix of the specified type and shape bShape. The tensor will have been mapped to tiles.

poplar::Tensor createMatMulInputRHS(poplar::Graph &graph, const poplar::Type &dataType, const std::vector<std::size_t> &aShape, const std::vector<std::size_t> &bShape, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, matmul::PlanningCache *cache = nullptr)

Overloaded function for when inputType == outputType (represented by the dataType parameter).
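
A short sketch, assuming poplin codelets are added to the graph elsewhere, that uses the two allocation functions above to lay out both operands and then multiplies them; the shapes and debug names are illustrative.

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <poplin/MatMul.hpp>
#include <vector>

poplar::Tensor simpleMatMulSketch(poplar::Graph &graph,
                                  poplar::program::Sequence &prog) {
  const std::vector<std::size_t> aShape = {64, 128}; // A is [M, K]
  const std::vector<std::size_t> bShape = {128, 32}; // B is [K, N]
  poplar::Tensor A =
      poplin::createMatMulInputLHS(graph, poplar::FLOAT, aShape, bShape, "A");
  poplar::Tensor B =
      poplin::createMatMulInputRHS(graph, poplar::FLOAT, aShape, bShape, "B");
  // The result is a [64, 32] tensor with the same element type as A.
  return poplin::matMul(graph, A, B, prog, "C");
}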

poplar::Tensor createMatMulGroupedInputRHS(poplar::Graph &graph, const poplar::Type &inputType, const poplar::Type &outputType, const std::vector<std::size_t> &aShape, const std::vector<std::size_t> &bShape, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, matmul::PlanningCache *cache = nullptr)

Create a tensor that is used as the right operand of grouped matrix multiplication.

This will create a 3D tensor in the graph. The ordering and tile mapping of the tensor will be set to make a grouped matrix multiplication with this tensor as the right argument efficient.

The first dimension of the required matrix and the matrix it multiplies by must be the number of groups.

Parameters
  • graph – The Poplar graph.

  • inputType – The input data type.

  • outputType – The data type of the returned tensor.

  • aShape – The grouped shape [g, r, c] of the matrix that the required matrix will be multiplied by.

  • bShape – The grouped shape [g, r, c] of the required matrix.

  • debugContext – Debug information.

  • options – The implementation options of the multiplication. See matMul().

  • cache – Optional pointer to planning cache to use.

Returns

A matrix of the specified type and grouped shape bShape. The tensor will have been mapped to tiles.

poplar::Tensor preArrangeMatMulInputRHS(poplar::Graph &graph, const std::vector<std::size_t> &aShape, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::Type &outputType, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, matmul::PlanningCache *cache = nullptr)

Pre-arrange right-hand side input.

Re-arrange memory for RHS operand to an upcoming matmul operation. This allows the rearrangement of the memory of a tensor that would otherwise be rearranged as part of the matmul operation for efficiency.

Use this function and the matMul*() functions with the inputRHSIsPreArranged option flag to do any re-arrangement necessary once and then re-use that input multiple times.

Only valid for fully connected layers.

Parameters
  • graph – The Poplar graph.

  • aShape – The shape of the left argument to the multiplication.

  • B – The right argument to the multiplication. This 2D tensor must be already mapped to tiles.

  • prog – A reference to a program sequence which will be appended with the code to perform the arrangement.

  • outputType – Optional via overloaded function. Element type of returned tensor. The default is B.elementType() if omitted.

  • debugContext – Optional debug information.

  • options – Flags describing options for how the multiplication should be implemented. See matMul().

  • cache – Optional pointer to planning cache to use.

Returns

New tensor holding the rearranged input. This tensor has the same shape as the given tensor.

Pre-arrange input with explicitly defined output type.
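
A hedged sketch of the reuse pattern described above: the RHS is rearranged once and then used by several matmuls that carry the inputRHSIsPreArranged flag. The operand tensors are assumed to be created and mapped elsewhere, and the helper name and debug names are illustrative.

#include <poplar/Graph.hpp>
#include <poplar/OptionFlags.hpp>
#include <poplar/Program.hpp>
#include <poplin/MatMul.hpp>
#include <vector>

std::vector<poplar::Tensor>
reuseRhsSketch(poplar::Graph &graph, poplar::program::Sequence &prog,
               const poplar::Tensor &A0, const poplar::Tensor &A1,
               const poplar::Tensor &B) {
  const poplar::OptionFlags opts{{"inputRHSIsPreArranged", "true"}};
  // One-off rearrangement of B for matmuls whose LHS has A0's shape.
  poplar::Tensor Bpre = poplin::preArrangeMatMulInputRHS(
      graph, A0.shape(), B, prog, "preArrangeB", opts);
  poplar::Tensor C0 = poplin::matMul(graph, A0, Bpre, prog, "mm0", opts);
  poplar::Tensor C1 = poplin::matMul(graph, A1, Bpre, prog, "mm1", opts);
  return {C0, C1};
}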

poplar::Tensor preArrangeMatMulInputRHS(poplar::Graph &graph, const std::vector<std::size_t> &aShape, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, matmul::PlanningCache *cache = nullptr)

Pre-arrange input where the output type is the same as B.

poplar::Tensor preArrangeMatMulGroupedInputRHS(poplar::Graph &graph, const std::vector<std::size_t> &aShape, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::Type &outputType, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, matmul::PlanningCache *cache = nullptr)

Pre-arrange grouped input with explicitly defined output type.

poplar::Tensor transposeGroupedMatrix(const poplar::Tensor &A)

Transposes a grouped matrix tensor.

Parameters

A – The tensor to transpose.

Returns

The transposed tensor.

std::set<ConvPlanParams> matMulGetConvPlanParams(const std::set<MatMulPlanParams> &matmuls, MatMulToConvOptions &matmulToConvOpts)

Obtain the set of convolution parameters corresponding to the user supplied set of parameters for matrix multiplication.

Parameters
  • matmuls – Set of matrix multiplication parameter tuples.

  • matmulToConvOpts – Convolution options corresponding to each set of matrix multiplication options.

Returns

Set of convolution parameters.

void preplanMatMuls(const std::set<MatMulPlanParams> &matmuls, matmul::PlanningCache &cache)

Deprecated:

Use preplan() instead.

Plan the specified matrix multiplications.

Parameters
  • matmuls – A set of parameters for the matrix multiplications to preplan.

  • cache – The planning cache to update.

struct MatMulParams
#include <MatMul.hpp>

Parameters to define a Matrix multiplication.

C = A * B

Public Members

poplar::Type inputType

Input type (of A & B)

poplar::Type outputType

Output type (of C)

std::vector<std::size_t> aShape

Shape of the lhs input matrix (A)

std::vector<std::size_t> bShape

Shape of the rhs input matrix (B)

Friends

friend bool operator<(const MatMulParams &a, const MatMulParams &b)
namespace matmul
class PlanningCache
#include <MatMul.hpp>

Deprecated:

Use poplin::PlanningCache instead.

Public Functions

PlanningCache()
~PlanningCache()
std::size_t size() const

Returns the number of entries currently stored in the cache.

poplin::PlanningCache &getImpl()

Private Members

poplin::PlanningCache impl

poplin/MeshGrid.hpp

Functions to populate arrays with linear sequences of values.

namespace poplin

Linear algebra functions.

Functions

poplar::Tensor linspace(poplar::Graph &graph, const poplar::Type &type, float left, float right, size_t count, const poplar::DebugContext &debugContext = {})

Create a constant variable that contains values equally spaced in the specified closed range [left, right].

Parameters
  • graph – Graph to which the variable is added.

  • left – The first value in the range.

  • right – The last value in the range.

  • type – Data type of variable to create. Must be FLOAT or HALF.

  • debugContext – Optional debug information.

Returns

Constant Tensor of rank 1 (vector) containing the linspace values.

std::vector<poplar::Tensor> meshgrid2d(poplar::Graph &graph, poplar::Tensor x, poplar::Tensor y)

Create a coordinate grid for each axis by broadcasting the input tensors.

This 2D specialisation only supports two inputs that must be of rank 1 (vectors) and hence the output coordinate grids are always two matrices (so two outputs of rank 2).

Parameters
  • graph – Graph to which the variables are added.

  • x – Co-ordinates for the x-axis.

  • y – Co-ordinates for the y-axis.

Returns

A list of (two) tensors that form co-ordinate grids for each input axis. These output tensors will be views of the inputs (reshaped and broadcast).
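
As an illustrative sketch, the two functions above combine as follows (the helper name and sizes are made up):

#include <poplar/Graph.hpp>
#include <poplin/MeshGrid.hpp>
#include <vector>

std::vector<poplar::Tensor> coordinateGridSketch(poplar::Graph &graph) {
  // 11 values from 0.0 to 1.0 along x, 5 values from 0.0 to 1.0 along y.
  poplar::Tensor x = poplin::linspace(graph, poplar::FLOAT, 0.0f, 1.0f, 11, "x");
  poplar::Tensor y = poplin::linspace(graph, poplar::FLOAT, 0.0f, 1.0f, 5, "y");
  // Two rank-2 outputs: views of x and y broadcast against each other.
  return poplin::meshgrid2d(graph, x, y);
}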

poplin/MultiConvolution.hpp

Support performing convolutions in parallel.

namespace poplin

Linear algebra functions.

namespace multiconv

Functions

poplar::Tensor createWeights(poplar::Graph &graph, const std::vector<CreateTensorArgs> &args, unsigned weightsIndex, const poplar::OptionFlags &options = {}, poplin::PlanningCache *cache = nullptr)

Create a specific weights tensor for the multiconvolution.

Parameters
  • graph – The graph that the tensors will be added to.

  • args – The same set of parameters as used by convolution().

  • weightsIndex – Index into args identifying the convolution for which to create the weights.

  • options – Options controlling the implementation.

  • cache – Optional pointer to a planning cache to use.

Returns

A weights tensor suitable for use with convolution().

poplar::Tensor createInput(poplar::Graph &graph, const std::vector<CreateTensorArgs> &args, unsigned inputIndex, const poplar::OptionFlags &options = {}, poplin::PlanningCache *cache = nullptr)

Create a specific input tensor for the multiconvolution.

Parameters
  • graph – The graph that the tensors will be added to.

  • args – The same set of parameters as used by convolution().

  • inputIndex – Index into args identifying the convolution for which to create the input.

  • options – Options controlling the implementation.

  • cache – Optional pointer to a planning cache to use.

Returns

A tensor suitable for use as an input to convolution().

void weightsTransposeChansFlipXY(poplar::Graph &graph, std::vector<ConvolutionArgs> &args, const std::vector<poplar::Tensor> &weightsIn, poplar::program::Sequence &prog, const poplar::OptionFlags &options, const poplar::DebugContext &debugContext, poplin::PlanningCache *cache)

For each element in the multi-convolution set, copy the corresponding weightsIn element into the convolution weight input such that each element of the kernel is transposed with respect to the input and output channels and each spatial dimension of the kernel is flipped.

See Convolution.hpp for more information.

Parameters
  • graph – The graph that the operations will be added to.

  • args – Collection of inputs, weights, and convolution parameters specifying each convolution in the multiconvolution.

  • weightsIn – Collection of weights tensors to copy from; their arrangement must correspond to the arrangement of the collection of convolution parameters.

  • prog – Poplar program sequence to append the operations onto.

  • options – Options controlling the implementation.

  • debugContext – Optional debug information.

  • cache – Optional pointer to a planning cache to use.

std::vector<poplar::Tensor> convolution(poplar::Graph &graph, const std::vector<ConvolutionArgs> &args, bool transposeAndFlipWeights, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, poplin::PlanningCache *cache = nullptr)

Convolve a set of inputs with a set of weights.

See Convolution.hpp for more information.

Parameters
  • graph – The graph that the operations will be added to.

  • args – Collection of inputs, weights, and convolution parameters specifying each convolution in the multiconvolution.

  • transposeAndFlipWeights – Prepare the weights for the backwards pass.

  • prog – Poplar program sequence to append the operations onto.

  • debugContext – Optional debug information.

  • options – Options controlling the implementation.

  • cache – Optional pointer to a planning cache to use.

Returns

Set of convolved output tensors.
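
A sketch, under the assumption that the poplin::ConvParams objects have been built elsewhere (see Convolution.hpp), of allocating per-convolution tensors through the same API and running them as a single multi-convolution. The helper name runMultiConvSketch and the debug names are hypothetical.

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <poplin/MultiConvolution.hpp>
#include <string>
#include <vector>

std::vector<poplar::Tensor>
runMultiConvSketch(poplar::Graph &graph, poplar::program::Sequence &prog,
                   const std::vector<poplin::ConvParams> &convParams) {
  // Describe the tensors to allocate for each convolution in the set.
  std::vector<poplin::multiconv::CreateTensorArgs> createArgs;
  for (unsigned i = 0; i < convParams.size(); ++i)
    createArgs.push_back({convParams[i], {}, "conv" + std::to_string(i)});

  // Allocate inputs and weights through the multi-convolution API so that they
  // are mapped appropriately, then assemble the combined argument list.
  std::vector<poplin::multiconv::ConvolutionArgs> convArgs;
  for (unsigned i = 0; i < convParams.size(); ++i) {
    poplar::Tensor in = poplin::multiconv::createInput(graph, createArgs, i);
    poplar::Tensor weights = poplin::multiconv::createWeights(graph, createArgs, i);
    convArgs.push_back({in, weights, convParams[i], {}});
  }
  // Forward pass: transposeAndFlipWeights is false.
  return poplin::multiconv::convolution(graph, convArgs, false, prog, "multiConv");
}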

std::vector<poplar::Tensor> calculateWeightDeltas(poplar::Graph &graph, const std::vector<CalculateWeightDeltasArgs> &args, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, poplin::PlanningCache *cache = nullptr)

Append an operation to generate the set of weight delta tensors.

See Convolution.hpp for more information.

Parameters
  • graph – The graph that the operations will be added to.

  • args – Collection of zDeltas, activations, and convolution parameters specifying each convolution in the multiconvolution.

  • prog – Poplar program sequence to append the operations onto.

  • debugContext – Optional debug information.

  • options – Options controlling the implementation.

  • cache – Optional pointer to a planning cache to use.

Returns

Set of weight deltas.

void convolutionWeightUpdate(poplar::Graph &graph, const std::vector<ConvWeightUpdateArgs> &args, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, poplin::PlanningCache *cache = nullptr)

Append operations to prog to generate and apply the weight update.

See Convolution.hpp for more information.

Parameters
  • graph – The graph that the operations will be added to.

  • args – Collection of zDeltas, activations, scale, and convolution parameters for the weight updates in the multiconvolution.

  • prog – Poplar program sequence to append the operations onto.

  • debugContext – Optional debug information.

  • options – Options controlling the implementation.

  • cache – Optional pointer to a planning cache to use.

void convolutionWeightUpdate(poplar::Graph &graph, const std::vector<ConvWeightUpdateArgsScalar> &args, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, poplin::PlanningCache *cache = nullptr)

Append operations to prog to generate and apply the weight update.

See Convolution.hpp for more information.

Parameters
  • graph – The graph that the operations will be added to.

  • args – Collection of zDeltas, activations, scale, and convolution parameters for the weight updates in the multiconvolution.

  • prog – Poplar program sequence to append the operations onto.

  • debugContext – Optional debug information.

  • options – Options controlling the implementation.

  • cache – Optional pointer to a planning cache to use.

struct CalculateWeightDeltasArgs
#include <MultiConvolution.hpp>
Param zDeltas

Tensor containing gradients with respect to the output of the convolution.

Param activations

Tensor containing the inputs of the convolution in the forward pass.

Param params

Parameters specifying the convolution.

Param options

Options controlling the implementation.

Public Members

poplar::Tensor zDeltas
poplar::Tensor activations
ConvParams params
poplar::OptionFlags options
struct ConvolutionArgs
#include <MultiConvolution.hpp>
Param in

Input tensor.

Param weights

Weights tensor.

Param params

Parameters specifying the convolution.

Param options

Options controlling the implementation.

Public Members

poplar::Tensor inputs
poplar::Tensor weights
ConvParams params
poplar::OptionFlags options
struct ConvWeightUpdateArgs
#include <MultiConvolution.hpp>
Param zDeltas

Tensor containing gradients with respect to the output of the convolution.

Param weights

Weights tensor.

Param activations

Tensor containing the inputs of the convolution in the forward pass.

Param scale

Scale to apply to the zDeltas.

Param params

Parameters specifying the convolution.

Param options

Options controlling the implementation.

Public Members

poplar::Tensor zDeltas
poplar::Tensor weights
poplar::Tensor activations
poplar::Tensor scale
ConvParams params
poplar::OptionFlags options
struct ConvWeightUpdateArgsScalar
#include <MultiConvolution.hpp>
Param zDeltas

Tensor containing gradients with respect to the output of the convolution.

Param weights

Weights tensor.

Param activations

Tensor containing the inputs of the convolution in the forward pass.

Param scale

Scale to apply to the zDeltas.

Param params

Parameters specifying the convolution.

Param options

Options controlling the implementation.

Public Members

poplar::Tensor zDeltas
poplar::Tensor weights
poplar::Tensor activations
float scale
ConvParams params
poplar::OptionFlags options
struct CreateTensorArgs
#include <MultiConvolution.hpp>

Multi-convolutions allow for a set of convolutions to be executed in parallel.

The benefit of executing convolutions in parallel is an increase in data throughput. Specifically, executing N independent convolutions in parallel will be faster than executing them sequentially because less time is spent on the approximately constant vertex overhead per tile.

Note that the allocation of associated tensors for convolutions should be done through the same API so that they are mapped across tiles appropriately for the operation.

See Convolution.hpp for information about convolutions and each individual operation.

Multi-Convolution options

  • planType (serial, parallel) [=parallel]

    Which multi-conv implementation to use. Serial is the same as using the normal API for each convolution.

  • perConvReservedTiles Integer [=50]

    The amount of tiles to reserve for each convolution when planning.

  • cycleBackOff Double [=0.1]

    A proportion, between 0 and 1, specifying how far from the fastest plan to back off when attempting to plan the largest convolution using the fewest tiles.

    This number is scaled up according to how many convolutions are being run in parallel.
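
For example, these options might be passed as an OptionFlags set (the values shown are the documented defaults):

#include <poplar/OptionFlags.hpp>

const poplar::OptionFlags multiConvOptions{
    {"planType", "parallel"},       // "serial" plans each convolution as normal
    {"perConvReservedTiles", "50"}, // tiles reserved per convolution when planning
    {"cycleBackOff", "0.1"},        // proportion to back off from the fastest plan
};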

Param params

Parameters specifying the convolution.

Param options

Options controlling the implementation.

Param name

Debugging name for the tensor.

Public Members

ConvParams params
poplar::OptionFlags options
std::string name

poplin/Norms.hpp

Functions to support normalising values in a tensor.

namespace poplin

Linear algebra functions.

Typedefs

using DistributedNormReduceCallback = std::function<std::vector<poplar::Tensor>(poplar::Graph &replicatedGraph, const std::vector<poplar::Tensor> &inputsToReduce, poplar::program::Sequence &prog, unsigned groupSize, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options)>

Callback to reduce statistics and gradients.

The reduce operation is reduce-add.

Param graph

The replicated graph in which the computation is performed.

Param inputsToReduce

A vector of independent tensors to reduce

Param prog

A program sequence that the code to perform the normalisation will be appended to.

Param groupSize

The number of replicas that need to be reduced. This may be less than the total number of replicas in the top level graph. A group is formed by adjacent replicas such that the top level graph contains an integral number of groupSize replicas.

Param debugContext

Optional debug information.

Param options

The structure describing options on how the reduction should be implemented.

Return

A vector of reduced tensors in the same order as supplied in inputsToReduce

Functions

poplar::Tensor createNormGamma(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Type &type, const poplar::DebugContext &debugContext = {})

Create and map the per-channel multiplicative gamma parameter tensor used for normalisation in convolution layers.

Parameters
  • graph – The graph with the activations and gamma tensor.

  • acts – The activations tensor has shape [N][C][..F..] where:

    • N is the batch size

    • C is the number of channels

    • ..F.. are the dimensions of an N-dimensional field.

  • type – The type of the output tensor.

  • debugContext – Optional debug information.

Returns

Gamma vector of dimension C.

poplar::Tensor createNormGamma(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::DebugContext &debugContext = {})

Create and map the per-channel multiplicative gamma parameter tensor used for normalisation in convolution layers.

Parameters
  • graph – The graph with the activations and gamma tensor.

  • acts – The activations tensor has shape [N][C][..F..] where:

    • N is the batch size

    • C is the number of channels

    • ..F.. are the dimensions of an N-dimensional field.

  • debugContext – Optional debug information.

Returns

Gamma vector of dimension C.

poplar::Tensor createNormBeta(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Type &type, const poplar::DebugContext &debugContext = {})

Create and map the per-channel additive beta parameter tensor used for normalisation in convolution layers.

Parameters
  • graph – The graph with the activations and beta tensor.

  • acts – The activations tensor has shape [N][C][..F..] where:

    • N is the batch size

    • C is the number of channels

    • ..F.. are the dimensions of an N-dimensional field.

  • type – The type of the output tensor.

  • debugContext – Optional debug information.

Returns

Beta vector of dimension C.

poplar::Tensor createNormBeta(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::DebugContext &debugContext = {})

Create and map the per-channel additive beta parameter tensor used for normalisation in convolution layers.

Parameters
  • graph – The graph with the activations and beta tensor.

  • acts – The activations tensor has shape [N][C][..F..] where:

    • N is the batch size

    • C is the number of channels

    • ..F.. are the dimensions of an N-dimensional field.

  • debugContext – Optional debug information.

Returns

Beta vector of dimension C.

std::pair<poplar::Tensor, poplar::Tensor> createNormParams(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::DebugContext &debugContext = {})

Creates a tensor pair of normalisation parameters (gamma, beta).

Parameters
  • graph – The graph with the activations and beta/gamma tensors.

  • acts – The activations tensor has shape [N][C][..F..] where:

    • N is the batch size

    • C is the number of channels

    • ..F.. are the dimensions of an N-dimensional field.

  • debugContext – Optional debug information.

Returns

A pair of vectors of dimension C.

std::pair<poplar::Tensor, poplar::Tensor> normStatistics(poplar::Graph &graph, const poplar::Tensor &actsUngrouped, float eps, poplar::program::Sequence &prog, bool unbiasedVarEstimate, bool stableAlgo = false, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {})

Compute the normalisation statistics from the activations tensor.

The activations tensor is of shape [N][C][..F..]. The mean and inverse standard deviation is computed over dimensions {[N] [..F..]} and vectors of length C are returned as estimates.

The input activations tensor must be rearranged such that statistics are computed for C channels.

Parameters
  • graph – The graph in which the computation is performed.

  • actsUngrouped – The activation with shape [N][C][..F..] where:

    • N is the batch size

    • C is the number of channels

    • ..F.. are the dimensions of an N-dimensional field.

  • eps – The epsilon added to the variance to avoid divide by zero.

  • prog – A program sequence that the code to perform the normalisation will be appended to.

  • unbiasedVarEstimate – Compute unbiased variance estimate.

  • stableAlgo – If true, computes the mean first and subtracts it from the activations before computing the variance. The implementation with this flag set to true is slower than when it is set to false.

  • partialsType – Poplar type used for partials.

  • debugContext – Optional debug information.

Returns

A vector pair with mean and inverse standard deviation.

std::pair<poplar::Tensor, poplar::Tensor> distributedNormStatistics(poplar::Graph &replicatedGraph, const poplar::Tensor &actsUngrouped, float eps, poplar::program::Sequence &prog, bool unbiasedVarEstimate, DistributedNormReduceCallback allReduceCallback, unsigned normSize, bool stableAlgo = false, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {})

Compute the normalisation statistics for a part of the activations tensor which is distributed over multiple replicas.

Each replica gets equal sized batches (N) with normalisation done over normSize batches. A callback does the required mean reduction over multiple replicas. The activations tensor is of shape [N][C][..F..]. The mean and inverse standard deviation is computed over dimensions {[N] [..F..]} and vectors of length C are returned as estimates.

The input activations tensor must be rearranged such that statistics are computed for C channels.

Parameters
  • replicatedGraph – The replicated graph in which the computation is performed.

  • actsUngrouped – The activation with shape [N][C][..F..] where:

    • N is the batch size

    • C is the number of channels

    • ..F.. are the dimensions of an N-dimensional field.

  • eps – The epsilon added to the variance to avoid divide by zero.

  • prog – A program sequence that the code to perform the normalisation will be appended to.

  • unbiasedVarEstimate – Compute unbiased variance estimate.

  • stableAlgo – If true, computes the mean first and subtracts it from the activations before computing the variance. The implementation with this flag set to true is slower than when it is set to false.

  • partialsType – Poplar type used for partials.

  • allReduceCallback – Callback to perform all-reduce over ‘normSize’ batch elements.

  • normSize – Number of batch elements over which statistics are estimated.

  • debugContext – Optional debug information.

Returns

A vector pair with mean and inverse standard deviation.

poplar::Tensor normWhiten(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &mean, const poplar::Tensor &iStdDev, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Compute the whitened activations using the supplied mean and inverse standard deviation.

The input activations undergo a prior rearrangement such that C is the size of the statistics mean and iStdDev tensors.

Parameters
  • graph – The graph which the computation is in.

  • acts – The activations tensor of shape [N][C][..F..].

  • mean – Mean of the activations with dimension C.

  • iStdDev – Inverse standard deviation with dimension C.

  • prog – A program sequence that the code to perform the normalisation will be appended to.

  • debugContext – Optional debug information.

Returns

Whitened activations.

poplar::Tensor normalise(poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gamma, const poplar::Tensor &beta, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Computes the normalised output from whitened activations.

Parameters
  • graph – The graph to which the normalisation operation is added.

  • actsWhitened – Whitened activations.

  • gamma – Per-channel multiplicative normalisation parameter.

  • beta – Per-channel additive normalisation parameter.

  • prog – A program sequence that the code to perform the normalisation will be appended to.

  • debugContext – Optional debug information.
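
Taken together, the statistics, whitening and normalisation functions above form a forward pass; the following sketch assumes the activations tensor is already laid out as [N][C][..F..] and uses illustrative values for eps and the flags (the helper name is hypothetical).

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <poplin/Norms.hpp>

poplar::Tensor normForwardSketch(poplar::Graph &graph,
                                 poplar::program::Sequence &prog,
                                 const poplar::Tensor &acts) {
  // Per-channel gamma and beta parameters of dimension C.
  auto gammaBeta = poplin::createNormParams(graph, acts, "normParams");
  // Mean and inverse standard deviation over {[N], [..F..]}.
  auto stats = poplin::normStatistics(graph, acts, 1e-5f, prog,
                                      /*unbiasedVarEstimate=*/false);
  poplar::Tensor whitened =
      poplin::normWhiten(graph, acts, stats.first, stats.second, prog, "whiten");
  return poplin::normalise(graph, whitened, gammaBeta.first, gammaBeta.second,
                           prog, "normalise");
}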

std::pair<poplar::Tensor, poplar::Tensor> normParamGradients(poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {})

Compute gradients with respect to parameters required for parameter update.

Parameters
  • graph – The graph to which the normalisation operation is added.

  • actsWhitened – Whitened activations.

  • gradsIn – Input gradients to the normalisation layer.

  • prog – A program sequence that the code to perform the normalisation will be appended to.

  • partialsType – The intermediate type kept in the computation.

  • debugContext – Optional debug information.

poplar::Tensor normGradients(poplar::Graph &graph, const poplar::Tensor &gradsIn, const poplar::Tensor &gamma, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Propagate the gradients through the normalisation layer.

Parameters
  • graph – The graph to which the normalisation operation is added.

  • gradsIn – Input gradients to the normalisation layer.

  • gamma – Multiplicative parameter used in the normalisation.

  • prog – A program sequence that the code to perform the normalisation will be appended to.

  • debugContext – Optional debug information.

poplar::Tensor normStatisticsGradients(poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, const poplar::Tensor &invStdDev, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {})

Propagate the gradients through the norm statistics layer.

The input to the layer is the output gradients from the normalisation layer. The whitened activations and the input gradients must have undergone a prior rearrangement such that the channel dimension has the same elements as invStdDev.

Parameters
  • graph – The graph to which the normalisation operation is added.

  • actsWhitened – Forward whitened activations.

  • gradsIn – Input gradients to the normalisation layer.

  • invStdDev – Inverse standard deviation from norm statistics.

  • prog – A program sequence that the code to perform the normalisation will be appended to.

  • debugContext – Optional debug information.

poplar::Tensor distributedNormStatisticsGradients(poplar::Graph &replicatedGraph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, const poplar::Tensor &invStdDev, poplar::program::Sequence &prog, poplin::DistributedNormReduceCallback reduceCallback, unsigned normSize, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {})

Propagate the gradients through the norm statistics layer where equal sized batch elements are distributed over replicas.

Each replica gets the same number of batches and norm gradients are computed over normSize batch elements. Each replica is given N batch elements. A callback does the required reduction over multiple replicas.

The input to the layer is the output gradients from the normalisation layer. The whitened activations and the input gradients must have undergone a prior rearrangement such that the channel dimension has the same elements as invStdDev.

Parameters
  • replicatedGraph – The replicated graph to which the normalisation operation is added.

  • actsWhitened – Forward whitened activations.

  • gradsIn – Input gradients to the normalisation layer.

  • invStdDev – Inverse standard deviation from norm statistics.

  • prog – A program sequence that the code to perform the normalisation will be appended to.

  • reduceCallback – A call back to perform all reduce of the statistics gradients across the replicas.

  • normSize – The batch size over which the norm is done.

  • debugContext – Optional debug information.

poplin/TriangularSolve.hpp

Solving linear equations using triangular matrices.

namespace poplin

Linear algebra functions.

Functions

poplar::Tensor createTriangularSolveInputLHS(poplar::Graph &graph, const poplar::Type &inputType, const poplar::Type &outputType, const std::vector<std::size_t> &aShape, const std::vector<std::size_t> &bShape, bool leftSide, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, matmul::PlanningCache *cache = nullptr)

Create a tensor that is used as the left operand of triangular solve.

This will create a 2D/3D tensor in the graph. The ordering and tile mapping of the tensor will be set to make a triangular solver with this tensor as the left argument efficient.

Parameters
  • graph – The Poplar graph.

  • inputType – The input data type.

  • outputType – The data type of the returned tensor.

  • aShape – The shape of the left operand.

  • bShape – The shape of the right operand.

  • leftSide – Solve AX = B if true, XA = B otherwise.

  • debugContext – Debug information.

  • options – The implementation options of the triangular solver. Supported options: ‘blockSize’ (a block-size hint). See matMul() for additional options.

  • cache – Optional pointer to a planning cache to use.

Returns

A matrix of the specified type and shape aShape. The tensor will have been mapped to tiles.

poplar::Tensor createTriangularSolveInputRHS(poplar::Graph &graph, const poplar::Type &inputType, const poplar::Type &outputType, const std::vector<std::size_t> &aShape, const std::vector<std::size_t> &bShape, bool leftSide, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, matmul::PlanningCache *cache = nullptr)

Create a tensor that is used as the right operand of triangular solve.

This will create a 2D/3D tensor in the graph. The ordering and tile mapping of the tensor will be set to make a triangular solver with this tensor as the right argument efficient.

Parameters
  • graph – The Poplar graph.

  • inputType – The input data type.

  • outputType – The data type of the returned tensor.

  • aShape – The shape of the left operand.

  • bShape – The shape of the right operand.

  • leftSide – Solve AX = B if true, XA = B otherwise.

  • debugContext – Debug information.

  • options – The implementation options of the triangular solver. Supported options: ‘blockSize’ (a block-size hint). See matMul() for additional options.

  • cache – Optional pointer to a planning cache to use.

Returns

A matrix of the specified type and shape bShape. The tensor will have been mapped to tiles.

poplar::Tensor triangularMask(poplar::Graph &graph, const poplar::Tensor &a, bool lower, bool unitDiagonal, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Masks the unused components of the input tensor with zeroes, optionally allowing for a unit diagonal.

Parameters
  • graph – The Poplar graph.

  • a – Tensor of floating-point type with shape […, N,N].

  • lower – If true, use the lower triangle of a; otherwise use the upper triangle.

  • unitDiagonal – If true, the diagonal elements of a are assumed to be 1 and not accessed.

  • prog – A reference to a program sequence which the code to perform the arrangement will be appended to.

  • debugContext – Optional debug information.

Returns

A tensor with the same shape as a with all unused values masked.

poplar::Tensor triangularSolve(poplar::Graph &graph, const poplar::Tensor &a, const poplar::Tensor &b, bool leftSide, bool lower, bool unitDiagonal, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, matmul::PlanningCache *cache = nullptr)

Solves systems of linear equations with lower or upper triangular coefficients.

Parameters
  • graph – The Poplar graph.

  • a – Tensor of floating-point type with shape […, N,N].

  • b – Tensor of the same type with shape […, N, K] if leftSide is true, […, K, N] otherwise.

  • leftSide – Solve AX = B if true, XA = B otherwise.

  • lower – If true, use the lower triangle of a; otherwise use the upper triangle.

  • unitDiagonal – If true, the diagonal elements of a are assumed to be 1 and not accessed.

  • prog – A reference to a program sequence which the code to perform the arrangement will be appended to.

  • debugContext – Optional debug information.

  • options – The implementation options of the triangular solver. Supported options: ‘blockSize’ (a block-size hint). See matMul() for additional options.

  • cache – Optional pointer to a planning cache to use.

Returns

A tensor with the shape of b containing the solution of the linear system.
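
A sketch of a single lower-triangular solve AX = B, allocating both operands with solver-friendly layouts; the shapes, debug names and helper name are illustrative, and poplin codelets are assumed to be added to the graph elsewhere.

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <poplin/TriangularSolve.hpp>
#include <vector>

poplar::Tensor lowerSolveSketch(poplar::Graph &graph,
                                poplar::program::Sequence &prog) {
  const std::vector<std::size_t> aShape = {16, 16}; // A is [..., N, N]
  const std::vector<std::size_t> bShape = {16, 4};  // B is [..., N, K] (leftSide)
  poplar::Tensor A = poplin::createTriangularSolveInputLHS(
      graph, poplar::FLOAT, poplar::FLOAT, aShape, bShape, /*leftSide=*/true, "A");
  poplar::Tensor B = poplin::createTriangularSolveInputRHS(
      graph, poplar::FLOAT, poplar::FLOAT, aShape, bShape, /*leftSide=*/true, "B");
  return poplin::triangularSolve(graph, A, B, /*leftSide=*/true, /*lower=*/true,
                                 /*unitDiagonal=*/false, prog, "solve");
}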

std::vector<std::pair<MatMulParams, poplar::OptionFlags>> getTriangularSolveMatMulPrePlanParameters(const poplar::Type &inputType, const poplar::Type &outputType, const std::vector<std::size_t> &aShape, const std::vector<std::size_t> &bShape, bool leftSide, bool lower, const poplar::OptionFlags &options)

Plan the matrix multiplications for a given triangular solver.

Parameters
  • inputType – The data type of the lhs tensor.

  • outputType – The data type of the rhs tensor.

  • aShape – The shape of the left operand.

  • bShape – The shape of the right operand.

  • leftSide – Solve AX = B if true, XA = B overwise.

  • options – The implementation options of the triangular solver.

Returns

Matrix multiplication preplan parameters.

Random number operations (poprand)

Functions for tensor operations using random numbers. These make use of the hardware pseudo-random number generators (PRNG) on each tile. There is a separate PRNG for each worker thread. These are designed to allow every vertex to generate a different pseudo-random sequence but also, importantly, to ensure that the same sequence can be regenerated when required.

These functions have an optional seed parameter for initialising the tiles’ PRNGs. Because there is no 64-bit integer type in device code, this is passed as a tensor of two 32-bit integers. This seed value is common to an entire graph or subgraph.

A “seed modifier” parameter is also used, which enables each vertex to generate a different pseudo-random sequence from the same seed. This is ignored if the seed is not specified.

The pseudo-random sequence is determined by a combination of tile-id, thread-id, seed and seed modifier.

If a seed is provided then, at the end of the operation, the PRNG state is restored to be the same as it was before the operation.

The functions have a reference tensor as a parameter. This is used to define the layout of the output tensor in order to guarantee deterministic results when a seed is specified. It ensures that if the same seed and seed modifier values are used then the same output is obtained.

poprand/RandomGen.hpp

namespace poprand

Pseudo-random number generator (PRNG) functions.

Functions

poplar::Tensor dropout(poplar::Graph &graph, const poplar::Tensor *seed, const uint32_t seedModifier, const poplar::Tensor &input, const poplar::Tensor &reference, double keepProbability, double scale, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Apply dropout to a tensor.

The elements of tensor input are multiplied by a mask consisting of a sequence of randomly generated 1 or 0. The keep probability of the dropout P(1) = keepProbability. The contents of the mask depend on the keep probability, seed, seed modifier and layout of the reference tensor.

Parameters
  • graph – The graph to add this operation to.

  • seed – If not null, this is a pair of 32-bit integers used to seed the random number generator that generates the dropout mask.

  • seedModifier – Provides a further modification of the seed value. Ignored if seed is null.

  • input – The input tensor to be masked.

  • reference – A tensor that specifies the layout of the output tensor. Must be the same shape as the input.

  • keepProbability – The probability of keeping an input value.

  • scale – Scales the output tensor. This is typically the inverse of the dropout probability, (1 / P(1)).

  • prog – The program to add this operation to.

  • debugContext – Optional debug information.

Returns

A tensor with elements randomly set to either zero or the scaled input value.
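
A minimal sketch of calling this overload, assuming poprand codelets have been added to the graph; the seed is an uninitialised two-element variable purely for illustration, the input is used as its own layout reference, and the helper name is hypothetical.

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <poprand/RandomGen.hpp>

poplar::Tensor dropoutSketch(poplar::Graph &graph,
                             poplar::program::Sequence &prog,
                             const poplar::Tensor &input) {
  // Seed is a tensor of two 32-bit integers (see the introduction above).
  poplar::Tensor seed = graph.addVariable(poplar::UNSIGNED_INT, {2}, "seed");
  graph.setTileMapping(seed, 0);
  const double keepProb = 0.9;
  // Survivors are scaled by 1 / P(1) so the expected value is unchanged.
  return poprand::dropout(graph, &seed, 1u, input, input, keepProb,
                          1.0 / keepProb, prog, "dropout");
}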

poplar::Tensor dropout(poplar::Graph &graph, const poplar::Tensor *seed, const uint32_t seedModifier, const poplar::Tensor &input, const poplar::Tensor &reference, double keepProbability, double scale, bool outputClonesRef, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Apply dropout to a tensor.

The elements of tensor input are multiplied by a mask consisting of a sequence of randomly generated 1 or 0. The keep probability of the dropout P(1) = keepProbability. The contents of the mask depend on the keep probability, seed, seed modifier and layout of the reference tensor.

Parameters
  • graph – The graph to add this operation to.

  • seed – If not null, this is a pair of 32-bit integers used to seed the random number generator that generates the dropout mask.

  • seedModifier – Provides a further modification of the seed value. Ignored if seed is null.

  • input – The input tensor to be masked.

  • reference – A tensor that specifies the layout of the output tensor. Must be the same shape as the input.

  • keepProbability – The probability of keeping an input value.

  • scale – Scales the output tensor. This is typically the inverse of the dropout probability, (1 / P(1)).

  • outputClonesRef – When true, the output tensor is a clone of the reference tensor. When false, the output tensor is a clone of the input tensor.

  • prog – The program to add this operation to.

  • debugContext – Optional debug information.

Returns

A tensor with elements randomly set to either zero or the scaled input value.

poplar::Tensor shapedDropout(poplar::Graph &graph, const poplar::Tensor *seed, const uint32_t seedModifier, const poplar::Tensor &input, const poplar::Tensor &reference, double keepProbability, double scale, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Apply shaped dropout to a tensor.

The elements of tensor input are multiplied by a mask consisting of a sequence of randomly generated 1 or 0. The keep probability of the dropout P(1) = keepProbability.

Shaped dropout allows row-, column- and dimension-wise dropout, as opposed to element-wise standard dropout. The shape of the dropout must be compatible with (broadcastable to) the input.

The contents of the mask depend on the keep probability, seed, seed modifier and layout of the reference tensor.

Parameters
  • graph – The graph to add this operation to.

  • seed – If not null, this is a pair of 32-bit integers used to seed the random number generator that generates the dropout mask.

  • seedModifier – Provides a further modification of the seed value. Ignored if seed is null.

  • input – The input tensor to be masked.

  • reference – A tensor that specifies the shape and layout of the dropout. Must be broadcastable to the input.

  • keepProbability – The probability of keeping an input value.

  • scale – Scales the output tensor. This is typically the inverse of the dropout probability, (1 / P(1)).

  • prog – The program to add this operation to.

  • debugContext – Optional debug information.

Returns

A tensor with elements randomly set to either zero or the scaled input value.

poplar::Tensor uniform(poplar::Graph &graph, const poplar::Tensor *seed, uint32_t seedModifier, const poplar::Tensor &reference, const poplar::Type &outType, double minVal, double maxVal, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Uniform distribution in a given interval with maxVal > minVal.

Generates random data with uniform distribution in the interval [minVal, maxVal]. The output may be of type float, half or int.

For type int, data is generated in the interval [minVal, maxVal] with uniform probability if (maxVal - minVal) is a power of 2. Otherwise there will be a small bias in the probability generated, with the bias directly proportional to the ratio (maxVal - minVal + 1) / 2^32.

Parameters
  • graph – The graph to add this operation to.

  • seed – If not null, this is a pair of 32-bit integers used to seed the random number generator that generates the distribution.

  • seedModifier – Provides a further modification of the seed value. Ignored if seed is null.

  • reference – A tensor that specifies the layout of the output tensor.

  • outType – Type of the output tensor. One of float, half or int.

  • minVal – The minimum value of the distribution.

  • maxVal – The maximum value of the distribution.

  • prog – The program to add this operation to.

  • debugContext – Optional debug information.

Returns

A tensor with elements having a uniform distribution of random values.
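
A brief sketch; the seed tensor and the reference (which fixes the output layout) are assumed to exist already, for example created as in the dropout sketch above, and the helper name is hypothetical.

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <poprand/RandomGen.hpp>

poplar::Tensor uniformNoiseSketch(poplar::Graph &graph,
                                  poplar::program::Sequence &prog,
                                  const poplar::Tensor &seed,
                                  const poplar::Tensor &reference) {
  // Floating-point values drawn uniformly from [0.0, 1.0].
  return poprand::uniform(graph, &seed, 7u, reference, poplar::FLOAT,
                          0.0, 1.0, prog, "uniform");
}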

poplar::Tensor logUniform(poplar::Graph &graph, const poplar::Tensor *seed, uint32_t seedModifier, const poplar::Tensor &reference, const poplar::Type &outType, double minVal, double maxVal, poplar::program::Sequence &prog, double base = M_E, const poplar::DebugContext &debugContext = {})

Log-uniform distribution over a closed interval [minVal, maxVal].

Generates random data log-uniformly distributed in the closed interval [minVal, maxVal]. The output may be of type float, half or int. The base of the log can be specified, but defaults to the natural base.

The actual interval of the samples depends on the representable values of the outType and is a subset of the initial interval; the interval will be squeezed inward to the next representable values of outType. For example, for half, the interval [2049.0, 4098.0] would be squeezed to [2050.0, 4096.0]. Depending on the interval’s representability, this may cause spikes in the distribution at the boundaries - careful choice of interval is suggested.

Parameters
  • graph – The graph to add this operation to.

  • seed – If not null, this is a pair of 32-bit integers used to seed the random number generator that generates the distribution.

  • seedModifier – Provides a further modification of the seed value. Ignored if seed is null.

  • reference – A tensor that specifies the layout of the output tensor.

  • outType – Type of the output tensor. One of float, half or int.

  • minVal – The minimum value of the distribution.

  • maxVal – The maximum value of the distribution.

  • prog – The program to add this operation to.

  • base – Optional base of the log / exponent of the underlying uniform distribution. Defaults to Euler’s number (natural base).

  • debugContext – Optional debug information.

Throws
  • poputil::poplibs_error – If minVal < 1

  • poputil::poplibs_error – If maxVal <= minVal

  • poputil::poplibs_error – If minVal and maxVal are not suitable for the outType (for example the range is too narrow)

Returns

A tensor the same size as reference with elements having a log-uniform distribution of random values of type outType.

poplar::Tensor bernoulli(poplar::Graph &graph, const poplar::Tensor *seed, uint32_t seedModifier, const poplar::Tensor &reference, const poplar::Type &outType, double prob, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Bernoulli distribution which has the value 1 with the specified probability.

Generates a tensor with random values of 0 and 1, determined by prob.

Parameters
  • graph – The graph to add this operation to.

  • seed – If not null, this is a pair of 32-bit integers used to seed the random number generator that generates the distribution.

  • seedModifier – Provides a further modification of the seed value. Ignored if seed is null.

  • reference – A tensor that specifies the layout of the output tensor.

  • outType – Type of the output tensor. One of float, half or int.

  • prob – Probability of an element being 1.

  • prog – The program to add this operation to.

  • debugContext – Optional debug information.

Returns

A tensor with elements randomly set to either 0 or 1.

poplar::Tensor normal(poplar::Graph &graph, const poplar::Tensor *seed, uint32_t seedModifier, const poplar::Tensor &reference, const poplar::Type &outType, double mean, double stdDev, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Normal distribution with given mean and standard deviation.

Generates random data with a normal (Gaussian) distribution. The mean is given by mean and the standard deviation by stdDev.

Parameters
  • graph – The graph to add this operation to.

  • seed – If not null, this is a pair of 32-bit integers used to seed the random number generator that generates the distribution.

  • seedModifier – Provides a further modification of the seed value. Ignored if seed is null.

  • reference – A tensor that specifies the layout of the output tensor.

  • outType – Type of the output tensor. One of float or half.

  • mean – The mean value of the distribution.

  • stdDev – The standard deviation of the distribution.

  • prog – The program to add this operation to.

  • debugContext – Optional debug information.

Returns

A tensor with elements drawn from a normal distribution with the given mean and standard deviation.

poplar::Tensor truncatedNormal(poplar::Graph &graph, const poplar::Tensor *seed, uint32_t seedModifier, const poplar::Tensor &reference, const poplar::Type &outType, double mean, double stdDev, double alpha, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Truncated normal distribution.

Generates a distribution derived from a normal distribution with mean mean and standard deviation stdDev. This normal distribution is truncated symmetrically about the mean at (mean - alpha * stdDev) and (mean + alpha * stdDev).

Parameters
  • graph – The graph to add this operation to.

  • seed – If not null, this is a pair of 32-bit integers used to seed the random number generator that generates the distribution.

  • seedModifier – Provides a further modification of the seed value. Ignored if seed is null.

  • reference – A tensor that specifies the layout of the output tensor.

  • outType – Type of the output tensor. One of float or half.

  • mean – The mean value of the distribution.

  • stdDev – The standard deviation of the distribution.

  • alpha – Defines the minimum and maximum values of the distribution.

  • prog – The program to add this operation to.

  • debugContext – Optional debug information.

Returns

A tensor with elements drawn from the truncated normal distribution.

void setSeed(poplar::Graph &graph, const poplar::Tensor &masterSeed, uint32_t seedModifier, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Sets the random number generator seed on all tiles.

Parameters
  • graph – The graph to add this operation to.

  • masterSeed – A 64-bit seed value, passed as a tensor of two 32-bit integers, used to seed the random number generator on every tile.

  • seedModifier – Provides a further modification of the seed value.

  • prog – The program to add this operation to.

  • debugContext – Optional debug information.
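
The following is a minimal usage sketch (not part of the generated reference) showing how setSeed() and normal() fit together. It assumes graph, prog and a layout reference tensor already exist; creating the seed as a {2}-element UNSIGNED_INT constant is just one way of supplying the pair of 32-bit seed values, and the include paths follow the usual PopLibs layout.

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <poprand/RandomGen.hpp>
#include <vector>

void addRandomOps(poplar::Graph &graph, poplar::program::Sequence &prog,
                  const poplar::Tensor &reference) {
  // A pair of 32-bit values forming the 64-bit master seed.
  const std::vector<unsigned> seedWords = {42u, 7u};
  poplar::Tensor seed = graph.addConstant(
      poplar::UNSIGNED_INT, {2}, poplar::ArrayRef<unsigned>(seedWords));
  graph.setTileMapping(seed, 0);

  // Seed the random number generator on every tile.
  poprand::setSeed(graph, seed, /*seedModifier=*/0, prog, "setSeed");

  // Draw values from N(0, 1) with the same shape and layout as `reference`.
  poplar::Tensor z =
      poprand::normal(graph, &seed, /*seedModifier=*/0, reference,
                      poplar::FLOAT, /*mean=*/0.0, /*stdDev=*/1.0, prog);
  (void)z; // use `z` in subsequent operations
}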

Sparse tensor operations (popsparse)

Functions for operating on block sparse tensors. Static block and dynamic sparsity are supported.

popsparse/experimental/BlockSparse.hpp

namespace popsparse

Support for sparse matrices.

namespace experimental

Enums

enum SubBlockMask

Define the sparsity mask inside a block.

The diagonal is defined across all the non-sparse matrix dimensions, where the row index is equal to the column index.

Values:

enumerator None

No elements are zeroed out.

enumerator ZeroUpperTriangle

Elements in the upper triangle, above the diagonal, are zeroed out.

enumerator ZeroLowerTriangle

Elements in the lower triangle, below the diagonal, are zeroed out.

Functions

poplar::Tensor bsSoftmax(poplar::Graph &graph, poplar::Tensor sparseTensor, const std::array<int, 2> &dim, const std::array<int, 2> &blockSize, const std::vector<unsigned char> &sparsity, SubBlockMask subBlockMaskType, unsigned numGroups, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

This function computes softmax on a sparse tensor.

Parameters
  • graph – The Poplar graph.

  • sparseTensor – The input sparse 2D tensor. It must be in a block-sparse format.

  • dim[0] – Number of rows of the original dense tensor.

  • dim[1] – Number of columns of the original dense tensor.

  • blockSize[0] – Block size of the rows.

  • blockSize[1] – Block size of the columns.

  • sparsity – The 2D sparsity mask for the block-sparse tensor, in which ‘1’ is a non zero block and ‘0’ is a zero block.

  • subBlockMaskType – Sub-block mask type. Elements in the upper (or lower) triangle are filled with zeros in the result.

  • numGroups – The number of groups for a group operation, or 1 for a non-group operation. This parameter affects sub-block mask application only: dimension 0 of the dense representation is logically divided into groups and the sub-block mask is applied individually to each group.

  • prog – A reference to the program sequence to which the code to perform the softmax will be appended.
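
As a usage sketch only (the sizes and mask are illustrative), the following applies bsSoftmax() to a 64x64 matrix held as 16x16 blocks with only the diagonal blocks non-zero. It assumes graph, prog and a block-sparse tensor scores (the array of non-zero blocks) already exist.

#include <popsparse/experimental/BlockSparse.hpp>
#include <array>
#include <vector>

poplar::Tensor maskedSoftmax(poplar::Graph &graph, poplar::Tensor scores,
                             poplar::program::Sequence &prog) {
  using namespace popsparse::experimental;
  const std::array<int, 2> dim = {64, 64};       // dense rows x columns
  const std::array<int, 2> blockSize = {16, 16}; // block rows x columns
  // 4x4 block mask: '1' marks a non-zero block, '0' a zero block.
  const std::vector<unsigned char> sparsity = {1, 0, 0, 0,
                                               0, 1, 0, 0,
                                               0, 0, 1, 0,
                                               0, 0, 0, 1};
  return bsSoftmax(graph, scores, dim, blockSize, sparsity,
                   SubBlockMask::ZeroUpperTriangle, /*numGroups=*/1, prog,
                   "bsSoftmax");
}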

void bsSoftmaxInPlace(poplar::Graph &graph, poplar::Tensor sparseTensor, const std::array<int, 2> &dim, const std::array<int, 2> &blockSize, const std::vector<unsigned char> &sparsity, SubBlockMask subBlockMaskType, unsigned numGroups, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

This function computes softmax on a sparse tensor, in place.

Parameters
  • graph – The Poplar graph.

  • sparseTensor – The input sparse 2D tensor. It must be in a block-sparse format.

  • dim[0] – Number of rows of the original dense tensor.

  • dim[1] – Number of columns of the original dense tensor.

  • blockSize[0] – Block size of the rows.

  • blockSize[1] – Block size of the columns.

  • sparsity – The 2D sparsity mask for the block-sparse tensor, in which ‘1’ is a non zero block and ‘0’ is a zero block.

  • subBlockMaskType – Sub-block mask type. Elements in the upper (or lower) triangle are filled with zeros in the result.

  • numGroups – The number of groups for a group operation, or 1 for a non-group operation. This parameter affects sub-block mask application only: dimension 0 of the dense representation is logically divided into groups and the sub-block mask is applied individually to each group.

  • prog – A reference to a program sequence which will be appended with the code to perform the softmax.

poplar::Tensor bsSoftmaxGrad(poplar::Graph &graph, poplar::Tensor sparseOut, poplar::Tensor sparseOutGrad, const std::array<int, 2> &dim, const std::array<int, 2> &blockSize, const std::vector<unsigned char> &sparsity, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

This function computes softmax gradient on a sparse tensor.

Parameters
  • graph – The Poplar graph.

  • sparseOut – The output (activation) sparse 2D tensor. It must be in a block-sparse format.

  • sparseOutGrad – The output gradient sparse 2D tensor. It must be in a block-sparse format.

  • dim[0] – Number of rows of the original dense tensor.

  • dim[1] – Number of columns of the original dense tensor.

  • blockSize[0] – Block size of the rows.

  • blockSize[1] – Block size of the columns.

  • sparsity – The 2D sparsity mask for the block-sparse tensor, in which ‘1’ is a non zero block and ‘0’ is a zero block.

  • prog – A reference to a program sequence which will be appended with the code to perform the softmax.

popsparse/experimental/BlockSparseMatMul.hpp

namespace popsparse

Support for sparse matrices.

namespace experimental

Functions

poplar::Tensor createBSMatMulInputLHS(poplar::Graph &graph, const BSMatMulParams &bsMatMul, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {})

Create a tensor for use as the left operand of block-sparse matrix multiplication.

Parameters
  • graph – The Poplar graph.

  • bsMatMul – The object for block-sparse information, includes the sparsity mask, the matrix size, the block size, and the data type.

  • debugContext – Optional debug information.

  • options – Matrix multiply options; see bsMatMul() for details.

Returns

For a non-grouped BSMatMulParams object: if the left matrix is dense, the returned tensor is a regular 2D matrix; if it is sparse, the returned tensor is an array of non-zero blocks. For a grouped BSMatMulParams object, the returned tensor is the concatenation along dimension 0 of the tensors for all matrices in the group.

poplar::Tensor createBSMatMulInputRHS(poplar::Graph &graph, const BSMatMulParams &bsMatMul, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {})

Create a tensor for use as the right operand of block-sparse matrix multiplication.

Parameters
  • graph – The Poplar graph.

  • bsMatMul – The object for block-sparse information, includes the sparsity mask, the matrix size, the block size, and the data type.

  • debugContext – Optional debug information.

  • options – Matrix multiply options; see bsMatMul() for details.

Returns

For a non-grouped BSMatMulParams object: if the right matrix is dense, the returned tensor is a regular 2D matrix; if it is sparse, the returned tensor is an array of non-zero blocks. For a grouped BSMatMulParams object, the returned tensor is the concatenation along dimension 0 of the tensors for all matrices in the group.

poplar::Tensor bsMatMul(poplar::Graph &graph, const BSMatMulParams &bsMatMulParams, poplar::program::Sequence &prog, const poplar::Tensor &lhsMatrix, const poplar::Tensor &rhsMatrix, const poplar::OptionFlags &options = {}, const poplar::DebugContext &debugContext = {})

This function multiplies the left-hand matrix by the right-hand matrix.

Matrix multiply options

  • numberOfPass Integer [=1]

    The number of passes used to serialise the matrix multiply.

    If this is greater than 1, the leading dimension (M, for a matmul of shape [MxN] x [NxK]) is divided by numberOfPass, and each sub-matmul is run serially to reduce temporary memory usage.

Parameters
  • graph – The Poplar graph.

  • bsMatMulParams – The object for block sparse information, includes the sparsity mask, the matrix size, the block size, and the data type.

  • prog – A reference to a program sequence which will be appended with the code to perform the multiplication.

  • lhsMatrix – If BSMatMulParams is for dense x sparse, this is the left-hand dense matrix. If BSMatMulParams is for sparse x sparse, this is the non-zero blocks of the left sparse matrix. For a grouped BSMatMulParams object, it should be the concatenation along dimension 0 of the tensors for all operations in the group.

  • rhsMatrix – A tensor for an array of non-zero blocks in the right-hand sparse matrix. For a grouped BSMatMulParams object, it should be the concatenation along dimension 0 of the tensors for all operations in the group.

  • options – The structure describing options for how the multiplication should be implemented.

  • debugContext – Optional debug information.

Returns

The tensor holding the result of the multiplication. This tensor will be created, added to the graph and mapped to tiles. For a grouped BSMatMulParams object, the returned tensor is the concatenation along dimension 0 of the results for all operations in the group.
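
The following sketch (illustrative sizes only) shows the intended flow for a dense x block-sparse multiplication: construct a BSMatMulParams object (described below), create the operands with createBSMatMulInputLHS()/createBSMatMulInputRHS(), then call bsMatMul(). It assumes graph and prog already exist.

#include <popsparse/experimental/BlockSparseMatMul.hpp>
#include <array>
#include <vector>

poplar::Tensor denseTimesSparse(poplar::Graph &graph,
                                poplar::program::Sequence &prog) {
  using namespace popsparse::experimental;
  const std::array<int, 3> dim = {64, 64, 64};       // rows(LHS), cols(LHS), cols(RHS)
  const std::array<int, 3> blockSize = {16, 16, 16}; // matching block sizes
  // 4x4 block mask for the right-hand matrix: keep only the diagonal blocks.
  const std::vector<unsigned char> rhsSparsity = {1, 0, 0, 0,
                                                  0, 1, 0, 0,
                                                  0, 0, 1, 0,
                                                  0, 0, 0, 1};
  BSMatMulParams params(dim, blockSize, rhsSparsity,
                        /*rhsNeedTranspose=*/false, poplar::FLOAT,
                        poplar::FLOAT, poplar::FLOAT);

  poplar::Tensor lhs = createBSMatMulInputLHS(graph, params, "lhs");
  poplar::Tensor rhs = createBSMatMulInputRHS(graph, params, "rhs");
  return bsMatMul(graph, params, prog, lhs, rhs, {}, "bsMatMul");
}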

class BSMatMulParams
#include <BlockSparseMatMul.hpp>

This class supports block-sparse matrix multiplication.

The class only saves the sparsity mask, the matrix size, the block size, and the data type, which are used to generate the computation graph.

The matrix data is passed when bsMatMul() gets called.

The purpose of this design is to reuse the instance of this class when only the data of the matrix is changed, and the matrix sparsity does not change.

Public Functions

BSMatMulParams(const std::array<int, 3> &dim, const std::array<int, 3> &blockSize, const std::vector<unsigned char> &rhsSparsity, bool rhsNeedTranspose, poplar::Type inDataType, poplar::Type outDataType, poplar::Type partialDataType, unsigned numGroupsIn = 1)

This constructor is for a dense matrix (left side) multiplying a sparse matrix (right side).

Parameters
  • dim[0] – Number of rows in the left-hand matrix.

  • dim[1] – Number of columns in the left-hand matrix.

  • dim[2] – If the right matrix needs to be transposed, this is the number of rows in the right-hand matrix. Otherwise, it is the number of columns in the right-hand matrix.

  • blockSize[0] – Block size of the rows in the left-hand matrix.

  • blockSize[1] – Block size of the columns in the left-hand matrix.

  • blockSize[2] – Block size of the columns in the right-hand matrix. Block size must be divisible by 16 for FP16 and divisible by 8 for FP32.

  • rhsSparsity – The 2D sparsity mask for the right-hand block-sparse matrix, in which ‘1’ is a non-zero block and ‘0’ is a zero block. For a group operation, this parameter is the concatenation of the sparsity masks for all operations in the group.

  • rhsNeedTranspose – Whether the right-hand matrix needs to be transposed. This is mostly to support the backward pass. If this parameter is true:

    • dim and blockSize must conform to the transposed shape.

    • rhsSparsity must be in the original, non-transposed order.

    • rhsMatrix in bsMatMul() must contain data within blocks in original, non-transposed order.

  • inDataType – Input data type.

  • outDataType – Output data type.

  • partialDataType – Partial data type.

  • numGroupsIn – The number of groups for group operation or 1 for non-group operation.

BSMatMulParams(const std::array<int, 3> &dim, const std::array<int, 3> &blockSize, const std::vector<unsigned char> &resSparsity, poplar::Type inDataType, poplar::Type outDataType, poplar::Type partialDataType, SubBlockMask subBlockMask = SubBlockMask::None, unsigned numGroupsIn = 1)

This constructor is for a dense matrix multiplying a dense matrix.

The multiply is performed as a sparse operation and the result stored as a sparse matrix.

Parameters
  • dim[0] – Number of rows in the left-hand matrix.

  • dim[1] – Number of columns in the left-hand matrix.

  • dim[2] – Number of columns in the right-hand matrix.

  • blockSize[0] – Block size of the rows in the left-hand matrix.

  • blockSize[1] – Block size of the columns in the left-hand matrix.

  • blockSize[2] – Block size of the columns in the right-hand matrix. The block size of the columns in the left-hand matrix equals the block size of the rows in the right-hand matrix. Block size must be divisible by 16 for FP16 and divisible by 8 for FP32.

  • resSparsity – The 2D sparsity mask for the result block-sparse matrix, in which ‘1’ is a non-zero block and ‘0’ is a zero block. For a group operation, this parameter is the concatenation of the sparsity masks for all operations in the group.

  • inDataType – Input data type.

  • outDataType – Output data type.

  • partialDataType – Partial data type.

  • subBlockMask – The mask inside a block. See SubBlockMask in BlockSparse.hpp for details.

  • numGroupsIn – The number of groups for group operation or 1 for non-group operation.

BSMatMulParams(BSMatMulParams &&other)
~BSMatMulParams()

Public Members

std::unique_ptr<BSMatMulImpl> impl

Note: in the API, the sparse-weight matrix representing the parameters of the fully-connected layer per group is W, with a dense shape of [outputChannelsPerGroup, inputChannelsPerGroup].

The equivalent dense operations performed for the different passes are shown below, where each multiplication is per group.

  • Fwd/Inf: Ao = W * Ai

    Where:
    - Ao has shape [outputChannelsPerGroup, batchSize]
    - Ai has shape [inputChannelsPerGroup, batchSize]

  • GradA: Gi = W’ * Go

    Where:
    - Go has shape [outputChannelsPerGroup, batchSize]
    - Gi has shape [inputChannelsPerGroup, batchSize]

  • GradW: Gw = Go * Ai’

popsparse/MatMul.hpp

namespace popsparse

Support for sparse matrices.

namespace dynamic

Support for dynamic sparse matrices.

Functions

SparseTensor createSparseDenseMatMulLHS(poplar::Graph &graph, const poplar::Type &inputType, const MatMulParams &params, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Create a sparse tensor that is used as the left-hand operand in a sparse * dense matrix multiplication.

The following options are available:

  • availableMemoryProportion Decimal between 0 and 1 [=0.6]

    The maximum proportion of available memory on each tile that this layer should consume temporarily during the course of the operation.

  • metaInfoBucketOversizeProportion Decimal between 0 and 1 [=0.3]

    This specifies additional elements to allocate in each bucket of meta-information as a proportion of the required size for a perfectly uniformly distributed sparsity pattern.

  • partialsType poplar::Type [=poplar::FLOAT]

    The type to use for partial results.

  • sharedBuckets (true, false) [=true]

    If set, forces the same buckets to be used whether or not the sparse (left-hand) operand is transposed. This saves memory at the expense of runtime.

Parameters
  • graph – The Poplar graph.

  • inputType – The type for inputs to the operation.

  • params – Parameters for the matrix multiplication.

  • debugContext – Optional debug information.

  • options – Implementation options for the matrix multiplication.

  • cache – Optional pointer to planning cache to use.

Returns

A sparse tensor with sparse representation of left-hand operand for the matrix multiplication.

poplar::Tensor createSparseDenseMatMulRHS(poplar::Graph &graph, const poplar::Type &inputType, const MatMulParams &params, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Create a dense tensor that is used as the right-hand operand in a sparse * dense matrix multiplication.

Parameters
  • graph – The Poplar graph.

  • inputType – The type for inputs to the operation.

  • params – Parameters for the matrix multiplication.

  • debugContext – Optional debug information.

  • options – Implementation options for the matrix multiplication.

  • cache – Optional pointer to planning cache to use.

Returns

A dense tensor for use as right-hand operand for the matrix multiplication.

poplar::Tensor sparseDenseMatMul(poplar::Graph &graph, const SparseTensor &lhs, const poplar::Tensor &rhs, poplar::program::Sequence &prog, bool transposeLHS = false, bool transposeRHS = false, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Perform a sparse * dense matrix multiplication, yielding a dense result.

The sparse left-hand operand tensor is made up of meta information for the sparsity and the non-zero values of the matrix. This sparse tensor must have been created with createSparseDenseMatMulLHS.

If the sparse left-hand operand was created for the sparse equivalent of a dense matrix multiplication:

[groups][m][k] * [groups][k][n] = [groups][m][n]

Then the same sparse left-hand operand can be used to calculate the above as well as:

[groups][k][m] * [groups][m][n] = [groups][k][n]

through the use of the transposeLHS parameter. transposeRHS is also provided for convenience.

Parameters
  • graph – The Poplar graph.

  • lhs – The sparse left-hand operand to the matrix multiplication.

  • rhs – The dense right-hand operand to the matrix multiplication.

  • prog – A reference to a program sequence which will be appended with the code to perform the matrix multiplication.

  • transposeLHS – Whether or not to transpose the left-hand operand before multiplying.

  • transposeRHS – Whether or not to transpose the right-hand operand before multiplying.

  • debugContext – Optional debug information.

  • options – Implementation options for the matrix multiplication.

  • cache – Optional pointer to planning cache to use.

Returns

The tensor holding the dense result of the matrix multiplication. The tensor will be created, added to the graph, and mapped to tiles.
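
As a usage sketch (illustrative shapes, 10% density), the following wires together MatMulParams (see MatMulParams.hpp below), the two create functions and sparseDenseMatMul(). It assumes graph and prog already exist.

#include <popsparse/MatMul.hpp>
#include <popsparse/MatMulParams.hpp>
#include <popsparse/PlanningCache.hpp>
#include <popsparse/SparsityParams.hpp>

poplar::Tensor sparseByDense(poplar::Graph &graph,
                             poplar::program::Sequence &prog) {
  using namespace popsparse::dynamic;
  PlanningCache cache;
  SparsityParams sparsityParams(SparsityType::Element,
                                SparsityStructure::Unstructured);
  // Equivalent dense multiplication: [1][256][512] * [1][512][64].
  auto params = MatMulParams::createWithNzRatio(
      sparsityParams, /*nzRatio=*/0.1, /*groups=*/1,
      /*m=*/256, /*k=*/512, /*n=*/64);

  SparseTensor lhs = createSparseDenseMatMulLHS(graph, poplar::FLOAT, params,
                                                "lhs", {}, &cache);
  poplar::Tensor rhs = createSparseDenseMatMulRHS(graph, poplar::FLOAT, params,
                                                  "rhs", {}, &cache);
  return sparseDenseMatMul(graph, lhs, rhs, prog, /*transposeLHS=*/false,
                           /*transposeRHS=*/false, "matmul", {}, &cache);
}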

popsparse/MatMulParams.hpp

namespace popsparse

Support for sparse matrices.

namespace dynamic

Support for dynamic sparse matrices.

Functions

std::ostream &operator<<(std::ostream&, const MatMulParams&)
class MatMulParams

Matrix multiplication parameters

These are the parameters which define a matrix multiplication with one sparse operand (always the left-hand operand) and one dense operand.

The equivalent dense multiplication for the given parameters is as follows:

[groups][m][k] * [groups][k][n] = [groups][m][n]

static MatMulParams createWithNzRatio(const SparsityParams &sparsityParams, double nzRatio, std::size_t groups, std::size_t m, std::size_t k, std::size_t n)
static MatMulParams createWithNumNonZeroValues(const SparsityParams &sparsityParams, std::size_t numNonZeroElems, std::size_t groups, std::size_t m, std::size_t k, std::size_t n)
inline const SparsityParams &getSparsityParams() const
inline std::size_t getNumGroups() const
inline std::size_t getM() const
inline std::size_t getK() const
inline std::size_t getN() const
double getNzRatio() const
std::size_t getNumNonZeroValues() const
friend bool operator<(const MatMulParams &a, const MatMulParams &b)
friend bool operator==(const MatMulParams &a, const MatMulParams &b)
friend bool operator!=(const MatMulParams &a, const MatMulParams &b)

Private Members

SparsityParams sparsityParams
double nzRatio
std::size_t groups
std::size_t m
std::size_t k
std::size_t n

popsparse/Embedding.hpp

namespace popsparse

Support for sparse matrices.

namespace dynamic

Support for dynamic sparse matrices.

Functions

poplar::Tensor createIndicesTensor(poplar::Graph &graph, const FullyConnectedParams &params, std::size_t numIndices, const poplar::OptionFlags &options = {}, const poplar::DebugContext &debugContext = {})

Create and map a tensor to contain indices for slicing/updating a tensor efficiently.

Parameters
  • graph – The Poplar graph.

  • params – Parameters for the fully connected layer which defines the embedding operation. Used to decide on layout for the indices.

  • numIndices – The number of indices this tensor should contain.

  • options – Implementation options for the fully connected layer.

  • debugContext – Optional debug information.

Returns

A 1D tensor of shape [numIndices]. Element type is always UNSIGNED_INT.

poplar::Tensor createSliceTensor(poplar::Graph &graph, const poplar::Type &dataType, const FullyConnectedParams &params, std::size_t numIndices, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Create and map a tensor to be sliced into or updated from efficiently.

Memory layout is based on the planned split of the sparse tensor.

Parameters
  • graph – The Poplar graph.

  • dataType – The data type of the returned tensor.

  • params – Parameters for the fully connected layer which will provide the planned memory layout for the sparse tensor being updated.

  • numIndices – The number of slices this tensor should contain.

  • debugContext – Optional debug information.

  • options – Implementation options for the fully connected layer.

  • cache – Optional pointer to planning cache to use.

Returns

A 2D tensor with shape [numIndices, params.getInputChannels()] with layout optimised for slicing into/updating from.

poplar::Tensor embeddingSlice(poplar::Graph &graph, const SparseTensor &t, const poplar::Tensor &indices, poplar::program::Sequence &prog, const FullyConnectedParams &params, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Take multiple slices from a base tensor.

The returned tensor will have dimensions [offsets, k (from params)]

Parameters
  • graph – The Poplar graph.

  • t – The sparse tensor being sliced.

  • indices – The indices of rows of t to be sliced.

  • prog – The program to be extended.

  • params – Parameters for the fully connected layer which will provide the planned memory layout for the sparse tensor being sliced.

  • debugContext – Optional debug information.

  • options – Implementation options for the fully connected layer.

  • cache – Optional pointer to planning cache to use.

void embeddingUpdateAdd(poplar::Graph &graph, const SparseTensor &t, const poplar::Tensor &slices, const poplar::Tensor &indices, const poplar::Tensor &scale, poplar::program::Sequence &prog, const FullyConnectedParams &params, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Update a sparse tensor with a set of slices at the given row indices.

Parameters
  • graph – The Poplar graph.

  • t – The sparse tensor being updated.

  • slices – The slices to accumulate.

  • indices – The indices of rows of t to accumulate each slice in slices into.

  • scale – The scaling to apply to the update.

  • prog – The program to be extended.

  • params – Parameters for the fully connected layer which will provide the planned memory layout for the sparse tensor being updated.

  • debugContext – Optional debug information.

  • options – Implementation options for the fully connected layer.

  • cache – Optional pointer to planning cache to use.
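
The following sketch (not part of the reference) shows the slice/update pair in context. It assumes graph, prog, a sparse tensor weights created with createFullyConnectedWeights() (see FullyConnected.hpp), and matching FullyConnectedParams params already exist; the scale value is illustrative.

#include <popsparse/Embedding.hpp>

void embeddingLookupAndUpdate(
    poplar::Graph &graph, poplar::program::Sequence &prog,
    const popsparse::dynamic::SparseTensor &weights,
    const popsparse::dynamic::FullyConnectedParams &params) {
  using namespace popsparse::dynamic;
  const std::size_t numIndices = 8;
  // Fill `indices` with row numbers (for example via a host write) before
  // running the program.
  poplar::Tensor indices = createIndicesTensor(graph, params, numIndices);

  // Gather: the result has one row per index.
  poplar::Tensor slices =
      embeddingSlice(graph, weights, indices, prog, params, "slice");

  // Scatter-add the (possibly modified) slices back, scaled by -learningRate.
  poplar::Tensor scale = graph.addConstant(poplar::FLOAT, {}, -0.01f);
  graph.setTileMapping(scale, 0);
  embeddingUpdateAdd(graph, weights, slices, indices, scale, prog, params,
                     "updateAdd");
}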

popsparse/FullyConnected.hpp

namespace popsparse

Support for sparse matrices.

namespace dynamic

Support for dynamic sparse matrices.

Functions

SparseTensor createFullyConnectedWeights(poplar::Graph &graph, const poplar::Type &inputType, const FullyConnectedParams &params, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Create a sparse tensor that is used as the weights W for a fully connected layer.

The following options are available:

  • availableMemoryProportion Decimal between 0 and 1 [=0.6]

    The maximum proportion of available memory on each tile that this layer should consume temporarily during the course of the operation.

  • metaInfoBucketOversizeProportion Decimal between 0 and 1 [=0.3]

    This specifies additional elements to allocate in each bucket of meta-information as a proportion of the required size for a perfectly uniformly distributed sparsity pattern.

  • doGradAPass (true, false) [=false]

    doGradWPass (true, false) [=false]

    Indicate which passes are present for the operation of the layer as a whole. It is assumed that the forward pass is always present.

  • partialsType poplar::Type [=poplar::FLOAT]

    The type to use for partial results.

  • sharedBuckets (true, false) [=true]

    If set, forces the same buckets to be used for all three passes.

Parameters
  • graph – The Poplar graph.

  • inputType – The type for inputs to the operation.

  • params – Parameters for the fully connected layer.

  • debugContext – Optional debug information.

  • options – Implementation options for the fully connected layer.

  • cache – Optional pointer to planning cache to use.

Returns

A tensor with sparse representation of weights for the fully connected layer.

poplar::Tensor createFullyConnectedInput(poplar::Graph &graph, const poplar::Type &inputType, const FullyConnectedParams &params, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Create a dense tensor that is used as the input activations for a fully connected layer.

The returned tensor is of shape [batchSize, inputChannelsPerGroup].

Parameters
  • graph – The Poplar graph.

  • inputType – The type for inputs to the operation.

  • params – Parameters for the fully connected layer.

  • debugContext – Optional debug information.

  • options – Implementation options for the fully connected layer. See createFullyConnectedWeights() for details.

  • cache – Optional pointer to planning cache to use.

poplar::Tensor fullyConnectedFwd(poplar::Graph &graph, const SparseTensor &weights, const poplar::Tensor &activations, const FullyConnectedParams &fcParams, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Run a fully connected forward (or inference) pass.

The sparse-weights tensor is made up of meta information for the sparsity and the non-zero values. Does the Fwd operation described in the Note above but with input and output transposed.

The meta information for the sparse weights tensor must be created for the forward (or inference) pass and should be created by use of the createFullyConnectedWeights() function.

Parameters
  • graph – The Poplar graph.

  • weights – Sparsity information of the weights tensor.

  • activations – The dense input activations, with shape [batchSize][inputChannelsPerGroup * numGroups].

  • fcParams – Fully connected layer parameters.

  • prog – A reference to a program sequence which will be appended with the code to perform the forward operation.

  • debugContext – Optional debug information.

  • options – The structure describing options on how the operation should be implemented. See createFullyConnectedWeights() for details.

  • cache – Optional pointer to planning cache to use.

Returns

The tensor holding the result. This tensor will be created, added to the graph and mapped to tiles. The result tensor is of shape [batchSize][outputChannelsPerGroup * numGroups]
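
A minimal end-to-end sketch of the forward pass (sizes and density are illustrative; assumes graph and prog already exist):

#include <popsparse/FullyConnected.hpp>
#include <popsparse/FullyConnectedParams.hpp>
#include <popsparse/PlanningCache.hpp>
#include <popsparse/SparsityParams.hpp>

poplar::Tensor sparseFcFwd(poplar::Graph &graph,
                           poplar::program::Sequence &prog) {
  using namespace popsparse::dynamic;
  PlanningCache cache;
  SparsityParams sparsityParams(SparsityType::Element,
                                SparsityStructure::Unstructured);
  auto params = FullyConnectedParams::createWithNzRatio(
      sparsityParams, /*nzRatio=*/0.1, /*batchSize=*/16, /*numGroups=*/1,
      /*inputChannels=*/512, /*outputChannels=*/256);

  SparseTensor weights = createFullyConnectedWeights(
      graph, poplar::FLOAT, params, "weights", {}, &cache);
  poplar::Tensor activations = createFullyConnectedInput(
      graph, poplar::FLOAT, params, "acts", {}, &cache);

  // Result has shape [batchSize][outputChannelsPerGroup * numGroups].
  return fullyConnectedFwd(graph, weights, activations, params, prog,
                           "fcFwd", {}, &cache);
}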

poplar::Tensor fullyConnectedGradA(poplar::Graph &graph, const SparseTensor &weights, const poplar::Tensor &gradients, const FullyConnectedParams &fcParams, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Run a fully connected GradA pass.

The sparse-weights tensor is made up of meta information for the sparsity and the non-zero values. Does the GradA computation as described in the Note above but with input and output transposed.

The meta information for the sparse-weights tensor must be created for the GradA pass and should be created by use of the createFullyConnectedWeights() function.

Parameters
  • graph – The Poplar graph.

  • weights – Sparsity information of the weights tensor.

  • gradients – The dense loss gradients with respect to the output activations, of shape [batchSize][outputChannelsPerGroup].

  • fcParams – Fully connected layer parameters.

  • prog – A reference to a program sequence which will be appended with the code to perform the GradA operation.

  • debugContext – Optional debug information.

  • options – The structure describing options on how the operation should be implemented. See createFullyConnectedWeights() for details.

  • cache – Optional pointer to planning cache to use.

Returns

The tensor holding the result. This tensor will be created, added to the graph and mapped to tiles. The tensor is of shape [batchSize][inputChannelsPerGroup * numGroups]

poplar::Tensor fullyConnectedSparseGradW(poplar::Graph &graph, const poplar::Tensor sparsityMetaInfo, const poplar::Tensor &gradA, const poplar::Tensor &activations, const FullyConnectedParams &fcParams, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Run a fully connected GradW pass to compute sparse gradients.

The layout of the returned tensor exactly matches that of the representation of the weights' NZ values, so that any elementwise operation may be done between the two.

The actual implementation differs from that in the Note above as the transpose of the gradients and activations are supplied as parameters to this function.

Parameters
  • graph – The Poplar graph.

  • sparsityMetaInfo – Meta information for sparse weights. See SparseTensor representation.

  • gradA – Dense gradients wrt output activations of shape [batchSize][outputChannelsPerGroup * numGroups]

  • activations – Input activations of shape [batchSize][inputChannelsPerGroup * numGroups]

  • fcParams – Fully connected layer parameters.

  • prog – A reference to a program sequence which will be appended with the code to perform the GradW operation.

  • debugContext – Optional debug information.

  • options – The structure describing options on how the operation should be implemented. See createFullyConnectedWeights() for details.

  • cache – Optional pointer to planning cache to use.

Returns

The tensor holding the result. This tensor will be created, added to the graph and mapped to tiles.

std::tuple<unsigned, unsigned, unsigned> fullyConnectedDenseGradWSerialSplits(const poplar::Graph &graph, const poplar::Type &inputType, const FullyConnectedParams &fcParams, const poplar::OptionFlags &options_ = {}, PlanningCache *cache = nullptr)

Report the serial splitting of a dense gradW output, given the memory proportion limit specified in options.

A dense gradW output is of shape [numGroups][inputSize][outputSize]

Parameters
  • graph – The Poplar graph.

  • inputType – The type of input.

  • fcParams – Fully connected parameters.

  • options – The structure describing options on how the operation should be implemented. See createFullyConnectedWeights() for details.

  • cache – Optional pointer to planning cache to use.

Returns

Serial splits for each of the output dimensions [numGroups][inputSize][outputSize].

popsparse/FullyConnectedParams.hpp

namespace popsparse

Support for sparse matrices.

namespace dynamic

Support for dynamic sparse matrices.

Functions

std::ostream &operator<<(std::ostream &os, const FullyConnectedParams &p)
class FullyConnectedParams

Fully connected parameters

These are the parameters which define a fully connected layer.

Matrix multiplications for the different passes are as follows

  • For pass = FC_INFERENCE or FC_TRAINING_FWD

    [numGroups][outputChannelsPerGroup][inputChannelsPerGroup] * [numGroups][inputChannelsPerGroup][batchSize]

  • For pass = FC_TRAINING_GRADA

    [numGroups][inputChannelsPerGroup][outputChannelsPerGroup] * [numGroups][outputChannelsPerGroup][batchSize]

  • For pass = FC_TRAINING_GRADW

    [numGroups][outputChannelsPerGroup][batchSize] * [numGroups][batchSize][inputChannelsPerGroup]

static FullyConnectedParams createWithNzRatio(const SparsityParams &sparsityParams, double nzRatio, std::size_t batchSize, std::size_t numGroups, std::size_t inputChannels, std::size_t outputChannels)

Create parameters with the specified ratio of non-zero elements.

static FullyConnectedParams createWithNumNonZeroValues(const SparsityParams &sparsityParams, std::size_t numNonZeroElems, std::size_t batchSize, std::size_t numGroups, std::size_t inputChannels, std::size_t outputChannels)

Create parameters with the specified number of non-zero elements.

Public Functions

inline std::size_t getBatchSize() const
inline std::size_t getNumGroups() const
inline std::size_t getInputChannels() const
inline std::size_t getOutputChannels() const
inline std::size_t getInputChannelsPerGroup() const
inline std::size_t getOutputChannelsPerGroup() const
inline const SparsityParams &getSparsityParams() const
double getNzRatio() const
std::size_t getNumNonZeroValues() const

Private Members

SparsityParams sparsityParams

Sparsity parameters.

double nzRatio

Proportion of weights which are non-zero in range [0,1].

std::size_t batchSize
std::size_t numGroups
std::size_t inputChannelsPerGroup
std::size_t outputChannelsPerGroup

Friends

friend bool operator<(const FullyConnectedParams &a, const FullyConnectedParams &b)
friend bool operator==(const FullyConnectedParams &a, const FullyConnectedParams &b)
friend bool operator!=(const FullyConnectedParams &a, const FullyConnectedParams &b)

popsparse/PlanningCache.hpp

namespace poplin

Linear algebra functions.

namespace matmul
namespace popsparse

Support for sparse matrices.

namespace dynamic

Support for dynamic sparse matrices.

class PlanningCache
#include <PlanningCache.hpp>

Class used to cache the calculation of plans for dynamically sparse operations.

This is optional and speeds up graph construction for these operations.

Public Functions

PlanningCache()
PlanningCache(poplin::matmul::PlanningCache *matMulCache)
~PlanningCache()

Public Members

std::unique_ptr<PlanningCacheImpl> impl

popsparse/SparsePartitioner.hpp

namespace popsparse

Support for sparse matrices.

namespace dynamic

Support for dynamic sparse matrices.

template<typename T>
class Partitioner
#include <SparsePartitioner.hpp>

Class to translate and encode sparsity information for a fully connected layer.

See createFullyConnectedWeights() for details of the options.

Public Functions

inline const PartitionerImpl &getImpl() const
Partitioner(const FullyConnectedParams &params, const poplar::Type &dataType, const poplar::Target &target, const poplar::OptionFlags &options, PlanningCache *cache = {}, std::string name = "")
Partitioner(const MatMulParams &params, const poplar::Type &dataType, const poplar::Target &target, const poplar::OptionFlags &options, PlanningCache *cache = {}, std::string name = "")
~Partitioner()
SparsityDataImpl<T> createSparsityDataImpl(const CSCMatrix<T> &matrix_) const

Create the implementation sparsity representation for a compressed sparse columns (CSC) matrix.

SparsityDataImpl<T> createSparsityDataImpl(const CSRMatrix<T> &matrix_) const

Create the implementation sparsity representation for a compressed sparse rows (CSR) matrix.

SparsityDataImpl<T> createSparsityDataImpl(const COOMatrix<T> &matrix_) const

Create the implementation sparsity representation for a coordinate (COO) format matrix.

COOMatrix<T> sparsityDataImplToCOOMatrix(const SparsityDataImpl<T> &sparsityDataImpl) const

Create a coordinate (COO) representation matrix from implementation sparsity representation.

The COO entries are ordered by row first, and then columns.

CSRMatrix<T> sparsityDataImplToCSRMatrix(const SparsityDataImpl<T> &sparsityDataImpl) const

Create compressed sparse rows (CSR) representation from implementation sparsity representation.

CSCMatrix<T> sparsityDataImplToCSCMatrix(const SparsityDataImpl<T> &sparsityDataImpl) const

Create compressed sparse columns (CSC) representation from implementation sparsity representation.

std::array<std::vector<std::size_t>, 3> getPlanPartitions(void) const

Fetch the partitions in X, Y and Z to reveal the plan.

std::size_t getmetaInfoBucketElements(void) const

Fetch the number of elements in a meta info bucket.

Private Members

std::string name
std::unique_ptr<PartitionerImpl> impl
template<typename T>
struct SparsityDataImpl
#include <SparsePartitioner.hpp>

Encoding of sparsity representation.

Public Members

std::vector<std::size_t> metaInfo

Meta information representing sparsity for each tile.

std::vector<T> nzValues

The non-zero values of the sparse matrix.
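
A sketch of translating a host-side CSR matrix (CSRMatrix is described in SparseStorageFormats.hpp below) into the device sparsity encoding. The CSR data and sizes are illustrative; it assumes params was created with FullyConnectedParams::createWithNumNonZeroValues() and is consistent with the CSR data.

#include <popsparse/SparsePartitioner.hpp>
#include <popsparse/SparseStorageFormats.hpp>
#include <vector>

popsparse::dynamic::SparsityDataImpl<float>
encodeCsr(const poplar::Target &target,
          const popsparse::dynamic::FullyConnectedParams &params) {
  using namespace popsparse;
  using namespace popsparse::dynamic;
  // A 4x4 element-sparse matrix with three non-zero values.
  std::vector<float> nzValues = {1.0f, 2.0f, 3.0f};
  std::vector<std::size_t> columnIndices = {0, 2, 3};
  std::vector<std::size_t> rowIndices = {0, 1, 2, 3, 3}; // numRows + 1 entries
  CSRMatrix<float> csr(nzValues, columnIndices, rowIndices);

  Partitioner<float> partitioner(params, poplar::FLOAT, target, {});
  // Returns the metaInfo and nzValues buckets to copy to the device tensors.
  return partitioner.createSparsityDataImpl(csr);
}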

popsparse/SparseStorageFormats.hpp

namespace popsparse

Support for sparse matrices.

struct Block

Subclassed by popsparse::COOMatrix< T >, popsparse::CSCMatrix< T >, popsparse::CSRMatrix< T >

Public Functions

inline std::size_t getNumColumnsInBlock() const
inline std::size_t getNumRowsInBlock() const
inline std::size_t getBlockSize() const
inline std::array<std::size_t, 2> getBlockDimensions() const

Protected Attributes

std::array<std::size_t, 2> blockDimensions
template<typename T>
struct COOMatrix : public popsparse::Block
#include <SparseStorageFormats.hpp>

Block Sparse matrix stored as coordinate (COO) or triplets format.

The case of element sparsity is treated as a special case with block size equal to {number of rows in block, number of columns in block} = {1, 1}.

Public Functions

inline COOMatrix(const std::vector<T> &nzValues, const std::vector<std::size_t> &columnIndices, const std::vector<std::size_t> &rowIndices, const std::array<std::size_t, 2> &blockDimensions = {1, 1})
inline COOMatrix(std::vector<T> &&nzValues, std::vector<std::size_t> &&columnIndices, std::vector<std::size_t> &&rowIndices, const std::array<std::size_t, 2> &blockDimensions = {1, 1})
inline COOMatrix(std::size_t numNZValues, const std::array<std::size_t, 2> &blockDimensions_ = {1, 1})

Constructor to allocate memory.

inline COOMatrix(const std::array<std::size_t, 2> &blockDimensions = {1, 1})
COOMatrix(const COOMatrix&) = default

Public Members

std::vector<T> nzValues

The non-zero values of the sparse matrix.

std::vector<std::size_t> columnIndices

Corresponding column indices for the non-zero values.

std::vector<std::size_t> rowIndices

Corresponding row indices for the non-zero values.

template<typename T>
struct CSCMatrix : public popsparse::Block
#include <SparseStorageFormats.hpp>

Sparse matrix stored in compressed sparse columns (CSC) format for a matrix of size [M x N].

There is no explicit encoding of M in the storage. The number of column indices is equal to (N/number of columns in block) + 1. The case of element sparsity is treated as a special case with block size equal to {number of rows in block, number of columns in block} = {1, 1}.

Public Functions

inline CSCMatrix(const std::vector<T> &nzValues, const std::vector<std::size_t> &columnIndices, const std::vector<std::size_t> &rowIndices, const std::array<std::size_t, 2> &blockDimensions = {1, 1})
inline CSCMatrix(std::vector<T> &&nzValues, std::vector<std::size_t> &&columnIndices, std::vector<std::size_t> &&rowIndices, const std::array<std::size_t, 2> &blockDimensions = {1, 1})
inline CSCMatrix(std::size_t numNZValues, std::size_t numColumns, const std::array<std::size_t, 2> &blockDimensions_ = {1, 1})

Constructor to allocate memory.

inline CSCMatrix(const std::array<std::size_t, 2> &blockDimensions_ = {1, 1})
CSCMatrix(const CSCMatrix&) = default

Public Members

std::vector<T> nzValues

The non-zero values of the sparse matrix.

The number of values is always an integer multiple of the block size.

std::vector<std::size_t> columnIndices

Indices where the non-zero values for each column block start.

There are a total of (N / block size) + 1 entries, with the last entry equal to the number of entries in nzValues.

std::vector<std::size_t> rowIndices

The row index of each block in nzValues.

There are as many entries as the number of blocks in nzValues.

template<typename T>
struct CSRMatrix : public popsparse::Block
#include <SparseStorageFormats.hpp>

Sparse matrix stored in compressed sparse rows (CSR) format for a matrix of size [M x N].

There is no explicit encoding of N in the storage. The number of row indices is equal to (M / number of rows in block) + 1. The case of element sparsity is treated as a special case with block size equal to {number of rows in block, number of columns in block} = {1, 1}.

Public Functions

inline CSRMatrix(const std::vector<T> &nzValues, const std::vector<std::size_t> &columnIndices, const std::vector<std::size_t> &rowIndices, const std::array<std::size_t, 2> &blockDimensions = {1, 1})
inline CSRMatrix(std::vector<T> &&nzValues, std::vector<std::size_t> &&columnIndices, std::vector<std::size_t> &&rowIndices, const std::array<std::size_t, 2> &blockDimensions = {1, 1})
inline CSRMatrix(std::size_t numNZValues, std::size_t numRows, const std::array<std::size_t, 2> &blockDimensions_ = {1, 1})
inline CSRMatrix(const std::array<std::size_t, 2> &blockDimensions_ = {1, 1})
CSRMatrix(const CSRMatrix&) = default

Public Members

std::vector<T> nzValues

The non-zero values of the sparse matrix.

std::vector<std::size_t> columnIndices

The column index of each block in nzValues.

There are as many entries as there are blocks in nzValues.

std::vector<std::size_t> rowIndices

Indices where non-zero blocks of each row start.

There are a total of (M / number of rows in block) + 1 entries, with the last entry equal to the number of entries in nzValues.

popsparse/SparseTensor.hpp

namespace popsparse

Support for sparse matrices.

namespace dynamic

Support for dynamic sparse matrices.

class SparseTensor
#include <SparseTensor.hpp>

Representation of a sparse tensor.

Public Functions

SparseTensor() = default
SparseTensor(const SparseTensor &t) = default
inline SparseTensor(const poplar::Tensor &metaInfo, const poplar::Tensor &nzValues, const poputil::TensorMetaData &opMetaData = {})
inline const poplar::Tensor &getMetaInfoTensor() const
inline const poplar::Tensor &getNzValuesTensor() const
inline const poputil::TensorMetaData &getOpMetaData() const

Private Members

poplar::Tensor metaInfo

Tensor containing positional sparsity information.

poplar::Tensor nzValues

Tensor containing the non-zero values.

poputil::TensorMetaData opMetaData

Meta-data for this tensor object.

popsparse/SparsityParams.hpp

namespace popsparse

Support for sparse matrices.

namespace dynamic

Support for dynamic sparse matrices.

Enums

enum SparsityType

Sparsity type.

Values:

enumerator Element

Sparsity is defined at an element level.

enumerator Block

Sparsity is defined at a block level.

The matrix is made up of blocks, where each block is either all zero or not.

enum SparsityStructure

Sparsity structure.

Values:

enumerator Unstructured

Functions

std::ostream &operator<<(std::ostream &os, const SparsityType &t)
std::ostream &operator<<(std::ostream &os, const SparsityStructure &s)
struct SparsityParams

Public Functions

inline SparsityParams(SparsityType type_ = SparsityType::Element, SparsityStructure structure_ = SparsityStructure::Unstructured, std::array<std::size_t, 2> blockDimensions_ = {1, 1})
SparsityParams(const SparsityParams&) = default

Public Members

SparsityType type

Sparsity type.

SparsityStructure structure

Sparsity structure.

std::array<std::size_t, 2> blockDimensions

Block dimensions.

Friends

friend bool operator<(const SparsityParams &a, const SparsityParams &b)
friend bool operator==(const SparsityParams &a, const SparsityParams &b)
friend bool operator!=(const SparsityParams &a, const SparsityParams &b)
friend std::ostream &operator<<(std::ostream &os, const SparsityParams &p)
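
For block sparsity, a SparsityParams would be constructed with SparsityType::Block and the block dimensions, for example (values illustrative):

#include <popsparse/SparsityParams.hpp>

popsparse::dynamic::SparsityParams makeBlockSparsityParams() {
  using namespace popsparse::dynamic;
  // 8x8 blocks, unstructured placement of the non-zero blocks.
  return SparsityParams(SparsityType::Block, SparsityStructure::Unstructured,
                        /*blockDimensions=*/{8, 8});
}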

Neural network functions (popnn)

Functions used in neural networks (for example, non-linearities, pooling, loss functions).

popnn/BatchNorm.hpp

namespace popnn

Functions used in neural networks.

namespace bn

Functions

std::pair<poplar::Tensor, poplar::Tensor> batchNormStatistics(poplar::Graph &graph, const poplar::Tensor acts, float eps, poplar::program::Sequence &prog, bool unbiasedVarEstimate, bool stableAlgo = false, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Estimate mean and inverse of standard deviation of batched activations.

std::pair<poplar::Tensor, poplar::Tensor> distributedBatchNormStatistics(poplar::Graph &replicatedGraph, const poplar::Tensor acts, float eps, poplar::program::Sequence &prog, bool unbiasedVarEstimate, poplin::DistributedNormReduceCallback reduceCallback, unsigned normBatchSize, bool stableAlgo = false, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the batch normalisation statistics for a part of the activations tensor where the normBatchSize batch elements are distributed over multiple replicas.

Each replica gets equal-sized batches (N). A callback does the required reduction over multiple replicas. The activations tensor is of shape [N][C][..F..]. The mean and inverse standard deviation are computed over dimensions {[N] [..F..]} and vectors of length C are returned as estimates.

Parameters
  • replicatedGraph – The replicated graph in which the computation is performed.

  • acts – The activation with shape [N][C][..F..] where:

    • N is the batch size

    • C is the number of channels

    • ..F.. is dimensions of a N-dimensional field.

  • eps – The epsilon added to the variance to avoid divide by zero.

  • prog – A program sequence that the code to perform the normalisation will be appended to.

  • unbiasedVarEstimate – Compute unbiased variance estimate.

  • stableAlgo – If true, computes the mean first and subtracts it from the activations before computing the variance. The implementation with this flag set to true is slower than when set to false.

  • partialsType – Poplar type used for partials.

  • reduceCallback – Callback to perform all-reduce over ‘normBatchSize’ batch elements.

  • normBatchSize – Number of batch elements over which statistics are estimated.

  • debugContext – Optional debug information.

Returns

A vector pair with mean and inverse standard deviation.

poplar::Tensor batchNormWhiten(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Whiten activations given mean and standard deviation.

std::pair<poplar::Tensor, poplar::Tensor> batchNormalise(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gamma, const poplar::Tensor &beta, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Batch normalise activations given mean, standard deviation and batch norm parameters.

The result is two tensors

  1. normalised activations

  2. whitened activations

poplar::Tensor batchNormalise(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &combinedMultiplicand, const poplar::Tensor &addend, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Computes the output of batch normalisation given:

  • combinedMultiplicand = gamma / stdDev

  • addend = beta - gamma * mean / stdDev

std::pair<poplar::Tensor, poplar::Tensor> batchNormParamGradients(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gradsIn, const poplar::Tensor &mean, const poplar::Tensor &iStdDev, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients w.r.t parameters required for parameter update.

std::pair<poplar::Tensor, poplar::Tensor> batchNormParamGradients(poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients w.r.t parameters required for parameter update.

poplar::Tensor batchNormGradients(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gradsIn, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, const poplar::Tensor &gamma, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients w.r.t input activations for the batch norm layer.

i.e. gradients are propagated through the complete layer including statistics computation.

poplar::Tensor batchNormGradients(poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, const poplar::Tensor &invStdDev, const poplar::Tensor &gamma, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients w.r.t input activations for the batch norm layer.

i.e. gradients are propagated through the complete layer including statistics computation.

poplar::Tensor distributedBatchNormGradients(poplar::Graph &replicatedGraph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, const poplar::Tensor &invStdDev, const poplar::Tensor &gamma, poplar::program::Sequence &prog, poplin::DistributedNormReduceCallback reduceCallback, unsigned normBatchSize, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Propagate the gradients through the batch norm layer where equal sized batch elements are distributed over replicas to effectively compute the batch norm over normBatchSize elements.

Each replica gets the same number of batches (N), with normBatchSize = N * the number of devices. A callback does the required reduction over the replicas that the norm is spread over.

The input to the layer is the output gradients from the normalisation layer. The whitened activations and the input gradients must have undergone a prior rearrangement such that the channel dimension is the same as invStdDev.

Parameters
  • replicatedGraph – The replicated graph to which the normalisation operation is added.

  • actsWhitened – Forward whitened activations.

  • gradsIn – Input gradients to the normalisation layer.

  • invStdDev – Inverse standard deviation from norm statistics.

  • gamma – Parameter gamma.

  • prog – A program sequence that the code to perform the normalisation will be appended to.

  • reduceCallback – A call back to perform all reduce of the statistics gradients.

  • normBatchSize – The batch size over which the norm is done.

  • debugContext – Optional debug information.

poplar::Tensor distributedBatchNormGradients(poplar::Graph &replicatedGraph, const poplar::Tensor &acts, const poplar::Tensor &gradsIn, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, const poplar::Tensor &gamma, poplar::program::Sequence &prog, poplin::DistributedNormReduceCallback reduceCallback, unsigned normBatchSize, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Propagate the gradients through the batch norm layer where equal sized batch elements are distributed over replicas to effectively compute the batch norm over normBatchSize elements.

Each replica gets the same number of batches (N), with normBatchSize = N * the number of replicas. A callback does the required reduction over the replicas that the norm is spread over.

The input to the layer is the output gradients from the normalisation layer. The activations and the input gradients must have undergone a prior rearrangement such that the channel dimension has the same elements as invStdDev. The activations are whitened within the function by applying the mean and invStdDev.

Parameters
  • replicatedGraph – The replicated graph to which the normalisation operation is added.

  • acts – Inputs to the batch norm layer.

  • gradsIn – Input gradients to the normalisation layer.

  • mean – Estimated mean.

  • invStdDev – Inverse standard deviation from norm statistics.

  • gamma – Parameter gamma.

  • prog – A program sequence that the code to perform the normalisation will be appended to.

  • reduceCallback – A call back to perform all reduce of the statistics gradients.

  • normBatchSize – The batch size over which the norm is done.

  • debugContext – Optional debug information.

void batchNormParamUpdate(poplar::Graph &graph, const poplar::Tensor &gammaDelta, const poplar::Tensor &betaDelta, float scale, poplar::Tensor &gamma, poplar::Tensor &beta, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
void batchNormParamUpdate(poplar::Graph &graph, const poplar::Tensor &gammaDelta, const poplar::Tensor &betaDelta, const poplar::Tensor &scale, poplar::Tensor &gamma, poplar::Tensor &beta, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
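
A brief forward-pass sketch combining batchNormStatistics() and batchNormalise() (assumes graph, prog, activations acts of shape [N][C][..F..], and per-channel gamma and beta tensors already exist; the epsilon value is illustrative):

#include <popnn/BatchNorm.hpp>
#include <tuple>

poplar::Tensor batchNormFwd(poplar::Graph &graph,
                            poplar::program::Sequence &prog,
                            const poplar::Tensor &acts,
                            const poplar::Tensor &gamma,
                            const poplar::Tensor &beta) {
  // Per-channel mean and inverse standard deviation of the batch.
  poplar::Tensor mean, invStdDev;
  std::tie(mean, invStdDev) = popnn::bn::batchNormStatistics(
      graph, acts, /*eps=*/1e-5f, prog, /*unbiasedVarEstimate=*/false,
      /*stableAlgo=*/true);

  // Returns the normalised and the whitened activations.
  poplar::Tensor normed, whitened;
  std::tie(normed, whitened) = popnn::bn::batchNormalise(
      graph, acts, gamma, beta, mean, invStdDev, prog);
  return normed; // the whitened activations are also available
}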

popnn/CTCInference.hpp

Support for Connectionist Temporal Classification (CTC) Beam search decoder.

Support for Connectionist Temporal Classification (CTC) Beam search decoder.

namespace popnn

Functions used in neural networks.

namespace ctc_infer

Functions

ctc::Plan plan(const poplar::Graph &graph, const poplar::Type &inType, unsigned batchSize, unsigned maxTime, unsigned numClasses, unsigned beamwidth, const poplar::OptionFlags &options = {})

Create a plan for implementing the CTC Beam search inference function.

CTC Beam search inference options

  • partialsType poplar::Type [=poplar::FLOAT]

    The type to use for partial results.

  • availableMemoryProportion Decimal between 0 and 1 (inclusive) [=0.6]

    The maximum proportion of available memory on each tile that this layer should consume temporarily during the course of the operation.

Parameters
  • graph – The graph the operation will be added to

  • inType – The data type of the probability data input

  • batchSize – The size of the batch to be processed at once

  • maxTime – The maximum time of any sequence input

  • numClasses – The number of symbols/classes in the “alphabet”, including the blankClass

  • beamwidth – The number of beams to maintain during beamsearch

  • options – Any implementation/debug options for the operation

Returns

plan The plan produced, which will specify how the operation is to be implemented

poplar::Tensor createDataInput(poplar::Graph &graph, const poplar::Type &type, const std::size_t batchSize, const std::size_t maxTime, const std::size_t numClasses, const ctc::Plan &plan, const poplar::DebugContext &debugContext = {})

Create and map a data input [maxTime, batchSize, numClasses] tensor which the beam search function will use.

Mapping is according to the plan provided.

Parameters
  • graph – The graph the data tensor will be added to

  • type – The data type of the tensor to be added to the graph

  • batchSize – The size of the batch to be processed at once

  • maxTime – The time dimension of the tensor to be created

  • numClasses – The number of symbols/classes in the “alphabet”, including the blankClass

  • plan – The plan which will specify how the tensor is to be mapped

  • debugContext – Optional debug information

Returns

The data input [maxTime, batchSize, numClasses] tensor

std::tuple<poplar::Tensor, poplar::Tensor, poplar::Tensor> beamSearchDecoderLogProbabilities(poplar::Graph &graph, const poplar::Tensor &logProbs, const poplar::Tensor &dataLengths, poplar::program::Sequence &prog, unsigned blankClass, unsigned beamwidth, unsigned topPaths, const ctc::Plan &plan, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Calculate the most likely topPaths labels and their probabilities given the input logProbs with lengths dataLengths, creating and mapping the result tensors according to the plan provided.

Parameters
  • graph – The graph the operation will be added to

  • logProbs – The data input [maxTime, batchSize, numClasses] tensor

  • dataLengths – A tensor of shape [batchSize] containing the number of valid timesteps in each logProbs batch entry

  • prog – A program sequence to append the operation to

  • blankClass – The value associated with the blankClass

  • beamwidth – The number of beams to use when decoding

  • topPaths – The number of most likely decoded paths to return, must be less than or equal to beamWidth

  • plan – The plan which will specify how the output tensor is to be mapped and how the operation is to be carried out

  • debugContext – Optional debug information

  • options – Any implementation/debug options for the operation

Returns

The labelProbs[batchSize, topPaths] (negative log probability with the same type as logProbs), labelLengths[batchSize, topPaths] and decodedLabels [batchSize, topPaths, maxTime] tensors
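
A sketch tying the plan, input creation and decode together (batch, time and class counts are illustrative; assumes graph, prog and a [batchSize] dataLengths tensor already exist):

#include <popnn/CTCInference.hpp>
#include <tuple>

void ctcBeamSearch(poplar::Graph &graph, poplar::program::Sequence &prog,
                   const poplar::Tensor &dataLengths) {
  const unsigned batchSize = 4, maxTime = 100, numClasses = 29;
  const unsigned blankClass = 0, beamwidth = 16, topPaths = 2;

  auto searchPlan = popnn::ctc_infer::plan(graph, poplar::HALF, batchSize,
                                           maxTime, numClasses, beamwidth);
  // Input tensor mapped according to the plan; fill it with log probabilities.
  poplar::Tensor logProbs = popnn::ctc_infer::createDataInput(
      graph, poplar::HALF, batchSize, maxTime, numClasses, searchPlan,
      "logProbs");

  poplar::Tensor labelProbs, labelLengths, decodedLabels;
  std::tie(labelProbs, labelLengths, decodedLabels) =
      popnn::ctc_infer::beamSearchDecoderLogProbabilities(
          graph, logProbs, dataLengths, prog, blankClass, beamwidth, topPaths,
          searchPlan);
  // labelProbs: [batchSize, topPaths], labelLengths: [batchSize, topPaths],
  // decodedLabels: [batchSize, topPaths, maxTime].
}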

std::tuple<poplar::Tensor, poplar::Tensor, poplar::Tensor> beamSearchDecoderLogits(poplar::Graph &graph, const poplar::Tensor &logits, const poplar::Tensor &dataLengths, poplar::program::Sequence &prog, unsigned blankClass, unsigned beamwidth, unsigned topPaths, const ctc::Plan &plan, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Calculate the most likely topPaths labels and their probabilities given the input logits with lengths dataLengths, creating and mapping the result tensors according to the plan provided.

Prior to performing the beam search, applies log softmax to logits input.

Parameters
  • graph – The graph the operation will be added to

  • logits – The data input [maxTime, batchSize, numClasses] tensor

  • dataLengths – A tensor of shape [batchSize] containing the number of valid timesteps in each logits batch entry

  • prog – A program sequence to append the operation to

  • blankClass – The value associated with the blankClass

  • beamwidth – The number of beams to use when decoding

  • topPaths – The number of most likely decoded paths to return; this must be less than or equal to beamwidth

  • plan – The plan which will specify how the output tensor is to be mapped and how the operation is to be carried out

  • debugContext – Optional debug information

  • options – Any implementation/debug options for the operation

Returns

The labelProbs[batchSize, topPaths] (negative log probability with the same type as logits), labelLengths[batchSize, topPaths] and decodedLabels [batchSize, topPaths, maxTime] tensors
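
These functions are typically used together: create a plan, create the plan-mapped data input, and then decode. The following sketch assumes the beam-search functions above are declared in popnn/CTCInference.hpp under a popnn::ctc_infer namespace, and uses arbitrary shapes, beam settings and debug strings; dataLengths is assumed to be an existing [batchSize] tensor of valid sequence lengths.

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <tuple>

#include <popnn/CTCInference.hpp> // header assumed for the beam-search API above

// Illustrative sketch only: shapes, beam settings and debug strings are arbitrary.
std::tuple<poplar::Tensor, poplar::Tensor, poplar::Tensor>
ctcDecodeExample(poplar::Graph &graph, poplar::program::Sequence &prog,
                 const poplar::Tensor &dataLengths) {
  const unsigned batchSize = 4, maxTime = 50, numClasses = 29;
  const unsigned blankClass = 0, beamwidth = 16, topPaths = 3;

  // Plan the operation, then create the input tensor mapped according to the plan.
  auto ctcPlan = popnn::ctc_infer::plan(graph, poplar::HALF, batchSize, maxTime,
                                        numClasses, beamwidth);
  auto logProbs = popnn::ctc_infer::createDataInput(
      graph, poplar::HALF, batchSize, maxTime, numClasses, ctcPlan, {"logProbs"});

  // Decode; returns label probabilities, label lengths and decoded labels.
  return popnn::ctc_infer::beamSearchDecoderLogProbabilities(
      graph, logProbs, dataLengths, prog, blankClass, beamwidth, topPaths,
      ctcPlan, {"ctcBeamSearch"});
}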

popnn/CTCLoss.hpp

Support for Connectionist Temporal Classification (CTC) Loss.

namespace popnn

Functions used in neural networks.

namespace ctc

Functions

Plan plan(const poplar::Graph &graph, const poplar::Type &inType, const poplar::Type &outType, unsigned batchSize, unsigned maxTime, unsigned maxLabelLength, unsigned numClasses, const poplar::OptionFlags &options = {})

Create a plan for implementing the CTC Loss (gradient) function.

CTC Loss options

  • partialsType poplar::Type [=poplar::FLOAT]

    The type to use for partial results.

  • availableMemoryProportion Decimal between 0 and 1 (inclusive) [=0.6]

    The maximum proportion of available memory on each tile that this layer should consume temporarily during the course of the operation.

Parameters
  • graph – The graph the operation will be added to

  • inType – The data type of the probability data input

  • outType – The data type of the gradient output

  • batchSize – The size of the batch to be processed at once

  • maxTime – The maximum time of any data input to be planned for

  • maxLabelLength – The maximum length of any label to be planned for

  • numClasses – The number of symbols/classes in the “alphabet”, including the blankClass

  • options – Any implementation/debug options for the operation

Returns

The plan produced, which will specify how the operation is to be implemented

poplar::Tensor createDataInput(poplar::Graph &graph, const poplar::Type &type, const std::size_t batchSize, const std::size_t maxTime, const std::size_t numClasses, const Plan &plan, const poplar::DebugContext &debugContext = {})

Create and map a data input [maxTime, batchSize, numClasses] tensor which the gradient function will use.

Mapping is according to the plan provided.

Parameters
  • graph – The graph the data tensor will be added to

  • type – The data type of the tensor to be added to the graph

  • batchSize – The size of the batch to be processed at once

  • maxTime – The time dimension of the tensor to be created

  • numClasses – The number of symbols/classes in the “alphabet”, including the blankClass

  • plan – The plan which will specify how the tensor is to be mapped

  • debugContext – Optional debug information

Returns

The data input [maxTime, batchSize, numClasses] tensor

poplar::Tensor createLabelsInput(poplar::Graph &graph, const poplar::Type &type, const std::size_t batchSize, const std::size_t maxLabelLength, const Plan &plan, const poplar::DebugContext &debugContext = {})

Create and map a labels input [batchSize, maxLabelLength] tensor which the gradient function will use.

Mapping is according to the plan provided.

Parameters
  • graph – The graph the labels tensor will be added to

  • type – The data type of the tensor to be added to the graph

  • batchSize – The size of the batch to be processed at once

  • maxLabelLength – The maximum length of any label

  • plan – The plan which will specify how the tensor is to be mapped

  • debugContext – Optional debug information

Returns

The labels input [batchSize, maxLabelLength] tensor

std::pair<poplar::Tensor, poplar::Tensor> calcLossAndGradientLogProbabilities(poplar::Graph &graph, const poplar::Type &outType, const poplar::Tensor &logProbs, const poplar::Tensor &labels, const poplar::Tensor &dataLengths, const poplar::Tensor &labelLengths, poplar::program::Sequence &prog, const unsigned blankClass, const Plan &plan, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Calculate the CTC loss & gradient, creating and mapping the result tensor according to the plan provided.

calcLossAndGradientLogProbabilities options

  • includeSoftmaxGradient (true, false) [=true]

    Whether or not to include LogSoftmax in the gradient calculation. Including it is recommended, to avoid numerical issues, but care must be taken not to also include the gradient of a LogSoftmax created outside this function call, otherwise that gradient is counted twice.

Parameters
  • graph – The graph the operation will be added to

  • outType – The data type of the gradient output

  • logProbs – The data input [maxTime, batchSize, numClasses] tensor

  • labels – The labels input [batchSize, maxLabelLength] tensor

  • dataLengths – A tensor of shape [batchSize] containing the number of valid timesteps in each data[] batch entry

  • labelLengths – A tensor of shape [batchSize] containing the number of valid labels in each labels[] batch entry

  • prog – A program sequence to append the operation to

  • blankClass – The value associated with the blankClass

  • plan – The plan which will specify how the output tensor is to be mapped and how the operation is to be carried out

  • debugContext – Optional debug information

  • options – Any implementation/debug options for the operation

Returns

The loss[batchSize] (negative log probability), and gradient [maxTime, batchSize, numClasses] tensor

std::pair<poplar::Tensor, poplar::Tensor> calcLossAndGradientLogits(poplar::Graph &graph, const poplar::Type &outType, const poplar::Tensor &logits, const poplar::Tensor &labels, const poplar::Tensor &dataLengths, const poplar::Tensor &labelLengths, poplar::program::Sequence &prog, const unsigned blankClass, const Plan &plan, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Calculate the CTC loss & gradient, creating and mapping the result tensor according to the plan provided.

Prior to performing the gradient calculation, applies log softmax to logits input.

Parameters
  • graph – The graph the operation will be added to

  • outType – The data type of the gradient output

  • logits – The data input [maxTime, batchSize, numClasses] tensor

  • labels – The labels input [batchSize, maxLabelLength] tensor

  • dataLengths – A tensor of shape [batchSize] containing the number of valid timesteps in each data[] batch entry

  • labelLengths – A tensor of shape [batchSize] containing the number of valid labels in each labels[] batch entry

  • prog – A program sequence to append the operation to

  • blankClass – The value associated with the blankClass

  • plan – The plan which will specify how the output tensor is to be mapped and how the operation is to be carried out

  • debugContext – Optional debug information

  • options – Any implementation/debug options for the operation

Returns

The loss[batchSize] (negative log probability), and gradient [maxTime, batchSize, numClasses] tensor
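
A typical use of this header is to plan, create the plan-mapped data and labels inputs, and then compute the loss and gradient. The following sketch uses arbitrary shapes, an assumed label data type and arbitrary debug strings; dataLengths and labelLengths are assumed to be existing [batchSize] tensors of valid lengths.

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <utility>

#include <popnn/CTCLoss.hpp>

// Illustrative sketch only: shapes and the label data type are arbitrary choices.
std::pair<poplar::Tensor, poplar::Tensor>
ctcLossExample(poplar::Graph &graph, poplar::program::Sequence &prog,
               const poplar::Tensor &dataLengths,
               const poplar::Tensor &labelLengths) {
  const unsigned batchSize = 4, maxTime = 50, maxLabelLength = 20;
  const unsigned numClasses = 29, blankClass = 0;

  // Plan the operation, then create the plan-mapped data and labels inputs.
  auto ctcPlan = popnn::ctc::plan(graph, poplar::HALF, poplar::FLOAT, batchSize,
                                  maxTime, maxLabelLength, numClasses);
  auto logits = popnn::ctc::createDataInput(graph, poplar::HALF, batchSize,
                                            maxTime, numClasses, ctcPlan,
                                            {"logits"});
  auto labels = popnn::ctc::createLabelsInput(graph, poplar::UNSIGNED_INT,
                                              batchSize, maxLabelLength, ctcPlan,
                                              {"labels"});

  // Applies log softmax to the logits, then returns {loss, gradient}.
  return popnn::ctc::calcLossAndGradientLogits(
      graph, poplar::FLOAT, logits, labels, dataLengths, labelLengths, prog,
      blankClass, ctcPlan, {"ctcLoss"});
}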

popnn/CTCPlan.hpp

Support for planning Connectionist Temporal Classification (CTC) Operations.

namespace popnn

Functions used in neural networks.

namespace ctc

Functions

bool operator<(const Plan &a, const Plan &b)
bool operator==(const Plan &a, const Plan &b)
bool operator!=(const Plan &a, const Plan &b)
class Plan
#include <CTCPlan.hpp>

An object representing a plan that describes how to map tensors and implement the CTC Loss or CTC Inference functions.

Public Functions

Plan()
~Plan()
Plan(const Plan &other)
Plan(Plan &&other)
Plan &operator=(const Plan &other)
Plan &operator=(Plan &&other)
inline Impl &getImpl() const
Plan(std::unique_ptr<Impl> impl)

Private Members

std::unique_ptr<Impl> impl

Friends

friend bool operator<(const Plan &a, const Plan &b)
friend bool operator==(const Plan &a, const Plan &b)
friend std::ostream &operator<<(std::ostream &o, const Plan &p)
friend poplar::ProfileValue toProfileValue(const Plan &p)

popnn/GroupNorm.hpp

namespace popnn

Functions used in neural networks.

namespace gn

Functions

std::pair<poplar::Tensor, poplar::Tensor> groupNormStatistics(poplar::Graph &graph, const poplar::Tensor acts, float eps, poplar::program::Sequence &prog, unsigned numGroups, bool unbiasedVarEstimate, bool stableAlgo = false, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Estimate mean and inverse of standard deviation of activations.

poplar::Tensor groupNormWhiten(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Whiten activations given mean and standard deviation.

std::pair<poplar::Tensor, poplar::Tensor> groupNormalise(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gamma, const poplar::Tensor &beta, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Group normalise activations given mean, standard deviation and batch norm parameters.

The result is two tensors

  1. normalised activations

  2. whitened activations

std::pair<poplar::Tensor, poplar::Tensor> groupNormParamGradients(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gradsIn, const poplar::Tensor &mean, const poplar::Tensor &iStdDev, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients w.r.t parameters for parameter update.

std::pair<poplar::Tensor, poplar::Tensor> groupNormParamGradients(poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients w.r.t parameters for parameter update.

poplar::Tensor groupNormGradients(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gradsIn, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, const poplar::Tensor &gamma, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients w.r.t input activations for the group norm layer.

Gradients are propagated through the complete layer including statistics computation.

poplar::Tensor groupNormGradients(poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, const poplar::Tensor &invStdDev, const poplar::Tensor &gamma, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients w.r.t input activations for the group norm layer.

Gradients are propagated through the complete layer including statistics computation.

void groupNormParamUpdate(poplar::Graph &graph, const poplar::Tensor &gammaDelta, const poplar::Tensor &betaDelta, float scale, poplar::Tensor &gamma, poplar::Tensor &beta, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
void groupNormParamUpdate(poplar::Graph &graph, const poplar::Tensor &gammaDelta, const poplar::Tensor &betaDelta, const poplar::Tensor &scale, poplar::Tensor &gamma, poplar::Tensor &beta, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
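
As a rough sketch of how the forward-pass functions compose (acts, gamma and beta are assumed to be existing tensors of suitable shape; eps and numGroups are arbitrary):

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <utility>

#include <popnn/GroupNorm.hpp>

// Illustrative sketch only: eps and numGroups are arbitrary values.
std::pair<poplar::Tensor, poplar::Tensor>
groupNormExample(poplar::Graph &graph, poplar::program::Sequence &prog,
                 const poplar::Tensor &acts, const poplar::Tensor &gamma,
                 const poplar::Tensor &beta) {
  const float eps = 1e-5f;
  const unsigned numGroups = 4;

  // Estimate the per-group mean and inverse standard deviation.
  auto [mean, invStdDev] = popnn::gn::groupNormStatistics(
      graph, acts, eps, prog, numGroups, /*unbiasedVarEstimate=*/false);

  // Normalise with the statistics and parameters; returns the normalised and
  // whitened activations.
  return popnn::gn::groupNormalise(graph, acts, gamma, beta, mean, invStdDev,
                                   prog);
}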

popnn/Gru.hpp

namespace popnn

Functions used in neural networks.

namespace gru

Functions

const std::vector<BasicGruCellUnit> getDefaultBasicGruCellOrder()

Get the default order of the gates in a basic GRU cell.

The default order is: [Reset gate, Update gate, Candidate].

uint64_t getBasicGruCellFwdFlops(const GruParams &params)
uint64_t getBasicGruCellBwdFlops(const GruParams &params)
uint64_t getBasicGruCellWuFlops(const GruParams &params)
poplar::Tensor createInput(poplar::Graph &graph, const GruParams &params, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Create an input tensor of shape [numSteps, batchSize, inputSize] which is optimally mapped to multiply the whole input sequence in a single matrix multiply operation.

GRU options

  • availableMemoryProportion Decimal between 0 and 1 (inclusive)

    See createWeights().

  • inferenceOnly (true, false) [=true]

    Sets convolution pass to INFERENCE_FWD if true; TRAINING_FWD otherwise. See createWeights().

  • partialsType (half, float) [=float]

    See createWeights().

Parameters
  • graph – Graph object to add the tensor to.

  • params – The GRU parameters.

  • debugContext – Optional debug information.

  • options – Any implementation/debug options for the GRU.

  • planningCache – A poplin matrix multiply planning cache.

Returns

A tensor created in the graph of shape [timeSteps, batchSize, inputSize].

poplar::Tensor createInitialState(poplar::Graph &graph, const GruParams &params, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options, poplin::matmul::PlanningCache *cache)
std::pair<poplar::Tensor, poplar::Tensor> createWeightsKernel(poplar::Graph &graph, const GruParams &params, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Create the weights kernel used to weight the input and output of a GRU.

Returns the inputWeights and outputWeights.

poplar::Tensor createWeightsBiases(poplar::Graph &graph, const GruParams &params, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Create the weights biases.

GruWeights createWeights(poplar::Graph &graph, const GruParams &params, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Create the weights (both kernel and biases) used to weight the input and output of a GRU.

poplar::Tensor createAttention(poplar::Graph &graph, const GruParams &params, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {})

Create attention tensor for AUGRU.

poplar::Tensor gruFwd(poplar::Graph &graph, const GruParams &params, const poplar::Tensor &stateInit, const poplar::Tensor &in, const GruWeights &weights, poplar::Tensor *intermediates, poplar::program::Sequence &fwdProg, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Calculate the result of applying a GRU across a sequence.

The following are the formulas for a GRU cell:

  • r_t = sigmoid(w_r * x_t + u_r * h_t-1 + b_r)

  • u_t = sigmoid(w_u * x_t + u_u * h_t-1 + b_u)

  • c_t = tanh(w_c * x_t + u_c * (r_t x h_t-1) + b_c)

  • h_t = u_t x h_t-1 + (1 - u_t) x c_t

Where:

  • * is matrix multiplication

  • x is Hadamard product

The GRU is run for seqSize steps, each with a batch of size batchSize, an input size of inputSize and an output size of outputSize. The total number of units within each GRU cell is BASIC_GRU_CELL_NUM_UNITS.

Parameters
  • graph – Graph to which the GRU cell belongs.

  • params – The parameters of the GRU.

  • stateInit – Initial state for the GRU.

  • in – The input tensor to the GRU of dimension [timesteps, batch, inputSize].

  • weights – The GRU weights structure.

  • intermediates – [out] Intermediate results that are retained in the forward pass of training for use in the backward pass. It includes the data for reset gate, update gate, candidate, and output if outputFullSequence is false. This argument should be set to null if we are only doing inference.

  • fwdProg – Program sequence.

  • debugContext – Optional debug information.

  • options – GRU implementation options. See createInput().

  • planningCache – The matmul planning cache.

Returns

The output of the GRU. Depending on the outputFullSequence parameter the output tensor is either the output of the last timestep in the shape [batch, outputSize] or it is the sequence of outputs for every timestep in the shape [timesteps, batch, outputSize].
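
A minimal inference-only sketch using the creation functions above (sizes and debug strings are arbitrary; default options and no planning cache are used):

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>

#include <popnn/Gru.hpp>

// Illustrative sketch only: the sizes chosen here are arbitrary.
poplar::Tensor gruForwardExample(poplar::Graph &graph,
                                 poplar::program::Sequence &prog) {
  const std::size_t batchSize = 2, timeSteps = 16;
  const std::size_t inputSize = 32, outputSize = 64;
  popnn::gru::GruParams params(poplar::HALF, batchSize, timeSteps,
                               {inputSize, outputSize});

  // Create the input, initial state and weights, mapped for the GRU.
  auto input = popnn::gru::createInput(graph, params, {"gruInput"});
  auto stateInit = popnn::gru::createInitialState(graph, params, {"gruState"},
                                                  {}, nullptr);
  auto weights = popnn::gru::createWeights(graph, params, {"gruWeights"});

  // Inference-only forward pass: no intermediates are retained.
  return popnn::gru::gruFwd(graph, params, stateInit, input, weights,
                            /*intermediates=*/nullptr, prog, {"gruFwd"});
}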

poplar::Tensor gruFwd(poplar::Graph &graph, const GruParams &params, const poplar::Tensor &stateInit, const poplar::Tensor &in, const poplar::Tensor &realTimeSteps, const GruWeights &weights, poplar::Tensor *intermediates, poplar::program::Sequence &fwdProg, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Deprecated:

Use the previously defined popnn::gruFwd() overload instead.

Calculate the result of applying a GRU across a sequence.

The following are the formulas for a GRU cell:

  • r_t = sigmoid(w_r * x_t + u_r * h_t-1 + b_r)

  • u_t = sigmoid(w_u * x_t + u_u * h_t-1 + b_u)

  • c_t = tanh(w_c * x_t + u_c * (r_t x h_t-1) + b_c)

  • h_t = u_t x h_t-1 + (1 - u_t) x c_t

Where:

  • * is matrix multiplication

  • x is Hadamard product

The GRU is run for seqSize steps, each with a batch of size batchSize, an input size of inputSize and an output size of outputSize. The total number of units within each GRU cell is BASIC_GRU_CELL_NUM_UNITS.

Parameters
  • graph – Graph to which the GRU cell belongs.

  • params – The parameters of the GRU.

  • stateInit – Initial state for the GRU.

  • in – The input tensor to the GRU of dimension [timesteps, batch, inputSize].

  • realTimeSteps – A tensor containing the real number of time steps for each sequence, shape: [batch].

  • weights – The GRU weights structure.

  • intermediates – [out] Intermediate results that are retained in the forward pass of training for use in the backward pass. It includes the data for reset gate, update gate, candidate, and output if outputFullSequence is false. This argument should be set to null if we are only doing inference.

  • fwdProg – Program sequence.

  • debugContext – Optional debug information.

  • options – GRU implementation options. See createInput().

  • planningCache – The matmul planning cache.

Returns

The output of the GRU. Depending on the outputFullSequence parameter the output tensor is either the output of the last timestep in the shape [batch, outputSize] or it is the sequence of outputs for every timestep in the shape [timesteps, batch, outputSize].

poplar::Tensor auGruFwd(poplar::Graph &graph, const GruParams &params, const poplar::Tensor &stateInit, const poplar::Tensor &in, const GruWeights &weights, poplar::Tensor *intermediates, const poplar::Tensor &attScores, poplar::program::Sequence &fwdProg, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Calculate the result of applying an AUGRU across a sequence.

The following are the formulas for an AUGRU cell:

  • r_t = sigmoid(w_r * x_t + u_r * h_t-1 + b_r)

  • u_t = sigmoid(w_u * x_t + u_u * h_t-1 + b_u)

  • c_t = tanh(w_c * x_t + u_c * (r_t x h_t-1) + b_c)

  • u_t = (1 - a_t) * u_t

  • h_t = u_t x h_t-1 + (1 - u_t) x c_t

Where:

  • * is matrix multiplication

  • x is Hadamard product

  • a_t is a scalar

The AUGRU is run for seqSize steps, each with a batch of size batchSize, an input size of inputSize and an output size of outputSize. The total number of units within each AUGRU cell is BASIC_GRU_CELL_NUM_UNITS.

Parameters
  • graph – Graph to which the AUGRU cell belongs.

  • params – The parameters of the AUGRU.

  • stateInit – Initial state for the AUGRU.

  • in – The input tensor to the AUGRU of dimension [timesteps, batch, inputSize].

  • weights – The AUGRU weights structure.

  • intermediates – [out] Intermediate results that are retained in the forward pass of training for use in the backward pass. It includes the data for reset gate, update gate, candidate, and output if outputFullSequence is false. This argument should be set to null if we are only doing inference.

  • attScores – Attention for each time step.

  • fwdProg – Program sequence.

  • debugContext – Optional debug information.

  • options – GRU implementation options. See createInput().

  • planningCache – The matmul planning cache.

Returns

The output of the GRU. Depending on the outputFullSequence parameter the output tensor is either the output of the last timestep in the shape [batch, outputSize] or it is the sequence of outputs for every timestep in the shape [timesteps, batch, outputSize].

poplar::Tensor auGruFwd(poplar::Graph &graph, const GruParams &params, const poplar::Tensor &stateInit, const poplar::Tensor &in, const poplar::Tensor &realTimeSteps, const GruWeights &weights, poplar::Tensor *intermediates, const poplar::Tensor &attScores, poplar::program::Sequence &fwdProg, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Deprecated:

Use the previously defined popnn::auGruFwd() overload instead.

Calculate the result of applying an AUGRU across a sequence.

The following are the formulas for an AUGRU cell:

  • r_t = sigmoid(w_r * x_t + u_r * h_t-1 + b_r)

  • u_t = sigmoid(w_u * x_t + u_u * h_t-1 + b_u)

  • c_t = tanh(w_c * x_t + u_c * (r_t x h_t-1) + b_c)

  • u_t = (1 - a_t) * u_t

  • h_t = u_t x h_t-1 + (1 - u_t) x c_t

Where:

  • * is matrix multiplication

  • x is Hadamard product

  • a_t is a scalar

The AUGRU is run for seqSize steps, each with a batch of size batchSize, an input size of inputSize and an output size of outputSize. The total number of units within each AUGRU cell is BASIC_GRU_CELL_NUM_UNITS.

Parameters
  • graph – Graph to which the AUGRU cell belongs.

  • params – The parameters of the AUGRU.

  • stateInit – Initial state for the AUGRU.

  • in – The input tensor to the AUGRU of dimension [timesteps, batch, inputSize].

  • realTimeSteps – A tensor containing the real number of time steps for each sequence, shape: [batch].

  • weights – The AUGRU weights structure.

  • intermediates – [out] Intermediate results that are retained in the forward pass of training for use in the backward pass. It includes the data for reset gate, update gate, candidate, and output if outputFullSequence is false. This argument should be set to null if we are only doing inference.

  • attScores – Attention for each time step.

  • fwdProg – Program sequence.

  • debugContext – Optional debug information.

  • options – GRU implementation options. See createInput().

  • planningCache – The matmul planning cache.

Returns

The output of the GRU. Depending on the outputFullSequence parameter the output tensor is either the output of the last timestep in the shape [batch, outputSize] or it is the sequence of outputs for every timestep in the shape [timesteps, batch, outputSize].

poplar::Tensor gruBwd(poplar::Graph &graph, const GruParams &params, poplar::program::Sequence &prog, const poplar::Tensor &fwdOutputInit, const poplar::Tensor &fwdIntermediatesSeq, const GruWeights &weights, const poplar::Tensor &fwdInputSeq, const poplar::Tensor &fwdOutput, const poplar::Tensor &gradLayerNext, poplar::Tensor *inputGrad, poplar::Tensor *bwdIntermediates, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options_, poplin::matmul::PlanningCache *planningCache)

Run GRU backward pass.

The backward pass executes in reverse order compared to the forward pass. If the forward steps for a GRU layer are sf = {0, 1, 2, …, S - 1} then the backward steps run for sb = {S - 1, S - 2, …, 1, 0}.

Parameters
  • graph – Graph to which the GRU cell belongs.

  • params – The parameters of the GRU.

  • prog – Program sequence.

  • fwdOutputInit – Forward output tensor for initial step.

  • fwdIntermediatesSeq – Intermediates results from the forward pass.

  • weights – The GRU weights structure.

  • fwdInputSeq – The input tensor to the GRU of shape: [timesteps, batch, inputSize]

  • fwdOutput – The output tensor from the forward pass. Depending on the outputFullSequence parameter this is either the output for the last timestep or it is a sequence of outputs for each timestep.

  • gradLayerNext – The gradients of the output. Depending on the outputFullSequence parameter this is either the gradient of the output for the last timestep or it is a sequence of output gradients for each timestep.

  • inputGrad – [out] The gradients of the inputs - may be null if this information is not required.

  • bwdIntermediates – [out] Intermediate gradients that are retained in the backward pass of training for use in the weight update. It includes the derivatives for reset gate, update gate, and candidate. This argument should be set to null if you do not need to calculate weight deltas.

  • debugContext – Optional debug information.

  • options – GRU implementation options. See createInput().

  • planningCache – The matmul planning cache.

Returns

The gradient of the initial output.

poplar::Tensor gruBwd(poplar::Graph &graph, const GruParams &params, poplar::program::Sequence &prog, const poplar::Tensor &fwdOutputInit, const poplar::Tensor &fwdIntermediatesSeq, const GruWeights &weights, const poplar::Tensor &fwdInputSeq, const poplar::Tensor &realTimeSteps, const poplar::Tensor &fwdOutput, const poplar::Tensor &gradLayerNext, poplar::Tensor *inputGrad, poplar::Tensor *bwdIntermediates, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options_, poplin::matmul::PlanningCache *planningCache)

Deprecated:

Use the previously defined popnn::gruBwd() overload instead.

Run GRU backward pass. The backward pass executes in reverse order compared to the forward pass. If the forward steps for a GRU layer are sf = {0, 1, 2, …, S - 1} then the backward steps run for sb = {S - 1, S - 2, …, 1, 0}.

Parameters
  • graph – Graph to which the GRU cell belongs.

  • params – The parameters of the GRU.

  • prog – Program sequence.

  • fwdOutputInit – Forward output tensor for initial step.

  • fwdIntermediatesSeq – Intermediates results from the forward pass.

  • weights – The GRU weights structure.

  • realTimeSteps – A tensor containing the real number of time steps for each sequence, shape: [batch].

  • fwdInputSeq – The input tensor to the GRU of shape: [timesteps, batch, inputSize]

  • fwdOutput – The output tensor from the forward pass. Depending on the outputFullSequence parameter this is either the output for the last timestep or it is a sequence of outputs for each timestep.

  • gradLayerNext – The gradients of the output. Depending on the outputFullSequence parameter this is either the gradient of the output for the last timestep or it is a sequence of output gradients for each timestep.

  • inputGrad – [out] The gradients of the inputs - may be null if this information is not required.

  • bwdIntermediates – [out] Intermediate gradients that are retained in the backward pass of training for use in the weight update. It includes the derivatives for reset gate, update gate, and candidate. This argument should be set to null if you do not need to calculate weight deltas.

  • debugContext – Optional debug information.

  • options – GRU implementation options. See createInput().

  • planningCache – The matmul planning cache.

Returns

The gradient of the initial output.

poplar::Tensor auGruBwd(poplar::Graph &graph, const GruParams &params, poplar::program::Sequence &prog, const poplar::Tensor &fwdOutputInit, const poplar::Tensor &fwdIntermediatesSeq, const GruWeights &weights, const poplar::Tensor &fwdInputSeq, const poplar::Tensor &fwdOutput, const poplar::Tensor &gradLayerNext, poplar::Tensor *inputGrad, poplar::Tensor *bwdIntermediates, const poplar::Tensor &attentions, poplar::Tensor *attentionsGrad, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options_, poplin::matmul::PlanningCache *planningCache)

Run AUGRU backward pass.

The backward pass executes in reverse order compared to the forward pass. If the forward steps for an AUGRU layer are sf = {0, 1, 2, …, S - 1} then the backward steps run for sb = {S - 1, S - 2, …, 1, 0}.

Parameters
  • graph – Graph to which the AUGRU cell belongs.

  • params – The parameters of the AUGRU.

  • prog – Program sequence.

  • fwdOutputInit – Forward output tensor for initial step.

  • fwdIntermediatesSeq – Intermediates results from the forward pass.

  • weights – The AUGRU weights structure.

  • fwdInputSeq – The input tensor to the AUGRU of shape: [timesteps, batch, inputSize]

  • fwdOutput – The output tensor from the forward pass. Depending on the outputFullSequence parameter this is either the output for the last timestep or it is a sequence of outputs for each timestep.

  • gradLayerNext – The gradients of the output. Depending on the outputFullSequence parameter this is either the gradient of the output for the last timestep or it is a sequence of output gradients for each timestep.

  • inputGrad – [out] The gradients of the inputs - may be null if this information is not required.

  • bwdIntermediates – [out] Intermediate gradients that are retained in the backward pass of training for use in the weight update. It includes the derivatives for reset gate, update gate, and candidate. This argument should be set to null if you do not need to calculate weight deltas.

  • attentions – Attentions for each time step.

  • attentionsGrad – [out] Gradients for attentions.

  • debugContext – Optional debug information.

  • options – GRU implementation options. See createInput().

  • planningCache – The matmul planning cache.

Returns

The gradient of the initial output.

poplar::Tensor auGruBwd(poplar::Graph &graph, const GruParams &params, poplar::program::Sequence &prog, const poplar::Tensor &fwdOutputInit, const poplar::Tensor &fwdIntermediatesSeq, const GruWeights &weights, const poplar::Tensor &fwdInputSeq, const poplar::Tensor &realTimeSteps, const poplar::Tensor &fwdOutput, const poplar::Tensor &gradLayerNext, poplar::Tensor *inputGrad, poplar::Tensor *bwdIntermediates, const poplar::Tensor &attentions, poplar::Tensor *attentionsGrad, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options_, poplin::matmul::PlanningCache *planningCache)

Deprecated:

Use the previously defined popnn::auGruBwd() overload instead.

Run AUGRU backward pass. The backward pass executes in reverse order compared to the forward pass. If the forward steps for an AUGRU layer are sf = {0, 1, 2, …, S - 1} then the backward steps run for sb = {S - 1, S - 2, …, 1, 0}.

Parameters
  • graph – Graph to which the AUGRU cell belongs.

  • params – The parameters of the AUGRU.

  • prog – Program sequence.

  • fwdOutputInit – Forward output tensor for initial step.

  • fwdIntermediatesSeq – Intermediates results from the forward pass.

  • weights – The AUGRU weights structure.

  • fwdInputSeq – The input tensor to the AUGRU of shape: [timesteps, batch, inputSize]

  • realTimeSteps – A tensor containing the real number of time steps for each sequence, shape: [batch].

  • fwdOutput – The output tensor from the forward pass. Depending on the outputFullSequence parameter this is either the output for the last timestep or it is a sequence of outputs for each timestep.

  • gradLayerNext – The gradients of the output. Depending on the outputFullSequence parameter this is either the gradient of the output for the last timestep or it is a sequence of output gradients for each timestep.

  • inputGrad – [out] The gradients of the inputs - may be null if this information is not required.

  • bwdIntermediates – [out] Intermediate gradients that are retained in the backward pass of training for use in the weight update. It includes the derivatives for reset gate, update gate, and candidate. This argument should be set to null if you do not need to calculate weight deltas.

  • attentions – Attentions for each time step.

  • attentionsGrad – [out] Gradients for attentions.

  • debugContext – Optional debug information.

  • options – GRU implementation options. See createInput().

  • planningCache – The matmul planning cache.

Returns

The gradient of the initial output.

GruWeights gruWU(poplar::Graph &graph, const GruParams &params, poplar::program::Sequence &prog, const poplar::Tensor &fwdOutputInit, const poplar::Tensor &fwdIntermediates, const poplar::Tensor &bwdIntermediates, const GruWeights &weights, const poplar::Tensor &input, const poplar::Tensor &output, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options_, poplin::matmul::PlanningCache *planningCache)

Run a standalone weight update pass.

Takes intermediates and gradients from the backward pass and calculates and returns weight deltas.

Note: If the time step limit is variable, the entries above the given time step limit must be explicitly set to zero in fwdIntermediates, in order for the weights to be correctly updated.

Parameters
  • graph – Graph to which the GRU cell belongs.

  • params – The parameters of the GRU.

  • prog – Program sequence to add operations to.

  • fwdOutputInit – Forward output tensor for initial step.

  • fwdIntermediates – Intermediate results from the forward pass.

  • bwdIntermediates – Intermediate results from the backward pass.

  • weights – The GRU weights structure.

  • input – The input tensor to the GRU of shape: [timesteps, batch, inputSize]

  • output – The output tensor from the forward pass. Depending on the outputFullSequence parameter this is either the output for the last timestep or it is a sequence of outputs for each timestep.

  • debugContext – Optional debug information.

  • options – GRU implementation options. See createInput().

  • planningCache – The matmul planning cache.

Returns

A set of weight gradients to sum with weights.

GruWeights auGruWU(poplar::Graph &graph, const GruParams &params, poplar::program::Sequence &prog, const poplar::Tensor &fwdOutputInit, const poplar::Tensor &fwdIntermediates, const poplar::Tensor &bwdIntermediates, const GruWeights &weights, const poplar::Tensor &input, const poplar::Tensor &output, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options_, poplin::matmul::PlanningCache *planningCache)

Run a standalone weight update pass.

Takes intermediates and gradients from the backward pass and calculates and returns weight deltas.

Note: If the time step limit is variable, the entries above the given time step limit must be explicitly set to zero in fwdIntermediates, in order for the weights to be correctly updated.

Parameters
  • graph – Graph to which the GRU cell belongs.

  • params – The parameters of the GRU.

  • prog – Program sequence to add operations to.

  • fwdOutputInit – Forward output tensor for initial step.

  • fwdIntermediates – Intermediate results from the forward pass.

  • bwdIntermediates – Intermediate results from the backward pass.

  • weights – The GRU weights structure.

  • input – The input tensor to the GRU of shape: [timesteps, batch, inputSize]

  • output – The output tensor from the forward pass. Depending on the outputFullSequence parameter this is either the output for the last timestep or it is a sequence of outputs for each timestep.

  • debugContext – Optional debug information.

  • options – GRU implementation options. See createInput().

  • planningCache – The matmul planning cache.

Returns

A set of weight gradients to sum with weights.

poplar::Tensor gruBwdWithWU(poplar::Graph &graph, const GruParams &params, poplar::program::Sequence &prog, const poplar::Tensor &fwdOutputInit, const poplar::Tensor &fwdIntermediates, const GruWeights &weights, const poplar::Tensor &input, const poplar::Tensor &output, const poplar::Tensor &outputGrad, poplar::Tensor *inputGrad, GruWeights &weightsGrad, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options_, poplin::matmul::PlanningCache *planningCache)

Run a combined GRU backward and weight update pass.

Use this combined backward and weight update pass in preference to gruBwd and gruWU separately in order to allow the most efficient implementation to be chosen if you do not need to split the operation.

Note: If the time step limit is variable, the entries above the given time step limit must be explicitly set to zero in fwdIntermediates, in order for the weights to be correctly updated.

Parameters
  • graph – Graph to which the GRU cell belongs.

  • params – The parameters of the GRU.

  • prog – Program sequence.

  • fwdOutputInit – Forward output tensor for initial step.

  • fwdIntermediates – Intermediates results from the forward pass.

  • weights – The GRU weights structure.

  • input – The input tensor to the GRU of shape: [timesteps, batch, inputSize]

  • output – The output tensor from the forward pass. Depending on the outputFullSequence parameter this is either the output for the last timestep or it is a sequence of outputs for each timestep.

  • outputGrad – The gradients of the output. Depending on the outputFullSequence parameter this is either the gradient of the output for the last timestep or it is a sequence of output gradients for each timestep.

  • inputGrad – [out] The gradients of the inputs - may be null if this information is not required.

  • weightsGrad – A set of weight deltas to sum with weights.

  • debugContext – Optional debug information.

  • options – GRU implementation options. See createInput().

  • planningCache – The matmul planning cache.

Returns

The gradient of the initial output.
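
A minimal training sketch combining gruFwd with gruBwdWithWU (the inferenceOnly option value, the debug strings and the use of the initial state as fwdOutputInit are assumptions made for illustration; no planning cache is used):

#include <poplar/Graph.hpp>
#include <poplar/OptionFlags.hpp>
#include <poplar/Program.hpp>

#include <popnn/Gru.hpp>

// Illustrative sketch only: stateInit is passed as the initial forward output on
// the assumption that the GRU state and output coincide.
void gruTrainingExample(poplar::Graph &graph, poplar::program::Sequence &prog,
                        const popnn::gru::GruParams &params,
                        const poplar::Tensor &stateInit,
                        const poplar::Tensor &input,
                        const popnn::gru::GruWeights &weights,
                        const poplar::Tensor &outputGrad) {
  const poplar::OptionFlags options{{"inferenceOnly", "false"}};

  // Forward pass, retaining intermediates for the backward pass.
  poplar::Tensor fwdIntermediates;
  auto output = popnn::gru::gruFwd(graph, params, stateInit, input, weights,
                                   &fwdIntermediates, prog, {"gruFwd"}, options);

  // Combined backward and weight update pass.
  poplar::Tensor inputGrad;
  popnn::gru::GruWeights weightsGrad;
  auto initialOutputGrad = popnn::gru::gruBwdWithWU(
      graph, params, prog, stateInit, fwdIntermediates, weights, input, output,
      outputGrad, &inputGrad, weightsGrad, {"gruBwdWU"}, options, nullptr);
  (void)initialOutputGrad; // gradient of the initial output, unused here
}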

poplar::Tensor gruBwdWithWU(poplar::Graph &graph, const GruParams &params, poplar::program::Sequence &prog, const poplar::Tensor &fwdOutputInit, const poplar::Tensor &fwdIntermediates, const GruWeights &weights, const poplar::Tensor &input, const poplar::Tensor &realTimeSteps, const poplar::Tensor &output, const poplar::Tensor &outputGrad, poplar::Tensor *inputGrad, GruWeights &weightsGrad, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options_, poplin::matmul::PlanningCache *planningCache)

Deprecated:

Use the previously defined popnn::gruBwdWithWU() overload instead.

Run a combined GRU backward and weight update pass. Use this combined backward and weight update pass in preference to gruBwd and gruWU separately in order to allow the most efficient implementation to be chosen if you do not need to split the operation.

Note: If the time step limit is variable, the entries above the given time step limit must be explicitly set to zero in fwdIntermediates, in order for the weights to be correctly updated.

Parameters
  • graph – Graph to which the GRU cell belongs.

  • params – The parameters of the GRU.

  • prog – Program sequence.

  • fwdOutputInit – Forward output tensor for initial step.

  • fwdIntermediates – Intermediates results from the forward pass.

  • weights – The GRU weights structure.

  • input – The input tensor to the GRU of shape: [timesteps, batch, inputSize]

  • realTimeSteps – A tensor containing the real number of time steps for each sequence, shape: [batch].

  • output – The output tensor from the forward pass. Depending on the outputFullSequence parameter this is either the output for the last timestep or it is a sequence of outputs for each timestep.

  • outputGrad – The gradients of the output. Depending on the outputFullSequence parameter this is either the gradient of the output for the last timestep or it is a sequence of output gradients for each timestep.

  • inputGrad – [out] The gradients of the inputs - may be null if this information is not required.

  • weightsGrad – A set of weight deltas to sum with weights.

  • debugContext – Optional debug information.

  • options – GRU implementation options. See createInput().

  • planningCache – The matmul planning cache.

Returns

The gradient of the initial output.

poplar::Tensor auGruBwdWithWU(poplar::Graph &graph, const GruParams &params, poplar::program::Sequence &prog, const poplar::Tensor &fwdOutputInit, const poplar::Tensor &fwdIntermediates, const GruWeights &weights, const poplar::Tensor &input, const poplar::Tensor &output, const poplar::Tensor &outputGrad, poplar::Tensor *inputGrad, GruWeights &weightsGrad, const poplar::Tensor &attentions, poplar::Tensor *attentionsGrad, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options_, poplin::matmul::PlanningCache *planningCache)

Run a combined AUGRU backward and weight update pass.

Use this combined backward and weight update pass in preference to auGruBwd and auGruWU separately in order to allow the most efficient implementation to be chosen if you do not need to split the operation.

Note: If the time step limit is variable, the entries above the given time step limit must be explicitly set to zero in fwdIntermediates, in order for the weights to be correctly updated.

Parameters
  • graph – Graph to which the GRU cell belongs.

  • params – The parameters of the GRU.

  • prog – Program sequence.

  • fwdOutputInit – Forward output tensor for initial step.

  • fwdIntermediates – Intermediates results from the forward pass.

  • weights – The GRU weights structure.

  • input – The input tensor to the GRU of shape: [timesteps, batch, inputSize]

  • output – The output tensor from the forward pass. Depending on the outputFullSequence parameter this is either the output for the last timestep or it is a sequence of outputs for each timestep.

  • outputGrad – The gradients of the output. Depending on the outputFullSequence parameter this is either the gradient of the output for the last timestep or it is a sequence of output gradients for each timestep.

  • inputGrad – [out] The gradients of the inputs - may be null if this information is not required.

  • weightsGrad – A set of weight deltas to sum with weights.

  • attentions – Attention for each time step.

  • attentionsGrad – [out] Gradients for attentions.

  • debugContext – Optional debug information.

  • options – GRU implementation options. See createInput().

  • planningCache – The matmul planning cache.

Returns

The gradient of the initial output.

poplar::Tensor auGruBwdWithWU(poplar::Graph &graph, const GruParams &params, poplar::program::Sequence &prog, const poplar::Tensor &fwdOutputInit, const poplar::Tensor &fwdIntermediates, const GruWeights &weights, const poplar::Tensor &input, const poplar::Tensor &realTimeSteps, const poplar::Tensor &output, const poplar::Tensor &outputGrad, poplar::Tensor *inputGrad, GruWeights &weightsGrad, const poplar::Tensor &attentions, poplar::Tensor *attentionsGrad, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options_, poplin::matmul::PlanningCache *planningCache)

Deprecated:

Use the previously defined popnn::auGruBwdWithWU() overload instead.

Run a combined AUGRU backward and weight update pass. Use this combined backward and weight update pass in preference to auGruBwd and auGruWU separately in order to allow the most efficient implementation to be chosen if you do not need to split the operation.

Note: If the time step limit is variable, the entries above the given time step limit must be explicitly set to zero in fwdIntermediates, in order for the weights to be correctly updated.

Parameters
  • graph – Graph to which the GRU cell belongs.

  • params – The parameters of the GRU.

  • prog – Program sequence.

  • fwdOutputInit – Forward output tensor for initial step.

  • fwdIntermediates – Intermediates results from the forward pass.

  • weights – The GRU weights structure.

  • input – The input tensor to the GRU of shape: [timesteps, batch, inputSize]

  • realTimeSteps – A tensor containing the real number of time steps for each sequence, shape: [batch].

  • output – The output tensor from the forward pass. Depending on the outputFullSequence parameter this is either the output for the last timestep or it is a sequence of outputs for each timestep.

  • outputGrad – The gradients of the output. Depending on the outputFullSequence parameter this is either the gradient of the output for the last timestep or it is a sequence of output gradients for each timestep.

  • inputGrad – [out] The gradients of the inputs - may be null if this information is not required.

  • weightsGrad – A set of weight deltas to sum with weights.

  • attentions – Attention for each time step.

  • attentionsGrad – [out] Gradients for attentions.

  • debugContext – Optional debug information.

  • options – GRU implementation options. See createInput().

  • planningCache – The matmul planning cache.

Returns

The gradient of the initial output.

struct GruParams
#include <Gru.hpp>

Structure representing the parameters of the GRU.

Public Functions

GruParams(poplar::Type dataType, std::size_t batchSize, std::size_t timeSteps, std::vector<std::size_t> layerSizes, NonLinearityType activation = NonLinearityType::TANH, NonLinearityType recurrentActivation = NonLinearityType::SIGMOID)
GruParams(poplar::Type dataType, std::size_t batchSize, std::size_t maxTimeSteps, const poplar::Tensor &timeSteps, std::vector<std::size_t> layerSizes, NonLinearityType activation = NonLinearityType::TANH, NonLinearityType recurrentActivation = NonLinearityType::SIGMOID)
GruParams(const GruParams &other)

Public Members

rnn::RnnParams rnn
poplar::Type dataType

Deprecated:

Use rnn.dataType instead.

std::size_t batchSize

The batch size.

Deprecated:

Use rnn.batchSize instead.

std::size_t timeSteps

The number of time steps in the sequence of the GRU.

Deprecated:

Use rnn.timeSteps instead.

std::vector<std::size_t> layerSizes

The number of neurons for the input and output layer.

Deprecated:

Use rnn.layerSizes instead.

bool outputFullSequence = true

If true the GRU function returns the entire sequence of outputs, otherwise it returns just the final output.

bool calcInputGradients = true

If this parameter is set to false then the GRU will skip the calculation of the gradients of the inputs.

std::vector<BasicGruCellUnit> cellOrder = getDefaultBasicGruCellOrder()

The weight and bias tensors are concatenated tensors in terms of which gates they service.

This option allows the user to specify the order of the gates in that outermost dimension. The default order is: [Reset gate, Update gate, Candidate].

bool resetAfter = false

Controls whether the reset gate is applied before or after the candidate weights and biases.

NonLinearityType activation = NonLinearityType::TANH

Activation function.

NonLinearityType recurrentActivation = NonLinearityType::SIGMOID

Recurrent activation function.
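
A minimal sketch of constructing GruParams and adjusting its public members (the sizes are arbitrary):

#include <popnn/Gru.hpp>

// Illustrative sketch only: the sizes chosen here are arbitrary.
popnn::gru::GruParams makeGruParams() {
  popnn::gru::GruParams params(poplar::FLOAT, /*batchSize=*/8, /*timeSteps=*/32,
                               /*layerSizes=*/{16, 64});
  params.outputFullSequence = false; // return only the final output
  params.calcInputGradients = false; // skip input gradients in the backward pass
  params.resetAfter = true;          // apply the reset gate after the candidate weights
  return params;
}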

struct GruWeights
#include <Gru.hpp>

Structure holding all the parameters of a GRU cell, or the deltas for those parameters (depending on the context).

Public Members

poplar::Tensor inputWeights
poplar::Tensor outputWeights
poplar::Tensor biases

popnn/GruDef.hpp

Enums

enum BasicGruCellUnit

The units within a basic GRU cell.

In general all of these require a weight matrix, a bias and a non-linearity. Typically, a fixed type of non-linearity is associated with each type of unit.

Values:

enumerator BASIC_GRU_CELL_RESET_GATE = 0
enumerator BASIC_GRU_CELL_UPDATE_GATE = 1
enumerator BASIC_GRU_CELL_CANDIDATE = 2
enumerator BASIC_GRU_CELL_NUM_UNITS = 3

popnn/InstanceNorm.hpp

namespace popnn

Functions used in neural networks.

namespace in

Functions

inline std::pair<poplar::Tensor, poplar::Tensor> instanceNormStatistics(poplar::Graph &graph, const poplar::Tensor acts, float eps, poplar::program::Sequence &prog, bool unbiasedVarEstimate, bool stableAlgo, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Estimate mean and inverse of standard deviation of activations.

inline poplar::Tensor instanceNormWhiten(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Whiten activations given mean and standard deviation.

inline std::pair<poplar::Tensor, poplar::Tensor> instanceNormalise(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gamma, const poplar::Tensor &beta, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Instance normalise activations given mean, standard deviation and norm parameters.

The result is two tensors

  1. normalised activations

  2. whitened activations

inline std::pair<poplar::Tensor, poplar::Tensor> instanceNormParamGradients(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gradsIn, const poplar::Tensor &mean, const poplar::Tensor &iStdDev, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients w.r.t parameters for parameter update.

inline std::pair<poplar::Tensor, poplar::Tensor> instanceNormParamGradients(poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients w.r.t parameters for parameter update.

inline poplar::Tensor instanceNormGradients(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gradsIn, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, const poplar::Tensor &gamma, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients w.r.t input activations for the instance norm layer.

Gradients are propagated through the complete layer including statistics computation.

inline poplar::Tensor instanceNormGradients(poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, const poplar::Tensor &invStdDev, const poplar::Tensor &gamma, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients w.r.t input activations for the instance norm layer.

Gradients are propagated through the complete layer including statistics computation.

inline void instanceNormParamUpdate(poplar::Graph &graph, const poplar::Tensor &gammaDelta, const poplar::Tensor &betaDelta, float scale, poplar::Tensor &gamma, poplar::Tensor &beta, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update parameters given gradients w.r.t. parameters.

inline void instanceNormParamUpdate(poplar::Graph &graph, const poplar::Tensor &gammaDelta, const poplar::Tensor &betaDelta, const poplar::Tensor &scale, poplar::Tensor &gamma, poplar::Tensor &beta, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
uint64_t getFwdFlops(uint64_t numChannels, uint64_t actsPerChannel, bool computeEstimates)

In flop computation, the following applies:

  • Acts per channel:

    • for fc layers: the total number of batches.

    • for conv layers: the field size per channel * batch size.

  • Number of channels:

    • for fc layers: the total number of activations in a batch.

    • for conv layers: the total number of channels.

uint64_t getBwdFlops(uint64_t numChannels, uint64_t actsPerChannel)
uint64_t getWuFlops(uint64_t numChannels, uint64_t actsPerChannel)

popnn/LayerNorm.hpp

namespace popnn

Functions used in neural networks.

namespace ln

Functions

inline std::pair<poplar::Tensor, poplar::Tensor> layerNormStatistics(poplar::Graph &graph, const poplar::Tensor acts, float eps, poplar::program::Sequence &prog, bool unbiasedVarEstimate, bool stableAlgo = false, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Estimate mean and inverse of standard deviation of activations.

inline poplar::Tensor layerNormWhiten(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Whiten activations given mean and standard deviation.

inline std::pair<poplar::Tensor, poplar::Tensor> layerNormalise(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gamma, const poplar::Tensor &beta, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Layer normalise activations given mean, standard deviation and norm parameters.

The result is two tensors:

  1. normalised activations

  2. whitened activations

inline std::pair<poplar::Tensor, poplar::Tensor> layerNormParamGradients(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gradsIn, const poplar::Tensor &mean, const poplar::Tensor &iStdDev, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients w.r.t parameters for parameter update.

inline std::pair<poplar::Tensor, poplar::Tensor> layerNormParamGradients(poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients w.r.t parameters for parameter update.

inline poplar::Tensor layerNormGradients(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gradsIn, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, const poplar::Tensor &gamma, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients w.r.t input activations for the layer norm layer.

Gradients are propagated through the complete layer including statistics computation.

inline poplar::Tensor layerNormGradients(poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, const poplar::Tensor &invStdDev, const poplar::Tensor &gamma, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients w.r.t input activations for the layer norm layer.

Gradients are propagated through the complete layer including statistics computation.

inline void layerNormParamUpdate(poplar::Graph &graph, const poplar::Tensor &gammaDelta, const poplar::Tensor &betaDelta, float scale, poplar::Tensor &gamma, poplar::Tensor &beta, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update layer norm parameters given the gradients w.r.t. parameters.

inline void layerNormParamUpdate(poplar::Graph &graph, const poplar::Tensor &gammaDelta, const poplar::Tensor &betaDelta, const poplar::Tensor &scale, poplar::Tensor &gamma, poplar::Tensor &beta, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
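
The layer norm functions follow the same pattern as the group norm functions; a minimal sketch (acts, gamma and beta are assumed to exist with suitable shapes, and eps is arbitrary):

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <utility>

#include <popnn/LayerNorm.hpp>

// Illustrative sketch only: eps is an arbitrary value.
std::pair<poplar::Tensor, poplar::Tensor>
layerNormExample(poplar::Graph &graph, poplar::program::Sequence &prog,
                 const poplar::Tensor &acts, const poplar::Tensor &gamma,
                 const poplar::Tensor &beta) {
  // Estimate mean and inverse standard deviation, then normalise.
  auto [mean, invStdDev] = popnn::ln::layerNormStatistics(
      graph, acts, /*eps=*/1e-5f, prog, /*unbiasedVarEstimate=*/false);
  return popnn::ln::layerNormalise(graph, acts, gamma, beta, mean, invStdDev,
                                   prog);
}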

popnn/LogSoftmax.hpp

namespace popnn

Functions used in neural networks.

Functions

void logSoftmaxInPlace(poplar::Graph &graph, poplar::Tensor t, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Update tensor t by computing log of softmax in-place.

Parameters
  • graph – The graph to add the operation to.

  • t – The tensor to apply the log of softmax to.

  • prog – The sequence to add the operation to.

  • debugContext – Optional debug information.

poplar::Tensor logSoftmax(poplar::Graph &graph, poplar::Tensor t, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Compute the log of the softmax to tensor t and return the result.

Parameters
  • graph – The graph to add the operation to.

  • t – The tensor to apply the non-linearity to.

  • prog – The sequence to add the operation to.

  • debugContext – Optional debug information.

Returns

A new tensor containing the contents of t with the given log of the softmax applied.
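
For example, a minimal sketch (the tensor shape and debug name are illustrative assumptions):

// Sketch: log-softmax over the innermost dimension of a 2D logits tensor.
#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <popnn/LogSoftmax.hpp>

poplar::Tensor addLogSoftmax(poplar::Graph &graph,
                             const poplar::Tensor &logits, // e.g. [batch, numClasses]
                             poplar::program::Sequence &prog) {
  // Out-of-place form: returns a new tensor holding log(softmax(logits)).
  // The in-place form would be:
  //   popnn::logSoftmaxInPlace(graph, logits, prog, "logSoftmaxInPlace");
  return popnn::logSoftmax(graph, logits, prog, "logSoftmax");
}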

popnn/Loss.hpp

namespace popnn

Functions used in neural networks.

Enums

enum LossType

Values:

enumerator SUM_SQUARED_LOSS
enumerator CROSS_ENTROPY_LOSS

Functions

poplar::program::Program calcLoss(poplar::Graph &graph, const poplar::Tensor &modelOutputs, const poplar::Tensor &expected, const poplar::Tensor &loss, const poplar::Tensor &deltas, const poplar::Tensor &deltasScale, const poplar::Tensor &modelOutputScaling, LossType lossType, const poplar::DebugContext &debugContext = {})

Calculate loss and gradient for a set of activations and expected labels.

Parameters
  • graph – Graph to add operations and tensors to.

  • modelOutputs – 2D tensor of model outputs per-batch to calculate loss for.

  • expected – One-hot encoded tensor (Labels per-batch) with the same number of rows as modelOutputs. Elements of the expected labels may be masked by using MASKED_LABEL_CODE. Such labels will not contribute to loss calculation.

  • loss – 1D Tensor to store the loss per-batch. Has the same number of rows as modelOutputs.

  • deltas – 2D Tensor to store deltas for each activation from the expected per-batch. Has the same dimensions as modelOutputs.

  • deltasScale – Optional Tensor to scale output deltas with when the lossType is CROSS_ENTROPY_LOSS. Scaling will be deltasScale / modelOutputScaling. If no tensor is specified a default will be created initialised with 1.0.

  • modelOutputScaling – Optional Tensor indicating the scaling of the modelOutputs when lossType is CROSS_ENTROPY_LOSS, normally from a softMax layer when the nonLinearity used is SOFTMAX_SCALED. If no tensor is specified a default will be created initialised with 1.0.

  • lossType – Method for calculating loss measurement.

  • debugContext – Optional debug information.

poplar::program::Program calcLoss(poplar::Graph &graph, const poplar::Tensor &modelOutputs, const poplar::Tensor &expected, const poplar::Tensor &loss, const poplar::Tensor &deltas, LossType lossType, const poplar::DebugContext &debugContext = {})
poplar::program::Program calcLoss(poplar::Graph &graph, const poplar::Tensor &modelOutputs, const poplar::Tensor &expected, const poplar::Tensor &loss, const poplar::Tensor &deltas, const poplar::Tensor &deltasScale, const poplar::Tensor &modelOutputScaling, const poplar::Tensor &numCorrect, LossType lossType, const poplar::DebugContext &debugContext = {})

Calculate loss, gradient, and number of correct classifications per-batch for a set of activations and expected labels.

Elements of the expected labels may be masked by using MASKED_LABEL_CODE. Such labels will not contribute to the accuracy and loss calculation.

See also

calcLoss and calcAccuracy, of which this function is simply a combination.

poplar::program::Program calcLoss(poplar::Graph &graph, const poplar::Tensor &modelOutputs, const poplar::Tensor &expected, const poplar::Tensor &loss, const poplar::Tensor &deltas, const poplar::Tensor &numCorrect, LossType lossType, const poplar::DebugContext &debugContext = {})
poplar::program::Program calcAccuracy(poplar::Graph &graph, const poplar::Tensor &modelOutputs, const poplar::Tensor &expected, const poplar::Tensor &numCorrect, const poplar::DebugContext &debugContext = {})

Calculate the number of correct classifications for a set of activations and expected labels.

Parameters
  • graph – Graph to add operations and tensors to.

  • modelOutputs – 2D tensor of model outputs per-batch to calculate loss for.

  • expected – Labels per-batch. Elements of the expected labels may be masked by using MASKED_LABEL_CODE. Such labels will not contribute to the accuracy calculation.

  • numCorrect – Tensor to store the number of correct classifications. Must be scalar, or single-element Tensor.

  • activationType – Device type used for activations.

  • expectedType – Device type used for expected labels.

  • debugContext – Optional debug information.
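
A minimal sketch combining the loss and accuracy calculation for one batch follows. The tensor shapes, element types and tile mappings are illustrative assumptions; only the calcLoss overload taking numCorrect is taken from the signatures above.

// Sketch: cross-entropy loss, gradients and accuracy for one batch.
#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <popnn/Loss.hpp>
#include <poputil/TileMapping.hpp>

void addLossAndAccuracy(poplar::Graph &graph,
                        const poplar::Tensor &modelOutputs, // [batchSize, numClasses]
                        const poplar::Tensor &expected,     // per-batch labels
                        poplar::program::Sequence &prog) {
  const auto batchSize = modelOutputs.dim(0);

  // Output tensors of the loss calculation (shapes follow the parameter
  // descriptions above).
  poplar::Tensor loss =
      graph.addVariable(modelOutputs.elementType(), {batchSize}, "loss");
  poplar::Tensor deltas = graph.addVariable(modelOutputs.elementType(),
                                            modelOutputs.shape(), "deltas");
  poplar::Tensor numCorrect =
      graph.addVariable(poplar::UNSIGNED_INT, {1}, "numCorrect");
  poputil::mapTensorLinearly(graph, loss);
  poputil::mapTensorLinearly(graph, deltas);
  poputil::mapTensorLinearly(graph, numCorrect);

  // Loss, gradients and number of correct classifications in one pass.
  prog.add(popnn::calcLoss(graph, modelOutputs, expected, loss, deltas,
                           numCorrect, popnn::CROSS_ENTROPY_LOSS,
                           "lossAndAccuracy"));
}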

poplar::Tensor argMax(poplar::Graph &graph, const poplar::Tensor &input, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Compute argmax for each of the outer dimensions of input tensor.

If input is a tensor of dim [y][x] then argmax is computed over x elements for each of the y outer dimension elements

Parameters
  • graph – Graph to add operations and tensors to.

  • input – 2D tensor of inputs

  • prog – Program to which the graph for this operation is added

  • debugContext – Optional debug information.
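
For instance (the shape and debug name are illustrative assumptions):

// Sketch: predicted class per batch row via argmax over the innermost dimension.
#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <popnn/Loss.hpp>

poplar::Tensor predictedClasses(poplar::Graph &graph,
                                const poplar::Tensor &modelOutputs, // [batch, numClasses]
                                poplar::program::Sequence &prog) {
  // One index per row, selecting the maximum over the x (innermost) dimension.
  return popnn::argMax(graph, modelOutputs, prog, "argMax");
}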

std::pair<poplar::Tensor, poplar::Tensor> maxAndArgMax(poplar::Graph &graph, const poplar::Tensor &input, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Compute max and argmax for each of the outer dimensions of input tensor.

If input is a tensor of dim [y][x] then max and argmax is computed over x elements for each of the y outer dimension elements

Parameters
  • graph – Graph to add operations and tensors to.

  • input – 2D tensor of inputs

  • prog – Program to which the graph for this operation is added

  • debugContext – Optional debug information.

poplar::Tensor argMin(poplar::Graph &graph, const poplar::Tensor &input, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Compute argmin for each of the outer dimensions of input tensor.

If input is a tensor of dim [y][x] then argmin is computed over x elements for each of the y outer dimension elements

Parameters
  • graph – Graph to add operations and tensors to.

  • input – 2D tensor of inputs

  • prog – Program to which the graph for this operation is added

  • debugContext – Optional debug information.

std::pair<poplar::Tensor, poplar::Tensor> minAndArgMin(poplar::Graph &graph, const poplar::Tensor &input, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Compute min and argmin for each of the outer dimensions of input tensor.

If input is a tensor of dim [y][x] then argmin is computed over x elements for each of the y outer dimension elements

Parameters
  • graph – Graph to add operations and tensors to.

  • input – 2D tensor of inputs

  • prog – Program to which the graph for this operation is added

  • debugContext – Optional debug information.

poplar::Tensor topK(poplar::Graph &graph, const poplar::Tensor &input, poplar::Tensor &indices, unsigned K, bool sort, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Find the top K elements of |input|.

Takes a 2D tensor in the form [batch][values] and returns a tensor of shape [batch][K] containing the K largest values of each batch.

Parameters
  • graph – Graph to add operations and tensors to.

  • input – 2D tensor of inputs

  • indices – The tensor to store the indices in.

  • K – The number of values to return.

  • sort – If true, the returned values will be sorted in descending order.

  • prog – Program to which the graph for this operation is added

  • debugContext – Optional debug information.
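
A minimal sketch follows. Whether the indices tensor must be pre-allocated is not stated here; the sketch assumes the call writes it, which is an assumption rather than documented behaviour.

// Sketch: top-K values (and their indices) per batch row.
#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <popnn/Loss.hpp>

poplar::Tensor topKValues(poplar::Graph &graph,
                          const poplar::Tensor &input, // [batch][values]
                          unsigned K,
                          poplar::program::Sequence &prog) {
  // Assumed to be written by the call with shape [batch][K].
  poplar::Tensor indices;
  // sort = true: values are returned in descending order.
  return popnn::topK(graph, input, indices, K, /*sort=*/true, prog, "topK");
}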

popnn/Lstm.hpp

namespace popnn

Functions used in neural networks.

namespace lstm

Functions

const std::vector<BasicLstmCellUnit> getDefaultBasicLstmCellOrder()

Get the default order of the gates in a basic LSTM cell.

The default order is: [Forget gate, Input gate, Candidate, Output Gate].

std::vector<std::pair<poplin::MatMulParams, poplar::OptionFlags>> getMatMulPrePlanParameters(LstmParams params, poplar::OptionFlags opts)

Predict what matrix multiplications will be needed for the given parameters and return list of corresponding matmul parameters and options.

uint64_t getBasicLstmCellFwdFlops(const LstmParams &params)
uint64_t getBasicLstmCellBwdFlops(const LstmParams &params)
uint64_t getBasicLstmCellWuFlops(const LstmParams &params)
poplar::Tensor createInput(poplar::Graph &graph, const LstmParams &params, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Create an input tensor of shape {numSteps, batchSize, inputSize} which is optimally mapped to multiply the whole input sequence in a single matrix multiply operation.

LSTM options

  • availableMemoryProportion Decimal between 0 and 1 (inclusive)

    See poplin::createWeights().

  • inferenceOnly (true, false) [=false]

    Sets convolution pass to INFERENCE_FWD if true; TRAINING_FWD otherwise. See poplin::createWeights().

  • partialsType (half, float) [=float]

    See poplin::createWeights().

  • weightAccumulatorsType (half, float) [=data type of lstm]

    Data type of the weight accumulators for the LSTM's weight matrices and biases.

  • preCalcWeights (true, false) [=false]

    If true, use one big matrix multiply before the recurrent calculation to perform the part of the calculation that only depends on the input sequence.

  • recomputationMode (none, cellAndTanh, full) [=none]

    • none: No recomputation in the backwards pass.

    • cellAndTanh: Small amount of recomputation in the backwards pass, yielding some reduction in memory footprint for the layer.

    • full: Recompute everything from the forward pass. Saves the most memory at the cost of an extra forward pass of cycles.

Parameters
  • graph – Graph object.

  • params – The LSTM parameters.

  • debugContext – Debug information.

  • options – Any implementation/debug options for the LSTM.

  • planningCache – A poplin matrix multiply planning cache.

Returns

A tensor created in the graph of shape {timeSteps, batchSize, inputSize}.
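
A minimal sketch of describing an LSTM layer and creating its input tensor; the sizes and option values are illustrative assumptions.

// Sketch: LSTM parameters, options and input creation.
#include <poplar/Graph.hpp>
#include <poplar/OptionFlags.hpp>
#include <popnn/Lstm.hpp>

poplar::Tensor makeLstmInput(poplar::Graph &graph) {
  // Batch of 32, 50 time steps, input size 128, output (hidden) size 256.
  popnn::lstm::LstmParams params(poplar::HALF, /*batchSize=*/32,
                                 /*timeSteps=*/50,
                                 /*layerSizes=*/{128, 256});

  poplar::OptionFlags options;
  options.set("availableMemoryProportion", "0.2");
  options.set("inferenceOnly", "false");
  options.set("partialsType", "float");

  // Laid out so the whole input sequence can be fed to one matrix multiply.
  return popnn::lstm::createInput(graph, params, "lstmInput", options);
}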

poplar::Tensor createInitialOutput(poplar::Graph &graph, const LstmParams &params, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Create the initial output that can be combined with the initial cell state using a LstmState.

This then can be fed into the LSTM call at the first timestep.

Parameters
  • graph – Graph object.

  • params – The LSTM parameters.

  • debugContext – Debug information.

  • options – Any implementation/debug options for the LSTM. See createInput().

  • planningCache – A poplin matrix multiply planning cache.

Returns

A tensor which is the initial output for the forward operation of the LSTM cell.

poplar::Tensor createInitialCellState(poplar::Graph &graph, const LstmParams &params, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Create the initial cell state that can be combined with the initial output using a LstmState.

This then can be fed into the LSTM call at the first timestep.

Parameters
  • graph – Graph object.

  • params – The LSTM parameters.

  • debugContext – Debug information.

  • options – Any implementation/debug options for the LSTM. See createInput().

  • planningCache – A poplin matrix multiply planning cache.

Returns

A tensor which is the cell state for the forward operation of the LSTM cell.

LstmState createInitialState(poplar::Graph &graph, const LstmParams &params, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Creates the initial state (both output and cellState) that is fed into the LSTM call at the first timestep.

It can be initialised by writing the appropriate member or using zeroInitialState().

Parameters
  • graph – Graph object.

  • params – The LSTM parameters.

  • debugContext – Debug information.

  • options – Any implementation/debug options for the LSTM. See createInput().

  • planningCache – A poplin matrix multiply planning cache.

Returns

A tensor which is the state for the forward operation of the LSTM cell.

void zeroInitialState(poplar::Graph &graph, const LstmState &initialState, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Initialize the forward state of an LSTM with zeros.

Parameters
  • graph – Graph object.

  • initialState – The initial state to zero.

  • prog – The program to extend with the initialization code

  • debugContext – Optional debug information.

std::pair<poplar::Tensor, poplar::Tensor> createWeightsKernel(poplar::Graph &graph, const LstmParams &params, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Create the weights kernel used to weight the input of an LSTM.

Returns the inputWeights and outputWeights.

poplar::Tensor createWeightsBiases(poplar::Graph &graph, const LstmParams &params, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Create the weights biases.

LstmWeights createWeights(poplar::Graph &graph, const LstmParams &params, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Create the weights (both kernel and biases) used to weight the input of an LSTM.

poplar::Tensor lstmFwd(poplar::Graph &graph, const LstmParams &params, poplar::program::Sequence &prog, const LstmState &stateInit, const LstmWeights &weights, const poplar::Tensor &in, poplar::Tensor *intermediates, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Calculate the result of applying an LSTM across a sequence.

The LSTM is run for seqSize steps, each with a batch of size batchSize, input size inputSize and output size outputSize. The total number of units within each LSTM cell is lstmUnits = BASIC_LSTM_CELL_NUM_UNITS.

Parameters
  • graph – Graph to which the LSTM cell belongs.

  • params – The parameters of the LSTM.

  • prog – Program sequence to add operations to.

  • stateInit – Initial state for the LSTM.

  • weights – The LSTM weights structure.

  • in – The input tensor to the LSTM of dimension [timesteps, batch, inputSize].

  • intermediates – [out] Intermediate results that are retained in the forward pass of training for use in the backward pass. This argument should be set to null if we are only doing inference.

  • debugContext – Optional debug information.

  • options – LSTM implementation options. See createInput().

  • planningCache – The matmul planning cache.

Returns

The LSTM output. If the outputFullSequence option is false, the output of the last timestep is returned with shape [batch, outputSize]; if it is true, the output sequence for every timestep is returned with shape [timeSteps, batch, outputSize].
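
A minimal sketch of a training-mode forward pass built from the functions above; tensor names and debug names are illustrative assumptions.

// Sketch: LSTM forward pass keeping intermediates for a later backward pass.
#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <popnn/Lstm.hpp>

poplar::Tensor lstmForward(poplar::Graph &graph,
                           const popnn::lstm::LstmParams &params,
                           poplar::program::Sequence &prog,
                           poplar::Tensor &fwdIntermediates /*[out]*/) {
  // Input sequence, weights and initial state, laid out by the library.
  poplar::Tensor in = popnn::lstm::createInput(graph, params, "in");
  popnn::lstm::LstmWeights weights =
      popnn::lstm::createWeights(graph, params, "weights");
  popnn::lstm::LstmState state =
      popnn::lstm::createInitialState(graph, params, "state");
  popnn::lstm::zeroInitialState(graph, state, prog, "zeroState");

  // Keep the intermediates so a backward pass can reuse them; pass nullptr
  // instead when only doing inference.
  return popnn::lstm::lstmFwd(graph, params, prog, state, weights, in,
                              &fwdIntermediates, "lstmFwd");
}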

std::pair<poplar::Tensor, poplar::Tensor> lstmFwd(poplar::Graph &graph, const LstmParams &params, const LstmState &stateInit, const poplar::Tensor &in, const LstmWeights &weights, poplar::Tensor *intermediates, poplar::program::Sequence &fwdProg, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Deprecated:

Use the previously defined popnn::lstmFwd() overload instead.

Calculate the result of applying an LSTM across a sequence.

The LSTM is run for seqSize steps, each with a batch of size batchSize, input size inputSize and output size outputSize. The total number of units within each LSTM cell is lstmUnits = BASIC_LSTM_CELL_NUM_UNITS.

Parameters
  • graph – Graph to which the LSTM cell belongs.

  • params – The parameters of the LSTM.

  • stateInit – Initial state for the LSTM.

  • in – The input tensor to the LSTM of dimension [timesteps, batch, inputSize].

  • weights – The LSTM weights structure.

  • intermediates – [out] Intermediate results that are retained in the forward pass of training for use in the backward pass. This argument should be set to null if we are only doing inference.

  • fwdProg – Program sequence.

  • debugContext – Optional debug information.

  • options – LSTM implementation options. See createInput().

  • planningCache – The matmul planning cache.

Returns

The output of the LSTM and the final cell state.

Depending on the outputFullSequence parameter the output tensor is either the output of the last timestep in the shape [batch, outputSize] or it is the sequence of outputs for every timestep in the shape [timesteps, batch, outputSize].

LstmState lstmBwd(poplar::Graph &graph, const LstmParams &params, poplar::program::Sequence &prog, const LstmState &fwdStateInit, const poplar::Tensor &fwdIntermediates, const LstmWeights &weights, const poplar::Tensor &input, const poplar::Tensor &output, const poplar::Tensor &outputGrad, const poplar::Tensor *lastCellStateGrad, poplar::Tensor *inputGrad, poplar::Tensor *bwdIntermediates, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Run LSTM backward pass.

The backward pass executes in reverse order compared to the forward pass. If the forward steps for an LSTM layer are sf = {0, 1, 2, …, S - 1} then the backward steps run for sb = {S - 1, S - 2, …, 1, 0}.

Parameters
  • graph – Graph to which the LSTM cell belongs.

  • params – The parameters of the LSTM.

  • prog – Program sequence.

  • fwdStateInit – Forward state tensor for initial step.

  • fwdIntermediates – Intermediate results from the forward pass.

  • weights – The LSTM weights structure.

  • input – The input tensor to the LSTM of shape: [timesteps, batch, inputSize].

  • output – The output tensor from the forward pass. Depending on the outputFullSequence parameter this is either the output for the last timestep or it is a sequence of outputs for each timestep.

  • outputGrad – The gradients of the output. Depending on the outputFullSequence parameter this is either the gradient of the output for the last timestep or it is a sequence output gradients for each timestep.

  • lastCellStateGrad – The gradient of the last cell state - may be null if there is no incoming gradient.

  • inputGrad – [out] The gradients of the inputs; may be null if this information is not required.

  • bwdIntermediates – [out] Intermediate gradients that are retained in the backward pass of training for use in the weight update. This argument should be set to null if you do not need to calculate weight deltas.

  • debugContext – Optional debug information.

  • options – LSTM implementation options. See createInput().

  • planningCache – The matmul planning cache.

Returns

The gradient of the initial state.

LstmWeights lstmWU(poplar::Graph &graph, const LstmParams &params, poplar::program::Sequence &prog, const LstmState &fwdStateInit, const poplar::Tensor &fwdIntermediates, const poplar::Tensor &bwdIntermediates, const LstmWeights &weights, const poplar::Tensor &input, const poplar::Tensor &output, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Run a standalone weight update pass.

Takes intermediates and gradients from the backward pass and calculates and returns weight deltas.

Note: If the time step limit is variable, the entries above the given time step limit must be explicitly set to zero in fwdIntermediates, in order for the weights to be correctly updated.

Parameters
  • graph – Graph to which the LSTM cell belongs.

  • params – The parameters of the LSTM.

  • prog – Program sequence to add operations to.

  • fwdStateInit – Forward state tensor for initial step.

  • fwdIntermediates – Intermediate results from the forward pass.

  • bwdIntermediates – Intermediate results from the backward pass.

  • weights – The LSTM weights structure.

  • input – The input tensor to the LSTM of shape: [timesteps, batch, inputSize].

  • output – The output tensor from the forward pass. Depending on the outputFullSequence parameter this is either the output for the last timestep or it is a sequence of outputs for each timestep.

  • debugContext – Optional debug information.

  • options – LSTM implementation options. See createInput().

  • planningCache – The matmul planning cache.

Returns

A set of weight gradients to sum with weights.

LstmState lstmBwdWithWU(poplar::Graph &graph, const LstmParams &params, poplar::program::Sequence &prog, const LstmState &fwdStateInit, const poplar::Tensor &fwdIntermediates, const LstmWeights &weights, const poplar::Tensor &input, const poplar::Tensor &output, const poplar::Tensor &outputGrad, const poplar::Tensor *lastCellStateGrad, poplar::Tensor *inputGrad, LstmWeights &weightsGrad, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Run a combined LSTM backward and weight update pass.

Use this combined backward and weight update pass in preference to lstmBwd and lstmWU separately in order to allow the most efficient implementation to be chosen if you do not need to split the operation.

Note: If the time step limit is variable, the entries above the given time step limit must be explicitly set to zero in fwdIntermediates, in order for the weights to be correctly updated.

Parameters
  • graph – Graph to which the LSTM cell belongs.

  • params – The parameters of the LSTM.

  • prog – Program sequence.

  • fwdStateInit – Forward state tensor for initial step.

  • fwdIntermediates – Intermediate results from the forward pass.

  • weights – The LSTM weights structure.

  • input – The input tensor to the LSTM of shape: [timesteps, batch, inputSize].

  • output – The output tensor from the forward pass. Depending on the outputFullSequence parameter this is either the output for the last timestep or it is a sequence of outputs for each timestep.

  • outputGrad – The gradients of the output. Depending on the outputFullSequence parameter this is either the gradient of the output for the last timestep or it is a sequence output gradients for each timestep.

  • lastCellStateGrad – The gradient of the last cell state - may be null if there is no incoming gradient.

  • inputGrad – [out] The gradients of the inputs. May be null if this information is not required.

  • weightsGrad – A set of weight deltas to sum with weights.

  • debugContext – Optional debug information.

  • options – LSTM implementation options. See createInput().

  • planningCache – The matmul planning cache.

Returns

The gradient of the initial state.
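
A minimal sketch of the combined backward and weight-update pass using the forward-pass artefacts; how the resulting deltas are applied to the weights is left out, and the wiring shown is an assumption rather than a prescribed recipe.

// Sketch: combined LSTM backward pass and weight update.
#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <popnn/Lstm.hpp>

popnn::lstm::LstmState
lstmBackwardAndWU(poplar::Graph &graph, const popnn::lstm::LstmParams &params,
                  poplar::program::Sequence &prog,
                  const popnn::lstm::LstmState &fwdStateInit,
                  const poplar::Tensor &fwdIntermediates,
                  const popnn::lstm::LstmWeights &weights,
                  const poplar::Tensor &input, const poplar::Tensor &output,
                  const poplar::Tensor &outputGrad,
                  poplar::Tensor &inputGrad /*[out]*/,
                  popnn::lstm::LstmWeights &weightsGrad /*[out]*/) {
  // No incoming gradient for the last cell state here; pass nullptr for
  // inputGrad as well if input gradients are not needed.
  return popnn::lstm::lstmBwdWithWU(graph, params, prog, fwdStateInit,
                                    fwdIntermediates, weights, input, output,
                                    outputGrad, /*lastCellStateGrad=*/nullptr,
                                    &inputGrad, weightsGrad, "lstmBwdWU");
}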

struct LstmParams
#include <Lstm.hpp>

Structure representing the parameters of the LSTM.

Public Functions

LstmParams(poplar::Type dataType, std::size_t batchSize, std::size_t timeSteps, std::vector<std::size_t> layerSizes, NonLinearityType activation = NonLinearityType::TANH, NonLinearityType recurrentActivation = NonLinearityType::SIGMOID)
LstmParams(poplar::Type dataType, std::size_t batchSize, std::size_t maxTimeSteps, const poplar::Tensor &timeSteps, std::vector<std::size_t> layerSizes, NonLinearityType activation = NonLinearityType::TANH, NonLinearityType recurrentActivation = NonLinearityType::SIGMOID)

Public Members

rnn::RnnParams rnn
poplar::Type dataType

The datatype of the LSTM.

Deprecated:

Use rnn.dataType instead.

std::size_t batchSize

The batch size.

Deprecated:

Use rnn.batchSize instead.

std::size_t timeSteps

The number of time steps in the sequence of the LSTM.

Deprecated:

Use rnn.timeSteps instead.

std::vector<std::size_t> layerSizes

The number of neurons before and after each layer of the LSTM.

If the LSTM consists of N layers, then this should be a vector of size N+1. The first element is the input size and each subsequent element is the output size of the LSTM layer.

Deprecated:

Use rnn.layerSizes instead.

bool outputFullSequence = true

If true, the LSTM function returns the entire sequence of outputs; otherwise it returns just the final output.

bool doInputWeightCalc = true

If this parameter is set to false then the LSTM will skip the calculation of weighted inputs (only useful for benchmarking).

bool calcInputGradients = true

If this parameter is set to false then the LSTM will skip the calculation of the gradients of the inputs.

std::vector<BasicLstmCellUnit> cellOrder = getDefaultBasicLstmCellOrder()

The weight and bias tensors are concatenated along their outermost dimension according to which gates they serve.

This option allows the user to specify the order of the gates in that outermost dimension. The default order is: [Forget gate, Input gate, Candidate, Output Gate].

NonLinearityType activation = NonLinearityType::TANH

Activation function.

NonLinearityType recurrentActivation = NonLinearityType::SIGMOID

Recurrent activation function.

struct LstmState
#include <Lstm.hpp>

Structure holding the state of a LSTM cell, or the gradients for the state (depending on the context).

Public Functions

poplar::Tensor getAsTensor() const

Public Members

poplar::Tensor output
poplar::Tensor cellState
struct LstmWeights
#include <Lstm.hpp>

Structure holding all the parameters of an LSTM cell, or the deltas for those parameters (depending on the context).

Public Members

poplar::Tensor inputWeights
poplar::Tensor outputWeights
poplar::Tensor biases

popnn/LstmDef.hpp

Enums

enum BasicLstmCellUnit

The units within a basic LSTM cell.

The term unit is used to refer to either a gate, or a cell state vector computation. In general all of these require a weight matrix, a bias and a non-linearity. Typically, a fixed type of non-linearity is associated with each type of unit.

Values:

enumerator BASIC_LSTM_CELL_FORGET_GATE = 0
enumerator BASIC_LSTM_CELL_INPUT_GATE = 1
enumerator BASIC_LSTM_CELL_CANDIDATE = 2
enumerator BASIC_LSTM_CELL_OUTPUT_GATE = 3
enumerator BASIC_LSTM_CELL_NUM_UNITS = 4

popnn/NonLinearity.hpp

Defines

DEF_NONLINEARITY_INPLACE(fn, nlType)
DEF_NONLINEARITY_(fn, nlType)
DEF_NONLINEARITY(fn, nlType)
namespace popnn

Functions used in neural networks.

Functions

void nonLinearityInPlace(poplar::Graph &graph, NonLinearityType nonLinearityType, poplar::Tensor t, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Update tensor t by applying the given non-linearity in-place.

Parameters
  • graph – The graph to add the operation to.

  • nonLinearityType – The type of non-linearity to apply to t.

  • t – The tensor to apply the non-linearity to.

  • prog – The sequence to add the operation to.

  • debugContext – Optional debug information.

void nonLinearityInPlace(poplar::Graph &graph, NonLinearityType nonLinearityType, poplar::Tensor t, poplar::ComputeSet &cs, const poplar::DebugContext &debugContext = {})

Update tensor t by applying the given non-linearity in-place.

Parameters
  • graph – The graph to add the operation to.

  • nonLinearityType – The type of non-linearity to apply to t.

  • t – The tensor to apply the non-linearity to.

  • cs – The compute set to add vertices to.

  • debugContext – Optional debug information.

void nonLinearityInPlace(poplar::Graph &graph, NonLinearityType nonLinearityType, poplar::Tensor t, float &nonLinearityScaling, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Update tensor t by applying the given non-linearity in-place and return the scaling factor by which outputs from this operation are multiplied in nonLinearityScaling.

For NonLinearityType other than SOFTMAX_SCALED nonLinearityScaling will be 1.0f upon return.

Parameters
  • graph – The graph to add the operation to.

  • nonLinearityType – The type of non-linearity to apply to t.

  • t – The tensor to apply the non-linearity to.

  • nonLinearityScaling – Reference to a float which will be overwritten with the scaling factor by which outputs from this operation in t are multiplied.

  • prog – The sequence to add the operation to.

  • debugContext – Optional debug information.

void nonLinearityInPlace(poplar::Graph &graph, NonLinearityType nonLinearityType, poplar::Tensor t, float &nonLinearityScaling, poplar::ComputeSet &cs, const poplar::DebugContext &debugContext = {})

Update tensor t by applying the given non-linearity in-place and return the scaling factor by which outputs from this operation are multiplied in nonLinearityScaling.

For NonLinearityType other than SOFTMAX_SCALED nonLinearityScaling will be 1.0f upon return.

Parameters
  • graph – The graph to add the operation to.

  • nonLinearityType – The type of non-linearity to apply to t.

  • t – The tensor to apply the non-linearity to.

  • nonLinearityScaling – Reference to a float which will be overwritten with the scaling factor by which outputs from this operation in t are multiplied.

  • cs – The compute set to add vertices to.

  • debugContext – Optional debug information.

poplar::Tensor nonLinearity(poplar::Graph &graph, NonLinearityType nonLinearityType, poplar::Tensor t, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Apply the given non-linearity to tensor t and return the result.

Parameters
  • graph – The graph to add the operation to.

  • nonLinearityType – The type of non-linearity to apply.

  • t – The tensor to apply the non-linearity to.

  • prog – The sequence to add the operation to.

  • debugContext – Optional debug information.

Returns

A new tensor containing the contents of t with the given non-linearity applied.

poplar::Tensor nonLinearity(poplar::Graph &graph, NonLinearityType nonLinearityType, poplar::Tensor t, float &nonLinearityScaling, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Apply the given non-linearity to tensor t and return the result.

Also returns the scaling factor by which outputs from this operation are multiplied in nonLinearityScaling.

For NonLinearityType other than SOFTMAX_SCALED nonLinearityScaling will be 1.0f upon return.

Parameters
  • graph – The graph to add the operation to.

  • nonLinearityType – The type of non-linearity to apply to t.

  • t – The tensor to apply the non-linearity to.

  • nonLinearityScaling – Reference to a float which will be overwritten with the scaling factor by which outputs from this operation in t are multiplied.

  • prog – The sequence to add the operation to.

  • debugContext – Optional debug information.

Returns

A new tensor containing the contents of t with the given non-linearity applied.

inline void sigmoidInPlace(poplar::Graph &graph, poplar::Tensor t, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline void sigmoidInPlace(poplar::Graph &graph, poplar::Tensor t, float &nonLinearityScaling, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline poplar::Tensor sigmoid(poplar::Graph &graph, poplar::Tensor t, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline poplar::Tensor sigmoid(poplar::Graph &graph, poplar::Tensor t, float &nonLinearityScaling, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline void reluInPlace(poplar::Graph &graph, poplar::Tensor t, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline void reluInPlace(poplar::Graph &graph, poplar::Tensor t, float &nonLinearityScaling, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline poplar::Tensor relu(poplar::Graph &graph, poplar::Tensor t, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline poplar::Tensor relu(poplar::Graph &graph, poplar::Tensor t, float &nonLinearityScaling, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline void tanhInPlace(poplar::Graph &graph, poplar::Tensor t, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline void tanhInPlace(poplar::Graph &graph, poplar::Tensor t, float &nonLinearityScaling, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline poplar::Tensor tanh(poplar::Graph &graph, poplar::Tensor t, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline poplar::Tensor tanh(poplar::Graph &graph, poplar::Tensor t, float &nonLinearityScaling, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline void geluInPlace(poplar::Graph &graph, poplar::Tensor t, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline void geluInPlace(poplar::Graph &graph, poplar::Tensor t, float &nonLinearityScaling, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline poplar::Tensor gelu(poplar::Graph &graph, poplar::Tensor t, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline poplar::Tensor gelu(poplar::Graph &graph, poplar::Tensor t, float &nonLinearityScaling, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline void swishInPlace(poplar::Graph &graph, poplar::Tensor t, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline void swishInPlace(poplar::Graph &graph, poplar::Tensor t, float &nonLinearityScaling, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline poplar::Tensor swish(poplar::Graph &graph, poplar::Tensor t, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline poplar::Tensor swish(poplar::Graph &graph, poplar::Tensor t, float &nonLinearityScaling, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline void softmaxInPlace(poplar::Graph &graph, poplar::Tensor t, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline void softmaxInPlace(poplar::Graph &graph, poplar::Tensor t, float &nonLinearityScaling, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline poplar::Tensor softmax(poplar::Graph &graph, poplar::Tensor t, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline poplar::Tensor softmax(poplar::Graph &graph, poplar::Tensor t, float &nonLinearityScaling, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline void softmaxStableInPlace(poplar::Graph &graph, poplar::Tensor t, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline void softmaxStableInPlace(poplar::Graph &graph, poplar::Tensor t, float &nonLinearityScaling, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline poplar::Tensor softmaxStable(poplar::Graph &graph, poplar::Tensor t, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline poplar::Tensor softmaxStable(poplar::Graph &graph, poplar::Tensor t, float &nonLinearityScaling, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline void scaledSoftmaxStableInPlace(poplar::Graph &graph, poplar::Tensor t, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline void scaledSoftmaxStableInPlace(poplar::Graph &graph, poplar::Tensor t, float &nonLinearityScaling, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline poplar::Tensor scaledSoftmaxStable(poplar::Graph &graph, poplar::Tensor t, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
inline poplar::Tensor scaledSoftmaxStable(poplar::Graph &graph, poplar::Tensor t, float &nonLinearityScaling, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
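
A minimal usage sketch of the convenience wrappers (shapes and debug names are illustrative assumptions):

// Sketch: out-of-place ReLU followed by an in-place, numerically stable softmax.
#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <popnn/NonLinearity.hpp>

poplar::Tensor applyNonLinearities(poplar::Graph &graph, poplar::Tensor preActs,
                                   poplar::program::Sequence &prog) {
  // Equivalent generic form: popnn::nonLinearity(graph,
  //   popnn::NonLinearityType::RELU, preActs, prog, "relu");
  poplar::Tensor hidden = popnn::relu(graph, preActs, prog, "relu");

  // Applied in place over the innermost dimension of the tensor.
  popnn::softmaxStableInPlace(graph, hidden, prog, "softmax");
  return hidden;
}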
poplar::Tensor nonLinearityInputGradient(poplar::Graph &graph, NonLinearityType nonLinearityType, poplar::Tensor act, poplar::Tensor outGradient, poplar::ComputeSet &cs, const poplar::DebugContext &debugContext = {})

Computes and returns the input gradient for a non-linearity from the activations and gradients at the output of the non-linearity.

Parameters
  • graph – The graph to add the operation to.

  • nonLinearityType – The type of non-linearity to compute the input gradient for.

  • act – The output activations from the non-linearity. For the GELU non-linearity only this is the input to the non-linearity.

  • outGradient – The gradients at the output of the non-linearity.

  • cs – The compute set to add vertices to.

  • debugContext – Optional debug information.

Returns

A new tensor with the calculated gradient for the input of the non-linearity.

poplar::Tensor nonLinearityInputGradient(poplar::Graph &graph, NonLinearityType nonLinearityType, poplar::Tensor act, poplar::Tensor outGradient, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Computes and returns the input gradient for a non-linearity from the activations and gradients at the output of the non-linearity.

Parameters
  • graph – The graph to add the operation to.

  • nonLinearityType – The type of non-linearity to compute the input gradient for.

  • act – The output activations from the non-linearity. For the GELU and SWISH non-linearity only this is the input to the non-linearity.

  • outGradient – The gradients at the output of the non-linearity.

  • prog – The sequence to add the operation to.

  • debugContext – Optional debug information.

Returns

A new tensor with the calculated gradient for the input of the non-linearity.
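
For example, a minimal sketch for a TANH layer (the choice of non-linearity and the names are illustrative assumptions):

// Sketch: gradient at the input of a tanh non-linearity.
#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <popnn/NonLinearity.hpp>
#include <popnn/NonLinearityDef.hpp>

poplar::Tensor tanhInputGradient(poplar::Graph &graph,
                                 const poplar::Tensor &act,         // tanh outputs
                                 const poplar::Tensor &outGradient, // dL/d(output)
                                 poplar::program::Sequence &prog) {
  return popnn::nonLinearityInputGradient(graph, popnn::NonLinearityType::TANH,
                                          act, outGradient, prog, "tanhGrad");
}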

popnn/NonLinearityDef.hpp

namespace popnn

Functions used in neural networks.

Enums

enum NonLinearityType

Values:

enumerator SIGMOID

Sigmoid:

  • y = 1 / (1 + e^(-x))

enumerator HARD_SIGMOID

Hard Sigmoid:

  • y = max(0, min(1, 0.2*x + 0.5))

enumerator RELU

Rectified Linear Unit:

  • x >= 0 -> y = x

  • x < 0 -> y = 0

enumerator TANH

Hyperbolic tangent:

  • y = tanh(x)

enumerator GELU

Gaussian Error Linear Unit:

  • y = x * Phi(x), where Phi(x) is the cumulative distribution function of the standard normal (Gaussian) distribution. Phi(x) is approximated as:

  • Phi(x) = 0.5 * (1 + (tanh(x * 0.7978845608 * (1 + 0.044715 * x * x))))

enumerator SWISH
enumerator SOFTMAX

Softmax:

  • Always applied over the innermost dimension of the given tensor. Outer dimensions are independent of one another.

enumerator SOFTMAX_STABLE

Same as SOFTMAX, but a slower, more numerically stable algorithm is used.

enumerator SOFTMAX_SCALED

Same as SOFTMAX, but a slower, more numerically stable algorithm is used.

Outputs are scaled to allow use of greater dynamic range in outputs.

popnn/NonLinearityDefUtil.hpp

template<>
struct poputil::VertexTemplateToString<popnn::NonLinearityType>
#include <NonLinearityDefUtil.hpp>

Specialise vertex template stringification for non-linearity type.

Public Static Functions

static inline std::string to_string(const popnn::NonLinearityType &nlType)
namespace popnn

Functions used in neural networks.

Functions

inline const char *asString(const popnn::NonLinearityType &type)
inline std::ostream &operator<<(std::ostream &os, const popnn::NonLinearityType &type)
inline std::istream &operator>>(std::istream &in, popnn::NonLinearityType &type)
namespace poputil

General utility functions for building graphs.

template<>
struct VertexTemplateToString<popnn::NonLinearityType>
#include <NonLinearityDefUtil.hpp>

Specialise vertex template stringification for non-linearity type.

Public Static Functions

static inline std::string to_string(const popnn::NonLinearityType &nlType)

popnn/Norms.hpp

namespace popnn

Functions used in neural networks.

Functions

std::uint64_t getNormFwdFlops(std::size_t statisticsSize, std::size_t numActsElements, bool computeStats = true)

Flops for forward pass of a norm layer with a given size of statistics vector and the total elements in the activations input to the layer.

For batch norm, computeStats should be set to false for inference if batch statistics are not computed, since averaged batch statistics may be combined with the norm parameters.

std::uint64_t getNormBwdFlops(std::size_t statisticsSize, std::size_t numActsElements)

Flops for computation of gradient w.r.t activations for a norm layer with a given size of statistics vector and the total elements in the activations input to the layer.

std::uint64_t getNormWuFlops(std::size_t paramsSize, std::size_t numActsElements)
poplar::Tensor createNormGamma(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::DebugContext &debugContext = {})
poplar::Tensor createNormBeta(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::DebugContext &debugContext = {})
std::pair<poplar::Tensor, poplar::Tensor> createNormParams(poplar::Graph &graph, const poplar::Tensor acts, const poplar::DebugContext &debugContext = {})

popnn/Pooling.hpp

namespace popnn

Functions used in neural networks.

namespace pooling

Functions

std::ostream &operator<<(std::ostream &o, const PoolParams &params)
const char *asString(const PoolingType &method)
std::vector<std::size_t> getOutputFieldShape(const PoolParams &params)
uint64_t getFwdFlops(const PoolParams &params)
uint64_t getBwdFlops(const PoolParams &params)
double getFwdPerfectCycleCount(const poplar::Graph &graph, const PoolParams &params)
double getBwdPerfectCycleCount(const poplar::Graph &graph, const PoolParams &params)
poplar::Tensor pool(poplar::Graph &graph, const PoolParams &params, const poplar::Tensor &in, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Add a pooling operation to the graph.

This performs a pooling over the spatial dimensions […]. The shape of the input should be [batchSize x numChannels x …].

Parameters
  • graph – The operation will be added to this graph

  • params – Pooling parameters

  • in – Input tensor

  • prog – Program sequence to append the operation to

  • debugContext – Optional debug information.

  • options – Pooling options (not currently used)

Returns

A tensor with the results of the pooling operation
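
A minimal sketch of a 2x2 max pooling with stride 2 over a [batchSize x numChannels x H x W] input; the kernel size, stride and padding are illustrative assumptions.

// Sketch: 2x2 max pooling, stride 2, no padding.
#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <popnn/Pooling.hpp>
#include <popnn/PoolingDef.hpp>

poplar::Tensor maxPool2x2(poplar::Graph &graph,
                          const poplar::Tensor &in, // [batch, channels, H, W]
                          poplar::program::Sequence &prog) {
  popnn::pooling::PoolParams params(
      popnn::PoolingType::MAX,
      /*inputFieldShape=*/{in.dim(2), in.dim(3)},
      /*kernelShape=*/{2, 2},
      /*stride=*/{2, 2},
      /*inputTruncationOrPaddingLower=*/{0, 0},
      /*inputTruncationOrPaddingUpper=*/{0, 0},
      /*numChannels=*/in.dim(1),
      /*batchSize=*/in.dim(0),
      /*dType=*/in.elementType());
  return popnn::pooling::pool(graph, params, in, prog, "maxPool");
}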

poplar::Tensor poolInputGradient(poplar::Graph &graph, const PoolParams &params, const poplar::Tensor &in, const poplar::Tensor &pooled, const poplar::Tensor &pooledGradient, bool useScaledGradient, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Calculate the gradient with respect to the input of a pooling operation given the gradient of the output.

This can be used for MAX, AVG or SUM pooling. Note: for AVG or SUM pooling, the specific function below is recommended.

This performs a pooling over the spatial dimensions […]. The shape of the input should be [batchSize x numChannels x …].

Parameters
  • graph – The operation will be added to this graph

  • params – Pooling parameters

  • in – Forward activations tensor input to pooling

  • pooled – Output of pooling in the forward pass

  • pooledGradient – Gradients to the pooling operation

  • useScaledGradient – Use scaled gradient if set to true. Otherwise, the gradient is propagated to all the positions which matched pooled value in forward pass.

  • prog – Program sequence to append the operation to

  • debugContext – Optional debug information.

  • options – Pooling options. See pool().

Returns

A tensor with the results of the pooling operation

poplar::Tensor poolInputGradient(poplar::Graph &graph, const PoolParams &params, const unsigned fwdChansPerGroup, const poplar::Tensor &pooledGradient, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Calculate the gradient with respect to the input of a pooling operation given the gradient of the output.

This should be used for AVG and SUM pooling

This performs a pooling over the spatial dimensions […]. The shape of the output will be [batchSize x numChannels x …].

Parameters
  • graph – The operation will be added to this graph

  • params – Pooling parameters

  • fwdChansPerGroup – Used in creating the output tensor

  • pooledGradient – Gradients to the pooling operation

  • prog – Program sequence to append the operation to

  • debugContext – Optional debug information.

  • options – Pooling options. See pool().

Returns

A tensor with the results of the pooling operation

struct PoolParams

Public Functions

inline PoolParams(PoolingType poolingType, std::vector<std::size_t> inputFieldShape, std::vector<std::size_t> kernelShape, std::vector<unsigned> stride, std::vector<int> inputTruncationOrPaddingLower, std::vector<int> inputTruncationOrPaddingUpper, std::size_t numChannels, std::size_t batchSize, poplar::Type dType)
inline std::size_t getNumFieldDims() const
std::vector<std::size_t> getOutputFieldShape() const

Public Members

PoolingType poolingType

The type of pooling to be performed (for example maximum, sum or average)

std::vector<std::size_t> inputFieldShape

The input shape, not including the dimensions for batch size (batchSize) and number of channels (numChannels).

std::vector<std::size_t> kernelShape

The shape of the pooling kernel.

std::vector<unsigned> stride

The stride per input dimension.

std::vector<int> inputTruncationOrPaddingLower

The lower padding (for values >0) or truncation (for values <0) for each input dimension specifying the padding/truncation at the start of the dimension.

std::vector<int> inputTruncationOrPaddingUpper

The upper padding (for values >0) or truncation (for values <0) for each input dimension specifying the padding/truncation at the end of the dimension.

std::size_t numChannels

The number of channels in the input tensor.

std::size_t batchSize

The batch size of the input tensor.

poplar::Type dType

The output type.

popnn/PoolingDef.hpp

namespace popnn

Functions used in neural networks.

Enums

enum PoolingType

Pooling types.

Values:

enumerator MAX
enumerator AVG
enumerator SUM

popnn/Recurrent.hpp

namespace poplin

Linear algebra functions.

namespace matmul
namespace popnn

Functions used in neural networks.

namespace rnn

Functions

std::vector<std::pair<poplin::MatMulParams, poplar::OptionFlags>> getMatMulPrePlanParameters(std::size_t numSteps, std::size_t batchSize, std::size_t inputSize, std::size_t outputSize, const poplar::Type &dType, const poplar::Type &partialsType = poplar::FLOAT, bool inferenceOnly = false, bool hasFeedforwardWeights = true)

Predict what matrix multiplications will be needed for the given parameters and return list of corresponding matmul parameters and options.

uint64_t getFwdFlops(unsigned sequenceSize, unsigned batchSize, unsigned inputSize, unsigned outputSize, bool weightInput = true)

Compute the total flops for the forward pass of RNN.

uint64_t getBwdFlops(unsigned sequenceSize, unsigned batchSize, unsigned inputSize, unsigned outputSize, bool calcInputGrad = true)

Compute the total flops for the backward pass of RNN.

uint64_t getWuFlops(unsigned sequenceSize, unsigned batchSize, unsigned inputSize, unsigned outputSize)

Compute the total flops for the weight update pass of RNN.

poplar::Tensor createInput(poplar::Graph &graph, unsigned numSteps, unsigned batchSize, unsigned inputSize, unsigned outputSize, const poplar::Type &dType, const poplar::Type &partialsType = poplar::FLOAT, bool inferenceOnly = false, const poplar::DebugContext &debugContext = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Create a tensor which is input to a vanilla RNN.

The layout of the tensor is best for a multiplication of the input weight matrix with the given number of steps.

Parameters
  • graph – Graph object

  • numSteps – Number of steps used in the forward weighting of input

  • batchSize – Number of batch elements

  • inputSize – Size of the input for each sequence step

  • outputSize – Output(hidden) size of each sequence element

  • inferenceOnly – Whether the RNN layer is for inference only. If true, we can ignore backwards and weight update passes

  • dType – Data type of the created tensor

  • partialsType – Data type of intermediate calculations

  • debugContext – Optional debug information.

  • planningCache – The matmul planning cache.

Returns

Tensor of shape {numSteps, batchSize, inputSize}

poplar::Tensor createFwdState(poplar::Graph &graph, const poplar::Type &dType, unsigned batchSize, unsigned outputSize, poplar::program::Sequence &prog, bool initState, bool inferenceOnly, const poplar::DebugContext &debugContext = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Create initial state for a vanilla RNN.

The state, apart from the activations, is initialised by the control program.

The amount of hidden state may depend on whether the RNN is used for inference or training.

Parameters
  • graph – Graph object

  • dType – data type of the created tensor

  • batchSize – Number of batch elements

  • outputSize – Output(hidden) of each sequence element

  • prog – Control program

  • initState – Initialise the state

  • inferenceOnly – Whether the RNN layer is for inference only. If true, we can ignore backwards and weight update passes

  • debugContext – Optional debug information.

  • planningCache – The matmul planning cache.

Returns

A 2D tensor of shape {batchSize, outputSize}

poplar::Tensor getOutputFromFwdState(const poplar::Tensor &fwdState)

Extract the previous output tensor from the hidden state.

The returned tensor is a view of the state tensor and can be used to initialise it if required.

poplar::Tensor createWeightsInput(poplar::Graph &graph, unsigned sequenceSize, unsigned batchSize, unsigned inputSize, unsigned outputSize, const poplar::Type &dType, const poplar::Type &partialsType = poplar::FLOAT, bool inferenceOnly = false, const poplar::DebugContext &debugContext = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Create the weights used to weight the input of a vanilla RNN layer.

The tile mapping of the weight tensor is best for multiplication with a sequence size in the input activation tensor used to multiply with the input weights.

Parameters
  • graph – Graph object

  • sequenceSize – Number of sequence steps used in the forward weighting of the input. The best tile mapping is when this matches the sequence size of the input activation tensor

  • batchSize – Number of batch elements

  • inputSize – Input size of each sequence

  • outputSize – Output(hidden) size of each sequence

  • dType – Data type of the created tensor

  • partialsType – Data type of partial results in the computation

  • inferenceOnly – Whether the RNN layer is for inference only. If true, we can ignore backwards and weight update passes

  • debugContext – Optional debug information.

  • planningCache – The matmul planning cache.

poplar::Tensor createWeightsFeedback(poplar::Graph &graph, unsigned batchSize, unsigned outputSize, const poplar::Type &dType, const poplar::Type &partialsType = poplar::FLOAT, bool inferenceOnly = false, const poplar::DebugContext &debugContext = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Create the weights used in the recurrent part of a vanilla RNN layer.

Parameters
  • graph – Graph object

  • batchSize – Number of batch elements

  • outputSize – Output(hidden) size of each sequence

  • dType – Data type of the created tensor

  • partialsType – Data type of partial results in the computation

  • inferenceOnly – Whether the RNN layer is for inference only. If true, we can ignore backwards and weight update passes

  • debugContext – Optional debug information.

  • planningCache – The matmul planning cache.

poplar::Tensor forwardWeightInput(poplar::Graph &graph, const poplar::Tensor &actIn, const poplar::Tensor &weights, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, bool inferenceOnly = false, const poplar::DebugContext &debugContext = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Perform feedforward part of a RNN layer.

The feedforward part of the RNN layer must be followed by the feedback part to complete the RNN layer. That is, the output must be fed as the feedforward input to the feedback part.

The following definitions are used below:

  • numSteps is the number of sequence steps

  • batchSize is the batch size

  • inputSize is the size of the input for each step

  • outputSize is the size of the output for each step

See also

forwardIterate

Parameters
  • graph – Graph object

  • actIn – Input activation tensor with shape {numSteps, batchSize, inputSize}

  • weights – Feedforward weights with shape {outputSize, inputSize}

  • prog – Program sequence to which programs added by this function are appended to

  • partialsType – Data type for intermediates

  • inferenceOnly – Whether the RNN layer is for inference only. If true, we can ignore backwards and weight update passes

  • debugContext – Optional debug information.

  • planningCache – The matmul planning cache.

Returns

Output tensor with shape {numSteps, batchSize, outputSize}

poplar::Tensor forwardIterate(poplar::Graph &graph, const poplar::Tensor &feedFwdIn, const poplar::Tensor &initState, const poplar::Tensor &feedbackWeights, const poplar::Tensor &biases, poplar::program::Sequence &prog, popnn::NonLinearityType nonLinearityType, const poplar::Type &partialsType = poplar::FLOAT, bool inferenceOnly = false, const poplar::DebugContext &debugContext = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Perform the feedback part of the RNN layer.

The feedback part of the RNN layer must be preceded by the feedforward part of the RNN layer to complete the layer

The following definitions are used below:

  • numSteps is the number of steps

  • batchSize is the batch size

  • inputSize is the size of the input for each step

  • outputSize is the size of the output for each step

See also

forwardWeightInput

Parameters
  • graph – Graph object

  • feedFwdIn – Input to this function (output from the feedforward part of the RNN layer)

  • initState – The initial state of the RNN layer (that is, the previous output)

  • feedbackWeights – Feedback weights

  • biases – Biases

  • prog – Program sequence to which programs added by this function are appended to

  • nonLinearityType – Non linearity used for the output activations

  • partialsType – Data type for intermediates

  • inferenceOnly – Whether the RNN layer is for inference only. If true, we can ignore backwards and weight update passes

  • debugContext – Optional debug information.

  • planningCache – The matmul planning cache.

Returns

Output activations of RNN layer
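
A minimal sketch of the two-step forward pass, weighting the whole input sequence first and then iterating the feedback part; the weight, bias and initial-state tensors are assumed to be created elsewhere (for example with createWeightsInput(), createWeightsFeedback() and createFwdState()).

// Sketch: vanilla RNN forward pass (inference only).
#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <popnn/NonLinearityDef.hpp>
#include <popnn/Recurrent.hpp>

poplar::Tensor vanillaRnnForward(poplar::Graph &graph,
                                 const poplar::Tensor &actIn,          // {numSteps, batchSize, inputSize}
                                 const poplar::Tensor &feedFwdWeights, // {outputSize, inputSize}
                                 const poplar::Tensor &feedbackWeights,
                                 const poplar::Tensor &biases,
                                 const poplar::Tensor &initState,      // {batchSize, outputSize}
                                 poplar::program::Sequence &prog) {
  // Feedforward part: one big multiply of the whole input sequence.
  poplar::Tensor feedFwdIn = popnn::rnn::forwardWeightInput(
      graph, actIn, feedFwdWeights, prog, poplar::FLOAT,
      /*inferenceOnly=*/true, "rnnWeightInput");

  // Feedback part: iterate over the sequence applying the recurrence.
  return popnn::rnn::forwardIterate(
      graph, feedFwdIn, initState, feedbackWeights, biases, prog,
      popnn::NonLinearityType::TANH, poplar::FLOAT,
      /*inferenceOnly=*/true, "rnnIterate");
}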

poplar::Tensor createBwdState(poplar::Graph &graph, const poplar::Type &dType, unsigned batchSize, unsigned outputSize, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Create initial state for backward pass of a vanilla RNN.

Parameters
  • graph – Graph object

  • dType – Data type of the created tensor

  • batchSize – Number of batch elements processed

  • outputSize – Number of output activations

  • prog – Control program

  • debugContext – Optional debug information.

  • planningCache – The matmul planning cache.

Returns

Tile mapped initial state tensor

std::pair<poplar::Tensor, poplar::Tensor> backwardGradientStep(poplar::Graph &graph, const poplar::Tensor &nextLayerGrad, const poplar::Tensor &bwdState, const poplar::Tensor &actOut, const poplar::Tensor &weightsInput, const poplar::Tensor &weightsFeedback, poplar::program::Sequence &prog, popnn::NonLinearityType nonLinearityType, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Compute a single step of backward pass of a vanilla RNN layer.

Two gradient outputs are produced. The first is at the input of the RNN layer for the step. The second is at the adder and can be used to backward propagate through the earlier steps.

Parameters
  • graph – Graph object

  • nextLayerGrad – Loss gradient fed as input to this step

  • bwdState – Gradient state for previous step

  • actOut – Output activation

  • weightsInput – Input weights

  • weightsFeedback – Feedback weights

  • prog – Control program to which to add programs to

  • nonLinearityType – Type of non-linearity

  • firstStep – Set to true to indicate the first step in the backward pass

  • partialsType – Data type used in intermediate calculations

  • debugContext – Optional debug information.

  • planningCache – The matmul planning cache.

Returns

A pair of tensors. The first is the loss gradient at the input layer. The second is the backward state needed to run the next backward step
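
As a sketch (illustrative names, assuming graph, prog and the tensors and sizes below are in scope), one backward step seeded with createBwdState:

  // Sketch: create the initial gradient state, then run one backward step.
  poplar::Tensor bwdState = popnn::rnn::createBwdState(
      graph, poplar::FLOAT, batchSize, outputSize, prog, "bwdState");
  auto grads = popnn::rnn::backwardGradientStep(
      graph, nextLayerGrad, bwdState, actOut, weightsInput, weightsFeedback,
      prog, popnn::NonLinearityType::TANH);
  poplar::Tensor gradAtInput = grads.first;  // loss gradient at the layer input
  bwdState = grads.second;                   // state for the next (earlier) step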

poplar::Tensor backwardGradientStep(poplar::Graph &graph, const poplar::Tensor &nextLayerGrad, const poplar::Tensor &bwdState, const poplar::Tensor &actOut, const poplar::Tensor &weightsFeedback, poplar::program::Sequence &prog, popnn::NonLinearityType nonLinearityType, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Same as the function above, except that the input gradients are not computed.

void paramDeltaUpdate(poplar::Graph &graph, const poplar::Tensor &bwdState, const poplar::Tensor &actIn, const poplar::Tensor &prevOut, poplar::Tensor &weightsInputDeltasAcc, poplar::Tensor &weightsFeedbackDeltasAcc, poplar::Tensor &biasDeltasAcc, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Update parameter deltas for a vanilla RNN step.

The parameter deltas updated are:

  • Feedback Weights

  • Input Weights

  • Biases

The new deltas computed for this step are added to the accumulated deltas from previous steps. The caller must zero the accumulated tensors at the first call if the tensors that maintain the result are updated in place.

Parameters
  • graph – Graph object.

  • bwdState – Gradient state for this step.

  • actIn – Input activations for this step.

  • prevOut – Previous RNN output activations for this step.

  • weightsInputDeltasAcc – Previous weights input deltas tensor. This tensor must be tile-mapped. The deltas from this step are added to this tensor.

  • weightsFeedbackDeltasAcc – Previous feedback weights deltas tensor. This tensor must be tile-mapped. The deltas from this step are added to this tensor.

  • biasDeltasAcc – Previous bias deltas tensor. This tensor must be tile-mapped. The deltas from this step are added to this tensor.

  • prog – Control program to which programs are added.

  • partialsType – Data type used in intermediate calculations.

  • debugContext – Optional debug information.

  • planningCache – The matmul planning cache.
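
For example, the accumulation pattern described above can be sketched as follows (illustrative; popops::zero from popops/Zero.hpp is used to clear the accumulators once before the first step):

  // Sketch: zero the delta accumulators once, before the first step ...
  popops::zero(graph, weightsInputDeltasAcc, prog, "zeroWinAcc");
  popops::zero(graph, weightsFeedbackDeltasAcc, prog, "zeroWfbAcc");
  popops::zero(graph, biasDeltasAcc, prog, "zeroBiasAcc");
  // ... then, for each step, add that step's deltas into the accumulators.
  popnn::rnn::paramDeltaUpdate(graph, bwdState, actIn, prevOut,
                               weightsInputDeltasAcc, weightsFeedbackDeltasAcc,
                               biasDeltasAcc, prog);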

poplar::Tensor rnnFwdSequence(poplar::Graph &graph, poplar::program::Sequence &prog, const poplar::Tensor &fwdStateInit, const poplar::Tensor *weightedIn, const poplar::Tensor &biases, const poplar::Tensor &feedFwdWeights, const poplar::Tensor &feedbackWeights, const poplar::Tensor &prevLayerActs, const popnn::NonLinearityType &nonLinearityType, const poplar::Type &partialsType, bool inferenceOnly, const poplar::DebugContext &debugContext = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Perform the forward part of the RNN layer.

The feedback part of the RNN layer must be preceded by the feedforward part of the RNN layer to complete the layer.

The following definitions are used below:

  • numSteps is the number of steps

  • batchSize is the batch size

  • inputSize is the size of the input for each step

  • outputSize is the size of the output for each step

See also

forwardWeightInput

Parameters
  • graph – Graph object.

  • prog – Control program.

  • fwdStateInit – Forward state tensor for initial step.

  • weightedIn – Preweighted input, or nullptr if Wff is to be applied.

  • biases – Biases.

  • feedFwdWeights – Input weights Wff.

  • feedbackWeights – Feedback weights Wfb.

  • prevLayerActs – Activations from the previous layer (output from the feedforward part of the RNN layer).

  • nonLinearityType – Non linearity used for the output activations.

  • partialsType – Data type for intermediates.

  • inferenceOnly – Whether the RNN layer is for inference only. If true, the backward and weight-update passes can be ignored.

  • debugContext – Optional debug information.

  • planningCache – The matmul planning cache.

Returns

Forward state tensor for all steps [0:seqSize)
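
A sketch of a whole-sequence forward pass (illustrative names; nullptr is passed for weightedIn so that Wff is applied inside the call):

  // Sketch: forward state for all steps.
  poplar::Tensor fwdState = popnn::rnn::rnnFwdSequence(
      graph, prog, fwdStateInit, /*weightedIn=*/nullptr, biases,
      feedFwdWeights, feedbackWeights, prevLayerActs,
      popnn::NonLinearityType::TANH, poplar::FLOAT, /*inferenceOnly=*/false);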

std::tuple<poplar::Tensor, poplar::Tensor, poplar::Tensor, poplar::Tensor> rnnBwdSequence(poplar::Graph &graph, bool doWU, bool ignoreInputGradientCalc, poplar::program::Sequence &prog, const poplar::Tensor &fwdStateInit, const poplar::Tensor &fwdState, const poplar::Tensor &biases, const poplar::Tensor &feedFwdWeights, const poplar::Tensor &feedbackWeights, const poplar::Tensor &outGradient, const poplar::Tensor &actIn, const popnn::NonLinearityType &nonLinearityType, const poplar::Type &partialsType, const poplar::DebugContext &debugContext = {}, poplin::matmul::PlanningCache *planningCache = nullptr)

Perform the feedback part of the RNN layer.

The feedback part of the RNN layer must be preceded by the feedforward part of the RNN layer to complete the layer.

The following definitions are used below:

  • numSteps is the number of steps

  • batchSize is the batch size

  • inputSize is the size of the input for each step

  • outputSize is the size of the output for each step

See also

forwardWeightInput

Parameters
  • graph – Graph object

  • doWU – Calculate weight updates

  • ignoreInputGradientCalc – Do not calculate the gradients over the input weights

  • prog – Control program

  • fwdStateInit – Forward state tensor for initial step

  • fwdState – Forward state tensor for all steps [0:seqSize)

  • biases – Biases

  • feedFwdWeights – Input weights Wff

  • feedbackWeights – Feedback weights Wfb

  • outGradient – Gradient from next layer

  • actIn – Activations from the previous layer (output from the feedforward part of the RNN layer).

  • nonLinearityType – Non linearity used for the output activations

  • partialsType – Data type for intermediates

  • debugContext – Optional debug information.

  • planningCache – The matmul planning cache.

Returns

Returns four tensors:

  • gradients for previous layer

  • input weight deltas

  • output weight deltas

  • bias deltas

    When doWU is false, the weight and bias deltas are not calculated.
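
A sketch of a whole-sequence backward pass with weight update (illustrative names; requires <tuple>):

  // Sketch: unpack the four returned tensors.
  poplar::Tensor prevLayerGrads, wInDeltas, wFbDeltas, biasDeltas;
  std::tie(prevLayerGrads, wInDeltas, wFbDeltas, biasDeltas) =
      popnn::rnn::rnnBwdSequence(graph, /*doWU=*/true,
                                 /*ignoreInputGradientCalc=*/false, prog,
                                 fwdStateInit, fwdState, biases,
                                 feedFwdWeights, feedbackWeights, outGradient,
                                 actIn, popnn::NonLinearityType::TANH,
                                 poplar::FLOAT);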

popnn/Rnn.hpp

namespace popnn

Functions used in neural networks.

namespace rnn

Typedefs

using LoopBodyType = std::function<poplar::program::Sequence(poplar::Graph &graph, const poplar::Tensor&, const poplar::Tensor&, const RnnBatchwiseFlags&, std::vector<poplar::Tensor>&, const RnnSlice &slice, std::vector<poplar::Tensor>&, poplar::program::Sequence*, const poplar::DebugNameAndId&)>

Loop body function wrapper with the following arguments:

Param graph

Graph Object

Param shardIdx

Tensor that specifies the starting sequence index for the current shard.

Param seqIdx

Tensor that iterates over the range of input sequences that are mapped on the current shard, beginning from 0.

Param batchwiseFlags

Flags that indicate batches for which the current step is within the batchwise step limit.

Param state

state tensors

Param slice

Input/Output tensors for a specific shard

Param created

Output tensors which are created by this function.

Param prog

Program initialization sequence

Param dnai

Debug name and Id

Return

Program for the given shard
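
For illustration, a minimal callback matching this typedef: a pass-through cell that copies the current input slice into a single state tensor. It assumes one state variable and inputSize equal to outputSize; a real cell would perform its own computation here.

  // Sketch of a LoopBodyType callback (requires <popnn/Rnn.hpp>).
  popnn::rnn::LoopBodyType cellFn =
      [](poplar::Graph &graph, const poplar::Tensor &shardIdx,
         const poplar::Tensor &seqIdx,
         const popnn::rnn::RnnBatchwiseFlags &batchwiseFlags,
         std::vector<poplar::Tensor> &state,
         const popnn::rnn::RnnSlice &slice,
         std::vector<poplar::Tensor> &created,
         poplar::program::Sequence *initProg,
         const poplar::DebugNameAndId &dnai) {
        poplar::program::Sequence step;
        // Pass-through cell: next state = current input slice.
        step.add(poplar::program::Copy(slice.inputs[0].flatten(),
                                       state[0].flatten()));
        return step;
      };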

using GatherBodyType = std::function<poplar::program::Sequence(poplar::Graph &graph, const RnnSlice &slice, unsigned stepsPerGather, poplar::program::Sequence*, const poplar::DebugNameAndId&)>

Gather body function wrapper with the following arguments:

Param graph

Graph Object

Param slice

Input/Output tensors for a specific shard

Param stepsPerGather

stepsPerGather for current shard.

Param prog

Program initialization sequence

Param dnai

Debug name and Id

Return

Program for the given shard
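
Similarly, an illustrative no-op callback matching GatherBodyType:

  // Sketch of a GatherBodyType callback; a real implementation would process
  // the temporaries in slice.inputs accumulated over stepsPerGather steps.
  popnn::rnn::GatherBodyType gatherFn =
      [](poplar::Graph &graph, const popnn::rnn::RnnSlice &slice,
         unsigned stepsPerGather, poplar::program::Sequence *initProg,
         const poplar::DebugNameAndId &dnai) {
        return poplar::program::Sequence();
      };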

Functions

poplar::Tensor createInitialState(poplar::Graph &graph, const RnnParams &params, bool isOutput, unsigned multiple, unsigned numShards, const poplar::DebugContext &debugContext = {})

Create state tensor to be used in all recurrences of the RNN.

The tensor shape is {multiple, batchSize, size}. If the RNN happens to be sharded, a tensor of this shape is created for each shard.

Parameters
  • graph – Graph object.

  • params – The RNN parameters.

  • isOutput – Flag that indicates whether the tensor is an output (true) or an input (false).

  • multiple – The number of state variables that are concatenated into one single state tensor.

  • numShards – The number of shards to be used.

  • debugContext – Debug information.

Returns

Tensor of shape {multiple, batchSize, size}.
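
A usage sketch (illustrative; `params` is an RnnParams as described under struct RnnParams below):

  // Sketch: single-shard output state holding one state variable.
  poplar::Tensor initState = popnn::rnn::createInitialState(
      graph, params, /*isOutput=*/true, /*multiple=*/1, /*numShards=*/1,
      "initState");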

poplar::Tensor createRecurrentTensor(poplar::Graph &graph, const RnnParams &params, unsigned size, unsigned numShards, const poplar::DebugContext &debugContext = {})

Create a tensor of shape {timeSteps, batchSize, size} which is suitable for slicing and/or sharding of the outermost dimension.

Parameters
  • graph – Graph object.

  • params – The RNN parameters.

  • size – The innermost dimension of the tensor.

  • numShards – The number of shards to be used.

  • debugContext – Debug information.

Returns

Tensor of shape {timeSteps, batchSize, size}.

poplar::Tensor createInputTensor(poplar::Graph &graph, const RnnParams &params, unsigned numShards, const poplar::DebugContext &debugContext = {})

Create a tensor of shape {timeSteps, batchSize, inputSize} which is suitable for slicing and/or sharding of the outermost dimension.

Parameters
  • graph – Graph object.

  • params – The RNN parameters.

  • numShards – The number of shards to be used.

  • debugContext – Debug information.

Returns

Tensor of shape {timeSteps, batchSize, inputSize}.

poplar::Tensor createOutputTensor(poplar::Graph &graph, const RnnParams &params, unsigned numShards, const poplar::DebugContext &debugContext = {})

Create a tensor of shape {timeSteps, batchSize, outputSize} which is suitable for slicing and/or sharding of the outermost dimension.

Parameters
  • graph – Graph object.

  • params – The RNN parameters.

  • numShards – The number of shards to be used.

  • debugContext – Debug information.

Returns

Tensor of shape {timeSteps, batchSize, outputSize}.

poplar::Tensor createOutputTensor(poplar::Graph &graph, const RnnParams &params, unsigned multiple, unsigned numShards, const poplar::DebugContext &debugContext = {})

Create a tensor whose size is a multiple of the standard output tensor.

The concatenated tensor has shape {multiple * timeSteps, batchSize, outputSize}, which is suitable for slicing and/or sharding along the outermost dimension.

Parameters
  • graph – Graph object.

  • params – The RNN parameters.

  • multiple – Integer multiple of standard output tensor.

  • numShards – The number of shards to be used.

  • debugContext – Debug information.

Returns

Tensor of shape {timeSteps * multiple, batchSize, outputSize}.
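
A sketch allocating the input and output sequence tensors used by the Rnn() functions below (illustrative, single shard):

  poplar::Tensor in  = popnn::rnn::createInputTensor(graph, params,
                                                     /*numShards=*/1, "rnnIn");
  poplar::Tensor out = popnn::rnn::createOutputTensor(graph, params,
                                                      /*numShards=*/1, "rnnOut");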

poplar::Tensor shiftRnnTensor(poplar::Graph &graph, const RnnParams &params, const poplar::Tensor &tBase, const poplar::Tensor &tSingle, poplar::program::Sequence &prog, unsigned numShards, const poplar::DebugContext &debugContext = {})

Create RNN tensor based on a provided ‘tBase’ tensor such that for the ‘n’th iteration the RNN tensor points to the ‘n-1’th iteration of the tBase tensor.

For the 0’th iteration of the RNN tensor, a copy is made from the provided ‘tSingle’ tensor.

Parameters
  • graph – Graph object.

  • params – The RNN parameters.

  • tBase – Tensor to shift

  • tSingle – Tensor to be copied to 0’th iteration

  • prog – The program to which the tensor copy is added

  • numShards – The number of shards to be used.

  • debugContext – Debug information.

Returns

Tensor which is single step shifted version of tBase.
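
For example (a sketch, assuming `out` was created with createOutputTensor above and `firstOut` is a hypothetical single-step tensor of shape {batchSize, outputSize}):

  // Sketch: step n of prevOut refers to step n-1 of out; step 0 copies firstOut.
  poplar::Tensor prevOut = popnn::rnn::shiftRnnTensor(
      graph, params, /*tBase=*/out, /*tSingle=*/firstOut, prog,
      /*numShards=*/1, "prevOut");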

std::vector<poplar::Tensor> Rnn(poplar::Graph &graph, const RnnParams &params, bool reverse, const std::vector<poplar::Tensor> &initState, const StateSequence &stateSequence, const std::vector<poplar::Tensor> &inputs, const poplar::Tensor *interimIn, poplar::Tensor *interimOut, const std::vector<poplar::Tensor> &outputs, const std::vector<poplar::Tensor> &created, poplar::program::Sequence &prog, const LoopBodyType &loopFn, unsigned numShards, poplar::OptionFlags &options, const poplar::DebugContext &debugContext = {})

Run custom Recurrent Neural Net cell implementation recurrently.

RNN options

  • codeReuse (true, false) [=false]

    If true, the custom RNN implementation defined by the loopFn parameter will be reused by every shard. If false the RNN code is duplicated for every shard.

Parameters
  • graph – Graph to which the RNN cell belongs.

  • params – The parameters of the RNN.

  • reverse – Process tensors in reverse, i.e., beginning from the last element.

  • initState – state tensors that specify the initial states.

  • stateSequence – Optionally specifies that the recurrent updates of a state Tensor need to be stored to a user defined output tensor.

  • inputs – Input tensors for each recurrence

  • *interimIn – Pointer to intermediate inputs to Cell computation.

  • *interimOut – Pointer to intermediate outputs from Cell computation.

  • outputs – Output tensors for each recurrence. Each tensor must be defined prior to calling the Rnn function.

  • created – Output tensor that is allocated by the custom implementation defined in the loopFn parameter.

  • prog – Program sequence.

  • loopFn – Function for RNN cell computation which is invoked for every shard.

  • numShards – The number of shards to be used.

  • options – RNN implementation options. See createInput().

  • debugContext – Optional debug information.
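
To show how the pieces fit together, a hedged single-shard sketch (not from the reference): it reuses the initState, in and out tensors and the cellFn callback sketched above, and stores each step's state to out via StateSequence. Whether this exact combination of empty outputs and a StateSequence suits a given cell depends on the cell implementation.

  // Sketch: run a custom cell over the whole sequence on one shard.
  poplar::OptionFlags rnnOptions = {{"codeReuse", "true"}};
  std::vector<poplar::Tensor> finalState = popnn::rnn::Rnn(
      graph, params, /*reverse=*/false, /*initState=*/{initState},
      popnn::rnn::StateSequence{out, /*stateIndex=*/0}, /*inputs=*/{in},
      /*interimIn=*/nullptr, /*interimOut=*/nullptr,
      /*outputs=*/{}, /*created=*/{},
      prog, cellFn, /*numShards=*/1, rnnOptions, "customRnn");
  // finalState: recurrent state after the last step (assumed).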

std::vector<poplar::Tensor> Rnn(poplar::Graph &graph, const RnnParams &params, const std::vector<poplar::Tensor> &initState, const StateSequence &stateSequence, const std::vector<poplar::Tensor> &inputs, const poplar::Tensor &interimIn, const unsigned numTemps, poplar::program::Sequence &prog, const LoopBodyType &loopFn, const std::vector<poplar::Tensor> &gatherInputs, const GatherBodyType &gatherFn, unsigned numShards, unsigned stepsPerGather, poplar::OptionFlags &options, const poplar::DebugContext &debugContext = {})

Run a custom Recurrent Neural Net cell callback at every time step in decrementing order.

At each time step, create a temporary variable and pass it to a gather callback, which is called at a cadence determined by the stepsPerGather parameter.

RNN options

  • codeReuse (true, false) [=false]

    If true, the custom RNN implementation defined by the loopFn parameter will be reused by every shard. If false the RNN code is duplicated for every shard.

Parameters
  • graph – Graph to which the RNN cell belongs.

  • params – The parameters of the RNN.

  • initState – state tensors that specify the initial states.

  • stateSequence – Optionally specifies that the recurrent updates of a state Tensor need to be stored to a user defined output tensor.

  • inputs – Input tensors to loopFn function.

  • interimIn – Intermediate inputs to Cell computation.

  • numTemps – Number of temporary variables of shape {batchSize, size} per time step which are to be passed to the Gather callback.

  • prog – Program sequence.

  • loopFn – Function for RNN cell computation which is invoked for every time step.

  • gatherInputs – Input tensors to gatherFn function.

  • gatherFn – Function which processes the temporary buffer generated by loopFn with cadence determined by the stepsPerGather parameter.

  • numShards – The number of shards to be used.

  • stepsPerGather – The time step cadence used for the gatherFn callback.

  • options – RNN implementation options. See createInput().

  • debugContext – Optional debug information.

struct RnnBatchwiseFlags

Public Functions

inline bool valid() const

Public Members

poplar::Tensor mask
poplar::Tensor inverse
struct RnnParams
#include <Rnn.hpp>

Structure of Recurrent Neural Network (RNN) parameters which allows for any customized implementation of the cellular part of the RNN.

Public Functions

RnnParams(poplar::Type dataType, std::size_t batchSize, std::size_t timeSteps, std::vector<std::size_t> layerSizes)
RnnParams(poplar::Type dataType, std::size_t batchSize, std::size_t maxTimeSteps, const poplar::Tensor &varTimeSteps, std::vector<std::size_t> layerSizes)
std::size_t getMaxShards(const poplar::Graph &graph) const
std::size_t getInputBytesPerTile(const poplar::Graph &graph) const
std::size_t getOutputBytesPerTile(const poplar::Graph &graph) const
bool variableTimeSteps() const
bool batchVariableTimeSteps() const

Public Members

poplar::Type dataType

The datatype used for the RNN.

std::size_t batchSize

The batch size.

std::size_t maxTimeSteps

The maximum number of RNN time steps.

std::size_t timeSteps

Deprecated:

Use RnnParams.maxTimeSteps instead.

poplar::Tensor varTimeSteps

The run-time number of RNN time steps, a tensor of dimension {batchSize}. If this tensor is default constructed, the number of time steps for the sequence corresponding to each batch will be set according to the maxTimeSteps member.

std::vector<std::size_t> layerSizes

For each RNN layer, the layer size parameter needs to be specified for the input and the output.

This is done using a 2-element vector of which the first element is the input size and the second element is the output size of the RNN layer.
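
For example (a sketch with illustrative sizes; inputSize equals outputSize here so that the pass-through cell sketched earlier is shape-compatible):

  // Sketch: FP16 RNN, batch 16, 100 steps, inputSize == outputSize == 64.
  popnn::rnn::RnnParams params(poplar::HALF, /*batchSize=*/16,
                               /*timeSteps=*/100, /*layerSizes=*/{64, 64});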

struct RnnSlice
#include <Rnn.hpp>

Tensors required for processing a single time step.

Param inputs

Input tensor sequences.

Param interimIn

Intermediate input sequence.

Param interimOut

Intermediate output sequence.

Param outputs

Output tensor sequences.

Public Members

std::vector<poplar::Tensor> inputs
poplar::Tensor interimIn
poplar::Tensor interimOut
std::vector<poplar::Tensor> outputs
struct StateSequence
#include <Rnn.hpp>

Structure that associates a particular state tensor with a user-defined output tensor.

When passed to the Rnn() function, the state tensor for each recurrence is stored to the provided tensor.

Param output

A tensor to which the state is to be stored

Param stateIndex

Index which identifies the state tensor which is to form the output tensor.

Public Members

poplar::Tensor output
std::size_t stateIndex

popnn/SpatialSoftMax.hpp

namespace popnn

Functions used in neural networks.

Functions

std::pair<poplar::Tensor, poplar::Tensor> spatialSoftMax2D(poplar::Graph &graph, poplar::program::Sequence &prog, const poplar::Tensor &fields, float temperature, bool disableSoftmax = false, const poplar::DebugContext &debugContext = {})

Implements a spatial softmax specialised for 2D input fields.

This computes the expected coordinates (normalised to be in [-1.0, 1.0]) for every 2D field in the input tensor. A (trainable) temperature scalar is added which normalises the softmax across the fields.

The output of the spatial softmax (first tensor in the returned pair) is a set of expected x and y coordinates for the maximum activation in each field. This result has shape {F, 2} where F is the number of fields. Y-coordinates run down the first column and X-coordinates down the second column to preserve (row,column) indexing order into the original fields.

Parameters
  • graph – Graph to which variables and vertices will be added.

  • prog – Program to which operations will be added.

  • fields – The input Tensor. Must have rank 3. Interpretation is a set of 2D scalar fields of identical height (H) and width (W) given by the two inner dimensions (so shape is {F, H, W} where F is the number of fields).

  • temperature – Initial value for the softmax scaling/normalisation parameter.

  • debugContext – Optional debug information.

  • disableSoftmax – Turns off softmax computation in this function. This is useful if you have already computed a softmax over all the fields due to other processing or for test/debug.

Returns

A pair of tensors. First is the output of the spatial-softmax, second is scalar temperature variable.
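
A usage sketch (illustrative; `fields` is assumed to be a rank-3 tensor of shape {F, H, W} already added to the graph, and <tuple> is required for std::tie):

  poplar::Tensor coords, temperature;
  std::tie(coords, temperature) = popnn::spatialSoftMax2D(
      graph, prog, fields, /*temperature=*/1.0f,
      /*disableSoftmax=*/false, "spatialSoftMax");
  // coords has shape {F, 2}: column 0 holds y, column 1 holds x.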