Norms

#include <poplin/Norms.hpp>

Functions to support normalising values in a tensor.

namespace poplin

Linear algebra functions.

Typedefs

using DistributedNormReduceCallback = std::function<std::vector<poplar::Tensor>(poplar::Graph &replicatedGraph, const std::vector<poplar::Tensor> &inputsToReduce, poplar::program::Sequence &prog, unsigned groupSize, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options)>

Callback to reduce statistics and gradients.

The reduce operation is reduce-add.

Param replicatedGraph: The replicated graph in which the computation is performed.
Param inputsToReduce: A vector of independent tensors to reduce.
Param prog: A program sequence that the code to perform the normalisation will be appended to.
Param groupSize: The number of replicas that need to be reduced. This may be less than the total number of replicas in the top-level graph. A group is formed from adjacent replicas, so the top-level graph must contain an integral number of groups of groupSize replicas.
Param debugContext: Optional debug information.
Param options: The structure describing options on how the reduction should be implemented.
Return: A vector of reduced tensors, in the same order as supplied in inputsToReduce.
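
The grouping rule described for groupSize can be illustrated with a small host-side model. This is only a sketch of the reduce-add semantics in plain C++, not Poplar code: the function name groupedReduceAdd and the representation of each replica's tensor as a flat std::vector<float> are assumptions made for illustration, as is the choice to give every replica in a group the full sum (all-reduce behaviour).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model of a grouped reduce-add (hypothetical helper, not Poplar).
// perReplica[r] holds replica r's copy of one tensor, flattened.
// Every replica receives the element-wise sum over its own group of
// `groupSize` adjacent replicas.
std::vector<std::vector<float>>
groupedReduceAdd(const std::vector<std::vector<float>> &perReplica,
                 unsigned groupSize) {
  const std::size_t numReplicas = perReplica.size();
  // The replicas must split into an integral number of groups.
  assert(groupSize > 0 && numReplicas % groupSize == 0);
  std::vector<std::vector<float>> out(numReplicas);
  for (std::size_t start = 0; start < numReplicas; start += groupSize) {
    std::vector<float> sum(perReplica[start].size(), 0.0f);
    for (std::size_t r = start; r < start + groupSize; ++r) {
      assert(perReplica[r].size() == sum.size());
      for (std::size_t i = 0; i < sum.size(); ++i)
        sum[i] += perReplica[r][i];
    }
    // Broadcast the group total back to each member of the group.
    for (std::size_t r = start; r < start + groupSize; ++r)
      out[r] = sum;
  }
  return out;
}
```

With four replicas and groupSize 2, replicas 0 and 1 form one group and replicas 2 and 3 form another; the two groups never exchange data.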

Functions

poplar::Tensor createNormGamma(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Type &type, const poplar::DebugContext &debugContext = {})

Create and map the per-channel multiplicative gamma parameter tensor used for normalisation in convolution layers.

Parameters

graph – The graph with the activations and gamma tensor.
acts – The activations tensor has shape [N][C][..F..] where:
- N is the batch size
- C is the number of channels
- ..F.. are the dimensions of an N-dimensional field.
type – The type of the output tensor.
debugContext – Optional debug information.

Returns

Gamma vector of dimension C.

poplar::Tensor createNormGamma(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::DebugContext &debugContext = {})

Create and map the per-channel multiplicative gamma parameter tensor used for normalisation in convolution layers.

Parameters

graph – The graph with the activations and gamma tensor.
acts – The activations tensor has shape [N][C][..F..] where:
- N is the batch size
- C is the number of channels
- ..F.. are the dimensions of an N-dimensional field.
debugContext – Optional debug information.

Returns

Gamma vector of dimension C.

poplar::Tensor createNormBeta(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Type &type, const poplar::DebugContext &debugContext = {})

Create and map the per-channel additive beta parameter tensor used for normalisation in convolution layers.

Parameters

graph – The graph with the activations and beta tensor.
acts – The activations tensor has shape [N][C][..F..] where:
- N is the batch size
- C is the number of channels
- ..F.. are the dimensions of an N-dimensional field.
type – The type of the output tensor.
debugContext – Optional debug information.

Returns

Beta vector of dimension C.

poplar::Tensor createNormBeta(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::DebugContext &debugContext = {})

Create and map the per-channel additive beta parameter tensor used for normalisation in convolution layers.

Parameters

graph – The graph with the activations and beta tensor.
acts – The activations tensor has shape [N][C][..F..] where:
- N is the batch size
- C is the number of channels
- ..F.. are the dimensions of an N-dimensional field.
debugContext – Optional debug information.

Returns

Beta vector of dimension C.

std::pair<poplar::Tensor, poplar::Tensor> createNormParams(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::DebugContext &debugContext = {})

Creates a tensor pair of normalisation parameters (gamma, beta).

Parameters

graph – The graph with the activations and beta/gamma tensors.
acts – The activations tensor has shape [N][C][..F..] where:
- N is the batch size
- C is the number of channels
- ..F.. are the dimensions of an N-dimensional field.
debugContext – Optional debug information.

Returns

A pair of vectors of dimension C.

std::pair<poplar::Tensor, poplar::Tensor> normStatistics(poplar::Graph &graph, const poplar::Tensor &actsUngrouped, float eps, poplar::program::Sequence &prog, bool unbiasedVarEstimate, bool stableAlgo = false, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {})

Compute the normalisation statistics from the activations tensor.

The activations tensor is of shape [N][C][..F..]. The mean and inverse standard deviation are computed over dimensions {[N] [..F..]} and vectors of length C are returned as estimates.

The input activations tensor must be rearranged such that statistics are computed for C channels.

Parameters

graph – The graph in which the computation is performed.
actsUngrouped – The activations tensor with shape [N][C][..F..] where:
- N is the batch size
- C is the number of channels
- ..F.. are the dimensions of an N-dimensional field.
eps – The epsilon added to the variance to avoid divide by zero.
prog – A program sequence that the code to perform the normalisation will be appended to.
unbiasedVarEstimate – Compute unbiased variance estimate.
stableAlgo – If true, computes the mean first and subtracts it from the activations before computing the variance. The implementation with this flag set to true is
partialsType – Poplar type used for partials.
debugContext – Optional debug information.

Returns

A vector pair with mean and inverse standard deviation.
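
To make the reduction concrete, the arithmetic described above can be sketched on the host in plain C++. This is a reference sketch, not the Poplibs implementation: the function name referenceNormStatistics and the flattened row-major [N][C][F] layout are assumptions for illustration, and unbiasedVarEstimate is assumed to mean dividing the sum of squared differences by one less than the number of reduced elements. The sketch always computes the mean first, which corresponds to the behaviour described for stableAlgo set to true.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Reference arithmetic only (hypothetical helper, not Poplar).
// acts is a flattened [N][C][F] array.
// Returns {mean, inverse standard deviation}, each of length C.
std::pair<std::vector<float>, std::vector<float>>
referenceNormStatistics(const std::vector<float> &acts, std::size_t N,
                        std::size_t C, std::size_t F, float eps,
                        bool unbiasedVarEstimate) {
  assert(acts.size() == N * C * F);
  const std::size_t count = N * F; // elements reduced per channel
  assert(count > (unbiasedVarEstimate ? 1u : 0u));
  std::vector<float> mean(C, 0.0f), invStdDev(C, 0.0f);
  for (std::size_t c = 0; c < C; ++c) {
    // First pass: mean over the batch and field dimensions.
    double sum = 0.0;
    for (std::size_t n = 0; n < N; ++n)
      for (std::size_t f = 0; f < F; ++f)
        sum += acts[(n * C + c) * F + f];
    const double m = sum / static_cast<double>(count);
    // Second pass: sum of squared differences from the mean.
    double sqDiff = 0.0;
    for (std::size_t n = 0; n < N; ++n)
      for (std::size_t f = 0; f < F; ++f) {
        const double d = acts[(n * C + c) * F + f] - m;
        sqDiff += d * d;
      }
    const double denom =
        static_cast<double>(unbiasedVarEstimate ? count - 1 : count);
    const double var = sqDiff / denom;
    mean[c] = static_cast<float>(m);
    // eps keeps the division finite when the variance is zero.
    invStdDev[c] = static_cast<float>(1.0 / std::sqrt(var + eps));
  }
  return {mean, invStdDev};
}
```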

std::pair<poplar::Tensor, poplar::Tensor> distributedNormStatistics(poplar::Graph &replicatedGraph, const poplar::Tensor &actsUngrouped, float eps, poplar::program::Sequence &prog, bool unbiasedVarEstimate, DistributedNormReduceCallback allReduceCallback, unsigned normSize, bool stableAlgo = false, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {})

Compute the normalisation statistics for a part of the activations tensor which is distributed over multiple replicas.

Each replica gets equal-sized batches (N), with normalisation done over normSize batch elements. A callback does the required reduction over multiple replicas. The activations tensor is of shape [N][C][..F..]. The mean and inverse standard deviation are computed over dimensions {[N] [..F..]} and vectors of length C are returned as estimates.

The input activations tensor must be rearranged such that statistics are computed for C channels.

Parameters

replicatedGraph – The replicated graph in which the computation is performed.
actsUngrouped – The activations tensor with shape [N][C][..F..] where:
- N is the batch size
- C is the number of channels
- ..F.. are the dimensions of an N-dimensional field.
eps – The epsilon added to the variance to avoid divide by zero.
prog – A program sequence that the code to perform the normalisation will be appended to.
unbiasedVarEstimate – Compute unbiased variance estimate.
stableAlgo – If true, computes the mean first and subtracts it from the activations before computing the variance. The implementation with this flag set to true is
partialsType – Poplar type used for partials.
allReduceCallback – Callback to perform all-reduce over ‘normSize’ batch elements.
normSize – Number of batch elements over which statistics are estimated.
debugContext – Optional debug information.

Returns

A vector pair with mean and inverse standard deviation.
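
One way a reduce-add callback can yield statistics over normSize batch elements is for each replica to contribute pre-scaled partial sums of x and x squared. The sketch below models that on the host in plain C++. It is an assumption about how the pieces could fit together, not a description of what Poplibs does internally: the function name referenceDistributedStatistics is hypothetical, all replicas are treated as a single group (so normSize equals N times the number of replicas), and only the biased variance estimate is shown.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Host-side model only (hypothetical helper, not Poplar).
// replicas[r] is replica r's flattened [N][C][F] slice of the activations.
// All replicas form one group, so normSize = N * replicas.size().
std::pair<std::vector<float>, std::vector<float>>
referenceDistributedStatistics(
    const std::vector<std::vector<float>> &replicas, std::size_t N,
    std::size_t C, std::size_t F, float eps) {
  assert(!replicas.empty() && N * F > 0);
  const double scale =
      1.0 / static_cast<double>(N * replicas.size() * F);
  // Each replica adds its scaled partials; the running totals stand in
  // for the result of the reduce-add across replicas.
  std::vector<double> meanX(C, 0.0), meanX2(C, 0.0);
  for (const auto &acts : replicas) {
    assert(acts.size() == N * C * F);
    for (std::size_t n = 0; n < N; ++n)
      for (std::size_t c = 0; c < C; ++c)
        for (std::size_t f = 0; f < F; ++f) {
          const double x = acts[(n * C + c) * F + f];
          meanX[c] += x * scale;
          meanX2[c] += x * x * scale;
        }
  }
  std::vector<float> mean(C), invStdDev(C);
  for (std::size_t c = 0; c < C; ++c) {
    // Biased variance from the two reduced moments.
    const double var = meanX2[c] - meanX[c] * meanX[c];
    mean[c] = static_cast<float>(meanX[c]);
    invStdDev[c] = static_cast<float>(1.0 / std::sqrt(var + eps));
  }
  return {mean, invStdDev};
}
```

Because the partials are plain sums, the same statistics result no matter how the batch elements are split across replicas.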

poplar::Tensor normWhiten(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &mean, const poplar::Tensor &iStdDev, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Compute the whitened activations using the supplied mean and inverse standard deviation.

The input activations must already be arranged such that C matches the size of the mean and iStdDev statistics tensors.

Parameters

graph – The graph in which the computation is performed.
acts – The activations tensor of shape [N][C][..F..].
mean – Mean of the activations with dimension C.
iStdDev – Inverse standard deviation with dimension C.
prog – A program sequence that the code to perform the normalisation will be appended to.
debugContext – Optional debug information.

Returns

A new tensor with the whitened activations.
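
Whitening, as commonly defined, subtracts the per-channel mean and multiplies by the per-channel inverse standard deviation. The following plain C++ sketch shows that arithmetic on a flattened [N][C][F] array; the function name referenceWhiten and the layout are assumptions for illustration, and this is not the Poplibs implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference arithmetic only (hypothetical helper, not Poplar).
// out[n][c][f] = (acts[n][c][f] - mean[c]) * iStdDev[c]
std::vector<float> referenceWhiten(const std::vector<float> &acts,
                                   std::size_t N, std::size_t C,
                                   std::size_t F,
                                   const std::vector<float> &mean,
                                   const std::vector<float> &iStdDev) {
  assert(acts.size() == N * C * F);
  assert(mean.size() == C && iStdDev.size() == C);
  std::vector<float> out(acts.size());
  for (std::size_t n = 0; n < N; ++n)
    for (std::size_t c = 0; c < C; ++c)
      for (std::size_t f = 0; f < F; ++f) {
        const std::size_t i = (n * C + c) * F + f;
        out[i] = (acts[i] - mean[c]) * iStdDev[c];
      }
  return out;
}
```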

poplar::Tensor normalise(poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gamma, const poplar::Tensor &beta, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Computes the normalised output from whitened activations.

Parameters

graph – The graph to which the normalisation operation is added.
actsWhitened – The whitened activation inputs to this layer.
gamma – Per-channel multiplicative normalisation parameter.
beta – Per-channel additive normalisation parameter.
prog – A program sequence that the code to perform the normalisation will be appended to.
debugContext – Optional debug information.

Returns

A tensor containing the normalised activations.
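
Given the parameter descriptions above (gamma multiplicative, beta additive, both per channel), the normalised output can be sketched as a per-channel scale and shift of the whitened activations. The plain C++ below is a reference sketch under that reading; referenceNormalise and the flattened [N][C][F] layout are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference arithmetic only (hypothetical helper, not Poplar).
// out[n][c][f] = gamma[c] * actsWhitened[n][c][f] + beta[c]
std::vector<float> referenceNormalise(const std::vector<float> &actsWhitened,
                                      std::size_t N, std::size_t C,
                                      std::size_t F,
                                      const std::vector<float> &gamma,
                                      const std::vector<float> &beta) {
  assert(actsWhitened.size() == N * C * F);
  assert(gamma.size() == C && beta.size() == C);
  std::vector<float> out(actsWhitened.size());
  for (std::size_t n = 0; n < N; ++n)
    for (std::size_t c = 0; c < C; ++c)
      for (std::size_t f = 0; f < F; ++f) {
        const std::size_t i = (n * C + c) * F + f;
        out[i] = gamma[c] * actsWhitened[i] + beta[c];
      }
  return out;
}
```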

std::pair<poplar::Tensor, poplar::Tensor> normParamGradients(poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {})

Compute the gradients with respect to the parameters, as required for the parameter update.

Parameters

graph – The graph to which the normalisation operation is added.
actsWhitened – The whitened activation inputs to this layer.
gradsIn – The gradient with respect to the output of this layer.
prog – A program sequence that the code to perform the normalisation will be appended to.
partialsType – The intermediate type kept in the computation.
debugContext – Optional debug information.

Returns

A pair of tensors, gammaDelta and betaDelta, which are the gradients with respect to gamma and beta.
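
For an output of the form gamma times whitened activation plus beta, the standard parameter gradients are per-channel sums over the batch and field dimensions. The plain C++ sketch below shows that arithmetic; it assumes this standard formulation applies, and the name referenceParamGradients and the flattened [N][C][F] layout are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Reference arithmetic only (hypothetical helper, not Poplar).
// gammaDelta[c] = sum over n,f of gradsIn * actsWhitened
// betaDelta[c]  = sum over n,f of gradsIn
// Returns {gammaDelta, betaDelta}.
std::pair<std::vector<float>, std::vector<float>>
referenceParamGradients(const std::vector<float> &actsWhitened,
                        const std::vector<float> &gradsIn, std::size_t N,
                        std::size_t C, std::size_t F) {
  assert(actsWhitened.size() == N * C * F);
  assert(gradsIn.size() == actsWhitened.size());
  std::vector<float> gammaDelta(C, 0.0f), betaDelta(C, 0.0f);
  for (std::size_t c = 0; c < C; ++c) {
    // Accumulate in double, mirroring the idea of a wider partials type.
    double sumGW = 0.0, sumG = 0.0;
    for (std::size_t n = 0; n < N; ++n)
      for (std::size_t f = 0; f < F; ++f) {
        const std::size_t i = (n * C + c) * F + f;
        sumGW += static_cast<double>(gradsIn[i]) * actsWhitened[i];
        sumG += gradsIn[i];
      }
    gammaDelta[c] = static_cast<float>(sumGW);
    betaDelta[c] = static_cast<float>(sumG);
  }
  return {gammaDelta, betaDelta};
}
```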

poplar::Tensor normGradients(poplar::Graph &graph, const poplar::Tensor &gradsIn, const poplar::Tensor &gamma, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Propagate the gradients through the normalisation layer.

Parameters

graph – The graph to which the normalisation operation is added.
gradsIn – The gradient with respect to the output of this layer.
gamma – Multiplicative parameter used in the normalisation.
prog – A program sequence that the code to perform the normalisation will be appended to.
debugContext – Optional debug information.

Returns

The gradient with respect to the input of this layer.
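
Since the forward step multiplies each channel by gamma and adds beta, the gradient with respect to the whitened input is the incoming gradient scaled by gamma for that channel. A plain C++ sketch of that arithmetic follows; referenceNormGradients and the flattened [N][C][F] layout are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference arithmetic only (hypothetical helper, not Poplar).
// out[n][c][f] = gradsIn[n][c][f] * gamma[c]
std::vector<float> referenceNormGradients(const std::vector<float> &gradsIn,
                                          std::size_t N, std::size_t C,
                                          std::size_t F,
                                          const std::vector<float> &gamma) {
  assert(gradsIn.size() == N * C * F);
  assert(gamma.size() == C);
  std::vector<float> out(gradsIn.size());
  for (std::size_t n = 0; n < N; ++n)
    for (std::size_t c = 0; c < C; ++c)
      for (std::size_t f = 0; f < F; ++f) {
        const std::size_t i = (n * C + c) * F + f;
        out[i] = gradsIn[i] * gamma[c];
      }
  return out;
}
```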

poplar::Tensor normStatisticsGradients(poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, const poplar::Tensor &invStdDev, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {})

Propagate the gradients through the norm statistics layer.

The input to the layer is the output gradients from the normalisation layer. The whitened activations and the input gradients must already be arranged such that the channel dimension has the same number of elements as invStdDev.

Parameters

graph – The graph to which the normalisation operation is added.
actsWhitened – The forward-pass whitened activation inputs to this layer.
gradsIn – The gradient with respect to the output of this layer.
invStdDev – Inverse standard deviation from norm statistics.
prog – A program sequence that the code to perform the normalisation will be appended to.
partialsType – The intermediate type kept in the computation.
debugContext – Optional debug information.

Returns

The gradient with respect to the input of this layer.
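
The standard backward pass through whitening (with the biased variance estimate) removes from each gradient its per-channel mean and its projection onto the whitened activations, then scales by the inverse standard deviation. The plain C++ sketch below implements that textbook formula. Whether Poplibs uses exactly this formulation is an assumption; referenceStatisticsGradients and the flattened [N][C][F] layout are illustrative names and choices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference arithmetic only (hypothetical helper, not Poplar).
// Per channel c, with means taken over the batch and field dimensions:
//   out = invStdDev[c] * (g - mean(g) - w * mean(g * w))
// where g is gradsIn and w is actsWhitened.
std::vector<float>
referenceStatisticsGradients(const std::vector<float> &actsWhitened,
                             const std::vector<float> &gradsIn,
                             const std::vector<float> &invStdDev,
                             std::size_t N, std::size_t C, std::size_t F) {
  assert(actsWhitened.size() == N * C * F && N * F > 0);
  assert(gradsIn.size() == actsWhitened.size());
  assert(invStdDev.size() == C);
  const double count = static_cast<double>(N * F);
  std::vector<float> out(gradsIn.size());
  for (std::size_t c = 0; c < C; ++c) {
    double sumG = 0.0, sumGW = 0.0;
    for (std::size_t n = 0; n < N; ++n)
      for (std::size_t f = 0; f < F; ++f) {
        const std::size_t i = (n * C + c) * F + f;
        sumG += gradsIn[i];
        sumGW += static_cast<double>(gradsIn[i]) * actsWhitened[i];
      }
    const double meanG = sumG / count;
    const double meanGW = sumGW / count;
    for (std::size_t n = 0; n < N; ++n)
      for (std::size_t f = 0; f < F; ++f) {
        const std::size_t i = (n * C + c) * F + f;
        out[i] = static_cast<float>(
            invStdDev[c] * (gradsIn[i] - meanG - actsWhitened[i] * meanGW));
      }
  }
  return out;
}
```

A useful sanity check on this formula is that the resulting gradients sum to zero within each channel, because shifting every input by the same constant does not change the whitened output.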

poplar::Tensor distributedNormStatisticsGradients(poplar::Graph &replicatedGraph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, const poplar::Tensor &invStdDev, poplar::program::Sequence &prog, poplin::DistributedNormReduceCallback reduceCallback, unsigned normSize, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {})

Propagate the gradients through the norm statistics layer where equal-sized batch elements are distributed over replicas.

Each replica is given the same number of batch elements (N), and norm gradients are computed over normSize batch elements. A callback does the required reduction over multiple replicas.

The input to the layer is the output gradients from the normalisation layer. The whitened activations and the input gradients must already be arranged such that the channel dimension has the same number of elements as invStdDev.

Parameters

replicatedGraph – The replicated graph to which the normalisation operation is added.
actsWhitened – The forward-pass whitened activation inputs to this layer.
gradsIn – The gradient with respect to the output of this layer.
invStdDev – Inverse standard deviation from norm statistics.
prog – A program sequence that the code to perform the normalisation will be appended to.
reduceCallback – A callback to perform an all-reduce of the statistics gradients across the replicas.
normSize – The batch size over which the norm is done.
partialsType – The intermediate type kept in the computation.
debugContext – Optional debug information.

Returns

The gradient with respect to the input of this layer.
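
The distributed variant can be pictured as the same backward formula with the two per-channel means formed from reduce-added partial sums. The host-side C++ sketch below models this under stated assumptions: the standard biased-variance backward formula applies, all replicas form a single group (normSize equals N times the number of replicas), and the name referenceDistributedStatisticsGradients and the flattened [N][C][F] layout are hypothetical. It is not the Poplibs implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model only (hypothetical helper, not Poplar).
// w[r] and g[r] are replica r's flattened [N][C][F] whitened activations
// and incoming gradients. Returns the input gradients for each replica.
std::vector<std::vector<float>> referenceDistributedStatisticsGradients(
    const std::vector<std::vector<float>> &w,
    const std::vector<std::vector<float>> &g,
    const std::vector<float> &invStdDev, std::size_t N, std::size_t C,
    std::size_t F) {
  assert(!w.empty() && w.size() == g.size() && N * F > 0);
  assert(invStdDev.size() == C);
  const double scale = 1.0 / static_cast<double>(N * w.size() * F);
  // Scaled partial sums from every replica; the running totals stand in
  // for the result of the reduce-add callback.
  std::vector<double> meanG(C, 0.0), meanGW(C, 0.0);
  for (std::size_t r = 0; r < w.size(); ++r) {
    assert(w[r].size() == N * C * F && g[r].size() == N * C * F);
    for (std::size_t n = 0; n < N; ++n)
      for (std::size_t c = 0; c < C; ++c)
        for (std::size_t f = 0; f < F; ++f) {
          const std::size_t i = (n * C + c) * F + f;
          meanG[c] += g[r][i] * scale;
          meanGW[c] += static_cast<double>(g[r][i]) * w[r][i] * scale;
        }
  }
  // Each replica then applies the formula to its own slice.
  std::vector<std::vector<float>> out(w.size());
  for (std::size_t r = 0; r < w.size(); ++r) {
    out[r].resize(N * C * F);
    for (std::size_t n = 0; n < N; ++n)
      for (std::size_t c = 0; c < C; ++c)
        for (std::size_t f = 0; f < F; ++f) {
          const std::size_t i = (n * C + c) * F + f;
          out[r][i] = static_cast<float>(
              invStdDev[c] * (g[r][i] - meanG[c] - w[r][i] * meanGW[c]));
        }
  }
  return out;
}
```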