Norms
#include <poplin/Norms.hpp>
Functions to support normalising values in a tensor.
-
namespace poplin
Linear algebra functions.
Typedefs
-
using DistributedNormReduceCallback = std::function<std::vector<poplar::Tensor>(poplar::Graph &replicatedGraph, const std::vector<poplar::Tensor> &inputsToReduce, poplar::program::Sequence &prog, unsigned groupSize, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options)>
Callback to reduce statistics and gradients.
The reduce operation is reduce-add.
- Param replicatedGraph
The replicated graph in which the computation is performed.
- Param inputsToReduce
A vector of independent tensors to reduce.
- Param prog
A program sequence that the code to perform the reduction will be appended to.
- Param groupSize
The number of replicas that need to be reduced. This may be less than the total number of replicas in the top-level graph. A group is formed from adjacent replicas such that the top-level graph contains an integral number of groups of groupSize replicas.
- Param debugContext
Optional debug information.
- Param options
The structure describing options on how the reduction should be implemented.
- Return
A vector of reduced tensors in the same order as supplied in inputsToReduce.
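The grouping rule described for groupSize can be illustrated on the host without any Poplar types. The sketch below is only a model of the arithmetic: `groupReduceAdd` is a hypothetical name, each replica's tensor is represented as a flat `std::vector<float>`, and it assumes all-reduce semantics (every replica in a group receives the group's sum), which the text above does not state explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model only (hypothetical helper, not a Poplar callback).
// replicaValues[r] is the flattened tensor held by replica r.
// Adjacent replicas form groups of groupSize; each group is summed
// element-wise and every member of the group receives the sum.
std::vector<std::vector<float>>
groupReduceAdd(const std::vector<std::vector<float>> &replicaValues,
               unsigned groupSize) {
  const std::size_t numReplicas = replicaValues.size();
  // The replica count must be an integral number of groups.
  assert(groupSize > 0 && numReplicas % groupSize == 0);
  std::vector<std::vector<float>> result(numReplicas);
  for (std::size_t start = 0; start < numReplicas; start += groupSize) {
    std::vector<float> sum(replicaValues[start].size(), 0.0f);
    for (std::size_t r = start; r < start + groupSize; ++r) {
      assert(replicaValues[r].size() == sum.size());
      for (std::size_t i = 0; i < sum.size(); ++i)
        sum[i] += replicaValues[r][i];
    }
    // Assumed all-reduce behaviour: the result is replicated in the group.
    for (std::size_t r = start; r < start + groupSize; ++r)
      result[r] = sum;
  }
  return result;
}
```

With four replicas and a groupSize of 2, replicas 0 and 1 form one group and replicas 2 and 3 form another; the two groups are reduced independently.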
Functions
-
poplar::Tensor createNormGamma(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Type &type, const poplar::DebugContext &debugContext = {})
Create and map the per-channel multiplicative gamma parameter tensor used for normalisation in convolution layers.
- Parameters
graph – The graph with the activations and gamma tensor.
acts – The activations tensor, of shape [N][C][..F..], where:
N is the batch size
C is the number of channels
..F.. are the dimensions of an N-dimensional field.
type – The type of the output tensor.
debugContext – Optional debug information.
- Returns
Gamma vector of dimension C.
-
poplar::Tensor createNormGamma(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::DebugContext &debugContext = {})
Create and map the per-channel multiplicative gamma parameter tensor used for normalisation in convolution layers.
- Parameters
graph – The graph with the activations and gamma tensor.
acts – The activations tensor, of shape [N][C][..F..], where:
N is the batch size
C is the number of channels
..F.. are the dimensions of an N-dimensional field.
debugContext – Optional debug information.
- Returns
Gamma vector of dimension C.
-
poplar::Tensor createNormBeta(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Type &type, const poplar::DebugContext &debugContext = {})
Create and map the per-channel additive beta parameter tensor used for normalisation in convolution layers.
- Parameters
graph – The graph with the activations and beta tensor.
acts – The activations tensor, of shape [N][C][..F..], where:
N is the batch size
C is the number of channels
..F.. are the dimensions of an N-dimensional field.
type – The type of the output tensor.
debugContext – Optional debug information.
- Returns
Beta vector of dimension C.
-
poplar::Tensor createNormBeta(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::DebugContext &debugContext = {})
Create and map the per-channel additive beta parameter tensor used for normalisation in convolution layers.
- Parameters
graph – The graph with the activations and beta tensor.
acts – The activations tensor, of shape [N][C][..F..], where:
N is the batch size
C is the number of channels
..F.. are the dimensions of an N-dimensional field.
debugContext – Optional debug information.
- Returns
Beta vector of dimension C.
-
std::pair<poplar::Tensor, poplar::Tensor> createNormParams(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::DebugContext &debugContext = {})
Creates a tensor pair of normalisation parameters (gamma, beta).
- Parameters
graph – The graph with the activations and beta/gamma tensors.
acts – The activations tensor, of shape [N][C][..F..], where:
N is the batch size
C is the number of channels
..F.. are the dimensions of an N-dimensional field.
debugContext – Optional debug information.
- Returns
A pair of vectors of dimension C.
-
std::pair<poplar::Tensor, poplar::Tensor> normStatistics(poplar::Graph &graph, const poplar::Tensor &actsUngrouped, float eps, poplar::program::Sequence &prog, bool unbiasedVarEstimate, bool stableAlgo = false, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {})
Compute the normalisation statistics from the activations tensor.
The activations tensor is of shape [N][C][..F..]. The mean and inverse standard deviation are computed over dimensions {[N] [..F..]} and vectors of length C are returned as estimates.
The input activations tensor must be rearranged such that statistics are computed for C channels.
- Parameters
graph – The graph in which the computation is performed.
actsUngrouped – The activations tensor, of shape [N][C][..F..], where:
N is the batch size
C is the number of channels
..F.. are the dimensions of an N-dimensional field.
eps – The epsilon added to the variance to avoid divide by zero.
prog – A program sequence that the code to perform the normalisation will be appended to.
unbiasedVarEstimate – Compute unbiased variance estimate.
stableAlgo – If true, computes the mean first and subtracts the activations by it before computing the variance. The implementation with this flag set to true is
partialsType – Poplar type used for partials.
debugContext – Optional debug information.
- Returns
A vector pair with mean and inverse standard deviation.
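To make the reduction over {[N] [..F..]} concrete, here is a minimal host-side reference of the arithmetic, not the Poplar implementation. `normStatisticsRef` is a hypothetical name, the activations are assumed to be flattened row-major as [N][C][F] with F the product of the field dimensions, and the unbiased estimate is assumed to divide by one less than the element count (Bessel's correction). The mean is computed first and subtracted before squaring, which matches the description of stableAlgo.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical host reference. acts is a flattened [N][C][F] array.
// Returns {mean, invStdDev}, each of length C.
std::pair<std::vector<float>, std::vector<float>>
normStatisticsRef(const std::vector<float> &acts, std::size_t N,
                  std::size_t C, std::size_t F, float eps,
                  bool unbiasedVarEstimate) {
  assert(acts.size() == N * C * F);
  const std::size_t count = N * F; // elements reduced per channel
  assert(count > (unbiasedVarEstimate ? 1u : 0u));
  std::vector<float> mean(C), invStdDev(C);
  for (std::size_t c = 0; c < C; ++c) {
    // First pass: mean over the batch and field dimensions.
    double sum = 0.0;
    for (std::size_t n = 0; n < N; ++n)
      for (std::size_t f = 0; f < F; ++f)
        sum += acts[(n * C + c) * F + f];
    const double m = sum / static_cast<double>(count);
    // Second pass: sum of squared deviations from the mean.
    double sqDiff = 0.0;
    for (std::size_t n = 0; n < N; ++n)
      for (std::size_t f = 0; f < F; ++f) {
        const double d = acts[(n * C + c) * F + f] - m;
        sqDiff += d * d;
      }
    // Assumption: the unbiased estimate divides by (count - 1).
    const double denom =
        static_cast<double>(unbiasedVarEstimate ? count - 1 : count);
    const double var = sqDiff / denom;
    mean[c] = static_cast<float>(m);
    // eps keeps the division defined when the variance is zero.
    invStdDev[c] = static_cast<float>(1.0 / std::sqrt(var + eps));
  }
  return {mean, invStdDev};
}
```

A channel whose values are all equal has zero variance, so its inverse standard deviation is determined entirely by eps.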
-
std::pair<poplar::Tensor, poplar::Tensor> distributedNormStatistics(poplar::Graph &replicatedGraph, const poplar::Tensor &actsUngrouped, float eps, poplar::program::Sequence &prog, bool unbiasedVarEstimate, DistributedNormReduceCallback allReduceCallback, unsigned normSize, bool stableAlgo = false, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {})
Compute the normalisation statistics for a part of the activations tensor which is distributed over multiple replicas.
Each replica gets equal-sized batches (N), with normalisation done over normSize batch elements. A callback does the required mean reduction over multiple replicas. The activations tensor is of shape [N][C][..F..]. The mean and inverse standard deviation are computed over dimensions {[N] [..F..]} and vectors of length C are returned as estimates.
The input activations tensor must be rearranged such that statistics are computed for C channels.
- Parameters
replicatedGraph – The replicated graph in which the computation is performed.
actsUngrouped – The activations tensor, of shape [N][C][..F..], where:
N is the batch size
C is the number of channels
..F.. are the dimensions of an N-dimensional field.
eps – The epsilon added to the variance to avoid divide by zero.
prog – A program sequence that the code to perform the normalisation will be appended to.
unbiasedVarEstimate – Compute unbiased variance estimate.
stableAlgo – If true, computes the mean first and subtracts the activations by it before computing the variance. The implementation with this flag set to true is
partialsType – Poplar type used for partials.
allReduceCallback – Callback to perform all-reduce over ‘normSize’ batch elements.
normSize – Number of batch elements over which statistics are estimated.
debugContext – Optional debug information.
- Returns
A vector pair with mean and inverse standard deviation.
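One way to picture the distributed case is that each replica forms local per-channel sums, a reduce-add combines them across the replicas in the norm group, and the statistics are then derived from the combined sums. The sketch below models that on the host; it is an assumption about the structure, not a description of the library's internals. `distributedNormStatisticsRef` is a hypothetical name, all replicas passed in are treated as one norm group (so normSize equals the replica count times N), and only the biased variance estimate is shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical host model. replicaActs[r] is replica r's flattened
// [N][C][F] activations. Returns {mean, invStdDev}, each of length C,
// computed over all replicas in the group.
std::pair<std::vector<float>, std::vector<float>>
distributedNormStatisticsRef(
    const std::vector<std::vector<float>> &replicaActs, std::size_t N,
    std::size_t C, std::size_t F, float eps) {
  assert(!replicaActs.empty() && N * F > 0);
  std::vector<double> sum(C, 0.0), sumSq(C, 0.0);
  for (const auto &acts : replicaActs) {
    assert(acts.size() == N * C * F);
    // Local partial sums on one replica, accumulated into the group
    // totals; the accumulation stands in for the reduce-add callback.
    for (std::size_t n = 0; n < N; ++n)
      for (std::size_t c = 0; c < C; ++c)
        for (std::size_t f = 0; f < F; ++f) {
          const double x = acts[(n * C + c) * F + f];
          sum[c] += x;
          sumSq[c] += x * x;
        }
  }
  // Number of elements per channel across the whole norm group.
  const double count = static_cast<double>(replicaActs.size() * N * F);
  std::vector<float> mean(C), invStdDev(C);
  for (std::size_t c = 0; c < C; ++c) {
    const double m = sum[c] / count;
    // Biased variance as E[x^2] - mean^2, clamped against rounding.
    const double var = std::max(sumSq[c] / count - m * m, 0.0);
    mean[c] = static_cast<float>(m);
    invStdDev[c] = static_cast<float>(1.0 / std::sqrt(var + eps));
  }
  return {mean, invStdDev};
}
```

Using sums and sums of squares means a single reduction carries everything needed, at the cost of the numerical robustness that a mean-first approach provides.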
-
poplar::Tensor normWhiten(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &mean, const poplar::Tensor &iStdDev, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
Compute the whitened activations using the supplied mean and inverse standard deviation.
The input activations undergo a prior rearrangement such that C is the size of the statistics mean and iStdDev tensors.
- Parameters
graph – The graph which the computation is in.
acts – The activations tensor of shape [N][C][..F..].
mean – Mean of the activations with dimension C.
iStdDev – Inverse standard deviation with dimension C.
prog – A program sequence that the code to perform the normalisation will be appended to.
debugContext – Optional debug information.
- Returns
A new tensor with the whitened activations.
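Whitening, in the usual sense, subtracts the per-channel mean and multiplies by the per-channel inverse standard deviation. The following host-side sketch shows that arithmetic under the assumption of a flattened row-major [N][C][F] layout; `normWhitenRef` is a hypothetical name and is not part of poplin.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference: whitened = (acts - mean[c]) * iStdDev[c].
// acts is a flattened [N][C][F] array; mean and iStdDev have length C.
std::vector<float> normWhitenRef(const std::vector<float> &acts,
                                 const std::vector<float> &mean,
                                 const std::vector<float> &iStdDev,
                                 std::size_t N, std::size_t C,
                                 std::size_t F) {
  assert(acts.size() == N * C * F);
  assert(mean.size() == C && iStdDev.size() == C);
  std::vector<float> out(acts.size());
  for (std::size_t n = 0; n < N; ++n)
    for (std::size_t c = 0; c < C; ++c)
      for (std::size_t f = 0; f < F; ++f) {
        const std::size_t i = (n * C + c) * F + f;
        // The channel index selects which statistics apply.
        out[i] = (acts[i] - mean[c]) * iStdDev[c];
      }
  return out;
}
```

The result has the same shape as the input, which is consistent with the function returning a new tensor rather than updating in place.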
-
poplar::Tensor normalise(poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gamma, const poplar::Tensor &beta, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
Computes the normalised output from whitened activations.
- Parameters
graph – The graph to which the normalisation operation is added.
actsWhitened – The whitened activation inputs to this layer.
gamma – Per-channel multiplicative normalisation parameter.
beta – Per-channel additive normalisation parameter.
prog – A program sequence that the code to perform the normalisation will be appended to.
debugContext – Optional debug information.
- Returns
A tensor containing the normalised activations.
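Given that gamma is described as multiplicative and beta as additive, the normalised output is presumably the per-channel affine transform of the whitened activations. A minimal host-side sketch of that reading follows; `normaliseRef` is a hypothetical name and the flattened [N][C][F] layout is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference: out = gamma[c] * actsWhitened + beta[c].
// actsWhitened is a flattened [N][C][F] array; gamma and beta have
// length C.
std::vector<float> normaliseRef(const std::vector<float> &actsWhitened,
                                const std::vector<float> &gamma,
                                const std::vector<float> &beta,
                                std::size_t N, std::size_t C,
                                std::size_t F) {
  assert(actsWhitened.size() == N * C * F);
  assert(gamma.size() == C && beta.size() == C);
  std::vector<float> out(actsWhitened.size());
  for (std::size_t n = 0; n < N; ++n)
    for (std::size_t c = 0; c < C; ++c)
      for (std::size_t f = 0; f < F; ++f) {
        const std::size_t i = (n * C + c) * F + f;
        // Scale then shift, using the parameters of channel c.
        out[i] = gamma[c] * actsWhitened[i] + beta[c];
      }
  return out;
}
```

With gamma set to one and beta set to zero for every channel, this reduces to the identity on the whitened activations.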
-
std::pair<poplar::Tensor, poplar::Tensor> normParamGradients(poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {})
Compute gradients with respect to parameters required for parameter update.
- Parameters
graph – The graph to which the normalisation operation is added.
actsWhitened – The whitened activation inputs to this layer.
gradsIn – The gradient with respect to the output of this layer.
prog – A program sequence that the code to perform the normalisation will be appended to.
partialsType – The intermediate type kept in the computation.
debugContext – Optional debug information.
- Returns
A pair of tensors, gammaDelta and betaDelta, which are the gradients with respect to gamma and beta.
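For an output of the form gamma * actsWhitened + beta, the chain rule gives the gamma gradient as the per-channel sum of gradsIn times actsWhitened and the beta gradient as the per-channel sum of gradsIn. The host-side sketch below shows that textbook result; it assumes the affine form just stated and a flattened [N][C][F] layout, and `normParamGradientsRef` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical host reference. Returns {gammaDelta, betaDelta}, each of
// length C, for flattened [N][C][F] inputs.
std::pair<std::vector<float>, std::vector<float>>
normParamGradientsRef(const std::vector<float> &actsWhitened,
                      const std::vector<float> &gradsIn, std::size_t N,
                      std::size_t C, std::size_t F) {
  assert(actsWhitened.size() == N * C * F);
  assert(gradsIn.size() == actsWhitened.size());
  // Accumulate in double, playing the role of a wider partials type.
  std::vector<double> gammaAcc(C, 0.0), betaAcc(C, 0.0);
  for (std::size_t n = 0; n < N; ++n)
    for (std::size_t c = 0; c < C; ++c)
      for (std::size_t f = 0; f < F; ++f) {
        const std::size_t i = (n * C + c) * F + f;
        gammaAcc[c] += static_cast<double>(gradsIn[i]) * actsWhitened[i];
        betaAcc[c] += gradsIn[i];
      }
  std::vector<float> gammaDelta(gammaAcc.begin(), gammaAcc.end());
  std::vector<float> betaDelta(betaAcc.begin(), betaAcc.end());
  return {gammaDelta, betaDelta};
}
```

Both results are reductions over the batch and field dimensions, which is why a partials type wider than the input type can help accuracy.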
-
poplar::Tensor normGradients(poplar::Graph &graph, const poplar::Tensor &gradsIn, const poplar::Tensor &gamma, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})
Propagate the gradients through the normalisation layer.
- Parameters
graph – The graph to which the normalisation operation is added.
gradsIn – The gradient with respect to the output of this layer.
gamma – Multiplicative parameter used in the normalisation.
prog – A program sequence that the code to perform the normalisation will be appended to.
debugContext – Optional debug information.
- Returns
The gradient with respect to the input of this layer.
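Since the forward step scales each channel by gamma, propagating the gradient back through it amounts to scaling the incoming gradient by the same per-channel gamma. A minimal host-side sketch, assuming that forward form and a flattened [N][C][F] layout (`normGradientsRef` is a hypothetical name):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference: gradsOut = gradsIn * gamma[c].
// gradsIn is a flattened [N][C][F] array; gamma has length C.
std::vector<float> normGradientsRef(const std::vector<float> &gradsIn,
                                    const std::vector<float> &gamma,
                                    std::size_t N, std::size_t C,
                                    std::size_t F) {
  assert(gradsIn.size() == N * C * F);
  assert(gamma.size() == C);
  std::vector<float> out(gradsIn.size());
  for (std::size_t n = 0; n < N; ++n)
    for (std::size_t c = 0; c < C; ++c)
      for (std::size_t f = 0; f < F; ++f) {
        const std::size_t i = (n * C + c) * F + f;
        // beta is additive, so it does not appear in this gradient.
        out[i] = gradsIn[i] * gamma[c];
      }
  return out;
}
```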
-
poplar::Tensor normStatisticsGradients(poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, const poplar::Tensor &invStdDev, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {})
Propagate the gradients through the norm statistics layer.
The input to the layer is the output gradients from the normalisation layer. The whitened activations and the input gradients must have undergone a prior rearrangement such that the channel dimension has the same elements as invStdDev.
- Parameters
graph – The graph to which the normalisation operation is added.
actsWhitened – The forward-pass whitened activation inputs to this layer.
gradsIn – The gradient with respect to the output of this layer.
invStdDev – Inverse standard deviation from norm statistics.
prog – A program sequence that the code to perform the normalisation will be appended to.
debugContext – Optional debug information.
- Returns
The gradient with respect to the input of this layer.
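The standard batch-normalisation backward step through the statistics can be written per channel as invStdDev * (g - mean(g) - w * mean(g * w)), where g is the incoming gradient, w the whitened activations, and the means are taken over the batch and field dimensions. The sketch below implements that textbook formula on the host. It assumes the biased variance estimate and a flattened [N][C][F] layout, `normStatisticsGradientsRef` is a hypothetical name, and it is not claimed to match the library's implementation detail for detail.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference for the gradient through the statistics.
// actsWhitened and gradsIn are flattened [N][C][F]; invStdDev has
// length C.
std::vector<float>
normStatisticsGradientsRef(const std::vector<float> &actsWhitened,
                           const std::vector<float> &gradsIn,
                           const std::vector<float> &invStdDev,
                           std::size_t N, std::size_t C, std::size_t F) {
  assert(actsWhitened.size() == N * C * F);
  assert(gradsIn.size() == actsWhitened.size());
  assert(invStdDev.size() == C && N * F > 0);
  const double count = static_cast<double>(N * F);
  std::vector<float> out(gradsIn.size());
  for (std::size_t c = 0; c < C; ++c) {
    // Per-channel means of g and of g * w.
    double sumG = 0.0, sumGW = 0.0;
    for (std::size_t n = 0; n < N; ++n)
      for (std::size_t f = 0; f < F; ++f) {
        const std::size_t i = (n * C + c) * F + f;
        sumG += gradsIn[i];
        sumGW += static_cast<double>(gradsIn[i]) * actsWhitened[i];
      }
    const double meanG = sumG / count;
    const double meanGW = sumGW / count;
    for (std::size_t n = 0; n < N; ++n)
      for (std::size_t f = 0; f < F; ++f) {
        const std::size_t i = (n * C + c) * F + f;
        out[i] = static_cast<float>(
            invStdDev[c] *
            (gradsIn[i] - meanG - actsWhitened[i] * meanGW));
      }
  }
  return out;
}
```

A useful sanity check on this form is that the output gradients of each channel sum to zero whenever the whitened activations themselves have zero mean.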
-
poplar::Tensor distributedNormStatisticsGradients(poplar::Graph &replicatedGraph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, const poplar::Tensor &invStdDev, poplar::program::Sequence &prog, poplin::DistributedNormReduceCallback reduceCallback, unsigned normSize, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {})
Propagate the gradients through the norm statistics layer where equal sized batch elements are distributed over replicas.
Each replica gets the same number of batch elements (N), and norm gradients are computed over normSize batch elements. A callback does the required reduction over multiple replicas.
The input to the layer is the output gradients from the normalisation layer. The whitened activations and the input gradients must have undergone a prior rearrangement such that the channel dimension has the same elements as invStdDev.
- Parameters
replicatedGraph – The replicated graph to which the normalisation operation is added.
actsWhitened – The forward-pass whitened activation inputs to this layer.
gradsIn – The gradient with respect to the output of this layer.
invStdDev – Inverse standard deviation from norm statistics.
prog – A program sequence that the code to perform the normalisation will be appended to.
reduceCallback – A callback to perform an all-reduce of the statistics gradients across the replicas.
normSize – The batch size over which the norm is done.
debugContext – Optional debug information.
- Returns
The gradient with respect to the input of this layer.
-