BatchNorm

#include <popnn/BatchNorm.hpp>

Batch normalization operations.

namespace popnn

Functions used in neural networks.

namespace bn

Functions

std::pair<poplar::Tensor, poplar::Tensor> batchNormStatistics(poplar::Graph &graph, const poplar::Tensor acts, float eps, poplar::program::Sequence &prog, bool unbiasedVarEstimate, bool stableAlgo = false, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Estimate the mean and the inverse standard deviation of batched activations.
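The statistics this op estimates can be illustrated with a host-side sketch, computed per channel over the batch and field dimensions. `referenceBatchNormStatistics` below is a hypothetical helper for illustration only, not part of the popnn API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Host-side sketch (not popnn code) of the per-channel statistics the op
// estimates: the mean over the batch, and 1 / sqrt(variance + eps).
std::pair<double, double>
referenceBatchNormStatistics(const std::vector<double> &channelActs, float eps,
                             bool unbiasedVarEstimate) {
  const std::size_t n = channelActs.size();
  double mean = 0.0;
  for (double a : channelActs)
    mean += a;
  mean /= n;
  double var = 0.0;
  for (double a : channelActs)
    var += (a - mean) * (a - mean);
  var /= unbiasedVarEstimate ? (n - 1) : n;
  // eps is folded into the returned inverse standard deviation, so the
  // result is finite even when the variance is zero.
  return {mean, 1.0 / std::sqrt(var + eps)};
}
```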

std::pair<poplar::Tensor, poplar::Tensor> distributedBatchNormStatistics(poplar::Graph &replicatedGraph, const poplar::Tensor acts, float eps, poplar::program::Sequence &prog, bool unbiasedVarEstimate, poplin::DistributedNormReduceCallback reduceCallback, unsigned normBatchSize, bool stableAlgo = false, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute the batch normalisation statistics for a part of the activations tensor where the normBatchSize batch elements are distributed over multiple replicas.

Each replica gets an equal-sized batch of N elements. A callback does the required reduction over multiple replicas. The activations tensor is of shape [N][C][..F..]. The mean and inverse standard deviation are computed over dimensions {[N] [..F..]}, and vectors of length C are returned as estimates.

Parameters
  • replicatedGraph – The replicated graph in which the computation is performed.

  • acts – The activation with shape [N][C][..F..] where:

    • N is the batch size

    • C is the number of channels

    • ..F.. are the dimensions of an N-dimensional field.

  • eps – The epsilon value added to the variance to avoid division by zero.

  • prog – A program sequence that the code to perform the normalisation will be appended to.

  • unbiasedVarEstimate – If true an unbiased variance estimate will be computed.

  • stableAlgo – If true, the mean is computed first and subtracted from the activations before the variance is computed. The implementation with this flag set to true is slower but numerically more stable.

  • partialsType – Poplar type used for partials.

  • reduceCallback – Callback to perform all-reduce over normBatchSize batch elements.

  • normBatchSize – Number of batch elements over which statistics are estimated.

  • debugContext – Optional debug information.

Returns

A pair of tensors, each a vector of length C, containing the mean and the inverse standard deviation.
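The distributed estimation can be sketched on the host as follows: each replica computes partial sums over its local batch, and the reduceCallback all-reduces those partials before the statistics over all normBatchSize elements are finalised. In this illustration-only sketch, summing across replicas stands in for the all-reduce.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Host-side sketch (not popnn code) of distributed statistics: per-replica
// partial sums of x and x^2 are combined (modelling the all-reduce done by
// reduceCallback), then the mean and 1/sqrt(var + eps) are finalised over
// all normBatchSize elements.
std::pair<double, double> referenceDistributedStatistics(
    const std::vector<std::vector<double>> &perReplicaActs, float eps) {
  double sum = 0.0, sumSq = 0.0;
  std::size_t normBatchSize = 0;
  for (const auto &replicaActs : perReplicaActs) {
    for (double a : replicaActs) {
      sum += a;       // per-replica partial sum
      sumSq += a * a; // per-replica partial sum of squares
    }
    normBatchSize += replicaActs.size();
  }
  const double mean = sum / normBatchSize;
  const double var = sumSq / normBatchSize - mean * mean;
  return {mean, 1.0 / std::sqrt(var + eps)};
}
```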

poplar::Tensor batchNormWhiten(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Whiten activations given the mean and inverse standard deviation.
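Whitening amounts to the element-wise transform w = (x - mean) * invStdDev per channel, as in this host-side sketch (illustration only, not popnn code):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Host-side sketch (not popnn code) of whitening one channel:
// w[i] = (x[i] - mean) * invStdDev.
std::vector<double> referenceWhiten(const std::vector<double> &acts,
                                    double mean, double invStdDev) {
  std::vector<double> whitened;
  whitened.reserve(acts.size());
  for (double a : acts)
    whitened.push_back((a - mean) * invStdDev);
  return whitened;
}
```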

std::pair<poplar::Tensor, poplar::Tensor> batchNormalise(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gamma, const poplar::Tensor &beta, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Batch normalise activations given the mean, inverse standard deviation and batch norm parameters.

The result is two tensors:

  • normalised activations

  • whitened activations

poplar::Tensor batchNormalise(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &combinedMultiplicand, const poplar::Tensor &addend, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Computes the output of batch normalisation given:

  • combinedMultiplicand = gamma / stdDev

  • addend = beta - gamma * mean / stdDev
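Both batchNormalise overloads compute the same affine transform of the input; the combinedMultiplicand/addend form just pre-folds the statistics into the parameters. A host-side sketch (illustration only, not popnn code), writing invStdDev for 1/stdDev:

```cpp
#include <cassert>
#include <cmath>

// Host-side sketch (not popnn code) showing that the two batchNormalise
// overloads compute the same affine transform of the input.
double normaliseFromStatistics(double x, double gamma, double beta,
                               double mean, double invStdDev) {
  // y = gamma * (x - mean) * invStdDev + beta
  return gamma * (x - mean) * invStdDev + beta;
}

double normaliseCombined(double x, double combinedMultiplicand,
                         double addend) {
  // y = combinedMultiplicand * x + addend, with
  //   combinedMultiplicand = gamma * invStdDev
  //   addend = beta - gamma * mean * invStdDev
  return combinedMultiplicand * x + addend;
}
```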

std::pair<poplar::Tensor, poplar::Tensor> batchNormParamGradients(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gradsIn, const poplar::Tensor &mean, const poplar::Tensor &iStdDev, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients with respect to parameters required for parameter update.

std::pair<poplar::Tensor, poplar::Tensor> batchNormParamGradients(poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients with respect to parameters required for parameter update.
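For one channel, the parameter gradients reduce over the batch and field dimensions of the whitened activations and incoming gradients, as in this host-side sketch (a standard batch norm identity used for illustration; not popnn code):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Host-side sketch (not popnn code) of the parameter gradients for one
// channel:
//   dGamma = sum_i gradsIn[i] * actsWhitened[i]
//   dBeta  = sum_i gradsIn[i]
std::pair<double, double>
referenceParamGradients(const std::vector<double> &actsWhitened,
                        const std::vector<double> &gradsIn) {
  double dGamma = 0.0, dBeta = 0.0;
  for (std::size_t i = 0; i < gradsIn.size(); ++i) {
    dGamma += gradsIn[i] * actsWhitened[i];
    dBeta += gradsIn[i];
  }
  return {dGamma, dBeta};
}
```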

poplar::Tensor batchNormGradients(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gradsIn, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, const poplar::Tensor &gamma, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients with respect to input activations for the batch norm layer.

Gradients are propagated through the complete layer including statistics computation.

poplar::Tensor batchNormGradients(poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, const poplar::Tensor &invStdDev, const poplar::Tensor &gamma, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients with respect to input activations for the batch norm layer.

Gradients are propagated through the complete layer including statistics computation.
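Propagating gradients through the statistics follows the standard batch norm backward pass; a host-side sketch for one channel (illustration only, not popnn code), writing g for gradsIn and w for actsWhitened:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Host-side sketch (not popnn code) of the input gradient for one channel
// when gradients are propagated through the statistics:
//   dx[i] = gamma * invStdDev * (g[i] - mean(g) - w[i] * mean(g * w))
std::vector<double>
referenceBatchNormGradients(const std::vector<double> &actsWhitened,
                            const std::vector<double> &gradsIn,
                            double invStdDev, double gamma) {
  const std::size_t n = gradsIn.size();
  double meanG = 0.0, meanGW = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    meanG += gradsIn[i];
    meanGW += gradsIn[i] * actsWhitened[i];
  }
  meanG /= n;
  meanGW /= n;
  std::vector<double> dx(n);
  for (std::size_t i = 0; i < n; ++i)
    dx[i] = gamma * invStdDev *
            (gradsIn[i] - meanG - actsWhitened[i] * meanGW);
  return dx;
}
```

A useful sanity property: a constant incoming gradient yields a zero input gradient, because the mean subtraction in the forward pass absorbs any constant shift.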

poplar::Tensor distributedBatchNormGradients(poplar::Graph &replicatedGraph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, const poplar::Tensor &invStdDev, const poplar::Tensor &gamma, poplar::program::Sequence &prog, poplin::DistributedNormReduceCallback reduceCallback, unsigned normBatchSize, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Propagate the gradients through the batch norm layer where equal-sized batch elements are distributed over replicas to effectively compute the batch norm over normBatchSize elements.

Each replica gets the same number of batch elements (N), with normBatchSize = N * number-of-devices.

A callback does the required reduction over the replicas the norm is spread over.

The input to the layer is the output gradients from the normalisation layer. The whitened activations and the input gradients must have undergone a prior rearrangement such that the channel dimension has the same number of elements as invStdDev.

Parameters
  • replicatedGraph – The replicated graph to which the normalisation operation is added.

  • actsWhitened – Forward whitened activations.

  • gradsIn – Input gradients to the normalisation layer.

  • invStdDev – Inverse standard deviation from norm statistics.

  • gamma – Parameter gamma.

  • prog – A program sequence that the code to perform the normalisation will be appended to.

  • reduceCallback – A callback to perform all-reduce of the statistics gradients.

  • normBatchSize – The batch size over which the norm is done.

  • debugContext – Optional debug information.

poplar::Tensor distributedBatchNormGradients(poplar::Graph &replicatedGraph, const poplar::Tensor &acts, const poplar::Tensor &gradsIn, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, const poplar::Tensor &gamma, poplar::program::Sequence &prog, poplin::DistributedNormReduceCallback reduceCallback, unsigned normBatchSize, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Propagate the gradients through the batch norm layer where equal-sized batch elements are distributed over replicas to effectively compute the batch norm over normBatchSize elements.

Each replica gets the same number of batch elements (N), with normBatchSize = N * number-of-devices.

A callback does the required reduction over the replicas the norm is spread over.

The input to the layer is the output gradients from the normalisation layer. The activations and the input gradients must have undergone a prior rearrangement such that the channel dimension has the same elements as invStdDev. The activations are whitened within the function by applying the mean and invStdDev.

Parameters
  • replicatedGraph – The replicated graph to which the normalisation operation is added.

  • acts – Inputs to the batch norm layer.

  • gradsIn – Input gradients to the normalisation layer.

  • mean – Estimated mean.

  • invStdDev – Inverse standard deviation from norm statistics.

  • gamma – Parameter gamma.

  • prog – A program sequence that the code to perform the normalisation will be appended to.

  • reduceCallback – A callback to perform all-reduce of the statistics gradients.

  • normBatchSize – The batch size over which the norm is done.

  • debugContext – Optional debug information.

void batchNormParamUpdate(poplar::Graph &graph, const poplar::Tensor &gammaDelta, const poplar::Tensor &betaDelta, float scale, poplar::Tensor &gamma, poplar::Tensor &beta, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update the batch norm parameters gamma and beta given the parameter deltas and a scale factor.

void batchNormParamUpdate(poplar::Graph &graph, const poplar::Tensor &gammaDelta, const poplar::Tensor &betaDelta, const poplar::Tensor &scale, poplar::Tensor &gamma, poplar::Tensor &beta, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

As above, but with the scale factor supplied as a tensor rather than a float.
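The effect of the parameter update can be sketched on the host as follows, assuming the update applied is gamma += scale * gammaDelta and beta += scale * betaDelta (scale is typically a negative learning rate); this is an illustration-only assumption, not confirmed popnn semantics:

```cpp
#include <cassert>
#include <cmath>

// Host-side sketch (not popnn code) of the assumed effect of
// batchNormParamUpdate:
//   gamma += scale * gammaDelta;  beta += scale * betaDelta
void referenceParamUpdate(double gammaDelta, double betaDelta, double scale,
                          double &gamma, double &beta) {
  gamma += scale * gammaDelta;
  beta += scale * betaDelta;
}
```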