Poplar and PopLibs
BatchNorm.hpp File Reference

Batch normalization operations. More...

#include "poplar/DebugContext.hpp"
#include "poplar/Program.hpp"
#include "poplar/Tensor.hpp"
#include "poplin/Norms.hpp"
#include <utility>


Namespaces

namespace  popnn
 Functions used in neural networks.
 

Functions

std::pair< poplar::Tensor, poplar::Tensor > popnn::bn::batchNormStatistics (poplar::Graph &graph, const poplar::Tensor acts, float eps, poplar::program::Sequence &prog, bool unbiasedVarEstimate, bool stableAlgo=false, const poplar::Type &partialsType=poplar::FLOAT, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 Estimate mean and inverse of standard deviation of batched activations. More...
 
std::pair< poplar::Tensor, poplar::Tensor > popnn::bn::distributedBatchNormStatistics (poplar::Graph &replicatedGraph, const poplar::Tensor acts, float eps, poplar::program::Sequence &prog, bool unbiasedVarEstimate, poplin::DistributedNormReduceCallback reduceCallback, unsigned normBatchSize, bool stableAlgo=false, const poplar::Type &partialsType=poplar::FLOAT, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 Compute the batch normalisation statistics for a part of the activations tensor. More...
 
poplar::Tensor popnn::bn::batchNormWhiten (poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 Whiten activations given the mean and standard deviation. More...
 
std::pair< poplar::Tensor, poplar::Tensor > popnn::bn::batchNormalise (poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gamma, const poplar::Tensor &beta, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 Batch normalise the activations using the given mean, standard deviation and batch norm parameters. More...
 
poplar::Tensor popnn::bn::batchNormalise (poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &combinedMultiplicand, const poplar::Tensor &addend, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 Computes the batch normalisation from a combined multiplicand and addend. More...
 
std::pair< poplar::Tensor, poplar::Tensor > popnn::bn::batchNormParamGradients (poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gradsIn, const poplar::Tensor &mean, const poplar::Tensor &iStdDev, poplar::program::Sequence &prog, const poplar::Type &partialsType=poplar::FLOAT, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 Compute gradients with respect to parameters required for parameter update. More...
 
std::pair< poplar::Tensor, poplar::Tensor > popnn::bn::batchNormParamGradients (poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, poplar::program::Sequence &prog, const poplar::Type &partialsType=poplar::FLOAT, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 Compute gradients with respect to parameters required for parameter update. More...
 
poplar::Tensor popnn::bn::batchNormGradients (poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gradsIn, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, const poplar::Tensor &gamma, poplar::program::Sequence &prog, const poplar::Type &partialsType=poplar::FLOAT, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 Compute gradients with respect to input activations for the batch norm layer. More...
 
poplar::Tensor popnn::bn::batchNormGradients (poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, const poplar::Tensor &invStdDev, const poplar::Tensor &gamma, poplar::program::Sequence &prog, const poplar::Type &partialsType=poplar::FLOAT, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 Compute gradients with respect to input activations for the batch norm layer. More...
 
poplar::Tensor popnn::bn::distributedBatchNormGradients (poplar::Graph &replicatedGraph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, const poplar::Tensor &invStdDev, const poplar::Tensor &gamma, poplar::program::Sequence &prog, poplin::DistributedNormReduceCallback reduceCallback, unsigned normBatchSize, const poplar::Type &partialsType=poplar::FLOAT, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 Propagate the gradients through the batch norm layer where equal-sized batch elements are distributed over replicas to effectively compute the batch norm over normBatchSize elements. More...
 
poplar::Tensor popnn::bn::distributedBatchNormGradients (poplar::Graph &replicatedGraph, const poplar::Tensor &acts, const poplar::Tensor &gradsIn, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, const poplar::Tensor &gamma, poplar::program::Sequence &prog, poplin::DistributedNormReduceCallback reduceCallback, unsigned normBatchSize, const poplar::Type &partialsType=poplar::FLOAT, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 Propagate the gradients through the batch norm layer where equal-sized batch elements are distributed over replicas to effectively compute the batch norm over normBatchSize elements. More...
 
void popnn::bn::batchNormParamUpdate (poplar::Graph &graph, const poplar::Tensor &gammaDelta, const poplar::Tensor &betaDelta, float scale, poplar::Tensor &gamma, poplar::Tensor &beta, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 Update the parameters for the batch norm layer. More...
 
void popnn::bn::batchNormParamUpdate (poplar::Graph &graph, const poplar::Tensor &gammaDelta, const poplar::Tensor &betaDelta, const poplar::Tensor &scale, poplar::Tensor &gamma, poplar::Tensor &beta, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext={}, const poplar::OptionFlags &options={})
 Update parameters for the batch norm layer. More...
 

Detailed Description

Batch normalization operations.

Function Documentation

◆ batchNormalise() [1/2]

poplar::Tensor popnn::bn::batchNormalise ( poplar::Graph &graph,
const poplar::Tensor &acts,
const poplar::Tensor &combinedMultiplicand,
const poplar::Tensor &addend,
poplar::program::Sequence &prog,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {}
)

Computes the batch normalisation from a combined multiplicand and addend.

Parameters
graph - The graph that the normalisation operation is added to.
acts - The input activations that will be normalised using the combined multiplicand and addend.
combinedMultiplicand - Equal to gamma * invStdDev.
addend - Equal to beta - (gamma * mean * invStdDev).
prog - The program sequence to add the operation to.
debugContext - Optional debug information.
options - Batch normalisation options. Presently, there are no options that affect the operation of batch norm.
Returns
A new tensor with the normalised activations.

◆ batchNormalise() [2/2]

std::pair< poplar::Tensor, poplar::Tensor > popnn::bn::batchNormalise ( poplar::Graph &graph,
const poplar::Tensor &acts,
const poplar::Tensor &gamma,
const poplar::Tensor &beta,
const poplar::Tensor &mean,
const poplar::Tensor &invStdDev,
poplar::program::Sequence &prog,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {}
)

Batch normalise the activations using the given mean, standard deviation and batch norm parameters.

Parameters
graph - The graph that the normalisation operation is added to.
acts - The input activations to whiten and normalise, with shape [B][C][..F..] where:
  • B is the batch size
  • C is the number of channels
  • ..F.. are the dimensions of an N-dimensional field.
gamma - The gamma weights to multiply by when normalising the whitened activations.
beta - The beta weights to add when normalising the whitened activations.
mean - The mean to subtract when whitening the activations.
invStdDev - The inverse standard deviation to multiply by when whitening the activations.
prog - The program sequence to add the operation to.
debugContext - Optional debug information.
options - Batch normalisation options. Presently, there are no options that affect the operation of batch norm.
Returns
Two tensors containing:
  • normalised activations
  • whitened activations

◆ batchNormGradients() [1/2]

poplar::Tensor popnn::bn::batchNormGradients ( poplar::Graph &graph,
const poplar::Tensor &acts,
const poplar::Tensor &gradsIn,
const poplar::Tensor &mean,
const poplar::Tensor &invStdDev,
const poplar::Tensor &gamma,
poplar::program::Sequence &prog,
const poplar::Type &partialsType = poplar::FLOAT,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {}
)

Compute gradients with respect to input activations for the batch norm layer.

Gradients are propagated through the complete layer including statistics computation.

Parameters
graph - The graph that the normalisation operation is added to.
acts - The forward-pass activation inputs to this layer.
gradsIn - The gradient with respect to the output of this layer.
mean - The mean of the acts tensor, typically calculated using batchNormStatistics().
invStdDev - The inverse standard deviation of the acts tensor, typically calculated using batchNormStatistics().
gamma - The gamma weights to multiply by when normalising the whitened activations.
prog - The program sequence to add the operation to.
partialsType - Poplar type used for partials. If the type specified is smaller than the input/output type then partialsType is ignored and the input/output type is used instead.
debugContext - Optional debug information.
options - Batch normalisation options. See batchNormalise().
Returns
A tensor containing the gradients with respect to the input activations for this layer.

◆ batchNormGradients() [2/2]

poplar::Tensor popnn::bn::batchNormGradients ( poplar::Graph &graph,
const poplar::Tensor &actsWhitened,
const poplar::Tensor &gradsIn,
const poplar::Tensor &invStdDev,
const poplar::Tensor &gamma,
poplar::program::Sequence &prog,
const poplar::Type &partialsType = poplar::FLOAT,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {}
)

Compute gradients with respect to input activations for the batch norm layer.

Gradients are propagated through the complete layer including statistics computation.

Parameters
graph - The graph that the normalisation operation is added to.
actsWhitened - The forward-pass whitened activation inputs to this layer.
gradsIn - The gradient with respect to the output of this layer.
invStdDev - The inverse standard deviation to multiply by when whitening the activations.
gamma - The gamma weights to multiply by when normalising the whitened activations.
prog - The program sequence to add the operation to.
partialsType - Poplar type used for partials. If the type specified is smaller than the input/output type then partialsType is ignored and the input/output type is used instead.
debugContext - Optional debug information.
options - Batch normalisation options. See batchNormalise().
Returns
A tensor containing the gradients with respect to the input activations for this layer.

◆ batchNormParamGradients() [1/2]

std::pair< poplar::Tensor, poplar::Tensor > popnn::bn::batchNormParamGradients ( poplar::Graph &graph,
const poplar::Tensor &acts,
const poplar::Tensor &gradsIn,
const poplar::Tensor &mean,
const poplar::Tensor &iStdDev,
poplar::program::Sequence &prog,
const poplar::Type &partialsType = poplar::FLOAT,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {}
)

Compute gradients with respect to parameters required for parameter update.

Parameters
graph - The graph that the normalisation operation is added to.
acts - The forward-pass activation inputs to this layer.
gradsIn - The gradient with respect to the output of this layer.
mean - The mean of the acts tensor, typically calculated using batchNormStatistics().
iStdDev - The inverse standard deviation of the acts tensor, typically calculated using batchNormStatistics().
prog - The program sequence to add the operation to.
partialsType - Poplar type used for partials. If the type specified is smaller than the input/output type then partialsType is ignored and the input/output type is used instead.
debugContext - Optional debug information.
options - Batch normalisation options. See batchNormalise().
Returns
A pair of tensors, gammaDelta and betaDelta, which are the gradients with respect to gamma and beta.

◆ batchNormParamGradients() [2/2]

std::pair< poplar::Tensor, poplar::Tensor > popnn::bn::batchNormParamGradients ( poplar::Graph &graph,
const poplar::Tensor &actsWhitened,
const poplar::Tensor &gradsIn,
poplar::program::Sequence &prog,
const poplar::Type &partialsType = poplar::FLOAT,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {}
)

Compute gradients with respect to parameters required for parameter update.

Parameters
graph - The graph that the normalisation operation is added to.
actsWhitened - The forward-pass whitened activation inputs to this layer.
gradsIn - The gradient with respect to the output of this layer.
prog - The program sequence to add the operation to.
partialsType - Poplar type used for partials. If the type specified is smaller than the input/output type then partialsType is ignored and the input/output type is used instead.
debugContext - Optional debug information.
options - Batch normalisation options. See batchNormalise().
Returns
A pair of tensors, gammaDelta and betaDelta, which are the gradients with respect to gamma and beta.

◆ batchNormParamUpdate() [1/2]

void popnn::bn::batchNormParamUpdate ( poplar::Graph &graph,
const poplar::Tensor &gammaDelta,
const poplar::Tensor &betaDelta,
const poplar::Tensor &scale,
poplar::Tensor &gamma,
poplar::Tensor &beta,
poplar::program::Sequence &prog,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {}
)

Update parameters for the batch norm layer.

Gradients are propagated through the complete layer including statistics computation.

The gamma and beta parameters are updated as follows:

  • gamma += gammaDelta * scale
  • beta += betaDelta * scale

scale is a tensor and therefore variable.

Parameters
graph - The graph that the normalisation operation is added to.
gammaDelta - Value used to update gamma.
betaDelta - Value used to update beta.
scale - Scale factor for gammaDelta and betaDelta.
gamma - The gamma weights to multiply by when normalising the activations.
beta - The beta weights to add when normalising the activations.
prog - The program sequence to add the operation to.
debugContext - Optional debug information.
options - Batch normalisation options. See batchNormalise().

◆ batchNormParamUpdate() [2/2]

void popnn::bn::batchNormParamUpdate ( poplar::Graph &graph,
const poplar::Tensor &gammaDelta,
const poplar::Tensor &betaDelta,
float scale,
poplar::Tensor &gamma,
poplar::Tensor &beta,
poplar::program::Sequence &prog,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {}
)

Update the parameters for the batch norm layer.

Gradients are propagated through the complete layer including statistics computation.

The gamma and beta parameters are updated as follows:

  • gamma += gammaDelta * scale
  • beta += betaDelta * scale

scale is a float and therefore constant.

Parameters
graph - The graph that the normalisation operation is added to.
gammaDelta - Value used to update gamma.
betaDelta - Value used to update beta.
scale - Scale factor for gammaDelta and betaDelta.
gamma - The gamma weights to multiply by when normalising the activations.
beta - The beta weights to add when normalising the activations.
prog - The program sequence to add the operation to.
debugContext - Optional debug information.
options - Batch normalisation options. See batchNormalise().

◆ batchNormStatistics()

std::pair< poplar::Tensor, poplar::Tensor > popnn::bn::batchNormStatistics ( poplar::Graph &graph,
const poplar::Tensor acts,
float eps,
poplar::program::Sequence &prog,
bool unbiasedVarEstimate,
bool stableAlgo = false,
const poplar::Type &partialsType = poplar::FLOAT,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {}
)

Estimate mean and inverse of standard deviation of batched activations.

Parameters
graph - The graph that the normalisation operation is added to.
acts - The activations for which the mean and variance are estimated.
eps - The epsilon value added to the variance to avoid division by zero.
prog - The program sequence to add the operation to.
unbiasedVarEstimate - If true, an unbiased variance estimate will be computed.
stableAlgo - If true, computes the mean first and subtracts it from the activations before computing the variance. The implementation with this flag set to true is slower than when set to false.
partialsType - Poplar type used for partials. If the type specified is smaller than the input/output type then partialsType is ignored and the input/output type is used instead.
debugContext - Optional debug information.
options - Batch normalisation options. See batchNormalise().
Returns
A vector pair with the mean and inverse standard deviation.

◆ batchNormWhiten()

poplar::Tensor popnn::bn::batchNormWhiten ( poplar::Graph &graph,
const poplar::Tensor &acts,
const poplar::Tensor &mean,
const poplar::Tensor &invStdDev,
poplar::program::Sequence &prog,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {}
)

Whiten activations given the mean and standard deviation.

Parameters
graph - The graph that the normalisation operation is added to.
acts - The input activations that will be whitened.
mean - The previously calculated mean to subtract from the activations. Typically calculated using batchNormStatistics().
invStdDev - The previously calculated inverse standard deviation to multiply the activations by. Typically calculated using batchNormStatistics().
prog - The program sequence to add the operation to.
debugContext - Optional debug information.
options - Batch normalisation options. See batchNormalise().
Returns
A new tensor with the whitened activations.

◆ distributedBatchNormGradients() [1/2]

poplar::Tensor popnn::bn::distributedBatchNormGradients ( poplar::Graph &replicatedGraph,
const poplar::Tensor &acts,
const poplar::Tensor &gradsIn,
const poplar::Tensor &mean,
const poplar::Tensor &invStdDev,
const poplar::Tensor &gamma,
poplar::program::Sequence &prog,
poplin::DistributedNormReduceCallback reduceCallback,
unsigned normBatchSize,
const poplar::Type &partialsType = poplar::FLOAT,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {}
)

Propagate the gradients through the batch norm layer where equal-sized batch elements are distributed over replicas to effectively compute the batch norm over normBatchSize elements.

Each replica gets the same number of batch elements (N), with normBatchSize = N * number-of-devices.

A callback does the required reduction over the replicas the norm is spread over.

The input to the layer is the output gradients from the normalisation layer. The activations and the input gradients must have undergone a prior rearrangement such that the channel dimension has the same elements as invStdDev. The activations are whitened within the function by applying the mean and invStdDev.

Parameters
replicatedGraph - The replicated graph to which the normalisation operation is added.
acts - The forward-pass activation inputs to this layer.
gradsIn - The gradient with respect to the output of this layer.
mean - The mean of the acts tensor, typically calculated using batchNormStatistics().
invStdDev - The inverse standard deviation of the acts tensor, typically calculated using batchNormStatistics().
gamma - The gamma weights to multiply by when normalising the whitened activations.
prog - A program sequence that the code to perform the normalisation will be appended to.
reduceCallback - A callback to perform all-reduce of the statistics gradients.
normBatchSize - The batch size over which the norm is done.
partialsType - Poplar type used for partials. If the type specified is smaller than the input/output type then partialsType is ignored and the input/output type is used instead.
debugContext - Optional debug information.
Returns
A tensor containing the gradients with respect to the input activations for this layer.

◆ distributedBatchNormGradients() [2/2]

poplar::Tensor popnn::bn::distributedBatchNormGradients ( poplar::Graph &replicatedGraph,
const poplar::Tensor &actsWhitened,
const poplar::Tensor &gradsIn,
const poplar::Tensor &invStdDev,
const poplar::Tensor &gamma,
poplar::program::Sequence &prog,
poplin::DistributedNormReduceCallback reduceCallback,
unsigned normBatchSize,
const poplar::Type &partialsType = poplar::FLOAT,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {}
)

Propagate the gradients through the batch norm layer where equal-sized batch elements are distributed over replicas to effectively compute the batch norm over normBatchSize elements.

Each replica gets the same number of batch elements (N), with normBatchSize = N * number-of-devices.

A callback does the required reduction over the replicas the norm is spread over.

The input to the layer is the output gradients from the normalisation layer. The whitened activations and the input gradients must have undergone a prior rearrangement such that the channel dimension has the same elements as invStdDev.

Parameters
replicatedGraph - The replicated graph to which the normalisation operation is added.
actsWhitened - The forward-pass whitened activation inputs to this layer.
gradsIn - The gradient with respect to the output of this layer.
invStdDev - The inverse standard deviation of the acts tensor, typically calculated using batchNormStatistics().
gamma - The gamma weights to multiply by when normalising the whitened activations.
prog - A program sequence that the code to perform the normalisation will be appended to.
reduceCallback - A callback to perform all-reduce of the statistics gradients.
normBatchSize - The batch size over which the norm is done.
partialsType - Poplar type used for partials. If the type specified is smaller than the input/output type then partialsType is ignored and the input/output type is used instead.
debugContext - Optional debug information.
Returns
A tensor containing the gradients with respect to the input activations for this layer.

◆ distributedBatchNormStatistics()

std::pair< poplar::Tensor, poplar::Tensor > popnn::bn::distributedBatchNormStatistics ( poplar::Graph &replicatedGraph,
const poplar::Tensor acts,
float eps,
poplar::program::Sequence &prog,
bool unbiasedVarEstimate,
poplin::DistributedNormReduceCallback reduceCallback,
unsigned normBatchSize,
bool stableAlgo = false,
const poplar::Type &partialsType = poplar::FLOAT,
const poplar::DebugContext &debugContext = {},
const poplar::OptionFlags &options = {}
)

Compute the batch normalisation statistics for a part of the activations tensor.

normBatchSize batch elements are distributed over multiple replicas. Each replica gets equal-sized batches (B). A callback does the required reduction over multiple replicas. The activations tensor is of shape [B][C][..F..]. The mean and inverse standard deviation are computed over dimensions {[B] [..F..]} and vectors of length C are returned as estimates.

Parameters
replicatedGraph - The replicated graph in which the computation is performed.
acts - The activation tensor with shape [B][C][..F..] where:
  • B is the batch size
  • C is the number of channels
  • ..F.. are the dimensions of an N-dimensional field.
eps - The epsilon value added to the variance to avoid division by zero.
prog - A program sequence that the code to perform the normalisation will be appended to.
unbiasedVarEstimate - If true, an unbiased variance estimate will be computed.
stableAlgo - If true, computes the mean first and subtracts it from the activations before computing the variance. The implementation with this flag set to true is slower than when set to false.
partialsType - Poplar type used for partials. If the type specified is smaller than the input/output type then partialsType is ignored and the input/output type is used instead.
reduceCallback - Callback to perform all-reduce over normBatchSize batch elements.
normBatchSize - Number of batch elements over which statistics are estimated.
debugContext - Optional debug information.
Returns
A vector pair with the mean and inverse standard deviation.