BatchNorm
#include <popnn/BatchNorm.hpp>
Batch normalization operations.
-
namespace popnn
Functions used in neural networks.
-
namespace bn
Functions
-
std::pair<poplar::Tensor, poplar::Tensor> batchNormStatistics(poplar::Graph &graph, const poplar::Tensor acts, float eps, poplar::program::Sequence &prog, bool unbiasedVarEstimate, bool stableAlgo = false, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
Estimate mean and inverse of standard deviation of batched activations.
-
std::pair<poplar::Tensor, poplar::Tensor> distributedBatchNormStatistics(poplar::Graph &replicatedGraph, const poplar::Tensor acts, float eps, poplar::program::Sequence &prog, bool unbiasedVarEstimate, poplin::DistributedNormReduceCallback reduceCallback, unsigned normBatchSize, bool stableAlgo = false, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
Compute the batch normalisation statistics for a part of the activations tensor where the normBatchSize batch elements are distributed over multiple replicas. Each replica gets equal-sized batches (N). A callback does the required reduction over multiple replicas. The activations tensor is of shape [N][C][..F..]. The mean and inverse standard deviation are computed over dimensions {[N] [..F..]}, and vectors of length C are returned as estimates.
- Parameters
replicatedGraph – The replicated graph in which the computation is performed.
acts – The activations with shape [N][C][..F..] where:
- N is the batch size
- C is the number of channels
- ..F.. are the dimensions of an N-dimensional field.
eps – The epsilon value added to the variance to avoid division by zero.
prog – A program sequence that the code to perform the normalisation will be appended to.
unbiasedVarEstimate – If true an unbiased variance estimate will be computed.
stableAlgo – If true, the mean is computed first and subtracted from the activations before the variance is computed. This is more numerically stable but slower than the default.
partialsType – Poplar type used for partials.
reduceCallback – Callback to perform the all-reduce over normBatchSize batch elements.
normBatchSize – Number of batch elements over which statistics are estimated.
debugContext – Optional debug information.
- Returns
A vector pair with mean and inverse standard deviation.
-
poplar::Tensor batchNormWhiten(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
Whiten activations given the mean and standard deviation.
-
std::pair<poplar::Tensor, poplar::Tensor> batchNormalise(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gamma, const poplar::Tensor &beta, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
Batch normalise activations given mean, standard deviation and batch norm parameters.
The result is two tensors:
- normalised activations
- whitened activations
-
poplar::Tensor batchNormalise(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &combinedMultiplicand, const poplar::Tensor &addend, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
Computes the output of batch normalisation given:
- combinedMultiplicand = gamma / stdDev
- addend = beta - gamma * mean / stdDev
-
std::pair<poplar::Tensor, poplar::Tensor> batchNormParamGradients(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gradsIn, const poplar::Tensor &mean, const poplar::Tensor &iStdDev, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
Compute gradients with respect to parameters required for parameter update.
-
std::pair<poplar::Tensor, poplar::Tensor> batchNormParamGradients(poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
Compute gradients with respect to parameters required for parameter update.
-
poplar::Tensor batchNormGradients(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gradsIn, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, const poplar::Tensor &gamma, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
Compute gradients with respect to input activations for the batch norm layer.
Gradients are propagated through the complete layer including statistics computation.
-
poplar::Tensor batchNormGradients(poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, const poplar::Tensor &invStdDev, const poplar::Tensor &gamma, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
Compute gradients with respect to input activations for the batch norm layer.
Gradients are propagated through the complete layer including statistics computation.
-
poplar::Tensor distributedBatchNormGradients(poplar::Graph &replicatedGraph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, const poplar::Tensor &invStdDev, const poplar::Tensor &gamma, poplar::program::Sequence &prog, poplin::DistributedNormReduceCallback reduceCallback, unsigned normBatchSize, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
Propagate the gradients through the batch norm layer, where equal-sized batch elements are distributed over replicas to effectively compute the batch norm over normBatchSize elements. Each replica gets the same number of batches (N), with normBatchSize = N * number-of-devices. A callback does the required reduction over the replicas the norm is spread over.
The input to the layer is the output gradients from the normalisation layer. The whitened activations and the input gradients must have undergone a prior rearrangement such that the channel dimension is the same as invStdDev.
- Parameters
replicatedGraph – The replicated graph to which the normalisation operation is added.
actsWhitened – Forward whitened activations.
gradsIn – Input gradients to the normalisation layer.
invStdDev – Inverse standard deviation from norm statistics.
gamma – Parameter gamma.
prog – A program sequence that the code to perform the normalisation will be appended to.
reduceCallback – A callback to perform all-reduce of the statistics gradients.
normBatchSize – The batch size over which the norm is done.
debugContext – Optional debug information.
-
poplar::Tensor distributedBatchNormGradients(poplar::Graph &replicatedGraph, const poplar::Tensor &acts, const poplar::Tensor &gradsIn, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, const poplar::Tensor &gamma, poplar::program::Sequence &prog, poplin::DistributedNormReduceCallback reduceCallback, unsigned normBatchSize, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
Propagate the gradients through the batch norm layer, where equal-sized batch elements are distributed over replicas to effectively compute the batch norm over normBatchSize elements. Each replica gets the same number of batches (N), with normBatchSize = N * number-of-devices. A callback does the required reduction over the replicas the norm is spread over.
The input to the layer is the output gradients from the normalisation layer. The activations and the input gradients must have undergone a prior rearrangement such that the channel dimension has the same elements as invStdDev. The activations are whitened within the function by applying the mean and invStdDev.
- Parameters
replicatedGraph – The replicated graph to which the normalisation operation is added.
acts – Inputs to the batch norm layer.
gradsIn – Input gradients to the normalisation layer.
mean – Estimated mean.
invStdDev – Inverse standard deviation from norm statistics.
gamma – Parameter gamma.
prog – A program sequence that the code to perform the normalisation will be appended to.
reduceCallback – A callback to perform all-reduce of the statistics gradients.
normBatchSize – The batch size over which the norm is done.
debugContext – Optional debug information.
-
void batchNormParamUpdate(poplar::Graph &graph, const poplar::Tensor &gammaDelta, const poplar::Tensor &betaDelta, float scale, poplar::Tensor &gamma, poplar::Tensor &beta, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
-
void batchNormParamUpdate(poplar::Graph &graph, const poplar::Tensor &gammaDelta, const poplar::Tensor &betaDelta, const poplar::Tensor &scale, poplar::Tensor &gamma, poplar::Tensor &beta, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})
-