GroupNorm

#include <popnn/GroupNorm.hpp>

Group normalization operations.

namespace popnn

Functions used in neural networks.

namespace gn

Functions

std::pair<poplar::Tensor, poplar::Tensor> groupNormStatistics(poplar::Graph &graph, const poplar::Tensor acts, float eps, poplar::program::Sequence &prog, unsigned numGroups, bool unbiasedVarEstimate, bool stableAlgo = false, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Estimate mean and inverse of standard deviation of activations.

Parameters
  • graph – The graph that the normalisation operation is added to.

  • acts – The activations for which the mean and variance are estimated.

  • eps – The epsilon value added to the variance to avoid division by zero.

  • prog – The program sequence to add the operation to.

  • numGroups – The number of groups to split the channel dimension into when calculating group norm statistics. The groupNormStridedChannelGrouping option defines how the split is made.

  • unbiasedVarEstimate – If true, an unbiased variance estimate will be computed.

  • stableAlgo – If true, computes the mean first then subtracts the activations from it before computing the variance. The implementation with this flag set to true is slower than when set to false.

  • partialsType – Poplar type used for intermediate values. If the type specified is smaller than the input/ output type then partialsType is ignored and the input/output type is used instead.

  • debugContext – Optional debug information.

  • options – Group normalisation options. See groupNormalise().

Returns

A vector pair with mean and inverse standard deviation.

poplar::Tensor groupNormWhiten(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Whiten activations given the mean and standard deviation.

Parameters
  • graph – The graph that the normalisation operation is added to.

  • acts – The input activations that will be whitened.

  • mean – The previously calculated mean to subtract from the activations. Typically calculated using groupNormStatistics().

  • invStdDev – The previously calculated inverse standard deviation to multiply the activations by. Typically calculated using groupNormStatistics().

  • prog – The program sequence to add the operation to.

  • debugContext – Optional debug information.

  • options – Group normalisation options. See groupNormalise().

Returns

A new tensor with the whitened activations.

std::pair<poplar::Tensor, poplar::Tensor> groupNormalise(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gamma, const poplar::Tensor &beta, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Group normalise activations given the mean, standard deviation and group norm parameters.

Group normalisation options

  • groupNormStridedChannelGrouping (true, false) [=true]

    Select groups of channels for group normalisation with a stride between channels. This makes the implementation more efficient but is unconventional. Among other things this will mean that using pre-trained weights would not be possible if not produced with this unconventional implementation.

    If we have numGroups groups then the channels in the group groups[groupIdx] are given by:

    • Strided channel grouping: channelInGroupIdx * numGroups + groupIdx

    • Otherwise: channelInGroupIdx + channelsPerGroup * groupIdx

    In the case of instanceNormalise() and layerNormalise() (which use group norm in their implementation) this option will have no effect.

Parameters
  • graph – The graph that the normalisation operation is added to.

  • acts – The input activations to whiten and normalise, with shape [B][C][..F..] where:

    • B is the batch size

    • C is the number of channels

    • ..F.. are the dimensions of an N-dimensional field.

  • gamma – The gamma weights to multiply by when normalising the whitened activations.

  • beta – The beta weights to add when normalising the whitened activations.

  • mean – The mean to subtract when whitening the activations.

  • invStdDev – The inverse standard deviation to multiply by when whitening the activations.

  • prog – The program sequence to add the operation to.

  • debugContext – Optional debug information.

  • options – Group normalisation options.

Returns

Two tensors containing:

  • normalised activations

  • whitened activations

std::pair<poplar::Tensor, poplar::Tensor> groupNormParamGradients(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gradsIn, const poplar::Tensor &mean, const poplar::Tensor &iStdDev, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients with respect to parameters for parameter update.

Parameters
  • graph – The graph that the normalisation operation is added to.

  • acts – The forward-pass activation inputs to this layer.

  • gradsIn – The gradient with respect to the output of this layer.

  • mean – The mean of the acts tensor, typically calculated using groupNormStatistics().

  • iStdDev – The inverse standard deviation of the acts tensor, typically calculated using groupNormStatistics().

  • prog – The program sequence to add the operation to.

  • partialsType – Poplar type used for intermediate values. If the type specified is smaller than the input/output type then partialsType is ignored and the input/output type is used instead.

  • debugContext – Optional debug information.

  • options – Group normalisation options. See groupNormalise().

Returns

A pair of tensors, gammaDelta and betaDelta which are the gradients with respect to gamma and beta.

std::pair<poplar::Tensor, poplar::Tensor> groupNormParamGradients(poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients with respect to parameters for parameter update.

Parameters
  • graph – The graph that the normalisation operation is added to.

  • actsWhitened – The forward-pass whitened activation inputs to this layer.

  • gradsIn – The gradient with respect to the output of this layer.

  • prog – The program sequence to add the operation to.

  • partialsType – Poplar type used for intermediate values. If the type specified is smaller than the input/output type then partialsType is ignored and the input/output type is used instead.

  • debugContext – Optional debug information.

  • options – Group normalisation options. See groupNormalise().

Returns

A pair of tensors, gammaDelta and betaDelta which are the gradients with respect to gamma and beta.

poplar::Tensor groupNormGradients(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gradsIn, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, const poplar::Tensor &gamma, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients with respect to input activations for the group norm layer.

Gradients are propagated through the complete layer including statistics computation.

Parameters
  • graph – The graph that the normalisation operation is added to.

  • acts – The forward-pass activation inputs to this layer.

  • gradsIn – The gradient with respect to the output of this layer.

  • mean – The mean of the acts tensor, typically calculated using groupNormStatistics().

  • invStdDev – The inverse standard deviation of the acts tensor, typically calculated using groupNormStatistics().

  • gamma – The gamma weights to multiply by when normalising the whitened activations.

  • prog – The program sequence to add the operation to.

  • partialsType – Poplar type used for intermediate values. If the type specified is smaller than the input/output type then partialsType is ignored and the input/output type is used instead.

  • debugContext – Optional debug information.

  • options – Group normalisation options. See groupNormalise().

Returns

A tensor containing the gradients with respect to the input activations for this layer.

poplar::Tensor groupNormGradients(poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, const poplar::Tensor &invStdDev, const poplar::Tensor &gamma, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients with respect to input activations for the group norm layer.

Gradients are propagated through the complete layer including statistics computation.

Parameters
  • graph – The graph that the normalisation operation is added to.

  • actsWhitened – The forward-pass whitened activation inputs to this layer.

  • gradsIn – The gradient with respect to the output of this layer.

  • invStdDev – The inverse standard deviation of the acts tensor, typically calculated using groupNormStatistics().

  • gamma – The gamma weights to multiply by when normalising the whitened activations.

  • prog – The program sequence to add the operation to.

  • partialsType – Poplar type used for intermediate values. If the type specified is smaller than the input/output type then partialsType is ignored and the input/output type is used instead.

  • debugContext – Optional debug information.

  • options – Group normalisation options. See groupNormalise().

Returns

A tensor containing the gradients with respect to the input activations for this layer.

void groupNormParamUpdate(poplar::Graph &graph, const poplar::Tensor &gammaDelta, const poplar::Tensor &betaDelta, float scale, poplar::Tensor &gamma, poplar::Tensor &beta, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update parameters for the group norm layer.

Gradients are propagated through the complete layer including statistics computation.

The gamma and beta parameters are updated as follows:

  • gamma += gammaDelta * scale

  • beta += betaDelta * scale

scale is a float and therefore constant.

Parameters
  • graph – The graph that the normalisation operation is added to.

  • gammaDelta – Value used to update gamma.

  • betaDelta – Value used to update beta.

  • scale – Scale factor for gammaDelta and betaDelta.

  • gamma – The gamma weights to multiply by when normalising the activations.

  • beta – The beta weights to add when normalising the activations.

  • prog – The program sequence to add the operation to.

  • debugContext – Optional debug information.

  • options – Group normalisation options. See groupNormalise().

void groupNormParamUpdate(poplar::Graph &graph, const poplar::Tensor &gammaDelta, const poplar::Tensor &betaDelta, const poplar::Tensor &scale, poplar::Tensor &gamma, poplar::Tensor &beta, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update parameters for the group norm layer.

Gradients are propagated through the complete layer including statistics computation.

The gamma and beta parameters are updated as follows:

  • gamma += gammaDelta * scale

  • beta += betaDelta * scale

scale is a tensor and therefore variable.

Parameters
  • graph – The graph that the normalisation operation is added to.

  • gammaDelta – Value used to update gamma.

  • betaDelta – Value used to update beta.

  • scale – Scale factor for gammaDelta and betaDelta.

  • gamma – The gamma weights to multiply by when normalising the activations.

  • beta – The beta weights to add when normalising the activations.

  • prog – The program sequence to add the operation to.

  • debugContext – Optional debug information.

  • options – Group normalisation options. See groupNormalise().