InstanceNorm

#include <popnn/InstanceNorm.hpp>

Instance normalization operations.

Instance norm is implemented as group norm with the number of groups equal to the number of channels.

namespace popnn

Functions used in neural networks.

namespace in

Functions

inline std::pair<poplar::Tensor, poplar::Tensor> instanceNormStatistics(poplar::Graph &graph, const poplar::Tensor acts, float eps, poplar::program::Sequence &prog, bool unbiasedVarEstimate, bool stableAlgo, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Estimate mean and inverse of standard deviation of activations.

Parameters
  • graph – The graph that the normalisation operation is added to.

  • acts – The activations for which the mean and variance are estimated.

  • eps – The epsilon value added to the variance to avoid division by zero.

  • prog – The program sequence to add the operation to.

  • unbiasedVarEstimate – If true, an unbiased variance estimate will be computed.

  • stableAlgo – If true, computes the mean first and subtracts it from the activations before computing the variance. Setting this flag to true is slower than setting it to false.

  • partialsType – Poplar type used for partial results. If the type specified is smaller than the input/output type then partialsType is ignored and the input/output type is used instead.

  • debugContext – Optional debug information.

  • options – Instance normalisation options. See groupNormalise().

Returns

A pair of tensors containing the mean and the inverse standard deviation.
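
To make the two returned quantities concrete, the following is a minimal CPU reference sketch in plain C++, not the Poplar implementation. It assumes the statistics are taken over the field elements of a single (batch, channel) instance, that eps is added to the variance before the square root, and that the unbiased estimate divides by N - 1. The names InstanceStats and referenceInstanceStats are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical holder for the statistics of one (batch, channel) instance.
struct InstanceStats {
  float mean;
  float invStdDev;
};

// Reference computation over the field elements of a single instance.
// Assumes `field` is non-empty, and has at least two elements when
// `unbiased` is true.
inline InstanceStats referenceInstanceStats(const std::vector<float> &field,
                                            float eps, bool unbiased) {
  const float n = static_cast<float>(field.size());
  float sum = 0.0f;
  for (float x : field) sum += x;
  const float mean = sum / n;

  // Two-pass form: subtract the mean before squaring, which mirrors the
  // description of stableAlgo = true.
  float sqDiff = 0.0f;
  for (float x : field) sqDiff += (x - mean) * (x - mean);
  const float var = sqDiff / (unbiased ? n - 1.0f : n);

  // eps is added to the variance so the square root is never zero.
  return {mean, 1.0f / std::sqrt(var + eps)};
}
```

Calling the helper once per (batch, channel) pair gives one mean and one inverse standard deviation per instance, which is the sense in which the number of groups equals the number of channels.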

inline poplar::Tensor instanceNormWhiten(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Whiten activations given the mean and inverse standard deviation.

Parameters
  • graph – The graph that the normalisation operation is added to.

  • acts – The input activations that will be whitened.

  • mean – The previously calculated mean to subtract from the activations. Typically calculated using instanceNormStatistics().

  • invStdDev – The previously calculated inverse standard deviation to multiply the activations by. Typically calculated using instanceNormStatistics().

  • prog – The program sequence to add the operation to.

  • debugContext – Optional debug information.

  • options – Instance normalisation options. See groupNormalise().

Returns

A new tensor with the whitened activations.
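
As a rough illustration of the whitening step described by the mean and invStdDev parameters, the sketch below applies it on the CPU to the field of one (batch, channel) instance. It is not the Poplar implementation, and referenceWhiten is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Whiten the field of one (batch, channel) instance:
// out[i] = (acts[i] - mean) * invStdDev.
inline std::vector<float> referenceWhiten(const std::vector<float> &acts,
                                          float mean, float invStdDev) {
  std::vector<float> out;
  out.reserve(acts.size());
  for (float x : acts) out.push_back((x - mean) * invStdDev);
  return out;
}
```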

inline std::pair<poplar::Tensor, poplar::Tensor> instanceNormalise(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gamma, const poplar::Tensor &beta, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Instance normalise activations given the mean, inverse standard deviation and norm parameters.

Because instance normalisation is implemented using group normalisation, the options are passed through to it. See the groupNormalise() documentation for details of the options.

Parameters
  • graph – The graph that the normalisation operation is added to.

  • acts – The input activations to whiten and normalise, with shape [B][C][..F..] where:

    • B is the batch size

    • C is the number of channels

    • ..F.. are the dimensions of an N-dimensional field.

  • gamma – The gamma weights to multiply by when normalising the whitened activations.

  • beta – The beta weights to add when normalising the whitened activations.

  • mean – The mean to subtract when whitening the activations.

  • invStdDev – The inverse standard deviation to multiply by when whitening the activations.

  • prog – The program sequence to add the operation to.

  • debugContext – Optional debug information.

  • options – Instance normalisation options. See groupNormalise().

Returns

Two tensors containing:

  • normalised activations

  • whitened activations
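
The relationship between the two returned tensors can be sketched on the CPU as follows. This is an illustrative reference for one (batch, channel) instance with scalar gamma and beta for that channel, not the Poplar implementation; referenceInstanceNormalise is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Returns {normalised, whitened} for one (batch, channel) instance, where
// whitened[i]   = (acts[i] - mean) * invStdDev and
// normalised[i] = gamma * whitened[i] + beta.
inline std::pair<std::vector<float>, std::vector<float>>
referenceInstanceNormalise(const std::vector<float> &acts, float gamma,
                           float beta, float mean, float invStdDev) {
  std::vector<float> normalised, whitened;
  normalised.reserve(acts.size());
  whitened.reserve(acts.size());
  for (float x : acts) {
    const float w = (x - mean) * invStdDev;
    whitened.push_back(w);
    normalised.push_back(gamma * w + beta);
  }
  return {normalised, whitened};
}
```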

inline std::pair<poplar::Tensor, poplar::Tensor> instanceNormParamGradients(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gradsIn, const poplar::Tensor &mean, const poplar::Tensor &iStdDev, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients with respect to parameters for parameter update.

Parameters
  • graph – The graph that the normalisation operation is added to.

  • acts – The forward-pass activation inputs to this layer.

  • gradsIn – The gradient with respect to the output of this layer.

  • mean – The mean of the acts tensor, typically calculated using instanceNormStatistics().

  • iStdDev – The inverse standard deviation of the acts tensor, typically calculated using instanceNormStatistics().

  • prog – The program sequence to add the operation to.

  • partialsType – The Poplar type to be used for intermediate values. If the type specified is smaller than the input/output type then partialsType is ignored and the input/output type is used instead.

  • debugContext – Optional debug information.

  • options – Instance normalisation options. See groupNormalise().

Returns

A pair of tensors, gammaDelta and betaDelta which are the gradients with respect to gamma and beta.

inline std::pair<poplar::Tensor, poplar::Tensor> instanceNormParamGradients(poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients with respect to parameters for parameter update.

Parameters
  • graph – The graph that the normalisation operation is added to.

  • actsWhitened – The forward-pass whitened activation inputs to this layer.

  • gradsIn – The gradient with respect to the output of this layer.

  • prog – The program sequence to add the operation to.

  • partialsType – The Poplar type to be used for intermediate values. If the type specified is smaller than the input/output type then partialsType is ignored and the input/output type is used instead.

  • debugContext – Optional debug information.

  • options – Instance normalisation options. See groupNormalise().

Returns

A pair of tensors, gammaDelta and betaDelta which are the gradients with respect to gamma and beta.
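
A minimal CPU sketch of the parameter gradients is shown below. It assumes the usual formulation in which gamma and beta hold one value per channel, so the gradients are reduced over the batch and field dimensions; the source does not spell this out. The inputs are assumed to be flattened [B][C][F] buffers, and referenceParamGradients is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Inputs are flattened [B][C][F] buffers of whitened activations and
// output gradients. Returns {gammaDelta, betaDelta}, each of length C.
inline std::pair<std::vector<float>, std::vector<float>>
referenceParamGradients(const std::vector<float> &actsWhitened,
                        const std::vector<float> &gradsIn, std::size_t B,
                        std::size_t C, std::size_t F) {
  std::vector<float> gammaDelta(C, 0.0f), betaDelta(C, 0.0f);
  for (std::size_t b = 0; b < B; ++b) {
    for (std::size_t c = 0; c < C; ++c) {
      for (std::size_t f = 0; f < F; ++f) {
        const std::size_t i = (b * C + c) * F + f;
        // Assumed formulation: gamma scales the whitened activations,
        // beta is a plain offset.
        gammaDelta[c] += gradsIn[i] * actsWhitened[i];
        betaDelta[c] += gradsIn[i];
      }
    }
  }
  return {gammaDelta, betaDelta};
}
```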

inline poplar::Tensor instanceNormGradients(poplar::Graph &graph, const poplar::Tensor &acts, const poplar::Tensor &gradsIn, const poplar::Tensor &mean, const poplar::Tensor &invStdDev, const poplar::Tensor &gamma, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients with respect to input activations for the instance norm layer.

Gradients are propagated through the complete layer including statistics computation.

Parameters
  • graph – The graph that the normalisation operation is added to.

  • acts – The forward-pass activation inputs to this layer.

  • gradsIn – The gradient with respect to the output of this layer.

  • mean – The mean of the acts tensor, typically calculated using instanceNormStatistics().

  • invStdDev – The inverse standard deviation of the acts tensor, typically calculated using instanceNormStatistics().

  • gamma – The gamma weights to multiply by when normalising the whitened activations.

  • prog – The program sequence to add the operation to.

  • partialsType – The Poplar type to be used for intermediate values. If the type specified is smaller than the input/output type then partialsType is ignored and the input/output type is used instead.

  • debugContext – Optional debug information.

  • options – Instance normalisation options. See groupNormalise().

Returns

A tensor containing the gradients with respect to the input activations for this layer.

inline poplar::Tensor instanceNormGradients(poplar::Graph &graph, const poplar::Tensor &actsWhitened, const poplar::Tensor &gradsIn, const poplar::Tensor &invStdDev, const poplar::Tensor &gamma, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Compute gradients with respect to input activations for the instance norm layer.

Gradients are propagated through the complete layer including statistics computation.

Parameters
  • graph – The graph that the normalisation operation is added to.

  • actsWhitened – The forward-pass whitened activation inputs to this layer.

  • gradsIn – The gradient with respect to the output of this layer.

  • invStdDev – The inverse standard deviation of the acts tensor, typically calculated using instanceNormStatistics().

  • gamma – The gamma weights to multiply by when normalising the whitened activations.

  • prog – The program sequence to add the operation to.

  • partialsType – The Poplar type to be used for intermediate values. If the type specified is smaller than the input/output type then partialsType is ignored and the input/output type is used instead.

  • debugContext – Optional debug information.

  • options – Instance normalisation options. See groupNormalise().

Returns

A tensor containing the gradients with respect to the input activations for this layer.
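
To illustrate what propagating gradients through the statistics computation means, here is a CPU sketch of the standard normalisation backward pass for one (batch, channel) instance. It assumes the forward pass used the biased variance estimate, and it is a textbook derivation rather than the Poplar implementation. referenceInputGradients is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// For one (batch, channel) instance with whitened activations w and output
// gradients gOut, using g[i] = gOut[i] * gamma:
//   gradIn[i] = invStdDev * (g[i] - mean(g) - w[i] * mean(g * w))
// The two mean terms come from differentiating through the mean and the
// variance. Assumes the inputs are non-empty and of equal length.
inline std::vector<float>
referenceInputGradients(const std::vector<float> &actsWhitened,
                        const std::vector<float> &gradsIn, float invStdDev,
                        float gamma) {
  const std::size_t n = actsWhitened.size();
  float meanGrad = 0.0f;
  float meanGradTimesAct = 0.0f;
  for (std::size_t i = 0; i < n; ++i) {
    const float g = gradsIn[i] * gamma;
    meanGrad += g;
    meanGradTimesAct += g * actsWhitened[i];
  }
  meanGrad /= static_cast<float>(n);
  meanGradTimesAct /= static_cast<float>(n);

  std::vector<float> gradOut(n);
  for (std::size_t i = 0; i < n; ++i) {
    const float g = gradsIn[i] * gamma;
    gradOut[i] =
        invStdDev * (g - meanGrad - actsWhitened[i] * meanGradTimesAct);
  }
  return gradOut;
}
```

Under these assumptions the resulting gradients sum to zero over the instance, because adding a constant to every input activation leaves the whitened output unchanged.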

inline void instanceNormParamUpdate(poplar::Graph &graph, const poplar::Tensor &gammaDelta, const poplar::Tensor &betaDelta, float scale, poplar::Tensor &gamma, poplar::Tensor &beta, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update parameters for the instance norm layer.

The gamma and beta parameters are updated as follows:

  • gamma += gammaDelta * scale

  • beta += betaDelta * scale

In this overload, scale is a float, so its value is fixed when the graph is constructed.

Parameters
  • graph – The graph that the normalisation operation is added to.

  • gammaDelta – Value used to update gamma.

  • betaDelta – Value used to update beta.

  • scale – Scale factor for gammaDelta and betaDelta.

  • gamma – The gamma weights to multiply by when normalising the activations.

  • beta – The beta weights to add when normalising the activations.

  • prog – The program sequence to add the operation to.

  • debugContext – Optional debug information.

  • options – Instance normalisation options. See groupNormalise().
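
The update rule above can be sketched on the CPU as follows, assuming one gamma and one beta value per channel. This is an illustration only, not the Poplar implementation, and referenceParamUpdate is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// In-place update: gamma += gammaDelta * scale, beta += betaDelta * scale.
// Assumes all four vectors have the same length.
inline void referenceParamUpdate(const std::vector<float> &gammaDelta,
                                 const std::vector<float> &betaDelta,
                                 float scale, std::vector<float> &gamma,
                                 std::vector<float> &beta) {
  for (std::size_t c = 0; c < gamma.size(); ++c) {
    gamma[c] += gammaDelta[c] * scale;
    beta[c] += betaDelta[c] * scale;
  }
}
```

For a gradient descent step, scale would typically be the negated learning rate.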

inline void instanceNormParamUpdate(poplar::Graph &graph, const poplar::Tensor &gammaDelta, const poplar::Tensor &betaDelta, const poplar::Tensor &scale, poplar::Tensor &gamma, poplar::Tensor &beta, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Update parameters for the instance norm layer.

The gamma and beta parameters are updated as follows:

  • gamma += gammaDelta * scale

  • beta += betaDelta * scale

In this overload, scale is a tensor, so its value can change at run time.

Parameters
  • graph – The graph that the normalisation operation is added to.

  • gammaDelta – Value used to update gamma.

  • betaDelta – Value used to update beta.

  • scale – Scale factor for gammaDelta and betaDelta.

  • gamma – The gamma weights to multiply by when normalising the activations.

  • beta – The beta weights to add when normalising the activations.

  • prog – The program sequence to add the operation to.

  • debugContext – Optional debug information.

  • options – Instance normalisation options. See groupNormalise().

uint64_t getFwdFlops(uint64_t numChannels, uint64_t actsPerChannel, bool computeEstimates)

For computing the floating point operations required, the following values are used:

  • Activations per channel:

    • for fully-connected layers: the total number of batches.

    • for convolution layers: the field size per channel * batch size.

  • Number of channels:

    • for fully-connected layers: the total number of activations in a batch.

    • for convolution layers: the total number of channels.

Parameters
  • numChannels – The number of channels.

  • actsPerChannel – The activations per channel.

  • computeEstimates

Returns

Number of floating point operations required.
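
For the convolution-layer case, the two arguments can be derived from an activation shape as sketched below. The sketch assumes the [B][C][..F..] layout used elsewhere in this header and does not call getFwdFlops itself. FlopInputs and convFlopInputs are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical holder for the two values passed to the FLOP estimators.
struct FlopInputs {
  std::uint64_t numChannels;
  std::uint64_t actsPerChannel;
};

// `shape` is [B][C][..F..] and is assumed to have at least two dimensions.
inline FlopInputs convFlopInputs(const std::vector<std::size_t> &shape) {
  std::uint64_t fieldSize = 1;
  for (std::size_t d = 2; d < shape.size(); ++d) {
    fieldSize *= static_cast<std::uint64_t>(shape[d]);
  }
  FlopInputs in;
  in.numChannels = static_cast<std::uint64_t>(shape[1]);
  // Field size per channel multiplied by the batch size.
  in.actsPerChannel = fieldSize * static_cast<std::uint64_t>(shape[0]);
  return in;
}
```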

uint64_t getBwdFlops(uint64_t numChannels, uint64_t actsPerChannel)

Returns

Number of floating point operations required.

uint64_t getWuFlops(uint64_t numChannels, uint64_t actsPerChannel)

Returns

Number of floating point operations required.