Convolution

#include <poplin/Convolution.hpp>

Functions and data types to support performing convolutions.

namespace poplin

Linear algebra functions.

Decomposition of a matrix into a lower triangular matrix L and an upper triangular matrix U.

Typedefs

using ConvPlanParams = std::tuple<const poplar::Target*, const ConvParams, const poplar::OptionFlags*>

Functions

uint64_t getFwdFlops(const ConvParams &params)

Calculate the minimum number of floating point operations required to perform the forward pass convolution given a set of params.

uint64_t getBwdFlops(const ConvParams &params)

Calculate the minimum number of floating point operations required to perform the backward pass convolution given a set of params.

uint64_t getWuFlops(const ConvParams &params)

Calculate the minimum number of floating point operations required to perform the weight update pass convolution given a set of params.

double getFwdPerfectCycleCount(const poplar::Graph &graph, const ConvParams &params)

Calculate the number of cycles to perform the forward pass assuming maximal utilisation of the target hardware, performing the minimum number of floating point operations.

This takes into account the number of tiles available and vectorization support on the target.

This is an optimistic number useful for estimating efficiency: cycleCount = getFwdFlops() / maximumHardwareVectorization.

Parameters
  • graph – Provides target the convolution will run on.

  • params – Description of convolution.

Returns

Estimated number of cycles to perform the forward pass.
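As an illustration of how the FLOP and perfect-cycle estimators fit together, the following minimal sketch builds a convolution description and queries the forward-pass estimates. It assumes the ConvParams constructor taking (dataType, batchSize, inputFieldShape, kernelShape, inputChannels, outputChannels, numConvGroups); the sizes and variable names are illustrative only.

#include <iostream>

#include <poplar/Graph.hpp>
#include <poplar/IPUModel.hpp>
#include <poplin/Convolution.hpp>

int main() {
  // Simulated target so the sketch does not require IPU hardware.
  poplar::IPUModel ipuModel;
  poplar::Device device = ipuModel.createDevice();
  poplar::Graph graph(device.getTarget());

  // Illustrative convolution: batch 1, 8x8 input, 3x3 kernel,
  // 16 input channels, 32 output channels, 1 conv group.
  poplin::ConvParams params(poplar::FLOAT, 1, {8, 8}, {3, 3}, 16, 32, 1);

  std::cout << "Forward FLOPs:          " << poplin::getFwdFlops(params) << "\n";
  std::cout << "Perfect forward cycles: "
            << poplin::getFwdPerfectCycleCount(graph, params) << "\n";
  return 0;
}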

double getBwdPerfectCycleCount(const poplar::Graph &graph, const ConvParams &params)

Calculate the number of cycles to perform the backward pass assuming maximal utilisation of the target hardware, performing the minimum number of floating point operations.

This takes into account the number of tiles available and vectorization support on the target.

This is an optimistic number useful for estimating efficiency: cycleCount = getBwdFlops() / maximumHardwareVectorization.

Parameters
  • graph – Provides target the convolution will run on.

  • params – Description of convolution.

Returns

Estimated number of cycles to perform the backward pass.

double getWuPerfectCycleCount(const poplar::Graph &graph, const ConvParams &params)

Calculate the number of cycles to perform the weight update pass assuming maximal utilisation of the target hardware, performing the minimum number of floating point operations.

This takes into account the number of tiles available and vectorization support on the target.

This is an optimistic number useful for estimating efficiency: cycleCount = getWuFlops() / maximumHardwareVectorization.

Parameters
  • graph – Provides target the convolution will run on.

  • params – Description of convolution.

Returns

Estimated number of cycles to perform the weight update pass.

poplar::Tensor createWeights(poplar::Graph &graph, const ConvParams &params, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Create a weight tensor suitable for use with convolution().

The shape of the tensor will be [convGroups x outChansPerConvGroup x inChansPerConvGroup x H x W].

Convolution options

  • availableMemoryProportion Decimal between 0 and 1 (inclusive) [=0.6]

    The amount of memory allocated for temporary data whilst the operation is executing (for example, for intermediate calculated values or temporary values passed between tiles on the IPU). The value is specified as a proportion of available memory on the IPU. So, for example, a value of 0.1 will constrain the library to use 10% of the total memory for temporary data.

    The library will try to constrain the use of temporary memory to below this value. An operation that has more temporary memory available to use will run in the same or fewer cycles.

    For a specific operation, the minimum amount of temporary memory the library is able to use may be more than the amount specified by this option. In this case, if POPLIBS_LOG_LEVEL=WARN or POPLIBS_POPLIN_LOG_LEVEL=WARN, a warning message will be output, and the amount specified by this option is ignored.

    Note: if this value is set to less than 5% of memory (that is, a value less than 0.05), then the library will often need to create a large amount of code and data structures to keep the temporary memory usage low, and this can have a permanent memory overhead larger than the saving in temporary memory. You should take great care when setting a value this low.

    See also

    The Optimising Temporary Memory Usage for Convolutions and Matmuls on the IPU technical note, for practical examples of using availableMemoryProportion.

  • partialsType (half, float) [=float]

    Data type used for intermediate calculations. If the type specified is smaller than the output type then the option is ignored and the output type is used instead.

  • pass (NONE, INFERENCE_FWD, TRAINING_FWD, TRAINING_BWD, TRAINING_WU, FC_INFERENCE_FWD, FC_TRAINING_FWD, FC_TRAINING_BWD, FC_TRAINING_WU) [=NONE]

    Optimize the plan for the specified type of pass. Note the abbreviations: FWD (forward), BWD (backward), WU (weight-update), FC (fully-connected).

  • use128BitConvUnitLoad (true, false) [=false]

    If true, convolution weights are loaded 128-bits at a time. Otherwise, they are loaded 64-bits at a time. Not all codelets support 128-bit loads. This option affects memory usage and cycle count.

  • enableMultiStageReduce (true, false) [=true]

    If true, perform the reduction following the convolution in multiple stages if it would significantly reduce code size. This comes at the cost of increasing the number of cycles.

  • enableFastReduce (true, false) [=false]

    If true, use a faster reduction vertex if the data types and widths allow it. This comes at the cost of further constraints on memory allocation.

  • enableConvDithering (true, false) [=false]

    If true, then convolutions with different parameters will be laid out from different tiles in an effort to improve tile balance in models.

Parameters
  • graph – The graph that the tensor will be added to.

  • params – The same parameters as used by convolution().

  • debugContext – Debugging name for the tensor.

  • options – Options controlling the implementation.

  • cache – Optional pointer to planning cache to use.

Returns

The weights tensor suitable for use with convolution().
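As a minimal sketch of combining the options above with createWeights() (the option values, the "weights" debug name, and the helper function are illustrative, not part of the library):

#include <poplar/Graph.hpp>
#include <poplar/OptionFlags.hpp>
#include <poplin/Convolution.hpp>

poplar::Tensor makeConvWeights(poplar::Graph &graph,
                               const poplin::ConvParams &params,
                               poplin::PlanningCache &cache) {
  // Illustrative choices: cap temporary memory at 40% of memory
  // and optimise the plan for the training forward pass.
  poplar::OptionFlags options{{"availableMemoryProportion", "0.4"},
                              {"pass", "TRAINING_FWD"}};
  return poplin::createWeights(graph, params, "weights", options, &cache);
}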

poplar::Tensor createBiases(poplar::Graph &graph, const poplar::Tensor &activations, const poplar::DebugContext &debugContext = {"biases"})

Create a bias tensor suitable for input to the addBias() function.

The tensor will have the shape [outChans].

Parameters
  • graph – The graph that the tensor will be added to.

  • activations – The activation tensor which is output from the convolution.

  • debugContext – Debugging name for the tensor.

Returns

The tensor of biases.

poplar::Tensor createBiases(poplar::Graph &graph, const poplar::Tensor &activations, const ConvParams &params, const poplar::DebugContext &debugContext = {"biases"}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Create a bias tensor suitable for input to the addBias() function with allocation consistent with plan parameters.

The tensor will have the shape [outChans].

Parameters
  • graph – The graph that the tensor will be added to.

  • activations – The activation tensor which is output from the convolution.

  • params – Parameters as passed to the target convolution.

  • debugContext – Debugging name for the tensor.

  • options – Options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

Returns

The tensor of biases.

poplar::Tensor createInput(poplar::Graph &graph, const ConvParams &params, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Create an input tensor for a convolution.

Use this when you need to create an input data tensor for a convolution. The same set of parameters that will be passed to convolution() should also be passed to createInput().

The returned tensor has the shape [B x inChans x H x W].

Parameters
  • graph – The tensor will be added to this graph.

  • params – Parameters as passed to the target convolution.

  • debugContext – Debugging name for the tensor.

  • options – Options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

Returns

The allocated input tensor.

poplar::Tensor createConvOutput(poplar::Graph &graph, const ConvParams &params, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Create an output tensor for a convolution.

Use this when you need to create an output data tensor for a convolution. The same set of parameters that will be passed to convolution() should also be passed to createConvOutput().

The returned tensor has the shape [B x outChans x H x W].

Parameters
  • graph – The tensor will be added to this graph.

  • params – Parameters as passed to the target convolution.

  • debugContext – Debugging name for the tensor.

  • options – Options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

Returns

The allocated output tensor.

poplar::Tensor convolution(poplar::Graph &graph, const poplar::Tensor &in, const poplar::Tensor &weights, const ConvParams &params, bool transposeAndFlipWeights, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Convolve an input with a set of weights.

The input tensor is in the form [B x inChans x H x W], and can be allocated using createInput(). The weights tensor is in the form [convGroups x outChansPerConvGroup x inChansPerConvGroup x H x W], and can be allocated using createWeights().

The returned tensor has the shape [B x outChans x H x W].

Padding and striding are specified in the ConvParams structure.

Parameters
  • graph – The graph that the operation will be added to.

  • in – Input data tensor.

  • weights – Weights tensor.

  • params – Parameters for the form of the convolution.

  • transposeAndFlipWeights – For the weight update pass.

  • prog – Poplar program sequence to append the operation onto.

  • debugContext – Optional debug information.

  • options – Options that control the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

Returns

The convolved output tensor.
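A minimal sketch of the intended call pattern follows; the helper function and tensor/debug names are illustrative, and params, options and cache are assumed to be set up as in the earlier sketches. The operands are allocated with the matching create functions so that their layouts suit the planned convolution.

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <poplin/Convolution.hpp>

poplar::Tensor forwardConv(poplar::Graph &graph,
                           const poplin::ConvParams &params,
                           const poplar::OptionFlags &options,
                           poplin::PlanningCache &cache,
                           poplar::program::Sequence &prog) {
  // Allocate operands with layouts chosen by the convolution planner.
  poplar::Tensor in = poplin::createInput(graph, params, "in", options, &cache);
  poplar::Tensor weights =
      poplin::createWeights(graph, params, "weights", options, &cache);
  // transposeAndFlipWeights is false for the forward pass.
  return poplin::convolution(graph, in, weights, params, false, prog, "conv",
                             options, &cache);
}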

void convolutionWithOutput(poplar::Graph &graph, const poplar::Tensor &in, const poplar::Tensor &weights, const poplar::Tensor &out, const ConvParams &params, bool transposeAndFlipWeights, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Convolve an input with a set of weights into a pre-allocated output tensor.

The output tensor is in the form [B x outChans x H x W], and can be allocated using createConvOutput(). The weights tensor is in the form [convGroups x outChansPerConvGroup x inChansPerConvGroup x H x W], and can be allocated using createWeights(). The input tensor is in the form [B x inChans x H x W], and can be allocated using createInput().

Padding and striding are specified in the ConvParams structure.

Parameters
  • graph – The graph that the operation will be added to.

  • in – Input data tensor.

  • weights – Weights tensor.

  • out – Pre-allocated output tensor.

  • params – Parameters for the form of the convolution.

  • transposeAndFlipWeights – For the weight update pass.

  • prog – Poplar program sequence to append the operation onto.

  • debugContext – Optional debug information.

  • options – Options that control the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.
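A corresponding sketch for the pre-allocated-output variant (same assumptions and illustrative names as in the convolution() sketch above):

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <poplin/Convolution.hpp>

void forwardConvIntoOutput(poplar::Graph &graph, const poplar::Tensor &in,
                           const poplar::Tensor &weights,
                           const poplin::ConvParams &params,
                           const poplar::OptionFlags &options,
                           poplin::PlanningCache &cache,
                           poplar::program::Sequence &prog) {
  // Allocate the destination with a layout chosen by the planner.
  poplar::Tensor out =
      poplin::createConvOutput(graph, params, "out", options, &cache);
  poplin::convolutionWithOutput(graph, in, weights, out, params, false, prog,
                                "conv", options, &cache);
}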

void preplanConvolutions(const std::set<ConvPlanParams> &convs, PlanningCache &cache)

Deprecated:

Use preplan() instead.

Plan the specified convolutions.

All entries must have matching machine parameters.

Parameters
  • convs – A set of tuples of:

    • conv-specific target for tile / IPU sizing

    • convolution parameters

    • implementation options. See createWeights().

  • cache – The planning cache to update.

void preplanConvolutions(poplar::Graph &graph, const std::set<ConvPlanParams> &convs, PlanningCache &cache)

Deprecated:

Use preplan() instead.

Plan the specified convolutions.

All entries must have matching machine parameters.

Parameters
  • graph – The graph the convolutions will belong to.

  • convs – A set of tuples of:

    • conv-specific target for tile / IPU sizing

    • convolution parameters

    • implementation options. See createWeights().

  • cache – The planning cache to update.

void weightsTransposeChansFlipXY(poplar::Graph &graph, const poplar::Tensor &weightsIn, const poplar::Tensor &weightsOut, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Copy the weights in weightsIn into weightsOut such that each element of the kernel is transposed with respect to the input and output channels, and each spatial dimension of the kernel is flipped.

See the transposeAndFlipWeights parameter in convolution().

Parameters
  • graph – The graph that the operation will be added to.

  • weightsIn – The input weights tensor.

  • weightsOut – The output weights tensor.

  • prog – Poplar program sequence to append the operation onto.

  • debugContext – Optional debug information.

  • options – Options controlling the implementation. See createWeights().
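One common use is sketched below: the forward-pass weights are rewritten into a separately allocated weights tensor laid out for the backward-pass convolution. The helper function and names are illustrative, and bwdParams is assumed to describe the backward-pass convolution (how it is derived from the forward-pass parameters is outside this sketch).

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <poplin/Convolution.hpp>

void prepareBwdWeights(poplar::Graph &graph, const poplar::Tensor &fwdWeights,
                       const poplin::ConvParams &bwdParams,
                       const poplar::OptionFlags &bwdOptions,
                       poplin::PlanningCache &cache,
                       poplar::program::Sequence &prog) {
  // Destination tensor laid out for the backward-pass convolution.
  poplar::Tensor bwdWeights =
      poplin::createWeights(graph, bwdParams, "bwdWeights", bwdOptions, &cache);
  // Transpose the input/output channels and flip the spatial dimensions.
  poplin::weightsTransposeChansFlipXY(graph, fwdWeights, bwdWeights, prog,
                                      "weightsTranspose", bwdOptions);
}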

poplar::Tensor calculateWeightDeltas(poplar::Graph &graph, const poplar::Tensor &zDeltas, const poplar::Tensor &activations, const ConvParams &params, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Append an operation to a poplar::Program to generate the tensor of weight deltas.

Parameters
  • graph – The tensor will be added to this graph.

  • zDeltas – Tensor containing the gradients with respect to the output of the convolution.

  • activations – Tensor containing the inputs to the convolution in the forward pass.

  • params – Parameters of the convolution.

  • prog – Poplar program sequence to append the operation onto.

  • debugContext – Optional debug information.

  • options – Options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

Returns

A tensor containing the weight deltas, that is, the gradients with respect to the weights of the convolution. These are populated when the operation runs.

void convolutionWeightUpdate(poplar::Graph &graph, const poplar::Tensor &zDeltas, const poplar::Tensor &weights, const poplar::Tensor &activations, ConvParams params, const poplar::Tensor &scale, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Append operations to a poplar::Program to generate and apply the weight update.

Parameters
  • graph – The graph that the operation will be added to.

  • zDeltas – Tensor containing the gradients with respect to the output of the convolution.

  • weights – Weights tensor.

  • activations – Tensor containing the inputs to the convolution in the forward pass.

  • params – Parameters of the convolution.

  • scale – Scale to apply to the zDeltas.

  • prog – Poplar program sequence to append the operations onto.

  • debugContext – Optional debug information.

  • options – Options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

void convolutionWeightUpdate(poplar::Graph &graph, const poplar::Tensor &zDeltas, const poplar::Tensor &weights, const poplar::Tensor &activations, ConvParams params, float scale, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Append operations to a poplar::Program to generate and apply the weight update.

Parameters
  • graph – The graph that the operation will be added to.

  • zDeltas – Tensor containing the gradients with respect to the output of the convolution.

  • weights – Weights tensor.

  • activations – Tensor containing the inputs to the convolution in the forward pass.

  • params – Parameters of the convolution.

  • scale – Scale to apply to the zDeltas.

  • prog – Poplar program sequence to append the operations onto.

  • debugContext – Optional debug information.

  • options – Options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.
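A minimal sketch of applying a scaled weight update with this overload follows. The helper function and names are illustrative; params and options are assumed to describe the convolution as in the earlier sketches, and the scale value (typically derived from the learning rate) is supplied by the caller.

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <poplin/Convolution.hpp>

void applyWeightUpdate(poplar::Graph &graph, const poplar::Tensor &zDeltas,
                       const poplar::Tensor &weights,
                       const poplar::Tensor &activations,
                       poplin::ConvParams params, float scale,
                       const poplar::OptionFlags &options,
                       poplin::PlanningCache &cache,
                       poplar::program::Sequence &prog) {
  // Generate weight deltas from zDeltas and the forward-pass activations,
  // scale them by `scale`, and apply them to the weights tensor.
  poplin::convolutionWeightUpdate(graph, zDeltas, weights, activations, params,
                                  scale, prog, "weightUpdate", options, &cache);
}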

void convolutionBiasUpdate(poplar::Graph &graph, const poplar::Tensor &zDeltas, const poplar::Tensor &biases, const poplar::Tensor &scale, const poplar::OptionFlags &options, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Add a program to update the biases tensor with the gradients derived from the zDeltas tensor.

Parameters
  • graph – The graph that the operation will be added to.

  • zDeltas – Tensor containing the gradients with respect to the output of the convolution.

  • biases – Biases tensor to update.

  • scale – Scale to apply to the zDeltas tensor.

  • options – Options controlling the implementation. See createWeights().

  • prog – Poplar program sequence to append the operation onto.

  • debugContext – Optional debug information.

void convolutionBiasUpdate(poplar::Graph &graph, const poplar::Tensor &zDeltas, const poplar::Tensor &biases, float scale, const poplar::OptionFlags &options, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Add a program to update the biases tensor with the gradients derived from the zDeltas tensor.

Parameters
  • graph – The graph that the operation will be added to.

  • zDeltas – Tensor containing the gradients with respect to the output of the convolution.

  • biases – Biases tensor to update.

  • scale – Scale to apply to the zDeltas tensor.

  • options – Options controlling the implementation. See createWeights().

  • prog – Poplar program sequence to append the operation onto.

  • debugContext – Optional debug information.

void addBias(poplar::Graph &graph, const poplar::Tensor &in, const poplar::Tensor &biases, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Add a program to prog that adds the biases to the activations tensor.

Parameters
  • graph – The graph that the operation will be added to.

  • in – Tensor containing the values to which the biases will be added.

  • biases – Biases to add to the input tensor.

  • prog – Poplar program sequence to append the operation onto.

  • debugContext – Optional debug information.
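A short sketch tying createBiases() and addBias() together (the helper function and names are illustrative; activations is assumed to be the output of a convolution as above):

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <poplin/Convolution.hpp>

poplar::Tensor addConvBias(poplar::Graph &graph,
                           const poplar::Tensor &activations,
                           poplar::program::Sequence &prog) {
  // One bias per output channel, laid out to match the activations.
  poplar::Tensor biases = poplin::createBiases(graph, activations, "biases");
  // Add the biases to the activations tensor.
  poplin::addBias(graph, activations, biases, prog, "addBias");
  return biases;
}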

void reportPlanInfo(std::ostream &out, const poplar::Graph &graph, const ConvParams &params, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Report the convolution plan corresponding to the params and options provided.

Parameters
  • out – Output stream to report the plan to.

  • graph – The graph that the convolution is planned with.

  • params – The same parameters as used by the convolution().

  • options – Options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

PlanCosts reportPlanEstimatedCosts(const poplar::Graph &graph, const ConvParams &params, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Report the estimated cycles and memory costs of the convolution plan corresponding to the params and options provided.

Parameters
  • graph – The graph that the convolution is planned with.

  • params – The same parameters as used by the convolution().

  • options – Options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

Returns

Cycles and memory cost estimates for the planned convolution.
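For example (a sketch assuming the same graph, params, options and cache as in the earlier examples; the cycles and memory fields are those of PlanCosts, described below):

#include <iostream>

#include <poplar/Graph.hpp>
#include <poplin/Convolution.hpp>

void inspectPlan(const poplar::Graph &graph, const poplin::ConvParams &params,
                 const poplar::OptionFlags &options,
                 poplin::PlanningCache &cache) {
  // Human-readable description of the chosen plan.
  poplin::reportPlanInfo(std::cout, graph, params, options, &cache);
  // Aggregate cycle and memory estimates for the same plan.
  poplin::PlanCosts costs =
      poplin::reportPlanEstimatedCosts(graph, params, options, &cache);
  std::cout << "Estimated cycles: " << costs.cycles
            << ", estimated memory: " << costs.memory << "\n";
}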

void reportWeightUpdatePlanInfo(std::ostream &out, const poplar::Graph &graph, const ConvParams &fwdParams, const poplar::OptionFlags &fwdOptions = {}, PlanningCache *cache = nullptr)

Report the convolution plan corresponding to the weight update pass given the forward pass params and options.

Parameters
  • out – Output stream to report the plan to.

  • graph – The graph that the convolution is planned with.

  • fwdParams – Forward pass parameters as used by the convolution().

  • fwdOptions – Forward pass options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

poplar::Tensor fullyConnectedWeightTranspose(poplar::Graph &graph, poplar::Tensor weights, const ConvParams &params, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Arrange the weights (activations) so that they are suited for the backward pass in a fully connected layer.

Parameters
  • graph – The graph that the operation will be added to.

  • weights – The weights tensor to be rearranged.

  • params – Parameters of the convolution.

  • prog – Poplar program sequence to append the operation onto.

  • debugContext – Optional debug information.

  • options – Options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

Returns

A tensor with the weights suitably arranged.

void convolutionValidateOptions(const poplar::OptionFlags &options)

Provides an interface to validate the convolution options.

The presence of an invalid key or value will cause an exception to be thrown.

Parameters
  • options – Options controlling the implementation. See createWeights().

struct PlanCosts
#include <Convolution.hpp>

Structure for estimated costs returned by reportPlanEstimatedCosts()

Public Members

std::size_t cycles
std::size_t memory

Public Static Attributes

static constexpr size_t unknown = std::numeric_limits<std::size_t>::max()

class PlanningCache

Subclassed by poplin::matmul::PlanningCache

Public Functions

PlanningCache()
~PlanningCache()
std::size_t size() const

Returns the number of entries currently stored in the cache.

Public Members

std::unique_ptr<PlanningCacheImpl> impl
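
Sharing a single cache across related calls avoids re-planning identical convolutions. A minimal sketch (the helper function is illustrative; graph, params and options are assumed to be set up as in the earlier examples):

#include <poplar/Graph.hpp>
#include <poplin/Convolution.hpp>

std::size_t buildWithSharedCache(poplar::Graph &graph,
                                 const poplin::ConvParams &params,
                                 const poplar::OptionFlags &options) {
  poplin::PlanningCache cache;
  // Reusing the same cache lets the second call pick up the plan
  // produced for the first call instead of planning again.
  poplin::createWeights(graph, params, "weights", options, &cache);
  poplin::createInput(graph, params, "in", options, &cache);
  return cache.size(); // number of plans currently stored
}
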
namespace internal

Functions

std::ostream &operator<<(std::ostream &os, DetailedPlanCosts const &c)
std::istream &operator>>(std::istream &is, DetailedPlanCosts &c)
DetailedPlanCosts reportDetailedPlanEstimatedCosts(const poplar::Graph &graph, const ConvParams &params, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Like reportPlanEstimatedCosts(), but returns an itemised breakdown of the estimates based on what the planner did.
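A sketch of inspecting the breakdown (assuming the same graph, params, options and cache as in the earlier examples; note that this function lives in the internal namespace, so it is more likely to change between releases, and the fields printed are a selection of the DetailedPlanCosts members listed below):

#include <iostream>

#include <poplar/Graph.hpp>
#include <poplin/Convolution.hpp>

void inspectDetailedCosts(const poplar::Graph &graph,
                          const poplin::ConvParams &params,
                          const poplar::OptionFlags &options,
                          poplin::PlanningCache &cache) {
  auto costs = poplin::internal::reportDetailedPlanEstimatedCosts(
      graph, params, options, &cache);
  // Each itemised entry is a PlanCosts with cycles and memory estimates.
  std::cout << "compute cycles:  " << costs.compute.cycles << "\n";
  std::cout << "exchange cycles: " << costs.exchange.cycles << "\n";
  std::cout << "total cycles:    " << costs.total.cycles << "\n";
}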

struct DetailedPlanCosts
#include <Convolution.hpp>

Structure for detailed estimated costs returned by reportDetailedPlanEstimatedCosts().

Public Functions

template<typename Function>
inline void apply(Function fn, bool includeTotal = true, bool serialSplitOnly = false) const
template<typename Function>
inline void apply(Function fn, bool includeTotal = true, bool serialSplitOnly = false)

Public Members

std::size_t parallelSplit = 1
std::size_t serialSplit = 1
PlanCosts broadcast = {}
PlanCosts rearrangement = {}
PlanCosts dynamicSlice = {}
PlanCosts transform = {}
PlanCosts exchange = {}
PlanCosts tileLevelTransform = {}
PlanCosts inputsCast = {}
PlanCosts compute = {}
PlanCosts reduction = {}
PlanCosts dynamicUpdate = {}
PlanCosts addInPlace = {}
PlanCosts outputCast = {}
PlanCosts total = {}