Convolution

#include <poplin/Convolution.hpp>

Functions and data types to support performing convolutions.

namespace poplin

Linear algebra functions.

Decomposition of a matrix into a lower triangular matrix L and an upper triangular matrix U.

Typedefs

using ConvPlanParams = std::tuple<const poplar::Target*, const ConvParams, const poplar::OptionFlags*>

Functions

uint64_t getFwdFlops(const ConvParams &params)

Calculate the minimum number of floating point operations required to perform the forward pass convolution given a set of params.

uint64_t getBwdFlops(const ConvParams &params)

Calculate the minimum number of floating point operations required to perform the backward pass convolution given a set of params.

uint64_t getWuFlops(const ConvParams &params)

Calculate the minimum number of floating point operations required to perform the weight update pass convolution given a set of params.

double getFwdPerfectCycleCount(const poplar::Graph &graph, const ConvParams &params)

Calculate the number of cycles to perform the forward pass assuming maximal utilisation of the target hardware, performing the minimum number of floating point operations.

This takes into account the number of tiles available and vectorization support on the target.

This is an optimistic number useful for estimating efficiency: cycleCount = getFwdFlops() / maximumHardwareVectorization.

Parameters
  • graph – Provides target the convolution will run on.

  • params – Description of convolution.

Returns

Estimated number of cycles to perform the forward pass.
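As an illustration of how the FLOP and perfect-cycle estimators fit together, the following minimal sketch builds a convolution description and queries the forward-pass estimates. It assumes the ConvParams constructor taking (dataType, batchSize, inputFieldShape, kernelShape, inputChannels, outputChannels, numConvGroups); the sizes and variable names are illustrative only.

#include <iostream>

#include <poplar/Graph.hpp>
#include <poplar/IPUModel.hpp>
#include <poplin/Convolution.hpp>

int main() {
  // Simulated target so the sketch does not require IPU hardware.
  poplar::IPUModel ipuModel;
  poplar::Device device = ipuModel.createDevice();
  poplar::Graph graph(device.getTarget());

  // Illustrative convolution: batch 1, 8x8 input, 3x3 kernel,
  // 16 input channels, 32 output channels, 1 conv group.
  poplin::ConvParams params(poplar::FLOAT, 1, {8, 8}, {3, 3}, 16, 32, 1);

  std::cout << "Forward FLOPs:          " << poplin::getFwdFlops(params) << "\n";
  std::cout << "Perfect forward cycles: "
            << poplin::getFwdPerfectCycleCount(graph, params) << "\n";
  return 0;
}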

double getBwdPerfectCycleCount(const poplar::Graph &graph, const ConvParams &params)

Calculate the number of cycles to perform the backward pass assuming maximal utilisation of the target hardware, performing the minimum number of floating point operations.

This takes into account the number of tiles available and vectorization support on the target.

This is an optimistic number useful for estimating efficiency: cycleCount = getBwdFlops() / maximumHardwareVectorization.

Parameters
  • graph – Provides target the convolution will run on.

  • params – Description of convolution.

Returns

Estimated number of cycles to perform the backward pass.

double getWuPerfectCycleCount(const poplar::Graph &graph, const ConvParams &params)

Calculate the number of cycles to perform the weight update pass assuming maximal utilisation of the target hardware, performing the minimum number of floating point operations.

This takes into account the number of tiles available and vectorization support on the target.

This is an optimistic number useful for estimating efficiency: cycleCount = getWuFlops() / maximumHardwareVectorization.

Parameters
  • graph – Provides target the convolution will run on.

  • params – Description of convolution.

Returns

Estimated number of cycles to perform the weight update pass.

poplar::Tensor createWeights(poplar::Graph &graph, const ConvParams &params, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Create a weight tensor suitable for use with convolution().

The shape of the tensor will be [convGroups x outChansPerConvGroup x inChansPerConvGroup x H x W].

Convolution options

  • availableMemoryProportion Decimal between 0 and 1 (inclusive) [=0.6]

    The amount of memory allocated for temporary data whilst the operation is executing (for example, for intermediate calculated values or temporary values passed between tiles on the IPU). The value is specified as a proportion of available memory on the IPU. So, for example, a value of 0.1 will constrain the library to use 10% of the total memory for temporary data.

    The library will try to constrain the use of temporary memory to below this value. An operation that has more temporary memory available to use will run in the same or fewer cycles.

    For a specific operation, the minimum amount of temporary memory the library is able to use may be more than the amount specified by this option. In this case, if POPLIBS_LOG_LEVEL=WARN or POPLIBS_POPLIN_LOG_LEVEL=WARN, a warning message will be output, and the amount specified by this option is ignored.

    Note: if this value is set to less than 5% of memory (that is, a value less than 0.05), then the library will often need to create a large amount of code and data structures to keep the temporary memory usage low, and this can have a permanent memory overhead larger than the saving in temporary memory. You should take great care when setting a value this low.

    See also

    The Optimising Temporary Memory Usage for Convolutions and Matmuls on the IPU technical note, for practical examples of using availableMemoryProportion.

  • partialsType (half, float) [=float]

    Data type used for intermediate calculations. If the type specified is smaller than the output type then the option is ignored and the output type is used instead.

  • pass (NONE, INFERENCE_FWD, TRAINING_FWD, TRAINING_BWD, TRAINING_WU, FC_INFERENCE_FWD, FC_TRAINING_FWD, FC_TRAINING_BWD, FC_TRAINING_WU) [=NONE]

    Optimize the plan for the specified type of pass. Note the abbreviations: FWD (forward), BWD (backward), WU (weight-update), FC (fully-connected).

  • use128BitConvUnitLoad (true, false) [=false]

    If true, convolution weights are loaded 128-bits at a time. Otherwise, they are loaded 64-bits at a time. Not all codelets support 128-bit loads. This option affects memory usage and cycle count.

  • enableMultiStageReduce (true, false) [=true]

    If true, perform the reduction following the convolution in multiple stages if it would significantly reduce code size. This comes at the cost of increasing the number of cycles.

  • enableFastReduce (true, false) [=false]

    If true, use a faster reduction vertex if the data types and widths allow it. This comes at the cost of further constraints on memory allocation.

  • enableConvDithering (true, false) [=false]

    If true, then convolutions with different parameters will be laid out from different tiles in an effort to improve tile balance in models.

Parameters
  • graph – The graph that the tensor will be added to.

  • params – The same parameters as used by convolution().

  • debugContext – Debugging name for the tensor.

  • options – Options controlling the implementation.

  • cache – Optional pointer to planning cache to use.

Returns

The weights tensor suitable for use with convolution().
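As a minimal sketch of combining the options above with createWeights() (the option values, the "weights" debug name, and the helper function are illustrative, not part of the library):

#include <poplar/Graph.hpp>
#include <poplar/OptionFlags.hpp>
#include <poplin/Convolution.hpp>

poplar::Tensor makeConvWeights(poplar::Graph &graph,
                               const poplin::ConvParams &params,
                               poplin::PlanningCache &cache) {
  // Illustrative choices: cap temporary memory at 40% of memory
  // and optimise the plan for the training forward pass.
  poplar::OptionFlags options{{"availableMemoryProportion", "0.4"},
                              {"pass", "TRAINING_FWD"}};
  return poplin::createWeights(graph, params, "weights", options, &cache);
}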

poplar::Tensor createBiases(poplar::Graph &graph, const poplar::Tensor &activations, const poplar::DebugContext &debugContext = {"biases"})

Create a bias tensor suitable for input to the addBias() function.

The tensor will have the shape [outChans].

Parameters
  • graph – The graph that the tensor will be added to.

  • activations – The activation tensor which is output from the convolution.

  • debugContext – Debugging name for the tensor.

Returns

The tensor of biases.

poplar::Tensor createBiases(poplar::Graph &graph, const poplar::Tensor &activations, const ConvParams &params, const poplar::DebugContext &debugContext = {"biases"}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Create a bias tensor suitable for input to the addBias() function with allocation consistent with plan parameters.

The tensor will have the shape [outChans].

Parameters
  • graph – The graph that the tensor will be added to.

  • activations – The activation tensor which is output from the convolution.

  • params – Parameters as passed to the target convolution.

  • debugContext – Debugging name for the tensor.

  • options – Options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

Returns

The tensor of biases.

poplar::Tensor createInput(poplar::Graph &graph, const ConvParams &params, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Create an input tensor for a convolution.

Use this when you need to create an input data tensor for a convolution. The same set of parameters that will be passed to convolution() should also be passed to createInput().

The returned tensor has the shape [B x inChans x H x W].

Parameters
  • graph – The tensor will be added to this graph.

  • params – Parameters as passed to the target convolution.

  • debugContext – Debugging name for the tensor.

  • options – Options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

Returns

The allocated input tensor.

poplar::Tensor createConvOutput(poplar::Graph &graph, const ConvParams &params, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Create an output tensor for a convolution.

Use this when you need to create an output data tensor for a convolution. The same set of parameters that will be passed to convolution() should also be passed to createConvOutput().

The returned tensor has the shape [B x outChans x H x W].

Parameters
  • graph – The tensor will be added to this graph.

  • params – Parameters as passed to the target convolution.

  • debugContext – Debugging name for the tensor.

  • options – Options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

Returns

The allocated output tensor.

poplar::Tensor convolution(poplar::Graph &graph, const poplar::Tensor &in, const poplar::Tensor &weights, const ConvParams &params, bool transposeAndFlipWeights, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Convolve an input with a set of weights.

The input tensor is in the form [B x inChans x H x W], and can be allocated using createInput(). The weights tensor is in the form [convGroups x outChansPerConvGroup x inChansPerConvGroup x H x W], and can be allocated using createWeights().

The returned tensor has the shape [B x outChans x H x W].

Padding and striding are specified in the ConvParams structure.

Parameters
  • graph – The graph that the operation will be added to.

  • in – Input data tensor.

  • weights – Weights tensor.

  • params – Parameters for the form of the convolution.

  • transposeAndFlipWeights – For the weight update pass.

  • prog – Poplar program sequence to append the operation onto.

  • debugContext – Optional debug information.

  • options – Options that control the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

Returns

The convolved output tensor.
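A minimal sketch of the intended call pattern follows; the helper function and tensor/debug names are illustrative, and params, options and cache are assumed to be set up as in the earlier sketches. The operands are allocated with the matching create functions so that their layouts suit the planned convolution.

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <poplin/Convolution.hpp>

poplar::Tensor forwardConv(poplar::Graph &graph,
                           const poplin::ConvParams &params,
                           const poplar::OptionFlags &options,
                           poplin::PlanningCache &cache,
                           poplar::program::Sequence &prog) {
  // Allocate operands with layouts chosen by the convolution planner.
  poplar::Tensor in = poplin::createInput(graph, params, "in", options, &cache);
  poplar::Tensor weights =
      poplin::createWeights(graph, params, "weights", options, &cache);
  // transposeAndFlipWeights is false for the forward pass.
  return poplin::convolution(graph, in, weights, params, false, prog, "conv",
                             options, &cache);
}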

void convolutionWithOutput(poplar::Graph &graph, const poplar::Tensor &in, const poplar::Tensor &weights, const poplar::Tensor &out, const ConvParams &params, bool transposeAndFlipWeights, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Convolve an input with a set of weights into a pre-allocated output tensor.

The output tensor is in the form [B x outChans x H x W], and can be allocated using createConvOutput(). The weights tensor is in the form [convGroups x outChansPerConvGroup x inChansPerConvGroup x H x W], and can be allocated using createWeights(). The input tensor is in the form [B x inChans x H x W], and can be allocated using createInput().

Padding and striding are specified in the ConvParams structure.

Parameters
  • graph – The graph that the operation will be added to.

  • in – Input data tensor.

  • weights – Weights tensor.

  • out – Pre-allocated output tensor.

  • params – Parameters for the form of the convolution.

  • transposeAndFlipWeights – For the weight update pass.

  • prog – Poplar program sequence to append the operation onto.

  • debugContext – Optional debug information.

  • options – Options that control the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.
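A corresponding sketch for the pre-allocated-output variant (same assumptions and illustrative names as in the convolution() sketch above):

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <poplin/Convolution.hpp>

void forwardConvIntoOutput(poplar::Graph &graph, const poplar::Tensor &in,
                           const poplar::Tensor &weights,
                           const poplin::ConvParams &params,
                           const poplar::OptionFlags &options,
                           poplin::PlanningCache &cache,
                           poplar::program::Sequence &prog) {
  // Allocate the destination with a layout chosen by the planner.
  poplar::Tensor out =
      poplin::createConvOutput(graph, params, "out", options, &cache);
  poplin::convolutionWithOutput(graph, in, weights, out, params, false, prog,
                                "conv", options, &cache);
}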

void preplanConvolutions(const std::set<ConvPlanParams> &convs, PlanningCache &cache)

Deprecated:

Use preplan() instead.

Plan the specified convolutions.

All entries must have matching machine parameters.

Parameters
  • convs – A set of tuples of:

    • conv-specific target for tile / IPU sizing

    • convolution parameters

    • implementation options. See createWeights().

  • cache – The planning cache to update.

void preplanConvolutions(poplar::Graph &graph, const std::set<ConvPlanParams> &convs, PlanningCache &cache)

Deprecated:

Use preplan() instead.

Plan the specified convolutions.

All entries must have matching machine parameters.

Parameters
  • graph – The graph the convolutions will belong to.

  • convs – A set of tuples of:

    • conv-specific target for tile / IPU sizing

    • convolution parameters

    • implementation options. See createWeights().

  • cache – The planning cache to update.

void weightsTransposeChansFlipXY(poplar::Graph &graph, const poplar::Tensor &weightsIn, const poplar::Tensor &weightsOut, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {})

Copy the weights in weightsIn into weightsOut such that each element of the kernel is transposed with respect to the input and output channels, and each spatial dimension of the kernel is flipped.

See the transposeAndFlipWeights parameter in convolution().

Parameters
  • graph – The graph that the operation will be added to.

  • weightsIn – The input weights tensor.

  • weightsOut – The output weights tensor.

  • prog – Poplar program sequence to append the operation onto.

  • debugContext – Optional debug information.

  • options – Options controlling the implementation. See createWeights().
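One common use is sketched below: the forward-pass weights are rewritten into a separately allocated weights tensor laid out for the backward-pass convolution. The helper function and names are illustrative, and bwdParams is assumed to describe the backward-pass convolution (how it is derived from the forward-pass parameters is outside this sketch).

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <poplin/Convolution.hpp>

void prepareBwdWeights(poplar::Graph &graph, const poplar::Tensor &fwdWeights,
                       const poplin::ConvParams &bwdParams,
                       const poplar::OptionFlags &bwdOptions,
                       poplin::PlanningCache &cache,
                       poplar::program::Sequence &prog) {
  // Destination tensor laid out for the backward-pass convolution.
  poplar::Tensor bwdWeights =
      poplin::createWeights(graph, bwdParams, "bwdWeights", bwdOptions, &cache);
  // Transpose the input/output channels and flip the spatial dimensions.
  poplin::weightsTransposeChansFlipXY(graph, fwdWeights, bwdWeights, prog,
                                      "weightsTranspose", bwdOptions);
}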

poplar::Tensor calculateWeightDeltas(poplar::Graph &graph, const poplar::Tensor &zDeltas, const poplar::Tensor &activations, const ConvParams &params, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Append an operation to a poplar::Program to generate the tensor of weight deltas.

Parameters
  • graph – The tensor will be added to this graph.

  • zDeltas – Tensor containing the gradients with respect to the output of the convolution.

  • activations – Tensor containing the inputs to the convolution in the forward pass.

  • params – Parameters of the convolution.

  • prog – Poplar program sequence to append the operation onto.

  • debugContext – Optional debug information.

  • options – Options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

Returns

A tensor containing the weight deltas, that is, the gradients with respect to the weights of the convolution. These are populated when the operation runs.

void convolutionWeightUpdate(poplar::Graph &graph, const poplar::Tensor &zDeltas, const poplar::Tensor &weights, const poplar::Tensor &activations, ConvParams params, const poplar::Tensor &scale, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Append operations to a poplar::Program to generate and apply the weight update.

Parameters
  • graph – The graph that the operation will be added to.

  • zDeltas – Tensor containing the gradients with respect to the output of the convolution.

  • weights – Weights tensor.

  • activations – Tensor containing the inputs to the convolution in the forward pass.

  • params – Parameters of the convolution.

  • scale – Scale to apply to the zDeltas.

  • prog – Poplar program sequence to append the operations onto.

  • debugContext – Optional debug information.

  • options – Options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

void convolutionWeightUpdate(poplar::Graph &graph, const poplar::Tensor &zDeltas, const poplar::Tensor &weights, const poplar::Tensor &activations, ConvParams params, float scale, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Append operations to a poplar::Program to generate and apply the weight update.

Parameters
  • graph – The graph that the operation will be added to.

  • zDeltas – Tensor containing the gradients with respect to the output of the convolution.

  • weights – Weights tensor.

  • activations – Tensor containing the inputs to the convolution in the forward pass.

  • params – Parameters of the convolution.

  • scale – Scale to apply to the zDeltas.

  • prog – Poplar program sequence to append the operations onto.

  • debugContext – Optional debug information.

  • options – Options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.
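A minimal sketch of applying a scaled weight update with this overload follows. The helper function and names are illustrative; params and options are assumed to describe the convolution as in the earlier sketches, and the scale value (typically derived from the learning rate) is supplied by the caller.

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <poplin/Convolution.hpp>

void applyWeightUpdate(poplar::Graph &graph, const poplar::Tensor &zDeltas,
                       const poplar::Tensor &weights,
                       const poplar::Tensor &activations,
                       poplin::ConvParams params, float scale,
                       const poplar::OptionFlags &options,
                       poplin::PlanningCache &cache,
                       poplar::program::Sequence &prog) {
  // Generate weight deltas from zDeltas and the forward-pass activations,
  // scale them by `scale`, and apply them to the weights tensor.
  poplin::convolutionWeightUpdate(graph, zDeltas, weights, activations, params,
                                  scale, prog, "weightUpdate", options, &cache);
}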

void convolutionBiasUpdate(poplar::Graph &graph, const poplar::Tensor &zDeltas, const poplar::Tensor &biases, const poplar::Tensor &scale, const poplar::OptionFlags &options, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Add a program to update the biases tensor with the gradients derived from the zDeltas tensor.

Parameters
  • graph – The graph that the operation will be added to.

  • zDeltas – Tensor containing the gradients with respect to the output of the convolution.

  • biases – Biases tensor to update.

  • scale – Scale to apply to the zDeltas tensor.

  • options – Options controlling the implementation. See createWeights().

  • prog – Poplar program sequence to append the operation onto.

  • debugContext – Optional debug information.

void convolutionBiasUpdate(poplar::Graph &graph, const poplar::Tensor &zDeltas, const poplar::Tensor &biases, float scale, const poplar::OptionFlags &options, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Add a program to update the biases tensor with the gradients derived from the zDeltas tensor.

Parameters
  • graph – The graph that the operation will be added to.

  • zDeltas – Tensor containing the gradients with respect to the output of the convolution.

  • biases – Biases tensor to update.

  • scale – Scale to apply to the zDeltas tensor.

  • options – Options controlling the implementation. See createWeights().

  • prog – Poplar program sequence to append the operation onto.

  • debugContext – Optional debug information.

void addBias(poplar::Graph &graph, const poplar::Tensor &in, const poplar::Tensor &biases, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {})

Add a program to prog that adds the biases to the activations tensor.

Parameters
  • graph – The graph that the operation will be added to.

  • in – Tensor containing the values to which the biases will be added.

  • biases – Biases to add to the input tensor.

  • prog – Poplar program sequence to append the operation onto.

  • debugContext – Optional debug information.
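A short sketch tying createBiases() and addBias() together (the helper function and names are illustrative; activations is assumed to be the output of a convolution as above):

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <poplin/Convolution.hpp>

poplar::Tensor addConvBias(poplar::Graph &graph,
                           const poplar::Tensor &activations,
                           poplar::program::Sequence &prog) {
  // One bias per output channel, laid out to match the activations.
  poplar::Tensor biases = poplin::createBiases(graph, activations, "biases");
  // Add the biases to the activations tensor.
  poplin::addBias(graph, activations, biases, prog, "addBias");
  return biases;
}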

void reportPlanInfo(std::ostream &out, const poplar::Graph &graph, const ConvParams &params, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Report the convolution plan corresponding to the params and options provided.

Parameters
  • out – Output stream to report the plan to.

  • graph – The graph that the convolution is planned with.

  • params – The same parameters as used by the convolution().

  • options – Options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

PlanCosts reportPlanEstimatedCosts(const poplar::Graph &graph, const ConvParams &params, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Report the estimated cycles and memory costs of the convolution plan corresponding to the params and options provided.

Parameters
  • graph – The graph that the convolution is planned with.

  • params – The same parameters as used by the convolution().

  • options – Options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

Returns

Cycles and memory cost estimates for the planned convolution.
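For example (a sketch assuming the same graph, params, options and cache as in the earlier examples; the cycles and memory fields are those of PlanCosts, described below):

#include <iostream>

#include <poplar/Graph.hpp>
#include <poplin/Convolution.hpp>

void inspectPlan(const poplar::Graph &graph, const poplin::ConvParams &params,
                 const poplar::OptionFlags &options,
                 poplin::PlanningCache &cache) {
  // Human-readable description of the chosen plan.
  poplin::reportPlanInfo(std::cout, graph, params, options, &cache);
  // Aggregate cycle and memory estimates for the same plan.
  poplin::PlanCosts costs =
      poplin::reportPlanEstimatedCosts(graph, params, options, &cache);
  std::cout << "Estimated cycles: " << costs.cycles
            << ", estimated memory: " << costs.memory << "\n";
}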

void reportWeightUpdatePlanInfo(std::ostream &out, const poplar::Graph &graph, const ConvParams &fwdParams, const poplar::OptionFlags &fwdOptions = {}, PlanningCache *cache = nullptr)

Report the convolution plan corresponding to the weight update pass given the forward pass params and options.

Parameters
  • out – Output stream to report the plan to.

  • graph – The graph that the convolution is planned with.

  • fwdParams – Forward pass parameters as used by the convolution().

  • fwdOptions – Forward pass options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

poplar::Tensor fullyConnectedWeightTranspose(poplar::Graph &graph, poplar::Tensor weights, const ConvParams &params, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Arrange the weights (activations) so that they are suited for the backward pass in a fully connected layer.

Parameters
  • graph – The graph that the operation will be added to.

  • weights – The weights tensor to be rearranged.

  • params – Parameters of the convolution.

  • prog – Poplar program sequence to append the operation onto.

  • debugContext – Optional debug information.

  • options – Options controlling the implementation. See createWeights().

  • cache – Optional pointer to planning cache to use.

Returns

A tensor with the weights suitably arranged.

void convolutionValidateOptions(const poplar::OptionFlags &options)

Provides an interface to validate the convolution options.

The presence of an invalid key or value will cause an exception to be thrown.

Parameters
  • options – Options controlling the implementation. See createWeights().

struct PlanCosts
#include <Convolution.hpp>

Structure for estimated costs returned by reportPlanEstimatedCosts()

Public Members

std::size_t cycles
std::size_t memory

Public Static Attributes

static constexpr size_t unknown = std::numeric_limits<std::size_t>::max()

class PlanningCache

Subclassed by poplin::matmul::PlanningCache

Public Functions

PlanningCache()
~PlanningCache()
std::size_t size() const

Returns the number of entries currently stored in the cache.

Public Members

std::unique_ptr<PlanningCacheImpl> impl
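
Sharing a single cache across related calls avoids re-planning identical convolutions. A minimal sketch (the helper function is illustrative; graph, params and options are assumed to be set up as in the earlier examples):

#include <poplar/Graph.hpp>
#include <poplin/Convolution.hpp>

std::size_t buildWithSharedCache(poplar::Graph &graph,
                                 const poplin::ConvParams &params,
                                 const poplar::OptionFlags &options) {
  poplin::PlanningCache cache;
  // Reusing the same cache lets the second call pick up the plan
  // produced for the first call instead of planning again.
  poplin::createWeights(graph, params, "weights", options, &cache);
  poplin::createInput(graph, params, "in", options, &cache);
  return cache.size(); // number of plans currently stored
}
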
namespace internal

Functions

std::ostream &operator<<(std::ostream &os, DetailedPlanCosts const &c)
std::istream &operator>>(std::istream &is, DetailedPlanCosts &c)
DetailedPlanCosts reportDetailedPlanEstimatedCosts(const poplar::Graph &graph, const ConvParams &params, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Like reportPlanEstimatedCosts(), but returns an itemised breakdown of the estimates based on what the planner did.
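A sketch of inspecting the breakdown (assuming the same graph, params, options and cache as in the earlier examples; note that this function lives in the internal namespace, so it is more likely to change between releases, and the fields printed are a selection of the DetailedPlanCosts members listed below):

#include <iostream>

#include <poplar/Graph.hpp>
#include <poplin/Convolution.hpp>

void inspectDetailedCosts(const poplar::Graph &graph,
                          const poplin::ConvParams &params,
                          const poplar::OptionFlags &options,
                          poplin::PlanningCache &cache) {
  auto costs = poplin::internal::reportDetailedPlanEstimatedCosts(
      graph, params, options, &cache);
  // Each itemised entry is a PlanCosts with cycles and memory estimates.
  std::cout << "compute cycles:  " << costs.compute.cycles << "\n";
  std::cout << "exchange cycles: " << costs.exchange.cycles << "\n";
  std::cout << "total cycles:    " << costs.total.cycles << "\n";
}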

struct DetailedPlanCosts
#include <Convolution.hpp>

Structure for detailed estimated costs returned by reportDetailedPlanEstimatedCosts().

Public Functions

template<typename Function>
inline void apply(Function fn, bool includeTotal = true, bool serialSplitOnly = false) const
template<typename Function>
inline void apply(Function fn, bool includeTotal = true, bool serialSplitOnly = false)

Public Members

std::size_t parallelSplit = 1
std::size_t serialSplit = 1
PlanCosts broadcast = {}
PlanCosts rearrangement = {}
PlanCosts dynamicSlice = {}
PlanCosts transform = {}
PlanCosts exchange = {}
PlanCosts tileLevelTransform = {}
PlanCosts inputsCast = {}
PlanCosts compute = {}
PlanCosts reduction = {}
PlanCosts dynamicUpdate = {}
PlanCosts addInPlace = {}
PlanCosts outputCast = {}
PlanCosts total = {}