MatMul

#include <poplin/MatMul.hpp>

Functions and data types for performing matrix multiplies on the IPU.

namespace poplin

Linear algebra functions.

Typedefs

using MatMulPlanParams = std::tuple<const poplar::Target*, const MatMulParams, const poplar::OptionFlags*>

A tuple containing the required parameters to preplan a matmul:

  • matmul-specific target for tile / IPU sizing

  • matmul parameters

  • implementation options (see matMul())

All entries must have matching machine parameters.
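
As a rough illustration of how such tuples behave in a std::set (the container type taken by matMulGetConvPlanParams() and preplanMatMuls()), the sketch below uses hypothetical stand-in types, TargetSketch, OptionsSketch and ParamsSketch, in place of poplar::Target, poplar::OptionFlags and MatMulParams. It is not the poplin implementation. It only shows that identical tuples collapse into a single entry and that, because the tuple stores pointers, the pointed-to objects must outlive the set.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-ins for poplar::Target and poplar::OptionFlags.
struct TargetSketch {
  unsigned numTiles;
};
struct OptionsSketch {
  std::string partialsType;
};

// Hypothetical stand-in for MatMulParams; std::set needs the ordering.
struct ParamsSketch {
  std::string inputType;
  std::string outputType;
  std::vector<std::size_t> aShape;
  std::vector<std::size_t> bShape;

  friend bool operator<(const ParamsSketch &a, const ParamsSketch &b) {
    return std::tie(a.inputType, a.outputType, a.aShape, a.bShape) <
           std::tie(b.inputType, b.outputType, b.aShape, b.bShape);
  }
};

// Same layout as MatMulPlanParams: (target pointer, params, options pointer).
using PlanParamsSketch =
    std::tuple<const TargetSketch *, const ParamsSketch, const OptionsSketch *>;

// Build the set of distinct plan requests; std::set drops duplicates.
std::set<PlanParamsSketch>
uniquePlanRequests(const TargetSketch &target, const OptionsSketch &options,
                   const std::vector<ParamsSketch> &requests) {
  std::set<PlanParamsSketch> unique;
  for (const ParamsSketch &params : requests)
    unique.emplace(&target, params, &options);
  return unique;
}
```

Passing the same parameters twice alongside one different set of parameters yields a set with two entries.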

using MatMulToConvOptions = std::unordered_map<const poplar::OptionFlags*, poplar::OptionFlags>

Mapping of pointers to matrix multiplication option flags to the corresponding convolution option flags.

Functions

poplar::Tensor matMul(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::Type &outputType, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Multiply two matrices.

Calculates C = A * B where A and B are matrices.

Matrix multiply options

  • availableMemoryProportion Decimal between 0 and 1 (inclusive) [=0.6]

    See createWeights().

  • fullyConnectedPass (NONE, INFERENCE_FWD, TRAINING_FWD, TRAINING_BWD, TRAINING_WU) [=NONE]

    Optimize the plan for the specified type of pass. Note the abbreviations: FWD (forward), BWD (backward), WU (weight-update).

  • inputRHSIsPreArranged (true, false) [=false]

    Indicates to matMul functions whether the input data has already been re-arranged (using preArrangeMatMulInputRHS()). This allows data to be re-arranged once then used many times.

  • use128BitConvUnitLoad (true, false) [=false]

    If true, weights are loaded into the convolution unit 128-bits at a time. Otherwise, they are loaded 64-bits at a time. Not all codelets support 128-bit loads. This option affects memory usage and cycle count.

  • enableMultiStageReduce (true, false) [=true]

    If true, perform the reduction following the matrix multiplication in multiple stages if it would significantly reduce code size. This comes at the cost of increasing the number of cycles.

  • enableFastReduce (true, false) [=false]

    If true, use a faster reduction vertex if the data types and widths allow it. This comes at the cost of further constraints on memory allocation.

  • remapOutputTensor (true, false) [=true]

    If true, the output of the convolution is remapped if the output is detected to have a poor layout.

  • partialsType (half, float) [=float]

    See createWeights().

Parameters
  • graph – The Poplar graph.

  • A – The left argument to the multiplication. This 2D tensor must be already mapped to tiles.

  • B – The right argument to the multiplication. This 2D tensor must be already mapped to tiles.

  • prog – A reference to a program sequence which will be appended with the code to perform the multiplication.

  • outputType – Optional via overloaded function. Element type of returned tensor. The default is A.elementType() if omitted.

  • debugContext – Optional debug information.

  • options – The structure describing options on how the multiplication should be implemented.

  • cache – Optional pointer to a planning cache to use.

Returns

The tensor holding the result of the multiplication. This tensor will be created, added to the graph and mapped to tiles.

Matrix multiply with explicitly defined output type.
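
To make the shape rule behind C = A * B concrete, here is a minimal host-side reference in plain C++. It does not use Poplar at all: HostMatrix and referenceMatMul are hypothetical names introduced for illustration, and the code only models the arithmetic, not tile mapping or planning. A is [m, k], B is [k, n], and the result is [m, n].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side helper, not part of poplin: a dense row-major matrix.
struct HostMatrix {
  std::size_t rows;
  std::size_t cols;
  std::vector<float> data; // rows * cols elements, row-major
};

// Reference C = A * B. A is [m, k], B is [k, n], the result is [m, n].
HostMatrix referenceMatMul(const HostMatrix &a, const HostMatrix &b) {
  assert(a.cols == b.rows && "inner dimensions must match");
  HostMatrix c{a.rows, b.cols, std::vector<float>(a.rows * b.cols, 0.0f)};
  for (std::size_t i = 0; i < a.rows; ++i)
    for (std::size_t p = 0; p < a.cols; ++p)
      for (std::size_t j = 0; j < b.cols; ++j)
        c.data[i * c.cols + j] +=
            a.data[i * a.cols + p] * b.data[p * b.cols + j];
  return c;
}
```

A reference like this can be handy for checking results read back from the device against host-computed values.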

poplar::Tensor matMul(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Matrix multiply where output type is the same as input A.

void matMulWithOutput(poplar::Graph &graph, const poplar::Tensor &A_, const poplar::Tensor &B_, poplar::Tensor &out, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options_ = {}, PlanningCache *cache = nullptr)

Matrix multiply with explicitly defined output.

void matMulReportPlan(std::ostream &out, const poplar::Graph &graph, const poplar::Type &inputType, const poplar::Type &outputType, const std::vector<std::size_t> &aShape, const std::vector<std::size_t> &bShape, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Report the convolution plan corresponding to the parameters and options provided.

Parameters
  • out – Stream to write report to.

  • graph – The Poplar graph.

  • inputType – Element type of input tensors.

  • outputType – Element type of output tensor.

  • aShape – Shape of input tensor A.

  • bShape – Shape of input tensor B.

  • options – The structure describing options on how the multiplication should be implemented.

  • cache – Optional pointer to a planning cache to use.

poplar::Tensor matMulGrouped(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::Type &outputType, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Multiply two grouped matrices.

Calculates C[g] = A[g] * B[g] where A[g] and B[g] are matrices for each element in the group, and g is an element of the set {0, 1, …, G-1}.

The multiplication is done for every element in the group. The first dimension of the matrices is the group dimension with value equal to G.

Parameters
  • graph – The Poplar graph.

  • A – The left argument to the grouped multiplication. This 3D tensor must be already mapped to tiles.

  • B – The right argument to the grouped multiplication. This 3D tensor must be already mapped to tiles.

  • prog – A reference to a program sequence which will be appended with the code to perform the multiplication.

  • outputType – Data type to be used for the returned tensor.

  • debugContext – Optional debug information.

  • options – The structure describing options on how the grouped multiplication should be implemented. See matMul().

  • cache – Optional pointer to a planning cache to use.

Returns

The tensor holding the result of the grouped multiplication. This tensor will be created, added to the graph and mapped to tiles.
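
The grouped form can be modelled on the host in the same way. The sketch below is plain C++ with hypothetical names (HostGroupedMatrix, referenceMatMulGrouped) and assumes contiguous storage with the group dimension outermost. It computes C[g] = A[g] * B[g] independently for each group.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side helper, not part of poplin: a [groups, rows, cols]
// tensor stored contiguously with the group dimension outermost.
struct HostGroupedMatrix {
  std::size_t groups;
  std::size_t rows;
  std::size_t cols;
  std::vector<float> data;

  float &at(std::size_t g, std::size_t r, std::size_t c) {
    return data[(g * rows + r) * cols + c];
  }
  float at(std::size_t g, std::size_t r, std::size_t c) const {
    return data[(g * rows + r) * cols + c];
  }
};

// Reference C[g] = A[g] * B[g] for g in {0, 1, ..., G-1}.
// A is [G, m, k], B is [G, k, n], the result is [G, m, n].
HostGroupedMatrix referenceMatMulGrouped(const HostGroupedMatrix &a,
                                         const HostGroupedMatrix &b) {
  assert(a.groups == b.groups && "group counts must match");
  assert(a.cols == b.rows && "inner dimensions must match");
  HostGroupedMatrix c{a.groups, a.rows, b.cols,
                      std::vector<float>(a.groups * a.rows * b.cols, 0.0f)};
  for (std::size_t g = 0; g < a.groups; ++g)
    for (std::size_t i = 0; i < a.rows; ++i)
      for (std::size_t p = 0; p < a.cols; ++p)
        for (std::size_t j = 0; j < b.cols; ++j)
          c.at(g, i, j) += a.at(g, i, p) * b.at(g, p, j);
  return c;
}
```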

void matMulGroupedWithOutput(poplar::Graph &graph, const poplar::Tensor &A, const poplar::Tensor &B, poplar::Tensor &out, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options_ = {}, PlanningCache *cache = nullptr)

Grouped matmul with explicit output argument.

void matMulGroupedReportPlan(std::ostream &out, const poplar::Graph &graph, const poplar::Type &inputType, const poplar::Type &outputType, const std::vector<std::size_t> &aShape, const std::vector<std::size_t> &bShape, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Report the convolution plan corresponding to the parameters and options provided.

Parameters
  • out – Stream to write report to.

  • graph – The Poplar graph.

  • inputType – Element type of input tensors.

  • outputType – Element type of output tensor.

  • aShape – Shape of input tensor A.

  • bShape – Shape of input tensor B.

  • options – The structure describing options on how the multiplication should be implemented.

  • cache – Optional pointer to a planning cache to use.

void matMulAcc(poplar::Graph &graph, const poplar::Tensor &C, float k, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Multiply two matrices and add to a third (with a scaling factor).

Calculates C += k * A * B where A, B are matrices and k is a constant scalar.

Parameters
  • graph – The Poplar graph.

  • C – The matrix to add to. This 2D tensor must be already mapped to tiles.

  • k – The constant, or a single-element tensor, by which the result of the multiplication is scaled. If k is a tensor, it must be of the same type as A.

  • A – The left argument to the multiplication. This 2D tensor must be already mapped to tiles.

  • B – The right argument to the multiplication. This 2D tensor must be already mapped to tiles.

  • prog – A reference to a program sequence which will be appended with the code to perform the multiplication and add.

  • debugContext – Optional debug information.

  • options – The structure describing options on how the multiplication should be implemented. See matMul().

  • cache – Optional pointer to a planning cache to use.

Matrix multiply and accumulate with a scalar scaling factor.

void matMulAcc(poplar::Graph &graph, const poplar::Tensor &C, const poplar::Tensor &k, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Matrix multiply and accumulate with a single-element scaling factor.
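
The accumulate semantics, C += k * A * B, can be sketched on the host as follows. This is plain C++ with a hypothetical function name (referenceMatMulAcc) and row-major std::vector storage. It models only the arithmetic: C keeps its existing contents and the scaled product is added on top.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference, not part of poplin.
// a is [m, inner], b is [inner, n], c is [m, n]; all row-major.
void referenceMatMulAcc(std::vector<float> &c, float k,
                        const std::vector<float> &a,
                        const std::vector<float> &b, std::size_t m,
                        std::size_t inner, std::size_t n) {
  assert(a.size() == m * inner);
  assert(b.size() == inner * n);
  assert(c.size() == m * n);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      float dot = 0.0f;
      for (std::size_t p = 0; p < inner; ++p)
        dot += a[i * inner + p] * b[p * n + j];
      c[i * n + j] += k * dot; // add to the existing value of C
    }
  }
}
```

For example, with C = [10], k = 0.5, A = [2, 4] and B = [3, 5]^T, the product A * B is 26, so C becomes 10 + 0.5 * 26 = 23.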

void matMulGroupedAcc(poplar::Graph &graph, const poplar::Tensor &C, float k, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Grouped matrix multiply and accumulate.

Multiply two grouped matrices and add to a third (with a scaling factor).

Calculates C[g] += k * A[g] * B[g] where A[g] and B[g] are matrices, k is a constant scalar, and g is an element of the set {0, 1, …, G-1}.

The multiplication is done for every element in the group. The first dimension of the matrices is the group dimension with value equal to G.

Parameters
  • graph – The Poplar graph.

  • C – The matrix to add to. This 3D tensor must be already mapped to tiles.

  • k – The constant, or a single-element tensor, by which the result of the multiplication is scaled. If k is a tensor, it must be of the same type as A.

  • A – The left argument to the grouped multiplication. This 3D tensor must be already mapped to tiles.

  • B – The right argument to the multiplication. This 3D tensor must be already mapped to tiles.

  • prog – A reference to a program sequence which will be appended with the code to perform the grouped multiplication and add.

  • debugContext – Optional debug information.

  • options – The structure describing options on how the multiplication should be implemented. See matMul().

  • cache – Optional pointer to a planning cache to use.

Grouped matrix multiply and accumulate with a scalar scaling factor.

void matMulGroupedAcc(poplar::Graph &graph, const poplar::Tensor &C, const poplar::Tensor &k, const poplar::Tensor &A, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Grouped matrix multiply and accumulate with a single-element scaling factor.

poplar::Tensor createMatMulInputLHS(poplar::Graph &graph, const poplar::Type &inputType, const poplar::Type &outputType, const std::vector<std::size_t> &aShape, const std::vector<std::size_t> &bShape, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Create a tensor that is used as the left operand of matrix multiplication.

The types of the input and output tensors are specified separately. This will create a 2D tensor in the graph. The ordering and tile mapping of the tensor will be set to make a matrix multiplication with this tensor as the left argument efficient.

Parameters
  • graph – The Poplar graph.

  • inputType – The input data type.

  • outputType – The data type of the output of the multiplication.

  • aShape – The shape of the required matrix.

  • bShape – The shape of the matrix that the required matrix will be multiplied by.

  • debugContext – Debug information.

  • options – The implementation options of the multiplication. See matMul().

  • cache – Optional pointer to a planning cache to use.

Returns

A matrix of type inputType and shape aShape. The tensor will have been mapped to tiles.

poplar::Tensor createMatMulInputLHS(poplar::Graph &graph, const poplar::Type &dataType, const std::vector<std::size_t> &aShape, const std::vector<std::size_t> &bShape, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Create a tensor that is used as the left operand of matrix multiplication.

The type of both input and output tensors is specified by dataType. This will create a 2D tensor in the graph. The ordering and tile mapping of the tensor will be set to make a matrix multiplication with this tensor as the left argument efficient.

Parameters
  • graph – The Poplar graph.

  • dataType – The data type of both the input and output tensors.

  • aShape – The shape of the required matrix.

  • bShape – The shape of the matrix that the required matrix will be multiplied by.

  • debugContext – Debug information.

  • options – The implementation options of the multiplication. See matMul().

  • cache – Optional pointer to a planning cache to use.

Returns

A matrix of type dataType and shape aShape. The tensor will have been mapped to tiles.

poplar::Tensor createMatMulGroupedInputLHS(poplar::Graph &graph, const poplar::Type &inputType, const poplar::Type &outputType, const std::vector<std::size_t> &aShape, const std::vector<std::size_t> &bShape, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Create a tensor that is used as the left operand of a grouped matrix multiplication.

This will create a 3D tensor in the graph. The ordering and tile mapping of the tensor will be set to make a grouped matrix multiplication with this tensor as the left argument efficient.

The first dimension of the required matrix and the matrix it multiplies by must be the number of groups.

Parameters
  • graph – The Poplar graph.

  • inputType – The data type of the required matrix.

  • outputType – The data type of the output of the multiplication.

  • aShape – The grouped shape [g, r, c] of the required matrix.

  • bShape – The grouped shape [g, r, c] of the matrix that the required matrix will be multiplied by.

  • debugContext – Debug information.

  • options – The implementation options of the multiplication. See matMul().

  • cache – Optional pointer to a planning cache to use.

Returns

A matrix of type inputType and grouped shape aShape. The tensor will have been mapped to tiles.

poplar::Tensor createMatMulInputRHS(poplar::Graph &graph, const poplar::Type &inputType, const poplar::Type &outputType, const std::vector<std::size_t> &aShape, const std::vector<std::size_t> &bShape, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Create a tensor that is used as the right operand of matrix multiplication.

This will create a 2D tensor in the graph. The ordering and tile mapping of the tensor will be set to make a matrix multiplication with this tensor as the right argument efficient.

Parameters
  • graph – The Poplar graph.

  • inputType – The input data type.

  • outputType – The data type of the output of the multiplication.

  • aShape – The shape of the matrix that the required matrix will be multiplied by.

  • bShape – The shape of the required matrix.

  • debugContext – Debug information.

  • options – The implementation options of the multiplication. See matMul().

  • cache – Optional pointer to a planning cache to use.

Returns

A matrix of type inputType and shape bShape. The tensor will have been mapped to tiles.

poplar::Tensor createMatMulInputRHS(poplar::Graph &graph, const poplar::Type &dataType, const std::vector<std::size_t> &aShape, const std::vector<std::size_t> &bShape, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Overloaded function for when inputType == outputType (represented by the dataType parameter).

poplar::Tensor createMatMulOutput(poplar::Graph &graph, const poplar::Type &inputType, const poplar::Type &outputType, const std::vector<std::size_t> &aShape, const std::vector<std::size_t> &bShape, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Create a tensor that is used as the output operand of matrix multiplication.

This will create a 2D tensor in the graph. The ordering and tile mapping of the tensor will be set to make a matrix multiplication with this tensor as the output argument efficient.

Parameters
  • graph – The Poplar graph.

  • inputType – The input data type.

  • outputType – The data type of the returned tensor.

  • aShape – The shape of the left operand of the multiplication.

  • bShape – The shape of the right operand of the multiplication.

  • debugContext – Debug information.

  • options – The implementation options of the multiplication. See matMul().

  • cache – Optional pointer to a planning cache to use.

Returns

A matrix of type outputType and shape [ aShape[0], bShape[1] ]. The tensor will have been mapped to tiles.
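
The output shape rule can be checked on the host before any tensors are created. The helper below is a hypothetical plain C++ sketch, not a poplin function. It assumes both shapes are 2D and that the inner dimensions agree, and returns [ aShape[0], bShape[1] ].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of poplin: shape of C = A * B.
std::vector<std::size_t>
matMulOutputShape(const std::vector<std::size_t> &aShape,
                  const std::vector<std::size_t> &bShape) {
  assert(aShape.size() == 2 && bShape.size() == 2 && "operands must be 2D");
  assert(aShape[1] == bShape[0] && "inner dimensions must match");
  return {aShape[0], bShape[1]};
}
```

For example, aShape = [64, 128] and bShape = [128, 32] give an output shape of [64, 32].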

poplar::Tensor createMatMulOutput(poplar::Graph &graph, const poplar::Type &dataType, const std::vector<std::size_t> &aShape, const std::vector<std::size_t> &bShape, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Overloaded function for when inputType == outputType (represented by the dataType parameter).

poplar::Tensor createMatMulGroupedInputRHS(poplar::Graph &graph, const poplar::Type &inputType, const poplar::Type &outputType, const std::vector<std::size_t> &aShape, const std::vector<std::size_t> &bShape, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Create a tensor that is used as the right operand of grouped matrix multiplication.

This will create a 3D tensor in the graph. The ordering and tile mapping of the tensor will be set to make a grouped matrix multiplication with this tensor as the right argument efficient.

The first dimension of the required matrix and the matrix it multiplies by must be the number of groups.

Parameters
  • graph – The Poplar graph.

  • inputType – The data type of the required matrix.

  • outputType – The data type of the output of the multiplication.

  • aShape – The grouped shape [g, r, c] of the matrix that the required matrix will be multiplied by.

  • bShape – The grouped shape [g, r, c] of the required matrix.

  • debugContext – Debug information.

  • options – The implementation options of the multiplication. See matMul().

  • cache – Optional pointer to a planning cache to use.

Returns

A matrix of type inputType and grouped shape bShape. The tensor will have been mapped to tiles.

poplar::Tensor createMatMulGroupedOutput(poplar::Graph &graph, const poplar::Type &inputType, const poplar::Type &outputType, const std::vector<std::size_t> &aShape, const std::vector<std::size_t> &bShape, const poplar::DebugContext &debugContext, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Create a tensor that is used as the output operand of grouped matrix multiplication (with output).

This will create a 3D tensor in the graph. The ordering and tile mapping of the tensor will be set to make a grouped matrix multiplication with this tensor as the output argument efficient.

The first dimension of the required matrix and the matrix it multiplies by must be the number of groups.

Parameters
  • graph – The Poplar graph.

  • inputType – The data type of the input matrices.

  • outputType – The data type of the required matrix.

  • aShape – The grouped shape [g, r, c] of the left operand of the multiplication.

  • bShape – The grouped shape [g, r, c] of the right operand of the multiplication.

  • debugContext – Debug information.

  • options – The implementation options of the multiplication. See matMul().

  • cache – Optional pointer to a planning cache to use.

Returns

A matrix of type outputType and grouped shape [ aShape[g], aShape[r], bShape[c] ]. The tensor will have been mapped to tiles.
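
Read with g, r and c as the positions 0, 1 and 2 of a grouped shape, the result shape is [ aShape[0], aShape[1], bShape[2] ]. The hypothetical plain C++ helper below (not a poplin function) spells this out, assuming 3D shapes with matching group counts and inner dimensions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of poplin: shape of C[g] = A[g] * B[g].
// aShape is [G, m, k], bShape is [G, k, n], the result is [G, m, n].
std::vector<std::size_t>
groupedMatMulOutputShape(const std::vector<std::size_t> &aShape,
                         const std::vector<std::size_t> &bShape) {
  assert(aShape.size() == 3 && bShape.size() == 3 && "operands must be 3D");
  assert(aShape[0] == bShape[0] && "group counts must match");
  assert(aShape[2] == bShape[1] && "inner dimensions must match");
  return {aShape[0], aShape[1], bShape[2]};
}
```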

poplar::Tensor preArrangeMatMulInputRHS(poplar::Graph &graph, const std::vector<std::size_t> &aShape, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::Type &outputType, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Pre-arrange right-hand side input.

Re-arrange the memory of the RHS operand of an upcoming matmul operation. Without this, the tensor would be rearranged, for efficiency, as part of the matmul operation itself.

Use this function and the matMul*() functions with the inputRHSIsPreArranged option flag to do any re-arrangement necessary once and then re-use that input multiple times.

Only valid for fully connected layers.

Parameters
  • graph – The Poplar graph.

  • aShape – The shape of the left argument to the multiplication.

  • B – The right argument to the multiplication. This 2D tensor must be already mapped to tiles.

  • prog – A reference to a program sequence which will be appended with the code to perform the arrangement.

  • outputType – Optional via overloaded function. Element type of returned tensor. The default is B.elementType() if omitted.

  • debugContext – Optional debug information.

  • options – Flags describing options for how the multiplication should be implemented. See matMul().

  • cache – Optional pointer to a planning cache to use.

Returns

New tensor holding the rearranged input. This tensor has the same shape as the given tensor.

Pre-arrange input with explicitly defined output type.

poplar::Tensor preArrangeMatMulInputRHS(poplar::Graph &graph, const std::vector<std::size_t> &aShape, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Pre-arrange input where the output type is the same as B.

poplar::Tensor preArrangeMatMulGroupedInputRHS(poplar::Graph &graph, const std::vector<std::size_t> &aShape, const poplar::Tensor &B, poplar::program::Sequence &prog, const poplar::Type &outputType, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Pre-arrange grouped input with explicitly defined output type.

poplar::Tensor transposeGroupedMatrix(const poplar::Tensor &A)

Transposes a grouped matrix tensor.

Parameters

A – The tensor to transpose.

Returns

The transposed tensor.
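
A host-side model of a grouped transpose is sketched below, on the assumption that the group dimension stays first and the rows and columns swap within each group, so [G, R, C] becomes [G, C, R]. The function name transposeGroupedHost is hypothetical and the code is plain C++. Unlike a tensor view, this version copies the data into a new buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model, not part of poplin.
// in is [groups, rows, cols], row-major; the result is [groups, cols, rows].
std::vector<float> transposeGroupedHost(const std::vector<float> &in,
                                        std::size_t groups, std::size_t rows,
                                        std::size_t cols) {
  assert(in.size() == groups * rows * cols);
  std::vector<float> out(in.size());
  for (std::size_t g = 0; g < groups; ++g)
    for (std::size_t r = 0; r < rows; ++r)
      for (std::size_t c = 0; c < cols; ++c)
        out[(g * cols + c) * rows + r] = in[(g * rows + r) * cols + c];
  return out;
}
```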

std::set<ConvPlanParams> matMulGetConvPlanParams(const std::set<MatMulPlanParams> &matmuls, MatMulToConvOptions &matmulToConvOpts)

Obtain the set of convolution parameters corresponding to the user supplied set of parameters for matrix multiplication.

Parameters
  • matmuls – The set of matrix multiplication parameter tuples.

  • matmulToConvOpts – The convolution options corresponding to each set of matrix multiplication options.

Returns

The set of convolution parameters.

void preplanMatMuls(const std::set<MatMulPlanParams> &matmuls, matmul::PlanningCache &cache)

Deprecated:

Use preplan() instead.

Plan the specified matrix multiplications.

Parameters
  • matmuls – The set of matmul parameter tuples to preplan.

  • cache – The planning cache to update.

void matmulValidateOptions(const poplar::OptionFlags &options)

Provides an interface to validate the matmul options.

An invalid key or value causes an exception to be thrown.

Parameters

options – Flags describing options for how the multiplication should be implemented. See matMul().
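
To illustrate what such validation amounts to, the sketch below checks a plain std::map of strings against the option names and values listed under matMul(). It is a hypothetical host-side stand-in, not the poplin implementation: the names MatMulOptionsSketch and validateMatMulOptionsSketch are invented here, the exception type is an assumption, and the real library may accept options that are not listed in this section.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for poplar::OptionFlags: option name -> value.
using MatMulOptionsSketch = std::map<std::string, std::string>;

// Throws std::invalid_argument on an unknown key or a disallowed value.
// Only the options documented under matMul() are modelled.
void validateMatMulOptionsSketch(const MatMulOptionsSketch &options) {
  static const std::set<std::string> boolValues{"true", "false"};
  static const std::map<std::string, std::set<std::string>> enumOptions{
      {"fullyConnectedPass",
       {"NONE", "INFERENCE_FWD", "TRAINING_FWD", "TRAINING_BWD",
        "TRAINING_WU"}},
      {"inputRHSIsPreArranged", boolValues},
      {"use128BitConvUnitLoad", boolValues},
      {"enableMultiStageReduce", boolValues},
      {"enableFastReduce", boolValues},
      {"remapOutputTensor", boolValues},
      {"partialsType", {"half", "float"}}};

  for (const auto &entry : options) {
    const std::string &key = entry.first;
    const std::string &value = entry.second;
    if (key == "availableMemoryProportion") {
      // Must parse completely as a decimal in the range [0, 1].
      std::size_t parsed = 0;
      double proportion = -1.0;
      try {
        proportion = std::stod(value, &parsed);
      } catch (const std::exception &) {
        parsed = 0;
      }
      const bool inRange = proportion >= 0.0 && proportion <= 1.0;
      if (parsed == 0 || parsed != value.size() || !inRange)
        throw std::invalid_argument("bad value for " + key + ": " + value);
      continue;
    }
    const auto it = enumOptions.find(key);
    if (it == enumOptions.end())
      throw std::invalid_argument("unknown option: " + key);
    if (it->second.count(value) == 0)
      throw std::invalid_argument("bad value for " + key + ": " + value);
  }
}
```

Validating options up front in this way reports a misspelt key or value before any planning work is done.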

struct MatMulParams
#include <MatMul.hpp>

Parameters to define a Matrix multiplication.

C = A * B

Public Members

poplar::Type inputType

Input type (of A & B)

poplar::Type outputType

Output type (of C)

std::vector<std::size_t> aShape

Shape of the lhs input matrix (A)

std::vector<std::size_t> bShape

Shape of the rhs input matrix (B)

Friends

friend bool operator<(const MatMulParams &a, const MatMulParams &b)

namespace matmul

class PlanningCache : public poplin::PlanningCache
#include <MatMul.hpp>

Deprecated:

Use poplin::PlanningCache instead.

Public Functions

poplin::PlanningCache &getImpl()