FullyConnected

Refer to the technical note PopSparse Matrix Multiplication (Dynamic Pattern) on the IPU for a high-level description of the algorithmic design of the dynamic sparse matrix multiplication in the Graphcore PopSparse library. That technical note provides a guide to the code and some pointers to the implementation of the PopSparse library.

#include <popsparse/FullyConnected.hpp>

Fully-connected layers using sparse tensors.

namespace popsparse

Support for sparse matrices.

namespace dynamic

Support for dynamic sparse matrices.

Functions

SparseTensor createFullyConnectedWeights(poplar::Graph &graph, const poplar::Type &inputType, const FullyConnectedParams &params, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Create a sparse tensor that is used as the weights W for a fully connected layer.

The following options are available:

  • availableMemoryProportion Decimal between 0 and 1 [=0.6]

    The maximum proportion of available memory on each tile that this layer should consume temporarily during the course of the operation.

  • metaInfoBucketOversizeProportion Decimal between 0 and 1 [=0.3]

    This specifies additional elements to allocate in each bucket of meta-information as a proportion of the required size for a perfectly uniformly distributed sparsity pattern.

  • doGradAPass, doGradWPass (true, false) [=false]

    Indicate which passes are present for the operation of the layer as a whole. It is assumed that the forward pass is always present.

  • partialsType poplar::Type [=poplar::FLOAT]

    The type to use for partial results. If the type specified is smaller than the output type then the option is ignored and the output type is used instead.

  • sharedBuckets (true, false) [=true]

    If set, forces the same buckets to be used for all three passes.

Parameters
  • graph – The Poplar graph.

  • inputType – The type for inputs to the operation.

  • params – Parameters for the fully connected layer.

  • debugContext – Optional debug information.

  • options – Implementation options for the fully connected layer.

  • cache – Optional pointer to planning cache to use.

Returns

A tensor with a sparse representation of the weights for the fully connected layer.
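
For illustration only, a minimal sketch of creating the sparse weights follows. The simulated device, the layer dimensions (1 group, batch of 4, 1024 inputs, 512 outputs, roughly 10% non-zero) and the use of the FullyConnectedParams::createWithNzRatio() factory from popsparse/FullyConnectedParams.hpp are assumptions chosen for the example, not requirements of this function:

  #include <poplar/Graph.hpp>
  #include <poplar/IPUModel.hpp>
  #include <popsparse/FullyConnected.hpp>
  #include <popsparse/FullyConnectedParams.hpp>
  #include <popsparse/PlanningCache.hpp>
  #include <popsparse/SparsityParams.hpp>
  #include <popsparse/codelets.hpp>

  int main() {
    using namespace popsparse::dynamic;

    // Build a graph on a simulated target (an assumption made for the sketch).
    poplar::IPUModel ipuModel;
    auto device = ipuModel.createDevice();
    poplar::Graph graph(device.getTarget());
    popsparse::addCodelets(graph);

    // Assumed layer shape: 1 group, batch of 4, 1024 inputs, 512 outputs,
    // with roughly 10% of the weight elements non-zero.
    SparsityParams sparsityParams; // element-wise, unstructured sparsity
    const auto params = FullyConnectedParams::createWithNzRatio(
        sparsityParams, /*nzRatio=*/0.1, /*batchSize=*/4, /*numGroups=*/1,
        /*inputChannels=*/1024, /*outputChannels=*/512);

    PlanningCache cache;
    const poplar::OptionFlags options{{"availableMemoryProportion", "0.6"},
                                      {"doGradAPass", "true"},
                                      {"doGradWPass", "true"}};

    // Sparse weights: meta information plus NZ values, mapped across tiles.
    const SparseTensor weights = createFullyConnectedWeights(
        graph, poplar::HALF, params, {"fc/weights"}, options, &cache);

    return 0;
  }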

poplar::Tensor createFullyConnectedInput(poplar::Graph &graph, const poplar::Type &inputType, const FullyConnectedParams &params, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Create a dense tensor that is used as the input activations for a fully connected layer.

The returned tensor is of shape [batchSize, inputChannelsPerGroup].

Parameters
  • graph – The Poplar graph.

  • inputType – The type for inputs to the operation.

  • params – Parameters for the fully connected layer.

  • debugContext – Optional debug information.

  • options – Implementation options for the fully connected layer. See createFullyConnectedWeights() for details.

  • cache – Optional pointer to planning cache to use.
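
Continuing the sketch above (graph, params, options and cache are the assumed names from that example), the matching dense input could be created as follows:

  // Dense input activations of shape [batchSize, inputChannelsPerGroup],
  // created for use with this fully connected layer.
  const poplar::Tensor input = createFullyConnectedInput(
      graph, poplar::HALF, params, {"fc/input"}, options, &cache);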

poplar::Tensor fullyConnectedFwd(poplar::Graph &graph, const SparseTensor &weights, const poplar::Tensor &activations, const FullyConnectedParams &fcParams, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Run a fully connected forward (or inference) pass.

The sparse weights tensor is made up of meta information for the sparsity and the non-zero values. This performs the Fwd operation described in the technical note referenced above, but with the input and output transposed.

The meta information for the sparse weights tensor must be created for the forward (or inference) pass and should be created using the createFullyConnectedWeights() function.

Parameters
  • graph – The Poplar graph.

  • weights – Sparsity information of the weights tensor.

  • activations – The dense input activations, of shape [batchSize][inputChannelsPerGroup * numGroups].

  • fcParams – Fully connected layer parameters.

  • prog – A reference to a program sequence which will be appended with the code to perform the forward operation.

  • debugContext – Optional debug information.

  • options – The structure describing options on how the operation should be implemented. See createFullyConnectedWeights() for details.

  • cache – Optional pointer to planning cache to use.

Returns

The tensor holding the result. This tensor will be created, added to the graph and mapped to tiles. The result tensor is of shape [batchSize][outputChannelsPerGroup * numGroups].
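
Continuing the same sketch, a forward pass might look like the following; prog is a new program sequence introduced here and output is an illustrative name:

  // Program to which the forward-pass code is appended.
  poplar::program::Sequence prog;

  // output has shape [batchSize][outputChannelsPerGroup * numGroups].
  const poplar::Tensor output = fullyConnectedFwd(
      graph, weights, input, params, prog, {"fc/fwd"}, options, &cache);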

poplar::Tensor fullyConnectedGradA(poplar::Graph &graph, const SparseTensor &weights, const poplar::Tensor &gradients, const FullyConnectedParams &fcParams, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Run a fully connected GradA pass.

The sparse weights tensor is made up of meta information for the sparsity and the non-zero values. This performs the GradA computation described in the technical note referenced above, but with the input and output transposed.

The meta information for the sparse weights tensor must be created for the GradA pass and should be created using the createFullyConnectedWeights() function.

Parameters
  • graph – The Poplar graph.

  • weights – Sparsity information of the weights tensor.

  • gradients – The dense loss gradients with respect to the output activations, of shape [batchSize][outputChannelsPerGroup].

  • fcParams – Fully connected layer parameters.

  • prog – A reference to a program sequence which will be appended with the code to perform the GradA operation.

  • debugContext – Optional debug information.

  • options – The structure describing options on how the operation should be implemented. See createFullyConnectedWeights() for details.

  • cache – Optional pointer to planning cache to use.

Returns

The tensor holding the result. This tensor will be created, added to the graph and mapped to tiles. The tensor is of shape [batchSize][inputChannelsPerGroup * numGroups].
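
Continuing the sketch, a GradA pass might look as follows. The tensor outputGrad is a stand-in introduced for the example; in a real model the loss gradients would come from the layer above:

  // Stand-in for the dense loss gradients with respect to the output
  // activations ([batchSize][outputChannelsPerGroup * numGroups]).
  const poplar::Tensor outputGrad = graph.clone(output, {"fc/outputGrad"});

  // inputGrad has shape [batchSize][inputChannelsPerGroup * numGroups].
  const poplar::Tensor inputGrad = fullyConnectedGradA(
      graph, weights, outputGrad, params, prog, {"fc/gradA"}, options, &cache);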

poplar::Tensor fullyConnectedSparseGradW(poplar::Graph &graph, const poplar::Tensor sparsityMetaInfo, const poplar::Tensor &gradA, const poplar::Tensor &activations, const FullyConnectedParams &fcParams, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, const poplar::OptionFlags &options = {}, PlanningCache *cache = nullptr)

Run a fully connected GradW pass to compute sparse gradients.

The layout of the returned tensor is exactly the same as that of the non-zero (NZ) values of the sparse weights, so that any element-wise operation may be performed between the two.

The actual implementation differs from that in the technical note referenced above, as the transposes of the gradients and activations are supplied as parameters to this function.

Parameters
  • graph – The Poplar graph.

  • sparsityMetaInfo – Meta information for the sparse weights. See the SparseTensor representation.

  • gradA – The dense gradients with respect to the output activations, of shape [batchSize][outputChannelsPerGroup * numGroups].

  • activations – The input activations, of shape [batchSize][inputChannelsPerGroup * numGroups].

  • fcParams – Fully connected layer parameters.

  • prog – A reference to a program sequence which will be appended with the code to perform the GradW operation.

  • debugContext – Optional debug information.

  • options – The structure describing options on how the operation should be implemented. See createFullyConnectedWeights() for details.

  • cache – Optional pointer to planning cache to use.

Returns

The tensor holding the result. This tensor will be created, added to the graph and mapped to tiles.
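
Continuing the sketch, a GradW pass might look as follows. The accessors SparseTensor::getMetaInfoTensor() and SparseTensor::getNzValuesTensor() are assumed here to expose the two components of the sparse weights:

  // The sparse gradient has exactly the layout of the weights' NZ values
  // (assuming the getMetaInfoTensor()/getNzValuesTensor() accessors), so an
  // element-wise update against weights.getNzValuesTensor() can be applied
  // directly, for example with popops::scaledAddTo().
  const poplar::Tensor weightGrad = fullyConnectedSparseGradW(
      graph, weights.getMetaInfoTensor(), outputGrad, input, params, prog,
      {"fc/gradW"}, options, &cache);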

std::tuple<unsigned, unsigned, unsigned> fullyConnectedDenseGradWSerialSplits(const poplar::Graph &graph, const poplar::Type &inputType, const FullyConnectedParams &fcParams, const poplar::OptionFlags &options_ = {}, PlanningCache *cache = nullptr)

Report the serial splitting of a dense gradW output, given the memory proportion limit specified in the options.

A dense gradW output is of shape [numGroups][inputSize][outputSize].

Parameters
  • graph – The Poplar graph.

  • inputType – The type of input.

  • fcParams – Fully connected layer parameters.

  • options – The structure describing options on how the operation should be implemented. See createFullyConnectedWeights() for details.

  • cache – Optional pointer to planning cache to use.

Returns

Serial splits for each of the output dimensions [numGroups][inputSize][outputSize].
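
Continuing the sketch, the planner's serial splits could be queried as follows; the names of the three returned splits are illustrative:

  // Query how the planner would serially split a dense GradW output of shape
  // [numGroups][inputSize][outputSize] under the availableMemoryProportion
  // limit (std::get requires <tuple>).
  const auto serialSplits = fullyConnectedDenseGradWSerialSplits(
      graph, poplar::HALF, params, options, &cache);
  const unsigned groupSplits = std::get<0>(serialSplits);
  const unsigned inputSplits = std::get<1>(serialSplits);
  const unsigned outputSplits = std::get<2>(serialSplits);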