Recurrent

#include <popnn/Recurrent.hpp>

Functions for recurrent neural networks (RNN).

namespace poplin

Linear algebra functions.

namespace popnn

Functions used in neural networks.

namespace rnn

Functions for Recurrent Neural Networks (RNN).

Functions

std::vector<std::pair<poplin::MatMulParams, poplar::OptionFlags>> getMatMulPrePlanParameters(std::size_t numSteps, std::size_t batchSize, std::size_t inputSize, std::size_t outputSize, const poplar::Type &dType, const poplar::Type &partialsType = poplar::FLOAT, bool inferenceOnly = false, bool hasFeedforwardWeights = true)

Predict what matrix multiplications will be needed for the given parameters and return list of corresponding matmul parameters and options.

uint64_t getFwdFlops(unsigned sequenceSize, unsigned batchSize, unsigned inputSize, unsigned outputSize, bool weightInput = true)

Compute the total floating point operations for the forward pass of RNN.

uint64_t getBwdFlops(unsigned sequenceSize, unsigned batchSize, unsigned inputSize, unsigned outputSize, bool calcInputGrad = true)

Compute the total floating point operations for the backward pass of RNN.

uint64_t getWuFlops(unsigned sequenceSize, unsigned batchSize, unsigned inputSize, unsigned outputSize)

Compute the total floating point operations for the weight update pass of RNN.

poplar::Tensor createInput(poplar::Graph &graph, unsigned numSteps, unsigned batchSize, unsigned inputSize, unsigned outputSize, const poplar::Type &dType, const poplar::Type &partialsType = poplar::FLOAT, bool inferenceOnly = false, const poplar::DebugContext &debugContext = {}, poplin::PlanningCache *planningCache = nullptr)

Create a tensor which is input to a vanilla RNN.

The tensor is laid out to suit the multiplication by the input weight matrix over the given number of steps.

Parameters
  • graph – The graph object.

  • numSteps – The number of steps used in the forward weighting of the input.

  • batchSize – The number of batch elements.

  • inputSize – The size of the input for each sequence step.

  • outputSize – The output (hidden) size of each sequence element.

  • inferenceOnly – Indicates whether the RNN layer is for inference only. If true, we can ignore the backwards and weight update passes.

  • dType – The data type of the created tensor.

  • partialsType – The data type of intermediate calculations.

  • debugContext – Optional debug information.

  • planningCache – The matmul planning cache.

Returns

Tensor of shape [numSteps, batchSize, inputSize].

poplar::Tensor createFwdState(poplar::Graph &graph, const poplar::Type &dType, unsigned batchSize, unsigned outputSize, poplar::program::Sequence &prog, bool initState, bool inferenceOnly, const poplar::DebugContext &debugContext = {}, poplin::PlanningCache *planningCache = nullptr)

Create initial state for a vanilla RNN.

The state, apart from the activations, is initialised by the control program.

The number of hidden states may depend on whether the RNN is used for inference or training.

Parameters
  • graph – The graph object.

  • dType – The data type of the created tensor.

  • batchSize – The number of batch elements.

  • outputSize – The output (hidden) size of each sequence element.

  • prog – The control program.

  • initState – If true, indicates that the state should be initialised.

  • inferenceOnly – Indicates whether the RNN layer is for inference only. If true, we can ignore backwards and weight update passes.

  • debugContext – Optional debug information.

  • planningCache – The matmul planning cache.

Returns

A 2D tensor of shape [batchSize, outputSize]

poplar::Tensor getOutputFromFwdState(const poplar::Tensor &fwdState)

Extract previous output tensor from the hidden state.

The returned tensor is a view of the state tensor and can be used to initialise it, if required.

poplar::Tensor createWeightsInput(poplar::Graph &graph, unsigned sequenceSize, unsigned batchSize, unsigned inputSize, unsigned outputSize, const poplar::Type &dType, const poplar::Type &partialsType = poplar::FLOAT, bool inferenceOnly = false, const poplar::DebugContext &debugContext = {}, poplin::PlanningCache *planningCache = nullptr)

Create the weights used to weight the input of a vanilla RNN layer.

The tile mapping of the weight tensor is best when the sequence size used here matches the sequence size of the input activation tensor that the weights multiply.

Parameters
  • graph – The graph object.

  • sequenceSize – The number of sequence steps used in the forward weighting of the input. The best tile mapping is when this matches the sequence size of the input activation tensor.

  • batchSize – The number of batch elements.

  • inputSize – The input size of each sequence.

  • outputSize – The output (hidden) size of each sequence.

  • dType – The data type of the created tensor.

  • partialsType – The data type of partial results in the computation.

  • inferenceOnly – Indicates whether the RNN layer is for inference only. If true, we can ignore backwards and weight update passes.

  • debugContext – Optional debug information.

  • planningCache – The matmul planning cache.

poplar::Tensor createWeightsFeedback(poplar::Graph &graph, unsigned batchSize, unsigned outputSize, const poplar::Type &dType, const poplar::Type &partialsType = poplar::FLOAT, bool inferenceOnly = false, const poplar::DebugContext &debugContext = {}, poplin::PlanningCache *planningCache = nullptr)

Create the weights used in the recurrent part of a vanilla RNN layer.

Parameters
  • graph – The graph object.

  • batchSize – The number of batch elements.

  • outputSize – The output (hidden) size of each sequence.

  • dType – The data type of the created tensor.

  • partialsType – The data type of partial results in the computation.

  • inferenceOnly – Indicates whether the RNN layer is for inference only. If true, we can ignore backwards and weight update passes.

  • debugContext – Optional debug information.

  • planningCache – The matmul planning cache.

poplar::Tensor forwardWeightInput(poplar::Graph &graph, const poplar::Tensor &actIn, const poplar::Tensor &weights, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, bool inferenceOnly = false, const poplar::DebugContext &debugContext = {}, poplin::PlanningCache *planningCache = nullptr)

Perform the feedforward part of an RNN layer.

The feedforward part of the RNN layer must be followed by the feedback part to complete the RNN layer. In other words, the output must be fed as the feedforward input to the feedback part.

The following definitions apply:

  • numSteps is the number of sequence steps.

  • batchSize is the number of batch elements.

  • inputSize is the size of the input for each step.

  • outputSize is the size of the output for each step.

See also

forwardIterate

Parameters
  • graph – The graph object.

  • actIn – The input activation tensor with shape [numSteps, batchSize, inputSize].

  • weights – Feedforward weights with shape [outputSize, inputSize].

  • prog – The program sequence. Programs added by this function are appended to this program sequence.

  • partialsType – The data type for intermediates.

  • inferenceOnly – Indicates whether the RNN layer is for inference only. If true, we can ignore backwards and weight update passes.

  • debugContext – Optional debug information.

  • planningCache – The matmul planning cache.

Returns

The output tensor with shape [numSteps, batchSize, outputSize].

poplar::Tensor forwardIterate(poplar::Graph &graph, const poplar::Tensor &feedFwdIn, const poplar::Tensor &initState, const poplar::Tensor &feedbackWeights, const poplar::Tensor &biases, poplar::program::Sequence &prog, popnn::NonLinearityType nonLinearityType, const poplar::Type &partialsType = poplar::FLOAT, bool inferenceOnly = false, const poplar::DebugContext &debugContext = {}, poplin::PlanningCache *planningCache = nullptr)

Perform the feedback part of the RNN layer.

The feedback part of the RNN layer must be preceded by the feedforward part of the RNN layer to complete the layer.

The following definitions apply:

  • numSteps is the number of steps.

  • batchSize is the number of batch elements.

  • inputSize is the size of the input for each step.

  • outputSize is the size of the output for each step.

Parameters
  • graph – The graph object.

  • feedFwdIn – The input to this function (the output from the feedforward part of the RNN layer).

  • initState – The initial state of the RNN layer, that is, the previous output.

  • feedbackWeights – The feedback weights.

  • biases – The biases.

  • prog – The program sequence. Programs added by this function are appended to this program sequence.

  • nonLinearityType – The non-linearity used for the output activations.

  • partialsType – The data type for intermediates.

  • inferenceOnly – Indicates whether the RNN layer is for inference only. If true, we can ignore backwards and weight update passes.

  • debugContext – Optional debug information.

  • planningCache – The matmul planning cache.

Returns

The output activations of the RNN layer.

poplar::Tensor createBwdState(poplar::Graph &graph, const poplar::Type &dType, unsigned batchSize, unsigned outputSize, poplar::program::Sequence &prog, const poplar::DebugContext &debugContext = {}, poplin::PlanningCache *planningCache = nullptr)

Create initial state for backward pass of a vanilla RNN.

Parameters
  • graph – The graph object.

  • dType – The data type of the created tensor.

  • batchSize – The number of batch elements processed.

  • outputSize – The number of output activations.

  • prog – The control program.

  • debugContext – Optional debug information.

  • planningCache – The matmul planning cache.

Returns

Tile-mapped initial state tensor.

std::pair<poplar::Tensor, poplar::Tensor> backwardGradientStep(poplar::Graph &graph, const poplar::Tensor &nextLayerGrad, const poplar::Tensor &bwdState, const poplar::Tensor &actOut, const poplar::Tensor &weightsInput, const poplar::Tensor &weightsFeedback, poplar::program::Sequence &prog, popnn::NonLinearityType nonLinearityType, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, poplin::PlanningCache *planningCache = nullptr)

Compute a single step of the backward pass of a vanilla RNN layer.

Two gradient outputs are produced. The first is the gradient at the input of the RNN layer for the step. The second is the gradient at the adder (where the feedforward and feedback contributions are summed), which can be used to backpropagate through the earlier steps.

Parameters
  • graph – The graph object.

  • nextLayerGrad – The loss gradient fed as input to this step.

  • bwdState – The gradient state for the previous step.

  • actOut – The output activation.

  • weightsInput – The input weights.

  • weightsFeedback – The feedback weights.

  • prog – The control program to which programs are added.

  • nonLinearityType – The type of non-linearity.

  • partialsType – The data type used in intermediate calculations.

  • debugContext – Optional debug information.

  • planningCache – The matmul planning cache.

Returns

A pair of tensors. The first tensor is the loss gradient at the input layer. The second tensor is the backward state needed to run the next backward step.

poplar::Tensor backwardGradientStep(poplar::Graph &graph, const poplar::Tensor &nextLayerGrad, const poplar::Tensor &bwdState, const poplar::Tensor &actOut, const poplar::Tensor &weightsFeedback, poplar::program::Sequence &prog, popnn::NonLinearityType nonLinearityType, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, poplin::PlanningCache *planningCache = nullptr)

Same as backwardGradientStep(poplar::Graph&, const poplar::Tensor&, const poplar::Tensor&, const poplar::Tensor&, const poplar::Tensor&, const poplar::Tensor &, poplar::program::Sequence&, popnn::NonLinearityType, const poplar::Type&, const poplar::DebugContext&, poplin::PlanningCache*) with the difference that the input gradients are not computed.

void paramDeltaUpdate(poplar::Graph &graph, const poplar::Tensor &bwdState, const poplar::Tensor &actIn, const poplar::Tensor &prevOut, poplar::Tensor &weightsInputDeltasAcc, poplar::Tensor &weightsFeedbackDeltasAcc, poplar::Tensor &biasDeltasAcc, poplar::program::Sequence &prog, const poplar::Type &partialsType = poplar::FLOAT, const poplar::DebugContext &debugContext = {}, poplin::PlanningCache *planningCache = nullptr)

Update parameter deltas for a vanilla RNN step.

The parameter deltas updated are:

  • Feedback Weights

  • Input Weights

  • Biases

The new deltas computed for this step are added to the accumulated deltas from previous steps. The caller must zero the accumulated tensors before the first call if the deltas are accumulated in-place.

Parameters
  • graph – The graph object.

  • bwdState – The gradient state for this step.

  • actIn – The input activations for this step.

  • prevOut – The previous RNN output activations for this step.

  • weightsInputDeltasAcc – The previous weights input deltas tensor. This tensor must be tile-mapped. The deltas from this step are added to this tensor.

  • weightsFeedbackDeltasAcc – The previous feedback weights deltas tensor. This tensor must be tile-mapped. The deltas from this step are added to this tensor.

  • biasDeltasAcc – The previous bias deltas tensor. This tensor must be tile-mapped. The deltas from this step are added to this tensor.

  • prog – The control program to which programs are added.

  • partialsType – The data type used in intermediate calculations.

  • debugContext – Optional debug information.

  • planningCache – The matmul planning cache.

poplar::Tensor rnnFwdSequence(poplar::Graph &graph, poplar::program::Sequence &prog, const poplar::Tensor &fwdStateInit, const poplar::Tensor *weightedIn, const poplar::Tensor &biases, const poplar::Tensor &feedFwdWeights, const poplar::Tensor &feedbackWeights, const poplar::Tensor &prevLayerActs, const popnn::NonLinearityType &nonLinearityType, const poplar::Type &partialsType, bool inferenceOnly, const poplar::DebugContext &debugContext = {}, poplin::PlanningCache *planningCache = nullptr)

Perform the forward part of the RNN layer.

The feedback part of the RNN layer must be preceded by the feedforward part of the RNN layer to complete the layer.

The following definitions apply:

  • numSteps is the number of steps.

  • batchSize is the number of batch elements.

  • inputSize is the size of the input for each step.

  • outputSize is the size of the output for each step.

Parameters
  • graph – The graph object.

  • prog – The control program.

  • fwdStateInit – The forward state tensor for initial step.

  • weightedIn – The preweighted input, or nullptr if feedFwdWeights is to be applied.

  • biases – The biases.

  • feedFwdWeights – The input weights.

  • feedbackWeights – The feedback weights.

  • prevLayerActs – The activations from the previous layer (the output from the feedforward part of the RNN layer).

  • nonLinearityType – The type of non-linearity used for the output activations.

  • partialsType – The data type for intermediates.

  • inferenceOnly – Indicates whether the RNN layer is for inference only. If true, we can ignore backwards and weight update passes.

  • debugContext – Optional debug information.

  • planningCache – The matmul planning cache.

Returns

Forward state tensor for all steps [0:seqSize).

std::tuple<poplar::Tensor, poplar::Tensor, poplar::Tensor, poplar::Tensor> rnnBwdSequence(poplar::Graph &graph, bool doWU, bool ignoreInputGradientCalc, poplar::program::Sequence &prog, const poplar::Tensor &fwdStateInit, const poplar::Tensor &fwdState, const poplar::Tensor &biases, const poplar::Tensor &feedFwdWeights, const poplar::Tensor &feedbackWeights, const poplar::Tensor &outGradient, const poplar::Tensor &actIn, const popnn::NonLinearityType &nonLinearityType, const poplar::Type &partialsType, const poplar::DebugContext &debugContext = {}, poplin::PlanningCache *planningCache = nullptr)

Perform the backward pass of the RNN layer, optionally including the weight update.

The backward pass must be preceded by the forward pass (rnnFwdSequence) to complete the layer.

The following definitions apply:

  • numSteps is the number of steps.

  • batchSize is the number of batch elements.

  • inputSize is the size of the input for each step.

  • outputSize is the size of the output for each step.

Parameters
  • graph – The graph object.

  • doWU – Calculate weight updates.

  • ignoreInputGradientCalc – Do not calculate the gradients over the input weights.

  • prog – The control program.

  • fwdStateInit – The forward state tensor for initial step.

  • fwdState – The forward state tensor for all steps [0:seqSize).

  • biases – The biases.

  • feedFwdWeights – The input weights.

  • feedbackWeights – The feedback weights.

  • outGradient – The gradient from next layer.

  • actIn – The activations from the previous layer, so the output from the feedforward part of the RNN layer.

  • nonLinearityType – The type of non-linearity used for the output activations.

  • partialsType – The data type for intermediates.

  • debugContext – Optional debug information.

  • planningCache – The matmul planning cache.

Returns

Returns four tensors:

  • gradients for the previous layer

  • input weight deltas

  • output weight deltas

  • bias deltas

When doWU is false, the weight and bias deltas are not calculated.