23. IPU TensorFlow Addons Python API

23.1. TensorFlow layers

23.1.1. TensorFlow layers made for IPU TensorFlow

class ipu_tensorflow_addons.layers.PopnnAUGRU(num_units, dtype=tf.float32, partials_dtype=tf.float32, seed=None, weights_initializer=None, bias_initializer=None, activation='tanh', recurrent_activation='sigmoid', return_state=True, name=None, reset_after=False, available_memory_proportion_fwd=None, available_memory_proportion_bwd=None, options=None, options_bwd=None)

XLA compatible, time-major Popnn implementation of an AUGRU layer.

Below is a typical workflow:

with tf.Graph().as_default():
  augru = PopnnAUGRU(num_units, ...)

  outputs, output_state = augru(
    inputs, seq_len, attention_score, initial_state, training=True)

__init__(num_units, dtype=tf.float32, partials_dtype=tf.float32, seed=None, weights_initializer=None, bias_initializer=None, activation='tanh', recurrent_activation='sigmoid', return_state=True, name=None, reset_after=False, available_memory_proportion_fwd=None, available_memory_proportion_bwd=None, options=None, options_bwd=None)

Creates a PopnnAUGRU model from model spec.

Parameters
  • num_units – the number of units within the RNN model.

  • dtype – tf.float16 or tf.float32

  • partials_dtype – the type used by Popnn to perform partial calculations. Either tf.float16 or tf.float32.

  • seed – A Python integer. Used to create the default Glorot uniform initializer weights_initializer.

  • weights_initializer – starting value to initialize the weights (default is Glorot uniform initializer).

  • activation – Activation function. Defaults to “tanh”. Accepted values: “tanh”, “relu”, “softmax”, “sigmoid”, “hard_sigmoid”.

  • recurrent_activation – Recurrent activation function. Defaults to “sigmoid”. Must generate output in the [0,1] range. Accepted values: “tanh”, “softmax”, “sigmoid”, “hard_sigmoid”.

  • return_state – Boolean. Whether to return the last state in addition to the output. Default: True.

  • bias_initializer – starting value to initialize the bias (default is all zeros).

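  • name – VariableScope for the created subgraph; defaults to class name. This is used as the default scope only if no scope is specified later when invoking __call__().

  • reset_after – GRU convention (whether to apply reset gate after or before matrix multiplication). False = “before” (default), True = “after”.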

  • available_memory_proportion_fwd – Deprecated, please use options={'availableMemoryProportion': <value>} instead. Maximum fraction of IPU memory which can be used as temporary scratch space during computation, for the forward propagation layer. A value of -1. or None indicates that the default in Popnn should be used. If available_memory_proportion_bwd is set to None, then this value applies to both phases. Note that the value in options['availableMemoryProportion'] will be used if set together with this argument.

  • available_memory_proportion_bwd – Deprecated, please use options_bwd={'availableMemoryProportion': <value>} instead. Maximum fraction of IPU memory which can be used as temporary scratch space during computation, for the backward propagation layer. A value of -1. or None indicates that the default in Popnn should be used. Note that the value in options_bwd['availableMemoryProportion'] will be used if set together with this argument.

  • options – A Python dictionary. Implementation or debug options for the forward GRU cell in PopLibs. See the GRU documentation in the PopLibs API reference for the full list of options.

  • options_bwd – A Python dictionary. Implementation or debug options for the backward GRU cell in PopLibs. See the GRU documentation in the PopLibs API reference for the full list of options.
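
The precedence rules described above for available_memory_proportion_fwd, available_memory_proportion_bwd, options and options_bwd can be sketched in plain Python. The helper resolve_memory_proportions is hypothetical and is not part of ipu_tensorflow_addons; it only mirrors the rules stated in the parameter descriptions, assuming that a result of None means "use the Popnn default". How options affects the backward phase when options_bwd is absent is not stated in the text, so the sketch leaves the backward value untouched in that case.

```python
def resolve_memory_proportions(amp_fwd=None, amp_bwd=None,
                               options=None, options_bwd=None):
    """Hypothetical helper: returns (forward, backward) proportions.

    None in the result means "fall back to the Popnn default".
    """
    key = "availableMemoryProportion"

    def normalise(value):
        # Both -1 and None request the Popnn default.
        return None if value is None or value == -1 else value

    fwd = normalise(amp_fwd)
    bwd = normalise(amp_bwd)
    # If the backward argument is None, the forward value applies to both phases.
    if amp_bwd is None:
        bwd = fwd
    # A value in the options dictionaries takes priority over the
    # deprecated keyword arguments.
    if options and key in options:
        fwd = options[key]
    if options_bwd and key in options_bwd:
        bwd = options_bwd[key]
    return fwd, bwd


both_phases = resolve_memory_proportions(amp_fwd=0.3)
overridden = resolve_memory_proportions(
    amp_fwd=0.3, options={"availableMemoryProportion": 0.5})
```

Here both_phases is (0.3, 0.3) because the backward argument was left as None, while in overridden the forward value comes from the options dictionary.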

call(inputs, seq_len, attention_score, initial_state=None, training=True, time_major=True)

Runs the forward step for the AUGRU model.

Parameters
  • inputs – 3-D tensor with shape [time_len, batch_size, input_size].

  • seq_len – 1-D tensor with the sequence length of samples in each batch.

  • attention_score – The output of attention layer, the score of samples in each batch, shaped [batch_size, max_seq_len].

  • initial_state – Initial state tensor, shaped [batch_size, num_units]. If not provided, the state is initialized to zeros.

  • training – whether this operation will be used in training or inference.

  • time_major – whether the time dimension is the first dimension.

Returns

A tuple of output and output state.

  • output: a tensor of shape [time_len, batch_size, num_units].

  • output_state: The output state of the last cell.

Raises

ValueError – if initial_state is not valid.
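
For orientation, the role of attention_score can be illustrated with the AUGRU update rule as published for the DIEN (Deep Interest Evolution Network) model, in which the attention score scales the update gate. The following is a sketch for a single unit and a single sample in plain Python. The weight names are placeholders, biases are omitted, and the code makes no claim about how Popnn implements the layer internally.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def augru_step(x, h_prev, a, w):
    """One AUGRU step for scalar input x, state h_prev and attention score a.

    w is a dict of placeholder scalar weights (no biases, for brevity).
    """
    u = sigmoid(w["wu"] * x + w["uu"] * h_prev)     # update gate
    r = sigmoid(w["wr"] * x + w["ur"] * h_prev)     # reset gate
    # Candidate state, using the "reset before multiplication" convention.
    h_cand = math.tanh(w["wh"] * x + w["uh"] * (r * h_prev))
    u_att = a * u                                   # attention-scaled update gate
    return (1.0 - u_att) * h_prev + u_att * h_cand


weights = {"wu": 0.5, "uu": 0.1, "wr": 0.3, "ur": 0.2, "wh": 0.8, "uh": 0.4}
h = 0.25
h_zero = augru_step(1.0, h, 0.0, weights)   # attention score 0
h_full = augru_step(1.0, h, 1.0, weights)   # attention score 1
```

With an attention score of 0 the state is carried over unchanged; with a score of 1 the step reduces to an ordinary GRU update.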

class ipu_tensorflow_addons.layers.PopnnDynamicGRU(num_units, dtype=tf.float32, partials_dtype=tf.float32, seed=None, weights_initializer=None, bias_initializer=None, activation='tanh', recurrent_activation='sigmoid', return_state=True, name=None, reset_after=False, available_memory_proportion_fwd=None, available_memory_proportion_bwd=None, options=None, options_bwd=None)

XLA compatible, time-major Popnn implementation of a GRU layer, with a sequence length input.

Below is a typical workflow:

with tf.Graph().as_default():
  gru = PopnnDynamicGRU(num_units, ...)

  outputs, output_state = gru(
    inputs, seq_len, initial_state, training=True)

__init__(num_units, dtype=tf.float32, partials_dtype=tf.float32, seed=None, weights_initializer=None, bias_initializer=None, activation='tanh', recurrent_activation='sigmoid', return_state=True, name=None, reset_after=False, available_memory_proportion_fwd=None, available_memory_proportion_bwd=None, options=None, options_bwd=None)

Creates a PopnnDynamicGRU model from model spec.

Parameters
  • num_units – the number of units within the RNN model.

  • dtype – tf.float16 or tf.float32

  • partials_dtype – the type used by Popnn to perform partial calculations. Either tf.float16 or tf.float32.

  • seed – A Python integer. Used to create the default Glorot uniform initializer weights_initializer.

  • weights_initializer – starting value to initialize the weights (default is Glorot uniform initializer).

  • bias_initializer – starting value to initialize the bias (default is all zeros).

  • activation – Activation function. Defaults to “tanh”. Accepted values: “tanh”, “relu”, “softmax”, “sigmoid”, “hard_sigmoid”.

  • recurrent_activation – Recurrent activation function. Defaults to “sigmoid”. Must generate output in the [0,1] range. Accepted values: “tanh”, “softmax”, “sigmoid”, “hard_sigmoid”.

  • return_state – Boolean. Whether to return the last state in addition to the output. Default: True.

  • name – VariableScope for the created subgraph; defaults to class name. This is used as the default scope only if no scope is specified later when invoking __call__().

  • reset_after – GRU convention (whether to apply reset gate after or before matrix multiplication). False = “before” (default), True = “after”. Leave as default (False) to match the behaviour of the standard TensorFlow GRU.

  • available_memory_proportion_fwd – Deprecated, please use options={'availableMemoryProportion': <value>} instead. Maximum fraction of IPU memory which can be used as temporary scratch space during computation, for the forward propagation layer. A value of -1. or None indicates that the default in Popnn should be used. If available_memory_proportion_bwd is set to None, then this value applies to both phases. Note that the value in options['availableMemoryProportion'] will be used if set together with this argument.

  • available_memory_proportion_bwd – Deprecated, please use options_bwd={'availableMemoryProportion': <value>} instead. Maximum fraction of IPU memory which can be used as temporary scratch space during computation, for the backward propagation layer. A value of -1. or None indicates that the default in Popnn should be used. Note that the value in options_bwd['availableMemoryProportion'] will be used if set together with this argument.

  • options – A Python dictionary. Implementation or debug options for the forward GRU cell in PopLibs. See the GRU documentation in the PopLibs API reference for the full list of options.

  • options_bwd – A Python dictionary. Implementation or debug options for the backward GRU cell in PopLibs. See the GRU documentation in the PopLibs API reference for the full list of options.
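
The two reset_after conventions differ in where the reset gate is applied relative to the recurrent matrix multiplication. The sketch below computes only the candidate state of a two-unit cell in plain Python, with made-up weights and without biases. It follows the usual textbook formulation and is an illustration, not Popnn's implementation.

```python
import math


def matvec(m, v):
    """Multiplies matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def candidate_state(x_proj, h_prev, r, u_h, reset_after=False):
    """Candidate state of a GRU cell.

    x_proj: input projection, already computed (list of floats)
    h_prev: previous state; r: reset gate values in [0, 1]
    u_h:    recurrent weight matrix for the candidate state
    """
    if reset_after:
        # "after": multiply first, then apply the reset gate.
        rec = [ri * v for ri, v in zip(r, matvec(u_h, h_prev))]
    else:
        # "before": apply the reset gate to the state, then multiply.
        rec = matvec(u_h, [ri * hi for ri, hi in zip(r, h_prev)])
    return [math.tanh(a + b) for a, b in zip(x_proj, rec)]


x_proj = [0.1, -0.2]
h_prev = [0.5, -0.4]
r = [0.9, 0.2]
u_h = [[0.3, 0.7], [-0.6, 0.2]]

before = candidate_state(x_proj, h_prev, r, u_h, reset_after=False)
after = candidate_state(x_proj, h_prev, r, u_h, reset_after=True)
```

The two results differ whenever the reset gate is not uniform across units, which is why weights trained under one convention cannot in general be reused under the other.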

call(inputs, seq_len, initial_state=None, training=True, time_major=True)

Runs the forward step for the DynamicGRU model.

Parameters
  • inputs – 3-D tensor with shape [time_len, batch_size, input_size] when time_major is True (the default).

  • seq_len – 1-D tensor with the sequence length of samples in each batch.

  • initial_state – Initial state tensor, shaped [batch_size, num_units]. If not provided, the state is initialized to zeros.

  • training – whether this operation will be used in training or inference.

  • time_major – whether the time dimension is the first dimension.

Returns

A tuple of output and output state.

  • output: a tensor of shape [time_len, batch_size, num_units].

  • output_state: The output state of the last cell.

Raises

ValueError – if initial_state is not valid.
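
Because the layer takes a padded batch together with seq_len, variable-length sequences have to be padded to a common time_len first. The sketch below shows one way to build a time-major nested list and the matching seq_len values in plain Python. The helper name pad_time_major is hypothetical, and in practice the result would be converted to tensors before being passed to the layer.

```python
def pad_time_major(sequences, pad_value=0.0):
    """Pads variable-length sequences and lays them out time-major.

    sequences: list (batch) of sequences, each a list of feature vectors.
    Returns (inputs, seq_len) where inputs[t][b] is the feature vector of
    sample b at time step t, and seq_len[b] is the unpadded length.
    """
    seq_len = [len(seq) for seq in sequences]
    time_len = max(seq_len)
    input_size = len(sequences[0][0])
    padding = [pad_value] * input_size
    inputs = [
        [seq[t] if t < len(seq) else list(padding) for seq in sequences]
        for t in range(time_len)
    ]
    return inputs, seq_len


batch = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],  # length 3
    [[7.0, 8.0]],                          # length 1
]
inputs, seq_len = pad_time_major(batch)
# inputs is laid out as [time_len=3, batch_size=2, input_size=2]
```

The padded positions carry no information; seq_len is what tells the layer how many steps of each sample are real.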

class ipu_tensorflow_addons.layers.PopnnDynamicLSTM(num_units, dtype=tf.float32, partials_dtype=tf.float32, seed=None, weights_initializer=None, bias_initializer=None, activation='tanh', recurrent_activation='sigmoid', return_state=True, name=None, available_memory_proportion_fwd=None, available_memory_proportion_bwd=None, options=None, options_bwd=None)

XLA compatible, time-major Popnn implementation of an LSTM layer, with a sequence length input.

call(inputs, seq_len, initial_state=None, training=True)

Runs the forward step for the LSTM model.

Parameters
  • inputs – 3-D tensor with shape [time_len, batch_size, input_size].

  • seq_len – 1-D tensor with the sequence length of samples in each batch.

  • initial_state – An LSTMStateTuple of state tensors, each shaped [batch_size, num_units]. If not provided, the state is initialized to zeros.

  • training – Set to False to use the LSTM model in inference mode.

Returns

A tuple of output and output state.

  • output: a tensor of shape [time_len, batch_size, num_units].

  • output_state: An LSTMStateTuple of the same shape and structure as initial_state.

Raises

ValueError – if initial_state is not valid.

class ipu_tensorflow_addons.layers.PopnnGRU(num_units, dtype=tf.float32, partials_dtype=tf.float32, seed=None, weights_initializer=None, bias_initializer=None, activation='tanh', recurrent_activation='sigmoid', return_state=True, name=None, reset_after=False, available_memory_proportion_fwd=None, available_memory_proportion_bwd=None, options=None, options_bwd=None)

XLA compatible, time-major Popnn implementation of a GRU layer.

Below is a typical workflow:

with tf.Graph().as_default():
  gru = PopnnGRU(num_units, ...)

  outputs, output_state = gru(inputs, initial_state, training=True)

__init__(num_units, dtype=tf.float32, partials_dtype=tf.float32, seed=None, weights_initializer=None, bias_initializer=None, activation='tanh', recurrent_activation='sigmoid', return_state=True, name=None, reset_after=False, available_memory_proportion_fwd=None, available_memory_proportion_bwd=None, options=None, options_bwd=None)

Creates a PopnnGRU model from model spec.

Parameters
  • num_units – the number of units within the GRU model.

  • dtype – tf.float16 or tf.float32

  • partials_dtype – the type used by Popnn to perform partial calculations. Either tf.float16 or tf.float32.

  • seed – A Python integer. Used to create the default Glorot uniform initializer weights_initializer.

  • weights_initializer – starting value to initialize the weights (default is Glorot uniform initializer).

  • bias_initializer – starting value to initialize the bias (default is all zeros).

  • activation – Activation function. Defaults to “tanh”. Accepted values: “tanh”, “relu”, “softmax”, “sigmoid”, “hard_sigmoid”.

  • recurrent_activation – Recurrent activation function. Defaults to “sigmoid”. Must generate output in the [0,1] range. Accepted values: “tanh”, “softmax”, “sigmoid”, “hard_sigmoid”.

  • return_state – Boolean. Whether to return the last state in addition to the output. Default: True.

  • name – VariableScope for the created subgraph; defaults to class name. This is used as the default scope only if no scope is specified later when invoking __call__().

  • reset_after – GRU convention (whether to apply reset gate after or before matrix multiplication). False = “before” (default), True = “after”. Leave as default (False) to match the behaviour of the standard TensorFlow GRU.

  • available_memory_proportion_fwd – Deprecated, please use options={'availableMemoryProportion': <value>} instead. Maximum fraction of IPU memory which can be used as temporary scratch space during computation, for the forward propagation layer. A value of -1. or None indicates that the default in Popnn should be used. If available_memory_proportion_bwd is set to None, then this value applies to both phases. Note that the value in options['availableMemoryProportion'] will be used if set together with this argument.

  • available_memory_proportion_bwd – Deprecated, please use options_bwd={'availableMemoryProportion': <value>} instead. Maximum fraction of IPU memory which can be used as temporary scratch space during computation, for the backward propagation layer. A value of -1. or None indicates that the default in Popnn should be used. Note that the value in options_bwd['availableMemoryProportion'] will be used if set together with this argument.

  • options – A Python dictionary. Implementation or debug options for the forward GRU cell in PopLibs. See the GRU documentation in the PopLibs API reference for the full list of options.

  • options_bwd – A Python dictionary. Implementation or debug options for the backward GRU cell in PopLibs. See the GRU documentation in the PopLibs API reference for the full list of options.
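
The effect of partials_dtype can be illustrated outside TensorFlow: accumulating many small values in half precision loses accuracy that single precision retains. The sketch below emulates the two formats with Python's struct module. It is an analogy for why partial results may be kept in tf.float32, not a model of Popnn's arithmetic.

```python
import struct


def round_to(fmt, value):
    """Rounds a Python float to IEEE half ('e') or single ('f') precision."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def accumulate(values, fmt):
    """Sums values, rounding every partial result to the given format."""
    total = 0.0
    for v in values:
        total = round_to(fmt, total + round_to(fmt, v))
    return total


values = [0.001] * 10000          # exact sum is 10.0
half_sum = accumulate(values, "e")
single_sum = accumulate(values, "f")
half_error = abs(half_sum - 10.0)
single_error = abs(single_sum - 10.0)
```

In half precision the running total eventually becomes too large for the small addend to register, so the sum stalls well short of 10, whereas the single-precision total stays close to it.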

build(input_shape)

Creates the variables of the PopnnGRU layer.

It can be called manually before __call__() or automatically through __call__(). In the former case, any subsequent __call__() will skip creating variables.

Parameters

input_shape – a TensorShape object with 3 dimensions.

Raises

ValueError – if input_shape has the wrong number of dimensions or an unknown third dimension.
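
The checks that build() is documented to perform can be sketched as follows. check_input_shape is a hypothetical stand-in that operates on a plain tuple, where None marks an unknown dimension; the real method receives a TensorShape and may differ in detail.

```python
def check_input_shape(input_shape):
    """Hypothetical stand-in for the validation described for build().

    input_shape: tuple such as (time_len, batch_size, input_size);
    None marks a dimension that is not known statically.
    Returns the input size, which determines the weight shapes.
    """
    if len(input_shape) != 3:
        raise ValueError(
            "Expected a 3-D input shape, got %d dimensions" % len(input_shape))
    if input_shape[2] is None:
        raise ValueError("The third dimension (input_size) must be known")
    return input_shape[2]


input_size = check_input_shape((20, 8, 16))
```

The third dimension has to be known because the weight matrices are sized from it; the time and batch dimensions do not affect the variables created.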

call(inputs, initial_state=None, training=True)

Runs the forward step for the GRU model.

Parameters
  • inputs – 3-D tensor with shape [time_len, batch_size, input_size].

  • initial_state – Initial state tensor, shaped [batch_size, num_units]. If not provided, the state is initialized to zeros.

  • training – Set to False to use the GRU model in inference mode.

Returns

A tuple of output and output_state.

  • output: a tensor of shape [time_len, batch_size, num_units].

  • output_state: The output state of the last cell.

Raises

ValueError – if initial_state is not valid.
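
The relationship between output and output_state can be seen in a toy time-major loop. The cell below is a deliberately trivial stand-in (a decaying sum with one unit per sample), not a GRU. The point is only that output collects the state at every time step while output_state is the state after the final step.

```python
def run_time_major(inputs, initial_state, cell):
    """Applies cell over the leading (time) axis of inputs.

    inputs: nested list indexed as inputs[t][b];
    initial_state: list with one entry per sample in the batch.
    Returns (output, output_state).
    """
    state = list(initial_state)
    output = []
    for step in inputs:                       # iterate over time_len
        state = [cell(x, h) for x, h in zip(step, state)]
        output.append(state)
    return output, state


def toy_cell(x, h):
    # Placeholder recurrence, not a real GRU cell.
    return 0.5 * h + x


inputs = [[1.0, 2.0], [0.0, 1.0], [3.0, 0.0]]   # time_len=3, batch_size=2
output, output_state = run_time_major(inputs, [0.0, 0.0], toy_cell)
```

In this sketch output[-1] and output_state are the same values, which matches the description of output_state as the output state of the last cell.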

state_shape(batch_size)

Shape of Popnn GRU state.

State shape is [batch_size, num_units].

Parameters

batch_size – an int

Returns

A Python list of the form [batch_size, num_units].
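
A sketch of how the documented state shape could be used to build a zero initial state by hand. Both functions are hypothetical plain-Python stand-ins that follow the shape stated above; they are not the layer's own code.

```python
def gru_state_shape(batch_size, num_units):
    """Mirrors the documented GRU state shape: [batch_size, num_units]."""
    return [batch_size, num_units]


def zeros(shape):
    """Builds a nested list of zeros with the given 2-D shape."""
    rows, cols = shape
    return [[0.0] * cols for _ in range(rows)]


shape = gru_state_shape(batch_size=4, num_units=3)
initial_state = zeros(shape)
```

Passing such a state explicitly is equivalent in spirit to omitting initial_state, since the state is documented to default to zeros.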

class ipu_tensorflow_addons.layers.PopnnLSTM(num_units, dtype=tf.float32, partials_dtype=tf.float32, seed=None, weights_initializer=None, bias_initializer=None, activation='tanh', recurrent_activation='sigmoid', return_state=True, name=None, available_memory_proportion_fwd=None, available_memory_proportion_bwd=None, options=None, options_bwd=None)

XLA compatible, time-major Popnn implementation of an LSTM layer.

Below is a typical workflow:

with tf.Graph().as_default():
  lstm = PopnnLSTM(num_units, ...)

  outputs, output_states = lstm(inputs, initial_states, training=True)

__init__(num_units, dtype=tf.float32, partials_dtype=tf.float32, seed=None, weights_initializer=None, bias_initializer=None, activation='tanh', recurrent_activation='sigmoid', return_state=True, name=None, available_memory_proportion_fwd=None, available_memory_proportion_bwd=None, options=None, options_bwd=None)

Creates a PopnnLSTM model from model spec.

Parameters
  • num_units – the number of units within the LSTM model.

  • dtype – tf.float16 or tf.float32

  • partials_dtype – the type used by Popnn to perform partial calculations. Either tf.float16 or tf.float32.

  • seed – A Python integer. Used to create the default Glorot uniform initializer weights_initializer.

  • weights_initializer – starting value to initialize the weights (default is Glorot uniform initializer).

  • bias_initializer – starting value to initialize the bias (default is all zeros).

  • activation – Activation function. Defaults to “tanh”. Accepted values: “tanh”, “relu”, “softmax”, “sigmoid”, “hard_sigmoid”.

  • recurrent_activation – Recurrent activation function. Defaults to “sigmoid”. Must generate output in the [0,1] range. Accepted values: “tanh”, “softmax”, “sigmoid”, “hard_sigmoid”.

  • return_state – Boolean. Whether to return the last state in addition to the output. Default: True.

  • name – VariableScope for the created subgraph; defaults to class name. This is used as the default scope only if no scope is specified later when invoking __call__().

  • available_memory_proportion_fwd – Deprecated, please use options={'availableMemoryProportion': <value>} instead. Maximum fraction of IPU memory which can be used as temporary scratch space during computation, for the forward propagation layer. A value of -1. or None indicates that the default in Popnn should be used. If available_memory_proportion_bwd is set to None, then this value applies to both phases. Note that the value in options['availableMemoryProportion'] will be used if set together with this argument.

  • available_memory_proportion_bwd – Deprecated, please use options_bwd={'availableMemoryProportion': <value>} instead. Maximum fraction of IPU memory which can be used as temporary scratch space during computation, for the backward propagation layer. A value of -1. or None indicates that the default in Popnn should be used. Note that the value in options_bwd['availableMemoryProportion'] will be used if set together with this argument.

  • options – A Python dictionary. Implementation or debug options for the forward LSTM cell in PopLibs. See the LSTM documentation in the PopLibs API reference for the full list of options.

  • options_bwd – A Python dictionary. Implementation or debug options for the backward LSTM cell in PopLibs. See the LSTM documentation in the PopLibs API reference for the full list of options.
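
The default weights initializer is described as Glorot uniform, which samples from a uniform distribution whose limit depends on the fan-in and fan-out of the weight matrix. The sketch below uses Python's random module to show the formula and the role of seed. TensorFlow uses its own random number generator, so the actual values produced by the layer will differ.

```python
import math
import random


def glorot_uniform(fan_in, fan_out, seed=None):
    """Samples a fan_in x fan_out matrix from U(-limit, limit).

    limit = sqrt(6 / (fan_in + fan_out)), the Glorot (Xavier) uniform rule.
    """
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    rng = random.Random(seed)
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


w_a = glorot_uniform(4, 8, seed=42)
w_b = glorot_uniform(4, 8, seed=42)
limit = math.sqrt(6.0 / (4 + 8))
```

Using the same seed reproduces the same matrix, which is the purpose of the seed argument when no explicit weights_initializer is given.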

build(input_shape)

Creates the variables of the PopnnLSTM layer.

It can be called manually before __call__() or automatically through __call__(). In the former case, any subsequent __call__() will skip creating variables.

Parameters

input_shape – a TensorShape object with 3 dimensions.

Raises

ValueError – if input_shape has the wrong number of dimensions or an unknown third dimension.

call(inputs, initial_state=None, training=True)

Runs the forward step for the LSTM model.

Parameters
  • inputs – 3-D tensor with shape [time_len, batch_size, input_size].

  • initial_state – An LSTMStateTuple of state tensors, each shaped [batch_size, num_units]. If not provided, the state is initialized to zeros.

  • training – Set to False to use the LSTM model in inference mode.

Returns

A tuple of output and output state.

  • output: a tensor of shape [time_len, batch_size, num_units].

  • output_state: An LSTMStateTuple of the same shape and structure as initial_state.

Raises

ValueError – if initial_state is not valid.
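
To see why the LSTM state is a tuple of two tensors, consider a single LSTM step for one unit, written in plain Python. The namedtuple LSTMState is a local stand-in for LSTMStateTuple, the weights are placeholders, and the equations are the standard LSTM formulation rather than a description of Popnn internals.

```python
import math
from collections import namedtuple

# Local stand-in for LSTMStateTuple: cell state c and hidden state h.
LSTMState = namedtuple("LSTMState", ["c", "h"])


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, state, w):
    """One LSTM step for scalar input x; w maps gate name -> (w_x, w_h, bias)."""
    def gate(name, fn):
        w_x, w_h, b = w[name]
        return fn(w_x * x + w_h * state.h + b)

    i = gate("input", sigmoid)
    f = gate("forget", sigmoid)
    o = gate("output", sigmoid)
    g = gate("candidate", math.tanh)
    c = f * state.c + i * g          # new cell state
    h = o * math.tanh(c)             # new hidden state, also the step output
    return h, LSTMState(c=c, h=h)


weights = {name: (0.5, 0.1, 0.0)
           for name in ("input", "forget", "output", "candidate")}
state = LSTMState(c=0.0, h=0.0)      # zero initial state
for x in [1.0, -0.5, 0.25]:
    output, state = lstm_step(x, state, weights)
```

Both c and h must be carried from step to step, so the initial and final states each hold two values per unit, in contrast to the single state tensor of a GRU.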

state_shape(batch_size)

Shape of Popnn LSTM states.

The state shape is a 2-element tuple; each element is [batch_size, num_units].

Parameters

batch_size – an int

Returns

A tuple of two Python lists, each of the form [batch_size, num_units].

23.2. TensorFlow optimizers

23.2.1. Optimizers made for IPU TensorFlow