5.2. TensorFlow 2
3.1.0
New features
Simplified public API for collectives.
Bug Fixes
None
Other improvements
Improved documentation on recomputation.
Known issues
TensorFlow 2 IPUStrategy is initialised with the wrong number of replicas.
When training a model with multiple replicas, this issue causes an optimizer to apply the sum of the gradients over all the replicas instead of the mean. A workaround is for the user to provide a gradient_transformer function to the optimizer that divides the gradients (after the sum-reduction across replicas) by the number of replicas being used. See, for example, the optimizer_factory module in our TensorFlow 2 CNN example.
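A minimal sketch of such a transformer is shown below. It assumes a Keras optimizer that accepts the gradient_transformers argument (part of the TF 2 Keras OptimizerV2 API); the replica count used here is a placeholder and must match the replication factor configured for the IPUStrategy.

    import tensorflow as tf

    NUM_REPLICAS = 4  # assumption: set to the replication factor in use

    def divide_by_replicas(grads_and_vars):
        # Gradients arrive sum-reduced over all replicas; divide by the
        # replica count to recover the mean.
        return [(g / NUM_REPLICAS, v) for g, v in grads_and_vars]

    # Any Keras optimizer that accepts `gradient_transformers` can be used.
    optimizer = tf.keras.optimizers.SGD(
        learning_rate=0.01,
        gradient_transformers=[divide_by_replicas])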
The program can terminate unexpectedly when providing a 0-dimensional Tensor to a tf.Dataset.
Compatibility changes
Compatibility changes are listed in the user guide Targeting the IPU from TensorFlow 2.
3.0.0
New features
Disallow infeeds and outfeeds where the shape contains zero-sized dimensions.
Add tensorflow.python.ipu.distributed.host_collective_ops, which contains allgather, allreduce and broadcast ops. These are used by PopDistStrategy instead of collectives that use Horovod.
Allow broadcasting of rank 0 tensors, which removes the dependency on Horovod (due to be removed in the next SDK release).
Use popdist::run to prevent host sync timeouts.
Better granularity in allocation priorities.
Extend the API for exporting models for tf.serving with the possibility of passing an optional pre- or post-processing function to be executed on the CPU as part of the exported model graph. This enables you to export models for TensorFlow Serving with pre- or post-processing computations executed on the server CPU.
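As an illustration only, the sketch below shows how such an export might look. It assumes the serving.export_single_step entry point and a preprocessing_step keyword argument; both names should be checked against the serving API reference, and the model itself is a hypothetical stand-in to keep the sketch self-contained.

    import tensorflow as tf
    from tensorflow.python.ipu import serving

    # Hypothetical model, used only to make the sketch self-contained.
    model = tf.keras.Sequential([tf.keras.layers.Flatten(),
                                 tf.keras.layers.Dense(10)])

    @tf.function(input_signature=[tf.TensorSpec([None, 28, 28], tf.uint8)])
    def preprocess(images):
        # Executed on the server CPU as part of the exported graph.
        return tf.cast(images, tf.float32) / 255.0

    @tf.function(input_signature=[tf.TensorSpec([None, 28, 28], tf.float32)])
    def predict_step(images):
        # Executed on the IPU.
        return model(images, training=False)

    serving.export_single_step(
        predict_step,
        export_dir="exported_model",
        iterations=1,                   # device iterations per execution
        preprocessing_step=preprocess)  # parameter name is an assumption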
Bug Fixes
Restrict reduction pre-apply cases where the reduction is the right-hand side of a SUB or a DIVIDE.
Fix replicated tensor sharding (RTS) for LAMB.
Other improvements
None
Known issues
TensorFlow 2 IPUStrategy is initialised with the wrong number of replicas.
When training a model with multiple replicas, this issue causes an optimizer to apply the sum of the gradients over all the replicas instead of the mean. A workaround is for the user to provide a gradient_transformer function to the optimizer that divides the gradients (after the sum-reduction across replicas) by the number of replicas being used. See, for example, the optimizer_factory module in our TensorFlow 2 CNN example.
The program can terminate unexpectedly when providing a 0-dimensional Tensor to a tf.Dataset.
Compatibility changes
The tensorflow.python.ipu.horovod module has been deprecated and will be removed in the next release.
You should change your code to use PopDistStrategy from tensorflow.python.ipu.distributed.popdist_strategy.
The allgather, allreduce and broadcast functions from tensorflow.python.ipu.horovod have been copied to tensorflow.python.ipu.distributed.
If your code used allgather, allreduce and broadcast from tensorflow.python.ipu.horovod, you should update it to use the functions from tensorflow.python.ipu.distributed.host_collective_ops instead (a sketch of this migration is given at the end of this section).
For this release the following functions use Horovod. In the next release these functions will be aliases for functions in tensorflow.python.ipu.distributed.host_collective_ops:
tensorflow.python.ipu.distributed.allgather
tensorflow.python.ipu.distributed.allreduce
tensorflow.python.ipu.distributed.broadcast
IPUHorovodStrategy is also available from tensorflow.python.ipu.distributed.ipu_horovod_strategy in this release but will be removed in the next release.
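To summarise the migration, a minimal sketch is given below. It assumes the program is launched with PopDist (for example via poprun) and shows only the changed imports and call sites, not a complete training program; the exact argument lists of the host collective functions are not given in these notes, so check the API reference before relying on them.

    import tensorflow as tf
    # Previously: from tensorflow.python.ipu.horovod import allreduce
    from tensorflow.python.ipu.distributed import host_collective_ops
    # Previously: IPUHorovodStrategy
    from tensorflow.python.ipu.distributed.popdist_strategy import PopDistStrategy

    # Distribution strategy for a PopDist launch (replaces IPUHorovodStrategy).
    strategy = PopDistStrategy()

    local_value = tf.constant([1.0, 2.0])
    # Reduce the value across all PopDist instances on the host.
    total = host_collective_ops.allreduce(local_value)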