5.2. TensorFlow 2
3.1.0
New features
Simplified public API for collectives.
Bug Fixes
None
Other improvements
Improved documentation on recomputation.
Known issues
TensorFlow 2 IPUStrategy is initialised with the wrong number of replicas.
When training a model with multiple replicas, this issue causes an optimizer to apply the sum of the gradients over all the replicas instead of the mean. A workaround is for the user to provide a gradient_transformer function to the optimizer that divides the gradients (after the sum-reduction across replicas) by the number of replicas being used. See, for example, the optimizer_factory module in our TensorFlow 2 CNN example.
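A minimal sketch of such a transformer is shown below. It assumes a Keras optimizer that accepts the gradient_transformers argument (part of the TF 2 Keras OptimizerV2 API); the replica count used here is a placeholder and must match the replication factor configured for the IPUStrategy.

    import tensorflow as tf

    NUM_REPLICAS = 4  # assumption: set to the replication factor in use

    def divide_by_replicas(grads_and_vars):
        # Gradients arrive sum-reduced over all replicas; divide by the
        # replica count to recover the mean.
        return [(g / NUM_REPLICAS, v) for g, v in grads_and_vars]

    # Any Keras optimizer that accepts `gradient_transformers` can be used.
    optimizer = tf.keras.optimizers.SGD(
        learning_rate=0.01,
        gradient_transformers=[divide_by_replicas])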
The program can terminate unexpectedly when providing a 0-dimensional Tensor to a tf.Dataset.
Compatibility changes
Compatibility changes are listed in the user guide Targeting the IPU from TensorFlow 2.
3.0.0
New features
Disallow infeeds and outfeeds where the shape contains zero-sized dimensions.
Add tensorflow.python.ipu.distributed.host_collective_ops, which contains allgather, allreduce and broadcast ops. These are used by PopDistStrategy instead of collectives that use Horovod.
Allow broadcasting of rank 0 tensors, which removes the dependency on Horovod (due to be removed in the next SDK release).
Use popdist::run to prevent host sync timeouts.
Better granularity in allocation priorities.
Extend the API for exporting models for tf.serving with the possibility of passing an optional pre- or post-processing function to be executed on the CPU as part of the exported model graph. This enables you to export models for TensorFlow Serving with pre- or post-processing computations executed on the server CPU.
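As an illustration only, the sketch below shows how such an export might look. It assumes the serving.export_single_step entry point and a preprocessing_step keyword argument; both names should be checked against the serving API reference, and the model itself is a hypothetical stand-in to keep the sketch self-contained.

    import tensorflow as tf
    from tensorflow.python.ipu import serving

    # Hypothetical model, used only to make the sketch self-contained.
    model = tf.keras.Sequential([tf.keras.layers.Flatten(),
                                 tf.keras.layers.Dense(10)])

    @tf.function(input_signature=[tf.TensorSpec([None, 28, 28], tf.uint8)])
    def preprocess(images):
        # Executed on the server CPU as part of the exported graph.
        return tf.cast(images, tf.float32) / 255.0

    @tf.function(input_signature=[tf.TensorSpec([None, 28, 28], tf.float32)])
    def predict_step(images):
        # Executed on the IPU.
        return model(images, training=False)

    serving.export_single_step(
        predict_step,
        export_dir="exported_model",
        iterations=1,                   # device iterations per execution
        preprocessing_step=preprocess)  # parameter name is an assumption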
Bug Fixes
Restrict reduction pre-apply cases where the reduction is the right-hand side of a SUB or a DIVIDE.
Fix replicated tensor sharding (RTS) for LAMB.
Other improvements
None
Known issues
TensorFlow 2 IPUStrategy is initialised with the wrong number of replicas.
When training a model with multiple replicas, this issue causes an optimizer to apply the sum of the gradients over all the replicas instead of the mean. A workaround is for the user to provide a gradient_transformer function to the optimizer that divides the gradients (after the sum-reduction across replicas) by the number of replicas being used. See, for example, the optimizer_factory module in our TensorFlow 2 CNN example.
The program can terminate unexpectedly when providing a 0-dimensional Tensor to a tf.Dataset.
Compatibility changes
The tensorflow.python.ipu.horovod module has been deprecated and will be removed in the next release.
You should change your code to use PopDistStrategy from tensorflow.python.ipu.distributed.popdist_strategy.
The allgather, allreduce and broadcast functions from tensorflow.python.ipu.horovod have been copied to tensorflow.python.ipu.distributed.
If your code used allgather, allreduce and broadcast from tensorflow.python.ipu.horovod, you should update it to use the functions from tensorflow.python.ipu.distributed.host_collective_ops instead (a sketch of this migration is given at the end of this section).
For this release the following functions use Horovod. In the next release these functions will be aliases for functions in tensorflow.python.ipu.distributed.host_collective_ops:
tensorflow.python.ipu.distributed.allgather
tensorflow.python.ipu.distributed.allreduce
tensorflow.python.ipu.distributed.broadcast
IPUHorovodStrategy is also available from tensorflow.python.ipu.distributed.ipu_horovod_strategy in this release but will be removed in the next release.
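To summarise the migration, a minimal sketch is given below. It assumes the program is launched with PopDist (for example via poprun) and shows only the changed imports and call sites, not a complete training program; the exact argument lists of the host collective functions are not given in these notes, so check the API reference before relying on them.

    import tensorflow as tf
    # Previously: from tensorflow.python.ipu.horovod import allreduce
    from tensorflow.python.ipu.distributed import host_collective_ops
    # Previously: IPUHorovodStrategy
    from tensorflow.python.ipu.distributed.popdist_strategy import PopDistStrategy

    # Distribution strategy for a PopDist launch (replaces IPUHorovodStrategy).
    strategy = PopDistStrategy()

    local_value = tf.constant([1.0, 2.0])
    # Reduce the value across all PopDist instances on the host.
    total = host_collective_ops.allreduce(local_value)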