5.2. TensorFlow 2
3.1.0
New features
Simplified public API for collectives.
Bug Fixes
None
Other improvements
Improved documentation on recomputation.
Known issues
TensorFlow 2 IPUStrategy is initialised with the wrong number of replicas. When training a model with multiple replicas, this issue causes an optimizer to apply the sum of the gradients over all the replicas instead of the mean. A workaround is for the user to provide a gradient_transformer function to the optimizer that divides the gradients (after the sum-reduction across replicas) by the number of replicas being used. See, for example, the optimizer_factory module in our TensorFlow 2 CNN example, or the sketch after this list.
The program can terminate unexpectedly when providing a 0-dimensional Tensor to a tf.Dataset.
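A minimal sketch of the workaround described above, assuming a Keras optimizer is used (Keras optimizers in TF 2.x accept a gradient_transformers argument: a list of callables applied to the (gradient, variable) pairs before they are applied). The replica count below is illustrative; use the replication factor configured for your IPU system.

    import tensorflow as tf

    NUM_REPLICAS = 4  # assumption: the replication factor used for training

    def mean_over_replicas(grads_and_vars):
        # Divide the summed gradients by the replica count so that the
        # optimizer applies the mean gradient rather than the sum.
        return [(g / NUM_REPLICAS, v) for g, v in grads_and_vars]

    opt = tf.keras.optimizers.SGD(
        learning_rate=0.01,
        gradient_transformers=[mean_over_replicas])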
Compatibility changes
Compatibility changes are listed in the user guide, Targeting the IPU from TensorFlow 2.
3.0.0
New features
Disallow infeeds and outfeeds where the shape contains zero-sized dimensions.
Add tensorflow.python.ipu.distributed.host_collective_ops, which contains allgather, allreduce and broadcast ops. These are used by PopDistStrategy instead of collectives that use Horovod. A sketch of their use is shown after this list.
Allow broadcasting of rank 0 tensors, removing the dependency on Horovod, which is due to be removed in the next SDK release.
Use popdist::run to prevent host sync timeouts.
Better granularity in allocation priorities.
Extend the API for exporting models for tf.serving to accept an optional pre- or post-processing function that is executed on the CPU as part of the exported model graph. This enables you to export models for TensorFlow Serving with pre- or post-processing computations executed on the server CPU; a sketch is shown after this list.
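A minimal sketch of the new host collectives. It assumes the functions keep the single-tensor call style of the Horovod-backed versions they replace, and that the program is launched across multiple instances with poprun; check the API reference for the exact signatures.

    import tensorflow as tf
    from tensorflow.python.ipu.distributed import host_collective_ops

    x = tf.constant([1.0, 2.0])
    total = host_collective_ops.allreduce(x)       # element-wise reduction across instances
    gathered = host_collective_ops.allgather(x)    # concatenation across instances
    shared = host_collective_ops.broadcast(x)      # value from the root instance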
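A sketch of the extended export API, assuming a preprocessing_step keyword argument on ipu.serving.export_single_step; the exact argument names and positions should be checked against the API reference.

    import tensorflow as tf
    from tensorflow.python import ipu

    @tf.function(input_signature=[tf.TensorSpec([1, 4], tf.uint8)])
    def preprocessing_step(raw):
        # Executed on the server CPU as part of the exported graph.
        return tf.cast(raw, tf.float32) / 255.0

    @tf.function(input_signature=[tf.TensorSpec([1, 4], tf.float32)])
    def predict_step(x):
        # Executed on the IPU.
        return x * 2.0

    ipu.serving.export_single_step(
        predict_step, "exported_model", iterations=1,
        preprocessing_step=preprocessing_step)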
Bug Fixes
Restrict reduction pre-apply cases where the reduction is the right-hand side of a SUB or a DIVIDE.
Fix replicated tensor sharding (RTS) for LAMB.
Other improvements
None
Known issues
TensorFlow 2 IPUStrategy is initialised with the wrong number of replicas. When training a model with multiple replicas, this issue causes an optimizer to apply the sum of the gradients over all the replicas instead of the mean. A workaround is for the user to provide a gradient_transformer function to the optimizer that divides the gradients (after the sum-reduction across replicas) by the number of replicas being used. See, for example, the optimizer_factory module in our TensorFlow 2 CNN example, or the sketch under the 3.1.0 known issues above.
The program can terminate unexpectedly when providing a 0-dimensional Tensor to a tf.Dataset.
Compatibility changes
The tensorflow.python.ipu.horovod module has been deprecated and will be removed in the next release. You should change your code to use PopDistStrategy from tensorflow.python.ipu.distributed.popdist_strategy.
The allgather, allreduce and broadcast functions from tensorflow.python.ipu.horovod have been copied to tensorflow.python.ipu.distributed. If your code used allgather, allreduce and broadcast from tensorflow.python.ipu.horovod, then you should update it to use the functions from tensorflow.python.ipu.distributed.host_collective_ops instead. A migration sketch follows at the end of this section.
For this release the following functions use Horovod. In the next release these functions will be aliases for the functions in tensorflow.python.ipu.distributed.host_collective_ops:
tensorflow.python.ipu.distributed.allgather
tensorflow.python.ipu.distributed.allreduce
tensorflow.python.ipu.distributed.broadcast
IPUHorovodStrategy is also available from tensorflow.python.ipu.distributed.ipu_horovod_strategy in this release, but will be removed in the next release.
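A sketch of the migration, with the deprecated Horovod-based imports shown in comments next to their replacements. The new import paths follow the text above; the old location of PopDistStrategy under tensorflow.python.ipu.horovod is an assumption.

    # Before (deprecated, removed in the next release):
    #   from tensorflow.python.ipu.horovod import allreduce
    #   from tensorflow.python.ipu.horovod.popdist_strategy import PopDistStrategy

    # After:
    from tensorflow.python.ipu.distributed.host_collective_ops import allreduce
    from tensorflow.python.ipu.distributed.popdist_strategy import PopDistStrategy

    strategy = PopDistStrategy()
    with strategy.scope():
        pass  # build, compile and train the model as before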