Targeting the IPU from TensorFlow 2

1. Introduction
- 1.1. Document overview
2. Targeting the Poplar XLA device
- 2.1. Supported types
- 2.2. Device selection
- 2.3. Configuring system options
  - 2.3.1. TF_POPLAR_FLAGS environment variable
- 2.4. Supported operations
- 2.5. Unsupported operations
- 2.6. Error Handling
  - 2.6.1. Construction and compilation errors
  - 2.6.2. Runtime errors
3. Support for TensorFlow 2
- 3.1. IPUStrategy
- 3.2. Execution modes
  - 3.2.1. Graph mode with @tf.function
  - 3.2.2. Eager mode
- 3.3. On-device loops
4. Keras with IPUs
5. Compiling and pre-compiling executables
- 5.1. Caching of compiled executables
- 5.2. Pre-compiling executables
  - 5.2.1. Unsupported Operations
6. Training a model
- 6.1. Training loops, data sets and feed queues
- 6.2. Optional simplification of infeeds and outfeeds
- 6.3. Accessing outfeed queue results during execution
- 6.4. Replicated graphs
  - 6.4.1. Selecting the number of replicas
  - 6.4.2. Performing parameter updates
- 6.5. Pipelined training
- 6.6. Gradient accumulation
- 6.7. Optimizer state offloading
- 6.8. Dataset benchmarking
  - 6.8.1. Accessing the JSON data
7. Efficient IPU I/O
- 7.1. Prefetch elements
- 7.2. I/O Tiles
8. Example using IPUEstimator
9. Example using IPUPipelineEstimator
10. Distributed training
- 10.1. PopDistStrategy examples
11. Half-precision floating point and stochastic rounding
- 11.1. Controlling the half-precision floating-point unit
- 11.2. Resetting the global random number seed
- 11.3. Debugging numerical issues
12. IPU-optimised operations
- 12.1. Image operations
- 12.2. Matmul serialisation
- 12.3. Dropout
- 12.4. Embedding lookup
- 12.5. Group normalisation
- 12.6. Instance normalisation
- 12.7. Layer normalisation
- 12.8. GeLU activation
- 12.9. Sequence slice
- 12.10. Histogram
13. IPU Outlined Functions
- 13.1. Usage
- 13.2. Examples
  - 13.2.1. Models with common structures
  - 13.2.2. Serializing large operations
14. Writing custom operations
- 14.1. Custom operation on the IPU
- 14.2. Custom host CPU operations
  - 14.2.1. Gradient callback
15. IPU host embeddings
- 15.1. Usage
- 15.2. Example
- 15.3. Experimental functionality: IPU embeddings in remote buffers
  - 15.3.1. Partitioning strategies
16. IPU embedded application runtime
- 16.1. Usage
- 16.2. Pipelining and I/O tiles
- 16.3. Example
- 16.4. Error Handling
  - 16.4.1. Runtime errors
17. Exporting precompiled models for TensorFlow Serving
- 17.1. Exporting non-pipelined models defined inside a function
  - 17.1.1. Example of exporting non-pipelined model defined inside a function
  - 17.1.2. Example of exporting non-pipelined model defined inside a function with additional preprocessing and postprocessing steps
- 17.2. Exporting pipelined models defined as a list of functions
  - 17.2.1. Pipeline example
  - 17.2.2. Pipeline example with preprocessing and postprocessing steps
- 17.3. Exporting Keras models
- 17.4. Running the model in TensorFlow Serving
18. Retrieving information about compilation and execution
- 18.1. TensorFlow options for reporting
- 18.2. XLA graph file naming
19. Keras with IPUs
- 19.1. Single IPU models
- 19.2. Using steps_per_execution
- 19.3. Gradient accumulation
- 19.4. Model parallelism
- 19.5. Automatic data parallelism
- 19.6. Asynchronous callbacks
- 19.7. Configuring Infeeds and Outfeed
- 19.8. Saving and loading Keras models
- 19.9. Exporting precompiled Keras models for TensorFlow Serving
- 19.10. IPU-specific Keras layers and optimizers
- 19.11. Implementation details
20. IPU TensorFlow Addons
- 20.1. Introduction
- 20.2. Keras layers
- 20.3. Optimizers
21. TensorFlow API changes
- 21.1. Release 3.0
  - 21.1.1. Non-breaking changes
    - Deprecated modules
- 21.2. Release 2.6
  - 21.2.1. Breaking changes
    - Removal of deprecated APIs
  - 21.2.2. Non-breaking changes
    - IPU Keras changes
- 21.3. Release 2.5
  - 21.3.1. Breaking changes
  - 21.3.2. Non-breaking changes
- 21.4. Release 2.4
  - 21.4.1. Breaking changes
    - Summary ops
    - Removal of deprecated members
  - 21.4.2. Non-breaking changes
- 21.5. Release 2.3
  - 21.5.1. Breaking changes
    - Custom user op metadata interface updates
    - The verified transfers feature has been removed
  - 21.5.2. Non-breaking changes
- 21.6. Release 2.2
  - 21.6.1. Breaking changes
  - 21.6.2. Non-breaking changes
- 21.7. Release 2.1
  - 21.7.1. Breaking changes
  - 21.7.2. Non-breaking changes
- 21.8. Release 2.0
  - 21.8.1. Breaking changes
  - 21.8.2. Non-breaking changes
22. TensorFlow Python API
- 22.1. Operations and utilities related to the Graphcore IPU
- 22.2. Distribution strategy for a single system
- 22.3. Compiler interface
- 22.4. Scoping contexts
- 22.5. Infeed queue
- 22.6. Outfeed queue
- 22.7. General utilities
- 22.8. Configuration utilities
- 22.9. Looping utilities
- 22.10. Distributed training
- 22.11. Horovod
- 22.12. Serving utilities
- 22.13. Datasets
  - 22.13.1. Dataset benchmarking
  - 22.13.2. Dataset wrappers
- 22.14. Estimators
  - 22.14.1. IPUEstimator
  - 22.14.2. IPUPipelineEstimator
  - 22.14.3. Run configs
  - 22.14.4. Session run hooks
- 22.15. Keras
  - 22.15.1. IPU specific Keras extensions
- 22.16. Keras layers
  - 22.16.1. Keras layer specializations for the Graphcore IPU
- 22.17. Keras losses
  - 22.17.1. Keras loss functions for the Graphcore IPU
- 22.18. Keras optimizers
  - 22.18.1. Keras Optimizer wrappers for the Graphcore IPU
- 22.19. Operators
  - 22.19.1. Control flow operations.
  - 22.19.2. Custom operations
  - 22.19.3. Functional operators
  - 22.19.4. Image operations
  - 22.19.5. Graphcore utility operations
  - 22.19.6. IPU specific maths operations
  - 22.19.7. Pipelining operators
  - 22.19.8. Popnn primitive neural network operators
  - 22.19.9. Popnn normalization operators
  - 22.19.10. Popops all to all and all gather operators
  - 22.19.11. Popops cross replica operators
  - 22.19.12. Popops embedding operators
  - 22.19.13. Popops reduce scatter operator
  - 22.19.14. Popops within replica operators
  - 22.19.15. Poprand operators
  - 22.19.16. Utility operations to be used in replicated mode
  - 22.19.17. Slicing operators
  - 22.19.18. Statistics operators
  - 22.19.19. Embedded application runtime
- 22.20. Optimisers
  - 22.20.1. Helper classes and methods for gradient accumulation.
  - 22.20.2. Optimizer classes for the Graphcore IPU
- 22.21. Sharding
  - 22.21.1. Utility functions for sharding graphs
23. TensorFlow operators supported by the IPU
24. Keras API changes
- 24.1. Release 2.6
25. Keras Python API
- 25.1. IPU specific Keras integration
- 25.2. IPU specific Keras extensions
26. IPU TensorFlow Addons API changes
- 26.1. Release 3.0
  - 26.1.1. Breaking changes
- 26.2. Release 2.5
  - 26.2.1. Non-breaking changes
    - RNN available_memory_proportion_fwd/available_memory_proportion_bwd deprecated
- 26.3. Release 2.4
27. IPU TensorFlow Addons Python API
- 27.1. Keras Layers
  - 27.1.1. Keras layers made for IPU TensorFlow
- 27.2. Keras Optimizers
  - 27.2.1. Keras optimizers made for IPU TensorFlow
- 27.3. Legacy TensorFlow Layers
  - 27.3.1. TensorFlow layers made for IPU TensorFlow
- 27.4. Legacy TensorFlow Optimizers
  - 27.4.1. Optimizers made for IPU TensorFlow
28. Resources
- 28.1. Graphcore
- 28.2. TensorFlow
- 28.3. Other
29. Legal notices