Targeting the IPU from TensorFlow 2
- 1. Introduction
- 2. Targeting the Poplar XLA device
- 3. Support for TensorFlow 2
- 4. Keras with IPUs
- 4.1. Single IPU models
- 4.2. Using steps_per_execution
- 4.3. Gradient accumulation
- 4.4. Model parallelism
- 4.5. Automatic data parallelism
- 4.6. Asynchronous callbacks
- 4.7. Configuring Infeeds and Outfeed
- 4.8. Saving and loading Keras models
- 4.9. Exporting precompiled Keras models for TensorFlow Serving
- 4.10. IPU-specific Keras layers and optimizers
- 4.11. Implementation details
- 4.12. Automatic loss scaling
- 5. Compiling and pre-compiling executables
- 6. Training a model
- 6.1. Training loops, data sets and feed queues
- 6.2. Optional simplification of infeeds and outfeeds
- 6.3. Accessing outfeed queue results during execution
- 6.4. Replicated graphs
- 6.5. Pipelined training
- 6.6. Recomputation
- 6.7. Gradient accumulation
- 6.8. Optimizer state offloading
- 6.9. Replicated tensor sharding
- 6.10. Dataset benchmarking
- 7. Efficient IPU I/O
- 8. Example using IPUEstimator
- 9. Example using IPUPipelineEstimator
- 10. Distributed training
- 11. Half-precision floating point and stochastic rounding
- 12. IPU-optimised operations
- 13. IPU Outlined Functions
- 14. Writing custom operations
- 15. IPU host embeddings
- 16. IPU embedded application runtime
- 17. Exporting precompiled models for TensorFlow Serving
- 18. Retrieving information about compilation and execution
- 19. IPU TensorFlow Addons
- 20. TensorFlow API changes
- 21. TensorFlow Python API
- 21.1. Operations and utilities related to the Graphcore IPU
- 21.2. Distribution strategy for a single system
- 21.3. Compiler interface
- 21.4. Scoping contexts
- 21.5. Infeed queue
- 21.6. Outfeed queue
- 21.7. General utilities
- 21.8. Configuration utilities
- 21.9. Looping utilities
- 21.10. Distribution using PopDist
- 21.11. Serving utilities
- 21.12. Datasets
- 21.13. Estimators
- 21.14. Operators
- 21.14.1. Control flow operations
- 21.14.2. Custom operations
- 21.14.3. Functional operators
- 21.14.4. Image operations
- 21.14.5. Graphcore utility operations
- 21.14.6. IPU specific maths operations
- 21.14.7. Pipelining operators
- 21.14.8. Popnn primitive neural network operators
- 21.14.9. Popnn normalization operators
- 21.14.10. Popops all to all and all gather operators
- 21.14.11. Popops cross replica operators
- 21.14.12. Popops embedding operators
- 21.14.13. F8 operations
- 21.14.14. Popops reduce scatter operator
- 21.14.15. Popops within replica operators
- 21.14.16. Poprand operators
- 21.14.17. Utility operations to be used in replicated mode
- 21.14.18. Slicing operators
- 21.14.19. Statistics operators
- 21.14.20. Embedded application runtime
- 21.15. Optimisers
- 21.16. Sharding
- 22. TensorFlow operators supported by the IPU
- 23. Keras API changes
- 24. Keras Python API
- 25. IPU TensorFlow Addons API changes
- 26. IPU TensorFlow Addons Python API
- 27. Resources
- 28. Legal notices