Targeting the IPU from TensorFlow 2
Version: 3.1.0
1. Introduction
1.1. Document overview
2. Targeting the Poplar XLA device
2.1. Supported types
2.2. Device selection
2.3. Configuring system options
2.3.1. TF_POPLAR_FLAGS environment variable
2.4. Supported operations
2.5. Unsupported operations
2.6. Error handling
2.6.1. Construction and compilation errors
2.6.2. Runtime errors
3. Support for TensorFlow 2
3.1. IPUStrategy
3.2. Execution modes
3.2.1. Graph mode with @tf.function
3.2.2. Eager mode
3.3. On-device loops
4. Keras with IPUs
5. Compiling and pre-compiling executables
5.1. Caching of compiled executables
5.2. Pre-compiling executables
5.2.1. Unsupported operations
6. Training a model
6.1. Training loops, data sets and feed queues
6.2. Optional simplification of infeeds and outfeeds
6.3. Accessing outfeed queue results during execution
6.4. Replicated graphs
6.4.1. Selecting the number of replicas
6.4.2. Performing parameter updates
6.5. Pipelined training
6.5.1. Grouped scheduling
6.5.2. Interleaved scheduling
6.5.3. Sequential scheduling
6.5.4. Pipeline stage inputs and outputs
6.5.5. Applying an optimiser to the graph
6.5.6. Device mapping
6.5.7. Concurrent pipeline stages
6.6. Recomputation
6.7. Gradient accumulation
6.7.1. Optimizers
6.7.2. Pipelining
6.7.3. Accumulation data type
6.8. Optimizer state offloading
6.9. Replicated tensor sharding
6.10. Dataset benchmarking
6.10.1. Accessing the JSON data
7. Efficient IPU I/O
7.1. Prefetch elements
7.2. I/O tiles
7.3. uint8 data
8. Example using IPUEstimator
9. Example using IPUPipelineEstimator
10. Distributed training
10.1. PopDistStrategy examples
10.2. Limitations and known issues
11. Half-precision floating point and stochastic rounding
11.1. Controlling the half-precision floating-point unit
11.2. Resetting the global random number seed
11.3. Debugging numerical issues
12. IPU-optimised operations
12.1. Image operations
12.2. Matmul serialisation
12.3. Dropout
12.4. Embedding lookup
12.5. Group normalisation
12.6. Instance normalisation
12.7. Layer normalisation
12.8. GeLU activation
12.9. Sequence slice
12.10. Histogram
13. IPU Outlined Functions
13.1. Usage
13.2. Examples
13.2.1. Models with common structures
13.2.2. Serializing large operations
14. Writing custom operations
14.1. Custom operation on the IPU
14.1.1. Building the Poplar graph
14.1.2. Gradient builders
14.1.3. Metadata
14.1.4. Compiling the IPU code
API level
PopLibs library code
Compiling the library file
14.1.5. Using the custom op in TensorFlow
14.1.6. Tensor allocation
14.1.7. Examples
In-place operations
Operation attributes
Custom codelet
14.2. Custom host CPU operations
14.2.1. Gradient callback
15. IPU host embeddings
15.1. Usage
15.2. Example
15.3. Experimental functionality: IPU embeddings in remote buffers
15.3.1. Partitioning strategies
Token strategy
Encoding strategy
Choosing a strategy for your application
16. IPU embedded application runtime
16.1. Usage
16.2. Pipelining and I/O tiles
16.2.1. Parallel requests
16.2.2. Timeout
16.2.3. Engine restarts
16.3. Example
16.4. Error handling
16.4.1. Runtime errors
17. Exporting precompiled models for TensorFlow Serving
17.1. Exporting non-pipelined models defined inside a function
17.1.1. Example of exporting non-pipelined model defined inside a function
17.1.2. Example of exporting non-pipelined model defined inside a function with additional preprocessing and postprocessing steps
17.2. Exporting pipelined models defined as a list of functions
17.2.1. Pipeline example
17.2.2. Pipeline example with preprocessing and postprocessing steps
17.3. Exporting Keras models
17.4. Running the model in TensorFlow Serving
18. Retrieving information about compilation and execution
18.1. TensorFlow options for reporting
18.2. XLA graph file naming
19. Keras with IPUs
19.1. Single IPU models
19.2. Using steps_per_execution
19.3. Gradient accumulation
19.4. Model parallelism
19.4.1. Sequential model
Pipelining a model containing nested models
19.4.2. Functional model
Pipelining a model you are writing yourself
Pipelining an existing functional model
19.4.3. Model subclass
Pipelining a model you are writing yourself
Pipelining an existing model
19.4.4. Pipelining options
19.5. Automatic data parallelism
19.6. Asynchronous callbacks
19.7. Configuring infeeds and outfeeds
19.8. Saving and loading Keras models
19.9. Exporting precompiled Keras models for TensorFlow Serving
19.9.1. Non-pipelined Keras model example
19.9.2. Non-pipelined Keras model example with additional preprocessing and postprocessing steps
19.9.3. Pipelined Keras model example
19.9.4. Pipelined Keras model example with additional preprocessing and postprocessing steps
19.10. IPU-specific Keras layers and optimizers
19.11. Implementation details
19.12. Automatic loss scaling
20. IPU TensorFlow Addons
20.1. Introduction
20.2. Keras layers
20.2.1. IPU implementations of standard Keras layers
20.2.2. Layers without upstream equivalents
20.2.3. Code example
20.3. Optimizers
21. TensorFlow API changes
21.1. Release 3.1
21.1.1. Breaking changes
Removal of deprecated Keras API
Removal of deprecated Horovod API
21.2. Release 3.0
21.2.1. Non-breaking changes
Deprecated modules
21.3. Release 2.6
21.3.1. Breaking changes
Removal of deprecated APIs
21.3.2. Non-breaking changes
IPU Keras changes
21.4. Release 2.5
21.4.1. Breaking changes
IPU Keras changes
Removal of deprecated APIs
Other
21.4.2. Non-breaking changes
Deprecated layers
Deprecated pipeline and gradient_accumulation options
RNN available_memory_proportion_fwd/available_memory_proportion_bwd deprecated
21.5. Release 2.4
21.5.1. Breaking changes
Summary ops
Removal of deprecated members
21.5.2. Non-breaking changes
21.6. Release 2.3
21.6.1. Breaking changes
Custom user op metadata interface updates
The verified transfers feature has been removed
21.6.2. Non-breaking changes
21.7. Release 2.2
21.7.1. Breaking changes
C++ Poplar TensorFlow libraries are private by default
Reports removed from ipu events
TensorFlow 2.1 to TensorFlow 2.4 migration
21.7.2. Non-breaking changes
IPULoggingTensorHook replication_factor deprecated
IPUInfeedQueue/IPUOutfeedQueue/IPULoggingTensorHook feed_name deprecated
Change of output location for profiling information
Warning when epsilon value is too low
21.8. Release 2.1
21.8.1. Breaking changes
IPUPipelineEstimator change
Autosharding removed
Old IPU option configuration API changes
IPU Keras changes [TensorFlow 2]
21.8.2. Non-breaking changes
Recompute suggestions deprecated
IPUInfeedQueue/IPUOutfeedQueue replication_factor deprecated
IPUInfeedQueue data_to_prefetch deprecated
IPUOutfeedQueue data_to_prefetch deprecated
CTC loss ops deprecated
New configuration API
Support for grouped collectives
Environment variable changes
21.9. Release 2.0
21.9.1. Breaking changes
21.9.2. Non-breaking changes
IPUPipelineEstimator change
Autosharding deprecated
IPU config change
IPU Keras changes [TensorFlow 2]
22. TensorFlow Python API
22.1. Operations and utilities related to the Graphcore IPU
22.2. Distribution strategy for a single system
22.3. Compiler interface
22.4. Scoping contexts
22.5. Infeed queue
22.6. Outfeed queue
22.7. General utilities
22.8. Configuration utilities
22.9. Looping utilities
22.10. Distribution using PopDist
22.11. Serving utilities
22.12. Datasets
22.12.1. Dataset benchmarking
22.12.2. Dataset wrappers
22.13. Estimators
22.13.1. IPUEstimator
22.13.2. IPUPipelineEstimator
22.13.3. Run configs
22.13.4. Session run hooks
22.14. Operators
22.14.1. Control flow operations
22.14.2. Custom operations
22.14.3. Functional operators
22.14.4. Image operations
22.14.5. Graphcore utility operations
22.14.6. IPU specific maths operations
22.14.7. Pipelining operators
22.14.8. Popnn primitive neural network operators
22.14.9. Popnn normalization operators
22.14.10. Popops all to all and all gather operators
22.14.11. Popops cross replica operators
22.14.12. Popops embedding operators
22.14.13. Popops reduce scatter operator
22.14.14. Popops within replica operators
22.14.15. Poprand operators
22.14.16. Utility operations to be used in replicated mode
22.14.17. Slicing operators
22.14.18. Statistics operators
22.14.19. Embedded application runtime
22.15. Optimisers
22.15.1. Helper classes and methods for gradient accumulation
22.15.2. Optimizer classes for the Graphcore IPU
22.16. Sharding
22.16.1. Utility functions for sharding graphs
23. TensorFlow operators supported by the IPU
24. Keras API changes
24.1. Release 2.6
25. Keras Python API
25.1. IPU specific Keras integration
25.2. IPU specific Keras extensions
25.3. Keras Optimizer specializations for the Graphcore IPU
26. IPU TensorFlow Addons API changes
26.1. Release 3.0
26.1.1. Breaking changes
26.2. Release 2.5
26.2.1. Non-breaking changes
RNN available_memory_proportion_fwd/available_memory_proportion_bwd deprecated
26.3. Release 2.4
27. IPU TensorFlow Addons Python API
27.1. Keras Layers
27.1.1. Keras layers made for IPU TensorFlow
27.2. Keras Optimizers
27.2.1. Keras optimizers made for IPU TensorFlow
27.3. Legacy TensorFlow Layers
27.3.1. TensorFlow layers made for IPU TensorFlow
27.4. Legacy TensorFlow Optimizers
27.4.1. Optimizers made for IPU TensorFlow
28. Resources
28.1. Graphcore
28.2. TensorFlow
28.3. Other
29. Legal notices
24. Keras API changes
24.1. Release 2.6
First IPU Keras release.
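Not part of the original changelog, the following is a minimal sketch of the IPU Keras usage pattern introduced with this integration: configure the IPU system, then build and compile a standard Keras model inside an IPUStrategy scope. It assumes a machine with at least one available IPU; the model, optimizer and step counts are illustrative only.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.python import ipu

# Configure the IPU system before creating the strategy
# (assumption: one IPU is available and auto-selected).
config = ipu.config.IPUConfig()
config.auto_select_ipus = 1
config.configure_ipu_system()

strategy = ipu.ipu_strategy.IPUStrategy()
with strategy.scope():
    # A standard Keras model created inside the IPUStrategy scope.
    model = keras.Sequential([
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        # Run several steps per device execution to reduce host overhead.
        steps_per_execution=16,
    )
```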