Targeting the IPU from TensorFlow 2
Version: 2.5.1
1. Introduction
1.1. Document overview
2. Targeting the Poplar XLA device
2.1. Supported types
2.2. Device selection
2.3. Configuring system options
2.3.1. TF_POPLAR_FLAGS environment variable
2.4. Supported operations
2.5. Unsupported operations
2.6. Error handling
2.6.1. Construction and compilation errors
2.6.2. Runtime errors
3. Support for TensorFlow 2
3.1. IPUStrategy
3.2. Execution modes
3.2.1. Graph mode with @tf.function
3.2.2. Eager mode
3.3. On-device loops
4. Keras with IPUs
4.1. Single IPU models
4.2. Using steps_per_execution
4.3. Gradient accumulation
4.4. Model parallelism
4.4.1. Sequential model
4.4.2. Functional model
Pipelining a model you are writing yourself
Pipelining an existing functional model
4.4.3. Model subclass
Pipelining a model you are writing yourself
Pipelining an existing model
4.4.4. Pipelining options
4.5. Automatic data parallelism
4.6. Asynchronous callbacks
4.7. Configuring infeeds and outfeeds
4.8. Saving and loading Keras models
4.9. Implementation details
5. Compiling and pre-compiling executables
5.1. Caching of compiled executables
5.2. Pre-compiling executables
5.2.1. Unsupported operations
6. Training a model
6.1. Training loops, data sets and feed queues
6.2. Optional simplification of infeeds and outfeeds
6.3. Accessing outfeed queue results during execution
6.4. Replicated graphs
6.4.1. Selecting the number of replicas
6.4.2. Performing parameter updates
6.5. Pipelined training
6.5.1. Grouped scheduling
6.5.2. Interleaved scheduling
6.5.3. Sequential scheduling
6.5.4. Pipeline stage inputs and outputs
6.5.5. Applying an optimiser to the graph
6.5.6. Device mapping
6.5.7. Concurrent pipeline stages
6.6. Gradient accumulation
6.6.1. Optimizers
6.6.2. Pipelining
6.6.3. Accumulation data type
6.7. Optimizer state offloading
6.8. Dataset benchmarking
6.8.1. Accessing the JSON data
7. Efficient IPU I/O
7.1. Prefetch elements
7.2. I/O tiles
8. Example using IPUEstimator
9. Example using IPUPipelineEstimator
10. Distributed training
10.1. Example using IPUMultiWorkerStrategy
10.1.1. The input function
10.1.2. The model function
10.1.3. Cluster definition
10.1.4. Complete example
10.2. Distributed training with Horovod
10.3. Launching Horovod training
10.4. Complete Horovod example
11. Half-precision floating point and stochastic rounding
11.1. Controlling the half-precision floating-point unit
11.2. Resetting the global random number seed
11.3. Debugging numerical issues
12. IPU-optimised operations
12.1. Image operations
12.2. Matmul serialisation
12.3. LSTM and GRU
12.4. Dropout
12.5. Embedding lookup
12.6. Group normalisation
12.7. Instance normalisation
12.8. Layer normalisation
12.9. GeLU activation
12.10. Sequence slice
12.11. Histogram
13. IPU outlined functions
13.1. Usage
13.2. Examples
13.2.1. Models with common structures
13.2.2. Serializing large operations
14. Writing custom operations
14.1. Custom operation on the IPU
14.1.1. Building the Poplar graph
14.1.2. Gradient builders
14.1.3. Metadata
14.1.4. Compiling the IPU code
API level
PopLibs library code
Compiling the library file
14.1.5. Using the custom op in TensorFlow
14.1.6. Tensor allocation
14.1.7. Examples
In-place operations
Operation attributes
Custom codelet
14.2. Custom host CPU operations
14.2.1. Gradient callback
15. IPU host embeddings
15.1. Usage
15.2. Example
15.3. Experimental functionality: IPU embeddings in remote buffers
15.3.1. Partitioning strategies
Token strategy
Encoding strategy
Choosing a strategy for your application
16. IPU embedded application runtime
16.1. Usage
16.2. Pipelining and I/O tiles
16.2.1. Parallel requests
16.2.2. Timeout
16.2.3. Engine restarts
16.3. Example
16.4. Error handling
16.4.1. Runtime errors
17. Retrieving information about compilation and execution
17.1. TensorFlow options for reporting
17.2. XLA graph file naming
18. IPU TensorFlow Addons
18.1. Introduction
18.2. Keras layers
18.2.1. IPU implementations of standard Keras layers
18.2.2. Layers without upstream equivalents
18.2.3. Code example
18.3. Optimizers
19. TensorFlow API changes
19.1. Release 2.5
19.1.1. Breaking changes
IPU Keras changes
Removal of deprecated APIs
Other
19.1.2. Non-breaking changes
Deprecated layers
Deprecated pipeline and gradient_accumulation options
RNN available_memory_proportion_fwd/available_memory_proportion_bwd deprecated
19.2. Release 2.4
19.2.1. Breaking changes
Summary ops
Removal of deprecated members
19.2.2. Non-breaking changes
19.3. Release 2.3
19.3.1. Breaking changes
Custom user op metadata interface updates
The verified transfers feature has been removed
19.3.2. Non-breaking changes
19.4. Release 2.2
19.4.1. Breaking changes
C++ Poplar TensorFlow libraries are private by default
Reports removed from ipu events
TensorFlow 2.1 to TensorFlow 2.4 migration
19.4.2. Non-breaking changes
IPULoggingTensorHook replication_factor deprecated
IPUInfeedQueue/IPUOutfeedQueue/IPULoggingTensorHook feed_name deprecated
Change of output location for profiling information
Warning when epsilon value is too low
19.5. Release 2.1
19.5.1. Breaking changes
IPUPipelineEstimator change
Autosharding removed
Old IPU option configuration API changes
IPU Keras changes [TensorFlow 2]
19.5.2. Non-breaking changes
Recompute suggestions deprecated
IPUInfeedQueue/IPUOutfeedQueue replication_factor deprecated
IPUInfeedQueue data_to_prefetch deprecated
IPUOutfeedQueue data_to_prefetch deprecated
CTC loss ops deprecated
New configuration API
Support for grouped collectives
Environment variable changes
19.6. Release 2.0
19.6.1. Breaking changes
19.6.2. Non-breaking changes
IPUPipelineEstimator change
Autosharding deprecated
IPU config change
IPU Keras changes [TensorFlow 2]
20. TensorFlow Python API
20.1. Operations and utilities related to the Graphcore IPU
20.2. Distribution strategy for a single system
20.3. Compiler interface
20.4. Scoping contexts
20.5. Infeed queue
20.6. Outfeed queue
20.7. General utilities
20.8. Configuration utilities
20.9. Looping utilities
20.10. Distributed training
20.11. Horovod
20.12. Serving utilities
20.13. Datasets
20.13.1. Dataset benchmarking
20.13.2. Dataset wrappers
20.14. Estimators
20.14.1. IPUEstimator
20.14.2. IPUPipelineEstimator
20.14.3. Run configs
20.14.4. Session run hooks
20.15. Keras
20.15.1. IPU specific Keras extensions
20.16. Keras layers
20.16.1. Keras layer specializations for the Graphcore IPU
20.17. Keras losses
20.17.1. Keras loss functions for the Graphcore IPU
20.18. Keras optimizers
20.18.1. Keras Optimizer wrappers for the Graphcore IPU
20.19. Operators
20.19.1. Control flow operations
20.19.2. Custom operations
20.19.3. Functional operators
20.19.4. Image operations
20.19.5. Graphcore utility operations
20.19.6. IPU specific maths operations
20.19.7. Pipelining operators
20.19.8. Popnn primitive neural network operators
20.19.9. Popnn normalization operators
20.19.10. Popnn recurrent neural network operators
20.19.11. Popops all to all and all gather operators
20.19.12. Popops cross replica operators
20.19.13. Popops embedding operators
20.19.14. Popops reduce scatter operator
20.19.15. Popops within replica operators
20.19.16. Poprand operators
20.19.17. Utility operations to be used in replicated mode
20.19.18. Slicing operators
20.19.19. Statistics operators
20.19.20. Embedded application runtime
20.20. Optimizers
20.20.1. Optimizer classes for the Graphcore IPU
20.21. Sharding
20.21.1. Utility functions for sharding graphs
21. TensorFlow operators supported by the IPU
22. IPU TensorFlow Addons API changes
22.1. Release 2.5
22.1.1. Non-breaking changes
RNN available_memory_proportion_fwd/available_memory_proportion_bwd deprecated
22.2. Release 2.4
23. IPU TensorFlow Addons Python API
23.1. Keras layers
23.1.1. Keras layers made for IPU TensorFlow
23.2. Keras optimizers
23.2.1. Keras optimizers made for IPU TensorFlow
23.3. Legacy TensorFlow layers
23.3.1. TensorFlow layers made for IPU TensorFlow
23.4. Legacy TensorFlow optimizers
23.4.1. Optimizers made for IPU TensorFlow
24. Resources
24.1. Graphcore
24.2. TensorFlow
24.3. Other
25. Trademarks & copyright