Targeting the IPU from TensorFlow 2
Version: 2.2.0
1. Introduction
1.1. Document overview
2. Targeting the Poplar XLA device
2.1. Supported types
2.2. Device selection
2.3. Configuring system options
2.3.1. TF_POPLAR_FLAGS environment variable
2.4. Supported operations
2.5. Unsupported operations
2.6. Error handling
2.6.1. Construction and compilation errors
2.6.2. Runtime errors
3. Support for TensorFlow 2
3.1. IPUStrategy
3.2. Execution modes
3.2.1. Graph mode with @tf.function
3.2.2. Eager mode
3.3. On-device loops
4. Keras with IPUs
4.1. Single IPU models
4.2. Using steps_per_execution
4.3. Gradient accumulation
4.4. Model parallelism
4.4.1. Sequential model
4.4.2. Functional model
Pipelining a model you are writing yourself
Pipelining an existing functional model
4.5. Automatic data parallelism
4.6. Asynchronous callbacks
4.7. Porting models from TensorFlow 2.1
4.7.1. TF2.1
4.7.2. TF2.4
4.8. Implementation details
5. Compiling and pre-compiling executables
5.1. Caching of compiled executables
5.2. Pre-compiling executables
5.2.1. Unsupported operations
6. Training a model
6.1. Training loops, data sets and feed queues
6.2. Accessing outfeed queue results during execution
6.3. Replicated graphs
6.3.1. Selecting the number of replicas
6.3.2. Performing parameter updates
6.4. Pipelined training
6.4.1. Sequential scheduling
6.4.2. Interleaved scheduling
6.4.3. Grouped scheduling
6.4.4. Pipeline stage inputs and outputs
6.4.5. Applying an optimiser to the graph
6.4.6. Device mapping
6.4.7. Concurrent pipeline stages
6.5. Gradient accumulation
6.5.1. Optimizers
6.5.2. Pipelining
6.5.3. Accumulation data type
6.6. Optimizer state offloading
6.7. Dataset benchmarking
6.7.1. Accessing the JSON data
7. Efficient IPU I/O
7.1. Prefetch elements
7.2. I/O tiles
8. Example using IPUEstimator
9. Example using IPUPipelineEstimator
10. Distributed training
10.1. Example using IPUMultiWorkerStrategy
10.1.1. The input function
10.1.2. The model function
10.1.3. Cluster definition
10.1.4. Complete example
10.2. Distributed training with Horovod
10.3. Launching Horovod training
10.4. Complete Horovod example
11. Half-precision floating point and stochastic rounding
11.1. Controlling the half-precision floating-point unit
11.2. Resetting the global random number seed
11.3. Debugging numerical issues
12. IPU-optimised operations
12.1. LSTM and GRU
12.2. Dropout
12.3. Embedding lookup
12.4. Group normalisation
12.5. Instance normalisation
12.6. Layer normalisation
12.7. GeLU activation
12.8. Sequence slice
12.9. Histogram
13. IPU outlined functions
13.1. Usage
13.2. Examples
13.2.1. Models with common structures
13.2.2. Serializing large operations
14. Writing custom operations
14.1. Custom operation on the IPU
14.1.1. Building the Poplar graph
14.1.2. Gradient builders
14.1.3. Metadata
14.1.4. Compiling the IPU code
API level
PopLibs library code
Compiling the library file
14.1.5. Using the custom op in TensorFlow
14.1.6. Tensor allocation
14.1.7. Examples
In-place operations
Operation attributes
Custom codelet
14.2. Custom host CPU operations
14.2.1. Gradient callback
15. IPU host embeddings
15.1. Usage
15.2. Example
15.3. Experimental functionality: IPU embeddings in remote buffers
15.3.1. Partitioning strategies
Token strategy
Encoding strategy
Choosing a strategy for your application
16. Retrieving information about compilation and execution
16.1. TensorFlow options for reporting
16.2. Dumping auxiliary Poplar information
16.2.1. Poplar vertex graph
16.2.2. Poplar interval report
16.3. XLA graph file naming
17. API changes
17.1. Release 2.2
17.1.1. Breaking changes
C++ Poplar TensorFlow libraries are private by default
Reports removed from ipu events
TensorFlow 2.1 to TensorFlow 2.4 migration
17.1.2. Non-breaking changes
IPULoggingTensorHook replication_factor deprecated
IPUInfeedQueue/IPUOutfeedQueue/IPULoggingTensorHook feed_name deprecated
Change of output location for profiling information
17.2. Release 2.1
17.2.1. Breaking changes
IPUPipelineEstimator change
Autosharding removed
IPU config change
IPU Keras changes [TensorFlow 2]
17.2.2. Non-breaking changes
Recompute suggestions deprecated
IPUInfeedQueue/IPUOutfeedQueue replication_factor deprecated
IPUInfeedQueue data_to_prefetch deprecated
IPUOutfeedQueue data_to_prefetch deprecated
CTC loss ops deprecated
New configuration API
Support for grouped collectives
Environment variable changes
17.3. Release 2.0
17.3.1. Breaking changes
17.3.2. Non-breaking changes
IPUPipelineEstimator change
Autosharding deprecated
IPU config change
IPU Keras changes [TensorFlow 2]
18. Python API
18.1. Operations and utilities related to the Graphcore IPU
18.2. Distribution strategy for a single system
18.3. Compiler interface
18.4. Scoping contexts
18.5. Infeed queue
18.6. Outfeed queue
18.7. General utilities
18.8. Configuration utilities
18.9. Looping utilities
18.10. Distributed training
18.11. Horovod
18.12. Datasets
18.12.1. Dataset benchmarking
18.12.2. Dataset wrappers
18.13. Estimators
18.13.1. IPUEstimator
18.13.2. IPUPipelineEstimator
18.13.3. Run configs
18.13.4. Session run hooks
18.14. Keras
18.14.1. IPU specific Keras extensions
18.15. Keras layers
18.15.1. Keras layer specializations for the Graphcore IPU
18.16. Keras losses
18.16.1. Keras loss functions for the Graphcore IPU
18.17. Keras optimizers
18.17.1. Keras Optimizer wrappers for the Graphcore IPU
18.18. Operators
18.18.1. Custom operations
18.18.2. Functional operators
18.18.3. Image operations
18.18.4. Graphcore utility operations
18.18.5. IPU specific maths operations
18.18.6. Pipelining operators
18.18.7. Popnn primitive neural network operators
18.18.8. Popnn normalization operators
18.18.9. Popnn recurrent neural network operators
18.18.10. Popops all to all and all gather operators
18.18.11. Popops cross replica operators
18.18.12. Popops embedding operators
18.18.13. Popops reduce scatter operator
18.18.14. Poprand operators
18.18.15. Utility operations to be used in replicated mode
18.18.16. Slicing operators
18.18.17. Statistics operators
18.18.18. Summary operations for IPUs
18.19. Optimisers
18.19.1. Optimizer classes for the Graphcore IPU
18.20. Sharding
18.20.1. Utility functions for sharding graphs
19. TensorFlow operators supported by the IPU
20. Resources
20.1. Graphcore
20.2. TensorFlow
20.3. Other
21. Index
22. Trademarks & copyright