Targeting the IPU from TensorFlow 2
Version: 2.4.0
1. Introduction
1.1. Document overview
2. Targeting the Poplar XLA device
2.1. Supported types
2.2. Device selection
2.3. Configuring system options
2.3.1. TF_POPLAR_FLAGS environment variable
2.4. Supported operations
2.5. Unsupported operations
2.6. Error handling
2.6.1. Construction and compilation errors
2.6.2. Runtime errors
3. Support for TensorFlow 2
3.1. IPUStrategy
3.2. Execution modes
3.2.1. Graph mode with @tf.function
3.2.2. Eager mode
3.3. On-device loops
4. Keras with IPUs
4.1. Single IPU models
4.2. Using steps_per_execution
4.3. Gradient accumulation
4.4. Model parallelism
4.4.1. Sequential model
4.4.2. Functional model
Pipelining a model you are writing yourself
Pipelining an existing functional model
4.5. Automatic data parallelism
4.6. Asynchronous callbacks
4.7. Configuring Infeeds and Outfeeds
4.8. Porting models from TensorFlow 2.1
4.8.1. TF2.1
4.8.2. TF2.4
4.9. Implementation details
5. Compiling and pre-compiling executables
5.1. Caching of compiled executables
5.2. Pre-compiling executables
5.2.1. Unsupported operations
6. Training a model
6.1. Training loops, data sets and feed queues
6.2. Accessing outfeed queue results during execution
6.3. Replicated graphs
6.3.1. Selecting the number of replicas
6.3.2. Performing parameter updates
6.4. Pipelined training
6.4.1. Grouped scheduling
6.4.2. Interleaved scheduling
6.4.3. Sequential scheduling
6.4.4. Pipeline stage inputs and outputs
6.4.5. Applying an optimiser to the graph
6.4.6. Device mapping
6.4.7. Concurrent pipeline stages
6.5. Gradient accumulation
6.5.1. Optimizers
6.5.2. Pipelining
6.5.3. Accumulation data type
6.6. Optimizer state offloading
6.7. Dataset benchmarking
6.7.1. Accessing the JSON data
7. Efficient IPU I/O
7.1. Prefetch elements
7.2. I/O tiles
8. Example using IPUEstimator
9. Example using IPUPipelineEstimator
10. Distributed training
10.1. Example using IPUMultiWorkerStrategy
10.1.1. The input function
10.1.2. The model function
10.1.3. Cluster definition
10.1.4. Complete example
10.2. Distributed training with Horovod
10.3. Launching Horovod training
10.4. Complete Horovod example
11. Half-precision floating point and stochastic rounding
11.1. Controlling the half-precision floating-point unit
11.2. Resetting the global random number seed
11.3. Debugging numerical issues
12. IPU-optimised operations
12.1. LSTM and GRU
12.2. Dropout
12.3. Embedding lookup
12.4. Group normalisation
12.5. Instance normalisation
12.6. Layer normalisation
12.7. GeLU activation
12.8. Sequence slice
12.9. Histogram
13. IPU Outlined Functions
13.1. Usage
13.2. Examples
13.2.1. Models with common structures
13.2.2. Serializing large operations
14. Writing custom operations
14.1. Custom operation on the IPU
14.1.1. Building the Poplar graph
14.1.2. Gradient builders
14.1.3. Metadata
14.1.4. Compiling the IPU code
API level
PopLibs library code
Compiling the library file
14.1.5. Using the custom op in TensorFlow
14.1.6. Tensor allocation
14.1.7. Examples
In-place operations
Operation attributes
Custom codelet
14.2. Custom host CPU operations
14.2.1. Gradient callback
15. IPU host embeddings
15.1. Usage
15.2. Example
15.3. Experimental functionality: IPU embeddings in remote buffers
15.3.1. Partitioning strategies
Token strategy
Encoding strategy
Choosing a strategy for your application
16. IPU embedded application runtime
16.1. Usage
16.2. Pipelining and I/O tiles
16.2.1. Parallel requests
16.2.2. Timeout
16.2.3. Engine restarts
16.3. Example
16.4. Error handling
16.4.1. Runtime errors
17. Retrieving information about compilation and execution
17.1. TensorFlow options for reporting
17.2. XLA graph file naming
18. IPU TensorFlow Addons
18.1. Introduction
18.2. Keras layers
18.3. Optimizers
19. API changes
19.1. Release 2.4
19.1.1. Breaking changes
Summary ops
Removal of deprecated members
19.1.2. Non-breaking changes
19.2. Release 2.3
19.2.1. Breaking changes
Custom user op metadata interface updates
The verified transfers feature has been removed
19.2.2. Non-breaking changes
19.3. Release 2.2
19.3.1. Breaking changes
C++ Poplar TensorFlow libraries are private by default
Reports removed from IPU events
TensorFlow 2.1 to TensorFlow 2.4 Migration
19.3.2. Non-breaking changes
IPULoggingTensorHook replication_factor deprecated
IPUInfeedQueue/IPUOutfeedQueue/IPULoggingTensorHook feed_name deprecated
Change of output location for profiling information
Warning when epsilon value is too low
19.4. Release 2.1
19.4.1. Breaking changes
IPUPipelineEstimator change
Autosharding removed
Old IPU option configuration API changes
IPU Keras changes [TensorFlow 2]
19.4.2. Non-breaking changes
Recompute suggestions deprecated
IPUInfeedQueue/IPUOutfeedQueue replication_factor deprecated
IPUInfeedQueue data_to_prefetch deprecated
IPUOutfeedQueue data_to_prefetch deprecated
CTC loss ops deprecated
New configuration API
Support for grouped collectives
Environment variable changes
19.5. Release 2.0
19.5.1. Breaking changes
19.5.2. Non-breaking changes
IPUPipelineEstimator change
Autosharding deprecated
IPU config change
IPU Keras changes [TensorFlow 2]
20. Python API
20.1. Datasets
20.2. Estimators
20.3. Keras
20.4. Keras layers
20.5. Keras losses
20.6. Keras optimizers
20.7. Operators
20.8. Optimisers
20.9. Sharding
21. TensorFlow operators supported by the IPU
22. IPU TensorFlow Addons API changes
22.1. Release 2.4
23. IPU TensorFlow Addons Python API
23.1. Keras Layers
23.2. Keras Optimizers
23.3. Legacy TensorFlow Layers
23.4. Legacy TensorFlow Optimizers
23.4.1. Optimizers made for IPU TensorFlow
24. Resources
24.1. Graphcore
24.2. TensorFlow
24.3. Other
25. Trademarks & copyright