Targeting the IPU from TensorFlow 2
Version: 2.1.0
1. Introduction
1.1. Document overview
2. Targeting the Poplar XLA device
2.1. Supported types
2.2. Device selection
2.3. Configuring system options
2.3.1. TF_POPLAR_FLAGS environment variable
2.4. Supported operations
2.5. Unsupported operations
3. Compiling and pre-compiling executables
3.1. Caching of compiled executables
3.2. Pre-compiling executables
3.2.1. Unsupported operations
4. Support for TensorFlow 2
4.1. Function annotation with @tf.function
4.2. IPUStrategy
4.3. Keras
4.3.1. Model class
4.3.2. Sequential class
4.3.3. PipelineModel class
4.3.4. PipelineSequential class
4.3.5. Custom training loops
5. TensorFlow 2 examples
5.1. Training on the IPU
5.2. Custom training function
5.3. Pipelined model
6. Training a model
6.1. Training loops, data sets and feed queues
6.2. Accessing outfeed queue results during execution
6.3. Replicated graphs
6.3.1. Selecting the number of replicas
6.3.2. Performing parameter updates
6.4. Pipelined training
6.4.1. Sequential scheduling
6.4.2. Interleaved scheduling
6.4.3. Grouped scheduling
6.4.4. Pipeline stage inputs and outputs
6.4.5. Applying an optimiser to the graph
6.4.6. Device mapping
6.5. Gradient accumulation
6.5.1. Optimizers
6.5.2. Pipelining
6.5.3. Accumulation data type
6.6. Optimizer state offloading
6.7. Dataset benchmarking
6.7.1. Accessing the JSON data
7. Efficient IPU I/O
7.1. Prefetch elements
7.2. I/O Tiles
8. Example using IPUEstimator
9. Example using IPUPipelineEstimator
10. Distributed training
10.1. Example using IPUMultiWorkerStrategy
10.1.1. The input function
10.1.2. The model function
10.1.3. Cluster definition
10.1.4. Complete example
10.2. Distributed training with Horovod
10.3. Launching Horovod training
10.4. Complete Horovod example
11. Half-precision floating point and stochastic rounding
11.1. Controlling the half-precision floating-point unit
11.2. Resetting the global random number seed
11.3. Debugging numerical issues
12. IPU-optimised operations
12.1. LSTM and GRU
12.2. Dropout
12.3. Embedding lookup
12.4. Group normalisation
12.5. Instance normalisation
12.6. Layer normalisation
12.7. GeLU activation
12.8. Sequence slice
12.9. Histogram
13. IPU Outlined Functions
13.1. Usage
13.2. Examples
13.2.1. Models with common structures
13.2.2. Serializing large operations
14. Writing custom operations
14.1. Custom operation on the IPU
14.1.1. Building the Poplar graph
14.1.2. Gradient builders
14.1.3. Metadata
14.1.4. Compiling the IPU code
API level
PopLibs library code
Compiling the library file
14.1.5. Using the custom op in TensorFlow
14.1.6. Tensor allocation
14.1.7. Examples
In-place operations
Operation attributes
Custom codelet
14.2. Custom host CPU operations
14.2.1. Gradient callback
15. IPU host embeddings
15.1. Usage
15.2. Example
15.3. Experimental functionality: IPU embeddings in remote buffers
15.3.1. Partitioning strategies
Token strategy
Encoding strategy
Choosing a strategy for your application
16. Retrieving information about compilation and execution
16.1. TensorFlow options for reporting
16.2. Dumping auxiliary Poplar information
16.2.1. Poplar vertex graph
16.2.2. Poplar interval report
16.3. XLA graph file naming
17. API changes
17.1. Release 2.1
17.1.1. Breaking changes
IPUPipelineEstimator change
Autosharding removed
IPU config change
IPU Keras changes [TensorFlow 2]
17.1.2. Non-breaking changes
Recompute suggestions deprecated
IPUInfeedQueue/IPUOutfeedQueue replication_factor deprecated
IPUInfeedQueue data_to_prefetch deprecated
IPUOutfeedQueue data_to_prefetch deprecated
CTC loss ops deprecated
New configuration API
Support for grouped collectives
Environment variable changes
17.2. Release 2.0
17.2.1. Breaking changes
17.2.2. Non-breaking changes
IPUPipelineEstimator change
Autosharding deprecated
IPU config change
IPU Keras changes [TensorFlow 2]
18. Deprecated profiling functionality
18.1. Adding an operation to get compilation and execution events
18.1.1. ipu_event_trace()
18.1.2. ipu_compile_summary(name, [op list])
18.2. Enabling tracing in the hardware configuration options
18.3. Extract the reports from the returned events
18.4. Producing reports for use with the PopVision Graph Analyser
18.4.1. COMPILE_BEGIN
18.4.2. COMPILE_END
Tensor map
18.4.3. EXECUTE
18.5. Using the IPU Model device for debugging
18.6. Reading the Poplar textual summary report
18.6.1. Target
18.6.2. Graph
18.6.3. Memory usage
18.7. Producing an ELF image of the compilation
19. Python API
19.1. Operations and utilities related to the Graphcore IPU
19.2. Distribution strategy for a single system
19.3. Compiler interface
19.4. Scoping contexts
19.5. Infeed queue
19.6. Outfeed queue
19.7. General utilities
19.8. Configuration utilities
19.9. Looping utilities
19.10. Distributed training
19.11. Horovod
19.12. Datasets
19.12.1. Dataset benchmarking
19.12.2. Dataset wrappers
19.13. Estimators
19.13.1. IPUEstimator
19.13.2. IPUPipelineEstimator
19.13.3. Run configs
19.13.4. Session run hooks
19.14. Keras
19.14.1. Keras API
19.14.2. Keras Model interfaces for IPU
19.15. Keras layers
19.15.1. Keras layer specializations for the Graphcore IPU
19.16. Keras losses
19.16.1. Keras loss functions for the Graphcore IPU
19.17. Keras optimizers
19.17.1. Keras Optimizer wrappers for the Graphcore IPU
19.18. Operators
19.18.1. Custom operations
19.18.2. Functional operators
19.18.3. Image operations
19.18.4. Graphcore utility operations
19.18.5. IPU specific maths operations
19.18.6. Pipelining operators
19.18.7. Popnn primitive neural network operators
19.18.8. Popnn normalization operators
19.18.9. Popnn recurrent neural network operators
19.18.10. Popops all to all and all gather operators
19.18.11. Popops cross replica operators
19.18.12. Popops embedding operators
19.18.13. Popops reduce scatter operator
19.18.14. Poprand operators
19.18.15. Utility operations to be used in replicated mode
19.18.16. Slicing operators
19.18.17. Statistics operators
19.18.18. Summary operations for IPUs
19.19. Optimisers
19.19.1. Optimizer classes for the Graphcore IPU
19.20. Sharding
19.20.1. Utility functions for sharding graphs
20. TensorFlow operators supported by the IPU
21. Resources
21.1. Graphcore
21.2. TensorFlow
21.3. Other
22. Index
23. Trademarks & copyright