Targeting the IPU from TensorFlow 1
Version: 2.2.0
1. Introduction
1.1. Document overview
2. Tutorial
2.1. Preliminary graphs
2.2. A basic graph
2.2.1. Selecting hardware to run on
2.2.2. Running on the IPU Model simulator
2.3. Compiling the graph for the IPU
2.4. Sharding a graph
2.5. Adding variables
2.5.1. Troubleshooting
2.5.2. Note on the global_step counter
3. Targeting the Poplar XLA device
3.1. Supported types
3.2. Device selection
3.3. Configuring system options
3.3.1. TF_POPLAR_FLAGS environment variable
3.4. Supported operations
3.5. Unsupported operations
3.6. Error handling
3.6.1. Construction and compilation errors
3.6.2. Runtime errors
4. Compiling and pre-compiling executables
4.1. Caching of compiled executables
4.2. Pre-compiling executables
4.2.1. Unsupported operations
5. Training a model
5.1. Training loops, data sets and feed queues
5.2. Accessing outfeed queue results during execution
5.3. Replicated graphs
5.3.1. Selecting the number of replicas
5.3.2. Performing parameter updates
5.4. Pipelined training
5.4.1. Sequential scheduling
5.4.2. Interleaved scheduling
5.4.3. Grouped scheduling
5.4.4. Pipeline stage inputs and outputs
5.4.5. Applying an optimiser to the graph
5.4.6. Device mapping
5.4.7. Concurrent pipeline stages
5.5. Gradient accumulation
5.5.1. Optimizers
5.5.2. Pipelining
5.5.3. Accumulation data type
5.6. Optimizer state offloading
5.7. Dataset benchmarking
5.7.1. Accessing the JSON data
6. Efficient IPU I/O
6.1. Prefetch elements
6.2. I/O tiles
7. Example using IPUEstimator
8. Example using IPUPipelineEstimator
9. Distributed training
9.1. Example using IPUMultiWorkerStrategy
9.1.1. The input function
9.1.2. The model function
9.1.3. Cluster definition
9.1.4. Complete example
9.2. Distributed training with Horovod
9.3. Launching Horovod training
9.4. Complete Horovod example
10. Half-precision floating point and stochastic rounding
10.1. Controlling the half-precision floating-point unit
10.2. Resetting the global random number seed
10.3. Debugging numerical issues
11. IPU-optimised operations
11.1. LSTM and GRU
11.2. Dropout
11.3. Embedding lookup
11.4. Group normalisation
11.5. Instance normalisation
11.6. Layer normalisation
11.7. GeLU activation
11.8. Sequence slice
11.9. Histogram
12. IPU outlined functions
12.1. Usage
12.2. Examples
12.2.1. Models with common structures
12.2.2. Serializing large operations
13. Writing custom operations
13.1. Custom operation on the IPU
13.1.1. Building the Poplar graph
13.1.2. Gradient builders
13.1.3. Metadata
13.1.4. Compiling the IPU code
API level
PopLibs library code
Compiling the library file
13.1.5. Using the custom op in TensorFlow
13.1.6. Tensor allocation
13.1.7. Examples
In-place operations
Operation attributes
Custom codelet
13.2. Custom host CPU operations
13.2.1. Gradient callback
14. IPU host embeddings
14.1. Usage
14.2. Example
14.3. Experimental functionality: IPU embeddings in remote buffers
14.3.1. Partitioning strategies
Token strategy
Encoding strategy
Choosing a strategy for your application
15. Retrieving information about compilation and execution
15.1. TensorFlow options for reporting
15.2. Dumping auxiliary Poplar information
15.2.1. Poplar vertex graph
15.2.2. Poplar interval report
15.3. XLA graph file naming
16. API changes
16.1. Release 2.2
16.1.1. Breaking changes
C++ Poplar TensorFlow libraries are private by default
Reports removed from ipu events
16.1.2. Non-breaking changes
IPULoggingTensorHook replication_factor deprecated
IPUInfeedQueue/IPUOutfeedQueue/IPULoggingTensorHook feed_name deprecated
Change of output location for profiling information
IPU Keras Layers deprecation in TensorFlow 1.15
16.2. Release 2.1
16.2.1. Breaking changes
IPUPipelineEstimator change
Autosharding removed
Old IPU option configuration API changes
IPU Keras changes [TensorFlow 2]
16.2.2. Non-breaking changes
Recompute suggestions deprecated
IPUInfeedQueue/IPUOutfeedQueue replication_factor deprecated
IPUInfeedQueue data_to_prefetch deprecated
IPUOutfeedQueue data_to_prefetch deprecated
CTC loss ops deprecated
New configuration API
Support for grouped collectives
Environment variable changes
16.3. Release 2.0
16.3.1. Breaking changes
16.3.2. Non-breaking changes
IPUPipelineEstimator change
Autosharding deprecated
IPU config change
IPU Keras changes [TensorFlow 2]
17. Deprecated profiling functionality
17.1. Adding an operation to get compilation and execution events
17.1.1. ipu_event_trace()
17.1.2. ipu_compile_summary(name, [op list])
17.2. Enabling tracing in the hardware configuration options
17.3. Extract the reports from the returned events
17.4. Producing reports for use with the PopVision Graph Analyser
17.4.1. COMPILE_BEGIN
17.4.2. COMPILE_END
Tensor map
17.4.3. EXECUTE
17.5. Using the IPU Model device for debugging
17.6. Reading the Poplar textual summary report
17.6.1. Target
17.6.2. Graph
17.6.3. Memory usage
17.7. Producing an ELF image of the compilation
18. Python API
18.1. Operations and utilities related to the Graphcore IPU
18.2. Compiler interface
18.3. Scoping contexts
18.4. Infeed queue
18.5. Outfeed queue
18.6. General utilities
18.7. Configuration utilities
18.8. Looping utilities
18.9. Distributed training
18.10. Horovod
18.11. Datasets
18.11.1. Dataset benchmarking
18.11.2. Dataset wrappers
18.12. Estimators
18.12.1. IPUEstimator
18.12.2. IPUPipelineEstimator
18.12.3. Run configs
18.12.4. Session run hooks
18.13. Keras layers
18.13.1. Keras layer specializations for the Graphcore IPU
18.14. Operators
18.14.1. Custom operations
18.14.2. Functional operators
18.14.3. Image operations
18.14.4. Graphcore utility operations
18.14.5. IPU specific maths operations
18.14.6. Pipelining operators
18.14.7. Popnn primitive neural network operators
18.14.8. Popnn normalization operators
18.14.9. Popnn recurrent neural network operators
18.14.10. Popops all to all and all gather operators
18.14.11. Popops cross replica operators
18.14.12. Popops embedding operators
18.14.13. Popops reduce scatter operator
18.14.14. Poprand operators
18.14.15. Utility operations to be used in replicated mode
18.14.16. Slicing operators
18.14.17. Statistics operators
18.14.18. Summary operations for IPUs
18.15. Optimisers
18.15.1. Optimizer classes for the Graphcore IPU
18.16. Sharding
18.16.1. Utility functions for sharding graphs
19. TensorFlow operators supported by the IPU
20. Resources
20.1. Graphcore
20.2. TensorFlow
20.3. Other
21. Index
22. Trademarks & copyright