Targeting the IPU from TensorFlow 1
Version: 3.1.0
1. Introduction
1.1. Document overview
2. Tutorial
2.1. Preliminary graphs
2.2. A basic graph
2.2.1. Selecting hardware to run on
2.2.2. Running on the IPU Model simulator
2.3. Compiling the graph for the IPU
2.4. Sharding a graph
2.5. Adding variables
2.5.1. Troubleshooting
2.5.2. Note on the global_step counter
3. Targeting the Poplar XLA device
3.1. Supported types
3.2. Device selection
3.3. Configuring system options
3.3.1. TF_POPLAR_FLAGS environment variable
3.4. Supported operations
3.5. Unsupported operations
3.6. Error handling
3.6.1. Construction and compilation errors
3.6.2. Runtime errors
4. Compiling and pre-compiling executables
4.1. Caching of compiled executables
4.2. Pre-compiling executables
4.2.1. Unsupported operations
5. Training a model
5.1. Training loops, data sets and feed queues
5.2. Accessing outfeed queue results during execution
5.3. Replicated graphs
5.3.1. Selecting the number of replicas
5.3.2. Performing parameter updates
5.4. Pipelined training
5.4.1. Grouped scheduling
5.4.2. Interleaved scheduling
5.4.3. Sequential scheduling
5.4.4. Pipeline stage inputs and outputs
5.4.5. Applying an optimiser to the graph
5.4.6. Device mapping
5.4.7. Concurrent pipeline stages
5.5. Gradient accumulation
5.5.1. Optimizers
5.5.2. Pipelining
5.5.3. Accumulation data type
5.6. Optimizer state offloading
5.7. Dataset benchmarking
5.7.1. Accessing the JSON data
5.8. Half and mixed precision training
6. Efficient IPU I/O
6.1. Prefetch elements
6.2. I/O tiles
7. Example using IPUEstimator
8. Example using IPUPipelineEstimator
9. Distributed training
9.1. PopDistStrategy examples
10. Half-precision floating point and stochastic rounding
10.1. Controlling the half-precision floating-point unit
10.2. Resetting the global random number seed
10.3. Debugging numerical issues
11. IPU-optimised operations
11.1. Image operations
11.2. Matmul serialisation
11.3. Dropout
11.4. Embedding lookup
11.5. Group normalisation
11.6. Instance normalisation
11.7. Layer normalisation
11.8. GeLU activation
11.9. Sequence slice
11.10. Histogram
12. IPU Outlined Functions
12.1. Usage
12.2. Examples
12.2.1. Models with common structures
12.2.2. Serializing large operations
13. Writing custom operations
13.1. Custom operation on the IPU
13.1.1. Building the Poplar graph
13.1.2. Gradient builders
13.1.3. Metadata
13.1.4. Compiling the IPU code
API level
PopLibs library code
Compiling the library file
13.1.5. Using the custom op in TensorFlow
13.1.6. Tensor allocation
13.1.7. Examples
In-place operations
Operation attributes
Custom codelet
13.2. Custom host CPU operations
13.2.1. Gradient callback
14. IPU host embeddings
14.1. Usage
14.2. Example
14.3. Experimental functionality: IPU embeddings in remote buffers
14.3.1. Partitioning strategies
Token strategy
Encoding strategy
Choosing a strategy for your application
15. IPU embedded application runtime
15.1. Usage
15.2. Pipelining and I/O tiles
15.2.1. Parallel requests
15.2.2. Timeout
15.2.3. Engine restarts
15.3. Example
15.4. Error handling
15.4.1. Runtime errors
16. Exporting precompiled models for TensorFlow Serving
16.1. Exporting non-pipelined models defined inside a function
16.1.1. Example of exporting non-pipelined model defined inside a function
16.1.2. Example of exporting non-pipelined model defined inside a function with additional preprocessing and postprocessing steps
16.2. Exporting pipelined models defined as a list of functions
16.2.1. Pipeline example
16.2.2. Pipeline example with preprocessing and postprocessing steps
16.3. Running the model in TensorFlow Serving
17. Retrieving information about compilation and execution
17.1. TensorFlow options for reporting
17.2. XLA graph file naming
18. IPU TensorFlow Addons
18.1. Introduction
18.2. IPU SavedModel CLI
18.2.1. Run subcommand
18.2.2. Convert subcommand
18.2.3. Pipeline configuration
18.2.4. Pipeline development
18.2.5. Pipeline solution file
18.2.6. Example configuration file
19. TensorFlow API changes
19.1. Release 3.0
19.1.1. Non-breaking changes
Deprecated modules
19.2. Release 2.6
19.2.1. Breaking changes
Removal of deprecated APIs
19.3. Release 2.5
19.3.1. Breaking changes
Removal of deprecated APIs
Other
19.3.2. Non-breaking changes
Deprecated layers
RNN available_memory_proportion_fwd/available_memory_proportion_bwd deprecated
19.4. Release 2.4
19.4.1. Breaking changes
Summary ops
Removal of deprecated members
19.4.2. Non-breaking changes
19.5. Release 2.3
19.5.1. Breaking changes
Custom user op metadata interface updates
The verified transfers feature has been removed
19.5.2. Non-breaking changes
19.6. Release 2.2
19.6.1. Breaking changes
C++ Poplar TensorFlow libraries are private by default
Reports removed from ipu events
19.6.2. Non-breaking changes
IPULoggingTensorHook replication_factor deprecated
IPUInfeedQueue/IPUOutfeedQueue/IPULoggingTensorHook feed_name deprecated
Change of output location for profiling information
IPU Keras Layers deprecation in TensorFlow 1.15
Warning when epsilon value is too low
19.7. Release 2.1
19.7.1. Breaking changes
IPUPipelineEstimator change
Autosharding removed
Old IPU option configuration API changes
IPU Keras changes [TensorFlow 2]
19.7.2. Non-breaking changes
Recompute suggestions deprecated
IPUInfeedQueue/IPUOutfeedQueue replication_factor deprecated
IPUInfeedQueue data_to_prefetch deprecated
IPUOutfeedQueue data_to_prefetch deprecated
CTC loss ops deprecated
New configuration API
Support for grouped collectives
Environment variable changes
19.8. Release 2.0
19.8.1. Breaking changes
19.8.2. Non-breaking changes
IPUPipelineEstimator change
Autosharding deprecated
IPU config change
IPU Keras changes [TensorFlow 2]
20. TensorFlow Python API
20.1. Operations and utilities related to the Graphcore IPU
20.2. Compiler interface
20.3. Scoping contexts
20.4. Infeed queue
20.5. Outfeed queue
20.6. General utilities
20.7. Configuration utilities
20.8. Looping utilities
20.9. Distributed training
20.10. Horovod
20.11. Serving utilities
20.12. Datasets
20.12.1. Dataset benchmarking
20.12.2. Dataset wrappers
20.13. Estimators
20.13.1. IPUEstimator
20.13.2. IPUPipelineEstimator
20.13.3. Run configs
20.13.4. Session run hooks
20.14. Keras layers
20.14.1. Keras layer specializations for the Graphcore IPU
20.15. Operators
20.15.1. Control flow operations
20.15.2. Custom operations
20.15.3. Functional operators
20.15.4. Image operations
20.15.5. Graphcore utility operations
20.15.6. IPU specific maths operations
20.15.7. Pipelining operators
20.15.8. Popnn primitive neural network operators
20.15.9. Popnn normalization operators
20.15.10. Popops all to all and all gather operators
20.15.11. Popops cross replica operators
20.15.12. Popops embedding operators
20.15.13. Popops reduce scatter operator
20.15.14. Popops within replica operators
20.15.15. Poprand operators
20.15.16. Utility operations to be used in replicated mode
20.15.17. Slicing operators
20.15.18. Statistics operators
20.15.19. Embedded application runtime
20.16. Optimisers
20.16.1. Helper classes and methods for gradient accumulation
20.16.2. Optimizer classes for the Graphcore IPU
20.17. Sharding
20.17.1. Utility functions for sharding graphs
21. TensorFlow operators supported by the IPU
22. IPU TensorFlow Addons API changes
22.1. Release 3.0
22.1.1. Breaking changes
22.2. Release 2.5
22.2.1. Non-breaking changes
RNN available_memory_proportion_fwd/available_memory_proportion_bwd deprecated
22.3. Release 2.4
23. IPU TensorFlow Addons Python API
23.1. TensorFlow layers
23.1.1. TensorFlow layers made for IPU TensorFlow
23.2. TensorFlow optimizers
23.2.1. Optimizers made for IPU TensorFlow
24. Resources
24.1. Graphcore
24.2. TensorFlow
24.3. Other
25. Trademarks & copyright