Targeting the IPU from TensorFlow 1
- 1. Introduction
- 2. Tutorial
- 3. Targeting the Poplar XLA device
- 4. Compiling and pre-compiling executables
- 5. Training a model
- 6. Efficient IPU I/O
- 7. Example using IPUEstimator
- 8. Example using IPUPipelineEstimator
- 9. Distributed training
- 10. Half-precision floating point and stochastic rounding
- 11. IPU-optimised operations
- 12. IPU outlined functions
- 13. Writing custom operations
- 14. IPU host embeddings
- 15. Retrieving information about compilation and execution
- 16. API changes
- 17. Deprecated profiling functionality
- 17.1. Adding an operation to get compilation and execution events
- 17.2. Enabling tracing in the hardware configuration options
- 17.3. Extracting the reports from the returned events
- 17.4. Producing reports for use with the PopVision Graph Analyser
- 17.5. Using the IPU Model device for debugging
- 17.6. Reading the Poplar textual summary report
- 17.7. Producing an ELF image of the compilation
- 18. Python API
- 18.1. Operations and utilities related to the Graphcore IPU
- 18.2. Compiler interface
- 18.3. Scoping contexts
- 18.4. Infeed queue
- 18.5. Outfeed queue
- 18.6. General utilities
- 18.7. Configuration utilities
- 18.8. Looping utilities
- 18.9. Distributed training
- 18.10. Horovod
- 18.11. Datasets
- 18.12. Estimators
- 18.13. Keras layers
- 18.14. Operators
- 18.14.1. Custom operations
- 18.14.2. Functional operators
- 18.14.3. Image operations
- 18.14.4. Graphcore utility operations
- 18.14.5. IPU-specific maths operations
- 18.14.6. Pipelining operators
- 18.14.7. Popnn primitive neural network operators
- 18.14.8. Popnn normalisation operators
- 18.14.9. Popnn recurrent neural network operators
- 18.14.10. Popops all-to-all and all-gather operators
- 18.14.11. Popops cross-replica operators
- 18.14.12. Popops embedding operators
- 18.14.13. Popops reduce-scatter operator
- 18.14.14. Poprand operators
- 18.14.15. Utility operations to be used in replicated mode
- 18.14.16. Slicing operators
- 18.14.17. Statistics operators
- 18.14.18. Summary operations for IPUs
- 18.15. Optimisers
- 18.16. Sharding
- 19. TensorFlow operators supported by the IPU
- 20. Resources
- 21. Index
- 22. Trademarks & copyright