Targeting the IPU from TensorFlow 1
Version: 2.2.0
  • 1. Introduction
    • 1.1. Document overview
  • 2. Tutorial
    • 2.1. Preliminary graphs
    • 2.2. A basic graph
      • 2.2.1. Selecting hardware to run on
      • 2.2.2. Running on the IPU Model simulator
    • 2.3. Compiling the graph for the IPU
    • 2.4. Sharding a graph
    • 2.5. Adding variables
      • 2.5.1. Troubleshooting
      • 2.5.2. Note on the global_step counter
  • 3. Targeting the Poplar XLA device
    • 3.1. Supported types
    • 3.2. Device selection
    • 3.3. Configuring system options
      • 3.3.1. TF_POPLAR_FLAGS environment variable
    • 3.4. Supported operations
    • 3.5. Unsupported operations
    • 3.6. Error handling
      • 3.6.1. Construction and compilation errors
      • 3.6.2. Runtime errors
  • 4. Compiling and pre-compiling executables
    • 4.1. Caching of compiled executables
    • 4.2. Pre-compiling executables
      • 4.2.1. Unsupported operations
  • 5. Training a model
    • 5.1. Training loops, data sets and feed queues
    • 5.2. Accessing outfeed queue results during execution
    • 5.3. Replicated graphs
      • 5.3.1. Selecting the number of replicas
      • 5.3.2. Performing parameter updates
    • 5.4. Pipelined training
      • 5.4.1. Sequential scheduling
      • 5.4.2. Interleaved scheduling
      • 5.4.3. Grouped scheduling
      • 5.4.4. Pipeline stage inputs and outputs
      • 5.4.5. Applying an optimiser to the graph
      • 5.4.6. Device mapping
      • 5.4.7. Concurrent pipeline stages
    • 5.5. Gradient accumulation
      • 5.5.1. Optimizers
      • 5.5.2. Pipelining
      • 5.5.3. Accumulation data type
    • 5.6. Optimizer state offloading
    • 5.7. Dataset benchmarking
      • 5.7.1. Accessing the JSON data
  • 6. Efficient IPU I/O
    • 6.1. Prefetch elements
    • 6.2. I/O tiles
  • 7. Example using IPUEstimator
  • 8. Example using IPUPipelineEstimator
  • 9. Distributed training
    • 9.1. Example using IPUMultiWorkerStrategy
      • 9.1.1. The input function
      • 9.1.2. The model function
      • 9.1.3. Cluster definition
      • 9.1.4. Complete example
    • 9.2. Distributed training with Horovod
    • 9.3. Launching Horovod training
    • 9.4. Complete Horovod example
  • 10. Half-precision floating point and stochastic rounding
    • 10.1. Controlling the half-precision floating-point unit
    • 10.2. Resetting the global random number seed
    • 10.3. Debugging numerical issues
  • 11. IPU-optimised operations
    • 11.1. LSTM and GRU
    • 11.2. Dropout
    • 11.3. Embedding lookup
    • 11.4. Group normalisation
    • 11.5. Instance normalisation
    • 11.6. Layer normalisation
    • 11.7. GeLU activation
    • 11.8. Sequence slice
    • 11.9. Histogram
  • 12. IPU outlined functions
    • 12.1. Usage
    • 12.2. Examples
      • 12.2.1. Models with common structures
      • 12.2.2. Serializing large operations
  • 13. Writing custom operations
    • 13.1. Custom operation on the IPU
      • 13.1.1. Building the Poplar graph
      • 13.1.2. Gradient builders
      • 13.1.3. Metadata
      • 13.1.4. Compiling the IPU code
        • API level
        • PopLibs library code
        • Compiling the library file
      • 13.1.5. Using the custom op in TensorFlow
      • 13.1.6. Tensor allocation
      • 13.1.7. Examples
        • In-place operations
        • Operation attributes
        • Custom codelet
    • 13.2. Custom host CPU operations
      • 13.2.1. Gradient callback
  • 14. IPU host embeddings
    • 14.1. Usage
    • 14.2. Example
    • 14.3. Experimental functionality: IPU embeddings in remote buffers
      • 14.3.1. Partitioning strategies
        • Token strategy
        • Encoding strategy
        • Choosing a strategy for your application
  • 15. Retrieving information about compilation and execution
    • 15.1. TensorFlow options for reporting
    • 15.2. Dumping auxiliary Poplar information
      • 15.2.1. Poplar vertex graph
      • 15.2.2. Poplar interval report
    • 15.3. XLA graph file naming
  • 16. API changes
    • 16.1. Release 2.2
      • 16.1.1. Breaking changes
        • C++ Poplar TensorFlow libraries are private by default
        • Reports removed from ipu events
      • 16.1.2. Non-breaking changes
        • IPULoggingTensorHook replication_factor deprecated
        • IPUInfeedQueue/IPUOutfeedQueue/IPULoggingTensorHook feed_name deprecated
        • Change of output location for profiling information
        • IPU Keras Layers deprecation in TensorFlow 1.15
    • 16.2. Release 2.1
      • 16.2.1. Breaking changes
        • IPUPipelineEstimator change
        • Autosharding removed
        • Old IPU option configuration API changes
        • IPU Keras changes [TensorFlow 2]
      • 16.2.2. Non-breaking changes
        • Recompute suggestions deprecated
        • IPUInfeedQueue/IPUOutfeedQueue replication_factor deprecated
        • IPUInfeedQueue data_to_prefetch deprecated
        • IPUOutfeedQueue data_to_prefetch deprecated
        • CTC loss ops deprecated
        • New configuration API
        • Support for grouped collectives
        • Environment variable changes
    • 16.3. Release 2.0
      • 16.3.1. Breaking changes
      • 16.3.2. Non-breaking changes
        • IPUPipelineEstimator change
        • Autosharding deprecated
        • IPU config change
        • IPU Keras changes [TensorFlow 2]
  • 17. Deprecated profiling functionality
    • 17.1. Adding an operation to get compilation and execution events
      • 17.1.1. ipu_event_trace()
      • 17.1.2. ipu_compile_summary(name, [op list])
    • 17.2. Enabling tracing in the hardware configuration options
    • 17.3. Extract the reports from the returned events
    • 17.4. Producing reports for use with the PopVision Graph Analyser
      • 17.4.1. COMPILE_BEGIN
      • 17.4.2. COMPILE_END
        • Tensor map
      • 17.4.3. EXECUTE
    • 17.5. Using the IPU Model device for debugging
    • 17.6. Reading the Poplar textual summary report
      • 17.6.1. Target
      • 17.6.2. Graph
      • 17.6.3. Memory usage
    • 17.7. Producing an ELF image of the compilation
  • 18. Python API
    • 18.1. Operations and utilities related to the Graphcore IPU
    • 18.2. Compiler interface
    • 18.3. Scoping contexts
    • 18.4. Infeed queue
    • 18.5. Outfeed queue
    • 18.6. General utilities
    • 18.7. Configuration utilities
    • 18.8. Looping utilities
    • 18.9. Distributed training
    • 18.10. Horovod
    • 18.11. Datasets
      • 18.11.1. Dataset benchmarking
      • 18.11.2. Dataset wrappers
    • 18.12. Estimators
      • 18.12.1. IPUEstimator
      • 18.12.2. IPUPipelineEstimator
      • 18.12.3. Run configs
      • 18.12.4. Session run hooks
    • 18.13. Keras layers
      • 18.13.1. Keras layer specializations for the Graphcore IPU
    • 18.14. Operators
      • 18.14.1. Custom operations
      • 18.14.2. Functional operators
      • 18.14.3. Image operations
      • 18.14.4. Graphcore utility operations
      • 18.14.5. IPU specific maths operations
      • 18.14.6. Pipelining operators
      • 18.14.7. Popnn primitive neural network operators
      • 18.14.8. Popnn normalization operators
      • 18.14.9. Popnn recurrent neural network operators
      • 18.14.10. Popops all to all and all gather operators
      • 18.14.11. Popops cross replica operators
      • 18.14.12. Popops embedding operators
      • 18.14.13. Popops reduce scatter operator
      • 18.14.14. Poprand operators
      • 18.14.15. Utility operations to be used in replicated mode
      • 18.14.16. Slicing operators
      • 18.14.17. Statistics operators
      • 18.14.18. Summary operations for IPUs
    • 18.15. Optimisers
      • 18.15.1. Optimizer classes for the Graphcore IPU
    • 18.16. Sharding
      • 18.16.1. Utility functions for sharding graphs
  • 19. TensorFlow operators supported by the IPU
  • 20. Resources
    • 20.1. Graphcore
    • 20.2. TensorFlow
    • 20.3. Other
  • 21. Index
  • 22. Trademarks & copyright