Targeting the IPU from TensorFlow 2
Version: 2.2.0

22. Trademarks & copyright

Graphcore® and Poplar® are registered trademarks of Graphcore Ltd.

AI-Float™, Colossus™, Exchange Memory™, Graphcloud™, In-Processor-Memory™, IPU-Core™, IPU-Exchange™, IPU-Fabric™, IPU-Link™, IPU-M2000™, IPU-Machine™, IPU-POD™, IPU-Tile™, PopART™, PopLibs™, PopVision™, PopTorch™, Streaming Memory™ and Virtual-IPU™ are trademarks of Graphcore Ltd.

All other trademarks are the property of their respective owners.

Copyright © 2016-2020 Graphcore Ltd. All rights reserved.

This software is made available under the terms of the Graphcore End User License Agreement (EULA). Please ensure you have read and accept the terms of the license before using the software.

Revision 4c70543b.