Targeting the IPU from TensorFlow 2
Version: 3.1.0
  • 1. Introduction
    • 1.1. Document overview
  • 2. Targeting the Poplar XLA device
    • 2.1. Supported types
    • 2.2. Device selection
    • 2.3. Configuring system options
      • 2.3.1. TF_POPLAR_FLAGS environment variable
    • 2.4. Supported operations
    • 2.5. Unsupported operations
    • 2.6. Error Handling
      • 2.6.1. Construction and compilation errors
      • 2.6.2. Runtime errors
  • 3. Support for TensorFlow 2
    • 3.1. IPUStrategy
    • 3.2. Execution modes
      • 3.2.1. Graph mode with @tf.function
      • 3.2.2. Eager mode
    • 3.3. On-device loops
  • 4. Keras with IPUs
  • 5. Compiling and pre-compiling executables
    • 5.1. Caching of compiled executables
    • 5.2. Pre-compiling executables
      • 5.2.1. Unsupported Operations
  • 6. Training a model
    • 6.1. Training loops, data sets and feed queues
    • 6.2. Optional simplification of infeeds and outfeeds
    • 6.3. Accessing outfeed queue results during execution
    • 6.4. Replicated graphs
      • 6.4.1. Selecting the number of replicas
      • 6.4.2. Performing parameter updates
    • 6.5. Pipelined training
      • 6.5.1. Grouped scheduling
      • 6.5.2. Interleaved scheduling
      • 6.5.3. Sequential scheduling
      • 6.5.4. Pipeline stage inputs and outputs
      • 6.5.5. Applying an optimiser to the graph
      • 6.5.6. Device mapping
      • 6.5.7. Concurrent pipeline stages
    • 6.6. Recomputation
    • 6.7. Gradient accumulation
      • 6.7.1. Optimizers
      • 6.7.2. Pipelining
      • 6.7.3. Accumulation data type
    • 6.8. Optimizer state offloading
    • 6.9. Replicated tensor sharding
    • 6.10. Dataset benchmarking
      • 6.10.1. Accessing the JSON data
  • 7. Efficient IPU I/O
    • 7.1. Prefetch elements
    • 7.2. I/O Tiles
    • 7.3. uint8 data
  • 8. Example using IPUEstimator
  • 9. Example using IPUPipelineEstimator
  • 10. Distributed training
    • 10.1. PopDistStrategy examples
    • 10.2. Limitations and known issues
  • 11. Half-precision floating point and stochastic rounding
    • 11.1. Controlling the half-precision floating-point unit
    • 11.2. Resetting the global random number seed
    • 11.3. Debugging numerical issues
  • 12. IPU-optimised operations
    • 12.1. Image operations
    • 12.2. Matmul serialisation
    • 12.3. Dropout
    • 12.4. Embedding lookup
    • 12.5. Group normalisation
    • 12.6. Instance normalisation
    • 12.7. Layer normalisation
    • 12.8. GeLU activation
    • 12.9. Sequence slice
    • 12.10. Histogram
  • 13. IPU Outlined Functions
    • 13.1. Usage
    • 13.2. Examples
      • 13.2.1. Models with common structures
      • 13.2.2. Serializing large operations
  • 14. Writing custom operations
    • 14.1. Custom operation on the IPU
      • 14.1.1. Building the Poplar graph
      • 14.1.2. Gradient builders
      • 14.1.3. Metadata
      • 14.1.4. Compiling the IPU code
        • API level
        • PopLibs library code
        • Compiling the library file
      • 14.1.5. Using the custom op in TensorFlow
      • 14.1.6. Tensor allocation
      • 14.1.7. Examples
        • In-place operations
        • Operation attributes
        • Custom codelet
    • 14.2. Custom host CPU operations
      • 14.2.1. Gradient callback
  • 15. IPU host embeddings
    • 15.1. Usage
    • 15.2. Example
    • 15.3. Experimental functionality: IPU embeddings in remote buffers
      • 15.3.1. Partitioning strategies
        • Token strategy
        • Encoding strategy
        • Choosing a strategy for your application
  • 16. IPU embedded application runtime
    • 16.1. Usage
    • 16.2. Pipelining and I/O tiles
      • 16.2.1. Parallel requests
      • 16.2.2. Timeout
      • 16.2.3. Engine restarts
    • 16.3. Example
    • 16.4. Error Handling
      • 16.4.1. Runtime errors
  • 17. Exporting precompiled models for TensorFlow Serving
    • 17.1. Exporting non-pipelined models defined inside a function
      • 17.1.1. Example of exporting non-pipelined model defined inside a function
      • 17.1.2. Example of exporting non-pipelined model defined inside a function with additional preprocessing and postprocessing steps
    • 17.2. Exporting pipelined models defined as a list of functions
      • 17.2.1. Pipeline example
      • 17.2.2. Pipeline example with preprocessing and postprocessing steps
    • 17.3. Exporting Keras models
    • 17.4. Running the model in TensorFlow Serving
  • 18. Retrieving information about compilation and execution
    • 18.1. TensorFlow options for reporting
    • 18.2. XLA graph file naming
  • 19. Keras with IPUs
    • 19.1. Single IPU models
    • 19.2. Using steps_per_execution
    • 19.3. Gradient accumulation
    • 19.4. Model parallelism
      • 19.4.1. Sequential model
        • Pipelining a model containing nested models
      • 19.4.2. Functional model
        • Pipelining a model you are writing yourself
        • Pipelining an existing functional model
      • 19.4.3. Model subclass
        • Pipelining a model you are writing yourself
        • Pipelining an existing model
      • 19.4.4. Pipelining options
    • 19.5. Automatic data parallelism
    • 19.6. Asynchronous callbacks
    • 19.7. Configuring Infeeds and Outfeed
    • 19.8. Saving and loading Keras models
    • 19.9. Exporting precompiled Keras models for TensorFlow Serving
      • 19.9.1. Non-pipelined Keras model example
      • 19.9.2. Non-pipelined Keras model example with additional preprocessing and postprocessing steps
      • 19.9.3. Pipelined Keras model example
      • 19.9.4. Pipelined Keras model example with additional preprocessing and postprocessing steps
    • 19.10. IPU-specific Keras layers and optimizers
    • 19.11. Implementation details
    • 19.12. Automatic loss scaling
  • 20. IPU TensorFlow Addons
    • 20.1. Introduction
    • 20.2. Keras layers
      • 20.2.1. IPU implementations of standard Keras layers
      • 20.2.2. Layers without upstream equivalents
      • 20.2.3. Code example
    • 20.3. Optimizers
  • 21. TensorFlow API changes
    • 21.1. Release 3.1
      • 21.1.1. Breaking changes
        • Removal of deprecated Keras API
        • Removal of deprecated Horovod API
    • 21.2. Release 3.0
      • 21.2.1. Non-breaking changes
        • Deprecated modules
    • 21.3. Release 2.6
      • 21.3.1. Breaking changes
        • Removal of deprecated APIs
      • 21.3.2. Non-breaking changes
        • IPU Keras changes
    • 21.4. Release 2.5
      • 21.4.1. Breaking changes
        • IPU Keras changes
        • Removal of deprecated APIs
        • Other
      • 21.4.2. Non-breaking changes
        • Deprecated layers
        • Deprecated pipeline and gradient_accumulation options
        • RNN available_memory_proportion_fwd/available_memory_proportion_bwd deprecated
    • 21.5. Release 2.4
      • 21.5.1. Breaking changes
        • Summary ops
        • Removal of deprecated members
      • 21.5.2. Non-breaking changes
    • 21.6. Release 2.3
      • 21.6.1. Breaking changes
        • Custom user op metadata interface updates
        • The verified transfers feature has been removed
      • 21.6.2. Non-breaking changes
    • 21.7. Release 2.2
      • 21.7.1. Breaking changes
        • C++ Poplar TensorFlow libraries are private by default
        • Reports removed from ipu events
        • TensorFlow 2.1 to TensorFlow 2.4 Migration
      • 21.7.2. Non-breaking changes
        • IPULoggingTensorHook replication_factor deprecated
        • IPUInfeedQueue/IPUOutfeedQueue/IPULoggingTensorHook feed_name deprecated
        • Change of output location for profiling information
        • Warning when epsilon value is too low
    • 21.8. Release 2.1
      • 21.8.1. Breaking changes
        • IPUPipelineEstimator change
        • Autosharding removed
        • Old IPU option configuration API changes
        • IPU Keras changes [TensorFlow 2]
      • 21.8.2. Non-breaking changes
        • Recompute suggestions deprecated
        • IPUInfeedQueue/IPUOutfeedQueue replication_factor deprecated
        • IPUInfeedQueue data_to_prefetch deprecated
        • IPUOutfeedQueue data_to_prefetch deprecated
        • CTC loss ops deprecated
        • New configuration API
        • Support for grouped collectives
        • Environment variable changes
    • 21.9. Release 2.0
      • 21.9.1. Breaking changes
      • 21.9.2. Non-breaking changes
        • IPUPipelineEstimator change
        • Autosharding deprecated
        • IPU config change
        • IPU Keras changes [TensorFlow 2]
  • 22. TensorFlow Python API
    • 22.1. Operations and utilities related to the Graphcore IPU
    • 22.2. Distribution strategy for a single system
    • 22.3. Compiler interface
    • 22.4. Scoping contexts
    • 22.5. Infeed queue
    • 22.6. Outfeed queue
    • 22.7. General utilities
    • 22.8. Configuration utilities
    • 22.9. Looping utilities
    • 22.10. Distribution using PopDist
    • 22.11. Serving utilities
    • 22.12. Datasets
      • 22.12.1. Dataset benchmarking
      • 22.12.2. Dataset wrappers
    • 22.13. Estimators
      • 22.13.1. IPUEstimator
      • 22.13.2. IPUPipelineEstimator
      • 22.13.3. Run configs
      • 22.13.4. Session run hooks
    • 22.14. Operators
      • 22.14.1. Control flow operations.
      • 22.14.2. Custom operations
      • 22.14.3. Functional operators
      • 22.14.4. Image operations
      • 22.14.5. Graphcore utility operations
      • 22.14.6. IPU specific maths operations
      • 22.14.7. Pipelining operators
      • 22.14.8. Popnn primitive neural network operators
      • 22.14.9. Popnn normalization operators
      • 22.14.10. Popops all to all and all gather operators
      • 22.14.11. Popops cross replica operators
      • 22.14.12. Popops embedding operators
      • 22.14.13. Popops reduce scatter operator
      • 22.14.14. Popops within replica operators
      • 22.14.15. Poprand operators
      • 22.14.16. Utility operations to be used in replicated mode
      • 22.14.17. Slicing operators
      • 22.14.18. Statistics operators
      • 22.14.19. Embedded application runtime
    • 22.15. Optimisers
      • 22.15.1. Helper classes and methods for gradient accumulation.
      • 22.15.2. Optimizer classes for the Graphcore IPU
    • 22.16. Sharding
      • 22.16.1. Utility functions for sharding graphs
  • 23. TensorFlow operators supported by the IPU
  • 24. Keras API changes
    • 24.1. Release 2.6
  • 25. Keras Python API
    • 25.1. IPU specific Keras integration
    • 25.2. IPU specific Keras extensions
    • 25.3. Keras Optimizer specializations for the Graphcore IPU
  • 26. IPU TensorFlow Addons API changes
    • 26.1. Release 3.0
      • 26.1.1. Breaking changes
    • 26.2. Release 2.5
      • 26.2.1. Non-breaking changes
        • RNN available_memory_proportion_fwd/available_memory_proportion_bwd deprecated
    • 26.3. Release 2.4
  • 27. IPU TensorFlow Addons Python API
    • 27.1. Keras Layers
      • 27.1.1. Keras layers made for IPU TensorFlow
    • 27.2. Keras Optimizers
      • 27.2.1. Keras optimizers made for IPU TensorFlow
    • 27.3. Legacy TensorFlow Layers
      • 27.3.1. TensorFlow layers made for IPU TensorFlow
    • 27.4. Legacy TensorFlow Optimizers
      • 27.4.1. Optimizers made for IPU TensorFlow
  • 28. Resources
    • 28.1. Graphcore
    • 28.2. TensorFlow
    • 28.3. Other
  • 29. Legal notices

29. Legal notices

Graphcloud®, Graphcore®, Poplar® and PopVision® are registered trademarks of Graphcore Ltd.

Bow™, Bow-2000™, Bow Pod™, Colossus™, In-Processor-Memory™, IPU-Core™, IPU-Exchange™, IPU-Fabric™, IPU-Link™, IPU-M2000™, IPU-Machine™, IPU-POD™, IPU-Tile™, PopART™, PopDist™, PopLibs™, PopRun™, PopTorch™, Streaming Memory™ and Virtual-IPU™ are trademarks of Graphcore Ltd.

All other trademarks are the property of their respective owners.

This software is made available under the terms of the Graphcore End User License Agreement (EULA) and the Graphcore Container License Agreement. Please ensure you have read and accepted the terms of the corresponding license before using the software. The Graphcore EULA applies unless indicated otherwise.

Copyright © 2016-2020 Graphcore Ltd. All rights reserved.
