Targeting the IPU from TensorFlow 1
Version: 2.2.0
  • 1. Introduction
    • 1.1. Document overview
  • 2. Tutorial
    • 2.1. Preliminary graphs
    • 2.2. A basic graph
      • 2.2.1. Selecting hardware to run on
      • 2.2.2. Running on the IPU Model simulator
    • 2.3. Compiling the graph for the IPU
    • 2.4. Sharding a graph
    • 2.5. Adding variables
      • 2.5.1. Troubleshooting
      • 2.5.2. Note on the global_step counter
  • 3. Targeting the Poplar XLA device
    • 3.1. Supported types
    • 3.2. Device selection
    • 3.3. Configuring system options
      • 3.3.1. TF_POPLAR_FLAGS environment variable
    • 3.4. Supported operations
    • 3.5. Unsupported operations
    • 3.6. Error Handling
      • 3.6.1. Construction and compilation errors
      • 3.6.2. Runtime errors
  • 4. Compiling and pre-compiling executables
    • 4.1. Caching of compiled executables
    • 4.2. Pre-compiling executables
      • 4.2.1. Unsupported Operations
  • 5. Training a model
    • 5.1. Training loops, data sets and feed queues
    • 5.2. Accessing outfeed queue results during execution
    • 5.3. Replicated graphs
      • 5.3.1. Selecting the number of replicas
      • 5.3.2. Performing parameter updates
    • 5.4. Pipelined training
      • 5.4.1. Sequential scheduling
      • 5.4.2. Interleaved scheduling
      • 5.4.3. Grouped scheduling
      • 5.4.4. Pipeline stage inputs and outputs
      • 5.4.5. Applying an optimiser to the graph
      • 5.4.6. Device mapping
      • 5.4.7. Concurrent pipeline stages
    • 5.5. Gradient accumulation
      • 5.5.1. Optimizers
      • 5.5.2. Pipelining
      • 5.5.3. Accumulation data type
    • 5.6. Optimizer state offloading
    • 5.7. Dataset benchmarking
      • 5.7.1. Accessing the JSON data
  • 6. Efficient IPU I/O
    • 6.1. Prefetch elements
    • 6.2. I/O Tiles
  • 7. Example using IPUEstimator
  • 8. Example using IPUPipelineEstimator
  • 9. Distributed training
    • 9.1. Example using IPUMultiWorkerStrategy
      • 9.1.1. The input function
      • 9.1.2. The model function
      • 9.1.3. Cluster definition
      • 9.1.4. Complete example
    • 9.2. Distributed training with Horovod
    • 9.3. Launching Horovod training
    • 9.4. Complete Horovod example
  • 10. Half-precision floating point and stochastic rounding
    • 10.1. Controlling the half-precision floating-point unit
    • 10.2. Resetting the global random number seed
    • 10.3. Debugging numerical issues
  • 11. IPU-optimised operations
    • 11.1. LSTM and GRU
    • 11.2. Dropout
    • 11.3. Embedding lookup
    • 11.4. Group normalisation
    • 11.5. Instance normalisation
    • 11.6. Layer normalisation
    • 11.7. GeLU activation
    • 11.8. Sequence slice
    • 11.9. Histogram
  • 12. IPU Outlined Functions
    • 12.1. Usage
    • 12.2. Examples
      • 12.2.1. Models with common structures
      • 12.2.2. Serializing large operations
  • 13. Writing custom operations
    • 13.1. Custom operation on the IPU
      • 13.1.1. Building the Poplar graph
      • 13.1.2. Gradient builders
      • 13.1.3. Metadata
      • 13.1.4. Compiling the IPU code
        • API level
        • PopLibs library code
        • Compiling the library file
      • 13.1.5. Using the custom op in TensorFlow
      • 13.1.6. Tensor allocation
      • 13.1.7. Examples
        • In-place operations
        • Operation attributes
        • Custom codelet
    • 13.2. Custom host CPU operations
      • 13.2.1. Gradient callback
  • 14. IPU host embeddings
    • 14.1. Usage
    • 14.2. Example
    • 14.3. Experimental functionality: IPU embeddings in remote buffers
      • 14.3.1. Partitioning strategies
        • Token strategy
        • Encoding strategy
        • Choosing a strategy for your application
  • 15. Retrieving information about compilation and execution
    • 15.1. TensorFlow options for reporting
    • 15.2. Dumping auxiliary Poplar information
      • 15.2.1. Poplar vertex graph
      • 15.2.2. Poplar interval report
    • 15.3. XLA graph file naming
  • 16. API changes
    • 16.1. Release 2.2
      • 16.1.1. Breaking changes
        • C++ Poplar TensorFlow libraries are private by default
        • Reports removed from ipu events
      • 16.1.2. Non-breaking changes
        • IPULoggingTensorHook replication_factor deprecated
        • IPUInfeedQueue/IPUOutfeedQueue/IPULoggingTensorHook feed_name deprecated
        • Change of output location for profiling information
        • IPU Keras Layers deprecation in TensorFlow 1.15
    • 16.2. Release 2.1
      • 16.2.1. Breaking changes
        • IPUPipelineEstimator change
        • Autosharding removed
        • Old IPU option configuration API changes
        • IPU Keras changes [TensorFlow 2]
      • 16.2.2. Non-breaking changes
        • Recompute suggestions deprecated
        • IPUInfeedQueue/IPUOutfeedQueue replication_factor deprecated
        • IPUInfeedQueue data_to_prefetch deprecated
        • IPUOutfeedQueue data_to_prefetch deprecated
        • CTC loss ops deprecated
        • New configuration API
        • Support for grouped collectives
        • Environment variable changes
    • 16.3. Release 2.0
      • 16.3.1. Breaking changes
      • 16.3.2. Non-breaking changes
        • IPUPipelineEstimator change
        • Autosharding deprecated
        • IPU config change
        • IPU Keras changes [TensorFlow 2]
  • 17. Deprecated profiling functionality
    • 17.1. Adding an operation to get compilation and execution events
      • 17.1.1. ipu_event_trace()
      • 17.1.2. ipu_compile_summary(name, [op list])
    • 17.2. Enabling tracing in the hardware configuration options
    • 17.3. Extract the reports from the returned events
    • 17.4. Producing reports for use with the PopVision Graph Analyser
      • 17.4.1. COMPILE_BEGIN
      • 17.4.2. COMPILE_END
        • Tensor map
      • 17.4.3. EXECUTE
    • 17.5. Using the IPU Model device for debugging
    • 17.6. Reading the Poplar textual summary report
      • 17.6.1. Target
      • 17.6.2. Graph
      • 17.6.3. Memory usage
    • 17.7. Producing an ELF image of the compilation
  • 18. Python API
    • 18.1. Operations and utilities related to the Graphcore IPU
    • 18.2. Compiler interface
    • 18.3. Scoping contexts
    • 18.4. Infeed queue
    • 18.5. Outfeed queue
    • 18.6. General utilities
    • 18.7. Configuration utilities
    • 18.8. Looping utilities
    • 18.9. Distributed training
    • 18.10. Horovod
    • 18.11. Datasets
      • 18.11.1. Dataset benchmarking
      • 18.11.2. Dataset wrappers
    • 18.12. Estimators
      • 18.12.1. IPUEstimator
      • 18.12.2. IPUPipelineEstimator
      • 18.12.3. Run configs
      • 18.12.4. Session run hooks
    • 18.13. Keras layers
      • 18.13.1. Keras layer specializations for the Graphcore IPU
    • 18.14. Operators
      • 18.14.1. Custom operations
      • 18.14.2. Functional operators
      • 18.14.3. Image operations
      • 18.14.4. Graphcore utility operations
      • 18.14.5. IPU specific maths operations
      • 18.14.6. Pipelining operators
      • 18.14.7. Popnn primitive neural network operators
      • 18.14.8. Popnn normalization operators
      • 18.14.9. Popnn recurrent neural network operators
      • 18.14.10. Popops all to all and all gather operators
      • 18.14.11. Popops cross replica operators
      • 18.14.12. Popops embedding operators
      • 18.14.13. Popops reduce scatter operator
      • 18.14.14. Poprand operators
      • 18.14.15. Utility operations to be used in replicated mode
      • 18.14.16. Slicing operators
      • 18.14.17. Statistics operators
      • 18.14.18. Summary operations for IPUs
    • 18.15. Optimisers
      • 18.15.1. Optimizer classes for the Graphcore IPU
    • 18.16. Sharding
      • 18.16.1. Utility functions for sharding graphs
  • 19. TensorFlow operators supported by the IPU
  • 20. Resources
    • 20.1. Graphcore
    • 20.2. TensorFlow
    • 20.3. Other
  • 21. Index
  • 22. Trademarks & copyright

22. Trademarks & copyright

Graphcore® and Poplar® are registered trademarks of Graphcore Ltd.

AI-Float™, Colossus™, Exchange Memory™, Graphcloud™, In-Processor-Memory™, IPU-Core™, IPU-Exchange™, IPU-Fabric™, IPU-Link™, IPU-M2000™, IPU-Machine™, IPU-POD™, IPU-Tile™, PopART™, PopLibs™, PopVision™, PopTorch™, Streaming Memory™ and Virtual-IPU™ are trademarks of Graphcore Ltd.

All other trademarks are the property of their respective owners.

Copyright © 2016-2020 Graphcore Ltd. All rights reserved.

This software is made available under the terms of the Graphcore End User License Agreement (EULA). Please ensure you have read and accepted the terms of the license before using the software.

