Targeting the IPU from TensorFlow 1
Version: 3.4.0
  • 1. Introduction
    • 1.1. Document overview
  • 2. Setup quick start
    • 2.1. Enable Poplar SDK
    • 2.2. Create and enable a Python virtual environment
    • 2.3. Install the TensorFlow 1 wheels and validate
  • 3. Tutorial
    • 3.1. Preliminary graphs
    • 3.2. A basic graph
      • 3.2.1. Selecting hardware to run on
      • 3.2.2. Running on the IPU Model simulator
    • 3.3. Compiling the graph for the IPU
    • 3.4. Sharding a graph
    • 3.5. Adding variables
      • 3.5.1. Troubleshooting
      • 3.5.2. Note on the global_step counter
  • 4. Targeting the Poplar XLA device
    • 4.1. Supported types
    • 4.2. Device selection
    • 4.3. Configuring system options
      • 4.3.1. TF_POPLAR_FLAGS environment variable
    • 4.4. Supported operations
    • 4.5. Unsupported operations
    • 4.6. Error Handling
      • 4.6.1. Construction and compilation errors
      • 4.6.2. Runtime errors
  • 5. Compiling and pre-compiling executables
    • 5.1. Caching of compiled executables
    • 5.2. Pre-compiling executables
      • 5.2.1. Unsupported Operations
  • 6. Training a model
    • 6.1. Training loops, data sets and feed queues
    • 6.2. Accessing outfeed queue results during execution
    • 6.3. Replicated graphs
      • 6.3.1. Selecting the number of replicas
      • 6.3.2. Performing parameter updates
    • 6.4. Pipelined training
      • 6.4.1. Grouped scheduling
      • 6.4.2. Interleaved scheduling
      • 6.4.3. Sequential scheduling
      • 6.4.4. Pipeline stage inputs and outputs
      • 6.4.5. Applying an optimiser to the graph
      • 6.4.6. Device mapping
      • 6.4.7. Concurrent pipeline stages
    • 6.5. Gradient accumulation
      • 6.5.1. Optimizers
      • 6.5.2. Pipelining
      • 6.5.3. Accumulation data type
    • 6.6. Optimizer state offloading
    • 6.7. Dataset benchmarking
      • 6.7.1. Accessing the JSON data
    • 6.8. Half and mixed precision training
  • 7. Efficient IPU I/O
    • 7.1. Prefetch elements
    • 7.2. I/O Tiles
  • 8. Example using IPUEstimator
  • 9. Example using IPUPipelineEstimator
  • 10. Distributed training
    • 10.1. PopDistStrategy examples
  • 11. Half-precision floating point and stochastic rounding
    • 11.1. Controlling the half-precision floating-point unit
    • 11.2. Resetting the global random number seed
    • 11.3. Debugging numerical issues
  • 12. IPU-optimised operations
    • 12.1. Image operations
    • 12.2. Matmul serialisation
    • 12.3. Dropout
    • 12.4. Embedding lookup
    • 12.5. Group normalisation
    • 12.6. Instance normalisation
    • 12.7. Layer normalisation
    • 12.8. GeLU activation
    • 12.9. Sequence slice
    • 12.10. Histogram
  • 13. IPU Outlined Functions
    • 13.1. Usage
    • 13.2. Examples
      • 13.2.1. Models with common structures
      • 13.2.2. Serializing large operations
  • 14. Writing custom operations
    • 14.1. Custom operation on the IPU
      • 14.1.1. Building the Poplar graph
      • 14.1.2. Gradient builders
      • 14.1.3. Metadata
      • 14.1.4. Compiling the IPU code
        • API level
        • PopLibs library code
        • Compiling the library file
      • 14.1.5. Using the custom op in TensorFlow
      • 14.1.6. Tensor allocation
      • 14.1.7. Examples
        • In-place operations
        • Operation attributes
        • Custom codelet
    • 14.2. Custom host CPU operations
      • 14.2.1. Gradient callback
  • 15. IPU host embeddings
    • 15.1. Usage
    • 15.2. Example
    • 15.3. Experimental functionality: IPU embeddings in remote buffers
      • 15.3.1. Partitioning strategies
        • Token strategy
        • Encoding strategy
        • Choosing a strategy for your application
  • 16. IPU embedded application runtime
    • 16.1. Usage
    • 16.2. Pipelining and I/O tiles
      • 16.2.1. Parallel requests
      • 16.2.2. Timeout
      • 16.2.3. Engine restarts
    • 16.3. Example
    • 16.4. Error Handling
      • 16.4.1. Runtime errors
  • 17. Exporting precompiled models for TensorFlow Serving
    • 17.1. Exporting non-pipelined models defined inside a function
      • 17.1.1. Example of exporting non-pipelined model defined inside a function
      • 17.1.2. Example of exporting non-pipelined model defined inside a function with additional preprocessing and postprocessing steps
    • 17.2. Exporting pipelined models defined as a list of functions
      • 17.2.1. Pipeline example
      • 17.2.2. Pipeline example with preprocessing and postprocessing steps
    • 17.3. Running the model in TensorFlow Serving
  • 18. Retrieving information about compilation and execution
    • 18.1. TensorFlow options for reporting
    • 18.2. XLA graph file naming
  • 19. IPU TensorFlow Addons
    • 19.1. Introduction
    • 19.2. IPU SavedModel CLI
      • 19.2.1. Run subcommand
      • 19.2.2. Convert subcommand
      • 19.2.3. Pipeline configuration
      • 19.2.4. Pipeline development
      • 19.2.5. Pipeline solution file
      • 19.2.6. Example configuration file
  • 20. TensorFlow API changes
    • 20.1. Release 3.0
      • 20.1.1. Non-breaking changes
        • Deprecated modules
    • 20.2. Release 2.6
      • 20.2.1. Breaking changes
        • Removal of deprecated APIs
    • 20.3. Release 2.5
      • 20.3.1. Breaking changes
        • Removal of deprecated APIs
        • Other
      • 20.3.2. Non-breaking changes
        • Deprecated layers
        • RNN available_memory_proportion_fwd/available_memory_proportion_bwd deprecated
    • 20.4. Release 2.4
      • 20.4.1. Breaking changes
        • Summary ops
        • Removal of deprecated members
      • 20.4.2. Non-breaking changes
    • 20.5. Release 2.3
      • 20.5.1. Breaking changes
        • Custom user op metadata interface updates
        • The verified transfers feature has been removed
      • 20.5.2. Non-breaking changes
    • 20.6. Release 2.2
      • 20.6.1. Breaking changes
        • C++ Poplar TensorFlow libraries are private by default
        • Reports removed from ipu events
      • 20.6.2. Non-breaking changes
        • IPULoggingTensorHook replication_factor deprecated
        • IPUInfeedQueue/IPUOutfeedQueue/IPULoggingTensorHook feed_name deprecated
        • Change of output location for profiling information
        • IPU Keras Layers deprecation in TensorFlow 1.15
        • Warning when epsilon value is too low
    • 20.7. Release 2.1
      • 20.7.1. Breaking changes
        • IPUPipelineEstimator change
        • Autosharding removed
        • Old IPU option configuration API changes
        • IPU Keras changes [TensorFlow 2]
      • 20.7.2. Non-breaking changes
        • Recompute suggestions deprecated
        • IPUInfeedQueue/IPUOutfeedQueue replication_factor deprecated
        • IPUInfeedQueue data_to_prefetch deprecated
        • IPUOutfeedQueue data_to_prefetch deprecated
        • CTC loss ops deprecated
        • New configuration API
        • Support for grouped collectives
        • Environment variable changes
    • 20.8. Release 2.0
      • 20.8.1. Breaking changes
      • 20.8.2. Non-breaking changes
        • IPUPipelineEstimator change
        • Autosharding deprecated
        • IPU config change
        • IPU Keras changes [TensorFlow 2]
  • 21. TensorFlow Python API
    • 21.1. Operations and utilities related to the Graphcore IPU
    • 21.2. Compiler interface
    • 21.3. Scoping contexts
    • 21.4. Infeed queue
    • 21.5. Outfeed queue
    • 21.6. General utilities
    • 21.7. Configuration utilities
    • 21.8. Looping utilities
    • 21.9. Distributed training
    • 21.10. Horovod
    • 21.11. Serving utilities
    • 21.12. Datasets
      • 21.12.1. Dataset benchmarking
      • 21.12.2. Dataset wrappers
    • 21.13. Estimators
      • 21.13.1. IPUEstimator
      • 21.13.2. IPUPipelineEstimator
      • 21.13.3. Run configs
      • 21.13.4. Session run hooks
    • 21.14. Keras layers
      • 21.14.1. Keras layer specializations for the Graphcore IPU
    • 21.15. Operators
      • 21.15.1. Control flow operations.
      • 21.15.2. Custom operations
      • 21.15.3. Functional operators
      • 21.15.4. Image operations
      • 21.15.5. Graphcore utility operations
      • 21.15.6. IPU specific maths operations
      • 21.15.7. Pipelining operators
      • 21.15.8. Popnn primitive neural network operators
      • 21.15.9. Popnn normalization operators
      • 21.15.10. Popops all to all and all gather operators
      • 21.15.11. Popops cross replica operators
      • 21.15.12. Popops embedding operators
      • 21.15.13. Popops reduce scatter operator
      • 21.15.14. Popops within replica operators
      • 21.15.15. Poprand operators
      • 21.15.16. Utility operations to be used in replicated mode
      • 21.15.17. Slicing operators
      • 21.15.18. Statistics operators
      • 21.15.19. Embedded application runtime
    • 21.16. Optimisers
      • 21.16.1. Helper classes and methods for gradient accumulation.
      • 21.16.2. Optimizer classes for the Graphcore IPU
    • 21.17. Sharding
      • 21.17.1. Utility functions for sharding graphs
  • 22. TensorFlow operators supported by the IPU
  • 23. IPU TensorFlow Addons API changes
    • 23.1. Release 3.0
      • 23.1.1. Breaking changes
    • 23.2. Release 2.5
      • 23.2.1. Non-breaking changes
        • RNN available_memory_proportion_fwd/available_memory_proportion_bwd deprecated
    • 23.3. Release 2.4
  • 24. IPU TensorFlow Addons Python API
    • 24.1. TensorFlow layers
      • 24.1.1. TensorFlow layers made for IPU TensorFlow
    • 24.2. TensorFlow optimizers
      • 24.2.1. Optimizers made for IPU TensorFlow
  • 25. Resources
    • 25.1. Graphcore
    • 25.2. TensorFlow
    • 25.3. Other
  • 26. Trademarks & copyright

26. Trademarks & copyright

Graphcloud®, Graphcore®, Poplar® and PopVision® are registered trademarks of Graphcore Ltd.

Bow™, Bow-2000™, Bow Pod™, Colossus™, In-Processor-Memory™, IPU-Core™, IPU-Exchange™, IPU-Fabric™, IPU-Link™, IPU-M2000™, IPU-Machine™, IPU-POD™, IPU-Tile™, PopART™, PopDist™, PopLibs™, PopRun™, PopTorch™, Streaming Memory™ and Virtual-IPU™ are trademarks of Graphcore Ltd.

All other trademarks are the property of their respective owners.

Copyright © 2020–2023 Graphcore Ltd. All rights reserved.
