PyTorch for the IPU: User Guide
Version: latest
  • 1. Introduction
    • 1.1. Data batching
    • 1.2. Parallel and distributed execution
    • 1.3. Constraints
    • 1.4. Other resources
  • 2. Installation
    • 2.1. Version compatibility
    • 2.2. Using a Python virtual environment
    • 2.3. Setting the environment variables
    • 2.4. Validating the setup
  • 3. From PyTorch to PopTorch
    • 3.1. Preparing your data
    • 3.2. Creating your model
      • 3.2.1. Training
      • 3.2.2. Inference
    • 3.3. The training loop
    • 3.4. Multiple/custom losses
    • 3.5. Optimizers
    • 3.6. Going further
  • 4. Features
    • 4.1. Options
      • 4.1.1. Setting options via config file
    • 4.2. Model wrapping functions
      • 4.2.1. poptorch.trainingModel
      • 4.2.2. poptorch.inferenceModel
      • 4.2.3. poptorch.PoplarExecutor
      • 4.2.4. poptorch.isRunningOnIpu
    • 4.3. Error handling
      • 4.3.1. Recoverable runtime errors
      • 4.3.2. Unrecoverable runtime errors
      • 4.3.3. Application and other errors
    • 4.4. Multi-IPU execution strategies
      • 4.4.1. Annotations
        • Model partitioning using blocks
        • poptorch.Stage and poptorch.AutoStage
          • poptorch.Stage
          • poptorch.AutoStage
        • poptorch.Phase
        • Advanced annotation with strings
      • 4.4.2. Available execution strategies
        • Pipelined execution
        • Sharded execution
        • Phased execution
          • Serial phased execution
          • Parallel phased execution
          • poptorch.Liveness
      • 4.4.3. Grouping tensor weights across replicas
    • 4.5. Optimizers
      • 4.5.1. Loss scaling
      • 4.5.2. Velocity scaling (SGD combined variant only)
      • 4.5.3. Accumulation types
      • 4.5.4. Constant attributes
      • 4.5.5. Reading and writing optimizer state
    • 4.6. PopTorch ops
      • 4.6.1. poptorch.ctc_beam_search_decoder
      • 4.6.2. poptorch.ipu_print_tensor
      • 4.6.3. poptorch.identity_loss
      • 4.6.4. poptorch.MultiConv
      • 4.6.5. poptorch.nop
      • 4.6.6. poptorch.dynamic_slice
      • 4.6.7. poptorch.dynamic_update
      • 4.6.8. poptorch.serializedMatMul
      • 4.6.9. poptorch.set_available_memory
      • 4.6.10. Miscellaneous functions
    • 4.7. 16-bit float support
    • 4.8. PyTorch buffers
    • 4.9. Creating custom ops
      • 4.9.1. Implementing the custom op
      • 4.9.2. Making the op available to PyTorch
      • 4.9.3. Passing attributes to the custom op
    • 4.10. Precompilation and caching
      • 4.10.1. Caching
      • 4.10.2. Precompilation
    • 4.11. Environment variables
      • 4.11.1. Logging level
      • 4.11.2. Profiling
      • 4.11.3. IPU Model
      • 4.11.4. Wait for an IPU to become available
      • 4.11.5. Enable executable caching
  • 5. Efficient data batching
    • 5.1. poptorch.DataLoader
    • 5.2. poptorch.AsynchronousDataAccessor
      • 5.2.1. Rebatching iterable datasets
    • 5.3. poptorch.Options.deviceIterations
    • 5.4. poptorch.Options.replicationFactor
    • 5.5. poptorch.Options.inputReplicaGrouping
    • 5.6. poptorch.Options.Training.gradientAccumulation
    • 5.7. poptorch.Options.outputMode
  • 6. IPU supported operations
    • 6.1. Torch operations
      • 6.1.1. Tensor operations
        • Creation ops
        • Indexing, slicing, joining and mutating ops
        • Random samplers
      • 6.1.2. Math operations
        • Pointwise ops
        • Reduction ops
        • Comparison ops
        • torch.linalg ops
        • Other ops
        • BLAS and LAPACK operations
    • 6.2. Torch.nn operations
      • 6.2.1. Containers
      • 6.2.2. Convolution layers
      • 6.2.3. Pooling layers
      • 6.2.4. Padding layers
      • 6.2.5. Activations
      • 6.2.6. Normalization layers
      • 6.2.7. Recurrent layers
      • 6.2.8. Linear layers
      • 6.2.9. Dropout
      • 6.2.10. Sparse layers
      • 6.2.11. Loss functions
      • 6.2.12. Vision layers
    • 6.3. 16-bit float operations
    • 6.4. 16-bit float migration
    • 6.5. Gradient computation control
  • 7. Debugging your model
    • 7.1. Inspecting tensors
    • 7.2. Anchoring tensors
    • 7.3. Retrieving tensors
    • 7.4. Inspecting optimizer state
  • 8. Efficient IPU I/O
    • 8.1. Prefetch and multibuffering
    • 8.2. Overlapping compute and I/O
  • 9. Examples
    • 9.1. MNIST example
  • 10. Experimental features
    • 10.1. Distributed execution without PopRun
    • 10.2. torch.nn.CTCLoss
  • 11. API reference
    • 11.1. Options
    • 11.2. Helpers
    • 11.3. PopTorch ops
    • 11.4. Model wrapping functions
    • 11.5. Parallel execution
    • 11.6. Optimizers
    • 11.7. Data batching
    • 11.8. Enumerations
  • 12. Index
  • 13. Trademarks & copyright

Search help

Note: Searching from the top-level index page will search all documents. Searching from a specific document will search only that document.

  • Find an exact phrase: Wrap your search phrase in "" (double quotes) to return only results where the phrase is matched exactly. For example "PyTorch for the IPU" or "replicated tensor sharding"
  • Prefix query: Add an * (asterisk) at the end of any word to indicate a prefix query. This will return results containing words that start with that prefix. For example tensor*
  • Fuzzy search: Use ~N (tilde followed by a number) at the end of any word for a fuzzy search. This will return results that are similar to the search word. N specifies the “edit distance” (fuzziness) of the match. For example Polibs~1
  • Words close to each other: ~N (tilde followed by a number) after a phrase (in quotes) returns results where the words are close to each other. N is the maximum number of positions allowed between matching words. For example "ipu version"~2
  • Logical operators: You can use the following logical operators in a search (combined examples are shown after this list):
    • + signifies AND operation
    • | signifies OR operation
    • - negates a single word or phrase (returns results without that word or phrase)
    • () controls operator precedence
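
These operators can be combined in a single query. The following queries are illustrative sketches based on the syntax described above, not examples taken from the search engine itself:

  • +ipu +pipelined returns only results containing both ipu and pipelined
  • pipelined | sharded returns results containing either pipelined or sharded
  • "execution strategies" -phased returns results containing the exact phrase but not the word phased
  • +(tensor* | buffer*) -poplar uses parentheses to group the two prefix queries before applying AND and negation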

