PyTorch for the IPU: User Guide
Version: 3.0.0
  • 1. Introduction
    • 1.1. Data batching
    • 1.2. Parallel and distributed execution
    • 1.3. Constraints
    • 1.4. Other resources
  • 2. Installation
    • 2.1. Version compatibility
    • 2.2. Using a Python virtual environment
    • 2.3. Setting the environment variables
    • 2.4. Validating the setup
  • 3. From PyTorch to PopTorch
    • 3.1. Preparing your data
    • 3.2. Creating your model
      • 3.2.1. Training
      • 3.2.2. Inference
    • 3.3. The training loop
    • 3.4. Multiple/custom losses
    • 3.5. Optimizers
    • 3.6. Going further
  • 4. Features
    • 4.1. Options
      • 4.1.1. Setting options via config file
    • 4.2. Model wrapping functions
      • 4.2.1. poptorch.trainingModel
      • 4.2.2. poptorch.inferenceModel
      • 4.2.3. poptorch.PoplarExecutor
      • 4.2.4. poptorch.isRunningOnIpu
    • 4.3. Error handling
      • 4.3.1. Recoverable runtime errors
      • 4.3.2. Unrecoverable runtime errors
      • 4.3.3. Application and other errors
    • 4.4. Multi-IPU execution strategies
      • 4.4.1. Annotations
        • Model partitioning using blocks
        • poptorch.Stage and poptorch.AutoStage
          • poptorch.Stage
          • poptorch.AutoStage
        • poptorch.Phase
        • Advanced annotation with strings
      • 4.4.2. Available execution strategies
        • Pipelined execution
        • Sharded execution
        • Phased execution
          • Serial phased execution
          • Parallel phased execution
          • poptorch.Liveness
    • 4.5. Optimizers
      • 4.5.1. Loss scaling
      • 4.5.2. Velocity scaling (SGD combined variant only)
      • 4.5.3. Accumulation types
      • 4.5.4. Constant attributes
      • 4.5.5. Reading and writing optimizer state
    • 4.6. PopTorch ops
      • 4.6.1. poptorch.ctc_beam_search_decoder
      • 4.6.2. poptorch.ipu_print_tensor
      • 4.6.3. poptorch.identity_loss
      • 4.6.4. poptorch.MultiConv
      • 4.6.5. poptorch.nop
      • 4.6.6. poptorch.dynamic_slice
      • 4.6.7. poptorch.serializedMatMul
      • 4.6.8. poptorch.set_available_memory
      • 4.6.9. Miscellaneous functions
    • 4.7. 16-bit float support
    • 4.8. Automatic mixed-precision casting
    • 4.9. PyTorch buffers
    • 4.10. Creating custom ops
      • 4.10.1. Implementing the custom op
      • 4.10.2. Make the op available to PyTorch
      • 4.10.3. Passing attributes to the custom op
    • 4.11. Precompilation and caching
      • 4.11.1. Caching
      • 4.11.2. Precompilation
    • 4.12. Environment variables
      • 4.12.1. Logging level
      • 4.12.2. Profiling
      • 4.12.3. IPU Model
      • 4.12.4. Wait for an IPU to become available
      • 4.12.5. Enable executable caching
  • 5. Efficient data batching
    • 5.1. poptorch.DataLoader
    • 5.2. poptorch.AsynchronousDataAccessor
      • 5.2.1. Rebatching iterable datasets
    • 5.3. poptorch.Options.deviceIterations
    • 5.4. poptorch.Options.replicationFactor
    • 5.5. poptorch.Options.Training.gradientAccumulation
    • 5.6. poptorch.Options.outputMode
  • 6. IPU supported operations
    • 6.1. Torch operations
      • 6.1.1. Tensor operations
        • Creation ops
        • Indexing, slicing, joining and mutating ops
        • Random samplers
      • 6.1.2. Math operations
        • Pointwise ops
        • Reduction ops
        • Comparison ops
        • Other ops
        • BLAS and LAPACK operations
    • 6.2. Torch.nn operations
      • 6.2.1. Containers
      • 6.2.2. Convolution layers
      • 6.2.3. Pooling layers
      • 6.2.4. Padding layers
      • 6.2.5. Activations
      • 6.2.6. Normalization layers
      • 6.2.7. Recurrent layers
      • 6.2.8. Linear layers
      • 6.2.9. Dropout
      • 6.2.10. Sparse layers
      • 6.2.11. Loss functions
      • 6.2.12. Vision layers
    • 6.3. 16-bit float operations
    • 6.4. 16-bit float migration
    • 6.5. Gradient computation control
  • 7. Debugging your model
    • 7.1. Inspecting tensors
    • 7.2. Anchoring tensors
    • 7.3. Retrieving tensors
    • 7.4. Inspecting optimizer state
  • 8. Efficient IPU I/O
    • 8.1. Prefetch and multibuffering
    • 8.2. Overlapping compute and I/O
  • 9. Examples
    • 9.1. MNIST example
  • 10. Experimental features
    • 10.1. Distributed execution without PopRun
    • 10.2. torch.nn.CTCLoss
  • 11. Legacy tracing frontend
    • 11.1. Dispatcher support
    • 11.2. Constraints when using tracing
    • 11.3. 16-bit float operations when using tracing
      • 11.3.1. Casting
      • 11.3.2. Creation functions
      • 11.3.3. Normalization
    • 11.4. Automatic mixed-precision casting
      • 11.4.1. Custom casting policies
  • 12. API reference
    • 12.1. Options
    • 12.2. Helpers
    • 12.3. PopTorch Ops
    • 12.4. Model wrapping functions
    • 12.5. Parallel execution
    • 12.6. Optimizers
    • 12.7. Data batching
    • 12.8. Enumerations
    • 12.9. Autocasting
  • 13. Index
  • 14. Legal notices
  • 15. Changelog
    • 15.1. v3.0 (Poplar SDK 3.0)
      • 15.1.1. New features
      • 15.1.2. API changes
      • 15.1.3. Bug fixes
    • 15.2. v2.6 (Poplar SDK 2.6)
      • 15.2.1. New features
      • 15.2.2. API changes
      • 15.2.3. Bug fixes
    • 15.3. v2.5 (Poplar SDK 2.5)
      • 15.3.1. New features
      • 15.3.2. API changes
      • 15.3.3. Bug fixes
    • 15.4. v2.4 (Poplar SDK 2.4)
      • 15.4.1. New features
      • 15.4.2. API changes
      • 15.4.3. Bug fixes
    • 15.5. v2.3 (Poplar SDK 2.3)
      • 15.5.1. New features
      • 15.5.2. Bug fixes
      • 15.5.3. API changes
    • 15.6. v2.2 (Poplar SDK 2.2)
      • 15.6.1. New features
      • 15.6.2. API changes
    • 15.7. v2.1 (Poplar SDK 2.1)
      • 15.7.1. New features
      • 15.7.2. API changes
      • 15.7.3. Known issues
    • 15.8. v2.0 (Poplar SDK 2.0)
      • 15.8.1. New features
      • 15.8.2. API changes
    • 15.9. v1.0 (Poplar SDK 1.4)
      • 15.9.1. New features
      • 15.9.2. Known issues
    • 15.10. v0.1 (Poplar SDK 1.3)
      • 15.10.1. New features