Tutorials
Version: 3.0.0
  • 1. Introduction
    • 1.1. Prerequisites
    • 1.2. Running these tutorials
  • 2. PyTorch
    • 2.1. Introduction to PopTorch - running a simple model
      • What is PopTorch?
      • Getting started: training a model on the IPU
        • Import the packages
        • Load the data
          • PopTorch DataLoader
        • Build the model
        • Prepare training for IPUs
        • Train the model
          • Training loop
          • Use the same IPU for training and inference
          • Save the trained model
        • Evaluate the model
      • Using the model on our own images to get predictions
        • Running our model on the IPU
        • Running our model on the CPU
        • Limitations with our model
      • Doing more with poptorch.Options
        • deviceIterations
        • replicationFactor
        • randomSeed
        • useIpuModel
      • How to set the options
      • Summary
        • Next steps:
    • 2.2. Efficient data loading with PopTorch
      • PyTorch and PopTorch DataLoader
      • Understanding batching with IPU
        • Device iterations
          • A note on returned data
        • Gradient accumulation
        • Replication
        • Global batch size
          • How many samples will then be loaded in one step?
      • Tuning hyperparameters
        • Evaluating the asynchronous DataLoader
        • What if the DataLoader throughput is too low?
        • Device iterations vs global batch size
          • Case of a training session
          • Case of an inference session
          • Conclusion: Training and inference sessions
      • Experiments
        • Case 1: No bottleneck
          • Why is the throughput lower with real data?
        • Case 2: Larger global batch size with replication
      • Summary
    • 2.3. Tutorial on BERT Fine-tuning on IPU
      • File structure
      • How to use this demo
      • License
    • 2.4. Half and mixed precision in PopTorch
      • General
        • Motives for half precision
        • Numerical stability
          • Loss scaling
          • Stochastic rounding
      • Train a model in half precision
        • Import the packages
        • Build the model
        • Choose parameters
          • Casting a model’s parameters
          • Casting a single layer’s parameters
        • Prepare the data
        • Optimizers and loss scaling
        • Set PopTorch’s options
          • Stochastic rounding on IPU
          • Partials data type
        • Train the model
        • Evaluate the model
      • Visualise the memory footprint
      • Debug floating-point exceptions
      • Summary
    • 2.5. Observing tensors in PopTorch
      • Table of Contents
      • General
      • File structure
      • Method 1: Print tensor
      • Method 2: Direct anchoring
      • Anchor modes
      • Gradient histogram example
        • Import packages
        • Build the model
        • Assigning assorted parameters
        • Set PopTorch options
        • Setting up the data loader
        • Initialising the PopTorch model
        • Printing out the tensor names
        • Anchoring the tensors
        • Training the model
        • Retrieving the tensors
        • Building the histogram
    • 2.6. PopTorch Parallel Execution Using Pipelining
      • File structure
      • Introduction to pipelined execution
      • Setting hyperparameters
      • Preparing the data
      • Model definition
        • Annotation for model partitioning
          • Defining the training model
      • Execution strategies
        • Pipelined execution (parallel)
          • Assigning blocks to stages and IPUs
          • Setting gradient accumulation and device iterations
        • Efficient model partitioning (advanced)
        • Sharded execution (sequential)
      • Saving memory by offloading to remote buffers (advanced)
      • Training the model
      • Inference
      • How to run the example script
      • Conclusion
    • 2.7. Training a Hugging Face model on the IPU using a local dataset
      • How to run this tutorial
        • Getting the dataset
        • Environment
      • Graphcore Hugging Face models
        • Utility imports
      • Preparing the NIH Chest X-ray Dataset
        • Preparing the labels
        • Create the dataset
        • Visualising the dataset
      • Preparing the model
      • Run the training
        • Plotting convergence
      • Run the evaluation
      • Conclusion
  • 3. PopXL
    • 3.1. PopXL and popxl.addons
      • Introduction
      • Requirements
      • Basic concepts
      • A simple example
        • Imports
        • Defining a Linear Module
        • Creating a graph from a Module
        • Summary and concepts in practice
        • Multiple bound graphs
      • Nested Modules and Outlining
        • DotTree example
      • MNIST
        • Load dataset
        • Defining the Training step
        • Validation
      • Conclusion
    • 3.2. PopXL Custom Optimiser
      • Introduction
      • Requirements
      • Imports
      • Defining the Adam optimiser
        • Managing in-place ops
        • Using the var_updates module
        • Using our custom optimiser
      • MNIST with Adam
      • Validation
      • Conclusion
    • 3.3. Data parallelism
    • 3.4. Pipelining
    • 3.5. Remote variables and RTS
    • 3.6. Phased Execution
  • 4. Poplar
    • 4.1. Tutorial 1: programs and variables
      • Setup
      • Graphs, variables and programs
        • Creating the graph
        • Adding variables and mapping them to IPU tiles
        • Adding the control program
        • Compiling the Poplar executable
      • Initialising variables
      • Getting data into and out of the device
      • Data streams
      • (Optional) Using the IPU
      • Summary
    • 4.2. Tutorial 2: using PopLibs
      • Setup
      • Using PopLibs
      • Reshaping and transposing data
    • 4.3. Tutorial 3: writing vertex code
      • Setup
      • Writing vertex code
      • Creating a codelet
      • Creating a compute set
      • Executing the compute set
    • 4.4. Tutorial 4: profiling output
      • Setup
      • Profiling on the IPU
      • Profiling Methods
        • Command line Profile Summary
        • Generating Profile Report Files
        • Using The PopVision Analysis API in C++ or Python
      • Using PopVision Graph Analyser - loading and viewing a report
      • Using PopVision Graph Analyser - General Functionality
        • Capturing IPU Reports - setting POPLAR_ENGINE_OPTIONS
        • Comparing two reports
        • Profiling an Out Of Memory program
      • Using PopVision Graph Analyser - Different tabs in the application
        • Memory Report
        • Program Tree
        • Operations Summary
        • Liveness Report
        • Execution Trace
      • Follow-ups
      • Summary
    • 4.5. Tutorial 5: matrix-vector multiplication
      • Setup
      • The vertex code
      • The host code
      • (Optional) Using the IPU
      • Summary
    • 4.6. Tutorial 6: matrix-vector multiplication optimisation
      • Setup
      • Optimising matrix-vector multiplication
  • 5. TensorFlow 1
    • 5.1. Introduction to TensorFlow 1 on the IPU
      • Before you start
      • Other useful resources
    • 5.2. TensorFlow 1 on the IPU: training a model using half- and mixed-precision
      • Table of Contents
      • Introduction
      • Using FP16 on the IPU in practice
        • Support for FP16 in TensorFlow
        • Review of common numerical issues
        • Inaccuracies in parameter updates
          • Method 1: Using stochastic rounding
          • Method 2: Store and update the parameters in FP32
        • Underflowing gradients and loss scaling
          • Numerical concerns with loss scaling
          • Techniques for specific optimisers
        • Diagnosing numerical issues
        • Avoiding numerical issues
          • Avoiding underflow
          • Avoiding overflow
          • Unstable operations
        • Other considerations
          • Setting the partials type for convolutions and matrix multiplications
          • Data type arguments for specialised IPU ops
      • Code examples
        • Running the examples
        • Stochastic rounding example
        • FP32 parameter updates example
        • IPUEstimator example
        • Optional command line arguments
        • Other examples
    • 5.3. TensorFlow 1 Pipelining
      • Introduction
      • Requirements
      • Key Principles of Model Pipelining
        • Overview
          • 1. Model Parallelism With Sharding
          • 2. Model Parallelism With Pipelining
        • Pipeline Execution Phases
      • Tutorial Walkthrough
        • Tutorial Step 1: The Existing Single IPU Application
        • Tutorial Step 2: Running The Model On Multiple IPUs Using Sharding
        • Tutorial Step 2: Code Changes
        • Tutorial Step 3: Using pipelining for better IPU utilization
          • The TensorFlow 1 Pipelining API
        • Scheduling
          • Sequential Scheduling
          • Interleaved Scheduling
          • Grouped Scheduling
        • Tutorial Step 3: Code Changes
        • Tutorial Step 3: Extension
          • Pipeline Schedule
          • Repeat count
          • Stages
        • Tutorial Step 4: Run-time Configurable Stages
        • Tutorial Step 4: Code Changes
      • Further Considerations
        • Recomputation
        • Variable Offloading
        • Gradient Accumulation Buffer Data Type
        • Data Parallelism
        • IPUPipelineEstimator
  • 6. TensorFlow 2
    • 6.1. Using Infeed and Outfeed Queues in TensorFlow 2
      • Directory Structure
      • Table of Contents
      • Introduction
      • Example
        • Import the necessary APIs
        • Define hyperparameters
        • Prepare the dataset
        • Define the model
        • Define the custom training loop
        • Configure the hardware
        • Create data pipeline and execute the training loop
      • Additional notes
        • License
    • 6.2. Keras tutorial: How to run on IPU
      • Keras MNIST example
      • Running the example on the IPU
        • 1. Import the TensorFlow IPU module
        • 2. Preparing the dataset
        • 3. Add IPU configuration
        • 4. Specify IPU strategy
        • 5. Wrap the model within the IPU strategy scope
        • 6. Results
      • Going faster by setting steps_per_execution
      • Replication
      • Pipelining
    • 6.3. Using TensorBoard in TensorFlow 2 on the IPU
      • Preliminary Setup
      • Introduction to TensorBoard and Data Logging
        • How does TensorBoard work?
      • How do I launch TensorBoard?
        • TensorBoard on a Remote Machine
          • SSH Tunnelling
          • Exposing TensorBoard to the Network
        • Automatically Handling Log Directory Cleansing
      • Logging Data with tf.keras.callbacks.Callback
        • Running Evaluation at the end of an Epoch
        • Supported Data Types in tf.summary
        • Logging Custom Image Data at the end of an Epoch
        • Using tf.keras.callbacks.TensorBoard
      • Model Setup & Data Preparation
      • Model Definition
      • Model Training
      • Exploring TensorBoard
        • Scalars
        • Images
        • Graphs
        • Distributions and Histograms
          • Distributions
          • Histograms
      • Time Series
      • Using TensorBoard Without Keras
      • To Conclude
  • 7. PopVision
    • 7.1. Tutorial: Accessing profiling information
      • Setup
      • Using the Python API
        • Loading a profile
        • Using visitors to explore the data
      • Going further with the PopVision Graph Analyser
    • 7.2. Tutorial: Lightweight profiling
      • Introduction
      • Setup
      • Example 1: Usage of the Block program
        • Nested Blocks
      • Example 2: Implicit Blocks
      • Example 3: I/O
      • Block Flush
      • Conclusion
      • Further reading
    • 7.3. Tutorial: Reading PVTI files with libpva
      • How to run this tutorial
      • Enabling PVTI file generation
      • Using the Python API
        • Loading a PVTI file
        • Accessing processes, threads, and events
        • Analysing epochs
      • Going further
    • 7.4. Tutorial: Instrumenting applications
      • Setup
      • Generating and opening reports
      • Profiling execution of epochs
      • Logging the training and validation losses
      • Generating and profiling instant events
      • Going further
  • 8. Standard Tools
    • 8.1. Using IPUs from Jupyter Notebooks
      • Preparing your environment
      • Starting Jupyter with IPU support
        • Installing Jupyter
        • Starting a Jupyter Server
        • Connect your local machine to the Jupyter Server
        • Open the Jupyter notebook in the browser
      • Troubleshooting
        • Installing additional Python packages from a notebook
        • Encountering ImportErrors
        • Can’t connect to server
        • Login page
    • 8.2. Using VS Code with the Poplar SDK and IPUs
      • Goals
      • Terminology
      • Installing extensions
      • Python development
        • Easily creating an .env file
        • Choosing VS Code’s Python interpreter
        • Using the .env file to access IPUs
        • Debugging code which requires IPUs
      • Debugging C++ libraries and custom ops
        • Difficulty
        • Outline
        • 0. Choose the C++ code to debug
        • 1. Set up launch.json for C++ debugging
        • 2. Set up your Python program to make debugging easy
        • 3. Attach gdbserver to your running process using the PID
        • 4. Connect VS Code to gdbserver
      • Troubleshooting
        • ImportError and ModuleNotFoundError for PopTorch, PopART or TensorFlow
          • Symptoms
          • Solution
        • launch.json config settings are being ignored
      • Features of the Python extension for VS Code
  • 9. Next steps
  • 10. Trademarks & copyright

10. Trademarks & copyright

Graphcloud®, Graphcore® and Poplar® are registered trademarks of Graphcore Ltd.

Bow™, Bow-2000™, Bow Pod™, Colossus™, In-Processor-Memory™, IPU-Core™, IPU-Exchange™, IPU-Fabric™, IPU-Link™, IPU-M2000™, IPU-Machine™, IPU-POD™, IPU-Tile™, PopART™, PopDist™, PopLibs™, PopRun™, PopVision™, PopTorch™, Streaming Memory™ and Virtual-IPU™ are trademarks of Graphcore Ltd.

All other trademarks are the property of their respective owners.

This software is made available under the terms of the Graphcore End User License Agreement (EULA) and the Graphcore Container License Agreement. Please ensure that you have read and accepted the terms of the corresponding license before using the software. The Graphcore EULA applies unless indicated otherwise.

© Copyright 2022, Graphcore Ltd.
