Poplar and PopLibs User Guide
Version: latest
  • 1. Introduction
  • 2. Programming with Poplar
    • 2.1. Poplar programming model
    • 2.2. The structure of a Poplar program
      • 2.2.1. Program flow control
        • Looping
        • Conditional execution
      • 2.2.2. What happens at run time
    • 2.3. Virtual graphs
    • 2.4. Replicated graphs
      • 2.4.1. Creating a replicated graph
    • 2.5. Data streams and remote buffers
      • 2.5.1. Data streams
        • Device-side streams
        • Host-side stream access
        • Stream buffer size limit
      • 2.5.2. Optimising host data transfers
        • Prefetch
        • Multibuffering
      • 2.5.3. Remote memory buffers
        • Remote buffer restrictions
    • 2.6. IPU-Link and sync configuration
      • 2.6.1. Link topologies
      • 2.6.2. Sync groups
    • 2.7. Device code
      • 2.7.1. Pre-compiling codelets
  • 3. Understanding vertices
    • 3.1. The Vertex class
    • 3.2. Vertex state
      • 3.2.1. Vector and VectorList types
      • 3.2.2. Allowed field types as vertex state
      • 3.2.3. Specifying memory constraints
      • 3.2.4. Stack allocation
    • 3.3. MultiVertex worker threads
      • 3.3.1. Thread safety
    • 3.4. Calling conventions
      • 3.4.1. External codelets
      • 3.4.2. Recursion and function pointers
    • 3.5. Vertex name mangling
  • 4. Supported types
    • 4.1. Scalar types
    • 4.2. Floating point types
      • 4.2.1. Half on the IPU
      • 4.2.2. Quarter on the IPU
      • 4.2.3. Half and quarter on the CPU
    • 4.3. Vector types
    • 4.4. Structure types
    • 4.5. Bit fields
  • 5. Vertex vector types
    • 5.1. Parameters
      • 5.1.1. Types
      • 5.1.2. Layout
      • 5.1.3. Minimum alignment
      • 5.1.4. Interleaved memory
    • 5.2. Memory layout for vectors
      • 5.2.1. Pointer compression
        • SCALED_PTR128
      • 5.2.2. Vector<T> layout
      • 5.2.3. Vector<Input<Vector<T>>> layout
      • 5.2.4. VectorList layout
        • DELTANELEMENTS layout
  • 6. Using the Poplar library
  • 7. The PopLibs libraries
    • 7.1. Using PopLibs
  • 8. PopLibs examples
    • 8.1. Dynamic slicing and updating
      • 8.1.1. Dynamic slice
      • 8.1.2. Dynamic update
      • 8.1.3. MultiSlice (embedding lookup)
  • 9. Graphcore Communication Library (GCL)
  • 10. Writing vertices in assembly
    • 10.1. Instruction set overview
      • 10.1.1. Supervisors and workers
      • 10.1.2. Execution pipelines
    • 10.2. Memory architecture
      • 10.2.1. Getting information about the memory
      • 10.2.2. Mk2 Colossus (GC200)
      • 10.2.3. Load and store instructions
    • 10.3. Worker stack and scratch space
      • 10.3.1. Specifying stack size
      • 10.3.2. Examples
    • 10.4. Vertex pipelines
      • 10.4.1. Memory conflicts
      • 10.4.2. Modified pipeline
      • 10.4.3. Fill and drain
    • 10.5. Assembly hints & tips
      • 10.5.1. Using the assembler
        • Assembler macros
        • Labels
        • Recording the code size of the vertex
        • Place each vertex in a unique section
      • 10.5.2. Architectural tips
        • Aligning repeat bodies
        • Over-reading and over-processing
        • Scratch space
        • Loading constants
      • 10.5.3. Division by 6
        • Division on the IPU
        • Division on the host
      • 10.5.4. General
        • Focus on optimising the vectorised case
        • Testing
        • Bit twiddling
  • 11. Writing efficient C++
    • 11.1. Inspecting the generated code
    • 11.2. Optimisation levels
    • 11.3. IPU hardware loops
      • 11.3.1. Prioritisation between hardware loops
      • 11.3.2. Hardware loop constraints
      • 11.3.3. rpt
      • 11.3.4. brnzdec
    • 11.4. Guiding the C/C++ compiler for better loop code generation
      • 11.4.1. Hardware loop generation
      • 11.4.2. __builtin_assume
      • 11.4.3. rptsize_t
      • 11.4.4. Pragma vectorize
    • 11.5. Restrict
    • 11.6. Alignment
    • 11.7. Vector math functions
    • 11.8. Memory intrinsics
    • 11.9. Builtins
    • 11.10. Inline assembly
    • 11.11. Intrinsics
  • 12. Profiling
    • 12.1. Profiling options
    • 12.2. Profile summary
      • 12.2.1. Printing from a Poplar program
      • 12.2.2. Command line conversion
    • 12.3. Storage categories
    • 12.4. Variable liveness
    • 12.5. Summary report format
  • 13. Environment variables
    • 13.1. Logging
      • 13.1.1. Logging level
      • 13.1.2. Logging destination
    • 13.2. Profiling output
    • 13.3. Setting options
  • 14. Application binary interface (ABI)
    • 14.1. Types
      • 14.1.1. Floating point types
      • 14.1.2. Structure types
      • 14.1.3. Bit fields
    • 14.2. Vertex calling convention
    • 14.3. Function calling convention
      • 14.3.1. Function parameters
      • 14.3.2. Return values
      • 14.3.3. Entry and exit
      • 14.3.4. Register assignments
    • 14.4. Stack frame
  • 15. Trademarks & copyright
Revision 9b1e5149.