Poplar and PopLibs User Guide
Version: 2.6.0
1. Introduction
2. Programming with Poplar
2.1. Poplar programming model
2.2. The structure of a Poplar program
2.2.1. Program flow control
Looping
Conditional execution
2.2.2. What happens at run time
2.3. Virtual graphs
2.4. Replicated graphs
2.4.1. Creating a replicated graph
2.5. Data streams and remote buffers
2.5.1. Data streams
Device-side streams
Host-side stream access
Stream buffer size limit
2.5.2. Optimising host data transfers
Prefetch
Multibuffering
2.5.3. Remote memory buffers
Remote buffer restrictions
2.6. IPU-Link and sync configuration
2.6.1. Link topologies
2.6.2. Sync groups
2.7. Device code
2.7.1. Pre-compiling codelets
3. Understanding vertices
3.1. The Vertex class
3.2. Vertex state
3.2.1. Vector and VectorList types
3.2.2. Allowed field types as vertex state
3.2.3. Specifying memory constraints
3.2.4. Stack allocation
3.3. MultiVertex worker threads
3.3.1. Thread safety
3.4. Calling conventions
3.4.1. External codelets
3.4.2. Recursion and function pointers
3.5. Vertex name mangling
4. Supported types
4.1. Scalar types
4.2. Floating point types
4.2.1. Half on the IPU
4.2.2. Half on the CPU
4.3. Vector types
4.4. Structure types
4.5. Bit fields
5. Vertex vector types
5.1. Parameters
5.1.1. Types
5.1.2. Layout
5.1.3. Minimum alignment
5.1.4. Interleaved memory
5.2. Memory layout for vectors
5.2.1. Pointer compression
SCALED_PTR32
SCALED_PTR64
SCALED_PTR128
5.2.2. Vector<T> layout
5.2.3. Vector<Input<Vector<T>>> layout
5.2.4. VectorList layout
DELTANELEMENTS layout
DELTAN layout
COMPACT_DELTAN layout
6. Using the Poplar library
7. The PopLibs libraries
7.1. Using PopLibs
8. Graphcore Communication Library (GCL)
8.1. Example
8.2. Topologies
8.2.1. Physical topologies
8.2.2. Logical topologies
8.2.3. Relationship between logical and physical topologies
9. Writing vertices in assembly
9.1. Instruction set overview
9.1.1. Supervisors and workers
9.1.2. Execution pipelines
9.2. Memory architecture
9.2.1. Getting information about the memory
9.2.2. Mk1 Colossus (GC2)
9.2.3. Mk2 Colossus (GC200)
9.2.4. Load and store instructions
9.3. Worker stack and scratch space
9.3.1. Specifying stack size
9.3.2. Examples
9.4. Vertex pipelines
9.4.1. Memory conflicts
9.4.2. Modified pipeline
9.4.3. Fill and drain
9.5. Assembly hints & tips
9.5.1. Using the assembler
Assembler macros
Labels
Recording the code size of the vertex
Place each vertex in a unique section
9.5.2. Architectural tips
Aligning repeat bodies
Over-reading and over-processing
Scratch space
Loading constants
9.5.3. Division by 6
Division on the IPU
Division on the host
9.5.4. General
Focus on optimising the vectorised case
Testing
Bit twiddling
10. Profiling
10.1. Profiling options
10.2. Profile summary
10.2.1. Printing from a Poplar program
10.2.2. Command line conversion
10.3. Storage categories
10.4. Variable liveness
10.5. Summary report format
11. Environment variables
11.1. Logging
11.1.1. Logging level
11.1.2. Logging destination
11.2. Profiling output
11.3. Setting options
12. Application binary interface (ABI)
12.1. Types
12.1.1. Floating point types
12.1.2. Structure types
12.1.3. Bit fields
12.2. Vertex calling convention
12.3. Function calling convention
12.3.1. Function parameters
12.3.2. Return values
12.3.3. Entry and exit
12.3.4. Register assignments
12.4. Stack frame
13. Legal notices