Poplar and PopLibs User Guide
Version: 3.1.0
1. Introduction
2. Programming with Poplar
2.1. Poplar programming model
2.2. The structure of a Poplar program
2.2.1. Program flow control
Looping
Conditional execution
2.2.2. What happens at run time
2.3. Virtual graphs
2.4. Replicated graphs
2.4.1. Creating a replicated graph
2.5. Data streams and remote buffers
2.5.1. Data streams
Device-side streams
Host-side stream access
Stream buffer size limit
2.5.2. Optimising host data transfers
Prefetch
Multibuffering
2.5.3. Remote memory buffers
Remote buffer restrictions
2.6. IPU-Link and sync configuration
2.6.1. Link topologies
2.6.2. Sync groups
2.7. Device code
2.7.1. Pre-compiling codelets
3. Understanding vertices
3.1. The Vertex class
3.2. Vertex state
3.2.1. Vector and VectorList types
3.2.2. Allowed field types as vertex state
3.2.3. Specifying memory constraints
3.2.4. Stack allocation
3.3. MultiVertex worker threads
3.3.1. Thread safety
3.4. Calling conventions
3.4.1. External codelets
3.4.2. Recursion and function pointers
3.5. Vertex name mangling
4. Supported types
4.1. Scalar types
4.2. Floating point types
4.2.1. Half on the IPU
4.2.2. Half on the CPU
4.3. Vector types
4.4. Structure types
4.5. Bit fields
5. Vertex vector types
5.1. Parameters
5.1.1. Types
5.1.2. Layout
5.1.3. Minimum alignment
5.1.4. Interleaved memory
5.2. Memory layout for vectors
5.2.1. Pointer compression
SCALED_PTR32
SCALED_PTR64
SCALED_PTR128
5.2.2. Vector<T> layout
5.2.3. Vector<Input<Vector<T>>> layout
5.2.4. VectorList layout
DELTANELEMENTS layout
6. Using the Poplar library
7. The PopLibs libraries
7.1. Using PopLibs
8. PopLibs examples
8.1. Dynamic slicing and updating
8.1.1. Dynamic slice
8.1.2. Dynamic update
8.1.3. MultiSlice (embedding lookup)
9. Graphcore Communication Library (GCL)
10. Writing vertices in assembly
10.1. Instruction set overview
10.1.1. Supervisors and workers
10.1.2. Execution pipelines
10.2. Memory architecture
10.2.1. Getting information about the memory
10.2.2. Mk1 Colossus (GC2)
10.2.3. Mk2 Colossus (GC200)
10.2.4. Load and store instructions
10.3. Worker stack and scratch space
10.3.1. Specifying stack size
10.3.2. Examples
10.4. Vertex pipelines
10.4.1. Memory conflicts
10.4.2. Modified pipeline
10.4.3. Fill and drain
10.5. Assembly hints & tips
10.5.1. Using the assembler
Assembler macros
Labels
Recording the code size of the vertex
Place each vertex in a unique section
10.5.2. Architectural tips
Aligning repeat bodies
Over-reading and over-processing
Scratch space
Loading constants
10.5.3. Division by 6
Division on the IPU
Division on the host
10.5.4. General
Focus on optimising the vectorised case
Testing
Bit twiddling
11. Profiling
11.1. Profiling options
11.2. Profile summary
11.2.1. Printing from a Poplar program
11.2.2. Command line conversion
11.3. Storage categories
11.4. Variable liveness
11.5. Summary report format
12. Environment variables
12.1. Logging
12.1.1. Logging level
12.1.2. Logging destination
12.2. Profiling output
12.3. Setting options
13. Application binary interface (ABI)
13.1. Types
13.1.1. Floating point types
13.1.2. Structure types
13.1.3. Bit fields
13.2. Vertex calling convention
13.3. Function calling convention
13.3.1. Function parameters
13.3.2. Return values
13.3.3. Entry and exit
13.3.4. Register assignments
13.4. Stack frame
14. Trademarks & copyright
Index
C | G | H | I | M | O | R | S | V
C
constant
controlCode
controlId
controlTable
copyDescriptor
G
globalExchangeCode
globalExchangePacketHeader
globalMessage
H
hostExchangeCode
hostExchangePacketHeader
hostMessage
I
instrumentationResults
internalExchangeCode
M
message
multiple
O
outputEdge
R
rearrangement
S
sharedCodeStorage
sharedDataStorage
stack
V
variable
vectorListDescriptor
vertexCode
vertexFieldData
vertexInstanceState