A Dictionary of Graphcore Terminology
- Batch serialisation
A batch or micro batch of samples is normally processed in parallel. With batch serialisation, the (micro) batch is divided into sub-batches (based on a batch serialisation factor) and only a sub-batch of samples is processed in parallel. A sequence of these sub-batches is processed serially.
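As a sketch, the serial loop over sub-batches can be written in plain Python with NumPy (illustrative only; `process_sub_batch` and the other names here are stand-ins, not Poplar API):

```python
import numpy as np

def process_sub_batch(sub_batch):
    # Stand-in for the parallel computation applied to one sub-batch of samples.
    return sub_batch * 2.0

def batch_serialised(micro_batch, serialisation_factor):
    """Split a micro batch into sub-batches and process them serially."""
    sub_batches = np.array_split(micro_batch, serialisation_factor)
    results = [process_sub_batch(sb) for sb in sub_batches]  # serial loop
    return np.concatenate(results)

micro_batch = np.arange(8, dtype=np.float32)   # micro batch of 8 samples
out = batch_serialised(micro_batch, serialisation_factor=4)  # 4 sub-batches of 2
```

Only one sub-batch's worth of activations is live at a time, which is the memory saving the technique trades for serial execution.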
- Batch size
- Bow IPU
Next generation (Mk2) Colossus IPU using a 3D wafer-on-wafer design to improve performance with increased power delivery and clock speed. The Bow IPU has 1,472 tiles, each with 624 KB of In-Processor-Memory (900 MB total In-Processor-Memory) and FP16.16 AI compute of 350 teraFLOPS.
- Bow Pod
A collection of interconnected Bow-2000 IPU-Machines. A Bow Pod16 runs in direct attach mode: it has no switches, and the management software runs on one of the IPU-Machines. Larger Bow Pod systems (Bow Pod64 onwards) are switched systems with one or more servers and networking switches. A Bow Pod allows all the IPUs in the Bow-2000 IPU-Machines to communicate and synchronize using IPU-to-IPU connections. The IPUs can be partitioned into “virtual Pods” using the V-IPU software.
- IPU-Machine: Bow-2000
A 1U IPU-Machine containing four Bow IPUs providing 1.39 petaFLOPS of compute, up to 260 GB memory, 2.8 Tbps low-latency IPU-Fabric interconnect, and an IPU-Gateway that supports host disaggregation. Up to 4 Bow-2000s can work as a direct attached system, or larger numbers of Bow-2000s can be built into a switched rack system as a Bow Pod.
- Bulk-synchronous parallel
A programming methodology for parallel algorithms which is used on the IPU. Execution for the IPU consists of supersteps, each made up of three phases: synchronization, communication and local compute.
Leslie G. Valiant. 1990. A bridging model for parallel computation. Commun. ACM 33, 8 (August 1990), 103-111. DOI=10.1145/79173.79181.
For a more general introduction see Bulk synchronous parallel on Wikipedia.
- C2 card
Graphcore’s dual IPU PCIe card with two GC2 Colossus IPUs. Provides performance of 250 teraFLOPS of mixed precision compute with 192 GB/s IPU-Link bandwidth between IPUs, 128 GB/s card to card IPU-Links. Maximum power consumption is 300 W.
- C600
An IPU-Processor PCIe card targeted at machine-learning inference applications. Has a single Mk2 IPU with FP8 support.
- Cluster
A logical grouping of IPUs.
- Codelet
A piece of code that defines the inputs, outputs and internal state of a vertex. Contains a compute() function that defines the behaviour of the vertex.
- Colossus
The current architecture of the Graphcore IPU. It consists of an array of thousands of IPU Tiles with In-Processor-Memory and IPU-Links for IPU-to-IPU communication. It is designed for parallel processing using the BSP model.
- Compute batch size
- Compute set
- Direct attach
One or more IPU-Machines can be used in “direct attach” mode where the IPU-Machines are directly controlled from the user’s computer. The IPU-specific part of the V-IPU software runs on an IPU-Machine, rather than on a separate server.
- Dynamic graph
- Dynamic shape
The input to a graph can have variable length or shape. For models such as BERT, the maximum sequence size is known but the actual input is dynamic within that range.
- Eager execution
Dynamic graph execution (or dispatch) where each operation is individually compiled, dispatched and executed when required.
- Edge
The edges of a computation graph define the connections between elements of tensors and the vertices of the graph.
- Exchange
Communication phase of a superstep, where data is communicated between tiles and between IPUs. An exchange can be an internal exchange (within a single IPU) or an external exchange.
- Exchange fabric
- External exchange
An exchange phase where data is communicated between tiles and memory outside the IPU. This can be:
Inter-IPU exchange (between IPUs; also known as global exchange)
Host exchange (between IPUs and the host)
Streaming Memory exchange (between IPUs and Streaming Memory)
See also IPU-Fabric.
- GC200 IPU
Second generation (Mk2) Colossus IPU with 1,472 tiles, each with 624 KB of In-Processor-Memory (900 MB total In-Processor-Memory) and micro-architectural improvements to increase performance and reduce power consumption. FP16.16 AI compute of 250 teraFLOPS.
- Global batch size
The number of samples that contribute to a weight update across all replicas. This is equal to the replica batch size multiplied by the number of replicas.
- Global exchange
See Inter-IPU exchange.
- Gradient accumulation
Gradient accumulation is a technique for increasing the batch size used for a weight update step. The gradients from processing multiple micro batches are accumulated and used in a single weight update step. Any normalisation will be done within each micro batch. This means that batch normalisation will not give a mathematically equivalent result when using gradient accumulation compared to just using a larger batch size, but for a large enough micro batch size the statistics will give a sufficiently good approximation.
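The technique can be sketched in plain NumPy (not Poplar or any framework API): a linear model is trained, with the gradients from several micro batches summed and applied in a single weight update.

```python
import numpy as np

def grad_mse(w, x, y):
    # d/dw mean((w*x - y)^2) = mean(2*(w*x - y)*x)
    return np.mean(2.0 * (w * x - y) * x)

rng = np.random.default_rng(0)
x = rng.normal(size=16).astype(np.float32)
y = 3.0 * x                        # target weight is 3.0
w, lr, accum_steps = 0.0, 0.1, 4

for _ in range(100):               # training iterations
    accum = 0.0
    for micro in np.array_split(np.arange(16), accum_steps):
        accum += grad_mse(w, x[micro], y[micro])   # one micro batch
    w -= lr * accum / accum_steps  # single update per accumulated batch
```

Because the micro batches are equal-sized, the averaged accumulated gradient here equals the full-batch gradient exactly; with batch normalisation in the model that equivalence would not hold, as noted above.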
- Graph Analyser
- Graph compile domain
A subset of IPUs within a GSD which are controlled by a single Poplar instance. “GCD size” is the number of IPUs in the GCD. When a program is executed, the Poplar instance binaries may be replicated and loaded on to multiple GCDs (with one Poplar instance per GCD). All the GCDs together form the GSD, with GCL managing all the communication and synchronization between the separate IPUs and GCDs.
One or more GCDs form a GSD, which is equivalent to the whole partition.
- Graph compiler
See Poplar graph compiler.
- Graph engine
See Poplar graph engine.
- Graph scaleout domain
The set of IPUs used to execute a program, consisting of one or more GCDs. All the GCDs together form the GSD, with GCL managing all the communication and synchronization between the separate IPUs and GCDs.
The “GSD size” is the number of IPUs in the GSD.
GSD is equivalent to the whole partition and GSD size is the partition size.
See also vPOD.
- Graph streaming
A set of techniques that allows IPUs to make efficient use of In-Processor-Memory and Streaming Memory. This includes the intelligent placement of variables and weights, and the use of Streaming Memory for scatter/gather and reductions.
- Graphcore Communication Library
A software library for managing communication and synchronization between IPUs, supporting ML at scale. GCL-based IPU communication and synchronization can be established across any IPU-Fabric, supporting Pod topologies such as mesh configuration and torus configuration.
- GW-Link
A networking interface implemented by an IPU-Gateway that can provide IPU-to-IPU connectivity through another IPU-Gateway, either via a directly connected link or via a switching infrastructure.
- GW-Link cluster
- Half
A 16-bit floating-point value.
- Head node
See Poplar host server.
- Host exchange
Communication between an IPU and the server running the host-side part of the Poplar program.
- Host memory
Memory on the host server that can be accessed by the IPU via the IPU-Fabric.
- Host-Link
The communication path between the host computer and the IPUs. This may be a direct connection, such as PCIe, or a high-speed network, such as 100 Gigabit Ethernet (100 GbE).
- IPU control unit
A microcontroller that performs system management functions for the IPU.
- In-Processor-Memory
The tile memory. This memory can be directly accessed by worker threads during the compute phase of a program.
- Intelligence Processing Unit
An Intelligence Processing Unit (IPU) is a massively parallel processor pioneered by Graphcore for machine learning (ML) and artificial intelligence applications.
Graphcore’s current implementation of the IPU is Colossus.
- Inter-IPU exchange
Communication between tiles on different IPUs.
- IPU-Core
The tile’s processing unit.
- Internal exchange
Communication on the Exchange fabric internal to the IPU.
- IPU-Gateway
The IPU-Gateway manages communication on and off the IPU-Machine board via the IPU-Links that connect IPU-Machines. It also manages transfers between the IPUs and local Streaming Memory on the IPU-Machine.
- IPU-Link
Communication links between IPUs.
- IPU-Link Domain
An IPU-Link Domain (ILD) is a set of IPUs that are connected with IPU-Links. The IPUs have to be within a single Pod. The maximum size of a single ILD is 64 IPUs. Multiple ILDs are used to form multi-ILD clusters. The term “multi-ILD partition” is also used to mean a partition that spans multiple ILDs and uses GW-Links.
There must be at least one GCD per ILD.
- IPU-Machine
A rack mountable compute platform with a number of interconnected IPUs, management logic, In-Processor-Memory, Streaming Memory, and external networking and IPU-Link interfaces. General term for IPU-M2000 and Bow-2000 blades.
- IPU-Machine: M2000
A 1U IPU-Machine containing four Colossus GC200 IPUs providing 1 petaFLOPS of compute, up to 260 GB memory, 2.8 Tbps low-latency IPU-Fabric interconnect, and an IPU-Gateway that supports host disaggregation. One or more IPU-Machines can be built into a Pod system. This can be a direct attached or switched system.
- IPU-POD
A collection of interconnected IPU-M2000 IPU-Machines. An IPU-POD DA (Direct Attach) system has no switches and runs the management software on one of the IPU-Machines. A switched IPU-POD has one or more servers and networking switches. An IPU-POD allows all the IPUs in the IPU-M2000 IPU-Machines to communicate and synchronize using IPU-to-IPU connections. The IPUs can be partitioned into “virtual Pods” using the V-IPU software.
- IPU over Fabric
Software that allows a Poplar server to control and feed data to a program executing on one or more IPUs using remote DMA (RDMA). The IPUoF software has components on both the server and the IPU-Machine.
- Logical rack
A logical rack is a space in one or more physical racks occupied by a single Pod. Since the standard racking of a single Pod may not be possible within one physical rack, we use the term “logical rack” to refer to the set of components making up the single Pod, regardless of where they may be physically installed. A logical rack is also referred to as an IPU-Link Domain (ILD).
- Management server
A physical server, virtual machine or container implementing the higher, component-independent layers of the Pod management services.
- Mesh configuration
IPUs can be connected in a 2D array with their IPU-Links. This is normally a 2 x N array, rather like a ladder with a pair of IPUs on either side of each “rung”.
See also Torus configuration.
- Micro batch size
The number of samples for which activations are calculated in one full forward pass of the algorithm in a single replica, and for which gradients are calculated in one full backward pass of the algorithm (when training) in a single replica. If gradient accumulation is used then there will not be a weight update after every backward pass.
- Partials
Partials are the intermediate values in a computation. For example, four numbers (or tensors) a, b, c and d might be added like this:
tmp1 = a + b
tmp2 = c + d
total = tmp1 + tmp2
In this case, tmp1 and tmp2 are the partials.
- Partition
An isolated group of one or more IPUs within a single vPOD that can be controlled by one or more Poplar hosts within the vPOD, and that can be used for single or multiple ML workloads.
The creation and management of partitions is done by using the V-IPU software. Partitions can be reconfigurable or non-reconfigurable.
- Ping pong
An execution model where two groups, each consisting of one or more IPUs, alternate between processing and host exchange. This allows larger models, which would not otherwise fit in IPU memory, to be handled.
- Pipeline depth
The normal meaning is the number of stages in a processing pipeline. It is also used, in some of our APIs, to refer to the number of micro batches passed through the pipeline before a weight update.
Pipelining is a way of parallelising execution by splitting a model across multiple IPUs. Each stage of processing is mapped to a different IPU, each of which handles a different micro batch of samples.
See also micro batch size.
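The pipelined schedule can be sketched in plain Python (illustrative only, not Poplar API): each stage stands in for one IPU, and at each time step every stage works on a different micro batch, so the stages run in parallel once the pipeline has filled.

```python
# Three stages (one per IPU) and four micro batches.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
micro_batches = [0, 1, 2, 3]

# Build the schedule: which micro batch occupies each stage at each time step.
schedule = []                        # (time_step, stage, micro_batch) trace
for t in range(len(micro_batches) + len(stages) - 1):
    for s in range(len(stages)):
        m = t - s                    # micro batch in stage s at time t
        if 0 <= m < len(micro_batches):
            schedule.append((t, s, m))

# The values each micro batch produces after passing through all stages.
results = []
for x in micro_batches:
    for f in stages:
        x = f(x)
    results.append(x)
```

At time step 2 the trace shows all three stages busy at once, each on a different micro batch, which is the fill-then-parallel behaviour pipelining provides.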
- Pod management services
- PopART
The Poplar advanced run-time (PopART) provides support for importing, creating and running ONNX graphs on the IPU.
- Poplar
The Graphcore software tools and libraries for graph programming on IPUs. Enables the programmer to write a single program that defines the graph to be executed on the IPU devices and the controlling code that runs on the host. The device code is compiled and loaded onto the IPUs ready for execution.
- Poplar Distributed Configuration Library
PopDist provides an API to make applications ready for distributed execution. Command line parameters passed to PopRun are exposed and can be used to distribute the input/output data or other parts of the applications. PopDist is bundled with the Poplar SDK.
- Poplar graph compiler
The component of Poplar that compiles graph programs for the IPU. This is run implicitly when a Poplar program is executed.
- Poplar graph engine
The run-time component of Poplar that provides support for running graph programs on the IPU.
- Poplar instance
Poplar translates a framework program into code that runs on IPUs. Each process using a single version of the Poplar SDK is a Poplar instance. A program that runs on four GCDs, for example, requires four Poplar instances, one per GCD.
- Poplar SDK
The package of software development tools for the Graphcore IPU. It includes Poplar, PopLibs, PopART, PopTorch, PopDist and PopRun, among other components.
- Poplar server
- PopLibs
A set of libraries in Poplar for the IPU that provide common operations required in machine learning frameworks and applications.
- PopRun
A command line utility to launch distributed applications on Pods. PopRun creates multiple instances, each of which can run on a single host server or multiple host servers. This includes remote host servers with larger Pod configurations such as an IPU‑POD256, where the remote host servers are physically located in an interconnected Pod. PopRun is required for any Pod system larger than an IPU‑POD64 or Bow Pod64 and is bundled with the Poplar SDK.
See also Poplar instance.
- PopTorch
Provides a simple wrapper around PyTorch programs to enable them to be run on the IPU.
- PopVision
A suite of graphical analysis and debugging tools. For more information see the PopVision Tools web page.
- Recomputation
In a multi-layer network, activations are computed from layer to layer and are typically saved as intermediate results. Recomputation optimises the use of IPU memory by recomputing some values required on the backward pass. This can massively reduce the amount of In-Processor-Memory used, at the cost of some extra computation.
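A minimal sketch of the idea in plain Python (not Poplar; `layer`, `forward` and `recompute` are illustrative names): only every k-th activation is checkpointed, and the rest are recomputed from the nearest checkpoint when needed.

```python
def layer(x):
    # Stand-in for one layer's forward computation.
    return x + 1

def forward(x, n_layers, k):
    """Run n_layers, keeping only checkpoints (every k-th activation)."""
    checkpoints = {0: x}
    for i in range(n_layers):
        x = layer(x)
        if (i + 1) % k == 0:
            checkpoints[i + 1] = x
    return x, checkpoints

def recompute(checkpoints, i, k):
    """Recover the activation after layer i from the nearest checkpoint."""
    base = (i // k) * k
    x = checkpoints[base]
    for _ in range(i - base):   # extra compute traded for memory
        x = layer(x)
    return x

out, ckpts = forward(0, n_layers=8, k=4)   # stores 3 values instead of 9
act5 = recompute(ckpts, 5, k=4)            # recomputed, not stored
```

Here memory drops from one activation per layer to one per k layers, at the cost of up to k-1 extra forward steps per recovered activation.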
- Remote buffer
- Replica batch size
The number of samples that contribute to a weight update from a single replica. The replica batch size equals the micro batch size multiplied by the number of gradient accumulation iterations.
See also Replicated graph.
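The relationships between the batch-size terms in this glossary can be checked with a few lines of arithmetic (the numbers are illustrative, not from any particular system):

```python
micro_batch_size = 8                 # samples per forward/backward pass
gradient_accumulation_count = 4      # micro batches per weight update
num_replicas = 16                    # data-parallel replicas

# Replica batch size: samples contributing to a weight update per replica.
replica_batch_size = micro_batch_size * gradient_accumulation_count

# Global batch size: samples contributing to a weight update overall.
global_batch_size = replica_batch_size * num_replicas
```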
- Replicated graph
A replicated graph creates a number of identical copies, or replicas, of the same graph. Each replica targets a different subset of the available tiles (all subsets are the same size). Any change made to the replicated graph, such as adding variables or vertices, will affect all the replicas.
See also Virtual graph.
- Replicated tensor sharding
Storing tensors across replicas by slicing them into equal per-replica “shards” to reduce the memory required.
If a tensor in a replicated graph with replication factor R has the same value on each replica (which is not necessarily the case), you can save memory by storing just a fraction (1/R) of the tensor on each replica. When the full tensor is required all R shards can be broadcast to all the replicas.
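The memory arithmetic can be sketched with NumPy (illustrative only; this is not the GCL API): a tensor that is identical on every replica is stored as 1/R per-replica shards and gathered back (an all-gather) when the full tensor is needed.

```python
import numpy as np

R = 4                                   # replication factor
full = np.arange(16, dtype=np.float32)  # identical on every replica

shards = np.array_split(full, R)        # replica r stores only shards[r]
per_replica_bytes = shards[0].nbytes    # 1/R of the full tensor's memory

gathered = np.concatenate(shards)       # all-gather reconstructs the tensor
```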
- Session
Within frameworks such as TensorFlow and PopART, the software interface to code running on an engine. It defines the runtime state consisting of the compiled graph and the values of any variables used by the graph program.
- Sharding
The process of dividing a model up by placing parts of the model on separate IPUs. This is a method of distributing a model that is too large to fit on an IPU. Since there are usually data dependencies between the parts of the model, execution will not be very efficient unless pipelining is used.
See Direct attach.
- Streaming Memory
Memory external to the IPUs used by ML applications for data storage. This could be host memory reserved for use by the IPUs or dedicated IPU memory.
See also Remote buffer.
- Superstep
One iteration of the bulk-synchronous parallel execution model, consisting of synchronisation, communication (exchange) and compute phases. Sometimes just referred to as a “step”.
- Supervisor code
The code responsible for initiating worker threads and for performing the exchange and synchronisation phases of a step. Supervisor code cannot perform floating point operations.
- Sync
A system-wide synchronisation; the first phase in a superstep, following which it is safe to perform an exchange phase. Synchronisation can be internal (between all of the tiles on a single IPU) or external (between all tiles on every IPU). External sync is done via dedicated Sync-Link connections.
- Sync-Link
A connection provided between IPU-Machines to allow synchronisation between all the IPUs.
- System Analyser
- Tensor
A tensor is a variable that contains a multi-dimensional array of values. In the IPU, the storage of a tensor can be distributed across the tiles. The data is then operated on, in parallel, by the vertex code running on the tiles.
- Tile
An individual processor core in the IPU consisting of a processing unit and memory. All tiles are connected to the exchange fabric.
- Torus configuration
A mesh connection where the IPU-Links at each end are connected back to form a closed loop.
See also Mesh configuration.
- Vertex
A unit of computation in the graph; consists of code that runs on a tile. Vertices have inputs and outputs that are connected to tensors, and are associated with a codelet that defines the processing performed on the tensor data. Each vertex is stored and executed on a single tile.
- Virtual graph
A graph is normally created for a physical target with a specific number of tiles. It is possible to create a new graph from that, which is a virtual graph for a subset of the tiles. This is effectively a new view onto the parent graph for a virtual target, which has a subset of the real target’s tiles and can be treated like a new graph.
See also Replicated graph.
- Virtual-IPU
The IPU-specific part of the V-IPU software can run on an IPU-Machine, when used in direct attach mode in a Pod DA system, or it can run on a server in a switched Pod.
- vPOD
By default, the complete Pod is also a single vPOD, hence any operations that relate to a vPOD can, in general, also apply to a complete Pod.
The V-IPU software is used to create and manage vPODs.
- Worker
Code that can perform floating point operations and is typically responsible for performing the compute phase of a step. A tile has hardware support for multiple worker contexts.
Note: this should not be confused with the TensorFlow definition of worker (processes that can make use of multiple hardware resources).