1. Overview

The IPU‑POD64 reference design is a rack solution containing 16 IPU-M2000s, one to four host servers (the reference configuration uses one host server by default), network switches, and IPU-POD software. The rack contains 64 GC200 IPUs in total, four in each IPU-M2000. For more information on the IPU-POD systems available from Graphcore, see https://www.graphcore.ai/products.
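As a quick check of the figures above, the rack composition can be written as a small Python sketch. The class, constant, and method names are hypothetical and are not part of any Graphcore tool. Only the numbers come from this overview: 16 IPU-M2000s, four IPUs in each, and one to four host servers with one as the default.

```python
from dataclasses import dataclass

# Figures taken from the overview; all names below are hypothetical.
IPU_M2000_COUNT = 16
IPUS_PER_IPU_M2000 = 4
MIN_HOSTS, MAX_HOSTS, DEFAULT_HOSTS = 1, 4, 1


@dataclass
class Pod64Config:
    """Hypothetical description of an IPU-POD64 reference rack."""
    ipu_m2000s: int = IPU_M2000_COUNT
    hosts: int = DEFAULT_HOSTS

    def total_ipus(self) -> int:
        # Each IPU-M2000 contains four GC200 IPUs.
        return self.ipu_m2000s * IPUS_PER_IPU_M2000

    def validate(self) -> None:
        # The reference design supports one to four host servers.
        if not MIN_HOSTS <= self.hosts <= MAX_HOSTS:
            raise ValueError(f"hosts must be {MIN_HOSTS}-{MAX_HOSTS}, got {self.hosts}")
        # An IPU-POD64 rack always holds 16 IPU-M2000s.
        if self.ipu_m2000s != IPU_M2000_COUNT:
            raise ValueError(
                f"IPU-POD64 uses {IPU_M2000_COUNT} IPU-M2000s, got {self.ipu_m2000s}"
            )


config = Pod64Config()
config.validate()
print(config.total_ipus())  # 64
```

The host count is validated rather than fixed because the reference design allows up to four host servers while the IPU-M2000 count stays at 16.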

Warning

This guide is for properly trained service personnel and technicians who are required to install the IPU‑POD64.

If you have any questions, please contact your Graphcore representative or use the resources on the Graphcore support portal: https://www.graphcore.ai/support.

1.1. Acronyms and abbreviations

The following table describes some of the most commonly used terms in this document.

Table 1.1 Glossary

Term

Description

AOC

Active optical cable

BMC

Baseboard Management Controller. A service processor in the standby power domain that performs system hardware management.

BOM

Bill of Materials

GCD

Graph compile domain. A domain operated by a single Poplar instance within the system, covering either a single IPU-M2000 or several IPU-M2000s connected by IPU-Link cables.

GW-Link

High-speed (100 GbE) communication links that connect IPU-M2000s horizontally across IPU‑POD64 racks. Special cables are required for GW-Links.

IPU-Gateway

A device that disaggregates the server(s) and the four IPUs in the IPU-M2000 across a RoCE network, provides external IPU memory, and enables IPU scale-out across 100 GbE connections (GW-Links) for rack-to-rack connectivity.

IPU-Link

High-speed communication links that connect IPUs both within and between IPU-M2000s in a Pod. Special cables are required for IPU-Links.

PDU

Power Distribution Unit

RDMA

Remote direct memory access

RNIC

RDMA Network Interface Controller

RoCE

RDMA over Converged Ethernet

ToR

Top of rack. The term is also commonly used for the ToR RDMA switch, which is placed on top of the IPU-M2000s.