Overview

This guide is for properly trained service personnel and technicians who are required to install IPU-POD Direct Attach (DA) systems.

This guide covers IPU‑POD4 and IPU‑POD16 DA systems.

Warning

Only qualified personnel should install, service, or replace the equipment described in this document.

Acronyms and abbreviations

The table below defines the terms most commonly used in this document.

Table 1 Glossary

BMC

Baseboard Management Controller – standby power domain service processor providing system hardware management

BOM

Bill of Materials

DA

Direct Attach. In a DA system the IPU-M2000s are connected directly to the server without the use of a switch. The V-IPU configuration software runs on one of the attached IPU-M2000s

EA

Early access

GW

Short for IPU-Gateway, a device that disaggregates the server and the four IPUs in the IPU-M2000 across a RoCE network, provides external IPU Exchange Memory, and enables IPU scaleout across 100GbE (IPU-GW-link) for rack-to-rack connectivity

GCD

Graph compile domain. A GCD is operated by a single Poplar instance within the system, either within a single IPU-M2000 or across several IPU-M2000s connected by IPU-Link cables

GSD

Graph scaleout domain. The set of IPUs used to execute a program, consisting of one or more GCDs. The “GSD size” is the number of IPUs in the GSD

IPU-Link

High speed communication links that interconnect IPUs within and between IPU-M2000s. Special cables are required for IPU-Links between IPU-M2000s

PDU

Power Distribution Unit

RDMA

Remote Direct Memory Access

RNIC

RDMA Network Interface Controller

RoCE

RDMA over Converged Ethernet

System summary

The IPU-M2000 is a 1 rack unit (RU) compute platform delivering 1 petaFLOPS (FP16.16) of AI compute. It contains 4 Colossus GC200 IPUs with 3.6GB In-Processor-Memory™ and is pre-configured with 128GB (2x64GB) Streaming Memory™, 1x 100GbE RoCEv2 NIC for host server connectivity, and 1TB of NVMe M.2 SSD. In addition, the IPU-M2000 has connectors for the IPU-Fabric™ that provide high-speed interfaces (2.8Tbps in total) for connecting to other IPU-M2000s.

_images/m2000.png

Fig. 1 IPU-Machine: M2000

An installed and fully operational IPU-POD DA system will consist of:

  • The IPU-POD Direct Attach AI compute platform

    • Configuration 1: IPU‑POD4 DA (single IPU-M2000 with 4 IPUs and a direct attached server)

    • Configuration 2: IPU‑POD16 DA (four IPU-M2000s with 16 IPUs and a direct attached server)

    • Additional options to be announced

  • Pre-installed and configured Virtual IPU (V-IPU) management software with embedded management through a web UI that offers easy installation and integration with pre-existing infrastructure

  • The Graphcore Poplar SDK software stack to be downloaded and installed on the host server

  • An approved host server, pre-qualified by the preferred channel partner
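As a rough illustration of how the two configurations scale, the sketch below multiplies the per-IPU-M2000 figures quoted in the system summary by the number of IPU-M2000s in each configuration. The `DAConfig` class and its property names are hypothetical, introduced only for this example. It assumes that the 3.6GB In-Processor-Memory and 1 petaFLOPS figures are totals per IPU-M2000, and the results are simple sums of peak figures, not measured performance.

```python
from dataclasses import dataclass

# Per-IPU-M2000 figures quoted in the system summary
# (assumed here to be totals for one IPU-M2000).
IPUS_PER_M2000 = 4
PEAK_PETAFLOPS_FP16_PER_M2000 = 1.0
IN_PROCESSOR_MEMORY_GB_PER_M2000 = 3.6
STREAMING_MEMORY_GB_PER_M2000 = 128


@dataclass(frozen=True)
class DAConfig:
    """Hypothetical description of one IPU-POD DA configuration."""

    name: str
    m2000_count: int  # number of IPU-M2000s attached to the server

    @property
    def ipus(self) -> int:
        return self.m2000_count * IPUS_PER_M2000

    @property
    def peak_petaflops_fp16(self) -> float:
        return self.m2000_count * PEAK_PETAFLOPS_FP16_PER_M2000

    @property
    def in_processor_memory_gb(self) -> float:
        return self.m2000_count * IN_PROCESSOR_MEMORY_GB_PER_M2000

    @property
    def streaming_memory_gb(self) -> int:
        return self.m2000_count * STREAMING_MEMORY_GB_PER_M2000


# The two configurations covered by this guide.
POD4 = DAConfig(name="IPU-POD4 DA", m2000_count=1)
POD16 = DAConfig(name="IPU-POD16 DA", m2000_count=4)

for cfg in (POD4, POD16):
    print(
        f"{cfg.name}: {cfg.ipus} IPUs, "
        f"{cfg.peak_petaflops_fp16:.0f} petaFLOPS (FP16.16) peak, "
        f"{cfg.in_processor_memory_gb:.1f}GB In-Processor-Memory, "
        f"{cfg.streaming_memory_gb}GB Streaming Memory"
    )
```

The IPU counts match the configuration names (4 and 16); the remaining totals follow only from the linear-scaling assumption stated above.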

Please note that pre-integrated and qualified systems and integration support are available from channel partners.

An example IPU‑POD4 DA system is illustrated in the diagram below:

_images/m2000_DA_x1.png

Fig. 2 IPU‑POD4 DA system

All IPU-POD DA configuration options are fully supported by Graphcore’s Poplar® software development environment, providing a complete, scalable platform for accelerated development. Existing ML frameworks such as TensorFlow, ONNX, and PyTorch are fully supported, as are industry-standard converged infrastructure management tools including OpenBMC, Redfish, IPMI, and Docker containers, and orchestration with Slurm and Kubernetes. The PopVision™ visualisation and analysis tools provide monitoring of performance across one or more IPUs; the graphical analysis enables detailed inspection of all processing activities.

See the “IPU-POD Direct Attach Getting Started Guide” and the “Poplar and PopLibs User Guide” on the documentation page (https://docs.graphcore.ai/) for details of Poplar installation and use.

Pictures of a complete IPU‑POD16 DA system are shown below:

_images/m2000_DA_x4_front.png

Fig. 3 Front view (cold aisle)

_images/m2000_DA_x4_back.png

Fig. 4 Rear view (hot aisle)

Note

Cable colours may differ.