1. Overview

This guide is for properly trained service personnel and technicians who are required to install Bow Pod Direct Attach (DA) systems such as the Bow Pod16.

Warning

Only qualified personnel should install, service, or replace the equipment described in this document.

Note

Graphcore Pod systems include both IPU-POD systems (such as the IPU‑POD64 and IPU‑POD256) and Bow Pod systems (such as Bow Pod64 and Bow Pod256). The term IPU-Machine refers to the blades installed in your system: IPU-M2000 in IPU-POD systems and Bow-2000 in Bow Pod systems.

1.1. Acronyms and abbreviations

The following table describes some of the most commonly used terms in this document.

Table 1.1 Glossary

| Term | Description |
| --- | --- |
| AOC | Active optical cable |
| BMC | Baseboard Management Controller. A service processor in the standby power domain that performs system hardware management. |
| BOM | Bill of materials |
| GCD | Graph compile domain. A domain operated by a single Poplar instance within the system, covering either a single IPU-Machine or several IPU-Machines connected by IPU-Link cables. |
| GW-Link | High-speed (100 GbE) communication links that connect IPU-Machines horizontally across Bow Pod64 racks. Special cables are required for GW-Links. |
| IPU-Gateway | A device that disaggregates the server(s) and the four IPUs in the IPU-Machine across a RoCE network, provides external IPU memory, and enables IPU scale-out across 100 GbE connections (GW-Links) for rack-to-rack connectivity. |
| IPU-Link | High-speed communication links that connect IPUs both within and between IPU-Machines in a Pod. Special cables are required for IPU-Links. |
| IPU-Machine | The blades installed in your system: IPU-M2000 in IPU-POD systems and Bow-2000 in Bow Pod systems. |
| Pod | Covers both IPU-POD systems (such as the IPU‑POD64 and IPU‑POD256) and Bow Pod systems (such as Bow Pod64 and Bow Pod256). |
| PDU | Power distribution unit |
| RDMA | Remote direct memory access |
| RNIC | RDMA network interface controller |
| RoCE | RDMA over Converged Ethernet |
| ToR | Top of rack. Often also used to refer to the ToR RDMA switch that is placed on top of the IPU-Machines. |

1.2. System summary

The Bow-2000 is a 1U (one rack unit) compute platform delivering nearly 1.4 petaFLOPS (FP16.16) of AI compute. It contains four Bow IPUs with 3.6 GB of In-Processor-Memory™ and is pre-configured with 128 GB (2 x 64 GB) of Streaming Memory™, one 100 GbE RoCEv2 NIC for host server connectivity, and 1 TB of NVMe M.2 SSD storage. In addition, the Bow-2000 has connectors for the IPU-Fabric™ that provide high-speed interfaces (2.8 Tbps in total) for connecting to other Bow-2000s.

_images/Bow2000.png

Fig. 1.1 IPU-Machine: Bow-2000

An installed and fully operational Bow Pod16 DA system consists of:

  • The Bow Pod16 Direct Attach AI compute platform with four Bow-2000s (16 Bow IPUs)

  • Pre-installed and configured Virtual IPU (V-IPU) management software with embedded management through a web UI that offers easy installation and integration with pre-existing infrastructure

  • The Graphcore Poplar SDK downloaded and installed on the host server

  • An approved host server, pre-qualified by your preferred channel partner (see the approved servers list for more details)

Please note that pre-integrated and qualified systems and integration support are available from channel partners.
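
As a rough illustration of how the per-machine figures in the system summary add up for a four-machine Bow Pod16, the sketch below multiplies the quoted Bow-2000 values by the number of machines. It assumes that the figures scale linearly and that every Bow-2000 uses the pre-configured 128 GB of Streaming Memory; the names `IpuMachineSpec` and `pod_totals` are hypothetical and are not part of any Graphcore software. Treat the results as estimates and refer to the product datasheet for authoritative values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IpuMachineSpec:
    """Per-machine figures quoted in the system summary for one Bow-2000."""

    ipus: int = 4
    peak_pflops_fp16_16: float = 1.4  # "nearly 1.4 petaFLOPS", rounded
    in_processor_memory_gb: float = 3.6
    streaming_memory_gb: int = 128  # pre-configured 2 x 64 GB


def pod_totals(machines: int, spec: IpuMachineSpec = IpuMachineSpec()) -> dict:
    """Scale the per-machine figures linearly (an assumption, not a datasheet value)."""
    return {
        "ipus": machines * spec.ipus,
        "peak_pflops_fp16_16": round(machines * spec.peak_pflops_fp16_16, 1),
        "in_processor_memory_gb": round(machines * spec.in_processor_memory_gb, 1),
        "streaming_memory_gb": machines * spec.streaming_memory_gb,
    }


if __name__ == "__main__":
    # A Bow Pod16 DA system contains four Bow-2000s.
    for name, value in pod_totals(machines=4).items():
        print(f"{name}: {value}")
```

Calling `pod_totals(4)` gives 16 IPUs, which matches the "four Bow-2000s (16 Bow IPUs)" stated above; the remaining totals are simple multiples of the per-machine figures.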

An example Bow Pod16 DA system is illustrated in Fig. 1.2.

_images/bow16-da-system.png

Fig. 1.2 Bow Pod16 DA system

Bow Pod DA systems are fully supported by Graphcore’s Poplar® software development environment, providing a complete scalable platform for accelerated development. Existing ML frameworks such as TensorFlow, ONNX, and PyTorch are fully supported, as are industry-standard converged infrastructure management tools including OpenBMC, Redfish, IPMI, and Docker containers, and orchestration with Slurm and Kubernetes. The PopVision™ visualisation and analysis tools monitor performance across one or more IPUs, and the graphical analysis enables detailed inspection of all processing activities.

See the Bow Pod Getting Started Guide and the Poplar and PopLibs User Guide on the Graphcore documentation portal for details of Poplar installation and use.

Pictures of a complete Bow Pod16 DA system are shown in Fig. 1.3 and Fig. 1.4.

_images/m2000_DA_x4_front.png

Fig. 1.3 Front view (cold aisle)

_images/m2000_DA_x4_back.png

Fig. 1.4 Rear view (hot aisle)

Note

Cable colours may differ.