2. IPU‑POD128 design components

This section describes the components in the IPU‑POD128. Each IPU‑POD128 is made from two IPU‑POD64 logical racks with GW-Links connected between them.

2.1. IPU‑POD64 components

Each IPU‑POD64 contains the components described in Sections 2.2 to 2.8.

2.2. IPU-M2000s

2.2.1. Overview

There are 32 IPU-M2000s in each IPU‑POD128 (16 in each IPU‑POD64). With 4 IPUs per IPU-M2000, this makes a total of 128 IPUs. The IPU-M2000 front panel contains:

  • 2 RNIC ports

  • 8 IPU-Link ports

  • 2 Management GbE ports (BMC/IPU-Gateway management ports)

  • 2 GW-Link ports

  • 8 Sync-Link ports

  • 3 LED indicators

Fig. 2.1 Front panel

The IPU-M2000 back panel contains:

  • 2 power connectors

  • 5 fan units

  • 5 LED indicators

  • Unit QR code

Fig. 2.2 Back panel

2.2.2. QR code label

There is a QR code label on the back panel of each IPU-M2000. The QR code contains the following information for each IPU-M2000:

  • Company name (Graphcore)

  • Serial number

  • Part number

  • BMC Ethernet MAC address

  • IPU-Gateway Ethernet MAC address

  • URL for Graphcore support portal

2.2.3. LED indicators

The IPU-M2000 has LED indicators on both the front and the rear of the chassis.

Rear side LEDs

The rear side LEDs (Fig. 2.3) indicate the state of the 5 fans on the IPU-M2000. All the indicators should normally be off. A lit (amber) LED indicates a fault in the corresponding fan module, which should be replaced as soon as possible to maintain maximum cooling.

Fig. 2.3 Rear side LED indicators

Front side LEDs

The front side LEDs indicate the status of the IPU-M2000. Fig. 2.4 and Table 2.1 show the colour scheme and indications.

Fig. 2.4 Front side LEDs

Table 2.1 Front LED indicators

LED   Colour   Function
---   ------   --------
1     Green    “OK”, “Normal”, “Satisfactory operation”, “Active”, or “In service”
               10 Hz: BMC running on flash (instruction fetch from flash)
               2 Hz: BMC running on DRAM without interrupt enabled (instruction fetch from DRAM)
               0.5 Hz: BMC running on DRAM with interrupt enabled (system in standby mode)
               0.1 Hz: BMC abnormal mode, some interrupts are not serviced for over 2 seconds
               Steady green light: System operational
2     Amber    “Attention” or “Service action required”
3     White    “Here I am”, “This is the item being sought”, or “Unit ID”
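
The blink rates listed for the green LED in Table 2.1 can be treated as a simple lookup. The following is a minimal sketch, not a Graphcore tool: the function name `describe_green_led` and the `tolerance` parameter are hypothetical and exist only to illustrate how an observed blink rate maps to a BMC state.

```python
# Blink rates (Hz) for the green front LED (LED 1), taken from Table 2.1.
GREEN_LED_STATES = {
    10.0: "BMC running on flash (instruction fetch from flash)",
    2.0: "BMC running on DRAM without interrupt enabled",
    0.5: "BMC running on DRAM with interrupt enabled (standby mode)",
    0.1: "BMC abnormal mode (some interrupts not serviced for over 2 seconds)",
}


def describe_green_led(blink_hz, tolerance=0.2):
    """Return the Table 2.1 meaning for an observed blink rate in Hz.

    A rate of 0 stands for a steady green light. `tolerance` is an assumed
    relative margin (20% by default) for imprecise timing by the observer;
    it is not part of the original table.
    """
    if blink_hz == 0:
        return "System operational"
    for rate, meaning in GREEN_LED_STATES.items():
        if abs(blink_hz - rate) <= tolerance * rate:
            return meaning
    return "Unknown blink rate"
```

For example, `describe_green_led(0)` reports an operational system, while a rate that matches none of the listed values is reported as unknown rather than guessed.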

2.3. Server

The default configuration of each IPU‑POD64 uses a single PowerEdge R6525 server but up to four servers can be used. Contact Graphcore sales for details of other supported server types. This document describes the default server (PowerEdge R6525) installation only. Other servers may have different installation requirements.

The default server configuration is described in Section 4.1, Server configuration.

Since there is at least one server per IPU‑POD64, there is a minimum of two servers in the IPU‑POD128.

2.4. Switches

Each IPU‑POD64 contains two network switches serving different purposes.

2.4.1. 100GbE RoCE/RDMA switch (ToR switch)

The 100GbE RoCE/RDMA switch (also referred to as the ToR switch) provides the data plane for the end user’s machine learning (ML) jobs. It connects the host servers running the Poplar® SDK to the IPUs running the ML model in the IPU-M2000s. The default ToR switch is an Arista DCS-7060CX-32S-F. Contact Graphcore sales for details of other supported switch types. This document describes the default switch (7060CX) installation only. Other switches may have different installation requirements.

2.4.2. 1GbE management switch

The 1GbE management switch is used for connecting the management ports together inside the rack. The default management switch is an Arista DCS-7010T-48-F. Contact Graphcore sales for details of other supported switch types. This document describes the default switch (7010T) installation only. Other switches may have different installation requirements.

2.5. Power distribution units

Two power distribution units (PDUs) are installed in each IPU‑POD64. The default unit is an APC AP8886.

2.6. Rack

The IPU-M2000s, servers, switches, and PDUs for each of the two IPU‑POD64 racks are installed in an APC AR3300SP rack. This rack has a packing system designed to safely transport and unload the rack.

It is important to follow the instructions carefully when packing or unpacking the rack.
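
The unit counts given in Sections 2.2 to 2.5 can be cross-checked with a short calculation. The sketch below is illustrative only: the names `Pod128Inventory` and `pod128_inventory` are hypothetical, and it assumes both IPU‑POD64 racks hold the same number of servers, which this document does not require.

```python
from dataclasses import dataclass

# Per-IPU-POD64 figures as stated in this section.
M2000S_PER_POD64 = 16
IPUS_PER_M2000 = 4
SWITCHES_PER_POD64 = 2  # one ToR switch and one management switch
PDUS_PER_POD64 = 2
MIN_SERVERS_PER_POD64 = 1
MAX_SERVERS_PER_POD64 = 4
POD64S_PER_POD128 = 2


@dataclass(frozen=True)
class Pod128Inventory:
    m2000s: int
    ipus: int
    servers: int
    switches: int
    pdus: int


def pod128_inventory(servers_per_pod64=MIN_SERVERS_PER_POD64):
    """Total component counts for one IPU-POD128.

    Assumes (for illustration) that both IPU-POD64 racks are populated
    with the same number of servers.
    """
    if not MIN_SERVERS_PER_POD64 <= servers_per_pod64 <= MAX_SERVERS_PER_POD64:
        raise ValueError("an IPU-POD64 holds between 1 and 4 servers")
    m2000s = POD64S_PER_POD128 * M2000S_PER_POD64
    return Pod128Inventory(
        m2000s=m2000s,
        ipus=m2000s * IPUS_PER_M2000,
        servers=POD64S_PER_POD128 * servers_per_pod64,
        switches=POD64S_PER_POD128 * SWITCHES_PER_POD64,
        pdus=POD64S_PER_POD128 * PDUS_PER_POD64,
    )
```

With the default of one server per IPU‑POD64, this gives 32 IPU-M2000s, 128 IPUs, and 2 servers, matching the figures quoted in the text.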

2.7. Supplementary mounting components

The supplementary components listed below also need to be installed in each rack.

  • Cable organizer

  • Blanking panel

2.8. Cables

Each of the two IPU‑POD64 racks has three types of cabling:

  1. RJ45 cables

  2. OSFP cables

  3. QSFP cables

There are also GW-Link cables between the two IPU‑POD64 racks.

2.8.1. RJ45 cables

  • Red: IPU-M2000 to IPU-M2000 within-rack Sync-Link connectivity

  • Blue: Connecting IPU-M2000s to the management switch (BMC + IPU-Gateway management)

  • Blue: Connecting servers to the management switch

  • Yellow: Connecting IPU-M2000s to the management switch (BMC only management)

2.8.2. OSFP cables

  • IPU-M2000 to IPU-M2000 (IPU-Link) connectivity

2.8.3. QSFP cables

  • IPU-M2000 to ToR switch connectivity

  • Server to ToR switch connectivity

All IPU‑POD64 cable connections are described in Section 3, IPU-POD64 rack assembly.
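
When labelling or auditing cables, the roles listed in Sections 2.8.1 to 2.8.3 can be kept in a small lookup table. This is a sketch under stated assumptions: the names `CABLE_ROLES` and `cable_roles` are hypothetical, and the table records only the connector types, colours, and endpoints named in this section.

```python
# Cable roles as listed in Sections 2.8.1 to 2.8.3.
# Keys are (connector type, colour); colour is None where the text gives none.
CABLE_ROLES = {
    ("RJ45", "red"): ["IPU-M2000 to IPU-M2000 (within rack)"],
    ("RJ45", "blue"): [
        "IPU-M2000 to management switch (BMC + IPU-Gateway management)",
        "Server to management switch",
    ],
    ("RJ45", "yellow"): ["IPU-M2000 to management switch (BMC only management)"],
    ("OSFP", None): ["IPU-M2000 to IPU-M2000 (IPU-Link)"],
    ("QSFP", None): ["IPU-M2000 to ToR switch", "Server to ToR switch"],
}


def cable_roles(connector, colour=None):
    """Return the roles listed for a cable, or an empty list if none match."""
    key = (connector.upper(), colour.lower() if colour else None)
    return CABLE_ROLES.get(key, [])
```

A blue RJ45 cable returns two possible roles, so the endpoints still need to be checked against the rack assembly instructions.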

2.9. Connecting cables between IPU‑POD64 logical racks

The GW-Link cables used to connect multiple IPU‑POD64 logical racks together are optical Ethernet cables, which means that the racks do not have to be installed adjacent to each other in the datacentre. This cabling is described in Section 7, IPU-POD128 installation.