1. IPU-POD overview

The IPU-POD™ is designed to make training and inference of very large, demanding machine-learning models faster, more efficient, and more scalable. This includes emerging models as well as established ones.

The IPU-POD is constructed from a number of IPU-M2000s, each containing four IPUs. For example, the IPU-POD16 has four IPU-M2000s (16 IPUs), and the IPU-POD64 is built from 16 IPU-M2000s (64 IPUs).

Multi-rack IPU-POD systems are built from IPU‑POD64 racks: an IPU‑POD128 is built from two IPU‑POD64 racks and contains 128 IPUs, and an IPU‑POD256 is built from four IPU‑POD64 racks and contains 256 IPUs. The number of IPUs in an IPU-POD must be a power of 2 and at least 16.
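The sizing rules above can be expressed as a short calculation. The following Python sketch is illustrative only: the names pod_layout and PodLayout are hypothetical and not part of any Graphcore tool, and it assumes that systems with fewer than 64 IPUs occupy a single rack.

```python
from dataclasses import dataclass

IPUS_PER_M2000 = 4   # each IPU-M2000 contains four IPUs
IPUS_PER_RACK = 64   # multi-rack systems are built from IPU-POD64 racks
MIN_IPUS = 16        # minimum IPU count stated above


@dataclass(frozen=True)
class PodLayout:
    """Hypothetical summary of an IPU-POD configuration."""
    name: str
    ipus: int
    m2000s: int
    racks: int


def pod_layout(num_ipus: int) -> PodLayout:
    """Derive the IPU-M2000 and rack counts for an IPU-POD with num_ipus IPUs."""
    is_power_of_two = num_ipus > 0 and (num_ipus & (num_ipus - 1)) == 0
    if num_ipus < MIN_IPUS or not is_power_of_two:
        raise ValueError(
            f"{num_ipus} is not a power of 2 greater than or equal to {MIN_IPUS}"
        )
    m2000s = num_ipus // IPUS_PER_M2000
    # Assumption: anything smaller than an IPU-POD64 fits in one rack.
    racks = max(1, num_ipus // IPUS_PER_RACK)
    return PodLayout(f"IPU-POD{num_ipus}", num_ipus, m2000s, racks)


if __name__ == "__main__":
    for n in (16, 64, 128, 256):
        print(pod_layout(n))
```

For the systems named above, this gives four IPU-M2000s for an IPU-POD16, 16 for an IPU-POD64, and two and four racks for the IPU-POD128 and IPU-POD256 respectively.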

IPU-Links provide communication between the IPUs in an IPU-M2000 and also between the IPU-M2000s in an IPU-POD. The IPU-Gateway in the IPU-M2000 uses GW-Links for high-speed, low-latency communication between IPU-POD racks; this is required for multi-rack systems such as the IPU‑POD128 and IPU‑POD256.

1.1. V-IPU software

The Virtual-IPU™ (V-IPU™) management software is used for allocating and configuring IPUs in the IPU-POD. It provides command-line support for both the Poplar user role (data centre users, described in the V-IPU User Guide) and the IPU admin role (data centre administrators, described in the V-IPU Admin Guide).

It consists of the following components:

  • V-IPU agents: An agent resides on each IPU-M2000 in an IPU system and manages the IPU-M2000 hardware.

  • V-IPU controller: The V-IPU controller runs on a management node. It is responsible for managing V-IPU agents.

  • V-IPU command-line interface: Command-line tools provide access to the administration and user functions of the V-IPU controller.

This document assumes you have access to an IPU-POD on which the V-IPU agents and controller are already installed and configured. You can check this with your system administrator, who can also give you the information you need to connect to the IPU-POD (see Section 3, Getting started with V-IPU).

If you are the system administrator, refer to the V-IPU Admin Guide for information on installing and using the V-IPU software.

1.2. Poplar SDK

The IPU-POD is fully supported by Graphcore’s Poplar SDK to provide a complete, scalable platform for accelerated machine intelligence development.

The Poplar SDK contains tools for creating and running programs on IPU hardware using standard machine-learning frameworks such as PyTorch and TensorFlow. It also includes command-line tools for managing IPU hardware.
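Before using the SDK, it can be useful to confirm that its command-line tools are visible in your shell. The following Python sketch is a minimal check, not an official procedure: the helper find_tools is hypothetical, and the tool names gc-info and gc-monitor are assumptions about what your SDK installation provides, so adjust the list to match your system.

```python
import shutil

# Assumed names of Poplar SDK command-line tools; change these to match
# the tools shipped with your SDK version.
ASSUMED_TOOLS = ("gc-info", "gc-monitor")


def find_tools(names=ASSUMED_TOOLS):
    """Map each tool name to its location on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If a tool is reported as not found, the SDK environment has probably not been enabled in the current shell.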