1. Overview

1.1. About Bow Pod Systems

The Graphcore Bow™ Pod systems combine Bow-2000 IPU-Machines with network switches and a host server in a pre-qualified rack configuration that delivers from 5.577 petaFLOPS of AI compute upwards (the figure for the smallest Bow Pod, the Bow Pod16). In addition, virtualization and provisioning software allow the AI compute resources to be elastically allocated to users and grouped for both model-parallel and data-parallel workloads.

The Bow Pod system is designed to make both training and inference of very large and demanding machine-learning models faster, more efficient, and more scalable. This enables very large and emerging models to be run effectively.

The Bow Pod is constructed from a number of Bow-2000 IPU-Machines, each containing four IPUs. For example, the Bow Pod16 has four Bow-2000s (16 IPUs), and the Bow Pod64 is built from 16 Bow-2000s (64 IPUs).

Multi-rack Bow Pod systems are built from Bow Pod logical racks: a Bow Pod256 is built from four Bow Pod64 logical racks and contains 256 IPUs, and a Bow Pod512 is built from eight Bow Pod64 logical racks with 512 IPUs. The number of IPUs in a Bow Pod must be a power of 2, and greater than or equal to 16.
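
The sizing rule can be expressed in a few lines. The following is a minimal sketch in Python (not part of any Graphcore tool; the helper name is purely illustrative):

    def is_valid_pod_size(num_ipus: int) -> bool:
        # Bow Pod sizing rule: the IPU count is a power of 2 and at least 16.
        return num_ipus >= 16 and (num_ipus & (num_ipus - 1)) == 0

    # Each Bow-2000 contributes 4 IPUs, so a Pod with N IPUs uses N // 4 Bow-2000s.
    for name, ipus in [("Bow Pod16", 16), ("Bow Pod64", 64),
                       ("Bow Pod256", 256), ("Bow Pod512", 512)]:
        print(f"{name}: {ipus} IPUs, {ipus // 4} Bow-2000s, valid={is_valid_pod_size(ipus)}")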

This getting started guide is aimed at a user who is ready to start using a Bow Pod system. It describes what you need to do to run a simple program on an IPU: installing the necessary software, establishing a connection with the management server on your Bow Pod, allocating IPUs to run your software on, and getting an overview of the tools you can use to monitor the hardware and software while running machine-learning jobs on your Bow Pod system.

Note

This document assumes you have access to a Bow Pod system that has been built and tested according to the relevant Bow Pod build and test guide. In particular, it assumes that the V-IPU management software (Section 1.3, V-IPU software) has been installed on the management server and on the Bow-2000 IPU-Machines.

1.2. Poplar SDK

The Bow Pod is fully supported by Graphcore’s Poplar SDK to provide a complete, scalable platform for accelerated machine intelligence development.

The Poplar SDK contains tools for creating and running programs on IPU hardware using standard machine-learning frameworks such as PyTorch and TensorFlow. The SDK contains PopTorch, a set of extensions for PyTorch to enable PyTorch models to run directly on Graphcore IPU hardware. It also contains a Graphcore distribution of TensorFlow 1 and TensorFlow 2.
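
As an illustration of how a framework model reaches the IPU, the following is a minimal PopTorch sketch. It assumes the Poplar SDK and the matching PopTorch wheel are installed and that an IPU is available to the session; the model and tensor shapes are placeholders:

    import torch
    import poptorch

    # A small, ordinary PyTorch model; nothing here is IPU-specific.
    model = torch.nn.Linear(4, 2)

    # Wrap the model so PopTorch compiles it for, and runs it on, an IPU.
    opts = poptorch.Options()                # the defaults request a single IPU
    ipu_model = poptorch.inferenceModel(model, opts)

    x = torch.randn(1, 4)
    print(ipu_model(x))                      # this call executes on the IPU

A training wrapper is created in the same way with poptorch.trainingModel.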

The SDK also includes command line tools for managing IPU hardware.

1.3. V-IPU software

The Virtual-IPU™ (V-IPU™) IPU management software is used for allocating and configuring IPUs in the Bow Pod. The full V-IPU software consists of the following components:

  • V-IPU agents: An agent resides on each Bow-2000 in a Bow Pod system and manages the Bow-2000 hardware.

  • V-IPU controller: The V-IPU controller runs on a management node. It is responsible for managing V-IPU agents.

  • V-IPU command-line interface: Command line tools provide access to the administration and user functions of the V-IPU controller.

This document describes the installation of the command-line interface (Section 2.2, Installing the V-IPU command-line tools). For more information about using the V-IPU software in the Poplar user role (data centre users), refer to the V-IPU User Guide; for the IPU admin role (data centre administrators), refer to the V-IPU Administrator Guide.
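
Once the command-line tools are installed, a quick check that you can reach the V-IPU controller is to invoke the CLI, for example from a small Python wrapper. This is a sketch only: the exact subcommand shown is an assumption, so confirm the syntax with vipu --help and the V-IPU User Guide for your installation:

    import subprocess

    # Ask the V-IPU controller (via the vipu CLI) which partitions are visible
    # to this user. The subcommand is assumed here; check `vipu --help` for
    # the definitive syntax on your system.
    result = subprocess.run(
        ["vipu", "list", "partitions"],
        capture_output=True, text=True, check=False,
    )
    print(result.stdout or result.stderr)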