1. Introduction
Note
The information in this document applies to Graphcore Pod systems, which covers both IPU-POD systems (such as the IPU‑POD64 and IPU‑POD256) and Bow Pod systems (such as Bow Pod64 and Bow Pod256). The term IPU-Machine refers to the blades installed in your Pod, so IPU-M2000 in IPU-POD systems and Bow-2000 in Bow Pod systems.
PopRun is a command line utility to launch distributed applications on Graphcore Pod compute systems. Specifically, PopRun is used to create multiple instances of an application. The instances can be launched on a single host server or spread across multiple host servers within the same Pod, depending on the number of host servers available on the target Pod. Typically, an IPU‑POD64 or a Bow Pod64 is configured with one, two or four host servers.
On large Pod systems such as an IPU‑POD128 or a Bow Pod256, PopRun will automatically launch multiple instances on remote host servers. By remote host servers we mean servers that are physically located in another interconnected Pod. This makes PopRun a powerful tool for running applications at scale.
For an application to benefit from being launched with PopRun, it must be written so that it can take advantage of the additional compute power provided by the many IPUs inside a Pod. Other motivating factors for using PopRun are:
PopRun lets you launch multiple instances of your application on one or more Pods.
Depending on your application, launching multiple instances may increase the performance of your application.
PopRun lets you perform careful placement of multiple instances on the host server to minimise NUMA effects.
PopRun is required if you wish to scale your application beyond a single IPU‑POD64 or a single Bow Pod64.
The Poplar Distributed Configuration Library (PopDist) provides a set of APIs that you can use to make your application ready for distributed execution. Command line parameters passed to PopRun are exposed to the developer and can be used to distribute the input/output data or other parts of the application.
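As an illustration of distributing input data among instances, the following is a minimal sketch in plain Python. It does not call PopDist: the instance index and the number of instances, which would come from the launch parameters at run time, are passed in as ordinary arguments. The function name shard_for_instance and the strided sharding scheme are assumptions made for this example, not part of any Graphcore API.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard_for_instance(
    items: Sequence[T], instance_index: int, num_instances: int
) -> List[T]:
    """Return the part of `items` that one instance should process.

    Hypothetical helper: in a real application the two integers would be
    obtained from the distributed configuration rather than hard-coded.
    """
    if num_instances < 1:
        raise ValueError("num_instances must be at least 1")
    if not 0 <= instance_index < num_instances:
        raise ValueError("instance_index must be in [0, num_instances)")
    # Instance i takes items i, i + n, i + 2n, ... so shards never overlap
    # and together cover every item exactly once.
    return list(items[instance_index::num_instances])


if __name__ == "__main__":
    samples = list(range(10))
    for index in range(2):
        print(index, shard_for_instance(samples, index, 2))
```

Strided sharding keeps the shard sizes within one item of each other even when the dataset size is not a multiple of the number of instances. Other schemes, such as contiguous blocks, would work equally well.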
1.1. Installation
Both PopRun and PopDist come bundled with the Poplar SDK. No additional installation is required. The PopRun binary is called poprun and can be found in the Poplar bin directory of the Poplar SDK installation.
Attention
Before you proceed, make sure you have sourced the enable.sh scripts as described in the getting started guide for your IPU system.
1.2. Validating the installation
To validate your Poplar SDK installation and thus make sure that poprun is available, run the following command:
$ poprun --num-instances 2 --num-replicas 2 --offline-mode=1 echo "Hello world!"
If PopRun is set up properly, you should see Hello world! printed out twice, once per instance:
Hello world!
Hello world!
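If the command above fails because poprun cannot be found, the usual cause is that enable.sh has not been sourced in the current shell. The following is a minimal sketch of a pre-flight check in Python, using only the standard library. The helper names find_poprun and validation_command are assumptions made for this example. The argument list simply mirrors the validation command shown above.

```python
import shutil
from typing import List, Optional


def find_poprun() -> Optional[str]:
    """Return the full path to poprun, or None if it is not on PATH."""
    return shutil.which("poprun")


def validation_command() -> List[str]:
    """Argument list equivalent to the validation command in this section."""
    return [
        "poprun",
        "--num-instances", "2",
        "--num-replicas", "2",
        "--offline-mode=1",
        "echo", "Hello world!",
    ]


if __name__ == "__main__":
    path = find_poprun()
    if path is None:
        print("poprun not found on PATH; has enable.sh been sourced?")
    else:
        print("poprun found at", path)
        print("validation command:", validation_command())
```

The argument list can be passed to subprocess.run if you want to run the validation from a script rather than from the shell.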
PopDist supports an external Python package called Horovod for distributed processing. See Section 4, Poplar distributed configuration library (PopDist) for more information.
The following sections will provide more detailed information regarding the various PopRun and PopDist features.
1.3. Replicas and instances
Distribution with PopRun and PopDist is based on the concepts of replicas and instances.
A replica is a feature of Poplar that allows you to create a number of identical copies of the same graph. Read more about replication in the Poplar documentation. Note that replication increases the amount of memory used for exchange code, which you can read more about in the Memory and Performance Optimisation guide.
An instance is an operating system process on the host that controls a subset of the replicas. Using multiple instances allows scaling to larger distributed systems as the replicas are divided among the instances. Each instance is only responsible for communicating with its local replicas. Among other things, this distributes the responsibility for feeding data to the replicas among the instances. Placing the instances on different host machines allows scaling out and gives access to more CPU resources that can be used for host processing like dataset preprocessing.
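The division of replicas among instances can be sketched in a few lines of Python. This is an illustration only, under two assumptions: the number of replicas is a multiple of the number of instances, and each instance controls a contiguous block of replica indices, as in the topology example later in this section. The function name local_replicas is hypothetical.

```python
from typing import List


def local_replicas(
    num_replicas: int, num_instances: int, instance_index: int
) -> List[int]:
    """Return the replica indices controlled by one instance.

    Assumes an even split into contiguous blocks (an assumption made for
    this sketch, not a statement about how PopRun is implemented).
    """
    if num_replicas < 1 or num_instances < 1:
        raise ValueError("counts must be at least 1")
    if num_replicas % num_instances != 0:
        raise ValueError("num_replicas must be a multiple of num_instances")
    if not 0 <= instance_index < num_instances:
        raise ValueError("instance_index must be in [0, num_instances)")
    per_instance = num_replicas // num_instances
    first = instance_index * per_instance
    return list(range(first, first + per_instance))


if __name__ == "__main__":
    # 8 replicas split over 2 instances: 4 replicas per instance.
    for index in range(2):
        print("instance", index, "->", local_replicas(8, 2, index))
```

With 8 replicas and 2 instances, instance 0 is responsible for replicas 0 to 3 and instance 1 for replicas 4 to 7, so each process only feeds data to its own four replicas.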
Note that an instance can spawn multiple threads. If your application creates multiple threads, take care not to oversubscribe the host cores available to you, that is, do not run more busy threads in total than there are cores. Oversubscription will in most cases lead to performance degradation.
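One simple way to avoid oversubscription is to give each instance an equal share of the host cores. The following is a minimal sketch, assuming that all instances on a host run the same workload. The function name threads_per_instance is hypothetical. Note that os.cpu_count() reports the logical CPUs of the machine, which may be more than the cores your process is actually allowed to use.

```python
import os
from typing import Optional


def threads_per_instance(
    instances_on_host: int, host_cores: Optional[int] = None
) -> int:
    """Return a thread budget per instance that avoids oversubscription.

    If `host_cores` is not given, the logical CPU count of the machine
    is used as an estimate.
    """
    if instances_on_host < 1:
        raise ValueError("instances_on_host must be at least 1")
    if host_cores is None:
        host_cores = os.cpu_count() or 1
    # Integer division leaves any remainder cores unused. Each instance
    # needs at least one thread, so when there are more instances than
    # cores the result is 1 and the host is oversubscribed regardless.
    return max(1, host_cores // instances_on_host)


if __name__ == "__main__":
    print("threads per instance:", threads_per_instance(instances_on_host=2))
```

The result can be used to size a thread pool or a data loader worker count inside each instance.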
PopRun can be asked to print a visual illustration of the replicas and instances by passing --print-topology=yes. Here is an example with 8 replicas and 2 instances (using a single host and a single IPU-Link domain), in which each instance will control 4 replicas:
 ===========================================
|              poprun topology              |
|===========================================|
| hosts     |           localhost           |
|-----------|-------------------------------|
| ILDs      |               0               |
|-----------|-------------------------------|
| instances |       0       |       1       |
|-----------|-------------------------------|
| replicas  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
 -------------------------------------------