4. Poplar distributed configuration library (PopDist)

This section describes how to use the PopDist library to make the changes needed for your application to be launched in a distributed environment with PopRun. In short, PopDist provides an API for writing distributed applications. As the examples (Section 4.3, PopDist examples) demonstrate, very few modifications are needed to prepare an application for distributed execution.

PopRun uses mpirun to create multiple instances, while PopDist uses MPI for communication between multiple hosts. This is not to be confused with communication between IPUs, which is realised using the Graphcore Communication Library (GCL) over either the IPU-Links or the GW-Links.

There are several packages that enable MPI communication in Python applications, but we recommend Horovod due to its simplicity. More details about the Horovod package can be found in the Horovod documentation.

4.1. Horovod Installation

The implementation of Horovod that you install depends on your machine learning framework:

  • TensorFlow 1 and TensorFlow 2: You do not have to install Horovod separately as it is bundled with the Graphcore TensorFlow wheel.

  • PyTorch: You must install the official Horovod package.

  • PopART: You must install the Graphcore implementation of Horovod that is included as a wheel file with the Poplar SDK.

Note

You must install and enable the Poplar SDK following the instructions in the Getting Started guide for your IPU system. Do this before installing Horovod to ensure that it uses the OpenMPI version that comes with the SDK.

Note

It is recommended to create Python virtual environments for each framework before installing any packages.

4.1.1. TensorFlow 1 and TensorFlow 2

Horovod is bundled in the Graphcore TensorFlow wheel and so you do not have to install it separately. For instructions on how to install the TensorFlow wheel, refer to the section on setting up TensorFlow for the IPU in the Getting Started guide for your IPU system.

Once you have installed the TensorFlow wheel, you can validate that Horovod has been installed correctly by using the code snippet in Listing 4.1.

Listing 4.1 Validating that Horovod for TensorFlow has been installed correctly.
# Copyright (c) 2021 Graphcore Ltd. All rights reserved.

import popdist
import popdist.tensorflow
from tensorflow.python.ipu import horovod as hvd

hvd.init()

4.1.2. PyTorch

If you are using PyTorch and plan to use MPI for host-side communication, you must install the official Horovod package. Make sure you have activated your PyTorch virtual environment and installed PopTorch. For instructions on how to install PopTorch, refer to the Installation section in the PyTorch for the IPU: User Guide.

You can install the official Horovod package with the command:

$ pip install horovod

You can validate that Horovod has been installed correctly by using the code snippet in Listing 4.2.

Listing 4.2 Validating that Horovod for PyTorch has been installed correctly.
# Copyright (c) 2021 Graphcore Ltd. All rights reserved.

import torch
import poptorch
import popdist
import horovod.torch as hvd

hvd.init()

4.1.3. PopART

If you are using PopART, you must install the Graphcore implementation of Horovod, which is provided as a wheel file in the Poplar SDK:

$ pip install <sdk_path>/horovod.x.y.z.whl

Note

Since PopART has to use the version of Horovod that is bundled with the Poplar SDK, there is no need to validate the installation.

4.2. The PopDist API

PopDist contains some universal functions that are useful in all frameworks. Based on the arguments provided to PopRun, they can be used to dynamically perform tasks such as the following (see the sketch after this list):

  • Evaluating the global batch size

  • Sharding your dataset

  • Setting the size of the buffer to use for shuffling
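
For example, here is a minimal sketch (not taken from a real application; the micro batch size and shuffle buffer factor are hypothetical placeholders) of how these helpers might be combined:

import popdist

# Hypothetical per-replica (micro) batch size chosen by the application.
micro_batch_size = 16

# The global batch size scales with the total number of replicas across all instances.
global_batch_size = micro_batch_size * popdist.getNumTotalReplicas()

# Values typically passed to a dataset's shard() transform so that each
# instance reads a distinct subset of the data.
num_shards = popdist.getNumInstances()
shard_index = popdist.getInstanceIndex()

# A shuffle buffer is often scaled with the number of local replicas.
shuffle_buffer_size = 1000 * popdist.getNumLocalReplicas()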

popdist.checkNumIpusPerReplica(expected)

Check whether the number of IPUs per replica in the PopDist context matches the expected number of IPUs.

False is returned if the environment variable is set and does not match the given value. If the environment variable is not set, the number of IPUs per replica is checked against the expected number of IPUs.

Returns

True if the number of IPUs per replica equals the expected number of IPUs, False otherwise.

Parameters

expected (int) –

Return type

bool
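
For example, a pipelined application might use this function as a launch-time sanity check (the value 4 and the error message are illustrative):

import popdist

# Hypothetical check: this model is assumed to be pipelined over 4 IPUs per replica.
if not popdist.checkNumIpusPerReplica(4):
    raise RuntimeError("This model requires 4 IPUs per replica; "
                       "adjust the PopRun launch options accordingly")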

popdist.getDeviceId(ipus_per_replica=None)

Gets the device ID of a device suitable for PopDist.

Returns

The device ID of a suitable device.

Parameters

ipus_per_replica (Optional[int]) –

Return type

int

popdist.getInstanceIndex()

Gets the index of the current instance.

Can only be used with a uniform number of replicas per instance.

Returns

The instance index, in the range [0, getNumInstances()).

Return type

int

popdist.getNumInstances()

Gets the total number of instances.

Can only be used with a uniform number of replicas per instance.

Returns

The total number of instances.

Return type

int

popdist.getNumIpusPerReplica()

Get the number of IPUs per replica.

Tries to infer the number of IPUs per replica from the environment variables, defaulting to 1 if it cannot be determined.

Returns

The number of IPUs per replica, or 1 if it cannot be inferred from the environment.

Return type

int

popdist.getNumLocalReplicas()

Gets the number of local replicas.

Infers the number of local replicas automatically from the environment variables.

Returns

The number of local replicas, or 1 if the environment variable is not set.

Return type

int

popdist.getNumTotalReplicas()

Get the total number of replicas.

Infers the total number of replicas from the environment variables, defaulting to 1 if it cannot be determined.

Returns

The total number of replicas, or 1 if it cannot be inferred from the environment.

Return type

int

popdist.getReplicaIndexOffset()

Gets the replica index offset.

The replica index offset is the global index of the first replica in the current instance.

Returns

The replica index offset, or 0 if no environment variable is set.

Return type

int

popdist.isPopdistEnvSet()

Check if the PopDist environment is set.

Returns

True if set, False otherwise.

Return type

bool

popdist.isUniformReplicasPerInstance()

Checks if the number of replicas per instance is uniform.

Automatically inferred from the environment variables passed to PopDist.

Returns

True if the number of replicas per instance is the same for all instances, or if no environment variables are set.

Return type

bool

4.2.1. PopART

popdist.popart.configureSessionOptions(opts)

Configure PopART session options to work with the PopDist context.

Parameters

opts (popart.SessionOptions) –

Return type

None

popdist.popart.getDevice(ipusPerReplica)

Get a PopART device that works with the PopDist context.

Returns

An attached device.

Parameters

ipusPerReplica (int) –

Return type

popart.DeviceInfo
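
Taken together, a minimal PopART sketch (with the rest of the session setup assumed to be handled elsewhere) might look like this:

import popart
import popdist
import popdist.popart

# Fill in the replication-related session options from the PopDist context.
opts = popart.SessionOptions()
popdist.popart.configureSessionOptions(opts)

# Acquire an attached device that matches the PopDist context.
device = popdist.popart.getDevice(ipusPerReplica=popdist.getNumIpusPerReplica())

# `opts` and `device` can then be passed to popart.TrainingSession or
# popart.InferenceSession in the usual way.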

4.2.2. PopTorch

class popdist.poptorch.Options(*args, **kwargs)

An extension to PopTorch’s Options class so that it is easier to pass application-specific options to PopDist.
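
A minimal sketch of how this class might be used in place of poptorch.Options (the deviceIterations value is illustrative):

import popdist.poptorch

# The returned options are pre-configured with the replication factor and
# device selection taken from the PopDist context.
opts = popdist.poptorch.Options()

# Further options can be set exactly as on a regular poptorch.Options object.
opts.deviceIterations(4)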

4.2.3. TensorFlow 1 and 2

popdist.tensorflow.set_ipu_config(config, ipus_per_replica, configure_device=True)

Set the PopDist configuration options for TensorFlow.

Parameters
  • config – An IPUConfig instance created with tensorflow.python.ipu.config.IPUConfig(), or an IpuOptions configuration protobuf created with tensorflow.python.ipu.utils.create_ipu_config() (deprecated), to update.

  • ipus_per_replica (int) – The number of IPUs per replica.

  • configure_device (bool) – Whether to update config to select the IPU device for PopDist execution.

Returns

The passed config.
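
A minimal sketch of typical usage with an IPUConfig (assuming one IPU per replica) might look like this:

from tensorflow.python import ipu
import popdist
import popdist.tensorflow

# Update the configuration with the replication and device settings from the
# PopDist context, then configure the IPU system for this instance.
config = ipu.config.IPUConfig()
popdist.tensorflow.set_ipu_config(config, ipus_per_replica=1)
config.configure_ipu_system()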

4.3. PopDist examples

This section details how you can use PopDist to distribute your application, with examples for PyTorch and TensorFlow.

4.3.1. PyTorch

The code example in Listing 4.3 outlines the most common steps involved in adding PopDist to an application. For brevity, the example assumes that the application is launched using PopRun like this:

$ poprun --vipu-partition=MyPartition --vipu-server-host=127.0.0.1
         --num-replicas=4 --num-instances=1 -v python main.py
Listing 4.3 Simple PopDist Example with PyTorch
# Copyright (c) 2021 Graphcore Ltd. All rights reserved.

import torch
import poptorch
import popdist
import popdist.poptorch
import horovod.torch as hvd


def init_popdist(args):
    hvd.init()
    args.use_popdist = True
    if popdist.getNumTotalReplicas() != args.replicas:
        print(f"The number of replicas is overridden by PopRun. "
              f"The new value is {popdist.getNumTotalReplicas()}.")
    args.replicas = int(popdist.getNumLocalReplicas())

    args.popdist_rank = popdist.getInstanceIndex()
    args.popdist_size = popdist.getNumInstances()


def create_model(opts):
    if opts.use_popdist:
        model_opts = popdist.poptorch.Options()
    else:
        model_opts = poptorch.Options()

    return model_opts


if __name__ == '__main__':

    # command_line_arguments() is the application's own argument parser (not shown here).
    opts = command_line_arguments()

    # Initialise PopDist
    if popdist.isPopdistEnvSet():
        init_popdist(opts)
    else:
        opts.use_popdist = False

    create_model(opts)

The PopDist PopTorch support is imported near the top of the listing (import popdist and import popdist.poptorch), along with Horovod (import horovod.torch as hvd). Next, the command line arguments are parsed and passed to init_popdist, which initialises Horovod and reads the distributed configuration from PopDist.

Note

The use of Horovod is optional. It is not a requirement to initialise PopDist.

Some of the applications in our GitHub examples repository, such as our PyTorch CNNs training application, make use of PopDist and PopRun.

4.3.2. TensorFlow 1

When using PopDist with TensorFlow 1, you need to use an additional class and function that are not part of the PopDist API:

  • PopDistStrategy: This is a context manager for targeting the IPU in a distributed manner.

  • tf.data.Dataset.shard(): This allows you to split your dataset between instances using the PopDist API methods, as follows:

    dataset = dataset.shard(num_shards=popdist.getNumInstances(), index=popdist.getInstanceIndex())
    

    The data is automatically split between replicas within each instance, so no further manual splitting is required.

See the TensorFlow 1 PopDist feature example.

Our TensorFlow CNNs training application also makes use of PopDist and PopRun.

4.3.3. TensorFlow 2

When using PopDist with TensorFlow 2, you need to use an additional class and function that are not part of the PopDist API:

  • PopDistStrategy: This is a new context manager for targeting the IPU in a distributed manner.

  • tf.data.Dataset.shard(): This allows you to split your dataset between instances using the PopDist API methods, as follows:

    dataset = dataset.shard(num_shards=popdist.getNumInstances(), index=popdist.getInstanceIndex())
    

    The data is automatically split between replicas within each instance, so no further manual splitting is required.

See the TensorFlow 2 PopDist feature example.

Note

When using Keras methods such as tf.keras.Model.fit() in a multi-instance session, the progress bar that summarises the metrics and loss for each epoch is a summary across all replicas. Do not be alarmed that the logs of each instance claim to process the whole epoch; each instance only processes its respective dataset shard.

Note

When using tf.keras.Model() to define your model, the batch_size argument for tf.keras.Input() expects the global batch size. You do not need to explicitly provide this parameter, but if you do, be aware of this behaviour.
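
For illustration, a small sketch of this (the micro batch size and input shape are hypothetical):

import tensorflow as tf
import popdist

# Hypothetical per-replica (micro) batch size chosen by the application.
micro_batch_size = 16
global_batch_size = micro_batch_size * popdist.getNumTotalReplicas()

# If batch_size is passed to tf.keras.Input, it must be the global batch size.
inputs = tf.keras.Input(shape=(784,), batch_size=global_batch_size)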

4.4. Conclusion

A PopDist application can be organised into the following parts:

1. Parsing phase: where PopRun runtime command line parameters are parsed and handled.

2. Initialisation phase: where essential variables such as rank and size are stored by making PopDist API calls. If MPI is to be used between the various instances, Horovod is also initialised.

3. Digestion phase: where PopDist variables are actively used to make the application distributed.

These phases are only a rough outline and may vary from application to application.