1. Introduction

This guide explains how you can run applications in Docker on a Linux machine with one or more physical IPU devices.

Prerequisites:

A machine with IPU devices
Ubuntu 18.04 / CentOS 7.6

2. Initial setup

First, check whether the IPU device driver is installed on your machine. The following command prints information about the driver module if it is available to the running kernel:

$ modinfo ipu_driver

If the driver is installed, you should see something similar to:

$ modinfo ipu_driver
filename:       /lib/modules/4.15.0-55-generic/updates/dkms/ipu_driver.ko
version:        1.0.39
description:    IPU PCI Driver
author:         Graphcore Limited
license:        GPL
srcversion:     49FFB7D8556EB58899AE41A
alias:          pci:v00001D95d00000003sv*sd*bc*sc*i*
alias:          pci:v00001D95d00000002sv*sd*bc*sc*i*
alias:          pci:v00001D95d00000001sv*sd*bc*sc*i*
depends:
retpoline:      Y
name:           ipu_driver
vermagic:       4.15.0-55-generic SMP mod_unload
parm:           memmap_start:array of ulong
parm:           memmap_size:array of ulong

If so, proceed to the next section. If it returns an error along the lines of:

$ modinfo ipu_driver
modinfo: ERROR: Module ipu_driver not found.

You will need to install the driver. See the Getting Started Guide for your IPU system for more information.
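Note that modinfo reports whether a module is installed for the running kernel, not whether it is currently loaded. The following is a minimal sketch that checks both, assuming a standard Linux system with modinfo and /proc/modules available. The function name module_status is a hypothetical helper, not part of the Poplar SDK.

```shell
# module_status MODULE
# Prints "not-installed", "installed-not-loaded" or "loaded".
# (Hypothetical helper for illustration; not part of the Poplar SDK.)
module_status() {
    module=$1
    # modinfo fails if the module is not installed for the running kernel.
    if ! modinfo "$module" >/dev/null 2>&1; then
        echo "not-installed"
        return 1
    fi
    # /proc/modules lists loaded modules, one per line, with the name first.
    if grep -q "^${module} " /proc/modules 2>/dev/null; then
        echo "loaded"
    else
        echo "installed-not-loaded"
        return 2
    fi
}
```

For example, running module_status ipu_driver on the host should print "loaded" when the driver is ready for use.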

3. Using gc-docker

The Graphcore Poplar SDK includes some command line tools for managing the IPU system.

The gc-docker command is a small wrapper around docker run that adds the flags needed to use a set of IPU devices inside a running container.

If gc-docker is not on your path, go to the Poplar installation directory and enable the command-line tools:

$ cd [poplar-installation-path]
$ source enable.sh

This must be done in each shell. Alternatively, you can run the following command to automatically source it in all new Bash login shells:

$ echo 'source [full-path-to-extracted-poplar]/enable.sh' >> ~/.bash_profile
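If you script your setup, you may want to source enable.sh only when the tool is missing. The sketch below illustrates one way to do that. The function name ensure_tool is hypothetical, and it assumes that enable.sh can be sourced from inside a shell function, which may not hold for every SDK version or shell.

```shell
# ensure_tool TOOL ENABLE_SCRIPT
# Succeeds if TOOL is on the path, sourcing ENABLE_SCRIPT first if needed.
# (Hypothetical helper for illustration; not part of the Poplar SDK.)
ensure_tool() {
    tool=$1
    enable_script=$2
    # Nothing to do if the tool can already be found.
    if command -v "$tool" >/dev/null 2>&1; then
        return 0
    fi
    if [ -f "$enable_script" ]; then
        # Source in the current shell so that changes to PATH persist.
        . "$enable_script"
    fi
    # Report whether the tool is available now.
    command -v "$tool" >/dev/null 2>&1
}
```

A call might look like ensure_tool gc-docker [poplar-installation-path]/enable.sh, with the placeholder replaced by your installation directory.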

3.1. Loading Docker images

First, download the Poplar image bundle from the Graphcore Downloads portal.

Then load the bundle into your local Docker daemon:

$ docker load --input=poplar-docker-images-1.3.0.tar.gz

Check the images have loaded and had tags applied. For example (output trimmed):

$ docker images
REPOSITORY            TAG              IMAGE ID
graphcore/tools       1.3.0            c2f5ebc91d4b
graphcore/tensorflow  1                6175c27cb631
graphcore/tensorflow  1-amd            6175c27cb631
graphcore/tensorflow  2                ae99a3fd3181
graphcore/tensorflow  2-amd            ae99a3fd3181
graphcore/tensorflow  1-intel          db5fef31303d
graphcore/tensorflow  2-intel          cb3b3a41321e
graphcore/pytorch     1.3.0            d84478558ab0
graphcore/poplar      1.3.0            c744278a89b2
ubuntu                bionic-20200903  c14bccfdea1c

graphcore/tools: contains tools to interact with IPU devices.
graphcore/poplar: contains Poplar and PopART.
graphcore/tensorflow: is based on graphcore/poplar, with TensorFlow installed on top. The tags 1 and 2 select TensorFlow 1 or TensorFlow 2. The AMD-optimised builds are the default; they can also be selected explicitly with the 1-amd and 2-amd tags. Builds that use Intel-specific instructions are available with the 1-intel and 2-intel tags.
graphcore/pytorch: is based on graphcore/poplar, with PyTorch and PopTorch installed.

Note

This tarball method of container image delivery will be replaced with a Docker registry in future, which will enable docker pull to be used instead.
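The check that the images have loaded can also be scripted. The sketch below compares a list of loaded images against the names you expect. The function name missing_images is hypothetical; the docker images --format option used in the usage comment is a standard Docker CLI feature.

```shell
# missing_images LOADED EXPECTED...
# LOADED is a newline-separated list of repository:tag names.
# Prints every EXPECTED name that is absent and fails if any is missing.
# (Hypothetical helper for illustration.)
missing_images() {
    loaded=$1
    shift
    status=0
    for image in "$@"; do
        # -x matches whole lines only, -F treats the name as a fixed string.
        if ! printf '%s\n' "$loaded" | grep -qxF "$image"; then
            echo "$image"
            status=1
        fi
    done
    return $status
}

# Usage (requires a running Docker daemon):
#   loaded=$(docker images --format '{{.Repository}}:{{.Tag}}')
#   missing_images "$loaded" graphcore/tools:1.3.0 graphcore/poplar:1.3.0
```

The function prints nothing when every expected image is present.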

3.2. Verifying IPU access from inside a container

First check you have access to the IPU devices on the host. To do this, run gc-info -l and check the output contains a list of devices.

Next, do the same but inside the context of a container:

$ gc-docker -- --rm -ti graphcore/tools gc-info -l

The output should be the same.
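The comparison of host and container output can be automated. The following is a minimal sketch, where outputs_match is a hypothetical helper name. The usage comment drops the -ti options because the output is captured rather than shown on a terminal; it assumes that gc-info does not need a terminal.

```shell
# outputs_match CMD_A CMD_B
# Runs both command strings and prints "match" if their standard output
# is identical, otherwise prints "differ" and fails.
# (Hypothetical helper for illustration.)
outputs_match() {
    out_a=$(sh -c "$1") || return 2
    out_b=$(sh -c "$2") || return 2
    if [ "$out_a" = "$out_b" ]; then
        echo "match"
    else
        echo "differ"
        return 1
    fi
}

# Usage (requires IPU devices and the loaded images):
#   outputs_match 'gc-info -l' 'gc-docker -- --rm graphcore/tools gc-info -l'
```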

Check you can run a TensorFlow container with gc-docker, and make sure the IPUs are visible to TensorFlow:

$ gc-docker -- --rm -ti graphcore/tensorflow:2 python3
Python 3.6.9 (default, Nov  7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
>>> tensorflow.config.list_physical_devices("IPU")
[PhysicalDevice(name='/physical_device:IPU:0', device_type='IPU')]
>>>

The syntax for running an image with gc-docker is similar to using docker run, which is:

$ docker run [OPTIONS] IMAGE [COMMAND] [ARG...]

The main difference is that docker run is replaced with gc-docker --. So, in the TensorFlow example above, we used the graphcore/tensorflow:2 image and ran python3 as the command. No arguments were passed to python3.

The -- part of this command tells gc-docker that the rest of the arguments should be passed directly to docker run. gc-docker also has a few options which can be used before this. For example, you can pass a subset of IPU devices using --device-id n:

$ gc-docker --device-id 4 -- --rm -ti graphcore/tools gc-info -l

Note that device IDs in the container always start from zero, so if you select a subset of devices they are renumbered from 0. For example, if you use devices 4 to 7, they will have IDs 0 to 3 in the container.
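The renumbering can be illustrated with a few lines of shell. This sketch only models the mapping described above; it assumes the selected devices are renumbered in the order given, and container_ids is a hypothetical name.

```shell
# container_ids HOST_ID...
# Prints the container ID that each selected host device ID would get,
# assuming devices are renumbered from 0 in the order given.
# (Hypothetical helper for illustration; it does not query any device.)
container_ids() {
    index=0
    for host_id in "$@"; do
        echo "host $host_id -> container $index"
        index=$((index + 1))
    done
}

container_ids 4 5 6 7
# host 4 -> container 0
# host 5 -> container 1
# host 6 -> container 2
# host 7 -> container 3
```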

The --echo option is also useful. This makes gc-docker print the Docker command it would have run. For example:

$ gc-docker --echo --device-id 4 -- --rm -ti graphcore/tools gc-info -l
docker run --device=/dev/ipu4:/dev/ipu4 --device=/dev/ipu4_ex:/dev/ipu4_ex -ti graphcore/tools gc-info -l

For more information, use the --help option or refer to the IPU Command Line Tools document.
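The --echo output above suggests that each selected IPU is exposed through two device nodes. The sketch below reproduces that pattern for a given ID. It is inferred only from the example output shown above, so the flags generated by your version of gc-docker may differ; device_flags is a hypothetical name.

```shell
# device_flags ID
# Prints the --device flags that the --echo example shows for one IPU.
# (Hypothetical helper, inferred from the example output only.)
device_flags() {
    id=$1
    printf -- '--device=/dev/ipu%s:/dev/ipu%s --device=/dev/ipu%s_ex:/dev/ipu%s_ex\n' \
        "$id" "$id" "$id" "$id"
}

device_flags 4
# --device=/dev/ipu4:/dev/ipu4 --device=/dev/ipu4_ex:/dev/ipu4_ex
```

In practice, prefer the output of gc-docker --echo over a hand-built list of flags.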

3.3. Mounting directories from the host

You can mount volumes to share data between the host machine and the Docker container environment. This is useful for cases where you need to read data to be processed or to output results.

Volumes are mounted using the -v option. The basic syntax is -v <path_on_host>:<path_in_container>. For example, to mount /home/me/cat_pics from your host machine as /cats in the container, you could run the following command:

$ gc-docker -- -ti -v /home/me/cat_pics:/cats graphcore/tensorflow:2 ls -a /cats
.  ..  mog.jpg
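When building the -v argument in a script, it can help to confirm that the host directory exists and to resolve it to an absolute path first. The following is a minimal sketch, with volume_flag as a hypothetical helper name. The optional third argument uses Docker's standard mode suffix, for example ro for a read-only mount.

```shell
# volume_flag HOST_DIR CONTAINER_DIR [MODE]
# Prints a value for the -v option, with HOST_DIR resolved to an
# absolute path. Fails if HOST_DIR is not an existing directory.
# (Hypothetical helper for illustration.)
volume_flag() {
    host_dir=$1
    container_dir=$2
    mode=${3:-}
    if [ ! -d "$host_dir" ]; then
        echo "no such directory: $host_dir" >&2
        return 1
    fi
    # Resolve relative paths such as ./cat_pics to an absolute path.
    abs_dir=$(cd "$host_dir" && pwd)
    if [ -n "$mode" ]; then
        echo "$abs_dir:$container_dir:$mode"
    else
        echo "$abs_dir:$container_dir"
    fi
}

# Usage:
#   gc-docker -- -ti -v "$(volume_flag ./cat_pics /cats ro)" graphcore/tensorflow:2 ls -a /cats
```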

3.4. Setting environment variables

If you need some environment variables set inside the Docker environment, add -e VAR_NAME="var value" to your Docker options.

For example:

$ gc-docker -- -ti -e POPLAR_LOG_LEVEL=TRACE graphcore/tensorflow:2 python3

4. Running a TensorFlow application on an IPU

To demonstrate the workflow for running a TensorFlow application on IPUs in a Docker development environment, we will use one of the TensorFlow applications from the Graphcore public examples repository. First, get the code:

$ git clone https://github.com/graphcore/examples.git
$ cd examples

A common pattern when working with a Docker-based development environment is to mount the current directory into the container (as described in Mounting directories from the host), then set the working directory inside the container with -w <dir name>. For example, -v "$(pwd):/app" -w /app.

Applying this, you can run the LSTM example with the following command:

$ gc-docker -- -ti -v "$(pwd):/app" -w /app graphcore/tensorflow:1 python3 code_examples/tensorflow/kernel_benchmarks/lstm.py

Many machine learning applications make heavy use of shared memory. To avoid running out of shared memory inside the container, it is recommended to add the following Docker option: --ipc=host.
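The options discussed in this guide can be combined into one command line. The sketch below only prints the command so that you can inspect it before running it. The function name dev_command is hypothetical, and the sketch assumes that the project path and arguments contain no spaces.

```shell
# dev_command PROJECT_DIR IMAGE COMMAND...
# Prints (does not run) a gc-docker command that mounts PROJECT_DIR as
# /app, uses it as the working directory and shares the host IPC namespace.
# (Hypothetical helper; assumes no argument contains spaces.)
dev_command() {
    project_dir=$1
    image=$2
    shift 2
    printf 'gc-docker -- --rm -ti --ipc=host -v %s:/app -w /app %s' \
        "$project_dir" "$image"
    # Append the command to run in the container and its arguments.
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}

# Usage, from the root of the examples repository:
#   dev_command "$(pwd)" graphcore/tensorflow:1 python3 code_examples/tensorflow/kernel_benchmarks/lstm.py
```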

5. Extending the images

These base images can be used to create new images for more specialised purposes, or to package an application for deployment to platforms such as Kubernetes or Kubeflow.

As an example, here is a simple Dockerfile that creates a Jupyter notebook environment with TensorFlow and access to IPUs:

FROM graphcore/tensorflow:2

RUN pip3 install notebook

CMD ["jupyter", "notebook", "--allow-root", "--ip=0.0.0.0", "--port=8080"]

You can build and run this with the following commands:

$ docker build -t notebook .
$ gc-docker -- -p 8080:8080 notebook

6. Further reading

You can find documentation for the Graphcore software products on the Developer page of the Graphcore website.