1. Introduction

This guide explains how you can run applications in Docker on a Linux machine using one or more physical IPU devices.

You will need a machine with access to IPU devices and Docker installed.

To use the Graphcore tools outside of a Docker container, you will need to be running one of the supported operating systems. Refer to the Poplar SDK 3.4.0 Release Notes for a list of the supported systems.

2. Using gc-docker

Graphcore provides command line tools for managing IPU systems. One of these is gc-docker. This is a small wrapper for the command docker run which adds the flags necessary to use a set of IPU devices inside a running container.

For more information on gc-docker see the Command Line Tools User Guide.

The tools are available as a standalone package (called “Command Line Tools”) from the Graphcore software download portal and are also included with the Poplar SDK.

To set up your environment to use the tools, you will need to source the enable.sh script provided:

$ source enable.sh
  • In the standalone command line tools package, this script is in the top-level directory of the installed package.

  • For the Poplar SDK, the script is in the poplar-OS-VERSION directory of the installed SDK (where OS is the operating system you are using and VERSION is the version of the Poplar SDK).

To avoid having to source the enable script in every new shell, you can add the command to your .bash_profile, .zshenv or equivalent file.

For more information, see the Graphcore Command Line Tools documentation.

See the Getting Started Guide for your IPU system for more information on how to install the Poplar SDK.

3. Using Docker images

Licensed Software

This software is made available under the terms of the Graphcore Container License Agreement. Please ensure you have read and accept the terms of the license before using the software.

The Poplar container images can be pulled from Docker Hub. All Graphcore container images are based on specific operating system Docker Official Images.

The default base image (from Poplar SDK 3.1 onwards) is Ubuntu 20.04.

The Graphcore repositories may contain multiple images. These images can be selected using the tags described in Section 3.1, Image tags. An overview of the components of these repositories is shown in Fig. 3.1.

  • graphcore/pytorch-minimal: The Poplar SDK components[1] required to run PyTorch on the IPU. Also includes PopART and the Poplar Triton Backend[2]. Designed for deployment.

  • graphcore/pytorch: The Poplar SDK components[1] required to run PyTorch on the IPU. Also includes PopART, Jupyter, Jupyter Lab and the Poplar Triton Backend[2]. Designed for development.

  • graphcore/pytorch-geometric: The Poplar SDK components[1] required to run PyTorch and PyTorch Geometric on the IPU. Also includes PopART, Jupyter Lab and the Poplar Triton Backend[2]. Designed for development.

  • graphcore/tensorflow-minimal: The Poplar SDK components[1] required to run TensorFlow on the IPU. Also includes Poplar[1] and the Poplar Triton Backend[2]. Designed for deployment.

  • graphcore/tensorflow: The Poplar SDK components[1] required to run TensorFlow on the IPU. Also includes TensorFlow Serving[3] (TensorFlow 2 images only), Jupyter, Jupyter Lab and the Poplar Triton Backend[2]. Designed for development.

  • graphcore/poplar-minimal: The Poplar SDK components[1] required to run Poplar programs. Also includes the Poplar Triton Backend[2]. Designed for deployment.

  • graphcore/poplar: The Poplar SDK components[1] required to run Poplar programs. Also includes Jupyter, Jupyter Lab and the Poplar Triton Backend[2]. Designed for development.

  • graphcore/popart-minimal: The Poplar SDK components[1] required to run PopART. Also includes the Poplar Triton Backend[2]. Designed for deployment.

  • graphcore/popart: The Poplar SDK components[1] required to run PopART. Also includes Jupyter, Jupyter Lab and the Poplar Triton Backend[2]. Designed for development.

Note

[1] The Poplar SDK components included in all images are those components found in the poplar subdirectory of the Poplar SDK tarball. These include the Poplar graph programming framework, PopLibs, Model Runtime, PopEF, PopRun and PopDist, libpvti, libpva and supporting libraries. The components of the Poplar SDK are described in the Poplar SDK Overview document.

Note

[2] The Poplar Triton backend is included in the Poplar, PopART, PyTorch, PyTorch Geometric and TensorFlow images. It can be found at /opt/triton/lib/libtriton_poplar.so and the libraries it depends on are in /opt/triton/lib/lib.

Note

[3] The TensorFlow Serving binary is included in the TensorFlow 2 images. It can be found at /opt/tfserving2/tensorflow_model_server.

digraph {
  rankdir = TB
  node [shape=record style="filled,rounded" fillcolor=cornsilk]
  labeldistance = 1.2
    subgraph cluster_legend {
      label = "Legend" ;
      shape = rectangle ;
      color = black ;
      a ;
      b ;
      a -> b [labelfloat=true nojustify=false label="builds on"] ;
    }
    "poplar-minimal" [nojustify=false label="{poplar-minimal | Poplar SDK components to run Poplar programs\lPoplar Triton Backend}"];
    "popart-minimal" [nojustify=false label="{popart-minimal | Python\lPoplar SDK components to run PopART on the IPU}"];
    "popart-minimal" -> "poplar-minimal";
    "poplar" [nojustify=false label="{poplar | Python\lJupyter, Jupyter Lab}"];
    "poplar" -> "poplar-minimal";
    "tensorflow-minimal" [nojustify=false label="{tensorflow-minimal | Python}"];
    "tensorflow-minimal" -> "poplar-minimal" [label="amd"];
    "tensorflow-minimal" [nojustify=false label="{tensorflow-minimal | Python}"];
    "tensorflow-minimal" -> "poplar-minimal" [label="intel"];
    "popart" [nojustify=false label="{popart | Jupyter, Jupyter Lab}"];
    "popart" -> "popart-minimal";
    "pytorch-minimal" [nojustify=false label="{pytorch-minimal | Poplar SDK components to run PyTorch on the IPU}"];
    "pytorch-minimal" -> "popart-minimal";
    "tensorflow" [nojustify=false label="{tensorflow | Jupyter, Jupyter Lab\lTensorFlow Serving (TensorFlow 2 only)\lPython}"];
    "tensorflow" -> "tensorflow-minimal" [label="amd"];
    "tensorflow" [nojustify=false label="{tensorflow | Jupyter, Jupyter Lab\lTensorFlow Serving (TensorFlow 2 only)}"];
    "tensorflow" -> "tensorflow-minimal" [label="intel"];
    "pytorch" [nojustify=false label="{pytorch | Jupyter, Jupyter Lab}"];
    "pytorch" -> "pytorch-minimal";
    "pytorch-geometric" [nojustify=false label="{pytorch-geometric | Poplar SDK components to run PyTorch Geometric on the IPU}"];
    "pytorch-geometric" -> "pytorch";
  }

Fig. 3.1 Dependency graph for repositories

The IPU Operator repositories for using IPUs in Kubernetes are:

3.1. Image tags

The specific content and version of an image are denoted using image tags. Multiple tags are used for the same image to permit varying levels of specificity. Examples of tags for each image are given in Table 3.1.

Table 3.1 Docker container tag formats and examples

| Container image name | Tag format | Example |
|----------------------|------------|---------|
| pytorch | <sdk-version>-<os-distro>-<os-version>-<build-date> | pytorch:3.3.0-ubuntu-20.04-20230619 |
| pytorch-geometric | <sdk-version>-<os-distro>-<os-version>-<build-date> | pytorch-geometric:3.3.0-ubuntu-20.04-20230619 |
| tensorflow | <tensorflow-version>-<arch>-<sdk-version>-<os-distro>-<os-version>-<build-date> | tensorflow:2-intel-3.3.0-ubuntu-20.04-20230619 |
| poplar | <sdk-version>-<os-distro>-<os-version>-<build-date> | poplar:3.3.0-ubuntu-20.04-20230619 |
| popart | <sdk-version>-<os-distro>-<os-version>-<build-date> | popart:3.3.0-ubuntu-20.04-20230619 |

For the tags used in versions of the Poplar SDK prior to 2.5, refer to the appendix Older tag versions.

3.1.1. General format

The general tag format indicates the Poplar SDK release, base OS image used and the date the image was built by Graphcore: <sdk-version>-<os-distro>-<os-version>-<build-date>. The <build-date> is in the form yyyymmdd. For example, 3.1.0-ubuntu-20.04-20221219 indicates version 3.1.0 of the Poplar SDK, based on an official Ubuntu 20.04 image, built on 19th December 2022.

The most recent build using a specific OS base image is referenced by <sdk-version>-<os-distro>-<os-version>, for example 3.1.0-ubuntu-20.04. The most recent build using the default base OS image is referenced by <sdk-version>, for example 3.1.0.

The tag latest always refers to the most recent Poplar SDK release using the default OS base image.
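As an illustration of the general format, the following Python sketch splits a tag into its components. The names TAG_PATTERN, ImageTag and parse_general_tag are hypothetical helpers written for this guide and are not part of any Graphcore tool. The sketch assumes a three-part SDK version and an eight-digit build date, as in the examples above.

```python
import re
from typing import NamedTuple, Optional

# <sdk-version>[-<os-distro>-<os-version>[-<build-date>]]
# The character classes below are assumptions based on the example tags.
TAG_PATTERN = re.compile(
    r"^(?P<sdk>\d+\.\d+\.\d+)"
    r"(?:-(?P<distro>[a-z]+)-(?P<os_version>[\d.]+)"
    r"(?:-(?P<build_date>\d{8}))?)?$"
)


class ImageTag(NamedTuple):
    sdk_version: str
    os_distro: Optional[str]   # None means "default base OS image"
    os_version: Optional[str]
    build_date: Optional[str]  # None means "most recent build"


def parse_general_tag(tag: str) -> ImageTag:
    """Split a general-format tag into its components."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a general-format tag: {tag!r}")
    return ImageTag(
        match["sdk"], match["distro"], match["os_version"], match["build_date"]
    )


print(parse_general_tag("3.1.0-ubuntu-20.04-20221219"))
print(parse_general_tag("3.1.0"))
```

Components that are omitted from the tag are returned as None, which mirrors the "most recent build" and "default base OS image" meanings described above.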

Note

Graphcore may update images following their initial publication. To always obtain the latest image, we recommend you do not include the <build-date> in the tag. This will ensure that any updates, such as those to address CVEs, will be picked up. If you wish to pin a specific version for your builds, we recommend using the SHA256 of the image.
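Docker accepts image references of the form <repository>@sha256:<digest>, which is one way to pin an image by its SHA256. The sketch below builds such a reference and checks that the digest looks plausible. The function name pinned_reference is hypothetical, and the digest used in the usage line is a placeholder, not the digest of a real Graphcore image.

```python
import re


def pinned_reference(repository: str, digest: str) -> str:
    """Build a digest-pinned image reference: <repository>@sha256:<digest>."""
    # A SHA256 digest is 64 lowercase hexadecimal characters.
    if not re.fullmatch(r"[0-9a-f]{64}", digest):
        raise ValueError("digest must be 64 lowercase hexadecimal characters")
    return f"{repository}@sha256:{digest}"


# Placeholder digest for illustration only; look up the real value for the
# image you have pulled (for example with "docker images --digests").
placeholder_digest = "0" * 64
print(pinned_reference("graphcore/pytorch", placeholder_digest))
```

The resulting string can be used wherever an image name is expected, for example in docker pull or in the FROM line of a Dockerfile.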

3.1.2. TensorFlow tags

We support TensorFlow 1 and TensorFlow 2. There are versions of each of these compiled for Intel and AMD processors in order to provide the best performance on those hosts. As a result, there are four TensorFlow images.

Note

TensorFlow 1 images are only available up to and including Poplar SDK 3.0.

The full TensorFlow image tag has a prefix of the form <tensorflow-version>-<arch>, indicating the version of TensorFlow and the processor architecture. The rest of the tag indicates the Poplar SDK release, base OS image used and date of build, as described above.

The TensorFlow version is either 1 or 2. The default is TensorFlow 2. The processor architecture can be either amd or intel.

The most recent build for a specific version of TensorFlow and architecture is referenced by <tensorflow-version>-<arch>, for example 1-amd identifies TensorFlow 1 for AMD. Omitting <arch> implies AMD and so, for example, a tag of 1 identifies the latest TensorFlow 1 build for AMD.

The tag latest always refers to the latest TensorFlow 2 build for AMD.
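The defaulting rules for TensorFlow tags can be summarised in a few lines of Python. This is a sketch of the rules as stated in this section only: the function name describe_tensorflow_tag is hypothetical, and tags that begin with anything other than a TensorFlow version followed by an optional architecture are rejected because this guide does not describe them.

```python
from typing import Tuple


def describe_tensorflow_tag(tag: str) -> Tuple[int, str]:
    """Return (tensorflow_version, arch) implied by a TensorFlow image tag."""
    if tag == "latest":
        # "latest" always refers to the latest TensorFlow 2 build for AMD.
        return (2, "amd")
    parts = tag.split("-")
    version = int(parts[0])  # raises ValueError if not a number
    if version not in (1, 2):
        raise ValueError(f"unknown TensorFlow version in tag: {tag!r}")
    # Omitting <arch> implies AMD.
    arch = parts[1] if len(parts) > 1 else "amd"
    if arch not in ("amd", "intel"):
        raise ValueError(f"unknown architecture in tag: {tag!r}")
    return (version, arch)


for example in ("latest", "1", "1-amd", "2-intel-3.3.0-ubuntu-20.04-20230619"):
    print(example, "->", describe_tensorflow_tag(example))
```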

Note

You must use the correct image for your host CPU. You can use the command lscpu to determine the CPU type, if you are not sure.
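If you want to script the choice of architecture, one possible approach is to read the "Vendor ID" line printed by lscpu. The sketch below assumes an English-locale lscpu and the usual x86 vendor strings AuthenticAMD and GenuineIntel. The function names are hypothetical.

```python
import subprocess


def arch_from_lscpu(lscpu_output: str) -> str:
    """Map the 'Vendor ID' line of lscpu output to an image <arch> value."""
    for line in lscpu_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Vendor ID":
            vendor = value.strip()
            if vendor == "AuthenticAMD":
                return "amd"
            if vendor == "GenuineIntel":
                return "intel"
            raise ValueError(f"unrecognised CPU vendor: {vendor!r}")
    raise ValueError("no 'Vendor ID' line found in lscpu output")


def detect_arch() -> str:
    """Run lscpu on the host and return 'amd' or 'intel'."""
    result = subprocess.run(
        ["lscpu"], capture_output=True, text=True, check=True
    )
    return arch_from_lscpu(result.stdout)


sample = "Architecture:  x86_64\nVendor ID:     AuthenticAMD\n"
print(arch_from_lscpu(sample))
```

The parsing is kept separate from the call to lscpu so that it can be checked against saved output without access to the target host.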

3.1.3. Examples

To pull the latest PyTorch image use:

$ docker pull graphcore/pytorch

To pull the latest TensorFlow 2 image optimised for AMD, use:

$ docker pull graphcore/tensorflow:2-amd

You can also specify the version of the Poplar SDK, for example:

$ docker pull graphcore/pytorch:3.1.0

Or fully specify the version of a TensorFlow image:

$ docker pull graphcore/tensorflow:2-amd-3.1.0-ubuntu-20.04-20221219

Note that the date here is just an example; you would need to check the specific tags that are actually available.

3.2. Mounting directories from the host

You can mount directories as volumes to share data between the host machine and the Docker container environment. This is useful for cases where you need to read data to be processed or to output results.

Volumes are mounted using the -v option. The basic syntax is -v <path_on_host>:<path_in_container>. For example, to mount /home/me/cat_pics from your host machine as /cats in the container, you could run the following command:

$ gc-docker -- -ti -v /home/me/cat_pics:/cats graphcore/tensorflow ls -a /cats
.  ..  mog.jpg

You can repeat the -v option to mount multiple volumes in this way.
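When a launch is scripted, the repeated -v options can be generated from a mapping of host paths to container paths. The following sketch only assembles the argument list and does not run anything. The function name gc_docker_with_volumes and the second mount in the usage example (/home/me/results) are hypothetical.

```python
import shlex
from typing import Dict, List, Sequence


def gc_docker_with_volumes(
    image: str, command: Sequence[str], volumes: Dict[str, str]
) -> List[str]:
    """Assemble a gc-docker command line with one -v option per volume."""
    argv = ["gc-docker", "--", "-ti"]
    for host_path, container_path in volumes.items():
        # Basic syntax: -v <path_on_host>:<path_in_container>
        argv += ["-v", f"{host_path}:{container_path}"]
    argv.append(image)
    argv.extend(command)
    return argv


argv = gc_docker_with_volumes(
    "graphcore/tensorflow",
    ["ls", "-a", "/cats"],
    {"/home/me/cat_pics": "/cats", "/home/me/results": "/results"},
)
print(shlex.join(argv))
```

The list returned by the function can be passed to subprocess.run on a machine where gc-docker is available.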

3.3. Setting environment variables

If you need some environment variables set inside the Docker environment, add -e VAR_NAME="var value" to your Docker options.

For example:

$ gc-docker -- -ti -e POPLAR_LOG_LEVEL=TRACE graphcore/tensorflow:2 python3

If you want an environment variable already set in your current environment to be set to the same value inside the Docker container, use the --pass-env option to gc-docker.

For example:

$ export POPLAR_LOG_LEVEL=TRACE
$ gc-docker --pass-env POPLAR_LOG_LEVEL -- -ti graphcore/tensorflow:2 python3

Note

gc-docker will automatically pass some configuration variables into the Docker environment. See Section 3.4.2, Using the configuration in a container.

3.4. Using IPU Pod partitions

Note

This document assumes you are using an IPU Pod. If you are using PCIe IPU devices in an IPU server, see Section 7, PCIe IPU devices for setup instructions.

You can create V-IPU partitions by following the instructions found in the V-IPU User Guide.

3.4.1. Configuration variables

Once you have created a partition, Poplar needs to know the network endpoints to use to access the IPU devices, the topology of the devices and the configuration state. This information can most easily be provided by using the following environment variables.

Table 3.2 IPUoF configuration variables

| Variable | Description |
|----------|-------------|
| IPUOF_VIPU_API_HOST | The IP address of the server running the V-IPU controller. Required. |
| IPUOF_VIPU_API_PORT | The port to connect to the V-IPU controller. Optional. The default is 8090. |
| IPUOF_VIPU_API_PARTITION_ID | The name of the partition to use. Required. |
| IPUOF_VIPU_API_GCD_ID | The ID of the GCD you want to use. Required for multi-GCD systems. |
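Before starting a container, it can be useful to confirm that the required variables are set. The sketch below encodes the "Required" and "Optional" notes from Table 3.2. The function names are hypothetical and the check is not something Poplar or gc-docker performs in this form.

```python
import os
from typing import List, Mapping

DEFAULT_VIPU_API_PORT = 8090  # default stated in Table 3.2


def missing_ipuof_variables(
    environ: Mapping[str, str], multi_gcd: bool = False
) -> List[str]:
    """Return the names of required IPUoF variables that are unset or empty."""
    required = ["IPUOF_VIPU_API_HOST", "IPUOF_VIPU_API_PARTITION_ID"]
    if multi_gcd:
        # The GCD ID is only required for multi-GCD systems.
        required.append("IPUOF_VIPU_API_GCD_ID")
    return [name for name in required if not environ.get(name)]


def vipu_api_port(environ: Mapping[str, str]) -> int:
    """Return the V-IPU controller port, falling back to the default."""
    return int(environ.get("IPUOF_VIPU_API_PORT", DEFAULT_VIPU_API_PORT))


missing = missing_ipuof_variables(os.environ)
if missing:
    print("Not set:", ", ".join(missing))
else:
    print("V-IPU controller port:", vipu_api_port(os.environ))
```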

3.4.2. Using the configuration in a container

This configuration information needs to be available inside the container. The relevant environment variables can be set manually as explained in Section 3.3, Setting environment variables.

However, gc-docker has some options to make this more convenient. By default, gc-docker adds Docker options to set IPUOF_VIPU_API_HOST, IPUOF_VIPU_API_PORT, IPUOF_VIPU_API_PARTITION_ID, and IPUOF_VIPU_API_GCD_ID within the container to the same values as on the host, so that

$ export IPUOF_VIPU_API_HOST=localhost
$ export IPUOF_VIPU_API_PARTITION_ID=my_partition
$ gc-docker -- -ti graphcore/tensorflow:2 python3

is equivalent to

$ gc-docker -- -e IPUOF_VIPU_API_HOST=localhost -e IPUOF_VIPU_API_PARTITION_ID=my_partition -ti graphcore/tensorflow:2 python3

If IPUOF_CONFIG_PATH is set, gc-docker adds Docker options to mount the specified path as a volume under /etc/ipuof.conf and set IPUOF_CONFIG_PATH within the container to /etc/ipuof.conf, so that

$ export IPUOF_CONFIG_PATH=~/partitions/my_partition.conf
$ gc-docker -- graphcore/tensorflow:2 python3

is equivalent to

$ gc-docker -- -e IPUOF_CONFIG_PATH=/etc/ipuof.conf -v ~/partitions/my_partition.conf:/etc/ipuof.conf graphcore/tensorflow:2 python3

To prevent these variables being automatically passed, use the --no-default-env argument to gc-docker.
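The default behaviour described in this section can be modelled as a function from the host environment to a list of Docker options. This is a sketch of the documented behaviour only, not the actual gc-docker implementation. The function name default_docker_options is hypothetical, and the sketch assumes that variables which are not set on the host are simply skipped.

```python
from typing import List, Mapping

PASSED_BY_DEFAULT = (
    "IPUOF_VIPU_API_HOST",
    "IPUOF_VIPU_API_PORT",
    "IPUOF_VIPU_API_PARTITION_ID",
    "IPUOF_VIPU_API_GCD_ID",
)
CONTAINER_CONFIG_PATH = "/etc/ipuof.conf"


def default_docker_options(environ: Mapping[str, str]) -> List[str]:
    """Model the Docker options that gc-docker is documented to add."""
    options: List[str] = []
    for name in PASSED_BY_DEFAULT:
        if name in environ:
            # Same value inside the container as on the host.
            options += ["-e", f"{name}={environ[name]}"]
    config_path = environ.get("IPUOF_CONFIG_PATH")
    if config_path:
        # The host file is mounted at a fixed path inside the container
        # and the variable is rewritten to point at that path.
        options += [
            "-e", f"IPUOF_CONFIG_PATH={CONTAINER_CONFIG_PATH}",
            "-v", f"{config_path}:{CONTAINER_CONFIG_PATH}",
        ]
    return options


host_env = {
    "IPUOF_VIPU_API_HOST": "localhost",
    "IPUOF_VIPU_API_PARTITION_ID": "my_partition",
}
print(default_docker_options(host_env))
```

To see the options that gc-docker really adds on your system, use the --echo option described in Section 3.5, Verifying IPU access from inside container.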

3.4.3. Configuration files

The information that Poplar needs to access devices can, in some cases, be passed via a configuration file.

Warning

IPUoF configuration files are provided as a convenience. However, they are not supported on all IPU systems. Also, because of the difficulties of maintaining and sharing configuration files, they are not the recommended way to manage access to IPUs.

Support for configuration files may be removed in future.

We strongly recommend the use of environment variables, as described in Section 3.4.1, Configuration variables.

By default when you create a partition, a configuration file is created in the directory $HOME/.ipuof.conf.d/.

3.5. Verifying IPU access from inside container

First check you have access to the IPU devices. To do this, run gc-info -l and check that the output contains a list of devices.

Next, do the same but inside the context of a container:

$ gc-docker -- --rm -ti graphcore/tools gc-info -l

The output should be the same.

Check you can run a TensorFlow container with gc-docker, and make sure the IPUs are visible to TensorFlow:

$ gc-docker -- --rm -ti graphcore/tensorflow:2 python3
Python 3.6.9 (default, Nov  7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
>>> tensorflow.config.list_physical_devices("IPU")
[PhysicalDevice(name='/physical_device:IPU:0', device_type='IPU')]
>>>

The syntax for running an image with gc-docker is similar to using docker run, which is:

$ docker run [OPTIONS] IMAGE [COMMAND] [ARG...]

The main difference is that docker run is replaced with gc-docker --. So, in the TensorFlow example above, we used the graphcore/tensorflow:2 image and ran python3 as the command. No arguments were passed to python3.

The -- part of this command tells gc-docker that the rest of the arguments should be passed directly to docker run.

The --echo option is also useful. This makes gc-docker print the Docker command it would have run. For example:

$ gc-docker --echo -- --rm -ti graphcore/tools gc-info -l
docker run --ulimit memlock=-1:-1 --net=host --cap-add=IPC_LOCK --device=/dev/infiniband/ --ipc=host --rm -ti graphcore/tools gc-info -l

For more information, use the --help option or refer to the Command Line Tools User Guide.
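The output of --echo is an ordinary shell command line, so it can be split and inspected from a script, for example to confirm that a particular Docker option will be passed. The sketch below works on the echoed text only. The function name docker_options_from_echo is hypothetical.

```python
import shlex
from typing import List


def docker_options_from_echo(echoed_command: str) -> List[str]:
    """Split the command printed by 'gc-docker --echo' into its arguments.

    Returns everything after the leading 'docker run'.
    """
    argv = shlex.split(echoed_command)
    if argv[:2] != ["docker", "run"]:
        raise ValueError("expected a command starting with 'docker run'")
    return argv[2:]


# Example output copied from the --echo example above.
echoed = (
    "docker run --ulimit memlock=-1:-1 --net=host --cap-add=IPC_LOCK "
    "--device=/dev/infiniband/ --ipc=host --rm -ti graphcore/tools gc-info -l"
)
options = docker_options_from_echo(echoed)
print("--ipc=host" in options)
```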

4. Running a TensorFlow 2 application on an IPU

To demonstrate the workflow for running a TensorFlow 2 application on IPUs in a Docker development environment, we will use one of the TensorFlow 2 applications from the Graphcore tutorials. First, get the code:

$ git clone https://github.com/graphcore/examples.git
$ cd examples/tutorials

A common pattern when working with a Docker-based development environment is to mount the current directory into the container (as described in Section 3.2, Mounting directories from the host), then set the working directory inside the container with -w <dir name>. For example, -v "$(pwd):/app" -w /app.

Applying this, you can run the MNIST example in either of two ways.

First, by setting the environment variables:

$ export IPUOF_VIPU_API_HOST=localhost
$ export IPUOF_VIPU_API_PARTITION_ID=my_partition
$ gc-docker -- -ti -v "$(pwd):/app" -w /app graphcore/tensorflow:2 python3 simple_applications/tensorflow2/mnist/mnist.py

Second, by copying the configuration file to the working directory and setting IPUOF_CONFIG_PATH to point to it:

$ cp -r /etc/ipuof.conf.d .
$ export IPUOF_CONFIG_PATH=$(pwd)/ipuof.conf.d/my_partition.conf
$ gc-docker -- -ti -v "$(pwd):/app" -w /app graphcore/tensorflow:2 python3 simple_applications/tensorflow2/mnist/mnist.py

Note

Because many machine learning applications make heavy use of shared memory, we recommend that you add the Docker option --ipc=host to the command line.

5. Extending the images

These base images can be used to create new images for more specialised purposes, or to package an application for deployment to platforms such as Kubernetes or Kubeflow.

As an example, here’s a simple Dockerfile that creates a Jupyter notebook environment with TensorFlow 2 and access to IPUs:

FROM graphcore/tensorflow:2

RUN pip3 install notebook

CMD ["jupyter", "notebook", "--allow-root", "--ip=0.0.0.0", "--port=8080"]

You can build and run this with the following commands:

$ docker build -t notebook .
$ gc-docker -- -p 8080:8080 notebook

6. Sharing IPU devices between containers

Running multiple containers simultaneously that are configured to use the same IPU devices requires some additional handling:

  • When run inside a container, tools such as gc-monitor (which lists processes running on IPU devices) are not able to observe processes running on the host machine or in other containers. This could cause IPUs to appear free even when other containers are running workloads on them.

  • Multiple containers can be started with the same partition config, but IPU devices will not be locked for exclusive access until an application using IPU devices is started in the container. For this reason it is recommended to immediately lock devices when the container starts, or avoid sharing IPUs between simultaneously running containers.

For these reasons, it is recommended to avoid sharing between containers and to instead use a higher level scheduler to allocate separate IPU devices to each container.

7. PCIe IPU devices

When the Graphcore IPUs are directly attached to the host you will need to install a PCIe driver. This enables the host to communicate with the IPU hardware. This section describes how to confirm whether you have a PCIe device attached, whether you have the IPU driver installed and how to install the driver if it is not installed.

Note

This is not relevant if you are using IPU Pod systems or cloud-based IPU systems.

7.1. Check if IPU hardware is present

To check whether the Graphcore hardware is present on the host, run the following command:

$ lspci | grep -i Graphcore

You should see a list of devices similar to this one:

1a:00.0 Processing accelerators: Graphcore Ltd Device 0001

Note

If you do not see Graphcore hardware (the list is empty), you might be using an IPU Pod system. For guidance on how to configure and use an IPU Pod system, see the Getting Started Guide for your system.

7.2. Check if the IPU device driver is installed

To check if your machine has the IPU device driver installed, run:

$ modinfo ipu_driver

If the driver is installed, you should see something similar to:

$ modinfo ipu_driver
filename:       /lib/modules/4.15.0-55-generic/updates/dkms/ipu_driver.ko
version:        1.0.39
description:    IPU PCI Driver
author:         Graphcore Limited
license:        GPL
srcversion:     49FFB7D8556EB58899AE41A
alias:          pci:v00001D95d00000003sv*sd*bc*sc*i*
alias:          pci:v00001D95d00000002sv*sd*bc*sc*i*
alias:          pci:v00001D95d00000001sv*sd*bc*sc*i*
depends:
retpoline:      Y
name:           ipu_driver
vermagic:       4.15.0-55-generic SMP mod_unload
parm:           memmap_start:array of ulong
parm:           memmap_size:array of ulong

In this case, you can continue with the set up in Section 7.2.1, Docker setup.

If you get an error along the lines of:

$ modinfo ipu_driver
modinfo: ERROR: Module ipu_driver not found.

you will need to install the driver (Section 7.2.2, Installing the PCIe driver).
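The same check can be scripted. The sketch below runs modinfo, treats a non-zero exit status (or a missing modinfo binary) as "driver not installed", and otherwise reads the version field from the output. The function names are hypothetical and the field layout is assumed to match the example output above.

```python
import subprocess
from typing import Dict, List, Optional


def parse_modinfo(output: str) -> Dict[str, List[str]]:
    """Parse 'key: value' lines from modinfo; repeated keys are collected."""
    fields: Dict[str, List[str]] = {}
    for line in output.splitlines():
        key, separator, value = line.partition(":")
        if separator:
            fields.setdefault(key.strip(), []).append(value.strip())
    return fields


def ipu_driver_version() -> Optional[str]:
    """Return the installed ipu_driver version, or None if it is not found."""
    try:
        result = subprocess.run(
            ["modinfo", "ipu_driver"], capture_output=True, text=True
        )
    except FileNotFoundError:
        return None  # modinfo itself is not available
    if result.returncode != 0:
        return None  # for example "Module ipu_driver not found."
    versions = parse_modinfo(result.stdout).get("version")
    return versions[0] if versions else None


sample = (
    "version:        1.0.39\n"
    "alias:          pci:v00001D95d00000003sv*sd*bc*sc*i*\n"
    "alias:          pci:v00001D95d00000002sv*sd*bc*sc*i*\n"
)
print(parse_modinfo(sample)["version"])
```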

7.2.1. Docker setup

Container use via gc-docker is similar to IPU Pods with the difference being that IPUoF configuration information does not need to be passed in:

$ gc-docker -- --rm -ti graphcore/tools gc-info -l

gc-docker also has a few options of its own, which are given before the -- separator. For example, you can pass a subset of IPU devices using --device-id n:

$ gc-docker --device-id 4 -- --rm -ti graphcore/tools gc-info -l

Note

The device IDs in the container always start from zero. So if you select a subset of devices, they will be numbered from 0. For example, if you use devices 4 to 7, they will have IDs 0 to 3 in the container.
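The renumbering can be pictured as a simple mapping from host device IDs to container device IDs. The sketch below assumes that the selected devices keep their relative order, which is consistent with the example in the note but is not stated explicitly in this guide. The function name container_device_ids is hypothetical.

```python
from typing import Dict, Iterable


def container_device_ids(host_device_ids: Iterable[int]) -> Dict[int, int]:
    """Map each selected host device ID to its ID inside the container.

    Assumption: devices are renumbered from zero in ascending host-ID order.
    """
    ordered = sorted(set(host_device_ids))
    return {host_id: index for index, host_id in enumerate(ordered)}


print(container_device_ids([4, 5, 6, 7]))
```

Keeping such a mapping alongside your launch scripts can make it easier to relate device IDs reported inside a container to the IDs reported by gc-info on the host.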

7.2.2. Installing the PCIe driver

The IPU driver is included with the Poplar SDK. See the Getting Started Guide for your IPU system for more information on how to install the SDK.

Instructions for installing the IPU driver for your system are included in the README that can be found in the following location:

[path_to_SDK]/gc_kernel-module-[os]_[os_ver]-[driver_ver]+[build]/README

where [path_to_SDK] is the location of the Poplar SDK on your system, [os] is the OS on your system, [os_ver] is the OS version, [driver_ver] is the version number of the IPU driver and [build] is the build information.

8. Further reading

Full documentation for the Poplar software is available on the Graphcore documentation portal. More information can be found on the Graphcore developer pages.

For an overview of the Poplar SDK and development tools, see the SDK Overview.

The IPU Programmer’s Guide provides an introduction to the IPU architecture, programming model and tools available.

If you are interested in running TensorFlow on the IPU, there are user guides and API references for the IPU implementation of TensorFlow 1 and TensorFlow 2.

You can also run PyTorch on the IPU.

Graphcore also has a GitHub repository with further examples:

  • https://github.com/graphcore/examples:

    • Tutorials

    • Examples of using Poplar and IPU features

    • Examples of simple models

    • Source code from videos, blogs and other documents

    • Benchmarks for performance testing of layer types on your IPU system

    • TensorFlow and PyTorch examples of commonly used machine learning models for training and inference

You can use the tag “ipu” when asking questions or looking for answers on StackOverflow.

Support is available from the Graphcore customer engineering team via the Graphcore support portal.

For general help, discussions and announcements, please join our Graphcore Slack Community.