1. Introduction
This guide explains how you can run applications in Docker on a Linux machine using one or more physical IPU devices.
You will need a machine with access to IPU devices and Docker installed.
To use the Graphcore tools outside of a Docker container, you will need to be running one of the supported operating systems. Refer to the Poplar SDK 3.4.0 Release Notes for a list of the supported systems.
2. Using gc-docker
Graphcore provides command line tools for managing IPU systems. One of these is gc-docker. This is a small wrapper for the command docker run which adds the flags necessary to use a set of IPU devices inside a running container. For more information on gc-docker, see the Command Line Tools User Guide.
The tools are available as a standalone package (called “Command Line Tools”) from the Graphcore software download portal and are also included with the Poplar SDK.
To set up your environment to use the tools, you will need to source the enable.sh script provided:
$ source enable.sh
In the standalone command line tools package, this script is in the top-level directory of the installed package.
For the Poplar SDK, the script is in the poplar-OS-VERSION directory of the installed SDK (where OS is the operating system you are using and VERSION is the version of the Poplar SDK).
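For example, to enable the tools from an SDK installed for Ubuntu 20.04 (the path below is illustrative; substitute the location, OS and version of your own installation):
$ source /path/to/poplar_sdk/poplar-ubuntu_20_04-3.4.0/enable.sh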
To avoid having to source the enable script in every new shell, you can add the command to your .bash_profile, .zshenv or equivalent file.
For more information see the Graphcore Command Line Tools.
See the Getting Started Guide for your IPU system for more information on how to install the Poplar SDK.
3. Using Docker images
Licensed Software
This software is made available under the terms of the Graphcore Container License Agreement. Please ensure you have read and accept the terms of the license before using the software.
The Poplar container images can be pulled from Docker Hub. All Graphcore container images are based on specific operating system Docker Official Images.
The default base image (from Poplar SDK 3.1 onwards) is Ubuntu 20.04.
The Graphcore repositories may contain multiple images. These images can be selected using the tags described in Section 3.1, Image tags. An overview of the components of these repositories is shown in Fig. 3.1.
graphcore/pytorch-minimal: The Poplar SDK components[1] required to run PyTorch on the IPU. Also includes PopART and the Poplar Triton Backend[2]. Designed for deployment.
graphcore/pytorch: The Poplar SDK components[1] required to run PyTorch on the IPU. Also includes PopART, Jupyter, Jupyter Lab and the Poplar Triton Backend[2]. Designed for development.
graphcore/pytorch-geometric: The Poplar SDK components[1] required to run PyTorch and PyTorch Geometric on the IPU. Also includes PopART, Jupyter Lab and the Poplar Triton Backend[2]. Designed for development.
graphcore/tensorflow-minimal: The Poplar SDK components[1] required to run TensorFlow on the IPU. Also includes Poplar[1] and the Poplar Triton Backend[2]. Designed for deployment.
graphcore/tensorflow: The Poplar SDK components[1] required to run TensorFlow on the IPU. Also includes TensorFlow Serving[3] (TensorFlow 2 images only), Jupyter, Jupyter Lab and the Poplar Triton Backend[2]. Designed for development.
graphcore/poplar-minimal: The Poplar SDK components[1] required to run Poplar programs. Also includes the Poplar Triton Backend[2]. Designed for deployment.
graphcore/poplar: The Poplar SDK components[1] required to run Poplar programs. Also includes Jupyter, Jupyter Lab and the Poplar Triton Backend[2]. Designed for development.
graphcore/popart-minimal: The Poplar SDK components[1] required to run PopART. Also includes the Poplar Triton Backend[2]. Designed for deployment.
graphcore/popart: The Poplar SDK components[1] required to run PopART. Also includes Jupyter, Jupyter Lab and the Poplar Triton Backend[2]. Designed for development.
Note
[1] The Poplar SDK components included in all images are those components found in the poplar subdirectory of the Poplar SDK tarball. These include the Poplar graph programming framework, PopLibs, Model Runtime, PopEF, PopRun and PopDist, libpvti, libpva and supporting libraries. The components of the Poplar SDK are described in the Poplar SDK Overview document.
Note
[2] The Poplar Triton backend is included in the Poplar, PopART, PyTorch, PyTorch Geometric and TensorFlow images. It can be found at /opt/triton/lib/libtriton_poplar.so and the libraries it depends on are in /opt/triton/lib/lib.
Note
[3] The TensorFlow Serving binary is included in the TensorFlow 2 images. It can be found at /opt/tfserving2/tensorflow_model_server.
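To fetch one of these images from Docker Hub before running it, use docker pull with the repository name and a tag, for example:
$ docker pull graphcore/tensorflow:2
The tags available for each repository are described in Section 3.1, Image tags.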
The IPU Operator repositories for using IPUs in Kubernetes are:
graphcore/ipu-operator-controller: the Controller component of the Kubernetes IPU Operator.
graphcore/ipu-operator-launcher: the Launcher component of the Kubernetes IPU Operator.
graphcore/ipu-operator-vipu-proxy: the V-IPU Proxy component of the Kubernetes IPU Operator.
3.2. Mounting directories from the host
You can mount directories as volumes to share data between the host machine and the Docker container environment. This is useful for cases where you need to read data to be processed or to output results.
Volumes are mounted using the -v option. The basic syntax is -v <path_on_host>:<path_in_container>. For example, to mount /home/me/cat_pics from your host machine as /cats in the container, you could run the following command:
$ gc-docker -- -ti -v /home/me/cat_pics:/cats graphcore/tensorflow ls -a /cats
. .. mog.jpg
You can repeat the -v option to mount multiple volumes in this way.
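For example, to mount separate input and output directories (the host paths shown are illustrative):
$ gc-docker -- -ti -v /data/inputs:/inputs -v /data/outputs:/outputs graphcore/tensorflow ls /inputs /outputs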
3.3. Setting environment variables
If you need some environment variables set inside the Docker environment, add -e VAR_NAME="var value" to your Docker options.
For example:
$ gc-docker -- -ti -e POPLAR_LOG_LEVEL=TRACE graphcore/tensorflow:2 python3
If you want an environment variable already set in your current environment to be set to the same value inside the Docker container, use the --pass-env option to gc-docker.
For example:
$ export POPLAR_LOG_LEVEL=TRACE
$ gc-docker --pass-env POPLAR_LOG_LEVEL -- -ti graphcore/tensorflow:2 python3
Note
gc-docker will automatically pass some configuration variables into the Docker environment. See Section 3.4.2, Using the configuration in a container.
3.4. Using IPU Pod partitions
Note
This document assumes you are using an IPU Pod. If you are using PCIe IPU devices in an IPU server, see Section 7, PCIe IPU devices for setup instructions.
You can create V-IPU partitions by following the instructions found in the V-IPU User Guide.
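As a minimal sketch, assuming the vipu command line tool is installed and configured as described in that guide, a partition could be created as follows (the partition name and size are illustrative):
$ vipu create partition my_partition --size 4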
3.4.1. Configuration variables
Once you have created a partition, Poplar needs to know the network endpoints to use to access the IPU devices, the topology of the devices and the configuration state. This information can most easily be provided by using the following environment variables.
IPUOF_VIPU_API_HOST: The IP address of the server running the V-IPU controller. Required.
IPUOF_VIPU_API_PORT: The port to connect to the V-IPU controller. Optional. The default is 8090.
IPUOF_VIPU_API_PARTITION_ID: The name of the partition to use. Required.
IPUOF_VIPU_API_GCD_ID: The ID of the GCD you want to use. Required for multi-GCD systems.
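For example, to use a partition named my_partition managed by a V-IPU controller at 10.1.2.3 (both values are illustrative):
$ export IPUOF_VIPU_API_HOST=10.1.2.3
$ export IPUOF_VIPU_API_PARTITION_ID=my_partition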
3.4.2. Using the configuration in a container
This configuration information needs to be available inside the container. The relevant environment variables can be set manually as explained in Section 3.3, Setting environment variables.
However, gc-docker has some options to make this more convenient. By default, gc-docker adds Docker options to set IPUOF_VIPU_API_HOST, IPUOF_VIPU_API_PORT, IPUOF_VIPU_API_PARTITION_ID and IPUOF_VIPU_API_GCD_ID within the container to the same values as on the host, so that
$ export IPUOF_VIPU_API_HOST=localhost
$ export IPUOF_VIPU_API_PARTITION_ID=my_partition
$ gc-docker -- -ti graphcore/tensorflow:2 python3
is equivalent to
$ gc-docker -- -e IPUOF_VIPU_API_HOST=localhost -e IPUOF_VIPU_API_PARTITION_ID=my_partition -ti graphcore/tensorflow:2 python3
If IPUOF_CONFIG_PATH is set, gc-docker adds Docker options to mount the specified path as a volume at /etc/ipuof.conf and to set IPUOF_CONFIG_PATH within the container to /etc/ipuof.conf, so that
$ export IPUOF_CONFIG_PATH=~/partitions/my_partition.conf
$ gc-docker -- graphcore/tensorflow:2 python3
is equivalent to
$ gc-docker -- -e IPUOF_CONFIG_PATH=/etc/ipuof.conf -v ~/partitions/my_partition.conf:/etc/ipuof.conf graphcore/tensorflow:2 python3
To prevent these variables being automatically passed, use the --no-default-env argument to gc-docker.
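For example, to start a container without any of these variables set automatically:
$ gc-docker --no-default-env -- -ti graphcore/tensorflow:2 python3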
3.4.3. Configuration files
The information that Poplar needs to access devices can, in some cases, be passed via a configuration file.
Warning
IPUoF configuration files are provided as a convenience. However, they are not supported on all IPU systems. Also, because of the difficulties of maintaining and sharing configuration files, they are not the recommended way to manage access to IPUs.
Support for configuration files may be removed in future.
We strongly recommend the use of environment variables, as described in Section 3.4.1, Configuration variables.
By default when you create a partition, a configuration file is created in the directory $HOME/.ipuof.conf.d/.
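For example, after creating a partition named my_partition, you might see something like this (the file name is illustrative):
$ ls $HOME/.ipuof.conf.d/
my_partition.conf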
3.5. Verifying IPU access from inside container
First check you have access to the IPU devices. To do this, run gc-info -l and check that the output contains a list of devices.
Next, do the same but inside the context of a container:
$ gc-docker -- --rm -ti graphcore/tools gc-info -l
The output should be the same.
Check you can run a TensorFlow container with gc-docker, and make sure the IPUs are visible to TensorFlow:
$ gc-docker -- --rm -ti graphcore/tensorflow:2 python3
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
>>> tensorflow.config.list_physical_devices("IPU")
[PhysicalDevice(name='/physical_device:IPU:0', device_type='IPU')]
>>>
The syntax for running an image with gc-docker is similar to using docker run, which is:
$ docker run [OPTIONS] IMAGE [COMMAND] [ARG...]
The main difference is that docker run is replaced with gc-docker --.
So, in the TensorFlow example above, we used the graphcore/tensorflow:2 image and ran python3 as the command. No arguments were passed to python3.
The -- part of this command tells gc-docker that the rest of the arguments should be passed directly to docker run.
The --echo option is also useful. This makes gc-docker print the Docker command it would have run. For example:
$ gc-docker --echo -- --rm -ti graphcore/tools gc-info -l
docker run --ulimit memlock=-1:-1 --net=host --cap-add=IPC_LOCK --device=/dev/infiniband/ --ipc=host --rm -ti graphcore/tools gc-info -l
Use the --help option or refer to the Command Line Tools document for more information.
4. Running a TensorFlow 2 application on an IPU
To demonstrate the workflow for running a TensorFlow 2 application on IPUs in a Docker development environment, we will use one of the TensorFlow 2 applications from the Graphcore tutorials. First, get the code:
$ git clone https://github.com/graphcore/examples.git
$ cd examples/tutorials
A common pattern when working with a Docker-based development environment is to mount the current directory into the container (as described in Section 3.2, Mounting directories from the host), then set the working directory inside the container with -w <dir name>. For example, -v "$(pwd):/app" -w /app.
Applying this, you can run the MNIST example in one of two ways. First, by setting the environment variables:
$ export IPUOF_VIPU_API_HOST=localhost
$ export IPUOF_VIPU_API_PARTITION_ID=my_partition
$ gc-docker -- -ti -v "$(pwd):/app" -w /app graphcore/tensorflow:2 python3 simple_applications/tensorflow2/mnist/mnist.py
Second, by copying the configuration file to the working directory and pointing IPUOF_CONFIG_PATH at it:
$ cp -r /etc/ipuof.conf.d .
$ export IPUOF_CONFIG_PATH=$(pwd)/ipuof.conf.d/my_partition.conf
$ gc-docker -- -ti -v "$(pwd):/app" -w /app graphcore/tensorflow:2 python3 simple_applications/tensorflow2/mnist/mnist.py
Note
Because many machine learning applications make heavy demands on shared memory, we recommend that you add the Docker option --ipc=host to the command line.
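For example, adding this option to the command used above:
$ gc-docker -- -ti --ipc=host -v "$(pwd):/app" -w /app graphcore/tensorflow:2 python3 simple_applications/tensorflow2/mnist/mnist.py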
5. Extending the images
These base images can be used to create new images for more specialised purposes, or to package an application for deployment to platforms such as Kubernetes or Kubeflow.
As an example, here is a simple Dockerfile that creates a Jupyter notebook environment with TensorFlow 2 and access to IPUs:
FROM graphcore/tensorflow:2
RUN pip3 install notebook
CMD ["jupyter", "notebook", "--allow-root", "--ip=0.0.0.0", "--port=8080"]
You can build and run this with the following commands:
$ docker build -t notebook .
$ gc-docker -- -p 8080:8080 notebook
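You can then connect to the notebook by browsing to port 8080 on the host (for example, http://localhost:8080); when it starts, Jupyter prints a URL containing the access token you need to log in.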
7. PCIe IPU devices
When Graphcore IPUs are directly attached to the host, you will need to install a PCIe driver to enable the host to communicate with the IPU hardware. This section describes how to confirm whether you have a PCIe device attached and the IPU driver installed, and how to install the driver if it is not.
Note
This is not relevant if you are using IPU Pod systems or cloud-based IPU systems.
7.1. Check if IPU hardware is present
To check whether the Graphcore hardware is present on the host, run the following command:
$ lspci | grep -i Graphcore
You should see a list of devices similar to this one:
1a:00.0 Processing accelerators: Graphcore Ltd Device 0001
Note
If you do not see Graphcore hardware (the list is empty), you might be using an IPU Pod system. For guidance on how to configure and use an IPU Pod system, see the Getting Started Guide for your system.
7.2. Check if the IPU device driver is installed
To check if your machine has the IPU device driver installed, run:
$ modinfo ipu_driver
If the driver is installed and running, you should see something similar to:
$ modinfo ipu_driver
filename: /lib/modules/4.15.0-55-generic/updates/dkms/ipu_driver.ko
version: 1.0.39
description: IPU PCI Driver
author: Graphcore Limited
license: GPL
srcversion: 49FFB7D8556EB58899AE41A
alias: pci:v00001D95d00000003sv*sd*bc*sc*i*
alias: pci:v00001D95d00000002sv*sd*bc*sc*i*
alias: pci:v00001D95d00000001sv*sd*bc*sc*i*
depends:
retpoline: Y
name: ipu_driver
vermagic: 4.15.0-55-generic SMP mod_unload
parm: memmap_start:array of ulong
parm: memmap_size:array of ulong
In this case, you can continue with the setup in Section 7.2.1, Docker setup.
If you get an error along the lines of:
$ modinfo ipu_driver
modinfo: ERROR: Module ipu_driver not found.
you will need to install the driver (Section 7.2.2, Installing the PCIe driver).
7.2.1. Docker setup
Using containers via gc-docker is similar to doing so on IPU Pods, the difference being that IPUoF configuration information does not need to be passed in:
$ gc-docker -- --rm -ti graphcore/tools gc-info -l
gc-docker also has a few options which can be used before the -- separator. For example, you can pass a subset of IPU devices using --device-id n:
$ gc-docker --device-id 4 -- --rm -ti graphcore/tools gc-info -l
Note
The device IDs in the container always start from zero. So if you select a subset of devices, they will be numbered from 0. For example, if you use devices 4 to 7, they will have IDs 0 to 3 in the container.
7.2.2. Installing the PCIe driver
The IPU driver is included with the Poplar SDK. See the Getting Started Guide for your IPU system for more information on how to install the SDK.
Instructions for installing the IPU driver for your system are included in the README that can be found in the following location:
$ [path_to_SDK]/gc_kernel-module-[os]_[os_ver]-[driver_ver]+[build]/README
where [path_to_SDK] is the location of the Poplar SDK on your system, [os] is the OS on your system, [os_ver] is the OS version, [driver_ver] is the version number of the IPU driver and [build] is the build information.
8. Further reading
Full documentation for the Poplar software is available on the Graphcore documentation portal. More information can be found on the Graphcore developer pages.
For an overview of the Poplar SDK and development tools, see the SDK Overview.
The IPU Programmer’s Guide provides an introduction to the IPU architecture, programming model and tools available.
If you are interested in running TensorFlow on the IPU, there are user guides and API references for the IPU implementation of TensorFlow 1 and TensorFlow 2.
You can also run PyTorch on the IPU.
Graphcore also has a GitHub repository with further examples:
https://github.com/graphcore/examples:
Tutorials
Examples of using Poplar and IPU features
Examples of simple models
Source code from videos, blogs and other documents
Benchmarks for performance testing of layer types on your IPU system
TensorFlow and PyTorch examples of commonly used machine learning models for training and inference
You can use the tag “ipu” when asking questions or looking for answers on StackOverflow.
Support is available from the Graphcore customer engineering team via the Graphcore support portal.
For general help, discussions and announcements, please join our Graphcore Slack Community.