3. Quick start for beginners

This section provides more detail on the steps described in the Quick start for experts section.

Before completing the steps in this section, make sure you have completed the steps in the getting started guide for your system, as described in the Prerequisites section.

The setup for TensorFlow 1 depends on whether your system is running Ubuntu 18.04 or Ubuntu 20.04.

You can check which OS you are running with:

$ lsb_release -a
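If you script your setup, you can also read the release from /etc/os-release, a standard file on Ubuntu that records it in the VERSION_ID field. The following is a minimal sketch, not part of the Poplar SDK: os_version_id and tf1_setup_section are hypothetical helper names, and they only map the release to the section of this guide that applies.

```shell
# Hypothetical helpers (not part of the Poplar SDK).

# Print VERSION_ID from an os-release file. The file is sourced in a
# subshell so that its variables do not leak into the calling shell.
os_version_id() {
  local os_release="${1:-/etc/os-release}"
  ( . "$os_release" && printf '%s\n' "${VERSION_ID:-}" )
}

# Map an Ubuntu release to the section of this guide that applies to it.
tf1_setup_section() {
  case "$1" in
    18.04) echo "3.1 Ubuntu 18.04 (native install)" ;;
    20.04) echo "3.2 Ubuntu 20.04 (Docker container)" ;;
    *) echo "Release '$1' is not covered by this guide" >&2; return 1 ;;
  esac
}
```

For example:

$ tf1_setup_section "$(os_version_id)"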

3.1. Ubuntu 18.04

3.1.1. Enable Poplar SDK

On some systems you must explicitly enable the Poplar SDK before you can use PyTorch, PopART, TensorFlow 1 or TensorFlow 2, or work directly with the Poplar Graph Programming Framework. On other systems, the SDK is enabled as part of the login process.

Table 3.1 defines whether you have to explicitly enable the SDK and where to find the SDK.

Table 3.1 Systems that need the Poplar SDK to be enabled and the SDK location

System       Enable SDK?  SDK location
-----------  -----------  ----------------------------------------------------
Pod system   Yes          The directory where you extracted the SDK tarball.
Graphcloud   Yes          /opt/gc/poplar_sdk-ubuntu_18_04-[poplar_ver]+[build]
Gcore Cloud  No           The SDK is enabled as part of the login process.

where [poplar_ver] is the software version number of the Poplar SDK and [build] is the build information.

To enable the Poplar SDK:

For SDK versions 2.6 and later, there is a single enable script that determines whether you are using Bash or Zsh and runs the appropriate scripts to enable both Poplar and PopART.

Run the single script as:

$ source [path_to_SDK]/enable

where [path_to_SDK] is the location of the Poplar SDK on your system.

Note

You must source the Poplar enable script for each new shell. To make this permanent, you can add this source command to your .bashrc (or .zshrc for SDK versions 2.6 and later).

If you attempt to run any Poplar software without having first sourced this script, you will get an error from the C++ compiler similar to the following (the exact message will depend on your code):

fatal error: 'poplar/Engine.hpp' file not found

Warning

If you try to source the script after it has already been sourced, then you will get an error similar to:

ERROR: A Poplar SDK has already been enabled.
Path of enabled Poplar SDK: /opt/gc/sdk-2.5.1/poplar-ubuntu_20_04-2.5.0+3723-e94d646535
If this is not wanted then please start a new shell.
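Because sourcing the script twice is an error, a guard can help if you add the source command to .bashrc or call it from other scripts. The sketch below assumes, as described in Section 3.1.5, that the enable script sets POPLAR_SDK_ENABLED; enable_poplar_sdk is a hypothetical helper name, not an SDK command.

```shell
# Hypothetical helper: source the Poplar SDK enable script only if no SDK
# has been enabled in this shell yet. Relies on the enable script setting
# POPLAR_SDK_ENABLED. Because it is a function, the script is sourced into
# the current shell, so the variables it sets persist.
enable_poplar_sdk() {
  local sdk_path="$1"
  if [ -n "${POPLAR_SDK_ENABLED:-}" ]; then
    echo "Poplar SDK already enabled: ${POPLAR_SDK_ENABLED}"
    return 0
  fi
  if [ ! -f "${sdk_path}/enable" ]; then
    echo "No enable script found in ${sdk_path}" >&2
    return 1
  fi
  . "${sdk_path}/enable"
}
```

For example:

$ enable_poplar_sdk [path_to_SDK]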

You can verify that Poplar has been successfully set up by running:

$ popc --version

This will display the version of the installed software.
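To check that the SDK is enabled and popc is available in one step, for example at the start of a job script, you could use a helper like the one below. It is a sketch: check_poplar_enabled is a hypothetical name, and it relies only on the enable script setting POPLAR_SDK_ENABLED and on popc being on the PATH.

```shell
# Hypothetical helper: confirm that the Poplar SDK is enabled in this shell
# and that the popc compiler can be found, then print its version.
check_poplar_enabled() {
  if [ -z "${POPLAR_SDK_ENABLED:-}" ]; then
    echo "POPLAR_SDK_ENABLED is not set: source the SDK enable script first" >&2
    return 1
  fi
  if ! command -v popc >/dev/null 2>&1; then
    echo "popc was not found on PATH" >&2
    return 1
  fi
  popc --version
}
```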

3.1.2. Create and enable a Python virtual environment

It is good practice to work in a different Python virtual environment for each framework or even for each application. This section describes how to create and activate a Python virtual environment.

Note

You must activate the Python virtual environment before you can start using it.

The virtual environment must be created for the Python version you will be using. This cannot be changed after creation. Create a new Python virtual environment with:

$ virtualenv -p python[py_ver] ~/[base_dir]/[venv_name]

where [base_dir] is a location of your choice and [venv_name] is the name of the virtual environment. [py_ver] is the version of Python you are using and it depends on your OS.

Note

On Ubuntu 18.04 systems we support Python 3.6, and on Ubuntu 20.04 systems we support Python 3.8. You can find more information about the versions of tools supported by the Poplar SDK on different operating systems in the Release Notes.

You can check which OS you are running by using the command:

$ lsb_release -a

To start using a virtual environment, activate it with:

$ source ~/[base_dir]/[venv_name]/bin/activate

where [base_dir] is where you created the virtual environment and [venv_name] is the name of the virtual environment.

Now all subsequent installations will be local to that virtual environment.
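The steps above can be combined in a short script. This is a sketch under stated assumptions: python_for_ubuntu and venv_is_active are hypothetical helper names, the Python versions are the ones given in the note above, and venv_is_active relies on the activate script exporting VIRTUAL_ENV (and deactivate unsetting it), which virtualenv does.

```shell
# Hypothetical helper: map an Ubuntu release to the Python version the
# Poplar SDK supports on it (3.6 on 18.04, 3.8 on 20.04).
python_for_ubuntu() {
  case "$1" in
    18.04) echo "3.6" ;;
    20.04) echo "3.8" ;;
    *) echo "No supported Python version listed for Ubuntu $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: succeed if a Python virtual environment is active.
venv_is_active() {
  [ -n "${VIRTUAL_ENV:-}" ]
}
```

For example, on Ubuntu 18.04:

$ py_ver=$(python_for_ubuntu 18.04)
$ virtualenv -p python${py_ver} ~/[base_dir]/[venv_name]
$ source ~/[base_dir]/[venv_name]/bin/activate
$ venv_is_active && echo "Using $VIRTUAL_ENV"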

3.1.3. Install the TensorFlow 1 wheels and validate

To run TensorFlow 1 applications on an IPU, you have to install the Python wheel files for the Graphcore port of TensorFlow 1 and for IPU TensorFlow 1 Addons.

3.1.3.1. TensorFlow 1 wheel

There are two TensorFlow 1 wheels included in the Poplar SDK, one for AMD processors and one for Intel processors. Check which processor is used on your system by running:

$ lscpu | grep name

The wheel file has a name of the form:

tensorflow-[ver]+[platform].whl

where [ver] is the version of the Graphcore port of TensorFlow 1 and [platform] defines the server details (processor and operating system) for the TensorFlow build. An example of the TensorFlow 1 wheel file for an Intel processor for Poplar SDK 3.0 is:

tensorflow-1.15.5+gc3.0.0+236840+f53da99dba1+intel_skylake512-cp36-cp36m-linux_x86_64.whl

Install the Graphcore TensorFlow 1 distribution for an AMD processor with:

$ python -m pip install ${POPLAR_SDK_ENABLED?}/../tensorflow-1.*+amd_*.whl

Install the Graphcore TensorFlow 1 distribution for an Intel processor with:

$ python -m pip install ${POPLAR_SDK_ENABLED?}/../tensorflow-1.*+intel_*.whl

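POPLAR_SDK_ENABLED contains the location of the poplar directory in the Poplar SDK, so ${POPLAR_SDK_ENABLED?}/.. is the SDK directory that holds the wheel files (Section 3.1.5, Define environment variable). The ? ensures that an error message is displayed if Poplar has not been enabled.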
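Rather than choosing the wheel by hand, you can derive it from the CPU vendor ID reported by lscpu, which is GenuineIntel on Intel processors and AuthenticAMD on AMD processors. The sketch below assumes that the wheel file names follow the amd_ and intel_ patterns used in the commands above; cpu_vendor and tf1_wheel_glob are hypothetical helper names.

```shell
# Hypothetical helper: extract the vendor ID from lscpu output on stdin.
cpu_vendor() {
  awk -F: '/^Vendor ID/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Hypothetical helper: map a CPU vendor ID to the TensorFlow 1 wheel pattern.
tf1_wheel_glob() {
  case "$1" in
    GenuineIntel) echo "tensorflow-1.*+intel_*.whl" ;;
    AuthenticAMD) echo "tensorflow-1.*+amd_*.whl" ;;
    *) echo "Unrecognised CPU vendor: $1" >&2; return 1 ;;
  esac
}
```

For example:

$ vendor=$(lscpu | cpu_vendor)
$ python -m pip install ${POPLAR_SDK_ENABLED?}/../$(tf1_wheel_glob "$vendor")

The command substitution is left unquoted so that the shell expands the wildcard, as in the original commands.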

3.1.3.2. TensorFlow 1 Addons wheel

This section describes how to install the wheel file for IPU TensorFlow 1 Addons, a collection of add-ons created for the Graphcore port of TensorFlow 1. These include TensorFlow layers. For more information, refer to the section on IPU TensorFlow Addons Python API in the TensorFlow 1 user guide.

Note

  • The IPU TensorFlow 1 Addons wheel file is only available in Poplar SDK 2.4 and later.

  • There are separate Addons wheel files for TensorFlow 1 and TensorFlow 2.

The wheel file has a name of the form:

ipu_tensorflow_addons-[ver]+X+X+X-X-X-X.whl

where [ver] is the version of the Graphcore port of TensorFlow 1 and the X fields hold the SDK version, build information and wheel compatibility tags. An example of the Addons wheel file for TensorFlow 1.15 for the IPU for Poplar SDK 3.0 is:

ipu_tensorflow_addons-1.15.5+gc3.0.0+236840+e2b938f-py3-none-any.whl

Install the IPU TensorFlow 1 Addons wheel using the following command:

$ python -m pip install ${POPLAR_SDK_ENABLED?}/../ipu_tensorflow_addons-1.*.whl

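POPLAR_SDK_ENABLED contains the location of the poplar directory in the Poplar SDK, so ${POPLAR_SDK_ENABLED?}/.. is the SDK directory that holds the wheel files (Section 3.1.5, Define environment variable). The ? ensures that an error message is displayed if Poplar has not been enabled.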
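To validate the installation, you can check inside the active virtual environment that Python reports a TensorFlow 1.x version and that pip lists the Addons package. The sketch below is an assumption-based example, not a Graphcore tool: is_tf1_version and validate_tf1_install are hypothetical helpers, and the Python interpreter can be passed as an argument (default python).

```shell
# Hypothetical helper: succeed only for a TensorFlow 1.x version string.
is_tf1_version() {
  case "$1" in
    1.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: check the installed TensorFlow version and that the
# ipu_tensorflow_addons package is installed, using the given interpreter.
validate_tf1_install() {
  local py="${1:-python}"
  local tf_ver
  tf_ver=$("$py" -c 'import tensorflow as tf; print(tf.__version__)') || return 1
  if ! is_tf1_version "$tf_ver"; then
    echo "Expected TensorFlow 1.x but found ${tf_ver}" >&2
    return 1
  fi
  if ! "$py" -m pip show ipu_tensorflow_addons >/dev/null; then
    echo "ipu_tensorflow_addons is not installed" >&2
    return 1
  fi
  echo "TensorFlow ${tf_ver} and IPU TensorFlow Addons are installed"
}
```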

3.1.4. Clone Graphcore Tutorials repo

You need to clone the Graphcore tutorials repo on some systems as detailed in Table 3.2.

If you don’t need to clone the tutorials repo, then go straight to Section 3.1.5, Define environment variable.

Table 3.2 Systems that need the Graphcore tutorials and examples repositories to be cloned

System       Clone repos?  Comment
-----------  ------------  --------------------------------------------------------------
Pod system   Yes           You can clone the tutorials and examples repos in any location.
Graphcloud   Yes           You can clone the tutorials and examples repos in any location.
Gcore Cloud  No            The tutorials and examples have already been cloned in
                           ~/graphcore/tutorials and ~/graphcore/examples respectively.

There are several tutorials available in the Graphcore tutorials repository on GitHub.

You can clone the tutorials repository into a location of your choice.

$ cd ~/[base_dir]
$ git clone https://github.com/graphcore/tutorials.git
$ cd tutorials
$ git checkout sdk-release-[poplar-ver]

where [base_dir] is a location of your choice and [poplar-ver] is the version of the Poplar SDK that you are using (Section 3.1.1, Enable Poplar SDK). This places the contents of the tutorials repository under ~/[base_dir]/tutorials.
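If you script this step, a helper that clones only when the destination does not exist yet and then checks out the requested branch or tag lets you rerun the script safely. clone_at_ref is a hypothetical name; the same function should also work for the examples repository with tags/[tag_name] as the reference.

```shell
# Hypothetical helper: clone a repository if the destination does not exist
# yet, then check out the requested branch or tag. Safe to rerun.
clone_at_ref() {
  local url="$1" dest="$2" ref="$3"
  if [ ! -d "$dest/.git" ]; then
    git clone "$url" "$dest" || return 1
  fi
  git -C "$dest" checkout "$ref"
}
```

For example:

$ clone_at_ref https://github.com/graphcore/tutorials.git ~/[base_dir]/tutorials sdk-release-[poplar-ver]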

3.1.5. Define environment variable

You need to define the following environment variable:

  • POPLAR_TUTORIALS_DIR: the location of the cloned tutorials repository.

We also use the environment variable POPLAR_SDK_ENABLED. This variable is set when Poplar is enabled (Section 3.1.1, Enable Poplar SDK) and contains the location of the poplar directory in the SDK directory.

3.1.6. Define tutorials location

To simplify running the tutorial in this Quick Start (and in other Quick Starts), define the location of the tutorials directory as an environment variable:

$ export POPLAR_TUTORIALS_DIR=~/[base_dir]/tutorials

[base_dir] is the location where you installed the Graphcore tutorials.
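Like the enable script, an export only lasts for the current shell. The sketch below uses two hypothetical helpers: persist_export appends the export line to a startup file such as .bashrc only if the identical line is not already there, and require_dir_var checks that an exported variable points at an existing directory.

```shell
# Hypothetical helper: append "export NAME=VALUE" to a startup file once.
persist_export() {
  local name="$1" value="$2" rc_file="${3:-$HOME/.bashrc}"
  local line="export ${name}=${value}"
  # -x matches whole lines and -F treats the pattern as a fixed string.
  if ! grep -qxF "$line" "$rc_file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$rc_file"
  fi
}

# Hypothetical helper: fail unless the named exported variable is set to
# an existing directory.
require_dir_var() {
  local name="$1" value
  value=$(printenv "$name") || value=""
  if [ -z "$value" ] || [ ! -d "$value" ]; then
    echo "${name} is not set to an existing directory" >&2
    return 1
  fi
}
```

For example:

$ persist_export POPLAR_TUTORIALS_DIR "$HOME/[base_dir]/tutorials"
$ require_dir_var POPLAR_TUTORIALS_DIR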

3.1.7. Run the application

This section describes how to run a simple application, the MNIST example, using TensorFlow 1.

  1. Install example requirements

    You can now install the requirements that the model needs:

$ cd $POPLAR_TUTORIALS_DIR/simple_applications/tensorflow/mnist/
$ python -m pip install -r requirements.txt

  2. Run example

    Run the code with the command:

$ python3 mnist_tf.py

The example has no command line options.

If the code has run successfully, you should see an output similar to that in Listing 3.1.

Listing 3.1 Example of output for TensorFlow 1 application.
 Image shape: (28, 28) Training examples: 60000 Test examples: 10000
 Epochs: 5 Batch-size: 16 Steps-per-epoch: 15 Batches-per-step: 250
 Benchmarking the infeed...
 2022-01-10 11:11:00.459295: I tensorflow/compiler/plugin/poplar/kernels/datastream/dataset_benchmark.cc:131] Processed: 31533 elements/second.
 2022-01-10 11:11:00.459497: I tensorflow/compiler/plugin/poplar/kernels/datastream/dataset_benchmark.cc:133] Bandwidth: 1.58422 GB/s.
 2022-01-10 11:11:00.459510: I tensorflow/compiler/plugin/poplar/kernels/datastream/dataset_benchmark.cc:135] Dataset iterator completed epoch 0.
 2022-01-10 11:11:02.451142: I tensorflow/compiler/plugin/poplar/kernels/datastream/dataset_benchmark.cc:131] Processed: 30126.5 elements/second.
 2022-01-10 11:11:02.451195: I tensorflow/compiler/plugin/poplar/kernels/datastream/dataset_benchmark.cc:133] Bandwidth: 1.51356 GB/s.
 2022-01-10 11:11:02.451200: I tensorflow/compiler/plugin/poplar/kernels/datastream/dataset_benchmark.cc:135] Dataset iterator completed epoch 1.
 2022-01-10 11:11:04.468018: I tensorflow/compiler/plugin/poplar/kernels/datastream/dataset_benchmark.cc:131] Processed: 29750.2 elements/second.
 2022-01-10 11:11:04.468064: I tensorflow/compiler/plugin/poplar/kernels/datastream/dataset_benchmark.cc:133] Bandwidth: 1.49465 GB/s.
 2022-01-10 11:11:04.468069: I tensorflow/compiler/plugin/poplar/kernels/datastream/dataset_benchmark.cc:135] Dataset iterator completed epoch 2.
 2022-01-10 11:11:06.457348: I tensorflow/compiler/plugin/poplar/kernels/datastream/dataset_benchmark.cc:131] Processed: 30162 elements/second.
 2022-01-10 11:11:06.457380: I tensorflow/compiler/plugin/poplar/kernels/datastream/dataset_benchmark.cc:133] Bandwidth: 1.51534 GB/s.
 2022-01-10 11:11:06.457397: I tensorflow/compiler/plugin/poplar/kernels/datastream/dataset_benchmark.cc:135] Dataset iterator completed epoch 3.
 2022-01-10 11:11:08.437805: I tensorflow/compiler/plugin/poplar/kernels/datastream/dataset_benchmark.cc:131] Processed: 30297.3 elements/second.
 2022-01-10 11:11:08.437841: I tensorflow/compiler/plugin/poplar/kernels/datastream/dataset_benchmark.cc:133] Bandwidth: 1.52214 GB/s.
 2022-01-10 11:11:08.437846: I tensorflow/compiler/plugin/poplar/kernels/datastream/dataset_benchmark.cc:135] Dataset iterator completed epoch 4.
 2022-01-10 11:11:15.921698: I tensorflow/compiler/jit/xla_compilation_cache.cc:251] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
 Training...
 Epoch: 0/5
 Compiling module cluster_1076296992091806376_f15n_1__.343:
 [##################################################] 100% Compilation Finished [Elapsed: 00:00:17.7]
 Loss 0.01846 Accuracy 0.98743:  Epoch: 5/5
 Saving...
 Testing...
 Compiling module cluster_1_11159079291124969372_f15n_1__.197:
 [##################################################] 100% Compilation Finished [Elapsed: 00:00:10.3]
 Test loss: 0.06264137 Test accuracy: 0.97716677
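When running the example from a script, you can save the output to a log and check for the final test-accuracy line shown in Listing 3.1. The sketch assumes POPLAR_TUTORIALS_DIR is set as in Section 3.1.6; run_mnist_example and the PYTHON override are hypothetical.

```shell
# Hypothetical helper: run the MNIST tutorial, save its output to a log file
# and succeed only if the run reports a test accuracy.
run_mnist_example() {
  local log_file="${1:-mnist_run.log}"
  if [ -z "${POPLAR_TUTORIALS_DIR:-}" ]; then
    echo "POPLAR_TUTORIALS_DIR is not set" >&2
    return 1
  fi
  local example_dir="${POPLAR_TUTORIALS_DIR}/simple_applications/tensorflow/mnist"
  # The subshell keeps the caller's working directory unchanged. The log file
  # is opened by the calling shell, so a relative path is relative to the
  # caller's directory.
  ( cd "$example_dir" && "${PYTHON:-python3}" mnist_tf.py ) > "$log_file" 2>&1 || return 1
  grep -q "Test accuracy:" "$log_file"
}
```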

You have run an application that demonstrates how to use one IPU for a simple TensorFlow model with the MNIST dataset.

3.1.8. Try out other applications

Try out other examples in the Tutorials or Examples repositories. You may need to clone the Graphcore examples repository if your system does not already include it (see Table 3.2).

There are several example applications available in the Graphcore examples repository on GitHub.

You can clone the examples repository into a location of your choice.

$ cd ~/[base_dir]
$ git clone https://github.com/graphcore/examples.git
$ cd examples
$ git checkout tags/[tag_name]

where [base_dir] is a location of your choice and [tag_name] is the name of the tagged version that corresponds to the version of the Poplar SDK you are using (Section 3.1.1, Enable Poplar SDK). The tagged versions are listed on the tags page of the examples repository on GitHub. This places the contents of the examples repository under ~/[base_dir]/examples.
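You can also list the available tags from the command line with git ls-remote --tags, which prints an object ID and a reference on each line, plus extra lines ending in ^{} for annotated tags. The hypothetical helper below reduces that output to plain tag names; it does not assume any particular tag naming scheme, so you still need to pick the tag that matches your SDK version.

```shell
# Hypothetical helper: turn `git ls-remote --tags` output (on stdin) into a
# sorted list of unique tag names.
list_tag_names() {
  awk -F'\t' '{
    name = $2
    sub(/^refs\/tags\//, "", name)   # drop the refs/tags/ prefix
    sub(/\^\{\}$/, "", name)         # drop the peeled-tag suffix ^{}
    print name
  }' | sort -u
}
```

For example:

$ git ls-remote --tags https://github.com/graphcore/examples.git | list_tag_names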

To simplify running the examples in this Quick Start (and in other Quick Starts), define the location of the examples directory as an environment variable:

$ export POPLAR_EXAMPLES_DIR=~/[base_dir]/examples

[base_dir] is the location where you installed the Graphcore examples.

3.1.9. Exit the virtual environment

When you are done, exit the Python virtual environment.

$ deactivate

3.2. Ubuntu 20.04

Ubuntu 20.04 does not natively support TensorFlow 1. This means that you need to run TensorFlow 1 applications in an Ubuntu 18.04 Docker container. Refer to Using IPUs from Docker for more information.
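The gc-docker command in Listing 3.2 forwards IPUOF_VIPU_API_HOST and IPUOF_VIPU_API_PARTITION_ID from the host with -e options, assuming gc-docker passes the options after -- through to docker run. With docker run, -e NAME without a value only sets the variable in the container if it is set on the host, so it can help to check both first. missing_env_vars is a hypothetical helper.

```shell
# Hypothetical helper: print the names of any listed environment variables
# that are unset or empty, and fail if there are any.
missing_env_vars() {
  local name status=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}
```

For example:

$ missing_env_vars IPUOF_VIPU_API_HOST IPUOF_VIPU_API_PARTITION_ID || echo "Set the variables above before running gc-docker"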

The following commands provide an example of how to pull the latest TensorFlow 1 image from Docker Hub and then instantiate the container (Listing 3.2):

Listing 3.2 Creating a TF1 docker container
$ docker pull graphcore/tensorflow:1-intel
$ gc-docker -- -ti -v /home/ubuntu/graphcore:/graphcore -e IPUOF_VIPU_API_HOST -e IPUOF_VIPU_API_PARTITION_ID graphcore/tensorflow:1-intel

Thereafter, you can complete the following from within the container: