1. Overview

The PopVision™ Graph Analyser application is used to analyse the programs built for and executed on Graphcore’s IPU systems. It can be used for analysing and optimising the memory use and performance of programs.

The PopVision Graph Analyser provides reports on the following:


Summary Report, which shows the IPU hardware, graph parameters and host configuration.


Memory Report, which gives a detailed analysis of memory usage across all the tiles in your IPU system, showing graphs of total memory and liveness data, and details of variable types, placement and size.


Liveness Report, which gives a detailed breakdown of the state of the variables at each step in your program.


Program Tree, which shows a hierarchical view of the steps in the program.


Execution Trace, which shows how many cycles each step of your instrumented program consumes.

Each of these reports is described in further detail in the sections below.

You can search this documentation by entering text into the Search box at the top of the Table of Contents, on the left. To cycle through any search matches, press the Return key repeatedly.

1.1. About the IPU

An in-depth description of the IPU hardware is available in the online IPU Programmer’s Guide. While we describe some of the relevant features of the IPU in this document, you should refer to the Poplar documentation for a more in-depth understanding.

2. Capturing IPU reports

This section describes how to generate the files that the PopVision Graph Analyser can analyse. The PopVision Graph Analyser uses report files generated during compilation and execution by the Poplar SDK.

The sections below describe the files supported by the PopVision Graph Analyser. These files can be created using POPLAR_ENGINE_OPTIONS, the Poplar API, or when using the gc-profile command line tool. At a minimum you need either the archive.a or the graph.json file for the PopVision Graph Analyser to present reports.

In Poplar SDK 1.2 a new entry in the POPLAR_ENGINE_OPTIONS was added to make capturing reports easier. In order to capture the reports needed for the PopVision Graph Analyser you only need to set POPLAR_ENGINE_OPTIONS='{"autoReport.all":"true"}' before you run a program. By default this will enable instrumentation and capture all the required reports to the current working directory. For more information, please read the description of the Poplar Engine options in the Poplar and PopLibs API Reference.

By default, report files are output to the current working directory. You can specify a different output directory by using, for example:

POPLAR_ENGINE_OPTIONS='{"autoReport.all":"true", "autoReport.directory":"./tommyFlowers"}'
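If you would rather set these options from inside a C++ host program than from the shell, one approach is to build the same JSON string and export it with the POSIX setenv function. The following is a minimal sketch only. The helper names buildEngineOptions and exportEngineOptions are hypothetical. The sketch assumes that Poplar reads POPLAR_ENGINE_OPTIONS when the Engine is constructed in the same process, so the variable must be set before that point. Keys and values are not JSON-escaped, so keep them to plain option names, flags and paths.

```cpp
#include <cstdlib>  // std::size_t, setenv (POSIX)
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: builds a JSON object string such as
// {"autoReport.all":"true", "autoReport.directory":"./tommyFlowers"}.
// Keys and values are inserted verbatim (no escaping), so they must not
// contain quotes or backslashes.
std::string buildEngineOptions(
    const std::vector<std::pair<std::string, std::string>> &options) {
  std::string json = "{";
  for (std::size_t i = 0; i < options.size(); ++i) {
    if (i > 0) json += ", ";
    json += "\"" + options[i].first + "\":\"" + options[i].second + "\"";
  }
  json += "}";
  return json;
}

// Hypothetical helper: exports the options for the current process.
// Assumption: call this before creating the poplar::Engine.
bool exportEngineOptions(const std::string &json) {
  return setenv("POPLAR_ENGINE_OPTIONS", json.c_str(), /*overwrite=*/1) == 0;
}
```

For example, calling buildEngineOptions with the pairs autoReport.all/true and autoReport.directory/./tommyFlowers produces exactly the string shown above.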

If your application builds and runs more than one Poplar program (for example, a training model and a validation model), autoReport only outputs the reports for the last Poplar program to be built or executed.

2.1. Limiting report output

The POPLAR_ENGINE_OPTIONS='{"autoReport.all":"true"}' outputs all report files. If you wish to capture only one file, or all but one file, you can use a combination of the following options:

  • To capture just one report file, set only that file’s autoReport option, for example: POPLAR_ENGINE_OPTIONS='{"autoReport.outputArchive":"true"}'.

  • To capture all but one report file, set the all option and disable the report you do not want, for example: POPLAR_ENGINE_OPTIONS='{"autoReport.all":"true", "autoReport.outputArchive":"false"}'.
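The way a specific option combines with autoReport.all can be modelled as a simple lookup. Use the report's own setting if it is present; otherwise fall back to autoReport.all. This is a minimal sketch of the documented behaviour, not Poplar's implementation. The function name isReportEnabled is hypothetical, and the options are assumed to be already parsed into a map.

```cpp
#include <map>
#include <string>

// Hypothetical model of how autoReport options combine. `options` holds the
// parsed POPLAR_ENGINE_OPTIONS key/value pairs, and `report` is a specific
// option name such as "autoReport.outputArchive".
bool isReportEnabled(const std::map<std::string, std::string> &options,
                     const std::string &report) {
  // A setting for this specific report takes precedence.
  auto specific = options.find(report);
  if (specific != options.end()) return specific->second == "true";
  // Otherwise inherit the value of autoReport.all (off if it is absent).
  auto all = options.find("autoReport.all");
  return all != options.end() && all->second == "true";
}
```

With the ‘all but one’ options shown above, this model reports every file as enabled except the archive.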

Execution reports can be very large if you are running many iterations of your programs. By default, only the first two runs of each of your Poplar programs are captured. You can increase or decrease this number with the autoReport.executionProfileProgramRunCount option, for example:

POPLAR_ENGINE_OPTIONS='{"autoReport.all":"true", "autoReport.executionProfileProgramRunCount":"10"}'
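The limit is applied to each Poplar program independently. The following minimal sketch models that rule. RunCaptureModel is a hypothetical class, and it assumes that every engine.run(programIndex) call counts as one run of that program.

```cpp
#include <map>

// Hypothetical model of the executionProfileProgramRunCount rule: only the
// first `limit` runs of each program index are recorded.
class RunCaptureModel {
public:
  explicit RunCaptureModel(unsigned limit = 2) : limit_(limit) {}

  // Call once per engine.run(programIndex); returns true if this run
  // would be captured in the execution profile.
  bool recordRun(unsigned programIndex) {
    unsigned &runsSoFar = runs_[programIndex];
    bool captured = runsSoFar < limit_;
    ++runsSoFar;
    return captured;
  }

private:
  unsigned limit_;
  std::map<unsigned, unsigned> runs_;  // runs seen so far, per program index
};
```

With the default limit, the third run of program 0 is not captured, while the first run of program 1 still is.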

2.2. Reloading reports

The folder (or folders, if you’re comparing reports) containing the individual report files is monitored by the application in case any of the files change, for example, if you’ve re-run your Poplar program and regenerated them.

If the application detects that any of the files have changed, a dialog box appears telling you what files have changed, and prompting you to reload the report files.

Make sure that your Poplar program has finished executing (in particular, that the execution.json or execution.cbor file has been completely written to disk) before clicking the Reload button; otherwise you may see inconsistent information displayed in the application.

2.3. Poplar report files

The PopVision Graph Analyser only supports fixed names for each of the files. If you save them with different names they will not be opened. When you are browsing directories to open, the PopVision Graph Analyser will highlight which of the following files are present in that directory.

2.3.1. archive.a

This is a binary archive of ELF files, one for each tile. With this you can see the total memory usage for each tile on the Memory Report.

Poplar Engine Options

POPLAR_ENGINE_OPTIONS='{"autoReport.outputArchive":"true"}'

Using Poplar API

POPLAR_ENGINE_OPTIONS='{"target.saveArchive":"archive.a"}'

Using gc-profile CLI

Automatically created

2.3.2. Graph Profile (graph.json or graph.cbor)

This file contains information about the graph compiled for the IPU(s). This file is used to show memory, liveness and program tree views and also compute set information on the execution trace view.

Poplar Engine Options

POPLAR_ENGINE_OPTIONS='{"autoReport.outputGraphProfile":"true"}'

Using Poplar API

To save the graph.json you need to use the Poplar API Engine::getGraphProfile and write the ProfileValue to file.

Using gc-profile CLI

Automatically created

2.3.3. Execution Profile (execution.json or execution.cbor)

This file contains instrumentation data after running a program on the IPU(s). The file is required to see the execution trace view.

Poplar Engine Options

POPLAR_ENGINE_OPTIONS='{"autoReport.outputExecutionProfile":"true"}'

Using Poplar API

You will need to set POPLAR_ENGINE_OPTIONS='{"debug.instrument":"true"}'. Then to save the execution.json you need to use the Poplar API Engine::getExecutionProfile and write the ProfileValue to file.

Using gc-profile CLI

Generated when the -i option is supplied

2.3.4. GC-Profile Information (profile_info.json)

This file contains details about the host system used. It is only generated when running the gc-profile CLI. This file is used to show host information on the Summary Report.

Poplar Engine Options

Not applicable.

Using Poplar API

Not applicable.

Using gc-profile CLI

Automatically created

2.3.5. Lowered Vars Information (vars.capnp)

This file contains details about the memory layout of each tile. This file is used to generate the variable layout in the Memory Report.

Poplar Engine Options

POPLAR_ENGINE_OPTIONS='{"autoReport.outputLoweredVars":"true"}'

Using Poplar API

POPLAR_ENGINE_OPTIONS='{"debug.loweredVarDumpFile":"vars.capnp"}'

Using gc-profile CLI

Automatically created

2.3.6. Serialized Computation graph (serialized_graph.capnp)

This file contains the serialised Poplar graph.

Poplar Engine Options

POPLAR_ENGINE_OPTIONS='{"autoReport.outputSerializedGraph":"true"}'

Using Poplar API

To save the serialised graph you need to use the Poplar API Graph::serialize.

Using gc-profile CLI

Automatically created

2.3.7. Frameworks Information (framework.json & app.json)

You can use Poplar to create two additional ‘custom’ files in which you can put your own data from frameworks or your application. See the Framework and Application JSON files section for more details.

When using the Poplar API to create your reports, you can select either serialised JSON format, or serialised CBOR format (which is slightly more compact). Using gc-profile only creates .json files, and using the autoReport Engine option only creates .cbor files.

2.3.8. Debug Information (debug.cbor)

This file contains debug information that allows you to see the details of variables in the Liveness report. This information includes:

  • Whether it’s a Variable, Cloned Variable or Constant

  • The shape of the variable as an array of dimensions

  • The type of variable (e.g. ‘Half’)

  • For cloned variables, which variable it was cloned from, and the method by which it was cloned

  • The location of the variable

To capture enhanced debug information when compiling your program, include the following option in your POPLAR_ENGINE_OPTIONS:

{"autoReport.outputDebugInfo":"true"}

Collecting the enhanced debug information will not increase the memory footprint of your IPU application. The enhanced debug information is generated when the model is compiled.

See Viewing enhanced debug information for more details of how to view this information.

2.4. Using Poplar API

The following is a code example of how you could capture profile information directly using the Poplar APIs.

// Copyright (c) 2020 Graphcore Ltd. All rights reserved.
// This file contains the completed version of Poplar tutorial 4.
// See the Poplar user guide for details.

#include <fstream>
#include <iostream>
#include <poplar/Engine.hpp>
#include <poplar/Graph.hpp>
#include <poplar/IPUModel.hpp>
#include <poplin/MatMul.hpp>
#include <poplin/codelets.hpp>
#include <popops/codelets.hpp>
#include <poputil/TileMapping.hpp>

using namespace poplar;
using namespace poplar::program;

int main() {
  // Create the IPU model device
  IPUModel ipuModel;
  Device device = ipuModel.createDevice();
  Target target = device.getTarget();

  // Create the Graph object
  Graph graph(target);
  popops::addCodelets(graph);
  poplin::addCodelets(graph);

  // Add variables to the graph
  Tensor m1 = graph.addVariable(FLOAT, {800, 500}, "m1");
  Tensor m2 = graph.addVariable(FLOAT, {500, 400}, "m2");
  Tensor m3 = graph.addVariable(FLOAT, {400, 200}, "m3");
  poputil::mapTensorLinearly(graph, m1);
  poputil::mapTensorLinearly(graph, m2);
  poputil::mapTensorLinearly(graph, m3);

  // Create a control program that is a sequence of steps
  Sequence prog;

  Tensor m4 = poplin::matMul(graph, m1, m2, prog, "m4");
  Tensor m5 = poplin::matMul(graph, m4, m3, prog, "m5");

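  // Engine options that make Poplar write files for the PopVision Graph
  // Analyser: the per-tile ELF archive (archive.a), instrumentation (needed
  // for execution.json) and the lowered variable layout (vars.capnp)
  OptionFlags opt = {
    {"target.saveArchive", "archive.a"},
    {"debug.instrument", "true"},
    {"debug.loweredVarDumpFile", "vars.capnp"}
  };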

  // Create the engine
  Engine engine(graph, prog, opt);

  // Write the graph.json for the PopVision Graph Analyser
  ProfileValue graphProfile = engine.getGraphProfile();
  std::ofstream ofGraph("graph.json");
  poplar::serializeToJSON(ofGraph, graphProfile, true);

  // Run the control program
  engine.load(device);
  std::cout << "Running program\n";
  engine.run(0);
  std::cout << "Program complete\n";

  // Write the execution.json for the PopVision Graph Analyser
  ProfileValue executionProfile = engine.getExecutionProfile();
  std::ofstream ofExecution("execution.json");
  poplar::serializeToJSON(ofExecution, executionProfile, true);

  return 0;
}

For more details please see the Poplar API Guide.

2.5. Using TensorFlow

If you use TensorFlow and do not specify autoReport.directory, the separate reports for each Poplar program are saved to a cluster directory, e.g. cluster_8926926012737461_f15n_0__.3776.

TensorFlow doesn’t currently delete the Poplar Engine completely at the end of its run, so the execution trace is sometimes not written out properly, and you’ll see an error when trying to view it in the Execution Trace report. In this case, use gc-profile to capture the reports instead.

For more details please see the guide Targeting the IPU from TensorFlow.

2.6. Using PopART

For more details please see the Poplar and PopLibs API Reference.

3. Opening IPU reports

In order to view reports, the PopVision Graph Analyser requires one or more of the files listed in the Capturing IPU reports section, above.

You can open report files on your local machine, or from a remote server over SSH.

3.1. Local reports

You can open report files stored on your local machine as described below.

3.1.1. Opening local reports

To open a local report on your machine:

  1. On the Home screen of the PopVision Graph Analyser application, click on the ‘Open a report…’ button in the ‘Open’ panel. You’ll be presented with a file selection dialog, and the ‘local’ tab at the top will be selected by default. You’ll see listings of the directories and files on your local machine.

  2. You can sort these files by name or modified date, in ascending or descending order, by clicking on the appropriate column header. Your sorting preference is saved.

  3. Use this dialog box to navigate to the folder in which your report files have been saved. You’ll notice that when the PopVision Graph Analyser identifies a directory in which any of the report files listed above are found, those files are listed on the right-hand side. Note that if the minimal file requirements (archive.a or graph.json, see Capturing IPU reports) are not present in a directory, the ‘Open’ button will be disabled.

  4. Once you’ve selected the directory with the necessary report files within it, click on the ‘Open’ button to load the report data from the files.

If you’ve used the application before, the Home screen also displays a list of recently opened report directories in the ‘Recent’ box. Click on one to open it again.

The Summary Report is displayed first, and the progress bar along the top of the screen shows the files being pre-processed by the application prior to being loaded and displayed.

Notice that the bottom of the Summary Report shows the relevant files that have been found, and their loading state. More details on these files, and which reports need which of them, can be found in the Poplar report files section.

3.2. Remote reports

If you are using an IPU system on a remote server, for example on a cloud service, any reports generated will be saved to that server, so you cannot open them ‘locally’. You can, however, open them remotely by specifying the server address and signing into the machine over SSH. The report contents are then streamed back to the PopVision Graph Analyser application on your local machine, allowing you to view the reports.

Opening a remote report requires you to log in to the remote server over SSH, either with a password or with an SSH key set up on your local machine so that the remote server can authorise the connection.

When the PopVision Graph Analyser opens report files on a remote machine, it copies a small binary application to that machine, which pre-processes the report data and sends it back over SSH to the PopVision Graph Analyser application running on your local machine. If you’re running other performance-critical processes on the remote machine, be aware that this pre-processing uses some of the machine’s CPU capacity. As server performance varies a great deal, the only way to know how much CPU it uses is to open a small sample report and monitor the CPU usage.

3.2.1. Opening a remote report

To open a remote report on another machine:

  1. On the Home screen of the PopVision Graph Analyser application, click on the ‘Open a report…’ link in the ‘Open’ panel. You’ll be presented with a file selection dialog, and the ‘local’ tab at the top will be selected by default.

  2. Click on the ‘remote’ tab at the top, and you’ll see a login dialog that allows you to connect to a remote server. Enter your username, and the address of the remote machine.

  3. If you just want to log in with a password for the remote machine, enter it in the Password field.

  4. Alternatively, you can use your local machine’s SSH key to authorise your connection. Enter its file path in the Preferences dialog.

  5. Once you’re logged in, you’ll see a file dialog listing the directories and files on the server. You can sort these files by name or modified date, in ascending or descending order, by clicking on the appropriate column header. Your sorting preference is saved.

  6. Navigate to the folder in which your Poplar report files have been saved. You’ll notice that when you select a directory in which Poplar report files are found, the file window lists those files on the right-hand side. Note that if neither the archive.a file nor the graph.json file is present, the ‘Open’ button will be disabled, as one of these files is the minimal requirement for generating a report in the PopVision Graph Analyser. See the Poplar report files section above for details of how to generate each file, and what it contains.

  7. Once you’ve entered the directory with the necessary report files in it, click on the ‘Open’ button to load the report.

The SSH connection is constantly checked, and if, for any reason, it goes down, a warning dialog is displayed, letting you know.

The Summary Report is displayed first, and the progress bar along the top of the screen shows the files being loaded into the application, and the report data being analysed and prepared for display.

Notice that the bottom of the Summary Report shows the relevant files that have been found, and their loading state. More details on these files, and which reports need which of them, can be found in the Poplar report files section.

If you chose to create a passphrase when you created your SSH key, you’ll need to add the key to your SSH agent with the ssh-add command-line tool before the PopVision Graph Analyser can use it. In this situation, use ‘SSH Agent mode’ in the Preferences, as the PopVision Graph Analyser doesn’t directly support passphrases.

To configure the SSH agent, run the following from a terminal:

# Start the ssh-agent in the background.
eval "$(ssh-agent -s)"

# Add your SSH private key to the ssh-agent.
# On macOS, add the -K option (--apple-use-keychain on newer versions)
# to also store the passphrase in your keychain.
ssh-add ~/.ssh/id_rsa

Then restart the PopVision Graph Analyser, click Preferences and remove the path to your SSH private key. Make sure that SSH agent mode is set to “Automatically obtain ssh-agent socket path from environment”.

4. Viewing reports

The PopVision Graph Analyser displays interactive graphical and textual reports, and you can interact with these in a number of ways to get to the information you want. Each report has a few different options that are only relevant to that report, but they all share some features in common, as described below.

When you open a report, its file path is displayed in the title bar of the report window.

4.1. Using the side menu

When you’re loading some report data, and the Summary Report is displayed, the side menu becomes visible on the left-hand side of the application window. This contains buttons at the top for viewing each of the main report types, and the following buttons at the bottom:


Reload report - if you need to reload a report, particularly if any of the files from which it was generated have been updated from a recent execution of your program, click this button to re-import all the report files. See Reloading reports for more information.


Close report - once a report is loaded into the PopVision Graph Analyser, you can close it by clicking this option. This ‘unloads’ all the report data from the application and returns you to the opening page. If you want to view those reports again, you’ll need to re-load the data.


Documentation - this opens the documentation window (which you’re now reading). If you were viewing one of the report pages, the Documentation window opens up on the relevant page.


Minimise/Expand - this controls whether the side menu is expanded so that the menu option text is visible, or minimised, so that only the icons are visible.

4.2. Adjusting report size

There are several ways to change the size and scale of the report in the PopVision Graph Analyser window:

  • To zoom in and out of a particular section of the graph, click and drag horizontally in the graph preview area above the main graph, and the display will change to show the graph that corresponds to that section of the data. A pair of limiter icons appear in the preview area to show the start and end of the data displayed in the main graph area, and these can be dragged left and right as well to change the amount of data in the main graph. Using the scroll-wheel on your mouse will also zoom in and out of the graph.

  • You can also click and drag the main graph itself to view areas to the left and right of the currently viewed area. Note that clicking without dragging can sometimes select a specific tile (for example, in the Memory Report), but you can clear this selection from the input box above the graph.

  • You can reset the zoom scale of the Memory and Liveness reports by clicking on the small button to the left of the preview area, top-left of the graph. This zooms out to the furthest level, showing the entire graph.

  • To make a report larger, so that you can see more detail, you can drag the edges of the window to increase its size. This resizes the report images as you drag.

  • To adjust the space that each half of a report takes up on the page, click the ‘splitter’ icon between the two halves of the report, and drag it up and down. The two report sections resize accordingly.

4.4. Saving report images to disk

You can save report graphs to disk as image files, to avoid you having to make screen captures.

  1. Click on the ‘Save’ icon in the top right-hand corner of a report.

  2. Select the directory on your computer where you want to save it.

Report images are saved as PNG images with transparent backgrounds. Some image-viewing applications display a checkerboard pattern behind images saved in this way, which indicates that the background is transparent. When imported into other applications (for example, presentation software) the images look best on a light background (for images saved in ‘light’ mode) or a black background (for images saved in ‘dark’ mode).

5. Viewing a Summary Report

When you first open a report the Summary view is shown. This report shows the following information:

5.1. Program Information

The top half of the report shows details of the IPU system the program was compiled for, and also details of the size of the graph that Poplar created.

5.1.1. Target

  • Type: what kind of IPU the program was compiled for. This will either be IPU or IPUModel.

  • Architecture: what version of IPU was used to run the program. This will either be Mk1 or Mk2.

  • Number of IPUs: how many IPUs the program was compiled for.

  • Number of Replicas: if the number of replicas in the program is greater than one, the number is shown here.

  • Total Memory: total memory on all IPUs the program was compiled for.

5.1.2. Graph

  • Number of compute sets: how many compute sets were found in the program definition.

  • Number of edges: how many data input/output variables were found in the final Poplar graph, connecting the compute vertices.

  • Number of variables: how many variables Poplar created from the program description. These are called lowered variables to distinguish them from the variables you created by hand when defining the program.

  • Number of vertices: how many vertices (instances of compute codelets) Poplar created from the program description.

5.2. Host Information

The bottom half of the report shows details of the host machine’s software configuration.

This information is generated when using the gc-profile CLI.

Information in this part of the report is as follows:

5.2.1. Poplar SDK

  • Poplar version: the version of Poplar that was used for the program.

  • Framework: which machine learning framework was used.

  • Framework version: the version of the framework used.

5.2.2. Program

  • Command: the command line that was used to run the program

  • Return code: the return code of the program. A zero generally means that the program executed without errors.

  • Start time: the time the program was started.

  • End time: the time the program ended.

  • Duration: the total elapsed time the program ran.

5.2.3. System Information

  • Operating system: the type of operating system on the host machine used to compile the program (e.g. ‘Linux’).

  • Processor: the type of host machine processor used to compile the program (e.g. ‘x86_64’, indicating a 64-bit x86 processor).

  • Platform: The name and version number of the host machine’s platform designation, and its operating system.

  • Python version: the version number of the Python language interpreter used on the host machine.

  • CPU: the type of host machine processor used to compile the program (e.g. ‘x86_64’, indicating a 64-bit x86 processor).

5.2.4. Framework and Application JSON files

If you have a framework.json or an app.json file that was created in your program, their contents are displayed here so that you can check any parameters that you recorded in them. This is useful when comparing two reports, allowing you to spot differences easily.

You can put whatever information you wish into these two files, and if they’re found in the reports folder when the other report files are being loaded, their contents are displayed in a foldable tree. This assumes that the files are valid JSON.

5.2.5. Report files

This section of the Summary Report shows the folder from which the report files were loaded (or both folders, if you’re comparing reports). It also shows which individual files are being loaded into the PopVision Graph Analyser, as documented in Poplar report files. For each file:

  • A set of three green dots indicates the file was found, and is being analysed and loaded.

  • A green tick indicates that a file was found and has been loaded successfully.

  • A greyed-out question mark indicates that the corresponding file was not found.

  • A red cross indicates that the file could not be loaded. A warning message can be found in the Host Information section, directly above.


6. Viewing a Memory Report

The Memory Report shows a graphical representation of memory usage across all the tiles in your IPU system, showing graphs of total memory and liveness data, and details of variable types, placement and size.

There are two main areas of the Memory Report:

  • The Memory graph, in the top half of the window, which can show several different types of memory graph. Click on the ‘Graph Type’ drop-down menu at the top left-hand corner of the graph to select a graph type:

    • Total Memory graph, which shows the memory usage of your program across all the IPU tiles. You can view a breakdown of this data by region (whether to display interleaved and non-interleaved memory separately).

    • Liveness graph, which shows the memory usage of the two types of program variables: those that are always live, and those that are Not Always Live (note that this is a maximum value - see the relevant FAQ.)

    • Variables graph, which allows you to plot the memory usage of multiple individual variables.

    • Tile Map, which shows the memory usage of the physical tiles overlaid on a schematic of the IPU.

  • The Tile Memory Usage report, in the bottom half of the screen, which shows memory usage broken down by various categories, and memory maps of individual tiles.

You can choose various view options for each graph, and you can also click on the graph to view details for an individual tile.

6.2. Total Memory graph

This memory report shows the total memory usage across all the tiles on all IPUs.

  • On a Memory Report, select ‘Total Memory’ from the Graph Type menu.

The horizontal axis shows the tile number (which you can order by software or physical ID, see above), and the vertical axis shows the memory usage.

6.2.1. Report breakdown

IPU memory has two different types of memory region, which Poplar allocates to data depending on how that data needs to be accessed. The breakdown shows usage in each region, plus any memory that has overflowed:

  • interleaved, in which variables can be read and written to simultaneously.

  • non-interleaved, in which memory can only be either read from or written to at any one moment.

  • overflowed, memory that exceeds the maximum amount available on a tile.
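As a small worked example of the overflow category, the sketch below computes the overflow for one tile. It assumes that overflow is counted against the tile's total usage across both regions, and that the caller supplies the per-tile capacity. TileRegions and overflowBytes are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical per-tile totals, in bytes.
struct TileRegions {
  std::uint64_t interleaved;
  std::uint64_t nonInterleaved;
};

// Overflow is the part of a tile's total usage that exceeds the memory
// available on that tile; it is zero when the program fits.
std::uint64_t overflowBytes(const TileRegions &t, std::uint64_t tileCapacity) {
  std::uint64_t total = t.interleaved + t.nonInterleaved;
  return total > tileCapacity ? total - tileCapacity : 0;
}
```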

6.3. Liveness Memory graph

This memory report shows the memory usage of the two types of program variables.

  • On a Memory Report, select ‘Liveness’ from the Graph Type menu.

The two types of variables are:

  • Always Live variables - these variables are permanent, and present for the entire execution lifetime of the program, occupying the same address.

  • Max Not Always Live variables - these variables are temporary, and are created as needed by Poplar so that memory on the IPU can be reused efficiently. When they are no longer needed, their memory space is available for other temporary variables to use. Note that these values are maximum memory values - see the relevant FAQ.
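To see why Max Not Always Live is a peak rather than a total, consider a simplified model in which each temporary variable is live over a contiguous range of program steps. This is a minimal sketch under that assumption, and the TempVar struct is hypothetical, not a Poplar type. Two temporaries that are never live at the same step can share memory, so only the largest per-step sum matters.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of a temporary (Not Always Live) variable:
// its size and the inclusive range of program steps during which it is live.
struct TempVar {
  std::uint64_t bytes;
  std::size_t firstStep;
  std::size_t lastStep;
};

// Returns the peak memory used by temporary variables over `numSteps` steps:
// the largest per-step sum, not the sum of all temporaries.
std::uint64_t maxNotAlwaysLive(const std::vector<TempVar> &vars,
                               std::size_t numSteps) {
  std::vector<std::uint64_t> perStep(numSteps, 0);
  for (const auto &v : vars)
    for (std::size_t s = v.firstStep; s <= v.lastStep && s < numSteps; ++s)
      perStep[s] += v.bytes;
  return perStep.empty() ? 0
                         : *std::max_element(perStep.begin(), perStep.end());
}
```

Consider three temporaries of 100, 100 and 50 bytes, where the two 100-byte variables are never live at the same step. The peak is 150 bytes, not 250.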

6.4. Variables Memory graph

This memory report allows you to select multiple variables and plot their memory usage across tiles.

  • On a Memory Report, select ‘Variables’ from the Graph Type menu.

You can also access this feature from the ‘Total Memory’ report by selecting a variable from the Variables tab as described below.

The Variables Memory graph is not available when viewing memory by IPU.

6.5. Tile map Memory graph

This memory report displays a schematic of an IPU and overlays it with a coloured representation of the tile memory usage for every tile for the selected IPU. The colour key is displayed on the right, and its range can be changed, as described below.

The Tile Map Memory graph is not available when viewing memory by IPU.

  • On a Memory Report, select ‘Tile Map’ from the Graph Type menu.

  • Select an IPU to view using the input on the left, and the map updates to show the memory usage for that IPU.

  • Hover the mouse over the tile map to see a popup of the details of a tile within the selected IPU. This shows physical and software tile ID, memory usage and rank (described below). While you hover, a black line within the colour key, to the right of the tile map, shows the memory usage of the hovered tile, according to the colour scale currently selected.

  • Click on a tile on the map to select it and see its memory usage. Its details are shown in the tabs and tables below, as in other memory reports. You can select multiple tiles by holding down the shift key while clicking a tile. Details for each tile are displayed in the tables below, with a column for each tile. The selected tile numbers are displayed in the search box above the tile map, so you can enter them by hand if you know which one you’re looking for.

  • The Breakdown menu at the top of the tile map allows you to break down memory usage by region (see Report breakdown for an explanation of this). When breaking down by region, the ‘Region’ control to the left of the map allows you to choose to display interleaved or non-interleaved memory.

  • The Options menu above the tile map allows you to view the memory usage including or excluding gaps.

Note that you can change the size of the tile map by dragging the split-screen control, and it will fill the space available in the top half of the screen.

6.5.1. Changing the colour scale

There are three methods of colouring the tiles on an IPU that show their memory usage in different ways. Use the ‘Scale type’ dropdown to the left of the tile map to select one of these scales:

  • Relative (default) - The colour of a tile depends on where its memory usage falls between the lower and upper memory values.

  • Absolute - The colour of a tile depends on where its memory usage falls between zero and the maximum memory.

  • Rank - The colour depends on a linear ordering of tiles based on their memory usage.

When your model is out of memory, the colours are scaled to cover the actual memory usage, not just up to the maximum memory.
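The exact formulas behind the three scales aren't given here, so the following is only a minimal sketch of one plausible reading. It makes three assumptions. First, the relative scale runs from the least-used to the most-used tile on the selected IPU. Second, the absolute scale runs from zero to the maximum memory, and stretches to the largest usage when a tile overflows. Third, rank is the fraction of tiles that use less memory. ScaleType and colourPosition are hypothetical names, and the function returns a position between 0 and 1 on the colour key.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class ScaleType { Relative, Absolute, Rank };

// Hypothetical mapping from a tile's memory usage to a position in [0, 1]
// on the colour key. `usage` holds every tile's usage on the selected IPU;
// `tile` must be a valid index into it.
double colourPosition(const std::vector<std::uint64_t> &usage, std::size_t tile,
                      std::uint64_t maxMemory, ScaleType scale) {
  const double value = static_cast<double>(usage[tile]);
  switch (scale) {
    case ScaleType::Relative: {
      // Position between the least-used and most-used tiles.
      auto [lo, hi] = std::minmax_element(usage.begin(), usage.end());
      if (*hi == *lo) return 0.0;
      return (value - static_cast<double>(*lo)) / static_cast<double>(*hi - *lo);
    }
    case ScaleType::Absolute: {
      // Zero to max memory, stretched if any tile is out of memory.
      std::uint64_t top =
          std::max(maxMemory, *std::max_element(usage.begin(), usage.end()));
      return top == 0 ? 0.0 : value / static_cast<double>(top);
    }
    case ScaleType::Rank: {
      // Fraction of the other tiles that use less memory than this one.
      auto below = static_cast<std::size_t>(std::count_if(
          usage.begin(), usage.end(),
          [&](std::uint64_t u) { return u < usage[tile]; }));
      return usage.size() <= 1
                 ? 0.0
                 : static_cast<double>(below) / static_cast<double>(usage.size() - 1);
    }
  }
  return 0.0;
}
```

For example, take three tiles using 100, 200 and 300 bytes with a maximum of 600 bytes. The middle tile sits at 0.5 on the relative and rank scales, and the busiest tile sits at 0.5 on the absolute scale.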

6.6. Tile memory usage

The bottom half of the Memory Report screen shows three tabs that contain an analysis of memory usage by several different categories.

  • The default view shows memory usage for all tiles (or IPUs, if you are choosing to plot by IPU instead of tile), but you can select an individual tile/IPU as described above.

6.6.1. Tile memory usage: Details tab

The Details tab in the tile memory usage report displays a hierarchical list of memory usage by category on the selected tiles. This list is divided into three main sections:

  • Including Gaps - this shows memory usage on the selected tiles which includes the gaps between variables.

  • Excluding Gaps - this shows memory usage on the selected tiles which excludes the gaps between variables. It is split into interleaved/non-interleaved memory and also categorised by the type of data in that memory location.

  • Vertex Data - this shows the memory used by variables in the graph vertices as the Poplar program executes, categorised by the types mentioned in the ‘Excluding Gaps’ section, below.

Excluding Gaps

Memory usage on the selected tiles is displayed here in two categories, with memory usage figures for each:

  • by Memory Region - this shows memory that is non-interleaved, memory that is interleaved, and any memory that has overflowed.

  • by Data Type - this shows memory further categorised by the type of data that is stored there (either overlapping data or non-overlapping data). The meaning of each of these categories is explained in the table below.

Not Overlapped data

This is data that is permanently stored in the same memory location throughout the execution lifetime of the program. No other variable can ‘overlap’ it in this memory location.

Variables

These are all the ‘always live’ variables that are created by the program and the results of its compilation. They exist at that location for the execution lifetime of the program, including all user variables.

Internal Exchange Message Buffers

During the exchange phase of program execution, it may not be possible to send data straight to its destination, so this data is stored in a buffer until it can be sent at the next available time. For example, you may want to send only two bytes, but the exchange has a minimum granularity of four bytes. Poplar needs to store those two bytes somewhere, and copy them to their destination. Another example is performing data rearrangements (e.g. a transpose), where you can’t use the exchange to do all the data movement, because this might increase the size of the exchange code. The Poplar compiler does some analysis which estimates whether it’s worth moving data to one of these buffers and using a transpose vertex.

Constants

These are variables that are created using addConstant(), and are permanently located at the same memory location. See the note at the bottom of this table, however, for a possible exception.

Host Exchange Packet Headers

When transferring data to and from the host, the PCI interface needs to know the destination and size of the data that it should expect before it’s actually sent. This variable is where that information is stored. This is distinct from an Internal exchange, where you only need to indicate where the data is to be sent, and the compiler automatically schedules the destination tile to expect it when it’s due.

Stack

This is where the program stack variable lives, and it’s created automatically when Poplar executes. There is a Stack variable on every tile. The stack size is one of the options in the Engine class.

Vertex Instances

These store the state of each vertex instance, created as a result of a variable being connected to the vertex with the connect method. The size of each entry is sizeof(vertex).

Copy Descriptors

Similar to vertex state, above, but for the internal runtime copies. These are shared amongst multiple copies (unlike vertex state, where each gets its own piece of memory). For example, if you have two Copy() calls that are identical, then instead of storing the source and destination pointers twice, both vertex states can just hold a pointer to the single copy descriptor.

VectorList Descriptors

A vector of pointers that points to the values of a multi-dimensional vector. The data for VectorList<Input<...>, DeltaN> fields.

Vertex Field Data

Variable-sized fields, e.g. the data for Vector<float>, and Vector<Input<...>> fields.

Control Code

This category is deprecated.

Vertex Code

This is where the assembled code from the Vertex ‘codelets’ is stored. It is always stored in a separate memory bank from all the other variable data so that it cannot be overwritten during execution.

Internal Exchange Code

The code instructions used to move data between tiles on an IPU.

Host Exchange Code

The code instructions used to move data between an IPU and the host machine.

Instrumentation Results

If you set up the debug.instrument option in the Poplar Engine, this is where the cycle counts for various Poplar functions are stored. You’ll notice, therefore, that enabling instrumentation increases your memory usage. Different levels of instrumentation can be selected, which will use different amounts of memory. Note that the size of these variables is dependent on the level of dynamic branching in your program – if you’re timing every instance of a function call, the compiler won’t necessarily be able to tell in advance how much memory it will require to keep a cycle count for each of them.

Overlapped data

This is data that is Not Always Live, meaning that it is temporary, and its memory location can be overwritten by other not-always-live variables when needed. Reusing memory in this way reduces the amount that is required by Poplar programs.

Variables

These are all the temporary, not-always-live variables that are created by the program as it executes.

Program & Sync IDs

The Poplar Engine has a run() method to which you pass a vector of programs you want to execute. Each of these programs has an ID, so that you can specify which one to execute first. That integer variable lives on the IPU, and is categorised as a Program ID. Similarly, the Sync ID is used to store information about dynamic program structures, to allow the IPU to keep track of the path.

Internal Exchange Message Buffers

Used to store messages for tile-to-tile exchange within an IPU.

Host Exchange Message Buffers

Used to store messages for PCI data exchange between the IPU and host machine.

Data Rearrangement Buffers

Used to store intermediate data when performing rearrangement operations such as transpose, or where the data size doesn’t correspond to the granularity of the exchange. Using this memory space avoids the need for multiple PCI sends.

Non-overlapped variables are almost guaranteed to have been created by the Poplar addConstant() function, because they must remain ‘live’ for the execution lifetime of the program. Variables created by addVariable() will almost always end up in Overlapped data, because the compiler can identify in advance when it will need memory space for that variable, and when it can be overwritten by other variables. However, if you use addVariable() to create, for example, a tensor that is only ever read (and not written to) it will behave like Non-overlapped data (because it must remain ‘live’), despite being categorised by the report as ‘Overlapped’.
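To see why reusing memory for overlapped data reduces the amount needed, the sketch below compares the bytes required if every variable kept its own location (as non-overlapped data does) with the peak number of bytes live at any one step. It is a simplified model under stated assumptions: lifetimes are hypothetical (size, first step, last step) triples, and alignment, fragmentation and the placement constraints described later in this section are ignored, so the peak is only a lower bound on what the compiler can achieve.

```python
def memory_without_reuse(variables):
    """Bytes needed if every variable keeps its own location for the whole run."""
    return sum(size for size, _start, _end in variables)


def peak_live_memory(variables, num_steps):
    """Most bytes live at any single step: the best case when memory is reused.

    Each variable is (size_bytes, first_step, last_step), lifetimes inclusive.
    Ignores alignment, fragmentation and placement constraints.
    """
    live = [0] * num_steps
    for size, start, end in variables:
        for step in range(start, end + 1):
            live[step] += size
    return max(live)
```

For example, variables of 100, 200 and 50 bytes that are live over steps 0-1, 1-2 and 2-3 need 350 bytes without reuse, but only 300 bytes at their peak.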

6.6.2. Tile memory usage: Vertices tab

This tab in the tile memory usage report lists the memory used by the graph vertices, together with the total memory size they occupy across the selected tiles (or all tiles, if none is selected). This list is ordered by decreasing memory usage.

For each vertex, the Poplar namespace and function name are listed, together with any additional information about their types. Please refer to the Poplar API Reference for a description of each of these functions.
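The ordering of the Vertices tab can be pictured as follows: sum each vertex type's memory over the selected tiles (or all tiles, if none is selected) and sort the totals in decreasing order. This is a hedged sketch; the input layout (a dictionary of per-tile byte counts) and the vertex names used with it are hypothetical, not a PopVision file format.

```python
from collections import defaultdict


def vertex_memory_totals(per_tile, selected_tiles=None):
    """Sum each vertex type's bytes over the chosen tiles, largest first.

    per_tile maps tile id -> {vertex name: bytes on that tile}.
    With no selection, all tiles are included.
    """
    tiles = selected_tiles if selected_tiles else per_tile.keys()
    totals = defaultdict(int)
    for tile in tiles:
        for name, nbytes in per_tile.get(tile, {}).items():
            totals[name] += nbytes
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```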

6.6.3. Tile memory usage: Variables tab

This tab in the tile memory usage report displays a memory map of the currently selected tile, showing code and variable usage across the memory locations. Note that this tab is not available when viewing memory by IPU.

Entries in this section of the report are only present if the vars.capnp file was present in the report files directory. See the Report files section for details about how to generate this file when executing your program.

  • Select an individual tile as described above to view its memory layout and variable usage.

There are several interactive features of the Variables view that can help you find the locations in which variables are stored:

  • The selected tile’s memory is displayed vertically in a scrollable area that is 128 bytes wide. The tile memory is partitioned into banks, with vertex code always occupying bank 0. Any unused banks at the top of the IPU memory are not displayed. Variables are displayed as coloured bars which span the memory locations, and in places where two or more variables overlap, only the largest is shown.

  • All variables in the memory layout are coloured according to their type. Click the colour key icon in the top right-hand corner to view the colour key for each type. The meaning of each of the categories displayed here is described in the table above.

  • Click on a variable to display its details, which appear on the right-hand side of the memory layout. This displays all variables which exist at any time at that memory location.

  • Click on the ‘Show’ button at the bottom of the variable details, beneath the ‘Interference’ heading, to filter other variables that interfere with the selected variable in terms of memory placement. See the Memory interference section below.

  • Search for a variable by entering search text into the input field above the memory layout. Variables with matching text in their names will be highlighted in the memory layout, with all others disappearing. You can clear any text you’ve entered here by hovering over the box and clicking the small x icon at the right-hand end.

  • Plot one or more variables on the Memory graph, as described below.

You can expand the variables map to fill the report window by clicking on the arrow button in the top right-hand corner of the map. A corresponding arrow to shrink the map again appears in the top-right corner.
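Because the memory map is drawn 128 bytes wide, a variable's bar starts on the row given by its address divided by 128 and may continue onto later rows. The sketch below computes that, assuming addresses are byte offsets from the top of the displayed map; `rows_spanned` is a hypothetical helper, and memory banks (including the hidden unused ones) are not modelled.

```python
ROW_WIDTH = 128  # bytes shown on each row of the Variables memory map


def rows_spanned(start_address, size_bytes, row_width=ROW_WIDTH):
    """Return (first_row, last_row, first_column) for a variable's bar.

    Assumes start_address is a byte offset from the top of the displayed map.
    """
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    last_byte = start_address + size_bytes - 1
    return (start_address // row_width,
            last_byte // row_width,
            start_address % row_width)
```

For example, a 100-byte variable at offset 200 starts at column 72 of row 1 and ends on row 2.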

Memory interference

You can see other variables with which a selected variable ‘interferes’. These are variables that are in contention in terms of their memory placement. Memory interference arises because some variables are restricted in the type of memory in which they can be stored:

  • A variable cannot be in the same memory element as another one.

  • A variable cannot be in the same memory region as another one.

  • A variable must be stored in interleaved memory.
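The first two restrictions can be thought of as "these two variables must not share a unit of memory", where the unit is an element or a region, and the third as "this variable must lie inside the interleaved address range". The sketch below checks placements against rules of that shape. The element size, region size and interleaved range are hypothetical placeholders, not real IPU figures, and the functions are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical sizes for illustration only; not real IPU figures.
ELEMENT_BYTES = 64
REGION_BYTES = 4096


@dataclass(frozen=True)
class Placement:
    name: str
    start: int  # byte address on the tile
    size: int   # bytes


def units_touched(p, unit_bytes):
    """Indices of the fixed-size units (elements or regions) a variable touches."""
    first = p.start // unit_bytes
    last = (p.start + p.size - 1) // unit_bytes
    return set(range(first, last + 1))


def interferes(a, b, unit_bytes):
    """True if a and b share at least one unit, breaking a 'cannot share' rule."""
    return bool(units_touched(a, unit_bytes) & units_touched(b, unit_bytes))


def in_interleaved(p, lo, hi):
    """True if p lies entirely inside the (hypothetical) interleaved range [lo, hi)."""
    return lo <= p.start and p.start + p.size <= hi
```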

To see which other variables interfere with a selected variable:

  • Select a variable from the memory map by clicking on it. Several variables may occupy that memory location, and their details are displayed in a list on the right-hand side.

  • Click on the ‘Show’ button at the bottom of a variable’s details, beneath the ‘Interference’ heading, and the variables in the memory map will be filtered using that variable’s name, showing only those that interfere with it.

  • To re-display all variables, click the small cross in the filter box at the top of the variable memory map display.

6.6.4. Variable types

Variables in the memory layout diagram are categorised by colour as described below. Note that more detailed descriptions are available in the Excluding Gaps table, above.

  • User variables - these are user-defined variables.

    • Variable - variables created using the addVariable() Graph function.

    • Constant - variables created using the addConstant() Graph function.

  • Code variables - these are variables that are created by Poplar to execute the program code.

    • Control Table - used by Poplar to keep track of the program execution path in compute sets.

    • Control Code - the code used by Poplar to run the program.

    • Vertex Code - the code within the vertex ‘codelets’.

    • Internal Exchange Code - the compiled code used to move data between tiles on an IPU.

    • Host Exchange Code - the compiled code used to move data over PCI between an IPU and the host machine.

    • Global Exchange Code - the compiled code used to move data between multiple IPUs.

  • Vertex data variables - these are variables associated with tensors.

    • Vertex Instance State - the internal state of each vertex instance.

    • Copy Descriptor - shared memory for identical copy descriptors that are pointed to from within vertices.

    • Vector List Descriptor - vector of pointers that points to the values of multi-dimensional vectors.

    • Vertex Field Data - static variables that are not embedded into the state of vertex instances.

  • Temporary variables - these are used to temporarily store information that Poplar uses while executing the program code. They are ‘not always live’, overlapping variables.

    • Message - messages sent between tiles, IPUs and the host.

    • Host Message - messages sent between a tile and the host.

    • Global Message - messages sent between IPUs.

    • Rearrangement - variables used to store intermediate values when performing tensor rearrangements (e.g. transpose) to save having to make lots of individual message sends.

    • Output Edge - the contiguous piece of memory that a vertex writes to if not writing directly to a tensor.

  • Miscellaneous variables - other variables that don’t fit into the categories above.

    • Multiple - other variables from multiple other categories, including all the ‘lowered’ variables that Poplar creates in addition to user-defined ones created in vertex code.

    • Control ID - the combination of program ID, sync IDs and software sync counter.

    • Host Exchange Packet Header - header information for PCI messages between the IPUs and the host machine.

    • Global Exchange Packet Header - header information for PCI messages between IPUs.

    • Stack - program stack for each tile core.

    • Instrumentation Results - the cycle counts if instrumentation is enabled.

    • Shared Code Storage - storage for code shared amongst tiles.

    • Shared Data Storage - storage for data shared amongst tiles.

    • Shared Structure State - constant data used for the shared structure exchange.

6.6.5. Plot multiple variables

When a variable is selected in the Variables tab, its details are displayed in the right-hand column next to the memory map for that tile. You can then plot that variable’s position in memory on the main graph, as follows:

  • With the variable selected, click on the ‘Plot variable’ button. The ‘Graph Type’ menu at the top changes to ‘Variables’. The variable name is added to the variable list above the report, and you can see how it is placed across the tile memory.

  • Click other variables in the memory map, and you can repeat the process above, adding them to the variable list at the top of the report, and displaying their memory placement together on the same graph. The default behaviour shows the size of the variable on each tile. If you select ‘Plot variable by address’ from the Options menu at the top of the screen, you can see how the variable is laid out in the memory space.

  • To remove a variable from the graph, find its name in the list above the graph, then click the small ‘x’ button at the right-hand end.

6.6.6. Full-screen option

When viewing the content of the Variables tab it may be easier to view the data in full-screen mode. You can toggle this option on and off using the arrow button in the top right hand corner of the tab.

7. Viewing a Liveness Report

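The Liveness Report shows which of the ‘Not Always Live’ variables are allocated at certain points in your program, and gives a detailed breakdown of variable memory usage by the compute set that they’re in. There are two main areas to the report:

  • the Liveness graph, in the top half of the screen, showing memory usage at each compute set,

  • the Liveness stack details, in the bottom half of the screen, showing a tabbed list of the live variables and their memory usage.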

For a standard machine learning model that you’re training, for example a ResNet, you’ll generally see a curve that rises to a peak and falls again. The rising portion of this curve is the memory usage during the forward pass of the training algorithm, where many activations (‘not always live’ variables) are created, and the peak represents the point where the maximum number of activations exist. The falling portion represents the backward pass of the training algorithm, during which the activations are ‘released’ after being used to update the weights.

When you’re inspecting a liveness graph, it’s informative to look at the peak of this curve, and select the corresponding compute set to show how variable usage contributes to the greatest memory utilisation.
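Finding the peak amounts to locating the compute set with the most live memory. The sketch below assumes the liveness data is available as a hypothetical list of (compute set name, bytes) pairs in execution order; the names are invented for illustration.

```python
def peak_compute_set(liveness):
    """Return (index, name, bytes) of the compute set with the most live memory.

    liveness is a list of (compute_set_name, not_always_live_bytes) in
    execution order; ties resolve to the earliest compute set.
    """
    index = max(range(len(liveness)), key=lambda i: liveness[i][1])
    name, live_bytes = liveness[index]
    return index, name, live_bytes
```

With a typical training curve, the returned index falls between the last forward-pass compute set and the first backward-pass one.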

7.2. The Liveness graph

  • Hover your mouse pointer over the graph to see a callout containing the compute set ID, together with the amount of memory used by that compute set.

  • Click on the graph to see that compute set’s stack details below the graph, which displays its name and memory size.

  • You can choose to display memory usage statistics for the selected tiles - see the relevant Preferences section, below.

  • You can plot the lifetime of Not Always Live variables directly on the graph - see the Not Always Live Variables section below for details. This is an experimental feature.

7.2.1. Selecting a source

You can choose whether you want to select an individual tile, or, if the report was generated using Poplar SDK 1.2 or later, select an individual IPU.

To select particular tiles or IPUs:

  • Click on the ‘Select Source’ dropdown list at the top of the graph and select the source you want to use for the graph. Each source will have its own plot.

  • All tiles - shows liveness data for all the tiles (the default)

  • Worst tiles - shows the two tiles that have the highest memory usage during program execution, and you can select either of them to view.

  • If the report was generated in Poplar 1.2 or later, you can also select one or more IPUs from this list.

There is a Poplar Engine setting which allows you to capture more than the default two worst tiles. Please refer to the Poplar API Reference for full details of these options.
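The ‘Worst tiles’ selection can be pictured as ranking tiles by their peak live memory and keeping the top two. This is a sketch under the assumption that per-tile liveness is a list of live bytes per program step; the data layout is hypothetical, and the `n` parameter only mirrors the idea that more than two worst tiles can be captured.

```python
import heapq


def worst_tiles(tile_liveness, n=2):
    """Ids of the n tiles with the highest peak live memory, worst first.

    tile_liveness maps tile id -> list of live bytes at each program step.
    """
    return heapq.nlargest(n, tile_liveness, key=lambda t: max(tile_liveness[t]))
```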

7.2.2. Filtering steps

You can concentrate on the steps you’re interested in by filtering on a particular search term. When you enter a term and press the Return key, those steps that don’t match in the execution graph are moved to a separate dataset in the graph, and ‘greyed-out’, leaving only the steps whose names match your search term.

To cancel the filtering, click on the small ‘x’ at the right-hand end of the search box.
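The filtering described above splits the steps into two datasets: those whose names contain the search term and those that are greyed out. A minimal sketch follows; case-insensitive substring matching is an assumption, since the exact matching rules are not described here, and the step names are hypothetical.

```python
def split_by_search(step_names, term):
    """Split steps into (matching, greyed_out) lists, preserving their order.

    Assumption: a step matches if its name contains term, ignoring case.
    """
    needle = term.lower()
    matching, greyed_out = [], []
    for name in step_names:
        (matching if needle in name.lower() else greyed_out).append(name)
    return matching, greyed_out
```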

7.2.3. Viewing options

There are several options for viewing Liveness Report graphs, which you can select from the ‘Options’ drop-down menu in the top right-hand corner of the report screen:

  • Include always live - whether to include a second trace in the graph that shows variables that are always present during the entire execution of the program, and always have the same memory address (for example stack). Note that you can show and hide each of the traces on the graph by clicking on their colour key, just below the x-axis.

  • Include empty stacks - whether to include stacks that contain no variables.

  • Stack IPUs - whether to display any selected IPUs (see above) stacked or not.

  • Show Max Memory - whether to display a line on the graph showing the maximum memory limit for total memory, a selected tile, or a selected IPU. Note that this requires the ‘Include Always Live’ option, above, to be enabled, and ‘Stack IPUs’ to be disabled.

7.3. Liveness stack details

In the bottom half of the Liveness report screen, you’ll see a tabbed list of live variable details that show their memory usage. These details are described below.

7.3.1. Viewing enhanced debug information

If you have captured enhanced debug information when compiling your program, it is visible in the variables tabs. See the section on the Debug Information file, above, for details of how to capture this information.

If a variable has a small ‘disclosure arrow’ next to it, click it to display more details about it. This includes:

  • whether it’s a Variable, Cloned Variable or Constant,

  • the shape of the variable as an array of dimensions,

  • the type of variable (e.g. ‘Half’),

  • for cloned variables, which variable it was cloned from, and the method by which it was cloned,

  • the location of the variable in your code.

7.3.2. Always Live Variables

This tab shows the permanent variables that always occupy the same memory location, and are present throughout the lifetime of the program execution.

7.3.3. Not Always Live Variables

This tab shows the temporary variables that are created throughout the execution of the program, and whose memory is overwritten by other temporary variables when they are no longer needed.

You can select one or more of the Not Always Live variables to see their lifetime displayed on the graph above. Click on the small icon at the right-hand end of their listing, and they will be added to the plot, showing when they were created, and when they were destroyed. Each variable plotted also appears as a small blue box above the graph, and you can click the small x within them to remove them from the graph. This is an experimental feature.

7.3.4. Vertices

This tab shows the vertex functions that are contained in the selected compute set.

7.3.5. Cycle estimates

This tab shows the cycle estimates of each tile on an IPU. This is only available when using an IPUModel.

This is an experimental feature, and can be enabled and disabled in the Preferences.

8. Viewing a Program Tree

The Program Tree report shows a hierarchical view of the steps in the program that is run on the IPU. The report has a menu on the left-hand side that lists the Control Programs and Functions. The program steps contained within the selected control program or function are displayed in the main report area, on the right.

  • Click on one of the Control Program or Function numbers in the left-hand column to see the sequence of instructions that are contained within it.

The program steps in the main report area are listed hierarchically, and you can collapse or expand steps that have nested steps within them by clicking on them. A small grey triangle at the start identifies steps that have sub-steps: if it’s pointing to the right, then that step is ‘collapsed’, and all of its sub-steps are currently hidden. Click the step to expand it and show all its sub-steps. All steps are initially shown expanded.

Each of the program steps is colour-coded, making it easier to ‘pick out’ particular steps that you’re interested in viewing. Where steps correspond to those found in the Execution Trace report, they are coloured the same.

For a detailed explanation of what each of these program steps involve, please refer to the Poplar & PopLibs API Reference Guide on the Graphcore Developer Documentation website.

8.1. Searching for program steps

You can search for a particular step by entering some text in the Search box at the top of the report window, and then pressing the Return key. Program steps that match your search term are highlighted in yellow in the main part of the report. Control programs and Functions that contain matching steps are also highlighted in the left-hand window - click on them to see the matching steps within.
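Searching the tree amounts to a depth-first walk that records each matching step and the control programs or functions containing it, which is what the left-hand highlighting reflects. In this hedged sketch the tree is a hypothetical nested structure of (name, children) pairs, matching is assumed to be case-insensitive, and the step names are invented.

```python
def search_program_tree(programs, term):
    """Depth-first search of a program tree for steps whose names contain term.

    programs maps a control program or function label to its top-level steps;
    each step is a (name, children) pair. Returns (matches, containing), where
    matches lists (label, slash-joined path) and containing lists the labels
    that hold at least one match.
    """
    needle = term.lower()
    matches, containing = [], []

    def walk(label, steps, path):
        found = False
        for name, children in steps:
            here = path + [name]
            if needle in name.lower():
                matches.append((label, "/".join(here)))
                found = True
            if walk(label, children, here):  # always search nested steps too
                found = True
        return found

    for label, steps in programs.items():
        if walk(label, steps, []):
            containing.append(label)
    return matches, containing
```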

8.2. View options

You can choose to view vertices for selected program steps, where available. Enable ‘Select Vertices’ from the Options menu. This is an experimental feature.

9. Viewing an Execution Trace

The Execution Trace report shows the output of instrumenting a Poplar program, capturing the cycle count for each step in the program. Also displayed are statistics, tile balance, cycle proportions and compute-set details.

There are two halves to this report:

  • the Execution Trace graph, in the top half of the screen, showing either a flat compute set execution trace, or a flame graph of the stack,

  • a details report, in the bottom half of the screen, showing statistics about the cycle proportions, tile balance and compute sets present in the portion of the graph currently displayed.

9.1. The Execution Trace graph

The top half of the Execution Trace report shows, by default, a ‘flat graph’, showing a set of consecutive blocks for each IPU, identifying what program steps were executed (as shown in the Program Tree report), and how many cycles they took. You can also view it as a flame graph, where the Poplar debug information is used to group compute sets together as part of the same operation, or application layer.

  • Select the features that you want in the graph from the Graph View drop-down, including Flame graph, BSP Trace, and the display of separate runs of the program.

  • You can move around the main report graph using your mouse, as described above.

  • Hover your mouse pointer over the graph to see a callout containing the compute set ID, together with the amount of memory used by that compute set.

  • Click on the graph to see that compute set’s stack details below the graph, which displays its name and memory size.

  • Double-clicking on a layer when the flame graph option is selected expands that layer to the full width of the graph. You can select two layers at once by clicking on the first one, then shift-clicking the second one. The graph then expands to contain just those two layers. This makes it possible to inspect the cycle proportions of only the visible section of the execution trace, making it easier to understand the proportion of cycles spent on each type of process.

9.1.1. Execution View

Use the ‘Execution view’ dropdown list in the top left-hand corner of the Execution report to control what features you’d like to see on the graph. These include:

  • Runs - this setting can be toggled to display or hide the inclusion of program run information on the graph. When enabled this displays, just below the mini-map, a set of dark grey markers that indicate when each program run starts and ends. On the graph itself, a bar at the top of each IPU ‘lane’ shows the names of each program that runs. See Defining program names, below.

  • Flat & Flame - these settings toggle between a ‘flat’ view, where all the program steps are compressed into a single ‘lane’ on the graph, or a ‘flame’ view, where the call structure of all steps is displayed. Note that you can also control how overlapping steps are displayed with the Separate Overlapping Steps control in the Options menu.

  • BSP - whether to include a graphical depiction of the BSP activity, showing where the patterns of the IPUs’ internal Sync, Compute and Exchange steps occur.

The current combination of settings is displayed on the drop-down button itself.

Defining program names

You can specify program names in your programs by using the Poplar or PopART APIs. The Execution trace graph can display this information by enabling the Runs option in the Execution View drop-down control above the graph.

  • The Poplar Engine::run API now takes a debug string for the name of the run.

  • The PopART Session::run API allows you to specify a string for the name of the run, as well as additional strings for internal program runs, for example: WeightsToHost.

If you enable the display of runs in the graph, but no run name was provided for a run, a sequentially numbered default name is generated, for example: Engine::run #5.
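The default-naming rule can be sketched as filling in a sequential name wherever a run has no name. The prefix comes from the example above, but counting from 0 and numbering by position among all runs are assumptions; the tool's actual numbering may differ.

```python
def label_runs(run_names, prefix="Engine::run"):
    """Replace missing run names (None or "") with a sequential default name.

    Assumption: runs are numbered by their position, counting from 0.
    """
    return [name if name else f"{prefix} #{index}"
            for index, name in enumerate(run_names)]
```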

9.1.2. Selecting IPUs

The menubar above the graph contains a drop-down list named ‘Select IPU’ which allows you to select all IPUs or any individual IPU.

9.1.3. Filtering steps

You can concentrate on the steps you’re interested in by filtering on a particular search term. When you enter a term and press the Return key, those steps that don’t match in the execution graph are ‘greyed-out’, and you can cycle through the matching steps using the arrow keys below the search box.

To cancel the filtering, click on the small ‘x’ at the right-hand end of the search box.

This feature is currently only available on the ‘flat’ execution graph.

9.1.4. Viewing options

There are several options for viewing Execution trace graphs, which you can select from the ‘Options’ drop-down menu in the top right-hand corner of the report screen:

  • Show Tile Balance - this shows what percentage of tiles are in use during that program step, visible as shading in each step.

  • Show Terminals - whether to include a line at the right-hand end of each process in the graph.

  • Group Executions - whether to group executions together that are part of the same ‘higher’ process further up the call stack. Grouping is determined by the slash-delimited function calls that are logged to the execution trace profile output.

  • Group Syncs - whether to group multiple successive Sync processes together in the graph.

  • Show Text - whether to show the name of each step in the graph.

  • Separate Overlapping Steps - whether to split up overlapping program steps into separate, non-overlapping ‘lanes’ in the graph so that they can all be seen at once.

  • Show External Syncs - whether to display External Sync steps in the graph. There are often many of these, and hiding them may make the execution graph plot easier to understand in some cases.

  • Show Sync Ans - whether to display Sync Ans steps in the graph. As for External Syncs, above, hiding these steps may simplify the graph plot.
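The Group Executions option relies on the slash-delimited names logged in the profile. A sketch of that idea groups consecutive steps that share the same leading name components; grouping only consecutive steps and the `depth` parameter are assumptions made for illustration, and the step names are hypothetical.

```python
from itertools import groupby


def group_executions(step_names, depth=1):
    """Group consecutive steps whose names share the first `depth` components.

    Names are slash-delimited call paths, e.g. "layer1/conv/fwd".
    """
    def prefix(name):
        return "/".join(name.split("/")[:depth])

    return [(key, list(group)) for key, group in groupby(step_names, key=prefix)]
```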

You can also view the colour key that the execution trace uses by clicking the key icon in the top right-hand corner of the graph.

9.2. Execution Trace details

The bottom half of the Execution Trace report shows more details about the execution trace. It includes:

  • the Summary tab, which shows statistics, cycle proportions and tile balance (experimental).

  • the Details tab, which shows details of a selected process from the execution trace graph.

9.2.1. Summary tab

This tab provides an overview of the portion of the execution trace currently displayed in the graph in the top half of the page.

Statistics

The statistics displayed are:

  • Cycles - the number of cycles in the visible section of the execution trace.

  • Rx / Tx - for Stream Copy and Global Exchange steps, the amount of data transmitted and received during the operation.

Cycle proportions

A bar is displayed for each IPU that shows a graphical representation of the proportion of cycles that are taken executing each type of compute set. If you hover your mouse over these bars, you’ll see a key that shows what process each colour represents, as follows:

  • Internal Sync - the sync process that occurs between each tile on an IPU as part of the BSP process.

  • External Sync - the sync process that occurs between each IPU as part of the BSP process. External syncs are also used in some host-device communication situations, where the IPUs all need to synchronise with an event outside their boundaries, for example a flow control step in the host program. This sync is carried out using 1 IPU only.

  • On Tile Execute - the vertex compute code executed on each tile.

  • Do Exchange - the tile-to-tile data exchange within an IPU.

  • Global Exchange - an IPU-to-IPU data exchange.

  • Stream Copy - a data exchange between an IPU and the host machine over PCI.
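The cycle proportions bar for each IPU is, in effect, the cycles spent on each step type divided by that IPU's total cycles. The sketch below computes those fractions from a hypothetical list of (IPU, step type, cycles) records; it is not PopVision's implementation.

```python
from collections import defaultdict


def cycle_proportions(steps):
    """Fraction of each IPU's cycles spent on each step type.

    steps is an iterable of (ipu, step_type, cycles) records.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for ipu, step_type, cycles in steps:
        totals[ipu][step_type] += cycles
    proportions = {}
    for ipu, by_type in totals.items():
        total = sum(by_type.values())
        proportions[ipu] = {
            step_type: (c / total if total else 0.0)
            for step_type, c in by_type.items()
        }
    return proportions
```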

9.2.2. Details tab

When a program step is selected in the flat graph, or a layer is selected in the flame graph, a list of program steps, with further details, is shown here. Many of the details are the same across the different types:

  • Cycles - the number of cycles on active tiles that the program step used to execute,

  • Active Tiles - the number of tiles involved in executing that program step,

  • All Cycles - the number of cycles on all tiles, with additional statistics.

  • Tile Balance - a measure of how efficiently the program step is spread across the tiles. See View Options for more details.

  • Active Tile Balance - this is a recalculation of the tile balance measurement above, but excluding those tiles that do nothing.

Internal Sync

This is a sync process between tiles on an IPU.

External Sync

This is a sync process between IPUs.

Sync ANS

This is an internal Automatic, Non-participatory Sync process. A tile can pre-acknowledge a number of internal/external syncs using the ‘sans’ instruction. The Sync ANS instruction will wait until all those pre-acknowledged syncs actually happen.

On Tile Execute

This is a piece of vertex code being executed on a tile. In addition to the common information listed above, the following is displayed:

  • By Vertex Type - this shows which vertex types are involved in the execution.

Below these details, an interactive plot shows how the selected program step uses cycles on each tile as it executes. For DoExchange programs, there is also a plot of the data received and transmitted during the program's execution.

Do Exchange

This is an exchange process, where data is exchanged between tiles on the same IPU. In addition to the common information listed above, the following is displayed:

  • Total Data - the total amount of data transferred during the exchange,

  • Data Transmitted - the amount of data transmitted during the exchange,

  • Data Received - the amount of data received during the exchange,

  • Data Balance - the mean amount of data exchanged divided by the maximum amount of data exchanged,

  • Exchange Code - the size of the variable that holds the code for performing the exchange,

  • Source Variables - a truncated list of the variables from which data was sent in the exchange,

  • Destination Variables - a truncated list of the variables to which data was sent in the exchange.
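Data Balance is a simple ratio, and the Global Exchange and Stream Copy details use the same definition. A minimal sketch, assuming the mean and maximum are taken over the per-tile byte counts of the tiles involved (the `data_balance` helper and the input values are hypothetical):

```python
from statistics import mean


def data_balance(bytes_per_tile):
    """Mean bytes exchanged divided by the maximum bytes exchanged on any tile.

    1.0 means every tile moved the same amount of data; lower values mean the
    data is concentrated on fewer tiles.
    """
    peak = max(bytes_per_tile)
    if peak == 0:
        return 1.0  # assumption: treat an exchange that moves no data as balanced
    return mean(bytes_per_tile) / peak


# Hypothetical per-tile byte counts for a four-tile exchange.
print(data_balance([64, 64, 64, 64]))  # 1.0
print(data_balance([256, 0, 0, 0]))    # 0.25
```

With n tiles the ratio can never fall below 1/n, which is the case where a single tile carries all the data.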

Global Exchange operations

Global Exchange is the process by which data is exchanged between IPUs. In addition to the common information listed above, the following is displayed:

  • Total Data - the total amount of data transferred during the exchange,

  • Data Balance - the mean amount of data exchanged divided by the maximum amount of data exchanged,

  • Source Variables - a truncated list of the variables from which data was sent in the exchange (with temporary variables given basic integer names),

  • Destination Variables - a truncated list of the variables to which data was sent in the exchange (with temporary variables given basic integer names).

A tile’s physical location on an IPU, and how far it is from the main exchange block, determine how quickly data can be moved between it and other tiles. Also, the highest-numbered tiles on an IPU are linked directly back to the lowest-numbered tiles in a ring-type topology. The combination of these two factors generates the typically triangular and curved shapes seen in these exchange graphs.
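To make the wrap-around concrete, here is a minimal sketch of tile-to-tile distance on a ring where the highest-numbered tile is adjacent to tile 0. It is a deliberate simplification: it treats tile numbers as positions on a ring and ignores each tile's physical distance from the exchange block, so it illustrates only the ring part of the explanation, not real IPU exchange timing. `ring_distance` is a hypothetical helper.

```python
def ring_distance(tile_a, tile_b, num_tiles):
    """Hops between two tiles when the highest-numbered tile wraps to tile 0."""
    if not (0 <= tile_a < num_tiles and 0 <= tile_b < num_tiles):
        raise ValueError("tile index out of range")
    direct = abs(tile_a - tile_b)
    # Going the other way round the ring may be shorter.
    return min(direct, num_tiles - direct)


# With the wrap-around, the last tile is a neighbour of tile 0.
print(ring_distance(0, 9, 10))  # 1
print(ring_distance(2, 5, 10))  # 3
```

Without the wrap-around, tiles 0 and 9 would be the furthest apart; with it, distance rises and then falls as you move away from a tile, which is one ingredient of the shapes described above.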

Stream Copy

This process copies data between tensors and streams, allowing data to be transferred between the IPUs and the host machine over PCI. The execution trace shows each of these program steps as three separate phases: StreamCopyBegin, the Copy itself, and StreamCopyEnd.

In addition to the common information listed above, the following is displayed:

  • Total Data - the total amount of data transferred during the exchange,

  • Data Balance - the mean amount of data exchanged divided by the maximum amount of data exchanged,

  • Copies from host - how many copy instructions transferred data from the host machine,

  • Copies to host - how many copy instructions transferred data to the host machine.

10. Application preferences

To display the Preferences dialog, select ‘Preferences’ from the menu, or press the Ctrl / Cmd + , keys. As well as the settings displayed, the view options for the various reports are also saved.

You can reset your preferences at any time by selecting ‘Reset Preferences’ from the Help menu.

10.1. Setting the colour theme

The PopVision Graph Analyser supports light and dark colour themes, and you can select a preference here. There are three options:

  • Auto - this is the default setting, and allows the application to follow your machine’s system-wide theme setting for light or dark mode. If the PopVision Graph Analyser application detects a change in your operating system theme, it automatically switches to the corresponding mode.

  • Light - this forces the PopVision Graph Analyser application into light mode, irrespective of your machine’s theme settings.

  • Dark - this forces the PopVision Graph Analyser application into dark mode, irrespective of your machine’s theme settings.

Note that with the ‘Auto’ setting, changes to your system-wide colour theme settings affect the application immediately, but choosing the ‘Light’ or ‘Dark’ override option only takes effect after you’ve restarted the PopVision Graph Analyser.

10.2. SSH preferences

You can store your SSH preferences in the Preferences dialog to allow authorisation when opening reports on remote machines. There are two settings you can enter here:

  • SSH private key path - enter the file path of your machine’s private SSH key here. This file path will be used to authenticate you on remote machines during the connection process. The default path is <home>/.ssh/id_rsa, where <home> denotes your home directory in your operating system.

  • SSH agent mode - this drop-down list lets you choose whether to specify an ssh-agent socket path and, if so, how:

    • Disabled - do not use an ssh-agent socket (the default).

    • Manually specify - enter the file path of the ssh-agent socket in the field that appears below this option.

    • Automatically obtain from environment - obtain the ssh-agent path from an environment variable.
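The three agent modes amount to a small decision. The sketch below shows that decision under stated assumptions: the mode strings are hypothetical names for the drop-down options, and reading `SSH_AUTH_SOCK` (the variable OpenSSH's ssh-agent conventionally exports) is an assumption about which environment variable the application uses.

```python
import os
from pathlib import Path


def default_private_key_path():
    # The default private key location described above: <home>/.ssh/id_rsa
    return Path.home() / ".ssh" / "id_rsa"


def resolve_agent_socket(mode, manual_path=None, env=None):
    """Return the ssh-agent socket path to use, or None if the agent is disabled.

    mode is one of "disabled", "manual" or "environment" (hypothetical names).
    """
    env = os.environ if env is None else env
    if mode == "disabled":
        return None
    if mode == "manual":
        if not manual_path:
            raise ValueError("a socket path is required in manual mode")
        return manual_path
    if mode == "environment":
        # Assumption: SSH_AUTH_SOCK, as set by OpenSSH's ssh-agent.
        return env.get("SSH_AUTH_SOCK")
    raise ValueError(f"unknown agent mode: {mode!r}")


print(default_private_key_path())
print(resolve_agent_socket("environment", env={"SSH_AUTH_SOCK": "/tmp/agent.sock"}))
```

If the environment mode finds no variable set, the sketch returns None, which behaves like the Disabled option.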

10.4. Quit after last window is closed

This preference controls whether the Mac version of the application quits when the last window is closed.

10.5. Experimental features

Each version of the PopVision Graph Analyser contains some experimental features that are hidden by default. These features are not yet ready for full release: they have limited support, and may change or be removed in future versions. You can enable them here by toggling the button next to this option.

10.6. Graph stats

You can display (or hide) statistics for the Memory and Liveness reports. They appear in the top right-hand corner of the graph and show the Average, Minimum, Maximum and Standard Deviation of the memory usage across the selected tiles, for each data set plotted.
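For reference, the four statistics can be computed from the per-tile memory values of one plotted data set as in this sketch. The `graph_stats` helper is hypothetical, and using the population standard deviation (`pstdev`) rather than the sample version is an assumption; the report may use a different estimator.

```python
from statistics import mean, pstdev


def graph_stats(bytes_per_tile):
    """Average, minimum, maximum and standard deviation for one data set."""
    return {
        "Average": mean(bytes_per_tile),
        "Minimum": min(bytes_per_tile),
        "Maximum": max(bytes_per_tile),
        # Population standard deviation: an assumption about the estimator used.
        "Standard Deviation": pstdev(bytes_per_tile),
    }


# Hypothetical memory usage, in bytes, for the selected tiles.
print(graph_stats([2, 4, 4, 4, 5, 5, 7, 9]))
```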

11. FAQs

This section contains a set of frequently asked questions about capturing and understanding reports in the PopVision Graph Analyser.

11.1. Not Always Live memory discrepancy

Question: Why does the tile memory differ on the Memory and Liveness reports? If you open a Memory report, and select the ‘Liveness’ graph type from the drop-down menu, and then select a particular tile, you can see its memory consumption plotted. If you then find that same tile in the Liveness report, you may notice that its memory consumption is lower. Why does this happen?

Answer: The ‘Not Always Live’ plot on the Memory-Liveness report shows the maximum memory used by the not-always-live variables, which can be lower than the tile memory actually required. Memory is statically allocated on the tile, and the allocation algorithm cannot always overlap variables that are never live at the same time, so the memory your program actually needs can exceed this maximum.

As an example, suppose you have two variables A and B, both 1 byte, but B needs to be stored in interleaved memory. If you have a program like this:

Write(A)
Read(A)
Write(B)
Read(B)

then the two variables are never live at the same time, so in theory they could share the same memory, but because of the additional constraint on B they do not. In this case the maximum not-always-live memory is 1 byte, but the memory required (excluding gaps) is 2 bytes.
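The arithmetic in this example can be reproduced with a deliberately simplified model. This is not Poplar's allocator: each variable is tagged with a hypothetical memory region, and variables may share addresses only if they are in the same region and never live at the same time. The `Var` type and the helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Var:
    name: str
    size: int    # bytes
    region: str  # hypothetical label for the kind of memory the variable needs
    first: int   # first program step at which the variable is live (inclusive)
    last: int    # last program step at which the variable is live (inclusive)


def _steps(variables):
    return range(min(v.first for v in variables), max(v.last for v in variables) + 1)


def live_bytes(variables, step, region=None):
    """Bytes live at a step, optionally restricted to one region."""
    return sum(
        v.size
        for v in variables
        if v.first <= step <= v.last and (region is None or v.region == region)
    )


def max_not_always_live(variables):
    """Peak live bytes over all steps, ignoring placement constraints."""
    return max(live_bytes(variables, s) for s in _steps(variables))


def required_bytes(variables):
    """Memory needed if variables can only overlap within the same region.

    Each region is sized for its own peak, and regions cannot share space,
    so their peaks are added together.
    """
    regions = {v.region for v in variables}
    return sum(
        max(live_bytes(variables, s, r) for s in _steps(variables)) for r in regions
    )


# The example program: Write(A), Read(A), Write(B), Read(B).
program = [
    Var("A", size=1, region="normal", first=0, last=1),
    Var("B", size=1, region="interleaved", first=2, last=3),
]
print(max_not_always_live(program))  # 1
print(required_bytes(program))       # 2
```

If both variables were in the same region, this model would let them overlap and `required_bytes` would also return 1, so the gap between the two numbers comes entirely from the placement constraint.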

12. Release notes

For the latest release notes, please refer to Graphcore’s Software Download site, where you can see what changes have occurred for each update of this software.

To see what’s changed in the PopVision Graph Analyser application, select ‘Release Notes’ from the Help menu, or click the “What’s new since the last release” link on the landing page.

13. Licensing information

Licensing information about the PopVision Graph Analyser is available to read by selecting ‘License’ from the Help menu. It contains an end-user agreement, copyright and trademark information, and license information about third-party software used in the application.

This information can also be found in the Installation README file, which you can find on the Graphcore Support site.