2. User guide

2.1. Overview

The PopVision™ Graph Analyser application is used to analyse the programs built for and executed on Graphcore’s IPU systems. It can be used for analysing and optimising the memory use and performance of programs.

The PopVision™ Graph Analyser provides reports on the following:

	Summary Report of the IPU hardware, graph parameters, host configuration.
	Insights Report , which gives you a quick overview of the memory usage of your model on the IPU, showing the tiles, vertices and exchanges that use the most memory. Graphical insights and guides for improving memory usage are also displayed, helping you to optimise memory usage for your model.
	Memory Report , which gives a detailed analysis of memory usage across all the tiles in your IPU system, showing graphs of total memory and liveness data, and details of variable types, placement and size.
	Liveness Report , which gives a detailed breakdown of the state of the variables at each step in your program.
	Program Tree , which shows a hierarchical view of the steps in the program.
	Operations Summary , which shows a summary of all the operations for a software layer in your model, displaying statistics about code size, execution cycles, debug data and FLOPs measurements.
	Operations graph , which displays the High Level Operations (HLO) graph for TensorFlow programs, allowing you to drill down through modules and see details of HLOs.
	Execution Trace , which shows how many cycles each step of your instrumented program consumes.

Each of these reports is described in further detail in the sections below.

2.1.1. End User License Agreement

Before you can use the Graph Analyser, you must first agree to the End User License Agreement (EULA) that is displayed when the app is first opened (assuming that you have not already agreed to it in a previous version). Clicking on the ‘Disagree’ button will quit the application immediately.

You can re-read the EULA at any time after you’ve agreed to it. Select ‘View EULA’ from the Help menu.

You can toggle the EULA dialog between modal and full-screen view by clicking on the icon to the left of the window’s title.

2.1.3. About the IPU

An in-depth description of the IPU hardware is available on the online IPU Programmer’s Guide . While we describe some of the relevant features of the IPU in this document, you should refer to the Poplar documents for a more in-depth understanding.

2.2. Capturing IPU reports

This section describes how to generate the files that the PopVision™ Graph Analyser can analyse. The PopVision™ Graph Analyser uses report files generated during compilation and execution by the Poplar SDK.

Note

When you first open the application, there is a link on the opening page to a Getting Started with PopVision video.

The sections below describe the files supported by the PopVision™ Graph Analyser. These files can be created using the POPLAR_ENGINE_OPTIONS environment variable or the Poplar API. At a minimum you need either the archive.a or the profile.pop for the PopVision™ Graph Analyser to present reports.

Note

With the release of Poplar SDK 1.2 a new entry in POPLAR_ENGINE_OPTIONS was added to make capturing reports easier. In order to capture the reports needed for the PopVision™ Graph Analyser you only need to set POPLAR_ENGINE_OPTIONS='{"autoReport.all":"true"}' before you run a program. By default this will enable instrumentation and capture all the required reports to the current working directory. For more information, please read the description of the Poplar Engine options in the Poplar and PopLibs API Reference .

By default, report files are output to the current working directory. You can specify a different output directory by using, for example:

POPLAR_ENGINE_OPTIONS='{"autoReport.all":"true", "autoReport.directory":"./tommyFlowers"}'

Note

If you have an application that has multiple Poplar programs (for example, if you build and run a training and validation model), then a subdirectory of the Engine name will be created in autoReport.directory in which the profile information will be written. This allows users of the Poplar API to make sure reports are written into different locations. (If no name is provided then the profile information will continue to be written in autoReport.directory ). Further information can be found in the Using TensorFlow , Using PopART and Using PyTorch sections.

Note

If you are profiling cached executables using PopART, you must use popart::SessionOptions to provide a directory name for your reports. Using autoReport.directory in POPLAR_ENGINE_OPTIONS will not work.

2.2.1. Unsupported file versions

As of Graph Analyser verion 3.7, support for the old JSON and CBOR graph profile formats has been removed. This means that the following files that were generated before Poplar SDK 2.0 can no longer be read:

graph.json
graph.cbor
execution.json
execution.cbor
profile_info.json

At a minimum you need either the profile.pop or archive.a files present for the PopVision™ Graph Analyser to generate its reports. If neither of these are found, you will see a ‘No graph profile found’ warning when trying to open a report.

2.2.2. Profiling Overhead

Profiling a model may add a memory and computation overhead to its compilation and execution phases. Typically, the highest performance impact is due to the execution overhead.

If you are not interested in the execution profile ( execution trace ) it is best to deactivate it by setting "autoReport.outputExecutionProfile": "false" or "debug.instrument": "false" . This will implicitly disable "debug.instrumentControlFlow" , "debug.instrumentExternalExchange" , and "debug.instrumentCompute" (unless explicitly enabled by the user). Note that a profile can omit its execution part but not its compilation part. In other words, setting "autoReport.outputExecutionProfile" to true will automatically set "autoReport.outputGraphProfile" to true too.

The next sections show which options can be used to diminish the overhead in each of the profiler parts.

Compilation

During compilation, the profiler generates the graph profile (also known as the memory profile). This profile contains information that Poplar knows or estimates at compilation time, such as the programs that form the model and its variables. The contents of the graph profile are sufficient to analyse memory issues.

The environment variable POPLAR_PROFILER_LOG_LEVEL can be set to generate a log of the steps performed by the profiler during compilation and detect any possible time overhead.

Poplar engine options can be used to include or exclude the profiling of certain information. This will reduce the time taken to create the profile and the size of the generated files. Please refer to Report Files for a description of the files.

The option POPLAR_ENGINE_OPTIONS='{"autoReport.all":"true"}' outputs a full report. If you wish to exclude a part of it, set this option and explicitly disable the undesired information. For example: POPLAR_ENGINE_OPTIONS='{"autoReport.all":"true", "autoReport.outputArchive":"false"}'

Similarly, if you wish to include only a certain part of the report, just set the specific autoReport option. For example: POPLAR_ENGINE_OPTIONS='{"autoReport.outputArchive":"true"}'

Options to tune the graph profile (in order of expected impact):

"autoReport.outputLoweredVars": "false" : This option can be useful if profile.pop is too large. However, this will deactivate some functionality in the Graph Analyser such as the variables memory graph . Note that excluding lowered variables from the report will not speed up its visualisation by the Graph Analyser.
"autoReport.outputDebugInfo": "false" : This option can be useful if debug.cbor is too large. However, this will deactivate some functionalities in the Graph Analyser such as the operations graph . Note that no meaningful speed-up is expected in the rest of Graph Analyser functionalities.
"autoReport.outputSerializedGraph": "false" : This option is false by default as the generated file can be large. This part of the profile is only needed to enable the Computation Graph in the Graph Analyser.
"autoReport.outputArchive": "false" : This option can be set to avoid generating archive.a . If it is set to false, you must generate profile.pop and some minor functionalities of the Memory Report will be disabled in the Graph Analyser.

Execution

Profiling the execution means to measure and record the cycles spent on each of the programs of the model. The result can be visualised in the Graph Analyser execution trace . However, this instrumentation can lead to the following main overheads:

IPU memory overhead : some memory in the IPU will be devoted to store program cycles and branch records. Some more will be used by the instrumentation code itself, although it is usually negligible.

You can set "debug.computeInstrumentationLevel": "ipu" to reduce the memory needed for program cycles. In this mode, only one tile ( debug.profilingTile ) will record cycles. The drawback is that per-tile cycles - the BSP trace - will not be available in the Graph Analyser. Another inconvenience is that this instrumentation level may slightly disrupt the normal execution of the model. This is because some artificial synchronisations may be introduced in order to measure the cycles of the longest-running tile.

Regarding branch records, you cannot reduce the memory needed to store them but you can pick which tile will keep them. Thus, by using "debug.branchRecordTile you can pick a tile with low memory pressure. Note that the last tile in the IPU is selected by default and that is usually a good decision. Also, branch recording may introduce artificial synchronisation points to flush the records to the host. This can disrupt the normal execution specially for pipelined models with high number of conditional branches, such as If programs.

Because of all these extra memory requirements, a model with high memory consumption may go out of memory when profiling is enabled. Depending on the model, you can adjust its parameters to leave space for the instrumentation. For example, you can try decreasing the batch size. In TensorFlow BERT you can adjust --micro-batch-size .

Host computing overhead : Poplar processes the cycle measurements after each run to create a trace that can be visualised in the Graph Analyser. This can take a considerable amount of time if the run executed many programs. This overhead may reveal itself in the Graph Analyser if the execution took multiple runs. At the beginning of each run the IPU waits for the host in a StreamCopyBegin program. After the first run, the host may be busy processing the cycles measured in the previous run. This causes a large StreamCopyBegin as the IPU waits for the host to finish this processing. Because of this overhead, measuring throughput of a profiled model is highly discouraged.

To reduce this overhead you can reduce the amount of programs profiled. By default, only the first two runs of the execution are captured. This can be increased or decreased by setting executionProfileProgramRunCount as follows:

POPLAR_ENGINE_OPTIONS='{"autoReport.all":"true", "autoReport.executionProfileProgramRunCount":"10"}'

It is essential that you try to reduce the iterations on each run too. For instance, by reducing the number of steps or the number of batches per step you can get a lighter execution profile. This will not only reduce host computation overhead but will also speed up the visualisation in the Graph Analyser. The public examples contain some hints on how to reduce an execution to be profiled. For instance, TensorFlow BERT .

Finally, the report size of multi-replica executions can be reduced by focusing on a single replica. The user can select a replica with "replicaToProfile" option as follows:

POPLAR_ENGINE_OPTIONS='{"autoReport.all":"true", "profiler.replicaToProfile":"0"}'

2.2.3. Reloading reports

The folder (or folders, if you’re comparing reports) that contain the individual report files are monitored by the application in case any of the files changes, for example, if you’ve re-run your Poplar program and re-generated new version.

If the application detects that any of the files have changed, a dialog box appears telling you what files have changed, and prompting you to reload the report files.

Note

Make sure that your Poplar program has finished executing (in particular that the profile.pop file has been completely written to disk before clicking on the Reload button), otherwise you may see inconsistent information displayed in the application.

2.2.4. Profile troubleshooting

Very occasionally, the Graph Analyser may not be able to open a profile. This can happen for a number of reasons detailed below, with explanations on how to remedy the issue.

Reducing the size of profile reports

Large models and programs with many iterations within them can generate large reports which can take a while to process and display in the PopVision tools. Below are some tips for reducing the size of the profiles generated when instrumenting your IPU programs.

Note

There are some additional suggestions for reducing profile size in the Profiling overhead section above.

Adjusting the number of steps being profiled
Reducing the number of batches per step
Changing the instrumentation level
Changing the branch record tile
Select a single replica
Reducing gradient accumulation factor - (if you’re using it). This reduces the size of a single Engine run.

Missing or corrupted report files

Sometimes the python script running the training or inference program exits too early, through some other fault, and the profile isn’t written correctly.

Set up POPLAR_PROFILER_LOG_LEVEL to get more information about your script’s execution.
The profiles are written in SQLite format. Check that they open in an sqlite client, and also using libpva .

Compilation fails with OOM

Sometimes execution may be prevented if there is not sufficient memory to execute the program. Here are some actions you can take to reduce memory usage in your model.

Reduce your model size. This will reduce the number of paramater variables that need to be stored in the IPU memory.
Only use the Memory report, not the Execution Trace report.
Change your instrumentation level so that you are storing less information.
Change the branch record tile.

Note

Additional ways to optimise your memory and throughput are detailed in the section on the Insights report .

2.2.5. Poplar report files

Note

The PopVision™ Graph Analyser only supports fixed names for each of the files. If you save them with different names they will not be opened. When you are browsing directories to open, the PopVision™ Graph Analyser will highlight which of the following files are present in that directory.

Binary archive (‘archive.a’)

This is an archive of ELF executable files, one for each tile. With this you can see the total memory usage for each tile on the Memory Report .

Poplar Engine Options	`POPLAR_ENGINE_OPTIONS='{"autoReport.outputArchive":"true"}'`
Using Poplar API	Set the Poplar Engine option “autoReport.outputArchive” to true

Poplar Profile (‘profile.pop’)

This file contains compile-time and execution information about the Poplar graph. This file is used to show memory , liveness and program tree views and also the execution trace view.

Poplar Engine Options	`POPLAR_ENGINE_OPTIONS='{"autoReport.outputGraphProfile":"true"}'` and/or `POPLAR_ENGINE_OPTIONS='{"autoReport.outputExecutionProfile":"true"}'`
Using Poplar API	Set the Poplar Engine options “autoReport.outputGraphProfile” to true and/or “autoReport.outputExecutionProfile” to true

Lowered Vars Information

Poplar can generate lowered vars information, which contains details about the allocation of variables on each tile, and is used to generate the variable layout in the Memory Report . IPU memory is statically allocated and this file contains the size, location, name and other details about every variable on every tile.

This information is not generated by default, as it can be quite large, and not useful to some users. However, there are engine options to collect the data and save it either into the profile.pop file, or as a stand- alone file.

Poplar Engine Options	When you use `POPLAR_ENGINE_OPTIONS='{"autoReport.all":"true"}'` the lowered vars information will be captured in the `profile.pop` file. You can switch that functionality on separately with: `POPLAR_ENGINE_OPTIONS='{"autoReport.outputLoweredVars":"true"}'`
Using Poplar API	To capture the lowered vars data in a separate file (and not write it into the `profile.pop` file), use, for example,: `POPLAR_ENGINE_OPTIONS='{"debug.loweredVarDumpFile":"vars.capnp"}'`

Serialized Computation graph (‘serialized_graph.capnp’)

This file contains a copy of the Poplar graph before compilation, including details on all of the compute sets, vertices, variables and edges (connections from vertices to variables). autoReport.all does not output this file by default, as it can be quite large for complex models, but you can enable it using the options below.

Poplar Engine Options	`POPLAR_ENGINE_OPTIONS='{"autoReport.outputSerializedGraph":"true"}'`
Using Poplar API	To save the serialised graph you need to use the Poplar API `Graph::serialize` .

Frameworks Information (‘framework.json’ & ‘app.json’)

You can use Poplar to create two more ‘custom’ files into which you can put your own data from frameworks or your application. See the Framework and Application JSON files section for more details

Debug Information (‘debug.cbor’)

This file contains additional debug information collected from the Poplar software. From this information you can understand the source of variables, Poplar programs and compute sets. The debug information is viewable in the Liveness report and the Program Tree.

Poplar Engine Options	Automatically, when using `POPLAR_ENGINE_OPTIONS='{"autoReport.enable":"true"}'` and manually using `{"autoReport.outputDebugInfo":"true"}`
Using Poplar API	Automatically created

Note

Collecting the enhanced debug information will not increase the memory footprint of your IPU application. The enhanced debug information is generated and streamed as the model is compiled.

See the two ‘Debug information’ sections in the Liveness Report and the Program Tree for details of what’s included in the debug information, and where to find it in the Graph Analyser reports.

2.2.6. Using TensorFlow

If you use TensorFlow, the separate reports for each Poplar program compiled and executed will be placed in a subdirectory of autoReport.directory that contains the ISO date/time and process ID in its name.

The debug.cbor will be placed in the autoReport.directory and symbolic links are created in the subdirectories

The cluster name can now be found in details loaded from the framework.json .

For more details please see the guide Targeting the IPU from TensorFlow .

2.2.7. Using PopART

For PopART, the name of the Engine is by default set to inference or training depending on if you are using the InferenceSession or TrainingSession. You also have the option of providing your own Engine name when creating the session.

training_session = popart.TrainingSession(fnModel=builder.getModelProto(),
   ...
   deviceInfo=device,
   name="tommyFlowers")

The profile.pop will be written out to:

autoReport.directory/tommyFlowers

Note

If your application has two inference sessions, by default the second will overwrite the first.

For more details please see the PopART User Guide .

Note

If you are profiling cached executables using PopART, you must use popart::SessionOptions to provide a directory name for your reports. Using autoReport.directory in POPLAR_ENGINE_OPTIONS will not work.

2.2.8. Using PyTorch

For PyTorch which builds on top of PopART by default it will also name the engine inference or training depending on if you are using an InferenceModel or TrainingModel. You also have the option to name the Engine yourself using the SessionOptions:

opts = poptorch.Options()
   opts.modelName("tommyflowers")
   opts.enableProfiling(dirname)

   poptorch_model = poptorch.inferenceModel(model, opts)

The profile.pop will be written out to:

autoReport.directory/tommyFlowers

For more details please see the PyTorch User Guide .

2.3. Opening reports

In order to view reports, the PopVision™ Graph Analyser requires one or more of the files listed in the Capturing Reports section, above.

You can open report files on your local machine, or from a remote server over SSH. See below for full details.

2.3.1. Opening recent reports

If you’ve used the application before, the Home screen displays a list of recently opened report directories in the ‘Recent’ box.

Click on a recent report to open it again. That report will automatically move to the top of the Recent list.
If you want to remove a report directory from the Recent list, click on the ‘trash’ icon that appears to the left of a report when you hover the mouse over it.
If you attempt to open a report from the Recent reports list that has been moved or deleted since it was last opened, an error dialog appears indicating that the file can’t be found. You can either replace the file, in which case the corresponding link in the Recent box will work again, or you can click on the ‘Remove from recent files’ button in the error dialog to remove it from the Recent list.

The Recent list can contain up to 64 previous file selections.

2.3.2. Opening the Demo Report

If you don’t have any reports to hand, you can open and browse the demo report that’s included with the Graph Analyser. It’s a small report, but contains all of the features that are supported in the current application.

Click on the ‘Open Demo Report’ button on the main screen to open the demo report.

2.3.3. Comparing reports

You can open two similar reports at once to compare them by clicking on the ‘Compare reports…’ link on the application’s main page. This presents you with two file selection dialogs that work in exactly the same way as for opening a single report, as described below. The information from both reports is then combined on the report pages, allowing you to compare them.

When opening a pair of reports to compare, you can click on the ‘magnet’ icon at the right-hand end of the directory textbox in either file selector. This copies the directory from the file selector on the other side.

2.3.4. Local reports

You can open report files stored on your local machine as described below.

Opening local reports

To open a local report on your machine:

On the Home screen of the PopVision™ Graph Analyser application, click on the ‘Open a report…’ button in the ‘Open’ panel. You’ll be presented with a file selection dialog, and the ‘local’ tab at the top will be selected by default. You’ll see listings of the directories and files on your local machine.
Use this dialog box to navigate to the folder in which your report files have been saved. You’ll notice that when the PopVision™ Graph Analyser identifies a directory in which any of the report files listed above are found, those files are listed on the right-hand side. Note that if the minimal file requirements are not present in a directory (see the table above), the ‘Open’ button will be disabled.
When typing a path into the box at the top of the dialog, a drop-down list shows the directories within the current directory. You can use the keyboard up and down arrows to navigate this list, and choose the next path element by pressing the Return key.
You can sort these files by name or modified date, in ascending or descending order, by clicking on the appropriate column header. Your sorting preference is saved.
Once you’ve selected the directory with the necessary report files within it, click on the ‘Open’ button to load the report data from the files.
The dialog remembers which location tab (local or remote) you selected previously, and selects it automatically the next time it is opened.
You can toggle the Open dialog between modal and full-screen view by clicking on the icon to the left of the window’s title.

The Summary Report is displayed first, and the progress bar along the top of the screen shows the files being pre-processed by the application prior to being loaded and displayed.

Notice that the bottom of the Summary Report shows the relevant files that have been found, and their loading state. More details on these files, and which reports need which of them, can be found here .

2.3.5. Remote reports

If you are using an IPU system on a remote server, for example on a cloud service, any reports generated will be saved to that server, so you cannot open them ‘locally’. You can, however, open them remotely by specifying the server address, and connecting to the machine over SSH. The report contents are them streamed back to the PopVision™ Graph Analyser application on your local machine, allowing you to view the reports.

Note

When the PopVision™ Graph Analyser opens report files on a remote machine, it uploads a small binary app to it which pre-processes the report data and sends it back over SSH to the PopVision™ Graph Analyser application running on your local machine. If you’re running other performance-critical processes on that remote machine, you should be aware of any effects this process may have on the capacity of the remote machine’s hardware to run any other tasks. As server performance varies a great deal, the only way to know how much processor speed it takes is to try a small sample, and monitor the CPU usage.

It may be the case that you do not have the right permissions to upload the analysis engine on a remote machine, or there may be insufficient room. In any case, if the upload fails, an appropriate error message is displayed.

Opening a remote report

To open a remote report on another machine:

On the Home screen of the PopVision™ Graph Analyser application, click on the ‘Open a report…’ link in the ‘Open’ panel. You’ll be presented with a file selection dialog, and the ‘local’ tab at the top will be selected by default.
Click on the ‘remote’ tab at the top, and you’ll see a login dialog that allows you to connect to a remote server. Enter your username, and the address of the remote machine.
If you just want to log in with a password for the remote machine, enter it in the Password field.
Alternatively, you can use your local machine’s SSH key to authorise your connection. Enter its file path in the Preferences dialog .
Once you’re logged in, you’ll see a file dialog listing the directories and files on the server. You can sort these files by name or modified date, in ascending or descending order, by clicking on the appropriate column header. Your sorting preference is saved.
Navigate to the folder in which your Poplar report files have been saved. You’ll notice that when you select a directory in which Poplar report files are found, the file window lists those files on the right-hand side. Note that if the archive.a file or the archive_info.json file are not present, the ‘Open’ button will be disabled, as one of these these files is the minimal requirement for generating a report in the PopVision™ Graph Analyser. See the Report files section above for details of how to generate each file, and what it contains.
Once you’ve entered the directory with the necessary report files in it, click on the ‘Open’ button to load the report.

Note

The SSH connection is constantly checked, and if, for any reason, it goes down, a warning dialog is displayed, letting you know.

The Summary Report is displayed first, and the progress bar along the top of the screen shows the files being loaded into the application, and the report data being analysed and prepared for display.

Notice that the bottom of the Summary Report shows the relevant files that have been found, and their loading state. More details on these files, and which reports need which of them, can be found here .

Note

The Graph Analyser does not currently support encrypted SSH private keys, i.e. keys that are protected by a passphrase. However it does support SSH agents. If your key is passphrase protected you will need to make sure to add it to your SSH agent before the PopVision™ Graph Analyser can use it, by using the ssh-add command-line tool and ensure ‘SSH Agent mode’ is set correctly in the Preferences.

To configure ssh agent, from a terminal you can run the following.

# Start the ssh-agent in the background.
eval "$(ssh-agent -s)"

# Add your SSH private key to the ssh-agent
ssh-add -K ~/.ssh/id_rsa

Then restart the Graph Analyser, click Preferences and remove the path pointing to your SSH private key path. Make sure that SSH agent mode is set to “Automatically obtain ssh-agent socket path from environment”.

Connection errors

A number of errors can occur when connecting to a remote server. This section lists the most common and gives some troubleshooting steps.

Error

Error: getaddrinfo ENOTFOUND server.example.com

This error occurs when the specified server could not be found (DNS lookup failed). Check that you have typed the server’s name correctly. If a VPN connection is required, check that it is connected and working correctly.

Error

Error: Password authentication failed

If SSH agent and SSH key authentication fail then password authentication is attempted. If you normally use a public key to connect to the server, check that you have correctly specified the key in SSH preferences . Otherwise, check that password authentication is enabled on the server and that you have typed your password correctly.

Error

Error: Cannot create directory ‘.cache/poplar_report_viewer’ because a file already exists there.

PopVision™ Graph Analyser will attempt to create a number of directories on the remote server if they do not already exist. If a file already exists with the same name then attempting to create the directory will fail. Check what the file is and either delete or rename it to allow the directory to be created.

Error

Error: Could not create directory ‘.cache’: Permission denied

PopVision™ Graph Analyser will attempt to create a number of directories on the remote server if they do not already exist. This error indicates that your user did not have permission to create one of these directories. This usually indicates a problem with how your home directory is setup on the server: either you do not have a home directory and don’t have permission to create one, or you have not been given adequate permissions on your own home directory. Contact the server administrator to ask for a home directory to be created or for its permissions to be corrected.

Error

Error: Could not write to ‘.cache/poplar_report_viewer/backend_…’ on the remote. This could be caused by a full filesystem

This error usually occurs because there was not enough disk space on the server to upload the required binary file to your home directory (around 20MB is required). You may be able to free up some space by deleting unused files.

In some cases servers have a home directory filesystem which is very limited in size but have a much larger local disk or “scratch space” available. PopVision™ can upload to the scratch space if you create a symbolic link from your home directory to the scratch drive, for example:

mv ~/.cache /localdata/username/.cache
ln -s /localdata/username/.cache ~/.cache

2.4. Viewing reports

The PopVision™ Graph Analyser displays interactive graphical and textual reports, and you can interact with these in a number of ways to get to the information you want. Each report has a few different options that are only relevant to that report, but they all share some features in common, as described below.

When you open a report, its file path is displayed in the title bar of the report window.

2.4.1. Using the side menu

When you’re loading some report data, and the Summary Report is displayed, the side menu becomes visible on the left-hand side of the application window. This contains buttons at the top for viewing each of the main report types, and three buttons at the bottom:

	Reload report - if you need to reload a report, particularly if any of the files from which it was generated have been updated from a recent execution of your program, click this button to re-import all the report files. See Reloading reports for more information.
	Close report - once a report is loaded into the PopVision™ Graph Analyser, you can close it by clicking this option. This ‘unloads’ all the report data from the application and returns you to the opening page. If you want to view those reports again, you’ll need to re-load the data.
	Documentation - this opens the documentation window (which you’re now reading). If you were viewing one of the report pages, the Documentation window opens up on the relevant page.
	image26\|

2.4.2. Adjusting report size

There are several ways to change the size and scale of the report in the PopVision™ Graph Analyser window:

You can increase and decrease the display size of the entire application (including the Help window) by using the Ctrl/Command keys with the + and — keys to magnify and shrink the display size, just as you would in a web browser. There are three Zoom options in the View menu that show the keys, and another option to reset the magnification level back to its default setting.
To zoom in and out of a particular section of the graph, click and drag horizontally in the graph preview area above the main graph, and the display will change to show the graph that corresponds to that section of the data. A pair of limiter icons appear in the preview area to show the start and end of the data displayed in the main graph area, and these can be dragged left and right as well to change the amount of data in the main graph. Using the scroll-wheel on your mouse scrolls the report page up and down, but you can zoom by holding down the Ctrl key.

Note

You can choose how you want your scroll wheel to behave (if you want it to scroll or zoom by default) by setting the Scroll behaviour preference.
You can also click and drag the main graph itself to view areas to the left and right of the currently viewed area. Note that clicking without dragging can sometimes select a specific tile (for example, in the Memory Report), but you can clear this selection from the input box above the graph.
You can reset the zoom scale of the Memory and Liveness reports by clicking on the small button to the left of the preview area, top-left of the graph. This zooms out to the furthest level, showing the entire graph.
To make a report larger, so that you can see more detail, you can drag the edges of the window to increase its size. This resizes the report images as you drag.
To adjust the space that each half of a report takes up on the page, click the ‘splitter’ icon between the two halves of the report, and drag it up and down. The two report sections resize accordingly. Note that the Program Tree and Execution Trace reports also display a ‘vertical splitter’ when comparing two reports, so you can choose how much of each report fills the available screen space.

2.4.3. Navigating report graphs

If the report page contains a graph, and you have selected a datapoint on it by clicking it, you can use the arrow keys to move the selection to the previous or next datapoint. If you hold down the Shift key while using the arrow keys, the datapoint selection will move by ten datapoints at a time.

Similar to most web pages, you can use the Tab key to cycle through interactive elements on the report page, and this includes the graph itself, which displays the standard blue selection border when it has focus.

Each of the graph plots has an ‘Options’ button in the top left-hand corner which allows you to hide the graph keys.

Each of the graph plots also has a ‘Reload’ button in the top left-hand corner that allows you to reload the graph data manually.

Using the Navigation panel

In the top left-hand corner of a report graph, there is a small compass icon that, when clicked, opens out into a navigation panel that allows you to move around the report easily with the keyboard, or if you don’t have a scrolling mouse or mouse pad.

Note

When you hover your mouse over each icon in the navigation panel, you can see the equivalent quick-key for that action.

Click the compass icon to open the navigation panel. You can click it again to close it, or click on the < icon at the panel’s right-hand end.
Click on the magnifying glass icons to zoom in or out of the graph, or type in the magnification percentage you’d like to jump to (and press the Enter key).
Click the « and » icons to pan the graph left and right.
Click the ? icon to jump to the in-app help, where you can see all the keyboard commands listed.

Graph data keys

Below a graph plot are one or more keys with a coloured block and the name of the dataset that’s being plotted in that colour on the plot. You can click on a key to hide that dataset from the plot, and it appears with strike-through styling.

Note

If you’re comparing two reports , there will be one of each data key for the source and the target report. To see at a glance which is which, you can hover your mouse over a key, and the full path to that report is displayed.

2.4.4. Saving report images to disk

You can save report graphs to disk as image files or copy them to the clipboard, to avoid you having to make screen captures.

Click on the camera icon in the top right-hand corner of a report.
Select whether to save to a file or copy to the clipboard.
If saving to a file, select the directory on your computer where you want to save it.
Confirmation messages appear if the images was copied to the clipboard, or it failed to save to disk, depending on which choice you’ve made.

Report images are saved as PNG files, and capture the entire visible part of the report screen, including any detailed information displayed in the tabs below the graph. They also reflect the currently selected theme colours.

2.5. Viewing a Summary Report 

When you first open a report, the Summary view is shown. It consists of high-level information about the Poplar program, split into various sections, as described below.

Each summary section displays information in collapsible blocks (marked with a downward-pointing arrow, or, for the Engine Options section, clickable disclosure triangles), making it easy to show only sections of interest to you. Whether a section is collapsed or not is saved automatically across all reports.

2.5.1. Program Information

The top half of the report shows details of the IPU system the program was compiled for, and also details of the size of the graph that Poplar created.

Target

Type : what kind of IPU the program was compiled for. This will either be IPU or IPUModel
Architecture : what version of IPU was used to run the program. This will either be Mk1 or Mk2
Timestamp : when the program compilation was started.
Tiles per IPU : how many tiles are in each IPU
IPUs per Replica : how many IPU are in each replica. (Only shown if more that one replica is used)
Replicas : the number of replicas. (Only shown if more that one replica is used)
Total Tiles : how many tiles in total (tiles per IPU * num IPUs)
Total IPU : how many IPUs in total (IPUs per replica * num replicas)
Memory per Tile : maximum memory on a tile.
Memory per IPU : maximum memory on a IPU.
Memory per Replica : maximum memory for a Replica.
Total Memory : total memory on all IPUs the program was compiled for.
Compute Set Instrumentation : the type of instrumentation compiled into the program to record compute set execution cycles. The method is controlled by the debug.computeInstrumentationLevel engine option and is enable or disabled via the debug.instrumentCompute option (or debug.instrument which enables all instrumentation).
- Off : instrumentation disabled; estimates are used instead, if available.
- Vertex : cycles are recorded for each vertex execution. This means vertex execution is serialised, and this only really works on very small graphs.
- Tile : cycles recorded separately for each tile. If you find this uses too much memory try using the Ipu method.
- Ipu : cycles recorded for the slowest tile on each IPU for each compute set. An internal sync is inserted before and after each compute set. The tile that is used to do the cycle recording is controlled by debug.profilingTile . If you have enough memory consider using the Tile method instead.
- Device : cycles recorded for the slowest tile on the entire decide for each compute set. This mode is not recommended - use Ipu instead.
External Exchange Instrumentation : the type of instrumentation compiled into the program to record extern exchange cycles (i.e. host exchange and global exchange). The is enabled or disabled using the debug.instrumentExternalExchange engine option (or debug.instrument which enables all instrumentation).
- Off : instrumentation disabled; estimates are used instead.
- Tile : cycles recorded separately for each tile.

Graph

Number of compute sets : how many compute sets are in the graph. This is after compilation so it may be a larger number than the number of compute sets added through the Poplar API because some are created during compilation.
Number of edges : the number of edges in the graph, again after compilation. An edge is a pointer from an Input<> , Output<> or InOut<> vertex field to a variable.
Number of variables : the number of variables in the graph, after compilation. Post-compilation variables are called lowered variables to distinguish them from the variables you created by hand when defining the program. The main difference between them is that lowered variables are restricted to a single tile.
Number of vertices : how many compute vertices the graph contains after compilation. Some vertices are added to the graph during compilation (e.g. memcpy vertices) so this may be higher than the number of vertices added with the Poplar API.

2.5.2. Engine options

This section displays all of the engine options that were used to generate the reports, the values supplied for each, and whether they are the default values or otherwise. You can also choose to show the options for the three different phases: compilation, execution and target.

If no engine options are displayed, click on the Engine Options heading to expand that section and show its contents.

To view options by their phase (execution, compilation or target), select the phase from the ‘Type’ drop-down menu. If a phase is not available it is disabled.
If you select the ‘Execution’ type, another drop-down list appears in which each of your runs is listed. Select ‘None’ to see all execution parameters, or select one of the runs to see execution parameters just for that run. See here for more details.
Values which are different from the default are displayed in bold . Use the “View - All” selection box in the top right-hand corner to switch between a list of all options, or “View - Non-default” to view just those which are different from the default values.
Click on the small book icon to follow a web link to the Poplar API Reference for a description of each of these options.

2.5.3. Framework and Application JSON files

If you have a framework.json or an app.json file that was created in your program, their contents are displayed here so that you can check any parameters that you recorded in them. This is useful when comparing two reports, allowing you to spot differences easily.

You can put whatever information you wish into these two files, and if they’re found in the reports folder when the other report files are being loaded, their contents are displayed in a foldable tree. This assumes that the files are valid JSON.

2.5.4. Report files

This section of the Summary report shows the folder from which the report files were loaded (or both folders, if you’re comparing reports). It also shows which individual files are being loaded into the PopVision™ Graph Analyser, as documented here .

If no Report Files are displayed, click on the Report Files heading to expand the section.

For each file that is present:

A set of three green dots indicates the file was found, and is being analysed and loaded.
A green tick indicates that a file was found and has been loaded successfully.
A greyed-out question mark indicates that the corresponding file was not found.
A red cross indicates that the file could not be loaded. A warning message can be found in the Host Information section, directly above.

The folder from which the reports were loaded (or both folders, if you’re comparing reports) is also displayed.

2.6. Viewing an Insights Report 

The Insights Report gives you a brief summary of how well your model fits into the available IPU memory. The report shows which tiles are responsible for the highest memory usage, and which vertices and exchanges require the most memory in your model and on the IPUs.

There are also a number of recommended actions you can take to reduce the memory usage reported, such as recomputation, changing the batch size or using FP16. Where relevant, details of the largest memory requirements are shown, and an estimated expected memory saving.

2.6.1. Memory insights

The top section of the Insights report shows how well your model fits into the available IPU memory. It includes the following information:

A panel at the top shows whether your model is within available IPU memory capacity, or whether you had an OOM (‘Out Of Memory’) issue. The proportion of IPU memory used is displayed, giving you a quick guide to how much memory you have spare (or how much you need to reduce if your model was OOM).
A chart showing the five tiles with the highest memory usage. For each tile, the amount of memory required is displayed, as well as the IPU on which that tile is located.
A chart showing a histogram of memory usage across all tiles. This gives you an insight into the number of tiles that are using particular amounts of memory.
A table showing the vertex and exchange memory usage, as described below .

2.6.2. Vertex and exchange sizes

The Insights report also shows you the vertices and exchanges that require the most memory in your model, and across the IPUs and tiles.

Select the Model, IPU or Tile tab to see the memory usage for each.
Select the ‘Vertex (state and code)’ radio button to display the name of the vertex with the largest state, the amount of memory it required, and a list of compute sets in which that vertex was used. Beneath that, the vertex with the largest code size is also shown, with the same information.
Select the ‘User variable’ radio button to see a list of always-live user variables at peak liveness that require memory reserved for the entire application.
Select the ‘Exchange’ radio button to display the name of the largest exchange, the amount of memory it required, and the names of the variables that were involved in the transfer.
Select the ‘Program step’ radio button to view the step with the peak memory usage, as well as a list of the not-always-live variables at this step.
When viewing the IPU tab, you can select which IPU to view using the drop-down list on the left. Only vertices and exchanges involving that IPU are listed.
When viewing the Tiles tab, you can select which of the five most memory-hungry tiles to view, or enter a specific tile ID in the search box. Only vertices and exchanges involving that tile are listed.

2.6.3. Tips on reducing memory usage

The bottom section of the Insights report displays a number of panels that recommend possible solutions for improving the memory usage of your model, which may help in situations where you are out of memory. Where appropriate, estimates are given showing how much memory could be saved with each solution. Further information about these recommendations can be found in the Memory and Performance Optimisation Guide , on the Graphcore website.

Note

These solutions may affect the performance, throughput, convergence and/or training characteristics of your model.

The following recommendations and solutions are included in the report:

Reducing the batch size , with links to a video for Evaluating Batch Sizes for IPUs and a Graphcore blog post on small batch sizes .
Using FP16 where appropriate - this panel shows how much memory saving could be achieved by switching the five most memory-hungry tiles from FP32 to FP16. A list of potential candidate FP32 variables from the five worst tiles is displayed, showing how much memory could be saved by changing them to FP16.
Using recomputation
Using recomputation checkpoints
Setting the available memory proportion for matmuls and convolutions
Offloading variables to the host machine
Reusing identical parts of your graph with outlining
Disabling execution profiling to reduce the memory needed for execution instrumentation.

2.7. Viewing a Memory Report 

The Memory Report shows a graphical representation of memory usage across all the tiles in your IPU system, showing graphs of total memory and liveness data, and details of variable types, placement and size.

There are two main areas of the Memory Report:

The Memory graph, in the top half of the window, shows different types of memory graph. Click on the ‘Graph Type’ drop-down menu at the top left-hand corner of the graph to select a graph type:
- Total Memory graph , which shows the memory usage of your program across all the IPU tiles. You can view a breakdown of this data by region (whether to display interleaved and non-interleaved memory separately) or by category (what the memory is used for).
- Variables graph , which allows you to plot the memory usage of multiple individual variables.
- Tile Map , which shows the memory usage of the tiles overlaid on a physical floor plan of the IPU.
The Tile Memory Usage report, in the bottom half of the screen, which shows memory usage broken down by various categories, and memory maps of individual tiles.

You can choose various view options for each graph, and you can also click on the graph to view details for an individual tile.

2.7.1. Navigating memory reports

You can move around the main report graphs using your mouse, as described above.

Selecting individual tiles/IPUs

When you first open the Memory Report, the bottom half of the report shows the memory usage for all tiles or IPUs at once. You can view the memory usage for an individual tile/IPU in various ways, as follows:

Move the mouse over the report graph, and you’ll see a tooltip box which shows the Tile/IPU ID, together with how much memory that tile/IPU has used.
Click on the main graph to select an individual tile/IPU. Its ID appears in the ‘Select’ input at the top of the screen, and also shown above the report in the bottom half of the screen.
If you hold down the Shift key when you click, you can select multiple tiles.
You can also type one or more tiles/IPU IDs in the ‘Select’ input box if you wish, separated by commas.

Memory Report view options

There are several options for viewing Memory Report graphs, which you can select from the ‘Options’ drop-down menu in the top right-hand corner of the report screen:

Tile Memory From Binary - whether to use the tile memory data from the binary file ( archive.a ).
Include Gaps - whether to include the memory gaps between variables (‘on’ by default). Some memory banks in IPU tiles are reserved for certain types of data. This leads to ‘gaps’ appearing in the tile memory.
Show Max Memory - whether to add a line on the graph to indicate the maximum memory available. If your program uses too much memory, its memory graph will cross this line, and is highlighted.
Order By Physical Tiles - whether to order the horizontal ‘Tiles’ axis by software tile ID (the default) or by the physical order of tiles.
Default IPU Order - whether to display the IPU tiles in the original numerical order, or in the default order defined using the TensorFlow SelectionOrder option. This TensorFlow option allows you to define the order in which the available IPUs are mapped to shards, and is either “SNAKE”, “HOOF” or “ZIGZAG”. This setting is stored in the frameworks.json file, whose contents you can see on the Summary report .
Plot Data By IPU - this allows you to view the memory usage by IPU rather than by individual tile. Memory usage is summed across all tiles on each IPU. Note that the Variable and Tile Map Graph Types, as well as the Variables Tab below the graph, are not available when viewing by IPU.

There are also a number of options in the preferences that you can use to view the memory report differently - see the Preferences section , below.

2.7.2. Total Memory graph

This memory report shows the total memory usage across all the tiles on all IPUs.

On a Memory Report, select ‘Total Memory’ from the Graph Type menu.

The horizontal axis shows the tile number (which you can order by software or physical ID (see above ), and the vertical axis shows the memory usage.

Memory Report breakdown

Breakdown by Region

IPU memory has two different types of memory regions which Poplar allocates to data depending on how that data needs to be accessed:

non-interleaved : Consecutive words are stored in the same memory bank. Code must be stored here.
interleaved : Consecutive words are stored in alternating memory banks. Some high bandwidth load/store instructions like ld128 only work in interleaved memory, and therefore some codelets require variables they are connected to to be stored here. Code cannot be stored here.
overflowed : Memory that exceeds the maximum amount available on a tile.

Breakdown by category

When you select “Breakdown - By Category” another dropdown box is displayed with all the available memory categories. This can be used to understand the overhead costs of instrumentation.

The ‘Select Categories’ dropdown is multi-select so you can compare multiple categories simultaneously.
The total memory can be toggled on and off by selecting the “All” option.
Breakdown by category is also available when viewing memory by IPU.

Breakdown by liveness

This memory report shows the memory usage of the two types of program variables:

Always-Live Variables - these variables must be accessible for the entire lifetime of graph execution. This means nothing else can ever use the memory allocated for these variables. Examples include code and constants.
Max Not-Always-Live Variables - not-always-live variables are only needed for some program steps. As long as two variables are not live at the same time they can be allocated in the same location, thus saving memory. This option shows the maximum amount of live memory use on each tile. See the relevant FAQ for more details.

2.7.3. Variables Memory graph

This memory report allows you to select multiple variables and plot their memory usage across tiles.

On a Memory Report, select ‘Variables’ from the Graph Type menu.
A prompt appears, suggesting you enter a variable to search for. Type the variable name into the search box at the top, and the application will find matching variables and display them in a drop-down list.
Select a variable from the list to plot on the graph.
Remove variables from the graph by clicking the small ‘x’ icon in their names in the key legend, below the graph.

You can also access this feature from the ‘Total Memory’ report by selecting a variable from the Variables tab as described below .

Note

The Variables Memory graph is not available when viewing memory by IPU.

2.7.4. Tile map Memory graph

This memory report displays a schematic of an IPU and overlays it with a coloured representation of the tile memory usage for every tile for the selected IPU. The colour key is displayed on the right, and its range can be changed, as described below .

Note

The Tile Map Memory graph is not available when viewing memory by IPU.

On a Memory Report, select ‘Tile Map’ from the Graph Type menu.
Select an IPU to view using the input on the left, and the map updates to show the memory usage for that IPU.
Hover the mouse over the tile map to see a popup of the details of a tile within the selected IPU. This shows physical and software tile ID, memory usage and rank (described below). While you hover, a black line within the colour key, to the right of the tile map, shows the memory usage of the hovered tile, according to the colour scale currently selected.
Click on a tile on the map to select it and see its memory usage. Its details are shown in the tabs and tables below, as in other memory reports. You can select multiple tiles by holding down the Ctrl key while clicking a tile (Command for Mac). Details for each tile are displayed in the tables below, with a column for each tile. The selected tile numbers are displayed in the search box above the tile map, so can enter them by hand if you know which one you’re looking for.
The Breakdown menu at the top of the tile map allows you to break down memory usage by region (see here for an explanation of this). When breaking down by region, the ‘Region’ control to the left of the map allows you to choose to display interleaved or non-interleaved memory.
Choose whether to include or exclude gaps in the tile map by using the Options menu at the top of the report.
The panel to the left of the tile map allows you to select which IPU to view, which variable category you’d like to view, and which colour scale to use.

Note that you can change the size of the tile map by dragging the split-screen control, and it will fill the space available in the top half of the screen.

Changing the colour scale

There are three methods of colouring the tiles on an IPU that show their memory usage in different ways. Use the ‘Scale type’ dropdown to the left of the tile map to select one of these scales:

Relative (default) - The colour of a tile depends on the memory between the upper and lower memory values.
Absolute - The colour of a tile depends on the memory between zero and max memory.
Rank - The colour depends on a linear ordering of tiles based on their memory usage.

When your model is out of memory, the colours are scaled appropriately, not just to the max memory.

2.7.5. Tile memory usage

The bottom half of the Memory Report screen shows tabs that contain an analysis of memory usage by several different categories.

The default view shows memory usage for all tiles (or IPUs, if you are choosing to plot by IPU instead of tile), but you can select an individual tile/IPU as described above .

Tile memory usage: Details tab

The Details tab in the tile memory usage report displays a hierarchical list of memory usage by category on the selected tiles. This list is divided into three main sections:

Including Gaps - this shows memory usage on the selected tiles which includes the gaps between variables.
Excluding Gaps - this shows memory usage on the selected tiles which excludes the gaps between variables. It is split into interleaved/non-interleaved memory and also categorised by the type of data in that memory location.
Vertex Data - this shows the memory used by variables in the graph vertices as the Poplar program executes, categorised by the types mentioned in the ‘Excluding Gaps’ section, below.

Excluding Gaps

Memory usage on the selected tiles is displayed here in two categories, with memory usage figures for each:

by Memory Region - this show memory that is non-interleaved, memory that is interleaved, and any memory that has overflowed.
by Data Type - this shows memory further categorised by the type of data that is stored there (either overlapping data or non- overlapping data). The meaning of each of these categories is explained in the table below.

Not Overlapped data
This shows the memory usage for parts of the memory where only a single variable is allocated. This includes variables that cannot be overlapped with other variables (the always-live variables), and also variables that just happen to be not overlapped with other variables, even though it isn’t disallowed.
Variables	These are the variables added using `Graph::addVariable()` .
Internal Exchange Message Buffers	During the exchange phase of program execution, it may not be possible to send data straight to its destination. For example sending a single byte directly is impossible because internal exchange has a granularity of four bytes. In cases like this Poplar will copy the data to and from temporary variables using on-tile copies (which can copy individual bytes) and then do the actual exchange from these buffers.
Constants	These are variables that created using `Graph::addConstant()` .
Host Exchange Packet Headers	Host exchange is performed using a packet-based communication protocol. Each packet starts with a header that contains the address that its payload should be written to. These addresses are determined at compile time and the packet headers are stored in these variables.
Stack	This is where the program stack lives. It is created automatically during compilation. There is a single `Stack` variable on every tile that contains the stacks for the supervisor and worker threads. The stack size is configurable at compile time via an engine option.
Vertex Instances	These store the state of each vertex instance. Each call to `addVertex()` adds a single vertex instance to the graph, whose size is equal to `sizeof(TheCodeletClass)` .
Copy Descriptors	During compilation, Poplar will add compute sets that perform copies. These contain copy vertices, and the copy vertices reference additional data called Copy Descriptors that describe how to perform the copy.
VectorList Descriptors	A vector of pointers that points to the values of a multi-dimensional vector. The data for `VectorList<T, DeltaN>` fields.
Vertex Field Data	Variable-sized fields, e.g. the data for `Vector<T>` fields.
Control Code	All code that is not vertex code, this includes all the code that is generated from you Poplar `Program` tree, and code for each compute set that calls the codelet `compute()` functions for each vertex. It does not include the `compute()` functions themselves, which fall under the Vertex Code category.
Vertex Code	This is where the assembled code from the codelets is stored. A codelet is a class written in C++ or assembly, whereas a vertex is an instance of that class. Adding multiple instances of a single vertex type does not increase the amount of Vertex Code memory required.
Internal Exchange Code	The code instructions used to move data between tiles on an IPU.
Host Exchange Code	The code instructions used to move data between an IPU and the host machine.
Instrumentation Results	If you set the `debug.instrument` option in the Poplar Engine, this is where the cycle counts for various Poplar functions are stored. You’ll notice, therefore, that enabling instrumentation increases your memory usage. Different levels of instrumentation can be selected, which will use different amounts of memory. Note that the size of these variables is dependent on the level of dynamic branching in your program – if you’re timing every instance of a function call, the compiler won’t necessarily be able to tell in advance how much memory it will require to keep a cycle count for each of them.
Overlapped data
This is data for variables that are not always live, meaning that they are temporary and can be overlapped by other not-always-live variables if the two variables are not live at the same time. Reusing memory in this way reduces the amount that is required by Poplar programs. The sizes reported here count the memory used by the variables as if they were not overlapped. For example if two 4-byte variables are allocated in the same location it would be reported as 8 bytes here.
Program & Sync IDs	The Poplar Engine has a `run()` method to which you pass a vector of programs you want to execute. Each of these programs has an ID, so that you can specify which one to execute first. When you call `run(3)` to run the forth program, the `3` is the program ID that is sent to the IPU so that it knows which program to run. Additionally, when control flow cannot be statically determined, the IPU must inform the host which control flow path it took, so that the host knows which data to send during host exchange. This is done by sending Sync IDs.
Data Rearrangement Buffers	Data connected to a vertex edge is guaranteed to be contiguous in memory, but the Poplar API allows you to connect non-contiguous tensors to edges. In this case Poplar will need to insert rearranging copies to temporary variables so that the data presented to the vertex is contiguous.

Tile memory usage: Compute Sets tab

This tab contains a table of the compute sets that appear on the selected tiles or IPUs or, if none are selected, all tiles or IPUs. The name and total size of memory for each compute set are listed in descending order of size.

Each row can be expanded or collapsed by clicking the chevron icon at its left-hand edge. When a compute set is expanded, a subsidiary table of its constituent vertices is displayed beneath the compute set row.
The vertices table shows the total size of memory for each vertex in the compute set, which is also shown in the Vertices tab because it is independent of the compute sets.
In the comparison view, the difference in the size of a compute set or vertex between the source and target reports is only displayed if it appears in both reports. Otherwise, the difference column is empty for the row.
For reports generated with the Poplar SDK 2.3 or later, the instance count for each vertex in the compute set is also shown. The count for each selected tile or IPU is specific to the compute set that is expanded.
Because vertices are shared, a vertex may have a non-zero size for a tile or IPU even if its instance count is zero. Similarly, when comparing reports, a compute set that appears in only one of the reports may be made up of vertices which nevertheless appear elsewhere in the other report.
The compute sets can be filtered by their name, or the names of their constituent vertices, by using the text input and dropdown button above the table.

Tile memory usage: Vertices tab

This tab in the tile memory usage report lists the memory used by the graph vertices, together with the total memory size they occupy across the selected tiles (or all tiles, if none is selected). This list is ordered by decreasing memory usage.

For each vertex, the Poplar namespace and function name are listed, together with any additional information about their types. Please refer to the Poplar API Reference for a description of each of these functions.
The origin of each vertex, in a small blue box, is displayed after the vertex name, indicating whether the vertex was written in C++ or Assembler (ASM).
You can filter the vertices by name, using the input box above the table, or by source (C++ or ASM (Assembler) using the dropdown list.

Tile memory usage: Exchanges tab

This tab in the tile memory usage report displays the internal exchange code size for all tiles/IPUs, or the currently selected tiles/IPUs. When comparing reports, there is an additional column that shows the difference between the source and target code size.

‘Open up’ any of the exchanges by clicking on the small arrow next to it, and you’ll see detailed information of the exchange. If no tile is selected, the exchange information is grouped by name, and the total size of the exchange data is added up to give the total size in the column on the right for exchange variables of that name.
The ‘FROM’ and ‘TO’ labels indicate the direction in which the exchanges was made, showing the names of the variables involved, and how much data was passed.
The UL and L tags show whether the variables are ‘unlowered’ or ‘lowered’. Unlowered variables are created and ‘lowered’ across several tiles, so that parts of it are mapped to other tiles’ memory variables. Many of the Poplar operations create lowered variables directly, so rather than create a large variable and map it across the tiles, it creates many little variables on each if the tiles, and maps out the exchanges that are required between them. There’s no higher level variables to reference, hence the need to differentiate the two types.

Exchange information can also be seen on the Program Tree .

Tile memory usage: Variables tab

This tab in the tile memory usage report displays a memory map of the currently selected tile, showing code and variable usage across the memory locations. Note that this tab is not available when viewing memory by IPU.

Note

Entries in this section of the report are only present if lowered variables were captured in the captured profile. See the Report files section for details about how to generate this when executing your program.

You can toggle between two different views (memory map and table view) on the Variables tab by clicking on the icon in the right-hand corner of the tab contents.

Select an individual tile as described above to view its memory layout and variable usage.

There are several interactive features of the Variables view that can help you find the locations in which variables are stored:

The selected tile’s memory is displayed vertically in a scrollable area that is 1024 bytes wide. The tile memory is partitioned into memory elements , which are either Interleaved or Non-Interleaved (see here for more information). Any unused elements at the end of the IPU memory are not displayed.
Variables are displayed as coloured bars which span the memory locations, and in places where two or more variables overlap, you can see all the variables at that location by hovering your mouse over the variable.
All variables in the memory layout are coloured according to their type. Click the colour key icon in the top right-hand corner to view the colour key for each type. The meaning of each of the categories displayed here is described in the table above. You can click on the checkboxes on the left-hand side of each variable to display it or hide it from the variable plot.
Click on a variable to display its details, which appear on the right-hand side of the memory layout. This displays all variables which exist at any time at that memory location.
Click on the ‘Show’ button at the bottom of the variable details, beneath the ‘Interference’ heading, to filter other variables that interfere with the selected variable in terms of memory placement. See the Memory interference section below.
Search for a variable by entering search text into the input field above the memory layout. Variables with matching text in their names will be highlighted in the memory layout, with all others disappearing. You can clear any text you’ve entered here by hovering over the box and clicking the small x icon at the right-hand end.
Plot one or more variables on the Memory graph, as described below .

Note

You can expand the variables map to fill the report window by clicking on the full-screen button in the top right-hand corner of the map. A corresponding button to shrink the map again appears in the top-right corner.

Memory interference

You can see other variables with which a selected variable ‘interferes’. These are variables that are in contention in terms of their memory placement. There are three ways variables can interfere with each other:

Memory: A variable cannot be occupy the same bytes as some other variables because it is live at the same time as them. Always-live variables interfere with every other variable in this way.
Element: A variable cannot be in the same memory element as another one. This can occur in some case when two variables are connected to the same vertex, and it is reading from one and writing to another using certain instructions.
Region: A variable cannot be in the same memory region as another one

To see which other variables interfere with a selected variable:

Select a variable from the memory map by clicking on it. Several variables may occupy that memory location, and their details are displayed in a list on the right-hand side.
Click on the ‘Show’ button at the bottom of a variable’s details, beneath the ‘Interference’ heading, and the variables in the memory map will be filtered using that variable’s name, showing only those that interfere with it.
To re-display all variables, click the small cross in the filter box at the top of the variable memory map display.

Variable types

Variables in the memory layout diagram are categorised by colour as described below. Note that more detailed descriptions are available in the Excluding Gaps table, above.

User variables - these are user-defined variables.
- Variable - variables created using the addVariable() Graph function.
- Constant - variables created using the addConstant() Graph function.
Code variables - these are variables that are created by Poplar to execute the program code.
- Control Table - experimental, empty by default.
- Control Code - variables used by Poplar to run the program. Some specific control code variables are described in the section Known variables , below.
- Vertex Code - the code within the vertex ‘codelets’.
- Internal Exchange Code - the compiled code used to move data between tiles on an IPU.
- Host Exchange Code - the compiled code used to move data over PCI between an IPU and the host machine.
- Global Exchange Code - the compiled code used to move data between multiple IPUs.
Vertex data variables - these are variables associated with tensors.
- Vertex Instance State - the internal state of each vertex instance.
- Copy Descriptor - additional metadata used for copy vertices.
- Vector List Descriptor - data for VectorList<T, DeltaN> fields.
- Vertex Field Data - data for Vector<T> fields.
Temporary variables - these are used to temporarily store information that Poplar uses while executing the program code. They are not-always-live, overlapping variables.
- Message - temporary storage for internal exchange (between tiles on one IPU).
- Host Message - temporary storage for host exchange (between an IPU and the host).
- Global Message - temporary storage for global exchange (between IPUs).
- Rearrangement - variables used to store intermediate values when an edge is connected to non-contiguous variables.
- Output Edge - temporary output variable used when a vertex is connected to a variable on a different tile. The data is copied by internal exchange after the compute set has been executed. Note that this category is also used for Input Edges. It should really be named Input/Output Edge.
Miscellaneous variables - other variables that don’t fit into the categories above.
- Multiple - sometimes variables are merged during lowering. If they came from two different categories the resultant variable is put in this category.
- Control ID - the combination of program ID, sync IDs and software sync counter.
- Host Exchange Packet Header - header information for PCI messages between the IPUs and the host machine.
- Global Exchange Packet Header - header information for PCI messages between IPUs.
- Stack - thread stacks for each tile.
- Instrumentation Results - the cycle counts if instrumentation is enabled.

Known variables

Poplar uses some specific variables that you may encounter on various tiles. Their purpose is described below:

.text.poplar_start – this is the entrypoint and main control code for your program. It’s roughly equivalent to main() in a C program.
.text.supervisor.control__func[…] – these are the control codes for the functions in your compiled program. These are the functions you can see on the Program Tree report.
.text.supervisor.control_initPrngSeed - the control code to initialise the seed for the Psuedo-Random Number Generator (PRNG).

Plot multiple variables

When a variable is selected in the Variables tab, its details are displayed in the right-hand column next to the memory map for that tile. You can then plot that variable’s position in memory on the main graph, as follows:

With the variable selected, click on the ‘Plot variable’ button. The ‘Graph Type’ menu at the top changes to ‘Variables’. The variable name is added to the variable list above the report, and you can see how it is placed across the tile memory.
Click other variables in the memory map, and you can repeat the process above, adding them to the variable list at the top of the report, and displaying their memory placement together on the same graph. The default behaviour shows the size of the variable on each tile. If you select ‘Plot variable by address’ from the Options menu at the top of the screen, you can see how the variable is laid out in the memory space.
To remove a variable from the graph, find its name in the list above the graph, then click the small ‘x’ button at the right-hand end.

Full-screen option

When viewing the content of the Variables tab it may be easier to view the data in full-screen mode. You can toggle this option on and off using the button in the top right hand corner of the tab.

Toggle between table and graph view

The Variables tab has a table view, listing the variable names and their sizes, and also a graph view which provides more in-depth detail. You can toggle between these by using the chart/text button in the button group in the top right-hand corner of the tab.

Show differences between selected tiles

When multiple tiles are selected the difference between values is displayed in red or green text on the table view. You can remove variables that have the same value by enabling the “Show differences between selected tiles” option. This is accessible by clicking on the cog icon and checking the box in the drop-down menu. This filters out all variables that have the same value, and leaves only those that are different.

Show base address offset

Each IPU version has a different address where available memory starts on its tiles. For the Mk1 IPU, this is 0x40000 , and for the Mk2 it is 0x4C000 . Selecting this menu option resets the base address to the IPU version you’re using.

2.8. Viewing a Liveness Report 

The Liveness Report shows which of the not-always-live variables are allocated at certain points in your program, and gives a detailed breakdown of variable memory usage by the compute set that they’re in. There are two main areas to the report:

the top half of the reports shows the graph of memory usage against compute set .
the bottom half of the report shows details about the variables that are used within the selected compute set.

For a standard machine learning model that you’re training, for example a ResNET, you’ll generally see a curve that ascends to a peak and descends again. The rising portion of this curve is the memory usage during the forward pass of the training algorithm, where many activations are created (‘not always live variables’), and the peak represents the point where the maximum number of activations exist. As the curve descends again, which represents the backwards pass of the training algorithm, the activations are ‘released’ after being used to update the weights.

When you’re inspecting a liveness graph, it’s informative to look at the peak of this curve, and select the corresponding compute set to show the how the variables usage is contributing to the greatest memory utilisation.

2.8.1. Navigating Liveness reports

You can move around the main report graphs using your mouse, as described above.

2.8.2. The Liveness graph

Hover your mouse pointer over the graph to see a callout containing the compute set ID, together with the amount of memory used by that compute set.
Click on the graph to see that compute set’s stack details below the graph, which displays its name and memory size. You can select multiple program steps by holding the Shift key and clicking other program steps. Each of these steps is then displayed in its own column on the Not-Always-Live Variables and Vertices tabs.
You can choose to display memory usage statistics for the selected tiles - see the relevant Preferences section , below.
You can plot the lifetime of not-always-live variables directly on the graph - see here for more details. This is an experimental feature.

Note

If you’re comparing two reports, you can choose to display their liveness graphs together or separately. See the ‘Merge Graphs’ Viewing options section below.

Selecting a source

You can choose whether you want to select an individual tile, or, if the report was generated using Poplar SDK 1.2 or later, select an individual IPU.

To select particular tiles or IPUs:

Click on the ‘Select Source’ dropdown list at the top of the graph and select the source you want to use for the graph. Each source will have its own plot.
All tiles - shows liveness data for all the tiles (the default)
Worst tiles - shows the two tiles that have the highest memory usage during program execution, and you can select either of them to view.
If the report was generated in Poplar 1.2 or later, you can also select one or more IPUs from this list.

There is a Poplar Engine setting which allows you to capture more than the default two worst tiles. Please refer to the Poplar API Reference for full details of these options.

Filtering steps

You can concentrate on the steps you’re interested in by filtering on a particular search term. When you enter a term (and press the Return key, those steps that don’t match in the execution graph are moved to a separate dataset in the graph, and ‘greyed-out’, leaving only the steps whose named match your search term.

To cancel the filtering, click on the small ‘x’ at the right-hand end of the search box.

Viewing options

There are several options for viewing Liveness Report graphs, which you can select from the ‘Options’ drop-down menu in the top right-hand corner of the report screen:

Include Always-Live - whether to include a second trace in the graph that shows variables that are always present during the entire execution of the program, and always have the same memory address (for example stack ). Note that you can show and hide each of the traces on the graph by clicking on their colour key, just below the x-axis.
Include empty stacks - whether to include stacks that contain no variables.
Stack IPUs - whether to display any selected IPUs (see above ) stacked or not.
Show Max Memory - whether to display a line on the graph showing the maximum memory limit for total memory, a selected tile, or a selected IPU. Note that this requires the ‘Include Always-Live’ option, above, to be enabled, and ‘Stack IPUs’ to be disabled.
Merge Graphs - whether to merge source and target report graphs together, or keep them as separate graphs. When displayed separately, you can navigate each graph independently, and select different program steps from each report. If your two reports contain different numbers of program steps, you’ll be able to select the corresponding steps from each to view their details. When the graphs are merged together, you navigate them together ‘as one’. When you click on the graph, the two program steps you select (from the source and target reports) may not correspond to the same element of each program.

2.8.3. Liveness stack details

In the bottom half of the Liveness report screen, you’ll see a tabbed list of live variable details that show their memory usage. These details are described below.

Viewing enhanced debug information

If you have captured enhanced debug information when compiling your program, it is visible in the ‘Always-Live’ and ‘Not-Always-Live’ variables tabs. See the section on the Debug Information file, above, for details of how to capture this information.

If a variable has a small ‘disclosure arrow’ ( ) next to it, click it to show enhanced debug. You can see a list of software layers on the left (for example, Poplar, PopLibs, etc.) Selecting a layer shows the debug information that has been added by that layer.

For all variables you have information captured from the Poplar layer. This may include:

whether it’s a Variable, Cloned Variable or Constant,
the shape of the variable as an array of dimensions,
the type of variable (e.e. ‘Half’),
for cloned variables, which variable it was cloned from, and the method by which it was cloned,

If the variable was created as part of a PopLibs API call, you can also see the following information:

the PopLibs API call
the input tensors to the API call
the arguments to the API call
the output tensors to the API call

Note

Note the variable may not be an output of the PopLibs API call. It could be an internal variable created for the operation. Depending on your application, you may see further debug information for the framework and/or application.

By default, the debug information for the first PopLibs and Poplar call is shown. You can choose to show all PopLibs/Poplar calls that are made internally by clicking the gear icon in the top right-hand corner of the debug information box and selecting “Show All Debug Infos”. For instance, you may see the debug information for the PopLibs matmul API call. If you enable the option to “Show All Debug Infos”, you will see the internal implementation PopLibs calls, which include the PopLibs convolution API call. ( matmul is implemented as a convolution.)

Always-Live Variables

This tab shows the variables that are always live. Their data must always be available and therefore they cannot share memory with any other variable.

Not-Always-Live Variables

This tab shows the variables that are not always live. At certain points in the program we do not need to store any data in them, and therefore other variables can be allocated at the same location. Variables are never “allocated” or “deallocated” at runtime. All variables are statically allocated at compile time and always have a fixed address.

You can select one or more of the not-always-live variables to see their lifetime displayed on the graph above. Click on the small icon at the right-hand end of their listing, and they will be added to the plot, showing when they were created, and when they were destroyed. Each variable plotted also appears as a small blue box above the graph, and you can click the small x within them to remove them from the graph. This is an experimental feature.

Vertices

This tab shows the vertex functions that are contained in the selected compute set.

The origin of each vertex, in a small blue box, is displayed after the vertex name, indicating whether the vertex was written in C++ or Assembler (ASM).
The number of that vertex type created is also shown.

Cycle estimates

This tab shows the cycle estimates of each tile on an IPU. This is only available when using an IPUModel

This is an experimental feature, and can be enabled and disabled in the Preferences.

Show differences between selected tiles

When comparing reports the difference between values is displayed in red or green text on the table view. You can remove variables that have the same value by enabling the “Show differences between selected tiles” option. This is accessible by clicking on the cog icon and checking the box in the drop-down menu. This filters out all variables that have the same value, and leaves only those that are different.

2.9. Viewing a Program Tree 

The Program Tree report shows a hierarchical view of the steps in the program that is run on the IPU. The report has a menu on the left-hand side that lists the Control Programs and Functions. The program steps contained within the selected control program or function are displayed in the main report area, on the right.

Click on one of the Control Program or Function numbers in the left-hand column to see the sequence of instructions that are contained within it.
You can view any debug information that has been captured for the selected program step - see below for more details.

The program steps in the main report area are listed hierarchically, and you can collapse or open steps that have nested steps with them by clicking on them. A small grey triangle at the start identifies steps that have sub-steps: if it’s pointing to the right, then that step is ‘collapsed’, and all of its sub-steps are currently hidden. Click the step to open up that step and show all its sub-steps. All steps are initially shown opened up.

Each of the program steps is colour-coded, making it easier to ‘pick out’ particular steps that you’re interested in viewing. Where steps corresponding to those found in the Execution Trace report, they are coloured the same.

When you select a step that is associated with other program steps, they are all highlighted in yellow. Details of that vertex are then displayed in the tabbed section below the main program tree.

For a detailed explanation of what each of these program steps involve, please refer to the Poplar & PopLibs API Reference Guide on the Graphcore Developer Documentation website.

2.9.1. Searching for program steps

You can search for a particular step by entering some text in the Search box at the top of the report window, and then pressing the Return key. Program steps that match your search term are highlighted in yellow in the main part of the report. Control programs and Functions that contain matching steps are also highlighted in the left-hand window.

You can scroll through the results, either by pressing Return repeatedly, or by clicking the arrow buttons under the search box. Single arrows move the result selection back and forward by one, and double-arrows move by ten.

When comparing two reports, two sets of results are displayed, and you can step through them searately. The top set of results shows matches in the source report (on the left), and the bottom set of results shows matches in the target report (on the right). Match highlighting works independently in both reports.

If ‘Show Program IDs’ is enabled, you can search for a step’s program ID as well as its type or name.

2.9.2. Details tab

When a program step is selected, more details of that step are displayed in the Details tab, such as the ID, type and name of the step.

If there are any vertices associated with that step, they are listed in the table below, with the following details:

The name of the vertex class, together with a small blue icon indicating whether the vertex was written in C++ or Assembler (ASM).
The memory size occupied by each vertex instance of this type.
How many instances of this vertex were created from the selected program step.
If the program is an exchange, you can also see the size of the exchanges, and a list of which variables were involved. For exchanges between tiles, the FROM and TO tiles are displayed, and for StreamCopy programs, you’ll see the amount of data transferred from the host to the IPUs, and from the IPUs to the host. You can read more about exchange data in the Memory Report .

You can sort the vertex types listed in the Details tab of the Program Tree by name, memory size, or by the number of instances in the selected program step.

2.9.3. Viewing Debug Information

Clicking on a program step reveals debug information that was captured during program compilation. This reveals all the steps that are needed to execute the selected operation. In addition, when you select the the debug information, the program steps created for that API call are highlighted in the Program Tree.

The debug information is the same as that found on the Liveness report . See the display Options control above for showing and hiding the debug data.

2.9.4. Change Layout

When comparing the Program Tree of two reports a button in the top right hand corner allows you to arrange the Program Trees side by side or one on top of the other.

2.9.5. Viewing options

There is one option for viewing the program tree, which you can select from the ‘Options’ drop-down menu in the top right-hand corner of the report screen:

Show Program IDs - whether to append the ID of each program to its label in the tree.

2.10. Viewing an Operations Summary 

The Operations Summary displays a table of all operations in your model, for a software layer, showing statistics about code size, cycle counts, FLOPs and memory usage. You can also select which software layer’s operations you want to summarise.

Clicking on an operation in the table reveals further information about it in the tabbed section in the bottom half of the report, displaying graphs of code size, cycle counts, and various other measurements and estimates for the selected operation. You can choose which columns you want displayed in the table, and also apply sorting and filtering to it.

2.10.1. Operations table

The Operations summary table shows a list of all the operations within the selected software layer.

Note

Because the column headings in this table are typically quite long, we’ve used abbreviated headings that match the full column names displayed in the Columns drop-down list. If you hover your mouse over the column headings, you’ll see the full name displayed as a pop-up box.

Selecting a software layer

By default, the PopLibs software layer is displayed, as you can see from the drop-down Layers list in the top right-hand side of the table. You can select other software layers whose operations you wish to see by selecting from this list. Depending on your program, the following software layers may be available:

Poplar and PopLibs
PopART IR and PopART Builder
ONNX
TensorFlow Poplar Drivers
TensorFlow HLO Instructions
TensorFlow XLA Operations

Note

Changing layers involves a sometimes lengthy re-calculation of the table metrics, so you may need to wait a short while for larger reports.

Selecting which columns to display

Note

The ‘Columns’ control works differently depending on whether you’re viewing a single report, or comparing two reports:

When viewing a single report, you can select as many columns as your screen has room to display. Each column contains the values for that metric.
When comparing two reports, the Operation Name is always displayed, and you can select one other column to display. As well as the Operation Name column, there are three other columns that show the value of the selected column metric for the source report and the target report, as well as a column that shows the difference between the source and target values.

By default, the operations table displays the following metrics for each operation (single view only):

Operation Name
Debug Name
Code Size (Total)
Measured Cycles (Total)
FLOPs

Note

FLOPs are not generated by default. Enable the profiler.includeFlopEstimates Poplar Engine option to generate FLOP estimates.

Many other operation metrics can be displayed or hidden in this table by checking or unchecking the data types in the drop-down Columns list in the top right-hand corner of the operations table. Your current column selection preferences are automatically saved.

The Not-Always-Live Delta (NAL∆) option (experimental) shows the difference that each operation makes to the variable memory as the program executes. This helps you identify which operations are the most ‘expensive’ in terms of memory. Note a operation may have multiple liveness values if it is repeated in the execution, the value show is the liveness delta from the first occurrence.

Note

Columns showing a range of Cycle Estimates are experimental.

Sorting and filtering

You can sort the operations table by any of the column headings, as well as only showing operations that match a particular string:

Click on a column heading to sort the table by that operation metric. Repeated clicking sorts in ascending, descending order, or removed the sort from that column. A small blue triangle (or none) indicates which order is currently being used.
Enter some text in the Filter operations box above the table, and press Return to display only those operations whose operation name or debug name matches the text you enter. Remove filtering by emptying that text box.

2.10.2. Operations Summary tabs

The tabbed section in the bottom half of the Operations summary shows further information about the selected operation. When no operation is selected, only the Summary tab is visible, which shows some general statistics about all the operations in the currently selected software layer.

Select an operation from the table by clicking on it. The selected operation is displayed at the top of the tabbed section, along with a small “x” icon that you can click to deselect it, and return to the Summary tab.

Summary tab

When no operation is selected from the table, this tab shows a breakdown of operations for the selected software layer, including:

The total number of total operations for that layer
The total code size for that layer
The total number of cycles executed in that layer

When an operation is selected from the table, data from the default table columns is displayed.

Program Tree tab

When an operation is selected from the table, this tab shows the program steps involved in that operation. This is the same data displayed in the Program Tree report.

Code tab

When an operation is selected from the table, this tab shows a graph of the code size executed for that operation. Code size for OnTileExecute and DoExchange program steps is displayed against tile number for all IPUs. You can zoom and pan around this graph just like you can for other graphs.

Cycles tab

When an operation is selected from the table, this tab shows the number of cycles taken by the selected operation, plotted against all the IPU tiles. You can zoom and pan around this graph just like you can for other graphs.

By default, only cycle counts for the OnTileExecute and DoExchange program steps are displayed, but you can add other program step types to include on the graph by selecting them from the Options drop-down list in the top left-hand corner of the graph. Available options are:

Show Copies - this separates steps for OnTileExecute programs that are just for copies.
Show Estimates - this includes a number of estimated cycle counts:
- DoExchange Estimated Cycles
- OnTileExecute Estimated Cycles
- OnTileExecuteCopy Estimated Cycles
- StreamCopyMid Estimated Cycles
- GlobalExchange Estimated Cycles

FLOPs tab

When an operation is selected from the table, this tab shows a graph of the total number of FLOPs (Floating Point Operations) executed for the selected operation, plotted against IPU tiles.

Debug tab

When an operation is selected from the table, this tab shows debug information from the currently selected software layer for that operation. This information is identical to that on the Liveness Report and the Program Tree Debug sections.

2.11. Viewing an Operations Graph 

The Operations Graph displays a graphical representation of TensorFlow models, showing High Level Operations (HLO) and enabling you to:

drill down through the modules, expanding and collapsing the layers to get to the level you want;
view details of operations, edges and layers;
view the type and shape of the tensors between operations;
view graphs of pipelined models to see what is in each pipeline stage, and what passes between them;
colour items in the graph based on selected metrics (for example, code size or cycles used);
configure the layout with a number of advanced options.

The Operations graph shows how the HLOs are connected to each other in your TensorFlow model in a number of nested layers. Layers and the operations they contain are displayed as boxes, and the tensors that those operations use are shown as arrows between operations.

The report is shown in split-screen, the left-hand side showing the graph at the selected level, and the right-hand side showing information about the selected object in the graph in a series of tabs.

Pan around the operations graph by click and dragging your mouse anywhere on the graph.
Zoom in and out of the operations graph using the mouse scroll wheel.

2.11.1. Graph entities

There are several types of entity displayed on the operations graph. You can select them, and expand the layers and calls to drill down into the model. You can click and double-click entities on the graph to expand them (layers and calls) or to display other information about them (operations and edges).

Note

You can customise the appearance of the entities in the operations graph by adjusting the advanced view options: see below . The Options menu at the top of the report allows quick access to two of these layout options (‘Show Backward Pass’ and ‘Show Edge Labels’).

HLO layers

HLO layers are created when two or more operations have debug names which have the same prefix string, separated using the / character. They are displayed as boxes with solid borders and square corners:

HLO Layer

Click on a layer to see its details displayed on the right-hand side (see below for details).
Double-click on a layer to expand it. All of that layer’s sub-layers are now displayed within the original layer, which is now displayed as a box around the sub-layer entities. Double-click that outer layer box to return to the original, enclosing layer.
Other layers within the operations graph can be displayed by selecting them from the drop-down menu at the top of the graph.

Defining HLO layers

The operations graph works best if you name your TensorFlow operations using either Keras layers or tf.variable_scope (see the TensorFlow documentation for name_scope . Currently the view assumes the top level tf.variable_scope is called “all”, which is a common convention in the public examples.

For best reults, all forward operations should be in the “all” layer, and all backward operation names should start with gradents/all .

Note

It has been found that TensorFlow 2 GradientTape does not work well, as it does not record the scope names for the backward pass. For best results, use the TensorFlow 1 compat optimizers.

High Level Operations (HLOs)

HLOs are displayed as boxes with solid, black borders with rounded corners:

HLO Name

Click on an HLO to show its details in the tabs on the right-hand side of the report.

Operations that are disconnected from the rest of the graph are displayed as boxes with dashed, grey borders with rounded corners:

Disconnected HLO

HLO Calls/fusion operations

HLO Calls or fusion operations are displayed as boxes with double borders and rounded corners:

HLO Call

Click on an HLO Call to show its details in the tabs on the right-hand side of the report.
Double-click an HLO Call to expand it and see the operations within it.
Expanded HLO Calls show up as a list of breadcrumbs at the top of the graph, and you can click on those to retrace your steps.

Edge tensors

Tensors - data that is used as the input to, or output from, operations in the graph - are displayed as arrows. Their labels are either the tensor’s shape and type (for single tensor), or how many of them there are (for multiple tensors).

Click on a tensor to see its details in the tabs on the right-hand side.

Note

There are several options to show or hide labels in the Advanced Options tab.

2.11.2. Selected entity information tabs

When you select an entity from the graph (a layer, operation, call, or edge), its name is displayed at the top of the right-hand side of the report, and some extra information associated with the entity is displayed in the tabs beneath. This includes:

Summary tab - this displays different information depending on which type of graph entity is selected:
- When a layer is selected, this displays some statistics about the layer in general, such as total estimated FLOPS.
- When an operation or call is selected, the inputs and outputs to the operation are displayed, together with their associated tensor shape, as well as statistics about code size (OnTileExecute, DoExchange and Total) and estimated FLOPS.
- When an edge tensor is selected, the source (‘From’) and destination (‘To’) operations are displayed, as well as tensor shape and size.
Program Tree tab - any program tree steps associated with the selected operation.
Details tab - graphs of various metrics plotted against tiles, including code size, cycles and FLOPS.
Debug Info tab - any debug information associated with the selected operation.
Advanced Options tab - see below for details of how to customise the appearance of the operations graph.

2.11.3. Highlighting operations by metric

The operations graph has a feature to colour graph entities based on various metrics. Entities that have a relatively high value for that metric are coloured with a ‘hot’ colour (red) to highlight those operations that are costly in terms of memory, cycles, etc., and entities with a relatively low value for the metric are coloured with a ‘cool’ colour (blue).

From the ‘Highlight’ drop-down menu at the top of the report screen, select a metric that you want to use to highlight certain operations.

Metrics that are available to use for highlighting include:

None - switch off highlighting
Code size (Total)
Code size (OnTileExecute)
Code size (DoExchange)
Estimated Cycles (Total)
Measured Cycles (Total)
FLOPS

The colour key for metric values is displayed in the top right-hand corner of the graph, showing the highest and lowest values present in the current view.

2.11.4. Advanced options

The right-most tab on the right hand side of the report shows a number of display options for laying out the items in the operations graph.

Check and uncheck options to display or hide information on the graph.

Note

The Options menu at the top of the report allows quick access to two of these layout options (‘Show Backward Pass’ and ‘Show Edge Labels’).

2.12. Viewing an Execution Trace 

2.12.1. Detailed Execution Trace vs Lightweight Profiling

The Execution Trace report shows the output of instrumenting a Poplar model in order to capture information about how long various parts of the model take to run. There are two different ways of capturing this information: the detailed Execution Trace, and Lightweight Profiling. Each has advantages and disadvantages. Which one is captured can be chosen by modifying the Poplar engine options before running the model.

The detailed Execution Trace works primarily by simulating execution of the application on a simplified IPU model. The simulation data is augmented by capturing information at runtime about how many cycles each compute set takes to execute. The advantage of the detailed Execution Trace is that the data captured is very detailed: every single program step is included and the trace covers every tile on every IPU. This also means we can show the BSP trace and provide information about how a single program step may execute for different periods of time on different tiles. In addition to the cycle count for each step in the model, the detailed Execution Trace also displays statistics like the tile balance and cycle proportions, and compute-set details.

The detailed Execution Trace has several disadvantages. Simulating the execution of every program step on every tile can add a significant amount of time to the compilation of the Poplar application. Recording BSP data means that profiling code must be added to every tile. Finally, only a single cycle-duration is recorded for each compute set. If the compute set executes multiple times in the model but the executions take different amounts of time, the detailed Execution Trace will not show these differences.

Lightweight Profiling (LWP) does not involve any simulation and uses data captured at runtime from the IPU. Instead of capturing information about individual program steps, LWP records the start and end time of Blocks, which are programs added in order to instrument part of an application. Blocks may contain only a single program step, or may contain high-level operations consisting of a large number of program steps.

Instead of capturing information on every tile, LWP only records Blocks on a small number of selected tiles. This means that BSP and tile-balance information are not available from a LWP execution trace. However, as well as Blocks, LWP also records StreamCopy programs that take place on the selected tiles. Because these programs span many tiles on an IPU, a LWP trace also includes blocks for StreamCopy programs on tiles that were not selected, but which participated in a StreamCopy on a tile that was selected. These blocks can be aggregated at the top of an IPU’s trace by enabling the “Aggregate StreamCopy Blocks” option.

LWP captures far less data than the detailed Execution Trace and so is suitable for instrumenting large, long-running models with minimal overhead. The disadvantage of LWP is that the information available is less detailed than the information in the detailed Execution Trace.

Note

When you open an Execution Trace, a cache file named profile.pop_cache is created, which makes it much quicker to load the report when it’s opened a second time. If this file becomes corrupted for any reason, you can delete it by using the ‘Delete Cache’ option in the Help menu.

2.12.2. Execution Trace options

Selecting IPUs

The menubar above the graph contains a drop-down list named ‘Select IPU’ which allows you to select all IPUs or any individual IPU.

View options

There are several options for viewing Execution Trace graphs, which you can select from the ‘Options’ drop-down menu in the top right-hand corner of the report screen. Which options are available depends on whether the report uses detailed Execution Trace or Lightweight Profiling

Detailed Execution Trace Options

Show Tile Balance - this shows what percentage of tiles are in use during that program step, visible as shading in each each step.
Show Terminals - whether to include a line at the right- hand end of each process in the graph.
Group Executions - whether to group executions together that are part of the same ‘higher’ process further up the call stack. Grouping is determined by the slash-delimited function calls that are logged to the execution trace profile output.
Group Syncs - whether to group multiple successive Sync steps together in the graph.
Show Text - whether to show the name of each step in the graph.
Separate Overlapping Steps - whether to split up overlapping program steps into separate, non-overlapping ‘lanes’ in the graph so that they can all be seen at once.
Show External Syncs - whether to display External Sync steps in the graph. There are often many of these, and hiding them may make the execution graph plot easier to understand in some cases.
Show SyncAns - whether to display SyncAns steps in the graph. As for External Syncs, above, hiding these steps may simplify the graph plot.
Use new BSP stats - enable the new BSP-based statistics which are more accurate than the standard step-based statistics but may be slower to calculate for particularly large reports.

Lightweight Profiling Options

Show Text - when enabled, the name of each block is displayed on the graph.
Show Buffer Flushes - LWP needs to periodically flush profiling data from each IPU. By default, time spent carrying out these flushes is hidden. When this option is enabled, all buffer flushes are shown.
Aggregate StreamCopy Blocks - LWP includes blocks that represent StreamCopy programs on the selected tiles, as well as any tiles that also participated. When this option is enabled, StreamCopy blocks on tiles that were not selected are aggregated and displayed in a separate “I/O Events” lane at the top of each IPU’s trace.
Show Relative Cycles - the first Block that is recorded in a LWP trace may start at a non-zero number of cycles, which can make it more difficult to see blocks’ durations at a glance. When this option is enabled, the minimum cycle count is subtracted from the axis ticks, i.e., the ticks become relative to the start of the first Block.

Colour Key

You can view the colour key that the execution trace uses by clicking the key icon in the top right-hand corner of the graph.

Note

If your report is very large (typically over 40 million cycles), the smaller steps will be too small to render, and you’ll see a small pop-up message displayed just under the mini-map that indicates that you should zoom further in to see relevant detail in the graph trace.

Change Layout

When comparing the Execution Trace of two reports a button in the top right hand corner allows you to arrange the Execution Traces side by side or one on top of the other.

2.12.3. Detailed Execution Trace

Note

Detailed Execution Trace profiling has an overhead that can be prohibitive in some cases. Please refer to Profiling Overhead for details on how to prepare your execution to minimse that overhead.

There are two halves to the Execution Trace report:

the Execution Trace graph , in the top half of the screen, showing wither a flat compute set execution trace, or a flame graph of the stack,
a details report, in the bottom half of the screen, showing statistics about the cycle proportions, tile balance and compute sets present in the portion of the graph currently displayed. The details report has two tabs, the Summary tab , which shows statistics, cycle proportions and tile balance (experimental), and the Details tab , which shows details of a selected process from the execution trace graph.

The Execution Trace graph

The top half of the Execution Trace report shows, by default, a ‘flat graph’, showing a set of consecutive blocks for each IPU, identifying what program steps were executed (as shown in the Program Tree report), and how many cycles they took. You can also view it as a flame graph, where the Poplar debug information is used to group compute sets together as part of the same operation, or application layer.

Select features that you want in the graph from the Graph View drop-down, including Flame graph, BSP Trace, and displaying separate runs of the program.
Move around the main report graph using your mouse, as described above.
Hover your mouse pointer over the graph to see a callout containing the compute set ID, together with the amount of memory used by that compute set.
Click on a run at the top of the trace, and you can see the run-specific execution parameters displayed in the Details tab.
Click on the graph to see that compute set’s stack details below the graph, which displays its name and memory size.
Double-clicking on a layer when the flame graph option is selected expands that layer to the full width of the graph. You can select two layers at once by clicking on the first one, then Shift-clicking the second one. The graph then expands to contain just two layers. This makes it possible to inspect the cycle proportions of only the visible section of the execution trace, making it easier to understand the proportion of cycles spent in each type of process.
Clicking on a region in the BSP trace also selects the corresponding step in the Execution Trace graph.
You can magnify the BSP trace vertically to help identify individual tiles. Click on the + and – buttons to the left of the trace zooms in and out vertically. Click the ‘recycle’ button to return to the default magnification.
Click the icon at the left-hand edge of a ‘lane’ in the graph to collapse it such that only the top row of blocks is shown.

Execution View

Use the ‘Execution view’ dropdown list in the top left-hand corner of the Execution report to control what features you’d like to see on the graph. These include:

Runs - this setting can be toggled to display or hide the inclusion of program run information on the graph. When enabled this displays, just below the mini-map, a set of dark grey markers that indicate when each program run starts and ends. On the graph itself, a bar at the top of each IPU ‘lane’ shows the names of each program that runs. See Defining program names , below.
Flat & Flame - these settings toggle between a ‘flat’ view, where all the program steps are compressed into a single ‘lane’ on the graph, or a ‘flame’ view, where the call structure of all steps is displayed. Note that you can also control how overlapping steps are displayed with the Separate overlapping steps control, in the View Options control.
BSP - whether to include a graphical depiction of the BSP activity, showing where the patterns of the IPUs’ internal Sync, Compute and Exchange steps occur.

The current combination of settings is displayed on the drop-down button itself.

Defining program names

You can specify program names in your programs by using the Poplar or PopART APIs. The Execution Trace graph can display this information by enabling the Runs option in the Execution View drop-down control above the graph.

The Poplar Engine::run API now takes a debug string for the name of the run.
The PopART Session::run API allows you to specify a string for the name of the run, as well as additional strings for internal program runs, for example: WeightsToHost .

If you enable the display of runs in the graph, but no run name was provided for a run, a sequentially numbered default name is generated, for example: Engine::run #5 .

Note

As well as the execution parameters displayed in the Summary report, you can also display run-specific execution parameters, as described next.

Run-specific execution parameters

Poplar has the ability to tune runtime-only parameters for a specific run, which may modify its behaviour (including, but not limited to, how fast it executes). You can view these parameters on the Summary report, but also view them for a specific run:

With the execution trace displayed, click on one of the named runs displayed at the top of the trace, and then click on the Details tab to see the parameters used to compile the programs for that run. If no name was supplied for a run, it will be called, for example, Engine::run #1 .

Filtering steps

You can concentrate on the steps you’re interested in by filtering on a particular search term.

Enter a term in the search box at the top, then press the Return key. Steps that don’t match in the execution trace graph are ‘greyed-out’.
Cycle through the matching steps by pressing the Return key repeatedly, or clicking the arrow keys below the search box. The single arrows move backwards and forwards through the matching steps, one at a time, and the double arrows move ten at a time.
To cancel the filtering, empty the search box by clicking on the small ‘x’ at its right-hand end.

Summary tab

This tab provides an overview of the portion of the execution trace currently displayed in the graph in the top half of the page.

Statistics

The statistics displayed are:

Cycles - the number of cycles in the visible section of the execution trace. Note that a warning is displayed if this number overflows the 32-bit memory limit.
Rx / Tx - for StreamCopy and GlobalExchange steps, the amount of data transmitted and received during the operation.

Cycle proportions

A bar is displayed for each IPU that shows a graphical representation of the proportion of cycles that are taken executing each type of compute set. If you hover your mouse over these bars, you’ll see a key that shows what process each colour represents, as follows:

Internal Sync - the sync process that occurs between each tile on an IPU as part of the BSP process.
External Sync - the sync process that occurs between each IPU as part of the BSP process. External syncs are also used in some host-device communication situations, where the IPUs all need to synchronise with an event outside their boundaries, for example a flow control step in the host program.
OnTileExecute - the vertex compute code executed on each tile.
DoExchange - the tile-to-tile data exchange within an IPU.
GlobalExchange - an IPU-to-IPU data exchange.
StreamCopy - a data exchange between an IPU and the host machine over PCI.

Tile balance

A bar is displayed for each IPU that shows a graphical representation of the percentage of tiles utilised by steps in the current viewport.

It is calculated by averaging the tile balance for all Steps (excluding Syncs) that were executed on the IPU. In previous versions of the Graph Analyser this value was derived from the percentage of cycles executed by a step, weighted by the number of tiles the step used.

New BSP statistics

The standard execution trace statistics are calculated by taking into account which steps are currently in view and averaging values across those steps, weighting by the length of the step. This does not take into account that the tiles on each IPU do not in general all execute the same step at the same time.

The new BSP statistics are calculated based on BSP data and take into account what every tile is executing at every cycle. This can be slower to calculate but gives more accurate information about the cycle window currently in view.

When the ‘New BSP statistics’ option is enabled, the Tile Balance is replaced with Tile Utilisation . This measures the proportion of tile-cycles currently in view which are spent on processing or transfer steps, e.g. OnTileExecute, DoExchange, GlobalExchange, StreamCopyMid, as opposed to idle states like Sync, SyncAns, and StreamCopyBegin.

Details tab

When an program step is selected in the flat graph, or a layer is selected in the flame graph, a list of program steps, with further details, is shown here. Many of the details are the same across the different types:

Cycles - the number of cycles on active tiles that the program step used to execute,
Active Tiles - the number of tiles involved in executing that program step,
All Cycles - the number of cycles on all tiles, with additional statistics.
Tile Balance - a measure of how efficiently the program step is spread across the tiles. See View Options for more details.
Active Tile Balance - this is a recalculation of the tile balance measurement above, but excluding those tiles that do nothing.

Note

A warning is displayed if any of the cycle counts overflow the 32-bit memory limit.

Internal Sync

This is a sync process between tiles on an IPU.

External Sync

This is a sync process between IPUs.

SyncAns

This is an internal Automatic, Non-participatory Sync process. A tile can pre-acknowledge a number of internal/external syncs using the ‘sans’ instruction. The Sync ANS instruction will wait until all those pre-acknowledged syncs actually happen.

OnTileExecute

This is a piece of vertex code being executed in a tile. In addition to the common information listed above, the following is displayed:

By Vertex Type - this shows what vertices are involved in the process execution.

Below these details, an interactive graph plot is displayed that shows how the selected program step makes use of cycles on each tile as it executes. For DoExchange programs, there is also a graph of the data received and transmitted by the program during its execution.

DoExchange

This is an exchange process, where data is exchanged between IPU tiles. In addition to the common information listed above, the following is displayed:

Total Data - the total amount of data transferred during the exchange,
Data Transmitted - the amount of data transmitted during the exchange,
Data Received - the amount of data received during the exchange,
Data Balance - the mean amount of data exchanged divided by the maximum amount of data exchanged,
Exchange Code - how large the variable is that holds the code for performing the exchange,
Source Variables - a truncated list of the variables from which data was sent in the exchange,
Destination Variables - a truncated list of the variables to which data was sent in the exchange.

GlobalExchange operations

GlobalExchange is the process by which data is exchanged between IPUs. In addition to the common information listed above, the following is displayed:

Total Data - the total amount of data transferred during the exchange,
Data Balance - the mean amount of data exchanged divided by the maximum amount of data exchanged,
Source Variables - a truncated list of the variables from which data was sent in the exchange (with temporary variables given basic integer names),
Destination Variables - a truncated list of the variables to which data was sent in the exchange (with temporary variables given basic integer names).

A tile’s physical location on an IPU, and how far it is away from the main exchange block, determines how quickly data can be moved between it and other tiles. Also, the highest-numbered tiles on an IPU are linked back directly to the lowest-number tiles in a ring-type topology. The combination of these two factors is what generates the typically triangular and curved shapes seen in these exchange graphs.

StreamCopy

This process copies data between tensors and streams, allowing data to be transferred between the IPUs and the host machine over PCI. The execution trace shows these program steps as three separate phases, StreamCopyBegin, the Copy itself (StreamCopyMid), and StreamCopyEnd. StreamCopyMid is further divided into Host, RemoteBuffer and Mixed categories to show the direction of data flow.

In addition to the common information listed above, the following is displayed:

Total Data - the total amount of data transferred during the exchange,
Data Balance - the mean amount of data exchanged divided by the maximum amount of data exchanged,
Copies from host - how many copy instructions transferred data from the host machine,
Copies to host - how many copy instructions transferred data to the host machine.

2.12.4. Lightweight Profiling

Lightweight profiling allows you to choose which steps in your program you want to profile, instead of profiling everything (which is the default when using Poplar’s fine-grained instrumentation). Lightweight profiling adds less overhead to your program when running, and also makes for smaller report profiles for PopVision to open.

Note

See our Lightweight Profiling tutorial in our GitHub tutorials repository .

The Lightweight Profiling graph

The LWP report shows a graph indicating the when LWP Blocks executed and how long they took, in cycles. Blocks are shown for each tile that was profiled, and the tiles are grouped by IPU. Blocks can overlap, in which case they will stack up as a flame graph. There may also be periods of time which are not accounted for by any Block - these periods will appear as empty gaps on the graph.

Move around the main report graph using your mouse, as described above.
Hover your mouse pointer over a block to see a tooltip containing the block name and how many cycles that block executed for.

Lightweight Profiling block types

Several different block types are seen in the lightweight profiling graph.

Common

These blocks correspond to the portions of a Poplar application which are instrumented. For each, we know the precise cycle count when it began and finished. You can manually add Block programs when using Poplar, or use the Poplar engine options to automatically add Blocks.

Stream Copy

Stream Copies copy data between tensors and streams, allowing data to be transferred between the IPUs and the host machine over PCI. The execution trace shows these program steps as three separate phases, StreamCopyBegin, the Copy itself (StreamCopyMid), and StreamCopyEnd. StreamCopyMid is further divided into Host, RemoteBuffer and Mixed categories to show the direction of data flow.

Buffer Flush (or Block Flush)

On profiled tiles, a small buffer is used to collect the start and finish times of blocks. This buffer needs to be periodically flushed otherwise it will overflow. Poplar automatically determines appropriate times to flush the buffer and inserts buffer flush operations. When the Show Buffer Flushes option is enabled, these operations can be seen as blocks on the LWP graph.

Overflow

If the LWP profiling buffer is not flushed regularly enough it can overflow. When this occurs, no more data is recorded until the next flush occurs. The time between the buffer overflowing and the following buffer flush is referred to as an overflow region. Any block which both begins and ends in an overflow region will not be recorded and is not shown on the LWP graph. For blocks which start before an overflow but finish in the overflow region, the end time of the block will be unknown. For blocks which start in an overflow region but finish after the buffer flush, both the start time and the type of the block are unknown - these are shown as “Unknown” blocks

Unknown

As described above, if a block starts in an overflow region then the type of the block is not recorded. We still know the block exists because we record its finish, so the block is displayed as an “Unknown” block.

2.13. Application preferences

To display the Preferences dialog, select ‘Preferences’ from the menu, or press the Ctrl / Cmd + , keys. As well as the settings displayed, the view options for the various reports are also saved.

You can reset your preferences at any time by selecting ‘Reset Preferences’ from the Help menu.

2.13.1. Setting the colour theme

The PopVision™ Graph Analyser supports light and dark colour themes, and you can select a preference here. There are three options:

Auto - this is the default setting, and allows the application to follow your machine’s system-wide theme setting for light or dark mode. If the PopVision™ Graph Analyser application detects a change in your operating system theme, it automatically switches to the corresponding mode in application.
Light - this forces the PopVision™ Graph Analyser application into light mode, irrespective of your machine’s theme settings.
Dark - this forces the PopVision™ Graph Analyser application into dark mode, irrespective of your machine’s theme settings.

2.13.2. SSH preferences

You can store your SSH preferences in the Preferences dialog to allow authorisation when opening reports on remote machines. There are two settings you can enter here:

SSH private key path - enter the file path of your your machine’s private SSH key here. This filepath will be used to authenticate you on remote machines during the connection process. The default path is <home>/.ssh/id_rsa/ , where <home> denotes your home directory in your operating system.
SSH agent mode - this dropdown-list allows you to choose whether you want to specify an ssh-agent socket path, and, if so, how you want to do so:
- Disabled - do not use an ssh-agent socket (the default)
- Manually specify - enter file path to the ssh-agent socket in the field that appears below this option.
- Automatically obtain from environment - obtain the ssh-agent path from an environment variable.

2.13.4. Scroll behaviour

Use this preference to set the default behaviour for your mouse’s scroll wheel (or using two-finger drag on on a laptop trackpad). You can choose either:

‘Scroll by default’, where the mouse wheel will scroll the window content up and down. Holding down the Ctrl key while using the scroll wheel then zooms the window content in and out.
‘Zoom by default’, where the mouse wheel zooms the window content in and out. Holding down the Ctrl key while using the scroll wheel then scrolls the window content up and down.

2.13.5. Show help links

By default, ‘Help links’ are enabled for the application. These links appear in tooltips when you hover over various parts of the reports, including buttons, fields and tabs. Hovering your mouse over them for a second presents you with a tooltip, which contains a link to the section of the help documentation that corresponds to that part of the report. Click on the link to open the documentation at that place.

You can disable the help links, and prevent them appearing, by switching off this option here, in the Preferences.

2.13.6. Quit after last window is closed

This preferences control whether the Mac version of the application quits the program after the last window is closed.

2.13.7. Experimental features

Each version of the PopVision™ Graph Analyser contains some experimental features that are hidden by default. These features are not fully release-capable, and will have limited support and may change or be removed in future. You can enable them here, by toggling the button next to this option.

2.13.8. Show graph stats

You can display (or hide) statistics for the Memory and Liveness reports. They appear in the top right-hand corner of the graph and show the Average, Minimum, Maximum and Standard Deviation of the memory usage across the selected tiles, for each data set plotted.

You can move this statistics box anywhere in your graph by dragging its title bar at the top.

2.13.9. Stack graph values

When this option is turned on, the values shown in the tooltips on the memory graph are displayed ‘stacked’.
When turned off, the individual values are shown without stacking.

When this option is turned on, “(stacked)” is displayed in the tooltip.

2.13.10. Send telemetry

When you first install one of the tools, you will be asked for your consent for Graphcore to collect telemetry information that helps Graphcore improve the application and monitor performance. Your response to this dialog is stored in your preferences, and you can turn telemetry on or off with this option.

See this section for full details.

2.13.11. Software update

If this preference is switched on, the Graph Analyser will periodically check online to see if there is a newer version of itself to download and install. Note that you also have the opportunity to set this preference on the EULA splash-screen, when it first appears.

If a new update is found, you can either download and install it now (a restart of the app will install it), or you can delay it until later. You can also choose to check manually by selecting ‘Check For Updates…’ from the application/File menu.

Sometimes, network issues may cause a download to take too long to finish, in which case a dialog is diaplayed that allows you to cancel the current download, and retry later on.

2.14. FAQs

This section contains a set of frequently asked questions about capturing and understanding reports in the PopVision™ Graph Analyser.

2.14.1. Not-always-live memory discrepancy

Question : Why does the tile memory differ on the Memory and Liveness reports? If you open a Memory report, and select the ‘By Liveness’ breakdown option from the drop-down menu, and then select a particular tile, you can see its memory consumption plotted. If you then find that same tile in the Liveness report, you may notice that its memory consumption is lower. Why does this happen?

Answer : The ‘Not-Always-Live’ plot on the Memory-Liveness report actually shows the maximum memory of the not-always-live variables, which can be lower than the actual tile memory required. Because memory is statically allocated on the tile, and the allocating algorithm isn’t perfect, this could be less than the actual amount of memory required to store your program.

As an example, suppose you have two variables A and B, both 1 byte, but B needs to be stored in interleaved memory. If you have a program like this:

Write(A)
 Read(A)
 Write(B)
 Read(B)

then the two variables are not live at the same time, so in theory could be overlapped, but because of the additional constraints they aren’t. In this case the maximum not-always-live bytes is 1 byte, but they memory required (excluding gaps) is 2 bytes.

2.14.2. Using the Graph Analyser over X-Forwarding on MacOS

Question : How can I use the Graph Analyser over X-Forwarding on MacOS?

Answer : To view the Graph Analyser over X-forwarding on MacOS, follow these steps:

On your MacOS machine, download and install XQuartz from https://www.xquartz.org/.
Start the XQuartz app, and start a terminal session from within it.
In the terminal, enter ssh -X [username]@[host] , supplying the username and host for the remote machine.
In the SSH session, run the following commands (assuming you want to use v3.7.2):

$ wget https://github.com/graphcore/popvision_graph_analyser/releases/download/v3.7.2/popvision-graph-analyser-3.7.2.AppImage
$ chmod +x ./popvision-graph-analyser-3.7.2.AppImage
$ ./popvision-graph-analyser-3.7.2.AppImage

The Graph Analyser application should then start up and work normally over X.

2.14.3. How can I reduce the size of my profile report files?

To reduce the size of your profile files, see the Profile troubleshooting section above, where you can find tips on reducing instrumentation levels, reducing batch size, and reducing the number of steps being instrumented.

2.15. Glossary

2.15.1. Architecture

For more information, see the Poplar and PopLibs User Guide .

2.15.2. BSP 

2.15.3. Bulk-synchronous parallel

For more information, see the IPU Programmer’s Guide .

2.15.4. Codelet 

2.15.5. Compute set 

For more information, see the IPU Programmer’s Guide .

2.15.6. Debug context

For more information, see the Poplar and PopLibs API Reference .

2.15.7. Edge 

2.15.8. Exchange 

External exchange 

Global exchange 

Host exchange 

Inter-IPU exchange 

2.15.9. Flame graph

A visualisation of hierarchical data that is often used to show sampled stack traces of a program that has been profiled.

2.15.10. IPU 

2.15.11. Liveness

For more information, see the Memory and Performance Optimisation on the IPU technical note.

Always-live

An always-live variable is allocated an exclusive memory region for the entire lifetime of the graph execution, such as vertex code.

Not-always-live

It is not necessary to keep the content of not-always-live variables in memory throughout execution. Therefore, two variables that are not live at the same time can be allocated to the same location.

2.15.12. Lowering

2.15.13. Lowered variable

To execute an application on the IPU, variables may need to be divided between tiles to satisfy memory constraints. Poplar maps global user-defined variables onto tiles by a process that is called lowering. During this process, Poplar typically translates each of these unlowered variables into a set of lowered variables which have specific tile allocations.

2.15.14. Memory

For more information, see the Poplar and PopLibs User Guide .

Memory bank

An area of memory that allows a single concurrent access only. A memory bank is 32 and 16 KB in the Mk1 and Mk2 Colossus, respectively. Note that addresses are only contiguous in non-interleaved memory.

Memory element

In interleaved memory, an element is a pair of banks that allows access to 128 bits. In non-interleaved memory, it is a single bank that allows access to 64 bits.

Memory interference 

Memory region

A tile’s memory is organised as two regions, each made up of banks. Concurrent accesses can be made to addresses in different banks.

Non-interleaved memory

Instructions can only be fetched from non-interleaved memory.

Interleaved memory

Interleaving allows for two 64-bit-aligned addresses to be accessed simultaneously.

Overflowed memory

If the memory required by an application exceeds the maximum available on a tile, part of it is overflowed.

Out-of-memory (OOM)

An application that is out of memory needs more than is available on one or more tiles and cannot be executed.

2.15.15. Pipelining 

For more information, see the IPU Programmer’s Guide .

2.15.16. PopART 

2.15.17. Poplar 

2.15.18. PopLibs 

2.15.19. Replication 

For more information, see the Poplar and PopLibs User Guide and IPU Programmer’s Guide .

2.15.20. Sync 

External sync

Synchronisation between IPUs.

Internal sync

Synchronisation between all of the tiles on a single IPU.

2.15.21. Tile 

Active tile

A tile is active in the context of a program step if it is involved in executing that step.

Tile balance

A measure of the fraction of tiles that are involved in executing a program step.

Unlowered variable

See lowering .

2.15.22. Vertex 

For more information, see the Poplar and PopLibs User Guide .

Vertex field

For more information, see the Poplar and PopLibs User Guide .

Vertex instance

An instance of a class that defines a vertex type.

Vertex source

A vertex is an instance of a class that can be written in C++ or assembly.

For more information, see the Poplar and PopLibs User Guide .

Vertex state

For more information, see the Poplar and PopLibs User Guide .

Vertex type

A class that defines a vertex in Poplar.

For more information, see the Poplar and PopLibs User Guide .

2.16. Release notes

To see what’s changed in the PopVision™ Graph Analyser application, select ‘Release Notes’ from the Help menu, or click the “What’s new since the last release” link on the landing page.

You can toggle the Release Notes dialog between modal and full-screen view by clicking on the icon to the left of the window’s title.

2.17. Licensing information

Licensing information about the PopVision™ Graph Analyser is available to read by selecting ‘License’ from the Help menu. It contains an end-user agreement, copyright and trademark information, and license information about third-party software used in the application.

This information can also be found in the Installation README file, which you can find on the Graphcore Support site .

You can toggle the License dialog between modal and full-screen view by clicking on the icon to the left of the window’s title.

2.18. Data we collect

Graphcore’s PopVision™ tools collect data from you about the way in which you use them. The data we collect depends on how you interact with the tools and may change between releases. This data helps us to develop and improve the tools.

We do not obtain any personal data when you use the PopVision™ tools. On installation, we randomly generate a unique identifier to link your interactions with one of the tools together. This identifier is stored with your preferences for the tool and can be seen at any time in the About dialog. We also randomly generate an identifier each time you open the tool to distinguish between sessions of usage.

Note

We do not collect any data about the reports or models you analyse, such as the names of variables and events.

Pan Left	`A`
Pan Right	`D`
Zoom in	`W`
Zoom out	`S`

2. User guide

2.1. Overview

2.1.1. End User License Agreement

2.1.2. Telemetry consent

2.1.3. About the IPU

2.2. Capturing IPU reports

2.2.1. Unsupported file versions

2.2.2. Profiling Overhead

Compilation

Execution

2.2.3. Reloading reports

2.2.4. Profile troubleshooting

Reducing the size of profile reports

Missing or corrupted report files

Compilation fails with OOM

2.2.5. Poplar report files

Binary archive (‘archive.a’)

Poplar Profile (‘profile.pop’)

Lowered Vars Information

Serialized Computation graph (‘serialized_graph.capnp’)

Frameworks Information (‘framework.json’ & ‘app.json’)

Debug Information (‘debug.cbor’)

2.2.6. Using TensorFlow

2.2.7. Using PopART

2.2.8. Using PyTorch

2.3. Opening reports

2.3.1. Opening recent reports

2.3.2. Opening the Demo Report

2.3.3. Comparing reports

2.3.4. Local reports

Opening local reports

2.3.5. Remote reports

Opening a remote report

Connection errors

2.4. Viewing reports

2.4.1. Using the side menu

2.4.2. Adjusting report size

2.4.3. Navigating report graphs

Using the Navigation panel

Keyboard graph navigation

Graph data keys

2.4.4. Saving report images to disk

2.5. Viewing a Summary Report 

2.5.1. Program Information

Target

Graph

2.5.2. Engine options

2.5.3. Framework and Application JSON files

2.5.4. Report files

2.6. Viewing an Insights Report 

2.6.1. Memory insights

2.6.2. Vertex and exchange sizes

2.6.3. Tips on reducing memory usage

2.7. Viewing a Memory Report 

2.7.1. Navigating memory reports

Selecting individual tiles/IPUs

Memory Report view options

2.7.2. Total Memory graph

Memory Report breakdown

Breakdown by Region

Breakdown by category

Breakdown by liveness

2.7.3. Variables Memory graph

2.7.4. Tile map Memory graph

Changing the colour scale

2.7.5. Tile memory usage

Tile memory usage: Details tab

Excluding Gaps

Tile memory usage: Compute Sets tab

Tile memory usage: Vertices tab

Tile memory usage: Exchanges tab

Tile memory usage: Variables tab

Memory interference

Variable types

Known variables

Plot multiple variables

Full-screen option

Toggle between table and graph view

Show differences between selected tiles

Show base address offset