2. User guide

2.1. Overview

The PopVision™ Graph Analyser application is used to analyse the programs built for and executed on Graphcore’s IPU systems. It can be used for analysing and optimising the memory use and performance of programs.

The PopVision Graph Analyser generates the following reports:

  • Summary Report of the IPU hardware, graph parameters and host configuration.

  • Insights Report, which gives you a quick overview of the memory usage of your model on the IPU, showing the tiles, vertices and exchanges that use the most memory. Graphical insights and guides are also displayed, helping you to optimise memory usage for your model.

  • Memory Report, which gives a detailed analysis of memory usage across all the tiles in your IPU system, showing graphs of total memory and liveness data, and details of variable types, placement and size.

  • Liveness Report, which gives a detailed breakdown of the state of the variables at each step in your program.

  • Program Tree, which shows a hierarchical view of the steps in the program.

  • Operations Summary, which shows a summary of all the operations for a software layer in your model, displaying statistics about code size, execution cycles, debug data and FLOPs measurements.

  • Operations graph, which displays the High Level Operations (HLO) graph for TensorFlow programs, allowing you to drill down through modules and see details of HLOs.

  • Execution Trace, which shows how many cycles each step of your instrumented program consumes.

Each of these reports is described in further detail in the sections below.

2.1.1. End User License Agreement

Before you can use the Graph Analyser, you must first agree to the End User License Agreement (EULA) that is displayed when the app is first opened (assuming that you have not already agreed to it in a previous version). Clicking on the Disagree button will quit the application immediately.

You can re-read the EULA at any time after you’ve agreed to it. Select View EULA from the Help menu.

  • You can toggle the EULA dialog between modal and full-screen view by clicking on the icon to the left of the window’s title.

2.1.3. About the IPU

An in-depth description of the IPU hardware is available in the online IPU Programmer’s Guide. While we describe some of the relevant features of the IPU in this document, you should refer to the Poplar User Guide for a more in-depth understanding.

2.2. Capturing IPU reports

This section describes how to generate the files that the Graph Analyser can analyse. The Graph Analyser uses report files generated during compilation and execution by the Poplar SDK.

Note

When you first open the application, there is a link on the opening page to a Getting Started with PopVision video.

The sections below describe the files supported by the Graph Analyser. These files can be created using the POPLAR_ENGINE_OPTIONS environment variable or the Poplar API. At minimum, you need either the archive.a or the profile.pop file for the Graph Analyser to display its reports.

Note

As of Poplar SDK 1.2, you only need to set POPLAR_ENGINE_OPTIONS='{"autoReport.all":"true"}' before you run a program. By default this will enable instrumentation and capture all the required reports to the current working directory. For more information, please read the description of the Poplar Engine options in the Poplar and PopLibs API Reference.

By default, report files are output to the current working directory. You can specify a different output directory with "autoReport.directory", for example:

POPLAR_ENGINE_OPTIONS='{"autoReport.all":"true", "autoReport.directory":"./tommyFlowers"}'
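The same variable can be set programmatically before launching your program. A minimal sketch using only the Python standard library (the helper name is ours, and the option values are just the examples above):

```python
import json
import os

def set_engine_options(options):
    """Serialise an options dict and export it as POPLAR_ENGINE_OPTIONS."""
    value = json.dumps(options)
    os.environ["POPLAR_ENGINE_OPTIONS"] = value
    return value

# Capture all reports into a chosen directory before running the model.
set_engine_options({
    "autoReport.all": "true",
    "autoReport.directory": "./tommyFlowers",
})
```

Any program started after this (for example, via subprocess) inherits the environment variable.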

Note

If you have an application that has multiple Poplar programs (for example, if you build and run a training and validation model), then a subdirectory with the Engine name will be created in "autoReport.directory", in which the profile information will be written. This allows users of the Poplar API to make sure reports are written to different locations. (If no name is provided then the profile information will continue to be written in "autoReport.directory".) Further information can be found in the Using TensorFlow, Using PopART and Using PyTorch sections.
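This layout rule can be sketched in a few lines (a hedged example; the helper name and the "tommyFlowers" engine name are ours):

```python
import os

def profile_path(auto_report_dir, engine_name=None):
    """Return where profile.pop is written, following the rule above:
    a subdirectory named after the Engine when a name is provided,
    otherwise the report directory itself."""
    directory = (os.path.join(auto_report_dir, engine_name)
                 if engine_name else auto_report_dir)
    return os.path.join(directory, "profile.pop")

print(profile_path("./reports", "tommyFlowers"))
print(profile_path("./reports"))
```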

Note

If you are profiling cached executables using PopART, you must use popart::SessionOptions to provide a directory name for your reports. Using "autoReport.directory" in POPLAR_ENGINE_OPTIONS will not work.

2.2.1. Unsupported file versions

As of Graph Analyser version 3.7, support for the old JSON and CBOR graph profile formats has been removed. This means that the following files that were generated before Poplar SDK 2.0 can no longer be read:

  • graph.json

  • graph.cbor

  • execution.json

  • execution.cbor

  • profile_info.json

At minimum, you need either the profile.pop or archive.a file present for the Graph Analyser to display its reports. If neither of these is found, you will see a No graph profile found warning when trying to open a report.

2.2.2. Profiling Overhead

Profiling a model may add a memory and computation overhead to its compilation and execution phases. Typically, the highest performance impact is due to the execution overhead.

If you are not interested in the execution trace, it is best to deactivate it by setting "autoReport.outputExecutionProfile":"false" or "debug.instrument":"false". This will implicitly disable "debug.instrumentControlFlow", "debug.instrumentExternalExchange" and "debug.instrumentCompute" (unless you explicitly enable them). Note that a profile can omit its execution part but not its compilation part. In other words, setting "autoReport.outputExecutionProfile" to true will automatically also set "autoReport.outputGraphProfile" to true.

The following sections show which options can be used to reduce the overhead in the compilation and execution profiler parts.

Compilation

During compilation, the profiler generates the graph profile (also known as the memory profile). This profile contains information that Poplar knows or estimates at compilation time, such as the programs that form the model and its variables. The contents of the graph profile are sufficient to analyse memory issues.

The environment variable POPLAR_PROFILER_LOG_LEVEL can be set to generate a log of the steps performed by the profiler during compilation and detect any possible time overhead.

Poplar Engine options can be used to include or exclude the profiling of certain information. This will reduce the time taken to create the profile and the size of the generated files. Please refer to Report Files for a description of the files.

The option POPLAR_ENGINE_OPTIONS='{"autoReport.all":"true"}' outputs a full report. If you wish to exclude a part of it, set this option and explicitly disable the undesired information. For example: POPLAR_ENGINE_OPTIONS='{"autoReport.all":"true", "autoReport.outputArchive":"false"}'

Similarly, if you wish to include only a certain part of the report, just set the specific "autoReport" option. For example: POPLAR_ENGINE_OPTIONS='{"autoReport.outputArchive":"true"}'
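The two patterns above (full report minus a part, or a single part on its own) can be sketched with a small merge helper. This is an illustrative example; the function name is ours, and the option keys are those described in this section:

```python
import json

def with_overrides(base, overrides):
    """Merge profiling option overrides into a base configuration."""
    merged = dict(base)
    merged.update(overrides)
    return merged

# Full report, but explicitly disable the binary archive:
opts = with_overrides({"autoReport.all": "true"},
                      {"autoReport.outputArchive": "false"})
print("POPLAR_ENGINE_OPTIONS='%s'" % json.dumps(opts))
```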

Options to tune the graph profile (in order of greatest expected impact):

  • "autoReport.outputLoweredVars":"false" : This option can be useful if profile.pop is too large. However, it will deactivate some functionality in the Graph Analyser, such as the variables memory graph. Note that excluding lowered variables from the report will not speed up visualisation by the Graph Analyser.

  • "autoReport.outputDebugInfo":"false" : This option can be useful if debug.cbor is too large. However, it will deactivate some functionality in the Graph Analyser, such as the operations graph. Note that no meaningful speed-up is expected in the rest of the Graph Analyser's functionality.

  • "autoReport.outputSerializedGraph":"false" : This option is false by default as the generated file can be large. This part of the profile is only needed to enable the computational graph in the Graph Analyser.

  • "autoReport.outputArchive":"false" : This option can be set to avoid generating archive.a. If you set it to false, you must generate profile.pop instead, and some minor functionality of the Memory Report will be disabled in the Graph Analyser.

Execution

Profiling the execution means measuring and recording the cycles spent on each of the programs of the model. The result can be visualised in the Graph Analyser execution trace. However, this instrumentation can lead to the following main overheads:

IPU memory overhead

Some memory in the IPU will be devoted to storing program cycles and branch records. More memory will be used by the instrumentation code itself, although this is usually negligible.

You can set "debug.computeInstrumentationLevel":"ipu" to reduce the memory needed for program cycles. In this mode, only one tile (debug.profilingTile) will record cycles. The drawback is that per-tile cycles (the BSP trace) will not be available in the Graph Analyser. In addition, this instrumentation level may slightly disrupt the normal execution of the model, because some artificial synchronisations may be introduced in order to measure the cycles of the longest-running tile.

Regarding branch records, you cannot reduce the memory needed to store them, but you can pick which tile will keep them: using debug.branchRecordTile you can choose a tile with low memory pressure. Note that the last tile in the IPU is selected by default, and that is usually a good choice. Also, branch recording may introduce artificial synchronisation points to flush the records to the host. This can disrupt the normal execution, especially for pipelined models with a high number of conditional branches, such as If programs.

Because of all these extra memory requirements, a model with high memory consumption may go out of memory when profiling is enabled. Depending on the model, you can adjust its parameters to leave space for the instrumentation. For example, you can try decreasing the batch size. In TensorFlow BERT you can adjust the micro batch size.

Host computing overhead

Poplar processes the cycle measurements after each run to create a trace that can be visualised in the Graph Analyser. This can take a considerable amount of time if the run executed many programs. This overhead may reveal itself in the Graph Analyser if the execution took multiple runs. At the beginning of each run, the IPU waits for the host in a StreamCopyBegin program. After the first run, the host may be busy processing the cycles measured in the previous run. This causes a large StreamCopyBegin as the IPU waits for the host to finish this processing. Because of this overhead, measuring throughput of a profiled model is highly discouraged.

To reduce this overhead you can reduce the number of programs profiled. By default, only the first two runs of the execution are captured. This can be increased or decreased by setting "autoReport.executionProfileProgramRunCount" as follows:

POPLAR_ENGINE_OPTIONS='{"autoReport.all":"true", "autoReport.executionProfileProgramRunCount":"10"}'

It is essential that you also try to reduce the iterations on each run. For instance, by reducing the number of steps or the number of batches per step you can get a lighter execution profile. This will not only reduce the host computation overhead but will also speed up visualisation in the Graph Analyser. The public examples contain some hints on how to reduce an execution to be profiled, for instance, TensorFlow BERT.

Finally, the report size of multi-replica executions can be reduced by focusing on a single replica. You can select a replica with profiler.replicaToProfile, for example:

POPLAR_ENGINE_OPTIONS='{"autoReport.all":"true", "profiler.replicaToProfile":"0"}'

2.2.3. Reloading reports

The folder (or folders, if you’re comparing reports) containing the individual report files is monitored by the Graph Analyser in case any of the files change; for example, if you’ve re-run your Poplar program and generated new versions of the reports.

If the application detects that any of the files have changed, a dialog opens listing the files that have changed, and prompting you to reload the report files.

Note

Make sure that your Poplar program has finished executing, and in particular that the profile.pop file has been completely written to disk, before clicking on the Reload button; otherwise you may see inconsistent information displayed in the application.

2.2.4. Profile troubleshooting

Very occasionally, the Graph Analyser may not be able to open a profile. This section describes some likely causes and suggests how to remedy them.

Reducing the size of profile reports

Large models and programs with many iterations within them can generate large reports which can take a while to process and display in the PopVision tools. You can reduce the size of the profiles generated when instrumenting your IPU programs by:

  • Adjusting the number of steps being profiled

  • Reducing the number of batches per step

  • Changing the instrumentation level

  • Changing the branch record tile

  • Selecting a single replica

  • Reducing the gradient accumulation factor (if you’re using it) to reduce the size of a single Engine run.
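Several of the reductions listed above map directly onto engine options described earlier in this section. A sketch combining them into one configuration (the values are illustrative, not recommendations):

```python
import json

# One possible low-overhead profiling configuration (illustrative values).
reduced = {
    "autoReport.all": "true",
    "autoReport.executionProfileProgramRunCount": "1",  # capture fewer runs
    "debug.computeInstrumentationLevel": "ipu",         # coarser instrumentation
    "profiler.replicaToProfile": "0",                   # a single replica
}
print("POPLAR_ENGINE_OPTIONS='" + json.dumps(reduced) + "'")
```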

Note

There are some additional suggestions for reducing profile size in the Profiling overhead section.

Missing or corrupted report files

Sometimes the Python script running the training or inference program exits too early because of some other fault, and the profile isn’t written correctly.

  • Set POPLAR_PROFILER_LOG_LEVEL to get more information about your script’s execution.

  • The profiles are written in SQLite format. Check that they open in an SQLite client, and that they can be read using libpva.
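As a quick integrity check, you can verify that the file at least opens and queries as an SQLite database. This is a sketch using only the Python standard library; it assumes, as noted above, that the profile is stored in SQLite format, and it deliberately avoids depending on any version-specific table names:

```python
import sqlite3

def opens_as_sqlite(path):
    """Return True if the file can be opened and queried as SQLite.

    A profile truncated by an early exit typically fails this check.
    """
    try:
        con = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
        try:
            con.execute("PRAGMA schema_version;")
        finally:
            con.close()
        return True
    except sqlite3.Error:
        return False
```

For a deeper check, iterate over the report with libpva as described in its documentation.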

Compilation fails with OOM

Sometimes compilation fails because there is not enough memory to fit the program, and its profiling instrumentation, on the IPU. Here are some actions you can take to reduce memory usage in your model.

  • Reduce your model size. This will reduce the number of parameter variables that need to be stored in the IPU memory.

  • Only use the Memory Report, not the Execution Trace Report.

  • Change your instrumentation level so that you are storing less information.

  • Change the branch record tile.

Note

Additional ways to optimise your memory and throughput are described in the Insights report section.

2.2.5. Poplar report files

Note

The Graph Analyser only supports fixed names for each of the report files. If you save them with different names they will not be opened. When you are browsing directories to open reports, the Graph Analyser will highlight which of the following files are present in that directory.

Binary archive (‘archive.a’)

This is an archive of ELF executable files, one for each tile. With this, you can see the total memory usage for each tile in the Memory Report.

Poplar Engine Options

POPLAR_ENGINE_OPTIONS='{"autoReport.outputArchive":"true"}'

Using Poplar API

Set the Poplar Engine option “autoReport.outputArchive” to true

Poplar profile (‘profile.pop’)

This file contains compile-time and execution information about the Poplar graph. It is used to show the memory, liveness and program tree views, and also the execution trace view.

Poplar Engine Options

POPLAR_ENGINE_OPTIONS='{"autoReport.outputGraphProfile":"true"}' and/or POPLAR_ENGINE_OPTIONS='{"autoReport.outputExecutionProfile":"true"}'

Using Poplar API

Set the Poplar Engine options “autoReport.outputGraphProfile” to true and/or “autoReport.outputExecutionProfile” to true

Lowered variables

Poplar can generate information about lowered variables, which contains details about the allocation of variables on each tile, and is used to generate the variable layout in the Memory Report. IPU memory is statically allocated, and this file contains the size, location, name and other details of every variable on every tile.

This information is not generated by default, as the output can be quite large, and not useful to some users. However, there are Poplar Engine options to collect the data and save it either into the profile.pop file, or as a stand-alone file.

Poplar Engine Options

When you use POPLAR_ENGINE_OPTIONS='{"autoReport.all":"true"}' the information about lowered variables will be captured in the profile.pop file. You can switch that functionality on separately with: POPLAR_ENGINE_OPTIONS='{"autoReport.outputLoweredVars":"true"}'

Using Poplar API

To capture the lowered variables data in a separate file (and not write it into the profile.pop file), use, for example: POPLAR_ENGINE_OPTIONS='{"debug.loweredVarDumpFile":"vars.capnp"}'

Serialized computational graph (‘serialized_graph.capnp’)

This file contains a copy of the Poplar graph before compilation, including details on all of the compute sets, vertices, variables and edges (connections from vertices to variables). "autoReport.all" does not output this file by default, as it can be quite large for complex models, but you can enable it using the options below.

Poplar Engine Options

POPLAR_ENGINE_OPTIONS='{"autoReport.outputSerializedGraph":"true"}'

Using Poplar API

To save the serialised graph you need to use the Poplar API Graph::serialize .

Frameworks information (‘framework.json’ & ‘app.json’)

You can use Poplar to create two more “custom” files into which you can put your own data from a framework or your application. See the Framework and application JSON files section for more details.

Debug information (‘debug.cbor’)

This file contains additional debug information collected from the Poplar software. You can use this information to understand the source of variables, Poplar programs and compute sets. The debug information is viewable in the Liveness report and the Program Tree.

Poplar Engine Options

Generated automatically when using POPLAR_ENGINE_OPTIONS='{"autoReport.enable":"true"}'; you can also enable it explicitly with {"autoReport.outputDebugInfo":"true"}

Using Poplar API

Automatically created

Note

Collecting the enhanced debug information will not increase the memory footprint of your IPU application. The enhanced debug information is generated and streamed as the model is compiled.

See the two Debug information sections in the Liveness Report and the Program Tree for details of what’s included in the debug information, and where to find it in the Graph Analyser reports.

2.2.6. Using TensorFlow

If you use TensorFlow, the separate reports for each Poplar program compiled and executed will be placed in a subdirectory of "autoReport.directory" that contains the date/time (in ISO format) and process ID in its name.

The debug.cbor file will be placed in the directory specified with "autoReport.directory" and symbolic links are created in the subdirectories.
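When scripting around the timestamped subdirectories described above, it can be handy to pick the most recently modified one, which is usually the run you just profiled. A small sketch using only the standard library (the helper name is ours):

```python
import os

def newest_report_dir(root):
    """Return the most recently modified subdirectory of autoReport.directory."""
    subdirs = [os.path.join(root, name) for name in os.listdir(root)
               if os.path.isdir(os.path.join(root, name))]
    return max(subdirs, key=os.path.getmtime) if subdirs else None
```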

The cluster name can now be found in details loaded from the framework.json file.

For more details, please see the guides Targeting the IPU from TensorFlow 2 and Targeting the IPU from TensorFlow 1 .

2.2.7. Using PopART

For PopART, the name of the Poplar Engine is by default set to “inference” or “training”, depending on whether you are using the InferenceSession or TrainingSession class, respectively. You also have the option of providing your own Poplar Engine name when creating the session.

training_session = popart.TrainingSession(fnModel=builder.getModelProto(),
                                          ...
                                          deviceInfo=device,
                                          name="tommyFlowers")

The profile.pop file will be written out to:

autoReport.directory/tommyFlowers

Note

If your application has two inference sessions, by default the second will overwrite the first.

For more details please see the PopART User Guide .

Note

If you are profiling cached executables using PopART, you must use popart::SessionOptions to provide a directory name for your reports. Using "autoReport.directory" in POPLAR_ENGINE_OPTIONS will not work in this case.

2.2.8. Using PyTorch

For PyTorch (which builds on top of PopART), the Poplar Engine is also named “inference” or “training” by default, depending on whether you are using the InferenceModel or TrainingModel classes respectively. You also have the option to name the Poplar Engine yourself by specifying it in the Options object:

opts = poptorch.Options()
opts.modelName("tommyFlowers")
opts.enableProfiling(dirname)

poptorch_model = poptorch.inferenceModel(model, opts)

In this example, the profile.pop file will be written to the directory:

./tommyFlowers

For more details on setting these options, please see the PyTorch User Guide .

2.3. Opening reports

In order to view reports, the Graph Analyser requires one or more of the files listed in the Capturing Reports section.

You can open report files on your local machine, or from a remote server over SSH.

2.3.1. Opening recent reports

If you’ve used the application before, the Home screen displays a list of recently opened report directories in the Recent panel.

  • Click on a recent report to open it again. That report will automatically move to the top of the Recent list.

  • If you want to remove a report directory from the Recent list, click on the trash icon that appears to the left of a report when you hover the mouse over it.

  • If you attempt to open a report from the Recent list that has been moved or deleted since it was last opened, an error dialog appears indicating that the file can’t be found. You can either replace the file, in which case the corresponding link in the Recent panel will work again, or you can click on the Remove from recent files button in the error dialog to remove that file from the Recent list.

The Recent list can contain up to 64 previous file selections.

2.3.2. Opening the Demo Report

If you don’t have any generated reports, you can open and browse the demo report that’s included with the Graph Analyser. It’s a small report, but contains all of the features that are supported in the current application.

  • Click on the Open demo report link on the Home screen to open the demo report.

2.3.3. Comparing reports

You can open two similar reports at once to compare them by clicking on the Compare reports… link on the Home screen. This presents you with two file selection dialogs that work in exactly the same way as for opening a single report. The information from both reports is then combined on the report pages, allowing you to compare them.

  • When opening a pair of reports to compare, you can click on the magnet icon at the right-hand end of the directory textbox in either file selector. This copies the directory from the file selector on the other side.

2.3.4. Local reports

You can open report files stored on your local machine.

Opening local reports

To open a report stored locally on your machine:

  1. On the Home screen of the Graph Analyser application, click on the Open a report… link in the Open panel. You’ll be presented with a file selection dialog, and the local tab at the top will be selected by default. You’ll see listings of the directories and files on your local machine.

  2. Use this dialog to navigate to the folder in which your report files have been saved. You’ll notice that when the Graph Analyser identifies a directory in which any of the report files listed above are found, those files are listed on the right-hand side. Note that if no profile.pop or archive.a files are present, the Open button will be disabled.

  3. When typing a path into the text box at the top of the dialog, a drop-down list shows the directories within the current directory. You can use the up and down arrows on your keyboard to navigate this list, and choose the next path element by pressing Enter.

  4. You can sort these files by name or modified date, in ascending or descending order, by clicking on the appropriate column header. Your sorting preference is saved.

  5. Once you’ve selected the directory with the necessary report files within it, click on the Open button to load the report data.

  6. The dialog remembers which location tab (local or remote) you selected previously, and selects it automatically the next time it is opened.

  7. You can toggle the Open dialog between modal and full-screen view by clicking on the icon to the left of the window’s title.

The Summary Report is displayed first, and the progress bar along the top of the screen shows the files being pre-processed by the application prior to being loaded and displayed.

Notice that the bottom of the Summary Report shows the relevant files that have been found, and their loading state. More details on these files, and which reports need them, can be found in the Poplar report files section.

2.3.5. Remote reports

If you are using an IPU system on a remote server, for example on a cloud service, any reports generated will be saved to that server. In this case you can open them remotely by specifying the server address, and connecting to the machine over SSH. The report contents are then streamed back to the Graph Analyser application on your local machine, allowing you to view the reports.

Note

When the Graph Analyser opens report files stored remotely, it uploads a small binary app, containing an analysis engine, to the remote machine. This analysis engine pre-processes the report data and sends it back over SSH to the Graph Analyser application running on your local machine. If you’re running other performance-critical processes on that remote machine, be aware that this pre-processing consumes some of the remote machine’s capacity. As server performance varies a great deal, the only way to know how much processing capacity the engine uses is to try a small sample and monitor the CPU usage.

If the upload of the analysis engine fails, an appropriate error message is displayed. This may be because, for example, you do not have the right permissions to upload to the remote machine, or there may be insufficient disk space.

Opening a remote report

To open a report stored on a remote machine from the Graph Analyser application running on your local machine:

  1. On the Home screen of the Graph Analyser, click on the Open a report… link in the Open panel. This opens the file selection dialog, and the local tab at the top will be selected by default.

  2. Click on the remote tab at the top, and you’ll see a login dialog that allows you to connect to a remote server. Enter your username, and the address of the remote machine. You can enter the port or use the default. Click on the Connect button to connect to the remote machine.

  3. If you have configured your SSH key in the Preferences dialog, then the application will use the key for authentication on the remote server.

  4. If your SSH key requires a passphrase, then you will have to configure an SSH agent in order to use the key.

  5. If you have neither set a path to your private SSH key in the Preferences dialog nor configured an SSH agent, then you will be prompted for a password to log into the remote machine.

  6. Once you’re logged in, you’ll see a file dialog listing the directories and files on the server. You can sort these files by name or modified date, in ascending or descending order, by clicking on the appropriate column header. Your sorting preference is saved.

  7. Navigate to the folder in which your Poplar report files have been saved. You’ll notice that when you select a directory in which Poplar report files are found, the file window lists those files on the right-hand side. Note that if neither the archive.a nor the profile.pop file is present, the Open button will be disabled, as one of these files is the minimum requirement for generating a report in the Graph Analyser. See the Report files section for details of how to generate each file, and what it contains.

  8. Once you’ve entered the directory with the necessary report files in it, click on the Open button to load the report.

Note

The SSH connection is constantly checked, and if, for any reason, it goes down, a warning dialog is displayed, letting you know.

The Summary Report is displayed first, and the progress bar along the top of the screen shows the files being loaded into the application, and the report data being analysed and prepared for display.

Notice that the bottom of the Summary Report shows the relevant files that have been found, and their loading state. More details on these files, and which reports need them, can be found in the Poplar report files section.

Configure SSH agent

Note

The Graph Analyser does not currently support encrypted SSH private keys, which are keys that are protected by a passphrase. However, it does support SSH agents. If your key is passphrase-protected, add it to your SSH agent using the ssh-add command-line tool before starting the Graph Analyser, and ensure that the SSH agent mode is set correctly in the Preferences dialog.

To configure the SSH agent from a terminal, run:

# Start the SSH agent in the background.
eval "$(ssh-agent -s)"

# Add your SSH private key to the SSH agent.
# (The -K flag stores the passphrase in the macOS keychain; omit it on Linux.)
ssh-add -K ~/.ssh/id_rsa

Then restart the Graph Analyser, open the Preferences dialog and remove the path in SSH private key path. Make sure that SSH agent mode is set to Automatically obtain SSH agent socket path from environment.

Connection errors

A number of errors can occur when connecting to a remote server. This section lists the most common and gives some troubleshooting steps.

Error

Error: getaddrinfo ENOTFOUND server.example.com

This error occurs when the specified server could not be found (DNS lookup failed). Check that you have typed the server’s name correctly. If a VPN connection is required, check that it is connected and working correctly.

Error

Error: Password authentication failed

If the SSH agent and SSH key authentication fail then password authentication is attempted. If you normally use a public key to connect to the server, check that you have correctly specified the key in SSH preferences . Otherwise, check that password authentication is enabled on the server and that you have typed your password correctly.

Error

Error: Cannot create directory ‘.cache/poplar_report_viewer’ because a file already exists there.

Graph Analyser will attempt to create a number of directories on the remote server if they do not already exist. If a file already exists with the same name then attempting to create the directory will fail. Check what the file is and either delete or rename it to allow the directory to be created.

Error

Error: Could not create directory ‘.cache’: Permission denied

Graph Analyser will attempt to create a number of directories on the remote server if they do not already exist. This error indicates that you did not have permission to create one of these directories. This usually indicates a problem with how your home directory is set up on the server: either you do not have a home directory and don’t have permission to create one, or you have not been given adequate permissions on your own home directory. Contact the server administrator to ask for a home directory to be created or for its permissions to be corrected.

Error

Error: Could not write to ‘.cache/poplar_report_viewer/backend_…’ on the remote. This could be caused by a full filesystem

This error usually occurs because there was not enough disk space on the server to upload the required binary file to your home directory (around 20MB is required). You may be able to free up some space by deleting unused files.

In some cases servers have a home directory filesystem which is very limited in size but have a much larger local disk or “scratch space” available. The Graph Analyser can upload to the scratch space if you create a symbolic link from your home directory to the scratch drive, for example:

mv ~/.cache /localdata/username/.cache
ln -s /localdata/username/.cache ~/.cache

2.4. Viewing reports

The Graph Analyser displays interactive graphical and textual reports, and you can interact with these in a number of ways to get to the information you want. Each report has a few different options that are only relevant to that report, but they all share some features in common, as described in this section.

When you open a report, its file path is displayed in the title bar of the report window.

2.4.1. Using the side menu

Once report data has been loaded and the Summary Report is displayed, the side menu becomes visible on the left-hand side of the application window. This contains buttons at the top for viewing each of the main report types, and three buttons at the bottom:

image22

Reload report : Re-import all the report files. This may be required if any of the files from which an open report was generated have been updated since you opened the report. See Reloading reports for more information.

image23

Close report : Close an open report, which “unloads” all the report data from the application and returns you to the Home screen. To view the report again, you’ll need to reopen it.

image24

Documentation : Open the in-app help. If you were viewing one of the report pages, the help window opens up on the relevant page.


2.4.2. Adjusting report size

There are several ways to change the size and scale of the report in the Graph Analyser:

  • You can increase and decrease the display size of the entire application (including the Help window) by using the Ctrl/Command keys with the + and − keys to magnify and shrink the display size, just as you would in a web browser. There are three zoom options in the View menu that show the shortcut keys. Reset Zoom resets the magnification level back to its default setting.

  • To zoom in and out of a particular section of the graph, click and drag horizontally in the graph preview area above the main graph, and the display will change to show the graph that corresponds to that section of the data. A pair of limiter icons appear in the preview area to show the start and end of the data displayed in the main graph area. These can be dragged left and right to change the amount of data in the main graph. Using the scroll-wheel on your mouse scrolls the report page up and down, but you can zoom by holding down the Ctrl/Command key.

    Note

    You can choose how you want your scroll wheel to behave (if you want it to scroll or zoom by default) by setting the Scroll behaviour preference.

  • You can also click and drag the main graph itself to view areas to the left and right of the currently viewed area. Note that clicking without dragging can sometimes select a specific tile (for example, in the Memory Report), but you can clear this selection from the input text box above the graph.

  • You can reset the zoom scale of the Memory and Liveness Reports by clicking on the small button to the left of the preview area, top-left of the graph. This zooms out to the furthest level, showing the entire graph.

  • To make a report larger, so that you can see more detail, you can drag the edges of the window to increase its size. This resizes the report images as you drag.

  • To adjust the space that each half of a report takes up on the page, click the splitter icon between the two halves of the report, and drag it up and down. The two report sections resize accordingly. Note that the Program Tree and Execution Trace Reports also display a vertical splitter icon when comparing two reports, so you can choose how much of each report fills the available screen space.

2.4.4. Saving report images to disk

You can save report graphs to disk as image files or copy them to the clipboard, to avoid having to make screen captures.

  1. Click on the camera icon in the top right-hand corner of a report.

  2. Select whether to save to a file or copy to the clipboard.

  3. If saving to a file, select the directory on your computer where you want to save it.

  4. Messages are displayed if the image was successfully copied to the clipboard, or if it failed to save to disk, depending on which choice you’ve made.

Report images are saved as PNG files, and capture the entire visible part of the report screen, including any detailed information displayed in the tabs below the graph. They also reflect the currently selected theme colours.

2.5. Viewing a Summary Report image27

When you first open a report, the Summary view is shown. It consists of high-level information about the Poplar program, split into various sections.

Each summary section displays information in collapsible blocks (marked with a downward-pointing arrow, or, for the Poplar Engine Options section, clickable disclosure triangles), making it easy to show only sections of interest to you. Whether a section is collapsed or not is saved automatically across all reports.

2.5.1. Program Information

The top half of the report shows details of the IPU system the program was compiled for, and also details of the size of the graph that Poplar created.

Target

  • Type : The kind of IPU the program was compiled for. This will either be IPU or IPUModel.

  • Architecture : The version of IPU used to run the program. This will either be Mk1 or Mk2.

  • Timestamp : When the program compilation was started.

  • Tiles per IPU : The number of tiles in each IPU.

  • IPUs per replica : The number of IPUs in each replica. (Only shown if more than one replica is used.)

  • Replicas : The number of replicas. (Only shown if more than one replica is used.)

  • Total tiles : The total number of tiles (tiles per IPU * num IPUs).

  • Total IPUs : The total number of IPUs (IPUs per replica * num replicas).

  • Memory per tile : The maximum memory on a tile.

  • Memory per IPU : The maximum memory on an IPU.

  • Memory per replica : The maximum memory for a replica.

  • Total memory : The total memory on all IPUs the program was compiled for.

  • Compute set instrumentation : The type of instrumentation compiled into the program to record compute set execution cycles. The method is controlled by the debug.computeInstrumentationLevel Poplar Engine option and is enabled or disabled via the debug.instrumentCompute option (or debug.instrument which enables all instrumentation).

    • Off : Instrumentation disabled; estimates are used instead, if available.

    • Vertex : Cycles are recorded for each vertex execution. This means vertex execution is serialised, and this only really works on very small graphs.

    • Tile : Cycles are recorded separately for each tile. If you find this uses too much memory try using the Ipu method.

    • Ipu : Cycles are recorded for the slowest tile on each IPU for each compute set. An internal sync is inserted before and after each compute set. The tile that is used to do the cycle recording is controlled by debug.profilingTile . If you have enough memory consider using the Tile method instead.

    • Device : Cycles are recorded for the slowest tile on the entire device for each compute set. This mode is not recommended - use Ipu instead.

  • External exchange instrumentation : The type of instrumentation compiled into the program to record external exchange cycles, that is, host exchange and global exchange. This is enabled or disabled using the debug.instrumentExternalExchange Poplar Engine option (or debug.instrument which enables all instrumentation).

    • Off : Instrumentation disabled; estimates are used instead.

    • Tile : Cycles are recorded separately for each tile.
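The instrumentation options above are supplied to the Poplar Engine when your program is built, for example via the POPLAR_ENGINE_OPTIONS environment variable, which accepts a JSON dictionary of option names and values. The Python sketch below shows one way to assemble that value; the option names come from the description above, but the level value shown is an assumption — check the Poplar documentation for the exact values accepted.

```python
import json
import os

# Hedged sketch: enable compute-set instrumentation through the
# POPLAR_ENGINE_OPTIONS environment variable before launching the program.
# "tile" as a level value is an assumption for illustration; verify the
# accepted spellings against the Poplar API Reference.
options = {
    "debug.instrumentCompute": "true",            # record compute-set cycles
    "debug.computeInstrumentationLevel": "tile",  # one level per tile
}
os.environ["POPLAR_ENGINE_OPTIONS"] = json.dumps(options)
print(os.environ["POPLAR_ENGINE_OPTIONS"])
```

Setting debug.instrument instead would enable all instrumentation, including external exchange, at the cost of extra tile memory for the recorded cycle counts.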

Graph

  • Number of compute sets : The number of compute sets in the graph. This is after compilation so it may be a larger number than the number of compute sets added through the Poplar API because some are created during compilation.

  • Number of edges : The number of edges in the graph. This is after compilation. An edge is a pointer from an Input<> , Output<> or InOut<> vertex field to a variable.

  • Number of variables : The number of variables in the graph, after compilation. Post-compilation variables are called lowered variables to distinguish them from the variables you created when defining the program. The main difference between these types of variables is that lowered variables are restricted to a single tile.

  • Number of vertices : The number of compute vertices the graph contains after compilation. Some vertices are added to the graph during compilation (for example memcpy vertices) so this may be higher than the number of vertices added with the Poplar API.

2.5.2. Poplar Engine options

This section displays all of the Poplar Engine options that were used to generate the reports, the values supplied for each, and whether they are the default values or otherwise. You can also choose to show the options for the three different phases: compilation, execution and target.

If no Poplar Engine options are displayed, click on the Engine Options heading to expand that section and show its contents.

  • To view options by their phase (execution, compilation or target), select the phase from the Type drop-down list. If a phase is not available, it is disabled.

  • If you select the Execution type, another drop-down list appears in which each of your runs is listed. Select None to see all execution parameters, or select one of the runs to see execution parameters just for that run. See run-specific execution parameters for more details.

  • Values which are different from the default are displayed in bold . Use the View - All selection box in the top right-hand corner to switch between a list of all options, or View - Non-default to view just those which are different from the default values.

  • Click on the small book icon to follow a web link to the Poplar API Reference for a description of each of these options.

2.5.3. Framework and application JSON files

If you have a framework.json or an app.json file that was created in your program, their contents are displayed here so that you can check any parameters that you recorded in them. This is useful when comparing two reports, allowing you to spot differences easily.

You can put whatever information you wish into these two files, and if they’re found in the reports folder when the other report files are being loaded, their contents are displayed in a foldable tree. This assumes that the files are valid JSON.
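For example, an app.json file could be written from your training script like this (the field names below are invented for illustration; any valid JSON is accepted and shown as a foldable tree):

```python
import json

# Hypothetical run parameters to record alongside the report files.
# The keys are entirely your choice; these are made-up examples.
params = {
    "model": "my_model",
    "batch_size": 16,
    "precision": "fp16",
}

# Write the file into the reports folder so the Graph Analyser picks it up.
with open("app.json", "w") as f:
    json.dump(params, f, indent=2)
```

When comparing two reports, recording parameters like these makes it easy to see at a glance which settings differed between the two runs.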

2.5.4. Report files

This section of the Summary report shows the folder from which the report files were loaded (or both folders, if you’re comparing reports). It also shows which individual files are being loaded into the Graph Analyser, as documented in Report Files .

  • If no Report Files are displayed, click on the Report Files heading to expand the section.

For each file that is present:

  • A set of three green dots indicates the file was found, and is being analysed and loaded.

  • A green tick indicates that a file was found and has been loaded successfully.

  • A greyed-out question mark indicates that the corresponding file was not found.

  • A red cross indicates that the file could not be loaded. A warning message can be found in the Host Information section, directly above.


2.6. Viewing an Insights Report image28

The Insights Report gives you a brief summary of how well your model fits into the available IPU memory. The report shows which tiles are responsible for the highest memory usage, and which vertices and exchanges require the most memory in your model and on the IPUs.

There are also a number of recommended actions you can take to reduce the memory usage reported, such as recomputation, changing the batch size or using FP16. Where relevant, details of the largest memory requirements are shown, and an estimated expected memory saving.

2.6.1. Memory insights

The top section of the Insights report shows how well your model fits into the available IPU memory. It includes the following information:

  • A panel at the top shows whether your model is within available IPU memory capacity, or whether you had an out of memory (OOM) issue. The proportion of IPU memory used is displayed, giving you a quick guide to how much memory you have spare (or how much you need to reduce if your model was OOM).

  • A chart showing the five tiles with the highest memory usage. For each tile, the amount of memory required is displayed, as well as the IPU on which that tile is located.

  • A chart showing a histogram of memory usage across all tiles. This gives you insight into the number of tiles that are using particular amounts of memory.

  • A table showing the vertex and exchange memory usage .

2.6.2. Vertex and exchange sizes

The Insights report also shows you the vertices and exchanges that require the most memory in your model, and across the IPUs and tiles.

  • Select the Model , IPU or Tile tab to see the memory usage for each.

  • Select the Vertex (state and code) radio button to display the name of the vertex with the largest state, the amount of memory it required, and a list of compute sets in which that vertex was used. Beneath that, the vertex with the largest code size is also shown, with the same information.

  • Select the User variable radio button to see a list of always-live user variables at peak liveness that require memory reserved for the entire application.

  • Select the Exchange radio button to display the name of the largest exchange, the amount of memory it required, and the names of the variables that were involved in the transfer.

  • Select the Program step radio button to view the step with the peak memory usage, as well as a list of the not-always-live variables at this step.

  • When viewing the IPU tab, you can select which IPU to view using the drop-down list on the left. Only vertices and exchanges involving that IPU are listed.

  • When viewing the Tile tab, you can select which of the five most memory-hungry tiles to view, or enter a specific tile ID in the search box. Only vertices and exchanges involving that tile are listed.

2.6.3. Tips on reducing memory usage

The bottom section of the Insights report displays a number of panels that recommend possible solutions for improving the memory usage of your model, which may help in situations where you are out of memory. Where appropriate, estimates are given showing how much memory could be saved with each solution. Further information about these recommendations can be found in the Memory and Performance Optimisation Guide , on the Graphcore website.

Note

These solutions may affect the performance, throughput, convergence and/or training characteristics of your model.

The following recommendations and solutions are included in the report:

2.7. Viewing a Memory Report image29

The Memory Report shows a graphical representation of memory usage across all the tiles in your IPU system, showing graphs of total memory and liveness data, and details of variable types, placement and size.

There are two main areas of the Memory Report:

  • The Memory graph, in the top half of the window, shows different types of memory graph. Click on the Graph Type drop-down menu at the top left-hand corner of the graph to select a graph type:

    • Total Memory graph , which shows the memory usage of your program across all the IPU tiles. You can view a breakdown of this data by region (whether to display interleaved and non-interleaved memory separately) or by category (what the memory is used for).

    • Variables graph , which allows you to plot the memory usage of multiple individual variables.

    • Tile Map , which shows the memory usage of the tiles overlaid on a physical floor plan of the IPU.

  • The Tile Memory Usage report, in the bottom half of the screen, shows memory usage broken down by various categories, and memory maps of individual tiles.

You can choose various view options for each graph, and you can also click on the graph to view details for an individual tile.

2.7.2. Total Memory graph

This memory report shows the total memory usage across all the tiles on all IPUs.

  • On a Memory Report, select Total Memory from the Graph Type menu .

The horizontal axis shows the tile number (which you can order by software or physical ID; see Memory Report view options ), and the vertical axis shows the memory usage.

Memory Report breakdown

Breakdown by Region

IPU memory has two types of memory region, which Poplar allocates to data depending on how that data needs to be accessed:

  • non-interleaved : Consecutive words are stored in the same memory bank. Code must be stored here.

  • interleaved : Consecutive words are stored in alternating memory banks. Some high-bandwidth load/store instructions, such as ld128 , only work in interleaved memory, so some codelets require that the variables connected to them are stored here. Code cannot be stored here.

A third category is shown when a tile runs out of memory:

  • overflowed : Memory that exceeds the maximum amount available on a tile.
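The difference between the two regions can be pictured with a small sketch (illustrative only, assuming a 4-byte word and two banks; the real IPU memory layout is more involved):

```python
# Illustrative sketch, not the real IPU memory map: in an interleaved region,
# consecutive words alternate between two memory banks, which is what lets
# wide load/store instructions such as ld128 fetch adjacent words together.
WORD_BYTES = 4  # assumed word size, for illustration

def bank_of(address, interleaved):
    """Return the bank holding the word at this byte address."""
    word = address // WORD_BYTES
    return word % 2 if interleaved else 0  # non-interleaved: a single bank

# Consecutive words land in alternating banks when interleaved.
banks = [bank_of(addr, interleaved=True) for addr in range(0, 16, WORD_BYTES)]
print(banks)  # [0, 1, 0, 1]
```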

Breakdown by category

When you select Breakdown - By Category another drop-down list is displayed with all the available memory categories. This can be used to understand the overhead costs of instrumentation.

  • The Select Categories drop-down is multi-select so you can compare multiple categories simultaneously.

  • The total memory can be toggled on and off by selecting the All option.

  • Breakdown - By Category is also available when viewing memory by IPU.

Breakdown by liveness

This memory report shows the memory usage of the two types of program variables:

  • Always-Live Variables : These variables must be accessible for the entire lifetime of graph execution. This means nothing else can ever use the memory allocated for these variables. Examples include code and constants.

  • Max Not-Always-Live Variables : Not-always-live variables are only needed for some program steps. As long as two variables are not live at the same time they can be allocated in the same location, thus saving memory. This option shows the maximum amount of live memory use on each tile. See the FAQ for more details.
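The idea behind the maximum can be sketched as follows (an illustrative Python model with invented variable names and sizes, not the actual profiler computation):

```python
# For each program step, sum the sizes of the variables live at that step,
# then take the maximum over all steps. Variables that are never live at the
# same time (here, "a" and "c") can share a memory location.
live_at_step = [
    {"a": 4, "b": 8},    # step 0: a and b are live
    {"b": 8, "c": 16},   # step 1: a is dead, c is now live
    {"c": 16},           # step 2: only c remains live
]
max_not_always_live = max(sum(sizes.values()) for sizes in live_at_step)
print(max_not_always_live)  # 24, reached at step 1
```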

2.7.3. Variables Memory graph

This memory report allows you to select multiple variables and plot their memory usage across tiles.

  • On a Memory Report, select Variables from the Graph Type menu.

  • A prompt appears, suggesting you enter a variable to search for. Type the variable name into the search box at the top, and the application will find matching variables and display them in a drop-down list.

  • Select a variable from the list to plot on the graph.

  • Remove variables from the graph by clicking the small x icon next to their names in the legend below the graph.

You can also access this feature from the Total Memory report by selecting a variable from the Variables tab, as described in Plot multiple variables .

Note

The Variables Memory graph is not available when viewing memory by IPU.

2.7.4. Tile map Memory graph

This memory report displays a schematic of an IPU and overlays it with a coloured representation of the tile memory usage for every tile for the selected IPU. The colour key is displayed on the right, and its range can be changed, as described in Changing the colour scale .

Note

The Tile Map Memory graph is not available when viewing memory by IPU.

  • On a Memory Report, select Tile Map from the Graph Type menu.

  • Select an IPU to view using the input on the left, and the map updates to show the memory usage for that IPU.

  • Hover the mouse over the tile map to see a popup of the details of a tile within the selected IPU. This shows the physical and software tile IDs, memory usage and rank (described in Changing the colour scale ). While you hover, a black line within the colour key, to the right of the tile map, shows the memory usage of the hovered tile, according to the colour scale currently selected.

  • Click on a tile on the map to select it and see its memory usage. Its details are shown in the tabs and tables below, as in other memory reports. You can select multiple tiles by holding down Ctrl/Command while clicking on a tile. Details for each tile are displayed in the tables below, with a column for each tile. The selected tile numbers are displayed in the search box above the tile map, so you can enter them by hand if you know which one you’re looking for.

  • The Breakdown menu at the top of the tile map allows you to break down memory usage by region (see Memory Report breakdown for details). When breaking down by region, the Region control to the left of the map allows you to choose to display interleaved or non-interleaved memory.

  • Choose whether to include or exclude gaps in the tile map by using the Options menu at the top of the report.

  • The panel to the left of the tile map allows you to select which IPU to view, which variable category you’d like to view, and which colour scale to use.

Note that you can change the size of the tile map by dragging the split-screen control, and it will fill the space available in the top half of the screen.

Changing the colour scale

There are three methods of colouring the tiles on an IPU that show their memory usage in different ways. Use the Scale type drop-down to the left of the tile map to select one of these scales:

  • Relative : (default) The colour of a tile depends on where its memory usage falls between the lowest and highest tile memory usage values.

  • Absolute : The colour of a tile depends on its memory usage on a scale from zero to the maximum tile memory.

  • Rank : The colour depends on a linear ordering of the tiles by memory usage.

When your model is out of memory, the colours are scaled appropriately, not just to the max memory.

2.7.5. Tile memory usage

The bottom half of the Memory Report screen shows tabs that contain an analysis of memory usage by several different categories.

  • The default view shows memory usage for all tiles (or IPUs, if you are choosing to plot by IPU instead of tile), but you can select an individual tile/IPU as described in Selecting individual tiles/IPUs .

Tile memory usage: Details tab

The Details tab in the tile memory usage report displays a hierarchical list of memory usage by category on the selected tiles. This list is divided into three main sections:

  • Including Gaps : Shows memory usage on the selected tiles which includes the gaps between variables.

  • Excluding Gaps : Shows memory usage on the selected tiles which excludes the gaps between variables. It is split into interleaved and non-interleaved memory and also categorised by the type of data in that memory location.

  • Vertex Data : Shows the memory used by variables in the graph vertices as the Poplar program executes, categorised by the types mentioned in Excluding Gaps .

Excluding Gaps

Memory usage on the selected tiles is displayed here in two categories, with memory usage figures for each:

  • by Memory Region : Shows memory that is non-interleaved, memory that is interleaved, and any memory that has overflowed.

  • by Data Type : Shows memory further categorised by the type of data that is stored there (either overlapping data or non-overlapping data). The meaning of each of these categories is explained in the table below.

Not Overlapped data

This shows the usage for the parts of memory where only a single variable is allocated. This includes variables that cannot overlap with other variables (always-live variables), and also variables that simply happen not to overlap with any other variable, even though overlap would be allowed.

Variables

These are the variables added using Graph::addVariable() .

Internal Exchange Message Buffers

During the exchange phase of program execution, it may not be possible to send data straight to its destination. For example sending a single byte directly is impossible because internal exchange has a granularity of four bytes. In cases like this Poplar will copy the data to and from temporary variables using on-tile copies (which can copy individual bytes) and then do the actual exchange from these buffers.
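The padding implied by the four-byte granularity can be sketched as follows (illustrative only):

```python
# Internal exchange moves data with four-byte granularity, so a transfer
# that is smaller than (or not a multiple of) four bytes is staged through
# a temporary buffer rounded up to the next four-byte boundary.
GRANULARITY = 4

def exchange_buffer_size(n_bytes):
    """Round a payload size up to the next multiple of the granularity."""
    return -(-n_bytes // GRANULARITY) * GRANULARITY

print(exchange_buffer_size(1))  # 4: a single byte needs a full word
print(exchange_buffer_size(6))  # 8: six bytes round up to two words
```

On-tile copies, which can move individual bytes, are used to fill and drain these buffers either side of the exchange itself.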

Constants

These are variables that were created using Graph::addConstant() .

Host Exchange Packet Headers

Host exchange is performed using a packet-based communication protocol. Each packet starts with a header that contains the address that its payload should be written to. These addresses are determined at compile time and the packet headers are stored in these variables.

Stack

This is where the program stack lives. It is created automatically during compilation. There is a single Stack variable on every tile that contains the stacks for the supervisor and worker threads. The stack size is configurable at compile time via a Poplar Engine option.

Vertex Instances

These store the state of each vertex instance. Each call to addVertex() adds a single vertex instance to the graph, whose size is equal to sizeof(TheCodeletClass) .

Copy Descriptors

During compilation, Poplar will add compute sets that perform copies. These contain copy vertices, and the copy vertices reference additional data called Copy Descriptors that describe how to perform the copy.

VectorList Descriptors

A vector of pointers that points to the values of a multi-dimensional vector. The data for VectorList<T, DeltaN> fields.

Vertex Field Data

Variable-sized fields, for example the data for Vector<T> fields.

Control Code

All code that is not vertex code. This includes all the code that is generated from your Poplar Program tree, and the code for each compute set that calls the codelet compute() functions for each vertex. It does not include the compute() functions themselves, which fall under the Vertex Code category.

Vertex Code

This is where the assembled code from the codelets is stored. A codelet is a class written in C++ or assembly, whereas a vertex is an instance of that class. Adding multiple instances of a single vertex type does not increase the amount of Vertex Code memory required.

Internal Exchange Code

The code instructions used to move data between tiles on an IPU.

Host Exchange Code

The code instructions used to move data between an IPU and the host machine.

Instrumentation Results

If you set the debug.instrument option in the Poplar Engine, this is where the cycle counts for various Poplar functions are stored. You’ll notice, therefore, that enabling instrumentation increases your memory usage. Different levels of instrumentation can be selected, which will use different amounts of memory. Note that the size of these variables is dependent on the level of dynamic branching in your program – if you’re timing every instance of a function call, the compiler won’t necessarily be able to tell in advance how much memory it will require to keep a cycle count for each of them.

Overlapped data

This is data for variables that are not always live, meaning that they are temporary and can be overlapped by other not-always-live variables if the two variables are not live at the same time. Reusing memory in this way reduces the amount that is required by Poplar programs. The sizes reported here count the memory used by the variables as if they were not overlapped. For example if two 4-byte variables are allocated in the same location it would be reported as 8 bytes here.
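The reporting convention can be sketched as follows (illustrative, with invented variables):

```python
# Sizes in the "Overlapped data" figure are summed as if variables were not
# overlapped, so two 4-byte variables sharing one location still contribute
# 8 bytes to the report, even though only 4 bytes of memory are occupied.
variables = [
    {"name": "tmp0", "offset": 0, "size": 4},
    {"name": "tmp1", "offset": 0, "size": 4},  # reuses tmp0's location
]
reported = sum(v["size"] for v in variables)
actual_footprint = max(v["offset"] + v["size"] for v in variables)
print(reported, actual_footprint)  # 8 4
```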

Program & Sync IDs

You supply the Poplar Engine with a vector of programs to execute. Each of these programs has an ID, so that you can specify which one to run: for example, calling run(3) runs the fourth program, and 3 is the program ID sent to the IPU so that it knows which program to execute. Additionally, when control flow cannot be statically determined, the IPU must inform the host which control flow path it took, so that the host knows which data to send during host exchange. This is done by sending Sync IDs.
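The program-ID mechanism can be sketched as follows (illustrative; the program names are invented and this is not the Poplar API):

```python
# Programs supplied to the engine are indexed from zero, so run(3) selects
# the fourth program in the list. The names here are made up for the sketch.
programs = ["init", "train_step", "eval_step", "checkpoint"]

def run(program_id):
    """Dispatch on the program ID, as the IPU does when it receives one."""
    return programs[program_id]

print(run(3))  # checkpoint
```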

Data Rearrangement Buffers

Data connected to a vertex edge is guaranteed to be contiguous in memory, but the Poplar API allows you to connect non-contiguous tensors to edges. In this case Poplar will need to insert rearranging copies to temporary variables so that the data presented to the vertex is contiguous.

Tile memory usage: Compute Sets tab

This tab contains a table of the compute sets that appear on the selected tiles or IPUs or, if none are selected, all tiles or IPUs. The name and total size of memory for each compute set are listed in descending order of size.

  • Each row can be expanded or collapsed by clicking the chevron icon at its left-hand edge. When a compute set is expanded, a subsidiary table of its constituent vertices is displayed beneath the compute set row.

  • The vertices table shows the total size of memory for each vertex in the compute set, which is also shown in the Vertices tab because it is independent of the compute sets.

  • In the comparison view, the difference in the size of a compute set or vertex between the source and target reports is only displayed if it appears in both reports. Otherwise, the difference column is empty for the row.

  • For reports generated with Poplar SDK version 2.3 or later, the instance count for each vertex in the compute set is also shown. The count for each selected tile or IPU is specific to the compute set that is expanded.

  • Because vertices are shared, a vertex may have a non-zero size for a tile or IPU even if its instance count is zero. Similarly, when comparing reports, a compute set that appears in only one of the reports may be made up of vertices which nevertheless appear elsewhere in the other report.

  • The compute sets can be filtered by their name, or the names of their constituent vertices, by using the text input and drop-down button above the table.

Tile memory usage: Vertices tab

This tab in the tile memory usage report lists the memory used by the graph vertices, together with the total memory size they occupy across the selected tiles (or all tiles, if none is selected). This list is ordered by decreasing memory usage.

  • For each vertex, the Poplar namespace and function name are listed, together with any additional information about their types. Please refer to the Poplar API Reference for a description of each of these functions.

  • The origin of each vertex, in a small blue box, is displayed after the vertex name, indicating whether the vertex was written in C++ or Assembler (ASM).

  • You can filter the vertices by name, using the input box above the table, or by source (C++ or ASM (Assembler)) using the drop-down list.

Tile memory usage: Exchanges tab

This tab in the tile memory usage report displays the internal exchange code size for all tiles/IPUs, or the currently selected tiles/IPUs. When comparing reports, there is an additional column that shows the difference between the source and target code size.

  • Open up details on any of the exchanges by clicking on the small arrow next to it. If no tile is selected, the exchange information is grouped by name, and the total size of the exchange data is added up to give the total size in the column on the right for exchange variables with that name.

  • The FROM and TO labels indicate the direction of the exchanges, the names of the variables involved, and how much data was passed.

  • The UL and L tags show whether the variables are unlowered or lowered . Unlowered variables are created and then lowered across several tiles, so that parts of them are mapped to per-tile memory variables. Many of the Poplar operations create lowered variables directly: rather than creating one large variable and mapping it across the tiles, they create many small variables on each of the tiles and map out the exchanges required between them. There is then no higher-level variable to reference, hence the need to distinguish between the two types.

Exchange information can also be seen on the Program Tree .

Tile memory usage: Variables tab

This tab in the tile memory usage report displays a memory map of the currently selected tile, showing code and variable usage across the memory locations. Note that this tab is not available when viewing memory by IPU.

Note

Entries in this section of the report are only present if lowered variables were captured in the profile. See Report files for details about how to generate this when executing your program.

You can toggle between two different views (memory map and table view) on the Variables tab by clicking on the icon in the right-hand corner of the tab contents.

There are several interactive features of the Variables view that can help you find the locations in which variables are stored:

  • The selected tile’s memory is displayed vertically in a scrollable area that is 1024 bytes wide. The tile memory is partitioned into memory elements , which are either Interleaved or Non-Interleaved (see Memory Report breakdown for more information). Any unused elements at the end of the tile memory are not displayed.

  • Variables are displayed as coloured bars spanning the memory locations they occupy. Where two or more variables overlap, hover your mouse over the bar to see all the variables at that location.

  • All variables in the memory layout are coloured according to their type. Click the colour key icon in the top right-hand corner to view the colour key for each type. The meaning of each of the categories displayed here is described in the table above. You can click on the checkboxes on the left-hand side of each variable to display it or hide it from the variable plot.

  • Click on a variable to display its details, which appear on the right-hand side of the memory layout. This displays all variables which exist at any time at that memory location.

  • Click on the Show button at the bottom of the variable details, beneath the Interference heading, to filter other variables that interfere with the selected variable in terms of memory placement. See Memory interference for details.

  • Search for a variable by entering its name into the input text box above the memory layout. If a variable’s name matches the search text, it is highlighted in the memory layout, and all others are hidden. You can clear any text you’ve entered here by hovering over the box and clicking the small x icon at the right-hand end.

  • Plot one or more variables on the Memory graph, as described in Plot multiple variables .

Note

You can expand the variables map to fill the report window by clicking on the full-screen button in the top right-hand corner of the map. A corresponding button to shrink the map again appears in the top-right corner.

Memory interference

You can see other variables which a selected variable interferes with. These are variables that are in contention in terms of their memory placement. There are three ways variables can interfere with each other:

  • Memory: A variable cannot occupy the same bytes as another variable because it is live at the same time. Always-live variables interfere with every other variable in this way.

  • Element: A variable cannot be in the same memory element as another one. This can occur in some cases when two variables are connected to the same vertex, and it is reading from one and writing to the other using certain instructions.

  • Region: A variable cannot be in the same memory region as another one.
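The first kind of interference follows directly from liveness: two variables that are live over overlapping ranges of program steps cannot share bytes. A minimal sketch, with invented step ranges:

```python
# Minimal sketch of "Memory" interference: two variables interfere if their
# live intervals overlap, so they cannot occupy the same bytes.
# Intervals are (first_step, last_step) pairs; the data is invented.

def interferes(a, b):
    """True if inclusive live intervals a and b overlap."""
    return a[0] <= b[1] and b[0] <= a[1]

live = {"activations": (2, 7), "gradients": (6, 9), "scratch": (10, 12)}

print(interferes(live["activations"], live["gradients"]))  # True (steps 6-7 overlap)
print(interferes(live["activations"], live["scratch"]))    # False (disjoint)
```

An always-live variable has an interval spanning the whole program, so it overlaps, and therefore interferes, with everything.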

To see which other variables interfere with a selected variable:

  • Select a variable from the memory map by clicking on it. Several variables may occupy that memory location, and their details are displayed in a list on the right-hand side.

  • Click on the Show button at the bottom of a variable’s details, beneath the Interference heading, and the variables in the memory map will be filtered using that variable’s name, showing only those that interfere with it.

  • To re-display all variables, click the small cross in the filter box at the top of the variable memory map display.

Variable types

Variables in the memory layout diagram are categorised by colour. Note that more detailed descriptions are available in the Excluding Gaps table.

  • User variables : These are user-defined variables.

    • Variable: Variables created using the Graph::addVariable() function.

    • Constant: Variables created using the Graph::addConstant() function.

  • Code variables : These are variables that are created by Poplar to execute the program code.

    • Control Table: Experimental, empty by default.

    • Control Code: Variables used by Poplar to run the program. Some specific control code variables are described in Known variables .

    • Vertex Code: The code within the vertex codelets.

    • Internal Exchange Code: The compiled code used to move data between tiles on an IPU.

    • Host Exchange Code: The compiled code used to move data over PCI between an IPU and the host machine.

    • Global Exchange Code: The compiled code used to move data between multiple IPUs.

  • Vertex data variables : These are variables associated with tensors.

    • Vertex Instance State: The internal state of each vertex instance.

    • Copy Descriptor: Additional metadata used for copy vertices.

    • Vector List Descriptor: Data for VectorList<T, DeltaN> fields.

    • Vertex Field Data: Data for Vector<T> fields.

  • Temporary variables : These are used to temporarily store information that Poplar uses while executing the program code. They are not-always-live, overlapping variables.

    • Message: Temporary storage for internal exchange (between tiles on one IPU).

    • Host Message: Temporary storage for host exchange (between an IPU and the host).

    • Global Message: Temporary storage for global exchange (between IPUs).

    • Rearrangement: Variables used to store intermediate values when an edge is connected to non-contiguous variables.

    • Output Edge: Temporary output variable used when a vertex is connected to a variable on a different tile. The data is copied by internal exchange after the compute set has been executed. Note that this category is also used for input edges, so it would be more accurately named Input/Output Edge.

  • Miscellaneous variables : Other variables that don’t fit into the categories above.

    • Multiple: Sometimes variables are merged during lowering. If they came from two different categories the resultant variable is put into this category.

    • Control ID: The combination of program ID, sync IDs and software sync counter.

    • Host Exchange Packet Header: Header information for PCI messages between the IPUs and the host machine.

    • Global Exchange Packet Header: Header information for PCI messages between IPUs.

    • Stack: Thread stacks for each tile.

    • Instrumentation Results: The cycle counts if instrumentation is enabled.

Known variables

Poplar uses some specific variables that you may encounter on various tiles. Their purposes are as follows:

  • .text.poplar_start : This is the entry point and main control code for your program. It’s roughly equivalent to main() in a C program.

  • .text.supervisor.control__func[…] : These are the control codes for the functions in your compiled program. These are the functions you can see on the Program Tree report.

  • .text.supervisor.control_initPrngSeed : The control code to initialise the seed for the pseudo-random number generator.

Plot multiple variables

When a variable is selected in the Variables tab, its details are displayed in the right-hand column next to the memory map for that tile. You can then plot that variable’s position in memory on the main graph, as follows:

  • With the variable selected, click on the Plot variable button. The Graph Type menu at the top changes to Variables . The variable name is added to the variable list above the report, and you can see how it is placed across the tile memory.

  • Click other variables in the memory map, and you can repeat the process above, adding them to the variable list at the top of the report, and displaying their memory placement together on the same graph. The default behaviour shows the size of the variable on each tile. If you select Plot variable by address from the Options menu at the top of the screen, you can see how the variable is laid out in the memory space.

  • To remove a variable from the graph, find its name in the list above the graph, then click the small x button at the right-hand end.

Full-screen option

When viewing the content of the Variables tab it may be easier to view the data in full-screen mode. You can toggle this option on and off using the button in the top right-hand corner of the tab.

Toggle between table and graph view

The Variables tab has a table view, listing the variable names and their sizes, and also a graph view which provides more in-depth detail. You can toggle between these by using the chart/text button in the button group in the top right-hand corner of the tab.

Show differences between selected tiles

When multiple tiles are selected, the difference between values is displayed in red or green text on the table view. You can remove variables that have the same value by enabling the Show differences between selected tiles option. This is accessible by clicking on the cog icon and checking the box in the drop-down menu. This filters out all variables that have the same value, and leaves only those that are different.

Show base address offset

Each IPU version has a different address at which available memory starts on its tiles. For the Mk1 IPU, this is 0x40000 , and for the Mk2 it is 0x4C000 . Selecting this menu option sets the base address to match the IPU version you’re using.
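As a quick sanity check, an absolute address shown in the memory map can be converted to an offset from the start of usable tile memory using the base addresses quoted above (the example address is invented):

```python
# Offsets from the start of usable tile memory, using the base addresses
# quoted above for each IPU version. The example address is invented.

BASE_ADDRESS = {"Mk1": 0x40000, "Mk2": 0x4C000}

def to_offset(address, ipu_version):
    """Offset of an absolute tile address from that version's base address."""
    return address - BASE_ADDRESS[ipu_version]

print(hex(to_offset(0x4C100, "Mk2")))  # 0x100
```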

2.8. Viewing a Liveness Report image30

The Liveness Report shows which of the not-always-live variables are allocated at certain points in your program, and gives a detailed breakdown of variable memory usage by the compute set that they’re in. There are two main areas to the report:

For a standard machine learning model that you’re training, for example a ResNet, you’ll generally see a curve that rises to a peak and drops again. The rising portion of this curve is the memory usage during the forward pass of the training algorithm, where many activations are created (not-always-live variables), and the peak represents the point where the maximum number of activations exist. As the curve drops again, representing the backward pass of the training algorithm, the activations are “released” after being used to update the weights.
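The shape of that curve can be reconstructed from per-step changes in not-always-live memory. A hedged sketch with invented sizes, a forward pass that creates activations followed by a backward pass that releases them:

```python
# Hedged sketch: reconstructing a liveness curve from per-step
# allocate/release sizes (invented numbers, forward then backward pass).

forward = [+10, +20, +30]    # activations created during the forward pass
backward = [-30, -20, -10]   # activations released during the backward pass

curve, live = [], 0
for delta in forward + backward:
    live += delta
    curve.append(live)

print(curve)  # [10, 30, 60, 30, 10, 0] - rises to a peak, then falls to zero
```

The peak of the curve (60 here) is the point of maximum activation memory, which is why it is the most useful place to inspect.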

When you’re inspecting a liveness graph, it’s informative to look at the peak of this curve, and select the corresponding compute set to show how the variable usage contributes to the greatest memory utilisation.

2.8.2. The Liveness graph

  • Hover your mouse pointer over the graph to see a callout containing the compute set ID, together with the amount of memory used by that compute set.

  • Click on the graph to see that compute set’s stack details below the graph, which displays its name and memory size. You can select multiple program steps by holding down Shift and clicking other program steps. Each of these steps is then displayed in its own column on the Not-Always-Live Variables and Vertices tabs.

  • You can choose to display memory usage statistics for the selected tiles. See the relevant Preferences section for details.

  • You can plot the lifetime of not-always-live variables directly on the graph. See Not-Always-Live Variables for more details. This is an experimental feature.

Note

If you’re comparing two reports, you can choose to display their liveness graphs together or separately. See Merge Graphs in the Viewing options section.

Selecting a source

You can choose whether you want to select an individual tile, or, if the report was generated using Poplar SDK 1.2 or later, select an individual IPU.

To select particular tiles or IPUs:

  • Click on the Select Source drop-down list at the top of the graph and select the source you want to use for the graph. Each source will have its own plot.

  • All tiles : Shows liveness data for all the tiles (the default)

  • Worst tiles : Shows the two tiles that have the highest memory usage during program execution, and you can select either of them to view.

  • If the report was generated in Poplar 1.2 or later, you can also select one or more IPUs from this list.

There is a Poplar Engine setting which allows you to capture more than the default two worst tiles. Please refer to the Poplar API Reference for full details of these options.

Filtering steps

You can concentrate on the steps you’re interested in by filtering based on the step name. Type the name into the Search box and press Enter. Any step whose name matches the search text is displayed in the execution graph, while all other steps are moved to a separate dataset in the graph, and greyed-out.

To cancel the filtering, click on the small x at the right-hand end of the Search box.

Viewing options

There are several options for viewing Liveness Report graphs, which you can select from the Options drop-down menu in the top right-hand corner of the report screen:

  • Include Always-Live : Include a second trace in the graph that shows variables that are always present during the entire execution of the program, and always have the same memory address (for example stack ). Note that you can show and hide each of the traces on the graph by clicking on their colour key, just below the x-axis.

  • Include empty stacks : Include stacks that contain no variables.

  • Stack IPUs : Display any selected IPUs as stacked or separate plots. See the Selecting a source section for details.

  • Show Max Memory : Display a line on the graph showing the maximum memory limit for total memory, a selected tile, or a selected IPU. Note that this requires the Include Always-Live option, above, to be enabled, and Stack IPUs to be disabled.

  • Merge Graphs : Merge source and target report graphs together, or keep them as separate graphs. When displayed separately, you can navigate each graph independently, and select different program steps from each report. If your two reports contain different numbers of program steps, you’ll be able to select the corresponding steps from each to view their details. When the graphs are merged together, you navigate them together “as one”. When you click on the graph, the two program steps you select (from the source and target reports) may not correspond to the same element of each program.

2.8.3. Liveness stack details

In the bottom half of the Liveness report screen, you’ll see a tabbed list of live variable details that show their memory usage. These details are described in Viewing enhanced debug information .

Viewing enhanced debug information

If you have captured enhanced debug information when compiling your program, it is visible in the Always-Live and Not-Always-Live Variables tabs. See the section on the Debug Information file for details of how to capture this information.

If a variable has a small disclosure arrow ( image31 ) next to it, click it to show enhanced debug information. You can see a list of software layers on the left (for example, Poplar and PopLibs). Selecting a layer shows the debug information that has been added by that layer.

For all variables, information captured from the Poplar layer is available. This may include:

  • whether it’s a Variable, Cloned Variable or Constant,

  • the shape of the variable as an array of dimensions,

  • the type of variable (for example Half ),

  • for cloned variables, which variable it was cloned from, and the method by which it was cloned.

If the variable was created as part of a PopLibs API call, you can also see the following information:

  • the PopLibs API call

  • the input tensors to the API call

  • the arguments to the API call

  • the output tensors to the API call

Note

Note that the variable may not be an output of the PopLibs API call; it could be an internal variable created for the operation. Depending on your application, you may see further debug information for the framework and/or application.

By default, the debug information for the first PopLibs and Poplar call is shown. You can choose to show all PopLibs/Poplar calls that are made internally by clicking the gear icon in the top right-hand corner of the debug information box and selecting Show All Debug Infos . For instance, you may see the debug information for the PopLibs matmul API call. If you enable the option to Show All Debug Infos , you will see the internal implementation PopLibs calls, which include the PopLibs convolution API call. ( matmul is implemented as a convolution.)

Always-Live Variables

This tab shows the variables that are always live. Their data must always be available and therefore they cannot share memory with any other variable.

Not-Always-Live Variables

This tab shows the variables that are not always live. At certain points in the program we do not need to store any data in them, and therefore other variables can be allocated at the same location. Variables are never “allocated” or “deallocated” at runtime. All variables are statically allocated at compile time and always have a fixed address.

You can select one or more of the not-always-live variables to see their lifetime displayed on the graph above. Click on the small icon at the right-hand end of their listing, and they will be added to the plot, showing when they were created, and when they were destroyed. Each variable plotted also appears as a small blue box above the graph, and you can click the small x within them to remove them from the graph. This is an experimental feature.

Vertices

This tab shows the vertex functions that are contained in the selected compute set.

  • The origin of each vertex, in a small blue box, is displayed after the vertex name, indicating whether the vertex was written in C++ or Assembler (ASM).

  • The number of instances of that vertex type is also shown.

Cycle estimates

This tab shows the cycle estimates of each tile on an IPU. This is only available when using an IPUModel .

This is an experimental feature, and can be enabled and disabled in the Preferences.

Show differences between selected tiles

When comparing reports, the difference between values is displayed in red or green text on the table view. You can remove variables that have the same value by enabling the Show differences between selected tiles option. This is accessible by clicking on the cog icon and checking the box in the drop-down menu. This filters out all variables that have the same value, and leaves only those that are different.

2.9. Viewing a Program Tree image32

The Program Tree report shows a hierarchical view of the steps in the program that is run on the IPU. The report has a menu on the left-hand side that lists the Control Programs and Functions. The program steps contained within the selected control program or function are displayed in the main report area, on the right.

  • Click on one of the Control Program or Function numbers in the left-hand column to see the sequence of instructions that are contained within it.

  • You can view any debug information that has been captured for the selected program step. See Viewing Debug Information for more details.

The program steps in the main report area are listed hierarchically, and you can collapse or open steps that have nested steps within them by clicking on them. A small grey triangle at the start identifies steps that have sub-steps: if it’s pointing to the right, then that step is collapsed , and all of its sub-steps are currently hidden. Click the step to open it up and show all its sub-steps. All steps are initially shown opened up.

Each of the program steps is colour-coded, making it easier to pick out particular steps that you’re interested in viewing. Where steps correspond to those found in the Execution Trace report, they are coloured the same.

When you select a step that is associated with other program steps, they are all highlighted in yellow. Details of that step are then displayed in the tabbed section below the main program tree.

For a detailed explanation of what each of these program steps involves, please refer to the Poplar & PopLibs API Reference Guide .

2.9.1. Searching for program steps

You can search for a particular step by typing its name into the Search box at the top of the report window, and pressing Enter. Program steps that match your search term are highlighted in yellow in the main part of the report. Control Programs and Functions that contain matching steps are also highlighted in the left-hand window.

You can scroll through the results, either by pressing Enter repeatedly, or by clicking the arrow buttons under the search box. Single arrows move the result selection back and forward by one, and double-arrows move by ten.

When comparing two reports, two sets of results are displayed, and you can step through them separately. The top set of results shows matches in the source report (on the left), and the bottom set of results shows matches in the target report (on the right). Match highlighting works independently in both reports.

If Show Program IDs is enabled, you can search for a step’s program ID as well as its type or name.

2.9.2. Details tab

When a program step is selected, more details of that step are displayed in the Details tab, such as the ID, type and name of the step.

If there are any vertices associated with that step, they are listed in the table below, with the following details:

  • The name of the vertex class, together with a small blue icon indicating whether the vertex was written in C++ or Assembler (ASM).

  • The memory size occupied by each vertex instance of this type.

  • The number of instances of this vertex created from the selected program step.

  • If the program is an exchange, you can also see the size of the exchanges, and a list of which variables were involved. For exchanges between tiles, the FROM and TO tiles are displayed, and for StreamCopy programs, you’ll see the amount of data transferred from the host to the IPUs, and from the IPUs to the host. You can read more about exchange data in the Memory Report .

You can sort the vertex types listed in the Details tab of the Program Tree by name, memory size, or by the number of instances in the selected program step.

2.9.3. Viewing debug information

Clicking on a program step reveals debug information that was captured during program compilation. This reveals all the steps that are needed to execute the selected operation. In addition, when you select the debug information, the program steps created for that API call are highlighted in the Program Tree.

The debug information is the same as that found in the Liveness report .

2.9.4. Change Layout

When comparing the Program Trees of two reports, a button in the top right-hand corner allows you to arrange them side by side or one on top of the other.

2.9.5. Viewing options

There is one option for viewing the program tree, which you can select from the Options drop-down menu in the top right-hand corner of the report screen:

  • Show Program IDs : Whether to append the ID of each program to its label in the tree.

2.10. Viewing an Operations Summary image33

The Operations Summary displays a table of all operations in your model, for a software layer, showing statistics about code size, cycle counts, FLOPs and memory usage. You can also select which software layer operations you want to summarise.

Clicking on an operation in the table reveals further information about it in the tabbed section in the bottom half of the report, displaying graphs of code size, cycle counts, and various other measurements and estimates for the selected operation. You can choose which columns you want displayed in the table, and also apply sorting and filtering to it.

2.10.1. Operations table

The Operations summary table shows a list of all the operations within the selected software layer.

Note

Because the column headings in this table are typically quite long, we’ve used abbreviated headings that match the full column names displayed in the Columns drop-down list. If you hover your mouse over the column headings, you’ll see the full name displayed as a pop-up box.

Selecting a software layer

By default, the PopLibs software layer is displayed, as you can see from the drop-down Layers list in the top right-hand side of the table. You can select other software layers whose operations you wish to see by selecting from this list. Depending on your program, the following software layers may be available:

  • Poplar and PopLibs

  • PopXL and PopART Builder

  • ONNX

  • TensorFlow Poplar Drivers

  • TensorFlow HLO Instructions

  • TensorFlow XLA Operations

Note

Changing layers involves a sometimes lengthy re-calculation of the table metrics, so you may need to wait a short while for larger reports.

Selecting which columns to display

Note

The Columns control works differently depending on whether you’re viewing a single report, or comparing two reports.

  • When viewing a single report, you can select as many columns as your screen has room to display. Each column contains the values for that metric.

  • When comparing two reports, the Operation Name is always displayed, and you can select one other column metric to display. Alongside the Operation Name column, three columns show the value of the selected metric for the source report, for the target report, and the difference between the two.

By default, the operations table displays the following metrics for each operation (single view only):

  • Operation Name

  • Debug Name

  • Code Size (Total)

  • Measured Cycles (Total)

  • FLOPs

Note

FLOPs are not generated by default. Enable the profiler.includeFlopEstimates Poplar Engine option to generate FLOP estimates.
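One common way to pass Poplar engine options is the POPLAR_ENGINE_OPTIONS environment variable, which takes a JSON object. The sketch below sets it from Python and round-trips the value; in practice you would set the variable before launching your program:

```python
# Hedged sketch: supplying Poplar engine options via the
# POPLAR_ENGINE_OPTIONS environment variable, which takes a JSON object.
import json
import os

options = {"profiler.includeFlopEstimates": "true"}
os.environ["POPLAR_ENGINE_OPTIONS"] = json.dumps(options)

# Round-trip the value to confirm the JSON is well formed before launching:
print(json.loads(os.environ["POPLAR_ENGINE_OPTIONS"]))
```

Malformed JSON in this variable is a common cause of options being silently ignored, so the round-trip check is a cheap safeguard.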

Many other operation metrics can be displayed or hidden in this table by checking or unchecking the data types in the drop-down Columns list in the top right-hand corner of the operations table. Your current column selection preferences are automatically saved.

The Not-Always-Live Delta (NAL∆) option (experimental) shows the difference that each operation makes to the variable memory as the program executes. This helps you identify which operations are the most expensive in terms of memory. Note that an operation may have multiple liveness values if it is repeated in the execution; the value shown is the liveness delta from the first occurrence.
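The delta idea can be sketched as the change in not-always-live memory across each operation's first occurrence (the byte counts here are invented):

```python
# Sketch of the Not-Always-Live Delta idea: the change in not-always-live
# memory across each operation's first occurrence. Numbers are invented.

live_before = {"op_a": 100, "op_b": 160}   # NAL bytes before the operation
live_after = {"op_a": 160, "op_b": 120}    # NAL bytes after the operation

nal_delta = {op: live_after[op] - live_before[op] for op in live_before}
print(nal_delta)  # {'op_a': 60, 'op_b': -40}
```

A large positive delta marks an operation that grows live memory; a negative delta marks one that releases it.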

Note

Columns showing a range of Cycle Estimates are experimental.

Sorting and filtering

You can sort the operations table by any of the column headings, as well as only showing operations that match a particular string:

  • Click on a column heading to sort the table by that operation metric. Repeated clicking cycles through sorting in ascending order, descending order, or removing the sort from that column. A small blue triangle (or none) indicates the current sort order.

  • Type some text in the Filter operations box above the table, and press Enter to display only those operations whose operation name or debug name matches the text you enter. Remove filtering by clearing the text box.

2.10.2. Operations Summary tabs

The tabbed section in the bottom half of the Operations summary shows further information about the selected operation. When no operation is selected, only the Summary tab is visible, which shows some general statistics about all the operations in the currently selected software layer.

  • Select an operation from the table by clicking on it. The selected operation is displayed at the top of the tabbed section, along with a small x icon that you can click to deselect it, and return to the Summary tab.

Summary tab

When no operation is selected from the table, this tab shows a breakdown of operations for the selected software layer, including:

  • The total number of operations for that layer

  • The total code size for that layer

  • The total number of cycles executed in that layer

When an operation is selected from the table, data from the default table columns is displayed.

Program Tree tab

When an operation is selected from the table, this tab shows the program steps involved in that operation. This is the same data displayed in the Program Tree report.

Code tab

When an operation is selected from the table, this tab shows a graph of the code size executed for that operation. Code size for OnTileExecute and DoExchange program steps is displayed by tile number for all IPUs. You can zoom and pan around this graph as with other graphs.

Cycles tab

When an operation is selected from the table, this tab shows the number of cycles taken by the selected operation, plotted against all the IPU tiles. You can zoom and pan around this graph as with other graphs.

By default, only cycle counts for the OnTileExecute and DoExchange program steps are displayed, but you can add other program step types to include on the graph by selecting them from the Options drop-down list in the top left-hand corner of the graph. Available options are:

  • Show Copies : Separates steps for OnTileExecute programs that are just for copies.

  • Show Estimates : Includes a number of estimated cycle counts:

    • DoExchange Estimated Cycles

    • OnTileExecute Estimated Cycles

    • OnTileExecuteCopy Estimated Cycles

    • StreamCopyMid Estimated Cycles

    • GlobalExchange Estimated Cycles

FLOPs tab

When an operation is selected from the table, this tab shows a graph of the total number of FLOPs (Floating Point Operations) executed for the selected operation, plotted against IPU tiles.

Debug tab

When an operation is selected from the table, this tab shows debug information from the currently selected software layer for that operation. This information is identical to that on the Liveness Report and the Program Tree Debug sections.

2.11. Viewing an Operations Graph image34

The Operations Graph displays a graphical representation of TensorFlow models, showing High Level Operations (HLO) and enabling you to:

  • drill down through the modules, expanding and collapsing the layers to get to the level you want;

  • view details of operations, edges and layers;

  • view the type and shape of the tensors between operations;

  • view graphs of pipelined models to see what is in each pipeline stage, and what passes between them;

  • colour items in the graph based on selected metrics (for example, code size or cycles used);

  • configure the layout with a number of advanced options.

The Operations graph shows how the HLOs are connected to each other in your TensorFlow model in a number of nested layers. Layers and the operations they contain are displayed as boxes, and the tensors that those operations use are shown as arrows between operations.

The report is shown in split-screen: the left-hand side shows the graph at the selected level, and the right-hand side shows information about the selected object in the graph in a series of tabs.

  • Pan around the operations graph by clicking and dragging your mouse anywhere on the graph.

  • Zoom in and out of the operations graph using the mouse scroll wheel.

2.11.1. Graph entities

There are several types of entity displayed on the operations graph. You can select them, and expand the layers and calls to drill down into the model. You can click and double-click entities on the graph to expand them (layers and calls) or to display other information about them (operations and edges).

Note

You can customise the appearance of the entities in the operations graph by adjusting the advanced view options . The Options menu at the top of the report allows quick access to two of these layout options ( Show Backward Pass and Show Edge Labels ).

HLO layers

HLO layers are created when two or more operations have debug names which have the same prefix string, separated using the / character. They are displayed as boxes with solid borders and square corners:

HLO Layer

  • Click on a layer to see its details displayed on the right-hand side.

  • Double-click on a layer to expand it. All of that layer’s sub-layers are now displayed within the original layer, which is now displayed as a box around the sub-layer entities. Double-click that outer layer box to return to the original, enclosing layer.

  • Other layers within the operations graph can be displayed by selecting them from the drop-down menu at the top of the graph.

Defining HLO layers

The operations graph works best if you name your TensorFlow operations using either Keras layers or tf.variable_scope (see the TensorFlow documentation for name_scope ). Currently the view assumes the top-level tf.variable_scope is called all , which is a common convention in the public examples.

For best results, all forward operations should be in the all layer, and all backward operation names should start with gradients/all .
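As a rough illustration of the naming convention described above (and not the Graph Analyser's actual algorithm), the following sketch groups operation names by the slash-delimited layer that encloses them, and separates forward operations (under all ) from backward operations (under gradients/all ):

```python
def layer_path(op_name):
    """Split a slash-delimited debug name into its layer path.

    The final component names the operation itself; everything before
    it names the nested layers that contain it.
    """
    return op_name.split("/")[:-1]

def group_into_layers(op_names):
    """Group operations by their immediate enclosing layer."""
    layers = {}
    for name in op_names:
        parent = "/".join(layer_path(name)) or "<root>"
        layers.setdefault(parent, []).append(name)
    return layers

# Illustrative operation names following the "all" / "gradients/all" convention
ops = [
    "all/dense/MatMul",
    "all/dense/BiasAdd",
    "gradients/all/dense/MatMul_grad",
]
grouped = group_into_layers(ops)
forward = [op for op in ops if op.startswith("all/")]
backward = [op for op in ops if op.startswith("gradients/all")]
```

Here grouped maps "all/dense" to the two forward operations, so they would be drawn inside the same HLO layer box, while the gradient operation belongs to the backward pass.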

Note

It has been found that TensorFlow 2 GradientTape does not work well, as it does not record the scope names for the backward pass. For best results, use the TensorFlow 1 compat optimizers.

High Level Operations (HLOs)

HLOs are displayed as boxes with solid, black borders with rounded corners:

HLO Name

  • Click on an HLO to show its details in the tabs on the right-hand side of the report.

Operations that are disconnected from the rest of the graph are displayed as boxes with dashed, grey borders with rounded corners:

Disconnected HLO

HLO Calls/fusion operations

HLO Calls or fusion operations are displayed as boxes with double borders and rounded corners:

HLO Call

  • Click on an HLO Call to show its details in the tabs on the right-hand side of the report.

  • Double-click an HLO Call to expand it and see the operations within it.

  • Expanded HLO Calls show up as a list of breadcrumbs at the top of the graph, and you can click on those to retrace your steps.

Edge tensors

Tensors (data used as the input to, or output from, operations in the graph) are displayed as arrows. Their labels show either the tensor’s shape and type (for a single tensor), or how many tensors there are (for multiple tensors).

  • Click on a tensor to see its details in the tabs on the right-hand side.

Note

There are several options to show or hide labels in the Advanced Options tab.

2.11.2. Selected entity information tabs

When you select an entity from the graph (a layer, operation, call, or edge), its name is displayed at the top of the right-hand side of the report, and some extra information associated with the entity is displayed in the tabs beneath. This includes:

  • Summary tab : Displays different information depending on which type of graph entity is selected:

    • When a layer is selected, this displays some statistics about the layer in general, such as total estimated FLOPS.

    • When an operation or call is selected, the inputs and outputs to the operation are displayed, together with their associated tensor shape, as well as statistics about code size (OnTileExecute, DoExchange and Total) and estimated FLOPS.

    • When an edge tensor is selected, the source ( From ) and destination ( To ) operations are displayed, as well as tensor shape and size.

  • Program Tree tab : Any program tree steps associated with the selected operation.

  • Details tab : Graphs of various metrics plotted against tiles, including code size, cycles and FLOPS.

  • Debug Info tab : Any debug information associated with the selected operation.

  • Advanced Options tab : See Advanced options for details of how to customise the appearance of the operations graph.

2.11.3. Highlighting operations by metric

The operations graph has a feature to add colour to graph entities based on various metrics. Entities that have a relatively high value for that metric are coloured with a “hot” colour (red) to highlight those operations that are costly in terms of memory, cycles, and so on, and entities with a relatively low value for the metric are coloured with a “cool” colour (blue).

  • From the Highlight drop-down menu at the top of the report screen, select a metric that you want to use to highlight certain operations.

Metrics that are available to use for highlighting include:

  • None: Switch off highlighting

  • Code size (Total)

  • Code size (OnTileExecute)

  • Code size (DoExchange)

  • Estimated Cycles (Total)

  • Measured Cycles (Total)

  • FLOPS

The colour key for metric values is displayed in the top right-hand corner of the graph, showing the highest and lowest values present in the current view.

2.11.4. Advanced options

The right-most tab on the right-hand side of the report shows a number of display options for laying out the items in the operations graph. These settings are saved automatically, and persist between application sessions.

  • Check and uncheck options to display or hide information on the graph.

Note

The Options menu at the top of the report allows quick access to two of these layout options ( Show Backward Pass and Show Edge Labels ).

2.12. Viewing an Execution Trace image35

2.12.1. Detailed Execution Trace vs Lightweight Profiling

The Execution Trace Report shows the output of instrumenting a Poplar model in order to capture information about how long various parts of the model take to run. There are two different ways of capturing this information: the detailed Execution Trace, and Lightweight Profiling. Each has advantages and disadvantages. Which one is captured can be chosen by modifying the Poplar Engine options before running the model.
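Engine options are typically supplied as a JSON string, most conveniently via the POPLAR_ENGINE_OPTIONS environment variable. The sketch below shows the general shape; autoReport.all and autoReport.directory are commonly used Poplar options, but you should check the Poplar documentation for your SDK version for the exact options that select detailed instrumentation versus Lightweight Profiling.

```python
import json
import os

# Illustrative option set for capturing a full report, including execution
# information. The specific keys to enable detailed Execution Trace or
# Lightweight Profiling vary by SDK version; consult the Poplar docs.
engine_options = {
    "autoReport.all": "true",
    "autoReport.directory": "./report",
}

# Poplar reads this environment variable when the model runs.
os.environ["POPLAR_ENGINE_OPTIONS"] = json.dumps(engine_options)
```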

The detailed Execution Trace works primarily by simulating execution of the application on a simplified IPU model. The simulation data is augmented by capturing information at runtime about how many cycles each compute set takes to execute. The advantage of the detailed Execution Trace is that the data captured is very detailed: every single program step is included and the trace covers every tile on every IPU. This also means we can show the BSP trace and provide information about how a single program step may execute for different periods of time on different tiles. In addition to the cycle count for each step in the model, the detailed Execution Trace also displays statistics like the tile balance and cycle proportions, and compute-set details.

The detailed Execution Trace has several disadvantages. Simulating the execution of every program step on every tile can add a significant amount of time to the compilation of the Poplar application. Recording BSP data means that profiling code must be added to every tile. Finally, only a single cycle-duration is recorded for each compute set. If the compute set executes multiple times in the model but the executions take different amounts of time, the detailed Execution Trace will not show these differences.

Lightweight Profiling (LWP) does not involve any simulation and uses data captured at runtime from the IPU. Instead of capturing information about individual program steps, LWP records the start and end time of Blocks, which are programs added in order to instrument part of an application. Blocks may contain only a single program step, or may contain high-level operations consisting of a large number of program steps.

Instead of capturing information on every tile, LWP only records Blocks on a small number of selected tiles. This means that BSP and tile-balance information are not available from an LWP execution trace. However, as well as Blocks, LWP also records StreamCopy programs that take place on the selected tiles. Because these programs span many tiles on an IPU, an LWP trace also includes blocks for StreamCopy programs on tiles that were not selected, but which participated in a StreamCopy on a tile that was selected. These blocks can be aggregated at the top of an IPU’s trace by enabling the Aggregate StreamCopy Blocks option.

LWP captures far less data than the detailed Execution Trace and so is suitable for instrumenting large, long-running models with minimal overhead. The disadvantage of LWP is that the information available is less detailed than the information in the detailed Execution Trace.

Note

When you open an Execution Trace, a cache file named profile.pop_cache is created, which makes it much quicker to load the report when it’s opened a second time. If this file becomes corrupted for any reason, you can delete it by using the Delete Cache option in the Help menu.

2.12.2. Execution Trace options

Selecting IPUs

The menu bar above the graph contains a drop-down list named Select IPU which allows you to select all IPUs or an individual IPU.

View options

There are several options for viewing Execution Trace graphs, which you can select from the Options drop-down menu in the top right-hand corner of the report screen. Which options are available depends on whether the report uses detailed Execution Trace or Lightweight Profiling.

Detailed Execution Trace Options
  • Show Tile Balance : Shows what percentage of tiles are in use during each program step, visible as shading in each step.

  • Show Terminals : Include a line at the right-hand end of each process in the graph.

  • Group Executions : Group together executions that are part of the same higher process further up the call stack. Grouping is determined by the slash-delimited function calls that are logged to the execution trace profile output.

  • Group Syncs : Group multiple successive Sync steps together in the graph.

  • Show Text : Show or hide the name of each step in the graph.

  • Separate Overlapping Steps : Split up overlapping program steps into separate, non-overlapping lanes in the graph so that they can all be seen at once.

  • Show External Syncs : Display or hide the External Sync steps in the graph. There are often many of these, and hiding them may make the execution graph plot easier to understand in some cases.

  • Show SyncAns : Display SyncAns steps in the graph. As for External Syncs, hiding these steps may simplify the graph plot.

  • Use new BSP stats : Enable the new BSP-based statistics which are more accurate than the standard step-based statistics but may be slower to calculate for particularly large reports.

Lightweight Profiling Options
  • Show Text : When enabled, the name of each block is displayed on the graph.

  • Show Buffer Flushes : LWP needs to periodically flush profiling data from each IPU. By default, time spent carrying out these flushes is hidden. When this option is enabled, all buffer flushes are shown.

  • Aggregate StreamCopy Blocks : LWP includes blocks that represent StreamCopy programs on the selected tiles, as well as any tiles that also participated. When this option is enabled, StreamCopy blocks on tiles that were not selected are aggregated and displayed in a separate I/O Events lane at the top of each IPU’s trace.

  • Show Relative Cycles : The first Block recorded in an LWP trace may start at a non-zero cycle count, which can make it more difficult to see block durations at a glance. When this option is enabled, the minimum cycle count is subtracted from the axis ticks, so the ticks become relative to the start of the first Block.
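The re-basing that Show Relative Cycles performs amounts to subtracting the smallest cycle count from every axis tick, which a short sketch makes concrete:

```python
def relative_ticks(ticks):
    """Re-base axis ticks so they are relative to the first Block's start."""
    start = min(ticks)
    return [t - start for t in ticks]

relative_ticks([1_500_000, 1_500_250, 1_501_000])
# → [0, 250, 1000]
```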

Colour Key

You can view the colour key that the execution trace uses by clicking the key icon in the top right-hand corner of the graph.

Note

If your report is very large (typically over 40 million cycles), the smaller steps will be too small to render, and you’ll see a small pop-up message displayed just under the mini-map that indicates that you should zoom further in to see relevant detail in the graph trace.

Change Layout

When comparing the Execution Traces of two reports, a button in the top right-hand corner allows you to arrange the Execution Traces side by side or one on top of the other.

2.12.3. Detailed Execution Trace

Note

Detailed Execution Trace profiling has an overhead that can be prohibitive in some cases. Please refer to Profiling Overhead for details on how to prepare your execution to minimise that overhead.

There are two halves to the Execution Trace Report:

  • the Execution Trace graph , in the top half of the screen, showing either a flat compute set execution trace, or a flame graph of the stack,

  • a details report, in the bottom half of the screen, showing statistics about the cycle proportions, tile balance and compute sets present in the portion of the graph currently displayed. The details report has two tabs, the Summary tab , which shows statistics, cycle proportions and tile balance (experimental), and the Details tab , which shows details of a selected process from the execution trace graph.

The Execution Trace graph

The top half of the Execution Trace Report shows, by default, a flat graph, showing a set of consecutive blocks for each IPU, identifying what program steps were executed (as shown in the Program Tree report), and how many cycles they took. You can also view it as a flame graph, where the Poplar debug information is used to group compute sets together as part of the same operation, or application layer.

  • Select features that you want in the graph from the Graph View drop-down, including Flame graph, BSP Trace, and displaying separate runs of the program.

  • Move around the main report graph using your mouse, as described in Adjusting report size.

  • Hover your mouse pointer over the graph to see a callout containing the compute set ID, together with the amount of memory used by that compute set.

  • Click on a run at the top of the trace, and you can see the run-specific execution parameters displayed in the Details tab.

  • Click on the graph to see that compute set’s stack details below the graph, which displays its name and memory size.

  • Double-clicking on a layer when the flame graph option is selected expands that layer to the full width of the graph. You can select two layers at once by clicking on the first one, then holding down Shift key and clicking on the second one. The graph then expands to contain just two layers. This makes it possible to inspect the cycle proportions of only the visible section of the execution trace, making it easier to understand the proportion of cycles spent in each type of process.

  • Clicking on a region in the BSP trace also selects the corresponding step in the Execution Trace graph.

  • You can magnify the BSP trace vertically to help identify individual tiles. Clicking the + and – buttons to the left of the trace zooms in and out vertically. Click the recycle button to return to the default magnification.

  • Click the icon at the left-hand edge of a lane in the graph to collapse it such that only the top row of blocks is shown.

Execution View

Use the Execution view drop-down list in the top left-hand corner of the Execution report to control what features you’d like to see on the graph. These include:

  • Runs : Toggle to display or hide the inclusion of program run information on the graph. When enabled this displays, just below the mini-map, a set of dark grey markers that indicate when each program run starts and ends. On the graph itself, a bar at the top of each IPU lane shows the names of each program that runs. See Defining program names .

  • Flat & Flame : These settings toggle between a flat view, where all the program steps are compressed into a single lane on the graph, or a flame view, where the call structure of all steps is displayed. Note that you can also control how overlapping steps are displayed with the Separate overlapping steps control, in the View Options control.

  • BSP : Include a graphical depiction of the BSP activity, showing where the patterns of the IPU internal Sync, Compute and Exchange steps occur.

The current combination of settings is displayed on the drop-down button itself.

Defining program names

You can specify program names in your programs by using the Poplar or PopART APIs. The Execution Trace graph can display this information by enabling the Runs option in the Execution View drop-down control above the graph.

  • The Poplar Engine::run API now takes a debug string for the name of the run.

  • The PopART Session::run API allows you to specify a string for the name of the run, as well as additional strings for internal program runs, for example: WeightsToHost .

If you enable the display of runs in the graph, but no run name was provided for a run, a sequentially numbered default name is generated, for example: Engine::run #5 .

Note

As well as the execution parameters displayed in the Summary report, you can also display run-specific execution parameters, as described next.

Run-specific execution parameters

Poplar has the ability to tune runtime-only parameters for a specific run, which may modify its behaviour (including, but not limited to, how fast it executes). You can view these parameters on the Summary report, but also view them for a specific run:

  • With the execution trace displayed, click on one of the named runs displayed at the top of the trace, and then click on the Details tab to see the parameters used to compile the programs for that run. If no name was supplied for a run, it will be called, for example, Engine::run #1 .

Filtering steps

You can concentrate on the steps you’re interested in by filtering on a particular search term.

  • Type a term into the Search box at the top and press Enter. Steps that don’t match in the execution trace graph are greyed-out.

  • Cycle through the matching steps by pressing Enter repeatedly, or clicking the arrows below the Search box. The single arrows move backwards and forwards through the matching steps, one at a time, and the double arrows move ten at a time.

  • To cancel the filtering, empty the search box by clicking on the small x at its right-hand end.

Summary tab

This tab provides an overview of the portion of the execution trace currently displayed in the graph in the top half of the page.

Statistics

The statistics displayed are:

  • Cycles : The number of cycles in the visible section of the execution trace. Note that a warning is displayed if this number overflows the 32-bit memory limit.

  • Rx / Tx : For StreamCopy and GlobalExchange steps, the amount of data transmitted and received during the operation.

Cycle proportions

A bar is displayed for each IPU that shows a graphical representation of the proportion of cycles that are taken executing each type of compute set. If you hover your mouse over these bars, you’ll see a key that shows what process each colour represents, as follows:

  • Internal Sync : The sync process that occurs between each tile on an IPU as part of the BSP process.

  • External Sync : The sync process that occurs between each IPU as part of the BSP process. External syncs are also used in some host-device communication situations, where the IPUs all need to synchronise with an event outside their boundaries, for example a flow control step in the host program.

  • OnTileExecute : The vertex compute code executed on each tile.

  • DoExchange : The tile-to-tile data exchange within an IPU.

  • GlobalExchange : An IPU-to-IPU data exchange.

  • StreamCopy : A data exchange between an IPU and the host machine over PCI.

Tile balance

A bar is displayed for each IPU that shows a graphical representation of the percentage of tiles utilised by steps in the current viewport.

It is calculated by averaging the tile balance for all Steps (excluding Syncs) that were executed on the IPU. In previous versions of the Graph Analyser this value was derived from the percentage of cycles executed by a step, weighted by the number of tiles the step used.
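The averaging described above can be sketched as follows; the step representation here (a type name and a per-step balance fraction) is an assumption for illustration, not the profile format:

```python
def tile_balance(steps):
    """Average the per-step tile balance of all non-Sync steps.

    Each step is (step_type, balance), where balance is the fraction of
    tiles active during that step, in the range 0.0 to 1.0.
    """
    non_sync = [b for (kind, b) in steps if "Sync" not in kind]
    return sum(non_sync) / len(non_sync) if non_sync else 0.0

tile_balance([
    ("OnTileExecute", 0.9),
    ("InternalSync", 1.0),   # excluded from the average
    ("DoExchange", 0.5),
])
# ≈ 0.7
```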

New BSP statistics

The standard execution trace statistics are calculated by taking into account which steps are currently in view and averaging values across those steps, weighting by the length of the step. This does not take into account that the tiles on each IPU do not in general all execute the same step at the same time.

The new BSP statistics are calculated based on BSP data and take into account what every tile is executing at every cycle. This can be slower to calculate but gives more accurate information about the cycle window currently in view.

When the New BSP statistics option is enabled, the Tile Balance is replaced with Tile Utilisation . This measures the proportion of tile-cycles currently in view which are spent on processing or transfer steps, for example OnTileExecute, DoExchange, GlobalExchange, StreamCopyMid, as opposed to idle states like Sync, SyncAns, and StreamCopyBegin.
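As a sketch of the Tile Utilisation calculation, the following divides busy tile-cycles (processing or transfer steps) by all tile-cycles in view; the classification of step types into busy and idle follows the examples given above, and the interval representation is an assumption for illustration:

```python
# Step types counted as processing or transfer; Sync, SyncAns and
# StreamCopyBegin count only toward the denominator.
BUSY = {"OnTileExecute", "DoExchange", "GlobalExchange", "StreamCopyMid"}

def tile_utilisation(intervals):
    """Fraction of tile-cycles in view spent on busy steps.

    Each interval is (step_type, tile_cycles).
    """
    total = sum(cycles for (_, cycles) in intervals)
    busy = sum(cycles for (kind, cycles) in intervals if kind in BUSY)
    return busy / total if total else 0.0

tile_utilisation([("OnTileExecute", 600), ("Sync", 200), ("DoExchange", 200)])
# → 0.8
```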

Details tab

When a program step is selected in the flat graph, or a layer is selected in the flame graph, a list of program steps, with further details, is shown here. Many of the details are the same across the different types:

  • Cycles : The number of cycles on active tiles that the program step used to execute,

  • Active Tiles : The number of tiles involved in executing that program step,

  • All Cycles : The number of cycles on all tiles, with additional statistics.

  • Tile Balance : A measure of how efficiently the program step is spread across the tiles. See View Options for more details.

  • Active Tile Balance : A recalculation of the tile balance measurement above, but excluding those tiles that do nothing.

Note

A warning is displayed if any of the cycle counts overflow the 32-bit memory limit.

Internal Sync

This is a sync process between tiles on an IPU.

External Sync

This is a sync process between IPUs.

SyncAns

This is an internal Automatic, Non-participatory Sync process. A tile can pre-acknowledge a number of internal/external syncs using the sans instruction. The Sync ANS instruction will wait until all those pre-acknowledged syncs actually happen.

OnTileExecute

This is a piece of vertex code being executed in a tile. In addition to the common information listed above, the following is displayed:

  • By Vertex Type : Shows what vertices are involved in the process execution.

Below these details, an interactive graph plot is displayed that shows how the selected program step makes use of cycles on each tile as it executes. For DoExchange programs, there is also a graph of the data received and transmitted by the program during its execution.

DoExchange

This is an exchange process, where data is exchanged between IPU tiles. In addition to the common information listed above, the following is displayed:

  • Total Data : The total amount of data transferred during the exchange,

  • Data Transmitted : The amount of data transmitted during the exchange,

  • Data Received : The amount of data received during the exchange,

  • Data Balance : The mean amount of data exchanged divided by the maximum amount of data exchanged,

  • Exchange Code : How large the variable is that holds the code for performing the exchange,

  • Source Variables : A truncated list of the variables from which data was sent in the exchange,

  • Destination Variables : A truncated list of the variables to which data was sent in the exchange.
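The Data Balance metric defined in the list above (the mean amount of data exchanged divided by the maximum) can be sketched directly; the per-tile byte counts here are invented for illustration:

```python
def data_balance(per_tile_bytes):
    """Mean data exchanged per tile, divided by the maximum exchanged.

    A value of 1.0 means every tile exchanged the same amount of data;
    lower values indicate the exchange was dominated by a few tiles.
    """
    peak = max(per_tile_bytes)
    mean = sum(per_tile_bytes) / len(per_tile_bytes)
    return mean / peak if peak else 0.0

data_balance([100, 50, 50])  # mean ≈ 66.7 bytes, max 100 bytes → ≈ 0.67
```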

GlobalExchange operations

GlobalExchange is the process by which data is exchanged between IPUs. In addition to the common information listed above, the following is displayed:

  • Total Data : The total amount of data transferred during the exchange,

  • Data Balance : The mean amount of data exchanged divided by the maximum amount of data exchanged,

  • Source Variables : A truncated list of the variables from which data was sent in the exchange (with temporary variables given basic integer names),

  • Destination Variables : A truncated list of the variables to which data was sent in the exchange (with temporary variables given basic integer names).

A tile’s physical location on an IPU, and how far it is away from the main exchange block, determines how quickly data can be moved between it and other tiles. Also, the highest-numbered tiles on an IPU are linked back directly to the lowest-numbered tiles in a ring-type topology. The combination of these two factors is what generates the typically triangular and curved shapes seen in these exchange graphs.

StreamCopy

This process copies data between tensors and streams, allowing data to be transferred between the IPUs and the host machine over PCI. The execution trace shows these program steps as three separate phases, StreamCopyBegin, the Copy itself (StreamCopyMid), and StreamCopyEnd. StreamCopyMid is further divided into Host, RemoteBuffer and Mixed categories to show the direction of data flow.

In addition to the common information listed above, the following is displayed:

  • Total Data : The total amount of data transferred during the exchange,

  • Data Balance : The mean amount of data exchanged divided by the maximum amount of data exchanged,

  • Copies from host : How many copy instructions transferred data from the host machine,

  • Copies to host : How many copy instructions transferred data to the host machine.

2.12.4. Lightweight Profiling

Lightweight profiling allows you to choose which steps in your program you want to profile, instead of profiling everything (which is the default when using Poplar’s fine-grained instrumentation). Lightweight profiling adds less overhead to your program when running, and also makes for smaller report profiles for PopVision to open.

Note

See the Lightweight Profiling tutorial in our GitHub tutorials repository .

The Lightweight Profiling graph

The LWP report shows a graph indicating when LWP Blocks executed and how long they took, in cycles. Blocks are shown for each tile that was profiled, and the tiles are grouped by IPU. Blocks can overlap, in which case they stack up as a flame graph. There may also be periods of time that are not accounted for by any Block; these periods appear as empty gaps on the graph.

  • Move around the main report graph using your mouse, as described in Adjusting report size.

  • Hover your mouse pointer over a block to see a tooltip containing the block name and how many cycles that block executed for.

Lightweight Profiling block types

Several different block types are seen in the lightweight profiling graph.

Common

These blocks correspond to the portions of a Poplar application which are instrumented. For each, we know the precise cycle count when it began and finished. You can manually add Block programs when using Poplar, or use the Poplar Engine options to automatically add Blocks.

Stream Copy

Stream Copies copy data between tensors and streams, allowing data to be transferred between the IPUs and the host machine over PCI. The execution trace shows these program steps as three separate phases, StreamCopyBegin, the Copy itself (StreamCopyMid), and StreamCopyEnd. StreamCopyMid is further divided into Host, RemoteBuffer and Mixed categories to show the direction of data flow.

Buffer Flush (or Block Flush)

On profiled tiles, a small buffer is used to collect the start and finish times of blocks. This buffer needs to be periodically flushed otherwise it will overflow. Poplar automatically determines appropriate times to flush the buffer and inserts buffer flush operations. When the Show Buffer Flushes option is enabled, these operations can be seen as blocks on the LWP graph.

Overflow

If the LWP profiling buffer is not flushed regularly enough it can overflow. When this occurs, no more data is recorded until the next flush occurs. The time between the buffer overflowing and the following buffer flush is referred to as an overflow region. Any block which both begins and ends in an overflow region will not be recorded and is not shown on the LWP graph. For blocks which start before an overflow but finish in the overflow region, the end time of the block will be unknown. For blocks which start in an overflow region but finish after the buffer flush, both the start time and the type of the block are unknown; these are shown as Unknown blocks.
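The four cases described above can be summarised in a small sketch that classifies a block against a single overflow region; the half-open boundary handling is an assumption for illustration:

```python
def block_visibility(start, end, ovf_start, ovf_end):
    """Classify what an LWP trace can show for a block, given one
    overflow region spanning cycles [ovf_start, ovf_end)."""
    def in_overflow(cycle):
        return ovf_start <= cycle < ovf_end

    if in_overflow(start) and in_overflow(end):
        return "not recorded"            # begins and ends inside the overflow
    if in_overflow(end):
        return "end time unknown"        # finishes inside the overflow
    if in_overflow(start):
        return "shown as Unknown block"  # starts inside, ends after the flush
    return "fully recorded"
```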

Unknown

As described above, if a block starts in an overflow region then the type of the block is not recorded. We still know the block exists because we record its finish, so the block is displayed as an Unknown block.

2.13. Application preferences

To display the Preferences dialog, select Preferences from the menu, or press the Ctrl/Command + , keys.

You can reset your preferences at any time by selecting Reset Preferences from the Help menu.

2.13.1. Setting the colour theme

The Graph Analyser supports light and dark colour themes. There are three options:

  • Auto : (default) Allows the application to follow your machine’s system-wide theme setting for light or dark mode. If the Graph Analyser application detects a change in your operating system theme, it automatically switches to the corresponding mode.

  • Light : Forces the Graph Analyser application into light mode, irrespective of your machine’s theme settings.

  • Dark : Forces the Graph Analyser application into dark mode, irrespective of your machine’s theme settings.

2.13.2. SSH preferences

You can store your SSH preferences in the Preferences dialog to allow authorisation when opening reports on remote machines. There are two settings you can enter here:

  • SSH private key path : Enter the file path of your machine’s private SSH key here. This file path will be used to authenticate you on remote machines during the connection process. The default path is <home>/.ssh/id_rsa , where <home> denotes your home directory in your operating system.

  • SSH agent mode : Drop-down-list to choose whether you want to specify an SSH agent socket path, and, if so, how you want to do so:

    • Disabled : Do not use an SSH agent socket (the default)

    • Manually specify : Enter file path to the SSH agent socket in the box that appears below this option.

    • Automatically obtain from environment : Obtain the SSH agent path from an environment variable.

2.13.4. Scroll behaviour

This option sets the default behaviour for your mouse’s scroll wheel (or using two-finger drag on a laptop trackpad). You can choose either:

  • Scroll by default where the mouse wheel will scroll the window content up and down. Holding down Ctrl/Command while using the scroll wheel then zooms the window content in and out.

  • Zoom by default where the mouse wheel zooms the window content in and out. Holding down Ctrl/Command while using the scroll wheel then scrolls the window content up and down.

2.13.6. Quit after last window is closed

This controls whether the Mac version of the application quits the program after the last window is closed.

2.13.7. Experimental features

Each version of the Graph Analyser contains some experimental features that are hidden by default. These features are not fully release-capable, and will have limited support and may change or be removed in future. You can enable them here.

2.13.8. Byte units

This allows you to choose which memory unit and prefix you’d like to use across the application. All figures denoting memory usage will then use that format, making it easier to compare figures across reports.

The options displayed are for a memory size of 1,024 bytes, so you can choose to display:

  • Binary multiples (1.0 KiB) : Display in base 2 format (1KiB = 1,024 bytes),

  • SI prefixes (1.0 kB) : Display in SI units (1kB = 1,000 bytes),

  • Bytes only (1,024 B) : Display in bytes only.

The only exception to this setting is the file browser, where SI prefixes are always used, in accordance with convention.
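The three formats can be illustrated with a small formatter applied to the same 1,024-byte figure used above; this is a sketch of the display convention, not the application's own code:

```python
def format_bytes(n, mode):
    """Format a byte count in one of the three preference modes."""
    if mode == "binary":
        return f"{n / 1024:.1f} KiB"   # 1 KiB = 1,024 bytes
    if mode == "si":
        return f"{n / 1000:.1f} kB"    # 1 kB = 1,000 bytes
    return f"{n:,} B"                  # bytes only

format_bytes(1024, "binary")  # → "1.0 KiB"
format_bytes(1024, "si")      # → "1.0 kB"
format_bytes(1024, "bytes")   # → "1,024 B"
```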

2.13.9. Show graph stats

You can display (or hide) statistics for the Memory and Liveness reports. They appear in the top right-hand corner of the graph and show the Average, Minimum, Maximum and Standard Deviation of the memory usage across the selected tiles, for each data set plotted.

You can move this statistics box anywhere in your graph by dragging its title bar at the top.

2.13.10. Stack graph values

  • When this option is turned on, the values shown in the tooltips on the memory graph are displayed stacked, and (stacked) is shown in the tooltip.

  • When it is turned off, the individual values are shown without stacking.

2.13.11. Send telemetry

When you first install one of the PopVision™ tools, you will be asked for your consent for Graphcore to collect telemetry information that helps improve the application and monitor performance. Your response to this dialog is stored in your preferences, and you can turn telemetry on or off with this option.

See Telemetry consent for full details.

2.13.12. Software update

You can choose to allow the Graph Analyser to periodically check for updates. Note that you also have the opportunity to set this preference on the EULA splash-screen, when it first appears.

If a new update is found, you can either download and install it now (a restart of the app will install it), or you can delay it until later. You can also choose to check manually by selecting Check For Updates… from the File menu.

Sometimes, network issues may cause a download to take too long to finish, in which case a dialog is displayed that allows you to cancel the current download, and retry later.

2.14. FAQs

This section contains a set of frequently asked questions about capturing and understanding reports in the Graph Analyser.

2.14.1. Not-always-live memory discrepancy

Question : Why does the tile memory differ on the Memory and Liveness reports? If you open a Memory report, and select the By Liveness breakdown option from the drop-down menu, and then select a particular tile, you can see its memory consumption plotted. If you then find that same tile in the Liveness report, you may notice that its memory consumption is lower. Why does this happen?

Answer : The Not-Always-Live plot on the Memory-Liveness report shows the maximum memory occupied by not-always-live variables at any one time. Because memory is statically allocated on the tile, and the allocation algorithm cannot always pack variables perfectly, this maximum can be lower than the amount of memory actually required to store your program.

As an example, suppose you have two variables A and B, both 1 byte, but B needs to be stored in interleaved memory. If you have a program like this:

Write(A)
Read(A)
Write(B)
Read(B)

then the two variables are never live at the same time, so in theory they could share the same memory location; however, the interleaved-memory constraint prevents this. In this case the maximum not-always-live memory is 1 byte, but the memory required (excluding gaps) is 2 bytes.
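The arithmetic behind the A/B example above can be sketched as follows. This is a simplified illustration, not Poplar's allocator; the interval representation and step numbering are assumptions made for the sketch:

```python
# Hypothetical illustration of the A/B example above (not Poplar code).
# Each variable is live from its Write step to its final Read step.
live_intervals = {"A": (0, 1), "B": (2, 3)}  # (first live step, last live step)
sizes = {"A": 1, "B": 1}                     # bytes

# Maximum not-always-live bytes: the most bytes live at any single step.
max_live = max(
    sum(sizes[v] for v, (start, end) in live_intervals.items() if start <= step <= end)
    for step in range(4)
)

# Actual allocation: A and B never overlap in time, but B's interleaved-memory
# constraint prevents the two variables from sharing an address, so both need
# their own storage.
allocated = sum(sizes.values())

print(max_live, allocated)  # 1 2
```

This is why the Liveness figure (here 1 byte) can be smaller than the memory the tile actually has to reserve (here 2 bytes).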

2.14.2. Using the Graph Analyser over X-Forwarding on MacOS

Question : How can I use the Graph Analyser over X-Forwarding on MacOS?

Answer : To view the Graph Analyser over X-forwarding on MacOS, follow these steps:

  1. On your MacOS machine, download and install XQuartz from https://www.xquartz.org/.

  2. Start the XQuartz app, and start a terminal session from within it.

  3. In the terminal, enter ssh -X [username]@[host] , supplying the username and host for the remote machine.

  4. In the SSH session, run the following commands (assuming you want to use version 3.7.2 of the Graph Analyser):

$ wget https://github.com/graphcore/popvision_graph_analyser/releases/download/v3.7.2/popvision-graph-analyser-3.7.2.AppImage
$ chmod +x ./popvision-graph-analyser-3.7.2.AppImage
$ ./popvision-graph-analyser-3.7.2.AppImage

The Graph Analyser application should then start up and work normally over X.

2.14.3. How can I reduce the size of my profile report files?

To reduce the size of your profile files, see the Profile troubleshooting section, where you can find tips on reducing instrumentation levels, reducing batch size, and reducing the number of steps being instrumented.

2.15. Glossary

2.15.1. Architecture

For more information, see the Poplar and PopLibs User Guide .

2.15.2. BSP

2.15.3. Bulk-synchronous parallel

For more information, see the IPU Programmer’s Guide .

2.15.4. Codelet

2.15.5. Compute set

For more information, see the IPU Programmer’s Guide .

2.15.6. Debug context

For more information, see the Poplar and PopLibs API Reference .

2.15.7. Edge

2.15.8. Exchange

External exchange

Global exchange

Host exchange

Inter-IPU exchange

2.15.9. Flame graph

A visualisation of hierarchical data that is often used to show sampled stack traces of a program that has been profiled.

2.15.10. IPU

2.15.11. Liveness

For more information, see the Memory and Performance Optimisation Guide .

Always-live

An always-live variable is allocated an exclusive memory region for the entire lifetime of the graph execution, such as vertex code.

Not-always-live

It is not necessary to keep the content of not-always-live variables in memory throughout execution. Therefore, two variables that are not live at the same time can be allocated to the same location.

2.15.12. Lowering

2.15.13. Lowered variable

To execute an application on the IPU, variables may need to be divided between tiles to satisfy memory constraints. Poplar maps global user-defined variables onto tiles by a process that is called lowering. During this process, Poplar typically translates each of these unlowered variables into a set of lowered variables which have specific tile allocations.

2.15.14. Memory

For more information, see the Poplar and PopLibs User Guide .

Memory bank

An area of memory that allows only a single concurrent access. A memory bank is 32 KiB in the Mk1 Colossus and 16 KiB in the Mk2 Colossus. Note that addresses are only contiguous in non-interleaved memory.

Memory element

In interleaved memory, an element is a pair of banks that allows access to 128 bits. In non-interleaved memory, it is a single bank that allows access to 64 bits.

Memory interference

Memory region

A tile’s memory is organised as two regions, each made up of banks. Concurrent accesses can be made to addresses in different banks.

Non-interleaved memory

Instructions can only be fetched from non-interleaved memory.

Interleaved memory

Interleaving allows for two 64-bit-aligned addresses to be accessed simultaneously.

Overflowed memory

If the memory required by an application exceeds the maximum memory available on a tile, part of it is overflowed.

Out-of-memory (OOM)

An application that is out of memory needs more memory than is available on one or more tiles and cannot be executed.

2.15.15. Pipelining

For more information, see the IPU Programmer’s Guide .

2.15.16. PopART

2.15.17. Poplar

2.15.18. PopLibs

2.15.19. Replication

For more information, see the Poplar and PopLibs User Guide and IPU Programmer’s Guide .

2.15.20. Sync

External sync

Synchronisation between IPUs.

Internal sync

Synchronisation between all of the tiles on a single IPU.

2.15.21. Tile

Active tile

A tile is active in the context of a program step if it is involved in executing that step.

Tile balance

A measure of the fraction of tiles that are involved in executing a program step.

Unlowered variable

See lowering .

2.15.22. Vertex

For more information, see the Poplar and PopLibs User Guide .

Vertex field

For more information, see the Poplar and PopLibs User Guide .

Vertex instance

An instance of a class that defines a vertex type.

Vertex source

The source code for a vertex, which can be written in C++ or assembly.

For more information, see the Poplar and PopLibs User Guide .

Vertex state

For more information, see the Poplar and PopLibs User Guide .

Vertex type

A class that defines a vertex in Poplar.

For more information, see the Poplar and PopLibs User Guide .

2.16. Release notes

To see what’s changed in the Graph Analyser application, select Release Notes from the Help menu, or click the What’s new since the last release link in the Help panel on the Home screen.

  • You can toggle the Release Notes dialog between modal and full-screen view by clicking on the icon to the left of the window’s title.

2.17. Licensing information

Licensing information about the Graph Analyser is available to read by selecting License from the Help menu. It contains an end-user agreement, copyright and trademark information, and license information about third-party software used in the application.

This information can also be found in the Installation README file, which you can find on the Graphcore Support site .

  • You can toggle the License dialog between modal and full-screen view by clicking on the icon to the left of the window’s title.

2.18. Data we collect

Graphcore’s PopVision™ tools collect data from you about the way in which you use them. The data we collect depends on how you interact with the tools and may change between releases. This data helps us to develop and improve the tools.

We do not obtain any personal data when you use the PopVision™ tools. On installation, we randomly generate a unique identifier to link your interactions with one of the tools together. This identifier is stored with your preferences for the tool and can be seen at any time in the About dialog. We also randomly generate an identifier each time you open the tool to distinguish between sessions of usage.

Note

We do not collect any data about the reports or models you analyse, such as the names of variables and events.