3. System Analyser

Version: 1.3.6

3.1. Overview

The Graphcore PopVision System Analyser is a desktop tool for analysing the execution of IPU-targeted software on your host system processors. It shows an interactive timeline visualisation of the execution steps involved, helping you to identify any bottlenecks between the CPUs and IPUs. This is particularly useful when you are scaling models to run on multiple CPUs and IPUs.

Used in combination with the PopVision Graph Analyser application, the System Analyser allows you to identify exactly how long execution events take to run, from the main program itself all the way down to individual IPU execution steps.

Poplar - and the machine learning frameworks it supports such as PopART, TensorFlow and PyTorch - use the Poplar libpvti library to capture profiling information from your code. This information is saved to a file which you can then open and analyse in the System Analyser application. User APIs in C++ and Python are available to instrument your own application code.

The System Analyser app requires Poplar SDK 1.4 or later.

3.2. Capturing execution information

To capture execution data from your program to a file, set the PVTI_OPTIONS environment variable when executing your program. It is specified at the same time as POPLAR_ENGINE_OPTIONS:

PVTI_OPTIONS='{"enable":"true"}'

Additional options include:

  • directory - to specify where the .pvti file will be saved,

  • channels - to specify a list of channels to capture from, as described in Using the libpvti API, below.

.pvti files are streamed to disk as the program executes, so capturing them should not have a significant effect on your host system’s memory (although it may affect the speed of execution).

The pvti.hpp file, in the libpvti library in Poplar, gives more details.
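If you launch your programs from a Python script, a small helper can assemble the PVTI_OPTIONS value for you. This is a minimal sketch, not part of libpvti: the helper name make_pvti_env is hypothetical, the option names come from the list above, and encoding channels as a JSON list is an assumption, so check pvti.hpp for the exact format.

```python
import json
import os


def make_pvti_env(directory=None, channels=None, base_env=None):
    """Return a copy of the environment with PVTI_OPTIONS set.

    Hypothetical helper: the option names come from this guide, but encoding
    `channels` as a JSON list is an assumption; check pvti.hpp for the format.
    """
    options = {"enable": "true"}
    if directory is not None:
        options["directory"] = directory  # where the .pvti file is saved
    if channels is not None:
        options["channels"] = list(channels)  # channels to capture from
    env = dict(os.environ if base_env is None else base_env)
    env["PVTI_OPTIONS"] = json.dumps(options)
    return env


# Example: save .pvti files under ./reports and capture only "MyChannel".
env = make_pvti_env(directory="./reports", channels=["MyChannel"])
print(env["PVTI_OPTIONS"])
# Pass env to subprocess.run([...], env=env) to launch your program with it.
```

Building the value with json.dumps avoids shell-quoting mistakes with the nested double quotes.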

3.2.1. Capturing function entry and exit

For most basic cases, you can place a POPLAR_TRACEPOINT macro inside a function to capture the timings of its entry and exit. For example:

void Engine::prepareForStreamAccess(StreamAndIndex stream) const {
  POPLAR_TRACEPOINT();
  const DataStream &streamInfo = getStreamInfo(stream.id);
  logging::engine::debug("prepareForStreamAccess {} \"{}\" (id {}, index {})",
                         isHostToDevice(streamInfo.type) ? "host write of"
                                                         : "host read of",
                         streamInfo.handle, stream.id, stream.index);
}

This captures the name of the method and, through object construction and destruction, records the entry and exit times.

Similar macros are available for PopART (POPART_TRACEPOINT) and TensorFlow (TENSORFLOW_TRACEPOINT).
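The scoped-object technique can be illustrated in Python, where a context manager plays the role of the object's constructor and destructor. The sketch below is hypothetical and does not use libpvti: scoped_tracepoint and the in-memory events list stand in for the real macro and the .pvti file. Recording the exit in a finally clause mirrors how a C++ destructor still runs when an exception leaves the function.

```python
import time
from contextlib import contextmanager

events = []  # (name, start_ns, end_ns) tuples; stands in for the trace file


@contextmanager
def scoped_tracepoint(name):
    """Record entry and exit times of a block, like a scoped C++ tracepoint object.

    Entry is recorded when the block starts (the "constructor"); exit is
    recorded in `finally` (the "destructor"), so it is captured even if the
    block raises.
    """
    start = time.perf_counter_ns()
    try:
        yield
    finally:
        events.append((name, start, time.perf_counter_ns()))


def prepare_for_stream_access():
    with scoped_tracepoint("prepare_for_stream_access"):
        sum(range(1000))  # placeholder for real work
```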

3.2.2. Using the libpvti API

The API allows you to ‘mark’ the beginning and end of a ‘section of code’. This is not limited to functions: markers can be placed anywhere in your code. More information on the instrumentation API can be found in the Poplar libpvti.hpp header file documentation & README.

To use the API to indicate the beginning and end of trace events, you must first create a channel, as the example below shows. In C++:

// Create channel
pvti::TraceChannel channel = {"MyChannel"};

// Functional implementation
void foo() {
  pvti::Tracepoint::begin(channel, "foo");
  // ... code to be traced ...
  pvti::Tracepoint::end(channel, "foo");
}

// Scoped tracepoint object: the end is recorded when tp goes out of scope
void bar() {
  pvti::Tracepoint tp(channel, "bar");
  // ... code to be traced ...
}

And in Python:

import libpvti as pvti
channel = pvti.createTraceChannel("MyChannel")

# Functional implementation
def foo():
  pvti.Tracepoint.begin(channel, "foo")
  ...
  pvti.Tracepoint.end(channel, "foo")

# context manager implementation
def bar():
  with pvti.Tracepoint(channel, "bar"):
    ...

# wrapped object
class Bob:
  def somemethod(self):
    ...

bob = Bob()
pvti.instrument(bob, ["somemethod"], channel)

# decorator (later)
@pvti_instrument()
def cat():
  ...
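Because every begin must be followed by a matching end, the resulting events nest like a call stack, which is how they end up layered in the timeline flamegraph. The sketch below matches begin and end records with a stack to recover each event's start, duration and nesting depth. The record layout and the pair_events name are assumptions for illustration; the "B" and "E" phase letters are borrowed from the Chromium trace event format rather than from the .pvti file format.

```python
def pair_events(records):
    """Match begin ("B") and end ("E") records into completed events.

    records: iterable of (phase, name, timestamp) in the order they were logged.
    Returns a list of (name, start, duration, depth) in the order events end.
    """
    stack = []  # open events: (name, start)
    spans = []
    for phase, name, ts in records:
        if phase == "B":
            stack.append((name, ts))
        elif phase == "E":
            if not stack:
                raise ValueError(f"end of {name!r} has no matching begin")
            open_name, start = stack.pop()
            if open_name != name:
                raise ValueError(f"end of {name!r} closes begin of {open_name!r}")
            # Depth is the number of events still open around this one.
            spans.append((name, start, ts - start, len(stack)))
        else:
            raise ValueError(f"unknown phase {phase!r}")
    if stack:
        raise ValueError(f"{len(stack)} event(s) never ended")
    return spans


records = [("B", "foo", 0), ("B", "bar", 10), ("E", "bar", 30), ("E", "foo", 50)]
print(pair_events(records))  # [('bar', 10, 20, 1), ('foo', 0, 50, 0)]
```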

3.2.3. Capturing scalar values

The API also allows you to capture scalar values over time. You can use it to log any scalar value; for example, you could capture host memory usage or CPU load. More information on the instrumentation API can be found in the Poplar libpvti.hpp header file documentation & README.

To use the API to capture scalar values, you must first create a graph (with units) and then a series against that graph, as the example below shows. In C++:

// Create graph
pvti::Graph graph("MyGraph", "%");

// Create series
auto series1 = graph.addSeries("series1");
auto series2 = graph.addSeries("series2");

// Capture values
series1.add(0.1);
series2.add(0.3);

And in Python:

import libpvti as pvti
# Create graph
graph = pvti.Graph("MyGraph", "%")

# Create series
series1 = graph.addSeries("series1")
series2 = graph.addSeries("series2")

# Capture values
series1.add(0.1)
series2.add(0.3)

You can capture default driver monitoring information by setting the environment variable GCDA_MONITOR=1.
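Scalar values such as host memory usage are usually sampled at a regular interval. Below is a minimal sketch of a background sampler, assuming only that the series object has the add method shown above. The names sample_periodically and start_sampler, and the read_value callable, are hypothetical, and whether libpvti series may be updated from a background thread is an assumption you should confirm in pvti.hpp.

```python
import threading


def sample_periodically(series, read_value, interval_s, stop_event):
    """Call series.add(read_value()) every interval_s seconds until stopped.

    `series` is any object with an add(value) method, such as the series
    objects created above; `read_value` is a callable you supply.
    """
    while not stop_event.is_set():
        series.add(read_value())
        stop_event.wait(interval_s)  # sleeps, but wakes early if stopped


def start_sampler(series, read_value, interval_s=0.5):
    """Start sampling on a daemon thread; call stop.set() to end it."""
    stop = threading.Event()
    thread = threading.Thread(
        target=sample_periodically,
        args=(series, read_value, interval_s, stop),
        daemon=True,
    )
    thread.start()
    return stop, thread
```

For example, start_sampler(series1, read_host_memory_percent) would record a value every half second, where read_host_memory_percent is a hypothetical function you write yourself. Call stop.set() when you have finished sampling.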

3.3. Opening reports

After starting the System Analyser, you’re presented with a ‘landing page’ from which you can open reports and view various topics within this online help. You can open report files on your local machine, or from a remote server over SSH.

3.3.1. Local reports

You can open report files stored on your local machine as described below.

To open a local report on your machine:

  • Click on the Open Report link. You’ll be presented with a file selection dialog, and the ‘local’ tab at the top will be selected by default. You’ll see listings of the directories and files on your local machine.

  • You can sort these files by name, modified date or size, in ascending or descending order, by clicking on the appropriate column header.

  • The System Analyser application can open .pvti files and .json files that support the Chromium trace format. When the PopVision System Analyser identifies a directory in which any .pvti files are found, those files are listed on the right-hand side.

  • After navigating to the desired directory, you can select a file by clicking on it. Multiple files can be selected at once.

  • Once you’ve selected the file(s) you wish to open, click on the ‘Open’ button to load the report data from the file(s).

If you’ve previously opened a report, it will appear in the Recent list of report files. Click on one to open it again.

While the report file(s) load into the application, you’ll see a ‘Loading’ progress bar, then the main view is displayed, as detailed below.
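Because the System Analyser also opens .json files in the Chromium trace format, you can write timings from other tools into that format and view them alongside your .pvti files. This sketch uses "complete" events (ph set to "X"), whose ts and dur fields are in microseconds under the Chromium trace event format. The helper name is hypothetical, and which optional fields the System Analyser displays is not covered here.

```python
import json


def to_chromium_trace(spans, pid=1, tid=1):
    """Convert (name, start_us, duration_us) tuples to Chromium trace JSON text.

    Each span becomes a "complete" event (ph="X"); ts and dur are in
    microseconds, as the Chromium trace event format specifies.
    """
    events = [
        {"name": name, "ph": "X", "ts": start, "dur": dur, "pid": pid, "tid": tid}
        for name, start, dur in spans
    ]
    return json.dumps({"traceEvents": events})


trace_text = to_chromium_trace([("foo", 0, 50), ("bar", 10, 20)])
# with open("my_trace.json", "w") as f:
#     f.write(trace_text)
```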

3.3.2. Remote reports

If you are using an IPU system on a remote server, for example on a cloud service, any reports generated will be saved to that server, so you cannot open them ‘locally’. You can, however, open them remotely by specifying the server address, and connecting to the machine over SSH. The reports are analysed and the output is streamed back to the PopVision System Analyser application on your local machine, allowing you to view the reports.

When the PopVision System Analyser opens report files on a remote machine, it downloads a small binary app to it which pre-processes the report data and sends it back over SSH to the PopVision System Analyser application running on your local machine. If you’re running other performance-critical processes on that remote machine, you should be aware of any effects this process may have on the capacity of the remote machine’s hardware to run any other tasks. Because server performance varies a great deal, the only reliable way to find out how much processing capacity it needs is to open a small sample report and monitor the CPU usage.

To open a remote report on another machine:

  • Click on the Open Report link. You’ll be presented with a file selection dialog, and the ‘local’ tab at the top will be selected by default.

  • Click on the ‘remote’ tab at the top, and you’ll see a login dialog that allows you to connect to a remote server. Enter your username, and the address of the remote machine.

  • If you just want to log in with a password for the remote machine, enter it in the Password field.

  • Alternatively, you can use your local machine’s SSH key to authorise your connection. Enter its file path in the Preferences dialog.

  • Once logged in, you’ll see listings of the directories and files on the remote machine. You can sort these files by name, modified date or size, in ascending or descending order, by clicking on the appropriate column header.

  • The System Analyser application can open .pvti files and .json files that support the Chromium trace format. Multiple files can be opened together from this dialog by clicking to select all the files you want to open. You’ll notice that when the PopVision System Analyser identifies a directory in which any .pvti files are found, those files are listed on the right-hand side.

  • Once you’ve selected the file(s) you wish to open, click on the ‘Open’ button to load the report data from the file(s).

If you’ve previously opened a report, it will appear in the Recent list of report files. Click on one to open it again.

While the report file(s) load into the application, you’ll see a ‘Loading’ progress bar, then the main view is displayed, as detailed below.

The System Analyser does not currently support encrypted SSH private keys, that is, keys that are protected by a passphrase. It does, however, support SSH agents. If your key is passphrase-protected, add it to your SSH agent with the ssh-add command-line tool, and make sure ‘SSH Agent mode’ is set correctly in the Preferences, before the PopVision System Analyser tries to use it.

To configure the SSH agent, run the following from a terminal:

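# Start the ssh-agent in the background.
eval "$(ssh-agent -s)"

# Add your SSH private key to the ssh-agent.
# On macOS, -K also stores the passphrase in your keychain (recent macOS
# releases use --apple-use-keychain instead). On Linux, omit -K, which
# OpenSSH uses for a different purpose.
ssh-add -K ~/.ssh/id_rsa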

Then edit your Preferences to remove your SSH private key path. Make sure that SSH agent mode is set to “Automatically obtain ssh-agent socket path from environment”.

3.4. Viewing reports

The application’s main view displays the Timeline information of the execution events recorded in the files you opened, together with a scaled-down overview above, which shows the entire set of events irrespective of your current zoom and pan state.

In the main window, the following actions are available:

  • Pan and zoom in and out of the timeline, viewing every event in the execution of the program.

  • Select individual events to view their details, and view the durations of selected sections of the timeline.

  • Save report images to your machine.

3.4.1. Using the sidebar buttons

Once a report has been opened, the application sidebar is displayed, which contains several buttons that allow you to perform the following actions:

  • Reload the report (this option is also available from the application’s View menu, or by pressing Control/Command and R).

  • Close the current report(s).

  • Open this documentation window.

  • Expand or contract the sidebar button labels.

3.4.2. Timeline flamegraphs

Events in the timeline are grouped and layered according to the file where they originated, and beneath that to the process and thread in which they occurred. If there’s room to display it, the event’s name is shown within each event block. Hovering your mouse over an event block displays a pop-up that shows the details of the event, as described below.

The number of events within the currently displayed portion of the timeline is displayed in the top left-hand corner, as well as the duration of that portion. You’ll see these numbers change as you pan and zoom around the timeline.
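The event count depends on which events are treated as visible. Here is a minimal sketch, assuming an event counts as soon as any part of it overlaps the displayed window (the application's exact rule is not documented here); visible_summary is a hypothetical name.

```python
def visible_summary(spans, view_start, view_end):
    """Count events overlapping [view_start, view_end) and return the view duration.

    spans: iterable of (start, duration) pairs in the same time unit as the view.
    """
    count = sum(
        1 for start, duration in spans
        if start < view_end and start + duration > view_start
    )
    return count, view_end - view_start
```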

The time scale is displayed across the top of the timeline, showing elapsed time from the start of the first event. It is displayed in hours, minutes and seconds, and more decimal places of the seconds are displayed as you zoom in. You can choose to display the time scale in relative terms (starting at 0:00 at the beginning of the first captured event), or in absolute terms, where a real time is displayed. Choose the display you want by clicking the Options button at the top of the application window.
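Relative time is measured from the start of the first captured event. The sketch below formats an integer timestamp in microseconds as hours, minutes and seconds relative to a chosen origin, with a configurable number of decimal places standing in for the extra digits shown as you zoom in. The function name and the microsecond input unit are assumptions; decimals are truncated rather than rounded.

```python
def format_elapsed(ts_us, origin_us=0, digits=3):
    """Format an integer timestamp in microseconds as H:MM:SS with `digits` decimals.

    Set origin_us to the start of the first captured event to get the
    relative time shown on the time scale.
    """
    elapsed = ts_us - origin_us
    sign = "-" if elapsed < 0 else ""
    hours, rem = divmod(abs(elapsed), 3_600_000_000)
    minutes, rem = divmod(rem, 60_000_000)
    seconds, micros = divmod(rem, 1_000_000)
    text = f"{sign}{hours}:{minutes:02d}:{seconds:02d}"
    digits = max(0, min(digits, 6))  # microseconds give at most 6 decimals
    if digits:
        text += "." + f"{micros:06d}"[:digits]
    return text
```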

Once you’ve opened a report, you can open additional reports to display on the same timeline. Just click the Add file button at the top-left of the application window, and add files as above.

Events in the timeline are coloured as follows:

  • Poplar - events triggered from the Poplar libraries are coloured red.

  • Framework - events triggered from the PopART libraries, or any of the machine learning frameworks such as TensorFlow, are coloured orange.

  • Driver - events triggered by the driver layer are coloured blue.

  • Others - user-generated event categories are each assigned a different colour.

3.4.3. Timeline line graphs

If any scalar values have been captured, the timeline also shows a collection of line graphs grouped at the bottom of each file. You can use these to view the scalar values over time and compare them with the events shown in the flamegraphs above.

Hovering over a line graph will display a tooltip giving each separate series’ value at that timestamp.
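How a value is chosen between two samples is not specified here. The sketch below assumes a sample-and-hold rule, where each series reports its most recent value at or before the hovered timestamp, and uses the standard bisect module to find it. The value_at and tooltip names are hypothetical.

```python
from bisect import bisect_right


def value_at(samples, ts):
    """Most recent sample value at or before ts, or None before the first sample.

    samples: list of (timestamp, value) pairs sorted by timestamp.
    """
    times = [t for t, _ in samples]
    i = bisect_right(times, ts)
    return samples[i - 1][1] if i else None


def tooltip(series, ts):
    """Map each series name to its value at ts, as a hover tooltip might."""
    return {name: value_at(samples, ts) for name, samples in series.items()}
```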

3.4.4. Timeline options

Above the timeline overview there are the following buttons:

  • Add file - this reopens the file browsing dialog, allowing you to add more files to the current timeline as above.

  • Graph Type - this allows you to switch between graph display types:

    • Timeline - displays a flamegraph showing the position and duration of each event in the call stack. Each event on the timeline view corresponds directly to an event logged in the PVTI file.

    • Aggregated - displays a flamegraph showing each event aggregated by its unique call stack. The starting timestamp at which a block appears in this view has no link to the timestamp that its events took place at within the trace. However, its duration does correspond to the total duration of the events that combine to make the block.

  • Options - this shows the following options to apply to your timeline:

    • Absolute timing - checking this option will position files on your timeline using their absolute times rather than positioning all files to start at time 0.

    • Collapse all charts - checking this option collapses all charts in your timeline but leaves parent nodes in their current state. Unchecking it expands all charts in your timeline.

    • Hide charts below threshold duration - checking this option will hide all charts in your timeline with a total duration (sum of all top-level event durations) less than or equal to the threshold percentage of the total time-window of the file. You can specify the threshold percentage using the radio buttons below the option.

  • Key - this allows you to see which channels the different colours represent on your timeline. Clicking one of these will reload the timeline without that channel shown.

  • Save (camera icon) - this allows you to save the current timeline as a PNG image, either to a location on your machine or to your clipboard.
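The Aggregated graph type and the threshold option both come down to simple arithmetic over event durations. The sketch below shows one way to compute them, under stated assumptions: events are given as (call stack, duration) pairs, where the call stack is a tuple of names from the outermost call to the event, and a chart is hidden when its top-level total is less than or equal to the chosen percentage of the file's time window, as described above. The function names are hypothetical.

```python
from collections import defaultdict


def aggregate_by_stack(events):
    """Sum durations of events that share the same call stack.

    events: iterable of (stack, duration), where stack is a tuple of event
    names from the outermost call down to the event itself.
    """
    totals = defaultdict(int)
    for stack, duration in events:
        totals[tuple(stack)] += duration
    return dict(totals)


def visible_charts(charts, window, threshold_pct):
    """Return the names of charts that stay visible under the threshold option.

    charts maps a chart name to the durations of its top-level events; a chart
    is hidden when their sum is <= threshold_pct percent of the file's window.
    """
    # Multiply instead of dividing to avoid rounding in threshold_pct / 100.
    return [name for name, durations in charts.items()
            if 100 * sum(durations) > threshold_pct * window]
```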

3.4.5. Panning and zooming

You can move around the timeline using the mouse to bring events into view:

  • In the overview at the top of the timeline, click and drag a section to zoom into the corresponding area of the timeline. The section of the timeline you’re currently viewing is highlighted in the overview.

  • Drag the mouse left and right in the timeline to shift it left and right at the current zoom level.

  • Use the mouse wheel to zoom in and out of the timeline. If the timeline is too deep to fit into the application window, a scrollbar is displayed to enable you to move the timeline up and down in the window. As the mouse wheel is used for zooming in and out, you can hold down the Control key to scroll the timeline up and down using the mouse wheel.

You can switch between window and full-screen display by selecting Toggle Full screen from the View menu, or by pressing Control/Command and F.

3.4.6. Selecting events

You can select any individual event in the timeline by clicking on it. Tabs giving further event information are then displayed beneath the timeline.

  • Details:

    • Name - for events dispatched through one of the libraries (Poplar, PopART, TensorFlow, etc.) this is a concatenated list of the namespaces from which the event originated. For user-created functions (for example, in your Python programs), it is the name of the function.

    • Channel - the channel on which this event was recorded. This can be one of the predefined channels (Drivers, Poplar, Framework) or a user-created channel.

    • Timestamp - this is the execution time at which the event occurred in the timeline, measured in hours, minutes and seconds, to microsecond precision.

    • Absolute Timestamp - the same as the timestamp, but as an absolute time rather than relative to the start of the first captured event.

    • Duration - the amount of time the event took to execute.

  • Call Tree:

    The call tree shows the events that are descendants of the selected event in a tree structure. You can expand to see the children of an event by clicking the caret on the left. Percentages given in the call tree are a percentage of the total time of the selected event. Hovering on a node in the call tree will highlight the respective events in the timeline above.

    • Self Time - this is the total time of an event minus the time taken by its children.

    • Total Time - this is the total time that the event took.

    • Activity - this is the name of the event.

    The call tree also has the following options that apply to it:

    • Aggregated - when checked, this option aggregates all nodes with the same event name at the same level, summing their times and combining their children under one node.
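The call tree columns can be reproduced from a simple tree of events. In this sketch, self time is the event's total time minus its children's total times, percentages are relative to the selected event, and aggregation merges same-named siblings by summing their times and pooling their children, then repeats at each level below. The Node class and the helper names are hypothetical, not part of the application.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    total: float  # total time of the event
    children: list = field(default_factory=list)

    @property
    def self_time(self):
        """Total time minus the time taken by the children."""
        return self.total - sum(child.total for child in self.children)


def percent_of(node, selected):
    """Percentage of the selected event's total time taken by node."""
    return 100.0 * node.total / selected.total


def aggregate(node):
    """Merge same-named children, summing times and pooling their children."""
    merged = {}
    for child in node.children:
        if child.name in merged:
            target = merged[child.name]
            target.total += child.total
            target.children.extend(child.children)
        else:
            # Copy so the original tree is left unchanged.
            merged[child.name] = Node(child.name, child.total, list(child.children))
    return Node(node.name, node.total, [aggregate(c) for c in merged.values()])
```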

3.4.7. Expanding and collapsing regions

You will notice that your timeline has a tree-like structure of nodes on the left-hand side. Each node can be individually expanded and collapsed by clicking the caret on the left of the node label. You will also notice nodes with an ellipsis button. This gives an options menu with the following options:

  • Collapse all other charts - this will expand the node you have selected and then collapse all other charts on the timeline, allowing you to focus in on this one node’s data.

Collapsed chart nodes will still show the top level events in that node against the collapsed line. This can be a useful way to gain an understanding of what is happening across multiple threads.

3.4.8. Viewing selected duration

You can view a duration on the timeline by holding down the Shift key and dragging the mouse from side to side anywhere in the timeline. This displays a timing duration marker at the top of the timeline, showing you the duration of the timeline you’ve selected. The selected duration is highlighted in pink in the overview.

3.4.9. Saving reports

To save the currently displayed portion of the timeline as a PNG image file:

  1. Click on the Save button in the top right-hand corner of the main screen.

  2. Your system’s file browser dialog appears. Select the directory in which you want the image file saved.

  3. Click Save.

3.5. Application preferences

3.5.1. Colour theme

The PopVision System Analyser supports light and dark colour themes, and you can select a preference here. There are three options:

  • Auto - this is the default setting, and allows the application to follow your machine’s system-wide theme setting for light or dark mode.

  • Light - this forces the application into light mode, irrespective of your machine’s theme settings.

  • Dark - this forces the application into dark mode, irrespective of your machine’s theme settings.

3.5.2. SSH preferences

You can store your SSH preferences in the Preferences dialog to allow authorisation when opening reports on remote machines.

  • SSH private key path - enter the file path of your machine’s private SSH key here. This file path is used to authenticate you on remote machines during the connection process. The default path is ~/.ssh/id_rsa.

  • SSH agent mode - this dropdown-list allows you to choose whether you want to specify an ssh-agent socket path, and, if so, how you want to do so:

    • Disabled - do not use an ssh-agent socket.

    • Manually specify - enter the file path to the ssh-agent socket in the field that appears below this option.

    • Automatically obtain from environment - obtain the ssh-agent path from an environment variable.
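The three agent modes can be summarised as a small decision function. This is a sketch of the behaviour described above, not the application's code; it assumes that the automatic mode reads the standard SSH_AUTH_SOCK variable, which is the variable that eval "$(ssh-agent -s)" exports, and the mode strings are hypothetical labels.

```python
import os


def resolve_agent_socket(mode, manual_path=None, environ=None):
    """Return the ssh-agent socket path for a given agent mode, or None.

    Assumes "environment" mode reads SSH_AUTH_SOCK, the variable that
    `eval "$(ssh-agent -s)"` exports.
    """
    environ = os.environ if environ is None else environ
    if mode == "disabled":
        return None  # do not use an ssh-agent socket
    if mode == "manual":
        if not manual_path:
            raise ValueError("manual mode needs a socket path")
        return manual_path
    if mode == "environment":
        return environ.get("SSH_AUTH_SOCK")
    raise ValueError(f"unknown mode {mode!r}")
```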

3.5.3. Experimental features

Each version of the PopVision System Analyser contains some experimental features that are hidden by default. These features are not yet ready for full release: they have limited support and may change or be removed in future. You can enable them here by toggling the button next to this option.

3.6. About System Analyser

To see the details of the System Analyser application, select About PopVision System Analyser from the application’s main menu. A dialog window appears showing:

  • Version - the version number of the application.

  • Commit - the unique commit hash of this release version.

  • Date - the date and time this version was released.

  • Component version numbers - the version numbers of the main software components used by the application, including Electron, Node, Chrome and the V8 engine.