3. System Analyser

Version: 1.0.5

3.1. Overview

The Graphcore PopVision System Analyser is a desktop tool for analysing the execution of IPU-targeted software on your host system processors. It shows an interactive timeline visualisation of the execution steps involved, helping you to identify any bottlenecks between the CPUs and IPUs. This is particularly useful when you are scaling models to run on multiple CPUs and IPUs.

Used in combination with the PopVision Graph Analyser application, the System Analyser allows you to identify exactly how long execution events take to run, from the main program itself all the way down to individual IPU execution steps.

Poplar, and the machine learning frameworks it supports such as PopART, TensorFlow and PyTorch, use the Poplar libpvti library to capture profiling information from your code. This information is saved to a file which you can then open and analyse in the System Analyser application. User APIs in C++ and Python are available to instrument your own application code.

The System Analyser app requires Poplar SDK 1.4 or later.

3.2. Capturing execution information

To capture execution data from your program to a file, use the following options when executing your program. These are specified at the same time as the POPLAR_ENGINE_OPTIONS:

{`PVTI_OPTIONS='{"enable":"true"}'`}

Additional options include:

  • directory - to specify where the .pvti file will be saved,

  • channels - to specify a list of channels to capture from, as described in Using the libpvti API, below.

.pvti files are streamed to disk as the program executes, so capturing should not have a significant effect on your host system’s memory (although it may affect the speed of execution).

The pvti.hpp file, in the libpvti library in Poplar, gives more details.
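As a minimal sketch, the options above can also be assembled in Python and exported before the instrumented program starts. The helper name build_pvti_options and the directory "./pvti_reports" are hypothetical, and the sketch assumes that PVTI_OPTIONS is read from the process environment when Poplar (or a framework built on it) initialises, so it should be set before those libraries are loaded.

```python
import json
import os


def build_pvti_options(directory=None):
    """Return a JSON string suitable for the PVTI_OPTIONS environment variable."""
    options = {"enable": "true"}  # same value as the example above
    if directory is not None:
        options["directory"] = directory  # where the .pvti file will be saved
    return json.dumps(options)


# "./pvti_reports" is a hypothetical output directory.
# Assumption: this must run before Poplar or a framework using it is imported.
os.environ["PVTI_OPTIONS"] = build_pvti_options("./pvti_reports")
```

Building the value with json.dumps avoids the shell-quoting mistakes that are easy to make when writing the JSON by hand.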

3.2.1. Capturing function entry and exit

For most basic cases, there are POPLAR_TRACEPOINT macros that can be used inside your program to capture the timings of function entry and exit. For example:

{`void Engine::prepareForStreamAccess(StreamAndIndex stream) const {
  // Records entry now and exit when the enclosing scope ends
  POPLAR_TRACEPOINT();
  const DataStream &streamInfo = getStreamInfo(stream.id);
  logging::engine::debug("prepareForStreamAccess {} \\"{}\\" (id {}, index {})",
                         isHostToDevice(streamInfo.type) ? "host write of"
                                                         : "host read of",
                         streamInfo.handle, stream.id, stream.index);
}`}

This will capture the name of the method, and, using object construction and destruction, record the entry and exit time.

Similar macros are available for PopART (POPART_TRACEPOINT) and TensorFlow (TENSORFLOW_TRACEPOINT).
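The construct-on-entry, destruct-on-exit pattern described above can be illustrated in plain Python with a context manager. This sketch does not use libpvti or the tracepoint macros. The names scoped_timer and events are hypothetical, and the in-memory list stands in for the .pvti file that the real library streams to disk.

```python
import time
from contextlib import contextmanager

events = []  # hypothetical in-memory sink, not part of libpvti


@contextmanager
def scoped_timer(name, sink=events):
    """Record (name, entry_time, exit_time) for the enclosed block."""
    entry = time.perf_counter()
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions, much like a C++ destructor.
        sink.append((name, entry, time.perf_counter()))


def prepare_for_stream_access():
    with scoped_timer("prepare_for_stream_access"):
        time.sleep(0.01)  # stand-in for real work


prepare_for_stream_access()
```

Because the exit time is recorded in a finally clause, the event is still captured when the function leaves early or raises.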

3.2.2. Using the libpvti API

The API allows you to ‘mark’ the beginning and end of a ‘section of code’. This is not limited to functions: markers can be placed anywhere in your code. More information on the instrumentation API can be found in the Poplar libpvti.hpp header file documentation and README.

To use the API to indicate the beginning and end of trace events, you must first create a channel, as the example below shows. In C++:

{`// Create channel
pvti::TraceChannel channel = {"MyChannel"};

// Functional implementation
void foo() {
  pvti::Tracepoint::begin(channel, "foo");
  ...
  pvti::Tracepoint::end(channel, "foo");
}

// Scoped tracepoint object
void bar() {
  pvti::Tracepoint tp(channel, "bar");
  ...
}`}

And in Python:

{`import libpvti as pvti
channel = pvti.createTraceChannel("MyChannel")

# Functional implementation
def foo():
  pvti.Tracepoint.begin(channel, "foo")
  ...
  pvti.Tracepoint.end(channel, "foo")

# context manager implementation
def bar():
  with pvti.Tracepoint(channel, "bar"):
    ...

# wrapped object
class Bob:
  def somemethod(self):
    ...

bob = Bob()
pvti.instrument(bob, ["somemethod"], channel)

# decorator (later)
@pvti_instrument()
def cat():
  ...`}

3.3. Opening reports

After starting the System Analyser, you’re presented with a ‘landing page’ from which you can open reports and view various topics within this online help.

To open a report:

  • Click on the Open Report link and select a local file from the dialog box that opens. The System Analyser application can open .pvti files and .json files that support the Chromium trace format. Multiple files can be opened together from this dialog by shift-clicking to select all the files you want to open. Note that file size is limited to 2 GB.

  • If you’ve previously opened a report, it will appear in the Recent list of report files – simply click on it to re-open.

While the report file(s) load into the application, you’ll see a ‘Loading’ progress bar, then the main view is displayed, showing the timeline, as described below.
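For reference, a small .json file in the Chromium trace format mentioned above can be produced as in the following sketch. It uses "complete" events (phase "X") with timestamps and durations in microseconds, as defined by the Chromium Trace Event format. The event names, the helper functions and the output file name are hypothetical, and it is an assumption that the System Analyser displays these particular fields.

```python
import json
import os
import tempfile


def complete_event(name, start_us, duration_us, pid=1, tid=1):
    """Build one Trace Event 'complete' event; times are in microseconds."""
    return {"name": name, "ph": "X", "ts": start_us, "dur": duration_us,
            "pid": pid, "tid": tid}


def write_trace(path, trace_events):
    """Write the events using the JSON object form of the trace format."""
    with open(path, "w") as f:
        json.dump({"traceEvents": trace_events}, f)


# Hypothetical events: two consecutive steps on the same process and thread.
trace_events = [
    complete_event("load_data", 0, 1500),
    complete_event("train_step", 1500, 4200),
]
trace_path = os.path.join(tempfile.gettempdir(), "example_trace.json")
write_trace(trace_path, trace_events)
```

The resulting file can then be selected in the Open Report dialog like any other report.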

3.4. Viewing reports

The application’s main view displays the Timeline information of the execution events recorded in the files you opened, together with a scaled-down overview above, which shows the entire set of events irrespective of your current zoom and pan state.

In the main window, the following actions are available:

  • Pan and zoom in and out of the timeline, viewing every event in the execution of the program.

  • Select individual events to view their details, and view the durations of selected sections of the timeline.

  • Save report images to your machine.

3.4.1. Using the sidebar buttons

Once a report has been opened, the application sidebar is displayed, which contains several buttons that allow you to perform the following actions:

  • Reload the report (this option is also available from the application’s View menu, or by pressing Control/Command and R).

  • Close the current report(s),

  • Open this documentation window,

  • Expand or contract the sidebar button labels.

3.4.2. Timeline information

Events in the timeline are grouped and layered according to the file where they originated, and beneath that to the process and thread in which they occurred. If there’s room to display it, the event’s name is shown within each event block. Hovering your mouse over an event block displays a pop-up that shows the details of the event, as described below.

The number of events within the currently displayed portion of the timeline is displayed in the top left-hand corner, as well as the duration of that portion. You’ll see these numbers change as you pan and zoom around the timeline.

The time scale is displayed across the top of the timeline, showing elapsed time from the start of the first event. This is displayed in hours, minutes and seconds, and more significant digits of the seconds are displayed as you zoom in. You can choose to display the time scale in relative terms (starting at 0:00 at the beginning of the first captured event), or in absolute terms, where a real time is displayed. Choose the display you want by clicking the Options button at the top of the application window.

Once you’ve opened a report, you can open additional reports to display on the same timeline. Just click the Add file button at the top-left of the application window, and add files as described in Opening reports, above.

Events in the timeline are coloured as follows:

  • Poplar events - events triggered from the Poplar libraries are coloured orange.

  • Framework events - events triggered from the PopART libraries, or any of the machine learning frameworks such as TensorFlow, are coloured blue.

  • Driver events - events triggered by the driver layer are coloured green.

  • Program events - user-generated program names are coloured red.

3.4.3. Panning and zooming

You can move around the timeline using the mouse to bring events into view:

  • In the overview at the top of the timeline, click and drag a section to zoom into the corresponding area of the timeline. The section of the timeline you’re currently viewing is highlighted in the overview.

  • Drag the mouse left and right in the timeline to shift it left and right at the current zoom level.

  • Use the mouse wheel to zoom in and out of the timeline. If the timeline is too deep to fit into the application window, a scrollbar is displayed to enable you to move the timeline up and down in the window. As the mouse wheel is used for zooming in and out, you can hold down the Control key to scroll the timeline up and down using the mouse wheel.

You can switch between window and full-screen display by selecting Toggle Full screen from the View menu, or by pressing Control/Command and F.

3.4.4. Selecting events

You can select any individual event in the timeline by clicking on it. The event details are then displayed beneath the timeline, showing:

  • Name - for events dispatched through one of the libraries (Poplar, PopART, TensorFlow, etc.) this is a concatenated list of the namespaces from which the event originated. For user-created functions (for example, in your Python programs), it is the name of the function.

  • Timestamp - this is the execution time at which the event occurred in the timeline, measured in hours, minutes and microseconds.

  • Duration - the amount of time the event took to execute.

The same information is displayed in the pop-up box when you hover your mouse over an event.

3.4.5. Viewing selected duration

You can view a duration on the timeline by holding down the Shift key and dragging the mouse from side to side anywhere in the timeline. This displays a timing duration marker at the top of the timeline, showing you the duration of the timeline you’ve selected. The selected duration is highlighted in pink in the overview.

3.4.6. Saving reports

To save the currently displayed portion of the timeline as a PNG image file:

  1. Click on the Save button in the top right-hand corner of the main screen.

  2. Your system’s file browser dialog appears. Select the directory in which you want the image file saved.

  3. Click Save.

3.5. About System Analyser

To see the details of the System Analyser application, select About PopVision System Analyser from the application’s main menu. A dialog window appears showing:

  • Version - the version number of the application.

  • Commit - the unique commit hash of this release version.

  • Date - the date and time this version was released.

  • Component version numbers - the version numbers of the main software components used by the application, including Electron, Node, Chrome and the V8 engine.