8. Viewing a Memory Report
The Memory Report shows a graphical representation of memory usage across all the tiles in your IPU system, showing graphs of total memory and liveness data, and details of variable types, placement and size.
There are two main areas of the Memory Report:
The Memory graph, in the top half of the window, shows different types of memory graph. Click on the Graph Type drop-down menu at the top left-hand corner of the graph to select a graph type:
Total Memory graph, which shows the memory usage of your program across all the IPU tiles. You can view a breakdown of this data by region (whether to display interleaved and non-interleaved memory separately) or by category (what the memory is used for).
Variables graph, which allows you to plot the memory usage of multiple individual variables.
Tile Map, which shows the memory usage of the tiles overlaid on a physical floor plan of the IPU.
The Tile Memory Usage report, in the bottom half of the screen, shows memory usage broken down by various categories, and memory maps of individual tiles.
You can choose various view options for each graph, and you can also click on the graph to view details for an individual tile.
8.2. Total Memory graph
This memory report shows the total memory usage across all the tiles on all IPUs.
On a Memory Report, select Total Memory from the Graph Type menu.
The horizontal axis shows the tile number, which you can order by software or physical ID (see Memory Report view options), and the vertical axis shows the memory usage.
8.2.1. Memory Report breakdown
Breakdown by Region
IPU memory has two different types of memory regions which Poplar allocates to data depending on how that data needs to be accessed:
non-interleaved: Consecutive words are stored in the same memory bank. Code must be stored here.
interleaved: Consecutive words are stored in alternating memory banks. Some high bandwidth load/store instructions like ld128 only work in interleaved memory, and therefore some codelets require variables they are connected to to be stored here. Code cannot be stored here.
overflowed: Memory that exceeds the maximum amount available on a tile.
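As a rough illustration of the two placement schemes described above, the following sketch maps word indices to memory banks. The bank count and the number of words per bank are hypothetical values chosen for readability; they are not actual IPU hardware parameters, and the functions are not part of any Poplar API.

```python
# Illustrative only: NUM_BANKS and WORDS_PER_BANK are hypothetical values,
# not real IPU hardware parameters.
NUM_BANKS = 2
WORDS_PER_BANK = 4


def non_interleaved_bank(word_index: int) -> int:
    """Consecutive words fill one bank before moving on to the next."""
    return word_index // WORDS_PER_BANK


def interleaved_bank(word_index: int) -> int:
    """Consecutive words alternate between the banks."""
    return word_index % NUM_BANKS


if __name__ == "__main__":
    words = range(8)
    print("non-interleaved:", [non_interleaved_bank(w) for w in words])
    print("interleaved:    ", [interleaved_bank(w) for w in words])
```

In this toy model, any two consecutive words land in different banks under the interleaved mapping, whereas under the non-interleaved mapping they usually share a bank.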
Breakdown by category
When you select Breakdown - By Category another drop-down list is displayed with all the available memory categories. This can be used to understand the overhead costs of instrumentation.
The Select Categories drop-down is multi-select so you can compare multiple categories simultaneously.
The total memory can be toggled on and off by selecting the All option.
Breakdown - By Category is also available when viewing memory by IPU.
Breakdown by liveness
This memory report shows the memory usage of the two types of program variables:
Always-Live Variables: These variables must be accessible for the entire lifetime of graph execution. This means nothing else can ever use the memory allocated for these variables. Examples include code and constants.
Max Not-Always-Live Variables: Not-always-live variables are only needed for some program steps. As long as two variables are not live at the same time they can be allocated in the same location, thus saving memory. This option shows the maximum amount of live memory use on each tile. See the FAQ for more details.
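The relationship between the two liveness figures can be sketched as follows. This is a simplified model, not how Poplar computes its report: it assumes each not-always-live variable is described by a hypothetical (size, first step, last step) tuple, and it ignores alignment, gaps and placement constraints.

```python
from typing import Dict, List, Tuple


def peak_tile_memory(
    always_live: List[int],
    not_always_live: List[Tuple[int, int, int]],
) -> Dict[str, int]:
    """Estimate liveness figures for one tile.

    always_live: sizes in bytes of always-live variables.
    not_always_live: (size_bytes, first_step, last_step) tuples, with the
    step range inclusive. These tuples are a hypothetical representation.
    """
    always = sum(always_live)
    last_step = max((end for _, _, end in not_always_live), default=-1)
    max_live = 0
    for step in range(last_step + 1):
        # Sum the sizes of all variables that are live at this program step.
        live = sum(
            size for size, start, end in not_always_live if start <= step <= end
        )
        max_live = max(max_live, live)
    return {
        "always_live": always,
        "max_not_always_live": max_live,
        "total": always + max_live,
    }


if __name__ == "__main__":
    report = peak_tile_memory(
        always_live=[1000, 200],
        not_always_live=[(400, 0, 1), (300, 1, 2), (500, 3, 4)],
    )
    print(report)
```

In the example, the 400-byte and 300-byte variables are both live at step 1, so the maximum not-always-live figure is 700 bytes even though the three temporaries total 1200 bytes.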
8.3. Variables Memory graph
This memory report allows you to select multiple variables and plot their memory usage across tiles.
On a Memory Report, select Variables from the Graph Type menu.
A prompt appears, suggesting you enter a variable to search for. Type the variable name into the search box at the top, and the application will find matching variables and display them in a drop-down list.
Select a variable from the list to plot on the graph.
Remove variables from the graph by clicking the small x icon in their names in the key legend, below the graph.
You can also access this feature from the Total Memory report by selecting a variable from the Variables tab, as described in Plot multiple variables.
The Variables Memory graph is not available when viewing memory by IPU.
8.4. Tile map Memory graph
This memory report displays a schematic of an IPU and overlays it with a coloured representation of the tile memory usage for every tile for the selected IPU. The colour key is displayed on the right, and its range can be changed, as described in Changing the colour scale.
The Tile Map Memory graph is not available when viewing memory by IPU.
On a Memory Report, select Tile Map from the Graph Type menu.
Select an IPU to view using the input on the left, and the map updates to show the memory usage for that IPU.
Hover the mouse over the tile map to see a popup of the details of a tile within the selected IPU. This shows the physical and software tile IDs, memory usage and rank (described in Changing the colour scale). While you hover, a black line within the colour key, to the right of the tile map, shows the memory usage of the hovered tile, according to the colour scale currently selected.
Click on a tile on the map to select it and see its memory usage. Its details are shown in the tabs and tables below, as in other memory reports. You can select multiple tiles by holding down Ctrl/Command while clicking on a tile. Details for each tile are displayed in the tables below, with a column for each tile. The selected tile numbers are displayed in the search box above the tile map, so you can enter them by hand if you know which one you’re looking for.
The Breakdown menu at the top of the tile map allows you to break down memory usage by region (see Memory Report breakdown for details). When breaking down by region, the Region control to the left of the map allows you to choose to display interleaved or non-interleaved memory.
Choose whether to include or exclude gaps in the tile map by using the Options menu at the top of the report.
The panel to the left of the tile map allows you to select which IPU to view, which variable category you’d like to view, and which colour scale to use.
Note that you can change the size of the tile map by dragging the split-screen control, and it will fill the space available in the top half of the screen.
8.5. Changing the colour scale
There are three methods of colouring the tiles on an IPU that show their memory usage in different ways. Use the Scale type drop-down to the left of the tile map to select one of these scales:
Relative: (default) The colour of a tile depends on where its memory usage falls between the lower and upper memory values.
Absolute: The colour of a tile depends on where its memory usage falls between zero and max memory.
Rank: The colour depends on a linear ordering of tiles based on their memory usage.
When your model is out of memory, the colours are scaled appropriately, not just to the max memory.
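The three scale types can be compared with a small sketch that converts per-tile memory usage into a normalised colour position between 0.0 and 1.0. The formulas are assumptions made for illustration: the "lower and upper memory values" are taken to be the minimum and maximum usage across tiles, and the out-of-memory case is modelled by widening the absolute range to the largest usage. The application's actual calculation may differ.

```python
from typing import List


def relative_scale(usage: List[int]) -> List[float]:
    """Position of each tile between the lowest and highest usage (assumed)."""
    lo, hi = min(usage), max(usage)
    span = hi - lo
    return [0.0 if span == 0 else (u - lo) / span for u in usage]


def absolute_scale(usage: List[int], max_memory: int) -> List[float]:
    """Position of each tile between zero and max memory.

    If a tile exceeds max_memory, the range is widened to the largest
    usage so that values stay within 0.0 to 1.0 (an assumption).
    """
    top = max(max_memory, max(usage))
    return [u / top for u in usage]


def rank_scale(usage: List[int]) -> List[float]:
    """Position of each tile in a linear ordering by usage (ties by index)."""
    order = sorted(range(len(usage)), key=lambda i: usage[i])
    ranks = [0] * len(usage)
    for rank, tile in enumerate(order):
        ranks[tile] = rank
    denom = max(len(usage) - 1, 1)
    return [r / denom for r in ranks]


if __name__ == "__main__":
    usage = [100, 200, 400]  # bytes used on three hypothetical tiles
    print(relative_scale(usage))
    print(absolute_scale(usage, max_memory=800))
    print(rank_scale(usage))
```

With these inputs the rank scale spreads the tiles evenly, while the absolute scale keeps all three in the lower half of the range because none of them uses more than half of the hypothetical 800-byte maximum.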
9. Tile memory usage
The bottom half of the Memory Report screen shows tabs that contain an analysis of memory usage by several different categories.
The default view shows memory usage for all tiles (or IPUs, if you are choosing to plot by IPU instead of tile), but you can select an individual tile/IPU as described in Selecting individual tiles/IPUs.
9.1. Tile memory usage: Details tab
The Details tab in the tile memory usage report displays a hierarchical list of memory usage by category on the selected tiles. This list is divided into three main sections:
Including Gaps: Shows memory usage on the selected tiles which includes the gaps between variables.
Excluding Gaps: Shows memory usage on the selected tiles which excludes the gaps between variables. It is split into interleaved and non-interleaved memory and also categorised by the type of data in that memory location.
Vertex Data: Shows the memory used by variables in the graph vertices as the Poplar program executes, categorised by the types mentioned in Excluding Gaps.
9.1.1. Excluding Gaps
Memory usage on the selected tiles is displayed here in two categories, with memory usage figures for each:
by Memory Region: Shows memory that is non-interleaved, memory that is interleaved, and any memory that has overflowed.
by Data Type: Shows memory further categorised by the type of data that is stored there (either overlapping data or non-overlapping data). The meaning of each of these categories is explained in the table below.
Not Overlapped data
This shows the usage for parts of the memory where only a single variable is allocated. This includes variables that cannot be overlapped with other variables (always-live variables), and also variables that just happen to be not overlapped with other variables, even though it isn’t disallowed.
Variables
These are the variables added using Graph::addVariable().
Internal Exchange Message Buffers
During the exchange phase of program execution, it may not be possible to send data straight to its destination. For example sending a single byte directly is impossible because internal exchange has a granularity of four bytes. In cases like this Poplar will copy the data to and from temporary variables using on-tile copies (which can copy individual bytes) and then do the actual exchange from these buffers.
Constants
These are variables that were created using Graph::addConstant().
Host Exchange Packet Headers
Host exchange is performed using a packet-based communication protocol. Each packet starts with a header that contains the address that its payload should be written to. These addresses are determined at compile time and the packet headers are stored in these variables.
Stack
This is where the program stack lives. It is created automatically during compilation. There is a single Stack variable on every tile that contains the stacks for the supervisor and worker threads. The stack size is configurable at compile time via a Poplar Engine option.
Vertex Instances
These store the state of each vertex instance. Each call to addVertex() adds a single vertex instance to the graph, whose size is equal to sizeof(TheCodeletClass).
Copy Descriptors
During compilation, Poplar will add compute sets that perform copies. These contain copy vertices, and the copy vertices reference additional data called Copy Descriptors that describe how to perform the copy.
VectorList Descriptors
A vector of pointers that points to the values of a multi-dimensional vector. The data for VectorList<T, DeltaN> fields.
Vertex Field Data
Variable-sized fields, for example the data for Vector<T> fields.
Control Code
All code that is not vertex code. This includes all the code that is generated from your Poplar Program tree, and code for each compute set that calls the codelet compute() functions for each vertex. It does not include the compute() functions themselves, which fall under the Vertex Code category.
Vertex Code
This is where the assembled code from the codelets is stored. A codelet is a class written in C++ or assembly, whereas a vertex is an instance of that class. Adding multiple instances of a single vertex type does not increase the amount of Vertex Code memory required.
Internal Exchange Code
The code instructions used to move data between tiles on an IPU.
Host Exchange Code
The code instructions used to move data between an IPU and the host machine.
Instrumentation Results
If you set the debug.instrument option in the Poplar Engine, this is where the cycle counts for various Poplar functions are stored. You’ll notice, therefore, that enabling instrumentation increases your memory usage. Different levels of instrumentation can be selected, which will use different amounts of memory. Note that the size of these variables is dependent on the level of dynamic branching in your program – if you’re timing every instance of a function call, the compiler won’t necessarily be able to tell in advance how much memory it will require to keep a cycle count for each of them.
Overlapped data
This is data for variables that are not always live, meaning that they are temporary and can be overlapped by other not-always-live variables if the two variables are not live at the same time. Reusing memory in this way reduces the amount that is required by Poplar programs. The sizes reported here count the memory used by the variables as if they were not overlapped. For example if two 4-byte variables are allocated in the same location it would be reported as 8 bytes here.
Program & Sync IDs
The Poplar Engine has a run() method to which you pass a vector of programs you want to execute. Each of these programs has an ID, so that you can specify which one to execute first. For example, when you call run(3) to run the fourth program, 3 indicates the program ID that is sent to the IPU so that it knows which program to run. Additionally, when control flow cannot be statically determined, the IPU must inform the host which control flow path it took, so that the host knows which data to send during host exchange. This is done by sending Sync IDs.
Data Rearrangement Buffers
Data connected to a vertex edge is guaranteed to be contiguous in memory, but the Poplar API allows you to connect non-contiguous tensors to edges. In this case Poplar will need to insert rearranging copies to temporary variables so that the data presented to the vertex is contiguous.
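The way sizes are counted for the Overlapped data entry in the table above can be sketched as follows: the reported figure sums variable sizes as if nothing overlapped, while the bytes actually occupied are the union of the address ranges. The (start address, size) tuples and function names are hypothetical and are not taken from any Poplar or report file format.

```python
from typing import List, Tuple

Variable = Tuple[int, int]  # (start_address, size_bytes), hypothetical format


def reported_size(variables: List[Variable]) -> int:
    """Sum of sizes, counting each variable as if it were not overlapped."""
    return sum(size for _, size in variables)


def occupied_bytes(variables: List[Variable]) -> int:
    """Number of distinct bytes covered by the union of all address ranges."""
    covered = 0
    current_end = None
    for start, size in sorted(variables):
        end = start + size
        if current_end is None or start >= current_end:
            # Range starts after everything seen so far: count it in full.
            covered += size
            current_end = end
        elif end > current_end:
            # Range partly overlaps: count only the new bytes.
            covered += end - current_end
            current_end = end
    return covered


if __name__ == "__main__":
    # Two 4-byte variables allocated at the same location.
    overlapped = [(0x100, 4), (0x100, 4)]
    print(reported_size(overlapped), occupied_bytes(overlapped))
```

For the two 4-byte variables in the example, the reported size is 8 bytes although only 4 bytes of tile memory are occupied.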
9.2. Tile memory usage: Compute Sets tab
This tab contains a table of the compute sets that appear on the selected tiles or IPUs or, if none are selected, all tiles or IPUs. The name and total size of memory for each compute set are listed in descending order of size.
Each row can be expanded or collapsed by clicking the chevron icon at its left-hand edge. When a compute set is expanded, a subsidiary table of its constituent vertices is displayed beneath the compute set row.
The vertices table shows the total size of memory for each vertex in the compute set, which is also shown in the Vertices tab because it is independent of the compute sets.
In the comparison view, the difference in the size of a compute set or vertex between the source and target reports is only displayed if it appears in both reports. Otherwise, the difference column is empty for the row.
For reports generated with Poplar SDK version 2.3 or later, the instance count for each vertex in the compute set is also shown. The count for each selected tile or IPU is specific to the compute set that is expanded.
Because vertices are shared, a vertex may have a non-zero size for a tile or IPU even if its instance count is zero. Similarly, when comparing reports, a compute set that appears in only one of the reports may be made up of vertices which nevertheless appear elsewhere in the other report.
The compute sets can be filtered by their name, or the names of their constituent vertices, by using the text input and drop-down button above the table.
9.3. Tile memory usage: Vertices tab
This tab in the tile memory usage report lists the memory used by the graph vertices, together with the total memory size they occupy across the selected tiles (or all tiles, if none is selected). This list is ordered by decreasing memory usage.
For each vertex, the Poplar namespace and function name are listed, together with any additional information about their types. Please refer to the Poplar API Reference for a description of each of these functions.
The origin of each vertex, in a small blue box, is displayed after the vertex name, indicating whether the vertex was written in C++ or Assembler (ASM).
You can filter the vertices by name, using the input box above the table, or by source (C++ or ASM) using the drop-down list.
9.4. Tile memory usage: Exchanges tab
This tab in the tile memory usage report displays the internal exchange code size for all tiles/IPUs, or the currently selected tiles/IPUs. When comparing reports, there is an additional column that shows the difference between the source and target code size.
Open up details on any of the exchanges by clicking on the small arrow next to it. If no tile is selected, the exchange information is grouped by name, and the total size of the exchange data is added up to give the total size in the column on the right for exchange variables with that name.
The FROM and TO labels indicate the direction of the exchanges, the names of the variables involved, and how much data was passed.
The UL and L tags show whether the variables are unlowered or lowered. Unlowered variables are created and lowered across several tiles, so that parts of them are mapped to other tile memory variables. Many of the Poplar operations create lowered variables directly: rather than creating a large variable and mapping it across the tiles, they create many small variables on each of the tiles and map out the exchanges that are required between them. There are no higher-level variables to reference, hence the need to differentiate between the two types.
Exchange information can also be seen on the Program Tree.
9.5. Tile memory usage: Variables tab
This tab in the tile memory usage report displays a memory map of the currently selected tile, showing code and variable usage across the memory locations. Note that this tab is not available when viewing memory by IPU.
Entries in this section of the report are only present if lowered variables were captured in the profile. See Report files for details about how to generate this when executing your program.
You can toggle between two different views (memory map and table view) on the Variables tab by clicking on the icon in the right-hand corner of the tab contents.
Select an individual tile as described in Selecting individual tiles/IPUs to view its memory layout and variable usage.
There are several interactive features of the Variables view that can help you find the locations in which variables are stored:
The selected tile’s memory is displayed vertically in a scrollable area that is 1024 bytes wide. The tile memory is partitioned into memory elements, which are either Interleaved or Non-Interleaved (see Memory Report breakdown for more information). Any unused elements at the end of the IPU memory are not displayed.
Variables are displayed as coloured bars which span the memory locations, and in places where two or more variables overlap, you can see all the variables at that location by hovering your mouse over the variable.
All variables in the memory layout are coloured according to their type. Click the colour key icon in the top right-hand corner to view the colour key for each type. The meaning of each of the categories displayed here is described in the table above. You can click on the checkboxes on the left-hand side of each variable to display it or hide it from the variable plot.
Click on a variable to display its details, which appear on the right-hand side of the memory layout. This displays all variables which exist at any time at that memory location.
Click on the Show button at the bottom of the variable details, beneath the Interference heading, to filter other variables that interfere with the selected variable in terms of memory placement. See Memory interference for details.
Search for a variable by entering its name into the input text box above the memory layout. If a variable name matches the search text, it is highlighted in the memory layout and all other variables disappear. You can clear any text you’ve entered here by hovering over the box and clicking the small x icon at the right-hand end.
Select various display options, as described below.
Plot one or more variables on the Memory graph, as described in Plot multiple variables.
You can expand the variables map to fill the report window by clicking on the full-screen button in the top right-hand corner of the map. A corresponding button to shrink the map again appears in the top-right corner.
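To relate a memory offset to a position in the 1024-byte-wide layout described in the list above, a sketch such as the following can help. It assumes offsets run left to right and then top to bottom, starting from zero at the top-left corner; that ordering is an assumption, not something the application documents.

```python
from typing import Tuple

ROW_WIDTH = 1024  # bytes per row of the memory map, as stated in the text


def map_position(offset: int) -> Tuple[int, int]:
    """Return the (row, column) of a zero-based byte offset (assumed layout)."""
    return divmod(offset, ROW_WIDTH)


def rows_spanned(offset: int, size: int) -> int:
    """Number of rows touched by a variable of at least one byte."""
    first_row = offset // ROW_WIDTH
    last_row = (offset + size - 1) // ROW_WIDTH
    return last_row - first_row + 1


if __name__ == "__main__":
    print(map_position(2050))
    print(rows_spanned(1000, 100))
```

Under this assumption, a 100-byte variable starting at offset 1000 crosses a row boundary and so appears as a bar split over two rows.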
9.5.1. Variable display options
The Options menu allows you to choose how to display variable information, but the menu options depend on whether you’re using table view or memory map view.
Choose to display memory addresses starting from zero, instead of the tile architecture’s base address, by selecting Start memory addresses from zero from the Options menu.
If you’re viewing the variables in table view, two other options are available:
If you select more than one tile from the graph, you can check the Show differences between selected tiles option to add further columns to the table, showing you how the selections differ from the first-selected tile.
Check the Show memory addresses option to display the memory address of each variable in the table.
9.5.2. Memory interference
You can see other variables which a selected variable interferes with. These are variables that are in contention in terms of their memory placement. There are three ways variables can interfere with each other:
Memory: A variable cannot occupy the same bytes as some other variables because it is live at the same time as them. Always-live variables interfere with every other variable in this way.
Element: A variable cannot be in the same memory element as another one. This can occur in some cases when two variables are connected to the same vertex, and it is reading from one and writing to another using certain instructions.
Region: A variable cannot be in the same memory region as another one.
To see which other variables interfere with a selected variable:
Select a variable from the memory map by clicking on it. Several variables may occupy that memory location, and their details are displayed in a list on the right-hand side.
Click on the Show button at the bottom of a variable’s details, beneath the Interference heading, and the variables in the memory map will be filtered using that variable’s name, showing only those that interfere with it.
To re-display all variables, click the small cross in the filter box at the top of the variable memory map display.
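The Memory type of interference described in this section can be sketched as a live-range overlap test. The Var class and its step fields are hypothetical, and Element and Region interference are not modelled because they depend on vertex connections and instruction constraints that are not described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Var:
    """Hypothetical description of a variable's live range (inclusive steps)."""
    name: str
    first_step: int
    last_step: int
    always_live: bool = False


def memory_interferes(a: Var, b: Var) -> bool:
    """True if the two variables cannot occupy the same bytes."""
    if a.always_live or b.always_live:
        # Always-live variables interfere with every other variable.
        return True
    # Otherwise they interfere only if their live ranges overlap.
    return a.first_step <= b.last_step and b.first_step <= a.last_step


def interfering_with(selected: Var, others: List[Var]) -> List[str]:
    """Names of the variables that interfere with the selected one."""
    return [v.name for v in others if v != selected and memory_interferes(selected, v)]


if __name__ == "__main__":
    code = Var("code", 0, 0, always_live=True)
    tmp_a = Var("tmp_a", 0, 2)
    tmp_b = Var("tmp_b", 2, 4)
    tmp_c = Var("tmp_c", 3, 5)
    print(interfering_with(tmp_a, [code, tmp_a, tmp_b, tmp_c]))
```

In the example, tmp_a and tmp_c are never live at the same step, so they could share a memory location, whereas tmp_a interferes with both the always-live variable and tmp_b.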
9.6. Variable types
Variables in the memory layout diagram are categorised by colour. Note that more detailed descriptions are available in the Excluding Gaps table.
User variables: These are user-defined variables.
Variable: Variables created using the Graph::addVariable() function.
Constant: Variables created using the Graph::addConstant() function.
Code variables: These are variables that are created by Poplar to execute the program code.
Control Table: Experimental, empty by default.
Control Code: Variables used by Poplar to run the program. Some specific control code variables are described in Known variables.
Vertex Code: The code within the vertex codelets.
Internal Exchange Code: The compiled code used to move data between tiles on an IPU.
Host Exchange Code: The compiled code used to move data over PCI between an IPU and the host machine.
Global Exchange Code: The compiled code used to move data between multiple IPUs.
Vertex data variables: These are variables associated with tensors.
Vertex Instance State: The internal state of each vertex instance.
Copy Descriptor: Additional metadata used for copy vertices.
Vector List Descriptor: Data for VectorList<T, deltaN> fields.
Vertex Field Data: Data for Vector<T> fields.
Temporary variables: These are used to temporarily store information that Poplar uses while executing the program code. They are not-always-live, overlapping variables.
Message: Temporary storage for internal exchange (between tiles on one IPU).
Host Message: Temporary storage for host exchange (between an IPU and the host).
Global Message: Temporary storage for global exchange (between IPUs).
Rearrangement: Variables used to store intermediate values when an edge is connected to non-contiguous variables.
Output Edge: Temporary output variable used when a vertex is connected to a variable on a different tile. The data is copied by internal exchange after the compute set has been executed. Note that this category is also used for Input Edges. It should really be named Input/Output Edge.
Miscellaneous variables: Other variables that don’t fit into the categories above.
Multiple: Sometimes variables are merged during lowering. If they came from two different categories the resultant variable is put into this category.
Control ID: The combination of program ID, sync IDs and software sync counter.
Host Exchange Packet Header: Header information for PCI messages between the IPUs and the host machine.
Global Exchange Packet Header: Header information for PCI messages between IPUs.
Stack: Thread stacks for each tile.
Instrumentation Results: The cycle counts if instrumentation is enabled.
9.7. Known variables
Poplar uses some specific variables that you may encounter on various tiles. Their purpose is:
.text.poplar_start: This is the entrypoint and main control code for your program. It’s roughly equivalent to main() in a C program.
.text.supervisor.control__func[…]: These are the control codes for the functions in your compiled program. These are the functions you can see on the Program Tree report.
.text.supervisor.control_initPrngSeed: The control code to initialise the seed for the pseudo-random number generator.
9.8. Plot multiple variables
When a variable is selected in the Variables tab, its details are displayed in the right-hand column next to the memory map for that tile. You can then plot that variable’s position in memory on the main graph, as follows:
With the variable selected, click on the Plot variable button. The Graph Type menu at the top changes to Variables. The variable name is added to the variable list above the report, and you can see how it is placed across the tile memory.
Click other variables in the memory map, and you can repeat the process above, adding them to the variable list at the top of the report, and displaying their memory placement together on the same graph. The default behaviour shows the size of the variable on each tile. If you select Plot variable by address from the Options menu at the top of the screen, you can see how the variable is laid out in the memory space.
To remove a variable from the graph, find its name in the list above the graph, then click the small x button at the right-hand end.
9.9. Full-screen option
When viewing the content of the Variables tab, it may be easier to view the data in full-screen mode. You can toggle this option on and off using the button in the top right-hand corner of the tab.
9.10. Toggle between table and graph view
The Variables tab has a table view, listing the variable names and their sizes, and also a graph view which provides more in-depth detail. You can toggle between these by using the chart/text button in the button group in the top right-hand corner of the tab.
9.11. Show differences between selected tiles
When multiple tiles are selected the difference between values is displayed in red or green text on the table view. You can remove variables that have the same value by enabling the Show differences between selected tiles option. This is accessible by clicking on the cog icon and checking the box in the drop-down menu. This filters out all variables that have the same value, and leaves only those that are different.
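The filtering behaviour described above can be sketched as follows. The table layout (a mapping from variable name to one size per selected tile) and the function names are hypothetical; the differences are taken relative to the first-selected tile, as described in Variable display options.

```python
from typing import Dict, List


def differing_rows(table: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Keep only the variables whose value is not the same on every tile."""
    return {name: sizes for name, sizes in table.items() if len(set(sizes)) > 1}


def differences_from_first(sizes: List[int]) -> List[int]:
    """Difference of each later tile's value from the first-selected tile."""
    return [size - sizes[0] for size in sizes[1:]]


if __name__ == "__main__":
    # Sizes in bytes on three selected tiles (hypothetical data).
    table = {
        "weights": [4096, 4096, 4096],
        "activations": [1024, 2048, 512],
    }
    for name, sizes in differing_rows(table).items():
        print(name, sizes, differences_from_first(sizes))
```

Here the "weights" row is filtered out because it has the same size on every selected tile, leaving only "activations" with its per-tile differences.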
9.12. Show base address offset
Each IPU version has a different address where available memory starts on its tiles.
For the Mk1 IPU, this is 0x40000, and for the Mk2 it is 0x4C000.
Selecting this menu option resets the base address to that of the IPU version you’re using.
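Converting between the two address displays is a simple offset calculation, sketched below using the base addresses quoted above. The function names and the "Mk1"/"Mk2" dictionary keys are hypothetical and are not part of the application or of Poplar.

```python
# Base addresses where available tile memory starts, as quoted in the text.
BASE_ADDRESS = {"Mk1": 0x40000, "Mk2": 0x4C000}


def to_zero_based(address: int, ipu: str) -> int:
    """Convert an absolute tile address to an offset starting from zero."""
    base = BASE_ADDRESS[ipu]
    if address < base:
        raise ValueError(f"address {address:#x} is below the {ipu} base {base:#x}")
    return address - base


def to_absolute(offset: int, ipu: str) -> int:
    """Convert a zero-based offset back to an absolute tile address."""
    return BASE_ADDRESS[ipu] + offset


if __name__ == "__main__":
    print(hex(to_zero_based(0x4C010, "Mk2")))
    print(hex(to_absolute(0x10, "Mk1")))
```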