4. Execution profile

The execution profile contains information about the programs that have been run since the execution profile was last reset. Because the profiling data varies for different target types and profiling methods, the entire object is a tagged union.

4.1. Generating the report

After you have run the program one or more times you can get dynamic profiling information (what code was run, cycle counts, and so on).

You can save the profiling information to a file for use by the Graph Analyser. For example:

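#include <fstream>

// engine is assumed to be a poplar::Engine that has already run the program.
poplar::ProfileValue executionProfile = engine.getExecutionProfile();
// Write the profile as JSON so that it can be loaded by other tools.
std::ofstream executionFile("execution.json");
poplar::serializeToJSON(executionFile, executionProfile);
executionFile.close();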

4.2. Contents of the report

4.2.1. Profiler mode

The profilerMode is the tag for this object. It can be one of the following:

  • NONE

  • CPU

  • IPU_MODEL

  • COMPUTE_SETS

  • SINGLE_TILE_COMPUTE_SETS

  • VERTICES

  • EXTERNAL_EXCHANGES

  • HOST_EXCHANGES

The execution profile has the following fields, some of which are present only for certain modes.

COMPUTE_SETS

  • computeSetCyclesByTile: A 2D array indexed by compute set id, then tile, that gives the total number of cycles taken to execute that compute set on that tile.

SINGLE_TILE_COMPUTE_SETS

  • computeSetCycles: A 1D array indexed by compute set ID that gives the total number of cycles taken to execute that compute set on all tiles. For this mode an internal sync is inserted before and after the compute set.

VERTICES

  • vertexCycles: A 1D array indexed by vertex ID that contains the number of cycles each vertex took the last time it was run.

  • vertexComputeSet: A 1D array indexed by vertex ID giving the compute set the vertex is in.

  • vertexType: A 1D array indexed by vertex ID giving an index into the list of vertex types.

EXTERNAL_EXCHANGES

  • externalExchangeCycles: A 2D array indexed by external exchange ID, and then tile, that gives the number of cycles used for each external (that is, from one IPU to another) exchange on each tile.

HOST_EXCHANGES

  • hostExchangeCycles: This is the same as externalExchangeCycles but for host<->IPU exchanges.

Additionally, for all modes except NONE and CPU, the profile contains program trace and simulation information.
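
As an illustration of the "compute set ID, then tile" indexing used by computeSetCyclesByTile, the following minimal sketch finds the slowest tile for each compute set. It assumes the array has already been loaded into nested vectors; the CyclesByTile alias and the slowestTilePerComputeSet function are hypothetical names used only for this example and are not part of the Poplar API.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical in-memory form of computeSetCyclesByTile:
// outer index = compute set ID, inner index = tile.
using CyclesByTile = std::vector<std::vector<std::uint64_t>>;

// For each compute set, return the cycle count of its slowest tile.
std::vector<std::uint64_t>
slowestTilePerComputeSet(const CyclesByTile &byTile) {
  std::vector<std::uint64_t> result;
  result.reserve(byTile.size());
  for (const auto &tiles : byTile) {
    // Assumption: a compute set with no tile entries is reported as 0.
    result.push_back(
        tiles.empty() ? 0 : *std::max_element(tiles.begin(), tiles.end()));
  }
  return result;
}
```

For example, if compute set 0 takes 10 and 5 cycles on two tiles, the first element of the result is 10.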

4.2.2. Program trace information

  • programTrace is a 1D array of the IDs of the programs that were run. These are indexes into programs in the graph profile.

4.2.3. Simulation information

  • simulation has a list of execution steps based on the simulation of the programs that are listed in programTrace. This information is redundant. It is calculated entirely from the graph profile and the programTrace but it is included for convenience.

The fields of simulation are as follows.

  • cycles is the total number of cycles it took to execute all of the programs in programTrace.

  • tileCycles is the number of cycles spent doing each kind of activity. Unlike cycles, this counts cycles from different tiles as distinct. That is, if two tiles both do a computation that takes 10 cycles in parallel, then cycles will be 10, but tileCycles.compute will be 20. activeCompute counts compute cycles where the active thread is computing, and compute counts compute cycles where the active thread or any of the other threads is computing.

"tileCycles":{
  "activeCompute":1349,
  "compute":8094,
  "copySharedStructure":0,
  "doExchange":2070,
  "globalExchange":0,
  "streamCopy":16,
  "sync":26238
}

  • steps lists the compute, sync and exchange steps that are run. Each entry is a tagged union based on the type field, which may be one of:

    • OnTileExecute

    • StreamCopy

    • CopySharedStructure

    • Sync

    • DoExchange

    • GlobalExchange

When running on actual hardware, the simulation uses computeSetCycles or computeSetCyclesByTile for the compute set cycles. If hardware cycles are not available (for example, under IPU_MODEL) then cycle estimates are used.
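
The difference between cycles and tileCycles.compute described above can be sketched as follows. This is a simplified model that assumes every tile only computes, in parallel, with no sync or exchange; the CycleTotals struct and the summarise function are hypothetical names for illustration only.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

struct CycleTotals {
  std::uint64_t cycles;      // elapsed cycles: the longest-running tile
  std::uint64_t tileCompute; // compute cycles summed over all tiles
};

// perTileComputeCycles[i] is the number of cycles tile i spends computing.
CycleTotals summarise(const std::vector<std::uint64_t> &perTileComputeCycles) {
  CycleTotals totals{0, 0};
  if (perTileComputeCycles.empty())
    return totals;
  // Tiles run in parallel, so elapsed time is set by the slowest tile.
  totals.cycles = *std::max_element(perTileComputeCycles.begin(),
                                    perTileComputeCycles.end());
  // Per-tile accounting counts every tile's cycles separately.
  totals.tileCompute = std::accumulate(perTileComputeCycles.begin(),
                                       perTileComputeCycles.end(),
                                       std::uint64_t{0});
  return totals;
}
```

With two tiles that each compute for 10 cycles, this gives cycles of 10 and tileCompute of 20, matching the example in the description of tileCycles.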

The other fields in each step depend on its type. Sync only contains the sync type, which is either External or Internal:

{
  "syncType":"External",
  "type":"sync"
}

All other types contain the following fields:

  • type: The step type as described above.

  • program: The program ID for this step (an index into programs).

  • name: This field may be present if the program has a name. If the program has no name this field is omitted.

  • tileBalance: A fraction from 0 to 1 that indicates how evenly computation was balanced between the tiles. It is calculated as the total number of compute cycles used / (cycles * numTiles). If all tiles take the same number of cycles to finish, this will be 1.0. If, for example, you have one tile that takes 10 cycles and one that takes 5, this will be 15 / (10 * 2) = 0.75.

  • activeTiles: The number of tiles that are computing (or exchanging for exchanges).

  • activeTileBalance: The same as tileBalance but it ignores completely idle tiles.

  • cycles: The number of cycles taken by the longest running tile. Because OnTileExecute calls can overlap with each other and with exchanges this may be non-zero even if the execution doesn’t actually take any extra time.

  • cyclesFrom: The first cycle number where this program was executing on any tile.

  • cyclesTo: The last cycle number where this program was executing on any tile.
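
The tileBalance and activeTileBalance calculations described above can be sketched as follows. This is a minimal illustration that assumes the per-tile cycle counts for a step are available as a vector; the function names are hypothetical, and returning 0.0 when no tile does any work is an assumption of this sketch rather than documented behaviour.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// tileBalance = total cycles used / (cycles * numTiles),
// where cycles is the count for the longest-running tile.
double tileBalance(const std::vector<std::uint64_t> &cyclesPerTile) {
  if (cyclesPerTile.empty())
    return 0.0; // assumption: no tiles means no balance to report
  const std::uint64_t longest =
      *std::max_element(cyclesPerTile.begin(), cyclesPerTile.end());
  if (longest == 0)
    return 0.0; // assumption: avoids dividing by zero when all tiles are idle
  const std::uint64_t total = std::accumulate(
      cyclesPerTile.begin(), cyclesPerTile.end(), std::uint64_t{0});
  return static_cast<double>(total) /
         (static_cast<double>(longest) * cyclesPerTile.size());
}

// The same calculation, ignoring tiles that are completely idle.
double activeTileBalance(const std::vector<std::uint64_t> &cyclesPerTile) {
  std::vector<std::uint64_t> active;
  std::copy_if(cyclesPerTile.begin(), cyclesPerTile.end(),
               std::back_inserter(active),
               [](std::uint64_t c) { return c > 0; });
  return tileBalance(active);
}
```

For per-tile cycle counts of 10, 5, 0 and 0, tileBalance is 15 / (10 * 4) = 0.375, while activeTileBalance considers only the two busy tiles and gives 0.75.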

The exchange types (DoExchange, StreamCopy, GlobalExchange and CopySharedStructure) also contain these fields:

  • totalData: The total amount of data transferred during the exchange.

  • dataBalance: Exactly like tileBalance but for the amount of data sent and received by each tile, instead of cycles.

OnTileExecute also contains these fields:

  • threadBalance: Similar in concept to tileBalance except it measures how well-utilised the hardware threads are. If you always run 6 threads or 0 threads this will be 1.0 even if the total computation on each tile takes a different amount of time.

  • computeSet: The ID of the compute set executed by this step.

DoExchange, GlobalExchange and StreamCopy each contain a field that is an index into the corresponding exchange list. The field is called exchange, externalExchange or hostExchange respectively.

Finally, OnTileExecute, DoExchange and CopySharedStructure contain this field:

  • cyclesOverlapped: How many cycles were overlapped with previous steps.