10. gc-info

gc-info is a multi-function tool that can display various kinds of information about the available IPUs.

10.1. Commands

gc-info takes a command parameter to specify the type of information required. The most useful commands are listed below.

Note

The --list-devices, --device-info, --ipu-arch, --tile-clock-speed and --ipu-count commands can be used at any time. All other commands require exclusive access to the IPU and so cannot be used on devices that are currently running an application.

10.1.1. List devices

The --list-devices command displays a list of all devices. On IPU Pod systems, only devices in the active partition are listed. The list includes all individual IPUs and all logical devices composed of multiple IPUs.

Here is an example from an IPU Pod system:

$ gc-info --list-devices
Graphcore device listing:
Partition: p1 [active]
-+- Id: [0], target: [Fabric], IPU-M host: [10.1.5.10], IPU#: [3]
-+- Id: [1], target: [Fabric], IPU-M host: [10.1.5.10], IPU#: [2]
-+- Id: [2], target: [Fabric], IPU-M host: [10.1.5.10], IPU#: [1]
-+- Id: [3], target: [Fabric], IPU-M host: [10.1.5.10], IPU#: [0]
-+- Id: [4], target: [Multi IPU]
 |--- Id: [0], DNC Id: [0], IPU-M host: [10.1.5.10], IPU#: [3]
 |--- Id: [1], DNC Id: [1], IPU-M host: [10.1.5.10], IPU#: [2]
-+- Id: [5], target: [Multi IPU]
 |--- Id: [2], DNC Id: [0], IPU-M host: [10.1.5.10], IPU#: [1]
 |--- Id: [3], DNC Id: [1], IPU-M host: [10.1.5.10], IPU#: [0]
-+- Id: [6], target: [Multi IPU]
 |--- Id: [0], DNC Id: [0], IPU-M host: [10.1.5.10], IPU#: [3]
 |--- Id: [1], DNC Id: [1], IPU-M host: [10.1.5.10], IPU#: [2]
 |--- Id: [2], DNC Id: [2], IPU-M host: [10.1.5.10], IPU#: [1]
 |--- Id: [3], DNC Id: [3], IPU-M host: [10.1.5.10], IPU#: [0]

You can see what devices are available in other partitions by adding the --all-partitions option (-l is the short form of --list-devices):

$ gc-info -l --all-partitions
Graphcore device listing:
Partition: p1 [active]
-+- Id: [0], target: [Fabric], IPU-M host: [10.1.5.10], IPU#: [3]
-+- Id: [1], target: [Fabric], IPU-M host: [10.1.5.10], IPU#: [2]
-+- Id: [2], target: [Fabric], IPU-M host: [10.1.5.10], IPU#: [1]
-+- Id: [3], target: [Fabric], IPU-M host: [10.1.5.10], IPU#: [0]
Partition: p2
-+- Id: [4], target: [Fabric], IPU-M host: [10.1.5.12], IPU#: [3]
-+- Id: [5], target: [Fabric], IPU-M host: [10.1.5.12], IPU#: [2]
-+- Id: [6], target: [Fabric], IPU-M host: [10.1.5.12], IPU#: [1]
-+- Id: [7], target: [Fabric], IPU-M host: [10.1.5.12], IPU#: [0]

In this view, the IPUs from all partitions are shown in a flat list. Note that the device IDs assigned to IPUs in this view differ from those assigned in the default ‘active partition only’ mode.

It is only possible to attach to IPUs in the active partition.

Here is an example for a PCIe-based system:

$ gc-info --list-devices
Graphcore device listing:
-+- Id: [0], target: [PCIe], PCI domain: [0000:1b:00.0]
-+- Id: [1], target: [PCIe], PCI domain: [0000:1c:00.0]
-+- Id: [2], target: [PCIe], PCI domain: [0000:48:00.0]
-+- Id: [3], target: [PCIe], PCI domain: [0000:49:00.0]
-+- Id: [4], target: [PCIe], PCI domain: [0000:8a:00.0]
-+- Id: [5], target: [PCIe], PCI domain: [0000:8c:00.0]
-+- Id: [6], target: [PCIe], PCI domain: [0000:c4:00.0]
-+- Id: [7], target: [PCIe], PCI domain: [0000:c5:00.0]
-+- Id: [8], target: [Multi IPU]
 |--- Id: [3], DNC Id: [0], PCI domain: [0000:49:00.0]
 |--- Id: [2], DNC Id: [1], PCI domain: [0000:48:00.0]
-+- Id: [9], target: [Multi IPU]
 |--- Id: [1], DNC Id: [0], PCI domain: [0000:1c:00.0]
 |--- Id: [0], DNC Id: [1], PCI domain: [0000:1b:00.0]
-+- Id: [10], target: [Multi IPU]
 |--- Id: [7], DNC Id: [0], PCI domain: [0000:c5:00.0]
 |--- Id: [6], DNC Id: [1], PCI domain: [0000:c4:00.0]
-+- Id: [11], target: [Multi IPU]
 |--- Id: [4], DNC Id: [0], PCI domain: [0000:8a:00.0]
 |--- Id: [5], DNC Id: [1], PCI domain: [0000:8c:00.0]
-+- Id: [12], target: [Multi IPU]
 |--- Id: [3], DNC Id: [0], PCI domain: [0000:49:00.0]
 |--- Id: [2], DNC Id: [1], PCI domain: [0000:48:00.0]
 |--- Id: [1], DNC Id: [2], PCI domain: [0000:1c:00.0]
 |--- Id: [0], DNC Id: [3], PCI domain: [0000:1b:00.0]
-+- Id: [13], target: [Multi IPU]
 |--- Id: [7], DNC Id: [0], PCI domain: [0000:c5:00.0]
 |--- Id: [6], DNC Id: [1], PCI domain: [0000:c4:00.0]
 |--- Id: [4], DNC Id: [2], PCI domain: [0000:8a:00.0]
 |--- Id: [5], DNC Id: [3], PCI domain: [0000:8c:00.0]
-+- Id: [14], target: [Multi IPU]
 |--- Id: [3], DNC Id: [0], PCI domain: [0000:49:00.0]
 |--- Id: [2], DNC Id: [1], PCI domain: [0000:48:00.0]
 |--- Id: [1], DNC Id: [2], PCI domain: [0000:1c:00.0]
 |--- Id: [0], DNC Id: [3], PCI domain: [0000:1b:00.0]
 |--- Id: [7], DNC Id: [4], PCI domain: [0000:c5:00.0]
 |--- Id: [6], DNC Id: [5], PCI domain: [0000:c4:00.0]
 |--- Id: [4], DNC Id: [6], PCI domain: [0000:8a:00.0]
 |--- Id: [5], DNC Id: [7], PCI domain: [0000:8c:00.0]
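
If you need to process a listing in a script, the text format shown above is regular enough to parse. The following is a minimal Python sketch and is not part of gc-info: the function name parse_device_listing and the layout of the returned dictionaries are hypothetical, and the sample text is an excerpt of the IPU Pod listing shown earlier. The --json-output option listed under Command options may be a more robust alternative, but its output format is not covered in this section.

```python
import re

# Matches "key: [value]" pairs such as "Id: [0]" or "IPU-M host: [10.1.5.10]".
FIELD = re.compile(r"([A-Za-z][A-Za-z# -]*): \[([^\]]*)\]")

SAMPLE = """\
Graphcore device listing:
Partition: p1 [active]
-+- Id: [0], target: [Fabric], IPU-M host: [10.1.5.10], IPU#: [3]
-+- Id: [1], target: [Fabric], IPU-M host: [10.1.5.10], IPU#: [2]
-+- Id: [4], target: [Multi IPU]
 |--- Id: [0], DNC Id: [0], IPU-M host: [10.1.5.10], IPU#: [3]
 |--- Id: [1], DNC Id: [1], IPU-M host: [10.1.5.10], IPU#: [2]
"""


def parse_device_listing(text):
    """Return one dict per top-level device (hypothetical layout)."""
    devices = []
    partition = None  # stays None on systems that print no partition line
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Partition:"):
            partition = line.split()[1]
        elif line.startswith("-+-"):
            # A top-level device: a single IPU or a multi-IPU device.
            fields = dict(FIELD.findall(line))
            devices.append({
                "id": int(fields["Id"]),
                "target": fields["target"],
                "partition": partition,
                "members": [],
            })
        elif line.startswith("|---") and devices:
            # A member IPU of the most recent multi-IPU device.
            fields = dict(FIELD.findall(line))
            devices[-1]["members"].append(int(fields["Id"]))
    return devices


if __name__ == "__main__":
    for device in parse_device_listing(SAMPLE):
        print(device)
```

Member IPUs are recorded by their own device ID, so a multi-IPU device can be mapped back to the single-IPU entries in the same listing.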

10.1.2. Device info

The --device-info command displays the attributes of the device specified with --device-id.

10.1.3. Tile overview

The --tile-overview command shows a representation of the sync state of every tile on the IPU or IPUs that the specified device is connected to.

$ gc-info -d 0 --tile-overview
      0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
  0 xx:: xx:: xx:: xx:: xx:: xx:: xx:: xx:: xx:: xx:: xx:: xx:: xx:: xx:: xx:: xx::
  1 :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: ::::
  2 :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: ::::
  3 :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: ::::

The representation is in terms of supertiles (groups of four tiles), arranged by tile column and distance from the exchange. Columns 0-7 are the north tile columns (moving eastwards) and columns 8-15 are the south tile columns (moving back westwards). The rows are the supertile index, which increases as you move away from the exchange.

Each group of four symbols shows the state of the tiles within one supertile: the first pair shows tiles 0 and 1 of the supertile, and the second pair shows tiles 2 and 3.

In row 0, the first pair for each supertile (marked xx above) represents the first tiles connected to the exchange in that column.

The symbol position is related to the physical tile number by:

physical tile number = 64 x row + 4 x column + tile

Where:

  • row is the supertile row index

  • column is the column index

  • tile is 0..3 (tile index within the supertile)
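
The formula above, and its inverse, can be sketched in Python as follows. The function names are hypothetical, and the bounds checks assume the 16-column layout shown in the sample output; this is an illustration of the arithmetic rather than code taken from gc-info.

```python
TILES_PER_SUPERTILE = 4
COLUMNS = 16  # columns 0-15 in the overview
TILES_PER_ROW = TILES_PER_SUPERTILE * COLUMNS  # 64


def physical_tile_number(row, column, tile):
    """Convert an overview position to a physical tile number."""
    if row < 0:
        raise ValueError("row must be non-negative")
    if not 0 <= column < COLUMNS:
        raise ValueError("column must be in 0..15")
    if not 0 <= tile < TILES_PER_SUPERTILE:
        raise ValueError("tile must be in 0..3")
    return TILES_PER_ROW * row + TILES_PER_SUPERTILE * column + tile


def overview_position(physical_tile):
    """Convert a physical tile number back to (row, column, tile)."""
    if physical_tile < 0:
        raise ValueError("physical tile number must be non-negative")
    row, rest = divmod(physical_tile, TILES_PER_ROW)
    column, tile = divmod(rest, TILES_PER_SUPERTILE)
    return row, column, tile


if __name__ == "__main__":
    print(physical_tile_number(1, 2, 3))
    print(overview_position(75))
```

For example, the symbol in row 1, column 2, position 3 corresponds to physical tile 64 x 1 + 4 x 2 + 3 = 75.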

Note

There are two numbering schemes used to index tiles. A physical tile number is commonly used by low-level tools such as gc-info. Internally, Poplar uses a virtual tile number, but it reports both.

Although the layout takes a little interpretation, the representation is compact. Its most useful feature is that it makes discrepancies between sync types easy to spot: identifying one tile that is not of the same sync type as the rest of an IPU or group of IPUs can help you target your debugging.

Table 10.1 Key for tile-overview

Symbol  Meaning
X       (non-debug) exception
x       debug exception (trap)
r       waiting on sync (receiving data)
:       waiting on multiple syncs (sans instruction)
'       waiting on local sync (workers in tile)
.       waiting on internal sync (other tiles within IPU)
AB      waiting on external sync including host (A=GS1, B=GS2)
abcd    waiting on external sync excluding host (a=GS1, b=GS2, c=GS3, d=GS4)
*       executing (some other instruction)
-       tile unloaded
?       inaccessible / unknown
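
Spotting the odd tile out can also be done programmatically. The following Python sketch parses a saved --tile-overview grid, maps each symbol to its physical tile number and reports the tiles whose symbol differs from the most common one. The function names are hypothetical, and the parser assumes a single grid in the layout shown above (a header row of column indices, then one line per supertile row); output for multi-IPU devices may need splitting first. The sample is rebuilt from the example output, in which xx is used as a marker.

```python
from collections import Counter


def parse_tile_overview(text):
    """Map physical tile number -> state symbol for one overview grid."""
    lines = [line for line in text.splitlines() if line.strip()]
    tiles = {}
    for line in lines[1:]:  # skip the header row of column indices
        fields = line.split()
        row = int(fields[0])
        for column, group in enumerate(fields[1:]):
            for tile, symbol in enumerate(group):
                tiles[64 * row + 4 * column + tile] = symbol
    return tiles


def outliers(tiles):
    """Return the tile numbers whose symbol is not the most common one."""
    counts = Counter(tiles.values())
    common_symbol, _ = counts.most_common(1)[0]
    return sorted(n for n, s in tiles.items() if s != common_symbol)


# Rebuild the sample grid from the example output above.
header = "    " + " ".join(f"{c:>4}" for c in range(16))
rows = ["  0 " + " ".join(["xx::"] * 16)]
rows += [f"  {r} " + " ".join(["::::"] * 16) for r in range(1, 4)]
SAMPLE = "\n".join([header] + rows)

if __name__ == "__main__":
    tiles = parse_tile_overview(SAMPLE)
    print(Counter(tiles.values()))
    print(outliers(tiles)[:8])
```

The most common symbol is only a heuristic for "expected" state; in practice you would compare against the sync type you expect from Table 10.1.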

10.1.4. Register dump

The --tile-status command dumps the register state of individual tiles in the IPU. Options such as --context and --register control what is dumped. This is useful for low-level debugging of an application or IPU fault.

Some examples:

$ gc-info --device-id 0 --tile-status 0  # dumps all registers for tile 0 on device 0
$ gc-info --device-id 0 --tile-status 0 --context SU  # dumps all supervisor-context registers for tile 0 on device 0
$ gc-info --device-id 0 --tile-status 0 --context SU --register PC  # dumps the PC for the supervisor context on tile 0
$ gc-info --device-id 0 --tile-status -  # dumps the tile status for all tiles
$ gc-info --device-id 0 --tile-status - -c SU -r PC  # dumps the PC for the supervisor context on every tile

The SU context selects the supervisor execution context, which is used for managing worker threads and exchanges.

The Wn (or just n) context selects worker thread n.
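
When scripting register dumps, it can help to build the argument list in one place. The sketch below is a hypothetical Python helper (tile_status_command is not part of any Graphcore package) that assembles a command line using only the options shown in the examples above. It does not run the command, because --tile-status requires exclusive access to the IPU.

```python
def tile_status_command(device_id, tile="-", context=None, register=None):
    """Build a gc-info --tile-status argument list.

    tile:     a tile number, or "-" for all tiles
    context:  for example "SU", "W0" or "0"; omitted if None
    register: for example "PC"; omitted if None
    """
    cmd = ["gc-info", "--device-id", str(device_id),
           "--tile-status", str(tile)]
    if context is not None:
        cmd += ["--context", str(context)]
    if register is not None:
        cmd += ["--register", str(register)]
    return cmd


if __name__ == "__main__":
    # Supervisor PC on every tile of device 0.
    print(" ".join(tile_status_command(0, "-", "SU", "PC")))
```

The returned list can be passed to subprocess.run on a machine where gc-info is installed and the device is not running an application.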

There are also commands to display various SoC registers for low-level debugging; for example, --xb-status, --gsp-status, --pci-status.

10.1.5. Dump tile memory

The --dump-mem command displays the contents of memory on the specified tile. It takes three arguments: the tile number, the start address and the number of bytes to display.

For example, the following command dumps 16 bytes of memory from address 0x4c000 on tile 0:

$ gc-info --device-id 0 --dump-mem 0 0x4c000 16
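
The three arguments can be assembled in a script as in the following Python sketch. The helper name dump_mem_command and its validation rules are assumptions made for illustration; the sketch only builds the argument list and does not run gc-info.

```python
def dump_mem_command(device_id, tile, start_address, size_in_bytes):
    """Build a gc-info --dump-mem argument list (hypothetical helper)."""
    if start_address < 0:
        raise ValueError("start_address must be non-negative")
    if size_in_bytes <= 0:
        raise ValueError("size_in_bytes must be positive")
    return ["gc-info", "--device-id", str(device_id),
            "--dump-mem", str(tile), hex(start_address), str(size_in_bytes)]


if __name__ == "__main__":
    # Reproduces the example above: 16 bytes from 0x4c000 on tile 0.
    print(" ".join(dump_mem_command(0, 0, 0x4C000, 16)))
```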

10.2. Usage

10.2.1. Commands

-l, --list-devices

List devices

--chip-id

Show IPU chip ID

-t {tile_id}, --tile-status {tile_id}

Tile register dump

-k, --tile-clock-speed

Tile clock speed

-i, --device-info

Device info

-m {arg}, --dump-mem {arg}

Dump tile memory. Arguments: {tile_num} {start_address} {size_in_bytes}

--tr-status

Trunk Router status

-x, --xb-status

XB status

--gsp-status

GSPs status

--nlc-status

NLCs status

--pci-status

SoC PCI status registers

--ss-status

System services registers

--ipu-arch

Display IPU arch name

--ipu-count

Display the number of IPUs installed

--tile-overview

Show an overview of the state of all tiles.

-h, --help

Produce help message

--version

Version number

10.2.2. Command options

--all-partitions

Only valid with --list-devices. List devices in all partitions

-r {arg}, --register {arg}

Select register to print from tiles ('-' is all registers). Only valid with --tile-status (default: -)

-c {arg}, --context {arg}

Select register context ('-' is supervisor and TDI). Only valid with --tile-status (default: -)

--group-output

Set to group tile status output by value. Only valid with --register

-j, --json-output

Emit JSON output.

--remote-id {arg}

Remote device ID in format HOSTNAME:DEVICE_ID

-d {id}, --device-id {id}

Device ID

10.2.3. Examples

gc-info --list-devices
gc-info --device-id {id} --tile-status {tile_num}
gc-info --device-id {id} --tile-status {tile_num} --register {reg}
gc-info --device-id {id} --tile-status {tile_num} --context - --register {reg}
gc-info --device-id {id} --tile-status {tile_num} --context SU --register {reg}
gc-info --device-id {id} --tile-status {tile_num} --context W0 --register {reg}
gc-info --device-id {id} --tile-status {tile_num} --context TDI --register {reg}
gc-info --device-id {id} --tile-status {tile_num} --context 0 --register {reg}
gc-info --device-id {id} --device-info
gc-info --device-id {id} --dump-mem {tile_num} {start_address} {size_in_bytes}
gc-info --device-id {id} --xb-status
gc-info --device-id {id} --gsp-status

10.3. Glossary

External exchange

Data communication between IPUs or between an IPU and the host.

GSP

Global Sync Peripheral. An IPU subsystem that manages the synchronization of IPUs over multiple PCI cards.

NLC

Network Link Controller. The interface between a PCI controller used for external exchange and a trunk router.

PCI

Peripheral Component Interconnect. The bus standard used for connecting IPUs to the host, and for IPU-Links between IPUs.

SoC

System on Chip. A device, such as the IPU, that contains processor, memory, external interfaces and on-chip peripherals.

SS

System services registers.

Supertile

A group of four colocated tiles.

TDI

Tile debug interface.

TR

Trunk Router. Connects exchange traffic between the exchange block and the PCI controller.

XB

Exchange block. A subsystem that manages external exchange on behalf of the tiles in the IPU.