8. gc-info

This tool lists detailed information about the IPUs present in the hardware platform. To extract some of the information, gc-info will need to lock access to IPUs. Therefore, most options (except --list-devices) cannot be used for IPUs that are already in use.

8.1. Sub-commands

A number of sub-commands are available as command line options to gc-info. The most useful are listed below.

Note

Several of the options to gc-info are intended primarily for use by Graphcore engineering and support staff. The glossary provides a brief explanation of some of the terminology, in case it is useful.

8.1.1. List devices

The --list-devices and --list-all-devices command options will list the IPUs in the system. The --list-devices option will list only IPUs directly connected to the server.

8.1.2. Tile overview

The --tile-overview command option shows a representation of the sync state of every tile in the IPU or IPUs that belong to the selected device.

$ gc-info -d 0 --tile-overview
      0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
  0 xx:: xx:: xx:: xx:: xx:: xx:: xx:: xx:: xx:: xx:: xx:: xx:: xx:: xx:: xx:: xx::
  1 :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: ::::
  2 :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: ::::
  3 :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: ::::

The representation is in terms of supertiles (groups of four tiles), arranged by tile column and distance from the exchange. Columns 0-7 are the north tile columns (moving eastwards) and columns 8-15 are the south tile columns (moving back westwards). The rows are the supertile index, which increases as you move away from the exchange.

The groupings of four symbols show the state of each tile within a supertile: the first pair shows tiles 0 and 1 in the supertile, and the second pair shows tiles 2 and 3.

Specifically for row 0, the first pair for each supertile (marked xx above) are the first tiles connected to the exchange in that column.

The symbol position is related to the physical tile number by:

physical tile number = 64 x row + 4 x column + tile

Where:

  • row is the supertile row index

  • column is the column index

  • tile is 0..3 (tile index within the supertile)
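
As a worked example of the formula above, the following shell sketch converts a position in the overview to a physical tile number and back again. The function names phys_tile and tile_position are illustrative only; they are not part of gc-info.

```shell
# Illustrative helpers (not part of gc-info).

# physical tile number = 64 x row + 4 x column + tile
# Arguments: row column tile
phys_tile() {
    echo $(( 64 * $1 + 4 * $2 + $3 ))
}

# Inverse mapping: print "row column tile" for a physical tile number.
tile_position() {
    echo "$(( $1 / 64 )) $(( ($1 % 64) / 4 )) $(( $1 % 4 ))"
}

phys_tile 2 5 3      # row 2, column 5, tile 3: prints 151
tile_position 151    # prints 2 5 3
```

The factor of 64 follows from the layout: each row holds 16 supertile columns of 4 tiles each.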

Note

There are two numbering schemes used to index tiles. A physical tile number is commonly used by low level tools such as gc-info. Internally Poplar uses a virtual tile number, but will report both.

Whilst the layout takes a little interpretation, the representation is compact. Its most useful feature is that it makes discrepancies between sync types easy to spot. Identifying one tile that is not in the same sync state as the rest of an IPU or group of IPUs can help you target debugging.

Table 8.1 Key for tile-overview

Symbol  Meaning

X       (non debug) exception

x       debug exception (trap)

r       waiting on sync (receiving data)

:       waiting on multiple syncs (sans instruction)

'       waiting on local sync (workers in tile)

.       waiting on internal sync (other tiles within IPU)

AB      waiting on external sync including host (A=GS1, B=GS2)

abcd    waiting on external sync excluding host (a=GS1, b=GS2, c=GS3, d=GS4)

*       executing (some other instruction)

-       tile unloaded

?       inaccessible / unknown
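
To spot a tile whose state differs from the rest, it can help to tally the symbols in a saved overview. The sketch below assumes the output has the same layout as the sample above (a row index followed by 16 four-character supertile groups); the function names tile_symbols and symbol_tally and the file name overview.txt are illustrative, not part of gc-info.

```shell
# Illustrative helpers (not part of gc-info). They assume each data row is a
# row index followed by 16 four-character supertile groups, as in the sample.

# Print "physical_tile symbol" for every tile in saved --tile-overview output.
tile_symbols() {
    awk 'NF == 17 && $1 ~ /^[0-9]+$/ {
        row = $1
        for (col = 0; col < 16; col++) {
            group = $(col + 2)
            for (t = 0; t < 4; t++)
                print 64 * row + 4 * col + t, substr(group, t + 1, 1)
        }
    }' "$@"
}

# Count how many tiles show each symbol, most common first.
symbol_tally() {
    tile_symbols "$@" |
        awk '{ count[$2]++ } END { for (s in count) print count[s], s }' |
        sort -rn
}

# Possible usage:
#   gc-info -d 0 --tile-overview > overview.txt
#   symbol_tally overview.txt
#   tile_symbols overview.txt | awk '$2 != ":"'   # tiles not showing ':'
```

The header line is skipped because it has only 16 fields, whereas each data row has 17.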

8.1.3. Register dump

The --tile-status command dumps the register state of individual tiles in the IPU. The --context and --register options control which registers are dumped. This is useful for low-level debugging of an application or IPU fault.

Some examples:

$ gc-info --device-id 0 --tile-status 0  # dumps all registers for tile 0 on device 0
$ gc-info --device-id 0 --tile-status 0 --context SU  # dumps all supervisor-context registers for tile 0
$ gc-info --device-id 0 --tile-status 0 --context SU --register PC  # dumps the PC for the supervisor context on tile 0
$ gc-info --device-id 0 --tile-status -  # dumps the tile status for all tiles
$ gc-info --device-id 0 --tile-status - -c SU -r PC  # dumps the PC for the supervisor context on every tile

The SU context displays information from the supervisor execution context used for managing worker threads and exchanges.

The Wn (or just n) context displays information from worker thread n.
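
When comparing contexts, it can be convenient to print the same register for several contexts in one go. The following is a minimal sketch that uses only the options shown above. The function name dump_register and the GC_INFO variable are illustrative assumptions, not part of gc-info; setting GC_INFO=echo gives a dry run that prints the arguments instead of running the tool.

```shell
# Illustrative helper (not part of gc-info).
# Usage: dump_register DEVICE TILE REGISTER CONTEXT...
# The body runs in a subshell so its variables do not leak into the caller.
dump_register() (
    device=$1 tile=$2 reg=$3
    shift 3
    for ctx in "$@"; do
        echo "== context $ctx =="
        # GC_INFO is a hypothetical override; it defaults to gc-info.
        "${GC_INFO:-gc-info}" --device-id "$device" --tile-status "$tile" \
            --context "$ctx" --register "$reg"
    done
)

# Possible usage: PC of the supervisor and two workers on tile 0 of device 0.
#   dump_register 0 0 PC SU W0 W1
```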

There are also commands to display various SoC registers for low-level debugging; for example, --xb-status, --gsp-status, --pci-status.

8.1.4. Dump tile memory

The --dump-mem command displays the contents of memory on the specified tile. It takes three arguments: the tile number, the start address and the number of bytes to display.

For example, the following command dumps 16 bytes of memory from address 0x4c000 on tile 0:

$ gc-info --device-id 0 --dump-mem 0 0x4c000 16

8.2. Usage

8.2.1. Commands

-l, --list-devices

List devices

-a, --list-all-devices

List all devices

--chip-id

Show IPU chip ID

-t {tile_id}, --tile-status {tile_id}

Tile register dump

-k, --tile-clock-speed

Tile clock speed

-i, --device-info

Device info

-m {arg}, --dump-mem {arg}

Dump tile memory. Arguments: {tile_num} {start_address} {size_in_bytes}

--tr-status

Trunk Router status

-x, --xb-status

XB status

--gsp-status

GSPs status

--nlc-status

NLCs status

--pci-status

SoC PCI status registers

--ss-status

System services registers

--sxp-status

Secure exchange pipe registers

--ipu-arch

Display IPU arch name

--ipu-count

Display the number of IPUs installed

-r {arg}, --register {arg}

Select register to print from tiles ('-' is all registers) (default: -)

-c {arg}, --context {arg}

Select register context ('-' is supervisor and TDI) (default: -)

--group-output

Set to group tile status output by value. Only valid with --register

--phy-summary

PCI PHY summary

--phy-dump

PCI PHY dump

--show-insn

List the instruction at the current supervisor’s $PC, for all tiles.

--tile-overview

Show an overview of the state of all tiles.

--fw-pub-keys

Display part of firmware public keys

--remote-id {arg}

Remote device id in format HOSTNAME:DEVICE_ID

-j, --json-output

Emit JSON output

-h, --help

Produce help message

--version

Version number

8.2.2. Command options

-s, --disassemble

Disassemble memory dump

-d {id}, --device-id {id}

Device id

8.2.3. Examples

gc-info --list-devices
gc-info --device-id {id} --tile-status {tile_num}
gc-info --device-id {id} --tile-status {tile_num} --register {reg}
gc-info --device-id {id} --tile-status {tile_num} --context - --register {reg}
gc-info --device-id {id} --tile-status {tile_num} --context SU --register {reg}
gc-info --device-id {id} --tile-status {tile_num} --context W0 --register {reg}
gc-info --device-id {id} --tile-status {tile_num} --context TDI --register {reg}
gc-info --device-id {id} --tile-status {tile_num} --context 0 --register {reg}
gc-info --device-id {id} --device-info
gc-info --device-id {id} --dump-mem {tile_num} {start_address} {size_in_bytes} [--disassemble]
gc-info --device-id {id} --xb-status
gc-info --device-id {id} --gsp-status
gc-info --device-id {id} --show-insn
gc-info --device-id {id} --fw-pub-keys

8.3. Glossary

External exchange

Data communication between IPUs or between an IPU and the host.

GSP

Global Sync Peripheral. An IPU subsystem that manages the synchronization of IPUs over multiple PCI cards.

NLC

Network Link Controller. The interface between a PCI controller used for external exchange and a trunk router.

PCI

Peripheral Component Interconnect. The bus standard used for connecting IPUs to the host, and for IPU-Links between IPUs.

SoC

System on Chip. A device, such as the IPU, that contains processor, memory, external interfaces and on-chip peripherals.

SS

System services registers.

Supertile

A group of four colocated tiles.

TDI

Tile debug interface.

TR

Trunk Router. Connects exchange traffic between the exchange block and the PCI controller.

XB

Exchange block. A subsystem that manages external exchange on behalf of the tiles in the IPU.