20. Device attributes

Each IPU available in a system has a number of attributes associated with it. These attributes describe both “fixed” aspects of the device, such as the board serial number and firmware version, and properties that can change at runtime, such as clock speed, temperature or the name of the application currently using the IPU. Some of the more useful device attributes are displayed in a user-friendly format by gc-monitor.

It is also possible to display the raw attributes, formatted as key-value string pairs, by using gc-inventory or gc-info (with the --device-info option).

For example, here’s some output from a C600-based system:

$ gc-info -i -d 0
Device Info:
  id: 0
  target: PCIe
  average board temp: 17.8 C
  average die temp: 20.0 C
  board ipu index: 0
  board serial number: 0063.0063.822391
  board type: C600
  board variant: C600
  clock: 20MHz
  driver version: 1.3.0
  firmware major version: 2
  firmware minor version: 6
  firmware patch version: 8
  firmware version: 2.6.8
  hexoatt active size (bytes): 0
  hexoatt total size (bytes): 16911433728
  hexopt active size (bytes): 0
  hexopt total size (bytes): 268435456
  icu bootloader version: 2.6.2
  ipu architecture: ipu21
  ipu error state: no errors
  ipu power: N/A
  ipu utilisation: 0.00%
  ipu utilisation (session): 0.00%
  link correctable error count: 0
  link correctable error count (session): 0
  link speed: 16.0 GT/s
  link width: 8
  numa node: 0
  parity error count threshold: 4
  parity error threshold interval: 7776000 seconds
  parity initialised: 1
  pci id: 0000:35:00.0
  pcie physical slot: 3
  remote buffers supported: 1
  sysfs file id: 0
  total board power: 12.6 W
  user executable: gc-hosttraffictest
  user name: someuser
  user process id: 1262065

Here’s an example from an IPU-Machine:

$ gc-info -i -d 0
Device Info:
  id: 0
  target: Fabric
  average board temp: 29.6 C
  average die temp: 39.2 C
  board ipu index: 3
  board serial number: 0131.0002.8204521
  board type: M2000
  board variant: M2000
  clock: 1330MHz
  config domain: 94818642074432
  driver version: 1.1.3
  firmware major version: 2
  firmware minor version: 5
  firmware patch version: 9
  firmware version: 2.5.9
  gateway software version: 2.6.1
  graph streaming: true
  hexoatt active size (bytes): 0
  hexoatt total size (bytes): 34082914304
  hexopt active size (bytes): 0
  hexopt total size (bytes): 268435456
  host link correctable error count: 1238
  host link correctable error count (session): 0
  ipu architecture: ipu2
  ipu error state: no errors
  ipu power: 32.4 W
  ipu utilisation: 0.00%
  ipu utilisation (session): 0.00%
  ipum software version: 2.6.0-028
  ipuof host: 10.5.13.3
  ipuof partition id: 2065-small-partition-reconfig
  ipuof routing id: 0
  ipuof routing type: DNC
  ipuof server version: 1.11.0
  link correctable error count: 0
  link correctable error count (session): 0
  link speed: 16.0 GT/s
  link width: 8
  number of replicas: 1
  partition sync type: c2-compatible
  pci id: 3
  pcie physical slot: 3
  reconfigurable partition: true
  remote buffers supported: 1
  user executable: gc-hosttraffictest
  user name: exampleusername
  user process id: 115783

Note: board type will be reported as ‘M2000’ for both IPU-M2000 and Bow-2000 products.

Attributes can be queried programmatically (from C++, Python or Go) with the gcipuinfo library.
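As an alternative to gcipuinfo, whose API is not covered in this section, a script can also parse the key-value text that gc-info prints, as in the examples above. The Python sketch below does this. It is not part of gcipuinfo; the function names `parse_device_info` and `query_device_info` are hypothetical, and the sketch assumes that gc-info is on the PATH and that its output keeps the indented `key: value` layout shown above.

```python
import subprocess


def parse_device_info(text: str) -> dict[str, str]:
    """Parse the indented 'key: value' lines printed by `gc-info -i`.

    Lines without a ': ' separator, such as the 'Device Info:' header or an
    echoed command line, are skipped. Only the first ': ' splits a line, so
    values that contain colons (for example the PCI ID 0000:35:00.0) are
    kept intact. Keys keep their inner spaces and parentheses, for example
    'ipu utilisation (session)'.
    """
    attributes: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:
            attributes[key] = value.strip()
    return attributes


def query_device_info(device_id: int) -> dict[str, str]:
    """Run `gc-info -i -d <device_id>` and return the parsed attributes.

    Assumes gc-info is installed and on PATH; raises
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    result = subprocess.run(
        ["gc-info", "-i", "-d", str(device_id)],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_device_info(result.stdout)


if __name__ == "__main__":
    # Excerpt of the C600 output shown above; no IPU is needed for this demo.
    sample = """Device Info:
  id: 0
  target: PCIe
  pci id: 0000:35:00.0
  ipu error state: no errors
"""
    info = parse_device_info(sample)
    print(info["pci id"])           # 0000:35:00.0
    print(info["ipu error state"])  # no errors
```

Splitting on the first `": "` rather than on every colon is the design choice that matters here, because attribute values such as the PCI ID contain colons of their own.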

20.1. List of supported attributes

Table 20.1 Device attributes

Attribute key string

Description

"id"

Unique identifier of a single-IPU or multi-IPU device.

"average board temp"

Average temperature in degrees Celsius as read by the sensors on the board.

"average die temp"

Average temperature in degrees Celsius as read by IPU sensors.

"board ipu index"

IPU number on board (0-1 for PCIe cards, 0-3 for IPU-Machines).

"board type"

The IPU board type ‘family’, for example C600 or M2000. Note: M2000 includes IPU-M2000 and Bow-2000.

"clock"

Current clock frequency.

"driver version"

PCIe driver version, specified as a <major.minor.patch> triple.

"gateway software version"

(IPUoF) IPU-Gateway software version, specified as a <major.minor.patch> triple.

"gcd id"

(IPUoF) Graph Compile Domain (GCD) ID.

"hexoatt total size (bytes)"

Total remote-buffer memory available.

"hexoatt active size (bytes)"

Total remote-buffer memory in use by the IPU.

"hexopt total size (bytes)"

Total host exchange memory available.

"hexopt active size (bytes)"

Total host exchange memory in use by the IPU.

"ipu architecture"

IPU hardware architecture version.

"ipuof host"

(IPUoF) IP address of the IPU-Gateway.

"ipuof server version"

(IPUoF) Fabric server version.

"ipu utilisation"

Percentage of time spent waiting for one or more IPU syncs, measured in the last second.

"ipu utilisation (session)"

Percentage of time spent waiting for one or more IPU syncs since the HSPs were set up.

"link correctable error count"

IPU-Link correctable error count.

"link speed"

(PCIe) PCIe link speed available.

"link width"

(PCIe) Number of PCIe lanes available.

"max active code size (bytes)"

Maximum active code size (bytes).

"max active data size (bytes)"

Maximum active data size (bytes).

"max active stack size (bytes)"

Maximum active stack size (bytes).

"multi-ipu device id"

ID of the multi-IPU device that the IPU belongs to.

"multi-ipu discovery method"

Method used to discover multi-IPU groups.

"numa node"

NUMA node the IPU is on.

"number of ipu-link segments"

(IPUoF) Number of IPU-Link segments.

"number of replicas"

(IPUoF) Number of replicas in the partition.

"ipuof partition id"

(IPUoF) Partition ID.

"partition sync type"

(IPUoF) Sync configuration type, for example ‘c2-compatible’.

"pci id"

PCIe device identifier.

"pcie physical slot"

PCIe physical slot.

"process start time"

The start time of the process currently using the IPU.

"reconfigurable partition"

(IPUoF) Set to 1 if the IPU is part of a reconfigurable partition.

"remote buffers supported"

Set to 1 if remote buffers are supported.

"board serial number"

Serial number of the board.

"total board power"

Total current power consumption as read by board-level sensors. Not used on IPU-Machines.

"user executable"

The name of the process using the device.

"user name"

The username of the user using the device.

"user process id"

The process ID of the process using the device.

"gateway routing type"

(IPUoF) GW-Link routing type.

"ipu link segment id"

(IPUoF) Identifier of the IPU-Link segment.

"number of gcds"

Number of Graph Compile Domains.

"firmware version"

ICU firmware version, specified as a <major.minor.patch> triple. In development builds, this will be suffixed with branch and build information.

"ipuof server error"

(IPUoF) Set if an error occurred while attempting to communicate with the IPUoF server (a ‘connection’ error), or if the IPUoF server was unable to use the device (a ‘device’ error).

"host link correctable error count"

(PCIe) Host-Link correctable error count.

"application host"

(IPUoF) IP address of the headnode where the application using this IPU is running.

"ipu error state"

Error state of the IPU. Set to ‘ipu memory failure’ if the tile parity error thresholds have been exceeded.

"parity error count threshold"

Threshold number of parity errors at which they are promoted to an unrecoverable error.

"parity error threshold interval"

Time interval, in seconds, used together with the parity error count threshold to decide when parity errors are promoted to an unrecoverable error.

"ipum software version"

(IPUoF) IPU-M software version.

"ipu power"

Power consumption of a single IPU. Only available on IPU-Machines.

"link correctable error count (session)"

IPU-Link correctable error count since the device was last reset.

"host link correctable error count (session)"

(PCIe) Host-Link correctable error count since the device was last reset.

"board variant"

IPU board model name. This will be identical to the board type if the product has only a single variant.

"gateway write combining"

Gateway write combining status.

"secondary pcie interface supported"

Set to 1 if the secondary PCIe interface is supported.

"icu bootloader version"

ICU bootloader version, specified as a <major.minor.patch> triple. In development builds, this will be suffixed with branch and build information.

Note that gc-info and gc-inventory also display some attributes that are not described in the table above. These attributes should be ignored; they are either deprecated or used only internally. They have no useful meaning to end users and may be removed in future software releases.
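Several attributes in the table are most useful in combination: a total size next to its active size, or the parity threshold interval, which is easier to read in days (the 7776000 seconds shown in the C600 output is 90 days). The Python sketch below assumes the attributes have already been collected into a dict of strings keyed by the attribute key strings above, for example by parsing gc-info output. The helper names are hypothetical. Treating total minus active as the remaining capacity is an interpretation of the table descriptions, not a documented attribute. Any error state other than ‘no errors’ is treated as a fault. Because the format of development-build version suffixes is not specified here, `version_triple` reads only the leading digits.

```python
import re

GIB = 1024 ** 3                  # bytes per GiB
SECONDS_PER_DAY = 24 * 60 * 60


def unused_bytes(attrs: dict[str, str], prefix: str) -> int:
    """Total minus active size, for prefix 'hexoatt' (remote buffers)
    or 'hexopt' (host exchange memory). Interpretation, not an attribute."""
    total = int(attrs[f"{prefix} total size (bytes)"])
    active = int(attrs[f"{prefix} active size (bytes)"])
    return total - active


def parity_interval_days(attrs: dict[str, str]) -> float:
    """Convert a value such as '7776000 seconds' to days."""
    seconds = int(attrs["parity error threshold interval"].split()[0])
    return seconds / SECONDS_PER_DAY


def version_triple(value: str) -> tuple[int, int, int]:
    """Return (major, minor, patch), ignoring any trailing build suffix."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", value)
    if match is None:
        raise ValueError(f"not a <major.minor.patch> version: {value!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def has_error(attrs: dict[str, str]) -> bool:
    """True for any 'ipu error state' other than 'no errors'."""
    return attrs["ipu error state"] != "no errors"


if __name__ == "__main__":
    # Values copied from the C600 example output earlier in this chapter.
    c600 = {
        "hexoatt total size (bytes)": "16911433728",
        "hexoatt active size (bytes)": "0",
        "hexopt total size (bytes)": "268435456",
        "hexopt active size (bytes)": "0",
        "parity error threshold interval": "7776000 seconds",
        "firmware version": "2.6.8",
        "ipu error state": "no errors",
    }
    print(f"{unused_bytes(c600, 'hexoatt') / GIB:.2f} GiB")  # 15.75 GiB
    print(f"{unused_bytes(c600, 'hexopt') / GIB:.2f} GiB")   # 0.25 GiB
    print(parity_interval_days(c600))                       # 90.0
    print(version_triple(c600["firmware version"]))         # (2, 6, 8)
    print(has_error(c600))                                  # False
```

Returning versions as integer tuples means a minimum-version check is a plain comparison, for example `version_triple(attrs["firmware version"]) >= (2, 6, 0)`.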