19. Device attributes

Each IPU available in a system has a number of attributes associated with it. These attributes describe both “fixed” aspects of the device, such as the board serial number and firmware version, and properties that can change at runtime, such as clock speed, temperature or the name of the application currently using the IPU. Some of the more useful device attributes are made available in a user-friendly format by gc-monitor.

It is also possible to display the raw attributes, formatted as key-value string pairs, by using gc-inventory or gc-info (with the --device-info option).

For example:

$ gc-info -d 0 --device-info
Device Info:
  id: 0
  target: PCIe
  average board temp: 36.2 C
  average die temp: 33.4 C
  board ipu index: 1
  board serial number: 0340.0004.919062
  board type: C2
  clock: 1300MHz
  driver version: 1.0.56
  firmware version: 1.4.14
  hexoatt active size (bytes): 0
  hexoatt total size (bytes): 17045651456
  hexopt active size (bytes): 134217728
  hexopt total size (bytes): 134217728
  ipu architecture: ipu1
  ipu utilisation: 99.96%
  ipu utilisation (session): 99.98%
  link correctable error count: 0
  link speed: 8 GT/s
  link width: 8
  numa node: 0
  parity initialised: 1
  pci id: 0000:1a:00.0
  pcie physical slot: 3
  process start time: 20211122T224648Z
  remote buffers supported: 1
  sysfs file id: 0
  total board power: 99.5 W
  user executable: gc-hosttraffictest
  user name: justina
  user process id: 38641

Attributes can be queried programmatically (from C++, Python or Go) with the gcipuinfo library.
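
Where the gcipuinfo bindings are not to hand, the key-value text printed by gc-info can also be parsed directly. The following is a minimal sketch in Python, assuming the output has the shape shown in the example above (a "Device Info:" header followed by indented "key: value" lines). The helper name parse_device_info is hypothetical and is not part of any Graphcore tool; all values are kept as strings.

```python
def parse_device_info(text):
    """Parse `gc-info --device-info` style text into a dict of strings.

    Hypothetical helper: it assumes one "key: value" pair per line, as in
    the example output shown in this section.
    """
    attributes = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and lines with no value, such as the
        # "Device Info:" header.
        if not stripped or stripped.endswith(":"):
            continue
        # Split on the first colon only: values such as the PCI ID
        # contain colons of their own.
        key, sep, value = stripped.partition(":")
        if sep:
            attributes[key.strip()] = value.strip()
    return attributes


# A shortened copy of the example output above.
sample = """\
Device Info:
  id: 0
  board type: C2
  clock: 1300MHz
  hexopt total size (bytes): 134217728
  pci id: 0000:1a:00.0
  process start time: 20211122T224648Z
"""

info = parse_device_info(sample)
print(info["pci id"])      # 0000:1a:00.0
print(info["board type"])  # C2
```

Splitting on the first colon keeps keys that contain parentheses, such as "hexopt total size (bytes)", intact, and leaves any conversion of units (for example "1300MHz" or "36.2 C") to the caller.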

19.1. List of supported attributes

Table 19.1 Device attributes

Attribute key string

Description

"id"

Unique identifier of a single or multi-IPU device.

"average board temp"

Average temperature in degrees Celsius as read by the sensors on the board.

"average die temp"

Average temperature in degrees Celsius as read by IPU sensors.

"board ipu index"

IPU number on board (0-1 for PCIe cards, 0-3 for IPU-Machine).

"board type"

The IPU board type ‘family’, for example C2 or M2000.

"clock"

Current clock frequency.

"driver version"

PCIe driver version, specified as a <major.minor.patch> triple.

"gateway software version"

(IPUoF) IPU-Gateway software version, specified as a <major.minor.patch> triple.

"gcd id"

(IPUoF) Graph Compile Domain ID.

"hexoatt total size (bytes)"

Total remote buffers memory available.

"hexoatt active size (bytes)"

Total remote buffers memory in use by the IPU.

"hexopt total size (bytes)"

Total host exchange memory available.

"hexopt active size (bytes)"

Total host exchange memory in use by the IPU.

"ipu architecture"

IPU hardware architecture version.

"ipuof host"

(IPUoF) IP address of IPU-Gateway.

"ipuof server version"

(IPUoF) Fabric server version.

"ipu utilisation"

Percentage of time spent waiting for one or more IPU sync(s), measured in the last second.

"ipu utilisation (session)"

Percentage of time spent waiting for one or more IPU sync(s) since the HSPs were set up.

"link correctable error count"

(PCIe) Link correctable error count.

"link speed"

(PCIe) PCIe link speed available.

"link width"

(PCIe) Number of PCIe lanes available.

"max active code size (bytes)"

Maximum active code size (bytes).

"max active data size (bytes)"

Maximum active data size (bytes).

"max active stack size (bytes)"

Maximum active stack size (bytes).

"multi-ipu device id"

Multi-IPU device the IPU belongs to.

"multi-ipu discovery method"

Method used to discover multi-IPU groups.

"numa node"

NUMA node the IPU is on.

"number of ipu-link segments"

(IPUoF) Number of IPU-Link segments.

"number of replicas"

(IPUoF) Number of replicas in the partition.

"ipuof partition id"

(IPUoF) Partition ID.

"partition sync type"

(IPUoF) Sync configuration type, for example ‘c2-compatible’.

"pci id"

PCIe device identifier.

"pcie physical slot"

PCIe physical slot.

"process start time"

The start time of the process currently using the IPU.

"reconfigurable partition"

(IPUoF) Set to 1 if the IPU is part of a reconfigurable partition.

"remote buffers supported"

Set to 1 if remote buffers are supported.

"board serial number"

Serial number of the board.

"total board power"

Total current power consumption as read by board-level sensors. Not used on IPU-Machines.

"user executable"

The name of the process using the device.

"user name"

The username of the user using the device.

"user process id"

The process ID of the process using the device.

"gateway routing type"

(IPUoF) GW-Link routing type.

"ipu link segment id"

(IPUoF) Identifier of IPU-Link segment.

"number of gcds"

Number of Graph Compile Domains.

"firmware version"

ICU firmware version, specified as a <major.minor.patch> triple.

"ipuof server error"

(IPUoF) Set if an error occurred while attempting to communicate with the IPUoF server (a ‘connection’ error), or if the IPUoF server was unable to use the device (a ‘device’ error).

"host link correctable error count"

(PCIe) Host Link correctable error count.

"application host"

(IPUoF) IP address of the headnode where the application using this IPU is running.

"ipu error state"

Error state of the IPU. Set to ‘ipu memory failure’ if the tile parity error thresholds have been exceeded.

"parity error count threshold"

Threshold number of parity errors at which they are promoted to an unrecoverable error.

"parity error threshold interval"

Threshold, in seconds, at which the number of parity errors is promoted to an uncorrectable error.

"ipum software version"

(IPUoF) IPU-M software version.

"ipu power"

Power consumption of a single IPU. Only available on IPU-Machines.

"link correctable error count (session)"

(PCIe) Link correctable error count since device was last reset.

"host link correctable error count (session)"

(PCIe) Host Link correctable error count since device was last reset.

"board variant"

IPU board model name. This is identical to "board type" if the product has only a single variant.

"gateway write combining"

Gateway write combining status.

Note that gc-info and gc-inventory also display some attributes that are not described in the table above. These attributes should be ignored: they are either deprecated or used only internally, have no useful meaning to end users, and may be removed in future software releases.
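
One way to follow the note above in a script is to keep only the attributes listed in Table 19.1. The sketch below assumes the attributes have already been collected into a Python dict of key-value strings. The names DOCUMENTED_KEYS and documented_only are hypothetical, and the key list is transcribed from the table in this section, so it may need updating for other software releases.

```python
# Attribute keys transcribed from Table 19.1 (assumed complete for this
# version of the document only).
DOCUMENTED_KEYS = frozenset({
    "id", "average board temp", "average die temp", "board ipu index",
    "board type", "clock", "driver version", "gateway software version",
    "gcd id", "hexoatt total size (bytes)", "hexoatt active size (bytes)",
    "hexopt total size (bytes)", "hexopt active size (bytes)",
    "ipu architecture", "ipuof host", "ipuof server version",
    "ipu utilisation", "ipu utilisation (session)",
    "link correctable error count", "link speed", "link width",
    "max active code size (bytes)", "max active data size (bytes)",
    "max active stack size (bytes)", "multi-ipu device id",
    "multi-ipu discovery method", "numa node",
    "number of ipu-link segments", "number of replicas",
    "ipuof partition id", "partition sync type", "pci id",
    "pcie physical slot", "process start time",
    "reconfigurable partition", "remote buffers supported",
    "board serial number", "total board power", "user executable",
    "user name", "user process id", "gateway routing type",
    "ipu link segment id", "number of gcds", "firmware version",
    "ipuof server error", "host link correctable error count",
    "application host", "ipu error state",
    "parity error count threshold", "parity error threshold interval",
    "ipum software version", "ipu power",
    "link correctable error count (session)",
    "host link correctable error count (session)", "board variant",
    "gateway write combining",
})


def documented_only(attributes):
    """Return a copy of `attributes` without undocumented keys."""
    return {k: v for k, v in attributes.items() if k in DOCUMENTED_KEYS}


# Values taken from the example output earlier in this section; "target",
# "parity initialised" and "sysfs file id" do not appear in the table.
raw = {
    "target": "PCIe",
    "board type": "C2",
    "clock": "1300MHz",
    "parity initialised": "1",
    "sysfs file id": "0",
}

filtered = documented_only(raw)
print(sorted(filtered))  # ['board type', 'clock']
```

Filtering against an explicit allow-list means that attributes added or removed in a later release are dropped silently rather than being relied on by accident.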