20. Device attributes
Each IPU available in a system has a number of attributes associated with it.
These attributes describe both “fixed” aspects of the device, such as the board serial number and firmware version,
and properties that can change at runtime, such as clock speed, temperature or the name of the application currently using the IPU.
Some of the more useful device attributes are made available in a user-friendly format by gc-monitor.
It is also possible to display the raw attributes, formatted as key-value string pairs, by using gc-inventory or gc-info (when used with the --device-info command).
For example, here’s some output from a C600-based system:
$ gc-info -i -d 0
Device Info:
id: 0
target: PCIe
average board temp: 17.8 C
average die temp: 20.0 C
board ipu index: 0
board serial number: 0063.0063.822391
board type: C600
board variant: C600
clock: 20MHz
driver version: 1.3.0
firmware major version: 2
firmware minor version: 6
firmware patch version: 8
firmware version: 2.6.8
hexoatt active size (bytes): 0
hexoatt total size (bytes): 16911433728
hexopt active size (bytes): 0
hexopt total size (bytes): 268435456
icu bootloader version: 2.6.2
ipu architecture: ipu21
ipu error state: no errors
ipu power: N/A
ipu utilisation: 0.00%
ipu utilisation (session): 0.00%
link correctable error count: 0
link correctable error count (session): 0
link speed: 16.0 GT/s
link width: 8
numa node: 0
parity error count threshold: 4
parity error threshold interval: 7776000 seconds
parity initialised: 1
pci id: 0000:35:00.0
pcie physical slot: 3
remote buffers supported: 1
sysfs file id: 0
total board power: 12.6 W
user executable: gc-hosttraffictest
user name: someuser
user process id: 1262065
and here’s an example from an IPU-Machine:
$ gc-info -i -d 0
Device Info:
id: 0
target: Fabric
average board temp: 29.6 C
average die temp: 39.2 C
board ipu index: 3
board serial number: 0131.0002.8204521
board type: M2000
board variant: M2000
clock: 1330MHz
config domain: 94818642074432
driver version: 1.1.3
firmware major version: 2
firmware minor version: 5
firmware patch version: 9
firmware version: 2.5.9
gateway software version: 2.6.1
graph streaming: true
hexoatt active size (bytes): 0
hexoatt total size (bytes): 34082914304
hexopt active size (bytes): 0
hexopt total size (bytes): 268435456
host link correctable error count: 1238
host link correctable error count (session): 0
ipu architecture: ipu2
ipu error state: no errors
ipu power: 32.4 W
ipu utilisation: 0.00%
ipu utilisation (session): 0.00%
ipum software version: 2.6.0-028
ipuof host: 10.5.13.3
ipuof partition id: 2065-small-partition-reconfig
ipuof routing id: 0
ipuof routing type: DNC
ipuof server version: 1.11.0
link correctable error count: 0
link correctable error count (session): 0
link speed: 16.0 GT/s
link width: 8
number of replicas: 1
partition sync type: c2-compatible
pci id: 3
pcie physical slot: 3
reconfigurable partition: true
remote buffers supported: 1
user executable: gc-hosttraffictest
user name: exampleusername
user process id: 115783
Note: board type will be reported as ‘M2000’ for both IPU-M2000 and Bow-2000 products.
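
Because the raw attributes are printed as `key: value` lines, output like the samples above is easy to post-process. The following is a minimal sketch, not part of the Graphcore tools, which assumes the text has already been captured (for example from `gc-info -i -d 0`) and contains one attribute per line. The helper names `parse_device_info` and `leading_number` are hypothetical.

```python
import re


def parse_device_info(text):
    """Parse 'key: value' lines into a dict of strings.

    Assumes one attribute per line, as in the sample output.
    Shell prompt lines, blank lines and headers with no value
    (such as 'Device Info:') are skipped.
    """
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("$"):
            continue
        # Split on the first colon only, so values such as
        # '0000:35:00.0' are kept intact.
        key, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            continue
        attrs[key.strip()] = value
    return attrs


_NUMBER = re.compile(r"^\s*(-?\d+(?:\.\d+)?)")


def leading_number(value):
    """Return the number at the start of a value such as '12.6 W'.

    Returns None when the value does not start with a number (for
    example 'N/A'). Only meaningful for attributes known to be numeric.
    """
    match = _NUMBER.match(value)
    return float(match.group(1)) if match else None


sample = """\
Device Info:
id: 0
board type: C600
clock: 20MHz
ipu power: N/A
pci id: 0000:35:00.0
total board power: 12.6 W
"""

attrs = parse_device_info(sample)
print(attrs["pci id"])                               # 0000:35:00.0
print(leading_number(attrs["total board power"]))    # 12.6
print(leading_number(attrs["ipu power"]))            # None
```

All values stay as strings in the dictionary; converting them is left to the caller, because units and formats differ between attributes.
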
Attributes can be queried programmatically (from C++, Python or Go) with the gcipuinfo library.
20.1. List of supported attributes
| Attribute key string | Description |
|---|---|
| | Unique identifier of a single-IPU or multi-IPU device. |
| | Average temperature in degrees Celsius as read by the sensors on the board. |
| | Average temperature in degrees Celsius as read by IPU sensors. |
| | IPU number on board (0-1 for PCIe cards, 0-3 for IPU-Machines). |
| | The IPU board type ‘family’, for example C600 or M2000. Note: M2000 includes IPU-M2000 and Bow-2000. |
| | Current clock frequency. |
| | PCIe driver version, specified as a <major.minor.patch> triple. |
| | (IPUoF) IPU-Gateway software version, specified as a <major.minor.patch> triple. |
| | (IPUoF) Graph Compile Domain ID. |
| | Total remote-buffer memory available. |
| | Total remote-buffer memory in use by the IPU. |
| | Total host exchange memory available. |
| | Total host exchange memory in use by the IPU. |
| | IPU hardware architecture version. |
| | (IPUoF) IP address of IPU-Gateway. |
| | (IPUoF) Fabric server version. |
| | Percentage of time spent waiting for one or more IPU syncs, measured in the last second. |
| | Percentage of time spent waiting for one or more IPU syncs since the HSPs were set up. |
| | IPU-Link correctable error count. |
| | (PCIe) PCIe link speed available. |
| | (PCIe) Number of PCIe lanes available. |
| | Maximum active code size (bytes). |
| | Maximum active data size (bytes). |
| | Maximum active stack size (bytes). |
| | Multi-IPU device the IPU belongs to. |
| | Method used to discover multi-IPU groups. |
| | NUMA node the IPU is on. |
| | (IPUoF) Number of IPU-Link segments. |
| | (IPUoF) Number of replicas in the partition. |
| | (IPUoF) Partition ID. |
| | (IPUoF) Sync configuration type, for example ‘c2-compatible’. |
| | PCIe device identifier. |
| | PCIe physical slot. |
| | The start time of the process currently using the IPU. |
| | (IPUoF) Set to 1 if the IPU is part of a reconfigurable partition. |
| | Set to 1 if remote buffers are supported. |
| | Serial number of the board. |
| | Total current power consumption as read by board-level sensors. Not used on IPU-Machines. |
| | The name of the process using the device. |
| | The username of the user using the device. |
| | The process ID of the process using the device. |
| | (IPUoF) GW-Link routing type. |
| | (IPUoF) Identifier of IPU-Link segment. |
| | Number of Graph Compile Domains. |
| | ICU firmware version, specified as a <major.minor.patch> triple. In development builds, this will be suffixed with branch and build information. |
| | (IPUoF) Set if an error occurred while attempting to communicate with the IPUoF server (a ‘connection’ error), or if the IPUoF server was unable to use the device (a ‘device’ error). |
| | (PCIe) Host-Link correctable error count. |
| | (IPUoF) IP address of the headnode where the application using this IPU is running. |
| | Error state of the IPU. Set to ‘ipu memory failure’ if the tile parity error thresholds have been exceeded. |
| | Threshold for number of parity errors to promote to an unrecoverable error. |
| | Threshold in seconds at which ‘num parity errors’ are promoted to an uncorrectable error. |
| | (IPUoF) IPU-M software version. |
| | Power consumption of a single IPU. Only available on IPU-Machines. |
| | IPU-Link correctable error count since device was last reset. |
| | (PCIe) Host-Link correctable error count since device was last reset. |
| | IPU board model name. This will be identical to BoardType if this product only has a single variant. |
| | Gateway write combining status. |
| | Set to 1 if the secondary interface is supported. |
| | ICU bootloader version, specified as a <major.minor.patch> triple. In development builds, this will be suffixed with branch and build information. |
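
Several of the version attributes above are described as a <major.minor.patch> triple that may carry extra branch and build information in development builds. The following is a minimal sketch of comparing such values. It assumes the triple always comes first in the string and that any suffix can be ignored for comparison purposes. `parse_version_triple` is a hypothetical helper, not part of any Graphcore library.

```python
import re

# Matches the leading <major.minor.patch> part; anything after it
# (for example a development-build suffix) is ignored.
_TRIPLE = re.compile(r"^\s*(\d+)\.(\d+)\.(\d+)")


def parse_version_triple(value):
    """Return (major, minor, patch) as integers.

    Raises ValueError if the value does not start with a
    <major.minor.patch> triple.
    """
    match = _TRIPLE.match(value)
    if match is None:
        raise ValueError(f"not a <major.minor.patch> version: {value!r}")
    return tuple(int(part) for part in match.groups())


# Tuples compare element by element, so version checks are simple.
print(parse_version_triple("2.6.8") >= (2, 6, 0))   # True
print(parse_version_triple("2.6.0-028"))            # (2, 6, 0)
```

Comparing integer tuples rather than strings avoids ordering mistakes such as treating "2.10.0" as older than "2.9.0".
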
Note that gc-info and gc-inventory will also display some attributes which are not described in the table above.
These attributes should be ignored: they are either deprecated or only used internally.
They do not have any useful meaning to end users and may be removed in future software releases.
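
Since undocumented attributes may disappear in a later release, scripts that consume this output are more robust if they read only the keys they need and tolerate missing ones. One possible approach is sketched below. `select_attributes` is a hypothetical helper, and both the keys requested and the `some internal attribute` entry are purely illustrative.

```python
def select_attributes(attrs, wanted):
    """Return only the wanted keys; keys that are absent map to None."""
    return {key: attrs.get(key) for key in wanted}


# 'some internal attribute' stands in for an undocumented key.
raw = {"id": "0", "board type": "C600", "some internal attribute": "42"}

picked = select_attributes(raw, ["id", "board type", "ipu power"])
print(picked)   # {'id': '0', 'board type': 'C600', 'ipu power': None}
```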