6. API reference
- 
enum DeviceDiscoveryMode
- Values: - 
enumerator DiscoverActivePartitionIPUs
- Discover all devices in the current partition. - This is the default mode. 
 - 
enumerator DiscoverAllPartitionIPUs
- Discover all devices across all partitions. 
 
- 
class gcipuinfo
- Application Event Record key names - 
static constexpr const char *keyRecordPath = "event record path"
- The path the record was stored at. 
 - 
static constexpr const char *keyTimestamp = "timestamp"
- When the event was recorded. 
 - 
static constexpr const char *keySeverity = "severity"
- Severity level of the event; one of the EventSeverity values. 
 - 
static constexpr const char *keyCommandLine = "command line"
- Command line of the process that recorded the event. 
 - 
static constexpr const char *keyPid = "pid"
- Process id of the process that recorded the event. 
 - 
static constexpr const char *keyAttachedIPUs = "attached ipus"
- List of the device IDs of any attached IPUs. 
 - 
static constexpr const char *keySpecificIPUs = "specific ipus"
- List of the device IDs of any specific IPUs named in this event. 
 - 
static constexpr const char *keyAttachedIPUHosts = "attached ipu hosts"
- List of all currently attached IPU-Machine hostnames. 
 - 
static constexpr const char *keySpecificIPUHosts = "specific ipu hosts"
- List of any IPU-Machine hostnames associated with the devices named in “specific ipus”. 
 - 
static constexpr const char *keyPartition = "partition"
- The partition in use by the application, if applicable. 
 - 
static constexpr const char *keyDescription = "description"
- A description of the event. 
 - Device attribute query methods. - 
bool updateData()
- Updates the device attribute data. - Attribute data is otherwise only queried when the class is constructed, so call this function whenever you need current values. Alternatively, call setUpdateMode() to have the data refreshed on every API call. - Returns
- true if the device attributes were successfully updated, false if an error occurred.
 
 - 
void setUpdateMode(bool autoUpdate)
- Sets the attribute update mode. - By default, the attribute data is queried once, on construction of this class. This function selects between that default behaviour and refreshing the data on every API call. - Parameters
- autoUpdate – true to enable auto-update mode, false to use the original data.
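As an alternative to auto-update mode, the data can be refreshed explicitly just before each read. The sketch below wraps that pattern in a hypothetical helper, readFresh, which is not part of the library. It is written as a template so that it only assumes the two member functions documented in this section, and it treats a false return from updateData() as an error, which is one possible policy rather than the required one.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of gcipuinfo): refresh the cached attribute
// data, then read one attribute for every device.
// `Info` is expected to be gcipuinfo, or any stand-in type that provides
// updateData() and getNamedAttributeForAll() as documented here.
template <typename Info>
std::vector<std::string> readFresh(Info &info, const std::string &key) {
  // updateData() returns false if an error occurred; this sketch chooses to
  // surface that as an exception rather than return stale values.
  if (!info.updateData()) {
    throw std::runtime_error("updateData() reported an error");
  }
  return info.getNamedAttributeForAll(key);
}
```

With a gcipuinfo instance named info, a call such as readFresh(info, "average die temp") would refresh the data and then return the die temperature attribute of every device.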
 
 - 
std::vector<std::map<std::string, std::string>> getDevices()
- Get all attributes, for all devices. - Returns
- A std::vector containing a std::map for each device.
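Because each device is returned as a plain map of attribute names to string values, filtering needs no library support. The following is a minimal sketch of selecting devices by one attribute value. The helper name deviceIdsWhere and the alias DeviceAttributes are hypothetical; the "id" key is the text of IPUAttributeLabels::DeviceId listed in section 6.1.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical alias for the per-device map returned by getDevices().
using DeviceAttributes = std::map<std::string, std::string>;

// Hypothetical helper: collect the "id" attribute of every device whose
// attribute `key` is exactly equal to `value`.
std::vector<std::string> deviceIdsWhere(
    const std::vector<DeviceAttributes> &devices,
    const std::string &key, const std::string &value) {
  std::vector<std::string> ids;
  for (const auto &device : devices) {
    const auto attribute = device.find(key);
    if (attribute == device.end() || attribute->second != value) {
      continue;  // attribute missing or different: not a match
    }
    const auto id = device.find("id");
    if (id != device.end()) {
      ids.push_back(id->second);  // devices without an "id" entry are skipped
    }
  }
  return ids;
}
```

For example, deviceIdsWhere(info.getDevices(), "board type", "M2000") would list the IDs of the devices reporting that board type, assuming info is a gcipuinfo instance.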
 
 - 
std::string getDevicesAsJSON()
- Return a JSON representation of all device attributes. - Returns
- A JSON-formatted tree of devices and device attributes, as a std::string.
 
 - 
std::map<std::string, std::string> getAttributesForDevice(unsigned deviceId)
- Get all attributes for a specific device. - Parameters
- deviceId – Device ID to query. 
- Returns
- A std::map of attribute names to values for the device.
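Not every attribute applies to every device (several in section 6.1 are marked as PCIe-only or IPUoF-only), so code that reads the returned map may want a lookup that tolerates missing keys. The sketch below shows one way to do that. Both helper names are hypothetical, and treating the string "1" as "set" follows the attribute descriptions that read "Set to 1 if ...".

```cpp
#include <map>
#include <string>

// Hypothetical helper: return the value stored under `key`, or `fallback`
// if the map has no such entry.
std::string attributeOr(const std::map<std::string, std::string> &attributes,
                        const std::string &key, const std::string &fallback) {
  const auto it = attributes.find(key);
  return it != attributes.end() ? it->second : fallback;
}

// Hypothetical helper for the boolean-style attributes described as
// "Set to 1 if ...": true only when the value is exactly "1".
bool isFlagSet(const std::map<std::string, std::string> &attributes,
               const std::string &key) {
  return attributeOr(attributes, key, "") == "1";
}
```

Given the map from getAttributesForDevice(), isFlagSet(attributes, "remote buffers supported") would then report whether remote buffers are supported on that device.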
 
 - 
std::string getNamedAttributeForDevice(unsigned deviceId, const std::string key)
- Get a specific attribute for a specific device. - Parameters
- deviceId – Device ID to query. 
- key – Device attribute name. 
 
- Returns
- The attribute. 
 
 - 
std::vector<std::string> getNamedAttributeForAll(const std::string key)
- Get a specific attribute for all devices. - Parameters
- key – Device attribute name. 
- Returns
- A std::vector containing the attribute for all devices.
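A vector holding one value per device lends itself to simple summaries. As an illustration, the sketch below counts how many devices report each distinct value. The helper name countByValue is hypothetical, and nothing is assumed about the order of the returned vector.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: count how many times each distinct value occurs in
// the vector returned by getNamedAttributeForAll().
std::map<std::string, std::size_t> countByValue(
    const std::vector<std::string> &values) {
  std::map<std::string, std::size_t> counts;
  for (const auto &value : values) {
    ++counts[value];  // operator[] value-initialises a new entry to 0
  }
  return counts;
}
```

For example, countByValue(info.getNamedAttributeForAll("board type")) would give the number of devices per board type, assuming info is a gcipuinfo instance.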
 
 - Application Event Record query methods. - 
std::string getLastAppEventRecordAsJSON(EventSeverity minimumSeverity = EventSevNone, const std::string &eventRecordPath = "")
- Return a JSON-formatted string describing the last recorded application event. - If there is no recorded event, or the event has an EventSeverity below minimumSeverity, an empty JSON dictionary “{}” is returned. - Parameters
- eventRecordPath – The path to look in for an event record. If this is empty, falls back to the path in IPU_APP_EVENT_RECORD_PATH. If neither is set, throws an exception.
 
 - 
std::map<std::string, std::string> getLastAppEventRecord(EventSeverity minimumSeverity = EventSevNone, const std::string &eventRecordPath = "")
- Return a map describing the last recorded application event. - If there is no recorded event, or the event has an EventSeverity below minimumSeverity, an empty map is returned. - Parameters
- eventRecordPath – The path to look in for an event record. If this is empty, falls back to the path in IPU_APP_EVENT_RECORD_PATH. If neither is set, throws an exception.
 
 - 
EventSeverity getLastAppEventRecordSeverity(const std::string &eventRecordPath = "")
- Return the EventSeverity of the last event in the event record. - If there is no recorded event, EventSevNone is returned. - Parameters
- eventRecordPath – The path to look in for an event record. If this is empty, falls back to the path in IPU_APP_EVENT_RECORD_PATH. If neither is set, throws an exception.
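The map form of an event record is keyed by the strings listed under "Application Event Record key names". The sketch below turns such a map into a one-line summary. The helper name describeEvent and the output layout are hypothetical; the three key strings are copied from the key names documented in this section, and an empty map is treated as "no matching event" because that is what the map-returning query yields when nothing qualifies.

```cpp
#include <map>
#include <string>

// Hypothetical helper: build "timestamp | severity | description" from an
// event record map, skipping any of the three fields that are absent.
std::string describeEvent(const std::map<std::string, std::string> &record) {
  if (record.empty()) {
    return "no matching event";  // no event, or below the minimum severity
  }
  // Key strings copied from keyTimestamp, keySeverity and keyDescription.
  const char *const keys[] = {"timestamp", "severity", "description"};
  std::string line;
  for (const char *key : keys) {
    const auto it = record.find(key);
    if (it == record.end()) {
      continue;
    }
    if (!line.empty()) {
      line += " | ";
    }
    line += it->second;
  }
  return line;
}
```

Passing the map returned by getLastAppEventRecord() to describeEvent() would give a compact line suitable for logging.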
 
 - Device health check methods. - 
std::string checkHealthOfDevices(unsigned timeoutMS, bool checkActiveIPUs = false)
- Run basic ‘health checks’ on all currently configured IPUs. - If all devices appear to be operating normally, returns an empty JSON object:
  {}
- If any malfunctioning devices were discovered, returns a JSON tree identifying the affected IPUs and the IPU-Machine host and partition they belong to, for example:
  {
    "hosts": {
      "10.1.5.10": [
        { "error": "device", "id": "2", "partition": "p1" },
        { "error": "connection", "id": "3", "partition": "p1" }
      ]
    }
  }
- There are two error types defined: - “connection” - the client was unable to communicate with the IPUoF server (either because of network issues or server error) within the specified timeout. 
- “device” - the IPUoF server discovered a problem with the IPU or RNIC device. 
 - Each IPU health check must complete within timeoutMS, or else a “connection” error will be recorded. - By default, devices which are currently in use by applications are not checked, unless checkActiveIPUs is true. 
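Since a healthy system is reported as an empty JSON object, a caller that only needs a pass/fail answer does not have to parse the result. The sketch below is one way to make that check. The helper name reportsNoFaults is hypothetical, and ignoring whitespace is a defensive assumption in case the empty object is not serialised as exactly "{}".

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: true if the string returned by checkHealthOfDevices()
// is an empty JSON object, ignoring any whitespace.
bool reportsNoFaults(const std::string &healthJson) {
  std::string compact;
  for (unsigned char c : healthJson) {
    if (!std::isspace(c)) {
      compact += static_cast<char>(c);
    }
  }
  return compact == "{}";
}
```

Any other result would need a JSON parser to extract the affected hosts, IDs and partitions from the "hosts" tree.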
 - Public Types - 
enum EventSeverity
- Application Event Record severity level. - The severity level of an application event indicates how serious the event is and suggests potential ways of resolving the issue. - Values: - 
enumerator EventSevNone = 0
 - 
enumerator EventSevWarning = 1
- An event which may indicate a system problem. - text: “warning” 
 - 
enumerator EventSevApplicationError = 2
- An error likely in the application code or configuration. - poplar::poplar_error, poplar::application_runtime_error - text: “application” 
 - 
enumerator EventSevUndeterminedError = 3
- It is not known if this is a system error, or an error in the application code or configuration. - poplar::unknown_runtime_error - text: “undetermined” 
 - 
enumerator EventSevRequiresUserReset = 4
- The error may be resolvable by an IPU reset, a partition reset (for Pod systems) or a link reset (for non-Pod systems). - poplar::recoverable_runtime_error + IPU_RESET or PARTITION_RESET or LINK_RESET - text: “requires_user_reset” 
 - 
enumerator EventSevRequiresSystemReset = 5
- The error may be resolvable by a full reboot of the IPU-M system or Poplar server. - poplar::recoverable_runtime_error + FULL_RESET - text: “requires_system_reset” 
 - 
enumerator EventSevNonRecoverable = 6
- The error may require admin-level system reconfiguration or hardware replacement. - poplar::unrecoverable_runtime_error - text: “nonrecoverable” 
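The numeric values and text labels above can be summarised in code. The sketch below mirrors the enumeration in a separate namespace, so it does not depend on the library header, and maps each level to its documented text. The namespace sketch and all names inside it are hypothetical. No text is documented for EventSevNone, so an empty string is returned for it, and the comparison helper assumes that "below minimumSeverity" refers to the numeric ordering of the values.

```cpp
#include <string>

namespace sketch {

// Stand-in for gcipuinfo's EventSeverity, using the documented values.
enum Severity {
  None = 0,
  Warning = 1,
  ApplicationError = 2,
  UndeterminedError = 3,
  RequiresUserReset = 4,
  RequiresSystemReset = 5,
  NonRecoverable = 6
};

// Text label documented for each severity level.
inline std::string severityText(Severity severity) {
  switch (severity) {
    case Warning: return "warning";
    case ApplicationError: return "application";
    case UndeterminedError: return "undetermined";
    case RequiresUserReset: return "requires_user_reset";
    case RequiresSystemReset: return "requires_system_reset";
    case NonRecoverable: return "nonrecoverable";
    case None: break;  // no text label is documented for this value
  }
  return "";
}

// Assumption: an event is reported when its level is numerically at or
// above the requested minimum.
inline bool meetsMinimum(Severity event, Severity minimum) {
  return event >= minimum;
}

// True for the two levels described as possibly resolvable by a reset.
inline bool mayBeResolvedByReset(Severity severity) {
  return severity == RequiresUserReset || severity == RequiresSystemReset;
}

}  // namespace sketch
```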
 
 - Public Functions - 
gcipuinfo(DeviceDiscoveryMode = DiscoverActivePartitionIPUs)
 
6.1. Attribute labels
- 
const std::string IPUAttributeLabels::DeviceId
- Unique identifier of a single-IPU or multi-IPU device. - text: “id” 
- 
const std::string IPUAttributeLabels::AverageBoardTemp
- Average temperature in degrees Celsius as read by the sensors on the board. - text: “average board temp” 
- 
const std::string IPUAttributeLabels::AverageDieTemp
- Average temperature in degrees Celsius as read by IPU sensors. - text: “average die temp” 
- 
const std::string IPUAttributeLabels::BoardIpuIndex
- IPU number on board (0-1 for PCIe cards, 0-3 for IPU-Machines). - text: “board ipu index” 
- 
const std::string IPUAttributeLabels::BoardType
- The IPU board type ‘family’, for example C600 or M2000. - Note: M2000 includes IPU-M2000 and Bow-2000. - text: “board type” 
- 
const std::string IPUAttributeLabels::ClockFrequency
- Current clock frequency. - text: “clock” 
- 
const std::string IPUAttributeLabels::DriverVersion
- PCIe driver version, specified as a <major.minor.patch> triple. - text: “driver version” 
- 
const std::string IPUAttributeLabels::GatewaySoftwareVersion
- (IPUoF) IPU-Gateway software version, specified as a <major.minor.patch> triple. - text: “gateway software version” 
- 
const std::string IPUAttributeLabels::GcdId
- (IPUoF) Graphcore Compile Domain ID. - text: “gcd id” 
- 
const std::string IPUAttributeLabels::HexoattTotalSize
- Total remote-buffer memory available. - text: “hexoatt total size (bytes)” 
- 
const std::string IPUAttributeLabels::HexoattActiveSize
- Total remote-buffer memory in use by the IPU. - text: “hexoatt active size (bytes)” 
- 
const std::string IPUAttributeLabels::HexoptTotalSize
- Total host exchange memory available. - text: “hexopt total size (bytes)” 
- 
const std::string IPUAttributeLabels::HexoptActiveSize
- Total host exchange memory in use by the IPU. - text: “hexopt active size (bytes)” 
- 
const std::string IPUAttributeLabels::IpuArchitecture
- IPU hardware architecture version. - text: “ipu architecture” 
- 
const std::string IPUAttributeLabels::IpuofHost
- (IPUoF) IP address of IPU-Gateway. - text: “ipuof host” 
- 
const std::string IPUAttributeLabels::IpuofServerVersion
- (IPUoF) Fabric server version. - text: “ipuof server version” 
- 
const std::string IPUAttributeLabels::IpuUtilisation
- Percentage of time spent waiting for one or more IPU syncs, measured in the last second. - text: “ipu utilisation” 
- 
const std::string IPUAttributeLabels::IpuUtilisationSession
- Percentage of time spent waiting for one or more IPU syncs since the HSPs were set up. - text: “ipu utilisation (session)” 
- 
const std::string IPUAttributeLabels::LinkCorrectableErrorCount
- IPU Link correctable error count. - text: “link correctable error count” 
- 
const std::string IPUAttributeLabels::LinkSpeed
- (PCIe) PCIe link speed available. - text: “link speed” 
- 
const std::string IPUAttributeLabels::LinkWidth
- (PCIe) Number of PCIe lanes available. - text: “link width” 
- 
const std::string IPUAttributeLabels::MaxActiveCodeSize
- Maximum active code size (bytes). - text: “max active code size (bytes)” 
- 
const std::string IPUAttributeLabels::MaxActiveDataSize
- Maximum active data size (bytes). - text: “max active data size (bytes)” 
- 
const std::string IPUAttributeLabels::MaxActiveStackSize
- Maximum active stack size (bytes). - text: “max active stack size (bytes)” 
- 
const std::string IPUAttributeLabels::MultiIpuDeviceId
- Multi-IPU device the IPU belongs to. - text: “multi-ipu device id” 
- 
const std::string IPUAttributeLabels::MultiIpuDiscoveryMethod
- Method used to discover multi-IPU groups. - text: “multi-ipu discovery method” 
- 
const std::string IPUAttributeLabels::NumaNode
- NUMA node the IPU is on. - text: “numa node” 
- 
const std::string IPUAttributeLabels::NumIpuLinkSegments
- (IPUoF) Number of IPU-Link segments. - text: “number of ipu-link segments” 
- 
const std::string IPUAttributeLabels::NumReplicas
- (IPUoF) Number of replicas in the partition. - text: “number of replicas” 
- 
const std::string IPUAttributeLabels::PartitionId
- (IPUoF) Partition ID. - text: “ipuof partition id” 
- 
const std::string IPUAttributeLabels::PartitionSyncType
- (IPUoF) Sync configuration type, for example ‘c2-compatible’. - text: “partition sync type” 
- 
const std::string IPUAttributeLabels::PciId
- PCIe device identifier. - text: “pci id” 
- 
const std::string IPUAttributeLabels::PhysicalSlot
- PCIe physical slot. - text: “pcie physical slot” 
- 
const std::string IPUAttributeLabels::ProcessStartTime
- The start time of the process currently using the IPU. - text: “process start time” 
- 
const std::string IPUAttributeLabels::ReconfigurablePartition
- (IPUoF) Set to 1 if the IPU is part of a reconfigurable partition. - text: “reconfigurable partition” 
- 
const std::string IPUAttributeLabels::RemoteBuffersSupported
- Set to 1 if remote buffers are supported. - text: “remote buffers supported” 
- 
const std::string IPUAttributeLabels::SerialNumber
- Serial number of the board. - text: “board serial number” 
- 
const std::string IPUAttributeLabels::TotalBoardPower
- Total current power consumption as read by board-level sensors. - Not used on IPU-Machines. - text: “total board power” 
- 
const std::string IPUAttributeLabels::UserExecutable
- The name of the process using the device. - text: “user executable” 
- 
const std::string IPUAttributeLabels::UserName
- The username of the user using the device. - text: “user name” 
- 
const std::string IPUAttributeLabels::UserProcessId
- The process ID of the process using the device. - text: “user process id” 
- 
const std::string IPUAttributeLabels::GatewayRoutingType
- (IPUoF) GW-Link routing type. - text: “gateway routing type” 
- 
const std::string IPUAttributeLabels::IpuLinkSegmentId
- (IPUoF) Identifier of IPU-Link segment. - text: “ipu link segment id” 
- 
const std::string IPUAttributeLabels::NumGcds
- Number of Graph Compile Domains. - text: “number of gcds” 
- 
const std::string IPUAttributeLabels::FirmwareVersion
- ICU firmware version, specified as a <major.minor.patch> triple. - In development builds, this will be suffixed with branch and build information. - text: “firmware version” 
- 
const std::string IPUAttributeLabels::IpuofServerError
- (IPUoF) Set if an error occurred while attempting to communicate with the IPUoF server (a ‘connection’ error), or if the IPUoF server was unable to use the device (a ‘device’ error). - text: “ipuof server error” 
- 
const std::string IPUAttributeLabels::HostLinkCorrectableErrorCount
- (PCIe) Host Link correctable error count. - text: “host link correctable error count” 
- 
const std::string IPUAttributeLabels::ApplicationHost
- (IPUoF) IP address of the headnode where the application using this IPU is running. - text: “application host” 
- 
const std::string IPUAttributeLabels::IpuErrorState
- Error state of the IPU. - Set to ‘ipu memory failure’ if the tile parity error thresholds have been exceeded. - text: “ipu error state” 
- 
const std::string IPUAttributeLabels::ParityErrorCountThreshold
- Threshold for the number of parity errors at which they are promoted to an unrecoverable error. - text: “parity error count threshold” 
- 
const std::string IPUAttributeLabels::ParityErrorThresholdInterval
- Threshold in seconds at which ‘num parity errors’ are promoted to an uncorrectable error. - text: “parity error threshold interval” 
- 
const std::string IPUAttributeLabels::IpumSoftwareVersion
- (IPUoF) IPU-M software version. - text: “ipum software version” 
- 
const std::string IPUAttributeLabels::IpuPower
- Power consumption of a single IPU. - Only available on IPU-Machines. - text: “ipu power” 
- 
const std::string IPUAttributeLabels::LinkCorrectableErrorCountSession
- IPU Link correctable error count since device was last reset. - text: “link correctable error count (session)” 
- 
const std::string IPUAttributeLabels::HostLinkCorrectableErrorCountSession
- (PCIe) Host-Link correctable error count since device was last reset. - text: “host link correctable error count (session)” 
- 
const std::string IPUAttributeLabels::BoardVariant
- IPU board model name. - This will be identical to BoardType if this product only has a single variant. - text: “board variant” 
- 
const std::string IPUAttributeLabels::GatewayWriteCombining
- Gateway write combining status. - text: “gateway write combining” 
- 
const std::string IPUAttributeLabels::SecondaryPcieInterfaceSupported
- Set to 1 if the secondary interface is supported. - text: “secondary pcie interface supported” 
- 
const std::string IPUAttributeLabels::ICUBootloaderVersion
- ICU bootloader version, specified as a <major.minor.patch> triple. - In development builds, this will be suffixed with branch and build information. - text: “icu bootloader version”
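All attribute values are delivered as strings, so numeric attributes have to be converted before they can be combined. As an illustration, the sketch below derives the percentage of remote-buffer memory in use from the two hexoatt size attributes. The helper name remoteBufferUsagePercent is hypothetical, and it assumes that both values are plain decimal byte counts, which this reference does not state explicitly.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper: compute active/total remote-buffer memory as a
// percentage. Returns false (leaving percentOut untouched) if either
// attribute is missing, does not start with a number, or the total is zero.
bool remoteBufferUsagePercent(
    const std::map<std::string, std::string> &attributes,
    double &percentOut) {
  // Key strings are the texts of HexoattTotalSize and HexoattActiveSize.
  const auto total = attributes.find("hexoatt total size (bytes)");
  const auto active = attributes.find("hexoatt active size (bytes)");
  if (total == attributes.end() || active == attributes.end()) {
    return false;
  }
  try {
    // Assumption: values are decimal integers such as "1048576".
    const unsigned long long totalBytes = std::stoull(total->second);
    const unsigned long long activeBytes = std::stoull(active->second);
    if (totalBytes == 0) {
      return false;  // avoid dividing by zero
    }
    percentOut = 100.0 * static_cast<double>(activeBytes) /
                 static_cast<double>(totalBytes);
    return true;
  } catch (const std::logic_error &) {
    // std::stoull throws std::invalid_argument or std::out_of_range,
    // both of which derive from std::logic_error.
    return false;
  }
}
```

The same pattern would apply to the hexopt (host exchange memory) attributes by substituting their key strings.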