3. C600 register map

3.1. API version history

When commands are added to or removed from the C600 firmware, the API version is incremented. The following tables indicate the API version at which each command or field was added, so that the user can identify which information to expect. The API version itself can be read with command 0x03; see Section 3.3.1.3, API version, for details.

| API Version | ICU Version |
|---|---|
| 1 | 2.6.3 |
| 2 | 2.6.7 |

3.2. Command list

A full listing of all SMBus commands is given in Table 3.1. More details about each of the commands can be found in Section 3.3, Command descriptions.

Table 3.1 SMBus commands

| ID | Name | Bytes (r/w) | Protocol | API |
|---|---|---|---|---|
| 0x01 | Vendor ID | 2 | Read Word | 1 |
| 0x02 | Product ID | 2 | Read Word | 1 |
| 0x03 | API version | 2 | Read Word | 1 |
| 0x04 | ICU major version | 2 | Read Word | 1 |
| 0x05 | ICU minor version | 2 | Read Word | 1 |
| 0x06 | ICU patch version | 2 | Read Word | 1 |
| 0x07 | ICU version string | 31 / 1 | Block Process Call | 1 |
| 0x08 | Board public ID | 24 | Block Read | 1 |
| 0x09 | Board revision | 22 | Block Read | 1 |
| 0x0A | Board PCB information | 2 | Read Word | 2 |
| 0x10 | System uptime | 8 | Block Read | 1 |
| 0x12 | POST status | 4 | Block Read | 1 |
| 0x13 | IPU errors | 24 / 1 | Block Read or Write | 1 |
| 0x15 | dmesg update | 2 | Write Word | 2 |
| 0x16 | dmesg read | 30 / 2 | Block Process Call | 2 |
| 0x17 | IPU status | 2 | Read Word | 2 |
| 0x20 | Clock frequency | 2 | Read Word | 1 |
| 0x22 | Frequency limiter | 2 / 2 | Block Read or Write | 1 |
| 0x23 | Board TDP limit | 2 / 2 | Block Read or Write | 2 |
| 0x31 | Board power consumption | 2 | Read Word | 2 |
| 0x40 | Temperatures - max board | 2 | Read Word | 1 |
| 0x41 | Temperatures - group A | 24 | Block Read | 1 |
| 0x42 | Max temperatures - group A | 24 | Block Read | 1 |
| 0x43 | Temperature thresholds | 24 | Block Read | 1 |
| 0x45 | Thermal status | 6 | Block Read | 1 |
| 0x80 | Driver error state | 4 | Block Read | 1 |
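The API column can be used to decide whether a command is worth issuing on a given card. The following is a minimal sketch, not part of the firmware or any Graphcore library: the names `MIN_API_VERSION` and `is_supported` are hypothetical, and the sketch only models commands being added in a later API version, not removed.

```python
# Minimum API version per command ID, transcribed from Table 3.1.
# The dictionary and function names are illustrative assumptions.
MIN_API_VERSION = {
    0x01: 1, 0x02: 1, 0x03: 1, 0x04: 1, 0x05: 1, 0x06: 1, 0x07: 1,
    0x08: 1, 0x09: 1, 0x0A: 2,
    0x10: 1, 0x12: 1, 0x13: 1, 0x15: 2, 0x16: 2, 0x17: 2,
    0x20: 1, 0x22: 1, 0x23: 2,
    0x31: 2,
    0x40: 1, 0x41: 1, 0x42: 1, 0x43: 1, 0x45: 1,
    0x80: 1,
}


def is_supported(command_id: int, api_version: int) -> bool:
    """Return True if the command is listed and was added at or before api_version."""
    required = MIN_API_VERSION.get(command_id)
    return required is not None and api_version >= required
```

A caller would typically read the API version once with command 0x03 and pass it as `api_version` before attempting any command marked with API 2.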

3.3. Command descriptions

3.3.1. System information

Commands 0x00 - 0x0F are reserved for system information. These values should be constant once the ICU has booted.

| ID | Name | Bytes (r/w) | Protocol | API |
|---|---|---|---|---|
| 0x01 | Vendor ID | 2 | Read Word | 1 |
| 0x02 | Product ID | 2 | Read Word | 1 |
| 0x03 | API version | 2 | Read Word | 1 |
| 0x04 | ICU major version | 2 | Read Word | 1 |
| 0x05 | ICU minor version | 2 | Read Word | 1 |
| 0x06 | ICU patch version | 2 | Read Word | 1 |
| 0x07 | ICU version string | 31 / 1 | Block Process Call | 1 |
| 0x08 | Board public ID | 24 | Block Read | 1 |
| 0x09 | Board revision | 22 | Block Read | 1 |
| 0x0A | Board PCB information | 2 | Read Word | 2 |

3.3.1.1. Vendor ID (0x01)

Read Word command 0x01 returns a fixed data word containing the Graphcore PCIe vendor ID. It can be used to verify that the I2C device is a Graphcore ICU.

Response:

| Byte | Name | Format | Value | API |
|---|---|---|---|---|
| 1-2 | Vendor ID | uint16_t | 0x1d95 | 1 |

3.3.1.2. Product ID (0x02)

Read Word command 0x02 returns a fixed data word containing the PCIe device ID of the C600 card. It can be used to verify that the I2C device is a Graphcore C600 PCIe card.

Response:

| Byte | Name | Format | Value | API |
|---|---|---|---|---|
| 1-2 | Product ID | uint16_t | 0x0600 | 1 |

3.3.1.3. API version (0x03)

Read word command 0x03 returns a data word containing the currently supported version of the SMBus command API - see API version history.

Response:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1-2 | API version | uint16_t | | 1 |

3.3.1.4. ICU major version (0x04)

Read Word command 0x04 returns a data word containing the running firmware’s major version number.

Response:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1-2 | ICU major version | uint16_t | | 1 |

3.3.1.5. ICU minor version (0x05)

Read Word command 0x05 returns a data word containing the running firmware’s minor version number.

Response:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1-2 | ICU minor version | uint16_t | | 1 |

3.3.1.6. ICU patch version (0x06)

Read word command 0x06 returns a data word containing the running firmware’s patch version number.

Response:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1-2 | ICU patch version | uint16_t | | 1 |

3.3.1.7. ICU version string (0x07)

Block Process Call command 0x07 returns up to 31 bytes of the running firmware’s null-terminated full version string, starting at the index supplied. If the returned data is not null-terminated, more characters can be read by repeating the command with a larger index argument in the block write command, and the results concatenated to build up the full string.

Block write command:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1 | Index | uint8_t | | 1 |

Block read response:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1-31 | Version string | ASCII | | 1 |
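The chunked read described for command 0x07 can be sketched as a loop that advances the index until a null terminator appears. This is an illustrative sketch only: `process_call` is a hypothetical callable standing in for whatever SMBus Block Process Call routine the host provides, and the sketch assumes that the index is a byte offset into the string.

```python
from typing import Callable

CMD_ICU_VERSION_STRING = 0x07
MAX_INDEX = 0xFF  # the index argument is a uint8_t


def read_version_string(process_call: Callable[[int, bytes], bytes]) -> str:
    """Assemble the full version string from successive chunks.

    process_call(command, payload) is a hypothetical transport hook that
    performs one Block Process Call and returns the block read bytes.
    """
    chars = bytearray()
    index = 0
    while index <= MAX_INDEX:
        chunk = process_call(CMD_ICU_VERSION_STRING, bytes([index]))
        if not chunk:
            break  # nothing returned, stop rather than loop forever
        end = chunk.find(b"\x00")
        if end != -1:
            chars += chunk[:end]  # terminator found, string is complete
            break
        chars += chunk
        index += len(chunk)  # assumption: index counts bytes already read
    return chars.decode("ascii", errors="replace")
```

Passing the transport in as a callable keeps the loop testable without hardware; a real host would wrap its own SMBus driver call in that callable.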

3.3.1.8. Board public ID (0x08)

Block Read command 0x08 returns a 24-byte null-terminated string containing the C600 card’s human-readable board description.

Response:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1-24 | Board public ID | ASCII | | 1 |

3.3.1.9. Board revision (0x09)

Block Read command 0x09 returns a 22-byte null-terminated string containing the C600 card’s board identification string, also known as the card serial number.

Response:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1-22 | Board revision | ASCII | | 1 |

3.3.1.10. Board PCB information (0x0A)

Read Word command 0x0A returns a data word containing the PCB and BOM identification information for the C600 card.

Response:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1 | PCB identifier | uint8_t | | 2 |
| 2 | BOM identifier | uint8_t | | 2 |

3.3.2. Active system state

Commands 0x10 - 0x1F are reserved for the active system state. These values may change as the system is running.

| ID | Name | Bytes (r/w) | Protocol | API |
|---|---|---|---|---|
| 0x10 | System uptime | 8 | Block Read | 1 |
| 0x12 | POST status | 4 | Block Read | 1 |
| 0x13 | IPU errors | 24 / 1 | Block Read or Write | 1 |
| 0x15 | dmesg update | 2 | Write Word | 2 |
| 0x16 | dmesg read | 30 / 2 | Block Process Call | 2 |
| 0x17 | IPU status | 2 | Read Word | 2 |

3.3.2.1. System uptime (0x10)

Block Read command 0x10 returns a 64-bit integer containing the ICU’s uptime in milliseconds.

Response:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1-8 | System uptime | int64_t | millisec | 1 |

3.3.2.2. POST status (0x12)

Block Read command 0x12 returns the ICU’s Power On Self Test (POST) result. The layout of the result is outlined below; any result other than 0x0 indicates that an issue was encountered during the boot sequence.

Response:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1-4 | POST status | int32_t | bitfield | 1 |

POST status bitfield:

| Bit | Description |
|---|---|
| 31 | Failure bit. 1 if failed else 0 |
| 30 | Warning bit. 1 if multiple warnings else 0 |
| 29:24 | Reserved |
| 23:16 | First encountered Zephyr error code from failed/warning code unit (truncated to 8-bit) |
| 15:8 | First encountered failure/warning number from failed/warning code unit (unit specific) |
| 7:0 | gc_init_id number associated with the failed/warning code unit |
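The bitfield above can be unpacked with a few shifts and masks. The sketch below is illustrative rather than a reference implementation: the `PostStatus` type and `decode_post_status` function are hypothetical names, and the four bytes are assumed to arrive least significant byte first, which this section does not state.

```python
import struct
from typing import NamedTuple


class PostStatus(NamedTuple):
    failed: bool          # bit 31
    warning: bool         # bit 30
    zephyr_error: int     # bits 23:16
    failure_number: int   # bits 15:8
    init_id: int          # bits 7:0


def decode_post_status(block: bytes) -> PostStatus:
    """Split the 4-byte POST status block into its documented fields."""
    if len(block) != 4:
        raise ValueError("POST status is 4 bytes long")
    # Assumption: little-endian byte order. Decoded as unsigned so that
    # bit 31 can be tested with a plain shift.
    (raw,) = struct.unpack("<I", block)
    return PostStatus(
        failed=bool((raw >> 31) & 0x1),
        warning=bool((raw >> 30) & 0x1),
        zephyr_error=(raw >> 16) & 0xFF,
        failure_number=(raw >> 8) & 0xFF,
        init_id=raw & 0xFF,
    )
```

A block of four zero bytes decodes to a status with no failure or warning set, matching the statement that 0x0 is the only clean result.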

3.3.2.3. IPU errors (0x13)

3.3.2.3.1. Update errors

Block Write command 0x13 triggers the ICU to check for logged errors, according to the provided update request type. Up to 1000 error records can be maintained within the ICU.

Block write command:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1 | Update request | uint8_t | enum | 1 |

Update request enumeration:

| Value | Name | Description | API |
|---|---|---|---|
| 0 | Next | Readies the next error log to be read and increments the read index | 1 |
| 1 | Most recent | Readies the most recent log to be read. Does not update the read index | 1 |
| 2 | Mark all read | Updates the read index to point to the most recent error log | 1 |

3.3.2.3.2. Read errors

Block Read command 0x13 retrieves 24 bytes of data describing the requested error log. If no more errors are available, this command returns all zeros in every field except State. The ICU needs time to perform the action of a preceding Block Write command: if the read is performed too soon, State may return Updating, and the Block Read command should be repeated after a short delay. The error information consists of three error registers and a timestamp indicating when the error was detected.

Block Read command:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1-4 | Timestamp | uint32_t | seconds | 1 |
| 5-8 | CMGMTEVVR | uint32_t | bitfield | 1 |
| 9-12 | CICERRVR | uint32_t | bitfield | 1 |
| 13-16 | CIUERRVR | uint32_t | bitfield | 1 |
| 17-18 | Index | uint16_t | | 1 |
| 19-20 | Remaining | uint16_t | | 1 |
| 21-22 | Bootcount | uint16_t | | 1 |
| 23 | Source | uint8_t | enum | 1 |
| 24 | State | uint8_t | enum | 1 |

Source enumeration:

| Value | Name | Description | API |
|---|---|---|---|
| 0 | No records | No (more) errors to report | 1 |
| 1 | Newman reset | Logged during a Newman reset operation | 1 |
| 2 | Shutdown | Logged during an ICU shutdown operation | 1 |
| 3 | Monitor | Logged during a regular monitoring operation | 1 |
| 4 | Shellcheck | Logged via a development interface | 1 |

State enumeration:

| Value | Name | Description | API |
|---|---|---|---|
| 1 | Updating | Previous request is still being handled | 1 |
| 2 | Updated | Error log is ready and available | 1 |
| 3 | No records | No logs have been recorded | 1 |
| 4 | Error | An error occurred while reading the error logs | 1 |

3.3.2.4. dmesg update (0x15)

Write Word command 0x15 triggers the ICU to cache its dmesg log. Up to 100 log messages can be cached. The most recent messages will be cached if there are more than 100 logs. The update request input can be any numeric value.

Write Word command:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1-2 | Update request | uint16_t | | 2 |

3.3.2.5. dmesg read (0x16)

Block Process Call command 0x16 returns up to 29 ASCII bytes of the ICU’s dmesg, starting at the log and message indexes supplied, along with the update state. If the returned string is not null-terminated, more characters can be read by repeating the command with a larger Message index argument in the Block Write command, and the results concatenated to build up the full string. An array of strings can be built up in a similar fashion by increasing the Log index argument. If the first character of the returned string is a null character, the end of the log cache has been reached.

Block Write command:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1 | Message index | uint8_t | | 2 |
| 2 | Log index | uint8_t | | 2 |

Block Read response:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1 | State | uint8_t | enum | 2 |
| 2-30 | dmesg string | ASCII | | 2 |

State enumeration:

| Value | Name | Description | API |
|---|---|---|---|
| 0 | Updating | Previous request is still being handled | 2 |
| 1 | Updated | dmesg cache is ready and available | 2 |
| 2 | Index error | Log index out of range | 2 |
| 3 | Error | An error occurred while reading the dmesg | 2 |
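The two-level indexing used by command 0x16 (Message index within a log entry, Log index across entries) can be sketched as a pair of nested loops. As a sketch only: `process_call` is a hypothetical callable for the host's Block Process Call routine, the Message index is assumed to be a byte offset, and any state other than Updated is simply raised as an error rather than retried.

```python
from typing import Callable, List, Optional

CMD_DMESG_READ = 0x16
STATE_UPDATING, STATE_UPDATED, STATE_INDEX_ERROR, STATE_ERROR = range(4)
MAX_CACHED_LOGS = 100  # cache size given for dmesg update (0x15)


def read_dmesg_line(
    process_call: Callable[[int, bytes], bytes], log_index: int
) -> Optional[str]:
    """Read one cached log entry, or return None at the end of the cache."""
    text = bytearray()
    msg_index = 0
    while msg_index <= 0xFF:  # Message index is a uint8_t
        reply = process_call(CMD_DMESG_READ, bytes([msg_index, log_index]))
        state, chunk = reply[0], reply[1:]
        if state != STATE_UPDATED:
            # A real caller might wait and retry on STATE_UPDATING.
            raise RuntimeError(f"dmesg read returned state {state}")
        end = chunk.find(b"\x00")
        if end == 0 and msg_index == 0:
            return None  # first character is null: end of the log cache
        if end != -1:
            text += chunk[:end]
            break
        if not chunk:
            break
        text += chunk
        msg_index += len(chunk)  # assumption: index counts bytes already read
    return text.decode("ascii", errors="replace")


def read_dmesg(process_call: Callable[[int, bytes], bytes]) -> List[str]:
    """Collect cached log entries until the end-of-cache marker is seen."""
    lines: List[str] = []
    for log_index in range(MAX_CACHED_LOGS):
        line = read_dmesg_line(process_call, log_index)
        if line is None:
            break
        lines.append(line)
    return lines
```

The cache would normally be refreshed with dmesg update (0x15) before `read_dmesg` is called, so that the entries reflect the current log.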

3.3.2.6. IPU status (0x17)

Read Word command 0x17 returns a data word containing a bitfield representation of the IPU’s current status.

Response:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1-2 | IPU status | | bitfield | 2 |

IPU status bitfield:

| Bit | Description |
|---|---|
| 15:3 | Reserved |
| 2 | PCIe Bus Master Enable detected |
| 1 | PCIe Power Brake activated |
| 0 | IPU in use |
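The three defined status bits map naturally onto a flag type. The sketch below is illustrative; the `IpuStatus` class and its member names are hypothetical, and the reserved bits 15:3 are masked off rather than interpreted.

```python
from enum import IntFlag


class IpuStatus(IntFlag):
    IN_USE = 1 << 0                  # bit 0: IPU in use
    PCIE_POWER_BRAKE = 1 << 1        # bit 1: PCIe Power Brake activated
    PCIE_BUS_MASTER_ENABLE = 1 << 2  # bit 2: PCIe Bus Master Enable detected


def decode_ipu_status(word: int) -> IpuStatus:
    """Keep only the three documented bits of the 16-bit status word."""
    return IpuStatus(word & 0x0007)
```

Membership tests such as `IpuStatus.IN_USE in decode_ipu_status(word)` then read more clearly than raw mask comparisons.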

3.3.3. Clock control

Commands 0x20 - 0x2F are reserved for clock control information and operations.

| ID | Name | Bytes (r/w) | Protocol | API |
|---|---|---|---|---|
| 0x20 | Clock frequency | 2 | Read Word | 1 |
| 0x22 | Frequency limiter | 2 / 2 | Block Read or Write | 1 |
| 0x23 | Board TDP limit | 2 / 2 | Block Read or Write | 2 |

3.3.3.1. Clock frequency (0x20)

Read Word command 0x20 returns the IPU’s current upper clock speed.

Response:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1-2 | Fast PLL IPU | uint16_t | MHz | 1 |

3.3.3.2. Frequency limiter (0x22)

Block Read or Block Write commands 0x22 are used to read or set a limit to the maximum frequency that the IPU can run at, with an upper bound of the system default maximum frequency. If set to 0xFFFF then the system default will be used.

Block read:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1-2 | Frequency limit | uint16_t | MHz | 1 |

Block write:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1-2 | Frequency limit | uint16_t | MHz | 1 |

3.3.3.3. Board TDP limit (0x23)

Block Read or Block Write command 0x23 reads or sets the limit for the maximum thermal design power (TDP) that the IPU on the board can consume. This value is rounded down to the nearest multiple of 12. For example, a setting of 203 watts is rounded down to 192 watts.

Block read:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1-2 | TDP limit | uint16_t | Watts | 2 |

Block write:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1-2 | TDP limit | uint16_t | Watts | 2 |
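The rounding rule for the TDP limit is a floor to a multiple of 12. A minimal sketch of the arithmetic follows, with `effective_tdp_limit` as a hypothetical helper name; it predicts the value the card is expected to apply and does not talk to the hardware.

```python
TDP_STEP_WATTS = 12  # granularity stated for command 0x23


def effective_tdp_limit(requested_watts: int) -> int:
    """Round a requested TDP limit down to the nearest multiple of 12 W."""
    if not 0 <= requested_watts <= 0xFFFF:
        raise ValueError("TDP limit must fit in a uint16_t")
    return (requested_watts // TDP_STEP_WATTS) * TDP_STEP_WATTS
```

For the example in the text, `effective_tdp_limit(203)` evaluates to 192, since 203 // 12 is 16 and 16 * 12 is 192.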

3.3.4. Telemetry - power usage

Commands 0x30 - 0x3F are reserved for IPU and C600 power usage information.

| ID | Name | Bytes (r/w) | Protocol | API |
|---|---|---|---|---|
| 0x31 | Board power consumption | 2 | Read Word | 2 |

3.3.4.1. Board power consumption (0x31)

Read Word command 0x31 returns the average board power consumption over the last second. This is a digital average of readings from the on-board sensors, which provide an analogue average of the power usage over 63 ms periods.

Response:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1-2 | Board power consumption | Linear11 | Watts | 2 |
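The Linear11 format is not defined in this section. Assuming it is the PMBus LINEAR11 encoding (a 5-bit two's complement exponent in bits 15:11 and an 11-bit two's complement mantissa in bits 10:0, with value = mantissa x 2^exponent), a returned word could be decoded as in the sketch below. The function name `decode_linear11` is hypothetical.

```python
def decode_linear11(word: int) -> float:
    """Decode a 16-bit word, assuming the PMBus LINEAR11 layout."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("expected a 16-bit word")
    exponent = (word >> 11) & 0x1F
    mantissa = word & 0x7FF
    # Sign-extend both fields from two's complement.
    if exponent > 0x0F:
        exponent -= 0x20
    if mantissa > 0x3FF:
        mantissa -= 0x800
    return mantissa * 2.0 ** exponent
```

Under that assumption, the word 0xF897 has exponent -1 and mantissa 151, so it decodes to 75.5.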

3.3.5. Telemetry - temperatures

Commands 0x40 - 0x4F are reserved for IPU and C600 temperature information.

| ID | Name | Bytes (r/w) | Protocol | API |
|---|---|---|---|---|
| 0x40 | Temperatures - max board | 2 | Read Word | 1 |
| 0x41 | Temperatures - group A | 24 | Block Read | 1 |
| 0x42 | Max temperatures - group A | 24 | Block Read | 1 |
| 0x43 | Temperature thresholds | 24 | Block Read | 1 |
| 0x45 | Thermal status | 6 | Block Read | 1 |

3.3.5.1. Temperatures - max board (0x40)

Read Word command 0x40 returns the current maximum temperature reported by any of the temperature sensors on the C600, in whole degrees Celsius. This value should be used to control system cooling.

Response:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1-2 | Max board temp (current) | int16_t | Celsius | 1 |

3.3.5.2. Temperatures - group A (0x41)

Block Read command 0x41 reports an aggregated list of all available temperature sensors on the C600 card. All values are reported in a Linear11 representation of Celsius, allowing for decimal degrees. IPU PVT values refer to sensors within the IPU chip itself, while the other sensors are located on the C600 card at the positions their names indicate.

Response:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1-2 | IPU PVT east | Linear11 | Celsius | 1 |
| 3-4 | IPU PVT west0 | Linear11 | Celsius | 1 |
| 5-6 | ADC inlet | Linear11 | Celsius | 1 |
| 7-8 | ADC exhaust | Linear11 | Celsius | 1 |
| 9-10 | ADC phase0 bottomside | Linear11 | Celsius | 1 |
| 11-12 | ADC phase1 bottomside | Linear11 | Celsius | 1 |
| 13-14 | ADC IPU chip bottomside | Linear11 | Celsius | 1 |
| 15-16 | ADC mid | Linear11 | Celsius | 1 |
| 17-18 | I2C inlet | Linear11 | Celsius | 1 |
| 19-20 | I2C IPU chip | Linear11 | Celsius | 1 |
| 21-22 | I2C exhaust | Linear11 | Celsius | 1 |
| 23-24 | I2C mid | Linear11 | Celsius | 1 |
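The 24-byte group A response is twelve consecutive 2-byte values in the order listed above. The sketch below splits such a block into named readings. It rests on two assumptions that this section does not confirm: that each word is little-endian, and that Linear11 follows the PMBus LINEAR11 layout. The names `GROUP_A_SENSORS`, `_linear11` and `parse_group_a` are hypothetical.

```python
import struct
from typing import Dict

# Sensor order as listed for commands 0x41 and 0x42.
GROUP_A_SENSORS = (
    "IPU PVT east", "IPU PVT west0", "ADC inlet", "ADC exhaust",
    "ADC phase0 bottomside", "ADC phase1 bottomside",
    "ADC IPU chip bottomside", "ADC mid",
    "I2C inlet", "I2C IPU chip", "I2C exhaust", "I2C mid",
)


def _linear11(word: int) -> float:
    """Decode one word, assuming PMBus LINEAR11 (5-bit exp, 11-bit mantissa)."""
    exponent = (word >> 11) & 0x1F
    mantissa = word & 0x7FF
    if exponent > 0x0F:
        exponent -= 0x20
    if mantissa > 0x3FF:
        mantissa -= 0x800
    return mantissa * 2.0 ** exponent


def parse_group_a(block: bytes) -> Dict[str, float]:
    """Map each sensor name to its temperature in degrees Celsius."""
    if len(block) != 24:
        raise ValueError("group A response is 24 bytes long")
    words = struct.unpack("<12H", block)  # assumption: little-endian words
    return {name: _linear11(w) for name, w in zip(GROUP_A_SENSORS, words)}
```

Because command 0x42 uses the same field order, the same function could be applied to its response to obtain the per-sensor maxima.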

3.3.5.3. Max temperatures - group A (0x42)

Block Read command 0x42 reports an aggregated list of the maximum temperatures reported by all available temperature sensors on the C600 card since the last card power cycle.

Response:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1-2 | IPU PVT east | Linear11 | Celsius | 1 |
| 3-4 | IPU PVT west0 | Linear11 | Celsius | 1 |
| 5-6 | ADC inlet | Linear11 | Celsius | 1 |
| 7-8 | ADC exhaust | Linear11 | Celsius | 1 |
| 9-10 | ADC phase0 bottomside | Linear11 | Celsius | 1 |
| 11-12 | ADC phase1 bottomside | Linear11 | Celsius | 1 |
| 13-14 | ADC IPU chip bottomside | Linear11 | Celsius | 1 |
| 15-16 | ADC mid | Linear11 | Celsius | 1 |
| 17-18 | I2C inlet | Linear11 | Celsius | 1 |
| 19-20 | I2C IPU chip | Linear11 | Celsius | 1 |
| 21-22 | I2C exhaust | Linear11 | Celsius | 1 |
| 23-24 | I2C mid | Linear11 | Celsius | 1 |

3.3.5.4. Temperature thresholds (0x43)

Block Read command 0x43 reports an aggregated list of the temperature thresholds used by the different temperature sensors present on the C600 card.

Response:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1-2 | PVT emergency threshold | Linear11 | Celsius | 1 |
| 3-4 | PVT warning threshold | Linear11 | Celsius | 1 |
| 5-6 | Thermal control maximum | Linear11 | Celsius | 1 |
| 7-8 | Thermal control minimum | Linear11 | Celsius | 1 |
| 9-10 | I2C inlet emergency | Linear11 | Celsius | 1 |
| 11-12 | I2C inlet warning | Linear11 | Celsius | 1 |
| 13-14 | I2C IPU chip emergency | Linear11 | Celsius | 1 |
| 15-16 | I2C IPU chip warning | Linear11 | Celsius | 1 |
| 17-18 | I2C exhaust emergency | Linear11 | Celsius | 1 |
| 19-20 | I2C exhaust warning | Linear11 | Celsius | 1 |
| 21-22 | I2C mid emergency | Linear11 | Celsius | 1 |
| 23-24 | I2C mid warning | Linear11 | Celsius | 1 |

3.3.5.5. Thermal status (0x45)

Block Read command 0x45 reports information about the thermal history of the card, including active thermal events and incrementing counters for these events since the last card power cycle.

Response:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1-2 | Thermal status register | uint16_t | bitfield | 1 |
| 3-4 | Temperature warning count | uint16_t | x1 | 1 |
| 5-6 | Temperature excess count | uint16_t | x1 | 1 |

Thermal status register

| Bit | Name | API |
|---|---|---|
| 0 | Thermal shutdown active | 1 |
| 1 | Temperature warning active | 1 |
| 2 | Temperature excess active | 1 |
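The 6-byte thermal status response combines a status register with two event counters. A minimal sketch of splitting it into fields is shown below, assuming little-endian 16-bit values; `ThermalStatus` and `parse_thermal_status` are hypothetical names.

```python
import struct
from typing import NamedTuple


class ThermalStatus(NamedTuple):
    shutdown_active: bool  # status register bit 0
    warning_active: bool   # status register bit 1
    excess_active: bool    # status register bit 2
    warning_count: int     # bytes 3-4
    excess_count: int      # bytes 5-6


def parse_thermal_status(block: bytes) -> ThermalStatus:
    """Unpack the thermal status register and the two event counters."""
    if len(block) != 6:
        raise ValueError("thermal status response is 6 bytes long")
    # Assumption: each 16-bit field is little-endian.
    status, warning_count, excess_count = struct.unpack("<HHH", block)
    return ThermalStatus(
        shutdown_active=bool(status & 0x1),
        warning_active=bool(status & 0x2),
        excess_active=bool(status & 0x4),
        warning_count=warning_count,
        excess_count=excess_count,
    )
```

Since the counters only increment between power cycles, a monitoring host could compare successive readings to detect events that started and ended between polls.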

3.3.6. Host messaging

Commands 0x80 - 0x8F are reserved for the PCIe host system to report information.

| ID | Name | Bytes (r/w) | Protocol | API |
|---|---|---|---|---|
| 0x80 | Driver error state | 4 | Block Read | 1 |

3.3.6.1. Driver error state (0x80)

Block Read command 0x80 reports the error status for the card as determined by the host driver. This status is non-volatile, and once set requires the card to be RMA’d.

Response:

| Byte | Name | Format | Unit | API |
|---|---|---|---|---|
| 1-4 | Driver error state | uint32_t | | 1 |