10. Application binary interface (ABI)

This chapter describes the Colossus IPU 32-bit application binary interface (ABI).

The Executable and Linkable Format (ELF) defines a linking interface for compiled programs. This document is the processor-specific supplement for use with ELF on the 32-bit Colossus IPU. It is intended for linking objects compiled in C, C++ and assembly code.

The ELF specification can be found in: System V Application Binary Interface, Edition 4.1.

10.1. Types

The data types used by Poplar are described in Table 10.1 (scalar data types) and Table 10.2 (vector data types). In addition:

  • By default the char type is signed.

  • long is the same as int.

  • long double is the same as double.

  • The underlying type of an enumerated type is int.

  • Function pointers are the same as data pointers.

  • double and long double are not supported by Poplar targets.

  • long, long long, unsigned long and unsigned long long are not supported on the IPU.

Table 10.1 Scalar data types

Type         Size (bits)  Align (bits)  Meaning
-----------  -----------  ------------  -----------------
char         8            8             Character type
short        16           16            Short integer
int          32           32            Integer
long         32           32            Integer
long long    64           64            Integer
half         16           16            16-bit IEEE float
float        32           32            32-bit IEEE float
double       64           64            64-bit IEEE float
long double  64           64            64-bit IEEE float
void *       32           32            Data pointer

Table 10.2 Vector data types

Size (bits)  Align (bits)
-----------  ------------
32           32
64           64
128          64

10.1.1. Floating point types

The IPU has hardware support for float and half. For CPU targets, half is, by default, an alias for float (and sizeof(half) will be 4).

The parameter accurateHalf can be set to true when creating a CPU target, in which case half will be correctly implemented as 16-bit IEEE floating point. This will be slower, but will produce the same results as the IPU.

Codelets should be written to be generic to the size of half so that changing this setting requires no code changes.
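As a minimal sketch of what "generic to the size of half" can mean in practice, the helper below derives buffer sizes from sizeof rather than from a hard-coded 2. The helper name bufferBytes and the two stand-in type aliases are hypothetical; they exist only so the sketch compiles on a host without the Poplar headers, where the real half type is not available.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of Poplar: the number of bytes occupied by
// `count` elements of type T. Because it uses sizeof(T), it stays correct
// whether T is 2 bytes wide or 4 bytes wide.
template <typename T>
constexpr std::size_t bufferBytes(std::size_t count) {
  return count * sizeof(T);
}

// Stand-ins for the two representations of half described above.
// These aliases are assumptions made for illustration only.
using HalfAs16Bit = std::uint16_t;  // IPU, or CPU target with accurateHalf
using HalfAsFloat = float;          // default CPU target (half aliases float)
```

In a real codelet the same idea would be written as bufferBytes<half>(n), so switching accurateHalf on or off needs no source change.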

10.1.2. Structure types

Structure types pack according to the standard rules:

  • Field offsets are aligned according to the field’s type.

  • A structure is aligned according to the maximum alignment of its members.

  • Tail padding is added to make the structure’s size a multiple of its alignment.
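The three rules above can be checked on a small structure. The sketch below uses fixed-width types whose sizes and alignments match the char, int and short rows of Table 10.1, and it assumes the host compiler also aligns these types naturally (true of common hosts, but an assumption nonetheless). The structure name Example is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

struct Example {
  std::int8_t  tag;    // offset 0; followed by 3 bytes of padding
  std::int32_t value;  // offset 4, aligned to 4 bytes
  std::int16_t count;  // offset 8; followed by 2 bytes of tail padding
};

// Field offsets are aligned according to each field's type.
static_assert(offsetof(Example, tag) == 0, "tag at offset 0");
static_assert(offsetof(Example, value) == 4, "value padded to 4-byte boundary");
static_assert(offsetof(Example, count) == 8, "count follows value");

// The structure takes the maximum alignment of its members, and tail
// padding rounds its size up to a multiple of that alignment.
static_assert(alignof(Example) == 4, "alignment of widest member");
static_assert(sizeof(Example) == 12, "10 bytes of fields rounded up to 12");
```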

10.1.3. Bit fields

The following types may be specified in a bit-field’s declaration: char, short, int, long, long long and enum.

If an enum type has negative values, enum bit-fields are signed. Otherwise, if a signed integer type of the specified width is not able to represent all enum values then enum bit-fields are unsigned. Otherwise, enum bit-fields are signed. All other bit-field types are signed unless explicitly unsigned.
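The signedness rule for enum bit-fields can be written as a small predicate. This is a sketch of the rule as stated above, not compiler source; the function names are hypothetical, and the enum is summarised by its smallest and largest enumerator values.

```cpp
#include <cstdint>

// Largest value representable by a signed integer of `width` bits.
// Valid for widths from 1 to 63.
constexpr std::int64_t signedMax(unsigned width) {
  return (std::int64_t{1} << (width - 1)) - 1;
}

// Returns true if an enum bit-field of the given width is signed.
//  1. Any negative enumerator makes the bit-field signed.
//  2. Otherwise, if a signed field of this width cannot hold the largest
//     enumerator, the bit-field is unsigned.
//  3. Otherwise the bit-field is signed.
constexpr bool enumBitFieldIsSigned(std::int64_t minValue,
                                    std::int64_t maxValue,
                                    unsigned width) {
  return minValue < 0 || maxValue <= signedMax(width);
}
```

For example, an enum with values 0 to 3 stored in a 2-bit field is unsigned (a signed 2-bit field only reaches 1), while the same enum in a 3-bit field is signed.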

Bit-fields pack from the least significant end of the allocation unit. Each bit-field of non-zero width is allocated at the first available bit offset that allows it to be placed in a properly aligned container of the declared type. Non-bit-field members are allocated at the first available offset satisfying their declared type’s size and alignment constraints.

A zero-width bit-field forces padding until the next bit offset aligned with the bit-field’s declared type.

Unnamed bit-fields are allocated space in the same manner as named bit-fields.

A structure is aligned according to each bit-field’s declared type, in addition to the types of any other members. Both zero-width and unnamed bit-fields are taken into account when calculating a structure’s alignment.
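The allocation rule for bit-fields can be modelled as a function from the first free bit offset to the offset actually used. The sketch below is an interpretation of the rules in this section, with hypothetical function names. It assumes the container size equals the declared type's size in bits (as in Table 10.1, where size and alignment coincide for integer types) and that a zero-width bit-field adds no padding when the offset is already aligned.

```cpp
// Round `value` up or down to a multiple of `multiple`.
constexpr unsigned roundUp(unsigned value, unsigned multiple) {
  return (value + multiple - 1) / multiple * multiple;
}
constexpr unsigned roundDown(unsigned value, unsigned multiple) {
  return value / multiple * multiple;
}

// Bit offset at which a bit-field is placed.
//   nextFree      : first unallocated bit offset in the structure
//   containerBits : size in bits of the bit-field's declared type
//   width         : declared width of the bit-field (0 for a zero-width field)
constexpr unsigned placeBitField(unsigned nextFree, unsigned containerBits,
                                 unsigned width) {
  return width == 0
             // Zero width: pad to the next boundary of the declared type.
             ? roundUp(nextFree, containerBits)
             // Non-zero width: stay put if the field fits in the aligned
             // container holding nextFree, else start the next container.
             : (nextFree + width <= roundDown(nextFree, containerBits) + containerBits
                    ? nextFree
                    : roundDown(nextFree, containerBits) + containerBits);
}

// Worked example for: struct { int a:3; int b:30; char c:5; int :0; short d:4; }
//   a -> bit 0      (fits in the first 32-bit container)
//   b -> bit 32     (3 + 30 would cross the 32-bit boundary)
//   c -> bit 64     (62 + 5 would cross the 8-bit boundary at 64)
//   :0 -> pads to 96 (next 32-bit boundary after bit 69)
//   d -> bit 96     (fits in the 16-bit container starting at 96)
```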

10.2. Vertex calling convention

Vertex functions (codelets) are only called by the supervisor thread and have no arguments.

On entry to a vertex codelet, the $mvertex_base and $mworker_base registers hold the address of the vertex state structure and the address of the worker thread’s scratch space, respectively. A vertex codelet is not required to preserve callee-save registers, so all $m and $a registers can be used as scratch storage.

Worker vertices have no parameters (all state is accessible through the register $mvertex_base). On termination they must provide a boolean exit status to the supervisor thread, using one of the exit instructions.

10.3. Function calling convention

10.3.1. Function parameters

Function calling uses $m0 to $m3 to pass integer parameters and $a0 to $a5 to pass floating point parameters. Additional parameters are passed on the stack.

Parameters of 64 or 128 bits must be passed in aligned register pairs or quads respectively. An aligned register pair is numbered even then odd, for example $m0:1 or $m2:3. An aligned register quad starts with a register whose number modulo four is zero, for example $a0:3. Subsequent arguments can only be passed in the remaining consecutive registers.
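The pair and quad alignment rule can be sketched as a function that picks the first register for one argument. This is one reading of the rule, not the compiler's implementation: the names are hypothetical, only 32-, 64- and 128-bit arguments are modelled, and the caller is assumed to advance nextFree past the registers just assigned, so a register skipped for alignment is not reused.

```cpp
constexpr int kOnStack = -1;  // sentinel: the argument does not fit in registers

// Number of 32-bit registers needed for an argument of sizeBits (32, 64 or 128).
constexpr int registersNeeded(int sizeBits) { return (sizeBits + 31) / 32; }

// First register index >= nextFree that is a multiple of `regs`.
constexpr int alignedStart(int nextFree, int regs) {
  return (nextFree + regs - 1) / regs * regs;
}

// Index of the first register used for the argument, or kOnStack.
//   nextFree : lowest register index not yet used for arguments
//   numRegs  : 4 for $m0 to $m3, 6 for $a0 to $a5
constexpr int firstRegister(int nextFree, int sizeBits, int numRegs) {
  return alignedStart(nextFree, registersNeeded(sizeBits)) +
                     registersNeeded(sizeBits) <= numRegs
             ? alignedStart(nextFree, registersNeeded(sizeBits))
             : kOnStack;
}
```

Under these assumptions, three integer arguments of 32, 64 and 32 bits are placed in $m0, $m2:3 and on the stack: the 64-bit value skips $m1 to reach an even register, which leaves no register for the third argument.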

All variadic function parameters are passed via the stack.

Scalar types smaller than 32 bits are passed as zero- or sign-extended 32-bit values.
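To make the extension concrete, the sketch below computes the 32-bit register image of a narrower value. It assumes that signed types are sign-extended and unsigned types are zero-extended; the text above does not say which applies to which type, so treat that pairing as an assumption. The function name registerImage is hypothetical.

```cpp
#include <cstdint>

// 32-bit pattern holding a narrower integer value. Converting to an unsigned
// 32-bit type is modulo 2^32, which yields the sign-extended pattern for
// signed inputs and the zero-extended pattern for unsigned inputs.
template <typename T>
constexpr std::uint32_t registerImage(T value) {
  return static_cast<std::uint32_t>(value);
}
```

For instance, a signed 8-bit -1 becomes 0xFFFFFFFF, whereas an unsigned 8-bit 0xFF becomes 0x000000FF.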

An aggregate containing a single member is passed as if it were an argument of that member type. Otherwise, aggregates are passed by passing a pointer to the aggregate. The callee is allowed to write to the pointed-to aggregate.

10.3.2. Return values

Integer return values are passed in $m0 to $m3 and floating-point return values are passed in $a0 to $a3.

Scalar types smaller than 32 bits are returned as zero- or sign-extended 32-bit values.

An aggregate containing a single member is returned as if it were a value of that member type. Otherwise, the caller passes the address of the return destination as an implicit parameter. This must be a valid address to which the return value can be written. The return destination must not alias any other memory visible to the callee.
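The implicit return-destination parameter can be pictured at source level. The sketch below shows a function returning a two-member aggregate next to a hand-written equivalent that takes the destination address explicitly. All names are hypothetical and the second function only illustrates the convention; it is not what the compiler literally emits.

```cpp
#include <cassert>

struct Pair {
  int first;
  int second;
};

// Source-level form: returns an aggregate with more than one member.
Pair makePair(int a, int b) { return Pair{a, b}; }

// Conceptual lowered form: the caller supplies the address of the return
// destination, and the callee writes the result through it.
void makePairLowered(Pair *result, int a, int b) {
  assert(result != nullptr);  // the destination must be a valid address
  result->first = a;
  result->second = b;
}

// Calls both forms and reports whether they produce the same value.
bool loweredMatchesDirect(int a, int b) {
  Pair direct = makePair(a, b);
  Pair viaPointer;  // caller-owned destination that nothing else aliases
  makePairLowered(&viaPointer, a, b);
  return direct.first == viaPointer.first && direct.second == viaPointer.second;
}
```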

10.3.3. Entry and exit

On entry to a function, the caller ensures the link register, $m10, holds the address of the instruction to execute after the function returns, and the stack pointer, $sp, holds a 64-bit aligned address at the top of the stack.

On exiting a function, the callee resets the stack pointer to its value on entry if it was modified, and copies the on-entry value of the link register to the program counter, $pc.

10.3.4. Register assignments

The register assignments are described in Table 10.3 and Table 10.4.

Table 10.3 MRF register assignments for the function calling convention

Registers    Type         Usage
-----------  -----------  ---------------------------
$m0 - $m3    Caller save  Arguments and return values
$m4 - $m6    Caller save  Scratch
$m7          Callee save  Scratch
$m8          Callee save  Scratch or base pointer
$m9          Callee save  Scratch or frame pointer
$m10         Caller save  Link register
$m11         Callee save  Stack pointer

  • $m10 holds the address to return to when a function completes (see Entry and exit).

  • $m11 holds the base address of the stack of the current function (see Stack frame).

Table 10.4 ARF register assignments for the function calling convention

Registers    Type         Usage
-----------  -----------  ---------------------------
$a0 - $a5    Caller save  Arguments and return values
$a6, $a7     Callee save  Scratch

10.4. Stack frame

Table 10.5 and Table 10.6 illustrate the organisation of the two stack frames when a function is called. The stack grows downwards, towards address 0x0. Outgoing arguments are written so that earlier arguments have smaller offsets from the stack pointer, and are therefore at lower addresses.

See Worker stack and scratch space for information on allocating stack memory.
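The address arithmetic implied by a downward-growing, 64-bit aligned stack can be sketched as follows. The function names and the example addresses are hypothetical. The sketch also assumes that a frame size is rounded up to a multiple of 8 bytes so the stack pointer stays aligned at calls, and that 32-bit outgoing arguments occupy consecutive 4-byte slots starting at the stack pointer; the text only states that earlier arguments have smaller offsets.

```cpp
#include <cstdint>

constexpr std::uint32_t kStackAlignBytes = 8;  // 64-bit stack alignment

// Round `value` up to a multiple of `alignment`.
constexpr std::uint32_t alignUp(std::uint32_t value, std::uint32_t alignment) {
  return (value + alignment - 1) / alignment * alignment;
}

// Stack pointer after reserving a frame. The stack grows towards 0x0, so
// reserving space subtracts from the stack pointer.
constexpr std::uint32_t allocateFrame(std::uint32_t sp, std::uint32_t frameBytes) {
  return sp - alignUp(frameBytes, kStackAlignBytes);
}

// Address of the index-th outgoing 32-bit stack argument (assumed layout):
// argument 0 is at the lowest address, later arguments at higher addresses.
constexpr std::uint32_t outgoingArgAddress(std::uint32_t sp, std::uint32_t index) {
  return sp + 4 * index;
}
```

With an aligned stack pointer of 0x1000 and a 20-byte frame, the new stack pointer under these assumptions is 0xFE8, which is still 8-byte aligned.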

Table 10.5 Stack frame layout (callee)

Callee’s frame

Outgoing arguments (caller writes)

Local objects and spills

Incoming arguments (callee writes)

Table 10.6 Stack frame layout (caller)

Caller’s frame

Incoming arguments (caller writes)

Local objects and spills

Incoming arguments (callee writes)