14. Application binary interface (ABI)

This chapter describes the Colossus IPU 32-bit application binary interface (ABI).

The Executable and Linkable Format (ELF) defines a linking interface for compiled programs. This document is the processor-specific supplement for use with ELF on the 32-bit Colossus IPU. It is intended for linking objects compiled in C, C++ and assembly code.

The ELF specification can be found in: System V Application Binary Interface, Edition 4.1.

14.1. Types

The data types used by Poplar are described in Table 4.1 and Table 4.3. In addition:

  • By default the char type is signed.

  • long is the same as int.

  • long double is the same as double.

  • The underlying type of an enumerated type is int.

  • Function pointers are the same as data pointers.

  • double and long double are not supported by Poplar targets.

  • long, long long, unsigned long, unsigned long long are not supported on the IPU.

14.1.1. Floating point types

The IPU has hardware support for float and half. For CPU targets, half is, by default, an alias for float (and sizeof(half) will be 4).

The parameter accurateHalf can be set to true when creating a CPU target, in which case half will be correctly implemented as 16-bit IEEE floating point. This will be slower, but will produce the same results as the IPU.

Codelets should be written to be generic to the size of half so that changing this setting requires no code changes.
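As a minimal sketch of what "generic to the size of half" can look like, the helpers below derive byte counts and vector widths from sizeof rather than from a hard-coded 2. The alias half_t and the helper names are hypothetical and exist only so the sketch builds with a host compiler; they are not part of the Poplar API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for half. On a default CPU target half aliases float
// (4 bytes); on the IPU, or with accurateHalf set, it is a 16-bit type.
using half_t = float;

// Bytes occupied by n elements of T. Using sizeof(T) instead of a literal 2
// keeps the result valid for either definition of half.
template <typename T>
std::size_t bytesFor(std::size_t n) {
  assert(n <= SIZE_MAX / sizeof(T));  // guard against overflow
  return n * sizeof(T);
}

// Number of T elements that fit in one 64-bit (8-byte) word.
template <typename T>
constexpr std::size_t elemsPer64Bits() {
  static_assert(8 % sizeof(T) == 0, "T must divide a 64-bit word evenly");
  return 8 / sizeof(T);
}
```

Code written this way gives the same element counts whichever definition of half is in effect, because nothing depends on the literal size.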

14.1.2. Structure types

Structure types pack according to the standard rules:

  • Field offsets are aligned according to the field’s type.

  • A structure is aligned according to the maximum alignment of its members.

  • Tail padding is added to make the structure’s size a multiple of its alignment.
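The three rules above can be sketched on a host compiler as follows. The struct Example and the helper names are illustrative only, and the offsets in the comments assume that uint32_t is 4-byte aligned and uint16_t is 2-byte aligned, as on common host ABIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round offset up to the next multiple of align (a power of two).
inline std::size_t alignUp(std::size_t offset, std::size_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (offset + align - 1) & ~(align - 1);
}

struct Example {
  std::uint8_t  tag;    // offset 0
  std::uint32_t count;  // offset 4: padded up to the field's alignment
  std::uint16_t flags;  // offset 8
};                      // size 12: tail padding to a multiple of 4

// Recompute sizeof(Example) by applying the three rules by hand.
inline std::size_t exampleSizeByRules() {
  const std::size_t sizes[] = {sizeof(std::uint8_t), sizeof(std::uint32_t),
                               sizeof(std::uint16_t)};
  const std::size_t aligns[] = {alignof(std::uint8_t), alignof(std::uint32_t),
                                alignof(std::uint16_t)};
  std::size_t offset = 0, maxAlign = 1;
  for (int i = 0; i < 3; ++i) {
    offset = alignUp(offset, aligns[i]);  // rule 1: align each field
    offset += sizes[i];
    if (aligns[i] > maxAlign) maxAlign = aligns[i];  // rule 2: max alignment
  }
  return alignUp(offset, maxAlign);  // rule 3: tail padding
}
```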

14.1.3. Bit fields

The following types may be specified in a bit-field’s declaration: char, short, int, long, long long and enum.

If an enum type has negative values, enum bit-fields are signed. Otherwise, if a signed integer type of the specified width is not able to represent all enum values then enum bit-fields are unsigned. Otherwise, enum bit-fields are signed. All other bit-field types are signed unless explicitly unsigned.
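The signedness rule for enum bit-fields can be written as a small decision function. This is a sketch of the rule as stated above, not compiler source; the function name and its parameters (the smallest and largest enumerator values and the declared bit-field width) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if an enum bit-field of the given width is signed.
// minValue and maxValue are the smallest and largest enumerator values.
inline bool enumBitFieldIsSigned(std::int64_t minValue, std::int64_t maxValue,
                                 unsigned width) {
  assert(width >= 1 && width <= 32);
  assert(minValue <= maxValue);
  if (minValue < 0) return true;  // negative enumerators force signed
  // Largest value a signed integer of `width` bits can hold.
  const std::int64_t signedMax = (std::int64_t{1} << (width - 1)) - 1;
  if (maxValue > signedMax) return false;  // signed cannot hold all values
  return true;                             // otherwise signed
}
```

For example, an enum with values 0 to 3 stored in a 2-bit field is unsigned, because a signed 2-bit integer can only represent -2 to 1.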

Bit-fields pack from the least significant end of the allocation unit. Each bit-field of non-zero width is allocated at the first available bit offset that allows the bit-field to be placed in a properly aligned container of the declared type. Non-bit-field members are allocated at the first available offset satisfying their declared type’s size and alignment constraints.

A zero-width bit-field forces padding up to the next bit offset that is aligned for the bit-field’s declared type.

Unnamed bit-fields are allocated space in the same manner as named bit-fields.

A structure’s alignment takes into account the declared type of every bit-field, in addition to the types of any other members. Both zero-width and unnamed bit-fields are included when calculating a structure’s alignment.
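The placement rule for bit-fields can be modelled with a running bit cursor. The sketch below is a simplified model, not the compiler's implementation: BitCursor and placeBitField are hypothetical names, and containerBits is the size in bits of the declared type (for example 32 for int, 8 for char), which is assumed to equal its alignment.

```cpp
#include <cassert>

// Next free bit, counted from the least significant end of the structure.
struct BitCursor {
  unsigned nextBit = 0;
};

inline unsigned roundUpBits(unsigned bit, unsigned multiple) {
  return (bit + multiple - 1) / multiple * multiple;
}

// Places one bit-field and returns the bit offset it was given.
inline unsigned placeBitField(BitCursor &c, unsigned width,
                              unsigned containerBits) {
  assert(containerBits != 0 && width <= containerBits);
  if (width == 0) {
    // Zero-width: pad to the next boundary of the declared type.
    c.nextBit = roundUpBits(c.nextBit, containerBits);
    return c.nextBit;
  }
  // The field must sit inside a single aligned container of its type.
  const unsigned first = c.nextBit / containerBits;
  const unsigned last = (c.nextBit + width - 1) / containerBits;
  if (first != last) c.nextBit = roundUpBits(c.nextBit, containerBits);
  const unsigned offset = c.nextBit;
  c.nextBit += width;
  return offset;
}
```

In this model a 3-bit int field followed by a 30-bit int field places the second field at bit 32, because bits 3 to 32 would straddle two 32-bit containers.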

14.2. Vertex calling convention

Vertex functions (codelets) are only called by the supervisor thread and have no arguments.

On entry to a vertex codelet, the $mvertex_base and $mworker_base registers hold the address of the vertex state structure and the address of the worker thread’s scratch space, respectively. A vertex codelet is not required to preserve callee-save registers, so all $m and $a registers can be used as scratch storage.

Worker vertices have no parameters (all state is accessible through the register $mvertex_base). On termination they must provide a boolean exit status to the supervisor thread, using one of the exit instructions.

14.3. Function calling convention

14.3.1. Function parameters

Function calls use $m0 to $m3 to pass integer parameters and $a0 to $a5 to pass floating-point parameters. Additional parameters are passed on the stack.

Parameters of 64 or 128 bits must be passed in aligned register pairs or quads respectively. An aligned register pair is numbered even then odd, for example $m0:1 or $m2:3. An aligned register quad starts with a register whose number modulo four is zero, for example $a0:3. Subsequent arguments can only be passed in the remaining consecutive registers.
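The pair and quad alignment rule can be illustrated with a small allocator model. This is a sketch under stated assumptions rather than the compiler's algorithm: RegBank, assignRegisters and kOnStack are hypothetical names, registers skipped for alignment are never reused, and once an argument goes to the stack all later arguments in that bank are assumed to go to the stack as well.

```cpp
#include <cassert>

// One argument register bank: 4 registers for $m0-$m3, 6 for $a0-$a5.
struct RegBank {
  unsigned numRegs;
  unsigned next;  // index of the next unused register
};

constexpr int kOnStack = -1;

// Assigns an argument of 1, 2 or 4 words (32, 64 or 128 bits).
// Returns the index of its first register, or kOnStack.
inline int assignRegisters(RegBank &bank, unsigned words) {
  assert(words == 1 || words == 2 || words == 4);
  // Pairs start on an even register, quads on a multiple of four.
  const unsigned start = (bank.next + words - 1) / words * words;
  if (start + words > bank.numRegs) {
    bank.next = bank.numRegs;  // assumption: no back-filling after a spill
    return kOnStack;
  }
  bank.next = start + words;
  return static_cast<int>(start);
}
```

With this model, the integer arguments (32-bit, 64-bit, 32-bit) land in $m0, $m2:3 and the stack: $m1 is skipped to align the pair and is not available to the third argument.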

All variadic function parameters are passed via the stack.

Scalar types smaller than 32 bits are passed as zero- or sign-extended 32-bit values.
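As an illustration of the extension rule, the helpers below show the 32-bit value that an 8-bit argument occupies when widened. The function names are hypothetical; the same pattern applies to 16-bit types.

```cpp
#include <cassert>
#include <cstdint>

// Signed narrow types are sign-extended: the top 24 bits copy the sign bit.
inline std::uint32_t widenSigned8(std::int8_t v) {
  return static_cast<std::uint32_t>(static_cast<std::int32_t>(v));
}

// Unsigned narrow types are zero-extended: the top 24 bits are zero.
inline std::uint32_t widenUnsigned8(std::uint8_t v) {
  return static_cast<std::uint32_t>(v);
}

// Usage example: the same 8-bit pattern widens differently by signedness.
inline void demoWidening() {
  assert(widenSigned8(static_cast<std::int8_t>(-1)) == 0xFFFFFFFFu);
  assert(widenUnsigned8(0xFF) == 0x000000FFu);
}
```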

An aggregate containing a single member is passed as if it were an argument of that member type. Otherwise, an aggregate is passed as a pointer to the aggregate. The callee is allowed to write to the pointed-to aggregate.

14.3.2. Return values

Integer return values are passed in $m0 to $m3 and floating-point return values are passed in $a0 to $a3.

Scalar types smaller than 32 bits are returned as zero- or sign-extended 32-bit values.

An aggregate containing a single member is returned as if it were a value of that member type. Otherwise, the caller passes the address of the return destination as an implicit parameter. This must be a valid address to which the return value can be written. The return destination must not alias any other memory visible to the callee.

14.3.3. Entry and exit

On entry to a function, the caller ensures the link register, $m10, holds the address of the instruction to execute after the function returns, and the stack pointer, $sp, holds a 64-bit aligned address at the top of the stack.

On exiting a function, the callee resets the stack pointer to its value on entry if it was modified, and copies the on-entry value of the link register to the program counter, $pc.

14.3.4. Register assignments

The register assignments are described in Table 14.1 and Table 14.2.

Table 14.1 MRF register assignments for the function calling convention

Register     Type          Role
$m0 - $m3    Caller save   Arguments and return values
$m4 - $m6    Caller save
$m7          Callee save
$m8          Callee save   Scratch or base pointer
$m9          Callee save   Scratch or frame pointer
$m10         Caller save   Link register
$sp          Callee save   Stack pointer

Table 14.2 ARF register assignments for the function calling convention

Register     Type          Role
$a0 - $a5    Caller save   Arguments and return values
$a6, $a7     Callee save

14.4. Stack frame

Table 14.3 and Table 14.4 illustrate the organisation of the two stack frames when a function is called. The stack grows downwards, towards address 0x0. The outgoing arguments are written so that earlier arguments have smaller offsets from the stack pointer, and are therefore at lower addresses.

See Section 10.3, Worker stack and scratch space for information on allocating stack memory.
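The address arithmetic implied by a downward-growing stack can be sketched as below. This is a host-side model only: the function names are hypothetical, and the 4-byte argument slots starting at offset 0 from the stack pointer are an assumption made for illustration, not something this chapter specifies.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kStackAlign = 8;  // $sp is 64-bit aligned on entry

// Reserve frameBytes by moving the stack pointer down, keeping it aligned.
inline std::uint32_t pushFrame(std::uint32_t sp, std::uint32_t frameBytes) {
  assert(sp % kStackAlign == 0);
  const std::uint32_t rounded =
      (frameBytes + kStackAlign - 1) & ~(kStackAlign - 1);
  assert(rounded <= sp);  // must not run past address 0x0
  return sp - rounded;
}

// Address of outgoing stack argument `index`. Earlier arguments have smaller
// offsets from the stack pointer, so they sit at lower addresses.
// Assumption: each argument occupies one 4-byte slot, starting at offset 0.
inline std::uint32_t outgoingArgAddress(std::uint32_t sp,
                                        std::uint32_t index) {
  return sp + 4 * index;
}
```

For example, reserving a 20-byte frame from 0x1000 moves the stack pointer down by 24 bytes, the next multiple of 8, so it stays 64-bit aligned for any call the function makes.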

Table 14.3 Stack frame layout (callee)

Callee’s frame

Outgoing arguments (caller writes)

Local objects and spills

Incoming arguments (callee writes)

Table 14.4 Stack frame layout (caller)

Caller’s frame

Incoming arguments (caller writes)

Local objects and spills

Incoming arguments (callee writes)