14. Application binary interface (ABI)
This chapter describes the Colossus IPU 32-bit application binary interface (ABI).
The Executable and Linkable Format (ELF) defines a linking interface for compiled programs. This document is the processor-specific supplement for use with ELF on the 32-bit Colossus IPU. It is intended for linking objects compiled in C, C++ and assembly code.
The ELF specification can be found in: System V Application Binary Interface, Edition 4.1.
14.1. Types
The data types used by Poplar are described in Table 4.1 and Table 4.3. In addition:
By default the char type is signed.
long is the same as int.
long double is the same as double.
The underlying type of an enumerated type is int.
Function pointers are the same as data pointers.
double and long double are not supported by Poplar targets.
long, long long, unsigned long and unsigned long long are not supported on the IPU.
14.1.1. Floating point types
The IPU has hardware support for float and half. For CPU targets, half is, by default, an alias for float (and sizeof(half) will be 4).
The parameter accurateHalf can be set to true when creating a CPU target, in which case half will be correctly implemented as 16-bit IEEE floating point. This will be slower, but will produce the same results as the IPU.
Codelets should be written to be generic to the size of half so that changing this setting requires no code changes.
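One way to stay generic to the size of half is to derive every element count and byte size from sizeof rather than hard-coding 2. The following is a minimal sketch, not Poplar code: the helper names elemsPer64Bits and bufferBytes are hypothetical, the 64-bit block size is chosen only for illustration, and std::uint16_t stands in for a 16-bit half because standard C++ on a host has no half type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: how many elements of type T fit in one 8-byte block.
// Deriving the count from sizeof(T) gives the right answer whether the
// element type is 2 bytes wide or 4 bytes wide.
template <typename T>
constexpr std::size_t elemsPer64Bits() {
  return 8 / sizeof(T);
}

// Hypothetical helper: size in bytes of a buffer holding n elements of T.
template <typename T>
constexpr std::size_t bufferBytes(std::size_t n) {
  return n * sizeof(T);
}
```

With a 2-byte element type elemsPer64Bits returns 4, and with a 4-byte element type it returns 2, so code built on these helpers does not need editing when the width of half changes.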
14.1.2. Structure types
Structure types pack according to the standard rules:
Field offsets are aligned according to the field’s type.
A structure is aligned according to the maximum alignment of its members.
Tail padding is added to make the structure’s size a multiple of its alignment.
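The three rules above can be checked at compile time with offsetof, alignof and sizeof. This is an illustrative sketch that compiles on a host: the Example structure is hypothetical, and the concrete offsets in the comments assume natural alignment of 1, 4 and 2 bytes for the three fixed-width types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical structure used only to illustrate the packing rules.
struct Example {
  std::uint8_t tag;     // offset 0
  std::uint32_t count;  // offset 4 with natural alignment (3 bytes of padding before it)
  std::uint16_t flags;  // offset 8 with natural alignment (2 bytes of tail padding after it)
};

// Rule 1: each field offset is a multiple of the field type's alignment.
static_assert(offsetof(Example, count) % alignof(std::uint32_t) == 0,
              "count must be aligned to its type");
static_assert(offsetof(Example, flags) % alignof(std::uint16_t) == 0,
              "flags must be aligned to its type");

// Rule 2: the structure takes the largest alignment among its members.
static_assert(alignof(Example) == alignof(std::uint32_t),
              "structure alignment is the maximum member alignment");

// Rule 3: tail padding rounds the size up to a multiple of the alignment.
static_assert(sizeof(Example) % alignof(Example) == 0,
              "size must be a multiple of the alignment");
```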
14.1.3. Bit fields
The following types may be specified in a bit-field’s declaration: char, short, int, long, long long and enum.
If an enum type has negative values, enum bit-fields are signed. Otherwise, if a signed integer type of the specified width is not able to represent all enum values then enum bit-fields are unsigned. Otherwise, enum bit-fields are signed. All other bit-field types are signed unless explicitly unsigned.
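The signedness rule for enum bit-fields can be written as a small decision function. This is a sketch of the rule as stated above, not compiler source: the function name enumBitFieldIsSigned and its parameters are hypothetical, with minValue and maxValue standing for the smallest and largest enumerator values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the enum bit-field signedness rule.
// width is the declared bit-field width in bits (1 to 32 in this sketch).
inline bool enumBitFieldIsSigned(std::int64_t minValue, std::int64_t maxValue,
                                 unsigned width) {
  assert(width >= 1 && width <= 32);
  assert(minValue <= maxValue);
  // Any negative enumerator makes the bit-field signed.
  if (minValue < 0) {
    return true;
  }
  // Largest value a signed integer of `width` bits can represent.
  const std::int64_t signedMax = (std::int64_t{1} << (width - 1)) - 1;
  // If a signed field of this width cannot hold every value, it is unsigned;
  // otherwise it is signed.
  return maxValue <= signedMax;
}
```

For example, an enum with values 0 to 3 gives an unsigned bit-field at width 2 (a signed 2-bit field only reaches 1) and a signed bit-field at width 3.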
Bit-fields pack from the least significant end of the allocation unit. Each non-zero-width bit-field is allocated at the first available bit offset that allows the bit-field to be placed in a properly aligned container of the declared type. Non-bit-field members are allocated at the first available offset satisfying their declared type’s size and alignment constraints.
A zero-width bit-field forces padding until the next bit offset aligned with the bit-field’s declared type.
Unnamed bit-fields are allocated space in the same manner as named bit-fields.
A structure is aligned according to each of the bit-fields’ declared types in addition to the types of any other members. Both zero-width and unnamed bit-fields are taken into account when calculating a structure’s alignment.
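The bit-field allocation rule can be modelled as a function from the next free bit offset to the offset actually used. This is a sketch under stated assumptions rather than the compiler's implementation: the name placeBitField is hypothetical, and containerBits is assumed to be both the size and the alignment of the declared type in bits (for example 8 for char and 32 for int).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model: returns the bit offset at which a bit-field of `width`
// bits is placed, given the first free bit offset in the structure.
// For a zero-width bit-field the result is the offset at which the next
// member may start.
inline std::size_t placeBitField(std::size_t nextFreeBit, std::size_t width,
                                 std::size_t containerBits) {
  assert(containerBits > 0 && width <= containerBits);
  if (width == 0) {
    // Zero width: pad up to the next container-aligned bit offset.
    return (nextFreeBit + containerBits - 1) / containerBits * containerBits;
  }
  const std::size_t firstUnit = nextFreeBit / containerBits;
  const std::size_t lastUnit = (nextFreeBit + width - 1) / containerBits;
  if (firstUnit == lastUnit) {
    // The field fits inside one aligned container, so no padding is needed.
    return nextFreeBit;
  }
  // The field would straddle two containers: start it at the next boundary.
  return (firstUnit + 1) * containerBits;
}
```

Under this model, two consecutive 20-bit int bit-fields land at bit offsets 0 and 32, because the second one would otherwise straddle the first 32-bit container.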
14.2. Vertex calling convention
Vertex functions (codelets) are only called by the supervisor thread and have no arguments.
On entry to a vertex codelet, the $mvertex_base and $mworker_base registers hold the address of the vertex state structure and the address of the worker thread’s scratch space, respectively. A vertex codelet is not required to preserve callee save registers, therefore all $m and $a registers can be used as scratch storage.
Worker vertices have no parameters (all state is accessible through the register $mvertex_base). On termination they must provide a boolean exit status to the supervisor thread, using one of the exit instructions.
14.3. Function calling convention
14.3.1. Function parameters
Function calling uses $m0 to $m3 to pass integer parameters and $a0 to $a5 to pass floating point parameters. Additional parameters are passed on the stack.
Parameters of 64 or 128 bits must be passed in aligned register pairs or quads respectively. An aligned register pair is numbered even then odd, for example $m0:1 or $m2:3. An aligned register quad starts with a register whose number modulo four is zero, for example $a0:3. Subsequent arguments can only be passed in the remaining consecutive registers.
All variadic function parameters are passed via the stack.
Scalar types smaller than 32 bits are passed as zero- or sign-extended 32-bit values.
An aggregate containing a single member is passed as if it were an argument of that member type. Otherwise, aggregates are passed by passing a pointer to the aggregate. The callee is allowed to write to the pointed-to aggregate.
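The register alignment rules for 32, 64 and 128-bit parameters can be modelled with a simple allocator, one instance per register file. This is a sketch, not the compiler's algorithm: the ArgAllocator class is hypothetical, and the behaviour after an argument spills to the stack (all later arguments of that class also go on the stack) is an assumption made here, since the text above does not specify it.

```cpp
#include <cassert>

// Hypothetical model of argument register assignment for one register file
// (4 registers for $m0 to $m3, 6 registers for $a0 to $a5).
class ArgAllocator {
public:
  explicit ArgAllocator(int numRegs) : numRegs_(numRegs) {}

  // regsNeeded is 1 (32-bit), 2 (64-bit pair) or 4 (128-bit quad).
  // Returns the index of the first register used, or -1 for the stack.
  int assign(int regsNeeded) {
    assert(regsNeeded == 1 || regsNeeded == 2 || regsNeeded == 4);
    // Round up to the required alignment: pairs start on an even register,
    // quads on a register whose number is a multiple of four.
    const int first = (next_ + regsNeeded - 1) / regsNeeded * regsNeeded;
    if (first + regsNeeded > numRegs_) {
      // ASSUMPTION: once one argument is passed on the stack, the remaining
      // registers are not used for later arguments.
      next_ = numRegs_;
      return -1;
    }
    // Registers skipped for alignment are never back-filled: later
    // arguments only see the registers after this one.
    next_ = first + regsNeeded;
    return first;
  }

private:
  int numRegs_;
  int next_ = 0;
};
```

For instance, with ArgAllocator m(4) modelling $m0 to $m3, a 32-bit argument followed by a 64-bit argument yields indices 0 and 2: the pair is placed in $m2:3 and $m1 stays unused.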
14.3.2. Return values
Integer return values are passed in $m0 to $m3 and floating-point return values are passed in $a0 to $a3.
Scalar types smaller than 32 bits are returned as zero- or sign-extended 32-bit values.
An aggregate containing a single member is returned as if it were a value of that member type. Otherwise, the caller passes as an implicit parameter the address of the return destination. This must be a valid address to which the return value can be written. The return destination must not alias any other memory visible to the callee.
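The zero- or sign-extension of scalars smaller than 32 bits, which applies to both parameters and return values, can be illustrated by computing the 32-bit pattern a register would hold. This is a host-side sketch: the function name registerImage is hypothetical and the code only models the widening, not the actual register transfer.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical helper: the 32-bit pattern for a narrow integer value.
// Signed types are sign-extended and unsigned types are zero-extended,
// which is what converting through a 32-bit signed integer produces.
template <typename T>
std::uint32_t registerImage(T value) {
  static_assert(std::is_integral<T>::value && sizeof(T) < 4,
                "only integer types narrower than 32 bits are widened");
  return static_cast<std::uint32_t>(static_cast<std::int32_t>(value));
}
```

A signed 8-bit value of -1 widens to 0xFFFFFFFF, while an unsigned 8-bit value of 0xFF widens to 0x000000FF.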
14.3.3. Entry and exit
On entry to a function, the caller ensures the link register, $m10, holds the address of the instruction to execute after the function returns, and the stack pointer, $sp, holds a 64-bit aligned address at the top of the stack.
On exiting a function, the callee resets the stack pointer to its value on entry if it was modified, and copies the on-entry value of the link register to the program counter, $pc.
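Because the stack pointer must be 64-bit aligned on entry to any function, a function that reserves stack space and then makes a call has to keep that alignment. The arithmetic can be sketched as follows, given that the stack grows downwards (see Section 14.4, Stack frame). The helper name allocateFrame is hypothetical, and rounding the frame down to an 8-byte boundary is one way to meet the requirement, not necessarily what the compiler emits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the stack pointer value after reserving frameBytes
// below the on-entry stack pointer, rounded down to an 8-byte boundary.
inline std::uint32_t allocateFrame(std::uint32_t sp, std::uint32_t frameBytes) {
  assert(sp % 8 == 0);       // on entry the stack pointer is 64-bit aligned
  assert(frameBytes <= sp);  // this sketch does not model stack overflow
  // Clearing the low three bits rounds the address down to a multiple of 8.
  return (sp - frameBytes) & ~std::uint32_t{7};
}
```

On exit the callee restores the on-entry value, so the caller observes no change in the stack pointer.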
14.3.4. Register assignments
The register assignments are described in Table 14.1 and Table 14.2.
| Registers | Type | Usage |
|---|---|---|
| | Caller save | Arguments and return values |
| | Caller save | Scratch |
| | Callee save | Scratch |
| | Callee save | Scratch or base pointer |
| | Callee save | Scratch or frame pointer |
| | Caller save | Link register |
| | Callee save | Stack pointer |
The $m10 register holds the address to return to when a function completes (see Section 14.3.3, Entry and exit).
The $m11 register holds the base address of the stack of the current function (see Section 14.4, Stack frame).
| Registers | Type | Usage |
|---|---|---|
| | Caller save | Arguments and return values |
| | Callee save | Scratch |
14.4. Stack frame
Table 14.3 and Table 14.4 illustrate the organisation of the two stack frames when a function is called. The stack grows downwards towards address 0x0. The outgoing arguments are written so that earlier arguments have smaller offsets from the stack pointer, and so are at lower addresses.
See Section 10.3, Worker stack and scratch space for information on allocating stack memory.
Callee’s frame |
---|
Outgoing arguments (caller writes) |
Local objects and spills |
Incoming arguments (callee writes) |
Caller’s frame |
---|
Incoming arguments (caller writes) |
Local objects and spills |
Incoming arguments (callee writes) |