IPU C/C++ builtins
The following IPU-specific builtin functions can be used in C/C++ code. Some of them reference the Tile Vertex Instruction Set Architecture. Refer to that document for more detailed information on the instructions and the control and status registers (CSRs) that these builtins target.
Note
For many of these builtins, it is possible to omit the __builtin_ipu prefix by using the corresponding C++ intrinsic. See IPU C++ intrinsics for more information.
Note
Use #include <ipudef.h> for the IPU native types mentioned throughout this section, such as half, half2, float2 and more.
For information on non-IPU, generic Clang builtins, refer to the Clang documentation on builtin functions and to the GCC documentation on builtins, which Clang also aims to support.
IPU functionality and memory
Get lower half of cycle count from CSR

unsigned __builtin_ipu_get_scount_l()
Get the value of the CSR $COUNT_L, which is the lower 32 bits of the tile cycle counter value.
Get upper half of cycle count from CSR

unsigned __builtin_ipu_get_scount_u()
Get the value of the CSR $COUNT_U, which is the upper 32 bits of the tile cycle counter value.
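The two halves of the cycle counter can be joined into a single 64-bit value. The following is a minimal host-side sketch rather than IPU code: the helper name combine_cycle_count is hypothetical, and on a tile its arguments would be the results of __builtin_ipu_get_scount_u() and __builtin_ipu_get_scount_l().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the SDK): join the upper and lower
 * 32-bit halves of a cycle count into one 64-bit value. */
uint64_t combine_cycle_count(uint32_t upper, uint32_t lower) {
    return ((uint64_t)upper << 32) | lower;
}

/* Usage check with made-up counter values. */
void combine_cycle_count_example(void) {
    assert(combine_cycle_count(0x1u, 0x2u) == 0x100000002ull);
}
```

Because the two halves are read by separate builtins, the lower half may wrap between the two reads. Code that needs an exact value may want to re-read the upper half and retry if it has changed.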
Get vertex base from CSR

void *__builtin_ipu_get_vertex_base()
Get the vertex data structure pointer from the $VERTEX_BASE CSR.
Get tile ID from CSR

unsigned __builtin_ipu_get_tile_id()
Get the ID of the current tile from the $TILE_ID CSR.
Check for worker mode

bool __builtin_ipu_is_worker_mode()
Check whether the calling code is executing in worker mode.
Example
#include <stdbool.h> // needed in C
bool example() {
    bool res = __builtin_ipu_is_worker_mode();
    return res;
}
Triple-pack three addresses

uint2 __builtin_ipu_tapack(const void *addr1, const void *addr2, const void *addr3)
Convert three absolute addresses to the triple-packed address format.
Targets the tapack instruction.
See the Tile Vertex Instruction Set Architecture for more details about the tapack instruction.
Write to a CSR

void __builtin_ipu_put(unsigned val, unsigned char csr_index)
Write to a control and status register.
Targets the put instruction.
See the Tile Vertex Instruction Set Architecture for more details about the put instruction.
Example
Write the value x to the CSR at index 32.
void example(unsigned x) {
    __builtin_ipu_put(x, 32);
}
Write to an upper CSR

void __builtin_ipu_uput(unsigned val, unsigned char csr_index)

void __builtin_ipu_uput(float val, unsigned char csr_index)
Write to a control register in the upper CSR address space.
Targets the uput instruction.
See the Tile Vertex Instruction Set Architecture for more details about the uput instruction.
Example
Write the value x to the CSR at index 2 in the upper CSR space.
void example(unsigned x) {
    __builtin_ipu_uput(x, 2);
}
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtins __builtin_ipu_uput and __builtin_ipu_uputf are available without this header.
Read from a CSR

unsigned __builtin_ipu_get(unsigned char csr_index)
Read the value of a control and status register into a general purpose register.
Targets the get instruction.
See the Tile Vertex Instruction Set Architecture for more details about the get instruction.
Example
Set res to the value of the CSR at index 1.
unsigned example() {
    unsigned res = __builtin_ipu_get(1);
    return res;
}
Read from an upper CSR

unsigned __builtin_ipu_uget(unsigned char csr_index)
Read the value of a control and status register in the upper CSR space into a general purpose register.
Targets the uget instruction.
See the Tile Vertex Instruction Set Architecture for more details about the uget instruction.
Example
Set res to the value of the CSR at index 4 in the upper CSR space.
unsigned example() {
    unsigned res = __builtin_ipu_uget(4);
    return res;
}
Read from an upper CSR as a float

float __builtin_ipu_ugetf(unsigned char csr_index)
Read the value of a control and status register in the upper CSR space into a general purpose register, returned as a float.
Targets the uget instruction.
See the Tile Vertex Instruction Set Architecture for more details about the uget instruction.
Load and write 64-bit value to the common configuration space

void __builtin_ipu_ld64putcs(const unsigned imm)
Load a naturally-aligned 64-bit value and write it to the common compute configuration space. The load address is provided by the CSR $CCCSLOAD, which is automatically post-incremented by 8.
Targets the ld64putcs instruction.
See the Tile Vertex Instruction Set Architecture for more details about the ld64putcs instruction.
Load and write 128-bit value to the common configuration space

void __builtin_ipu_ld128putcs(const unsigned imm)
Load a naturally-aligned 128-bit value and write it to the common compute configuration space. The load address is provided by the CSR $CCCSLOAD, which is automatically post-incremented by 16.
Targets the ld128putcs instruction.
See the Tile Vertex Instruction Set Architecture for more details about the ld128putcs instruction.
64-bit load and 64-bit store, with post-incrementing addresses

float2 __builtin_ipu_ldst64pace(float2 src, uint2 addr, uint stride, const unsigned imm)
Load a naturally-aligned 64-bit value and simultaneously store a 64-bit value src, with two independent post-incrementing addresses. The two addresses are packed into the register pair addr. The post-increment of the two addresses is determined by the stride and the 4-bit immediate imm.
Targets the ldst64pace instruction.
See the Tile Vertex Instruction Set Architecture for more details about the ldst64pace instruction, specifically how stride and imm are configured and how the addresses are packed into addr.
Note
This builtin may be used in conjunction with __builtin_ipu_tapack.
Bit operations
And operation

int __builtin_ipu_and(int x, int y)

float __builtin_ipu_and(float x, float y)

float2 __builtin_ipu_and(float2 x, float2 y)
Get the result of the and bit operation on two values.
Targets the and instruction.
See the Tile Vertex Instruction Set Architecture for more details about the and instruction.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtins __builtin_ipu_and_i32, __builtin_ipu_and_f32 and __builtin_ipu_and_v2f32 are available without this header.
Andc operation

int __builtin_ipu_andc(int x, int y)

float __builtin_ipu_andc(float x, float y)

float2 __builtin_ipu_andc(float2 x, float2 y)
Get the result of the andc bit operation on two values.
Targets the andc instruction.
See the Tile Vertex Instruction Set Architecture for more details about the andc instruction.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtins __builtin_ipu_andc_i32, __builtin_ipu_andc_f32 and __builtin_ipu_andc_v2f32 are available without this header.
Or operation

int __builtin_ipu_or(int x, int y)

float __builtin_ipu_or(float x, float y)

float2 __builtin_ipu_or(float2 x, float2 y)
Get the result of the or bit operation on two values.
Targets the or instruction.
See the Tile Vertex Instruction Set Architecture for more details about the or instruction.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtins __builtin_ipu_or_i32, __builtin_ipu_or_f32 and __builtin_ipu_or_v2f32 are available without this header.
Not operation

float __builtin_ipu_not(float x)

float2 __builtin_ipu_not(float2 x)
Get the result of the not bit operation on a value.
Targets the not instruction.
See the Tile Vertex Instruction Set Architecture for more details about the not instruction.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtins __builtin_ipu_not_f32 and __builtin_ipu_not_v2f32 are available without this header.
Reverse bits in each byte

unsigned __builtin_ipu_bitrev8(unsigned x)
Reverses the bit order of each byte in x.
Targets the bitrev8 instruction.
See the Tile Vertex Instruction Set Architecture for more details about the bitrev8 instruction.
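As an illustration of the description above, the following host-side sketch reverses the bits inside each byte of a 32-bit word while leaving the bytes in place. The function name bitrev8_model is hypothetical, and the code is a portable model based only on the one-line description here, not the hardware definition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model: reverse the bit order inside each of the
 * four bytes of x. The bytes themselves stay where they are. */
uint32_t bitrev8_model(uint32_t x) {
    /* Swap adjacent bits, then bit pairs, then nibbles, within each byte. */
    x = ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u);
    x = ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u);
    x = ((x & 0x0F0F0F0Fu) << 4) | ((x >> 4) & 0x0F0F0F0Fu);
    return x;
}

/* Usage check: 0x01 -> 0x80, 0x02 -> 0x40, 0xF0 -> 0x0F, 0x80 -> 0x01. */
void bitrev8_model_example(void) {
    assert(bitrev8_model(0x0102F080u) == 0x80400F01u);
}
```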
Count matching sign bits

unsigned __builtin_ipu_cms(int x)
Calculates the number of higher-order bits that match the sign bit in x.
Targets the cms instruction.
See the Tile Vertex Instruction Set Architecture for more details about the cms instruction.
SIMD roll permutation on 4x32-bit values

float2 __builtin_ipu_roll32(float2 x, float2 y)
Performs SIMD roll permutation on the four 32-bit values across x and y.
x      y      ->  Result
3 2    1 0        2 1
Targets the roll32 instruction.
See the Tile Vertex Instruction Set Architecture for more details about the roll32 instruction.
SIMD roll-left permutation on 8x8-bit values

unsigned __builtin_ipu_roll8l(unsigned x, unsigned y)
Performs SIMD roll-left permutation on the eight 8-bit values across x and y.
x          y          ->  Result
7 6 5 4    3 2 1 0        6 5 4 3
Targets the roll8l instruction.
See the Tile Vertex Instruction Set Architecture for more details about the roll8l instruction.
SIMD roll-right permutation on 8x8-bit values

unsigned __builtin_ipu_roll8r(unsigned x, unsigned y)
Performs SIMD roll-right permutation on the eight 8-bit values across x and y.
x          y          ->  Result
7 6 5 4    3 2 1 0        4 3 2 1
Targets the roll8r instruction.
See the Tile Vertex Instruction Set Architecture for more details about the roll8r instruction.
Upper half of SIMD shuffle permutation on 8x8-bit values

unsigned __builtin_ipu_shuf8x8hi(unsigned x, unsigned y)
Performs SIMD shuffle permutation on the eight 8-bit values across x and y, and returns the upper word of the result.
x          y          ->  Result
7 6 5 4    3 2 1 0        7 3 6 2
Targets the shuf8x8hi instruction.
See the Tile Vertex Instruction Set Architecture for more details about the shuf8x8hi instruction.
Lower half of SIMD shuffle permutation on 8x8-bit values

unsigned __builtin_ipu_shuf8x8lo(unsigned x, unsigned y)
Performs SIMD shuffle permutation on the eight 8-bit values across x and y, and returns the lower word of the result.
x          y          ->  Result
7 6 5 4    3 2 1 0        5 1 4 0
Targets the shuf8x8lo instruction.
See the Tile Vertex Instruction Set Architecture for more details about the shuf8x8lo instruction.
Upper half of SIMD sort permutation on 4x32-bit values

float2 __builtin_ipu_sort4x32hi(float2 x, float2 y)
Performs SIMD sort permutation on the four 32-bit values across x and y, and returns the upper two words of the result.
x      y      ->  Result
3 2    1 0        3 1
Targets the sort4x32hi instruction.
See the Tile Vertex Instruction Set Architecture for more details about the sort4x32hi instruction.
Lower half of SIMD sort permutation on 4x32-bit values

float2 __builtin_ipu_sort4x32lo(float2 x, float2 y)
Performs SIMD sort permutation on the four 32-bit values across x and y, and returns the lower two words of the result.
x      y      ->  Result
3 2    1 0        2 0
Targets the sort4x32lo instruction.
See the Tile Vertex Instruction Set Architecture for more details about the sort4x32lo instruction.
SIMD sort8 permutation on 4x8-bit values

unsigned __builtin_ipu_sort8(unsigned x)
Performs SIMD sort8 permutation on the four 8-bit values in x.
x          ->  Result
3 2 1 0        3 1 2 0
Targets the sort8 instruction.
See the Tile Vertex Instruction Set Architecture for more details about the sort8 instruction.
SIMD swap8 permutation on 4x8-bit values

unsigned __builtin_ipu_swap8(unsigned x)
Performs SIMD swap8 permutation on the four 8-bit values in x.
x          ->  Result
3 2 1 0        2 3 0 1
Targets the swap8 instruction.
See the Tile Vertex Instruction Set Architecture for more details about the swap8 instruction.
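The sort8 and swap8 permutations listed above can be modelled on a host as plain byte shuffles. This is a sketch under stated assumptions: the function names are hypothetical, byte 0 is taken to be the least significant byte, and the listings are read from most to least significant position. For these two permutations the result is the same if the listings are read in the opposite direction.

```c
#include <assert.h>
#include <stdint.h>

/* Extract byte i of x, with byte 0 the least significant (assumed numbering). */
uint32_t byte_of(uint32_t x, unsigned i) {
    return (x >> (8u * i)) & 0xFFu;
}

/* Build a word from four bytes listed from most to least significant. */
uint32_t pack4(uint32_t b3, uint32_t b2, uint32_t b1, uint32_t b0) {
    return (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
}

/* Hypothetical model of sort8: 3 2 1 0 -> 3 1 2 0. */
uint32_t sort8_model(uint32_t x) {
    return pack4(byte_of(x, 3), byte_of(x, 1), byte_of(x, 2), byte_of(x, 0));
}

/* Hypothetical model of swap8: 3 2 1 0 -> 2 3 0 1. */
uint32_t swap8_model(uint32_t x) {
    return pack4(byte_of(x, 2), byte_of(x, 3), byte_of(x, 0), byte_of(x, 1));
}

/* Usage check: bytes 3..0 of the input are 0x33, 0x22, 0x11, 0x00. */
void permute_model_example(void) {
    assert(sort8_model(0x33221100u) == 0x33112200u);
    assert(swap8_model(0x33221100u) == 0x22330011u);
}
```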
Conditional ternary operator

half __builtin_ipu_select_half(half condition, half a, half b)

half2 __builtin_ipu_select_half2(half2 condition, half2 a, half2 b)

half4 __builtin_ipu_select_half4(half4 condition, half4 a, half4 b)

float __builtin_ipu_select_float(float condition, float a, float b)

float2 __builtin_ipu_select_float2(float2 condition, float2 a, float2 b)
Builtins that calculate condition ? a : b for float types. For the scalar variants, the result is a if condition is all 1s and b if condition is all 0s. For the vector variants, the element at index i of the output vector similarly depends on the i-th element of condition.
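The scalar float case can be sketched on a host by treating condition as a bit mask. The names below are hypothetical. The sketch reproduces the two documented cases (all 1s selects a, all 0s selects b). Blending bit by bit for a mask that mixes 0s and 1s is an assumption of this model, not something stated here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern and back. */
uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Hypothetical model of the scalar float select: take the bits of a where
 * the mask is 1 and the bits of b where it is 0. */
float select_float_model(float condition, float a, float b) {
    uint32_t m = float_bits(condition);
    return bits_float((m & float_bits(a)) | (~m & float_bits(b)));
}

/* Usage check for the two documented cases. */
void select_float_model_example(void) {
    float all_ones = bits_float(0xFFFFFFFFu);
    float all_zeros = bits_float(0x00000000u);
    assert(select_float_model(all_ones, 1.5f, 2.5f) == 1.5f);
    assert(select_float_model(all_zeros, 1.5f, 2.5f) == 2.5f);
}
```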
Float operations
Operations are supported on a number of floating-point number formats, for scalar and vector variables. This support is based on IEEE 754-2008, the IEEE Standard for Floating-Point Arithmetic.
For details, see the section Floating Point Unit in the Tile Vertex Instruction Set Architecture.
Absolute addition of two values

half2 __builtin_ipu_absadd(half2 x, half2 y)

half4 __builtin_ipu_absadd(half4 x, half4 y)

float __builtin_ipu_absadd(float x, float y)

float2 __builtin_ipu_absadd(float2 x, float2 y)
Sum of two absolute values.
Targets the f16v2absadd, f16v4absadd, f32v2absadd and f32absadd instructions.
See the Tile Vertex Instruction Set Architecture for more details about these instructions.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtins __builtin_ipu_f16v2absadd, __builtin_ipu_f16v4absadd, __builtin_ipu_f32v2absadd and __builtin_ipu_f32absadd are available without this header.
Absolute maximum of two values

half2 __builtin_ipu_absmax(half2 x, half2 y)

half4 __builtin_ipu_absmax(half4 x, half4 y)

float __builtin_ipu_absmax(float x, float y)

float2 __builtin_ipu_absmax(float2 x, float2 y)
The maximum of two absolute values.
Targets the f16v2absmax, f16v4absmax, f32v2absmax and f32absmax instructions.
See the Tile Vertex Instruction Set Architecture for more details about these instructions.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtins __builtin_ipu_f16v2absmax, __builtin_ipu_f16v4absmax, __builtin_ipu_f32v2absmax and __builtin_ipu_f32absmax are available without this header.
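For the scalar float case, the absolute-addition and absolute-maximum operations described above amount to the following arithmetic. This is a host-side sketch with hypothetical function names. It ignores NaN and signed-zero handling, which the hardware instructions may treat differently.

```c
#include <assert.h>

/* Absolute value without relying on the maths library. */
float abs_model(float v) {
    return v < 0.0f ? -v : v;
}

/* Hypothetical model of absadd for float: |x| + |y|. */
float absadd_model(float x, float y) {
    return abs_model(x) + abs_model(y);
}

/* Hypothetical model of absmax for float: max(|x|, |y|). */
float absmax_model(float x, float y) {
    float ax = abs_model(x);
    float ay = abs_model(y);
    return ax > ay ? ax : ay;
}

/* Usage check with exactly representable values. */
void abs_model_example(void) {
    assert(absadd_model(-1.5f, 2.0f) == 3.5f);
    assert(absmax_model(-3.0f, 2.0f) == 3.0f);
}
```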
Maximum of two values

half2 __builtin_ipu_max(half2 x, half2 y)

half4 __builtin_ipu_max(half4 x, half4 y)

float __builtin_ipu_max(float x, float y)

float2 __builtin_ipu_max(float2 x, float2 y)
The maximum of two values.
Targets the f16v2max, f16v4max, f32v2max and f32max instructions.
See the Tile Vertex Instruction Set Architecture for more details about these instructions.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtins __builtin_ipu_f16v2max, __builtin_ipu_f16v4max, __builtin_ipu_f32v2max and __builtin_ipu_f32max are available without this header.
Lateral maximum of two values

half2 __builtin_ipu_maxc(half2 x, half2 y)

half4 __builtin_ipu_maxc(half4 x, half4 y)

float __builtin_ipu_maxc(float x, float y)

float2 __builtin_ipu_maxc(float2 x, float2 y)
The lateral maximum of two variables.
Targets the f16v2maxc and f16v4maxc instructions.
See the Tile Vertex Instruction Set Architecture for more details about these instructions.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtins __builtin_ipu_f16v2maxc and __builtin_ipu_f16v4maxc are available without this header.
Minimum of two values

half2 __builtin_ipu_min(half2 x, half2 y)

half4 __builtin_ipu_min(half4 x, half4 y)

float __builtin_ipu_min(float x, float y)

float2 __builtin_ipu_min(float2 x, float2 y)
The minimum of two values.
Targets the f16v2min, f16v4min, f32v2min and f32min instructions.
See the Tile Vertex Instruction Set Architecture for more details about these instructions.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtins __builtin_ipu_f16v2min, __builtin_ipu_f16v4min, __builtin_ipu_f32v2min and __builtin_ipu_f32min are available without this header.
Min-of-maximum of two values

half2 __builtin_ipu_clamp(half2 x, half2 y)

half4 __builtin_ipu_clamp(half4 x, half2 y)

float __builtin_ipu_clamp(float x, float2 y)

float2 __builtin_ipu_clamp(float2 x, float2 y)
The min-of-maximum of each of the elements in x, compared with the two elements in y.
Targets the f16v2clamp, f16v4clamp, f32v2clamp and f32clamp instructions.
See the Tile Vertex Instruction Set Architecture for more details about these instructions.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtins __builtin_ipu_f16v2clamp, __builtin_ipu_f16v4clamp, __builtin_ipu_f32v2clamp and __builtin_ipu_f32clamp are available without this header.
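A min-of-maximum with two bounds is the usual clamp operation. The host-side sketch below shows the scalar float case. The names are hypothetical, and treating the first element of y as the lower bound and the second as the upper bound is an assumption. Check the Tile Vertex Instruction Set Architecture for the actual operand layout.

```c
#include <assert.h>

/* Hypothetical stand-in for the float2 argument y.
 * Assumption: the first element is the lower bound, the second the upper. */
typedef struct {
    float lo;
    float hi;
} bounds_model;

/* Hypothetical model of clamp for float: min(max(x, lo), hi). */
float clamp_model(float x, bounds_model y) {
    float m = x > y.lo ? x : y.lo; /* max(x, lower bound) */
    return m < y.hi ? m : y.hi;    /* min(that, upper bound) */
}

/* Usage check: values outside [0, 1] are pulled to the nearest bound. */
void clamp_model_example(void) {
    bounds_model unit = {0.0f, 1.0f};
    assert(clamp_model(5.0f, unit) == 1.0f);
    assert(clamp_model(-2.0f, unit) == 0.0f);
    assert(clamp_model(0.25f, unit) == 0.25f);
}
```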
CMAC operation

void __builtin_ipu_cmac(half2 x, half2 y)

void __builtin_ipu_cmac(half4 x, half4 y)
Performs the CMAC operation on two values.
Targets the f16v2cmac and f16v4cmac instructions.
See the Tile Vertex Instruction Set Architecture for more details about these instructions.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtins __builtin_ipu_f16v2cmac and __builtin_ipu_f16v4cmac are available without this header.
Natural exponential

half2 __builtin_ipu_exp(half2 x)

float __builtin_ipu_exp(float x)
The natural exponential function.
Targets the f16v2exp and f32exp instructions.
See the Tile Vertex Instruction Set Architecture for more details about these instructions.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtin __builtin_ipu_f16v2exp is available without this header.
2-to-the-power-of

half2 __builtin_ipu_exp2(half2 x)

float __builtin_ipu_exp2(float x)
Calculates \(2^{x}\).
Targets the f16v2exp2 and f32exp2 instructions.
See the Tile Vertex Instruction Set Architecture for more details about these instructions.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtin __builtin_ipu_f16v2exp2 is available without this header.
Natural logarithm

half2 __builtin_ipu_ln(half2 x)

float __builtin_ipu_ln(float x)
The natural logarithm function.
Targets the f16v2ln and f32ln instructions.
See the Tile Vertex Instruction Set Architecture for more details about these instructions.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtin __builtin_ipu_f16v2ln is available without this header.
Base-2 logarithm

half2 __builtin_ipu_log2(half2 x)

float __builtin_ipu_log2(float x)
The base-2 logarithm function.
Targets the f16v2log2 and f32log2 instructions.
See the Tile Vertex Instruction Set Architecture for more details about these instructions.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtin __builtin_ipu_f16v2log2 is available without this header.
Probabilistic mask function

half4 __builtin_ipu_rmask(half4 x, float y)

float2 __builtin_ipu_rmask(float2 x, float y)
Returns a masked version of the first argument.
Targets the f16v4rmask and f32v2rmask instructions.
See the Tile Vertex Instruction Set Architecture for more details about these instructions.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtins __builtin_ipu_f16v4rmask and __builtin_ipu_f32v2rmask are available without this header.
Sigmoid function

half2 __builtin_ipu_sigm(half2 x)

float __builtin_ipu_sigm(float x)
Returns the result of the sigmoid function of a value.
Targets the f16v2sigm and f32sigm instructions.
See the Tile Vertex Instruction Set Architecture for more details about these instructions.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtins __builtin_ipu_f16v2sigm and __builtin_ipu_f32sigm are available without this header.
Lateral sum

float __builtin_ipu_sum(half2 x)

float2 __builtin_ipu_sum(half4 x)
Returns the lateral summation of the elements in x.
Targets the f16v2sum and f16v4sum instructions.
See the Tile Vertex Instruction Set Architecture for more details about these instructions.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtins __builtin_ipu_f16v2sum and __builtin_ipu_f16v4sum are available without this header.
Tanh

half2 __builtin_ipu_tanh(half2 x)

float __builtin_ipu_tanh(float x)
Returns the result of the hyperbolic tangent function of x.
Targets the f16v2tanh and f32tanh instructions.
See the Tile Vertex Instruction Set Architecture for more details about these instructions.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtin __builtin_ipu_f16v2tanh is available without this header.
Vector product

void __builtin_ipu_f32v2aop(float2 x, float2 y, unsigned char z)
Calculates the vector product of the first two arguments.
Targets the f32v2aop instruction.
See the Tile Vertex Instruction Set Architecture for more details about the f32v2aop instruction.
Vector sum with scalar multiplicand

float2 __builtin_ipu_f32v2axpy(float2 x, float2 y)
Calculates the vector result of ax + y, where a is the value of the CSR $TAS.
Targets the f32v2axpy instruction.
See the Tile Vertex Instruction Set Architecture for more details about the f32v2axpy instruction.
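The formula ax + y can be written out per element as in the host-side sketch below. The type float2_model and the function axpy_model are hypothetical, and the scalar a is passed as an ordinary parameter here, whereas the instruction takes it from $TAS. How $TAS encodes that value is not covered by this sketch.

```c
#include <assert.h>

/* Hypothetical stand-in for the IPU float2 type. */
typedef struct {
    float e[2];
} float2_model;

/* Hypothetical model of the axpy formula: r[i] = a * x[i] + y[i].
 * The parameter a stands in for the value held in $TAS. */
float2_model axpy_model(float a, float2_model x, float2_model y) {
    float2_model r;
    for (int i = 0; i < 2; ++i) {
        r.e[i] = a * x.e[i] + y.e[i];
    }
    return r;
}

/* Usage check with exactly representable values. */
void axpy_model_example(void) {
    float2_model x = {{1.0f, 2.0f}};
    float2_model y = {{3.0f, 4.0f}};
    float2_model r = axpy_model(2.0f, x, y);
    assert(r.e[0] == 5.0f);
    assert(r.e[1] == 8.0f);
}
```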
Get and initialise accumulators

half2 __builtin_ipu_gina(half2 x, unsigned int y)

float2 __builtin_ipu_gina(float2 x, unsigned int y)
Get and initialise accumulators.
Targets the f16v2gina and f32v2gina instructions.
See the Tile Vertex Instruction Set Architecture for more details about these instructions.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtins __builtin_ipu_f16v2gina and __builtin_ipu_f32v2gina are available without this header.
Float comparisons
A number of comparison instructions are provided.
For details, see the section Comparisons in the Tile Vertex Instruction Set Architecture.
Equality test

half2 __builtin_ipu_cmpeq(half2 x, half2 y)

half4 __builtin_ipu_cmpeq(half4 x, half4 y)

float __builtin_ipu_cmpeq(float x, float y)

float2 __builtin_ipu_cmpeq(float2 x, float2 y)
Element-wise equality comparison of two arguments.
Targets the f16v2cmpeq, f16v4cmpeq, f32cmpeq and f32v2cmpeq instructions.
See the Tile Vertex Instruction Set Architecture for more details about these instructions.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtins __builtin_ipu_f16v2cmpeq, __builtin_ipu_f16v4cmpeq, __builtin_ipu_f32cmpeq and __builtin_ipu_f32v2cmpeq are available without this header.
Greater-than-or-equal-to test

half2 __builtin_ipu_cmpge(half2 x, half2 y)

half4 __builtin_ipu_cmpge(half4 x, half4 y)

float __builtin_ipu_cmpge(float x, float y)

float2 __builtin_ipu_cmpge(float2 x, float2 y)
Element-wise greater-than-or-equal-to test of two arguments.
Targets the f16v2cmpge, f16v4cmpge, f32cmpge and f32v2cmpge instructions.
See the Tile Vertex Instruction Set Architecture for more details about these instructions.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtins __builtin_ipu_f16v2cmpge, __builtin_ipu_f16v4cmpge, __builtin_ipu_f32cmpge and __builtin_ipu_f32v2cmpge are available without this header.
Greater-than test

half2 __builtin_ipu_cmpgt(half2 x, half2 y)

half4 __builtin_ipu_cmpgt(half4 x, half4 y)

float __builtin_ipu_cmpgt(float x, float y)

float2 __builtin_ipu_cmpgt(float2 x, float2 y)
Element-wise greater-than test of two arguments.
Targets the f16v2cmpgt, f16v4cmpgt, f32cmpgt and f32v2cmpgt instructions.
See the Tile Vertex Instruction Set Architecture for more details about these instructions.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtins __builtin_ipu_f16v2cmpgt, __builtin_ipu_f16v4cmpgt, __builtin_ipu_f32cmpgt and __builtin_ipu_f32v2cmpgt are available without this header.
Less-than-or-equal-to test

half2 __builtin_ipu_cmple(half2 x, half2 y)

half4 __builtin_ipu_cmple(half4 x, half4 y)

float __builtin_ipu_cmple(float x, float y)

float2 __builtin_ipu_cmple(float2 x, float2 y)
Element-wise less-than-or-equal-to test of two arguments.
Targets the f16v2cmple, f16v4cmple, f32cmple and f32v2cmple instructions.
See the Tile Vertex Instruction Set Architecture for more details about these instructions.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtins __builtin_ipu_f16v2cmple, __builtin_ipu_f16v4cmple, __builtin_ipu_f32cmple and __builtin_ipu_f32v2cmple are available without this header.
Less-than test

half2 __builtin_ipu_cmplt(half2 x, half2 y)

half4 __builtin_ipu_cmplt(half4 x, half4 y)

float __builtin_ipu_cmplt(float x, float y)

float2 __builtin_ipu_cmplt(float2 x, float2 y)
Element-wise less-than test of two arguments.
Targets the f16v2cmplt, f16v4cmplt, f32cmplt and f32v2cmplt instructions.
See the Tile Vertex Instruction Set Architecture for more details about these instructions.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtins __builtin_ipu_f16v2cmplt, __builtin_ipu_f16v4cmplt, __builtin_ipu_f32cmplt and __builtin_ipu_f32v2cmplt are available without this header.
Inequality test

half2 __builtin_ipu_cmpne(half2 x, half2 y)

half4 __builtin_ipu_cmpne(half4 x, half4 y)

float __builtin_ipu_cmpne(float x, float y)

float2 __builtin_ipu_cmpne(float2 x, float2 y)
Element-wise inequality test of two arguments.
Targets the f16v2cmpne, f16v4cmpne, f32cmpne and f32v2cmpne instructions.
See the Tile Vertex Instruction Set Architecture for more details about these instructions.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtins __builtin_ipu_f16v2cmpne, __builtin_ipu_f16v4cmpne, __builtin_ipu_f32cmpne and __builtin_ipu_f32v2cmpne are available without this header.
Float classification
Classify float

short2 __builtin_ipu_class(half2 num)

short4 __builtin_ipu_class(half4 num)

int __builtin_ipu_class(float num)

short2 __builtin_ipu_class(float2 num)
Floating-point number classifier.
The result will be one of the float class identifiers.
Targets the f16v2class, f16v4class, f32class and f32v2class instructions.
See the Tile Vertex Instruction Set Architecture for more details about these instructions.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtins __builtin_ipu_f16v2class, __builtin_ipu_f16v4class, __builtin_ipu_f32class and __builtin_ipu_f32v2class are available without this header.
Check whether floating-point value is finite

int __builtin_ipu_isfinite(float val)

short2 __builtin_ipu_isfinite(half2 val)

int2 __builtin_ipu_isfinite(float2 val)

short4 __builtin_ipu_isfinite(half4 val)
Check whether a floating-point value, scalar or vector, is finite and return the boolean result as an integer type of the same shape and size as the input parameter. This builtin expands to a sequence of instructions, with vector floating-point values handled by vector code.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtins __builtin_ipu_isfinite_f32, __builtin_ipu_isfinite_v2f16, __builtin_ipu_isfinite_v2f32 and __builtin_ipu_isfinite_v4f16 are available without this header.
Check whether floating-point value is infinite

int __builtin_ipu_isinf(float val)

short2 __builtin_ipu_isinf(half2 val)

int2 __builtin_ipu_isinf(float2 val)

short4 __builtin_ipu_isinf(half4 val)
Check whether a floating-point value, scalar or vector, is -inf or +inf and return the boolean result as an integer type of the same shape and size as the input parameter. This builtin expands to a sequence of instructions, with vector floating-point values handled by vector code.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtins __builtin_ipu_isinf_f32, __builtin_ipu_isinf_v2f16, __builtin_ipu_isinf_v2f32 and __builtin_ipu_isinf_v4f16 are available without this header.
Check whether floating-point value is NaN

int __builtin_ipu_isnan(float val)

short2 __builtin_ipu_isnan(half2 val)

int2 __builtin_ipu_isnan(float2 val)

short4 __builtin_ipu_isnan(half4 val)
Check whether a floating-point value, scalar or vector, is not a number (NaN) and return the boolean result as an integer type of the same shape and size as the input parameter. This builtin expands to a sequence of instructions, with vector floating-point values handled by vector code.
Note
The function prototypes shown above are the overloaded aliases that can be used by including <ipu_builtins.h>. The pure IPU builtins __builtin_ipu_isnan_f32, __builtin_ipu_isnan_v2f16, __builtin_ipu_isnan_v2f32 and __builtin_ipu_isnan_v4f16 are available without this header.
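The three checks above correspond to simple tests on the IEEE 754 single-precision bit pattern. The sketch below models the scalar float case on a host. All names are hypothetical. It returns 1 for true and 0 for false, which is an assumption: the integer value the builtins use for true is not stated here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret between a float and its 32-bit pattern. */
uint32_t f32_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
float f32_from_bits(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Exponent field all 1s and a non-zero mantissa: NaN. */
int isnan_model(float v) {
    uint32_t u = f32_bits(v);
    return (u & 0x7F800000u) == 0x7F800000u && (u & 0x007FFFFFu) != 0u;
}

/* Exponent field all 1s and a zero mantissa, either sign: infinity. */
int isinf_model(float v) {
    return (f32_bits(v) & 0x7FFFFFFFu) == 0x7F800000u;
}

/* Exponent field not all 1s: neither NaN nor infinity. */
int isfinite_model(float v) {
    return (f32_bits(v) & 0x7F800000u) != 0x7F800000u;
}

/* Usage check on +inf, -inf, a quiet NaN and an ordinary value. */
void classify_model_example(void) {
    assert(isinf_model(f32_from_bits(0x7F800000u)) == 1);
    assert(isinf_model(f32_from_bits(0xFF800000u)) == 1);
    assert(isnan_model(f32_from_bits(0x7FC00000u)) == 1);
    assert(isfinite_model(1.0f) == 1);
}
```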
Random number generation
The IPU hardware includes a pseudorandom number generator (PRNG) and allows for the generation of random values sampled from both the discrete uniform distribution and a quantized 12th-degree Irwin-Hall distribution (an approximation to the Normal distribution). The PRNG algorithm used is described in A Fast Hardware Pseudorandom Number Generator Based on xoroshiro128 (https://ieeexplore.ieee.org/document/9875973).
The period of the IPU PRNG, which is the length of the unique sequence produced, is \(2^{128} - 1\).
For more detail, see the section Pseudorandom number generator in the Tile Vertex Instruction Set Architecture.
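To illustrate why a 12th-degree Irwin-Hall distribution approximates the Normal distribution: the sum of 12 independent uniform values on [0, 1) has mean 6 and variance 1, so subtracting 6 gives an approximately standard normal sample. The host-side sketch below shows the idea in continuous, unquantized form. All names are hypothetical, and the small xorshift32 generator is used only to keep the example deterministic. It is not the IPU's xoroshiro128-based PRNG, and the hardware's quantization is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Small xorshift32 generator, used only for this illustration.
 * The state must be non-zero. */
uint32_t xorshift32(uint32_t *state) {
    uint32_t s = *state;
    s ^= s << 13;
    s ^= s >> 17;
    s ^= s << 5;
    *state = s;
    return s;
}

/* Uniform value in [0, 1), built from one 32-bit draw. */
double uniform01(uint32_t *state) {
    return xorshift32(state) / 4294967296.0;
}

/* Hypothetical sketch of a 12th-degree Irwin-Hall sample, centred on 0.
 * The result always lies strictly between -6 and 6. */
double irwin_hall12(uint32_t *state) {
    assert(*state != 0u);
    double sum = 0.0;
    for (int i = 0; i < 12; ++i) {
        sum += uniform01(state);
    }
    return sum - 6.0;
}
```

Each call consumes 12 uniform draws. The tails are bounded, unlike a true Normal distribution, which is consistent with the finite output ranges documented for the Gaussian builtins.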
Generate half2 vector using Gaussian distribution

half2 __builtin_ipu_f16v2grand()
Generate a two-element, half-precision random vector from a Gaussian distribution, in the range [-5 \(\frac{13}{16}\), 5 \(\frac{13}{16}\)].
Targets the f16v2grand instruction.
See the Tile Vertex Instruction Set Architecture for more details about the f16v2grand instruction.
Generate float2 vector using Gaussian distribution

float2 __builtin_ipu_f32v2grand()
Generate a two-element, single-precision random vector from a Gaussian distribution, in the range [-5 \(\frac{13}{16}\), 5 \(\frac{13}{16}\)].
Targets the f32v2grand instruction.
See the Tile Vertex Instruction Set Architecture for more details about the f32v2grand instruction.
Generate random 32-bit integer

unsigned __builtin_ipu_urand32()
Generate a 32-bit random integer from a uniform distribution, in the range [0, \(2^{32} - 1\)].
Targets the urand32 instruction.
See the Tile Vertex Instruction Set Architecture for more details about the urand32 instruction.
Generate random 64-bit integer

unsigned long long __builtin_ipu_urand64()
Generate a 64-bit random integer from a uniform distribution, in the range [0, \(2^{64} - 1\)].
Targets the urand64 instruction.
See the Tile Vertex Instruction Set Architecture for more details about the urand64 instruction.
Generate random 16-bit float

half __builtin_ipu_urand_f16()
Generate a 16-bit random float (half) from a uniform distribution, in the range [-0.5, 0.5].
Generate random 32-bit float

float __builtin_ipu_urand_f32()
Generate a 32-bit random float from a uniform distribution, in the range [-0.5, 0.5].