7. Supported operations
Operations (or ops) in a graph are connected by input and output tensors. Each operation consumes its input tensors and optionally produces output tensors. The supported operations are listed in Section 7.2, List of available operations.
You can add operations to a graph by calling operation methods within the graph context, as shown in Listing 4.1. In this example, within the context of the main graph, host_load, add and host_store are added to the main graph.
7.1. Data input and output
In PopXL, you can create a data transfer stream from the host to the device with:
h2d_stream(shape: Iterable[int], dtype: dtype, name: Optional[str] = None)
Then, load data through the host-to-device stream with:
host_load(h2d_stream: HostToDeviceStream, name: Optional[str] = None)
where h2d_stream is the stream to load from, and name is the name of the returned tensor.
Similarly, you can create a data transfer stream from the device to the host:
d2h_stream(shape: Iterable[int], dtype: dtype, name: Optional[str] = None)
Then, store data from the device to the host with:
host_store(d2h_stream: DeviceToHostStream, t: Tensor)
where t is the tensor to be copied to the host. Note that you need a separate host_load or host_store call for each tensor transferred to or from the device. However, the transfers will be merged internally into a single transfer for efficiency if the operation schedule allows it.
7.2. List of available operations
The operations currently supported in PopXL are listed in Table 7.1, Table 7.2 and Table 7.3.
Table 7.1. Available operations:
- Compute the absolute value of each element of the input tensor.
- Add two tensors elementwise.
- Add two tensors elementwise in place, in the lhs tensor. Follows NumPy broadcasting rules. Arguments must have the same dtype.
- Compute the argmax of a tensor.
- Compute the argmin of a tensor.
- Average pool a tensor.
- Apply batch normalisation to a tensor in an inference setting.
- Call a graph.
- Call a graph and return information about the call site.
- Cast a tensor to a specific data type.
- Compute the ceil of the elements of the input tensor. NaN values are propagated.
- Clip all elements so they are within the range [min, max]. NaN values are propagated.
- Clip all elements so they are within the range [min, max]. NaN values are propagated.
- Concatenate tensors along an axis. The result will be copied to a new tensor.
- Concatenate tensors along an axis.
- Execute one of two subgraphs depending on the value of a condition tensor.
- Execute one of two subgraphs depending on the value of a condition tensor, and return information about the call site.
- Use the convolution operator on a tensor.
- Perform a scaled convolution on a float8 tensor.
- Perform a convolution transpose operation on a tensor.
- Perform a single transposed and scaled convolution operation on a tensor.
- Compute the cosine of each element of the input tensor.
- Perform the cumulative sum of the input elements along the given dimension.
- Prevent gradient computation of this tensor.
- Prevent gradient computation of this tensor, in place.
- Divide two tensors elementwise.
- Randomly set elements of the input tensor to zero.
- Return a cloned slice of the input tensor.
- Update a slice of a tensor.
- Update a slice of a tensor in place.
- Apply an elementwise equality operation.
- Compute the exponential of the elements of the input tensor.
- Compute the exponential of the elements of the input tensor (inplace).
- Flatten a tensor.
- Flatten a tensor in place.
- Compute the floor of the elements of the input tensor. NaN values are propagated.
- Compute the elementwise remainder after division (modulo operation).
- Select multiple elements from a tensor along specified axes.
- Compute the GELU activation on a tensor.
- Compute the GELU activation on a tensor (inplace).
- Compute the accurate GELU activation on a tensor.
- Compute the accurate GELU activation on a tensor (inplace).
- Compute where the first tensor is greater than the second tensor.
- Apply group normalisation to a tensor.
- Select multiple elements from a tensor along specified axes.
- Compute the histogram of the input tensor.
- Transfer a tensor from the host to the IPU.
- Transfer a tensor from the IPU to the host.
- Input is equal to the output. This can also be used to rename a tensor.
- Increment the elements of a tensor using modulo arithmetic.
- Increment the elements of a tensor using modulo arithmetic, in place.
- Create a tensor that is initialised with zero or undefined values.
- Interpolate the input tensor. Each dimension value of the output tensor is: output_dimension = floor(input_dimension * scale_factor).
- Copy a tensor to or from I/O tiles on the current IPU.
- Copy a tensor to an IPU.
- Return a boolean tensor of the same shape indicating which elements are finite (not NaN or infinity).
- Return a boolean tensor of the same shape indicating which elements are positive or negative infinity.
- Return a boolean tensor of the same shape indicating which elements are NaN.
- Compute the sum of the magnitudes of the elements in a tensor (L1 norm) along specified axes.
- Compute the square root of the sum of the squares of the elements in a tensor (L2 norm) along specified axes.
- Square each element before applying an add reduction.
- Apply layer normalisation to a tensor.
- Compute the log of the elements of the input tensor.
- Compute the base-2 logarithm of the elements of the input tensor.
- Compute the elementwise logical AND of two tensors.
- Compute the elementwise logical NOT of a tensor.
- Compute the elementwise logical OR of two tensors.
- Compute the log of the summed elements of a tensor along specified axes.
- Compute the log of the summed exponentials of elements in a tensor, along specified axes.
- Perform matrix multiplication of two tensors.
- Perform a scaled matrix multiplication between two tensors.
- Compute the maximum value of the elements in a tensor along specified axes.
- Max pool a tensor.
- Compute the elementwise maximum of N tensors.
- Compute the arithmetic mean of elements in a tensor along the specified axes.
- Compute the median of elements in a tensor along specified axes.
- Compute the minimum of the elements of a tensor along specified axes.
- Multiply two tensors elementwise.
- Perform elementwise negation (two's complement) of a tensor.
- Compute the negative log likelihood loss.
- Compute the negative log likelihood loss.
- Produce a one-hot tensor based on inputs.
- Raise the elements of a tensor to a power.
- Add a fused operation.
- Add a fused operation.
- Print a tensor.
- Compute the product of elements along an axis.
- Randomly sample from a normal distribution.
- Randomly sample from a uniform distribution.
- Compute the ReLU activation of a tensor.
- Compute the ReLU activation of a tensor in place.
- Copy the provided graph's code to the destination location from remote memory.
- Load a tensor from Streaming Memory.
- Load from Streaming Memory into a specified tensor.
- Store a tensor in Streaming Memory.
- Input is equal to the output. This can also be used to rename a tensor.
- Repeatedly call a graph.
- Repeatedly call a graph and return information about the call site.
- Reshape a tensor.
- Reshape a tensor (inplace).
- Apply pooling across each region of interest.
- Perform a scaled addition of two tensors.
- Perform a scaled addition of two tensors (inplace).
- Update the values of multiple elements in a tensor.
- Add a shaped dropout operation to the input tensor.
- Return the sign of each element in the tensor (-1, 0 or 1). NaN values have a sign of 0.
- Compute the sine of each element of the input tensor.
- Select elements from a tensor using a slice or multiple slices.
- Select elements from a tensor, in place, using a slice or multiple slices.
- Normalize the elements of a tensor along specified axes.
- Split a tensor along an axis into a list of tensors.
- Produce multiple random seeds from an existing seed.
- Compute the square root of the elements of a tensor.
- Remove axes of length one from the tensor.
- Subtract two tensors elementwise.
- Subsample a tensor by selecting every nth element from each dimension. The subsample count N is provided for each dimension.
- Sum elements over an axis.
- Compute the sum of the squares of tensor elements over an axis.
- Compute the Swish activation of a tensor.
- Compute the Swish activation of a tensor in place.
- Compute the hyperbolic tangent function elementwise on a tensor.
- Select multiple elements from an array.
- Retrieve the top-k largest or smallest elements along a specified axis.
- Permute the axes of a tensor.
- Permute the axes of a tensor in place.
- Elementwise selection based on satisfying a condition.
Table 7.2. Collective operations:
- Allreduce tensors across IPUs within a replica.
- Allreduce tensors across IPUs within a replica, where the grad tensors of the corresponding grad op are identical.
- Allreduce tensors across IPUs within a replica, where the input tensors are identical.
- Take the replica-sharded slice of a tensor.
- Gather a tensor across replicas such that the output tensor contains the values of the tensor from each replica.
- Reduce a tensor across replicas.
- Reduce a tensor across replicas.
- Reduce a tensor across replicas, with each replica receiving a unique slice of the tensor.
- Each replica takes an equal slice of the tensor.
Table 7.3. Variable update operations:
- Update a tensor in place.
- Update a tensor in place.
- Update a tensor in place.
- Update a tensor in place.
- Update a tensor in place.
- Scale a tensor in place.
- Zero the input tensor.
- Calculate an updater term to update the weights for Adam.
- Calculate the updated weight tensor for Adam or LAMB.
- Calculate an updater term to update the weights for Adamax.
- Update a tensor in place by copying the tensor containing the updater values.
- Calculate an updater term to update the weights for LAMB.
- Apply a sparse accumulate operation to a tensor.