6. Supported operations

Operations (or ops) in a graph are connected by input and output tensors. Each operation is applied to its input tensors and optionally produces output tensors. The supported operations are listed in Section 6.2, List of available operations.

You can add operations to a graph by calling operation functions within the graph context, as shown in Listing 3.1. There, within the context of the main graph, host_load, add and host_store ops are added to the main graph; the same pattern is sketched below.
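
The following minimal sketch illustrates this pattern, assuming a freshly created IR; the tensor values and names are illustrative.

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    # Any op created inside this context is added to the main graph.
    a = popxl.variable(np.array([1.0, 2.0], dtype=np.float32), name="a")
    b = popxl.variable(np.array([3.0, 4.0], dtype=np.float32), name="b")
    c = ops.add(a, b)  # the add op is connected to a and b by its input tensors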

6.1. Data input and output

In PopXL, you can create a data transfer stream from the host to the device with:

h2d_stream(shape: Iterable[int], dtype: dtype, name: Optional[str] = None)

Then, load data through the host-to-device stream with:

host_load(h2d_stream: HostToDeviceStream, name: Optional[str] = None)

where h2d_stream is the host-to-device stream to load through, and name is the name of the returned tensor.

Similarly, you can create a data transfer stream from the device to the host with:

d2h_stream(shape: Iterable[int], dtype: dtype, name: Optional[str] = None)

Then, store data from the device to the host with:

host_store(d2h_stream: DeviceToHostStream, t: Tensor)

where t is the tensor to be copied to the host. Note that you need a separate host_load or host_store op for each tensor and for each transfer to or from the device. However, the transfers will be merged internally into one for efficiency if the op schedule allows it.
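
Putting these pieces together, the following is a minimal sketch of a full host-to-device and device-to-host round trip, assuming an IPU Model device; the shapes, values and stream names are illustrative.

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    # One h2d_stream and one host_load per input tensor.
    x_stream = popxl.h2d_stream([2], popxl.float32, name="x_stream")
    x = ops.host_load(x_stream, "x")
    y = ops.mul(x, x)
    # One d2h_stream and one host_store per output tensor.
    y_stream = popxl.d2h_stream(y.shape, y.dtype, name="y_stream")
    ops.host_store(y_stream, y)

with popxl.Session(ir, "ipu_model") as session:
    outputs = session.run({x_stream: np.array([2.0, 3.0], dtype=np.float32)})
print(outputs[y_stream])  # [4. 9.]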

6.2. List of available operations

The operations currently supported in PopXL are listed in Table 6.1, Table 6.2 and Table 6.3.

Table 6.1 Available operations in popxl.ops

abs(): Compute the absolute value of each element of the input tensor.
add(): Add two tensors elementwise.
add_(): Add two tensors elementwise, in place in the lhs tensor. Follows NumPy broadcasting rules. Arguments must have the same dtype.
argmax(): Compute the argmax of a tensor.
argmin(): Compute the argmin of a tensor.
average_pool(): Average pool a tensor.
batch_norm_inference(): Apply batch normalisation to a tensor in an inference setting.
call(): Call a graph.
call_with_info(): Call a graph and return information about the call site.
cast(): Cast a tensor to a specific data type.
cast_then_pow2scale(): Add a fused operation cast(X, dtype) * pow2(log2_scale) to cast from a floating-point 8 type.
concat(): Concatenate tensors along an axis. The result is copied to a new tensor.
concat_(): Concatenate tensors along an axis, in place.
conditional(): Execute then_branch or else_branch according to the value of the tensor cond at runtime.
conditional_with_info(): Execute then_branch or else_branch according to the value of the tensor cond at runtime, and return the call site information.
conv(): Use the convolution operator on a tensor.
conv_pow2scaled(): Perform a scaled convolution on a float8 tensor.
conv_transpose(): Perform a convolution transpose operation on a tensor.
conv_transpose_pow2scaled(): Perform a single transposed and scaled convolution operation on a tensor.
cos(): Compute the cosine of each element of the input tensor.
cumsum(): Perform the cumulative sum of the input elements along the given dimension dim.
detach(): Prevent gradient computation of this tensor.
detach_(): Prevent gradient computation of this tensor, in place.
div(): Divide two tensors elementwise.
dropout(): Randomly set elements of the input tensor to zero.
dynamic_slice(): Return a cloned slice of the input tensor.
dynamic_update(): Update a slice of a tensor.
dynamic_update_(): Update a slice of a tensor in place.
equal(): Apply an elementwise equality operation.
exp(): Compute the exponential of the elements of the input tensor.
exp_(): Compute the exponential of the elements of the input tensor, in place.
flatten(): Flatten a tensor.
flatten_(): Flatten a tensor in place.
fmod(): Compute the elementwise remainder after division (modulo operation).
gather(): Select multiple elements from a tensor along specified axes.
gelu(): Compute the GELU activation on a tensor.
gelu_(): Compute the GELU activation on a tensor, in place.
greater(): Compute where the first tensor is greater than the second tensor.
group_norm(): Apply group normalisation to a tensor.
histogram(): Compute the histogram of the input tensor.
host_load(): Transfer a tensor from the host to the IPU.
host_store(): Transfer a tensor from the IPU to the host.
increment_mod(): Increment the elements of a tensor using modulo arithmetic.
increment_mod_(): Increment the elements of a tensor using modulo arithmetic, in place.
init(): Create a tensor that is initialised with zero or undefined values.
interpolate(): Interpolate the input tensor. Each dimension of the output tensor is: output_dimension = floor(input_dimension * scale_factor).
io_tile_copy(): Copy a tensor to or from I/O tiles on the current IPU.
ipu_copy(): Copy a tensor to an IPU.
l1(): Compute the sum of the magnitudes of the elements in a tensor (L1 norm) along specified axes.
l2(): Compute the square root of the sum of the squares of the elements in a tensor (L2 norm) along specified axes.
lamb_square(): Square each element before applying an add reduction.
layer_norm(): Apply layer normalisation to a tensor.
log(): Compute the log of the elements of the input tensor.
logical_and(): Compute the elementwise logical AND of two tensors.
logical_not(): Compute the elementwise logical NOT of a tensor.
logical_or(): Compute the elementwise logical OR of the input tensors.
logsum(): Compute the log of the summed elements of a tensor along specified axes.
logsumexp(): Compute the log of the summed exponentials of the elements in a tensor along specified axes.
matmul(): Perform matrix multiplication of two tensors.
matmul_pow2scaled(): Perform a scaled matrix multiplication between two tensors.
max(): Compute the maximum value of the elements in a tensor along specified axes.
max_pool(): Max pool a tensor.
maximum(): Compute the elementwise maximum of N tensors.
mean(): Compute the arithmetic mean of the elements in a tensor along specified axes.
median(): Compute the median of the elements in a tensor along specified axes.
min(): Compute the minimum of the elements of a tensor along specified axes.
mul(): Multiply two tensors elementwise.
negate(): Perform elementwise negation (two's complement) of a tensor.
nll_loss(): Compute the negative log likelihood loss.
nll_loss_with_softmax_grad(): Compute the negative log likelihood loss, with the softmax gradient computed as part of the same operation.
onehot(): Produce a one-hot tensor based on the inputs.
pow(): Raise the elements of t to the power of e.
pow2scale_then_cast(): Add a fused operation cast(src * pow2(log2_scale), dtype) to cast to a floating-point 8 data type.
print_tensor(): Print a tensor.
prod(): Compute the product of the elements along an axis.
random_normal(): Randomly sample from a normal distribution.
random_uniform(): Randomly sample from a uniform distribution.
relu(): Compute the ReLU activation of a tensor.
relu_(): Compute the ReLU activation of a tensor in place.
remote_code_load(): Copy the provided graph's code from remote memory to the destination location.
remote_load(): Load a tensor from Streaming Memory.
remote_load_(): Load from Streaming Memory into a specified tensor.
remote_store(): Store a tensor in Streaming Memory.
repeat(): Repeatedly call a graph.
repeat_with_info(): Repeatedly call a graph and return information about the call site.
reshape(): Reshape a tensor.
reshape_(): Reshape a tensor, in place.
roi_align(): Apply pooling across each region of interest.
scaled_add(): Perform a scaled addition of two tensors.
scaled_add_(): Perform a scaled addition of two tensors, in place.
scatter(): Update the values of multiple elements in a tensor.
scatter_reduce(): Scatter elements into a tensor, applying a reduction where multiple elements target the same index.
shaped_dropout(): Add a shaped dropout operation to the input tensor.
sin(): Compute the sine of each element of the input tensor.
slice(): Select elements from a tensor using a slice or multiple slices.
slice_(): Select elements from a tensor, in place, using a slice or multiple slices.
softmax(): Normalise the elements of a tensor along specified axes.
split(): Split a tensor along an axis into a list of tensors.
split_random_seed(): Produce n random seeds from an initial seed.
sqrt(): Compute the square root of the elements of a tensor.
squeeze(): Remove axes of length one from the tensor.
sub(): Subtract two tensors elementwise.
subsample(): Subsample a tensor by selecting every nth element from each dimension. A subsample count is provided per dimension.
sum(): Sum the elements of a tensor over an axis.
sumsquare(): Compute the sum of the squares of the elements of a tensor over an axis.
swish(): Compute the Swish activation of a tensor.
swish_(): Compute the Swish activation of a tensor in place.
tanh(): Compute the hyperbolic tangent elementwise on a tensor.
tied_gather(): Select multiple elements from an array.
topk(): Retrieve the top-K largest or smallest elements along a specified axis.
transpose(): Permute the axes of a tensor.
transpose_(): Permute the axes of a tensor in place.
where(): Elementwise selection based on satisfying a condition.
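
Note that many ops in Table 6.1 come in pairs: the trailing-underscore variant (for example add_()) writes its result into the first input tensor, while the plain variant returns a new tensor. A minimal sketch of the difference, with illustrative values:

import numpy as np
import popxl
import popxl.ops as ops

ir = popxl.Ir()
with ir.main_graph:
    w = popxl.variable(np.ones([2, 2], dtype=np.float32), name="w")
    g = popxl.constant(np.full([2, 2], 0.5, dtype=np.float32), name="g")
    y = ops.add(w, g)  # out of place: y is a new tensor, w is unchanged
    ops.add_(w, g)     # in place: the result is written back into w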

Table 6.2 Available operations in popxl.ops.collectives

all_reduce(): All-reduce tensors across IPUs within a replica.
all_reduce_identical_grad_inputs(): All-reduce tensors across IPUs within a replica, where the gradient tensors of the corresponding gradient op are identical.
all_reduce_identical_inputs(): All-reduce tensors across IPUs within a replica, where the input tensors are identical.
replica_sharded_slice(): Take the replica-sharded slice of a tensor.
replicated_all_gather(): Gather a tensor across replicas so that the output tensor contains the values of the tensor from each replica.
replicated_all_reduce(): Reduce a tensor across replicas.
replicated_all_reduce_(): Reduce a tensor t across replicas, in place on t.
replicated_reduce_scatter(): Reduce a tensor across replicas, with each replica receiving a unique slice of the tensor.
replicated_slice(): Each replica takes an equal slice of t along the axis axis. For example, if t has shape (2, 4), there are two replicas and axis == 0, then the first replica outputs t[0:1, ...] and the second replica outputs t[1:2, ...].
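
These collectives act across the replicas of a replicated graph. The following minimal sketch assumes two replicas and relies on replicated_all_reduce() using its default add reduction; the stream name and shapes are illustrative.

import numpy as np
import popxl
import popxl.ops as ops
import popxl.ops.collectives  # makes ops.collectives available

ir = popxl.Ir()
ir.replication_factor = 2  # run two identical replicas of the graph

with ir.main_graph:
    x_stream = popxl.h2d_stream([2], popxl.float32, name="x_stream")
    x = ops.host_load(x_stream, "x")  # each replica loads its own data
    # Sum x across the two replicas; every replica receives the full result.
    y = ops.collectives.replicated_all_reduce(x)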

Table 6.3 Available operations in popxl.ops.var_updates

accumulate_(): Update tensor t in place, given updater values X and a factor f, according to t = t + (f * X).
accumulate_mean_(): Update tensor t in place, given updater values X and a step count step, according to t = (step/(step+1)) * t + (1/(step+1)) * X.
accumulate_moving_average_(): Update tensor t in place, given updater values X and a factor f, according to t = (f * t) + ((1 - f) * X).
accumulate_moving_average_square_(): Update tensor t in place, given updater values X and a factor f, according to t = (f * t) + ((1 - f) * X^2).
accumulate_square_(): Update tensor t in place, given updater values X and a factor f, according to t = t + (f * X^2).
accumulator_scale_(): Scale a tensor in place.
accumulator_zero_(): Zero the input tensor.
adam_updater(): Calculate an updater term to update the weights for Adam.
adam_var_update(): Calculate the updated weight tensor for Adam or LAMB.
adamax_updater(): Calculate an updater term to update the weights for AdaMax.
copy_var_update_(): Update a tensor in place by copying the tensor containing the updater values.
lamb_updater(): Calculate an updater term to update the weights for LAMB.
sparse_accumulate_(): Apply a sparse accumulate operation to a tensor.
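
Each var_updates op applies one of the update formulas above in place. A minimal sketch using accumulate_(), where the factor f = 0.5 and the tensor values are illustrative:

import numpy as np
import popxl
import popxl.ops as ops
import popxl.ops.var_updates  # makes ops.var_updates available

ir = popxl.Ir()
with ir.main_graph:
    accum = popxl.variable(np.zeros([2], dtype=np.float32), name="accum")
    grad = popxl.constant(np.array([0.1, 0.2], dtype=np.float32), name="grad")
    # In-place update: accum = accum + (f * grad)
    ops.var_updates.accumulate_(accum, grad, f=0.5)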