8. Custom operations

Note

PopXL custom operations are built on top of PopART custom operations (see the PopART User Guide). The main C++ code is shared between PopART and PopXL, but there are subtle differences. You do not require ONNX creators or ONNX-based shape inference for PopXL and, conversely, you do need to add Python bindings and wrappers for PopXL, which are not needed for PopART custom operations. It is good practice to make your custom operations reusable in both frameworks.

PopXL ships with a large number of built-in operations as standard (see Section 7, Supported operations). However, in addition to this, PopXL has a mechanism to add custom operations. You can use this mechanism when you are unable to express the semantics you need via built-in operations. For example, if you need to target a specific Poplar/Poplibs API, or need control of the unwinding behaviour (which is how tensor layouts on device are decided).

This section explains how to add custom operations to your model via an example operation: a Leaky ReLU. This operation is akin to a conventional ReLU except that, instead of clamping negative inputs to zero, it multiplies them by a small non-zero scalar, \(\alpha\), producing small negative outputs. That is, the Leaky ReLU operation applies the following function element-wise:

\[\begin{split}\text{LeakyReLU}(x) = \begin{cases} x &\text{ if } x \geq 0 \\ \alpha x &\text{ if } x < 0 \end{cases}\end{split}\]
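
For example, with \(\alpha = 0.01\), a positive input passes through unchanged, \(\text{LeakyReLU}(3.0) = 3.0\), while a negative input is scaled down: \(\text{LeakyReLU}(-2.0) = 0.01 \times -2.0 = -0.02\).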

Creating and using custom operations requires some environment setup, as well as the implementation of a number of C++ types and Python bindings.

8.1. Environment

To implement a custom operation you first need to configure your environment so you can compile C++ custom operations and create Python bindings easily. To do this, ensure you have the following packages in the Python environment that you use to run your models:

pip install cppimport==21.3.7
pip install pybind11==2.6.2

The cppimport package automatically compiles and includes C++ code in Python, avoiding the use of, for example, Makefiles.

The pybind11 package is what we use to provide Python bindings for C++ code.
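
If you have not used pybind11 before, the following standalone example (a hypothetical module named example, unrelated to Leaky ReLU) shows the basic idea; Section 8.7, Python bindings, applies the same mechanism to the custom operation:

#include <pybind11/pybind11.h>

// A free C++ function we want to call from Python.
int add(int a, int b) { return a + b; }

// Expose the function as `example.add` in a Python module named `example`.
PYBIND11_MODULE(example, m) {
  m.def("add", &add, "Add two integers");
}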

8.2. Parameter struct

The first step is to define a C++ struct that encapsulates the parameters that the custom operation needs. In our case, the Leaky ReLU operation has one parameter, alpha, resulting in a struct defined as follows:

Listing 8.1 Struct that encapsulates Leaky ReLU parameters
/**
 * @brief Struct to encapsulate Leaky ReLU parameters.
 *
 * This structure is encapsulating the parameters/attributes logic, such that it
 * can be shared between FWD and GRAD op implementations.
 */
struct LeakyReluParams {
  float alpha = 1e-2;
  /**
   * @brief Append custom op parameters to op serialiser.
   *
   * @param os The serialised op to add the attributes to.
   */
  void appendAttributes(popart::OpSerialiserBase &os) const {
    os.appendAttribute("alpha", this->alpha);
  }

  /**
   * @brief Build from PopART attributes. Using default values if no info
   * provided.
   *
   * @param attributes The attributes to use to build the parameters with.
   * \return LeakyReluParams The object encapsulating the leaky relu parameters.
   */
  static LeakyReluParams
  makeFromAttributes(const popart::Attributes &attributes) {
    auto params  = LeakyReluParams();
    params.alpha = attributes.getAttribute<popart::Attributes::Float>(
        "alpha", params.alpha);
    return params;
  }
};


Note that this struct must implement two methods: appendAttributes and makeFromAttributes.

The method appendAttributes appends the parameters to an instance of PopART’s popart::OpSerialiserBase class. This is so that two operations with different parameter values can be distinguished from each other in the IR.

The static method, makeFromAttributes, creates an instance of the parameter struct from PopART’s popart::Attributes class.
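
As a rough sketch of how these methods fit together (assuming popart::Attributes can be default-constructed and that the header path below is correct; in practice PopXL calls makeFromAttributes for you when the operation is created):

#include <popart/attributes.hpp> // header path assumed

// With no "alpha" attribute present, makeFromAttributes falls back to the
// default value defined in the struct.
void exampleMakeParams() {
  popart::Attributes attributes;
  auto params = LeakyReluParams::makeFromAttributes(attributes);
  // params.alpha == 0.01f (the struct's default)
}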

8.3. Operation class

The next step is to implement a C++ class that derives from popart::Op. This derived class will be the type used to represent your custom operation in the IR:

Listing 8.2 Intermediate representation of Leaky ReLU
/**
 * @brief Leaky ReLU op.
 */
class LeakyReluOp : public ParameterizedOp<LeakyReluOp, LeakyReluParams> {
public:
  // Inherit constructor from ParameterizedOp<LeakyReluOp, LeakyReluParams>.
  using ParameterizedOp<LeakyReluOp, LeakyReluParams>::ParameterizedOp;

  /**
   * Create an operator identifier for LeakyReluOp.
   */
  static popart::OperatorIdentifier defaultOperatorId() {
    return popart::OperatorIdentifier{"custom.ops", "LeakyRelu", 1, {1, 1}, 1};
  }

  /**
   * @brief Determine the shapes and type of output tensors.
   */
  void setup() override { outInfo(0) = inInfo(0); };
};


Instead of subclassing popart::Op directly, the LeakyReluOp class derives from the class template popart::ParameterizedOp. Using this class template means a number of the popart::Op virtual functions are implemented automatically (for example: popart::Op::clone(), popart::Op::appendAttributes()).

All custom operations should implement a static defaultOperatorId function returning a popart::OperatorIdentifier object that uniquely identifies your operation. This function is assumed to exist when you generate the Python bindings.
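
For reference, the identifier used in Listing 8.2 is repeated below with each field annotated. The field meanings in the comments are our reading of the popart::OperatorIdentifier constructor (domain, name, version, input range, output count) and are worth confirming against the PopART headers for your release:

static popart::OperatorIdentifier defaultOperatorId() {
  return popart::OperatorIdentifier{
      "custom.ops", // domain the operation belongs to
      "LeakyRelu",  // operation name
      1,            // operation version
      {1, 1},       // number of inputs: {minimum, maximum}
      1};           // number of outputs
}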

Additionally, a custom operation should implement the virtual function popart::Op::setup(), which determines the shape and type of the tensors that your custom operation produces. This function requires you to set, for each output that your custom operation produces, the popart::TensorInfo object at outInfo(n), where n is the index of the output. Note that a popart::TensorInfo object holds both type and shape information. Typically, the popart::TensorInfo object for an output is identical to, or partly derived from, the popart::TensorInfo object of one or more inputs. You can get the popart::TensorInfo object for the input at index n via inInfo(n).

In our example, the Leaky ReLU has one input at index 0 and one output, also at index 0. The output shape and size matches exactly that of the input, resulting in the following implementation:

  /**
   * @brief Determine the shapes and type of output tensors.
   */
  void setup() override { outInfo(0) = inInfo(0); };
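
The output does not have to copy the input info verbatim. As a hypothetical illustration (not part of the Leaky ReLU example, and assuming the popart::TensorInfo accessors dataType() and shape()), an operation that removes the first axis of its input could implement setup() as follows:

void setup() override {
  auto info  = inInfo(0);     // type and shape of input 0
  auto shape = info.shape();  // for example {4, 8}
  shape.erase(shape.begin()); // drop the reduced axis, giving {8}
  outInfo(0) = popart::TensorInfo(info.dataType(), shape);
}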

Next, to add support for the autodiff transform, it is necessary to implement the popart::Op::getGradOps() function. In our IR, gradients are generated by operations themselves, and these operations are distinct from the operations in the forward pass. Hence, if you need a custom operation to support autodiff, you will probably also want to define one or more custom gradient operations. A typical pattern is that the gradients for a custom operation are defined by a single custom gradient operation. We stick to this pattern here, but there are other ways of achieving the same result. We define the custom gradient operation for Leaky ReLU in Section 8.5, Gradient operation class.
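
A minimal sketch of getGradOps() for LeakyReluOp is shown below. It assumes that the ParameterizedOp constructor takes an operator identifier, a parameter struct and the op settings; check the ParameterizedOp header of your PopART release for the exact signature:

// Sketch only: return the gradient ops that autodiff should use for this op.
// The gradient op shares the parameter struct with the forward op.
std::vector<std::unique_ptr<popart::Op>> getGradOps() override {
  std::vector<std::unique_ptr<popart::Op>> grad_ops;
  grad_ops.emplace_back(std::make_unique<LeakyReluGradOp>(
      LeakyReluGradOp::defaultOperatorId(), params(), settings));
  return grad_ops;
}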

Note

In addition to the functions described above, the popart::Op base class has a number of additional virtual methods that may be helpful to advanced users. For Leaky ReLU, we have implemented the bare minimum and rely on the default implementations of these virtual functions. For advanced use cases, please read the Op documentation. Most of these methods enable other transforms to work with the op, just as getGradOps is overridden to enable autodiff.

8.4. Opx class

In addition to the LeakyReluOp class, which is the class used to represent Leaky ReLU operations in the IR, we also need a LeakyReluOpx class that implements the semantics of the operation. You can do this by implementing a C++ class that derives from popart::popx::Opx:

Listing 8.3 Opx implementation of Leaky ReLU
namespace pe = popops::expr;

/**
 * Leaky ReLU implementation.
 */
class LeakyReluOpx : public popart::popx::Opx {
public:
  LeakyReluOpx(popart::Op *op, popart::popx::Devicex *devicex)
      : popart::popx::Opx(op, devicex) {
    verifyOp<LeakyReluOp>(op, {CustomOperators::LeakyReluId});
  }

  /**
   * @brief Add the Poplar code for this operation to a Poplar sequence.
   * \param prog The Poplar sequence to add the operation to.
   */
  void grow(poplar::program::Sequence &prog) const override {
    auto op              = getOp<LeakyReluOp>();
    poplar::Tensor input = getInTensor(0);
    const float alpha    = op.params().alpha;
    // x < 0.0f ? alpha * x : x
    // pe::_1 here is a placeholder for the argument of the expression.
    auto expression = pe::Select(pe::Mul(pe::Const(alpha), pe::_1),
                                 pe::_1,
                                 pe::Lt(pe::_1, pe::Const(0.0f)));
    auto output     = popops::map(graph(),
                              expression,
                              {input},
                              prog,
                              debugContext("LeakyRelu"),
                              poplar::OptionFlags());
    setOutTensor(0, output);
  }
};

// Registers that LeakyReluOpx can implement ops with OperatorIdentifier
// CustomOperators::LeakyReluId
namespace {
popart::popx::OpxCreator<LeakyReluOpx>
    LeakyReluOpxCreator({CustomOperators::LeakyReluId});
}


The popart::popx::Opx::grow() function is the main popart::popx::Opx function you need to implement. In this function, you are expected to add the code required to produce your operation’s outputs to a Poplar Sequence object. Then, once you have described how to compute Poplar Tensor objects for said outputs, you must use popart::popx::Opx::setOutTensor() to associate, for each output, the output index with a specific Poplar Tensor object.

Our Leaky ReLU example produces only one output tensor, and that tensor is output at index 0, so it only calls popart::popx::Opx::setOutTensor() once.

For further details on how to write Poplar programs, see the Poplar and PopLibs User Guide.

Note

Similar to the popart::Op base class, the popart::popx::Opx base class has a number of additional virtual methods that may be helpful to advanced users; in particular, the methods that control the unwinding algorithm, which determines tensor layouts on device.
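
As a rough illustration, an element-wise operation such as Leaky ReLU can opt into unwinding so that its input inherits the tile layout chosen for its output. The following is a sketch only, based on the unwinding overrides provided by popart::popx::Opx (getInputCreatorType, unwindTensorLayout, unwindRegion); the exact names and signatures should be checked against the Opx header of your PopART release:

// Sketch: declare that the input layout can be derived ("unwound") from the
// layout of the output.
popart::popx::InputCreatorType getInputCreatorType(popart::InIndex) const override {
  return popart::popx::InputCreatorType::CanUnwind;
}

// For an element-wise op the layout passes through unchanged.
poplar::Tensor unwindTensorLayout(poplar::Tensor tensor,
                                  popart::InIndex,
                                  popart::OutIndex) const override {
  return tensor;
}

// Map output regions back onto input regions one-to-one.
popart::view::RegMap unwindRegion(popart::InIndex, popart::OutIndex) const override {
  return [](const popart::view::Region &r) { return popart::view::Regions(1, r); };
}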

8.5. Gradient operation class

To fully support the autodiff transform we next define a gradient operation for Leaky ReLU, LeakyReluGradOp, which is similar to the forward operation defined in Section 8.3, Operation class:

Listing 8.4 Intermediate representation of Leaky ReLU’s gradient operation
/**
 * @brief Leaky ReLU gradient op.
 */
class LeakyReluGradOp
    : public ParameterizedOp<LeakyReluGradOp, LeakyReluParams> {
public:
  // Inherit constructor from ParameterizedOp<LeakyReluGradOp, LeakyReluParams>.
  using ParameterizedOp<LeakyReluGradOp, LeakyReluParams>::ParameterizedOp;

  /**
   * Create an operator identifier for LeakyReluGradOp.
   */
  static popart::OperatorIdentifier defaultOperatorId() {
    return popart::OperatorIdentifier{
        "custom.ops", "LeakyReluGrad", 1, {1, 1}, 1};
  }

  /**
   * @brief Determine the shapes and type of output tensors.
   */
  void setup() override { outInfo(0) = inInfo(0); };

  // GradOp-gradInputInfo begin
  const std::vector<popart::GradInOutMapper> &gradInputInfo() const override {
    static const std::vector<popart::GradInOutMapper> inInfo = {
        {0, 0, popart::GradOpInType::GradOut},
        {1, 0, popart::GradOpInType::In}};
    return inInfo;
  }
  // GradOp-gradInputInfo end

  /**
   * @brief Return the mapping associating the output indices of LeakyReluGradOp
   * with the input indices of LeakyReluOp. \return const std::map<int, int>&
   */
  // GradOp-gradOutToNonGradIn begin
  const std::map<int, int> &gradOutToNonGradIn() const override {
    // The Grad Op has 1 output, which is the gradient of the only input
    static const std::map<int, int> outInfo = {{0, 0}};
    return outInfo;
  }
  // GradOp-gradOutToNonGradIn end
};


We emphasise that gradient operations are operations themselves and hence require the definition of a static defaultOperatorId function and popart::Op::setup(), as explained in Section 8.3, Operation class. Also, note that LeakyReluOp and LeakyReluGradOp share the parameter struct definition, LeakyReluParams.

The job of gradient operations (meaning the set of operations obtained by calling popart::Op::getGradOps() on a forward operation) is to perform one step of the chain rule.

For LeakyReluOp, recall that popart::Op::getGradOps() returns a single gradient operation, LeakyReluGradOp, so we have one operation that has to produce the partial derivative for the single input of the forward operation.

Performing one step of the chain rule for LeakyReluGradOp means computing \(\frac{\partial F}{\partial x}\), the partial derivative of some function \(F\) with respect to the input tensor \(x\), given as input \(\frac{\partial F}{\partial \text{LeakyReLU}}\), the partial derivative of \(F\) with respect to the output of \(\text{LeakyReLU}\), as well as any forward tensors that it needs, using the chain rule:

\[\frac{\partial F}{\partial x} = \frac{\partial F}{\partial \text{LeakyReLU}}\cdot\frac{\partial \text{LeakyReLU}}{\partial x}\]

The right-hand side of this equation (the partial derivative of \(\text{LeakyReLU}\) with respect to its input, \(x\)) is mathematically defined as follows:

\[\begin{split}\frac{\partial \text{LeakyReLU}}{\partial x}(x) = \begin{cases} 1 &\text{ if } x \geq 0 \\ \alpha &\text{ if } x < 0 \end{cases}\end{split}\]

Note that this partial derivative needs the forward tensor, \(x\), as input because it is used in the condition.

Our operation, LeakyReluGradOp, needs to calculate the right-hand side of this chain rule equation and multiply it by the left-hand side, \(\frac{\partial F}{\partial \text{LeakyReLU}}\), to obtain \(\frac{\partial F}{\partial x}\). This left-hand side is given as an input to LeakyReluGradOp. Putting this all together, and using \(y\) to denote the output of LeakyReluOp and \(y'\) to denote the partial derivative \(\frac{\partial F}{\partial \text{LeakyReLU}}\), we can express the calculation that LeakyReluGradOp has to do as follows:

\[\begin{split}\text{LeakyReLUGrad}(y', x) = \begin{cases} y' &\text{ if } x \geq 0 \\ \alpha y' &\text{ if } x < 0 \end{cases}\end{split}\]
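
For example, with \(\alpha = 0.01\), a forward input of \(x = -2.0\) and an incoming gradient of \(y' = 3.0\), the operation produces \(\text{LeakyReLUGrad}(3.0, -2.0) = 0.01 \times 3.0 = 0.03\); for \(x = 2.0\) the incoming gradient \(3.0\) would simply pass through.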

This definition is what we will use when defining the semantics of LeakyReluGradOp in Section 8.6, Gradient opx class. For now, all we need to understand is that LeakyReluGradOp consumes two tensor inputs and produces one tensor output. In terms of data type and tensor shape these inputs and outputs are all identical to the forward tensor \(x\).

This information is what we need when implementing popart::Op::gradInputInfo() and popart::Op::gradOutToNonGradIn(), which is an additional requirement we place on gradient operations.

The popart::Op::gradInputInfo() function tells the autodiff transform what input tensors an operation requires. Gradient operations can request to connect to any inputs or outputs of the forward operation, or can request to connect to a gradient of an output of the forward operation.

In this instance, the gradient operation LeakyReluGradOp asks for its input at index 0 to be connected to the partial derivative of LeakyReluOp’s output at index 0 (that is, \(\frac{\partial F}{\partial \text{LeakyReLU}}\)), and for its input at index 1 to be connected to the input of LeakyReluOp at index 0 (that is, \(x\)):

const std::vector<popart::GradInOutMapper> &gradInputInfo() const override {
  static const std::vector<popart::GradInOutMapper> inInfo = {
      {0, 0, popart::GradOpInType::GradOut},
      {1, 0, popart::GradOpInType::In}};
  return inInfo;
}

The popart::Op::gradOutToNonGradIn() function is what the autodiff transform uses to determine what outputs a gradient operation produces. Gradient operations produce gradients for some of the inputs of the forward operation. To express this, the function must return a mapping from gradient operation output indices to forward operation input indices.

The LeakyReluGradOp operation produces only one output, at index 0: the gradient \(\frac{\partial F}{\partial x}\) of LeakyReluOp’s input \(x\), which is also at index 0. The appropriate mapping is therefore as follows:

const std::map<int, int> &gradOutToNonGradIn() const override {
  // The Grad Op has 1 output, which is the gradient of the only input
  static const std::map<int, int> outInfo = {{0, 0}};
  return outInfo;
}

8.6. Gradient opx class

Similar to the forward operation, we need a LeakyReluGradOpx class that implements the semantics of the gradient operation of Leaky ReLU. Again, we do this by implementing a C++ class that derives from popart::popx::Opx. Gradient operations are implemented just like forward operations; there are no additional functions to implement.

Listing 8.5 Opx implementation of Leaky ReLU’s gradient operation
/**
 * Leaky ReLU gradient operation implementation.
 */
class LeakyReluGradOpx : public popart::popx::Opx {
public:
  LeakyReluGradOpx(popart::Op *op, popart::popx::Devicex *devicex)
      : popart::popx::Opx(op, devicex) {
    verifyOp<LeakyReluGradOp>(op, {CustomGradOperators::LeakyReluGradId});
  }

  /**
   * @brief Add the Poplar code for this operation to a Poplar sequence.
   * \param prog The Poplar sequence to add the operation to.
   */
  void grow(poplar::program::Sequence &prog) const override {
    auto op              = getOp<LeakyReluGradOp>();
    poplar::Tensor grad  = getInTensor(0);
    poplar::Tensor input = getInTensor(1);

    const float alpha = op.params().alpha;
    // (grad * (x < 0.0f ? alpha : 1))
    // pe::_1 and pe::_2 are placeholders for the arguments of the expression.
    pe::Mul expression = pe::Mul(pe::Select(pe::Const(alpha),
                                            pe::Const(1.0f),
                                            pe::Lt(pe::_2, pe::Const(0.0f))),
                                 pe::_1);
    auto output        = popops::map(graph(),
                              expression,
                              {grad, input},
                              prog,
                              debugContext("LeakyReluGrad"),
                              poplar::OptionFlags());
    setOutTensor(0, output);
  }
};

// Registers that LeakyReluGradOpx can implement ops with OperatorIdentifier
// CustomGradOperators::LeakyReluGradId
namespace {
popart::popx::OpxCreator<LeakyReluGradOpx>
    LeakyReluGradOpxCreator({CustomGradOperators::LeakyReluGradId});
}


8.7. Python bindings

Warning

The Python binding function used in this section is experimental and may change in future API updates.

It is necessary to define Python bindings for your custom operation’s C++ code so that the custom operation can be used in Python. We use the pybind11 library to create these bindings, via an experimental template function, makeParameterizedOpBindings, as follows:

Listing 8.6 Creating a Python binding of LeakyReluOp using Pybind11
/**
 * @brief Pybind11 custom op module declaration.
 * NOTE: make sure the name of the module corresponds to the source filename!
 */
// cppcheck-suppress syntaxError
PYBIND11_MODULE(leaky_relu_op_impl, m) {
  // Bind the parameters of the op: constructor + fields.
  py::class_<LeakyReluParams>(m, "LeakyReluParams")
      .def(py::init<float>(), py::arg("alpha") = 0.01)
      .def_readwrite("alpha", &LeakyReluParams::alpha);

  // Helper function to make the custom op bindings automatically (once the
  // params are bound).
  popart::ir::op::makeParameterizedOpBindings<LeakyReluOp>(m, "LeakyRelu");
}


The above binding gives us a Python module named leaky_relu_op_impl, but this module is not very user-friendly on its own. In the next section we therefore wrap it in an easy-to-use Python function.

8.8. Python wrapper

Warning

This Python wrapper solution uses some internal PopXL definitions that likely will change in future API updates.

The last remaining step is to define a user-facing Python function which uses the Python bindings to provide a nice clean Pythonic interface for adding a Leaky ReLU to the IR, similar to the other ops in PopXL:

Listing 8.7 PopXL Python wrapper for Leaky ReLU
from typing import Optional

import cppimport.import_hook  # pylint: disable=unused-import
from popxl.context import get_current_context, op_debug_context
from popxl.ops.utils import check_in_graph
from popxl.tensor import Tensor

# The custom op and its pybinding will be automatically compiled by cppimport
# into a module of this name.
import leaky_relu_op_impl


@op_debug_context
def leaky_relu(
    # pylint: disable=unused-argument
    x: Tensor,
    alpha: Optional[float] = 0.01,
    **kwargs
) -> Tensor:
    """Compute the leaky relu operator element-wise on the input tensor.

    See https://pytorch.org/docs/stable/generated/torch.nn.LeakyReLU.html

    Args:
        x (Tensor): The input tensor.
        alpha (float, optional): The value to use to determine
            the slope of the negative axis. Defaults to 0.01.
        **kwargs: Any additional arguments to be passed. Left here as an
            example for other operators.

    Returns:
        Tensor: The output tensor.
    """
    ctx = get_current_context()
    graph = ctx.graph
    pb_graph = graph._pb_graph

    settings = ctx._get_op_settings("leaky_relu")
    params = leaky_relu_op_impl.LeakyReluParams(alpha=alpha)
    # Inputs & outputs mapping (checking inputs are in graph!).
    inputs = {0: x.id}
    outputs = {0: graph._create_tensor_id("leaky_relu_out")}
    check_in_graph(graph, **{x.id: x})

    # Building the op using default operator id
    op = leaky_relu_op_impl.LeakyRelu.create_op_in_graph(
        graph=pb_graph,
        inputs=inputs,
        outputs=outputs,
        params=params,
        settings=settings,
    )
    # Apply all hooks registered on the context to the new op.
    # NOTE: crucial to support PopXL graph transforms.
    ctx._op_created(op)
    return Tensor._from_pb_tensor(op.outTensor(0))


Note that in Listing 8.7, the module name leaky_relu_op_impl is based on the name of the Pybind11 module in Section 8.7, Python bindings.

8.9. Auto-compiling custom operations with cppimport

Although it is possible to manually compile custom operations, we recommend using cppimport to automatically compile the C++ code of custom operations. This makes for a much better user experience: cppimport detects when your C++ source has changed and recompiles only when needed, so it is no longer necessary to manually compile your custom operation every time you make a change.

In the remainder of this section, we assume all of your custom operation’s C++ code (not including the Python wrapper in Section 8.8, Python wrapper) lives in a single .cpp file called leaky_relu_op_impl.cpp.

To use cppimport, you first need to add the following comment to the first line of leaky_relu_op_impl.cpp:

// cppimport

This comment is the opt-in mechanism that tells cppimport the file is meant to be compiled by it, and it must appear on the first line.

Then, at the end of the file, include the following multi-line comment:

// cppimport configuration for compiling the pybind11 module.
/*
<%
cfg['extra_compile_args'] = ['-std=c++14', '-fPIC', '-shared', '-O3', '-DONNX_NAMESPACE=onnx']
cfg['libraries'] = ['popart']
setup_pybind11(cfg)
%>
*/

This contains all the information cppimport needs to compile your operation.

8.10. Using your custom operation

Finally, to use your custom operation, as highlighted in Listing 8.8, just import the leaky_relu Python wrapper function you defined in Section 8.8, Python wrapper. Then, you can use this function much like any built-in PopXL operation:

Listing 8.8 Using a custom operation in PopXL
from typing import Union

import numpy as np

import popxl
import popxl.ops as ops

from leaky_relu_op import leaky_relu


def build_and_run_graph(
    input_data: Union[float, np.ndarray], alpha: float
) -> np.ndarray:
    """Build a PopXL graph containing the leaky relu custom op and run it.

    Args:
        input_data (Union[float, np.ndarray]): The input data to use,
            either a 1D float or a NumPy array of floats.
        alpha (float): The alpha value to use in the leaky relu op.

    Returns:
        np.ndarray: The output data array to be used for checking.
    """
    # Creating a model with popxl
    ir = popxl.Ir()
    main = ir.main_graph
    input_array = np.array(input_data)
    with main:
        # host load
        input0 = popxl.h2d_stream(input_array.shape, popxl.float32, name="in_stream")
        a = ops.host_load(input0, "a")

        # custom leaky relu.
        o = leaky_relu(a, alpha=alpha)

        # host store
        o_d2h = popxl.d2h_stream(o.shape, o.dtype, name="out_stream")
        ops.host_store(o_d2h, o)

    with popxl.Session(ir, "ipu_model") as session:
        outputs = session.run({input0: input_array})

    print("ALPHA param:", alpha)
    print("INPUT data:", input_data)
    print("OUTPUT result:", outputs[o_d2h])

    return outputs[o_d2h]
