8. Custom operations
Note
PopXL custom operations are built on top of PopART custom operations (see the PopART User Guide). The main C++ code is shared between PopART and PopXL, but there are subtle differences. You do not require ONNX creators or ONNX-based shape inference for PopXL and, conversely, you do need to add Python bindings and wrappers for PopXL, which are not needed for PopART custom operations. It is good practice to make your custom operations reusable in both frameworks.
PopXL ships with a large number of built-in operations as standard (see Section 7, Supported operations). However, in addition to this, PopXL has a mechanism to add custom operations. You can use this mechanism when you are unable to express the semantics you need via built-in operations. For example, if you need to target a specific Poplar/Poplibs API, or need control of the unwinding behaviour (which is how tensor layouts on device are decided).
This section explains how to add custom operations to your model via an example operation: a Leaky ReLU. This operation is akin to a conventional ReLU except that, instead of clamping negative inputs to zero, it multiplies them by a small non-zero scalar, \(\alpha\), producing small negative outputs. That is, the Leaky ReLU operation applies the following arithmetic element-wise:

\[
\text{LeakyReLU}(x) =
\begin{cases}
x & \text{if } x \geq 0 \\
\alpha x & \text{if } x < 0
\end{cases}
\]
Creating and using custom operations requires some environment setup as well as the implementation of a number of C++ types and Python bindings.
8.1. Environment
To implement a custom operation you first need to configure your environment so you can compile C++ custom operations and create Python bindings easily. To do this, ensure you have the following packages in the Python environment that you use to run your models:
pip install cppimport==21.3.7
pip install pybind11==2.6.2
The cppimport package automatically compiles and includes C++ code in Python, avoiding the use of, for example, Makefiles. The pybind11 package is what we use to provide Python bindings for C++ code.
8.2. Parameter struct
The first step is to define a C++ struct that encapsulates the parameters that the custom operation needs. In our case, the Leaky ReLU operation has one parameter, alpha, resulting in a struct defined as follows:
/**
 * @brief Struct to encapsulate the Leaky ReLU parameters.
 *
 * This structure encapsulates the parameter/attribute logic so that it
 * can be shared between the forward and gradient op implementations.
 */
struct LeakyReluParams {
  float alpha = 1e-2;

  /**
   * @brief Append the custom op parameters to the op serialiser.
   *
   * @param os The op serialiser to add the attributes to.
   */
  void appendAttributes(popart::OpSerialiserBase &os) const {
    os.appendAttribute("alpha", this->alpha);
  }

  /**
   * @brief Build from PopART attributes, using default values if no info
   * is provided.
   *
   * @param attributes The attributes to build the parameters from.
   * @return The object encapsulating the Leaky ReLU parameters.
   */
  static LeakyReluParams
  makeFromAttributes(const popart::Attributes &attributes) {
    auto params = LeakyReluParams();
    params.alpha = attributes.getAttribute<popart::Attributes::Float>(
        "alpha", params.alpha);
    return params;
  }
};
Note that this struct must implement two methods: appendAttributes and makeFromAttributes.

The appendAttributes method appends the parameters to an instance of PopART’s popart::OpSerialiserBase class. This is so that two operations with different parameter values can be distinguished from each other in the IR.

The static method makeFromAttributes creates an instance of the parameter struct from PopART’s popart::Attributes class.
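If your operation needs more than one parameter, you extend the struct in the same way: add a field, append it in appendAttributes and read it in makeFromAttributes. The following is a minimal sketch with a hypothetical second parameter, beta, purely for illustration:

// Sketch only: the parameter struct extended with a hypothetical `beta`
// parameter, to illustrate the pattern.
struct LeakyReluParamsWithBeta {
  float alpha = 1e-2;
  float beta = 0.0f; // hypothetical extra parameter

  void appendAttributes(popart::OpSerialiserBase &os) const {
    os.appendAttribute("alpha", this->alpha);
    os.appendAttribute("beta", this->beta);
  }

  static LeakyReluParamsWithBeta
  makeFromAttributes(const popart::Attributes &attributes) {
    auto params = LeakyReluParamsWithBeta();
    params.alpha = attributes.getAttribute<popart::Attributes::Float>(
        "alpha", params.alpha);
    params.beta = attributes.getAttribute<popart::Attributes::Float>(
        "beta", params.beta);
    return params;
  }
};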
8.3. Operation class
The next step is to implement a C++ class that derives from popart::Op. This derived class will be the type used to represent your custom operation in the IR:
/**
 * @brief Leaky ReLU op.
 */
class LeakyReluOp : public ParameterizedOp<LeakyReluOp, LeakyReluParams> {
public:
  // Inherit constructor from ParameterizedOp<LeakyReluOp, LeakyReluParams>.
  using ParameterizedOp<LeakyReluOp, LeakyReluParams>::ParameterizedOp;

  /**
   * Create an operator identifier for LeakyReluOp.
   */
  static popart::OperatorIdentifier defaultOperatorId() {
    return popart::OperatorIdentifier{"custom.ops", "LeakyRelu", 1, {1, 1}, 1};
  }

  /**
   * @brief Determine the shapes and type of output tensors.
   */
  void setup() override { outInfo(0) = inInfo(0); };

  // To support autodiff, the op also overrides popart::Op::getGradOps().
  // This is discussed below; the gradient op itself is defined in
  // Section 8.5.
  std::vector<std::unique_ptr<popart::Op>> getGradOps() override;
};
Instead of subclassing popart::Op directly, the LeakyReluOp class derives from the class template popart::ParameterizedOp. Using this class template means a number of the popart::Op virtual functions are implemented automatically (for example, popart::Op::clone() and popart::Op::appendAttributes()).
All custom operations should implement a static defaultOperatorId function returning a popart::OperatorIdentifier object that uniquely identifies your operation. This function is assumed to exist when you generate the Python bindings.
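The arguments of the popart::OperatorIdentifier used in this example are, in order: a domain, an operation name, an operation version, the number of inputs and the number of outputs. Below is an annotated version of the forward op’s identifier; the field comments are our reading of the PopART headers, so double-check them against your SDK version:

static popart::OperatorIdentifier defaultOperatorId() {
  return popart::OperatorIdentifier{
      "custom.ops", // domain the op lives in
      "LeakyRelu",  // op name
      1,            // op version
      {1, 1},       // number of inputs (minimum, maximum)
      1};           // number of outputs
}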
Additionally, a custom operation should implement the virtual function popart::Op::setup(). This function is used to determine the shape and type of the tensors that your custom operation produces. It requires you to set, for each output that your custom operation produces, the popart::TensorInfo object at outInfo(n), where n is the index of the output. Note that a popart::TensorInfo object holds both type and shape information. Typically, the popart::TensorInfo object for outputs is identical to, or in part derived from, the popart::TensorInfo object for inputs. You can get the popart::TensorInfo object for an input at index n via inInfo(n).
In our example, the Leaky ReLU has one input at index 0 and one output, also at index 0. The output shape and type match exactly those of the input, resulting in the following implementation:

/**
 * @brief Determine the shapes and type of output tensors.
 */
void setup() override { outInfo(0) = inInfo(0); };
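If an output’s shape or type differs from that of the inputs, you construct the popart::TensorInfo yourself rather than copying the input’s. As a rough sketch (a hypothetical op that reduces each row of a 2D input to a single value, assuming the TensorInfo constructor taking a data type and a shape):

// Hypothetical setup() for an op reducing a 2D input along its last
// dimension: same data type as the input, but shape {rows}.
void setup() override {
  const auto &in = inInfo(0);
  outInfo(0) = popart::TensorInfo(in.dataType(), {in.dim(0)});
}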
Next, to add support for the autodiff transform, it is necessary to implement a popart::Op::getGradOps() function. In our IR, gradients are generated by operations themselves, and these operations are distinct from the operations in the forward pass. Hence, if you need a custom operation to support autodiff, you will probably want to also define one or more custom gradient operations. A typical pattern is that the gradients for a custom operation can be defined by a single custom gradient operation. We will stick to this pattern here, but there are other ways of achieving the same result. We will define the custom gradient operation for Leaky ReLU in Section 8.5, Gradient operation class; a sketch of what getGradOps() might look like for LeakyReluOp is shown below.
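In this sketch, LeakyReluOp::getGradOps() creates a single LeakyReluGradOp and passes the parameters on to it. The exact constructor arguments depend on your op class; the sketch assumes the ParameterizedOp constructor that takes an operator identifier, a parameter struct and the op’s settings, so adjust it to match your own class if it differs:

// Sketch of LeakyReluOp::getGradOps(). Assumes the ParameterizedOp
// constructor (OperatorIdentifier, params, settings).
std::vector<std::unique_ptr<popart::Op>> LeakyReluOp::getGradOps() {
  std::vector<std::unique_ptr<popart::Op>> grad_ops;
  grad_ops.emplace_back(std::make_unique<LeakyReluGradOp>(
      LeakyReluGradOp::defaultOperatorId(), params(), settings));
  return grad_ops;
}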
Note
In addition to the functions described above, the popart::Op base class has a number of additional virtual methods that may be helpful to advanced users. For Leaky ReLU, we have implemented the bare minimum and rely on the default implementations of these virtual functions. For advanced use cases, please read the Op documentation. Most of these methods exist to enable other transforms to work with the op, in the same way that getGradOps is overridden to enable autodiff.
8.4. Opx class
In addition to the LeakyReluOp class, which is the class used to represent Leaky ReLU operations in the IR, we also need a LeakyReluOpx class that implements the semantics of the operation. You can do this by implementing a C++ class that derives from popart::popx::Opx:
namespace pe = popops::expr;

/**
 * Leaky ReLU implementation.
 */
class LeakyReluOpx : public popart::popx::Opx {
public:
  LeakyReluOpx(popart::Op *op, popart::popx::Devicex *devicex)
      : popart::popx::Opx(op, devicex) {
    verifyOp<LeakyReluOp>(op, {CustomOperators::LeakyReluId});
  }

  /**
   * @brief Add the Poplar code for this operation to a Poplar sequence.
   * @param prog The Poplar sequence to add the operation to.
   */
  void grow(poplar::program::Sequence &prog) const override {
    auto op = getOp<LeakyReluOp>();
    poplar::Tensor input = getInTensor(0);
    const float alpha = op.params().alpha;

    // x < 0.0f ? alpha * x : x
    // pe::_1 here is a placeholder for the argument of the expression.
    auto expression = pe::Select(pe::Mul(pe::Const(alpha), pe::_1),
                                 pe::_1,
                                 pe::Lt(pe::_1, pe::Const(0.0f)));

    auto output = popops::map(graph(),
                              expression,
                              {input},
                              prog,
                              debugContext("LeakyRelu"),
                              poplar::OptionFlags());
    setOutTensor(0, output);
  }
};

// Registers that LeakyReluOpx can implement ops with OperatorIdentifier
// CustomOperators::LeakyReluId.
namespace {
popart::popx::OpxCreator<LeakyReluOpx>
    LeakyReluOpxCreator({CustomOperators::LeakyReluId});
} // namespace
The popart::popx::Opx::grow() function is the main popart::popx::Opx function you need to implement. In this function, you are expected to add the code required to produce your operation’s outputs to a Poplar Sequence object. Then, once you have described how to compute the Poplar Tensor objects for those outputs, you must use popart::popx::Opx::setOutTensor() to associate, for each output, the output index with a specific Poplar Tensor object.

Our Leaky ReLU example produces only one output tensor, output at index 0, so it calls popart::popx::Opx::setOutTensor() only once. A minimal pass-through example is sketched below.
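To make the contract concrete, here is a minimal, hypothetical grow() for a pass-through (identity) operation. It assumes the popart::popx::Opx::cloneNcopy() helper, which clones a tensor and adds the corresponding copy program to the sequence:

// Hypothetical pass-through grow(): read input 0, copy it, and register
// the copy as output 0.
void grow(poplar::program::Sequence &prog) const override {
  poplar::Tensor input = getInTensor(0);
  // cloneNcopy clones `input` and adds a copy program to `prog`.
  poplar::Tensor output = cloneNcopy(prog, input);
  setOutTensor(0, output);
}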
For further details on how to write Poplar programs, see the Poplar and PopLibs User Guide.
Note
Similar to the popart::Op base class, the popart::popx::Opx base class has a number of additional virtual methods that may be helpful to advanced users. In particular, it has methods that control the unwinding algorithm, which determines tensor layouts on device.
8.5. Gradient operation class
To fully support the autodiff transform, we next define a gradient operation for Leaky ReLU, LeakyReluGradOp, which is similar to the forward operation defined in Section 8.3, Operation class:
/**
 * @brief Leaky ReLU gradient op.
 */
class LeakyReluGradOp
    : public ParameterizedOp<LeakyReluGradOp, LeakyReluParams> {
public:
  // Inherit constructor from ParameterizedOp<LeakyReluGradOp, LeakyReluParams>.
  using ParameterizedOp<LeakyReluGradOp, LeakyReluParams>::ParameterizedOp;

  /**
   * Create an operator identifier for LeakyReluGradOp.
   */
  static popart::OperatorIdentifier defaultOperatorId() {
    return popart::OperatorIdentifier{
        "custom.ops", "LeakyReluGrad", 1, {1, 1}, 1};
  }

  /**
   * @brief Determine the shapes and type of output tensors.
   */
  void setup() override { outInfo(0) = inInfo(0); };

  /**
   * @brief Map the inputs of LeakyReluGradOp to (gradients of) inputs and
   * outputs of LeakyReluOp.
   */
  const std::vector<popart::GradInOutMapper> &gradInputInfo() const override {
    static const std::vector<popart::GradInOutMapper> inInfo = {
        {0, 0, popart::GradOpInType::GradOut},
        {1, 0, popart::GradOpInType::In}};
    return inInfo;
  }

  /**
   * @brief Map the output indices of LeakyReluGradOp to the input indices
   * of LeakyReluOp.
   */
  const std::map<int, int> &gradOutToNonGradIn() const override {
    // The grad op has 1 output, which is the gradient of the only input.
    static const std::map<int, int> outInfo = {{0, 0}};
    return outInfo;
  }
};
We emphasise that gradient operations are operations themselves and hence require the definition of a static defaultOperatorId function and popart::Op::setup(), as explained in Section 8.3, Operation class. Also, note that LeakyReluOp and LeakyReluGradOp share the parameter struct definition, LeakyReluParams.
The job of gradient operations (meaning the set of operations obtained by calling popart::Op::getGradOps() on a forward operation) is to perform one step of the chain rule. For LeakyReluOp, recall that popart::Op::getGradOps() returns a single gradient operation, LeakyReluGradOp, so we have one operation that has to produce the partial derivative for the forward operation’s single input.
Performing one step of the chain rule for LeakyReluGradOp means computing \(\frac{\partial F}{\partial x}\) for some function \(F\) (the partial derivative of \(F\) with respect to the input tensor \(x\)), having been given as input \(\frac{\partial F}{\partial \text{LeakyReLU}}\) (the partial derivative of \(F\) with respect to the output of \(\text{LeakyReLU}\)) as well as any forward tensors that it needs, using the chain rule:

\[
\frac{\partial F}{\partial x} = \frac{\partial F}{\partial \text{LeakyReLU}} \cdot \frac{\partial \text{LeakyReLU}}{\partial x}
\]
The final factor on the right-hand side of this equation (the partial derivative of \(\text{LeakyReLU}\) with respect to its input, \(x\)) is mathematically defined as follows:

\[
\frac{\partial \text{LeakyReLU}}{\partial x} =
\begin{cases}
1 & \text{if } x \geq 0 \\
\alpha & \text{if } x < 0
\end{cases}
\]
Note that this partial derivative needs the forward tensor, \(x\), as input because it is used in the condition.
Our operation, LeakyReluGradOp, needs to calculate this partial derivative and multiply it by the incoming gradient, \(\frac{\partial F}{\partial \text{LeakyReLU}}\), to obtain \(\frac{\partial F}{\partial x}\). The incoming gradient is given as an input to LeakyReluGradOp. Putting this all together, using \(y\) to denote \(\text{LeakyReLU}(x)\) (the LeakyReluOp output) and \(y'\) to denote the partial derivative \(\frac{\partial F}{\partial \text{LeakyReLU}}\), we can express the calculation that LeakyReluGradOp has to do as follows:

\[
\frac{\partial F}{\partial x} =
\begin{cases}
y' & \text{if } x \geq 0 \\
\alpha \, y' & \text{if } x < 0
\end{cases}
\]
This definition is what we will use when defining the semantics of LeakyReluGradOp in Section 8.6, Gradient opx class. For now, all we need to understand is that LeakyReluGradOp consumes two tensor inputs and produces one tensor output. In terms of data type and tensor shape, these inputs and outputs are all identical to the forward tensor \(x\). This information is what we need when implementing popart::Op::gradInputInfo() and popart::Op::gradOutToNonGradIn(), which are additional requirements we place on gradient operations.
The popart::Op::gradInputInfo() function tells the autodiff transform what input tensors an operation requires. Gradient operations can request to connect to any inputs or outputs of the forward operation, or to the gradient of an output of the forward operation. In this instance, the gradient operation LeakyReluGradOp asks for its input at index 0 to be connected to the partial derivative of LeakyReluOp’s output at index 0 (that is, \(\frac{\partial F}{\partial \text{LeakyReLU}}\)), and for its input at index 1 to be connected to the input of LeakyReluOp at index 0 (that is, \(x\)):
const std::vector<popart::GradInOutMapper> &gradInputInfo() const override {
  static const std::vector<popart::GradInOutMapper> inInfo = {
      {0, 0, popart::GradOpInType::GradOut},
      {1, 0, popart::GradOpInType::In}};
  return inInfo;
}
The popart::Op::gradOutToNonGradIn() function is what the autodiff transform uses to determine what outputs a gradient operation produces. Gradient operations produce gradients for some inputs of the forward operation. To express this, the function must return a mapping from gradient operation output indices to forward operation input indices.

The LeakyReluGradOp operation produces only one output, \(\frac{\partial F}{\partial x}\), at index 0, and it is the gradient of LeakyReluOp’s input \(x\) at index 0. The appropriate mapping is therefore as follows:
const std::map<int, int> &gradOutToNonGradIn() const override {
  // The grad op has 1 output, which is the gradient of the only input.
  static const std::map<int, int> outInfo = {{0, 0}};
  return outInfo;
}
8.6. Gradient opx class
Similar to the forward operation, we need a LeakyReluGradOpx class that implements the semantics of the gradient operation of Leaky ReLU. Again, we do this by implementing a C++ class that derives from popart::popx::Opx. Gradient operations are implemented just like forward operations; there are no additional functions that need to be implemented.
/**
 * Leaky ReLU gradient operation implementation.
 */
class LeakyReluGradOpx : public popart::popx::Opx {
public:
  LeakyReluGradOpx(popart::Op *op, popart::popx::Devicex *devicex)
      : popart::popx::Opx(op, devicex) {
    verifyOp<LeakyReluGradOp>(op, {CustomGradOperators::LeakyReluGradId});
  }

  /**
   * @brief Add the Poplar code for this operation to a Poplar sequence.
   * @param prog The Poplar sequence to add the operation to.
   */
  void grow(poplar::program::Sequence &prog) const override {
    auto op = getOp<LeakyReluGradOp>();
    poplar::Tensor grad = getInTensor(0);
    poplar::Tensor input = getInTensor(1);
    const float alpha = op.params().alpha;

    // (grad * (x < 0.0f ? alpha : 1))
    // pe::_1 and pe::_2 are placeholders for the arguments of the expression.
    pe::Mul expression = pe::Mul(pe::Select(pe::Const(alpha),
                                            pe::Const(1.0f),
                                            pe::Lt(pe::_2, pe::Const(0.0f))),
                                 pe::_1);

    auto output = popops::map(graph(),
                              expression,
                              {grad, input},
                              prog,
                              debugContext("LeakyReluGrad"),
                              poplar::OptionFlags());
    setOutTensor(0, output);
  }
};

// Registers that LeakyReluGradOpx can implement ops with OperatorIdentifier
// CustomGradOperators::LeakyReluGradId.
namespace {
popart::popx::OpxCreator<LeakyReluGradOpx>
    LeakyReluGradOpxCreator({CustomGradOperators::LeakyReluGradId});
} // namespace
8.7. Python bindings
Warning
The Python binding function used in this section is experimental and may change in future API updates.
It is necessary to define Python bindings for your custom operation’s C++ code so that the custom operation can be used from Python. We use the pybind11 library to create these bindings, together with an experimental template function, makeParameterizedOpBindings, as follows:
/**
 * @brief Pybind11 custom op module declaration.
 * NOTE: make sure the name of the module corresponds to the source filename!
 */
// cppcheck-suppress syntaxError
PYBIND11_MODULE(leaky_relu_op_impl, m) {
  // Bind the parameters of the op: constructor + fields.
  py::class_<LeakyReluParams>(m, "LeakyReluParams")
      .def(py::init<float>(), py::arg("alpha") = 0.01)
      .def_readwrite("alpha", &LeakyReluParams::alpha);

  // Helper function to make the custom op bindings automatically (once the
  // params are bound).
  popart::ir::op::makeParameterizedOpBindings<LeakyReluOp>(m, "LeakyRelu");
}
The above binding gives us a Python module named leaky_relu_op_impl. However, this module is not very user-friendly on its own, so in the next section we write an easy-to-use Python wrapper around it.
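As a quick, illustrative sanity check of the bindings (assuming the module has already been compiled, for example via cppimport as described in Section 8.9), you can construct and inspect the parameter class from Python:

# Illustrative check only: import the compiled bindings and inspect the
# parameter class they expose.
import leaky_relu_op_impl

params = leaky_relu_op_impl.LeakyReluParams(alpha=0.1)
print(params.alpha)  # -> 0.1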
8.8. Python wrapper
Warning
This Python wrapper solution uses some internal PopXL definitions that are likely to change in future API updates.
The last remaining step is to define a user-facing Python function which uses the Python bindings to provide a clean, Pythonic interface for adding a Leaky ReLU operation to the IR, similar to the other ops in PopXL:
from typing import Optional

import cppimport.import_hook  # pylint: disable=unused-import
from popxl.context import get_current_context, op_debug_context
from popxl.ops.utils import check_in_graph
from popxl.tensor import Tensor

# The custom op and its pybinding will be automatically compiled by cppimport
# into a module of this name.
import leaky_relu_op_impl


@op_debug_context
def leaky_relu(
    # pylint: disable=unused-argument
    x: Tensor,
    alpha: Optional[float] = 0.01,
    **kwargs
) -> Tensor:
    """Compute the leaky relu operator element-wise on the input tensor.

    See https://pytorch.org/docs/stable/generated/torch.nn.LeakyReLU.html

    Args:
        x (Tensor): The tensor to input.
        alpha (float, optional): The value used to determine the slope of the
            negative axis. Defaults to 0.01.
        **kwargs: Any additional arguments to be passed. Left here as an
            example for other operators.

    Returns:
        Tensor: The output tensor.
    """
    ctx = get_current_context()
    graph = ctx.graph
    pb_graph = graph._pb_graph

    settings = ctx._get_op_settings("leaky_relu")
    params = leaky_relu_op_impl.LeakyReluParams(alpha=alpha)
    # Inputs & outputs mapping (checking inputs are in the graph!).
    inputs = {0: x.id}
    outputs = {0: graph._create_tensor_id("leaky_relu_out")}
    check_in_graph(graph, **{x.id: x})

    # Build the op using the default operator id.
    op = leaky_relu_op_impl.LeakyRelu.create_op_in_graph(
        graph=pb_graph,
        inputs=inputs,
        outputs=outputs,
        params=params,
        settings=settings,
    )
    # Apply all registered context hooks to the new op.
    # NOTE: crucial to support PopXL graph transforms.
    ctx._op_created(op)
    return Tensor._from_pb_tensor(op.outTensor(0))
Note that in Listing 8.7, the module name leaky_relu_op_impl is based on the name of the Pybind11 module in Section 8.7, Python bindings.
8.9. Auto-compiling custom operations with cppimport
Although it is possible to manually compile custom operations, we recommend using cppimport to automatically compile the C++ code of custom operations. This makes for a much improved user experience when developing custom operations. Conveniently, cppimport detects when your C++ source has changed and recompiles only when needed, so you no longer have to manually compile your custom operation every time you make a change.
In the remainder of this section, we assume all of your custom operation’s C++ code (not including the Python wrapper in Section 8.8, Python wrapper) lives in a single .cpp file called leaky_relu_op_impl.cpp.
To use cppimport, you first need to add the following comment to the first line of leaky_relu_op_impl.cpp:
// cppimport
This comment is used by cppimport as an opt-in mechanism, specifying that the file is meant to be used with cppimport, and it must appear on the first line.
Then, at the end of the file, include the following multi-line comment:
// cppimport configuration for compiling the pybind11 module.
/*
<%
cfg['extra_compile_args'] = ['-std=c++14', '-fPIC', '-shared', '-O3', '-DONNX_NAMESPACE=onnx']
cfg['libraries'] = ['popart']
setup_pybind11(cfg)
%>
*/
This contains all the information cppimport needs to compile your operation.
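With the opt-in comment and this configuration block in place, importing cppimport.import_hook before importing the module (as the wrapper in Section 8.8 does) is enough to trigger compilation on first use. A minimal sketch:

# Sketch: compiling and importing the custom op module via cppimport.
import cppimport.import_hook  # noqa: F401  (enables importing .cpp modules)

# Compiled from leaky_relu_op_impl.cpp on first import, then cached.
import leaky_relu_op_impl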
8.10. Using your custom operation
Finally, to use your custom operation, as highlighted in Listing 8.8, just import the leaky_relu Python wrapper function you defined in Section 8.8, Python wrapper. Then, you can use this function much like the built-in PopXL operations:
from typing import Union

import numpy as np

import popxl
import popxl.ops as ops

from leaky_relu_op import leaky_relu


def build_and_run_graph(
    input_data: Union[float, np.ndarray], alpha: float
) -> np.ndarray:
    """Build a PopXL graph with the leaky relu custom op in and run it.

    Args:
        input_data (Union[float, np.ndarray]): The input data to use,
            either a 1D float or a NumPy array of floats.
        alpha (float): The alpha value to use in the leaky relu op.

    Returns:
        np.ndarray: The output data array to be used for checking.
    """
    # Creating a model with popxl
    ir = popxl.Ir()
    main = ir.main_graph
    input_array = np.array(input_data)
    with main:
        # host load
        input0 = popxl.h2d_stream(input_array.shape, popxl.float32, name="in_stream")
        a = ops.host_load(input0, "a")

        # custom leaky relu.
        o = leaky_relu(a, alpha=alpha)

        # host store
        o_d2h = popxl.d2h_stream(o.shape, o.dtype, name="out_stream")
        ops.host_store(o_d2h, o)

    with popxl.Session(ir, "ipu_model") as session:
        outputs = session.run({input0: input_array})

    print("ALPHA param:", alpha)
    print("INPUT data:", input_data)
    print("OUTPUT result:", outputs[o_d2h])

    return outputs[o_d2h]
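For example, calling this function with a small array of illustrative values exercises the custom operation end to end:

# Illustrative values only.
if __name__ == "__main__":
    result = build_and_run_graph(
        np.array([-2.0, -0.5, 0.0, 1.5], dtype=np.float32), alpha=0.02
    )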