5.9. PopLibs

3.1.0

New features

Note

The two FP8 datatypes are F143 and F152, where the three digits indicate the number of bits used to represent the sign, exponent and mantissa, respectively.
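To make the naming concrete, here is a minimal sketch that splits a raw byte into the three fields for each format. It only illustrates the bit widths given above. The helper names (`splitFp8`, `splitF143`, `splitF152`) are hypothetical and not part of PopLibs, and the layout (sign in the most significant bit, then exponent, then mantissa) is an assumption. It does not decode values, since the exponent bias and special-value rules are not stated here.

```cpp
#include <cstdint>

// Raw field values of an 8-bit float; no numeric decoding is attempted.
struct Fp8Fields {
  std::uint8_t sign;
  std::uint8_t exponent;
  std::uint8_t mantissa;
};

// Hypothetical helper: split a byte assuming the layout
// [sign | exponent | mantissa] with 1 + expBits + manBits == 8.
constexpr Fp8Fields splitFp8(std::uint8_t raw, unsigned expBits,
                             unsigned manBits) {
  return Fp8Fields{
      static_cast<std::uint8_t>(raw >> 7),
      static_cast<std::uint8_t>((raw >> manBits) & ((1u << expBits) - 1u)),
      static_cast<std::uint8_t>(raw & ((1u << manBits) - 1u))};
}

// F143: 1 sign bit, 4 exponent bits, 3 mantissa bits.
constexpr Fp8Fields splitF143(std::uint8_t raw) { return splitFp8(raw, 4, 3); }

// F152: 1 sign bit, 5 exponent bits, 2 mantissa bits.
constexpr Fp8Fields splitF152(std::uint8_t raw) { return splitFp8(raw, 5, 2); }
```

The same byte therefore yields different exponent and mantissa fields depending on which format it is interpreted as: F143 trades exponent range for one extra bit of precision, and F152 does the opposite.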

poplin:
- Added option to use experimental expand dims pre-convolution transformation that can avoid rearranging inputs in some cases (off by default).
- Added convolution support for F143 and F152 FP8 inputs and weights.
- Implemented QR Factorization. This functionality is experimental.
popops:
- Added support for multiple outputs in map expressions.
- Added setMetadataTensor function to combine an integer scale with a constant metadata format for use with F143 and F152 FP8 tensors.
- Added new element-wise unary operation exp2 (2 raised to the power of each element).
popsparse:
- Added support for sparse-dense and dense-sparse matrix multiplication where the sparsity structure of the sparse operand does not change. Only square block sizes of 1, 4, 8 and 16 are supported.
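As a reference for what the new element-wise exp2 operation computes, the following host-side sketch applies `std::exp2` to each element of a vector. It shows the mathematical semantics only. The function name `exp2Reference` is hypothetical, and the code does not use the PopLibs API or run on an IPU.

```cpp
#include <cmath>
#include <vector>

// Host-side reference: out[i] = 2 raised to the power in[i].
std::vector<float> exp2Reference(const std::vector<float> &in) {
  std::vector<float> out;
  out.reserve(in.size());
  for (float x : in) {
    out.push_back(std::exp2(x));
  }
  return out;
}
```

A reference like this can be useful when checking device results against expected values on the host, with a tolerance suited to the data type in use.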

Bug Fixes

poplin:
- Minor optimisation to avoid some expensive target copies - small improvement in release builds and large improvement in debug builds.
- Correction to quarter input convolution vertex.
popops:
- Fixed the assembler version of the FP8 popops::cast operation which gave incorrect results in previous releases.

Other improvements

poplin:
- Performance improvements of Triangular Solve algorithm.
- Improved addConstant performance by 10x (and 20x when broadcasting) when adding large constants (some models saw a 20-40% compilation time reduction), and added stricter value checking to catch unintentional truncation of initial values.
popops:
- Added simple usage documentation for dynamic slice & update.

Known issues

None

Compatibility changes

None

3.0.0

New features

poplin:
- Add new options to measure the accuracy of the planner by comparing the planner's estimates to the profiled output in the single_conv_layer tool.
- Improve common error messages to identify if the error was for a convolution or matrix multiply.
popops:
- Add support to cast between FP8 and FP32.
- Optimise gather, scatter and other operations which have padding indices.

Bug Fixes

popnn:
- Fixed incorrect arithmetic expression in the calculation of estimated cycles for nonlinearities.
popops:
- Fixed bug in popops::multiUpdateMax which returned the wrong value.
poprand:
- Fixed Poplar rand to ensure random numbers are generated within the requested range.
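For readers unfamiliar with the multi-update-max operation mentioned in the popops fix above, here is a simplified host-side sketch of the intended semantics: each update is written to the indexed position only if it is larger than the value already there. This is an assumption-laden illustration with scalar elements and a hypothetical function name (`multiUpdateMaxReference`). The real popops function operates on tensors within a Poplar graph and has a different signature.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Host-side reference: base[offsets[i]] = max(base[offsets[i]], updates[i]).
// Repeated offsets are allowed; the largest value wins.
void multiUpdateMaxReference(std::vector<float> &base,
                             const std::vector<std::size_t> &offsets,
                             const std::vector<float> &updates) {
  if (offsets.size() != updates.size()) {
    throw std::invalid_argument("offsets and updates must have equal length");
  }
  for (std::size_t i = 0; i < offsets.size(); ++i) {
    float &slot = base.at(offsets[i]); // at() throws on out-of-range offsets
    slot = std::max(slot, updates[i]);
  }
}
```

For example, with a base of {1, 5, 2}, offsets {0, 1, 0} and updates {3, 4, 7}, the base becomes {7, 5, 2}: index 0 receives the larger of its two updates, and index 1 keeps its original value because the update is smaller.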

Other improvements

Renamed movz to movnz. Because of an early bug in the ISA, movnz was incorrectly named movz. movz is now deprecated.
Poplar vertices can now return void instead of bool. Multiple PopLibs vertices have been updated to return void.
poplin:
- Improve documentation for layerNormStatistics.
- Add better error reporting of valid types for convolution and matmul calls.
popops:
- Balance work across worker threads for scatters to improve performance.
- Added an assembler implementation of the casts between FP8 and FP16 that improves performance by a factor of 36.

Known issues

None

Compatibility changes

None