5.10. PopLibs

3.2.0

New features

  • Added a new popops element-wise unary operation, exp2, which computes 2 raised to the power of its input.
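As a minimal sketch of the semantics only, the Python below applies exp2 element by element, computing 2 ** x for each input. The helper `exp2_elementwise` is hypothetical and is not the popops API; it only shows what the operation computes. NumPy provides the same function as `numpy.exp2`.

```python
from typing import Iterable, List


def exp2_elementwise(values: Iterable[float]) -> List[float]:
    """Hypothetical reference for an element-wise exp2: returns 2 ** x for each x."""
    return [2.0 ** x for x in values]


if __name__ == "__main__":
    # Each output element depends only on the matching input element.
    print(exp2_elementwise([0, 1, 3, -1]))
```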

Bug Fixes

  • Fixed an issue where static sparse MatMul returned incorrect values for a specific mask.

  • Fixed casts of quarter to and from float so that they are accurate on the Mk2 architecture used in the IPU-M2000 and Bow-2000.

Other improvements

  • Optimised the performance of Cholesky factorisation.

  • Updated the PopLibs API documentation.

Known issues

None

Compatibility changes

None

3.1.0

New features

Note

The two FP8 data types are F143 and F152, where the three digits indicate the number of bits used to represent the sign, exponent and mantissa, respectively.
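As a minimal sketch of what the format names mean, the Python below splits an 8-bit pattern into its sign, exponent and mantissa fields using the widths given by F143 (1-4-3) and F152 (1-5-2). It assumes a conventional IEEE-style bit order (sign in the most significant bit, mantissa in the least significant bits). The helper `split_fp8` is hypothetical and not part of PopLibs, and it deliberately stops short of decoding a numeric value, because exponent bias and scaling are not described here.

```python
from typing import Tuple

# Field widths (sign, exponent, mantissa) taken from the format names.
FP8_FORMATS = {
    "F143": (1, 4, 3),
    "F152": (1, 5, 2),
}


def split_fp8(byte: int, fmt: str) -> Tuple[int, int, int]:
    """Hypothetical helper: split an 8-bit pattern into (sign, exponent, mantissa).

    Assumes the sign is the most significant bit, followed by the exponent,
    with the mantissa in the least significant bits.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    _, exp_bits, man_bits = FP8_FORMATS[fmt]
    mantissa = byte & ((1 << man_bits) - 1)
    exponent = (byte >> man_bits) & ((1 << exp_bits) - 1)
    sign = byte >> (exp_bits + man_bits)
    return sign, exponent, mantissa


if __name__ == "__main__":
    # The same bit pattern 0b10110101 splits differently in each format.
    print(split_fp8(0b10110101, "F143"))  # sign 1, exponent 0b0110, mantissa 0b101
    print(split_fp8(0b10110101, "F152"))  # sign 1, exponent 0b01101, mantissa 0b01
```

The same byte yields different fields in the two formats: F143 spends one more bit on the mantissa (precision), while F152 spends it on the exponent (range).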

  • poplin:

    • Added an option to use an experimental "expand dims" pre-convolution transformation, which can avoid rearranging inputs in some cases (off by default).

    • Added convolution support for F143 and F152 FP8 inputs and weights.

    • Implemented QR factorisation. This functionality is experimental.

  • popops:

    • Added support for multiple outputs in map expressions.

    • Added the setMetadataTensor function, which combines an integer scale with a constant metadata format for use with F143 and F152 FP8 tensors.

    • Added a new element-wise unary operation, exp2, which computes 2 raised to the power of its input.

  • popsparse:

    • Added support for sparse-dense and dense-sparse matrix multiplication where the sparsity structure of the sparse operand does not change. Only square block sizes of 1, 4, 8 and 16 are supported.
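To illustrate what a fixed block-level sparsity structure looks like, the Python below reduces an element-level 0/1 mask to a block-level mask for one of the supported square block sizes (1, 4, 8 or 16), marking a block as present if any element inside it is non-zero. This is a minimal sketch under stated assumptions: the helper `block_mask` is hypothetical, is not the popsparse API, and assumes the matrix dimensions are multiples of the block size.

```python
from typing import List

# Square block sizes listed as supported in this release.
SUPPORTED_BLOCK_SIZES = (1, 4, 8, 16)


def block_mask(mask: List[List[int]], block: int) -> List[List[bool]]:
    """Hypothetical helper: reduce an element-level mask to a block-level mask.

    A block is present if any element inside it is non-zero.
    Assumes both matrix dimensions are multiples of the block size.
    """
    if block not in SUPPORTED_BLOCK_SIZES:
        raise ValueError(f"unsupported block size {block}")
    rows, cols = len(mask), len(mask[0])
    if rows % block or cols % block:
        raise ValueError("matrix dimensions must be multiples of the block size")
    return [
        [
            any(
                mask[r][c]
                for r in range(br * block, (br + 1) * block)
                for c in range(bc * block, (bc + 1) * block)
            )
            for bc in range(cols // block)
        ]
        for br in range(rows // block)
    ]


if __name__ == "__main__":
    # 8x8 mask with two non-zero elements, reduced to 4x4 blocks.
    m = [[0] * 8 for _ in range(8)]
    m[0][0] = 1
    m[5][6] = 1
    print(block_mask(m, 4))
```

Because the sparsity structure does not change, a block mask like this can be computed once rather than on every multiplication.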

Bug Fixes

  • poplin:

    • Made a minor optimisation that avoids some expensive target copies, giving a small improvement in release builds and a large improvement in debug builds.

    • Corrected the convolution vertex for quarter inputs.

  • popops:

    • Fixed the assembler version of the FP8 popops::cast operation, which gave incorrect results in previous releases.

Other improvements

  • poplin:

    • Improved the performance of the triangular solve algorithm.

    • Improved addConstant performance by 10x (20x when broadcasting) when adding large constants; some models saw a 20-40% reduction in compilation time. Also added stricter value checking to catch unintentional truncation of initial values.

  • popops:

    • Added basic usage documentation for dynamic slice and dynamic update.

Known issues

None

Compatibility changes

None