5.10. Poplar

3.0.0

New features

  • Add an optimization that detects when a transpose can be used to make rearranging copies more efficient.

  • Add new functions to the API that make Poplar targets serializable, so that they can be used with high-level execution caching.

  • Additional FP8 support:

    • Add support for constant tensors of type FP8.

    • Add support for converting Inf and NaN to FP8 without metadata.

    • Add support to reinterpret from FP8 to unsigned char using the existing Tensor::reinterpret method.

    • Add new conversion functions to convert data to and from FP16 on the host, including converting FP8 to and from FP16.

    • Extend the existing conversion functions to support conversion between FP32 and FP8 on the host.

    • Engine::connectStream now throws an exception if FP8 data is used without metadata being defined.

    • Allow saturation mode when casting float or half to FP8 on the host.

    • Add new functions to determine appropriate metadata given a buffer of half, float or double data on the host.

  • General reduction of memory use in compilation.

  • Improve “start automatic non-participatory sync” (SANS) generation when spanning functions.

  • Improve the error message (add a new exception) when profiling an application that was not built with profiling information enabled.

  • Add new functions to get the numeric limits of a type based on the target.

  • Add a new Poplar runtime engine option remoteBuffer.allocateOnHost that controls whether memory is allocated on the host when using remote buffers. See RuntimeOptions for more details.

  • The vertex compute() method can now specify a return type of void or bool.

  • Poplar now uses KiB/MiB rather than KB/MB, where appropriate, when reporting memory sizes.

  • Poplar detects mismatches between the Poplar options used for graph construction and those used at runtime. Mismatches that have a functional impact throw an error; other mismatches are reported as warnings.

  • Improved the formatting for the output of PrintTensor to make it easier to read, including summarisation for large tensors. The PrintTensorFmt class can be used to customise output.
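The saturation mode for host-side FP8 casts can be illustrated with a minimal C++ sketch. This does not use the Poplar API: the enum `Fp8Layout` and the functions `fp8MaxFinite` and `saturateForFp8` are hypothetical, and the limits assume the OCP FP8 definitions (E4M3 max 448, E5M2 max 57344), which may not match Poplar's FP8 formats or the scaling applied through FP8 metadata. It shows only the range-reduction step before rounding: out-of-range values are clamped to the largest finite value instead of overflowing.

```cpp
#include <cmath>

// Hypothetical enum for the two FP8 layouts: 1 sign bit followed by
// 4 exponent + 3 mantissa bits (E4M3) or 5 exponent + 2 mantissa bits (E5M2).
enum class Fp8Layout { E4M3, E5M2 };

// Largest finite magnitude, ASSUMING the OCP FP8 definitions.
// Poplar's own FP8 formats and metadata scaling may give different limits.
inline float fp8MaxFinite(Fp8Layout layout) {
  return layout == Fp8Layout::E4M3 ? 448.0f : 57344.0f;
}

// Saturating range reduction applied before rounding to FP8:
// NaN is passed through unchanged; any value whose magnitude exceeds the
// largest finite FP8 value (including +/-Inf) is clamped to +/-max.
inline float saturateForFp8(float x, Fp8Layout layout) {
  if (std::isnan(x)) {
    return x;
  }
  const float maxVal = fp8MaxFinite(layout);
  if (x > maxVal) {
    return maxVal;
  }
  if (x < -maxVal) {
    return -maxVal;
  }
  return x;
}
```

Without saturation, the same out-of-range inputs would instead overflow during conversion.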
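The difference between binary and decimal memory units in reports can be shown with a small C++ sketch. The helper `formatBinarySize` is hypothetical and is not how Poplar formats its reports; it simply applies IEC prefixes, where 1 KiB = 1024 bytes and 1 MiB = 1024 KiB (whereas 1 KB = 1000 bytes).

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format a byte count using binary (IEC) prefixes.
std::string formatBinarySize(std::uint64_t bytes) {
  static const char* const units[] = {"B", "KiB", "MiB", "GiB"};
  double value = static_cast<double>(bytes);
  int unit = 0;
  // Divide by 1024 (not 1000) until the value is below one unit step.
  while (value >= 1024.0 && unit < 3) {
    value /= 1024.0;
    ++unit;
  }
  char buf[32];
  if (unit == 0) {
    std::snprintf(buf, sizeof buf, "%llu B",
                  static_cast<unsigned long long>(bytes));
  } else {
    std::snprintf(buf, sizeof buf, "%.1f %s", value, units[unit]);
  }
  return std::string(buf);
}
```

For example, 1536 bytes is reported as "1.5 KiB", and 1000 bytes stays below one KiB.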

Bug Fixes

  • Fixed the profiling counters for stream copies when they overflow.

  • Fixed a bug in the rearrangeOnHost validation for remote buffers where a malformed program was not detected.

  • Reduced graph construction time when debug.lowerProgDump is not enabled.

  • Fixed stack size calculation when profiling an application.

  • Fixed a tile exception when running YOLOv5, caused by incorrect SANS analysis.

  • Fixed replicated exchanges that cause tile exceptions.

  • Improved documentation for Vector and VectorList types.

  • Optimized lowerCopiesToCompleSets in certain cases to reduce compile time.

  • Optimized the graph construction time for addConstant with large values.

  • Fixed the Poplar examples to ensure they compile with C++11.

  • Fixed the initial value saturation of unsigned integers in a vertex (when calling graph.setInitialValue).

  • Improved syntax highlighting for code blocks in documentation.

Other improvements

  • The Poplar engine option debug.retainDebugInformation is now set to false by default to reduce host memory usage. You must explicitly enable this option to use getReport and printProfile.

  • Avoid duplicating information for each replica in the profile, to reduce the file size.

Known issues

None

Compatibility changes

None