5.10. Poplar

3.0.0

New features

Add an optimization that detects when a transpose can be performed to make rearranging copies more efficient.
Add new functions to the API to allow Poplar targets to be serializable so they can be used with high-level execution caching capabilities.
Additional FP8 support:
- Add support for constant tensors of type FP8
- Add support for conversion of Inf and NaN to FP8 without metadata.
- Add support to reinterpret from FP8 to unsigned char using the existing Tensor::reinterpret method.
- Add new conversion functions to convert data to and from FP16 on the host, including converting FP8 to and from FP16.
- Extend the existing conversion functions to support conversion between FP32 and FP8 on the host.
- Engine::connectStream will throw an exception if FP8 data is used without the metadata being defined.
- Allow saturation mode when casting float or half to FP8 on the host.
- Add new functions to determine appropriate metadata given a buffer of half, float or double data on the host.
General reduction of memory use in compilation.
Improve “start automatic non-participatory sync” (SANS) generation when spanning functions.
Improve the error message (add a new exception) when profiling an application that was not built with profiling information enabled.
Add new functions to get the numeric limits of a type based on the target.
Add a new Poplar runtime engine option remoteBuffer.allocateOnHost that allows you to control whether memory is allocated on the host when using remote buffers. See RuntimeOptions for more details.
The vertex compute() method can now specify a return type of void or bool.
Poplar now reports memory sizes in KiB/MiB rather than KB/MB, where appropriate.
Poplar detects mismatches between graph construction and runtime Poplar options. A mismatch that has a functional impact throws an error; any other mismatch is reported as a warning.
Improved the formatting for the output of PrintTensor to make it easier to read, including summarisation for large tensors. The PrintTensorFmt class can be used to customise output.
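
The KiB/MiB change above refers to binary units, where 1 KiB is 1024 bytes and 1 MiB is 1024 KiB. The following is a minimal sketch of formatting a byte count in those units. The function formatBinarySize is a hypothetical helper written for illustration only; it is not part of the Poplar API and does not claim to reproduce Poplar's exact output format.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper (not a Poplar function): formats a byte count using
// binary (IEC) units, where 1 KiB = 1024 bytes and 1 MiB = 1024 KiB.
std::string formatBinarySize(std::uint64_t bytes) {
  constexpr std::uint64_t KiB = 1024;
  constexpr std::uint64_t MiB = 1024 * KiB;

  std::ostringstream out;
  // Two decimal places for the scaled values; integers are unaffected.
  out << std::fixed << std::setprecision(2);

  if (bytes >= MiB) {
    out << static_cast<double>(bytes) / static_cast<double>(MiB) << " MiB";
  } else if (bytes >= KiB) {
    out << static_cast<double>(bytes) / static_cast<double>(KiB) << " KiB";
  } else {
    out << bytes << " B";
  }
  return out.str();
}
```

With this sketch, 1,000,000 bytes is reported as 976.56 KiB, whereas a decimal convention would call the same quantity 1000 KB. That difference is why the unit label matters when comparing memory figures.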

Bug Fixes

Fixed the profiling counters for stream copies when they overflow.
Fixed a bug in the rearrangeOnHost validation for remote buffers where a malformed program was not detected.
Reduced graph construction time when debug.lowerProgDump is not enabled.
Fixed stack size calculation when profiling an application.
Fixed tile exception when running YOLOv5 due to incorrect SANS analysis.
Fixed replicated exchanges that cause tile exceptions.
Improved documentation for Vector and VectorList types.
Optimized lowerCopiesToCompleSets in certain cases to reduce compile time.
Optimized the graph construction time for addConstant with large values.
Fixed the Poplar examples to ensure they compile with C++11.
Fixed the initial value saturation of unsigned integers in a vertex (when calling graph.setInitialValue).
Improved syntax highlighting for code blocks in documentation.
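
For readers unfamiliar with the term used in the setInitialValue fix above, saturation means clamping an out-of-range value to the nearest representable value rather than letting it wrap around. The following is a minimal sketch of that idea for unsigned integer types. The function saturateToUnsigned is a hypothetical helper for illustration only; it is not part of the Poplar API and is not a description of how Poplar implements the fix.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not a Poplar function): converts a signed 64-bit
// value to an unsigned integer type U, clamping to [0, max(U)] instead of
// wrapping around as a plain static_cast would.
template <typename U>
U saturateToUnsigned(std::int64_t value) {
  static_assert(std::numeric_limits<U>::is_integer &&
                    !std::numeric_limits<U>::is_signed,
                "U must be an unsigned integer type");

  // Negative values clamp to the smallest unsigned value.
  if (value < 0) {
    return 0;
  }
  // Values above the largest representable value clamp to that maximum.
  const U maxValue = std::numeric_limits<U>::max();
  if (static_cast<std::uint64_t>(value) > static_cast<std::uint64_t>(maxValue)) {
    return maxValue;
  }
  return static_cast<U>(value);
}
```

For example, saturateToUnsigned<std::uint8_t>(300) yields 255, whereas a plain static_cast of 300 to std::uint8_t would wrap to 44.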

Other improvements

The Poplar engine option debug.retainDebugInformation is now set to false by default to reduce host memory use. You must explicitly enable this option to use getReport and printProfile.
Avoid duplicate information in the profile for each replica to reduce the file size.

Known issues

None

Compatibility changes

None