5.10. Poplar
3.0.0
New features
Add an optimization that detects when a transpose can be performed to make rearranging copies more efficient.
Add new functions to the API to allow Poplar targets to be serializable so they can be used with high-level execution caching capabilities.
Additional FP8 support:
Add support for constant tensors of type FP8.
Add support for conversion of Inf and NaN to FP8 without metadata.
Add support to reinterpret from FP8 to unsigned char using the existing Tensor::reinterpret method.
Add new conversion functions to convert data to and from FP16 on the host, including converting FP8 to and from FP16.
Extend the existing conversion functions to support conversion between FP32 and FP8 on the host.
Engine::connectStream will throw an exception if FP8 data is used without the metadata being defined.
Allow saturation mode when casting float or half to FP8 on the host.
Add new functions to determine appropriate metadata given a buffer of half, float or double data on the host.
General reduction of memory use in compilation.
Improve “start automatic non-participatory sync” (SANS) generation when spanning functions.
Improve the error message (add a new exception) when profiling an application that was not built with profiling information enabled.
Add new functions to get the numeric limits of a type based on the target.
Add a new Poplar runtime engine option remoteBuffer.allocateOnHost that allows you to control whether memory is allocated on the host when using remote buffers. See RuntimeOptions for more details.
The vertex compute() method can now specify a return type of void or bool.
Poplar now reports memory sizes in KiB/MiB rather than KB/MB, where appropriate.
Poplar detects mismatches between graph construction and runtime Poplar options. A mismatch that has a functional impact throws an error; any other mismatch is reported as a warning.
Improved the formatting for the output of PrintTensor to make it easier to read, including summarisation for large tensors. The PrintTensorFmt class can be used to customise output.
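The KiB/MiB item above refers to binary units, where 1 KiB is 1024 bytes and 1 MiB is 1024 KiB, as opposed to the decimal KB/MB. The following is a minimal sketch of such formatting in standard C++. The helper name formatBinarySize and its output format are assumptions made for illustration; this is not Poplar's own reporting code.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper, not part of the Poplar API.
// Formats a byte count using binary (IEC) units:
// 1 KiB = 1024 B, 1 MiB = 1024 KiB, 1 GiB = 1024 MiB.
std::string formatBinarySize(std::uint64_t bytes) {
  const char *units[] = {"B", "KiB", "MiB", "GiB"};
  double value = static_cast<double>(bytes);
  int unit = 0;
  // Divide by 1024 until the value fits the current unit.
  while (value >= 1024.0 && unit < 3) {
    value /= 1024.0;
    ++unit;
  }
  char buf[32];
  if (unit == 0) {
    // Plain byte counts are printed exactly, without a fraction.
    std::snprintf(buf, sizeof(buf), "%llu B",
                  static_cast<unsigned long long>(bytes));
  } else {
    std::snprintf(buf, sizeof(buf), "%.1f %s", value, units[unit]);
  }
  return std::string(buf);
}
```

With this helper, 1024 bytes is formatted as 1.0 KiB, while 1000 bytes stays below the first binary threshold and is formatted as 1000 B.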
Bug fixes
Fixed the profiling counters for stream copies when they overflow.
Fixed a bug in the rearrangeOnHost validation for remote buffers where a malformed program was not detected.
Reduced graph construction time when debug.lowerProgDump is not enabled.
Fixed stack size calculation when profiling an application.
Fixed tile exception when running YOLOv5 due to incorrect SANS analysis.
Fixed replicated exchanges that cause tile exceptions.
Improved documentation for Vector and VectorList types.
Optimized lowerCopiesToCompleSets in certain cases to reduce compile time.
Optimized the graph construction time for addConstant with large values.
Fixed the Poplar examples to ensure they compile with C++11.
Fixed the initial value saturation of unsigned integers in a vertex (when calling graph.setInitialValue).
Improved syntax highlighting for code blocks in documentation.
Other improvements
The Poplar engine option debug.retainDebugInformation is now set to false by default to reduce host memory. You must explicitly enable this option to use getReport and printProfile.
Avoid duplicate information in the profile for each replica to reduce the file size.
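Because debug.retainDebugInformation now defaults to false, code that relies on getReport or printProfile has to switch it back on. The sketch below shows one possible way to do that from standard C++, assuming that engine options are picked up from the POPLAR_ENGINE_OPTIONS environment variable as a JSON object with string values, and that the variable is read when the engine is created. The names kRetainDebugJson and enableRetainDebugInformation are hypothetical and not part of the Poplar API.

```cpp
#include <cstdlib>
#include <string>

// Assumed JSON payload: engine option values are passed as strings.
const std::string kRetainDebugJson =
    R"({"debug.retainDebugInformation":"true"})";

// Hypothetical helper: sets POPLAR_ENGINE_OPTIONS for the current process
// using POSIX setenv. It overwrites any value already stored in the
// variable, so merge existing options first if they matter.
// Returns true if the environment variable was set successfully.
bool enableRetainDebugInformation() {
  return setenv("POPLAR_ENGINE_OPTIONS", kRetainDebugJson.c_str(), 1) == 0;
}
```

Under these assumptions the helper would need to be called before the engine is constructed. In a Poplar program the same key and value can instead be supplied with the other engine options at construction time.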
Known issues
None
Compatibility changes
None