calculateMetadataForConversion functions to find appropriate metadata to use when casting a given array of half, float or double precision data to either of the two FP8 datatypes.
The two FP8 datatypes are F143 and F152 where the three digits indicate the number of bits used to represent the sign, exponent and mantissa respectively.
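The following C++ sketch illustrates the naming by decoding an 8-bit pattern under either layout. It is not Poplar's implementation or API. The Fp8Layout struct, the decodeFp8 function and the exponent biases (7 and 15, the IEEE-style choice) are assumptions. The sketch also ignores the metadata scale and any Inf/NaN encodings that Poplar's FP8 types may use, and treats every bit pattern as a finite number.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical description of an FP8 layout: 1 sign bit, expBits exponent
// bits and manBits mantissa bits (1 + expBits + manBits == 8).
struct Fp8Layout {
  int expBits;
  int manBits;
  int bias;  // exponent bias: an assumption, not taken from Poplar
};

// Illustrative layouts using assumed IEEE-style biases (2^(expBits-1) - 1).
constexpr Fp8Layout kF143{4, 3, 7};
constexpr Fp8Layout kF152{5, 2, 15};

// Decode an 8-bit pattern. Every exponent value is treated as finite:
// this sketch reserves no codes for Inf or NaN.
double decodeFp8(std::uint8_t bits, Fp8Layout layout) {
  assert(1 + layout.expBits + layout.manBits == 8);
  const int sign = bits >> 7;
  const int exponent = (bits >> layout.manBits) & ((1 << layout.expBits) - 1);
  const int mantissa = bits & ((1 << layout.manBits) - 1);
  const double fraction = mantissa / double(1 << layout.manBits);
  double magnitude;
  if (exponent == 0) {
    // Subnormal: no implicit leading 1, minimum normal exponent (1 - bias).
    magnitude = std::ldexp(fraction, 1 - layout.bias);
  } else {
    // Normal: implicit leading 1.
    magnitude = std::ldexp(1.0 + fraction, exponent - layout.bias);
  }
  return sign ? -magnitude : magnitude;
}
```

With these assumed biases, decodeFp8(0x38, kF143) and decodeFp8(0x3C, kF152) both return 1.0.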
Executable pointers to be passed to the Poplar
Expanded the user home directory (~) for paths in
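Tilde expansion of this kind usually means replacing a leading ~ with the user's home directory. The minimal C++ sketch below assumes a POSIX-style HOME environment variable. expandHome is a hypothetical helper, not a Poplar function, and it deliberately leaves ~user forms untouched.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: replace a leading "~" with the value of $HOME.
// Only "~" and "~/..." are expanded; "~user/..." and paths without a
// leading tilde are returned unchanged.
std::string expandHome(const std::string &path) {
  if (path.empty() || path[0] != '~') return path;
  if (path.size() > 1 && path[1] != '/') return path;  // "~user" not handled
  const char *home = std::getenv("HOME");
  if (home == nullptr) return path;  // nothing to expand with
  return std::string(home) + path.substr(1);
}
```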
Invalid indices used in Tensor::squeeze now throw an exception.
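The following sketch shows the kind of validation involved. It works on a plain shape vector rather than a poplar::Tensor and assumes, as is usual for squeeze, that each listed dimension must exist and have size 1. squeezeShape is a hypothetical helper and the exception types are assumptions; Poplar's own exception type and message may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical shape-level squeeze: removes the listed dimensions, each of
// which must exist and have size 1. Invalid indices throw instead of being
// silently ignored. This is an illustration, not poplar::Tensor::squeeze.
std::vector<std::size_t> squeezeShape(const std::vector<std::size_t> &shape,
                                      std::vector<std::size_t> dims) {
  std::sort(dims.begin(), dims.end());
  if (std::adjacent_find(dims.begin(), dims.end()) != dims.end())
    throw std::invalid_argument("duplicate squeeze dimension");
  for (std::size_t d : dims) {
    if (d >= shape.size())
      throw std::out_of_range("squeeze dimension " + std::to_string(d) +
                              " out of range for rank " +
                              std::to_string(shape.size()));
    if (shape[d] != 1)
      throw std::invalid_argument("dimension " + std::to_string(d) +
                                  " does not have size 1");
  }
  // Keep every dimension that was not listed.
  std::vector<std::size_t> result;
  for (std::size_t i = 0; i < shape.size(); ++i)
    if (!std::binary_search(dims.begin(), dims.end(), i))
      result.push_back(shape[i]);
  return result;
}
```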
Fix float to quarter rounding errors for extremely low exponents.
Make the formatting of printed tensors more user-friendly: large tensors are summarised, columns are aligned so that they are easier to read, and all numbers are printed with the same precision.
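Below is a minimal C++ sketch of these three formatting rules for a 2D array. It assumes all rows have the same length. formatMatrix and its precision and edgeItems parameters are hypothetical, and Poplar's actual print format may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical formatter: summarises long rows with "...", prints every
// number with the same precision and pads all entries to a common width
// so that columns line up. Assumes all rows have the same length.
std::string formatMatrix(const std::vector<std::vector<double>> &rows,
                         int precision = 2, std::size_t edgeItems = 2) {
  auto fmt = [precision](double v) {
    std::ostringstream os;
    os << std::fixed << std::setprecision(precision) << v;
    return os.str();
  };
  // Show every column for short rows, otherwise only the first and last
  // edgeItems columns.
  const std::size_t cols = rows.empty() ? 0 : rows[0].size();
  const bool summarise = cols > 2 * edgeItems;
  std::vector<std::size_t> shown;
  for (std::size_t c = 0; c < cols; ++c)
    if (!summarise || c < edgeItems || c >= cols - edgeItems) shown.push_back(c);
  // A single width for all entries keeps the columns aligned.
  std::size_t width = 0;
  for (const auto &row : rows)
    for (std::size_t c : shown) width = std::max(width, fmt(row[c]).size());
  std::ostringstream out;
  for (const auto &row : rows) {
    out << '[';
    for (std::size_t i = 0; i < shown.size(); ++i) {
      if (i > 0) out << ' ';
      if (summarise && i == edgeItems) out << "... ";
      out << std::setw(static_cast<int>(width)) << fmt(row[shown[i]]);
    }
    out << "]\n";
  }
  return out.str();
}
```

For example, formatMatrix({{1, 2, 3, 4, 5}}, 0, 1) returns "[1 ... 5]" followed by a newline.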
Add an optimization that detects when a transpose can be performed to make rearranging copies more efficient.
Add new API functions that allow Poplar targets to be serialized, so that they can be used with high-level execution caching capabilities.
Additional FP8 support:
Add support for constant tensors of type FP8.
Add support for converting Inf and NaN to FP8 without metadata.
Add support to reinterpret from FP8 to unsigned char using the existing
Add new conversion functions to convert data to and from FP16 on the host, including converting FP8 to and from FP16.
Extend the existing conversion functions to support conversion between FP32 and FP8 on the host.
Engine::connectStream will throw an exception if FP8 data is used without the metadata being defined.
Allow saturation mode when casting float or half to FP8 on the host.
Add new functions to determine appropriate metadata given a buffer of half, float or double data on the host.
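One common way to pick such metadata is to derive a power-of-two scale from the largest finite magnitude in the buffer, then apply a saturating clamp when casting. The sketch below illustrates only that idea. HypotheticalMetadata, chooseMetadata and scaleAndSaturate are not Poplar APIs, and the convention that values are stored as x * 2^-scale is an assumption. The caller must supply formatMax, for example from the target's numeric limits. Rounding to the FP8 grid is omitted.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical metadata: values are assumed to be stored as x * 2^-scale.
// Poplar's real metadata type and its sign convention may differ.
struct HypotheticalMetadata {
  int scale;
};

// Smallest scale for which the largest finite magnitude in the buffer,
// once scaled, does not exceed formatMax (the format's largest finite value).
HypotheticalMetadata chooseMetadata(const std::vector<float> &data,
                                    double formatMax) {
  double maxAbs = 0.0;
  for (float v : data)
    if (std::isfinite(v)) maxAbs = std::max(maxAbs, std::fabs(double(v)));
  if (maxAbs == 0.0) return {0};
  return {static_cast<int>(std::ceil(std::log2(maxAbs / formatMax)))};
}

// Saturating step of a cast: scale, then clamp finite out-of-range values
// to +/-formatMax instead of overflowing. Non-finite values are passed
// through unchanged; how they are encoded depends on the format.
double scaleAndSaturate(float v, HypotheticalMetadata md, double formatMax) {
  const double scaled = std::ldexp(double(v), -md.scale);
  if (!std::isfinite(scaled)) return scaled;
  return std::max(-formatMax, std::min(formatMax, scaled));
}
```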
General reduction of memory use in compilation.
Improve “start automatic non-participatory sync” (SANS) generation when spanning functions.
Improve the error message (add a new exception) when profiling an application that was not built with profiling information enabled.
Add new functions to get the numeric limits of a type based on the target.
Add a new Poplar runtime engine option, remoteBuffer.allocateOnHost, that allows you to control whether memory is allocated on the host when using remote buffers. See RuntimeOptions for more details.
compute() method can now specify a return type of
When reporting memory sizes, Poplar uses KiB/MiB rather than KB/MB where appropriate.
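The binary (IEC) prefixes are 1024-based: 1 KiB = 1024 bytes and 1 MiB = 1024 KiB. KB and MB, by contrast, are strictly 1000-based. The small C++ sketch below formats a byte count with binary prefixes. formatBytes is a hypothetical helper, and the exact output format Poplar uses may differ.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: report a byte count with 1024-based IEC prefixes.
std::string formatBytes(std::uint64_t bytes) {
  static const char *const units[] = {"B", "KiB", "MiB", "GiB", "TiB"};
  double value = static_cast<double>(bytes);
  int unit = 0;
  while (value >= 1024.0 && unit < 4) {
    value /= 1024.0;  // step up one binary prefix
    ++unit;
  }
  std::ostringstream os;
  if (unit == 0)
    os << bytes << ' ' << units[0];  // plain bytes: no fractional part
  else
    os << std::fixed << std::setprecision(1) << value << ' ' << units[unit];
  return os.str();
}
```

For example, formatBytes(1536) returns "1.5 KiB".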
Poplar detects mismatches between graph construction and runtime Poplar options. Mismatches that have a functional impact throw an error; other mismatches are reported as warnings.
Fixed the profiling counters for stream copies when they overflow.
Fixed a bug in the rearrangeOnHost validation for remote buffers where a malformed program was not detected.
Reduced graph construction time when debug.lowerProgDump is not enabled.
Fixed the stack size calculation when profiling an application.
Fixed a tile exception when running YOLOv5 due to incorrect SANS analysis.
Fixed replicated exchanges that caused tile exceptions.
Improved documentation for
lowerCopiesToCompleSets in certain cases to reduce compile time.
Optimized the graph construction time for addConstant with large values.
Fixed the Poplar examples to ensure they compile with C++11.
Fixed the initial value saturation of unsigned integers in a vertex (when calling
Improved syntax highlighting for code blocks in documentation.
The Poplar engine option debug.retainDebugInformation is now set to false by default to reduce host memory usage. You must explicitly enable this option to use
Avoid duplicating information in the profile for each replica, to reduce the profile file size.