5.10. Poplar
3.1.0
New features
Added host calculateMetadataForConversion functions to find appropriate metadata to use when casting a given array of half, float or double precision data to either of the two FP8 data types. The two FP8 data types are F143 and F152, where the three digits indicate the number of bits used to represent the sign, exponent and mantissa respectively.
Allow shared Executable pointers to be passed to the Poplar Engine.
Expanded the user home directory (~) for paths in POPLAR_ENGINE_OPTIONS.
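The F143 and F152 naming can be illustrated with a small standalone sketch that splits one byte into its three fields. This is not Poplar code: the names Fp8Layout, Fp8Fields and splitFp8 are hypothetical, and the bit ordering (sign in the most significant bit, mantissa in the least significant bits) is an assumption taken from the order of the digits in the format name. The sketch returns raw field values only; it does not apply any exponent bias or the scaling carried by the conversion metadata.

```cpp
#include <cassert>
#include <cstdint>

// Field widths in bits. Hypothetical type for illustration, not part of Poplar.
struct Fp8Layout {
  unsigned signBits;
  unsigned exponentBits;
  unsigned mantissaBits;
};

// F143: 1 sign bit, 4 exponent bits, 3 mantissa bits.
constexpr Fp8Layout kF143{1, 4, 3};
// F152: 1 sign bit, 5 exponent bits, 2 mantissa bits.
constexpr Fp8Layout kF152{1, 5, 2};

// Raw field values taken from one byte (no bias or scaling applied).
struct Fp8Fields {
  unsigned sign;
  unsigned exponent;
  unsigned mantissa;
};

// Assumed ordering: sign in the top bit, then exponent, then mantissa.
inline Fp8Fields splitFp8(std::uint8_t bits, Fp8Layout layout) {
  assert(layout.signBits + layout.exponentBits + layout.mantissaBits == 8);
  const unsigned raw = bits;
  const unsigned mantissaMask = (1u << layout.mantissaBits) - 1u;
  const unsigned exponentMask = (1u << layout.exponentBits) - 1u;
  Fp8Fields fields;
  fields.mantissa = raw & mantissaMask;
  fields.exponent = (raw >> layout.mantissaBits) & exponentMask;
  fields.sign = raw >> (layout.mantissaBits + layout.exponentBits);
  return fields;
}
```

Under these assumptions the byte 0xB5 (binary 1011 0101) splits into sign 1, exponent 6, mantissa 5 as F143, and into sign 1, exponent 13, mantissa 1 as F152.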
Bug Fixes
Invalid indices used in Tensor::expand and Tensor::squeeze now throw an exception.
Fix float to quarter rounding errors for extremely low exponents.
Other improvements
Make print tensor formatting more user friendly. This includes summarising large tensors, aligning tensor columns so that they are easier to read and printing all numbers with the same precision.
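As a rough illustration of what summarising a large tensor and printing every number with the same precision can mean, here is a standalone sketch. It does not reproduce Poplar's actual output format: the function summarise, its parameters and the bracketed layout with "..." are all hypothetical choices made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not Poplar's implementation.
// Prints all values with a fixed number of decimal places. If there are more
// than 2 * edgeItems values, only the first and last edgeItems are printed
// and the middle is replaced by "...".
inline std::string summarise(const std::vector<double> &values,
                             std::size_t edgeItems, int precision) {
  assert(precision >= 0);
  std::ostringstream out;
  out << std::fixed << std::setprecision(precision);
  const std::size_t n = values.size();
  const bool elide = n > 2 * edgeItems;
  out << "[";
  for (std::size_t i = 0; i < n; ++i) {
    if (elide && i == edgeItems) {
      out << " ...";
      // Skip the middle; the loop increment then lands on the first tail item.
      i = n - edgeItems - 1;
      continue;
    }
    out << (i == 0 ? "" : " ") << values[i];
  }
  out << "]";
  return out.str();
}
```

With eight values 1 to 8, edgeItems set to 2 and precision set to 1, this sketch produces "[1.0 2.0 ... 7.0 8.0]". A tensor short enough to fit is printed in full.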
Known issues
None
Compatibility changes
None
3.0.0
New features
Added an optimization that detects when a transpose can be performed to make rearranging the copies more efficient.
Add new functions to the API to allow Poplar targets to be serializable so they can be used with high-level execution caching capabilities.
Additional FP8 support:
Add support for constant tensors of type FP8.
Add support for conversion of Inf, NaN to FP8 without metadata.
Add support to reinterpret from FP8 to unsigned char using the existing Tensor::reinterpret method.
Add new conversion functions to convert data to and from FP16 on the host, including converting FP8 to and from FP16.
Extend the existing conversion functions to support conversion between FP32 and FP8 on the host.
Engine::connectStream will throw an exception if FP8 data is used without the metadata being defined.
Allow saturation mode when casting float or half to FP8 on the host.
Add new functions to determine appropriate metadata given a buffer of half, float or double data on the host.
General reduction of memory use in compilation.
Improve “start automatic non-participatory sync” (SANS) generation when spanning functions.
Improve the error message (add a new exception) when profiling an application that was not built with profiling information enabled.
Add new functions to get the numeric limits of a type based on the target.
Add a new Poplar runtime engine option remoteBuffer.allocateOnHost that allows you to control whether memory is allocated on the host when using remote buffers. See RuntimeOptions for more details.
The vertex compute() method can now specify a return type of void or bool.
Poplar now uses KiB/MiB rather than KB/MB, where appropriate, when reporting memory sizes.
Poplar detects mismatches between graph construction and runtime Poplar options. Mismatches that have a functional impact throw an error; all other mismatches are reported as warnings.
Improved the formatting for the output of PrintTensor to make it easier to read, including summarisation for large tensors. The PrintTensorFmt class can be used to customise output.
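The KiB/MiB item above refers to binary prefixes, where 1 KiB is 1024 bytes and 1 MiB is 1024 KiB. A minimal standalone sketch of that kind of reporting follows. The function formatBytes, its thresholds and its one-decimal output are hypothetical and are not taken from Poplar.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper, not Poplar's implementation.
// Formats a byte count using binary prefixes (1 KiB = 1024 B).
inline std::string formatBytes(std::uint64_t bytes) {
  constexpr std::uint64_t kKiB = 1024;
  constexpr std::uint64_t kMiB = 1024 * kKiB;
  std::ostringstream out;
  out << std::fixed << std::setprecision(1);
  if (bytes >= kMiB) {
    out << static_cast<double>(bytes) / kMiB << " MiB";
  } else if (bytes >= kKiB) {
    out << static_cast<double>(bytes) / kKiB << " KiB";
  } else {
    out << bytes << " B";  // whole bytes, printed as an integer
  }
  return out.str();
}
```

In this sketch 1536 bytes is reported as "1.5 KiB", whereas 1000 bytes stays "1000 B" because it is below 1024.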
Bug Fixes
Fixed the profiling counters for stream copies when they overflow.
Fixed a bug in the rearrangeOnHost validation for remote buffers where a malformed program was not detected.
Reduced graph construction time when debug.lowerProgDump is not enabled.
Fixed stack size calculation when profiling an application.
Fixed tile exception when running YOLOv5 due to incorrect SANS analysis.
Fixed replicated exchanges that cause tile exceptions.
Improved documentation for Vector and VectorList types.
Optimized lowerCopiesToCompleSets in certain cases to reduce compile time.
Optimized the graph construction time for addConstant with large values.
Fixed the Poplar examples to ensure they compile with C++11.
Fixed the initial value saturation of unsigned integers in a vertex (when calling graph.setInitialValue).
Improved syntax highlighting for code blocks in documentation.
Other improvements
The Poplar engine option debug.retainDebugInformation is now set to false by default to reduce host memory use. You must explicitly enable this option to use getReport and printProfile.
Avoid duplicate information in the profile for each replica to reduce the file size.
Known issues
None
Compatibility changes
None