5.1.8. Poplar libraries (PopLibs) changelog
2.6.0+5997
New features
Added support for 8-bit floating point numbers in casts, convolutions, matrix multiplications and data ops such as dynamic slice and transpose
Extended the convolution planner to support any number of serial splits
Added a stable version of the top-k op
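As an illustration of what a stable top-k guarantees, the following is a minimal Python sketch, assuming "stable" carries its usual sorting meaning: equal values keep their original relative order. The function name `stable_top_k` is hypothetical and this is not the PopLibs API or implementation.

```python
def stable_top_k(values, k):
    """Return the k largest entries as (index, value) pairs, largest first.

    Entries with equal values keep their original relative order.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # sorted() is stable, and reverse=True still preserves the original
    # order of elements that compare equal.
    ranked = sorted(enumerate(values), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


scores = [0.5, 0.9, 0.5, 0.9, 0.1]
top3 = stable_top_k(scores, 3)
print(top3)  # [(1, 0.9), (3, 0.9), (0, 0.5)]
```

With an unstable selection, index 2 could be returned instead of index 0 for the tied value 0.5, so results could differ between runs or implementations.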
Bug fixes
Fixed a bug in LSTMs where the final cell had an incorrect result when using variable time steps
Fixed an incorrect compile time error when using literals in a map expression
Fixed a bug where strided reductions didn’t account for input partials offsets when merging reductions for a single vertex
Fixed a bug that prevented upsampling in half precision
Other improvements
Added an option to the histogram op to specify output tensor type
Improved documentation of the triangular solve ops
Extended the range supported by the nx1 convolution kernel
Improved the convolution library to elide some copies when the expandDims transform is used
Ported more kernels to be of type MultiVertex
Added new methods for CTC validation that don’t use beam search
Extended the embedding layer planner to serialise embeddings when the temporary memory requirements would not fit within the tile memory
Added a method to dump the C++ kernel generated by a map expression
Added output allocator methods for elementwise and convolution ops
Improved the error messages generated when convolution validation fails
Added an option to the convolutions to disable stochastic rounding
Improved the PVTI output for convolutions by attaching op specific metadata to the event
Prevented the user from being able to lose precision with the partial type when using the block sparse library
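To show why the partials type matters, here is a small Python sketch that rounds a running sum to half precision after every addition and compares it with a wider accumulator. It uses only the standard library's binary16 packing and is an illustration of the numerical effect, not of how PopLibs accumulates partials; the helper names are hypothetical.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def accumulate(terms, round_partial):
    """Sum terms, applying round_partial to the running total each step."""
    total = 0.0
    for term in terms:
        total = round_partial(total + term)
    return total


terms = [1.0] * 4096
half_sum = accumulate(terms, to_half)          # partials kept in half
wide_sum = accumulate(terms, lambda x: x)      # partials kept in double

print(half_sum, wide_sum)  # 2048.0 4096.0
```

Once the half-precision total reaches 2048, adding 1.0 no longer changes it, because neighbouring binary16 values are 2 apart at that magnitude. Keeping partials in a type at least as wide as the output avoids this kind of silent loss.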
2.5.1
New features
Added support for the ROIAlign layer
Added support for a stable sort using the new bitonic sort algorithm
Extended embedding layer to support groups
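A bitonic sorting network is not stable on its own, so a common way to obtain a stable sort from it is to extend each key with the element's original index. The Python sketch below shows that idea under this assumption; it is not the PopLibs implementation, and `bitonic_sort_stable` is a hypothetical name.

```python
def bitonic_sort_stable(values, key=lambda v: v):
    """Sort values in ascending key order with a bitonic network.

    Ties are broken by original index, which makes the result stable.
    """
    n = len(values)
    size = 1
    while size < n:
        size *= 2  # the network needs a power-of-two length

    # Each item is (is_padding, key, original_index). Padding items sort
    # after all real items, and the index makes every item distinct.
    items = [(0, key(v), i) for i, v in enumerate(values)]
    items += [(1, None, i) for i in range(n, size)]

    k = 2
    while k <= size:          # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:         # distance between compared elements
            for i in range(size):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    # Items are distinct, so "not greater" means "less".
                    if (items[i] > items[partner]) == ascending:
                        items[i], items[partner] = items[partner], items[i]
            j //= 2
        k *= 2

    return [values[index] for _, _, index in items[:n]]


records = [("b", 2), ("a", 1), ("c", 2), ("d", 1), ("e", 0)]
print(bitonic_sort_stable(records, key=lambda r: r[1]))
# [('e', 0), ('a', 1), ('d', 1), ('b', 2), ('c', 2)]
```

The compare-and-swap pattern depends only on positions, never on the data, which is what makes this family of algorithms attractive for parallel hardware.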
Bug fixes
Fixed a segfault that could happen for reductions
Fixed incorrect documentation of the return type of the random functions
Fixed incorrect documentation for building the third-party dependencies in the README
Fixed an issue in the CTC planner where it used the wrong memory estimate for the reduction
Added DebugContext in the fill operation
Other improvements
Optimised the scaled add codelets to utilise interleaved memory
Improved support for parallelising a transpose across workers
Prevented the partials type from being smaller than the output type in all layers
Attached user source location to PopLibs exceptions
Optimised the ERF layer
Added int32 support to the power elementwise operation
Improved MultiSlice when given a single offset
Added a default memory proportion to the embedding planner