5.1.7. Poplar changelog
2.6.0+5997
New features
Added support for 8-bit floating point types in Poplar
Added support for compiling into a subset of IPU memory
Added the ability to allocate tensors in external memory
Added modular compilation support and the ability to execute code that lives in a tensor
Added experimental support for automatic transpose detection in Poplar’s copy library
Added a lightweight profiling mode (see the tutorial)
Added JSON as a way to describe codelet state so C++ codelets are no longer required
Bug fixes
Fixed an underflow bug that caused stream copies to be represented incorrectly in the profiler
Fixed the wrong exception type being thrown for a class of error
Added checks to prevent segfaults when poplar::Tensor objects are used incorrectly
Prevented erroneous sync lookahead from occurring when explicitly disabled
Fixed a crash in cycle estimation inside of a virtual graph
Moved runtime checks out of the static debug.verify option and into the correct option, debug.runtimeVerify
Fixed an issue where registers were not dumped when the supervisor context triggered an exception
Fixed incorrect HSP group creation when EngineOptions changed between compilation and executable loading
Fixed a use-after-free error in getTargetSystemString
Fixed an issue in the merge variables pass where merging was being treated incorrectly as bidirectional
Fixed an error being thrown when using large, multi-buffered data streams
Fixed a use-after-free in the exchange scheduler
Other improvements
Debug information is no longer retained in host memory when it is not needed
Rearrangements are no longer introduced for global exchanges if the tile mapping already satisfies the constraints
Improved the choice between east-then-west and west-then-east ordering in gateway exchange to increase bandwidth
Added support for generating sans instructions during gateway exchange
Optimised host memory usage with better data layout in internal data structures
Better information provided to the user when a race condition in a compute set is detected
Provided a timeout mechanism for when workers do not update HSP
Enabled write combining for data streams
Extended the set of compilation passes that support compiling IPUs separately, to reduce host memory consumption
Made poplar::Target hashable and serializable
Better debug information is provided when host and remote buffers fail to allocate
Allowed engines to share an executable
Extended poplar::Type to provide information about whether a type is a floating point type
Extended IO overlap support to include loops with static counts
The number of workers is now available to codelets as a constexpr value
Added repeat count for loops to the program tree in the profiler tools
Added an optimisation pass to remove regions at WriteUndefs that are not live
Extended overlay support to also include host and external exchange code
All device-to-host RDMA transfers are now started at once before waiting for their completion
Added an optimisation pass to merge cross replica copies where possible
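The RDMA change listed above follows a general pattern: issue every asynchronous transfer first, then wait, so the transfers can overlap instead of running one after another. The standard C++ sketch below illustrates only that pattern using std::async; copyToHost and copyAllToHost are hypothetical stand-ins for device-to-host transfers and are not part of Poplar's API or a description of its implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Hypothetical stand-in for one device-to-host transfer (not a Poplar
// function): copies a buffer and returns the number of bytes moved.
std::size_t copyToHost(const std::vector<char>& src, std::vector<char>& dst) {
  dst.assign(src.begin(), src.end());
  return dst.size();
}

// Start every transfer before waiting on any of them, so they can overlap.
std::size_t copyAllToHost(const std::vector<std::vector<char>>& srcs,
                          std::vector<std::vector<char>>& dsts) {
  // Size the destinations up front so references stay valid while tasks run.
  dsts.resize(srcs.size());
  std::vector<std::future<std::size_t>> pending;
  pending.reserve(srcs.size());

  // Phase 1: launch all transfers; each task writes a distinct dsts[i].
  for (std::size_t i = 0; i < srcs.size(); ++i) {
    pending.push_back(std::async(std::launch::async, copyToHost,
                                 std::cref(srcs[i]), std::ref(dsts[i])));
  }

  // Phase 2: only now wait for completion, totalling the bytes moved.
  std::size_t total = 0;
  for (auto& f : pending) {
    total += f.get();
  }
  assert(dsts.size() == srcs.size());
  return total;
}
```

Waiting inside the first loop instead would serialise the transfers, which is the behaviour the change avoids.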
2.5.1
New features
Added support for storing code off-chip during model execution (initial implementation only supports internal exchange code)
Compile time improvements
Drastically reduced the amount of host memory needed when compiling very large models. Most of these optimisations are enabled by default. There is a new experimental Poplar Engine option that allows compilation to be serialised - you can specify the number of tiles for which lowering is done concurrently.
Added support for gp files to contain different configurations for the same architecture (for example, debug and release codelets)
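The serialised-lowering option described above trades compile time for host memory by bounding how many tiles are lowered at once. The standard C++ sketch below shows that batching idea under stated assumptions: lowerTile, lowerAllTiles and ConcurrencyTracker are hypothetical names, the per-tile work is a placeholder, and the real option's name and implementation are not shown here.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <functional>
#include <future>
#include <vector>

// Hypothetical bookkeeping used to observe how many tasks run at once.
struct ConcurrencyTracker {
  std::atomic<int> active{0};
  std::atomic<int> peak{0};
};

// Hypothetical per-tile work standing in for "lower one tile".
int lowerTile(int tile, ConcurrencyTracker& tracker) {
  int now = ++tracker.active;
  int seen = tracker.peak.load();
  // Raise the recorded peak if this task pushed concurrency higher.
  while (now > seen && !tracker.peak.compare_exchange_weak(seen, now)) {
  }
  int result = tile * 2;  // placeholder for the real per-tile output
  --tracker.active;
  return result;
}

// Lower tiles in batches of at most maxConcurrent, finishing each batch
// before starting the next so that peak memory use stays bounded.
std::vector<int> lowerAllTiles(int numTiles, int maxConcurrent,
                               ConcurrencyTracker& tracker) {
  assert(numTiles >= 0 && maxConcurrent > 0);
  std::vector<int> results(numTiles);
  for (int start = 0; start < numTiles; start += maxConcurrent) {
    int end = std::min(numTiles, start + maxConcurrent);
    std::vector<std::future<int>> batch;
    for (int t = start; t < end; ++t) {
      batch.push_back(std::async(std::launch::async, lowerTile, t,
                                 std::ref(tracker)));
    }
    for (int t = start; t < end; ++t) {
      results[t] = batch[t - start].get();
    }
  }
  return results;
}
```

A smaller maxConcurrent lowers peak memory at the cost of less parallelism, which mirrors the trade-off the option exposes.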
Bug fixes
Fixed some private symbols leaking from libpoplar.so
Fixed a deadlock that can happen when stream callbacks don’t progress
Fixed an issue where pipeline stages would sync and run serially when profiling
Fixed a crash that could happen when creating the profile file
Fixed an issue where DELTANELEMENTS would cause a codelet to be mistakenly identified as a recursive function
Fixed a liveness issue from stream copy splitting that caused a variable to be always live
Fixed an issue where PrintTensor programs did not work for multi-ILD targets
Fixed an issue where unused constants could still be allocated on the device
Provided error handling for missing stream callbacks rather than crashing
Provided error handling for invalid codelet types (for example, 3D vectors) rather than crashing
Stopped the worker register dump from being logged twice on an exception
Fixed broken links in the user guide and API documentation
Changed the permissions of the archive to allow it to be read by the tools
Other improvements
Removed the old and deprecated profile formats
Better error handling when passing a null pointer into Graph::addConstant
Added an option to write the Poplar log to the system log
Attached user source location to Poplar exceptions
Added methods to hash the environment and engine options for a compilation
Symbols are now always output in the ELF when the user saves the archive
Compressed the final executable to drastically reduce its size
Added an option to write NaNs into dead tensors to help debug WriteUndef issues
Improved codelet code generation in the compiler
Added documentation for engine options that control which exceptions are enabled
Better error message when POPLAR_ENGINE_OPTIONS is an invalid JSON string
Improved documentation for which types are supported
Improved documentation on MultiVertex and, in particular, a race condition that is possible if it is used incorrectly
Improved the explanation of the different syncConfiguration options in the user guide
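The NaN option for dead tensors listed above relies on a simple idea: if memory that should never be read is filled with NaN, any computation that reads it by mistake produces NaN, making the bug visible instead of silently using stale data. The standard C++ sketch below illustrates that idea on a host buffer; poisonDeadRegion and sumRegion are hypothetical helpers, not Poplar APIs, and the sketch does not describe how Poplar implements the option.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: overwrite the "dead" range [begin, end) with quiet
// NaNs so that any later read of it is easy to spot.
void poisonDeadRegion(std::vector<float>& buf, std::size_t begin,
                      std::size_t end) {
  assert(begin <= end && end <= buf.size());
  for (std::size_t i = begin; i < end; ++i) {
    buf[i] = std::numeric_limits<float>::quiet_NaN();
  }
}

// Hypothetical consumer: sums [begin, end). If it touches a poisoned
// element, the NaN propagates into the result.
float sumRegion(const std::vector<float>& buf, std::size_t begin,
                std::size_t end) {
  float total = 0.0f;
  for (std::size_t i = begin; i < end; ++i) {
    total += buf[i];
  }
  return total;
}
```

Summing only live elements gives an ordinary number, while a sum that strays into the poisoned range returns NaN, which std::isnan can detect.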