5.1.7. Poplar changelog

2.6.0+5997

New features

  • Added support for 8-bit floating point types in Poplar

  • Added support for compiling into a subset of IPU memory

  • Added the ability to allocate tensors in external memory

  • Added modular compilation support and the ability to execute code that lives in a tensor

  • Added experimental support for automatic transpose detection in Poplar’s copy library

  • Added a lightweight profiling mode (see the tutorial)

  • Added JSON as a way to describe codelet state, so C++ codelets are no longer required
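This changelog does not describe the bit layout of the new 8-bit floating point types. As a hedged illustration of what an 8-bit float encodes, the sketch below decodes one common layout: 1 sign bit, 5 exponent bits and 2 mantissa bits with an exponent bias of 15 (an IEEE-style "E5M2" layout). It is an assumption for illustration only and is not claimed to match Poplar's encoding, special-value handling or scaling. The function name `decodeE5M2` is hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: decodes one byte in an IEEE-style E5M2 layout
// (1 sign bit, 5 exponent bits, 2 mantissa bits, bias 15).
// Assumption: generic illustration, not Poplar's own 8-bit type.
float decodeE5M2(std::uint8_t bits) {
  const int sign = bits >> 7;
  const int exponent = (bits >> 2) & 0x1F;  // 5 exponent bits
  const int mantissa = bits & 0x3;          // 2 mantissa bits
  float magnitude;
  if (exponent == 0) {
    // Subnormal: no implicit leading 1, fixed exponent of (1 - bias).
    magnitude = std::ldexp(mantissa / 4.0f, 1 - 15);
  } else if (exponent == 0x1F) {
    // All-ones exponent: infinity if the mantissa is zero, otherwise NaN.
    magnitude = mantissa == 0 ? std::numeric_limits<float>::infinity()
                              : std::numeric_limits<float>::quiet_NaN();
  } else {
    // Normal number: implicit leading 1, unbiased exponent (exponent - 15).
    magnitude = std::ldexp(1.0f + mantissa / 4.0f, exponent - 15);
  }
  return sign ? -magnitude : magnitude;
}
```

For example, `decodeE5M2(0x3C)` returns `1.0f` (exponent field 15, mantissa 0). With only two mantissa bits, each power-of-two interval holds just four representable values, which is why 8-bit formats usually rely on some form of scaling in practice.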

Bug fixes

  • Fixed an underflow bug that caused stream copies to be represented incorrectly in the profiler

  • Fixed the wrong exception type being thrown for a class of errors

  • Added checks to prevent segfaults when poplar::Tensor objects are used incorrectly

  • Prevented sync lookahead from erroneously occurring when it has been explicitly disabled

  • Fixed a crash in cycle estimation in a virtual graph

  • Moved runtime checks out of the static debug.verify option and into the correct option, debug.runtimeVerify

  • Fixed an issue where registers were not dumped when the supervisor context triggered an exception

  • Fixed incorrect HSP group creation when EngineOptions change between compilation and executable loading

  • Fixed a use-after-free error in getTargetSystemString

  • Fixed an issue in the merge variables pass where merging was incorrectly treated as bidirectional

  • Fixed an error being thrown when using large, multi-buffered data streams

  • Fixed a use-after-free in the exchange scheduler

Other improvements

  • Debug information is no longer retained in host memory when it is not needed

  • Don’t introduce rearrangements for global exchanges if the tile mapping already satisfies the constraints

  • Improved the choice between east-then-west and west-then-east ordering in gateway exchange to increase bandwidth

  • Added support for generating sans instructions during gateway exchange

  • Optimised host memory usage with better data layout in internal data structures

  • Better information provided to the user when a race condition in a compute set is detected

  • Provided a timeout mechanism for when workers do not update the HSP

  • Enabled write combining for data streams

  • Extended support for compiling IPUs separately to more compilation passes, to reduce host memory consumption

  • Made poplar::Target hashable and serializable

  • Better debug information provided when host and remote buffers fail to allocate

  • Allowed engines to share an executable

  • Extended poplar::Type to provide information about whether a type is a floating-point type

  • Extended IO overlap support to include loops with static counts

  • The number of workers is now available to codelets as a constexpr value

  • Added repeat count for loops to the program tree in the profiler tools

  • Added an optimisation pass to remove regions at WriteUndefs that are not live

  • Extended overlay support to also include host and external exchange code

  • All device-to-host RDMA transfers are now started at once, before waiting for any of them to complete

  • Added an optimisation pass to merge cross replica copies where possible
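The RDMA change in this list follows a general "issue everything, then wait" pattern: instead of starting one transfer and blocking on it before starting the next, all transfers are started first so that they can overlap. The sketch below shows that pattern with `std::async`. It does not use Poplar or any RDMA API; `copyChunk` is a hypothetical stand-in for a single device-to-host transfer.

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for one device-to-host transfer; returns bytes moved.
std::size_t copyChunk(std::size_t bytes) { return bytes; }

// Starts every transfer before waiting on any of them, then collects results.
std::size_t transferAll(const std::vector<std::size_t>& sizes) {
  std::vector<std::future<std::size_t>> pending;
  pending.reserve(sizes.size());
  // Phase 1: launch all transfers without blocking.
  for (std::size_t s : sizes)
    pending.push_back(std::async(std::launch::async, copyChunk, s));
  // Phase 2: only now wait for each transfer to complete.
  std::size_t total = 0;
  for (auto& f : pending) total += f.get();
  return total;
}
```

Waiting inside the first loop would serialise the transfers; splitting launch and wait into two phases lets their latencies overlap.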

2.5.1

New features

  • Added support for storing code off-chip during model execution (initial implementation only supports internal exchange code)

  • Compile-time improvements: drastically reduced the amount of host memory needed when compiling very large models. Most of these optimisations are enabled by default. A new experimental Poplar Engine option allows compilation to be serialised by specifying the number of tiles for which lowering is done concurrently.

  • Added support for gp files to contain different configurations for the same architecture (for example, debug and release codelets)
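The experimental option for serialising compilation bounds how many tiles are lowered at once, trading compile time for lower peak host memory. The sketch below illustrates that bounding idea generically: tiles are processed in batches of at most `maxConcurrent`, and each batch is finished before the next starts. The option's real name and Poplar's internal lowering are not described here; `lowerTile` and `lowerInBatches` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical per-tile lowering step; here it just returns the tile id.
std::size_t lowerTile(std::size_t tile) { return tile; }

// Lowers tiles [0, numTiles) with at most maxConcurrent tiles in flight.
// Appends results to `lowered` and returns the number of batches used.
std::size_t lowerInBatches(std::size_t numTiles, std::size_t maxConcurrent,
                           std::vector<std::size_t>& lowered) {
  if (maxConcurrent == 0) maxConcurrent = 1;  // avoid an infinite loop
  std::size_t batches = 0;
  for (std::size_t start = 0; start < numTiles; start += maxConcurrent) {
    const std::size_t end = std::min(numTiles, start + maxConcurrent);
    std::vector<std::future<std::size_t>> inFlight;
    for (std::size_t t = start; t < end; ++t)
      inFlight.push_back(std::async(std::launch::async, lowerTile, t));
    // Finish the whole batch before starting the next one.
    for (auto& f : inFlight) lowered.push_back(f.get());
    ++batches;
  }
  return batches;
}
```

With 10 tiles and a limit of 4, this runs three batches (4, 4 and 2 tiles). A smaller limit lowers peak memory use at the cost of less parallelism.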

Bug fixes

  • Fixed some private symbols leaking from libpoplar.so

  • Fixed a deadlock that can happen when stream callbacks do not make progress

  • Fixed an issue where pipeline stages would sync and run serially when profiling

  • Fixed a crash that could happen when creating the profile file

  • Fixed an issue where DELTANELEMENTS would cause a codelet to be mistakenly identified as a recursive function

  • Fixed a liveness issue from stream copy splitting that caused a variable to always be live

  • Fixed an issue where PrintTensor programs did not work for multi-ILD targets

  • Fixed an issue where unused constants could still be allocated on the device

  • Provided error handling for missing stream callbacks rather than crashing

  • Provided error handling for invalid codelet types (for example, 3D vectors) rather than crashing

  • Stopped the worker register dump from being logged twice on an exception

  • Fixed broken links in the user guide and API documentation

  • Changed the permissions of the archive to allow it to be read by the tools

Other improvements

  • Removed the old and deprecated profile formats

  • Better error handling when passing a null pointer into Graph::addConstant

  • Added an option to send the Poplar log to the system log

  • Attached user source location to Poplar exceptions

  • Added methods to hash the environment and engine options for a compilation

  • Always output symbols in the ELF when the user saves the archive

  • Compressed the final executable to drastically reduce its size

  • Added an option to write NaNs into dead tensors to help debug WriteUndef issues

  • Improved codelet code generation in the compiler

  • Added documentation for engine options that control which exceptions are enabled

  • Better error message when POPLAR_ENGINE_OPTIONS is an invalid JSON string

  • Improved documentation of which types are supported

  • Improved documentation on MultiVertex and, in particular, a race condition that is possible if it is used incorrectly

  • Improved explanation of the different syncConfiguration options in the user guide
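Hashing the environment and engine options gives a compact key for checking whether two compilations used the same settings. The names and signatures of Poplar's own hashing methods are not given in this changelog, so the sketch below is a generic, hypothetical `hashOptions` helper. It hashes key/value pairs held in a `std::map`, whose sorted iteration makes the result independent of the order in which options were set, and it uses a Boost-style hash-combining step.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Boost-style hash-combining step.
std::size_t hashCombine(std::size_t seed, std::size_t value) {
  return seed ^ (value + 0x9e3779b9 + (seed << 6) + (seed >> 2));
}

// Hypothetical helper: hashes option key/value pairs. std::map iterates in
// sorted key order, so insertion order does not affect the result.
std::size_t hashOptions(const std::map<std::string, std::string>& options) {
  std::hash<std::string> hasher;
  std::size_t seed = 0;
  for (const auto& [key, value] : options) {
    seed = hashCombine(seed, hasher(key));
    seed = hashCombine(seed, hasher(value));
  }
  return seed;
}
```

`std::hash` values are only guaranteed to be stable within one program run, so a hash intended for use across processes or machines (for example, as a cache key) would need a fixed, well-specified hash function instead.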