5.1.6. PopART changelog

2.6.0+5997

New features

  • Improvements for explicit pipelining (support for overlapped IO)

  • Support half-precision tensors in scaledVarUpdate

  • Allow AddArg0GradOp to change its output tensor type after construction

  • Improved support for gradient clipping in accumulate outer fragment paralleliser transform

  • Add execution context constraints in AliasModelGrower

  • Add RoiAlign operation

  • Add init_type to ops.init to allow for uninitialised tensors (see the example after this list)

  • Allow AiGraphcoreOpset1::Reshape to use a -1 dimension

  • Add AiGraphcoreOpset1::slice to mimic Slice-1

  • Improved implementation of resize gradient reduceDimension

  • Separate load and store landing pad tensors for remote exchanges when required

  • Improved support for int16/uint16

  • Use LeakyRelu output tensor instead of input to compute the gradient

  • Add transform to back up inplace-updated tensors when they are required for recomputation

  • Add step to verify that users aren’t using modifying (inplace) operations in autodiff

  • Add support for custom programs, introduce special custom program for implicit pipeline forward only (experimental)

  • Add pass argument to in_sequence in PopXL, to allow transforms to add topological constraints after an operation is created

  • Add shape inference to tensor remap operation

  • Enable profiling of cached executables. See PopVision documentation

  • Add code loading to PopXL. See Graphs

  • Add custom operation support to PopXL. See Custom operations

  • Add tanh, conv, averagepool, argmin, argmax, exp, histogram, sqrt, maximum, log, onehot and roialign to PopXL. See Supported operations

  • Add negative log likelihood loss in PopXL

  • Add per-replica variable initialisation and retrieval to PopXL. See Replication

  • Add support for torch input tensors in PopXL

  • Improved device management in PopXL

  • Add .vscode workspace file

  • Add argument type check to popxl.Session.get_tensors_data in PopXL

  • Add support for enabling engine caching via the POPXL_CACHE_DIR environment variable (see the example after this list)

  • Avoid use of deprecated variables in GCL

  • Add “zeroInfinity” option and support for the plan option flag “enableReducedClassesInLabel” to the CTC operation

  • Add ability to run CTC operation in validation mode

  • Switch to new collective balanced reorder API

  • Add “disableOptimizerStateTensorStreams” option to selectively disable streaming and storing of optimizer tensors

  • Improved handling of setting weights from host and loading weights to the host in PopXL via context managers
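
The init_type bullet above refers to the PopXL init operation. A minimal sketch, assuming ops.init accepts the string values "zero" and "undef" for init_type; the tensor names and shapes are illustrative:

    import popxl
    import popxl.ops as ops

    ir = popxl.Ir()
    with ir.main_graph:
        # Zero-initialised tensor (the existing behaviour).
        zeros = ops.init((4, 4), popxl.float32, "zeros", init_type="zero")
        # Uninitialised tensor: no zeroing code is generated, so the
        # contents are undefined until the tensor is first written.
        scratch = ops.init((4, 4), popxl.float32, "scratch", init_type="undef")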
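
Engine caching via the environment variable can be sketched as follows; that the variable must be set before the session is constructed is an assumption, and the path is illustrative:

    import os

    # Compiled executables are cached here; later runs of an unchanged
    # model can then skip Poplar compilation.
    os.environ["POPXL_CACHE_DIR"] = "/tmp/popxl_cache"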

Bug Fixes

  • Tidy up linting issues

  • Fix subgraph pruner used by autodiff

  • Fix allreduce logic

  • Fix autodiff bugs

  • Fix bug related to change in collective balanced reorder padding behaviour

  • Fix missing RTSGroup error

  • Fix bug in equal in PopXL

  • Fix partialTypeMatMuls support in PopXL

  • Fix for torch linear mode test

  • Fix for missing pipelineStage attribute

  • Reload engine and connect streams on every re-attach through the popxl.Session context manager (see the example after this list)

  • Fix bug where VariableSettings CommGroup is not respected by AllReduce in gradient and accumulator reduction

  • Fix for random number state management when replicated graph option is set

  • Fix alias zero copy setting verification

  • Fix non-determinism bugs in accumulate outer fragment paralleliser and multi collective transforms

  • Fix explicit recompute (annotation issue and recompute to recompute connections)

  • Fix for resize gradient operation

  • Fix mechanism to write variable data when using a cached binary in PopXL

  • Various fixes for executable caching (including missing tensors, random seed handling, and anchors)

  • Fix incorrect state-tensor initial vector dimensions

  • Revert inplace WhereOp to outplace when it is not parallel writable

  • Don’t overwrite the number of tiles to 4 when a custom IPU model config is used

  • Print before erasing to avoid a memory error
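
A sketch of the popxl.Session context-manager pattern that the re-attach fix above concerns; the device type and tensor names are illustrative:

    import numpy as np
    import popxl
    import popxl.ops as ops

    ir = popxl.Ir()
    with ir.main_graph:
        w = popxl.variable(np.zeros((2,), np.float32), name="w")
        out = popxl.d2h_stream(w.shape, w.dtype, name="w_out")
        ops.host_store(out, w)

    session = popxl.Session(ir, "ipu_model")
    with session:  # attach: engine loaded, host streams connected
        first = session.run()
    # Detached on exit. Re-entering the context re-attaches; with this fix
    # the engine is reloaded and the streams reconnected each time.
    with session:
        second = session.run()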

Optimisations

  • Reduce number of dummy graph objects constructed for lowering MatMul operations

  • Remove cases of a clone followed by a mapTensorLinearly

  • Remove unnecessary uses of mapTensorLinearly

  • Optimisations in parsing ONNX protobuf files

  • Speed up applyInplacePattern by ignoring graph.isSchedulable

  • Faster and more memory-efficient implementation of cubic resize

  • Take compilation-affecting engine options into account when calculating hashes for the purpose of executable caching

Logging and documentation

  • General improvement of the PopXL user guide (including sessions, remote variables, replica grouping, code loading, custom operations, links to related PyTorch, NumPy and ONNX operations, and an MNIST example)

  • General improvements to PopART API documentation

  • Error message improvements

2.5.1

New features

  • Add PopXL API (experimental; see the minimal example after this list)

  • Add support for RNN operator (preview)

  • Improvements to automatic loss scaling (experimental)

  • Add improved ability to manage PRNG behaviour across replicas (experimental)

  • Add ability to retrieve random seed

  • Add an overload of Builder.setAvailableMemoryProportion which can target multi-output nodes (see the example after this list)

  • Ensure initial inputs of gradient graphs match any user-specified provided gradients

  • Ensure outputs of gradient graphs match any user-specified required gradients

  • Add ability to run exported models using the Poplar Triton Backend via PopEF integration

  • Add visualisations of inplace-modified and aliased tensors, and of graph inputs and outputs, to the Dot visualizer

  • DynamicSliceOp and DynamicUpdateOp can drop the first dimension of the slice if it is 1

  • Support AnchorReturnType::Final in MainLoops transform

  • Improved replicated tensor sharding (RTS) compatibility for operations

  • Make gradient clipping compatible with replicated tensor sharding (RTS)

  • Improved linter support

  • Add ability to show the ONNX model proto as human-readable text

  • Various improvements to executable caching

  • Add ability to perform per-replica reads and writes of variable values

  • Improved quality of debug information

  • Use slice plan in SparseAccumulateOpx

  • Add ability to merge collective operations

  • Add ability to dynamically switch off the backwards pass when using implicit pipelining

  • Add ability to refresh engine cache on-the-fly
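
Since PopXL is introduced in this release, a minimal sketch of a PopXL program is given below, based on the documented host_load/host_store pattern; the names, shapes and device type are illustrative:

    import numpy as np
    import popxl
    import popxl.ops as ops

    ir = popxl.Ir()
    with ir.main_graph:
        # Stream a host input in, add a variable to it, stream the result out.
        x_stream = popxl.h2d_stream((2, 2), popxl.float32, name="x_in")
        x = ops.host_load(x_stream, "x")
        w = popxl.variable(np.ones((2, 2), np.float32), name="w")
        y = ops.add(x, w)
        y_stream = popxl.d2h_stream(y.shape, y.dtype, name="y_out")
        ops.host_store(y_stream, y)

    with popxl.Session(ir, "ipu_model") as session:
        outputs = session.run({x_stream: np.arange(4, dtype=np.float32).reshape(2, 2)})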
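
The setAvailableMemoryProportion overload can be sketched as below, assuming the Python binding mirrors the C++ signature that accepts a set of node output names; shapes are illustrative and the multi-output call is shown as a hypothetical:

    import popart

    builder = popart.Builder()
    x = builder.addInputTensor(popart.TensorInfo("FLOAT", [8, 16]))
    w = builder.addInputTensor(popart.TensorInfo("FLOAT", [16, 32]))
    y = builder.aiOnnx.matmul([x, w])

    # Existing form: set the proportion via a single node output.
    builder.setAvailableMemoryProportion(y, 0.3)

    # New overload (hypothetical usage): for a node with several outputs,
    # e.g. LSTM or TopK, pass the set of all of its output tensor ids.
    # builder.setAvailableMemoryProportion({out0, out1}, 0.3)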

Bug Fixes

  • Fix the logic that replaces DropoutOp with IdentityOp

  • Improve device handling in tests

  • Fix for potential deadlock condition in test runner

  • Fix in lowering logic for trailing subgraph parts that contain only calls to child subgraph parts

  • IdentityLossOpx no longer attempts to unwind when there is a reduction (previously this resulted in an error)

  • Fix subgraph autodiff logic

  • Allow CallOp to not have outputs connected for all of its callee outputs

  • Fix Python binding for DeviceManager::tryAttachUntilTimeout

  • Correctly promote inplace aliased and modified tensors through the Loop operation

  • Fix unwinding through multiple consecutive slice operations

  • Fix unwinding issue in MaxOpx

  • Enable bufferingDepth to be used when SessionOptions::enablePrefetchDatastreams isn’t set

  • Fix dtype clone in SparseAccumulateOpx::createInputTensor

  • Fix bug in ReplicatedTensorShardingTracer

  • Fix compile error if accl2 type is not FLOAT

  • Fix PowArg0GradOpPattern for fp16

Optimisations

  • Allow non-broadcasted indices as an input to the scatterreduce operation

  • Add ExpandCast pattern to reverse the order of an expand followed by a cast, reducing the memory footprint

  • Add inplace versions of WhereOp

  • Allow IdentityInplaceOp to unwind, reducing memory use when it cannot be made inplace

  • Split operators_test in two

  • Add TensorRemapOp for point-fixes of bad tensor layouts

  • Explicit recomputation support for pipelining

  • Alias zero copy tracks variables and multi-context tensors less conservatively

  • Improve graph traversal through loop-carried tensors

Logging and documentation

  • Add compile-time option to log device access events to a file

  • Improved CommGroupType::None comments

  • Fix code listings

  • Update to internal documentation build system

  • Various small user guide and API improvements

  • Add documentation on how to execute an imported model