Scope of this document

This document contains the Release notes for the Poplar SDK 2.2.0 for Graphcore’s IPU product family. The software deliverables covered by this document are the following:

Driver & Utilities

Driver and associated utilities needed by the Graphcore IPU.

PopART

The Poplar Advanced Runtime is a flexible ONNX-compatible runtime supporting both training and inference.

PopTorch

The PopTorch library provides a set of extensions for PyTorch to enable it to run on the Graphcore IPU hardware.

Poplar

A graph programming framework for the IPU.

PopDist/PopRun

Poplar Distributed Configuration Library (PopDist) is a library for configuring and coordinating distributed execution of (large-scale) machine learning applications.

TensorFlow

An implementation of the TensorFlow framework for the Graphcore IPU.

Package contents

The unified Poplar SDK download contains the following packages:

Ubuntu 18.04

Package              Version
Driver & Utilities   1.0.52
PopART               2.2.0+166889
PopTorch             2.2.0+22705
Poplar               2.2.0+166889
PopDist/PopRun       2.2.0
TensorFlow 1         Graphcore TensorFlow 2.2.0
TensorFlow 2         Graphcore TensorFlow 2.2.0

CentOS 7.6

Package              Version
Driver & Utilities   1.0.52
PopART               2.2.0+166889
PopTorch             2.2.0+22705
Poplar               2.2.0+166889
PopDist/PopRun       2.2.0
TensorFlow 1         Graphcore TensorFlow 2.2.0
TensorFlow 2         Graphcore TensorFlow 2.2.0

Note

See Appendix A for additional TensorFlow requirements.

Product support and compatibility matrix

SUPPORTED
These products are actively worked on: they will receive new features, general updates and security updates.
Notice of deprecation will be sent in advance for supported products.

DEPRECATED
These products will only receive security updates.
These products are expected to work with the indicated products; however, correctness is not guaranteed.
It is advised not to upgrade to this software version unless strictly necessary.
In the future, these products can move to a Not Supported state without further notice.
The support level will reflect the deprecated status.

NOT SUPPORTED
These products are not expected to work with this release.
No support will be provided.

Important

Deprecated products can be moved to a Not supported status without further notice.

IPU-M2000 System Software compatibility matrix

IPUM Model           Version   Support level   Notes
IPU-M2000 300-0024   2.2.0     Supported       N/A

IPU PCIe Hardware Support level

Model         Revision        ICU Firmware version   Driver version   Support level   Notes
C2 300-0004   All revisions   1.4.14                 1.0.52           Supported

Note

Use the firmware revision that corresponds to the IPU revision.

Important

For Firmware revision, compatibility is only enforced for patch versions.

Driver Support level

OS                  Support level   Supported Kernel Version   Notes
CentOS 7.4/7.5      Supported       3.10                       CentOS LTS kernel.
CentOS 7.6          Supported       3.10                       CentOS LTS kernel.
Microsoft Windows   Supported       Windows Server 2019
Ubuntu 18.04        Supported       5.4                        Ubuntu LTS kernel.

Warning

It is strongly recommended to update the driver's kernel module to the version included with this 2.2.0 release.
This avoids incompatibilities with the non-kernel components of this SDK.

SDK 2.2.0 Support level

OS                  Support level   Notes
Microsoft Windows   Not Supported
CentOS 7.6          Supported       In some specific instances, model compilation time was less than optimal; investigations are ongoing to address the problem.
Ubuntu 18.04        Supported

Supported toolchain

Ubuntu 18.04

Tool        Support level   Version
GCC/G++     Supported       7.2.0
libstdc++   Supported       6.0.24
libc        Supported       2.27
binutils    Supported       2.30

CentOS 7.6

Tool        Support level   Version
GCC/G++     Supported       7.3.1
libstdc++   Supported       6.0.24
libc        Supported       2.17
binutils    Supported       2.28

Supported tools

Tool            Support level   Version
Python          Supported       3.6
Boost library   Deprecated      1.70

Older versions of boost are no longer supported.

List of changes

The following sections list the changes in version 2.2.0, as well as in older releases, for all products contained in the Poplar SDK.
There are three main sections, divided by topic:
Changelogs

The Changelogs section lists important bug fixes and relevant functionality that has been added. Minor fixes or features are not listed.

Known issues

The Known issues section lists all important issues known to date that impact Poplar functionality.

Compatibility changes

The Compatibility changes section captures any changes that need to be applied to existing code to remain compatible with this version of the SDK.

Changelogs

Product              Changelog
Driver & Utilities   Changelog Driver & Utilities
PopART               Changelog PopART
PopTorch             Changelog PopTorch
Poplar               Changelog Poplar
Poplar Libraries     Changelog Poplar Libraries
GCL                  Changelog GCL
PopDist/PopRun       Changelog PopRun/PopDist
Libpva Library       Changelog Libpva Library
TensorFlow           Changelog TensorFlow

Driver & Utilities Changelog

Kernel Module

1.0.52

  • T38126: Improved error handling when IPU cards are disconnected.

  • T36583: Avoid clearing PL DDR on docker restart.

  • T36158: Add a mechanism for measuring IPU utilisation and mark count monitoring.

  • T34827: Detect attachBuffer() failures.

  • T30346: Display AER error counts in gc-hosttraffictest on M2000.

1.0.51

  • T36583: Partial support to avoid clearing PL DDR on docker restart

  • T30346: Partial support to display AER error counts in gc-hosttraffictest on M2000

  • T34827: Fixed error when detaching buffers in the PCIe driver.

  • T36158: Add a mechanism for measuring IPU utilisation and mark count monitoring.

  • T38126: Improved error handling when IPU cards are disconnected.

Low level libraries and tools

2.2.0+166889

  • T25931: Support reading from the middle of a stream by limiting the bytes read when reading binaries.

  • T30865: Logging now uses ISO8601 UTC timestamps and %p in GCDA_LOG_DEST will be replaced with the process ID.

  • T32069: Improved consistency of PCIe Id / PCI id terminology within gc-info.

  • T36207: gc-hostsynclatencytest fixes for native IPU-M.

  • T38422: Fixed boost exception on application shutdown when generating PVTI and using GCDA_MONITOR.

  • T39043: Fix missing power and temperature fields when requesting all fields in gc-monitor.

  • T39458: Bind IPUoF-servers cq_handler to dedicated CPU and use CQ polling mode for all IPUoF-servers.

  • T39680: Improved IPUoF latency and avoid spikes in IPUoF HSP update.

  • T39698: Improved performance when checking for SoC errors via GCDA.

  • T39887: Enhanced gc-hostsynclatencytest to output additional statistics.

  • T39891: Fixed GCDA device discovery from multiple threads.

  • T39956: IPUoF client detaches from device after RDMA fabric error if IPUoF server is reachable.

  • T40048: Enhanced gc-binary error information on failure.

  • T40067: GCDA environment variables are now ignored, if present but set to zero or empty.

  • T40430: Fixed ICU comms lockup during multi-threaded use of GCDA_LOGGING.

  • T40567: Reduced the thresholds for IPU clock throttling log messages.

  • T40717: Improve handling of IPUoF exceptions within GCDA.

  • T41042: Removed the call to gc-inventory from the GCIPUINFO library.

  • T41365: PVTI documentation improvements.

  • T41418: Fix a race condition between RDMA disconnect and HSP update.

  • T41427: Fix incorrect statistics in gc-iputraffictest when using a large number of iterations.

  • T41556: The LIBPVTI and LIBPVA documentation has been split out from the Poplar documents to separate documents for each library.

  • T41641: gc-powertest now supports finer power level control with -p option.

  • T41779: Added gc-flops, a tool to measure floating point chip performance (Mk2).

  • T41868: Fixed invalid register field check when checking for errors via $CMGMTEVVR.

  • T41882: Reduce IPUoF initial latency in HSP update.

  • T41936: Support JSON output for gc-info device status commands.

  • T42053: Show state key in gc-info --tile-overview.

  • T42097: Replace gc-inventory-based gcipuinfo library with interface to GCDA.

  • T42099: Added Python and Go support to the pure API variant of gcipuinfo.

  • T42190: Avoid printing duplicate board temperature and power in process table for gc-monitor.

  • T42369: Extended gc-flops timeout.

  • T42445: Added an interface to GCDA to expose the IPU chip ID.

  • T42502: Ensure PVTI generation is completely disabled when PVTI_OPTIONS={"enable":"false"}.

  • T42557: Improved classification of different logging levels in GCDA and IPUoF.

  • T42560: Added JSON support to gc-flops.

  • T42632: Improved IPUoF logging format.

  • T42708: When a Newmanry configuration is supplied during attach, the IPU will be reset.

  • T42822: Link training failures report “Link Training Error” rather than “setupChassis failed”.

  • T43014: Fix lockup when running with GCDA_MONITOR over IPUoF.

2.1.0

  • T38422: Fixed boost exception on application shutdown.

  • T40362: Modification of ICU comms to avoid race condition during intensive IPU sync conditions.

  • T39479: IPU utilisation, power and temperature sensor data added to PVTI.

  • T39908: Avoid parsing invalid driver version field when running gc-monitor over fabric.

  • T39773: Set correct PHY equalisation settings.

  • T35524: Add [parent.device] style for MultiIPU IDs in GCDA logging statements.

  • T39393: Improve tile parity reset speed for C200 and M2000 based systems.

  • T39068: Fix gc-info memory dump when using debug server.

  • T38926: Fixed gc-inventory error message and exit code when Fabric device discovery fails.

  • T39329: Make gc-gwlinkstraffictest timeout configurable.

  • T38738: Fixed gc-inventory JSON output when no devices are found.

  • T32643: Reduce docker image size of ipuof server.

  • T38981: Introduce non-blocking CQ event mode.

  • T39375: Fixed a busy loop in IPUoF server when client application was terminated during QP initialisation.

  • T39164: Fixed segfault during IPUoF device discovery.

  • T37392: re-create IPUoF-server QPs for each run.

  • T38090: Add support for runtime enabling and disabling of graphs and series in PVTI.

  • T39608: Add C++ example program to gcipuinfo library.

  • T25931: Support reading from the middle of a stream by limiting the bytes read when reading binaries.

  • T36157: Report Poplar MultiIPU application code, data and stack sizes per IPU.

  • T36158: Add a mechanism for measuring IPU utilisation and mark count monitoring.

  • T34827: Detect attachBuffer() failures.

  • T30346: Display AER error counts in gc-hosttraffictest on M2000.

  • T35916: Add gcipuinfo library.

  • T36683: Add documentation for the gateway links test.

  • T28235: Add more IPU-Machine specific details to GCDA and command line tools documentation.

  • T32019: Removed tools zip file inside the same tools zip file.

  • T35962: Permit loading of first kilobyte of memory when using the secondary IPU bootloader.

  • T31166: Added documentation for gc-exchangewritetest.

  • T32949: Enhanced gc-iputraffictest features (MultiIPU, iterations, sync).

  • T35327: Added additional libs to support external ICUComms linkage.

  • T35012: Document gc-info --tile-overview.

  • T34666: Added libpvti headers to command line tools install.

  • T38727: Device attributes are now generated from a common JSON description file.

  • T38629: Harmonise and improve device attribute labels. The previous attribute labels are still supported for backwards compatibility.

  • T36630: Device attributes are now specified in a common header file.

  • T37935: Change default sync method for C2 and C200 from hybrid to polling.

  • T38378: Fix parity reset for first memory element.

  • T36687: Add GCDA_SWAP_NLC_MODE environment variable for internal testing.

  • T38203: Fix IPUDebug::initIPURegs initialisation of worker registers.

  • T36694: Add support for recording IPU telemetry for the system analyzer.

  • T37772: “IPU” index attribute on IPU-Machines is now 0-3.

  • T37753: Fix Python checkForSOCErrors function.

  • T33826: Added isIPUMachineGateway() convenience method.

  • T35781: Fixed a segmentation fault.

  • T35490: Improve ICU attach failure reporting.

  • T37275: Fixed gc-info --show-insn.

  • T37104: Updated gc-info --ipu-arch to stop it attaching to the IPU.

  • T36628: IPU architecture information resolved during device discovery instead of attach.

  • T35955: Rename some options/fields for better clarity.

  • T35954: In gc-monitor, display tile memory usage per IPU in terms of percentage of total tile-memory.

  • T28548: Add support for all Mk2 sync zones.

  • T30698: Use shared contiguous buffer on MultiIPU devices.

  • T31712: Enhanced support for contiguous buffers.

  • T33010: Improved binary load performance via parallelism in the bootloader.

  • T36330: Support bootloading tile binaries less than 16K.

  • T36023: Build system enhancement to locate IPU test binaries.

  • T34815: Support staged IPU reset across multiple applications.

  • T35543: Fixed clearing of XB debug state registers.

  • T33617: Improve GCDA and IPUoF error reporting.

  • T35778: Enhance bootloader to prevent tile column data leak through buffers.

  • T35651: Ensure mirrorBuffer has completed before starting next bootloader transfer.

  • T34986: Improved the interface to the tile overview state functions.

PopART Changelog

2.2.0+166889

New features

  • Added explicit pipelining IR support (experimental).

  • Added overlapping device side IO support (experimental).

  • Script added to summarise op constructor info for use in upcoming API additions.

  • Add low-level python bindings for upcoming API additions.

  • Improve type-hints for the PopART Python module.

  • Add demos for creating training and inference model directly in the PopART IR.

  • Add a transform which will enable the tracking of user-specified tensor gradients when training with automatic loss scaling.

  • Add static factory function to ClipNormSettings to allow clipping all weights in a model.

  • Add a way to globally set the matmul options. Use SessionOptions::matmulOptions = std::map<std::string, std::string> and refer to the matmul() section in the Poplar documentation for available options (see the sketch after this list).

  • Add PackedDataBlockOp for working with packed sequences of data.

  • ResizeOp, support for sizes input.

  • ResizeOp, support for modes linear and cubic.

  • ResizeOp, support for coordinate_transformation_mode attribute.

  • ResizeOp, extend support for nearest_mode attribute.

  • Support gradient clipping for Lamb optimizer.

  • Remove ONNX and Protobuf from public headers and their targets from CMake export.

  • Support using Lamb optimiser on weights that have been serialised, using LambSerialisedWeight pattern.

  • Add python bindings for poplar_recoverable_runtime_error, poplar_unrecoverable_runtime_error, and poplar_application_runtime_error.

  • Implement (without an ONNX builder binding) TiedGatherOp and TiedGatherGradOp, see TiedGatherPattern for details.
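
A minimal sketch of setting the global matmul options mentioned above, assuming the PopART Python bindings expose SessionOptions::matmulOptions as a dict mirroring the C++ field; the option name shown is a Poplar matmul option used purely for illustration:

    import popart

    opts = popart.SessionOptions()
    # Assumption: matmulOptions is available on the Python SessionOptions object.
    # "availableMemoryProportion" is an illustrative Poplar matmul option.
    opts.matmulOptions = {"availableMemoryProportion": "0.6"}
    # opts can then be passed to a popart.TrainingSession or popart.InferenceSession.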

Bug Fixes

  • Check input intersection in ConcatOp::bwdRegMap.

  • Add checks to ensure RTS propagation through RemoteLoad/RemoteStore is safe.

  • Support MultiExchange partial lowering to avoid circular task dependencies.

  • Avoid inplacing of ops which might result in race conditions.

  • Use the correct data type in pooling ops when using fp16. Previously the partials were left in float32; this is now handled by Poplar.

  • Fix auto-diff transform bug.

  • Avoid annotating priorities in graph scheduler when non-optimal schedule is requested.

  • Clear PathFromLoss on cloned op when op sharding.

  • Modify current pybind11 bindings to allow for upcoming API additions.

  • Change CastOp::getGradOps() to only add gradients to the backward graph if input type is floating point.

  • Allow StepIO runtime assertions when an IR has been built without an ONNX model.

  • Fix NaN-loss when training with automatic loss scaling.

  • Organise source and test directories for upcoming API additions.

  • Fix type mismatch with float16 optimizer state.

  • Fix false positive matches in executable cache with different optimizer hyper-parameters.

  • Fix issues found when using gradient clipping with serialized matmuls.

  • In CMake, fix bug where there was a missing target dependency.

  • Add schedulePriority to Op attributes when serialising Ir with JSON.

Optimisations

  • Use embedding planner in ScatterReduceGrad.

  • Improve performance of aliasing checks.

  • Improve recompute pruning for final forward pipeline stage.

  • Remove gradCast from SGD2Decompose.

  • SessionOptions::delayVarUpdates only takes effect if options explicitRecomputation and explicitMainLoops are both off, as otherwise the optimisation is not needed.

  • Add SessionOptions::scheduleNonWeightUpdateGradientConsumersEarly which, if VarUpdates are being delayed, ensures that Ops which consume gradients but are not for updating weights (like gradient accumulation AccumulateOp, automatic loss scaling HistogramOp) are still scheduled as early as possible.

  • TiedGather and TiedGatherAccumulate patterns, which apply various optimisations to scenarios where a weight is consumed by both a Gather and a MatMul, but transposed on one side; the Gather and MatMul are on different pipeline stages; and an optimiser with extra state tensors is being used (like SGD with momentum).

    • Disables Poplibs fully_connected_pass on MatMul as the resulting tile layout leads to less exchange in this scenario

    • Elides the grad sum accumulator tensor for the weight by accumulating directly into the optimiser state tensor

    • The optimiser state tensor is now consumed by two ops on different pipeline stages, so a stash and restore is introduced, but since both ops are on the same virtual graph (as they consume the same weight), we can elide this too.

    • Replace the GatherGradAccumulate with a single SparseAccumulate, eliding the extra dense tensor in between

    • Ensures the tile layouts of the weight and the gradient accumulator tensors are such that exchange is minimised during the weight update

Logging and documentation

  • Create a reserved prefix for AutomaticLossScaleProxy.

  • Add a script for measuring test coverage locally.

  • Updated readme to use correct version of pybind11.

  • Amend license information for suffixtree.

  • Fix compilation progress log messages.

  • Enhanced, comprehensive documentation of SGD optimizer and its implementation.

2.1.0

New features

  • Add explicit accumulation and step loop

  • Generalize the pruning algorithm

  • Refactor HostLoad and HostStore for use with explicit loops

  • Implement LoopScanOut pattern (ONNX scan out support)

  • Support Recompute checkpoints on the final forward pipeline stage

  • Add experimental option scaled_optimizer_state to Adam optimizer. Improves numerical stability of FLOAT16 accl1_type.

  • Allow CastGradOp in path of SerializeMatMul transform

  • Support setting activations for LSTMOp

  • Support gradient clipping for Adam and Lamb

  • Add an isDirectViewChain method for finding chains of inplace view changing ops.

  • Add additional checks and error messages to serialised matmuls. Behaviour is unchanged but the error messaging and checking is clearer.

  • Ability to use sequence_lens tensor in LSTM ops. See https://github.com/onnx/onnx/blob/master/docs/Changelog.md#LSTM-14 for details.

  • Added Host Store/Load ops with a transform; this is a precursor to overlapped IO.

  • Add new variant of SGD optimiser that has separate gradient accumulation and velocity tensors

  • Add enum SGDAccumulatorAndMomentum for selecting whether to use combined or separate gradient accumulation and velocity tensors

  • Add ability to create Session directly from an Ir

  • Add ability to pass forward tensors required in the backwards pass by adding them as outputs of the forward graph

  • Refactoring of grad creation logic for subgraph ops

  • Make it possible for some graph inputs to not have an associated gradient tensor in backwards pass

  • Add RandomSetup transform

      1. support for outlining dropouts

      2. support for random ops in LoopOp and CallOp

  • Add mocking support

  • Add a TensorFlow variant of the RMSProp optimizer

  • Allow BatchNormalization to be in inference mode during training

  • Support uint8 streams and uint8 casts

  • Add the ReduceMedian op

  • Redesign AutomaticLossScale transform so that the loss scale update factor is persistent between runs

  • Support Adaptive and Adam optimizers with auto loss scaling (Preview)

  • Support auto loss scaling with sharding, pipelining, gradient accumulation, and replicated graphs (Preview)

  • Implement constant folding in ONNX passes

  • Add ONNX constant folding for shape op

  • Add ONNX pattern to remove PopART gemm

  • Support ONNX batch norm shape inference for 5 outputs

  • Add BitwiseNot, Fmod, ReduceMedian, Remainder, Round ONNX shape inference

  • Add a CTC beam search decoder Op to do CTC inference

  • Implemented bitwise operators

  • Use a snap::Graph rather than poplar::Graph

  • Expose Poplar callbacks to end users

Bug Fixes

  • Fix logic in the unwinding method of the Expand Op

  • Fix nested-loop compile time bug

  • Fix compile time bugs related to slow inplacing pass

  • Fix elementwise inplace reshaping of RTS tensors

  • Rewrite pattern to fix SubtractArg1GradOp

  • Fix modifiedRegionsByOps

  • Update aliasing info after matmul serialization

  • Add missing constexpr for detach

  • Explicit main loops fixes and additional tests

  • Fix randomsetup non-determinism

  • Add missing synthetic data check for host IO

  • Stop removeScope from mangling tensor names

  • Fix lower case data type name

  • Adjust opx modify check in irlowering (opxModifyChecking) to support integer types

  • Traverse only aliases with non-empty region overlap to the modified tensor

  • Add new required topo cons each time the optimizer changes

  • Fix case of missing metaShape propagation (RTS)

  • Fix type errors with FLOAT16 optimizer state and non-constant lossScaling

  • Fix recursive loop in MultiConvBaseOp::getPads

  • Do not save the accum__ tensor in the serialised executable; this fixes an error when trying to use gradient accumulation and saving a serialised executable.

  • Fix for map::at error in lstm sdk tests by ensuring seq_lens tensor is found correctly.

  • Sort tensors in Tensors::getOfType; this fixes non-determinism across different OSes

  • Fixed dynamicslice shape inference

  • Correct NLL loss for when ignoring all indices to ensure parity with PyTorch.

  • Fix spurious CMake warning about non-existent POPLAR_INSTALL_DIR

  • Correctly use default copy/move semantics for OptimizerValue

  • Remove source of non-determinism in storage of topological constraints

  • Avoid spdlog formatting exception when stack trace contains ‘{}’

  • Disallow inplacing on an output of a recomputation

  • Moved POPART_PRINT_TENSORS code that prints outputs to after the op in the Poplar sequence

  • Fix mapping issue for tensorLocationSettingsOverride

  • Avoid applying PostNRepl pattern if output is graph output

  • Fix for test with callbacks that sometimes fail when multithreaded callbacks are enabled

  • Fix for dealing with missing edge gradients with SmoothL1Loss

  • Removed use of ‘final’ in DropoutOp classes to make subclassing possible

  • Made it so createOp returns DerivedOp* instead of Op*

  • Set default prefetch buffering depth to 1

  • If doing implicit recomputation, require that ops with a path from the loss are scheduled after those that have a path to the loss

  • Do not alias input for IdentityLossOp with no reduction

  • Fix ONNX gemm performance regression

  • Set default options for test device when simulate target

  • Add missing input to loss scale update op.

  • Throw error in Ir if using AnchorReturnType other than ALL or SUM with explicit main loops

  • Fix casting from 16-bit floating point to 8-bit (un)signed integer to behave like NumPy’s casting

  • Remove leakage of private dependencies in the public headers

Optimizations

  • Use Poprithms to do inplacing, which makes the transform orders of magnitude faster

  • Accelerate scheduler in non-optimal case by ignoring certain annotations

  • Accelerate scheduler by using transitive closure optimizations to insert additional edges

  • Add unit test for modifiedRegionsByOps

  • Improve prune speed and graph traversal utility

  • Improve implicit tensor and graph scope handling

  • Schedule pipeline Restore operations as late as possible

  • Set the global seed on each replica to the same and offset random operations by the replica index on the device

  • Add patterns to avoid keeping additional copies of tensors due to outlining

  • Calculate SoftmaxGrad from the output activation to avoid recomputing

  • Add tests for negative padding

  • Refactor Op::setup methods

  • Speed up annotateAccumulateOuterFragment by returning optimizer prefixes as const

  • Make TaskId type cheaper to construct

  • Change Tensor::anyAlias to use Poprithms backend

  • Various small refactorings (removal of nEdgesToLoss and Op::readyToCreateGradients)

  • Testing and refactoring of autodiff

  • Made host-reduce transformation test more robust

  • If CastOp has same input type as output type, convert to identity

  • Release the GIL before entering Session::run

  • Enable Poplar multithreaded IO by default

  • Optimise PopART gather memory usage with slice planning

Logging and documentation

  • Add logging and timing to simplify triage of compile time bottlenecks

  • Remove or lower noisy warnings

  • Change default log level to WARN

  • Moved the examples in PopART to public_examples where relevant; otherwise removed unnecessary examples.

  • Added Graphcore opset functions to the Python API docs

  • Update Python docs and add missing docstrings where appropriate.

  • Add developer notes for transforms

  • Added stream operators for std::tuple and std::pair

  • Remove logging on prefetch failure

  • Added Trompeloeil instructions to README.md

  • Make SessionOptions comments Doxygen-friendly

  • Support weight-specific optimizer tensors for automatic loss scaling

  • Add an optional user progress logger for graph compilation

PopTorch Changelog

2.2.0+22705

  • Migrated to PyTorch version 1.9.0

  • Support for torch.roll

  • Support for torch.clone

  • Add modelName session option that can be passed to PopART

  • Support List inputs to a model

  • Tuples/Lists of constants can now be returned by a model

  • Add enableProfiling convenience method in poptorch.Options to enable profile report generation

  • Fix bug with torch.Tensor.repeat when applied to an input during training

  • Fix bug with aten::to when applied to a constant used as an input to another node

  • Improved error message when encountering untraceable types during compilation

  • Support for torch.gather. Please note: this operator is known to cause long compilation times. Consider using a one-hot-based solution instead, or torch.index_select if appropriate (see the sketch after this list).

  • Using a convolution layer op with the value of padding greater than or equal to kernel_size is now supported.

  • Support for torch.Tensor.new_ones and torch.Tensor.new_zeros

  • Support for torch.flip

  • Support for PopART ops attributes

  • Support for exception categories
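
A minimal illustration of the torch.gather alternatives mentioned above; the tensors and indices here are purely illustrative:

    import torch

    x = torch.arange(12.0).reshape(3, 4)
    idx = torch.tensor([0, 2])

    # torch.index_select picks whole rows by index and typically compiles
    # faster on the IPU than an equivalent torch.gather.
    rows = torch.index_select(x, dim=0, index=idx)        # shape (2, 4)

    # A one-hot based alternative that computes the same row selection.
    one_hot = torch.nn.functional.one_hot(idx, num_classes=x.shape[0]).to(x.dtype)
    rows_via_one_hot = one_hot @ x                        # shape (2, 4)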

2.1.0

  • Support for torch.unbind

  • Add option to set poptorch.Options using options specified in a config file.

  • Add mode=poptorch.DataLoaderMode.AsyncRebatched

  • Support for PopART name scopes via poptorch.NameScope

  • Add mixed precision automatic casting

  • Support for torch.cross

  • Support for torch.functional.one_hot

  • Support for torch.int8 data types

  • Support for torch.median

  • Support for torch.index_select

  • Support for torch.scatter_add

  • Add poptorch.Options.Precision.enableFloatingPointExceptions to control floating point exception behavior

  • Support for inplace changes to inputs.

  • Add option to log the number of IPU cycles used in executing the main graph

  • Support for torch.nn.GRU

  • Add automatic loss scaling option which can be enabled via poptorch.Options.Training.setAutomaticLossScaling. (Preview)

  • Add the poptorch.BlockFunction decorator for assigning an existing function to a block.

  • Add mechanism for inspecting arbitrary tensors

  • Add custom operator for CTC beam search decoding: poptorch.ctc_beam_search_decoder

  • Add a separate tensor variant (now default) to the SGD optimiser.

  • Add a TensorFlow variant to the RMSProp optimiser.

  • Use of SGD via PyTorch’s or PopTorch’s API now results in use of the new separate tensor variant by default. To revert to the previous default variant, use poptorch.optim.SGD with use_combined_accum=True.
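
A minimal sketch of the last point above; the model is a stand-in for illustration:

    import torch
    import poptorch

    model = torch.nn.Linear(4, 2)   # stand-in model for illustration

    # Default in this release: the new separate-tensor SGD variant.
    opt_separate = poptorch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    # Revert to the previous combined-accumulator variant.
    opt_combined = poptorch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9,
                                      use_combined_accum=True)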

Poplar Changelog

2.2.0+166889

New features

  • Split Poplar’s runtime errors into categories to allow for automatic recovery

Bug fixes

  • Fix deadlock when the number of worker threads is high in comparison with the amount of stream callbacks to handle

  • Fixed occasions when in-place binary elementwise operations would return the wrong results when both inputs alias each other

  • Fixed typo in the profiler for GetGlobalConsensus programs

  • Changed host exchange to use the correct sync type

  • Fixed a foreign key issue when generating the profile

  • Avoid initialisation of already allocated remote buffers

  • Fix error during compilation when putting both MultiVertex and Vertex codelets into the same compute set

  • Updated some broken documentation links

  • Reduced overly verbose trace logging

  • Removed imprecise supervisor stack check that caused false positives

Other improvements

  • Improved the performance of the deterministicWorkers: portable engine option.

  • Support unbuffered completion mode in the PCI complex

  • Removed unnecessary copy of the binary from a serialised executable

  • Optimised the codegen when patching remote buffer copy headers

  • Removed the V1 and experimental profiler formats

  • Improved the latency (including random spikes) during model execution

  • Store lowered vars in the main profile file

  • Optimize away starting syncId copies by feeding the program’s dataflow result back to itself

  • Extend program analysis to eliminate syncId copies inside repeat loops when possible

  • General documentation improvements to Graph.hpp and Engine.hpp

  • Update Loop/Repeat* descriptions with cross references and a bit more info

  • Improved host memory usage during compilation

  • Error messages now provide enough information to identify the IPU at fault when there are multiple Poplar processes

  • Better error checking when extracting the archive from the executable

  • Standardise printing of StreamCopy programs

  • Allow engines to have names

  • Add support for serialising the executable from the engine

  • Set profiler.perExecutionStreamCopyCycles as default

  • Add a symlink to the debug.cbor in the parent directory

  • Print compute set name when add vertex fails

  • Extend logging to include which IPUs are in each sync group

  • Better document what types are supported by Hardware, poplar and frameworks.

  • Updated IPU Programmer’s Guide

  • Updated the tutorials links in the user guide

2.1.0

New features

  • Many compile time improvements across graph construction, engine compilation and profiling

  • Made exchange code relocatable so that it can be copied into the executable region, instead of causing the model to go out of memory

  • Added support for replication when using the IPUModel

  • Multi-IPU support for a single remote buffer

  • New improved bootloader for faster binary load times

  • Extended stream callback interface that support waiting and resuming

Bug fixes

  • Fixed a quadratic compile time behaviour when applying element constraints

  • Print the correct link register in crash dumps

  • Fixed an issue where autoReport would fail silently

  • Fixed a codegen issue where excessive sync instructions would be generated when profiling

  • Better error handling when setting the print stream on an engine

  • Fixed out of bound access when using replica subset

  • Fixed an overflow of the cycle counter

  • Fixed a data race that could occur when a stream callback calls prefetch and complete simultaneously

  • Fixed a double move that could happen during data flow analysis

  • Fixed some compilation warnings when compiled with clang

  • Allow the Poplar SDK to be installed in a different tree structure and still find the libraries

  • Skip execution profiling when all replicas are marked as not to be profiled

  • Made an error during code lowering less cryptic

  • Fixed an assertion failure when an unconnected data stream is used

  • Fixed a crash when using the experimental profiler format

  • Added a missing WriteUndef that was causing tensors to remain always live

  • Clear remote buffers during device prepare

  • Sorted some graph state because it was causing compilation to not be deterministic across machines

  • Fixed an issue where variables that require initialisation were allocated in the uninitialisable region when they should not have been

  • Fixed an issue that was preventing executables larger than 32GB from being loaded

  • Added a missing implicit sync during profiling that was causing the profile to appear mangled

  • Fixed latency spikes seen on single IPU models

  • Removed an incorrect assertion that was firing for some broadcast streams

  • Corrected calculation of whether a stream is exclusive on CPU if there are multiple copies

  • Avoid initialisation of already allocated remote buffers

  • Fixed a crash in Poplar when using NUMA aware parallel callbacks

Other improvements

  • Support If programs for integer types other than BOOL

  • Reduce memory allocation overheads

  • Updated assembly programming guide for Mk2 (and renamed to Vertex Programming Guide)

  • Document IPU specific builtins.

  • Improved sub-graph replication efficiency for sparse layers

  • More support for merging host copies on the device

  • Exposed WORKER_SCRATCH_SIZE for custom vertices

  • Report all variables involved in exchanges

  • Check for and report all IPU errors by default

  • Better validation and error reporting for sync misconfigurations

  • Enable gateway mode when a target is created from a device which has a gateway interface

  • Added ability to determine if a IPU is connected to a gateway interface

  • Support for multiple memory configurations with IPU-M2000s

  • Optimise runtime by pre-computing some more information at compile time rather than runtime

  • Better log output on host sync timeout shows what each tile’s state is

  • Better liveness analysis when profiling

  • Added PVTI tracepoints to Poplar compilation and more to the runtime

  • Optimised some instrumentation codegen

  • Added more Repeat program options to Poplar

  • Profiler now collects the engine options used during compilation

  • Enabled write combining by default when on gateway systems

  • Optimised the copy library by extending the range supported by the fastest vertices

  • Optimised the host exchange codegen

  • Added documentation of the exchange planning options

  • Optimised device memory usage by the service tables when on gateway systems

  • Optimised speed of error polling code in the case there is no error

  • Made the timeout after which we check for exceptions / SOC errors configurable

  • Add date and time information to the profiler

  • Allow write combining on Mk1 devices

  • Improved logging when the host is waiting for the device

  • Improved data flow analysis allows deducing the sync ID instead of querying the host for it

  • Include CPU usage in the logging for compilation

  • Added some basic host memory plotting to PVTI during compilation

  • Improved the documentation for loops and repeats

  • Added a new Vertex type: MultiVertex for sharing vertex state between instances

  • Improved how the sub-graph replicator handles constants

Poplar Libraries Changelog

2.2.0

New features

  • Added support to the embedding planner to optimise for cycles

Bug fixes

  • Fixed some compile time issues when planning large 3D convolutions

  • Fix for elementwise ops that could fail with big inputs

  • Fix copyright notice in ConvPartialsStridesPacking.hpp

  • Added a missing alignment for a non-linearity vertex that could cause a tile exception

  • Updated popfloat to match the 1-byte data type used by TensorFlow

  • Fixed a NaN exception when using Gfloat types in popfloat

  • Added missing debug context for call to Fill in the embedding layer

Other improvements

  • Added popops/NormaliseImage.hpp to Poplibs API document

  • Added support for user design priorities in popsolver

  • Added an optimisation to hoist the broadcasting of non-sliced operands out of the loop during serialised convolutions

  • Use the new builtins for isfinite and isnan

  • Updated some of the existing vertices to use MultiVertex

  • Extended the range of the nx1 convolution vertex

  • Optimise the 1x1 convolution vertex inner loop

  • Allow expressions in popops to be compared for equality

  • Add error function (erf) in poplibs

  • Optimisations for LSTM variable time steps

  • Estimate time step interval for WU matmuls inside LSTMs

  • Sped up the unit tests

  • Use the 64-bit load store instructions in the dynamic slice 1D vertex

  • Improved error messages related to elementwise map expressions

  • Optimised the performance of the GELU vertices

  • Add support for doing a MAX operation during a multi-update

  • Improved the documentation for pooling

  • Add faster versions of innermost loop of exponent unary op

  • Allow for 32-bit partials during pooling for a 16-bit input

2.1.0

New features

  • Many improvements to the time spent in graph construction

  • CTC loss inference using a beam search

  • Distributed batch norm

Bug fixes

  • Deduplicated repeated information in the documentation

  • Fixed a bank conflict that could occur in the broadcast vertices

  • Fixed an issue where a call to varianceToInvStdDev was slower than an equivalent map expression

  • Fixed a possible over-write in the cast vertices

  • Fixed a bug in the calculation of the RELU in the LSTM layer

  • Fixed cases in SLIC codelet where data was being over read

  • Fixed a bug in the convolution planner that meant it was not pruning as much of the search space as it could do

  • Fixed a liveness issue that caused the input of some convolutions to remain live

  • Added missing aXPlusbY vertices

  • Fixed a liveness issue where intermediate tensors remained live in reductions

  • Correctly account for kernel dilation when computing input range in convolution planner

  • Fixed an issue where poputil::duplicate with GATHER_AND_PRESERVE_TILE_ORDER_AND_ALIASES does not copy correctly

  • Do not map scale tensor in ScaledAdd

  • Changed the behaviour when casting to 8-bit types to match the C++ standard

  • Fixed a crash that could occur during convolution option validation

  • Fixed an overwrite by the BroadcastVectorInnerInPlace vertex

  • Fixed some issues when dividing in codelets above a certain range

  • Added a missing reduction vertex

  • Don’t use doubles in the implementations of sin() and cos()

  • Remove unnecessary variables from the public interface that were causing ABI incompatibilities

Other improvements

  • Added scaledAdd specialisations where input types can be both half and float

  • Better test coverage for elementwise vertices

  • Added the swish non-linearity

  • Added the hard sigmoid non-linearity

  • Added non-inplace variants for non-linearities

  • Added multi-convolution variant of weightsTransposeChansFlipXY

  • Made the PopLibs RNN activations configurable

  • Improved the elementwise library to produce better output groupings

  • Add support for (u)int8 dynamic slicing

  • Added PVTI tracepoints to PopLibs for graph construction

  • Specialisations for inner broadcast vertices with small region sizes

  • Use MultiVertex for the TopK vertices and some of the elementwise vertices

  • Implement efficient dot products when using the triangle solve library

  • Added support for half partials when using the VMAC convolution vertices

  • Improved the efficiency of the VMAC vertices

  • Added the option to gather the output of a convolution to improve efficiency of following layers

  • Added cube root support to popops

  • Added support for an outer stride to the reduction vertices

  • Added support to the LSTM library for sequence size to be a runtime variable

  • Reduced the amount of vertex state for broadcast elementwise ops

  • Optimised map expressions that include division

  • Added a new operator to cast, pad and normalise image channels

  • Added support for processing the WU matmuls in the LSTM/RNN/GRU layers in mini batches

  • Better tile mapping for vertices used for loss calculations

  • Better reduction output mapping to avoid subword writes

  • Added support for the error function (erf)

  • Better worker utilisation for the MultiUpdate and MultiAdd vertices

  • Add support to the pooling library to allow for 32-bit partials when given 16-bit input

GCL Changelog

2.2.0

Bug fixes

  • IO tile allocator now returns tile pairs if requested by the caller

  • Fix for multi-ILD Collective::MEAN operator

Other improvements

  • Improve host rearrangement performance for transposed tensors

2.1.0

New features

  • Syncful collectives are now the default for three phase reductions over GW-Links

  • Added CollectiveOperator::MEAN

  • 8 IPUs per replica support

  • 1 IPU per replica over GW-Links support (peripheral ring)

Bug fixes

  • Fix for unrecognised option ‘axis’ when called through non-replicated collectives

  • Fix for CollectiveBalancedReorder host rearrangement overflow

  • Fixed GW-Link send direction which enables daisy-chaining more than 2 IPU-PODs

Other improvements

  • Add string representation of a CommGroup

  • Apply dithering based on tensor size and type in selection of tile to map control code to

  • Map the step counter to the subgraph’s first tile

  • Use a minimum grain size for each fragment on a tile

  • Broadcast tensor instead of ring reduce for small tensors

  • Use unsigned integer type instead of a smaller type for grain size in the call to quad directional ring

  • Improve mapping of elements in all-reduce

  • Compensate for tensors spanning multiple IPUs in tile calculation

  • Re-enable syncless GCL to only use a tile sub-set to improve latency

  • Add debug context for calls to dynamicUpdate

  • Include replica size in error messages

  • Optimize overhead for fp16 element-wise reduction dispatch

  • Do not enable address bit clearing for service table reset

  • Relax region constraints for syncless xreq bundle

  • Double the speed of element-wise fp16 reductions for syncless collectives

  • Support for cycle counting individual phases of three phase allreduce

PopDist Changelog

2.2.0

New features

  • Improved error reporting in case of a missing IPU device.

2.1.0

New features

  • Support offline mode with PopTorch without attaching to device.

  • Prevent poptorch.Options.Distributed from being changed when using PopDist.

  • Update to use new TensorFlow IPUConfig option configuration API.

PopRun Changelog

2.2.0

New features

  • Added checks for IPU/GW-Link routing and sync type of existing partitions. The existing partition is checked against the values passed to --ipu-link-routing-type, --gw-link-routing-type and --sync-type. In case of a mismatch, the partition will be updated if --update-partition=yes is provided.

  • Improved error message when the application was terminated by SIGKILL.

2.1.0

New features

  • Show full hostnames after the topology table if they cannot fit inside the table.

  • Added command-line arguments for additional V-IPU options: --ipu-link-routing-type, --gw-link-routing-type and --sync-type.

  • Improved error reporting when user program is missing from the command-line invocation.

  • Added support for passing an environment variable to a specific instance by using --instance-mpi-local-args=<instance-index>:-x VAR=VALUE.

  • Added initial support for the Slurm workload manager. All the resources allocated by Slurm are made available to PopRun.

  • Removed dependency on the user locale. Avoids crashing in the case of an incorrectly configured user locale.

  • Improved NUMA node binding when using cpusets. Only the NUMA nodes allowed by the current cpuset are used.

  • Forward V-IPU timeout argument --vipu-server-timeout to IPUoF by internally passing the environment variable IPUOF_VIPU_API_TIMEOUT.

  • Improved SSH error reporting. Instead of hanging on authentication issues, a clear error is reported.

  • Automatically enable the gateway mode target option when using V-IPU.

  • Added support for running programs in the current working directory without a ./ prefix for consistency with mpirun.

  • Automatically enable NUMA awareness when there is more than one instance per host.

  • Support passing --mpi-local-args and --mpi-global-args multiple times by merging the values.

  • Verify the final state of partition after creation/reset. An error is reported if the partition was not created/reset correctly.

  • Get V-IPU server address from local V-IPU configuration if not specified as command-line argument.

  • Set the target options based on values reported by the V-IPU server.

Libpva Library Changelog

2.2.0

New features

  • Add APIs to get liveness information from the compilation report.

  • Add APIs to get the lowered variable information from the compilation report. See LoweredVariable.

  • The openReport API now optionally takes the debug.cbor input file.

  • Add APIs to read the DebugContext information from the debug.cbor file and associate it with programs and variables.

  • The documentation for libpva has been moved from the Poplar user guide to a standalone user guide.

Bug fixes

  • Fix issue with lists with more than 2^16 elements being truncated.

  • Fixed issue in Python binding that prevented access to VertexInstances and ComputeSets

2.1.0

New features

  • First release of the PopVision Analysis Library.

  • Removed the preview namespace.

  • Includes report of engine options.

  • Added information for vertex instances and compute sets.

  • Add API TileCycleTotals to get total cycles by type calculated by the execution.

  • Added API to read the timestamp of when the report was created.

  • The GlobalExchangeProgram now includes the number of exchange and sync cycles.

  • SyncType is now included for GlobalExchangeProgram and SyncProgram.

  • The Target now includes supervisorInstrFetchDelay, interleavedMemoryStart and memoryElementOffset.

Bug fixes

  • Fixed a bug with the Run.steps not returning the correct list when multiple IPUs are profiled.

TensorFlow Changelog

2.2.0

New features

  • Migrated codebase from TensorFlow 2.1 to TensorFlow 2.4.

  • Improved Keras integration, see the documentation for full details.

  • Added support for concurrent pipeline stages - see Concurrent pipeline stages section in documentation for full details.

  • Improved operation scheduling to reduce memory usage.

  • Performance optimisations when using the experimental replicated_optimizer_state_sharding option with pipelining.

  • Compile-time and run-time optimisations.

  • Added EffectiveTransformer Keras layer to efficiently handle transformers without padding the input sequences.

  • Added AssumeEqualAcrossReplicas Keras layer and assume_equal_across_replicas operator for marking operations in the graph as equal across replicas to aid with divergent control flow.

Bug fixes

  • Fixed a memory leak which caused host memory usage increase when using Keras.

2.1.0

New features

  • Added tensorflow.python.ipu.config.IPUConfig, a new IPU system options configuration API designed for usability, which will eventually replace the old API (see the sketch after this list).

  • Improved the ON_DEMAND IPU connection type to wait for available IPUs in the system.

  • Improved support and performance of recomputation checkpoints in pipelined models including RecomputationCheckpoint Keras layer

  • Added an option to accumulate pipeline results, with an ability to control the accumulation data type, in order to improve performance.

  • Added support for configurable activations in IPU Keras RNN layers.

  • Added PVTI integration into the XLA compiler.

  • XLA compile time optimisations.

  • Add experimental support for distributed batch normalisation.

  • Support for Keras Upsample2D for nearest-neighbour and bilinear interpolations.

  • Improved integration with PopVision Graph Analyzer tool.

  • Performance optimisations for convolution neural networks.

  • Implementation of CTC beam search including ipu.keras.layers.CTCInferenceLayer and ipu.keras.layers.CTCPredictionsLayer Keras layers.

  • Support for resetting the IPU system configuration within a Python process to allow for different configurations between training and inference.

  • Support for setting the same seed for the hardware random number on all model replicas.

  • Support for on-device assert operations.

  • New IPU specific operations: tf.python.ipu.statistics_ops.histogram, tf.python.ipu.statistics_ops.histogram_update, tf.python.ipu.image_ops.normalise_image, tf.python.ipu.nn_ops.hard_sigmoid, tf.python.ipu.nn_ops.swish, tf.python.ipu.slicing_ops.sequence_slice, tf.python.ipu.slicing_ops.sequence_slice_pack and tf.python.ipu.slicing_ops.sequence_slice_unpack.

  • Improved performance of tf.math.erf, reduction operations, tf.python.ipu.rand_ops.dropout and the ipu.keras.layers.Dropout Keras layer.

  • Provided TensorFlow 2 version of ipu.keras.optimizers.GradientAccumulationOptimizer and ipu.keras.optimizers.MapGradientOptimizer optimizers.

  • Added automatic checking of tensor sizes before placing them on IO tiles in order to avoid out-of-memory errors.

  • Improved Poplar tensor allocation, including conditional operations, to improve model performance.

  • Improved documentation for distribution strategies.

  • Added CPU feature guard to give a meaningful error message when a built TensorFlow wheel is not compatible with the CPU architecture in the system.

  • Add support for setting prefetch depth with estimators.
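
A minimal sketch of configuring the IPU system with the new IPUConfig API mentioned at the start of this list; the number of IPUs requested is illustrative only:

    from tensorflow.python import ipu

    config = ipu.config.IPUConfig()
    config.auto_select_ipus = 1      # request a single IPU (illustrative)
    config.configure_ipu_system()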

Bug fixes

  • Fixed numerical issues where random number generator operations were fused incorrectly.

  • Support for 8-bit integer for IPU Keras Models.

  • Fixed a crash where if a model variable was used in different pipeline stages it could cause a data type mismatch error.

Known issues

The following sections detail the known issues in v2.2.0.
Each product is detailed separately.

Product              Paragraph
Driver & Utilities   Driver & Utilities known issues
PopART               PopART known issues
PopTorch             PopTorch known issues
Poplar               Poplar known issues
Poplar Libraries     Poplar Libraries known issues
GCL                  GCL known issues
PopDist/PopRun       PopRun/PopDist known issues
Libpva Library       Libpva Library known issues
TensorFlow           TensorFlow known issues

Driver & Utilities known issues

1.0.52

None.

1.0.51

None.

PopART known issues

2.2.0+166889

None.

2.1.0

  • Using a conv op with the value of padding greater than the convolution kernel size will result in an error when training. Use a pad op instead for the excess padding.

PopTorch known issues

2.2.0+22705

None.

2.1.0

  • Using a convolution layer op with the value of padding greater than or equal to kernel_size results in an error when training. Use a constant pad layer instead for the excess padding prior to the convolution.
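
A minimal PyTorch sketch of the workaround described above; the shapes and padding amount are illustrative:

    import torch
    import torch.nn.functional as F

    x = torch.randn(1, 3, 32, 32)
    conv = torch.nn.Conv2d(3, 8, kernel_size=3)   # no padding on the conv itself

    # Apply the excess padding (>= kernel_size) with a constant pad first,
    # then run the convolution without padding.
    x_padded = F.pad(x, (4, 4, 4, 4), mode="constant", value=0.0)
    y = conv(x_padded)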

Poplar known issues

2.2.0+166889

None.

2.1.0

None.

Poplar Libraries known issues

2.2.0

None.

2.1.0

None.

GCL known issues

2.2.0

None.

2.1.0

None.

PopDist known issues

2.2.0

None.

2.1.0

None.

PopRun known issues

2.2.0

None.

2.1.0

None.

Libpva Library known issues

2.2.0

None.

2.1.0

None.

TensorFlow known issues

2.2.0

  • The experimental_normalise_gradients feature of TF2 can produce unstable results when the number of replicas or the gradient_accumulation_steps_per_replica is large.

2.1.0

None.

Compatibility changes

The following sections detail the compatibility changes in v2.2.0.

Product              Paragraph
Driver & Utilities   Driver & Utilities compatibility changes
PopART               PopART compatibility changes
PopTorch             PopTorch compatibility changes
Poplar               Poplar compatibility changes
Poplar Libraries     Poplar Libraries compatibility changes
GCL                  GCL compatibility changes
PopDist/PopRun       PopRun/PopDist compatibility changes
Libpva Library       Libpva Library compatibility changes
TensorFlow           TensorFlow compatibility changes

Driver & Utilities Compatibility changes

1.0.52

None.

1.0.51

None.

PopART Compatibility changes

2.2.0+166889

  • [API] Remove deprecated grouped matmul option. It will be left to the user to perform the grouping manually by concatenating inputs.

  • [API] Remove deprecated Patterns::Patterns(std::vector<PreAliasPatternType> types) constructor. Use Patterns::Patterns(std::vector<std::string> types) instead.

  • [API] Remove deprecated bool Patterns::isPatternEnabled(PreAliasPatternType t) method. Use bool Patterns::isPatternEnabled(std::string t) instead.

  • [API] Remove warnings for GCL_REAL_COLLECTIVES and GCL_MAX_BYTES_PER_TILE; use the session options useSynclessCollectives and gclOptions["maxBytesPerTile"], respectively, instead.

  • [API] Remove deprecated if op constructor. See willow/include/popart/op/if.hpp for the replacement constructor.

2.1.0

  • [API] Replace the poprithms anneal scheduler with the poprithms shift scheduler

  • [API] Remove Undefined default from TensorLocation, TensorStorage and introduce OptionalTensorLocation

  • [API] Remove deprecated SessionOption accumulationReductionType

  • [API] Remove access to the tile mapping from the public API

  • [API] Deprecate grouped matmul SessionOption

  • [API] Deprecate loss grad op output scaling behaviour when replicatedGraphCount > 1 and reduction is ReductionType::Mean

  • [API] Expose the automatic loss scaling hyperparameters to the user via SessionOptions

  • [API] Remove deprecated functions that took debugPrefix

PopTorch Compatibility changes

2.2.0+22705

  • Removed accumulationReductionType, which was deprecated in 2.1 in favour of accumulationAndReplicationReductionType in poptorch.Options.Training

  • Removed runningVarianceAlwaysFloat, which was deprecated in 2.1 and replaced by runningStatisticsAlwaysFloat in poptorch.Options.Precision.
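
A minimal sketch of setting the replacement options named above; the values are illustrative, and the ReductionType enum is assumed from the PopTorch options API:

    import poptorch

    opts = poptorch.Options()

    # Replacement for the removed accumulationReductionType option
    # (ReductionType.Mean is assumed here for illustration).
    opts.Training.accumulationAndReplicationReductionType(poptorch.ReductionType.Mean)

    # Replacement for the removed runningVarianceAlwaysFloat option.
    opts.Precision.runningStatisticsAlwaysFloat(True)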

2.1.0

  • Removed Options.Popart which was deprecated in v2.0 and replaced with Options._Popart

  • Removed MultiConvPartialsType which was deprecated in v2.0

  • Deprecated poptorch.Options.Training.accumulationReductionType in favour of poptorch.Options.Training.accumulationAndReplicationReductionType

  • Deprecated runningVarianceAlwaysFloat in favour of runningStatisticsAlwaysFloat in poptorch.Options.Precision, as this new option computes both the running mean and variance in FP32 when this option is set to True.

Poplar Compatibility changes

2.2.0+166889

  • The “device” value for the engine option debug.computeInstrumentationLevel has been deprecated

  • The methods poplar::Graph::createReplicatedGraph and poplar::Graph::getNonReplicatedTensor have been deprecated. Use the top-level replication API instead.

2.1.0

  • Removed deprecated APIs and engine options from Poplar

  • Deprecated use of V1/V2 profile format

  • The following APIs have been deprecated:

    • poplar::ProfileValue

    • poplar::program::Sequence variadic constructor.

    • poplar::Engine::getGraphProfile, poplar::Engine::getExecutionProfile, poplar::Engine::getProfile and poplar::Engine::resetExecutionProfile in favour of the PVA library instead.

    • poplar::Graph::trace; PVTI can be used to track graph construction time instead

  • The following engine options have been deprecated:

    • target.maxStreamCallbackThreadsPerNumaNode, use the streamCallbacks.* options instead

    • profile.format when format is v1 and experimental

Poplar Libraries Compatibility changes

2.2.0

None.

2.1.0

  • The following methods have been deprecated:

    • poplin::preplanConvolutions and poplin::preplanMatMuls, use poplin::preplan instead

    • The fields dataType, batchSize, timeSteps, layerSizes in popnn::gru::GruParams and popnn::lstm::LstmParams; these fields now exist in the RnnParams struct.

    • poplin::rnn::RnnParams::timeSteps, use maxTimeSteps instead

    • The GRU and auGRU overloads that take a realTimeSteps parameter

    • The popops::reduce overload that takes a ComputeSet, use popops::reduceMany instead

GCL Compatibility changes

2.2.0

  • The following APIs have been removed:

    • gcl::allReduce methods using popops::Operation - use gcl::allReduce with popops::CollectiveOperator instead

    • gcl::allReduceToDestination methods using popops::Operation - use gcl::allReduceToDestination with popops::CollectiveOperator instead

    • gcl::allReduceInPlace methods using popops::Operation - use gcl::allReduceInPlace with popops::CollectiveOperator instead

    • gcl::reduceScatter methods using popops::Operation - use gcl::reduceScatter with popops::CollectiveOperator instead

  • gcl::perIPUTiles argument list was extended and it can now return IO tiles that are tile pairs if requested by the caller

2.1.0

None.

PopDist Compatibility changes

2.2.0

None.

2.1.0

None.

PopRun Compatibility changes

2.2.0

None.

2.1.0

None.

Libpva Library Compatibility changes

2.2.0

  • There has been a change to the classes for liveness:

    • Instead of programStep.notAlwaysLiveBytes you now have to use programStep.notAlwaysLiveMemory.bytes

    • Instead of programStep.notAlwaysLiveVariables[x].name you now have to use programStep.notAlwaysLiveMemory.variables[x].name
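
A minimal sketch of the new liveness accessors, assuming the libpva Python bindings; the report path and the way the program steps are reached from the compilation report are assumptions for illustration:

    import pva

    report = pva.openReport("./profile.pop")   # report path is illustrative

    # Assumption: livenessProgramSteps is the accessor for program steps in
    # the compilation report; the attribute names below follow the note above.
    for program_step in report.compilation.livenessProgramSteps:
        print(program_step.notAlwaysLiveMemory.bytes)
        for var in program_step.notAlwaysLiveMemory.variables:
            print(var.name)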

2.1.0

None.

TensorFlow Compatibility changes

2.2.0

  • IPU specific Keras API for building models has been removed. See the TensorFlow documentation for full details.

  • C++ Poplar TensorFlow libraries are private by default.

  • feed_name does not need to be specified for IPUInfeedQueue and IPUOutfeedQueue (see the sketch after this list).

  • See the API changes section in the TensorFlow documentation for full details.
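
A minimal sketch of creating the queues without feed_name, as noted above; the dataset is a stand-in for illustration:

    import tensorflow as tf
    from tensorflow.python import ipu

    dataset = tf.data.Dataset.from_tensor_slices([1.0, 2.0, 3.0]).batch(1)

    # feed_name no longer needs to be passed to either queue.
    infeed = ipu.ipu_infeed_queue.IPUInfeedQueue(dataset)
    outfeed = ipu.ipu_outfeed_queue.IPUOutfeedQueue()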

2.1.0

  • replication_factor does not need to be specified for IPUInfeedQueue and IPUOutfeedQueue.

  • See the API changes section in the TensorFlow documentation for full details.

Appendix

Appendix A : Additional requirements

PopVision Graph Analyser

  • To be able to view profiling reports generated by SDK v2.2.0, PopVision Graph Analyser v2.4 or later and PopVision System Analyser v1.2 are required.

TensorFlow

To correctly execute TensorFlow code, please ensure the following:

Intel platforms

  • Use Python 3.6 as the minimum version

  • A CPU compatible with the AVX-512 instruction set is needed.

AMD platforms

  • Use Python 3.6 as the minimum version

  • A CPU compatible with the Znver1 instruction set is needed.