Scope of this document

This document contains the release notes for Poplar SDK 2.4.0 for Graphcore’s IPU product family. The software deliverables covered by this document are the following:

Driver & Utilities

Driver and associated utilities needed by the Graphcore IPU.

PopART

The Poplar Advanced Runtime (PopART) is a flexible ONNX-compatible runtime supporting both training and inference.

PopTorch

The PopTorch library provides a set of extensions for PyTorch to enable it to run on the Graphcore IPU hardware.

Poplar

A graph programming framework for the IPU.

PopDist/PopRun

The Poplar Distributed Configuration Library (PopDist) is a library for configuring and coordinating distributed execution of (large-scale) machine learning applications.

TensorFlow

An implementation of the TensorFlow framework for the Graphcore IPU.

IPU TensorFlow Addons

A collection of Graphcore IPU specific features for the TensorFlow framework.

Package contents

The downloaded unified Poplar SDK will contain the following packages:

Ubuntu 18.04

Package                 Version
----------------------  --------------------------
Driver & Utilities      1.0.57
PopART                  2.4.0+2529
PopTorch                2.4.0+40669
Poplar                  2.4.0+2151
PopDist/PopRun          2.4.0+2151
TensorFlow 1            Graphcore TensorFlow 2.4.0
TensorFlow 2            Graphcore TensorFlow 2.4.0
IPU TensorFlow Addons   2.4.0

Ubuntu 20.04

Important

The Ubuntu 20.04 SDK is a preview release and is not yet fully qualified on all hardware platforms.

Package                 Version
----------------------  --------------------------
Driver & Utilities      1.0.57
PopART                  2.4.0+2529
PopTorch                2.4.0+40669
Poplar                  2.4.0+2151
PopDist/PopRun          2.4.0+2151
TensorFlow 2            Graphcore TensorFlow 2.4.0
IPU TensorFlow Addons   2.4.0

CentOS 7.6

Package                 Version
----------------------  --------------------------
Driver & Utilities      1.0.57
PopART                  2.4.0+2529
PopTorch                2.4.0+40669
Poplar                  2.4.0+2151
PopDist/PopRun          2.4.0+2151
TensorFlow 1            Graphcore TensorFlow 2.4.0
TensorFlow 2            Graphcore TensorFlow 2.4.0
IPU TensorFlow Addons   2.4.0

Debian 10

Package                 Version
----------------------  --------------------------
Driver & Utilities      1.0.57
PopART                  2.4.0+2529
PopTorch                2.4.0+40669
Poplar                  2.4.0+2151
PopDist/PopRun          2.4.0+2151
TensorFlow 2            Graphcore TensorFlow 2.4.0
IPU TensorFlow Addons   2.4.0

Note

See Appendix A for TensorFlow additional requirements.

Product support and compatibility matrix

SUPPORTED
    These products are actively worked on: they will receive new features, general updates and security updates.
    Notice of deprecation will be sent in advance for supported products.

DEPRECATED
    These products will only receive security updates.
    They are expected to work with the indicated products; however, correctness is not guaranteed.
    It is advised not to upgrade to this software version unless strictly necessary.
    In the future, these products can move to a Not Supported state without further notice.
    The support level will reflect the deprecated status.

NOT SUPPORTED
    These products are not expected to work with this release.
    No support will be provided.

Important

Deprecated products can be moved to a Not Supported status without further notice.

IPU-M2000 System Software compatibility matrix

IPUM Model           Version   Support level   Notes
-------------------  --------  --------------  ------
IPU-M2000 300-0024   2.4.0     Supported       N/A

IPU PCIe Hardware Support level

Model         Revision        ICU Firmware version   Driver version   Support level   Notes
------------  --------------  ---------------------  ---------------  --------------  ------
C2 300-0004   All revisions   1.4.14                 1.0.57           Deprecated      N/A

Note

Use the firmware revision appropriate to the IPU revision.

Important

For the firmware revision, compatibility is only enforced for patch versions.

Driver Support level

OS                  Support level   Supported Kernel Version   Notes
------------------  --------------  -------------------------  -------------------
CentOS 7.4/7.5      Supported       3.10                       CentOS LTS kernel.
CentOS 7.6          Supported       3.10                       CentOS LTS kernel.
Microsoft Windows   Supported       Windows Server 2019
Ubuntu 18.04        Supported       5.4                        Ubuntu LTS kernel.
Ubuntu 20.04        Supported       5.4                        Ubuntu LTS kernel.
Debian 10           Supported       4.19                       Debian LTS kernel.

Warning

It is strongly recommended to update the driver’s kernel module to the version included with this 2.4.0 release, to avoid incompatibilities with the non-kernel components of this SDK.

SDK 2.4.0 Support level

OS                  Support level   Notes
------------------  --------------  -------------------------------------------------------------------------------------------------------
Microsoft Windows   Not Supported
CentOS 7.6          Supported       In some specific instances we encountered a less than optimal model compilation time. Investigations are ongoing to address the problem.
Ubuntu 18.04        Supported
Ubuntu 20.04        Supported
Debian 10           Supported

Supported tools

Ubuntu 18.04

Tool            Support level   Version
--------------  --------------  --------
GCC/G++         Supported       7.2.0
libstdc++       Supported       6.0.24
libc            Supported       2.27
binutils        Supported       2.30
Python          Supported       3.6
Boost library   Deprecated      1.70

Ubuntu 20.04

Tool            Support level   Version
--------------  --------------  --------
GCC/G++         Supported       9.3.0
libstdc++       Supported       10.3.0
libc            Supported       2.31
binutils        Supported       2.34
Python          Supported       3.8
Boost library   Deprecated      1.71

CentOS 7.6

Tool            Support level   Version
--------------  --------------  --------
GCC/G++         Supported       7.3.1
libstdc++       Supported       6.0.24
libc            Supported       2.17
binutils        Supported       2.28
Python          Supported       3.6
Boost library   Deprecated      1.70

Debian 10

Tool            Support level   Version
--------------  --------------  --------
GCC/G++         Supported       8.3
libstdc++       Supported       6.0.24
libc            Supported       2.28
binutils        Supported       2.28
Python          Supported       3.7.3
Boost library   Deprecated      1.70

List of changes

The following sections list the changes in version 2.4.0, as well as in older releases, for all products contained in the Poplar SDK.
There are three main sections, divided by topic:

Changelogs

The Changelogs section lists important bug fixes and relevant functionality that has been added. Minor fixes or features are not listed.

Known issues

The Known issues section lists all important issues known to date that impact Poplar functionality.

Compatibility changes

The Compatibility changes section captures any changes that need to be applied to existing code for it to remain compatible with this version of the SDK.

Changelogs

Product                 Changelog
----------------------  -------------------------------
Driver & Utilities      Changelog Driver & Utilities
PopART                  Changelog PopART
PopTorch                Changelog PopTorch
Poplar                  Changelog Poplar
Poplar Libraries        Changelog Poplar Libraries
GCL                     Changelog GCL
PopDist/PopRun          Changelog PopRun/PopDist
Libpva Library          Changelog Libpva Library
TensorFlow              Changelog TensorFlow
IPU TensorFlow Addons   Changelog IPU TensorFlow Addons

Driver & Utilities Changelog

Kernel Module

1.0.57

  • T45456: PCIe driver uses pin_user_pages API with Linux kernels 5.8.0+.

  • T47498: Added Host Link Correctable Errors.

  • T48270: Update the IPU PCIe driver to correctly use the DMA API.

  • T48616: Driver scripts improvements.

  • T49874: Clear allocated PL-DDR memory prior to use on native PCIe.

1.0.55

  • T38724: Implement PCIe P2P support in IPU device driver.

  • T42607: Sensor data (power and temperature) is displayed between containers / namespaces.

  • T42657: Changes to support Linux kernel 5.10.

  • T42745: Fix error handling when IPU is removed.

  • T43360: Improve the error message when the PCIe driver fails to load due to a missing device file.

  • T45666: Provide an API to enable the Multi Read Service Table in the Gateway.

Low level libraries and tools

2.4.0+2151

  • T29027: Add GCDA_OPTIONS environment variable to allow setting runtime options as JSON.

  • T30646: Extended gc-iputraffictest to support testing of more than 16 IPUs.

  • T37217: gc-monitor extended to support multi GCD partitions.

  • T38068: Add single IPU mode for iputraffictest.

  • T43718: Added Python documentation to tracing library.

  • T45122: Allow Poplar to reconfigure links in static partitions.

  • T45371: Added APIs to attach/RDMA-write to IPU tile memory and simple peer-to-peer RDMA write to tile tests to measure the P2P bandwidth and latency.

  • T45594: Query the IPU for the architecture during device discovery rather than using the architecture defined by the VIRM configuration.

  • T45785: Added API to query the last error status.

  • T46259: Updated PVTI to support binary metadata.

  • T46401: gc-monitor support for multi-GCD partitions.

  • T46855: Error if both an IPUoF configuration file and an IPUOF_VIPU_API_* environment variable are used.

  • T47225: Improve SERDES link training to allow auto link negotiation.

  • T47348: Avoid printing a driver version warning in gc-monitor when no IPUs are found.

  • T47414: When invoked without a device id, gc-reset will now correctly choose the largest device for partitions greater than 16 IPUs.

  • T47498: Added Host Link Correctable Errors.

  • T47619: Fix segfault in gcipuinfo when no devices are found.

  • T47640: Initialise the IPU code/data/stack size attributes and the IPU utilisation attributes prior to attach.

  • T47727: Fix failure to start if port of RDMA device is UP but no IP address configured.

  • T47913: Improve handling of IPUoF configuration errors.

  • T48317: SoC configuration code tidy up.

  • T48377: Added documentation for GCDA attributes.

  • T48434: Added gRPC health check in IPUoF client and server.

  • T48435: Add device health check API to gcipuinfo.

  • T48437: Return getDevices result by value.

  • T48553: A new GCDA_OPTIONS feature to simulate SoC errors.

  • T48907: Set gRPC deadline in all IPUoF client requests.

  • T48911: Fast fabric error reporting during PORT_DOWN or connection unreachable.

  • T48939: Increase server robustness to link down.

  • T48947: Catch fabric exceptions when storing the sensor value in sensor loop.

  • T48956: Fixing missing error propagation in some cases.

  • T49126: Fix bug affecting gc-monitor on non-reconfigurable partitions.

  • T49134: Log rather than throw when automatically detaching during object destruction.

  • T49205: Prevent potential long delay when read_config_register calls time out.

  • T49448: Reduce timeout on CM QP failure.

  • T49477: Added gc-podman and container support package.

  • T49802: Improve shutdown time when using GCDA_MONITOR.

  • T49853: Fixed device ID initialisation in IPUoF server constructor.

  • T50043: gcipuinfo: add path parameter to application event record retrieval API.

  • T50044: Extend the timeout for attach during clearing of memory at IPUoF server startup.

  • T50404: Fixed some error messages when server is killed early.

  • T50424: Fixed some error message when PL DDR clearing is not complete when shutting down server.

  • T50857: Fix data race in multithreaded link training when using partial link training config.

  • T51093: gcipuinfo: add attributes to application event record listing IPUoF hosts.

  • T51526: gc-monitor: track IPUs that are in use by other headnodes.

  • T5764: Add documentation for runtime options.

2.3.0

  • T30646: Extended gc-iputraffictest to support testing of more than 16 IPUs.

  • T38726: Implemented IPUoF/RDMA to IPU P2P APIs.

  • T39915: Fixed GCDA link configuration memory leak.

  • T40012: Provide more information in device attach error messages.

  • T40561: gc-hosttraffictest: disable data checking in read-only mode.

  • T41038: Add metadata support to PVTI’s tracepoints.

  • T41365: PVTI documentation improvements.

  • T42308: gc-flops prints help by default. Documentation added.

  • T42478: Update physical slot attribute to contain sensible values for cases where slot information cannot be read from system.

  • T42557: Improved classification of different logging levels in GCDA and IPUoF.

  • T42582: gc-monitor documentation updated.

  • T42607: Sensor data (power and temperature) is displayed between containers / namespaces.

  • T42745: Fix error handling when IPU is removed.

  • T43014: Fix lockup when running with GCDA_MONITOR over IPUoF.

  • T43383: Improved gc-links error message when attempting to train links on IPU-POD systems.

  • T43404: gc-iputraffictest: reduce time required to initialise tiles.

  • T43575: Add ForceParityReset option to reset API.

  • T43694: Throw exception if attempting to start an application when the IPU bootloader has not loaded anything.

  • T43910: Remove the assumption that all images in graphcore_binary are tile images.

  • T43943: Add IGNORE_ICU_ATTACH_ERROR to suppress exception upon ICU attach failure.

  • T44256: ipuof server: disable HSP interrupt handlers.

  • T44349: gwlinkstraffictest: fix GiB and GiB/s output.

  • T44539: Fix IPU-POD Kubernetes IPU connection failure due to invalid GID.

  • T44573: Log IPU sync groups to aid debug.

  • T44725: Preserve the exception type when catching and rethrowing exceptions.

  • T44959: Add ‘Hardware’ target type used as argument to GCDA’s getDevices API which will return either PCIe or IPUoF devices.

  • T44970: Added GCDA_LOG_MASK environment variable to filter GCDA debug log messages.

  • T45026: Add missing gcipuinfo golang support files.

  • T45090: Add gcipuinfo C wrapper library.

  • T45150: Use null metadata id for trace events without metadata.

  • T45327: Added IPUoF mirror fence API.

  • T45493: Added support for RDMA NICs without RoCEv1 support.

  • T45592: IPU cycle counter is not enabled during reset if ICU firmware version is 2.0.0 or later.

  • T45666: Provide an API to enable the Multi Read Service Table in the Gateway.

  • T45740: PVTI database schema updated.

  • T45785: Added API to query the last error status.

  • T45879: Update to report an error if $IPUOF_VIPU_API_PARTITION_ID is defined, but the partition ID is invalid.

  • T46139: Bind added metadata with bindBlob instead of bindText in PVTI.

  • T46212: GCDA_LOGGING and GCDA_LOG_LEVEL environment variables no longer enable IPUoF logging.

  • T46213: gc-info --tile-overview will now check for exceptions in workers.

  • T46259: Updated PVTI to support binary metadata.

2.2.0

  • T25931: Support reading from the middle of a stream by limiting the bytes read when reading binaries.

  • T30865: Logging now uses ISO8601 UTC timestamps and %p in GCDA_LOG_DEST will be replaced with the process ID.

  • T32069: Improved consistency of PCIe Id / PCI id terminology within gc-info.

  • T36207: gc-hostsynclatencytest fixes for native IPU-M2000.

  • T38422: Fixed boost exception on application shutdown when generating PVTI and using GCDA_MONITOR.

  • T39043: Fix missing power and temperature fields when requesting all fields in gc-monitor.

  • T39458: Bind IPUoF-servers cq_handler to dedicated CPU and use CQ polling mode for all IPUoF-servers.

  • T39680: Improved IPUoF latency and avoid spikes in IPUoF HSP update.

  • T39698: Improved performance when checking for SoC errors via GCDA.

  • T39887: Enhanced gc-hostsynclatencytest to output additional statistics.

  • T39891: Fixed GCDA device discovery from multiple threads.

  • T39956: IPUoF client detaches from device after RDMA fabric error if IPUoF server is reachable.

  • T40048: Enhanced gc-binary error information on failure.

  • T40067: GCDA environment variables are now ignored, if present but set to zero or empty.

  • T40430: Fixed ICU comms lockup during multi-threaded use of GCDA_LOGGING.

  • T40567: Reduced the thresholds for IPU clock throttling log messages.

  • T40717: Improve handling of IPUoF exceptions within GCDA.

  • T41042: Removed the call to gc-inventory from the GCIPUINFO library.

  • T41365: PVTI documentation improvements.

  • T41418: Fix a race condition between RDMA disconnect and HSP update.

  • T41427: Fix incorrect statistics in gc-iputraffictest when using a large number of iterations.

  • T41556: The LIBPVTI and LIBPVA documentation has been split out from the Poplar documents to separate documents for each library.

  • T41641: gc-powertest now supports finer power level control with -p option.

  • T41779: Added gc-flops, a tool to measure floating point chip performance (Mk2).

  • T41868: Fixed invalid register field check when checking for errors via $CMGMTEVVR.

  • T41882: Reduce IPUoF initial latency in HSP update.

  • T41936: Support JSON output for gc-info device status commands.

  • T42053: Show state key in gc-info --tile-overview.

  • T42097: Replace gc-inventory-based gcipuinfo library with interface to GCDA.

  • T42099: Added Python and Go support to the pure API variant of gcipuinfo.

  • T42190: Avoid printing duplicate board temperature and power in process table for gc-monitor.

  • T42369: Extended gc-flops timeout.

  • T42445: Added an interface to GCDA to expose the IPU chip ID.

  • T42502: Ensure PVTI generation is completely disabled when PVTI_OPTIONS={"enable":"false"}.

  • T42557: Improved classification of different logging levels in GCDA and IPUoF.

  • T42560: Added JSON support to gc-flops.

  • T42632: Improved IPUoF logging format.

  • T42708: When an IPU-Link configuration is supplied during attach, the IPU will be reset.

  • T42822: Link training failures report “Link Training Error” rather than “setupChassis failed”.

  • T43014: Fix lockup when running with GCDA_MONITOR over IPUoF.

PopART Changelog

2.4.0+2529

New features

  • Remove optional downcasting of ‘gs’ in the OptimizerDecompose Pattern, so the atomic scalar tensor is always in FP32

  • Add a new SessionOption ‘ensureFp32LossScaleTensor’. If your optimizer uses loss scaling and your model produces an FP16 loss tensor, enabling this SessionOption means that the loss scale tensor will be an FP32 tensor, and will be combined with FP16 activations as late as possible to produce the first FP16 gradients (see the sketch at the end of this list)

  • Implement IncrementModOp which does y = (x + i) % m efficiently

  • Add DynamicSliceInplaceOp to update an existing slice from a larger tensor

  • Add Ir::removeIsolatedGraphs method to prune unused graphs

  • Add outplace version of RemoteLoadOp (the original version is now called RemoteLoadInplaceOp)

  • Add a way to connect Poplar HostFunction callbacks to a session. These HostFunction programs can be added via custom ops

  • Add new API methods DeviceManager::tryAcquireAvailableDevice and DeviceManager::tryAcquireDeviceById that return a nullptr if no device is acquired

  • Make the MatMulPattern, MatMulLhsGradPattern and MatMulRhsGradPattern patterns mandatory (they cannot be disabled)

  • Remove use of Poplar’s ‘planMinimisationTarget’ option

  • Set Poplar engine option ‘target.deterministicWorkers’ based on session options

  • Improvements to RNG state handling

  • Update PyTorch version in requirement files

  • Add additional test graphs

  • Add support for updating the available_memory_proportion of an operator

  • Use the PopLibs slice planner across PopART operators: Gather, Scatter, and ScatterReduce and their gradients

  • The environment variable POPART_CACHE_DIR can be used to enable model caching and set the cache directory

  • Implement constant folding for ReduceProd operator

  • Use buffering depth settings for device-to-host streams

  • Implement executeOpNTimesEveryMTimes

  • Add accessor for optimiser state tensors

  • Adding outlining information to debug context of Call operations

  • Make topk return an int32
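
A minimal Python sketch of how some of these additions could be used together. It assumes the Python bindings mirror the C++ names: that ensureFp32LossScaleTensor is a SessionOptions attribute, and that tryAcquireAvailableDevice returns None (the binding of nullptr) when no device is acquired; the cache path is an example only.

```python
import os
import popart

# POPART_CACHE_DIR enables model caching; the directory is an example.
os.environ["POPART_CACHE_DIR"] = "/tmp/popart_cache"

opts = popart.SessionOptions()
# Keep the loss scale tensor in FP32 even when the model produces an
# FP16 loss tensor (assumed attribute name for the new SessionOption).
opts.ensureFp32LossScaleTensor = True

# New in 2.4.0: returns None instead of throwing when no device is free.
device = popart.DeviceManager().tryAcquireAvailableDevice(1)
if device is None:
    raise RuntimeError("No IPU device is currently available")
```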

Bug Fixes

  • Fixed an issue where gradient clipping introduced cycles in the graph

  • Fix loading from a serialized executable when the Ir object passed to popx::serialization::deserializeExecutable has already called its addAdditionalModelProtoTensors method

  • Allow a ReduceGradOp to change its output tensor type after construction

  • Enable and fix dependency-free fallback for tensor layout creators

  • Add missing updaterScaleOp->settings.optimizerOp for TensorFlow-like RMSProp in PopART

  • Fix ElementWiseBinaryBaseOp::getReplicatedTensorShardingIndices() for broadcast case where one tensor is already sharded

  • Fix Regions::flatIndex and dimIndex for non-full shapes

  • Change debug names of tensors when lowering to Poplar so that PopVision displays them correctly

  • Add missing ResizeGradOp::clone() implementation

  • Change final to override where required by custom Ops

  • Add missing clone function to AddArg*Grad Ops

  • Fix bug in AliasZeroCopy::disableDeadCodeNodes where disabled nodes were still considered as live

  • Remove cast in SparseAccumulate allowing PopLibs to select a specialisation based on dtype

  • During build, force FindPython to always pick virtualenv Python, if there is one

  • Assign output of cloneNcopy to a variable

  • Add owned_attributes to Attributes

  • Fix ReduceOp::setup to not accept indices outside the specified range

  • Fix get loss scale in loss scale update op

  • ConvTranspose Op now has a valid gradient: models using transpose convolution now train correctly

  • Convolution now supports a truncated kernel which can occur when calculating a gradient of a convolution in some cases

  • CopyVarUpdate Op now succeeds in obscure cases in which the tensor inputs are not parallel writable

  • Regenerate generated files on new build

  • Fix for LeakyReLU not working in FP16

  • Robustness improvements to remote tensor sharding

  • Add missing accumulatorPrefs to reservedPrefixes()

Optimisations

  • Prevent recomputation of ops in the final forward PipelineStage along one ‘path to the loss’ when an op along another path is set to RecomputeType::Checkpoint

  • Clean up LoopOp and loop body graph input/output indexing

  • Improve inheritPlacementAttributes to extend searching Op attributes across graphs

  • Add connectInTensorLike function to simplify connecting of IpuCopyOps

  • Speed up topocons with large graphs, improving overlapped IO graph compilation time

  • Custom op example compiles faster after removing unnecessary compiler option from Makefile

  • Use LossScaleUpdateOp with sum operation

  • Use updated Poprithms scheduling API

Logging and documentation

  • Document getCollectiveLinkedGroup

  • Fix doc identifier for IncrementModOp

  • Document Shape type

  • Document Region type

  • Document RemoteLoad operation

  • Document RemoteStore operation

  • Updated documentation of dataflow, loop, mainloops and subgraphoutline

  • Improve formatting of Python documentation

  • Improve documentation for ReductionType and MeanReductionStrategy enum types

  • Add sections for documenting limitations and added current Clip-11 limitation

  • Improved error message when not providing constant min/max thresholds for Clip11 Op

  • Minor corrections to PopART C++ API documentation

  • PopART C++ API Doc: Fixing availableMemoryProportion reference documentation

2.3.0

New features

  • Add requirement files

  • Support AnchorReturnType::Sum in MainLoops transform

  • Modify addLoop(Input|Output) to automatically adjust the “modifies” & “aliases” maps when a new input/output shifts indices

  • Add a runtime_error class to PopART & change errors that lead to Poplar engine calls to this new error type

  • Add constructors for errors with IDs

  • Add swish activation function op

  • Add ability to use copyvarupdate ops from the builder

  • Allow truncation support for conv ops to permit certain convtranspose ops

  • Support convtranspose when the calculated padding is negative

  • Add enableConvDithering convolution option that might help tile balance

  • Add support for setting available_memory_proportion for Gather, Scatter, and ScatterReduce

  • Changed Builder::setAvailableMemoryProportion to allow setting the available_memory_proportion on any operator (see the sketch at the end of this list).

  • Changed Scatter to use popops::multiSlice which improves tile utilisation

  • Add Python bindings for Builder::virtualGraph(const std::set<TensorId> &, int64_t) and Builder::getVirtualGraph(const std::set<TensorId> &)

  • Return Mk2 device by default when creating offline device.

  • MeanReductionType: Specify how gradient should be mean reduced across gradient_accumulation and replication

  • Support RTS with CommGroups for RTS-128

  • Add support for overlapped host to device IO

  • Add shape dimension check for when no InputShapeTensor is provided

  • Add lossScaleUpdateFactor to onnx model definition

  • Add RecomputeAll mode

  • Add support for RunningMean in TiedGatherPattern

  • Add compatibility with zsh shell
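
A short sketch of the extended Builder::setAvailableMemoryProportion via the PopART Python builder; the tensor shape and the value 0.3 are arbitrary examples.

```python
import popart

builder = popart.Builder()
x = builder.addInputTensor(popart.TensorInfo("FLOAT", [64, 64]))
y = builder.aiOnnx.matmul([x, x])
# As of 2.3.0 this can be set on any operator, not only matmuls/convs.
builder.setAvailableMemoryProportion(y, 0.3)
```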

Bug Fixes

  • Replace unnecessary custom functions with library calls

  • Make the error in Tensor::setTensorData() an internal_error & improve the error message

  • Remove unused forward declarations in popart.cpp

  • Fix saving/restoration of conv parameters which could affect ops in some instances

  • Fix dynamicslice gradient calculation in situations where there is just one gradient input (no sum)

  • Fix missing gradient in SoftSign operator

  • Fix for DynamicSlice when axis_size % slice_size != 0

  • Fix for DynamicUpdate when axis_size % slice_size != 0

  • Fix the log module under which the Ir preparation compile time breakdown is logged (it is now the Ir module)

  • Fix Softplus gradient calculation

  • Fix various typos

  • Allow outplace round op.

  • LSTMOp: only create a new pass-through tensor if not already present.

  • Fix segfault issue in ReverseOpx.

  • Fix potential bug in LSTMGradOp::gradInputInfo.

  • Guarantee that an onnx node’s output tensor id exists in the model’s value info field by the end of shape inference

  • Correct some shape methods to use the correct type alias

  • Make Ir in Session a shared_ptr for easier use in Python

  • Do not allow the matrix multiplication serialization factor to be <= 0 when doing serialization.

  • Improve speed of Pipeline::setFinalFwdStageRecomputation

  • Fix maxWeightNorm=0 behaviour for Adam based optimizers

  • Fix replication_factor >1 issue when explicit host copy ops are enabled

  • Fix random seed compatibility with useHostCopyOps=True

  • Fix use of internal aliases in pipelined IpuCopyOpx

  • Fix replicaGroupSize for the ReplicatedAllGatherTest_CommGroup_All test

  • Fix to ClipWeightGradientsByNorm with sharding

  • Add Clip_11 to the verification list for ElementWiseUnaryOutplaceOpx

Optimisations

  • Avoid the need for an extra ‘Scale’ operation in the graph with non-constant loss scaling in the optimizer

Logging and documentation

  • Update tests/popart/README.md info on setting test devices to have correct paths

  • Correct tensor location log for optimizerStateTensorLocationSettings.location

  • Describe shared_ptr usage for session.ir

  • Document overlapped IO and RTS

  • Add documentation for PyStepIO, PyStepIOCallback and StepIOCallback

  • Add documentation for popart.ir to Python API docs

  • Reduce the amount of logging at level ‘info’

  • Align PopART’s log format with Poplar, GCDA, V-IPU, etc.

  • Add mode options to setSerializeMatMul() docstring

PopTorch Changelog

2.4.0+40669

  • Support for deepcopy functionality in poptorch.Options class

  • Added functionality to add a name scope for each operator present in the module. This feature is enabled by default. It can be disabled using poptorch.Options.disableModuleNamescope.

  • Support for a greater number of convolution and transpose convolution parameters including those which result in input/kernel/output truncation, either for inference (transpose) or gradient calculation.

  • Migrated to PyTorch version 1.10.0

  • Support for gradient clipping by norm in poptorch.optim optimizers

  • Support saving and restoring internal optimiser state with PopTorch optimisers via optimizer.state_dict() and optimizer.load_state_dict() (see the sketch at the end of this list)

  • Add removeBlocks function to remove block annotations from a Model / Layer.

  • Support for CPU ops using poptorch.CPU.

  • Support for im2col.

  • Make optimizers work with LR schedulers.

  • Switched to gold linker by default.
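
A minimal sketch of checkpointing the internal optimiser state with the new state_dict()/load_state_dict() support; the model and the file name are illustrative only.

```python
import torch
import poptorch

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(10, 2)
        self.loss = torch.nn.MSELoss()

    def forward(self, x, target):
        out = self.fc(x)
        return out, self.loss(out, target)

model = Model()
optimizer = poptorch.optim.SGD(model.parameters(), lr=0.01)
training_model = poptorch.trainingModel(model, poptorch.Options(),
                                        optimizer=optimizer)

# ... after some training steps, save the optimiser state to disk ...
torch.save(optimizer.state_dict(), "optim_state.pt")

# ... and restore it later before resuming training.
optimizer.load_state_dict(torch.load("optim_state.pt"))
```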

2.3.0

  • Support for torch.bitwise_and, torch.bitwise_or, torch.bitwise_xor

  • Support for torch.logical_and, torch.logical_or,

  • Support K-dimensional NLLLoss, K-dimensional CrossEntropyLoss

  • Support for non-default affine parameter flags in normalisation ops

  • Support for torch.Tensor.T

  • Support for torch.bool in torch.zeros, torch.zeros_like, torch.ones, torch.ones_like

  • Support for torch.scatter and its in-place variant

  • Support for in-place modification to buffers on IPU

  • Support for taking slices of scalars

  • Support version of bilinear upsampling specifying intended output size instead of scale factors

  • Add support for overlapping host IO on inputs via poptorch.set_overlap_for_input (see the sketch at the end of this list).

  • Add option for setting number of IO tiles via numIOTiles in poptorch.Options (required for poptorch.TensorLocationSettings.useIOTilesToLoad and poptorch.set_overlap_for_input.)

  • Improve PopTorch’s parity with PyTorch’s Softplus

  • Improve implementation of torch.SiLU by using Poplar’s Swish operator

  • Additional support for operation overloads

  • Fix issue where PopTorch recalculated upsampling scales in fp16

  • Fix issue where the last use of poptorch.set_available_memory would be pruned

  • Add documentation on available memory proportion to incorporate embeddings and indexing operations

  • Add documentation on how users can generate debug information

  • Support replicated tensor sharding when running on multiple processes

  • Allow selection for a non-constant x input.

  • Support for enableConvDithering convolution option
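
A sketch of overlapping host IO for a model input. The OverlapAccumulationLoop mode name is an assumption based on later PopTorch releases, and IO tiles must also be reserved with the numIOTiles option in poptorch.Options (see the PopTorch user guide for the exact accessor).

```python
import torch
import poptorch

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(10, 2)

    def forward(self, x):
        # Mark the input so its host-to-device transfer can overlap with
        # compute (mode name is an assumption; see the note above).
        x = poptorch.set_overlap_for_input(
            x, poptorch.OverlapMode.OverlapAccumulationLoop)
        return self.fc(x)
```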

Poplar Changelog

2.4.0+2151

New features

  • Extended memory (greater than 16GB) for remote buffers in Poplar

  • Allow users to create a target for predefined Graphcore machines (e.g. IPU-M2000)

  • Compile time improvements for key models

  • Compressed the Poplar executable

  • Added “Host Function” program: a new type of host exchange for embeddings

Bug fixes

  • Fixed an issue where host memory at the end of compilation was not the same as it was at the start

  • Fixed segmentation fault when using host-to-device ring buffer with rearrangement on host

  • Fixed bug where findUnbroadcastTensor gives incorrect result for a concatenation of a broadcast tensor and a non-broadcast tensor

  • Fixed an issue where no exception was thrown when the reconfigurable partition and the Poplar configuration mismatched when using many instances

  • Fixed a bug which limited the size of the GP files

  • Fixed a bug when creating a GP file from vertices in separate source files with the same field name

  • Made load time relocations deterministic to avoid a race condition

  • Fixed a bug where contiguous PrintTensor statements were being printed in reverse order

  • Use GCDA when handling multiple HSPs so that the PVTI events are generated correctly

  • Removed a case of undefined behaviour in merge variables when there are no merge candidates

  • Fixed a bug where you would get a non-const pointer for an Input field in a codelet

  • Fixed error in code example in Poplar User Guide

  • Fixed an error in the Poplar User Guide where wrong values were used for size/alignment of float vectors

Other improvements

  • Added an optimisation to inline nested calls

  • Added support for source and destination tensors with different layouts in CrossReplicaCopy programs

  • Generate Graph report after compilation

  • Add support for a new LOOP program in Poplar for an endless loop on the device

  • Extend NextSyncId analysis to build a nextSyncId table for each programId

  • Added support for safely stopping an Engine that has not finished running a program

  • Provided a way to set the host sync timeout at a granularity smaller than 1 second.

  • Improvements to the new Poplar backtraces

  • Outline MultiVertex supervisor stubs

  • Added an optimisation pass to eliminate no-op WriteUndefs during lowering

  • Added mirrorFence(N) support to Poplar

  • Optimised the overhead for code copies when groups of exchanges in a sequence are all outlined

  • Included Poplar hash in the executable

  • Changed the default for deterministicWorkers to always work across replicas

  • Allow Poplar to trivially look ahead and process future sync points before the IPU reaches them

  • Documented which options can be changed at runtime via POPLAR_RUNTIME_OPTIONS

  • Many improvements reducing the host memory needed and the number of allocations during compilation

  • Log all exceptions leaving Poplar

  • Added documentation describing which kinds of vertex members are valid

  • Documented the restrictions on creating remote buffers to Poplar users on IPU-M2000 platforms

  • Added float16 and float32 as type aliases in Poplar

2.3.0

New features

  • Output a backtrace on tile errors that identifies the program the error occurred in

  • Add initial support for having more than 16 GiBs of remote buffers per IPU, controlled by the target.extendedMemory option

  • Add support for UNSIGNED_LONGLONG and LONGLONG data types

  • Add optional needAlignWorkers vertex field to specify whether a vertex requires worker alignment when target.deterministicWorkers is enabled

Bug fixes

  • Fix TEXCPT_INVALID_ADDR exception when using remote buffers

  • Fix host sync timeout error with large stream copy

  • Fix divergent control flow error when profiling a program containing a switch

  • Fix incorrect copy code generation when an Input field and an InOut field are connected to the same tensor

  • Add additional system analyser trace operations to ensure all host I/O operations are included in the trace

  • Fix poplar::Graph::findUnbroadcastTensor returning an incorrect result for a partially broadcast multi-dimensional tensor

Other improvements

  • Add optimisation to improve performance of stream copies in loops

  • Reduce host exchange code size

  • Improve variable allocator so more applications fit in memory

  • Reduce code size and runtime overhead of enabling target.deterministicWorkers

  • Improve graph compilation speed

  • Optimise compiler data structures to reduce host memory usage

  • Improve efficiency of stream copies on IPU-M2000 systems in applications with many IPUs per replica

  • Update the format of Poplar logging messages to be consistent with tools such as VIPU

  • Improve speed of writing profile information to disk, particularly on network file systems

  • Extend optimisation that duplicates compute to avoid exchange so it works in more cases

  • Include the tile a memory parity error occurred on in the error message attached to the exception

  • Extend the Abort program to take an optional message

  • Make poplar::Graph::getTileMapping optionally return incomplete tile mappings for tensors that may lie outside the virtual graph

  • Optimise remote buffer and stream read and write bandwidth on IPU-M2000 systems

  • Support setting some engine options at runtime so they can be changed without recompiling the graph

Poplar Libraries Changelog

2.4.0+2151

New features

  • A new slice planner for faster embeddings

  • Extended popops to support embeddings where the indices are known at compile time

  • Added support for the Error Function (ERF) to PopLibs

Bug fixes

  • Fixed all compiler warnings that were in the public headers

  • Fixed a bug where only a single MultiVertex instance was generated for some elementwise operations

  • Avoided possible overread in CTC Inference codelet

Other improvements

  • Added a method to validate convolution and matmul options

  • Removed zeroing of output for input channel serial splits

  • Added structured rearrangements for fwd/gradA layers

  • Improved the documentation of the normalisation functions

  • Added an option to allow runtime bounds checking of embedding indices

  • Documented the partial type for convolutions

  • Added an optimisation to try to fuse the constituent parts of a mean function into a scaled reduce

  • Added new SLIC and VMAC vertices that generate more efficient exchange code

  • Specialised map expressions with a scalar multiply of type float and a tensor of type half to scaledAdd

  • Incorporated identity operations into element-wise expression optimisations

  • Added a partials type to ADD operation in multiUpdate

  • Optimised the memory overhead of the Reduce vertex state and improved the speed by creating fused vertices for scalar operations

  • Use the new rptsize_t type in the elementwise codelets

  • Dither reductions across tiles that are created with the reduceMany API

  • Improved the performance of the log1p vertex

2.3.0

New features

  • Improve performance of LSTM and GRU operations with a variable sequence length

  • Introduce a new variant of popnn::lstmBwd and popnn::lstmBwdWithWU that can output both the gradient of the output and the gradient of the cell state.

  • Many performance improvements for multiSlice, multiUpdate and multiUpdateMax operations

  • Add popops::regroupIfPossible function which only regroups a tensor if it can be done efficiently

Bug fixes

  • Fix bug that caused element-wise operations to sometimes unnecessarily use a less efficient implementation that took more memory

  • Fix bug in pooling that gave incorrect results when the stride was larger than the kernel size

  • Emit an error if the value of the availableMemoryProportion option is less than 0.0

  • Fix bug in mixed precision popops::mulInPlace that caused it to error when passed a scalar tensor

Other improvements

  • Extend unary and binary element-wise operations to support long long and unsigned long long types

  • Extend popops fill operation to support long long and unsigned long long types

  • Extend map expression with support long long and unsigned long long types

  • Extend dynamic slice and update with support long long and unsigned long long types

  • Support cast operations on char types

  • Dither the tile mapping of different RNN operations to improve memory balance across tiles

  • Update the format of PopLibs logging messages to be consistent with logs produced by other tools such as VIPU

  • Reduce memory usage of code generated for fused popops map expressions

  • Update popops::multiUpdateAdd to support a scale tensor of type float when the tensor to update has type half

  • Reduce code size by sharing some common code between different popops codelets

  • Dither the tile mapping of temporaries used in reduce and convolution operations to improve memory balance across tiles

  • Dither the tile mapping of convolution weights and biases to improve memory balance across tiles

  • Add variant of dynamic slice that writes the result to a tensor passed in as an argument

  • Improve performance of element-wise negate operation

  • Improve memory usage of popops scaled add operation by reducing vertex state

  • Reduce vertex state required for large convolutions

  • Use faster bitonic sort implementation for sortKeyValue and sortKeyValueInPlace APIs in more cases

GCL Changelog

2.4.0

New features

  • Added two-phase AllGather support (AllGather over GW-Links)

  • Exposed ReduceScatter and AllGather with many input tensors

  • Added support for non-commutative SQUARE_ADD reduction operator

  • Added handling for wide-only AllReduces

Bug fixes

  • Added a check for IPU number when using DNC

  • Fixed multiple narrowing bugs

  • Fixed warning about serial reductions

  • Invalid CommGroup::replicaGroupSize now throws

Other improvements

  • Fixed zero padding for CBR tensors mapped to only one tile

  • Added CommGroup to log messages

  • Introduced logging modules

  • Zero-padding the CBR tensor before using it for reductions

  • Added grain size to each replica in tensor created for CBR

  • Input tensor is now checked for optimised layout

2.3.0

New features

  • Extended exceptions and error handling

  • Added parallel multi-tensor collectives

  • Non-replicated collectives (TensorCollectives) moved from PopLibs to GCL

  • Three-phase orthogonal allReduce support

Bug fixes

  • Fix for unaligned tensor splitting

  • Fix for building with gcc -Og/-O1 flags

  • Fix for replicatedReduceScatter Local reduction with CommGroup

Other improvements

  • Change bdcast to broadcast in debug strings

  • Added internal_tests component for exporting collective tests

  • Unified all CMakeLists to use CMake 3.18.4

  • Syncless topologies are now verified before use

  • Single tensor replicatedAllReduceWithOutput() dispatches the multi-tensor one

  • Replicated collectives functions get CrossReplica suffix

  • Added error code linter

  • Added multi-tensor allReduceInPlace()

  • Step counter code is made optional

  • Described the logical GCL topologies in the documentation

  • Using optimised broadcast reduction on SWNC when possible

  • Various improvements to the test framework

PopDist Changelog

2.4.0

New features

None.

2.3.0

New features

None.

PopRun Changelog

2.4.0

New features

  • Added support for automatically generating an executable cache path when multiple hosts are specified. The generated cache path will be removed when the process exits or fails

  • Enabled --tag-output by default. This option can now be omitted from --mpi-global-args. To turn the feature off, pass --tag-output=no.

  • Enabled --allow-run-as-root by default. This option can now be omitted from --mpi-global-args. To turn the feature off, specify --allow-run-as-root=no.

  • Passed POPLAR_ENGINE_OPTIONS to all instances by default. This feature cannot be turned off.

  • PopRun now unsets IPUOF_CONFIG_PATH before launching instances

2.3.0

New features

  • Export some commonly used environment variables by default. The environment variables PATH, LD_LIBRARY_PATH and PYTHONPATH are exported by default to all instances. Passing them to --mpi-local-args="-x ENV_VAR" is no longer needed.

  • Add support for Slurm hostlists. The --host argument now supports the Slurm hostlist syntax. For example, host[1-3,5] will expand to host1,host2,host3,host5.

  • Pick up configuration options from Slurm. The number of instances, replicas, IPUs per replica and the available hosts are picked up from Slurm environment variables if they exist. If an option is provided both by a command-line argument and by Slurm, the command-line argument takes precedence.

  • Allow disabling executable caching. The executable cache can be disabled by passing an empty string using --executable-cache-path "".

  • If there is only a single V-IPU partition available, it will now be used automatically without the need for specifying its name using --vipu-partition.

  • Increase default V-IPU server timeout. The default value of --vipu-server-timeout is now 120 seconds.

  • The new argument --only-stdout-from-instance allows suppressing the standard output from all instances except the given one. This is different from the existing --only-output-from-instance in that it allows standard error from all instances.

Libpva Library Changelog

2.4.0

New features

  • Added the Python __str__ method to all libpva objects (see the sketch at the end of this list).

  • Added C++ operator<< methods to all libpva objects.

  • CodeCopy program has a new property to get the list of variables copied.

  • Added new API to get the Poplar Engine options for compilation and execution.

  • Added the id, name and parent properties to the DebugContext.
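
A small sketch using the libpva Python module; the report path is an example and the attribute traversal assumes the documented Report structure.

```python
import pva

# Open a report produced by a Poplar application (path is an example).
report = pva.openReport("profile.pop")

# 2.4.0 adds Python __str__ to libpva objects, so printing them gives a
# readable summary rather than a bare object repr.
print(report.compilation.target)
```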

Bug fixes

  • None

2.3.0

New features

  • Added equality operators for Programs so they can be used as keys in maps or sets

  • Added new APIs to query the vertex instances by tile. Previously we reported the number of vertices; now you can determine which tiles they are on.

  • Added support to show the DWARF memory category.

  • Added an API to query which variables are associated with a debug context.

  • Added CodeCopy as a new Program type.

  • The ProgramVisitor now has a default handler visitProgram.

  • Added an API function on the Program to report how much control code is used on each tile.

Bug fixes

  • None

2.2.0

New features

  • Add APIs to get liveness information from the compilation report.

  • Add APIs to get the lowered variable information from the compilation report. See LoweredVariable

  • The openReport API now optionally takes the debug.cbor input file.

  • Add APIs to read the DebugContext information from debug.cbor and associate it with programs and variables.

  • The documentation for libpva has been moved from the Poplar user guide to a standalone user guide.

Bug fixes

  • Fix issue with lists with more than 2^16 elements being truncated.

  • Fixed issue in the Python binding that prevented access to VertexInstances and ComputeSets

TensorFlow Changelog

2.4.0

New features

  • Added an implementation of ipu.cross_replica_ops.cross_replica_mean() to provide better numerical stability.

  • Exposed set_infeed_queue_options and set_outfeed_queue_options functions for Sequential and Functional Keras models to allow configuration of IPUInfeedQueue and IPUOutfeedQueue.

  • Performance improvements for scatter and gather operations with static indices.

  • Added an IPU optimised implementation ipu.math_ops.segment_sum to perform a sorted segment sum with a fixed number of segments (see the sketch at the end of this list).

  • Exposed available_memory_proportion for Keras RNN Layers.

  • Allowed the gradient_accumulation_count parameter of ipu.pipelining_ops.pipeline to be a runtime value instead of a constant to allow dynamic batch sizes.

  • Added support for TensorFlow 2 Keras API using popdist and poprun.

  • Optimisations for the tf.random.shuffle operation.
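
A sketch of the new sorted segment sum. The argument names follow tf.math.unsorted_segment_sum and are an assumption, so check the IPU TensorFlow API reference for the exact signature; the op is intended to be run on an IPU device.

```python
import tensorflow as tf
from tensorflow.python import ipu

data = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
segment_ids = tf.constant([0, 0, 1])  # must be sorted

# The number of segments is fixed at graph construction time.
result = ipu.math_ops.segment_sum(data, segment_ids, num_segments=2)
```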

Bug fixes

  • Reduced the runtime overhead when iteratively calling fit(), evaluate() or predict() on a Keras model.

  • Compile time improvements.

2.3.1

New features

  • Extended the IPU embedded runtime to perform an IPU reset automatically if possible when a recoverable exception occurs. See the Error Handling section in the IPU embedded application runtime documentation for full details.

Bug fixes

None.

2.3.0

New features

  • Improved performance of concurrent pipeline stages.

  • Migrated codebase from TensorFlow 2.4.1 to TensorFlow 2.4.3.

  • Performance optimisations when using replicated_optimizer_state_sharding option with pipelining or GradientAccumulationOptimizerV2.

  • Added IPUConfig.optimizations.math for controlling arithmetic optimisations of the model compilation.

  • Improved integration with the Graphcore PopVision System Analyser.

  • Add support for hooks with IPUPipelineEstimator.

  • Compile-time and run-time optimisations.

  • PopLibs options can be specified for slice operations via the IPUConfig.slices.poplar_options config option and via PipelineStageOptions.

Bug fixes

  • Pipelined Keras models now correctly set the training argument passed to the Keras layers in the model.

  • Improved performance (latency and throughput) of callbacks when using asynchronous_callbacks.

  • EffectiveTransformer weight initialisers have been exposed to the user.

  • IPU specific Keras layers can now be serialised to allow the model to be saved and restored.

IPU TensorFlow Addons Changelog

2.4.0

New features

  • Initial release.

  • Implementation of the SGD, Adam and LAMB optimizers with IPU specific features to improve model performance.

Bug fixes

None.

Known issues

The following section will detail known issues in v2.4.0.
Each product will be detailed separately.

Product                 Section
----------------------  --------------------------------------
Driver & Utilities      Driver & Utilities known issues
PopART                  PopART known issues
PopTorch                PopTorch known issues
Poplar                  Poplar known issues
Poplar Libraries        Poplar Libraries known issues
GCL                     GCL known issues
PopDist/PopRun          PopRun/PopDist known issues
Libpva Library          Libpva Library known issues
TensorFlow              TensorFlow known issues
IPU TensorFlow Addons   IPU TensorFlow Addons known issues

Driver & Utilities known issues

1.0.57

None.

1.0.55

None.

PopART known issues

2.4.0+2529

None.

2.2.1

None.

PopTorch known issues

2.4.0+40669

None.

2.3.0

None.

Poplar known issues

2.4.0+2151

None.

2.2.0

None.

Poplar Libraries known issues

2.4.0+2151

None.

2.2.0

None.

GCL known issues

2.4.0

None.

2.3.0

None.

PopDist known issues

2.4.0

None.

2.3.0

None.

PopRun known issues

2.3.0

None.

Libpva Library known issues

2.4.0

None.

2.3.0

None.

2.2.0

None.

TensorFlow known issues

2.4.0

  • Using mixed_precision.Policy('mixed_float16') with pipelined Keras models results in compilation errors.

  • The experimental_normalize_gradients feature of TensorFlow 2 can produce unstable results when the number of replicas or the gradient_accumulation_steps_per_replica is large.

2.3.1

No new known issues.

2.3.0

  • Using mixed_precision.Policy('mixed_float16') with pipelined Keras models results in compilation errors.

  • The experimental_normalize_gradients feature of TensorFlow 2 can produce unstable results when the number of replicas or the gradient_accumulation_steps_per_replica is large.

  • Using TensorFlow operations instead of Keras layers inside of a pipelined Keras model definition can result in a compilation error when one of the inputs is a constant.

IPU TensorFlow Addons known issues

2.4.0

None.

Compatibility changes

The following section will detail compatibility changes in v2.4.0.

Product                 Section
----------------------  ----------------------------------------------
Driver & Utilities      Driver & Utilities compatibility changes
PopART                  PopART compatibility changes
PopTorch                PopTorch compatibility changes
Poplar                  Poplar compatibility changes
Poplar Libraries        Poplar Libraries compatibility changes
GCL                     GCL compatibility changes
PopDist/PopRun          PopRun/PopDist compatibility changes
Libpva Library          Libpva Library compatibility changes
TensorFlow              TensorFlow compatibility changes
IPU TensorFlow Addons   IPU TensorFlow Addons compatibility changes

Driver & Utilities Compatibility changes

1.0.57

None.

1.0.55

None.

PopART Compatibility changes

2.4.0+2529

  • [API] Deprecate behaviour whereby methods DeviceManager::acquireAvailableDevice and DeviceManager::acquireDeviceById return a nullptr if no device is acquired

  • [API] Remove debugPrefix methods

  • [API] Remove use of GCL_NUM_IO_TILES

  • [API] Remove use of deprecated method snap::program::Sequence::add(poplar::program::Program)

  • [API] Remove deprecated MeanReductionStrategy::PostAndLoss option

  • [API] Remove setting perExecutionStreamCopyCycles

2.3.0

  • [API] Remove HostReduce

  • [API] Remove PreAliasPatternType

  • [API] Remove deprecated grouped matrix multiplication option

  • [API] Deprecate some pattern constructors

  • [API] Deprecate unused environment variables

  • [API] Remove explicit pipelining flag

PopTorch Compatibility changes

2.4.0+40669

  • Deprecated poptorch.Options.anchorMode in favour of poptorch.Options.outputMode (see the sketch at the end of this list)

  • Deprecated poptorch.Options.defaultAnchorMode in favour of poptorch.Options.defaultOutputMode

  • Deprecated poptorch.AnchorMode in favour of poptorch.OutputMode
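
A migration sketch for the renamed options; it assumes poptorch.OutputMode exposes the same values as the old poptorch.AnchorMode (for example All).

```python
import poptorch

opts = poptorch.Options()

# Deprecated in 2.4.0:
#   opts.anchorMode(poptorch.AnchorMode.All)

# Preferred replacement:
opts.outputMode(poptorch.OutputMode.All)
```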

2.3.0

  • Default mean reduction strategies have changed from the deprecated PostAndLoss strategy to Post or Running based on optimiser accumulation type

  • Mean reduction strategy can now be set via poptorch.Options.Training.setMeanAccumulationAndReplicationReductionStrategy.

  • Add warning that IPU-specific optimiser states cannot be read from the host, when calling get_state() on poptorch.optim optimisers

Poplar Compatibility changes

2.4.0+2151

  • Support for non-top-level replicated graphs has been removed

  • The opt.enableSwSyncs option has been removed

2.2.0

  • The “device” value for the engine option debug.computeInstrumentationLevel has been deprecated

  • The methods poplar::Graph::createReplicatedGraph and poplar::Graph::getNonReplicatedTensor have been deprecated. Use the top-level replication API instead.

Poplar Libraries Compatibility changes

2.4.0+2151

  • TensorCollectives.hpp was moved to GCL, and the methods popops::allReduce, popops::allGather and popops::reduceScatter were replaced by gcl::allReduceWithinReplica, gcl::allGatherWithinReplica and gcl::reduceScatterWithinReplica

  • The following methods were deprecated:

    • popops::sort and popops::sortInPlace: use popops::topK() instead

    • popops::sortKeyValue and popops::sortKeyValueInPlace: use popops::topKKeyValue() instead

2.3.0

  • The following methods from TensorCollectives.hpp have been deprecated:

    • popops::allReduce: use gcl::allReduceWithinReplica instead

    • popops::allGather: use gcl::allGatherWithinReplica instead

    • popops::reduceScatter: use gcl::reduceScatterWithinReplica instead

  • The versions of popnn::lstmBwd and popnn::lstmBwdWithWU with an optional parameter to output only the gradient of the cell state have been deprecated. Use the new versions that output both the gradient of the output and the gradient of cell state instead.

GCL Compatibility changes

2.4.0

  • The following methods have been removed from the public API:

    • popops::allReduce(), popops::allGather() and popops::reduceScatter() (replaced by gcl::allReduceWithinReplica(), gcl::allGatherWithinReplica() and gcl::reduceScatterWithinReplica())

2.3.0

  • New methods have been added to replace the deprecated ones in Poplibs:

    • gcl::allReduceWithinReplica(), gcl::allGatherWithinReplica() and gcl::reduceScatterWithinReplica() should be used instead of popops::allReduce(), popops::allGather() and popops::reduceScatter() from TensorCollectives.hpp

  • The following APIs are deprecated:

    • gcl::allReduce(): use gcl::allReduceCrossReplica() instead

    • gcl::allGather(): use gcl::allGatherCrossReplica() instead

    • gcl::reduceScatter(): use gcl::reduceScatterCrossReplica() instead

  • Internal function getNumXBsUsed() was removed from the public API

PopDist Compatibility changes

2.4.0

None.

2.3.0

None.

PopRun Compatibility changes

2.4.0

None.

2.3.0

None.

Libpva Library Compatibility changes

2.4.0

  • None

2.3.0

  • None

2.2.0

  • There has been a change to the classes for liveness:

    • Instead of programStep.notAlwaysLiveBytes you now have to use programStep.notAlwaysLiveMemory.bytes

    • Instead of programStep.notAlwaysLiveVariables[x].name you now have to use programStep.notAlwaysLiveMemory.variables[x].name

TensorFlow Compatibility changes

2.4.0

  • IPUMultiReplicaStrategy has been renamed to PopDistStrategy.

  • See the API changes section in the TensorFlow documentation for full details.

2.3.1

None.

2.3.0

  • Custom user op metadata interface update - the metadata interface for custom user ops has been updated with an additional parameter.

  • See the API changes section in the TensorFlow documentation for full details.

IPU TensorFlow Addons Compatibility changes

2.4.0

None.

Appendix

Appendix A: Additional requirements

PopVision Graph Analyser

  • To be able to view profiling reports generated by SDK v2.4.0, PopVision Graph Analyser v3.2.0 or later and PopVision System Analyser v2.2.0 or later are required.

TensorFlow

To correctly execute TensorFlow code, please ensure:

Intel platforms

  • Use Python 3.6 as the minimum version

  • A CPU compatible with the AVX-512 instruction set is needed (see the check after this list).
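
A quick way to confirm this on Linux is to look for the avx512f flag in /proc/cpuinfo, as in this small check:

```python
# Prints True if the host CPU advertises the AVX-512 Foundation flag.
with open("/proc/cpuinfo") as f:
    print("avx512f" in f.read())
```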

AMD platforms

  • Use Python 3.6 as the minimum version

  • A CPU compatible with the Znver1 instruction set is needed.