Scope of this document

This document contains the release notes for the Poplar SDK for Graphcore's IPU product family. The software deliverables covered by this document are the following:

Driver & Utilities

Driver and associated utilities needed by the Graphcore IPU.

PopART

The Poplar Advanced Run Time is a flexible ONNX-compatible runtime supporting both training & inference.

PopTorch

The PopTorch library provides a set of extensions for PyTorch to enable it to run on the Graphcore IPU hardware.

Poplar

A graph programming framework for the IPU.

PopDist/PopRun

Poplar Distributed Configuration Library (PopDist) is a library for configuring and coordinating distributed execution of (large-scale) machine learning applications.

TensorFlow

An implementation of the TensorFlow framework for the Graphcore IPU.

Package contents

The downloaded unified Poplar SDK will contain the following packages:

Ubuntu 18.04

Package              Version
Driver & Utilities   1.0.50
PopART               2.0.0
PopTorch             2.0.0
Poplar               2.0.0
PopDist/PopRun       2.0.0
TensorFlow 1         Graphcore TensorFlow 2.0.0
TensorFlow 2         Graphcore TensorFlow 2.0.0

CentOS 7.6

Package              Version
Driver & Utilities   1.0.50
PopART               2.0.0
PopTorch             2.0.0
Poplar               2.0.0
PopDist/PopRun       2.0.0
TensorFlow 1         Graphcore TensorFlow 2.0.0
TensorFlow 2         Graphcore TensorFlow 2.0.0

Note

See Appendix A for additional TensorFlow requirements.

Product support and compatibility matrix

SUPPORTED
These products are actively worked on: they will receive new features, general updates and security updates.
Notice of deprecation will be sent in advance for supported products.
DEPRECATED
These products will only receive security updates.
These products are expected to work with the indicated products; however, correctness is not guaranteed.
It is advised not to upgrade to this software version unless strictly necessary.
In the future, these products can move to a Not Supported state without further notice.
The support level will reflect the deprecated status.
NOT SUPPORTED
These products are not expected to work with this release.
No support will be provided.

Important

Deprecated products can be moved to a Not Supported status without further notice.

IPU-M2000 System Software compatibility matrix

IPUM Model           Version   Support level   Notes
IPU-M2000 300-0024   2.0.0     Supported       N/A

IPU PCIe Hardware Support level

Model         Revision        ICU Firmware version   Driver version   Support level   Notes
C2 300-0004   All revisions   1.4.14                 1.0.50           Not supported   -

Note

Use the firmware revision that corresponds to the IPU revision.

Important

For the firmware revision, compatibility is only enforced for patch versions.

Driver Support level

OS                  Support level   Supported Kernel Version   Notes
CentOS 7.4/7.5      Supported       3.10                       CentOS LTS kernel.
CentOS 7.6          Supported       3.10                       CentOS LTS kernel.
Microsoft Windows   Supported       -                          Windows Server 2019
Ubuntu 18.04        Supported       4.15                       Ubuntu LTS kernel.

Warning

It is strongly recommended to update the kernel module of the driver to the version included with this release, to avoid incompatibilities with the non-kernel components of this SDK.

SDK Support level

OS                  Support level   Notes
Microsoft Windows   Not Supported
CentOS 7.6          Supported       Less than optimal model compilation times have been observed in some specific instances; investigations are ongoing.
Ubuntu 18.04        Supported

Supported toolchain

Ubuntu 18.04

Tool        Support level   Version
GCC/G++     Supported       7.2.0
libstdc++   Supported       6.0.24
libc        Supported       2.27
binutils    Supported       2.30

CentOS 7.6

Tool        Support level   Version
GCC/G++     Supported       7.3.1
libstdc++   Supported       6.0.24
libc        Supported       2.17
binutils    Supported       2.28

Supported tools

Tool            Support level   Version   Notes
Python          Supported       3.6
Boost library   Deprecated      1.70      Older versions of Boost are no longer supported.

List of changes

The following sections list the changes in version 2.0.0, as well as older releases, for all products contained in the Poplar SDK.
There are three main sections, divided by topic:

Changelogs

The Changelogs section lists important bug fixes and relevant functionality that has been added. Minor fixes or features are not listed.

Known issues

The Known issues section lists all important issues known to date that impact Poplar functionality.

Compatibility changes

The Compatibility changes section captures any changes that must be applied to existing code so that it remains compatible with this version of the SDK.

Changelogs

Product              Changelog
Driver & Utilities   Changelog Driver & Utilities
PopART               Changelog PopART
PopTorch             Changelog PopTorch
Poplar               Changelog Poplar
Poplar Libraries     Changelog Poplar Libraries
PopDist/PopRun       Changelog PopRun/PopDist
Libpva Library       Changelog Libpva Library
TensorFlow           Changelog TensorFlow

Driver & Utilities Changelog

Kernel Module

1.0.50

  • T34412: Do not fail in driver_load.sh script when the driver is automatically loaded.

  • T28718: Added on chip memory recording. Visible via gc-monitor.

  • T29903: Implemented reset of IPU-M service tables.

  • T34199: Fix PCIe driver mailbox compatibility with Poplar SDK 1.4.

  • T33255: Driver now logs message when IPU clock has been throttled.

  • T32751: Added external memory info when available via IPUoF.

  • T32193: Improve resilience for ICU comms during ICU event notifications.

  • T31153: Fixed ICU async message handling.

  • T32313: Use Python 3 to generate PCIe driver defines.

  • T32162: Fix race in starting and stopping IPU sync utilisation kthread.

  • T31152: Added a mechanism to clear the IPU driver's nlc_total_errcnt sysfs attribute.

  • T28692: Added IPU sync utilisation in PCIe driver.

  • T29859: Fixed access to multiple contiguous buffers in the PCIe driver.

  • T29750: Fix PCIe driver build for CentOS 8.2.

  • T28727: Add support to store total tile memory usages and export them as attributes over IPUoF.

  • T28231: Added support for reading AER error status from the PCIe driver.

  • T26628: Accumulate NLC correctable error count in sysfs.

Low level libraries and tools

2.0.0

  • T33422: MultiIPU bootloader support.

  • T32608: Use monotonic clock for duration calculations in GCDA and hw_testing.

  • T33121: Add --ipum-loopback mode to gc-iputraffictest, for testing IPU-M loopback links.

  • T33281: Extended gc-iputraffictest timeout when using the --all-links --amp options.

  • T32498: gc-iputraffictest: display per-board power details.

  • T31578: Add --pulseamp option to gc-iputraffictest.

  • T29899: Add environment variable GCDA_EXT_LINKS_TX_EQ to override default TX EQ setting for external links of M2000.

  • T31149: Add --amp option to gc-iputraffictest.

  • T28484: Don’t allow gc-hosttraffictest to run on a multi-IPU device.

  • T31247: PVTI documentation updates.

  • T24109: libpvti traces are saved in SQLite format by default.

  • T30878: Added process ID into the PVTI log file name.

  • T29035: PVTI documentation updates.

  • T30033: Added PVTI Python decorators (see the sketch after this list).

  • T34775: Make IPU binary loading thread safe.

  • T28718: Added on chip memory recording. Visible via gc-monitor.

  • T29903: Implemented reset of IPU-M service tables.

  • T34199: Fix PCIe driver mailbox compatibility with Poplar SDK 1.4.

  • T32751: Added external memory info when available via IPUoF.

  • T32911: Added GCDA functionality to enable IPU GW-Links.

  • T32193: Improve resilience for ICU comms during ICU event notifications.

  • T31153: Fixed ICU async message handling.

  • T32219: Fix to allow allocation of >16GB of streaming memory.

  • T23659: More control over which architectures are supported during software builds.

  • T14918: Display full path to executable in gc-monitor.

  • T29973: Fixed memory leak in the PCIe driver userspace library.

  • T28721: Update gc-monitor to add new -f/--field option to show tile-mem/ext-mem/sync utilisation at runtime, if available.

  • T34860: BTNC routing must be used with loopback cables.

  • T34711: Improved ICU timeout error handling.

  • T30788: Added POD software version info into gc-monitor.

  • T32626: Fix GSD sync configuration.

  • T34365: Improved error message when failing to open a required target support library.

  • T31942: Fixed symbol name clash in the ICU source code vs Windows.

  • T28727: Add support to store total tile memory usages and export them as attributes over IPUoF.

  • T33779: Improved ICU mailbox failure recovery.

  • T33275: Clear the ICU mailbox on error so that it has a better chance of recovery.

  • T23314: Fixed ICU access regression in Windows.

  • T31440: Added --tile-overview to gc-info.

  • T31486: Add GCDA Board API and record information from all board power/temperature sensors.

  • T31912: gc-docker discovers Fabric devices.

  • T30500: Improvements and simplification of reset.

  • T32266: Added required double read of ‘volatile’ SERDES registers.

  • T31151: Correctable error count is displayed for Fabric devices.

  • T22666: Default to --ipc=host in gc-docker.

  • T28724: Add XML and CSV output formats to gc-monitor.

  • T30710: Fixed gc-info --phy-dump for M2000.

  • T31592: Improved debug logs during reset.

  • T31255: Ensure IPU reset is done before Newmanry reset.

  • T29684: Updated test tools to separate configure and attach via GCDA.

  • T30814: Fix clearing of .bss in tile memory.

  • T28716: Make gc-monitor more POD specific.

  • T29171: Device reset stability improvements.

  • T28805: Added support for gc-powertest over IPUoF.

  • T29033: Added support for configuring extra sync zones via GCDA.

  • T29444: Parameterise response time of tile exception notifications in GCDA RPC.

  • T30152: Removed unused ICU mailbox mutex.

  • T30090: Prevent GCDA users from being able to configure links when the IPU may be transmitting packets.

  • T30143: Guarantee a SoC and IPU reset happen before a parity reset.

  • T29036: Add an iterator in IPU arch info to access <name, base-address> of all instances of a hardware block.

  • T29931: Fixed disassembler output.

  • T14152: Extended info available via IPU arch info.

  • T34724: Fix rdma_disconnect to use the correct CM ID.

  • T33533: Improved IPUoF server shutdown.

  • T34144: Improved IPUoF error reporting on attach and detach failure.

  • T34223: Fix IPUoF server crash if application tries to release contiguous memory buffer before memory clearing has finished.

  • T33937: IPUoF HSP polling performance improvement.

  • T33406: Increased the maximum allowed V-IPU server timeout value.

  • T33424: Improved full SQ efficiency. Released in IPUoF server version 1.4.0 and later.

  • T31458: Avoid send queue overflow in Fabric client and server in non-polling mode.

  • T33067: Improved IPUoF server and client log messages.

  • T32493: Set the max_send_sge correctly on the loopback QP.

  • T31710: Improved mirror_host_buffer/RDMA-write latency.

  • T30737: Ensure all user buffers are detached when a user process detaches.

  • T30998: Added an environment variable for setting V-IPU timeout.

  • T29870: Implemented PL-DDR clear at startup.
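
The PVTI Python decorators added above (T30033) can be used along these lines. This is a minimal sketch, assuming the createTraceChannel and instrument_fn helpers exposed by the libpvti Python module; the channel and function names are illustrative.

    import libpvti as pvti

    # Assumed entry points of the libpvti Python API; the channel name is
    # illustrative.
    channel = pvti.createTraceChannel("Preprocessing")

    @pvti.instrument_fn(channel)
    def load_batch():
        # An enter/exit trace event pair is recorded for each call and can
        # be inspected in the PopVision System Analyser.
        ...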

PopART Changelog

2.0.0

New features

  • Change optimizer to use globalReplicationFactor when distributed

  • Add error when trying replicated tensor sharding and global replication

  • Copy partialType in serializematmuls

  • Allow Offline device with distributed replication

  • Ensure enableEngineCaching is set before loading cache and handle loading errors

  • Allow Session to load binaries from serialized files

  • Update compileAndExport to use the same file format as the executable cache (see the sketch after this list)

  • Fuse cache files and support multiple cache entries

  • Add ability for PopART to build in a standalone Conda environment

  • Add DepthToSpace as a custom op

  • Add backwards op for atan2

  • Re-enable synthetic_data_test that requires HW

  • Add enhanced debug information to PopART

  • Make MultiConv op attributes optional

  • Add support for biased convolutions in PopART MultiConv

  • Add numerous doxygen comments and comment improvements

  • Add SessionOption accumulationAndReplicationReductionType, which controls the reduction type of accumulated gradients and reduction of replicas

  • Stop using reduction option of loss operations to determine how loss gradients are reduced if accumulationReductionType is used

  • Add reverse operation in the aiGraphcore domain

  • Support input/output/anchor tensors of type int8 (including synthetic data)

  • Support stashing/restoring/IPUCopying an int8 input tensor when pipelining

  • Infer tensor to update in VarUpdateOps from input Tensors

  • Check early for CMake versions in the found dependencies

  • Improve model compile time by planning convolutions and matmuls in parallel

  • Use a more efficient Poplar tensor-expression implementation of ResizeOp for non-negative integers

  • Add Connectionist Temporal Classification (CTC) loss

  • Add support for setting Poplar options for LSTM operations

  • Improve the debug information that is recorded in PopART

  • Improve the PopART user guide

  • Add ability to copy inputs/outputs to subgraphs in a just-in-time manner

  • Improve error message when unknown tensors are used in the builder

  • Add Sequence slice op working with packed sequences

  • Add custom op shape inference that doesn’t require onnx

  • Add environment variable, POPART_TRACE_TENSORS, as an alternative to PrintTensorOp

  • Support user added CallOps being used with pipelining

  • Improve error messages in the case of optimizer compatibility errors

  • Change PriTaskDependency to allow delaying picking tensor creators and express more complex forms of dependencies

  • Add LAMB optimizer off-chip test with mean reduction

  • Add support for ONNX ScanOp (forward-only)

  • Add global batch size test with batch serialisation

  • Implement operator support for AllReduce & ReduceScatter

  • Add transformation to unroll and decompose LoopOps

  • Add dead code elimination support as post-IR optimisation

  • Add range-based indexing for remote buffers when used in LoopOps

  • Add transform to merge adjacent LoopOps together

  • Allow pipelining to resolve correct IPU from CallOps

  • Add new warnings and errors for incompatible user settings

  • Improve IR transform order to run critical transforms before outlining

  • Improve replicated tensor sharding (RTS) optimizers for pipelining

  • Add reshape to correct output shape for binary/mul gradoppattern outputs

  • Avoid double scoping TensorIds when creating new outputs for operations within subgraphs

  • Improve pre/post loss and RemoteLoad/RemoteStore scheduling

  • Improve catching all types of (indirectly or directly) unmodifiable tensors for inplacing

  • Ensure input and output data type of replicated allgather remains identical

  • Remove extra outline attributes from ops that do not need them

  • Add custom reshape with one input and one attribute

  • Migrate DepthToSpace and SpaceToDepth to onnxpasses

  • Add C code comments to Python docs for cases where C and Python functionality are the same.

  • Add support for negative axes in reduce ops

  • Move graph compiled logging to info level

  • Add logging feature: Compile time breakdown into main components

  • Add ONNX → ONNX transformations.

  • Add Mod and Remainder Operators

  • Run deserialized executable on a different device

  • Make loadEngineAndConnectStreams public

  • Add isAttached function to DeviceInfo

  • Allow resetting device associated with session

  • Enable loading OfflineIpu executables into Ipu device

  • Add assert operation to PopART using Poplar's Abort and AbortOnCondition programs

  • Added hashing for all session option values (improved serialization)

  • Defer cached engine loading to Session::prepareDevice
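
As a rough illustration of the precompilation workflow referenced in this list, the sketch below compiles an inference session against an offline device and exports the result in the executable cache format. The ONNX file name, anchor name, device options and output path are illustrative placeholders, not prescribed by PopART.

    import popart

    # Anchor one output tensor for a single batch per step (illustrative).
    dataFlow = popart.DataFlow(1, {"output": popart.AnchorReturnType("All")})

    # Compile against an offline device, i.e. without attaching to IPUs.
    device = popart.DeviceManager().createOfflineIPUDevice({"numIPUs": 1})

    session = popart.InferenceSession(fnModel="model.onnx",
                                      dataFlow=dataFlow,
                                      deviceInfo=device)

    # The exported file uses the same format as the executable cache, so a
    # later Session can load it instead of recompiling.
    session.compileAndExport("model.popart.cache")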

Bug Fixes

  • Fix bug where it could fail to retrieve subgraph virtual graph IDs

  • Improve unwinding and fix cases of circular tensor dependencies

  • Fix batch axis pickup on weights for batch serialisation

  • Fixes for engine caching of optimizer state data

  • Make instancenorm, convolutions, LSTM/GRU re-setupable

  • Fix aliasing (fwd/bwd region mapping) for elementwise broadcasting

  • Fix replicated tensor sharding (RTS) for elementwise broadcasting

  • Fix batch serialization in the case where graph replication factor is greater than zero

  • Fix incorrect computation in the ‘atan2’ and ‘fmod’ gradient operators

  • Fix the scaling of the output of the IdentityLossGradOp when the ‘reduction’ option is set to popart.ReductionType.Mean

  • Fix an outlining bug related to pruning

PopTorch Changelog

2.0.0

  • Added support for the following activation functions:

    • torch.nn.acosh

    • torch.nn.asinh

    • torch.nn.atanh

    • torch.nn.Hardshrink

    • torch.nn.SiLU

    • torch.nn.Softplus

    • torch.nn.Softshrink

    • torch.nn.Threshold

  • Add support for the following random sampling operations:

    • torch.bernoulli

    • torch.distributions.Bernoulli

  • Add experimental support for torch.nn.CTCLoss

  • Added Adam optimizer

  • Added support for torch.nn.AdaptiveAvgPool1d, torch.nn.AdaptiveAvgPool3d

  • Migrated to PyTorch version 1.7.1

  • Add support for aten::index, aten::index_put_

  • Add support for torch.zeros_like, torch.ones_like

  • Allow the user to specify which optimizer attributes are constant and which are not.

  • Allow the user to specify mode=poptorch.DataLoaderMode.Async in the poptorch.DataLoader constructor instead of explicitly creating an AsynchronousDataAccessor (see the sketch after this list)

  • Add support for torch.nn.EmbeddingBag

  • Added support for torch.clamp_max and torch.clamp_min

  • Add support for torch.min(tensor, dim=., keepdim=.) and torch.max(tensor, dim=., keepdim=.) overloads.

  • Add support for poptorch.isRunningOnIpu. This function returns True when executing on IPU and False when executing the model outside IPU scope.

  • Add support for torch.amax and torch.amin

  • Add support for attributes in custom ops.

  • Add support for precompilation and reloading exported executables (poptorch.PoplarExecutor.compileAndExport and poptorch.load)

  • Add support for slices with variable start index (slice size must be constant).

  • Add ipuHardwareVersion function to read the version of the IPU hardware present on the system.

  • Changed the default targeted IPU version for the model and offline compilation to 2.

  • Changed accumulationReductionType(reduction) option to now apply to replication reduction as well

  • Add environment variable POPTORCH_CACHE_DIR

  • Deprecated Options.Popart; Options._Popart may be used experimentally.
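
To illustrate the asynchronous DataLoader mode listed above, here is a minimal sketch; the dataset is a stand-in and all shapes are arbitrary.

    import torch
    import poptorch

    class RandomDataset(torch.utils.data.Dataset):
        # Stand-in dataset; sizes and shapes are arbitrary.
        def __len__(self):
            return 1024

        def __getitem__(self, index):
            return torch.randn(16), torch.randint(0, 10, ())

    opts = poptorch.Options()

    # mode=Async starts the asynchronous accessor internally, replacing an
    # explicitly constructed AsynchronousDataAccessor.
    loader = poptorch.DataLoader(opts,
                                 RandomDataset(),
                                 batch_size=4,
                                 mode=poptorch.DataLoaderMode.Async)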

Poplar Changelog

2.0.0

New features

  • A new implementation of the host IO runtime that doesn’t block until needed

  • Compile time improvements, especially related to complicated tensor expressions

  • Switched to V3 of the profiler format by default

Bug fixes

  • Fixed a host memory explosion due to certain tensor expressions

  • Fixed a bug when resetting replica subset

  • Fixed a compile time issue when using non-default cell ordering in a GRU

  • Fixed a hang caused by not respecting the sync group during control flow

  • Fixed a crash when storing a Target in a hash table

  • Fixed an overflow in the profiler during long executions

  • Prevented the ProfilerAnalyseMemory stage from being called twice during lowering

  • Added error checking when creating a tensor with too many elements

  • Fixed a bug caused by assuming an incorrect master tile in gateway mode

  • Fixed a memory spike that only occurs while using gateway mode

  • Don’t output RTTI information (which wasn’t supported anyway) when using popc

  • IPU device NUMA information is now recomputed when the device changes

  • Fixed a corruption in data streams when using serialised executables

  • Profiling now only tracks the first replica

  • Avoid database external reads during profiling before completion

  • Do not reset ring counts completely when reconnecting a stream callback

  • Fixed a tile exception caused by exceeding the 16-bit delay value in exchange lowering

  • Fixed wrong link in Poplar User Guide

  • Fixed problem with image filename in Poplar User Guide

Other improvements

  • Optimised the performance when using the syncReplicasIndependently option

  • Use fast RDMA transfers by default

  • Better error message when trying to attach to unsupported number of IPUs

  • Make poplar::DeviceManager and poplar::Engine thread-safe

  • Code gen optimisation for poplar::program::Switch

  • Support POD64 sync configuration where master is in the middle

  • Export a CMake config version file for Poplar

  • Added a new poplar::program::Abort program

  • Added metadata about the memory usage to the final ELF files

  • Remove limitation that vertex state must go in the lower 256 KB

  • Better logging related to Poplar’s temporary files

  • Add Poplar functions to get/modify/restore CSR

  • Optimised the interleave tensor expression to make it dimension agnostic

  • Added support for counting number of FLOPs as well as cycles for a model

  • Report cycles for all executions of the same StreamCopy rather than just the last one

  • Improved binary load times by using a bootloader on the device

  • Added a method to help printing tensor shapes

  • Better support for DebugContext among Poplar’s native programs

  • New method for performing upsample by repetition on a tensor

  • Improvements to the profiler to distinguish between host and remote buffer IO

  • Support for GS2 sync in Poplar

  • Added an engine option for better logging of copy lowering

  • popc now sets -Wdouble-precision by default

  • Better error checking when setting a print stream

  • Improved documentation around the fact that random numbers are not reliable when using the IPUModel

  • Removed references to an internal macro in the vertex/assembly documentation

Poplar Libraries Changelog

2.0.0

New features

  • Added support for CTC loss in training graphs

  • Added support for Faster Transformer using dynamic sequence padding

  • Reimplemented TopK and sort using a bitonic sort algorithm

  • Extended LSTM and GRU support to better handle layers with large sequences

  • Added support for Cholesky decomposition

  • Early access POD128 (2xPOD64) and 2xPOD16 support for a replica size of 4 IPUs, as demonstrated with RN50 and BERT-L. Enabled by a three-phase allreduce in which phase 2 crosses the gateway links.

  • Early access grouped collectives API, enabling collective operations on a sub-group of all the replicas constituting the application, e.g. all replicas within a rack (over IPU-Links) or across racks (over gateway links only).

Bug fixes

  • Fixed a bug where doing abs on an int treated it as a float

  • Fixed a bug in the embeddings where the wrong vertex was chosen

  • Reduced the probability of overflow because of Block Sparse masking

  • Fixed some performance issues with the sparse_fc_layer tool

  • Fix problems in block sparse matmul documentation

  • Fixed an overflow in the transpose vertex that caused a tile exception

  • Fixed the operators of SlicePlan when used in an associative container

  • Fixed a bug where the wrong vertex was chosen for a fully connected layer that had different input and output type

  • Instantiate histogram codelets correctly for absolute input values

  • Remove internal API from the public convolution header

  • Fixed a bug that caused popops::scaledAddTo(a, X, b, Y) to fail when X is integer type

  • Fixed a compile time explosion when planning certain convolutions

  • Updated documentation to fix broken links

  • Fixed discrepancies in the 1x1 convolution vertex cycle estimates

  • Fix histogram graph construction to allow for multi-dimensional tensors

  • Force convolution groups to be 1 when the outer product vertex is used

  • Fixed reference to “Unnamed Group” in the Poplibs documentation

  • Fixed prearranges using the triangular solver with better tile mappings

  • Clarified unclear documentation of the remapOutputTensor convolution option

  • Fixed an error with the API in the popsparse documentation

Other improvements

  • Compile time improvements during graph construction

  • Added support for casting between int8 and fp16/fp32

  • Add a grain size to the created output tensor for convolutions if it cannot be based on the input operand tensors

  • Improve calculation of the division of workers in the elementwise operations

  • Improved the triangular solver layer with a planner and allocation functions

  • Update the planner to specialise the convolution partial types based on the candidate vertex type

  • Optimise binary elementwise operations that return a boolean

  • Added non-inplace variants for non-linearities

  • Added enhanced debug information to poprand

  • Moved poplibs collective operations to GCL (see known issues for API change)

  • Updated collectives documentation to reflect this move

  • Improved logging during graph constructions

  • Improved logging of convolutions

  • Added popops::hasInfOrNan and popops::hasInf operators

  • Added mixed precision version of aXMinusbY codelet

  • Added support for elementwise equality of 16-bit integer types

  • Improved the code generation of the popops::iota op

  • Added support for reduce LogAdd in the reduction library

  • Improve documentation on what types are supported by popops::fill

  • Added an introduction to poplibs to the README

PopDist Changelog

2.0.0

New features

  • Added documentation

  • PopTorch support

  • Improved all user error messages

  • ipus_per_replica is now optional when calling getDeviceId
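
A hedged sketch of how an application might pick up its device when launched under poprun, based on the getDeviceId change above; isPopdistEnvSet is assumed to be the helper that detects a poprun launch.

    import popdist

    # When launched via poprun, popdist reads the distributed configuration
    # (instances, replicas, device mapping) from environment variables.
    if popdist.isPopdistEnvSet():
        # ipus_per_replica is now optional and is inferred from the
        # configuration when omitted.
        device_id = popdist.getDeviceId()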

PopRun Changelog

2.0.0

New features

  • Added documentation

  • POD native synchronisation support

  • Improved input validation

  • Offline mode support (running application without requiring IPUs)

  • Support multi IPU-Link domain and multi-host in offline mode

  • Newly created V-IPU partitions are not reset

  • Ability to specify a timeout for V-IPU server requests

  • Partitions created by PopRun will be automatically evicted

  • PopRun will provide interactive progress status while running

  • All available NUMA nodes may be used and pinned consecutively

  • OpenMPI 4.0 is now bundled with the Poplar SDK, removing OpenMPI as an external dependency.

  • Temporary executable caching to avoid redundant compilations on the same host

  • Added verification of the number of replicas in existing partitions

Libpva Library Changelog

2.0.0

New features

  • Preview version of the PopVision Analyser Library for programmatic access to the profile.pop report. APIs are provided for C++ and Python applications.
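
A brief sketch of the Python side of this preview API. openReport and the report attributes shown follow the library's documentation; the report path is illustrative.

    import pva

    # Open a report produced by a Poplar application with profiling enabled.
    report = pva.openReport("profile.pop")

    print("Poplar version:", report.poplarVersion.string)
    print("Number of compute sets:", report.compilation.graph.numComputeSets)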

Bug fixes

  • None

TensorFlow Changelog

2.0.0

New features

  • Add support for overlapping communication and computation when using IPUInfeedQueue or the IPU Keras API. See the Efficient IPU I/O documentation section for more details.

  • Support 8-bit signed/unsigned integer datatypes for IPUInfeedQueue, IPUOutfeedQueue and datasets passed to any of the IPU Keras models.

  • Improved support for Cholesky and Triangular Solver operations. See ipu.utils.set_optimization_options for controlling the block size of the operations.

  • Improved performance and integration when using PopDist and PopRun for multi-instance programs.

  • Add a pre-compilation tracing mode for compiling programs for IPUs on machines without IPU hardware. See the Compiling and pre-compiling executables section in the documentation for more details.

  • Use OpenMPI bundled with the Poplar SDK for Horovod, removing external dependency on OpenMPI.

  • Implemented IPU versions of nce_loss and sampled_softmax_loss. Note that these are not compatible with host embeddings or compute_accidental_hits.

  • Implemented IPU versions of ctc_loss, ctc_loss_with_logits and Keras ipu.keras.CTCLoss.

  • Implemented IPU version of Dynamic GRU layer (PopnnDynamicGRU) to support dynamic sequence lengths.

  • Implemented IPU version of Attention Update GRU layer (PopnnAUGRU) with support for dynamic sequence lengths.

  • Improvements for tensor allocations when using embeddings or one-hot operations.

  • Take gradient accumulation into account for steps in IPUPipelineEstimator.

  • Use temporary executable cache from PopDist as a fallback to avoid redundant compilations.

  • Add synthetic_data_categories flag to TF_POPLAR_FLAGS for fine-grained control of synthetic data.

  • Add further optimizations when enable_fast_math is enabled.

  • Generated profiling files for the PopVision Graph Analyser tool are now stored in directories in a tf_report{ISO date}{Process ID} format. Note that when using PopDist this format is tf_report{ISO date}{Process ID}__instance_{Instance Number} to allow profiling of each instance.

  • Improved integration with the PopVision Graph Analyser.

  • Improved documentation of the IPU Keras API.

  • Add unique_sharding and keep_input_layouts to ipu.outlined_function. See the API documentation for more details.

  • Improved integration with remote buffers when using ipu.outlined_function in custom TensorFlow optimizers.

  • Support the axis parameter in InstanceNormalization.

  • Improvements to the performance of the data feeding mechanism.

  • Automatically display a compilation progress bar when compiling larger models. It can be disabled by setting TF_POPLAR_FLAGS=--show_progress_bar=false (see the sketch after this list).

  • Improved error messages and error reporting.
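
As the items above note, TF_POPLAR_FLAGS is read from the environment when the IPU runtime initialises, so flags can be set from Python before TensorFlow is imported. A minimal sketch; only the progress-bar flag spelling is taken from this list.

    import os

    # Disable the compilation progress bar described above. Multiple flags
    # can be combined in one space-separated string.
    os.environ["TF_POPLAR_FLAGS"] = "--show_progress_bar=false"

    import tensorflow as tf  # the Graphcore port bundled with the SDK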

Bug fixes

  • Make average the default operation for Horovod allreduce.

  • Fix handling of multi input and multi output IPU Keras models.

  • Fix the grouping in the IPU Keras InstanceNormalization and LayerNormalization.

  • Fix handling of datasets with nested structures passed to the IPU Keras models.

  • Fix groups in IPU Keras Layer and Instance norm

  • Fix initializer colocation in IPUMultiWorkerStrategy.

  • Fix to recomputation in pipelining when a pipeline stage contains stateful operations which cannot be recomputed.

  • Prevent TemporaryVariable operations from being generated.

  • Fix edge cases where a transposed constant could be used as-is without transposition.

  • Support --log_cycle_count for cached executables.

Known issues

The following sections detail the known issues in v2.0.0.
Each product is detailed separately.

Product              Section
Driver & Utilities   Driver & Utilities known issues
PopART               PopART known issues
PopTorch             PopTorch known issues
Poplar               Poplar known issues
Poplar Libraries     Poplar Libraries known issues
PopDist/PopRun       PopRun/PopDist known issues
Libpva Library       Libpva Library known issues
TensorFlow           TensorFlow known issues

Driver & Utilities known issues

1.0.50

  • Killing an application whilst it is in the process of attaching to an IPU within a POD can cause the IPUs to reject further connections, requiring a partition reset to recover.

PopART known issues

2.0.0

None.

PopTorch known issues

2.0.0

None.

Poplar known issues

2.0.0

  • A bootloader is now used by default on Mk2 targets, which slightly reduces the available tile memory

  • Deprecated the following dangerous APIs:

    • Engine::Engine(Graph &&, …​)

    • Engine::readTensor(StringRef, void *)

    • Engine::writeTensor(StringRef, const void *)

    • Engine::connectStream(const DataStream &stream, void *, void *)

    • Engine::connectStream(const DataStream &stream, void *)

    • Engine::connectStreamToCallback(const DataStream &, StreamCallbackHandle)

    • Engine::connectStreamToCallback(const DataStream &, unsigned, StreamCallbackHandle)

    • Engine::copyFromRemoteBuffer(const RemoteBuffer &, void *, int, unsigned)

    • Engine::copyToRemoteBuffer(const RemoteBuffer &, void *, int, unsigned)

    • compileGraph(Graph &&)

  • Deprecated the poplar::cycleCount and poplar::cycleStamp APIs that do not take an explicit sync type.

  • When running TensorFlow CNN applications on POD128, the Poplar Engine option "target.maxStreamCallbackThreadsPerNumaNode": "auto" must be set to ensure convergence to reference accuracy and throughput (see the sketch after this list).

  • A deadlock will occur during replicated training if the fastest replica is executing sufficiently fast that it empties its data feed queue, when the slowest replica has a full queue.

  • Some tables do not appear in the HTML Poplar documents. See the documents published on https://docs.graphcore.ai for the correct versions.
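
One hedged way to apply the engine option from the POD128 note above without modifying application code is Poplar's POPLAR_ENGINE_OPTIONS environment variable, which accepts engine options as a JSON dictionary and must be set before the process creates its Engine.

    import os

    # Applied globally to any Poplar Engine created by this process.
    os.environ["POPLAR_ENGINE_OPTIONS"] = (
        '{"target.maxStreamCallbackThreadsPerNumaNode": "auto"}'
    )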

Poplar Libraries known issues

2.0.0

  • The following API’s have been deprecated:

    • popops::reduceScatter(…​, popops::Operation, …​)

    • popops::allReduce(…​, popops::Operation, …​)

    • poplin::createTriangularSolveInputLHS(…​, std::size_t blockSize, …​)

    • poplin::createTriangularSolveInputRHS(…​, std::size_t blockSize, …​)

    • poplin::triangularSolve(…​, std::size_t blockSize, …​)

    • poplin::getTriangularSolveMatMulPrePlanParameters(…​, std::size_t blockSize, …​)

PopDist known issues

2.0.0

None.

PopRun known issues

2.0.0

None.

Libpva Library known issues

2.0.0

None.

TensorFlow known issues

2.0.0

  • IPU Keras API ignores the shuffle and verbose arguments.

Compatibility changes

The following sections detail the compatibility changes in v2.0.0.

Product              Section
Driver & Utilities   Driver & Utilities compatibility changes
PopART               PopART compatibility changes
PopTorch             PopTorch compatibility changes
Poplar               Poplar compatibility changes
Poplar Libraries     Poplar Libraries compatibility changes
PopDist/PopRun       PopRun/PopDist compatibility changes
Libpva Library       Libpva Library compatibility changes
TensorFlow           TensorFlow compatibility changes

Driver & Utilities Compatibility changes

1.0.50

None.

PopART Compatibility changes

2.0.0

  • [API] Deprecate accumulationReductionType SessionOption

  • [API] Remove previously deprecated LegacyOpFactoryFunction

  • [API] Deprecate Opx::debugPrefix

PopTorch Compatibility changes

2.0.0

None.

Poplar Compatibility changes

2.0.0

None.

Poplar Libraries Compatibility changes

2.0.0

None.

PopDist Compatibility changes

2.0.0

None.

PopRun Compatibility changes

2.0.0

None.

Libpva Library Compatibility changes

2.0.0

None.

TensorFlow Compatibility changes

2.0.0

  • TensorFlow packages have been renamed from gc-tensorflow to tensorflow. This ensures that the Graphcore port of the TensorFlow framework can be detected by pip when using other libraries dependent on TensorFlow.

  • The default Poplar IPU Model used for tests has changed to IPU version 2.

  • Expose offload_weight_update_variables to the IPU Keras Pipelining API.

  • Custom IPU operations metadata API (now level 4) has changed to include parameters input_to_output_tensor_aliasing and is_hashable. See the Custom IPU operations documentation section for further details.

  • Updated optimizer namespace structure to allow simpler import statements.

  • ipu.keras.SequentialPipelineModel has been renamed to ipu.keras.PipelineSequential. The old name is deprecated and will be removed in a future release.

  • The accumulation_count and accumulation_dtype arguments in the constructors of the ipu.keras.Model and ipu.keras.Sequential classes have been renamed to gradient_accumulation_count and gradient_accumulation_dtype. The old names are deprecated and will be removed in a future release.

  • Autosharding has been deprecated and will be removed in a future release.

  • For a comprehensive list of all the API calls which have been deprecated or removed, and how to update them, see the API changes section in the documentation.

Appendix

Appendix A : Additional requirements

PopVision Graph Analyser

  • To be able to view profiling reports generated by SDK v1.3.0, PopVision Graph Analyser v2.1 is required.

TensorFlow

To correctly execute TensorFlow code, please ensure the following:

Intel platforms

  • Use Python 3.6 as the minimum version

  • A CPU compatible with the AVX-512 instruction set is needed.
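
A quick, illustrative way to check the AVX-512 requirement on Linux; avx512f is the standard /proc/cpuinfo flag for the AVX-512 foundation instructions. This check is not part of the SDK.

    # Report whether the host CPU advertises AVX-512 foundation support.
    with open("/proc/cpuinfo") as f:
        cpu_flags = f.read()

    print("AVX-512F available:", "avx512f" in cpu_flags)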

AMD platforms

  • Use Python 3.6 as the minimum version

  • A CPU compatible with the AMD Znver1 microarchitecture is needed.