5.1.9. Graphcore Communication Library (GCL) changelog

2.6.0+5997

New features

  • Added support for orthogonal groups using store and forward when the GCL option syncful.useStoreAndForward is set to true

  • Added toPopopsOperation() in Collective API

  • Added the option to generate GraphViz .gv files from crossReplicaCopy() maps, to visualize cross-replica communication patterns, when the GCL_CROSS_REPLICA_COPY_GRAPH_PATH environment variable is set

  • Collective Balanced Reorder API: Added setter for gatheredToRefSlices

  • Collective Balanced Reorder API: Added a size safe API

  • Implemented and enabled multi-phase ReduceScatter

  • The broadcast method is now used for small tensors and small group sizes when the collective method is auto
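
To give a feel for what a GraphViz .gv file describing a cross-replica copy pattern might contain, here is a minimal Python sketch. It does not use GCL and does not reproduce the exact output GCL writes: the helper copy_map_to_dot(), the node names and the ring pattern are all hypothetical and exist only for illustration.

```python
def copy_map_to_dot(copy_map, name="cross_replica_copy"):
    """Render a {source_replica: destination_replica} map as GraphViz DOT text.

    Hypothetical helper for illustration only; not part of GCL.
    """
    lines = [f"digraph {name} {{"]
    for src, dst in sorted(copy_map.items()):
        # One directed edge per copy: source replica -> destination replica.
        lines.append(f"  r{src} -> r{dst};")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Assumed example pattern: each of 4 replicas sends to its right-hand
# neighbour, forming a ring.
ring = {r: (r + 1) % 4 for r in range(4)}
dot_text = copy_map_to_dot(ring)
print(dot_text)
```

Saving text like this to a .gv file and running the GraphViz command dot -Tpng on it renders the pattern as an image.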

Bug fixes

  • Fixed readTileMemory

  • Fixed CommGroup::size() logging for orthogonal groups

  • Added verification of optionals after unidirectional ring calls

  • Fixed unchecked optional accesses

  • Avoided exceptions in fromPublic()

  • Set proper initial value for logical op in TestIO.cpp

Other improvements

  • Started using multislicing in broadcast AllReduce when group size is greater than 2

  • Parameterized grain sizes for bidirectional reduceScatter

  • Added ReduceScatter/AllGather information to DebugNameAndInfo structure

  • Adjusted ringAllReduce grain size based on which collective method is used

  • Improved error handling of floating point conversions

  • Multiple improvements to the documentation

  • Added dispatcher logging for multi phase collectives

  • Performance improvements to meetInMiddle in CollectivesProgram

2.5.0

New features

  • Extended GCL group API to include interleaved groups

  • Added a broadcast/oneToAll collective

  • Added handling for GCL_OPTIONS environment variable

  • Added support for many tensor multi phase reductions

  • Several latency improvements for GW-Links traffic

Bug fixes

  • Fixed grain size used in Collective Balanced Reorder API for multi phase AllReduce

  • Fixed SQUARE_ADD operation for multi phase AllReduce

  • Fixed uneven use of GW-Links on IPU-POD128 system

Other improvements

  • Added syncful.useOptimisedLayout GCL option

  • Multiple improvements to GCL’s memory footprint

  • Added support for n-phased cycle counts

  • Parallelised host side result validation

  • Relaxed mapping requirements for non-replicated collectives

  • Exposed concatChunks in the Collectives API

  • Added guards preventing modifications of input tensor

  • Added a GCL code example to the Poplar and PopLibs User Guide