5.11. Graphcore Communication Library (GCL)
Ensured that ReduceScatter always chooses the same collective method as the AllGather that follows it and consumes its output.
Fixed the AllReduce performance glitch for models with 1 IPU per replica and 90-100 MB tensors.
Simplified public API for collectives.
Improved GCL documentation.
Optimized AllToAll collective on SlidingWindow-routed fabrics.
Allowed specifying chains of collective methods.
Prevented AllReduce outputs from being flagged as always-live.
Removed support for syncless collectives. Use the default (syncful) collectives implementation instead. The option to select a collective implementation has been deprecated.
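As background for the ReduceScatter/AllGather item above, the following is a minimal plain-Python sketch of how the two collectives compose: a ReduceScatter leaves each replica with one slice of the element-wise sum, and an AllGather that consumes those slices rebuilds the full result on every replica, which matches what an AllReduce would produce. This is not the GCL API. The function names and the list-of-lists representation of replicas are hypothetical and chosen only for illustration.

```python
from typing import List

Replicas = List[List[float]]  # hypothetical: one inner list per replica


def reduce_scatter(inputs: Replicas) -> Replicas:
    """Sum element-wise across replicas, then give replica r the r-th slice."""
    n = len(inputs)
    length = len(inputs[0])
    assert length % n == 0, "sketch assumes the tensor divides evenly"
    chunk = length // n
    summed = [sum(rep[i] for rep in inputs) for i in range(length)]
    return [summed[r * chunk:(r + 1) * chunk] for r in range(n)]


def all_gather(slices: Replicas) -> Replicas:
    """Concatenate every replica's slice and give each replica the full copy."""
    full = [x for s in slices for x in s]
    return [list(full) for _ in slices]


def all_reduce(inputs: Replicas) -> Replicas:
    """Reference result: every replica holds the full element-wise sum."""
    length = len(inputs[0])
    summed = [sum(rep[i] for rep in inputs) for i in range(length)]
    return [list(summed) for _ in inputs]


# Two replicas, four elements each.
replica_inputs = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
scattered = reduce_scatter(replica_inputs)  # each replica holds half of the sum
gathered = all_gather(scattered)            # each replica holds the whole sum
```

In this sketch `gathered` equals `all_reduce(replica_inputs)`. It models data movement only and says nothing about how GCL selects a collective method.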
The GCL option syncful.useForwardingToSupportStridedGroups=auto is now exposed externally.
Through-routing over intermediate replicas extends sliding window reachability for certain configurations. For a replica size of 1, this option avoids deadlocks.
The broadcast method is no longer selected when the stride is greater than 1.
Three-phase collective operations are now scheduled properly for consecutive communication groups that span multiple ILDs.
The GCL documentation has been updated and split out into a separate document: GCL User Guide and API Reference. More detail on collective operations and logical topologies has been added.
Added getters to the public API.
Removed the useSynclessCollectives option from the public API. The option was deprecated in SDK 2.6 and has now been removed.
libgcl.so. The GCL shared library was renamed, which means that programs linking directly with GCL need to pass the -lgcl flag to the linker instead of