4. Setting options

Several functions have options to modify their behaviour. These are specified, using the poplar::OptionFlags class, as a series of option-value pairs, represented as strings.

There are two general classes of options that control GCL behaviour:

  • Modifying the collective operation

  • Options to control optimisation

4.1. Environment variables

Some options can also be set through environment variables. A value set in the environment overrides the value set in the program. The environment variable for setting options is GCL_OPTIONS.

4.1.1. Option values

The option values are typically either numeric or one of a list of enumerated values. The allowed range of values is documented, where relevant. The default value is shown in square brackets.

The options, and their allowed and default values, are described below:

  • method (anticlockwise_ring, auto, bidirectional_ring_pair, broadcast, clockwise_ring, meet_in_middle_ring, quad_directional_ring) [=auto]

    This option controls the logical communication topology of the network. If set to auto, GCL will try to deduce the optimal method based on built-in heuristics. The detailed description of each method can be found in Section 3.4, Collective methods.

  • syncful.maxBroadcastSize Integer [=2048]

    This option sets the maximum data size for which the broadcast method of AllReduce will be used. For small tensors it is beneficial to broadcast the tensor to all replicas and do the reductions locally, so that the network latency cost is paid only once. However, memory use increases with larger group sizes and data volumes. The option value is the limit on group_size * numBytes beyond which broadcast AllReduce will not be used.

  • syncful.useOptimisedLayout (true, false) [=true]

    This option controls whether GCL will reuse the same data layout for source buffers. If the input tensor has been allocated in a GCL-friendly way, reusing the same layout for the source buffers minimises the code needed to copy fragments into the source buffers. Setting this to false might reduce the cycle count at the cost of higher memory usage.

  • syncful.useForwardingToSupportStridedGroups (auto, true, false) [=auto]

    This option controls whether the store-and-forward technique is enabled in GCL. This technique is useful if the generated traffic patterns would go beyond the reachability of the sliding window or could deadlock. When store-and-forward is enabled, data movement between the replicas is broken down into several steps in which intermediate replicas act as lighthouses that receive the data and forward it towards the destination. This extends the reachability of the sliding window and may decrease the number of overlapping communication rings, which breaks cyclic dependencies in the network.
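
The size rule described for syncful.maxBroadcastSize can be made concrete with a small sketch. The helper below is hypothetical and is not part of GCL. It only restates the documented comparison of group_size * numBytes against the option value, and it assumes that a product exactly equal to the limit still qualifies for broadcast.

```cpp
#include <cstddef>

// Hypothetical helper, not a GCL function. It restates the documented rule:
// broadcast AllReduce is not used once group_size * numBytes goes beyond
// syncful.maxBroadcastSize (default 2048).
// Assumption: a product exactly equal to the limit still qualifies.
constexpr bool withinBroadcastLimit(std::size_t groupSize,
                                    std::size_t numBytes,
                                    std::size_t maxBroadcastSize = 2048) {
  return groupSize * numBytes <= maxBroadcastSize;
}

// 4 * 256 = 1024, which is within the default limit.
static_assert(withinBroadcastLimit(4, 256), "small group fits");
// 16 * 256 = 4096, which is beyond the default limit.
static_assert(!withinBroadcastLimit(16, 256), "large group does not fit");
// Raising the limit to 4096 admits the larger group again.
static_assert(withinBroadcastLimit(16, 256, 4096), "raised limit fits");
```

The sketch shows why the limit depends on group size as well as tensor size: the same 256-byte tensor falls on either side of the default limit depending on how many replicas take part.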

Options are passed through the options parameter of a collective function. For example, gcl::allReduceCrossReplica() has an options parameter that can select the internal reduction method. In the call below, it performs a broadcast instead of sending individual packets to each participating replica:

// Run the allReduce using the broadcast method
poplar::OptionFlags options{{"method", "broadcast"}};
allReduceCrossReplica(graph, datas, op, prog, {}, options);

An invalid_option or gcl::error exception may be thrown if the value of the option is not recognised or is out of range.
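
A user-supplied method string can be checked against the documented list before it is passed to GCL. The following is a minimal sketch of such a pre-check. The names kGclMethods and isDocumentedMethod are hypothetical and not part of GCL, and the exact, case-sensitive comparison is an assumption about how GCL matches values.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Values of the "method" option, copied from the list in this section.
const char *const kGclMethods[] = {
    "anticlockwise_ring", "auto",           "bidirectional_ring_pair",
    "broadcast",          "clockwise_ring", "meet_in_middle_ring",
    "quad_directional_ring"};

// Hypothetical helper, not a GCL function: returns true if the value is one
// of the documented method names.
// Assumption: values are matched exactly, including case.
bool isDocumentedMethod(const std::string &value) {
  return std::find(std::begin(kGclMethods), std::end(kGclMethods), value) !=
         std::end(kGclMethods);
}
```

A program that takes the method from a command-line argument could call isDocumentedMethod() first and report the allowed values itself, rather than relying on the exception.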

4.1.2. Logging

GCL can output information about its activity and you can control the level of logging information using environment variables.

  • GCL_LOG_LEVEL (TRACE, DEBUG, INFO, WARN, ERR, OFF) [=WARN]

    Controls the amount of information written to the log output for all modules.

  • GCL_API_LOG_LEVEL (TRACE, DEBUG, INFO, WARN, ERR, OFF) [=WARN]

    Controls the amount of information written to the log output for the API module, which includes detailed information about the collective operation calls.

  • GCL_LOG_DEST (stderr, stdout, filename) [=stderr]

    Defines the output for the logging information. The value can be stdout, stderr or a file name.

Table 4.1 GCL logging levels

  Log level   Description
  ---------   -----------
  OFF         No logging information.
  ERR         Only error conditions will be reported.
  WARN        Warnings, for example, when the software cannot achieve what was requested. The default.
  INFO        Very high level information.
  DEBUG       Useful per-graph information.
  TRACE       The most verbose level. All useful per-tile information.

4.1.3. Graph generation

It can be useful to visualise the communication patterns taking place between the replicas. To generate a graph of these patterns, set the GCL_CROSS_REPLICA_COPY_GRAPH_PATH environment variable to the directory where the graph should be saved.