4. Setting options
Several functions have options to modify their behaviour.
These are specified using the poplar::OptionFlags class, as a series of option-value pairs represented as strings.
There are two general classes of options that control GCL behaviour:
Modifying the collective operation
Options to control optimisation
4.1. Environment variables
Some options can also be specified using the GCL_OPTIONS environment variable. Values set this way override the values set in the program.
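The following minimal C++ sketch makes the precedence rule concrete by modelling option-value pairs as a std::map of strings. The Options alias and the effectiveOptions function are hypothetical, and the sketch does not show how GCL actually parses GCL_OPTIONS. It only illustrates that a value supplied by the environment replaces the value set in the program for the same option.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a set of option-value pairs: keys and values
// are both strings, as with poplar::OptionFlags. Not the GCL implementation.
using Options = std::map<std::string, std::string>;

// Returns the effective options: start from the values set in the program,
// then let every value that came from the environment override them.
Options effectiveOptions(const Options &fromProgram, const Options &fromEnv) {
  Options result = fromProgram;
  for (const auto &[key, value] : fromEnv) {
    result[key] = value;  // the environment wins when both set the same option
  }
  return result;
}
```

For example, if the program sets method to clockwise_ring and the environment supplies broadcast for method, the effective value is broadcast. Options that the environment does not mention keep their program values.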
4.1.1. Option values
The option values are typically either numeric or one of a list of enumerated values. The allowed range of values is documented, where relevant. The default value is shown in square brackets.
The options, and their allowed and default values, are described below:
method (anticlockwise_ring, auto, bidirectional_ring_pair, broadcast, clockwise_ring, meet_in_middle_ring, quad_directional_ring) [=auto]
This option controls the logical communication topology of the network. If set to auto, GCL will try to deduce the optimal method based on built-in heuristics. Each method is described in detail in Section 3.4, Collective methods.
syncful.maxBroadcastSize Integer [=2048]
This option sets the maximum data size, in bytes, for which the broadcast operation will be used. For small tensors it is beneficial to broadcast the tensor to all replicas and do the reductions locally, so that the network latency cost is paid only once. However, memory use grows with the group size and the data volume. Broadcast AllReduce will not be used when group_size * numBytes exceeds this value.
syncful.useOptimisedLayout (true, false) [=true]
This option controls whether GCL reuses the data layout of the input tensor for the source buffers. If the input tensor has been allocated in a GCL-friendly way, reusing its layout for the source buffers minimises the code needed to copy fragments into them. Setting this to false might reduce the cycle count at the cost of higher memory usage.
syncful.useForwardingToSupportStridedGroups (auto, true, false) [=auto]
This option controls whether the store and forward technique is enabled in GCL. This technique is useful if the generated traffic patterns would go beyond the reach of the sliding window or could deadlock. When store and forward is enabled, data movement between replicas is broken down into several steps, with intermediate replicas acting as relays that receive the data and forward it towards its destination. This extends the reach of the sliding window and can reduce the number of overlapping communication rings, which breaks cyclic dependencies in the network.
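The idea behind store and forward can be illustrated with a deliberately simplified model. The C++ sketch below assumes replicas arranged on a ring, traffic flowing in one direction only, and a sliding window that can reach at most `reach` replicas ahead. The forwardingPath function and its parameters are hypothetical and do not reflect how GCL plans its traffic. A transfer that is too long for a single step is split into hops, and each intermediate replica receives the data and forwards it.

```cpp
#include <vector>

// Simplified, hypothetical model of store and forward (not GCL's algorithm):
// replicas 0..numReplicas-1 sit on a ring, data only travels clockwise, and a
// replica can send directly to at most `reach` positions ahead (the sliding
// window). Longer transfers are split into hops; every intermediate replica
// receives the data and forwards it onwards.
// Returns the replicas visited after `src`, ending with `dst`
// (empty if src == dst). Assumes reach >= 1 and src, dst < numReplicas.
std::vector<unsigned> forwardingPath(unsigned src, unsigned dst,
                                     unsigned numReplicas, unsigned reach) {
  std::vector<unsigned> path;
  unsigned distance = (dst + numReplicas - src) % numReplicas;
  unsigned current = src;
  while (distance > 0) {
    // Go as far as the window allows, or straight to dst if it is in reach.
    unsigned hop = distance < reach ? distance : reach;
    current = (current + hop) % numReplicas;
    path.push_back(current);  // this replica stores and forwards (or is dst)
    distance -= hop;
  }
  return path;
}
```

With 8 replicas and a reach of 3, a transfer from replica 0 to replica 7 passes through replicas 3 and 6 before arriving at 7, whereas a transfer from 0 to 2 is sent directly.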
For example, the gcl::allReduceCrossReplica() function has an options parameter that can control the internal reduction method (in this case, it will perform a broadcast instead of sending individual packets to each participating replica):
// Run the AllReduce using a broadcast collective.
// The options argument is a poplar::OptionFlags built from {key, value} string pairs.
gcl::allReduceCrossReplica(graph, datas, op, prog, {}, {{"method", "broadcast"}});
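Whether broadcast is a good choice depends on the syncful.maxBroadcastSize threshold. The C++ sketch below shows the size check and the memory cost in isolation. Both helpers are hypothetical: the sketch assumes the limit is inclusive and that each replica receives one copy of the tensor from every other replica in the group, and it ignores any other heuristics GCL applies when method is auto.

```cpp
#include <cstddef>

// Hypothetical helper mirroring the documented rule: broadcast AllReduce is
// only considered while group_size * numBytes does not exceed the value of
// syncful.maxBroadcastSize (default 2048). Treating the limit as inclusive
// is an assumption.
bool broadcastWithinLimit(std::size_t groupSize, std::size_t numBytes,
                          std::size_t maxBroadcastSize = 2048) {
  return groupSize * numBytes <= maxBroadcastSize;
}

// Simple model of the memory cost: with broadcast, every replica receives the
// tensor from each of the other groupSize - 1 replicas and reduces locally,
// so the receive space grows with both the group size and the tensor size.
// Assumes groupSize >= 1.
std::size_t broadcastReceiveBytes(std::size_t groupSize, std::size_t numBytes) {
  return (groupSize - 1) * numBytes;
}
```

With the default limit of 2048, a group of 16 replicas reducing 128 bytes each (16 × 128 = 2048) stays within the limit, while 256 bytes each (4096) does not.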
An invalid_option or gcl::error exception may be thrown if the value of an option is not recognised or is out of range.
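The following C++ sketch illustrates the kind of check that leads to these exceptions: an enumerated option is compared against its list of allowed values, and a numeric option must parse as a non-negative integer. The function names are hypothetical, and std::invalid_argument is used as a stand-in for invalid_option and gcl::error so that the sketch compiles without Poplar.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <stdexcept>
#include <string>

// Allowed values of the `method` option, as documented above.
constexpr std::array<const char *, 7> kMethods = {
    "anticlockwise_ring", "auto", "bidirectional_ring_pair", "broadcast",
    "clockwise_ring", "meet_in_middle_ring", "quad_directional_ring"};

// Hypothetical check for an enumerated option: reject unrecognised values.
// std::invalid_argument stands in for invalid_option / gcl::error.
std::string validateMethod(const std::string &value) {
  const bool known =
      std::any_of(kMethods.begin(), kMethods.end(),
                  [&value](const char *allowed) { return value == allowed; });
  if (!known) {
    throw std::invalid_argument("unrecognised value for 'method': " + value);
  }
  return value;
}

// Hypothetical check for a numeric option such as syncful.maxBroadcastSize:
// the value must consist of decimal digits only (so "-1" or "2k" are rejected).
unsigned long parseSize(const std::string &value) {
  const bool digitsOnly =
      !value.empty() &&
      std::all_of(value.begin(), value.end(),
                  [](unsigned char c) { return std::isdigit(c) != 0; });
  if (!digitsOnly) {
    throw std::invalid_argument("not a non-negative integer: " + value);
  }
  return std::stoul(value);  // throws std::out_of_range if it does not fit
}
```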
4.1.2. Logging
GCL can output information about its activity and you can control the level of logging information using environment variables.
GCL_LOG_LEVEL (TRACE, DEBUG, INFO, WARN, ERR, OFF) [=WARN]
Controls the amount of information written to the log output for all modules.
GCL_API_LOG_LEVEL (TRACE, DEBUG, INFO, WARN, ERR, OFF) [=WARN]
Controls the amount of information written to the log output for the API module, which includes detailed information about the collective operation calls.
GCL_LOG_DEST (stderr, stdout, filename) [=stderr]
Defines the output for the logging information. The value can be stdout, stderr or a file name.
Log level | Description |
---|---|
OFF | No logging information. |
ERR | Only error conditions will be reported. |
WARN | Warnings, for example, when the software cannot achieve what was requested. The default. |
INFO | Very high level information. |
DEBUG | Useful per-graph information. |
TRACE | The most verbose level. All useful per-tile information. |
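The log levels form an ordered scale, and a message is written only if it is at least as severe as the configured level. The C++ sketch below models that filtering. The LogLevel enum and the helper functions are hypothetical; falling back to WARN for a missing or unrecognised value is an assumption, and the sketch does not model how GCL_API_LOG_LEVEL interacts with GCL_LOG_LEVEL.

```cpp
#include <cstdlib>
#include <string>

// Levels ordered from most to least verbose.
enum class LogLevel { Trace, Debug, Info, Warn, Err, Off };

// Hypothetical parser for a value such as that of GCL_LOG_LEVEL. A missing
// or unrecognised value falls back to the documented default, WARN.
LogLevel parseLogLevel(const char *value) {
  if (value == nullptr) return LogLevel::Warn;
  const std::string s(value);
  if (s == "TRACE") return LogLevel::Trace;
  if (s == "DEBUG") return LogLevel::Debug;
  if (s == "INFO") return LogLevel::Info;
  if (s == "WARN") return LogLevel::Warn;
  if (s == "ERR") return LogLevel::Err;
  if (s == "OFF") return LogLevel::Off;
  return LogLevel::Warn;
}

// A message is written when it is at least as severe as the threshold;
// a threshold of OFF suppresses everything.
bool shouldLog(LogLevel message, LogLevel threshold) {
  return threshold != LogLevel::Off && message >= threshold;
}

// Reads the threshold from the environment, as described above.
LogLevel currentThreshold() {
  return parseLogLevel(std::getenv("GCL_LOG_LEVEL"));
}
```

With the default threshold of WARN, warnings and errors are written, while INFO, DEBUG and TRACE messages are suppressed.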
4.1.3. Graph generation
There are situations where it might be useful to visualise the communication patterns between the replicas. To save a graph of these patterns, set the GCL_CROSS_REPLICA_COPY_GRAPH_PATH environment variable to the directory where the graph should be saved.
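The variable is normally exported in the shell before the program is started. If it has to be set from within a C++ program instead, a minimal sketch could look like the following. enableGraphDump is a hypothetical helper that uses the POSIX setenv function, and it assumes that GCL reads the variable only after this call has been made (for example, when the collectives are added to the graph).

```cpp
#include <cstdlib>
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical helper: create the output directory and point
// GCL_CROSS_REPLICA_COPY_GRAPH_PATH at it. Assumes GCL reads the variable
// after this call. setenv is POSIX, not standard C++.
bool enableGraphDump(const std::string &dir) {
  std::error_code ec;
  std::filesystem::create_directories(dir, ec);  // no error if it already exists
  if (ec) return false;
  return setenv("GCL_CROSS_REPLICA_COPY_GRAPH_PATH", dir.c_str(), 1) == 0;
}
```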