4. Targeting the Poplar XLA device

The Poplar XLA devices are named /device:IPU:X, where X is an integer identifying that logical device. Each logical device can consist of one or more physical IPU devices, as described below.

A Python context handler is available for setting up all appropriate scoping when you create the graph. This will place all operations built inside it on the chosen Poplar XLA device:

with ipu_scope("/device:IPU:0"):
  xla_result = ipu.ipu_compiler.compile(my_net, [x_data, y_data, p_angle])

For very simple graphs, it is sufficient to use the IPU scope to define the parts of the graph which will be compiled. For most graphs, the function ipu_compiler.compile() must be used, and the call must be placed inside an IPU device scope.

The function ipu_compiler.compile() causes all operations created by the Python function passed as its first argument to be placed on the IPU system and compiled together into a single Poplar executable.
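For example, a minimal end-to-end sketch (the model function, input shapes and data here are illustrative, and it assumes the TensorFlow 1 Graphcore port):

import numpy as np
import tensorflow as tf
from tensorflow.python import ipu
from tensorflow.python.ipu.scopes import ipu_scope

# Illustrative model function: every op it creates is compiled into a
# single Poplar executable.
def my_net(x, y):
    return tf.matmul(x, y)

with tf.device("cpu"):
    x_data = tf.placeholder(np.float32, [2, 2])
    y_data = tf.placeholder(np.float32, [2, 2])

with ipu_scope("/device:IPU:0"):
    [xla_result] = ipu.ipu_compiler.compile(my_net, [x_data, y_data])

cfg = ipu.config.IPUConfig()
cfg.auto_select_ipus = 1
cfg.configure_ipu_system()

with tf.Session() as sess:
    result = sess.run(xla_result,
                      {x_data: np.ones([2, 2], np.float32),
                       y_data: np.ones([2, 2], np.float32)})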

4.1. Supported types

Poplar and the PopLibs libraries support the following data types:

  • tf.float32

  • tf.float16

  • tf.int32

  • tf.bool
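For example, graph inputs can be declared directly with these types (a trivial sketch; the shapes are illustrative):

import tensorflow as tf

x = tf.placeholder(tf.float16, shape=[32, 128])  # half-precision activations
w = tf.placeholder(tf.float32, shape=[128, 10])  # single-precision weights
labels = tf.placeholder(tf.int32, shape=[32])    # integer labels
mask = tf.placeholder(tf.bool, shape=[32])       # boolean mask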

4.2. Device selection

Hardware configuration options enable you to select the number of IPU devices. By default, TensorFlow will create one virtual device (/device:IPU:0) with a single IPU. The first available single IPU will be used.

Two API options on the IPUConfig are available for controlling which or how many IPUs this virtual device will use:

  • auto_select_ipus allows you to specify the number of IPUs to use. The virtual device will be given that many IPUs.

  • select_ipus allows you to choose a specific IPU hardware device using its ID. The device IDs can be seen with the gc-info command line tool. An ID can represent a single IPU device or a larger “multi-IPU” device that contains a group of closely connected single IPU devices.

    For example, the largest single IPU device in a 16-IPU system has the ID 15, while the largest multi-IPU device has the ID 30. (See the IPU Command Line Tools document for details of how device IDs map to available IPUs.)

You can also pass a list (or tuple) to either of these options. This will configure a separate TensorFlow virtual device for each value in the list (/device:IPU:0, /device:IPU:1, and so on).
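For example, a sketch (assuming a system with at least four available IPUs) that creates two virtual devices with two IPUs each:

from tensorflow.python import ipu

cfg = ipu.config.IPUConfig()
# Creates /device:IPU:0 and /device:IPU:1, each controlling two IPUs.
cfg.auto_select_ipus = [2, 2]
cfg.configure_ipu_system()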

Once the hardware structure has been specified, the configure_ipu_system() method of the config object must be used to attach to and initialise the hardware:

from tensorflow.python import ipu

cfg = ipu.config.IPUConfig()
# Select multi-IPU with device ID 30, which contains all 16 IPUs of a 16-IPU system.
cfg.select_ipus = 30
cfg.configure_ipu_system()

This example will allocate a single TensorFlow virtual device (/device:IPU:0) which will use all 16 IPUs in a 16-IPU system.

For more examples, see the documentation for the options in the Python API: auto_select_ipus, select_ipus.

4.3. Configuring system options

In addition to auto_select_ipus and select_ipus, the IPUConfig class has many other options for system configuration. The class is a nested structure of attributes which organises these options into several categories and sub-categories. To use it, instantiate it and treat its attributes like ordinary Python variables:

# Initialize an IPUConfig instance
cfg = ipu.config.IPUConfig()

# Ask for 2 IPUs on /device:IPU:0
cfg.auto_select_ipus = 2

# Change our mind and decide we need 4 IPUs instead. This is fine since
# setting any config attribute has no effect until the config is used to
# configure the IPU system.
cfg.auto_select_ipus = 4

# Configure the system with the config, creating /device:IPU:0 with 4 IPUs
cfg.configure_ipu_system()

Some attributes are not configuration options themselves, but rather names for general categories of grouped options. Categories cannot be set. You can access an arbitrarily nested attribute with chained dot notation, and an attribute’s full name indicates exactly where it is in the IPUConfig nested structure. For example:

cfg = ipu.config.IPUConfig()

# Set the IPU Model version, which is in the "ipu_model" category.
# Its full name is "ipu_model.version"
cfg.ipu_model.version = "ipu2"
print(cfg.ipu_model.version)  # ipu2

# Set the multi replica distribution process count, which is in the
# "multi_replica_distribution" sub-category of the "experimental" category
# of the config
cfg.experimental.multi_replica_distribution.process_count = 2
print(cfg.experimental.multi_replica_distribution.process_count)  # 2

# You cannot assign to a category, since it is a grouping of options and
# not an option itself
cfg.experimental = 2  # Will error

cfg.configure_ipu_system()

Options are type-checked when they are set, and if an option cannot be found, a similarly spelled one may be suggested. Additionally, setting a deprecated option will give you a warning:

cfg = ipu.config.IPUConfig()

# Try to set an option to an incorrect type
cfg.ipu_model.version = True  # Will error immediately asking for a string
# Make a spelling mistake when writing the option name
cfg.ipu_model.vrsion = "ipu2"  # Will ask if you meant "version"

The metadata for any attribute, including its default, allowed types, docstring, full name and whether or not it is deprecated, can be accessed through the get_attribute_metadata() function, which takes a string representing the attribute’s full name, relative to the category you are calling the function on. For example:

cfg = ipu.config.IPUConfig()

# Access by full name from the base config:
metadata = cfg.get_attribute_metadata("ipu_model.version")
# Access by name relative to the "ipu_model" sub-category:
metadata = cfg.ipu_model.get_attribute_metadata("version")

# Use the metadata
print(metadata.types)  # [str]
print(metadata.default)  # "ipu2"
print(metadata.deprecated)  # False indicates it is not deprecated
print(metadata.__doc__)  # "Specify the ipu version to be used by the..."

# Check a value against the option's type
metadata.check_type(5)  # Will fail, since this option needs a string.

# Print a deprecation message if the option is deprecated
metadata.warn_if_deprecated()

This is useful for forwarding IPU configuration options to command line interfaces in applications. Note that you can also access the metadata of categories themselves, but the types and default fields will be empty. You can see a full description of the available metadata in the AttributeMetadata class.
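For example, a hypothetical sketch that forwards one option to argparse (the command-line argument name and script structure are inventions for illustration):

import argparse
from tensorflow.python import ipu

cfg = ipu.config.IPUConfig()
metadata = cfg.get_attribute_metadata("ipu_model.version")

parser = argparse.ArgumentParser()
# Reuse the option's default and docstring when declaring the argument.
parser.add_argument("--ipu-model-version",
                    default=metadata.default,
                    help=metadata.__doc__)
args = parser.parse_args()

metadata.check_type(args.ipu_model_version)  # validate before applying
cfg.ipu_model.version = args.ipu_model_version
cfg.configure_ipu_system()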

When you are finished adjusting the IPUConfig instance, use it to configure the IPU system by calling its configure_ipu_system() method. The options set on an instance will not have any effect until this is done. Note that configuring the system does not alter the instance. For example:

cfg = ipu.config.IPUConfig()
cfg.auto_select_ipus = 4
cfg.ipu_model.compile_ipu_code = False
cfg.ipu_model.version = "ipu2"
cfg.scheduling.algorithm = ipu.config.SchedulingAlgorithm.Clustering
...

cfg.configure_ipu_system()
# The IPU system can now be used.

# The config can still be accessed after configuration.
print(cfg.ipu_model.version)  # ipu2

In addition to auto_select_ipus and select_ipus, some other options on the IPUConfig which can be used to configure the hardware and compiler are highlighted below:

  • allow_recompute turns on recomputation, to reduce the memory requirement of the model at the expense of speed.

  • selection_order controls the mapping between the “virtual” IPUs and the physical IPUs of a multi-IPU device.

  • compilation_poplar_options sets general options to be passed to the Poplar compiler.

  • convolutions.poplar_options, matmuls.poplar_options and pooling.poplar_options pass specific options directly to the PopLibs convolution, matmul and pooling operations.

  • ipu_model is a category containing options that control the Poplar IPU Model device type.

  • floating_point_behaviour is a category containing options that allow you to configure the IPU device’s floating point control register.

  • optimizations is a category containing options that can toggle various optimizations, which generally have a performance or memory use trade-off.

  • scheduling contains options that specify and control the scheduling algorithm the Poplar XLA backend uses to schedule the operations in the graph before it is lowered to Poplar.

To view the full list, see IPUConfig.
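For illustration, a sketch that combines a few of these options (the Poplar option name "availableMemoryProportion" and its value are given as an example and should be checked against your Poplar documentation):

from tensorflow.python import ipu

cfg = ipu.config.IPUConfig()
cfg.auto_select_ipus = 2
# Trade speed for memory by turning on recomputation.
cfg.allow_recompute = True
# Pass an option directly to the PopLibs convolution operations
# (assumed option name; see the Poplar documentation).
cfg.convolutions.poplar_options = {"availableMemoryProportion": "0.4"}
# Choose the scheduling algorithm used by the Poplar XLA backend.
cfg.scheduling.algorithm = ipu.config.SchedulingAlgorithm.Clustering
cfg.configure_ipu_system()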

4.3.1. TF_POPLAR_FLAGS environment variable

The options passed through the IPUConfig are tied to the application that uses that config to configure the IPU system. Some configuration options are instead provided by an environment variable called TF_POPLAR_FLAGS.

If you set TF_POPLAR_FLAGS=--help and execute a TF session, it will output some help for each option. The available options are described below:
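For example, the variable can be set in the shell before launching the program, or from Python (a sketch; setting it before TensorFlow initialises the Poplar backend, that is, before the IPU system is configured, is an assumption made here for safety):

import os

# Set the flags before configuring the IPU system so the Poplar backend
# sees them when it initialises.
os.environ["TF_POPLAR_FLAGS"] = "--use_ipu_model"

from tensorflow.python import ipu

cfg = ipu.config.IPUConfig()
cfg.configure_ipu_system()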

Table 4.1 TensorFlow configuration options. Each option below is followed by its description.

--dump_schedule_as_dot

Dump the schedule of the XLA graph to the user console.

--executable_cache_path=path

Enables the Poplar executable cache. See Caching of compiled executables.

--fallback_scheduler

Uses the standard TensorFlow scheduler, instead of the Graphcore specific one.

--help

Print information for all the options.

--log_cycle_count=int

Log the number of cycles used in evaluating the main graph. The numeric argument indicates the tile on which the cycle count operation will be created. This may be used as an alternative to profiling for graphs with dynamic control flow.

--max_compilation_threads=int

Sets the maximum number of threads which Poplar is allowed to use for compiling the executable.

--max_infeed_threads=int

Sets the maximum number of threads which each infeed queue is allowed to use when accessing data from datasets.

--null_data_feed

Cause any infeed queues to copy garbage data to the IPU rather than real data. This option can be used to determine whether the dataset provided to the infeed queue is the bottleneck during execution.

--tensor_map_file_path=path

Cause a JSON file containing the tile mapping of all tensors to be written to this directory.

--use_ipu_model

Use the Poplar IPUModel for graph compilation and execution.

--use_synthetic_data

Prevent the system from downloading or uploading data to the IPU when executing code. This can be useful for testing performance without the overhead of data transfer.

This option prevents all data transfer. You can instead use --synthetic_data_categories to prevent the transfer of specific categories of tensor data.

When using this option, the graph’s transferred input tensors will never be initialized and can therefore have undefined content. You can avoid this with the --synthetic_data_initializer option.

The outputs from any outfeeds will also be uninitialized tensors on the host, and may therefore contain undefined content.

This option cannot be used when dequeuing an IPUOutfeedQueue which is in IPUOutfeedMode.LAST mode.

When using synthetic data, communication between the host and IPUs is turned off. This can cause an error: “Host sync timed out”. The amount of time the host waits before it times out can be changed using the Poplar runtime options. For example: export POPLAR_RUNTIME_OPTIONS='{"target.hostSyncTimeout":350}'.

--synthetic_data_categories

Prevent the system from downloading or uploading data of the given types to the IPU when executing code. This can be useful for testing performance without the overhead of data transfer.

The values can be any of: infeed, outfeed, seed, hostembedding or parameters.

For example, --synthetic_data_categories='infeed,outfeed' will use synthetic data just for infeeds and outfeeds.

When using this option, the graph’s transferred input tensors will never be initialized and can therefore have undefined content. You can avoid this with the --synthetic_data_initializer option.

This option is a more selective alternative to --use_synthetic_data; you shouldn’t specify both.

When using synthetic data, communication between the host and IPUs is turned off. This can cause an error: “Host sync timed out”. The amount of time the host waits before it times out can be changed using the Poplar runtime options. For example: export POPLAR_RUNTIME_OPTIONS='{"target.hostSyncTimeout":350}'.

--synthetic_data_initializer=X

When using synthetic data, by default, the graph’s input tensors will not be initialized and therefore will have undefined content. You can use this option to initialize these tensors on the device.

The argument X can be set to uniform, normal, or a number.

When uniform is specified, each input tensor is initialized with uniformly distributed random numbers (of the numerical type of the tensor). The range of the uniform distribution is between the minimum and maximum representable numbers for the specific numerical type of each tensor (for example, for FP16, the range would be [-65504.0, +65504.0], whereas for uint16, it would be [0,65535]).

When normal is specified, each input tensor is initialized with random numbers drawn from the Gaussian distribution of mean 0 and standard deviation 1, when the tensor type is floating point. For integral types, a constant of value 1 is used instead.

Finally, when the argument X is a number, only its integer part is used to initialize the tensors.

For the --synthetic_data_initializer option to have an effect, you must also specify --use_synthetic_data or --synthetic_data_categories.

--while_loop_brute_force_max_trip_count=int

Sets the upper bound on the number of iterations for which a while loop will be simulated in order to brute-force the number of times it will be executed.

--show_progress_bar=true|false|auto

Whether to show the compilation progress bar: true, false or auto. When set to auto, the progress bar will only be enabled when the process is attached to a console, VLOG logging is disabled and the graph being compiled is expected to take more than a few seconds to compile. Defaults to auto.

--on_demand_device_poll_time=int

When using the ‘ON_DEMAND’ connection type, this configures how often (in milliseconds) to poll for a device when one is not available. Defaults to 1000 ms; the minimum is 100 ms.

--on_demand_device_timeout=int

When using the ‘ON_DEMAND’ connection type, this configures how long to wait (in milliseconds) for a device before timing out. Defaults to 3600000 ms (1 hour).

--ipu_model_tiles=int

When using the Poplar IPUModel target, sets the number of tiles for the IPUModel device created. This flag has no effect if the --use_ipu_model flag is not used, and is ignored if IPUConfig.ipu_model.tiles_per_ipu is set.

--sync_replica_start

Add a global synchronisation point at the start of each replica’s main Poplar program. This can be used to force each replica to not execute until all replicas have started.

--disable_poplar_version_check

If set, the Poplar version check will be disabled.

Multiple options can be specified at the same time by concatenating them like command line switches and quoting the whole value, for example: TF_POPLAR_FLAGS="--executable_cache_path=/tmp/cache --log_cycle_count=123".

4.4. Supported operations

A list of supported TensorFlow operations is provided in TensorFlow operators supported by the IPU.

4.5. Unsupported operations

TensorFlow core operations which use variable buffers or strings are not supported. For instance, JpegDecode.

Unsupported operations will cause the compilation to fail.

By passing config=tf.ConfigProto(log_device_placement=True) when creating the session, you can check whether the operations in your graph have been targeted at the Poplar device. For example:

# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

4.6. Error handling

Note

This section only applies to execution using the XLA/Poplar runtime. If you are using the IPU embedded application runtime, see Error Handling.

Error and exception handling in TensorFlow is divided into two categories:

  • Poplar graph construction and compilation errors, which occur during construction and compilation of TensorFlow programs.

  • Poplar runtime errors, which occur during the execution of the compiled program.

The following sections describe the actions you need to take when these errors occur.

4.6.1. Construction and compilation errors

These errors are reported to the user using the TensorFlow Status error classes. The error messages describe why the error occurred and what action the user must take to stop the error from occurring.

4.6.2. Runtime errors

These errors and exceptions occur when running a Poplar program. The full list of all the exceptions and their meanings can be found in the Poplar documentation in the Exceptions section of the Poplar API reference manual.

These runtime errors are handled in the following manner:

  • application_runtime_error - a tensorflow.errors.InternalError error is raised. The error message contains the reason why the error occurred. An IPU reset will be performed before the next execution of a Poplar program.

  • recoverable_runtime_error - a tensorflow.errors.InternalError error is raised. The error message contains the reason why the error occurred, and the error has a recovery_action string attribute. This attribute can contain:

    • IPU_RESET: An IPU reset will be performed before the next execution of a Poplar program.

    • LINK_RESET: Reset the IPU-Links in a non-Pod system. This retrains the IPU-Links between IPUs.

    • PARTITION_RESET: Reset the IPU partition in a Pod system. This retrains the IPU-Links between IPUs.

    • FULL_RESET: Power cycle the system.

  • Unknown runtime errors - a tensorflow.errors.UnknownError error is raised. The error message might contain the reason why the error occurred. When these errors occur, manual intervention is required before the system is operational again.

  • All other runtime errors - a tensorflow.errors.InternalError error is raised. The error message might contain the reason why the error occurred, and might indicate a required recovery action. When these errors occur, manual intervention might be required before the system is operational again.
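As an illustrative sketch (how the recovery_action attribute is read here is an assumption based on the description above; sess and xla_result are taken from the earlier examples), an application might handle a recoverable runtime error as follows:

import tensorflow as tf

try:
    output = sess.run(xla_result)
except tf.errors.InternalError as e:
    # For recoverable runtime errors the error carries a recovery_action
    # string (attribute access is assumed here).
    action = getattr(e, "recovery_action", None)
    if action == "IPU_RESET":
        # The IPU is reset before the next execution, so retry the run.
        output = sess.run(xla_result)
    else:
        # LINK_RESET, PARTITION_RESET and FULL_RESET need operator action.
        raise
except tf.errors.UnknownError:
    # Unknown runtime errors require manual intervention.
    raise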