11. API reference
11.1. Options
- class poptorch.Options
Set of all options controlling how a model is compiled and executed.
Pass an instance of this class to the model wrapping functions inferenceModel() and trainingModel() to change how the model is compiled and executed. An instance includes general options set within this class, such as deviceIterations(), as well as properties referring to categories of options, such as Training.
>>> opts = poptorch.Options()
>>> opts.deviceIterations(10)
>>> opts.Training.gradientAccumulation(4)
- Return type
None
- property Distributed: poptorch.options._DistributedOptions
Options specific to running on multiple IPU servers (IPU-PODs).
You should not use these when using PopRun/PopDist. Instead, use popdist.poptorch.Options to set these values automatically.
See also
- property Jit: poptorch.options._JitOptions
Options specific to upstream PyTorch’s JIT compiler.
See also
- property Precision: poptorch.options._PrecisionOptions
Options specific to the processing of the JIT graph prior to lowering to PopART.
See also
- property TensorLocations: poptorch.options._TensorLocationOptions
Options related to tensor locations.
See also
- property Training: poptorch.options._TrainingOptions
Options specific to training.
See also
- anchorTensor(short_name, long_name, output_mode=None, output_return_period=1)
Anchor a tensor such that it may be retrieved after a model run.
- Parameters
short_name (str) – User defined name to be used for retrieval
long_name (str) – The PopART name of the tensor to be anchored
output_mode (poptorch.OutputMode) – Specifies when data should be returned. Defaults to None, in which case the tensor will use the same output mode used for model outputs.
output_return_period (int) – Return period if the output mode is EveryN. Defaults to 1.
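For example, a minimal sketch of anchoring a gradient tensor and reading it back after a run (the PopART tensor name is illustrative and depends on the model):
>>> opts = poptorch.Options()
>>> # "Gradient___model.fc.weight" is a hypothetical PopART name; use
>>> # getTensorNames() on a compiled model to discover the real ones.
>>> opts.anchorTensor('grad_fc_w', 'Gradient___model.fc.weight')
>>> poptorch_model = poptorch.trainingModel(model, opts, optimizer)
>>> poptorch_model(input, target)
>>> grad = poptorch_model.getAnchoredTensor('grad_fc_w')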
- appendToLocationExcludes(*excludes)
When printing the IR, all the frames containing one of the excluded strings will be ignored.
This is helpful to get the IR to trace back to user code rather than some function inside a framework.
- Parameters
excludes (str) – Append these exclusions to the existing list of exclusions.
- Return type
- autoRoundNumIPUs(auto_round_num_ipus=True)
Whether or not to automatically round up the number of IPUs used: the number of IPUs requested must be a power of 2. By default, an error occurs if the model uses an unsupported number of IPUs, to prevent you from unintentionally overbooking IPUs.
- Parameters
auto_round_num_ipus (bool) –
True: round up the number of IPUs to a power of 2.
False: error if the number of IPUs is not supported.
- Return type
- broadcastBuffers(broadcast_buffers=True)
Broadcast buffers to all replicas.
Only non-broadcast buffers are currently supported, which means each replica will hold a set of buffers not in sync with other replicas' buffers. To enable non-broadcast buffers, set this option to False.
- Parameters
broadcast_buffers (bool) –
- clone()
Create an unfrozen deep copy of the current options.
- Return type
- connectionType(connection_type)
When to connect to the IPU (if at all).
- Parameters
connection_type (poptorch.ConnectionType) –
Always: Attach to the IPU from the start (default).
OnDemand: Wait until the compilation is complete and the executable is ready to run before attaching to the IPU.
Never: Never try to attach to an IPU: this is useful for offline compilation, but trying to run an executable will raise an exception.
- Return type
For example:
>>> opts = poptorch.Options()
>>> opts.connectionType(poptorch.ConnectionType.OnDemand)
- defaultOutputMode()
- Returns
True: outputMode() is currently set to the default.
False: outputMode() is not set to the default.
- Return type
- deviceIterations(device_iterations)
Number of iterations the device should run over the data before returning to the user (default: 1).
This is equivalent to running the IPU in a loop over the specified number of iterations, with a new batch of data each time. However, increasing deviceIterations is more efficient because the loop runs on the IPU directly.
- Parameters
device_iterations (int) –
- Return type
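For example, a sketch of how device iterations combine with the PopTorch DataLoader (dataset is a placeholder):
>>> opts = poptorch.Options()
>>> opts.deviceIterations(10)
>>> # With a micro-batch size of 16, each call to the model consumes
>>> # 10 * 16 = 160 samples and only returns to the host once all
>>> # 10 iterations have run on the IPU.
>>> loader = poptorch.DataLoader(opts, dataset, batch_size=16)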
- disableModuleNamescope()
Disable the option that adds a name scope for each operator present in the module. This option is enabled by default. The operator name scope is based on the names appearing in the named_modules function of torch.nn.Module.
For example:
>>> class Model(torch.nn.Module):
>>>     def __init__(self, num_groups, num_channels):
>>>         super().__init__()
>>>         self.gn = torch.nn.GroupNorm(num_groups, num_channels)
>>>     def forward(self, x):
>>>         return self.gn(x)
With the name scope enabled, the name of the group norm operator will be gn/GroupNormalization; with it disabled, it will be GroupNormalization.
- Return type
- enableExecutableCaching(path)
Load/save Poplar executables to the specified path, using it as a cache, to avoid recompiling identical graphs.
- Parameters
path (str) – File path for the Poplar executable cache store; setting path to None disables executable caching.
- Return type
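For example (the cache directory is illustrative):
>>> opts = poptorch.Options()
>>> # Compiled executables are stored here and reused on the next run
>>> # if the graph is identical.
>>> opts.enableExecutableCaching("/tmp/poptorch_cache")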
- enableProfiling(profile_dir=None)
Enable profiling report generation.
To generate debug information associated with the profiling data, specify autoReport.directory, and either autoReport.all or autoReport.outputDebugInfo, in the POPLAR_ENGINE_OPTIONS environment variable. For example:
POPLAR_ENGINE_OPTIONS={"autoReport.directory":"/profile/output", "autoReport.all":"true"}
or:
POPLAR_ENGINE_OPTIONS={"autoReport.directory":"/profile/output", "autoReport.outputDebugInfo":"true"}
Debug information and the rest of the profiling data will be stored in the /profile/output directory. Values specified in the environment variable take precedence over profile_dir when both are given.
- Parameters
profile_dir (str) – Path to the directory where the report will be created. Defaults to the current directory.
- Return type
- enableStableNorm(enabled)
Set whether a stable version of norm operators is used. This stable version is slower, but more accurate than its unstable counterpart.
- Parameters
enabled (bool) –
True: Use stable norm calculation.
False: Do not use stable norm calculation.
- Return type
- enableSyntheticData(enabled)
Set whether host I/O is disabled and synthetic data is generated on the IPU instead. This can be used to benchmark models whilst simulating perfect I/O conditions.
- Parameters
enabled (bool) –
True: Use data generated from a random normal distribution on the IPU. Host I/O is disabled.
False: Host I/O is enabled and real data is used.
- Return type
- from_json(string)
Sets values of the object from a JSON string.
The format of the JSON string is:
{"name.of.accessor": value}
Examples
>>> Options().from_json(
...     '{"Precision.enableFloatingPointExceptions":true}'
... )
>>> Options().from_json('{"_Popart.set":["OptionName", 1]}')
- Parameters
string (str) –
- inputReplicaGrouping(input_group_size, input_group_type)
Allows the input batches to be split between groups of replicas, in a similar way to what replicaGrouping() does for weight tensors.
- Parameters
input_group_size (int) – Number of replicas to place in each input replica group. Must be a factor of replication_factor. Defaults to 1, which will divide the input evenly among all replicas.
input_group_type (poptorch.CommGroupType) – Arrangement type to use when placing replicas into input replica groups. Cannot be poptorch.CommGroupType.All. Defaults to poptorch.CommGroupType.Consecutive. For an explanation of the arrangement types, see CommGroupType and Section 4.4.3, Grouping tensor weights across replicas.
- Return type
- loadFromFile(filepath)
Load options from a config file where each line in the file corresponds to a single option being set. To set an option, specify how you would set the option within a Python script, but omit the options. prefix.
For example, if you wanted to set options.deviceIterations(1), this would be set in the config file by adding a single line with contents deviceIterations(1).
This method can be called multiple times on the same Options object. The options will not be reset to their defaults in between.
For example, if c1.cfg contains the following:
deviceIterations(32)
replicationFactor(2)
and c2.cfg contains the following:
deviceIterations(4)
then calling:
options.loadFromFile('c1.cfg')
options.loadFromFile('c2.cfg')
is equivalent to calling:
options.deviceIterations(4)
options.replicationFactor(2)
- Parameters
filepath (str) –
- Return type
- logCycleCount(log_cycle_count)
Log the number of IPU cycles used in executing the main graph.
The cycle count will be printed when this option is enabled and the environment variable POPTORCH_LOG_LEVEL=DEBUG is set. This option requires IPU hardware to run.
Note: This will have a small detrimental impact on performance.
- Parameters
log_cycle_count (bool) –
True: Enable logging the IPU cycle count.
False: Do not enable IPU cycle count logging.
- Return type
- logDir(log_dir)
Set the log directory
- Parameters
log_dir (str) – Directory where PopTorch saves log files (default: current directory)
- Return type
- maxRepeatLogs(max_lines)
For often-repeated log lines, set the maximum number of repeated lines that will be logged.
- modelName(name)
Set the model name
- Parameters
name (str) – Name of the model. Defaults to "inference" or "training" depending on the type of model created. Used when profiling to set the subdirectory of the report directory where the profiling output is written.
- Return type
- outputMode(output_mode, output_return_period=None)
Specify which data to return from a model.
- Parameters
output_mode (poptorch.OutputMode) –
All: Return a result for each batch.
Sum: Return the sum of all the batches.
Final: Return the last batch.
EveryN: Return every N batches: N is passed in as output_return_period.
Default: All for inference, Final for training.
- Return type
For example:
>>> opts = poptorch.Options()
>>> opts.outputMode(poptorch.OutputMode.All)
... # or
>>> opts.outputMode(poptorch.OutputMode.EveryN, 10)
- randomSeed(random_seed)
Set the seed for the random number generator on the IPU.
- Parameters
random_seed (int) – Random seed integer.
- Return type
- relaxOptimizerAttributesChecks(relax=True)
Controls whether unexpected attributes in setOptimizer() lead to warnings or debug messages.
By default, PopTorch will print warnings the first time it encounters unexpected attributes in setOptimizer().
- Parameters
relax (bool) –
True: Redirect warnings to the debug channel.
False: Print warnings about unexpected attributes (default behaviour).
- Return type
- replicationFactor(replication_factor)
Number of times to replicate the model (default: 1).
Replicating the model increases the data throughput of the model as PopTorch uses more IPUs. This leads to the number of IPUs used being scaled by replication_factor. For example, if your model uses 1 IPU, a replication_factor of 2 will use 2 IPUs; if your model uses 4 IPUs, a replication factor of 4 will use 16 IPUs in total.
- Parameters
replication_factor (int) – Number of replicas of the model to create.
- Return type
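For example, a sketch of replicating a single-IPU model over two IPUs:
>>> opts = poptorch.Options()
>>> # A 1-IPU model replicated twice occupies 2 IPUs and processes
>>> # two micro-batches in parallel.
>>> opts.replicationFactor(2)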
- setAvailableMemoryProportion(available_memory_proportion)
Sets the amount of temporary memory made available on a per-IPU basis.
Use this setting to control the amount of temporary memory available to operations such as:
convolution
matrix multiplication
embedding lookups
indexing operations
The parameter should be a dictionary of IPU IDs and float values between 0 and 1 (for example, {"IPU0": 0.5}).
The floating point value has the same meaning and effect as documented in set_available_memory().
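For example, a sketch limiting temporary memory on a two-IPU pipelined model:
>>> opts = poptorch.Options()
>>> # Allow up to 25% of temporary memory on IPU0 and 50% on IPU1.
>>> opts.setAvailableMemoryProportion({"IPU0": 0.25, "IPU1": 0.5})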
- setExecutionStrategy(strategy)
Set the execution strategy to use to partition the graph.
- Parameters
strategy (Union[poptorch.ParallelPhasedExecution, poptorch.SerialPhasedExecution]) – Must be an instance of one of the execution strategy classes.
- Return type
- showCompilationProgressBar(show=True)
Show or hide a progress bar while the model is being compiled. (The progress bar is shown by default.)
- Parameters
show (bool) –
- Return type
- sourceLocationExcludes(excludes)
When printing the IR, all the frames containing one of the excluded strings will be ignored.
This is helpful to get the IR to trace back to user code rather than some function inside a framework.
- syncPattern(sync_pattern)
Controls synchronisation in multi-IPU systems.
This option can be used to allow subsets of IPUs to overlap their work. For example, one set of IPUs could be communicating with the host while other IPUs are processing data.
This option is typically used together with replicated execution, in which case it takes effect on a per-replica basis. If replication is not used, it will apply to all IPUs.
- Parameters
sync_pattern (poptorch.SyncPattern) –
Full: Require all IPUs to synchronise on every communication between IPUs or between IPUs and host. This is the default.
SinglePipeline: Allow IPUs to synchronise with the host independently, without having to synchronise with each other. This permits any one IPU to perform host IO while other IPUs are processing data.
ReplicaAndLadder: Allow an IPU group to communicate with the host without requiring synchronisation between groups. This permits multiple IPU groups to alternate between performing host IO and computation.
- Return type
- updatableNamedBuffers(buffers)
List of model named buffers that can be updated with a call to buffersFromHost(). This allows you to update just a subset of the model weights, instead of all of them as happens with a weightsFromHost() call.
- Parameters
- Return type
- useIpuId(ipu_id)
Use the IPU device specified by the ID (as provided by gc-info).
A device ID may refer to a single or to a group of IPUs (a multi-IPU device). The number of IPUs associated with the ID must be equal to the number of IPUs used by your annotated model multiplied by the replication factor.
For example, if your model uses 1 IPU and the replication factor is 2, you will need to provide a device ID with 2 IPUs; if your model is pipelined across 4 IPUs and the replication factor is 4, you will need to provide a device ID which represents a multi-IPU device of 16 IPUs.
You can use the command-line tool gc-info: running gc-info -l shows each device ID and a list of IPUs associated with the ID.
- Parameters
ipu_id (int) – IPU device ID of a single-IPU or multi-IPU device
- Return type
- useIpuModel(use_model)
Whether to use the IPU Model or physical hardware (default)
The IPU model simulates the behaviour of IPU hardware but does not offer all the functionality of an IPU. Please see the Poplar and PopLibs User Guide for further information.
This setting takes precedence over the POPTORCH_IPU_MODEL environment variable.
- Parameters
use_model (bool) –
True: Use the IPU Model.
False: Use IPU hardware.
- Return type
- useOfflineIpuTarget(ipu_version=2)
Create an offline IPU target that can only be used for offline compilation.
Note
The offline IPU target cannot be used if the IPU Model is enabled.
- Parameters
ipu_version (int) – IPU version to target (1 for Mk1, 2 for Mk2, 21 for Mk2 with FP8 support). Default: 2.
- Return type
- class poptorch.options._DistributedOptions
Options related to distributed execution.
You should not use these when using PopRun/PopDist. Instead, use popdist.poptorch.Options to set these values automatically.
Can be accessed via poptorch.Options.Distributed:
>>> opts = poptorch.Options()
>>> opts.Distributed.configureProcessId(0, 2)
- Return type
None
- configureProcessId(process_id, num_processes)
Manually set the current process ID and the total number of processes.
- Parameters
- Return type
- disable()
Ignore the current options / environment variables and disable distributed execution.
- Return type
- setEnvVarNames(var_num_processes, var_process_id)
Utility to read and set processId and numProcesses from environment variables.
Useful if you use a third-party library, such as mpirun, to manage the processes used for the distributed execution.
For example:
mpirun -np 4 myscript.py
By default, the OpenMPI OMPI_COMM_WORLD_SIZE and OMPI_COMM_WORLD_RANK variables are used.
- Parameters
- Return type
- class poptorch.options._PrecisionOptions(popart_options)
Options related to processing the PyTorch JIT graph prior to lowering to PopART
Can be accessed via poptorch.Options.Precision:
>>> opts = poptorch.Options()
>>> opts.Precision.enableFloatingPointExceptions(True)
- Parameters
popart_options (poptorch.options._PopartOptions) –
- Return type
None
- enableFloatingPointExceptions(enabled)
Set whether floating point exceptions are enabled on the IPU.
When enabled, an exception will be generated when the IPU encounters any one of the following:
Operation resulting in subtraction of infinities
Divisions by zero or by infinity
Multiplications between zero and infinity
Real operations producing complex results
Comparison where any one operand is Not-a-Number
- Parameters
enabled (bool) –
True: Raise RuntimeError on floating point exceptions.
False: Do not raise RuntimeError (default).
- Return type
- enableStochasticRounding(enabled)
Set whether stochastic rounding is enabled on the IPU.
Stochastic rounding randomly rounds half-precision (float16) values up or down such that the expected (mean) result of the rounding is equal to the unrounded value. It can improve training performance by simulating higher-precision behaviour and increasing the speed or likelihood of model convergence. However, the model is non-deterministic and represents a departure from (deterministic) standard IEEE FP16 behaviour.
In the general case, we recommend enabling stochastic rounding for training, where convergence is desirable, but not for inference, where non-determinism may be undesirable.
- Parameters
enabled (bool) –
True: Enable stochastic rounding on the IPU.
False: Disable stochastic rounding.
- Return type
- halfFloatCasting(half_float_casting)
DO NOT USE: about to be removed.
- Parameters
half_float_casting (poptorch.HalfFloatCastingBehavior) –
- Return type
- runningStatisticsAlwaysFloat(value)
DO NOT USE: about to be removed.
- Parameters
value (bool) –
- Return type
- setPartialsType(dtype)
Set the data type of partial results for matrix multiplication and convolution operators.
The matrix multiplication and convolution operators store intermediate results known as partials as part of the calculation. You can use this option to change the data type of the partials. Using torch.half reduces on-chip memory use at the cost of precision.
- Parameters
dtype (torch.dtype) – The type used to store partials, which must be either torch.float or torch.half.
- Return type
- class poptorch.options._JitOptions(**default_values)
Options related to PyTorch’s JIT compiler.
Can be accessed via poptorch.Options.Jit:
>>> opts = poptorch.Options()
>>> opts.Jit.traceModel(True)
- class poptorch.options._TensorLocationOptions(**default_values)
Options controlling where to store tensors.
Can be accessed via poptorch.Options.TensorLocations:
>>> opts = poptorch.Options()
>>> opts.TensorLocations.setActivationLocation(
...     poptorch.TensorLocationSettings().useOnChipStorage(False))
- numIOTiles(num_tiles)
Assigns the number of tiles on the IPU to be IO rather than compute.
Allocating IO (input/output) tiles reduces the number of IPU tiles available for computation, but allows you to reduce the latency of copying tensors from host to the IPUs using the function set_overlap_for_input(), from the IPUs to host using the function set_overlap_for_output(), or to use off-chip memory with reduced overhead by setting the option useIOTilesToLoad(). As reducing the number of computation tiles may reduce performance, you should not use any IO tiles until you have successfully run your model and used profiling to identify "streamCopy" entries which take up a significant proportion of execution time.
- Parameters
num_tiles (int) –
- Return type
- setAccumulatorLocation(location)
- Parameters
location (poptorch.TensorLocationSettings) – Update tensor location settings for accumulators.
- Return type
- setActivationLocation(location)
- Parameters
location (poptorch.TensorLocationSettings) – Update tensor location settings for activations.
- Return type
- setOptimizerLocation(location)
- Parameters
location (poptorch.TensorLocationSettings) – Update tensor location settings for optimiser states.
- Return type
- setWeightLocation(location)
- Parameters
location (poptorch.TensorLocationSettings) – Update tensor location settings for weights.
- Return type
- class poptorch.TensorLocationSettings(**default_values)
Define where a tensor is stored
>>> opts = poptorch.Options()
>>> opts.TensorLocations.setActivationLocation(
...     poptorch.TensorLocationSettings().useOnChipStorage(False))
- minElementsForOffChip(min_elements)
A minimum number of elements below which offloading won’t be considered.
- Parameters
min_elements (int) –
- Return type
- minElementsForReplicatedTensorSharding(min_elements)
Only enable replicated tensor sharding (RTS) for tensors with more than min_elements elements.
- Parameters
min_elements (int) –
- Return type
- useIOTilesToLoad(use=True)
Load tensor through IO tiles
- Parameters
use (bool) – Use IO tiles if True, use Compute tiles if False.
- Return type
- useIOTilesToStore(use=True)
Use IO tiles to store tensors.
(Relevant for replicated-tensor-sharded tensors.)
- Parameters
use (bool) – Use IO tiles if True, use Compute tiles if False.
- Return type
- useOnChipStorage(use=True)
Permanent tensor storage
- Parameters
use (bool) – True: use on chip memory. False: use off chip memory. None: keep it undefined.
- Return type
- class poptorch.options._TrainingOptions(popart_options)
Options specific to model training.
Note
You must not set these options for inference models.
Can be accessed via poptorch.Options.Training:
>>> opts = poptorch.Options()
>>> opts.Training.gradientAccumulation(4)
- Parameters
popart_options (poptorch.options._PopartOptions) –
- Return type
None
- accumulationAndReplicationReductionType(reduction_type)
Set the type of reduction applied to reductions in the graph.
When using a value greater than one for gradientAccumulation() or replicationFactor(), PopTorch applies a reduction to the gradient outputs from each replica, and to the accumulated gradients. This reduction is independent of the model loss reduction (summing a mean-reduced loss and a sum-reduced loss in a PyTorch model is valid).
This setting governs both the accumulation of the loss gradients in replicated graphs and of all of the gradients when using gradient accumulation.
- Parameters
reduction_type (poptorch.ReductionType) –
Mean (default): Reduce gradients by calculating the mean of them.
Sum: Reduce gradients by calculating the sum of them.
- Return type
- gradientAccumulation(gradient_accumulation)
Number of micro-batches to accumulate for the gradient calculation.
Accumulate the gradient gradient_accumulation times before updating the model using the gradient. Other frameworks may refer to this setting as "pipeline depth".
Each micro-batch (a batch of size equal to the batch_size argument passed to DataLoader) corresponds to one gradient accumulation. Therefore gradient_accumulation scales the global batch size (the number of samples between optimiser updates).
Note
Increasing gradient_accumulation does not alter the (micro-)batch size used for batch normalisation.
A large value for gradient_accumulation can improve training throughput by amortising optimiser update costs, most notably when using PipelinedExecution or when training is distributed over a number of replicas. However, the consequential increase in the number of samples between optimiser updates can have an adverse impact on training.
The efficiency gains are most notable for models spanning multiple IPUs that express pipelined model parallelism (via PipelinedExecution, or by default when annotating the model with BeginBlock or Block), because the pipeline has "ramp up" and "ramp down" steps around each optimiser update. Increasing the gradient accumulation factor reduces the proportion of time spent in the "ramp up" and "ramp down" phases, increasing overall throughput.
When training involves multiple replicas, including the cases of sharded and phased execution, each optimiser step incurs a communication cost associated with the reduction of the gradients. By accumulating gradients, you can reduce the total number of updates required and thus reduce the total amount of communication.
Note
Increasing the global batch size can have adverse effects on the sample efficiency of training, so it is recommended to use a low or unity gradient accumulation count initially, and then try increasing it to achieve higher throughput. You may also need to scale other hyperparameters, such as the optimiser learning rate, accordingly.
- Parameters
gradient_accumulation (int) –
- Return type
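As a worked example of how this option scales the global batch size (the numbers are illustrative):
>>> opts = poptorch.Options()
>>> opts.replicationFactor(2)
>>> opts.Training.gradientAccumulation(8)
>>> # With a DataLoader batch_size of 16 (the micro-batch size), the
>>> # number of samples between optimiser updates is:
>>> #   16 (micro-batch) * 8 (accumulation) * 2 (replicas) = 256
>>> loader = poptorch.DataLoader(opts, dataset, batch_size=16)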
- setAutomaticLossScaling(enabled)
Set whether automatic loss scaling is enabled on the IPU.
When using float16/half values for activations, gradients, and weights, the loss value needs to be scaled by a constant factor to avoid underflow/overflow. This adjustment is known as loss scaling. This setting automatically sets a global loss scaling factor during training.
Note: Automatic loss scaling is a preview feature. It is well tested and enabled in some of our example applications, but may not behave as expected in all models. Recommendation: if your model with automatic loss scaling enabled does not converge or triggers a compilation error, then you will need to set the loss scale manually.
- Parameters
enabled (bool) –
True: Enable automatic loss scaling on the IPU.
False: Disable automatic loss scaling.
- Return type
- setConvolutionDithering(enabled)
Enable convolution dithering.
If true, then convolutions with different parameters will be laid out from different tiles in an effort to improve tile balance in models.
Use MultiConv to apply this option to a specific set of convolutions.
- Parameters
enabled (bool) – Enables or disables convolution dithering for all convolutions.
- Return type
- setMeanAccumulationAndReplicationReductionStrategy(mean_reduction_strategy)
Specify when to divide by a mean reduction factor when accumulationAndReplicationReductionType is set to ReductionType.Mean.
The default reduction strategy depends on the optimizer used. The default strategy is Running when the accum_type of the optimizer is set to half-precision (float16) format. Otherwise the Post strategy is used, as this strategy is typically more performant, but the Post strategy is less numerically robust.
- Parameters
mean_reduction_strategy (poptorch.MeanReductionStrategy) –
Running: Keeps the reduction buffer as the current mean. This is preferred for numerical stability as the buffer value is never larger than the magnitude of the largest micro batch gradient.
Post: Divides by the accumulationFactor and replicatedGraphCount after all of the gradients have been reduced. In some cases this can be faster than using Running; however, it is prone to overflow.
PostAndLoss (deprecated): Divides by the replicatedGraphCount before the backwards pass, performs the gradient reduction across micro batches, and then divides by the accumulationFactor. This is to support legacy behaviour and is deprecated.
- Return type
11.2. Helpers
- poptorch.ipuHardwareIsAvailable(num_ipus=1)
Indicates whether any IPU hardware with num_ipus IPUs is present in the system.
Note: This function doesn't check if the IPU is free or already being used.
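For example, a sketch that falls back to the IPU Model when no hardware is present:
>>> opts = poptorch.Options()
>>> if not poptorch.ipuHardwareIsAvailable():
...     # Simulate the IPU on the host instead of attaching to hardware.
...     opts.useIpuModel(True)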
- poptorch.ipuHardwareVersion()
Indicates what IPU hardware version is available in the system.
Raises an exception if no hardware is available.
- Returns
The IPU hardware version or -1 if unknown.
- Return type
- poptorch.setLogLevel(level)
Changes the volume of messages printed in the console (stdout)
- class poptorch.profiling.Channel(name)
Profiling channel.
Note
If the libpvti profiling library is not available at runtime, this class becomes a no-op.
Example:
>>> channel = poptorch.profiling.Channel("MyApp")
>>> with channel.tracepoint("TimeThis"):
...     functionToTime()
>>> channel.instrument(myobj, "methodName", "otherMethod")
- instrument(obj, *methods)
Instrument the methods of an object.
- Parameters
obj – Object to instrument
methods – One or more methods to wrap in profiling trace points.
- tracepoint(name)
Create a context tracepoint
>>> with channel.tracepoint("DoingSomething"):
...     expensiveCall()
- Parameters
name – Name associated with this tracepoint.
11.3. PopTorch ops
- poptorch.ctc_beam_search_decoder(probs, lengths, blank=0, beam_width=100, top_paths=1)
Add a connectionist temporal classification (CTC) beam search decoder to the model.
Calculates the most likely top paths and their probabilities given the input logarithmic probabilities and the data lengths.
- Parameters
probs (Tensor) – Logarithmic probabilities tensor with the shape of [input_length, batch_size, num_classes].
lengths (Tensor) – Tensor representing lengths of the inputs of shape [batch_size].
blank (int) – Integer identifier of the blank class (default: 0).
beam_width (int) – Number of beams used during decoding (default: 100).
top_paths (int) – Number of most likely paths to return (default: 1).
- Returns
Three tensors: the paths' probabilities, of shape [batch_size, top_paths]; the paths' lengths, of shape [batch_size, top_paths]; and the decoded paths, of shape [batch_size, top_paths, input_length].
- Return type
- poptorch.ipu_print_tensor(tensor, title='', print_gradient=True, summarise_threshold=1000, edge_items=3, max_line_width=80, digits=4, float_format='auto', separator=', ', open_bracket='(', close_bracket=')')
Adds an op to print the contents of the IPU tensor.
When this is executed the tensor will be copied back to host and printed.
When this operation is called in the backward pass it will print the gradient of the tensor.
The operation is an identity operation and will return the exact same tensor. The returned tensor must be used in place of the original tensor in the rest of the program, to make sure that the print operation isn’t optimised away.
For example, if the original code looks like this:
def forward(self, c, d, b):
    a = c + d
    return a + b
If the result of ipu_print_tensor() is not used, the function will be optimised out by the graph optimiser and the tensor will not be printed.
So if you want to print the value of a, you should do:
def forward(self, c, d, b):
    a = c + d
    x = poptorch.ipu_print_tensor(a)
    return x + b
Optionally, you can add a second string argument to be used as a title, as shown in the following example. The value of a will be printed after the title "summation". The value of the gradient of a will be printed after the title "summation_gradient" if the operation is called in the backward pass.
def forward(self, c, d, b):
    a = c + d
    x = poptorch.ipu_print_tensor(a, "summation")
    return x + b
Warning
To prevent the print operation being optimised out by the graph optimiser, you must use the output of the print.
- Parameters
tensor (Tensor) – The tensor to print.
title (str) – An optional title to print before the tensor value. Defaults to “”.
print_gradient (bool) – Whether to print the gradient tensor associated with this tensor. Defaults to True.
summarise_threshold (int) – If the number of elements of the tensor exceeds this threshold the output will be summarised. Only the edge elements will be displayed with an ellipsis indicating skipped elements. A value of 0 will disable summarisation. Defaults to 1000.
edge_items (int) – Number of edge elements to include at the beginning and end when summarisation is enabled. Defaults to 3.
max_line_width (int) – Lines longer than this limit will be split across multiple lines. A value of 0 will disable line splitting. Defaults to 80.
digits (int) – Number of digits to display. For integers this limit can be exceeded if any number is large enough. For floating point numbers this does not include the exponent. The number of digits is used in conjunction with analysis of the tensor to determine the width of each element, so that all elements are aligned when printed. A value of 0 disables this analysis and each element will be printed in an unaligned format. Defaults to 4.
float_format (str) – Determines the floating point format to use. Automatic mode determines the appropriate format based on the data. Defaults to "auto". One of:
- "auto": Automatically determine the format through analysis.
- "fixed": Use fixed point, e.g. -100.00.
- "scientific": Use scientific notation, e.g. -1.123e+10.
- "none": Do not display all elements with the same format.
separator (str) – Character used to delineate values. Defaults to ", ".
open_bracket (str) – Character used to open a tensor. Defaults to "(".
close_bracket (str) – Character used to close a tensor. Defaults to ")".
- Returns
The input tensor unchanged.
- Return type
- poptorch.for_loop(count, body, inputs)
An on-device for loop. This loop will execute on the device for count iterations.
The body should be a Python function containing the PyTorch code you wish to execute in a loop. It should take as input the same number of tensors as it outputs. Each iteration will have the previous output passed in as input.
- poptorch.recomputationCheckpoint(*tensors)
Operation for checkpointing values in a computational pipeline stage.
When recomputation is enabled, these values will not be recomputed and they will be stored in memory between forward and backwards passes instead.
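A minimal sketch inside a pipeline stage (layer1 and layer2 are placeholder layers):
def forward(self, x):
    x = self.layer1(x)
    # Store this activation between the forward and backward passes
    # instead of recomputing it.
    x = poptorch.recomputationCheckpoint(x)
    return self.layer2(x)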
- poptorch.identity_loss(x, reduction)
Marks a tensor as being part of the loss calculation and, as such, PopTorch autograd will back-propagate through it.
This function should be called on the (final) loss of a model so that it is used as the start of backpropagation. This is equivalent to calling x.backward() on a tensor x when running on the CPU.
This function is necessary to combine multiple losses into a custom loss. It ensures that the tensor is part of the loss calculation and, as such, should be part of the backpropagation in PopTorch autograd.
Multiple calls to identity_loss can be made inside the same model provided they are all dependent: all marked losses must be traceable into a single final tensor itself marked by a call to identity_loss, otherwise an error is raised.
- Parameters
- Returns
The loss tensor with the specified reduction applied.
- Return type
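For example, a sketch combining two losses into a custom loss (fc is a placeholder layer):
def forward(self, x, target):
    out = self.fc(x)
    l1 = torch.nn.functional.l1_loss(out, target)
    l2 = torch.nn.functional.mse_loss(out, target)
    # Mark the combined tensor as the final loss so backpropagation
    # starts from it.
    loss = poptorch.identity_loss(l1 + l2, reduction="mean")
    return out, loss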
- class poptorch.MultiConv
Combines all convolution layers evaluated inside this scope into a single multi-convolution.
Multi-convolutions allow for a set of data-independent convolutions to be executed in parallel. Executing convolutions in parallel can lead to an increase in the data throughput.
For example:
>>> with poptorch.MultiConv():
...     y = self.convA(x)
...     v = self.convB(u)
Combines the two data-independent convolutions into a single multi-convolution.
Refer to the PopLibs documentation for further information on multi-convolutions.
- availableMemoryProportions(value)
The available memory proportion per convolution, each in the range [0, 1).
For more information, please refer to the technical note on optimising temporary memory usage.
- cycleBackOff(value)
Cycle back off proportion.
- Parameters
value (float) – Number between 0 and 1.
- Returns
self, to support method chaining.
- Return type
- enableConvDithering(value)
Enable per-convolution dithering.
- partialsTypes(value)
The partials type used for each convolution.
- Parameters
value (Union[dtype, List[dtype]]) – Can be a single instance of torch.dtype, in which case the same value is used for all of the convolutions. Otherwise, can be a tuple or list containing as many torch.dtype values as the number of convolutions.
- Returns
self, to support method chaining.
- Return type
- perConvReservedTiles(value)
Tiles to reserve for each convolution.
- Parameters
value (int) – Number of tiles.
- Returns
self, to support method chaining.
- Return type
- planType(value)
Select the multi-convolution execution strategy.
- Parameters
value (poptorch.MultiConvPlanType) – An instance of MultiConvPlanType.
- Returns
self, to support method chaining.
- Return type
- class poptorch.CPU(layer_to_call, ID)
Allow the execution of a CPU op in the middle of an inference IPU graph.
Important
CPU ops are only supported in inference graphs.
Example:
>>> class Model(torch.nn.Module):
>>>     def __init__(self):
>>>         super().__init__()
>>>         self.cpu = poptorch.CPU(self.myCpuOp, "MyCPUOp")
>>>
>>>     def myCpuOp(self, x):
>>>         return x * 2.0
>>>
>>>     def forward(self, x):
>>>         # The arguments passed to "cpu" are forwarded to "myCpuOp"
>>>         out = self.cpu(x)
>>>         out = self.cpu(out)
>>>         out = self.cpu(out)
>>>         return out
- __init__(layer_to_call, ID)
Execute a given function on the CPU.
- execute()
Implementation detail.
- registerPersistentData()
Implementation detail.
- class poptorch.NameScope(name)
Create a name scope for a code block. All operators originating from this block will have their names prefixed by the given string.
>>> with poptorch.NameScope("CustomString"):
...     y = self.bmm(a, b)
...     z = torch.relu(y)
- Parameters
name (str) –
- class poptorch.MultiConvPlanType(value)
Selects the execution strategy for a poptorch.MultiConv.
Parallel: Execute multiple convolutions in parallel (default).
Serial: Execute each convolution independently. This is equivalent to using the independent convolution API.
- class poptorch.custom_op(inputs, name, domain, domain_version, example_outputs, attributes=None)
Applies a custom operation, implemented within PopART, to the inputs.
- Parameters
inputs (tuple) – A tuple of input tensors, for example, (x, y).
name (str) – Unique name of the PopART custom op.
domain (str) – Domain for the op.
domain_version (int) – Version of the domain to use.
example_outputs (iterable) – A tuple of tensors with the same type and shape as the outputs. The value does not matter as all values will be set to zero for tracing purposes.
attributes (dict) – A dictionary of attributes for the custom op. All attribute keys must be strings. All attribute values must be floats, ints, strings, or a list/tuple containing only floats, only ints or only strings (not a mix of types within the list).
- Returns
The outputs of the forward op of the custom op.
- Return type
- poptorch.nop(tensor)
A no-operation: it is functionally the same as an identity but is never eliminated by PopART patterns or inlining, so it is useful for debugging.
- poptorch.dynamic_slice(tensor, dim, start, size, step)
Torch native dynamic slices can't be properly intercepted by backends, so this op is provided to enable dynamic slicing in PopTorch applications.
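A minimal sketch, assuming start is a scalar integer tensor computed at runtime:
def forward(self, x, start):
    # Slice 4 elements along dimension 0, beginning at a
    # runtime-computed offset, with a step of 1.
    return poptorch.dynamic_slice(x, 0, start, 4, 1)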
- poptorch.dynamic_update(input, src, dim, start, size)
Torch native dynamic slices can't be properly intercepted by backends, so this op is provided to enable dynamic update slicing in PopTorch applications.
- poptorch.serializedMatMul(lhs, rhs, mode, factor=0, keep_precision=False)
Calculates a matrix product using a serialized matrix multiplication.
The matrix multiplication, lhs*rhs, is split into separate smaller multiplications, calculated one after the other, to reduce the memory requirements of the multiplication and its gradient calculation.
- Parameters
lhs (torch.Tensor) – Left-hand side input matrix.
rhs (torch.Tensor) – Right-hand side input matrix.
mode (poptorch.MatMulSerializationMode) –
Which dimension of the matmul to serialize on: for matrix A (m by n) multiplied by matrix B (n by p).
InputChannels: Split across the input channels (dimension m).
ReducingDim: Split across the reducing dimension (n).
OutputChannels: Split across the output channels (dimension p).
Disabled: Same as an ordinary matrix multiplication.
factor (int) – Number of serialized multiplications. Must be a factor of the dimension to serialize on.
keep_precision (bool) – (Half/float16 inputs only) The forward op when serializing over ReducingDim and the backwards ops when serializing over InputChannels involve an addition step. If keep_precision is True, these additions will occur using float32 rather than half-precision partials, matching those used for the individual matrix multiplications.
- Return type
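For example, a sketch serializing a large multiplication over its output channels (self.weight is a placeholder):
def forward(self, x):
    # Split the matmul into 4 serial steps over the output channels
    # to reduce peak temporary memory.
    return poptorch.serializedMatMul(
        x, self.weight, poptorch.MatMulSerializationMode.OutputChannels, 4)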
- poptorch.set_available_memory(tensor, available_memory_proportion)
Sets the amount of temporary memory made available to an operation.
The operators that can be tuned with this setting include:
convolution
matrix multiplication
embedding lookups
indexing operations
When applied to the output of a supported operation, it controls the trade-off between execution cycles and the temporary memory used during the execution of the operation.
The value should be between 0 and 1 (inclusive) and represents a proportion of available memory on the IPU. The default value is 0.6 (therefore, by default, PopTorch will not use more than 60% of IPU memory for temporary data).
PopTorch passes this setting to the PopLibs operator planner, which will try to constrain the use of temporary memory to below this value. Generally, an operation that has more temporary memory available will run in fewer cycles.
For a specific operation, the necessary amount of temporary memory may be more than the amount specified by this option. In this case, a warning message will be generated.
For more information, please refer to the technical note on optimising temporary memory usage.
>>> class BasicNetwork(nn.Module):
...     def __init__(self):
...         super().__init__()
...         self.conv = nn.Conv2d(4, 4, 3, stride=2)
...
...     def forward(self, x):
...         out = self.conv(x)
...         out = poptorch.set_available_memory(out, 0.2)
...         return out
- Parameters
- Returns
The input tensor, as if calling an identity function.
- Return type
- poptorch.set_overlap_for_input(input_tensors, mode)
Sets host overlap setting for input_tensors.
You can increase performance in some cases by overlapping the copying from the host to IPUs with computation. However, this requires a number of IPU tiles to be set aside as IO tiles using numIOTiles(), which may affect computation performance.
You should use this function at the start of your model's forward method for each applicable input, and use the returned tensors in subsequent ops.
- Parameters
input_tensors – The input tensors for which to enable overlapping host IO. This can be either a single tensor, or any combination of tuple, list, or dict of tensors.
mode (poptorch.OverlapMode) – Control to what extent the host IO overlaps computation.
- Returns
the input tensors, specified for overlap.
See also
- poptorch.set_overlap_for_output(output_tensors, mode)
Sets host overlap setting for output_tensors.
You can increase performance in some cases by overlapping the copying from the IPUs to host with computation. However, this requires a number of IPU tiles to be set aside as IO tiles using numIOTiles(), which may affect computation performance.
You should use this function at the end of your model's forward method, for each applicable output, just before returning the tensors.
- Parameters
output_tensors – The output tensors to enable overlapping host IO for. This can be either a single tensor, or any combination of tuple, list, or dict of tensors.
mode (poptorch.OverlapMode) – Control to what extent the host IO overlaps computation.
- Returns
the output tensors, specified for overlap.
See also
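For example, a sketch enabling host IO overlap for a model's input and output; self.layer is a placeholder and OverlapAccumulationLoop is one of the poptorch.OverlapMode values:
>>> opts = poptorch.Options()
>>> opts.TensorLocations.numIOTiles(32)
>>> class Model(torch.nn.Module):
...     def forward(self, x):
...         x = poptorch.set_overlap_for_input(
...             x, poptorch.OverlapMode.OverlapAccumulationLoop)
...         x = self.layer(x)
...         return poptorch.set_overlap_for_output(
...             x, poptorch.OverlapMode.OverlapAccumulationLoop)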
- poptorch.nearest(x, y, batch_x=None, batch_y=None)
PopTorch implementation of the torch_cluster nearest operator.
This op clusters points in x together which are nearest to a given query point in y.
- Parameters
x (Tensor) – Node feature matrix.
y (Tensor) – Node feature matrix.
batch_x (Optional[Union[List[int], Tensor]]) – Batch vector, which assigns each node to a specific sample. batch_x needs to be sorted.
batch_y (Optional[Union[List[int], Tensor]]) – Batch vector, which assigns each node to a specific sample. batch_y needs to be sorted.
- poptorch.fps(src, ptr, ratio=0.5, random_start=False)
PopTorch implementation of the torch_cluster fps operator.
This op is a sampling algorithm from the "PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space" paper, which iteratively samples the most distant point with regard to the remaining points.
- Parameters
- Returns
A tensor of src point indexes.
- Return type
- poptorch.cond(condition, then_body, then_inps, else_body, else_inps)
An on-device if/else operation. This creates two branches of instructions executed conditionally on the device. Only for inference.
The then_body and else_body should be Python functions containing the PyTorch code you wish to execute conditionally on the device. The condition is passed in the form of a boolean Tensor and the branch to be executed is decided at runtime directly on the device. There are a few conditions on the branch functions:
then_body and else_body can accept an arbitrary number of inputs (including zero).
Tensors defined in the cond caller (the outer graph) can be used inside then_body and else_body implicitly, just as if they were passed through the inputs list.
then_body and else_body have to return the same number of corresponding outputs. This is because the result of the cond op is assigned to a common list of tensors.
All the tensors utilized by then_body and else_body are passed in by copy, so updating any of the tensors inside then_body and else_body does not affect the original tensors. To update a tensor passed in, its new value has to be returned from the body and assigned to the original tensor (note that the number of outputs from then_body and else_body has to match).
- Parameters
- Return type
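A minimal sketch of a conditional branch chosen at runtime on the device:
def forward(self, x, y):
    def then_body(a):
        return a + 1
    def else_body(a):
        return a - 1
    condition = (x > y).all()
    # Exactly one branch executes on the device; both bodies must
    # return the same number of outputs.
    out, = poptorch.cond(condition, then_body, [x], else_body, [x])
    return out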
11.4. Model wrapping functions
- poptorch.trainingModel(model, options=None, optimizer=None)
Create a PopTorch training model, from a PyTorch model, to run on IPU hardware in training mode.
Note
PopTorch makes a shallow copy of the model and wraps the original model to facilitate weight synchronisation. Changes to the parameters in the returned training model affect the original model and vice versa. However, primitive variable types are not synced. For example, calling model.train() on the original model, which changes the training bool of the model instance, will not alter the model returned by this function. You may need to call model.train() on your model before you call this function for correct behaviour.
- Parameters
model (Union[torch.nn.Module, poptorch.PoplarExecutor]) – The PyTorch model to wrap.
options (Optional[poptorch.Options]) – The IPU-specific options.
optimizer (Optional[torch.optim.Optimizer]) – The optimizer to apply during training.
Supported PyTorch optimizers: optim.SGD, optim.Adam, optim.AdamW, optim.RMSprop.
Supported PopTorch optimizers: SGD, Adam, AdamW, RMSprop, LAMB.
- Returns
The PoplarExecutor wrapper to use in place of model.
- Return type
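For example, a sketch of a training loop, assuming the wrapped model returns (output, loss) from its forward method:
>>> model = MyModel()
>>> model.train()  # Set the training state before wrapping.
>>> opts = poptorch.Options()
>>> optimizer = poptorch.optim.SGD(model.parameters(), lr=0.01)
>>> poptorch_model = poptorch.trainingModel(model, opts, optimizer)
>>> for data, labels in dataloader:
...     output, loss = poptorch_model(data, labels)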
- poptorch.inferenceModel(model, options=None)
Create a PopTorch inference model, from a PyTorch model, to run on IPU hardware in inference mode.
Note
PopTorch makes a shallow copy of the model. Changes to the parameters in the returned inference model affect the original model and vice versa. However, primitive variable types are not synced: for example, calling model.eval() on the original model will not alter the model returned by this function. You may need to call model.eval() on your model before you call this function for correct behaviour.
- Parameters
model (Union[torch.nn.Module, poptorch.PoplarExecutor]) – The PyTorch model to wrap.
options (Optional[poptorch.Options]) – The IPU-specific options.
- Returns
The PoplarExecutor wrapper to use in place of model.
- Return type
- class poptorch.PoplarExecutor(model, options, training, poptorch_version, optimizer=None, user_model=None)
This class should not be created directly but is a wrapper around the model that was passed into inferenceModel() or trainingModel(). It only has a few methods, which can be used to interface with the IPU.
- Parameters
model (torch.nn.Module) –
options (Optional[poptorch.Options]) –
training (bool) –
poptorch_version (str) –
optimizer (Optional[torch.optim.Optimizer]) –
user_model (Optional[torch.nn.Module]) –
- __call__(*args, **kwargs)
Takes the same arguments as the wrapped PyTorch model.__call__.
Note
The first time the PoplarExecutor wrapper is called, the wrapped model will be traced and compiled.
- attachToDevice()
Attach to target device. Before calling this function, the device must be detached and the model compiled.
- Return type
None
- compilationTime()
Returns the total model compilation time.
- Returns
An object of type datetime.timedelta representing the compilation time.
- Return type
Note
You must compile the model before calling this method. Also, the showCompilationProgressBar option must be set to True.
- compile(*args, **kwargs)
Takes the same arguments as the wrapped PyTorch model.__call__.
Trace and compile the wrapped model if no executable has been created yet.
Note: The executable created by this method can only be executed; it cannot be exported to file. To precompile and save to file, use compileAndExport().
- Return type
None
- compileAndExport(filename, *args, export_model=True, **kwargs)
Precompile an executable and save it to file.
args and kwargs are the same arguments as the wrapped PyTorch model.__call__.
- Parameters
filename (str) – Where to save the compiled executable.
export_model (bool) – If True, the Torch model will be saved in the file alongside the executable. load() can be used to restore both the original Torch model, the PopTorch model and the executable. If False, only the executable will be exported and it will be the user's responsibility to call inferenceModel() or trainingModel() to re-create the PopTorch model before calling loadExecutable() to restore the executable.
- copyNamedBuffersToDevice()
Copies the buffers from model.parameters() to the IPU device.
- Return type
None
- copyWeightsToDevice()
Copies the weights from model.parameters() to the IPU device. Implicitly called on the first call.
- Return type
None
- copyWeightsToHost()
Updates the parameters used in model with the weights stored on the device (the weights in model.parameters()).
- Return type
None
- copyWeightsToHostIfNeeded()
Return True if the weights on the host were dirty and have been updated. Return False if the weights were already up to date.
- Return type
- cycleCount()
Returns the number of cycles for which the IPU ran.
You must run the model on IPU hardware before calling this method.
- Returns
The number of cycles on the IPU for the last model run. If you are using replicas, the returned value represents the number of cycles for the first replica only.
- Return type
- destroy()
Destroy the model: release the IPUs and the executable.
- Return type
None
- detachFromDevice()
Detach from target device. Before calling this function, the device must be attached (and the model compiled).
- Return type
None
- getComputeLatency()
Return compute latency for the last execution of the model.
The compute latency is the interval of time (in fractional seconds) between the last input tensor being transferred to the IPU and the last output tensor becoming available.
The result is a tuple containing the minimum, maximum and average latency for the iterations corresponding to the latest invocation of the model.
- getHostIpuLatency()
Return Host-IPU latency for the last execution of the model.
The Host-IPU latency is the interval of time (in fractional seconds) between the first input tensor being requested and the last input tensor being transferred to the IPU.
The result is a tuple containing the minimum, maximum and average latency for the iterations corresponding to the latest invocation of the model.
- getIpuHostLatency()
Return IPU-Host latency for the last execution of the model.
The IPU-Host latency is the interval of time (in fractional seconds) between the first output tensor becoming available and the last output tensor being written back to the host.
The result is a tuple containing the minimum, maximum and average latency for the iterations corresponding to the latest invocation of the model.
- getLatency()
Return round-trip latency for the last execution of the model.
The round-trip latency is the interval of time (in fractional seconds) between the first input tensor being requested and the last output tensor being written back to the host.
The result is a tuple containing the minimum, maximum and average latency for the iterations corresponding to the latest invocation of the model.
- getPerfCounters()
Return performance counters for the last execution of the model.
Return the values (in fractional seconds) of the performance counters corresponding to the latest run of the model. The reference point of the returned value is undefined, however the difference between values is valid.
The returned object is a dictionary where the keys correspond to each of the following events:
- 'input': the IPU requesting an input tensor
- 'input_complete': an input tensor having been transferred
- 'output': the IPU requesting to transmit an output tensor
- 'output_complete': an output tensor having been transferred
The values of the dictionary are nested lists. The first level of nesting corresponds to an input or output index. The second level list contains the actual values as fractional seconds.
Examples:
- dict['input'][1][3]: performance counter for the second input tensor being requested on the third iteration of the model
- dict['output_complete'][0][0]: performance counter for the first output tensor having been transferred on the first iteration of the model
- getTensorNames()
Returns a list of all tensor names within the computational graph. The model must be compiled in advance.
- isAttachedToDevice()
Returns True if the target device has been attached, False otherwise.
- Return type
- isCompiled()
Returns True if the model has been compiled (and not destroyed), False otherwise.
- Return type
- loadExecutable(filename)
Load an executable previously generated using compileAndExport().
- Parameters
filename (str) –
- Return type
None
- load_state_dict(state_dict, strict=True)
Will call load_state_dict() on the wrapped model and automatically synchronise the weights with the IPU.
- property model: torch.nn.modules.module.Module
Access the wrapped Torch model.
- property options: poptorch.Options
Access to the options.
See also
- property rng_state: List[int]
Return the random number generator's seed and state for the compiled model.
- save(filename, export_model=True, save_rng_state=True)
Save the compiled model to file.
- Parameters
filename (str) – Where to save the compiled executable.
export_model (bool) – If True, the Torch model will be saved in the file alongside the executable. load() can be used to restore both the original Torch model, the PopTorch model and the executable. If False, only the executable will be exported and it will be the user's responsibility to call inferenceModel() or trainingModel() to re-create the PopTorch model before calling loadExecutable() to restore the executable.
save_rng_state (bool) – If True, the random number generator's state and seed will be saved in the file alongside the executable.
- poptorch.isRunningOnIpu()
This function returns True when executing on IPU and False when executing the model outside IPU scope. This allows for separate code paths to be marked in the model simply by using:
>>> if poptorch.isRunningOnIpu():
>>>     # IPU path
>>> else:
>>>     # CPU path
Note this will only apply to code during execution. During model creation it will always return False.
- Returns
True if running on IPU, otherwise False.
- Return type
- poptorch.load(filename, edit_opts_fn=None)
Load a PopTorch model from a file previously created using compileAndExport().
- Parameters
edit_opts_fn (Optional[Callable[[poptorch.Options], None]]) – Function to edit the options before the model is restored. For example to attach to a specific IPU device.
filename (str) –
- Return type
>>> model = poptorch.inferenceModel(model)
>>> model.compileAndExport("my_model.poptorch")
...
>>> model = poptorch.load("my_model.poptorch")
>>> model(my_input)
11.5. Parallel execution
- class poptorch.Block(user_id=None, ipu_id=None)
A context manager to define blocks of the model.
You can use Block as a context manager. This means you use Python's "with" statement as follows:
>>> with poptorch.Block("Encoder"):
...     self.layer = MyLayer(x)
All layers called inside this scope will run on the specified IPU, if one is specified. In addition, you can combine multiple blocks into a stage.
See also
- __init__(user_id=None, ipu_id=None)
- Parameters
user_id (Optional[str]) – A user defined identifier for the block. Blocks with the same ID are considered as being a single block. Block identifiers are also used to manually specify pipelines or phases.
ipu_id (Optional[int]) – The ID of the IPU to run on. Note that the ipu_id is an index in a multi-IPU device within PopTorch, and is separate and distinct from the device IDs used by gc-info.
- static useAutoId()
Call this method at the beginning of your forward() method to enable automatic block ID generation.
Blocks with a None user_id will be assigned an automatic ID which will be the index of this block in the list of ID-less Blocks.
>>> poptorch.Block.useAutoId()
>>> with poptorch.Block():  # user_id = "0"
...     layer()
>>> with poptorch.Block("special_block"):  # user_id = "special_block"
...     layer()
>>> with poptorch.Block():  # user_id = "1"
...     layer()
- class poptorch.BeginBlock(layer_to_call, user_id=None, ipu_id=None)
Define a block by modifying an existing PyTorch module.
You can use this with an existing PyTorch module instance, as follows:
>>> poptorch.BeginBlock(myModel.a_layer)
>>> poptorch.BeginBlock(MyNewLayer())
The module and all sub-modules will be part of this block until a sub-module is modified to be in another block. In addition, if an IPU is specified, the module and its submodules will run on the specified IPU.
You can combine multiple blocks into a stage.
- Parameters
layer_to_call (Module) – PyTorch module to assign to the block.
user_id (Optional[str]) – A user defined identifier for the block. Blocks with the same ID are considered as being a single block. Block identifiers are also used to manually specify pipelines or phases.
ipu_id (Optional[int]) – The ID of the IPU to run on. Note that the
ipu_id
is an index in a multi-IPU device within PopTorch, and is separate and distinct from the device IDs used bygc-info
.
- Return type
See also
- poptorch.BlockFunction(user_id=None, ipu_id=None)
A decorator to define blocks of the model.
You can use BlockFunction as a decorator for an existing function, as follows:
>>> @BlockFunction("Decoder", ipu_id=1)
... def decoder(self, encoder_output):
...     self.decoder_b1(encoder_output)
All layers inside the function and any functions called by the function will run on the specified IPU, if one is specified. In addition, you can combine multiple blocks into a stage.
- Parameters
user_id (Optional[str]) – A user defined identifier for the block. Blocks with the same ID are considered as being a single block. Block identifiers are also used to manually specify pipelines or phases.
ipu_id (Optional[int]) – The ID of the IPU to run on. Note that the
ipu_id
is an index in a multi-IPU device within PopTorch, and is separate and distinct from the device IDs used bygc-info
.
See also
- poptorch.removeBlocks(module)
Recursively remove BeginBlock annotations from a Module if it contains any.
- Parameters
module (torch.nn.Module) – Module to recursively unwrap.
- class poptorch.Stage(*block_ids)
The various execution strategies are made of Stages: a stage consists of one or more Blocks running on one IPU.
- Parameters
block_ids (str) –
- Return type
None
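A brief sketch, assuming blocks "A" and "B" were defined earlier with poptorch.Block: blocks can be grouped into a stage and pinned to an IPU with the ipu() helper used in the Phase examples below:
>>> stage = poptorch.Stage("A", "B").ipu(0)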
- class poptorch.AutoStage(value)
Defines how the stages are automatically assigned to blocks when the user has not explicitly provided stages to the IExecutionStrategy's constructor.
SameAsIpu: The stage ID will be set to the selected IPU number.
AutoIncrement: The stage ID for new blocks is automatically incremented.
Examples:
>>> # Block "0"
>>> with poptorch.Block(ipu_id=0):
...     layer()
>>> # Block "1"
>>> with poptorch.Block(ipu_id=1):
...     layer()
>>> # Block "2"
>>> with poptorch.Block(ipu_id=0):
...     layer()
By default, the following execution strategy is used:
>>> strategy = poptorch.PipelinedExecution(poptorch.AutoStage.SameAsIpu)
>>> opts.setExecutionStrategy(strategy)
which would translate to stage_id = ipu_id:
Block "0" ipu=0 stage=0
Block “1” ipu=1 stage=1
Block “2” ipu=0 stage=0
Now if instead you use:
>>> strategy = poptorch.PipelinedExecution(poptorch.AutoStage.AutoIncrement)
>>> opts.setExecutionStrategy(strategy)
The last block would be in its own stage rather than sharing one with Block “0”:
Block “0” ipu=0 stage=0
Block “1” ipu=1 stage=1
Block “2” ipu=0 stage=2
- class poptorch.Phase(*arg)
Represents an execution phase.
- Parameters
arg (Union[str, poptorch.Stage]) –
- __init__(*arg)
Create a phase.
- Parameters
arg (Union[str, poptorch.Stage]) – must be either one or more Stages, or one or more block user_ids.
If one or more strings are passed, they will be interpreted as Block IDs representing a single Stage.
Within a Phase, the stages are executed in parallel.
>>> with poptorch.Block("A"):
...     layer()
>>> with poptorch.Block("B"):
...     layer()
>>> p = Phase(poptorch.Stage("A").ipu(0))
>>> # 2 stages made of one block each
>>> p = Phase(poptorch.Stage("A").ipu(0), poptorch.Stage("B").ipu(1))
>>> p = Phase("A", "B")  # One stage made of 2 blocks
- ipus(*ipus)
Assign one IPU for each stage contained in this Phase.
The number of IPUs passed must match the number of stages in the Phase.
- class poptorch.ShardedExecution(*args)
Will shard the execution of the passed Stages. If no stage is passed, each unique Block ipu_id encountered during tracing is considered a different stage.
>>> with poptorch.Block(ipu_id=0):
...     layer()
>>> with poptorch.Block(ipu_id=1):
...     layer()
>>> with poptorch.Block(ipu_id=2):
...     layer()
>>> opts = poptorch.Options()
>>> # Automatically create 3 shards based on the block ipu_ids
>>> opts.setExecutionStrategy(poptorch.ShardedExecution())
- Parameters
args (poptorch.AutoStage, [str], [poptorch.Stage]) – Either an AutoStage strategy or an explicit list of stages or block IDs.
- class poptorch.PipelinedExecution(*args)
- __init__(*args)
Pipeline the execution of the graph partitions. These partitions can be a Stage, a Block or a BeginBlock. If none of these are passed, an AutoStage strategy can be passed instead to decide how the stage IDs are created. By default, poptorch.AutoStage.SameAsIpu is used: the stage ID will be set to the selected IPU number. This implies that each unique Block or BeginBlock in the graph must have its ipu_id explicitly set when using AutoStage.
Example 1: Block user_ids are known, IPUs are inferred.
>>> with poptorch.Block("A"):
...     layer1()
>>> with poptorch.Block("B"):
...     layer2()
>>> with poptorch.Block("C"):
...     layer3()
>>> with poptorch.Block("D"):
...     layer4()
>>> opts = poptorch.Options()
>>> # Create a 4-stage pipeline based on `user_id`; 4 IPUs will be used.
>>> opts.setExecutionStrategy(poptorch.PipelinedExecution("A", "B",
...                                                       "C", "D"))
Stages can also be set explicitly:
>>> # Create a 2-stage pipeline from the blocks' `user_id`; 2 IPUs will be used.
>>> opts.setExecutionStrategy(poptorch.PipelinedExecution(
...     poptorch.Stage("A", "B"),
...     poptorch.Stage("C", "D")))
Example 2: Block ipu_ids are known; use the default AutoStage.
>>> poptorch.Block.useAutoId()
>>> with poptorch.Block(ipu_id=0):
...     layer1()
>>> with poptorch.Block(ipu_id=1):
...     layer2()
>>> with poptorch.Block(ipu_id=2):
...     layer3()
>>> with poptorch.Block(ipu_id=3):
...     layer4()
>>> # Automatically create a 4-stage pipeline matching the block `ipu_id`.
>>> # Note: poptorch.PipelinedExecution() is the default execution
>>> # strategy when blocks are defined.
>>> opts.setExecutionStrategy(poptorch.PipelinedExecution())
Example 3: Non-consecutive stages placed on the same IPU.
>>> with poptorch.Block(ipu_id=0):
...     layer1()
>>> with poptorch.Block(ipu_id=1):
...     layer2()
>>> with poptorch.Block(ipu_id=0):
...     layer3()
>>> # Automatically create a 3-stage pipeline forcing the stage
>>> # IDs to be incremental.
>>> opts.setExecutionStrategy(poptorch.PipelinedExecution(
...     poptorch.AutoStage.AutoIncrement))
- Parameters
args (poptorch.AutoStage, [str], [poptorch.Stage]) – Either an AutoStage strategy or an explicit list of stages or block IDs.
- class poptorch.SerialPhasedExecution(*phases)
All the phases run serially on a single group of IPUs.
For example:
phase 0 runs on IPUs 0 & 1
phase 1 runs on IPUs 0 & 1
phase 2 runs on IPUs 0 & 1
>>> with poptorch.Block("A"):
...     layer()
>>> with poptorch.Block("A2"):
...     layer()
>>> with poptorch.Block("B"):
...     layer()
>>> with poptorch.Block("B2"):
...     layer()
>>> with poptorch.Block("C"):
...     layer()
>>> with poptorch.Block("C2"):
...     layer()
>>> opts = poptorch.Options()
>>> strategy = poptorch.SerialPhasedExecution([
...     poptorch.Phase(poptorch.Stage("A"), poptorch.Stage("A2")),
...     poptorch.Phase(poptorch.Stage("B"), poptorch.Stage("B2")),
...     poptorch.Phase(poptorch.Stage("C"), poptorch.Stage("C2"))])
>>> strategy.phase(0).ipus(0, 1)
>>> strategy.phase(1).ipus(0, 1)
>>> strategy.phase(2).ipus(0, 1)
>>> opts.setExecutionStrategy(strategy)
- Parameters
phases (Union[poptorch.Phase, List[poptorch.Stage], List[str]]) –
- __init__(*phases)
Execute the model's blocks in phases.
- setTensorsLiveness(liveness)
See Liveness for more information.
- Parameters
liveness (poptorch.Liveness) –
- Return type
- stage(block_id)
Return the Stage the given block belongs to.
- Parameters
block_id (str) – A block ID.
- useSeparateBackwardPhase(use=True)
Given a forward pass with 3 phases (0,1,2), by default the phases will run as follows:
fwd:         bwd:
phase 0  ->  phase 4
phase 1  ->  phase 3
phase 2  ->  phase 2
Note
The end of the forward pass and the beginning of the backward pass are part of the same phase.
If useSeparateBackwardPhase(True) is used, then no phase will be shared between the forward and backward passes:
fwd:         bwd:
phase 0  ->  phase 6
phase 1  ->  phase 5
phase 2  ->  phase 4
- Parameters
use (bool) –
- class poptorch.ParallelPhasedExecution(*phases)
Phases are executed in parallel alternating between two groups of IPUs.
For example:
phase 0 runs on IPUs 0 & 2
phase 1 runs on IPUs 1 & 3
phase 2 runs on IPUs 0 & 2
>>> poptorch.Block.useAutoId()
>>> with poptorch.Block():  # user_id = "0"
...     layer()
>>> with poptorch.Block():  # user_id = "1"
...     layer()
>>> with poptorch.Block():  # user_id = "2"
...     layer()
>>> with poptorch.Block():  # user_id = "3"
...     layer()
>>> with poptorch.Block():  # user_id = "4"
...     layer()
>>> with poptorch.Block():  # user_id = "5"
...     layer()
>>> opts = poptorch.Options()
>>> strategy = poptorch.ParallelPhasedExecution([
...     poptorch.Phase(poptorch.Stage("0"), poptorch.Stage("1")),
...     poptorch.Phase(poptorch.Stage("2"), poptorch.Stage("3")),
...     poptorch.Phase(poptorch.Stage("4"), poptorch.Stage("5"))])
>>> strategy.phase(0).ipus(0, 2)
>>> strategy.phase(1).ipus(1, 3)
>>> strategy.phase(2).ipus(0, 2)
>>> opts.setExecutionStrategy(strategy)
- Parameters
phases (Union[poptorch.Phase, List[poptorch.Stage], List[str]]) –
- __init__(*phases)
Execute the model's blocks in phases.
- stage(block_id)
Return the Stage the given block belongs to.
- Parameters
block_id (str) – A block ID.
- useSeparateBackwardPhase(use=True)
Given a forward pass with 3 phases (0,1,2), by default the phases will run as follows:
fwd:         bwd:
phase 0  ->  phase 4
phase 1  ->  phase 3
phase 2  ->  phase 2
Note
The end of the forward pass and the beginning of the backward pass are part of the same phase.
If useSeparateBackwardPhase(True) is used, then no phase will be shared between the forward and backward passes:
fwd:         bwd:
phase 0  ->  phase 6
phase 1  ->  phase 5
phase 2  ->  phase 4
- Parameters
use (bool) –
- class poptorch.Liveness(value)
When using phased execution:
AlwaysLive: The tensors always stay on the IPU between the phases.
OffChipAfterFwd: The tensors are sent off the chip at the end of the forward pass and before the beginning of the backward pass.
OffChipAfterFwdNoOverlap: Same as OffChipAfterFwd, except there is no overlapping of load and store operations between phases. This makes it a more memory-efficient mode at the cost of delayed computation.
OffChipAfterEachPhase: The tensors are sent off the chip at the end of each phase.
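A short usage sketch, tying together the phased-execution options documented above; the block IDs "A" and "B" are illustrative:
>>> strategy = poptorch.SerialPhasedExecution([
...     poptorch.Phase(poptorch.Stage("A")),
...     poptorch.Phase(poptorch.Stage("B"))])
>>> strategy.phase(0).ipus(0)
>>> strategy.phase(1).ipus(0)
>>> strategy.setTensorsLiveness(poptorch.Liveness.OffChipAfterFwd)
>>> strategy.useSeparateBackwardPhase(True)
>>> opts.setExecutionStrategy(strategy)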
- class poptorch.CommGroupType(value)
Grouping to be used when distributing an input or per-replica variable among replicas. See Grouping tensor weights across replicas.
All: This causes replicaGrouping() to have no effect, as the same variable value is distributed to all replicas. The group count is ignored. This is not valid as an input group type.
Consecutive: Each replica group is made up of consecutive replicas. For group size k, the groups are:
{0, 1, ..., k-1}, {k, ..., 2k-1}, ..., {N-k, ..., N-1}
Orthogonal: Each replica group is made by slicing the replicas orthogonally to the replica ordering. For group size k, with group count m = N/k, the groups are:
{0, m, 2m, ...}, {1, m+1, 2m+1, ...}, ..., {m-1, 2m-1, ..., N-1}
NoGrouping: Each replica gets its own value of the variable. The group count is ignored.
- class poptorch.VariableRetrievalMode(value)
Method to be used when retrieving the value of a grouped variable from grouped replicas. See Grouping tensor weights across replicas.
OnePerGroup: Return one value for each replica group (takes the value from the first replica in the group).
AllReplicas: Return a value from each replica.
- replicaGrouping()
Call this function on a weight tensor (after applying a PopTorch wrapper with inferenceModel() or trainingModel()) to configure replica groups which each receive a different value of the weight tensor. For details and a code example, see Section 4.4.3, Grouping tensor weights across replicas.
- Parameters
comm_group_type (poptorch.CommGroupType) – The replica group arrangement to use for this tensor.
shards (int) – The number of replicas in each replica group.
variable_retrieval_mode (poptorch.VariableRetrievalMode) – The method to use when retrieving the value of this tensor from the replicas.
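A hedged sketch, assuming a model with a `linear` sub-module trained across four replicas (both the layer name and replica count are illustrative, not part of the API):
>>> poptorch_model = poptorch.trainingModel(model, opts)
>>> model.linear.weight.replicaGrouping(
...     poptorch.CommGroupType.Consecutive,
...     2,
...     poptorch.VariableRetrievalMode.OnePerGroup)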
11.6. Optimizers
- class poptorch.optim.VariableAttributes(variable_attributes, allowed_attributes)
Track which attributes are variable or constant.
Accessible via the variable_attrs attribute of any PopTorch optimizer.
>>> opt = poptorch.optim.SGD(params, lr=0.01)
>>> opt.variable_attrs.isConstant("lr")
- isConstant(attr)
Return True if the attribute is marked as constant.
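A short sketch; the markAsConstant() companion call is an assumption here, shown only to illustrate how isConstant() reflects an attribute's status:
>>> opt = poptorch.optim.SGD(params, lr=0.01)
>>> opt.variable_attrs.markAsConstant("lr")
>>> opt.variable_attrs.isConstant("lr")
True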
- class poptorch.optim.SGD(params, lr, momentum=None, dampening=None, weight_decay=None, nesterov=None, maximize=None, foreach=None, differentiable=None, loss_scaling=None, velocity_scaling=None, use_combined_accum=None, accum_type=None, velocity_accum_type=None, max_grad_norm=None)
Stochastic gradient descent with optional momentum.
The optimizer is based on PyTorch’s implementation (torch.optim.SGD) with optional loss and velocity scaling.
PopTorch provides two possible variants. Both variants are mathematically identical to PyTorch but differ in their stability and efficiency.
Note
If you set momentum to zero and do not use gradient accumulation, PopTorch will use a simple SGD variant and ignore the values of use_combined_accum, accum_type and velocity_accum_type.
Separate tensor variant (default)
If you set use_combined_accum to False (default), you will use a more stable but more memory-intensive variant. In this case, PopTorch keeps two state tensors for each weight: one for gradient accumulation and one for velocity. It operates as follows when training:
PopTorch runs one or more forward/backward steps, equal to the number of gradient accumulations (see gradientAccumulation()). Each time, PopTorch sums the gradients, storing them in accumulators.
Once all the forward and backward passes have completed, PopTorch uses the summed gradients to update the velocities. At this stage, PopTorch will correct the scale based on the setting of accumulationAndReplicationReductionType(). PopTorch stores the velocities as optimiser states.
Finally, PopTorch uses the velocities to update the parameters, taking into account the loss scaling and learning rate.
With use_combined_accum set to False, you can independently change the data types used for storing the accumulated gradients and the velocity values, using accum_type and velocity_accum_type respectively.
Velocity scaling is ignored for this variant.
Note
If the number of gradient accumulations is high, you can use off-chip memory for the velocity tensors with a minimal performance hit.
>>> opts.TensorLocations.setOptimizerLocation(
...     poptorch.TensorLocationSettings().useOnChipStorage(False))
Combined tensor variant
If you set use_combined_accum to True, you will use a less stable but more memory-efficient variant. In this case PopTorch uses a single tensor (the combined tensor) for gradient accumulation and velocity. It operates as follows when training:
PopTorch runs one or more forward/backward steps, equal to the number of gradient accumulations (see gradientAccumulation()). For each step, PopTorch immediately calculates an increment or decrement for the combined tensors for each parameter. The amount of increment or decrement takes into account the setting of accumulationAndReplicationReductionType(), as well as removing loss scaling and introducing any velocity scaling.
After running all the steps, the combined tensor will be equal to the new velocities. PopTorch uses these to update the parameters, taking into account the velocity scaling and learning rate.
PopTorch ignores the accum_type and velocity_accum_type values when using a combined tensor. In addition, there are no optimizer state tensors, so opts.TensorLocations.setOptimizerLocation has no effect.
Warning
For both variants, reducing the velocity scaling during training will result in temporary over-estimation of the velocity and could cause model instability. Increasing the scaling may temporarily slow model convergence but not lead to instability.
- __init__(params, lr, momentum=None, dampening=None, weight_decay=None, nesterov=None, maximize=None, foreach=None, differentiable=None, loss_scaling=None, velocity_scaling=None, use_combined_accum=None, accum_type=None, velocity_accum_type=None, max_grad_norm=None)
- Parameters
params (iterable) – parameters to optimize.
lr (float) – learning rate.
weight_decay (Optional[float]) – Weight decay (L2 penalty) factor.
nesterov (Optional[bool]) – Whether to enable Nesterov momentum. Default is False.
loss_scaling (Optional[float]) – Factor by which to scale the loss and hence gradients to assist numerical stability when using float16.
velocity_scaling (Optional[float]) – Factor by which to scale the velocity values to assist numerical stability when using float16. (This applies to the combined variant only.)
use_combined_accum (Optional[bool]) – Whether to use a combined accumulator.
accum_type (Optional[dtype]) – data type used for gradients.
velocity_accum_type (Optional[dtype]) – data type used to store the velocity values for each parameter.
max_grad_norm (Optional[float]) – Maximum norm of gradients. Default is inf.
- Return type
None
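A hedged construction sketch of the separate tensor variant described above, assuming `model` and `opts` are already defined; the data types and scaling values chosen here are illustrative:
>>> import torch
>>> opt = poptorch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9,
...                          loss_scaling=128.0, use_combined_accum=False,
...                          accum_type=torch.float16,
...                          velocity_accum_type=torch.float32)
>>> poptorch_model = poptorch.trainingModel(model, opts, optimizer=opt)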
- class poptorch.optim.Adam(params, lr=None, betas=None, eps=None, weight_decay=None, amsgrad=None, foreach=None, maximize=None, capturable=None, differentiable=None, fused=None, loss_scaling=None, accum_type=None, first_order_momentum_accum_type=None, second_order_momentum_accum_type=None, max_grad_norm=None)
Adam optimizer.
This optimizer matches PyTorch’s implementation (torch.optim.Adam) with optional loss scaling.
AMSGrad is currently not supported.
- Parameters
params (Iterable) –
- Return type
None
- __init__(params, lr=None, betas=None, eps=None, weight_decay=None, amsgrad=None, foreach=None, maximize=None, capturable=None, differentiable=None, fused=None, loss_scaling=None, accum_type=None, first_order_momentum_accum_type=None, second_order_momentum_accum_type=None, max_grad_norm=None)
- Parameters
params (iterable) – parameters to optimize.
betas (Optional[Tuple[float, float]]) – (beta1, beta2) parameters used in Adam.
eps (Optional[float]) – term added to the denominator to ensure numerical stability.
loss_scaling (Optional[float]) – Factor by which to scale the loss and hence gradients to assist numerical stability when using float16.
accum_type (Optional[dtype]) – data type used for gradients.
first_order_momentum_accum_type (Optional[dtype]) – data type used to store the first order momentum values for each parameter.
second_order_momentum_accum_type (Optional[dtype]) – data type used to store the second order momentum values for each parameter.
max_grad_norm (Optional[float]) – Maximum norm of gradients. Default is inf.
- Return type
None
- class poptorch.optim.AdamW(params, lr=None, betas=None, eps=None, weight_decay=None, amsgrad=None, maximize=None, foreach=None, capturable=None, differentiable=None, fused=None, loss_scaling=None, bias_correction=None, accum_type=None, first_order_momentum_accum_type=None, second_order_momentum_accum_type=None, max_grad_norm=None)
Adam optimizer with true weight decay.
This optimizer matches PyTorch’s implementation (torch.optim.AdamW) with optional loss scaling.
AMSGrad is currently not supported.
- Parameters
params (Iterable) –
- Return type
None
- __init__(params, lr=None, betas=None, eps=None, weight_decay=None, amsgrad=None, maximize=None, foreach=None, capturable=None, differentiable=None, fused=None, loss_scaling=None, bias_correction=None, accum_type=None, first_order_momentum_accum_type=None, second_order_momentum_accum_type=None, max_grad_norm=None)
- Parameters
params (iterable) – parameters to optimize.
betas (Optional[Tuple[float, float]]) – (beta1, beta2) parameters used in AdamW.
eps (Optional[float]) – term added to the denominator to ensure numerical stability.
loss_scaling (Optional[float]) – Factor by which to scale the loss and hence gradients to assist numerical stability when using float16.
bias_correction (Optional[bool]) – True: compute Adam with bias correction.
accum_type (Optional[dtype]) – data type used for gradients.
first_order_momentum_accum_type (Optional[dtype]) – data type used to store the first order momentum values for each parameter.
second_order_momentum_accum_type (Optional[dtype]) – data type used to store the second order momentum values for each parameter.
max_grad_norm (Optional[float]) – Maximum norm of gradients. Default is inf.
- Return type
None
- class poptorch.optim.RMSprop(params, lr=None, alpha=None, eps=None, weight_decay=None, momentum=None, centered=None, foreach=None, maximize=None, differentiable=None, loss_scaling=None, accum_type=None, first_order_momentum_accum_type=None, second_order_momentum_accum_type=None, use_tf_variant=None)
RMSprop optimizer with optional L2 penalty.
This optimizer matches PyTorch's implementation (torch.optim.RMSprop) with optional loss scaling.
However, if the use_tf_variant flag is set to True, it will instead match the TensorFlow implementation, which differs from PyTorch's implementation in three ways:
1) The average squared gradients buffer is initialized to ones.
2) The small epsilon constant is applied inside the square root.
3) The learning rate is accumulated in the momentum buffer if momentum is used.
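A minimal sketch, assuming `model` is defined, that opts into the TensorFlow-style behaviour listed above:
>>> opt = poptorch.optim.RMSprop(model.parameters(), lr=0.001,
...                              use_tf_variant=True)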
- Parameters
params (Iterable) –
- Return type
None
- __init__(params, lr=None, alpha=None, eps=None, weight_decay=None, momentum=None, centered=None, foreach=None, maximize=None, differentiable=None, loss_scaling=None, accum_type=None, first_order_momentum_accum_type=None, second_order_momentum_accum_type=None, use_tf_variant=None)
- Parameters
params (iterable) – parameters to optimize.
eps (Optional[float]) – term added to the denominator to ensure numerical stability.
centered (Optional[bool]) – True: compute centred RMSprop in which the gradient is normalized by an estimate of its variance.
loss_scaling (Optional[float]) – Factor by which to scale the loss and hence gradients to assist numerical stability when using float16.
accum_type (Optional[dtype]) – data type used for gradients.
first_order_momentum_accum_type (Optional[dtype]) – data type used to store the first order momentum values for each parameter.
second_order_momentum_accum_type (Optional[dtype]) – data type used to store the second order momentum values for each parameter.
use_tf_variant (Optional[bool]) – If True, use the TensorFlow variant of RMSprop. Default is False.
- Return type
None
- class poptorch.optim.LAMB(params, lr=None, betas=None, eps=None, weight_decay=None, bias_correction=None, loss_scaling=None, max_weight_norm=None, accum_type=None, first_order_momentum_accum_type=None, second_order_momentum_accum_type=None)
Layer-wise Adaptive Moments (LAMB) optimizer (biased version).
Based on “Large Batch Optimization for Deep Learning: Training BERT in 76 minutes” (https://arxiv.org/abs/1904.00962).
The scaling function phi(z) is fixed as min(z, max_weight_norm).
- Parameters
params (Iterable) –
- Return type
None
- __init__(params, lr=None, betas=None, eps=None, weight_decay=None, bias_correction=None, loss_scaling=None, max_weight_norm=None, accum_type=None, first_order_momentum_accum_type=None, second_order_momentum_accum_type=None)
- Parameters
params (iterable) – parameters to optimize.
betas (Optional[Tuple[float, float]]) – (beta1, beta2) parameters used in LAMB.
eps (Optional[float]) – term added to the denominator to ensure numerical stability.
bias_correction (Optional[bool]) – True: compute LAMB with bias correction.
loss_scaling (Optional[float]) – Factor by which to scale the loss and hence gradients to assist numerical stability when using float16.
max_weight_norm (Optional[float]) – maximum value of the output of scaling function, phi(). Set to None to disable scaling function.
accum_type (Optional[dtype]) – data type used for gradients.
first_order_momentum_accum_type (Optional[dtype]) – data type used to store the first order momentum values for each parameter.
second_order_momentum_accum_type (Optional[dtype]) – data type used to store the second order momentum values for each parameter.
- Return type
None
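A minimal sketch, assuming `model` is defined; passing max_weight_norm=None disables the phi(z) scaling function described above:
>>> opt = poptorch.optim.LAMB(model.parameters(), lr=0.001,
...                           max_weight_norm=None)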
- step(closure=None)
Performs a single optimization step (parameter update).
- Parameters
closure (Callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.
- Return type
Note
Unless otherwise specified, this function should not modify the .grad field of the parameters.
11.7. Data batching
- class poptorch.DataLoader(options, dataset, batch_size=1, shuffle=None, num_workers=0, drop_last=True, persistent_workers=None, auto_distributed_partitioning=True, mode=DataLoaderMode.Sync, async_options=None, rebatched_worker_size=None, batch_sampler=None, **kwargs)
Thin wrapper around the traditional torch.utils.data.DataLoader to abstract away some of the batch size calculations.
If this data loader is used in a distributed execution environment, it will ensure that each process uses a different subset of the dataset, provided you first call options.randomSeed(N) with an integer N which is the same across all hosts.
- __init__(options, dataset, batch_size=1, shuffle=None, num_workers=0, drop_last=True, persistent_workers=None, auto_distributed_partitioning=True, mode=DataLoaderMode.Sync, async_options=None, rebatched_worker_size=None, batch_sampler=None, **kwargs)
- Parameters
options (poptorch.Options) – Options that will be used to compile and run the model.
dataset (torch.utils.data.Dataset) – The dataset to get the data from.
batch_size (int) – This is the batch size in the conventional sense of being the size that runs through an operation in the model at any given time.
shuffle (bool) – Whether or not the dataset should be shuffled.
num_workers (int) – Number of worker processes to use to read the data.
drop_last (bool) – If True and the number of elements in the dataset is not a multiple of the combined batch size then the incomplete batch at the end will be dropped.
persistent_workers (Optional[bool]) – Re-use workers between iterations if True.
auto_distributed_partitioning (bool) – If True, partitions the dataset for distributed execution automatically. Otherwise, it is assumed that partitioning has been handled manually.
mode (poptorch.DataLoaderMode) – If DataLoaderMode.Async, uses an AsynchronousDataAccessor to access the dataset. If DataLoaderMode.Sync, accesses the dataset synchronously.
async_options (Optional[Dict[str, Any]]) – Options to pass to AsynchronousDataAccessor.
rebatched_worker_size (Optional[int]) – When using AsyncRebatched: batch size of the tensors loaded by the workers. Defaults to the combined batch size. If specified, rebatched_worker_size must be less than or equal to the combined batch size.
batch_sampler (Optional[Union[Sampler[Sequence], Iterable[Sequence]]]) – Defines the strategy used to draw samples from the dataset. Returns a batch of indices at a time. Mutually exclusive with batch_size and shuffle.
kwargs – Other options to pass to PyTorch's DataLoader constructor.
- property combinedBatchSize: Optional[int]
Total number of elements consumed from the dataset for a single execution of the model.
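For illustration, the combined batch size is the product of the batch size, device iterations, replication factor and gradient accumulation. A hedged sketch, assuming `dataset` is defined and that replicationFactor() is available on Options (an assumption here), giving 4 x 10 x 2 x 4 = 320:
>>> opts = poptorch.Options()
>>> opts.deviceIterations(10)
>>> opts.replicationFactor(2)
>>> opts.Training.gradientAccumulation(4)
>>> loader = poptorch.DataLoader(opts, dataset, batch_size=4)
>>> loader.combinedBatchSize
320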
- property options: poptorch.Options
A reference to the options that were used to initialise this instance.
- terminate()
If mode==DataLoaderMode.Async, kills the worker process in the underlying AsynchronousDataAccessor manually; otherwise has no effect.
- Return type
None
- class poptorch.AsynchronousDataAccessor(dataset, buffer_size=3, miss_sleep_time_in_ms=0.1, load_indefinitely=True, early_preload=False, sharing_strategy=SharingStrategy.ForkServer, rebatched_size=None)
A data loader which launches the data loading process on a separate thread to allow for the data to be preprocessed asynchronously on the CPU to minimise CPU/IPU transfer time.
This works by loading the data into a ring buffer of shared memory. When the IPU needs another batch it uses the data already available in the ring buffer. The memory is shared, so it will be used in place and won't be freed until the next batch is requested. Behind the scenes, the worker thread will be filling the unready elements of the ring buffer.
Note
When using a torch.utils.data.Dataset with rebatched_size, the accessor will default to drop_last=True; to change that behaviour, wrap the dataset in a poptorch.DataLoader(..., drop_last=False).
- Parameters
dataset (Union[torch.utils.data.Dataset, DataLoader]) –
buffer_size (int) –
miss_sleep_time_in_ms (float) –
load_indefinitely (bool) –
early_preload (bool) –
sharing_strategy (poptorch.SharingStrategy) –
- __init__(dataset, buffer_size=3, miss_sleep_time_in_ms=0.1, load_indefinitely=True, early_preload=False, sharing_strategy=SharingStrategy.ForkServer, rebatched_size=None)
- Parameters
dataset (Union[torch.utils.data.Dataset, DataLoader]) – The dataset to pull data from, this can be any Python iterable.
buffer_size (int) – The size of the ring buffer.
miss_sleep_time_in_ms (float) – How long (in milliseconds) to sleep the worker when the buffer is full, before checking again.
load_indefinitely (bool) – If True, loop back to the start when the end of the dataset is reached.
early_preload (bool) – If True, start loading data in the ring buffer as soon as the worker is created. If False, wait for an iterator to be created before loading data.
sharing_strategy (poptorch.SharingStrategy) – Method to use to pass the dataset object when the child process is created.
SharedMemory is fast but might be quite limited in size.
FileSystem will serialise the dataset to file and reload it, which will be slower.
Fork forks new processes: no data sharing is required, but it might cause problems if worker processes use threading.
ForkServer is similar to Fork but uses a server process to fork child processes. It is safe to use even if worker processes use threading.
rebatched_size (Optional[int]) – If not None: return N batched tensors from the dataset per iteration. (The passed dataset must have a batch_size of 1).
Note
If dataset is an iterable-type poptorch.DataLoader configured with drop_last=False, then rebatched_size must be used.
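A short usage sketch, assuming `opts`, `dataset` and a compiled `poptorch_model` exist; iterating the accessor drives the worker that fills the ring buffer, and terminate() cleans up explicitly:
>>> loader = poptorch.DataLoader(opts, dataset, batch_size=4, num_workers=2)
>>> accessor = poptorch.AsynchronousDataAccessor(loader)
>>> for batch in accessor:
...     output = poptorch_model(batch)
>>> accessor.terminate()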
- terminate()
An override function to kill the worker process manually.
- Return type
None
- class poptorch.DataLoaderMode(value)
Sync: Access data synchronously.
Async: Uses an AsynchronousDataAccessor to access the dataset.
AsyncRebatched: For iterable datasets, by default PyTorch will round down the number of elements to a multiple of the combined batch size in each worker. When the number of workers is high and/or the batch size is large, this might lead to a significant part of the dataset being discarded. In this mode, the combined batch size used by the PyTorch workers will be set to 1, and the batched tensor will instead be constructed in the AsynchronousDataAccessor. This mode is identical to Async for map-style datasets.
11.8. Enumerations
- class poptorch.SharingStrategy(value)
Strategy to use to pass objects when creating new processes.
SharedMemory: Spawn new processes and share data using shared memory. Fast, but limited availability.
FileSystem: Spawn new processes and share data using the filesystem. Slower, but larger than memory.
Fork: Fork new processes. No data sharing is required, but this might cause problems if worker processes use threading.
ForkServer: Similar to Fork, but a server process is used to fork child processes instead. This server process is single-threaded, so there are no issues if worker processes use threading.
- class poptorch.OverlapMode(value)
NoOverlap: The host will copy the tensor to the IPU only when required: this minimises on-chip memory use at the cost of performance.
OverlapAccumulationLoop: The host will preload values for the next gradient accumulation iteration onto an IO tile.
OverlapDeviceIterationLoop: The host will preload values not just for the next gradient accumulation iteration, but for the next device iteration, onto an IO tile. This may require more IO tiles than the previous setting, but offers greater performance.
- class poptorch.MatMulSerializationMode(value)
Which dimension of the matrix multiplication to use for the serialization.
- class poptorch.SyncPattern(value)
Full: Require all IPUs to synchronise on every communication between IPUs, or between IPUs and host.
SinglePipeline: Allow IPUs to synchronise with the host independently, without having to synchronise with each other. This permits any one IPU to perform host IO while other IPUs are processing data.
ReplicaAndLadder: Allow an IPU group to communicate with the host without requiring synchronisation between groups. This permits multiple IPU groups to alternate between performing host IO and computation.
- class poptorch.ReductionType(value)
Sum: Calculate the sum of all values.
Mean: Calculate the mean of all values.
NoReduction: Do not reduce.
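A brief sketch of where this enumeration is typically consumed; it assumes the accumulationAndReplicationReductionType() setter (referenced in the SGD notes above) is exposed on opts.Training:
>>> opts = poptorch.Options()
>>> opts.Training.accumulationAndReplicationReductionType(
...     poptorch.ReductionType.Mean)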
- class poptorch.ConnectionType(value)
Always: Attach to the IPU from the start (default).
OnDemand: Wait until the compilation is complete and the executable is ready to be run before attaching to the IPU.
Never: Never try to attach to an IPU. (Useful for offline compilation, but trying to run an executable will raise an exception.)
- class poptorch.OutputMode(value)
All: Return a result for each batch.
Sum: Return the sum of all the batches.
Final: Return the last batch.
EveryN: Return every N batches. N is passed in as output_return_period.
Default: "All" for inference, "Final" for training.
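A hedged sketch, assuming Options exposes an outputMode() setter taking the mode and, for EveryN, the return period (an assumption here):
>>> opts = poptorch.Options()
>>> opts.outputMode(poptorch.OutputMode.EveryN, output_return_period=10)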
- class poptorch.MeanReductionStrategy(value)
Specify when to divide by a mean reduction factor when accumulationAndReplicationReductionType is set to ReductionType.Mean.
Running: Keeps the reduction buffer as the current mean. This is preferred for numerical stability, as the buffer value is never larger than the magnitude of the largest micro-batch gradient.
Post: Divides by the accumulationFactor and replicatedGraphCount after all of the gradients have been reduced. In some cases this can be faster than using Running, however it is prone to overflow.
PostAndLoss (deprecated): Divides by the replicatedGraphCount before the backwards pass, performs the gradient reduction across micro batches, and then divides by the accumulationFactor. This is to support legacy behaviour and is deprecated.
(deprecated): Divides by the replicatedGraphCount before the backwards pass, performs the gradient reduction across micro batches, and then divides by the accumulationFactor. This is to support legacy behaviour and is deprecated.