13. PopART Python API

This chapter describes the PopART Python API. Many classes are wrappers around the equivalent C++ class, for example popart.builder.Builder wraps the C++ Builder class (renamed BuilderCore in Python). There are more detailed descriptions of some functions in Section 14, PopART C++ API.

13.1. Sessions

13.1.1. Training session

class popart.TrainingSession(fnModel, dataFlow, loss, optimizer, deviceInfo, inputShapeInfo=<popart_core.InputShapeInfo object>, patterns=None, userOptions=<popart_core.SessionOptions object>, name='training')

Session for training.

TrainingSession is a runtime instance that provides an interface for executing ONNX graphs on IPU hardware with training provided by optimizing a loss tensor using an optimizer and automatic differentiation (backpropagation).

Parameters
Return type

None

property accumulationFactor

Get the gradient accumulation factor.

compileAndExport(filename)

Compile the graph and export it to a file.

This method will first create poplar::Graph and compile poplar::Executable. Next, it will export the executable and metadata to the file. The exported file will be in the PopEF format. This means that the file can be used to run inference using the Triton Inference Server with the Graphcore Triton backend. See the Poplar Triton Backend User Guide for more information.

This method raises a popart.OutOfMemoryException if an out of memory event occurs. In addition, it raises an OSError if there are any file system related errors.

Parameters

filename (str) – The name of the file where the compiled executable and metadata will be saved. If it does not exist, the file will be created.

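Raises

popart.OutOfMemoryException – If an out of memory event occurs.

OSError – If there are any file system related errors.

Return type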

None

property dataFlow

Get the configuration for the data feeds and fetches.

initAnchorArrays()

Create the anchor arrays to feed data back into Python.

Returns

Dictionary of anchor tensor names and their relevant NumPy arrays.

Return type

Dict[str, np.array]

prepareDevice(loadEngine=True)

Prepare the network for execution.

This will create poplar::Graph and poplar::Engine, and set up poplar::Streams.

Parameters

loadEngine (bool) – If true, load the engine and connect the streams once the device is ready.

Raises

popart.OutOfMemoryException – If an out of memory event occurs.

Return type

None
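The call order for a typical step can be sketched as follows. This is a minimal sketch under stated assumptions, not the definitive workflow: `run_steps` and `make_stepio` are hypothetical names, and the session is any object exposing the methods documented in this chapter. With a real session, `make_stepio` would typically be `popart.PyStepIO` (an assumption, since it is not described in this section). The `_RecordingSession` stub exists only so that the sketch runs without IPU hardware.

```python
def run_steps(session, make_stepio, inputs, steps=1):
    """Prepare the device, run `steps` steps and return the anchor arrays."""
    session.prepareDevice()                # creates the graph and engine
    anchors = session.initAnchorArrays()   # dict: anchor tensor name -> array
    session.weightsFromHost()              # copy the initial weights to the device
    stepio = make_stepio(inputs, anchors)  # assumption: for example popart.PyStepIO
    for _ in range(steps):
        session.run(stepio)
    return anchors


class _RecordingSession:
    """Hypothetical stand-in that records calls; not part of PopART."""

    def __init__(self):
        self.calls = []

    def prepareDevice(self, loadEngine=True):
        self.calls.append("prepareDevice")

    def initAnchorArrays(self):
        self.calls.append("initAnchorArrays")
        return {"loss": [0.0]}

    def weightsFromHost(self):
        self.calls.append("weightsFromHost")

    def run(self, stepio):
        self.calls.append("run")


demo = _RecordingSession()
demo_anchors = run_steps(demo, lambda i, a: (i, a), {"x": [1.0]}, steps=2)
print(demo.calls)
```

Keeping the step I/O factory as a parameter lets the same helper be exercised with a stub, as here, or with a real session.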

property replicationFactor

Get the replication factor.

class popart_core._TrainingSessionCore
broadcastWeights(self: popart_core._TrainingSessionCore, rootRank: int = 0) None

Broadcast the weights from the PopRun instance with index rootRank to all other instances.

Parameters

rootRank – The index of the PopRun instance from which the weights should be broadcast.

compileAndExport(self: popart_core._TrainingSessionCore, filename: str, err: popart_core.OutOfMemoryError) None
connectStreamToCallback(self: popart_core._TrainingSessionCore, arg0: str, arg1: Callable[[capsule], None], arg2: int) None

Connect a Poplar stream with a callback.

The callback will be called whenever the stream is about to be read or has been written to by the device. The memory location is only valid for reading or writing for the duration of the callback.

Parameters
  • streamHandle – The name of the stream to connect to.

  • callback – The callback to be called whenever the stream is to be read or was written to by the device.

  • index – The replica index to connect to, when using replicated graphs. Default=0.

getCycleCount(self: popart_core._TrainingSessionCore, id: str = '') int

Copy the cycle count tensor from the device to the host.

Parameters

id – The identifier of the cycle count tensor.

getInfo(self: popart_core._TrainingSessionCore, arg0: str) popart_internal_ir.TensorInfo

Get the tensor information for a tensor.

Parameters

TensorId – The identifier of the tensor to get the tensor information for.

Returns

The tensor information for the tensor.

getIr(self: popart_core._TrainingSessionCore) popart_internal_ir.Ir
getRNGState(self: popart_core._TrainingSessionCore) List[int]
getRandomSeed(self: popart_core._TrainingSessionCore) int

Get the value of the random number generator seed.

Calling setRandomSeed() with this value (at a later stage) reinstates the random state logic that seeds random operations.

Returns

The value used to seed current random operations.

getReport(self: popart_core._TrainingSessionCore) pva::Report

Retrieve the graph report from the poplar::Engine.

The options which were passed to the Session constructor will influence the information in the report.

This method may only be called after prepareDevice() has been called.

Returns

The PopVision Analysis report object.

getSerializedGraph(self: popart_core._TrainingSessionCore) bytes

Retrieve the serialized graph from the poplar::Engine.

A JSON format report is produced.

This method may only be called after prepareDevice() has been called.

Returns

A string containing the serialized graph.

getSummaryReport(self: popart_core._TrainingSessionCore, resetProfile: bool = True) str

Retrieve the summary from the poplar::Engine.

The options which were passed to the Session constructor will influence the information in the report.

This method may only be called after prepareDevice() has been called.

Parameters

resetProfile – If true, resets the execution profile. Default: true.

Returns

A string containing the report.

getTensorIds(self: popart_core._TrainingSessionCore) Set[str]

Returns the ids of all tensors in the model.

Precondition: prepareDevice() must have been called.

loadEngineAndConnectStreams(self: popart_core._TrainingSessionCore) None

Load the engine on the device and connect the streams.

This will set up the poplar::Streams.

Note: This call is optional. The engine will implicitly be loaded on the device when required.

loadExecutable(self: popart_core._TrainingSessionCore, filename: str) None

Load the compiled executable and metadata from a file.

The file must have been created with compileAndExport().

Parameters

filename – The name of the file to load the executable and metadata from.
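One way to combine compileAndExport() and loadExecutable() is to compile only when no exported file exists yet. The following is a minimal sketch under stated assumptions: `load_or_compile` is a hypothetical helper, the session is any object exposing the two documented methods, and `_StubSession` is a stand-in so that the sketch runs without IPU hardware. Whether further calls, such as prepareDevice(), are needed afterwards is outside what this sketch shows.

```python
import os
import tempfile


def load_or_compile(session, filename):
    """Reuse an exported executable if present, otherwise compile and export it."""
    if os.path.exists(filename):
        session.loadExecutable(filename)
        return "loaded"
    session.compileAndExport(filename)  # may raise OSError on file system errors
    return "compiled"


class _StubSession:
    """Hypothetical stand-in for a session; not part of PopART."""

    def compileAndExport(self, filename):
        with open(filename, "wb") as f:
            f.write(b"stub")

    def loadExecutable(self, filename):
        self.loaded = filename


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "model.popef")
    first = load_or_compile(_StubSession(), target)
    second = load_or_compile(_StubSession(), target)
print(first, second)
```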

modelToHost(self: popart_core._TrainingSessionCore, arg0: str) None

Write the current model to an ONNX file.

Parameters

fn – The path to the file. The path can be absolute or relative. If you plan to run your program in multiple processes simultaneously, you should avoid possible race conditions by writing to different files, for example by using temporary files.
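The advice about avoiding race conditions can be sketched with a uniquely named file per call. This is only an illustration: `save_model_unique` is a hypothetical helper, and `_StubSession` stands in for a real session so that the sketch runs without IPU hardware.

```python
import os
import tempfile


def save_model_unique(session, directory, prefix="model_"):
    """Write the model to a uniquely named ONNX file and return its path."""
    # mkstemp creates the file atomically, so two processes never share a name.
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=".onnx", dir=directory)
    os.close(fd)
    session.modelToHost(path)
    return path


class _StubSession:
    """Hypothetical stand-in for a session; not part of PopART."""

    def modelToHost(self, fn):
        with open(fn, "wb") as f:
            f.write(b"onnx-bytes")


with tempfile.TemporaryDirectory() as tmp:
    stub = _StubSession()
    path_a = save_model_unique(stub, tmp)
    path_b = save_model_unique(stub, tmp)
    both_exist = os.path.exists(path_a) and os.path.exists(path_b)
print(path_a != path_b, both_exist)
```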

prepareDevice(self: popart_core._TrainingSessionCore, loadEngine: bool = True, err: popart_core.OutOfMemoryError) None

Prepare the network for execution.

This will create the poplar::Graph and poplar::Engine.

Parameters

loadEngine – If true, load the engine and connect the streams once the device is ready.

readWeights(self: popart_core._TrainingSessionCore, arg0: popart_core.IWeightsIO) None

Read the weights from the host stream memory and write to the host.

This method may only be called after weightsToHost() has been called.

Parameters

weightsIo – The weight data that is read from the host stream memory is written to the addresses in weightsIo.out.

resetHostWeights(self: popart_core._TrainingSessionCore, modelProtoOrFilename: str, ignoreWeightsInModelWithoutCorrespondingHostWeight: bool = False) None

Reset weights with weights in an ONNX model.

Note that the only differences between the ONNX model and the current model must be the weights. No other differences are allowed.

This method only updates the weights on the host. weightsFromHost() must be called after this method to update the weights on the device.

Parameters
  • model – An ONNX model protobuf, or the name of a file containing an ONNX model protobuf.

  • ignoreWeightsInModelWithoutCorrespondingHostWeight – If true, do not throw an error if there are initializers in the ONNX model without corresponding initializer tensor(s) in the session’s IR.
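Because resetHostWeights() only updates the host copy, the two calls are usually paired. A minimal sketch, assuming a session object that exposes the documented methods; `reload_weights` and `_RecordingSession` are hypothetical names used for illustration.

```python
def reload_weights(session, model_or_filename, ignore_extra=False):
    """Reset the host weights from an ONNX model, then push them to the device."""
    session.resetHostWeights(model_or_filename, ignore_extra)
    session.weightsFromHost()  # without this call the device keeps the old weights


class _RecordingSession:
    """Hypothetical stand-in that records calls; not part of PopART."""

    def __init__(self):
        self.calls = []

    def resetHostWeights(self, model, ignore=False):
        self.calls.append(("resetHostWeights", model, ignore))

    def weightsFromHost(self):
        self.calls.append(("weightsFromHost",))


demo = _RecordingSession()
reload_weights(demo, "checkpoint.onnx")
print(demo.calls)
```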

run(*args, **kwargs)

Overloaded function.

  1. run(self: popart_core._TrainingSessionCore, stepio: popart_core.IStepIO, debugName: str = '') -> None

Run one step.

Read the input data from the addresses in stepIO.in.

Write the output data to the addresses in stepIO.out.

Parameters
  • stepIO – The input and output data.

  • debugName – A debug string to identify this run in logs.

  2. run(self: popart_core._TrainingSessionCore, programHandle: str, stepio: popart_core.IStepIO, debugName: str = '') -> None

Run one step.

Read the input data from the addresses in stepIO.in.

Write the output data to the addresses in stepIO.out.

Parameters
  • stepIO – The input and output data.

  • debugName – A debug string to identify this run in logs.

saveExecutable(self: popart_core._TrainingSessionCore, path: str, savePopartMetadata: bool = True, saveVariables: bool = True) None

Save a compiled graph with additional data to a file.

PopART is able to save its state after the model compilation is complete, so that it can be restored at a later time. To make this possible, it is necessary to save such elements as:

  • a serialised Poplar executable,

  • its associated metadata,

  • tensor data blobs if model parameters have not been frozen (refer to the SessionOptions::constantWeights for more information),

  • a PopART-specific opaque blob to store information only relevant to PopART. This is needed to restore PopART state.

The file will be in the PopEF format. This means that the file can be used to restore the state of the PopART program without recompiling the graph, or to run inference using the Triton Inference Server with the Graphcore Triton backend. See the Poplar Triton Backend User Guide for more information. To analyze the structure of the saved file, refer to the PopEF dump tool.

Precondition: prepareDevice() must have been called.

Parameters
  • path – The name of the file or directory where the compiled executable, metadata and variables will be saved. If you specify a path to a directory, the function will write the data to the file “<path>/executable.popef”. If the file exists, the function will overwrite the old data with the new data.

  • savePopartMetadata – If you do not need the option to restore the PopART state later, you can set the flag to false to reduce disk space taken up by the file.

  • saveVariables – If you do not need to save the state of the variables (tensors) in this file, for example because you want to save them later or in a different location, you can set the flag to false. Otherwise, the function will save data consistent with the variables contained within the model.
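The flags above allow the executable and the variables to be written in separate steps. A minimal sketch, assuming a directory path is passed so that the documented default file names apply; `save_executable_and_variables` and `_RecordingSession` are hypothetical names.

```python
import os


def save_executable_and_variables(session, out_dir):
    """Save the executable without variables, then the variables separately."""
    session.saveExecutable(out_dir, savePopartMetadata=True, saveVariables=False)
    session.saveVariables(out_dir)
    # File names follow the documented behaviour for directory paths.
    return (
        os.path.join(out_dir, "executable.popef"),
        os.path.join(out_dir, "variables.popef"),
    )


class _RecordingSession:
    """Hypothetical stand-in that records calls; not part of PopART."""

    def __init__(self):
        self.calls = []

    def saveExecutable(self, path, savePopartMetadata=True, saveVariables=True):
        self.calls.append(("saveExecutable", savePopartMetadata, saveVariables))

    def saveVariables(self, path):
        self.calls.append(("saveVariables",))


demo = _RecordingSession()
exe_path, var_path = save_executable_and_variables(demo, "out")
print(exe_path, var_path)
```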

saveVariables(self: popart_core._TrainingSessionCore, path: str) None

Save all variables to a file.

The function will save data consistent with the variables contained within the model.

The file will be in the PopEF format. To analyze the tensors saved by the function, refer to the PopEF dump tool.

Precondition: prepareDevice() must have been called.

Parameters

path – The name of the file or directory where the compiled variables will be saved. If you specify a path to a directory, the function will write the data to the file “<path>/variables.popef”. If the file exists, the function will overwrite the old data with the new data.

setRNGState(self: popart_core._TrainingSessionCore, rngValue: List[int]) None
setRandomSeed(self: popart_core._TrainingSessionCore, seedValue: int) None

Set the value of the random number generator seed.

This method explicitly seeds all random operations. Additionally, this method derives a new state for the random number generator (RNG) from the seed and sets it on the device. This RNG state is used to resolve stochastic rounding. Note that to deterministically store and restore the combined random state for a session, do the following:

C++:

```
// Store random state (session s0).
auto seed = s0.getRandomSeed();
auto rngState = s0.getRNGState();

// Restore random state (session s1).
s1.setRandomSeed(seed);   // <-- affects RNG state, order important
s1.setRNGState(rngState);
```

Python:

```
# Store random state (session s0).
seed = s0.getRandomSeed()
rngState = s0.getRNGState()

# Restore random state (session s1).
s1.setRandomSeed(seed)    # <-- affects RNG state, order important
s1.setRNGState(rngState)
```

Parameters

seedValue – The value of the seed.

updateEngineCache(self: popart_core._TrainingSessionCore) None

Update cacheEntries from the engine cache directory and update ir::hashMatched_ with the updated cacheEntries.

updateExternallySavedTensorLocations(self: popart_core._TrainingSessionCore, arg0: str, arg1: str) None

Update the tensor locations of tensors in the session’s ONNX model.

A new file will be created at this point, and written to when the ONNX model is saved with a subsequent call to modelToHost().

Parameters
  • fromLocation – All externally saved tensors with location fromLocation will have their location updated to toLocation.

  • toLocation – The updated tensor locations. This must not already exist.

updateOptimizerFromHost(self: popart_core._TrainingSessionCore, arg0: popart_core.Optimizer) None

Update the optimizer from the host.

This method updates the optimizer and the associated hyperparameters but not the optimizer state tensors.

NOTE: The optimizer parameter has to be compatible with the optimizer passed to the TrainingSession constructor. For example, you cannot call this function with an SGD1 optimizer if you created the session with an SGD0 optimizer. This is because it is not possible to change the IR after a session has been constructed.

Parameters

optimizer – A pointer to a popart::Optimizer.
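A common use of this method is a learning-rate schedule. The sketch below is illustrative only: `step_decay`, `apply_schedule` and `make_optimizer` are hypothetical names, and the decay rule is an arbitrary choice. With a real session, `make_optimizer` must build an optimizer compatible with the one passed to the constructor, for example `lambda lr: popart.SGD({"defaultLearningRate": (lr, False)})` (an assumption, since optimizer construction is not covered in this section).

```python
def step_decay(base_lr, epoch, drop=0.5, every=10):
    """Multiply the learning rate by `drop` once every `every` epochs."""
    return base_lr * drop ** (epoch // every)


def apply_schedule(session, make_optimizer, base_lr, epoch):
    """Build an optimizer for this epoch and send it to the session."""
    lr = step_decay(base_lr, epoch)
    # Only hyperparameters change; optimizer state tensors are left untouched.
    session.updateOptimizerFromHost(make_optimizer(lr))
    return lr


class _RecordingSession:
    """Hypothetical stand-in that records calls; not part of PopART."""

    def __init__(self):
        self.optimizers = []

    def updateOptimizerFromHost(self, optimizer):
        self.optimizers.append(optimizer)


demo = _RecordingSession()
rates = [apply_schedule(demo, lambda lr: {"lr": lr}, 0.1, e) for e in (0, 10, 20)]
print(rates)
```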

weightsFromHost(self: popart_core._TrainingSessionCore) None

Copy weights from the host to the device.

weightsToHost(self: popart_core._TrainingSessionCore) None

Copy the weights from the device to the host stream memory.

writeWeights(self: popart_core._TrainingSessionCore, arg0: popart_core.IWeightsIO) None

Write the weights from the host to the IR tensor memory.

This method may only be called after weightsFromHost() has been called.

Parameters

weightsIo – The weight data is written to the addresses in weightsIo.out.

13.1.2. Inference session

class popart.InferenceSession(fnModel, dataFlow, deviceInfo, inputShapeInfo=<popart_core.InputShapeInfo object>, patterns=None, userOptions=<popart_core.SessionOptions object>, name='inference')

Session for running inference.

InferenceSession is a runtime instance that provides an interface for executing ONNX graphs on IPU hardware, without any automatic differentiation (backpropagation).

Parameters
Return type

None

property accumulationFactor

Get the gradient accumulation factor.

compileAndExport(filename)

Compile the graph and export it to a file.

This method will first create poplar::Graph and compile poplar::Executable. Next, it will export the executable and metadata to the file. The exported file will be in the PopEF format. This means that the file can be used to run inference using the Triton Inference Server with the Graphcore Triton backend. See the Poplar Triton Backend User Guide for more information.

This method raises a popart.OutOfMemoryException if an out of memory event occurs. In addition, it raises an OSError if there are any file system related errors.

Parameters

filename (str) – The name of the file where the compiled executable and metadata will be saved. If it does not exist, the file will be created.

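Raises

popart.OutOfMemoryException – If an out of memory event occurs.

OSError – If there are any file system related errors.

Return type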

None

property dataFlow

Get the configuration for the data feeds and fetches.

classmethod fromIr(ir, deviceInfo, name='fromIr')

Create a session for inference from an IR.

Parameters
  • ir (Ir) – The IR to create the session from.

  • deviceInfo (DeviceInfo) – DeviceInfo object specifying the device type (IPU, IPUModel or CPU) and number of each type.

  • name (str) – The name of this inference session. Default: “fromIr”.

Returns

An inference session.

Return type

InferenceSession

initAnchorArrays()

Create the anchor arrays to feed data back into Python.

Returns

Dictionary of anchor tensor names and their relevant NumPy arrays.

Return type

Dict[str, np.array]

prepareDevice(loadEngine=True)

Prepare the network for execution.

This will create poplar::Graph and poplar::Engine, and set up poplar::Streams.

Parameters

loadEngine (bool) – If true, load the engine and connect the streams once the device is ready.

Raises

popart.OutOfMemoryException – If an out of memory event occurs.

Return type

None

property replicationFactor

Get the replication factor.

class popart_core._InferenceSessionCore
areHostWeightsInSync(self: popart_core._InferenceSessionCore) bool

Check whether all the host weights are in sync with the IPU.

checkInplacingAmbiguity(self: popart_core._InferenceSessionCore) None
compileAndExport(self: popart_core._InferenceSessionCore, filename: str, err: popart_core.OutOfMemoryError) None
copyDeviceWeightsToHost(self: popart_core._InferenceSessionCore) None

Copy data from the device to the host buffers, and then to the tensor.tensorData() buffers. A WeightsToHost program is not run if the weights are already in sync with the IPU. After WeightsToHost has run, the weights are marked as in sync with the IPU.

getAllTensorIds(self: popart_core._InferenceSessionCore) Set[str]

Returns the ids of all tensors in the model.

Precondition: prepareDevice() must have been called.

getCycleCount(self: popart_core._InferenceSessionCore, id: str = '') int

Copy the cycle count tensor from the device to the host.

Parameters

id – The identifier of the cycle count tensor.

getInfo(self: popart_core._InferenceSessionCore, arg0: str) popart_internal_ir.TensorInfo

Get the tensor information for a tensor.

Parameters

TensorId – The identifier of the tensor to get the tensor information for.

Returns

The tensor information for the tensor.

getIr(self: popart_core._InferenceSessionCore) popart_internal_ir.Ir
getRNGState(self: popart_core._InferenceSessionCore) List[int]
getRandomSeed(self: popart_core._InferenceSessionCore) int

Get the value of the random number generator seed.

Calling setRandomSeed() with this value (at a later stage) reinstates the random state logic that seeds random operations.

Returns

The value used to seed current random operations.

getReport(self: popart_core._InferenceSessionCore) pva::Report

Retrieve the graph report from the poplar::Engine.

The options which were passed to the Session constructor will influence the information in the report.

This method may only be called after prepareDevice() has been called.

Returns

The PopVision Analysis report object.

getSerializedGraph(self: popart_core._InferenceSessionCore) bytes
getSummaryReport(self: popart_core._InferenceSessionCore, resetProfile: bool = True) str

Retrieve the summary from the poplar::Engine.

The options which were passed to the Session constructor will influence the information in the report.

This method may only be called after prepareDevice() has been called.

Parameters

resetProfile – If true, resets the execution profile. Default: true.

Returns

A string containing the report.

loadEngineAndConnectStreams(self: popart_core._InferenceSessionCore) None

Load the engine on the device and connect the streams.

This will set up the poplar::Streams.

Note: This call is optional. The engine will implicitly be loaded on the device when required.

loadExecutable(self: popart_core._InferenceSessionCore, filename: str) None

Load the compiled executable and metadata from a file.

The file must have been created with compileAndExport().

Parameters

filename – The name of the file to load the executable and metadata from.

markHostWeightsInSync(self: popart_core._InferenceSessionCore) None

Mark the d2hWeightBuffers as in sync with the IPU.

markHostWeightsOutOfSync(self: popart_core._InferenceSessionCore) None

Mark the d2hWeightBuffers as out of sync with the IPU.

modelToHost(self: popart_core._InferenceSessionCore, arg0: str) None

Write the current model to an ONNX file.

Parameters

fn – The path to the file. The path can be absolute or relative. If you plan to run your program in multiple processes simultaneously, you should avoid possible race conditions by writing to different files, for example by using temporary files.

prepareDevice(self: popart_core._InferenceSessionCore, loadEngine: bool = True, err: popart_core.OutOfMemoryError) None

Prepare the network for execution.

This will create the poplar::Graph and poplar::Engine.

Parameters

loadEngine – If true, load the engine and connect the streams once the device is ready.

readWeights(self: popart_core._InferenceSessionCore, arg0: popart_core.IWeightsIO) None

Read the weights from the host stream memory and write to the host.

This method may only be called after weightsToHost() has been called.

Parameters

weightsIo – The weight data that is read from the host stream memory is written to the addresses in weightsIo.out.

resetHostWeights(self: popart_core._InferenceSessionCore, modelProtoOrFilename: str, ignoreWeightsInModelWithoutCorrespondingHostWeight: bool = False) None

Reset weights with weights in an ONNX model.

Note that the only differences between the ONNX model and the current model must be the weights. No other differences are allowed.

This method only updates the weights on the host. weightsFromHost() must be called after this method to update the weights on the device.

Parameters
  • model – An ONNX model protobuf, or the name of a file containing an ONNX model protobuf.

  • ignoreWeightsInModelWithoutCorrespondingHostWeight – If true, do not throw an error if there are initializers in the ONNX model without corresponding initializer tensor(s) in the session’s IR.

run(*args, **kwargs)

Overloaded function.

  1. run(self: popart_core._InferenceSessionCore, stepio: popart_core.IStepIO, debugName: str = '') -> None

Run one step.

Read the input data from the addresses in stepIO.in.

Write the output data to the addresses in stepIO.out.

Parameters
  • stepIO – The input and output data.

  • debugName – A debug string to identify this run in logs.

  2. run(self: popart_core._InferenceSessionCore, programHandle: str, stepio: popart_core.IStepIO, debugName: str = '') -> None

Run one step.

Read the input data from the addresses in stepIO.in.

Write the output data to the addresses in stepIO.out.

Parameters
  • stepIO – The input and output data.

  • debugName – A debug string to identify this run in logs.

saveExecutable(self: popart_core._InferenceSessionCore, path: str, savePopartMetadata: bool = True, saveVariables: bool = True) None

Save a compiled graph with additional data to a file.

PopART is able to save its state after the model compilation is complete, so that it can be restored at a later time. To make this possible, it is necessary to save such elements as:

  • a serialised Poplar executable,

  • its associated metadata,

  • tensor data blobs if model parameters have not been frozen (refer to the SessionOptions::constantWeights for more information),

  • a PopART-specific opaque blob to store information only relevant to PopART. This is needed to restore PopART state.

The file will be in the PopEF format. This means that the file can be used to restore the state of the PopART program without recompiling the graph, or to run inference using the Triton Inference Server with the Graphcore Triton backend. See the Poplar Triton Backend User Guide for more information. To analyze the structure of the saved file, refer to the PopEF dump tool.

Precondition: prepareDevice() must have been called.

Parameters
  • path – The name of the file or directory where the compiled executable, metadata and variables will be saved. If you specify a path to a directory, the function will write the data to the file “<path>/executable.popef”. If the file exists, the function will overwrite the old data with the new data.

  • savePopartMetadata – If you do not need the option to restore the PopART state later, you can set the flag to false to reduce disk space taken up by the file.

  • saveVariables – If you do not need to save the state of the variables (tensors) in this file, for example because you want to save them later or in a different location, you can set the flag to false. Otherwise, the function will save data consistent with the variables contained within the model.

saveVariables(self: popart_core._InferenceSessionCore, path: str) None

Save all variables to a file.

The function will save data consistent with the variables contained within the model.

The file will be in the PopEF format. To analyze the tensors saved by the function, refer to the PopEF dump tool.

Precondition: prepareDevice() must have been called.

Parameters

path – The name of the file or directory where the compiled variables will be saved. If you specify a path to a directory, the function will write the data to the file “<path>/variables.popef”. If the file exists, the function will overwrite the old data with the new data.

setEngineIsLoaded(self: popart_core._InferenceSessionCore, isLoaded: bool) None
setRNGState(self: popart_core._InferenceSessionCore, rngValue: List[int]) None
setRandomSeed(self: popart_core._InferenceSessionCore, seedValue: int) None

Set the value of the random number generator seed.

This method explicitly seeds all random operations. Additionally, this method derives a new state for the random number generator (RNG) from the seed and sets it on the device. This RNG state is used to resolve stochastic rounding. Note that to deterministically store and restore the combined random state for a session, do the following:

C++:

```
// Store random state (session s0).
auto seed = s0.getRandomSeed();
auto rngState = s0.getRNGState();

// Restore random state (session s1).
s1.setRandomSeed(seed);   // <-- affects RNG state, order important
s1.setRNGState(rngState);
```

Python:

```
# Store random state (session s0).
seed = s0.getRandomSeed()
rngState = s0.getRNGState()

# Restore random state (session s1).
s1.setRandomSeed(seed)    # <-- affects RNG state, order important
s1.setRNGState(rngState)
```

Parameters

seedValue – The value of the seed.

updateExternallySavedTensorLocations(self: popart_core._InferenceSessionCore, arg0: str, arg1: str) None

Update the tensor locations of tensors in the session’s ONNX model.

A new file will be created at this point, and written to when the ONNX model is saved with a subsequent call to modelToHost().

Parameters
  • fromLocation – All externally saved tensors with location fromLocation will have their location updated to toLocation.

  • toLocation – The updated tensor locations. This must not already exist.

weightsFromHost(self: popart_core._InferenceSessionCore) None

Copy weights from the host to the device.

weightsToHost(self: popart_core._InferenceSessionCore) None

Copy the weights from the device to the host stream memory.

writeWeights(self: popart_core._InferenceSessionCore, arg0: popart_core.IWeightsIO) None

Write the weights from the host to the IR tensor memory.

This method may only be called after weightsFromHost() has been called.

Parameters

weightsIo – The weight data is written to the addresses in weightsIo.out.

13.1.3. Session Options

class popart.SessionOptions
property accumulateOuterFragmentSettings

Configuration setting for operations in the accumulate outer fragment.

property accumulationAndReplicationReductionType

Specify how gradients are reduced when using gradient accumulation and graph replication. Default: ReductionType::Sum.

property accumulationFactor

Specify the number of micro-batches to accumulate before applying the varUpdate.

property accumulatorTensorLocationSettings

Tensor location for gradient accumulator tensors.

property activationTensorLocationSettings

Tensor location settings for activation/gradient tensors.

property aliasZeroCopy

Enable zero-copy for subgraphs.

property autoRecomputation

Enable recomputation of operations in the graph in the backward pass. This will reduce model size at the cost of computation cycles.

Default: RecomputationType::None (no recomputation).

property batchSerializationSettings

Configuration setting for batch serialization.

property cachePath

Folder to save the poplar::Executable to.

property compileEngine

Setting to only build the Poplar graph but not compile it.

If false, the backend will build the Poplar graph but not compile it into an Engine. In this case, no execution can be performed, and nothing can be transferred to the device. API calls which retrieve information from the graph building stage, such as tile mapping introspection, can still be used.

property constantWeights

Specify an optimization for an inference session to have constant weights.

Set this option to false in order to change the weights with a call to Session::resetHostWeights() after the session has been prepared. This option has no effect on a training session.

Default: true.

property createImplicitPipeliningFwdOnlyProgram

Deprecated: Create a custom program containing the forward pipeline only.

property customCodeletCompileFlags

Compile flags for the custom codelets. For example -g to generate debug info. See the Poplar documentation for poplar::Engine for more information.

property customCodelets

List of codelet files (with file extension) to be added to the Poplar graph. See the Poplar documentation for poplar::Graph for more information.

property decomposeGradSum

Enable replacement of single sums of partial gradients with a tree of additions. This can reduce max liveness at the cost of extra cycles. A typical use case for this would be if a large weight tensor is used as an input to many operations.

Default: false (not enabled).

property delayVarUpdates

Options to delay variable updates as much as possible.

property disableGradAccumulationTensorStreams

Disable saving of weight gradient tensors off the device.

If true, the weight gradient tensors are not saved off the device when devicex.weightsFromHost() is called.

Note: This option is overridden if syntheticDataMode is not SyntheticDataMode::Off.

Note: Weight gradient tensors that are also optimizer tensors will only be disabled if both disableGradAccumulationTensorStreams and disableOptimizerStateTensorStreams are true.

property disableOptimizerStateTensorStreams

Disable streaming of optimizer tensors.

If true, streaming of optimizer tensors is disabled. This setting can be used to conserve memory if you are not interested in checkpointing the optimizer state.

Note: Weight gradient tensors that are also optimizer tensors will only be disabled if both disableGradAccumulationTensorStreams and disableOptimizerStateTensorStreams are true.

property dotChecks

When to write .dot files during IR construction.

property dotOpNames

Enable inclusion of the op name in the .dot file (the op type is always exported). Enabled when true. Default: false.

property enableConstantFoldingOfMultipleConsumers

Specify whether to enable constant folding on ops whose inputs have multiple consumers. Default: true (enabled).

property enableDistributedReplicatedGraphs

Enable training with Poplar replicated graphs across multiple PopART instances.

Default: false (not enabled).

property enableEfficientOverlapIOTopoCons

Enable efficient topological constraints for overlapped IO.

Suppose there are N bins in each of the three stages (8 before the loop, 7 inside the loop and 6 after the loop) and L ops in each bin. The vanilla implementation of overlapped IO creates topological constraints with complexity O(N*N*L*L).

To make sure that InitOps in each step are scheduled before HostLoadOps, it is only necessary to keep the topological constraints within each bin and to ensure that the last op of each bin Bin0 is scheduled before the first op of the following bin Bin1. The total complexity is then reduced from O(N*N*L*L) to O(N*L).

Default: false (not enabled).
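To get a feel for the difference between the two growth rates, the following sketch evaluates N*N*L*L and N*L for the bin counts mentioned above. The function names are hypothetical, L = 10 is an arbitrary choice, and the results are growth-rate illustrations rather than exact constraint counts produced by PopART.

```python
def naive_growth(n_bins, ops_per_bin):
    """Growth term N*N*L*L of the vanilla implementation."""
    return (n_bins * ops_per_bin) ** 2


def efficient_growth(n_bins, ops_per_bin):
    """Growth term N*L when constraints are kept within and between adjacent bins."""
    return n_bins * ops_per_bin


ops_per_bin = 10  # illustrative value for L
for stage, n_bins in (("before loop", 8), ("inside loop", 7), ("after loop", 6)):
    print(stage, naive_growth(n_bins, ops_per_bin), efficient_growth(n_bins, ops_per_bin))
```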

property enableEngineCaching

Enable Poplar executable caching. The file is saved to the location defined with cachePath. The file will be in the PopEF format. This means that it can be used to run inference using the Triton Inference Server with the Graphcore Triton backend. See the Poplar Triton Backend User Guide for more information.

Default: false (not enabled).

enableExplicitIR(self: popart_core.SessionOptions, arg0: bool) None
property enableExplicitMainLoops

Enable explicit main loop transformation, and disable implicit training loops.

Note

This will be deprecated and enabled by default.

property enableFloatingPointChecks

Enable throwing of exceptions when floating point errors occur.

Default: false (not enabled).

property enableFullyConnectedPass

Enable the global fullyConnectedPass option for matmuls.

See also

poplin::matMul(poplar::Graph, poplar::Tensor, poplar::Tensor, poplar::program::Sequence, poplar::Type, poplar::DebugContext, poplar::OptionFlags, matmul::PlanningCache).

property enableGradientAccumulation

Enable gradient accumulation.

Default: false (not enabled).

property enableLoadAndOffloadRNGState

Enable load and offload of device RNG state from host.

Default: false (not enabled).

property enableMergeExchange

Enable merging of remote and host IO operations to facilitate IO overlap. Enabled when true.

Default: true.

property enableNonStableSoftmax

Enable the non-stable softmax Poplar function.

By default, the stable softmax Poplar function is used. The input tensor to softmax, \(x\), is preprocessed by subtracting \(max(x)\) from each element before computing the exponentials, ensuring numerical stability. If the inputs to the softmax operations are small enough to not cause overflow when computing the exponential, then the non-stable version can be enabled instead, to increase the speed.

Default: false (not enabled).
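The difference between the two variants can be illustrated in plain Python. This is a minimal sketch that is independent of PopART and Poplar; the function names stable_softmax and non_stable_softmax are hypothetical and are not part of either API.

```python
import math


def non_stable_softmax(x):
    # Exponentiate the raw inputs; math.exp overflows for large values.
    exps = [math.exp(v) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def stable_softmax(x):
    # Subtract max(x) from every element first, so the largest exponent is 0.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


small = [1.0, 2.0, 3.0]
large = [1000.0, 1001.0, 1002.0]

# For small inputs both variants agree.
agree = all(
    math.isclose(a, b)
    for a, b in zip(non_stable_softmax(small), stable_softmax(small))
)

# For large inputs only the stable variant succeeds.
try:
    non_stable_softmax(large)
    overflowed = False
except OverflowError:
    overflowed = True
```

The stable variant pays for one extra maximum and one extra subtraction per element, which is the cost that enableNonStableSoftmax is intended to avoid when the inputs are known to be small.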

property enableOutlining

Enable outlining. This identifies and extracts repeated parts of computational graph into subgraphs. Enabled when true. Default: true.

property enableOutliningCopyCostPruning

Enable inclusion of the cost of copying cached sections in the outlining cost model. Enabled when true. Default: true.

property enablePipelining

Enable pipelining of virtual graphs.

Default: false (not enabled).

property enableReplicatedGraphs

Enable replication of graphs.

Default: false (not enabled).

property enableStableNorm

If true, computes the mean first and subtracts the activations from it before computing the variance. The implementation with this flag set to true is slower than when set to false. The stable version requires the first order moment to be estimated and applied to the sample set before the second order central moment is calculated.
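The numerical benefit of computing the mean first can be sketched in plain Python. This is only an illustration: it assumes that the faster alternative is the one-pass formula E[x^2] - E[x]^2, which this section does not state, and the function names are hypothetical.

```python
def one_pass_variance(x):
    # Assumed non-stable form: mean of squares minus square of the mean.
    n = len(x)
    mean = sum(x) / n
    mean_sq = sum(v * v for v in x) / n
    return mean_sq - mean * mean


def two_pass_variance(x):
    # Stable form: estimate the mean first, centre the samples, then
    # compute the second order central moment.
    n = len(x)
    mean = sum(x) / n
    return sum((v - mean) ** 2 for v in x) / n


# Samples with a large mean and a small spread; the true variance is 2/3.
samples = [1e9 + 1, 1e9 + 2, 1e9 + 3]

stable_result = two_pass_variance(samples)
unstable_result = one_pass_variance(samples)
```

With these samples the two-pass result matches 2/3, while the one-pass result loses the answer to rounding because the squares are far larger than the spread between them.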

property enableStochasticRounding

Enable stochastic rounding.

PopART will set the Poplar engine option target.deterministicWorkers to true if this option is set and to false if it is not set. Adding a value for “target.deterministicWorkers” to SessionOptions::engineOptions overrides this behaviour.

Default: false (not enabled).

property enableSupportedDataTypeCasting

Enable casting to supported data types. If enabled (true), casts any tensor of unsupported data types to supported data types when lowering to Poplar. Currently, this implies casting:

  • INT64 -> INT32

  • UINT64 -> UINT32

The cast will throw an error for incompatible data types and over/underflows, and will warn about narrowing casts.

Default: true (enabled).
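The range check implied by a narrowing cast can be sketched as follows. This is plain Python rather than PopART code, and the helper cast_int64_to_int32 is a hypothetical name used only for illustration.

```python
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def cast_int64_to_int32(values):
    """Return the values unchanged if they all fit in INT32, else raise."""
    for v in values:
        if not INT32_MIN <= v <= INT32_MAX:
            raise OverflowError(f"{v} does not fit in INT32")
    return list(values)


# Values inside the INT32 range survive the narrowing cast.
in_range = cast_int64_to_int32([0, -5, 2 ** 31 - 1])

# A value just outside the range is rejected.
try:
    cast_int64_to_int32([2 ** 31])
    rejected = False
except OverflowError:
    rejected = True
```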

property enableVariablesCaching

Enable variable caching.

This means that the caching process will save variables as additional PopEF blobs to the file location defined with cachePath. If PopART requires data for variables while reading from the cache, the data will be read automatically from the cache file.

Default: true (enabled).

property ensureFp32LossScaleTensor

Ensure that the loss scale tensor is fp32 and that this is combined with fp16 activations as late as possible to produce the first fp16 activation gradients. This makes it possible to choose a loss scale value greater than max(fp16). This is also recommended when automatic loss scaling is enabled. Only compatible with models that have an fp16 loss scale tensor. true ensures that the loss scale tensor is fp32.

Default: false.

property executionPhaseSettings

Configuration settings for execution phases.

property explicitRecomputation

Enable explicit recomputation.

Default: false (not enabled).

property exportPoplarComputationGraph

Enable export of Poplar computational graph. Enabled when true. Default: false.

property exportPoplarVertexGraph

Enable export of Poplar vertex graph. Enabled when true. Default: false.

property finalDotOp

See firstDotOp.

property firstDotOp

The ops written to the .dot file will be a part of the schedule, controlled by firstDotOp and finalDotOp. In particular, it will be [max(0, firstDotOp), min(N ops in IR, finalDotOp)).
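The half-open interval above can be expressed directly in Python. This is a minimal sketch; dot_op_range and the example schedule are hypothetical names, not part of the PopART API.

```python
def dot_op_range(first_dot_op, final_dot_op, num_ops):
    # [max(0, firstDotOp), min(N ops in IR, finalDotOp))
    start = max(0, first_dot_op)
    stop = min(num_ops, final_dot_op)
    return range(start, stop)


# A stand-in schedule of six ops.
schedule = ["op%d" % i for i in range(6)]

# A negative start is clamped to 0 and the end index is exclusive.
exported = [schedule[i] for i in dot_op_range(-2, 4, len(schedule))]

# An end index beyond the schedule is clamped to the number of ops.
tail = [schedule[i] for i in dot_op_range(4, 100, len(schedule))]
```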

getGlobalReplicationFactor(self: popart_core.SessionOptions) int

Get the global replication factor.

Returns

  • If enableDistributedReplicatedGraphs is true, then return globalReplicationFactor.

  • If enableReplicatedGraphs is true, then return replicatedGraphCount.

  • Otherwise return 1.
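The selection logic can be modelled in plain Python as follows. This is a sketch of the documented behaviour only; ReplicationOptions is a hypothetical stand-in for the relevant SessionOptions fields and is not a PopART class.

```python
from dataclasses import dataclass


@dataclass
class ReplicationOptions:
    # Hypothetical stand-in for the relevant SessionOptions fields.
    enableDistributedReplicatedGraphs: bool = False
    enableReplicatedGraphs: bool = False
    globalReplicationFactor: int = 1
    replicatedGraphCount: int = 1


def global_replication_factor(opts):
    # Distributed replication takes precedence over local replication.
    if opts.enableDistributedReplicatedGraphs:
        return opts.globalReplicationFactor
    if opts.enableReplicatedGraphs:
        return opts.replicatedGraphCount
    return 1


distributed = ReplicationOptions(
    enableDistributedReplicatedGraphs=True,
    enableReplicatedGraphs=True,
    globalReplicationFactor=8,
    replicatedGraphCount=2,
)
local_only = ReplicationOptions(enableReplicatedGraphs=True, replicatedGraphCount=2)
no_replication = ReplicationOptions()
```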

property globalReplicaOffset

The first replica index that this PopART instance is running.

property globalReplicationFactor

The total number of replicas in a multi-instance, replicated-graph training session (this should be left as the default value (1) if distributed replicated graphs are disabled). This value includes local replication.

property groupHostSync

Specify whether to group the streams from the host to the device at the beginning of the schedule, and the streams from the device to the host at the end of the schedule. This trades off memory usage for speed.

When true, tensors will stay live for longer.

Note

This setting has no effect when useHostCopyOps is enabled (true).

Default: false (not enabled).

property instrumentWithHardwareCycleCounter

Add instrumentation to the program to count the number of device cycles (of a single tile, on a single IPU) that the main program takes to execute. Expect this to have a small detrimental impact on performance.

property kahnTieBreaker

Specify which method is used to control how ops are scheduled.

The initial scheduling is done with Kahn’s algorithm. When several ops are free to be scheduled, this controls which method is used.

Options are described in the [Poprithms KahnTieBreaker enum](https://github.com/graphcore/poprithms/blob/sdk-release-2.4/poprithms/poprithms/include/poprithms/schedule/shift/kahndecider.hpp).

property logDir

A directory for log traces to be written into.

property meanAccumulationAndReplicationReductionStrategy

Specify when to divide by a mean reduction factor when accumulationAndReplicationReductionType is set to ReductionType::Mean.

Default: MeanReductionStrategy::Post.

property mergeVarUpdate

Enable merging of VarUpdates into groups of VarUpdates, by flattening and concatenating variable tensors and updating tensors.

Default: MergeVarUpdateType::None (no merging).

property mergeVarUpdateMemThreshold

Specify the memory threshold for VarUpdateOp merging algorithms.

The MergeVarUpdateType::AutoLoose and MergeVarUpdateType::AutoTight VarUpdateOp merging algorithms have a threshold on the total memory of variable tensors to merge for updating. Defined as total memory in bytes.

Default: 1000000.

property numIOTiles

Number of IPU tiles dedicated to IO.

property optimizerStateTensorLocationSettings

Tensor location for optimizer state tensors.

property opxAliasChecking

Enable running Opx checks to verify that IR tensor aliasing information corresponds to the lowered Poplar tensor aliasing.

Default: false (not enabled).

property opxModifyChecking

Enable running Opx checks to verify that IR tensor modification information corresponds to the lowered Poplar tensor modifications.

Default: false (not enabled).

property outlineSequenceBreakCost

Specify the penalty applied to outlining potential sub-graphs if the sub-graph to be created breaks up a sequence of operations that are more efficient (for example for overlapping compute and exchange) when outlined together.

Default: 10000.0f.

property outlineThreshold

Specify the incremental value that a sub-graph requires, relative to its nested sub-graphs (if any), to be eligible for outlining.

A high threshold results in fewer sub-graphs being outlined, a negative value results in all being outlined. The gross value of a sub-graph is the sum of its constituent ops’ Op::getSubgraphValue() values. To disable outlining, it is better to set enableOutlining to false than to set this value to infinity. The default value of 1.0f results in all high value operations such as convolution being cached, but standalone low value operations such as ReLU will not be.

Default: 1.0f.

property partialsTypeMatMuls

Set the partials type globally for matmuls. Can be overridden individually with Builder.setPartialsType(). Valid values are "float" and "half". By default, this is not set, so no global partials type is imposed.

property rearrangeAnchorsOnHost

Enable rearrangement (in memory) of anchor tensors to be done on the host.

Before anchor tensors are streamed from device to host, they are not necessarily arranged in memory as required when they are to be copied from host stream to host. This can be done on the device or on the host.

Default: true (Rearrangement done on host to save memory, but often at the expense of cycles, especially for larger anchor tensors.).

property rearrangeStreamsOnHost

Enable rearrangement (in memory) of stream tensors to be done on the host. Before stream tensors are streamed from host to device, they are not necessarily arranged in memory as required when they are to be copied from host stream to device. This can be done on the device or on the host.

Default: false (Rearrangement done on device).

property replicatedGraphCount

Specify the number of model replications. If enableReplicatedGraphs is true, replicatedGraphCount will set the number of model replications. For example, if the model uses 1 IPU, a replicatedGraphCount of 2 will use 2 IPUs. If the model is pipelined across 4 IPUs, a replicatedGraphCount of 4 will use 16 IPUs in total. Therefore, the number of IPUs requested must be a multiple of replicatedGraphCount. If the training is done across multiple instances of the program then the replicatedGraphCount is the number of replicas for this instance.
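The IPU arithmetic in the examples above can be checked with a few lines of Python. This is a sketch only; total_ipus and is_valid_request are hypothetical helper names.

```python
def total_ipus(ipus_per_replica, replicated_graph_count):
    # Each replica needs its own copy of the IPUs used by the model.
    return ipus_per_replica * replicated_graph_count


def is_valid_request(requested_ipus, replicated_graph_count):
    # The number of IPUs requested must be a multiple of replicatedGraphCount.
    return requested_ipus % replicated_graph_count == 0


single_ipu_model = total_ipus(1, 2)
pipelined_model = total_ipus(4, 4)
```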

property separateCallOpPdfs

Enable creation of separate PDFs for each subgraph when generating PDFs of IR graphs. Enabled when true. Default: true.

property serializedPoprithmsAnnealGraphsDir

The directory to serialize Poprithms graphs to.

PopART uses Poprithms for scheduling PopART graphs. The Poprithms graphs created for scheduling can be optionally serialised (written to file). If serializedPoprithmsShiftGraphsDir is empty, then the graphs will not be serialised. The names of serialization files will be poprithms_shift_graph_i.json for the lowest non-existing values of i. The directory must already exist, PopART will not create it.

property serializedPoprithmsShiftGraphsDir

The directory to serialize Poprithms graphs to.

PopART uses Poprithms for scheduling PopART graphs. The Poprithms graphs created for scheduling can be optionally serialised (written to file). If serializedPoprithmsShiftGraphsDir is empty, then the graphs will not be serialised. The names of serialization files will be poprithms_shift_graph_i.json for the lowest non-existing values of i. The directory must already exist, PopART will not create it.
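The file naming rule can be sketched in plain Python. This is an illustration rather than the PopART implementation: next_shift_graph_path is a hypothetical helper, and starting the index i at 0 is an assumption.

```python
import os
import tempfile


def next_shift_graph_path(directory):
    # The directory must already exist; it is not created here.
    if not os.path.isdir(directory):
        raise FileNotFoundError(f"Directory does not exist: {directory}")
    i = 0  # Assumption: numbering starts at 0.
    while True:
        candidate = os.path.join(directory, f"poprithms_shift_graph_{i}.json")
        if not os.path.exists(candidate):
            return candidate
        i += 1


# Write three placeholder files and record the names that were chosen.
with tempfile.TemporaryDirectory() as tmp:
    names = []
    for _ in range(3):
        path = next_shift_graph_path(tmp)
        with open(path, "w") as f:
            f.write("{}")
        names.append(os.path.basename(path))
```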

property stashAllTensorsInferencePipeline

Specify whether to stash all needed tensors when pipelining inference. Default: false (disabled).

property strictOpVersions

Enable strict op version checks.

Strict op version checks will throw an error if the exact version of an op required for the model opset is not supported. Turning this check off will cause PopART to fall back to the latest implementation of the op that is supported.

Warning

Turning off these checks may cause undefined behaviour.

Default: true (enabled).

property subgraphCopyingStrategy

Specify how copies for inputs and outputs for subgraphs are lowered.

Setting this value to SubgraphCopyingStrategy::JustInTime may save memory at the cost of fragmenting subgraphs into multiple Poplar functions. This may be particularly useful when a number of weight updates are outlined in one subgraph, as it may prevent multiple weight tensors from being live at the same time inside the subgraph.

Default: SubgraphCopyingStrategy::OnEnterAndExit.

property swapLimitScheduler

The maximum number of improving steps allowed by the scheduling algorithm before a solution must be returned.

property syntheticDataMode

Specify whether to use real or synthetic data to initialize input tensors. Streaming to/from the host is only enabled for SyntheticDataMode::Off which indicates that real data is being used.

Default: SyntheticDataMode::Off.

property throwIfLog2ScaleTensorNotInRange

Specify whether to throw a Poplar error at runtime if any fused ops that consume a log2 scale tensor receive a log2 scale value that is not in the 6-bit signed integer range [-32, 31). Setting this option to false will not throw an error, but may lead to undefined behaviour if the value of the log2 scale tensor is outside the range. Default: true (enabled).

property timeLimitScheduler

The maximum allowed time (in seconds) that can be spent searching for a good graph schedule before a solution must be returned.

property updatableNamedBuffers

List of named model buffers that can be updated with a call to buffersFromHost(). This makes it possible to update just a subset of the model weights, instead of all of them as happens with a weightsFromHost() call.

property useHostCopyOps

Enable use of IR graph operations for data and anchor streams.

Default: false (not enabled).

property useLoopCandidateCreator

Specify whether to use the loop candidate to create the tensor. For a constant consumed by a LoopOp and a non-LoopOp, using the LoopOp as the creator may improve performance. Default: false (disabled).

property virtualGraphMode

Specify how to place ops on virtual graphs to achieve model parallelism, either manually using model annotations, or automatically.

Default: VirtualGraphMode::Off.

property virtualGraphSplitRatios

Specify split ratios when VirtualGraphMode::Auto is enabled.

These values represent split ratios in each device and each of the values is in range (0, 1).

For example, to uniformly split the whole graph on 4 IPUs, the value should be [0.25, 0.25, 0.25, 0.25].
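A uniform split for any number of IPUs can be generated as shown below. This is a minimal sketch; uniform_split_ratios and ratios_in_range are hypothetical helpers, and the only constraint checked is the documented one that each value lies in the open interval (0, 1).

```python
def uniform_split_ratios(num_ipus):
    # With a single IPU the ratio would be 1.0, which is outside (0, 1).
    if num_ipus < 2:
        raise ValueError("a split needs at least 2 IPUs")
    return [1.0 / num_ipus] * num_ipus


def ratios_in_range(ratios):
    # Each ratio must lie strictly between 0 and 1.
    return all(0.0 < r < 1.0 for r in ratios)


four_way = uniform_split_ratios(4)
```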

property weightTensorLocationSettings

Tensor location for weight tensors.

class popart.AccumulateOuterFragmentSchedule

Enum type that determines how the operations in the accumulate outer fragment will be scheduled across virtual graphs (only relevant to pipelined modes).

Members:

Scheduler : Don’t add additional constraints and let the scheduler work it out.

Serial : Add constraints that ensure ops are executed in virtual graph ID order.

OverlapCycleOptimized : Try and parallelise ops with different virtual graph IDs as much as possible.

OverlapMemoryOptimized : Try and parallelise ops with different virtual graph IDs but avoid certain steps that are costly in terms of memory usage.

property name
class popart.AccumulateOuterFragmentSettings
property excludedVirtualGraphs

Indicate which virtual graph IDs to explicitly exclude from parallelisation.

Note

This setting is experimental and may change.

property schedule

Indicate how to schedule the accumulate outer fragment.

Note

This setting is experimental and may change.

class popart.AutodiffSettings
class popart.AutodiffStitchStrategy

Members:

RecomputeMinimal

RecomputeAllNonInputs

AddFwdOutputs

SafeAddFwdOutputs

property name
class popart.AutomaticLossScalingSettings
class popart.BatchSerializationBatchSchedule

Enum type that describes how to change the batch serialisation subgraph schedule before outlining.

Note

This setting is experimental and may change.

Members:

Scheduler : Don’t encourage any particular scheduling for ops within batch subgraphs (leave it to the scheduler) but tell the scheduler to schedule subgraphs in sequence.

Isomorphic : Encourage all ops within batch subgraphs to be scheduled identically and for each subgraph to be scheduled in sequence (good for outlineability).

OverlapOnIo : Attempt to put the remote load op for batch N+1 right after the compute phase of batch N.

OverlapOnCompute : Attempt to put the remote load op for batch N+1 right before the compute phase of batch N.

property name
class popart.BatchSerializationMethod

Enum type that describes how to apply the batch serialization.

Note

This setting is experimental and may change.

Members:

UnrollDynamic : Unroll the batch with dynamic slicing.

UnrollStatic : Unroll the batch with static slicing.

Loop : Loop over the batch dimension.

property name
class popart.BatchSerializationSettings

A structure containing batch serialization settings.

property batchSchedule

Experimental value that changes how operations are scheduled.

property concatOnExecutionPhaseChange

Break batch serialization chains when the execution phase changes (by concatenating the compute batches to the local batch).

property concatOnPipelineStageChange

Break batch serialization chains when the pipeline stage changes (by concatenating the compute batches to the local batch).

property concatOnVirtualGraphChange

Break batch serialization chains when the virtual graph changes (by concatenating the compute batches to the local batch).

property factor

The number of compute batches to split operations into.

property method

Experimental value to control how batch serialization is applied.

property transformContext

Experimental value to control when batch serialization is applied.

class popart.BatchSerializationTransformContext

Enum type that describes when to apply batch serialization.

Note

This setting is experimental and may change.

Members:

Forward : Apply batch serialization before growing the backward pass.

Backward : Apply batch serialization after growing the backward pass.

Fwd : Apply batch serialization before growing the backward pass.

Bwd : Apply batch serialization after growing the backward pass.

property name
class popart.CommGroup

Class to specify sub-groups of replicas.

Examples of derived sub-groups:

  • IPU-link domain sub-rack:

    where N is a power of two and replicaGroupSize > 1.

  • Complete IPU-link domain / full rack:

  • Using GW-links only:

property replicaGroupSize

Replica group size.

toReplicaGrouping(self: popart_internal_ir.CommGroup, numReplicas: int) popart_internal_ir.ReplicaGrouping
property type

Replica group type.

class popart.CommGroupType

PopART equivalent of GCL CommGroupType. Each of these enumeration constants has a corresponding GCL CommGroupType value.

Members:

All : All replicas viewed as one group, replica group size is ignored.

Consecutive : Groups are consecutive in replica.

If there are N replicas denoted {0, … N-1} and group size is k, then there are N/k groups of size k:

{0, 1, … k-1}, {k, … 2k-1} … {N-k, … N-1}

Orthogonal : Groups are sliced orthogonal to the replica ordering.

If there are N replicas denoted {0, … N-1} and group size is k, then there are m = N/k groups of size k:

{0, m, 2m, …}, {1, m+1, 2m+1, …} … {m-1, 2m-1, … N-1}

Ungrouped : Each replica is in its own group, replica group size is ignored.
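The Consecutive and Orthogonal groupings described above can be enumerated with a short Python sketch. The helper names are hypothetical and the code does not call GCL or PopART; it assumes that the number of replicas is a multiple of the group size.

```python
def consecutive_groups(num_replicas, group_size):
    # N/k groups, each holding k adjacent replica indices.
    if num_replicas % group_size != 0:
        raise ValueError("num_replicas must be a multiple of group_size")
    return [
        list(range(start, start + group_size))
        for start in range(0, num_replicas, group_size)
    ]


def orthogonal_groups(num_replicas, group_size):
    # m = N/k groups, each holding k replica indices with stride m.
    if num_replicas % group_size != 0:
        raise ValueError("num_replicas must be a multiple of group_size")
    m = num_replicas // group_size
    return [list(range(first, num_replicas, m)) for first in range(m)]


consecutive = consecutive_groups(8, 2)
orthogonal = orthogonal_groups(8, 2)
```

For N = 8 and k = 2, the consecutive groups are {0, 1}, {2, 3}, {4, 5}, {6, 7} and the orthogonal groups are {0, 4}, {1, 5}, {2, 6}, {3, 7}.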

property name
class popart.ExecutionPhaseIOSchedule

Enum type to specify when to load tensors.

Members:

Preload : Preload tensors in previous phase for use in current phase.

OnDemand : Load tensors just before they are required.

property name
class popart.ExecutionPhaseSchedule

Enum type to specify the order of processing optimizer operations for different weights of the same execution phase.

The steps for phased execution are:

  1. Copy to IO tiles if necessary.

  2. Run collective operations if necessary.

  3. Load optimizer state.

  4. Update optimizer state.

  5. Apply optimizer.

  6. Store updated tensor if necessary.

Members:

Interleaving : Process above steps for one weight at a time (for example: 123456, 123456, 123456). The scheduler may interleave these steps.

Batch : Process above steps for all weights together, in a way that maximises overlap potential between compute and exchange (for example: 333, 111, 222, 444, 555, 666).

BatchClusteredIO : Process above steps for all weights together, in a way that maximises overlap potential between compute and exchange, and maximise stream copy merges by keeping RemoteLoad/RemoteStore operations clustered (for example: 333, 111, 222, 444, 555, 666).
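The digit notation used in the examples above (one digit per step, one group per weight or per step) can be reproduced with a small Python sketch. The helper names are hypothetical, and the batch step order is taken from the documented example rather than from the scheduler itself.

```python
STEPS = (1, 2, 3, 4, 5, 6)
# Step order copied from the documented Batch example.
BATCH_STEP_ORDER = (3, 1, 2, 4, 5, 6)


def interleaving(weights):
    # All six steps for one weight, then all six for the next weight.
    return [(w, s) for w in weights for s in STEPS]


def batch(weights):
    # One step for every weight, then the next step for every weight.
    return [(w, s) for s in BATCH_STEP_ORDER for w in weights]


def as_text(order, group_size):
    # Render the step numbers in groups, matching the notation in the text.
    digits = "".join(str(step) for _, step in order)
    return ", ".join(
        digits[i:i + group_size] for i in range(0, len(digits), group_size)
    )


weights = ["w0", "w1", "w2"]
interleaving_text = as_text(interleaving(weights), len(STEPS))
batch_text = as_text(batch(weights), len(weights))
```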

property name
class popart.ExecutionPhaseSettings
property activationIOSchedule

The execution phase IO schedule for activation and gradient tensors.

property phases

Number of ExecutionPhases for the whole model.

property stages

Number of overlapping stages:

  • 1: Parallel streaming memory, default for 1 IPU per replica.

  • 2: PingPong between 2 IPUs, default for 2 or more IPUs per replica.

property weightIOSchedule

The execution phase IO schedule for weight tensors.

class popart.GradientTensorTrackingMethod

Members:

ConvAndMatmulGradients

AllNonViewChangingGradientTensors

GradientsOfUserSpecifiedTensors

property name
class popart.Instrumentation

Members:

Outer : Outer loop instrumentation, graph over all IPUs.

Inner : Inner loop instrumentation, graph per IPU.

property name
class popart.IrSerializationFormat

Members:

JSON : JavaScript Object Notation (JSON).

property name
class popart.MeanReductionStrategy

Enum type that specifies when to divide by a mean reduction factor, when doing mean reduction over a sequence of tensors \(t_1, t_2, ..., t_k\).

Members:

Running : Keep the reduction buffer as the mean of the tensors accumulated so far. If \(t_1, ..., t_f\) has just been processed, the current accumulator \(s\) is the mean of these values, and the next accumulator update is \(s = \frac{f}{f+1} * s + \frac{1}{f+1} * t_{f+1}\) to keep \(s\) a running mean. This strategy guarantees \(s \le \max(t_1, ..., t_k)\) throughout the accumulation, therefore it will not overflow, but it is generally slower than MeanReductionStrategy::Post.

Post : Keep the accumulation factor as the running sum, and divide once by \(k\) at the end of the accumulation. This strategy will generally be faster than MeanReductionStrategy::Running, but is prone to overflow (especially when using fp16).
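The two strategies can be compared with a plain Python sketch that uses scalars as stand-ins for tensors. The function names are hypothetical; each function also records the largest magnitude held by its accumulator, to show why Post is more prone to overflow.

```python
def running_mean(tensors):
    s = 0.0
    peak = 0.0
    for f, t in enumerate(tensors):
        # s = f/(f+1) * s + 1/(f+1) * t_{f+1}
        s = (f / (f + 1)) * s + (1 / (f + 1)) * t
        peak = max(peak, abs(s))
    return s, peak


def post_mean(tensors):
    total = 0.0
    peak = 0.0
    for t in tensors:
        # Keep a running sum and divide once at the end.
        total += t
        peak = max(peak, abs(total))
    return total / len(tensors), peak


values = [2.0, 4.0, 6.0, 8.0]
running_result, running_peak = running_mean(values)
post_result, post_peak = post_mean(values)
```

Both strategies produce the same mean, but the Running accumulator never exceeds the largest input, whereas the Post accumulator grows to the full sum before the final division.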

property name
class popart.MergeVarUpdateType

Enum type used to specify which VarUpdateOp ops to merge.

Members:

Off : Do not merge VarUpdateOp ops.

All : Merge all VarUpdateOp ops into as few groups as possible. This is a good choice when memory is not a constraint.

AutoTight : Merge into groups, so that VarUpdateOp ops process tensors of exactly SessionOptions::mergeVarUpdateMemThreshold in size.

AutoLoose : Merge into groups while attempting not to increase maximum variable liveness, and also not slice tensor variables so they will need to be processed by different VarUpdateOp ops.

property name
class popart.PyWeightsIO
class popart.RecomputationType

Enum type to specify which ops to recompute in the backward pass when doing auto-recomputation.

Members:

NoRecompute : No ops are recomputed (Default).

Standard : Recompute using algorithm that picks checkpoints to try and minimise max liveness.

NormOnly : Only Norm ops (+ non-linearities, if following) are recomputed.

RecomputeAll : Recompute all ops.

Pipeline : Recompute all forward pipeline stages.

property name
class popart.ReductionType

Members:

Mean : Take the mean of the input values.

NoReduction : Do not reduce the input values. Keep them stacked into a single tensor. So values \(t_1, ..., t_k\) get collected into a tensor \([t_1, ..., t_k]\).

Sum : Sum the input values and do not scale the output (Default).

property name
class popart.ReplicatedTensorSharding

Enum type to specify whether to shard tensors over replicas.

Members:

Off : Don’t shard tensors over replicas.

On : Do shard tensors over replicas.

property name
class popart.SubgraphCopyingStrategy

Members:

OnEnterAndExit : Copy all inputs before the start of the subgraph, copy all outputs after all ops in the subgraph. With this strategy, subgraphs will always map to a single Poplar function.

JustInTime : Copy inputs just before they are consumed and copy outputs as soon as they are produced. With this strategy, subgraphs may be lowered into multiple Poplar functions.

property name
class popart.SyntheticDataMode

Members:

Off : Use real data.

Zeros : Input tensors are initialised to all zeros.

RandomNormal : Input tensors are initialised with a random normal distribution ~N(0,1).

RandomUniform : Input tensors are initialised with a uniform distribution.

property name
class popart.TensorLocationSettings
property location

The default tensor location for this tensor type.

property minElementsForOffChip

The minimum number of elements below which offloading won’t be considered.

property minElementsForReplicatedTensorSharding

A minimum number of elements below which replicated tensor sharding won’t be considered.

class popart.TileSet

Enum type to specify a set of tiles.

Members:

Compute : The set of tiles designated for compute operations.

IO : The set of tiles designated for IO operations.

property name
class popart.VariableRetrievalMode

Members:

OnePerGroup : Returns one variable per group (defined by the VariableSettings::sharedVariableDomain CommGroup). Automatically returns the first replica of each group, where first means the one with the lowest replica ID.

AllReduceReplicas : As OnePerGroup, but performs an AllReduce among the replicas in the same group according to VariableSettings::sharedVariableDomain. Currently unsupported.

AllReplicas : Returns all replica weights.

property name
class popart.VariableSettings
getCommGroupType(self: popart_core.VariableSettings) popart_internal_ir.CommGroupType
getGroupCount(self: popart_core.VariableSettings, arg0: int) int
getGroupRepresentative(self: popart_core.VariableSettings, group: int) int
getGroupSize(self: popart_core.VariableSettings) int
getRealGroupSize(self: popart_core.VariableSettings, arg0: int) int
getReplicaGrouping(self: popart_core.VariableSettings, arg0: int) popart_internal_ir.ReplicaGrouping
getRetrievalMode(self: popart_core.VariableSettings) popart_core.VariableRetrievalMode
getSharedVariableDomain(self: popart_core.VariableSettings) popart_internal_ir.CommGroup
getStride(self: popart_core.VariableSettings) int
groups(self: popart_core.VariableSettings, arg0: int) List[List[int]]
isUsingCommGroup(self: popart_core.VariableSettings) bool
numReplicasReturningVariable(self: popart_core.VariableSettings, arg0: int) int
shapeOnHost(self: popart_core.VariableSettings, arg0: List[int], arg1: int) List[int]
shapeOnReplica(self: popart_core.VariableSettings, arg0: List[int], arg1: int, arg2: str) List[int]
verify(self: popart_core.VariableSettings) None
class popart.VirtualGraphMode

Enum type used to specify a virtual graph mode.

Members:

Off : Virtual graphs are not enabled.

Manual : User must set the popart::Op::virtualGraph attribute on all ops.

Auto : Use the AutoVirtualGraph transform.

ExecutionPhases : Virtual graphs are tied to execution phases.

property name

13.2. Data input and output

Note

The base class for data input and output in PopART is popart::IStepIO. The way in which this class is used is detailed in Section 14.2, Data input and output (IStepIO).

class popart.PyStepIO

This class is an implementation of the IStepIO interface backed by user-provided dictionaries for both input and output. These dictionaries map TensorId values to numpy arrays for PopART to read from and write to, respectively.

__init__(self: popart_core.PyStepIO, inputs: Dict[str, numpy.ndarray], outputs: Dict[str, numpy.ndarray]) None

Construct a new PyStepIO instance.

Parameters
  • inputs – A dictionary with an entry for every input tensor, comprising a TensorId for the key and a numpy array for a value for PopART to read from. The numpy arrays are assumed to be size-compatible with a tensor of shape [replicationFactor, accumulationFactor, batchesPerStep, <tensor shape>].

  • outputs – A dictionary with an entry for every output tensor, comprising a TensorId for the key and a numpy array value to which PopART will write the associated data. The expected shape of this numpy array is explained in the C++ API documentation for popart::AnchorReturnTypeId. The convenience method Session.initAnchorArrays() is typically used to create a dictionary with suitable arrays.
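The expected size of an input array can be worked out from the shape described above. This is a minimal sketch in plain Python; step_input_shape is a hypothetical helper and the example dimensions are arbitrary.

```python
import math


def step_input_shape(tensor_shape, replication_factor=1,
                     accumulation_factor=1, batches_per_step=1):
    # [replicationFactor, accumulationFactor, batchesPerStep, <tensor shape>]
    return (replication_factor, accumulation_factor, batches_per_step,
            *tensor_shape)


shape = step_input_shape(
    (4, 3, 32, 32),
    replication_factor=2,
    accumulation_factor=1,
    batches_per_step=8,
)

# An input array only needs to be size-compatible with this shape,
# so the total element count is what has to match.
num_elements = math.prod(shape)
```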

enableRuntimeAsserts(self: popart_core.PyStepIO, arg0: bool) None

Enable (or disable) run-time checks of the sizes of the provided numpy arrays.

Parameters

arg0 – Flag to enable/disable checks

class popart.PyStepIOCallback

This class is an implementation of the IStepIO interface backed by user-provided callback functions. This class inherits from IStepIO and implements those member functions by delegating the logic to the callback functions passed in the constructor. This gives the user full control as to how data buffers are provisioned.

__init__(self: popart_core.PyStepIOCallback, input_callback: Callable[[str, bool], numpy.ndarray], input_complete_callback: Callable[[str], None], output_callback: Callable[[str], numpy.ndarray], output_complete_callback: Callable[[str], None]) None

Construct a new PyStepIOCallback instance.

Parameters
  • input_callback – Callable object that the PyStepIOCallback instance will use when IStepIO::in() is called. See IStepIO for details on how to implement this method.

  • input_complete_callback

    Callable object that the PyStepIOCallback instance will use when IStepIO::inComplete() is called. See IStepIO for details on how to implement this method.

  • output_callback

    Callable object that the PyStepIOCallback instance will use when IStepIO::out() is called. See IStepIO for details on how to implement this method.

  • output_complete_callback

    Callable object that the PyStepIOCallback instance will use when IStepIO::outComplete() is called. See IStepIO for details on how to implement this method.

class popart.InputShapeInfo
__init__(self: popart_core.InputShapeInfo) None
add(self: popart_core.InputShapeInfo, arg0: str, arg1: popart_internal_ir.TensorInfo) None
get(self: popart_core.InputShapeInfo, arg0: str) popart_internal_ir.TensorInfo
has(self: popart_core.InputShapeInfo, arg0: str) bool
class popart.DataFlow
__init__(*args, **kwargs)

Overloaded function.

  1. __init__(self: popart_core.DataFlow, batchesPerStep: int, anchorTensors: Dict[str, popart_core.AnchorReturnType]) -> None

  2. __init__(self: popart_core.DataFlow, batchesPerStep: int, anchorTensors: Dict[str, popart_core.AnchorReturnType]) -> None

  3. __init__(self: popart_core.DataFlow, batchesPerStep: int, anchorIds: List[str], anchorReturnType: popart_core.AnchorReturnType = <popart_core.AnchorReturnType object at 0x7f5e5a156bb0>) -> None

anchors(self: popart_core.DataFlow) List[str]
art(self: popart_core.DataFlow, arg0: str) popart_core.AnchorReturnType
batchesPerStep(self: popart_core.DataFlow) int
isAnchored(self: popart_core.DataFlow, arg0: str) bool
nAnchors(self: popart_core.DataFlow) int
setBatchesPerStep(self: popart_core.DataFlow, arg0: int) None

13.3. Tensors

class popart.DataType

Members:

UINT8

INT8

UINT16

INT16

INT32

INT64

UINT32

UINT64

BOOL

FLOAT

FLOAT16

BFLOAT16

FLOAT8_143

FLOAT8_152

DOUBLE

COMPLEX64

COMPLEX128

STRING

UNDEFINED

property name
class popart.ReplicatedTensorSharding

Enum type to specify whether to shard tensors over replicas.

Members:

Off : Don’t shard tensors over replicas.

On : Do shard tensors over replicas.

property name
class popart.TensorInfo(*args)

Python wrapper to TensorInfo to handle numpy types in constructor.

For example:

TensorInfo(dtype, shape)

TensorInfo(numpy.ndarray)

Raises

TypeError – Raised if an incorrect type is used to create a TensorInfo.

Parameters

args (Union[Iterable, array]) –

Return type

None

class popart.TensorLocation

Class that describes the memory characteristics of one or multiple tensors.

See also: SessionOptions.

property loadTileSet

The tiles through which the tensor(s) are loaded onto the chip.

property replicatedTensorSharding

Whether to apply replicated tensor sharding (RTS) or not.

property shardingDomain

The GCL comm groups across which to shard the tensor.

property storage

The memory location of the tensor(s).

property storageTileSet

The tiles on which the tensor(s) are stored.

class popart.TensorStorage

Enum type that determines where a tensor is stored.

Members:

OnChip : Store the tensor in on-chip memory.

OffChip : Store the tensor in streaming memory.

property name
class popart.TileSet

Enum type to specify a set of tiles.

Members:

Compute : The set of tiles designated for compute operations.

IO : The set of tiles designated for IO operations.

property name
class popart.tensorinfo.TensorInfo(*args)

Python wrapper to TensorInfo to handle numpy types in constructor.

For example:

TensorInfo(dtype, shape)

TensorInfo(numpy.ndarray)

Raises

TypeError – Raised if an incorrect type is used to create a TensorInfo.

Parameters

args (Union[Iterable, array]) –

Return type

None

13.4. Optimizers

class popart.Optimizer
getLossScalingVal(self: popart_core.Optimizer) float
class popart.WeightDecayMode

Members:

Decay : Weight decay (e.g. AdamW)

L2Regularization : L2 regularization (e.g. PyTorch-like Adam)

property name
class popart.OptimizerValue
isConst(self: popart_internal_ir.OptimizerValue) bool
val(self: popart_internal_ir.OptimizerValue) float
class popart.OptimizerValueMap
getDefault(self: popart_core.OptimizerValueMap) popart_internal_ir.OptimizerValue

13.4.1. SGD

class popart.ClipNormSettings
static clipAllWeights(arg0: float) popart_core.ClipNormSettings
static clipWeights(arg0: List[str], arg1: float) popart_core.ClipNormSettings
class popart.SGD

Stochastic Gradient Descent (SGD) optimizer.

Akin to any optimizer implementation, this class is responsible for updating each weight tensor (\(w\)) in the model using the gradient (\(g\)) of the loss function with respect to the weight as calculated during the backwards pass.

The SGD optimizer has the following state for each weight:

  • velocity (\(v\))

The SGD optimizer has the following hyper parameters:

  • learning rate (\(\text{lr}\))

  • momentum (\(\text{mm}\))

  • weight decay (\(\text{wd}\))

  • dampening (\(\text{dm}\))

  • velocity scaling (\(\text{vs}\))

  • loss scaling (\(\text{ls}\))

  • nesterov

  • clip norm settings

The values of these parameters can be shared between all weights but some can be overridden with weight-specific values (see SGD::insertSpecific). Hyper parameters are captured using OptimizerValue objects and therefore can be either a constant value or a non-constant value that can be adjusted by the user.

In the following we will describe how this optimizer updates a weight using a gradient. In the context of this description, the gradient is the value of the gradient after any gradient accumulation has been performed and after the loss scaling applied to the gradient has been corrected for.

When the optimizer needs to update a weight, \(w\), using a gradient, \(g\), it first updates the optimizer state as follows:

\[v' := v * \text{mm} + (1 - \text{dm}) * (g + \text{wd} * w) \text{ \ . }\]

Following the update of the optimizer state the optimizer uses said state to update the weight:

If nesterov is True:

\[g' := g + \text{wd} * w + \text{mm} * v' \text{ \ . }\]
\[w' := w - \text{lr} * g' \text{ \ . }\]

Otherwise:

\[w' := w - \text{lr} * v' \text{ \ . }\]
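
The update rules above can be sketched in plain Python for a single scalar weight. This is a minimal illustration of the equations only, not PopART's implementation: the function name sgd_step and its argument names are hypothetical, and velocity scaling, loss scaling and clip norms are left out.

```python
def sgd_step(w, g, v, lr, mm=0.0, wd=0.0, dm=0.0, nesterov=False):
    """Apply one SGD update to a scalar weight, following the equations above.

    Hypothetical helper for illustration; returns the new weight and velocity.
    """
    # Optimizer state update: v' = v * mm + (1 - dm) * (g + wd * w)
    v_new = v * mm + (1.0 - dm) * (g + wd * w)
    if nesterov:
        # g' = g + wd * w + mm * v', then w' = w - lr * g'
        g_new = g + wd * w + mm * v_new
        w_new = w - lr * g_new
    else:
        # w' = w - lr * v'
        w_new = w - lr * v_new
    return w_new, v_new


# Usage: three plain-momentum steps with a constant gradient.
w, v = 1.0, 0.0
for _ in range(3):
    w, v = sgd_step(w, g=0.5, v=v, lr=0.1, mm=0.9)
```

With nesterov=True the same velocity is computed, but the weight moves along \(g'\), which adds a look-ahead term \(\text{mm} * v'\) to the decayed gradient.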

In addition to the above, the velocity scaling hyper parameter is a scaling factor that can provide improved numerical stability by ensuring the values stored in the optimizer state, \(v\), are scaled by this value. When using this parameter, PopART will automatically deal with the artificially scaled velocity value during the weight update, and other hyper parameters do not need to be adjusted.

In addition, the loss scaling hyper parameter is similar in nature to the velocity scaling parameter. It is a scaling value that is applied to the loss gradient at the start of the backwards pass and, at the end of the backwards pass, this scaling is reversed by multiplying the gradients for each weight with the inverse of the loss scaling value prior to updating the optimizer state. Using loss scaling can also improve numerical stability in some cases.
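
As a rough illustration of why loss scaling can help, the sketch below emulates half-precision storage with Python's struct module (format "e") and pushes a small gradient through it with and without scaling. The helper names to_fp16 and backprop_fp16, the gradient value and the scaling factor of 1024 are all assumptions chosen for this example; they are not PopART APIs or defaults.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def backprop_fp16(true_grad, ls=1.0):
    """Emulate a backwards pass whose gradient is stored in half precision.

    The gradient is scaled by ls at the start, rounded to half precision,
    and the scaling is reversed at the end by multiplying with 1 / ls.
    """
    scaled = to_fp16(true_grad * ls)
    return scaled * (1.0 / ls)


tiny_grad = 1e-8  # below the smallest positive half-precision value
without_scaling = backprop_fp16(tiny_grad)          # underflows to zero
with_scaling = backprop_fp16(tiny_grad, ls=1024.0)  # survives the rounding
```

In this toy setting the unscaled gradient is lost entirely, while the scaled one is recovered to within rounding error after the scaling is reversed.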

Finally, it is possible to add clip norm settings for this optimizer. These clip norms compute the L2 norm for a group of weights and add a scalar term to the weight update that effectively divides it by the norm (or a constant value that is provided as part of the clip norm, whichever is greater).

See the SGD notes in optimizer.hpp for a more detailed and comprehensive derivation of the SGD optimizer step in PopART.

dampenings(self: popart_core.SGD) popart_core.OptimizerValueMap
insertSpecific(self: popart_core.SGD, arg0: str, arg1: dict) None
learningRates(self: popart_core.SGD) popart_core.OptimizerValueMap
momentums(self: popart_core.SGD) popart_core.OptimizerValueMap
nesterov(self: popart_core.SGD) popart_core.OptimizerValueMap
velocityScalings(self: popart_core.SGD) popart_core.OptimizerValueMap
weightDecays(self: popart_core.SGD) popart_core.OptimizerValueMap
class popart.SGDAccumulatorAndMomentum

Members:

Combined : Implement SGD using a single tensor for the gradient accumulator (accum) and momentum (accl) tensors.

Separate : Implement SGD using separate tensors for the gradient accumulator (accum) and momentum (accl) tensors.

property name

13.4.2. ConstSGD

class popart.ConstSGD

Stochastic Gradient Descent (SGD) optimizer with constant learning rate, weight decay, loss scaling and clip norm settings (and default values for momentum, dampening or velocity scaling).

NOTE: See SGD for detailed meaning for these parameters.

NOTE: This class exists for backwards compatibility with the Python API and may be removed at some point in the future.

13.4.3. Adam

class popart.AdamMode

Members:

Adam : Adam or AdamW mode, depending on weight decay setting (see [Kingma & Ba, 2015](https://arxiv.org/abs/1412.6980) and [Loshchilov & Hutter, 2018](https://arxiv.org/pdf/1711.05101.pdf)).

AdamNoBias : Like Adam but without bias correction.

Lamb : Lamb mode (see [You et al., 2020](https://arxiv.org/abs/1904.00962)).

LambNoBias : Like Lamb but without bias correction.

AdaMax : Adamax mode.

property name
class popart.Adam

AdamW, Lamb and AdaMax optimizer implementation.

Akin to any optimizer implementation, this class is responsible for updating each weight tensor (\(w\)) in the model using the gradient (\(g\)) of the loss function with respect to the weight as calculated during the backwards pass.

The optimizer has the following state for each weight:

  • first-order momentum (\(m\))

  • second-order momentum (\(v\))

  • time step (\(t\))

The optimizer has the following hyper parameters:

  • learning rate (\(\text{lr}\))

  • weight decay (\(\text{wd}\))

  • beta1 (\(\beta_1\))

  • beta2 (\(\beta_2\))

  • epsilon (\(\epsilon\))

  • loss scaling (\(\text{ls}\))

  • maximum weight norm (\(\text{mwn}\))

The values of these parameters can be shared between all weights but some can be overridden with weight-specific values (see Adam::insertSpecific). Hyper parameters are captured using OptimizerValue objects and therefore can be either a constant value or a non-constant value that can be adjusted by the user.

The values of AdamMode and WeightDecayMode passed to the constructor determine how weights are updated (see below).

In the following we will describe how this optimizer updates a weight using a gradient. In the context of this description, the gradient is the value of the gradient after any gradient accumulation has been performed and after the loss scaling applied to the gradient has been corrected for.

When the optimizer needs to update a weight, \(w\), using a gradient, \(g\), it first computes a term \(g_\text{tmp}\), which is effectively \(g\) with L2 regularization applied if the WeightDecayMode is set to WeightDecayMode::L2Regularization, as follows:

\[\begin{split}g_\text{tmp} := \left\{\begin{aligned} g & \text{ \; (Decay) } \\ (g + \text{wd} * w) & \text{ \; (L2Regularization) \; . } \\ \end{aligned}\right.\\\end{split}\]

Secondly, the optimizer updates the optimizer state as follows:

\[\begin{split}m' &:= \beta_1 * m + (1 - \beta_1) * g_\text{tmp} \\ v' &:= \left\{\begin{aligned} \beta_2 * v + (1 - \beta_2) * g_\text{tmp}^2 & \text{ \; (Adam/AdamNoBias) } \\ \beta_2 * v + (1 - \beta_2) * g_\text{tmp}^2 & \text{ \; (Lamb/LambNoBias) } \\ \text{max}(\beta_2 * v, |g_\text{tmp}|) & \text{ \; (AdaMax) } \\ \end{aligned}\right.\\ t' &:= t + 1 \\\end{split}\]

Next, it computes the following terms:

\[\begin{split}m_\text{tmp} &:= \left\{\begin{aligned} m' & \text{ \; (AdamNoBias/LambNoBias) } \\ \frac{m'}{(1 - \beta_1^{t'})} & \text{ \; (Adam/Lamb/AdaMax) } \\ \end{aligned}\right.\\ v_\text{tmp} &:= \left\{\begin{aligned} v' & \text{ \; (AdamNoBias/LambNoBias) } \\ \frac{v'}{(1 - \beta_2^{t'})} & \text{ \; (Adam/Lamb/AdaMax) } \\ \end{aligned}\right.\\ u_\text{tmp} &:= \left\{\begin{aligned} \frac{m_\text{tmp}}{(\sqrt{v_\text{tmp}} + \epsilon)} + \text{wd} * w &\text{ \; (Decay) } \\ \frac{m_\text{tmp}}{(\sqrt{v_\text{tmp}} + \epsilon)} &\text{ \; (L2Regularization) } \\ \end{aligned}\right.\end{split}\]
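
As a worked check of the bias correction, consider the first update in the Adam or Lamb mode, assuming the state starts at \(m = 0\), \(v = 0\) and \(t = 0\) (these initial values are an assumption made for this example; they are not stated in this section). Then \(t' = 1\) and:

\[\begin{split}m' &= (1 - \beta_1) * g_\text{tmp} \\ m_\text{tmp} &= \frac{m'}{(1 - \beta_1^{1})} = g_\text{tmp} \\ v' &= (1 - \beta_2) * g_\text{tmp}^2 \\ v_\text{tmp} &= \frac{v'}{(1 - \beta_2^{1})} = g_\text{tmp}^2 \text{ \ . }\end{split}\]

Under that assumption the bias-corrected terms recover \(g_\text{tmp}\) and \(g_\text{tmp}^2\) exactly after one step, whereas in the AdamNoBias and LambNoBias modes they remain shrunk by the factors \((1 - \beta_1)\) and \((1 - \beta_2)\).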

Finally, the optimizer updates the weight as follows:

\[\begin{split}w' := \left\{\begin{aligned} w - \text{lr} * u_\text{tmp} &\text{ \; (Adam/AdamNoBias/AdaMax) } \\ w - \biggl(\frac{\text{min}(\lVert{w}\rVert, \text{mwn})}{\lVert{u_\text{tmp}}\rVert}\biggr) * \text{lr} * u_\text{tmp} &\text{ \; (Lamb/LambNoBias) } \\ \end{aligned}\right.\end{split}\]
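
The four stages above can be transcribed into plain Python operating on lists of floats. This is a sketch that follows the equations as written (including the AdaMax case), not PopART's implementation: the function name adam_step, its argument names and the string-valued mode arguments are hypothetical, and skipping the Lamb rescaling when \(\lVert u_\text{tmp} \rVert\) is zero is an added assumption.

```python
import math


def adam_step(w, g, m, v, t, lr, wd=0.0, b1=0.9, b2=0.999, eps=1e-8,
              mwn=float("inf"), mode="Adam", decay_mode="Decay"):
    """One optimizer step on lists of floats, transcribing the equations above.

    mode is one of "Adam", "AdamNoBias", "Lamb", "LambNoBias", "AdaMax";
    decay_mode is "Decay" or "L2Regularization". Returns (w', m', v', t').
    """
    l2 = decay_mode == "L2Regularization"
    # g_tmp: the gradient, with L2 regularization folded in if requested.
    g_tmp = [gi + wd * wi if l2 else gi for gi, wi in zip(g, w)]

    # Optimizer state update.
    m_new = [b1 * mi + (1.0 - b1) * gi for mi, gi in zip(m, g_tmp)]
    if mode == "AdaMax":
        v_new = [max(b2 * vi, abs(gi)) for vi, gi in zip(v, g_tmp)]
    else:
        v_new = [b2 * vi + (1.0 - b2) * gi * gi for vi, gi in zip(v, g_tmp)]
    t_new = t + 1

    # Bias correction is skipped in the NoBias modes.
    no_bias = mode in ("AdamNoBias", "LambNoBias")
    m_div = 1.0 if no_bias else 1.0 - b1 ** t_new
    v_div = 1.0 if no_bias else 1.0 - b2 ** t_new
    u_tmp = [(mi / m_div) / (math.sqrt(vi / v_div) + eps)
             for mi, vi in zip(m_new, v_new)]
    if not l2:
        # Decay mode adds the weight decay term to the update directly.
        u_tmp = [ui + wd * wi for ui, wi in zip(u_tmp, w)]

    # Weight update; the Lamb modes rescale by min(||w||, mwn) / ||u_tmp||.
    scale = lr
    if mode in ("Lamb", "LambNoBias"):
        w_norm = math.sqrt(sum(wi * wi for wi in w))
        u_norm = math.sqrt(sum(ui * ui for ui in u_tmp))
        # Assumption: leave the step unscaled when ||u_tmp|| is zero.
        if u_norm > 0.0:
            scale = lr * min(w_norm, mwn) / u_norm
    w_new = [wi - scale * ui for wi, ui in zip(w, u_tmp)]
    return w_new, m_new, v_new, t_new


# Usage: a first Adam step from zero-initialized state.
w, m, v, t = adam_step([1.0], [0.5], m=[0.0], v=[0.0], t=0, lr=0.1)
```

Keeping the two weight decay modes as a single flag makes the difference visible: L2Regularization changes \(g_\text{tmp}\) before the state update, while Decay only changes \(u_\text{tmp}\) afterwards.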

In addition to the above, the loss scaling hyper parameter is similar in nature to the velocity scaling parameter of the SGD optimizer. It is a scaling value that is applied to the loss gradient at the start of the backwards pass and, at the end of the backwards pass, this scaling is reversed by multiplying the gradients for each weight with the inverse of the loss scaling value prior to updating the optimizer state. Using loss scaling can also improve numerical stability of the gradient calculations. If scaledOptimizerState is enabled then the loss scaling will not be removed before updating the optimizer state. This can improve the numerical stability when accl1_type is set to FLOAT16.

NOTE: The maximum weight norm is referred to as \(\phi\) in You et al., 2020.

beta1s(self: popart_core.Adam) popart_core.OptimizerValueMap
beta2s(self: popart_core.Adam) popart_core.OptimizerValueMap
epss(self: popart_core.Adam) popart_core.OptimizerValueMap
insertSpecific(self: popart_core.Adam, arg0: str, arg1: dict) None
learningRates(self: popart_core.Adam) popart_core.OptimizerValueMap
maxWeightNorms(self: popart_core.Adam) popart_core.OptimizerValueMap
weightDecays(self: popart_core.Adam) popart_core.OptimizerValueMap

13.4.4. AdaDelta, RMSProp & AdaGrad

class popart.AdaptiveMode

Members:

AdaGrad

RMSProp

CenteredRMSProp

AdaDelta

property name
class popart.Adaptive
alphas(self: popart_core.Adaptive) popart_core.OptimizerValueMap
epss(self: popart_core.Adaptive) popart_core.OptimizerValueMap
insertSpecific(self: popart_core.Adaptive, arg0: str, arg1: dict) None
learningRates(self: popart_core.Adaptive) popart_core.OptimizerValueMap
momentums(self: popart_core.Adaptive) popart_core.OptimizerValueMap
weightDecays(self: popart_core.Adaptive) popart_core.OptimizerValueMap

13.5. Builder

class popart.builder.AiGraphcore(builder, version)

Return the builder interface for the given ai.graphcore version.

Raises

ValueError – Thrown if an invalid ai.graphcore opset version is provided.

Parameters
Return type

None

call(args, num_outputs, callee, debugName='')

Add a call operation to the model.

This is a Poplar extension that exposes manual code re-use to the builder.

Parameters
  • args (List[int]) – List of tensor ids to feed as arguments.

  • num_outputs (int) – Number of output tensors from the called graph.

  • callee (Builder) – SubgraphBuilder for the graph to be called.

  • debugName (str) –

Keyword Arguments

debugName – A string to prepend to the name of the tensor. Default: “”.

Returns

Output tensor ids.

Return type

List[str]

class popart.builder.AiGraphcoreOpset1(builder, version)

Sub-class for backwards compatibility.

Will forward all calls to AiGraphcore class.

Parameters
Return type

None

class popart.builder.AiOnnx(builder, version)

Base class for the various AiOnnx builder interfaces.

The most recent versions of ONNX operators that require special treatment, such as Loop, Scan and Logical_If, go here. Older versions where the function signature differs are implemented on a corresponding subclass.

Parameters
  • builder (Builder) – Parent class for access.

  • version (int) – ai.Onnx opset version to use; 6 <= version <= 10. Default: 10.

Return type

None

logical_if(args, num_outputs, else_branch, then_branch, name='')

If conditional operation.

Parameters
  • args (List[str]) – List of tensor ids to feed as arguments.

  • num_outputs (int) – Number of output tensors from the if operator.

  • else_branch (Builder) – SubgraphBuilder for the graph to run if condition is false. Has num_outputs outputs: values you wish to be live-out to the subgraph created by the if operation; other tensors will not be accessible to the wider graph. The number of outputs must match the number of outputs in the then_branch.

  • then_branch (Builder) – SubgraphBuilder for the graph to run if condition is true. Has num_outputs outputs: values you wish to be live-out to the enclosing scope. The number of outputs must match the number of outputs in the else_branch.

  • name (str) –

Keyword Arguments

name – A string to prepend to the name of the tensor. Default: “”.

Returns

Output tensor ids.

Return type

List[str]

loop(args, num_outputs, body, debugContext='')

Construct a generic Looping op.

Parameters
  • args (List[str]) – List of tensor ids to feed as arguments.

  • num_outputs (int) – Number of output tensors from the loop operator.

  • body (Builder) – SubgraphBuilder for the graph to run in the loop.

  • debugContext (str) –

Keyword Arguments

debugContext – A string to prepend to the name of the tensor. Default: “”.

Returns

Output tensor ids.

Return type

List[str]

class popart.builder.AiOnnx10(builder, version)

Minimal builder interface for ai.onnx version 10.

Parameters
Return type

None

class popart.builder.AiOnnx11(builder, version)

Minimal builder interface for ai.onnx version 11.

Parameters
Return type

None

class popart.builder.AiOnnx6(builder, version)

Minimal builder interface for ai.onnx version 6.

Parameters
Return type

None

class popart.builder.AiOnnx7(builder, version)

Minimal builder interface for ai.onnx version 7.

Parameters
Return type

None

class popart.builder.AiOnnx8(builder, version)

Minimal builder interface for ai.onnx version 8.

Parameters
Return type

None

scan(args, num_outputs, body, num_scan_inputs, directions=[], debugContext='')

Scan-8 specific construct op.

Parameters
  • args (List[str]) – List of tensor ids to feed as arguments.

  • num_outputs (int) – Number of output tensors from the scan operator.

  • body (Builder) – SubgraphBuilder for the graph to run in the scan.

  • num_scan_inputs (int) – The number of scan_inputs

  • directions (List[int]) – A list of ints that specifies the direction of each scan_input. 0 indicates forward direction and 1 indicates reverse direction. If omitted, all scan_input tensors will be scanned in the forward direction.

  • debugContext (str) –

Keyword Arguments

debugContext – A string to prepend to the name of the tensor. Default: “”.

Returns

Output tensor ids.

Return type

List[str]

class popart.builder.AiOnnx9(builder, version)

Minimal builder interface for ai.onnx version 9.

Parameters
Return type

None

scan(args, num_outputs, body, num_scan_inputs, scan_input_axes=[], scan_input_directions=[], scan_output_axes=[], scan_output_directions=[], debugContext='')

Construct a generic scan op.

Parameters
  • args (List[str]) – List of tensor ids to feed as arguments.

  • num_outputs (int) – Number of output tensors from the scan operator.

  • body (Builder) – SubgraphBuilder for the graph to run in the scan.

  • num_scan_inputs (int) – The number of scan_inputs

  • scan_input_axes (List[int]) – A list that specifies the axis to be scanned for the scan_input. If omitted, 0 will be used as the scan axis for every scan_input.

  • scan_input_directions (List[int]) – A list that specifies the direction to be scanned for the scan_input tensor. 0 indicates forward direction and 1 indicates reverse direction. If omitted, all scan_input tensors will be scanned in the forward direction.

  • scan_output_axes (List[int]) – A list that specifies the axis for the scan_output. The scan outputs are accumulated along the specified axis. If omitted, 0 will be used as the scan axis for every scan_output.

  • scan_output_directions (List[int]) – A list that specifies whether the scan_output should be constructed by appending or prepending a new value in each iteration: 0 indicates appending and 1 indicates prepending. If omitted, all scan_output tensors will be produced by appending a value in each iteration.

  • debugContext (str) –

Keyword Arguments

debugContext – A string to prepend to the name of the tensor. Default: “”.

Returns

Output tensor ids.

Return type

List[str]
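
The direction parameters can be pictured with a pure-Python model of a scan over a single state variable and a single scan_input along axis 0. This is only a sketch of the semantics described above under those simplifying assumptions; scan_reference and running_sum are hypothetical names, and nothing here calls PopART or ONNX.

```python
def scan_reference(body, initial_state, scan_input,
                   input_direction=0, output_direction=0):
    """Model a scan with one state variable and one scan_input (axis 0).

    input_direction: 0 scans forward, 1 scans in reverse.
    output_direction: 0 appends each new value, 1 prepends it.
    """
    steps = list(scan_input)
    if input_direction == 1:
        steps.reverse()
    state = initial_state
    scan_output = []
    for x in steps:
        # The body maps (state, slice) to (new state, scan output element).
        state, out = body(state, x)
        if output_direction == 0:
            scan_output.append(out)
        else:
            scan_output.insert(0, out)
    return state, scan_output


def running_sum(state, x):
    """Example body: accumulate x and emit the running total."""
    total = state + x
    return total, total


final_state, sums = scan_reference(running_sum, 0, [1, 2, 3])
```

Reversing the input direction changes the order in which slices reach the body, while the output direction only changes where each emitted value is placed in the result.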

class popart.builder.AiOnnxMl(builder, version)

Return the builder interface for the given ai.onnx.ml version.

Raises

ValueError – Thrown if an invalid ai.onnx.ml opset version is provided.

Parameters
Return type

None

class popart.builder.Builder(modelProtoOrFilename=None, opsets=None, builderCore=None)

A wrapper around the Builder C++ class.

This is renamed BuilderCore in pybind, to enable more Pythonic use. See builder.hpp for the class definition.

Parameters
  • modelProtoOrFilename (Union[str, bytes]) – Model protobuf string or file path of saved ONNX model proto. Default: None.

  • opsets (Dict[str, int]) – Dict of opset versions. Default: None.

  • builderCore (_BuilderCore) – _BuilderCore object if you want to create a subgraph builder using an existing buildercore object. Default: None.

Return type

None

createSubgraphBuilder()

Create a child builder to add ops to a subgraph using a call operation.

Returns

The child builder.

Return type

Builder

reshape_const(aiOnnx, args, shape, debugContext='')

Const version of the reshape op.

Parameters
  • aiOnnx (Opset) – Versioned aiOnnx opset, for example: aiOnnxOpset11.

  • args (List[str]) – List of tensor ids to feed as arguments.

  • shape (Iterable[int]) – Shape to reshape to, for example [3, 2, 4].

  • debugContext (str) –

Keyword Arguments

debugContext – String to use as a debug Context. Default: “”.

Returns

Output tensor ids.

Return type

List[int]

class popart.builder.Opset(builder, version)

Minimal base class for the opsets.

Parameters
  • builder (Builder) – An interface for a Builder, used for creating ONNX graphs.

  • version (int) – Opset version to use for the given opset sub-class.

Return type

None

class popart.builder._BuilderCore
addInitializedInputTensor(*args, **kwargs)

Overloaded function.

  1. addInitializedInputTensor(self: popart_core._BuilderCore, initVal: numpy.ndarray, debugContext: popart_internal_ir.DebugContext = '') -> str

  2. addInitializedInputTensor(self: popart_core._BuilderCore, initVal: numpy.ndarray, variableSettings: popart::VariableSettings, debugContext: popart_internal_ir.DebugContext = '') -> str

addInputTensor(*args, **kwargs)

Overloaded function.

  1. addInputTensor(self: popart_core._BuilderCore, tensorInfo: popart_internal_ir.TensorInfo, debugContext: popart_internal_ir.DebugContext = '') -> str

  2. addInputTensor(self: popart_core._BuilderCore, dataType: str, shape: List[int], debugContext: popart_internal_ir.DebugContext = '') -> str

  3. addInputTensor(self: popart_core._BuilderCore, tensorInfo: popart_internal_ir.TensorInfo, settings: popart_core.InputSettings = <popart_core.InputSettings object>, debugContext: popart_internal_ir.DebugContext = '') -> str

  4. addInputTensor(self: popart_core._BuilderCore, dataType: str, shape: List[int], settings: popart_core.InputSettings = <popart_core.InputSettings object>, debugContext: popart_internal_ir.DebugContext = '') -> str

addInputTensorFromParentGraph(self: popart_core._BuilderCore, tensorId: str) None

Add a new named input tensor (from the parent graph) to the model.

Parameters

tensorId – The identifier string of the input tensor. This identifier must already exist in the name scope of the parent GraphProto and must appear topologically before this sub-graph.

addNodeAttribute(*args, **kwargs)

Overloaded function.

  1. addNodeAttribute(self: popart_core._BuilderCore, attributeName: str, attributeValue: int, nodeOutputNames: Set[str]) -> None

  2. addNodeAttribute(self: popart_core._BuilderCore, attributeName: str, attributeValue: List[int], nodeOutputNames: Set[str]) -> None

  3. addNodeAttribute(self: popart_core._BuilderCore, attributeName: str, attributeValue: float, nodeOutputNames: Set[str]) -> None

  4. addNodeAttribute(self: popart_core._BuilderCore, attributeName: str, attributeValue: List[float], nodeOutputNames: Set[str]) -> None

  5. addNodeAttribute(self: popart_core._BuilderCore, attributeName: str, attributeValue: str, nodeOutputNames: Set[str]) -> None

  6. addNodeAttribute(self: popart_core._BuilderCore, attributeName: str, attributeValue: List[str], nodeOutputNames: Set[str]) -> None

addOutputTensor(self: popart_core._BuilderCore, outputName: str) None
addUntypedInputTensor(self: popart_core._BuilderCore, debugContext: popart_internal_ir.DebugContext = '') str

Add a new input tensor without a type or shape to the model.

Parameters

debugContext – Optional debug information.

Returns

The tensor id of the input tensor.

checkpointOutput(self: popart_core._BuilderCore, nodeOutputNames: List[str]) List[str]
commGroup(self: popart_core._BuilderCore, type: int = 0, groupSize: int = 0) AttributeContextManager
customOp(self: popart_core._BuilderCore, opName: str, opVersion: int, domain: str, inputs: list, attributes: dict, numOutputs: int = 1, name: str = '') List[str]
embedReplicationFactor(self: popart_core._BuilderCore, replicationFactor: int) None

Embed the value of replicationFactor into the OnnxModel. Should be interpreted as 1 if not present in the model.

Parameters

replicationFactor – The replication factor.

excludePatterns(self: popart_core._BuilderCore, nodeOutputName: str, patternNames: List[str]) None
executionPhase(*args, **kwargs)

Overloaded function.

  1. executionPhase(self: popart_core._BuilderCore, nodeOutputNames: str, value: int = 0) -> None

  2. executionPhase(self: popart_core._BuilderCore, nodeOutputNames: Set[str], value: int = 0) -> None

  3. executionPhase(self: popart_core._BuilderCore, value: int = 0) -> AttributeContextManager

getAllNodeAttributeNames(self: popart_core._BuilderCore, nodeOutputNames: Set[str]) List[str]

Get all the attribute names from the ONNX node. This function will throw an exception if it cannot find the unique node.

Parameters

nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

Returns

The attribute names associated with the ONNX node.

getExecutionPhase(self: popart_core._BuilderCore) int
getFloatNodeAttribute(self: popart_core._BuilderCore, attributeName: str, nodeOutputNames: Set[str]) float

Get the value of an attribute for the ONNX node where the value is a float.

This function will throw an exception if it cannot find the unique node or if the attribute does not exist or if it has not been set to the float type.

Parameters
  • attributeName – The name of the attribute to find.

  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

Returns

Value of the attribute.

getFloatVectorNodeAttribute(self: popart_core._BuilderCore, attributeName: str, nodeOutputNames: Set[str]) List[float]

Get the value of an attribute for the ONNX node where the value is a std::vector<float>.

This function will throw an exception if it cannot find the unique node or if the attribute does not exist.

Parameters
  • attributeName – The name of the attribute to find.

  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

Returns

Value of the attribute.

getInputTensorIds(self: popart_core._BuilderCore) List[str]
getInt64NodeAttribute(self: popart_core._BuilderCore, attributeName: str, nodeOutputNames: Set[str]) int

Get the value of an attribute for the ONNX node where the value is an int64_t.

This function will throw an exception if it cannot find the unique node or if the attribute does not exist or if it has not been set to the int64_t type.

Parameters
  • attributeName – The name of the attribute to find.

  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

Returns

Value of the attribute.

getInt64VectorNodeAttribute(self: popart_core._BuilderCore, attributeName: str, nodeOutputNames: Set[str]) List[int]

Get the value of an attribute for the ONNX node where the value is a std::vector<int64_t>.

This function will throw an exception if it cannot find the unique node or if the attribute does not exist or if it has not been set to the std::vector<int64_t> type.

Parameters
  • attributeName – The name of the attribute to find.

  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

Returns

Value of the attribute.

getModelProto(self: popart_core._BuilderCore) bytes
getNameScope(self: popart_core._BuilderCore, name: str = '') str
getOutputTensorIds(self: popart_core._BuilderCore) List[str]
getPartialsType(self: popart_core._BuilderCore, nodeOutputName: str) str

Get the partials type for the given node.

Parameters

nodeOutputName – The tensor id of the output tensor of the ONNX node.

Returns

The partials type.

getPipelineStage(self: popart_core._BuilderCore) int
getRecomputeOutputInBackwardPass(*args, **kwargs)

Overloaded function.

  1. getRecomputeOutputInBackwardPass(self: popart_core._BuilderCore, nodeOutputName: str) -> bool

  2. getRecomputeOutputInBackwardPass(self: popart_core._BuilderCore, nodeOutputNames: Set[str]) -> bool

getStringNodeAttribute(self: popart_core._BuilderCore, attributeName: str, nodeOutputNames: Set[str]) str

Get the value of an attribute for the ONNX node where the value is a string.

This function will throw an exception if it cannot find the unique node or the attribute does not exist or it has not been set to the std::string type.

Parameters
  • attributeName – The name of the attribute for which the value is required.

  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

Returns

Value of the attribute.

getStringVectorNodeAttribute(self: popart_core._BuilderCore, attributeName: str, nodeOutputNames: Set[str]) List[str]

Get the value of an attribute for the ONNX node where the value is a vector of strings.

This function will throw an exception if it cannot find the unique node or if the attribute does not exist.

Parameters
  • attributeName – The name of the attribute for which the value is required.

  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

Returns

Value of the attribute.

getTensorDtypeString(self: popart_core._BuilderCore, id: str) str
getTensorShape(self: popart_core._BuilderCore, id: str) List[int]

Return an ONNX graph tensor shape, from either the input, output, or value_info lists in GraphProto.

Parameters

id – The id of the tensor for which dimensions are required.

Returns

A vector of the tensor dimensions.

getTrainableTensorIds(self: popart_core._BuilderCore) List[str]
getValueTensorIds(self: popart_core._BuilderCore) List[str]
getVirtualGraph(*args, **kwargs)

Overloaded function.

  1. getVirtualGraph(self: popart_core._BuilderCore) -> int

  2. getVirtualGraph(self: popart_core._BuilderCore, nodeOutputNames: str) -> int

  3. getVirtualGraph(self: popart_core._BuilderCore, nodeOutputNames: Set[str]) -> int

  4. getVirtualGraph(self: popart_core._BuilderCore, nodeOutputNames: List[str]) -> int

hasExecutionPhase(self: popart_core._BuilderCore) bool
hasPipelineStage(self: popart_core._BuilderCore) bool
hasVirtualGraph(self: popart_core._BuilderCore) bool
isInitializer(self: popart_core._BuilderCore, id: str) bool

Check if the ONNX tensor is in the initializer list of GraphProto.

Parameters

id – A tensor id.

Returns

True if the tensor is in the initializer list; false otherwise.

nameScope(self: popart_core._BuilderCore, name: str) NameContextManager
nodeHasAttribute(self: popart_core._BuilderCore, attributeName: str, nodeOutputNames: Set[str]) bool

Check whether the ONNX node has an attribute set.

This function will throw an exception if it cannot find the unique node.

Parameters
  • attributeName – The name of the attribute to find.

  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

Returns

True if the node has the attribute set; false otherwise.

outlineAttributes(self: popart_core._BuilderCore, arg0: dict) KeyValueContextManager
outputTensorLocation(*args, **kwargs)

Overloaded function.

  1. outputTensorLocation(self: popart_core._BuilderCore, nodeOutputNames: str, value: popart_core.TensorLocation = <popart_core.TensorLocation object>) -> None

  2. outputTensorLocation(self: popart_core._BuilderCore, value: popart_core.TensorLocation = <popart_core.TensorLocation object>) -> AttributeContextManager

pipelineStage(*args, **kwargs)

Overloaded function.

  1. pipelineStage(self: popart_core._BuilderCore, nodeOutputNames: str, value: int = 0) -> None

  2. pipelineStage(self: popart_core._BuilderCore, value: int) -> AttributeContextManager

recomputeOutput(*args, **kwargs)

Overloaded function.

  1. recomputeOutput(self: popart_core._BuilderCore, nodeOutputNames: str, value: popart_core.RecomputeType = <RecomputeType.Undefined: 0>) -> None

  2. recomputeOutput(self: popart_core._BuilderCore, value: popart_core.RecomputeType = <RecomputeType.Undefined: 0>) -> AttributeContextManager

recomputeOutputInBackwardPass(*args, **kwargs)

Overloaded function.

  1. recomputeOutputInBackwardPass(self: popart_core._BuilderCore, nodeOutputName: str, value: popart_core.RecomputeType = <RecomputeType.Recompute: 2>) -> None

  2. recomputeOutputInBackwardPass(self: popart_core._BuilderCore, nodeOutputNames: Set[str], value: popart_core.RecomputeType = <RecomputeType.Recompute: 2>) -> None

removeNodeAttribute(self: popart_core._BuilderCore, attributeName: str, nodeOutputNames: Set[str]) None

Remove an attribute from the ONNX node. This function will throw an exception if it cannot find the unique node or if the attribute does not exist.

Parameters
  • attributeName – The name of the attribute to find.

  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

saveInitializersExternally(self: popart_core._BuilderCore, ids: List[str], filename: str) None

Save tensor data externally.

The model data cannot exceed 2 GB, the maximum size of a Protobuf message. For large models, ONNX tensor data can be saved separately to stay within this limit.

Parameters
  • ids – The names of tensors for which data is to be saved externally.

  • filename – The name of a file containing the binary tensor data. This can be an absolute or relative path. If a relative path, when the ONNX model is saved, external tensor data will be written to a path relative to the current working directory.

saveModelProto(self: popart_core._BuilderCore, filename: str) None

Save the builder’s ONNX ModelProto into the builder and validate it.

Parameters

filename – The name of a file containing an ONNX model protobuf.

schedulePriority(self: popart_core._BuilderCore, value: float) AttributeContextManager
setAvailableMemoryProportion(*args, **kwargs)

Overloaded function.

  1. setAvailableMemoryProportion(self: popart_core._BuilderCore, nodeOutputName: str, availableMemoryProportion: float) -> None

Set the available memory proportion for the given node.

This is used in the convolution op.

See also

Optimising Temporary Memory Usage for Convolutions and Matmuls on the IPU for some practical examples of using availableMemoryProportion.

Parameters
  • nodeOutputName – Name of the output tensor of the ONNX node.

  • availableMemoryProportion – The available memory proportion [0, 1).

  2. setAvailableMemoryProportion(self: popart_core._BuilderCore, nodeOutputNames: Set[str], availableMemoryProportion: float) -> None

Set the available memory proportion for the given node.

This is used in the convolution op.

See also

Optimising Temporary Memory Usage for Convolutions and Matmuls on the IPU for some practical examples of using availableMemoryProportion.

Parameters
  • nodeOutputNames – Names of the output tensors of the ONNX node.

  • availableMemoryProportion – The available memory proportion [0, 1).

setEnableConvDithering(self: popart_core._BuilderCore, nodeOutputName: str, enableConvDithering: int) None

Enable convolution dithering.

Parameters
  • nodeOutputName – The tensor id of the output tensor of the ONNX node.

  • enableConvDithering – The value to enable convolution dithering. This should be 1 to enable convolution dithering and 0 otherwise.

setGraphName(self: popart_core._BuilderCore, name: str) None

Set a graph name.

Parameters

name – The string to name the graph.

setInplacePreferences(self: popart_core._BuilderCore, nodeOutputName: str, prefs: Dict[str, float]) None
setPartialsType(self: popart_core._BuilderCore, nodeOutputName: str, partialsType: str) None

Set the partials type for the given node.

This is used in the convolution op.

Parameters
  • nodeOutputName – Name of the output tensor of the ONNX node.

  • partialsType – The type for the partials. Options are: FLOAT or HALF.

setSerializeMatMul(self: popart_core._BuilderCore, nodeOutputName: Set[str], mode: str, factor: int = 0, keep_precision: bool = False) None
virtualGraph(*args, **kwargs)

Overloaded function.

  1. virtualGraph(self: popart_core._BuilderCore, nodeOutputNames: str, value: int = 0) -> None

  2. virtualGraph(self: popart_core._BuilderCore, nodeOutputNames: Set[str], value: int = 0) -> None

  3. virtualGraph(self: popart_core._BuilderCore, nodeOutputNames: List[str], value: int = 0) -> None

  4. virtualGraph(self: popart_core._BuilderCore, value: int) -> AttributeContextManager

13.5.1. AiGraphcoreOpset1

class popart.AiGraphcoreOpset1
abort(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], debugContext: popart_internal_ir.DebugContext = '') None

Add an abort operation to the model.

The operation can be conditional or unconditional.

Parameters
  • args – A vector of input tensor ids.

  • debugContext – Optional debug information.

atan2(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str

Add an atan2 operation to the model.

Returns the element-wise angle \(\theta\) as a tensor, where \(-\pi < \theta \le \pi\), such that for two input tensors \(x\) and \(y\) and given \(r \ne 0\), \(x = r \cos\theta\) and \(y = r \sin\theta\), element-wise.

In the case of \(x > 0\), \(\theta = \arctan(y/x)\).

Parameters
  • args – A vector of input tensor ids: [y, x].

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.
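
As an illustration of the semantics only, the sketch below reproduces the element-wise angle with Python's standard math.atan2 rather than PopART. The helper name atan2_elementwise is hypothetical, and plain lists stand in for tensors.

```python
import math


def atan2_elementwise(y, x):
    """Return theta in (-pi, pi] for each pair, with inputs ordered [y, x]."""
    return [math.atan2(yi, xi) for yi, xi in zip(y, x)]


# One point per quadrant boundary case; note that y comes first.
y = [1.0, 1.0, -1.0, 0.0]
x = [1.0, -1.0, -1.0, -1.0]
theta = atan2_elementwise(y, x)
```

Only the first pair has x > 0, so it is the only one where the result equals arctan(y/x).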

batchnormalization(self: popart_core.AiGraphcoreOpset1, args: List[str], num_outputs: int, epsilon: float = 9.999999747378752e-06, momentum: float = 0.8999999761581421, debugContext: popart_internal_ir.DebugContext = '') List[str]

Add a batch normalization operation to the model. This version uses N-1 as the population size for calculating running variance (like PyTorch, see https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm1d.html), whereas the ONNX version uses N (see https://github.com/onnx/onnx/blob/master/docs/Operators.md#BatchNormalization).

Parameters
  • args – List of input tensor ids

  • num_outputs – The number of output tensor ids

  • epsilon – The ‘epsilon’ attribute

  • momentum – The ‘momentum’ attribute

  • debugContext – Optional debug information.

Returns

A list of normalized output tensors
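
The difference between the N-1 and N conventions can be seen in a small standalone sketch. It uses Python's statistics module on a made-up batch of four values and is not PopART code.

```python
import statistics

# Hypothetical batch of activations for a single channel.
batch = [1.0, 2.0, 3.0, 4.0]

# N-1 in the denominator: the convention this op is described as using.
var_n_minus_1 = statistics.variance(batch)

# N in the denominator: the convention described for the ONNX operator.
var_n = statistics.pvariance(batch)
```

For small batches the two estimates differ noticeably (here 5/3 against 1.25), which matters when comparing running statistics between frameworks.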

bitwiseand(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], debugContext: popart_internal_ir.DebugContext = '') str

Add a bitwise AND operation to the model.

The operation computes the bitwise AND of two integer tensors.

Parameters
  • args – Two broadcastable input tensors of type integer.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

bitwisenot(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], debugContext: popart_internal_ir.DebugContext = '') str

Add a bitwise NOT operation to the model.

The operation computes the bitwise NOT of an integer tensor.

Parameters
  • args – An input tensor of type integer.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

bitwiseor(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], debugContext: popart_internal_ir.DebugContext = '') str

Add a bitwise OR operation to the model.

The operation computes the bitwise OR of two integer tensors.

Parameters
  • args – Two broadcastable input tensors of type integer.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

bitwisexnor(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], debugContext: popart_internal_ir.DebugContext = '') str

Add a bitwise XNOR operation to the model.

The operation computes the bitwise XNOR of two integer tensors.

Parameters
  • args – Two broadcastable input tensors of type integer.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.
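
As a sketch of what XNOR means bit by bit, the snippet below computes NOT (a XOR b) in plain Python. It assumes 8-bit unsigned values purely for illustration; the integer types accepted by the op are not specified here, and bitwise_xnor_u8 is a hypothetical helper name.

```python
def bitwise_xnor_u8(a, b):
    """XNOR of two 8-bit unsigned values: NOT (a XOR b), masked to 8 bits."""
    return ~(a ^ b) & 0xFF


lhs = [0b1100, 0xFF]
rhs = [0b1010, 0x0F]
result = [bitwise_xnor_u8(a, b) for a, b in zip(lhs, rhs)]
```

Each output bit is 1 exactly where the two input bits agree.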

bitwisexor(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], debugContext: popart_internal_ir.DebugContext = '') str

Add a bitwise XOR operation to the model.

The operation computes the bitwise XOR of two integer tensors.

Parameters
  • args – Two broadcastable input tensors of type integer.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

bucketize(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], right: int = 0, debugContext: popart_internal_ir.DebugContext = '') str

Add a bucketize operation to the model.

The operation returns the indices of the buckets to which each value in the input tensor belongs. The ranges of each bucket are defined by the boundaries tensor. The returned index satisfies the following rules:

right == 1: boundaries[i-1] <= input[m][n]…[l][x] < boundaries[i]

right == 0: boundaries[i-1] < input[m][n]…[l][x] <= boundaries[i]

Parameters
  • args – A vector of tensor IDs containing [input, boundaries], where input is an N-D tensor or a scalar containing the search values, and boundaries is a 1-D tensor defining the ranges of the buckets. boundaries must contain a monotonically increasing sequence.

  • right – If 0 (default), each bucket is open on the left and closed on the right. If 1, each bucket is closed on the left and open on the right. This matches the rules above.

  • debugContext – Optional debug information.

Returns

The tensor ID of the result tensor. The result tensor has the same size and shape as the input tensor.
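
The two index rules map onto the standard library's bisect functions. The sketch below is a plain-Python reference for a 1-D list of values, not the PopART implementation, and the function name bucketize here is a local helper.

```python
from bisect import bisect_left, bisect_right


def bucketize(values, boundaries, right=0):
    """Return the bucket index of each value for sorted boundaries."""
    # right == 0: boundaries[i-1] <  v <= boundaries[i]  -> bisect_left
    # right == 1: boundaries[i-1] <= v <  boundaries[i]  -> bisect_right
    find = bisect_right if right else bisect_left
    return [find(boundaries, v) for v in values]


boundaries = [1, 3, 5, 7]
values = [0, 3, 6, 8]
left_result = bucketize(values, boundaries, right=0)
right_result = bucketize(values, boundaries, right=1)
```

The two modes differ only for values that sit exactly on a boundary, such as 3 in this example.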

call(self: popart_core.AiGraphcoreOpset1, args: List[str], num_outputs: int, callee: popart::Builder, debugContext: popart_internal_ir.DebugContext = '') List[str]

Add a call operation to the model.

This is a Poplar extension, to expose manual code re-use to the builder.

Parameters
  • args – A vector of input tensor ids.

  • callee – The subgraph to call into.

  • debugContext – Optional debug information.

Returns

A vector of tensors; the subgraph outputs.

copyvarupdate(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str

Copies a tensor to an initialised tensor (variable).

This is used to update an initialised tensor (a variable created using addInitializedInputTensor()) which retains its value between iterations, by setting the value to the value of another tensor (the updater). The purpose is to manually update the tensor in use cases for variables other than trained parameters (weights) or tensors used by other ops.

Parameters
  • args – A vector of input tensor ids: the tensor to be updated, tensor, and the tensor containing the values for the update, updater, as [tensor, updater].

  • debugContext – Optional debug information.

Returns

An alias to the updated variable. To ensure correct ordering of the updated variable, you should use this variable for any op which should operate on the updated variable.

ctcbeamsearchdecoder(self: popart_core.AiGraphcoreOpset1, args: List[str], blank: int = 0, beam_width: int = 100, top_paths: int = 1, debug_context: popart_internal_ir.DebugContext = '') List[str]

Add a connectionist temporal classification (CTC) beam search decoder operation to the model.

Calculate the most likely top_paths labels and their probabilities given the input logProbs with lengths dataLengths.

Parameters
  • args – A vector of input tensor ids. These are [logProbs, dataLengths], where logProbs is of shape [maxTime, batchSize, numClasses], and dataLengths is of shape [batchSize].

  • blank – The integer representing the blank class.

  • beam_width – The number of beams to use when decoding.

  • top_paths – The number of most likely decoded paths to return, must be less than or equal to beam_width.

  • debug_context – Optional debug information.

Returns

The names of the result tensors. These are [labelProbs, labelLengths, decodedLabels], where labelProbs is of shape [batchSize, topPaths], labelLengths is of shape [batchSize, topPaths], and decodedLabels is of shape [batchSize, topPaths, maxTime].

ctcloss(self: popart_core.AiGraphcoreOpset1, args: List[str], reduction: popart_core.ReductionType = <ReductionType.Mean: 1>, blank: int = 0, outDataType: str = 'UNDEFINED', zeroInfinity: bool = False, debugContext: popart_internal_ir.DebugContext = '') str

Add a connectionist temporal classification (CTC) loss operation to the model.

With maximum input length T, batch size N, number of classes C and maximum target length S, this op calculates the CTC loss for a logarithmised probabilities tensor with shape [T, N, C], a class target tensor with shape [N, S], an input lengths tensor [N] and a target lengths tensor [N].

Note that C includes a blank class (default=0). The probabilities tensor is padded as required. Target sequences are also padded and are populated with values less than or equal to C, not including the blank class, up to their respective target lengths. Note that target lengths cannot exceed input lengths.

Parameters
  • args – A vector of input tensor ids [log_probs, targets, input_lengths, target_lengths].

  • reduction – The type of reduction to perform on the individual losses.

  • blank – The integer representing the blank class.

  • outDataType – The data type of the output tensors. Default = UNDEFINED.

  • zeroInfinity – If true infinite losses and the associated gradients are zeroed-out. Default = false.

  • debugContext – Optional debug information

Returns

The tensor id of the result tensor.

depthtospace(self: popart_core.AiGraphcoreOpset1, args: List[str], blocksize: int, mode: str, debugContext: popart_internal_ir.DebugContext = '') str

Add a depth-to-space operation to the model.

This allows DepthToSpace_11 to be targeted from earlier opsets.

The purpose of a depth-to-space operation, also known as pixel shuffling, is to rearrange data from the depth (channels) dimension into the spatial (width and height) dimensions. It is an efficient means of learning upsampling alongside mixing convolution with bilinear interpolation and using transpose convolution.

Parameters
  • args – A vector containing a single tensor id of the input tensor of shape [N, C, H, W], where N is the batch axis, C is the channel or depth, H is the height and W is the width.

  • blocksize – The size of the blocks to be moved. If the input is [N, C, H, W] and the blocksize is B, the output will be [N, C/(B*B), H*B, W*B].

  • mode – Specifies how the data is rearranged: “DCR” (default) for depth-column-row order, or “CRD” for column-row-depth order.

  • debugContext – Optional debug information.

Returns

A tensor which is a rearrangement of the input tensor.
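
As a rough illustration of the shape arithmetic only, the sketch below computes the output shape for an [N, C, H, W] input and block size B. It assumes the usual depth-to-space rule [N, C/(B*B), H*B, W*B]. The helper name depth_to_space_shape is hypothetical and the code does not call PopART.

```python
def depth_to_space_shape(shape, blocksize):
    """Return the output shape of a depth-to-space rearrangement."""
    n, c, h, w = shape
    block_area = blocksize * blocksize
    if c % block_area != 0:
        raise ValueError("C must be divisible by blocksize * blocksize")
    # Channels shrink by B*B while each spatial dimension grows by B.
    return [n, c // block_area, h * blocksize, w * blocksize]


out_shape = depth_to_space_shape([1, 8, 2, 3], blocksize=2)
```

The total number of elements is unchanged: 1*8*2*3 equals 1*2*4*6.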

detach(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str

Add a detach operation to the model.

Parameters
  • args – A vector of input tensor ids.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

dynamicadd(self: popart_core.AiGraphcoreOpset1, args: List[str], axes: List[int], sizes: List[int], debugContext: popart_internal_ir.DebugContext = '') str

Add a dynamic add operation to the model.

Creates a copy of a tensor, tensor, with a slice tensor, slice, added at an offset position, offset. For example:

```
out = tensor
out[offset] += slice
```

Parameters
  • args – A vector of input tensor ids: [tensor, offset, slice].

  • axes – The axes along which to add the slice.

  • sizes – The size of the slice along each axis.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.
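
A minimal 1-D sketch of this behaviour in plain Python follows. It assumes that offset is the start index along a single axis and that size is the slice length along that axis. dynamic_add is a hypothetical helper, not the PopART op.

```python
def dynamic_add(tensor, offset, slice_, size):
    """Return a copy of tensor with slice_ added at [offset, offset + size)."""
    out = list(tensor)  # copy, so the input is left unchanged
    for i in range(size):
        out[offset + i] += slice_[i]
    return out


tensor = [10, 20, 30, 40, 50]
out = dynamic_add(tensor, offset=1, slice_=[1, 2], size=2)
```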

dynamicslice(self: popart_core.AiGraphcoreOpset1, args: List[str], axes: List[int], sizes: List[int], noOverlap: int = 0, debugContext: popart_internal_ir.DebugContext = '') str

Add a dynamic slice operation to the model.

Creates a new slice tensor, slice, at offset position, offset, in a tensor, tensor. For example:

```
slice = tensor[offset]
```

Parameters
  • args – A vector of input tensor ids: [tensor, offset].

  • axes – The axes along which to slice.

  • sizes – The size of the slice along each axis.

  • noOverlap – Indicates whether the slice regions overlap or not. If 1, slice regions do not overlap, otherwise they do overlap.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.
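
To show how axes and sizes select a region, here is a plain-Python sketch for a 2-D nested list sliced along one axis. It assumes offset is the start index along that axis. dynamic_slice_2d is a hypothetical helper, not the PopART op.

```python
def dynamic_slice_2d(tensor, offset, axis, size):
    """Slice `size` entries starting at `offset` along `axis` of a 2-D list."""
    if axis == 0:
        return [list(row) for row in tensor[offset:offset + size]]
    if axis == 1:
        return [row[offset:offset + size] for row in tensor]
    raise ValueError("axis must be 0 or 1 for a 2-D tensor")


tensor = [[1, 2, 3, 4],
          [5, 6, 7, 8]]
cols = dynamic_slice_2d(tensor, offset=1, axis=1, size=2)
rows = dynamic_slice_2d(tensor, offset=1, axis=0, size=1)
```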

dynamicupdate(self: popart_core.AiGraphcoreOpset1, args: List[str], axes: List[int], sizes: List[int], noOverlap: int = 0, debugContext: popart_internal_ir.DebugContext = '') str

Add a dynamic update operation to the model.

Creates a copy of a tensor, tensor, and updates the elements of the copied tensor at offset position, offset, with the elements contained in the slice tensor, slice. For example:

```
out = tensor
out[offset] = slice
```

Parameters
  • args – A vector of input tensor ids: [tensor, offset, slice].

  • axes – The axes along which to update.

  • sizes – The size of the slice along each axis.

  • noOverlap – Indicates whether the updates overlap or not. If 1, the updates do not overlap, otherwise they do overlap.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.
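
The copy-then-overwrite behaviour can be sketched in plain Python for the 1-D case. This assumes offset is the start index along a single axis. dynamic_update is a hypothetical helper, not the PopART op.

```python
def dynamic_update(tensor, offset, slice_):
    """Return a copy of tensor with slice_ written starting at offset."""
    out = list(tensor)  # the original tensor is not modified
    out[offset:offset + len(slice_)] = slice_
    return out


tensor = [10, 20, 30, 40, 50]
out = dynamic_update(tensor, offset=3, slice_=[-1, -2])
```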

dynamiczero(self: popart_core.AiGraphcoreOpset1, args: List[str], axes: List[int], sizes: List[int], debugContext: popart_internal_ir.DebugContext = '') str

Add a dynamic zero operation to the model.

Creates a copy of a tensor, tensor, with the slice at offset position, offset, set to zero. For example:

```
out = tensor
out[offset] = 0.0
```

Parameters
  • args – A vector of input tensor ids: [tensor, offset].

  • axes – The axes along which to zero elements.

  • sizes – The size of the slice along each axis.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

expm1(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str

Add an expm1 operation to the model.

This calculates the element-wise exponential of the input tensor and subtracts one: \(\exp(x) - 1\).

Parameters
  • args – A vector of input tensor ids.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.
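
For reference, the same quantity is available in Python's standard library as math.expm1. The sketch below is not PopART code. It compares math.expm1 with the naive exp(x) - 1 for a small input, where a dedicated expm1 routine typically keeps more significant digits.

```python
import math

x = 1e-10

# Naive form: exp(x) is rounded to a double close to 1 before subtracting.
naive = math.exp(x) - 1.0

# Dedicated routine: evaluates exp(x) - 1 without the intermediate rounding.
accurate = math.expm1(x)
```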

fmod(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str

Add an fmod operation to the model.

This is equivalent to the C fmod function. The result has the same sign as the dividend.

Parameters
  • args – A vector of input tensor ids.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor, which contains the element-wise remainder of division. The remainder has the same sign as the dividend.
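
Since the op is described as equivalent to the C fmod function, Python's math.fmod (which wraps the platform C function) can illustrate the sign convention. This sketch uses plain floats, not PopART tensors.

```python
import math

# (dividend, divisor) pairs covering both sign combinations.
pairs = [(5.0, 3.0), (-5.0, 3.0), (5.0, -3.0)]
fmod_results = [math.fmod(a, b) for a, b in pairs]
```

In every case the result takes the sign of the dividend (the first value of each pair).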

gelu(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str

Add a GELU operation to the model.

This is a Poplar extension.

Parameters
  • args – A vector of input tensor ids.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

geluerf(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str

Add a GELU_ERF operation to the model.

This is a Poplar extension.

Parameters
  • args – A vector of input tensor IDs.

  • debugContext – Optional debug information.

Returns

The tensor ID of the result tensor.

groupedgather(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], axis: int = 0, group_size: int = 1, debugContext: popart_internal_ir.DebugContext = '') str
groupedscatterreduce(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], axis_size: int, axis: int = -1, reduction: popart_core.ScatterReduction = <ScatterReduction.Sum: 0>, group_size: int = 1, enable_index_broadcast: int = 1, debugContext: popart_internal_ir.DebugContext = '') str
groupnormalization(self: popart_core.AiGraphcoreOpset1, args: List[str], num_groups: int, epsilon: float = 9.999999747378752e-06, debugContext: popart_internal_ir.DebugContext = '') List[str]

Add a group normalization operation to the model.

This is a Poplar extension.

The group will be created from a strided input.

Parameters
  • args – A vector of input tensor ids for input data x, scale scale, and bias bias as [x, scale, bias].

  • num_groups – The number of groups to separate the channels into.

  • epsilon – The epsilon value to use to avoid division by zero.

  • debugContext – Optional debug information.

Returns

A vector of output tensor ids for output data y, the mean mean and the variance var as [y, mean, var].

identityloss(self: popart_core.AiGraphcoreOpset1, args: List[str], reduction: popart_core.ReductionType = <ReductionType.Mean: 1>, debugContext: popart_internal_ir.DebugContext = '') str

Add an identity loss operation to the model.

Calculates the loss using the identity operator.

Parameters
  • args – A vector of input tensor ids.

  • reduction – The type of reduction to perform on the individual losses.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

incrementmod(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], increment: float, modulus: float, debugContext: popart_internal_ir.DebugContext = '') str

Add an incrementmod operation to the model.

The operation is of the form y = (x + increment) % modulus.

Parameters
  • args – A vector with a single input tensor id.

  • increment – A scalar increment

  • modulus – A scalar modulus

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.
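
The formula y = (x + increment) % modulus can be sketched in plain Python as below. The sketch assumes non-negative values, for which the choice of modulo convention does not matter. increment_mod is a hypothetical helper, not the PopART op.

```python
def increment_mod(x, increment, modulus):
    """Apply y = (x + increment) % modulus to each element."""
    return [(v + increment) % modulus for v in x]


# A counter that wraps back to zero after reaching modulus - 1.
counter = [0.0, 1.0, 2.0, 3.0]
stepped = increment_mod(counter, increment=1.0, modulus=4.0)
```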

init(*args, **kwargs)

Overloaded function.

  1. init(self: popart_core.AiGraphcoreOpset1, shape: List[int], data_type: int, init_type: int, batch_axis: int, debugContext: popart_internal_ir.DebugContext = '') -> str

Add an init operation to the model.

Parameters
  • shape – The shape of the tensor to initialise.

  • data_type – The data type to initialise tensor with. The value is the integer attribute taken from the DataType enum.

  • init_type – The mode of the tensor initialisation. The value is the integer attribute taken from the InitType enum.

  • batch_axis – Batch axis specifies the axis that the batches are split along and is a literal integer.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

  2. init(self: popart_core.AiGraphcoreOpset1, shape: List[int], data_type: int, init_type: int, debugContext: popart_internal_ir.DebugContext = '') -> str

Add an init operation to the model.

Parameters
  • shape – The shape of the tensor to initialise.

  • data_type – The data type to initialise tensor with. The value is the integer attribute taken from the DataType enum.

  • init_type – The mode of the tensor initialisation. The value is the integer attribute taken from the InitType enum.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

l1loss(self: popart_core.AiGraphcoreOpset1, args: List[str], lambda: float, reduction: popart_core.ReductionType = <ReductionType.Mean: 1>, debugContext: popart_internal_ir.DebugContext = '') str

Add an L1 loss operation to the model.

Calculates the mean absolute error between each element in the input and a zero target.

Parameters
  • args – A vector of input tensor ids.

  • lambda – The scale factor of the L1 loss.

  • reduction – The type of reduction to perform on the individual losses.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

log1p(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str

Add a log1p operation to the model.

This calculates the element-wise logarithm of the input tensor plus one: \(\log(x + 1)\).

Parameters
  • args – A vector of input tensor ids.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.
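
As a reference for the quantity being computed, the sketch below applies Python's math.log1p element-wise to a list. It is plain Python rather than PopART code, and log1p_elementwise is a hypothetical helper name.

```python
import math


def log1p_elementwise(xs):
    """Return log(x + 1) for each element."""
    return [math.log1p(v) for v in xs]


inputs = [0.0, math.e - 1.0]
outputs = log1p_elementwise(inputs)
```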

lstm(self: popart_core.AiGraphcoreOpset1, args: List[str], outputFullSequence: int = 1, debugContext: popart_internal_ir.DebugContext = '') List[str]
multiconv(self: popart_core.AiGraphcoreOpset1, args: List[List[str]], dilations: List[List[int]] = [], inDilations: List[List[int]] = [], pads: List[List[int]] = [], outPads: List[List[int]] = [], strides: List[List[int]] = [], availableMemoryProportions: List[float] = [], partialsTypes: List[str] = [], planType: Optional[str] = None, perConvReservedTiles: Optional[int] = None, cycleBackOff: Optional[float] = None, enableConvDithering: List[int] = [], debugContext: popart_internal_ir.DebugContext = '') List[str]

Add a multi-convolution operation to the model.

Using this multi-convolution API ensures that the convolutions are executed in parallel on the device.

Functionally, a multi-convolution is equivalent to a series of single convolutions. Using this multi-convolution API is always equivalent to calling the single-convolution API (conv) once for each argument.

For example, calling:

```
A0 = conv({X0, W0, B0})
A1 = conv({X1, W1})
```

is functionally equivalent to calling:

```
{A0, A1} = multiconv({{X0, W0, B0}, {X1, W1}})
```

It is possible that any two convolutions cannot be executed in parallel due to topological constraints. For example, the following:

```
B = conv({A, W0});
C = B + A
D = conv({C, W1});
```

cannot be converted to:

```
{B, D} = multiconv({{A, W0}, {C, W1}})
```

Note that it is not possible to create such a cycle by adding a multi-convolution with this API.

Calls to multiconv() are mapped to poplar::poplin::multiconv::convolution().

All input vectors must be either empty, or equal in length to the number of convolutions. Note that groups for each convolution are automatically inferred from the shapes of the data and weight inputs.

Parameters
  • args – List of tensor ids for input tensors for data, weights and biases as [data, weight, bias] for each convolution. bias is optional.

  • dilations – The dilations attributes for each convolution.

  • inDilations – The input dilations attributes for each convolution.

  • pads – The pads for each convolution.

  • outPads – The output padding for each convolution.

  • strides – The strides for each convolution.

  • availableMemoryProportions – The available memory proportions per convolution, each [0, 1).

  • partialsTypes – The partials type per convolution.

  • planType – Run convolutions in parallel or series.

  • perConvReservedTiles – The number of tiles to reserve per convolution when planning.

  • cycleBackOff – Cycle back-off proportion, [0, 1).

  • enableConvDithering – Enable convolution dithering per convolution. If true, then convolutions with different parameters will be laid out from different tiles in an effort to improve tile balance in models.

  • debugContext – Optional debug information.

Returns

A vector of tensor ids of the output tensor from each convolution.

See also

Optimising Temporary Memory Usage for Convolutions and Matmuls on the IPU for some practical examples of using availableMemoryProportion.

nearbyint(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str

Add a nearby int rounding operation to the model.

Rounds the floating-point argument to an integer value in floating-point format.

Parameters
  • args – A vector of input tensor ids.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

nllloss(self: popart_core.AiGraphcoreOpset1, args: List[str], reduction: popart_core.ReductionType = <ReductionType.Mean: 1>, ignoreIndex: Optional[int] = None, inputIsLogProbability: bool = False, debugContext: popart_internal_ir.DebugContext = '') str

Add a negative log-likelihood loss operation to the model.

Calculates the negative log likelihood (NLL) loss given a probability tensor over classes, and a target tensor containing class labels.

Parameters
  • args – A vector of input tensor ids: the probability tensor and the target tensor.

  • reduction – The type of reduction to perform on the individual losses.

  • ignoreIndex – Optional class index to ignore in loss calculation.

  • inputIsLogProbability – If true the input tensor contains log-probabilities, otherwise raw probabilities. Default = false.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

nop(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str

Add a no-op operation to the model.

Parameters
  • args – A vector of input tensor ids.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

normalize_image(self: popart_core.AiGraphcoreOpset1, args: List[str], scale: float = 1.0, debugContext: popart_internal_ir.DebugContext = '') str

Pad and normalize N three-channel image input (in (B,H,W,C) format).

The image is padded after it is transferred to IPU memory for the best performance. The normalized op uses an optimized IPU codelet to normalize the image input. The output always has four channels in the last dimension.

Parameters
  • args – A vector of input tensor IDs.

  • scale – The normalization scalar parameter.

Returns

The tensor ID of the padded and normalized image input tensor.

packedDataBlock(self: popart_core.AiGraphcoreOpset1, args: List[str], maxSequenceLengths: List[int], resultSize: int, callbackBatchSize: int, callback: popart::Builder, debugContext: popart_internal_ir.DebugContext = '') str

Add a packedDataBlock operation to the model.

This is a Poplar extension, to expose manual code re-use to the builder.

Parameters
  • args – A vector of input tensor ids.

  • callback – The subgraph to call into.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

printtensor(self: popart_core.AiGraphcoreOpset1, args: List[str], print_gradient: int = 1, debugContext: popart_internal_ir.DebugContext = '', title: str = '', summariseThreshold: int = 1000, edgeItems: int = 3, maxLineWidth: int = 75, digits: int = 8, floatFormat: int = 0, separator: str = ' ', openBracket: str = '[', closeBracket: str = ']') str

Add a print tensor operation to the model.

This is a Poplar extension.

Parameters
  • args – A vector of tensor ids to print.

  • print_gradient – Indicates whether the gradient tensor(s) associated with the input tensor(s) are also printed. If 1, the gradient tensor(s) are also printed, otherwise the gradient tensor(s) are not printed.

  • debugContext – Optional debug information.

  • title – An optional title to print.

  • summariseThreshold – (default 1000) If the number of elements of the tensor exceeds this threshold the output will be summarised. Only the edge elements will be displayed with an ellipsis indicating skipped elements. A value of 0 will disable summarisation.

  • edgeItems – (default 3) number of edge elements to include at the beginning and end when summarisation is enabled.

  • maxLineWidth – (default 75) lines longer than this limit will be split across multiple lines. A value of 0 will disable line splitting.

  • digits – (default 8) number of digits to display. For integers this limit can be exceeded if any number is large enough. For floating points this does not include the exponent. The number of digits is used in conjunction with analysis of the tensor to determine the width of each element to align all elements when printed. A value of 0 disables this analysis and each element will be printed in an unaligned format.

  • floatFormat – (default 0=Auto) determines the floating point format to use. 0=auto, 1=fixed, 2=scientific, 3=none. Automatic mode determines the appropriate format based on the data. If digits==0 this option is disregarded and the floatFormat is set to none.

  • separator – (default space) character used to delimit values.

  • openBracket – (default square bracket) character used to open a tensor.

  • closeBracket – (default square bracket) character used to close a tensor.

Returns

The tensor id of the result tensor.

reducemedian(self: popart_core.AiGraphcoreOpset1, args: List[str], axes: Optional[List[int]] = None, keepdims: int = 1, debugContext: popart_internal_ir.DebugContext = '') List[str]

Add a reducemedian operation to the model.

This method computes the median values along the specified axes. In the case of an even number of elements, the lower of the two medians is selected. By default, the input tensor is reduced over all axes. Additionally, the operation also returns the indices of found median values in the reduction axis. If reduction is performed over multiple axes, the indices are “flattened” over the reduced axes, similar to numpy.ndarray.flat. The index may not be the first occurrence of the median value found in the input tensor.

Parameters
  • args – A vector with a single input tensor id.

  • axes – The axes over which the reduction is performed.

  • keepdims – If 1, the result tensors are of equal size as the input, but with reduction axes of size 1. Otherwise, the reduction axes are squeezed and the result tensors have fewer dimensions compared to the input. Default = 1.

  • debugContext – Optional debug information.

Returns

The names of the two result tensors, one for median values and one for indices.
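The lower-median rule described above can be illustrated in plain Python. This is a minimal sketch for a 1-D input only; the helper name lower_median_with_index is hypothetical, and the way ties are resolved here is an assumption (the operation itself does not guarantee the first occurrence).

```python
def lower_median_with_index(values):
    """Return (median, index) for a 1-D list, choosing the lower median."""
    # Sort positions by value; sorted() is stable, so ties keep input order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    # For an even count, (n - 1) // 2 selects the lower of the two middle items.
    index = order[(len(values) - 1) // 2]
    return values[index], index


median, index = lower_median_with_index([7, 1, 5, 3])
print(median, index)  # 3 3
```

With four elements the two middle values are 3 and 5, and the lower one (3, at position 3 of the input) is selected.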

remainder(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str

Add a remainder operation to the model.

This is equivalent to Python’s modulo operator %. The result has the same sign as the divisor.

Parameters
  • args – A vector of input tensor ids.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor, which holds the element-wise remainder of division. The remainder has the same sign as the divisor.
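Since the operation is described as equivalent to Python's % operator, the sign rule can be checked directly in Python. The sketch below uses plain integers rather than tensors; math.fmod is shown only as a contrast, because it takes the sign of the dividend instead.

```python
import math

# (dividend, divisor) pairs covering every sign combination.
pairs = [(7, 3), (-7, 3), (7, -3), (-7, -3)]

# Python's % operator: the result takes the sign of the divisor.
results = [a % b for a, b in pairs]
print(results)  # [1, 2, -2, -1]

# math.fmod follows the sign of the dividend instead.
print(math.fmod(-7, 3))  # -1.0
```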

replicatedallreduce(self: popart_core.AiGraphcoreOpset1, args: List[str], collectiveOperator: Optional[popart::CollectiveOperator] = None, commGroup: Optional[popart_internal_ir.CommGroup] = None, debugContext: popart_internal_ir.DebugContext = '') str

DEPRECATED: Add a replicated allreduce operation to the model.

This is a Poplar extension, to expose manual code re-use to the builder.

Parameters
  • args – A vector of input tensor ids to reduce across.

  • collectiveOperator – A Graphcore Communication Library (GCL) collective operator.

  • commGroup – A GCL CommGroup parameter.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

replicatedreducescatter(self: popart_core.AiGraphcoreOpset1, args: List[str], collectiveOperator: Optional[popart::CollectiveOperator] = None, commGroup: Optional[popart_internal_ir.CommGroup] = None, debugContext: popart_internal_ir.DebugContext = '') str

Add a replicated reduce-scatter operation to the model.

This is a Poplar extension, to expose manual code re-use to the builder.

Parameters
  • args – A vector of input tensor ids to reduce across.

  • collectiveOperator – A Graphcore Communication Library (GCL) collective operator.

  • commGroup – A GCL CommGroup parameter.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

reshape(self: popart_core.AiGraphcoreOpset1, args: str, shape: List[int], debugContext: popart_internal_ir.DebugContext = '') str

Add a reshape operation to the model.

This reshapes an input tensor. Unlike the ONNX reshape op, which takes the target shape as a tensor input, this version takes the target shape as an attribute.

Parameters
  • args – The tensor id of the input tensor.

  • shape – The shape of the output tensor. The output tensor must contain the same number of elements as the input tensor.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

reverse(self: popart_core.AiGraphcoreOpset1, args: List[str], dimensions: List[int], debugContext: popart_internal_ir.DebugContext = '') str

Add a reverse operator to the model.

This reverses or flips the tensor along the specified dimensions.

Parameters
  • args – A vector of input tensor ids.

  • dimensions – The dimensions along which to reverse the tensor. If this is empty then this is equivalent to the identity operator.

  • debugContext – Optional debug information.

Returns

The tensor id of the reversed tensor.
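The effect of the dimensions parameter can be sketched on nested Python lists. This is an illustration of the semantics described above, not the PopART implementation; the helper name reverse and the nested-list representation are assumptions made for the example.

```python
def reverse(tensor, dimensions):
    """Flip a nested-list tensor along each listed dimension."""
    def flip(node, depth):
        if not isinstance(node, list):
            return node  # scalar leaf: nothing to flip
        children = [flip(child, depth + 1) for child in node]
        # Reverse this level only if its depth is one of the requested dimensions.
        return children[::-1] if depth in dimensions else children
    return flip(tensor, 0)


x = [[1, 2, 3], [4, 5, 6]]
print(reverse(x, [0]))  # [[4, 5, 6], [1, 2, 3]]
print(reverse(x, [1]))  # [[3, 2, 1], [6, 5, 4]]
print(reverse(x, []))   # [[1, 2, 3], [4, 5, 6]]
```

An empty dimensions list leaves the input unchanged, matching the identity behaviour noted in the parameter description.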

round(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str

Add a rounding operation to the model.

This allows Round_11 to be targeted from earlier opsets.

Parameters
  • args – A vector of input tensor ids.

  • debugContext – Optional debug information.

Returns

The normalized output tensor ids.

scale(self: popart_core.AiGraphcoreOpset1, args: List[str], scale: float, debugContext: popart_internal_ir.DebugContext = '') str

Add a scale operation to the model.

This is a Poplar extension.

Parameters
  • args – A vector of input tensor ids.

  • scale – The scale to apply.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

scaledadd(self: popart_core.AiGraphcoreOpset1, args: List[str], scale0: float = 1.0, scale1: float = 1.0, debugContext: popart_internal_ir.DebugContext = '') str

Add a scaled add operation to the model.

The scaled add operation takes the form:

    X = scale0 * T0 + scale1 * T1

where scale0 is the scale factor to be applied to tensor T0 and scale1 is the scale factor to be applied to tensor T1.

Parameters
  • args – A vector of input tensor ids: [T0, T1, scale0, scale1].

  • scale0 – The scale to apply (if no scale0 tensor is supplied).

  • scale1 – The scale to apply (if no scale1 tensor is supplied).

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.
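The formula above can be written out for flat Python lists as a reference. This is a sketch with scalar scale factors only; the helper name scaled_add is hypothetical, and the length check is an assumption of the example rather than documented behaviour.

```python
def scaled_add(t0, t1, scale0=1.0, scale1=1.0):
    """Element-wise X = scale0 * T0 + scale1 * T1 for flat lists."""
    if len(t0) != len(t1):
        raise ValueError("T0 and T1 must have the same number of elements")
    return [scale0 * a + scale1 * b for a, b in zip(t0, t1)]


print(scaled_add([1.0, 2.0], [10.0, 20.0], scale0=2.0, scale1=0.5))  # [7.0, 14.0]
```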

scatterreduce(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], axis_size: int, axis: int = -1, reduction: popart_core.ScatterReduction = <ScatterReduction.Sum: 0>, enable_index_broadcast: int = 1, debugContext: popart_internal_ir.DebugContext = '') str

Add a scatterreduce operation to the model.

Reduces all the values from the source tensor src at the indices specified along the given axis by index. In some frameworks this is also known as a split-apply-combine operation as well as a reduce or aggregate by key. In this analogy the src input is the data we are splitting and the indices define the groups for the reduction operation.

In pseudocode the operator can be expressed as:

    for i in range(axis_size):
        output[i] = reduce(src[index == i])

where the looping over output indices is implicitly handled by Poplar.

Parameters
  • args – A vector of tensor ids as [src, index].

  • axis_size – The size of the reduced axis.

  • axis – The axis to reduce along. Default = -1.

  • reduction – The type of reduction to apply. Default = ScatterReduction::Sum.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.
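A runnable version of the pseudocode, restricted to 1-D inputs and the Sum reduction, might look as follows. The helper name scatter_reduce_sum is hypothetical, and leaving untouched output positions at 0 is an assumption of this sketch.

```python
def scatter_reduce_sum(src, index, axis_size):
    """Sum the values of src into output slots selected by index."""
    output = [0] * axis_size  # assumed initial value for slots with no source
    for value, i in zip(src, index):
        output[i] += value
    return output


src = [1, 2, 3, 4, 5]
index = [0, 2, 0, 1, 2]
print(scatter_reduce_sum(src, index, axis_size=4))  # [4, 4, 7, 0]
```

Here index plays the role of the group key: values 1 and 3 fall in group 0, value 4 in group 1, and values 2 and 5 in group 2.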

sequenceslice(self: popart_core.AiGraphcoreOpset1, args: List[str], zeroUnused: int = 0, debugContext: popart_internal_ir.DebugContext = '') str

Slice a 2D tensor based on offsets.

The outermost dimension is sliced. For the following:
  • source is the source tensor.

  • destination is the destination tensor.

  • N is the number of elements to copy.

  • sourceOffset is the first element read from the source tensor.

  • destinationOffset is the first element written to in the destination

    tensor.

Then, for each entry in N, sourceOffset and destinationOffset:

    destination[destinationOffset:destinationOffset+N][…] = source[sourceOffset:sourceOffset+N][…]

Entries after the first N==0 may be ignored. Unreferenced elements of destination are zeroed if zeroUnused is set. The same output element should not be written by multiple inputs.

source and destination must have rank greater than or equal to 2. The outer dimension is sliced; the product of the inner dimensions must match. sourceOffset, destinationOffset and N must be 1-dimensional and of the same size. For example:

    N = [1, 1, 1]
    sourceOffset = [0, 2, 4]
    destinationOffset = [0, 1, 2]

Parameters
  • args – A vector of input tensor ids for the following tensors: [source, destination, N, sourceOffset, destinationOffset].

  • zeroUnused – Determines whether to zero unreferenced destination elements. If 1, the unreferenced elements are zeroed, otherwise they are not zeroed.

  • debugContext – Optional debug information.
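The copy rule can be traced with a small pure-Python model that uses the example values of N, sourceOffset and destinationOffset given above. This is a sketch on 2-D nested lists; the helper name sequence_slice and the sample tensors are assumptions made for illustration.

```python
def sequence_slice(source, destination, n, source_offset, destination_offset,
                   zero_unused=False):
    """Copy row ranges of source into destination, one range per entry."""
    width = len(destination[0])
    result = [list(row) for row in destination]
    written = set()
    for count, src, dst in zip(n, source_offset, destination_offset):
        if count == 0:
            break  # entries after the first N == 0 may be ignored
        for k in range(count):
            result[dst + k] = list(source[src + k])
            written.add(dst + k)
    if zero_unused:
        # Zero every destination row that no entry wrote to.
        for row in range(len(result)):
            if row not in written:
                result[row] = [0] * width
    return result


source = [[10, 11], [20, 21], [30, 31], [40, 41], [50, 51]]
destination = [[9, 9], [9, 9], [9, 9], [9, 9]]
out = sequence_slice(source, destination, [1, 1, 1], [0, 2, 4], [0, 1, 2],
                     zero_unused=True)
print(out)  # [[10, 11], [30, 31], [50, 51], [0, 0]]
```

Rows 0, 2 and 4 of the source land in rows 0, 1 and 2 of the destination; the fourth destination row is never referenced, so it is zeroed because zero_unused is set.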

shapeddropout(self: popart_core.AiGraphcoreOpset1, args: List[str], shape: List[int], ratio: float = 0.5, debugContext: popart_internal_ir.DebugContext = '') str

Add a shaped dropout operation to the model.

Applies a shaped dropout to the input tensor. This operator requires a shape parameter that is used to define the shape of the dropout mask so that strongly correlated features in the input tensor can be preserved. The provided shape must be broadcastable to the input tensor. Note that this operation targets the poprand library function of the same name.

Parameters
  • args – A vector of input tensor ids.

  • shape – The shape of the dropout mask. This must be broadcastable to the input.

  • ratio – The probability of dropping an input feature. Default = 0.5.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

slice(self: popart_core.AiGraphcoreOpset1, args: List[str], ends: List[int], starts: List[int], axes: List[int], debugContext: popart_internal_ir.DebugContext = '') str

Add a slice to the model.

This version of slice uses the starts, ends and axes attributes rather than tensor inputs. This reduces the number of ops as constant tensors are treated as ops while attributes are not.

Parameters
  • args – A vector of input tensor ids.

  • ends – The ends attribute.

  • starts – The starts attribute.

  • axes – The axes attribute.

  • debugContext – Optional debug information.

Returns

The normalized output tensor id.

sort(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], axis: int = -1, descending: int = 0, stable: int = 0, debugContext: popart_internal_ir.DebugContext = '') List[str]

Add a sort operation to the model.

Parameters
  • args – A vector with a single input tensor id.

  • axis – The dimension to sort along.

  • descending – If ‘1’ then the elements are sorted in descending order by value.

  • stable – If ‘1’ then the sorting routine becomes stable, preserving the order of equivalent elements.

  • debugContext – Optional debug information.

Returns

A vector of (values, indices) is returned, where the values are the sorted values and indices are the indices of the elements in the original input tensor.
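The relationship between the two outputs can be shown on a 1-D Python list. This is a sketch only: the helper name sort_with_indices is hypothetical, and it always behaves like a stable sort because Python's sorted() is stable.

```python
def sort_with_indices(values, descending=False):
    """Return (sorted values, indices into the original list)."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=descending)
    return [values[i] for i in order], order


print(sort_with_indices([3, 1, 2]))                      # ([1, 2, 3], [1, 2, 0])
print(sort_with_indices([3, 1, 3, 2], descending=True))  # ([3, 3, 2, 1], [0, 2, 3, 1])
```

In the second call the two equal values keep their original relative order (index 0 before index 2), which is what a stable sort preserves.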

splinebasis(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], degree: int, debugContext: popart_internal_ir.DebugContext = '') List[str]

Add a splinebasis operation to the model.

The operation returns two outputs: the B-spline basis function coefficients and the weight indices for each spline coefficient.

Parameters
  • args – A vector of tensor ids containing [pseudo, kernel_size, is_open_spline] where:

    • pseudo is a 2-D tensor with pseudo coordinates, of shape numEdges * numDims.

    • kernel_size is a 1-D tensor containing the kernel size at each dimension of the edge’s pseudo coordinates.

    • is_open_spline is a 1-D tensor that encodes, for each dimension, whether an open or closed B-spline basis must be used.

  • degree – B-spline basis function degree.

Returns

Basis and weightIndex tensors, both of shape numEdges * numSplines. The basis tensor contains the B-spline basis function coefficients. The weightIndex tensor contains the weight indices for each spline coefficient.

splineweighting(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], debugContext: popart_internal_ir.DebugContext = '') str

Add a splineweighting operation to the model.

The operation returns features weighted by a continuous B-spline kernel function.

Parameters
  • args – A vector of tensor ids containing [input, weight, basis, weightIndex] where:

    • input is a 2-D (numEdges * numInputChannels) tensor with input features.

    • weight is a 3-D (numEdges * numInputChannels * numOutputChannels) tensor containing weights for B-spline functions.

    • basis is a 2-D (numEdges * numSplines) tensor with the B-spline basis produced by the splinebasis op.

    • weightIndex is a 2-D (numEdges * numSplines) tensor with weight indices produced by the splinebasis op.

Returns

Tensor of shape (numEdges * numOutputChannels) containing features weighted by a continuous B-spline kernel function.

subsample(self: popart_core.AiGraphcoreOpset1, args: List[str], strides: List[int], debugContext: popart_internal_ir.DebugContext = '') str

Add a sub-sample operation to the model.

This is a Poplar extension.

If multiple tensors are provided, the strides will be applied to them all.

Parameters
  • args – A vector of tensor ids to sub-sample.

  • strides – The strides to use.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

swish(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], debugContext: popart_internal_ir.DebugContext = '') str

Add a swish operation to the model.

The operation computes the swish activation function, also known as the SiLU activation.

Parameters
  • args – A vector with a single input tensor id.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.
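Swish (SiLU) is commonly defined as x * sigmoid(x). The scalar sketch below illustrates that definition in plain Python; it is not the PopART kernel, and the helper name swish is used only for the example.

```python
import math


def swish(x):
    """Swish / SiLU: x * sigmoid(x), written as x / (1 + exp(-x))."""
    # Note: math.exp(-x) can overflow for very large negative x.
    return x / (1.0 + math.exp(-x))


print(swish(0.0))  # 0.0
```

For large positive inputs the result approaches x, and for moderately negative inputs it approaches 0 from below.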

tensorremap(self: popart_core.AiGraphcoreOpset1, args: List[str], remap_type: int = 0, debugPrefix: popart_internal_ir.DebugContext = '') str

13.6. Data flow

class popart.AnchorReturnTypeId

Members:

Final : Only return the tensor value for the last micro batch of the Session::run call for each replica. The buffer shape required for this anchor in IStepIO is [replicationFactor, <anchorTensorShape>] (with dimensions of size 1 removed).

EveryN : Return the tensor value for every N-th global batch for each replica and for all accumulation steps in that global batch. Note that the value of N is captured by AnchorReturnType. The buffer shape required for this anchor in IStepIO is [batchesPerStep / N, accumulationFactor, replicationFactor, <anchorTensorShape>] (with dimensions of size 1 removed).

All : Return the tensor value for all micro batches for each replica. The buffer shape required for this anchor in IStepIO is [batchesPerStep, accumulationFactor, replicationFactor, <anchorTensorShape>] (with dimensions of size 1 removed).

Sum : Return one tensor value for each replica, doing a sum reduction over the batchesPerStep and accumulationFactor dimensions. The buffer shape required for this anchor in IStepIO is [replicationFactor, <anchorTensorShape>] (with dimensions of size 1 removed).

property name
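The buffer shapes listed for each member can be computed with a small helper. This is a sketch under stated assumptions: the function name anchor_buffer_shape is hypothetical, return types are passed as plain strings, and "dimensions of size 1 removed" is taken to apply only to the leading batchesPerStep, accumulationFactor and replicationFactor dimensions, not to the anchor tensor's own shape.

```python
def anchor_buffer_shape(return_type, anchor_shape, batches_per_step=1,
                        accumulation_factor=1, replication_factor=1, n=1):
    """Return the IStepIO buffer shape for an anchor, as a list of ints."""
    if return_type in ("Final", "Sum"):
        leading = [replication_factor]
    elif return_type == "EveryN":
        leading = [batches_per_step // n, accumulation_factor, replication_factor]
    elif return_type == "All":
        leading = [batches_per_step, accumulation_factor, replication_factor]
    else:
        raise ValueError(f"unknown return type: {return_type}")
    # Assumption: only leading dimensions of size 1 are dropped.
    return [d for d in leading if d != 1] + list(anchor_shape)


print(anchor_buffer_shape("All", [2, 3], batches_per_step=4))    # [4, 2, 3]
print(anchor_buffer_shape("EveryN", [2, 3], batches_per_step=4,
                          replication_factor=2, n=2))            # [2, 2, 2, 3]
print(anchor_buffer_shape("Final", [2, 3]))                      # [2, 3]
```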
class popart.ExchangeStrategy

Enum type to specify an exchange strategy

JustInTime:

.- outer loop -------------.
|.- inner loop -----------.|
|| load - compute - store ||
|'------------------------'|
'--------------------------'

OverlapInnerLoop:
  • Boxes denote subgraphs / subgraph Ops / loops.

  • Inputs/outputs are loop carried in order.

.- outer loop ----------------------------------------.
|                  .- inner loop -.                   |
| load - compute - | - store      |                   |
|           load - | - compute -- | - store           |
|                  |   load ----- | - compute - store |
|                  '--------------'                   |
'-----------------------------------------------------'
         ^^^^^^^       ^^^^^^^        ^^^^^^^
         overlap       overlap        overlap

OverlapLoops:
  • Boxes denote subgraphs / subgraph Ops / loops.

  • Numbers on boxes are matching subgraph/loop inputs and outputs.

  • Overlap indicators indicate compute & load/store pairs overlapping in time.

[Diagram: an inner loop nested in an outer loop, with load and store steps overlapping compute steps before, inside and after each loop.]

OverlapStep: Not supported yet.

Members:

JustInTime : Copy tensor when required

OverlapInnerLoop : Preload values in previous inner loop iteration for the next iteration

OverlapLoops : Preload values in the previous loop iteration for the next iteration (implies OverlapInnerLoop)

OverlapStep : Preload values in the previous host training step for next step (implies OverlapLoops) - not supported yet

property name
class popart.AnchorReturnType
exchangeStrategy(self: popart_core.AnchorReturnType) popart_core.ExchangeStrategy
id(self: popart_core.AnchorReturnType) popart_core.AnchorReturnTypeId
rp(self: popart_core.AnchorReturnType) int
tileSet(self: popart_core.AnchorReturnType) popart_core.TileSet
class popart.DataFlow
anchors(self: popart_core.DataFlow) List[str]
art(self: popart_core.DataFlow, arg0: str) popart_core.AnchorReturnType
batchesPerStep(self: popart_core.DataFlow) int
isAnchored(self: popart_core.DataFlow, arg0: str) bool
nAnchors(self: popart_core.DataFlow) int
setBatchesPerStep(self: popart_core.DataFlow, arg0: int) None
class popart.InputSettings
exchangeStrategy(self: popart_core.InputSettings) popart_core.ExchangeStrategy
replicatedStreamMode(self: popart_core.InputSettings) popart_core.ReplicatedStreamMode
tileSet(self: popart_core.InputSettings) popart_core.TileSet
class popart.ReplicatedStreamMode

Members:

Replicate :

Broadcast :

property name

13.7. Device manager

class popart.DeviceType

Members:

IpuModel : Use the Poplar IPU Model for graph compilation and execution. The IPU Model will simulate the behaviour of the IPU hardware. It will not completely implement every aspect of a real IPU. (Default).

Cpu : Use CPU for graph compilation and execution.

Ipu : Use IPU for graph execution.

OfflineIpu : Compile graph for later execution. This can be done even if IPUs are not present. Offline graph compilation is also useful for verifying memory constraints.

Sim : [For Graphcore internal use only] Use a simulator for graph compilation and execution.

property name
class popart.DeviceConnectionType

Members:

Always : Attach to the IPU from the start (Default).

OnDemand : Wait until the compilation is complete and the executable is ready to be run before attaching to the IPU.

Never : Never try to attach to an IPU. This is useful for offline compilation (DeviceType::OfflineIpu). Trying to run an executable will throw an error.

property name
class popart.SyncPattern

Controls synchronisation in multi-IPU systems.

Members:

Full : Require all IPUs to synchronise on every communication between IPUs or between IPUs and host (Default).

SinglePipeline : Allow IPUs to synchronise with the host independently, without having to synchronise with each other. This permits any one IPU to perform host IO while other IPUs are processing data.

ReplicaAndLadder : Allow an IPU group to communicate with the host without requiring synchronisation between groups. This permits multiple IPU groups to alternate between performing host IO and computation.

property name
class popart.DeviceInfo
attach(self: popart_core.DeviceInfo) bool
detach(self: popart_core.DeviceInfo) None
tryAttachUntilTimeout(self: popart_core.DeviceInfo) bool
class popart.DeviceManager
acquireAvailableDevice(self: popart_core.DeviceManager, numIpus: int = 1, tilesPerIpu: int = 0, pattern: popart_core.SyncPattern = <SyncPattern.Full: 0>, connectionType: popart_core.DeviceConnectionType = <DeviceConnectionType.Always: 0>, selectionCriterion: popart_core.DeviceSelectionCriterion = <DeviceSelectionCriterion.First: 0>) popart::DeviceInfo

Finds an available hardware device with the specified number of IPUs. This method will attach to the device if connectionType is equal to DeviceConnectionType::Always. Throws an error if there are fewer than numIpus IPUs available.

Parameters
  • numIpus – The number of IPUs on the device. (Default: 1).

  • tilesPerIpu – The number of tiles per IPU. An input of 0 will match any number. (Default: 0).

  • pattern – The setting for when to synchronise in a multi-IPU system. (Default: SyncPattern::Full).

  • connectionType – The connection type, for deciding when to attach to the device. (Default: DeviceConnectionType::Always).

  • selectionCriterion – How to select a device from the list of valid selections. (Default: DeviceSelectionCriterion::First).

Returns

A device, which can be used with a session.

acquireDeviceById(self: popart_core.DeviceManager, id: int, pattern: popart_core.SyncPattern = <SyncPattern.Full: 0>, connectionType: popart_core.DeviceConnectionType = <DeviceConnectionType.Always: 0>) popart::DeviceInfo
Allocates the hardware device by ID. This ID can be found by running gc-info -l. This method will attach to the device if connectionType is equal to DeviceConnectionType::Always.

Parameters
  • id – The ID of the IPU to be used.

  • pattern – The setting for when to synchronise in a multi-IPU system. (Default: SyncPattern::Full).

  • connectionType – The connection type, for deciding when to attach to the device. (Default: DeviceConnectionType::Always).

Returns

A device, which can be used with a session.

createCpuDevice(self: popart_core.DeviceManager) popart::DeviceInfo
createIpuModelDevice(self: popart_core.DeviceManager, arg0: dict) popart::DeviceInfo
createOfflineIPUDevice(self: popart_core.DeviceManager, opts: dict) popart::DeviceInfo
createOfflineIpuFromDeviceInfo(self: popart_core.DeviceManager, arg0: popart::DeviceInfo) popart::DeviceInfo
createOfflineIpuFromSystemString(self: popart_core.DeviceManager, arg0: str, arg1: int) popart::DeviceInfo
createSimDevice(self: popart_core.DeviceManager, arg0: dict) popart::DeviceInfo
enumerateDevices(self: popart_core.DeviceManager, pattern: popart_core.SyncPattern = <SyncPattern.Full: 0>, numIpus: int = 1, deviceType: popart_core.DeviceType = <DeviceType.Ipu: 2>, connectionType: popart_core.DeviceConnectionType = <DeviceConnectionType.Always: 0>, tilesPerIPU: int = 0) List[popart::DeviceInfo]

Get the list of all devices with the required criteria.

Parameters
  • pattern – The setting for when to synchronise in a multi-IPU system. (Default: SyncPattern::Full).

  • numIpus – The number of IPUs required. (Default: 1).

  • deviceType – The type of the device required. (Default: DeviceType::Ipu).

  • connectionType – The setting for when to connect to the device. (Default: DeviceConnectionType::Always).

  • tilesPerIPU – The number of tiles per IPU required. (Default: 0).

Returns

The list of devices with the required criteria.

setOnDemandAttachTimeout(self: popart_core.DeviceManager, attachTimeout: int) None

If the first attempt to attach to a device fails, the attach timeout set here is the length of time (in seconds) that the DeviceManager will keep trying to attach. Note: this only takes effect when attaching with a connection type of DeviceConnectionType::OnDemand.

Parameters

attachTimeout – The attach timeout in seconds.

tryAcquireAvailableDevice(self: popart_core.DeviceManager, numIpus: int = 1, tilesPerIpu: int = 0, pattern: popart_core.SyncPattern = <SyncPattern.Full: 0>, connectionType: popart_core.DeviceConnectionType = <DeviceConnectionType.Always: 0>, selectionCriterion: popart_core.DeviceSelectionCriterion = <DeviceSelectionCriterion.First: 0>) popart::DeviceInfo

Finds an available hardware device, with the specified number of IPUs. This method will attach to the device if connectionType is equal to DeviceConnectionType::Always. This method is suitable when polling for an available device when resources are constrained.

Parameters
  • numIpus – The number of IPUs on the device. (Default: 1).

  • tilesPerIpu – The number of tiles per IPU. An input of 0 will match any number. (Default: 0).

  • pattern – The setting for when to synchronise in a multi-IPU system. (Default: SyncPattern::Full).

  • connectionType – The setting for when to connect to the device. (Default: DeviceConnectionType::Always).

  • selectionCriterion – The method for selecting a device from the list of valid selections. (Default: DeviceSelectionCriterion::First).

Returns

A device, which can be used with a session. If no device is acquired, a nullptr is returned.
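The polling use case mentioned above can be sketched as a retry loop around any acquisition callable. This is a minimal sketch: the helper name poll_for_device and its parameters are hypothetical, and it assumes that the Python binding returns None where the C++ API returns a nullptr. A stand-in callable is used so that the example runs without IPU hardware.

```python
import time


def poll_for_device(try_acquire, attempts=5, delay_seconds=1.0):
    """Call try_acquire until it returns a device or attempts run out."""
    for attempt in range(attempts):
        device = try_acquire()
        if device is not None:
            return device
        if attempt + 1 < attempts:
            time.sleep(delay_seconds)  # wait before the next attempt
    return None


# Stand-in for a real call such as:
#   lambda: popart.DeviceManager().tryAcquireAvailableDevice(numIpus=1)
responses = iter([None, None, "device-0"])
device = poll_for_device(lambda: next(responses), attempts=5, delay_seconds=0.0)
print(device)  # device-0
```

Passing the acquisition step as a callable keeps the retry policy separate from the device criteria, so the same loop could wrap tryAcquireDeviceById as well.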

tryAcquireDeviceById(self: popart_core.DeviceManager, id: int, pattern: popart_core.SyncPattern = <SyncPattern.Full: 0>, connectionType: popart_core.DeviceConnectionType = <DeviceConnectionType.Always: 0>) popart::DeviceInfo
Allocates the hardware device by ID. This ID can be found by running gc-info -l. This method will try to attach to the device if connectionType is equal to DeviceConnectionType::Always. This method is suitable when polling for an available device when resources are constrained.

Parameters
  • id – The ID of the IPU to be used.

  • pattern – The setting for when to synchronise in a multi-IPU system. (Default: SyncPattern::Full).

  • connectionType – The connection type, for deciding when to attach to the device. (Default: DeviceConnectionType::Always).

Returns

A device, which can be used with a session. If no device is acquired, a nullptr is returned.

13.8. Ops

13.8.1. Op definition for PopART IR

class popart.RecomputeType

Define the type of recomputation.

Members:

Undefined : Default value if RecomputeType has not been set.

Checkpoint : Do not recompute. Outputs from the op are kept from the forward pass.

Recompute : Recompute operation.

Recomputed : For explicit recomputation, this marks a cloned operation that had RecomputeType::Recompute set. After cloning, the original op is changed to RecomputeType::Checkpoint, and the cloned op is changed to Recomputed.

property name
class popart.OperatorIdentifier
class popart.OpDefinition

13.9. Patterns

class popart.Patterns

Bases: pybind11_object

enablePattern(self: popart_core.Patterns, arg0: str, arg1: bool) popart_core.Patterns
enableRuntimeAsserts(self: popart_core.Patterns, arg0: bool) popart_core.Patterns
isPatternEnabled(self: popart_core.Patterns, arg0: str) bool

13.10. Utility classes

13.10.1. Writer

Framework independent functionality for driving PopART.

class popart.writer.NetWriter(inNames, outNames, optimizer, dataFlow, inputShapeInfo)

Base class, to be inherited once per framework.

Parameters
  • inNames – A list (in order) of all the inputs to the ONNX Model.

  • outNames – The names of the outputs of the ONNX Model.

  • optimizer – An optimizer (ConstSGD, SGD, etc) or None if in inference mode.

  • anchors – Only relevant if in training mode: the names of tensors which must be computed and returned. If not in training mode, then outputs of forward are the (only) tensors to return.

  • dataFlow – Configuration for the data feeds and fetches.

  • inputShapeInfo – For every loss stream input and standard input: the shape, ONNX DataType and how to get data.

infer(inputsMap)

Perform batchesPerStep inference steps.

This function only needs to be implemented by frameworks which will be used to verify PopART. See torchwriter.py for an example implementation.

saveModel(filename)

Save the model.

To be implemented once per framework: the framework-specific details of generating the ONNX model and writing it to file.

train(inputsMap)

Perform batchesPerStep training steps.

This function only needs to be implemented by frameworks which will be used to verify PopART. See torchwriter.py for an example implementation.

13.10.2. Error handling

class popart.OutOfMemoryException(e)

Represent out of memory exceptions that occur during runtime.

Parameters

e (popart_exception) –

Return type

None

getProfilePath()

Get the absolute path of the profile file.

The profile file is named profile.pop and contains full details of the exception.

Returns

The absolute path of profile.pop, or an empty string if the file does not exist.

Return type

str

getSummaryReport()

Get the summary report.

Returns

The summary report string.

Return type

str

13.10.3. Debug context

class popart.DebugContext
class popart.DebugInfo
getId(self: popart_internal_ir.DebugInfo) int
setValue(self: popart_internal_ir.DebugInfo, name: str, value: popart_internal_ir.ProfileValue) bool

13.10.4. Input shape information

class popart.InputShapeInfo
add(self: popart_core.InputShapeInfo, arg0: str, arg1: popart_internal_ir.TensorInfo) None
get(self: popart_core.InputShapeInfo, arg0: str) popart_internal_ir.TensorInfo
has(self: popart_core.InputShapeInfo, arg0: str) bool

13.10.5. Type definitions

13.10.6. Enums

class popart.CommGroupType

PopART equivalent of GCL CommGroupType. Each of these enumeration constants has a corresponding GCL CommGroupType value.

Members:

All : All replicas viewed as one group, replica group size is ignored.

Consecutive : Groups are consecutive in replica.

If there are N replicas denoted {0, … N-1} and group size is k, then there are N/k groups of size k:

{0, 1, … k-1}, {k, … 2k-1} … {N-k, … N-1}

Orthogonal : Groups are sliced orthogonal to the replica ordering.

If there are N replicas denoted {0, … N-1} and group size is k, then there are m = N/k groups of size k:

{0, m, 2m, …}, {1, m+1, 2m+1, …} … {m-1, 2m-1, … N-1}

Ungrouped : Each replica is in its own group, replica group size is ignored.

property name
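The Consecutive and Orthogonal groupings can be enumerated with a short sketch. The helper names are hypothetical, and the sketch assumes that the group size k divides the number of replicas N exactly.

```python
def consecutive_groups(n, k):
    """Groups of k consecutive replicas: {0..k-1}, {k..2k-1}, ..."""
    if n % k != 0:
        raise ValueError("group size must divide the number of replicas")
    return [list(range(start, start + k)) for start in range(0, n, k)]


def orthogonal_groups(n, k):
    """m = n / k groups of size k, each stepping through replicas by m."""
    if n % k != 0:
        raise ValueError("group size must divide the number of replicas")
    m = n // k
    return [list(range(first, n, m)) for first in range(m)]


print(consecutive_groups(8, 2))  # [[0, 1], [2, 3], [4, 5], [6, 7]]
print(orthogonal_groups(8, 2))   # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

With 8 replicas and a group size of 2 there are 4 groups in both cases; only the assignment of replicas to groups differs.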
class popart.DeviceSelectionCriterion

Members:

First : Select the first device available. (Default).

Random : Select a device randomly from all those available.

property name
class popart.InitType

Members:

NoInit :

Zero :

property name
class popart.ScatterReduction

Members:

Sum

Max

Min

Mul

NoReduction

property name
class popart.TensorRemapType

Members:

FwdBwdReverse

FwdBwd

Fwd

property name