14. PopART C++ API

This chapter describes the PopART C++ API.

14.1. Sessions

#include <popart/session.hpp>
class Session

Session is a runtime instance that provides an interface for executing ONNX graphs on IPU hardware.

Subclassed by popart::InferenceSession, popart::TrainingSession

Public Functions

virtual ~Session() = 0

Destructor for the Session class.

std::vector<uint32_t> getRNGState()

Get state of the random number generator.

void setRNGState(const std::vector<uint32_t>)

Set state of the random number generator.

void setRandomSeed(uint64_t seedValue)

Set the value of the random number generator seed.

This method explicitly seeds all random operations. Additionally, this method derives a new state for the random number generator (RNG) from the seed and sets it on the device. This RNG state is used to resolve stochastic rounding. Note that to deterministically store and restore the combined random state for a session, do the following:

C++:

// Store random state (session s0).
auto seed = s0.getRandomSeed();
auto rngState = s0.getRNGState();

// Restore random state (session s1).
s1.setRandomSeed(seed);   // <-- affects RNG state, order important
s1.setRNGState(rngState);

Python:

# Store random state (session s0).
seed = s0.getRandomSeed()
rngState = s0.getRNGState()

# Restore random state (session s1).
s1.setRandomSeed(seed)   # <-- affects RNG state, order important
s1.setRNGState(rngState)

Parameters

seedValue – The value of the seed.
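
The reason the restore order matters can be illustrated with a toy model. The sketch below is not the PopART implementation: ToySession, restoreInOrder and restoreWrongOrder are hypothetical names, and the way the RNG state is derived from the seed is invented purely for illustration. It only mirrors the documented behaviour that setRandomSeed() also overwrites the RNG state.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a session. NOT the PopART implementation.
struct ToySession {
  std::uint64_t seed = 0;
  std::vector<std::uint32_t> rngState{0, 0};

  // Mirrors the documented behaviour: seeding also derives a new RNG state.
  // The derivation used here is made up for the example.
  void setRandomSeed(std::uint64_t s) {
    seed = s;
    rngState = {static_cast<std::uint32_t>(s),
                static_cast<std::uint32_t>(s >> 32)};
  }
  std::uint64_t getRandomSeed() const { return seed; }
  void setRNGState(const std::vector<std::uint32_t> &st) { rngState = st; }
  std::vector<std::uint32_t> getRNGState() const { return rngState; }
};

// Documented order: seed first, then RNG state.
inline void restoreInOrder(ToySession &s, std::uint64_t seed,
                           const std::vector<std::uint32_t> &st) {
  s.setRandomSeed(seed);
  s.setRNGState(st);
}

// Wrong order: the seed call overwrites the RNG state that was just set.
inline void restoreWrongOrder(ToySession &s, std::uint64_t seed,
                              const std::vector<std::uint32_t> &st) {
  s.setRNGState(st);
  s.setRandomSeed(seed);
}
```

In this toy model, restoring in the wrong order leaves the seed correct but silently replaces the saved RNG state with one derived from the seed.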

uint64_t getRandomSeed()

Get the value of the random number generator seed.

Calling setRandomSeed() with this value (at a later stage) reinstates the random state logic that seeds random operations.

Returns

The value used to seed current random operations.

void compileAndExport(const std::string &filename)

Compile the graph and export it to a file.

This method will first create a poplar::Graph and compile the poplar::Executable. Next, it will export the executable and PopART metadata to the file. The exported file will be in the PopEF format. This means that the file can be used to run inference using the Triton Inference Server with the Graphcore Triton backend. See the Poplar Triton Backend User Guide for more information.

This method automatically creates folders as needed if filename is located in a folder which does not exist.

Parameters

filename – The name of the file where the compiled executable and metadata will be saved.

void compileAndExport(std::ostream &out)

Compile the graph and export it to a stream.

This method will first create a poplar::Graph and compile the poplar::Executable. Next, it will export the executable and PopART metadata to the stream. The data will be streamed in the PopEF format. This means that the streamed data can be used to run inference using the Triton Inference Server with the Graphcore Triton backend. See the Poplar Triton Backend User Guide for more information.

Parameters

out – The stream that the compiled executable and metadata will be written to.

void saveExecutableToFile(const std::string &filename)

Save a compiled graph to a file.

The file will be in the PopEF format. This means that the file can be used to run inference using the Triton Inference Server with the Graphcore Triton backend. See the Poplar Triton Backend User Guide for more information.

This method automatically creates folders as needed if filename is located in a folder which does not exist.

Parameters

filename – The name of the file where the compiled executable and metadata will be saved.

Pre

prepareDevice() must have been called.

void saveExecutableToStream(std::ostream &out)

Save a compiled graph to a stream.

The data will be streamed in the PopEF format. This means that the streamed data can be used to run inference using the Triton Inference Server with the Graphcore Triton backend. See the Poplar Triton Backend User Guide for more information.

Parameters

out – The stream that the compiled executable and metadata will be written to.

Pre

prepareDevice() must have been called.

void saveExecutable(const std::string &path, bool savePopartMetadata = true, bool saveVariables = true)

Save a compiled graph with additional data to a file.

PopART is able to save its state after the model compilation is complete, so that it can be restored at a later time. To make this possible, it is necessary to save such elements as:

  • a serialised Poplar executable,

  • its associated metadata,

  • tensor data blobs if model parameters have not been frozen (refer to the SessionOptions::constantWeights for more information),

  • a PopART-specific opaque blob to store information only relevant to PopART. This is needed to restore PopART state.

The file will be in the PopEF format. This means that the file can be used to restore the state of the PopART program without recompiling the graph, or to run inference using the Triton Inference Server with the Graphcore Triton backend. See the Poplar Triton Backend User Guide for more information. To inspect the structure of the file saved by this function, refer to the PopEF dump tool.

Parameters
  • path – The name of the file or directory where the compiled executable, metadata and variables will be saved. If path is a directory, the function will write the data to the file “<path>/executable.popef”. If the file already exists, the function will overwrite it.

  • savePopartMetadata – If you do not need to restore the PopART state later, set this flag to false to reduce the disk space taken up by the file.

  • saveVariables – Set this flag to false if you do not need to save the state of the variables (tensors) in this file, for example because you want to save them later or in a different location. When variables are saved, the function will save data consistent with the variables contained within the model.

Pre

prepareDevice() must have been called.

void saveVariables(const std::string &path)

Save all variables to a file.

The function will save data consistent with the variables contained within the model.

The file will be in the PopEF format. To analyze the tensors saved by this function, refer to the PopEF dump tool.

Parameters

path – The name of the file or directory where the variables will be saved. If path is a directory, the function will write the data to the file “<path>/variables.popef”. If the file already exists, the function will overwrite it.

Pre

prepareDevice() must have been called.

void checkInplacingAmbiguity() const

Check for potential inplacing ambiguities.

This method creates an AliasModel object for each graph and runs the Poprithms ambiguity checker on it.

If the graph has an inplacing ambiguity, this method throws an error that prompts the user to check the inplacing.

See poprithms::memory::inplace::Graph::AmbiguityStatus on the Poprithms GitHub repo for more on what constitutes an ambiguity.

void loadExecutableFromFile(const std::string &filename)

Load the compiled executable and metadata from a file.

The file must have been created with compileAndExport(const std::string).

Parameters

filename – The name of the file to load the executable and metadata from.

void loadExecutableFromStream(std::shared_ptr<std::istream> in)

Load the compiled executable and metadata from a stream.

The stream must have been created with compileAndExport(std::ostream).

Parameters

in – The shared pointer to the stream to load the executable from.

void prepareDevice(bool loadEngine = true)

Prepare the network for execution.

This will create the poplar::Graph and poplar::Engine.

Parameters

loadEngine – If true, load the engine and connect the streams once the device is ready.

void loadEngineAndConnectStreams()

Load the engine on the device and connect the streams.

This will set up the poplar::Streams.

Note: This call is optional. The engine will implicitly be loaded on the device when required.

void weightsFromHost()

Copy weights from the host to the device.

void buffersFromHost()

Copy buffers from the host to the device.

void weightsToHost()

Copy the weights from the device to the host stream memory.

uint64_t getCycleCount(std::string id = "")

Copy the cycle count tensor from the device to the host.

Parameters

id – The identifier of the cycle count tensor.

void connectStreamToCallback(const std::string &streamHandle, std::function<void(void*)> callback, unsigned index = 0)

Connect a Poplar stream with a callback.

The callback will be called whenever the stream is about to be read or has been written to by the device. The memory location will only be valid for reading or writing for the duration of the callback.

Parameters
  • streamHandle – The name of the stream to connect to.

  • callback – The callback to be called whenever the stream is to be read or was written to by the device.

  • index – The replica index to connect to, when using replicated graphs. Default=0.
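
As a minimal sketch of what a callback with the documented std::function<void(void*)> signature might look like, the helper below builds one that copies host data into the memory location it is handed. The helper name makeFeedCallback is hypothetical, and the sketch assumes the stream carries float elements and that the location has room for every element of source. It does not call PopART itself.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical helper: returns a callback that writes `source` into the
// memory location passed to it. Assumes float elements and a location that
// is large enough to hold source.size() of them.
inline std::function<void(void *)>
makeFeedCallback(const std::vector<float> &source) {
  // Capture by value so the callback owns its own copy of the data.
  return [source](void *location) {
    float *dst = static_cast<float *>(location);
    std::copy(source.begin(), source.end(), dst);
  };
}
```

A callback built this way could then be passed as the callback argument of connectStreamToCallback(). Because the memory location is only valid during the call, the callback does not store the pointer.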

void connectStream(const std::string &streamHandle, void *buffer)

Connect a Poplar stream with a fixed location in memory.

Each time data is copied to the stream, this location will be read and each time data is copied from the stream, this location will be written.

Parameters
  • streamHandle – The handle of the stream to connect to.

  • buffer – The pointer to the memory location.

void connectHostFunction(const std::string &functionHandle, std::function<void(const void*const*, size_t, void*const*, size_t)> callback, unsigned index = 0)

Connect a host function to a callback.

The callback receives two arrays of pointers, which point to the locations in memory of each of the function’s input and output arguments, respectively; in the signature above, each array is followed by a size_t argument. During a host function call, first the device transfers the input data to the host, then the callback is invoked, and finally the output data is copied back to the device. The memory pointed to by the callback arguments must only be accessed for the duration of the callback.

Parameters
  • functionHandle – The name of the host function.

  • callback – The function to be called whenever new input data is available.

  • index – The replica index to connect to, when using replicated graphs. Default=0.
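
The following sketch shows one possible shape for a function matching the callback signature above. It rests on two assumptions that the text does not state: that each size_t argument is the number of pointers in the array that precedes it, and that every input and output holds at least one float. The function name doubleFirstElement is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical host function: for each input/output pair, write twice the
// first input element to the first output element.
// ASSUMPTION: numInputs and numOutputs are the lengths of the pointer arrays.
// ASSUMPTION: every tensor holds at least one float.
inline void doubleFirstElement(const void *const *inputs,
                               std::size_t numInputs, void *const *outputs,
                               std::size_t numOutputs) {
  const std::size_t n = numInputs < numOutputs ? numInputs : numOutputs;
  for (std::size_t i = 0; i < n; ++i) {
    const float *in = static_cast<const float *>(inputs[i]);
    float *out = static_cast<float *>(outputs[i]);
    out[0] = 2.0f * in[0];
  }
}
```

The function only touches the pointed-to memory while it is running, which matches the requirement that the memory is not accessed outside the callback.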

void run(IStepIO &stepIO, std::string debugName = "")

Run one step.

Read input data from the addresses in stepIO.in.

Write the output data to addresses in stepIO.out.

Parameters
  • stepIO – The input and output data.

  • debugName – A debug string to identify this run in logs.

void run(std::string programHandle, IStepIO &stepIO, std::string debugName = "")

Run one step of a custom program.

Read input data from the addresses in stepIO.in.

Write the output data to addresses in stepIO.out.

Parameters
  • programHandle – The handle of the custom program to run.

  • stepIO – The input and output data.

  • debugName – A debug string to identify this run in logs.

void updateExternallySavedTensorLocations(const std::string &fromLocation, const std::string &toLocation)

Update the tensor locations of tensors in the session’s ONNX model.

A new file will be created at this point, and written to when the ONNX model is saved with a subsequent call to modelToHost().

Parameters
  • fromLocation – All externally saved tensors with location fromLocation will have their location updated to toLocation.

  • toLocation – The updated tensor locations. This must not already exist.

void modelToHost(const std::string &fn)

Write the current model to an ONNX file.

Parameters

fn – The path to the file. The path can be absolute or relative. If you plan to run your program in multiple processes simultaneously, you should avoid possible race conditions by writing to different files, for example by using temporary files.

TensorInfo getInfo(TensorId) const

Get the tensor information for a tensor.

Parameters

TensorId – The identifier of the tensor to get the tensor information for.

Returns

The tensor information for the tensor.

bool hasInfo(TensorId) const

Check whether a tensor has information.

Parameters

TensorId – The identifier of the tensor to check for tensor information.

Returns

true if the tensor with identifier TensorId has tensor information and false if not.

std::set<TensorId> getAllTensorIds() const

Returns the ids of all tensors in the model.

Pre

prepareDevice() must have been called.

std::string getSummaryReport(bool resetProfile = true) const

Retrieve the summary report from the poplar::Engine.

The options which were passed to the Session constructor will influence the information in the report.

This method may only be called after prepareDevice() has been called.

Parameters

resetProfile – If true, resets the execution profile. Default = true.

Returns

A string containing the report.

std::string getSerializedGraph() const

Retrieve the serialized graph from the poplar::Engine.

A JSON format report is produced.

This method may only be called after prepareDevice() has been called.

Returns

A string containing the serialized graph.

pva::Report getReport() const

Retrieve the graph report from the poplar::Engine.

The options which were passed to the Session constructor will influence the information in the report.

This method may only be called after prepareDevice() has been called.

Returns

The PopVision Analysis report object.

void resetHostWeights(const std::string &model, const bool ignoreWeightsInModelWithoutCorrespondingHostWeight = false)

Reset weights with weights in an ONNX model.

Note that the only differences between the ONNX model and the current model must be the weights. No other differences are allowed.

This method only updates the weights on the host. weightsFromHost() must be called after this method to update the weights on the device.

Parameters
  • model – An ONNX model protobuf, or the name of a file containing an ONNX model protobuf.

  • ignoreWeightsInModelWithoutCorrespondingHostWeight – If true, do not throw an error if there are initializers in the ONNX model without corresponding initializer tensor(s) in the session’s IR.

void readWeights(const IWeightsIO &weightsIo)

Read the weights from the host stream memory and write to the host.

This method may only be called after weightsToHost() has been called.

Parameters

weightsIo – The weight data that is read from the host stream memory is written to the addresses in weightsIo.out.

void writeWeights(const IWeightsIO &weightsIo)

Write the weights from the host to the IR tensor memory.

This method may only be called after weightsFromHost() has been called.

Parameters

weightsIo – The weight data is written to the addresses in weightsIo.out.

std::string serializeIr(IrSerializationFormat format)

Serialize the IR graph to a string.

Parameters

format – The format to use for serializing.

inline const Ir &getIr() const

Get the IR associated with the Session.

inline const popx::Devicex &getDevice() const

Get the device associated with the Session.

inline popx::Devicex &getDevice()

Get the device associated with the Session.

inline const popx::IrLowering &getIrLowering() const

Get the IR lowering associated with the Session.

inline const popx::Executablex &getExecutable() const

Get the executable associated with the Session.

void broadcastWeights(int rootRank = 0)

Broadcasts the weights from the PopRun instance with index rootRank to all other instances.

Parameters

rootRank – The index of the PopRun instance from which the weights should be broadcast.

void updateEngineCache()

Update cacheEntries from engine cache directory and update ir::hashMatched_ with the updated cacheEntries.

void setDeviceInfo(std::shared_ptr<DeviceInfo> deviceInfo)

Set the DeviceInfo of the Session.

14.1.1. Training session

#include <popart/session.hpp>
class TrainingSession : public popart::Session

TrainingSession is a runtime instance that provides an interface for executing ONNX graphs on IPU hardware with training provided by optimizing a loss tensor using an optimizer and automatic differentiation (backpropagation).

Public Functions

~TrainingSession() override

Destructor for the TrainingSession class.

void updateOptimizerFromHost(const Optimizer *optimizer)

Update the optimizer from the host.

This method updates the optimizer and the associated hyperparameters but not the optimizer state tensors.

NOTE: The optimizer parameter has to be compatible with the optimizer passed to the TrainingSession constructor. For example, you cannot call this function with an SGD1 optimizer if you created the session with an SGD0 optimizer. This is because it is not possible to change the IR after a session has been constructed.

Parameters

optimizer – A pointer to a popart::Optimizer.

void copyFromRemoteBuffer(const std::string &buffer, void *w, int repeat_index, unsigned replication_index = 0)

Copy from a remote buffer into a user buffer.

This can be useful when we run larger models with host side reductions since HEXOPT is currently limited to 128 MB.

Parameters
  • buffer – The name of the remote buffer to copy from.

  • w – Pointer to a user buffer to copy to.

  • repeat_index – The index in the remote buffer to copy from.

  • replication_index – The replicated graph index when using replicated graphs. Default=0.

void copyToRemoteBuffer(void *w, const std::string &buffer, int repeat_index, unsigned replication_index = 0)

Copy from a user buffer to a remote buffer.

This can be useful when we run larger models with host side reductions since HEXOPT is currently limited to 128 MB.

Parameters
  • w – Pointer to a user buffer to copy from.

  • buffer – The remote buffer to copy to.

  • repeat_index – The index in the remote buffer to copy to.

  • replication_index – The replicated graph index when using replicated graphs. Default=0.

Public Static Functions

static std::unique_ptr<TrainingSession> createFromIr(std::shared_ptr<Ir> ir, std::shared_ptr<DeviceInfo> deviceInfo, const std::string name = DefaultTrainingSessionName)

Create a session for training from an IR.

Parameters
  • ir – The IR to create the session from.

  • deviceInfo – The type of device that this session uses.

  • name – The name of this training session. Default: “training”.

static std::unique_ptr<TrainingSession> createFromOnnxModel(const std::string &model, const DataFlow &dataFlow, const TensorId &loss, const Optimizer &optimizer, std::shared_ptr<DeviceInfo> deviceInfo, const InputShapeInfo &inputShapeInfo = InputShapeInfo(), const SessionOptions &userOptions = SessionOptions(), const Patterns &patterns = Patterns(), const std::string name = DefaultTrainingSessionName)

Create a session for training from an ONNX model.

Parameters
  • model – An ONNX model protobuf, or the name of a file containing an ONNX model protobuf.

  • dataFlow – Configuration for the data feeds and fetches.

  • loss – The identifier of the final scalar loss tensor for training.

  • optimizer – The optimizer to use when training.

  • deviceInfo – The type of device that this session uses.

  • inputShapeInfo – (Optional) The sizes and dtypes of the input tensors. This is used to specify the sizes of the input tensors in the case that the ONNX model does not include this information. The Poplar graph programming framework uses statically allocated memory buffers and so it needs to know the size of tensors before the compilation. Default: InputShapeInfo().

  • userOptions – (Optional) The user configuration options for the Session class. Default: SessionOptions().

  • patterns – (Optional) A user-selected set of graph transformation patterns which will be applied to the graph. If this is not specified, a default set of optimisation transformations will be applied. Default: Patterns().

  • name – (Optional) The name of this training session. Default: “training”.

14.1.2. Inference session

#include <popart/session.hpp>
class InferenceSession : public popart::Session

InferenceSession is a runtime instance that provides an interface for executing ONNX graphs on IPU hardware, without any automatic differentiation (backpropagation) or optimization.

Public Functions

~InferenceSession() override

Destructor for the InferenceSession class.

void popxlSetEngineIsLoaded(bool isLoaded)

Public Static Functions

static std::unique_ptr<InferenceSession> createFromIr(std::shared_ptr<Ir> ir, std::shared_ptr<DeviceInfo> deviceInfo, const std::string name = DefaultInferenceSessionName)

Create a session for inference from an IR.

Parameters
  • ir – The IR to create the session from.

  • deviceInfo – The type of device that this session uses.

  • name – The name of this inference session. Default: “inference”.

static std::unique_ptr<InferenceSession> createFromOnnxModel(const std::string &model, const DataFlow &dataFlow, std::shared_ptr<DeviceInfo> deviceInfo, const InputShapeInfo &inputShapeInfo = InputShapeInfo(), const SessionOptions &userOptions = SessionOptions(), const Patterns &patterns = Patterns(), const std::string name = DefaultInferenceSessionName)

Create a session for inference from an ONNX model.

Parameters
  • model – An ONNX model protobuf, or the name of a file containing an ONNX model protobuf.

  • dataFlow – Configuration for the data feeds and fetches.

  • deviceInfo – The type of device that this session uses.

  • inputShapeInfo – (Optional) The sizes and dtypes of the input tensors. This is used to specify the sizes of the input tensors in the case that the ONNX model does not include this information. The Poplar graph programming framework uses statically allocated memory buffers and so it needs to know the size of tensors before the compilation. Default: InputShapeInfo().

  • userOptions – (Optional) The user configuration options for the Session class. Default: SessionOptions().

  • patterns – (Optional) A user-selected set of graph transformation patterns which will be applied to the graph. If this is not specified, a default set of optimisation transformations will be applied. Default: Patterns().

  • name – (Optional) The name of this inference session. Default: “inference”.

14.1.3. Session options

#include <popart/sessionoptions.hpp>
enum class popart::AccumulateOuterFragmentSchedule

Enum type that determines how the operations in the accumulate outer fragment will be scheduled across virtual graphs (only relevant to pipelined modes).

Values:

enumerator Scheduler = 0

Don’t add additional constraints and let the scheduler work it out.

enumerator Serial

Add constraints that ensure ops are executed in virtual graph ID order.

enumerator OverlapCycleOptimized

Try to parallelise ops with different virtual graph IDs as much as possible.

enumerator OverlapMemoryOptimized

Try to parallelise ops with different virtual graph IDs, but avoid certain steps that are costly in terms of memory usage.

enum class popart::AutodiffStitchStrategy

Enum type representing a strategy to ensure a backward graph’s inputs are either inputs of the forward graph, outputs of the forward graph or gradients of outputs of the forward graph.

Strategies may expose tensors that would otherwise have been internal to the forward graph as outputs of this forward graph.

Values:

enumerator RecomputeMinimal = 0

Recompute any backward graph inputs associated with non-gradient forward graph tensors that are neither inputs nor outputs in the forward graph.

enumerator RecomputeAllNonInputs

Recompute any backward graph inputs associated with non-gradient forward graph tensors that are not inputs in the forward graph.

enumerator AddFwdOutputs

For backward graph inputs associated with non-gradient forward graph tensors that are neither inputs nor outputs in the forward graph, add them as outputs to the forward graph.

Note

This strategy is not guaranteed to work for all circumstances. In particular, it is unable to deal with subgraphs of IfOp. Using this setting may therefore result in subsequent exceptions in the Autodiff transform and it is therefore inadvisable to use this as an Autodiff default.

enumerator SafeAddFwdOutputs

Like AutodiffStitchStrategy::AddFwdOutputs except that those backward graph inputs that can’t be stitched with AutodiffStitchStrategy::AddFwdOutputs (that is, by adding outputs to the forward graph) are stitched using the AutodiffStitchStrategy::RecomputeMinimal strategy instead.

This means that this is a safe strategy to use as an Autodiff default.
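
The fallback described for SafeAddFwdOutputs can be sketched as a per-input decision. This is an illustrative model only, not the Autodiff transform: the types ToyStitch and ToyBwdInput, the flag canAddAsFwdOutput and the function chooseSafe are all hypothetical.

```cpp
#include <cassert>

// Hypothetical mirror of the two strategies involved in the fallback.
enum class ToyStitch { RecomputeMinimal, AddFwdOutputs };

// Hypothetical description of one backward graph input to be stitched.
struct ToyBwdInput {
  // False when the tensor cannot be exposed as a forward graph output,
  // which the text says can happen for subgraphs of IfOp.
  bool canAddAsFwdOutput;
};

// Prefer adding a forward graph output. Fall back to recomputation when
// that is not possible, as described for SafeAddFwdOutputs.
inline ToyStitch chooseSafe(const ToyBwdInput &input) {
  return input.canAddAsFwdOutput ? ToyStitch::AddFwdOutputs
                                 : ToyStitch::RecomputeMinimal;
}
```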

enumerator N

Number of AutodiffStitchStrategy values.

enum class popart::BatchSerializationBatchSchedule

Enum type that describes how to change the batch serialisation subgraph schedule before outlining.

Note

This setting is experimental and may change.

Values:

enumerator Scheduler = 0

Don’t encourage any particular scheduling for ops within batch subgraphs (leave it to the scheduler) but tell the scheduler to schedule subgraphs in sequence.

enumerator Isomorphic

Encourage all ops within batch subgraphs to be scheduled identically and for each subgraph to be scheduled in sequence (good for outlineability).

enumerator OverlapOnIo

Attempt to put the remote load op for batch N+1 right after the compute phase of batch N.

enumerator OverlapOnCompute

Attempt to put the remote load op for batch N+1 right before the compute phase of batch N.

enumerator N

The number of BatchSerializationBatchSchedule values.

enum class popart::BatchSerializationMethod

Enum type that describes how to apply the batch serialization.

Note

This setting is experimental and may change.

Values:

enumerator UnrollDynamic = 0

Unroll the batch with dynamic slicing.

enumerator UnrollStatic

Unroll the batch with static slicing.

enumerator Loop

Loop over the batch dimension.

enumerator N

The number of BatchSerializationMethod values.

enum class popart::BatchSerializationTransformContext

Enum type that describes when to apply batch serialization.

Note

This setting is experimental and may change.

Values:

enumerator Fwd = 0

Apply batch serialization before growing the backward pass.

enumerator Bwd

Apply batch serialization after growing the backward pass.

enumerator N

The number of BatchSerializationTransformContext values.

enum class popart::ExecutionPhaseIOSchedule

Enum type to specify when to load tensors.

Values:

enumerator Preload = 0

Preload tensors in previous phase for use in current phase.

enumerator OnDemand

Load tensors just before they are required.

enumerator N

The number of ExecutionPhaseIOSchedule values.

enum class popart::ExecutionPhaseSchedule

Enum type to specify the order of processing optimizer operations for different weights of the same execution phase.

The steps for phased execution are:

  1. Copy to IO tiles if necessary.

  2. Run collective operations if necessary.

  3. Load optimizer state.

  4. Update optimizer state.

  5. Apply optimizer.

  6. Store updated tensor if necessary.

Values:

enumerator Interleaving = 0

Process above steps for one weight at a time (for example: 123456, 123456, 123456).

The scheduler may interleave these steps.

enumerator Batch

Process above steps for all weights together, in a way that maximises overlap potential between compute and exchange (for example: 333, 111, 222, 444, 555, 666).

enumerator BatchClusteredIO

Process above steps for all weights together, in a way that maximises overlap potential between compute and exchange, and maximises stream copy merges by keeping RemoteLoad/RemoteStore operations clustered (for example: 333, 111, 222, 444, 555, 666).

enumerator N

The number of ExecutionPhaseSchedule values.
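
The example orderings quoted above can be reproduced with a small sketch. It only generates the step sequences as strings for a given number of weights; it is not the PopART scheduler, and the function names interleavingOrder and batchOrder are hypothetical. Step numbers are assumed to be single digits, as in the list of steps above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Interleaving: run every step for one weight before moving to the next.
inline std::string interleavingOrder(int numWeights,
                                     const std::vector<int> &steps) {
  std::string out;
  for (int w = 0; w < numWeights; ++w) {
    if (w > 0) out += ", ";
    for (std::size_t i = 0; i < steps.size(); ++i) {
      out += std::to_string(steps[i]);
    }
  }
  return out;
}

// Batch: run one step for every weight before moving to the next step.
// `stepOrder` gives the order in which the steps are visited.
inline std::string batchOrder(int numWeights,
                              const std::vector<int> &stepOrder) {
  std::string out;
  for (std::size_t i = 0; i < stepOrder.size(); ++i) {
    if (i > 0) out += ", ";
    for (int w = 0; w < numWeights; ++w) {
      out += std::to_string(stepOrder[i]);
    }
  }
  return out;
}
```

With three weights, interleavingOrder with steps 1 to 6 gives the Interleaving example, and batchOrder with the step order 3, 1, 2, 4, 5, 6 gives the Batch example.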

enum class popart::GradientTensorTrackingMethod

Enum type to specify the method for selecting gradient tensors whose statistics are to be tracked for the AutomaticLossScale transform.

Values:

enumerator AllNonViewChangingGradientTensors = 0

Track all gradients of non-view-changing gradient tensors.

enumerator ConvAndMatmulGradients

Track all gradients of inputs to MatMul and Convolution ops.

enumerator GradientsOfUserSpecifiedTensors

Track gradients of user-specified tensors.

enumerator N

The number of GradientTensorTrackingMethod values.

enum class popart::Instrumentation

Enum type used to specify an instrumentation type.

Values:

enumerator Outer = 0

Outer loop instrumentation, graph over all IPUs.

enumerator Inner

Inner loop instrumentation, graph per IPU.

enumerator N

The number of Instrumentation values.

enum class popart::IrSerializationFormat

Enum type used to specify a serialization format.

Values:

enumerator JSON

JavaScript Object Notation (JSON).

enum class popart::MeanReductionStrategy

Enum type that specifies when to divide by a mean reduction factor, when doing mean reduction over a sequence of tensors \(t_1, t_2, ..., t_k\).

Values:

enumerator Running = 0

Keep the reduction buffer as the mean of the tensors accumulated so far.

If \(t_1, ..., t_f\) has just been processed, the current accumulator \(s\) is the mean of these values, and the next accumulator update is \(s = \frac{f}{f+1} * s + \frac{1}{f+1} * t_{f+1}\) to keep \(s\) a running mean.

This strategy guarantees \(s \le \max(t_1, ..., t_k)\) throughout the accumulation, therefore it will not overflow, but it is generally slower than MeanReductionStrategy::Post.

enumerator Post

Keep the accumulation factor as the running sum, and divide once by \(k\) at the end of the accumulation.

This strategy will generally be faster than MeanReductionStrategy::Running, but is prone to overflow (especially when using fp16).

enumerator N

The number of MeanReductionStrategy values.
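
The difference between the two strategies can be sketched on scalar values. The code below is an illustration of the two update rules, not PopART code: the function names runningMean and postMean are hypothetical, float stands in for whatever tensor type is being accumulated, and the input is assumed to be non-empty.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Running: after each update, s is the mean of the values seen so far.
// Uses s = f/(f+1) * s + 1/(f+1) * t_{f+1}, where f values were processed.
inline float runningMean(const std::vector<float> &t) {
  float s = 0.0f;
  for (std::size_t f = 0; f < t.size(); ++f) {
    const float n = static_cast<float>(f);
    s = (n / (n + 1.0f)) * s + (1.0f / (n + 1.0f)) * t[f];
  }
  return s;
}

// Post: accumulate the plain sum and divide once by k at the end.
// The intermediate sum can overflow even when the mean is representable.
inline float postMean(const std::vector<float> &t) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < t.size(); ++i) {
    sum += t[i];
  }
  return sum / static_cast<float>(t.size());
}
```

For small values both functions agree up to rounding. For two values of 3e38, which is close to the largest finite float, the sum in postMean overflows to infinity while runningMean stays finite. This is the trade-off the enumerator descriptions refer to.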

enum class popart::MergeVarUpdateType

Enum type used to specify which VarUpdateOp ops to merge.

Values:

enumerator None = 0

Do not merge VarUpdateOp ops.

enumerator All

Merge all VarUpdateOp ops into as few groups as possible.

This is a good choice when memory is not a constraint.

enumerator AutoLoose

Merge into groups while attempting not to increase maximum variable liveness, and without slicing tensor variables in a way that would require them to be processed by different VarUpdateOp ops.

enumerator AutoTight

Merge into groups, so that VarUpdateOp ops process tensors of exactly SessionOptions::mergeVarUpdateMemThreshold in size.

enumerator N

The number of MergeVarUpdateType values.

enum class popart::RecomputationType

Enum type to specify which ops to recompute in the backward pass when doing auto-recomputation.

Values:

enumerator None = 0

No ops are recomputed (Default).

enumerator Standard

Recompute using an algorithm that picks checkpoints to try to minimise maximum liveness.

enumerator NormOnly

Only Norm ops (+ non-linearities, if following) are recomputed.

enumerator Pipeline

Recompute all forward pipeline stages.

enumerator RecomputeAll

Recompute all ops.

enumerator N

The number of RecomputationType values.

enum class popart::SubgraphCopyingStrategy

Enum type that describes how copies for inputs and outputs for subgraphs are lowered.

Currently this only affects subgraphs associated with CallOp ops.

Values:

enumerator OnEnterAndExit = 0

Copy all inputs before the start of the subgraph, copy all outputs after all ops in the subgraph.

With this strategy, subgraphs will always map to a single Poplar function.

enumerator JustInTime

Copy inputs just before they are consumed and copy outputs as soon as they are produced.

With this strategy, subgraphs may be lowered into multiple Poplar functions.

enumerator N

The number of SubgraphCopyingStrategy values.

enum class popart::SyntheticDataMode

Enum type used to specify the data source for input tensors.

Values:

enumerator Off = 0

Use real data.

enumerator Zeros

Input tensors are initialised to all zeros.

enumerator RandomNormal

Input tensors are initialised with a random normal distribution ~N(0,1).

enumerator RandomUniform

Input tensors are initialised with a uniform distribution.

enumerator N

The number of SyntheticDataMode values.
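
To make the modes concrete, the sketch below generates host-side data in the way each enumerator describes. It is not how PopART produces synthetic data: ToySyntheticDataMode is a hypothetical mirror of the enum, makeSyntheticInput is a hypothetical helper, and the [0, 1) range used for the uniform case is an assumption because the text does not give one.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical mirror of the documented enumerators.
enum class ToySyntheticDataMode { Off = 0, Zeros, RandomNormal, RandomUniform, N };

// Returns synthetic values for one input tensor, or an empty vector for Off
// to signal that real data should be used instead.
inline std::vector<float> makeSyntheticInput(ToySyntheticDataMode mode,
                                             std::size_t numElements,
                                             unsigned seed) {
  if (mode == ToySyntheticDataMode::Off) {
    return std::vector<float>();
  }
  std::vector<float> data(numElements, 0.0f); // Zeros needs nothing more.
  std::mt19937 gen(seed);
  if (mode == ToySyntheticDataMode::RandomNormal) {
    std::normal_distribution<float> dist(0.0f, 1.0f); // ~N(0,1)
    for (std::size_t i = 0; i < data.size(); ++i) data[i] = dist(gen);
  } else if (mode == ToySyntheticDataMode::RandomUniform) {
    // ASSUMPTION: the range is not documented; [0, 1) is used here.
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    for (std::size_t i = 0; i < data.size(); ++i) data[i] = dist(gen);
  }
  return data;
}
```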

enum class popart::VirtualGraphMode

Enum type used to specify a virtual graph mode.

Values:

enumerator Off = 0

Virtual graphs are not enabled.

enumerator Manual

User must set the popart::Op::virtualGraph attribute on all ops.

enumerator Auto

Use the AutoVirtualGraph transform.

enumerator ExecutionPhases

Virtual graphs are tied to execution phases.

enumerator N

The number of VirtualGraphMode values.

struct AccumulateOuterFragmentSettings

A structure containing accumulate outer fragment settings.

Public Functions

AccumulateOuterFragmentSettings() = default
inline AccumulateOuterFragmentSettings(AccumulateOuterFragmentSchedule schedule_, const std::vector<int> &excludedVirtualGraphs_)

Constructor for AccumulateOuterFragmentSettings.

Parameters
  • schedule_ – Indicate how to schedule the accumulate outer fragment. This setting is experimental and may change. Default: AccumulateOuterFragmentSchedule::Serial.

  • excludedVirtualGraphs_ – Indicate the virtual graph IDs that PopART should explicitly avoid parallelising. This setting is experimental and may change.

Public Members

AccumulateOuterFragmentSchedule schedule = AccumulateOuterFragmentSchedule::Serial

Indicate how to schedule the accumulate outer fragment.

Note

This setting is experimental and may change.

std::vector<int> excludedVirtualGraphs = {}

Indicate the virtual graph IDs that PopART should explicitly avoid parallelising.

Note

This setting is experimental and may change.

struct AutodiffSettings

The settings for the Autodiff transform.

Public Functions

AutodiffSettings() = default

Default constructor for the AutodiffSettings struct.

inline AutodiffSettings(AutodiffStitchStrategy stitchStrategy_)

Constructor for the AutodiffSettings struct.

Parameters

stitchStrategy_ – The strategy to ensure a backward graph’s inputs are either inputs of the forward graph, outputs of the forward graph or gradients of outputs of the forward graph. Default: AutodiffStitchStrategy::RecomputeAllNonInputs.

Public Members

AutodiffStitchStrategy stitchStrategy = AutodiffStitchStrategy::RecomputeAllNonInputs

The strategy PopART should use to ensure that all graph inputs of a backward graph are available as either inputs or outputs of the forward graph or gradients of outputs of the forward graph.

Note

This is an experimental option and may change.

struct AutomaticLossScalingSettings

A structure containing user configuration for automatic loss scaling settings.

Note

Automatic loss scaling is in preview. It is well tested and enabled in some of our example applications, but may not behave as expected in all models. Recommendation: if your model with automatic loss scaling enabled does not converge or triggers a compilation error, then you will need to set the loss scale manually.

Public Functions

AutomaticLossScalingSettings() = default

Default constructor for AutomaticLossScalingSettings.

AutomaticLossScalingSettings(bool enabled_, const nonstd::optional<std::vector<TensorId>> &toTrackTensors_, float binEdgeLocation_, float thresholdUpperCountProportion_, int updatePeriod_, GradientTensorTrackingMethod gradientTensorTrackingMethod_)

Constructor for AutomaticLossScalingSettings.

Parameters
  • enabled_ – Indicate whether to keep track (true) or not (false) of the distribution of gradient tensor elements over the floating point range. Default: false.

  • toTrackTensors_ – An optional list of model tensor names, for which gradient statistics will be collected. If not set, the gradients of all tensors produced by default operations (matmul, conv) will be used.

  • binEdgeLocation_ – The location of the bin edge as a proportion of the absolute numerical range of the tracked gradient tensor elements, in the range [0, 1]. 0 represents the smallest representable value, and 1 the maximum. This is the single bin edge of the histogram that is an input to the loss scale updater algorithm. Default: 0.125.

  • thresholdUpperCountProportion_ – The proportion of the elements in the upper bin above which the loss scale is increased, and below which the loss scale is decreased. Should be in the range [0, 1]. Default: 1e-7.

  • updatePeriod_ – Indicate how often the loss scale update factor should be updated with respect to optimizer steps. Default: 1

  • gradientTensorTrackingMethod_ – The method for selecting gradient tensors whose statistics are to be tracked. Default: GradientTensorTrackingMethod::AllNonViewChangingGradientTensors.

std::size_t hash() const

Public Members

bool enabled = false
float binEdgeLocation = 0.125f
float thresholdUpperCountProportion = 1e-7
nonstd::optional<std::vector<TensorId>> toTrackTensors
int updatePeriod = 1
GradientTensorTrackingMethod gradientTensorTrackingMethod = GradientTensorTrackingMethod::AllNonViewChangingGradientTensors

struct BatchSerializationSettings

A structure containing batch serialization settings.

Public Functions

BatchSerializationSettings() = default

Default constructor for BatchSerializationSettings.

BatchSerializationSettings(int factor_, bool concatOnVirtualGraphChange_, bool concatOnExecutionPhaseChange_, bool concatOnPipelineStageChange_, BatchSerializationTransformContext transformContext_ = BatchSerializationTransformContext::Fwd, BatchSerializationMethod method_ = BatchSerializationMethod::UnrollDynamic, BatchSerializationBatchSchedule batchSchedule_ = BatchSerializationBatchSchedule::Isomorphic)

Constructor for BatchSerializationSettings.

Parameters
  • factor_ – The number of compute batches to split operations into. Default: 0.

  • concatOnVirtualGraphChange_ – Indicate to break batch serialization chains (true) when the virtual graph changes (by concatenating the compute batches to the local batch). Default: true.

  • concatOnExecutionPhaseChange_ – Indicate to break batch serialization chains (true) when the execution phase changes (by concatenating the compute batches to the local batch). Default: true.

  • concatOnPipelineStageChange_ – Indicate to break batch serialization chains (true) when the pipeline stage changes (by concatenating the compute batches to the local batch). Default: true.

  • transformContext_ – An experimental value to control when batch serialization is applied. Default: BatchSerializationTransformContext::Fwd.

  • method_ – An experimental value to control how batch serialization is applied. Default: BatchSerializationMethod::UnrollDynamic.

  • batchSchedule_ – An experimental value that changes how operations are scheduled. Default: BatchSerializationBatchSchedule::Isomorphic.

Public Members

int factor = 0

The number of compute batches to split operations into.

bool concatOnVirtualGraphChange = true

Break batch serialization chains when the virtual graph changes (by concatenating the compute batches to the local batch).

bool concatOnExecutionPhaseChange = true

Break batch serialization chains when the execution phase changes (by concatenating the compute batches to the local batch).

bool concatOnPipelineStageChange = true

Break batch serialization chains when the pipeline stage changes (by concatenating the compute batches to the local batch).

BatchSerializationTransformContext transformContext = BatchSerializationTransformContext::Fwd

Experimental value to control when batch serialization is applied.

BatchSerializationMethod method = BatchSerializationMethod::UnrollDynamic

Experimental value to control how batch serialization is applied.

BatchSerializationBatchSchedule batchSchedule = BatchSerializationBatchSchedule::Isomorphic

Experimental value that changes how operations are scheduled.

struct ExecutionPhaseSettings

A structure containing ExecutionPhase settings.

Public Functions

ExecutionPhaseSettings() = default

Default constructor for ExecutionPhaseSettings.

inline ExecutionPhaseSettings(int phases_, bool stages_, ExecutionPhaseIOSchedule weightIOSchedule_, ExecutionPhaseIOSchedule activationIOSchedule_, ExecutionPhaseIOSchedule optimizerStateIOSchedule_, ExecutionPhaseIOSchedule accumulatorIOSchedule_, ExecutionPhaseSchedule schedule_)

Constructor for ExecutionPhaseSettings.

Parameters
  • phases_ – The number of execution phases for the whole model. Default: 0.

  • stages_ – The number of overlapping stages:

    • 1: Parallel streaming memory, default for 1 IPU per replica.

    • 2: PingPong between 2 IPUs, default for 2 or more IPUs per replica (Default).

  • weightIOSchedule_ – The execution phase IO schedule for weight tensors. Default: ExecutionPhaseIOSchedule::Preload.

  • activationIOSchedule_ – The execution phase IO schedule for activation and gradient tensors. Default: ExecutionPhaseIOSchedule::Preload.

  • optimizerStateIOSchedule_ – The execution phase IO schedule for optimizer state tensors. Default: ExecutionPhaseIOSchedule::OnDemand.

  • accumulatorIOSchedule_ – The execution phase IO schedule for gradient accumulator tensors. Default: ExecutionPhaseIOSchedule::Preload.

  • schedule_ – An experimental value that changes how operations are scheduled. Default: ExecutionPhaseSchedule::Interleaving.

Public Members

int phases = 0

Number of ExecutionPhases for the whole model.

int stages = 2

Number of overlapping stages.

  • 1: Parallel streaming memory, default for 1 IPU per replica.

  • 2: PingPong between 2 IPUs, default for 2 or more IPUs per replica.

ExecutionPhaseIOSchedule weightIOSchedule = ExecutionPhaseIOSchedule::Preload

The execution phase IO schedule for weight tensors.

ExecutionPhaseIOSchedule activationIOSchedule = ExecutionPhaseIOSchedule::Preload

The execution phase IO schedule for activation and gradient tensors.

ExecutionPhaseIOSchedule optimizerStateIOSchedule = ExecutionPhaseIOSchedule::OnDemand
ExecutionPhaseIOSchedule accumulatorIOSchedule = ExecutionPhaseIOSchedule::Preload
ExecutionPhaseSchedule schedule = ExecutionPhaseSchedule::Interleaving

struct ReplicatedCollectivesSettings

A structure containing settings for replicated collective operations.

Public Functions

ReplicatedCollectivesSettings(bool prepareScheduleForMergingCollectives = false, bool mergeAllReduceCollectives = false, bool mergeReduceScatterCollectives = false, bool mergeAllGatherCollectives = false)

Constructor for the ReplicatedCollectivesSettings struct.

Parameters
  • prepareScheduleForMergingCollectives – Insert constraints into the schedule such that collectives which can be merged occur one right after the other. true to insert constraints, false otherwise. Default: false.

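  • mergeAllReduceCollectives – Identify allreduce operations which can be scheduled at the same time, and perform them as one larger operation to better utilize the bandwidth between replicas. true to identify operations, false otherwise. Default: false.

  • mergeReduceScatterCollectives – Identify reduce-scatter operations which can be scheduled at the same time, and perform them as one larger operation to better utilize the bandwidth between replicas. true to identify operations, false otherwise. Default: false.

  • mergeAllGatherCollectives – Identify allgather operations which can be scheduled at the same time, and perform them as one larger operation to better utilize the bandwidth between replicas. true to identify operations, false otherwise. Default: false.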

std::size_t hash() const

Public Members

bool prepareScheduleForMergingCollectives = false
bool mergeAllReduceCollectives = false
bool mergeReduceScatterCollectives = false

Identifies reduce-scatter operations which can be scheduled at the same time, and performs them as one larger operation so as to better utilize the bandwidth between replicas.

bool mergeAllGatherCollectives = false

Identifies allgather operations which can be scheduled at the same time, and performs them as one larger operation so as to better utilize the bandwidth between replicas.

struct SessionOptions

A structure containing user configuration options for the Session class.

Public Functions

inline bool explicitPipeliningEnabled() const

Check whether explicit pipelining is enabled.

Determined from the values of enablePipelining, useHostCopyOps and enableExplicitMainLoops.

inline bool implicitPipeliningEnabled() const

Check whether implicit pipelining is enabled.

Determined from the values of enablePipelining, useHostCopyOps and enableExplicitMainLoops.

inline void enableExplicitIR(bool enable)

Enable explicit representations in the IR (code paths).

Enabled if true, otherwise not.

bool shouldDelayVarUpdates() const
int64_t getGlobalReplicationFactor() const

Get the global replication factor.

Returns

  • If enableDistributedReplicatedGraphs is true, then return globalReplicationFactor.

  • If enableReplicatedGraphs is true, then return replicatedGraphCount.

  • otherwise return 1.
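The order of precedence above can be sketched as follows. This is a minimal illustration, not PopART code: ReplicationOptions and resolveGlobalReplicationFactor are hypothetical stand-ins, and only the field names are taken from SessionOptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in holding only the fields that matter here.
// It is not the real popart::SessionOptions.
struct ReplicationOptions {
  bool enableDistributedReplicatedGraphs = false;
  bool enableReplicatedGraphs = false;
  int64_t globalReplicationFactor = 1;
  int64_t replicatedGraphCount = 1;
};

// Mirrors the documented order of precedence for the global replication factor.
int64_t resolveGlobalReplicationFactor(const ReplicationOptions &opts) {
  // Sanity check for this sketch only: factors below 1 make no sense.
  assert(opts.globalReplicationFactor >= 1 && opts.replicatedGraphCount >= 1);
  if (opts.enableDistributedReplicatedGraphs) {
    return opts.globalReplicationFactor;
  }
  if (opts.enableReplicatedGraphs) {
    return opts.replicatedGraphCount;
  }
  return 1;
}
```

Note that the distributed setting takes priority: if both flags are true, replicatedGraphCount is not the value returned.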

unsigned getAccumulationFactor() const

Get the gradient accumulation factor.

Throws an error if gradient accumulation is not enabled (enableGradientAccumulation is false) and the factor (accumulationFactor) is set to >1.

Returns

The accumulation factor.
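The error condition described above can be sketched as follows. AccumulationOptions and resolveAccumulationFactor are hypothetical stand-ins, and std::runtime_error is used here in place of whatever error type PopART actually throws.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for the two relevant SessionOptions fields.
struct AccumulationOptions {
  bool enableGradientAccumulation = false;
  int64_t accumulationFactor = 1;
};

// Returns the accumulation factor, rejecting the inconsistent combination
// "accumulation disabled, but factor greater than 1".
unsigned resolveAccumulationFactor(const AccumulationOptions &opts) {
  // Sanity check for this sketch only.
  assert(opts.accumulationFactor >= 1);
  if (!opts.enableGradientAccumulation && opts.accumulationFactor > 1) {
    throw std::runtime_error(
        "accumulationFactor > 1 requires enableGradientAccumulation");
  }
  return static_cast<unsigned>(opts.accumulationFactor);
}
```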

unsigned getBufferingDepth(const TensorId &id, bool rearrangedOnHost)
bool autoRecomputationEnabled() const

Returns true if auto-recomputation is enabled, false otherwise.

inline SessionOptions()

Constructor for SessionOptions.

Public Members

std::string logDir

A directory for log traces to be written into.

std::set<std::string> dotChecks = {}

When to write .dot files during IR construction.

int firstDotOp = 0

The ops written to the .dot file will be a part of the schedule, controlled by firstDotOp and finalDotOp.

In particular, it will be [max(0, firstDotOp), min(N ops in IR, finalDotOp)).
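As an illustration of the half-open range above, the following sketch computes the first and one-past-last schedule indices. The function name dotOpRange and its nOpsInIr parameter are hypothetical and are not part of the PopART API.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Returns the half-open range [first, last) of schedule indices whose ops
// are written to the .dot file.
std::pair<int, int> dotOpRange(int firstDotOp, int finalDotOp, int nOpsInIr) {
  assert(nOpsInIr >= 0);
  const int first = std::max(0, firstDotOp);
  const int last = std::min(nOpsInIr, finalDotOp);
  return {first, last};
}
```

With the default values (firstDotOp = 0, finalDotOp = 10000), an IR with 250 ops gives the range [0, 250), so every op is written.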

int finalDotOp = 10000

See firstDotOp.

bool dotOpNames = false

Enable inclusion of the op name in the .dot file (the op type is always exported).

Enabled when true. Default: false.

bool exportPoplarComputationGraph = false

Enable export of Poplar computational graph.

Enabled when true. Default: false.

bool exportPoplarVertexGraph = false

Enable export of Poplar vertex graph.

Enabled when true. Default: false.

bool separateCallOpPdfs = true

Enable creation of separate PDFs for each subgraph when generating PDFs of IR graphs.

Enabled when true. Default: true.

bool enableOutlining = true

Enable outlining.

This identifies and extracts repeated parts of computational graph into subgraphs. Enabled when true. Default: true.

bool enableOutliningCopyCostPruning = true

Enable inclusion of the cost of copying cached sections in the outlining cost model.

Enabled when true. Default: true.

float outlineThreshold = 1.0f

Specify the incremental value that a sub-graph requires, relative to its nested sub-graphs (if any), to be eligible for outlining.

A high threshold results in fewer sub-graphs being outlined, a negative value results in all being outlined. The gross value of a sub-graph is the sum of its constituent ops’ Op::getSubgraphValue() values. To disable outlining, it is better to set enableOutlining to false than to set this value to infinity. The default value of 1.0f results in all high value operations such as convolution being cached, but standalone low value operations such as ReLU will not be.

Default: 1.0f.

float outlineSequenceBreakCost = 10000.0f

Specify the penalty applied to outlining potential sub-graphs if the sub-graph to be created breaks up a sequence of operations that are more efficient (for example for overlapping compute and exchange) when outlined together.

Default: 10000.0f.

SubgraphCopyingStrategy subgraphCopyingStrategy = SubgraphCopyingStrategy::OnEnterAndExit

Specify how copies for inputs and outputs for subgraphs are lowered.

Setting this value to SubgraphCopyingStrategy::JustInTime may save memory at the cost of fragmenting subgraphs into multiple Poplar functions. This may be particularly useful when a number of weight updates are outlined in one subgraph, as it may prevent multiple weight tensors from being live at the same time inside the subgraph.

Default: SubgraphCopyingStrategy::OnEnterAndExit.

RecomputationType autoRecomputation = RecomputationType::None

Enable recomputation of operations in the graph in the backward pass.

This will reduce model size at the cost of computation cycles.

Default: RecomputationType::None (no recomputation).

MergeVarUpdateType mergeVarUpdate = MergeVarUpdateType::None

Enable merging of VarUpdates into groups of VarUpdates, by flattening and concatenating variable tensors and updating tensors.

Default: MergeVarUpdateType::None (no merging).

int64_t mergeVarUpdateMemThreshold = 1000000

Specify the memory threshold for VarUpdateOp merging algorithms.

The MergeVarUpdateType::AutoLoose and MergeVarUpdateType::AutoTight VarUpdateOp merging algorithms have a threshold on the total memory of variable tensors to merge for updating. Defined as total memory in bytes.

Default: 1000000.

int64_t looseThresholdAtPeak = 8000

Specify the threshold at peak used in the calculation of the absolute threshold in the MergeVarUpdateType::AutoLoose VarUpdateOp merging algorithm. The absolute threshold is calculated as:

 min(mergeVarUpdateMemThreshold, liveAtPeak - liveCurrently + looseThresholdAtPeak)

where:

  • liveAtPeak is an estimate of the maximum live memory of the computation; and

  • liveCurrently is an estimate of the live memory where the threshold is being used to determine whether to schedule or postpone a VarUpdateOp.

Default: 8000.
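The threshold calculation can be written out as a small function. This is a sketch of the formula only: autoLooseThreshold is a hypothetical name, and the liveness estimates are passed in as plain numbers (in bytes) rather than being computed from a graph.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Absolute memory threshold (in bytes) for the AutoLoose merging algorithm,
// following the formula given in this section.
int64_t autoLooseThreshold(int64_t mergeVarUpdateMemThreshold,
                           int64_t looseThresholdAtPeak,
                           int64_t liveAtPeak,
                           int64_t liveCurrently) {
  // The current live memory cannot exceed the estimated peak.
  assert(liveCurrently <= liveAtPeak);
  return std::min(mergeVarUpdateMemThreshold,
                  liveAtPeak - liveCurrently + looseThresholdAtPeak);
}
```

The further the current liveness is below the peak, the larger the threshold becomes, until it is capped by mergeVarUpdateMemThreshold.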

bool rearrangeAnchorsOnHost = true

Enable rearrangement (in memory) of anchor tensors to be done on the host.

Before anchor tensors are streamed from device to host, they are not necessarily arranged in memory as required when they are to be copied from host stream to host. This can be done on the device or on the host.

Default: true (rearrangement is done on the host to save memory, but often at the expense of cycles, especially for larger anchor tensors).

bool rearrangeStreamsOnHost = false

Enable rearrangement (in memory) of stream tensors to be done on the host.

Before stream tensors are streamed from host to device, they are not necessarily arranged in memory as required when they are to be copied from host stream to device. This can be done on the device or on the host.

Default: false (Rearrangement done on device).

bool enablePrefetchDatastreams = true

Enable prefetching for input data streams.

Poplar will speculatively read data for a stream before it is required in order to allow the ‘preparation’ of the data to occur in parallel with compute. Enabled when true. Default: true.

unsigned defaultBufferingDepth = 1

Specify the default buffering depth value used for streams that are not re-arranged on the host.

For tensors that are rearranged on the host, a buffering depth of 1 will always be used. This default value can be overridden via bufferingDepthMap.

unsigned defaultPrefetchBufferingDepth = initialDefaultPrefetchBufferingDepthValue

Deprecated:

This session option name has been deprecated and will be removed in a future release.

Please use the alias defaultBufferingDepth instead.

std::map<TensorId, unsigned> bufferingDepthMap

This mapping can be used to set stream-specific buffering depths.

The buffering depth could be thought of as being the size of a circular buffer that feeds data to and from Poplar. A buffering depth greater than 1 may improve the performance due to increased parallelisation but comes at the cost of increasing the memory footprint. Streams for tensors that have no entry in this map will default to 1 (if a tensor is rearranged on host) or defaultBufferingDepth (if a tensor is not rearranged on host). Specifying a tensor that gets rearranged on host in this map will throw an error.
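The lookup rules above can be sketched as follows. resolveBufferingDepth is a hypothetical helper, TensorId is aliased to std::string for the purpose of the example, and std::invalid_argument stands in for the error PopART raises.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

using TensorId = std::string; // stand-in for popart::TensorId

// Resolves the buffering depth of one stream from the per-tensor map and
// the default, following the rules described in this section.
unsigned resolveBufferingDepth(const std::map<TensorId, unsigned> &bufferingDepthMap,
                               unsigned defaultBufferingDepth,
                               const TensorId &id,
                               bool rearrangedOnHost) {
  // Assumption of this sketch: a depth of zero is treated as invalid.
  assert(defaultBufferingDepth >= 1);
  const auto found = bufferingDepthMap.find(id);
  if (found != bufferingDepthMap.end()) {
    if (rearrangedOnHost) {
      // A map entry for a tensor that is rearranged on host is an error.
      throw std::invalid_argument("tensor is rearranged on host: " + id);
    }
    return found->second;
  }
  // No entry: rearranged tensors always use 1, others use the default.
  return rearrangedOnHost ? 1u : defaultBufferingDepth;
}
```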

std::map<TensorId, unsigned> prefetchBufferingDepthMap

Deprecated:

This session option name has been deprecated and will be removed in a future release.

Please use the alias bufferingDepthMap instead.

bool enableNonStableSoftmax = false

Enable the non-stable softmax Poplar function.

By default, the stable softmax Poplar function is used. The input tensor to softmax, \(x\), is preprocessed by subtracting \(max(x)\) from each element before computing the exponentials, ensuring numerical stability. If the inputs to the softmax operations are small enough to not cause overflow when computing the exponential, then the non-stable version can be enabled instead, to increase the speed.

Default: false (not enabled).
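The difference between the two variants can be illustrated on the host with a plain C++ sketch. This is not the Poplar implementation; the softmax function below is a hypothetical reference version that only shows why subtracting max(x) matters.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference softmax. When 'stable' is true, max(x) is subtracted from every
// element before exponentiation, so the largest exponent is exp(0) = 1.
std::vector<float> softmax(const std::vector<float> &x, bool stable) {
  assert(!x.empty());
  const float shift = stable ? *std::max_element(x.begin(), x.end()) : 0.0f;
  std::vector<float> result(x.size());
  float sum = 0.0f;
  for (std::size_t i = 0; i < x.size(); ++i) {
    result[i] = std::exp(x[i] - shift);
    sum += result[i];
  }
  for (float &value : result) {
    value /= sum;
  }
  return result;
}
```

For the input {1000, 1000}, the stable variant returns 0.5 for each element, whereas the non-stable variant overflows in exp() and produces NaN. For small inputs both variants give the same result.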

bool enableReplicatedGraphs = false

Enable replication of graphs. Default: false (not enabled).

bool enableGradientAccumulation = false

Enable gradient accumulation. Default: false (not enabled).

ReductionType accumulationAndReplicationReductionType = ReductionType::Sum

Specify how gradients are reduced when using gradient accumulation and graph replication.

Default: ReductionType::Sum.

MeanReductionStrategy meanAccumulationAndReplicationReductionStrategy = MeanReductionStrategy::Post

Specify when to divide by a mean reduction factor when accumulationAndReplicationReductionType is set to ReductionType::Mean.

Default: MeanReductionStrategy::Post.

int64_t replicatedGraphCount = 1

Specify the number of model replications.

If enableReplicatedGraphs is true, replicatedGraphCount will set the number of model replications. For example, if the model uses 1 IPU, a replicatedGraphCount of 2 will use 2 IPUs. If the model is pipelined across 4 IPUs, a replicatedGraphCount of 4 will use 16 IPUs in total. Therefore, the number of IPUs requested must be a multiple of replicatedGraphCount. If the training is done across multiple instances of the program then the replicatedGraphCount is the number of replicas for this instance.
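The IPU arithmetic in the examples above is a simple product. The helpers below are hypothetical and only restate that arithmetic; they do not query any device.

```cpp
#include <cassert>
#include <cstdint>

// Total number of IPUs used by one instance: IPUs per replica multiplied by
// the number of replicas.
int64_t totalIpus(int64_t ipusPerReplica, int64_t replicatedGraphCount) {
  assert(ipusPerReplica >= 1 && replicatedGraphCount >= 1);
  return ipusPerReplica * replicatedGraphCount;
}

// The number of IPUs requested must be a multiple of replicatedGraphCount.
bool isValidIpuRequest(int64_t requestedIpus, int64_t replicatedGraphCount) {
  assert(replicatedGraphCount >= 1);
  return requestedIpus % replicatedGraphCount == 0;
}
```

For instance, totalIpus(4, 4) gives the 16 IPUs of the pipelined example, and a request for 6 IPUs with a replicatedGraphCount of 4 is rejected.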

int64_t accumulationFactor = 1

Specify the number of micro-batches to accumulate before applying the varUpdate.

VirtualGraphMode virtualGraphMode = VirtualGraphMode::Off

Specify how to place ops on virtual graphs to achieve model parallelism, either manually using model annotations, or automatically.

Default: VirtualGraphMode::Off.

std::vector<float> virtualGraphSplitRatios

Specify split ratios when VirtualGraphMode::Auto is enabled.

These values represent split ratios in each device and each of the values is in range (0, 1).

For example, to uniformly split the whole graph on 4 IPUs, the value should be [0.25, 0.25, 0.25, 0.25].
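A uniform split for any number of IPUs can be generated rather than typed by hand. The helper uniformSplitRatios below is hypothetical; its result could be assigned to virtualGraphSplitRatios.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds a vector of equal split ratios, one per IPU.
std::vector<float> uniformSplitRatios(std::size_t numIpus) {
  // At least two IPUs are needed for each ratio to lie in the range (0, 1).
  assert(numIpus >= 2);
  return std::vector<float>(numIpus, 1.0f / static_cast<float>(numIpus));
}
```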

bool enablePipelining = false

Enable pipelining of virtual graphs. Default: false (not enabled).

SyntheticDataMode syntheticDataMode = SyntheticDataMode::Off

Specify whether to use real or synthetic data to initialize input tensors.

Streaming to/from the host is only enabled for SyntheticDataMode::Off which indicates that real data is being used.

Default: SyntheticDataMode::Off.

bool instrumentWithHardwareCycleCounter = false

Add instrumentation to the program to count the number of device cycles (of a single tile, on a single IPU) that the main program takes to execute.

Expect this to have a small detrimental impact on performance.

std::set<Instrumentation> hardwareInstrumentations = {Instrumentation::Outer}
bool disableGradAccumulationTensorStreams = false

Disable saving of weight gradient tensors off the device.

If true, the weight gradient tensors are not saved off the device when devicex.weightsFromHost() is called.

Note

This option is overridden if syntheticDataMode is not SyntheticDataMode::Off.

Note

Weight gradient tensors that are also optimiser tensors will only be disabled if both disableGradAccumulationTensorStreams and disableOptimizerStateTensorStreams are true.

bool disableOptimizerStateTensorStreams = false

Disable streaming of optimizer tensors.

If true, streaming of optimizer tensors is disabled. This setting can be used to conserve memory if you are not interested in checkpointing the optimizer state.

Note

Weight gradient tensors that are also optimiser tensors will only be disabled if both disableGradAccumulationTensorStreams and disableOptimizerStateTensorStreams are true.

bool compileEngine = true

Setting to only build the Poplar graph but not compile it.

If false, the backend will build the Poplar graph but not compile it into an Engine. In this case, no execution can be performed, and nothing can be transferred to the device. API calls which retrieve information from the graph building stage, such as tile mapping introspection, can still be used.

bool constantWeights = true

Specify an optimization for an inference session to have constant weights.

Set this option to false in order to change the weights with a call to Session::resetHostWeights() after the session has been prepared. This option has no effect on a training session.

Default: true.

bool enableEngineCaching = false

Enable Poplar executable caching.

The file is saved to the location defined with cachePath. The file will be in the PopEF format. This means that it can be used to run inference using the Triton Inference Server because Graphcore provides a backend to it. See the Poplar Triton Backend user guide for more information.

Default: false (not enabled).

bool enableVariablesCaching = true

Enable variable caching.

This means that the caching process will save variables as additional PopEF blobs to the file location defined with cachePath. If PopART requires data for variables (during the cache reading process), the data will be read automatically from the cache file.

Note, turning this off allows a PopART Session to optimise the host memory it consumes during model runtime. Specifically, weightsToHost() can write directly to the IR tensor data buffers. If the option were on, this would not be safe and the session would have to create separate buffers to write the fetched data to.

Default: true (enabled).

std::string cachePath = "session_cache"

Folder to save the poplar::Executable to.

bool enableFloatingPointChecks = false

Enable throwing of exceptions when floating point errors occur.

Default: false (not enabled).

bool enableStochasticRounding = false

Enable stochastic rounding.

PopART will set the Poplar engine option target.deterministicWorkers to true if this option is set and to false if it is not set. Adding a value for “target.deterministicWorkers” to SessionOptions::engineOptions overrides this behaviour.

Default: false (not enabled).

bool _enableRngStateManagement = false
ExecutionPhaseSettings executionPhaseSettings

Configuration settings for execution phases.

AccumulateOuterFragmentSettings accumulateOuterFragmentSettings

Configuration setting for operations in the accumulate outer fragment.

bool explicitRecomputation = false

Enable explicit recomputation.

Default: false (not enabled).

NumIOTiles numIOTiles

Number of IPU tiles dedicated to IO.

bool aliasZeroCopy = false

Enable zero-copy for subgraphs.

BatchSerializationSettings batchSerializationSettings

Configuration setting for batch serialization.

AutodiffSettings autodiffSettings

Configuration settings for the autodiff transform.

bool delayVarUpdates = true

Option to delay variable updates as much as possible.

bool scheduleNonWeightUpdateGradientConsumersEarly = false
bool enableFullyConnectedPass = true

Enable the global fullyConnectedPass option for matmuls.

See also

poplin::matMul(poplar::Graph, poplar::Tensor, poplar::Tensor, poplar::program::Sequence, poplar::Type, poplar::DebugContext, poplar::OptionFlags, matmul::PlanningCache).

bool enableSerializedMatmuls = true

Enable/disable the serializing of matmuls.

std::string partialsTypeMatMuls

Set the partials type globally for matmuls.

Can be overridden individually with Builder.setPartialsType(). Valid values are "float" and "half". By default, this is not set, so no global partials type is imposed.

bool enableStableNorm = false

If true, computes the mean first and subtracts the activations from it before computing the variance.

The implementation with this flag set to true is slower than when set to false. The stable version requires the first order moment to be estimated and applied to the sample set before the second order central moment is calculated.
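The stable calculation described above is a two-pass variance: the mean is computed first, and the variance is then taken over the centred values. The sketch below contrasts it with a single-pass form based on E[x^2] - E[x]^2. That the faster, non-stable path takes this single-pass form is an assumption made for illustration; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two-pass (stable) variance: centre the samples on the mean first.
float stableVariance(const std::vector<float> &x) {
  assert(!x.empty());
  const float n = static_cast<float>(x.size());
  float sum = 0.0f;
  for (float value : x) {
    sum += value;
  }
  const float mean = sum / n;
  float sumOfSquaredDeviations = 0.0f;
  for (float value : x) {
    const float centred = value - mean;
    sumOfSquaredDeviations += centred * centred;
  }
  return sumOfSquaredDeviations / n;
}

// Single-pass variance, E[x^2] - E[x]^2. It needs only one sweep over the
// data, but can lose precision when the mean is large relative to the spread.
float singlePassVariance(const std::vector<float> &x) {
  assert(!x.empty());
  const float n = static_cast<float>(x.size());
  float sum = 0.0f;
  float sumOfSquares = 0.0f;
  for (float value : x) {
    sum += value;
    sumOfSquares += value * value;
  }
  const float mean = sum / n;
  return sumOfSquares / n - mean * mean;
}
```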

std::map<std::string, std::string> engineOptions

Poplar engine options.

std::map<std::string, std::string> convolutionOptions

Poplar convolution options.

std::map<std::string, std::string> lstmOptions

Poplar LSTM options.

std::map<std::string, std::string> matmulOptions

Poplar matmul options.

std::map<std::string, std::string> reportOptions

Poplar reporting options.

std::map<std::string, std::string> gclOptions

GCL options.

ExperimentalSettings experimentalSettings

Configuration setting for custom transform applier.

std::vector<std::string> customCodelets

List of codelet files (with file extension) to be added to the Poplar graph.

See the Poplar documentation for poplar::Graph for more information.

std::vector<TensorId> updatableNamedBuffers

List of model named buffers that can be updated with a call to copyNamedBuffersToDevice().

This allows updating just a subset of the model weights, instead of all of them as happens with a copyWeightsToDevice() call.

std::string customCodeletCompileFlags

Compile flags for the custom codelets.

For example -g to generate debug info. See the Poplar documentation for poplar::Engine for more information.

double timeLimitScheduler = 1e9

The maximum allowed time (in seconds) that can be spent searching for a good graph schedule before a solution must be returned.

int64_t swapLimitScheduler = static_cast<int64_t>(1e9)

The maximum number of improving steps allowed by the scheduling algorithm before a solution must be returned.

std::string serializedPoprithmsShiftGraphsDir = {}

The directory to serialize Poprithms graphs to.

PopART uses Poprithms for scheduling PopART graphs. The Poprithms graphs created for scheduling can be optionally serialised (written to file). If serializedPoprithmsShiftGraphsDir is empty, then the graphs will not be serialised. The names of serialization files will be poprithms_shift_graph_i.json for the lowest non-existing values of i. The directory must already exist; PopART will not create it.

std::string kahnTieBreaker = "greedy"

Specify which method is used to control how ops are scheduled.

The initial scheduling is done with Kahn’s algorithm. When several ops are free to be scheduled, this controls which method is used.

Options are described in the Poprithms KahnTieBreaker enum.

size_t transitiveClosureOptimizationThreshold = {100000}

Specify the transitive closure optimization threshold.

The transitive closure optimization pass can significantly accelerate the scheduler. It does not, in general, affect the final schedule returned. It is run between initialization with Kahn’s algorithm and the shifting swaps. The transitive closure optimization pass is O(nOps^2) and so should not be used for extremely large graphs. If a graph is above this threshold, the transitive closure optimization pass is not run.

bool decomposeGradSum = false

Enable replacement of single sums of partial gradients with a tree of additions.

This can reduce max liveness at the cost of extra cycles. A typical use case for this would be if a large weight tensor is used as an input to many operations.

Default: false (not enabled).
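The liveness trade-off can be pictured with plain vectors. In the sketch below, Tensor is a stand-in type and both functions are hypothetical. The exact shape of the addition tree that PopART builds is not described in this section, so the running-accumulator chain shown here is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Tensor = std::vector<float>; // stand-in for a gradient tensor

// Single sum: every partial gradient must be held until the sum is taken.
Tensor sumAllAtOnce(const std::vector<Tensor> &partials) {
  assert(!partials.empty());
  Tensor total(partials.front().size(), 0.0f);
  for (const Tensor &partial : partials) {
    assert(partial.size() == total.size());
    for (std::size_t i = 0; i < total.size(); ++i) {
      total[i] += partial[i];
    }
  }
  return total;
}

// Decomposed sum: one addition per partial gradient. The caller can release
// each partial as soon as it has been added to the accumulator.
void addInPlace(Tensor &accumulator, const Tensor &partial) {
  assert(accumulator.size() == partial.size());
  for (std::size_t i = 0; i < accumulator.size(); ++i) {
    accumulator[i] += partial[i];
  }
}
```

Both forms produce the same values. The decomposed form issues more, smaller additions, which is where the extra cycles come from.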

ReplicatedCollectivesSettings replicatedCollectivesSettings

Control the behavior of different collective operations.

bool enableDistributedReplicatedGraphs = false

Enable training with Poplar replicated graphs across multiple PopART instances.

Default: false (not enabled).

int64_t globalReplicationFactor = 1

The total number of replicas in a multi-instance, replicated-graph training session (this should be left as the default value (1) if distributed replicated graphs are disabled).

This value includes local replication.

int64_t globalReplicaOffset = 0

The first replica index that this PopART instance is running.

bool groupHostSync = false

Specify to group the streams from the host to the device at the beginning of the schedule, and the streams from the device to the host at the end of the schedule.

This trades off memory usage for speed.

When true, tensors will stay live for longer.

Default: false (not enabled).

Note

This setting has no effect when useHostCopyOps is enabled (true).

bool strictOpVersions = true

Enable strict op version checks.

Strict op version checks will throw an error if the exact version of an op required for the model opset is not supported. Turning this check off will cause PopART to fall back to the latest implementation of the op that is supported.

Default: true (enabled).

Warning

Turning off these checks may cause undefined behaviour.

bool opxAliasChecking = false

Enable running Opx checks to verify that IR tensor aliasing information corresponds to the lowered Poplar tensor aliasing.

Default: false (not enabled).

bool opxModifyChecking = false

Enable running Opx checks to verify that IR tensor modification information corresponds to the lowered Poplar tensor modifications.

Default: false (not enabled).

bool useHostCopyOps = false

Enable use of IR graph operations for data and anchor streams.

Default: false (not enabled).

bool enableEfficientOverlapIOTopoCons = false

Enable simplified and equivalent overlapIO constraints.

Suppose there are N bins in each of the three stages (8 before the loop, 7 inside the loop and 6 after the loop) and L ops in each bin. The vanilla implementation of overlap IO creates topological constraints of complexity O(N*N*L*L).

To make sure InitOps in each step are scheduled before HostLoadOps, it is sufficient to keep the topological constraints within each bin and to schedule the last op of each bin Bin0 before the first op of the following bin Bin1. The total complexity is then reduced from O(N*N*L*L) to O(N*L).
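
The complexity argument can be made concrete with a small counting model. This is only a sketch: the exact constraints PopART creates are not spelled out here, so the all-pairs and chained schemes below are assumptions chosen to match the stated growth rates, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// All-pairs model: every op in an earlier bin is constrained to run before
// every op in each later bin. The count grows as O(N*N*L*L).
int64_t allPairsConstraints(int64_t numBins, int64_t opsPerBin) {
  assert(numBins >= 1 && opsPerBin >= 1);
  const int64_t binPairs = numBins * (numBins - 1) / 2;
  return binPairs * opsPerBin * opsPerBin;
}

// Chained model: ops are ordered within each bin (L - 1 constraints per
// bin), plus one constraint from the last op of a bin to the first op of
// the next bin. The count grows as O(N*L).
int64_t chainedConstraints(int64_t numBins, int64_t opsPerBin) {
  assert(numBins >= 1 && opsPerBin >= 1);
  return numBins * (opsPerBin - 1) + (numBins - 1);
}
```

With 8 bins of 10 ops each, the all-pairs model gives 2800 constraints, while the chained model gives 79.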

Default: false (not enabled).

bool enableLoadAndOffloadRNGState = false

Enable load and offload of device RNG state from host.

Default: false (not enabled).

TensorLocationSettings activationTensorLocationSettings = TensorLocationSettings{TensorLocation(), 2, 8192}

Tensor location settings for activation/gradient tensors.

TensorLocationSettings weightTensorLocationSettings = TensorLocationSettings{TensorLocation(), 2, 8192}

Tensor location for weight tensors.

TensorLocationSettings optimizerStateTensorLocationSettings = TensorLocationSettings{TensorLocation(), 2, 8192}

Tensor location for optimizer state tensors.

TensorLocationSettings accumulatorTensorLocationSettings = TensorLocationSettings{TensorLocation(), 2, 8192}

Tensor location for gradient accumulator tensors.

std::map<TensorId, TensorLocation> tensorLocationSettingsOverride

Override tensor location for specific tensors by setting tensor locations for specific tensor ID values.

AutomaticLossScalingSettings automaticLossScalingSettings

Settings to enable and configure the automatic loss scaling behaviour when training.

Note

Automatic loss scaling is in preview. It is well tested and enabled in some of our example applications, but may not behave as expected in all models. Recommendation: if your model with automatic loss scaling enabled does not converge or triggers a compilation error, then you will need to set the loss scale manually.

DeveloperSettings developerSettings

Settings for developers to configure testing and benchmarking.

bool enableSupportedDataTypeCasting = true

Enable casting to supported data types.

If enabled (true), casts any tensor of unsupported data types to supported data types when lowering to Poplar. Currently, this implies casting:

  • INT64 -> INT32

  • UINT64 -> UINT32

The cast will throw an error for incompatible data types and over/underflows, and will warn about narrowing casts.

Default: true (enabled).
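
The behaviour described above (narrowing with an error on overflow or underflow) can be sketched in plain C++. This is a standalone illustration, not the code PopART uses when lowering to Poplar; the function names and the choice of std::overflow_error are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: narrow INT64 to INT32, rejecting out-of-range values.
int32_t narrowInt64(int64_t value) {
  if (value < std::numeric_limits<int32_t>::min() ||
      value > std::numeric_limits<int32_t>::max()) {
    throw std::overflow_error("INT64 value does not fit in INT32");
  }
  const int32_t narrowed = static_cast<int32_t>(value);
  assert(static_cast<int64_t>(narrowed) == value); // lossless after the check
  return narrowed;
}

// Hypothetical helper: narrow UINT64 to UINT32, rejecting large values.
uint32_t narrowUint64(uint64_t value) {
  if (value > std::numeric_limits<uint32_t>::max()) {
    throw std::overflow_error("UINT64 value does not fit in UINT32");
  }
  return static_cast<uint32_t>(value);
}
```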

bool enableExplicitMainLoops = false

Enable explicit main loop transformation, and disable implicit training loops.

Note

This will be deprecated and enabled by default.

bool groupNormStridedChannelGrouping = false

Enable fast math mode for group norms.

Group norms have a fast math mode which changes the implementation to run faster on the IPU but, as a consequence, is incompatible with other implementations (for example, when running trained weights on the host). The default (false) is to use the correct, but slightly slower, mode.

std::function<void(int, int)> compilationProgressLogger

Callback function used to indicate PopART compilation progress.

The function should not block. All calls to the callback function will be made from the main thread so blocking in the callback will block compilation from progressing.

If this logger is not set then compilation progress will be printed on the info channel.

Param int

The progress value.

Param int

The maximum value for the progress.
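
A callback with this signature might look like the sketch below. It is a standalone example: the ProgressRecorder type is hypothetical, and it only records the values so that the callback returns quickly, in line with the advice not to block.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical recorder: stores (progress, total) pairs and does no slow
// work inside the callback, so compilation is not held up.
struct ProgressRecorder {
  std::vector<std::pair<int, int>> events;

  // Returns a callable with the same type as compilationProgressLogger.
  // The recorder must outlive the returned callback.
  std::function<void(int, int)> asCallback() {
    return [this](int progress, int total) {
      assert(progress <= total);
      events.emplace_back(progress, total);
    };
  }
};
```

Since the returned object has type std::function<void(int, int)>, it matches the declared type of compilationProgressLogger and could be assigned to that member.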

int compilationProgressTotal = 100

Total progress ticks until compilation complete.

bool enableMergeExchange = true

Enable merging remote and host IO operations to facilitate IO overlap.

true to enable, otherwise false.

Default=true.

bool ensureFp32LossScaleTensor = false

Ensure that the loss scale tensor is fp32 and that this is combined with fp16 activations as late as possible to produce the first fp16 activation gradients.

This makes it possible to choose a loss scale value greater than max(fp16). This is also recommended when automatic loss scaling is enabled. Only compatible with models that have an fp16 loss scale tensor. true ensures that the loss scale tensor is fp32.

Default: false.

bool enableInplaceAmbiguityChecking = false

Enable creation of an AliasModel object for each graph and run the Poprithms ambiguity checker on it.

This throws an error if the graph has a potential inplacing ambiguity.

See poprithms::memory::inplace::Graph::AmbiguityStatus for more info on what constitutes an ambiguity.

If set to true, an AliasModel object is created for each graph and the Poprithms ambiguity checker is run on it. No ambiguity checking is performed if this option is set to false (default). However, inplace fallbacks will occur if necessary.

bool createImplicitPipeliningFwdOnlyProgram = false

Deprecated:

Create a custom program containing the forward pipeline only.

bool throwIfLog2ScaleTensorNotInRange = true

If set to true, throw a Poplar error if any fused ops that consume a log2 scale tensor receive a log2 scale tensor value not in the integer range [-32, 32).

If set to false, no error is thrown. However, note that this may lead to undefined behaviour if the value of the log2 scale is outside the range.

bool enableConstantFoldingOfMultipleConsumers = true

If set to false, disable constant folding on ops if any input has multiple consumers.

Default=true.

bool useLoopCandidateCreator = false

Use the loop candidate creator for a constant if one exists.

Default=false.

bool stashAllTensorsInferencePipeline = false

Stash all tensors when pipelining for inference.

Default=false.

struct ExperimentalSettings

Public Members

std::map<std::string, std::vector<std::string>> customTransformApplierSettings

Custom transform applier settings.

Enable to insert custom transform sequence at predefined checkpoint. Multiple checkpoint names and transform names can be passed for different model configurations.

The predefined checkpoint names are:

FWD0: Initial IR immediately after lowering from ONNX to the IR.

FWD1: After the pre-alias patterns have been applied to FWD0.

BWD0: After growing the backward pass (including the optimiser step). Note this happens before optimiser decomposition, so the optimiser will appear as a single special op rather than the many ops that implement it.

PREALIAS: After pre-alias transforms have been applied to BWD0.

MAINLOOPS: After the MainLoops transform has been applied. This transform adds explicit loop ops to the IR for device iterations (batches per step) and gradient accumulation.

FINAL: The final IR after preparation.

The transform names are defined by PopART and users.

For example, to execute 'Transform A' and 'Transform B' at the 'Fwd0' checkpoint and execute 'Transform C' at the 'Fwd1' checkpoint:

{ "Fwd0": [ "Transform A", "Transform B" ], "Fwd1": [ "Transform C" ] }

Note

This setting is experimental for inference and may change.
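
In C++, a value of the same type as customTransformApplierSettings could be built as in the sketch below. It is standalone and does not include any PopART headers; the alias TransformSettings and both helper functions are hypothetical, and the transform names are the placeholders used in the example above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Same type as the customTransformApplierSettings member.
using TransformSettings = std::map<std::string, std::vector<std::string>>;

// Builds the mapping from checkpoint name to transform sequence.
TransformSettings makeExampleSettings() {
  TransformSettings settings;
  settings["Fwd0"] = {"Transform A", "Transform B"};
  settings["Fwd1"] = {"Transform C"};
  return settings;
}

// Hypothetical lookup: transforms for a checkpoint, or none if unset.
std::vector<std::string> transformsAt(const TransformSettings &settings,
                                      const std::string &checkpoint) {
  assert(!checkpoint.empty());
  const auto it = settings.find(checkpoint);
  return it == settings.end() ? std::vector<std::string>{} : it->second;
}
```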

bool createHostTransferableTensorWithOffset = false

Accumulate the bytes of the created tensors and rotate the start tile of the next tensor to balance the tile mapping.

This is especially useful when there are many small input tensors: enabling it avoids always mapping to tile 0.

Default=false.

class NumIOTiles

A wrapper class for the SessionOptions::numIOTiles option that permits any int value and has an ‘unassigned’ state.

Public Functions

NumIOTiles()

Constructor.

NumIOTiles(int numIOTiles)

Constructor.

Parameters

numIOTiles – The number of IPU tiles dedicated to IO.

bool operator==(const int &rhs) const

Compare with int.

operator int() const

Auto convert to int.

NumIOTiles &operator=(const int &x)

Assign value using int.

struct TensorLocationSettings

A structure containing user configuration for cache/offloading settings.

Public Functions

TensorLocationSettings() = default

Constructor.

TensorLocationSettings(TensorLocation location_, int minElementsForOffChip_ = 2, int minElementsForReplicatedTensorSharding_ = 8192)

Constructor.

Parameters
  • location_ – The tensor location information.

  • minElementsForOffChip_ – The minimum number of elements below which offloading won’t be considered.

  • minElementsForReplicatedTensorSharding_ – The minimum number of elements necessary for replicated tensor sharding.

TensorLocationSettings(TensorStorage storage_, int minElementsForOffChip_ = 2, int minElementsForReplicatedTensorSharding_ = 8192)

Constructor.

Parameters
  • storage_ – The tensor storage information.

  • minElementsForOffChip_ – The minimum number of elements below which offloading won’t be considered.

  • minElementsForReplicatedTensorSharding_ – The minimum number of elements necessary for replicated tensor sharding.

Public Members

TensorLocation location = TensorLocation()

The default tensor location for this tensor type.

int minElementsForOffChip = 2

The minimum number of elements below which offloading won’t be considered.

int minElementsForReplicatedTensorSharding = 8192

A minimum number of elements below which replicated tensor sharding won’t be considered.

#include <popart/variablesettings.hpp>
class VariableSettings

A class to dictate behaviour of variables and reductions of such across multiple graphs.

Public Functions

void verify()

Runs test to see if the VariableSettings are invalid, and throws an error if so.

const CommGroup getSharedVariableDomain() const
Returns

the CommGroup sharedVariableDomain of this VariableSettings.

ReplicaGrouping getReplicaGrouping(unsigned numReplicas) const
Parameters

numReplicas – The number of replicas in the IR this is used in.

Returns

the ReplicaGrouping domain of this VariableSettings.

bool isUsingCommGroup() const
Returns

whether the VariableSettings were initialised using a CommGroup or a stride.

CommGroupType getCommGroupType() const
Returns

the CommGroupType. The value of this is invalid if VariableSettings::isUsingCommGroup returns false.

unsigned getStride() const
Returns

the stride. The value of this is invalid if VariableSettings::isUsingCommGroup returns true.

unsigned getGroupSize() const
Returns

the replica group size.

inline VariableRetrievalMode getRetrievalMode() const
Returns

the VariableRetrievalMode retrievalMode of this VariableSettings.

VariableSettings()

“Default” constructor, defaults CommGroup to [All, 0] and retrievalMode to OnePerGroup.

VariableSettings(CommGroup sharedVariableDomain_)

Defaults VariableRetrievalMode to OnePerGroup.

VariableSettings(VariableRetrievalMode retrievalMode_)

Defaults CommGroup to [All, 0].

VariableSettings(CommGroup sharedVariableDomain_, VariableRetrievalMode retrievalMode_)

Entirely custom VariableSettings.

VariableSettings(unsigned stride, unsigned groupSize)
VariableSettings(unsigned stride, unsigned groupSize, VariableRetrievalMode retrievalMode)
unsigned numReplicasReturningVariable(unsigned replicaCount) const

Calculate the number of replicas that will return this variable.

Parameters

replicaCount – Number of global replicas.

Returns

Number of variables returned.

unsigned getGroupCount(unsigned replicaCount) const
Parameters

replicaCount – The replicationFactor of the graph.

Returns

The number of groups given the replicaFactor and the VariableSettings.

unsigned getStride(unsigned replicaCount) const
Parameters

replicaCount – The replicationFactor of the graph.

Returns

The stride between each member of a group.

unsigned getRealGroupSize(unsigned replicaCount) const

Because CommGroups do not have a defined group size if the type is All or None, this function returns a group size that is always accurate, based on the number of replicas.

Parameters

replicaCount – The replication factor

Returns

The actual number of replicas in a group
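
The rule can be sketched as follows, based on the documented meaning of the All and None group types (All treats all replicas as one group, None puts each replica in its own group). This is a standalone illustration; the GroupType enum and the function name are hypothetical stand-ins, not the PopART types.

```cpp
#include <cassert>

// Hypothetical stand-in mirroring the documented CommGroupType values.
enum class GroupType { All, Consecutive, Orthogonal, None };

// Group size that is meaningful for every group type.
unsigned realGroupSize(GroupType type, unsigned groupSize,
                       unsigned replicaCount) {
  assert(replicaCount > 0);
  switch (type) {
  case GroupType::All:
    return replicaCount; // one group containing every replica
  case GroupType::None:
    return 1; // each replica is its own group
  case GroupType::Consecutive:
  case GroupType::Orthogonal:
    assert(groupSize > 0 && replicaCount % groupSize == 0);
    return groupSize;
  }
  return groupSize; // not reached for valid enum values
}
```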

unsigned getGroupRepresentative(unsigned group) const

Get the default first member of a group.

Parameters

group – The group to return the representative for.

Returns

The representative replica of this group.

Shape shapeOnReplica(Shape full_shape, unsigned replicaCount, const TensorId name) const

The shape that ONNX reads holds an extra outer dimension in certain cases, where the outer dimension represents the number of returning replica variables.

This function takes an ONNX full shape and safely removes the outer dimension (that is, it checks that the outer dimension matches the expected outer dimension). It is a convenience function to avoid duplicate code.

Parameters
  • full_shape – The shape as presented by Onnx.

  • replicaCount – The local replication factor, used to calculate the return factor.

  • name – The TensorId of the function, used to give good error feedback.

Returns

The shape of the data on the replica.

Shape shapeOnHost(Shape replica_shape, unsigned replicaCount) const

Takes the shape of a tensor on a replica and returns its full ONNX shape.

This is the inverse operation to shapeOnReplica.

Parameters
  • replica_shape – The shape of the data on a replica.

  • replicaCount – The local replication factor, used to calculate the return factor.

Returns

The shape as presented by Onnx.
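
The inverse relationship between the two shape functions can be sketched as below. This standalone example covers only the case where the extra outer dimension is present, and takes the number of returning replicas as an explicit argument rather than deriving it from a VariableSettings object; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Shape = std::vector<int64_t>;

// Replica shape -> host shape: prepend the number of returning replicas.
Shape addReplicaDim(const Shape &replicaShape, unsigned numReturned) {
  assert(numReturned > 0);
  Shape full;
  full.reserve(replicaShape.size() + 1);
  full.push_back(static_cast<int64_t>(numReturned));
  full.insert(full.end(), replicaShape.begin(), replicaShape.end());
  return full;
}

// Host shape -> replica shape: check the outer dimension, then drop it.
Shape stripReplicaDim(const Shape &fullShape, unsigned numReturned) {
  if (fullShape.empty() ||
      fullShape.front() != static_cast<int64_t>(numReturned)) {
    throw std::invalid_argument("unexpected outer dimension");
  }
  return Shape(fullShape.begin() + 1, fullShape.end());
}
```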

std::vector<std::vector<std::int64_t>> groups(unsigned replicaCount) const

This function returns a set of vectors where each vector contains all the replica IDs of the replicas with a sharedVariableDomain given the variableSettings and the replicaCount.

Parameters

replicaCount – The local replication factor

Returns

A set of sets, such that set.at(a).at(b) is member number b of group a, set.size() is the number of groups and set.at(a).size() is the size of group a.

bool operator==(const VariableSettings &other) const

Compare two variable-settings.

Parameters

other – VariableSettings to compare these settings to.

Returns

True if all internal elements are the same

bool operator!=(const VariableSettings &other) const

Compare two variable-settings.

Parameters

other – VariableSettings to compare these settings to.

Returns

False if all internal elements are the same

enum class popart::VariableRetrievalMode

Enum type that describes how to retrieve variables from the replicas.

Each replica is in a group defined by the VariableSettings::sharedVariableDomain. Replicas within a group have variables initialized with the same values.

Values:

enumerator OnePerGroup = 0

Returns one variable per group (defined by the VariableSettings::sharedVariableDomain CommGroup), automatically returns the first replica of each group, where first means the one with the lowest replica ID.

enumerator AllReduceReplicas

As OnePerGroup, but performs an AllReduce among the replicas in the same group according to VariableSettings::sharedVariableDomain. Currently unsupported.

enumerator AllReplicas

Returns all replica Weights.
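
The effect of the retrieval mode on the number of returned variables can be sketched as follows. This is a standalone illustration with a hypothetical RetrievalMode enum that mirrors the documented enumerators; the group count is passed in directly, and the unsupported mode is modelled by throwing, which is an assumption of this sketch.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in mirroring the documented VariableRetrievalMode.
enum class RetrievalMode { OnePerGroup, AllReduceReplicas, AllReplicas };

// Number of replicas that return the variable for a given mode.
unsigned numReturningReplicas(RetrievalMode mode, unsigned replicaCount,
                              unsigned groupCount) {
  assert(groupCount > 0 && groupCount <= replicaCount);
  switch (mode) {
  case RetrievalMode::OnePerGroup:
    return groupCount; // first replica of each group
  case RetrievalMode::AllReplicas:
    return replicaCount; // every replica
  case RetrievalMode::AllReduceReplicas:
    break; // documented as unsupported
  }
  throw std::logic_error("AllReduceReplicas is documented as unsupported");
}
```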

#include <popart/commgroup.hpp>
class CommGroup

Class to specify sub-groups of replicas.

Examples of derived sub-groups:

  • IPU-link domain sub-rack:

type == Consecutive && replicaGroupSize == 64/replica-size/N

where N is a power of two and replicaGroupSize > 1.

  • Complete IPU-link domain / full rack:

type == Consecutive && replicaGroupSize == 64/replica-size

  • Using GW-links only:

type == Orthogonal && replicaGroupSize == numberOfIpuLinkDomains

Public Functions

CommGroup()

Default CommGroup constructor.

Sets type to CommGroupType::All and replicaGroupSize to 0.

inline CommGroup(CommGroupType type, unsigned groupSize)

Construct CommGroup.

Parameters
  • type – The replica group type.

  • groupSize – The replica group size.

explicit CommGroup(const ReplicaGrouping &grouping)

Construct CommGroup from a ReplicaGrouping.

Parameters

grouping – The replica grouping.

ReplicaGrouping toReplicaGrouping(unsigned numReplicas) const

Convert this CommGroup to a ReplicaGrouping.

Parameters

numReplicas – The number of replicas with which to create the replica grouping.

Returns

The replica grouping.

bool operator==(const CommGroup &other) const
bool operator!=(const CommGroup &other) const

Public Members

CommGroupType type = CommGroupType::All

Replica group type.

unsigned replicaGroupSize = 0

Replica group size.

enum class popart::CommGroupType

PopART equivalent of GCL CommGroupType.

Each of these enumeration constants has a corresponding GCL CommGroupType value.

Values:

enumerator All = 0

All replicas viewed as one group, replica group size is ignored.

enumerator Consecutive

Groups are consecutive in replicas.

If there are N replicas denoted {0, ... N-1} and the group size is k, then there are N/k groups of size k as {0, 1, ... k-1}, {k, ... 2k-1} ... {N-k, ... N-1}.

enumerator Orthogonal

Groups are sliced orthogonal to the replica ordering.

If there are N replicas denoted {0, ... N-1} and the group size is k, then there are m = N/k groups of size k as {0, m, 2m, ...}, {1, m+1, 2m+1, ...} ... {m-1, 2m-1, ... N-1}.

enumerator None

Each replica is in its own group; the replica group size is ignored.

enumerator N

Number of values.
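
The Consecutive and Orthogonal groupings described above can be enumerated with a short standalone sketch. The function names and the Groups alias are hypothetical; the code follows the formulas given for each enumerator and assumes that the group size divides the number of replicas.

```cpp
#include <cassert>
#include <vector>

using Groups = std::vector<std::vector<unsigned>>;

// Consecutive: {0, ... k-1}, {k, ... 2k-1}, ..., {N-k, ... N-1}.
Groups consecutiveGroups(unsigned numReplicas, unsigned groupSize) {
  assert(groupSize > 0 && numReplicas % groupSize == 0);
  Groups groups(numReplicas / groupSize);
  for (unsigned r = 0; r < numReplicas; ++r) {
    groups[r / groupSize].push_back(r);
  }
  return groups;
}

// Orthogonal: with m = N/k groups, {0, m, 2m, ...}, {1, m+1, ...}, ...
Groups orthogonalGroups(unsigned numReplicas, unsigned groupSize) {
  assert(groupSize > 0 && numReplicas % groupSize == 0);
  const unsigned numGroups = numReplicas / groupSize;
  Groups groups(numGroups);
  for (unsigned r = 0; r < numReplicas; ++r) {
    groups[r % numGroups].push_back(r);
  }
  return groups;
}
```

For 8 replicas and a group size of 4, the consecutive groups are {0, 1, 2, 3} and {4, 5, 6, 7}, while the orthogonal groups are {0, 2, 4, 6} and {1, 3, 5, 7}.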

14.2. Data input and output (IStepIO)

#include <popart/istepio.hpp>
class IStepIO

An abstract base class through which input and output data is passed to a Session (see Session::run).

Data is passed via buffers. In the case of buffers returned by IStepIO::in, PopART reads from these buffers. In the case of IStepIO::out, PopART writes to these buffers. The IStepIO::inComplete() and IStepIO::outComplete() functions are called by PopART to signal it is done with an input or output buffer.

An IStepIO implementation should conceptually implement a rolling queue of active buffers for each input and output tensor. Every successful call to IStepIO::in should yield a new data buffer for PopART to read from and add it to the head of the conceptual queue. Conversely, every call to IStepIO::inComplete() should be taken to mean that the buffer at the tail-end of the queue is no longer being used by PopART. This buffer is removed from the conceptual queue.

Note that an IStepIO::in call with the prefetch flag set is only considered successful when it returns data.

Output works analogously to input.

The expected total number of input (or output) buffers that are ‘completed’ for a tensor in one Session::run call is bps \(\times\) SessionOptions::accumulationFactor \(\times\) SessionOptions::replicatedGraphCount, where bps is the number of batches per call to Session::run (this is a value captured by the DataFlow instance passed to the Session instance).

Note, however, that there may be additional ‘incomplete’ calls to IStepIO::in and IStepIO::out.

Furthermore, the number of input (or output) buffers that may be ‘incomplete’ at a given time for a given tensor should not normally be more than SessionOptions::bufferingDepth \(\times\) SessionOptions::replicatedGraphCount, but this bound is not guaranteed.
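
The two quantities above are simple products, as the standalone sketch below shows. The function names are hypothetical, and the second function computes only the usual bound, which is not guaranteed.

```cpp
#include <cassert>
#include <cstdint>

// Expected number of 'completed' buffers per tensor in one Session::run call.
int64_t expectedCompletedBuffers(int64_t batchesPerStep,
                                 int64_t accumulationFactor,
                                 int64_t replicatedGraphCount) {
  assert(batchesPerStep > 0 && accumulationFactor > 0 &&
         replicatedGraphCount > 0);
  return batchesPerStep * accumulationFactor * replicatedGraphCount;
}

// Usual (not guaranteed) upper bound on simultaneously 'incomplete' buffers.
int64_t usualIncompleteBound(int64_t bufferingDepth,
                             int64_t replicatedGraphCount) {
  assert(bufferingDepth > 0 && replicatedGraphCount > 0);
  return bufferingDepth * replicatedGraphCount;
}
```

For instance, 3 batches per step with an accumulation factor of 2 and no replication gives 6 completed buffers per run.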

EXAMPLE: Suppose a session is configured such that the total expected number of input buffers is 6 and these are input buffers for a tensor with ID t with 100 elements. The associated input calls in IStepIO may look like this if SessionOptions::bufferingDepth is 3:

in("t", 100, false) -> Give buffer[0] to PopART.
in("t", 100, true) -> Give buffer[1] to PopART.
in("t", 100, true) -> Give buffer[2] to PopART.
inComplete("t", 100) -> buffer[0] is no longer required and can be reused.
in("t", 100, true) -> Give buffer[3] to PopART.
inComplete("t", 100) -> buffer[1] is no longer required and can be reused.
in("t", 100, true) -> Give buffer[4] to PopART.
inComplete("t", 100) -> buffer[2] is no longer required and can be reused.
in("t", 100, true) -> Give buffer[5] to PopART.
inComplete("t", 100) -> buffer[3] is no longer required and can be reused.
in("t", 100, true) -> No data available, return nullptr.
inComplete("t", 100) -> buffer[4] is no longer required and can be reused.
inComplete("t", 100) -> buffer[5] is no longer required and can be reused.
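
The call sequence above can be reproduced with a small rolling queue. This is a standalone sketch of the bookkeeping an implementation might do for one tensor: RollingBufferQueue is a hypothetical class that does not derive from popart::IStepIO, and it returns raw pointers instead of ConstVoidData to stay self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical per-tensor queue of active input buffers.
class RollingBufferQueue {
public:
  explicit RollingBufferQueue(std::vector<std::vector<float>> buffers)
      : buffers_(std::move(buffers)) {}

  // Mirrors IStepIO::in: hand out the next buffer and mark it active, or
  // return nullptr when no data is left (only acceptable with prefetch).
  const float *in(bool prefetch) {
    if (next_ == buffers_.size()) {
      assert(prefetch && "a non-prefetch request without data is an error");
      return nullptr;
    }
    active_.push_back(next_);
    return buffers_[next_++].data();
  }

  // Mirrors IStepIO::inComplete: the oldest active buffer is released.
  // Returns the index of the buffer that can now be reused.
  std::size_t inComplete() {
    assert(!active_.empty());
    const std::size_t released = active_.front();
    active_.pop_front();
    return released;
  }

  std::size_t numActive() const { return active_.size(); }

private:
  std::vector<std::vector<float>> buffers_;
  std::deque<std::size_t> active_; // indices of buffers PopART still uses
  std::size_t next_ = 0;
};
```

Buffers are released in the order in which they were handed out, so a failed prefetch request leaves the queue unchanged.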

Subclassed by popart::StepIOCallback, popart::StepIOGeneric< ARRAY_TYPE, ACCESSOR_TYPE, ArrayInfoT >, popart::StepIOGeneric< IArray, StepIONS::IArrayAccessor, IArray & >

Public Functions

virtual ~IStepIO() = default

Destructor for IStepIO.

virtual ConstVoidData in(TensorId id, int64_t numElements, bool prefetch, const bool isBroadcast = false) = 0

Request a new input data buffer.

The memory in this buffer is available for use in PopART until the corresponding inComplete() call.

Note

Failing to provide a valid data buffer will result in a runtime failure if prefetch is set to false.

Parameters
  • id – The ID of the tensor to return data for.

  • numElements – The number of elements in the tensor.

  • prefetch – If set to true the inability to provide data is not considered an error. If false, it is considered an error if no data can be provided.

Returns

The input buffer for this tensor (or nullptr on failure) returned as a ConstVoidData object.

virtual void inComplete(TensorId id, int64_t numElements, const bool isBroadcast = false) = 0

Notify the user (running a PopART program) that a previously retrieved input data buffer is no longer used by PopART.

Parameters
  • id – The ID of the tensor to return data for.

  • numElements – The number of elements in the tensor.

virtual MutableVoidData out(TensorId id, int64_t numElements) = 0

Request a new output data buffer.

The memory in this buffer is available for use in PopART until the corresponding outComplete() call and will be modified in-place.

Note

Failing to provide a valid data buffer will result in a runtime failure.

Parameters
  • id – The ID of the tensor to return data for.

  • numElements – The number of elements in the tensor.

Returns

The output buffer for this tensor returned as a MutableVoidData object.

inline virtual void outComplete(TensorId)

Notify the user (running a PopART program) that a previously retrieved output data buffer is no longer used by PopART.

Parameters
  • id – The ID of the tensor to return data for.

  • numElements – The number of elements in the tensor.

inline void enableRuntimeAsserts(bool b)

Enable or disable runtime asserts.

If runtime asserts are enabled, then a check that the input and output buffers have the correct number of elements is performed. As Session.run() is called multiple times during a user’s session, the check is only performed in the first call to Session.run(), under the assumption that the user is unlikely to change the size of buffers between runs.

Parameters

b – The setting to enable runtime asserts (true) or disable runtime asserts (false).

inline bool runtimeAssertsEnabled() const

Check if runtime asserts are enabled.

Returns

true if runtime asserts are enabled, otherwise false.

virtual void assertNumElements(const popx::Executablex&) const = 0

Check number of elements.

This check is performed when runtimeAssertsEnabled() is true.

Parameters

Executablex – The input executable to be checked that the input and output buffers have the correct number of elements.

#include <popart/stepio.hpp>
class StepIO : public popart::StepIOGeneric<IArray, StepIONS::IArrayAccessor, IArray&>

Class to provide a Session object with input and output data.

Public Functions

inline StepIO(std::map<TensorId, IArray&> inputs, std::map<TensorId, IArray&> outputs)

Constructor for StepIO.

Parameters
  • inputs – The input data.

  • outputs – The output data.

class StepIOCallback : public popart::IStepIO

Class that implements the IStepIO interface using user-provided callback functions.

The IStepIO interface contains a number of pure virtual member functions through which PopART receives buffers to read data from and buffers to write data to. StepIOCallback inherits from IStepIO and implements those member functions by delegating the logic to the callback functions passed in the constructor. This gives the user full control as to how data buffers are provisioned.

See IStepIO for more details on the expected behaviour of the callbacks.

Public Types

using InputCallback = std::function<ConstVoidData(TensorId, bool)>

Callable object that implements IStepIO::in().

using InputCompleteCallback = std::function<void(TensorId)>

Callable object that implements IStepIO::inComplete().

using OutputCallback = std::function<MutableVoidData(TensorId)>

Callable object that implements IStepIO::out().

using OutputCompleteCallback = std::function<void(TensorId)>

Callable object that implements IStepIO::outComplete().

Public Functions

inline StepIOCallback(InputCallback inputCallback, InputCompleteCallback inputCompleteCallback, OutputCallback outputCallback, OutputCompleteCallback outputCompleteCallback)

Construct a StepIOCallback object.

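Parameters
  • inputCallback – The callable object used to implement IStepIO::in().

  • inputCompleteCallback – The callable object used to implement IStepIO::inComplete().

  • outputCallback – The callable object used to implement IStepIO::out().

  • outputCompleteCallback – The callable object used to implement IStepIO::outComplete().
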
inline virtual void assertNumElements(const popx::Executablex&) const

Check number of elements.

This check is performed when IStepIO::runtimeAssertsEnabled() is true.

Parameters

Executablex – The input executable to be checked that the input and output buffers have the correct number of elements.

virtual ConstVoidData in(TensorId id, int64_t numElements, bool prefetch, bool) final

This function is called by PopART when a StepIOCallback instance is passed to Session::run() and will internally call the inputCallback parameter passed to the constructor.

This function should not be called directly.

virtual void inComplete(TensorId id, int64_t numElements, bool) final

This function is called by PopART when a StepIOCallback instance is passed to Session::run() and will internally call the inputCompleteCallback parameter passed to the constructor.

This function should not be called directly.

virtual MutableVoidData out(TensorId id, int64_t numElements) final

This function is called by PopART when a StepIOCallback instance is passed to Session::run() and will internally call the outputCallback parameter passed to the constructor.

This function should not be called directly.

virtual void outComplete(TensorId id) final

This function is called by PopART when a StepIOCallback instance is passed to Session::run() and will internally call the outputCompleteCallback parameter passed to the constructor.

This function should not be called directly.

class IWeightsIO

A virtual class for accessing pointers to the data required to perform a training step.

Subclassed by popart::WeightsIO

Public Functions

virtual ~IWeightsIO() = default

Destructor for IWeightsIO.

virtual bool contains(TensorId) const = 0

Check if the WeightsIO instance contains the weights for a specific tensor.

Parameters

TensorId – The ID of the tensor to look for weights for.

Returns

true if the WeightsIO instance contains weights for the tensor, false otherwise.

virtual MutableVoidData weight(TensorId) const = 0

Retrieve weights for a specific tensor.

Parameters

TensorId – The ID of the tensor to retrieve weights for.

Returns

The weights.

class WeightsIO : public popart::IWeightsIO

Class representing weights.

Public Functions

~WeightsIO() override = default

Destructor for WeightsIO.

virtual bool contains(TensorId) const final

Check if the WeightsIO instance contains the weights for a specific tensor.

Parameters

TensorId – The ID of the tensor to look for weights for.

Returns

true if the WeightsIO instance contains weights for the tensor, false otherwise.

virtual MutableVoidData weight(TensorId) const final

Retrieve weights for a specific tensor from the WeightsIO object.

Parameters

TensorId – The ID of the tensor to retrieve weights for.

Returns

The weights.

void insert(TensorId, MutableVoidData)

Insert weights for a specific tensor into the WeightsIO object.

Parameters
  • TensorId – The ID of the tensor to insert weights for.

  • MutableVoidData – The weights to insert.

struct IArrayAccessor

Structure to help with accessing the data in IArray objects.

Public Static Functions

static inline void *getDataPointer(IArray &array)

Get pointer to the data.

Parameters

array – The IArray object.

Returns

A pointer to the data contained in the IArray object.

static inline size_t getArraySize(const IArray &array)

Get the number of data elements.

Parameters

array – The IArray object.

Returns

The number of data elements.

static inline DataType getArrayDataType(IArray &array)

Get the data type of the data.

Parameters

array – The IArray object.

Returns

The data type of the data.

static inline size_t getArrayRank(IArray &array)

Get the rank of the data array.

Parameters

array – The IArray object.

Returns

The rank of the data array.

static inline int64_t getArrayDim(IArray &array, size_t index)

Get the size of the array along a specific dimension.

Parameters
  • array – The IArray object.

  • index – The index of the dimension in the IArray object.

Returns

The size of the array along the specified dimension.

#include <popart/stepio_generic.hpp>
template<typename ARRAY_TYPE, typename ACCESSOR_TYPE, typename ArrayInfoT>
class StepIOGeneric : public popart::IStepIO

Subclassed by popart::StepIO

Public Functions

inline void assertNumElements(const popx::Executablex &exe) const final
inline TensorInfo getTensorInfo(ARRAY_TYPE &array) const
template<typename T>
inline T get(TensorId id, std::map<TensorId, ArrayInfo> &M, int64_t numElements, bool advance_, std::string mapName)
template<typename T>
inline void advance(TensorId id, std::map<TensorId, ArrayInfo> &M, int64_t numElements, std::string mapName)
inline ConstVoidData in(TensorId id, int64_t numElements, bool, bool) final
inline void inComplete(TensorId id, int64_t numElements, bool) final
inline MutableVoidData out(TensorId id, int64_t numElements) final
struct ArrayInfo

Public Members

ArrayInfoT array
int64_t offset
#include <popart/iarray.hpp>
class IArray

Subclassed by popart::NDArrayWrapper< T >

Public Functions

inline virtual ~IArray()
virtual void *data() = 0
virtual DataType dataType() const = 0
virtual std::size_t rank() const = 0
virtual int64_t dim(size_t index) const = 0
virtual std::size_t nelms() const = 0
virtual const Shape shape() const = 0

14.3. Tensors

#include <popart/tensor.hpp>
class Tensor : public popart::Vertex

Public Functions

Tensor(TensorId, TensorType, Graph&, const DebugContext& = {})
Tensor(TensorId, VariableSettings, Graph&, const DebugContext& = {})
Tensor(TensorId, TensorType, VariableSettings, Graph&, const DebugContext& = {})
inline std::string str() const final
virtual std::unique_ptr<Tensor> clone(Graph &graph_) const
TensorType tensorType() const
std::string tensor_type() const
void setTensorType(TensorType)
inline ReplicatedStreamMode getReplicatedStreamMode() const
inline void setReplicatedStreamMode(const ReplicatedStreamMode &mode)
void setTensorLocationInfo(TensorLocation&, std::pair<RemoteBufferId, RemoteBufferIndex> &remoteBufferInfo)
std::set<PipelineStage> getPipelineStages() const
Op *getProducerUnsafe() const
Op *getProducer() const
void setProducer(Op*)
void resetProducer(Op*)
bool hasProducer() const
bool isGraphInput() const
InIndex getGraphInputIndex() const
bool isGraphOutput() const
OutIndex getGraphOutputIndex() const
bool isLoopInput() const
bool isImplicitLoopInput() const
bool isExplicitLoopInput() const
bool isLoopTripCounter() const
bool isUnmodifiable() const
bool isCheckpointTensor() const
bool isImplicitRecomputeTensor() const
bool isRestoreInplaceTensor() const
bool idIncludesPrefix(const std::vector<std::string>&) const
bool isOptimizerTensor() const
bool isRemoteArgTensor() const
bool isRandomSeedTensor() const
bool isOptimizerStateTensor() const
bool isAccumulatorTensor() const
bool isHostLoadTensor() const

Is this tensor produced by a HostLoad Op or MultiExchangeOp with HostLoad descriptor?

Returns

true if the producer is a HostLoad Op or MultiExchangeOp with HostLoad descriptor, false otherwise.

bool isWeightTensor() const
bool isAnchored() const
bool isRootAnchor() const
bool hasTensorData() const
TensorData *tensorData()
const TensorData *tensorData() const
bool anyAlias(std::function<bool(Tensor*)> predicate) const
bool anyAliasFor(std::function<bool(Tensor*)> predicate, const AliasModel &popMem) const
void setTensorDataFromCopyOf(const void *src, std::size_t size)
void setTensorDataFromViewOf(void *src, std::size_t size)
void setTensorDataByEmplaceOf(std::vector<char> &&data)
void setTensorData(const TensorData &td)
void setTensorData(TensorData &&td)
std::vector<Op*> associatedOps() const
inline Graph &getGraph()
inline const Graph &getGraph() const
Ir &getIr()
const Ir &getIr() const
bool hasVirtualGraphId() const
VGraphId getVirtualGraphId() const
VGraphId getVirtualGraphIdUnsafe() const
VGraphIdAndTileSet getVirtualGraphIdAndTileSet(std::set<OpId> &visited) const
VGraphIdAndTileSet getVirtualGraphIdAndTileSetUnsafe() const
VGraphIdAndTileSet getVirtualGraphIdAndTileSetUnsafe(std::set<OpId> &visited) const
int getBatchAxis() const
bool consumersAllPreLoss() const
bool isModified(bool considerLoopInput = true) const

Check if any of the consumers modify this tensor.

Parameters

considerLoopInput – If explicit loop inputs should be considered as being modified. If false, only operations modifying the tensor inplace will be considered.

Returns

True if the tensor is modified, otherwise false.

bool isAliased() const

Check if any of the consumers alias this tensor.

Returns

True if the tensor is aliased to any output, otherwise false.

view::Regions modifiedRegionsByOps(std::vector<Op*> ops, Aliases &aliases) const
view::Regions modifiedRegionsByOps(std::vector<OpId> opIds, Aliases &aliases) const
std::set<Op*, POpCmp> getInplaceModifiers() const

Find operations that modify a tensor.

Returns

All operations that (directly or indirectly) modify this tensor.

std::set<Op*, POpCmp> getInplaceModifiersFor(const AliasModel *popMem) const

Find operations that modify a tensor with the given poprithm graph.

Returns

All operations that (directly or indirectly) modify this tensor.

std::vector<char> getDataViaGraphTraversal() const
inline const popart::DebugInfo &getDebugInfo() const
inline void setVariableUpdateType(VariableUpdateType type)

Member of the old subclass VariableTensor (formerly class VariableTensor : public Tensor).

inline VariableUpdateType getVariableUpdateType() const
inline void setCopyFromTensor(TensorId value)
inline TensorId getCopyFromTensor()
inline VariableSettings getVariableSettings() const

Returns

The VariableSettings of this Variable.

std::vector<int64_t> returnedShape(unsigned replicationFactor)

Returns the shape necessitated by IO.

Parameters

replicationFactor – The replication factor

Returns

the shape of the tensor, considering replica groups

void verifyMutableVoidInfo(const TensorInfo mutableVoidInfo, unsigned replicationFactor)

Check that the info of a mutableVoidData object matches the expectations set by the TensorInfo and VariableSettings.

Throws an error if there is a mismatch.

Parameters
  • mutableVoidInfo – The data of the MutableVoidInfo with the same id as this tensor

  • replicationFactor – The replicationFactor of this instance

void setPreparedVGraphIdAndTileSet()

Set the preparedVGraphIdAndTileSet.

Public Members

TensorId id
Consumers consumers
TensorInfo info
TensorLocationInfo tensorLocationInfo
InputSettings inputSettings
enum class popart::TensorType

Values:

enumerator ActGrad = 0
enumerator Const
enumerator Stream
enumerator Unknown
enumerator Variable
enumerator N
enum class popart::VariableUpdateType

Values:

enumerator None = 0
enumerator Gradient
enumerator Copy
#include <popart/tensorinfo.hpp>
enum class popart::DataType

There is a one-to-one correspondence between popart::DataTypes and ONNX_NAMESPACE::TensorProto_DataTypes, which is equivalent to decltype(ONNX_NAMESPACE::TensorProto().data_type()).

Values:

enumerator UINT8 = 0
enumerator INT8
enumerator FLOAT8_143
enumerator FLOAT8_152
enumerator UINT16
enumerator INT16
enumerator INT32
enumerator INT64
enumerator UINT32
enumerator UINT64
enumerator BOOL
enumerator FLOAT
enumerator FLOAT16
enumerator BFLOAT16
enumerator DOUBLE
enumerator COMPLEX64
enumerator COMPLEX128
enumerator STRING
enumerator UNDEFINED
class DataTypeInfo

Public Functions

DataTypeInfo(DataType type__, int nbytes__, bool isFixedPoint__, std::string name__, std::string lcasename__)
DataType type() const
const int &nbytes() const
const std::string &name() const
const std::string &lcasename() const
bool isFixedPoint() const
class TensorInfo

Public Functions

TensorInfo(DataType, const Shape&)

Create TensorInformation based on data type and shape.

Parameters
  • data_type – The data type.

  • shape – The actual shape of the tensor.

TensorInfo(DataType data_type, const Shape &shape, const Shape &meta_shape)

Create TensorInformation based on data type, shape and meta shape.

Parameters
  • data_type – The data type.

  • shape – The actual shape of the tensor.

  • meta_shape – The meta shape of the tensor, which can for example be used to store the original tensor shape before replicated tensor sharding was applied.

TensorInfo(std::string data_type, std::string shape)
TensorInfo(std::string data_type, const Shape&)
explicit TensorInfo(const ONNX_NAMESPACE::TensorProto&)
explicit TensorInfo(const ONNX_NAMESPACE::TypeProto&)
void set(const ONNX_NAMESPACE::TensorProto&)
void set(const ONNX_NAMESPACE::TypeProto&)
TensorInfo() = default
void set(DataType)
void set(DataType, const Shape&)
void set(DataType, const Shape&, const Shape&)
const Shape &shape() const
const Shape &metaShape() const
std::vector<size_t> shape_szt() const
inline Rank rank() const
inline int64_t nelms() const
int64_t nbytes() const
inline int64_t dim(int i) const
inline std::vector<int> strides(const std::vector<long> &shape)

Get the strides of the tensor, that is the number of bytes to step in each dimension when traversing an array in memory.

See https://numpy.org/doc/stable/reference/generated/numpy.ndarray.strides.html

Parameters

shape – The on-host ONNX shape of a tensor. This is different from this->shape(), which gives the on-replica shape of a tensor

Returns

std::vector<int> The strides vector.
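As an illustration of what byte strides mean, the following is a minimal sketch and not PopART's implementation. It assumes a densely packed, row-major array, which is NumPy's default layout; the helper name rowMajorByteStrides and its bytesPerElement parameter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of PopART): byte strides for a densely
// packed, row-major array with the given shape and element size.
std::vector<int> rowMajorByteStrides(const std::vector<long> &shape,
                                     int bytesPerElement) {
  assert(bytesPerElement > 0);
  std::vector<int> strides(shape.size(), 0);
  long step = bytesPerElement; // the innermost dimension steps one element
  // Walk from the innermost dimension outwards, growing the step size.
  for (std::size_t i = shape.size(); i-- > 0;) {
    strides[i] = static_cast<int>(step);
    step *= shape[i];
  }
  return strides;
}
```

For a tensor of shape {2, 3, 4} with 4-byte elements, this sketch yields strides of 48, 16 and 4 bytes.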

DataType dataType() const
const std::string &data_type() const
const std::string &data_type_lcase() const
void append(std::ostream&) const
bool isSet() const
bool operator==(const TensorInfo&) const
bool operator!=(const TensorInfo&) const
Shape shapeFromString(const std::string &s) const
ONNX_NAMESPACE::TypeProto getOnnxTypeProto() const
const DataTypeInfo *getDataTypeInfo() const

Public Static Functions

static std::string npOutDataTypeExceptionMessage(const TensorInfo &i0, const TensorInfo &i1, const std::string &debugName)
#include <popart/tensorindex.hpp>
class TensorIndexMap

Public Functions

TensorIndexMap() = default
~TensorIndexMap()
void insert(int, Tensor*)
void reset(int, Tensor*)
void erase(int)
void clear()
bool contains(Tensor*) const
Tensor *tensor(int)
const Tensor *tensor(int) const
TensorId id(int) const
bool hasIndex(int) const
const std::vector<int> &indices(Tensor*) const
const std::map<Tensor*, std::vector<int>, PTensorCmp> &indicesMap() const
const std::map<int, Tensor*> &tensorMap() const
const std::vector<Tensor*> tensors() const
std::map<int, TensorId> tensorIdMap() const
std::map<TensorId, int> idMap() const
int n() const
void append(std::stringstream&, std::string prefix, int max_id_length) const
void setInfoIfIndex(const TensorInfo&, int index)
std::vector<TensorId> getSerialised() const
int maxIdLength() const
std::map<int, Shape> getIndexShapeMap()
int minIndex() const
int maxIndex() const
#include <popart/tensorlocation.hpp>
enum class popart::ReplicatedTensorSharding

Enum type to specify whether to shard tensors over replicas.

Values:

enumerator Off = 0

Don’t shard tensors over replicas.

enumerator On = 1

Do shard tensors over replicas.

enumerator N = 2

Number of values.

class TensorLocation

Class that describes the memory characteristics of one or multiple tensors.

See also: SessionOptions.

Public Functions

TensorLocation()

Equivalent to calling TensorLocation(TensorStorage::Undefined, TileSet::Compute, TileSet::Compute, ReplicatedTensorSharding::Off)

TensorLocation(TensorStorage storage)

Equivalent to calling TensorLocation(storage, TileSet::Compute, TileSet::Compute, ReplicatedTensorSharding::Off)

TensorLocation(TensorStorage storage, ReplicatedTensorSharding replicatedTensorSharding)

Equivalent to calling TensorLocation(storage, TileSet::Compute, TileSet::Compute, replicatedTensorSharding)

TensorLocation(TensorStorage storage, ReplicatedTensorSharding replicatedTensorSharding, CommGroup shardingDomain)

Equivalent to calling TensorLocation(storage, TileSet::Compute, TileSet::Compute, replicatedTensorSharding, shardingDomain)

TensorLocation(TensorStorage storage, TileSet loadTileSet, TileSet storageTileSet, ReplicatedTensorSharding replicatedTensorSharding)

Construct a TensorLocation from parameters.

Parameters
  • storage – The memory location of the tensor(s).

  • loadTileSet – The tiles through which the tensor(s) are loaded onto the chip.

  • storageTileSet – The tiles on which the tensor(s) are stored.

  • replicatedTensorSharding – Whether to apply replicated tensor sharding.

TensorLocation(TensorStorage storage, TileSet loadTileSet, TileSet storageTileSet, ReplicatedTensorSharding replicatedTensorSharding, CommGroup shardingDomain)

Construct a TensorLocation from parameters.

Parameters
  • storage – The memory location of the tensor(s).

  • loadTileSet – The tiles through which the tensor(s) are loaded onto the chip.

  • storageTileSet – The tiles on which the tensor(s) are stored.

  • replicatedTensorSharding – Whether to apply replicated tensor sharding.

  • shardingDomain – GCL communication group across which to shard the tensor. Perpendicular replicas will not shard, and reduce gradients normally (via AllReduce). Defaults to sharding across all replicas.

TensorLocation(std::vector<int64_t> serialized)
bool operator==(const TensorLocation &rhs) const
bool operator!=(const TensorLocation &rhs) const
std::vector<int64_t> serialize() const
bool isRemote() const

Public Members

TensorStorage storage

The memory location of the tensor(s).

TileSet loadTileSet

The tiles through which the tensor(s) are loaded onto the chip.

TileSet storageTileSet

The tiles on which the tensor(s) are stored.

ReplicatedTensorSharding replicatedTensorSharding

Whether to apply replicated tensor sharding (RTS) or not.

CommGroup shardingDomain

The GCL comm groups across which to shard the tensor.

enum class popart::TensorStorage

Enum type that determines where a tensor is stored.

Values:

enumerator OnChip = 0

Store the tensor in on-chip memory.

enumerator OffChip = 1

Store the tensor in streaming memory.

enumerator N = 2

Number of values.

enum class popart::TileSet

Enum type to specify a set of tiles.

Values:

enumerator Compute = 0

The set of tiles designated for compute operations.

enumerator IO = 1

The set of tiles designated for IO operations.

enumerator Undefined = 2

Undefined (no) tile set.

enumerator N = 3

Number of values.

14.4. Optimizers

#include <popart/optimizer.hpp>
class Optimizer

Interface for describing an Optimizer and, internally, how to grow the optimiser step for each weight.

  • The end-user facing interface constructed by the user to describe what kind of optimiser to use.

  • Then also used internally by the Ir to grow the optimiser step for each weight.

  • Stores OptimizerValues for optimizer parameters like learning rate, loss scaling, etc.

    See also

    OptimizerValue.

  • Optimizer stores the values for each weight - they can have different values. There is a “default” for all weights, then you can specify specific values for specific weights. This is encapsulated by an OptimizerValueMap, which is a sparse map from weight to value, with unspecified values implying the default.

    See also

    OptimizerValueMap.

  • At runtime, the user can dynamically update the Optimizer, e.g. by setting new OptimizerValues. validReplacement determines whether the new Optimizer is interchangeable with the one the Ir was built for. For example, trying to replace an SGD Optimizer with an Adam Optimizer would throw.

Subclassed by popart::Adam, popart::Adaptive, popart::SGD

Public Functions

virtual ~Optimizer() = default

  • The Optimizer class has a two-part initialisation: the constructor, used by the end-user, and setFactorsFromOptions, called by the Ir to finish initialisation once all the relevant information is available during Ir preparation.

  • Some key methods used by the Ir to grow the optimiser step for each weight are createOp, getInputIds and getOptimizerInputs.

  • If the OptimizerValue is const, no Ir tensor for that value is created and the VarUpdateOp created for that weight will not have the optional input for that tensor. The Opx of the VarUpdateOp will emit poplar code that uses the provided value directly.

    If the OptimizerValue is not const, an Ir tensor for that value is created and the VarUpdateOp created for that weight will have the optional input for that tensor. The tensor will be a stream tensor, so that it can be updated later from host. The tensor will be streamed an initial value of the OptimizerValue’s value.

  • It is common for Optimizer implementations to make use of “compound scalars”. Take for example the SGD0 weight update equation:

    w <- w * (1 - lr * (1 - dm) * wd) - g * (lr * (1 - dm) / ls)

    Here w is the weights and g is the grads. lr, dm, wd, ls are all the “atomic scalars”. These are the scalars/hyperparameters of the Optimizer that the user can set using OptimizerValues, as described above.

    Multiple atomic scalars appear in expressions together, and will be operated on together before being used by an Op that also consumes a tensor (in this case the weights or grads). For SGD0, they can be grouped as follows:

    w <- w * {1 -  lr * (1 - dm) * wd} -  g * { lr * (1 - dm) / ls }
             ^^^^^^^^^^^^^^^^^^^^^^^^^        ~~~~~~~~~~~~~~~~~~~~~~
                        |                               |
       weight decay scale factor 0                      |
                                               scaled learning rate 0
    

    We call wdsf0 and slr0 the “compound scalars”.

    We can statically precompute the OptimizerValues for these compound scalars using the OptimizerValues of the atomic scalars. This makes the Ir simpler, as we now have only:

    w <- w * wdsf0 - g * slr0
    

    The CompoundScalarHelpers are used to precompute the compound scalar values.

    If any of the composite atomic scalars are non-const, the compound scalar is non-const.

    See also

    compoundscalarhelper.hpp
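The precomputation of the SGD0 compound scalars described above can be sketched as follows. This is a standalone illustration, not the CompoundScalarHelper code: the struct AtomicScalars and the free functions are hypothetical names, and only the formulas for wdsf0 and slr0 are taken from the text.

```cpp
#include <cassert>
#include <cmath>

// Atomic scalars of SGD0, named as in the text above (hypothetical struct).
struct AtomicScalars {
  float lr; // learning rate
  float dm; // dampening
  float wd; // weight decay
  float ls; // loss scaling
};

// Weight decay scale factor 0: 1 - lr * (1 - dm) * wd.
float wdsf0(const AtomicScalars &s) {
  return 1.0f - s.lr * (1.0f - s.dm) * s.wd;
}

// Scaled learning rate 0: lr * (1 - dm) / ls.
float slr0(const AtomicScalars &s) {
  assert(s.ls != 0.0f);
  return s.lr * (1.0f - s.dm) / s.ls;
}

// One SGD0 step that consumes only the two precomputed compound scalars.
// g is the loss-scaled gradient, which is why slr0 divides by ls.
float sgd0Step(float w, float g, float wdsf0Val, float slr0Val) {
  return w * wdsf0Val - g * slr0Val;
}
```

With lr = 0.1, dm = 0, wd = 0.01 and ls = 2, the two compound scalars are 0.999 and 0.05, so the update only needs two scalar inputs instead of four.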

Optimizer(OptimizerValue lossScaling, const std::vector<ClipNormSettings> &clipNormSettings, const DebugContext &debugContext)
Optimizer(const Optimizer&) = default
virtual void validReplacement(const Optimizer &other) const
virtual OptimizerType type() const = 0
virtual std::string type_s() const = 0
virtual std::unique_ptr<Optimizer> clone() const = 0
virtual void resetTensorData(Tensor&) const = 0
virtual void setTensorData(Tensor&) const = 0
virtual std::unique_ptr<Op> createOp(const Tensor &weight, Graph&) const = 0
virtual std::vector<TensorId> getInputIds(const Tensor &weight) const = 0

Returns the TensorIds of the input tensors to the VarUpdateOp this optimiser will create for the given weight.

Specifically, the TensorId at index i will be the id of the input tensor at InIndex i of the VarUpdateOp. If the input is an OptimizerValue and it is const, then “” will be returned; otherwise the relevant reserved prefix for that OptimizerValue will be used, followed by the weight id. The prefixes are defined in tensornames.hpp, for example reservedDefaultWeightDecayScaleFactor0Prefix or reservedSpecificScaledLearningRate1Prefix (note there are different prefixes depending on whether the weight has a specific or default value for that OptimizerValue).

virtual std::vector<std::tuple<TensorId, TensorInfo>> getOptimizerInputs(const Tensor &weight) const = 0
inline const OptimizerValue &lossScaling() const
inline float getLossScalingVal() const
float getFinalLossScalingVal() const
virtual TensorId getInverseLossScalingTensorId(const Tensor &weight) const = 0
virtual void setFactorsFromOptions(const SessionOptions&)
bool gradientAccumulationEnabled() const
bool meanReductionEnabled() const
bool postMeanAccumulationEnabled() const
bool postMeanReplicationEnabled() const
int64_t getReplicatedGraphCount() const
int64_t getAccumulationFactor() const
bool meanGradientAccumulationEnabled() const
inline const std::vector<ClipNormSettings> &getClipNormSettings() const
virtual bool hasSpecific(const Tensor &w) const = 0
virtual bool hasSpecific() const = 0
virtual size_t hash() const
inline DebugContext getDebugContext() const

Public Static Functions

static TensorId getLossScalingTensorId(DataType)
enum class popart::OptimizerType

Types of optimizers.

Values:

enumerator SGD = 0
enumerator Adam
enumerator Adaptive
enumerator NTYPES
enum class popart::OptimizerReductionType

Reduction mode when doing data-parallel training over replicated graphs.

Depending on the optimizer used and its configuration, this option describes how the reduction of gradients over replicas will occur. For example, directly on the gradient, on the gradient accumulator, or on the momentum. See the documentation of individual optimizers for more information.

Values:

enumerator None = 0

No replicated graph reduction.

enumerator GradReduce

Gradient reduction (every iteration, after a weight’s gradient is produced)

enumerator AcclReduce

Momentum reduction (SGD1, after the gradient accumulation loop, if applicable)

enumerator AccumReduce

Accumulator reduction (Adam/SGD2 + gradient accumulation, after the gradient accumulation loop)

enum class popart::WeightDecayMode

Values:

enumerator Decay

Weight decay (e.g. AdamW)

enumerator L2Regularization

L2 regularization (e.g. PyTorch-like Adam)

#include <popart/optimizervalue.hpp>
class OptimizerValue

A class used to represent values of hyper parameters.

Public Functions

OptimizerValue() = default

Equivalent to OptimizerValue(0, false).

inline OptimizerValue(float v)

Equivalent to OptimizerValue(v, true).

inline OptimizerValue(float v, bool c)

Constructor.

Parameters
  • v – The current value of the hyper parameter.

  • c – A boolean flag to indicate whether the parameter will remain at this value forever (true) or may change over time (false).

inline OptimizerValue(std::pair<float, bool> x)
inline float val() const
inline bool isConst() const
void validReplacement(const OptimizerValue &rhs) const
bool operator==(const OptimizerValue &rhs) const
#include <popart/optimizervaluemap.hpp>
class OptimizerValueMap

Public Functions

inline OptimizerValueMap(OptimizerValue g)
OptimizerValue get(const TensorId &id) const
void insertSpecific(const TensorId&, OptimizerValue)
inline bool hasSpecific(const TensorId &id) const
inline bool hasSpecific() const
inline OptimizerValue getDefault() const
void validReplacement(const OptimizerValueMap &rhs) const
inline const std::map<TensorId, OptimizerValue> &getSpecifics() const
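The "default plus weight-specific overrides" lookup that OptimizerValueMap encapsulates can be modelled with a small sparse map. The sketch below is a simplified stand-in, not the PopART class: Value and SparseValueMap are hypothetical types, and std::string is used in place of TensorId.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified stand-in for OptimizerValue (hypothetical, not the PopART type).
struct Value {
  float val;
  bool isConst;
};

// Sparse map from weight id to value; ids without an entry use the default.
class SparseValueMap {
public:
  explicit SparseValueMap(Value defaultValue) : defaultValue_(defaultValue) {}

  // Record a weight-specific value that overrides the default.
  void insertSpecific(const std::string &id, Value v) { specifics_[id] = v; }

  bool hasSpecific(const std::string &id) const {
    return specifics_.count(id) != 0;
  }

  // Return the specific value if one was inserted, otherwise the default.
  Value get(const std::string &id) const {
    auto it = specifics_.find(id);
    return it == specifics_.end() ? defaultValue_ : it->second;
  }

private:
  Value defaultValue_;
  std::map<std::string, Value> specifics_;
};
```

Only weights that differ from the default need an entry, which keeps the map small when most weights share the same hyper parameter.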

14.4.1. Stochastic Gradient Descent (SGD)

#include <popart/clipnormsettings.hpp>
class ClipNormSettings

A data structure used to represent a maximum value constraint on one or more weights.

This is passed to the optimizer on construction.

Public Types

enum class Mode

Values:

enumerator ClipSpecifiedWeights
enumerator ClipAllWeights

Public Functions

ClipNormSettings(const std::vector<TensorId> &weightIds_, float maxNorm_)

DEPRECATED: This will be removed in a future release.

Constructor.

Parameters
  • weightIds_ – The weight tensor IDs that this constraint applies to.

  • maxNorm_ – The maximum permissible value.

const std::vector<TensorId> &getWeightIds() const
float getMaxNorm() const
Mode getMode() const
bool operator==(const ClipNormSettings&) const
bool operator!=(const ClipNormSettings &other) const

Public Members

std::vector<TensorId> weightIds
float maxNorm

Public Static Functions

static ClipNormSettings clipWeights(const std::vector<TensorId> &weightIds_, float maxNorm_)
static ClipNormSettings clipAllWeights(float maxNorm_)
#include <popart/sgd.hpp>
class SGD : public popart::Optimizer

Stochastic Gradient Descent (SGD) optimizer.

Like any optimizer implementation, this class is responsible for updating each weight tensor ( \(w\)) in the model using the gradient ( \(g\)) of the loss function with respect to the weight as calculated during the backwards pass.

The SGD optimizer has the following state for each weight:

  • velocity ( \(v\))

The SGD optimizer has the following hyper parameters:

  • learning rate ( \(\text{lr}\))

  • momentum ( \(\text{mm}\))

  • weight decay ( \(\text{wd}\))

  • dampening ( \(\text{dm}\))

  • velocity scaling ( \(\text{vs}\))

  • loss scaling ( \(\text{ls}\))

  • nesterov

  • clip norm settings

The values of these parameters can be shared between all weights but some can be overridden with weight-specific values (see SGD::insertSpecific). Hyper parameters are captured using OptimizerValue objects and therefore can be either a constant value or a non-constant value that can be adjusted by the user.

In the following we will describe how this optimizer updates a weight using a gradient. In the context of this description, the gradient is the value of the gradient after any gradient accumulation has been performed and after the loss scaling factor applied to the gradient has been corrected for.

When the optimizer needs to update a weight, \(w\), using a gradient, \(g\), it first updates the optimizer state as follows:

\[ v' := v * \text{mm} + (1 - \text{dm}) * (g + \text{wd} * w) \text{ \ . } \]

Following the update of the optimizer state the optimizer uses said state to update the weight:

if nesterov is True:

\[ g' := g + \text{wd} * w + \text{mm} * v' \text{ \ . } \]
\[ w' := w - \text{lr} * g' \text{ \ . } \]
else:
\[ w' := w - \text{lr} * v' \text{ \ . } \]
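The two update equations above can be written out for a single scalar weight as follows. This is a minimal sketch for illustration only: SgdHyperParams, SgdState and sgdStep are hypothetical names, and velocity scaling, loss scaling and clip norms are deliberately left out.

```cpp
#include <cassert>
#include <cmath>

// Hyper parameters used by the equations above (hypothetical struct).
struct SgdHyperParams {
  float lr; // learning rate
  float mm; // momentum
  float wd; // weight decay
  float dm; // dampening
  bool nesterov;
};

// Weight and its per-weight optimizer state (velocity).
struct SgdState {
  float w;
  float v;
};

// One optimizer step: update the velocity first, then the weight.
SgdState sgdStep(SgdState s, float g, const SgdHyperParams &h) {
  const float vNew = s.v * h.mm + (1.0f - h.dm) * (g + h.wd * s.w);
  float wNew;
  if (h.nesterov) {
    // Nesterov: combine the raw gradient with the updated velocity.
    const float gNew = g + h.wd * s.w + h.mm * vNew;
    wNew = s.w - h.lr * gNew;
  } else {
    wNew = s.w - h.lr * vNew;
  }
  return {wNew, vNew};
}
```

Starting from w = 1 and v = 0 with g = 0.5, lr = 0.1, mm = 0.9 and no weight decay or dampening, the plain update gives w' = 0.95 and the Nesterov update gives w' = 0.905.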

In addition to the above, the velocity scaling hyper parameter is a scaling factor that can provide improved numerical stability by ensuring the values stored in the optimizer state, \(v\), are scaled by this value. When using this parameter, PopART will automatically deal with the artificially scaled velocity value during the weight update, and other hyper parameters do not need to be adjusted.

In addition, the loss scaling hyper parameter is similar in nature to the velocity scaling parameter. It is a scaling value that is applied to the loss gradient at the start of the backwards pass and, at the end of the backwards pass, this scaling is reversed by multiplying the gradients for each weight with the inverse of the loss scaling value prior to updating the optimizer state. Using loss scaling can also improve numerical stability in some cases.

Finally, it is possible to add clip norm settings for this optimizer. These clip norms compute the L2 norm for a group of weights and add a scalar term to the weight update that effectively divides it by the norm (or a constant value that is provided as part of the clip norm, whichever is greater).
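A rough sketch of such a scalar term is shown below. It is not taken from PopART: it assumes the norm is computed over the gradients of the weight group and that the numerator of the factor is maxNorm, as in the common clip-by-global-norm convention. The text above only states the division by the larger of the norm and the constant, so both assumptions, and the names globalL2Norm and clipFactor, are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// L2 norm over all elements of all tensors in a group (hypothetical helper).
float globalL2Norm(const std::vector<std::vector<float>> &group) {
  double sumSq = 0.0;
  for (const auto &tensor : group) {
    for (float x : tensor) {
      sumSq += static_cast<double>(x) * x;
    }
  }
  return static_cast<float>(std::sqrt(sumSq));
}

// Scalar applied to the update: divides by max(norm, maxNorm).
// Assumption: the numerator is maxNorm, so the factor never exceeds 1.
float clipFactor(float norm, float maxNorm) {
  assert(maxNorm > 0.0f);
  return maxNorm / std::max(norm, maxNorm);
}
```

Under these assumptions, a group whose norm is already below maxNorm is left unchanged (factor 1), while a larger norm is scaled back to maxNorm.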

See the SGD notes in optimizer.hpp for a more detailed and comprehensive derivation of the SGD optimizer step in PopART.

Subclassed by popart::ConstSGD

Public Functions

SGD(OptimizerValue defaultLearningRate, OptimizerValue defaultWeightDecay, OptimizerValue defaultMomentum, OptimizerValue defaultDampening, OptimizerValue defaultVelocityScaling, OptimizerValue lossScaling, OptimizerValue nesterov, const std::vector<ClipNormSettings> &clipNormSettings = {}, SGDAccumulatorAndMomentum sgdAccMm = SGDAccumulatorAndMomentum::Combined, DataType accumType = DataType::UNDEFINED, DataType accl1Type = DataType::UNDEFINED, const DebugContext &debugContext = {})

Constructor.

See also

SGDAccumulatorAndMomentum. Defaults to SGDAccumulatorAndMomentum::Combined.

Parameters
  • defaultLearningRate – The learning rate value to use for weights for which no weight-specific hyper parameters have been inserted.

  • defaultWeightDecay – The weight decay value to use for weights for which no weight-specific hyper parameters have been inserted.

  • defaultMomentum – The momentum value to use for weights for which no weight-specific hyper parameters have been inserted.

  • defaultDampening – The dampening value to use for weights for which no weight-specific hyper parameters have been inserted.

  • defaultVelocityScaling – The velocity scaling value to use for weights for which no weight-specific hyper parameters have been inserted.

  • lossScaling – The loss scaling value to use.

  • nesterov – Option to enable Nesterov momentum. Defaults to false.

  • clipNormSettings – A vector of ClipNormSettings (this can be used to set maximum values for weights).

  • sgdAccMm – The implementation strategy to use when gradient accumulation and/or momentum are used, otherwise ignored.

  • accumType – The DataType of the accum tensor, when gradient accumulation is used and sgdAccMm = SGDAccumulatorAndMomentum::Separate, otherwise ignored. Only FLOAT, FLOAT16 and UNDEFINED are supported. Defaults to UNDEFINED. If UNDEFINED, the same type as the weights will be used. If accumType is FLOAT16 and accl1Type is FLOAT, this parameter causes accum to be upcasted before being passed to the op that updates accl1.

  • accl1Type – The DataType of the accl1 tensor, when gradient accumulation is used and sgdAccMm = SGDAccumulatorAndMomentum::Separate, otherwise ignored. Only FLOAT, FLOAT16 and UNDEFINED are supported. Defaults to UNDEFINED. If UNDEFINED, the same type as the weights will be used. If accumType is FLOAT16 and accl1Type is FLOAT, this parameter causes accum to be upcasted before being passed to the op that updates accl1.

  • debugContext – Optional debug context.

SGD(OptimizerValue defaultLearningRate, OptimizerValue defaultWeightDecay, OptimizerValue defaultMomentum, OptimizerValue defaultDampening, OptimizerValue defaultVelocityScaling, OptimizerValue lossScaling, const std::vector<ClipNormSettings> &clipNormSettings = {}, SGDAccumulatorAndMomentum sgdAccMm = SGDAccumulatorAndMomentum::Combined, DataType accumType = DataType::UNDEFINED, DataType accl1Type = DataType::UNDEFINED, const DebugContext &debugContext = {})

Constructor.

See also

SGDAccumulatorAndMomentum. Defaults to SGDAccumulatorAndMomentum::Combined.

Parameters
  • defaultLearningRate – The learning rate value to use for weights for which no weight-specific hyper parameters have been inserted.

  • defaultWeightDecay – The weight decay value to use for weights for which no weight-specific hyper parameters have been inserted.

  • defaultMomentum – The momentum value to use for weights for which no weight-specific hyper parameters have been inserted.

  • defaultDampening – The dampening value to use for weights for which no weight-specific hyper parameters have been inserted.

  • defaultVelocityScaling – The velocity scaling value to use for weights for which no weight-specific hyper parameters have been inserted.

  • lossScaling – The loss scaling value to use.

  • clipNormSettings – A vector of ClipNormSettings (this can be used to set maximum values for weights).

  • sgdAccMm – The implementation strategy to use when gradient accumulation and/or momentum are used, otherwise ignored.

  • accumType – The DataType of the accum tensor, when gradient accumulation is used and sgdAccMm = SGDAccumulatorAndMomentum::Separate, otherwise ignored. Only FLOAT, FLOAT16 and UNDEFINED are supported. Defaults to UNDEFINED. If UNDEFINED, the same type as the weights will be used. If accumType is FLOAT16 and accl1Type is FLOAT, this parameter causes accum to be upcasted before being passed to the op that updates accl1.

  • accl1Type – The DataType of the accl1 tensor, when gradient accumulation is used and sgdAccMm = SGDAccumulatorAndMomentum::Separate, otherwise ignored. Only FLOAT, FLOAT16 and UNDEFINED are supported. Defaults to UNDEFINED. If UNDEFINED, the same type as the weights will be used. If accumType is FLOAT16 and accl1Type is FLOAT, this parameter causes accum to be upcasted before being passed to the op that updates accl1.

  • debugContext – Optional debug context.

SGD(const std::map<std::string, std::pair<float, bool>> &params, const std::vector<ClipNormSettings> &clipNormSettings = {}, SGDAccumulatorAndMomentum sgdAccMm = SGDAccumulatorAndMomentum::Combined, DataType accumType = DataType::UNDEFINED, DataType accl1Type = DataType::UNDEFINED, const DebugContext &debugContext = {})

Constructor.

EXAMPLE:

SGD({{"defaultLearningRate", {0.02, false}},
    {"defaultMomentum", {0.6, true}}});

See also

SGDAccumulatorAndMomentum. Defaults to SGDAccumulatorAndMomentum::Combined.

This will create an SGD Optimizer which has a constant momentum of 0.6 and a changeable learning rate initially of 0.02. All OptimizerValues not present in the map will take values from the getUnset* functions.

Parameters
  • params – A parameter map where the keys are one or more of "defaultLearningRate", "defaultWeightDecay", "defaultMomentum", "defaultDampening", "defaultVelocityScaling", "lossScaling" or "nesterov". The map’s values are pairs of floats and booleans representing OptimizerValue constructor arguments. The map does not have to specify each hyper parameter because default values will be used where parameters are missing.

  • clipNormSettings – A vector of ClipNormSettings (this can be used to set maximum values for weights).

  • sgdAccMm – The implementation strategy to use when gradient accumulation and/or momentum are used, otherwise ignored.

  • accumType – The DataType of the accum tensor, when gradient accumulation is used and sgdAccMm = SGDAccumulatorAndMomentum::Separate, otherwise ignored. Only FLOAT, FLOAT16 and UNDEFINED are supported. Defaults to UNDEFINED. If UNDEFINED, the same type as the weights will be used. If accumType is FLOAT16 and accl1Type is FLOAT, this parameter causes accum to be upcasted before being passed to the op that updates accl1.

  • accl1Type – The DataType of the accl1 tensor, when gradient accumulation is used and sgdAccMm = SGDAccumulatorAndMomentum::Separate, otherwise ignored. Only FLOAT, FLOAT16 and UNDEFINED are supported. Defaults to UNDEFINED. If UNDEFINED, the same type as the weights will be used. If accumType is FLOAT16 and accl1Type is FLOAT, this parameter causes accum to be upcasted before being passed to the op that updates accl1.

  • debugContext – Optional debug context.

inline SGD()

Default constructor. Creates SGD with default scalars (equivalent to the getUnset<scalar>() methods), and the other default parameters of the main constructor.

SGD(const SGD&) = default

Copy constructor.

~SGD() = default
inline virtual OptimizerType type() const final
inline virtual std::string type_s() const final
inline SGDAccumulatorAndMomentum getSGDAccumulatorAndMomentum() const
virtual std::unique_ptr<Optimizer> clone() const final
virtual std::unique_ptr<Op> createOp(const Tensor &weight, Graph&) const final

Returns the VarUpdateOp for the given weight.

If there is no gradient accumulation or momentum, this will be a SGD0VarUpdateOp. Else, if getSGDAccumulatorAndMomentum() == ::Combined, this will be an SGD1ComboOp, else if getSGDAccumulatorAndMomentum() == ::Separate, an SGD2ComboOp.

The required compound scalar OptimizerValues for the VarUpdateOp will be computed and passed to the Op. See the SGD notes above this class for how they are derived. Recall that if non-const, the VarUpdateOp will take an input Tensor for the compound scalar.

See also

Optimizer::createOp

The OptimizerReductionType of the Op is derived as follows:

  • No replication => None

  • Replication, no grad acc => GradReduce

  • Replication, grad acc, SGD1 => AcclReduce

  • Replication, grad acc, SGD2 => AccumReduce

See the SGD notes above this class for why this is.

If SGD2, the DataType of the accum and accl1 tensors passed to the SGD2ComboOp will be as set in the SGD constructor. Recall DataType::UNDEFINED means use the same as the weight.

An SGD1ComboOp will later be decomposed by the SGD1Decompose pattern into a series of Ops and Tensors that implement the SGD1 optimiser step.

An SGD2ComboOp will later be decomposed by the SGD2Decompose pattern into a series of Ops and Tensors that implement the SGD2 optimiser step.

See also

SGD1Decompose

See also

SGD2Decompose
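The selection rules described for createOp can be sketched as two small functions. This is a minimal illustration, not PopART code: the enum and function names below are hypothetical stand-ins, the ops are represented by their names as strings, and the sketch assumes that the Separate strategy is the one that maps to SGD2ComboOp.

```cpp
#include <string>

// Hypothetical stand-ins for the PopART types named in the text above.
enum class AccumulatorAndMomentumSketch { Combined, Separate };
enum class ReductionTypeSketch { None, GradReduce, AcclReduce, AccumReduce };

// Name of the update op that the text says createOp() returns.
// Assumption: Separate corresponds to the SGD2 implementation.
std::string selectUpdateOp(bool gradAccOrMomentum,
                           AccumulatorAndMomentumSketch mode) {
  if (!gradAccOrMomentum) {
    return "SGD0VarUpdateOp";
  }
  return mode == AccumulatorAndMomentumSketch::Combined ? "SGD1ComboOp"
                                                        : "SGD2ComboOp";
}

// Reduction type for the four cases listed in the text above.
ReductionTypeSketch selectReduction(bool replicated, bool gradAcc,
                                    AccumulatorAndMomentumSketch mode) {
  if (!replicated) {
    return ReductionTypeSketch::None;
  }
  if (!gradAcc) {
    return ReductionTypeSketch::GradReduce;
  }
  return mode == AccumulatorAndMomentumSketch::Combined
             ? ReductionTypeSketch::AcclReduce
             : ReductionTypeSketch::AccumReduce;
}
```

For example, a replicated run with gradient accumulation and the Combined strategy would select SGD1ComboOp with AcclReduce under these assumptions.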

virtual std::vector<TensorId> getInputIds(const Tensor &weight) const final

virtual std::vector<std::tuple<TensorId, TensorInfo>> getOptimizerInputs(const Tensor &weight) const final

smm1 and wdsf0 have the same data type as the weight. Everything else

virtual void validReplacement(const Optimizer &other) const final
virtual void resetTensorData(Tensor&) const final
virtual void setTensorData(Tensor&) const final
float getStoredValue(const TensorId &optId) const

Tensor “opt” has an id, which it uses to match a compound scalar which this object can compute from the atomic scalars.

void insertSpecific(const TensorId &weight, OptimizerValue learningRate, OptimizerValue weightDecay, OptimizerValue momentum, OptimizerValue dampening, OptimizerValue velocityScaling, OptimizerValue nesterov)

Insert a weight-specific set of hyper parameters.

Parameters
  • weight – The TensorId of the weight.

  • learningRate – The learning rate value to use for this specific weight.

  • weightDecay – The weight decay value to use for this specific weight.

  • momentum – The momentum value to use for this specific weight.

  • dampening – The dampening value to use for this specific weight.

  • velocityScaling – The velocity scaling value to use for this specific weight.

  • nesterov – Option to enable Nesterov momentum. Defaults to false.

void insertSpecific(const TensorId &weight, const std::map<std::string, std::pair<float, bool>> &params)

Insert a weight-specific set of hyper parameters.

Parameters
  • weight – The TensorId of the weight.

  • params – A parameter map where the keys are one of "learningRate", "weightDecay", "momentum", "dampening", or "velocityScaling" and the values are pairs of floats and booleans representing OptimizerValue constructor arguments. The map does not have to specify every hyper parameter; default values are used for any that are missing.
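The fallback behaviour of such a parameter map can be illustrated with plain standard-library types. This is only a sketch: ParamMapSketch and valueOrDefault are hypothetical names, the fallback values passed in are placeholders rather than PopART's defaults, and how the library interprets the boolean is not shown here.

```cpp
#include <map>
#include <string>
#include <utility>

// Same shape as the map accepted by insertSpecific: name -> (value, flag).
using ParamMapSketch = std::map<std::string, std::pair<float, bool>>;

// Hypothetical helper: return the entry for `key`, or `fallback` when the
// caller did not specify that hyper parameter.
std::pair<float, bool> valueOrDefault(const ParamMapSketch &params,
                                      const std::string &key,
                                      std::pair<float, bool> fallback) {
  const auto it = params.find(key);
  return it == params.end() ? fallback : it->second;
}
```

A map that only sets "learningRate" would then yield the caller's value for that key and the fallback for "momentum", "dampening" and the rest.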

virtual bool hasSpecific(const Tensor &w) const final
virtual bool hasSpecific() const final
virtual TensorId getInverseLossScalingTensorId(const Tensor &weight) const
inline const OptimizerValueMap &learningRates() const
inline const OptimizerValueMap &weightDecays() const
inline const OptimizerValueMap &momentums() const
inline const OptimizerValueMap &dampenings() const
inline const OptimizerValueMap &velocityScalings() const
inline const OptimizerValueMap &nesterov() const
virtual size_t hash() const

Public Static Functions

static inline OptimizerValue getUnsetLearningRate()

Default learning rate value.

static inline OptimizerValue getUnsetWeightDecay()

Default weight decay value.

static inline OptimizerValue getUnsetMomentum()

Default momentum value.

static inline OptimizerValue getUnsetDampening()

Default dampening value.

static inline OptimizerValue getUnsetVelocityScaling()

Default velocity scaling value.

static inline OptimizerValue getUnsetLossScaling()

Default loss scaling value.

static inline OptimizerValue getUnsetNesterov()

Default Nesterov momentum value.

static SGD fromDefaultMap(const std::map<std::string, OptimizerValue>&, const DebugContext &debugContext = {})
class ConstSGD : public popart::SGD

Stochastic Gradient Descent (SGD) optimizer with constant learning rate, weight decay, loss scaling and clip norm settings (and default values for momentum, dampening or velocity scaling).

NOTE: See SGD for detailed meaning for these parameters.

NOTE: This class exists for backwards compatibility with the Python API and may be removed at some point in the future.

Public Functions

inline ConstSGD(float learningRate, float weightDecay = 0, float lossScaling = 1, const std::vector<ClipNormSettings> &clipNormSettings = {})

Constructor.

Parameters
  • learningRate – A constant learning rate.

  • weightDecay – A constant weight decay value.

  • lossScaling – A constant loss scaling value.

  • clipNormSettings – A vector of ClipNormSettings (this can be used to set maximum values for weights).

enum class popart::SGDAccumulatorAndMomentum

Strategy for implementing SGD with momentum and/or gradient accumulation.

Values:

enumerator Combined = 0

Implement SGD using a single tensor for the gradient accumulator (accum) and momentum (accl) tensors.

enumerator Separate

Implement SGD using separate tensors for the gradient accumulator (accum) and momentum (accl) tensors.

14.4.2. Adam, AdaMax & Lamb

#include <popart/adam.hpp>
enum class popart::AdamMode

Enum type describing the mode of an Adam optimizer instance.

Values:

enumerator Adam = 0

Adam or AdamW mode, depending on weight decay setting (see Kingma & Ba, 2015 and Loshchilov & Hutter, 2018).

enumerator AdamNoBias

Like Adam but without bias correction.

enumerator AdaMax

AdaMax mode.

enumerator Lamb

Lamb mode (see You et al., 2020).

enumerator LambNoBias

Like Lamb but without bias correction.

class Adam : public popart::Optimizer

AdamW, Lamb and AdaMax optimizer implementation.

Like any optimizer implementation, this class is responsible for updating each weight tensor ( \(w\)) in the model using the gradient ( \(g\)) of the loss function with respect to the weight as calculated during the backwards pass.

The optimizer has the following state for each weight:

  • first-order momentum ( \(m\))

  • second-order momentum ( \(v\))

  • time step ( \(t\))

The optimizer has the following hyper parameters:

  • learning rate ( \(\text{lr}\))

  • weight decay ( \(\text{wd}\))

  • beta1 ( \(\beta_1\))

  • beta2 ( \(\beta_2\))

  • epsilon ( \(\epsilon\))

  • loss scaling ( \(\text{ls}\))

  • maximum weight norm ( \(\text{mwn}\))

The values of these parameters can be shared between all weights but some can be overridden with weight-specific values (see Adam::insertSpecific). Hyper parameters are captured using OptimizerValue objects and therefore can be either a constant value or a non-constant value that can be adjusted by the user.

The values of AdamMode and WeightDecayMode passed to the constructor determine how weights are updated (see below).

The following describes how this optimizer updates a weight using a gradient. In this description, the gradient is the value obtained after any gradient accumulation has been performed and after the loss scaling factor applied to the gradient has been corrected for.

When the optimizer needs to update a weight, \(w\), using a gradient, \(g\), it first computes a term \(g_\text{tmp}\), which is effectively \(g\) with L2 regularization applied if the WeightDecayMode is set to WeightDecayMode::L2Regularization, as follows:

\[\begin{split} g_\text{tmp} := \left\{\begin{aligned} g & \text{ \; (Decay) } \\ (g + \text{wd} * w) & \text{ \; (L2Regularization) \; . } \\ \end{aligned}\right.\\ \end{split}\]

Secondly, the optimizer updates the optimizer state as follows:

\[\begin{split} m' &:= \beta_1 * m + (1 - \beta_1) * g_\text{tmp} \\ v' &:= \left\{\begin{aligned} \beta_2 * v + (1 - \beta_2) * g_\text{tmp}^2 & \text{ \; (Adam/AdamNoBias) } \\ \beta_2 * v + (1 - \beta_2) * g_\text{tmp}^2 & \text{ \; (Lamb/LambNoBias) } \\ \text{max}(\beta_2 * v, |g_\text{tmp}|) & \text{ \; (AdaMax) } \\ \end{aligned}\right.\\ t' &:= t + 1 \\ \end{split}\]

Next, it computes the following terms:

\[\begin{split} m_\text{tmp} &:= \left\{\begin{aligned} m' & \text{ \; (AdamNoBias/LambNoBias) } \\ \frac{m'}{(1 - \beta_1^{t'})} & \text{ \; (Adam/Lamb/AdaMax) } \\ \end{aligned}\right.\\ v_\text{tmp} &:= \left\{\begin{aligned} v' & \text{ \; (AdamNoBias/LambNoBias) } \\ \frac{v'}{(1 - \beta_2^{t'})} & \text{ \; (Adam/Lamb/AdaMax) } \\ \end{aligned}\right.\\ u_\text{tmp} &:= \left\{\begin{aligned} \frac{m_\text{tmp}}{(\sqrt{v_\text{tmp}} + \epsilon)} + \text{wd} * w &\text{ \; (Decay) } \\ \frac{m_\text{tmp}}{(\sqrt{v_\text{tmp}} + \epsilon)} &\text{ \; (L2Regularization) } \\ \end{aligned}\right. \end{split}\]

Finally, the optimizer updates the weight as follows:

\[\begin{split} w' := \left\{\begin{aligned} w - \text{lr} * u_\text{tmp} &\text{ \; (Adam/AdamNoBias/AdaMax) } \\ w - \biggl(\frac{\text{min}(\lVert{w}\rVert, \text{mwn})}{\lVert{u_\text{tmp}}\rVert}\biggr) * \text{lr} * u_\text{tmp} &\text{ \; (Lamb/LambNoBias) } \\ \end{aligned}\right. \end{split}\]

In addition to the above, the loss scaling hyper parameter is similar in nature to the velocity scaling parameter. It is a scaling value that is applied to the loss gradient at the start of the backwards pass. At the end of the backwards pass, this scaling is reversed by multiplying the gradients for each weight by the inverse of the loss scaling value prior to updating the optimizer state. Using loss scaling can also improve numerical stability of the gradient calculations. If scaledOptimizerState is enabled then the lossScaling will not be removed before updating the optimizer state. This can improve the numerical stability when accl1_type is set to FLOAT16.

NOTE: The maximum weight norm is referred to as \(\phi\) in You et al., 2020.
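The update equations above can be transcribed into a short reference routine. The following is a sketch under stated assumptions rather than the PopART implementation: ModeSketch, DecaySketch, HyperSketch, StateSketch and adamStep are hypothetical names, the gradient is taken to be already accumulated and unscaled, the norms are L2 norms over the whole tensor, and the guard against a zero update norm in the Lamb branch is an addition not present in the equations.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for AdamMode and WeightDecayMode.
enum class ModeSketch { Adam, AdamNoBias, AdaMax, Lamb, LambNoBias };
enum class DecaySketch { Decay, L2Regularization };

struct HyperSketch { double lr, wd, beta1, beta2, eps, mwn; };
struct StateSketch { std::vector<double> m, v; long t; };

static double l2Norm(const std::vector<double> &x) {
  double sum = 0.0;
  for (double e : x) sum += e * e;
  return std::sqrt(sum);
}

// One update of weight tensor w with gradient g, following the equations.
void adamStep(std::vector<double> &w, const std::vector<double> &g,
              StateSketch &s, const HyperSketch &h, ModeSketch mode,
              DecaySketch decay) {
  const bool noBias =
      mode == ModeSketch::AdamNoBias || mode == ModeSketch::LambNoBias;
  const bool lamb = mode == ModeSketch::Lamb || mode == ModeSketch::LambNoBias;
  s.t += 1;
  std::vector<double> u(w.size());
  for (std::size_t i = 0; i < w.size(); ++i) {
    // g_tmp: gradient with L2 regularization folded in, if requested.
    const double gTmp =
        decay == DecaySketch::L2Regularization ? g[i] + h.wd * w[i] : g[i];
    s.m[i] = h.beta1 * s.m[i] + (1.0 - h.beta1) * gTmp;
    s.v[i] = mode == ModeSketch::AdaMax
                 ? std::max(h.beta2 * s.v[i], std::fabs(gTmp))
                 : h.beta2 * s.v[i] + (1.0 - h.beta2) * gTmp * gTmp;
    // Bias correction exactly as written in the equations above.
    const double t = static_cast<double>(s.t);
    const double mTmp = noBias ? s.m[i] : s.m[i] / (1.0 - std::pow(h.beta1, t));
    const double vTmp = noBias ? s.v[i] : s.v[i] / (1.0 - std::pow(h.beta2, t));
    u[i] = mTmp / (std::sqrt(vTmp) + h.eps);
    if (decay == DecaySketch::Decay) u[i] += h.wd * w[i];
  }
  // Lamb rescales the step by min(||w||, mwn) / ||u_tmp||.
  double scale = 1.0;
  if (lamb) {
    const double uNorm = l2Norm(u);
    if (uNorm > 0.0) scale = std::min(l2Norm(w), h.mwn) / uNorm;  // guard added here
  }
  for (std::size_t i = 0; i < w.size(); ++i) w[i] -= scale * h.lr * u[i];
}
```

With zero initial state, \(\epsilon = 0\) and no weight decay, the first Adam step reduces to \(w - \text{lr} * g / |g|\), which is a convenient sanity check for any implementation of these equations.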

Public Functions

virtual bool hasSpecific(const Tensor &w) const final
virtual bool hasSpecific() const final
virtual TensorId getInverseLossScalingTensorId(const Tensor &weight) const final
Adam(OptimizerValue defaultLearningRate, OptimizerValue defaultWeightDecay, OptimizerValue defaultBeta1, OptimizerValue defaultBeta2, OptimizerValue defaultEps, OptimizerValue lossScaling, OptimizerValue maxWeightNorm, AdamMode adamMode, WeightDecayMode weightDecayMode, DataType accumType, DataType accl1Type, DataType accl2Type, const std::vector<ClipNormSettings> &clipNormSettings = {}, bool scaledOptimizerState = false, const DebugContext &debugContext = {})

Constructor.

Parameters
  • defaultLearningRate – The learning rate value to use for weights for which no weight-specific hyper parameters have been inserted.

  • defaultWeightDecay – The weight decay value to use for weights for which no weight-specific hyper parameters have been inserted.

  • defaultBeta1 – The beta1 value to use for weights for which no weight-specific hyper parameters have been inserted.

  • defaultBeta2 – The beta2 value to use for weights for which no weight-specific hyper parameters have been inserted.

  • defaultEps – The epsilon value to use for weights for which no weight-specific hyper parameters have been inserted.

  • lossScaling – The loss scaling value to use.

  • maxWeightNorm – The maxWeightNorm value to use.

  • adamMode – The AdamMode value to use.

  • weightDecayMode – The WeightDecayMode value to use.

  • accumType – Data type to use for gradient accumulation.

  • accl1Type – Data type to use for tensor that stores first-order momentum optimizer state.

  • accl2Type – Data type to use for tensor that stores second-order momentum optimizer state.

  • clipNormSettings – A vector of ClipNormSettings (this can be used to set maximum values for weights).

  • scaledOptimizerState – Experimental Option. Does not remove lossScaling before updating the optimizer state. This should have no effect on the update equation. However, it does ensure a more numerically stable implementation when accl1_type is set to DataType::FLOAT16. Note: When loading a model that includes initialised optimizer state, ensure that accl1 and accl2 are scaled by lossScaling and lossScaling^2 respectively.

  • debugContext – Optional debug context.
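The scaling factors in the scaledOptimizerState note can be motivated with a short derivation. This is an illustrative sketch only: it assumes the state is updated from the still-scaled gradient \(\text{ls} * g\), it ignores \(\epsilon\) (whose handling under this option is not described here), and the symbols \(m_s\) and \(v_s\) for the scaled state are introduced purely for this example. If \(m_s = \text{ls} * m\) and \(v_s = \text{ls}^2 * v\), then:

\[\begin{split} m_s' &:= \beta_1 * m_s + (1 - \beta_1) * (\text{ls} * g) = \text{ls} * m' \\ v_s' &:= \beta_2 * v_s + (1 - \beta_2) * (\text{ls} * g)^2 = \text{ls}^2 * v' \\ \frac{m_s'}{\sqrt{v_s'}} &= \frac{\text{ls} * m'}{\text{ls} * \sqrt{v'}} = \frac{m'}{\sqrt{v'}} \end{split}\]

Under these assumptions the ratio that drives the weight update is unchanged, which is consistent with the requirement that a loaded accl1 be scaled by lossScaling and a loaded accl2 by lossScaling^2.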

Adam(OptimizerValue defaultLearningRate, OptimizerValue defaultWeightDecay, OptimizerValue defaultBeta1, OptimizerValue defaultBeta2, OptimizerValue defaultEps, OptimizerValue lossScaling, AdamMode adamMode, WeightDecayMode weightDecayMode, DataType accumType, DataType accl1Type, DataType accl2Type, const std::vector<ClipNormSettings> &clipNormSettings = {}, bool scaledOptimizerState = false, const DebugContext &debugContext = {})
Adam(OptimizerValue defaultLearningRate, OptimizerValue defaultWeightDecay, OptimizerValue defaultBeta1, OptimizerValue defaultBeta2, OptimizerValue defaultEps, OptimizerValue lossScaling, OptimizerValue maxWeightNorm, AdamMode adamMode, DataType accumType, DataType accl1Type, DataType accl2Type, const std::vector<ClipNormSettings> &clipNormSettings = {}, bool scaledOptimizerState = false, const DebugContext &debugContext = {})
Adam(OptimizerValue defaultLearningRate, OptimizerValue defaultWeightDecay, OptimizerValue defaultBeta1, OptimizerValue defaultBeta2, OptimizerValue defaultEps, OptimizerValue lossScaling, AdamMode adamMode, DataType accumType, DataType accl1Type, DataType accl2Type, const std::vector<ClipNormSettings> &clipNormSettings = {}, bool scaledOptimizerState = false, const DebugContext &debugContext = {})
Adam(const std::map<std::string, std::pair<float, bool>> &params, AdamMode adamMode, WeightDecayMode weightDecayMode, DataType accumType, DataType accl1Type, DataType accl2Type, const std::vector<ClipNormSettings> &clipNormSettings = {}, bool scaledOptimizerState = false, const DebugContext &debugContext = {})

Constructor.

EXAMPLE:

Adam({{"defaultLearningRate", {0.02, false}},
      {"defaultBeta1", {0.9, true}},
      {"defaultBeta2", {0.999, true}}},
     AdamMode::Adam,
     WeightDecayMode::Decay,
     DataType::FLOAT,
     DataType::FLOAT,
     DataType::FLOAT);

Parameters
  • params – A parameter map where the keys are one of "defaultLearningRate", "defaultWeightDecay", "defaultBeta1", "defaultBeta2", "defaultEps", "lossScaling" or "maxWeightNorm", and the values are pairs of floats and booleans representing OptimizerValue constructor arguments. The map does not have to specify every hyper parameter; default values are used for any that are missing.

  • adamMode – The AdamMode value to use.

  • weightDecayMode – The WeightDecayMode value to use.

  • maxWeightNorm – The maxWeightNorm value to use.

  • accumType – Data type to use for gradient accumulation.

  • accl1Type – Data type to use for tensor that stores first-order momentum optimizer state.

  • accl2Type – Data type to use for tensor that stores second-order momentum optimizer state.

  • clipNormSettings – A vector of ClipNormSettings (this can be used to set maximum values for weights).

  • scaledOptimizerState – Experimental Option. Does not remove lossScaling before updating the optimizer state. This should have no effect on the update equation. However, it does ensure a more numerically stable implementation when accl1_type is set to DataType::FLOAT16. Note: When loading a model that includes initialised optimizer state, ensure that accl1 and accl2 are scaled by lossScaling and lossScaling^2 respectively.

  • debugContext – Optional debug context.

Adam(const Adam&) = default
~Adam() = default
inline virtual OptimizerType type() const final
inline virtual std::string type_s() const final
virtual std::unique_ptr<Optimizer> clone() const final
virtual std::unique_ptr<Op> createOp(const Tensor &weight, Graph&) const final
virtual std::vector<TensorId> getInputIds(const Tensor &weight) const final

The names of the inputs for the VarUpdateOp for the Variable Tensor “weight”.

In the returned vector, an empty string (“”) is used as a placeholder for constant inputs.

virtual std::vector<std::tuple<TensorId, TensorInfo>> getOptimizerInputs(const Tensor &weight) const final

The names and infos of the optimizer tensors.

virtual void validReplacement(const Optimizer &other) const final
virtual void resetTensorData(Tensor&) const final
virtual void setTensorData(Tensor&) const final
float getStoredValue(const TensorId &optId) const

Tensor “opt” has an id, based on which it matches a compound scalar which this object can compute from the atomic scalars.

void insertSpecific(const TensorId &weight, OptimizerValue learningRate, OptimizerValue weightDecay, OptimizerValue beta1, OptimizerValue beta2, OptimizerValue eps, OptimizerValue mwn)

Insert a weight-specific set of hyper parameters.

Parameters
  • weight – The TensorId of the weight.

  • learningRate – The learning rate value to use for this specific weight.

  • weightDecay – The weight decay value to use for this specific weight.

  • beta1 – The beta1 value to use for this specific weight.

  • beta2 – The beta2 value to use for this specific weight.

  • eps – The epsilon value to use for this specific weight.

  • mwn – The max weight norm value to use for this specific weight.

void setStep(int64_t step)
void setStep(const TensorId&, int64_t step)
void setStep(std::map<TensorId, int64_t> steps)
void insertSpecific(const TensorId &weight, const std::map<std::string, std::pair<float, bool>> &params)

Insert a weight-specific set of hyper parameters.

Parameters
  • weight – The TensorId of the weight.

  • params – A parameter map where the keys are one of "defaultLearningRate", "defaultWeightDecay", "defaultBeta1", "defaultBeta2", "defaultEps", "lossScaling" or "maxWeightNorm" and the values are pairs of floats and booleans representing OptimizerValue constructor arguments. The map does not have to specify every hyper parameter; default values are used for any that are missing.

inline const OptimizerValueMap &learningRates() const
inline const OptimizerValueMap &weightDecays() const
inline const OptimizerValueMap &beta1s() const
inline const OptimizerValueMap &beta2s() const
inline const OptimizerValueMap &epss() const
inline const OptimizerValueMap &maxWeightNorms() const
inline const WeightDecayMode &getWeightDecayMode() const
inline bool useScaledOptimizerState() const
virtual size_t hash() const final
virtual void setFactorsFromOptions(const SessionOptions&) final

Public Static Functions

static inline OptimizerValue getUnsetLearningRate()

Default learning rate value.

static inline OptimizerValue getUnsetWeightDecay()

Default weight decay value.

static inline OptimizerValue getUnsetBeta1()

Default beta1 value.

static inline OptimizerValue getUnsetBeta2()

Default beta2 value.

static inline OptimizerValue getUnsetEps()

Default epsilon value.

static inline OptimizerValue getUnsetLossScaling()

Default loss scaling value.

static inline OptimizerValue getUnsetMaxWeightNorm()

Default maximum weight norm value.

static Adam fromDefaultMap(const std::map<std::string, OptimizerValue>&, AdamMode adamMode_, WeightDecayMode decayMode_, DataType accumType_, DataType accl1Type_, DataType accl2Type_, const DebugContext &debugContext = {})

14.4.3. AdaDelta, RMSProp & AdaGrad

#include <popart/adaptive.hpp>
enum class popart::AdaptiveMode

Enum class representing a type of adaptive optimizer.

Values:

enumerator AdaGrad = 0

AdaGrad optimizer.

enumerator RMSProp

RMSProp optimizer.

enumerator CenteredRMSProp

CenteredRMSProp optimizer.

enumerator AdaDelta

AdaDelta optimizer.

class Adaptive : public popart::Optimizer

AdaDelta, RMSProp and AdaGrad optimizer implementation.

Like any optimizer implementation, this class is responsible for updating each weight tensor ( \(w\)) in the model using the gradient ( \(g\)) of the loss function with respect to the weight as calculated during the backwards pass.

The optimizer has the following state for each weight:

  • first-order momentum ( \(v_1\))

  • second-order momentum ( \(v_2\)) (only for CenteredRMSProp/AdaDelta)

  • third-order momentum ( \(v_3\))

The optimizer has the following hyper parameters:

  • learning rate ( \(\text{lr}\))

  • weight decay ( \(\text{wd}\))

  • alpha ( \(\alpha\))

  • momentum ( \(\text{m}\))

  • epsilon ( \(\epsilon\))

  • loss scaling ( \(\text{ls}\))

The values of these parameters can be shared between all weights but some can be overridden with weight-specific values (see Adaptive::insertSpecific). Hyper parameters are captured using OptimizerValue objects and therefore can be either a constant value or a non-constant value that can be adjusted by the user.

The values of AdaptiveMode and WeightDecayMode passed to the constructor determine how weights are updated (see below).

The following describes how this optimizer updates a weight using a gradient. In this description, the gradient is the value obtained after any gradient accumulation has been performed and after the loss scaling factor applied to the gradient has been corrected for.

When the optimizer needs to update a weight, \(w\), using a gradient, \(g\), it first computes a term \(g_\text{tmp}\), which is effectively \(g\) with L2 regularization applied if the WeightDecayMode is set to WeightDecayMode::L2Regularization, as follows:

\[\begin{split} g_\text{tmp} := \left\{\begin{aligned} g & \text{ \; (Decay) } \\ (g + \text{wd} * w) & \text{ \; (L2Regularization) \; . } \\ \end{aligned}\right.\\ \end{split}\]

Secondly, the optimizer updates the optimizer state \(v_1\) as follows:

\[\begin{split} v_1' &:= \left\{\begin{aligned} \alpha * v_1 + (1 - \alpha) * g_\text{tmp}^2 & \text{ \; (RMSProp/AdaDelta) } \\ \alpha * v_1 + (1 - \alpha) * g_\text{tmp}^2 & \text{ \; (CenteredRMSProp) } \\ v_1 + g_\text{tmp}^2 & \text{ \; (AdaGrad) } \\ \end{aligned}\right.\\ \end{split}\]

Next, \(v_2\) is updated, but only for CenteredRMSProp:

\[\begin{split} v_2' &:= \alpha * v_2 + (1 - \alpha) * g_\text{tmp} \text{ \; (CenteredRMSProp) } \\ \end{split}\]

Next, it computes the update term \(u_\text{tmp}\):

\[\begin{split} u_\text{tmp} &:= \left\{\begin{aligned} \frac{g_\text{tmp}}{\sqrt{v_1'} + \epsilon} & \text{ \; (AdaGrad/RMSProp) } \\ \frac{g_\text{tmp}}{\sqrt{v_1' - v_2'^2} + \epsilon} & \text{ \; (CenteredRMSProp) } \\ \frac{g_\text{tmp} * \sqrt{v_2 + \epsilon}}{\sqrt{v_1' + \epsilon}} & \text{ \; (AdaDelta) } \\ \end{aligned}\right. \end{split}\]

Next, \(v_2\) is updated, but only for AdaDelta:

\[\begin{split} v_2' := \alpha * v_2 + (1 - \alpha) * u_\text{tmp}^2 \text{ \; (AdaDelta) } \\ \end{split}\]

Next the third momentum is updated for all modes:

\[ v_3' := m * v_3 + u_\text{tmp} \]

Finally, the optimizer updates the weight as follows:

\[\begin{split} w' := \left\{\begin{aligned} w - \text{lr} * (v_3' + \text{wd} * w) &\text{ \; (Decay) } \\ w - \text{lr} * v_3' &\text{ \; (L2Regularization) } \\ \end{aligned}\right. \end{split}\]

In addition to the above, the loss scaling hyper parameter is similar in nature to the velocity scaling parameter. It is a scaling value that is applied to the loss gradient at the start of the backwards pass. At the end of the backwards pass, this scaling is reversed by multiplying the gradients for each weight by the inverse of the loss scaling value prior to updating the optimizer state. Using loss scaling can also improve numerical stability in some cases.
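The sequence of updates above can be written out for a single weight element. This is a sketch under stated assumptions, not the PopART implementation: AdaptiveSketch, DecaySketch, AdaptiveHyper, AdaptiveState and adaptiveStep are hypothetical names, the gradient is assumed to be already accumulated and unscaled, and the decayed term in the \(v_1\) update is taken to be the previous \(v_1\).

```cpp
#include <cmath>

// Hypothetical stand-ins for AdaptiveMode and WeightDecayMode.
enum class AdaptiveSketch { AdaGrad, RMSProp, CenteredRMSProp, AdaDelta };
enum class DecaySketch { Decay, L2Regularization };

struct AdaptiveHyper { double lr, wd, alpha, momentum, eps; };
struct AdaptiveState { double v1 = 0.0, v2 = 0.0, v3 = 0.0; };

// Applies one update to a single weight element and returns the new weight.
double adaptiveStep(double w, double g, AdaptiveState &s,
                    const AdaptiveHyper &h, AdaptiveSketch mode,
                    DecaySketch decay) {
  const double gTmp =
      decay == DecaySketch::L2Regularization ? g + h.wd * w : g;

  // v_1: running sum (AdaGrad) or exponential average of squared gradients.
  s.v1 = mode == AdaptiveSketch::AdaGrad
             ? s.v1 + gTmp * gTmp
             : h.alpha * s.v1 + (1.0 - h.alpha) * gTmp * gTmp;

  double u = 0.0;
  if (mode == AdaptiveSketch::CenteredRMSProp) {
    // v_2 tracks the mean gradient and is updated before u_tmp is computed.
    s.v2 = h.alpha * s.v2 + (1.0 - h.alpha) * gTmp;
    u = gTmp / (std::sqrt(s.v1 - s.v2 * s.v2) + h.eps);
  } else if (mode == AdaptiveSketch::AdaDelta) {
    // u_tmp uses the old v_2; v_2 is then updated from u_tmp.
    u = gTmp * std::sqrt(s.v2 + h.eps) / std::sqrt(s.v1 + h.eps);
    s.v2 = h.alpha * s.v2 + (1.0 - h.alpha) * u * u;
  } else {
    u = gTmp / (std::sqrt(s.v1) + h.eps);
  }

  // Third momentum, then the weight update for the chosen decay mode.
  s.v3 = h.momentum * s.v3 + u;
  return decay == DecaySketch::Decay ? w - h.lr * (s.v3 + h.wd * w)
                                     : w - h.lr * s.v3;
}
```

Note the ordering difference the equations imply: CenteredRMSProp updates \(v_2\) before computing \(u_\text{tmp}\), whereas AdaDelta computes \(u_\text{tmp}\) from the old \(v_2\) and updates it afterwards.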

Public Functions

virtual bool hasSpecific(const Tensor &w) const
virtual bool hasSpecific() const
virtual TensorId getInverseLossScalingTensorId(const Tensor &weight) const
Adaptive(OptimizerValue defaultLearningRate, OptimizerValue defaultWeightDecay, OptimizerValue defaultAlpha, OptimizerValue defaultMomentum, OptimizerValue defaultEps, OptimizerValue lossScaling, AdaptiveMode adaptiveMode, WeightDecayMode weightDecayMode, DataType accumType, DataType accl1Type, DataType accl2Type, DataType accl3Type, bool rmspropTFVariant = false, const DebugContext &debugContext = {})

Constructor.

Parameters
  • defaultLearningRate – The learning rate value to use for weights for which no weight-specific hyper parameters have been inserted.

  • defaultWeightDecay – The weight decay value to use for weights for which no weight-specific hyper parameters have been inserted.

  • defaultAlpha – The alpha value to use for weights for which no weight-specific hyper parameters have been inserted.

  • defaultMomentum – The momentum value to use for weights for which no weight-specific hyper parameters have been inserted.

  • defaultEps – The epsilon value to use for weights for which no weight-specific hyper parameters have been inserted.

  • lossScaling – The loss scaling value to use.

  • adaptiveMode – The AdaptiveMode value to use.

  • weightDecayMode – The WeightDecayMode value to use.

  • accumType – Data type to use for gradient accumulation.

  • accl1Type – Data type to use for tensor that stores first-order momentum optimizer state.

  • accl2Type – Data type to use for tensor that stores second-order momentum optimizer state.

  • accl3Type – Data type to use for tensor that stores third-order momentum optimizer state.

  • debugContext – Optional debug context.

Adaptive(const std::map<std::string, std::pair<float, bool>> &params, AdaptiveMode adaptiveMode, WeightDecayMode weightDecayMode, DataType accumType, DataType accl1Type, DataType accl2Type, DataType accl3Type, bool rmspropTFVariant = false, const DebugContext &debugContext = {})

Constructor.

EXAMPLE:

Adaptive({{"defaultLearningRate", {0.02, false}},
          {"defaultAlpha", {0.99, true}}},
         AdaptiveMode::RMSProp,
         WeightDecayMode::Decay,
         DataType::FLOAT,
         DataType::FLOAT,
         DataType::FLOAT,
         DataType::FLOAT);

Parameters
  • params – A parameter map where the keys are one of "defaultLearningRate", "defaultWeightDecay", "defaultAlpha", "defaultMomentum", "defaultEps" or "lossScaling", and the values are pairs of floats and booleans representing OptimizerValue constructor arguments. The map does not have to specify every hyper parameter; default values are used for any that are missing.

  • adaptiveMode – The AdaptiveMode value to use.

  • weightDecayMode – The WeightDecayMode value to use.

  • accumType – Data type to use for gradient accumulation.

  • accl1Type – Data type to use for tensor that stores first-order momentum optimizer state.

  • accl2Type – Data type to use for tensor that stores second-order momentum optimizer state.

  • accl3Type – Data type to use for tensor that stores third-order momentum optimizer state.

  • debugContext – Optional debug context.

Adaptive(const Adaptive&) = default
~Adaptive() = default
inline virtual OptimizerType type() const final
inline virtual std::string type_s() const final
virtual std::unique_ptr<Optimizer> clone() const final
virtual std::unique_ptr<Op> createOp(const Tensor &weight, Graph&) const final
virtual std::vector<TensorId> getInputIds(const Tensor &weight) const final

The names of the inputs for the VarUpdateOp for the Variable Tensor “weight”.

In the returned vector, an empty string (“”) is used as a placeholder for constant inputs.

virtual std::vector<std::tuple<TensorId, TensorInfo>> getOptimizerInputs(const Tensor &weight) const final

The names and infos of the optimizer tensors.

virtual void validReplacement(const Optimizer &other) const final
virtual void resetTensorData(Tensor&) const final
virtual void setTensorData(Tensor&) const final
float getStoredValue(const TensorId &optId) const

Tensor “opt” has an id, based on which it matches a compound scalar which this object can compute from the atomic scalars.

void insertSpecific(const TensorId &weight, OptimizerValue learningRate, OptimizerValue weightDecay, OptimizerValue alpha, OptimizerValue momentum, OptimizerValue eps)

Insert a weight-specific set of hyper parameters.

Parameters
  • weight – The TensorId of the weight.

  • learningRate – The learning rate value to use for this specific weight.

  • weightDecay – The weight decay value to use for this specific weight.

  • alpha – The alpha value to use for this specific weight.

  • momentum – The momentum value to use for this specific weight.

  • eps – The epsilon value to use for this specific weight.

void setStep(int64_t step)
void setStep(const TensorId&, int64_t step)
void setStep(std::map<TensorId, int64_t> steps)
void insertSpecific(const TensorId &weight, const std::map<std::string, std::pair<float, bool>> &params)

Insert a weight-specific set of hyper parameters.

Parameters
  • weight – The TensorId of the weight.

  • params – A parameter map where the keys are one of "defaultLearningRate", "defaultWeightDecay", "defaultAlpha", "defaultMomentum", "defaultEps" or "lossScaling" and the values are pairs of floats and booleans representing OptimizerValue constructor arguments. The map does not have to specify every hyper parameter; default values are used for any that are missing.

inline const OptimizerValueMap &learningRates() const
inline const OptimizerValueMap &weightDecays() const
inline const OptimizerValueMap &alphas() const
inline const OptimizerValueMap &momentums() const
inline const OptimizerValueMap &epss() const
virtual size_t hash() const

Public Static Functions

static inline OptimizerValue getUnsetLearningRate()

Default learning rate value.

static inline OptimizerValue getUnsetWeightDecay()

Default weight decay value.

static inline OptimizerValue getUnsetAlpha()

Default alpha value.

static inline OptimizerValue getUnsetMomentum()

Default momentum value.

static inline OptimizerValue getUnsetEps()

Default epsilon value.

static inline OptimizerValue getUnsetLossScaling()

Default loss scaling value.

static Adaptive fromDefaultMap(const std::map<std::string, OptimizerValue>&, AdaptiveMode adaptiveMode_, WeightDecayMode decayMode_, DataType accumType_, DataType accl1Type_, DataType accl2Type_, DataType accl3Type_, const DebugContext &debugContext = {})

14.5. Builder

#include <popart/builder.hpp>
class Builder

An interface for a Builder, used for creating ONNX graphs.

ONNX defines a specification for describing graphs and serialising them as protobuf files. This class provides a builder interface for creating such a graph.

Note, in ONNX, all Ops belong to an “Opset”. The Builder itself does not have methods for creating Ops in the ONNX graph, but instead has accessors to Opsets, like AiGraphcoreOpset1, which contain the methods for creating Ops in the graph.

Public Functions

Builder &createSubgraphBuilder()

Create a builder for a graph which is nested inside this builder’s graph.

~Builder()

Destructor for the Builder class.

TensorId addInputTensor(const TensorInfo &tensorInfo, const popart::DebugContext &debugContext = {})

Add a new input tensor to the model.

Parameters
  • tensorInfo – The shape and data type of the input tensor.

  • debugContext – Optional debug information.

Returns

The tensor id of the input tensor.

TensorId addInputTensor(const std::string &dataType, const Shape &shape, const popart::DebugContext &debugContext = {})

Add a new input tensor to the model.

Parameters
  • dataType – The data type of the input tensor.

  • shape – The shape of the input tensor.

  • debugContext – Optional debug information.

Returns

The tensor id of the input tensor.

TensorId addInputTensor(const TensorInfo &tensorInfo, const InputSettings &settings, const popart::DebugContext &debugContext = {})

Add a new input tensor to the model.

Parameters
  • tensorInfo – The shape and data type of the input tensor.

  • settings – Settings for TileSet and ExchangeStrategy.

  • debugContext – Optional debug information.

Returns

The tensor id of the input tensor.

TensorId addInputTensor(const std::string &dataType, const Shape &shape, const InputSettings &settings, const popart::DebugContext &debugContext = {})

Add a new input tensor to the model.

Parameters
  • dataType – The data type of the input tensor.

  • shape – The shape of the input tensor.

  • settings – Settings for TileSet and ExchangeStrategy.

  • debugContext – Optional debug information.

Returns

The tensor id of the input tensor.

TensorId addUntypedInputTensor(const popart::DebugContext &debugContext = {})

Add a new input tensor without a type or shape to the model.

Parameters

debugContext – Optional debug information.

Returns

The tensor id of the input tensor.

void addInputTensorFromParentGraph(const TensorId &tensorId)

Add a new named input tensor (from the parent graph) to the model.

Parameters

tensorId – The identifier string of the input tensor. This identifier must already exist in the name scope of the parent GraphProto and must appear topologically before this sub-graph.

TensorId addInitializedInputTensor(const ConstVoidData &initData, const popart::DebugContext &debugContext = {})

Add a new pre-initialized input tensor to the model.

Parameters
  • initData – The initial data of the input tensor.

  • debugContext – Optional debug information.

Returns

The tensor id of the input tensor.

TensorId addInitializedInputTensor(const ConstVoidData &initData, const VariableSettings &variableSettings, const popart::DebugContext &debugContext = {})

Add a new pre-initialized input tensor to the model.

Parameters
  • initData – The initial data of the input tensor.

  • variableSettings – The settings that determine how variables are retrieved from replicas.

  • debugContext – Optional debug information.

Returns

The tensor id of the input tensor.

void addOutputTensor(const TensorId &arg0)

Add an output tensor from a node in the graph into the list of output tensors.

Parameters

arg0 – The tensor id of the output tensor to be added.

inline AiOnnxOpset6 aiOnnxOpset6()

Return the builder interface for ai.onnx opset 6.

inline AiOnnxOpset7 aiOnnxOpset7()

Return the builder interface for ai.onnx opset 7.

inline AiOnnxOpset8 aiOnnxOpset8()

Return the builder interface for ai.onnx opset 8.

inline AiOnnxOpset9 aiOnnxOpset9()

Return the builder interface for ai.onnx opset 9.

inline AiOnnxOpset10 aiOnnxOpset10()

Return the builder interface for ai.onnx opset 10.

inline AiOnnxOpset11 aiOnnxOpset11()

Return the builder interface for ai.onnx opset 11.

inline AiOnnxMlOpset1 aiOnnxMlOpset1()

Return the builder interface for ai.onnx.ml opset 1.

inline AiGraphcoreOpset1 aiGraphcoreOpset1()

Return the builder interface for ai.graphcore opset 1.

std::vector<TensorId> customOp(const OperatorIdentifier &opid, int opsetVersion, const std::vector<TensorId> &inputs, const unsigned numOutputs, const std::map<std::string, popart::any> &attributes, const DebugContext &debugContext = {})

Return the output tensors from a custom op added to the model.

Parameters
  • opid – The id of the operator.

  • opsetVersion – The version of the opset.

  • inputs – A vector of input tensor ids.

  • numOutputs – The number of output tensors.

  • attributes – The map of attributes and their values to be added.

  • debugContext – Optional debug information.

Returns

The output tensors.

void customOp(const OperatorIdentifier &opid, int opsetVersion, const std::vector<TensorId> &inputs, const std::vector<TensorId> &outputs, const std::map<std::string, popart::any> &attributes, const DebugContext &debugContext = {})

Add a custom op to the model.

Parameters
  • opid – The id of the operator.

  • opsetVersion – The version of the opset.

  • inputs – A vector of input tensor ids.

  • outputs – The tensor ids of the output tensors.

  • attributes – The map of attributes and their values to be added.

  • debugContext – Optional debug information.

template<class T>
inline TensorId reshape_const(T &t, const std::vector<TensorId> &args, const std::vector<int64_t> &shape, const std::string &name = {})

Add a constant and reshape a tensor using the provided domain.

Parameters
  • t – The builder interface.

  • args – The tensor ids of the tensors to be updated.

  • shape – The shape information to be used.

  • name – (Optional) The name of the updated tensor. Default: an empty string.

Returns

The tensor id of the updated tensor.

inline void outputTensorLocation(const TensorId &nodeOutputName, TensorLocation value)

Set a value for the output tensor location attribute.

Parameters
  • nodeOutputName – The tensor id of the output tensor of the ONNX node.

  • value – The location of the tensor.

inline void recomputeOutput(const TensorId &nodeOutputName, RecomputeType value)

Enable recomputation of the output of the node in the backward pass.

Parameters
  • nodeOutputName – The tensor id of the output tensor of the ONNX node.

  • value – The type of the recompute.

inline void recomputeOutputInBackwardPass(const TensorId &nodeOutputName, RecomputeType value = RecomputeType::Recompute)

Enable or disable recomputation of the output of the node in the backward pass.

Parameters
  • nodeOutputName – The tensor id of the output tensor of the ONNX node.

  • value – (Optional) The type of the recompute. Default: RecomputeType::Recompute.

inline void recomputeOutputInBackwardPass(const std::set<TensorId> &nodeOutputNames, RecomputeType value = RecomputeType::Recompute)

Enable or disable recomputation of the output of the node in the backward pass.

Parameters
  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node.

  • value – (Optional) The type of the recompute. Default: RecomputeType::Recompute.

inline bool getRecomputeOutputInBackwardPass(const TensorId &nodeOutputName)

Check if a node will have its output recomputed in the backward pass.

Parameters

nodeOutputName – The tensor id of the output tensor of the ONNX node used to find the node in the ONNX model.

Returns

true if the output will be recomputed; false otherwise.

inline bool getRecomputeOutputInBackwardPass(const std::set<TensorId> &nodeOutputNames)

Check if a node will have its output recomputed in the backward pass.

Parameters

nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

Returns

true if the output will be recomputed; false otherwise.

std::vector<TensorId> checkpointOutput(const std::vector<TensorId> &nodeOutputNames)

Add checkpoint operations to the model.

This is the same as an identity op but RecomputeType is Checkpoint by default. Use this to checkpoint a subset of an operation’s output tensors.

Parameters

nodeOutputNames – The tensors to checkpoint.

Returns

The checkpointed tensors.

inline void virtualGraph(const TensorId &nodeOutputName, int64_t value = 0)

Set the virtual graph that computes the given node.

Applies when creating a graph for a multi-IPU configuration.

Parameters
  • nodeOutputName – Name of the output tensor of the ONNX node.

  • value – The index of the virtual graph that computes this node. Default=0.

inline void executionPhase(const TensorId &nodeOutputName, int64_t value = 0)

Set the execution phase that computes the given node.

Applies when creating a graph for a multi-IPU configuration.

Parameters
  • nodeOutputName – The tensor id of the output tensor of the ONNX node.

  • value – The index of the execution phase that computes this node. Default=0.

inline void pipelineStage(const TensorId &nodeOutputName, int64_t value)

Set the value on the pipeline stage attribute.

Parameters
  • nodeOutputName – The tensor id of the output tensor of the ONNX node.

  • value – The value to be set.

inline void pipelineStage(const std::set<TensorId> &nodeOutputNames, int64_t value)

Set the value on the pipeline stage attribute.

Parameters
  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node.

  • value – The value to be set.

inline void excludePatterns(const TensorId &nodeOutputName, const std::vector<std::string> &patternNames)

Set the patterns to be excluded.

Parameters
  • nodeOutputName – The tensor id of the output tensor of the ONNX node.

  • patternNames – The vector of pattern names to be excluded.

inline void excludePatterns(const std::set<TensorId> &nodeOutputNames, const std::vector<std::string> &patternNames)

Set the patterns to be excluded.

Parameters
  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node.

  • patternNames – The vector of pattern names to be excluded.

inline void setSerializeMatMul(const std::set<TensorId> &nodeOutputNames, std::string mode, int64_t factor, bool keep_precision)

Set the settings for matmuls that should be serialized.

This option will split a matmul into separate smaller matmuls that will be executed in series. This will also serialize the grad operations during training.

Parameters
  • nodeOutputNames – The tensor ids of the output matmul tensors of the ONNX node.

  • mode – The dimension of the matmul to serialize on. Options are: ‘input_channels’, ‘output_channels’, ‘reducing_dim’, ‘none’.

  • factor – The number of serialized matmuls. This must be a factor of the dimension to serialize on.
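To illustrate what serializing on the output channels means, the following standalone sketch splits the columns of the right-hand matrix into factor chunks, computes the smaller matmuls one after another, and concatenates the partial results. It is a plain C++ model under stated assumptions, not PopART's implementation: Matrix, matmulColumns and serializedMatmul are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Reference matmul restricted to columns [colBegin, colEnd) of b.
Matrix matmulColumns(const Matrix &a, const Matrix &b,
                     std::size_t colBegin, std::size_t colEnd) {
  Matrix out(a.size(), std::vector<double>(colEnd - colBegin, 0.0));
  for (std::size_t i = 0; i < a.size(); ++i) {
    for (std::size_t k = 0; k < b.size(); ++k) {
      for (std::size_t j = colBegin; j < colEnd; ++j) {
        out[i][j - colBegin] += a[i][k] * b[k][j];
      }
    }
  }
  return out;
}

// Hypothetical model of serializing on the output channels: the columns of b
// are split into `factor` equal chunks that are multiplied in series.
Matrix serializedMatmul(const Matrix &a, const Matrix &b, std::size_t factor) {
  const std::size_t cols = b.empty() ? 0 : b[0].size();
  if (factor == 0 || cols % factor != 0) {
    throw std::invalid_argument("factor must divide the serialized dimension");
  }
  const std::size_t chunk = cols / factor;
  Matrix out(a.size());
  for (std::size_t part = 0; part < factor; ++part) {
    // One smaller matmul per chunk; its columns are appended to the result.
    Matrix partial = matmulColumns(a, b, part * chunk, (part + 1) * chunk);
    for (std::size_t i = 0; i < a.size(); ++i) {
      out[i].insert(out[i].end(), partial[i].begin(), partial[i].end());
    }
  }
  return out;
}
```

In this model, the serialized result equals the result of the single large matmul, and a factor that does not divide the serialized dimension is rejected, which mirrors the constraint on factor stated above.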

void setPartialsType(const TensorId &nodeOutputName, const std::string partialsType)

Set the partials type for the given node.

This is used in the convolution op.

Parameters
  • nodeOutputName – Name of the output tensor of the ONNX node.

  • partialsType – The type for the partials. Options are: FLOAT or HALF.

void setEnableConvDithering(const TensorId &nodeOutputName, int64_t value)

Enable convolution dithering.

Parameters
  • nodeOutputName – The tensor id of the output tensor of the ONNX node.

  • value – The value to enable convolution dithering. This should be 1 to enable convolution dithering and 0 otherwise.

std::string getPartialsType(const TensorId &nodeOutputName)

Get the partials type for the given node.

Parameters

nodeOutputName – The tensor id of the output tensor of the ONNX node.

Returns

The partials type.

inline void setInplacePreferences(const TensorId &nodeOutputName, const std::map<OpType, float> &prefs)
void setAvailableMemoryProportion(const TensorId &nodeOutputName, const float availableMemoryProportion)

Set the available memory proportion for the given node.

This is used in the convolution op.

See also

Optimising Temporary Memory Usage for Convolutions and Matmuls on the IPU for some practical examples of using availableMemoryProportion.

Parameters
  • nodeOutputName – Name of the output tensor of the ONNX node.

  • availableMemoryProportion – The available memory proportion [0, 1).

void setAvailableMemoryProportion(const std::set<TensorId> &nodeOutputNames, const float availableMemoryProportion)

Set the available memory proportion for the given node.

This is used in the convolution op.

See also

Optimising Temporary Memory Usage for Convolutions and Matmuls on the IPU for some practical examples of using availableMemoryProportion.

Parameters
  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

  • availableMemoryProportion – The available memory proportion [0, 1).

void setAttribute(const std::string &attribute, popart::any value)

Set the value of an attribute that will be set on all subsequent operations.

Parameters
  • attribute – The name of the attribute to set.

  • value – The value to set on the attribute.

popart::any getAttribute(const std::string attribute) const

Get an attribute that has been set for all subsequent operations.

Parameters

attribute – The name of the attribute to get.

Returns

The attribute.

bool hasAttribute(const std::string &attribute) const

Check if an attribute exists.

Parameters

attribute – The name of the attribute to check.

Returns

true if the attribute exists; false otherwise.

void clearAttribute(const std::string &attribute)

Unset an attribute that will be set on all subsequent operations.

Parameters

attribute – The name of the attribute to unset.
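The phrase "set on all subsequent operations" can be pictured with a small standalone model: attributes set on the builder are copied onto every op created afterwards, until they are cleared. This is a sketch, not PopART code. AttributeScope, addOp and opAttributes are hypothetical names, attribute values are limited to int64_t for brevity (the real functions take popart::any), and "exampleAttr" is a made-up attribute name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified model of builder-wide attributes (not a PopART type).
class AttributeScope {
public:
  using Attributes = std::map<std::string, std::int64_t>;

  void setAttribute(const std::string &name, std::int64_t value) {
    current_[name] = value;
  }

  bool hasAttribute(const std::string &name) const {
    return current_.count(name) != 0;
  }

  std::int64_t getAttribute(const std::string &name) const {
    auto it = current_.find(name);
    if (it == current_.end()) {
      throw std::out_of_range("attribute not set: " + name);
    }
    return it->second;
  }

  void clearAttribute(const std::string &name) { current_.erase(name); }

  // Record a new op; it receives a copy of the attributes set at this moment.
  std::size_t addOp() {
    ops_.push_back(current_);
    return ops_.size() - 1;
  }

  const Attributes &opAttributes(std::size_t index) const {
    return ops_.at(index);
  }

private:
  Attributes current_;
  std::vector<Attributes> ops_;
};
```

In this model, an op added between setAttribute("exampleAttr", 1) and clearAttribute("exampleAttr") carries the attribute, while ops added before the set or after the clear do not.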

bool hasAttribute(const std::string &attribute)

Check if an attribute is set.

Parameters

attribute – The name of the attribute to check.

Returns

true if the attribute is set; false otherwise.

popart::any getAttribute(const std::string &attribute)

Get the attribute value.

Parameters

attribute – The name of the attribute.

Returns

The value of the attribute.

int64_t getPipelineStage() const

Get the pipeline stage attribute.

Returns

The pipeline stage.

int64_t getExecutionPhase() const

Get the execution phase attribute.

Returns

The execution phase.

int64_t getVirtualGraph() const

Get the virtual graph attribute.

Returns

The virtual graph.

inline void virtualGraph(const std::set<TensorId> &nodeOutputNames, int64_t value = 0)

Set the virtual graph that computes the given node.

Applies when creating a graph for a multi-IPU configuration.

Parameters
  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

  • value – The index of the virtual graph that computes this node.

inline void executionPhase(const std::set<TensorId> &nodeOutputNames, int64_t value = 0)

Set the execution phase.

Applies when creating a graph for a multi-IPU configuration.

Parameters
  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

  • value – The index of the execution phase that computes this node.

void addNodeAttribute(const std::string &attributeName, const int64_t &attributeValue, const std::set<TensorId> &nodeOutputNames)

Add an attribute to the ONNX node which is uniquely identified by the output tensors.

This function will throw an exception if it cannot find the unique node or if the attribute already exists.

Parameters
  • attributeName – The name of the attribute to add.

  • attributeValue – An int64_t value of the attribute to add.

  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

void addNodeAttribute(const std::string &attributeName, const std::vector<int64_t> &attributeValue, const std::set<TensorId> &nodeOutputNames)

Add an attribute to the ONNX node which is uniquely identified by the output tensors.

This function will throw an exception if it cannot find the unique node or if the attribute already exists.

Parameters
  • attributeName – The name of the attribute to add.

  • attributeValue – A std::vector<int64_t> value of the attribute to add.

  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

void addNodeAttribute(const std::string &attributeName, const float &attributeValue, const std::set<TensorId> &nodeOutputNames)

Add an attribute to the ONNX node which is uniquely identified by the output tensors.

This function will throw an exception if it cannot find the unique node or if the attribute already exists.

Parameters
  • attributeName – The name of the attribute to add.

  • attributeValue – A float value of the attribute to add.

  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

void addNodeAttribute(const std::string &attributeName, const std::vector<float> &attributeValue, const std::set<TensorId> &nodeOutputNames)

Add an attribute to the ONNX node which is uniquely identified by the output tensors.

This function will throw an exception if it cannot find the unique node or if the attribute already exists.

Parameters
  • attributeName – The name of the attribute to add.

  • attributeValue – The std::vector<float> value of the attribute to add.

  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

void addNodeAttribute(const std::string &attributeName, const std::string &attributeValue, const std::set<TensorId> &nodeOutputNames)

Add an attribute to the ONNX node which is uniquely identified by the output tensors.

This function will throw an exception if it cannot find the unique node or if the attribute already exists.

Parameters
  • attributeName – The name of the attribute to add.

  • attributeValue – A std::string value of the attribute to add.

  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

void addNodeAttribute(const std::string &attributeName, const char *attributeValue, const std::set<TensorId> &nodeOutputNames)

Add an attribute to the ONNX node which is uniquely identified by the output tensors.

This function will throw an exception if it cannot find the unique node or if the attribute already exists.

Parameters
  • attributeName – The name of the attribute to add.

  • attributeValue – A C string (const char *) value of the attribute to add.

  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

void addNodeAttribute(const std::string &attributeName, const std::vector<std::string> &attributeValue, const std::set<TensorId> &nodeOutputNames)

Add an attribute to the ONNX node which is uniquely identified by the output tensors.

This function will throw an exception if it cannot find the unique node or if the attribute already exists.

Parameters
  • attributeName – The name of the attribute to add.

  • attributeValue – A std::vector<std::string> value of the attribute to add.

  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

void addNodeAttribute(const std::string &attributeName, const bool attributeValue, const std::set<TensorId> &nodeOutputNames)

Add an attribute to the ONNX node which is uniquely identified by the output tensors.

This function will throw an exception if it cannot find the unique node or if the attribute already exists.

Parameters
  • attributeName – The name of the attribute to add.

  • attributeValue – A bool value of the attribute to add.

  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

void addNodeAttribute(const std::string &attributeName, const ConstVoidData &attributeValue, const std::set<TensorId> &nodeOutputNames)

Add an attribute to the ONNX node which is uniquely identified by the output tensors.

This function will throw an exception if it cannot find the unique node or if the attribute already exists.

Parameters
  • attributeName – The name of the attribute to add.

  • attributeValue – A constant tensor initializer.

  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

bool nodeHasAttribute(const std::string &attributeName, const std::set<TensorId> &nodeOutputNames)

Check whether the ONNX node has an attribute set.

This function will throw an exception if it cannot find the unique node.

Parameters
  • attributeName – The name of the attribute to find.

  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

Returns

true if the node has an attribute set; false otherwise.

int64_t getInt64NodeAttribute(const std::string &attributeName, const std::set<TensorId> &nodeOutputNames)

Get the value of an attribute for the ONNX node where the value is an int64_t.

This function will throw an exception if it cannot find the unique node or if the attribute does not exist or if it has not been set to the int64_t type.

Parameters
  • attributeName – The name of the attribute to find.

  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

Returns

Value of the attribute.
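The node attribute functions above share one lookup rule: a node is identified by the set of its output tensor ids, and an exception is thrown if the node cannot be found, if an attribute to add already exists, or if an attribute to read is missing. The standalone sketch below models that rule for int64_t attributes only. It is an illustration under stated assumptions, not PopART code: NodeAttributeTable and its member names are hypothetical, and std::runtime_error stands in for whatever exception type PopART throws.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

using TensorIdSet = std::set<std::string>;

// Hypothetical model of per-node attributes keyed by output tensor ids.
class NodeAttributeTable {
private:
  using Attributes = std::map<std::string, std::int64_t>;

public:
  // Register a node that produces the given output tensors.
  void addNode(const TensorIdSet &outputs) { nodes_[outputs]; }

  void addInt64Attribute(const std::string &name, std::int64_t value,
                         const TensorIdSet &outputs) {
    Attributes &attrs = findNode(outputs);
    if (attrs.count(name) != 0) {
      throw std::runtime_error("attribute already exists: " + name);
    }
    attrs[name] = value;
  }

  bool hasAttribute(const std::string &name, const TensorIdSet &outputs) {
    return findNode(outputs).count(name) != 0;
  }

  std::int64_t getInt64Attribute(const std::string &name,
                                 const TensorIdSet &outputs) {
    Attributes &attrs = findNode(outputs);
    auto it = attrs.find(name);
    if (it == attrs.end()) {
      throw std::runtime_error("attribute does not exist: " + name);
    }
    return it->second;
  }

private:
  // Throws if no node with exactly these outputs has been registered.
  Attributes &findNode(const TensorIdSet &outputs) {
    auto it = nodes_.find(outputs);
    if (it == nodes_.end()) {
      throw std::runtime_error("cannot find a unique node for these outputs");
    }
    return it->second;
  }

  std::map<TensorIdSet, Attributes> nodes_;
};
```

In this model, adding the same attribute name twice to one node throws, as does any query against a set of output tensors that does not match a registered node.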

std::vector<int64_t> getInt64VectorNodeAttribute(const std::string &attributeName, const std::set<TensorId> &nodeOutputNames)

Get the value of an attribute for the ONNX node where the value is a std::vector<int64_t>.

This function will throw an exception if it cannot find the unique node or if the attribute does not exist or if it has not been set to the std::vector<int64_t> type.

Parameters
  • attributeName – The name of the attribute to find.

  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

Returns

Value of the attribute.

float getFloatNodeAttribute(const std::string &attributeName, const std::set<TensorId> &nodeOutputNames)

Get the value of an attribute for the ONNX node where the value is a float.

This function will throw an exception if it cannot find the unique node or if the attribute does not exist or if it has not been set to the float type.

Parameters
  • attributeName – The name of the attribute to find.

  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

Returns

Value of the attribute.

std::vector<float> getFloatVectorNodeAttribute(const std::string &attributeName, const std::set<TensorId> &nodeOutputNames)

Get the value of an attribute for the ONNX node where the value is a std::vector<float>.

This function will throw an exception if it cannot find the unique node or if the attribute does not exist.

Parameters
  • attributeName – The name of the attribute to find.

  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

Returns

Value of the attribute.

std::string getStringNodeAttribute(const std::string &attributeName, const std::set<TensorId> &nodeOutputNames)

Get the value of an attribute for the ONNX node where the value is a string.

This function will throw an exception if it cannot find the unique node or if the attribute does not exist or if it has not been set to the std::string type.

Parameters
  • attributeName – The name of the attribute for which the value is required.

  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

Returns

Value of the attribute.

std::vector<std::string> getStringVectorNodeAttribute(const std::string &attributeName, const std::set<TensorId> &nodeOutputNames)

Get the value of an attribute for the ONNX node where the value is a vector of strings.

This function will throw an exception if it cannot find the unique node or if the attribute does not exist.

Parameters
  • attributeName – The name of the attribute for which the value is required.

  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

Returns

Value of the attribute.

bool getBoolNodeAttribute(const std::string &attributeName, const std::set<TensorId> &nodeOutputNames)

Get the value of an attribute for the ONNX node where the value is a boolean.

This function will throw an exception if it cannot find the unique node or if the attribute does not exist.

Parameters
  • attributeName – The name of the attribute for which the value is required.

  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

Returns

Value of the attribute.

void removeNodeAttribute(const std::string &attributeName, const std::set<TensorId> &nodeOutputNames)

Remove an attribute from the ONNX node.

This function will throw an exception if it cannot find the unique node or if the attribute does not exist.

Parameters
  • attributeName – The name of the attribute to find.

  • nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

std::vector<std::string> getAllNodeAttributeNames(const std::set<TensorId> &nodeOutputNames)

Get all the attribute names from the ONNX node.

This function will throw an exception if it cannot find the unique node.

Parameters

nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

Returns

The attribute names associated with the ONNX node.

inline int64_t getVirtualGraph(const TensorId &nodeOutputName)

Get the index of the virtual graph that computes this node.

This applies in a multi-IPU system.

This function will throw an exception if the virtual graph has not been set in the current scope.

Parameters

nodeOutputName – The tensor id of the output tensor of the ONNX node used to find the node in the ONNX model.

Returns

The virtual graph associated with the ONNX node.

inline int64_t getVirtualGraph(const std::set<TensorId> &nodeOutputNames)

Get the index of the virtual graph that computes this node based on multiple output tensors.

This applies in a multi-IPU system.

This function will throw an exception if the virtual graph has not been set in the current scope.

Parameters

nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

Returns

The virtual graph associated with the ONNX node.

inline int64_t getExecutionPhase(const TensorId &nodeOutputName)

Get the execution phase for a single output tensor.

This only applies to a multi-IPU system.

This function will throw an exception if the execution phase has not been set in the current scope.

Parameters

nodeOutputName – The tensor id of the output tensor of the ONNX node used to find the node in the ONNX model.

Returns

The execution phase associated with the ONNX node.

inline int64_t getExecutionPhase(const std::set<TensorId> &nodeOutputNames)

Get the execution phase for a set of output tensors.

This only applies to a multi-IPU system.

This function will throw an exception if the execution phase has not been set in the current scope.

Parameters

nodeOutputNames – The tensor ids of the output tensors of the ONNX node used to find the node in the ONNX model.

Returns

The execution phase associated with the ONNX node.

std::string getModelProto(bool humanReadable = false) const

Retrieve the ONNX serialized ModelProto.

Parameters

humanReadable – If true, return a human readable text representation of the model, otherwise use a binary format.

Returns

A serialized ONNX ModelProto.

void saveModelProto(const std::string &fn)

Save the builder’s ONNX ModelProto to a file.

Parameters

fn – The name of the file to which the ONNX model protobuf is written.

void saveInitializersExternally(const std::vector<TensorId> &ids, const std::string &fn)

Save tensor data externally.

The model data cannot exceed 2 GB, the maximum size of a Protobuf message. To stay under this limit, the ONNX tensor data of large models can be saved separately.

Parameters
  • ids – The names of tensors for which data is to be saved externally.

  • fn – The name of a file containing the binary tensor data. This can be an absolute or relative path. If a relative path, when the ONNX model is saved, external tensor data will be written to a path relative to the current working directory.
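As a rough aid for deciding which tensor ids to pass to this function, the sketch below picks tensors, largest first, until the data that remains inside the model fits under a given byte limit. The selection policy is an assumption made for illustration. The function itself saves whatever ids the caller provides. TensorSize and chooseExternalTensors are hypothetical names, and the exact limit is left as a parameter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// (tensor id, size in bytes)
using TensorSize = std::pair<std::string, std::uint64_t>;

// Hypothetical helper: choose tensors to store externally, largest first,
// until the data left inside the model is no larger than limitBytes.
std::vector<std::string> chooseExternalTensors(std::vector<TensorSize> tensors,
                                               std::uint64_t limitBytes) {
  std::uint64_t inlineBytes = 0;
  for (const auto &t : tensors) {
    inlineBytes += t.second;
  }
  std::sort(tensors.begin(), tensors.end(),
            [](const TensorSize &a, const TensorSize &b) {
              return a.second > b.second;
            });
  std::vector<std::string> external;
  for (const auto &t : tensors) {
    if (inlineBytes <= limitBytes) {
      break;
    }
    external.push_back(t.first);
    inlineBytes -= t.second;
  }
  return external;
}
```

The returned ids could then be used as the ids argument, with fn naming the file that receives the binary tensor data.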

std::vector<TensorId> getInputTensorIds() const

Return a list of ONNX graph input tensor ids.

Returns

A vector of input tensor ids.

std::vector<TensorId> getOutputTensorIds() const

Return a list of ONNX graph output tensor ids.

Returns

A vector of output tensor ids.

std::vector<TensorId> getValueTensorIds() const

Return a list of ONNX graph value tensor ids.

These tensors are stored in the value_info section of the ONNX GraphProto structure.

Returns

A vector of value tensor names.

std::vector<TensorId> getTrainableTensorIds() const

Return a list of ONNX graph initialized tensor ids.

These tensors are stored in the initializer section of the ONNX GraphProto structure.

Returns

A vector of names of initialized tensors.

bool hasValueInfo(const TensorId &id) const

Check if a tensor has value info.

A tensor may not have value info if either the value info does not exist or shape inference has failed.

Returns

True if the tensor has value info; false otherwise.

std::vector<int64_t> getTensorShape(const TensorId id)

Return an ONNX graph tensor shape, from either the input, output, or value_info lists in GraphProto.

Parameters

id – The id of the tensor for which dimensions are required.

Returns

A vector of the tensor dimensions.

bool isInitializer(const TensorId id) const

Check if the ONNX tensor is in the initializer list of GraphProto.

Parameters

id – A tensor id.

Returns

True if the tensor is in the initializer list; false otherwise.

std::string getTensorDtypeString(const TensorId id)

Return an ONNX graph tensor type as a lower case string, from either the input, output, or value_info lists in GraphProto.

Parameters

id – The id of the tensor for which the type is required.

Returns

A lower case string of the tensor data type.

DataType getTensorDataType(const TensorId id)

Return a tensor type from either the input, output, or value_info lists in GraphProto.

Parameters

id – The id of the tensor for which the type is required.

Returns

The data type of the tensor.

void pushNameScope(const std::string &name)

Push a name onto the name scope stack.

The names of tensors and nodes added to the ONNX graph will be prefixed with a concatenation of the names in the name scope stack.

Parameters

name – The name to be pushed onto the name scope stack.

void popNameScope()

Remove the last entry in the name scope stack.

std::string getNameScope(const std::string &name = "") const

Get the current name scope stack using the default delimiter.

Parameters

name – (Optional) A string to concatenate to the end of the stack.

Returns

A string of the concatenated name scope stack.
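The name scope functions above behave like a stack of prefixes. The following standalone class models that behaviour. It is a sketch rather than PopART code: NameScopeStack is a hypothetical stand-in, and the "/" delimiter is an assumption, since the text above only says that a default delimiter is used.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the builder's name scope stack (not a PopART type).
class NameScopeStack {
public:
  // Assumption: "/" is used as the delimiter, for illustration only.
  explicit NameScopeStack(std::string delimiter = "/")
      : delimiter_(std::move(delimiter)) {}

  // Push a name onto the stack.
  void push(const std::string &name) { scopes_.push_back(name); }

  // Remove the last entry, if there is one.
  void pop() {
    if (!scopes_.empty()) {
      scopes_.pop_back();
    }
  }

  // Concatenate all scopes, each followed by the delimiter, then append name.
  std::string get(const std::string &name = "") const {
    std::string result;
    for (const auto &scope : scopes_) {
      result += scope + delimiter_;
    }
    return result + name;
  }

private:
  std::string delimiter_;
  std::vector<std::string> scopes_;
};
```

In this model, pushing "encoder" and then "layer0" makes get("weights") return "encoder/layer0/weights", and popping once makes get("bias") return "encoder/bias".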

void setGraphName(const std::string &name)

Set a graph name.

Parameters

name – The string to name the graph.

void setParent(Builder *parent)

Set the parent graph of this builder.

Parameters

parent – The builder to set as the parent of this builder.

Builder *getParent() const

Return the parent graph of this builder or null if there is no parent.

inline bool hasParent() const

Check if this builder represents a subgraph.

Returns

true if the builder represents a subgraph; false otherwise.

void embedReplicationFactor(int replicationFactor)

Embed the value of replicationFactor into the OnnxModel.

Should be interpreted as 1 if not present in the model.

Parameters

replicationFactor – The replication factor.

Public Static Functions

static std::unique_ptr<Builder> create()

Create a builder for an ONNX model.

static std::unique_ptr<Builder> createFromOnnxModel(const std::string &modelProtoOrFilename)

Create a builder which loads a serialized ONNX ModelProto into the builder and validates it.

Parameters

modelProtoOrFilename – Either an ONNX model protobuf, or the name of a file containing an ONNX model protobuf.

class Ir

Public Types

enum class ExecutionMode

Values:

enumerator Inference
enumerator Training
enum class SerialiseFormat

Values:

enumerator JSON

Public Functions

poprithms::logging::TimePartitionLogger &timePartitionLogger() const

Get the logger used to track and summarize where wall clock time is spent in PopART compilation. This object is used to partition time into different components (scheduling, outlining, poplar Graph construction, etc.). It can be used as follows:

void foo() {
  auto timer = timePartitionLogger().scopedStopwatch("In foo");
  if (cond0()) {
    return;
  }
  bar();
  return;
}

When the method timePartitionLoggerStr() (see below) is called, there will be a line with “In foo” summarizing the time between the construction and destruction of timer, above. Something like:

In foo      : 0.03 [s] :  30 %
In bar      : 0.02 [s] :  20 %
unaccounted : 0.05 [s] :  50 %
total       : 0.10 [s] : 100 %

In the case where there are multiple timers which exist concurrently, only the most recently constructed one will accumulate time. This means that the most nested scope is the one which will accumulate time.

For more information, see the poprithms SwitchingTimePartitionLogger class.

Returns

An object used to track and summarize where wall clock time is spent in PopART compilation.
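The rule that only the most recently constructed timer accumulates time can be modelled with a small standalone class. This is a sketch, not the poprithms implementation: SwitchingTimer is a hypothetical name, and time points are passed in explicitly (instead of reading a clock) so that the behaviour is deterministic.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a "switching" time partition logger.
class SwitchingTimer {
public:
  // Start a nested scope at time now; the enclosing scope stops accumulating.
  void start(const std::string &name, double now) {
    chargeActive(now);
    stack_.push_back(name);
  }

  // Stop the innermost scope at time now; the enclosing scope resumes.
  void stop(double now) {
    chargeActive(now);
    if (!stack_.empty()) {
      stack_.pop_back();
    }
  }

  // Total time charged to the scope with the given name.
  double total(const std::string &name) const {
    auto it = totals_.find(name);
    return it == totals_.end() ? 0.0 : it->second;
  }

private:
  // Charge the time since the last event to the innermost active scope.
  void chargeActive(double now) {
    if (!stack_.empty()) {
      totals_[stack_.back()] += now - last_;
    }
    last_ = now;
  }

  std::vector<std::string> stack_;
  std::map<std::string, double> totals_;
  double last_ = 0.0;
};
```

In this model, if "In foo" starts at time 0, "In bar" runs from time 1 to time 3 inside it, and "In foo" stops at time 4, then each scope is charged 2 time units: the 2 units spent in the nested scope are not counted towards "In foo".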

std::string timePartitionLoggerStr() const
Ir()
~Ir()
Ir(Ir&&) = delete
Ir &operator=(Ir&&) = delete
Ir(const Ir&) = delete
Ir &operator=(const Ir&) = delete
inline uint64_t getId() const
void setOnnxModel(const ONNX_NAMESPACE::ModelProto &model)
inline bool hasOnnxModel() const

Check if there’s an ONNX model in the IR.

This is true if the IR has been created from an ONNX model or using the Builder.

Returns

true if there is an ONNX model, false otherwise.

void setDataFlow(const DataFlow &df)
void setUserOptions(const SessionOptions &flags)
void setInputShapeInfo(const InputShapeInfo &info)
inline const InputShapeInfo &getInputShapeInfo() const
void setOptimizer(const Optimizer&)
void ensureOptimizerTensorCreated(const TensorId &optId, const TensorInfo &info, const DebugContext &debugContext = {})
inline const Optimizer &getOptimizer() const
void setDeviceInfo(DeviceInfo&)
const DeviceInfo *getDeviceInfo() const
void setPatterns(const Patterns &p)
inline const Patterns &getPatterns() const
std::string getPatternLevelStr(const Patterns &p)
bool isPatternsLevel(const Patterns &p, PatternsLevel level)
void removeIsolatedTensors(bool retainUsedIOTensors = false, bool retainAllIOTensors = false, bool retainVarTensors = false, bool retainConstTensors = false)
void removeIsolatedGraphs()
void setExecutionMode(const ExecutionMode &mode)
inline bool isTraining() const
inline bool isTesting() const
void logIr() const
void compareWithSavedHash(const HashesMap &cacheEntries)
void prepare(const IrBundle &bundle, const HashesMap &cacheEntries = {}, size_t hashSeed = 0u)

Prepare the IR based on the IrBundle configuration.

If engine caching is enabled, the IR hash (which is based on the IrBundle and the forward graph) is compared to a saved file. If the hash matches, the rest of the IR preparation is skipped.

Parameters
  • bundle – The bundle to prepare.

  • cacheEntries – The engine cache.

  • hashSeed – The seed to initiate the IR hash with. This hash should incorporate non-IR factors that could affect the compilation, such as engine options and session options.
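The interaction between hashSeed and the cache entries can be sketched as follows. Only the HashesMap alias is taken from this reference; combineHash, toyIrHash, lookupCache and the idea of hashing a textual IR description are assumptions made for illustration, and PopART's real hashing scheme may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Same alias as popart::HashesMap.
using HashesMap = std::map<std::size_t, std::string>;

// Hypothetical hash combiner; not necessarily what PopART uses.
inline std::size_t combineHash(std::size_t seed, std::size_t value) {
  return seed ^ (value + 0x9e3779b9u + (seed << 6) + (seed >> 2));
}

// Hypothetical IR hash: starts from hashSeed (non-IR factors such as
// engine and session options) and mixes in a description of the IR.
inline std::size_t toyIrHash(const std::string &irDescription,
                             std::size_t hashSeed) {
  return combineHash(hashSeed, std::hash<std::string>{}(irDescription));
}

// Returns true and fills cachedFile if the hash is present in the cache.
inline bool lookupCache(const HashesMap &cacheEntries, std::size_t hash,
                        std::string &cachedFile) {
  const auto it = cacheEntries.find(hash);
  if (it == cacheEntries.end()) {
    return false;
  }
  cachedFile = it->second;
  return true;
}
```

Because the seed is folded into the hash, the same IR compiled with different options maps to a different cache key and does not produce a false hit.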

void prepareCache(const HashesMap &cacheEntries, size_t hashSeed)
void finalizeOpDebugInfo()
inline bool isPrepared() const
inline bool hashMatched() const
void updateOptimizer(const Optimizer&)
ONNX_NAMESPACE::ModelProto step(int n)
void addAdditionalModelProtoTensor(const TensorId&)
void addAdditionalModelProtoTensor(Tensor*)
void addAdditionalModelProtoTensors()
inline bool additionalModelProtoTensorsHaveBeenAdded() const
inline const std::set<Tensor*, PTensorCmp> &getAdditionalModelProtoTensors() const
inline std::set<Tensor*, PTensorCmp> &getAdditionalModelProtoTensors()
bool isAnchored(const TensorId&) const
bool isRootAnchor(const TensorId&) const
std::set<TensorId> getAnchors() const
std::set<TensorId> getRootAnchors() const
void remapAnchor(const TensorId &from, const TensorId &to)
void addAnchor(const TensorId &t)
const BiMap<TensorId, TensorId> &getAnchorRemap() const
bool streamingIsDisabledForTensor(const Tensor*) const
bool streamingIsDisabledForTensor(const TensorId&) const
bool storingIsDisabledForTensor(const Tensor*) const
bool storingIsDisabledForTensor(const TensorId&) const
void append(std::stringstream&) const
void serialise(SerialiseFormat format, std::stringstream &ss, bool useScheduler = true) const
std::vector<Tensor*> optimizerTensors() const
std::vector<Tensor*> optimizerStateTensors() const
std::map<TensorId, std::vector<Tensor*>> getHostLoadTensors() const

The original input tensor ID (used to identify streams) and the tensors produced by associated HostLoadOp.

std::map<TensorId, std::vector<Tensor*>> getHostStoreTensors() const

The original anchor tensor ID (used to identify streams) and the tensors consumed by associated HostStoreOp.

std::vector<Tensor*> dataStreamTensors() const
std::vector<Op*> opsOfType(const OperatorIdentifier &opid) const
bool isConsumedByOpOfType(TensorId tid, const OperatorIdentifier &opid)
std::vector<const Graph*> getGraphSchedule() const
std::vector<const Graph*> getGraphSchedule(GraphId root) const
std::vector<Op*> getOpSchedule(const OpsBeforeKey&, RequireOptimalSchedule ros) const
bool isSchedulable(const OpsBeforeKey&) const
bool virtualGraphsEnabled() const
SyntheticDataMode syntheticDataMode() const
bool useSyntheticData() const
OpId getOpsCounter() const
OpId getAndIncrOpsCounter()
TensorId getFinalLossId() const
OpId getFinalLossOpId() const
void dotCheckpoint(const Ir &ir, std::string check) const
const ONNX_NAMESPACE::ModelProto &getModel() const
Throws

error – if there is no Onnx model.

Returns

const reference to the Onnx model.

std::vector<TensorId> getModelInputIds() const
Returns

the id of every input tensor of the Onnx model. If there is no Onnx model, returns empty.

void setExternalTensorDataInfo(TensorId, const ONNX_NAMESPACE::TensorProto&)

Set the Onnx TensorProto of the given tensor in the Onnx ModelProto.

Throws

error – if this Ir has no Onnx model.

inline const SessionOptions &getSessionOptions() const
inline SessionOptions &getSessionOptions()
inline void setSessionName(const std::string name)
inline const std::string getSessionName() const
std::set<TensorId> getAllTensorIds() const
std::vector<TensorId> getTensorIds(TensorType) const
Tensor *getTensor(const TensorId&) const
bool containsTensor(const TensorId&) const
std::vector<TensorId> getGraphInputIds() const
std::vector<TensorId> getGraphOutputIds() const
const Graph &getMainGraph() const
Graph &getMainGraph()
std::vector<const Graph*> getAllGraphs() const
Graph &getGraph(const GraphId&) const
bool hasGraph(const GraphId&) const
Graph &createGraph(const GraphId&)
void removeGraph(const GraphId&)
std::map<OpId, std::unique_ptr<Op>> &getMainGraphOps()
const std::map<OpId, std::unique_ptr<Op>> &getMainGraphOps() const
std::vector<Op*> getAllOps() const
Op *getOp(OpId opId) const

Returns the Op if it exists in any graph.

Throws an error if the Op could not be found.

Parameters

opId – The unique ID of the Op to find

Returns

The Op pointer if found

Tensors &getMainGraphTensors()
const Tensors &getMainGraphTensors() const
inline const DataFlow &getDataFlow() const
void applyTransform(std::size_t transformId, Graph &graph)
void validateAnchors() const
ExecutionMode getExecutionMode() const
bool canInfer() const
bool canTrain() const
bool hasConstructedBackwards() const
bool hasDecomposedOptimizers() const
bool containsInitialisers() const
bool tensorExistsInInitialisers(TensorId) const
void constructForwards()
Graph &constructFromOnnxGraph(const ONNX_NAMESPACE::GraphProto &graph, const Scope &scope)
void foldConstants(Graph&)
void constructBackwards()
void registerInputTensors()
void updateVertices()
void unsetAllVirtualGraphIds()
void applyPreAliasPatterns(Graph&)
void applyUpdateInplacePrioritiesForIpu()
void applyInplacePattern(Graph&)
void confirmConstIds() const
void confirmNoReservedIds() const
void setFinalLoss(const TensorId &loss)
int getDefaultOpsetVersion(const std::string &domain) const
unsigned getNumVirtualGraphIds() const
int getOpSetVersionFromModel(const std::string &domain) const
inline bool autoRecomputationEnabled() const
bool hasReplicatedTensorSharding() const
bool hasOverlappedIO() const
inline void setRequiresRandomSeed()
inline bool getRequiresRandomSeed() const
RandomReferenceId getAndIncrementRandomReferenceId()
TensorId getOrSetRandomReferenceTensor(RandomReferenceId, TensorId)
void mergeRandomReferenceIds(std::set<RandomReferenceId>&)
void setRemoteBufferInfo(RemoteBufferId, RemoteBufferInfo)
const RemoteBufferInfo getRemoteBufferInfo(RemoteBufferId) const
const std::map<RemoteBufferId, RemoteBufferInfo> getAllRemoteBufferInfos() const
inline void setExecutionPhasesReady()
inline bool getExecutionPhasesReady() const
PipelineStage getNumPipelineStages() const
PipelineInfo pipelineInfo() const
void setMainGraphPathFromLoss()
void verifyTensorInfos() const

Verifies that all tensors have valid TensorInfos.

void setIsPrepared()

Marks the Ir as “prepared”.

This means the Ir is now ready to be lowered. Failing to do this before lowering the Ir will result in an error. The schedule of all graphs will be fixed by calling this. Modifying the graphs after the IR is prepared will result in an error.

PipelineStage getFinalLossPipelineStage() const

Get pipeline stage containing the final loss (the last forward pipeline stage)

Returns

pipeline stage containing the final loss

PipelineStage getMaxPipelineStage() const

Get the max pipeline stage that will exist after the backward pass has been added to the graph.

Returns

max pipeline stage of the graph

Op &getSubgraphAnchorPlaceholder()
inline const decltype(graphs) &getGraphs() const
TensorId createIntermediateTensorId(const TensorId &base_id)
TensorId createSliceTensorId(TensorId base_id, unsigned s, unsigned e)
TensorId createConcatTensorId(TensorId base_id)
GraphId createUniqueSubgraphId(GraphId base_id)
std::vector<std::vector<Op*>> getAccumulateOuterFragmentBinConstraints(const Graph &graph) const
size_t getHash() const
void computeHash(size_t hashSeed)
size_t getIrBundleHash() const
void setIrBundleHash(size_t)
ClonedGraphMaps cloneGraph(GraphId originalGraphId, GraphId newGraphId)

Clone a graph.

The OpIds and TensorIds will differ between the original and the cloned graph. Hence a map between the old OpId and cloned OpId will be returned. The new graph can be obtained by ir.getGraph(newGraphId);

Warning

Does not support cloning of the main graph.

Parameters
  • originalGraphId – The id of the graph to clone

  • newGraphId – The id of the cloned graph

Returns

A struct of maps between the OpIds and TensorIds in the original and new graphs

bool applyPreAliasPattern(const PreAliasPattern*, Graph&)

Public Static Functions

static bool usingEngineCache(const SessionOptions&, const DeviceInfo*)
using popart::HashesMap = std::map<size_t, std::string>
enum class popart::RequireOptimalSchedule

Values:

enumerator Yes = true
enumerator No = false
class Graph

Public Types

enum class CopyInputMarkings

Values:

enumerator Yes = 1
enumerator No = 0
enum class CopyOutputMarkings

Values:

enumerator Yes = 1
enumerator No = 0

Public Functions

Graph(Ir&, const GraphId&)
~Graph()
Graph() = delete
Graph(const Graph&) = delete
const std::map<OpId, std::unique_ptr<Op>> &getOps() const
std::map<OpId, std::unique_ptr<Op>> &getOps()
std::vector<OpId> getOpIds() const
const std::set<int64_t> getAllVirtualGraphIds(bool includeInvalid) const
const std::map<int64_t, int> getVirtualGraphCounts() const
Op *getOp(OpId opId) const

Return a pointer to the Op if it exists.

Throws an error if the Op could not be found.

See also

getOpUnsafe

Parameters

opId – The unique ID of the Op to find

Returns

The Op pointer if found

Op *getOpUnsafe(OpId opId) const

Returns a pointer to the Op if it exists, or nullptr otherwise.

See also

getOp

Parameters

opId – The unique ID of the Op to find

Returns

The Op pointer if found, or nullptr otherwise
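The difference between the throwing and the non-throwing lookup can be modelled in a few lines. ToyOp and ToyGraph are hypothetical stand-ins, the integer OpId is an assumption, and the exception type is chosen for the sketch only; the real Graph class is considerably richer.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

using OpId = int; // assumption: a plain integer identifier

// Hypothetical minimal Op.
struct ToyOp {
  std::string name;
};

// Hypothetical minimal graph that owns its Ops.
class ToyGraph {
public:
  OpId add(const std::string &name) {
    const OpId id = nextId_++;
    ops_[id] = std::unique_ptr<ToyOp>(new ToyOp{name});
    return id;
  }

  // Non-throwing lookup: nullptr signals "not found".
  ToyOp *getOpUnsafe(OpId id) const {
    const auto it = ops_.find(id);
    return it == ops_.end() ? nullptr : it->second.get();
  }

  // Throwing lookup: the result can be dereferenced without a null check.
  ToyOp *getOp(OpId id) const {
    ToyOp *op = getOpUnsafe(id);
    if (op == nullptr) {
      throw std::runtime_error("No Op with id " + std::to_string(id));
    }
    return op;
  }

private:
  OpId nextId_ = 0;
  std::map<OpId, std::unique_ptr<ToyOp>> ops_;
};
```

The unsafe variant suits code that probes for an Op that may already have been erased; the throwing variant suits code where a missing Op indicates a bug.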

const Tensors &getTensors() const
Tensors &getTensors()
Tensor *getTensor(const TensorId&)
void addActGrad(const TensorId&)
void addVarInit(const TensorId &name, const TensorInfo &info, const void *src, const DebugContext &debugContext)

Add a variable to this graph with the provided properties.

Parameters
  • name – The name of the variable.

  • info – The tensor info to create the variable with, including shape and data type.

  • src – The data to initialise the tensor with.

  • debugContext – The debug context to assist with debugging.

void addVarInit(const TensorId &name, const TensorInfo &info, const void *src, const VariableSettings &vs, const DebugContext &debugContext)

As per addVarInit, but passing a VariableSettings object to allow for grouped replicas.

See also

addVarInit(const TensorId &, const TensorInfo &, const void *, const DebugContext &)

Parameters
  • name – The name of the variable.

  • info – The tensor info to create the variable with, including shape and data type.

  • src – The data to initialise the tensor with.

  • vs – The VariableSettings to use.

  • debugContext – The debug context to assist with debugging.

void addConstInit(const TensorId&, const TensorInfo&, const void*, const DebugContext&)
void addStream(const TensorId&, const TensorInfo&, const DebugContext&)
inline const Ir &getIr() const
inline Ir &getIr()
inline const TensorId &getLoss() const
inline void setLoss(const TensorId &loss_)
void constructFromOnnxGraph(const ONNX_NAMESPACE::GraphProto &onnx_graph)
Op *growFromNode(const Node &node)
OpId moveIntoGraph(std::unique_ptr<Op> op)
template<typename OP, typename ...Args>
OP *createOp(Args&&... args)
template<typename OP, typename ...Args>
OP *createConnectedOp(const std::map<InIndex, TensorId> &in, const std::map<OutIndex, TensorId> &out, Args&&... args)
std::vector<const Graph*> getCalledGraphs() const
template<typename T>
void connectInputs(const T &inContainer, OpId opId)
template<typename T>
void connectOutputs(const T &outContainer, OpId opId)
void connectInputsFromInputMapWrapper(const InputMapWrapper &in, OpId id)
void connectOutputsFromOutputMapWrapper(const OutputMapWrapper&, OpId opId)
std::map<int, std::unique_ptr<popart::Op>>::iterator eraseOp(OpId id)
void setVarUpdateConstraints()
void setConvFlipWeightConstraints()
std::vector<Op*> getOpSchedule(const OpsBeforeKey&, RequireOptimalSchedule requireOptimalSchedule) const
void freezeSchedule(const OpsBeforeKey &gCons)
bool isSchedulable(const OpsBeforeKey&, bool respectExecutionPhases = false) const
bool hasUserRecomputeOps() const
std::vector<OpSet> getLiveSets(const std::vector<Op*> &topoOps) const
inline const std::vector<TensorId> &getInputIds() const
InIndex getInputIndex(TensorId id) const

Get the index of the graph input with a specific id.

If the id is not a valid input id then an error will be raised.

Parameters

id – Tensor name to find the index for.

Returns

The input index for the specified id, if it exists.

void addInput(const InIndex &index, const TensorId &id, const TensorInfo &info, bool overwrite)

Add a graph input at a specific index in the list.

Parameters
  • index – Force the input to be at the specified index in the graph.

  • id – Tensor name to create and connect

  • info – Tensor info

  • overwrite – Overwrites any existing input at the index if true, otherwise, moves all other inputs by one position
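The effect of the overwrite flag on the input list can be sketched on a plain vector. This is a model of the documented behaviour only: addInputAt is a hypothetical helper, tensor ids are modelled as strings, and no tensors are created or connected.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using TensorId = std::string; // assumption: ids modelled as strings

// Hypothetical helper: place `id` at `index` in the list of graph inputs.
// With overwrite, an existing entry at `index` is replaced; otherwise the
// entries from `index` onwards move up by one position.
inline void addInputAt(std::vector<TensorId> &inputs, std::size_t index,
                       const TensorId &id, bool overwrite) {
  assert(index <= inputs.size());
  if (overwrite && index < inputs.size()) {
    inputs[index] = id;
  } else {
    inputs.insert(inputs.begin() + static_cast<std::ptrdiff_t>(index), id);
  }
}
```

Starting from inputs [x, y], adding z at index 0 without overwrite gives [z, x, y], whereas adding it with overwrite gives [z, y].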

void addInput(const TensorId &id, const TensorInfo &info)

Add a graph input to the end of the list.

Parameters
  • id – Tensor name to create and connect

  • info – Tensor info

void markAsInput(const TensorId&)
TensorId addInput(const TensorInfo&)
Tensor *getInputTensor(InIndex idx) const
inline TensorId getInputId(InIndex idx) const
bool hasInputId(const TensorId &id) const
void removeInput(const TensorId&)
void removeInput(const InIndex&)
inline const std::vector<TensorId> &getOutputIds() const
OutIndex getOutputIndex(TensorId id) const
void markAsOutput(const OutIndex &index, const TensorId &id, bool overwrite)

Mark a graph tensor as graph output at a specific index in the list.

Parameters
  • index – Force the output to be at the specified index in the graph. Overwrites any existing output at the index.

  • id – Tensor in the graph to mark as output

  • overwrite – Overwrites any existing output at the index if true, otherwise, moves all other outputs by one position

void markAsOutput(const TensorId &id)

Mark a graph tensor as graph output at the end of the list.

Parameters

id – Tensor in the graph to mark as output

void removeOutput(const TensorId&)
void removeOutput(const OutIndex&)
inline TensorId getOutputId(OutIndex idx) const
bool hasOutputId(const TensorId &id) const
Tensor *getOutputTensor(OutIndex idx) const
Scope getScope() const
void replaceTensor(const TensorId &oldId, const TensorId &newId)

Replace oldId with newId on any consumers.

Both tensors need to exist.

Parameters
  • oldId – Tensor to disconnect from consumers & graph outputs

  • newId – Tensor to connect to consumers & graph outputs

std::vector<Op*> getCallSiteOps() const
std::vector<Op*> getCallSiteOps(size_t num) const
std::map<OpId, std::unordered_set<OpId>> getEdgeMap() const
inline const std::string &getGraphId() const
std::string getGraphString() const
void copyFrom(const Graph &other, CopyInputMarkings copyInputMarkings = CopyInputMarkings::Yes, CopyOutputMarkings copyOutputMarkings = CopyOutputMarkings::Yes)
std::pair<bool, std::vector<Op*>> getDirectViewChain(Tensor *from, Tensor *to)

Find a chain of view-changing ops in the graph from “from” to “to” (if one exists) and return a vector of ops such that op1(op2(…opN(in))) = out for {op1, op2, …, opN}.

If no such chain exists, returns {false, {}};

Parameters
  • from – The tensor to start at

  • to – The tensor to finish at

Returns

std::pair<bool, std::vector<Op *>> where the first element of the pair is a bool indicating whether the path exists, and the second is the vector of ops along the chain, in order from ‘from’ to ‘to’. Given that the ops are 1-in-1-out, this will also be in schedule order.

void setOnnxToOnnx(std::unique_ptr<onnxpasses::IOnnxToOnnx>)

Set the object which will perform the ONNX -> ONNX transformation, which happens early on in the Graph constructor.

The default object, which is used if this method is not called, is an instance of the onnxpasses::Canonnxalizer class, which performs a set of required transformations, such as decomposing ASinh into more basic Nodes.

void finalizeSchedule()

Finalizes the graph schedule.

The schedule cannot change after this has been called. Calling finalizeSchedule() multiple times results in an error.

inline void removeIsolatedTensors(bool retainUsedIOTensors = false, bool retainAllIOTensors = false, bool retainVarTensors = false, bool retainConstTensors = false)
inline bool canBeRecursivelyAutodiffed() const

If this graph X is called in graph Y, when applying autodiff to Y, is it safe to autodiff X?

inline void setCanBeRecursivelyAutodiffed(bool value)

Public Members

std::unique_ptr<TopoCons> topoCons
const GraphId id

Public Static Attributes

static const int64_t NoVGraph
class AiOnnxMlOpset1 : public popart::DomainOpSet

Class that represents the AI ONNX ML opset.

Public Functions

inline AiOnnxMlOpset1(std::unique_ptr<BuilderImpl> &impl_)

Constructor for the AiOnnxMlOpset1 class.

Parameters

impl_ – A pointer to an implementation of the Builder class.

class AiGraphcoreOpset1 : public popart::DomainOpSet

Class that represents the AI Graphcore opset.

Public Functions

inline AiGraphcoreOpset1(std::unique_ptr<BuilderImpl> &impl_)

Constructor for the AiGraphcoreOpset1 class.

Parameters

impl_ – A pointer to an implementation of the Builder class.

TensorId copyvarupdate(const std::vector<TensorId> &args, const DebugContext &debugContext = {})

Copies a tensor to an initialised tensor (variable).

This is used to update an initialised tensor (a variable created using addInitializedInputTensor()) which retains its value between iterations, by setting the value to the value of another tensor (the updater). The purpose is to manually update the tensor in use cases for variables other than trained parameters (weights) or tensors used by other ops.

Parameters
  • args – A vector of the input tensor ids: the tensor to be updated, tensor, and the tensor containing the values for the update, updater, as [tensor, updater].

  • debugContext – Optional debug information.

Returns

An alias to the updated variable: to ensure correct ordering of the updated variable, you should use this variable for any op which should operate on the updated variable.

std::vector<TensorId> batchnormalization(const std::vector<TensorId> &args, unsigned num_outputs, float epsilon = 1e-05f, float momentum = 0.9f, const popart::DebugContext &debugContext = {})

Add a batch normalization operation to the model.

This version uses N-1 as the population size for calculating running variance (like PyTorch BatchNorm1d), whereas the ONNX version uses N.

Parameters
  • args – List of input tensor ids

  • num_outputs – The number of output tensor ids

  • epsilon – The ‘epsilon’ attribute

  • momentum – The ‘momentum’ attribute

  • debugContext – Optional debug information.

Returns

A list of normalized output tensors
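The difference between the two population sizes mentioned above (N-1 versus N) is the difference between the unbiased and the biased variance estimate. A minimal sketch, using a hypothetical variance() helper on a plain vector rather than any PopART call:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Variance of a batch of values. With useNMinusOne the sum of squared
// deviations is divided by N-1 (unbiased estimate), otherwise by N.
inline double variance(const std::vector<double> &x, bool useNMinusOne) {
  assert(x.size() > 1);
  const double n = static_cast<double>(x.size());

  double mean = 0.0;
  for (double v : x) {
    mean += v;
  }
  mean /= n;

  double sumSquares = 0.0;
  for (double v : x) {
    sumSquares += (v - mean) * (v - mean);
  }
  return sumSquares / (useNMinusOne ? n - 1.0 : n);
}
```

For the batch {1, 2, 3, 4} the sum of squared deviations is 5, so dividing by N gives 1.25 and dividing by N-1 gives about 1.667. The gap shrinks as the batch grows, but it matters when comparing running statistics between frameworks.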

std::vector<TensorId> groupnormalization(const std::vector<TensorId> &args, int64_t num_groups, float epsilon = 1e-05f, const DebugContext &debugContext = {})

Add a group normalization operation to the model.

This is a Poplar extension.

The group will be created from a strided input.

Parameters
  • args – A vector of input tensor ids for input data x, scale scale, and bias bias as [x, scale, bias].

  • num_groups – The number of groups to separate the channels into.

  • epsilon – The epsilon value to use to avoid division by zero.

  • debugContext – Optional debug information.

Returns

A vector of output tensor ids for output data y, the mean mean and the variance var as [y, mean, var].

std::vector<TensorId> multiconv(const MultiConvInputs &tensors, const MultiConvDilations &dilations = {}, const MultiConvDilations &inDilations = {}, const MultiConvPads &pads = {}, const MultiConvPads &outPads = {}, const MultiConvStrides &strides = {}, const std::vector<float> &availableMemoryProportions = {}, const std::vector<std::string> &partialsTypes = {}, const nonstd::optional<std::string> planType = nonstd::nullopt, const nonstd::optional<int> perConvReservedTiles = nonstd::nullopt, const nonstd::optional<float> cycleBackOff = nonstd::nullopt, const std::vector<int64_t> enableConvDithering = {}, const DebugContext &debugContext = {})

Add a multi-convolution operation to the model.

Using this multi-convolution API ensures that the convolutions are executed in parallel on the device.

Functionally, a multi-convolution is equivalent to a series of single convolutions. Using this multi-convolution API is always equivalent to calling the single-convolution API (conv) once for each argument.

For example, calling:

A0 = conv({X0, W0, B0})
A1 = conv({X1, W1})

is functionally equivalent to calling:

{A0, A1} = multiconv({{X0, W0, B0}, {X1, W1}}).

It is possible that any two convolutions cannot be executed in parallel due to topological constraints. For example, the following:

B = conv({A, W0});
C = B + A
D = conv({C, W1});

cannot be converted to:

{B, D} = multiconv({{A, W0}, {C, W1}}).

Note that it is not possible to create such a cycle by adding a multi-convolution with this API.

Calls to multiconv() are mapped to poplar::poplin::multiconv::convolution().

All input vectors must be either empty, or equal in length to the number of convolutions. Note that groups for each convolution are automatically inferred from the shapes of the data and weight inputs.

See also

Optimising Temporary Memory Usage for Convolutions and Matmuls on the IPU for some practical examples of using availableMemoryProportion.

Parameters
  • tensors – List of tensor ids for input tensors for data, weights and biases as [data, weight, bias] for each convolution. bias is optional.

  • dilations – The dilations attributes for each convolution.

  • inDilations – The input dilations attributes for each convolution.

  • pads – The pads for each convolution.

  • outPads – The output padding for each convolution.

  • strides – The strides for each convolution.

  • availableMemoryProportions – The available memory proportions per convolution, each [0, 1).

  • partialsTypes – The partials type per convolution.

  • planType – Run convolutions in parallel or series.

  • perConvReservedTiles – The number of tiles to reserve per convolution when planning.

  • cycleBackOff – Cycle back-off proportion, [0, 1).

  • enableConvDithering – Enable convolution dithering per convolution. If true, then convolutions with different parameters will be laid out from different tiles in an effort to improve tile balance in models.

  • debugContext – Optional debug information.

Returns

A vector of tensor ids of the output tensor from each convolution.
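The requirement that every per-convolution argument is either empty or has one entry per convolution can be checked up front. The following sketch uses a hypothetical ToyMultiConvArgs struct holding only a few of the arguments; it does not call PopART and the helper names are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using TensorId = std::string; // assumption: ids modelled as strings
using Ints = std::vector<int64_t>;

// Hypothetical bundle of a few multiconv arguments.
struct ToyMultiConvArgs {
  std::vector<std::vector<TensorId>> tensors; // [data, weight] or [data, weight, bias]
  std::vector<Ints> strides;
  std::vector<Ints> pads;
  std::vector<float> availableMemoryProportions;
};

// An attribute vector is acceptable if it is empty (use defaults) or
// has exactly one entry per convolution.
template <typename T>
bool emptyOrOnePerConv(const std::vector<T> &attr, std::size_t numConvs) {
  return attr.empty() || attr.size() == numConvs;
}

inline bool lengthsAreConsistent(const ToyMultiConvArgs &a) {
  const std::size_t numConvs = a.tensors.size();
  for (const auto &inputs : a.tensors) {
    // bias is optional, so each convolution has two or three inputs.
    if (inputs.size() != 2 && inputs.size() != 3) {
      return false;
    }
  }
  return emptyOrOnePerConv(a.strides, numConvs) &&
         emptyOrOnePerConv(a.pads, numConvs) &&
         emptyOrOnePerConv(a.availableMemoryProportions, numConvs);
}
```

With two convolutions, supplying strides for both is consistent, while supplying pads for only one of them is not.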

TensorId subsample(const std::vector<TensorId> &args, const std::vector<int64_t> &strides, const DebugContext &debugContext = {})

Add a sub-sample operation to the model.

This is a Poplar extension.

If multiple tensors are provided, the strides will be applied to them all.

Parameters
  • args – A vector of tensor ids to sub-sample.

  • strides – The strides to use.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId printtensor(const std::vector<TensorId> &args, int64_t print_gradient = 1, const DebugContext &debugContext = {}, const std::string &title = {}, const int summariseThreshold = 1000, const int edgeItems = 3, const int maxLineWidth = 75, const int digits = 8, const int floatFormat = 0, const char separator = ' ', const char openBracket = '[', const char closeBracket = ']')

Add a print tensor operation to the model.

This is a Poplar extension.

Parameters
  • args – A vector of tensor ids to print.

  • print_gradient – Indicates whether the gradient tensor(s) associated with the input tensor(s) are also printed. If 1, the gradient tensor(s) are also printed, otherwise the gradient tensor(s) are not printed.

  • debugContext – Optional debug information.

  • title – An optional title to print.

  • summariseThreshold – (default 1000) If the number of elements of the tensor exceeds this threshold the output will be summarised. Only the edge elements will be displayed with an ellipsis indicating skipped elements. A value of 0 will disable summarisation.

  • edgeItems – (default 3) number of edge elements to include at the beginning and end when summarisation is enabled

  • maxLineWidth – (default 75) lines longer than this limit will be split across multiple lines. A value of 0 will disable line splitting.

  • digits – (default 8) number of digits to display. For integers this limit can be exceeded if any number is large enough. For floating points this does not include the exponent. The number of digits is used in conjunction with analysis of the tensor to determine the width of each element to align all elements when printed. A value of 0 disables this analysis and each element will be printed in an unaligned format.

  • floatFormat – (default 0=Auto) determines the floating point format to use. 0=auto, 1=fixed, 2=scientific, 3=none. Automatic mode determines the appropriate format based on the data. If digits==0 this option is disregarded and the floatFormat is set to none.

  • separator – (default space) character used to delineate values.

  • openBracket – (default square bracket) character used to open a tensor.

  • closeBracket – (default square bracket) character used to close a tensor.

Returns

The tensor id of the result tensor.
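How summariseThreshold, edgeItems, separator and the bracket characters interact can be sketched for a one-dimensional tensor of integers. The summarise() function below is a hypothetical model; the exact text printed by the real op (spacing, alignment, line splitting) is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical 1-D model of the summarisation rule: when the element
// count exceeds summariseThreshold, only edgeItems elements at each end
// are shown, with an ellipsis in between. A threshold of 0 disables it.
inline std::string summarise(const std::vector<int> &values,
                             std::size_t summariseThreshold = 1000,
                             std::size_t edgeItems = 3,
                             char separator = ' ',
                             char openBracket = '[',
                             char closeBracket = ']') {
  const std::size_t n = values.size();
  const bool summarised =
      summariseThreshold != 0 && n > summariseThreshold && n > 2 * edgeItems;

  std::ostringstream out;
  bool first = true;
  auto emit = [&](const std::string &text) {
    if (!first) {
      out << separator;
    }
    out << text;
    first = false;
  };

  out << openBracket;
  std::size_t i = 0;
  while (i < n) {
    if (summarised && i == edgeItems) {
      emit("...");
      i = n - edgeItems; // jump to the trailing edge elements
      continue;
    }
    emit(std::to_string(values[i]));
    ++i;
  }
  out << closeBracket;
  return out.str();
}
```

For the values 0 to 9 with a threshold of 5 and three edge items, this model produces `[0 1 2 ... 7 8 9]`; with a threshold of 0 all ten values are printed.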

TensorId nop(const std::vector<TensorId> &args, const DebugContext &debugContext = {})

Add a no-op operation to the model.

Parameters
  • args – A vector of input tensor ids.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId normalize_image(const std::vector<TensorId> &args, float scale, const DebugContext &debugContext = {})

Normalize image and pad it from 3 channels to 4 channels.

The input channel must be in the last dimension.

Parameters
  • args – Contains the image input, offsets, scales input tensors as required by Poplibs.

  • scale – The scale to apply.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId scale(const std::vector<TensorId> &args, float scale, const DebugContext &debugContext = {})

Add a scale operation to the model.

This is a Poplar extension.

Parameters
  • args – A vector of input tensor ids.

  • scale – The scale to apply.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId scaledadd(const std::vector<TensorId> &args, float scale0, float scale1, const DebugContext &debugContext = {})

Add a scaled add operation to the model.

The scaled add operation takes the form:

X = scale0 * T0 + scale1 * T1

where scale0 is the scale factor to be applied to tensor T0 and scale1 is the scale factor to be applied to tensor T1.

Parameters
  • args – A vector of input tensor ids: [T0, T1, scale0, scale1].

  • scale0 – The scale to apply (if no scale0 tensor is supplied).

  • scale1 – The scale to apply (if no scale1 tensor is supplied).

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.
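The arithmetic X = scale0 * T0 + scale1 * T1 can be written out element-wise. This is a host-side sketch with a hypothetical scaledAdd() helper on plain vectors; it covers the case where both scales are given as scalar values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise X = scale0 * T0 + scale1 * T1 for equally sized tensors.
inline std::vector<float> scaledAdd(const std::vector<float> &t0,
                                    const std::vector<float> &t1,
                                    float scale0, float scale1) {
  assert(t0.size() == t1.size());
  std::vector<float> x(t0.size());
  for (std::size_t i = 0; i < t0.size(); ++i) {
    x[i] = scale0 * t0[i] + scale1 * t1[i];
  }
  return x;
}
```

For example, with T0 = {1, 2}, T1 = {10, 20}, scale0 = 2 and scale1 = 0.5 the result is {7, 14}.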

std::vector<TensorId> lstm(const std::vector<TensorId> &args, int64_t outputFullSequence, const DebugContext &debugContext = {})
TensorId gelu(const std::vector<TensorId> &args, const DebugContext &debugContext = {})

Add a GELU operation to the model.

This is a Poplar extension.

Parameters
  • args – A vector of input tensor ids.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId geluerf(const std::vector<TensorId> &args, const DebugContext &debugContext = {})

Add an accurate GELU (ERF instead of TANH) operation to the model.

Parameters
  • args – A vector of input tensor IDs.

  • debugContext – Optional debug information.

Returns

The tensor ID of the result tensor.
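The distinction between the ERF and TANH forms of GELU can be made concrete with the two standard formulas. The functions below are a scalar sketch using the C++ standard library; geluErf and geluTanh are hypothetical names, and the assumption that gelu() corresponds to the tanh approximation is inferred from the description above rather than confirmed.

```cpp
#include <cassert>
#include <cmath>

// GELU using the error function: 0.5 * x * (1 + erf(x / sqrt(2))).
inline double geluErf(double x) {
  return 0.5 * x * (1.0 + std::erf(x / std::sqrt(2.0)));
}

// GELU using the common tanh approximation:
// 0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x^3))).
inline double geluTanh(double x) {
  const double kSqrt2OverPi = 0.7978845608028654; // sqrt(2 / pi)
  return 0.5 * x *
         (1.0 + std::tanh(kSqrt2OverPi * (x + 0.044715 * x * x * x)));
}
```

The two forms agree closely (at x = 1 they differ by roughly 0.00015), so the choice mainly matters when matching results from another framework bit for bit.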

TensorId detach(const std::vector<TensorId> &args, const DebugContext &debugContext = {})

Add a detach operation to the model.

Parameters
  • args – A vector of input tensor ids.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId depthtospace(const std::vector<TensorId> &args, int64_t blocksize, const std::string &mode = "DCR", const DebugContext &debugContext = {})

Add a depth-to-space operation to the model.

This allows DepthToSpace_11 to be targeted from earlier opsets.

The purpose of a depth-to-space operation, also known as pixel shuffling, is to rearrange data from the depth (channels) dimension into the spatial (width and height) dimensions. It is an efficient means of learning upsampling, alongside alternatives such as mixing convolution with bilinear interpolation or using transpose convolution.

Parameters
  • args – A vector containing a single tensor id of the input tensor of shape [N,C,H,W], where N is the batch axis, C is the channel or depth, H is the height and W is the width.

  • blocksize – The size of the blocks to be moved. If the input is [N, C, H, W] and the blocksize is B, the output will be [N, C/(B*B), H*B, W*B].

  • mode – Specifies how the data is rearranged:

    • “DCR” (Default): depth-column-row order

    • “CRD”: column-row-depth order

  • debugContext – Optional debug information.

Returns

A tensor which is a rearrangement of the input tensor.
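The output shape rule [N, C/(B*B), H*B, W*B] can be checked with a small helper. depthToSpaceShape is a hypothetical function that only computes the shape; it does not rearrange any data and does not depend on the mode.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Shape = std::vector<int64_t>;

// Output shape of depth-to-space for an [N, C, H, W] input and block
// size B: [N, C / (B * B), H * B, W * B].
inline Shape depthToSpaceShape(const Shape &in, int64_t blocksize) {
  if (in.size() != 4) {
    throw std::invalid_argument("expected an [N, C, H, W] shape");
  }
  if (blocksize <= 0 || in[1] % (blocksize * blocksize) != 0) {
    throw std::invalid_argument("C must be divisible by blocksize squared");
  }
  return Shape{in[0], in[1] / (blocksize * blocksize), in[2] * blocksize,
               in[3] * blocksize};
}
```

An input of shape [1, 8, 2, 3] with a block size of 2 therefore produces an output of shape [1, 2, 4, 6]; the total number of elements is unchanged.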

TensorId round(const std::vector<TensorId> &args, const DebugContext &debugContext = {})

Add a rounding operation to the model.

This allows Round_11 to be targeted from earlier opsets.

Parameters
  • args – A vector of input tensor ids.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.
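The ONNX Round-11 operator rounds halfway cases to the nearest even integer. A host-side sketch of that rule, assuming the default floating-point rounding mode (round to nearest) is in effect; roundHalfToEven is a hypothetical helper, not a PopART function:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Element-wise rounding with ties going to the nearest even integer.
// std::nearbyint follows the current rounding mode, which by default
// is round-to-nearest with ties to even.
inline std::vector<double> roundHalfToEven(const std::vector<double> &x) {
  std::vector<double> out;
  out.reserve(x.size());
  for (double v : x) {
    out.push_back(std::nearbyint(v));
  }
  return out;
}
```

Under this rule 0.5 rounds to 0, 1.5 and 2.5 both round to 2, and -2.5 rounds to -2, which differs from the "round half away from zero" behaviour of std::round.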

TensorId init(Attributes::Ints shape, Attributes::Int data_type, Attributes::Int init_type, Attributes::Int batch_axis, const DebugContext &debugContext = {})

Add an init operation to the model.

Parameters
  • shape – The shape of the tensor to initialise.

  • data_type – The data type to initialise tensor with. The value is the integer attribute taken from the DataType enum.

  • init_type – The mode of the tensor initialisation. The value is the integer attribute taken from the InitType enum.

  • batch_axis – Batch axis specifies the axis that the batches are split along and is a literal integer.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId init(Attributes::Ints shape, Attributes::Int data_type, Attributes::Int init_type, const DebugContext &debugContext = {})

Add an init operation to the model.

Parameters
  • shape – The shape of the tensor to initialise.

  • data_type – The data type to initialise tensor with. The value is the integer attribute taken from the DataType enum.

  • init_type – The mode of the tensor initialisation. The value is the integer attribute taken from the InitType enum.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId dynamicslice(const std::vector<TensorId> &args, Attributes::Ints axes, Attributes::Ints sizes, Attributes::Int noOverlap, const DebugContext &debugContext = {})

Add a dynamic slice operation to the model.

Creates a new slice tensor, slice, at offset position, offset, in a tensor, tensor. For example:

slice = tensor[offset]

Parameters
  • args – A vector of input tensor ids: [tensor, offset].

  • axes – The axes along which to slice.

  • sizes – The size of the slice along each axis.

  • noOverlap – Indicates whether the slice regions overlap or not. If 1, slice regions do not overlap, otherwise they do overlap.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId dynamicupdate(const std::vector<TensorId> &args, Attributes::Ints axes, Attributes::Ints sizes, Attributes::Int noOverlap, const DebugContext &debugContext = {})

Add a dynamic update operation to the model.

Creates a copy of a tensor, tensor, and updates the elements of the copied tensor at offset position, offset, with the elements contained in the slice tensor, slice. For example:

out = tensor
out[offset] = slice

Parameters
  • args – A vector of input tensor ids: [tensor, offset, slice].

  • axes – The axes along which to update.

  • sizes – The size of the slice along each axis.

  • noOverlap – Indicates whether the updates overlap or not. If 1, the updates do not overlap, otherwise they do overlap.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId dynamiczero(const std::vector<TensorId> &args, Attributes::Ints axes, Attributes::Ints sizes, const DebugContext &debugContext = {})

Add a dynamic zero operation to the model.

Creates a copy of a tensor, tensor, with the slice at offset position, offset, set to zero. For example:

out = tensor
out[offset] = 0.0

Parameters
  • args – A vector of input tensor ids: [tensor, offset].

  • axes – The axes along which to zero elements.

  • sizes – The size of the slice along each axis.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId dynamicadd(const std::vector<TensorId> &args, Attributes::Ints axes, Attributes::Ints sizes, const DebugContext &debugContext = {})

Add a dynamic add operation to the model.

Creates a copy of a tensor, tensor, with a slice tensor, slice, added at an offset position, offset. For example:

out = tensor
out[offset] += slice

Parameters
  • args – A vector of input tensor ids: [tensor, offset, slice].

  • axes – The axes along which to add the slice.

  • sizes – The size of the slice along each axis.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.
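
The four dynamic operations above (dynamicslice, dynamicupdate, dynamiczero and dynamicadd) can be sketched on a 1-D tensor as follows. This is an illustrative reference only: the helper names are hypothetical, a single axis is assumed, and offset and size are given as plain element counts rather than as tensors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative 1-D reference semantics; these are not PopART functions.

// slice = tensor[offset : offset + size]
std::vector<float> dynamicSlice1D(const std::vector<float> &tensor,
                                  std::size_t offset, std::size_t size) {
  assert(offset + size <= tensor.size());
  return std::vector<float>(tensor.begin() + offset,
                            tensor.begin() + offset + size);
}

// out = tensor; out[offset : offset + slice.size()] = slice
std::vector<float> dynamicUpdate1D(std::vector<float> tensor,
                                   std::size_t offset,
                                   const std::vector<float> &slice) {
  assert(offset + slice.size() <= tensor.size());
  for (std::size_t i = 0; i < slice.size(); ++i) {
    tensor[offset + i] = slice[i];
  }
  return tensor; // the input was taken by value, so this is a copy
}

// out = tensor; out[offset : offset + size] = 0.0
std::vector<float> dynamicZero1D(std::vector<float> tensor, std::size_t offset,
                                 std::size_t size) {
  assert(offset + size <= tensor.size());
  for (std::size_t i = 0; i < size; ++i) {
    tensor[offset + i] = 0.0f;
  }
  return tensor;
}

// out = tensor; out[offset : offset + slice.size()] += slice
std::vector<float> dynamicAdd1D(std::vector<float> tensor, std::size_t offset,
                                const std::vector<float> &slice) {
  assert(offset + slice.size() <= tensor.size());
  for (std::size_t i = 0; i < slice.size(); ++i) {
    tensor[offset + i] += slice[i];
  }
  return tensor;
}
```

In each case the original tensor is left unchanged and a modified copy is returned, which mirrors the "creates a copy" wording of the descriptions.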

TensorId sequenceslice(const std::vector<TensorId> &args, Attributes::Int zeroUnused, const DebugContext &debugContext = {})

Slice a 2D tensor based on offsets.

The outermost dimension is sliced. For the following:

  • source is the source tensor.

  • destination is the destination tensor.

  • N is the number of elements to copy.

  • sourceOffset is the first element read from the source tensor.

  • destinationOffset is the first element written to in the destination tensor.

Then, for each entry in N, sourceOffset and destinationOffset:

    destination[destinationOffset:destinationOffset+N][...] =
    source[sourceOffset:sourceOffset+N][...]

Entries after the first N==0 may be ignored. Unreferenced elements of destination are zeroed if zeroUnused is set. The same output element should not be written by multiple inputs.

source and destination must have rank greater than or equal to 2. The outer dimension is sliced; the product of the inner dimensions must match. sourceOffset, destinationOffset and N must be 1-dimensional and of the same size. For example:

N = [1, 1, 1]
sourceOffset = [0, 2, 4]
destinationOffset = [0, 1, 2]

Parameters
  • args – A vector of input tensor ids for the following tensors [source, destination, N, sourceOffset, destinationOffset].

  • zeroUnused – Determines whether to zero unreferenced destination elements. If 1, the unreferenced elements are zeroed, otherwise they are not zeroed.

  • debugContext – Optional debug information.
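
The copy rule for sequenceslice can be sketched on a 2-D tensor stored as a vector of rows. This is an illustrative reference, not the PopART implementation: the name sequenceSliceRef is hypothetical, offsets are passed as plain vectors instead of tensors, and entries with N == 0 are simply treated as empty copies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Hypothetical reference: copies N[k] rows from source (starting at
// sourceOffset[k]) into destination (starting at destinationOffset[k]).
Matrix sequenceSliceRef(const Matrix &source, Matrix destination,
                        const std::vector<std::size_t> &N,
                        const std::vector<std::size_t> &sourceOffset,
                        const std::vector<std::size_t> &destinationOffset,
                        bool zeroUnused) {
  assert(N.size() == sourceOffset.size());
  assert(N.size() == destinationOffset.size());
  std::vector<bool> written(destination.size(), false);
  for (std::size_t k = 0; k < N.size(); ++k) {
    for (std::size_t r = 0; r < N[k]; ++r) {
      assert(sourceOffset[k] + r < source.size());
      assert(destinationOffset[k] + r < destination.size());
      destination[destinationOffset[k] + r] = source[sourceOffset[k] + r];
      written[destinationOffset[k] + r] = true;
    }
  }
  if (zeroUnused) {
    // Zero every destination row that no entry referenced.
    for (std::size_t row = 0; row < destination.size(); ++row) {
      if (!written[row]) {
        std::fill(destination[row].begin(), destination[row].end(), 0.0f);
      }
    }
  }
  return destination;
}
```

With N = [1, 1, 1], sourceOffset = [0, 2, 4] and destinationOffset = [0, 1, 2], rows 0, 2 and 4 of the source land in rows 0, 1 and 2 of the destination.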

std::vector<TensorId> call(const std::vector<TensorId> &args, unsigned num_outputs, const Builder &callee, const DebugContext &debugContext = {})

Add a call operation to the model.

This is a Poplar extension, to expose manual code re-use to the builder.

Parameters
  • args – A vector of input tensor ids.

  • num_outputs – The number of output tensors of the subgraph.

  • callee – The subgraph to call into.

  • debugContext – Optional debug information.

Returns

A vector of tensors; the subgraph outputs.

TensorId replicatedallreduce(const std::vector<TensorId> &args, const nonstd::optional<std::vector<int64_t>> &commGroup = nonstd::nullopt, const DebugContext &debugContext = {})

DEPRECATED: Add a replicated allreduce operation to the model.

This is a Poplar extension.

Parameters
  • args – A vector of input tensor ids to reduce across.

  • commGroup – GCL CommGroup parameter.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId replicatedallreduce(const std::vector<TensorId> &args, const nonstd::optional<CollectiveOperator> &collectiveOperator = nonstd::nullopt, const nonstd::optional<CommGroup> &commGroup = nonstd::nullopt, const DebugContext &debugContext = {})

Add a replicated allreduce operation to the model.

This is a Poplar extension.

Parameters
  • args – A vector of input tensor ids to reduce across.

  • collectiveOperator – A Graphcore Communication Library (GCL) collective operator.

  • commGroup – A GCL CommGroup parameter.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId replicatedreducescatter(const std::vector<TensorId> &args, const nonstd::optional<CollectiveOperator> &collectiveOperator = nonstd::nullopt, const nonstd::optional<CommGroup> &commGroup = nonstd::nullopt, const DebugContext &debugContext = {})

Add a replicated reduce-scatter operation to the model.

This is a Poplar extension.

Parameters
  • args – A vector of input tensor ids to reduce across.

  • collectiveOperator – A Graphcore Communication Library (GCL) collective operator.

  • commGroup – A GCL CommGroup parameter.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId l1loss(const std::vector<TensorId> &args, const float lambda, const ReductionType reduction = ReductionType::Mean, const DebugContext &debugContext = {})

Add an L1 loss operation to the model.

Calculates the mean absolute error between each element of the input and a zero target.

Parameters
  • args – A vector of input tensor ids.

  • lambda – The scale factor of the L1 loss.

  • reduction – The type of reduction to perform on the individual losses.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.
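
A minimal sketch of the L1 loss calculation against a zero target is shown below. The Reduction enum is a hypothetical local stand-in for ReductionType, and it is an assumption of this sketch that lambda scales the reduced value.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for the reduction choice; not the PopART enum.
enum class Reduction { Sum, Mean };

// Illustrative reference: lambda * reduce(|x - 0|).
float l1LossRef(const std::vector<float> &x, float lambda,
                Reduction reduction) {
  assert(!x.empty());
  float total = 0.0f;
  for (float v : x) {
    total += std::fabs(v); // absolute error against a zero target
  }
  if (reduction == Reduction::Mean) {
    total /= static_cast<float>(x.size());
  }
  return lambda * total; // assumption: the scale factor is applied last
}
```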

TensorId nllloss(const std::vector<TensorId> &args, const ReductionType reduction = ReductionType::Mean, const nonstd::optional<int> ignoreIndex = nonstd::nullopt, bool inputIsLogProbability = false, const DebugContext &debugContext = {})

Add a negative log-likelihood loss operation to the model.

Calculates the negative log likelihood (NLL) loss given a probability tensor over classes, and a target tensor containing class labels.

Parameters
  • args – A vector of input tensor ids: [probability, target].

  • reduction – The type of reduction to perform on the individual losses.

  • ignoreIndex – Optional class index to ignore in loss calculation.

  • inputIsLogProbability – If true, the input tensor contains log-probabilities, otherwise raw probabilities. Default = false.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId identityloss(const std::vector<TensorId> &args, const ReductionType reduction = ReductionType::Mean, const DebugContext &debugContext = {})

Add an identity loss operation to the model.

Calculates the loss using the identity operator.

Parameters
  • args – A vector of input tensor ids.

  • reduction – The type of reduction to perform on the individual losses.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId tensorremap(const std::vector<TensorId> &args, Attributes::Int remap_type, const DebugContext &debugContext = {})

Add a tensor remap operation to the model.

Changes the tensor layout to conform to the downstream consumers, which means the consumers can read the tensor without having to rearrange it.

Parameters
  • args – The tensor id of the tensor to remap. This is a single tensor that should be copied to a new tensor with a tensor layout conforming to the downstream consumer.

  • remap_type – The type of remap to perform on the forward/backward pass. Backward pass remapping requires the op to exist in the IR before autodiff. The value is the integer attribute value of the enum TensorRemapType.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId ctcloss(const std::vector<TensorId> &args, const ReductionType reduction = ReductionType::Mean, const unsigned blank = 0, const std::string &outDataType = "UNDEFINED", const bool zeroInfinity = false, const DebugContext &debugContext = {})

Add a connectionist temporal classification (CTC) loss operation to the model.

With maximum input length T, batch size N, number of classes C and maximum target length S, this op calculates the CTC loss for a log-probabilities tensor with shape [T, N, C], a class target tensor with shape [N, S], an input lengths tensor with shape [N] and a target lengths tensor with shape [N].

Note that C includes a blank class (default=0). The probabilities tensor is padded as required. Target sequences are also padded and are populated with values less than or equal to C, not including the blank class, up to their respective target lengths. Note that target lengths cannot exceed input lengths.

Parameters
  • args – A vector of input tensor ids: [log_probs, targets, input_lengths, target_lengths].

  • reduction – The type of reduction to perform on the individual losses.

  • blank – The integer representing the blank class.

  • outDataType – The data type of the output tensors. Default = UNDEFINED.

  • zeroInfinity – If true, infinite losses and the associated gradients are zeroed out. Default = false.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

std::vector<TensorId> _ctcloss(const std::vector<TensorId> &args, const ReductionType reduction = ReductionType::Mean, const unsigned blank = 0, const std::string &outDataType = "UNDEFINED", const bool zeroInfinity = false, const DebugContext &debugContext = {})
std::vector<TensorId> ctcbeamsearchdecoder(const std::vector<TensorId> &args, unsigned blank = 0, unsigned beamWidth = 100, unsigned topPaths = 1, const DebugContext &debugContext = {})

Add a connectionist temporal classification (CTC) beam search decoder operation to the model.

Calculate the most likely topPaths labels and their probabilities given the input logProbs with lengths dataLengths.

Parameters
  • args – A vector of input tensor ids. These are [logProbs, dataLengths], where logProbs is of shape [maxTime, batchSize, numClasses], and dataLengths is of shape [batchSize].

  • blank – The integer representing the blank class.

  • beamWidth – The number of beams to use when decoding.

  • topPaths – The number of most likely decoded paths to return. This must be less than or equal to beamWidth.

  • debugContext – Optional debug information.

Returns

The names of the result tensors. These are [labelProbs, labelLengths, decodedLabels], where labelProbs is of shape [batchSize, topPaths], labelLengths is of shape [batchSize, topPaths], and decodedLabels is of shape [batchSize, topPaths, maxTime].

TensorId shapeddropout(const std::vector<TensorId> &args, const std::vector<int64_t> &shape, float ratio = 0.5f, const DebugContext &debugContext = {})

Add a shaped dropout operation to the model.

Applies a shaped dropout to the input tensor. This operator requires a shape parameter that is used to define the shape of the dropout mask so that strongly correlated features in the input tensor can be preserved. The provided shape must be broadcastable to the input tensor. Note that this operation targets the poprand library function of the same name.

Parameters
  • args – A vector of input tensor ids.

  • shape – The shape of dropout mask. This must be broadcastable to the input.

  • ratio – The probability of dropping an input feature. Default = 0.5.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId atan2(const std::vector<TensorId> &args, const DebugContext &debugContext = {})

Add an atan2 operation to the model.

Returns the element-wise angle \( \theta \) as a tensor, where \( -\pi < \theta \le \pi \). For two input tensors \(x\) and \(y\), and given \( r \ne 0 \), \( \theta \) satisfies \( x = r \cos\theta \) and \( y = r \sin\theta \), element-wise.

In the case of \( x > 0 \), \( \theta = \arctan(y/x) \).

Parameters
  • args – A vector of input tensor ids: [y, x].

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.
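
The element-wise behaviour can be sketched with the standard library function std::atan2, which uses the same [y, x] argument order as args. The helper name atan2Ref is hypothetical and the sketch assumes two inputs of equal size, with no broadcasting.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative element-wise atan2 over two equally sized vectors.
std::vector<double> atan2Ref(const std::vector<double> &y,
                             const std::vector<double> &x) {
  assert(y.size() == x.size());
  std::vector<double> theta(y.size());
  for (std::size_t i = 0; i < y.size(); ++i) {
    // std::atan2 takes the quadrant from the signs of both arguments, so
    // the result covers the full range of angles rather than (-pi/2, pi/2).
    theta[i] = std::atan2(y[i], x[i]);
  }
  return theta;
}
```

Unlike arctan(y/x), this distinguishes between, for example, the point (x, y) = (-1, 1) in the second quadrant and (1, -1) in the fourth.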

TensorId expm1(const std::vector<TensorId> &args, const DebugContext &debugContext = {})

Add an expm1 operation to the model.

This calculates the element-wise exponential of the input tensor and subtracts one: \( \exp(x) - 1 \).

Parameters
  • args – A vector of input tensor ids.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId log1p(const std::vector<TensorId> &args, const DebugContext &debugContext = {})

Add a log1p operation to the model.

This calculates the element-wise logarithm of the input tensor plus one: \( \log(x + 1) \).

Parameters
  • args – A vector of input tensor ids.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId reshape(const TensorId &arg, const Attributes::Ints &shape, const DebugContext &debugContext = {})

Add a reshape operation to the model.

This reshapes an input tensor. Unlike the ONNX reshape op, which takes the target shape as a tensor input, this version takes the target shape as an attribute.

Parameters
  • arg – The tensor id of the input tensor.

  • shape – The shape of the output tensor. The output tensor must contain the same number of elements as the input tensor.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId fmod(const std::vector<TensorId> &args, const DebugContext &debugContext = {})

Add an fmod operation to the model.

This is equivalent to the C fmod function. The result has the same sign as the dividend.

Parameters
  • args – A vector of input tensor ids.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor, which contains the element-wise remainder of division. The remainder has the same sign as the dividend.

TensorId remainder(const std::vector<TensorId> &args, const DebugContext &debugContext = {})

Add a remainder operation to the model.

This is equivalent to Python’s modulo operator %. The result has the same sign as the divisor.

Parameters
  • args – A vector of input tensor ids.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor, which contains the element-wise remainder of division. The remainder has the same sign as the divisor.
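
The difference between the sign conventions of fmod and remainder can be sketched on scalars as follows. The helper names are hypothetical. Note that remainderRef deliberately does not use std::remainder, which implements the IEEE remainder and follows neither convention.

```cpp
#include <cassert>
#include <cmath>

// Result has the same sign as the dividend a (C fmod behaviour).
double fmodRef(double a, double b) {
  assert(b != 0.0);
  return std::fmod(a, b);
}

// Result has the same sign as the divisor b (Python's % behaviour).
double remainderRef(double a, double b) {
  assert(b != 0.0);
  double r = std::fmod(a, b);
  // Shift by one divisor when the signs of r and b disagree.
  if (r != 0.0 && ((r < 0.0) != (b < 0.0))) {
    r += b;
  }
  return r;
}
```

For a dividend of -7 and a divisor of 3, fmodRef gives -1 while remainderRef gives 2. The two agree whenever the dividend and divisor have the same sign.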

TensorId reverse(const std::vector<TensorId> &args, const std::vector<int64_t> &dimensions, const DebugContext &debugContext = {})

Add a reverse operator to the model.

This reverses or flips the tensor along the specified dimensions.

Parameters
  • args – A vector of input tensor ids.

  • dimensions – The dimensions along which to reverse the tensor. If this is empty then this is equivalent to the identity operator.

  • debugContext – Optional debug information.

Returns

The tensor id of the reversed tensor.

TensorId slice(const std::vector<TensorId> &args, const std::vector<int64_t> &ends, const std::vector<int64_t> &starts, const std::vector<int64_t> &axes = std::vector<int64_t>(), const popart::DebugContext &debugContext = {})

Add a slice to the model.

This version of slice uses the starts, ends and axes attributes rather than tensor inputs. This reduces the number of ops as constant tensors are treated as ops while attributes are not.

Parameters
  • args – A vector of input tensor ids.

  • ends – The ends attribute.

  • starts – The starts attribute.

  • axes – The axes attribute.

  • debugContext – Optional debug information.

Returns

The normalized output tensor id.

TensorId packedDataBlock(const std::vector<TensorId> &args, const std::vector<int64_t> &maxSequenceLengths, int64_t resultSize, int64_t callbackBatchSize, const Builder &callback, const DebugContext &debugContext = {})

Add a packedDataBlock operator to the model.

Unpack packed sequences of data and call the callback function on the unpacked sequences.

Parameters
  • args – A vector of input tensor ids.

  • maxSequenceLengths – The maximum length of a sequence in each of the data inputs.

  • resultSize – The size of the first dimension of the result tensor.

  • callbackBatchSize – The number of batches to pass to the callback.

  • callback – The callback function.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

void abort(const std::vector<TensorId> &args, const DebugContext &debugContext = {})

Add an abort operation to the model.

The operation can be conditional or unconditional.

Parameters
  • args – A vector of input tensor ids.

  • debugContext – Optional debug information.

TensorId bitwisenot(const std::vector<TensorId> &args, const DebugContext &debugContext = {})

Add a bitwise NOT operation to the model.

The operation computes the bitwise NOT of an integer tensor.

Parameters
  • args – An input tensor of type integer.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId bitwiseand(const std::vector<TensorId> &args, const DebugContext &debugContext = {})

Add a bitwise AND operation to the model.

The operation computes the bitwise AND of two integer tensors.

Parameters
  • args – Two broadcastable input tensors of type integer.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId bitwiseor(const std::vector<TensorId> &args, const DebugContext &debugContext = {})

Add a bitwise OR operation to the model.

The operation computes the bitwise OR of two integer tensors.

Parameters
  • args – Two broadcastable input tensors of type integer.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId bitwisexor(const std::vector<TensorId> &args, const DebugContext &debugContext = {})

Add a bitwise XOR operation to the model.

The operation computes the bitwise XOR of two integer tensors.

Parameters
  • args – Two broadcastable input tensors of type integer.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId bitwisexnor(const std::vector<TensorId> &args, const DebugContext &debugContext = {})

Add a bitwise XNOR operation to the model.

The operation computes the bitwise XNOR of two integer tensors.

Parameters
  • args – Two broadcastable input tensors of type integer.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

std::vector<TensorId> reducemedian(const std::vector<TensorId> &args, const nonstd::optional<std::vector<int64_t>> &axes = nonstd::nullopt, int64_t keepdims = 1, const DebugContext &debugContext = {})

Add a reducemedian operation to the model.

This method computes the median values along the specified axes. In the case of an even number of elements, the lower of the two medians is selected. By default, the input tensor is reduced over all axes. The operation also returns the indices of the median values found along the reduction axis. If the reduction is performed over multiple axes, the indices are “flattened” over the reduced axes, similar to numpy.ndarray.flat. The index may not be the first occurrence of the median value found in the input tensor.

Parameters
  • args – A vector with a single input tensor id.

  • axes – The axes over which the reduction is performed.

  • keepdims – If 1, the result tensors have the same number of dimensions as the input, with the reduction axes having size 1. Otherwise, the reduction axes are squeezed and the result tensors have fewer dimensions than the input. Default = 1.

  • debugContext – Optional debug information.

Returns

The names of the two result tensors, one for median values and one for indices.
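
The "lower of the two medians" rule can be sketched for a reduction over a single 1-D tensor as follows. The name lowerMedian is hypothetical. As in the description above, when values are repeated the returned index is not guaranteed to be the first occurrence.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Illustrative reference: returns {median value, index of that value}.
// For an even number of elements the lower middle value is selected.
std::pair<float, std::size_t> lowerMedian(const std::vector<float> &v) {
  assert(!v.empty());
  std::vector<std::size_t> idx(v.size());
  std::iota(idx.begin(), idx.end(), 0);
  // (n - 1) / 2 is the middle position for odd n and the lower of the
  // two middle positions for even n.
  const std::size_t mid = (v.size() - 1) / 2;
  std::nth_element(idx.begin(), idx.begin() + mid, idx.end(),
                   [&v](std::size_t a, std::size_t b) { return v[a] < v[b]; });
  return {v[idx[mid]], idx[mid]};
}
```

For the input [4, 1, 3, 2] the two middle values are 2 and 3, so the result is the value 2 at index 3.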

TensorId groupedgather(const std::vector<TensorId> &args, Attributes::Int axis = 0, Attributes::Int group_size = 1, const DebugContext &debugContext = {})
TensorId groupedscatterreduce(const std::vector<TensorId> &args, Attributes::Int axis_size, Attributes::Int axis = -1, ScatterReduction reduction = ScatterReduction::Sum, Attributes::Int group_size = 1, Attributes::Int enable_index_broadcast = 1, const DebugContext &debugContext = {})

Add a grouped scatterreduce operation to the model.

Reduces all the values from the source tensor src at the indices specified along the given axis by index for each group. In some frameworks this is also known as a split-apply-combine operation as well as a reduce or aggregate by key. In this analogy the src input is the data we are splitting and the indices define the groups for the reduction operation.

In pseudocode the operator can be expressed as:

for g in range(group_size):
    for i in range(axis_size):
        output[g][i] = reduce(src[g][index == i])

where the looping over output indices is implicitly handled by Poplar.

Parameters
  • args – A vector of tensor ids as [src, index, initial_values]. initial_values is optional and if omitted the output will be initialised based on the selected reduction type. For example, a tensor of zeros is used to initialise the output tensor for ScatterReduction::Sum.

  • axis_size – The size of the reduced axis.

  • axis – The axis to reduce along. Default = -1.

  • reduction – The type of reduction to apply. Default = ScatterReduction::Sum.

  • group_size – The number of groups to reduce. Default = 1.

  • enable_index_broadcast – If 1, index will be broadcast to match the size of the data tensor; otherwise (0), its size will remain unchanged. Default = 1.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId scatterreduce(const std::vector<TensorId> &args, Attributes::Int axis_size, Attributes::Int axis = -1, ScatterReduction reduction = ScatterReduction::Sum, Attributes::Int enable_index_broadcast = 1, const DebugContext &debugContext = {})

Add a scatterreduce operation to the model.

Reduces all the values from the source tensor src at the indices specified along the given axis by index. In some frameworks this is also known as a split-apply-combine operation as well as a reduce or aggregate by key. In this analogy the src input is the data we are splitting and the indices define the groups for the reduction operation.

In pseudocode the operator can be expressed as:

for i in range(axis_size):
    output[i] = reduce(src[index == i])

where the looping over output indices is implicitly handled by Poplar.

Parameters
  • args – A vector of tensor ids as [src, index, initial_values]. initial_values is optional and if omitted the output will be initialised based on the selected reduction type. For example, a tensor of zeros is used to initialise the output tensor for ScatterReduction::Sum.

  • axis_size – The size of the reduced axis.

  • axis – The axis to reduce along. Default = -1.

  • reduction – The type of reduction to apply. Default = ScatterReduction::Sum.

  • enable_index_broadcast – If 1, index will be broadcast to match the size of the data tensor; otherwise (0), its size will remain unchanged. Default = 1.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.
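
The pseudocode above can be made concrete for the 1-D case with a sum reduction. This is an illustrative reference only: the name scatterReduceSum is hypothetical, index is assumed to have the same size as src, and no initial_values input is given, so the output starts from zeros.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative 1-D reference for a sum reduction:
// output[i] = sum of src[k] over all k with index[k] == i.
std::vector<float> scatterReduceSum(const std::vector<float> &src,
                                    const std::vector<std::size_t> &index,
                                    std::size_t axisSize) {
  assert(src.size() == index.size());
  // Zeros are the initial values for a sum when none are supplied.
  std::vector<float> output(axisSize, 0.0f);
  for (std::size_t k = 0; k < src.size(); ++k) {
    assert(index[k] < axisSize);
    output[index[k]] += src[k];
  }
  return output;
}
```

For src = [1, 2, 3, 4, 5], index = [0, 1, 0, 2, 1] and an axis size of 4, the result is [4, 7, 4, 0]. The last element stays at its initial value because no index refers to it.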

TensorId swish(const std::vector<TensorId> &args, const DebugContext &debugContext = {})

Add a swish operation to the model.

The operation computes the swish activation function, also known as the SiLU activation.

Parameters
  • args – A vector with a single input tensor id.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.

TensorId incrementmod(const std::vector<TensorId> &args, Attributes::Float increment, Attributes::Float modulus, const DebugContext &debugContext = {})

Add an incrementmod operation to the model.

The operation is of the form y = (x + increment) % modulus.

Parameters
  • args – A vector with a single input tensor id.

  • increment – A scalar increment.

  • modulus – A scalar modulus.

  • debugContext – Optional debug information.

Returns

The tensor id of the result tensor.
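
The formula y = (x + increment) % modulus can be sketched on a scalar as follows. The name incrementModRef is hypothetical, and the sketch assumes non-negative inputs and a positive modulus. The behaviour for negative values is not specified here and is not modelled.

```cpp
#include <cassert>
#include <cmath>

// Illustrative scalar reference, assuming x + increment >= 0 and modulus > 0.
float incrementModRef(float x, float increment, float modulus) {
  assert(modulus > 0.0f);
  assert(x + increment >= 0.0f);
  return std::fmod(x + increment, modulus);
}
```

With an increment of 1 and a modulus of 4, repeated application cycles through 0, 1, 2, 3, 0 and so on, which is the typical use as a wrapping counter.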

TensorId bucketize(const std::vector<TensorId> &args, Attributes::Int right = 0, const DebugContext &debugContext = {})

Add a bucketize operation to the model.

The operation returns the indices of the buckets to which each value in the input tensor belongs. The ranges of each bucket are defined by the boundaries tensor. The returned index satisfies the following rules:

right == 1: boundaries[i-1] <= input[m][n]…[l][x] < boundaries[i]
right == 0: boundaries[i-1] < input[m][n]…[l][x] <= boundaries[i]

Parameters
  • args – A vector of tensor IDs containing [input, boundaries], where:

    • input is an N-D tensor or a scalar containing the search values

    • boundaries is a 1-D tensor defining ranges of the buckets. This must contain a monotonically increasing sequence.

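  • right – If 0 (default), each bucket is open on the left and closed on the right: boundaries[i-1] < input <= boundaries[i]. If 1, each bucket is closed on the left and open on the right: boundaries[i-1] <= input < boundaries[i].

  • debugContext – Optional debug information.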

Returns

The tensor ID of the result tensor. The result tensor has the same size and shape as the input tensor.
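
The two index rules map directly onto binary searches over the sorted boundaries. The sketch below uses std::lower_bound for right == 0 and std::upper_bound for right == 1. The name bucketizeRef is hypothetical and the input is flattened to 1-D for simplicity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative reference for the index rules on a flattened input.
std::vector<std::size_t> bucketizeRef(const std::vector<float> &input,
                                      const std::vector<float> &boundaries,
                                      bool right) {
  assert(std::is_sorted(boundaries.begin(), boundaries.end()));
  std::vector<std::size_t> out;
  out.reserve(input.size());
  for (float v : input) {
    // right == 1: first boundary strictly greater than v,
    //             so boundaries[i-1] <= v < boundaries[i].
    // right == 0: first boundary greater than or equal to v,
    //             so boundaries[i-1] < v <= boundaries[i].
    auto it = right
                  ? std::upper_bound(boundaries.begin(), boundaries.end(), v)
                  : std::lower_bound(boundaries.begin(), boundaries.end(), v);
    out.push_back(static_cast<std::size_t>(it - boundaries.begin()));
  }
  return out;
}
```

The two modes differ only for values that are exactly equal to a boundary. With boundaries [1, 3, 5], the value 3 falls in bucket 1 when right is 0 and in bucket 2 when right is 1.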

std::vector<TensorId> sort(const std::vector<TensorId> &args, Attributes::Int axis = -1, Attributes::Int descending = 0, Attributes::Int stable = 0, const popart::DebugContext &debugContext = {})

Add a sort operation to the model.

Parameters
  • args – A vector with a single input tensor id.

  • axis – The dimension to sort along.

  • descending – If 1, the elements are sorted in descending order by value.

  • stable – If 1, the sorting routine is stable, preserving the order of equivalent elements.

  • debugContext – Optional debug information.

Returns

A vector of (values, indices) is returned, where the values are the sorted values and indices are the indices of the elements in the original input tensor.
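
The relationship between the returned values and indices can be sketched for a 1-D input as follows. The names sortRef and gatherRef are hypothetical; the sketch sorts a list of indices and then gathers the values, so that values[k] == input[indices[k]] always holds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Picks input[indices[k]] for each k.
std::vector<float> gatherRef(const std::vector<float> &input,
                             const std::vector<std::size_t> &indices) {
  std::vector<float> values;
  values.reserve(indices.size());
  for (std::size_t i : indices) {
    assert(i < input.size());
    values.push_back(input[i]);
  }
  return values;
}

// Illustrative 1-D reference returning (values, indices).
std::pair<std::vector<float>, std::vector<std::size_t>>
sortRef(const std::vector<float> &input, bool descending, bool stable) {
  std::vector<std::size_t> indices(input.size());
  std::iota(indices.begin(), indices.end(), 0);
  auto cmp = [&](std::size_t a, std::size_t b) {
    return descending ? input[a] > input[b] : input[a] < input[b];
  };
  if (stable) {
    // Equivalent elements keep their original relative order.
    std::stable_sort(indices.begin(), indices.end(), cmp);
  } else {
    std::sort(indices.begin(), indices.end(), cmp);
  }
  return {gatherRef(input, indices), indices};
}
```

For the input [3, 1, 2, 1], a stable ascending sort returns the values [1, 1, 2, 3] with the indices [1, 3, 2, 0]. Without the stable option the two equal elements may be returned in either order.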

TensorId nearbyint(const std::vector<TensorId> &args, const DebugContext &debugContext = {})

Add a nearby int rounding operation to the model.

Rounds the floating-point argument to an integer value in floating-point format.

Parameters
  • args – A vector of input tensor ids.

  • debugContext – Optional debug information.

Returns

The tensor id of the rounded output tensor.

std::vector<TensorId> splinebasis(const std::vector<TensorId> &args, Attributes::Int degree = 1, const DebugContext &debugContext = {})

Add a splinebasis operation to the model.

The operation returns two outputs: coefficients for the B-spline basis functions and weight indices for each spline coefficient.

Parameters
  • args – A vector of tensor IDs containing [pseudo, kernel_size, is_open_spline], where:

    • pseudo is a 2-D tensor with pseudo coordinates, of shape [numEdges * numDims].

    • kernel_size is a 1-D tensor containing the kernel size at each dimension of the edge pseudo coordinates.

    • is_open_spline is a 1-D tensor that, for each dimension, encodes whether an open or a closed B-spline basis function must be used.

  • degree – The degree of the B-spline basis function.

  • debugContext – Optional debug information.

Returns

The basis and weightIndex tensors, both of shape [numEdges * numSplines]. basis contains the coefficients for the B-spline basis functions. weightIndex contains weight indices for each spline.

TensorId splineweighting(const std::vector<TensorId> &args, const DebugContext &debugContext = {})

Add a splineweighting operation to the model.

The operation returns features weighted by a continuous B-spline kernel function.

Parameters

  • args – A vector of tensor IDs containing [input, weight, basis, weightIndex], where:

    • input is a 2-D tensor (size: [numEdges * numInputChannels]) with input features.

    • weight is a 3-D tensor (size: [numEdges * numInputChannels * numOutputChannels]) containing weights for B-spline functions.

    • basis is a 2-D tensor (size: [numEdges * numSplines]) of the coefficients for the B-spline basis functions and is produced by the splinebasis op.

    • weightIndex is a 2-D tensor (size: [numEdges * numSplines]) of the weight indices produced by the splinebasis op.

  • debugContext – Optional debug information.

Returns

A tensor of shape [numEdges * numOutputChannels] containing features weighted by a continuous B-spline kernel function.

#include <popart/scope.hpp>
class Scope

Public Functions

inline bool empty() const
void pop()
Scope getCommonParent(const Scope&) const
inline size_t depth() const
bool operator==(const Scope&) const
bool operator!=(const Scope&) const
std::string str() const
Scope operator/(const std::string &name) const
inline operator std::string()
bool isSubscope(const Scope&) const
const std::vector<std::string> getScopeNames() const

Public Static Functions

static inline std::string delimiter()
static Scope getCommonParent(const std::vector<Op*>&)

14.6. Data flow

#include <popart/dataflow.hpp>
enum class popart::AnchorReturnTypeId

Class that defines the identifiers for the return type of the anchor tensors.

An anchor tensor is a tensor that the user wants returned after a call to Session::run(). Each call to Session::run() results in batchesPerStep x accumulationFactor x replicationFactor anchor tensor values being computed. The samples associated with each computation are called a micro batch. The dimensions are user-specified through the batchesPerStep, accumulationFactor and replicationFactor parameters.

This enum type describes the strategy with which the micro batch values for anchor tensors (or their summaries) are written to the IStepIO instance passed to Session::run.

NOTE: Anchors are essentially what TensorFlow calls “fetches”.

See also

AnchorReturnType.

Values:

enumerator Final = 0

Only return the tensor value for the last micro batch of the Session::run call for each replica.

The buffer shape required for this anchor in IStepIO is [replicationFactor, <anchorTensorShape>] (with dimensions of size 1 removed).

enumerator EveryN

Return the tensor value for every N-th global batch for each replica and for all accumulation steps in that global batch.

Note that the value of N is captured by AnchorReturnType.

The buffer shape required for this anchor in IStepIO is [batchesPerStep / N, accumulationFactor, replicationFactor, <anchorTensorShape>] (with dimensions of size 1 removed).

enumerator All

Return the tensor value for all micro batches for each replica.

The buffer shape required for this anchor in IStepIO is [batchesPerStep, accumulationFactor, replicationFactor, <anchorTensorShape>] (with dimensions of size 1 removed).

enumerator Sum

Return one tensor value for each replica, doing a sum reduction over the batchesPerStep and accumulationFactor dimensions.

The buffer shape required for this anchor in IStepIO is [replicationFactor, <anchorTensorShape>] (with dimensions of size 1 removed).
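
The buffer shapes listed for the four enumerators can be sketched as a single helper. This is an illustrative reference: ReturnKind and anchorBufferShape are hypothetical names, everyN stands for the N captured by AnchorReturnType, and it is an assumption of this sketch that only the leading step, accumulation and replication dimensions are dropped when they have size 1, while the anchor tensor shape itself is kept as is.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical local mirror of the enumerators described above.
enum class ReturnKind { Final, EveryN, All, Sum };

std::vector<int64_t> anchorBufferShape(ReturnKind kind, int64_t batchesPerStep,
                                       int64_t accumulationFactor,
                                       int64_t replicationFactor,
                                       int64_t everyN,
                                       const std::vector<int64_t> &tensorShape) {
  std::vector<int64_t> lead;
  switch (kind) {
  case ReturnKind::Final:
  case ReturnKind::Sum:
    lead = {replicationFactor};
    break;
  case ReturnKind::EveryN:
    assert(everyN > 0);
    lead = {batchesPerStep / everyN, accumulationFactor, replicationFactor};
    break;
  case ReturnKind::All:
    lead = {batchesPerStep, accumulationFactor, replicationFactor};
    break;
  }
  std::vector<int64_t> shape;
  for (int64_t d : lead) {
    if (d != 1) { // assumption: only leading dimensions of size 1 are removed
      shape.push_back(d);
    }
  }
  shape.insert(shape.end(), tensorShape.begin(), tensorShape.end());
  return shape;
}
```

For example, with batchesPerStep = 4, accumulationFactor = 1, replicationFactor = 2 and an anchor tensor of shape [3, 5], All gives [4, 2, 3, 5] while Final and Sum give [2, 3, 5].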

enum class popart::ExchangeStrategy

Enum type to specify an exchange strategy.

JustInTime:

    .- outer loop -------------.
    |.- inner loop -----------.|
    || load - compute - store ||
    |'------------------------'|
    '--------------------------'

OverlapInnerLoop:

  • Boxes denote subgraphs / subgraph Ops / loops

  • Inputs/outputs are loop carried in order

    .- outer loop ----------------------------------------.
    |                  .- inner loop -.                   |
    | load - compute - | - store      |                   |
    |           load - | - compute -- | - store           |
    |                  |   load ----- | - compute - store |
    |                  '--------------'                   |
    '-----------------------------------------------------'
             ^^^^^^^       ^^^^^^^        ^^^^^^^
             overlap       overlap        overlap

OverlapLoops

  • Boxes denote subgraphs / subgraph Ops / loops

  • Numbers on boxes are matching subgraph/loop inputs and outputs

  • Overlap indicators indicate compute & load/store pairs overlapping in time

             load
               |
            compute   load            load         < overlap
               |        |               |
               1        2               |
           .-- inner loop --.           |
           |   |        |   |           |
           | store  compute |           |          < overlap
           | load       |   |           |          < overlap
           |   |        |   |           |
           '----------------'           |
               2        1      load compute        < overlap
               |        |        |      |
               1        2        3      4
    
    .- outer loop ------------------------------.
    |          |        |        |      |       |
    |       compute   store      |    store     |  < overlap
    |             \            /                |
    |                1        2                 |
    |            .-- inner loop --.             |
    |            |   |        |   |             |
    |            | store  compute |             |  < overlap
    |            | load       |   |             |  < overlap
    |            |   |        |   |             |
    |            '----------------'             |
    |                2        1                 |
    |                |        |                 |
    |  load       compute     |       load      |  < overlap
    |    |           |        |         |       |
    '-------------------------------------------'
         3           4        2         1
         |           |        |         |
      compute        |      store       |          < overlap
         |             \             /
         |               1        2
         |           .-- inner loop --.
         |           |   |        |   |
         |           | store  compute |            < overlap
         |           | load       |   |            < overlap
         |           |   |        |   |
         |           '----------------'
         |               2        1
         |               |        |
       store          compute   store              < overlap
                         |
                       store

OverlapStep: Not supported yet

Values:

enumerator JustInTime = 0

Copy tensor when required.

enumerator OverlapInnerLoop = 1

Preload values in previous inner loop iteration for the next iteration.

enumerator OverlapLoops = 2

Preload values in the previous loop iteration for the next iteration (implies OverlapInnerLoop).

enumerator OverlapStep = 3

Preload values in the previous host training step for the next step (implies OverlapLoops). Not supported yet.

enumerator N = 4

Number of values.
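The "implies" relationships between the overlap strategies form a chain. A minimal sketch of that chain follows; the enum is a local copy of the enumerators listed above and impliesStrategy is a hypothetical helper, not part of PopART. It assumes that the chain follows the numeric order of the overlap enumerators and that JustInTime is implied only by itself.

```cpp
#include <cassert>

// Local stand-in mirroring popart::ExchangeStrategy as listed above.
enum class ExchangeStrategy {
  JustInTime       = 0,
  OverlapInnerLoop = 1,
  OverlapLoops     = 2,
  OverlapStep      = 3,
  N                = 4 // Number of values, not a strategy.
};

// Hypothetical helper: does `chosen` include the behaviour of `other`?
// Assumption: OverlapStep implies OverlapLoops implies OverlapInnerLoop,
// and JustInTime is implied only by itself.
bool impliesStrategy(ExchangeStrategy chosen, ExchangeStrategy other) {
  assert(chosen != ExchangeStrategy::N && other != ExchangeStrategy::N);
  if (chosen == other) {
    return true;
  }
  if (other == ExchangeStrategy::JustInTime) {
    return false;
  }
  return static_cast<int>(chosen) > static_cast<int>(other);
}
```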

class AnchorReturnType

Class that captures an AnchorReturnTypeId value.

When the value is AnchorReturnTypeId::EveryN, the class also captures the associated N value. The constructor takes std::string values and converts them as appropriate.

Public Functions

AnchorReturnType()

Default constructor for the AnchorReturnType class.

AnchorReturnType(std::string artString, TileSet tileSet = TileSet::Compute, ExchangeStrategy exchangeStrategy = ExchangeStrategy::JustInTime)

Constructor for the AnchorReturnType class.

NOTE: Attempting to construct an AnchorReturnType for AnchorReturnTypeId::EveryN using this constructor will result in an error. Use AnchorReturnType(std::string,int,TileSet,ExchangeStrategy), which also specifies the return period.

Parameters
  • artString – The string to convert to an AnchorReturnTypeId value. The following values are acceptable (case insensitive):

    • “final” = AnchorReturnTypeId::Final

    • “all” = AnchorReturnTypeId::All

    • “sum” = AnchorReturnTypeId::Sum

  • tileSet – (Optional) The type of the tile set. Default: TileSet::Compute.

  • exchangeStrategy – (Optional) The overlap strategy (between IO and compute) for anchor tensors. Default: ExchangeStrategy::JustInTime.

AnchorReturnType(std::string artString, int returnPeriod, TileSet tileSet = TileSet::Compute, ExchangeStrategy exchangeStrategy = ExchangeStrategy::JustInTime)

Constructor for the AnchorReturnType class.

Parameters
  • artString – The string to convert to an AnchorReturnTypeId value. The following values are acceptable (case insensitive):

    • “final” = AnchorReturnTypeId::Final

    • “all” = AnchorReturnTypeId::All

    • “sum” = AnchorReturnTypeId::Sum

  • returnPeriod – The value of N in the case of AnchorReturnTypeId::EveryN.

  • tileSet – (Optional) The type of the tile set. Default: TileSet::Compute.

  • exchangeStrategy – (Optional) The overlap strategy (between IO and compute) for anchor tensors. Default: ExchangeStrategy::JustInTime.
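The case-insensitive string conversion described for the two constructors might look roughly like the sketch below. It is an illustration only: parseAnchorReturnType is a hypothetical function, the enum is a local stand-in, and the spelling "everyn" for the EveryN case is an assumption that the text above does not confirm. A return period of 0 models the constructor that takes no return period.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>

// Local stand-in for popart::AnchorReturnTypeId (not the real header).
enum class AnchorReturnTypeId { Final = 0, EveryN, All, Sum };

// Hypothetical parser, not PopART's implementation.
// returnPeriod == 0 means "no return period supplied".
AnchorReturnTypeId parseAnchorReturnType(std::string artString,
                                         int returnPeriod = 0) {
  // A negative period is treated as a programming error in this sketch.
  assert(returnPeriod >= 0);

  // Lower-case the input so that matching is case insensitive.
  std::transform(artString.begin(), artString.end(), artString.begin(),
                 [](unsigned char c) {
                   return static_cast<char>(std::tolower(c));
                 });

  if (artString == "final") {
    return AnchorReturnTypeId::Final;
  }
  if (artString == "all") {
    return AnchorReturnTypeId::All;
  }
  if (artString == "sum") {
    return AnchorReturnTypeId::Sum;
  }
  // Assumed spelling for the EveryN case.
  if (artString == "everyn") {
    if (returnPeriod == 0) {
      throw std::invalid_argument("EveryN requires a return period");
    }
    return AnchorReturnTypeId::EveryN;
  }
  throw std::invalid_argument("Unrecognised anchor return type: " +
                              artString);
}
```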

inline const std::string &str() const

Get the AnchorReturnTypeId as a string.

inline const TileSet &tileSet() const

Get the type of the tile set.

inline const ExchangeStrategy &exchangeStrategy() const

Get the type of overlap strategy.

class DataFlow

This class specifies parameters for host-device data streams.

The parameters are used to control the amount of input data processed in each step, that is, in each Session::run call. The parameters also determine how data is returned to the user.

See also

AnchorReturnType, AnchorReturnTypeId.

Public Functions

DataFlow()

Default constructor.

This constructor sets batchesPerStep to 0 and does not have any anchor tensors.

DataFlow(int batchesPerStep)

Construct a DataFlow instance without anchor tensors.

Parameters

batchesPerStep – The number of global batches to run in the inference or training session for each call to Session::run before returning control to the caller.

DataFlow(int batchesPerStep, const AnchorReturnTypeMap &anchorMap)

Construct a DataFlow instance with anchor tensors.

Parameters
  • batchesPerStep – The number of global batches to run in the inference or training session for each call to Session::run before returning control to the caller.

  • anchorMap – A mapping from output tensor TensorId to AnchorReturnType indicating the strategy with which to write the anchor tensor values to the IStepIO object provided to Session::run.

DataFlow(int batchesPerStep, const std::vector<TensorId> anchorTensorIds, const AnchorReturnType &anchorReturnType = AnchorReturnType("All"))

Construct a DataFlow instance with anchor tensors.

Parameters
  • batchesPerStep – The number of global batches to run in the inference or training session for each call to Session::run before returning control to the caller.

  • anchorTensorIds – The tensor IDs of the anchor tensors.

  • anchorReturnType – The strategy with which to write anchor tensor values to the IStepIO object provided to Session::run.
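One way to read the constructor that takes a list of tensor IDs is as shorthand for an AnchorReturnTypeMap in which every listed tensor shares the same return type. The sketch below illustrates that reading with local stand-in types; makeAnchorMap is a hypothetical helper, the simplified AnchorReturnType only stores a string, and whether PopART builds the map this way internally is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Local stand-ins; the real types are declared in <popart/dataflow.hpp>.
using TensorId = std::string;
struct AnchorReturnType {
  std::string id; // Simplified: only the return type string is kept.
};
using AnchorReturnTypeMap = std::map<TensorId, AnchorReturnType>;

// Hypothetical helper: give every listed tensor the same return type.
AnchorReturnTypeMap
makeAnchorMap(const std::vector<TensorId> &anchorTensorIds,
              const AnchorReturnType &anchorReturnType =
                  AnchorReturnType{"All"}) {
  AnchorReturnTypeMap anchors;
  for (const TensorId &id : anchorTensorIds) {
    anchors.emplace(id, anchorReturnType);
  }
  // Duplicate IDs collapse into a single map entry.
  assert(anchors.size() <= anchorTensorIds.size());
  return anchors;
}
```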

DataFlow(const DataFlow &rhs) = default
inline void setBatchesPerStep(const int batchesPerStep)

Set the value for batchesPerStep.

class InputSettings

Class that describes the TileSet, ExchangeStrategy, and ReplicatedStreamMode used for an input tensor.

Public Functions

InputSettings()

Constructor for the InputSettings class.

InputSettings(TileSet tileSet, ExchangeStrategy exchangeStrategy)

Constructor for the InputSettings class.

Parameters
  • tileSet – The type of the tile set.

  • exchangeStrategy – The overlap strategy (between IO and compute) for the input tensor.

InputSettings(ReplicatedStreamMode replicatedStreamMode)

Constructor for the InputSettings class.

Parameters

replicatedStreamMode – The mode used for the replicated stream.

inline const TileSet &tileSet() const

Get the type of the tile set.

inline const ExchangeStrategy &exchangeStrategy() const

Get the type of overlap strategy.

inline ReplicatedStreamMode replicatedStreamMode() const

Get the mode of the replicated stream.

inline void setTileSet(TileSet tileSet)

Set the type of the tile set.

Parameters

tileSet – The type of the tile set.

inline void setExchangeStrategy(ExchangeStrategy exchangeStrategy)

Set the overlap strategy (between IO and compute).

Parameters

exchangeStrategy – The overlap strategy.

inline void setReplicatedStreamMode(ReplicatedStreamMode streamMode)

Set the mode used for the replicated stream.

Parameters

streamMode – The mode used for the replicated stream.

using popart::AnchorReturnTypeMap = std::map<TensorId, AnchorReturnType>
#include <popart/replicatedstreammode.hpp>
enum class popart::ReplicatedStreamMode

Values:

enumerator Broadcast
enumerator Replicate

14.7. Device manager

#include <popart/devicemanager.hpp>
enum class popart::DeviceType

Defines the type of device to use for graph compilation and execution.

Values:

enumerator IpuModel = 0

Use the Poplar IPU Model for graph compilation and execution.

The IPU Model will simulate the behaviour of the IPU hardware. It will not completely implement every aspect of a real IPU. (Default).

enumerator Cpu

Use CPU for graph compilation and execution.

enumerator Ipu

Use IPU for graph execution.

enumerator OfflineIpu

Compile graph for later execution.

This can be done even if IPUs are not present. Offline graph compilation is also useful for verifying memory constraints.

enumerator Sim

[For Graphcore internal use only] Use a simulator for graph compilation and execution.

enum class popart::DeviceConnectionType

Controls when to connect to the IPU (if at all).

Values:

enumerator Always = 0

Attach to the IPU from the start (Default).

enumerator OnDemand

Wait until the compilation is complete and the executable is ready to be run before attaching to the IPU.

enumerator Never

Never try to attach to an IPU.

This is useful for offline compilation (DeviceType::OfflineIpu). Trying to run an executable will throw an error.

enum class popart::SyncPattern

Controls synchronisation in multi-IPU systems.

Values:

enumerator Full = 0

Require all IPUs to synchronise on every communication between IPUs or between IPUs and host (Default).

enumerator SinglePipeline

Allow IPUs to synchronise with the host independently, without having to synchronise with each other.

This permits any one IPU to perform host IO while other IPUs are processing data.

enumerator ReplicaAndLadder

Allow an IPU group to communicate with the host without requiring synchronisation between groups.

This permits multiple IPU groups to alternate between performing host IO and computation.

class DeviceInfo

Represents a specific device.

Subclassed by popart::popx::DevicexInfo, popart::popx::DevicexOfflineIpuInfo

Public Functions

DeviceInfo(DeviceType _type, DeviceConnectionType _connectionType, const poplar::OptionFlags &_flags)

Constructor for the DeviceInfo class.

Parameters
  • _type – The type of the device.

  • _connectionType – The setting for when to connect to the device, if at all.

  • _flags – A set of Poplar option/value string flags.

virtual ~DeviceInfo()

Destructor for DeviceInfo.

virtual bool attach() = 0

Attach to the device.

Returns

true if successfully attached to the device, false otherwise.

virtual void detach() = 0

Detach from the device.

virtual bool isAttached() const = 0

Check if attached to the device.

Returns

true if attached to the device, false otherwise.

inline DeviceType getType() const

Get the type of the device.

Returns

The type of the device.

inline DeviceConnectionType getConnectionType() const

Get the setting for when to connect to the device.

Returns

The setting for when to connect to the device.

std::string toString() const

Return a description of the device.

virtual int getId() const = 0

Get the device ID.

virtual std::vector<int> getChildIds() const = 0

Get the child device IDs.

The value returned by getId() for a multi-IPU device is a ‘parent ID’ and does not relate to the IDs of the devices it comprises. This function, in the case of real devices, uses the Poplar API to work out which single-IPU device IDs it relates to. In the case of replication, a device includes all IPUs involved, so a 2-IPU model with 2x replication would expect to have 4 child IDs returned here.

virtual std::string getVersion() const = 0

Get the version of the software on the IPU.

virtual int getNumIpus() const = 0

Get the number of IPUs in the device.

virtual int getTilesPerIPU() const = 0

Get the number of tiles per IPU.

virtual int getNumWorkerContexts() const = 0

Get the number of worker contexts per tile.

virtual std::string getIpuVersion() const = 0

Get the IPU version.

virtual std::vector<unsigned> getDriverIds() const = 0

Get the version of the drivers on the IPU.

virtual const poplar::Target &getTarget() const = 0

Get the Poplar target.

inline virtual bool canCompileOffline() const

Get whether the device supports offline compilation.

Returns

true if the device supports offline compilation, otherwise false.

const poplar::OptionFlags &getOptionFlags() const
void setOnDemandAttachTimeout(const unsigned seconds)

Set timeout (in seconds) for trying to attach to a device.

If unable to attach to a device on the first try, the DeviceManager instance will periodically try to attach to the device until successfully attached or this timeout is reached.

Note

This only applies when trying to attach with DeviceConnectionType::OnDemand.

Parameters

seconds – The timeout (in seconds) for trying to attach to the device.

inline const unsigned &getOnDemandAttachTimeout() const

Get timeout (in seconds) for trying to attach to a device.

Returns

The timeout (in seconds) for trying to attach to the device.

bool tryAttachUntilTimeout()

Periodically try to attach to the device until either the attach timeout is reached or successfully attached.
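The retry behaviour described for the on-demand attach timeout can be pictured as a simple polling loop. This is a minimal sketch under stated assumptions, not the PopART implementation: the free function below is hypothetical, the attach callable stands in for DeviceInfo::attach(), and the polling interval is an invented parameter because the text above does not say how often attachment is retried.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical retry loop. `attach` stands in for DeviceInfo::attach().
// Assumption: the polling interval is a free parameter of this sketch.
bool tryAttachUntilTimeout(const std::function<bool()> &attach,
                           std::chrono::milliseconds timeout,
                           std::chrono::milliseconds interval) {
  assert(attach && interval.count() > 0);
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (true) {
    if (attach()) {
      return true; // Attached successfully.
    }
    // Give up if the next attempt would start after the deadline.
    if (std::chrono::steady_clock::now() + interval > deadline) {
      return false;
    }
    std::this_thread::sleep_for(interval);
  }
}
```

The first attempt is made immediately, so a device that is already free is attached without waiting.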

bool isHwCompatible() const
void writeToDeviceAccessLog(const std::string &event, const std::map<std::string, std::string> &auxKeyVals = {})

Log an event for device debugging purposes.

This event will get logged to the file location defined by the environment variable POPART_LOG_DEVICE_ACCESS_IN_TESTS, if it is set.

Parameters
  • event – A text description of the event to be written to the log.

  • auxKeyVals – Optional additional parameters to log.

class DevicexInfo : public popart::DeviceInfo

Subclassed by popart::popx::DevicexCpuInfo, popart::popx::DevicexIpuInfo, popart::popx::DevicexIpuModelInfo, popart::popx::DevicexSimInfo

Public Functions

inline DevicexInfo(popart::DeviceType _type, popart::DeviceConnectionType _connectionType, poplar::Device &_device, const poplar::OptionFlags &_flags)
~DevicexInfo() override
bool attach() override
void detach() override
inline int getNumIpus() const override
inline int getTilesPerIPU() const override
inline int getNumWorkerContexts() const override
inline std::vector<unsigned> getDriverIds() const override
inline const poplar::Device &getDevice() const
inline const poplar::Target &getTarget() const override
inline std::string getIpuVersion() const override
inline bool isAttached() const override
virtual void setMostRecentlyLoaded(Devicex *devicex)

Mark devicex as the last one that was loaded.

virtual bool isMostRecentlyLoaded(const Devicex *devicex) const

Check if Devicex was the last one that was loaded.

class DevicexCpuInfo : public popart::popx::DevicexInfo

Public Functions

inline DevicexCpuInfo(poplar::Device &_device)
inline int getId() const override
inline std::vector<int> getChildIds() const override
inline std::string getVersion() const override
class DevicexIpuInfo : public popart::popx::DevicexInfo

Public Functions

inline DevicexIpuInfo(popart::DeviceConnectionType _dct, int _id, poplar::Device &_device, const poplar::OptionFlags &_flags)
inline int getId() const override
std::vector<int> getChildIds() const override
std::string getVersion() const override
inline bool canCompileOffline() const override
class DevicexIpuModelInfo : public popart::popx::DevicexInfo

Public Functions

inline DevicexIpuModelInfo(poplar::Device &_device, const std::string _ipuVersion)
inline int getId() const override
inline std::vector<int> getChildIds() const override
inline std::string getVersion() const override
class DevicexSimInfo : public popart::popx::DevicexInfo

Public Functions

inline DevicexSimInfo(poplar::Device &_device)
inline int getId() const override
inline std::vector<int> getChildIds() const override
inline std::string getVersion() const override
class DevicexOfflineIpuInfo : public popart::DeviceInfo

Public Functions

inline DevicexOfflineIpuInfo(poplar::Target &_target, const poplar::OptionFlags &_flags)
inline bool attach() override
inline void detach() override
inline int getId() const override
inline std::vector<int> getChildIds() const override
inline std::string getVersion() const override
inline int getNumIpus() const override
inline int getTilesPerIPU() const override
inline int getNumWorkerContexts() const override
inline std::string getIpuVersion() const override
inline std::vector<unsigned> getDriverIds() const override
inline const poplar::Target &getTarget() const override
inline bool canCompileOffline() const override
inline bool isAttached() const override
class DeviceManager

A class to manage devices.

Public Functions

DeviceManager(const DeviceManager&) = default
~DeviceManager() = default
void registerDeviceProvider(DeviceProvider *provider)

Register a device provider.

Parameters

provider – The device provider to be registered with the device manager.

virtual void enumerate(std::vector<std::shared_ptr<popart::DeviceInfo>> &devices, unsigned requiredNumIPUs, SyncPattern syncPattern, DeviceType type, DeviceConnectionType connectionType, uint32_t requiredTilesPerIPU)

Get the list of all devices that satisfy the specified criteria.

Parameters
  • devices – The list of devices.

  • requiredNumIPUs – The number of IPUs required.

  • syncPattern – The setting for when to synchronise in a multi-IPU system.

  • type – The type of the device to use for compilation and execution.

  • connectionType – The setting for when to connect to the device.

  • requiredTilesPerIPU – The number of tiles per IPU required.

std::vector<std::shared_ptr<DeviceInfo>> enumerateDevices(SyncPattern pattern = SyncPattern::Full, int numIpus = 1, DeviceType deviceType = DeviceType::Ipu, DeviceConnectionType connectionType = DeviceConnectionType::Always, int tilesPerIPU = 0)

Get the list of all devices with the required criteria.

Parameters
  • pattern – The setting for when to synchronise in a multi-IPU system. (Default: SyncPattern::Full).

  • numIpus – The number of IPUs required. (Default: 1).

  • deviceType – The type of the device required. (Default: DeviceType::Ipu).

  • connectionType – The setting for when to connect to the device. (Default: DeviceConnectionType::Always).

  • tilesPerIPU – The number of tiles per IPU required. (Default: 0).

Returns

The list of devices with the required criteria.

std::shared_ptr<DeviceInfo> getDevice(SyncPattern syncPattern = SyncPattern::Full, uint32_t deviceManagerId = 0, DeviceConnectionType connectionType = DeviceConnectionType::Always)

Get a device with the required criteria.

Parameters
  • syncPattern – The setting for when to synchronise in a multi-IPU system. (Default: SyncPattern::Full).

  • deviceManagerId – The ID of the requested device. (Default: 0)

  • connectionType – The setting for when to connect to the device. (Default: DeviceConnectionType::Always).

Returns

A device, which can be used with a session. If no device is acquired, a nullptr is returned.

std::shared_ptr<DeviceInfo> tryAcquireAvailableDevice(int numIpus = 1, int tilesPerIPU = 0, SyncPattern pattern = SyncPattern::Full, DeviceConnectionType connectionType = DeviceConnectionType::Always, DeviceSelectionCriterion selectionCriterion = DeviceSelectionCriterion::First)

Finds an available hardware device, with the specified number of IPUs.

This method will attach to the device if connectionType is equal to DeviceConnectionType::Always. This method is suitable when polling for an available device when resources are constrained.

Parameters
  • numIpus – The number of IPUs on the device (Default: 1).

  • tilesPerIPU – The number of tiles per IPU. An input of 0 will match any number. (Default: 0).

  • pattern – The setting for when to synchronise in a multi-IPU system. (Default: SyncPattern::Full).

  • connectionType – The setting for when to connect to the device. (Default: DeviceConnectionType::Always).

  • selectionCriterion – The method for selecting a device from the list of valid selections. (Default: DeviceSelectionCriterion::First).

Returns

A device, which can be used with a session. If no device is acquired, a nullptr is returned.

std::shared_ptr<DeviceInfo> acquireAvailableDevice(int numIpus = 1, int tilesPerIPU = 0, SyncPattern pattern = SyncPattern::Full, DeviceConnectionType connectionType = DeviceConnectionType::Always, DeviceSelectionCriterion selectionCriterion = DeviceSelectionCriterion::First)

Finds an available hardware device, with a certain number of IPUs.

This method will attach to the device if connectionType is equal to DeviceConnectionType::Always. Throws an error if there are fewer than numIpus IPUs available.

Parameters
  • numIpus – The number of IPUs on the device. (Default: 1).

  • tilesPerIPU – The number of tiles per IPU. An input of 0 will match any number. (Default: 0).

  • pattern – The setting for when to synchronise in a multi-IPU system. (Default: SyncPattern::Full).

  • connectionType – The connection type, for deciding when to attach to the device. (Default: DeviceConnectionType::Always).

  • selectionCriterion – How to select a device from the list of valid selections. (Default: DeviceSelectionCriterion::First).

Returns

A device, which can be used with a session.
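The difference between the two acquisition methods is the failure contract: tryAcquireAvailableDevice() returns a nullptr when no device is acquired, whereas acquireAvailableDevice() throws an error. A minimal sketch of that relationship follows. DeviceInfo here is an empty stand-in struct, acquireOrThrow is a hypothetical wrapper, and the use of std::runtime_error is an assumption, since the exception type that PopART throws is not stated above.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>

// Empty stand-in for popart::DeviceInfo.
struct DeviceInfo {};

// Hypothetical wrapper: turn a nullptr-returning "try" call into a
// throwing one. The exception type is an assumption of this sketch.
std::shared_ptr<DeviceInfo> acquireOrThrow(
    const std::function<std::shared_ptr<DeviceInfo>()> &tryAcquire) {
  assert(tryAcquire);
  std::shared_ptr<DeviceInfo> device = tryAcquire();
  if (!device) {
    throw std::runtime_error("No device available");
  }
  return device;
}
```

Code that polls for a device would typically prefer the nullptr-returning form, since a missing device is then an expected outcome rather than an error.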

std::shared_ptr<DeviceInfo> tryAcquireDeviceById(int id, SyncPattern pattern = SyncPattern::Full, DeviceConnectionType connectionType = DeviceConnectionType::Always)

Allocates the hardware device by ID.

This ID can be found by running gc-info -l. This method will try to attach to the device if connectionType is equal to DeviceConnectionType::Always. This method is suitable when polling for an available device when resources are constrained.

Parameters
  • id – The ID of the IPU to be used.

  • pattern – The setting for when to synchronise in a multi-IPU system. (Default: SyncPattern::Full).

  • connectionType – The connection type, for deciding when to attach to the device. (Default: DeviceConnectionType::Always).

Returns

A device, which can be used with a session. If no device is acquired, a nullptr is returned.

std::shared_ptr<DeviceInfo> acquireDeviceById(int id, SyncPattern pattern = SyncPattern::Full, DeviceConnectionType connectionType = DeviceConnectionType::Always)

Allocates the hardware device by ID.

This ID can be found by running gc-info -l. This method will attach to the device if connectionType is equal to DeviceConnectionType::Always.

Parameters
  • id – The ID of the IPU to be used.

  • pattern – The setting for when to synchronise in a multi-IPU system. (Default: SyncPattern::Full).

  • connectionType – The connection type, for deciding when to attach to the device. (Default: DeviceConnectionType::Always).

Returns

A device, which can be used with a session.

std::shared_ptr<DeviceInfo> createHostDevice(DeviceType type, const std::map<std::string, std::string> &options)

Create a simulated device on the host for testing purposes.

Parameters
  • type – The type of device to simulate.

  • options – The configuration settings for the host device.

Returns

The requested device for testing purposes.

std::shared_ptr<DeviceInfo> createCpuDevice()

Create a simulated CPU device for testing purposes.

Returns

A simulated CPU device.

std::shared_ptr<DeviceInfo> createIpuModelDevice(const std::map<std::string, std::string> &options)

Create a simulated IpuModel device for testing purposes.

The following options are supported:

  • numIPUs: The number of IPUs to simulate (Default: 1).

  • tilesPerIPU: The number of tiles per IPU (Default: defaultFewTiles).

  • compileIPUCode: Indicate whether or not to compile real IPU code for modelling.

Parameters

options – Configuration settings for the IPU Model.

Returns

A device.
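Because the options are passed as a map from string to string, numeric settings such as numIPUs are supplied as text. The sketch below shows one way such an option could be read with a fallback default. intOption is a hypothetical helper written for this example and is not how PopART parses its options.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: read an integer option, or return a default
// when the key is absent. std::stoi throws if the text is not a number.
int intOption(const std::map<std::string, std::string> &options,
              const std::string &key,
              int defaultValue) {
  assert(!key.empty());
  const auto it = options.find(key);
  return it == options.end() ? defaultValue : std::stoi(it->second);
}
```

With options = {{"numIPUs", "2"}}, intOption(options, "numIPUs", 1) yields 2, and an empty map yields the default of 1.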

std::shared_ptr<DeviceInfo> createSimDevice(const std::map<std::string, std::string> &options)
std::shared_ptr<DeviceInfo> createOfflineIPUDevice(const std::map<std::string, std::string> &options)

Create a simulated OfflineIpu device for testing purposes.

This resembles an IPU and is used for offline compilation.

The following options are supported:

  • numIPUs: The number of IPUs to compile for.

  • tilesPerIPU: The number of tiles per IPU (Default: defaultManyTiles).

  • ipuVersion: The IPU architecture (Default: “ipu2”).

  • syncPattern: The setting for synchronisation in a multi-IPU system.

Parameters

options – Configuration settings for the OfflineIpu device.

Returns

A simulated OfflineIpu device.

std::shared_ptr<DeviceInfo> createOfflineIpuFromDeviceInfo(const DeviceInfo &deviceInfo)

Create a simulated OfflineIpu device from the description of another device.

Parameters

deviceInfo – The device to create an OfflineIpu version of.

Returns

An OfflineIpu device.

std::shared_ptr<DeviceInfo> createOfflineIpuFromSystemString(const std::string &system, uint32_t numIpus)

Create a simulated OfflineIpu device from the name of a system.

Parameters
  • system – The name of the system to create an OfflineIpu version of.

  • numIpus – The number of IPUs. Providing 0 corresponds to all IPUs in the system.

Returns

An OfflineIpu device.

void setOnDemandAttachTimeout(const unsigned seconds)

Set the attach timeout (in seconds). If unable to attach to a device on the first try, the DeviceManager will keep trying to attach for this length of time.

Note: this only takes effect when trying to attach with DeviceConnectionType::OnDemand.

Parameters

seconds – The attach timeout in seconds.

Public Static Functions

static DeviceManager &createDeviceManager()

Accessor for the device manager.

Returns

A reference to the DeviceManager instance.

class DeviceProvider

The interface for device providers which are registered with the device manager.

Subclassed by popart::popx::DevicexManager

Public Functions

inline virtual ~DeviceProvider()

Destructor for DeviceProvider.

virtual std::shared_ptr<DeviceInfo> getDevice(SyncPattern syncPattern, unsigned deviceManagerId, DeviceConnectionType connectionType) = 0

Get a device that satisfies the specified criteria.

Throws an error if the connection type is DeviceConnectionType::Never.

Parameters
  • syncPattern – The setting for synchronisation on multi-IPU systems.

  • deviceManagerId – The ID of the requested device.

  • connectionType – The setting for when to connect to the device.

Returns

The device that satisfies the specified criteria.

virtual void enumerate(std::vector<std::shared_ptr<DeviceInfo>> &devices, uint32_t requiredNumIPUs, SyncPattern syncPattern, DeviceType type, DeviceConnectionType connectionType, uint32_t requiredTilesPerIPU) = 0

Get the list of all devices that satisfy the specified criteria.

Parameters
  • devices – The list of devices.

  • requiredNumIPUs – The number of IPUs required.

  • syncPattern – The setting for when to synchronise in a multi-IPU system.

  • type – The type of the device to use for compilation and execution.

  • connectionType – The setting for when to connect to the device.

  • requiredTilesPerIPU – The number of tiles per IPU required.

virtual std::shared_ptr<DeviceInfo> createHostDevice(DeviceType type, const std::map<std::string, std::string> &options, SyncPattern syncPattern = SyncPattern::Full) = 0

Create a host device for testing.

Parameters
  • type – The type of the device to use for compilation and execution.

  • options – The configuration for the created device. See createCpuDevice(), createIpuModelDevice(), createOfflineIPUDevice() and createSimDevice() for more information about options.

  • syncPattern – The setting for when to synchronise in a multi-IPU system.

Returns

The device for use in testing.

virtual std::shared_ptr<DeviceInfo> createOfflineIpuFromDeviceInfo(const DeviceInfo &deviceInfo) = 0
virtual std::shared_ptr<DeviceInfo> createOfflineIpuFromSystemString(const std::string &system, uint32_t numIpus) = 0
class DevicexManager : public popart::DeviceProvider

Public Functions

DevicexManager()
std::shared_ptr<DeviceInfo> getDevice(SyncPattern syncPattern, uint32_t deviceManagerId, DeviceConnectionType connectionType) override
void enumerate(std::vector<std::shared_ptr<popart::DeviceInfo>> &devices, unsigned requiredNumIPUs, SyncPattern syncPattern, DeviceType type, DeviceConnectionType connectionType, uint32_t requiredTilesPerIPU) override
std::shared_ptr<popart::DeviceInfo> createHostDevice(popart::DeviceType type, const std::map<std::string, std::string> &options, SyncPattern syncPattern = SyncPattern::Full) override
std::shared_ptr<DeviceInfo> createOfflineIpuFromDeviceInfo(const DeviceInfo &deviceInfo) override
std::shared_ptr<DeviceInfo> createOfflineIpuFromSystemString(const std::string &system, uint32_t numIpus) override
#include <popart/popx/devicex.hpp>
class Devicex

Public Functions

const Ir &ir() const
const IrLowering &lowering() const
IrLowering &lowering()
Devicex(Executablex &exe, std::shared_ptr<DeviceInfo> deviceInfo)
~Devicex()
void prepare()
void weightsFromHost()
void buffersFromHost()
void remoteBufferWeightsFromHost(const bool isUpdate = false)
void optimizerFromHost()
void setRandomSeedFromHost()
uint64_t getRandomSeedToHost()
void setRngStateFromHost()
std::vector<uint32_t> getRngStateToHost()
void setRngStateValue(const std::vector<uint32_t>)
std::map<std::string, std::vector<uint64_t>> cycleCountTensorToHost()
void run(IStepIO&, std::string debugName = "")
void run(std::string programHandle, IStepIO&, std::string debugName = "")
void weightsToHost()
void remoteBufferWeightsToHost()
void weightsToHost(const std::map<TensorId, MutableVoidData>&)
void popxlWeightsToTensorData()

Copy data from the device, to the host buffers, to the tensor.tensorData() buffers.

Will not run a WeightsToHost program if the weights are already in sync with the IPU. After WeightsToHost, marks the weights as in sync with the IPU.

void popxlMarkHostWeightsOutOfSync()

Mark the d2hWeightBuffers as out of sync with the IPU.

void popxlMarkHostWeightsInSync()

Mark the d2hWeightBuffers as in sync with the IPU.

bool popxlAreHostWeightsInSync()

Check whether all the weights are in sync with the IPU.
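The four popxl methods above describe a "dirty flag" pattern: the device-to-host copy is skipped while the host buffers are known to be up to date. A minimal model of that pattern follows. HostWeightCache and all of its members are hypothetical names used for illustration; this is not Devicex code, and the device is simulated by a plain vector.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model of the "in sync" flag; not part of PopART.
class HostWeightCache {
public:
  explicit HostWeightCache(std::vector<float> deviceWeights)
      : device_(std::move(deviceWeights)) {}

  void markOutOfSync() { inSync_ = false; }
  void markInSync() { inSync_ = true; }
  bool areInSync() const { return inSync_; }

  // Simulate a weight update on the device; the host copy becomes stale.
  void updateOnDevice(std::size_t index, float value) {
    assert(index < device_.size());
    device_[index] = value;
    markOutOfSync();
  }

  // Copy device -> host only when the host copy is stale.
  const std::vector<float> &weightsToHost() {
    if (!inSync_) {
      host_ = device_;
      ++transfers_;
      markInSync();
    }
    return host_;
  }

  // Number of simulated device-to-host copies performed so far.
  int transfers() const { return transfers_; }

private:
  std::vector<float> device_;
  std::vector<float> host_;
  bool inSync_   = false;
  int transfers_ = 0;
};
```

Calling weightsToHost() twice in a row performs a single transfer in this model; a second transfer only happens after the weights have been marked out of sync.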

void readWeights(const IWeightsIO &dst)
void writeWeights(const IWeightsIO &src)
std::string getSummaryReport(bool resetProfile = true) const
std::string getSerializedGraph() const
pva::Report getReport() const
bool isEngineLoaded() const
void setEngineIsLoaded(bool isLoaded)
void connectRandomSeedStream()
void connectRngStateStream()
void connectStreamToCallback(const std::string &streamHandle, std::function<void(void*)> callback, unsigned index)
void connectStream(const std::string &streamHandle, void *host_buffer)
void connectHostFunction(const std::string &functionHandle, std::function<void(const void*const*, size_t, void*const*, size_t)> callback, unsigned index)
void copyFromRemoteBuffer(const PopStreamId buffer, void *w, int repeat_index, unsigned replication_index = 0)
void copyToRemoteBuffer(void *w, const PopStreamId buffer, int repeat_index, unsigned replication_index = 0)
unsigned getReplicationFactor() const
unsigned getAccumulationFactor() const
unsigned getGlobalReplicaOffset() const
unsigned getGlobalReplicationFactor() const
bool isReplicatedGraph() const
inline const DeviceInfo *getDeviceInfo() const
inline DeviceInfo *getDeviceInfo()
inline void setDeviceInfo(std::shared_ptr<DeviceInfo> deviceInfo_)
std::set<TensorId> getLinearlyCreatedInputTensors() const
std::set<TensorId> getEfficientlyCreatedInputTensors() const
inline bool prepareHasBeenCalled() const
void loadEngineAndConnectStreams()
void serializeExecutable(std::ostream &out, bool serializePopartMetadata, bool serializeTensorData)
void serializeExecutable(const std::string &path, bool serializePopartMetadata, bool serializeTensorData)
void serializeTensorData(const std::string &path)

Public Members

poplin::PlanningCache convCache
poplin::matmul::PlanningCache matmulCache
bool prePlanConvolutions = true
bool prePlanMatMuls = true

Friends

friend class serialization::WriterImpl
typedef std::string popart::popx::PopStreamId
class Executablex

Public Functions

Executablex(IrLowering &ir_lowering_)
Executablex(IrLowering &ir_lowering_, std::unordered_map<TensorId, std::unique_ptr<Tensor>> &&tensorMap, std::map<TensorId, CollectiveBalancedReorderId> &&cbrIdMap, std::map<CollectiveBalancedReorderId, gcl::CollectiveBalancedHostRearrangement> &&cbrMap)
IrLowering &lowering()
const IrLowering &lowering() const
const Ir &ir() const
inline bool isDeserialized() const
bool shouldSerialize()
bool containsTensor(const TensorId &id) const
Tensor *getTensor(const TensorId&)
const Tensor *getTensor(const TensorId&) const
std::set<TensorId> getAllTensorIds()
std::vector<TensorId> getTensorIds(TensorType)
void setRandomSeedValue(uint64_t value)
void resetWeights(const ONNX_NAMESPACE::ModelProto &modelProto, const bool ignoreWeightsInModelWithoutCorrespondingIrWeight = false)
inline const SessionOptions &getSessionOptions() const
inline std::vector<Tensor*> &getWeightTensors()
inline const std::vector<Tensor*> &getWeightTensors() const
inline const std::vector<Tensor*> &getAnchorTensors() const
inline const std::vector<Tensor*> &getOptimizerTensors() const
inline const std::vector<Tensor*> &getDataStreamTensors() const
inline const Tensor *getSeedTensor() const
const gcl::CollectiveBalancedHostRearrangement &getCollectiveBalancedHostRearrangement(const TensorId &id) const
const std::map<CollectiveBalancedReorderId, gcl::CollectiveBalancedHostRearrangement> getCollectiveBalancedHostRearrangements() const
const std::map<TensorId, CollectiveBalancedReorderId> getCollectiveBalancedHostRearrangementIds() const
std::string getCachePath(const std::string &cacheDir) const
void updateOptimizerTensors()

Public Static Functions

static std::unique_ptr<Executablex> createFromLoweredIr(IrLowering &ir_lowering_)
static std::unique_ptr<Executablex> createFromStream(IrLowering &ir_lowering_, std::unordered_map<TensorId, std::unique_ptr<Tensor>> &&tensorMap, std::map<TensorId, CollectiveBalancedReorderId> &&cbrIdMap, std::map<CollectiveBalancedReorderId, gcl::CollectiveBalancedHostRearrangement> &&cbrMap)
#include <popart/popx/irlowering.hpp>
class IrLowering

Public Types

using FunctionBuffers = std::vector<std::pair<const poplar::Function, poplar::FunctionBuffer>>

Public Functions

IrLowering(const Ir&, std::shared_ptr<DeviceInfo> deviceInfo, bool prepareGraphHasBeenCalled = false)
virtual ~IrLowering()
inline const Ir &ir() const
void growOpx(Opx*, SequenceMap::SequenceInterval seqInterval)
void growOpxCall(Opx*, SequenceMap::SequenceInterval seqInterval)
inline void setDevicex(Devicex *d)
std::set<TensorId> getLinearlyCreatedInputTensors() const
inline void setLinearlyCreatedInputTensors(const std::set<TensorId> &s)
inline void addLinearlyCreatedInputTensors(TensorId id)
std::set<TensorId> getEfficientlyCreatedInputTensors() const
inline void setEfficientlyCreatedInputTensors(const std::set<TensorId> &s)
inline void addEfficientlyCreatedInputTensors(TensorId id)
bool tryInitTensorByPostIRAliasing(TensorId dstId, RequireParallelWritable requireParallelWritable, const ViewChangers &viewChangers)
inline const std::vector<std::string> &getCycleCountIds() const
inline void setCycleCountIds(const std::vector<std::string> &ids)
inline const PopTensors &tensors() const
inline PopTensors &tensors()
inline const PopPrograms &progs() const
inline PopPrograms &progs()
void instrumentWithHardwareCycleCounter(poplar::program::Sequence&, int64_t tileId = 0, std::string id = "")
inline poplar::Graph &graph()
inline const poplar::Graph &graph() const
void prepareGraph()
void loadPoplarExecutable(serialization::Reader &reader)
poplar::Executable getExecutable(const ProfileCacher &ProfileCacher)
std::string getPoplarGraphDebugName()
std::string getSerializedGraph() const
poplar::Graph &getVirtualGraph(VGraphId virtualGraphIndex, TileSet tileSet = TileSet::Compute)
PriTaskDependency taskWhichCreates(TensorId) const
TaskId taskWhichPopulates(TensorId) const
PriTask getDependencyFreeInitTensorCreatorTask(const TensorId&)
unsigned getReplicationFactor() const
unsigned getAccumulationFactor() const
unsigned getGlobalReplicaOffset() const
unsigned getGlobalReplicationFactor() const
bool isReplicatedGraph() const
bool doRearrangeOnHost(Tensor *tensor) const
int getNumFragments(const Graph &graph) const
bool containsFragments(const Graph &graph) const
bool containsFragment(const Graph &graph, SubgraphPartIndex subgraphPart) const
void createFragment(const Graph &graph, SubgraphPartIndex subgraphPart)
std::vector<poplar::Function> &getFragmentFunctions(const Graph &graph)
poplar::Function &getFragmentFunction(const Graph &graph, SubgraphPartIndex subgraphPart)
void addFunctionBuffers(const GraphId gid, poplar::FunctionBufferMappingType fbmt)

Add a vector of pairs {f, buffer} for a given graph id, FunctionBufferMappingType pair.

This is enough for an [Internal|External]CodeCopy op to move code from the buffer into the function. Note that the subgraph partitioner may have split the graph into multiple functions, so a vector of these pairs is required for each graph.

Parameters
  • gid – The graph id to add the functions and buffers for.

  • fbmt – The FunctionBufferMappingType to add the vector for.

inline FunctionBuffers getFunctionBuffer(const GraphId gid, poplar::FunctionBufferMappingType fbmt)

Get the Function Buffers for the given GraphId and FunctionBufferMappingType.

Wrapper around the PopPrograms function of the same name.

Parameters
  • gid – The GraphId to lookup.

  • fbmt – The FunctionBufferMappingType to lookup.

Returns

FunctionBuffers the vector of functions and buffers.

inline bool hasFunctionBuffer(const GraphId gid, poplar::FunctionBufferMappingType fbmt)

Returns true if a functionBuffer vector exists for the given graphId / FunctionBufferMappingType.

Wrapper around the PopPrograms function of the same name.

Parameters
  • gid – The graph id to lookup.

  • fbmt – The FunctionBufferMappingType to lookup.

Returns

true if pairs exist, false otherwise.

std::vector<ICreatorCandidatePtr> getCreatorEndpoints(const Tensor *tensor, bool excludeEndpointsFromPath = true, bool includeDeadends = false) const
std::vector<ICreatorCandidatePtr> getTensorCreators(const Tensor *tensor, bool dependencyFree) const
poplar::Tensor getConst(poplar::Graph &graph, const poplar::Type &type, const std::vector<size_t> &shape, double val, const poplar::DebugContext &dc = {})
inline const ReplicatedTensorShardingBundle &getReplicatedTensorShardingBundle() const
inline ReplicatedTensorShardingBundle &getReplicatedTensorShardingBundle()
poplar::Tensor getScalarVariable(poplar::Graph &graph, const poplar::Type &type, const poplar::DebugContext &dc = {})
inline LinearMapper &getLinearMapper()
inline InitTensorOffsetMap &getInitTensorOffsetMap()
inline const liveness::LivenessAnalyzer *getLivenessAnalyzer() const
inline const liveness::SubgraphPartitioner *getSubgraphPartitioner() const
inline liveness::AliasZeroCopy *getAliasZeroCopy() const
inline const DeviceInfo *getDeviceInfo() const
inline void setDeviceInfo(std::shared_ptr<DeviceInfo> deviceInfo_)
std::unique_ptr<Opx> createOpx(Op*)
inline Opx *getOpx(OpId id)
inline const Opx *getOpx(OpId id) const
const std::vector<Op*> &getMainGraphOpSeries() const
std::map<Op*, int, POpCmp> getMainGraphOpSeriesNums() const
std::map<Op*, int, POpCmp> getMainGraphOpCounts() const
std::string getContextOpString(ExecutionContext context, const std::vector<TaskId> &taskOrder) const
inline bool prepareGraphHasBeenCalled() const
inline bool getOuterLoopFragEmpty() const
inline bool usingCachedExecutable() const
poplar::DataStream &insertGradientStoreStream(TensorId, TensorInfo, poplar::Graph&)
poplar::DataStream &insertGradientLoadStream(TensorId, TensorInfo, poplar::Graph&)
poplar::DataStream &insertWeightLoadStream(TensorId, TensorInfo, poplar::Graph&)
inline void addPipelineIndexTensor(const poplar::Tensor &tensor)
inline ExchangeBundle &getExchangeBundle()

Get the exchange bundle containing stream and remote buffer data structures.

Returns

Exchange bundle

inline const ExchangeBundle &getExchangeBundle() const

Get the exchange bundle containing stream and remote buffer data structures.

Returns

Exchange bundle

inline const std::vector<poplar::Tensor> getPipelineIndexTensors()
inline const std::map<TensorId, poplar::DataStream> &getFromHostStreams() const
inline const std::map<TensorId, poplar::DataStream> &getToHostAnchorStreams() const
inline const std::map<TensorId, poplar::DataStream> &getToHostWeightStreams() const
template<class T>
inline T *getOpxState(OpId opid)
inline void setProgramHandleIndexMap(const std::map<std::string, unsigned> &programHandleIndexMap_)
inline const std::map<std::string, unsigned> &getProgramHandleIndexMap() const

Public Members

poplar::OptionFlags pooling_options
poplar::OptionFlags lstmOptions
poplar::OptionFlags matmulOptions
poplar::OptionFlags gclOptions
poplar::OptionFlags engineOptions
poplar::OptionFlags reportOptions
std::map<OpId, std::unique_ptr<Opx>> opxs

Public Static Functions

static std::string cycleCountStreamId(std::string id)
static void removeNonDependencyFreeCreators(std::vector<ICreatorCandidatePtr> &candidates)
static PopStreamId h2dId(TensorId)
static PopStreamId d2hId(TensorId, bool isAnchorStream)
static PopStreamId gradientStoreStreamId(TensorId id)
static PopStreamId gradientLoadStreamId(TensorId id)
static PopStreamId weightLoadStreamId(TensorId id)
#include <popart/popx/poptensors.hpp>
class PopTensors

Public Functions

PopTensors(const Ir&)
void insert(TensorId, const poplar::Tensor&)
void insertAliased(TensorId to, TensorId from)
void insertUnsafe(TensorId id, const poplar::Tensor &pt)
const poplar::Tensor &get(TensorId) const
const poplar::Tensor &getView(TensorId) const
bool hasViewChangers(TensorId) const
const ViewChangers &getViewChangers(TensorId)
void setViewChangers(TensorId, const ViewChangers &viewChangers)
bool contains(TensorId) const
const std::map<TensorId, std::shared_ptr<poplar::Tensor>> &getTensors() const
bool canAlias(TensorId, RequireParallelWritable requireParallelWritable) const
#include <popart/popx/popprograms.hpp>
class PopPrograms

Class for managing the complete set of programs that a Devicex can run.

A program in this context is an instance of the poplar::Program class, which represents a control program that executes operations on the graph.

The state std::vector<poplar::program::Sequence> seqs contains all these programs, and is populated during IrLowering. The programs are passed to poplar::compileGraph to construct the executable (see IrLowering::getExecutable()).

Public Types

enum ProgramIndex

Values:

enumerator WeightsFromHost = 0
enumerator OptimizerFromHost
enumerator RandomSeedFromHost
enumerator RandomSeedToHost
enumerator RngStateFromHost
enumerator Program
enumerator RngStateToHost
enumerator WeightsToHost
enumerator CycleCountTensorToHost
enumerator CustomProgramsStart
enumerator N
enum class ProgramFragmentIndex

Values:

enumerator StreamWeightsFromHost = 0
enumerator StreamOptimizerFromHost
enumerator RandomSeedFromHost
enumerator RandomSeedToHost
enumerator RngStateFromHost
enumerator Init
enumerator PreForward
enumerator Forward
enumerator Backward
enumerator VarUpdateFromAccumulator
enumerator RngStateToHost
enumerator WeightsToHost
enumerator ToHostFinalCopy
enumerator CycleCountTensorToHost
enumerator N
enum class PipelineFragmentId

Values:

enumerator ToDeviceStream = 0
enumerator Main
enumerator ToHostStream
using FunctionBuffers = std::vector<std::pair<const poplar::Function, poplar::FunctionBuffer>>
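
The fragment enumerators above can be read as positions in a table of program sequences. The following is a minimal sketch of that idea, not the PopART implementation: Sequence is a hypothetical stand-in for poplar::program::Sequence, FragmentTable is an illustrative name, and it is an assumption that the table holds exactly N entries indexed by the enumerator value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Same enumerator order as PopPrograms::ProgramFragmentIndex.
enum class ProgramFragmentIndex {
  StreamWeightsFromHost = 0,
  StreamOptimizerFromHost,
  RandomSeedFromHost,
  RandomSeedToHost,
  RngStateFromHost,
  Init,
  PreForward,
  Forward,
  Backward,
  VarUpdateFromAccumulator,
  RngStateToHost,
  WeightsToHost,
  ToHostFinalCopy,
  CycleCountTensorToHost,
  N // Number of fragments, not a fragment itself.
};

// Hypothetical stand-in for poplar::program::Sequence: it only records
// the names of the steps added to it.
struct Sequence {
  std::vector<std::string> steps;
  void add(const std::string &step) { steps.push_back(step); }
};

// Illustrative table with one sequence per fragment index (assumption).
class FragmentTable {
public:
  FragmentTable() : seqs_(static_cast<std::size_t>(ProgramFragmentIndex::N)) {}

  // Generic accessor, in the spirit of PopPrograms::programFragment().
  Sequence &programFragment(ProgramFragmentIndex index) {
    return seqs_.at(static_cast<std::size_t>(index));
  }

  // Named accessor, in the spirit of PopPrograms::forwardFragment().
  Sequence &forwardFragment() {
    return programFragment(ProgramFragmentIndex::Forward);
  }

  std::size_t size() const { return seqs_.size(); }

private:
  std::vector<Sequence> seqs_;
};
```

In this sketch the named accessor and the generic accessor return the same underlying sequence, so a step added through forwardFragment() is visible through programFragment(ProgramFragmentIndex::Forward).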

Public Functions

PopPrograms(IrLowering *ir_lowering_p_)
const poplar::program::Sequence &streamWeightsFromHostFragment() const
poplar::program::Sequence &streamWeightsFromHostFragment()
const poplar::program::Sequence &streamOptimizerFromHostFragment() const
poplar::program::Sequence &streamOptimizerFromHostFragment()
const poplar::program::Sequence &randomSeedFromHostFragment() const
poplar::program::Sequence &randomSeedFromHostFragment()
const poplar::program::Sequence &randomSeedToHostFragment() const
poplar::program::Sequence &randomSeedToHostFragment()
const poplar::program::Sequence &cycleCountTensorToHostFragment() const
poplar::program::Sequence &rngStateFromHostFragment()
const poplar::program::Sequence &rngStateFromHostFragment() const
poplar::program::Sequence &rngStateToHostFragment()
const poplar::program::Sequence &rngStateToHostFragment() const
poplar::program::Sequence &cycleCountTensorToHostFragment()
const poplar::program::Sequence &toHostFinalCopyFragment() const
poplar::program::Sequence &toHostFinalCopyFragment()
const poplar::program::Sequence &initFragment() const
poplar::program::Sequence &initFragment()
const poplar::program::Sequence &preForwardFragment() const
poplar::program::Sequence &preForwardFragment()
const poplar::program::Sequence &forwardFragment() const
poplar::program::Sequence &forwardFragment()
const poplar::program::Sequence &backwardFragment() const
poplar::program::Sequence &backwardFragment()
const poplar::program::Sequence &accumulateOuterFragment() const
poplar::program::Sequence &accumulateOuterFragment()
const poplar::program::Sequence &weightsToHostFragment() const
poplar::program::Sequence &weightsToHostFragment()
poplar::program::Sequence &forwardOrBackwardFragment(ScheduledPreLoss)
const std::vector<poplar::program::Program> progs() const
poplar::program::Sequence &programFragment(PopPrograms::ProgramFragmentIndex)
int getNumFragments(const Graph &graph) const
std::vector<poplar::program::Sequence> &scopeFragments(const Graph&)
poplar::program::Sequence &scopeFragment(const Graph&, SubgraphPartIndex subgraphPart)
bool containsFragments(const Graph &graph) const
bool containsFragment(const Graph &graph, SubgraphPartIndex subgraphPart) const
void createFragment(const Graph &graph, SubgraphPartIndex subgraphPart)
std::vector<poplar::Function> &getFragmentFunctions(const Graph &graph, poplar::Graph &poplarGrpah)
poplar::Function &getFragmentFunction(const Graph &graph, SubgraphPartIndex subgraphPart, poplar::Graph &poplarGraph)
std::vector<poplar::program::Sequence>::iterator recomputeFragment(OpId)
SequenceMap::SequenceInterval createRecomputeFragment(OpId)
bool hasBeenRecomputed(OpId, ExecutionPhase) const
void recordRecomputed(OpId, ExecutionPhase)
std::string getStrFromPipelineFragmentId(PipelineFragmentId) const
poplar::program::Sequence &pipelineFragment(PipelineStage, PipelineFragmentId, const std::string &desc)
poplar::program::Sequence &pipelineToDeviceStreamFragment(PipelineStage pipelineStage, const std::string &desc)
poplar::program::Sequence &pipelineMainFragment(PipelineStage, const std::string &desc)
poplar::program::Sequence &pipelineToHostStreamFragment(PipelineStage, const std::string &desc)
poplar::program::Sequence &pipelineIpuCopyFragment(const std::string &desc)
poplar::program::Sequence &namedBuffersCopyFragment()
void addPipelineCycle(PipelineInfo pInfo, PipelineCycle pCycle, poplar::program::Sequence &sq, std::ostringstream &ss) const
void addFunctionBuffers(const GraphId gid, poplar::FunctionBufferMappingType fbmt)

Add a vector of pairs {f, buffer} for a given graph id.

This is enough for an [Internal|External]CodeCopy op to move code from the buffer into the function. Note that the subgraph partitioner may have split the graph into multiple functions, so a vector of these pairs is required for each graph.

Parameters
  • gid – The graph id to add the functions and buffers for.

  • fbmt – The FunctionBufferMappingType to add the vector for.

inline FunctionBuffers getFunctionBuffer(const GraphId gid, poplar::FunctionBufferMappingType fbmt)

Get the Function Buffers for the given GraphId and FunctionBufferMappingType.

Parameters
  • gid – The GraphId to lookup.

  • fbmt – The FunctionBufferMappingType to lookup.

Returns

FunctionBuffers the vector of functions and buffers.

inline bool hasFunctionBuffer(const GraphId gid, poplar::FunctionBufferMappingType fbmt)

Returns true if a functionBuffer vector exists for the given graphId and FunctionBufferMappingType.

Parameters
  • gid – The graph id to lookup.

  • fbmt – The FunctionBufferMappingType to lookup.

Returns

true if pairs exist, false otherwise.
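
The add/has/get trio above behaves like a lookup table keyed by the (graph id, mapping type) pair. The sketch below illustrates that shape under stated assumptions: Function, FunctionBuffer, GraphId and MappingType are hypothetical stand-ins for the Poplar and PopART types, the enumerator names are placeholders, and FunctionBufferRegistry is an illustrative class rather than part of the API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for poplar::Function, poplar::FunctionBuffer,
// popart::GraphId and poplar::FunctionBufferMappingType.
struct Function { int id; };
struct FunctionBuffer { int id; };
using GraphId = std::string;
enum class MappingType { Remote, Io }; // Placeholder enumerator names.

using FunctionBuffers = std::vector<std::pair<const Function, FunctionBuffer>>;

class FunctionBufferRegistry {
public:
  // Store one {function, buffer} pair per fragment function of the graph.
  // An existing entry for the same key is left untouched.
  void add(const GraphId &gid, MappingType fbmt,
           const std::vector<Function> &fragments) {
    FunctionBuffers pairs;
    int nextBufferId = 0;
    for (const Function &f : fragments) {
      pairs.emplace_back(f, FunctionBuffer{nextBufferId++});
    }
    buffers_.emplace(Key{gid, fbmt}, std::move(pairs));
  }

  // True if a vector of pairs exists for this key.
  bool has(const GraphId &gid, MappingType fbmt) const {
    return buffers_.count(Key{gid, fbmt}) > 0;
  }

  // Returns a copy of the pairs; throws std::out_of_range if the key is absent.
  FunctionBuffers get(const GraphId &gid, MappingType fbmt) const {
    return buffers_.at(Key{gid, fbmt});
  }

private:
  using Key = std::pair<GraphId, MappingType>;
  std::map<Key, FunctionBuffers> buffers_;
};

// Usage: a graph split into two fragment functions yields two pairs.
inline std::size_t demoFunctionBuffers() {
  FunctionBufferRegistry registry;
  registry.add("subgraph0", MappingType::Remote, {Function{0}, Function{1}});
  assert(registry.has("subgraph0", MappingType::Remote));
  assert(!registry.has("subgraph0", MappingType::Io));
  return registry.get("subgraph0", MappingType::Remote).size();
}
```

Keying on the pair rather than on the graph id alone lets the same graph carry separate buffer sets for different mapping types, which matches the two-argument lookups documented above.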

unsigned addCustomProgram(const poplar::program::Program &program)

Add a custom program.

Parameters

program – Program to add

Returns

Index of the PopART/Poplar program
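
One way to picture the returned index is as an offset past the fixed entries of ProgramIndex. The sketch below is an illustration only: it assumes that the i-th custom program receives the index CustomProgramsStart + i, which this reference does not state, and CustomProgram and ProgramTable are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Same enumerator order as PopPrograms::ProgramIndex.
enum ProgramIndex {
  WeightsFromHost = 0,
  OptimizerFromHost,
  RandomSeedFromHost,
  RandomSeedToHost,
  RngStateFromHost,
  Program,
  RngStateToHost,
  WeightsToHost,
  CycleCountTensorToHost,
  CustomProgramsStart,
  N
};

// Hypothetical stand-in for poplar::program::Program.
struct CustomProgram {
  std::string name;
};

class ProgramTable {
public:
  // ASSUMPTION: custom programs are numbered consecutively, starting at
  // CustomProgramsStart, in the order in which they are added.
  unsigned addCustomProgram(const CustomProgram &program) {
    custom_.push_back(program);
    return static_cast<unsigned>(CustomProgramsStart) +
           static_cast<unsigned>(custom_.size()) - 1u;
  }

private:
  std::vector<CustomProgram> custom_;
};

// Usage: returns the index handed out for the second custom program.
inline unsigned demoCustomPrograms() {
  ProgramTable table;
  const unsigned first = table.addCustomProgram(CustomProgram{"checkpoint"});
  assert(first == static_cast<unsigned>(CustomProgramsStart));
  return table.addCustomProgram(CustomProgram{"evaluate"});
}
```

Under this assumption the fixed programs keep stable indices regardless of how many custom programs are registered.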

void createPipelineFunctions()

Turn pipeline sequences into callable pipeline functions.

poplar::program::Sequence getFullProgramFromPipelineFragments(bool fwdOnly) const

Return the program based on the pipeline fragments.

See docs/notes/transforms/pipelining.md#assemble-from-fragments for detailed explanation.

Returns

The program based on the pipeline fragments

Public Members

IrLowering *ir_lowering_p

Public Static Attributes

static const std::unordered_map<popef::ProgramFlow::ProgramIndexType, std::string> commonPrograms
#include <popart/popx/inittensor.hpp>
class ICreatorCandidate

Subclassed by popart::popx::InputCreatorCandidate, popart::popx::InputMultiCreatorCandidate

Public Functions

ICreatorCandidate()
virtual ~ICreatorCandidate() = default
virtual std::pair<poplar::Tensor, ViewChangers> createInput(const poplar::DebugNameAndId &dnai) = 0
virtual DnfTensorIds mustExistBeforeCreate() = 0
virtual double getMaxCreatorPriority() const = 0
virtual int64_t getNumElems() const = 0
virtual std::vector<std::vector<OpxInAndOutIndex>> getPathsFromInput() = 0
virtual std::string str() = 0
virtual std::pair<poplar::Tensor, ViewChangers> unwind(poplar::Tensor) = 0
virtual std::vector<popart::view::Region> unwind(popart::view::Region) = 0
virtual std::vector<popart::view::Region> unwind() = 0
virtual int64_t getScheduleIndex() const = 0

Public Static Functions

static bool greaterThan(ICreatorCandidatePtr, ICreatorCandidatePtr)
#include <popart/popx/replicatedtensorshardingbundle.hpp>
class ReplicatedTensorShardingBundle

Helper class to bundle all replicated tensor sharding related lowering information together.

Public Functions

ReplicatedTensorShardingBundle(const Ir &ir)

Construct an empty replicated tensor sharding bundle. Creates the ReplicatedTensorShardingTracer with the IR object.

Parameters

ir – IR to create the ReplicatedTensorShardingTracer with

bool hasCollectiveBalancedReorder(const TensorId &tensorId) const

Check whether a tensor has an associated CollectiveBalancedReorder.

Parameters

tensorId – TensorId to check

Returns

True if the tensor has an associated CollectiveBalancedReorder

std::shared_ptr<gcl::CollectiveBalancedReorder> getCollectiveBalancedReorder(const TensorId &tensorId) const

Get the associated CollectiveBalancedReorder of a tensor.

Throws an error if the tensor does not have one.

Parameters

tensorId – TensorId to return the CollectiveBalancedReorder for

Returns

Shared pointer to the associated CollectiveBalancedReorder

const gcl::CollectiveBalancedHostRearrangement &getCollectiveBalancedHostRearrangement(const TensorId &tensorId) const

Get the host rearrangement method of a tensor.

The rearrangement can be applied to the host-side tensor data before it is uploaded to the IPU or after it is downloaded from the IPU.

Parameters

tensorId – TensorId to return the CBR host rearrangement for

Returns

CBR host rearrangement method

void setCollectiveBalancedReorder(const TensorId &tensorId, CollectiveBalancedReorderId cbrId)

Associate an existing CollectiveBalancedReorder with a tensor.

Parameters
  • tensorId – TensorId to associate the CollectiveBalancedReorder with

  • cbrId – Identifier of an existing, registered CollectiveBalancedReorder obtained by registerCollectiveBalancedReorder

CollectiveBalancedReorderId registerCollectiveBalancedReorder(std::shared_ptr<gcl::CollectiveBalancedReorder> cbr)

Register a new collective balanced reorder method.

Parameters

cbr – GCL CollectiveBalancedReorder to register

Returns

Registered ID for the CollectiveBalancedReorder

inline const std::map<CollectiveBalancedReorderId, std::shared_ptr<gcl::CollectiveBalancedReorder>> &getCollectiveReorders() const

Returns

Mapping of all registered CollectiveBalancedReorderIds to their CollectiveBalancedReorder

inline const ReplicatedTensorShardingTracer &getReplicatedTensorShardingTracer() const

Returns

Tracer to resolve replicated tensor sharding groups

inline ReplicatedTensorShardingTracer &getReplicatedTensorShardingTracer()

Returns

Tracer to resolve replicated tensor sharding groups

inline const std::map<TensorId, CollectiveBalancedReorderId> &getCollectiveReorderIds() const

Get mapping to resolve which CollectiveBalancedReorder has to be applied to a tensor to restore the original data order.

Returns

Mapping of all tensors and their associated CollectiveBalancedReorderId
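
The register-then-associate flow described above can be sketched with two maps: one from ID to reorder object and one from tensor to ID. This is a simplified illustration under stated assumptions: Reorder stands in for gcl::CollectiveBalancedReorder, the integer ID type and the class name ShardingBundleSketch are hypothetical, and the exception types are chosen for the sketch rather than taken from PopART.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

using TensorId = std::string;
using CollectiveBalancedReorderId = int; // Hypothetical ID type.

// Hypothetical stand-in for gcl::CollectiveBalancedReorder.
struct Reorder {
  std::size_t numElements;
};

class ShardingBundleSketch {
public:
  // Register a reorder object and hand back a fresh ID for it.
  CollectiveBalancedReorderId registerReorder(std::shared_ptr<Reorder> cbr) {
    const CollectiveBalancedReorderId id = nextId_++;
    reorders_[id] = std::move(cbr);
    return id;
  }

  // Associate an already registered reorder with a tensor.
  void setReorder(const TensorId &tensorId, CollectiveBalancedReorderId id) {
    if (reorders_.count(id) == 0) {
      throw std::runtime_error("unregistered reorder id");
    }
    ids_[tensorId] = id;
  }

  bool hasReorder(const TensorId &tensorId) const {
    return ids_.count(tensorId) > 0;
  }

  // Throws if the tensor has no associated reorder.
  std::shared_ptr<Reorder> getReorder(const TensorId &tensorId) const {
    const auto it = ids_.find(tensorId);
    if (it == ids_.end()) {
      throw std::runtime_error("no reorder for tensor " + tensorId);
    }
    return reorders_.at(it->second);
  }

private:
  CollectiveBalancedReorderId nextId_ = 0;
  std::map<CollectiveBalancedReorderId, std::shared_ptr<Reorder>> reorders_;
  std::map<TensorId, CollectiveBalancedReorderId> ids_;
};

// Usage: two tensors share one registered reorder; a third has none.
inline bool demoShardingBundle() {
  ShardingBundleSketch bundle;
  const auto id = bundle.registerReorder(std::make_shared<Reorder>(Reorder{1024}));
  bundle.setReorder("weight", id);
  bundle.setReorder("weight_accl", id);
  bool threw = false;
  try {
    bundle.getReorder("bias");
  } catch (const std::runtime_error &) {
    threw = true;
  }
  return bundle.hasReorder("weight") && !bundle.hasReorder("bias") &&
         bundle.getReorder("weight") == bundle.getReorder("weight_accl") && threw;
}
```

Separating registration from association lets several tensors, for example a weight and its optimizer state, refer to one reorder object through a shared ID.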

#include <popart/popx/linearmapper.hpp>
class LinearMapper

Public Functions

void mapTensor(poplar::Graph &graph, poplar::Tensor &tensor)

14.8. Ops

14.8.1. Op definition for PopART IR

#include <popart/op.hpp>
class Op : public popart::Vertex

Parent class for the concrete Op implementations.

The Poplar implementation which the op represents can be found in the corresponding popx::Opx class, which lowers the op to Poplar.

Subclassed by popart::AbortOp, popart::AbsGradOp, popart::AdaDeltaUpdaterOp, popart::AdamUpdaterOp, popart::AddBiasOp, popart::AllReduceOp, popart::ArgExtremaOp, popart::AveragePoolGradOp, popart::BaseOnnxRNNGradOp, popart::BaseOnnxRNNOp, popart::BasePadOp, popart::BaseSliceOp, popart::BaseSortOp, popart::BatchNormGradOp, popart::BatchNormOp, popart::BinaryComparisonOp, popart::BoundaryOp, popart::BucketizeOp, popart::CastOp, popart::CastThenPow2ScaleOp, popart::CollectivesBaseOp, popart::ConcatGradOp, popart::ConcatOp, popart::ConvFlipWeightsOp, popart::ConvTransposeOp, popart::CoshOp, popart::CtcBeamSearchDecoderOp, popart::CtcGradOp, popart::CumSumGradOp, popart::CumSumOp, popart::DynamicBaseOp, popart::ElementWiseBinaryBaseOp, popart::ElementWiseBinaryGradOp, popart::ElementWiseNonLinearUnaryGradOp, popart::ElementWiseUnaryBooleanOp, popart::ElementWiseUnaryOp, popart::ExchangeBaseOp, popart::ExpandGradOp, popart::ExpandOp, popart::ExpGradOp, popart::Expm1GradOp, popart::GatherGradOp, popart::GatherOp, popart::GetRandomSeedOp, popart::GlobalAveragePoolGradOp, popart::GlobalAveragePoolOp, popart::GlobalMaxPoolGradOp, popart::GlobalMaxPoolOp, popart::GroupNormGradOp, popart::GroupNormOp, popart::HasReceptiveFieldOp, popart::HistogramOp, popart::IdentityLossGradOp, popart::IfOp, popart::InitOp, popart::InstanceNormGradOp, popart::InstanceNormOp, popart::InternalCodeCopyOp, popart::IoTileCopyOp, popart::IpuCopyOp, popart::L1GradOp, popart::LambSquareOp, popart::LeakyReluGradOp, popart::LogSoftmaxGradOp, popart::LossOp, popart::LossScaleUpdateOp, popart::LRNGradOp, popart::LRNOp, popart::MatMulBaseOp, popart::MaxPoolGradOp, popart::ModifyRandomSeedOp, popart::MultiConvBaseOp, popart::MultiConvDataGradBaseOp, popart::MultiConvWeightsGradBaseOp, popart::NllGradOp, popart::NlllWithSoftmaxGradDirectOp, popart::NormalizeImageOp, popart::OnehotGradOp, popart::OnehotOp, popart::PackedDataBlockOp, popart::ParameterizedOp< TDerivedOp, TOpParams >, popart::PlaceholderOp, 
popart::PopartLSTMGradOp, popart::PopartLSTMOp, popart::Pow2ScaleThenCastOp, popart::ReduceGradOp, popart::ReduceOp, popart::ReluGradOp, popart::ReshapeBaseOp, popart::ResizeOp, popart::RestoreOp, popart::ReverseBaseOp, popart::RMSPropUpdaterOp, popart::RoiAlignGradOp, popart::RoiAlignOp, popart::ScaledAddOp, popart::ScatterDataGradOp, popart::ScatterReduceGradOp, popart::ScatterReduceOp, popart::ScatterUpdateGradOp, popart::SequenceSliceOp, popart::SGD1NesterovOp, popart::ShapeOrLikeOp, popart::SigmoidGradOp, popart::SoftmaxGradDirectOp, popart::SoftmaxGradOp, popart::SplineBasisOp, popart::SplineWeightingOp, popart::SplitGradOp, popart::SplitOp, popart::SqrtGradOp, popart::StashOp, popart::SubgraphOp, popart::SubsampleBaseOp, popart::SubsampleGradOp, popart::SyncOp, popart::TanhGradOp, popart::TensorRemapOp, popart::TileOp, popart::TopKGradOp, popart::TransposeBaseOp, popart::UpsampleOp, popart::VariadicGradOp, popart::VariadicOp, popart::VarUpdateOp, popart::WhereOp, popart::WhereXGradOp, popart::WhereYGradOp

Public Types

using SubgraphInSig = std::tuple<Op*, fwtools::subgraph::OutIndex, std::string>

The functionality required for sub-graph matching.

Public Functions

inline Settings &getSettings()

Get the settings associated with the op.

Returns

The op settings.

inline const Settings &getSettings() const

Get the settings associated with the op.

Returns

The op settings.

virtual Settings getInSettings(InIndex) const

Return suitable settings for an op inserted before the input to an existing op.

Parameters

InIndex – The input index before which the op is inserted.

Returns

The settings for the op inserted before the input index.

virtual Settings getOutSettings(OutIndex) const

Return suitable settings for an op inserted after the output to an existing op.

Parameters

OutIndex – The output index after which the op is inserted.

Returns

The settings for the op inserted after the output index.

Settings adjustInSettings(InIndex, Op::Settings) const

Adjust the settings to be suitable as input at the input index.

Parameters
  • InIndex – The input index where the settings are to be applied.

  • Settings – The settings to be adjusted.

Returns

Adjusted settings suitable for input at the input index.

Settings adjustOutSettings(InIndex, Op::Settings) const

Adjust the settings to be suitable as output at an output index.

Parameters
  • OutIndex – The output index where the settings are to be applied.

  • Settings – The settings to be adjusted.

Returns

Adjusted settings suitable for output at the output index.

const OptionalVGraphId getOptionalVGraphId() const

Get the ID of the optional virtual graph.

Returns

The ID of the optional virtual graph.

VGraphId getVirtualGraphId() const

Get the ID of the virtual graph.

Returns

The ID of the virtual graph.

VGraphIdAndTileSet getIntrospectionInVirtualGraphId(InIndex) const

Get virtual graph ID and tile set associated with an input index.

Parameters

InIndex – The input index.

Returns

The virtual graph ID and tile set at the input index.

VGraphIdAndTileSet getIntrospectionOutVirtualGraphId(OutIndex) const

Get virtual graph ID and tile set associated with an output index.

Parameters

OutIndex – The output index.

Returns

The virtual graph ID and tile set at the output index.

virtual VGraphIdAndTileSet getIntrospectionInVirtualGraphId(InIndex, std::set<OpId> &visited) const

Get virtual graph ID and tile set associated with an input index.

Parameters
  • InIndex – The input index.

  • visited – The set of labels associated with this operator to distinguish it from other operators in the virtual graph.

Returns

The virtual graph ID and tile set at the input index.

virtual VGraphIdAndTileSet getIntrospectionOutVirtualGraphId(OutIndex, std::set<OpId> &visited) const

Get virtual graph ID and tile set associated with an output index.

Parameters
  • OutIndex – The output index.

  • visited – The set of labels associated with this operator to distinguish it from other operators in the virtual graph.

Returns

The virtual graph ID and tile set at the output index.

void setVirtualGraphId(const OptionalVGraphId)

Set a virtual graph ID for the op.

Parameters

OptionalVGraphId – The ID of the virtual graph to set on this op.

bool hasVirtualGraphId() const

Check if the op has a virtual graph ID set.

Returns

true if the op has a virtual graph ID set, false otherwise.

void setPipelineStage(OptionalPipelineStage)

Set a pipeline stage for the op.

Parameters

OptionalPipelineStage – The pipeline stage to be set for the op.

bool hasPipelineStage() const

Check if the op has a pipeline stage set.

Returns

true if the op has a pipeline stage set, false otherwise.

PipelineStage getPipelineStage() const

Get the pipeline stage that has been set for the op.

Returns

The pipeline stage that has been set for the op.

OptionalPipelineStage getOptionalPipelineStage() const

Get the optional pipeline stage.

Returns

The optional pipeline stage that has been set for the op.
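
The set/has/get/getOptional quartet above follows a common optional-attribute pattern, which recurs for the execution phase, batch serialized phase and stochastic rounding method. The sketch below shows that pattern with std::optional (C++17). It is an illustration under assumptions: OpSketch is a hypothetical class, the integer stage type is a stand-in, and throwing from the plain getter when no stage is set is an assumed behaviour that this reference does not specify.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>

using PipelineStage = std::int64_t; // Stand-in type for the sketch.
using OptionalPipelineStage = std::optional<PipelineStage>;

class OpSketch {
public:
  // Passing std::nullopt clears the attribute.
  void setPipelineStage(OptionalPipelineStage stage) { stage_ = stage; }

  bool hasPipelineStage() const { return stage_.has_value(); }

  // Never throws: the caller inspects the optional.
  OptionalPipelineStage getOptionalPipelineStage() const { return stage_; }

  // ASSUMPTION: the plain getter rejects an unset attribute by throwing.
  PipelineStage getPipelineStage() const {
    if (!stage_) {
      throw std::logic_error("pipeline stage not set");
    }
    return *stage_;
  }

private:
  OptionalPipelineStage stage_;
};

// Usage: guard the plain getter with the has-check.
inline PipelineStage stageOrDefault(const OpSketch &op, PipelineStage fallback) {
  return op.hasPipelineStage() ? op.getPipelineStage() : fallback;
}
```

Code that copies placement attributes from one op to another would typically pass the optional value straight from getOptionalPipelineStage() to setPipelineStage(), so that an unset attribute stays unset.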

const OptionalExecutionPhase getOptionalExecutionPhase() const

Get the optional execution phase.

Returns

The optional execution phase that has been set for the op.

virtual ExecutionPhase getExecutionPhase() const

Get the execution phase that has been set for the op.

Returns

The execution phase that has been set for the op.

void setExecutionPhase(const OptionalExecutionPhase)

Set the execution phase for the op.

Parameters

OptionalExecutionPhase – The execution phase to be set for the op.

bool hasExecutionPhase() const

Check if the op has an execution phase set.

Returns

true if the op has an execution phase set, false otherwise.

const OptionalBatchSerializedPhase getOptionalBatchSerializedPhase() const

Get the optional batch serialized phase.

Returns

The optional batch serialized phase that has been set for the op.

virtual BatchSerializedPhase getBatchSerializedPhase() const

Get the batch serialized phase.

Returns

The batch serialized phase that has been set for the op.

void setBatchSerializedPhase(const OptionalBatchSerializedPhase)

Set the batch serialized phase.

Parameters

OptionalBatchSerializedPhase – The batch serialized phase to be set for the op.

bool hasBatchSerializedPhase() const

Check if the op has a batch serialization phase set.

Returns

true if the op has a batch serialization phase set, otherwise false.

const OptionalStochasticRoundingMethod getOptionalStochasticRoundingMethod() const

Get the optional stochastic rounding method.

Returns

The optional stochastic rounding method that has been set for the op.

virtual StochasticRoundingMethod getStochasticRoundingMethod() const

Get the stochastic rounding method.

Returns

The stochastic rounding method that has been set for the op.

void setStochasticRoundingMethod(const OptionalStochasticRoundingMethod)

Set the optional stochastic rounding method.

Parameters

OptionalStochasticRoundingMethod – The optional stochastic rounding method to be set for the op.

bool hasStochasticRoundingMethod() const

Check if the op has a stochastic rounding method set.

Returns

true if the op has a stochastic rounding method set, otherwise false.

bool isExcludedFromPattern(const Pattern*) const

Check if the op is excluded from a pattern.

Returns

true if the op is excluded from a pattern, false otherwise.

inline virtual int getInBatchAxis(InIndex) const

Get the batch axis for the input index.

Returns

The batch axis for the input index.

inline virtual int getOutBatchAxis(OutIndex) const

Get the batch axis for the output index.

Returns

The batch axis for the output index.

void inheritPlacementAttributes(bool inheritSerializations, AliasModel &aliasModel)

Helper function to set an op’s placement attributes by inheriting them from other ops in the graph.

The attributes that are set include:

  • Execution context.

  • Pipeline stage.

  • Execution phase.

  • Virtual graph ID.

  • Batch serial phase (optional).

Parameters
  • inheritSerializations – The indicator to enable or disable the batch serialization phase. true enables the batch serialization phase and false disables it.

  • aliasModel – An AliasModel object containing alias info for this op’s graph.

Ir &getIr()

Get the IR associated with the op.

Returns

The IR associated with the op.

const Ir &getIr() const

Get the IR associated with the op.

Returns

The IR associated with the op.

inline Graph &getGraph()

Get the graph associated with the op.

Returns

The graph associated with the op.

inline const Graph &getGraph() const

Get the graph associated with the op.

Returns

The graph associated with the op.

inline const Scope &getScope() const

Get the scope associated with the op.

Returns

The scope associated with the op.

inline void setScope(const Scope &scope)

Set the scope associated with the op.

Parameters

scope – The scope to associate with the op.

inline const std::string &getName() const

Get the name of the op.

Returns

The name of the op.

inline void setName(const std::string &name)

Set the name of the op.

Parameters

name – The name to give the op.

inline const OpDebugInfo &getDebugInfo() const

Get the debug info of the op.

Returns

The debug info for the op.

virtual bool isNorm() const

Checks if the op is a norm op.

Returns

true if the op is a norm op, false otherwise.

bool isElementWiseUnary() const

Checks if the op is an element-wise unary op.

Returns

true if the op is an element-wise unary op, false otherwise.

virtual bool canBeReplacedByIdentity() const

Check if the op can be replaced by the identity op.

Returns

true if the op can be replaced by the identity op, false otherwise.

Op(const OperatorIdentifier &_opid, const Op::Settings &settings)

Constructor of the Op class.

Parameters
  • _opid – The operator identifier specifying domain:type:version, minimum and maximum number of input tensors and number of output tensors.

  • settings – The general op settings such as graph, name and scope.

Op(const Op&)

Copy constructor.

Note

This does NOT copy input and output.

Op &operator=(const Op&) = delete
virtual ~Op()

Destructor.

std::string str() const final

Return the op ID.

std::string debugName() const

Return the op name that is used for debug and profiling.

void createAndConnectOutTensor(OutIndex, TensorId)

Create an ActGrad (output) tensor and connect it to this op’s output.

Parameters
  • OutIndex – The output index that the output tensor should be connected to.

  • TensorId – The tensor ID of the tensor to be converted to an output tensor.

void append(std::stringstream &ss) const

Append this op to a stream.

Parameters

ss – The stream to append the op to.

void toJSON(std::stringstream &ss) const

Convert this op to JSON format and append it to a stream.

Parameters

ss – The stream to append the JSON-serialised op to.

int64_t memOfOutputs() const

Return the total memory used by all output tensors.

inline virtual std::set<InIndex> optionalInputs() const

Return the input indices of all optional inputs to the op.

void defaultConnectInTensor(InIndex, TensorId)

Connect a tensor to an input index.

This method updates the input and updates consumers of the tensor with the tensor ID.

Parameters
  • InIndex – The input index to connect the tensor to.

  • TensorId – The tensor ID of the tensor to connect.

virtual void connectInTensor(InIndex index, TensorId tensorId)

Connect an existing tensor to an input index.

Parameters
  • index – The input index at which to connect the tensor.

  • tensorId – The ID of the existing tensor.

virtual void connectInTensor(InIndex inIndex, TensorId tensorId, VGraphId vgid)

Connect an existing tensor to an index with the source virtual graph.

Parameters
  • inIndex – The input index at which to connect the tensor.

  • tensorId – The ID of the existing tensor.

  • vgid – The virtual graph on which the existing tensor resides.

void connectInTensorDispatch(InIndex inIndex, TensorId tensorId)

Connect an existing tensor at an index with the source virtual graph.

Dispatcher to resolve issues with templated inheritance overloads. This will automatically derive the virtual graph ID of the input when required.

Parameters
  • inIndex – The input index at which to connect the tensor.

  • tensorId – The ID of the existing tensor.

void connectInTensorLike(const Op *other, InIndex index, TensorId tenId)

Connects the input tensor analogously to another op.

This is useful when cloning graphs or ops, because it avoids having to check if the op requires special considerations when connecting inputs.

IpuCopyOp is currently the only op where this applies, because connecting its inputs otherwise requires a source virtual graph to be specified:

void connectInTensor(InIndex, TensorId, uint64_t sourceIpu);

Parameters
  • other – An op of the same type as the current op, from which to copy how the tensor at the corresponding index should be connected.

  • index – The input index to connect.

  • tenId – The ID of the tensor to connect.

void connectOutTensor(OutIndex, TensorId)

Connect an existing tensor to an output index.

Parameters
  • OutIndex – The output index at which to connect the tensor.

  • TensorId – The ID of the existing tensor.

void disconnectInTensor(Tensor *tensor)

Disconnect an input tensor from the op.

Parameters

tensor – The tensor to disconnect.

virtual void disconnectInTensor(InIndex, Tensor *tensor)

Disconnect an input tensor from the op at a specific input index.

Parameters
  • tensor – The tensor to disconnect.

  • InIndex – The index of the input tensor in the op.

void disconnectInTensor(InIndex)

Disconnect an input tensor from the input index.

Parameters

InIndex – The input index to disconnect the tensor from.

void disconnectOutTensor(Tensor *tensor)

Disconnect an output tensor from the op.

Parameters

tensor – The tensor to disconnect.

void disconnectAllInputs()

Disconnect all input tensors from the op.

void disconnectAllOutputs()

Disconnect all output tensors from the op.

const std::string &name() const

Return the op name.

virtual void setup()

Set the shape and type of the arguments to the op.

This MUST set the type and shape information for all the output TensorInfo objects.

void finalizeDebugInfo()

Finalize DebugInfo.

This method is called once after Ir::prepare() has completed.

virtual void setCalledSubgraphGradInfo(const FwdGraphToBwdGraphInfo &calledGraphsGradInfo)

Set information about the gradient graphs for this op’s called subgraphs.

If the op has called subgraphs, then this method will get called prior to getGradOps() to provide the op with the information it needs to call the grad version of the called subgraphs.

Parameters

calledGraphsGradInfo – The mapping between the forward graph and information on the gradient graph.

virtual std::vector<std::unique_ptr<Op>> getGradOps()

Determine the corresponding grad op for each op in the forward graph to automatically generate the backward pass.

There can be a separate gradient op for each input or a single gradient op that generates gradients for all inputs.

The mapping from the index of each output tensor of the gradient op to the index of each input tensor of the non-grad op is configured using the gradOutToNonGradIn() method that should be overridden in the grad op definitions.

Throws an error if this op is already a gradient op.

virtual std::vector<std::tuple<OperatorIdentifier, float>> inplacePriorityDefault() const

Return the variants of this op (if any) which can modify / alias the inputs at the given indices.

This function doesn’t check for anchor violations or topological order violations. If the op can be replaced by an in-place variant of itself, this method should be overridden to return a vector of <OperatorIdentifier, float> tuples. When there are several variants, they should be returned in descending order of preference.
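The ordering convention described above can be illustrated with a minimal standalone sketch. It does not use PopART: a plain string stands in for OperatorIdentifier, and the function name, variant names and priority values are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Stand-in for std::tuple<OperatorIdentifier, float>; a plain string
// replaces the real OperatorIdentifier type (assumption for illustration).
using VariantPriority = std::tuple<std::string, float>;

// Return the candidates sorted so that the most preferred variant
// (highest priority value) comes first.
std::vector<VariantPriority>
sortByDescendingPriority(std::vector<VariantPriority> variants) {
  std::stable_sort(variants.begin(), variants.end(),
                   [](const VariantPriority &a, const VariantPriority &b) {
                     return std::get<1>(a) > std::get<1>(b);
                   });
  return variants;
}
```

A stable sort is used here so that variants with equal priority keep their original relative order.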

virtual std::unique_ptr<Op> getInplaceVariant(const OperatorIdentifier&) const

Instantiate a particular in-place variant of the op with a specified OperatorIdentifier from the vector returned by inplacePriorityDefault().

Parameters

OperatorIdentifier – The operator identifier of the op to be instantiated.

Returns

An instance of the required op.

virtual void growAliasModel(AliasModel &aliasModel) const

For certain tasks which involve analysing how tensors alias each other, such as inplacing, a poprithms::memory::inplace::Graph that corresponds to this op’s graph is constructed.

The Poprithms graph can then be queried for aliasing information, and can have algorithms run on it.

To construct the Poprithms graph, each PopART op defines what its Poprithms equivalent ops are. This method inserts this op’s poprithms::memory::inplace::Op equivalents into the Poprithms Graph, which is the container popAliaser.

See also

AliasModel.

Parameters

aliasModel – The mapping between this op’s (PopART) graph and the Poprithms graph.

Pre

All input tensors of this op have mappings in aliasModel before this method is called.

Post

All output tensors of this op have mappings in aliasModel after this method has been called.

virtual poprithms::memory::inplace::Proposal mapInplaceProposal(const AliasModel &aliasModel, OperatorIdentifier) const

Translate a PopART inplacing proposal into an AliasModel equivalent.

The proposal replaces this outplace op with an inplace op of the type given by the operator identifier.

This method is defined as a void method which sets a value passed by reference, as opposed to a getter method, so that no Poprithms headers need to be included in this file.

Parameters
  • aliasModel – The mapping between this op’s (PopART) graph and the Poprithms graph.

  • OperatorIdentifier – The operator identifier to translate to the AliasModel equivalent.

Returns

A tuple where the first element corresponds to an alias gate in the AliasModel and the second element is an input index.

virtual view::Regions modifies(InIndex) const

Return the input region which this op modifies (for inplace ops).

Parameters

InIndex – The input index.

Returns

The regions which this op modifies.

virtual view::Regions uses(InIndex) const

Return the input region which this op uses.

Parameters

InIndex – The input index.

Returns

The regions which this op uses.

virtual view::Regions aliases(InIndex, OutIndex) const

Return the input region which the op output will alias (for inplace and view-changing ops).

See also

For more information on views, refer to the IPU Programmer’s Guide.

Parameters
  • InIndex – The input index.

  • OutIndex – The output index.

Returns

The regions which the output will alias.

virtual view::RegMap fwdRegMap(InIndex, OutIndex) const

Map regions of the input tensor at the input index to the regions of the output tensor at the output index that these input regions alias.

Parameters
  • InIndex – The op input index.

  • OutIndex – The op output index.

virtual view::RegMap bwdRegMap(InIndex, OutIndex) const

Map regions of the output tensor at the output index to the regions of the input tensor at the input index that these output regions alias.

Parameters
  • InIndex – The op input index.

  • OutIndex – The op output index.

virtual std::tuple<ReplEqOutputMap, ReplEqModifiedInputMap> fwdPropagateIsReplicaEqual(const AliasModel &aliasModel, const ReplEqInputMap &inputMap, ReplicaEqualAnalysisProxy &proxy) const

Determine whether output tensors are guaranteed to have an equal value across all replicas.

This means that they are “replica equal”. The check is based on information about the replica equal status of input tensors (and the same for any inputs that are modified by the op).

The default implementation sets each output tensor as being replica-equal if and only if all tensor inputs are replica-equal. For modified inputs, the default is to assume it is replica-equal only if there is an output that is deemed replica-equal that fully aliases all elements of the input. This default implementation is not correct for all ops. Ops that need a specialized implementation should override this virtual function.

Parameters
  • aliasModel – An alias model object.

  • inputMap – A map that stores, for each input, whether the inputs are data-equivalent over all replicas.

  • proxy – A helper object passed in by the replica-equal analysis.

Returns

A tuple comprising:

  1. a mapping from output index to a replica-equal status with an entry for each output tensor.

  2. a vector of input indices for inputs that were modified by the op to a value that is not replica-equal.
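As a minimal sketch of the default rule for outputs described above, the following standalone code marks every output as replica-equal if and only if all inputs are replica-equal. It does not use PopART: the map type and function name are hypothetical stand-ins, and the handling of modified inputs is deliberately left out.

```cpp
#include <cassert>
#include <map>

// Stand-in for the replica-equal maps: index -> "is replica-equal"
// (assumption for illustration, not the PopART types).
using ReplEqMap = std::map<int, bool>;

// Default rule for outputs only: each output is replica-equal if and only if
// all inputs are replica-equal. Modified inputs are not modelled here.
ReplEqMap propagateReplicaEqual(const ReplEqMap &inputs, int numOutputs) {
  assert(numOutputs >= 0);
  bool allInputsEqual = true;
  for (const auto &entry : inputs) {
    allInputsEqual = allInputsEqual && entry.second;
  }
  ReplEqMap outputs;
  for (int out = 0; out < numOutputs; ++out) {
    outputs[out] = allInputsEqual;
  }
  return outputs;
}
```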

bool doesAlias() const

Check if any input tensor aliases any output tensor.

Returns

true if any input tensor aliases any output tensor, otherwise false.

inline bool isOutplace() const

Check if this is an outplace op.

This means that no input tensor aliases any output tensor.

Returns

true if this is an outplace op, otherwise false.

bool doesAlias(InIndex inIndex, OutIndex outIndex) const

Check if the input tensor at an input index aliases the output tensor at an output index.

Returns

true if the input tensor at inIndex aliases the output tensor at outIndex, false otherwise.

bool modifies() const

Check if the op modifies a tensor at any index.

Returns

true if the op modifies a tensor at any index, otherwise false.

bool modifiesIndex(InIndex in) const

Check if an op modifies a tensor at a specific index.

Parameters

in – The input index to check.

Returns

true if the op modifies the tensor, false otherwise.

bool overwritesTensor(Tensor *t) const

Check if an op overwrites a tensor.

Parameters

t – The tensor to check.

Returns

true if it overwrites the tensor, false otherwise.

bool modifiesTensor(Tensor *t) const

Check if an op modifies a tensor.

Parameters

t – The tensor to check.

Returns

true if it modifies the tensor, false otherwise.

inline virtual bool isInplaceViewChange() const

Check if this is an inplace op that changes a view.

Examples of inplace ops that change views are:

  • ReshapeInplaceOp

  • IdentityInplaceOp

  • TransposeInplaceOp

See also

For more information on views, refer to the IPU Programmer’s Guide.

Returns

true if this is a view changing inplace op, false otherwise.

inline virtual bool isOutplaceViewChange() const

Check if this is an outplace op that changes a view.

Examples of outplace ops that change views are:

Returns

true if this is a view changing outplace op, otherwise false.

virtual int getNonGradInIndex(int gradOpOutIndex) const

Return the index in the non-grad op which has an output edge-gradient tensor in the matching grad op.

This method throws an error if the op this is called on is not a grad op.

Parameters

gradOpOutIndex – The index at which the grad op has an output of an edge-gradient tensor.

Returns

The index in the non-grad op containing the input tensor corresponding to the edge-gradient tensor in the grad op output.

virtual const std::vector<GradInOutMapper> &gradInputInfo() const

Get the mapping between input indices in the grad op (for inputs, outputs and grad outputs) to the input indices in the corresponding non-grad op.

This method throws an error if the op this is called on is not a grad op.

Returns

The mapping between input indices in the grad op (for inputs, outputs and grad outputs) to the input indices in the corresponding non-grad op.

virtual const std::map<int, int> &gradOutToNonGradIn() const

Get the mapping between the grad op outputs and the inputs of the corresponding non-grad op.

This method throws an error if the op this is called on is not a grad op.
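The kind of mapping returned by gradOutToNonGradIn() can be sketched as follows. This is a standalone illustration that does not use PopART: the function names and the mapping itself are hypothetical, for a grad op whose only output is the gradient of input 1 of the non-grad op.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>

// Hypothetical mapping for a grad op: grad output 0 holds the gradient of
// non-grad input 1. Keys are grad op output indices, values are non-grad
// op input indices.
const std::map<int, int> &exampleGradOutToNonGradIn() {
  static const std::map<int, int> outInfo = {{0, 1}};
  return outInfo;
}

// Look up which non-grad input a grad op output corresponds to.
int nonGradInIndexFor(int gradOpOutIndex) {
  const auto &mapping = exampleGradOutToNonGradIn();
  const auto found = mapping.find(gradOpOutIndex);
  if (found == mapping.end()) {
    throw std::out_of_range("no non-grad input for this grad op output");
  }
  return found->second;
}
```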

virtual std::unique_ptr<Op> clone() const = 0

Return a copy of the op.

This method must be implemented. The compiler throws an error if this method is not implemented.
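The general C++ pattern behind a pure virtual clone() can be sketched as below. The classes are simplified stand-ins, not the PopART Op class, and all names are hypothetical. A derived class that does not implement clone() remains abstract, so attempting to instantiate it fails to compile. The sketch assumes C++14 for std::make_unique.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Simplified stand-in for an op base class (not the PopART Op class).
class SketchOp {
public:
  explicit SketchOp(std::string name) : name_(std::move(name)) {}
  virtual ~SketchOp() = default;
  // Pure virtual: every concrete op must say how to copy itself.
  virtual std::unique_ptr<SketchOp> clone() const = 0;
  const std::string &name() const { return name_; }

private:
  std::string name_;
};

class SketchReluOp : public SketchOp {
public:
  using SketchOp::SketchOp;
  std::unique_ptr<SketchOp> clone() const override {
    // Copy-construct from *this so the dynamic type is preserved.
    return std::make_unique<SketchReluOp>(*this);
  }
};
```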

template<typename T>
inline bool isConvertibleTo() const
virtual bool isLossOp() const

Check if this is a LossOp op, for example NllOp or L1Op.

Note

The op SumOp which adds the losses together is not a LossOp.

Returns

true if this is a LossOp op, false otherwise.

virtual bool isIpuCopyOp() const

Check if this is an IpuCopyOp op.

Returns

true if this is an IpuCopyOp op, false otherwise.

virtual bool copiesOptimizerTensors() const

Check if this copies only optimizer tensors from one IPU to another.

Returns

true if this op copies only optimizer tensors from one IPU to another, false otherwise.

virtual bool isOptimizerOp() const

Check if the op is part of the optimizer.

bool isGradientClippingOp() const

Check if the op is part of gradient clipping.

virtual bool requiresRandomSeed() const

Check if the op requires a random seed.

This is set to false by default and should be overridden and set to true if an IPU random seed tensor is required by the op. If so, it will be connected to inTensor(getSeedInIndex()) by the IR process.

Returns

true if the op requires a random seed, false otherwise.

virtual InIndex getSeedInIndex() const

Get the input index at which the random seed tensor is connected.

If requiresRandomSeed() returns true, the IR process connects an IPU random seed tensor to inTensor(getSeedInIndex()).

Returns

The input index of the random seed tensor.

bool hasInput(InIndex index) const

Check if the op has an input at the input index.

Returns

true if the op has an input at the input index, otherwise false.

bool hasOutput(OutIndex index) const

Check if the op has an output at the output index.

Returns

true if the op has an output at the output index, otherwise false.

Tensor *inTensor(InIndex index)

Get the input tensor at the input index.

Parameters

index – The input index.

Returns

The tensor at the input index.

const Tensor *inTensor(InIndex index) const

Get the input tensor at the input index.

Parameters

index – The input index.

Returns

The tensor at the input index.

Tensor *outTensor(OutIndex index)

Get the output tensor at the output index.

Parameters

index – The output index.

Returns

The tensor at the output index.

const Tensor *outTensor(OutIndex index) const

Get the output tensor at the output index.

Parameters

index – The output index.

Returns

The tensor at the output index.

TensorId inId(InIndex index)

Get the ID of the input tensor at the input index.

Parameters

index – The input index.

Returns

The tensor ID of the tensor at the input index.

const TensorId inId(InIndex index) const

Get the ID of the input tensor at the input index.

Parameters

index – The input index.

Returns

The tensor ID of the tensor at the input index.

TensorId outId(OutIndex index)

Get the ID of the output tensor at the output index.

Parameters

index – The output index.

Returns

The tensor ID of the tensor at the output index.

const TensorId outId(OutIndex index) const

Get the ID of the output tensor at the output index.

Parameters

index – The output index.

Returns

The tensor ID of the tensor at the output index.

TensorInfo &inInfo(InIndex index)

Get the info of the input tensor at the input index.

Parameters

index – The input index.

Returns

The tensor info of the tensor at the input index.

const TensorInfo &inInfo(InIndex index) const

Get the info of the input tensor at the input index.

Parameters

index – The input index.

Returns

The tensor info of the tensor at the input index.

TensorInfo &outInfo(OutIndex index)

Get the info of the output tensor at the output index.

Parameters

index – The output index.

Returns

The tensor info of the tensor at the output index.

const TensorInfo &outInfo(OutIndex index) const

Get the info of the output tensor at the output index.

Parameters

index – The output index.

Returns

The tensor info of the tensor at the output index.

const Shape &inShape(InIndex index) const

Get the shape info of the input tensor at the input index.

Parameters

index – The input index.

Returns

The shape info of the tensor at the input index.

const Shape &outShape(OutIndex index) const

Get the shape info of the output tensor at the output index.

Parameters

index – The output index.

Returns

The shape info of the tensor at the output index.

size_t inTensorCount() const

Get the number of input tensors of this op.

Returns

The number of input tensors this op has.

size_t outTensorCount() const

Get the number of output tensors of this op.

Returns

The number of output tensors this op has.

Rank inRank(InIndex index) const

Get the rank of the input tensor at the input index.

Parameters

index – The input index.

Returns

The rank of the tensor at the input index.

Rank outRank(OutIndex index) const

Get the rank of the output tensor at the output index.

Parameters

index – The output index.

Returns

The rank of the tensor at the output index.

InIndex inIndex(Tensor*) const

Get the input index of the tensor.

Parameters

Tensor – The input tensor.

Returns

The input index of the tensor in the op.

OutIndex outIndex(Tensor*) const

Get the output index of the tensor.

Parameters

Tensor – The output tensor.

Returns

The output index of the tensor in the op.

virtual void appendAttributes(OpSerialiserBase&) const

Append attributes when serialising the op to a stream.

This is used for debugging and also to generate the PopART IR hash. This hash is used to determine whether a Poplar cache can be reused so it is important that op attributes which may alter the Poplar compilation are appended to this stream. If this method is overridden, then it must also call the base class method.

Parameters

OpSerialiserBase – The stream to which the attributes should be appended.
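The requirement that an override must also call the base class method can be sketched as follows. The serialiser and op types are simplified stand-ins invented for illustration; they are not PopART's OpSerialiserBase or Op, and the attribute names are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Simplified stand-in for a serialiser (not PopART's OpSerialiserBase).
struct SketchSerialiser {
  std::ostringstream stream;
  void appendAttribute(const std::string &key, const std::string &value) {
    stream << key << "=" << value << ";";
  }
};

struct SketchBaseOp {
  virtual ~SketchBaseOp() = default;
  virtual void appendAttributes(SketchSerialiser &os) const {
    os.appendAttribute("opType", "sketch");
  }
};

struct SketchScaleOp : SketchBaseOp {
  float scale = 2.0f;
  void appendAttributes(SketchSerialiser &os) const override {
    // Call the base class first so that its attributes are not lost.
    SketchBaseOp::appendAttributes(os);
    os.appendAttribute("scale", std::to_string(scale));
  }
};
```

If the derived class skipped the base call in this sketch, two ops that differ only in base attributes would serialise identically, which is the kind of collision the hash must avoid.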

virtual void appendOutlineAttributes(OpSerialiserBase&) const

Append the op attributes that are relevant for outlining ops.

Ops should override this function if there are additional attributes. Two ops with identical type and outline attributes can be outlined and are supposed to be functionally equivalent.

Parameters

OpSerialiserBase – The stream to which the attributes should be appended.

virtual void appendMore(OpSerialiserBase&) const

Append additional attributes to the stream.

This method should be overridden if the derived class has additional attributes.

Parameters

OpSerialiserBase – The stream to which the attributes should be appended.

Shape prettyNpOut(const Shape &s0, const Shape &s1) const

Calculate the NumPy broadcast shape for two shapes.

This will throw an error if the broadcast is not aligned. The error will have operator context. Note: If the replicated tensor sharding meta-shape is required, use prettyNpOut with TensorInfo instead.

Parameters
  • s0 – The first shape.

  • s1 – The second shape.

Returns

The NumPy-like broadcasted output shape.
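For reference, the NumPy broadcasting rule itself can be sketched in standalone C++ as follows. This is not the PopART implementation: the function name is hypothetical, Shape is assumed to be a vector of 64-bit integers, and the error type and message are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Assumed stand-in for a shape type.
using Shape = std::vector<int64_t>;

// NumPy broadcasting: align the shapes at the trailing dimension; two
// dimensions are compatible if they are equal or if one of them is 1.
Shape numpyBroadcastShape(const Shape &s0, const Shape &s1) {
  const std::size_t rank = std::max(s0.size(), s1.size());
  Shape result(rank, 1);
  for (std::size_t i = 0; i < rank; ++i) {
    // Read dimensions from the back; missing leading dimensions count as 1.
    const int64_t d0 = i < s0.size() ? s0[s0.size() - 1 - i] : 1;
    const int64_t d1 = i < s1.size() ? s1[s1.size() - 1 - i] : 1;
    if (d0 != d1 && d0 != 1 && d1 != 1) {
      throw std::invalid_argument("shapes are not broadcast-compatible");
    }
    result[rank - 1 - i] = (d0 == 1) ? d1 : d0;
  }
  return result;
}
```

Under this sketch, the shapes {2, 1, 4} and {3, 1} broadcast to {2, 3, 4}, while {2, 3} and {4, 3} raise an error because the leading dimensions 2 and 4 differ and neither is 1.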

TensorInfo prettyNpOut(const TensorInfo &i0, const TensorInfo &i1, bool checkDataType = true) const

Calculate the NumPy broadcast tensor info (shape and meta-shape) for two tensor infos.

This will throw an error if the broadcast is not aligned. The error will have operator context.

Parameters
  • i0 – The info for the first tensor containing shape and meta-shape.

  • i1 – The info for the second tensor containing shape and meta-shape.

  • checkDataType – Whether to check that the data types are identical. If true, an error is thrown if the data types differ. If false, the data types are not checked.

Returns

The NumPy-like broadcast output info containing the correct shape and meta-shape. The data type is taken from i0.

virtual std::vector<const Graph*> getCalledGraphs() const

Get all graphs that this op may call during its execution.

Returns

A vector of all graphs that this op may call during its execution.

std::vector<GraphId> getCalledGraphIds() const

Get the IDs of all graphs that this op may call during its execution.

Returns

A vector of IDs of all graphs that this op may call during its execution.

SubgraphIndex getCalledGraphIndex(const GraphId &id) const

Get the index in the op where the graph is called.

Parameters

id – The ID of the called graph.

Returns

The index at which the graph is called.

virtual InIndex opInToSubgraphInIndex(SubgraphIndex subgraphIndex, InIndex inIndex) const

Get the input index for the subgraph corresponding to the op input index.

Parameters
  • subgraphIndex – The index of the subgraph from the set of subgraphs called by this op (returned by getCalledGraphs()).

  • inIndex – The input index in the op.

Returns

The input index in the subgraph that corresponds to the input index in the op, or -1 if the op input index is not used by the subgraph.

virtual InIndex subgraphInToOpInIndex(SubgraphIndex subgraphIndex, InIndex inIndex) const

Get the input index for the op corresponding to the subgraph input index.

Parameters
  • subgraphIndex – The index of the subgraph from the set of subgraphs called by this op (returned by getCalledGraphs()).

  • inIndex – The input index in the subgraph.

Returns

The input index in the op that corresponds to the input index in the subgraph, or -1 if the subgraph input index is not used by the op.
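The two index mappings above, including the -1 convention for unused indices, can be sketched for a hypothetical op. The following standalone code does not use PopART. It assumes an op whose first input is consumed by the op itself and whose remaining inputs are passed to the subgraph in order. The constants and function names are invented for illustration.

```cpp
#include <cassert>

// Hypothetical layout: op input 0 is used only by the op itself, and op
// inputs 1, 2, ... map to subgraph inputs 0, 1, ...
constexpr int kNotUsed = -1;
constexpr int kNumOpOnlyInputs = 1;

// Op input index -> subgraph input index, or -1 if the subgraph does not
// use that op input.
int opInToSubgraphIn(int opInIndex) {
  assert(opInIndex >= 0);
  return opInIndex < kNumOpOnlyInputs ? kNotUsed
                                      : opInIndex - kNumOpOnlyInputs;
}

// Subgraph input index -> op input index. In this layout every subgraph
// input is fed by an op input, so -1 is never returned.
int subgraphInToOpIn(int subgraphInIndex) {
  assert(subgraphInIndex >= 0);
  return subgraphInIndex + kNumOpOnlyInputs;
}
```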

virtual OutIndex opOutToSubgraphOutIndex(SubgraphIndex subgraphIndex, OutIndex outIndex) const

Get the output index for the subgraph corresponding to the op output index.

Parameters
  • subgraphIndex – The index of the subgraph from the set of subgraphs called by this op (returned by getCalledGraphs()).

  • outIndex – The output index in the op.

Returns

The output index in the subgraph that corresponds to the output index in the op, or -1 if the op output index is not used by the subgraph.

virtual OutIndex subgraphOutToOpOutIndex(SubgraphIndex subgraphIndex, OutIndex outIndex) const

Get the output index for the op corresponding to the subgraph output index.

Parameters
  • subgraphIndex – The index of the subgraph from the set of subgraphs called by this op (returned by getCalledGraphs()).

  • outIndex – The output index in the subgraph.

Returns

The output index in the op that corresponds to the output index in the subgraph, or -1 if the subgraph output index is not used by the op.

virtual std::set<OutIndex> opInToOpOutIndex(InIndex in) const

Get the set of outputs to visit based on the input index (for graph traversal).

Parameters

in – The input index used to determine the set of outputs to visit.

Returns

The set of outputs to visit based on the input index.

virtual std::set<InIndex> opOutToOpInIndex(OutIndex out) const

Get the set of inputs to visit based on the output index (for graph traversal).

Parameters

out – The output index used to determine the set of inputs to visit.

Returns

The set of inputs to visit based on the output index.

std::string getSubgraphEquivId(const std::map<std::string, popart::any> &externalAttrs = {}) const

Get a string that represents the equivalence class that this op belongs to.

This is used, for example by transforms, to determine if two ops are the same. Two ops can be considered to be in the same equivalence class if and only if they return the same equivalence ID.

Parameters

externalAttrs – Additional attributes by which to distinguish this op. The value types must be one of: float, double, int, int64_t, uint32_t, uint64_t, std::string, std::vector<float>, std::vector<double>, std::vector<int64_t>, popart::Scope, bool, nonstd::optional<int64_t>, nonstd::optional<float>, nonstd::optional<double> or std::map<TensorId, uint64_t>. This is used to add, for example, replica-equalness properties to the equivalence ID; such properties are calculated on the fly rather than stored in the op.

Returns

The equivalence ID.
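The idea of an equivalence ID can be sketched with a standalone function that concatenates an op type with its attributes. This does not reproduce PopART's actual ID format. The function name, the string-valued attributes and the separator characters are all assumptions made for illustration.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

using AttrMap = std::map<std::string, std::string>;

// Build an equivalence ID from an op type plus attribute key/value pairs.
// The format is invented for illustration; the real ID layout may differ.
std::string sketchEquivId(const std::string &opType, const AttrMap &attrs,
                          const AttrMap &externalAttrs = {}) {
  std::ostringstream id;
  id << opType;
  for (const auto &kv : attrs) {
    id << "|" << kv.first << "=" << kv.second;
  }
  // External attributes are appended with a prefix so that they cannot be
  // confused with attributes stored in the op.
  for (const auto &kv : externalAttrs) {
    id << "|ext:" << kv.first << "=" << kv.second;
  }
  return id.str();
}
```

In this sketch, two ops with the same type and attributes produce the same string, and passing an extra external attribute to one of them makes the IDs differ.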

std::map<fwtools::subgraph::InIndex, SubgraphInSig> getSubgraphInputs() const

Get all the producer ops of the tensors consumed at the input index.

Returns

A map of producer ops for the tensors consumed at the input index.

std::map<fwtools::subgraph::OutIndex, OpSet> getSubgraphOutputs() const

Get all the consumer ops of the tensors produced at the output index.

Returns

A map of consumer ops for the tensors produced at the output index.

virtual float getSubgraphValue() const = 0

Get the subgraph value.

This is used by the outlining algorithm to determine whether or not to outline ops. There are high bounding values retrieved by getHighSubgraphValue() (for expensive ops such as Conv) or low bounding values retrieved by getLowSubgraphValue() (for inexpensive ops such as Relu).

Returns

The subgraph value. Default: 0.

inline float getHighSubgraphValue() const

Return the high subgraph value.

inline float getLowSubgraphValue() const

Return the low subgraph value.

virtual float calcAutoVirtualGraphCost(std::set<int> &inputs_seen)

Get the approximate cost of activations between forward and backward graphs.

virtual bool isOutlineable() const

Check if op can be outlined.

If this method returns false, any subgraph that this op is part of will not be cached.

Returns

true if the op can be outlined, false otherwise. Default: true.

virtual bool hasSideEffect() const

Check if the op has any effect that is not captured by the (modification of) input or output tensors, such as modifying the state of the IPU or host system.

Returns

true if the op has side effects, false otherwise. Default: false.

virtual bool canRecompute() const

Check if the op can be recomputed.

To recompute an op means to clone it to produce the same output. The function checks the safeness of recompute in the context of explicit recompute. It may still be unsafe for implicit recompute.

Returns

true if the op can be recomputed, false otherwise. Default: the negation of hasSideEffect().

bool inputsUnmodifiable() const

Check if any input indices are unmodifiable or alias an unmodifiable tensor.

Returns

true if any connected variable tensor for all input indices has a non-empty alias chain and is unmodifiable, false otherwise.

bool consumesGraphOutput() const

Check if the op consumes the outputs of the graph.

Returns

true if op consumes graph outputs, false otherwise.

bool producesGraphOutput() const

Check if the op produces the outputs of the graph.

Returns

true if op produces graph outputs, false otherwise.

bool inputUnmodifiable(InIndex in) const

Check if the input index is unmodifiable or aliases an unmodifiable tensor.

Parameters

in – The input index to check.

Returns

true if any connected variable tensor has a non-empty alias chain and is unmodifiable, false otherwise.

bool inputUnmodifiableFor(InIndex in, const AliasModel *popMem) const

Check if the input index is unmodifiable or aliases an unmodifiable tensor, using the given Poprithms graph.

Parameters
  • in – The input index to check.

  • popMem – The alias model (Poprithms graph) to query.

Returns

true if any connected variable tensor has a non-empty alias chain and is unmodifiable, false otherwise.

bool hasAliasedModifiers(OutIndex out) const

Check if output is modified by any consumer.

Parameters

out – The output index to check.

Returns

true if any consumer of any aliased tensor downstream modifies a non-empty region, false otherwise.

bool hasAliasedModifiersFor(OutIndex out, const AliasModel *popMem) const

Check if the output is modified by any consumer, using the given Poprithms graph.

Parameters
  • out – The output index to check.

  • popMem – The alias model (Poprithms graph) to query.

Returns

true if any consumer of any aliased tensor downstream modifies a non-empty region, false otherwise.

bool isParentOf(const Op*) const

Check if this op is a parent of another op.

This op is a parent of another op if and only if the other op is a child of this op.

Parameters

Op – The op that is being checked.

Returns

true if this op is a parent of the given op, false otherwise.

bool isChildOf(const Op*) const

Check if this op is a child of another op.

This op is a direct child of another op if it consumes any of the tensors the other op produces.

Parameters

Op – The op that is being checked.

Returns

true if this op is a child of the given op, false otherwise.
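The parent and child relationship described above rests on a consumer/producer test, which can be sketched in standalone C++ as follows. The struct is a simplified stand-in that only records tensor IDs. It is not the PopART Op class, and all names are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>

// Simplified stand-in for an op: only the IDs of the tensors it consumes
// and produces (assumption for illustration).
struct SketchNode {
  std::set<std::string> inputs;
  std::set<std::string> outputs;
};

// child is a direct child of parent if it consumes any tensor that parent
// produces.
bool isChildOf(const SketchNode &child, const SketchNode &parent) {
  for (const auto &tensorId : parent.outputs) {
    if (child.inputs.count(tensorId) > 0) {
      return true;
    }
  }
  return false;
}

// The parent relation is defined as the inverse of the child relation.
bool isParentOf(const SketchNode &parent, const SketchNode &child) {
  return isChildOf(child, parent);
}
```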

virtual bool canShard() const

Check if the operation can be sharded into multiple operations.

Returns

true if the operation can be sharded, false otherwise.

virtual ReductionType getShardReductionType(OutIndex index) const

Get the reduction type to apply after sharding, if the output shape does not change.

Parameters

index – The output index at which to determine the reduction type.

Returns

The reduction type.

inline virtual float getShardRescaleFactor(Op *const shardedOp, OutIndex index) const

Get the scale factor to apply after sharding, if required.

Parameters
  • shardedOp – The sharded op.

  • index – The output index at which to determine the scale factor.

Returns

The scale factor. Default: 1.0.

std::map<TensorId, std::vector<TensorId>> shard(const std::map<TensorId, std::vector<TensorId>> &inputs)

Shard an operation into multiple operations according to the new, already sharded input tensors.

Parameters

inputs – The sharded input tensors.

Returns

The sharded output tensors.

ShardingPlan shard(const ShardingPlan plan)

Create an output sharding plan from sharding an op.

The sharding plan also contains the individual input/output shards of an operation. When sharding an operation, the new plan is updated with the resulting sharded tensors.

Parameters

plan – The input sharding.

Returns

The plan after sharding the operation containing the resulting sharded tensors.

virtual void configureShardedOp(Op *const shardedOp, const Settings *const settings_) const

Configure a sharded op.

Parameters
  • shardedOp – The sharded op to be configured.

  • settings_ – The settings to apply to the sharded op.

virtual ReplicatedTensorShardingIndices getReplicatedTensorShardingIndices() const

Return which inputs and outputs are replicated tensor sharding pairs.

virtual void configureForReplicatedTensorSharding(ReplicatedTensorShardingIndices indices, CommGroup shardingDomain)

Configure the op for replicated tensor sharding at specific indices.

Parameters
  • indices – The indices at which to configure the op for replicated tensor sharding.

  • shardingDomain – The type and size of the replica group specified by a CommGroup object.

virtual void configureForReplicatedTensorSharding(ReplicatedTensorShardingIndices indices, const ReplicaGrouping &grouping)

Configure the op for replicated tensor sharding at specific indices.

Parameters
  • indices – The indices at which to configure the op for replicated tensor sharding.

  • grouping – The stride and size of the replica group specified by a ReplicaGrouping object.

void transferBaseProperties(Op *to)

Transfer the base properties from this op to another op.

Parameters

to – The op to transfer the base properties to.

Op *getPrecedingOp(InIndex inIndex)

Get the producer op of the input tensor at the input index.

Parameters

inIndex – The index at which the input tensor is produced.

Returns

The op which produces the input tensor at the input index.

Op *getFollowingOp(OutIndex outIndex = 0)

Get the op that consumes an output tensor at an output index.

This will throw an error if there is more than one consumer op.

Parameters

outIndex – The index at which the output tensor is consumed.

Returns

The op which consumes the output tensor at the output index.

std::vector<Op*> getFollowingOps(OutIndex outIndex = 0)

Get all ops that consume an output tensor at an output index.

Parameters

outIndex – The index at which the output tensor is consumed.

Returns

A vector of ops which consume the output tensor at the output index.

template<typename T>
inline T *getPrecedingOp(InIndex inIndex)

Get the producer op of the input tensor at the input index.

This will throw an error if the producer op cannot be converted to type T.

Parameters

inIndex – The index at which the input tensor is produced.

Returns

The op, converted to type T, which produces the input tensor at the input index.

template<typename T>
inline T *getFollowingOp(OutIndex outIndex = 0)

Get the op that consumes an output tensor at an output index.

This will throw an error if there is more than one consumer op, or if the consumer op cannot be converted to type T.

Parameters

outIndex – The index at which the output tensor is consumed.

Returns

The op, converted to type T, which consumes the output tensor at the output index.

template<typename T>
inline std::vector<T*> getFollowingOps(OutIndex outIndex = 0)

Get all ops that consume an output tensor at an output index.

This will throw an error if not all of the consumer ops can be converted to type T.

Parameters

outIndex – The index at which the output tensor is consumed.

Returns

A vector of ops, converted to type T, which consume the output tensor at the output index.
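The contract of the typed getFollowingOp and getFollowingOps overloads can be sketched without PopART headers. The types below (ToyOp, ToyReluOp, ToyConvOp) and the helper names are hypothetical stand-ins, not PopART classes, and the use of dynamic_cast and std::runtime_error is an assumption about how such a conversion check might be written.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Toy stand-ins (not PopART classes): just enough structure to show the
// "single consumer, convertible to T" contract described above.
struct ToyOp {
  virtual ~ToyOp() = default;
  std::vector<ToyOp *> consumers; // ops that consume this op's output 0
};
struct ToyReluOp : ToyOp {};
struct ToyConvOp : ToyOp {};

// Convert a consumer to T, throwing if the conversion is not possible.
template <typename T> T *convertOrThrow(ToyOp *op) {
  assert(op != nullptr); // stored consumer pointers are never null
  T *converted = dynamic_cast<T *>(op);
  if (converted == nullptr) {
    throw std::runtime_error("consumer op cannot be converted to T");
  }
  return converted;
}

// Single-consumer lookup. Assumption of this sketch: zero consumers is
// treated as an error as well as more than one.
template <typename T> T *followingOp(const ToyOp &producer) {
  if (producer.consumers.size() != 1) {
    throw std::runtime_error("expected exactly one consumer op");
  }
  return convertOrThrow<T>(producer.consumers.front());
}

// All consumers, each converted to T; throws if any conversion fails.
template <typename T> std::vector<T *> followingOps(const ToyOp &producer) {
  std::vector<T *> result;
  for (ToyOp *consumer : producer.consumers) {
    result.push_back(convertOrThrow<T>(consumer));
  }
  return result;
}
```

With a producer that has a single ToyReluOp consumer, followingOp<ToyReluOp> returns that consumer, while followingOp<ToyConvOp> throws because the conversion fails.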

bool isPipelineIpuCopyOp() const

Check if the op is of the class IpuCopyOp that copies between pipeline stages.

Returns

true if op is of the class IpuCopyOp and copies between pipeline stages, false otherwise.

Public Members

std::unique_ptr<TensorIndexMap> input
std::unique_ptr<TensorIndexMap> output
OpId id = {-1}
OperatorIdentifier opid
bool pruneable = true
Settings settings
OpDebugInfo debugInfo
struct Settings

Structure to capture the settings for the op.

Public Functions

inline Settings(Graph &graph_, const std::string &name_)

Constructor for the Settings structure.

Parameters
  • graph_ – The graph the op belongs to.

  • name_ – The name of the op.

inline Settings(Graph &graph_, const std::string &name_, const Scope &scope_)

Constructor for the Settings structure.

Parameters
  • graph_ – The graph the op belongs to.

  • name_ – The name of the op.

  • scope_ – The scope of the op.

inline Settings(Graph &graph_, const std::string &name_, const Scope &scope_, const uint64_t parentId_)

Constructor for the Settings structure.

Parameters
  • graph_ – The graph the op belongs to.

  • name_ – The name of the op.

  • scope_ – The scope of the op.

  • parentId_ – The ID of the debug info.

inline Settings(Graph &graph_, const std::string &name_, const uint64_t parentId_)

Constructor for the Settings structure.

Parameters
  • graph_ – The main graph.

  • name_ – The name of the op.

  • parentId_ – The ID of the debug info.

virtual ~Settings() = default

Destructor for the Settings structure.

Settings(const Settings&) = default
inline Settings copy(const std::string &new_name)

Create a copy of the current settings with a new name.

Parameters

new_name – The name of the new settings.

Returns

A copy of the current settings with the new name.

virtual void setFromAttributes(const Attributes &attributes)

Append the optional attributes to the Settings structure depending on whether the attribute has been set in the ONNX model.

Parameters

attributes – The attributes to be added to the Settings structure.

Ir &getIr() const

Get the IR associated with the main graph.

Returns

The IR associated with the main graph.

Public Members

std::reference_wrapper<Graph> graph
std::string name = ""
Scope scope
RecomputeType recomputeType = RecomputeType::Undefined
OptionalTensorLocation tensorLocation
std::vector<std::tuple<std::string, float>> inplacePriorityVeto
std::unordered_set<std::string> excludePatterns
OptionalVGraphId vgraphId
OptionalPipelineStage pipelineStage
OptionalExecutionPhase executionPhase
OptionalBatchSerializedPhase batchSerializedPhase
OptionalStochasticRoundingMethod stochasticRoundingMethod
TileSet tileSet = {TileSet::Compute}
ExecutionContext executionContext = {ExecutionContext::Normal}
std::map<InIndex, InIndex> inferTensorMappingToFrom
double schedulePriority = {0.0}
std::map<std::string, std::string> extraOutlineAttributes
uint64_t debugInfoId = {0}
bool optimizerOp = {false}
bool gradientClippingOp = {false}
class GradInOutMapper

Class that represents the mapping between the indices of the input tensors to the gradient operation and the indices of these same tensors in the non-gradient operation.

Public Functions

GradInOutMapper(InIndex iGrad_, int iNonGrad_, GradOpInType)

Constructor for the GradInOutMapper class.

Parameters
  • iGrad_ – The index of the input tensor to the gradient operation.

  • iNonGrad_ – The index of the gradient operation input tensor as it is indexed in the non-gradient operation.

  • GradOpInType – The type of the input tensor to the gradient operation.

bool operator==(const GradInOutMapper &rhs) const

Check if the current GradInOutMapper object is equal to another GradInOutMapper object.

Parameters

rhs – A GradInOutMapper object to be compared to the current object.

Returns

true if objects are equal, false otherwise.
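As a rough illustration of how a GradInOutMapper entry might be interpreted, the sketch below mirrors the documented members in local types so that it compiles on its own. ForwardOp, resolveGradInput and the "grad_of_" naming are hypothetical, and InIndex is assumed to be an int alias.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Local mirrors of the documented types, so the sketch needs no PopART
// headers. InIndex being an int alias is an assumption.
using InIndex = int;
enum class GradOpInType { In = 0, Out, GradOut };
struct GradInOutMapper {
  InIndex iGrad;     // input index on the gradient op
  int iNonGrad;      // index of the same tensor on the non-gradient op
  GradOpInType type; // how the two indices relate
};

// Hypothetical forward op described only by its tensor names.
struct ForwardOp {
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

// Name of the tensor that feeds the gradient-op input described by `m`.
// The "grad_of_" prefix is an illustrative naming choice only.
std::string resolveGradInput(const ForwardOp &fwd, const GradInOutMapper &m) {
  switch (m.type) {
  case GradOpInType::In:
    return fwd.inputs.at(m.iNonGrad);
  case GradOpInType::Out:
    return fwd.outputs.at(m.iNonGrad);
  case GradOpInType::GradOut:
    return "grad_of_" + fwd.outputs.at(m.iNonGrad);
  }
  assert(false && "unknown GradOpInType");
  return {};
}
```

For a forward op with inputs a, b and output c, the entry {0, 0, GradOpInType::GradOut} says that gradient-op input 0 is the gradient of c, and {1, 1, GradOpInType::In} says that gradient-op input 1 is the forward input b.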

Public Members

InIndex iGrad
int iNonGrad
GradOpInType type
enum class popart::ReductionType

Define the reduction operation to use over a sequence of tensors.

The two use-cases for this enum type are:

  • denoting how to reduce individual losses produced by a LossOp over a minibatch (specified by the LossOp reduction parameter)

  • denoting how to reduce weight gradients over a number of replicas when gradient accumulation is enabled (specified by the global session option SessionOptions::accumulationAndReplicationReductionType).

Values:

enumerator Sum = 0

Sum the input values and do not scale the output (Default).

enumerator Mean

Take the mean of the input values.

enumerator NoReduction

Do not reduce the input values.

Keep them stacked into a single tensor. So values \(t_1, ..., t_k\) get collected into a tensor \([t_1, ..., t_k]\).

enumerator N

The number of ReductionType values.
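The three reductions can be illustrated on scalars instead of tensors. This is a minimal sketch: the enum is redeclared locally with the enumerator names documented above, and reduce is a hypothetical helper, not a PopART function.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Local redeclaration mirroring the documented enumerators, so the sketch
// compiles without PopART headers.
enum class ReductionType { Sum = 0, Mean, NoReduction, N };

// Reduce scalar values t_1, ..., t_k. Sum and Mean return one element;
// NoReduction returns the values unchanged, stacked as [t_1, ..., t_k].
std::vector<float> reduce(const std::vector<float> &values,
                          ReductionType type) {
  assert(type != ReductionType::N); // N is a count, not a reduction
  if (type == ReductionType::NoReduction) {
    return values;
  }
  const float sum = std::accumulate(values.begin(), values.end(), 0.0f);
  if (type == ReductionType::Mean) {
    assert(!values.empty()); // the mean of no values is undefined
    return {sum / static_cast<float>(values.size())};
  }
  return {sum};
}
```

For the values 1, 2 and 3, Sum gives [6], Mean gives [2], and NoReduction gives [1, 2, 3].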

#include <popart/operatoridentifier.hpp>
struct OperatorIdentifier

Subclassed by popart::AiGraphcoreOpIdV1

Public Functions

inline OperatorIdentifier(const OpDomain &_domain, const OpType &_type, OpVersion _version, NumInputs inputs = {}, int outputs = 0)
inline bool operator==(const OperatorIdentifier &rhs) const
inline bool operator!=(const OperatorIdentifier &rhs) const
inline bool operator<(const OperatorIdentifier &rhs) const

Public Members

OpDomain domain
OpType type
OpVersion version
NumInputs numInputs
int numOutputs
struct NumInputs

Public Functions

inline NumInputs()
inline NumInputs(int f)
inline NumInputs(int _min, int _max)

Public Members

int min
int max
#include <popart/tensorlocation.hpp>
using popart::VGraphIdAndTileSet = std::pair<VGraphId, TileSet>
#include <popart/basicoptionals.hpp>
using popart::OptionalTensorLocation = BasicOptional<TensorLocation, 9>
using popart::OptionalVGraphId = BasicOptional<VGraphId, 2>
using popart::OptionalPipelineStage = BasicOptional<PipelineStage, 3>
using popart::OptionalExecutionPhase = BasicOptional<ExecutionPhase, 5>
using popart::OptionalBatchSerializedPhase = BasicOptional<BatchSerializedPhase, 7>
using popart::OptionalStochasticRoundingMethod = BasicOptional<StochasticRoundingMethod, 10>
using popart::OptionalDataType = BasicOptional<DataType, 0>
#include <popart/opmanager.hpp>
class OpDefinition

Public Types

using DataTypes = std::vector<DataType>
using Inputs = std::vector<Input>
using Outputs = std::vector<Output>
using Attributes = std::map<std::string, Attribute>

Public Functions

inline OpDefinition()
inline OpDefinition(Inputs i, Outputs o, Attributes a)

Public Members

Inputs inputs
Outputs outputs
Attributes attributes
struct Attribute

Public Functions

inline Attribute(std::string regex)

Public Members

std::string supportedValuesRegex
struct Input

Public Functions

inline Input(std::string n, std::vector<DataType> t, bool _constant = false)

Public Members

std::string name
std::vector<DataType> supportedTensors
bool constant
struct Output

Public Functions

inline Output(std::string n, std::vector<DataType> t)

Public Members

std::string name
std::vector<DataType> supportedTensors
class OpCreatorInfo

Public Functions

inline OpCreatorInfo(const OperatorIdentifier &_opid, const Op::Settings &_settings, const Attributes &_attributes, const std::vector<TensorId> &_inputIds, const std::vector<TensorId> &_outputIds)
inline bool hasInputIds() const
inline bool hasOutputIds() const
const std::vector<TensorId> &getInputIds() const
const std::vector<TensorId> &getOutputIds() const
Tensor *getInputTensor(int index) const
TensorData *getInputTensorData(int index) const
TensorInfo &getInputTensorInfo(int index) const
bool hasInputTensor(int index) const
std::string debugName() const
template<typename T>
inline std::vector<T> getInputData(int index, const std::set<DataType> &acceptedTypes) const
template<typename T>
inline std::vector<T> getInputData(int index) const
template<typename T>
inline T getInputScalarValue(int index) const
template<typename T>
inline T getInputScalarValue(int index, T defaultValue) const

Public Members

const OperatorIdentifier &opid
const Op::Settings &settings
const Attributes &attributes
class OpManager

Public Types

using OpFactoryFunc = std::function<std::unique_ptr<Op>(const OpCreatorInfo&)>
using ComplexOpFactoryFunc = std::function<Op*(const OpCreatorInfo&, Graph &graph)>

Public Functions

OpManager() = default

Public Static Functions

static void registerOp(const OpInfo &opInfo)
static Attributes getAttributesFromAnyMap(std::map<std::string, popart::any> attributes)
static std::unique_ptr<Op> createOp(const OpDomain &domain, const OpType &type, const int opsetVersion, Graph &graph, const std::string &name = "", const Scope &scope = {}, const Attributes &_attr = {}, const std::vector<TensorId> &inputIds = {}, const std::vector<TensorId> &outputIds = {})
static std::unique_ptr<Op> createOp(const OperatorIdentifier &opid, Graph &graph, const std::string &name = "", const Attributes &_attr = {})
static std::unique_ptr<Op> createOpWithInputs(const OperatorIdentifier &opid, Graph &graph, const std::string &name, const Attributes &_attr, const std::vector<TensorId> &inIds)
static Op *createOpInGraph(const Node &node, Graph &graph)
static const std::vector<OperatorIdentifier> getSupportedOperations(bool includePrivate)
static const std::vector<OperatorIdentifier> getUnsupportedOperations(int opsetVersion)
static const OpDefinitions getSupportedOperationsDefinition(bool includePrivate)
static OpVersion getOpVersionFromOpSet(const OpDomain &opDomain, const OpType &type, const int opsetVersion)
class OpInfo

Public Functions

inline OpInfo(const OperatorIdentifier &_id, bool _isPublic, const OpDefinition &_details, OpFactoryFunc _f1)
inline OpInfo(const OperatorIdentifier &_id, bool _isPublic, const OpDefinition &_details, ComplexOpFactoryFunc _f2)
OpFactoryFunc &getSimpleFactory()
ComplexOpFactoryFunc &getComplexFactory()
bool hasComplexFactory()

Public Members

bool isPublic
const OperatorIdentifier id
OpDefinition details
enum class popart::RecomputeType

Define the type of recomputation.

Values:

enumerator Undefined = 0

Default value if RecomputeType has not been set.

enumerator Checkpoint

Do not recompute. Outputs from the op are kept from the forward pass.

enumerator Recompute

Recompute operation.

enumerator Recomputed

For explicit recomputation, this marks a cloned operation that had RecomputeType::Recompute set.

After cloning, the original op is changed to RecomputeType::Checkpoint, and the cloned op is changed to Recomputed.
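The state change described for explicit recomputation can be sketched as follows. ToyOp and cloneForExplicitRecompute are hypothetical names used only for illustration; the enum is redeclared locally with the documented enumerators.

```cpp
#include <cassert>

// Local redeclaration mirroring the documented enumerators.
enum class RecomputeType { Undefined = 0, Checkpoint, Recompute, Recomputed };

// Toy op that carries only the recompute setting.
struct ToyOp {
  RecomputeType recomputeType = RecomputeType::Undefined;
};

// Clone an op marked Recompute. The original becomes Checkpoint and the
// returned clone is marked Recomputed.
ToyOp cloneForExplicitRecompute(ToyOp &original) {
  assert(original.recomputeType == RecomputeType::Recompute);
  ToyOp clone = original;
  original.recomputeType = RecomputeType::Checkpoint;
  clone.recomputeType = RecomputeType::Recomputed;
  return clone;
}
```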

enum class popart::ExecutionContext

Define the type of the execution context.

Values:

enumerator Normal = 0

Run the forward and backward passes (Default).

enumerator AccumulateOuterFragment

Used to run the AccumulateOps after the gradient accumulation loop completes.

enumerator WeightsFromHostFragment

Used to transfer weights from host to device.

enumerator WeightsToHostFragment

Used to download weights from the device to the host.

enumerator OptimizerFromHostFragment

Used to stream the optimizer state from the host.

enumerator Subgraph

Program fragment used for subgraph-specific operations.

enum class popart::GradOpInType

Define the relationship between the input tensors of a gradient operation and the corresponding non-gradient operation.

Values:

enumerator In = 0

Indicates that the input tensor to the gradient operation is an input tensor of the non-gradient operation (Default).

enumerator Out

Indicates that the input tensor to the gradient operation is an output tensor of the non-gradient operation.

enumerator GradOut

Indicates that the input tensor to the gradient operation is an output gradient tensor of the non-gradient operation.

#include <popart/op/varupdate.hpp>
class VarUpdateOp : public popart::Op

Base class used to define PopART ops that update variable tensors.

Subclassed by popart::AccumulatorScaleOp, popart::VarUpdateWithUpdaterOp

Public Functions

VarUpdateOp(const OperatorIdentifier&, const Op::Settings&)
virtual std::unique_ptr<Op> clone() const override = 0

Return a copy of the op.

This method must be implemented. The compiler throws an error if this method is not implemented.

virtual void setup() final

Set the shape and type of the arguments to the op.

This MUST set the type and shape information for all the output TensorInfo objects.

virtual view::Regions aliases(InIndex in, OutIndex) const override

Return the input region which the op output will alias (for inplace and view-changing ops).

See also

For more information on views, refer to the IPU Programmer’s Guide.

Parameters
  • InIndex – The input index.

  • OutIndex – The output index.

Returns

The regions which the output will alias.

virtual view::Regions modifies(InIndex) const override

Return the input region which this op modifies (for inplace ops).

Parameters

InIndex – The input index.

Returns

The regions which this op modifies.

virtual std::map<InIndex, TensorId> optimizerInputs() const = 0
inline virtual bool isOptimizerOp() const override

Check if op is part of the optimizer.

virtual ReplicatedTensorShardingIndices getReplicatedTensorShardingIndices() const override

Return which inputs and outputs are replicated tensor sharding pairs.

virtual void growAliasModel(AliasModel&) const override

For certain tasks which involve analysing how tensors alias each other, such as inplacing, a poprithms::memory::inplace::Graph that corresponds to this op’s graph is constructed.

The Poprithms graph can then be queried for aliasing information, and can have algorithms run on it.

To construct the Poprithms graph, each PopART op defines what its Poprithms equivalent ops are. This method inserts this op’s poprithms::memory::inplace::Op equivalents into the Poprithms graph, which is contained in aliasModel.

See also

AliasModel.

Parameters

aliasModel – The mapping between this op’s (PopART) graph and the Poprithms graph.

Pre

All input tensors of this op have mappings in aliasModel before the call to growAliasModel().

Post

All output tensors of this op have mappings in aliasModel after the call to growAliasModel().

Public Static Functions

static inline InIndex getVarToUpdateInIndex()
static inline OutIndex getUpdatedVarOutIndex()
class AccumulatorScaleOp : public popart::VarUpdateOp

Inplace multiplies a tensor by an OptimizerValue factor.

As with other ops that consume OptimizerValues, this op only has an input tensor for the value if the OptimizerValue is not const.

If the factor is const and 0, the op directly zeroes the input tensor.
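The behaviour described above can be sketched on a plain buffer. ToyOptimizerValue and scaleInPlace are hypothetical stand-ins, and the factorInput pointer only models the optional factor input tensor; none of this is the PopART implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Simplified stand-in for an optimizer value: a float plus a const flag.
struct ToyOptimizerValue {
  float value;
  bool isConst;
};

// Multiply `accum` in place by the factor. `factorInput` models the optional
// input tensor for the factor: it is only required when the factor is not
// const, in which case its value is used instead of `factor.value`.
void scaleInPlace(std::vector<float> &accum, const ToyOptimizerValue &factor,
                  const float *factorInput) {
  if (factor.isConst) {
    if (factor.value == 0.0f) {
      // Const zero factor: write zeros directly instead of multiplying.
      std::fill(accum.begin(), accum.end(), 0.0f);
      return;
    }
    for (float &a : accum) {
      a *= factor.value;
    }
    return;
  }
  assert(factorInput != nullptr); // a non-const factor needs an input
  for (float &a : accum) {
    a *= *factorInput;
  }
}
```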

Subclassed by popart::AccumulatorZeroOp

Public Functions

AccumulatorScaleOp(const OptimizerValue factor_, const Op::Settings&)
virtual std::unique_ptr<Op> clone() const override

Return a copy of the op.

This method must be implemented. The compiler throws an error if this method is not implemented.

virtual std::map<InIndex, TensorId> optimizerInputs() const override
virtual void appendOutlineAttributes(OpSerialiserBase&) const override

Append the op attributes that are relevant for outlining ops.

Ops should override this function if there are additional attributes. Two ops with identical type and outline attributes can be outlined and are supposed to be functionally equivalent.

Parameters

OpSerialiserBase – The stream to which the attributes should be appended.

inline const OptimizerValue &getFactor() const
inline virtual float getSubgraphValue() const override

Get the subgraph value.

This is used by the outlining algorithm to determine whether or not to outline ops. There are high bounding values retrieved by getHighSubgraphValue() (for expensive ops such as Conv) or low bounding values retrieved by getLowSubgraphValue() (for inexpensive ops such as Relu).

Returns

The subgraph value. Default: 0.

virtual view::Regions modifies(InIndex) const override

Return the input region which this op modifies (for inplace ops).

Parameters

InIndex – The input index.

Returns

The regions which this op modifies.

Public Static Functions

static inline InIndex getFactorInIndex()
class AccumulatorZeroOp : public popart::AccumulatorScaleOp

An AccumulatorScaleOp with a factor of 0, so zeroes the input tensor.

Public Functions

inline AccumulatorZeroOp(const Op::Settings &settings)
virtual std::unique_ptr<Op> clone() const override

Return a copy of the op.

This method must be implemented. The compiler throws an error if this method is not implemented.

class VarUpdateWithUpdaterOp : public popart::VarUpdateOp

Subclassed by popart::AccumulateBaseOp, popart::AdamComboOp, popart::AdamVarUpdateOp, popart::AdaptiveComboOp, popart::CopyVarUpdateOp, popart::ScaledVarUpdateOp, popart::SGD0ComboOp, popart::SGD0VarUpdateOpBase, popart::SGD1AcclUpdateOp, popart::SGD1VarUpdateOp, popart::SGDMComboBaseOp

Public Functions

VarUpdateWithUpdaterOp(const OperatorIdentifier &opid, const Op::Settings &settings_)
virtual std::unique_ptr<Op> clone() const override = 0
ReplicatedTensorShardingIndices getReplicatedTensorShardingIndices() const override

Public Static Functions

static inline InIndex getUpdaterInIndex()
class AccumulateBaseOp : public popart::VarUpdateWithUpdaterOp

Subclassed by popart::AccumulateOp, popart::RescaleAccumulateOp, popart::SparseAccumulateOp

Public Functions

AccumulateBaseOp(const OperatorIdentifier &opid, AccumulationType type_, OptimizerValue factor_, const Op::Settings&)
virtual std::unique_ptr<Op> clone() const override = 0
std::map<InIndex, TensorId> optimizerInputs() const override
void appendOutlineAttributes(OpSerialiserBase&) const override
inline float getSubgraphValue() const override
inline AccumulationType getAccumulationType() const
inline const OptimizerValue &getFactor() const

Public Static Functions

static inline constexpr InIndex getFactorInIndex()
class AccumulateOp : public popart::AccumulateBaseOp

Public Functions

AccumulateOp(AccumulationType type, OptimizerValue factor, const Op::Settings&)
std::unique_ptr<Op> clone() const override
class RescaleAccumulateOp : public popart::AccumulateBaseOp

The same as AccumulateOp, except that it also includes a rescale factor that allows the accumulator to be rescaled at the same time.

Public Functions

RescaleAccumulateOp(AccumulationType type_, OptimizerValue factor_, const Op::Settings&)
virtual std::unique_ptr<Op> clone() const final

Return a copy of the op.

This method must be implemented. The compiler throws an error if this method is not implemented.

virtual std::map<InIndex, TensorId> optimizerInputs() const final

Public Static Functions

static inline InIndex getRescaleRatioInIndex()
class SparseAccumulateOp : public popart::AccumulateBaseOp

Say you have: w -> Gather -> x.

In backward pass you have: dW <- GatherGrad <- x

and when the optimiser step is grown:

dW <- GatherGrad <- x
 \
  Accumulate -> accum’
 /
accum

GatherGrad is essentially a scatter. Then we Accumulate the resultant dW on accum. This involves creating an extra dW tensor, so instead we can do:

          x
          |
          V
accum -> SparseAccumulate -> accum’

Where SparseAccumulate can in one operation, without extra space, accumulate the slices of x into accum as required.
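The idea of accumulating slices of x directly into accum, without first materialising a dense dW, can be sketched on row-major buffers. This is an illustration under assumptions: the gather axis is 0, the accumulation is a plain scaled add, and Matrix and sparseAccumulate are hypothetical names rather than PopART types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major 2D buffer used as a stand-in for a tensor.
struct Matrix {
  std::size_t rows;
  std::size_t cols;
  std::vector<float> data; // rows * cols elements
};

// For each i: accum[indices[i], :] += factor * x[i, :].
// No dense dW of accum's shape is created. Repeated indices accumulate
// repeatedly, as a scatter-add followed by an accumulate would.
void sparseAccumulate(Matrix &accum, const Matrix &x,
                      const std::vector<std::size_t> &indices, float factor) {
  assert(x.rows == indices.size());
  assert(x.cols == accum.cols);
  for (std::size_t i = 0; i < indices.size(); ++i) {
    const std::size_t row = indices[i];
    assert(row < accum.rows);
    for (std::size_t c = 0; c < x.cols; ++c) {
      accum.data[row * accum.cols + c] += factor * x.data[i * x.cols + c];
    }
  }
}
```

In this sketch the only extra storage is the loop counters, whereas the dense route would allocate a dW with the same shape as accum.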

The input tensor at getOriginalVarToUpdateInIndex() is an optional input. This can be used when two different views of the weight are consumed in the forward pass (by ops that will be autodiffed), and one of those ops is a Gather, thus requiring a SparseAccumulate in the weight update step.

We connect getOriginalVarToUpdateInIndex() to the other view of the weight than the one this SparseAccumulate is for. Then, SparseAccumulateOpx will clone that tensor (and its layout) when creating accum.

You probably do not need this outside of the TiedGatherPattern.

See also

SparseAccumulateOpx::createInputTensor for further motivation of why it does this.

Public Functions

SparseAccumulateOp(AccumulationType type, const OptimizerValue &factor, unsigned axis, const Op::Settings&)
virtual std::unique_ptr<Op> clone() const override

Return a copy of the op.

This method must be implemented. The compiler throws an error if this method is not implemented.

virtual void appendOutlineAttributes(OpSerialiserBase&) const override

Append the op attributes that are relevant for outlining ops.

Ops should override this function if there are additional attributes. Two ops with identical type and outline attributes can be outlined and are supposed to be functionally equivalent.

Parameters

OpSerialiserBase – The stream to which the attributes should be appended.

inline virtual std::set<InIndex> optionalInputs() const override

Return the input indices of all optional inputs to the op.

unsigned getAxis() const

Public Static Functions

static inline constexpr InIndex getIndicesInIndex()
static inline constexpr InIndex getOriginalVarToUpdateInIndex()
static bool supportsAccumulationType(AccumulationType type)
class AdamComboOp : public popart::VarUpdateWithUpdaterOp

Public Functions

AdamComboOp(OptimizerValue initialLr, OptimizerValue initialWd, OptimizerValue initialB1, OptimizerValue initialB2, OptimizerValue initialEps, OptimizerValue initialLs, OptimizerValue mwn, OptimizerValue initialGs, AdamMode mode_, WeightDecayMode decayMode_, bool withGradAccum_, OptimizerReductionType reductionType_, DataType accumType_, DataType accl1Type_, DataType accl2Type_, bool scaledOptimizerState_, const Op::Settings&)
std::unique_ptr<Op> clone() const final
std::map<InIndex, TensorId> optimizerInputs() const final
void appendOutlineAttributes(OpSerialiserBase&) const final
std::set<InIndex> optionalInputs() const final
inline float getSubgraphValue() const final

Public Members

const OptimizerValue initLr
const OptimizerValue initWd
const OptimizerValue initB1
const OptimizerValue initB2
const OptimizerValue initEps
const OptimizerValue initLs
const OptimizerValue initMwn
const OptimizerValue initGs
const AdamMode mode
const WeightDecayMode decayMode
const bool withGradAccum
const OptimizerReductionType reductionType
DataType accumType
DataType accl1Type
DataType accl2Type
const bool scaledOptimizerState

Public Static Functions

static inline InIndex getLrInIndex()
static inline InIndex getWdInIndex()
static inline InIndex getBeta1InIndex()
static inline InIndex getBeta2InIndex()
static inline InIndex getEpsInIndex()
static inline InIndex getLsInIndex()
static inline InIndex getMwnInIndex()
static inline InIndex getGsInIndex()
class AdamVarUpdateOp : public popart::VarUpdateWithUpdaterOp

Public Functions

AdamVarUpdateOp(OptimizerValue initLr, OptimizerValue mwn, const Op::Settings&)
std::unique_ptr<Op> clone() const final
std::map<InIndex, TensorId> optimizerInputs() const final
void appendOutlineAttributes(OpSerialiserBase&) const final
inline float getSubgraphValue() const final

Public Members

const OptimizerValue initLr
const OptimizerValue initMwn

Public Static Functions

static inline InIndex getLambR1SqInIndex()
static inline InIndex getLambR2SqInIndex()
static inline InIndex getLrInIndex()
static inline InIndex getMwnInIndex()
class AdaptiveComboOp : public popart::VarUpdateWithUpdaterOp

Public Functions

AdaptiveComboOp(OptimizerValue initialLr, OptimizerValue initialWd, OptimizerValue initialA, OptimizerValue initialM, OptimizerValue initialEps, OptimizerValue initialLs, OptimizerValue initialGs, AdaptiveMode mode_, WeightDecayMode decayMode_, bool withGradAccum_, OptimizerReductionType reductionType_, DataType accumType_, DataType accl1Type_, DataType accl2Type_, DataType accl3Type_, bool rmspropTFVariant_, const Op::Settings&)
std::unique_ptr<Op> clone() const final
std::map<InIndex, TensorId> optimizerInputs() const final
void appendOutlineAttributes(OpSerialiserBase&) const final
std::set<InIndex> optionalInputs() const final
inline float getSubgraphValue() const final

Public Members

const OptimizerValue initLr
const OptimizerValue initWd
const OptimizerValue initA
const OptimizerValue initM
const OptimizerValue initEps
const OptimizerValue initLs
const OptimizerValue initGs
const AdaptiveMode mode
const WeightDecayMode decayMode
const bool withGradAccum
const OptimizerReductionType reductionType
DataType accumType
DataType accl1Type
DataType accl2Type
DataType accl3Type
const bool rmspropTFVariant

Public Static Functions

static inline InIndex getLrInIndex()
static inline InIndex getWdInIndex()
static inline InIndex getAlphaInIndex()
static inline InIndex getMomentumInIndex()
static inline InIndex getEpsInIndex()
static inline InIndex getLsInIndex()
static inline InIndex getGsInIndex()
class CopyVarUpdateOp : public popart::VarUpdateWithUpdaterOp

Public Functions

CopyVarUpdateOp(const Op::Settings&)
CopyVarUpdateOp(const OperatorIdentifier&, const Op::Settings&)
std::unique_ptr<Op> clone() const final
inline std::map<InIndex, TensorId> optimizerInputs() const final
inline float getSubgraphValue() const final
view::Regions modifies(InIndex) const override
class SGD0ComboOp : public popart::VarUpdateWithUpdaterOp

A single Op that encapsulates all the information needed to describe an SGD0 optimiser step.

The “0” in the name signifies that there is no optimizer state (note that a gradient accumulation tensor may still be required).

The “Combo” in the name signifies that this op will later be decomposed into many ops and tensors that actually implement the optimiser step. In this case, the decomposition is done by the SGD0Decompose pattern.

See also

SGD for the definition of what SGD0 is.

See also

SGD0Decompose for the definition of this decomposition.

Public Functions

SGD0ComboOp(OptimizerValue initialSwd, OptimizerValue initialSlr, bool withGradAccum_, OptimizerReductionType reductionType_, DataType accumType_, const Op::Settings&)
virtual std::unique_ptr<Op> clone() const final

Return a copy of the op.

This method must be implemented. The compiler throws an error if this method is not implemented.

virtual std::set<InIndex> optionalInputs() const override

Return the input indices of all optional inputs to the op.

virtual std::map<InIndex, TensorId> optimizerInputs() const override
virtual void appendOutlineAttributes(OpSerialiserBase&) const override

Append the op attributes that are relevant for outlining ops.

Ops should override this function if there are additional attributes. Two ops with identical type and outline attributes can be outlined and are supposed to be functionally equivalent.

Parameters

OpSerialiserBase – The stream to which the attributes should be appended.

inline virtual float getSubgraphValue() const override

Get the subgraph value.

This is used by the outlining algorithm to determine whether or not to outline ops. There are high bounding values retrieved by getHighSubgraphValue() (for expensive ops such as Conv) or low bounding values retrieved by getLowSubgraphValue() (for inexpensive ops such as Relu).

Returns

The subgraph value. Default: 0.

Public Members

OptimizerValue initSlr0
OptimizerValue initWdsf0
const bool withGradAccum
const OptimizerReductionType reductionType
const DataType accumType

Public Static Functions

static inline InIndex getSlr0InIndex()
static inline InIndex getWdsf0InIndex()
class SGD0VarUpdateOpBase : public popart::VarUpdateWithUpdaterOp

Subclassed by popart::SGD0VarUpdateOp

Public Functions

SGD0VarUpdateOpBase(const OperatorIdentifier &_opid, OptimizerValue initialSlr0, OptimizerValue initialWdsf0, const Op::Settings &settings_)
virtual std::unique_ptr<Op> clone() const override = 0
std::map<InIndex, TensorId> optimizerInputs() const final
void appendOutlineAttributes(OpSerialiserBase&) const final
std::set<InIndex> optionalInputs() const final

Public Members

const OptimizerValue initSlr0
const OptimizerValue initWdsf0

Public Static Functions

static inline InIndex getSlr0InIndex()
static inline InIndex getWdsf0InIndex()
class SGD0VarUpdateOp : public popart::SGD0VarUpdateOpBase

Public Functions

SGD0VarUpdateOp(OptimizerValue initialSlr0, OptimizerValue initialWdsf0, const Op::Settings&)
std::unique_ptr<Op> clone() const final
float getSubgraphValue() const final
class SGD1AcclUpdateOp : public popart::VarUpdateWithUpdaterOp

Performs the part of the SGD1 velocity update equation that is pre-computed for the next time step after the weight update of the current time step.

Let:

  • v be the input at getVarToUpdateInIndex()

  • g be the input at getUpdaterInIndex()

then this op performs:

v <- v * smm1 + swd1 * g

See also

SGD for how this is derived and the definitions of smm1 and swd1.
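
The update equation above can be illustrated with a small host-side sketch. This is a toy model only: plain std::vector<float> stands in for the PopART tensors, the function name sgd1AcclUpdate is hypothetical, and smm1 and swd1 are passed in as already-computed scalars. It does not show how the op is lowered to Poplar.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of the SGD1 velocity update: v <- v * smm1 + swd1 * g.
// v plays the role of the input at getVarToUpdateInIndex(),
// g the role of the input at getUpdaterInIndex().
// (Hypothetical helper, not part of the PopART API.)
void sgd1AcclUpdate(std::vector<float> &v,
                    const std::vector<float> &g,
                    float smm1,
                    float swd1) {
  assert(v.size() == g.size());
  for (std::size_t i = 0; i < v.size(); ++i) {
    v[i] = v[i] * smm1 + swd1 * g[i];
  }
}
```

For example, with v = {1, 2}, g = {4, 8}, smm1 = 0.5 and swd1 = 0.25, the call updates v in place to {1.5, 3}.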

Subclassed by popart::SGD2PartialAcclUpdateOp

Public Functions

SGD1AcclUpdateOp(OptimizerValue initSmm1, OptimizerValue initSwd1, const Op::Settings&)
virtual std::unique_ptr<Op> clone() const override

Return a copy of the op.

This method must be implemented. The compiler throws an error if this method is not implemented.

virtual std::map<InIndex, TensorId> optimizerInputs() const override
virtual void appendOutlineAttributes(OpSerialiserBase&) const override

Append the op attributes that are relevant for outlining ops.

Ops should override this function if there are additional attributes. Two ops with identical type and outline attributes can be outlined and are supposed to be functionally equivalent.

Parameters

OpSerialiserBase – The stream to which the attributes should be appended.

inline virtual float getSubgraphValue() const final

Get the subgraph value.

This is used by the outlining algorithm to determine whether or not to outline ops. The value is either a high bounding value, retrieved by getHighSubgraphValue() (for expensive ops such as Conv), or a low bounding value, retrieved by getLowSubgraphValue() (for inexpensive ops such as Relu).

Returns

The subgraph value. Default: 0.

Public Members

const OptimizerValue initSmm1
const OptimizerValue initSwd1

Public Static Functions

static inline InIndex getSmm1InIndex()
static inline InIndex getSwd1InIndex()
class SGD2PartialAcclUpdateOp : public popart::SGD1AcclUpdateOp

This Op is by design exactly equivalent to an SGD1AcclUpdateOp.

Any logic based on an SGD1AcclUpdateOp, like transform code or lowering into Opx, can be applied to an SGD2PartialAcclUpdateOp. This includes the OperatorIdentifier being Onnx::CustomOperators::SGD1AcclUpdateOp.

For SGD2, the entire v update equation could be done in one op (see the equation derivation in optimizer.hpp); however, we reuse the SGD1AcclUpdateOp and AccumulateOp to implement the equation in two steps.

Public Functions

virtual std::unique_ptr<Op> clone() const final

Return a copy of the op.

This method must be implemented. The compiler throws an error if this method is not implemented.

SGD1AcclUpdateOp(OptimizerValue initSmm1, OptimizerValue initSwd1, const Op::Settings&)
SGD1AcclUpdateOp(OptimizerValue initSmm1, OptimizerValue initSwd1, OperatorIdentifier opid, const Op::Settings&)
class SGD1VarUpdateOp : public popart::VarUpdateWithUpdaterOp

Performs the SGD1 weight update equation.

Let:
  • w be the input at getVarToUpdateInIndex()

  • g be the input at getUpdaterInIndex()

then this op performs: w <- w - slr1 * g

See also

SGD for how this is derived and the definition of slr1.
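
As a rough illustration of the weight update equation above, the following host-side sketch applies it element-wise. It is a toy model under stated assumptions: std::vector<float> stands in for the PopART tensors, the function name sgd1VarUpdate is hypothetical, and slr1 is supplied as an already-computed scalar.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of the SGD1 weight update: w <- w - slr1 * g.
// w plays the role of the input at getVarToUpdateInIndex(),
// g the role of the input at getUpdaterInIndex().
// (Hypothetical helper, not part of the PopART API.)
void sgd1VarUpdate(std::vector<float> &w,
                   const std::vector<float> &g,
                   float slr1) {
  assert(w.size() == g.size());
  for (std::size_t i = 0; i < w.size(); ++i) {
    w[i] = w[i] - slr1 * g[i];
  }
}
```

For example, with w = {3, 2}, g = {2, 4} and slr1 = 0.5, the call updates w in place to {2, 0}.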

Subclassed by popart::SGD2VarUpdateOp

Public Functions

SGD1VarUpdateOp(OptimizerValue initSlr1, const Op::Settings&)
virtual std::unique_ptr<Op> clone() const override

Return a copy of the op.

This method must be implemented. The compiler throws an error if this method is not implemented.

virtual std::map<InIndex, TensorId> optimizerInputs() const final
virtual void appendOutlineAttributes(OpSerialiserBase&) const final

Append the op attributes that are relevant for outlining ops.

Ops should override this function if there are additional attributes. Two ops with identical type and outline attributes can be outlined and are supposed to be functionally equivalent.

Parameters

OpSerialiserBase – The stream to which the attributes should be appended.

inline virtual float getSubgraphValue() const final

Get the subgraph value.

This is used by the outlining algorithm to determine whether or not to outline ops. The value is either a high bounding value, retrieved by getHighSubgraphValue() (for expensive ops such as Conv), or a low bounding value, retrieved by getLowSubgraphValue() (for inexpensive ops such as Relu).

Returns

The subgraph value. Default: 0.

Public Members

const OptimizerValue initSlr1

Public Static Functions

static inline InIndex getSlr1InIndex()
class SGD2VarUpdateOp : public popart::SGD1VarUpdateOp

This Op is by design exactly equivalent to an SGD1VarUpdateOp.

Any logic based on an SGD1VarUpdateOp, like transform code or lowering into Opx, can be applied to an SGD2VarUpdateOp. This includes the OperatorIdentifier being Onnx::CustomOperators::SGD1VarUpdate.

Public Functions

virtual std::unique_ptr<Op> clone() const final

Return a copy of the op.

This method must be implemented. The compiler throws an error if this method is not implemented.

SGD1VarUpdateOp(OptimizerValue initSlr1, const Op::Settings&)
class SGDMComboBaseOp : public popart::VarUpdateWithUpdaterOp

Subclassed by popart::SGD1ComboOp, popart::SGD2ComboOp

Public Functions

SGDMComboBaseOp(const OperatorIdentifier &opid, OptimizerValue initialSmm1, OptimizerValue initialDpsf1, OptimizerValue initialSwd1, OptimizerValue initialSlr1, OptimizerReductionType reductionType_, const Op::Settings&)
SGDMComboBaseOp(const OperatorIdentifier &opid, OptimizerValue initialSmm1, OptimizerValue initialDpsf1, OptimizerValue initialSwd1, OptimizerValue initialSlr1, OptimizerValue initialMm, OptimizerValue initialWd, OptimizerValue initialNgsf, OptimizerValue initialNdsf, OptimizerReductionType reductionType_, const Op::Settings&)
virtual std::unique_ptr<Op> clone() const override = 0
std::map<InIndex, TensorId> optimizerInputs() const override
void appendOutlineAttributes(OpSerialiserBase&) const override
std::set<InIndex> optionalInputs() const override
inline float getSubgraphValue() const override

Public Members

const OptimizerValue initSmm1
const OptimizerValue initDpsf1
const OptimizerValue initSwd1
const OptimizerValue initSlr1
OptimizerValue initMm
OptimizerValue initWd
OptimizerValue initNgsf
OptimizerValue initNdsf
const OptimizerReductionType reductionType
bool nesterov

Public Static Functions

static inline InIndex getSmm1InIndex()
static inline InIndex getDpsf1InIndex()
static inline InIndex getSwd1InIndex()
static inline InIndex getSlr1InIndex()
static inline InIndex getMmInIndex()
static inline InIndex getWdInIndex()
static inline InIndex getNgsfInIndex()
static inline InIndex getNdsfInIndex()
class SGD1ComboOp : public popart::SGDMComboBaseOp

A single Op that encapsulates all the information needed to describe an SGD1 optimiser step.

The “1” in the name signifies that only one extra optimiser tensor (the accl tensor) is required.

The “Combo” in the name signifies that this Op will later be decomposed into many Ops and Tensors that actually implement the optimiser step, in this case by the SGD1Decompose pattern.

See also

SGD for the definition of what SGD1 is.

See also

SGD1Decompose for the definition of this decomposition.

Public Functions

SGD1ComboOp(OptimizerValue initialSmm1, OptimizerValue initialDpsf1, OptimizerValue initialSwd1, OptimizerValue initialSlr1, OptimizerReductionType reductionType_, const Op::Settings&)
SGD1ComboOp(OptimizerValue initialSmm1, OptimizerValue initialDpsf1, OptimizerValue initialSwd1, OptimizerValue initialSlr1, OptimizerValue initialMm, OptimizerValue initialWd, OptimizerValue initialNgsf1, OptimizerValue initialNdsf1, OptimizerReductionType reductionType_, const Op::Settings&)
virtual std::unique_ptr<Op> clone() const final

Return a copy of the op.

This method must be implemented. The compiler throws an error if this method is not implemented.

class SGD2ComboOp : public popart::SGDMComboBaseOp

A single Op that encapsulates all the information needed to describe an SGD2 optimiser step.

The “2” in the name signifies that two extra optimiser tensors (the accum and accl1 tensors) may be required.

The “Combo” in the name signifies that this Op will later be decomposed into many Ops and Tensors that actually implement the optimiser step, in this case by the SGD2Decompose pattern.

See also

SGD for the definition of what SGD2 is.

See also

SGD2Decompose for the definition of this decomposition.

Public Functions

SGD2ComboOp(OptimizerValue initialSmm1, OptimizerValue initialDpsf1, OptimizerValue initialSwd1, OptimizerValue initialSlr1, bool withGradAccum_, OptimizerReductionType reductionType_, DataType accumType_, DataType accl1Type_, const Op::Settings&)
SGD2ComboOp(OptimizerValue initialSmm1, OptimizerValue initialDpsf1, OptimizerValue initialSwd1, OptimizerValue initialSlr1, OptimizerValue initialMm, OptimizerValue initialWd, OptimizerValue initialNgsf2, OptimizerValue initialNdsf2, bool withGradAccum_, OptimizerReductionType reductionType_, DataType accumType_, DataType accl1Type_, const Op::Settings&)
virtual std::unique_ptr<Op> clone() const final

Return a copy of the op.

This method must be implemented. The compiler throws an error if this method is not implemented.

Public Members

const bool withGradAccum
const DataType accumType
const DataType accl1Type
class ScaledVarUpdateOp : public popart::VarUpdateWithUpdaterOp

Public Functions

ScaledVarUpdateOp(OptimizerValue initLr, OptimizerValue initWd, bool lrInUpdater, const Op::Settings&)
std::unique_ptr<Op> clone() const final
std::map<InIndex, TensorId> optimizerInputs() const final
void appendOutlineAttributes(OpSerialiserBase&) const final
inline float getSubgraphValue() const final

Public Members

const OptimizerValue initLr
const OptimizerValue initWd
const bool lrInUpdater

Public Static Functions

static inline InIndex getLrInIndex()
static inline InIndex getWdInIndex()
#include <popart/alias/aliasmodel.hpp>
class AliasModel

A container for the poprithms::memory::inplace::Graph which corresponds to a PopART Graph.

It contains the poprithms Graph, and mappings between PopART Tensors and Ops, and their poprithms equivalents.

Public Types

using PoprithmsTensorId = poprithms::memory::inplace::TensorId
using PoprithmsOpId = poprithms::memory::inplace::OpId

Public Functions

AliasModel()
~AliasModel() = default
void setGraph(const popart::Graph *graph)

Set PopART graph.

void insertTensor(const PoprithmsTensorId &poprithmsTensor, const Tensor &popartTensor)

Register that a poprithms Tensor and a PopART Tensor correspond to each other.

In addition to registering the Tensor correspondence, the Ops which produce the respective Tensors are registered to be corresponding.

Parameters
  • poprithmsTensor – The Tensor in the poprithms Graph.

  • popartTensor – The Tensor in the PopART Graph.

void insertOp(PoprithmsOpId, OpId)

Register that a poprithms Op and a PopART Op correspond.

Note that multiple poprithms Ops can correspond to a single PopART Op.
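
The one-to-many correspondence described above can be pictured with a small bookkeeping sketch. Everything here is a hypothetical stand-in: ToyOpId and ToyPoprithmsOpId replace the real OpId and PoprithmsOpId types, ToyOpMap is not the AliasModel implementation, and the assumption that each poprithms op maps back to exactly one PopART op is inferred from getOpId() returning a single OpId.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-ins for popart::OpId and AliasModel::PoprithmsOpId.
using ToyOpId = std::int64_t;
using ToyPoprithmsOpId = std::int64_t;

// Toy bookkeeping: one PopART op may correspond to several poprithms ops,
// while each poprithms op is assumed to map back to a single PopART op.
class ToyOpMap {
public:
  void insertOp(ToyPoprithmsOpId poprithmsId, ToyOpId popartId) {
    toPoprithms_[popartId].push_back(poprithmsId);
    toPopart_[poprithmsId] = popartId;
  }

  // All poprithms ops registered for a PopART op (empty if none).
  std::vector<ToyPoprithmsOpId> getAll(ToyOpId popartId) const {
    auto it = toPoprithms_.find(popartId);
    if (it == toPoprithms_.end()) {
      return {};
    }
    return it->second;
  }

  // The PopART op for a poprithms op; throws std::out_of_range if unknown.
  ToyOpId getOpId(ToyPoprithmsOpId poprithmsId) const {
    return toPopart_.at(poprithmsId);
  }

private:
  std::map<ToyOpId, std::vector<ToyPoprithmsOpId>> toPoprithms_;
  std::map<ToyPoprithmsOpId, ToyOpId> toPopart_;
};
```

Registering poprithms ops 10 and 11 against PopART op 1 makes getAll(1) return both, while getOpId(11) returns 1.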

void insertUnaryModifier0(const Op &op)

This method performs the following steps:

(1) inserts an aliasGate which is open at index 0,

(2) appends a modify to the output of the aliasGate created in (1),

(3) registers that op.output(0) matches the output of (2),

(4) registers that the poprithms ops created at (1) and (2) correspond to op.

Parameters

op – A PopART Op, which might have multiple inputs, and whose output is a modified alias of its input at index 0.

void insertUnaryModifier(const Op&, InIndex)

As per insertUnaryModifier0, but the input index may be different from 0.

void insertBinaryModifier(const Op &op)

This method performs the following steps:

(1) inserts an aliasGate whose inputs are the 2 poprithms Tensors corresponding to the 2 inputs of op. The alias gate is open at the index which op aliases through, if any.

(2) appends a modify to the output of the aliasGate created at (1)

(3) registers that the poprithms ops (1) and (2) correspond to op.

Diagrammatically, for the PopART Op:

    input0 ... input1
        \     /
          op
          |
        output0

This method creates the following poprithms subgraph:

    input0 ... input1
        \     /
       aliasGate
           |
         modify
           |
        output0

Parameters

op – A PopART Op with 2 inputs.

void insertNG2aryModifier(const Op &op, unsigned int numInputs)

The method is the same as insertBinaryModifier except for allowing a larger number of inputs than 2.

Parameters
  • op – A PopART Op with 2 or more inputs.

  • numInputs – The number of inputs.

void insertViewChange(PoprithmsTensorId viewChangeOut, const Tensor &t, bool isOutplace)

This method performs the following steps:

(1) adds an aliasGate whose (unique) input is viewChangeOut,

(2) registers that the output of the aliasGate corresponds to the PopART Tensor t,

(3) registers that the creator of t (if there is any) corresponds to 2 poprithms ops: the creator of viewChangeOut and the aliasGate created at (1).

Parameters
  • viewChangeOut – This is a Tensor which is the output of a view changing Op, such as reshape and dimShuffle.

  • t – This PopART Tensor is the output of the corresponding PopART view changing Op.

  • isOutplace – This boolean determines if the AliasGate created at (1) should be open or closed. If isOutplace is true, then the AliasGate will be closed.

void update(OpId oldId, OpId newId)

Replace all appearances of oldId in all maps between PopART and poprithms, with newId.

This is useful when, for example, an Op is replaced in the PopART Graph during the inplacing transformation.

TensorId getTensorId(const PoprithmsTensorId &id) const
Returns

The TensorId corresponding to a poprithms TensorId.

bool contains(const PoprithmsTensorId&) const
PoprithmsTensorId getPoprithmsTensorId(const TensorId &id) const
Returns

The poprithms TensorId corresponding to a TensorId.

bool contains(const TensorId&) const
OpId getOpId(PoprithmsOpId) const
Returns

The OpId corresponding to a poprithms OpId.

bool contains(PoprithmsOpId) const
PoprithmsOpId getGate(OpId opId) const
Returns

The ID of the AliasGate in the poprithms Graph, which corresponds to the PopART Op opId. If no such AliasGate exists, an error is thrown.

std::vector<PoprithmsOpId> getAll(OpId) const
Returns

The poprithms OpIds which correspond to a PopART OpId. It is possible for 1 PopART Op to correspond to multiple poprithms Ops.

bool contains(OpId) const
std::vector<Tensor*> allAliases(const Tensor &t) const

Get all aliases for a tensor for this given model.

Returned tensors include the argument t, if it is non-empty.

bool contains(const Tensor &super, const Tensor &sub) const
Returns

true if all of the ‘allocation’ elements of sub are also in super.
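
The containment test described above amounts to a subset check. The sketch below is a toy model only: it assumes each tensor can be summarised as a set of allocation element indices, which is a hypothetical representation and not how poprithms stores this information. The names ToyElements and toyContains are likewise illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical representation: the allocation elements a tensor touches.
using ToyElements = std::set<std::int64_t>;

// Returns true if every element of sub is also an element of super.
bool toyContains(const ToyElements &super, const ToyElements &sub) {
  // std::includes needs sorted ranges; std::set iterates in sorted order.
  return std::includes(super.begin(), super.end(), sub.begin(), sub.end());
}
```

With super = {0, 1, 2, 3} and sub = {1, 2}, toyContains(super, sub) is true, and swapping the arguments gives false.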

Public Members

poprithms::memory::inplace::Graph g

The poprithms Graph.

popart::Graph *thisGraph = nullptr

The PopART graph reference.

Public Static Attributes

static constexpr int loadFactor = 0.5

Load factor used for hash map containers.

#include <popart/op/ipucopy.hpp>
class IpuCopyOp : public popart::Op

Public Functions

IpuCopyOp(const OperatorIdentifier &_opid, VGraphId _destIpu, const Op::Settings &settings_)
std::unique_ptr<Op> clone() const final
void setup() final
inline VGraphId getDestIpu() const
const SourceIpuMap &getSourceIpus() const
const SourceTensorMap &getSourceTensors() const
VGraphId getSourceIpu(const TensorId &tenId) const
VGraphId getSourceIpu() const
VGraphId getMinSourceIpu() const
VGraphId getMaxSourceIpu() const
void setSourceIpus(const SourceIpuMap sourceIpus)
void setSourceTensors(const SourceTensorMap sourceTensors)
void appendOutlineAttributes(OpSerialiserBase&) const override
inline float getSubgraphValue() const final
bool isOutlineable() const override
bool isIpuCopyOp() const final
bool copiesOptimizerTensors() const final
void connectInTensor(InIndex, TensorId, VGraphId sourceIpu) override
std::string getFromToStr() const
void disconnectInTensor(InIndex, Tensor*) override
inline bool canShard() const override
VGraphIdAndTileSet getIntrospectionInVirtualGraphId(InIndex index, std::set<OpId> &visited) const override
VGraphIdAndTileSet getIntrospectionOutVirtualGraphId(OutIndex index, std::set<OpId> &visited) const override
using popart::SourceIpuMap = std::map<TensorId, VGraphId>
using popart::SourceTensorMap = std::map<VGraphId, std::vector<TensorId>>
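
The two aliases above describe the same relationship from opposite directions: one keyed by tensor, the other keyed by source IPU. A minimal sketch of how one could be derived from the other follows. It assumes TensorId is a std::string and VGraphId is a 64-bit integer, declares local copies of the aliases so that it compiles without PopART headers, and uses a hypothetical helper name, groupBySourceIpu, that is not part of the PopART API.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Assumed stand-ins so the sketch builds without PopART headers.
using TensorId = std::string;
using VGraphId = std::int64_t;
using SourceIpuMap = std::map<TensorId, VGraphId>;
using SourceTensorMap = std::map<VGraphId, std::vector<TensorId>>;

// Group tensor ids by the IPU they are copied from.
// (Hypothetical helper, for illustration only.)
SourceTensorMap groupBySourceIpu(const SourceIpuMap &sourceIpus) {
  SourceTensorMap grouped;
  for (const auto &entry : sourceIpus) {
    grouped[entry.second].push_back(entry.first);
  }
  return grouped;
}
```

For a SourceIpuMap of {"a": 0, "b": 1, "c": 0}, the result maps IPU 0 to {"a", "c"} and IPU 1 to {"b"}.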

14.8.2. Op definition for Poplar implementation

#include <popart/popx/opx.hpp>
class Opx

Subclassed by popart::popx::AbortOpx, popart::popx::AdaDeltaUpdaterOpx, popart::popx::AdamUpdaterOpx, popart::popx::AddBiasDataGradOpx, popart::popx::AddBiasOpx, popart::popx::AllReduceOpx, popart::popx::ArgExtremaOpx, popart::popx::AsinGradOpx, popart::popx::AtanGradOpx, popart::popx::BaseConcatOpx, popart::popx::BaseExpandOpx, popart::popx::BasePadOpx, popart::popx::BaseSliceOpx, popart::popx::BaseSortOpx, popart::popx::BaseWhereOpx, popart::popx::BinaryComparisonOpx, popart::popx::Bucketizex, popart::popx::CastOpx, popart::popx::CastThenPow2ScaleOpx, popart::popx::ClipGradOpx, popart::popx::CollectivesBaseOpx, popart::popx::ConcatGradOpx, popart::popx::ConvFlipWeightsGradOpx, popart::popx::CtcBeamSearchDecoderOpx, popart::popx::CtcGradOpx, popart::popx::CtcOpx, popart::popx::CumSumGradOpx, popart::popx::CumSumOpx, popart::popx::DynamicSliceOpx, popart::popx::DynamicUpdateOpx, popart::popx::DynamicZeroOpx, popart::popx::ElementWiseBinaryOpx, popart::popx::ElementWiseUnaryOpx, popart::popx::EluGradOpx, popart::popx::ExchangeBaseOpx, popart::popx::ExpandGradOpx, popart::popx::GatherBaseOpx, popart::popx::GatherGradOpx, popart::popx::GeluErfGradOpx, popart::popx::GeluGradOpx, popart::popx::GetRandomSeedOpx, popart::popx::GRUGradOpx, popart::popx::GRUOpx, popart::popx::HardSigmoidGradOpx, popart::popx::HistogramOpx, popart::popx::IdentityInplaceOpx, popart::popx::IdentityLossGradOpx, popart::popx::IdentityLossOpx, popart::popx::IfOpx, popart::popx::InitOpx, popart::popx::IoTileCopyOpx, popart::popx::IpuCopyOpx, popart::popx::L1GradOpx, popart::popx::L1Opx, popart::popx::LambSquareOpx, popart::popx::LeakyReluGradOpx, popart::popx::LossScaleUpdateOpx, popart::popx::LRNGradOpx, popart::popx::LRNOpx, popart::popx::LSTMGradOpx, popart::popx::LSTMOpx, popart::popx::MatMulOpx, popart::popx::MaxArgGradOpx, popart::popx::MaxOpx, popart::popx::MeanArgGradOpx, popart::popx::MinArgGradOpx, popart::popx::MinOpx, popart::popx::ModifyRandomSeedOpx, popart::popx::MultiConvBaseOpx, 
popart::popx::MultiConvWeightsGradBaseOpx, popart::popx::NllGradOpx, popart::popx::NlllWithSoftmaxGradDirectOpx, popart::popx::NllOpx, popart::popx::NopOpx, popart::popx::NormalizeImageOpx, popart::popx::NormOpx, popart::popx::OnehotGradOpx, popart::popx::OnehotOpx, popart::popx::PopartLSTMOpxBase< LSTMOP >, popart::popx::Pow2ScaleThenCastOpx, popart::popx::PrintTensorOpx, popart::popx::RandomNormalOpx, popart::popx::RandomUniformOpx, popart::popx::ReduceL1GradOpx, popart::popx::ReduceL1Opx, popart::popx::ReduceL2GradOpx, popart::popx::ReduceL2Opx, popart::popx::ReduceLogSumExpGradOpx, popart::popx::ReduceLogSumExpOpx, popart::popx::ReduceLogSumGradOpx, popart::popx::ReduceLogSumOpx, popart::popx::ReduceMaxGradOpx, popart::popx::ReduceMaxOpx, popart::popx::ReduceMeanGradOpx, popart::popx::ReduceMeanOpx, popart::popx::ReduceMedianGradOpx, popart::popx::ReduceMedianOpx, popart::popx::ReduceMinGradOpx, popart::popx::ReduceMinOpx, popart::popx::ReduceProdGradOpx, popart::popx::ReduceProdOpx, popart::popx::ReduceSumGradOpx, popart::popx::ReduceSumOpx, popart::popx::ReduceSumSquareGradOpx, popart::popx::ReduceSumSquareOpx, popart::popx::ReluGradOpx, popart::popx::ReshapeBaseOpx, popart::popx::ResizeGradOpx, popart::popx::ResizeOpx, popart::popx::RestoreBaseOpx< Derived >, popart::popx::ReverseBaseOpx, popart::popx::RMSPropUpdaterOpx, popart::popx::RNNGradOpx, popart::popx::RNNOpx, popart::popx::RoiAlignGradOpx, popart::popx::RoiAlignOpx, popart::popx::ScaledAddOpx, popart::popx::ScatterDataGradOpx, popart::popx::ScatterReduceGradOpx, popart::popx::ScatterReduceOpx, popart::popx::ScatterUpdateGradOpx, popart::popx::SeluGradOpx, popart::popx::SequenceSliceInplaceOpx, popart::popx::SequenceSliceOpx, popart::popx::SGD1NesterovOpx, popart::popx::ShapedDropoutOpx, popart::popx::ShrinkGradOpx, popart::popx::SinhGradOpx, popart::popx::SoftmaxGradDirectOpx, popart::popx::SoftPlusGradOpx, popart::popx::SoftSignGradOpx, popart::popx::SplineBasisx, popart::popx::SplineWeightingx, 
popart::popx::SplitOpx, popart::popx::StashOpx, popart::popx::SubgraphOpx, popart::popx::SubsampleGradOpx, popart::popx::SubsampleInplaceOpx, popart::popx::SubsampleOpx, popart::popx::SumArgGradOpx, popart::popx::SumOpx, popart::popx::SwishGradOpx, popart::popx::SyncOpx, popart::popx::TanhGradOpx, popart::popx::TanhOpx, popart::popx::TensorRemapOpx, popart::popx::ThresholdedReluGradOpx, popart::popx::TileGradOpx, popart::popx::TileOpx, popart::popx::TopKGradOpx, popart::popx::TransposeInplaceOpx, popart::popx::TransposeOpx, popart::popx::VarUpdateOpx, popart::popx::WhereXGradOpx, popart::popx::WhereYGradOpx, popart::popx::ZerosOpx, popart::popx::PopartLSTMOpxBase< PopartLSTMGradOp >, popart::popx::PopartLSTMOpxBase< PopartLSTMOp >, popart::popx::RestoreBaseOpx< RestoreInplaceOpx >, popart::popx::RestoreBaseOpx< RestoreOpx >

Public Functions

Opx(Op*, Devicex*)
virtual ~Opx()
virtual poplar::Tensor createInput(InIndex index, const poplar::DebugNameAndId &dnai) const
virtual poplar::Tensor createInputTensor(popart::InIndex index, const poplar::DebugNameAndId &dnai) const
virtual InputCreatorType getInputCreatorType(InIndex index) const
virtual bool canUnwind(InIndex, OutIndex) const
virtual view::RegMap unwindRegion(InIndex, OutIndex) const
virtual poplar::Tensor unwindTensorLayout(poplar::Tensor tensor, InIndex, OutIndex) const
virtual bool createsEquiv(int index0, const Opx *opx1, int index1) const
virtual bool outputCreatedExternally(OutIndex index) const
virtual std::set<TensorId> mustExistBeforeCreate(int index0) const
virtual DnfTensorIds mustExistBeforeCreateDNF(int index0) const
poplar::Tensor cloneNcopy(poplar::program::Sequence&, TensorId) const
poplar::Tensor cloneNcopy(poplar::program::Sequence&, const poplar::Tensor&, const std::string name = "") const
poplar::Tensor broadcast(const std::vector<int64_t>&, TensorId) const
poplar::Tensor broadcast(const std::vector<int64_t>&, poplar::Tensor) const
const Devicex *getDevicex() const
int64_t getVirtualGraphId() const
poplar::Graph &graph() const
poplar::Graph &topLevelGraph() const
virtual poplar::Graph &srcGraph(InIndex) const
virtual poplar::Graph &dstGraph(OutIndex) const
const poplar::Tensor &get(TensorId) const
const poplar::Tensor &getView(TensorId) const
void insert(TensorId, const poplar::Tensor&) const
Tensor *inTensor(InIndex) const
Tensor *outTensor(OutIndex) const
const poplar::Tensor &getInTensor(InIndex index) const
const poplar::Tensor &getOutTensor(OutIndex index) const
const poplar::Tensor &getInView(InIndex index) const
const poplar::Tensor &getOutView(OutIndex index) const
bool hasInViewChangers(InIndex index) const
const ViewChangers &getInViewChangers(InIndex index) const
void setOutViewChangers(OutIndex index, const ViewChangers &changers) const
const TensorInfo &inInfo(InIndex) const
const Shape &inShape(InIndex) const
const TensorInfo &outInfo(OutIndex) const
const Shape &outShape(OutIndex) const
template<class OP>
inline OP &getOp() const
template<class OP>
inline void verifyOp(Op *op, const OperatorIdentifier &opid)
template<class OP>
inline void verifyOp(Op *op, std::vector<OperatorIdentifier> opids)
template<class OP>
inline void verifyOp(Op *op)
bool hasInput(InIndex) const
bool hasOutput(OutIndex) const
void setOutTensor(OutIndex index, const poplar::Tensor &tensor) const
TensorId inId(InIndex index) const
TensorId outId(OutIndex index) const
poplar::Tensor getConst(const poplar::Type &type, const std::vector<size_t> &shape, double val, const std::string &name) const
poplar::Tensor getScalarVariable(const poplar::Type &type, const std::string &name) const
poplar::Tensor getZerosTensor(std::vector<std::size_t>, poplar::Type, std::string) const
poplar::Graph &inGraph(InIndex in) const

Return the virtual graph associated with input at index in.

Parameters

in – The input index.

Returns

The corresponding Poplar virtual graph.

virtual std::set<OpxGrowPartId> getInGrowPartIds(Tensor *inTensor) const
virtual OpxGrowPartId getOutGrowPartId(Tensor *outTensor) const
virtual bool hasCreatorViewChangers(InIndex index) const
virtual ViewChangers getCreatorViewChangers(InIndex index) const
virtual void growPart(OpxGrowPartId id) const
virtual void grow(poplar::program::Sequence&) const
virtual void grow(std::vector<poplar::program::Sequence>&) const
const popart::DebugInfo &getDebugInfo() const
const poplar::DebugNameAndId getDebugNameAndId(const std::string name = "", poplar::SourceLocation loc = poplar::SourceLocation::Current()) const
poplar::DebugContext debugContext(const std::string name = "", poplar::SourceLocation loc = poplar::SourceLocation::Current()) const
virtual PreparedTensorInfos getOutputsToPrepare() const
virtual PreparedTensorInfos getInputsToPrepare() const
poplar::Graph &outGraph(OutIndex out) const

Return the virtual graph associated with output at index out.

Parameters

out – The output index.

Returns

The corresponding Poplar virtual graph.

const std::vector<size_t> inShapeSzt(InIndex) const
poplar::Tensor mapMaybeInPlace(popops::expr::BinaryOpType, poplar::Tensor&, poplar::Tensor&, poplar::program::Sequence&, const poplar::DebugContext&, const poplar::OptionFlags&, const std::string&)

Public Members

double inputCreatorPriority = {0.0}
Op *op_p
class RoiAlignGradOpx : public popart::popx::Opx

Public Functions

RoiAlignGradOpx(Op*, Devicex*)
~RoiAlignGradOpx() override = default
virtual void grow(poplar::program::Sequence&) const final
class RoiAlignOpx : public popart::popx::Opx

Public Functions

RoiAlignOpx(Op*, Devicex*)
~RoiAlignOpx() override = default
virtual void grow(poplar::program::Sequence&) const final

14.8.3. Available Ops (Op class)

struct AiGraphcoreOpIdV1 : public popart::OperatorIdentifier

Public Functions

inline AiGraphcoreOpIdV1(const OpType &_type, NumInputs inputs = {}, int outputs = 0)
class AbortOp : public popart::Op

Public Functions

AbortOp(const OperatorIdentifier&, const Op::Settings&)
std::unique_ptr<Op> clone() const override
void setup() final
inline float getSubgraphValue() const final
inline bool hasSideEffect() const override

Public Static Functions

static inline InIndex getInIndex()
class AbsGradOp : public popart::Op

Public Functions

AbsGradOp(const AbsOp&)
const std::vector<GradInOutMapper> &gradInputInfo() const final
const std::map<int, int> &gradOutToNonGradIn() const final
void setup() final
std::unique_ptr<Op> clone() const final
inline virtual float getSubgraphValue() const final

Public Static Functions

static inline InIndex getGradInIndex()
static inline InIndex getFwdArgInIndex()
static inline OutIndex getOutIndex()
class AbsOp : public popart::ElementWiseUnaryOp

Public Functions

AbsOp(const OperatorIdentifier &_opid, const Op::Settings &settings)
std::unique_ptr<Op> clone() const override
std::vector<std::unique_ptr<Op>> getGradOps() final
class AdaDeltaUpdaterOp : public popart::Op

Public Functions

AdaDeltaUpdaterOp(OptimizerValue eps, const Op::Settings&)
std::unique_ptr<Op> clone() const final
void setup() final
void appendOutlineAttributes(OpSerialiserBase&) const final
inline float getSubgraphValue() const final
inline bool isOptimizerOp() const override
ReplicatedTensorShardingIndices getReplicatedTensorShardingIndices() const final

Public Members

const OptimizerValue initEps

Public Static Functions

static inline InIndex getGradInIndex()
static inline InIndex getAccl1InIndex()
static inline InIndex getAccl2InIndex()
static inline InIndex getEpsInIndex()
static inline OutIndex getUpdaterOutIndex()
class AdamUpdaterOp : public popart::Op

Public Functions

AdamUpdaterOp(AdamMode mode_, OptimizerValue wd, OptimizerValue b1, OptimizerValue b2, OptimizerValue eps, const Op::Settings&)
std::unique_ptr<Op> clone() const final
void setup() final
void appendOutlineAttributes(OpSerialiserBase&) const final
inline float getSubgraphValue() const final
inline bool isOptimizerOp() const override
view::Regions modifies(InIndex) const final
ReplicatedTensorShardingIndices getReplicatedTensorShardingIndices() const final

Public Members

AdamMode mode
const OptimizerValue initWd
const OptimizerValue initB1
const OptimizerValue initB2
const OptimizerValue initEps

Public Static Functions

static inline InIndex getVarInIndex()
static inline InIndex getAccl1InIndex()
static inline InIndex getAccl2InIndex()
static inline InIndex getStepInIndex()
static inline InIndex getWdInIndex()
static inline InIndex getBeta1InIndex()
static inline InIndex getBeta2InIndex()
static inline InIndex getEpsInIndex()
static inline OutIndex getUpdaterOutIndex()
class AddArg0GradOp : public popart::ReduceSumOp

Public Functions

AddArg0GradOp(const Op&, const std::vector<int64_t> &axes)
const std::vector<GradInOutMapper> &gradInputInfo() const final
const std::map<int, int> &gradOutToNonGradIn() const final
void setup() final
std::unique_ptr<Op> clone() const final
class AddArg1GradOp : public popart::ReduceSumOp

Public Functions

AddArg1GradOp(const Op&, const std::vector<int64_t> &axes)
const std::vector<GradInOutMapper> &gradInputInfo() const final
const std::map<int, int> &gradOutToNonGradIn() const final
void setup() final
std::unique_ptr<Op> clone() const final
class AddBiasBiasGradOp : public popart::ReduceSumOp

Public Functions

AddBiasBiasGradOp(const AddBiasOp&, const std::vector<int64_t> &axes)
std::unique_ptr<Op> clone() const final
const std::vector<GradInOutMapper> &gradInputInfo() const final
const std::map<int, int> &gradOutToNonGradIn() const final

Public Static Functions

static inline InIndex getInIndex()
static inline OutIndex getOutIndex()
class AddBiasDataGradOp : public popart::IdentityOp

Public Functions

AddBiasDataGradOp(const AddBiasOp&)
std::unique_ptr<Op> clone() const final
const std::vector<GradInOutMapper> &gradInputInfo() const final
const std::map<int, int> &gradOutToNonGradIn() const final

Public Static Functions

static inline InIndex getInIndex()
static inline OutIndex getOutIndex()
class AddBiasInplaceOp : public popart::AddBiasOp

Public Functions

AddBiasInplaceOp(const AddBiasOp&)
std::unique_ptr<Op> clone() const override
std::vector<std::tuple<OperatorIdentifier, float>> inplacePriorityDefault() const final
std::unique_ptr<Op> getInplaceVariant(const OperatorIdentifier &o) const final
view::Regions modifies(InIndex) const override
view::Regions aliases(InIndex, OutIndex) const override
class AddBiasOp : public popart::Op

Subclassed by popart::AddBiasInplaceOp

Public Functions

AddBiasOp(const OperatorIdentifier &_opid, const Op::Settings &settings)
std::unique_ptr<Op> clone() const override
std::vector<std::unique_ptr<Op>> getGradOps() final
void setup() final
inline float getSubgraphValue() const final
std::vector<std::tuple<OperatorIdentifier, float>> inplacePriorityDefault() const override
std::unique_ptr<Op> getInplaceVariant(const OperatorIdentifier&) const override
view::RegMap fwdRegMap(InIndex, OutIndex) const override
view::RegMap bwdRegMap(InIndex, OutIndex) const override
void growAliasModel(AliasModel&) const override
poprithms::memory::inplace::Proposal mapInplaceProposal(const AliasModel&, OperatorIdentifier) const override

Public Static Functions

static inline InIndex getDataInIndex()
static inline InIndex getBiasInIndex()
static inline OutIndex getOutIndex()
class AddLhsInplaceOp : public popart::ElementWiseBinaryInplaceLhsOp

Public Functions

inline AddLhsInplaceOp(const OperatorIdentifier &_, const Op::Settings &_settings)
inline AddLhsInplaceOp(const Op::Settings &_settings)
std::unique_ptr<Op> clone() const final
class AddRhsInplaceOp : public popart::ElementWiseBinaryInplaceRhsOp

Public Functions

inline AddRhsInplaceOp(const Op::Settings &_settings)
std::unique_ptr<Op> clone() const final
class AllReduceGradOp : public popart::AllReduceOp

Public Functions

AllReduceGradOp(CollectiveOperator op_, std::vector<int64_t> ipus_, const bool identicalInputs_, const bool identicalGradInputs_, const Op::Settings &settings_)
std::unique_ptr<Op> clone() const override
const std::vector<GradInOutMapper> &gradInputInfo() const final
const std::map<int, int> &gradOutToNonGradIn() const final
class AllReduceOp : public popart::Op

Subclassed by popart::AllReduceGradOp

Public Functions

AllReduceOp(const OperatorIdentifier &_opid, CollectiveOperator op_, std::vector<int64_t> ipus_, const Op::Settings &settings_)
AllReduceOp(const OperatorIdentifier &_opid, CollectiveOperator op_, std::vector<int64_t> ipus_, const bool identicalInputs_, const bool identicalGradInputs_, const Op::Settings &settings_)
std::unique_ptr<Op> clone() const override
std::vector<std::unique_ptr<Op>> getGradOps() override
void setup() final
void appendOutlineAttributes(OpSerialiserBase&) const override
bool canBeReplacedByIdentity() const override
inline float getSubgraphValue() const override
VGraphIdAndTileSet getIntrospectionInVirtualGraphId(InIndex index, std::set<OpId> &visited) const override
VGraphIdAndTileSet getIntrospectionOutVirtualGraphId(OutIndex index, std::set<OpId> &visited) const override
inline CollectiveOperator getReduceOp() const
inline bool getIdenticalInputs() const
inline std::vector<int64_t> getIpus() const

Public Static Functions

static inline InIndex getInStartIndex()
static inline OutIndex getOutStartIndex()
class AndOp : public popart::BinaryComparisonOp

Public Functions

AndOp(const OperatorIdentifier &_opid, const Op::Settings &settings)
std::unique_ptr<Op> clone() const final
std::vector<std::unique_ptr<Op>> getGradOps() final
class ArgExtremaOp : public popart::Op

Subclassed by popart::ArgMaxOp, popart::ArgMinOp

Public Functions

ArgExtremaOp(const OperatorIdentifier &_opid, int64_t axis, int64_t keepdims, const Op::Settings &settings)
std::unique_ptr<Op> clone() const override
void setup() final
int64_t getKeepDims() const
int64_t getAxis() const
void appendOutlineAttributes(OpSerialiserBase&) const final
inline float getSubgraphValue() const final
inline bool canShard() const override

Public Static Functions

static inline InIndex getInIndex()
static inline OutIndex getOutIndex()
class ArgMaxOp : public popart::ArgExtremaOp

Public Functions

std::unique_ptr<Op> clone() const final
class ArgMinOp : public popart::ArgExtremaOp

Public Functions

std::unique_ptr<Op> clone() const final
class AsinGradOp : public popart::ElementWiseNonLinearUnaryGradOp

Public Functions

AsinGradOp(const AsinOp&)
std::unique_ptr<Op> clone() const final
class AsinInplaceOp : public popart::ElementWiseInplaceUnaryOp

Public Functions

AsinInplaceOp(const AsinOp&)
std::unique_ptr<Op> clone() const final
class AsinOp : public popart::ElementWiseUnaryOp

Public Functions

AsinOp(const OperatorIdentifier &_opid, const Op::Settings &settings_)
std::unique_ptr<Op> clone() const final
std::vector<std::unique_ptr<Op>> getGradOps() final
std::vector<std::tuple<OperatorIdentifier, float>> inplacePriorityDefault() const final
std::unique_ptr<Op> getInplaceVariant(const OperatorIdentifier&) const final
class Atan2Arg0GradOp : public popart::ElementWiseBinaryArg0GradOp

Public Functions

Atan2Arg0GradOp(const Op&, const std::vector<int64_t> &reduction_axes)
std::unique_ptr<Op> clone() const final
class Atan2Arg1GradOp : public popart::ElementWiseBinaryArg1GradOp

Public Functions

Atan2Arg1GradOp(const Op&, const std::vector<int64_t> &reduction_axes)
std::unique_ptr<Op> clone() const final
class Atan2LhsInplaceOp : public popart::ElementWiseBinaryInplaceLhsOp

Public Functions

inline Atan2LhsInplaceOp(const Op::Settings &_settings)
std::unique_ptr<Op> clone() const final
class AtanGradOp : public popart::ElementWiseNonLinearUnaryGradOp

Public Functions

AtanGradOp(const AtanOp&)
std::unique_ptr<Op> clone() const final
class AtanInplaceOp : public popart::ElementWiseInplaceUnaryOp

Public Functions

AtanInplaceOp(const AtanOp&)
std::unique_ptr<Op> clone() const final
class AtanOp : public popart::ElementWiseUnaryOp

Public Functions

AtanOp(const OperatorIdentifier &_opid, const Op::Settings &settings_)
std::unique_ptr<Op> clone() const final
std::vector<std::unique_ptr<Op>> getGradOps() final
std::vector<std::tuple<OperatorIdentifier, float>> inplacePriorityDefault() const final
std::unique_ptr<Op> getInplaceVariant(const OperatorIdentifier&) const final
class AutoLossScaleProxyGradOp : public popart::AutoLossScaleProxyOp

Public Functions

AutoLossScaleProxyGradOp(const AutoLossScaleProxyOp &fwdOp)
std::unique_ptr<Op> clone() const final
const std::vector<GradInOutMapper> &gradInputInfo() const final
const std::map<int, int> &gradOutToNonGradIn() const final
class AutoLossScaleProxyOp : public popart::ElementWiseUnaryOp

Subclassed by popart::AutoLossScaleProxyGradOp

Public Functions

AutoLossScaleProxyOp(const OperatorIdentifier &_opid, const Op::Settings &settings)
std::unique_ptr<Op> clone() const override
std::vector<std::unique_ptr<Op>> getGradOps() final
class AveragePoolGradOp : public popart::Op

Public Functions

AveragePoolGradOp(const AveragePoolOp&)
const std::vector<GradInOutMapper> &gradInputInfo() const final
const std::map<int, int> &gradOutToNonGradIn() const final
void setup() final
std::unique_ptr<Op> clone() const final
inline float getSubgraphValue() const final
void appendOutlineAttributes(OpSerialiserBase&) const override

Public Members

const Shape creatorSpatialK
const Shape creatorStrides
const Shape creatorLowerPads
const Shape creatorUpperPads

Public Static Functions

static inline InIndex getPrePooledInIndex()
static inline InIndex getPooledInIndex()
static inline InIndex getGradPooledInIndex()
static inline OutIndex getOutIndex()
class AveragePoolOp : public popart::HasReceptiveFieldOp

Public Functions

AveragePoolOp(const OperatorIdentifier &_opid, int64_t _countIncludePad, const std::vector<int64_t> &_kernelShape, const HasReceptiveFieldOp::ReceptiveOpAttributes &attributes, const Op::Settings &settings_)
std::unique_ptr<Op> clone() const final
std::vector<std::unique_ptr<Op>> getGradOps() final
int64_t getNOutChans() const final
void appendOutlineAttributes(OpSerialiserBase&) const override
inline float getSubgraphValue() const final
bool canBeReplacedByIdentity() const override
Shape getSpatialK() const final

Public Static Functions

static inline InIndex getInIndex()
static inline OutIndex getOutIndex()
class BaseOnnxRNNGradOp : public popart::Op

Subclassed by popart::GRUGradOp, popart::LSTMGradOp, popart::RNNGradOp

Public Functions

BaseOnnxRNNGradOp(const OperatorIdentifier &_opid, const BaseOnnxRNNOp &fwd_op)
virtual std::unique_ptr<Op> clone() const override = 0
void setup() override
const std::vector<GradInOutMapper> &gradInputInfo() const override
const std::map<int, int> &gradOutToNonGradIn() const override
bool hasLastHiddenStateGradInput() const
bool hasFullHiddenStateGradInput() const
inline float getSubgraphValue() const final

Public Members

const bool hasBiasesInput
const bool hasInitialHInput
const unsigned batch_size
const unsigned input_size
const unsigned max_seq_length
const unsigned hidden_size
const unsigned num_directions = 1

Public Static Functions

static inline InIndex getInputInIndex()
static inline InIndex getInputWeightsInIndex()
static inline InIndex getRecurrenceWeightsInIndex()
static inline InIndex getBiasesInIndex()
static inline InIndex getInitialHInIndex()
static inline InIndex getFullHiddenStateInIndex()
static inline InIndex getLastHiddenStateGradInIndex()
static inline InIndex getFullHiddenStateGradInIndex()
static inline InIndex getSequenceLensInIndex()
static inline OutIndex getInputOutIndex()
static inline OutIndex getInputWeightsOutIndex()
static inline OutIndex getRecurrenceWeightsOutIndex()
static inline OutIndex getBiasesOutIndex()
static inline OutIndex getInitialHOutIndex()
class BaseOnnxRNNOp : public popart::Op

Subclassed by popart::GRUOp, popart::LSTMOp, popart::RNNOp

Public Functions

BaseOnnxRNNOp(const OperatorIdentifier &_opid, nonstd::optional<int64_t> hidden_size, const Op::Settings &settings_)
virtual std::unique_ptr<Op> clone() const override = 0
int64_t getMaxSeqLength() const
int64_t getBatchSize() const
int64_t getInputSize() const
int64_t getHiddenSize() const
virtual int64_t getNumDirections() const
void checkHiddenSize() const
bool hasBiasesInput() const
bool hasInitialHInput() const
bool hasSeqLenInput() const
std::set<InIndex> optionalInputs() const override
void appendOutlineAttributes(OpSerialiserBase&) const override
inline float getSubgraphValue() const final
inline virtual std::string getName() const
inline nonstd::optional<int64_t> getHiddenSizeAttribute() const

Public Static Functions

static inline InIndex getInputInIndex()
static inline InIndex getInputWeightsInIndex()
static inline InIndex getRecurrenceWeightsInIndex()
static inline InIndex getBiasesInIndex()
static inline InIndex getSequenceLensInIndex()
static inline InIndex getInitialHInIndex()
static inline OutIndex getFullHiddenStateOutIndex()
static inline OutIndex getLastHiddenStateOutIndex()
class BasePadOp : public popart::Op

Subclassed by popart::BasePadOutplaceOp, popart::PadInplaceOp

Public Functions

BasePadOp(const OperatorIdentifier &_opid, const std::vector<int64_t> &_pads, const std::vector<unsigned> &_flips, float value_, const std::string &_mode, const Op::Settings &settings_)
std::unique_ptr<Op> clone() const override
bool padSizeZero() const
inline float getSubgraphValue() const final
view::Region valueRegion() const
std::vector<int64_t> padDimensions() const
inline int64_t getLowerPadding(size_t dim) const
inline int64_t getUpperPadding(size_t dim) const
inline const std::string &getMode() const
inline float getPadValue() const
void appendOutlineAttributes(OpSerialiserBase&) const override
void setup() final
view::RegMap fwdRegMap(InIndex, OutIndex) const final
view::RegMap bwdRegMap(InIndex, OutIndex) const final
inline int64_t getRank() const
std::vector<Slice> getSlices() const
inline std::vector<std::ptrdiff_t> getLowerPadding() const
inline std::vector<std::ptrdiff_t> getUpperPadding() const
inline const std::vector<int64_t> &getPads() const
inline const std::vector<unsigned> &getFlips() const
void growAliasModel(AliasModel&) const override

Public Static Functions

static inline InIndex getInIndex()
static inline OutIndex getOutIndex()
class BasePadOutplaceOp : public popart::BasePadOp

Subclassed by popart::PadOp, popart::SliceGradOp

Public Functions

BasePadOutplaceOp(const OperatorIdentifier &_opid, const std::vector<int64_t> &_pads, const std::vector<unsigned> &_flips, float value_, const std::string &_mode, const Op::Settings &settings_)
std::unique_ptr<Op> clone() const override
inline bool canBeReplacedByIdentity() const override
poprithms::memory::inplace::Proposal mapInplaceProposal(const AliasModel&, OperatorIdentifier) const override
std::vector<std::tuple<OperatorIdentifier, float>> inplacePriorityDefault() const override
std::unique_ptr<Op> getInplaceVariant(const OperatorIdentifier&) const override
class BaseSliceOp : public popart::Op

Subclassed by popart::SliceInplaceOp, popart::SliceOp

Public Functions

BaseSliceOp(const OperatorIdentifier &_opid, const std::vector<int64_t> &starts_, const std::vector<int64_t> &ends_, const std::vector<int64_t> &axes_, const std::vector<int64_t> &steps_, const Op::Settings &settings_)
std::unique_ptr<Op> clone() const override
void growAliasModel(AliasModel&) const override
void setup() final
virtual void connectInTensor(InIndex, TensorId) final
void appendOutlineAttributes(OpSerialiserBase&) const override
view::RegMap fwdRegMap(InIndex, OutIndex) const final
view::RegMap bwdRegMap(InIndex, OutIndex) const final
view::Regions uses(InIndex) const final
view::Region createSlicedRegion(const Shape &toBeSliced) const
view::Region getFullInRegion() const
view::Region getFullOutRegion() const
inline const std::vector<int64_t> &getStarts() const
inline const std::vector<int64_t> &getEnds() const
inline const std::vector<int64_t> &getAxes() const
inline const std::vector<int64_t> &getSteps() const
inline void setStarts(const std::vector<int64_t> &x)
inline void setEnds(const std::vector<int64_t> &x)
inline void setAxes(const std::vector<int64_t> &x)
inline void setSteps(const std::vector<int64_t> &x)
std::array<std::vector<int64_t>, 2> getLowerUpper() const
std::vector<Slice> getSlices(std::vector<int64_t> input_shape) const
std::vector<Slice> getSlices() const
std::vector<int64_t> getPads() const
std::vector<unsigned> getFlips() const
inline float getSubgraphValue() const final
inline bool canShard() const override

Public Members

int unwindConcatDim = 0

Public Static Functions

static inline InIndex getInIndex()
static inline InIndex getStartsInIndex()
static inline InIndex getEndsInIndex()
static inline InIndex getAxesInIndex()
static inline InIndex getStepsInIndex()
static inline OutIndex getOutIndex()
class BaseSortOp : public popart::Op

Subclassed by popart::TopKOp

Public Functions

BaseSortOp(const OperatorIdentifier &_opid, int64_t axis, const Op::Settings &settings)
std::unique_ptr<Op> clone() const override
int64_t getAxis() const
void appendOutlineAttributes(OpSerialiserBase&) const override
inline float getSubgraphValue() const final

Public Static Functions

static inline int getInIndex()
class BatchNormGradOp : public popart::Op

Public Functions

BatchNormGradOp(const BatchNormOp&)
std::unique_ptr<Op> clone() const final
const std::vector<GradInOutMapper> &gradInputInfo() const final
const std::map<int, int> &gradOutToNonGradIn() const final
void setup() final
inline float getEpsilon() const
inline int64_t getSpatial() const
void appendOutlineAttributes(OpSerialiserBase&) const override
inline float getSubgraphValue() const final

Public Static Functions

static inline InIndex getXInIndex()
static inline InIndex getScaleInIndex()
static inline InIndex getMeanInIndex()
static inline InIndex getVarInIndex()
static inline InIndex getYGradInIndex()
static inline OutIndex getXOutIndex()
static inline OutIndex getScaleOutIndex()
static inline OutIndex getBOutIndex()
class BatchNormOp : public popart::Op

Public Functions

BatchNormOp(const OperatorIdentifier &_opid, float _epsilon, float _momentum, int64_t _spatial, bool _unbiased_variance, const Op::Settings &settings)
std::unique_ptr<Op> clone() const final
std::vector<std::unique_ptr<Op>> getGradOps() final
void setup() final
inline float getSubgraphValue() const final
inline float getEpsilon() const
inline float getMomentum() const
inline int64_t getSpatial() const
inline bool useUnbiasedVariance() const
inline bool isTraining() const
void appendOutlineAttributes(OpSerialiserBase&) const override
inline bool isNorm() const override

Public Static Functions

static inline InIndex getXInIndex()
static inline InIndex getScaleInIndex()
static inline InIndex getBInIndex()
static inline InIndex getMeanInIndex()
static inline InIndex getVarInIndex()
static inline OutIndex getYOutIndex()
static inline OutIndex getMeanOutIndex()
static inline OutIndex getVarOutIndex()
static inline OutIndex getSavedMeanOutIndex()
static inline OutIndex getSavedVarOutIndex()
class BinaryComparisonOp : public popart::Op

Subclassed by popart::AndOp, popart::EqualOp, popart::GreaterOp, popart::LessOp, popart::OrOp

Public Functions

BinaryComparisonOp(const OperatorIdentifier &_opid, const Op::Settings &_settings)
std::unique_ptr<Op> clone() const override
void setup() final
inline float getSubgraphValue() const final
inline bool canShard() const override

Public Static Functions

static inline InIndex getArg0InIndex()
static inline InIndex getArg1InIndex()
static inline OutIndex getOutIndex()
class BinaryConstScalarOp : public popart::ElementWiseUnaryOp

A unary Op that performs a binary operation (Mul, Div, etc.) between its single input tensor and a scalar whose value is stored as an Op attribute.

The scalarInIndex attribute (0 or 1) controls which input position of the binary operation the scalar occupies; the tensor takes the other position.

Some examples. Let T be the input tensor of this Op.

[value = 2, opType = “Div”, scalarInIndex = 1]: T / 2.0

[value = 4, opType = “Pow”, scalarInIndex = 0]: 4.0 ** T

[value = 0.2, opType = “Add”, scalarInIndex = 0]: 0.2 + T

[value = 100, opType = “Sub”, scalarInIndex = 1]: T - 100.
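
The examples above can be modelled per element with a small standalone sketch. This is not PopART code: the function name applyBinaryConstScalar and the local Type enum are hypothetical stand-ins used only to illustrate how scalarInIndex selects the operand order.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for BinaryConstScalarOp::Type (the enumerator N,
// which only counts the entries, is omitted here).
enum class Type { Add, Sub, Mul, Div, Pow };

// Illustrative model of the op applied to a single element t of T.
// scalarInIndex == 0 places the constant on the left-hand side,
// scalarInIndex == 1 places it on the right-hand side.
inline float applyBinaryConstScalar(float t, float value, Type type,
                                    std::int64_t scalarInIndex) {
  assert(scalarInIndex == 0 || scalarInIndex == 1);
  const float lhs = (scalarInIndex == 0) ? value : t;
  const float rhs = (scalarInIndex == 0) ? t : value;
  switch (type) {
  case Type::Add:
    return lhs + rhs;
  case Type::Sub:
    return lhs - rhs;
  case Type::Mul:
    return lhs * rhs;
  case Type::Div:
    return lhs / rhs;
  case Type::Pow:
    return std::pow(lhs, rhs);
  }
  return t; // Not reached for valid Type values.
}
```

Under this model, applyBinaryConstScalar(t, 2.0f, Type::Div, 1) computes t / 2.0 and applyBinaryConstScalar(t, 100.0f, Type::Sub, 1) computes t - 100, matching the first and last examples.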

Public Types

enum class Type

Values:

enumerator Add = 0
enumerator Sub
enumerator Mul
enumerator Div
enumerator Pow
enumerator N

Public Functions

inline BinaryConstScalarOp(const OperatorIdentifier &x, float value, Type t, int64_t index, const Op::Settings &settings)
virtual std::unique_ptr<Op> clone() const override

Return a copy of the op.

This method must be implemented. The compiler throws an error if this method is not implemented.

virtual std::vector<std::unique_ptr<Op>> getGradOps() final

Determine the corresponding grad op for each op in the forward graph to automatically generate the backward pass.

There can be a separate gradient op for each input or a single gradient op that generates gradients for all inputs.

The mapping from the index of each output tensor of the gradient op to the index of each input tensor of the non-grad op is configured using the gradOutToNonGradIn() method that should be overridden in the grad op definitions.

Throws an error if this op is already a gradient op.
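
As a rough illustration of the mapping described above, the sketch below builds a std::map<int, int> of the same shape that gradOutToNonGradIn() returns, for an imagined forward op with two inputs and a single grad op that produces both gradients. The index constants and the helper names exampleGradOutToNonGradIn and forwardInputFor are assumptions made for this example, not PopART identifiers.

```cpp
#include <cassert>
#include <map>

// Hypothetical input indices of an imagined forward op (not PopART constants).
constexpr int kFwdDataInIndex = 0;
constexpr int kFwdBiasInIndex = 1;

// Mapping for an imagined single grad op that produces both gradients:
// grad op output index -> forward (non-grad) op input index.
inline const std::map<int, int> &exampleGradOutToNonGradIn() {
  static const std::map<int, int> outInfo = {
      {0, kFwdDataInIndex}, // Output 0 is the gradient of the data input.
      {1, kFwdBiasInIndex}  // Output 1 is the gradient of the bias input.
  };
  return outInfo;
}

// Look up which forward input a given gradient output corresponds to.
inline int forwardInputFor(int gradOutIndex) {
  const auto &mapping = exampleGradOutToNonGradIn();
  const auto it = mapping.find(gradOutIndex);
  assert(it != mapping.end());
  return it->second;
}
```

With a separate grad op per input, each grad op would instead hold a one-entry map from its only output to the forward input it differentiates.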

inline float value() const
inline Type opType() const
inline int64_t scalarInIndex() const
class BitwiseBinaryOp : public popart::ElementWiseBinaryOp

Public Functions

BitwiseBinaryOp(const OperatorIdentifier &_opid, const Op::Settings &settings)
std::unique_ptr<Op> clone() const final
std::vector<std::unique_ptr<Op>> getGradOps() final
class BitwiseNotOp : public popart::ElementWiseUnaryOp

Public Functions

BitwiseNotOp(const OperatorIdentifier &_opid, const Op::Settings &settings)
std::unique_ptr<Op> clone() const final
std::vector<std::unique_ptr<Op>> getGradOps() final
class BoundaryOp : public popart::Op

Public Functions

inline BoundaryOp(const Op::Settings &settings_)
std::unique_ptr<Op> clone() const override
inline void setup() final
inline float getSubgraphValue() const final
inline bool isOutlineable() const override
inline bool hasSideEffect() const override
class BucketizeOp : public popart::Op

Public Functions

BucketizeOp(const OperatorIdentifier &opid, bool right, const Op::Settings &settings)
void setup() override
std::unique_ptr<Op> clone() const override
float getSubgraphValue() const override
void appendOutlineAttributes(OpSerialiserBase&) const override
bool isRight() const noexcept

Public Static Functions

static inline InIndex inIndex()
static inline InIndex boundariesInIndex()
static inline OutIndex outIndex()
class CallGradOp : public popart::CallOp

Public Functions

CallGradOp(CallOp &fwdOp, Graph &bwdGraph, const std::vector<GradInOutMapper> &gradInInfo_, const std::map<int, int> &gradOutInfo_)
const std::vector<GradInOutMapper> &gradInputInfo() const final
const std::map<int, int> &gradOutToNonGradIn() const final
class CallOp : public popart::SubgraphOp

Subclassed by popart::CallGradOp

Public Functions

CallOp(const OperatorIdentifier&, Graph &callee, const Op::Settings &settings)
CallOp(const OperatorIdentifier&, Graph &callee, const std::vector<int> &modifiedInputsViaAttrs, const Op::Settings &settings)
void setup() final
std::unique_ptr<Op> clone() const final
Graph &getCalledGraph() const override
std::vector<std::unique_ptr<Op>> getGradOps() final
std::vector<TensorId> getGradOpInputIds(const Graph &gradGraph)
void appendOutlineAttributes(OpSerialiserBase &os) const override
inline float getSubgraphValue() const final
std::vector<const Graph*> getCalledGraphs() const override
void setCalledGraph(Graph&) override
inline InIndex subgraphInToOpInIndex(InIndex index) const override
inline InIndex opInToSubgraphInIndex(InIndex index) const override
inline OutIndex subgraphOutToOpOutIndex(OutIndex index) const override
inline OutIndex opOutToSubgraphOutIndex(OutIndex index) const override
inline std::set<OutIndex> opInToOpOutIndex(InIndex in) const override
inline std::set<InIndex> opOutToOpInIndex(OutIndex out) const override
inline void growAliasModel(AliasModel &m) const override
void connectInTensor(InIndex inIndex, TensorId tenId) override
class CastGradOp : public popart::CastOp

Public Functions

CastGradOp(const CastOp &fwdOp)
std::unique_ptr<Op> clone() const final
const std::vector<GradInOutMapper> &gradInputInfo() const final
const std::map<int, int> &gradOutToNonGradIn() const final
class CastOp : public popart::Op

Subclassed by popart::CastGradOp

Public Functions

CastOp(const OperatorIdentifier &_opid, DataType _to, const Op::Settings &settings)
std::unique_ptr<Op> clone() const override
std::vector<std::unique_ptr<Op>> getGradOps() final
void setup() override
inline DataType toDataType() const
inline float getSubgraphValue() const final
inline bool canShard() const override
inline ReplicatedTensorShardingIndices getReplicatedTensorShardingIndices() const override
bool canBeReplacedByIdentity() const override

Public Static Functions

static inline InIndex getInIndex()
static inline OutIndex getOutIndex()
class CeilInplaceOp : public popart::OneWayUnaryInPlaceOp

Public Functions

CeilInplaceOp(const CeilOp&)
std::unique_ptr<Op> clone() const final
class CeilOp : public popart::OneWayUnaryOp

Public Functions

CeilOp(const OperatorIdentifier &_opid, const Op::Settings &settings_)
std::unique_ptr<Op> clone() const override
std::vector<std::tuple<OperatorIdentifier, float>> inplacePriorityDefault() const final
std::unique_ptr<Op> getInplaceVariant(const OperatorIdentifier&) const final
class ClipGradOp : public popart::ClipOp

Public Functions

ClipGradOp(const ClipOp &fwdOp)
std::unique_ptr<Op> clone() const final
const std::vector<GradInOutMapper> &gradInputInfo() const final
const std::map<int, int> &gradOutToNonGradIn() const final
std::vector<std::tuple<OperatorIdentifier, float>> inplacePriorityDefault() const final

Public Static Functions

static inline InIndex getClippedInIndex()
static inline InIndex getGradClippedInIndex()
class ClipInplaceOp : public popart::ElementWiseInplaceUnaryOp

Public Functions

ClipInplaceOp(const ClipOp&)
std::unique_ptr<Op> clone() const final
float getClipMin() const
float getClipMax() const
void appendOutlineAttributes(OpSerialiserBase&) const override
class ClipOp : public popart::ElementWiseUnaryOp

Subclassed by popart::ClipGradOp

Public Functions

ClipOp(const OperatorIdentifier &_opid, float min_, float max_, const Op::Settings &settings_)
std::unique_ptr<Op> clone() const override
std::vector<std::unique_ptr<Op>> getGradOps() final
inline void setClipMin(float value)
float getClipMin() const
inline void setClipMax(float value)
float getClipMax() const
void appendOutlineAttributes(OpSerialiserBase&) const override
std::vector<std::tuple<OperatorIdentifier, float>> inplacePriorityDefault() const override
std::unique_ptr<Op> getInplaceVariant(const OperatorIdentifier&) const final
bool canBeReplacedByIdentity() const override

Public Static Functions

static inline InIndex clip11MinInputIndex()
static inline InIndex clip11MaxInputIndex()
class CollectivesBaseOp : public popart::Op

Subclassed by popart::MultiCollectiveBaseOp, popart::ReplicatedAllGatherOp, popart::ReplicatedAllReduceOp, popart::ReplicatedReduceScatterOp

Public Functions

CollectivesBaseOp(const OperatorIdentifier &_opid, CommGroup group, const Op::Settings &settings_)
CollectivesBaseOp(const OperatorIdentifier &_opid, const ReplicaGrouping &grouping, const Op::Settings &settings_)
virtual std::unique_ptr<Op> clone() const override = 0
virtual bool hasCorrespondingLinkedIndexTensor(Tensor *t)
inline bool hasCorrespondingLinkedIndexTensor(InIndex in)
virtual Tensor *getCorrespondingLinkedIndexTensor(Tensor *t)
inline Tensor *getCorrespondingLinkedIndexTensor(InIndex in)
virtual bool isCollectiveLinkedIndexTensor(InIndex in) const
virtual bool isCollectiveLinkedIndexTensor(Tensor *t) const
inline void setGCLCommGroup(CommGroup group)
inline CommGroup getGCLCommGroup() const
void setReplicaGrouping(const ReplicaGrouping &grouping)
const ReplicaGrouping &getReplicaGrouping() const
virtual int64_t getCommSize() const

Number of replicas the collective communicates across.

This is used to create a CollectiveBalanceReorder during lowering to improve the tile mapping when using replicated tensor sharding (RTS).

void appendOutlineAttributes(OpSerialiserBase &os) const override
inline virtual bool isConfigureOutputForReplicatedTensorSharding() const

Check whether this op is configured for replicated tensor sharding (RTS).

Collective operations set up for RTS are allowed to scramble the data element order of the input (AllGather) or output (ReduceScatter) tensor so that the tensor layouts minimize inter-tile exchanges.

As a consequence, the RTS sharded tensor does not follow the original data order and can only be used in elementwise, RTS-enabled operations, such as optimizers, where all inputs consumed are rearranged in the same way.

Returns

True if this operation is configured for replicated tensor sharding.
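
The restriction to elementwise operations can be seen in a small standalone sketch. It does not use PopART or Poplar: the permutation stands in for whatever element order the collective chooses, and the helper names permute, unpermute and sgdStep are hypothetical. The point it illustrates is that an elementwise update applied to consistently rearranged inputs gives the same values as the update in the original order, once the rearrangement is undone.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rearrange data so that out[i] = data[perm[i]].
// perm is a hypothetical element order chosen by the collective.
inline std::vector<float> permute(const std::vector<float> &data,
                                  const std::vector<std::size_t> &perm) {
  assert(data.size() == perm.size());
  std::vector<float> out(data.size());
  for (std::size_t i = 0; i < perm.size(); ++i) {
    out[i] = data[perm[i]];
  }
  return out;
}

// Undo permute: out[perm[i]] = data[i].
inline std::vector<float> unpermute(const std::vector<float> &data,
                                    const std::vector<std::size_t> &perm) {
  assert(data.size() == perm.size());
  std::vector<float> out(data.size());
  for (std::size_t i = 0; i < perm.size(); ++i) {
    out[perm[i]] = data[i];
  }
  return out;
}

// Elementwise update standing in for an optimizer step: w - lr * g.
inline std::vector<float> sgdStep(const std::vector<float> &w,
                                  const std::vector<float> &g, float lr) {
  assert(w.size() == g.size());
  std::vector<float> out(w.size());
  for (std::size_t i = 0; i < w.size(); ++i) {
    out[i] = w[i] - lr * g[i];
  }
  return out;
}
```

An operation that depends on element position, such as a slice or a matrix multiplication, would not commute with the rearrangement in this way, which is why such a tensor is only consumed by RTS-enabled elementwise operations.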

Public Static Functions

static inline InIndex getInIndex()
static inline InIndex getCollectiveLinkedIndex()
static inline OutIndex getOutIndex()
static inline ReplicatedTensorShardingIndicesIndex getDefaultTensorShardingGroupIndex()
class ConcatGradOp : public popart::Op

Public Functions

ConcatGradOp(const ConcatOp &op, InIndex input)
ConcatGradOp(const ConcatInplaceOp &op, InIndex input)
std::unique_ptr<Op> clone() const override
void setup() override
void appendOutlineAttributes(OpSerialiserBase&) const override
const std::vector<GradInOutMapper> &gradInputInfo() const final
const std::map<int, int> &gradOutToNonGradIn() const final
int64_t getAxis() const
int64_t getStart() const
int64_t getEnd() const
inline float getSubgraphValue() const final
inline bool canShard() const override
inline ReductionType getShardReductionType(OutIndex index) const override
void configureShardedOp(Op *const shardedOp, const Settings *const settings_) const override

Public Static Functions

static inline InIndex getInIndex()
static inline OutIndex getOutIndex()
class ConcatInplaceOp : public popart::ConcatOp

Public Functions

ConcatInplaceOp(int64_t axis_, const Op::Settings &settings)
ConcatInplaceOp(const ConcatOp &concatOp, int64_t axis_)
std::unique_ptr<Op> clone() const override
inline std::vector<std::tuple<OperatorIdentifier, float>> inplacePriorityDefault() const final
inline std::unique_ptr<Op> getInplaceVariant(const OperatorIdentifier &o) const final
inline view::Regions aliases(InIndex in, OutIndex) const final
class ConcatOp : public popart::Op

Subclassed by popart::ConcatInplaceOp

Public Functions

ConcatOp(const OperatorIdentifier &_opid, int64_t axis_, const Op::Settings &settings)
std::unique_ptr<Op> clone() const override
void setup() final
std::vector<std::unique_ptr<Op>> getGradOps() final
int64_t getAxis() const
std::vector<std::tuple<OperatorIdentifier, float>> inplacePriorityDefault() const override
std::unique_ptr<Op> getInplaceVariant(const OperatorIdentifier&) const override
view::RegMap fwdRegMap(InIndex, OutIndex) const final
view::RegMap bwdRegMap(InIndex, OutIndex) const final
void appendOutlineAttributes(OpSerialiserBase&) const override
bool canBeReplacedByIdentity() const override
inline float getSubgraphValue() const final
inline bool canShard() const override
void growAliasModel(AliasModel&) const override
poprithms::memory::inplace::Proposal mapInplaceProposal(const AliasModel&, OperatorIdentifier) const override

Public Static Functions

static inline InIndex getInIndex(InIndex index)
static inline OutIndex getOutIndex()
static Shape getOutputShape(int64_t axis, const std::vector<const Shape*> inputs)
class ConvDataGradOp : public popart::MultiConvDataGradBaseOp

Public Functions

ConvDataGradOp(const ConvOp&)
std::unique_ptr<Op> clone() const final
inline int numConvs() const override
inline const ConvParameters &getParameters() const

Public Static Functions

static inline InIndex getWeightsInIndex()
static inline InIndex getGradConvolvedInIndex()
static inline OutIndex getOutIndex()
class ConvFlipWeightsGradOp : public popart::ConvFlipWeightsOp

Public Functions

ConvFlipWeightsGradOp(const ConvFlipWeightsGradOp&) = default
ConvFlipWeightsGradOp(const ConvFlipWeightsOp &convFlipWeightsOp)
std::unique_ptr<Op> clone() const final
const std::vector<GradInOutMapper> &gradInputInfo() const final
const std::map<int, int> &gradOutToNonGradIn() const final
class ConvFlipWeightsOp : public popart::Op

Subclassed by popart::ConvFlipWeightsGradOp

Public Functions

ConvFlipWeightsOp(const ConvFlipWeightsOp&) = default
ConvFlipWeightsOp(const OperatorIdentifier &_opid, const Op::Settings &settings_)
~ConvFlipWeightsOp() override
std::unique_ptr<Op> clone() const override
void setup() final
std::vector<std::unique_ptr<Op>> getGradOps() final
inline const ConvParameters &getParameters() const
inline void setParameters(const ConvParameters &p)
inline bool getGroupReshape() const
inline void setGroupReshape(bool reshape)
inline float getSubgraphValue() const final
void appendOutlineAttributes(OpSerialiserBase &os) const final
inline void setConvOptions(const MultiConvOptions &opts)
inline const MultiConvOptions &getMultiConvOptions() const
inline std::map<std::string, std::string> getConvOptions() const

Public Static Functions

static inline InIndex getInIndex()
static inline OutIndex getOutIndex()
class ConvOp : public popart::MultiConvBaseOp

Public Functions

ConvOp(const OperatorIdentifier &_opid, const Settings &settings_, std::vector<int64_t> strides, std::vector<int64_t> pads, std::vector<int64_t> dilations, int64_t group, const AutoPad &padType, const MultiConvOptions &convOpts)
std::unique_ptr<Op> clone() const final
std::vector<std::unique_ptr<Op>> getGradOps() final
void setup() final
inline int numConvs() const final
inline int64_t getGroups() const
inline void setGroup()
inline int64_t getNInChans() const
inline int64_t getNOutChans() const
inline ConvParameters getParameters() const
void restoreAttributesFromParams(const std::vector<ConvParameters>&) override
bool isPow2ScaledConv() const

Returns true if and only if the inputs to the op constitute a valid set of inputs for a fused (float8) convolution.

inline std::set<InIndex> optionalInputs() const override

Public Static Functions

static inline InIndex getDataInIndex()
static inline InIndex getWeightsInIndex()
static inline InIndex getLog2ScaleInIndex()
static inline OutIndex getOutIndex()
class ConvTransposeOp : public popart::Op

Public Functions

ConvTransposeOp(const OperatorIdentifier &_opid, const Settings &settings_, std::vector<int64_t> strides, std::vector<int64_t> pads, std::vector<int64_t> dilations, int64_t group, const AutoPad &padType, std::vector<int64_t> outputPadding, Shape outputShape, const MultiConvOptions &convOpts)
std::unique_ptr<Op> clone() const override
void setup() final
inline float getSubgraphValue() const final
bool isPow2ScaledConvTranspose() const
inline std::set<InIndex> optionalInputs() const override

Public Members

std::vector<int64_t> strides
std::vector<int64_t> dilations
int64_t group
const AutoPad padType
const MultiConvOptions convOpts
ConvParameters params

Public Static Functions

static inline InIndex getInIndex()
static inline InIndex getWeightsInIndex()
static inline InIndex getLog2ScaleInIndex()
static inline OutIndex getOutIndex()
class ConvWeightsGradOp : public popart::MultiConvWeightsGradBaseOp

Public Functions

ConvWeightsGradOp(const ConvOp&)
std::unique_ptr<Op> clone() const final
ConvWeightsGradOp(const ConvWeightsGradOp&) = default
inline int numConvs() const final
inline const ConvParameters &getParameters() const

Public Static Functions

static inline InIndex getGradConvolvedInIndex()
static inline InIndex getPreConvolvedInIndex()
static inline OutIndex getOutIndex()
class CosGradOp : public popart::ElementWiseNonLinearUnaryGradOp

Public Functions

CosGradOp(const CosOp &fwdOp)
std::unique_ptr<Op> clone() const final
class CosOp : public popart::ElementWiseUnaryOp

Public Functions

CosOp(const OperatorIdentifier &_opid, const Op::Settings&)
std::unique_ptr<Op> clone() const override
std::vector<std::unique_ptr<Op>> getGradOps() final

Public Static Functions

static OperatorIdentifier getOpId(const Ir &ir)
class CoshOp : public popart::Op

Public Functions

CoshOp(const OperatorIdentifier &_opid, const Op::Settings &settings_)
std::unique_ptr<Op> clone() const override
std::vector<std::unique_ptr<Op>> getGradOps() final
void setup() final
inline float getSubgraphValue() const final

Public Static Functions

static inline InIndex getInIndex()
static inline OutIndex getOutIndex()
class CtcBeamSearchDecoderOp : public popart::Op

Public Functions

CtcBeamSearchDecoderOp(const popart::OperatorIdentifier &_opid, unsigned _blankClass, unsigned _beamWidth, unsigned _topPaths, const popart::Op::Settings &settings_)
std::unique_ptr<Op> clone() const final
void setup() final
void appendAttributes(popart::OpSerialiserBase &os) const override
void appendOutlineAttributes(popart::OpSerialiserBase &os) const override
std::vector<std::unique_ptr<Op>> getGradOps() final
float getSubgraphValue() const final
bool requiresRandomSeed() const override
inline unsigned getBlankClass() const
inline unsigned getBeamWidth() const
inline unsigned getTopPaths() const
inline unsigned getMaxTime() const
inline unsigned getBatchSize() const
inline unsigned getNumClasses() const

Public Static Functions

static inline InIndex getLogProbsInIndex()
static inline InIndex getDataLengthsInIndex()
static inline OutIndex getLabelProbsOutIndex()
static inline OutIndex getLabelLengthsOutIndex()
static inline OutIndex getDecodedLabelsOutIndex()
class CtcGradOp : public popart::Op

Public Functions

CtcGradOp(const CtcOp&)
const std::vector<GradInOutMapper> &gradInputInfo() const final
const std::map<int, int> &gradOutToNonGradIn() const final
void setup() final
std::unique_ptr<Op> clone() const final
inline float getSubgraphValue() const final
inline ReductionType getReductionType() const
virtual void appendOutlineAttributes(OpSerialiserBase&) const final
inline bool canShard() const override
inline bool getEnableReducedClassesInLabel() const

Public Static Functions

static inline InIndex getLogProbsGradientWrtCtcLossInIndex()
static inline InIndex getTargetLengthsInIndex()
static inline InIndex getCtcLossGradientInIndex()
static inline OutIndex getLogProbsGradientOutIndex()
class CtcOp : public popart::LossOp

Public Functions

CtcOp(const OperatorIdentifier &_opid, const ReductionType reduction, const unsigned blank, const bool zeroInfinity, const Op::Settings &settings_, const bool enableReducedClassesInLabel, const DataType outDataType = DataType::UNDEFINED)
std::unique_ptr<Op> clone() const final
std::vector<std::unique_ptr<Op>> getGradOps() final
void setup() final
inline float getSubgraphValue() const final
inline unsigned getBlank() const
inline bool getZeroInfinity() const
virtual void appendOutlineAttributes(OpSerialiserBase&) const final
unsigned getBatchSize() const
unsigned getMaxInputLength() const
unsigned getMaxTargetLength() const
unsigned getNumClasses() const
inline bool canShard() const override
inline bool getEnableReducedClassesInLabel() const

Public Static Functions

static inline InIndex getLogProbsInIndex()
static inline InIndex getTargetsInIndex()
static inline InIndex getInputLengthsInIndex()
static inline InIndex getTargetLengthsInIndex()
static inline OutIndex getCtcLossOutIndex()
static inline OutIndex getLogProbsGradientWrtCtcLossOutIndex()
class CumSumGradOp : public popart::Op

Public Functions

CumSumGradOp(const CumSumOp &op, bool exclusive, bool reverse, int64_t axis)
std::unique_ptr<Op> clone() const override
const std::vector<GradInOutMapper> &gradInputInfo() const final
const std::map<int, int> &gradOutToNonGradIn() const final
void setup() final
bool getExclusive() const
bool getReverse() const
int64_t getAxis() const
inline float getSubgraphValue() const final

Public Static Functions

static inline InIndex outGradXInIndex()
static inline InIndex fwdXInIndex()
static inline OutIndex outIndex()
class CumSumOp : public popart::Op

Public Functions

CumSumOp(const OperatorIdentifier &_opid, bool exclusive_, bool reverse_, const Op::Settings &settings_)
std::unique_ptr<Op> clone() const override
std::vector<std::unique_ptr<Op>> getGradOps() override
void setup() final
bool getExclusive() const
bool getReverse() const
int64_t getAxis() const
inline float getSubgraphValue() const final

Public Static Functions

static inline InIndex xInIndex()
static inline InIndex axisInIndex()
static inline OutIndex outIndex()
class DetachInplaceOp : public popart::DetachOp

Public Functions

DetachInplaceOp(const DetachOp &detachOp)
DetachInplaceOp(const Op::Settings &settings)
std::unique_ptr<Op> clone() const override
inline view::Regions aliases(InIndex in, OutIndex) const final
class DetachOp : public popart::ElementWiseUnaryOp

Subclassed by popart::DetachInplaceOp

Public Functions

DetachOp(const OperatorIdentifier &_opid, const Op::Settings &settings)
inline std::vector<std::unique_ptr<Op>> getGradOps() final
std::unique_ptr<Op> clone() const override
std::unique_ptr<Op> getInplaceVariant(const OperatorIdentifier &o) const final
std::vector<std::tuple<OperatorIdentifier, float>> inplacePriorityDefault() const final
inline bool isIdentity() const final
inline bool isOutplaceViewChange() const override
class DivArg0GradOp : public popart::ElementWiseBinaryArg0GradOp

Public Functions

DivArg0GradOp(const Op&, const std::vector<int64_t> &_reduction_axes)
std::unique_ptr<Op> clone() const final
class DivArg1GradOp : public popart::ElementWiseBinaryArg1GradOp

Public Functions

DivArg1GradOp(const Op&, const std::vector<int64_t> &_reduction_axes)
std::unique_ptr<Op> clone() const final
class DropoutBaseOp : public popart::RandomBaseOp

Subclassed by popart::DropoutOp, popart::ShapedDropoutOp

Public Functions

DropoutBaseOp(const OperatorIdentifier &_opid, float ratio_, const Op::Settings &settings_)
std::unique_ptr<Op> clone() const override
bool canBeReplacedByIdentity() const override
inline float getRatio() const
inline void setRatio(float r)
inline InIndex getSeedInIndex() const override
inline bool canShard() const override
void configureShardedOp(Op *const shardedOp, const Settings *const settings_) const override

Public Static Functions

static inline InIndex getInIndex()
static inline OutIndex getOutIndex()
static float validateRatioAttribute(const OpCreatorInfo &info)
class DropoutOp : public popart::DropoutBaseOp

Subclassed by popart::DropoutGradOp

Public Functions

DropoutOp(const OperatorIdentifier &_opid, float ratio_, const Op::Settings &settings_)
std::unique_ptr<Op> clone() const override
std::vector<std::unique_ptr<Op>> getGradOps() override
void setup() override
bool canBeReplacedByIdentity() const override
void appendAttributes(OpSerialiserBase &os) const override
inline void setOutputMask(bool v)
inline bool getOutputMask() const
void appendOutlineAttributes(OpSerialiserBase&) const override
inline void setReferenceId(RandomReferenceId id)
inline RandomReferenceId getReferenceId() const
TensorId getReferenceTensorId()

Public Static Functions

static inline OutIndex getMaskOutIndex()
class DropoutGradOp : public popart::DropoutOp

Public Functions

DropoutGradOp(const DropoutOp &fwdOp)
std::unique_ptr<Op> clone() const override
const std::vector<GradInOutMapper> &gradInputInfo() const override
const std::map<int, int> &gradOutToNonGradIn() const override

Public Static Functions

static inline InIndex getGradInIndex()
static inline OutIndex getOutIndex()
class DynamicAddInplaceOp : public popart::DynamicTernaryBaseInplaceOp

Public Functions

DynamicAddInplaceOp(const DynamicAddOp &dynamicAddOp)
DynamicAddInplaceOp(const OperatorIdentifier &_opid, std::vector<int64_t> axes_, std::vector<int64_t> sizes_, bool noOverlap_, const Op::Settings &settings_, TensorInfo updateInInfo_ = TensorInfo())
std::unique_ptr<Op> clone() const final
class DynamicAddOp : public popart::DynamicTernaryBaseOp

Public Functions

DynamicAddOp(const OperatorIdentifier &_opid, std::vector<int64_t> axes_, std::vector<int64_t> sizes_, bool noOverlap_, const Op::Settings &settings_, TensorInfo updateInInfo_ = TensorInfo())
std::unique_ptr<Op> clone() const final
std::vector<std::unique_ptr<Op>> getGradOps() final
std::unique_ptr<Op> getInplaceVariant(const OperatorIdentifier&) const final
std::vector<std::tuple<OperatorIdentifier, float>> inplacePriorityDefault() const override
class DynamicBaseOp : public popart::Op

Dynamic Base Op.

Base class for operators acting on a run-time selectable slice of a tensor.

The word “dynamic” refers to the fact that the index can be specified at runtime, where index is the second tensor argument of this operator, as specified in graphcoreoperators.hpp. The axes specify along which axes the tensor should be sliced. The sizes specify the size of the slices.

A slice along an axis can be defined by the tuple (start, stop, step), where:

  • start equals the index for the respective axis

  • stop equals index + size for the respective axis

  • step equals 1

Limitations, assuming we would like to slice a tensor A with shape (4, 3):

  • Step other than 1 is not supported (i.e. A[::2,:] is not supported)

  • Negative slicing is not supported (i.e. A[:-1,:] is not supported)

  • stop greater than the size of the axis is not supported (i.e. A[:5,:] is not supported)

Example: Given a tensor A with shape (3, 2, 4, 5), if we specify axes = {1, 3} (that is, we slice axes 1 and 3, counting from 0), the operator will operate on A[:, index[0]:(index[0]+size[0]), :, index[1]:(index[1]+size[1])]. If we instead specify axes = {0, 1, 3}, the operator will operate on A[index[0]:(index[0]+size[0]), index[1]:(index[1]+size[1]), :, index[2]:(index[2]+size[2])].
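The slicing rule and its limitations can be sketched in plain C++. This is only an illustration under stated assumptions: sliceBounds is a hypothetical helper, not part of PopART, and it models the index as a host-side vector rather than a tensor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper (not a PopART function): computes the half-open
// [start, stop) range selected along every dimension of a tensor.
std::vector<std::pair<int64_t, int64_t>>
sliceBounds(const std::vector<int64_t> &shape,
            const std::vector<int64_t> &axes,
            const std::vector<int64_t> &sizes,
            const std::vector<int64_t> &index) {
  // One size and one index entry is expected per sliced axis.
  assert(axes.size() == sizes.size() && axes.size() == index.size());

  // Dimensions not listed in `axes` are taken in full, like `:`.
  std::vector<std::pair<int64_t, int64_t>> bounds;
  for (int64_t dim : shape) {
    bounds.emplace_back(0, dim);
  }

  for (std::size_t i = 0; i < axes.size(); ++i) {
    const int64_t axis = axes[i];
    if (axis < 0 || axis >= static_cast<int64_t>(shape.size())) {
      throw std::out_of_range("axis is outside the tensor rank");
    }
    const int64_t start = index[i];
    const int64_t stop  = index[i] + sizes[i]; // step is always 1
    // Negative slicing and a stop beyond the axis size are rejected.
    if (start < 0 || sizes[i] < 0 || stop > shape[axis]) {
      throw std::out_of_range("slice does not fit inside the axis");
    }
    bounds[axis] = {start, stop};
  }
  return bounds;
}
```

For the shape (3, 2, 4, 5) with axes = {1, 3}, sizes = {1, 2} and index = {1, 3}, this helper yields the ranges [0, 3), [1, 2), [0, 4) and [3, 5).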

Subclassed by popart::DynamicBinaryBaseOp, popart::DynamicSliceBaseOp, popart::DynamicSlicePadGradOp

Public Functions

DynamicBaseOp(const OperatorIdentifier &_opid, std::vector<int64_t> axes_, std::vector<int64_t> sizes_, bool noOverlap_, const Op::Settings&)
virtual std::unique_ptr<Op> clone() const override

Return a copy of the op.

This method must be implemented. The compiler throws an error if this method is not implemented.

virtual void setup() override

Set the shape and type of the arguments to the op.

This MUST set the type and shape information for all the output TensorInfo objects.

inline virtual float getSubgraphValue() const final

Get the subgraph value.

This is used by the outlining algorithm to determine whether or not to outline ops. There are high bounding values retrieved by getHighSubgraphValue() (for expensive ops such as Conv) or low bounding values retrieved by getLowSubgraphValue() (for inexpensive ops such as Relu).

Returns

The subgraph value. Default: 0.

inline const std::vector<int64_t> &getAxes() const
inline void setAxes(const std::vector<int64_t> &x)
inline const std::vector<int64_t> &getSizes() const
inline void setSizes(const std::vector<int64_t> &x)
inline bool isNotOverlapping() const
TensorInfo createOutInfo() const
virtual void appendOutlineAttributes(OpSerialiserBase&) const override

Append the op attributes that are relevant for outlining ops.

Ops should override this function if there are additional attributes. Two ops with identical type and outline attributes can be outlined and are supposed to be functionally equivalent.

Parameters

OpSerialiserBase – The stream to which the attributes should be appended.

Public Static Functions

static inline InIndex getIndexInIndex()
static inline OutIndex getOutIndex()
class DynamicBinaryBaseInplaceOp : public popart::DynamicBinaryBaseOp

Subclassed by popart::DynamicZeroInplaceOp

Public Functions

DynamicBinaryBaseInplaceOp(const OperatorIdentifier &_opid, std::vector<int64_t> axes_, std::vector<int64_t> sizes_, bool noOverlap_, const Op::Settings &settings_, TensorInfo updateInInfo_ = TensorInfo())
std::unique_ptr<Op> clone() const override
view::RegMap fwdRegMap(InIndex, OutIndex) const final
view::RegMap bwdRegMap(InIndex, OutIndex) const final
view::Regions aliases(InIndex, OutIndex) const final
view::Regions modifies(InIndex) const final
class DynamicBinaryBaseOp : public popart::DynamicBaseOp

Dynamic Binary Base Op.

Base class for operators acting on a run-time selectable slice of a tensor. The word “binary” refers to the fact that the operator takes two tensors as input.

See also

DynamicBaseOp for details.

Subclassed by popart::DynamicBinaryBaseInplaceOp, popart::DynamicTernaryBaseOp, popart::DynamicUpdateToUpdateGradOp, popart::DynamicZeroGradOp, popart::DynamicZeroOp

Public Functions

DynamicBinaryBaseOp(const OperatorIdentifier &_opid, std::vector<int64_t> axes_, std::vector<int64_t> sizes_, bool noOverlap_, const Op::Settings &settings_, TensorInfo updateInInfo_ = TensorInfo())
virtual std::unique_ptr<Op> clone() const override

Return a copy of the op.

This method must be implemented. The compiler throws an error if this method is not implemented.

virtual void setup() final

Set the shape and type of the arguments to the op.

This MUST set the type and shape information for all the output TensorInfo objects.

inline const TensorInfo &getUpdateTensorInfo() const
virtual void growAliasModel(AliasModel &m) const final

For certain tasks which involve analysing how tensors alias each other, such as inplacing, a poprithms::memory::inplace::Graph that corresponds to this op’s graph is constructed.

The Poprithms graph can then be queried for aliasing information, and can have algorithms run on it.

To construct the Poprithms graph, each PopART op defines what its Poprithms equivalent ops are. This method inserts this op’s poprithms::memory::inplace::Op equivalents into the Poprithms Graph, which is the container popAliaser.

See also

AliasModel.

Parameters

aliasModel – The mapping between this op’s (PopART) graph and the Poprithms graph.

Pre

All input tensors of this op have mappings in aliasModel before the call to this method.

Post

All output tensors of this op have mappings in aliasModel after the call to this method.

virtual poprithms::memory::inplace::Proposal mapInplaceProposal(const AliasModel&, OperatorIdentifier) const override

Translate a PopART inplacing proposal.

This replaces an outplace op with an inplace op of type inplaceId, into an AliasModel equivalent.

This method is defined as a void method which sets a value passed by reference, as opposed to a getter method, so that no Poprithms headers need to be included in this file.

Parameters
  • aliasModel – The mapping between this op’s (PopART) graph and the Poprithms graph.

  • OperatorIdentifier – The operator identifier to translate to the AliasModel equivalent.

Returns

A tuple where the first element corresponds to an alias gate in the AliasModel and the second element is an input index.

Public Static Functions

static inline InIndex getUpdateInIndex()
static inline InIndex getIndexInIndex()
static inline OutIndex getOutIndex()
class DynamicSliceBaseOp : public popart::DynamicBaseOp

Subclassed by popart::DynamicSliceOp, popart::DynamicUpdateUpdaterGradOp

Public Functions

DynamicSliceBaseOp(const OperatorIdentifier &_opid, std::vector<int64_t> axes_, std::vector<int64_t> sizes_, bool noOverlap_, const Op::Settings&)
std::unique_ptr<Op> clone() const override
void setup() final
TensorInfo createOutInfo() const

Public Static Functions

static inline InIndex getInIndex()
class DynamicSliceInplaceOp : public popart::DynamicSliceOp

Dynamic Slice Inplace Op.

This Op takes two or three TensorIds as input (as indicated in graphcoreoperators.hpp):

  1. The TensorId of the tensor to slice from.

  2. The (optional) TensorId of the index of the starting point of the slice (see DynamicBaseOp for explanation).

  3. The TensorId of the tensor to write the slice into (not used in the outplace variant).

The output is the TensorId of the sliced tensor, aliased

Public Functions

DynamicSliceInplaceOp(const OperatorIdentifier &_opid, std::vector<int64_t> axes_, std::vector<int64_t> sizes_, bool noOverlap_, const Op::Settings&)
DynamicSliceInplaceOp(const DynamicSliceOp&)
virtual std::unique_ptr<Op> clone() const final

Return a copy of the op.

This method must be implemented. The compiler throws an error if this method is not implemented.

virtual std::vector<std::tuple<OperatorIdentifier, float>> inplacePriorityDefault() const final

Return the variants of this op (if any) which can modify / alias the inputs at the given indices.

This function doesn’t check for anchor violations or topological order violations. When there are several ops, they should be returned in descending order of preference. If the op can be replaced by an in-place variant of itself, this method should be overridden to return a vector of <OperatorIdentifier, float> tuples in descending order of preference.
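The ordering requirement can be illustrated with a small standalone sketch. It assumes a std::string stand-in (OpIdName) for popart::OperatorIdentifier, and the helper names are hypothetical; nothing here is PopART's own implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Stand-in for popart::OperatorIdentifier, used only for illustration.
using OpIdName = std::string;
using Variants = std::vector<std::tuple<OpIdName, float>>;

// Sort candidate in-place variants so the highest priority comes first.
Variants byDescendingPriority(Variants variants) {
  std::stable_sort(variants.begin(), variants.end(),
                   [](const std::tuple<OpIdName, float> &a,
                      const std::tuple<OpIdName, float> &b) {
                     return std::get<1>(a) > std::get<1>(b);
                   });
  return variants;
}

// The preferred variant is the first entry of a correctly ordered list.
const OpIdName &preferredVariant(const Variants &ordered) {
  assert(!ordered.empty());
  return std::get<0>(ordered.front());
}
```

A caller would typically try the variants in the returned order and keep the first one that does not violate any constraint.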

virtual std::unique_ptr<Op> getInplaceVariant(const OperatorIdentifier &o) const final

Instantiate a particular in-place variant of the op with a specified OperatorIdentifier from the vector returned by inplacePriorityDefault().

Parameters

OperatorIdentifier – The operator identifier of the op to be instantiated.

Returns

An instance of the required op.

virtual view::RegMap fwdRegMap(InIndex, OutIndex) const final

Map regions of the input tensor at the input index to the regions of the output tensor at the output index that these input regions alias.

Parameters
  • InIndex – The op input index.

  • OutIndex – The op output index.

virtual view::RegMap bwdRegMap(InIndex, OutIndex) const final

Map regions of the output tensor at the output index to the regions of the input tensor at the input index that these output regions alias.

Parameters
  • InIndex – The op input index.

  • OutIndex – The op output index.
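As a rough intuition for forward and backward region maps, consider a generic one-dimensional view whose output is a fixed slice of its input. The sketch below is an assumption-laden model (the Interval type and both functions are hypothetical) and does not describe how this particular op computes its maps.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

// Half-open 1-D interval [first, second); {0, 0} denotes "empty".
using Interval = std::pair<int64_t, int64_t>;

// Forward: the part of the output aliased by an input region, where the
// output is the view input[sliceStart:sliceStop].
Interval fwdRegion(Interval in, int64_t sliceStart, int64_t sliceStop) {
  assert(sliceStart <= sliceStop);
  const int64_t lo = std::max(in.first, sliceStart);
  const int64_t hi = std::min(in.second, sliceStop);
  if (lo >= hi) {
    return {0, 0}; // the input region lies outside the slice
  }
  return {lo - sliceStart, hi - sliceStart};
}

// Backward: the part of the input aliased by an output region.
Interval bwdRegion(Interval out, int64_t sliceStart, int64_t sliceStop) {
  assert(sliceStart <= sliceStop);
  const int64_t lo = std::max<int64_t>(out.first, 0);
  const int64_t hi = std::min(out.second, sliceStop - sliceStart);
  if (lo >= hi) {
    return {0, 0}; // the output region is empty or out of range
  }
  return {lo + sliceStart, hi + sliceStart};
}
```

In this model the two maps are inverses on the overlapping part: mapping a region forward and then backward returns the portion of it that lies inside the slice.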

virtual view::Regions modifies(InIndex) const override

Return the input region which this op modifies (for inplace ops).

Parameters

InIndex – The input index.

Returns

The regions which this op modifies.

virtual view::Regions aliases(InIndex, OutIndex) const override

Return the input region which the op output will alias (for inplace and view-changing ops).

See also

For more information on views, refer to the IPU Programmer’s Guide.

Parameters
  • InIndex – The input index.

  • OutIndex – The output index.

Returns

The regions which the output will alias.

class DynamicSliceOp : public popart::DynamicSliceBaseOp

Dynamic Slice Op.

This Op takes two or three TensorIds as input (as indicated in graphcoreoperators.hpp):

  1. The TensorId of the tensor to slice from.

  2. The (optional) TensorId of the index of the starting point of the slice (see DynamicBaseOp for explanation).

  3. The TensorId of the tensor to write the slice into (not used in the outplace variant).

The output is the TensorId of the sliced tensor.

Subclassed by popart::DynamicSliceInplaceOp

Public Functions

DynamicSliceOp(const OperatorIdentifier &_opid, std::vector<int64_t> axes_, std::vector<int64_t> sizes_, bool noOverlap_, const Op::Settings&)
virtual std::unique_ptr<Op> clone() const override

Return a copy of the op.

This method must be implemented. The compiler throws an error if this method is not implemented.

virtual std::vector<std::unique_ptr<Op>> getGradOps() final

Determine the corresponding grad op for each op in the forward graph to automatically generate the backward pass.

There can be a separate gradient op for each input or a single gradient op that generates gradients for all inputs.

The mapping from the index of each output tensor of the gradient op to the index of each input tensor of the non-grad op is configured using the gradOutToNonGradIn() method that should be overridden in the grad op definitions.

Throws an error if this op is already a gradient op.
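The index mapping returned by gradOutToNonGradIn() can be pictured with a small standalone example. The op layout below (a forward op with three inputs whose single grad op produces gradients for inputs 0 and 2) is purely hypothetical, as are the function names.

```cpp
#include <cassert>
#include <map>

// Hypothetical mapping for an imagined grad op:
//   key   = output index of the grad op
//   value = input index of the forward (non-grad) op
// Here grad output 0 is the gradient of forward input 0, and grad
// output 1 is the gradient of forward input 2. Forward input 1 (for
// example, an integer index tensor) receives no gradient.
const std::map<int, int> &exampleGradOutToNonGradIn() {
  static const std::map<int, int> outInfo = {{0, 0}, {1, 2}};
  return outInfo;
}

// Look up which forward input a given grad output corresponds to.
int forwardInputFor(int gradOutIndex) {
  const std::map<int, int> &mapping = exampleGradOutToNonGradIn();
  const auto it = mapping.find(gradOutIndex);
  assert(it != mapping.end());
  return it->second;
}
```

Returning a reference to a function-local static map mirrors the const std::map<int, int>& return type used by the grad ops in this chapter.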

virtual std::vector<std::tuple<OperatorIdentifier, float>> inplacePriorityDefault() const override

Return the variants of this op (if any) which can modify / alias the inputs at the given indices.

This function doesn’t check for anchor violations or topological order violations. When there are several ops, they should be returned in descending order of preference. If the op can be replaced by an in-place variant of itself, this method should be overridden to return a vector of <OperatorIdentifier, float> tuples in descending order of preference.

virtual std::unique_ptr<Op> getInplaceVariant(const OperatorIdentifier&) const override

Instantiate a particular in-place variant of the op with a specified OperatorIdentifier from the vector returned by inplacePriorityDefault().

Parameters

OperatorIdentifier – The operator identifier of the op to be instantiated.

Returns

An instance of the required op.

virtual void growAliasModel(AliasModel&) const override

For certain tasks which involve analysing how tensors alias each other, such as inplacing, a poprithms::memory::inplace::Graph that corresponds to this op’s graph is constructed.

The Poprithms graph can then be queried for aliasing information, and can have algorithms run on it.

To construct the Poprithms graph, each PopART op defines what its Poprithms equivalent ops are. This method inserts this op’s poprithms::memory::inplace::Op equivalents into the Poprithms Graph, which is the container popAliaser.

See also

AliasModel.

Parameters

aliasModel – The mapping between this op’s (PopART) graph and the Poprithms graph.

Pre

All input tensors of this op have mappings in aliasModel before the call to this method.

Post

All output tensors of this op have mappings in aliasModel after the call to this method.

virtual poprithms::memory::inplace::Proposal mapInplaceProposal(const AliasModel&, OperatorIdentifier) const override

Translate a PopART inplacing proposal.

This replaces an outplace op with an inplace op of type inplaceId, into an AliasModel equivalent.

This method is defined as a void method which sets a value passed by reference, as opposed to a getter method, so that no Poprithms headers need to be included in this file.

Parameters
  • aliasModel – The mapping between this op’s (PopART) graph and the Poprithms graph.

  • OperatorIdentifier – The operator identifier to translate to the AliasModel equivalent.

Returns

A tuple where the first element corresponds to an alias gate in the AliasModel and the second element is an input index.

Public Static Functions

static inline InIndex getSliceInIndex()
class DynamicSlicePadGradOp : public popart::DynamicBaseOp

Public Functions

DynamicSlicePadGradOp(const OperatorIdentifier &_opid, std::vector<int64_t> axes_, std::vector<int64_t> sizes_, bool noOverlap_, const Op::Settings &settings_, TensorInfo updateInInfo_ = TensorInfo())
void setup() final
std::unique_ptr<Op> clone() const final
const std::vector<GradInOutMapper> &gradInputInfo() const final
const std::map<int, int> &gradOutToNonGradIn() const override
inline std::set<InIndex> optionalInputs() const override

Public Static Functions

static inline InIndex getInIndex()
class DynamicTernaryBaseInplaceOp : public popart::DynamicTernaryBaseOp

Subclassed by popart::DynamicAddInplaceOp, popart::DynamicUpdateInplaceOp

Public Functions

DynamicTernaryBaseInplaceOp(const OperatorIdentifier &_opid, std::vector<int64_t> axes_, std::vector<int64_t> sizes_, bool noOverlap_, const Op::Settings &settings_, TensorInfo updateInInfo_ = TensorInfo())
std::unique_ptr<Op> clone() const override
view::RegMap fwdRegMap(InIndex, OutIndex) const final
view::RegMap bwdRegMap(InIndex, OutIndex) const final
view::Regions aliases(InIndex, OutIndex) const final
view::Regions modifies(InIndex) const final
class DynamicTernaryBaseOp : public popart::DynamicBinaryBaseOp

Dynamic Ternary Base Op.

Base class for operators acting on a run-time selectable slice of a tensor. The word “ternary” refers to the fact that the operator takes three tensors as input.

See also

DynamicBaseOp for details.

Subclassed by popart::DynamicAddOp, popart::DynamicTernaryBaseInplaceOp, popart::DynamicUpdateOp

Public Functions

DynamicTernaryBaseOp(const OperatorIdentifier &_opid, std::vector<int64_t> axes_, std::vector<int64_t> sizes_, bool noOverlap_, const Op::Settings &settings_, TensorInfo updateInInfo_ = TensorInfo())
virtual std::unique_ptr<Op> clone() const override

Return a copy of the op.

This method must be implemented. The compiler throws an error if this method is not implemented.

Public Static Functions

static inline InIndex getUpdateInIndex()
static inline InIndex getInIndex()
class DynamicUpdateInplaceOp : public popart::DynamicTernaryBaseInplaceOp

Public Functions

DynamicUpdateInplaceOp(const DynamicUpdateOp &dynamicUpdateOp)
DynamicUpdateInplaceOp(const OperatorIdentifier &_opid, std::vector<int64_t> axes_, std::vector<int64_t> sizes_, bool noOverlap_, const Op::Settings &settings_, TensorInfo updateInInfo_ = TensorInfo())
std::unique_ptr<Op> clone() const final
class DynamicUpdateOp : public popart::DynamicTernaryBaseOp

Dynamic Update Op.

This class takes three TensorIds as input (as indicated in graphcoreoperators.hpp):

  1. The TensorId of the tensor to be updated.

  2. The TensorId of the index of the starting point of the slice (see DynamicBaseOp for explanation).

  3. The TensorId of the tensor to update with (its dimensions must match (index, axes, sizes)).

The output is the TensorId of the updated tensor.

See also

DynamicTernaryBaseOp for details.
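The effect of a dynamic update can be sketched on a one-dimensional host-side buffer. This is a simplified model under stated assumptions: dynamicUpdate1D is a hypothetical function, it works on std::vector rather than PopART tensors, and it only models the outplace behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical 1-D model of a dynamic update (not PopART code): returns
// a copy of `tensor` in which the elements
// tensor[index : index + updater.size()] are replaced by `updater`.
// `index` is an ordinary runtime value, which is the "dynamic" part.
std::vector<float> dynamicUpdate1D(const std::vector<float> &tensor,
                                   std::size_t index,
                                   const std::vector<float> &updater) {
  // The updated slice must fit inside the tensor.
  if (index > tensor.size() || updater.size() > tensor.size() - index) {
    throw std::out_of_range("update slice does not fit inside the tensor");
  }
  std::vector<float> out = tensor; // outplace: the input is left untouched
  std::copy(updater.begin(), updater.end(),
            out.begin() + static_cast<std::ptrdiff_t>(index));
  assert(out.size() == tensor.size());
  return out;
}
```

An in-place variant would write into the input buffer instead of a copy, which is the distinction the inplace op classes in this chapter capture.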

Public Functions

DynamicUpdateOp(const OperatorIdentifier &_opid, std::vector<int64_t> axes_, std::vector<int64_t> sizes_, bool noOverlap_, const Op::Settings &settings_, TensorInfo updateInInfo_ = TensorInfo())
virtual std::unique_ptr<Op> clone() const final

Return a copy of the op.

This method must be implemented. The compiler throws an error if this method is not implemented.

virtual std::vector<std::unique_ptr<Op>> getGradOps() final

Determine the corresponding grad op for each op in the forward graph to automatically generate the backward pass.

There can be a separate gradient op for each input or a single gradient op that generates gradients for all inputs.

The mapping from the index of each output tensor of the gradient op to the index of each input tensor of the non-grad op is configured using the gradOutToNonGradIn() method that should be overridden in the grad op definitions.

Throws an error if this op is already a gradient op.

virtual std::unique_ptr<Op> getInplaceVariant(const OperatorIdentifier&) const final

Instantiate a particular in-place variant of the op with a specified OperatorIdentifier from the vector returned by inplacePriorityDefault().

Parameters

OperatorIdentifier – The operator identifier of the op to be instantiated.

Returns

An instance of the required op.

virtual std::vector<std::tuple<OperatorIdentifier, float>> inplacePriorityDefault() const override

Return the variants of this op (if any) which can modify / alias the inputs at the given indices.

This function doesn’t check for anchor violations or topological order violations. When there are several ops, they should be returned in descending order of preference. If the op can be replaced by an in-place variant of itself, this method should be overridden to return a vector of <OperatorIdentifier, float> tuples in descending order of preference.

class DynamicUpdateToUpdateGradOp : public popart::DynamicBinaryBaseOp

Public Functions

DynamicUpdateToUpdateGradOp(const OperatorIdentifier &_opid, std::vector<int64_t> axes_, std::vector<int64_t> sizes_, bool noOverlap_, const Op::Settings&)
std::unique_ptr<Op> clone() const final
const std::vector<GradInOutMapper> &gradInputInfo() const final
const std::map<int, int> &gradOutToNonGradIn() const final
class DynamicUpdateUpdaterGradOp : public popart::DynamicSliceBaseOp

Public Functions

DynamicUpdateUpdaterGradOp(const OperatorIdentifier &_opid, std::vector<int64_t> axes_, std::vector<int64_t> sizes_, bool noOverlap_, const Op::Settings&)
std::unique_ptr<Op> clone() const final
const std::vector<GradInOutMapper> &gradInputInfo() const final
const std::map<int, int> &gradOutToNonGradIn() const final
class DynamicZeroGradOp : public popart::DynamicBinaryBaseOp

Public Functions

DynamicZeroGradOp(const OperatorIdentifier &_opid, std::vector<int64_t> axes_, std::vector<int64_t> sizes_, bool noOverlap_, const Op::Settings &settings_)
std::unique_ptr<Op> clone() const final
std::unique_ptr<Op> getInplaceVariant(const OperatorIdentifier&) const final
const std::vector<GradInOutMapper> &gradInputInfo() const final
const std::map<int, int> &gradOutToNonGradIn() const final
class DynamicZeroInplaceOp : public popart::DynamicBinaryBaseInplaceOp

Public Functions

DynamicZeroInplaceOp(const DynamicZeroOp &dynamicZeroOp)
DynamicZeroInplaceOp(const OperatorIdentifier &_opid, std::vector<int64_t> axes_, std::vector<int64_t> sizes_, bool noOverlap_, const Op::Settings&, TensorInfo updateInInfo_ = TensorInfo())
std::unique_ptr<Op> clone() const final
class DynamicZeroOp : public popart::DynamicBinaryBaseOp

Public Functions

DynamicZeroOp(const OperatorIdentifier &_opid, std::vector<int64_t> axes_, std::vector<int64_t> sizes_, bool noOverlap_, const Op::Settings &settings_, TensorInfo updateInInfo_ = TensorInfo())
std::unique_ptr<Op> clone() const final
std::vector<std::unique_ptr<Op>> getGradOps() final
std::unique_ptr<Op> getInplaceVariant(const OperatorIdentifier&) const final
std::vector<std::tuple<OperatorIdentifier, float>> inplacePriorityDefault() const override
class ElementWiseBinaryArg0GradOp : public popart::ElementWiseBinaryGradOp

Subclassed by popart::Atan2Arg0GradOp, popart::DivArg0GradOp, popart::FmodArg0GradOp, popart::MulArg0GradOp, popart::PowArg0GradOp

Public Functions

inline ElementWiseBinaryArg0GradOp(const OperatorIdentifier &_opid, const std::vector<int64_t> &_reduction_axes, const TensorInfo &_forward_op_arg_info, const Op::Settings &_settings)
std::unique_ptr<Op> clone() const override
class ElementWiseBinaryArg1GradOp : public popart::ElementWiseBinaryGradOp

Subclassed by popart::Atan2Arg1GradOp, popart::DivArg1GradOp, popart::MulArg1GradOp, popart::PowArg1GradOp, popart::SubtractArg1GradOp

Public Functions

inline ElementWiseBinaryArg1GradOp(const OperatorIdentifier &_opid, const std::vector<int64_t> &_reduction_axes, const TensorInfo &_forward_op_arg_info, const Op::Settings &_settings)
std::unique_ptr<Op> clone() const override
class ElementWiseBinaryBaseOp : public popart::Op

Subclassed by popart::ElementWiseBinaryInplaceLhsOp, popart::ElementWiseBinaryInplaceRhsOp, popart::ElementWiseBinaryOp

Public Functions

ElementWiseBinaryBaseOp(const OperatorIdentifier &_opid, const Op::Settings &_settings)
std::unique_ptr<Op> clone() const override
void setup() override
inline float getSubgraphValue() const final
inline bool canShard() const override
void growAliasModel(AliasModel&) const override
ReplicatedTensorShardingIndices getReplicatedTensorShardingIndices() const override
inline view::RegMap fwdRegMap(InIndex argIndex, OutIndex) const final
inline view::RegMap bwdRegMap(InIndex argIndex, OutIndex) const final

Public Static Functions

static inline InIndex getArg0InIndex()
static inline InIndex getArg1InIndex()
static inline OutIndex getOutIndex()
class ElementWiseBinaryGradOp : public popart::Op

Subclassed by popart::ElementWiseBinaryArg0GradOp, popart::ElementWiseBinaryArg1GradOp

Public Functions

ElementWiseBinaryGradOp(const OperatorIdentifier &_opid, const std::vector<int64_t> &_reduction_axes, const TensorInfo &_forward_op_arg_info, const Op::Settings &_settings)
virtual std::unique_ptr<Op> clone() const override = 0
void setup() final
inline const std::vector<int64_t> &getReductionAxes() const
inline float getSubgraphValue() const final
inline const std::map<int, int> &gradOutToNonGradIn() const final
inline virtual const std::vector<GradInOutMapper> &gradInputInfo() const final

Public Static Functions

static inline InIndex getGradInIndex()
static inline InIndex getFwdArg0InIndex()
static inline InIndex getFwdArg1InIndex()
static inline InIndex getFwdOutIndex()
static inline OutIndex getOutIndex()
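
The _reduction_axes constructor argument and getReductionAxes() suggest that the gradient flowing back to a broadcast operand is summed over the axes along which that operand was broadcast. The sketch below shows one way such axes could be derived, assuming NumPy-style broadcasting and an operand whose rank does not exceed the output rank. broadcastReductionAxes is a hypothetical helper written for this illustration; it is not part of the PopART API.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not a PopART function): given the shape of a forward
// operand and the broadcast output shape, return the output axes over which
// an incoming gradient would be summed to recover the operand's shape.
// Assumes argShape.size() <= outShape.size() and that the shapes are
// broadcast-compatible under NumPy rules.
std::vector<int64_t>
broadcastReductionAxes(const std::vector<int64_t> &argShape,
                       const std::vector<int64_t> &outShape) {
  std::vector<int64_t> axes;
  const int64_t outRank  = static_cast<int64_t>(outShape.size());
  const int64_t rankDiff = outRank - static_cast<int64_t>(argShape.size());
  for (int64_t i = 0; i < outRank; ++i) {
    if (i < rankDiff) {
      // Leading axis that exists only in the output.
      axes.push_back(i);
    } else if (argShape[i - rankDiff] == 1 && outShape[i] != 1) {
      // Size-1 axis of the operand that was stretched by broadcasting.
      axes.push_back(i);
    }
  }
  return axes;
}
```

For an operand of shape {3, 1} broadcast to {2, 3, 4}, the helper returns axes 0 and 2: axis 0 was added by broadcasting and axis 2 was stretched from size 1.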
class ElementWiseBinaryInplaceLhsOp : public popart::ElementWiseBinaryBaseOp

Subclassed by popart::AddLhsInplaceOp, popart::Atan2LhsInplaceOp, popart::MulLhsInplaceOp, popart::PowLhsInplaceOp

Public Functions

inline ElementWiseBinaryInplaceLhsOp(const OperatorIdentifier &_opid, const Op::Settings &_settings)
std::unique_ptr<Op> clone() const override
inline view::Regions modifies(InIndex index) const final
inline view::Regions aliases(InIndex index, OutIndex) const final
class ElementWiseBinaryInplaceRhsOp : public popart::ElementWiseBinaryBaseOp

Subclassed by popart::AddRhsInplaceOp, popart::MulRhsInplaceOp

Public Functions

inline ElementWiseBinaryInplaceRhsOp(const OperatorIdentifier &_opid, const Op::Settings &_settings)
std::unique_ptr<Op> clone() const override
inline view::Regions modifies(InIndex index) const final
inline view::Regions aliases(InIndex index, OutIndex) const final
class ElementWiseBinaryOp : public popart::ElementWiseBinaryBaseOp

Subclassed by popart::ElementWiseNpBroadcastableBinaryWithGradOp< AddArg0GradOp, AddArg1GradOp >, popart::ElementWiseNpBroadcastableBinaryWithGradOp< Atan2Arg0GradOp, Atan2Arg1GradOp >, popart::ElementWiseNpBroadcastableBinaryWithGradOp< DivArg0GradOp, DivArg1GradOp >, popart::ElementWiseNpBroadcastableBinaryWithGradOp< MulArg0GradOp, MulArg1GradOp >, popart::ElementWiseNpBroadcastableBinaryWithGradOp< PowArg0GradOp, PowArg1GradOp >, popart::ElementWiseNpBroadcastableBinaryWithGradOp< SubtractArg0GradOp, SubtractArg1GradOp >, popart::BitwiseBinaryOp, popart::ElementWiseNpBroadcastableBinaryWithGradOp< Arg0GradOp, Arg1GradOp >, popart::FmodOp, popart::PReluOp

Public Functions

ElementWiseBinaryOp(const OperatorIdentifier &_opid, const Op::Settings &_settings)
std::unique_ptr<Op> clone() const override
std::vector<std::tuple<OperatorIdentifier, float>> inplacePriorityDefault() const final
std::unique_ptr<Op> getInplaceVariant(const OperatorIdentifier&) const final
void setInplacePriority(const OperatorIdentifier&, float)
float getInplacePriority(const OperatorIdentifier&) const
poprithms::memory::inplace::Proposal mapInplaceProposal(const AliasModel&, OperatorIdentifier) const override
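
inplacePriorityDefault() returns a list of (operator identifier, priority) pairs. The sketch below shows one plausible way a caller might order such a list. The source gives only the signatures, so the rule used here (larger priority values are considered first) is an assumption made for illustration. SketchOpId is a hypothetical stand-in for popart::OperatorIdentifier, and orderByPriority is not a PopART function.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for popart::OperatorIdentifier, for this sketch only.
using SketchOpId = std::string;

// Return the candidate identifiers ordered from highest to lowest priority.
// Assumption: a larger float means the variant should be tried earlier.
std::vector<SketchOpId>
orderByPriority(std::vector<std::tuple<SketchOpId, float>> candidates) {
  // stable_sort keeps the original order of candidates with equal priority.
  std::stable_sort(candidates.begin(), candidates.end(),
                   [](const auto &a, const auto &b) {
                     return std::get<1>(a) > std::get<1>(b);
                   });
  std::vector<SketchOpId> ordered;
  ordered.reserve(candidates.size());
  for (const auto &candidate : candidates) {
    ordered.push_back(std::get<0>(candidate));
  }
  return ordered;
}
```

Under that assumption, a left-hand-side in-place variant listed with priority 10 would be ordered ahead of a right-hand-side variant listed with priority 5.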
class ElementWiseInplaceUnaryOp : public popart::ElementWiseUnaryOp

Subclassed by popart::AsinInplaceOp, popart::AtanInplaceOp, popart::ClipInplaceOp, popart::EluInplaceOp, popart::ExpInplaceOp, popart::Expm1InplaceOp, popart::GeluErfInplaceOp, popart::GeluInplaceOp, popart::HardSigmoidInplaceOp, popart::IncrementModInplaceOp, popart::LeakyReluInplaceOp, popart::Log1pInplaceOp, popart::LogSoftmaxInplaceOp, popart::OneWayUnaryInPlaceOp, popart::ReluInplaceOp, popart::ScaleInplaceOp, popart::SeluInplaceOp, popart::ShrinkInplaceOp, popart::SigmoidInplaceOp, popart::SinhInplaceOp, popart::SoftmaxInplaceOp, popart::SoftPlusInplaceOp, popart::SoftSignInplaceOp, popart::SwishInplaceOp, popart::ThresholdedReluInplaceOp

Public Functions

inline ElementWiseInplaceUnaryOp(const OperatorIdentifier &_opid, const Op::Settings &_settings)
std::unique_ptr<Op> clone() const override
inline view::Regions modifies(InIndex index) const final
inline view::Regions aliases(InIndex in, OutIndex) const final
class ElementWiseNonLinearUnaryGradOp : public popart::Op

Subclassed by popart::AsinGradOp, popart::AtanGradOp, popart::CosGradOp, popart::EluGradOp, popart::ErfGradOp, popart::GeluErfGradOp, popart::GeluGradOp, popart::HardSigmoidGradOp, popart::Log1pGradOp, popart::LogGradOp, popart::ReciprocalGradOp, popart::SeluGradOp, popart::ShrinkGradOp, popart::SinGradOp, popart::SinhGradOp, popart::SoftPlusGradOp, popart::SoftSignGradOp, popart::SwishGradOp, popart::ThresholdedReluGradOp

Public Functions

ElementWiseNonLinearUnaryGradOp(const OperatorIdentifier &_opid, const ElementWiseUnaryOp &fwdOp)
std::unique_ptr<Op> clone() const override
void setup() final
const std::vector<GradInOutMapper> &gradInputInfo() const final
const std::map<int, int> &gradOutToNonGradIn() const final
inline float getSubgraphValue() const final
inline bool canShard() const override

Public Static Functions

static inline InIndex getGradInIndex()
static inline InIndex getFwdArgInIndex()
static inline OutIndex getOutIndex()
template<class Arg0GradOp, class Arg1GradOp>
class ElementWiseNpBroadcastableBinaryWithGradOp : public popart::ElementWiseBinaryOp

Subclassed by popart::AddOp, popart::Atan2Op, popart::DivOp, popart::MulOp, popart::PowOp, popart::SubtractOp

Public Functions

inline ElementWiseNpBroadcastableBinaryWithGradOp(const OperatorIdentifier &_opid, const Op::Settings &_settings)