2. PopART Python API
2.1. Sessions
2.1.1. Training session
- class popart.TrainingSession(fnModel, dataFlow, loss, optimizer, deviceInfo, inputShapeInfo=<popart_core.InputShapeInfo object>, patterns=None, userOptions=<popart_core.SessionOptions object>, name='training')
Session for training.
TrainingSession is a runtime instance that provides an interface for executing ONNX graphs on IPU hardware with training provided by optimizing a loss tensor using an optimizer and automatic differentiation (backpropagation).
- Parameters
fnModel (bytes) –
loss –
optimizer (Optimizer) –
deviceInfo (DeviceInfo) –
inputShapeInfo (InputShapeInfo) –
patterns (Patterns) –
userOptions (SessionOptions) –
name (str) –
- Return type
None
- property accumulationFactor
Get the gradient accumulation factor.
- compileAndExport(filename)
Compile the graph and export it to a file.
This method will first create snap::Graph and compile poplar::Executable. Next, it will export the executable and metadata to the file. The exported file will be in the PopEF format. This means that the file can be used to run inference using the Triton Inference Server with the Graphcore Triton backend. See the Poplar Triton Backend User Guide for more information.
This method raises a popart.OutOfMemoryException error if an out of memory event occurs. In addition, it raises an OSError if there are any file system related errors.
- Parameters
filename (str) – The name of the file where the compiled executable and metadata will be saved. If it does not exist, the file will be created.
- Raises
popart.OutOfMemoryException – If an out of memory event occurs.
OSError – If there are any file system related errors during the export.
- Return type
None
- property dataFlow
Get the configuration for the data feeds and fetches.
- initAnchorArrays()
Create the anchor arrays to feed data back into Python.
- Returns
- Dictionary of anchor tensor names and their
relevant NumPy arrays.
- Return type
Dict[str, np.array]
- prepareDevice(loadEngine=True)
Prepare the network for execution.
This will create snap::Graph and poplar::Engine, and set up poplar::Streams.
- Parameters
loadEngine (bool) – If true, load the engine and connect the streams once the device is ready.
- Raises
popart.OutOfMemoryException – If an out of memory event occurs.
- Return type
None
- property replicationFactor
Get the replication factor.
2.1.2. Inference session
- class popart.InferenceSession(fnModel, dataFlow, deviceInfo, inputShapeInfo=<popart_core.InputShapeInfo object>, patterns=None, userOptions=<popart_core.SessionOptions object>, name='inference')
Session for running inference.
InferenceSession is a runtime instance that provides an interface for executing ONNX graphs on IPU hardware, without any automatic differentiation (backpropagation).
- Parameters
fnModel (bytes) –
deviceInfo (DeviceInfo) –
inputShapeInfo (InputShapeInfo) –
patterns (Patterns) –
userOptions (SessionOptions) –
name (str) –
- Return type
None
- property accumulationFactor
Get the gradient accumulation factor.
- compileAndExport(filename)
Compile the graph and export it to a file.
This method will first create snap::Graph and compile poplar::Executable. Next, it will export the executable and metadata to the file. The exported file will be in the PopEF format. This means that the file can be used to run inference using the Triton Inference Server with the Graphcore Triton backend. See the Poplar Triton Backend User Guide for more information.
This method raises a popart.OutOfMemoryException error if an out of memory event occurs. In addition, it raises an OSError if there are any file system related errors.
- Parameters
filename (str) – The name of the file where the compiled executable and metadata will be saved. If it does not exist, the file will be created.
- Raises
popart.OutOfMemoryException – If an out of memory event occurs.
OSError – If there are any file system related errors during the export.
- Return type
None
- property dataFlow
Get the configuration for the data feeds and fetches.
- classmethod fromIr(ir, deviceInfo, name='fromIr')
Create a session for inference from an IR.
- Parameters
ir (Ir) – The IR to create the session from.
deviceInfo (DeviceInfo) – DeviceInfo object specifying the device type (IPU, IPUModel or CPU) and number of each type.
name (str) – The name of this inference session. Default: “fromIr”.
- Returns
An inference session.
- Return type
InferenceSession
- initAnchorArrays()
Create the anchor arrays to feed data back into Python.
- Returns
- Dictionary of anchor tensor names and their
relevant NumPy arrays.
- Return type
Dict[str, np.array]
- prepareDevice(loadEngine=True)
Prepare the network for execution.
This will create snap::Graph and poplar::Engine, and set up poplar::Streams.
- Parameters
loadEngine (bool) – If true, load the engine and connect the streams once the device is ready.
- Raises
popart.OutOfMemoryException – If an out of memory event occurs.
- Return type
None
- property replicationFactor
Get the replication factor.
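The following is a minimal sketch of how the InferenceSession methods above might be combined for a single run. Builder, TensorInfo, DataFlow, AnchorReturnType, DeviceManager and Session.run are not described in this section, so their use here is an assumption to check against the rest of the API reference. The helper name run_addition_sketch is hypothetical, and a CPU device is assumed so that no IPU is needed.

```python
try:
    import numpy as np
    import popart
except ImportError:  # NumPy or the PopART Python package is not installed
    popart = None


def run_addition_sketch():
    """Add two tensors in an InferenceSession and return the anchored result."""
    if popart is None:
        raise RuntimeError("This sketch needs NumPy and the PopART Python package.")

    # Build a tiny ONNX graph: out = a + b.
    builder = popart.Builder()
    shape = [1, 2, 4, 4]
    a = builder.addInputTensor(popart.TensorInfo("FLOAT", shape))
    b = builder.addInputTensor(popart.TensorInfo("FLOAT", shape))
    out = builder.aiOnnx.add([a, b])
    builder.addOutputTensor(out)

    # fnModel: the serialized ONNX model.
    proto = builder.getModelProto()
    # dataFlow: one batch per step, returning every value of `out`.
    data_flow = popart.DataFlow(1, {out: popart.AnchorReturnType("All")})
    # deviceInfo: a CPU device (an assumption made for this sketch).
    device = popart.DeviceManager().createCpuDevice()

    session = popart.InferenceSession(
        fnModel=proto, dataFlow=data_flow, deviceInfo=device
    )
    session.prepareDevice()
    anchors = session.initAnchorArrays()

    inputs = {
        a: np.ones(shape, dtype=np.float32),
        b: np.ones(shape, dtype=np.float32),
    }
    session.run(popart.PyStepIO(inputs, anchors))
    return anchors[out]
```

The function is only defined here, not called, so the snippet can be imported on a machine without PopART.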
2.1.3. Session Options
- class popart.SessionOptions
- property accumulateOuterFragmentSettings
Configuration setting for operations in the accumulate outer fragment.
- property accumulationAndReplicationReductionType
Specify how gradients are reduced when using gradient accumulation and graph replication.
- property accumulationFactor
Specify the number of micro-batches to accumulate before applying the varUpdate.
- property accumulatorTensorLocationSettings
Tensor location for gradient accumulator tensors.
- property activationTensorLocationSettings
Tensor location settings for activation/gradient tensors.
- property aliasZeroCopy
Enable zero-copy for subgraphs.
- property autoRecomputation
Enable recomputation of operations in the graph in the backwards pass to reduce model size at the cost of computation cycles.
- property batchSerializationSettings
Configuration setting for batch serialization.
- property cachePath
Folder to save the poplar::Executable to.
- property compileEngine
If false, the backend will build the Poplar graph but not compile it into an Engine. In this case, no execution can be performed, and nothing can be transferred to the device. API calls which retrieve information from the graph building stage, such as tile mapping introspection, can still be used.
- property constantWeights
An optimization for an inference session to have constant weights, true by default. Set this option to false if you intend to change the weights with a call to Session::resetHostWeights after the session has been prepared. This option has no effect on a training session.
- property createImplicitPipeliningFwdOnlyProgram
Deprecated. Create a custom program containing only the forward pipeline.
- property customCodeletCompileFlags
Compile flags for the custom codelets. For example, -g to generate debug info.
- property customCodelets
List of codelets (with filetype) to be added to the Poplar graph. See the Poplar documentation for more information.
- property decomposeGradSum
Replaces single sums of partial gradients with a tree of additions. This can reduce max liveness at the cost of extra cycles. A typical use case for this would be if a large weight tensor is used as an input to many operations.
- property delayVarUpdates
Options to delay variable updates as much as possible.
- property disableGradAccumulationTensorStreams
If true, the weight gradient tensors are not saved off the device when devicex.weightsFromHost() is called. Note: this option is overridden if syntheticDataMode is not SyntheticDataMode::Off. Note that weight gradient tensors that are also optimiser tensors will only be disabled if both disableGradAccumulationTensorStreams and disableOptimizerStateTensorStreams are true.
- property disableOptimizerStateTensorStreams
If true, streaming of optimizer tensors is disabled. This setting can be used to conserve memory if you are not interested in checkpointing optimizer state. Note that weight gradient tensors that are also optimiser tensors will only be disabled if both disableGradAccumulationTensorStreams and disableOptimizerStateTensorStreams are true.
- property dotChecks
When to write .dot files during Ir construction.
- property dotOpNames
Include the Op name in the .dot file (the Op type is always exported).
- property enableDistributedReplicatedGraphs
Enable training with Poplar replicated graphs across multiple PopART instances.
- property enableEngineCaching
Enable Poplar executable caching. You can set the file save location with the cachePath option. The file will be in the PopEF format. This means that it can be used to run inference using the Triton Inference Server because Graphcore provides a backend to it. See the Poplar Triton Backend for more information.
- enableExplicitIR(self: popart_core.SessionOptions, arg0: bool) None
- property enableExplicitMainLoops
Enables explicit main loop transformation, and disables implicit training loops. This will become deprecated and enabled by default.
- property enableFloatingPointChecks
Throw an exception when floating point errors occur.
- property enableFullyConnectedPass
Enable the global fullyConnectedPass option for matmuls.
- property enableGradientAccumulation
Enable gradient accumulation.
- property enableLoadAndOffloadRNGState
Allow the device RNG state to be loaded from and offloaded to the host.
- property enableMergeExchange
Enable merging of remote and host IO operations to facilitate IO overlap.
- property enableNonStableSoftmax
By default, we use the stable softmax Poplar function. The input tensor to softmax, x, is preprocessed by subtracting max(x) from each element before computing the exponentials, ensuring numerical stability. If you are sure the inputs to your softmax operations are small enough to not cause overflow when computing the exponential, you can enable the non-stable version instead, to increase the speed.
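As an illustration of the preprocessing described for enableNonStableSoftmax, the sketch below compares a softmax that subtracts max(x) with one that does not. It is plain Python and does not reflect how Poplar implements either version; the function names are hypothetical.

```python
import math


def softmax_stable(x):
    """Softmax that subtracts max(x) from each element before exponentiating."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_non_stable(x):
    """Softmax without the max(x) shift; math.exp can overflow for large inputs."""
    exps = [math.exp(v) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


small = [1.0, 2.0, 3.0]
# For small inputs the two versions agree to floating-point precision.
agree_on_small = all(
    math.isclose(p, q) for p, q in zip(softmax_stable(small), softmax_non_stable(small))
)

large = [1000.0, 1001.0, 1002.0]
stable_probs = softmax_stable(large)  # largest exponent is exp(0), so no overflow
try:
    softmax_non_stable(large)
    overflowed = False
except OverflowError:
    overflowed = True  # math.exp(1000.0) exceeds the float64 range
```

The non-stable version is only safe when the inputs are known to be small, which is the condition stated above.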
- property enableOutlining
Identify and extract repeated parts of computational graph into subgraphs.
- property enableOutliningCopyCostPruning
When true, the cost of copying cached sections is included in the outlining cost model.
- property enablePipelining
Enable pipelining of virtual graphs.
- property enableReplicatedGraphs
Enable replication of graphs.
- property enableStableNorm
If true, computes the mean first and subtracts the activations from it before computing the variance. The implementation with this flag set to true is slower than when set to false. The stable version requires the first order moment to be estimated and applied to the sample set before the second order central moment is calculated.
- property enableStochasticRounding
Enable stochastic rounding. PopART will set the Poplar engine option “target.deterministicWorkers” to “true” if this option is set and to “false” if it is not set. You can override this behaviour by adding a value for “target.deterministicWorkers” to SessionOptions::engineOptions.
- property enableSupportedDataTypeCasting
If enabled, casts any tensor of unsupported data types to supported data types when lowering to Poplar. Currently, this implies casting:
INT64 -> INT32
UINT64 -> UINT32
The cast will error for incompatible data types and over/underflows, and inform on narrowing casts.
- property ensureFp32LossScaleTensor
Only compatible with models that have an fp16 loss scale tensor. When true, the loss scale tensor will be an fp32 tensor, and will be combined with fp16 activations as late as possible to produce the first fp16 activation gradients. This allows the user to choose a loss scale value greater than max(fp16). This is also recommended when automatic loss scaling is enabled.
- property executionPhaseSettings
Configuration settings for execution phases.
- property explicitRecomputation
Enable explicit recomputation.
- property exportPoplarComputationGraph
Export Poplar computation graph.
- property exportPoplarVertexGraph
Export Poplar vertex graph.
- property finalDotOp
See firstDotOp.
- property firstDotOp
The ops to write to the .dot file will be a continuous interval of the schedule, controlled by firstDotOp and finalDotOp. In particular, it will be [min(0, firstDotOp), max(N ops in Ir, finalDotOp)).
- getGlobalReplicationFactor(self: popart_core.SessionOptions) int
Helper method to handle the different replication options.
- If enableDistributedReplicatedGraphs is true, return globalReplicationFactor.
- If enableReplicatedGraphs is true, return replicatedGraphCount.
- Otherwise, return 1.
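The selection logic of getGlobalReplicationFactor can be sketched in plain Python as follows. ReplicationOptions is a hypothetical stand-in that holds only the four SessionOptions fields involved; it is not part of PopART.

```python
from dataclasses import dataclass


@dataclass
class ReplicationOptions:
    """Hypothetical stand-in for the SessionOptions fields used below."""

    enableDistributedReplicatedGraphs: bool = False
    enableReplicatedGraphs: bool = False
    globalReplicationFactor: int = 1
    replicatedGraphCount: int = 1


def global_replication_factor(opts):
    """Mirror the documented precedence: distributed, then local, then 1."""
    if opts.enableDistributedReplicatedGraphs:
        return opts.globalReplicationFactor
    if opts.enableReplicatedGraphs:
        return opts.replicatedGraphCount
    return 1


no_replication = global_replication_factor(ReplicationOptions())
local_only = global_replication_factor(
    ReplicationOptions(enableReplicatedGraphs=True, replicatedGraphCount=4)
)
distributed = global_replication_factor(
    ReplicationOptions(
        enableDistributedReplicatedGraphs=True,
        enableReplicatedGraphs=True,
        globalReplicationFactor=16,
        replicatedGraphCount=4,
    )
)
```

Note that the distributed setting takes precedence even when local replication is also enabled.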
- property globalReplicaOffset
The first replica index that this PopART instance is running.
- property globalReplicationFactor
The total number of replicas in a multi instance replicated graph training session (this should be left as the default value (1) if distributed replicated graphs are disabled). This value includes local replication.
- property groupHostSync
Group the streams from host at the beginning and the streams to host at the end. This trades off sum-liveness efficiency for cycle efficiency.
- property instrumentWithHardwareCycleCounter
Add instrumentation to your program to count the number of device cycles (of a single tile, on a single IPU) that your main program takes to execute. Expect this to have a small detrimental impact on performance.
- property kahnTieBreaker
The initial scheduling is done with Kahn’s algorithm. When several Ops are free to be scheduled, this controls which method is used.
- property logDir
A directory for log traces to be written into.
- property meanAccumulationAndReplicationReductionStrategy
Specify when to divide by a mean reduction factor when accumulationAndReplicationReductionType is set to ReductionType::Mean.
- property mergeVarUpdate
Enable merging of VarUpdates into groups of VarUpdates, by flattening and concatenating variable tensors and updating tensors.
- property mergeVarUpdateMemThreshold
The MergeVarUpdateType::AutoLoose and MergeVarUpdateType::AutoTight VarUpdateOp merging algorithms have a threshold on the total memory of variable tensors to merge for updating. Defined as total memory in bytes.
- property optimizerStateTensorLocationSettings
Tensor location for optimizer state tensors.
- property opxAliasChecking
Run Opx checks to verify IR tensor aliasing information corresponds to lowered Poplar tensor aliasing.
- property opxModifyChecking
Run Opx checks to verify IR tensor modification information corresponds to lowered Poplar tensor modifications.
- property outlineSequenceBreakCost
The penalty applied to outlining potential sub-graphs if the sub-graph to be created breaks up a sequence of operations that are more efficient (for example for overlapping compute and exchange) when outlined together. Default value is set to ~10 * Op::getHighSubgraphValue().
- property outlineThreshold
The incremental value that a sub-graph requires, relative to its nested sub-graphs (if any), to be eligible for outlining. A high threshold results in fewer sub-graphs being outlined, a negative value results in all being outlined. The gross value of a sub-graph is the sum of its constituent Ops’ Op::getSubgraphValue() values. To disable outlining, it is better to set enableOutlining to false than to set this value to infinity. The default value of 1.0f results in all high-value operations such as convolution being cached, but standalone low-value operations such as Relu will not be.
- property partialsTypeMatMuls
Set the partials type globally for matmuls. Can be overridden individually with Builder.setPartialsType(). Valid values are "float" and "half". By default, this is not set, so no global partials type is imposed.
- property rearrangeAnchorsOnHost
Before anchor tensors are streamed from device to host, they are not necessarily arranged in memory as required when they are to be copied from host stream to host. This can be done on the device or on the host. Done on host by default to save memory, but often at the expense of cycles, especially for larger anchor tensors.
- property rearrangeStreamsOnHost
Before stream tensors are streamed from host to device, they are not necessarily arranged in memory as required when they are to be copied from host stream to device. This can be done on the device or on the host. Done on device by default.
- property replicatedGraphCount
If enableReplicatedGraphs is true, replicatedGraphCount will set the number of model replications. For example, if your model uses 1 IPU, a replicatedGraphCount of 2 will use 2 IPUs. If your model is pipelined across 4 IPUs, a replicatedGraphCount of 4 will use 16 IPUs total. Therefore, the number of IPUs you request must be a multiple of replicatedGraphCount. If the training is done across multiple instances then the replicatedGraphCount is the number of replicas for this instance.
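The IPU arithmetic in the replicatedGraphCount description can be checked with a short sketch. Both helper names are hypothetical and the code only restates the rule given above.

```python
def total_ipus(ipus_per_replica, replicated_graph_count):
    """IPUs used when each replica needs `ipus_per_replica` IPUs."""
    return ipus_per_replica * replicated_graph_count


def is_valid_ipu_request(requested_ipus, replicated_graph_count):
    """The requested IPU count must be a multiple of replicatedGraphCount."""
    return requested_ipus % replicated_graph_count == 0


single_ipu_model = total_ipus(1, 2)  # model on 1 IPU, 2 replicas
pipelined_model = total_ipus(4, 4)   # model pipelined over 4 IPUs, 4 replicas
```

A request for 16 IPUs with a replicatedGraphCount of 4 passes the check, whereas a request for 6 IPUs does not.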
- property scheduleNonWeightUpdateGradientConsumersEarly
When shouldDelayVarUpdates is true, the other ops in the proximity of the delayed var updates may inherit the -inf schedule priority used to delay the var updates. This is undesirable for some ops that consume gradients, as we would like to consume (and thus be able to recycle the memory of) those gradients as soon as possible. Two examples are HistogramOps when doing automatic loss scaling, and the AccumulateOps that accumulate the gradients when doing gradient accumulation.
If this option and shouldDelayVarUpdates are both true, the schedule priority of the ops described above will be re-overridden to +inf.
- property separateCallOpPdfs
When generating PDFs of IR graphs, create separate PDFs for each subgraph.
- property serializedPoprithmsAnnealGraphsDir
PopART uses Poprithms for scheduling PopART graphs. The Poprithms graphs created for scheduling can be optionally serialised (written to file). This string specifies the directory to serialise Poprithms graphs to. If it is empty, then the graphs will not be serialised. The names of serialisation files will be poprithms_shift_graph_i.json for the lowest non-existing values of i. The directory must already exist; PopART will not create it.
- property serializedPoprithmsShiftGraphsDir
PopART uses Poprithms for scheduling PopART graphs. The Poprithms graphs created for scheduling can be optionally serialised (written to file). This string specifies the directory to serialise Poprithms graphs to. If it is empty, then the graphs will not be serialised. The names of serialisation files will be poprithms_shift_graph_i.json for the lowest non-existing values of i. The directory must already exist; PopART will not create it.
- property strictOpVersions
Strict op version checks will throw an error if the exact version of an op required for the model's opset is not supported. Turning this check off will cause PopART to fall back to the latest implementation of the op that is supported. Warning: turning off these checks may cause undefined behaviour.
- property subgraphCopyingStrategy
This setting determines how copies for inputs and outputs for subgraphs are lowered. By setting this value to JustInTime you may save memory at the cost of fragmenting subgraphs into multiple Poplar functions. This may be particularly useful when a number of weight updates are outlined in one subgraph, as it may prevent multiple weight tensors from being live at the same time inside the subgraph.
- property swapLimitScheduler
The maximum number of improving steps allowed by the scheduling algorithm before a solution must be returned.
- property syntheticDataMode
This option specifies whether to use real or synthetic data to initialize input tensors. Anything but the SyntheticDataMode::Off value disables streaming to/from host.
- property timeLimitScheduler
The maximum allowed time that can be spent searching for a good graph schedule before a solution must be returned.
- property weightTensorLocationSettings
Tensor location for weight tensors.
- class popart.AccumulateOuterFragmentSchedule
Enum type that determines how the operations in the accumulate outer fragment will be scheduled across virtual graphs (only relevant to pipelined modes).
Members:
Scheduler : Don’t add additional constraints and let the scheduler work it out.
Serial : Add constraints that ensure ops are executed in virtual graph ID order.
OverlapCycleOptimized : Try and parallelise ops with different virtual graph IDs as much as possible.
OverlapMemoryOptimized : Try and parallelise ops with different virtual graph IDs but avoid certain steps that are costly in terms of memory usage.
- property name
- class popart.AccumulateOuterFragmentSettings
- property excludedVirtualGraphs
A setting to explicitly tell PopART not to try to parallelise the given virtual graph IDs. This setting is experimental and may change.
- property schedule
Tell PopART how you would like to schedule the accumulate outer fragment. This setting is experimental and may change.
- class popart.AutodiffSettings
- class popart.AutodiffStitchStrategy
Members:
RecomputeMinimal
RecomputeAllNonInputs
AddFwdOutputs
SafeAddFwdOutputs
- property name
- class popart.AutomaticLossScalingSettings
- class popart.BatchSerializationBatchSchedule
Enum type that describes how to change the batch serialisation subgraph schedule before outlining. NOTE: This setting is experimental and may change.
Members:
Scheduler : Don’t encourage any particular scheduling for ops within batch subgraphs (leave it to the scheduler) but tell the scheduler to schedule subgraphs in sequence.
Isomorphic : Encourage all ops within batch subgraphs to be scheduled identically and for each subgraph to be scheduled in sequence (good for outlineability).
OverlapOnIo : Attempt to put the RemoteLoad for batch N+1 right after the compute phase of batch N.
OverlapOnCompute : Attempt to put the RemoteLoad for batch N+1 right before the compute phase of batch N.
- property name
- class popart.BatchSerializationMethod
Enum type that describes how to apply the batch serialization. NOTE: This setting is experimental and may change.
Members:
UnrollDynamic : Unroll the batch with dynamic slicing
UnrollStatic : Unroll the batch with static slicing
Loop : Loop over the batch dimension
- property name
- class popart.BatchSerializationSettings
A structure containing batch serialization settings.
- property batchSchedule
Experimental value that changes how operations are scheduled.
- property concatOnExecutionPhaseChange
Break batch serialization chains when the execution phase changes (by concatenating the compute batches to the local batch).
- property concatOnPipelineStageChange
Break batch serialization chains when the pipeline stage changes (by concatenating the compute batches to the local batch).
- property concatOnVirtualGraphChange
Break batch serialization chains when the virtual graph changes (by concatenating the compute batches to the local batch).
- property factor
The number of compute batches to split operations into.
- property method
Experimental value to control how batch serialization is applied.
- property transformContext
Experimental value to control when batch serialization is applied.
- class popart.BatchSerializationTransformContext
Enum type that describes when to apply the batch serialization. NOTE: This setting is experimental and may change.
Members:
Forward : Apply before growing the backward pass
Backward : Apply after growing the backward pass
Fwd : Apply before growing the backward pass
Bwd : Apply after growing the backward pass
- property name
- class popart.CommGroup
Class to specify sub-groups of replicas.
Examples of derived sub-groups:
- IPU-link domain sub-rack:
where N is a power of two and replicaGroupSize > 1.
- Complete IPU-link domain / full rack:
- Using GW-links only:
- property replicaGroupSize
Replica group size.
- property type
Replica group type.
- class popart.CommGroupType
PopART equivalent of GCL CommGroupType. Each of these enumeration constants has a corresponding GCL CommGroupType value.
Members:
All : All replicas viewed as one group, replica group size is ignored.
Consecutive : Groups are consecutive in replica.
If there are N replicas denoted {0, … N-1} and group size is k, then there are N/k groups of size k:
{0, 1, … k-1}, {k, … 2k-1} … {N-k, … N-1}
Orthogonal : Groups are sliced orthogonal to the replica ordering.
If there are N replicas denoted {0, … N-1} and group size is k, then there are m = N/k groups of size k:
{0, m, 2m, …}, {1, m+1, 2m+1, …} … {m-1, 2m-1, … N-1}
Ungrouped : Each replica is in its own group, replica group size is ignored.
- property name
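The Consecutive and Orthogonal groupings described for CommGroupType can be enumerated with a small sketch. The function names are hypothetical and the code only reproduces the index patterns described above for N replicas and group size k; it does not call GCL or PopART.

```python
def consecutive_groups(n, k):
    """N/k groups of k replicas that are adjacent in the replica ordering."""
    if n % k != 0:
        raise ValueError("group size must divide the number of replicas")
    return [list(range(start, start + k)) for start in range(0, n, k)]


def orthogonal_groups(n, k):
    """m = N/k groups of k replicas, each taking every m-th replica."""
    if n % k != 0:
        raise ValueError("group size must divide the number of replicas")
    m = n // k
    return [list(range(start, n, m)) for start in range(m)]


consecutive = consecutive_groups(8, 4)
orthogonal = orthogonal_groups(8, 4)
```

With 8 replicas and a group size of 4, the consecutive groups are [0, 1, 2, 3] and [4, 5, 6, 7], while the orthogonal groups are [0, 2, 4, 6] and [1, 3, 5, 7].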
- class popart.ExecutionPhaseIOSchedule
Enum type to specify when to load tensors.
Members:
Preload : Preload tensors in previous phase for use in current phase.
OnDemand : Load tensors just before they are required.
- property name
- class popart.ExecutionPhaseSchedule
Enum type to specify the order of processing optimizer operations for different weights of the same execution phase.
The steps for phased execution consist of:
Copy to IO tiles if necessary (1)
Run collective operations if necessary (2)
Load optimizer state (3)
Update optimizer state (4)
Apply optimizer (5)
Store updated tensor if necessary (6)
Members:
Interleaving : Process above steps for one weight at a time (for example: 123456, 123456, 123456). The scheduler may interleave these steps.
Batch : Process above steps for all weights together, in a way that maximises overlap potential between compute and exchange (for example: 333, 111, 222, 444, 555, 666).
BatchClusteredIO : Process above steps for all weights together, in a way that maximises overlap potential between compute and exchange, and maximise stream copy merges by keeping RemoteLoad/RemoteStore operations clustered (for example: 333, 111, 222, 444, 555, 666).
- property name
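The example orderings quoted for ExecutionPhaseSchedule can be reproduced with the sketch below. The step numbers refer to the six steps listed above, and the Batch step order (3, 1, 2, 4, 5, 6) is read off the documented example rather than taken from the scheduler, so treat it as an assumption. The function names are hypothetical.

```python
# Step order used by the Batch example in the text (an assumption, see above).
BATCH_STEP_ORDER = (3, 1, 2, 4, 5, 6)


def interleaving_schedule(num_weights):
    """All six steps for one weight, then all six for the next, and so on."""
    return ", ".join("123456" for _ in range(num_weights))


def batch_schedule(num_weights):
    """Each step is run for every weight before moving to the next step."""
    return ", ".join(str(step) * num_weights for step in BATCH_STEP_ORDER)


interleaved = interleaving_schedule(3)
batched = batch_schedule(3)
```

For three weights this yields the same strings as the examples given for Interleaving and Batch.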
- class popart.ExecutionPhaseSettings
- property activationIOSchedule
The execution phase IO schedule for activation and gradient tensors.
- property phases
Number of ExecutionPhases for the whole model.
- property stages
Number of overlapping stages:
1: Parallel streaming memory, default for 1 IPU / replica.
2: PingPong between 2 IPUs, default for >= 2 IPUs / replica.
- property weightIOSchedule
The execution phase IO schedule for weight tensors.
- class popart.GradientTensorTrackingMethod
Members:
ConvAndMatmulGradients
AllNonViewChangingGradientTensors
GradientsOfUserSpecifiedTensors
- property name
- class popart.Instrumentation
Members:
Outer : Outer loop instrumentation, graph over all IPUs.
Inner : Inner loop instrumentation, graph per IPU.
- property name
- class popart.IrSerializationFormat
Members:
JSON : JavaScript Object Notation (JSON).
- property name
- class popart.MeanReductionStrategy
Enum type that specifies when to divide by a mean reduction factor, when doing mean reduction over a sequence of tensors \(t_1, t_2, ..., t_k\).
Members:
Running : Keep the reduction buffer as the mean of the tensors accumulated so far. If we have just processed \(t_1, ..., t_f\), the current accumulator \(s\) is the mean of these values, and the next accumulator update is \(s = (f/(f+1)) * s + (1/(f+1)) * t_{f+1}\) to keep \(s\) a running mean. This strategy guarantees \(s \le \max(t_1, ..., t_k)\) throughout the accumulation, therefore it will not overflow, but it is generally slower than Post.
Post : Keep the accumulation factor as the running sum, and divide by \(k\) once at the end of the accumulation. This strategy will generally be faster than Running, but is prone to overflow (especially when using fp16).
- property name
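The difference between the Running and Post strategies can be sketched with scalars standing in for tensors. The code below runs in ordinary Python floats, so nothing actually overflows; it only records the largest accumulator value each strategy reaches and compares it with the fp16 maximum. The function names and sample values are hypothetical.

```python
import math

FP16_MAX = 65504.0  # largest finite IEEE half-precision value


def running_mean(tensors):
    """Running strategy: the accumulator is always the mean so far."""
    s, peak = 0.0, 0.0
    for f, t in enumerate(tensors):  # f values have been processed so far
        s = (f / (f + 1)) * s + (1 / (f + 1)) * t
        peak = max(peak, abs(s))
    return s, peak


def post_mean(tensors):
    """Post strategy: accumulate the sum, divide by k once at the end."""
    total, peak = 0.0, 0.0
    for t in tensors:
        total += t
        peak = max(peak, abs(total))
    return total / len(tensors), peak


grads = [40000.0, 40000.0, 40000.0, 40000.0]
running_result, running_peak = running_mean(grads)
post_result, post_peak = post_mean(grads)
```

Both strategies give the same mean, but the Post accumulator reaches a value above FP16_MAX along the way, which is the overflow risk described above.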
- class popart.MergeVarUpdateType
Enum type used to specify which VarUpdateOp ops to merge.
Members:
Off : Do not merge VarUpdateOp ops.
All : Merge all VarUpdateOp ops into as few groups as possible. This is a good choice when memory is not a constraint.
AutoTight : Merge into groups, so that VarUpdateOp ops process tensors of exactly mergeVarUpdateMemThreshold in size.
AutoLoose : Merge into groups while attempting not to increase maximum variable liveness, and also not slice tensor variables so they will need to be processed by different VarUpdateOp ops.
- property name
- class popart.RecomputationType
Enum type to specify which ops to recompute in the backwards pass when doing auto-recomputation.
Members:
NoRecompute : No ops are recomputed.
Standard : Algorithm to pick checkpoints to try and minimise max liveness.
NormOnly : Only Norm ops (+ non-linearities, if following) are recomputed.
RecomputeAll : Recompute all ops.
Pipeline : Recompute all forward pipeline stages.
- property name
- class popart.ReductionType
Members:
Mean : Take the mean of the input values.
NoReduction : Do not reduce the input values. Keep them stacked into a single tensor. So values \(t_1, ..., t_k\) get collected into a tensor \([t_1, ..., t_k]\).
Sum : Sum the input values and do not scale the output (Default).
- property name
- class popart.ReplicatedTensorSharding
Enum type to specify whether to shard tensors over replicas.
Members:
Off : Don’t shard tensors over replicas.
On : Do shard tensors over replicas.
- property name
- class popart.SubgraphCopyingStrategy
Members:
OnEnterAndExit : Copy all inputs before the start of the subgraph, copy all outputs after all ops in the subgraph. With this strategy subgraphs will always map to a single Poplar function.
JustInTime : Copy inputs just before they are consumed and copy outputs as soon as they are produced. With this strategy subgraphs may be lowered into multiple Poplar functions.
- property name
- class popart.SyntheticDataMode
Members:
Off : Use real data.
Zeros : Input tensors are initialised to all zeros.
RandomNormal : Input tensors are initialised with distribution ~N(0,1).
- property name
- class popart.TensorLocationSettings
- property location
The default tensor location for this tensor type.
- property minElementsForOffChip
A minimum number of elements below which offloading won’t be considered.
- property minElementsForReplicatedTensorSharding
A minimum number of elements below which replicated tensor sharding (RTS) won’t be considered.
- class popart.TileSet
Enum type to specify a set of tiles.
Members:
Compute : The set of tiles designated for compute operations.
IO : The set of tiles designated for IO operations.
- property name
- class popart.VariableRetrievalMode
Members:
OnePerGroup : Returns one variable per group (defined by the VariableSettings::sharedVariableDomain CommGroup), automatically returns the first replica of each group, where first means the one with the lowest replica ID.
AllReduceReplicas : As OnePerGroup, but performs an AllReduce among the replicas in the same group according to VariableSettings::sharedVariableDomain. Currently unsupported.
AllReplicas : Returns all replica weights.
- property name
- class popart.VariableSettings
- getGroupRepresentative(self: popart_core.VariableSettings, group: int) int
- getRealGroupSize(self: popart_core.VariableSettings, arg0: int) int
- getRetrievalMode(self: popart_core.VariableSettings) popart_core.VariableRetrievalMode
- groupCount(self: popart_core.VariableSettings, arg0: int) int
- groups(self: popart_core.VariableSettings, arg0: int) List[List[int]]
- numReplicasReturningVariable(self: popart_core.VariableSettings, arg0: int) int
- shapeOnHost(self: popart_core.VariableSettings, arg0: List[int], arg1: int) List[int]
- shapeOnReplica(self: popart_core.VariableSettings, arg0: List[int], arg1: int, arg2: str) List[int]
- verify(self: popart_core.VariableSettings) None
2.2. Data input and output
Note
The base class for data input and output in PopART is popart::IStepIO. The way in which this class is used is detailed in the PopART C++ API Reference in the Data input and output (IStepIO) section.
- class popart.PyStepIO
This class is an implementation of the IStepIO interface backed by user-provided dictionaries for both input and output. These dictionaries map TensorId values to numpy arrays for PopART to read from and write to, respectively.
- __init__(self: popart_core.PyStepIO, inputs: Dict[str, numpy.ndarray], outputs: Dict[str, numpy.ndarray]) None
Construct a new PyStepIO instance.
- Parameters
inputs – A dictionary with an entry for every input tensor, comprising a TensorId for the key and a numpy array for a value for PopART to read from. The numpy arrays are assumed to be size-compatible with a tensor of shape [replicationFactor, accumulationFactor, batchesPerStep, <tensor shape>].
outputs – A dictionary with an entry for every output tensor, comprising a TensorId for the key and a numpy array value to which PopART will write the associated data. The expected shape of this numpy array is explained in the C++ API documentation for popart::AnchorReturnTypeId. The convenience method Session.initAnchorArrays() is typically used to create a dictionary with suitable arrays.
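As a rough illustration of the input shape convention described above, the following sketch works out the shape and element count a host array would need. It does not call PopART: stepio_input_shape is a hypothetical helper, and the sizes are arbitrary example values.

```python
from math import prod


def stepio_input_shape(tensor_shape, batches_per_step,
                       accumulation_factor=1, replication_factor=1):
    """Hypothetical helper: full host-side shape for one PyStepIO input.

    Follows the layout stated in the text:
    [replicationFactor, accumulationFactor, batchesPerStep, <tensor shape>].
    """
    return (replication_factor, accumulation_factor, batches_per_step,
            *tensor_shape)


# A [4, 8] tensor, 16 batches per step, gradient accumulation factor 2.
shape = stepio_input_shape((4, 8), batches_per_step=16, accumulation_factor=2)
n_elements = prod(shape)  # an array is size-compatible if it has this many elements
print(shape, n_elements)
```

Because the arrays only need to be size-compatible, the element count is the quantity that has to match.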
- enableRuntimeAsserts(self: popart_core.PyStepIO, arg0: bool) None
Enable (or disable) run-time checks that check the sizes of the provided numpy arrays.
- Parameters
arg0 – Flag to enable/disable checks
- class popart.PyStepIOCallback
This class is an implementation of the IStepIO interface backed by user-provided callback functions. This class inherits from IStepIO and implements those member functions by delegating the logic to the callback functions passed in the constructor. This gives the user full control as to how data buffers are provisioned.
- __init__(self: popart_core.PyStepIOCallback, input_callback: Callable[[str, bool], numpy.ndarray], input_complete_callback: Callable[[str], None], output_callback: Callable[[str], numpy.ndarray], output_complete_callback: Callable[[str], None]) None
Construct a new PyStepIOCallback instance.
- Parameters
input_callback – Callable object that the PyStepIOCallback instance will use when IStepIO::in() is called. See IStepIO for details on how to implement this method.
input_complete_callback – Callable object that the PyStepIOCallback instance will use when IStepIO::inComplete() is called. See IStepIO for details on how to implement this method.
output_callback – Callable object that the PyStepIOCallback instance will use when IStepIO::out() is called. See IStepIO for details on how to implement this method.
output_complete_callback – Callable object that the PyStepIOCallback instance will use when IStepIO::outComplete() is called. See IStepIO for details on how to implement this method.
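The division of labour between the four callbacks can be sketched in plain Python. BufferProvider below is a hypothetical helper, not part of PopART; it uses Python lists where real code would use numpy arrays, and the calls at the bottom only simulate what a runtime might do. The exact contract of each callback (for example the meaning of the prefetch flag) is defined by IStepIO and is not modelled here.

```python
class BufferProvider:
    """Hypothetical helper: owns host buffers and exposes the four callbacks."""

    def __init__(self, inputs, outputs):
        self.inputs = inputs    # tensor id -> buffer to read from
        self.outputs = outputs  # tensor id -> buffer to write to
        self.log = []           # order in which the callbacks were invoked

    def input_callback(self, tensor_id, prefetch):
        self.log.append(("in", tensor_id, prefetch))
        return self.inputs[tensor_id]

    def input_complete_callback(self, tensor_id):
        self.log.append(("inComplete", tensor_id))

    def output_callback(self, tensor_id):
        self.log.append(("out", tensor_id))
        return self.outputs[tensor_id]

    def output_complete_callback(self, tensor_id):
        self.log.append(("outComplete", tensor_id))


provider = BufferProvider({"x": [1.0, 2.0]}, {"y": [0.0, 0.0]})

# Simulated runtime: fetch an input buffer, then fill an output buffer in place.
data = provider.input_callback("x", False)
provider.input_complete_callback("x")
out = provider.output_callback("y")
out[:] = [v * 2 for v in data]
provider.output_complete_callback("y")
print(provider.outputs["y"], provider.log)
```

Under these assumptions, the four bound methods of such an object have the same call signatures as the constructor arguments listed above.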
- class popart.InputShapeInfo
- __init__(self: popart_core.InputShapeInfo) None
- add(self: popart_core.InputShapeInfo, arg0: str, arg1: popart_internal_ir.TensorInfo) None
- get(self: popart_core.InputShapeInfo, arg0: str) popart_internal_ir.TensorInfo
- has(self: popart_core.InputShapeInfo, arg0: str) bool
- class popart.DataFlow
- __init__(*args, **kwargs)
Overloaded function.
__init__(self: popart_core.DataFlow, batchesPerStep: int, anchorTensors: Dict[str, popart_core.AnchorReturnType]) -> None
__init__(self: popart_core.DataFlow, batchesPerStep: int, anchorIds: List[str], anchorReturnType: popart_core.AnchorReturnType = <popart_core.AnchorReturnType object at 0x7f12f1a69928>) -> None
- anchors(self: popart_core.DataFlow) List[str]
- art(self: popart_core.DataFlow, arg0: str) popart_core.AnchorReturnType
- batchesPerStep(self: popart_core.DataFlow) int
- isAnchored(self: popart_core.DataFlow, arg0: str) bool
- nAnchors(self: popart_core.DataFlow) int
- setBatchesPerStep(self: popart_core.DataFlow, arg0: int) None
2.3. Tensors
- class popart.DataType
Members:
UINT8
INT8
UINT16
INT16
INT32
INT64
UINT32
UINT64
BOOL
FLOAT
FLOAT16
BFLOAT16
DOUBLE
COMPLEX64
COMPLEX128
STRING
UNDEFINED
- property name
- class popart.ReplicatedTensorSharding
Enum type to specify whether to shard tensors over replicas.
Members:
Off : Don’t shard tensors over replicas.
On : Do shard tensors over replicas.
- property name
- class popart.TensorInfo(*args)
Python wrapper to TensorInfo to handle numpy types in constructor.
For example:
TensorInfo(dtype, shape)
TensorInfo(numpy.ndarray)
- class popart.TensorLocation
- property loadTileSet
The tiles through which the tensor(s) are loaded onto the chip.
- property replicatedTensorSharding
Whether to apply replicated tensor sharding (RTS) or not.
- property shardingDomain
The GCL comm groups across which to shard the tensor
- property storage
The memory location of the tensor(s).
- property storageTileSet
The tiles on which the tensor(s) are stored.
- class popart.TensorStorage
Enum type that determines where a tensor is stored.
Members:
OnChip : Store the tensor in on-chip memory.
OffChip : Store the tensor in streaming memory.
- property name
- class popart.TileSet
Enum type to specify a set of tiles.
Members:
Compute : The set of tiles designated for compute operations.
IO : The set of tiles designated for IO operations.
- property name
- class popart.tensorinfo.TensorInfo(*args)
Python wrapper to TensorInfo to handle numpy types in constructor.
For example:
TensorInfo(dtype, shape)
TensorInfo(numpy.ndarray)
2.4. Optimizers
- class popart.Optimizer
- getLossScalingVal(self: popart_core.Optimizer) float
- class popart.WeightDecayMode
Members:
Decay : Weight decay (e.g. AdamW)
L2Regularization : L2 regularization (e.g. PyTorch-like Adam)
- property name
- class popart.OptimizerValue
- isConst(self: popart_internal_ir.OptimizerValue) bool
- val(self: popart_internal_ir.OptimizerValue) float
- class popart.OptimizerValueMap
- getDefault(self: popart_core.OptimizerValueMap) popart_internal_ir.OptimizerValue
2.4.1. SGD
- class popart.ClipNormSettings
- static clipAllWeights(arg0: float) popart_core.ClipNormSettings
- static clipWeights(arg0: List[str], arg1: float) popart_core.ClipNormSettings
- class popart.SGD
Stochastic Gradient Descent (SGD) optimizer.
Akin to any optimizer implementation, this class is responsible for updating each weight tensor (\(w\)) in the model using the gradient (\(g\)) of the loss function with respect to the weight as calculated during the backwards pass.
The SGD optimizer has the following state for each weight:
velocity (\(v\))
The SGD optimizer has the following hyper parameters:
learning rate (\(\text{lr}\))
momentum (\(\text{mm}\))
weight decay (\(\text{wd}\))
dampening (\(\text{dm}\))
velocity scaling (\(\text{vs}\))
loss scaling (\(\text{ls}\))
clip norm settings
The values of these parameters can be shared between all weights but some can be overridden with weight-specific values (see SGD::insertSpecific). Hyper parameters are captured using OptimizerValue objects and therefore can be either a constant value or a non-constant value that can be adjusted by the user.
In the following we will describe how this optimizer updates a weight using a gradient. In the context of this description the gradient is the value of the gradient after any gradient accumulation has been performed and after the application of a loss scaling factor to the gradient has been corrected for.
When the optimizer needs to update a weight, \(w\), using a gradient, \(g\), it first updates the optimizer state as follows:
\[v' := v * \text{mm} + (1 - \text{dm}) * (g + \text{wd} * w) \text{ \ . }\]
Following the update of the optimizer state the optimizer uses said state to update the weight:
\[w' := w - \text{lr} * v' \text{ \ . }\]
In addition to the above, the velocity scaling hyper parameter is a scaling factor that can provide improved numerical stability by ensuring the values stored in the optimizer state, \(v\), are scaled by this value. When using this parameter PopART will automatically deal with the artificially scaled velocity value during the weight update, and other hyper parameters do not need to be adjusted.
In addition, the loss scaling hyper parameter is similar in nature to the velocity scaling parameter. It is a scaling value that is applied to the loss gradient at the start of the backwards pass and, at the end of the backwards pass, this scaling is reversed by multiplying the gradients for each weight with the inverse of the loss scaling value prior to updating the optimizer state. Using loss scaling can also improve numerical stability in some cases.
Finally, it is possible to add clip norm settings for this optimizer. These clip norms compute the L2 norm for a group of weights and add a scalar term to the weight update that effectively divides it by the norm (or a constant value that is provided as part of the clip norm, whichever is greater).
See the SGD notes in optimizer.hpp for a more detailed and comprehensive derivation of the SGD optimizer step in PopART.
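The two SGD update equations above can be traced with a small scalar sketch. This is only an illustration of the formulas as written, not PopART's implementation: sgd_step is a hypothetical function, and velocity scaling, loss scaling and clip norms are left out.

```python
def sgd_step(w, g, v, lr, mm=0.0, wd=0.0, dm=0.0):
    """One scalar SGD update following the equations in the text (sketch)."""
    v_new = v * mm + (1.0 - dm) * (g + wd * w)  # optimizer state update
    w_new = w - lr * v_new                      # weight update
    return w_new, v_new


w, v = 1.0, 0.0
history = []
for _ in range(2):
    # The same gradient is used twice so that the effect of momentum is visible.
    w, v = sgd_step(w, g=0.5, v=v, lr=0.1, mm=0.9)
    history.append((w, v))
print(history)
```

With momentum 0.9 the second step is larger than the first, because the velocity carries over part of the previous update.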
- dampenings(self: popart_core.SGD) popart_core.OptimizerValueMap
- insertSpecific(self: popart_core.SGD, arg0: str, arg1: dict) None
- learningRates(self: popart_core.SGD) popart_core.OptimizerValueMap
- momentums(self: popart_core.SGD) popart_core.OptimizerValueMap
- velocityScalings(self: popart_core.SGD) popart_core.OptimizerValueMap
- weightDecays(self: popart_core.SGD) popart_core.OptimizerValueMap
2.4.2. ConstSGD
- class popart.ConstSGD
Stochastic Gradient Descent (SGD) optimizer with constant learning rate, weight decay, loss scaling and clip norm settings (and default values for momentum, dampening or velocity scaling).
NOTE: See SGD for detailed meaning for these parameters.
NOTE: This class exists for backwards compatibility with the Python API and may be removed at some point in the future.
2.4.3. Adam
- class popart.AdamMode
Members:
Adam : Adam or AdamW mode, depending on weight decay setting (see [Kingma & Ba, 2015](https://arxiv.org/abs/1412.6980) and [Loshchilov & Hutter, 2018](https://arxiv.org/pdf/1711.05101.pdf)).
AdamNoBias : Like Adam but without bias correction.
Lamb : Lamb mode (see [You et al., 2020](https://arxiv.org/abs/1904.00962)).
LambNoBias : Like Lamb but without bias correction.
AdaMax : Adamax mode.
- property name
- class popart.Adam
AdamW, Lamb and AdaMax optimizer implementation.
Akin to any optimizer implementation, this class is responsible for updating each weight tensor (\(w\)) in the model using the gradient (\(g\)) of the loss function with respect to the weight as calculated during the backwards pass.
The optimizer has the following state for each weight:
first-order momentum (\(m\))
second-order momentum (\(v\))
time step (\(t\))
The optimizer has the following hyper parameters:
learning rate (\(\text{lr}\))
weight decay (\(\text{wd}\))
beta1 (\(\beta_1\))
beta2 (\(\beta_2\))
epsilon (\(\epsilon\))
loss scaling (\(\text{ls}\))
maximum weight norm (\(\text{mwn}\))
The values of these parameters can be shared between all weights but some can be overridden with weight-specific values (see Adam::insertSpecific). Hyper parameters are captured using OptimizerValue objects and therefore can be either a constant value or a non-constant value that can be adjusted by the user.
The values of AdamMode and WeightDecayMode passed to the constructor determine how weights are updated (see below).
In the following we will describe how this optimizer updates a weight using a gradient. In the context of this description the gradient is the value of the gradient after any gradient accumulation has been performed and after the application of a loss scaling factor to the gradient has been corrected for.
When the optimizer needs to update a weight, \(w\), using a gradient, \(g\), it first computes a term \(g_\text{tmp}\), which is effectively \(g\) with L2 regularization applied if the WeightDecayMode is set to WeightDecayMode::L2Regularization, as follows:
\[\begin{split}g_\text{tmp} := \left\{\begin{aligned} g & \text{ \; (Decay) } \\ (g + \text{wd} * w) & \text{ \; (L2Regularization) \; . } \\ \end{aligned}\right.\\\end{split}\]
Secondly, the optimizer updates the optimizer state as follows:
\[\begin{split}m' &:= \beta_1 * m + (1 - \beta_1) * g_\text{tmp} \\ v' &:= \left\{\begin{aligned} \beta_2 * v + (1 - \beta_2) * g_\text{tmp}^2 & \text{ \; (Adam/AdamNoBias) } \\ \beta_2 * v + (1 - \beta_2) * g_\text{tmp}^2 & \text{ \; (Lamb/LambNoBias) } \\ \text{max}(\beta_2 * v, |g_\text{tmp}|) & \text{ \; (AdaMax) } \\ \end{aligned}\right.\\ t' &:= t + 1 \\\end{split}\]
Next, it computes the following terms:
\[\begin{split}m_\text{tmp} &:= \left\{\begin{aligned} m' & \text{ \; (AdamNoBias/LambNoBias) } \\ \frac{m'}{(1 - \beta_1^{t'})} & \text{ \; (Adam/Lamb/AdaMax) } \\ \end{aligned}\right.\\ v_\text{tmp} &:= \left\{\begin{aligned} v' & \text{ \; (AdamNoBias/LambNoBias) } \\ \frac{v'}{(1 - \beta_2^{t'})} & \text{ \; (Adam/Lamb/AdaMax) } \\ \end{aligned}\right.\\ u_\text{tmp} &:= \left\{\begin{aligned} \frac{m_\text{tmp}}{(\sqrt{v_\text{tmp}} + \epsilon)} + \text{wd} * w &\text{ \; (Decay) } \\ \frac{m_\text{tmp}}{(\sqrt{v_\text{tmp}} + \epsilon)} &\text{ \; (L2Regularization) } \\ \end{aligned}\right.\end{split}\]
Finally, the optimizer updates the weight as follows:
\[\begin{split}w' := \left\{\begin{aligned} w - \text{lr} * u_\text{tmp} &\text{ \; (Adam/AdamNoBias/AdaMax) } \\ w - \biggl(\frac{\text{min}(\lVert{w}\rVert, \text{mwn})}{\lVert{u_\text{tmp}}\rVert}\biggr) * \text{lr} * u_\text{tmp} &\text{ \; (Lamb/LambNoBias) } \\ \end{aligned}\right.\end{split}\]
In addition to the above, the loss scaling hyper parameter is similar in nature to the velocity scaling parameter of the SGD optimizer. It is a scaling value that is applied to the loss gradient at the start of the backwards pass and, at the end of the backwards pass, this scaling is reversed by multiplying the gradients for each weight with the inverse of the loss scaling value prior to updating the optimizer state. Using loss scaling can also improve numerical stability of the gradient calculations. If scaledOptimizerState is enabled then the lossScaling will not be removed before updating the optimizer state. This can improve the numerical stability when accl1_type is set to FLOAT16.
NOTE: The maximum weight norm is referred to as \(\phi\) in You et al., 2020.
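The update steps above can be followed end to end in a short sketch. It transcribes the equations literally for a weight stored as a list of floats, and it is not PopART's implementation: adam_step, its argument names and its default values are hypothetical, loss scaling is omitted, and the Lamb branch assumes the norm of \(u_\text{tmp}\) is non-zero.

```python
import math


def adam_step(w, g, m, v, t, lr=1e-3, wd=0.0, beta1=0.9, beta2=0.999,
              eps=1e-8, mwn=10.0, mode="Adam", decay_mode="Decay"):
    """One update of a weight vector, following the equations in the text."""
    l2 = decay_mode == "L2Regularization"
    g_tmp = [gi + wd * wi if l2 else gi for gi, wi in zip(g, w)]

    # Optimizer state update.
    m_new = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g_tmp)]
    if mode == "AdaMax":
        v_new = [max(beta2 * vi, abs(gi)) for vi, gi in zip(v, g_tmp)]
    else:
        v_new = [beta2 * vi + (1 - beta2) * gi * gi
                 for vi, gi in zip(v, g_tmp)]
    t_new = t + 1

    # Bias correction (skipped in the NoBias modes).
    if mode in ("AdamNoBias", "LambNoBias"):
        m_tmp, v_tmp = m_new, v_new
    else:
        m_tmp = [mi / (1 - beta1 ** t_new) for mi in m_new]
        v_tmp = [vi / (1 - beta2 ** t_new) for vi in v_new]

    u_tmp = [mi / (math.sqrt(vi) + eps) for mi, vi in zip(m_tmp, v_tmp)]
    if not l2:  # Decay mode adds the weight decay term to the update
        u_tmp = [ui + wd * wi for ui, wi in zip(u_tmp, w)]

    # Weight update; Lamb rescales by min(||w||, mwn) / ||u_tmp||.
    scale = 1.0
    if mode in ("Lamb", "LambNoBias"):
        w_norm = math.sqrt(sum(wi * wi for wi in w))
        u_norm = math.sqrt(sum(ui * ui for ui in u_tmp))
        scale = min(w_norm, mwn) / u_norm
    w_new = [wi - scale * lr * ui for wi, ui in zip(w, u_tmp)]
    return w_new, m_new, v_new, t_new


w_adam, m1, v1, t1 = adam_step([1.0], [0.5], [0.0], [0.0], 0, lr=0.1)
w_lamb, _, _, _ = adam_step([3.0, 4.0], [0.5, -0.5], [0.0, 0.0], [0.0, 0.0],
                            0, lr=0.1, mode="Lamb")
print(w_adam, w_lamb)
```

On the first step with bias correction the Adam update has magnitude close to lr regardless of the gradient scale, while the Lamb update has length lr * min(||w||, mwn).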
- beta1s(self: popart_core.Adam) popart_core.OptimizerValueMap
- beta2s(self: popart_core.Adam) popart_core.OptimizerValueMap
- epss(self: popart_core.Adam) popart_core.OptimizerValueMap
- insertSpecific(self: popart_core.Adam, arg0: str, arg1: dict) None
- learningRates(self: popart_core.Adam) popart_core.OptimizerValueMap
- maxWeightNorms(self: popart_core.Adam) popart_core.OptimizerValueMap
- weightDecays(self: popart_core.Adam) popart_core.OptimizerValueMap
2.4.4. AdaDelta, RMSProp & AdaGrad
- class popart.Adaptive
- alphas(self: popart_core.Adaptive) popart_core.OptimizerValueMap
- epss(self: popart_core.Adaptive) popart_core.OptimizerValueMap
- insertSpecific(self: popart_core.Adaptive, arg0: str, arg1: dict) None
- learningRates(self: popart_core.Adaptive) popart_core.OptimizerValueMap
- momentums(self: popart_core.Adaptive) popart_core.OptimizerValueMap
- weightDecays(self: popart_core.Adaptive) popart_core.OptimizerValueMap
2.5. Builder
- class popart.builder.AiGraphcore(builder, version)
Return the builder interface for the given ai.graphcore version.
- Raises
ValueError – Thrown if an invalid ai.graphcore opset version is provided.
- Parameters
- Return type
None
- call(args, num_outputs, callee, debugName='')
Add a call operation to the model.
This is a Poplar extension, to expose manual code re-use to the builder.
- Parameters
- Keyword Arguments
debugName – A string to prepend to the name of the tensor. Default: “”.
- Returns
Output tensor ids.
- Return type
- class popart.builder.AiGraphcoreOpset1(builder, version)
Sub-class for backwards compatibility.
Will forward all calls to AiGraphcore class.
- class popart.builder.AiOnnx(builder, version)
Base class for the various AiOnnx builder interfaces.
The most recent versions of ONNX operators that require special treatment, such as Loop, Scan and Logical_If, go here, while older versions where the function signature differs are implemented on a corresponding subclass.
- Parameters
- Return type
None
- logical_if(args, num_outputs, else_branch, then_branch, name='')
If conditional operation.
- Parameters
num_outputs (int) – Number of output tensors from the if operator.
else_branch (Builder) – SubgraphBuilder for the graph to run if condition is false. Has num_outputs outputs: values you wish to be live-out to the subgraph created by the if operation; other tensors will not be accessible to the wider graph. The number of outputs must match the number of outputs in the then_branch.
then_branch (Builder) – SubgraphBuilder for the graph to run if condition is true. Has num_outputs outputs: values you wish to be live-out to the enclosing scope. The number of outputs must match the number of outputs in the else_branch.
name (str) –
- Keyword Arguments
name – A string to prepend to the name of the tensor. Default: “”.
- Returns
Output tensor ids.
- Return type
- loop(args, num_outputs, body, debugContext='')
Construct a generic Looping op.
- Parameters
- Keyword Arguments
debugContext – A string to prepend to the name of the tensor. Default: “”.
- Returns
Output tensor ids.
- Return type
- class popart.builder.AiOnnx10(builder, version)
Minimal builder interface for ai.onnx version 10. Once ai.onnx version 11 becomes the standard opset, this class must be updated to inherit from AiOnnx11, as described in T12084
- class popart.builder.AiOnnx11(builder, version)
Minimal builder interface for ai.onnx version 11.
- class popart.builder.AiOnnx6(builder, version)
Minimal builder interface for ai.onnx version 6.
- class popart.builder.AiOnnx7(builder, version)
Minimal builder interface for ai.onnx version 7.
- class popart.builder.AiOnnx8(builder, version)
Minimal builder interface for ai.onnx version 8.
- scan(args, num_outputs, body, num_scan_inputs, directions=[], debugContext='')
Scan-8 specific construct op.
- Parameters
num_outputs (int) – Number of output tensors from the scan operator.
body (Builder) – SubgraphBuilder for the graph to run in the scan.
num_scan_inputs (int) – The number of scan_inputs
directions (List[int]) – A list of int which specifies the direction of the scan_input. 0 indicates forward direction and 1 indicates reverse direction. If omitted, scan_input tensors will be scanned in the forward direction.
debugContext (str) –
- Keyword Arguments
debugContext – A string to prepend to the name of the tensor. Default: “”.
- Returns
Output tensor ids.
- Return type
- class popart.builder.AiOnnx9(builder, version)
Minimal builder interface for ai.onnx version 9.
- scan(args, num_outputs, body, num_scan_inputs, scan_input_axes=[], scan_input_directions=[], scan_output_axes=[], scan_output_directions=[], debugContext='')
Construct a generic scan op.
- Parameters
num_outputs (int) – Number of output tensors from the scan operator.
body (Builder) – SubgraphBuilder for the graph to run in the scan.
num_scan_inputs (int) – The number of scan_inputs
scan_input_axes (List[int]) – A list that specifies the axis to be scanned for the scan_input. If omitted, 0 will be used as the scan axis for every scan_input.
scan_input_directions (List[int]) – A list that specifies the direction to be scanned for the scan_input tensor. 0 indicates forward direction and 1 indicates reverse direction. If omitted, all scan_input tensors will be scanned in the forward direction.
scan_output_axes (List[int]) – A list that specifies the axis for the scan_output. The scan outputs are accumulated along the specified axis. If omitted, 0 will be used as the scan axis for every scan_output.
scan_output_directions (List[int]) – A list that specifies whether the scan_output should be constructed by appending or prepending a new value in each iteration: 0 indicates appending and 1 indicates prepending. If omitted, all scan_output tensors will be produced by appending a value in each iteration.
debugContext (str) –
- Keyword Arguments
debugContext – A string to prepend to the name of the tensor. Default: “”.
- Returns
Output tensor ids.
- Return type
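The meaning of the direction attributes of the scan op above can be pictured with plain Python lists. The sketch below is not the PopART or ONNX implementation: scan and running_sum are hypothetical functions, there is a single state value and a single 1-D scan input, and the axis attributes are not modelled.

```python
def scan(body, state, scan_input, input_direction=0, output_direction=0):
    """Run body(state, element) -> (state, out) over scan_input (sketch)."""
    # 0 scans the input forwards, 1 scans it in reverse.
    elements = scan_input if input_direction == 0 else reversed(scan_input)
    outputs = []
    for element in elements:
        state, out = body(state, element)
        if output_direction == 0:
            outputs.append(out)     # 0: append the value of each iteration
        else:
            outputs.insert(0, out)  # 1: prepend the value of each iteration
    return state, outputs


def running_sum(total, x):
    total = total + x
    return total, total


forward = scan(running_sum, 0, [1, 2, 3])
reverse_in = scan(running_sum, 0, [1, 2, 3], input_direction=1)
prepend_out = scan(running_sum, 0, [1, 2, 3], output_direction=1)
print(forward, reverse_in, prepend_out)
```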
- class popart.builder.AiOnnxMl(builder, version)
Return the builder interface for the given ai.onnx.ml version.
- Raises
ValueError – Thrown if an invalid ai.onnx.ml opset version is provided.
- Parameters
- Return type
None
- class popart.builder.Builder(modelProtoOrFilename=None, opsets=None, builderCore=None)
A wrapper around the Builder C++ class. This is renamed BuilderCore in pybind, to enable more Pythonic use. See builder.hpp for the class definition.
- Parameters
modelProtoOrFilename (Union[str, bytes]) – Model protobuf string or file path of saved ONNX model proto. Default: None.
opsets (Dict[str, int]) – Dict of opset versions. Default: None.
builderCore (_BuilderCore) – _BuilderCore object if you want to create a subgraph builder using an existing buildercore object. Default: None.
- Return type
None
- createSubgraphBuilder()
Create a child builder to add ops to a subgraph using a call operation.
- Returns
The child builder.
- Return type
- reshape_const(aiOnnx, args, shape, debugContext='')
Const version of the reshape op.
- Parameters
- Keyword Arguments
debugContext – String to use as a debug Context. Default: “”.
- Returns
Output tensor ids.
- Return type
- class popart.builder.Opset(builder, version)
Minimal base class for the opsets.
- class popart.Builder(modelProtoOrFilename=None, opsets=None, builderCore=None)
A wrapper around the Builder C++ class. This is renamed BuilderCore in pybind, to enable more Pythonic use. See builder.hpp for the class definition.
- Parameters
modelProtoOrFilename (Union[str, bytes]) – Model protobuf string or file path of saved ONNX model proto. Default: None.
opsets (Dict[str, int]) – Dict of opset versions. Default: None.
builderCore (_BuilderCore) – _BuilderCore object if you want to create a subgraph builder using an existing buildercore object. Default: None.
- Return type
None
- createSubgraphBuilder()
Create a child builder to add ops to a subgraph using a call operation.
- Returns
The child builder.
- Return type
- reshape_const(aiOnnx, args, shape, debugContext='')
Const version of the reshape op.
- Parameters
- Keyword Arguments
debugContext – String to use as a debug Context. Default: “”.
- Returns
Output tensor ids.
- Return type
- class popart.AiGraphcoreOpset1
- abort(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], debugContext: popart_internal_ir.DebugContext = '') None
Add an abort operation to the model.
The operation can be conditional or unconditional.
- Parameters
args – A vector of input tensor ids.
debugContext – Optional debug information.
- atan2(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str
Add an atan2 operation to the model.
Returns the element-wise angle \(\theta\) as a tensor. For \(-\pi < \theta \le \pi\), such that for two input tensors \(x\) and \(y\) and given \(r \ne 0\), then \(x = r \cos\theta\) and \(y = r \sin\theta\), element-wise.
In the case of \(x > 0\), \(\theta = \arctan(y/x)\).
- Parameters
args – A vector of input tensor ids: [y, x].
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
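The relationship between the angle and the two inputs can be checked with Python's standard math.atan2, which follows the same convention of taking y first and x second. This is only a host-side sketch on Python lists; atan2_elementwise is a hypothetical helper and does not call PopART.

```python
import math


def atan2_elementwise(y, x):
    """Element-wise angle of the points (x, y), in the range (-pi, pi]."""
    return [math.atan2(yi, xi) for yi, xi in zip(y, x)]


y = [1.0, 1.0, -1.0, 0.0]
x = [1.0, -1.0, -1.0, -1.0]
theta = atan2_elementwise(y, x)

# Recover x and y from r and theta to confirm x = r cos(theta), y = r sin(theta).
recovered = []
for yi, xi, ti in zip(y, x, theta):
    r = math.hypot(xi, yi)
    recovered.append((r * math.cos(ti), r * math.sin(ti)))
print(theta, recovered)
```

Unlike arctan(y/x), the two-argument form distinguishes all four quadrants, which is why the second and third points above get angles outside (-pi/2, pi/2).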
- bitwiseand(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], debugContext: popart_internal_ir.DebugContext = '') str
Add a bitwise AND operation to the model.
The operation computes the bitwise AND of two integer tensors.
- Parameters
args – Two broadcastable input tensors of type integer.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- bitwisenot(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], debugContext: popart_internal_ir.DebugContext = '') str
Add a bitwise NOT operation to the model.
The operation computes the bitwise NOT of an integer tensor.
- Parameters
args – An input tensor of type integer.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- bitwiseor(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], debugContext: popart_internal_ir.DebugContext = '') str
Add a bitwise OR operation to the model.
The operation computes the bitwise OR of two integer tensors.
- Parameters
args – Two broadcastable input tensors of type integer.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- bitwisexnor(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], debugContext: popart_internal_ir.DebugContext = '') str
Add a bitwise XNOR operation to the model.
The operation computes the bitwise XNOR of two integer tensors.
- Parameters
args – Two broadcastable input tensors of type integer.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- bitwisexor(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], debugContext: popart_internal_ir.DebugContext = '') str
Add a bitwise XOR operation to the model.
The operation computes the bitwise XOR of two integer tensors.
- Parameters
args – Two broadcastable input tensors of type integer.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
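The element-wise behaviour of the bitwise operations above can be illustrated on small Python lists. The sketch assumes unsigned 8-bit values and equal-length inputs (broadcasting is not modelled); bitwise and MASK are hypothetical names, and none of this calls PopART.

```python
import operator

MASK = 0xFF  # assumption: treat values as unsigned 8-bit integers


def bitwise(op, a, b):
    """Apply a two-input bitwise operator element-wise."""
    return [op(x, y) & MASK for x, y in zip(a, b)]


a = [0b1100, 0b1010]
b = [0b1010, 0b0110]

and_result = bitwise(operator.and_, a, b)
or_result = bitwise(operator.or_, a, b)
xor_result = bitwise(operator.xor, a, b)
xnor_result = [~v & MASK for v in xor_result]  # XNOR is the complement of XOR
not_result = [~v & MASK for v in a]
print(and_result, or_result, xor_result, xnor_result, not_result)
```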
- call(self: popart_core.AiGraphcoreOpset1, args: List[str], num_outputs: int, callee: popart::Builder, debugContext: popart_internal_ir.DebugContext = '') List[str]
Add a call operation to the model.
This is a Poplar extension, to expose manual code re-use to the builder.
- Parameters
args – A vector of input tensor ids.
callee – The subgraph to call into.
debugContext – Optional debug information.
- Returns
A vector of tensors; the subgraph outputs.
- copyvarupdate(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str
Copies a tensor to an initialised tensor (variable).
This is used to update an initialised tensor (a variable created using addInitializedInputTensor()) which retains its value between iterations, by setting the value to the value of another tensor (the updater). The purpose is to manually update the tensor in use cases for variables other than trained parameters (weights) or tensors used by other ops.
- Parameters
args – A vector of the input tensor ids containing the tensor to be updated, tensor, and the tensor containing the values for the update, updater, as [tensor, updater].
debugContext – Optional debug information.
- Returns
An alias to the updated variable. To ensure correct ordering of the updated variable, you should use this variable for any op which should operate on the updated variable.
- ctcbeamsearchdecoder(self: popart_core.AiGraphcoreOpset1, args: List[str], blank: int = 0, beam_width: int = 100, top_paths: int = 1, debug_context: popart_internal_ir.DebugContext = '') List[str]
Add a connectionist temporal classification (CTC) beam search decoder operation to the model.
Calculate the most likely topPaths labels and their probabilities given the input logProbs with lengths dataLengths.
- Parameters
args – A vector of input tensor ids. These are [logProbs, dataLengths], where logProbs is of shape [maxTime, batchSize, numClasses], and dataLengths is of shape [batchSize].
blank – The integer representing the blank class.
beamWidth – The number of beams to use when decoding.
topPaths – The number of most likely decoded paths to return, must be less than or equal to beamWidth.
debugContext – Optional debug information.
- Returns
The names of the result tensors. These are [labelProbs, labelLengths, decodedLabels], where labelProbs is of shape [batchSize, topPaths], labelLengths is of shape [batchSize, topPaths], and decodedLabels is of shape [batchSize, topPaths, maxTime].
- ctcloss(self: popart_core.AiGraphcoreOpset1, args: List[str], reduction: popart_core.ReductionType = <ReductionType.Mean: 1>, blank: int = 0, outDataType: str = 'UNDEFINED', zeroInfinity: bool = False, debugContext: popart_internal_ir.DebugContext = '') str
Add a connectionist temporal classification (CTC) loss operation to the model.
With maximum input length T, batch size N, number of classes C and maximum target length S, this op calculates the CTC loss for a logarithmised probabilities tensor with shape [T, N, C], a class target tensor with shape [N, S], an input lengths tensor [N] and a target lengths tensor [N].
Note that C includes a blank class (default=0). The probabilities tensor is padded as required. Target sequences are also padded and are populated with values less than or equal to C, not including the blank class, up to their respective target lengths. Note that target lengths cannot exceed input lengths.
- Parameters
args – A vector of input tensor ids [log_probs, targets, input_lengths, target_lengths].
reduction – The type of reduction to perform on the individual losses.
blank – The integer representing the blank class.
outDataType – The data type of the output tensors. Default = UNDEFINED.
zeroInfinity – If true, infinite losses and the associated gradients are zeroed-out. Default = false.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- depthtospace(self: popart_core.AiGraphcoreOpset1, args: List[str], blocksize: int, mode: str, debugContext: popart_internal_ir.DebugContext = '') str
Add a depth-to-space operation to the model.
This allows DepthToSpace_11 to be targeted from earlier opsets.
The purpose of a depth-to-space operation, also known as pixel shuffling, is to rearrange data from the depth (channels) dimension into the spatial (width and height) dimensions. It is an efficient means of learning upsampling alongside mixing convolution with bilinear interpolation and using transpose convolution.
See also
- Parameters
args – A vector containing a single tensor id of the input tensor of shape [N, C, H, W], where N is the batch axis, C is the channel or depth, H is the height and W is the width.
blocksize – The size of the blocks to be moved. If the input is [N, C, H, W] and the blocksize is B, the output will be [N, C/(B*B), H*B, W*B].
mode – Specifies how the data is rearranged:
* “DCR”: depth-column-row order (Default)
* “CRD”: column-row-depth order
debugContext – Optional debug information.
- Returns
A tensor which is a rearrangement of the input tensor.
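The rearrangement can be made concrete with a small pure-Python sketch on nested lists. It follows the index layout of the ONNX DepthToSpace operator for the two modes, which is assumed here to be what the PopART op implements; depth_to_space is a hypothetical helper and does not call PopART.

```python
def depth_to_space(x, blocksize, mode="DCR"):
    """Rearrange a nested-list tensor of shape [N, C, H, W] (sketch)."""
    b = blocksize
    n, c, h, w = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    c_out = c // (b * b)
    out = [[[[None] * (w * b) for _ in range(h * b)] for _ in range(c_out)]
           for _ in range(n)]
    for ni in range(n):
        for co in range(c_out):
            for bh in range(b):
                for bw in range(b):
                    if mode == "DCR":  # input channel laid out as (bh, bw, co)
                        ci = (bh * b + bw) * c_out + co
                    else:              # "CRD": laid out as (co, bh, bw)
                        ci = co * b * b + bh * b + bw
                    for hi in range(h):
                        for wi in range(w):
                            out[ni][co][hi * b + bh][wi * b + bw] = \
                                x[ni][ci][hi][wi]
    return out


# Input of shape [1, 8, 1, 1] whose single pixel in channel k holds the value k.
x = [[[[k]] for k in range(8)]]
dcr = depth_to_space(x, 2, mode="DCR")  # output shape [1, 2, 2, 2]
crd = depth_to_space(x, 2, mode="CRD")
print(dcr, crd)
```

With 8 input channels and a block size of 2 the output has 2 channels of 2x2 pixels; the two modes differ only in which input channels feed each output position.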
- detach(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str
Add a detach operation to the model.
- Parameters
args – A vector of input tensor ids.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- dynamicadd(self: popart_core.AiGraphcoreOpset1, args: List[str], axes: List[int], sizes: List[int], debugContext: popart_internal_ir.DebugContext = '') str
Add a dynamic add operation to the model.
Creates a copy of a tensor, tensor, with a slice tensor, slice, added at an offset position, offset.
- Parameters
args – A vector of input tensor ids: [tensor, offset, slice].
axes – The axes along which to add the slice.
sizes – The size of the slice along each axis.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
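For a 1-D tensor, the effect of the dynamic add described above can be sketched with Python lists. dynamic_add is a hypothetical helper that only mirrors the semantics stated in the text for a single axis; it does not call PopART.

```python
def dynamic_add(tensor, offset, slice_, size):
    """Return a copy of tensor with slice_ added at [offset, offset + size)."""
    out = list(tensor)  # the input tensor itself is left unchanged
    for i in range(size):
        out[offset + i] += slice_[i]
    return out


tensor = [1, 2, 3, 4, 5]
result = dynamic_add(tensor, offset=1, slice_=[10, 20], size=2)
print(result)
```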
- dynamicslice(self: popart_core.AiGraphcoreOpset1, args: List[str], axes: List[int], sizes: List[int], noOverlap: int = 0, debugContext: popart_internal_ir.DebugContext = '') str
- Parameters
args – A vector of input tensor ids: [tensor, offset].
axes – The axes along which to slice.
sizes – The size of the slice along each axis.
noOverlap – Indicates whether the slice regions overlap or not. If 1, slice regions do not overlap, otherwise they do overlap.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- dynamicupdate(self: popart_core.AiGraphcoreOpset1, args: List[str], axes: List[int], sizes: List[int], noOverlap: int = 0, debugContext: popart_internal_ir.DebugContext = '') str
- Parameters
args – A vector of input tensor ids: [tensor, offset, slice].
axes – The axes along which to update.
sizes – The size of the slice along each axis.
noOverlap – Indicates whether the updates overlap or not. If 1, the updates do not overlap, otherwise they do overlap.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- dynamiczero(self: popart_core.AiGraphcoreOpset1, args: List[str], axes: List[int], sizes: List[int], debugContext: popart_internal_ir.DebugContext = '') str
Add a dynamic zero operation to the model.
Creates a copy of a tensor, tensor, with the slice at offset position, offset, set to zero. For example:
```
out = tensor
out[offset] = 0.0
```
- Parameters
args – A vector of input tensor ids: [tensor, offset].
axes – The axes along which to zero elements.
sizes – The size of the slice along each axis.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
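The dynamic operations (dynamicadd, dynamicslice, dynamicupdate and dynamiczero) can be pictured with plain Python lists. The following is a minimal reference sketch, not PopART code: it assumes a one-dimensional tensor, a single axis and an integer offset, and all helper names are hypothetical.

```python
def _check(tensor, offset, size):
    # Guard against regions that fall outside the tensor.
    if offset < 0 or offset + size > len(tensor):
        raise IndexError("slice region is out of bounds")


def dynamic_slice(tensor, offset, size):
    """Return the region tensor[offset:offset + size]."""
    _check(tensor, offset, size)
    return tensor[offset:offset + size]


def dynamic_update(tensor, offset, slice_):
    """Return a copy of tensor with the region at offset replaced by slice_."""
    _check(tensor, offset, len(slice_))
    out = list(tensor)
    out[offset:offset + len(slice_)] = slice_
    return out


def dynamic_add(tensor, offset, slice_):
    """Return a copy of tensor with slice_ added to the region at offset."""
    _check(tensor, offset, len(slice_))
    out = list(tensor)
    for i, value in enumerate(slice_):
        out[offset + i] += value
    return out


def dynamic_zero(tensor, offset, size):
    """Return a copy of tensor with the region at offset set to zero."""
    return dynamic_update(tensor, offset, [0.0] * size)


t = [1.0, 2.0, 3.0, 4.0, 5.0]
sliced = dynamic_slice(t, 1, 2)
updated = dynamic_update(t, 1, [9.0, 9.0])
added = dynamic_add(t, 3, [10.0, 10.0])
zeroed = dynamic_zero(t, 0, 2)
```

Each helper returns a new list and leaves the input untouched, mirroring the "creates a copy of a tensor" wording used for these operations.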
- expm1(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str
Add an expm1 operation to the model.
This calculates the element-wise exponential of the input tensor and subtracts one: \(\exp(x) - 1\).
- Parameters
args – A vector of input tensor ids.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
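A dedicated expm1 is useful because evaluating exp(x) - 1 directly loses precision when x is close to zero. The sketch below illustrates this with Python's standard math module; it is a host-side illustration of the formula only, and the helper name is hypothetical.

```python
import math


def expm1_elementwise(values):
    """Element-wise exp(x) - 1, computed accurately for small x."""
    return [math.expm1(v) for v in values]


x = [0.0, 1e-10, 1.0]
y = expm1_elementwise(x)

# Naive evaluation of the same formula, for comparison.
naive = [math.exp(v) - 1.0 for v in x]
```

For x = 1e-10 the exact result is very close to x itself; the value from math.expm1 stays closer to it than the naive subtraction does.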
- fmod(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str
Add an fmod operation to the model.
This is equivalent to the C fmod function. The result has the same sign as the dividend.
- Parameters
args – A vector of input tensor ids.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor, which contains the element-wise remainder of division. The remainder has the same sign as the dividend.
- gelu(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str
Add a GELU operation to the model.
This is a Poplar extension.
- Parameters
args – A vector of input tensor ids.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- groupnormalization(self: popart_core.AiGraphcoreOpset1, args: List[str], num_groups: int, epsilon: float = 9.999999747378752e-06, debugContext: popart_internal_ir.DebugContext = '') List[str]
Add a group normalization operation to the model.
This is a Poplar extension.
The group will be created from a strided input.
- Parameters
args – A vector of input tensor ids for input data x, scale scale, and bias bias as [x, scale, bias].
num_groups – The number of groups to separate the channels into.
epsilon – The epsilon value to use to avoid division by zero.
debugContext – Optional debug information.
- Returns
A vector of output tensor ids for output data y, the mean mean and the variance var as [y, mean, var].
- identityloss(self: popart_core.AiGraphcoreOpset1, args: List[str], reduction: popart_core.ReductionType = <ReductionType.Mean: 1>, debugContext: popart_internal_ir.DebugContext = '') str
Add an identity loss operation to the model.
Calculates the loss using the identity operator.
- Parameters
args – A vector of input tensor ids.
reduction – The type of reduction to perform on the individual losses.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- incrementmod(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], increment: float, modulus: float, debugContext: popart_internal_ir.DebugContext = '') str
Add an incrementmod operation to the model.
The operation is of the form y = (x + increment) % modulus.
- Parameters
args – A vector with a single input tensor id.
increment – A scalar increment.
modulus – A scalar modulus.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
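The formula y = (x + increment) % modulus describes a value that wraps around, such as a cyclic counter. A minimal element-wise sketch in plain Python follows; the helper name is hypothetical, and Python's % semantics for negative values are assumed here rather than confirmed for the device implementation.

```python
def increment_mod(x, increment, modulus):
    """Element-wise (x + increment) % modulus on a list of numbers."""
    return [(v + increment) % modulus for v in x]


counter = [0.0, 1.0, 2.0, 3.0]
stepped = increment_mod(counter, increment=1.0, modulus=3.0)
```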
- init(*args, **kwargs)
Overloaded function.
init(self: popart_core.AiGraphcoreOpset1, shape: List[int], data_type: int, init_type: int, batch_axis: int, debugContext: popart_internal_ir.DebugContext = '') -> str
Add an init operation to the model.
- Parameters
shape – The shape of the tensor to initialise.
data_type – The data type to initialise tensor with. The value is the integer attribute taken from the DataType enum.
init_type – The mode of the tensor initialisation. The value is the integer attribute taken from the InitType enum.
batch_axis – Batch axis specifies the axis that the batches are split along and is a literal integer.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
init(self: popart_core.AiGraphcoreOpset1, shape: List[int], data_type: int, init_type: int, debugContext: popart_internal_ir.DebugContext = '') -> str
Add an init operation to the model.
- Parameters
shape – The shape of the tensor to initialise.
data_type – The data type to initialise tensor with. The value is the integer attribute taken from the DataType enum.
init_type – The mode of the tensor initialisation. The value is the integer attribute taken from the InitType enum.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- l1loss(self: popart_core.AiGraphcoreOpset1, args: List[str], lambda: float, reduction: popart_core.ReductionType = <ReductionType.Mean: 1>, debugContext: popart_internal_ir.DebugContext = '') str
Add an l1 loss operation to the model.
Calculates the mean absolute error between each element in the input and a zero target.
- Parameters
args – A vector of input tensor ids.
lambda – The scale factor of the L1 loss.
reduction – The type of reduction to perform on the individual losses.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
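As a rough illustration of the L1 loss with a zero target, the sketch below scales the absolute value of each element by lambda and then reduces the result. It is a plain-Python approximation under stated assumptions: the reduction is selected with the hypothetical strings "mean", "sum" and "none", standing in for the ReductionType values, and the helper name is not part of the API.

```python
def l1_loss(x, lambda_, reduction="mean"):
    """Scaled absolute error of each element against a zero target."""
    terms = [lambda_ * abs(v) for v in x]
    if reduction == "sum":
        return sum(terms)
    if reduction == "mean":
        return sum(terms) / len(terms)
    if reduction == "none":
        return terms  # one loss value per element
    raise ValueError("unknown reduction: " + reduction)


x = [1.0, -2.0, 3.0]
mean_loss = l1_loss(x, lambda_=0.5)
sum_loss = l1_loss(x, lambda_=0.5, reduction="sum")
per_element = l1_loss(x, lambda_=0.5, reduction="none")
```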
- log1p(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str
Add a log1p operation to the model.
This calculates the element-wise logarithm of the input tensor plus one: \(\log(x + 1)\).
- Parameters
args – A vector of input tensor ids.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
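The log1p formula is the inverse of exp(x) - 1, which the following sketch checks with Python's standard math module. This is a host-side illustration of the arithmetic only, not a PopART call.

```python
import math

x = [0.0, 1e-10, 1.0]

# Element-wise log(x + 1), accurate even when x is close to zero.
y = [math.log1p(v) for v in x]

# Applying exp(y) - 1 recovers the original values.
round_trip = [math.expm1(v) for v in y]
```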
- lstm(self: popart_core.AiGraphcoreOpset1, args: List[str], outputFullSequence: int = 1, debugContext: popart_internal_ir.DebugContext = '') List[str]
- multiconv(self: popart_core.AiGraphcoreOpset1, args: List[List[str]], dilations: List[List[int]] = [], inDilations: List[List[int]] = [], pads: List[List[int]] = [], outPads: List[List[int]] = [], strides: List[List[int]] = [], availableMemoryProportions: List[float] = [], partialsTypes: List[str] = [], planType: Optional[str] = None, perConvReservedTiles: Optional[int] = None, cycleBackOff: Optional[float] = None, enableConvDithering: List[int] = [], debugContext: popart_internal_ir.DebugContext = '') List[str]
Add a multi-convolution operation to the model.
Using this multi-convolution API ensures that the convolutions are executed in parallel on the device.
Functionally, a multi-convolution is equivalent to a series of single convolutions. Using this multi-convolution API is always equivalent to calling the single-convolution API (conv) once for each argument. For example, calling:
```
A0 = conv({X0, W0, B0})
A1 = conv({X1, W1})
```
is functionally equivalent to a single multiconv() call that takes {X0, W0, B0} and {X1, W1} as its two arguments.
It is possible that any two convolutions cannot be executed in parallel due to topological constraints. For example, if C depends on B, the two convolutions cannot be combined as follows:
```
{B, D} = multiconv({{A, W0}, {C, W1}})
```
Note that it is not possible to create such a cycle by adding a multi-convolution with this API.
Calls to multiconv() are mapped to poplar::poplin::multiconv::convolution().
All input vectors must be either empty, or equal in length to the number of convolutions. Note that groups for each convolution are automatically inferred from the shapes of the data and weight inputs.
- Parameters
args – List of tensor ids for input tensors for data, weights and biases as [data, weight, bias] for each convolution. bias is optional.
dilations – The dilations attributes for each convolution.
inDilations – The input dilations attributes for each convolution.
pads – The pads for each convolution.
outPads – The output padding for each convolution.
strides – The strides for each convolution.
availableMemoryProportions – The available memory proportions per convolution, each in the range [0, 1).
partialsTypes – The partials type per convolution.
planType – Run convolutions in parallel or series.
perConvReservedTiles – The number of tiles to reserve per convolution when planning.
cycleBackOff – Cycle back-off proportion, in the range [0, 1).
enableConvDithering – Enable convolution dithering per convolution. If true, then convolutions with different parameters will be laid out from different tiles in an effort to improve tile balance in models.
debugContext – Optional debug information.
- Returns
A vector of tensor ids of the output tensor from each convolution.
See also
Optimising Temporary Memory Usage for Convolutions and Matmuls on the IPU for some practical examples of using availableMemoryProportion.
- nllloss(self: popart_core.AiGraphcoreOpset1, args: List[str], reduction: popart_core.ReductionType = <ReductionType.Mean: 1>, ignoreIndex: Optional[int] = None, inputIsLogProbability: bool = False, debugContext: popart_internal_ir.DebugContext = '') str
Add a negative log-likelihood loss operation to the model.
Calculates the negative log likelihood (NLL) loss given a probability tensor over classes, and a target tensor containing class labels.
- Parameters
args – A vector of input tensor ids: the probability tensor and the target tensor.
reduction – The type of reduction to perform on the individual losses.
ignoreIndex – Optional class index to ignore in loss calculation.
inputIsLogProbability – If true, the input tensor contains log-probabilities, otherwise raw probabilities. Default = false.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
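The negative log-likelihood calculation can be sketched in plain Python as follows. This is an illustration under assumptions, not the library implementation: samples whose label equals the ignore index are assumed to contribute nothing, the mean is assumed to be taken over the remaining samples, and the function and argument names are hypothetical.

```python
import math


def nll_loss(probs, targets, ignore_index=None,
             input_is_log_probability=False, reduction="mean"):
    """NLL loss for a [batch, classes] list of (log-)probabilities."""
    losses = []
    for row, label in zip(probs, targets):
        if label == ignore_index:
            continue  # assumption: ignored samples contribute nothing
        value = row[label]
        log_p = value if input_is_log_probability else math.log(value)
        losses.append(-log_p)
    if reduction == "sum":
        return sum(losses)
    # Assumption: the mean runs over the samples that were not ignored.
    return sum(losses) / len(losses) if losses else 0.0


probs = [[0.5, 0.5], [0.25, 0.75]]
targets = [0, 1]

mean_loss = nll_loss(probs, targets)
masked_loss = nll_loss(probs, targets, ignore_index=1)

# The same loss computed from log-probabilities.
log_probs = [[math.log(p) for p in row] for row in probs]
from_logs = nll_loss(log_probs, targets, input_is_log_probability=True)
```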
- nop(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str
Add a no-op operation to the model.
- Parameters
args – A vector of input tensor ids.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- packedDataBlock(self: popart_core.AiGraphcoreOpset1, args: List[str], maxSequenceLengths: List[int], resultSize: int, callbackBatchSize: int, callback: popart::Builder, debugContext: popart_internal_ir.DebugContext = '') str
Add a call operation to the model.
This is a Poplar extension, to expose manual code re-use to the builder.
- Parameters
args – A vector of input tensor ids.
callee – The subgraph to call into.
debugContext – Optional debug information.
- Returns
A vector of tensors; the subgraph outputs.
- printtensor(self: popart_core.AiGraphcoreOpset1, args: List[str], print_gradient: int = 1, debugContext: popart_internal_ir.DebugContext = '', title: str = '') str
Add a print tensor operation to the model.
This is a Poplar extension.
- Parameters
args – A vector of tensor ids to print.
print_gradient – Indicates whether the gradient tensor(s) associated with the input tensor(s) are also printed. If 1, the gradient tensor(s) are also printed, otherwise the gradient tensor(s) are not printed.
debugContext – Optional debug information.
title – An optional title to print.
- Returns
The tensor id of the result tensor.
- reducemedian(self: popart_core.AiGraphcoreOpset1, args: List[str], axes: Optional[List[int]] = None, keepdims: int = 1, debugContext: popart_internal_ir.DebugContext = '') List[str]
Add a reducemedian operation to the model.
This method computes the median values along the specified axes. In the case of an even number of elements, the lower of the two medians is selected. By default, the input tensor is reduced over all axes. Additionally, the operation also returns the indices of found median values in the reduction axis. If reduction is performed over multiple axes, the indices are "flattened" over the reduced axes, similar to numpy.ndarray.flat. The index may not be the first occurrence of the median value found in the input tensor.
- Parameters
args – A vector with a single input tensor id.
axes – The axes over which the reduction is performed.
keepdims – If 1, the result tensors are of equal size as the input, but with reduction axes of size 1. Otherwise, the reduction axes are squeezed and the result tensors have fewer dimensions compared to the input. Default = 1.
debugContext – Optional debug information.
- Returns
The names of the two result tensors, one for median values and one for indices.
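The "lower of the two medians" rule can be illustrated on a flat list. The sketch below is plain Python with a hypothetical helper name; it reduces over a single axis only and returns both the median value and one index at which it occurs.

```python
def reduce_median(values):
    """Return (median, index), picking the lower median for even lengths."""
    # Indices of the elements in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    index = order[(len(values) - 1) // 2]
    return values[index], index


even_median, even_index = reduce_median([4.0, 1.0, 3.0, 2.0])
odd_median, odd_index = reduce_median([5.0, 1.0, 3.0])
```

For the four-element input the two middle values are 2.0 and 3.0, and the lower one is selected. When a value is repeated, the index returned by this sketch depends on the sort order, which is in line with the note that the index may not be the first occurrence.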
- remainder(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str
Add a remainder operation to the model.
This is equivalent to Python's modulo operator %. The result has the same sign as the divisor.
- Parameters
args – A vector of input tensor ids.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor, which contains the element-wise remainder of division. The remainder has the same sign as the divisor.
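The difference between fmod and remainder only shows up when the operands have different signs. The following comparison uses Python's math.fmod (which follows the C fmod function) and the % operator on scalar pairs.

```python
import math

# (dividend, divisor) pairs covering all sign combinations.
pairs = [(7.0, 3.0), (-7.0, 3.0), (7.0, -3.0), (-7.0, -3.0)]

# C-style fmod: the result takes the sign of the dividend.
fmod_results = [math.fmod(a, b) for a, b in pairs]

# Python-style modulo: the result takes the sign of the divisor.
remainder_results = [a % b for a, b in pairs]
```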
- replicatedallreduce(self: popart_core.AiGraphcoreOpset1, args: List[str], collectiveOperator: Optional[popart::CollectiveOperator] = None, commGroup: Optional[popart_internal_ir.CommGroup] = None, debugContext: popart_internal_ir.DebugContext = '') str
DEPRECATED: Add a replicated allreduce operation to the model.
This is a Poplar extension, to expose manual code re-use to the builder.
- Parameters
args – A vector of input tensor ids to reduce across.
commGroup – GCL CommGroup parameter.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- replicatedreducescatter(self: popart_core.AiGraphcoreOpset1, args: List[str], collectiveOperator: Optional[popart::CollectiveOperator] = None, commGroup: Optional[popart_internal_ir.CommGroup] = None, debugContext: popart_internal_ir.DebugContext = '') str
Add a replicated reduce-scatter operation to the model.
This is a Poplar extension, to expose manual code re-use to the builder.
- Parameters
args – A vector of input tensor ids to reduce across.
collectiveOperator – A Graphcore Communication Library (GCL) collective operator.
commGroup – A GCL CommGroup parameter.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- reshape(self: popart_core.AiGraphcoreOpset1, args: str, shape: List[int], debugContext: popart_internal_ir.DebugContext = '') str
Add a reshape operation to the model.
This reshapes an input tensor. This reshape takes the target shape as an attribute instead of a tensor input as for the ONNX reshape op.
- Parameters
args – The tensor id of the input tensor.
shape – The shape of the output tensor. The output tensor must contain the same number of elements as the input tensor.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- reverse(self: popart_core.AiGraphcoreOpset1, args: List[str], dimensions: List[int], debugContext: popart_internal_ir.DebugContext = '') str
Add a reverse operator to the model.
This reverses or flips the tensor along the specified dimensions.
- Parameters
args – A vector of input tensor ids.
dimensions – The dimensions along which to reverse the tensor. If this is empty then this is equivalent to the identity operator.
debugContext – Optional debug information.
- Returns
The tensor id of the reversed tensor.
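Reversal along selected dimensions can be sketched for a two-dimensional nested list as follows. This is plain Python for illustration; the helper name is hypothetical and only dimensions 0 and 1 are handled.

```python
def reverse_2d(tensor, dimensions):
    """Flip a 2D nested list along each dimension listed in dimensions."""
    out = [list(row) for row in tensor]
    if 0 in dimensions:
        out = out[::-1]                    # flip the order of the rows
    if 1 in dimensions:
        out = [row[::-1] for row in out]   # flip the elements within each row
    return out


t = [[1, 2], [3, 4]]
unchanged = reverse_2d(t, [])
rows_flipped = reverse_2d(t, [0])
cols_flipped = reverse_2d(t, [1])
both_flipped = reverse_2d(t, [0, 1])
```

An empty dimensions list leaves the data unchanged, matching the identity behaviour described for that case.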
- round(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str
Add a rounding operation to the model.
This allows Round_11 to be targeted from earlier opsets.
- Parameters
args – A vector of input tensor ids.
debugContext – Optional debug information.
- Returns
The normalized output tensor ids.
- scale(self: popart_core.AiGraphcoreOpset1, args: List[str], scale: float, debugContext: popart_internal_ir.DebugContext = '') str
Add a scale operation to the model.
This is a Poplar extension.
- Parameters
args – A vector of input tensor ids.
scale – The scale to apply.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- scaledadd(self: popart_core.AiGraphcoreOpset1, args: List[str], scale0: float = 1.0, scale1: float = 1.0, debugContext: popart_internal_ir.DebugContext = '') str
Add a scaled add operation to the model.
The scaled add operation takes the form:
```
X = scale0 * T0 + scale1 * T1
```
where scale0 is the scale factor to be applied to tensor T0 and scale1 is the scale factor to be applied to tensor T1.
- Parameters
args – A vector of input tensor ids: [T0, T1, scale0, scale1].
scale0 – The scale to apply (if no scale0 tensor is supplied).
scale1 – The scale to apply (if no scale1 tensor is supplied).
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
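The scaled add formula can be checked element by element with a small plain-Python sketch. The helper name is hypothetical, and the scales are passed as attributes rather than tensors.

```python
def scaled_add(t0, t1, scale0=1.0, scale1=1.0):
    """Element-wise scale0 * T0 + scale1 * T1 for two equally sized lists."""
    return [scale0 * a + scale1 * b for a, b in zip(t0, t1)]


t0 = [1.0, 2.0]
t1 = [10.0, 20.0]
x = scaled_add(t0, t1, scale0=2.0, scale1=0.5)
plain_sum = scaled_add(t0, t1)
```

With the default scales of 1.0 the operation reduces to an ordinary element-wise addition.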
- scatterreduce(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], axis_size: int, axis: int = -1, reduction: popart_core.ScatterReduction = <ScatterReduction.Sum: 0>, debugContext: popart_internal_ir.DebugContext = '') str
- sequenceslice(self: popart_core.AiGraphcoreOpset1, args: List[str], zeroUnused: int = 0, debugContext: popart_internal_ir.DebugContext = '') str
Slice a 2D tensor based on offsets.
The outermost dimension is sliced. For the following:
source is the source tensor.
destination is the destination tensor.
N is the number of elements to copy.
sourceOffset is the first element read from the source tensor.
destinationOffset is the first element written to in the destination tensor.
Then, for each entry in N, sourceOffset and destinationOffset:
```
destination[destinationOffset:destinationOffset+N][...] = source[sourceOffset:sourceOffset+N][...]
```
Entries after the first N==0 may be ignored. Unreferenced elements of destination are zeroed if zeroUnused is set. The same output element should not be written by multiple inputs.
source and destination must have rank greater than or equal to 2. The outer dimension is sliced; the product of the inner dimensions must match. sourceOffset, destinationOffset and N must be 1-dimensional and of the same size.
- Parameters
args – A vector of input tensor ids for the following tensors: [source, destination, N, sourceOffset, destinationOffset].
zeroUnused – Determines whether to zero unreferenced destination elements. If 1, the unreferenced elements are zeroed, otherwise they are not zeroed.
debugContext – Optional debug information.
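The copy rule for sequenceslice can be pictured with nested Python lists, where each row is one element of the outer dimension. The sketch below is a host-side illustration with hypothetical names; it stops at the first zero entry in N, which is one permitted reading of "entries after the first N==0 may be ignored".

```python
def sequence_slice(source, destination, n, source_offset,
                   destination_offset, zero_unused=False):
    """Copy runs of rows from source into a copy of destination."""
    out = [list(row) for row in destination]
    written = [False] * len(out)
    for count, src, dst in zip(n, source_offset, destination_offset):
        if count == 0:
            break  # later entries are ignored in this sketch
        for k in range(count):
            out[dst + k] = list(source[src + k])
            written[dst + k] = True
    if zero_unused:
        # Zero every destination row that no entry wrote to.
        for row_index, was_written in enumerate(written):
            if not was_written:
                out[row_index] = [0] * len(out[row_index])
    return out


source = [[1, 1], [2, 2], [3, 3], [4, 4]]
destination = [[9, 9] for _ in range(5)]

kept = sequence_slice(source, destination, [2, 1], [0, 3], [1, 4])
zeroed = sequence_slice(source, destination, [2, 1], [0, 3], [1, 4],
                        zero_unused=True)
```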
- shapeddropout(self: popart_core.AiGraphcoreOpset1, args: List[str], shape: List[int], ratio: float = 0.5, debugContext: popart_internal_ir.DebugContext = '') str
Add a shaped dropout operation to the model.
Applies a shaped dropout to the input tensor. This operator requires a shape parameter that is used to define the shape of the dropout mask so that strongly correlated features in the input tensor can be preserved. The provided shape must be broadcastable to the input tensor. Note that this operation targets the poprand library function of the same name.
- Parameters
args – A vector of input tensor ids.
shape – The shape of dropout mask. This must be broadcastable to the input.
ratio – The probability of dropping an input feature. Default = 0.5.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- slice(self: popart_core.AiGraphcoreOpset1, args: List[str], ends: List[int], starts: List[int], axes: List[int], debugContext: popart_internal_ir.DebugContext = '') str
Add a slice to the model.
This version of slice uses the starts, ends and axes attributes rather than tensor inputs. This reduces the number of ops as constant tensors are treated as ops while attributes are not.
- Parameters
args – A vector of input tensor ids.
ends – The ends attribute.
starts – The starts attribute.
axes – The axes attribute.
debugContext – Optional debug information.
- Returns
The normalized output tensor id.
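The roles of the starts, ends and axes attributes can be sketched on a two-dimensional nested list. The example assumes the usual ONNX Slice convention, in which entry i of starts and ends applies to the axis named by entry i of axes and any axis not listed is kept whole; the helper name is hypothetical and negative indices are not handled.

```python
def slice_2d(tensor, starts, ends, axes):
    """Slice a 2D nested list using per-axis start and end positions."""
    # Default to the full extent of each axis.
    bounds = [(0, len(tensor)), (0, len(tensor[0]))]
    for start, end, axis in zip(starts, ends, axes):
        bounds[axis] = (start, end)
    (row_start, row_end), (col_start, col_end) = bounds
    return [row[col_start:col_end] for row in tensor[row_start:row_end]]


t = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
columns_only = slice_2d(t, starts=[1], ends=[3], axes=[1])
block = slice_2d(t, starts=[0, 1], ends=[2, 3], axes=[0, 1])
```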
- subsample(self: popart_core.AiGraphcoreOpset1, args: List[str], strides: List[int], debugContext: popart_internal_ir.DebugContext = '') str
Add a sub-sample operation to the model.
This is a Poplar extension.
If multiple tensors are provided, the strides will be applied to them all.
- Parameters
args – A vector of tensor ids to sub-sample.
strides – The strides to use.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- swish(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], debugContext: popart_internal_ir.DebugContext = '') str
Add a swish operation to the model.
The operation computes the swish activation function, also known as the SiLU activation.
- Parameters
args – A vector with a single input tensor id.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
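The swish (SiLU) activation is commonly defined as x multiplied by the logistic sigmoid of x. A minimal element-wise sketch in plain Python follows; the helper names are hypothetical, and the sigmoid is written in a form that avoids overflow for large negative inputs.

```python
import math


def sigmoid(v):
    """Logistic sigmoid, evaluated without overflowing for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def swish(values):
    """Element-wise x * sigmoid(x)."""
    return [v * sigmoid(v) for v in values]


y = swish([0.0, 1.0, -1.0, -1000.0])
```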
- tensorremap(self: popart_core.AiGraphcoreOpset1, args: List[str], remap_type: int = 0, debugPrefix: popart_internal_ir.DebugContext = '') str
2.5.1. AiGraphcoreOpset1
- class popart.AiGraphcoreOpset1
- abort(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], debugContext: popart_internal_ir.DebugContext = '') None
Add an abort operation to the model.
The operation can be conditional or unconditional.
- Parameters
args – A vector of input tensor ids.
debugContext – Optional debug information.
- atan2(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str
Add an atan2 operation to the model.
Returns the element-wise angle \(\theta\) as a tensor. For \(-\pi < \theta \le \pi\), such that for two input tensors \(x\) and \(y\) and given \(r \ne 0\), then \(x = r \cos\theta\), and \(y = r \sin\theta\), element-wise.
In the case of \(x > 0\), \(\theta = \arctan(y/x)\).
- Parameters
args – A vector of input tensor ids: [y, x].
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
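The quadrant handling of atan2 can be seen with Python's math.atan2, which takes its arguments in the same [y, x] order. The sketch below is a host-side illustration with a hypothetical helper name.

```python
import math


def atan2_elementwise(y, x):
    """Element-wise angle of the point (x, y), in the range (-pi, pi]."""
    return [math.atan2(yi, xi) for yi, xi in zip(y, x)]


y = [1.0, 1.0, -1.0, 0.0]
x = [1.0, -1.0, -1.0, -1.0]
theta = atan2_elementwise(y, x)
```

Only the first pair has x > 0, so only there does the result coincide with arctan(y/x); the other pairs fall in quadrants that arctan alone cannot distinguish.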
- bitwiseand(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], debugContext: popart_internal_ir.DebugContext = '') str
Add a bitwise AND operation to the model.
The operation computes the bitwise AND of two integer tensors.
- Parameters
args – Two broadcastable input tensors of type integer.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- bitwisenot(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], debugContext: popart_internal_ir.DebugContext = '') str
Add a bitwise NOT operation to the model.
The operation computes the bitwise NOT of an integer tensor.
- Parameters
args – An input tensor of type integer.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- bitwiseor(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], debugContext: popart_internal_ir.DebugContext = '') str
Add a bitwise OR operation to the model.
The operation computes the bitwise OR of two integer tensors.
- Parameters
args – Two broadcastable input tensors of type integer.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- bitwisexnor(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], debugContext: popart_internal_ir.DebugContext = '') str
Add a bitwise XNOR operation to the model.
The operation computes the bitwise XNOR of two integer tensors.
- Parameters
args – Two broadcastable input tensors of type integer.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- bitwisexor(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], debugContext: popart_internal_ir.DebugContext = '') str
Add a bitwise XOR operation to the model.
The operation computes the bitwise XOR of two integer tensors.
- Parameters
args – Two broadcastable input tensors of type integer.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
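The five bitwise operations correspond to the familiar integer operators. The sketch below applies them element-wise to Python integers; Python integers are signed and unbounded, so the NOT and XNOR results are shown as negative numbers, which is assumed to match the behaviour of signed fixed-width integer tensors.

```python
a = [12, 10]  # 0b1100, 0b1010
b = [10, 6]   # 0b1010, 0b0110

bitwise_and = [x & y for x, y in zip(a, b)]
bitwise_or = [x | y for x, y in zip(a, b)]
bitwise_xor = [x ^ y for x, y in zip(a, b)]
# XNOR is the complement of XOR.
bitwise_xnor = [~(x ^ y) for x, y in zip(a, b)]
bitwise_not = [~x for x in a]
```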
- call(self: popart_core.AiGraphcoreOpset1, args: List[str], num_outputs: int, callee: popart::Builder, debugContext: popart_internal_ir.DebugContext = '') List[str]
Add a call operation to the model.
This is a Poplar extension, to expose manual code re-use to the builder.
- Parameters
args – A vector of input tensor ids.
callee – The subgraph to call into.
debugContext – Optional debug information.
- Returns
A vector of tensors; the subgraph outputs.
- copyvarupdate(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str
Copies a tensor to an initialised tensor (variable).
This is used to update an initialised tensor (a variable created using addInitializedInputTensor()) which retains its value between iterations, by setting the value to the value of another tensor (the updater). The purpose is to manually update the tensor in use cases for variables other than trained parameters (weights) or tensors used by other ops.
- Parameters
args – A vector of the input tensor ids containing the tensor to be updated, tensor, and the tensor containing the values for the update, updater, as [tensor, updater].
debugContext – Optional debug information.
- Returns
An alias to the updated variable. To ensure correct ordering of the updated variable, you should use this variable for any op which should operate on the updated variable.
- ctcbeamsearchdecoder(self: popart_core.AiGraphcoreOpset1, args: List[str], blank: int = 0, beam_width: int = 100, top_paths: int = 1, debug_context: popart_internal_ir.DebugContext = '') List[str]
Add a connectionist temporal classification (CTC) beam search decoder operation to the model.
Calculate the most likely top_paths labels and their probabilities given the input logProbs with lengths dataLengths.
- Parameters
args – A vector of input tensor ids. These are [logProbs, dataLengths], where logProbs is of shape [maxTime, batchSize, numClasses], and dataLengths is of shape [batchSize].
blank – The integer representing the blank class.
beam_width – The number of beams to use when decoding.
top_paths – The number of most likely decoded paths to return, must be less than or equal to beam_width.
debug_context – Optional debug information.
- Returns
The names of the result tensors. These are [labelProbs, labelLengths, decodedLabels], where labelProbs is of shape [batchSize, top_paths], labelLengths is of shape [batchSize, top_paths], and decodedLabels is of shape [batchSize, top_paths, maxTime].
- ctcloss(self: popart_core.AiGraphcoreOpset1, args: List[str], reduction: popart_core.ReductionType = <ReductionType.Mean: 1>, blank: int = 0, outDataType: str = 'UNDEFINED', zeroInfinity: bool = False, debugContext: popart_internal_ir.DebugContext = '') str
Add a connectionist temporal classification (CTC) loss operation to the model.
With maximum input length T, batch size N, number of classes C and maximum target length S, this op calculates the CTC loss for a logarithmised probabilities tensor with shape [T, N, C], a class target tensor with shape [N, S], an input lengths tensor [N] and a target lengths tensor [N].
Note that C includes a blank class (default=0). The probabilities tensor is padded as required. Target sequences are also padded and are populated with values less than or equal to C, not including the blank class, up to their respective target lengths. Note that target lengths cannot exceed input lengths.
- Parameters
args – A vector of input tensor ids [log_probs, targets, input_lengths, target_lengths].
reduction – The type of reduction to perform on the individual losses.
blank – The integer representing the blank class.
outDataType – The data type of the output tensors. Default = UNDEFINED.
zeroInfinity – If true, infinite losses and the associated gradients are zeroed-out. Default = false.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- depthtospace(self: popart_core.AiGraphcoreOpset1, args: List[str], blocksize: int, mode: str, debugContext: popart_internal_ir.DebugContext = '') str
Add a depth-to-space operation to the model.
This allows DepthToSpace_11 to be targeted from earlier opsets.
The purpose of a depth-to-space operation, also known as pixel shuffling, is to rearrange data from the depth (channels) dimension into the spatial (width and height) dimensions. It is an efficient means of learning upsampling alongside mixing convolution with bilinear interpolation and using transpose convolution.
- Parameters
args – A vector containing a single tensor id of the input tensor of shape [N, C, H, W], where N is the batch axis, C is the channel or depth, H is the height and W is the width.
blocksize – The size of the blocks to be moved. If the input is [N, C, H, W] and the blocksize is B, the output will be [N, C/(B*B), H*B, W*B].
mode – Specifies how the data is rearranged: "DCR" (default) for depth-column-row order, or "CRD" for column-row-depth order.
debugContext – Optional debug information.
- Returns
A tensor which is a rearrangement of the input tensor.
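The rearrangement performed by a depth-to-space operation can be written out with explicit index arithmetic. The sketch below is plain Python on nested lists and follows the ONNX DepthToSpace definition that this operation targets; the helper name is hypothetical and no attempt is made at efficiency.

```python
def depth_to_space(x, blocksize, mode="DCR"):
    """Rearrange a nested list of shape [N, C, H, W] to [N, C/(B*B), H*B, W*B]."""
    b = blocksize
    n_batch, channels = len(x), len(x[0])
    height, width = len(x[0][0]), len(x[0][0][0])
    if channels % (b * b) != 0:
        raise ValueError("C must be divisible by blocksize * blocksize")
    out_channels = channels // (b * b)
    out = [[[[None] * (width * b) for _ in range(height * b)]
            for _ in range(out_channels)] for _ in range(n_batch)]
    for n in range(n_batch):
        for c in range(out_channels):
            for h in range(height):
                for w in range(width):
                    for i in range(b):      # row offset inside the block
                        for j in range(b):  # column offset inside the block
                            if mode == "DCR":
                                src = (i * b + j) * out_channels + c
                            else:  # "CRD"
                                src = c * b * b + i * b + j
                            out[n][c][h * b + i][w * b + j] = x[n][src][h][w]
    return out


# Shape [1, 8, 1, 1]: channel k holds the single value k.
x = [[[[k]] for k in range(8)]]
dcr = depth_to_space(x, blocksize=2, mode="DCR")
crd = depth_to_space(x, blocksize=2, mode="CRD")
```

With eight input channels and a block size of 2, both modes produce two output channels of size 2 x 2; they differ only in which input channels feed each output position.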
- detach(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str
Add a detach operation to the model.
- Parameters
args – A vector of input tensor ids.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- dynamicadd(self: popart_core.AiGraphcoreOpset1, args: List[str], axes: List[int], sizes: List[int], debugContext: popart_internal_ir.DebugContext = '') str
Add a dynamic add operation to the model.
Creates a copy of a tensor, tensor, with a slice tensor, slice, added at an offset position, offset.
- Parameters
args – A vector of input tensor ids: [tensor, offset, slice].
axes – The axes along which to add the slice.
sizes – The size of the slice along each axis.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- dynamicslice(self: popart_core.AiGraphcoreOpset1, args: List[str], axes: List[int], sizes: List[int], noOverlap: int = 0, debugContext: popart_internal_ir.DebugContext = '') str
- Parameters
args – A vector of input tensor ids: [tensor, offset].
axes – The axes along which to slice.
sizes – The size of the slice along each axis.
noOverlap – Indicates whether the slice regions overlap or not. If 1, slice regions do not overlap, otherwise they do overlap.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- dynamicupdate(self: popart_core.AiGraphcoreOpset1, args: List[str], axes: List[int], sizes: List[int], noOverlap: int = 0, debugContext: popart_internal_ir.DebugContext = '') str
- Parameters
args – A vector of input tensor ids: [tensor, offset, slice].
axes – The axes along which to update.
sizes – The size of the slice along each axis.
noOverlap – Indicates whether the updates overlap or not. If 1, the updates do not overlap, otherwise they do overlap.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- dynamiczero(self: popart_core.AiGraphcoreOpset1, args: List[str], axes: List[int], sizes: List[int], debugContext: popart_internal_ir.DebugContext = '') str
Add a dynamic zero operation to the model.
Creates a copy of a tensor, tensor, with the slice at offset position, offset, set to zero. For example:
```
out = tensor
out[offset] = 0.0
```
- Parameters
args – A vector of input tensor ids: [tensor, offset].
axes – The axes along which to zero elements.
sizes – The size of the slice along each axis.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- expm1(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str
Add an expm1 operation to the model.
This calculates the element-wise exponential of the input tensor and subtracts one: exp(x) - 1.
- Parameters
args – A vector of input tensor ids.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
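A dedicated exp(x) - 1 function is usually provided because the naive expression loses precision for inputs close to zero. The sketch below shows this on the host with NumPy's expm1, which computes the same mathematical function; it makes no claim about the numerical behaviour of the IPU implementation.

```python
import numpy as np

x = np.array([1e-10, 1e-5, 1.0])

# Naive form: exp(x) is rounded to a value near 1 before the subtraction,
# so most significant digits are lost when x is tiny.
naive = np.exp(x) - 1.0

# Dedicated form: stays accurate for tiny x.
stable = np.expm1(x)
```

For larger inputs, such as 1.0, the two forms agree closely; the difference only matters near zero.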
- fmod(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str
Add an fmod operation to the model.
This is equivalent to the C fmod function. The result has the same sign as the dividend.
- Parameters
args – A vector of input tensor ids.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor, which holds the element-wise remainder of division. The remainder has the same sign as the dividend.
- gelu(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str
Add a GELU operation to the model.
This is a Poplar extension.
- Parameters
args – A vector of input tensor ids.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- groupnormalization(self: popart_core.AiGraphcoreOpset1, args: List[str], num_groups: int, epsilon: float = 9.999999747378752e-06, debugContext: popart_internal_ir.DebugContext = '') List[str]
Add a group normalization operation to the model.
This is a Poplar extension.
The group will be created from a strided input.
- Parameters
args – A vector of input tensor ids for input data x, scale scale, and bias bias as [x, scale, bias].
num_groups – The number of groups to separate the channels into.
epsilon – The epsilon value to use to avoid division by zero.
debugContext – Optional debug information.
- Returns
A vector of output tensor ids for output data y, the mean mean and the variance var as [y, mean, var].
- identityloss(self: popart_core.AiGraphcoreOpset1, args: List[str], reduction: popart_core.ReductionType = <ReductionType.Mean: 1>, debugContext: popart_internal_ir.DebugContext = '') str
Add an identity loss operation to the model.
Calculates the loss using the identity operator.
- Parameters
args – A vector of input tensor ids.
reduction – The type of reduction to perform on the individual losses.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- incrementmod(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], increment: float, modulus: float, debugContext: popart_internal_ir.DebugContext = '') str
Add an incrementmod operation to the model.
The operation is of the form y = (x + increment) % modulus.
- Parameters
args – A vector with a single input tensor id.
increment – A scalar increment.
modulus – A scalar modulus.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
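The formula for incrementmod can be checked on host data with a small NumPy sketch. The helper name increment_mod is hypothetical, and the sketch assumes that % follows NumPy's (and Python's) sign convention, which the description above does not specify.

```python
import numpy as np


def increment_mod(x, increment, modulus):
    """Host-side sketch of y = (x + increment) % modulus (hypothetical helper)."""
    return np.mod(np.asarray(x, dtype=float) + increment, modulus)


# A counter that wraps around after reaching the modulus.
steps = increment_mod(np.array([0.0, 1.0, 2.0, 3.0]), increment=1.0, modulus=4.0)
```

A typical use of such an operation is a wrapping index or step counter.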
- init(*args, **kwargs)
Overloaded function.
init(self: popart_core.AiGraphcoreOpset1, shape: List[int], data_type: int, init_type: int, batch_axis: int, debugContext: popart_internal_ir.DebugContext = '') -> str
Add an init operation to the model.
- Parameters
shape – The shape of the tensor to initialise.
data_type – The data type to initialise tensor with. The value is the integer attribute taken from the DataType enum.
init_type – The mode of the tensor initialisation. The value is the integer attribute taken from the InitType enum.
batch_axis – Batch axis specifies the axis that the batches are split along and is a literal integer.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
init(self: popart_core.AiGraphcoreOpset1, shape: List[int], data_type: int, init_type: int, debugContext: popart_internal_ir.DebugContext = '') -> str
Add an init operation to the model.
- Parameters
shape – The shape of the tensor to initialise.
data_type – The data type to initialise tensor with. The value is the integer attribute taken from the DataType enum.
init_type – The mode of the tensor initialisation. The value is the integer attribute taken from the InitType enum.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- l1loss(self: popart_core.AiGraphcoreOpset1, args: List[str], lambda: float, reduction: popart_core.ReductionType = <ReductionType.Mean: 1>, debugContext: popart_internal_ir.DebugContext = '') str
Add an l1 loss operation to the model.
Calculates the mean absolute error between each element of the input and a zero target.
- Parameters
args – A vector of input tensor ids.
lambda – The scale factor of the L1 loss.
reduction – The type of reduction to perform on the individual losses.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
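As a rough host-side sketch of the description above, the L1 loss against a zero target reduces to the scaled absolute value of the input. The helper name l1_loss, the string values used for the reduction, and the assumption that lambda multiplies each element before the reduction are all illustrative choices, not taken from the PopART implementation.

```python
import numpy as np


def l1_loss(x, lam, reduction="mean"):
    """Sketch of a scaled L1 loss against a zero target (hypothetical helper)."""
    # Assumption: the scale factor is applied per element, before reducing.
    per_element = lam * np.abs(np.asarray(x, dtype=float))
    if reduction == "mean":
        return per_element.mean()
    if reduction == "sum":
        return per_element.sum()
    if reduction == "none":
        return per_element
    raise ValueError(f"unknown reduction: {reduction}")


mean_loss = l1_loss([-1.0, 2.0, -3.0], lam=0.5)
sum_loss = l1_loss([-1.0, 2.0, -3.0], lam=0.5, reduction="sum")
```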
- log1p(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str
Add a log1p operation to the model.
This calculates the element-wise logarithm of the input tensor plus one: log(x + 1).
- Parameters
args – A vector of input tensor ids.
name – Optional identifier for operation.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- lstm(self: popart_core.AiGraphcoreOpset1, args: List[str], outputFullSequence: int = 1, debugContext: popart_internal_ir.DebugContext = '') List[str]
- multiconv(self: popart_core.AiGraphcoreOpset1, args: List[List[str]], dilations: List[List[int]] = [], inDilations: List[List[int]] = [], pads: List[List[int]] = [], outPads: List[List[int]] = [], strides: List[List[int]] = [], availableMemoryProportions: List[float] = [], partialsTypes: List[str] = [], planType: Optional[str] = None, perConvReservedTiles: Optional[int] = None, cycleBackOff: Optional[float] = None, enableConvDithering: List[int] = [], debugContext: popart_internal_ir.DebugContext = '') List[str]
Add a multi-convolution operation to the model.
Using this multi-convolution API ensures that the convolutions are executed in parallel on the device.
Functionally, a multi-convolution is equivalent to a series of single convolutions. Using this multi-convolution API is always equivalent to calling the single-convolution API (conv) once for each argument.
```
A0 = conv({X0, W0, B0})
A1 = conv({X1, W1})
```
It is possible that any two convolutions cannot be executed in parallel due to topological constraints. For example, the following:
```
{B, D} = multiconv({{A, W0}, {C, W1}}).
```
Note that it is not possible to create such a cycle by adding a multi-convolution with this API.
Calls to multiconv() are mapped to poplar::poplin::multiconv::convolution().
All input vectors must be either empty, or equal in length to the number of convolutions. Note that groups for each convolution are automatically inferred from the shapes of the data and weight inputs.
- Parameters
args – List of tensor ids for input tensors for data, weights and biases as [data, weight, bias] for each convolution. bias is optional.
dilations – The dilations attributes for each convolution.
inDilations – The input dilations attributes for each convolution.
pads – The pads for each convolution.
outPads – The output padding for each convolution.
strides – The strides for each convolution.
availableMemoryProportions – The available memory proportions per convolution, each in the range [0, 1).
partialsTypes – The partials type per convolution.
planType – Run convolutions in parallel or series.
perConvReservedTiles – The number of tiles to reserve per convolution when planning.
cycleBackOff – Cycle back-off proportion, in the range [0, 1).
enableConvDithering – Enable convolution dithering per convolution. If true, then convolutions with different parameters will be laid out from different tiles in an effort to improve tile balance in models.
debugContext – Optional debug information.
- Returns
A vector of tensor ids of the output tensor from each convolution.
See also
Optimising Temporary Memory Usage for Convolutions and Matmuls on the IPU for some practical examples of using availableMemoryProportion.
- nllloss(self: popart_core.AiGraphcoreOpset1, args: List[str], reduction: popart_core.ReductionType = <ReductionType.Mean: 1>, ignoreIndex: Optional[int] = None, inputIsLogProbability: bool = False, debugContext: popart_internal_ir.DebugContext = '') str
Add a negative log-likelihood loss operation to the model.
Calculates the negative log likelihood (NLL) loss given a probability tensor over classes, and a target tensor containing class labels.
- Parameters
args – A vector of input tensor ids: the probability tensor and the target tensor.
reduction – The type of reduction to perform on the individual losses.
ignoreIndex – Optional class index to ignore in loss calculation.
inputIsLogProbability – If true the input tensor contains log-probabilities, otherwise raw probabilities. Default = false.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
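The loss described above can be sketched on host arrays as follows. This is a minimal illustration under stated assumptions, not the PopART implementation: the helper name nll_loss and its string reduction values are hypothetical, probabilities are assumed to have shape [samples, classes], and the mean is assumed to be taken over the samples that are not ignored.

```python
import numpy as np


def nll_loss(probs, labels, ignore_index=None,
             input_is_log_probability=False, reduction="mean"):
    """Sketch of a negative log-likelihood loss (hypothetical helper)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    log_probs = probs if input_is_log_probability else np.log(probs)

    keep = np.ones(labels.shape, dtype=bool)
    if ignore_index is not None:
        keep = labels != ignore_index
    # Ignored labels may be out of range, so index with a safe placeholder.
    safe_labels = np.where(keep, labels, 0)
    picked = log_probs[np.arange(labels.shape[0]), safe_labels]
    losses = np.where(keep, -picked, 0.0)

    if reduction == "none":
        return losses
    if reduction == "sum":
        return losses.sum()
    # Assumption: the mean is over the samples that were not ignored.
    return losses.sum() / max(int(keep.sum()), 1)


probs = [[0.5, 0.5], [0.25, 0.75]]
labels = [0, 1]
total = nll_loss(probs, labels, reduction="sum")
without_class_1 = nll_loss(probs, labels, ignore_index=1, reduction="sum")
```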
- nop(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str
Add a no-op operation to the model.
- Parameters
args – A vector of input tensor ids.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- packedDataBlock(self: popart_core.AiGraphcoreOpset1, args: List[str], maxSequenceLengths: List[int], resultSize: int, callbackBatchSize: int, callback: popart::Builder, debugContext: popart_internal_ir.DebugContext = '') str
Add a call operation to the model.
This is a Poplar extension, to expose manual code re-use to the builder.
- Parameters
args – A vector of input tensor ids.
callee – The subgraph to call into.
debugContext – Optional debug information.
- Returns
A vector of tensors; the subgraph outputs.
- printtensor(self: popart_core.AiGraphcoreOpset1, args: List[str], print_gradient: int = 1, debugContext: popart_internal_ir.DebugContext = '', title: str = '') str
Add a print tensor operation to the model.
This is a Poplar extension.
- Parameters
args – A vector of tensor ids to print.
print_gradient – Indicates whether the gradient tensor(s) associated with the input tensor(s) are also printed. If 1, the gradient tensor(s) are also printed, otherwise the gradient tensor(s) are not printed.
debugContext – Optional debug information.
title – An optional title to print.
- Returns
The tensor id of the result tensor.
- reducemedian(self: popart_core.AiGraphcoreOpset1, args: List[str], axes: Optional[List[int]] = None, keepdims: int = 1, debugContext: popart_internal_ir.DebugContext = '') List[str]
Add a reducemedian operation to the model.
This method computes the median values along the specified axes. In the case of an even number of elements, the lower of the two medians is selected. By default, the input tensor is reduced over all axes. Additionally, the operation also returns the indices of found median values in the reduction axis. If reduction is performed over multiple axes, the indices are “flattened” over the reduced axes, similar to numpy.ndarray.flat. The index may not be the first occurrence of the median value found in the input tensor.
- Parameters
args – A vector with a single input tensor id.
axes – The axes over which the reduction is performed.
keepdims – If 1, the result tensors are of equal size as the input, but with reduction axes of size 1. Otherwise, the reduction axes are squeezed and the result tensors have fewer dimensions compared to the input. Default = 1.
debugContext – Optional debug information.
- Returns
The names of the two result tensors, one for median values and one for indices.
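The "lower median plus index" behaviour can be illustrated with NumPy for the single-axis case. The helper name reduce_median is hypothetical, and the sketch does not cover reduction over several axes at once or the flattened indices described for that case.

```python
import numpy as np


def reduce_median(x, axis, keepdims=True):
    """Sketch of a lower-median reduction over one axis (hypothetical helper)."""
    x = np.asarray(x)
    order = np.argsort(x, axis=axis, kind="stable")
    # For an even number of elements, pick the lower of the two middle values.
    k = (x.shape[axis] - 1) // 2
    indices = np.take(order, [k], axis=axis)
    values = np.take_along_axis(x, indices, axis=axis)
    if not keepdims:
        values = np.squeeze(values, axis=axis)
        indices = np.squeeze(indices, axis=axis)
    return values, indices


data = np.array([[1, 3, 2, 4], [9, 7, 8, 6]])
values, indices = reduce_median(data, axis=1)
```

Here each row has four elements, so the medians are the second-smallest values (2 and 7), returned together with their positions in the original rows.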
- remainder(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str
Add a remainder operation to the model.
This is equivalent to Python’s modulo operator %. The result has the same sign as the divisor.
- Parameters
args – A vector of input tensor ids.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor, which holds the element-wise remainder of division. The remainder has the same sign as the divisor.
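The difference between the fmod and remainder sign conventions is easiest to see with mixed-sign operands. The sketch below uses NumPy's fmod (C-style) and remainder (Python-style) on the host as stand-ins for the two conventions; it does not call PopART.

```python
import numpy as np

dividend = np.array([5.0, -5.0, 5.0, -5.0])
divisor = np.array([3.0, 3.0, -3.0, -3.0])

# C fmod convention: the result takes the sign of the dividend.
c_style = np.fmod(dividend, divisor)

# Python % convention: the result takes the sign of the divisor.
python_style = np.remainder(dividend, divisor)
```

The two conventions agree whenever the dividend and divisor have the same sign and differ otherwise.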
- replicatedallreduce(self: popart_core.AiGraphcoreOpset1, args: List[str], collectiveOperator: Optional[popart::CollectiveOperator] = None, commGroup: Optional[popart_internal_ir.CommGroup] = None, debugContext: popart_internal_ir.DebugContext = '') str
DEPRECATED: Add a replicated allreduce operation to the model.
This is a Poplar extension, to expose manual code re-use to the builder.
- Parameters
args – A vector of input tensor ids to reduce across.
commGroup – GCL CommGroup parameter.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- replicatedreducescatter(self: popart_core.AiGraphcoreOpset1, args: List[str], collectiveOperator: Optional[popart::CollectiveOperator] = None, commGroup: Optional[popart_internal_ir.CommGroup] = None, debugContext: popart_internal_ir.DebugContext = '') str
Add a replicated reduce-scatter operation to the model.
This is a Poplar extension, to expose manual code re-use to the builder.
- Parameters
args – A vector of input tensor ids to reduce across.
collectiveOperator – A Graphcore Communication Library (GCL) collective operator.
commGroup – A GCL CommGroup parameter.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- reshape(self: popart_core.AiGraphcoreOpset1, args: str, shape: List[int], debugContext: popart_internal_ir.DebugContext = '') str
Add a reshape operation to the model.
This reshapes an input tensor. This reshape takes the target shape as an attribute instead of a tensor input as for the ONNX reshape op.
- Parameters
args – The tensor id of the input tensor.
shape – The shape of the output tensor. The output tensor must contain the same number of elements as the input tensor.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- reverse(self: popart_core.AiGraphcoreOpset1, args: List[str], dimensions: List[int], debugContext: popart_internal_ir.DebugContext = '') str
Add a reverse operator to the model.
This reverses or flips the tensor along the specified dimensions.
- Parameters
args – A vector of input tensor ids.
dimensions – The dimensions along which to reverse the tensor. If this is empty then this is equivalent to the identity operator.
debugContext – Optional debug information.
- Returns
The tensor id of the reversed tensor.
- round(self: popart_core.AiGraphcoreOpset1, args: List[str], debugContext: popart_internal_ir.DebugContext = '') str
Add a rounding operation to the model.
This allows Round_11 to be targeted from earlier opsets.
- Parameters
args – A vector of input tensor ids.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- scale(self: popart_core.AiGraphcoreOpset1, args: List[str], scale: float, debugContext: popart_internal_ir.DebugContext = '') str
Add a scale operation to the model.
This is a Poplar extension.
- Parameters
args – A vector of input tensor ids.
scale – The scale to apply.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- scaledadd(self: popart_core.AiGraphcoreOpset1, args: List[str], scale0: float = 1.0, scale1: float = 1.0, debugContext: popart_internal_ir.DebugContext = '') str
Add a scaled add operation to the model.
The scaled add operation takes the form:
```
X = scale0 * T0 + scale1 * T1
```
where scale0 is the scale factor to be applied to tensor T0 and scale1 is the scale factor to be applied to tensor T1.
- Parameters
args – A vector of input tensor ids: [T0, T1, scale0, scale1].
scale0 – The scale to apply (if no scale0 tensor is supplied).
scale1 – The scale to apply (if no scale1 tensor is supplied).
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
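The scaled add formula can be reproduced on host arrays with a short NumPy sketch. The helper name scaled_add is hypothetical, and the sketch only models the case where the scales are passed as attributes rather than as tensors.

```python
import numpy as np


def scaled_add(t0, t1, scale0=1.0, scale1=1.0):
    """Sketch of X = scale0 * T0 + scale1 * T1 (hypothetical helper)."""
    return scale0 * np.asarray(t0, dtype=float) + scale1 * np.asarray(t1, dtype=float)


x = scaled_add([1.0, 2.0], [10.0, 20.0], scale0=2.0, scale1=0.5)
```

With both scales left at their default of 1.0, this reduces to an ordinary element-wise add.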
- scatterreduce(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], axis_size: int, axis: int = -1, reduction: popart_core.ScatterReduction = <ScatterReduction.Sum: 0>, debugContext: popart_internal_ir.DebugContext = '') str
- sequenceslice(self: popart_core.AiGraphcoreOpset1, args: List[str], zeroUnused: int = 0, debugContext: popart_internal_ir.DebugContext = '') str
Slice a 2D tensor based on offsets.
The outermost dimension is sliced. For the following:
source is the source tensor.
destination is the destination tensor.
N is the number of elements to copy.
sourceOffset is the first element read from the source tensor.
destinationOffset is the first element written to in the destination tensor.
Then, for each entry in N, sourceOffset and destinationOffset:
```
destination[destinationOffset:destinationOffset+N][…] = source[sourceOffset:sourceOffset+N][…]
```
Entries after the first N==0 may be ignored. Unreferenced elements of destination are zeroed if zeroUnused is set. The same output element should not be written by multiple inputs.
source and destination must have rank greater than or equal to 2. The outer dimension is sliced; the product of the inner dimensions must match. sourceOffset, destinationOffset and N must be 1-dimensional and of the same size.
- Parameters
args – A vector of input tensor ids for the following tensors: [source, destination, N, sourceOffset, destinationOffset].
zeroUnused – Determines whether to zero unreferenced destination elements. If 1, the unreferenced elements are zeroed, otherwise they are not zeroed.
debugContext – Optional debug information.
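The copy rule for sequenceslice can be sketched with NumPy on host arrays. This is an illustration under stated assumptions rather than PopART code: the helper name sequence_slice is hypothetical, source and destination are assumed to have identical inner dimensions, and the sketch stops at the first zero entry in N.

```python
import numpy as np


def sequence_slice(source, destination, n, source_offset,
                   destination_offset, zero_unused=False):
    """Sketch of slicing the outer dimension by offsets (hypothetical helper)."""
    source = np.asarray(source)
    destination = np.asarray(destination)
    # With zero_unused, rows that are never written end up as zero.
    out = np.zeros_like(destination) if zero_unused else destination.copy()
    for count, src, dst in zip(n, source_offset, destination_offset):
        if count == 0:
            break  # entries after the first N == 0 may be ignored
        out[dst:dst + count] = source[src:src + count]
    return out


source = np.arange(1, 11).reshape(5, 2)  # rows [1, 2], [3, 4], ..., [9, 10]
destination = np.full((4, 2), -1)
result = sequence_slice(source, destination, n=[2, 1],
                        source_offset=[0, 4], destination_offset=[1, 3],
                        zero_unused=True)
```

In this example, rows 0 and 1 of source land in rows 1 and 2 of the result, row 4 lands in row 3, and the untouched row 0 is zeroed.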
- shapeddropout(self: popart_core.AiGraphcoreOpset1, args: List[str], shape: List[int], ratio: float = 0.5, debugContext: popart_internal_ir.DebugContext = '') str
Add a shaped dropout operation to the model.
Applies a shaped dropout to the input tensor. This operator requires a shape parameter that is used to define the shape of the dropout mask so that strongly correlated features in the input tensor can be preserved. The provided shape must be broadcastable to the input tensor. Note that this operation targets the poprand library function of the same name.
- Parameters
args – A vector of input tensor ids.
shape – The shape of dropout mask. This must be broadcastable to the input.
ratio – The probability of dropping an input feature. Default = 0.5.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- slice(self: popart_core.AiGraphcoreOpset1, args: List[str], ends: List[int], starts: List[int], axes: List[int], debugContext: popart_internal_ir.DebugContext = '') str
Add a slice to the model.
This version of slice uses the starts, ends and axes attributes rather than tensor inputs. This reduces the number of ops as constant tensors are treated as ops while attributes are not.
- Parameters
args – A vector of input tensor ids.
ends – The ends attribute.
starts – The starts attribute.
axes – The axes attribute.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- subsample(self: popart_core.AiGraphcoreOpset1, args: List[str], strides: List[int], debugContext: popart_internal_ir.DebugContext = '') str
Add a sub-sample operation to the model.
This is a Poplar extension.
If multiple tensors are provided, the strides will be applied to them all.
- Parameters
args – A vector of tensor ids to sub-sample.
strides – The strides to use.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- swish(self: popart_core.AiGraphcoreOpset1, args: List[str] = [], debugContext: popart_internal_ir.DebugContext = '') str
Add a swish operation to the model.
The operation computes the swish activation function, also known as the SiLU activation.
- Parameters
args – A vector with a single input tensor id.
debugContext – Optional debug information.
- Returns
The tensor id of the result tensor.
- tensorremap(self: popart_core.AiGraphcoreOpset1, args: List[str], remap_type: int = 0, debugPrefix: popart_internal_ir.DebugContext = '') str
2.6. Data flow
- class popart.AnchorReturnTypeId
Members:
Final : Only return the tensor value for the last micro batch of the Session::run call for each replica. The buffer shape required for this anchor in IStepIO is [replicationFactor, <anchorTensorShape>] (with dimensions of size 1 removed).
EveryN : Return the tensor value for every N-th global batch for each replica and for all accumulation steps in that global batch. Note that the value of N is captured by AnchorReturnType. The buffer shape required for this anchor in IStepIO is [batchesPerStep / N, accumulationFactor, replicationFactor, <anchorTensorShape>] (with dimensions of size 1 removed).
All : Return the tensor value for all micro batches for each replica. The buffer shape required for this anchor in IStepIO is [batchesPerStep, accumulationFactor, replicationFactor, <anchorTensorShape>] (with dimensions of size 1 removed).
Sum : Return one tensor value for each replica, doing a sum reduction over the batchesPerStep and accumulationFactor dimensions. The buffer shape required for this anchor in IStepIO is [replicationFactor, <anchorTensorShape>] (with dimensions of size 1 removed).
- property name
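The buffer shapes listed for each AnchorReturnTypeId member can be computed with a small pure-Python helper when sizing host buffers. The function name anchor_buffer_shape and its argument names are hypothetical, and the sketch assumes that "dimensions of size 1 removed" applies to the leading batch, accumulation and replication dimensions only, not to the anchor tensor's own shape.

```python
def anchor_buffer_shape(return_type, anchor_shape, batches_per_step,
                        accumulation_factor=1, replication_factor=1, every_n=1):
    """Sketch of the IStepIO buffer shape for an anchor (hypothetical helper)."""
    if return_type in ("Final", "Sum"):
        leading = [replication_factor]
    elif return_type == "EveryN":
        leading = [batches_per_step // every_n, accumulation_factor,
                   replication_factor]
    elif return_type == "All":
        leading = [batches_per_step, accumulation_factor, replication_factor]
    else:
        raise ValueError(f"unknown anchor return type: {return_type}")
    # Assumption: only the leading dimensions of size 1 are dropped.
    return [dim for dim in leading if dim != 1] + list(anchor_shape)


all_shape = anchor_buffer_shape("All", [3], batches_per_step=4,
                                replication_factor=2)
final_shape = anchor_buffer_shape("Final", [3], batches_per_step=4)
```

With four batches per step, no gradient accumulation and two replicas, an anchor of shape [3] needs a buffer of shape [4, 2, 3] for All, while Final on a single replica needs only [3].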
- class popart.ExchangeStrategy
Enum type to specify an exchange strategy.
JustInTime: [Diagram: an outer loop containing an inner loop that runs load, compute and store in sequence.]
OverlapInnerLoop: [Diagram: load, compute and store are staggered across inner loop iterations so that they overlap. Boxes denote subgraphs, subgraph ops or loops. Inputs and outputs are loop carried in order.]
OverlapLoops: [Diagram: load, compute and store overlap both inside the inner loop and across the boundaries of the inner and outer loops. Boxes denote subgraphs, subgraph ops or loops. Numbers on boxes are matching subgraph or loop inputs and outputs. Overlap indicators mark compute and load/store pairs that overlap in time.]
OverlapStep: Not supported yet.
Members:
JustInTime : Copy tensor when required
OverlapInnerLoop : Preload values in previous inner loop iteration for the next iteration
OverlapLoops : Preload values in the previous loop iteration for the next iteration (implies OverlapInnerLoop)
OverlapStep : Preload values in the previous host training step for next step (implies OverlapLoops) - not supported yet
- property name
- class popart.AnchorReturnType
- exchangeStrategy(self: popart_core.AnchorReturnType) popart_core.ExchangeStrategy
- rp(self: popart_core.AnchorReturnType) int
- tileSet(self: popart_core.AnchorReturnType) popart_core.TileSet
- class popart.DataFlow
- anchors(self: popart_core.DataFlow) List[str]
- art(self: popart_core.DataFlow, arg0: str) popart_core.AnchorReturnType
- batchesPerStep(self: popart_core.DataFlow) int
- isAnchored(self: popart_core.DataFlow, arg0: str) bool
- nAnchors(self: popart_core.DataFlow) int
- setBatchesPerStep(self: popart_core.DataFlow, arg0: int) None
- class popart.InputSettings
- exchangeStrategy(self: popart_core.InputSettings) popart_core.ExchangeStrategy
- replicatedStreamMode(self: popart_core.InputSettings) popart_core.ReplicatedStreamMode
- tileSet(self: popart_core.InputSettings) popart_core.TileSet
2.7. Device manager
- class popart.DeviceType
Members:
IpuModel : Use the Poplar IPU Model for graph compilation and execution. The IPU Model will simulate the behaviour of the IPU hardware. It will not completely implement every aspect of a real IPU. (Default).
Cpu : Use CPU for graph compilation and execution.
Ipu : Use IPU for graph execution.
OfflineIpu : Compile graph for later execution. This can be done even if IPUs are not present. Offline graph compilation is also useful for verifying memory constraints.
Sim : [For Graphcore internal use only] Use a simulator for graph compilation and execution.
- property name
- class popart.DeviceConnectionType
Members:
Always : Attach to the IPU from the start (Default).
OnDemand : Wait until the compilation is complete and the executable is ready to be run before attaching to the IPU.
Never : Never try to attach to an IPU. This is useful for offline compilation (DeviceType::OfflineIpu). Trying to run an executable will throw an error.
- property name
- class popart.SyncPattern
Controls synchronisation in multi-IPU systems.
Members:
Full : Require all IPUs to synchronise on every communication between IPUs or between IPUs and host (Default).
SinglePipeline : Allow IPUs to synchronise with the host independently, without having to synchronise with each other. This permits any one IPU to perform host IO while other IPUs are processing data.
ReplicaAndLadder : Allow an IPU group to communicate with the host without requiring synchronisation between groups. This permits multiple IPU groups to alternate between performing host IO and computation.
- property name
- class popart.DeviceInfo
- attach(self: popart_core.DeviceInfo) bool
- detach(self: popart_core.DeviceInfo) None
- tryAttachUntilTimeout(self: popart_core.DeviceInfo) bool
- class popart.DeviceManager
- acquireAvailableDevice(self: popart_core.DeviceManager, numIpus: int = 1, tilesPerIpu: int = 0, pattern: popart_core.SyncPattern = <SyncPattern.Full: 0>, connectionType: popart_core.DeviceConnectionType = <DeviceConnectionType.Always: 0>, selectionCriterion: popart_core.DeviceSelectionCriterion = <DeviceSelectionCriterion.First: 0>) popart::DeviceInfo
Finds an available hardware device with the specified number of IPUs. This method will attach to the device if connectionType is equal to DeviceConnectionType::Always. Throws an error if there are fewer than numIpus IPUs available.
- Parameters
numIpus – The number of IPUs on the device. (Default: 1).
tilesPerIpu – The number of tiles per IPU. An input of 0 will match any number. (Default: 0).
pattern – The setting for when to synchronise in a multi-IPU system. (Default: SyncPattern::Full).
connectionType – The connection type, for deciding when to attach to the device. (Default: DeviceConnectionType::Always).
selectionCriterion – How to select a device from the list of valid selections. (Default: DeviceSelectionCriterion::First).
- Returns
A device, which can be used with a session.
- acquireDeviceById(self: popart_core.DeviceManager, id: int, pattern: popart_core.SyncPattern = <SyncPattern.Full: 0>, connectionType: popart_core.DeviceConnectionType = <DeviceConnectionType.Always: 0>) popart::DeviceInfo
Allocates the hardware device by ID. This ID can be found by running gc-info -l. This method will attach to the device if connectionType is equal to DeviceConnectionType::Always.
- Parameters
id – The ID of the IPU to be used.
pattern – The setting for when to synchronise in a multi-IPU system. (Default: SyncPattern::Full).
connectionType – The connection type, for deciding when to attach to the device. (Default: DeviceConnectionType::Always).
- Returns
A device, which can be used with a session.
- createCpuDevice(self: popart_core.DeviceManager) popart::DeviceInfo
- createIpuModelDevice(self: popart_core.DeviceManager, arg0: dict) popart::DeviceInfo
- createOfflineIPUDevice(self: popart_core.DeviceManager, opts: dict) popart::DeviceInfo
- createOfflineIpuFromDeviceInfo(self: popart_core.DeviceManager, arg0: popart::DeviceInfo) popart::DeviceInfo
- createSimDevice(self: popart_core.DeviceManager, arg0: dict) popart::DeviceInfo
- enumerateDevices(self: popart_core.DeviceManager, pattern: popart_core.SyncPattern = <SyncPattern.Full: 0>, numIpus: int = 1, deviceType: popart_core.DeviceType = <DeviceType.Ipu: 2>, connectionType: popart_core.DeviceConnectionType = <DeviceConnectionType.Always: 0>, tilesPerIPU: int = 0) List[popart::DeviceInfo]
Get the list of all devices with the required criteria.
- Parameters
pattern – The setting for when to synchronise in a multi-IPU system. (Default: SyncPattern::Full).
numIpus – The number of IPUs required. (Default: 1).
deviceType – The type of the device required. (Default: DeviceType::Ipu).
connectionType – The setting for when to connect to the device. (Default: DeviceConnectionType::Always).
tilesPerIPU – The number of tiles per IPU required. (Default: 0).
- Returns
The list of devices with the required criteria.
- setOnDemandAttachTimeout(self: popart_core.DeviceManager, attachTimeout: int) None
If unable to attach to a device on the first try, the attach timeout set here is the length of time (in seconds) that the DeviceManager will keep trying to attach. Note: this only takes effect when trying to attach with DeviceConnectionType::OnDemand.
- Parameters
attachTimeout – The attach timeout in seconds.
- tryAcquireAvailableDevice(self: popart_core.DeviceManager, numIpus: int = 1, tilesPerIpu: int = 0, pattern: popart_core.SyncPattern = <SyncPattern.Full: 0>, connectionType: popart_core.DeviceConnectionType = <DeviceConnectionType.Always: 0>, selectionCriterion: popart_core.DeviceSelectionCriterion = <DeviceSelectionCriterion.First: 0>) popart::DeviceInfo
Finds an available hardware device with the specified number of IPUs. This method will attach to the device if connectionType is equal to DeviceConnectionType::Always. This method is suitable when polling for an available device when resources are constrained.
- Parameters
numIpus – The number of IPUs on the device. (Default: 1).
tilesPerIpu – The number of tiles per IPU. An input of 0 will match any number. (Default: 0).
pattern – The setting for when to synchronise in a multi-IPU system. (Default: SyncPattern::Full).
connectionType – The setting for when to connect to the device. (Default: DeviceConnectionType::Always).
selectionCriterion – The method for selecting a device from the list of valid selections. (Default: DeviceSelectionCriterion::First).
- Returns
A device, which can be used with a session. If no device is acquired, a nullptr is returned.
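The polling use case mentioned for tryAcquireAvailableDevice can be sketched as a simple retry loop. The helper name acquire_with_retry and its attempts and delay_seconds parameters are hypothetical, and the sketch assumes that the Python binding returns None where this reference says a nullptr is returned.

```python
import time


def acquire_with_retry(device_manager, num_ipus=1, attempts=5, delay_seconds=1.0):
    """Poll for a free device, returning None if none becomes available.

    Hypothetical helper: `device_manager` is expected to behave like
    popart.DeviceManager.
    """
    for attempt in range(attempts):
        # Assumption: a failed acquisition is reported as None in Python.
        device = device_manager.tryAcquireAvailableDevice(num_ipus)
        if device is not None:
            return device
        if attempt + 1 < attempts:
            time.sleep(delay_seconds)
    return None


# Intended usage on a system with PopART installed (not executed here):
#   import popart
#   device = acquire_with_retry(popart.DeviceManager(), num_ipus=2)
```

Because the helper only relies on the tryAcquireAvailableDevice method, it can be exercised in tests with a stand-in object when no IPU is present.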
- tryAcquireDeviceById(self: popart_core.DeviceManager, id: int, pattern: popart_core.SyncPattern = <SyncPattern.Full: 0>, connectionType: popart_core.DeviceConnectionType = <DeviceConnectionType.Always: 0>) popart::DeviceInfo
Allocates the hardware device by ID. This ID can be found by running gc-info -l. This method will try to attach to the device if connectionType is equal to DeviceConnectionType::Always. This method is suitable when polling for an available device when resources are constrained.
- Parameters
id – The ID of the IPU to be used.
pattern – The setting for when to synchronise in a multi-IPU system. (Default: SyncPattern::Full).
connectionType – The connection type, for deciding when to attach to the device. (Default: DeviceConnectionType::Always).
- Returns
A device, which can be used with a session. If no device is acquired, a nullptr is returned.
2.8. Ops
2.8.1. Op definition for PopART IR
- class popart.RecomputeType
Define the type of recomputation.
Members:
Undefined : Default value if RecomputeType has not been set.
Checkpoint : Do not recompute. Outputs from the op are kept from the forward pass.
Recompute : Recompute operation.
Recomputed : For explicit recomputation, this marks a cloned operation that had RecomputeType::Recompute set. After cloning, the original op is changed to RecomputeType::Checkpoint, and the cloned op is changed to Recomputed.
- property name
- class popart.OperatorIdentifier
- class popart.OpDefinition
2.9. Patterns
- class popart.Patterns
Bases: pybind11_object
- enablePattern(self: popart_core.Patterns, arg0: str, arg1: bool) popart_core.Patterns
- enableRuntimeAsserts(self: popart_core.Patterns, arg0: bool) popart_core.Patterns
- isPatternEnabled(self: popart_core.Patterns, arg0: str) bool
2.10. Utility classes
2.10.1. Writer
Framework independent functionality for driving PopART.
- class popart.writer.NetWriter(inNames, outNames, optimizer, dataFlow, inputShapeInfo)
Base class, to be inherited once per framework.
- Parameters
inNames – A list (in order) of all the inputs to the ONNX Model.
outNames – The names of the outputs of the ONNX Model.
optimizer – An optimizer (ConstSGD, SGD, etc) or
None
if in inference mode.
anchors – Only relevant if in training mode: the names of tensors which must be computed and returned. If not in training mode, then outputs of forward are the (only) tensors to return.
dataFlow – Configuration for the data feeds and fetches.
inputShapeInfo – For every loss stream input and standard input: the shape, ONNX DataType and how to get data.
- infer(inputsMap)
Perform
batchesPerStep
inference steps.
This function only needs to be implemented by frameworks which will be used to verify PopART. See
torchwriter.py
for an example implementation.
- saveModel(filename)
Save the model.
To be implemented once per framework: the framework-specific details of generating the ONNX model and writing it to file.
- train(inputsMap)
Perform
batchesPerStep
training steps.
This function only needs to be implemented by frameworks which will be used to verify PopART. See
torchwriter.py
for an example implementation.
2.10.2. Error handling
- class popart.OutOfMemoryException(e)
Represent out of memory exceptions that occur during runtime.
- Parameters
e (popart_exception) –
- Return type
None
- getProfilePath()
Get the absolute path of the profile file.
The profile file is named
profile.pop
and contains full details of the exception.
- Returns
The absolute path of
profile.pop
, or an empty string if the file does not exist.
- Return type
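As an illustration of how getProfilePath() might be used when handling this exception, the sketch below builds a short message from the returned path. The helper name describe_out_of_memory is hypothetical and not part of PopART; it only assumes that the object passed in provides getProfilePath() with the behaviour described above.

```python
def describe_out_of_memory(exc) -> str:
    """Build a short message pointing at the profile file, if there is one.

    Hypothetical helper: exc is expected to provide getProfilePath(),
    as popart.OutOfMemoryException does.
    """
    path = exc.getProfilePath()
    if not path:
        # An empty string means profile.pop does not exist.
        return "Out of memory; no profile file is available."
    return f"Out of memory; full details are in {path}"
```

A typical call site would be an except popart.OutOfMemoryException as e: block around a call such as compileAndExport(), passing e to the helper.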
2.10.3. Debug context
- class popart.DebugContext
- class popart.DebugInfo
- getId(self: popart_internal_ir.DebugInfo) int
- setValue(self: popart_internal_ir.DebugInfo, name: str, value: popart_internal_ir.ProfileValue) bool
2.10.4. Input shape information
- class popart.InputShapeInfo
- add(self: popart_core.InputShapeInfo, arg0: str, arg1: popart_internal_ir.TensorInfo) None
- get(self: popart_core.InputShapeInfo, arg0: str) popart_internal_ir.TensorInfo
- has(self: popart_core.InputShapeInfo, arg0: str) bool
2.10.5. Type definitions
2.10.6. Enums
- class popart.CommGroupType
PopART equivalent of GCL CommGroupType. Each of these enumeration constants has a corresponding GCL CommGroupType value.
Members:
All : All replicas viewed as one group, replica group size is ignored.
Consecutive : Groups are consecutive in replica.
If there are N replicas denoted {0, … N-1} and group size is k, then there are N/k groups of size k:
{0, 1, … k-1}, {k, … 2k-1} … {N-k, … N-1}
Orthogonal : Groups are sliced orthogonal to the replica ordering.
If there are N replicas denoted {0, … N-1} and group size is k, then there are m = N/k groups of size k:
{0, m, 2m, …}, {1, m+1, 2m+1, …} … {m-1, 2m-1, … N-1}
Ungrouped : Each replica is in its own group, replica group size is ignored.
- property name
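The Consecutive and Orthogonal groupings can be made concrete with a small calculation. The sketch below is plain Python, independent of PopART; the function names are hypothetical, and it assumes that the number of replicas N is an exact multiple of the group size k.

```python
from typing import List


def consecutive_groups(n: int, k: int) -> List[List[int]]:
    """Split replicas 0..n-1 into n/k groups of k neighbouring replicas."""
    if k <= 0 or n % k != 0:
        raise ValueError("n must be a positive multiple of k (assumption)")
    return [list(range(start, start + k)) for start in range(0, n, k)]


def orthogonal_groups(n: int, k: int) -> List[List[int]]:
    """Split replicas 0..n-1 into m = n/k groups, each with stride m."""
    if k <= 0 or n % k != 0:
        raise ValueError("n must be a positive multiple of k (assumption)")
    m = n // k
    return [list(range(first, n, m)) for first in range(m)]


# Example with N = 8 replicas and group size k = 2.
print(consecutive_groups(8, 2))  # [[0, 1], [2, 3], [4, 5], [6, 7]]
print(orthogonal_groups(8, 2))   # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

In both cases every replica appears in exactly one group, and each group has k members.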