2. PopART Python API

2.1. Sessions

class popart.session.InferenceSession(fnModel, dataFlow, deviceInfo, inputShapeInfo=<popart_core.InputShapeInfo object>, patterns=None, userOptions=<popart_core.SessionOptions object>)

Bases: popart_core._InferenceSessionCore

Create a runtime class for executing an ONNX graph on a set of IPU hardware for inference.

A wrapper around the Session C++ class, renamed SessionCore in pybind, to enable more Pythonic use. See session.hpp for parameter descriptions.

Parameters
  • fnModel – ONNX model proto. Usually a loaded ONNX model, or from builder.getModelProto().

  • dataFlow – Configuration for the data feeds and fetches.

  • deviceInfo – DeviceInfo object specifying device type (one of IPU, IPUModel or CPU) and count.

  • inputShapeInfo – Information about the shapes of input and output tensors. Default: popart.InputShapeInfo().

  • patterns – Patterns to be run for optimization etc. Note: default for patterns must not be popart.Patterns(). When import popart is run, the default arguments are created. If the user then loads a custom pattern using ctypes.cdll.LoadLibrary(custom_pattern_lib.so) then the already constructed popart.Patterns will not include the custom pattern. Default None.

  • userOptions – Session options to apply. Default: popart.SessionOptions().

compileAndExport(filename)

Compiles the graph and exports it to the specified file.

This will form the poplar::Graph and compile the poplar::Executable before exporting the executable and metadata.

Parameters
  • filename – Where to save the executable and metadata. If it does not exist, it will be created.

Raises
  • popart.OutOfMemoryException – If an out of memory event occurs

  • OSError – Thrown in the event of any file system related errors during the export

Return type

None

initAnchorArrays()

Create the anchor arrays to feed data back into Python with.

Returns

Dict of anchor names and their relevant np arrays.

Return type

Dict[str, numpy.array]

prepareDevice()

Prepare the network for execution.

This will create the poplar::Graph and poplar::Engine, and set up poplar::Streams.

Raises

popart.OutOfMemoryException – If an out of memory event occurs

Return type

None

exception popart.session.OutOfMemoryException(e)

Bases: popart_core.popart_exception

Parameters

e (popart_core.popart_exception) –

Return type

None

getGraphReport()

Get the graph report.

Returns

The graph report string.

Return type

str

getSummaryReport()

Get the summary report.

Returns

The summary report string.

Return type

str

class popart.session.TrainingSession(fnModel, dataFlow, loss, optimizer, deviceInfo, inputShapeInfo=<popart_core.InputShapeInfo object>, patterns=None, userOptions=<popart_core.SessionOptions object>)

Bases: popart_core._TrainingSessionCore

Create a runtime class for executing an ONNX graph on a set of IPU hardware for training.

A wrapper around the Session C++ class, renamed SessionCore in pybind, to enable more Pythonic use. See session.hpp for parameter descriptions.

Parameters
  • fnModel – ONNX model proto. Usually a loaded ONNX model, or from builder.getModelProto().

  • dataFlow – Configuration for the data feeds and fetches.

  • loss – A TensorId of the final scalar loss to use when training.

  • optimizer – The type of optimizer to use when training, and its properties.

  • deviceInfo – DeviceInfo object specifying device type (IPU, IPUModel, CPU) and count.

  • inputShapeInfo – Information about the shapes of input and output tensors. Default: popart.InputShapeInfo().

  • patterns – Optimization patterns to apply. Default: None.

  • userOptions – Session options to apply. Default: popart.SessionOptions().

compileAndExport(filename)

Compiles the graph and exports it to the specified file.

This will form the poplar::Graph and compile the poplar::Executable before exporting the executable and metadata.

Parameters
  • filename – Where to save the executable and metadata. If it does not exist, it will be created.

Raises
  • popart.OutOfMemoryException – If an out of memory event occurs

  • OSError – Thrown in the event of any file system related errors during the export

Return type

None

initAnchorArrays()

Create the anchor arrays to feed data back into Python with.

Returns

Dict of anchor names and their relevant np arrays.

Return type

Dict[str, numpy.array]

prepareDevice()

Prepare the network for execution.

This will create the poplar::Graph and poplar::Engine, and set up poplar::Streams.

Raises

popart.OutOfMemoryException – If an out of memory event occurs

Return type

None

popart.session.makedirsAndCheckWritable(path)

2.2. Builder

class popart.builder.AiGraphcore(builder, version)

Bases: popart.builder.Opset

Return the builder interface for the given ai.graphcore version.

Raises

ValueError – Thrown if an invalid ai.graphcore opset version provided.

call(args, num_outputs, callee, debugName='')

Add a call operation to the model.

This is a Poplar extension, to expose manual code re-use to the builder.

Parameters
  • args (List[int]) – List of tensor ids to feed as arguments.

  • num_outputs (int) – Number of output tensors from the called graph.

  • callee (popart.builder.Builder) – SubgraphBuilder for the graph to be called.

  • debugName (str) –

Keyword Arguments

debugName – A string to prepend to the name of the tensor. Default: “”.

Returns

Output tensor ids.

Return type

List[str]

class popart.builder.AiGraphcoreOpset1(builder, version)

Bases: popart.builder.AiGraphcore

Sub-class for backwards compatibility. Will forward all calls to AiGraphcore class.

class popart.builder.AiOnnx(builder, version)

Bases: popart.builder.Opset

Base class for the various AiOnnx builder interfaces. The most recent versions of ONNX operators that require special treatment, such as Loop, Scan and Logical_If, live here, while older versions, where the function signature differs, are implemented on a corresponding subclass.

Parameters
  • builder – Parent class for access.

  • version – ai.Onnx opset version to use; 6 <= version <= 10. Default: 10.

logical_if(args, num_outputs, else_branch, then_branch, name='')

If conditional operation.

Parameters
  • args (List[str]) – List of tensor ids to feed as arguments.

  • num_outputs (int) – Number of output tensors from the if operator.

  • else_branch (popart.builder.Builder) – SubgraphBuilder for the graph to run if the condition is false. Has num_outputs outputs: values you wish to be live-out to the enclosing scope; other tensors will not be accessible to the wider graph. The number of outputs must match the number of outputs in the then_branch.

  • then_branch (popart.builder.Builder) – SubgraphBuilder for the graph to run if condition is true. Has num_outputs outputs: values you wish to be live-out to the enclosing scope. The number of outputs must match the number of outputs in the else_branch.

  • name (str) –

Keyword Arguments

name – A string to prepend to the name of the tensor. Default: “”.

Returns

Output tensor ids.

Return type

List[str]

loop(args, num_outputs, body, debugPrefix='')

Generic Looping construct op.

Parameters
  • args (List[str]) – List of tensor ids to feed as arguments.

  • num_outputs (int) – Number of output tensors from the loop operator.

  • body (popart.builder.Builder) – SubgraphBuilder for the graph to run in the loop.

  • debugPrefix (str) –

Keyword Arguments

debugPrefix – A string to prepend to the name of the tensor. Default: “”.

Returns

Output tensor ids.

Return type

List[str]

class popart.builder.AiOnnx10(builder, version)

Bases: popart.builder.AiOnnx9

Minimal builder interface for ai.onnx version 10. Once ai.onnx version 11 becomes the standard opset, this class must be updated to inherit from AiOnnx11, as described in T12084

class popart.builder.AiOnnx11(builder, version)

Bases: popart.builder.AiOnnx10

Minimal builder interface for ai.onnx version 11.

class popart.builder.AiOnnx6(builder, version)

Bases: popart.builder.AiOnnx

Minimal builder interface for ai.onnx version 6.

class popart.builder.AiOnnx7(builder, version)

Bases: popart.builder.AiOnnx6

Minimal builder interface for ai.onnx version 7.

class popart.builder.AiOnnx8(builder, version)

Bases: popart.builder.AiOnnx7

Minimal builder interface for ai.onnx version 8.

scan(args, num_outputs, body, num_scan_inputs, directions=[], debugPrefix='')

Scan-8 specific construct op.

Parameters
  • args (List[str]) – List of tensor ids to feed as arguments.

  • num_outputs (int) – Number of output tensors from the scan operator.

  • body (popart.builder.Builder) – SubgraphBuilder for the graph to run in the scan.

  • num_scan_inputs (int) – The number of scan_inputs

  • directions (List[int]) – A list of ints which specifies the direction of the scan_input. 0 indicates forward direction and 1 indicates reverse direction. If omitted, all scan_input tensors will be scanned in the forward direction.

  • debugPrefix (str) –

Keyword Arguments

debugPrefix – A string to prepend to the name of the tensor. Default: “”.

Returns

Output tensor ids.

Return type

List[str]

class popart.builder.AiOnnx9(builder, version)

Bases: popart.builder.AiOnnx8

Minimal builder interface for ai.onnx version 9.

scan(args, num_outputs, body, num_scan_inputs, scan_input_axes=[], scan_input_directions=[], scan_output_axes=[], scan_output_directions=[], debugPrefix='')

Generic Scan construct op.

Parameters
  • args (List[str]) – List of tensor ids to feed as arguments.

  • num_outputs (int) – Number of output tensors from the scan operator.

  • body (popart.builder.Builder) – SubgraphBuilder for the graph to run in the scan.

  • num_scan_inputs (int) – The number of scan_inputs

  • scan_input_axes (List[int]) – A list that specifies the axis to be scanned for the scan_input. If omitted, 0 will be used as the scan axis for every scan_input.

  • scan_input_directions (List[int]) – A list that specifies the direction to be scanned for the scan_input tensor. 0 indicates forward direction and 1 indicates reverse direction. If omitted, all scan_input tensors will be scanned in the forward direction.

  • scan_output_axes (List[int]) – A list that specifies the axis for the scan_output. The scan outputs are accumulated along the specified axis. If omitted, 0 will be used as the scan axis for every scan_output.

  • scan_output_directions (List[int]) – A list specifies whether the scan_output should be constructed by appending or prepending a new value in each iteration: 0 indicates appending and 1 indicates prepending. If omitted, all scan_output tensors will be produced by appending a value in each iteration.

  • debugPrefix (str) –

Keyword Arguments

debugPrefix – A string to prepend to the name of the tensor. Default: “”.

Returns

Output tensor ids.

Return type

List[str]

class popart.builder.AiOnnxMl(builder, version)

Bases: popart.builder.Opset

Return the builder interface for the given ai.onnx.ml version.

Raises

ValueError – Thrown if an invalid ai.onnx.ml opset version provided.

class popart.builder.Builder(modelProtoOrFilename=None, opsets=None, builderCore=None)

Bases: object

A wrapper around the Builder C++ class, renamed BuilderCore in pybind, to enable more Pythonic use. See builder.hpp for the class definition.

Parameters
  • modelProtoOrFilename – Model protobuf string or file path of saved ONNX model proto. Default: None.

  • opsets – Dict of opset versions. Default: None.

  • builderCore – _BuilderCore object, if you want to create a subgraph builder using an existing BuilderCore object. Default: None.

aiOnnxOpsetVersion(version)
Parameters

version (int) –

Return type

None

createSubgraphBuilder()

Create a child builder to add ops to a subgraph using a call operation.

Returns

The child builder.

Return type

popart.builder.Builder

reshape_const(aiOnnx, args, shape, debugPrefix='')

Const version of the reshape op.

Parameters
  • aiOnnx (popart.builder.Opset) – Versioned aiOnnx opset, for example: aiOnnxOpset11.

  • args (List[str]) – List of tensor ids to feed as arguments.

  • shape (Iterable[int]) – Shape to reshape to, for example [3, 2, 4].

  • debugPrefix (str) –

Keyword Arguments

debugPrefix – String to use as a debug prefix. Default: “”.

Returns

Output tensor ids.

Return type

List[str]

class popart.builder.Opset(builder, version)

Bases: object

Minimal base class for the opsets

Parameters
  • builder – An interface for a Builder, used for creating ONNX graphs.

  • version – Opset version to use for the given opset sub-class.

2.3. Tensor information

class popart.tensorinfo.TensorInfo(*args)

Bases: popart_core._TensorInfoCore

Python wrapper to TensorInfo to handle numpy types in constructor.

For example:

TensorInfo(dtype, shape)
TensorInfo(numpy.ndarray)

Raises

TypeError – Raised if an incorrect type is used to create the TensorInfo.

2.4. Writer

Framework independent functionality for driving PopART

class popart.writer.NetWriter(inNames, outNames, optimizer, dataFlow, inputShapeInfo)

Bases: object

Base class, to be inherited once per framework

Parameters
  • inNames – A list (in order) of all the inputs to the ONNX Model.

  • outNames – names of the outputs of the ONNX Model.

  • optimizer – An optimizer (ConstSGD, SGD, etc) or None if in inference mode.

  • anchors – Only relevant if in training mode: the names of tensors which must be computed and returned. If not in training mode, then outputs of forward are the (only) tensors to return.

  • dataFlow – Configuration for the data feeds and fetches.

  • inputShapeInfo – For every loss stream input and standard input: the shape, ONNX DataType and how to get data.

infer(inputsMap)

Perform batchesPerStep inference steps. This function only needs to be implemented by frameworks which will be used to verify PopART. See torchwriter.py for an example implementation.

saveModel(filename)

To be implemented once per framework: framework-specific details of generating the ONNX model and writing it to file.

train(inputsMap)

Perform batchesPerStep training steps. This function only needs to be implemented by frameworks which will be used to verify PopART. See torchwriter.py for an example implementation.

2.5. Builder

class popart_core._BuilderCore
addInitializedInputTensor(*args, **kwargs)

Overloaded function.

  1. addInitializedInputTensor(self: popart_core._BuilderCore, initVal: array, debugPrefix: popart_core.DebugContext = ‘’) -> str

  2. addInitializedInputTensor(self: popart_core._BuilderCore, initVal: array, debugContext: popart_core.DebugContext = ‘’) -> str

addInputTensor(*args, **kwargs)

Overloaded function.

  1. addInputTensor(self: popart_core._BuilderCore, tensorInfo: popart_core._TensorInfoCore, debugPrefix: popart_core.DebugContext = ‘’) -> str

  2. addInputTensor(self: popart_core._BuilderCore, tensorInfo: popart_core._TensorInfoCore, debugContext: popart_core.DebugContext = ‘’) -> str

  3. addInputTensor(self: popart_core._BuilderCore, dataType: str, shape: List[int], debugPrefix: popart_core.DebugContext = ‘’) -> str

  4. addInputTensor(self: popart_core._BuilderCore, dataType: str, shape: List[int], debugContext: popart_core.DebugContext = ‘’) -> str

addInputTensorFromParentGraph(self: popart_core._BuilderCore, tensorId: str) → None

Add a new named input tensor to the model.

Parameter tensorId:

The identifier string of the input tensor. This identifier must already exist in the parent GraphProto’s name scope and must appear topologically before this sub-graph.

addNodeAttribute(*args, **kwargs)

Overloaded function.

  1. addNodeAttribute(self: popart_core._BuilderCore, attributeName: str, attributeValue: int, nodeOutputNames: Set[str]) -> None

  2. addNodeAttribute(self: popart_core._BuilderCore, attributeName: str, attributeValue: List[int], nodeOutputNames: Set[str]) -> None

  3. addNodeAttribute(self: popart_core._BuilderCore, attributeName: str, attributeValue: float, nodeOutputNames: Set[str]) -> None

  4. addNodeAttribute(self: popart_core._BuilderCore, attributeName: str, attributeValue: List[float], nodeOutputNames: Set[str]) -> None

  5. addNodeAttribute(self: popart_core._BuilderCore, attributeName: str, attributeValue: str, nodeOutputNames: Set[str]) -> None

  6. addNodeAttribute(self: popart_core._BuilderCore, attributeName: str, attributeValue: List[str], nodeOutputNames: Set[str]) -> None

addOutputTensor(self: popart_core._BuilderCore, outputName: str) → None
addUntypedInputTensor(*args, **kwargs)

Overloaded function.

  1. addUntypedInputTensor(self: popart_core._BuilderCore, debugPrefix: popart_core.DebugContext = ‘’) -> str

Add a new input tensor without a type or shape to the model.

Parameter debugContext:

Optional debug information.

Returns

The unique name of the input tensor.

  2. addUntypedInputTensor(self: popart_core._BuilderCore, debugContext: popart_core.DebugContext = ‘’) -> str

Add a new input tensor without a type or shape to the model.

Parameter debugContext:

Optional debug information.

Returns

The unique name of the input tensor.

checkpointOutput(self: popart_core._BuilderCore, nodeOutputNames: List[str]) → List[str]
customOp(self: popart_core._BuilderCore, opName: str, opVersion: int, domain: str, inputs: list, attributes: dict, numOutputs: int = 1, name: str = '') → List[str]
excludePatterns(self: popart_core._BuilderCore, nodeOutputName: str, patternNames: List[str]) → None
executionPhase(*args, **kwargs)

Overloaded function.

  1. executionPhase(self: popart_core._BuilderCore, nodeOutputNames: str, value: int = 0) -> None

  2. executionPhase(self: popart_core._BuilderCore, nodeOutputNames: Set[str], value: int = 0) -> None

  3. executionPhase(self: popart_core._BuilderCore, value: int = 0) -> AttributeContextManager

getAllNodeAttributeNames(self: popart_core._BuilderCore, nodeOutputNames: Set[str]) → List[str]

Get all the attribute names from the ONNX node. This function will throw an exception if it can’t find the unique node.

Parameter nodeOutputNames:

Names of the output tensors of the ONNX node used to find the node in the ONNX model.

getExecutionPhase(self: popart_core._BuilderCore) → int
getFloatNodeAttribute(self: popart_core._BuilderCore, attributeName: str, nodeOutputNames: Set[str]) → float

Get the float value of the attribute for the ONNX node. This function will throw an exception if it can’t find the unique node, the attribute does not exist, or it has not been set to the float type.

Parameter attributeName:

The name of the attribute to find.

Parameter nodeOutputNames:

Names of the output tensors of the ONNX node used to find the node in the ONNX model.

Returns

Value of the attribute.

getFloatVectorNodeAttribute(self: popart_core._BuilderCore, attributeName: str, nodeOutputNames: Set[str]) → List[float]

Get the std::vector<float> value of the attribute for the ONNX node. This function will throw an exception if it can’t find the unique node or the attribute does not exist.

Parameter attributeName:

The name of the attribute to find.

Parameter nodeOutputNames:

Names of the output tensors of the ONNX node used to find the node in the ONNX model.

Returns

Value of the attribute.

getInputTensorIds(self: popart_core._BuilderCore) → List[str]
getInt64NodeAttribute(self: popart_core._BuilderCore, attributeName: str, nodeOutputNames: Set[str]) → int

Get the int64_t value of the attribute for the ONNX node. This function will throw an exception if it can’t find the unique node, the attribute does not exist, or it has not been set to the int64_t type.

Parameter attributeName:

The name of the attribute to find.

Parameter nodeOutputNames:

Names of the output tensors of the ONNX node used to find the node in the ONNX model.

Returns

Value of the attribute.

getInt64VectorNodeAttribute(self: popart_core._BuilderCore, attributeName: str, nodeOutputNames: Set[str]) → List[int]

Get the std::vector<int64_t> value of the attribute for the ONNX node. This function will throw an exception if it can’t find the unique node, the attribute does not exist, or it has not been set to the std::vector<int64_t> type.

Parameter attributeName:

The name of the attribute to find.

Parameter nodeOutputNames:

Names of the output tensors of the ONNX node used to find the node in the ONNX model.

Returns

Value of the attribute.

getModelProto(self: popart_core._BuilderCore) → bytes
getNameScope(self: popart_core._BuilderCore, name: str = '') → str
getOutputTensorIds(self: popart_core._BuilderCore) → List[str]
getPartialsType(self: popart_core._BuilderCore, nodeOutputName: str) → str

Get the partials type for the given node.

Parameter nodeOutputName:

Name of the output tensor of the ONNX node.

getPipelineStage(self: popart_core._BuilderCore) → int
getRecomputeOutputInBackwardPass(*args, **kwargs)

Overloaded function.

  1. getRecomputeOutputInBackwardPass(self: popart_core._BuilderCore, nodeOutputName: str) -> bool

  2. getRecomputeOutputInBackwardPass(self: popart_core._BuilderCore, nodeOutputNames: Set[str]) -> bool

getStringNodeAttribute(self: popart_core._BuilderCore, attributeName: str, nodeOutputNames: Set[str]) → str

Get the std::string value of the attribute for the ONNX node. This function will throw an exception if it can’t find the unique node, the attribute does not exist, or it has not been set to the std::string type.

Parameter attributeName:

The name of the attribute to find.

Parameter nodeOutputNames:

Names of the output tensors of the ONNX node used to find the node in the ONNX model.

Returns

Value of the attribute.

getStringVectorNodeAttribute(self: popart_core._BuilderCore, attributeName: str, nodeOutputNames: Set[str]) → List[str]

Get the std::vector<std::string> value of the attribute for the ONNX node. This function will throw an exception if it can’t find the unique node or the attribute does not exist.

Parameter attributeName:

The name of the attribute to find.

Parameter nodeOutputNames:

Names of the output tensors of the ONNX node used to find the node in the ONNX model.

Returns

Value of the attribute.

getTensorDtypeString(self: popart_core._BuilderCore, id: str) → str
getTensorShape(self: popart_core._BuilderCore, id: str) → List[int]

Return an ONNX graph tensor shape, from either the input, output, or value_info lists in the GraphProto.

Parameter id:

Tensor id.

Returns

A vector of tensor dimensions.

getTrainableTensorIds(self: popart_core._BuilderCore) → List[str]
getValueTensorIds(self: popart_core._BuilderCore) → List[str]
getVirtualGraph(*args, **kwargs)

Overloaded function.

  1. getVirtualGraph(self: popart_core._BuilderCore) -> int

  2. getVirtualGraph(self: popart_core._BuilderCore, nodeOutputNames: str) -> int

hasExecutionPhase(self: popart_core._BuilderCore) → bool
hasPipelineStage(self: popart_core._BuilderCore) → bool
hasVirtualGraph(self: popart_core._BuilderCore) → bool
isInitializer(self: popart_core._BuilderCore, id: str) → bool

Returns true if the ONNX tensor is in the initializer list of the GraphProto.

Parameter id:

Tensor id.

Returns

A boolean.

nameScope(self: popart_core._BuilderCore, name: str) → NameContextManager
nodeHasAttribute(self: popart_core._BuilderCore, attributeName: str, nodeOutputNames: Set[str]) → bool

Check whether the ONNX node has an attribute set. This function will throw an exception if it can’t find the unique node.

Parameter attributeName:

The name of the attribute to find.

Parameter nodeOutputNames:

Names of the output tensors of the ONNX node used to find the node in the ONNX model.

outlineAttributes(self: popart_core._BuilderCore, arg0: dict) → KeyValueContextManager
outputTensorLocation(*args, **kwargs)

Overloaded function.

  1. outputTensorLocation(self: popart_core._BuilderCore, nodeOutputNames: str, value: popart_core.TensorLocation = <popart_core.TensorLocation object>) -> None

  2. outputTensorLocation(self: popart_core._BuilderCore, value: popart_core.TensorLocation = <popart_core.TensorLocation object>) -> AttributeContextManager

pipelineStage(*args, **kwargs)

Overloaded function.

  1. pipelineStage(self: popart_core._BuilderCore, nodeOutputNames: str, value: int = 0) -> None

  2. pipelineStage(self: popart_core._BuilderCore, value: int) -> AttributeContextManager

recomputeOutput(*args, **kwargs)

Overloaded function.

  1. recomputeOutput(self: popart_core._BuilderCore, nodeOutputNames: str, value: popart_core.RecomputeType = RecomputeType.Undefined) -> None

  2. recomputeOutput(self: popart_core._BuilderCore, value: popart_core.RecomputeType = RecomputeType.Undefined) -> AttributeContextManager

recomputeOutputInBackwardPass(*args, **kwargs)

Overloaded function.

  1. recomputeOutputInBackwardPass(self: popart_core._BuilderCore, nodeOutputName: str, value: popart_core.RecomputeType = RecomputeType.Recompute) -> None

  2. recomputeOutputInBackwardPass(self: popart_core._BuilderCore, nodeOutputNames: Set[str], value: popart_core.RecomputeType = RecomputeType.Recompute) -> None

removeNodeAttribute(self: popart_core._BuilderCore, attributeName: str, nodeOutputNames: Set[str]) → None

Remove an attribute from the ONNX node. This function will throw an exception if it can’t find the unique node or the attribute does not exist.

Parameter attributeName:

The name of the attribute to find.

Parameter nodeOutputNames:

Names of the output tensors of the ONNX node used to find the node in the ONNX model.

saveInitializersExternally(self: popart_core._BuilderCore, ids: List[str], filename: str) → None

Save tensor data externally.

The model data cannot exceed 2GB - the maximum size of a Protobuf message. To avoid this, for large models ONNX tensor data can be saved separately.

Parameter ids:

The names of tensors whose data is to be saved externally.

Parameter filename:

The name of a file containing the binary tensor data. This can be an absolute or relative path. If a relative path, when the ONNX model is saved, external tensor data will be written to a path relative to your current working directory.

saveModelProto(self: popart_core._BuilderCore, filename: str) → None

Save the builder’s ONNX ModelProto into a file and validate it.

Parameter filename:

The name of a file containing an ONNX model protobuf.

schedulePriority(self: popart_core._BuilderCore, value: float) → AttributeContextManager
setAvailableMemoryProportion(self: popart_core._BuilderCore, nodeOutputName: str, availableMemoryProportion: float) → None

Set the available memory for the given node. Used on the convolution op.

Parameter nodeOutputName:

Name of the output tensor of the ONNX node.

Parameter availableMemoryProportion:

The available memory proportion 0 < x <= 1.

setGraphName(*args, **kwargs)

Overloaded function.

  1. setGraphName(self: popart_core._BuilderCore, name: str) -> None

Specifies a graph name.

Parameter name:

String to name the graph.

  2. setGraphName(self: popart_core._BuilderCore, name: str) -> None

setInplacePreferences(self: popart_core._BuilderCore, nodeOutputName: str, prefs: Dict[str, float]) → None
setPartialsType(self: popart_core._BuilderCore, nodeOutputName: str, partialsType: str) → None

Set the partials type for the given node. Used on the convolution op.

Parameter nodeOutputName:

Name of the output tensor of the ONNX node.

Parameter partialsType:

The type for the partials. Can be either FLOAT or HALF.

setSerializeMatMul(self: popart_core._BuilderCore, nodeOutputName: Set[str], mode: str, factor: int = 0, keep_precision: bool = False) → None
virtualGraph(*args, **kwargs)

Overloaded function.

  1. virtualGraph(self: popart_core._BuilderCore, nodeOutputNames: str, value: int = 0) -> None

  2. virtualGraph(self: popart_core._BuilderCore, value: int) -> AttributeContextManager

2.6. Session

class popart_core._InferenceSessionCore
compileAndExport(self: popart_core._InferenceSessionCore, filename: str, err: popart_core.OutOfMemoryError) → None
getCycleCount(self: popart_core._InferenceSessionCore, id: str = '') → int

Copy the cycle count tensor to host from the device.

getExecutionReport(self: popart_core._InferenceSessionCore, useCbor: bool = False, resetProfile: bool = True) → bytes

Retrieve the execution report from the poplar::Engine.

The options which were given to the constructor will influence the information in the report. By default a JSON format report is produced.

This may only be called after the prepareDevice() call has been made.

Parameter useCbor:

Produce a CBOR formatted report.

Parameter resetProfile:

Resets the execution profile.

Returns

A string containing the execution report.

getGraphReport(self: popart_core._InferenceSessionCore, useCbor: bool = False) → bytes

Retrieve the graph report from the poplar::Engine.

The options which were given to the constructor will influence the information in the report. By default a JSON format report is produced.

This may only be called after the prepareDevice() call has been made.

Parameter useCbor:

Produce a CBOR formatted report.

Returns

A string containing the graph (compilation) report.

getInfo(self: popart_core._InferenceSessionCore, arg0: str) → popart_core._TensorInfoCore

Get the TensorInfo on a Tensor.

getRNGState(self: popart_core._InferenceSessionCore) → List[int]
getSerializedGraph(self: popart_core._InferenceSessionCore) → bytes
getSummaryReport(self: popart_core._InferenceSessionCore, resetProfile: bool = True) → str

Retrieve the summary report from the poplar::Engine.

The options which were given to the constructor will influence the information in the report.

This may only be called after the prepareDevice() call has been made.

Parameter resetProfile:

Resets the execution profile.

Returns

A string containing the report.

getTensorTileMap(self: popart_core._InferenceSessionCore) → Dict[str, List[List[Tuple[int, int]]]]
loadExecutable(self: popart_core._InferenceSessionCore, filename: str) → None

Load the poplar::Executable and the PopART metadata from the given file. The file must have been created with compileAndExport().

Parameter filename:

Name of the file to load the executable from.

modelToHost(self: popart_core._InferenceSessionCore, arg0: str) → None

Write current model to ONNX file.

Parameter filename:

Path to file. Can be absolute or relative. If you plan to run your program in multiple processes simultaneously, you should avoid possible race conditions by writing to different files, for example by using temporary files.

prepareDevice(self: popart_core._InferenceSessionCore, err: popart_core.OutOfMemoryError) → None

Prepare the network for execution.

This will create the poplar::Graph and poplar::Engine, and set up poplar::Streams.

resetHostWeights(self: popart_core._InferenceSessionCore, modelProtoOrFilename: str, ignoreWeightsInModelWithoutCorrespondingHostWeight: bool = False) → None

Reset the weights with the weights in an ONNX model that differs from the current model only in weights. This only updates the weights on the host; the user still needs to call weightsFromHost() after this to update the weights on the device.

Parameter model:

Either an ONNX model protobuf, or the name of a file containing an ONNX model protobuf.

Parameter ignoreWeightsInModelWithoutCorrespondingHostWeight:

If true, do not error if there are initializers in the ONNX model with no corresponding initializer tensor in the session’s IR.

run(self: popart_core._InferenceSessionCore, stepio: popart_core.IStepIO, debugName: str = '') → None

Perform one step.

Read input data from the addresses in stepIO.in and write the output data to the addresses in stepIO.out.

Parameter stepIO:

Input and output data.

Parameter debugName:

Debug string to identify this run in logs.

setRNGState(self: popart_core._InferenceSessionCore, rngValue: List[int]) → None
setRandomSeed(self: popart_core._InferenceSessionCore, seedValue: int) → None

Sets the random number generator seed on all tiles of the device. This ensures deterministic behaviour of random operations in the graph.

Parameter seedValue:

The seed value.

updateExternallySavedTensorLocations(self: popart_core._InferenceSessionCore, arg0: str, arg1: str) → None

Update the tensor locations of the tensors in the Session’s ONNX model. The new file will be created at this point, and written to when the ONNX model is saved with a subsequent call to modelToHost.

Parameter fromLocation:

All externally saved tensors with location fromLocation will have their location updated to toLocation.

Parameter toLocation:

The updated location. Must not already exist.

weightsFromHost(self: popart_core._InferenceSessionCore) → None

Write weights from host to the device.

writeWeights(self: popart_core._InferenceSessionCore, arg0: popart_core.IWeightsIO) → None

Write the weights. Must call weightsFromHost() after this.

The weight data is written to the addresses in weightsIo.out.

class popart_core._TrainingSessionCore
compileAndExport(self: popart_core._TrainingSessionCore, filename: str, err: popart_core.OutOfMemoryError) → None
connectStreamToCallback(self: popart_core._TrainingSessionCore, arg0: str, arg1: Callable[[capsule], None], arg2: int) → None

Connect Poplar stream callbacks. In conjunction with getGradAndVarStreamIds, the streams can be used to copy gradients to the host to perform collective operations, after which the updated variables can be streamed back to the device. The index refers to the replica index when using replicated graphs.

getCycleCount(self: popart_core._TrainingSessionCore, id: str = '') → int

Copy the cycle count tensor to host from the device.

getExecutionReport(self: popart_core._TrainingSessionCore, useCbor: bool = False, resetProfile: bool = True) → bytes

Retrieve the execution report from the poplar::Engine.

The options which were given to the constructor will influence the information in the report. By default a JSON format report is produced.

This may only be called after the prepareDevice() call has been made.

Parameter useCbor:

Produce a CBOR formatted report.

Parameter resetProfile:

Resets the execution profile.

Returns

A string containing the execution report.

getGraphReport(self: popart_core._TrainingSessionCore, useCbor: bool = False) → bytes

Retrieve the graph report from the poplar::Engine.

The options which were given to the constructor will influence the information in the report. By default a JSON format report is produced.

This may only be called after the prepareDevice() call has been made.

Parameter useCbor:

Produce a CBOR formatted report.

Returns

A string containing the graph (compilation) report.

getHostReduceStreamIds(self: popart_core._TrainingSessionCore) → List[str]

Access the stream IDs for variables that are involved in host-side reductions. Only populated if hostAllReduce is enabled in the SessionOptions.

getInfo(self: popart_core._TrainingSessionCore, arg0: str) → popart_core._TensorInfoCore

Get the TensorInfo on a Tensor.

getIr(self: popart_core._TrainingSessionCore) → popart::Ir
getRNGState(self: popart_core._TrainingSessionCore) → List[int]
getSerializedGraph(self: popart_core._TrainingSessionCore) → bytes

Retrieve the serialized graph from the poplar::Engine.

A JSON format report is produced.

This may only be called after the prepareDevice() call has been made.

Returns

A string containing the serialized graph.

getSummaryReport(self: popart_core._TrainingSessionCore, resetProfile: bool = True) → str

Retrieve the summary report from the poplar::Engine.

The options which were given to the constructor will influence the information in the report.

This may only be called after the prepareDevice() call has been made.

Parameter resetProfile:

Resets the execution profile.

Returns

A string containing the report.

getTensorTileMap(self: popart_core._TrainingSessionCore) → Dict[str, List[List[Tuple[int, int]]]]

Retrieve the tensor tile mapping from the poplar::Graph.

This may only be called after the prepareDevice() call has been made.

Returns

A TensorTileMap object for all tensors in the graph.

loadExecutable(self: popart_core._InferenceSessionCore, filename: str) → None

Load the poplar::Executable and the PopART metadata from the given file. The file must have been created with compileAndExport().

Parameter filename:

Name of the file to load the executable from.

modelToHost(self: popart_core._TrainingSessionCore, arg0: str) → None

Write current model to ONNX file.

Parameter fn:

Path to file. Can be absolute or relative. If you plan to run your program in multiple processes simultaneously, you should avoid possible race conditions by writing to different files, for example by using temporary files.

prepareDevice(self: popart_core._TrainingSessionCore, err: popart_core.OutOfMemoryError) → None

Prepare the network for execution.

This will create the poplar::Graph and poplar::Engine, and set up poplar::Streams.

readWeights(self: popart_core._TrainingSessionCore, arg0: popart_core.IWeightsIO) → None

Read the weights. Must have called weightsToHost() first.

The weight data is written to the addresses in weightsIo.out.

resetHostWeights(self: popart_core._TrainingSessionCore, modelProtoOrFilename: str, ignoreWeightsInModelWithoutCorrespondingHostWeight: bool = False) → None

Reset the weights with the weights in an ONNX model that differs from the current model only in weights. This only updates the weights on the host; the user still needs to call weightsFromHost() after this to update the weights on the device.

Parameter model:

Either an ONNX model protobuf, or the name of a file containing an ONNX model protobuf.

Parameter ignoreWeightsInModelWithoutCorrespondingHostWeight:

If true, do not error if there are initializers in the ONNX model with no corresponding initializer tensor in the session’s IR.

run(self: popart_core._TrainingSessionCore, stepio: popart_core.IStepIO, debugName: str = '') → None

Perform one step.

Read input data from the addresses in stepIO.in. Write the output data to the addresses in stepIO.out.

Parameter stepIO:

Input and output data.

Parameter debugName:

Debug string to identify this run in logs.

setRNGState(self: popart_core._TrainingSessionCore, rngValue: List[int]) → None
setRandomSeed(self: popart_core._TrainingSessionCore, seedValue: int) → None

Sets the random number generator seed on all tiles of the device. This ensures deterministic behaviour of random operations in the graph.

Parameter seedValue:

The seed value.

updateExternallySavedTensorLocations(self: popart_core._TrainingSessionCore, arg0: str, arg1: str) → None

Update the tensor locations of the tensors in the Session’s ONNX model. The new file will be created at this point, and written to when the ONNX model is saved with a subsequent call to modelToHost.

Parameter fromLocation:

All externally saved tensors with location fromLocation will have their location updated to toLocation.

Parameter toLocation:

The updated location. Must not already exist.

updateOptimizerFromHost(self: popart_core._TrainingSessionCore, arg0: popart_core.Optimizer) → None

Update the optimizer and the associated hyperparameters but not the optimizer state tensors.

NOTE: The optimizer parameter has to be compatible with the optimizer passed to the constructor. For example, you cannot call this function with an SGD1 optimizer if you created the session with an SGD0 optimizer. The reason for this is that it is not possible to change the IR after it has been constructed.

Parameter optimizer:

A pointer to a popart::Optimizer.

weightsFromHost(self: popart_core._TrainingSessionCore) → None

Write weights from host to the device.

weightsToHost(self: popart_core._TrainingSessionCore) → None

Copy the weights to host from the device.

writeWeights(self: popart_core._TrainingSessionCore, arg0: popart_core.IWeightsIO) → None

Write the weights. Must call weightsFromHost() after this.

The weight data is written to the addresses in weightsIo.out.

2.7. Session Options

class popart_core.SessionOptions
property accumulationAndReplicationReductionType

Specify how gradients are reduced when using gradient accumulation. The options are equivalent to how gradients are reduced on lossOps.

property accumulationFactor

Specify the number of micro-batches to accumulate before applying the varUpdate.

property accumulationReductionType

Specify how gradients are reduced when using gradient accumulation. The options are equivalent to how gradients are reduced on lossOps.

property aliasZeroCopy

Enable zero-copy for subgraphs.

property autoRecomputation

Enable recomputation of operations in the graph in the backwards pass to reduce model size at the cost of computation cycles.

property batchSerializationSettings

Configuration setting for batch serialization.

property cachePath

Folder to save the poplar::Executable to.

property compileEngine

If false, the backend will build the Poplar graph but not compile it into an Engine. In this case, no execution can be performed, and nothing can be transferred to the device. API calls which retrieve information from the graph building stage, such as tile mapping introspection, can still be used.

property constantWeights

An optimization for an inference session to have constant weights; true by default. Set this option to false if you want to change the weights with a call to Session::resetHostWeights after the session has been prepared. This option has no effect on a training session.

property customCodeletCompileFlags

Compile flags for the custom codelets. For example -g to generate debug info.

property customCodelets

List of codelets (with filetype) to be added to the Poplar graph. See the Poplar documentation for more information.

property decomposeGradSum

Replaces single sums of partial gradients with a tree of additions. This can reduce max liveness at the cost of extra cycles. A typical use case for this would be if a large weight tensor is used as an input to many operations.
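The rewrite can be pictured in plain Python. This is an illustrative sketch of turning one flat sum of partial gradients into a tree of pairwise additions, not PopART's actual IR transform:

```python
def tree_sum(parts):
    # Repeatedly add adjacent pairs. Each addition consumes two partials,
    # so partial gradients become dead sooner than with one large sum,
    # reducing maximum liveness at the cost of extra add operations.
    parts = list(parts)
    while len(parts) > 1:
        paired = [parts[i] + parts[i + 1] for i in range(0, len(parts) - 1, 2)]
        if len(parts) % 2:          # carry the odd element forward
            paired.append(parts[-1])
        parts = paired
    return parts[0]
```

The result is numerically the same sum; only the shape of the addition graph changes.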

property disableGradAccumulationTensorStreams

If true, the weight gradient tensors are not saved off the device when devicex.weightsFromHost() is called. Note: this option is overridden if #syntheticDataMode is not #SyntheticDataMode::Off.

property dotChecks

When to write .dot files during Ir construction.

property dotOpNames

Include the Op name in the .dot file (the Op type is always exported).

property enableDistributedReplicatedGraphs

Enable training with Poplar replicated graphs across multiple PopART instances.

property enableEngineCaching

Enable Poplar executable caching.

property enableFloatingPointChecks

Throw an exception when floating point errors occur.

property enableFullyConnectedPass

Enable the global #fullyConnectedPass option for matmuls.

property enableGradientAccumulation

Enable gradient accumulation.

property enableGroupedMatmuls

Enable/disable the grouping of matmuls that are the same shape.

property enableNonStableSoftmax

By default, we use the stable softmax Poplar function. The input tensor to softmax, _x_, is preprocessed by subtracting max(_x_) from each element before computing the exponentials, ensuring numerical stability. If you are sure the inputs to your softmax operations are small enough to not cause overflow when computing the exponential, you can enable the non-stable version instead, to increase the speed.
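The stability trick described here, subtracting max(_x_) before exponentiation, can be sketched in plain Python. This illustrates the mathematics only; it is not PopART's or Poplar's actual implementation:

```python
import math

def softmax_naive(xs):
    # Exponentiates directly: overflows for large inputs (e.g. x = 1000).
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_stable(xs):
    # Subtract max(xs) first: every exponent is <= 0, so exp() cannot
    # overflow, and the result is mathematically identical.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

Both versions agree on small inputs; only the stable version survives inputs like `[1000.0, 0.0]`, which is why it is the default.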

property enableOutlining

Identify and extract repeated parts of computational graph into subgraphs.

property enableOutliningCopyCostPruning

When true the cost of copying of cached sections should be included in the outlining cost model.

property enablePipelining

Enable pipelining of virtual graphs

property enableReplicatedGraphs

Enable replication of graphs.

property enableStableNorm

If true, computes the mean first and subtracts the activations from it before computing the variance. The implementation with this flag set to true is slower than when set to false. The stable version requires the first order moment to be estimated and applied to the sample set before the second order central moment is calculated.
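The numerical difference between the two approaches can be illustrated in plain Python (a sketch of the mathematics, not the Poplar norm implementation):

```python
def variance_naive(xs):
    # One-pass formula E[x^2] - E[x]^2: fast, but suffers catastrophic
    # cancellation when the mean is large relative to the spread.
    n = len(xs)
    mean = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n
    return mean_sq - mean * mean

def variance_stable(xs):
    # Two-pass "stable" formula: estimate the first moment (mean) first,
    # subtract it, then compute the second central moment. Slower, but
    # accurate even when the values are large.
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n
```

With values near 1e8 and a true variance of 2/3, the one-pass version can return a badly wrong result while the two-pass version stays accurate.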

property enableStochasticRounding

Enable stochastic rounding.

property executionPhaseSettings

Configuration settings for execution phases.

property explicitRecomputation

Enable explicit recomputation.

property exportPoplarComputationGraph

Export Poplar computation graph.

property exportPoplarVertexGraph

Export Poplar vertex graph.

property finalDotOp

See #firstDotOp.

property firstDotOp

The ops to write to the .dot file will be a continuous interval of the schedule, controlled by firstDotOp and finalDotOp. In particular, the interval is [max(0, firstDotOp), min(N ops in Ir, finalDotOp)).

property globalReplicationFactor

The total number of replicas in a multi instance replicated graph training session (this should be left as the default value (1) if distributed replicated graphs are disabled). This value includes local replication.

property hostAllReduce

Perform AllReduce operation on the host. Only useful for training session.

property hostAllReduceRemoteBuffer

Enable the use of poplar::RemoteBuffers for hostAllReduce operations.

property hostWeightUpdate

Perform weight update on the host. Only useful for training session.

property instrumentWithHardwareCycleCounter

Add instrumentation to your program to count the number of device cycles (of a single tile, on a single IPU) that your main program takes to execute. Expect this to have a small detrimental impact on performance.

property kahnTieBreaker

The initial scheduling is done with Kahn’s algorithm. When several Ops are free to be scheduled, this controls which method is used.

property logDir

A directory for log traces to be written into.

property mergeVarUpdate

Enable merging of VarUpdates into groups of VarUpdates, by flattening and concatenating variable tensors and updating tensors.

property mergeVarUpdateMemThreshold

The #MergeVarUpdateType::AutoLoose and #MergeVarUpdateType::AutoTight VarUpdateOp merging algorithms have a threshold on the total memory of variable tensors to merge for updating. Defined as total memory in bytes.

property outlineSequenceBreakCost

The penalty applied to outlining potential sub-graphs if the sub-graph to be created breaks up a sequence of operations that are more efficient (for example for overlapping compute and exchange) when outlined together. Default value is set to ~10 * Op::getHighSubgraphValue().

property outlineThreshold

The incremental value that a sub-graph requires, relative to its nested sub-graphs (if any), to be eligible for outlining. A high threshold results in fewer sub-graphs being outlined; a negative value results in all being outlined. The gross value of a sub-graph is the sum of its constituent Ops’ Op::getSubgraphValue() values. To disable outlining, it is better to set enableOutlining to false than to set this value to infinity. The default value of 1.0f results in all high-value operations such as convolution being cached, but standalone low-value operations such as Relu will not be.

property partialsTypeMatMuls

Set the partials type globally for matmuls. Can be overridden individually with Builder.setPartialsType(). Valid values are “float” and “half”. By default, this is not set, so no global partials type is imposed.

property rearrangeAnchorsOnHost

Before anchor tensors are streamed from device to host, they are not necessarily arranged in memory as required when they are to be copied from host stream to host. This rearrangement can be done on the device or on the host. It is done on the host by default to save memory, but often at the expense of cycles, especially for larger anchor tensors.

property replicatedGraphCount

If enableReplicatedGraphs is true, replicatedGraphCount will set the number of model replications. For example, if your model uses 1 IPU, a replicatedGraphCount of 2 will use 2 IPUs. If your model is pipelined across 4 IPUs, a replicatedGraphCount of 4 will use 16 IPUs total. Therefore, the number of IPUs you request must be a multiple of replicatedGraphCount. If the training is done across multiple instances then the replicatedGraphCount is the number of replicas for this instance.
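The IPU-count arithmetic above can be sketched with a small helper (`ipus_required` is illustrative only, not part of the PopART API):

```python
def ipus_required(ipus_per_replica: int, replicated_graph_count: int) -> int:
    # Total IPUs = IPUs used by one copy of the model, times the number
    # of replicas. The number of IPUs requested must therefore be a
    # multiple of replicated_graph_count.
    return ipus_per_replica * replicated_graph_count

# Examples from the description above:
#   1-IPU model, replicatedGraphCount=2  -> 2 IPUs
#   4-IPU pipelined model, replicatedGraphCount=4 -> 16 IPUs
```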

property separateCallOpPdfs

When generating PDFs of IR graphs, create separate PDFs for each subgraph.

property serializedPoprithmsAnnealGraphsDir

PopART uses Poprithms for scheduling PopART graphs. The Poprithms graphs created for scheduling can optionally be serialised (written to file). This string specifies the directory to serialise Poprithms graphs to. If it is empty, the graphs will not be serialised. The names of the serialisation files will be poprithms_anneal_graph_i.json for the lowest non-existing values of i. The directory must already exist; PopART will not create it.

property subgraphCopyingStrategy

This setting determines how copies for inputs and outputs for subgraphs are lowered. By setting this value to JustInTime you may save memory at the cost of fragmenting subgraphs into multiple Poplar functions. This may be particularly useful when a number of weight updates are outlined in one subgraph, as it may prevent multiple weight tensors from being live at the same time inside the subgraph.

property swapLimitScheduler

The maximum number of improving steps allowed by the scheduling algorithm before a solution must be returned.

property syntheticDataMode

Use synthetic data: disable data transfer to/from the host. Set to #SyntheticDataMode::Off to use real data.

property timeLimitScheduler

The maximum allowed time that can be spent searching for a good graph schedule before a solution must be returned.

2.8. Optimizers

class popart_core.Optimizer
getLossScalingVal(self: popart_core.Optimizer) → float

2.8.1. SGD

class popart_core.SGD

Stochastic Gradient Descent (SGD) optimizer.

Akin to any optimizer implementation, this class is responsible for updating each weight tensor ($w$) in the model using the gradient ($g$) of the loss function with respect to the weight as calculated during the backwards pass.

The SGD optimizer has the following state for each weight:

  • velocity ($v$)

The SGD optimizer has the following hyper parameters:

  • learning rate (lr)

  • momentum (mm)

  • weight decay (wd)

  • dampening (dm)

  • velocity scaling (vs)

  • loss scaling (ls)

  • clip norm settings

The values of these parameters can be shared between all weights but some can be overridden with weight-specific values (see SGD::insertSpecific). Hyper parameters are captured using OptimizerValue objects and therefore can be either a constant value or a non-constant value that can be adjusted by the user.

In the following we describe how this optimizer updates a weight using a gradient. In this context, the gradient is the value of the gradient after any gradient accumulation has been performed and after the application of a loss scaling factor to the gradient has been corrected for.

When the optimizer needs to update a weight, $w$, using a gradient, $g$, it first updates the optimizer state as follows:

v' := v * mm + (1 - dm) * (g + wd * w)

Following the update of the optimizer state the optimizer uses said state to update the weight:

w' := w - lr * v'

In addition to the above, the velocity scaling hyper parameter is a scaling factor that can provide improved numerical stability by ensuring the values stored in the optimizer state, $v$, are scaled by this value. When using this parameter, PopART will automatically deal with the artificially scaled velocity value during the weight update, and other hyper parameters do not need to be adjusted.

In addition, the loss scaling hyper parameter is similar in nature to the velocity scaling parameter. It is a scaling value that is applied to the loss gradient at the start of the backwards pass and, at the end of the backwards pass, this scaling is reversed by multiplying the gradients for each weight with the inverse of the loss scaling value prior to updating the optimizer state. Using loss scaling can also improve numerical stability in some cases.

Finally, it is possible to add clip norm settings for this optimizer. These clip norms compute the L2 norm for a group of weights and add a scalar term to the weight update that effectively divides it by the norm (or by a constant value provided as part of the clip norm, whichever is greater).
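The update rule above can be written out in plain Python as a reference sketch of the mathematics. Velocity scaling, loss scaling and clip norms are omitted; this is not PopART's on-device implementation:

```python
def sgd_step(w, v, g, lr, mm, dm, wd):
    """One SGD update for a single scalar weight.

    w: weight, v: velocity state, g: gradient,
    lr: learning rate, mm: momentum, dm: dampening, wd: weight decay.
    """
    # v' := v * mm + (1 - dm) * (g + wd * w)
    v_new = v * mm + (1 - dm) * (g + wd * w)
    # w' := w - lr * v'
    w_new = w - lr * v_new
    return w_new, v_new
```

With mm = dm = wd = 0 this reduces to vanilla gradient descent, w' = w - lr * g.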

dampenings(self: popart_core.SGD) → popart_core.OptimizerValueMap
insertSpecific(self: popart_core.SGD, arg0: str, arg1: dict) → None
learningRates(self: popart_core.SGD) → popart_core.OptimizerValueMap
momentums(self: popart_core.SGD) → popart_core.OptimizerValueMap
velocityScalings(self: popart_core.SGD) → popart_core.OptimizerValueMap
weightDecays(self: popart_core.SGD) → popart_core.OptimizerValueMap

2.8.2. ConstSGD

class popart_core.ConstSGD

Stochastic Gradient Descent (SGD) optimizer with constant learning rate, weight decay, loss scaling and clip norm settings (and default values for momentum, dampening and velocity scaling).

NOTE: See SGD for detailed meaning for these parameters.

NOTE: This class exists for backwards compatibility with the Python API and may be removed at some point in the future.

2.8.3. Adam

class popart_core.Adam

AdamW, Lamb and AdaMax optimizer implementation.

Akin to any optimizer implementation, this class is responsible for updating each weight tensor ($w$) in the model using the gradient ($g$) of the loss function with respect to the weight as calculated during the backwards pass.

The optimizer has the following state for each weight:

  • first-order momentum (m)

  • second-order momentum (v)

  • time step (t)

The optimizer has the following hyper parameters:

  • learning rate (lr)

  • weight decay (wd)

  • beta1

  • beta2

  • epsilon (eps)

  • loss scaling (ls)

  • maximum weight norm (mwn)

The values of these parameters can be shared between all weights but some can be overridden with weight-specific values (see Adam::insertSpecific). Hyper parameters are captured using OptimizerValue objects and therefore can be either a constant value or a non-constant value that can be adjusted by the user.

The values of #AdamMode and #WeightDecayMode passed to the constructor determine how weights are updated (see below).

In the following we describe how this optimizer updates a weight using a gradient. In this context, the gradient is the value of the gradient after any gradient accumulation has been performed and after the application of a loss scaling factor to the gradient has been corrected for.

When the optimizer needs to update a weight, $w$, using a gradient, $g$, it first computes a term g_tmp, which is effectively $g$ with L2 regularization applied if the #WeightDecayMode is set to WeightDecayMode::L2Regularization, as follows:

g_tmp := g              (Decay)
g_tmp := g + wd * w     (L2Regularization)

Secondly, the optimizer updates the optimizer state as follows:

m' := beta1 * m + (1 - beta1) * g_tmp

v' := beta2 * v + (1 - beta2) * g_tmp^2    (Adam/AdamNoBias/Lamb/LambNoBias)
v' := max(beta2 * v, |g_tmp|)              (AdaMax)

t' := t + 1

Next, it computes the following terms:

m_tmp := m'                     (AdamNoBias/LambNoBias)
m_tmp := m' / (1 - beta1^t')    (Adam/Lamb/AdaMax)

v_tmp := v'                     (AdamNoBias/LambNoBias)
v_tmp := v' / (1 - beta2^t')    (Adam/Lamb/AdaMax)

u_tmp := m_tmp / (sqrt(v_tmp) + eps) + wd * w    (Decay)
u_tmp := m_tmp / (sqrt(v_tmp) + eps)             (L2Regularization)

Finally, the optimizer updates the weight as follows:

w' := w - lr * u_tmp                                    (Adam/AdamNoBias/AdaMax)
w' := w - (min(||w||, mwn) / ||u_tmp||) * lr * u_tmp    (Lamb/LambNoBias)

In addition to the above, the loss scaling hyper parameter is similar in nature to the velocity scaling parameter. It is a scaling value that is applied to the loss gradient at the start of the backwards pass and, at the end of the backwards pass, this scaling is reversed by multiplying the gradients for each weight with the inverse of the loss scaling value prior to updating the optimizer state. Using loss scaling can also improve numerical stability in some cases.

NOTE: The maximum weight norm is referred to as phi in [You et al., 2020](https://arxiv.org/abs/1904.00962).
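The update equations above, for the Adam mode with Decay-style weight decay and bias correction, can be written out in plain Python as a reference sketch (loss scaling and the Lamb norm ratio are omitted; `adam_step` is illustrative only, not part of the PopART API):

```python
import math

def adam_step(w, m, v, t, g, lr, wd, beta1, beta2, eps):
    """One update for a single scalar weight (AdamMode Adam, Decay mode)."""
    g_tmp = g                                    # Decay: no L2 term folded into g
    m_new = beta1 * m + (1 - beta1) * g_tmp      # first-order momentum
    v_new = beta2 * v + (1 - beta2) * g_tmp ** 2 # second-order momentum
    t_new = t + 1                                # time step
    m_tmp = m_new / (1 - beta1 ** t_new)         # bias correction
    v_tmp = v_new / (1 - beta2 ** t_new)
    u_tmp = m_tmp / (math.sqrt(v_tmp) + eps) + wd * w
    w_new = w - lr * u_tmp
    return w_new, m_new, v_new, t_new
```

Note that on the very first step the bias correction exactly cancels the (1 - beta) factors, so u_tmp reduces to g / |g| (i.e. the sign of the gradient) when wd = 0 and eps = 0.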

beta1s(self: popart_core.Adam) → popart_core.OptimizerValueMap
beta2s(self: popart_core.Adam) → popart_core.OptimizerValueMap
epss(self: popart_core.Adam) → popart_core.OptimizerValueMap
insertSpecific(self: popart_core.Adam, arg0: str, arg1: dict) → None
learningRates(self: popart_core.Adam) → popart_core.OptimizerValueMap
maxWeightNorms(self: popart_core.Adam) → popart_core.OptimizerValueMap
weightDecays(self: popart_core.Adam) → popart_core.OptimizerValueMap