3. Poplar API reference¶
Utility classes¶
poplar/ArrayRef.hpp¶
References to arrays.

namespace poplar¶ Poplar classes and functions.
Functions

template<class T> class ArrayRef¶ Subclassed by poplar::StringRef
Public Types
Public Functions

constexpr ArrayRef()¶

template<class U, class Alloc, typename std::enable_if<std::is_same<U, T>::value || (std::is_pointer<T>::value && std::is_convertible<U const*, T const*>::value), bool>::type = true>
ArrayRef(const std::vector<U, Alloc> &v)¶

template<class U, typename std::enable_if<std::is_same<U, T>::value || (std::is_pointer<T>::value && std::is_convertible<U const*, T const*>::value), bool>::type = true>
constexpr ArrayRef(const ArrayRef<U> &a)¶

constexpr bool empty() const¶

const_iterator begin() const¶

const_iterator end() const¶

const_iterator cbegin() const¶

const_iterator cend() const¶
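ArrayRef is a non-owning view over a contiguous array: a pointer plus a length, cheap to pass by value and implicitly constructible from a std::vector. The sketch below (MiniArrayRef is an illustrative stand-in, not the Poplar implementation) shows only these core semantics; the real class adds the enable_if'd converting constructors listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal stand-in for poplar::ArrayRef<T>: a non-owning view of a
// contiguous array (pointer + length).
template <class T>
class MiniArrayRef {
public:
  using const_iterator = const T *;

  constexpr MiniArrayRef() : data_(nullptr), size_(0) {}

  // Implicit construction from std::vector, as in the real ArrayRef:
  // the view is only valid while the vector is alive and un-resized.
  template <class Alloc>
  MiniArrayRef(const std::vector<T, Alloc> &v)
      : data_(v.data()), size_(v.size()) {}

  constexpr bool empty() const { return size_ == 0; }
  std::size_t size() const { return size_; }
  const_iterator begin() const { return data_; }
  const_iterator end() const { return data_ + size_; }

private:
  const T *data_;
  std::size_t size_;
};

// Functions can take the view by value: copying it does not allocate.
inline int sum(MiniArrayRef<int> a) {
  int total = 0;
  for (int x : a) total += x;
  return total;
}
```

Because the view does not own its storage, a caller must keep the underlying vector alive for as long as the ArrayRef is in use.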

poplar/Interval.hpp¶

namespace poplar¶ Poplar classes and functions.
Typedefs

typedef GenericInterval<std::size_t> Interval¶
Functions

template<class T> bool operator==(const GenericInterval<T> &a, const GenericInterval<T> &b)¶

template<class T> bool operator<(const GenericInterval<T> &a, const GenericInterval<T> &b)¶

template<class T> bool operator!=(const GenericInterval<T> &a, const GenericInterval<T> &b)¶

template<class T> bool operator>=(const GenericInterval<T> &a, const GenericInterval<T> &b)¶

template<class T> bool operator>(const GenericInterval<T> &a, const GenericInterval<T> &b)¶

template<class T> bool operator<=(const GenericInterval<T> &a, const GenericInterval<T> &b)¶

template<class T> std::ostream &operator<<(std::ostream &os, const GenericInterval<T> &b)¶

template<class T> struct GenericInterval¶  #include <Interval.hpp>
This class represents an interval that is closed at its lower bound and open at its upper bound.
It is almost always used with T = std::size_t, for which there is a convenient Interval typedef.
Public Functions

GenericInterval() = default¶ Initialise with begin and end set to their default value of 0.
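The half-open convention described above means [begin, end) contains begin but not end, and the size is simply end - begin. A minimal self-contained stand-in (MiniInterval is illustrative, not the Poplar definition) makes the semantics concrete:

```cpp
#include <cassert>
#include <cstddef>

// Illustrative stand-in for poplar::GenericInterval<std::size_t>
// (the Interval typedef): closed at the lower bound, open at the
// upper bound, so [begin, end) contains begin but not end.
struct MiniInterval {
  std::size_t begin = 0; // inclusive
  std::size_t end = 0;   // exclusive

  std::size_t size() const { return end - begin; }
  bool contains(std::size_t i) const { return i >= begin && i < end; }
};
```

A handy consequence of the half-open convention is that adjacent intervals [a, b) and [b, c) tile a range with no overlap and no gap.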


poplar/OptionFlags.hpp¶

namespace poplar¶ Poplar classes and functions.
Functions

ProfileValue getAsProfileValue(const OptionFlags &flags)¶

void readJSON(StringRef string, OptionFlags &flags)¶ Read options from a string in JSON format.
Parameters
string: The string to parse.
flags: The OptionFlags to update.
Exceptions
parse_error: if the input cannot be parsed.

void readJSON(std::istream &stream, OptionFlags &flags)¶ Read options from a stream in JSON format.
Parameters
stream: The input stream to read from.
flags: The OptionFlags to update.
Exceptions
parse_error: if the input cannot be parsed.

std::ostream &operator<<(std::ostream &ostream, const OptionFlags &flags)¶ Write the contents of the given flags to an ostream in JSON format.
Parameters
ostream: The stream to write to.
flags: The OptionFlags to write.

class OptionFlags¶  #include <OptionFlags.hpp>
A set of option/value string flags to be used in various APIs.
Public Types

using initializer_list = std::initializer_list<OptionFlag>¶
Public Functions

OptionFlags()¶ Construct a set of option flags.
The default constructor creates an empty set of flags.

~OptionFlags()¶

OptionFlags(const OptionFlags &other)¶

OptionFlags(OptionFlags &&other) noexcept¶

OptionFlags &operator=(const OptionFlags &other)¶

OptionFlags &operator=(OptionFlags &&other) noexcept¶

bool operator==(const OptionFlags &other) const¶ Option flags are an exact match.
Each collection contains the same keys, and both collections have the same values for each key.

OptionFlags(initializer_list &&list)¶ Construct a set of option flags from an initializer list of string pairs.
Flags are set in the order they appear in the constructor. Setting a flag more than once will result in the previous value for that option being overwritten.
Parameters
list: A list of option/value string pairs to set in the flags.

void set(initializer_list &&list)¶ Set option flags from an initializer list of string pairs.
Flags are set in the order they appear in the list. If an option was already set in these flags, or appears more than once in the list, the previous value is overwritten.
Parameters
list: A list of option/value string pairs to set in the flags.

void set(StringRef option, StringRef value)¶ Set a single option to a value.
If the option was already set in these flags then the previous value will be overwritten.
Parameters
option: The option to set in the flags.
value: The value to set the option to in the flags.

StringRef at(StringRef option) const¶ Retrieves the value of the given option.
If the option does not exist, then an exception is thrown.
Parameters
option: The option to retrieve from the flags.

void clear()¶ Remove all set flags.
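The overwrite-on-set and throw-on-missing behaviours documented above can be modelled with a std::map. This is a self-contained analogue for illustration (MiniOptionFlags is not the Poplar class; the real one also supports JSON I/O and iteration):

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Illustrative analogue of poplar::OptionFlags: set() overwrites any
// previous value, and at() throws when the option is missing.
class MiniOptionFlags {
public:
  using OptionFlag = std::pair<std::string, std::string>;

  MiniOptionFlags() = default;
  MiniOptionFlags(std::initializer_list<OptionFlag> list) { set(list); }

  // Later entries in the list overwrite earlier ones with the same key.
  void set(std::initializer_list<OptionFlag> list) {
    for (const auto &kv : list) flags_[kv.first] = kv.second;
  }
  void set(const std::string &option, const std::string &value) {
    flags_[option] = value;
  }
  const std::string &at(const std::string &option) const {
    return flags_.at(option); // throws std::out_of_range if absent
  }
  void clear() { flags_.clear(); }

private:
  std::map<std::string, std::string> flags_;
};
```

The in-order overwrite rule means the last occurrence of a key in an initializer list always wins.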

class iterator : public std::iterator<std::forward_iterator_tag, OptionFlag>¶
Public Functions

~iterator()¶

const OptionFlag &operator*() const¶

const OptionFlag *operator->() const¶
Friends
 friend class OptionFlags



namespace core¶

poplar/RandomSeed.hpp¶

namespace poplar¶ Poplar classes and functions.
Functions

Tensor getHwSeeds(Graph &graph, program::Sequence &prog, const DebugContext &debugContext = {})¶ Gets a snapshot of the hardware seeds for each worker in the device.
Return
A tensor of shape {number of tiles, number of worker contexts, 4}, containing seed values for each of the 4 PRNG_x_y registers for each worker context on each tile.
Parameters
graph: The Poplar graph.
prog: The program sequence to be extended.
debugContext: Optional debug information.

void setHwSeeds(Graph &graph, const Tensor &hwSeeds, program::Sequence &prog, const DebugContext &debugContext = {})¶ Sets the hardware seeds for each worker in a device from a snapshot of the seeds.
Parameters
graph: The Poplar graph.
hwSeeds: A tensor of shape {number of tiles, number of worker contexts, 4} containing seed values for each of the 4 PRNG_x_y registers for each worker context on each tile.
prog: The program sequence to be extended.
debugContext: Optional debug information.

poplar/ReplicatedStreamMode.hpp¶

namespace poplar¶ Poplar classes and functions.
poplar/SerializationFormat.hpp¶

namespace poplar¶ Poplar classes and functions.
poplar/StringRef.hpp¶
poplar/SyncType.hpp¶

namespace poplar¶ Poplar classes and functions.
Enums

enum SyncType¶ An enumeration used to state what type of synchronisation a Sync program represents.
Values:

enumerator INTERNAL¶ Each tile waits until all the other tiles in the same IPU reach the Sync program before continuing.

enumerator EXTERNAL¶ Each tile waits until all the other tiles in all IPUs in the device reach the Sync program before continuing.

enumerator GLOBAL¶ Each tile waits until all the other tiles in all IPUs globally reach the Sync program before continuing.

poplar/TypeTraits.hpp¶

namespace poplar¶ Poplar classes and functions.

struct TypeTraits¶  #include <TypeTraits.hpp>
A structure to provide information about arithmetic (integer and floating point) types.
Public Functions

bool isSimpleType() const¶

template<> TypeTraits make()¶

template<> constexpr bool isSimpleType()¶
Public Static Functions

template<typename T> TypeTraits make()¶

template<typename T> constexpr bool isSimpleType()¶ Return true if it is a basic numeric type, i.e.
std::is_integral<> or std::is_floating_point<> is true, or it is IeeeHalf.
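The description above maps directly onto the standard type traits. A sketch of how such a predicate can be expressed (IeeeHalfStandIn is a placeholder here, since Poplar's own IeeeHalf class is not available in this snippet):

```cpp
#include <cassert>
#include <type_traits>

// Placeholder for poplar::IeeeHalf, used only to illustrate the rule.
struct IeeeHalfStandIn {};

// One plausible expression of the documented predicate: a type is
// "simple" if it is integral, floating point, or the half type.
template <typename T>
constexpr bool isSimpleTypeSketch() {
  return std::is_integral<T>::value || std::is_floating_point<T>::value ||
         std::is_same<T, IeeeHalfStandIn>::value;
}
```

Note that pointers, references, and class types (other than the half type) all fail the predicate.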

poplar/CSRFunctions.hpp¶
Functions to configure the floating-point behaviour of the tiles by programming the Control and Status Registers (CSRs).

namespace poplar¶ Poplar classes and functions.
Functions

void setFloatingPointBehaviour(poplar::Graph &graph, poplar::program::Sequence &prog, const FloatingPointBehaviour &behaviour, const DebugContext &debugContext = {})¶ Set the floating point behaviour of a tile.
Configures the floating point behaviour of a tile, affecting the treatment of exceptions and selecting stochastic rounding according to the passed behaviour structure.
Note that, in Poplar, stochastic rounding is disabled by default until either this function, setStochasticRounding() or the Engine options are used to enable it.
Parameters
graph: The Poplar graph.
prog: The program to be extended.
behaviour: A structure of type FloatingPointBehaviour.
debugContext: Optional debug information.

void setFloatingPointBehaviour(poplar::Graph &graph, poplar::program::Sequence &prog, const poplar::Tensor &behaviour, const DebugContext &debugContext = {})¶ Set the floating point behaviour of a tile.
Configures the floating point behaviour of a tile, affecting the treatment of exceptions and selecting stochastic rounding according to the passed behaviour tensor.
The behaviour tensor must be one returned by getAndModifyFloatingPointBehaviour.
Parameters
graph: The Poplar graph.
prog: The program to be extended.
behaviour: A tensor containing a representation of FloatingPointBehaviour.
debugContext: Optional debug information.

void setStochasticRounding(poplar::Graph &graph, poplar::program::Sequence &prog, bool behaviour, const DebugContext &debugContext = {})¶ Set stochastic rounding on or off for the selected tile.
Configures the stochastic rounding operation of a tile according to the passed behaviour parameter.
Note that, in Poplar, stochastic rounding is disabled by default until either this function, setFloatingPointBehaviour() or the Engine options are used to enable it.
Parameters
graph: The Poplar graph.
prog: The program to be extended.
behaviour: Select stochastic rounding: true or false.
debugContext: Optional debug information.

poplar::Tensor getAndModifyFloatingPointBehaviour(poplar::Graph &graph, poplar::program::Sequence &prog, const FloatingPointBehaviour &clear, const FloatingPointBehaviour &set, const DebugContext &debugContext = {})¶ Get state and modify floating point behaviour on every tile that belongs to the target in the graph.
Returns the previous state and modifies behaviour. Behaviour modification first clears the behaviour selected in clear, followed by setting the behaviour selected in set.
To avoid unexpected numerical behaviour, the recommended usage of this function is:
auto state = getAndModifyFloatingPointBehaviour(...)
// operations that require the modified FP behaviour
...
setFloatingPointBehaviour(state)
Return
State before FP behaviour is modified.
Parameters
graph: The Poplar graph.
prog: The program to be extended.
clear: Select behaviour to clear; fields that are set (true) are the ones cleared. For example, if clear.inv is true, the inv flag is cleared; if false, the behaviour is unchanged.
set: Select behaviour to set. Setting always follows clearing. A field is only set if it is true.
debugContext: Optional debug information.

struct FloatingPointBehaviour¶  #include <CSRFunctions.hpp>
Structure to specify floating point behaviour.
Parameters
inv: If true, a floating-point invalid operation (defined by IEEE 754) will cause an exception. The invalid operations are:
Addition or subtraction where the operands are + or − infinity (inf) and the operation results in the subtraction of two infs; for example: (−inf)+(+inf) or (+inf)−(+inf).
Divisions: (±0)/(±0) and (±inf)/(±inf).
Multiplications: (±0)*(±inf) and (±inf)*(±0).
Remainder: x REM y where y=0 or x=(±inf).
Real operations with complex results, such as the square root or logarithm of a negative number.
Operations with Not-a-Number as at least one operand.
Comparisons where one of the operands is Not-a-Number.
See also nanoo below.
div: If true, a floating point divide-by-zero operation will cause an exception.
oflo: If true, a floating point overflow will cause an exception.
esr: Enable stochastic rounding.
nanoo: Enable Not-a-Number on overflow mode. When enabled, half-precision calculations that have overflowed will produce a Not-a-Number result, rather than saturating to the half-precision max/min value, and the invalid operation (inv) flag will be set.
Public Functions

FloatingPointBehaviour(bool inv, bool div0, bool oflo, bool esr, bool nanoo)¶

FloatingPointBehaviour() = default¶

FloatingPointBehaviour operator!() const¶
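The clear-then-set ordering documented for getAndModifyFloatingPointBehaviour can be modelled in isolation. The sketch below (MiniFpBehaviour and applyClearThenSet are illustrative stand-ins; nothing here touches a hardware CSR) shows why setting always wins when both structures select the same field:

```cpp
#include <cassert>

// Illustrative model of the documented behaviour-modification rule:
// fields of `clear` that are true are cleared first, then fields of
// `set` that are true are set.
struct MiniFpBehaviour {
  bool inv = false, div0 = false, oflo = false, esr = false, nanoo = false;
};

inline MiniFpBehaviour applyClearThenSet(MiniFpBehaviour state,
                                         const MiniFpBehaviour &clear,
                                         const MiniFpBehaviour &set) {
  // Clear first...
  if (clear.inv) state.inv = false;
  if (clear.div0) state.div0 = false;
  if (clear.oflo) state.oflo = false;
  if (clear.esr) state.esr = false;
  if (clear.nanoo) state.nanoo = false;
  // ...then set, so `set` wins when both select the same field.
  if (set.inv) state.inv = true;
  if (set.div0) state.div0 = true;
  if (set.oflo) state.oflo = true;
  if (set.esr) state.esr = true;
  if (set.nanoo) state.nanoo = true;
  return state;
}
```

Keeping the returned previous state and restoring it afterwards, as the recommended usage above suggests, ensures the modified behaviour does not leak into unrelated code.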

Exceptions¶
poplar/exceptions.hpp¶

namespace poplar¶ Poplar classes and functions.
Enums

enum RecoveryAction¶ An enumeration that specifies how to recover from a recoverable_runtime_error.
Values:

enumerator IPU_RESET¶ Reset the IPU and reload IPU memory.

enumerator PARTITION_RESET¶ Reset the IPU partition. This retrains the IPU-links between IPUs.

enumerator FULL_RESET¶ Power cycle the system.

Functions

std::string toString(RecoveryAction recoveryAction)¶ Convert the recovery action to a string.

std::ostream &operator<<(std::ostream &os, RecoveryAction recoveryAction)¶

struct application_runtime_error : public poplar::runtime_error¶  #include <exceptions.hpp>
This exception is thrown when running a program fails due to an error in the program or a misuse of an API.

struct control_program_error : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown when the construction of a graph program is invalid.

struct file_load_error : public poplar::poplar_error¶

struct graph_connection_error : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown during construction of an Engine object if there is an error in the structure of the graph, for example, if there are no edges to a vertex input or if there are multiple edges to a vertex input.

struct graph_cycle_error : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown during the construction of an Engine object if there are any cycles in the graph that are not broken by recurrent edges.

struct graph_memory_allocation_error : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown when a memory allocation fails.
Public Functions

graph_memory_allocation_error(const char *s)¶


struct graph_object_creation_error : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown in the construction of a GraphProgEnv object if there was an error in the creation of the graph program object file.

struct graph_object_load_error : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown in the construction of a GraphProgEnv object if there was an error in loading the graph program object file.

struct graph_program_compilation_error : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown in the construction of a GraphProgEnv object if there are any compilation errors in the graph program.

struct graph_replication_error : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown when an invalid operation is carried out on a replicated graph.

struct index_error : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown if the index of a subscript is out of the bounds of the field it is accessing, or if an index of a tensor is invalid.

struct invalid_machine_model : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown when an invalid model of the IPU (for performance model profiling) has been specified.

struct invalid_option : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown when an unrecognised or invalid option is passed to a Poplar API.

struct invalid_tile_mapping : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown when the tile mapping passed to the UserTilePartitioner is invalid.

struct link_error : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown when the linking stage for codelets fails.
output is the output from the linker command.
Public Functions

link_error(const char *s, const char *out = "")¶


struct memory_elem_constraints_error : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown when an invalid memory element constraint has been provided in a codelet.

struct missing_perf_estimate : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown when an Engine is constructed with profiling enabled but a vertex does not have a getPerfEstimate method specified.

struct no_environment : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown, in the construction of a GraphProgEnv object, in mixed-mode compilation, if there is no graph-programming environment available; in particular, if the program has not been compiled with the 'popc' command-line tool.

struct no_size_specified : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown if the size of a field is not specified in a Graph object when an EngineBuilder object is constructed.

struct overflow_error : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown when an arithmetic overflow occurs within Poplar.

struct parse_error : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown when an input file or string cannot be parsed.

struct poplar_error : public runtime_error¶  #include <exceptions.hpp>
Base class for Poplar exceptions.
Subclassed by poplar::control_program_error, poplar::file_load_error, poplar::graph_connection_error, poplar::graph_cycle_error, poplar::graph_memory_allocation_error, poplar::graph_object_creation_error, poplar::graph_object_load_error, poplar::graph_program_compilation_error, poplar::graph_replication_error, poplar::index_error, poplar::invalid_machine_model, poplar::invalid_option, poplar::invalid_tile_mapping, poplar::link_error, poplar::memory_elem_constraints_error, poplar::missing_perf_estimate, poplar::no_environment, poplar::no_size_specified, poplar::overflow_error, poplar::parse_error, poplar::profiling_disabled, poplar::runtime_error, poplar::stream_connection_error, poplar::stream_memory_allocation_error, poplar::symbol_error, poplar::tensor_creation_error, poplar::tensor_io_state_error, poplar::type_error, poplar::unknown_field, poplar::unknown_vertex_type

struct profiling_disabled : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown if profiling information is requested from an Engine but that Engine has not been constructed with profiling enabled.
Public Functions

profiling_disabled()¶


struct recoverable_runtime_error : public poplar::system_runtime_error¶  #include <exceptions.hpp>
This exception is thrown when running a program fails due to a system error that is likely to be transient.
getRecoveryAction() indicates what to do to recover from this error.
Public Functions

RecoveryAction getRecoveryAction() const¶ Return the action required to recover from the error.

recoverable_runtime_error(RecoveryAction recoveryAction, const std::string &s)¶

recoverable_runtime_error(RecoveryAction recoveryAction, const char *s)¶
Private Members

RecoveryAction recoveryAction¶


struct runtime_error : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown when running a program on a system fails.
Subclassed by poplar::application_runtime_error, poplar::system_runtime_error

struct stream_connection_error : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown when an invalid attempt is made to connect a data stream.

struct stream_memory_allocation_error : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown when allocation of stream buffers fails.

struct symbol_error : public poplar::poplar_error¶

struct system_runtime_error : public poplar::runtime_error¶  #include <exceptions.hpp>
This exception is thrown when running a program fails due to an error in the system it is running on.
Subclassed by poplar::recoverable_runtime_error, poplar::unknown_runtime_error, poplar::unrecoverable_runtime_error

struct tensor_creation_error : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown in the construction of a tensor if invalid arguments are provided to the tensor creation function or method.

struct tensor_io_state_error : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown when an attempt is made to mark a tensor as an input or output, but the argument references a view of a tensor rather than a whole tensor.

struct type_error : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown when there is an error related to the field types of vertices, for example, when the source of an edge contains an input, when the types of the input and source field of an edge do not match, or when a field cannot be subscripted.

struct unknown_field : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown when a field name is specified that does not exist in the graph-programming environment.

struct unknown_runtime_error : public poplar::system_runtime_error¶  #include <exceptions.hpp>
This exception is thrown when execution fails due to a system error where the cause cannot be automatically determined, for example a timeout without a specific error being raised.

struct unknown_vertex_type : public poplar::poplar_error¶  #include <exceptions.hpp>
This exception is thrown when a vertex type name is specified that does not exist in the graph programming environment.

struct unrecoverable_runtime_error : public poplar::system_runtime_error¶  #include <exceptions.hpp>
This exception is thrown when execution fails due to a system error that is likely to persist.
The system should be taken out of operation for analysis and repair.
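A caller handling these exceptions typically catches recoverable_runtime_error, reads getRecoveryAction(), and dispatches on it. The sketch below uses self-contained stand-ins shaped like the classes documented above (mini_recoverable_error and MiniRecoveryAction are illustrative, not the real Poplar definitions):

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Stand-in for poplar::RecoveryAction.
enum class MiniRecoveryAction { IPU_RESET, PARTITION_RESET, FULL_RESET };

// Stand-in for poplar::recoverable_runtime_error: a runtime error that
// carries the action required to recover from it.
struct mini_recoverable_error : std::runtime_error {
  mini_recoverable_error(MiniRecoveryAction a, const std::string &s)
      : std::runtime_error(s), action(a) {}
  MiniRecoveryAction getRecoveryAction() const { return action; }
  MiniRecoveryAction action;
};

// Dispatch on the recovery action, mirroring the enumerator docs above.
inline std::string describe(MiniRecoveryAction a) {
  switch (a) {
  case MiniRecoveryAction::IPU_RESET:
    return "reset the IPU and reload IPU memory";
  case MiniRecoveryAction::PARTITION_RESET:
    return "reset the IPU partition";
  case MiniRecoveryAction::FULL_RESET:
    return "power cycle the system";
  }
  return "unknown";
}
```

Because the real recoverable_runtime_error derives from system_runtime_error (and ultimately poplar_error), a catch block for the recoverable case should appear before any catch of the broader base classes.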

Graph classes¶
poplar/CodeletFileType.hpp¶

namespace poplar¶ Poplar classes and functions.
Enums

enum CodeletFileType¶ Values:

enumerator PreprocessedAsmSource¶ A graph assembly language source file.

enumerator AsmSource¶ A graph assembly language file with preprocessor macros.

enumerator CSource¶ A graph C source file.

enumerator CppSource¶ A graph C++ source file.

enumerator IrSource¶ A graph LLVM IR source file.

enumerator Object¶ A graph program object file.

enumerator Auto¶ Auto-detect based on file name.

Functions

CodeletFileType getCodeletFileType(const char *path)¶

std::string getExtensionFromFileType(CodeletFileType type)¶

std::string getLanguageOption(CodeletFileType type)¶

poplar/DataStream.hpp¶

namespace poplar¶ Poplar classes and functions.

class DataStream¶  #include <DataStream.hpp>
An object representing a stream for communicating between the host and the device.
A stream is a unidirectional communication channel, either from the host to the device or from the device to the host.
The maximum buffer size for each stream is 128 MBytes.
Public Functions

DataStream()¶

DataStream(const DataStream&)¶

DataStream(DataStream&&) noexcept¶

~DataStream()¶

DataStream &operator=(const DataStream&)¶

DataStream &operator=(DataStream&&) noexcept¶

unsigned replicationFactor() const¶

ReplicatedStreamMode replicatedMode() const¶

DataStreamType type() const¶


class RemoteBuffer¶  #include <DataStream.hpp>
A remote buffer is a region of remote (meaning not on the IPU) memory that is used as a cache.
It is implemented as two DataStreams: one to write to the remote memory, the other to read the data back to the IPU.
Public Functions

RemoteBuffer()¶

RemoteBuffer(const RemoteBuffer&)¶

RemoteBuffer(RemoteBuffer&&) noexcept¶

~RemoteBuffer()¶

RemoteBuffer &operator=(const RemoteBuffer&)¶

RemoteBuffer &operator=(RemoteBuffer&&) noexcept¶

DataStream getIpuToHostStream() const¶

DataStream getHostToIpuStream() const¶

size_t numElements() const¶

size_t getRepeats() const¶

bool isRearrangeOnHost() const¶

bool isOptimisedForMemory() const¶

bool operator==(const RemoteBuffer &b) const¶

bool operator!=(const RemoteBuffer &b) const¶


namespace core¶
poplar/DataStreamType.hpp¶

namespace poplar¶ Poplar classes and functions.
Enums

enum DataStreamType¶ An enumeration to represent the different types of DataStream or stream components of a RemoteBuffer.
Values:

enumerator HostToDeviceFIFO¶ A DataStream from host to device.

enumerator DeviceToHostFIFO¶ A DataStream from device to host.

enumerator HostToDeviceBuffer¶ A stream from host to device in a remote buffer.

enumerator DeviceToHostBuffer¶ A stream from device to host in a remote buffer.

Functions

bool isDeviceToHost(DataStreamType type)¶

bool isHostToDevice(DataStreamType type)¶

bool isRemoteBuffer(DataStreamType type)¶
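Given the four enumerators above, each helper classifies a stream along one of two axes: direction (host-to-device vs device-to-host) and backing (FIFO vs remote buffer). The sketch below is a self-contained plausible implementation of that classification (MiniDataStreamType mirrors the documented enumerators; it is not the Poplar source):

```cpp
#include <cassert>

// Enumerators mirroring the documented DataStreamType values.
enum class MiniDataStreamType {
  HostToDeviceFIFO,
  DeviceToHostFIFO,
  HostToDeviceBuffer,
  DeviceToHostBuffer,
};

// Direction axis: device-to-host streams.
inline bool isDeviceToHost(MiniDataStreamType t) {
  return t == MiniDataStreamType::DeviceToHostFIFO ||
         t == MiniDataStreamType::DeviceToHostBuffer;
}
// Direction axis: host-to-device streams.
inline bool isHostToDevice(MiniDataStreamType t) {
  return t == MiniDataStreamType::HostToDeviceFIFO ||
         t == MiniDataStreamType::HostToDeviceBuffer;
}
// Backing axis: stream components of a RemoteBuffer.
inline bool isRemoteBuffer(MiniDataStreamType t) {
  return t == MiniDataStreamType::HostToDeviceBuffer ||
         t == MiniDataStreamType::DeviceToHostBuffer;
}
```

Every enumerator satisfies exactly one direction predicate, and the two Buffer enumerators additionally satisfy isRemoteBuffer.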

poplar/Graph.hpp¶

namespace poplar¶ Poplar classes and functions.

class Graph¶  #include <Graph.hpp>
This class represents a graph program to be executed on the IPU.
Public Types

using TraceFn = std::function<void()>¶ Record some compilation time as part of the graph construction phase.
Deprecated:
Tracing via Poplar is deprecated and will be removed.
Parameters
name: The name of the phase. This can be composed of multiple parts.
fn: The construction code to be timed.
Public Functions

Graph()¶

Graph(const Target &target, replication_factor r = replication_factor(1))¶ Construct a graph object.
This constructor creates a Graph object using the given graph programming environment.
Parameters
target: The target the graph is being constructed to work with.
r: Number of times the graph is to be replicated (default is no replication).

Graph(const Device &device, replication_factor r = replication_factor(1))¶ Construct a graph object.
This constructor creates a Graph object using the given graph programming environment.
Parameters
device: The device the graph is being constructed to work with.
r: Number of times the graph is to be replicated (default is no replication).

~Graph()¶

bool addCodelets(StringRef src, CodeletFileType type = CodeletFileType::Auto, StringRef compileFlags = "")¶ Add a codelet to the graph.
A codelet is either a C, C++, or assembly source file, or a .gp object file. If a source file is given it is compiled for the graph's target and then loaded into the graph. If it is an object file then it is loaded into the graph.
Symbols that codelets use are not resolved until the engine is built, so codelets can use symbols from each other by calling addCodelets() for each source or object file (or passing a list of files as a vector).
Return
True if the codelet is added to the graph successfully, or false if the codelet already existed in the graph.
Parameters
src: The path to a source or object file containing codelets.
type: Specify the type of the codelet (source or precompiled). If CodeletFileType::Auto is used, the type is determined from the filename extension.
compileFlags: Additional flags to pass to the compiler if using source code. For example, -g to generate debug info.

bool addCodelets(StringRef src, CodeletFileType type, StringRef compileFlags, std::ostream &compileOutput)¶ Add a codelet to the graph and write error messages from the compilation process to the given output stream.
By default they are printed to cerr.

bool addCodelets(ArrayRef<std::string> xs, StringRef compileFlags = "")¶ Add a set of codelets to the graph.
These codelets can depend on each other. For example, symbols defined in one can be used by any other. The order is not important.
Return
True if all the codelets are added successfully, or false if any of the codelets are not added because they already exist in the graph.

void addCodelets(std::stringstream &stream, StringRef compileFlags = "", CodeletFileType type = CodeletFileType::CppSource)¶ Take a codelet contained within the stream and store it in a temporary file, which is then used to compile the codelet.
The language type of the codelet in the stream can be specified, defaulting to C++.
Note that this is not idempotent; in other words, this function will throw an exception if called twice with the same stream, unlike the overload that takes a file path instead.

void addCodelets(std::stringstream &stream, StringRef compileFlags, std::ostream &compileOutput, CodeletFileType type = CodeletFileType::CppSource)¶

VertexRef addVertex(ComputeSet cs, StringRef vertexType)¶ Add a vertex to the graph.
Parameters
cs: The compute set to add the vertex to.
vertexType: The name of the type of the vertex. This must be a declared vertex type in the graph programming environment used to create the graph builder.

VertexRef addVertex(ComputeSet cs, StringRef vertexType, ArrayRef<ConnectionDesc> connections)¶ Add a vertex to the graph and connect graph elements to some of its fields.
This variant of addVertex allows you to pass in a list of connection descriptions to connect graph elements to fields of the newly created vertex. The connection descriptions can be initialized with:
{ string, Tensor }: connect a tensor to a field.
{ string, FieldRef, bool }: connect a vertex field to a field.
{ string, T v }: connect a constant value to an input field.
For example, the following:
addVertex(cs, "MyVertex", {{"x", tensor[4]}, {"y", v["z"], false}});
will create a vertex, connect a tensor to its x field, and connect the vertex field v["z"] to its y field.
Parameters
cs: The compute set to add the vertex to.
vertexType: The name of the type of the vertex. This must be a declared vertex type in the graph programming environment used to create the graph builder.
connections: A list of connection descriptions.

VertexRef addExternalExchangeVertex(ComputeSet cs, StringRef vertexType, unsigned incomingDownCount, bool usesEastEdge, bool sendsXReq)¶ Add an external exchange vertex to the graph.
A compute set can contain at most one external exchange vertex per tile. External exchange vertices cannot be mixed with non-external exchange vertices in the same compute set. Before an external vertex is called we set the INCOMING_DCOUNT and INCOMING_MUX registers and synchronize all tiles containing external exchange vertices.
Parameters
cs: The compute set to add the vertex to.
vertexType: The name of the type of the vertex. This must be a declared vertex type in the graph programming environment used to create the graph builder.
incomingDownCount: The value to set the INCOMING_DCOUNT register to.
usesEastEdge: Whether the vertex uses an east edge exchange block. The INCOMING_MUX register is set to point to either the east edge or west edge depending on this argument.
sendsXReq: Whether this vertex is responsible for sending the XREQ packet. There must be at most one tile per exchange block context that sends the XREQ, and the tile must be the same in every compute set containing external exchange vertices.

Tensor addVariable(const Type &type, ArrayRef<std::size_t> shape, const DebugContext &debugContext = {})¶ Add a variable to the graph.
If using this function with a target with multiple tiles then the variable will initially have no tile mapping. It is expected that the tile mapping will be set later with Graph::setTileMapping(). If the target of the graph has only one tile then the tensor will be automatically mapped to that tile.
Return
A tensor referring to the variable in the graph.
Parameters
type: The type of the elements of the variable.
shape: The shape of the variable.
debugContext: Optional debug information.
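The shape argument determines the element count of the resulting tensor: the product of the dimensions, with an empty shape describing a scalar. A short sketch of that relationship (numElementsFromShape is a hypothetical helper, not part of the Poplar API):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Number of elements described by a shape: the product of its
// dimensions. An empty shape is a scalar, which has one element.
inline std::size_t numElementsFromShape(const std::vector<std::size_t> &shape) {
  return std::accumulate(shape.begin(), shape.end(), std::size_t{1},
                         std::multiplies<std::size_t>());
}
```

This is the same quantity that RemoteBuffer::numElements() reports for a remote buffer, and it determines how much data a stream connected to the variable must transfer.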

Tensor addVariable(const Type &type, ArrayRef<std::size_t> shape, VariableMappingMethod mappingMethod, const DebugContext &debugContext = {})¶ Add a variable to the graph.
Return
A tensor referring to the variable in the graph.
Parameters
type: The type of the elements of the variable.
shape: The shape of the variable.
mappingMethod: The method to use to initially map the variable to tiles.
debugContext: Optional debug information.

template<typename
T
>
TensoraddConstant
(const Type &type, ArrayRef<std::size_t> shape, ArrayRef<T> values, const DebugContext &debugContext = {"<const>"})¶ Add a constant to the graph.
A constant tensor is a tensor with every element initialized.
 Parameters
 - type: The type of the elements of the constant.
 - shape: The shape of the constant.
 - values: Vector of values to initialize tensor elements to.
 - name: An optional name to identify the variable for debugging/profiling purposes.

template<typename T>
Tensor addConstant
(const Type &type, ArrayRef<std::size_t> shape, T val, const DebugContext &debugContext = {"<const>"}, typename std::enable_if<TypeTraits::isSimpleType<T>()>::type* = nullptr)¶ Add a constant to the graph.
A constant tensor is a tensor with every element initialized to the same value. It cannot be connected to a vertex output.
 Parameters
 - type: The type of the elements of the constant.
 - shape: The shape of the constant.
 - val: The value to initialize tensor elements to.
 - name: An optional name to identify the variable for debugging/profiling purposes.
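A hedged sketch of the scalar-broadcast and per-element overloads (assuming an existing Graph `graph`; values and tile numbers are illustrative):

```cpp
// Every element of `ones` is initialized to 1.0f (broadcast overload).
poplar::Tensor ones = graph.addConstant(poplar::FLOAT, {8}, 1.0f);
// Each element of `c` takes the corresponding value from `data`.
std::vector<int> data = {1, 2, 3, 4};
poplar::Tensor c = graph.addConstant<int>(poplar::INT, {4}, data);
// Constants still need a tile mapping before use.
graph.setTileMapping(ones, 0);
graph.setTileMapping(c, 0);
```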

template<typename T>
Tensor addConstant
(const Type &type, ArrayRef<std::size_t> shape, const T *val, const DebugContext &debugContext = {"<const>"}, typename std::enable_if<TypeTraits::isSimpleType<T>()>::type* = nullptr)¶ Add a constant to the graph with multiple cell values.
A constant tensor is a tensor with every element initialized. It cannot be connected to a vertex output.
 Parameters
 - type: The type of the elements of the constant.
 - shape: The shape of the constant.
 - val: A pointer to the values to initialize tensor elements to.
 - name: An optional name to identify the variable for debugging/profiling purposes.

Tensor
addConstant
(const Type &type, ArrayRef<std::size_t> shape, const void *val, const TypeTraits &traits, bool broadcast, const DebugContext &debugContext = {"<const>"})¶

Tensor
addConstantHalf
(const Type &type, ArrayRef<std::size_t> shape, uint16_t val, const DebugContext &debugContext = {"<const>"})¶ Add a constant to the graph, where the host data is type IEEE half.
A constant tensor is a tensor with every element initialized to the same value. It cannot be connected to a vertex output.
 Parameters
 - type: The type of the elements of the constant.
 - shape: The shape of the constant.
 - val: The value to initialize tensor elements to.

Tensor
addConstantHalf
(const Type &type, ArrayRef<std::size_t> shape, const uint16_t *val, const DebugContext &debugContext = {"<const>"})¶ Add a constant to the graph with multiple cell values, where the host data is type IEEE half.
A constant tensor is a tensor with every element initialized. It cannot be connected to a vertex output.
 Parameters
 - type: The type of the elements of the constant.
 - shape: The shape of the constant.
 - val: A pointer to the values to initialize tensor elements to.

Tensor
clone
(const Type &type, const Tensor &t, const DebugContext &debugContext = {}, TensorCloneMethod method = TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES)¶ Add a tensor to the graph that has the same size and tile mapping as Tensor t.
 Parameters
 - type: The element type of the new tensor.
 - t: The tensor to be cloned.
 - name: A debug name to give to any new tensors allocated in the graph during the clone. If this is empty then the debug names will be derived from existing tensor debug names.
 - method: The method to use for the cloning (decides whether to preserve ordering/aliasing in the new tensor).

Tensor
cloneN
(const Type &type, const Tensor &t, std::size_t N, const DebugContext &debugContext = {}, TensorCloneMethod method = TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES, TensorCloneDuplicationMethod duplicationMethod = TensorCloneDuplicationMethod::DUPLICATE_BY_OUTER_DIMENSION)¶ Clone a tensor N times.
Given a tensor of shape [D1, D2, … Dn], this function will create a new tensor of shape [N, D1, D2, …, Dn] where each of the N subtensors is a clone of the original tensor (meaning it has the same layout and tile mapping).
 Parameters
 - type: The element type of the new tensor.
 - t: The tensor to clone.
 - N: The replication factor to clone with.
 - name: The name for the new variables created.
 - method: The tensor cloning method (see Graph::clone()).
 - duplicationMethod: The behaviour used when a tensor is cloned.

Tensor
clone
(const Tensor &t, const DebugContext &debugContext = {}, TensorCloneMethod method = TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES)¶ Add a tensor to the graph that has the same size and tile mapping as Tensor t.
 Parameters
 - t: The tensor to be cloned.
 - name: A debug name to give to any new tensors allocated in the graph during the clone. If this is empty then the debug names will be derived from existing tensor debug names.
 - method: The method to use for the cloning (decides whether to preserve ordering/aliasing in the new tensor).

Tensor
cloneN
(const Tensor &t, std::size_t N, const DebugContext &debugContext = {}, TensorCloneMethod method = TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES, TensorCloneDuplicationMethod duplicationMethod = TensorCloneDuplicationMethod::DUPLICATE_BY_OUTER_DIMENSION)¶ Clone a tensor N times.
Given a tensor of shape [D1, D2, … Dn], this function will create a new tensor of shape [N, D1, D2, …, Dn] where each of the N subtensors is a clone of the original tensor (meaning it has the same layout and tile mapping).
 Parameters
 - t: The tensor to clone.
 - N: The replication factor to clone with.
 - name: The name for the new variables created.
 - method: The tensor cloning method (see Graph::clone()).
 - duplicationMethod: The behaviour used when a tensor is cloned.
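For example (a sketch; `t` is assumed to be an existing [4, 4] tensor in the graph):

```cpp
// Clone t three times: the result gains an outer dimension of size 3,
// and each of the three slices shares t's layout and tile mapping.
poplar::Tensor batch = graph.cloneN(t, 3);
// batch.shape() == {3, 4, 4}
```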

void
connect
(FieldRef field, const Tensor &tensor)¶ Connect a tensor to a vertex field.
This function connects a tensor to a vertex field. If the vertex field is a scalar input/output then a simple edge is added (and the tensor must be of zero dimension; in other words, a scalar).
If the vertex field is an input/output of a vector then a vector edge is added (and the tensor must be of dimension 1).
If the vertex field is a vector of inputs or outputs then the size of the field is set to the correct size and edges are added for every element of the tensor (and the tensor must be of dimension 1).
If the vertex field is a vector of input or output vectors then the tensor must be 2-dimensional. In this case, the size of the vector field is set to the size of the first dimension and vector edges are added for every subvector of the two-dimensional tensor.
 Parameters
 - tensor: The tensor.
 - field: Reference to the vertex field to connect.
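The cases above can be sketched as follows (assuming a graph and the tensors `input` and `output` exist; the vertex type "MyVertex" and its field names are illustrative, not part of Poplar):

```cpp
poplar::ComputeSet cs = graph.addComputeSet("cs");
poplar::VertexRef vtx = graph.addVertex(cs, "MyVertex");
// A rank-1 tensor connected to a vector field: one edge per element.
graph.connect(vtx["in"], input.flatten());
// A rank-0 tensor connected to a scalar field: a single simple edge.
graph.connect(vtx["out"], output[0]);
graph.setTileMapping(vtx, 0);
```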

template<typename T>
void connect
(FieldRef field, T v, typename std::enable_if<TypeTraits::isSimpleType<T>()>::type* = nullptr)¶ Connect a constant value to an input field.
This method creates a single-element tensor containing a specified value and connects that tensor element to an input field.
 Parameters
 - v: The value to connect.
 - field: The field to connect to.

void
connect
(FieldRef field, ArrayRef<Tensor> tensors)¶ Connect a vector of tensors to a vertex field.
This function connects a vector of tensors to a vertex field. The field must be a vector of inputs or outputs. The field will be sized to the provided vector and each element will be connected to the corresponding element of the field.
 Parameters
 - tensors: The vector of tensors.
 - field: Reference to the vertex field to connect.

void
setPerfEstimate
(const VertexRef &v, std::uint64_t cycles, std::uint64_t flops = 0)¶ Set the performance estimate for a vertex.
 Parameters
 - v: The vertex to set the estimate for.
 - cycles: The number of cycles that this vertex will use when run.
 - flops: The number of flops that this vertex will use when run.

void
setPerfEstimate
(const VertexRef &v, const VertexPerfEstimate &estimate)¶ Set the performance estimate for a vertex.
 Parameters
 - v: The vertex to set the estimate for.
 - estimate: The performance estimates for this vertex when run.

VertexPerfEstimate
getPerfEstimate
(const VertexRef &v) const¶ Get the performance estimate for the specified vertex.
 Return
The performance estimates used when this vertex is run.
 Parameters
 - v: The vertex to get the estimate for.
 Exceptions
 - missing_perf_estimate: If the performance estimate is not available (for example, because the graph hasn’t been executed yet).

void
registerPerfEstimator
(StringRef vertexTypeName, PerfEstimateFunc f)¶ Register a performance estimator for a vertex type.
 Parameters
 - vertexTypeName: Type of vertex to register the estimator for.
 - f: Callback function that will compute a performance estimate for all vertices of this type.

unsigned
getNumVertices
(void) const¶ Get the number of vertices currently in the graph.
 Return
 The number of vertices currently in the graph.

ComputeSet
addComputeSet
(const DebugContext &debugContext = {})¶ Create a compute set within the graph.
 Return
The reference to the compute set.
 Parameters
 - name: An optional identifier for the compute set that may be used during profiling/debugging.

void
setFieldSize
(FieldRef field, std::size_t size)¶ Set the size of a vector field.
 Parameters
 - field: The reference to the field.
 - size: The size of the field.

std::size_t
getFieldSize
(FieldRef field) const¶ Get the size of a vector field.
 Return
The size of the field.
 Parameters
 - field: The reference to the field.

std::size_t
getMaxFieldDim
(StringRef vertexName, StringRef fieldName, unsigned dimIndex) const¶ Find the maximum size for a dimension of a field.
 Parameters
 - vertexType: The type of vertex.
 - field: The field.
 - dimIndex: The index of the dimension.
 Exceptions
 - index_error: If there is no such dimension.
 - poplar_error: If the field is not indexable.

double
getMaxVertexFieldValue
(StringRef vertexName, StringRef fieldName) const¶ Find the maximum value that can be represented by an element of a field.
 Parameters
 - vertexType: The type of vertex.
 - field: The field.

template<typename T>
void setInitialValue
(FieldRef field, T val, typename std::enable_if<TypeTraits::isSimpleType<T>()>::type* = nullptr)¶ Set the initial value of a field.
 Parameters
 - field: The reference to the field.
 - val: The value to set the field to when the graph engine is created.

template<typename T>
void setInitCallback
(FieldRef field, LateInitCallback<T> callback, typename std::enable_if<std::is_arithmetic<T>::value>::type* = nullptr)¶ Set the init callback for a field; the callback function will be called after graph construction and must return the init value of the field.
This can be called instead of calling setInitialValue(), or both can be called for the field, to ensure that the field has an (at least partially) valid starting value, for instance if it needs to be retrieved in an early stage of graph compilation, before storage allocation (for instance during cycle estimation).
Note that you must explicitly provide the template parameter T in the specialisation when using this function. For example: setInitCallback<uint16_t>(vertex["size"], sizeCallback)
This is because the compiler will not be able to detect the correct type from the callback parameter.
 Parameters
 - field: The reference to the field.
 - callback: The callback that will return the value for the field.
 - <unnamed>: This exists only to allow the insertion of the is_arithmetic<T> check for the type T.

void
setInitialValueHalf
(FieldRef field, uint16_t val)¶ Set the initial value of a field of type IEEE half.
 Parameters
 - field: The reference to the field.
 - val: The value to set the field to when the graph engine is created.

template<typename T>
void setInitialValue
(FieldRef field, ArrayRef<T> val)¶ Set initial values of a vector field.
 Parameters
 - field: The reference to the vector field.
 - val: A vector value to set the field to when the graph engine is created.

void
setInitialValueHalf
(FieldRef field, ArrayRef<uint16_t> val)¶ Set initial values of a vector field of type IEEE half.
 Parameters
 - field: The reference to the vector field.
 - val: A vector value to set the field to when the graph engine is created.

template<typename T>
void setInitialValue
(const Tensor &t, T val, typename std::enable_if<TypeTraits::isSimpleType<T>()>::type* = nullptr)¶ Set the initial value of a tensor element.
 Parameters
 - t: The tensor representing the value to set.
 - val: The value to set the field to when the graph engine is created. A buffer of values can be provided to set the elements of a non-scalar tensor.

void
setInitialValueHalf
(const Tensor &t, uint16_t val)¶ Set the initial value of a tensor element of type IEEE half.
 Parameters
 - t: The tensor representing the value to set.
 - val: The value to set the field to when the graph engine is created. A buffer of values can be provided to set the elements of a non-scalar tensor.

void
createHostWrite
(StringRef handle, const Tensor &t, bool rearrangeOnHost = false)¶ Mark a Tensor as being available as the destination of host to device copies.
This is a convenience function that creates a host-to-device FIFO, and a Copy program that copies data from the FIFO to the tensor. When you call Engine::writeTensor() it copies the input data to the FIFO and then executes the Copy program on the device.
 Parameters
 - handle: A name to be associated with this host copy.
 - t: The tensor to be marked as an input.
 - rearrangeOnHost: Save IPU memory at the cost of exchange speed by rearranging the data on the host before sending it to the IPU, rather than doing an internal exchange. Note that due to alignment and size requirements of host exchange packets this may still require part of the transfer to be received to a temporary variable and copied to its destination.

void
createHostRead
(StringRef handle, const Tensor &t, bool rearrangeOnHost = false)¶ Mark a Tensor as being available as the source of device to host copies.
This is a convenience function that creates a device-to-host FIFO, and a Copy program that copies data to the FIFO from the tensor. When you call Engine::readTensor() it executes the Copy program on the device and then outputs the data from the FIFO.
 Parameters
 - handle: A name to be associated with this host copy.
 - t: The tensor to be marked as an output.
 - rearrangeOnHost: Save IPU memory at the cost of exchange speed by sending data in any order and rearranging it on the host, rather than doing an internal exchange before sending it.
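A minimal host I/O sketch (assuming the graph has been compiled into an Engine `engine` with a program at index 0; the handles, tensor `t` and buffer are illustrative):

```cpp
// At graph-construction time:
graph.createHostWrite("in-t", t);   // host -> device
graph.createHostRead("out-t", t);   // device -> host
// After compiling and loading an Engine:
std::vector<float> buf(t.numElements());
engine.writeTensor("in-t", buf.data(), buf.data() + buf.size());
engine.run(0);
engine.readTensor("out-t", buf.data(), buf.data() + buf.size());
```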

DataStream
addHostToDeviceFIFO
(StringRef handle, const Type &elementType, std::size_t numElements, ReplicatedStreamMode replicatedMode = ReplicatedStreamMode::REPLICATE, const OptionFlags &options = {})¶ Add a data stream to the graph for copying data from the host to the device.
Supported options:
- splitLimit (Integer) [=50 * 1024 * 1024]: The maximum size of the FIFO before it is split into multiple FIFOs. This is a useful option to avoid exceeding the stream buffer size limit. If the original FIFO is larger than the specified split limit, then it is replaced by a number of FIFOs which represent chunks of the original FIFO, and are read from sequentially. Setting splitLimit to 0 or UINT_MAX disables this option.
- bufferingDepth (Integer) [=1]: The depth of the FIFO which can be prefetched before being read by the device. By default the FIFO size is 1, so it prefetches a single entry, after it has been read, to refill the FIFO. Increasing the size of the FIFO allows for prefetching of multiple entries, increasing the probability there will be a valid entry in the FIFO for the device to read before falling back to synchronously fetching the next entry.
- addressSpace (pageTable, addressTranslationTable) [=pageTable]: The type of address mapping used by the hardware to translate an exchange address to a host physical address.
  - pageTable: The stream uses a lookup table which maps one memory page per entry.
  - addressTranslationTable: This uses a translation table. This table contains very few entries but each of them can map large regions. This type of address mapping is only supported for replicated streams. It is also necessary to set the Target option gatewayMode to false.
 Parameters
 - handle: A name to be associated with this stream.
 - elementType: The type of data in the stream.
 - numElements: The number of elements to be transferred from the stream by a Copy program.
 - replicatedMode: How the stream is replicated if this is a replicated graph.
 - options: List of options.
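For instance (a sketch; the stream handle, element count and option value are illustrative), options are passed as string pairs via OptionFlags:

```cpp
// A FIFO that prefetches up to four entries ahead of the device.
poplar::DataStream in = graph.addHostToDeviceFIFO(
    "input", poplar::FLOAT, 1024,
    poplar::ReplicatedStreamMode::REPLICATE,
    {{"bufferingDepth", "4"}});
// The stream is consumed by a Copy program into a tensor `t`:
prog.add(poplar::program::Copy(in, t));
```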

DataStream
addDeviceToHostFIFO
(StringRef handle, const Type &elementType, std::size_t numElements, const OptionFlags &options = {})¶ Add a data stream to the graph for copying data from the device to the host.
Supported options:
- splitLimit (Integer) [=50 * 1024 * 1024]: The maximum size of the FIFO before it is split into multiple FIFOs. This is a useful option to avoid exceeding the stream buffer size limit. If the original FIFO is larger than the specified split limit, then it is replaced by a number of FIFOs which represent chunks of the original FIFO, and are read from sequentially. Setting splitLimit to 0 or UINT_MAX disables this option.
 Parameters
 - handle: A name to be associated with this stream.
 - elementType: The type of data in the stream.
 - numElements: The number of elements to be transferred to the stream by a Copy program.
 - options: List of options.

RemoteBuffer
addRemoteBuffer
(StringRef handle, const Type &elementType, std::size_t numElements, std::size_t repeats = 1, bool rearrangeOnHost = false, bool optimiseMemory = false)¶ Add a remote buffer to the graph.
A remote buffer is memory outside the IPU which can be read and written by the IPU. A read returns the last written value. The remote buffer is (repeats * numElements * sizeof(elementType) + padding) bytes in size. Padding is added to meet any alignment constraints of the hardware.
 Parameters
 - handle: A name to be associated with this remote buffer.
 - elementType: The type of data in the remote buffer.
 - numElements: The number of elements to be transferred to the remote buffer by a Copy program.
 - repeats: The buffer can store multiple blocks of data to be transferred. The total number of data elements in the buffer is numElements * repeats.
 - rearrangeOnHost: Perform any necessary data rearrangement on the host instead of on the IPU.
 - optimiseMemory: Optimise for memory use rather than speed.

void
outputVertexGraph
(std::ostream &outputStream, ArrayRef<program::Program> progs = {}) const¶ Output the vertex graph to a stream in dot file format.
 Parameters
 - outputStream: The C++ stream to output the dot file onto.

void
outputComputeGraph
(std::ostream &outputStream, ArrayRef<program::Program> progs = {}) const¶ Output the compute graph to a stream in dot file format.
 Parameters
 - outputStream: The C++ stream to output the dot file onto.

void
setTileMapping
(VertexRef v, unsigned tileNum)¶ Map a vertex to a specific tile on the device.
 Parameters
 - v: Reference to the vertex to map.
 - tileNum: The tile number to map the vertex to.

void
setTileMapping
(const Tensor &t, unsigned tileNum)¶ Map a tensor slice to a specific tile on the device.
 Parameters
 - t: The tensor or tensor slice to map.
 - tileNum: The tile number to map to.

TileToTensorMapping
getTileMapping
(const Tensor &t, bool requireComplete = true) const¶ Inspect the tile mapping of a tensor.
 Return
 The mapping from tiles to a vector of intervals mapped to the tile (implemented as a vector indexed by the tile number). The lower and upper bound of each interval are element numbers in the flattened tensor.
 Parameters
 - t: The tensor to inspect.
 - requireComplete: If t is not fully mapped and requireComplete is true then an invalid_tile_mapping exception will be thrown.

TileToTensorMapping
getTileMapping
(const Tensor &t, bool *isComplete) const¶ Inspect the tile mapping of a tensor.
 Return
 The mapping from tiles to a vector of intervals mapped to the tile (implemented as a vector indexed by the tile number). The lower and upper bound of each interval are element numbers in the flattened tensor.
 Parameters
 - t: The tensor to inspect.
 - isComplete: If non-null, updated to indicate whether the mapping is complete.

TileToTensorMapping
getVariableTileMapping
(const Tensor &t) const¶ Inspect the tile mapping of a tensor.
This excludes any constant regions.
 Return
 The mapping from tiles to a vector of intervals mapped to the tile (implemented as a vector indexed by the tile number). The lower and upper bound of each interval are element numbers in the flattened tensor.
 Parameters
 - t: The tensor to inspect.

void
setTileMapping
(const Tensor &t, const TileToTensorMapping &mapping)¶ Set the tile mapping of a tensor based on an explicit map from tiles to tensor intervals.
 Parameters
 - t: The tensor to map.
 - mapping: The mapping from tiles to a vector of intervals to be placed on that tile (implemented as a vector indexed by the tile number). The lower and upper bound of each interval are element numbers in the flattened tensor.

Tensor
getVariable
(VariableRef v) const¶ Get a tensor representing an entire variable.
 Return
 A Tensor object representing that variable.
 Parameters
 - v: The variable to retrieve.

bool
isConstant
(VariableRef v) const¶ Check whether a variable reference refers to a constant.
When Graph::addConstant() is called, a variable is created to represent that constant. This call checks whether a variable was created by that method or by Graph::addVariable().
 Return
 True if and only if the variable refers to a constant.
 Parameters
 - v: The variable to examine.

std::vector<std::vector<Interval>>
getSortedContiguousRegions
(const Tensor &t, ArrayRef<Interval> regions, bool removeAliasedIntervals = false, std::vector<std::size_t> *aliases = nullptr) const¶ Get a list of sequences of intervals over a tensor such that each sequence represents a contiguous region of memory.
 Return
 A list of sequences of intervals. The intervals will cover the same elements as the input tensor.
 Parameters
 - t: The tensor to get intervals over.
 - regions: A list of intervals representing the elements to sort into contiguous sequences in memory.
 - removeAliasedIntervals: If true, remove intervals which alias others in the given regions from the result.
 - aliases: Optional list of indices for each region in the returned intervals where an index is always the same for a region representing the same underlying elements in memory. If this is nullptr, then no aliases will be returned.

void
reorderToSimplify
(Tensor *t, ArrayRef<Tensor*> ts, bool requireSimplestOrder = true) const¶ Reorder a set of tensors in order to simplify the view on data.
This function will update t to be a (simpler) reordered view on the same data. The same reordering will be applied to all elements of ts. The reordering will be the same for all tensors, so order-invariant or element-wise operations on t and ts can still be performed. The main purpose of this function is to provide a way to implement more efficient graph construction of element-wise or order-invariant operations.
If requireSimplestOrder is set to true then, after execution, t will consist of the minimum number of possible contiguous regions. If not, then no guarantee is given on the order of t.
All the tensors provided to this function must be of rank 1 (flattened tensors) and have the same number of elements.

TensorRearranger
getSimplifyingRearranger
(const Tensor &t) const¶ Get a rearranger object for simplifying the underlying representation of a tensor.
This rearranger will rearrange the tensor to simplify the underlying representation, reducing the processing time for functions such as getContiguousRegions() and getTileMapping().
The actual reordering is unspecified and depends on the underlying representation within the Poplar library (however, it can always be undone using the TensorRearranger object).
 Return
A TensorRearranger object that can perform the rearrangement.
 Parameters
 - t: The tensor to simplify.

Tensor
findUnbroadcastTensor
(const Tensor &t) const¶ Attempt to determine the shape of a Tensor prior to it having been broadcast.
Under some circumstances this may not be possible; failure is indicated by the returned tensor having the same shape as the input tensor.
 Return
 A tensor which will be set to the unbroadcast (sliced from t) tensor if it is possible to do so. Each dimension of the returned tensor will be a factor of the same dimension of the input tensor. The returned tensor will have the same rank as the input tensor. If it is not possible to determine the shape of the unbroadcast tensor, the input tensor will be returned.
 Parameters
 - t: The input tensor.

void
serializeTensors
(std::ostream &out, ArrayRef<Tensor> tensors, SerializationFormat format) const¶ Serialize a set of tensors to JSON or CapnProto.
The tensors must all be from this graph or an exception is thrown. The information saved is:
- The type, shape and expression of the tensors.
- The type and number of elements of any variables used.
This is intended to be used for debugging, testing and visualisation.
 Parameters
 - out: Stream to write to.
 - tensors: A set of tensors to serialize.
 - format: Serialize in JSON or CapnProto format. JSON is pretty printed.
 Exceptions
 - poplar_error: If any tensor is not from this graph. CapnProto may also throw an exception if serialization fails.

std::vector<Tensor>
deserializeTensors
(std::istream &in, SerializationFormat format)¶ Deserialize a set of tensors from a CapnProto message.
JSON deserialization is not currently supported and an exception will be thrown if format is SerializationFormat::JSON.
This will recreate the tensors in this graph. It throws an exception on failure (for example, if the tensor type does not match the variable types). Whenever a variable is used by a tensor a new variable is added to the graph.
The layout of the tensors and variables should be the same as when they were serialized.
This function is primarily intended for testing and benchmarks. You should not use it as a general method of creating tensors.
 Return
 The deserialized set of tensors.
 Parameters
 - in: A stream from which serialised tensor data can be read.
 - format: Must be SerializationFormat::Binary.

Graph
createVirtualGraph
(unsigned numTilesPerIPU)¶ Create a “virtual” graph using a subset of the target’s tiles.
This method returns a graph object that references the same state as this graph but has a virtual target that only uses a subset of the target’s tiles.
If the getTarget() method is called on the new graph it will return a target with the new number of tiles.
 Return
 The virtual graph object.
 Parameters
 - numTilesPerIPU: The number of tiles per IPU for the new graph to use.

Graph
createVirtualGraph
(unsigned lowerTile, unsigned upperTile)¶ Create a “virtual” graph that uses a subset of the target’s tiles.
This method returns a graph object that references the same state as this graph but has a virtual target that only uses a subset of the target’s tiles.
This variant of the method takes a tile range for the new virtual graph to use. The range is [lowerTile, upperTile). This tile range must be contained within a single IPU. If the getTarget() method is called on the new graph it will return a target with the new number of tiles.
 Return
 The virtual graph object.
 Parameters
 - lowerTile: The starting tile of the tile range for the virtual graph to use.
 - upperTile: The upper bound of the tile range for the virtual graph to use. This is a non-inclusive upper bound.
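For example (a sketch; the split point is illustrative), the tiles of a single IPU can be partitioned into two virtual graphs that share the parent graph's state:

```cpp
unsigned n = graph.getTarget().getTilesPerIPU();
poplar::Graph front = graph.createVirtualGraph(0, n / 2);   // tiles [0, n/2)
poplar::Graph back  = graph.createVirtualGraph(n / 2, n);   // tiles [n/2, n)
// front.getTarget() now reports n / 2 tiles.
```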

Graph
createVirtualGraph
(const std::vector<unsigned> &perIpuTiles)¶ Create a “virtual” graph that uses a subset of the target’s tiles.
This method returns a graph object that references the same state as this graph but has a virtual target that only uses a subset of the target’s tiles.
This variant of the method takes the set of tiles in each IPU that should be included in the new graph.
If the getTarget() method is called on the new graph it will return a target with the new number of tiles.
 Return
 The virtual graph object.
 Parameters
 - perIpuTiles: The tiles to include in the graph. Tiles are specified by their index in the IPU. Each tile index must be unique and less than the number of tiles per IPU.

Graph
createReplicatedGraph
(unsigned replicationFactor)¶ Create a replicated graph.
The replicated graph is a view on replicationFactor virtual subgraphs. Operations on the replicated graph are implicitly applied to each virtual subgraph; for example, adding a variable to the replicated graph implicitly creates a variable in all of the underlying subgraphs.
The replication factor must divide the number of tiles in the graph. If n is the number of tiles in this graph then:
- the first subgraph contains tiles [0, n / replicationFactor)
- the second subgraph contains tiles [n / replicationFactor, 2n / replicationFactor)
- and so on.
 Deprecated:
The API for replicated graphs where the replication factor is not supplied directly to the top-level Graph constructor is deprecated and will be removed.

Graph
getTopLevelGraph
()¶ Return the top level graph.
The createVirtualGraph() and createReplicatedGraph() methods can be used to create graph objects that are views on an underlying graph. If this is a virtual or replicated graph then this function returns the top level underlying graph, otherwise it returns the current graph.

unsigned
getReplicationFactor
() const¶ Return the replication factor of the graph.

Tensor
addReplicationIndexConstant
(const DebugContext &debugContext = {})¶ Add a constant that is initialized with the replication index.

Tensor
getNonReplicatedTensor
(const Tensor &t) const¶ Given a replicated tensor return the underlying tensors in this graph that the replicated tensor is a placeholder for.
The tensor returned by this function has an extra outer dimension equal to the replication factor of the tensor in this graph, and it is formed by concatenating the underlying tensors for each replicated subgraph in this dimension.
This function can only be used with replicated graphs created by the createReplicatedGraph function, not when the replication factor is supplied to the Graph constructor.
 Deprecated:
The API for replicated graphs where the replication factor is not supplied directly to the top-level Graph constructor is deprecated and will be removed.

void
serialize
(std::ostream &out, SerializationFormat format) const¶ Serialize a graph to JSON or binary (CapnProto) format.
This is equivalent to serialize(out, {}, format).
Note that this does not currently serialize every bit of graph data, so it cannot be used to save and reload a graph.
 Parameters
 - out: Stream to write to.
 - format: Serialize in JSON or CapnProto format. JSON is pretty printed.

void
serialize
(std::ostream &out, ArrayRef<program::Program> progs, SerializationFormat format) const¶ Serialize a graph to JSON or binary (CapnProto) format.
Programs can be passed so that information about Copy programs can be serialized (the Graph class itself does not know about them).
Note that this does not currently serialize every bit of graph data, so it cannot be used to save and reload a graph.
 Parameters
 - out: Stream to write to.
 - progs: A set of programs that are searched for Copy programs. Information about the variables copied is serialised.
 - format: Serialize in JSON or CapnProto format. JSON is pretty printed.

Function
addFunction
(const program::Program &program)¶ Add a function to the graph.
A function is a partial control program that can be reused. By registering a repeated program as a function and calling it, less control code is generated than repeating the sequence.
 Return
 The Function object that can be used by a Call program.
 Parameters
 - program: The control program to register as a callable function.
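As a sketch (the sequence contents are illustrative), a reused sequence is registered once and then invoked via Call programs:

```cpp
poplar::program::Sequence body;
// ... add the repeated steps to `body` ...
poplar::Function f = graph.addFunction(body);
// Control code for `body` is generated once, then called twice.
poplar::program::Sequence main;
main.add(poplar::program::Call(f));
main.add(poplar::program::Call(f));
```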

unsigned
convertVirtualTileToPhysicalTile
(unsigned virtualTileId) const¶ Convert a virtual tile ID to a physical tile ID.
This provides the conversion required by the Graphcore communication library (GCL) to know which exchange-block context a tile is associated with.
 Return
 The corresponding physical tile ID.
 Parameters
 - virtualTileId: A virtual tile ID.

unsigned
convertPhysicalTileToVirtualTile
(unsigned physicalTileId) const¶ Convert a physical tile ID to a virtual tile ID.
This provides the conversion required by the Graphcore communication library (GCL) to know which exchange-block context a tile is associated with.
 Return
 The corresponding virtual tile ID.
 Parameters
 - physicalTileId: A physical tile ID.

unsigned
convertPhysicalTileToVirtualTile
(unsigned ipuId, unsigned physicalTileId) const¶ Convert a physical tile ID to a virtual tile ID.
This returns the virtual tile ID based on a pair of parameters: an IPU ID and a physical tile ID. This is required by the Graphcore communication library (GCL) to know which exchange block context a tile is associated with.
 Return
The corresponding virtual tile ID.
 Parameters
ipuId
: The IPU ID.
physicalTileId
: The physical tile ID.
Private Functions

void
setInitialValue
(FieldRef field, const void *val, const TypeTraits&)¶

template<typename
T
>
void setInitCallback
(FieldRef field, LateInitCallback<T> callback, const TypeTraits&)¶

void
setInitialValue
(const Tensor &t, const void *val, const TypeTraits&)¶

void
connect
(FieldRef field, void *val, const TypeTraits&)¶

class
ConnectionDesc
¶ Public Functions

template<typename
T
>
ConnectionDesc
(StringRef field, T v, typename std::enable_if<TypeTraits::isSimpleType<T>()>::type* = nullptr)¶
Private Types
Private Members

TypeTraits
traits
¶
Friends
 friend class Graph

namespace
core

namespace
program
¶ Namespace for program classes.

poplar/GraphElements.hpp¶

namespace
poplar
Poplar classes and functions.
Typedefs
Functions

bool
operator==
(const ComputeSet &lhs, const ComputeSet &rhs)¶

class
ComputeSet
¶  #include <GraphElements.hpp>
A reference to a compute set within a graph.
This type provides a way to address compute sets within a graph.
Private Members

unsigned
computeset_id
¶


class
FieldRef
¶  #include <GraphElements.hpp>
A reference to a field within a vertex instance.
This type provides a way to address fields (inputs or internal state) within a vertex. FieldRefs are normally obtained using
VertexRef::operator[](StringRef fieldName)
, for example:
VertexRef vertex = graph.addVertex(...);
FieldRef input = vertex["input"];
graph.connect(input, ...);
A FieldRef can also be indexed, for example:
FieldRef input_5 = vertex["input"][5];
This is used when a field is a list of regions, for example a
Vector<Input<Vector<...>>> or an Input<VectorList<...>>.
Public Functions

FieldRef
()¶

FieldRef
operator[]
(std::size_t index) const¶ Access an element of a vector field.
Subscript a vector field to access the element at position
index
. Return
A reference to the field.
 Parameters
index
: The subscript of the field

bool
isIndexed
() const¶
Private Functions

FieldRef
(VertexRef vertex, StringRef fieldName)¶ FieldRef constructor from vertex id and field name.
Construct a FieldRef out of a vertex id and the name of the field.
Friends
 friend class VertexRef


class
Function
¶  #include <GraphElements.hpp>
A reference to a function stored within a graph.
Private Members

unsigned
function_id
¶


class
VertexRef
¶  #include <GraphElements.hpp>
A reference to a vertex within a graph.
This type provides a way to address vertices within a graph.
Public Functions

VertexRef
()¶
Private Functions
Friends
 friend class core::GraphBuilder
 friend class Graph
 friend class FieldRef


namespace
core

poplar/LateInitCallback.hpp¶

namespace
poplar
Poplar classes and functions.
Typedefs

using
LateInitCallback
= std::function<T(const VertexEdgeInfo&)>¶ A callback function of this type can be specified for a field of a vertex, instead of specifying an initialisation value with setInitialValue.
It will be called after the graph has been built and will be passed information about the vertex fields. It must return the value for the field.

struct
VertexEdgeInfo
¶  #include <LateInitCallback.hpp>
Data structure that will be passed to the callback used for ‘late initialisation’ for vertex fields.
Contains address information for the other (edge) vertex fields to allow the callback to appropriately initialise the ‘late init’ field itself.
Public Members

std::map<std::string, std::vector<StorageInfo>>
storage
¶

struct
StorageInfo
¶


poplar/PerfEstimateFunc.hpp¶

namespace
poplar
Poplar classes and functions.
Typedefs

using
PerfEstimateFunc
= std::function<VertexPerfEstimate(const VertexIntrospector &v, const Target &target)>¶ Functions of this type can be used as performance estimator callbacks for new vertex types.

poplar/Tensor.hpp¶

namespace
poplar
Poplar classes and functions.
Enums

enum
UpsampleMethod
¶ Enum passed to Tensor::upsample(unsigned scale, unsigned dimension) specifying the upsampling method.
Values:

enumerator
REPEAT
¶ If dimension is of size s, for every i in [0, s), repeats the subtensor at index i scale times.
For example, with scale = 2 and dimension = 1, a tensor of shape (2,3):
[[1, 2, 3],
 [4, 5, 6]]
becomes a tensor of shape (2,6):
[[1, 1, 2, 2, 3, 3],
 [4, 4, 5, 5, 6, 6]]
Note that a scale of 0 means repeat each tensor 0 times. So a (i, j, k, l) tensor upsampled with scale = 0 and dimension = 3 would become an (i, j, k, 0) tensor containing 0 elements.
scale = 1 is the identity operation.

Functions

Tensor
concat
(ArrayRef<Tensor> ts, unsigned dimension = 0)¶ Concatenate several tensors.
The tensors are concatenated along the specified dimension.
 Return
The result of the concatenation
 Parameters
ts
: The tensors to concatenate.
dimension
: The number of the dimension to concatenate across.

Tensor
concat
(const Tensor &first, const Tensor &second, unsigned dimension = 0)¶ Concatenate two tensors.
The tensors are concatenated along the specified dimension.
 Return
The result of the concatenation
 Parameters
first
: The first tensor to concatenate.
second
: The second tensor to concatenate.
dimension
: The number of the dimension to concatenate across.

Tensor
append
(const Tensor &first, const Tensor &second, unsigned dimension)¶ Append a tensor as an element to another tensor.
 Return
The extended tensor
 Parameters
first
: The tensor to append to.
second
: The tensor to add as an element in the specified dimension.
dimension
: The number of the dimension to append to.

class
Tensor
¶  #include <Tensor.hpp>
A reference to a subset of tensor elements.
Public Functions

Tensor
()¶

~Tensor
()¶

Tensor
operator[]
(std::size_t i) const &¶ Get the subtensor indexed by i in the first dimension of the tensor.
 Parameters
i
: The index into the first dimension of the tensor.

Tensor
slice
(std::size_t begin, std::size_t end, unsigned dimension) const &¶ Get the subtensor given by a specific range [begin, end) in one dimension of the tensor.
 Parameters
begin
: The first element of the range.
end
: The upper bound of the range (the last element + 1).
dimension
: The dimension to slice in.

Tensor
slice
(std::size_t begin, std::size_t end) const¶ Get the subtensor given by a specific range [begin, end) in the first dimension of the tensor.
 Parameters
begin
: The first element of the range.
end
: The upper bound of the range (the last element + 1).

Tensor
slice
(const Interval &region, unsigned dimension = 0) const¶ Get the subtensor given by a specific range [begin, end) in one dimension of the tensor.
 Parameters
region
: The region to slice.
dimension
: The dimension to slice in.

Tensor
slice
(ArrayRef<std::size_t> begin, ArrayRef<std::size_t> end) const¶ Get the subtensor given by slicing the tensor in multiple dimensions, starting at dimension 0.
Each pair begin[i], end[i] specifies that the tensor is sliced in dimension i by the range [begin[i], end[i]). The rank of the returned tensor is the same as the input tensor.
 Parameters
begin
: The lower bounds of the ranges used to slice the tensor.
end
: The upper bounds of the ranges used to slice the tensor.

std::vector<Tensor>
slices
(ArrayRef<Interval> intervals, unsigned dimension = 0) const¶ Get a vector of slices.
 Return
A vector of slices where each slice is obtained by slicing this tensor between the two points in the given interval list.
 Parameters
intervals
: A list of intervals.
dimension
: The dimension to slice in.

std::vector<Tensor>
slices
(const std::vector<std::vector<Interval>> &intervals, unsigned dimension = 0) const¶ Get a vector of slices.
 Return
A vector of tensors, where each tensor is the concatenation of a sequence of slices: each slice is this tensor between the two points of the corresponding interval in the sequences given as input.
 Parameters
intervals
: A list of sequences of intervals.
dimension
: The dimension to slice in.

Tensor
index
(ArrayRef<std::size_t> indices) const¶ Get the subtensor indexed by the specified indices.
This is equivalent to repeatedly applying operator[] for each index in the vector of indices.
 Return
The subtensor indexed by the indices.
 Parameters
indices
: The indices used to index into the tensor.

Tensor
flatten
() const¶ Flatten the tensor.
 Return
A tensor consisting of all elements of the original tensor but with a single dimension.

Tensor
flatten
(unsigned dimBegin, unsigned dimEnd) const¶ Flatten a subset of the dimensions of a tensor.
 Return
A tensor consisting of all elements of the original tensor with the specified dimension range flattened into one dimension.
 Parameters
dimBegin
: The first dimension to flatten.
dimEnd
: One past the last dimension to flatten.

Tensor
reshape
(ArrayRef<std::size_t> shape) const¶ Reshape the tensor.
The reshaping operation changes the shape of the tensor but cannot change the total number of elements.
 Return
A tensor consisting of all elements of the original but with new dimensions.
 Parameters
shape
: The new shape of the tensor.

Tensor
dimShuffle
(ArrayRef<unsigned> permutation) const¶ Permute the dimensions of a tensor.
The dimShuffle operation reorders the tensor to a permutation of its dimensions. It can be seen as the generalized form of a matrix transpose.
Note that this operation does not create a copy of the tensor but returns a reordered view on this tensor’s data.
 Return
The shuffled tensor
 Parameters
permutation
: The permutation vector specifies a mapping from the output dimension to the input dimension. For example, the permutation {2, 0, 1} specifies that element [a][b][c] in the original tensor is remapped to element [c][a][b] in the new tensor.

Tensor
dimShufflePartial
(ArrayRef<unsigned> source, ArrayRef<unsigned> destination) const¶ Permute some of a tensor’s dimensions.
dimShufflePartial reorders the tensor's dimensions. The unspecified dimensions stay in the same relative order.
Note that this operation does not create a copy of the tensor but returns a reordered view on this tensor’s data.
 Return
The shuffled tensor.
 Parameters
source
: The dimensions to move.
destination
: The index at which to move each source dimension.

Tensor
dimRoll
(unsigned dimIdx, unsigned newIdx = 0) const¶ Roll a specified dimension to a specified position.
The other dimensions remain in the same relative order.
Note that this operation does not create a copy of the tensor but returns a reordered view on this tensor’s data.
 Return
The shuffled tensor.
 Parameters
dimIdx
: The dimension to move.
newIdx
: Its new location, default 0.

Tensor
reshapePartial
(unsigned beginIndex, unsigned endIndex, ArrayRef<std::size_t> newDims) const¶ Reshape a range of dimensions of a tensor.
reshapePartial reshapes the input tensor such that the total number of elements of the resultant tensor is the same as the input tensor.
Note that this operation does not create a copy of the tensor but returns a reshaped view on the input tensor’s data.
The following conditions define the valid use of this function:
1) beginIndex == endIndex
beginIndex and endIndex must each lie in the closed interval [0, rank()]. Singleton dimensions are added before beginIndex. The number of dimensions added is equal to the length of the newDims vector. For example, reshapePartial(0, 0, {1, 1}) adds two singleton dimensions at indices 0 and 1.
2) size(newDims) == 0 and beginIndex != endIndex
beginIndex must lie in the half-closed interval [0, rank()). endIndex must lie in the half-closed interval (0, rank()]. The product of the dimensions in [beginIndex, endIndex) must be 1. For example, reshapePartial(1, 3, {}) removes singleton dimensions 1 and 2 from the tensor.
3) size(newDims) != 0 and beginIndex != endIndex
beginIndex must lie in the half-closed interval [0, rank()). endIndex must lie in the half-closed interval (0, rank()]. The product of the vector newDims must be equal to the product of the number of elements in the interval [beginIndex, endIndex).
The input dimensions [0, beginIndex) and [endIndex, rank()) are prepended and appended to the result respectively. For example: reshapePartial(1, 3, {10, 20, 30}) or reshapePartial(1, 3, {10}).
 Return
Reshaped view of tensor
 Parameters
beginIndex
: Index of the dimension from which the reshape starts.
endIndex
: Index of the first dimension after the reshape ends.
newDims
: The new dimensions of the partial tensor.

Tensor
expand
(ArrayRef<std::size_t> indices) const¶ Expand tensor by adding singleton dimensions at specified indices of tensor.
The rank increases by the number of dimensions added. To add more than one dimension at a given position, repeat the same index.
 Return
A view of expanded tensor
 Parameters
indices
: Dimension indices before which the singleton dimensions are added

Tensor
squeeze
(ArrayRef<std::size_t> indices) const¶ Reduce dimension of tensor by removing singleton dimensions at specified indices of tensor.
 Return
A view of squeezed tensor
 Parameters
indices
: Indices of singleton dimensions which are removed

Tensor
subSample
(unsigned stride, unsigned dimension) const¶ Subsample the tensor.
Subsample this tensor by selecting every stride-th element of the tensor in a specified dimension.
 Return
The subsampled tensor
 Parameters
stride
: The size of the stride.
dimension
: The dimension to subsample in.

Tensor
upsample
(unsigned scale, unsigned dimension, UpsampleMethod method) const¶ Upsample the tensor.
Note that this operation does not create a copy of the tensor but creates a view of the tensor’s data. The repeated data is represented by repeated views into the tensor.
 See
UpsampleMethod for descriptions of how the tensor can be upsampled.
 Return
The upsampled tensor.
 Parameters
scale
: The scaling factor, >= 0.
dimension
: The dimension to upsample in.
method
: The method by which to upsample the tensor.

Tensor
broadcast
(unsigned N, unsigned dimension) const¶ Broadcast/repeat the tensor along a specified dimension.
Create a view with this tensor repeated N times along a specified dimension.
 Return
The broadcast tensor.
 Parameters
N
: The number of times to repeat.
dimension
: The dimension to broadcast in.

Tensor
reinterpret
(const Type &type) const¶ Reinterpret the tensor as a new type.
The new type must be the same size as the old type. See elementType() for a list of valid types and their sizes.
 Return
A tensor with the same shape and referencing the same data but of the new type.
 Parameters
type
: The type to reinterpret to

Tensor
reverse
(unsigned dimension) const¶ Reverse this tensor along a specified dimension.
 Return
The reversed tensor.
 Parameters
dimension
: The dimension to reverse.

std::size_t
dim
(unsigned i) const¶ Get a dimension of the tensor.
 Parameters
i
: The index of the dimension to get.

std::vector<std::size_t>
shape
() const¶ Get the shape of the tensor.
 Return
A vector of all the dimensions of the tensor.

unsigned
rank
() const¶ Get the rank of the tensor.
 Return
The number of dimensions a tensor has.

bool
isContiguous
() const¶ Get whether the tensor is contiguous.

bool
containsAliases
() const¶ Get whether the tensor contains an alias to the same storage location.
 Return
True if the tensor contains an alias to the same storage location.

bool
containsConstant
() const¶ Get whether the tensor contains any constant tensors.
 Return
True if the tensor contains any constant tensors.

bool
isParallelWriteable
() const¶ Get whether the elements of this tensor can be written in parallel.
This is equivalent to !(containsAliases() || containsConstant()).
 Return
True if the tensor can be written in parallel.

const std::vector<Interval>
getContiguousRegions
() const¶ Get the contiguous regions of a tensor.
 Return
A vector of intervals, in order, representing regions of the tensor that are contiguous in the tensor's storage ordering.

const std::vector<VariableInterval>
getVarRegions
() const¶ Get the contiguous regions of a tensor with reference to the variables allocated in the graph.
 Return
A vector of variable intervals (variable id, interval pairs) representing the regions of the tensor.

template<typename
T
>
bool getConstantValue
(T *val) const¶ Read a single element of data from a tensor if it is a constant.
 Return
True if tensor is constant and data is read
 Parameters
val
: Buffer to which the tensor data is copied.

bool
intersectsWith
(const Tensor &other) const¶ Return whether this tensor intersects with another tensor.
 Return
True if this tensor intersects with the other tensor.
 Parameters
other
: The tensor to compare with.

std::ostream &
output
(std::ostream &os) const¶ Display the expression representing the tensor on a stream.
 Return
The ostream written to
 Parameters
os
: The ostream to output to

std::ostream &
outputRegions
(std::ostream &os) const¶ Display the regions of the tensor on a stream.
 Return
The ostream written to
 Parameters
os
: The ostream to output to

void
dump
() const¶ Display the expression representing the tensor.

void
dumpRegions
() const¶ Display the regions of the tensor.

bool
valid
() const¶
Private Functions

bool
getConstantData
(void *dst, const TypeTraits &traits) const¶


namespace
core

poplar/TensorCloneMethod.hpp¶

namespace
poplar
Poplar classes and functions.
Enums

enum
TensorCloneMethod
¶ Define behaviour when a Tensor is cloned.
 See
Values:

enumerator
PRESERVE_ORDER_AND_ALIASES
¶ Preserve the ordering and aliasing within the original tensor reference.

enumerator
CREATE_NEW_ORDER
¶ Create a new tensor with natural ordering based on the dimensions of the cloned tensor (in the same way as addTensor).

enumerator
PRESERVE_ORDER_UNLESS_ALIASES
¶ Preserve the ordering of the original tensor unless it contains aliases.
In the case of aliases, create a new tensor ordering and duplicate the aliased elements.

enumerator
GATHER_AND_PRESERVE_TILE_ORDER_AND_ALIASES
¶ Gather elements of the underlying variables that are mapped to the same tile so they form one contiguous region on the tile in the cloned tensor.
Contiguous regions on the tile and the aliasing of elements are preserved.

enum
TensorCloneDuplicationMethod
¶ Define behaviour when a Tensor is cloned and duplicated using Graph::cloneN.
If DUPLICATE_BY_TILE_CONTIGUOUS_REGION and a new order needs to be created (either via TensorCloneMethod::CREATE_NEW_ORDER or TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES) then Poplar will error.
Values:

enumerator
DUPLICATE_BY_OUTER_DIMENSION
¶ The multiple clones are concatenated in their outermost dimension.
That is, the result is the same as concat(clone1, clone2, …, cloneN). There is no guarantee of any ordering constraints in memory between the clones.

enumerator
DUPLICATE_BY_TILE_CONTIGUOUS_REGION
¶ The underlying variables of the clones are concatenated for each contiguous region on each tile.
Each clone will have the same contiguous regions on each tile, but each of those regions will also form bigger contiguous regions across the N duplicates. This option is particularly useful for efficient slicing/copying between the duplicates being cloned.

Functions

std::string
toString
(const TensorCloneMethod &method)¶

std::string
toString
(const TensorCloneDuplicationMethod &method)¶

poplar/TensorRearranger.hpp¶

namespace
poplar
Poplar classes and functions.

class
TensorRearranger
¶  #include <TensorRearranger.hpp>
A TensorRearranger is an object that can reorder the view on a tensor and undo that reordering.
Public Functions

TensorRearranger
()¶

TensorRearranger
(const TensorRearranger &other)¶

TensorRearranger
(TensorRearranger &&other) noexcept¶

TensorRearranger &
operator=
(const TensorRearranger &other) &¶

TensorRearranger &
operator=
(TensorRearranger &&other) & noexcept¶

~TensorRearranger
()¶

Tensor
undoRearrangement
(const Tensor &t) const¶ Undo the rearrangement done via the rearrange method.

std::vector<Interval>
rearrange
(ArrayRef<Interval> is) const¶ Apply the rearrangement to intervals.
 Parameters
is
: A list of intervals w.r.t. the original tensor.
 Return
A list of equivalent intervals w.r.t. the rearranged tensor.

std::vector<Interval>
undoRearrangement
(ArrayRef<Interval> is) const¶ Apply the undoing of the rearrangement to intervals.
 Parameters
is
: A list of intervals w.r.t. the rearranged tensor.
 Return
A list of equivalent intervals w.r.t. the original tensor.

bool
valid
() const¶


namespace
core

poplar/Type.hpp¶
Defines

POPLAR_DECLARE_EQUIV_TYPE
(T1, T2)¶

namespace
poplar
Poplar classes and functions.
Variables

template<typename
T
>
struct equivalent_device_type
¶  #include <Type.hpp>
Template structure to relate a host type to a device type.
This structure is specialized to allow a program to relate a host type to a corresponding device type. For example:
poplar::Type t = equivalent_device_type<int>().value;

class
Type
¶  #include <Type.hpp>
Class representing device data types.
The following types are not supported on the IPU:
LONG
UNSIGNED_LONG
LONGLONG
UNSIGNED_LONGLONG
DOUBLE
For other types, the sizes on the IPU are:
BOOL: 1 byte
CHAR: 1 byte (signed)
SIGNED_CHAR: 1 byte
UNSIGNED_CHAR: 1 byte
SHORT: 2 bytes
SIGNED_SHORT: 2 bytes
UNSIGNED_SHORT: 2 bytes
INT: 4 bytes
SIGNED_INT: 4 bytes
SIGNED: 4 bytes
UNSIGNED_INT: 4 bytes
UNSIGNED: 4 bytes
HALF: 2 bytes
FLOAT: 4 bytes

namespace
core

poplar/VariableMappingMethod.hpp¶

namespace
poplar
Poplar classes and functions.
Enums

enum
VariableMappingMethod
¶ When variables are added to the graph, a tile mapping can be created.
This class enumerates the method for creating that mapping.
Values:

enumerator
NONE
¶ No mapping is created.
The tile mapping will be set later via Graph::setTileMapping.

enumerator
LINEAR
¶ The variable will be spread evenly across the tiles with the element ordering matching the tile number ordering.
The tile mapping can also be overridden later via Graph::setTileMapping.

Functions

std::string
toString
(const VariableMappingMethod &method)¶

poplar/VariableRef.hpp¶

template<>
struct std::hash<poplar::VariableRef>¶ Public Functions

size_t
operator()
(const poplar::VariableRef &v) const¶


namespace
poplar
Poplar classes and functions.
Functions

bool
operator==
(const VariableInterval &a, const VariableInterval &b)¶

bool
operator<
(const VariableInterval &a, const VariableInterval &b)¶

struct
VariableInterval
¶  #include <VariableRef.hpp>
Type representing a segment of a particular variable.
Public Functions

VariableInterval
(VariableRef var, Interval interval)¶

VariableInterval
() = default¶

VariableInterval
(const VariableInterval &other) = default¶

VariableInterval
(VariableInterval &&other) = default¶

VariableInterval &
operator=
(const VariableInterval &other) = default¶

VariableInterval &
operator=
(VariableInterval &&other) = default¶


class
VariableRef
¶  #include <VariableRef.hpp>
Type representing a reference to a variable in a graph.
Public Functions

VariableRef
(unsigned id, unsigned replicationFactor)¶

VariableRef
() = default¶

VariableRef
(const VariableRef &other) = default¶

VariableRef
(VariableRef &&other) = default¶

VariableRef &
operator=
(const VariableRef &other) = default¶

VariableRef &
operator=
(VariableRef &&other) = default¶
Friends
 friend class Graph

friend friend bool operator== (const VariableRef &a, const VariableRef &b)

friend friend bool operator< (const VariableRef &a, const VariableRef &b)



namespace
std
¶ 
template<> struct hash<poplar::VariableRef>
Public Functions

size_t
operator()
(const poplar::VariableRef &v) const¶


poplar/VectorLayout.hpp¶

namespace
poplar
Poplar classes and functions.

namespace
layout
¶ Namespace for layout classes.

poplar/VertexIntrospector.hpp¶

namespace
poplar
Poplar classes and functions.

class
FieldData
¶  #include <VertexIntrospector.hpp>
Information about a vertex field, including its size and its initial value if set.
This is used when calculating cycle estimates.
Vertex fields can be scalar, 1D or 2D.
Their sizes can always be returned, and the initial values can be returned for non-edge fields (float, Vector<float>) and edge fields (Input etc.) that are connected to constants.
Note that 2D fields are vectors of vectors; in other words, they are jagged 2D arrays.
Public Functions

~FieldData
()¶

unsigned
rank
() const¶ Return the rank of the field: 0 for scalar fields, 1 for 1D and 2 for 2D.

std::size_t
size
() const¶ Return the size of the field.
For scalar fields it returns 1, for 1D fields it returns the size of the vector, and for 2D fields it returns the number of subvectors.

std::size_t
getSizeAtIndex
(std::size_t i) const¶ For 2D fields, return the size of the subvector at index i.
Throws an error if called on non-2D fields.
 Parameters
i
: Index of subvector to return size of

layout::Vector
getProfilerVectorLayout
(std::size_t nestingLevel) const¶ For Vector fields return the layout.
 Parameters
nestingLevel
: The nesting level to query: 0 for the outer vector, 1 for the inner.

layout::VectorList
getProfilerVectorListLayout
() const¶ For VectorList fields return the layout.
We only support introspecting a VectorList that is the outermost vector.

SizeT
operator[]
(std::size_t i) const¶ Instead of field.getSizeAtIndex(i) you can alternatively use field[i].size().

template<typename
T
>
T getInitialValue
(const Target &target) const¶ Get the initial value for a scalar field.
T should be a scalar type. Throws an error if this is not a scalar field.
Private Functions

template<typename
T
>
void getInitialValuesOverload
(const Target &target, std::vector<T> &result) const¶

struct
SizeT
¶


class
VertexIntrospector
¶  #include <VertexIntrospector.hpp>
Available to cycle estimators to inspect the shape and initial values of a vertex’s fields.
Public Functions

ComputeSet
getComputeSet
() const¶ Return the compute set that this vertex is in.

VertexIntrospector
(VertexIntrospector&&) noexcept¶


namespace
core

Control program classes¶
poplar/Program.hpp¶

namespace
poplar
Poplar classes and functions.

namespace
core

namespace
program
Namespace for program classes.
Functions

class
Abort
: public poplar::program::Program¶ Public Functions

Abort
(const DebugContext &debugContext = {})¶ Throws an exception.
 Parameters
debugContext
: Optional DebugId and program name.


class
AbortOnCondition
: public poplar::program::Program¶ Public Functions

AbortOnCondition
(Tensor predicate, const DebugContext &debugContext = {})¶ Throws an exception if the predicate tensor evaluates to true.
 Parameters
predicate
: Scalar tensor to test.
debugContext
: Optional DebugId and program name.


class
AssumeEqualAcrossReplicas
: public poplar::program::Program¶  #include <Program.hpp>
A program to mark a tensor as equal across replicas.
This can be used to tell Poplar that the value of a tensor is the same in all replicas (for example, the result of a cross-replica all-gather operation). Poplar will assume this property while checking for divergence in the control flow, and accept programs that it would otherwise have to reject due to the lack of knowledge of tensor values.
Public Functions

AssumeEqualAcrossReplicas
(Tensor t, const DebugContext &debugContext = {})¶


class
Call
: public poplar::program::Program¶  #include <Program.hpp>
A program to perform a function call to a previously stored program.
Public Functions

Call
(Function f, const DebugContext &debugContext = {})¶ Call the function.
 Parameters
f
: A program that has been added to the graph using Graph::addFunction.
debugContext
: Optional DebugId and program name.


class
Copy
: public poplar::program::Program¶  #include <Program.hpp>
A program that copies data.
Public Functions

Copy
(Tensor src, Tensor dst, bool dontOutline = false, const DebugContext &debugContext = {})¶ Construct a program to copy data from one tensor to another.
This constructor creates a program that will copy data from the src tensor to the dst tensor.
 Parameters
src
: The tensor to copy from.
dst
: The tensor to copy to.
dontOutline
: Do not outline this copy as a function call. Default is false (the copy will be outlined).
debugContext
: Optional DebugId and program name.

Copy
(const DataStream &stream, Tensor dst, bool optimiseMemory = false, const DebugContext &debugContext = {})¶ Construct a program to copy from a data stream to a tensor.
 Parameters
stream
: The stream to copy from.
dst
: The tensor to copy to.
optimiseMemory
: If set to true, will sacrifice speed in order to reduce memory use. For example, rearranging data on host and outlining writes.
debugContext
: Optional DebugId and program name.

Copy
(Tensor src, const DataStream &stream, bool optimiseMemory = false, const DebugContext &debugContext = {})¶ Construct a program to copy a Tensor to a data stream.
 Parameters
src
: The tensor to copy from.
stream
: The stream to copy to.
optimiseMemory
: Set to true to sacrifice speed in order to reduce memory usage.
debugContext
: Optional DebugId and program name.

Copy
(const RemoteBuffer &buffer, Tensor dst, const DebugContext &debugContext = {})¶ Construct a program to copy a remote buffer to a tensor.
 Parameters
buffer
: The remote buffer to copy from.
dst
: The tensor to copy to.
debugContext
: Optional DebugId and program name.

Copy
(const RemoteBuffer &buffer, Tensor dst, Tensor offset, const DebugContext &debugContext = {})¶ Construct a program to copy a remote buffer to a tensor.
The data to be transferred is controlled by the definition of the buffer and the offset parameter.
The buffer has repeat data-transfer “rows”, each containing numElements data items (these are not necessarily the same as rows in the destination tensor). The size of offset defines the number of rows to copy. The rows to be copied are defined by offset: each element of offset is the index of a row to be copied.
The size of dst must be equal to the data transfer size: sizeof(offset) * numElements.
If the offset tensor has more than one element then dst must be a rank 2 tensor with dimensions [offset.numElements(), remoteBuffer.numElements()].
Multiple elements of the offset tensor with the same value will result in undefined behaviour because the order of writes is not guaranteed.
 Parameters
buffer
: The remote buffer to copy from.
dst
: The tensor to copy to.
offset
: The “rows” in the remote buffer to copy from.
debugContext
: Optional DebugId and program name.

Copy
(Tensor src, const RemoteBuffer &buffer, const DebugContext &debugContext = {})¶ Construct a program to copy a tensor to a remote buffer.
 Parameters
src
: The tensor to copy from.
buffer
: The remote buffer to copy to.
debugContext
: Optional DebugId and program name.

Copy
(Tensor src, const RemoteBuffer &buffer, Tensor offset, const DebugContext &debugContext = {})¶ Construct a program to copy a tensor to a remote buffer.
The data that is transferred is controlled by the definition of the buffer and the offset parameter.
The buffer has repeat data transfer “rows”, each containing numElements data items (these are not necessarily the same as rows in the source tensor). The rows to be copied are defined by offset. The size of offset defines the number of rows to copy. Each element of offset is the index of a row to be copied.
The size of src must be equal to the data transfer size: (size of offset) * numElements.
If the offset tensor has more than one element then src must be a rank 2 tensor with dimensions [offset.numElements(), remoteBuffer.numElements()].
Multiple values in the offset tensor with the same value will result in undefined behaviour.
 Parameters
src: The tensor to copy from.
buffer: The remote buffer to copy to.
offset: The “rows” in the remote buffer to copy to.
debugContext: Optional DebugId and program name.
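Taken together, the two offset-based overloads might be used as in this hedged sketch. The buffer handle, sizes, and variable names are invented for illustration, and tile mappings are omitted for brevity:

```cpp
#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>

using namespace poplar;
using namespace poplar::program;

// Sketch only: assumes `graph` was created for a suitable target.
Program buildRemoteCopies(Graph &graph) {
  // A remote buffer with 4 transfer "rows" of 128 elements each.
  RemoteBuffer buf =
      graph.addRemoteBuffer("rb", FLOAT, /*numElements=*/128, /*repeats=*/4);

  // offset selects 2 distinct rows, so the tensor must be rank 2:
  // [offset.numElements(), numElements] = [2, 128].
  Tensor offset = graph.addVariable(UNSIGNED_INT, {2}, "offset");
  Tensor data = graph.addVariable(FLOAT, {2, 128}, "data");

  Sequence seq;
  seq.add(Copy(buf, data, offset)); // read rows offset[0] and offset[1]
  seq.add(Copy(data, buf, offset)); // write them back
  return seq;
}
```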

Copy
(const DataStream &stream, Tensor dst, Tensor expectedIndex, bool rearrangeOnHost = false, const OptionFlags &options = {}, const DebugContext &debugContext = {})¶ Construct a program to copy from a data stream to a tensor.
 Parameters
stream: The data stream to copy from.
dst: The tensor to copy to.
expectedIndex:
rearrangeOnHost:
options:
debugContext: Optional DebugId and program name.

Copy
(Tensor src, const DataStream &stream, Tensor index, bool rearrangeOnHost = false, const OptionFlags &options = {}, const DebugContext &debugContext = {})¶ Construct a program to copy a tensor to a data stream.
 Parameters
src: The tensor to copy from.
stream: The data stream to copy to.
index:
rearrangeOnHost:
options:
debugContext: Optional DebugId and program name.
Private Functions

Copy
(const DataStream &stream, Tensor dst, bool rearrangeOnHost, Tensor offset, size_t repeats, bool optimiseMemory, const OptionFlags &options = {}, const DebugContext &debugContext = {})¶

Copy
(Tensor src, const DataStream &stream, bool rearrangeOnHost, Tensor offset, size_t repeats, bool optimiseMemory, const OptionFlags &options = {}, const DebugContext &debugContext = {})¶


class
CrossReplicaCopy
: public poplar::program::Program¶  #include <Program.hpp>
A program that copies tensors between replicated subgraphs.
Public Functions

CrossReplicaCopy
(Tensor src, Tensor dst, std::map<unsigned, unsigned> replicaMap, const DebugContext &debugContext = {})¶ Constructor to create a program to copy a tensor to the equivalent tensor in a different replica subgraph.
When the replicated graphs are created, this will create a Copy program in each replica. Each replica sends to exactly one other replica and receives from exactly one other replica. A replica may not copy to itself.
 Parameters
src: Replicated tensor to copy from.
dst: Replicated tensor to copy to.
replicaMap: Each key in this map specifies the subgraph or replica that contains the source tensor. The corresponding value is the replica that contains the destination tensor. The size of the replica map is equal to the graph replication factor. Each replica must be represented once as a key (source) and once as a value (destination).
debugContext: Optional DebugId and program name.
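As an illustration of the replicaMap constraint, here is a hedged sketch of a ring exchange over four replicas (src, dst and seq are assumed to exist elsewhere):

```cpp
#include <map>

// Each replica appears exactly once as a key (source) and once as a
// value (destination): replica r sends to replica (r + 1) mod 4.
std::map<unsigned, unsigned> replicaMap = {{0, 1}, {1, 2}, {2, 3}, {3, 0}};
seq.add(poplar::program::CrossReplicaCopy(src, dst, replicaMap,
                                          {"ringExchange"}));
```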


class
ErrorProgram
: public poplar::program::Program¶ Public Functions

ErrorProgram
(StringRef message, Tensor debugTensor, const DebugContext &debugContext = {})¶ Throw an error.
Prints out a message and then throws an error.
 Parameters
message: String to print.
debugTensor: Tensor that will be printed after the message to aid debugging.
debugContext: Optional DebugId and program name.


class
Execute
: public poplar::program::Program¶  #include <Program.hpp>
Program that executes a compute set in the graph.
Public Functions

Execute
(ComputeSet cs, const DebugContext &debugContext = {})¶ Construct a graph execution program.
 Parameters
cs: The compute set to execute.
debugContext: Optional DebugId and program name.

Execute
(ComputeSet cs, Tensor t, const DebugContext &debugContext = {})¶ Construct a graph execution program and write the exit status to a scalar tensor.
The exit status is the logical AND of the return values of the vertices in the compute set.
 Parameters
cs: The compute set to execute.
t: The tensor to write the exit status to.
debugContext: Optional DebugId and program name.


class
If
: public poplar::program::Program¶  #include <Program.hpp>
A program that runs one of two programs depending on the value of a scalar tensor.
Public Functions

If
(Tensor predicate, const Program &trueBody, const Program &falseBody, const DebugContext &debugContext = {})¶ A program that executes trueBody or falseBody depending on the value of predicate.
You can pass an empty Sequence to either trueBody or falseBody if you don’t want that branch to do anything. Any nonzero value of the predicate is treated as true.
 Parameters
predicate: The scalar tensor (Type BOOL, UNSIGNED_INT, INT, SHORT or UNSIGNED_SHORT) that determines which branch to execute.
trueBody: This program is run if the predicate is true.
falseBody: This program is run if the predicate is false.
debugContext: Optional DebugId and program name.
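A hedged usage sketch (graph, trueProg and falseProg are assumed to exist; names are illustrative):

```cpp
using namespace poplar;
using namespace poplar::program;

Tensor pred = graph.addVariable(BOOL, {}, "pred");
graph.setTileMapping(pred, 0);

Sequence seq;
seq.add(If(pred, trueProg, falseProg));
// An empty Sequence serves as a no-op branch:
seq.add(If(pred, trueProg, Sequence()));
```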


class
PrintTensor
: public poplar::program::Program¶ Public Functions

PrintTensor
(Tensor t, const DebugContext &debugContext = {})¶ Print the contents of a tensor.
You can send the output to a different stream by using the Engine::setPrintTensorStream function.
 Parameters
t: The tensor to print.
debugContext: Optional DebugId and program name.


class
Program
¶  #include <Program.hpp>
This class represents a control program that executes operations on the graph.
The class should not be explicitly constructed but one of its subclasses should be constructed instead.
Subclassed by poplar::program::Abort, poplar::program::AbortOnCondition, poplar::program::AssumeEqualAcrossReplicas, poplar::program::Call, poplar::program::Copy, poplar::program::CrossReplicaCopy, poplar::program::ErrorProgram, poplar::program::Execute, poplar::program::If, poplar::program::PrintTensor, poplar::program::Repeat, poplar::program::RepeatWhileFalse, poplar::program::RepeatWhileTrue, poplar::program::Sequence, poplar::program::Switch, poplar::program::Sync, poplar::program::WriteUndef
Friends

friend bool operator== (const Program &lhs, const Program &rhs)


class
Repeat
: public poplar::program::Program¶  #include <Program.hpp>
A program that repeatedly executes for a fixed number of iterations.
For more flexible loop operations see the PopLibs functions popops::countedLoop() and popops::countedForLoop().
Public Functions

Repeat
(unsigned count, const Program &prog, const DebugContext &debugContext = {})¶ Construct a repeat program.
 Parameters
count: The number of iterations to repeat for.
prog: The program to repeatedly execute.
debugContext: Optional DebugId and program name.
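For example, a body might be repeated a fixed number of times like this (cs is an assumed compute set; the debug name is illustrative):

```cpp
using namespace poplar::program;

// Execute the compute set 10 times in a row.
Repeat loop(10, Execute(cs), {"tenIterations"});
```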


class
RepeatWhileFalse
: public poplar::program::Program¶  #include <Program.hpp>
A program that executes a program repeatedly while a condition is false.
The program starts by executing the condition program, cond, which should set the value of predicate. If predicate is true, then the loop exits. If predicate is false then the body program is executed and then it loops to execute the cond program again.
This is like a C while statement with an inverted condition.
Public Functions

RepeatWhileFalse
(const Program &cond, Tensor predicate, const Program &body, const DebugContext &debugContext = {})¶ Construct a repeatwhilefalse program.
 Parameters
cond: The program executed before predicate is evaluated. The normal use case is that this will set the value of predicate.
predicate: The scalar tensor (Type BOOL, UNSIGNED_INT, INT, SHORT or UNSIGNED_SHORT) that determines whether to execute body. Any nonzero value of the predicate is treated as true.
body: The body to execute when predicate is false.
debugContext: Optional DebugId and program name.


class
RepeatWhileTrue
: public poplar::program::Program¶  #include <Program.hpp>
A program that executes a program repeatedly while a condition is true.
The program starts by executing the condition program, cond, which should set the value of predicate. If predicate is false, then the loop exits. If predicate is true then the body program is executed, and then it loops to execute the cond program again.
This is like a C while statement.
Public Functions

RepeatWhileTrue
(const Program &cond, Tensor predicate, const Program &body, const DebugContext &debugContext = {})¶ Construct a repeatwhiletrue program.
 Parameters
cond: The program executed before predicate is evaluated. The normal use case is that this will set the value of predicate.
predicate: The scalar tensor (Type BOOL, UNSIGNED_INT, INT, SHORT or UNSIGNED_SHORT) that determines whether to execute body. Any nonzero value of the predicate is treated as true.
body: The body to execute when predicate is true.
debugContext: Optional DebugId and program name.
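A hedged sketch of the while-loop pattern (condProg and bodyProg are assumed programs; condProg is expected to update the predicate each iteration):

```cpp
using namespace poplar;
using namespace poplar::program;

Tensor continueFlag = graph.addVariable(BOOL, {}, "continue");
graph.setTileMapping(continueFlag, 0);

// Loops while continueFlag is nonzero, like a C while statement.
RepeatWhileTrue loop(condProg, continueFlag, bodyProg);
// RepeatWhileFalse(condProg, continueFlag, bodyProg) inverts the test,
// looping until the predicate becomes true.
```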


class
Sequence
: public poplar::program::Program¶  #include <Program.hpp>
Program that executes a sequence of programs.
Public Functions

template<class ... T>
Sequence
(T&&... args)¶ Construct an execution sequence from a list of programs.
This variadic constructor is used to create a sequence of programs where the programs are provided as arguments to the constructor. For example:
Sequence(prog1, prog2, prog3)
 Deprecated:
Use Sequence(std::initializer_list<Program>) instead.
 Parameters
args: Parameter pack of all programs in the sequence.

Sequence
(const DebugContext &debugContext = {})¶ Construct an empty execution sequence (with optional debug context).

Sequence
(std::initializer_list<Program> programs, const DebugContext &debugContext = {})¶ Construct an execution sequence from a list of programs.
This constructor is used to create a sequence of programs where the programs are provided as arguments to the constructor.
Sequence{prog1, prog2, prog3}
Sequence({prog1, prog2, prog3}, {debugId})
Sequence({prog1, prog2, prog3}, {debugId, "debugName"})
 Parameters
programs: List of programs in the sequence.
debugContext: Optional DebugId and program name.
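A hedged sketch of the non-deprecated construction form (cs1, cs2 and result are assumed to exist; Sequence::add is assumed to be part of the Sequence interface):

```cpp
using namespace poplar::program;

// Initializer-list form, with an optional DebugContext.
Sequence seq({Execute(cs1), Execute(cs2)}, {"forwardPass"});
// Programs can also be appended after construction.
seq.add(PrintTensor(result));
```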


class
Switch
: public poplar::program::Program¶  #include <Program.hpp>
A program that runs one of many programs depending on the value of a tensor.
The controlling tensor must be a scalar of type INT or UNSIGNED_INT. A switch consists of a number of switch cases, each with a case value and a case body, and a default case. The case values must be unique. If the value of the controlling tensor matches the case value of a case, the corresponding case body is run; otherwise the default case is run.
Public Functions

Switch
(Tensor control, const std::vector<std::pair<std::int32_t, Program>> &cases, const DebugContext &debugContext = {})¶ Construct a switch with the specified set of cases and an empty default case.
 Parameters
control: The controlling tensor.
cases: The cases of the switch: value and program to run.
debugContext: Optional DebugId and program name.

Switch
(Tensor control, const std::vector<std::pair<std::int32_t, Program>> &cases, const Program &defaultCaseBody, const DebugContext &debugContext = {})¶ Construct a switch with the specified set of cases and default case.
 Parameters
control: The controlling tensor.
cases: The cases of the switch: value and program to run.
defaultCaseBody: The body of the default case.
debugContext: Optional DebugId and program name.

Switch
(Tensor control, const DebugContext &debugContext = {})¶ Construct a switch with no cases and an empty default case.
The add() method can be used to add cases after the switch is constructed.
 Parameters
control: The controlling tensor.
debugContext: Optional DebugId and program name.

Switch
(Tensor control, const Program &defaultCaseBody, const DebugContext &debugContext = {})¶ Construct a switch with no cases and the specified default case.
The add() method can be used to add cases after the switch is constructed.
 Parameters
control: The controlling tensor.
defaultCaseBody: The body of the default case.
debugContext: Optional DebugId and program name.
Public Static Functions

Switch
switchWithBoundsChecking
(Tensor control, const std::vector<std::pair<std::int32_t, Program>> &cases, const DebugContext &debugContext = {})¶ A helper function that causes the default case to throw an error.

Switch
switchWithUnreachableDefault
(Tensor control, const DebugContext &debugContext = {})¶ This function lets the compiler assume the default case is unreachable.
If the control value is something other than one of the cases, it results in undefined behaviour (although there is some very minimal error checking at runtime).
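A hedged sketch of constructing a switch (the case values and the progA..progD programs are invented for illustration; the exact signature of add() is assumed):

```cpp
using namespace poplar;
using namespace poplar::program;

Tensor mode = graph.addVariable(UNSIGNED_INT, {}, "mode");
graph.setTileMapping(mode, 0);

// Two explicit cases plus a default body.
Switch sw(mode, {{0, progA}, {1, progB}}, /*defaultCaseBody=*/progC);

// Or start empty and add cases afterwards with the add() method
// mentioned above.
Switch sw2(mode);
sw2.add(2, progD);
```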
Private Functions

Switch
(Tensor control, const Program &defaultCaseBody, const bool unreachableDefault, const DebugContext &debugContext = {})¶


class
Sync
: public poplar::program::Program¶  #include <Program.hpp>
A program to synchronise at a certain granularity dictated by the SyncType.
Public Functions

Sync
(SyncType type, const DebugContext &debugContext = {})¶
 Parameters
type: The type of sync to perform.
debugContext: Optional DebugId and program name.


class
WriteUndef
: public poplar::program::Program¶  #include <Program.hpp>
A program to mark a tensor as containing an undefined value.
This can be used to improve the liveness analysis of tensors and save memory in some situations.
Poplar does liveness analysis using the standard algorithm, except that Poplar’s variables are not scalar values; they are arrays (tensors). In the standard analysis, a variable is “killed” when it is written to with a new value. This means that it is dead immediately before that point because its value there can never be read.
int a = 1; // a is dead here because its current value (1) can never be read.
a = 2;     // a is killed here, which makes it dead on the line above.
In Poplar, a variable is killed when all of its elements are written in the same compute set. Consider the pseudocode:
var = graph.addVariable(FLOAT, {2}, ...);
seq.add(Execute( var[0] = 1, var[1] = 2 ));
// var is dead here (it is killed on the line below) because none of its
// element values (1, 2) can ever be read.
seq.add(Execute( var[0] = 3, var[1] = 4 ));
If only some of the elements are written then the entire variable is still live before the write because we may still need the values of the elements that were not written to.
seq.add(Execute( var[0] = 1, var[1] = 2 ));
// var is alive here because the value 2 might be read later.
seq.add(Execute( var[0] = 3 ));
var is still alive because no compute set writes to every element. If the entire variable is overwritten but in separate compute sets, then it will still be considered to be live because Poplar does not track the liveness of each variable element, only the entire variable.
seq.add(Execute( var[0] = 1, var[1] = 2 ));
// var is alive here even though 1 and 2 can never be read.
seq.add(Execute( var[0] = 3 ));
seq.add(Execute( var[1] = 4 ));
This means var is alive for longer than necessary, which may lead to increased memory use. One solution is for Poplar to track the liveness of every variable element separately, but that would be prohibitively expensive. Instead, this program provides a way to manually mark a tensor as being dead by writing an undefined value to it. Changing the above code to the following results in the correct liveness.
seq.add(Execute( var[0] = 1, var[1] = 2 ));
// Manually kill var because we know (even if Poplar does not) that
// it is about to be completely overwritten.
seq.add(WriteUndef(var));
seq.add(Execute( var[0] = 3 ));
seq.add(Execute( var[1] = 4 ));
For more information about liveness analysis see https://en.wikipedia.org/wiki/Live_variable_analysis and https://www.cl.cam.ac.uk/teaching/2006/OptComp/slides/lecture03.pdf
 Parameters
t: The tensor to mark as undefined.
debugContext: Optional DebugId and program name.
Public Functions

WriteUndef
(Tensor t, const DebugContext &debugContext = {})¶

Device management¶
poplar/TargetType.hpp¶

namespace
poplar
Poplar classes and functions.
Enums

enum
TargetType
¶ Enum to represent the type of a device capable of running a graph.
Values:

enumerator
IPU
¶ Run on real IPU hardware.

enumerator
IPU_MODEL
¶ Model of the IPU which actually runs on the CPU but behaves like an IPU.

enumerator
CPU
¶ Run code on the CPU.
This does not accurately replicate all the functionality of an IPU and should only be used for running simple tests.

Functions

std::string
toString
(TargetType t)¶ Convert the target type to a string.
Throws an exception if an undefined type is passed, e.g. static_cast<TargetType>(100).

poplar/Target.hpp¶

namespace
poplar
Poplar classes and functions.
Functions

void
copyDeviceHalfToFloat
(const Target &target, const void *src, float *dst, std::size_t numElements)¶ Convert device half-precision values to floats.
 Parameters
target: Target that the half-precision data is to be copied from.
src: Pointer to the start of the half-precision data.
dst: Pointer to the float data to write.
numElements: Number of items to convert.

void
copyFloatToDeviceHalf
(const Target &target, const float *src, void *dst, std::size_t numElements)¶ Convert float values to device half-precision values.
 Parameters
target: Target that the half-precision data is to be copied to.
src: Pointer to the float data to read.
dst: Pointer to the half-precision data to write.
numElements: Number of items to convert.

void
copyDeviceHalfToDouble
(const Target &target, const void *src, double *dst, std::size_t numElements)¶ Convert device half-precision values to doubles.
 Parameters
target: Target that the half-precision data is to be copied from.
src: Pointer to the start of the half-precision data.
dst: Pointer to the double-precision data to write.
numElements: Number of items to convert.

void
copyDoubleToDeviceHalf
(const Target &target, const double *src, void *dst, std::size_t numElements)¶ Convert double-precision values to device half-precision values.
 Parameters
target: Target that the half-precision data is to be copied to.
src: Pointer to the double-precision data to read.
dst: Pointer to the half-precision data to write.
numElements: Number of items to convert.
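A hedged round-trip sketch of the conversion helpers (target is an assumed Target; device halves occupy 2 bytes each, so a raw byte buffer is used for the intermediate data):

```cpp
#include <poplar/Target.hpp>
#include <vector>

std::vector<float> in = {1.0f, 0.5f, -2.0f};
std::vector<char> half(in.size() * 2); // 2 bytes per half value
std::vector<float> out(in.size());

poplar::copyFloatToDeviceHalf(target, in.data(), half.data(), in.size());
poplar::copyDeviceHalfToFloat(target, half.data(), out.data(), in.size());
// out now holds the values after a float -> half -> float round trip
// (small values like these are exactly representable in half precision).
```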

class
Target
¶  #include <Target.hpp>
A target representation.
The Target class holds characteristics of a compilation target and enables interaction with it.
Target creation options
ipuLinkConfiguration (Default, BarleyTwist, SlidingWindow) [=Poplar decides]
The configuration used for the IPU-to-IPU connections (known as the Newmanry network). If it is not set, Poplar decides based on the number of IPUs. Note that ‘Default’ is not the default!
syncConfiguration (intraReplicaAndAll, ipuAndAll) [=intraReplicaAndAll]
The configuration of the hardware synchronisation groups. Note that the ‘target.syncReplicasIndependently’ engine option determines which of the synchronisation groups is used for host synchronisation.
intraReplicaAndAll: The first sync group is used to sync IPUs within a replica and the second sync group is used to sync all IPUs.
ipuAndAll: The first sync group is used to sync each IPU independently with the host (if the target.syncReplicasIndependently option is set) and the second sync group is used to sync all IPUs.
ipuLinkTopology (mesh, torus) [=mesh]
The topology of the IPU links. It describes how the IPUs in the system are connected.
mesh: The IPUs are connected as a ladder.
torus: The IPUs are connected as a ladder, with the top and bottom of the ladder linked together.
ipuLinkDomainSize Integer [=64]
The number of IPUs connected via IPU links. Two IPU link domains can be connected together via gateway links.
gatewayMode (true, false) [=false]
Enable GWMODE (Gateway Mode) in the PCI Complex.
gatewayMultiReadServiceTable (true, false) [=false]
Enable 32 read service table entries in the Gateway; the default is 1.
Public Functions

Target
()¶

~Target
()¶

TargetType
getTargetType
() const¶ The target type.

unsigned
getNumIPUs
() const¶ The number of IPUs.

unsigned
getTilesPerIPU
() const¶ The number of tiles per IPU.

unsigned
getNumWorkerContexts
() const¶ The number of worker contexts per tile.

unsigned
getBytesPerTile
() const¶ Bytes of memory per tile.

unsigned
getExchangeBytesPerCycle
() const¶ The bandwidth of internal IPU exchange in bytes per cycle.

unsigned
getMemcpyBytesPerCycle
() const¶ The maximum bandwidth for internal data copies on a tile.

unsigned
getMinIPUSyncDelay
() const¶ The IPU sync delay for the tile that is closest to the sync controller.

unsigned
getGlobalSyncCycles
() const¶ The number of clock cycles required to synchronize all IPUs.

unsigned
getInterleavedMemoryElementIndex
() const¶ Memory element offset index for interleaved memory.

const std::vector<GlobalExchangeConstraint> &
getGlobalExchangeConstraints
() const¶ Set of constraints that provide a lower bound on the time it takes to send data between IPUs.

unsigned
getNumStrideBits
() const¶

unsigned
getDataPathWidth
() const¶ The width of the load/store data path within the tile.

unsigned
getFp16ConvUnitMaxPipelineDepth
() const¶ The maximum pipeline depth of the convolution units within the tile for fp16.

unsigned
getFp32ConvUnitMaxPipelineDepth
() const¶ The maximum pipeline depth of the convolution units within the tile for fp32.

unsigned
getFp16ConvUnitInputLoadElemsPerCycle
() const¶ The number of input elements loaded per cycle in f16 convolution unit.

unsigned
getFp32ConvUnitInputLoadElemsPerCycle
() const¶ The number of input elements loaded per cycle in f32 convolution unit.

unsigned
getFp16InFp16OutConvUnitsPerTile
() const¶ The number of convolution units in the tile that can be used when partial results are output as 16 bits and inputs are 16 bits.

unsigned
getFp16InFp32OutConvUnitsPerTile
() const¶ The number of convolution units in the tile that can be used when partial results are output as 32 bits and inputs are 16 bits.

unsigned
getFp32InFp32OutConvUnitsPerTile
() const¶ The number of convolution units in the tile that can be used when accumulating to 32 bit values.

unsigned
getConvUnitCoeffLoadBytesPerCycle
() const¶ The number of convolutional weights that can be loaded in a cycle.

unsigned
getRptCountMax
() const¶

bool
supportsExchangeBusSharing
() const¶ Whether tiles can share the local exchange bus during exchange.
The number of consecutive tiles that can share the exchange bus.

unsigned
getNumTiles
() const¶ Get the total number of tiles for this target (tiles per IPU * number of IPUs).

std::uint64_t
getMemoryBytes
() const¶ Get the total amount of memory on this target, across all IPUs.

unsigned
getFloatVectorWidth
() const¶ How many floats can be processed in one vector operation.
Equivalent to getDataPathWidth() / 32.

unsigned
getHalfVectorWidth
() const¶ How many halves can be processed in one vector operation.
Equivalent to getDataPathWidth() / 16.

unsigned
getVectorWidth
(const poplar::Type &type) const¶ How many of the given type can be processed in one vector operation.

unsigned
getWeightsPerConvUnit
(bool floatActivations) const¶

unsigned
getConvUnitInputLoadElemsPerCycle
(bool floatActivations) const¶

unsigned
getMaxIPUSyncDelay
() const¶ Get the maximum number of cycles required for an IPU sync in the best case scenario (all tiles are immediately ready).

double
getTileClockFrequency
() const¶ Get the tile clock frequency in Hertz.

unsigned
getNumTilesPerXBContext
() const¶ Get the number of tiles per exchange-block context (with repair).

unsigned
getNumContextsPerXB
() const¶ Get the number of contexts per exchange-block.

std::size_t
getAtomicStoreGranularity
() const¶ Get the granularity of atomic stores that can be made by independent parallel worker threads.
 Return
The granularity in bytes.

uint32_t
makeFpIctlValue
(bool inv, bool div0, bool oflo, bool esr, bool nanoo) const¶ Generate a value that could be written to Floating Point Initial Control Value register CSR_S.FP_ICTL in order to configure it with the specified options.
 Parameters
inv: If true, a floating-point invalid operation (defined by IEEE 754) will cause an exception. The invalid operations are:
Addition or subtraction where the operands are + or - infinity (inf) and the operation results in the subtraction of two infs; for example: (-inf)+(+inf) or (+inf)-(+inf).
Divisions: (+/-0)/(+/-0) and (+/-inf)/(+/-inf).
Multiplications: (+/-0)*(+/-inf) and (+/-inf)*(+/-0).
Remainder: x REM y where y = 0 or x = (+/-inf).
Real operations with complex results, such as the square root or logarithm of a negative number.
Operations with Not-a-Number as at least one operand.
Comparisons where one of the operands is Not-a-Number.
See also nanoo below.
div0: If true, a floating-point divide-by-zero operation will cause an exception.
oflo: If true, a floating-point overflow will cause an exception.
esr: Enable stochastic rounding.
nanoo: Enable Not-a-Number on overflow mode. When enabled, half-precision calculations that have overflowed will produce a Not-a-Number result, rather than saturating to the half-precision max/min value, and the invalid operation (inv) flag will be set.
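For example, the flags might be combined like this (a hedged sketch; target is an assumed Target object):

```cpp
#include <cstdint>

// Trap invalid operations and overflow, and enable stochastic rounding.
uint32_t ictl = target.makeFpIctlValue(/*inv=*/true, /*div0=*/false,
                                       /*oflo=*/true, /*esr=*/true,
                                       /*nanoo=*/false);
// The register this value is intended for:
unsigned regIndex = target.getFpIctlRegIndex();
```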

unsigned
getFpIctlRegIndex
() const¶ Return the register index of the Floating Point Initial Control Value register CSR_S.FP_ICTL.

unsigned
getDbgDataRegIndex
() const¶ Return the register index of CSR_C.DBG_DATA.

IpuLinkConfiguration
getIpuLinkConfiguration
() const¶ Return the IPU link configuration of this target.

IpuLinkTopology
getIpuLinkTopology
() const¶ Return the IPU link topology.

unsigned
getIpuLinkDomainSize
() const¶ Return the size of the IPU link domain.
That is the number of IPUs that are connected via IPU links.

bool
getGatewayMode
() const¶

bool
getGatewayMultiReadServiceTable
() const¶

Target
createVirtualTarget
(unsigned numIPUs, unsigned tilesPerIPU) const¶ Create a “virtual” target consisting of a subset of the target’s tiles.
This method returns a target object that references the same state as this target but only uses a subset of the target’s tiles.
 Return
The virtual target object.
 Parameters
numIPUs: The number of IPUs the target should be for.
tilesPerIPU: The number of tiles per IPU.
Public Static Functions

Target
createCPUTarget
(bool accurateHalf = false, unsigned numIPUs = 1)¶ Create a CPU target.
Create a target for executing a simple graph on the CPU. This target will have 1 IPU with 1 tile on 1 worker thread.
This should only be used for simple functional testing.
 Return
A Target object that can be used to create a graph.

Target
createIPUTarget
(unsigned numIPUs, StringRef systemType, const OptionFlags &opts = {})¶ Create an IPU target.
Create an IPU target with a specified number of IPUs based on the given system type.
 Return
A Target object that can be used to create a graph.
 Parameters
numIPUs: The number of IPUs the target should be for.
systemType: The ID of the system. Possible options: "ipu1".
opts: The options passed to the target.

Target
createIPUTarget
(unsigned numIPUs, unsigned tilesPerIPU, StringRef systemType, const OptionFlags &opts = {})¶ Create an IPU target with a virtual number of tiles.
Create an IPU target with a specified number of IPUs based on the given system type. In addition, the number of tiles can be restricted to a smaller virtual number of observable tiles.
 Return
A Target object that can be used to create a graph.
 Parameters
numIPUs: The number of IPUs the target should be for.
tilesPerIPU: The number of tiles per IPU.
systemType: The ID of the system. Possible options: "ipu1".
opts: The options passed to the target.

Target
createIPUTarget
(unsigned numIPUs, StringRef systemType, const core::TargetOptions &opts)¶ Create an IPU target.
Create an IPU target with a specified number of IPUs based on the given system type.
 Return
A Target object that can be used to create a graph.
 Parameters
numIPUs: The number of IPUs the target should be for.
systemType: The ID of the system. Possible options: "ipu1".
opts: The options passed to the target.

Target
createIPUTarget
(unsigned numIPUs, unsigned tilesPerIPU, StringRef systemType, const core::TargetOptions &opts)¶ Create an IPU target with a virtual number of tiles, and target options.
Create an IPU target with a specified number of IPUs based on the given system type. In addition, the number of tiles can be restricted to a smaller virtual number of observable tiles. This overload also accepts target options that can be obtained from another target.
 Return
A Target object that can be used to create a graph.
 Parameters
numIPUs: The number of IPUs the target should be for.
tilesPerIPU: The number of tiles per IPU.
systemType: The ID of the system. Possible options: "ipu1".
opts: The options passed to the target.
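A hedged sketch of typical target creation, using "ipu1" as the system type per the options listed above (the tile restriction is illustrative):

```cpp
#include <poplar/Graph.hpp>
#include <poplar/Target.hpp>

using namespace poplar;

// A 2-IPU target, and a variant restricted to 4 observable tiles per
// IPU (useful for faster compiles while developing).
Target full = Target::createIPUTarget(2, "ipu1");
Target small = Target::createIPUTarget(2, /*tilesPerIPU=*/4, "ipu1");
Graph graph(small);
```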

namespace
core

poplar/Device.hpp¶

namespace
poplar
Poplar classes and functions.

class
Device
¶  #include <Device.hpp>
A device refers to a physical entity that can execute code.
Devices should be obtained from a poplar::DeviceManager object or from an appropriate poplar::Device::createXXXDevice() factory function. Devices cannot be copied but can be moved.
Public Functions

Device
()¶

~Device
()¶

unsigned
getId
() const¶ Get the numerical ID of this device as known by the DeviceManager.

bool
attach
() const¶ Try and acquire this device and lock it to the current process.

void
detach
() const¶ Release this device to other processes.

void
getDriverVersion
(unsigned &major, unsigned &minor, unsigned &point) const¶ Retrieve driver version of the attached device.
Throws if the device is not attached or is not an IPU device.

bool
supportsRemoteBuffers
() const¶ Retrieve availability of remote buffers from the attached device.
Throws if the device is not attached or is not an IPU device.

bool
hasGateway
() const¶ Retrieve gateway (such as with an IPUM2000) availability from the attached device.
Throws an exception if the device is not attached or is not an IPU device.

std::vector<int>
getNumaNodesUsedForIPUs
() const¶ Get the NUMA nodes that Poplar will use to execute code that communicates with each IPU that makes up this device.
If Poplar can’t execute code on the NUMA node for an IPU then this function returns -1 for that IPU. Poplar will interpret the -1 as disabling NUMA node pinning for that IPU.
Note that this function is not necessarily the same as getNumaTopology(), as it also handles NUMA node restrictions imposed by the Poplar process’ CPU affinity. For example, on a machine with two NUMA nodes, with IDs of 0 and 1, each connected to one CPU and one IPU, a Poplar process that is bound to CPU 1 will use CPU 1 to execute stream callbacks for IPUs on both NUMA nodes 0 and 1, so this function would return [1, 1] whereas getNumaTopology() would return [0, 1].
Note that if the lookup of available host NUMA nodes fails then this function will return a vector of -1s, with one element for each IPU.

std::vector<unsigned>
getDriverIDs
() const¶ Get the list of driver device IDs that make up this device.

void
reset
() const¶ Reset the device’s state.

Device
createVirtualDevice
(unsigned tilesPerIPU)¶ Create a virtual device with a restricted number of tiles per IPU.
This method provides a smaller “virtual” device whose target only shows a subset of the tiles on the underlying device.
The calling object becomes a null device (the underlying device is moved into the returned Device object).
Public Static Functions

Device
createCPUDevice
(unsigned numOfIPUs = 1)¶ Create a device that executes vertex code on the host CPU.
This is only suitable for running small amounts of code; for example, for functional testing. It may not reproduce exactly the same functionality as running on an IPU. Also, functions such as Engine::getTileClockFrequency() may not return meaningful results.
Note that the number of IPUs in a CPU device must equal the top-level replication factor of the graph that will run on the device (1 IPU per replica).
 Parameters
numOfIPUs: The number of IPUs the device will have.

Device
createSimulatorDevice
(const Target &target, const OptionFlags &options = {})¶


namespace
core

poplar/DeviceManager.hpp¶

namespace
poplar
Poplar classes and functions.

class
DeviceManager
¶  #include <DeviceManager.hpp>
A DeviceManager is able to enumerate and return groups of physical IPUs connected to an entity/host.
It returns such a group of IPUs as a single poplar::Device with a unique device manager ID.
The physical devices within any returned Device may overlap with other Devices returned.
Any poplar::Device returned cannot be copied but can be moved for further use.
It is thread safe to construct multiple DeviceManagers in different threads and use them at the same time (although both threads might return the same device, in which case only one will succeed in attaching to it). It is also thread safe to use the same DeviceManager in different threads.
Public Functions

DeviceManager
()¶

DeviceManager
(const DeviceManager&)¶

DeviceManager
(DeviceManager&&) noexcept¶

~DeviceManager
()¶

std::vector<Device>
getDevices
(const OptionFlags &opts = {}) const¶ Get the list of all devices.

std::vector<Device>
getDevices
(TargetType type, unsigned requiredNumIPUs, const OptionFlags &opts = {}) const¶ Get the list of all devices fulfilling the specified criteria.
Depending on the criteria, the list may be empty; for example, if the
requiredNumIPUs
cannot be satisfied by any available device configurations. To view available device configurations, see the gcinfo command line tool. Return
A potentially empty list of matching devices
 Parameters
type
: The desired target type (IPU, IPU_Model, CPU)
requiredNumIPUs
: The number of IPUs required
opts
: The arguments passed to the target (optional)

Device
getDevice
(unsigned deviceManagerId, const OptionFlags &opts = {}) const¶ Get a specific device by its device manager id.
 Return
A matching device
 Parameters
deviceManagerId
: The ID of the requested device. The ID is that returned by the gcinfo command. This can specify a single device or a group of devices.
opts
: The arguments passed to the target (optional)

std::vector<unsigned>
getChildDeviceIds
(unsigned parentId, unsigned numChildDeviceIpus = 1) const¶ Get the deviceIds of the child devices of a multi-IPU device.
A multi-IPU device will fully overlap “child” devices that are made out of the same IPUs. This method returns the set of child devices.
 Parameters
parentId
: The device ID of the parent device
numChildDeviceIpus
: The number of IPUs the child devices must contain to be considered a child.
Public Static Functions

DeviceManager
createDeviceManager
()¶ Create a device manager for the current host.
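A typical enumeration-and-attach loop, sketched under the assumption that the Poplar SDK is available (Device::attach() is used here because, as noted above, returned devices may overlap and another process may win the race to attach):

```cpp
#include <poplar/DeviceManager.hpp>
#include <utility>

poplar::DeviceManager manager = poplar::DeviceManager::createDeviceManager();

// Request hardware devices with exactly one IPU; the list may be empty.
auto candidates = manager.getDevices(poplar::TargetType::IPU, 1);

poplar::Device device;
bool attached = false;
for (auto &d : candidates) {
    if (d.attach()) {          // may fail if another process holds the IPUs
        device = std::move(d); // Devices are movable but not copyable
        attached = true;
        break;
    }
}
```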


namespace
core

poplar/IpuLinkConfiguration.hpp¶
Graph execution¶
poplar/Engine.hpp¶

namespace
pva
¶

namespace
poplar
Poplar classes and functions.
Functions

Executable
compileGraph
(const Graph &graph, ArrayRef<program::Program> progs, const OptionFlags &opt = {}, ProgressFunc progressCallBack = ProgressFunc(), const DebugContext &debugContext = {})¶ Compile the given graph and programs to make an executable that can be executed using a poplar::Engine.
 Parameters
graph
: The graph to compile.
progs
: The list of programs to run over the graph. Each program can be run separately by calling Engine::run() and passing the index, in this list, of the program to run.
opt
: Options that can be used to control compilation and execution. The available options are listed under Engine.
progressCallBack
: A function that will be called to indicate engine compilation progress. See Engine::ProgressFunc for more information.
debugContext
: Optional DebugId and debug name.
profileWriter
: Optional parameter to manage profiler writing.
 Exceptions
invalid_option
: If any of the options passed in opt were not recognised or were improperly formatted.
link_error
: If program linking fails; for example, due to undefined symbols or lack of memory on a tile.
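The documented exceptions can be handled around an ahead-of-time compile; a sketch assuming the Poplar SDK, with graph and program construction elided:

```cpp
#include <poplar/Engine.hpp>
#include <poplar/exceptions.hpp>
#include <utility>

// `graph` and `progs` are assumed to have been built elsewhere.
try {
    poplar::Executable exe = poplar::compileGraph(graph, progs);
    poplar::Engine engine(std::move(exe)); // run later with engine.run(i)
} catch (const poplar::invalid_option &e) {
    // An entry in `opt` was unrecognised or badly formatted.
} catch (const poplar::link_error &e) {
    // Linking failed: undefined symbols or not enough tile memory.
}
```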
Variables

const unsigned
WORKER_SCRATCH_SIZE
= 48¶ Size in bytes of the scratch space available to each worker thread.

class
Engine
¶  #include <Engine.hpp>
A graph compute engine.
The Engine class provides the ability to execute a graph program.
Engine creation options
Options can be overridden with the environment variable
POPLAR_ENGINE_OPTIONS
. For example: POPLAR_ENGINE_OPTIONS='{"target.deterministicWorkers":"true"}'
Engine creation options: Debug
debug.allowOutOfMemory
(true, false) [=false] If true, allow out-of-memory errors while compiling and linking. This is automatically set to true if autoReport.outputGraphProfile is set to true (directly or indirectly).
debug.computeInstrumentationLevel
(vertex, tile, ipu) [=tile] The granularity of compute instrumentation. This option has no effect unless debug.instrumentCompute is true.
vertex: Store the last cycle count of each vertex on every tile.
tile: Store the last cycle count of each compute set on every tile.
ipu: Store the last cycle count of each compute set on one tile per IPU. This saves memory compared to tile (since the cycle counts are always live and this needs to store them on only one tile), but it loses all per-tile cycle information. It works by adding a sync after each compute set and timing how long it takes to get to that sync. So, effectively, it measures the cycle time of the longest-running tile in the compute set.
device: Deprecated. Similar to ipu, but instead of storing the cycle counts on one tile per IPU, it stores them on one single tile across all IPUs, which adds the need for global syncs.
debug.retainDebugInformation
(true, false) [=true] Retain compilation information to help with debugging. Must be true if profiling is enabled.
debug.cpuMultiThreadExecution
(true, false) [=true] If true, operations are executed using multiple host threads for a CPU or IPU Model target. Setting to false may simplify debugging at the cost of reduced performance.
debug.instrument
(true, false) [=false] If true, enable all instrument options (below). This will instruct the engine to add cycle counters to the compiled program to enable the execution profile to be retrieved after the program is run. This is only available for an IPU target (not an IPU Model target). Note that the more specific instrumentation options may override the default. For example, {"debug.instrument":"true", "debug.instrumentExternalExchange":"false"} will instrument everything apart from external exchange.
debug.instrumentCompute
(true, false) [=false] If true, enable instrumentation of compute sets. See debug.instrument.
debug.instrumentExternalExchange
(true, false) [=false] If true, enable instrumentation of external exchanges. See debug.instrument.
debug.instrumentControlFlow
(true, false) [=false] If true, enable instrumentation of loops and conditionals. See debug.instrument.
debug.outputAllSymbols
(true, false) [=false] If true, output additional symbols to the ELF files that are not required but aid debugging.
debug.exceptOnSOCError
(true, false) [=true] If true, throw an exception on a SoC error. If false, the error will be reported in the log instead.
debug.checkForSOCErrorAtRun
(true, false) [=false] If true, check for SoC errors before and after program execution.
debug.profilingTile
Integer [=Tiles per IPU - 1] The tile on which to store the cycle counter for every compute set. This has no effect unless debug.computeInstrumentationLevel is set to ipu.
debug.branchRecordTile
Integer [=NTILES-1] The tile on which to store the branch record. This has no effect unless the debug.instrumentControlFlow flag is set. In a CPU target, this option has no effect. In an IPU Model, it only affects the memory profile.
debug.runtimeVerify
(true, false) [=false] If true, expensive verification steps are enabled at runtime.
debug.trace
(true, false) [=false] If true, a trace is printed to the error stream with the state of every edge before and after the execution of a compute set or exchange.
debug.traceFile
String. Only used if debug.trace is true. If set, the debug trace is output to the specified file instead of the error stream.
debug.verify
(true, false) [=false] If true, expensive verification steps are enabled at compile time. The checks mostly focus on exchange code, including the following:
ensuring variables have been set,
ensuring section/instruction alignment is correct,
and ensuring the total number of bytes received is as expected.
In addition, after laying out memory we verify that the memory constraints on variables are satisfied.
debug.supervisorStackSizeInBytes
Integer. If set, the automatically computed stack size for supervisor threads will be overridden with the specified value (in bytes) for all tiles.
debug.workerStackSizeInBytes
Integer. If set, the automatically computed stack size for worker threads will be overridden with the specified value (in bytes) for all tiles.
Engine creation options: Optimisations
opt.maxCompilationThreads
Integer [=0] The maximum number of threads to use during compilation. A value of 0 means the hardware will be fully utilised.
opt.maxLinkerThreads
Integer [=0] The maximum number of threads to use during linking. A value of 0 means the same number of threads will be used as were used for compilation.
opt.enableSwSyncs
(true, false) [=false] If true, use a software synchronisation scheme to synchronise with the host following a stream copy. The software-based synchronisation scheme lets IPUs start executing the next step as soon as they have finished sending and receiving all their data, without having to wait for every IPU to reach the end of the stream copy.
opt.internalExchangeOptimisationTarget
(balanced, cycles, memory) [=cycles] What balance of heuristics to use when generating exchange code. Can be used to balance exchange memory usage against speed.
cycles: Focus completely on speed at the expense of always-live memory.
memory: Focus completely on minimising the memory footprint, at the expense of speed.
balanced: Sacrifice some speed to attempt to reduce the amount of always-live memory produced.
opt.enableMultiAccessCopies
(true, false) [=true] Enable this option to make some of the copies faster at the expense of adding more constraints on the variables used in the copies.
opt.limitVertexStateToLower256K
(true, false) [=false] Enable this option to optimise the control code by allocating all of the vertex state in the first 256KB of memory. The disadvantage is that this is the same range of memory that the code must be put in, so if the sum of the two is larger than 256KB then the model will fail to compile.
opt.useAutoloader
(true, false) [=true on Mk2 IPU, false otherwise] If true, use the secondary loading mechanism to load the executable. This option is ignored on non-IPU targets.
Engine creation options: Profiler
The profiler options control how Poplar generates the reports that can be viewed in the PopVision Graph Analyser (the graph and execution profiles).
profiler.replicaToProfile
Integer [=All replicas] Specifies which replica (0-based index) will be profiled. Note that a high-level summary of several metrics and timings will still be provided for the whole execution. If not specified, all replicas will be profiled.
Engine creation options: Target
target.deterministicWorkers
(true, false, portable) [=true] Ensure that the mapping of vertices to worker threads is the same for repeated execution either on the same IPU (true), or on every IPU (portable). This guarantee does not hold following breakpoints or exceptions.
target.saveArchive
String. If set, the binary archive will be saved to the specified filename during graph compilation. This archive contains the ELF files for each tile. No archive will be saved unless this option is set.
target.syncMethod
(polling, hybrid, default) [=default] Controls how the host determines when an IPU wants to sync.
polling: Use polling to determine when an IPU wants to sync.
hybrid: Use a mixture of interrupts and polling to determine when an IPU wants to sync.
default: Choose a sensible default method based on the device type. Currently, we default to polling for all device types but this may change in future.
target.syncPollPeriodUs
Integer [=0] The period to use when polling for a host sync, in microseconds.
target.exceptionPollPeriodUs
Integer [=250000] The period to use to poll for possible exceptions in the device, while waiting for the host sync, in microseconds.
target.hostSyncTimeout
Integer [=300] The amount of time to wait for a response from the IPU after running a program, in seconds. “0” means no timeout.
target.gatewayWriteCombining
(true, false) [=false] Optimise write-to-host code to use IPU-Machine gateway write combining.
target.maxStreamCallbackThreadsPerNumaNode
Integer or “auto” [=0] (deprecated) The maximum number of threads per NUMA node to use to execute stream callbacks. A value of 0 means the main thread will execute all of the callbacks, which is the default because a non-zero number of threads requires thread-safe callbacks.
A value of auto means the hardware will be fully utilised; this typically means up to one thread per CPU core is used.
Note that this is the maximum number of threads in addition to the main thread. For example, on a system with two NUMA nodes, setting this option to 1 would mean that a total of three threads could execute callbacks, with one thread pinned to each NUMA node and the main thread operating on one of the two nodes as well (assuming the main thread is free to execute callbacks).
streamCallbacks.multiThreadMode
(singleThread, collaborative, dedicated) [=singleThread] Specifies how the invocation of stream callbacks is parallelised. Be aware that using a multi-threading mode other than singleThread requires that the stream callbacks connected to the engine are thread-safe.
singleThread: the main application thread is used to perform the invocation of stream callbacks. No other threads assist it.
collaborative: the main application thread and any specified number of worker threads collaborate in invoking any necessary stream callbacks.
dedicated: the main application thread will not assist worker threads and will focus on IPU-host exchange events.
streamCallbacks.numaAware
(true, false) [=false] Can only be enabled in a multi-thread mode. Creates a separate thread pool for each NUMA node used by the devices the application is using.
streamCallbacks.numWorkerThreads
Integer or “auto” [=0] The maximum number of threads to execute stream callbacks. A value of 0 means the main thread will execute all of the callbacks.
Specifying the value auto will create as many threads as necessary to fill all the available CPU resources. This typically means one thread per CPU core.
If the value of multiThreadMode is singleThread then the only allowed value is 0. All other values of multiThreadMode require a value greater than 0.
If you enable the option target.numaAwareCallbacks, the worker threads are distributed as evenly as possible to multiple thread pools: one thread pool per NUMA node in use by the attached IPU devices.
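The string options above can also be set programmatically through poplar::OptionFlags rather than via the environment variable; a brief sketch (Poplar SDK assumed, option values illustrative):

```cpp
#include <poplar/OptionFlags.hpp>

poplar::OptionFlags opts;
opts.set("target.deterministicWorkers", "portable");
opts.set("streamCallbacks.multiThreadMode", "collaborative");
opts.set("streamCallbacks.numWorkerThreads", "2"); // callbacks must be thread-safe
// Pass `opts` when constructing the Engine:
// poplar::Engine engine(graph, progs, opts);
```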
Engine creation options: Report generation
The report generation options will automatically output the Poplar reports that can be viewed in the PopVision Graph Analyser.
These options provide a basic ability to capture the reports. For more complex use cases the reports should be generated programmatically via functions in the framework (TensorFlow, PopTorch, PopART or Poplar) in which the application is written.
autoReport.all
(true, false) [=false] Output all the available reports described below.
You can exclude individual reports by combining options. For example, this will generate all reports apart from the serialized graph:
{"autoReport.all":"true", "autoReport.outputSerializedGraph":"false"}
autoReport.outputGraphProfile
(true, false) [=false] Output the graph profile report to profile.pop.
autoReport.outputLoweredVars
(true, false) [=false] Generate lowered variables info in profile.pop. This is equivalent to using the debug.loweredVarDumpFile option with the filename set to profile.pop. To generate the old capnp format, set debug.loweredVarDumpFile to vars.capnp.
autoReport.outputArchive
(true, false) [=false] Output the archive report: archive.a. This is equivalent to using the target.saveArchive option with the filename set to archive.a.
autoReport.outputSerializedGraph
(true, false) [=false] Output the serialized graph: serialized_graph.capnp.
autoReport.outputExecutionProfile
(true, false) [=false] Output the execution profile report to profile.pop.
By default this setting will also set debug.instrument to true. If you do not want instrumentation enabled you can set autoReport.outputExecutionProfile or debug.instrument to false.
autoReport.streamAtEachRun
(true, false) [=true] Applies to profiler format V3 or higher. Enable or disable the streaming of the execution profile to disk at each run. If false, the whole execution will be written to disk on Engine destruction (note that some frameworks, such as TensorFlow, may not properly destroy the Engine).
autoReport.outputDebugInfo
(true, false) [=false] Output debug info: debug.json. This file gathers the data in every DebugInfo object created. Elements in the graph report with debugIds can be related to these DebugInfo objects.
autoReport.executionProfileProgramRunCount
Integer [=2] Specify how many runs of each program to capture in the execution profile.
autoReport.directory
String [=./] Specify which directory you want the reports to be written to. By default they will be written to the current working directory.
Engine creation options: Other
prng.enableStochasticRounding
(true, false) [=false] If true, stochastic rounding is enabled.
You can also enable or disable stochastic rounding using the functions setFloatingPointBehaviour() and setStochasticRounding(). For setFloatingPointBehaviour() the default behaviour is to enable stochastic rounding.
prng.seed
Integer [=0] Base seed for PRNG initialisation.
Public Types

using
ProgressFunc
= std::function<void(int, int)>¶ Callback function used to indicate engine compilation progress.
The function is passed two integers. The first is the progress value and the second is the maximum value for the progress.
If a progress callback is used, the function should not block. All calls to the callback function will be made in a single dedicated thread so blocking in the callback will block the receipt of further notifications (but will not block compilation from progressing). The callback should not use Poplar objects or functions relating to the Graph, Engine or Device that are being compiled.
Public Functions

Engine
(const Graph &graph, ArrayRef<program::Program> progs, const OptionFlags &opt = {}, ProgressFunc progressCallBack = ProgressFunc(), const DebugContext &debugContext = {})¶ Construct the engine from a graph and a list of programs.
 Parameters
graph
: The graph to compile into the engine.

Executable