3. Poplar API reference

3.1. Utility classes

3.1.1. poplar/ArrayRef.hpp

References to arrays.

namespace poplar

Poplar classes and functions.

Functions

template<class T>
bool operator==(const ArrayRef<T> &l, const ArrayRef<T> &r)
template<class T>
bool operator!=(const ArrayRef<T> &l, const ArrayRef<T> &r)
template<class T>
class ArrayRef

Subclassed by poplar::StringRef

Public Types

using value_type = T
using reference = T&
using const_reference = const T&
using difference_type = std::ptrdiff_t
using size_type = std::size_t
using iterator = const T*
using const_iterator = const T*

Public Functions

ArrayRef(std::nullptr_t) = delete
constexpr ArrayRef()
constexpr ArrayRef(const T *p, std::size_t size)
ArrayRef(const std::vector<T> &v)
template<std::size_t N>
constexpr ArrayRef(const std::array<T, N> &a)
template<std::size_t N>
constexpr ArrayRef(const T (&p)[N])
constexpr ArrayRef(const std::initializer_list<T> &list)
constexpr ArrayRef(const ArrayRef&) = default
constexpr const T *data() const
constexpr std::size_t size() const
constexpr bool empty() const
const T &front() const
const T &operator[](std::size_t i) const
const_iterator begin() const
const_iterator end() const
const_iterator cbegin() const
const_iterator cend() const
std::vector<T> cloneAsVector() const
std::size_t max_size() const

Private Members

const T *ptr
std::size_t len
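
For illustration, a minimal sketch of typical ArrayRef usage, based only on the members listed above. An ArrayRef does not own its data, so the referenced container must outlive it.

#include <poplar/ArrayRef.hpp>
#include <vector>

void arrayRefExample() {
  std::vector<int> values = {1, 2, 3, 4};
  poplar::ArrayRef<int> ref(values);            // non-owning view of the vector's storage
  std::size_t n = ref.size();                   // 4
  int first = ref.front();                      // 1
  std::vector<int> copy = ref.cloneAsVector();  // make an owning copy of the referenced data
}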

3.1.2. poplar/Interval.hpp

namespace poplar

Poplar classes and functions.

Typedefs

typedef GenericInterval<std::size_t> Interval

Functions

template<class T>
bool operator==(const GenericInterval<T> &a, const GenericInterval<T> &b)
template<class T>
bool operator<(const GenericInterval<T> &a, const GenericInterval<T> &b)
template<class T>
bool operator!=(const GenericInterval<T> &a, const GenericInterval<T> &b)
template<class T>
bool operator>=(const GenericInterval<T> &a, const GenericInterval<T> &b)
template<class T>
bool operator>(const GenericInterval<T> &a, const GenericInterval<T> &b)
template<class T>
bool operator<=(const GenericInterval<T> &a, const GenericInterval<T> &b)
template<class T>
std::ostream &operator<<(std::ostream &os, const GenericInterval<T> &b)
template<class T>
struct GenericInterval
#include <Interval.hpp>

This class represents an interval that is closed at its lower bound and open at its upper bound.

It is almost always used with T = std::size_t, for which there is a convenient Interval typedef.

Public Functions

GenericInterval() = default

Initialise with begin and end set to their default value of 0.

GenericInterval(T begin, T end)
const T &begin() const
const T &end() const
T size() const

Get the size of the interval.

Return

The size of the interval.

Private Members

T begin_ = {}
T end_ = {}
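
For illustration, a brief sketch based on the members listed above; an Interval describes the half-open range [begin, end):

#include <poplar/Interval.hpp>

poplar::Interval iv(8, 24);                    // covers element indices 8 to 23
auto len = iv.size();                          // 16, i.e. end() - begin()
bool same = (iv == poplar::Interval(8, 24));   // comparison operators are provided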

3.1.3. poplar/OptionFlags.hpp

namespace poplar

Poplar classes and functions.

Functions

ProfileValue getAsProfileValue(const OptionFlags &flags)
void readJSON(StringRef string, OptionFlags &flags)

Read options from a string in JSON format.

Parameters
  • string: The string to parse.

  • flags: The OptionFlags to update.

Exceptions

void readJSON(std::istream &stream, OptionFlags &flags)

Read options from a stream in JSON format.

Parameters
  • stream: The input stream to read from.

  • flags: The OptionFlags to update.

Exceptions

std::ostream &operator<<(std::ostream &ostream, const OptionFlags &flags)

Write the contents of the given flags to an ostream in JSON format.

Parameters
  • ostream: The stream to write to.

  • flags: The OptionFlags to write.
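
For illustration, a minimal sketch of reading options from a JSON string and writing them back out; the option name used here is purely illustrative.

#include <poplar/OptionFlags.hpp>
#include <iostream>

void readAndPrintOptions() {
  poplar::OptionFlags flags;
  poplar::readJSON(R"({"someOption": "true"})", flags);  // parse JSON into the flags
  std::cout << flags << "\n";                            // write the flags back out as JSON
}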

class OptionFlags
#include <OptionFlags.hpp>

A set of option/value string flags to be used in various APIs.

Public Types

using OptionFlag = std::pair<const std::string, std::string>
using initializer_list = std::initializer_list<OptionFlag>

Public Functions

OptionFlags()

Construct a set of option flags.

The default constructor creates an empty set of flags.

~OptionFlags()
OptionFlags(const OptionFlags &other)
OptionFlags(OptionFlags &&other)
OptionFlags &operator=(const OptionFlags &other)
OptionFlags &operator=(OptionFlags &&other)
bool operator==(const OptionFlags &other) const

Option flags are an exact match.

Each collection contains the same keys, and both collections have the same values for each key

OptionFlags(initializer_list &&list)

Construct a set of option flags from an initializer list of string pairs.

Flags are set in the order they appear in the constructor.

Setting a flag more than once will result in the previous value for that option being overwritten.

Parameters
  • initializer: A list of option/value string pairs to set in the flags.

void set(initializer_list &&list)

Set option flags from an initializer list of string pairs.

Flags are set in the order they appear in the list.

Setting a flag more than once will result in the previous value for that option being overwritten. If the option was already set in these flags then the previous value will be overwritten.

Parameters
  • initializer: A list of option/value string pairs to set in the flags.

void set(StringRef option, StringRef value)

Set a single option to a value.

If the option was already set in these flags then the previous value will be overwritten.

Parameters
  • option: The option to set in the flags.

  • value: The value to set the option to in the flags.

StringRef at(StringRef option) const

Retrieves the value of the given option.

If the option does not exist, then an exception is thrown.

Parameters
  • option: The option to retrieve in the flags.

void clear()

Remove all set flags.

iterator begin() const

Get iterators for the currently set option flags.

All iterators are invalidated when a new flag is set or the option flags are re-assigned.

iterator end() const
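
For illustration, a short sketch of typical usage based on the members listed above (the option names are illustrative):

#include <poplar/OptionFlags.hpp>
#include <iostream>

void optionFlagsExample() {
  poplar::OptionFlags flags{{"optionA", "1"}, {"optionB", "off"}};
  flags.set("optionB", "on");                // overwrites the previous value
  std::cout << flags.at("optionB") << "\n";  // throws if the option is not set
  for (const auto &kv : flags)               // iterate over option/value pairs
    std::cout << kv.first << " = " << kv.second << "\n";
}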

Private Members

std::unique_ptr<core::OptionFlags> impl
class iterator : public std::iterator<std::forward_iterator_tag, OptionFlag>

Public Functions

iterator(const iterator &other)
iterator &operator=(const iterator &other)
iterator(iterator &&other) noexcept
iterator &operator=(iterator &&other) noexcept
~iterator()
const OptionFlag &operator*() const
const OptionFlag *operator->() const
bool operator==(const iterator &other) const
bool operator!=(const iterator &other) const
iterator &operator++()
iterator operator++(int)

Private Functions

iterator(std::unique_ptr<core::OptionFlagsIterator> &&p) noexcept

Private Members

std::unique_ptr<core::OptionFlagsIterator> impl

Friends

friend class OptionFlags
namespace core

3.1.4. poplar/RandomSeed.hpp

namespace poplar

Poplar classes and functions.

Functions

Tensor getHwSeeds(Graph &graph, program::Sequence &prog, const DebugContext &debugContext = {})

Gets a snapshot of the hardware seeds for each worker in the device.

Return

A tensor of shape {number of tiles, number of worker contexts, 4}, containing seed values for each of the 4 PRNG_x_y registers for each worker context on each tile.

Parameters
  • graph: The Poplar graph.

  • prog: The program sequence to be extended.

  • debugContext: Optional debug information.

void setHwSeeds(Graph &graph, const Tensor &hwSeeds, program::Sequence &prog, const DebugContext &debugContext = {})

Sets the hardware seeds for each worker in a device from a snapshot of the seeds.

Parameters
  • graph: The Poplar graph.

  • hwSeeds: A tensor of shape {number of tiles, number of worker contexts, 4} containing seed values for each of the 4 PRNG_x_y registers for each worker context on each tile.

  • prog: The program sequence to be extended.

  • debugContext: Optional debug information.
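
For illustration, a minimal sketch of capturing and later restoring the hardware seeds; it assumes a Graph and a program Sequence already exist and that poplar/Graph.hpp and poplar/Program.hpp are included alongside this header.

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>
#include <poplar/RandomSeed.hpp>

void snapshotAndRestoreSeeds(poplar::Graph &graph, poplar::program::Sequence &prog) {
  poplar::Tensor seeds = poplar::getHwSeeds(graph, prog);  // snapshot the current seeds
  // ... programs whose random behaviour should be reproducible ...
  poplar::setHwSeeds(graph, seeds, prog);                  // restore the snapshot
}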

3.1.5. poplar/ReplicatedStreamMode.hpp

namespace poplar

Poplar classes and functions.

Enums

enum ReplicatedStreamMode

Define how a stream is replicated when added to a replicated graph.

See

Graph::addHostToDeviceFIFO

Values:

enumerator REPLICATE

Create a stream per replica.

enumerator BROADCAST

Create a single stream whose data is implicitly broadcast to every replica.

3.1.6. poplar/SerializationFormat.hpp

namespace poplar

Poplar classes and functions.

Enums

enum SerializationFormat

Format for serializing a Tensor or Graph object.

See

Graph::serializeTensors Graph::deserializeTensors Graph::serialize

Values:

enumerator Binary

Serialise in binary (CapnProto) format.

enumerator JSON

Serialise in JSON format.

3.1.7. poplar/StringRef.hpp

namespace poplar

Poplar classes and functions.

Functions

bool operator==(StringRef l, StringRef r)
bool operator!=(StringRef l, StringRef r)
bool operator<(StringRef l, StringRef r)
std::string operator+(StringRef l, StringRef r)
std::string &operator+=(std::string &s, StringRef r)
std::ostream &operator<<(std::ostream &os, const StringRef &s)
struct StringRef : public poplar::ArrayRef<char>

Public Functions

constexpr StringRef()
constexpr StringRef(const StringRef&) = default
StringRef(std::nullptr_t) = delete
StringRef(const std::string &s)
constexpr StringRef(const char *p, std::size_t len)
StringRef(const char *p)
template<std::size_t N>
constexpr StringRef(const char (&p)[N])
std::string cloneAsString() const
operator std::string() const

3.1.8. poplar/SyncType.hpp

namespace poplar

Poplar classes and functions.

Enums

enum SyncType

An enumeration used to state what type of synchronisation a Sync program represents.

Values:

enumerator INTERNAL

Each tile waits until all the other tiles in the same IPU reach the Sync program before continuing.

enumerator EXTERNAL

Each tile waits until all the other tiles in all IPUs in the device reach the Sync program before continuing.

enumerator GLOBAL

Each tile waits until all the other tiles in all IPUs globally reach the Sync program before continuing.

Functions

std::ostream &operator<<(std::ostream &os, const SyncType &t)

3.1.9. poplar/TypeTraits.hpp

namespace poplar

Poplar classes and functions.

struct TypeTraits
#include <TypeTraits.hpp>

A structure to provide information about arithmetic (integer and floating point) types.

Public Functions

bool isSimpleType() const
template<>
TypeTraits make()
template<>
constexpr bool isSimpleType()

Public Members

std::size_t size
std::size_t align
bool isIntegral
bool isFloat
bool isSigned

Public Static Functions

template<typename T>
TypeTraits make()
template<typename T>
constexpr bool isSimpleType()

Return true if T is a basic numeric type, that is, if std::is_integral<T> or std::is_floating_point<T> is true, or if T is IeeeHalf.
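
For illustration, a brief sketch of querying these traits for a host type:

#include <poplar/TypeTraits.hpp>

static_assert(poplar::TypeTraits::isSimpleType<float>(),
              "float is a basic numeric type");
poplar::TypeTraits traits = poplar::TypeTraits::make<int>();
// traits.size, traits.align, traits.isIntegral, traits.isFloat and traits.isSigned
// now describe the host type int.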

3.1.10. poplar/CSRFunctions.hpp

Functions to configure the floating-point behaviour of the tiles by programming the Control and Status Registers (CSR).

namespace poplar

Poplar classes and functions.

Functions

void setFloatingPointBehaviour(poplar::Graph &graph, poplar::program::Sequence &prog, const FloatingPointBehaviour &behaviour, const DebugContext &debugContext = {})

Set the floating point behaviour of a tile.

Configures the floating point behaviour of a tile, affecting the treatment of exceptions and selecting stochastic rounding according to the passed behaviour structure.

Note that, in Poplar, stochastic rounding is disabled by default until either this function, setStochasticRounding() or the Engine options are used to enable it.

Parameters
  • graph: The Poplar graph

  • prog: The program to be extended

  • behaviour: A structure of type FloatingPointBehaviour

  • debugContext: Optional debug information

void setFloatingPointBehaviour(poplar::Graph &graph, poplar::program::Sequence &prog, const poplar::Tensor &behaviour, const DebugContext &debugContext = {})

Set the floating point behaviour of a tile.

Configures the floating point behaviour of a tile, affecting the treatment of exceptions and selecting stochastic rounding according to the passed behaviour tensor.

The behaviour tensor must be one returned by getAndModifyFloatingPointBehaviour.

Parameters
  • graph: The Poplar graph

  • prog: The program to be extended

  • behaviour: A tensor containing a representation of FloatingPointBehaviour

  • debugContext: Optional debug information

void setStochasticRounding(poplar::Graph &graph, poplar::program::Sequence &prog, bool behaviour, const DebugContext &debugContext = {})

Set stochastic rounding on or off for the selected tile.

Configures the stochastic rounding operation of a tile according to the passed behaviour parameter.

Note that, in Poplar, stochastic rounding is disabled by default until either this function, setFloatingPointBehaviour() or the Engine options are used to enable it.

Parameters
  • graph: The Poplar graph

  • prog: The program to be extended

  • behaviour: Select stochastic rounding: true or false

  • debugContext: Optional debug information

poplar::Tensor getAndModifyFloatingPointBehaviour(poplar::Graph &graph, poplar::program::Sequence &prog, const FloatingPointBehaviour &clear, const FloatingPointBehaviour &set, const DebugContext &debugContext = {})

Get the state of, and modify, the floating point behaviour on every tile that belongs to the target in the graph.

Returns the previous state and modifies the behaviour. The modification first clears the behaviours selected in clear and then sets the behaviours selected in set.

The recommended usage of this function should be as follows to avoid unexpected numerical behaviour:

…
auto state = getAndModifyFloatingPointBehaviour(…)
// operations that require the modified FP behaviour
…
setFloatingPointBehaviour(state)

Return

State before FP behaviour is modified

Parameters
  • graph: The Poplar graph

  • prog: The program to be extended

  • clear: Select the behaviours to clear: a behaviour is cleared if the corresponding field in this structure is true, and left unchanged otherwise (for example, the invalid-operation behaviour is cleared if the inv field is true).

  • set: Select the behaviours to set. Setting always happens after clearing. A behaviour is set only if the corresponding field is true.

  • debugContext: Optional debug information

struct FloatingPointBehaviour
#include <CSRFunctions.hpp>

Structure to specify floating point behaviour.

Parameters
  • inv: If true, a floating-point invalid operation (defined by IEEE 754) will cause an exception.

    The invalid operations are:

    • Addition or subtraction where the operands are + or - infinity (inf) and the operation results in the subtraction of two infs; for example: (-inf)+(+inf) or (+inf)-(+inf).

    • Divisions: (+/-0)/(+/-0) and (+/-inf)/(+/-inf).

    • Multiplications: (+/-0)*(+/-inf) and (+/-inf)*(+/-0).

    • Remainder: x REM y where y=0 or x=(+/-inf).

    • Real operations with complex results such as the square root or logarithm of a negative number.

    • Operations with Not-a-Number as at least one operand.

    • Comparisons where one of the operands is Not-a-Number.

      See also nanoo below.

  • div0: If true, a floating-point divide-by-zero operation will cause an exception.

  • oflo: If true, a floating-point overflow will cause an exception.

  • esr: Enable stochastic rounding.

  • nanoo: Enable Not-a-Number on overflow mode. When enabled, half precision calculations that have overflowed will produce a Not-a-Number result, rather than saturating to the half precision max/min value, and the invalid operation (inv) flag will be set.

Public Functions

FloatingPointBehaviour(bool inv, bool div0, bool oflo, bool esr, bool nanoo)
FloatingPointBehaviour() = default
FloatingPointBehaviour operator!() const

Public Members

bool inv = true
bool div0 = true
bool oflo = true
bool esr = true
bool nanoo = true
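
For illustration, a minimal sketch of enabling exception trapping and stochastic rounding on all tiles; it assumes a Graph and a program Sequence already exist.

#include <poplar/CSRFunctions.hpp>
#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>

void configureFpBehaviour(poplar::Graph &graph, poplar::program::Sequence &prog) {
  poplar::FloatingPointBehaviour behaviour(/*inv=*/true, /*div0=*/true,
                                           /*oflo=*/true, /*esr=*/true,
                                           /*nanoo=*/true);
  poplar::setFloatingPointBehaviour(graph, prog, behaviour);
  // Alternatively, toggle only stochastic rounding:
  poplar::setStochasticRounding(graph, prog, true);
}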

3.2. Exceptions

3.2.1. poplar/exceptions.hpp

namespace poplar

Poplar classes and functions.

struct control_program_error : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown when the construction of a graph program is invalid.

Public Functions

control_program_error(const std::string &msg)
struct file_load_error : public poplar::poplar_error

Public Functions

file_load_error(const std::string &path)
struct graph_connection_error : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown during construction of an Engine object if there is an error in the structure of the graph, for example, if there are no edges to a vertex input or if there are multiple edges to a vertex input.

Public Functions

graph_connection_error(const std::string &s)
graph_connection_error(const char *s)
struct graph_cycle_error : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown during the construction of an Engine object if there are any cycles in the graph that are not broken by recurrent edges.

Public Functions

graph_cycle_error(const std::string &s)
graph_cycle_error(const char *s)
struct graph_memory_allocation_error : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown when a memory allocation fails.

Public Functions

graph_memory_allocation_error(const std::string &s)
graph_memory_allocation_error(const char *s)

Public Members

ProfileValue graphProfile
struct graph_object_creation_error : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown in the construction of a GraphProgEnv object if there was an error in the creation of the graph program object file.

Public Functions

graph_object_creation_error(const std::string &s)
graph_object_creation_error(const char *s)
struct graph_object_load_error : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown in the construction of a GraphProgEnv object if there was an error in loading the graph program object file.

Public Functions

graph_object_load_error(const std::string &path, const std::string &error)
struct graph_program_compilation_error : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown in the construction of a GraphProgEnv object if there are any compilation errors in the graph program.

Public Functions

graph_program_compilation_error(const std::string &s)
graph_program_compilation_error(const char *s)
struct graph_replication_error : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown when an invalid operation is carried out on a replicated graph.

Public Functions

graph_replication_error(const std::string &s)
graph_replication_error(const char *s)
struct index_error : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown if the index of a subscript is out of the bounds of the field it is accessing, or if an index of a tensor is invalid.

Public Functions

index_error(const std::string &s)
index_error(const char *s)
index_error(const std::string &vertexDotField, std::size_t index)
struct invalid_machine_model : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown when an invalid model of the IPU (for performance model profiling) has been specified.

Public Functions

invalid_machine_model(const std::string &s)
invalid_machine_model(const char *s)
struct invalid_option : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown when an unrecognised or invalid option is passed to a Poplar API.

Public Functions

invalid_option(const std::string &s)
invalid_option(const char *s)
struct invalid_tile_mapping : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown when the tile mapping passed to the UserTilePartitioner is invalid.

Public Functions

invalid_tile_mapping(const std::string &s)
invalid_tile_mapping(const char *s)
struct link_error : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown when the linking stage for codelets fails.

output is the output from the linker command.

Public Functions

Public Members

struct memory_elem_constraints_error : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown when an invalid memory element constraint has been provided in a codelet.

Public Functions

memory_elem_constraints_error(const std::string &s)
memory_elem_constraints_error(const char *s)
struct missing_perf_estimate : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown when an Engine is constructed with profiling enabled but a vertex does not have a getPerfEstimate method specified.

Public Functions

missing_perf_estimate(const std::string &vertexName)
struct no_environment : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown, in the construction of a GraphProgEnv object, in mixed-mode compilation, if there is no graph-programming environment available, in particular if the program has not been compiled with the ‘popc’ command-line tool.

Public Functions

no_environment(const std::string &s)
no_environment(const char *s)
struct no_size_specified : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown if the size of a field is not specified in a Graph object when an EngineBuilder object is constructed.

Public Functions

no_size_specified(const std::string &s)
no_size_specified(const char *s)
no_size_specified(const std::string &fieldName, const std::string &vertexName)
struct overflow_error : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown when an arithmetic overflow occurs within Poplar.

Public Functions

overflow_error(const std::string &s)
overflow_error(const char *s)
struct parse_error : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown when an input file or string cannot be parsed.

Public Functions

parse_error(const std::string &s)
parse_error(const char *s)
struct poplar_error : public runtime_error
#include <exceptions.hpp>

Base class for Poplar exceptions.

Subclassed by poplar::control_program_error, poplar::file_load_error, poplar::graph_connection_error, poplar::graph_cycle_error, poplar::graph_memory_allocation_error, poplar::graph_object_creation_error, poplar::graph_object_load_error, poplar::graph_program_compilation_error, poplar::graph_replication_error, poplar::index_error, poplar::invalid_machine_model, poplar::invalid_option, poplar::invalid_tile_mapping, poplar::link_error, poplar::memory_elem_constraints_error, poplar::missing_perf_estimate, poplar::no_environment, poplar::no_size_specified, poplar::overflow_error, poplar::parse_error, poplar::profiling_disabled, poplar::runtime_error, poplar::stream_connection_error, poplar::stream_memory_allocation_error, poplar::symbol_error, poplar::tensor_creation_error, poplar::tensor_io_state_error, poplar::type_error, poplar::unknown_field, poplar::unknown_vertex_type

Public Functions

poplar_error(const std::string &s)
poplar_error(const char *s)

Public Members

std::string type
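
Because every Poplar exception derives from poplar_error, a single handler can report both the error class (via the type member) and the message. A minimal sketch:

#include <poplar/exceptions.hpp>
#include <iostream>

void reportPoplarErrors() {
  try {
    // ... calls into the Poplar API ...
  } catch (const poplar::poplar_error &e) {
    std::cerr << e.type << ": " << e.what() << "\n";  // error class and message
  }
}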
struct profiling_disabled : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown if profiling information is requested from an Engine but that Engine has not been constructed with profiling enabled.

Public Functions

profiling_disabled()
struct runtime_error : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown when the interaction with the device via graphcore device access fails.

Public Functions

runtime_error(const std::string &s)
runtime_error(const char *s)
struct stream_connection_error : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown when an invalid attempt is made to connect a data stream.

Public Functions

stream_connection_error(const std::string &s)
stream_connection_error(const char *s)
struct stream_memory_allocation_error : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown when allocation of stream buffers fails.

Public Functions

stream_memory_allocation_error(const std::string &s)
stream_memory_allocation_error(const char *s)
struct symbol_error : public poplar::poplar_error

Public Functions

symbol_error(const std::string &name, const unsigned tile)
struct tensor_creation_error : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown in the construction of a tensor if invalid arguments are provided to the tensor creation function or method.

Public Functions

tensor_creation_error(const std::string &s)
tensor_creation_error(const char *s)
struct tensor_io_state_error : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown when an attempt is made to mark a tensor as an input or output, but the argument references a view of a tensor, rather than a whole tensor.

Public Functions

tensor_io_state_error(const std::string &s)
tensor_io_state_error(const char *s)
struct type_error : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown when there is an error related to the field types of vertices, for example, when the source of an edge contains an input, the types of inputs and source field between an edge do not match, or when a field cannot be subscripted.

Public Functions

type_error(const std::string &s)
type_error(const char *s)
struct unknown_field : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown when a field name is specified that does not exist in the graph-programming environment.

Public Functions

unknown_field(const std::string &s)
unknown_field(const char *s)
unknown_field(const std::string &fieldName, const std::string &vertexTypeName)
struct unknown_vertex_type : public poplar::poplar_error
#include <exceptions.hpp>

This exception is thrown when a vertex type name is specified that does not exist in the graph programming environment.

Public Functions

unknown_vertex_type(const std::string &name)

3.3. Graph classes

3.3.1. poplar/CodeletFileType.hpp

namespace poplar

Poplar classes and functions.

Enums

enum CodeletFileType

Values:

enumerator PreprocessedAsmSource

A graph assembly language source file.

enumerator AsmSource

A graph assembly language file with preprocessor macros.

enumerator CSource

A graph C source file.

enumerator CppSource

A graph C++ source file.

enumerator IrSource

A graph LLVM IR source file.

enumerator Object

A graph program object file.

enumerator Auto

Auto detect based on file name.

Functions

CodeletFileType getCodeletFileType(const char *path)
std::string getExtensionFromFileType(CodeletFileType type)

3.3.2. poplar/DataStream.hpp

namespace poplar

Poplar classes and functions.

class DataStream
#include <DataStream.hpp>

An object representing a stream for communicating between the host and the device.

A stream is a unidirectional communication from the host to the device, or from the device to the host.

The maximum buffer size for each stream is 128 MBytes.

See

Graph::addHostToDeviceFIFO, Graph::addDeviceToHostFIFO

Public Functions

DataStream()
DataStream(const DataStream&)
DataStream(DataStream&&)
~DataStream()
DataStream &operator=(const DataStream&)
DataStream &operator=(DataStream&&)
std::string handle() const
std::size_t numElements() const
unsigned replicationFactor() const
ReplicatedStreamMode replicatedMode() const
DataStreamType type() const
Type elementType() const
DataStream(std::unique_ptr<core::DataStreamRef>)
const core::DataStreamRef &getImpl() const

Private Members

std::unique_ptr<core::DataStreamRef> impl
class RemoteBuffer
#include <DataStream.hpp>

A remote buffer is a region of remote (meaning not on the IPU) memory that is used as a cache.

It is implemented as two DataStreams: one to write to the remote memory, the other to read the data back to the IPU.

Public Functions

RemoteBuffer()
RemoteBuffer(const RemoteBuffer&)
RemoteBuffer(RemoteBuffer&&)
~RemoteBuffer()
RemoteBuffer &operator=(const RemoteBuffer&)
RemoteBuffer &operator=(RemoteBuffer&&)
std::string handle() const
DataStream getIpuToHostStream() const
DataStream getHostToIpuStream() const
size_t numElements() const
size_t getRepeats() const
Type elementType() const
bool isRearrangeOnHost() const
bool isOptimisedForMemory() const
RemoteBuffer(std::unique_ptr<core::RemoteBufferRef>)
const core::RemoteBufferRef &getImpl() const
bool operator==(const RemoteBuffer &b) const
bool operator!=(const RemoteBuffer &b) const

Private Members

std::unique_ptr<core::RemoteBufferRef> impl
namespace core

3.3.3. poplar/DataStreamType.hpp

namespace poplar

Poplar classes and functions.

Enums

enum DataStreamType

An enumeration to represent the different types of DataStream or stream components of a RemoteBuffer.

See

Graph::addHostToDeviceFIFO, Graph::addDeviceToHostFIFO, Graph::addRemoteBuffer

Values:

enumerator HostToDeviceFIFO

A DataStream from host to device.

enumerator DeviceToHostFIFO

A DataStream from device to host.

enumerator HostToDeviceBuffer

A stream from host to device in a remote buffer.

enumerator DeviceToHostBuffer

A stream from device to host in a remote buffer.

Functions

bool isDeviceToHost(DataStreamType type)
bool isHostToDevice(DataStreamType type)
bool isRemoteBuffer(DataStreamType type)

3.3.4. poplar/Graph.hpp

namespace poplar

Poplar classes and functions.

Functions

StringRef versionString()
StringRef packageHash()
class Graph
#include <Graph.hpp>

This class represents a graph program to be executed on the IPU.

Public Types

using TileToTensorMapping = std::vector<std::vector<Interval>>
using TraceFn = std::function<void()>

Record some compilation time as part of the graphConstruction phase.

Parameters
  • name: the name of the phase, can be composed of multiple parts

  • fn: the construction code to be timed

Public Functions

Graph(const Target &target, replication_factor r = replication_factor(1))

Construct a graph object.

This constructor creates a Graph object using the given graph programming environment.

Parameters
  • target: The target the graph is being constructed to work with.

  • r: Number of times graph is to be replicated (default is no replication)

Graph(const Device &device, replication_factor r = replication_factor(1))

Construct a graph object.

This constructor creates a Graph object using the given graph programming environment.

Parameters
  • device: The device the graph is being constructed to work with.

  • r: Number of times graph is to be replicated (default is no replication)

Graph(Graph&&)
Graph &operator=(Graph&&)
~Graph()
const Target &getTarget() const

Retrieve the target that this graph is targeting.

bool addCodelets(StringRef src, CodeletFileType type = CodeletFileType::Auto, StringRef compileFlags = "")

Add a codelet to the graph.

A codelet is either a C, C++, or assembly source file, or a .gp object file. If a source file is given it is compiled for the graph’s target and then loaded into the graph. If it is an object file then it is loaded into the graph.

Symbols that codelets use are not resolved until the engine is built, so codelets can use symbols from each other by calling addCodelets() for each source or object file (or passing a list of files as a vector).

Return

True if the codelet is added to the graph successfully, or false if the codelet already existed in the graph.

Parameters
  • src: The path to a source or object file containing codelets.

  • type: Specify the type of the codelet (source or precompiled). If Auto is used, the type is determined from the filename extension.

  • compileFlags: Additional flags to pass to the compiler if using source code. For example, -g to generate debug info.
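
For illustration, a minimal sketch of compiling and loading a codelet source file; the file name is illustrative and the graph is assumed to already exist.

#include <poplar/Graph.hpp>

void loadCodelets(poplar::Graph &graph) {
  bool added = graph.addCodelets("my_codelets.cpp",
                                 poplar::CodeletFileType::Auto, "-g");
  // added is false if these codelets were already present in the graph
}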

bool addCodelets(StringRef src, CodeletFileType type, StringRef compileFlags, std::ostream &compileOutput)

Add a codelet to the graph and write error messages from the compilation process to the given output stream.

By default they are printed to cerr.

bool addCodelets(ArrayRef<std::string> xs, StringRef compileFlags = "")

Add a set of codelets to the graph.

These codelets can depend on each other, for example symbols defined in one can be used by any other. The order is not important.

Return

True if all the codelets are added successfully, or false if any of the codelets are not added because they already exist in the graph.

void addCodelets(std::stringstream &stream, StringRef compileFlags = "", CodeletFileType type = CodeletFileType::CppSource)
void addCodelets(std::stringstream &stream, StringRef compileFlags, std::ostream &compileOutput, CodeletFileType type = CodeletFileType::CppSource)
VertexRef addVertex(ComputeSet cs, StringRef vertexType)

Add a vertex to the graph.

Parameters
  • cs: The compute set to add the vertex to.

  • vertexType: The name of the type of the vertex. This must be a declared vertex type in the graph programming environment used to create the graph builder.

VertexRef addVertex(ComputeSet cs, StringRef vertexType, ArrayRef<ConnectionDesc> connections)

Add a vertex to the graph and connect graph elements to some of its fields.

This variant of addVertex allows you to pass in a list of connection descriptions to connect graph elements to fields of the newly created vertex. The connection descriptions can be initialized with:

  • { string, Tensor } - connect a tensor to a field.

  • { string, FieldRef, bool } - connect a vertex field to a field.

  • { string, T v } - connect a constant value to an input field.

For example, the following:

addVertex(cs, "MyVertex", {{"x", tensor[4]}, {"y", v["z"], false}});

This will create a vertex, connect a tensor to its x field, and connect the vertex field v["z"] to its y field.

Parameters
  • cs: The compute set to add the vertex to.

  • vertexType: The name of the type of the vertex. This must be a declared vertex type in the graph programming environment used to create the graph builder.

  • connections: A list of connection descriptions

VertexRef addExternalExchangeVertex(ComputeSet cs, StringRef vertexType, unsigned incomingDownCount, bool usesEastEdge, bool sendsXReq)

Add an external exchange vertex to the graph.

A compute set can contain at most one external exchange vertex per tile. External exchange vertices cannot be mixed with non external exchange vertices in the same compute set. Before an external vertex is called we set the INCOMING_DCOUNT and INCOMING_MUX mux registers and synchronize all tiles containing external exchange vertices.

Parameters
  • cs: The compute set to add the vertex to.

  • vertexType: The name of the type of the vertex. This must be a declared vertex type in the graph programming environment used to create the graph builder.

  • incomingDownCount: The value to set the INCOMING_DCOUNT register to.

  • usesEastEdge: Whether the vertex uses an east edge exchange block. The INCOMING_MUX register is set to point to either the east edge or west edge depending on this argument.

  • sendsXReq: Whether this vertex is responsible for sending the XREQ packet. There must be at most one tile per exchange block context that sends the XREQ and the tile must be the same in every compute set containing external exchange vertices.

Tensor addVariable(const Type &type, ArrayRef<std::size_t> shape, const DebugContext &debugContext = {})

Add a variable to the graph.

If using this function with a target with multiple tiles then the variable will initially have no tile mapping, under the expectation that the tile mapping will be set later with Graph::setTileMapping. If the target of the graph has only one tile then the tensor will be automatically mapped to that tile.
Return

A Tensor referring to the variable in the graph.

Parameters
  • type: The type of the elements of the variable.

  • shape: The shape of the variable.

  • name: An optional name to identify the variable for debugging/profiling purposes
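
For illustration, a minimal sketch of adding a variable and giving it a tile mapping; the shape, debug name and tile number are illustrative and the graph is assumed to already exist.

#include <poplar/Graph.hpp>

void addMappedVariable(poplar::Graph &graph) {
  poplar::Tensor v = graph.addVariable(poplar::FLOAT, {4, 4}, "myVariable");
  graph.setTileMapping(v, 0);  // variables must be mapped before the graph is compiled
}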

Tensor addVariable(const Type &type, ArrayRef<std::size_t> shape, VariableMappingMethod mappingMethod, const DebugContext &debugContext = {})

Add a variable to the graph.

Return

A Tensor referring to the variable in the graph.

Parameters
  • type: The type of the elements of the variable.

  • shape: The shape of the variable.

  • mappingMethod: The method to use to initially map the variable to tiles.

  • name: An optional name to identify the variable for debugging/profiling purposes

template<typename T>
Tensor addConstant(const Type &type, ArrayRef<std::size_t> shape, ArrayRef<T> values, const DebugContext &debugContext = {"<const>"})

Add a constant to the graph.

A constant tensor is a tensor with every element initialized.

Parameters
  • type: The type of the elements of the constant.

  • shape: The shape of the constant.

  • values: Vector of values to initialize tensor elements to.

  • name: An optional name to identify the variable for debugging/profiling purposes
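
For illustration, a minimal sketch of adding a constant initialised from a list of values; the values and shape are illustrative and the graph is assumed to already exist.

#include <poplar/Graph.hpp>

void addConstantExample(poplar::Graph &graph) {
  poplar::Tensor c =
      graph.addConstant<float>(poplar::FLOAT, {2, 2}, {1.0f, 2.0f, 3.0f, 4.0f});
  graph.setTileMapping(c, 0);  // constants also require a tile mapping
}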

template<typename T>
Tensor addConstant(const Type &type, ArrayRef<std::size_t> shape, T val, const DebugContext &debugContext = {"<const>"}, typename std::enable_if<TypeTraits::isSimpleType<T>()>::type* = nullptr)

Add a constant to the graph.

A constant tensor is a tensor with every element initialized to the same value. It cannot be connected to a vertex output.

Parameters
  • type: The type of the elements of the constant.

  • shape: The shape of the constant.

  • val: The value to initialize tensor elements to.

  • name: An optional name to identify the variable for debugging/profiling purposes

template<typename T>
Tensor addConstant(const Type &type, ArrayRef<std::size_t> shape, const T *val, const DebugContext &debugContext = {"<const>"}, typename std::enable_if<TypeTraits::isSimpleType<T>()>::type* = nullptr)

Add a constant to the graph with multiple cell values.

A constant tensor is a tensor with every element initialized. It cannot be connected to a vertex output.

Parameters
  • type: The type of the elements of the constant.

  • shape: The shape of the constant.

  • val: The value to initialize tensor elements to.

  • name: An optional name to identify the variable for debugging/profiling purposes

Tensor addConstant(const Type &type, ArrayRef<std::size_t> shape, const void *val, const TypeTraits &traits, bool broadcast, const DebugContext &debugContext = {"<const>"})
Tensor addConstantHalf(const Type &type, ArrayRef<std::size_t> shape, uint16_t val, const DebugContext &debugContext = {"<const>"})

Add a constant to the graph, where the host data is type IEEE half.

A constant tensor is a tensor with every element initialized to the same value. It cannot be connected to a vertex output.

Parameters
  • type: The type of the elements of the constant.

  • shape: The shape of the constant.

  • val: The value to initialize tensor elements to.

Tensor addConstantHalf(const Type &type, ArrayRef<std::size_t> shape, const uint16_t *val, const DebugContext &debugContext = {"<const>"})

Add a constant to the graph with multiple cell values, where the host data is type IEEE half.

A constant tensor is a tensor with every element initialized. It cannot be connected to a vertex output.

Parameters
  • type: The type of the elements of the constant.

  • shape: The shape of the constant.

  • val: The value to initialize tensor elements to.

Tensor clone(const Type &type, const Tensor &t, const DebugContext &debugContext = {}, TensorCloneMethod method = TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES)

Add a tensor to the graph that has the same size and tile mapping as Tensor t.

Parameters
  • type: The element type of the new tensor.

  • t: The tensor to be cloned.

  • name: A debug name to give to any new tensors allocated in the graph during the clone. If this is empty then the debug names will be derived from existing tensor debug names.

  • method: The method to use for the cloning (decides whether to preserve ordering/aliasing in the new tensor).

Tensor cloneN(const Type &type, const Tensor &t, std::size_t N, const DebugContext &debugContext = {}, TensorCloneMethod method = TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES, TensorCloneDuplicationMethod duplicationMethod = TensorCloneDuplicationMethod::DUPLICATE_BY_OUTER_DIMENSION)

Clone a tensor N times.

Given a tensor of shape [D1, D2, … Dn], this function will create a new tensor of shape [N, D1, D2, …, Dn] where each of the N sub-tensors is a clone of the original tensor (i.e. has the same layout and tile mapping).

See

TensorCloneDuplicationMethod

Parameters
  • type: The element type of the new tensor.

  • t: The tensor to clone

  • N: The replication factor to clone with

  • name: The name for the new variables created

  • method: The tensor cloning method (see Graph::clone)

  • duplicationMethod: The behaviour used when a Tensor is cloned.

Tensor clone(const Tensor &t, const DebugContext &debugContext = {}, TensorCloneMethod method = TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES)

Add a tensor to the graph that has the same size and tile mapping as Tensor t.

Parameters
  • t: The tensor to be cloned.

  • name: A debug name to give to any new tensors allocated in the graph during the clone. If this is empty then the debug names will be derived from existing tensor debug names.

  • method: The method to use for the cloning (decides whether to preserve ordering/aliasing in the new tensor).

Tensor cloneN(const Tensor &t, std::size_t N, const DebugContext &debugContext = {}, TensorCloneMethod method = TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES, TensorCloneDuplicationMethod duplicationMethod = TensorCloneDuplicationMethod::DUPLICATE_BY_OUTER_DIMENSION)

Clone a tensor N times.

Given a tensor of shape [D1, D2, … Dn], this function will create a new tensor of shape [N, D1, D2, …, Dn] where each of the N sub-tensors is a clone of the original tensor (i.e. has the same layout and tile mapping).

See

TensorCloneDuplicationMethod

Parameters
  • t: The tensor to clone

  • N: The replication factor to clone with

  • name: The name for the new variables created

  • method: The tensor cloning method (see Graph::clone)

  • duplicationMethod: The behaviour used when a Tensor is cloned.

void connect(FieldRef field, const Tensor &tensor)

Connect a tensor to a vertex field.

This function connects a tensor to a vertex field. If the vertex field is a scalar input/output then a simple edge is added (and the tensor must be of zero dimension; in other words, a scalar). If the vertex field is an input/output of a vector then a vector edge is added (and the tensor must be of dimension 1). If the vertex field is a vector of inputs or outputs then the size of the field is set to the correct size and edges are added for every element of the tensor (and the tensor must be of dimension 1). If the vertex field is a vector of input or output vectors then the tensor must be 2-dimensional. In this case, the size of the vector field is set to the size of the first dimension and vector edges are added for every sub-vector of the two-dimensional tensor.

Parameters
  • tensor: The tensor.

  • field: Reference to the vertex field to connect.
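
For illustration, a minimal sketch of creating a vertex and connecting tensors to its fields; the vertex type and field names ("in", "out") are illustrative, and the graph, compute set and 1D tensors are assumed to already exist.

#include <poplar/Graph.hpp>

void connectVertexFields(poplar::Graph &graph, poplar::ComputeSet &cs,
                         const poplar::Tensor &input, const poplar::Tensor &output) {
  poplar::VertexRef v = graph.addVertex(cs, "MyVertex");
  graph.connect(v["in"], input);   // connect a 1D tensor to a vector field
  graph.connect(v["out"], output);
  graph.setTileMapping(v, 0);      // the vertex must also be mapped to a tile
}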

template<typename T>
void connect(FieldRef field, T v, typename std::enable_if<TypeTraits::isSimpleType<T>()>::type* = nullptr)

Connect a constant value to an input field.

This method creates a single-element tensor containing a specified value and connects that tensor element to an input field.

Parameters
  • v: The value to connect.

  • field: The field to connect to.

void connect(FieldRef field, ArrayRef<Tensor> tensors)

Connect a vector of tensors to a vertex field.

This function connects a vector of tensors to a vertex field. The field must be a vector of inputs or outputs. The field will be sized to match the provided vector and each element will be connected to the corresponding element of the field.

Parameters
  • tensors: The vector of tensors.

  • field: Reference to the vertex field to connect.

void setPerfEstimate(const VertexRef &v, std::uint64_t cycles, std::uint64_t flops = 0)

Set the performance estimate for a vertex.

Parameters
  • v: The vertex to set the estimate for.

  • cycles: The number of cycles that this vertex will use when run.

  • flops: The number of flops that this vertex will use when run.

void setPerfEstimate(const VertexRef &v, const VertexPerfEstimate &estimate)

Set the performance estimate for a vertex.

Parameters
  • v: The vertex to set the estimate for.

  • estimate: The performance estimates for this vertex when run.

VertexPerfEstimate getPerfEstimate(const VertexRef &v) const

Get the performance estimate for the specified vertex.

Return

The performance estimates used when this vertex is run.

Parameters
  • v: The vertex to get the estimate for.

Exceptions
  • missing_perf_estimate: if the performance estimate is not available (for example, because the graph hasn’t been executed yet).

void registerPerfEstimator(StringRef vertexTypeName, PerfEstimateFunc f)

Parameters
  • vertexTypeName: Type of vertex to register the estimator for.

  • f: Callback function that will compute a performance estimate for all vertices of this type.

unsigned getNumVertices(void) const

Get the number of vertices currently in the graph.

Return

The number of vertices currently in the graph.

ComputeSet addComputeSet(const DebugContext &debugContext = {})

Create a compute set within the graph.

Return

The reference to the compute set.

Parameters
  • name: An optional identifier for the compute set that may be used during profiling/debugging.

void setFieldSize(FieldRef field, std::size_t size)

Set the size of a vector field.

Parameters
  • field: The reference to the field.

  • size: The size of the field.

std::size_t getFieldSize(FieldRef field) const

Get the size of a vector field.

Return

The size of the field.

Parameters
  • field: The reference to the field.

std::size_t getMaxFieldDim(StringRef vertexName, StringRef fieldName, unsigned dimIndex) const

Find the maximum size for a dimension of a field.

Parameters
  • vertexType: The type of vertex

  • field: The field

  • dimIndex: The index of the dimension

Exceptions

double getMaxVertexFieldValue(StringRef vertexName, StringRef fieldName) const

Find the maximum value that can be represented by an element of a field.

Parameters
  • vertexType: The type of vertex

  • field: The field

template<typename T>
void setInitialValue(FieldRef field, T val, typename std::enable_if<TypeTraits::isSimpleType<T>()>::type* = nullptr)

Set the initial value of a field.

Parameters
  • field: The reference to the field.

  • val: The value to set the field to when the graph engine is created.
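
For illustration, a minimal sketch of setting a scalar initial value on a vertex field; the field name is illustrative and the graph and vertex are assumed to already exist.

#include <poplar/Graph.hpp>

void initField(poplar::Graph &graph, poplar::VertexRef &v) {
  graph.setInitialValue(v["k"], 42u);  // "k" is an unsigned scalar field of the vertex
}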

template<typename T>
void setInitCallback(FieldRef field, LateInitCallback<T> callback, typename std::enable_if<std::is_arithmetic<T>::value>::type* = nullptr)

Set the init callback for a field; the callback function will be called after graph construction and must return the init value of the field.

This can be called instead of calling setInitialValue(), or both can be called for the field, to ensure that the field has an (at least partially) valid starting value, for instance if it needs to be retrieved in an early stage of graph compilation, before storage allocation (for instance during cycle estimation).

Note that you must explicitly provide the template parameter T when using this function, for example setInitCallback<uint16_t>(vertex["size"], sizeCallback), because the compiler will not be able to detect the correct type from the callback parameter.

Parameters
  • field: The reference to the field.

  • callback: The callback that will return the value for the field.

  • <unnamed>: This parameter exists only to enable the is_arithmetic<T> check on the type T.

template<typename T>
void setInitialValue(FieldRef field, const std::vector<T> &v)
template<typename T>
void setInitialValue(FieldRef field, const std::initializer_list<T> &l)
void setInitialValueHalf(FieldRef field, uint16_t val)

Set the initial value of a field of type IEEE half.

Parameters
  • field: The reference to the field.

  • val: The value to set the field to when the graph engine is created.

template<typename T>
void setInitialValue(FieldRef field, ArrayRef<T> val)

Set initial values of a vector field.

Parameters
  • field: The reference to the vector field.

  • val: A vector value to set the field to when the graph engine is created.

void setInitialValueHalf(FieldRef field, ArrayRef<uint16_t> val)

Set initial values of a vector field of type IEEE half.

Parameters
  • field: The reference to the vector field.

  • val: A vector value to set the field to when the graph engine is created.

template<typename T>
void setInitialValue(const Tensor &t, T val, typename std::enable_if<TypeTraits::isSimpleType<T>()>::type* = nullptr)

Set the initial value of a tensor element.

Parameters
  • t: The tensor representing the value to set.

  • val: The value to set the field to when the graph engine is created. A buffer of values can be provided to set the elements of a non-scalar tensor.

template<typename T>
void setInitialValue(const Tensor &t, ArrayRef<T> values)
void setInitialValueHalf(const Tensor &t, uint16_t val)

Set the initial value of a tensor element of type IEEE half.

Parameters
  • t: The tensor representing the value to set.

  • val: The value to set the field to when the graph engine is created. A buffer of values can be provided to set the elements of a non-scalar tensor.

void setInitialValueHalf(const Tensor &t, ArrayRef<uint16_t> values)
void createHostWrite(StringRef handle, const Tensor &t, bool rearrangeOnHost = false)

Mark a Tensor as being available as the destination of host to device copies.

This is a convenience function that creates a host-to-device FIFO, and a Copy program that copies data from the FIFO to the tensor. When you call Engine::writeTensor() it copies the input data to the FIFO and then executes the Copy program on the device.

See

Engine::writeTensor()

Parameters
  • handle: A name to be associated with this host copy.

  • t: The tensor to be marked as an input.

  • rearrangeOnHost: Save IPU memory at the cost of exchange speed by rearranging the data on the host before sending it to the IPU, rather than doing an internal exchange. Note that due to alignment and size requirements of host exchange packets this may still require part of the transfer to be received to a temporary variable and copied to its destination.

void createHostRead(StringRef handle, const Tensor &t, bool rearrangeOnHost = false)

Mark a Tensor as being available as the source of device to host copies.

This is a convenience function that creates a device-to-host FIFO, and a Copy program that copies data to the FIFO from the tensor. When you call Engine::readTensor() it executes the Copy program on the device and then outputs the data from the FIFO.

See

Engine::readTensor()

Parameters
  • handle: A name to be associated with this host copy.

  • t: The tensor to be marked as an output.

  • rearrangeOnHost: Save IPU memory at the cost of exchange speed by sending data in any order and rearranging it on the host, rather than doing an internal exchange before sending it.
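
For illustration, a minimal sketch of exposing a tensor for host writes and reads; the handle names are illustrative, the graph and tensor are assumed to already exist, and the same handles are later passed to Engine::writeTensor() and Engine::readTensor().

#include <poplar/Graph.hpp>

void exposeTensorToHost(poplar::Graph &graph, const poplar::Tensor &t) {
  graph.createHostWrite("t-write", t);  // host -> device copies via Engine::writeTensor()
  graph.createHostRead("t-read", t);    // device -> host copies via Engine::readTensor()
}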

DataStream addHostToDeviceFIFO(StringRef handle, const Type &elementType, std::size_t numElements, ReplicatedStreamMode replicatedMode = ReplicatedStreamMode::REPLICATE, const OptionFlags &options = {})

Add a data stream to the graph for copying data from the host to the device.

Parameters
  • handle: A name to be associated with this stream

  • elementType: The type of data in the stream

  • numElements: The number of elements to be transferred from the stream by a Copy program.

  • replicatedMode: How the stream is replicated if this is a replicated graph.

  • options: List of options. Supported options:

    • splitLimit Integer [=50 * 1024 * 1024]

      The maximum size of the FIFO before being split into multiple FIFOs. This is a useful option to avoid exceeding the stream buffer size limit. If the original FIFO is larger than the specified split limit, then it is replaced by a number of FIFOs which represent chunks of the original FIFO, and are read from sequentially. Setting splitLimit to 0 or UINT_MAX disables this option.

    • bufferingDepth Integer [=1]

      The depth of the FIFO which may be prefetched before being read by the device. By default the FIFO size is 1, so it prefetches a single entry after it has been read to refill the FIFO. Increasing the size of the FIFO allows for prefetching of multiple entries, increasing the probability there will be a valid entry in the FIFO for the device to read before falling back to synchronously fetching the next entry.

    • addressSpace Enum [=pageTable]

      The type of address mapping used by the hardware to translate an exchange address to a host physical address. Possible values:

      • pageTable (default): uses a lookup table which maps one memory page per entry.

      • addressTranslationTable: uses a translation table. This table contains very few entries but each of them can map large regions. This type of address mapping is only supported for replicated streams.

DataStream addDeviceToHostFIFO(StringRef handle, const Type &elementType, std::size_t numElements, const OptionFlags &options = {})

Add a data stream to the graph for copying data from the device to the host.

Parameters
  • handle: A name to be associated with this stream

  • elementType: The type of data in the stream

  • numElements: The number of elements to be transferred to the stream by a Copy program.

  • options: List of options. Supported options:

    • splitLimit Integer [=50 * 1024 * 1024]

      The maximum size of the FIFO before being split into multiple FIFOs. This is a useful option to avoid exceeding the stream buffer size limit. If the original FIFO is larger than the specified split limit, then it is replaced by a number of FIFOs which represent chunks of the original FIFO, and are read from sequentially. Setting splitLimit to 0 or UINT_MAX disables this option.
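
For illustration, a minimal sketch of streaming data into a tensor and back out again using FIFOs and Copy programs; the handle names are illustrative and the graph and tensor are assumed to already exist.

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>

poplar::program::Sequence streamInAndOut(poplar::Graph &graph, const poplar::Tensor &t) {
  auto toIpu = graph.addHostToDeviceFIFO("in", t.elementType(), t.numElements());
  auto toHost = graph.addDeviceToHostFIFO("out", t.elementType(), t.numElements());
  poplar::program::Sequence seq;
  seq.add(poplar::program::Copy(toIpu, t));   // host -> device
  seq.add(poplar::program::Copy(t, toHost));  // device -> host
  return seq;
}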

RemoteBuffer addRemoteBuffer(StringRef handle, const Type &elementType, std::size_t numElements, std::size_t repeats = 1, bool rearrangeOnHost = false, bool optimiseMemory = false)

Add a remote buffer to the graph.

A remote buffer is memory outside the IPU which can be read and written by the IPU. A read returns the last written value. The remote buffer is (repeats * numElements * sizeof(elementType) + padding) bytes in size. Padding is added to meet any alignment constraints of the hardware.

Parameters
  • handle: A name to be associated with this remote buffer.

  • elementType: The type of data in the remote buffer.

  • numElements: The number of elements to be transferred to the remote buffer by a Copy program.

  • repeats: The buffer can store multiple blocks of data to be transferred. The total number of data elements in the buffer is numElements * repeats.

  • rearrangeOnHost: Perform any necessary data rearrangement on the host instead of on the IPU.

  • optimiseMemory: Optimise for memory use rather than speed.
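
For illustration, a minimal sketch of copying a tensor out to a remote buffer and back; it assumes the graph and tensor already exist, that a program Sequence is being built, and that the Copy program accepts RemoteBuffer endpoints. The handle name is illustrative.

#include <poplar/Graph.hpp>
#include <poplar/Program.hpp>

void useRemoteBuffer(poplar::Graph &graph, const poplar::Tensor &t,
                     poplar::program::Sequence &seq) {
  poplar::RemoteBuffer buf =
      graph.addRemoteBuffer("cache", t.elementType(), t.numElements());
  seq.add(poplar::program::Copy(t, buf));  // IPU -> remote buffer
  seq.add(poplar::program::Copy(buf, t));  // remote buffer -> IPU
}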

void outputVertexGraph(std::ostream &outputStream, ArrayRef<program::Program> progs = {}) const

Output to a stream the vertex graph in dot file format.

Parameters
  • outputStream: The C++ stream to output the dot file onto.

void outputComputeGraph(std::ostream &outputStream, ArrayRef<program::Program> progs = {}) const

Output to a stream the compute graph in dot file format.

Parameters
  • outputStream: The C++ stream to output the dot file onto.

void setTileMapping(VertexRef v, unsigned tileNum)

Map a vertex to a specific tile on the device.

Parameters
  • v: Reference to the vertex to map

  • tileNum: The tile number to map the vertex to.

void setTileMapping(const Tensor &t, unsigned tileNum)

Map a tensor slice to a specific tile on the device.

Parameters
  • t: The tensor or tensor slice to map.

  • tileNum: The tile number to map to.

TileToTensorMapping getTileMapping(const Tensor &t, bool requireComplete = true) const

Inspect the tile mapping of a tensor.

Return

The mapping from tiles to a vector of intervals mapped to the tile (implemented as a vector indexed by the tile number). The lower and upper bound of each interval are element numbers in the flattened tensor.

Parameters
  • t: The tensor to inspect

  • requireComplete: If t is not fully mapped and requireComplete is true then an invalid_tile_mapping exception will be thrown.

TileToTensorMapping getTileMapping(const Tensor &t, bool *isComplete) const

Inspect the tile mapping of a tensor.

Return

The mapping from tiles to a vector of intervals mapped to the tile (implemented as a vector indexed by the tile number). The lower and upper bound of each interval are element numbers in the flattened tensor.

Parameters
  • t: The tensor to inspect

  • isComplete: If non-null, updated to indicate whether the mapping is complete.

TileToTensorMapping getVariableTileMapping(const Tensor &t) const

Inspect the tile mapping of a tensor.

This excludes any constant regions.

Return

The mapping from tiles to a vector of intervals mapped to the tile (implemented as a vector indexed by the tile number). The lower and upper bound of each interval are element numbers in the flattened tensor.

Parameters
  • t: The tensor to inspect

void setTileMapping(const Tensor &t, const TileToTensorMapping &mapping)

Set the tile mapping of a tensor based on an explicit map from tiles to tensor intervals.

Parameters
  • t: The tensor to map

  • mapping: The mapping from tiles to a vector of intervals to be placed on that tile (implemented as a vector indexed by the tile number). The lower and upper bounds of each interval are element numbers in the flattened tensor.

Tensor getVariable(VariableRef v) const

Get a tensor representing an entire variable.

Return

A Tensor object representing that variable.

Parameters
  • v: The variable to retrieve.

bool isConstant(VariableRef v) const

Check whether a variable reference represents a constant.

When Graph::addConstant() is called a variable is created to represent that constant. This call checks whether a variable was created by that method or by Graph::addVariable().

Return

True if and only if the variable refers to a constant.

Parameters
  • v: The variable to examine.

std::vector<std::vector<Interval>> getSortedContiguousRegions(const Tensor &t, ArrayRef<Interval> regions, bool removeAliasedIntervals = false, std::vector<std::size_t> *aliases = nullptr) const

Get a list of sequences of intervals over a tensor such that each sequence represents a contiguous region of memory.

Return

A list of sequences of intervals. The intervals cover the same elements of the tensor as the input regions.

Parameters
  • t: The tensor to get intervals over.

  • regions: A list of intervals representing the elements to sort into memory contiguous sequences.

  • removeAliasedIntervals: If true, remove intervals which alias others in the given regions from the result.

  • aliases: Optional list of indices for each region in the returned intervals where an index is always the same for a region representing the same underlying elements in memory. If this is nullptr, then no aliases will be returned.
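
A sketch (assuming a tensor t in graph) that groups the whole flattened tensor into sequences of intervals that are contiguous in memory:

std::vector<std::vector<poplar::Interval>> regions =
    graph.getSortedContiguousRegions(t, {{0, t.numElements()}});
// Each inner vector describes one contiguous region of memory, listed as
// the intervals of the flattened tensor that it covers.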

void reorderToSimplify(Tensor *t, ArrayRef<Tensor*> ts, bool requireSimplestOrder = true) const

Reorder a set of tensors in order to simplify the view on data.

This function will update ‘t’ to be a (simpler) reordered view on the same data. The same reordering will be applied to all elements of ‘ts’. The reordering will be the same for all tensors so order-invariant or element-wise operations on ‘t’ and ‘ts’ can still be performed.

The main purpose of this function is to provide a way to implement more efficient graph construction of element-wise or order-invariant operations.

If ‘requireSimplestOrder’ is set to true then after execution t will consist of the minimum possible number of contiguous regions. If not, no guarantee is given on the order of t.

All the tensors provided to this function must be of rank 1 (that is, flattened tensors) and have the same number of elements.
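
A sketch under the assumption that a and b are tensors in graph with the same shape: flatten both, then reorder them consistently so that element-wise graph construction over them touches fewer contiguous regions:

poplar::Tensor aFlat = a.flatten();
poplar::Tensor bFlat = b.flatten();
graph.reorderToSimplify(&aFlat, {&bFlat});
// aFlat and bFlat are now consistently reordered views on the same data, so
// an element-wise operation built over them still pairs up the original
// elements.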

TensorRearranger getSimplifyingRearranger(const Tensor &t) const
Tensor findUnbroadcastTensor(const Tensor &t) const

Attempt to determine the shape of a Tensor prior to it having been broadcast.

Under some circumstances this may not be possible; failure is indicated by the returned tensor having the same shape as the input tensor.

Return

A tensor which will be set to the unbroadcast (sliced from ‘t’) tensor if it is possible to do so. Each dimension of the returned tensor will be a factor of the same dimension of the input tensor. The returned tensor will have the same rank as the input tensor. If it is not possible to determine the shape of the unbroadcast tensor the input tensor will be returned.

Parameters
  • t: The input tensor

void serializeTensors(std::ostream &out, ArrayRef<Tensor> tensors, SerializationFormat format) const

Serialize a set of tensors to JSON or CapnProto.

The tensors must all be from this graph or an exception is thrown. The information saved is:

  • The type, shape and expression of the tensors.

  • The type and number of elements of any variables used.

This is intended to be used for debugging, testing and visualisation.

Parameters
  • out: Stream to write to.

  • tensors: A set of tensors to serialize.

  • format: Serialize in JSON or CapnProto format. JSON is pretty printed.

Exceptions
  • poplar_error: if any tensor is not from this graph. CapnProto may also throw an exception if serialization fails.
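
A sketch (assuming tensors tA and tB belong to graph) that dumps their expressions and underlying variables as pretty-printed JSON for debugging:

graph.serializeTensors(std::cout, {tA, tB},
                       poplar::SerializationFormat::JSON);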

std::vector<Tensor> deserializeTensors(std::istream &in, SerializationFormat format)

Deserialize a set of tensors from a CapnProto message.

JSON deserialization is not currently supported and an exception will be thrown if format is SerializationFormat::JSON.

This will recreate the tensors in this graph. It throws an exception on failure (for example, if the tensor type does not match the variable types). Whenever a variable is used by a tensor a new variable is added to the graph.

The layout of the tensors and variables should be the same as when they were serialized.

This function is primarily intended for testing and benchmarks. You should not use it as a general method of creating tensors.

Return

The deserialized set of tensors.

Parameters

  • in: The stream to read from.

  • format: The serialization format. Only SerializationFormat::CapnProto is currently supported.

Graph createVirtualGraph(unsigned numTilesPerIPU)

Create a “virtual” graph working over a subset of the target’s tiles.

This method returns a graph object that references the same state as this graph but has a virtual target that only uses a subset of the target’s tiles.

If the getTarget() method is called on the new graph it will return a target with the new number of tiles.

Return

The virtual graph object

Parameters
  • numTilesPerIPU: The number of tiles per IPU for the new graph to work over.

Graph createVirtualGraph(unsigned lowerTile, unsigned upperTile)

Create a “virtual” graph working over a subset of the target’s tiles.

This method returns a graph object that references the same state as this graph but has a virtual target that only uses a subset of the target’s tiles.

This variant of the method takes a tile range for the new virtual graph to work over. This is the range [lowerTile:upperTile). This tile range must be contained within a single IPU.

If the getTarget() method is called on the new graph it will return a target with the new number of tiles.

Return

The virtual graph object

Parameters
  • lowerTile: The starting tile of the tile range for the virtual graph to work over.

  • upperTile: The upper bound of the tile range for the virtual graph to work over. This is a non-inclusive upper bound.
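
For example, a sketch that splits the tiles of a graph into two halves and works on each independently (the tile counts are assumptions about the target):

poplar::Graph lower = graph.createVirtualGraph(0, 736);
poplar::Graph upper = graph.createVirtualGraph(736, 1472);
// getTarget() on either virtual graph reports 736 tiles; variables and
// vertices added through `lower` can only be mapped to its 736 tiles.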

Graph createVirtualGraph(const std::vector<unsigned> &perIpuTiles)

Create a “virtual” graph working over a subset of the target’s tiles.

This method returns a graph object that references the same state as this graph but has a virtual target that only uses a subset of the target’s tiles.

This variant of the method takes the set of tiles in each IPU that should be included in the new graph.

If the getTarget() method is called on the new graph it will return a target with the new number of tiles.

Return

The virtual graph object

Parameters
  • perIpuTiles: The tiles to include in the graph. Tiles are specified by their index in the IPU. Each tile index must be unique and less than the number of tiles per IPU.

Graph createReplicatedGraph(unsigned replicationFactor)

Create a replicated graph.

The replicated graph is a view on replicationFactor virtual subgraphs. Operations on the replicated graph are implicitly applied to each virtual subgraph, for example adding a variable to the replicated graph implicitly creates a variable in all of the underlying subgraphs.

The replication factor must divide the number of tiles in the graph. If n is the number of tiles in this graph the first subgraph contains tiles [0, n / replicationFactor), the second subgraph contains tiles [n / replicationFactor, 2n / replicationFactor) and so on.
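
A sketch (assuming the top-level graph spans four IPUs) that creates a 4-way replicated view and a per-replica index constant:

poplar::Graph rg = graph.createReplicatedGraph(4);
// A variable added to rg implicitly exists once in each of the four
// underlying sub-graphs.
poplar::Tensor replicaIdx = rg.addReplicationIndexConstant();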

Graph getTopLevelGraph()

Return the top level graph.

The createVirtualGraph() and createReplicatedGraph() methods can be used to create graph objects that are views on an underlying graph. If this is a virtual or replicated graph then this function returns the top level underlying graph, otherwise it returns the current graph.

unsigned getReplicationFactor() const

Return the replication factor of the graph.

Tensor addReplicationIndexConstant(const DebugContext &debugContext = {})

Add a constant that is initialized with the replication index.

Tensor getNonReplicatedTensor(const Tensor &t) const

Given a replicated tensor, return the underlying tensors in this graph that the replicated tensor is a placeholder for.

The tensor returned by this function has an extra outer dimension equal to the replication factor of the tensor in this graph and it is formed by concatenating the underlying tensors for each replicated subgraph in this dimension.

This function can only be used with replicated graphs created by the createReplicatedGraph function, not when the Graph is constructed.

void serialize(std::ostream &out, SerializationFormat format) const

Serialize a graph to JSON or binary (CapnProto) format.

This is equivalent to serialize(out, {}, format).

Note that this does not currently serialize every bit of graph data, so it cannot be used to save and reload a graph.

Parameters
  • out: Stream to write to.

  • format: Serialize in JSON or CapnProto format. JSON is pretty printed.

void serialize(std::ostream &out, ArrayRef<program::Program> progs, SerializationFormat format) const

Serialize a graph to JSON or binary (CapnProto) format.

Progs can be passed so that information about Copy programs can be serialized (the Graph class itself does not know about them).

Note that this does not currently serialize every bit of graph data, so it cannot be used to save and reload a graph.

Parameters
  • out: Stream to write to.

  • progs: A set of programs that are searched for Copy programs. Information about the variables copied is serialised.

  • format: Serialize in JSON or CapnProto format. JSON is pretty printed.

Function addFunction(const program::Program &program)

Add a function to the graph.

A function is a partial control program that can be reused. By registering a repeated program as a function and calling it, less control code is generated than repeating the sequence.

Return

The Function object that can be used by a Call program.

Parameters
  • program: The control program to register as a callable function
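
For example, a sketch (assuming an existing Program called step that is executed several times) that registers it once and calls it, so its control code is generated only once:

poplar::Function f = graph.addFunction(step);
poplar::program::Sequence seq;
seq.add(poplar::program::Call(f));
seq.add(poplar::program::Call(f));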

unsigned convertVirtualTileToPhysicalTile(unsigned virtualTileId) const

Convert Virtual Tile ID into Physical Tile ID.

This function provides a conversion interface required by the Graphcore communication library to determine which exchange block context a tile is associated with.

Return

Physical Tile ID

Parameters
  • virtualTileId: The virtual tile ID.

unsigned convertPhysicalTileToVirtualTile(unsigned physicalTileId) const

Convert Physical Tile ID to Virtual Tile ID.

This function provides a conversion interface required by the Graphcore communication library to know what exchange block context a tile is associated with.

Return

Virtual Tile ID

Parameters
  • physicalTileId: The physical tile ID.

unsigned convertPhysicalTileToVirtualTile(unsigned ipuId, unsigned physicalTileId) const

Convert Physical Tile ID to Virtual Tile ID.

This function returns the virtual tile ID for a given pair of IPU ID and physical tile ID. This conversion interface is required by the Graphcore communication library to determine which exchange block context a tile is associated with.

Return

Virtual Tile ID

Parameters
  • ipuId: The IPU ID.

  • physicalTileId: The physical tile ID.

bool hasCodelet(StringRef codeletName) const

Check if the graph contains a codelet with this name.

Return

True if the codelet is in the graph.

Parameters
  • codeletName: The codelet name.

void trace(ArrayRef<StringRef> name, const TraceFn &fn)
Graph(std::unique_ptr<core::GraphBuilder>, Target target)
core::GraphBuilder &getImpl() const

Private Functions

void setInitialValue(FieldRef field, const void *val, const TypeTraits&)
template<typename T>
void setInitCallback(FieldRef field, LateInitCallback<T> callback, const TypeTraits&)
void setInitialValue(const Tensor &t, const void *val, const TypeTraits&)
void connect(FieldRef field, void *val, const TypeTraits&)
void checkFieldSubgraph(const FieldRef &f) const
void checkVertexSubgraph(const VertexRef &v) const

Private Members

std::unique_ptr<core::GraphBuilder> impl
Target target
class ConnectionDesc

Public Functions

ConnectionDesc(StringRef field, Tensor t)
ConnectionDesc(StringRef field, ArrayRef<Tensor> tsArr)
template<typename T>
ConnectionDesc(StringRef field, T v, typename std::enable_if<TypeTraits::isSimpleType<T>()>::type* = nullptr)

Private Types

enum Kind

Values:

enumerator TensorEdge
enumerator ValueEdge
enumerator VectorTensorEdge

Private Functions

void connect(Graph &g, const VertexRef &v) const

Private Members

Kind kind
std::string field
std::vector<Tensor> ts
std::unique_ptr<char[]> val
TypeTraits traits

Friends

friend class Graph
namespace core
namespace program

Namespace for program classes.

3.3.5. poplar/GraphElements.hpp

namespace poplar

Poplar classes and functions.

Typedefs

typedef unsigned vertex_id

Vertex id.

The integral type of unique identifiers for vertices within a graph.

class ComputeSet
#include <GraphElements.hpp>

A reference to a compute set within a graph.

This type provides a way to address compute sets within a graph.

Public Functions

ComputeSet()
ComputeSet(unsigned id)
unsigned getId() const

Private Members

unsigned computeset_id
class FieldRef
#include <GraphElements.hpp>

A reference to a field within a vertex instance.

This type provides a way to address fields (inputs or internal state) within a vertex. FieldRef’s are normally obtained using VertexRef::operator[](StringRef fieldName), for example:

VertexRef vertex = graph.addVertex(...);
FieldRef input = vertex["input"];
graph.connect(input, ...);

A FieldRef can also be indexed, for example:

FieldRef input_5 = vertex["input"][5];

This is used when a field is a list of regions, for example a Vector<Input<Vector<...>>> or an Input<VectorList<...>>.

Public Functions

FieldRef()
FieldRef operator[](std::size_t index) const

Access an element of a vector field.

Subscript a vector field to access the element at position index.

Return

A reference to the field.

Parameters
  • index: The subscript of the field

bool isIndexed() const
std::size_t getIndex() const

Public Members

VertexRef vertex
unsigned fieldId
std::size_t index
bool indexed

Private Functions

FieldRef(VertexRef vertex, StringRef fieldName)

FieldRef constructor from vertex id and field name.

Construct a FieldRef out of a vertex id and the name of the field.

FieldRef(VertexRef vertex, unsigned fieldId)
FieldRef(VertexRef vertex, StringRef fieldName, std::size_t index)

FieldRef constructor from vertex id, field name and an index.

Construct a FieldRef to a Vector field.

FieldRef(VertexRef vertex, unsigned fieldId, std::size_t index)

Friends

friend class VertexRef
class Function
#include <GraphElements.hpp>

A reference to a function stored within a graph.

Public Functions

Function()
Function(unsigned id)
unsigned getId() const

Private Members

unsigned function_id
class VertexRef
#include <GraphElements.hpp>

A reference to a vertex within a graph.

This type provides a way to address vertices within a graph.

Public Functions

VertexRef()
FieldRef operator[](StringRef fieldName) const

Access a field by name.

Given a vertex reference v, v[name] is a field reference.

Return

A reference to the named field.

Parameters
  • fieldName: The name of the field.

vertex_id getId() const

Private Functions

VertexRef(const core::GraphBuilder *graph, unsigned id)

Construct a vertex reference from an ID.

Return

A reference to the vertex.

Parameters
  • graph: The graph containing the vertex.

  • id: The ID of the vertex.

Private Members

const core::GraphBuilder *graph
vertex_id id

Friends

friend class core::GraphBuilder
friend class Graph
friend class FieldRef
namespace core

3.3.6. poplar/LateInitCallback.hpp

namespace poplar

Poplar classes and functions.

Typedefs

template<typename T>
using LateInitCallback = std::function<T(const VertexEdgeInfo&)>

A callback function of this type can be specified for a field of a vertex, instead of specifying an initialisation value with setInitialValue.

Will be called after the graph has been built. Will be passed information about the vertex fields. Needs to return the value for the field.

struct VertexEdgeInfo
#include <LateInitCallback.hpp>

Data structure that will be passed to the callback used for ‘late initialisation’ for vertex fields.

Contains address information for the other (edge) vertex fields to allow the callback to appropriately initialise the ‘late init’ field itself.

Public Members

std::map<std::string, std::vector<StorageInfo>> storage
struct StorageInfo

Public Members

std::uint64_t startOffs
std::uint32_t len

3.3.7. poplar/PerfEstimateFunc.hpp

namespace poplar

Poplar classes and functions.

Typedefs

using PerfEstimateFunc = std::function<VertexPerfEstimate(const VertexIntrospector &v, const Target &target)>

Functions of this type can be used as performance estimator callbacks for new vertex types.

See

Graph::registerPerfEstimator

struct VertexPerfEstimate

Public Functions

VertexPerfEstimate() = default
VertexPerfEstimate(std::uint64_t cycles, std::uint64_t flops = 0)

Public Members

std::uint64_t cycles
std::uint64_t flops
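
A hedged sketch of a PerfEstimateFunc (the exact Graph::registerPerfEstimator overload and the field name "input" are assumptions): it inspects a field size through the VertexIntrospector and returns a VertexPerfEstimate:

graph.registerPerfEstimator(
    "MyVertex",
    [](const poplar::VertexIntrospector &v, const poplar::Target &target) {
      std::size_t n = v.getFieldInfo("input").size();  // elements in "input"
      return poplar::VertexPerfEstimate(20 + 2 * n, n); // cycles, flops
    });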

3.3.8. poplar/Tensor.hpp

namespace poplar

Poplar classes and functions.

Enums

enum UpsampleMethod

Enum passed to Tensor::upsample(unsigned scale, unsigned dimension) specifying the upsampling method.

Values:

enumerator REPEAT

If dimension is of size s, for every i in [0, s), repeats the subtensor at index i scale times.

For example, with scale = 2 and dimension = 1, a tensor of shape (2, 3):

[[1, 2, 3],
 [4, 5, 6]]

becomes a tensor of shape (2, 6):

[[1, 1, 2, 2, 3, 3],
 [4, 4, 5, 5, 6, 6]]

Note that a scale of 0 means repeat each tensor 0 times. So a (i, j, k, l) tensor upsampled with scale = 0 and dimension = 3 would become an (i, j, k, 0) tensor containing 0 elements.

scale = 1 is the identity operation.

Functions

bool operator==(const Tensor &a, const Tensor &b)
bool operator!=(const Tensor &a, const Tensor &b)
Tensor concat(ArrayRef<Tensor> ts, unsigned dimension = 0)

Concatenate several tensors.

The tensors are concatenated along the specified dimension.

Return

The result of the concatenation

Parameters
  • ts: The tensors to concatenate

  • dimension: The number of the dimension to concatenate across

Tensor concat(const Tensor &first, const Tensor &second, unsigned dimension = 0)

Concatenate two tensors.

The tensors are concatenated along the specified dimension.

Return

The result of the concatenation

Parameters
  • first: The first tensor to concatenate

  • second: The second tensor to concatenate

  • dimension: The number of the dimension to concatenate across
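
A sketch (assuming a and b are tensors of shape [4, 3]): concatenating along dimension 1 yields a [4, 6] view, while concatenating a list along dimension 0 stacks them into [8, 3]:

poplar::Tensor wide = poplar::concat(a, b, 1);   // shape [4, 6]
poplar::Tensor tall = poplar::concat({a, b}, 0); // shape [8, 3]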

Tensor append(const Tensor &first, const Tensor &second, unsigned dimension)

Append a tensor as an element to another tensor.

Return

The extended tensor

Parameters
  • first: The tensor to append to

  • second: The tensor to add as an element in the specified dimension

  • dimension: The number of the dimension to append to

Tensor append(const Tensor &first, const Tensor &second)

Append a tensor to another in their first dimension.

Return

The extended tensor

Parameters
  • first: The tensor to append to

  • second: The tensor to add as an element in the first dimension

std::ostream &operator<<(std::ostream &os, const Tensor &tensor)

Display the regions of the tensor on a stream.

Return

The ostream written to

Parameters
  • os: The ostream to output to

  • tensor: The tensor to display

class Tensor
#include <Tensor.hpp>

A reference to a subset of tensor elements.

Public Functions

Tensor()
Tensor(const Tensor &other)
Tensor(Tensor &&other)
const Tensor &operator=(const Tensor &other) &
Tensor &operator=(Tensor &&other) &
~Tensor()
Type elementType() const

Get the element type information for this tensor.

Return

The element type.

Tensor operator[](std::size_t i) const &

Get the sub-tensor indexed by i in the first dimension of the tensor.

Parameters
  • i: The index into the first dimension of the tensor.

Tensor &&operator[](std::size_t i) &&
Tensor slice(std::size_t begin, std::size_t end, unsigned dimension) const &

Get the sub-tensor given by a specific range [begin, end) in one dimension of the tensor.

Parameters
  • begin: The first element of the range

  • end: The upper bound to the range (the last element + 1)

  • dimension: The dimension to slice in

Tensor &&slice(std::size_t begin, std::size_t end, unsigned dimension) &&
Tensor slice(std::size_t begin, std::size_t end) const

Get the sub-tensor given by a specific range [begin, end) in the first dimension of the tensor.

Parameters
  • begin: The first element of the range

  • end: The upper bound to the range (the last element + 1)

Tensor slice(const Interval &region, unsigned dimension = 0) const

Get the sub-tensor given by a specific range [begin, end) in one dimension of the tensor.

Parameters
  • region: The region to slice

  • dimension: The dimension to slice in

Tensor slice(ArrayRef<std::size_t> begin, ArrayRef<std::size_t> end) const

Get the sub-tensor given by slicing the tensor in multiple dimensions, starting at dimension 0.

Each pair begin[i], end[i] specifies that the tensor is sliced in dimension i by the range [begin[i], end[i]). The rank of the returned tensor is the same as the input tensor.

Parameters
  • begin: The lower bounds of the ranges used to slice the tensor

  • end: The upper bounds of the ranges used to slice the tensor

std::vector<Tensor> slices(ArrayRef<Interval> intervals, unsigned dimension = 0) const

Get a vector of slices.

Return

A vector of slices where each slice is obtained by slicing this tensor between the two points in the given interval list.

Parameters
  • intervals: A list of intervals.

  • dimension: The dimension to slice in

std::vector<Tensor> slices(const std::vector<std::vector<Interval>> &intervals, unsigned dimension = 0) const

Get a vector of slices.

Return

A vector of tensors where each tensor is the concatenation of a sequence of several slices, each slice being this tensor between the two points in the corresponding interval in the sequences given as input.

Parameters
  • intervals: A list of sequences of intervals.

  • dimension: The dimension to slice in

Tensor index(ArrayRef<std::size_t> indices) const

Get the sub-tensor indexed by the specified indices.

This is equivalent to repeatedly applying operator[] for each index in the vector of indices.

Return

The sub-tensor indexed by the indices.

Parameters
  • indices: The indices used to index into the tensor.

Tensor flatten() const

Flatten the tensor.

Return

A tensor consisting of all elements of the original tensor but with a single dimension.

Tensor flatten(unsigned dimBegin, unsigned dimEnd) const

Flatten a subset of the dimensions of a tensor.

Return

A tensor consisting of all elements of the original tensor with the specified dimension range flattened into one dimension.

Parameters
  • dimBegin: The first dimension to flatten

  • dimEnd: One past the last dimension to flatten.

Tensor reshape(ArrayRef<std::size_t> shape) const

Reshape the tensor.

The reshaping operation changes the shape of the tensor but cannot change the total number of elements.

Return

A tensor consisting of all elements of the original but with new dimensions.

Parameters
  • shape: The new shape of the tensor.

Tensor dimShuffle(ArrayRef<unsigned> permutation) const

Permute the dimensions of a tensor.

The dimShuffle operation reorders the tensor to a permutation of its dimensions. It can be seen as the generalized form of a matrix transpose.

Note that this operation does not create a copy of the tensor but returns a reordered view on this tensor’s data.

Return

The shuffled tensor

Parameters
  • permutation: The permutation vector specifies a mapping from the output dimension to the input dimension. For example, the permutation {2, 0, 1} specifies that element [a][b][c] in the original tensor is remapped to element [c][a][b] in the new tensor.
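
A sketch (assuming t has shape [2, 3, 4]): the permutation {2, 0, 1} produces a view of shape [4, 2, 3], because output dimension 0 takes its size from input dimension 2, and so on:

poplar::Tensor shuffled = t.dimShuffle({2, 0, 1});
// shuffled.dim(0) == 4, shuffled.dim(1) == 2, shuffled.dim(2) == 3, and
// shuffled[c][a][b] views the same element as t[a][b][c].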

Tensor dimShufflePartial(ArrayRef<unsigned> source, ArrayRef<unsigned> destination) const

Permute some of a tensor’s dimensions.

dimShufflePartial reorders the tensor’s dimensions. The unspecified dimensions stay in the same relative order.

Note that this operation does not create a copy of the tensor but returns a reordered view on this tensor’s data.

Return

The shuffled tensor.

Parameters
  • source: The dimensions to move.

  • destination: The index at which to move each source dimension.

Tensor dimRoll(unsigned dimIdx, unsigned newIdx = 0) const

Roll a specified dimension to the specified dimension.

The other dimensions remain in the same relative order.

Note that this operation does not create a copy of the tensor but returns a reordered view on this tensor’s data.

Return

The shuffled tensor.

Parameters
  • dimIdx: The dimension to move.

  • newIdx: Its new location, default 0.

Tensor reshapePartial(unsigned beginIndex, unsigned endIndex, ArrayRef<std::size_t> newDims) const

Reshape a range of dimensions of a tensor.

reshapePartial reshapes the input tensor such that the total number of elements of the resultant tensor is the same as the input tensor.

Note that this operation does not create a copy of the tensor but returns a reshaped view on the input tensor’s data.

The following conditions define the valid use of this function:

1) beginIndex == endIndex

beginIndex and endIndex must each lie in the closed interval [0, rank()]. Singleton dimensions are added before beginIndex. The number of dimensions added is equal to the length of the newDims vector. For example:

reshapePartial(0, 0, {1, 1})
Adds two singleton dimensions at indices 0 and 1

2) size(newDims) == 0 and beginIndex != endIndex

beginIndex must lie in the half-closed interval [0, rank()) and endIndex must lie in the half-closed interval (0, rank()]. The product of the dimensions in the interval [beginIndex, endIndex) must be 1. For example:

reshapePartial(1, 3, {})
Removes singleton dimensions 1 and 2 from the tensor

3) size(newDims) != 0 and beginIndex != endIndex

beginIndex must lie in the half-closed interval [0, rank()) and endIndex must lie in the half-closed interval (0, rank()]. The product of the elements of newDims must be equal to the product of the dimensions in the interval [beginIndex, endIndex).

The input dimensions [0, beginIndex) and [endIndex, rank()) are kept unchanged and are prepended and appended to the new dimensions, respectively. For example:

reshapePartial(1, 3, {10, 20, 30})
reshapePartial(1, 3, {10})

Return

Reshaped view of tensor

Parameters
  • beginIndex: Index of the dimension from which reshape starts

  • endIndex: Index of the first dimension after reshape ends

  • newDims: The new dimensions of the partial tensor

Tensor expand(ArrayRef<std::size_t> indices) const

Expand tensor by adding singleton dimensions at specified indices of tensor.

The rank is increased by the number of dimensions added. To add more than one dimension at a given position, repeat the same index.

Return

A view of expanded tensor

Parameters
  • indices: Dimension indices before which the singleton dimensions are added

Tensor squeeze(ArrayRef<std::size_t> indices) const

Reduce dimension of tensor by removing singleton dimensions at specified indices of tensor.

Return

A view of squeezed tensor

Parameters
  • indices: Indices of singleton dimensions which are removed

Tensor transpose() const

Transpose a 2-dimensional tensor.

Return

The transposed tensor.

Tensor subSample(unsigned stride, unsigned dimension) const

Sub-sample the tensor.

Sub-sample this tensor by selecting every stride-th element of the tensor in a specified dimension.

Return

The sub-sampled tensor

Parameters
  • stride: The size of the stride

  • dimension: The dimension to sub-sample in

Tensor upsample(unsigned scale, unsigned dimension, UpsampleMethod method) const

Upsample the tensor.

Note that this operation does not create a copy of the tensor but creates a view of the tensor’s data. The repeated data is represented by repeated views into the tensor.

See

UpsampleMethod for descriptions of how the tensor can be upsampled.

Return

The upsampled tensor.

Parameters
  • scale: The scaling factor, >= 0.

  • dimension: The dimension to upsample in.

  • method: The method by which to upsample the tensor.
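
A sketch using the example from UpsampleMethod::REPEAT (assuming t has shape [2, 3]):

poplar::Tensor up = t.upsample(2, 1, poplar::UpsampleMethod::REPEAT);
// up has shape [2, 6]; each column of t appears twice, as a repeated view
// rather than a copy of the data.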

Tensor broadcast(unsigned N, unsigned dimension) const

Broadcast/repeat the tensor along a specified dimension.

Create a view with this tensor repeated N times along a specified dimension.

Return

The broadcast tensor.

Parameters
  • N: The number of times to repeat.

  • dimension: The dimension to broadcast in.

Tensor reinterpret(const Type &type) const

Reinterpret the tensor as a new type.

The new type must be the same size as the old type. See elementType() for a list of valid types and their sizes.

Return

A tensor with the same shape and referencing the same data but of the new type.

Parameters
  • type: The type to reinterpret to

Tensor reverse(unsigned dimensions) const

Reverse this tensor along a specified dimension.

Return

The reversed tensor.

Parameters
  • dimension: The dimension to reverse.

std::size_t numElements() const

Get the total number of elements in the tensor.

std::size_t dim(unsigned i) const

Get a dimension of the tensor.

Parameters
  • i: The index of the dimension to get.

std::vector<std::size_t> shape() const

Get the shape of the tensor.

Return

A vector of all the dimensions of the tensor.

unsigned rank() const

Get the rank of the tensor.

Return

The number of dimensions a tensor has.

bool isContiguous() const

Get whether the tensor is contiguous.

bool containsAliases() const

Get whether the tensor contains an alias to the same storage location.

Return

True if the tensor contains an alias to the same storage location.

bool containsConstant() const

Get whether the tensor contains any constant tensors.

Return

True if the tensor contains any constant tensors.

bool isParallelWriteable() const

Get whether the elements of this tensor can be written in parallel.

This is equivalent to !(containsAliases() || containsConstant()).

Return

True if the tensor can be written in parallel.

const std::vector<Interval> getContiguousRegions() const

Get the contiguous regions of a tensor.

Return

A vector of intervals in order representing regions of the tensor that are contiguous in the tensor’s storage ordering.

const std::vector<VariableInterval> getVarRegions() const

Get the contiguous regions of a tensor with reference to the variables allocated in the graph.

Return

A vector of variable intervals (variable id, interval pairs) representing the regions of the tensor.

template<typename T>
bool getConstantValue(T *val) const

Read a single element of data from a tensor if it is a constant.

Return

True if the tensor is constant and the data was read.

Parameters
  • val: Buffer to which the tensor data is copied.

bool intersectsWith(const Tensor &other) const

Return whether this tensor intersects with another tensor.

Return

True if this tensor intersects with the other tensor.

Parameters
  • other: The tensor to compare with.

std::ostream &output(std::ostream &os) const

Display the expression representing the tensor on a stream.

Return

The ostream written to

Parameters
  • os: The ostream to output to

std::ostream &outputRegions(std::ostream &os) const

Display the regions of the tensor on a stream.

Return

The ostream written to

Parameters
  • os: The ostream to output to

std::string shapeToString() const

Report the shape of a Tensor as a string.

void dump() const

Display the expression representing the tensor.

void dumpRegions() const

Display the regions of the tensor.

Tensor(std::unique_ptr<core::Tensor>)
core::Tensor &getImpl() const
std::unique_ptr<core::Tensor> *getPImpl()
bool valid() const

Private Functions

bool getConstantData(void *dst, const TypeTraits &traits) const

Private Members

std::unique_ptr<core::Tensor> impl
namespace core

3.3.9. poplar/TensorCloneMethod.hpp

namespace poplar

Poplar classes and functions.

Enums

enum TensorCloneMethod

Define behaviour when a Tensor is cloned.

See

Graph::clone

Values:

enumerator PRESERVE_ORDER_AND_ALIASES

Preserve the ordering and aliasing within the original tensor reference.

enumerator CREATE_NEW_ORDER

Create a new tensor with natural ordering based on the dimensions of the cloned tensor (in the same way as addTensor).

enumerator PRESERVE_ORDER_UNLESS_ALIASES

Preserve the ordering of the original tensor unless it contains aliases.

In the case of aliases, create a new tensor ordering and duplicate the aliased elements.

enumerator GATHER_AND_PRESERVE_TILE_ORDER_AND_ALIASES

Gather elements of the underlying variables that are mapped to the same tile so they form one contiguous region on the tile in the cloned tensor.

Contiguous regions on the tile and the aliasing of elements are preserved.

enum TensorCloneDuplicationMethod

Define behaviour when a Tensor is cloned and duplicated using Graph::cloneN.

If DUPLICATE_BY_TILE_CONTIGUOUS_REGION and a new order needs to be created (either via TensorCloneMethod::CREATE_NEW_ORDER or TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES) then Poplar will error.

See

Graph::cloneN

See

TensorCloneMethod

Values:

enumerator DUPLICATE_BY_OUTER_DIMENSION

The multiple clones are concatenated in their outermost dimension.

That is, the result is the same as concat(clone1, clone2, …, cloneN). There is no guarantee of any ordering constraints in memory between the clones.

enumerator DUPLICATE_BY_TILE_CONTIGUOUS_REGION

The underlying variables of the clones are concatenated for each contiguous region on each tile.

Each clone will have the same contiguous regions on each tile, but each of those regions will also form bigger contiguous regions across the N duplicates. This option is particularly useful for efficient slicing/copying between the duplicates being cloned.

Functions

std::string toString(const TensorCloneMethod &method)
std::string toString(const TensorCloneDuplicationMethod &method)

3.3.10. poplar/TensorRearranger.hpp

namespace poplar

Poplar classes and functions.

class TensorRearranger
#include <TensorRearranger.hpp>

A TensorRearranger is an object that can re-order the view on a tensor and undo that re-ordering.

See

Graph::getSimplifyingRearranger

Public Functions

TensorRearranger()
TensorRearranger(const TensorRearranger &other)
TensorRearranger(TensorRearranger &&other)
const TensorRearranger &operator=(const TensorRearranger &other) &
TensorRearranger &operator=(TensorRearranger &&other) &
~TensorRearranger()
Tensor rearrange(const Tensor &t) const

Rearrange a tensor.

Tensor undoRearrangement(const Tensor &t) const

Undo the rearrangement done via the rearrange method.

std::vector<Interval> rearrange(ArrayRef<Interval> is) const

Apply the rearrangement to intervals.

Parameters

  • is: A list of intervals w.r.t the original tensor.

Return

A list of equivalent intervals w.r.t the rearranged tensor.

std::vector<Interval> undoRearrangement(ArrayRef<Interval> is) const

Apply the undoing of the rearrangement to intervals.

Parameters

  • is: A list of intervals w.r.t the rearranged tensor.

Return

A list of equivalent intervals w.r.t the original tensor.

TensorRearranger(std::unique_ptr<core::TensorRearranger>)
core::TensorRearranger &getImpl() const
std::unique_ptr<core::TensorRearranger> *getPImpl()
bool valid() const

Private Members

std::unique_ptr<core::TensorRearranger> impl
namespace core

3.3.11. poplar/Type.hpp

Defines

POPLAR_DECLARE_EQUIV_TYPE(T1, T2)
namespace poplar

Poplar classes and functions.

Functions

std::ostream &operator<<(std::ostream &os, const Type &t)

Variables

Type BOOL

Device type: bool

Type CHAR

Device type: char

Type UNSIGNED_CHAR

Device type: unsigned char

Type SIGNED_CHAR

Device type: signed char

Type UNSIGNED_SHORT

Device type: unsigned short

Type SHORT

Device type: short

Type UNSIGNED_INT

Device type: unsigned int

Type INT

Device type: int

Type UNSIGNED_LONG

Device type: unsigned long

Type LONG

Device type: long

Type UNSIGNED_LONGLONG

Device type: unsigned long long

Type LONGLONG

Device type: long long

Type HALF

Device type: half

Type FLOAT

Device type: float

template<typename T>
struct equivalent_device_type
#include <Type.hpp>

Template structure to relate a host type to a device type.

This structure is specialized to allow a program to relate a host type to a corresponding device type. For example:

poplar::Type t = equivalent_device_type<int>().value;

class Type
#include <Type.hpp>

Class representing device data types.

The following types are not supported on the IPU:

  • LONG

  • UNSIGNED_LONG

  • LONGLONG

  • UNSIGNED_LONGLONG

  • DOUBLE

For other types, the sizes on the IPU are:

  • BOOL: 1 byte

  • CHAR: 1 byte (signed)

  • SIGNED_CHAR: 1 byte

  • UNSIGNED_CHAR: 1 byte

  • SHORT: 2 bytes

  • SIGNED_SHORT: 2 bytes

  • UNSIGNED_SHORT: 2 bytes

  • INT: 4 bytes

  • SIGNED_INT: 4 bytes

  • SIGNED: 4 bytes

  • UNSIGNED_INT: 4 bytes

  • UNSIGNED: 4 bytes

  • HALF: 2 bytes

  • FLOAT: 4 bytes

Public Functions

Type()
~Type()
Type(const Type &other)
Type(Type &&other)
Type &operator=(const Type &other)
Type &operator=(Type &&other)
StringRef toString() const

Get a string representation of the type.

Return

A string representation of the type.

bool operator==(const Type &other) const
bool operator!=(const Type &other) const
bool operator<(const Type &other) const
Type(SSOPointer<core::Type>)
const core::Type &getImpl() const

Private Members

SSOPointer<core::Type> impl
namespace core

3.3.12. poplar/VariableMappingMethod.hpp

namespace poplar

Poplar classes and functions.

Enums

enum VariableMappingMethod

When variables are added to the graph, a tile mapping can be created.

This class enumerates the method for creating that mapping.

Values:

enumerator NONE

No mapping is created.

The tile mapping will be set later via Graph::setTileMapping.

enumerator LINEAR

The variable will be spread evenly across the tiles with the element ordering matching the tile number ordering.

The tile mapping can also be overridden later via Graph::setTileMapping.

Functions

std::string toString(const VariableMappingMethod &method)

3.3.13. poplar/VariableRef.hpp

template<>
struct std::hash<poplar::VariableRef>

Public Functions

size_t operator()(const poplar::VariableRef &v) const
namespace poplar

Poplar classes and functions.

Functions

bool operator==(const VariableInterval &a, const VariableInterval &b)
bool operator<(const VariableInterval &a, const VariableInterval &b)
struct VariableInterval
#include <VariableRef.hpp>

Type representing a segment of a particular variable.

Public Functions

VariableInterval(VariableRef var, Interval interval)
VariableInterval() = default
VariableInterval(const VariableInterval &other) = default
VariableInterval(VariableInterval &&other) = default
VariableInterval &operator=(const VariableInterval &other) = default
VariableInterval &operator=(VariableInterval &&other) = default

Public Members

VariableRef var
Interval interval
class VariableRef
#include <VariableRef.hpp>

Type representing a reference to a variable in a graph.

Public Functions

VariableRef(unsigned id, unsigned replicationFactor)
VariableRef() = default
VariableRef(const VariableRef &other) = default
VariableRef(VariableRef &&other) = default
VariableRef &operator=(const VariableRef &other) = default
VariableRef &operator=(VariableRef &&other) = default
std::size_t hash() const

Private Members

unsigned id
unsigned replicationFactor

Friends

friend class Graph
friend bool operator==(const VariableRef &a, const VariableRef &b)
friend bool operator<(const VariableRef &a, const VariableRef &b)
namespace std
template<>
struct hash<poplar::VariableRef>

Public Functions

size_t operator()(const poplar::VariableRef &v) const

3.3.14. poplar/VectorLayout.hpp

namespace poplar

Poplar classes and functions.

namespace layout

Namespace for layout classes.

Enums

enum Vector

An enumeration used to state what type of pointer is used for a Vector vertex field.

Values:

enumerator NotAVector
enumerator Span
enumerator ShortSpan
enumerator OnePtr
enumerator ScaledPtr32
enumerator ScaledPtr64
enumerator ScaledPtr128
enum VectorList

An enumeration used to state what type of pointer is used for a VectorList vertex field.

Values:

enumerator NotAVector
enumerator OnePtr
enumerator ScaledPtr32
enumerator ScaledPtr64
enumerator ScaledPtr128
enumerator DeltaN
enumerator DeltaNElements

Functions

std::ostream &operator<<(std::ostream &os, const Vector v)
std::string to_string(const Vector v)
std::ostream &operator<<(std::ostream &os, const VectorList v)
std::string to_string(const VectorList v)

3.3.15. poplar/VertexIntrospector.hpp

namespace poplar

Poplar classes and functions.

class FieldData
#include <VertexIntrospector.hpp>

Information about a vertex field, including its size and its initial value if set.

This is used when calculating cycle estimates.

Vertex fields can be scalar, 1D or 2D. For example:

  • Scalar: float, Input<float>.

  • 1D: Vector<float>, Input<Vector<float>>

  • 2D: Input<VectorList<float>>, Vector<Input<Vector<float>>>

Their sizes can always be returned, and the initial values can be returned for non-edge fields (float, Vector<float>) and edge fields (Input etc.) that are connected to constants.

Note that 2D fields are vectors of vectors, in other words they are jagged 2D arrays.

Public Functions

FieldData(const FieldData&) = delete
FieldData(FieldData&&)
~FieldData()
unsigned rank() const

Return the rank of the field: 0 for scalar fields, 1 for 1D and 2 for 2D.

std::size_t size() const

Return the size of the field.

For scalar fields it returns 1, for 1D fields it returns the size of the vector, and for 2D fields it returns the number of sub-vectors.

std::size_t getSizeAtIndex(std::size_t i) const

For 2D fields, return the size of the sub-vector.

Throws an error if called on non-2D fields.

Parameters
  • i: Index of sub-vector to return size of

layout::Vector getProfilerVectorLayout(std::size_t nestingLevel) const

For Vector fields return the layout.

Parameters
  • nestingLevel: The dimension to query; 0 for the outer vector, 1 for the inner.

layout::VectorList getProfilerVectorListLayout() const

For VectorList fields return the layout.

We only support introspecting a VectorList that is the outermost vector.

SizeT operator[](std::size_t i) const

Instead of field.getSizeAtIndex(i) you can alternatively use field[i].size().

std::string name() const

Return the name of the vertex field.

template<typename T>
T getInitialValue(const Target &target) const

Get the initial value for a scalar field.

T should be a scalar type. Throws an error if this is not a scalar field.

template<typename T>
std::vector<T> getInitialValues(const Target &target) const

Get the initial value for a 1D or 2D vector field.

T should be a scalar type (e.g. float) for 1D fields and std::vector<> for 2D fields. Throws an error if this is a scalar field.

FieldData(std::unique_ptr<core::FieldData> fd)

Private Functions

template<typename T>
void getInitialValuesOverload(const Target &target, std::vector<T> &result) const
template<typename T>
void getInitialValuesOverload(const Target &target, std::vector<std::vector<T>> &result) const
void getInitialValues(const Target &target, void *dst, const TypeTraits &traits, std::size_t index = std::numeric_limits<std::size_t>::max()) const

Private Members

std::unique_ptr<core::FieldData> impl
struct SizeT

Public Functions

std::size_t size() const

Public Members

std::size_t value
class VertexIntrospector
#include <VertexIntrospector.hpp>

Available to cycle estimators to inspect the shape and initial values of a vertex’s fields.

Public Functions

FieldData getFieldInfo(const std::string &name) const

Return information about the vertex’s field.

ComputeSet getComputeSet() const

Return the compute set that this vertex is in.

VertexIntrospector(std::unique_ptr<core::VertexIntrospector> impl)
VertexIntrospector(VertexIntrospector&&)
const core::VertexIntrospector &getImpl() const

Private Members

std::unique_ptr<core::VertexIntrospector> impl
namespace core

3.4. Control program classes

3.4.1. poplar/Program.hpp

namespace poplar

Poplar classes and functions.

namespace core
namespace program

Namespace for program classes.

Functions

void dumpProgram(const Graph &graph, const Program &program, std::ostream &out)

Print the resulting lowered program from input ‘program’ to ostream ‘out’.

class Abort : public poplar::program::Program

Public Functions

Abort(const DebugContext &debugContext = {})

Throws an exception.

Parameters
  • debugContext: Optional DebugId and program name.

class AbortOnCondition : public poplar::program::Program

Public Functions

AbortOnCondition(Tensor predicate, const DebugContext &debugContext = {})

Throws an exception if the predicate tensor evaluates to true.

Parameters
  • predicate: Scalar tensor to test.

  • debugContext: Optional DebugId and program name.

class AssumeEqualAcrossReplicas : public poplar::program::Program
#include <Program.hpp>

A program to mark a tensor as equal across replicas.

This can be used to tell Poplar that the value of a tensor is the same in all replicas (for example, the result of a cross-replica all-gather operation). Poplar will assume this property while checking for control-flow divergence, and will accept programs that it would otherwise have to reject due to a lack of knowledge of tensor values.

Public Functions

AssumeEqualAcrossReplicas(Tensor t, const DebugContext &debugContext = {})
class Call : public poplar::program::Program
#include <Program.hpp>

A program to perform a function call to a previously stored program.

Public Functions

Call(Function f, const DebugContext &debugContext = {})

Call the function.

Parameters
  • f: A program that has been added to the graph using Graph::addFunction.

  • debugContext: Optional DebugId and program name.

class Copy : public poplar::program::Program
#include <Program.hpp>

A program that copies data.

Public Functions

Copy(Tensor src, Tensor dst, bool dontOutline = false, const DebugContext &debugContext = {})

Construct a program to copy data from one tensor to another.

This constructor creates a program that will copy data from the src tensor to the dst tensor.

Parameters
  • src: The tensor to copy from.

  • dst: The tensor to copy to.

  • dontOutline: Do not outline this copy as a function call. Default is false (the copy will be outlined).

  • debugContext: Optional DebugId and program name.

Copy(const DataStream &stream, Tensor dst, bool optimiseMemory = false, const DebugContext &debugContext = {})

Construct a program to copy from a data stream to a tensor.

Parameters
  • stream: The stream to copy from.

  • dst: The tensor to copy to.

  • optimiseMemory: If set to true, sacrifice speed in order to reduce memory use; for example, by rearranging data on the host and outlining writes.

  • debugContext: Optional DebugId and program name.

Copy(Tensor src, const DataStream &stream, bool optimiseMemory = false, const DebugContext &debugContext = {})

Construct a program to copy a Tensor to a data stream.

Parameters
  • src: The tensor to copy from.

  • stream: The stream to copy to.

  • optimiseMemory: Set to true to sacrifice speed in order to reduce memory usage.

  • debugContext: Optional DebugId and program name.

Copy(const RemoteBuffer &buffer, Tensor dst, const DebugContext &debugContext = {})

Construct a program to copy a remote buffer to a tensor.

Parameters
  • buffer: The remote buffer to copy from.

  • dst: The tensor to copy to.

  • debugContext: Optional DebugId and program name.

Copy(const RemoteBuffer &buffer, Tensor dst, Tensor offset, const DebugContext &debugContext = {})

Construct a program to copy a remote buffer to a tensor.

The data to be transferred is controlled by the definition of the buffer and the offset parameter.

The buffer has repeat data-transfer “rows”, each containing numElements data items (these are not necessarily the same as rows in the destination tensor). The size of offset defines the number of rows to copy. The rows to be copied are defined by offset: each element of offset is the index of a row to be copied.

The size of dst must be equal to the data transfer size: the number of elements in offset multiplied by numElements.

If the offset tensor has more than one element then the dst must be a rank 2 tensor with dimensions [offset.numElements(), remoteBuffer.numElements()].

Multiple values in the offset tensor with the same value will result in undefined behaviour because the order of writes to the buffer is not guaranteed.

See

Graph::addRemoteBuffer

Parameters
  • buffer: The remote buffer to copy from.

  • dst: The tensor to copy to.

  • offset: The “rows” in the remote buffer to copy from.

  • debugContext: Optional DebugId and program name.
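
A sketch of an indexed read (the remote buffer rb is assumed to have numElements = 1000 and repeats = 8; offsets is assumed to be a two-element integer tensor holding the row indices to fetch, and dst a [2, 1000] tensor of the buffer's element type):

poplar::program::Copy gatherRows(rb, dst, offsets);
// Copies rows offsets[0] and offsets[1] of the remote buffer into dst[0]
// and dst[1] respectively.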

Copy(Tensor src, const RemoteBuffer &buffer, const DebugContext &debugContext = {})

Construct a program to copy a tensor to a remote buffer.

Parameters
  • src: The tensor to copy from.

  • buffer: The remote buffer buffer to copy to.

  • debugContext: Optional DebugId and program name.

Copy(Tensor src, const RemoteBuffer &buffer, Tensor offset, const DebugContext &debugContext = {})

Construct a program to copy a tensor to a remote buffer.

The data that is transferred is controlled by the definition of the buffer and the offset parameter.

The buffer has repeat data-transfer “rows”, each containing numElements data items (these are not necessarily the same as rows in the source tensor). The rows to be copied are defined by offset. The size of offset defines the number of rows to copy. Each element of offset is the index of a row to be copied.

The size of src must be equal to the data transfer size: the number of elements in offset multiplied by numElements.

If the offset tensor has more than one element then the src must be a rank 2 tensor with dimensions [offset.numElements(), remoteBuffer.numElements()].

Multiple values in the offset tensor with the same value will result in undefined behaviour.

See

Graph::addRemoteBuffer

Parameters
  • src: The tensor to copy from.

  • buffer: The remote buffer buffer to copy to.

  • offset: The “rows” in the remote buffer to copy to.

  • debugContext: Optional DebugId and program name.

Copy(const DataStream &stream, Tensor dst, Tensor expectedIndex, bool rearrangeOnHost = false, const OptionFlags &options = {}, const DebugContext &debugContext = {})

Construct a program to copy from a data stream to a tensor.

Parameters
  • stream: The data stream to copy from.

  • dst: The tensor to copy to.

  • expectedIndex:

  • rearrangeOnHost:

  • options:

  • debugContext: Optional DebugId and program name.

Copy(Tensor src, const DataStream &stream, Tensor index, bool rearrangeOnHost = false, const OptionFlags &options = {}, const DebugContext &debugContext = {})

Construct a program to copy a tensor to a data stream.

Parameters
  • src: The tensor to copy from.

  • stream: The data stream to copy to.

  • index:

  • rearrangeOnHost:

  • options:

  • debugContext: Optional DebugId and program name.

Private Functions

Copy(const DataStream &stream, Tensor dst, bool rearrangeOnHost, Tensor offset, size_t repeats, bool optimiseMemory, const OptionFlags &options = {}, const DebugContext &debugContext = {})
Copy(Tensor src, const DataStream &stream, bool rearrangeOnHost, Tensor offset, size_t repeats, bool optimiseMemory, const OptionFlags &options = {}, const DebugContext &debugContext = {})
class CrossReplicaCopy : public poplar::program::Program
#include <Program.hpp>

A program that copies tensors between replicated sub-graphs.

Public Functions

CrossReplicaCopy(Tensor src, Tensor dst, std::map<unsigned, unsigned> replicaMap, const DebugContext &debugContext = {})

Constructor to create a program to copy a tensor to the equivalent tensor in a different replica sub-graph.

When the replicated graphs are created, this will create a Copy program in each replica. Each replica sends to exactly one other replica and receives from exactly one other replica. A replica may not copy to itself.

Parameters
  • src: Replicated tensor to copy from.

  • dst: Replicated tensor to copy to.

  • replicaMap: Each key in this map specifies the sub-graph or replica that contains the source tensor. The corresponding value is the replica that contains the destination tensor.

    The size of the replica map is equal to the graph replication factor.

    Each replica must be represented once as a key (source) and once as a value (destination).

  • debugContext: Optional DebugId and program name.
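
A sketch (assuming a replication factor of 4 and replicated tensors src and dst) that rotates data one replica to the right:

std::map<unsigned, unsigned> ring = {{0, 1}, {1, 2}, {2, 3}, {3, 0}};
poplar::program::CrossReplicaCopy rotate(src, dst, ring);
// Replica 0 sends its src to replica 1's dst, 1 sends to 2, 2 to 3, and
// 3 back to 0.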

class ErrorProgram : public poplar::program::Program

Public Functions

ErrorProgram(StringRef message, Tensor debugTensor, const DebugContext &debugContext = {})

Throw an error.

Prints out a message and then throws an error.

Parameters
  • message: String to print.

  • debugTensor: Tensor that will be printed after the message to aid debugging.

  • debugContext: Optional DebugId and program name.

class Execute : public poplar::program::Program
#include <Program.hpp>

Program that executes a compute set in the graph.

Public Functions

Execute(ComputeSet cs, const DebugContext &debugContext = {})

Construct a graph execution program.

Parameters
  • cs: The compute set to execute.

  • debugContext: Optional DebugId and program name.

Execute(ComputeSet cs, Tensor t, const DebugContext &debugContext = {})

Construct a graph execution program and write the exit status to a scalar tensor.

The exit status is the logical and of the return values of the vertices in the compute set.

Parameters
  • cs: The compute set to execute.

  • t: The tensor to write the exit status to.

  • debugContext: Optional DebugId and program name.

class If : public poplar::program::Program
#include <Program.hpp>

A program that runs one of two programs depending on the value of a scalar tensor.

Public Functions

If(Tensor predicate, const Program &trueBody, const Program &falseBody, const DebugContext &debugContext = {})

A program that executes the trueBody or falseBody depending on the value of the predicate.

You can pass an empty Sequence to either trueBody or falseBody if you don’t want either branch to do anything.

Parameters
  • predicate: The scalar tensor that determines which branch to execute.

  • trueBody: This program is run if the predicate is true.

  • falseBody: This program is run if the predicate is false.

  • debugContext: Optional DebugId and program name.
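
A sketch (assuming a scalar tensor pred and an existing Program thenBody) that runs thenBody only when pred is true; the empty Sequence makes the false branch a no-op:

poplar::program::If branch(pred, thenBody, poplar::program::Sequence());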

class PrintTensor : public poplar::program::Program

Public Functions

PrintTensor(Tensor t, const DebugContext &debugContext = {})

Print the contents of a tensor.

You can send the output to a different stream by using the Engine::setPrintTensorStream function.

Parameters
  • t: The Tensor to print.

  • debugContext: Optional DebugId and program name.

PrintTensor(StringRef title, Tensor t, const DebugContext &debugContext = {})

Print the name and contents of a Tensor.

Parameters
  • title: The name of the tensor

  • t: The Tensor to print.

  • debugContext: Optional DebugId and program name.

class Program
#include <Program.hpp>

This class represents a control program that executes operations on the graph.

The class should not be explicitly constructed but one of its sub-classes should be constructed instead.

Subclassed by poplar::program::Abort, poplar::program::AbortOnCondition, poplar::program::AssumeEqualAcrossReplicas, poplar::program::Call, poplar::program::Copy, poplar::program::CrossReplicaCopy, poplar::program::ErrorProgram, poplar::program::Execute, poplar::program::If, poplar::program::PrintTensor, poplar::program::Repeat, poplar::program::RepeatWhileFalse, poplar::program::RepeatWhileTrue, poplar::program::Sequence, poplar::program::Switch, poplar::program::Sync, poplar::program::WriteUndef

Public Functions

Program()
Program(const Program &p)
Program(Program &&p)
Program &operator=(const Program &p)
Program &operator=(Program &&p)
~Program()
bool isEmpty() const
core::ProgramImpl &getImpl() const

Protected Attributes

std::unique_ptr<core::ProgramImpl> impl
class Repeat : public poplar::program::Program
#include <Program.hpp>

A program that repeatedly executes for a fixed number of iterations.

Public Functions

Repeat(unsigned count, const Program &prog, const DebugContext &debugContext = {})

Construct a repeat program.

Parameters
  • count: The number of iterations to repeat for.

  • prog: The program to repeatedly execute.

  • debugContext: Optional DebugId and program name.

class RepeatWhileFalse : public poplar::program::Program
#include <Program.hpp>

A program that evaluates the condition program, and if the predicate tensor is true it exits the loop.

If the predicate tensor is false it evaluates the body program, and then loops back to re-evaluate the condition program. This is like a C while statement with an inverted condition.

Public Functions

RepeatWhileFalse(const Program &cond, Tensor predicate, const Program &body, const DebugContext &debugContext = {})

Construct a repeat while false program.

Parameters
  • cond: The program evaluated before the body is evaluated.

  • predicate: The scalar tensor that determines whether to execute the body.

  • body: The body to execute when the predicate is false.

  • debugContext: Optional DebugId and program name.

class RepeatWhileTrue : public poplar::program::Program
#include <Program.hpp>

A program that evaluates the condition program, and if the predicate tensor is false it exits the loop.

If the predicate tensor is true it evaluates the body program, and then loops back to re-evaluate the condition program. This is like a C while statement.

Public Functions

RepeatWhileTrue(const Program &cond, Tensor predicate, const Program &body, const DebugContext &debugContext = {})

Construct a repeat while true program.

Parameters
  • cond: The program evaluated before the body is evaluated.

  • predicate: The scalar tensor that determines whether to execute the body.

  • body: The body to execute when the predicate is true.

  • debugContext: Optional DebugId and program name.
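
For example, a minimal sketch of a device-side while loop. It assumes that `cond` updates the scalar `predicate` tensor each time it runs and that `body` is the work to repeat while the predicate remains true:

#include <poplar/Program.hpp>
#include <poplar/Tensor.hpp>

// while (predicate) { body; } where `cond` recomputes `predicate`
// before each test.
poplar::program::Program makeWhileLoop(const poplar::program::Program &cond,
                                       poplar::Tensor predicate,
                                       const poplar::program::Program &body) {
  return poplar::program::RepeatWhileTrue(cond, predicate, body);
}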

class Sequence : public poplar::program::Program
#include <Program.hpp>

Program that executes a sequence of programs.

Public Functions

template<class ...T>
Sequence(T&&... args)

Construct an execution sequence from a list of programs.

This variadic constructor is used to create a sequence of programs where the programs are provided as arguments to the constructor. For example:

Sequence(prog1, prog2, prog3)

Parameters
  • args: Parameter pack of all programs in the sequence.

Sequence(const DebugContext &debugContext = {})

Construct an empty execution sequence (with optional debug context).

Sequence(std::initializer_list<Program> programs, const DebugContext &debugContext = {})

Construct an execution sequence from a list of programs.

This constructor is used to create a sequence of programs where the programs are provided as an initializer list, optionally followed by a debug context. For example:

Sequence{prog1, prog2, prog3}
Sequence({prog1, prog2, prog3}, {debugId})
Sequence({prog1, prog2, prog3}, {debugId, "debugName"})

Parameters
  • programs: List of programs in the sequence.

  • debugContext: Optional DebugId and program name.

void add(const Program &p)

Add a program to the end of the sequence.

Parameters
  • p: The program to add.
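
For example, a minimal sketch that builds a sequence incrementally from a vector of existing programs:

#include <vector>
#include <poplar/Program.hpp>

// Append each step to an initially empty sequence.
poplar::program::Sequence buildPipeline(
    const std::vector<poplar::program::Program> &steps) {
  poplar::program::Sequence seq;
  for (const auto &step : steps)
    seq.add(step);
  return seq;
}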

Private Functions

template<class ...T>
void add_many(const Program &first, T&&... rest)
void add_many()
void init()
void init(const DebugContext &debugContext)
class Switch : public poplar::program::Program
#include <Program.hpp>

A program that runs one of many programs depending on the value of a tensor.

The controlling tensor must be a scalar of type INT or UNSIGNED_INT. A switch consists of a number of switch cases, each with a case value and a case body, plus a default case. The case values must be unique. If the value of the controlling tensor matches the case value of a case, the corresponding case body is run; otherwise the default case is run.

Public Functions

Switch(Tensor control, const std::vector<std::pair<std::int32_t, Program>> &cases, const DebugContext &debugContext = {})

Construct a switch with the specified set of cases and an empty default case.

Parameters
  • control: The controlling tensor.

  • cases: The cases of the switch.

  • debugContext: Optional DebugId and program name.

Switch(Tensor control, const std::vector<std::pair<std::int32_t, Program>> &cases, const Program &defaultCaseBody, const DebugContext &debugContext = {})

Construct a switch with the specified set of cases and default case.

Parameters
  • control: The controlling tensor.

  • cases: The cases of the switch.

  • defaultCaseBody: The body of the default case.

  • debugContext: Optional DebugId and program name.

Switch(Tensor control, const DebugContext &debugContext = {})

Construct a switch with no cases and an empty default case.

The add() method can be used to add cases after the switch is constructed.

Parameters
  • control: The controlling tensor.

  • debugContext: Optional DebugId and program name.

Switch(Tensor control, const Program &defaultCaseBody, const DebugContext &debugContext = {})

Construct a switch with no cases and the specified default case.

The add() method can be used to add cases after the switch is constructed.

Parameters
  • control: The controlling tensor.

  • defaultCaseBody: The body of the default case.

  • debugContext: Optional DebugId and program name.

Switch &add(std::int32_t value, const Program &body)

Add a case with the specified case value and body.

Return

A reference to the switch program.

Parameters
  • value: The case value.

  • body: The case body.
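
For example, a minimal sketch that dispatches on a scalar control tensor (assumed to be of type INT or UNSIGNED_INT) between two existing case bodies and a fallback:

#include <poplar/Program.hpp>
#include <poplar/Tensor.hpp>

// Run case0 when control == 0, case1 when control == 1, otherwise fallback.
poplar::program::Program makeDispatch(poplar::Tensor control,
                                      const poplar::program::Program &case0,
                                      const poplar::program::Program &case1,
                                      const poplar::program::Program &fallback) {
  poplar::program::Switch sw(control, fallback);
  sw.add(0, case0).add(1, case1); // add() returns the Switch, so calls chain
  return sw;
}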

Public Static Functions

Switch switchWithBoundsChecking(Tensor control, const std::vector<std::pair<std::int32_t, Program>> &cases, const DebugContext &debugContext = {})

A helper function that causes the default case to throw an error.

Switch switchWithUnreachableDefault(Tensor control, const DebugContext &debugContext = {})

This function lets the compiler assume the default case is unreachable.

If the control value is something other than one of the cases, it results in undefined behaviour (although there is some very minimal error checking at runtime).

Private Functions

Switch(Tensor control, const Program &defaultCaseBody, const bool unreachableDefault, const DebugContext &debugContext = {})
class Sync : public poplar::program::Program
#include <Program.hpp>

A program to synchronise at a certain granularity dictated by the SyncType.

Public Functions

Sync(SyncType type, const DebugContext &debugContext = {})

Parameters
  • type: The type of sync to perform.

  • debugContext: Optional DebugId and program name.

class WriteUndef : public poplar::program::Program
#include <Program.hpp>

A program to mark a tensor as containing an undefined value.

This can be used to improve the liveness analysis of tensors and save memory in some situations.

Poplar does liveness analysis using the standard algorithm except that Poplar’s variables are not scalar values; they are arrays. In the standard analysis a variable is “killed” when it is written to with a new value. This means that it is dead immediately before that point because its value there can never be read.

int a = 1;
// a is dead here because its current value (1) can never be read.
a = 2; // a is killed here, which makes it dead on the line above.

In Poplar a variable is killed when all of its elements are written in the same compute set. Consider the pseudo-code:

var = graph.addVariable(FLOAT, {2}, ...);

seq.add(Execute( var[0] = 1, var[1] = 2 ));
// var is dead here (it is killed on the line below) because none of its
// element values (1, 2) can ever be read.
seq.add(Execute( var[0] = 3, var[1] = 4 ));

If only some of the elements are written then the entire variable is still live before the write because we may still need the value of the elements that were not written to.

seq.add(Execute( var[0] = 1, var[1] = 2 ));
// var is alive here because the value 2 might be read later.
seq.add(Execute( var[0] = 3 ));

var is still alive because no compute set writes to every element. If the entire variable is overwritten but in separate compute sets, then it will still be considered to be live because Poplar does not track the liveness of each variable element - only the entire variable.

seq.add(Execute( var[0] = 1, var[1] = 2 ));
// var is alive here even though 1 and 2 can never be read.
seq.add(Execute( var[0] = 3 ));
seq.add(Execute( var[1] = 4 ));

This means var is alive more than necessary which may lead to increased memory use. One solution is for Poplar to track the liveness of every variable element separately, but that would be prohibitively expensive.

Instead, this program provides a way to manually mark a tensor as being dead by writing an undefined value to it. Changing the above code to the following results in the correct liveness.

seq.add(Execute( var[0] = 1, var[1] = 2 ));
// Manually kill var because we know - even if Poplar does not - that
// it is about to be completely overwritten.
seq.add(WriteUndef(var));
seq.add(Execute( var[0] = 3 ));
seq.add(Execute( var[1] = 4 ));

For more information about liveness analysis see https://en.wikipedia.org/wiki/Live_variable_analysis and https://www.cl.cam.ac.uk/teaching/2006/OptComp/slides/lecture03.pdf

Public Functions

WriteUndef(Tensor t, const DebugContext &debugContext = {})

Parameters
  • t: The tensor to mark as undefined.

  • debugContext: Optional DebugId and program name.
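
For example, a minimal sketch of the pattern above using real programs. It assumes `var` is a one-dimensional mapped variable and that `lo` and `hi` together cover every element of `var`, so the whole variable is about to be overwritten:

#include <poplar/Program.hpp>
#include <poplar/Tensor.hpp>

// Mark `var` undefined, then overwrite it completely in two separate copies.
void overwriteInTwoSteps(poplar::program::Sequence &seq, poplar::Tensor var,
                         poplar::Tensor lo, poplar::Tensor hi) {
  using namespace poplar::program;
  seq.add(WriteUndef(var));
  seq.add(Copy(lo, var.slice(0, lo.numElements())));
  seq.add(Copy(hi, var.slice(lo.numElements(), var.numElements())));
}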

3.5. Device management

3.5.1. poplar/TargetType.hpp

namespace poplar

Poplar classes and functions.

Enums

enum TargetType

Enum to represent the type of a device capable of running a graph.

Values:

enumerator IPU

Run on real IPU hardware.

enumerator IPU_MODEL

Model of the IPU which actually runs on the CPU but behaves like an IPU.

enumerator CPU

Run code on the CPU.

This does not accurately replicate all the functionality of an IPU and should only be used for running simple tests.

Functions

std::string toString(TargetType t)

Convert the target type to a string.

Throws an exception if an undefined type is passed, e.g. static_cast<TargetType>(100).

3.5.2. poplar/Target.hpp

namespace poplar

Poplar classes and functions.

Functions

void copyDeviceHalfToFloat(const Target &target, const void *src, float *dst, std::size_t numElements)

Convert device half-precision values to floats.

Parameters
  • target: Target that the half-precision data is to be copied from.

  • src: Pointer to the start of the half-precision data.

  • dst: Pointer to the float data to write.

  • numElements: Number of items to convert.

void copyFloatToDeviceHalf(const Target &target, const float *src, void *dst, std::size_t numElements)

Convert float values to device half-precision values.

Parameters
  • target: Target that the half-precision data is to be copied to.

  • src: Pointer to the float data to read.

  • dst: Pointer to the half-precision data to write.

  • numElements: Number of items to convert.

void copyDeviceHalfToDouble(const Target &target, const void *src, double *dst, std::size_t numElements)

Convert device half-precision values to doubles.

Parameters
  • target: Target that the half-precision data is to be copied from.

  • src: Pointer to the start of the half-precision data.

  • dst: Pointer to the double precision data to write.

  • numElements: Number of items to convert.

void copyDoubleToDeviceHalf(const Target &target, const double *src, void *dst, std::size_t numElements)

Convert double precision values to device half-precision values.

Parameters
  • target: Target that the half-precision data is to be copied to.

  • src: Pointer to the double precision data to read.

  • dst: Pointer to the half-precision data to write.

  • numElements: Number of items to convert.
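
For example, a minimal sketch that round-trips a float buffer through the device half representation, keeping the half data in a raw byte buffer sized with Target::getTypeSize():

#include <vector>
#include <poplar/Target.hpp>
#include <poplar/Type.hpp>

// Convert floats to device halves and back again.
std::vector<float> roundTripThroughHalf(const poplar::Target &target,
                                        const std::vector<float> &in) {
  std::vector<char> halves(in.size() * target.getTypeSize(poplar::HALF));
  poplar::copyFloatToDeviceHalf(target, in.data(), halves.data(), in.size());

  std::vector<float> out(in.size());
  poplar::copyDeviceHalfToFloat(target, halves.data(), out.data(), out.size());
  return out;
}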

class Target
#include <Target.hpp>

A target representation.

The Target class holds characteristics of a compilation target and enables interaction with it.

Target creation options

  • ipuLinkConfiguration (Default, BarleyTwist, SlidingWindow, None) [=None]

    The configuration used for the IPU to IPU connections (known as the Newmanry network). ‘None’ means that Poplar decides, based on the number of IPUs.

    Note that ‘Default’ is not the default!

  • syncConfiguration (intraReplicaAndAll, ipuAndAll) [=intraReplicaAndAll]

    The configuration of the hardware synchronisation groups. Note the ‘target.syncReplicasIndependently’ engine option determines which of the synchronisation groups is used for host synchronisation.

    • intraReplicaAndAll: The first sync group is used to sync IPUs within a replica and the second sync group is used to sync all IPUs.

    • ipuAndAll: The first sync group is used to sync each IPU independently with the host (if the target.syncReplicasIndependently option is set) and the second sync group is used to sync all IPUs.

  • ipuLinkTopology (mesh, torus) [=mesh]

    The topology of the IPU links. It describes how the IPUs in the system are connected.

    • mesh: The IPUs are connected as a ladder.

    • torus: The IPUs are connected as a ladder, with the top and bottom of the ladder linked together.

  • IpuLinkDomainSize Integer [=NIPUS]

    The number of IPUs connected via IPU links. Two IPU link domains can be connected together via gateway links.

Public Functions

Target()
~Target()
Target(const Target&)
Target(Target&&)
Target &operator=(const Target&)
Target &operator=(Target&&)
bool operator==(const Target&) const
bool operator!=(const Target&) const
bool operator<(const Target&) const
TargetType getTargetType() const

The target type.

StringRef getTargetArchString() const

The target architecture.

const core::TargetOptions &getTargetOptions() const
unsigned getNumIPUs() const

The number of IPUs.

unsigned getTilesPerIPU() const

The number of tiles per IPU.

unsigned getNumWorkerContexts() const

The number of worker contexts per tile.

unsigned getBytesPerTile() const

Bytes of memory per tile.

unsigned getExchangeBytesPerCycle() const

The bandwidth of internal IPU exchange in bytes per cycle.

unsigned getMemcpyBytesPerCycle() const

The maximum bandwidth for internal data copies on a tile.

unsigned getMinIPUSyncDelay() const

The IPU sync delay for the tile that is closest to the sync controller.

unsigned getGlobalSyncCycles() const

The number of clock cycles required to synchronize all IPUs.

const std::vector<unsigned> &getMemoryElementOffsets() const

Memory element offsets.

unsigned getInterleavedMemoryElementIndex() const

Memory element offset index for interleaved memory.

const std::vector<GlobalExchangeConstraint> &getGlobalExchangeConstraints() const

Set of constraints that provide a lower bound on the time it takes to send data between IPUs.

unsigned getNumStrideBits() const
unsigned getDataPathWidth() const

The width of the load/store data path within the tile.

unsigned getFp16ConvUnitMaxPipelineDepth() const

The maximum pipeline depth of the convolution units within the tile for fp16.

unsigned getFp32ConvUnitMaxPipelineDepth() const

The maximum pipeline depth of the convolution units within the tile for fp32.

unsigned getFp16ConvUnitInputLoadElemsPerCycle() const

The number of input elements loaded per cycle in f16 convolution unit.

unsigned getFp32ConvUnitInputLoadElemsPerCycle() const

The number of input elements loaded per cycle in f32 convolution unit.

unsigned getFp16InFp16OutConvUnitsPerTile() const

The number of convolution units in the tile that can be used when partial results are output as 16 bits and inputs are 16 bits.

unsigned getFp16InFp32OutConvUnitsPerTile() const

The number of convolution units in the tile that can be used when partial results are output as 32 bits and inputs are 16 bits.

unsigned getFp32InFp32OutConvUnitsPerTile() const

The number of convolution units in the tile that can be used when accumulating to 32 bit values.

unsigned getConvUnitCoeffLoadBytesPerCycle() const

The number of convolutional weights that can be loaded in a cycle.

unsigned getRptCountMax() const
bool supportsExchangeBusSharing() const

Whether tiles can share the local exchange bus during exchange.

unsigned getTilesPerSharedExchangeBus() const

The number of consecutive tiles that can share the exchange bus.

unsigned getNumTiles() const

Get the total number of tiles for this target (tiles per IPU * number of IPUs).

std::uint64_t getMemoryBytes() const

Get the total amount of memory on this target, across all IPUs.

unsigned getFloatVectorWidth() const

How many floats can be processed in one vector operation.

Equivalent to getDataPathWidth() / 32.

unsigned getHalfVectorWidth() const

How many halves can be processed in one vector operation.

Equivalent to getDataPathWidth() / 16.

unsigned getVectorWidth(const poplar::Type &type) const

How many of the given type can be processed in one vector operation.

unsigned getWeightsPerConvUnit(bool floatActivations) const
unsigned getConvUnitInputLoadElemsPerCycle(bool floatActivations) const
unsigned getMaxIPUSyncDelay() const

Get the maximum number of cycles required for an IPU sync in the best case scenario (all tiles are immediately ready).

double getTileClockFrequency() const

Get the tile clock frequency in Hertz.

unsigned getNumTilesPerXBContext() const

Get the number of tiles per exchange-block context (with repair).

unsigned getNumContextsPerXB() const

Get the number of contexts per exchange-block.

std::size_t getTypeSize(const Type&) const

Get the size of a given type in bytes.

std::size_t getAtomicStoreGranularity() const

Get the granularity of atomic stores that can be made by independent parallel worker threads.

Return

The granularity in bytes.

uint32_t makeFpIctlValue(bool inv, bool div0, bool oflo, bool esr, bool nanoo) const

Generate a value that could be written to Floating Point Initial Control Value register CSR_S.FP_ICTL in order to configure it with the specified options.

Parameters
  • inv: If true, a floating-point invalid operation (defined by IEEE 754) will cause an exception.

    The invalid operations are:

    • Addition or subtraction where the operands are + or - infinity (inf) and the operation results in the subtraction of two infs; for example: (-inf)+(+inf) or (+inf)-(+inf).

    • Divisions: (+/-0)/(+/-0) and (+/-inf)/(+/-inf).

    • Multiplications: (+/-0)*(+/-inf) and (+/-inf)*(+/-0).

    • Remainder: x REM y where y=0 or x=(+/-inf)

    • Real operations with complex results such as the square root or logarithm of a negative number.

    • Operations with Not-a-Number as at least one operand.

    • Comparisons where one of the operands is Not-a-Number.

      See also nanoo below.

  • div0: If true, a floating-point divide-by-zero operation will cause an exception

  • oflo: If true a floating point overflow will cause an exception

  • esr: Enable stochastic rounding

  • nanoo: Enable Not-a-Number on overflow mode. When enabled half precision calculations that have overflowed will produce a Not-a-Number result, rather than saturating to the half precision max/min value, and the invalid operation (inv) flag will be set

unsigned getFpIctlRegIndex() const

Return the register index of the Floating Point Initial Control Value register CSR_S.FP_ICTL.

unsigned getDbgDataRegIndex() const

Return the register index of CSR_C.DBG_DATA.

IpuLinkConfiguration getIpuLinkConfiguration() const

Return the ipu link configuration of this target.

IpuLinkTopology getIpuLinkTopology() const

Return the IPU link topology.

unsigned getIpuLinkDomainSize() const

Return the size of the IPU link domain.

That is the number of IPUs that are connected via IPU links.

Target createVirtualTarget(unsigned numIPUs, unsigned tilesPerIPU) const

Create a “virtual” target consisting of a subset of the target’s tiles.

This method returns a target object that references the same state as this target but only uses a subset of the target’s tiles.

Return

The virtual target object.

Parameters
  • numIPUs: The number of IPUs the target should be for.

  • tilesPerIPU: The number of tiles per IPU.

Target(std::unique_ptr<core::Target>)
core::Target &getImpl() const

Public Static Functions

Target createCPUTarget(bool accurateHalf = false)

Create a CPU target.

Create a target for executing a simple graph on the CPU. This target will have 1 IPU with 1 tile and 1 worker thread.

This should only be used for simple functional testing.

Return

A Target object that can be used to create a graph.

Target createIPUTarget(unsigned numIPUs, StringRef systemType, const OptionFlags &opts = {})

Create an IPU target.

Create an IPU target with a specified number of IPUs based on the given system type.

Return

A Target object that can be used to create a graph.

Parameters
  • numIPUs: The number of IPUs the target should be for.

  • systemType: The ID of the system. Possible options: "ipu1"

  • opts: The options passed to the target.

Target createIPUTarget(unsigned numIPUs, unsigned tilesPerIPU, StringRef systemType, const OptionFlags &opts = {})

Create an IPU target with a virtual number of tiles.

Create an IPU target with a specified number of IPUs based on the given system type. In addition, the number of tiles can be restricted to a smaller virtual number of observable tiles.

Return

A Target object that can be used to create a graph.

Parameters
  • numIPUs: The number of IPUs the target should be for.

  • tilesPerIPU: The number of tiles per IPU.

  • systemType: The ID of the system. Possible options: "ipu1"

  • opts: The options passed to the target.

Target createIPUTarget(unsigned numIPUs, StringRef systemType, const core::TargetOptions &opts)

Create an IPU target.

Create an IPU target with a specified number of IPUs based on the given system type.

Return

A Target object that can be used to create a graph.

Parameters
  • numIPUs: The number of IPUs the target should be for.

  • systemType: The ID of the system. Possible options: "ipu1"

  • opts: The options passed to the target.

Target createIPUTarget(unsigned numIPUs, unsigned tilesPerIPU, StringRef systemType, const core::TargetOptions &opts)

Create an IPU target with a virtual number of tiles, and target options.

Create an IPU target with a specified number of IPUs based on the given system type. In addition, the number of tiles can be restricted to a smaller virtual number of observable tiles. This overload also accepts target options that can be obtained from another target.

Return

A Target object that can be used to create a graph.

Parameters
  • numIPUs: The number of IPUs the target should be for.

  • tilesPerIPU: The number of tiles per IPU.

  • systemType: The ID of the system. Possible options: "ipu1"

  • opts: The options passed to the target.
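
For example, a minimal sketch that creates a 1-IPU compilation target and queries a few of its characteristics. Creating a Target does not require hardware; attaching to a physical device is done separately through the DeviceManager:

#include <iostream>
#include <poplar/Target.hpp>

// Describe a single-IPU target built from the "ipu1" system type.
void describeTarget() {
  poplar::Target target = poplar::Target::createIPUTarget(1, "ipu1");
  std::cout << "IPUs:             " << target.getNumIPUs() << "\n"
            << "Tiles per IPU:    " << target.getTilesPerIPU() << "\n"
            << "Workers per tile: " << target.getNumWorkerContexts() << "\n"
            << "Bytes per tile:   " << target.getBytesPerTile() << "\n";
}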

Private Members

std::unique_ptr<core::Target> impl
namespace core

3.5.3. poplar/Device.hpp

namespace poplar

Poplar classes and functions.

class Device
#include <Device.hpp>

A device refers to a physical entity that can execute code.

Devices should be obtained from a poplar::DeviceManager object or from an appropriate factory function, poplar::Device::createXXXDevice(). Devices cannot be copied but can be moved.

Public Functions

Device()
Device(Device&&)
~Device()
Device &operator=(Device&&)
Device(const Device&) = delete
Device &operator=(const Device&) = delete
unsigned getId() const

Get the numerical ID of this device as known by the DeviceManager.

const Target &getTarget() const

Get the target description of the device.

bool attach() const

Try to acquire this device and lock it to the current process.

void detach() const

Release this device to other processes.

void temporarilyDetach() const

Temporarily detach from this device so it can be reattached to without reloading the executable.

The behaviour is undefined if another process acquires and uses the device in between. Use of this method is strongly discouraged and it will be removed in future.

void getDriverVersion(unsigned &major, unsigned &minor, unsigned &point) const

Retrieve driver version of the attached device.

Throws if the device is not attached or is not an IPU device.

bool supportsRemoteBuffers() const

Retrieve remote buffers availability from the attached device.

Throws if the device is not attached or is not an IPU device.

bool supportsGraphStreaming() const

Retrieve remote buffers availability from the attached device.

Throws if the device is not attached or is not an IPU device. Deprecated: use supportsRemoteBuffers() instead.

std::map<std::string, std::string> getAttributes() const
std::vector<int> getNumaTopology() const

Get the NUMA node of each IPU that makes up this device.

std::vector<int> getNumaNodesUsedForIPUs() const

Get the NUMA nodes that Poplar will use to execute code that communicates with each IPU that makes up this device.

If Poplar can’t execute code on the NUMA node for an IPU then this function returns -1 for that IPU. Poplar will interpret the -1 as disabling NUMA node pinning for that IPU.

Note that this function is not necessarily the same as getNumaTopology(), as it also handles NUMA node restrictions imposed by the Poplar process’ CPU affinity. For example on a machine with two NUMA nodes, with ids of 0 and 1, each connected to one CPU and one IPU then a Poplar process that is bound to CPU 1 will use CPU 1 to execute stream callbacks for IPUs on both NUMA node 0 and 1, so this function would return [-1, 1] whereas the getNumaTopology() would return [0, 1].

Note that if the look-up of available host NUMA nodes fails then this function will return a vector of -1s, with one element for each IPU.

std::vector<unsigned> getDriverIDs() const

Get the list of driver device IDs that make up this device.

void reset() const

Reset the device’s state.

Device createVirtualDevice(unsigned tilesPerIPU)

Create a virtual device with a restricted number of tiles per IPU.

This method provides a smaller “virtual” device whose target only shows a subset of the tiles on the underlying device.

The calling object becomes a null device (the underlying device is moved into the returned Device object).

Device(std::unique_ptr<core::Device>)
core::Device &getImpl() const

Public Static Functions

Device createCPUDevice()

Create a device that executes vertex code on the host CPU.

This is only suitable for running small amounts of code; for example, for functional testing. It may not reproduce exactly the same functionality as running on an IPU. Also, functions such as Engine::getTileClockFrequency() may not return meaningful results.

Device createSimulatorDevice(const Target &target, const OptionFlags &options = {})

Private Members

Target target
std::unique_ptr<core::Device> impl
namespace core

3.5.4. poplar/DeviceManager.hpp

namespace poplar

Poplar classes and functions.

class DeviceManager
#include <DeviceManager.hpp>

A DeviceManager is able to enumerate and return groups of physical IPUs connected to an entity/host.

It returns such a group of IPUs as a single poplar::Device with a unique device manager id.

The physical devices within any returned Device may overlap with other Devices returned.

Any poplar::Device(s) returned cannot be copied but can be moved for further use.

It is thread safe to both construct multiple DeviceManagers in different threads and use them at the same time (although both threads might return the same device, and therefore only one will succeed in attaching to it). It is also thread safe to use the same DeviceManager in different threads.

Public Functions

DeviceManager()
DeviceManager(const DeviceManager&)
DeviceManager(DeviceManager&&)
~DeviceManager()
std::size_t getNumDevices() const

Get the number of devices attached to this host.

std::vector<Device> getDevices(const OptionFlags &opts = {}) const

Get the list of all devices.

std::vector<Device> getDevices(TargetType type, unsigned requiredNumIPUs, const OptionFlags &opts = {}) const

Get the list of all devices fulfilling the specified criteria.

Depending on the criteria, the list may be empty - for example, if the requiredNumIPUs cannot be satisfied by any available device configurations. To view available device configurations, see the gc-info command line tool.

Return

A potentially empty list of matching devices

Parameters
  • type: The desired target type (IPU, IPU_MODEL, CPU)

  • requiredNumIPUs: Number of IPUs required

  • opts: The arguments passed to the target (optional)
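
For example, a minimal sketch of the usual acquisition loop: enumerate the devices matching the criteria and attach to the first one that is free:

#include <stdexcept>
#include <utility>
#include <poplar/DeviceManager.hpp>

// Return the first free device with the requested number of IPUs.
poplar::Device acquireIpuDevice(unsigned numIpus) {
  auto manager = poplar::DeviceManager::createDeviceManager();
  for (auto &device : manager.getDevices(poplar::TargetType::IPU, numIpus)) {
    if (device.attach())
      return std::move(device);
  }
  throw std::runtime_error("No free IPU device found");
}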

Device getDevice(unsigned deviceManagerId, const OptionFlags &opts = {}) const

Get a specific device by its device manager id.

Return

A matching device

Parameters
  • deviceManagerId: The ID of the requested device. The ID is that returned by the gc-info command. This can specify a single device or a group of devices.

  • opts: The arguments passed to the target (optional)

std::vector<unsigned> getChildDeviceIds(unsigned parentId, unsigned numChildDeviceIpus = 1) const

Get the deviceIds of the child devices of a multi-IPU device.

A multi-IPU device will fully overlap “child” devices that are made out of the same IPUs. This method returns the set of child devices.

Parameters
  • parentId: The device ID of the parent device

  • numChildDeviceIpus: The number of IPUs the child devices must contain to be considered a child.

Public Static Functions

DeviceManager createDeviceManager()

Create a device manager for the current host.

Private Members

std::unique_ptr<core::DeviceManagerImpl> impl
namespace core

3.5.5. poplar/IpuLinkConfiguration.hpp

namespace poplar

Poplar classes and functions.

Enums

enum IpuLinkConfiguration

Enum to represent the IPU interconnect layout.

Values:

enumerator Default
enumerator SlidingWindow
enumerator BarleyTwist
enumerator None

Functions

std::ostream &operator<<(std::ostream &os, IpuLinkConfiguration ic)

3.5.6. poplar/IpuLinkTopology.hpp

namespace poplar

Poplar classes and functions.

Enums

enum IpuLinkTopology

Enum to represent the IPU interconnect layout.

Values:

enumerator Mesh
enumerator Torus

Functions

std::ostream &operator<<(std::ostream &os, IpuLinkTopology topo)

3.6. Graph execution

3.6.1. poplar/Engine.hpp

namespace poplar

Poplar classes and functions.

Typedefs

using ProgressFunc = std::function<void(int, int)>

Functions

Executable compileGraph(const Graph &graph, ArrayRef<program::Program> progs, const OptionFlags &opt = {}, ProgressFunc progressCallBack = ProgressFunc(), const DebugContext &debugContext = {})

Compile the given graph and programs to make an executable that can be executed using a poplar::Engine.

Parameters
  • graph: The graph to compile.

  • progs: The list of programs to run over the graph. Each program can be run separately by calling the run() method of the Engine with the argument being the index of the program to run in this list.

  • opt: Options that can be used to control compilation and execution. The available options are listed under Engine.

  • progressCallBack: A function that will be called to indicate engine compilation progress. See Engine::ProgressFunc for more information.

  • debugContext: Optional DebugId and debug name.

Exceptions
  • invalid_option: If any of the options passed in opt were not recognised or improperly formatted.

  • link_error: If program linking fails; for example, due to undefined symbols or lack of memory on a tile.
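
For example, a minimal sketch that compiles a graph and its programs up front and then constructs an Engine from the resulting executable. The graph and program list are assumed to have been built already:

#include <utility>
#include <poplar/Engine.hpp>
#include <poplar/Executable.hpp>
#include <poplar/Graph.hpp>

// Compile once, then hand the executable to an Engine.
poplar::Engine makeEngine(const poplar::Graph &graph,
                          poplar::ArrayRef<poplar::program::Program> progs) {
  poplar::Executable exe = poplar::compileGraph(graph, progs);
  return poplar::Engine(std::move(exe));
}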

Executable compileGraph(Graph &&graph, ArrayRef<program::Program> progs, const OptionFlags &opt = {}, ProgressFunc progressCallBack = ProgressFunc(), const DebugContext &debugContext = {})

Deprecated:

This moving compileGraph interface is deprecated.

Use the non-moving const ref version instead.

class Engine
#include <Engine.hpp>

A graph compute engine.

The Engine class provides the ability to execute a graph program.

Engine creation options

Options can be overridden with the environment variable POPLAR_ENGINE_OPTIONS. For example:

POPLAR_ENGINE_OPTIONS='{"target.deterministicWorkers":"true"}'

Engine creation options: Debug

  • debug.allowOutOfMemory (true, false) [=false]

    If true, allow out-of-memory while compiling and linking.

  • debug.computeInstrumentationLevel (vertex, tile, device, ipu) [=tile]

    The granularity of compute instrumentation. This option has no effect unless debug.instrumentCompute is true.

    • vertex: Store the last cycle count of each vertex on every tile

    • tile: Store the last cycle count of each compute set on every tile

    • device: Store the last cycle count of each compute set on one tile. This saves memory compared to tile (since the cycle counts are always live and this needs to store them on only one tile), but it loses all per-tile cycle information. It works by adding a sync after each compute set and timing how long it takes to get to that sync, so effectively it measures the cycle time of the longest-running tile in the compute set.

    • ipu: Similar to “device”, but instead of storing the cycle counts on a single tile across all IPUs, it stores them on one tile per IPU which avoids the need for global syncs.

  • debug.retainDebugInformation (true, false) [=true]

    Retain compilation information to help with debugging. Must be true if profiling is enabled. Must also be true if the deprecated Engine connectStream / connectStreamToCallback / copy{To,From}RemoteBuffer methods are used.

  • debug.cpuMultiThreadExecution (true, false) [=true]

    If true, operations are executed using multiple host threads for a CPU or IPU Model target. Setting this to false may simplify debugging at the cost of reduced performance.

  • debug.instrument (true, false) [=false]

    If true, enable all instrument options (below). This will instruct the engine to add cycle counters to the compiled program to enable the execution profile to be retrieved after the program is run. This is only available for an IPU target (not an IPU Model target). Note that the more specific instrumentation options may override the default. For example,

    {"debug.instrument":"true",
     "debug.instrumentExternalExchange":"false"}
    

    will instrument everything apart from external exchange.

  • debug.instrumentCompute (true, false) [=false]

    If true, enable instrumentation of compute sets. See debug.instrument.

  • debug.instrumentExternalExchange (true, false) [=false]

    If true, enable instrumentation of external exchanges. See debug.instrument.

  • debug.instrumentControlFlow (true, false) [=false]

    If true, enable instrumentation of loops and conditionals. See debug.instrument.

  • debug.outputAllSymbols (true, false) [=false]

    If true, output additional symbols to the ELF files that are not required but aid debugging.

  • debug.exceptOnSOCError (true, false) [=false]

    If true, throw an exception on a SoC error. If false the error will be reported in the log instead.

  • debug.checkForSOCErrorAtRun (true, false) [=false]

    If true, check for SoC errors before and after program execution.

  • debug.profilingTile Integer [=Tiles per IPU - 1]

    The tile on which to store the cycle counter for every compute set. This has no effect unless debug.computeInstrumentationLevel is set to device.

  • debug.branchRecordTile Integer [=NTILES-1]

    The tile on which to store the branch record. This has no effect unless the debug.instrumentControlFlow flag is set. On a CPU target this option has no effect; on the IPU Model it only affects the memory profile.

  • debug.runtimeVerify (true, false) [=false]

    If true, expensive verification steps are enabled at runtime.

  • debug.trace (true, false) [=false]

    If true, a trace is printed to the error stream with the state of every edge before and after the execution of a compute set or exchange.

  • debug.traceFile String

    Only used if debug.trace is true. If set, the debug trace is output to the specified file instead of the error stream.

  • debug.verify (true, false) [=false]

    If true, expensive verification steps are enabled at compile time. The checks mostly focus on exchange code, including the following:

    • ensuring variables have been set,

    • ensuring section/instruction alignment is correct,

    • and ensuring the total number of bytes received is as expected.

    In addition after laying out memory we verify the memory constraints on variables are satisfied.

  • debug.supervisorStackSizeInBytes Integer

    If set, the automatically computed stack size for supervisor threads will be overridden with the specified value (in bytes) for all tiles.

  • debug.workerStackSizeInBytes Integer

    If set, the automatically computed stack size for worker threads will be overridden with the specified value (in bytes) for all tiles.

Engine creation options: Optimisations

  • opt.maxCompilationThreads Integer [=0]

    The maximum number of threads to use during compilation. A value of 0 means the hardware will be fully utilised.

  • opt.maxLinkerThreads Integer [=0]

    The maximum number of threads to use during linking. A value of 0 means the same number of threads will be used as were used for compilation.

  • opt.enableSwSyncs (true, false) [=false]

    If true, use a software synchronisation scheme to synchronise with the host following a stream copy. The software based synchronisation scheme lets IPUs start executing the next step as soon as they have finished sending and receiving all their data, without having to wait for every IPU to reach the end of the stream copy.

  • opt.internalExchangeOptimisationTarget (balanced, cycles) [=cycles]

    What balance of heuristics to use when generating exchange code. cycles will focus completely on speed whereas balanced will sacrifice some speed to attempt to reduce the amount of always live memory produced.

  • opt.limitVertexStateToLower256K (true, false) [=false]

    Enable this option to optimise the control code by allocating all of the vertex state in the first 256KB of memory. The disadvantage is that the code must live in the same range of memory, so if the combined size of the vertex state and code is larger than 256KB then the program will fail to compile.

Engine creation options: Profiler

The Profiler options control how Poplar generates the reports that can be viewed in the PopVision Graph Analyser (e.g. graph and execution profiles)

  • profiler.format (“v1”, “experimental”, “v3”) [=”v1”]

    This option sets the version of the profiler format. Note that the “experimental” version may break tools that expect the “v1” format; it is also not backward compatible and is subject to change. The “v3” format reduces the memory footprint of the profiler.

  • profiler.replicaToProfile Integer [=All replicas]

    Specifies which replica (0-based index) will be profiled. Note that a high-level summary of several metrics and timings will still be provided for the whole execution.

Engine creation options: Target

  • target.deterministicWorkers (true, false, portable) [=true]

    Ensure that the mapping of vertices to worker threads is the same for repeated execution either on the same IPU (true), or on every IPU (portable). This guarantee does not hold following breakpoints or exceptions.

  • target.saveArchive String

    If set, the binary archive will be saved to the specified filename during graph compilation. This archive contains the Elf files for each tile. No archive will be saved unless this option is set.

  • target.syncMethod (polling, hybrid, default) [=default]

    Controls how the host determines when an IPU wants to sync

    • polling: Using polling to determine when an IPU wants to sync.

    • hybrid: Use a mixture of interrupts and polling to determine an IPU wants to sync.

    • default: Choose a sensible default method based on the device type. Currently we default to polling for all device types but this may change in future.

  • target.syncPollPeriodUs Integer [=0]

    The period to use when polling for a host sync, in microseconds.

  • target.hostSyncTimeout Integer [=300]

    The amount of time to wait for a response from the IPU after running a program, in seconds. “0” means no timeout.

  • target.gatewayMode (true, false) [=false]

    Enable GWMODE (Gateway Mode) in the PCI Complex

  • target.gatewayWriteCombining (true, false) [=false]

    Optimise the write-to-host code to use IPU-Machine gateway write combining.

  • target.maxStreamCallbackThreadsPerNumaNode Integer [=0]

    The maximum number of threads per NUMA node to use to execute stream callbacks. A value of 0 means the main thread will execute all of the callbacks, which is the default because a non-zero number of threads requires thread-safe callbacks.

    A value of “auto” means the hardware will be fully utilised; this typically means that up to one thread per CPU core is used.

    Note that this is the maximum number of threads in addition to the main thread. For example, on a system with two NUMA nodes, setting this option to 1 means that a total of three threads could execute callbacks: one thread pinned to each NUMA node, plus the main thread operating on one of the two nodes as well (assuming the main thread is free to execute callbacks).

Engine creation options: Report generation

The report generation options will automatically output the Poplar reports that can be viewed in the PopVision Graph Analyser.

These options provide a basic ability to capture the reports. For more complex use cases the reports should be generated programmatically via functions in the framework (TensorFlow, PopART or Poplar) in which the application is written.

  • autoReport.all (true, false) [=false]

    Output all the available reports described below.

    You can exclude individual reports by combining options. For example, this will generate all reports apart from the serialized graph:

    {"autoReport.all":"true",
     "autoReport.outputSerializedGraph":"false"}
    

  • autoReport.outputGraphProfile (true, false) [=false]

    Output the graph profile report: graph.cbor (V1) or profile.pop (V3). This is the same as the output of Engine::getGraphProfile.

  • autoReport.outputLoweredVars (true, false) [=false]

    Output the lowered variables file report: vars.capnp. This is equivalent to using the debug.loweredVarDumpFile option with the filename set to vars.capnp.

  • autoReport.outputArchive (true, false) [=false]

    Output the archive report: archive.a. This is equivalent to using the target.saveArchive option with the filename set to archive.a.

  • autoReport.outputSerializedGraph (true, false) [=false]

    Output the serialized graph: serialized_graph.capnp.

  • autoReport.outputExecutionProfile (true, false) [=false]

    Output the execution profile report: execution.cbor (V1) or profile.pop (V3). This is the same as the output of Engine::getExecutionProfile.

    By default this setting will also set debug.instrument to true. If you do not want instrumentation enabled you can set autoReport.outputExecutionProfile or debug.instrument to false.

  • autoReport.streamAtEachRun (true, false) [=true]

    Applies to profiler format V3 or higher. Enable or disable streaming of the execution profile to disk at each run. If false, the whole execution will be written to disk on Engine destruction (note that some frameworks, such as TensorFlow, may not properly destroy the Engine).

  • autoReport.outputDebugInfo (true, false) [=false]

    Output debug info: debug.json. This file gathers the data in every DebugInfo object created. Elements in the graph report with debugIds can be related to these DebugInfo objects.

  • autoReport.executionProfileProgramRunCount Integer [=2]

    Specify how many runs of each program to capture in the execution profile.

  • autoReport.directory String [=./]

    Specify which directory you want the reports to be written to. By default they will be written to the current working directory.

Engine creation options: Other

Public Types

using ProgressFunc = std::function<void(int, int)>

Callback function used to indicate engine compilation progress.

The function is passed two integers. The first is the progress value and the second is the maximum value for the progress.

If a progress callback is used, the function should not block. All calls to the callback function will be made in a single dedicated thread so blocking in the callback will block the receipt of further notifications (but will not block compilation from progressing). The callback should not use Poplar objects or functions relating to the Graph, Engine or Device that are being compiled.

Public Functions

Engine(const Graph &graph, ArrayRef<program::Program> progs, const OptionFlags &opt = {}, ProgressFunc progressCallBack = ProgressFunc(), const DebugContext &debugContext = {})

Construct the engine from a graph and a list of programs.

Parameters
  • graph: The graph to compile into the engine.

  • progs: The list of programs to run over the graph. Each program can be run separately by calling the run() method of the Engine with the argument being the index of the program to run in this list.

  • opt: Options that can be used to control compilation and execution. The available options are listed under Engine.

  • progressCallBack: A function that will be called to indicate engine compilation progress. See Engine::ProgressFunc for more information.

  • debugContext: Optional DebugId and debug name.

Exceptions
  • invalid_option: If any of the options passed in opt were not recognised or improperly formatted.

  • link_error: If program linking fails; for example, due to undefined symbols or lack of memory on a tile.
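
For example, a minimal sketch that constructs an engine with compute instrumentation enabled so that an execution profile can be retrieved later. The graph and program list are assumed to have been built already:

#include <poplar/Engine.hpp>
#include <poplar/Graph.hpp>
#include <poplar/OptionFlags.hpp>

// Create an engine with instrumentation turned on.
poplar::Engine makeInstrumentedEngine(
    const poplar::Graph &graph,
    poplar::ArrayRef<poplar::program::Program> progs) {
  poplar::OptionFlags opts{{"debug.instrument", "true"}};
  return poplar::Engine(graph, progs, opts);
}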

Engine(Graph &&graph, ArrayRef<program::Program> progs, const OptionFlags &opt = {}, ProgressFunc progressCallBack = ProgressFunc(), const DebugContext &debugContext = {})

Deprecated:

This moving Engine constructor is deprecated.

Use the non-moving const ref version instead.

Engine(const Graph &graph, program::Program prog, const OptionFlags &opt = {}, ProgressFunc progressCallBack = ProgressFunc(), const DebugContext &debugContext = {})

Construct the engine from a graph and a program.

Parameters
  • graph: The graph to compile into the engine.

  • prog: The program to run over the graph. This program is run when the run() method is called on the Engine.

  • opt: Options that can be used to control compilation and execution. The available options are listed under Engine.

  • progressCallBack: A function that will be called to indicate engine compilation progress. See Engine::ProgressFunc for more information.

  • debugContext: Optional DebugId and debug name.

Exceptions
  • invalid_option: If any of the options passed in opt were not recognised or improperly formatted.

  • link_error: If the program linking fails; for example, due to undefined symbols or lack of memory on a tile.

Engine(Graph &&graph, program::Program prog, const OptionFlags &opt = {}, ProgressFunc progressCallBack = ProgressFunc(), const DebugContext &debugContext = {})

Deprecated:

This moving Engine constructor is deprecated.

Use the non-moving const ref version instead.

Engine(Executable &&exe, const OptionFlags &opt = {}, const DebugContext &debugContext = {})

Construct the engine from a precompiled executable.

Parameters
  • exe: The precompiled executable. This can be created using poplar::compileGraph().

  • opt: Options that can be used to control execution. These must be the same as the flags passed to compileGraph(). The available options are listed under Engine.

  • debugContext: Optional DebugId and debug name.

Exceptions
  • invalid_option: If any of the options passed in opt were not recognised or improperly formatted.

Engine(Engine&&)
~Engine()
void prepare(const Device &device)

Prepare the device for loading.

This configures the device ready for loading binary code.

Parameters
  • device: The device to load onto.

void deploy()

Load the engine.

This loads the binary code. The device must have previously been prepared with prepare().

void load(const Device &device)

Load the compiled program/graph onto a device.

This function will load all binary code and data onto the device ready for execution. This is a shortcut for separate prepare() and deploy() calls.

Parameters
  • device: The device to load onto.

void run(unsigned prog = 0, const std::string &debugName = "")

Run the graph program.

This function will execute the graph program. Note that the program needs to have already been loaded onto a device, otherwise an exception will be thrown.

Parameters
  • prog: The index of the program to run. If this is greater than or equal to the number of programs given in the constructor then an exception is thrown.

  • debugName: Run name (for debugging/analysis).

void loadAndRun(const Device &device, unsigned prog = 0)

Run the graph program.

This function will load the program/graph onto the device and then execute the graph program.

Parameters
  • prog: The index of the program to run. If this is greater than or equal to the number of programs given in the constructor then an exception is thrown.
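
For example, a minimal sketch that loads a compiled engine onto an attached device and runs the program at index 0. The engine and device are assumed to have been created and attached elsewhere:

#include <poplar/Device.hpp>
#include <poplar/Engine.hpp>

// Load the binaries and run the first program.
void loadAndRunFirstProgram(poplar::Engine &engine,
                            const poplar::Device &device) {
  engine.load(device); // equivalent to prepare(device) followed by deploy()
  engine.run(0, "firstRun");
}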

TimerTimePoint getTimeStamp()

Get a record of the current host and device time.

Details depend on the underlying device used.

const ProfileValue &getGraphProfile() const

Get a report containing profiling data for the graph on the underlying device.

This is only valid to call if the underlying device of the graph is an IPU model device.

Return

A reference to an internal profile.

Exceptions

const ProfileValue &getExecutionProfile()

Get a report containing profiling data for programs executed with this engine since this engine was constructed/the execution report was last reset.

See the Poplar and PopLibs User Guide for details of the data in the report.

Return

A reference to an internal profile. Be aware that if you store a reference to this, rather than copying it, then it may change when you run further programs.

Exceptions

ProfileValue getProfile()

Get a report containing profiling data for both the graph and the programs executed with this engine.

This is equivalent to getting both the graph profile and execution profiles in a single ProfileValue.

See the Poplar and PopLibs User Guide for details of the data in the report.

Return

A copy of the internal profile.

Exceptions

void resetExecutionProfile()

Reset execution profile.

When programs are run their profiles are appended to the execution profile. This discards profiling information for previously executed programs.

void disableExecutionProfiling()

Pause execution profiling.

Subsequent engine.run() calls are executed without being profiled until a subsequent call to enableExecutionProfiling.

For example, you can exclude individual programs from a profile like this:

 engine.disableExecutionProfiling();
 engine.run(...);
 engine.enableExecutionProfiling();

void enableExecutionProfiling()

Enable execution profiling.

Subsequent engine.run() calls are profiled when executed.

void printProfileSummary(std::ostream &outputStream, const OptionFlags &opt = {})

Get and print the summary of a report with the given options.

This is equivalent to getting and printing the summary of both the graph and execution reports using poplar::printProfileSummary().

Parameters
  • outputStream: A stream to write the summary to.

  • opt: A set of option flags configuring the contents of the report. All can be “true” or “false”. The default is “false”.

    The available options are:

    • showVarStorage (true, false)

    • showOptimizations (true, false)

    • showExecutionSteps (true, false)

Exceptions
  • profiling_disabled: If the device is not an IPU model device.

  • invalid_option: If any of the options passed in opt were not recognised or improperly formatted.
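
For example, a minimal sketch that prints a summary including the execution steps:

#include <iostream>
#include <poplar/Engine.hpp>

// Print the profile summary with execution steps shown.
void printSummary(poplar::Engine &engine) {
  engine.printProfileSummary(std::cout, {{"showExecutionSteps", "true"}});
}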

void reportIntervals(std::ostream &outputStream)

Write a CSV data file to a specified output stream containing the number of tiles active over time in cycles for compute, synchronisation and exchange phases.

Each row contains the following entries:

  • begin time in cycles

  • end time in cycles

  • number of tiles participating in compute

  • number of tiles participating in exchange

  • number of tiles participating in synchronisation

Because tiles execute a number of threads (up to 6) in parallel, a single “thread cycle” may only be executed every 6 tile clock cycles. The cycles reported by this function are tile clock cycles rather than thread cycles.

Parameters
  • outputStream: An output stream for the CSV data to be written to.

Exceptions

void readTensor(StringRef handle, void *buf)

Synchronous copy of a buffer of data from a specific tensor in the device into a host-side buffer.

The tensor must have been marked as an output tensor. The buffer must have room for all of the tensor data. The handle should match the one passed to Graph::createHostRead().

Deprecated:

Use readTensor(StringRef, void*, void*) instead.

See

Graph::createHostRead()

Parameters
  • handle: The source host copy handle.

  • buf: The destination of the read.

void readTensor(StringRef handle, void *buf, void *bufEnd)

Synchronous copy of a buffer of data from a specific tensor in the device into a host-side buffer.

The tensor must have been marked as an output tensor. The buffer must have room for all of the tensor data. The buffer end address is required for size verification. The handle should match the one passed to Graph::createHostRead().

See

Graph::createHostRead()

Parameters
  • handle: The source host copy handle.

  • buf: The destination of the read.

  • bufEnd: The end address of the destination buffer.

void writeTensor(StringRef handle, const void *buf)

Synchronous copy of a buffer of data from the host to a specific tensor in the device.

The tensor must have been marked as an input tensor. The buffer must have enough data for the whole tensor. The handle should match the one passed to Graph::createHostWrite()

Deprecated:

Use writeTensor(StringRef, const void*, const void*) instead.

See

Graph::createHostWrite()

Parameters
  • handle: The destination host copy handle.

  • buf: The source of the write.

void writeTensor(StringRef handle, const void *buf, const void *bufEnd)

Synchronous copy of a buffer of data from the host to a specific tensor in the device.

The tensor must have been marked as an input tensor. The buffer end address is required for size verification. The handle should match the one passed to Graph::createHostWrite().

See

Graph::createHostWrite()

Parameters
  • handle: The destination host copy handle.

  • buf: The source of the write.

  • bufEnd: The end address of the source buffer.
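
For example, a minimal sketch that writes host data to a tensor, runs a program, and reads a result back. The handles "in" and "out" are illustrative and are assumed to have been registered at graph-construction time with Graph::createHostWrite() and Graph::createHostRead() for tensors of n floats; the engine is assumed to be loaded:

#include <cstddef>
#include <vector>
#include <poplar/Engine.hpp>

// Copy n floats to the device, run program 0, and copy n floats back.
void exchangeWithDevice(poplar::Engine &engine, std::size_t n) {
  std::vector<float> input(n, 1.0f), output(n);
  engine.writeTensor("in", input.data(), input.data() + input.size());
  engine.run(0);
  engine.readTensor("out", output.data(), output.data() + output.size());
}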

void connectStream(StringRef handle, void *begin, void *end)

Connect a stream to a circular buffer in memory.

Each time data is copied to/from the stream the pointer for the next transfer is incremented within the bounds given.

Parameters
  • handle: The name of the stream to connect to

  • begin: Pointer to the start of the circular buffer

  • end: Pointer to the end of the circular buffer.

void connectStream(const DataStream &stream, void *begin, void *end)

Connect a stream to a circular buffer in memory.

Each time data is copied to/from the stream the pointer for the next transfer is incremented within the bounds given.

Deprecated:

Use connectStream(StringRef, void*, void*) instead.

Parameters
  • stream: The stream to connect to

  • begin: Pointer to the start of the circular buffer

  • end: Pointer to the end of the circular buffer.

void connectStream(StringRef handle, void *p)

Connect a stream to a fixed location in memory.

Each time data is copied to/from the stream this location will be read/written.

Parameters
  • handle: The name of the stream to connect to

  • p: The pointer to the memory buffer

void connectStream(const DataStream &stream, void *p)

Connect a stream to a fixed location in memory.

Each time data is copied to/from the stream this location will be read/written.

Deprecated:

Use connectStream(StringRef, void*) instead.

Parameters
  • stream: The stream to connect to

  • p: The pointer to the memory buffer

void connectStreamToCallback(StringRef handle, StreamCallbackHandle f)

Connect a stream to a callback taking a pointer to the location in memory to copy into/from.

This will be called whenever the stream will be read/was written by the device. The given memory location will only be valid to read from/write to for the duration of the callback.

Parameters
  • handle: The name of the stream to connect to.

  • f: Callback to be called whenever the stream is to be read/was written by the device.
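
For example, a minimal sketch that connects one stream to a fixed host buffer and another to a callback. The handles "toIpu" and "fromIpu" are illustrative and are assumed to have been created at graph-construction time with Graph::addHostToDeviceFIFO() and Graph::addDeviceToHostFIFO() for n floats:

#include <cstring>
#include <vector>
#include <poplar/Engine.hpp>

// Connect an input stream to a buffer and an output stream to a callback.
void connectStreams(poplar::Engine &engine, std::vector<float> &hostIn,
                    std::size_t n) {
  engine.connectStream("toIpu", hostIn.data());
  engine.connectStreamToCallback("fromIpu", [n](void *p) {
    // `p` is only valid for the duration of the callback, so copy out of it.
    std::vector<float> results(n);
    std::memcpy(results.data(), p, n * sizeof(float));
    // ... use results ...
  });
}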

void connectStreamToCallback(const DataStream &stream, StreamCallbackHandle f)

Connect a stream to a callback taking a pointer to the location in memory to copy into/from.

This will be called whenever the stream will be read/was written by the device. The given memory location will only be valid to read from/write to for the duration of the callback.

Deprecated:

Use connectStreamToCallback(StringRef, StreamCallbackHandle) instead.

Parameters
  • stream: The stream to connect to.

  • f: Callback to be called whenever the stream is to be read/was written by the device.

void connectStreamToCallback(StringRef handle, unsigned index, StreamCallbackHandle f)

Connect a replicated stream to a callback taking a pointer to the location in memory to copy into/from.

This will be called whenever the stream will be read/was written by the device. The given memory location will only be valid to read from/write to for the duration of the callback.

Parameters
  • handle: The name of the stream to connect to.

  • index: The replicated index to connect to.

  • f: Callback to be called whenever the stream is to be read/was written by the device.

void connectStreamToCallback(const DataStream &stream, unsigned index, StreamCallbackHandle f)

Connect a replicated stream to a callback taking a pointer to the location in memory to copy into/from.

This will be called whenever the stream will be read/was written by the device. The given memory location will only be valid to read from/write to for the duration of the callback.

Deprecated:

Use connectStreamToCallback(StringRef, unsigned, StreamCallbackHandle) instead.

Parameters
  • stream: The stream to connect to.

  • index: The replicated index to connect to.

  • f: Callback to be called whenever the stream is to be read/was written by the device.

void copyFromRemoteBuffer(StringRef handle, void *w, int repeatIndex, unsigned replicationIndex = 0)

Copy from a remote buffer to a user buffer w.

Parameters
  • handle: The name of the remote buffer to copy from.

  • w: The user buffer to copy to.

  • repeatIndex: The index in the remote buffer to copy from.

  • replicationIndex: The replicated graph index.

void copyFromRemoteBuffer(const RemoteBuffer &buffer, void *w, int repeatIndex, unsigned replicationIndex = 0)

Copy from a remote buffer to a user buffer w.

Deprecated:

Use copyFromRemoteBuffer(StringRef, void*, int, unsigned) instead.

Parameters
  • buffer: The remote buffer to copy from.

  • w: The user buffer to copy to.

  • repeatIndex: The index in the remote buffer to copy from.

  • replicationIndex: The replicated graph index.

void copyToRemoteBuffer(void *w, StringRef handle, int repeatIndex, unsigned replicationIndex = 0)

Copy to a remote buffer from a user buffer w.

Parameters
  • w: The user buffer to copy from.

  • handle: The remote buffer to copy to.

  • repeatIndex: The index in the remote buffer to copy to.

  • replicationIndex: The replicated graph index.

void copyToRemoteBuffer(void *w, const RemoteBuffer &buffer, int repeatIndex, unsigned replicationIndex = 0)

Copy to a remote buffer from a user buffer w.

Deprecated:

Use copyToRemoteBuffer(void*, StringRef, int, unsigned) instead.

Parameters
  • w: The user buffer to copy from.

  • buffer: The remote buffer to copy to.

  • repeatIndex: The index in the remote buffer to copy to.

  • replicationIndex: The replicated graph index.
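
For example, a minimal sketch of a round trip through a remote buffer (the handle "rb" and the sizes are illustrative; the buffer is assumed to have been created with Graph::addRemoteBuffer("rb", FLOAT, 1024, 4) before compilation):

    std::vector<float> host(1024);
    engine.copyToRemoteBuffer(host.data(), "rb", /*repeatIndex=*/0);    // host -> remote
    engine.copyFromRemoteBuffer("rb", host.data(), /*repeatIndex=*/0);  // remote -> host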

std::vector<std::string> listStreams() const

Return a list of all streams in the engine.

Return

A vector of strings, each of which is a stream's handle postfixed with '+' or '-', indicating whether the stream is a host-write or a host-read respectively.

void setPrintStream(std::ostream &stream)

Set output stream for printf commands.

Parameters
  • stream: The output stream to use.

void setPrintTensorStream(std::ostream &stream)

Set the output stream for PrintTensor programs.

By default tensors are printed to stderr.

Parameters
  • stream: The output stream to use.

OptionFlags getEngineOptions() const

Returns the options the engine was created with.

Engine(std::unique_ptr<core::Engine>)
const core::Engine &getImpl() const

Public Static Functions

std::string reportTiming(const TimerTimePoint &start, const TimerTimePoint &end)

Get a timing report for the measured interval.

Details depend on the underlying device used.

Parameters
  • start: Start time of report

  • end: End time of report

Private Members

std::unique_ptr<core::Engine> impl
class TimerTimePoint

Public Functions

TimerTimePoint() = default

Private Functions

TimerTimePoint(Engine &e)

Private Members

std::shared_ptr<core::TimerTimePoint> impl

Friends

friend class Engine
namespace core

3.6.2. poplar/StreamCallback.hpp

template<typename Result, typename Ret>
struct is_invocable_impl<Result, Ret, void_type<typename Result::type>> : public std::is_void<Ret>
namespace poplar

Poplar classes and functions.

class LegacyStreamCallback : public poplar::StreamCallback
#include <StreamCallback.hpp>

Convenience StreamCallback specialization for implementations that do not support prefetch/complete operations.

Public Functions

Result prefetch(void*) final override

Not available in legacy streams.

void invalidatePrefetched() final override

Not available in legacy streams.

void complete() final override

Not available in legacy streams.

class StreamCallback
#include <StreamCallback.hpp>

Interface used during stream copies to produce/consume the data being exchanged between the host and the device.

In regular stream copies, the fetch and complete functions are called as a result of the device requesting the data transfer.

If the following engine options are set, the prefetch function will be called after an ongoing host-to-device transfer of the same stream completes:

  • exchange.streamBufferOverlap=none

  • exchange.enablePrefetch=true

Subclassed by poplar::LegacyStreamCallback

Public Types

enum Result

Values:

enumerator Success
enumerator NotAvailable

Public Functions

~StreamCallback() = default
Result prefetch(void *p) = 0

Callback function to fill the host buffer (host-to-device streams only).

This function is called speculatively: it might still be called even if no additional copies for this stream exist for the remaining execution of the program.

The following situations are possible during the invocation:

  • There is more data available for consumption (A)

  • Data is temporarily not available during the time this function is called (B)

  • The stream reached the end and so no more data is available (C)

The return value indicates if the invocation resulted in the buffer being successfully filled. In the first case (A), the function shall return Result::Success. A call to complete() will follow if the program ends up transferring the data. Otherwise (scenarios B and C), it must return Result::NotAvailable. Calls to fetch() and then complete() will follow if the transfer takes place.

Note that when using a buffered data stream (see Graph::addHostToDeviceFIFO(), bufferingDepth option) there can be multiple calls to prefetch() before a corresponding complete() is called. In some circumstances prefetched data is invalidated and not read, and therefore has no corresponding complete(); this is notified with invalidatePrefetched().

Return

Result::Success if the function was able to fill the buffer with data, or Result::NotAvailable otherwise.

Parameters
  • p: Location of the buffer. It will only be valid for the duration of the function.

void complete() = 0

Notifies that the data involved in the last prefetch/fetch invocation has been used by the device.

It usually means that a speculative read was a hit, and the callback can move on to the next piece of input.

void invalidatePrefetched()

Notifies when the engine will reset this stream (invalidating any prefetched data which has not been read).

void fetch(void*) = 0

Callback function to fill the host buffer.

This function is called as a result of a stream copy, unless the last prefetch invocation was successful.

It must always fill the buffer with more data and it is followed by a call to complete.
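
For example, a minimal sketch of an implementation that replays a fixed host buffer on every transfer (the class name and sizes are illustrative, not part of the Poplar API):

    #include <poplar/StreamCallback.hpp>
    #include <cstring>
    #include <utility>
    #include <vector>

    class ReplayCallback : public poplar::StreamCallback {
    public:
      explicit ReplayCallback(std::vector<float> data) : data(std::move(data)) {}
      Result prefetch(void *p) override {
        std::memcpy(p, data.data(), data.size() * sizeof(float));
        return Result::Success;      // scenario A: data was available
      }
      void fetch(void *p) override {
        std::memcpy(p, data.data(), data.size() * sizeof(float));
      }
      void complete() override {}    // the device consumed the last transfer
    private:
      std::vector<float> data;
    };

It could then be connected with, for example, engine.connectStreamToCallback("in", std::make_unique<ReplayCallback>(hostData)).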

class StreamCallbackHandle
#include <StreamCallback.hpp>

Wrapper for StreamCallback instances.

Provides backwards compatibility with C++ lambda expressions and std::function instances.

Public Functions

template<class CallbackImpl, typename = typename std::enable_if<std::is_base_of<StreamCallback, CallbackImpl>::value>::type>
StreamCallbackHandle(std::unique_ptr<CallbackImpl> f)

Constructs a handle from an instance of a stream callback implementation.

This constructor only participates in overload resolution if CallbackImpl is derived from poplar::StreamCallback (i.e. it is an implementation of the callback interface).

template<class F, typename = typename std::enable_if<traits::is_callback<F>::value>::type>
StreamCallbackHandle(F &&f)

Constructs a handle from a callable instance.

This constructor only participates in overload resolution if F satisfies the requirements of a Function Object. It transforms f into a LegacyStreamCallback implementation.

See

https://en.cppreference.com/w/cpp/named_req/FunctionObject

StreamCallbackHandle(const StreamCallbackHandle&) = delete
StreamCallbackHandle(StreamCallbackHandle&&) = default
operator std::unique_ptr<StreamCallback>() &&

Extracts the callback implementation from the handle.

Private Members

std::unique_ptr<StreamCallback> callback

Private Static Functions

template<class F>
std::unique_ptr<StreamCallback> makeCallback(F &&f)
namespace traits

Typedefs

using void_type = void
template<typename F>
struct is_callback : public poplar::traits::is_invocable_impl<std::result_of<F&(void*)>, void>
template<typename Result, typename Ret, typename = void>
struct is_invocable_impl : public false_type

Subclassed by poplar::traits::is_callback< F >

template<typename Result, typename Ret>
struct is_invocable_impl<Result, Ret, void_type<typename Result::type>> : public std::is_void<Ret>
template<typename T>
struct remove_cvref : public std::remove_cv<typename std::remove_reference<T>::type>

3.7. Serializing executable state

3.7.1. poplar/Executable.hpp

namespace poplar

Poplar classes and functions.

class Executable
#include <Executable.hpp>

An instance of poplar::Executable contains all of the information needed to run a program on an IPU device.

It can be saved to or loaded from disk.

Public Functions

~Executable()
Executable(Executable &&other)
Executable &operator=(Executable &&other)
void serialize(std::ostream &out) const

Serialize an executable to a stream.

All of the binary files and metadata needed to run a Poplar executable will be written to the stream. Currently the format is opaque, and compatibility between different versions of Poplar is not guaranteed.

Parameters
  • out: The stream to write to. It must be seekable.

Exceptions
  • poplar_error: if the target is not an IPU - this cannot be used to serialise CPU or IPU_MODEL executables.

Executable(std::unique_ptr<core::Executable> impl)
const core::Executable &getImpl() const

Public Static Functions

Executable deserialize(std::istream &in)

Load an executable from a stream.

Parameters
  • in: The stream to read from. It must be seekable.
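
For example, a minimal sketch of saving and restoring an executable (the file name is illustrative; graph and prog are assumed to exist and to target IPU hardware):

    #include <fstream>

    poplar::Executable exe = poplar::compileGraph(graph, {prog});
    {
      std::ofstream out("program.poplar_exec", std::ios::binary);
      exe.serialize(out);
    }
    std::ifstream in("program.poplar_exec", std::ios::binary);
    poplar::Engine engine(poplar::Executable::deserialize(in));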

Private Members

std::unique_ptr<core::Executable> impl

Friends

friend class Engine
namespace core

3.8. Profiling & debugging

3.8.1. poplar/DebugContext.hpp

Defines

SUPPORTS_LOCATION_BUILTINS
namespace poplar

Poplar classes and functions.

Typedefs

using DebugId = std::uint64_t

Enums

enum DebugSerializationFormat

Values:

enumerator JSON

Serialise in JSON format.

enumerator CBOR

Serialise in CBOR format.

Functions

std::ostream &operator<<(std::ostream &os, const DebugNameAndId &dnai)

Display the path name of the DebugNameAndId.

Return

The ostream written to

Parameters
  • os: The ostream to output to

  • dnai: The DebugNameAndId to display

std::ostream &operator<<(std::ostream &os, const DebugContext &dc)

Display the path name of the DebugContext.

Return

The ostream written to

Parameters
  • os: The ostream to output to

  • dc: The DebugContext to display

class DebugContext
#include <DebugContext.hpp>

DebugContext gathers the common external parameters of the context of an operation.

As an extension to DebugNameAndId, DebugContext bundles a name and a DebugId as well as the file and line in the source code where it is invoked.

Note that, to reflect the specific line where an invocation took place, the DebugContext object must be constructed on the same line as the invocation. For instance, if a function foo wants to capture the DebugContext of its invocation, it should be called like this:

    foo(DebugContext{});

rather than:

    DebugContext debugContext;
    foo(debugContext);

Typically, however, foo would accept a default argument:

    void foo(const DebugContext &debugContext = {});

so that the DebugContext can be captured automatically:

    foo();

A DebugContext's ultimate goal is to be passed to the constructor of a DebugInfo. The DebugContext carries the DebugId of the parent DebugInfo to keep a hierarchical relationship. A typical flow: an initial DebugContext is (implicitly) created from foo's default argument and used to create the initial DebugInfo; then foo2 is called:

    void foo(const DebugContext &debugContext = {}) {
      DebugInfo debugInfo{debugContext};
      foo2(debugInfo);
    }

foo2 captures a DebugContext that contains the parent DebugId:

    void foo2(const DebugContext &debugContext) {
      DebugInfo debugInfo{debugContext};
    }

In this way, low-level operations and resources can be related to the high-level operation that triggered them.

Public Functions

DebugContext(SourceLocation loc = SourceLocation::Current())
DebugContext(const char *name, SourceLocation loc = SourceLocation::Current())
DebugContext(StringRef name, SourceLocation loc = SourceLocation::Current())
DebugContext(std::string name, SourceLocation loc = SourceLocation::Current())
DebugContext(const DebugInfo &debugInfo, std::string name = "", SourceLocation loc = SourceLocation::Current())
DebugContext(const DebugNameAndId &debugNameAndId, std::string name = "", SourceLocation loc = SourceLocation::Current())
DebugContext(const DebugContext &debugContext, std::string name = "")
DebugContext(const DebugContext &debugContext, SourceLocation loc)
DebugContext(DebugContext&&)
~DebugContext()
std::string getPathName() const

Gets the pathname of this object as the concatenation of the parent name received in the constructor via DebugInfo or DebugNameAndId and the name explicitly set for this object.

core::DebugContext &getImpl() const

Private Members

std::unique_ptr<core::DebugContext> impl
class DebugInfo
#include <DebugContext.hpp>

DebugInfo stores and persists a set of data that describes the context of an operation.

Some of that data is structured, such as the framework layer name (Poplar, PopLibs, PopART, and so on) or the file and line of the source code. But it can also be custom data set by the user through the setValue() method.

In turn, the DebugInfo is passed to sub-operations of that operation so that resources (Programs, Variables, etc.) created in lower levels can be hierarchically related to the initial DebugInfo.

After execution, the created DebugInfo objects will have been written to a file. At the same time, the operation's resources will have been persisted in the graph and execution profiles, together with their DebugInfo Id. In this way, tools like PopVision can conveniently present to the user the operation, its resources, and its DebugInfo.

This class is expected to be derived to adapt to particular use cases (typically, by adding extra mandatory arguments to the constructor). Internally, derived classes can use setValue() to store the extra data to be persisted.

At object destruction, the DebugInfo data is passed to the Streamer to be written to a file. Thus, the Streamer should be initialized before any DebugInfo object gets destroyed, or it will not be persisted.

Subclassed by poputil::OpDebugInfo

Public Functions

DebugInfo(const DebugContext &debugContext, std::string layer)

Constructor.

Parameters
  • debugContext: Captures the external context of the operation (for example, file and line of invocation).

  • layer: Name of the framework level (for example Poplar, PopLibs or PopART).

DebugInfo &operator=(const DebugInfo&) = delete
DebugInfo(const DebugInfo&) = delete
~DebugInfo()
DebugId getId() const

Gets the unique identifier of this DebugInfo object.

std::string getPathName() const

Gets the pathname of this object (as received from DebugContext).

core::DebugInfo &getImpl() const
bool setValue(std::string name, ProfileValue value)

Adds custom data to this object if “name” is not already set.

Return

true if “name” was not already set, false otherwise.

Parameters
  • name: The key name of the data.

  • value: A ProfileValue object containing the custom data.
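
For example, a minimal sketch of a derived DebugInfo that records extra data (the class name, layer name "MyLib", and key names are illustrative):

    class MyOpDebugInfo : public poplar::DebugInfo {
    public:
      MyOpDebugInfo(const poplar::DebugContext &dc, std::size_t tileCount)
          : poplar::DebugInfo(dc, "MyLib") {
        setValue("api", poplar::ProfileValue("myOp"));
        setValue("tileCount", poplar::ProfileValue(tileCount));
      }
    };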

Public Static Functions

void initializeStreamer(const std::string &fileName, const DebugSerializationFormat &format = DebugSerializationFormat::CBOR)

Initializes the Streamer, unless it is already initialized (for example, through environment variables).

Parameters
  • fileName: The name of the file where all DebugInfos will be persisted.

  • format: The format of the file (JSON or CBOR).

void closeStreamer()

Closes the Streamer: all data is flushed to disk and the file is ready to be read.

DebugInfo objects destroyed after this point will not be persisted.
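
For example, a minimal sketch of persisting DebugInfo data to a file (the file name is illustrative):

    poplar::DebugInfo::initializeStreamer("debug.cbor",
                                          poplar::DebugSerializationFormat::CBOR);
    {
      poplar::DebugInfo di(poplar::DebugContext{}, "Poplar");
      di.setValue("note", poplar::ProfileValue("example"));
    }   // di is destroyed here and its data handed to the Streamer
    poplar::DebugInfo::closeStreamer();   // flush everything to debug.cbor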

Private Members

std::unique_ptr<core::DebugInfo> impl
class DebugNameAndId
#include <DebugContext.hpp>

DebugNameAndId bundles a name and a DebugId to facilitate their propagation through function calls.

Public Functions

DebugNameAndId(std::string name = "", DebugId debugId = {}, std::string parentPath = "")
DebugNameAndId(const char *name)
DebugNameAndId(DebugId debugId)
DebugNameAndId(const DebugInfo &debugInfo, std::string name = "")
DebugNameAndId(const DebugNameAndId &DebugNameAndId, std::string name = "")
DebugNameAndId &operator=(const DebugNameAndId &other)
~DebugNameAndId()
std::string getPathName() const

Gets the pathname of this object as the concatenation of the parent name received in the constructor via DebugInfo or DebugNameAndId and the name explicitly set for this object.

core::DebugNameAndId &getImpl() const

Private Members

std::unique_ptr<core::DebugNameAndId> impl
class SourceLocation
#include <DebugContext.hpp>

This class mimics std::source_location, which is unavailable because C++20 is not yet supported.

Public Functions

SourceLocation() = default
constexpr SourceLocation(const char *functionName, const char *fileName, unsigned lineNumber)
constexpr const char *getFunctionName() const
constexpr const char *getFileName() const
constexpr unsigned getLineNumber() const
constexpr bool isValid() const

Public Static Functions

SourceLocation Current()

Private Members

const char *functionName = {""}
const char *fileName = {""}
const unsigned lineNumber = {}
const bool valid = {false}
namespace core

3.8.2. poplar/ProfileValue.hpp

namespace poplar

Poplar classes and functions.

Functions

void serializeToJSON(std::ostream &out, const ProfileValue &val, bool prettyPrint = false)
void serializeToCBOR(std::ostream &out, const ProfileValue &val, bool withTag = true)
std::ostream &operator<<(std::ostream &os, const ProfileValue &v)

Dumps the JSON representation to an output stream.

void printGraphSummary(std::ostream &out, const ProfileValue &graphProfile, const OptionFlags &opts)

Print a summary of the static graph profiling information - primarily memory use.

The available options are:

  • showOptimizations (true, false) [=false]

    If true, information about the optimisations performed is included in the summary output.

  • showPerIpuMemoryUsage (true, false) [=false]

    If true, total memory usage per-IPU is included in the summary output in addition to memory usage for the whole device.

  • showVarStorage (true, false) [=false]

    If true, information about variable storage liveness is included in the summary output. This is provided for some tiles with the highest maximum live bytes as well as a total for all tiles. The maximum live bytes is output along with information about always-live variables.

  • colours (true, false)

    Specify whether colours should be displayed in the profile report. If not set, colours will be displayed only if outputting to a supported terminal; in that case, setting the environment variable CLICOLOR_FORCE=1 forces colours to be displayed, while CLICOLOR=0 disables colours.
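
For example, a minimal sketch of printing a summary with some of these options enabled (assuming graphProfile holds the report returned by Engine::getGraphProfile(), and <iostream> is included):

    poplar::printGraphSummary(std::cout, graphProfile,
                              {{"showVarStorage", "true"},
                               {"showPerIpuMemoryUsage", "true"}});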

void printExecutionSummary(std::ostream &out, const ProfileValue &graphProfile, const ProfileValue &executionProfile, const OptionFlags &opts)

Print a summary of the execution profiling information - primarily cycle counts.

The information printed depends on the target and the execution profiling mode. IPUModel always prints a simulation of execution.

The available options are:

  • showExecutionSteps (true, false) [=false]

    If true, the program execution sequence with cycle estimates is included in the summary output.

  • colours (true, false)

    See printGraphSummary().

void printProfileSummary(std::ostream &out, const ProfileValue &graphProfile, const ProfileValue &executionProfile, const OptionFlags &opts = {})
class ProfileValue
#include <ProfileValue.hpp>

ProfileValue represents a read-only JSON-like tree of values that are used to store the output of the profiler.

Each value can be one of:

  • A boolean

  • A string

  • A double-precision number

  • A vector<> of child values

  • A map<string, …> of child values. Only string keys are supported.

If an invalid access is made, for example an out-of-range access or accessing the wrong type, then an exception is thrown. It is possible to write code that should never throw an exception by using type().

See the Poplar and PopLibs User Guide for details of the data in the report.
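
For example, a minimal sketch of walking part of a report defensively (the key "memory" is illustrative and depends on the report layout; engine is assumed to exist):

    const poplar::ProfileValue profile = engine.getGraphProfile();
    if (const poplar::ProfileValue *mem = profile.getOrNull("memory")) {
      if (mem->type() == poplar::ProfileValue::Type::MAP) {
        for (const auto &kv : mem->asMap())
          std::cout << kv.first << "\n";   // list the top-level keys
      }
    }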

Public Types

enum Type

Values:

enumerator BOOL_
enumerator STRING
enumerator NUMBER
enumerator VECTOR
enumerator MAP
using Boolean = bool
using Number = double
using String = std::string
using Vector = std::vector<ProfileValue>
using Map = std::map<std::string, ProfileValue>

Public Functions

Type type() const
const String &asString() const
Boolean asBool() const
std::int64_t asInt() const
std::uint64_t asUint() const
double asDouble() const
const ProfileValue &operator[](StringRef s) const
const ProfileValue *getOrNull(StringRef s) const
const Map &asMap() const
const ProfileValue &operator[](std::size_t i) const
const Vector &asVector() const
std::vector<std::uint64_t> toUintVector() const
std::size_t size() const
double sumDouble() const
std::int64_t sumInt() const
std::uint64_t sumUint() const
std::int64_t sum2DInt() const
std::uint64_t sum2DUint() const
bool operator==(const ProfileValue &other) const
bool operator!=(const ProfileValue &other) const
ProfileValue()
ProfileValue(String init)
ProfileValue(Vector init)
ProfileValue(Map init)
ProfileValue(Number init)
ProfileValue(Boolean init)
template<class T, typename = typename std::enable_if<std::is_integral<T>::value>::type>
ProfileValue(T init)
ProfileValue(const char *init)
~ProfileValue()
ProfileValue(const ProfileValue &other)
ProfileValue(ProfileValue &&other) noexcept
ProfileValue &operator=(const ProfileValue &other)
ProfileValue &operator=(ProfileValue &&other) noexcept
ProfileValue &operator=(Boolean init)
ProfileValue &operator=(Number init)
ProfileValue &operator=(String init)
ProfileValue &operator=(Vector init)
ProfileValue &operator=(Map init)
template<class T, typename = typename std::enable_if<std::is_integral<T>::value>::type>
ProfileValue &operator=(T init)

Private Members

Storage v
Type t

Friends

friend class core::MutableProfileValue
struct Storage

Public Members

std::aligned_union<1, Boolean, Number, String, Vector, Map>::type buffer
namespace core

3.8.3. poplar/IPUModel.hpp

namespace poplar

Poplar classes and functions.

struct IPUModel
#include <IPUModel.hpp>

A model of an IPU used to create an IPUModel Device. The IPU Model will simulate the behaviour of the IPU hardware.

It will not completely implement every aspect of a real IPU.

Public Types

enum RelativeSyncDelayType

A function that returns the number of cycles before the specified tile is released from sync, relative to the first tile that is released from sync.

Values:

enumerator AUTO
enumerator NO_DELAY

Public Functions

IPUModel(char const *IPUVersion = "ipu2")
bool operator==(const IPUModel&) const
bool operator!=(const IPUModel&) const
Device createDevice(OptionFlags opts = {}, bool accurateHalf = false, unsigned deviceManagerId = std::numeric_limits<unsigned>::max())

Create a device that runs code on the CPU and models the performance that would be achieved on an IPU.
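
For example, a minimal sketch creating a small model device for quick tests (the tile count here is deliberately tiny and not representative of real hardware):

    poplar::IPUModel model("ipu2");
    model.numIPUs = 1;
    model.tilesPerIPU = 4;
    poplar::Device device = model.createDevice();
    poplar::Graph graph(device.getTarget());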

Public Members

std::string IPUVersion

ipu0, ipu1, ipu2, etc…

unsigned numIPUs

The number of IPUs.

unsigned tilesPerSuperTile

The number of tiles per supertile.

unsigned tilesPerIPU

The number of tiles per IPU.

unsigned numWorkerContexts

The number of worker contexts per tile.

unsigned memoryBytesPerTile

Memory bytes per tile.

double tileClockFrequency

Clock frequency in Hz.

unsigned exchangeBytesPerCycle

The bandwidth of internal IPU exchange in bytes per cycle.

unsigned memcpyBytesPerCycle

The number of bytes per cycle that can be copied from one location to another using a memcpy.

unsigned instructionBytes

The size of an instruction in bytes.

bool supportsSuperTileSendReceive

Whether a tile in a supertile can use all the exchange bandwidth of the supertile to send or receive, when the other tile is idle or receiving the same data.

unsigned interleavedMemoryElementIndex

Index in the memoryElementOffsets table (returned by Target::getMemoryElementOffsets) which gives the start of the interleaved memory region.

Any value greater than or equal to the size of the offsets table is interpreted as the machine not having interleaved memory elements. Note that, by definition, interleaved memory is always in the upper part of memory.

enum poplar::IPUModel::RelativeSyncDelayType relativeSyncDelay
unsigned minIPUSyncDelay

The IPU sync delay for the tile that is closest to the sync controller.

unsigned globalSyncCycles

The number of clock cycles required to synchronize all IPUs.

std::vector<GlobalExchangeConstraint> globalExchangeConstraints

Set of constraints that provide a lower bound on the time it takes to send data between IPUs.

unsigned globalExchangePacketBytes

Size of the packet used to transfer data between tiles in bytes.

unsigned tileLocalSyncSyncDelay

Number of cycles from issuing a sync instruction to the earliest time that instructions can resume.

unsigned tileLocalSyncExitDelay

Number of cycles after a worker has issued its exit instruction that the supervisor can resume.

unsigned numStrideBits

Number of stride bits.

unsigned dataPathWidth

The width of the load/store data path within the tile.

unsigned fp16ConvUnitMaxPipelineDepth

The maximum pipeline depth of the convolution units within the tile for fp16.

unsigned fp32ConvUnitMaxPipelineDepth

The maximum pipeline depth of the convolution units within the tile for fp32.

Only allow a maximum of 4 cycle AMP loop.

unsigned fp16ConvUnitInputLoadElemsPerCycle

The input elements loaded per cycle for f16 conv.

unsigned fp32ConvUnitInputLoadElemsPerCycle

The input elements loaded per cycle for f32 conv.

unsigned fp16InFp16OutConvUnitsPerTile

The number of convolution units in the tile that can be used when partial results are output as 16 bits and inputs are 16 bits.

unsigned fp16InFp32OutConvUnitsPerTile

The number of convolution units in the tile that can be used when partial results are output as 32 bits and inputs are 16 bits.

unsigned fp32InFp32OutConvUnitsPerTile

The number of convolution units in the tile that can be used when accumulating to 32 bit values.

unsigned convUnitCoeffLoadBytesPerCycle

The number of convolutional weights that can be loaded in a cycle.

unsigned supervisorInstrFetchDelay

The number of bytes of instructions that supervisor contexts may be loading from memory ahead of the current PC.

unsigned workerInstrFetchDelay

The number of bytes of instructions that worker contexts may be loading from memory ahead of the current PC.

unsigned maxImmediateOffsetInRunInstr

The maximum range of the immediate operand in the run instruction. The zimm16 operand is implicitly multiplied by 4 when added to the register operand.

unsigned rptCountMax
unsigned atomicStoreGranularity

The atomic store granularity.

bool compileIPUCode

Whether or not to actually compile real IPU code for modelling.

3.8.4. poplar/GlobalExchangeConstraints.hpp

namespace poplar

Poplar classes and functions.

struct GlobalExchangeConstraint

Public Functions

GlobalExchangeConstraint(double bandwidth, ArrayRef<GlobalExchangeFlow> flows)
bool operator==(const GlobalExchangeConstraint &other) const
bool operator<(const GlobalExchangeConstraint &other) const

Public Members

double bandwidth

Bandwidth in bits per second.

std::vector<GlobalExchangeFlow> flows

The flows that the constraint applies to.

struct GlobalExchangeFlow

Public Functions

GlobalExchangeFlow(unsigned src, unsigned dst)
bool operator==(const GlobalExchangeFlow &other) const
bool operator<(const GlobalExchangeFlow &other) const

Public Members

unsigned src
unsigned dst

3.8.5. poplar/CycleCount.hpp

namespace poplar

Poplar classes and functions.

Functions

poplar::Tensor cycleCount(poplar::Graph &graph, poplar::program::Sequence &prog, unsigned tile, const DebugContext &debugContext = {})

Given a Sequence program, times the program and returns the 64-bit value in a tensor of two unsigned integers.

The first element of the tensor is the lower 32 bits and the second the upper 32 bits. The sequence is timed by adding sync and timing programs around the original sequence. You must also specify the tile on which the program is timed.

Deprecated:

Use cycleCount with the syncType arg instead

Return

An unsigned integer tensor of length 2

Parameters
  • graph: The Poplar graph

  • prog: The program sequence to time

  • tile: The tile on which the program is timed

  • debugContext: Optional debug context

poplar::Tensor cycleCount(poplar::Graph &graph, poplar::program::Sequence &prog, unsigned tile, SyncType syncType, const DebugContext &debugContext = {})

Given a Sequence program, times the program and returns the 64-bit value in a tensor of two unsigned integers.

The first element of the tensor is the lower 32 bits and the second the upper 32 bits. The sequence is timed by adding sync and timing programs around the original sequence. You must also specify the tile on which the program is timed.

Return

An unsigned integer tensor of length 2

Parameters
  • graph: The Poplar graph

  • prog: The program sequence to time

  • tile: The tile on which the program is timed

  • syncType: Type of sync to wrap the original sequence in

  • debugContext: Optional debug context
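
For example, a minimal sketch of timing a sequence with an internal sync (the handle "cycles" is illustrative; graph and prog are assumed to exist):

    poplar::Tensor cycles =
        poplar::cycleCount(graph, prog, /*tile=*/0, poplar::SyncType::INTERNAL);
    graph.createHostRead("cycles", cycles);
    // After engine.run(0): engine.readTensor("cycles", buf, bufEnd) retrieves
    // the two 32-bit halves of the 64-bit cycle count.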

poplar::Tensor cycleStamp(poplar::Graph &graph, poplar::program::Sequence &prog, unsigned tile, const DebugContext &debugContext = {})

Add a sequence program to record an absolute hardware cycle stamp on a given tile.

The stamp is a snapshot of a continuously running hardware counter on a tile; to have consistent results, measurements must be done on the same tile.

The result is a tensor containing two 32-bit elements of a 64-bit snapshot of the hardware counter. The first element of the tensor is the lower 32 bits and the second the upper 32 bits.

The timestamp is added after an internal sync is executed.

Deprecated:

Use cycleStamp with the syncType arg instead

Return

An unsigned integer tensor of length 2

Parameters
  • graph: The Poplar graph

  • prog: The program sequence to which the time stamp is added

  • tile: The tile on which the time stamp is added

  • debugContext: Optional debug context

poplar::Tensor cycleStamp(poplar::Graph &graph, poplar::program::Sequence &prog, unsigned tile, SyncType syncType, const DebugContext &debugContext = {})

Add a sequence program to record an absolute hardware cycle stamp on a given tile.

The stamp is a snapshot of a continuously running hardware counter on a tile; to have consistent results, measurements must be done on the same tile.

The result is a tensor containing two 32-bit elements of a 64-bit snapshot of the hardware counter. The first element of the tensor is the lower 32 bits and the second the upper 32 bits.

The timestamp is added after a sync is executed.

Return

An unsigned integer tensor of length 2

Parameters
  • graph: The Poplar graph

  • prog: The program sequence to which the time stamp is added

  • tile: The tile on which the time stamp is added

  • syncType: Type of sync to perform before stamping

  • debugContext: Optional debug context

std::vector<poplar::Tensor> cycleStamp(poplar::Graph &graph, poplar::program::Sequence &prog, const std::vector<unsigned> &tiles, const DebugContext &debugContext = {})

Add a compute set to record an absolute hardware cycle stamp on the specified tiles.

Deprecated:

Use cycleStamp with the syncType arg instead

Return

A vector of tensors of 2 integers

Parameters
  • graph: The Poplar graph

  • prog: The program sequence to which the time stamp is added

  • tiles: The tiles on which the time stamp is added

  • debugContext: Optional debug context

std::vector<poplar::Tensor> cycleStamp(poplar::Graph &graph, poplar::program::Sequence &prog, const std::vector<unsigned> &tiles, SyncType syncType, const DebugContext &debugContext = {})

Add a compute set to record an absolute hardware cycle stamp on the specified tiles.

Return

A vector of tensors of 2 integers

Parameters
  • graph: The Poplar graph

  • prog: The program sequence to which the time stamp is added

  • tiles: The tiles on which the time stamp is added

  • syncType: Type of sync to perform before stamping

  • debugContext: Optional debug context