Target

#include <poplar/Target.hpp>
template<>
struct hash<poplar::Target>

Public Functions

inline size_t operator()(const poplar::Target &t) const
namespace poplar

Poplar classes and functions.

Functions

template<>
std::uint64_t getTypeLimitsMaxAs<std::uint64_t>(const Type &t) const

Get the maximum representable finite value of a given type as a std::uint64_t.

Parameters

t – The type.

Throws

poplar_error – if the value is not exactly representable as a std::uint64_t.

template<>
std::int64_t getTypeLimitsMaxAs<std::int64_t>(const Type &t) const

Get the maximum representable finite value of a given type as a std::int64_t.

Parameters

t – The type.

Throws

poplar_error – if the value is not exactly representable as a std::int64_t.

template<>
std::uint64_t getTypeLimitsLowestAs<std::uint64_t>(const Type &t) const

Get the lowest representable finite value of a given type as a std::uint64_t.

Parameters

t – The type.

Throws

poplar_error – if the value is not exactly representable as a std::uint64_t.

template<>
std::int64_t getTypeLimitsLowestAs<std::int64_t>(const Type &t) const

Get the lowest representable finite value of a given type as a std::int64_t.

Parameters

t – The type.

Throws

poplar_error – if the value is not exactly representable as a std::int64_t.
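The "exactly representable" condition in the Throws entries above can be illustrated with a minimal host-side sketch. It is not Poplar's implementation: toExactInteger is a hypothetical helper, and std::range_error stands in for poplar_error so that the example has no Poplar dependency.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper (not part of Poplar): convert a double to the integer
// type T only when the value is exactly representable, otherwise throw.
template <typename T>
T toExactInteger(double value) {
  static_assert(std::is_integral<T>::value, "T must be an integer type");
  // 2^digits is the first value above T's maximum (2^64 or 2^63 here).
  const double upper = std::ldexp(1.0, std::numeric_limits<T>::digits);
  const double lower = std::is_signed<T>::value ? -upper : 0.0;
  const bool inRange = value >= lower && value < upper;  // false for NaN
  const bool isWhole = std::trunc(value) == value;
  if (!inRange || !isWhole) {
    // Stand-in for poplar_error, which this sketch does not depend on.
    throw std::range_error("value is not exactly representable");
  }
  return static_cast<T>(value);
}
```

Under this model, requesting a negative lowest value as a std::uint64_t, or a fractional value as either integer type, would throw rather than silently truncate.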

void copyDeviceHalfToFloat(const Target &target, const void *src, float *dst, std::size_t numElements)

Convert device half-precision values to floats.

Deprecated:

Use the poplar::convertFromDeviceType method appropriate for the data type.

Parameters
  • target – The target that the half-precision data is to be copied from.

  • src – The pointer to the start of the half-precision data.

  • dst – The pointer to the float data to write.

  • numElements – The number of items to convert.

void copyFloatToDeviceHalf(const Target &target, const float *src, void *dst, std::size_t numElements)

Convert float values to device half-precision values.

Deprecated:

Use the poplar::convertToDeviceType method appropriate for the data type.

Parameters
  • target – The target that the half-precision data is to be copied to.

  • src – The pointer to the float data to read.

  • dst – The pointer to the half-precision data to write.

  • numElements – The number of items to convert.

void copyDeviceHalfToDouble(const Target &target, const void *src, double *dst, std::size_t numElements)

Convert device half-precision values to doubles.

Deprecated:

Use the poplar::convertFromDeviceType method appropriate for the data type.

Parameters
  • target – The target that the half-precision data is to be copied from.

  • src – The pointer to the start of the half-precision data.

  • dst – The pointer to the double-precision data to write.

  • numElements – The number of items to convert.

void copyDoubleToDeviceHalf(const Target &target, const double *src, void *dst, std::size_t numElements)

Convert double-precision values to device half-precision values.

Deprecated:

Use the poplar::convertToDeviceType method appropriate for the data type.

Parameters
  • target – The target that the half-precision data is to be copied to.

  • src – The pointer to the double-precision data to read.

  • dst – The pointer to the half-precision data to write.

  • numElements – The number of items to convert.
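For orientation, the kind of work a half-to-float conversion involves can be sketched on the host. This sketch assumes the half values use the IEEE 754 binary16 layout and are stored in the host's byte order; both are assumptions, and halfBitsToFloat and decodeHalfBuffer are hypothetical names, not Poplar functions. Production code should use the Poplar conversion functions named above.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

// Decode one IEEE 754 binary16 pattern (1 sign, 5 exponent, 10 mantissa bits).
float halfBitsToFloat(std::uint16_t bits) {
  const bool negative = (bits >> 15) != 0;
  const int exponent = (bits >> 10) & 0x1F;
  const int mantissa = bits & 0x3FF;
  float magnitude;
  if (exponent == 0) {
    // Zero or subnormal: mantissa * 2^-24.
    magnitude = std::ldexp(static_cast<float>(mantissa), -24);
  } else if (exponent == 0x1F) {
    // All-ones exponent: infinity if the mantissa is zero, otherwise NaN.
    magnitude = mantissa ? std::numeric_limits<float>::quiet_NaN()
                         : std::numeric_limits<float>::infinity();
  } else {
    // Normal: (1024 + mantissa) * 2^(exponent - 15 - 10).
    magnitude = std::ldexp(static_cast<float>(mantissa | 0x400), exponent - 25);
  }
  return negative ? -magnitude : magnitude;
}

// Hypothetical host-only analogue of a "half buffer to float buffer" copy.
// Assumes each 16-bit value in src is stored in the host's byte order.
void decodeHalfBuffer(const void *src, float *dst, std::size_t numElements) {
  const auto *bytes = static_cast<const unsigned char *>(src);
  for (std::size_t i = 0; i < numElements; ++i) {
    std::uint16_t bits;
    std::memcpy(&bits, bytes + i * sizeof(bits), sizeof(bits));
    dst[i] = halfBitsToFloat(bits);
  }
}
```

The src parameter is typed as const void * here for the same reason as in the signatures above: the host has no native 16-bit floating-point type to point at.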

class Target
#include <poplar/Target.hpp>

A target representation.

The Target class holds characteristics of a compilation target and enables interaction with it.

Target creation options

  • ipuLinkConfiguration (fixedWindow, slidingWindow, barleyTwist) [=fixedWindow when <=16 IPUs, otherwise slidingWindow]

    The configuration used for the IPU-to-IPU connections. If it is not set, Poplar decides on a configuration based on the number of IPUs.

    • fixedWindow: Up to 16 IPUs are supported, where each IPU can communicate with every other IPU. This is an alias for the configuration previously named default.

    • slidingWindow: Similar to fixedWindow, but the window of IPUs with which an IPU may communicate changes. An even-indexed IPU may send to 7 IPUs up and 8 IPUs down, whereas an odd-indexed IPU may send to 6 IPUs up and 9 IPUs down. This is simplest to picture in the ladder configuration (see ipuLinkTopology): an IPU may send up to 3 rungs up and 4 rungs down.

    • barleyTwist: Nearest neighbour communication only. An IPU may send to only 3 other IPUs: one rung up, one rung down, and its neighbour on the same ladder rung (see ipuLinkTopology). This has double the bandwidth available compared to fixedWindow and slidingWindow.

  • syncConfiguration (intraReplicaAndAll, ipuAndAll) [=intraReplicaAndAll]

    The configuration of the hardware synchronisation groups. Note that the target.syncReplicasIndependently engine option determines which of the synchronisation groups is used for host synchronisation.

    • intraReplicaAndAll: The first sync group is used to sync IPUs within a replica and the second sync group is used to sync all IPUs.

    • ipuAndAll: The first sync group is used to sync each IPU independently with the host (if the target.syncReplicasIndependently option is set) and the second sync group is used to sync all IPUs.

  • ipuLinkTopology (mesh, torus) [=mesh]

    The topology of the IPU-Links. It describes how the IPUs in the system are connected.

    • mesh: The IPUs are connected as a ladder.

    • torus: The IPUs are connected as a ladder, with the top and bottom of the ladder linked together.

  • IpuLinkDomainSize Integer [=64]

    The number of IPUs connected via IPU-Links. Two IPU-Link domains can be connected together via GW-Links.
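The default selection rule for ipuLinkConfiguration and the slidingWindow reach described above can be written down as a small sketch. The function and struct names are hypothetical and only model the documented numbers; they do not call into Poplar.

```cpp
#include <string>

// Default described above: fixedWindow for up to 16 IPUs, else slidingWindow.
std::string defaultIpuLinkConfiguration(unsigned numIPUs) {
  return numIPUs <= 16 ? "fixedWindow" : "slidingWindow";
}

// Hypothetical struct: how many IPUs an IPU may send to in each direction.
struct SendWindow {
  unsigned up;
  unsigned down;
};

// slidingWindow reach as described above, selected by index parity.
SendWindow slidingWindowReach(unsigned ipuIndex) {
  return (ipuIndex % 2 == 0) ? SendWindow{7, 8} : SendWindow{6, 9};
}
```

In both parities the window covers 15 other IPUs, which matches the number of peers an IPU has in a full 16-IPU fixedWindow system.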

Public Functions

Target()
~Target()
Target(const Target&)
Target(Target&&) noexcept
Target(std::istream &in)

Load from a serialised target.

Parameters

in – The stream to read from.

Target &operator=(const Target&)
Target &operator=(Target&&) noexcept
bool operator==(const Target&) const
bool operator!=(const Target&) const
bool operator<(const Target&) const
std::size_t hash() const

Hash of the target.

void serialize(std::ostream &out) const

Serialize a target to a stream.

Currently the format is opaque, and compatibility between different versions of Poplar is not guaranteed.

Parameters

out – The stream to write to.

TargetType getTargetType() const

The target type.

std::string getTargetSystemString() const

The target system.

StringRef getTargetArchString() const

The target architecture.

unsigned getNumIPUs() const

The number of IPUs.

unsigned getTilesPerIPU() const

The number of tiles per IPU.

unsigned getNumWorkerContexts() const

The number of worker contexts per tile.

unsigned getBytesPerTile() const

Bytes of memory per tile.

unsigned getExchangeBytesPerCycle() const

The bandwidth of internal IPU exchange in bytes per cycle.

unsigned getMemcpyBytesPerCycle() const

The maximum bandwidth for internal data copies on a tile.

unsigned getMinIPUSyncDelay() const

The IPU sync delay for the tile that is closest to the sync controller.

unsigned getGlobalSyncCycles() const

The number of clock cycles required to synchronize all IPUs.

const std::vector<unsigned> &getMemoryElementOffsets() const

Memory element offsets.

unsigned getInterleavedMemoryElementIndex() const

Memory element offset index for interleaved memory.

const std::vector<GlobalExchangeConstraint> &getGlobalExchangeConstraints() const

Set of constraints that provide a lower bound on the time it takes to send data between IPUs.

unsigned getNumStrideBits() const
unsigned getDataPathWidth() const

The width of the load/store data path within the tile.

unsigned getFp8ConvUnitMaxPipelineDepth() const

The maximum pipeline depth of the convolution units within the tile for fp8.

unsigned getFp16ConvUnitMaxPipelineDepth() const

The maximum pipeline depth of the convolution units within the tile for fp16.

unsigned getFp32ConvUnitMaxPipelineDepth() const

The maximum pipeline depth of the convolution units within the tile for fp32.

unsigned getFp8ConvUnitInputLoadElemsPerCycle() const

The number of input elements loaded per cycle in the fp8 convolution unit.

unsigned getFp16ConvUnitInputLoadElemsPerCycle() const

The number of input elements loaded per cycle in the fp16 convolution unit.

unsigned getFp32ConvUnitInputLoadElemsPerCycle() const

The number of input elements loaded per cycle in the fp32 convolution unit.

unsigned getFp16InFp16OutConvUnitsPerTile() const

The number of convolution units in the tile that can be used when partial results are output as 16 bits and inputs are 16 bits.

unsigned getFp16InFp32OutConvUnitsPerTile() const

The number of convolution units in the tile that can be used when partial results are output as 32 bits and inputs are 16 bits.

unsigned getFp32InFp32OutConvUnitsPerTile() const

The number of convolution units in the tile that can be used when accumulating to 32-bit values.

unsigned getFp8InFp16OutConvUnitsPerTile() const

The number of convolution units in the tile that can be used when partial results are 16 bits and inputs are 8 bits.

unsigned getConvUnitCoeffLoadBytesPerCycle() const

The number of convolutional weights that can be loaded in a cycle.

unsigned getRptCountMax() const
bool supportsExchangeBusSharing() const

Whether tiles can share the local exchange bus during exchange.

unsigned getTilesPerSharedExchangeBus() const

The number of consecutive tiles that can share the exchange bus.

unsigned getNumTiles() const

Get the total number of tiles for this target (tiles per IPU * number of IPUs).

std::uint64_t getMemoryBytes() const

Get the total amount of memory on this target, across all IPUs.
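The two totals above are simple products, sketched here with hypothetical free functions. The tile count follows the formula given for getNumTiles(). The memory total assumes every tile contributes the same number of bytes, which is an assumption of this sketch rather than something this reference states.

```cpp
#include <cstdint>

// Total tiles, as described for getNumTiles(): tiles per IPU * number of IPUs.
unsigned totalTiles(unsigned tilesPerIPU, unsigned numIPUs) {
  return tilesPerIPU * numIPUs;
}

// Assumption: total memory is bytes per tile times the total tile count.
// Widen before multiplying so that large systems do not overflow 32 bits.
std::uint64_t totalMemoryBytes(unsigned bytesPerTile, unsigned tilesPerIPU,
                               unsigned numIPUs) {
  return static_cast<std::uint64_t>(bytesPerTile) * tilesPerIPU * numIPUs;
}
```

The widening cast shows why a 64-bit return type is useful for the memory total: the product of three unsigned values can exceed 32 bits even when each factor is small.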

unsigned getFloatVectorWidth() const

How many floats can be processed in one vector operation.

Equivalent to getDataPathWidth() / 32.

unsigned getHalfVectorWidth() const

How many halves can be processed in one vector operation.

Equivalent to getDataPathWidth() / 16.
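The relationship stated above between data path width and vector width is a single division, sketched below. vectorWidth is a hypothetical free function, and the widths used with it are illustrative values rather than properties of any particular target.

```cpp
#include <cassert>

// Elements per vector operation = data path width in bits / element width in bits.
unsigned vectorWidth(unsigned dataPathWidthBits, unsigned elementBits) {
  assert(elementBits > 0 && "element width must be non-zero");
  return dataPathWidthBits / elementBits;
}
```

For a hypothetical 64-bit data path, this gives 2 floats (32 bits each) or 4 halves (16 bits each) per vector operation.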

unsigned getQuarterVectorWidth() const
unsigned getQuarterMetadataVectorWidth() const
unsigned getVectorWidth(const poplar::Type &type) const

How many of the given type can be processed in one vector operation.

unsigned getWeightsPerConvUnit(const Type &type) const
unsigned getConvUnitInputLoadElemsPerCycle(const Type &type) const
unsigned getConvUnitMaxPipelineDepth(const Type &partialsType) const
unsigned getNumConvUnits(const Type &activationsType, const Type &partialsType) const
unsigned getMaxIPUSyncDelay() const

Get the maximum number of cycles required for an IPU sync in the best case scenario (all tiles are immediately ready).

double getTileClockFrequency() const

Get the tile clock frequency in Hertz.
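Because several of the values above are expressed in cycles or in bytes per cycle, the clock frequency is what converts them into wall-clock units. A minimal sketch of that unit conversion follows; the function names are hypothetical, and the sketch assumes the cycle counts and the frequency refer to the same clock.

```cpp
#include <cassert>

// Convert a cycle count to seconds using the clock frequency in Hertz.
double cyclesToSeconds(double cycles, double clockHz) {
  assert(clockHz > 0.0 && "clock frequency must be positive");
  return cycles / clockHz;
}

// Convert a per-cycle byte rate into bytes per second.
double bytesPerSecond(double bytesPerCycle, double clockHz) {
  assert(clockHz > 0.0 && "clock frequency must be positive");
  return bytesPerCycle * clockHz;
}
```

For example, a value reported in cycles (such as a sync delay) divided by the frequency gives a duration in seconds, and a bytes-per-cycle figure multiplied by the frequency gives a bandwidth in bytes per second.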

unsigned getNumTilesPerXBContext() const

Get the number of tiles per exchange-block context (with repair).

unsigned getNumContextsPerXB() const

Get the number of contexts per exchange-block.

unsigned getTileHostExchangeXB(unsigned tile) const

Get the XB of a tile.

unsigned getTileHostExchangeContext(unsigned tile) const

Get the context of a tile within an XB.

unsigned getTileHostExchangeContextPosition(unsigned tile) const

Get the position of a tile within a context.

std::size_t getTypeSize(const Type&) const

Get the size of a given type in bytes.

template<typename T>
T getTypeLimitsMaxAs(const Type&) const = delete

Get the maximum representable finite value of a given type.

Template parameter specifies return type.

The base templated method is deleted; see the declared specialisations of this method for valid return types.

Parameters

t – The type.

template<typename T>
T getTypeLimitsLowestAs(const Type&) const = delete

Get the lowest representable finite value of a given type.

Template parameter specifies return type.

The base templated method is deleted; see the declared specialisations of this method for valid return types.

Parameters

t – The type.

std::size_t getAtomicStoreGranularity() const

Get the granularity of atomic stores that can be made by independent parallel worker threads.

Returns

The granularity in bytes.

uint32_t makeFpIctlValue(bool inv, bool div0, bool oflo, bool esr, bool nanoo) const

Generate a value that could be written to the Floating Point Initial Control Value register CSR_S.FP_ICTL in order to configure it with the specified options.

Parameters
  • inv

    If true, a floating-point invalid operation (defined by IEEE 754) will cause an exception.

    The invalid operations are:

    • Addition or subtraction where the operands are + or - infinity (inf) and the operation results in the subtraction of two infs; for example: (-inf)+(+inf) or (+inf)-(+inf).

    • Divisions: (+/-0)/(+/-0) and (+/-inf)/(+/-inf).

    • Multiplications: (+/-0)*(+/-inf) and (+/-inf)*(+/-0).

    • Remainder: x REM y where y=0 or x=(+/-inf).

    • Real operations with complex results such as the square root or logarithm of a negative number.

    • Operations with Not-a-Number as at least one operand.

    • Comparisons where one of the operands is Not-a-Number.

      See also nanoo below.

  • div0 – If true, a floating-point divide-by-zero operation will cause an exception.

  • oflo – If true, a floating-point overflow will cause an exception.

  • esr – Enable stochastic rounding.

  • nanoo – Enable Not-a-Number on overflow mode. When enabled, half-precision calculations that have overflowed will produce a Not-a-Number result, rather than saturating to the half-precision max/min value, and the invalid operation (inv) flag will be set.
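The IEEE 754 conditions that the inv and div0 options refer to can be observed on a host CPU with the standard floating-point environment. The sketch below is a host-only illustration using <cfenv> status flags. It does not configure an IPU and does not call makeFpIctlValue, and the helper names are hypothetical.

```cpp
#include <cfenv>
#include <limits>

// True if computing a - b on the host raises the IEEE 754 "invalid" flag.
bool subtractionRaisesInvalid(double a, double b) {
  volatile double lhs = a;  // volatile stops the compiler folding the operation
  volatile double rhs = b;
  std::feclearexcept(FE_ALL_EXCEPT);
  volatile double result = lhs - rhs;
  (void)result;
  return std::fetestexcept(FE_INVALID) != 0;
}

// True if computing a / b on the host raises the "divide by zero" flag.
bool divisionRaisesDivideByZero(double a, double b) {
  volatile double lhs = a;
  volatile double rhs = b;
  std::feclearexcept(FE_ALL_EXCEPT);
  volatile double result = lhs / rhs;
  (void)result;
  return std::fetestexcept(FE_DIVBYZERO) != 0;
}

// The (+inf)-(+inf) case from the list of invalid operations above.
bool infMinusInfIsInvalid() {
  const double inf = std::numeric_limits<double>::infinity();
  return subtractionRaisesInvalid(inf, inf);
}
```

On the host these conditions only set status flags. The options above instead describe whether the corresponding condition causes an exception on the device.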

unsigned getFpIctlRegIndex() const

Return the register index of the Floating Point Initial Control Value register CSR_S.FP_ICTL.

unsigned getDbgDataRegIndex() const

Return the register index of CSR_C.DBG_DATA.

IpuLinkConfiguration getIpuLinkConfiguration() const

Return the IPU-Link configuration of this target.

IpuLinkTopology getIpuLinkTopology() const

Return the IPU-Link topology.

unsigned getIpuLinkDomainSize() const

Return the size of the IPU-Link domain.

That is, the number of IPUs that are connected via IPU-Links.

unsigned getInstanceSize() const
bool getGatewayMode() const
Target createVirtualTarget(unsigned numIPUs, unsigned tilesPerIPU) const

Create a “virtual” target consisting of a subset of the target’s tiles.

This method returns a target object that references the same state as this target but only uses a subset of the target’s tiles.

Parameters
  • numIPUs – The number of IPUs the target should be for.

  • tilesPerIPU – The number of tiles per IPU.

Returns

The virtual target object.

Target(std::unique_ptr<core::Target>&&) noexcept
inline core::Target &getImpl() const
const core::TargetOptions &getTargetOptions() const
template<>
double getTypeLimitsMaxAs(const Type &t) const

Get the maximum representable finite value of a given type as a double.

Parameters

t – The type.

Throws

poplar_error – if the value is not exactly representable as a double.

template<>
double getTypeLimitsLowestAs(const Type &t) const

Get the lowest representable finite value of a given type as a double.

Parameters

t – The type.

Throws

poplar_error – if the value is not exactly representable as a double.

Public Static Functions

static Target createCPUTarget(bool accurateHalf = false, unsigned numIPUs = 1)

Create a CPU target.

Create a target for executing a simple graph on the CPU.

This should only be used for simple functional testing.

Parameters
  • accurateHalf – By default, half is an alias for float, and sizeof(half) will be 4. If you set accurateHalf to true, half will be implemented in software as 16-bit IEEE floating point. This will be slower, but will produce the same results as the IPU.

  • numIPUs – The number of IPUs in the target. The IPUs will each have 1 tile with 1 worker thread.

Returns

A Target object that can be used to create a graph.

static Target createIPUTarget(StringRef system, const OptionFlags &opts = {})

Create an IPU target.

Create an IPU target with all IPUs for the system based on the given system type.

Valid system types are:

  • IPU-POD16

  • IPU-POD64

  • IPU-POD128

  • IPU-POD256

  • IPU-POD4-DA

  • IPU-POD16-DA

Parameters
  • system – The type of the IPU system.

  • opts – The options passed to the target.

Returns

A Target object that can be used to create a graph.

static Target createIPUTarget(unsigned numIPUs, StringRef system, const OptionFlags &opts = {})

Create an IPU target.

Create an IPU target with a specified number of IPUs based on the given system type.

Valid system types are:

  • IPU-POD16

  • IPU-POD64

  • IPU-POD128

  • IPU-POD256

  • IPU-POD4-DA

  • IPU-POD16-DA

Parameters
  • numIPUs – The number of IPUs the target should be for.

  • system – The type of the IPU system.

  • opts – The options passed to the target.

Returns

A Target object that can be used to create a graph.

static Target createIPUTarget(unsigned numIPUs, unsigned tilesPerIPU, StringRef system, const OptionFlags &opts = {})

Create an IPU target with a virtual number of tiles.

Create an IPU target with a specified number of IPUs based on the given system type. In addition, the number of tiles can be restricted to a smaller virtual number of observable tiles.

Valid system types are:

  • IPU-POD16

  • IPU-POD64

  • IPU-POD128

  • IPU-POD256

  • IPU-POD4-DA

  • IPU-POD16-DA

Parameters
  • numIPUs – The number of IPUs the target should be for.

  • tilesPerIPU – The number of tiles per IPU.

  • system – The type of the IPU system.

  • opts – The options passed to the target.

Returns

A Target object that can be used to create a graph.

static Target createIPUTarget(unsigned numIPUs, StringRef system, const core::TargetOptions &opts)

Create an IPU target.

Create an IPU target with a specified number of IPUs based on the given system type.

Deprecated:

Use createIPUTarget(unsigned numIPUs, StringRef system, const OptionFlags &opts) instead.

Parameters
  • numIPUs – The number of IPUs the target should be for.

  • system – The type of the IPU system.

  • opts – The options passed to the target.

Returns

A Target object that can be used to create a graph.

static Target createIPUTarget(unsigned numIPUs, unsigned tilesPerIPU, StringRef system, const core::TargetOptions &opts)

Create an IPU target with a virtual number of tiles, and target options.

Create an IPU target with a specified number of IPUs based on the given system type. In addition, the number of tiles can be restricted to a smaller virtual number of observable tiles. This overload also accepts target options that can be obtained from another target.

Deprecated:

Use createIPUTarget(unsigned numIPUs, unsigned tilesPerIPU, StringRef system, const OptionFlags &opts) instead.

Parameters
  • numIPUs – The number of IPUs the target should be for.

  • tilesPerIPU – The number of tiles per IPU.

  • system – The type of the IPU system.

  • opts – The options passed to the target.

Returns

A Target object that can be used to create a graph.

Private Functions

template<typename T>
T getTypeLimitsMaxAsImpl(const Type&) const
template<typename T>
T getTypeLimitsLowestAsImpl(const Type&) const

Private Members

std::unique_ptr<core::Target> impl
namespace core
namespace std
template<>
struct hash<poplar::Target>

Public Functions

inline size_t operator()(const poplar::Target &t) const
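A std::hash specialisation of this shape is what allows a type to be used as a key in unordered containers. The sketch below shows the pattern with a hypothetical stand-in type, demo::Target, so that it compiles without Poplar. That the specialisation forwards to the member hash() is an assumption of this sketch, not something this reference states.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

namespace demo {
// Hypothetical stand-in for a class that exposes hash() and operator==.
struct Target {
  unsigned numIPUs;
  unsigned tilesPerIPU;
  std::size_t hash() const {
    return std::hash<unsigned>{}(numIPUs) * 31u +
           std::hash<unsigned>{}(tilesPerIPU);
  }
  bool operator==(const Target &other) const {
    return numIPUs == other.numIPUs && tilesPerIPU == other.tilesPerIPU;
  }
};
} // namespace demo

namespace std {
// Assumption: the specialisation simply forwards to the member hash().
template <> struct hash<demo::Target> {
  size_t operator()(const demo::Target &t) const { return t.hash(); }
};
} // namespace std

// With the specialisation in place the type can key an unordered container.
std::size_t countCachedEntries() {
  std::unordered_map<demo::Target, std::string> cache;
  cache[demo::Target{1, 4}] = "small";
  cache[demo::Target{2, 4}] = "large";
  cache[demo::Target{1, 4}] = "small again";  // same key, overwrites
  return cache.size();
}
```

Ordered containers such as std::map rely on operator< instead, which the class also declares.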