IPUModel

#include <poplar/IPUModel.hpp>
namespace poplar

Poplar classes and functions.

A VectorList is a list of vectors with a specific layout and with the usage semantics of a 2D vector.

A 1D vector must be laid out in a contiguous memory region. A 2D vector is a vector of 1D vectors. Each of these 1D vectors that make up a 2D vector is called a “sub-vector” for the remainder of this document. The elements of a 2D vector can be accessed by indexing the 2D vector along the outer and inner dimensions as A[outer][inner].

The following two categories of layouts are supported:

  1. VectorListLayout::DELTANELEMENTS is a memory-efficient 2D vector layout. On legacy systems, VectorListLayout::DELTAN served a similar purpose. Each sub-vector must be laid out as a contiguous memory region, but the sub-vectors need not be contiguous with respect to each other in memory. Each sub-vector may have a different length.

  2. VectorListLayout::ONE_PTR and other VectorListLayout layouts that are prefixed by SCALED_PTR are for Poplar runtime use only.
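The layout rules above (contiguous sub-vectors, possibly non-contiguous with respect to each other, possibly of different lengths) can be pictured with a small sketch. This is not the Poplar VectorList implementation: `SubVectorRef`, `JaggedView` and `makeExampleView` are hypothetical names, and the (offset, length) bookkeeping is only an assumption about how such a layout might be described.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for illustration only; not a Poplar type.
// Describes one sub-vector by where it starts in a backing buffer
// and how many elements it holds.
struct SubVectorRef {
  std::size_t offset; // index of the first element in the backing buffer
  std::size_t length; // number of elements in this sub-vector
};

// A read-only 2D view with A[outer][inner] semantics over jagged rows.
class JaggedView {
public:
  // One contiguous sub-vector, exposed with 1D indexing.
  struct Row {
    const int *data;
    std::size_t length;
    std::size_t size() const { return length; }
    const int &operator[](std::size_t inner) const {
      assert(inner < length);
      return data[inner];
    }
  };

  JaggedView(std::vector<int> storage, std::vector<SubVectorRef> refs)
      : storage_(std::move(storage)), refs_(std::move(refs)) {}

  // Number of sub-vectors (the outer dimension).
  std::size_t size() const { return refs_.size(); }

  Row operator[](std::size_t outer) const {
    assert(outer < refs_.size());
    const SubVectorRef &r = refs_[outer];
    assert(r.offset + r.length <= storage_.size());
    return Row{storage_.data() + r.offset, r.length};
  }

private:
  std::vector<int> storage_;
  std::vector<SubVectorRef> refs_;
};

// Three sub-vectors of lengths 3, 1 and 2. Buffer elements 3 and 4 are
// unused padding, so the first two sub-vectors are not adjacent in memory.
inline JaggedView makeExampleView() {
  return JaggedView({10, 11, 12, 0, 0, 20, 30, 31},
                    {{0, 3}, {5, 1}, {6, 2}});
}
```

Each row is contiguous in the backing buffer, while the gap between the first and second rows shows that the rows themselves need not be adjacent.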

struct IPUModel
#include <IPUModel.hpp>

A model of an IPU, used to create an IPUModel Device. The IPU Model simulates the behaviour of the IPU hardware.

It will not completely implement every aspect of a real IPU.

Public Types

enum class RelativeSyncDelayType

A function that returns the number of cycles before the specified tile is released from sync relative to the first tile that is released from sync.

Values:

enumerator AUTO
enumerator NO_DELAY

Public Functions

explicit IPUModel(char const *IPUVersion = "ipu2")
bool operator==(const IPUModel&) const
bool operator!=(const IPUModel&) const
Device createDevice(OptionFlags opts = {}, bool accurateHalf = false, unsigned deviceManagerId = std::numeric_limits<unsigned>::max())

Create a device that runs code on the CPU and models the performance that would be achieved on an IPU.

Public Members

std::string IPUVersion

Valid values for IPUVersion are “ipu1” and “ipu2” (for the Mk1 and Mk2 IPU architectures respectively).

unsigned numIPUs

The number of IPUs.

unsigned tilesPerSuperTile

The number of tiles per supertile.

unsigned tilesPerIPU

The number of tiles per IPU.

unsigned numWorkerContexts

The number of worker contexts per tile.

unsigned memoryBytesPerTile

The amount of memory per tile in bytes.

double tileClockFrequency

Clock frequency in Hz.

unsigned exchangeBytesPerCycle

The bandwidth of internal IPU exchange in bytes per cycle.
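As a rough illustration of how the clock frequency and the exchange bandwidth combine, the sketch below converts a byte count into cycles and then into seconds. It is only arithmetic on the two documented fields: `ModelTiming`, `exchangeCycles` and `cyclesToSeconds` are hypothetical names, and the estimate deliberately ignores sync delays, latency and contention, which the real model may account for.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in holding only the two documented fields used here;
// it is not the poplar::IPUModel struct.
struct ModelTiming {
  double tileClockFrequency;      // clock frequency in Hz
  unsigned exchangeBytesPerCycle; // internal exchange bandwidth
};

// Cycles needed to move `bytes` at the stated bandwidth, rounded up.
inline std::uint64_t exchangeCycles(const ModelTiming &m,
                                    std::uint64_t bytes) {
  assert(m.exchangeBytesPerCycle > 0);
  return (bytes + m.exchangeBytesPerCycle - 1) / m.exchangeBytesPerCycle;
}

// Wall-clock time represented by a cycle count at the tile clock rate.
inline double cyclesToSeconds(const ModelTiming &m, std::uint64_t cycles) {
  assert(m.tileClockFrequency > 0.0);
  return static_cast<double>(cycles) / m.tileClockFrequency;
}
```

With made-up values of 1 GHz and 4 bytes per cycle, moving 10 bytes takes 3 cycles under this estimate, because the partial final transfer is rounded up to a whole cycle.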

unsigned memcpyBytesPerCycle

The number of bytes per cycle that can be copied from one location to another using a memcpy.

unsigned instructionBytes

The size of an instruction in bytes.

bool supportsSuperTileSendReceive

Whether a tile in a supertile can use all the exchange bandwidth of the supertile to send or receive, when the other tile is idle or receiving the same data.

unsigned interleavedMemoryElementIndex

Index in the memoryElementOffsets table (returned by Target::getMemoryElementOffsets) which gives the start of the interleaved memory region.

Any value greater than or equal to the size of the offsets table is interpreted as the machine not having interleaved memory elements. Note that, by definition, interleaved memory is always in the upper part of memory.
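The index rule can be expressed as a small check. This is a sketch under assumptions: the helper names `hasInterleavedMemory` and `interleavedStart` are hypothetical, the offsets table is passed in as a plain `std::vector<unsigned>` (the element type is assumed), and the values in any example are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True when the index points inside the offsets table, i.e. the machine
// is treated as having interleaved memory elements.
inline bool hasInterleavedMemory(const std::vector<unsigned> &offsets,
                                 unsigned interleavedMemoryElementIndex) {
  return interleavedMemoryElementIndex < offsets.size();
}

// Start offset of the interleaved region. Only valid when
// hasInterleavedMemory() returns true for the same arguments.
inline unsigned interleavedStart(const std::vector<unsigned> &offsets,
                                 unsigned interleavedMemoryElementIndex) {
  assert(hasInterleavedMemory(offsets, interleavedMemoryElementIndex));
  return offsets[interleavedMemoryElementIndex];
}
```

For a four-entry table, an index of 2 selects the third offset as the start of the interleaved region, while an index of 4 or more means there is no interleaved memory.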

enum poplar::IPUModel::RelativeSyncDelayType relativeSyncDelay
unsigned minIPUSyncDelay

The IPU sync delay for the tile that is closest to the sync controller.

unsigned globalSyncCycles

The number of clock cycles required to synchronize all IPUs.

std::vector<GlobalExchangeConstraint> globalExchangeConstraints

Set of constraints that provide a lower bound on the time it takes to send data between IPUs.

unsigned globalExchangePacketBytes

The size in bytes of the packet used to transfer data between tiles.

unsigned tileLocalSyncSyncDelay

Number of cycles from issuing a sync instruction to the earliest time that instructions can resume.

unsigned tileLocalSyncExitDelay

Number of cycles after a worker has issued its exit instruction that the supervisor can resume.

unsigned numStrideBits

Number of stride bits.

unsigned dataPathWidth

The width of the load/store data path within the tile.

unsigned fp8ConvUnitMaxPipelineDepth

The maximum pipeline depth of the convolution units within the tile for fp8.

unsigned fp16ConvUnitMaxPipelineDepth

The maximum pipeline depth of the convolution units within the tile for fp16.

unsigned fp32ConvUnitMaxPipelineDepth

The maximum pipeline depth of the convolution units within the tile for fp32.

Only an AMP loop of at most 4 cycles is allowed.

unsigned fp8ConvUnitInputLoadElemsPerCycle

The number of input elements loaded per cycle for fp8 convolutions.

unsigned fp16ConvUnitInputLoadElemsPerCycle

The number of input elements loaded per cycle for fp16 convolutions.

unsigned fp32ConvUnitInputLoadElemsPerCycle

The number of input elements loaded per cycle for fp32 convolutions.

unsigned fp16InFp16OutConvUnitsPerTile

The number of convolution units in the tile that can be used when partial results are output as 16 bits and inputs are 16 bits.

unsigned fp16InFp32OutConvUnitsPerTile

The number of convolution units in the tile that can be used when partial results are output as 32 bits and inputs are 16 bits.

unsigned fp32InFp32OutConvUnitsPerTile

The number of convolution units in the tile that can be used when accumulating to 32-bit values.

unsigned fp8InFp16OutConvUnitsPerTile

The number of convolution units in the tile that can be used when partial results are output as 16 bits and inputs are 8 bits.

unsigned convUnitCoeffLoadBytesPerCycle

The number of convolutional weights that can be loaded in a cycle.

unsigned supervisorInstrFetchDelay

The number of bytes ahead of the current PC that supervisor contexts may be loading instructions from memory.

unsigned workerInstrFetchDelay

The number of bytes ahead of the current PC that worker contexts may be loading instructions from memory.

unsigned maxImmediateOffsetInRunInstr

The maximum range of the immediate operand in the run instruction. The zimm16 operand is implicitly multiplied by 4 when added to the register operand.
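The implicit scaling can be illustrated with plain arithmetic. This sketch assumes that zimm16 is an unsigned 16-bit immediate and that the register operand holds a byte address. `runTargetAddress` is a hypothetical helper, not a Poplar function, and it makes no claim about the units of maxImmediateOffsetInRunInstr itself.

```cpp
#include <cassert>
#include <cstdint>

// Adds the immediate, scaled by 4, to the value of the register operand.
// Assumption: zimm16 is an unsigned 16-bit immediate.
inline std::uint32_t runTargetAddress(std::uint32_t registerValue,
                                      std::uint16_t zimm16) {
  return registerValue + 4u * static_cast<std::uint32_t>(zimm16);
}
```

Under these assumptions an immediate of 3 added to a register value of 0x1000 gives 0x100C, and the largest 16-bit immediate reaches 262140 bytes past the register value.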

unsigned rptCountMax
unsigned atomicStoreGranularity

The atomic store granularity.

bool compileIPUCode

Whether or not to actually compile real IPU code for modelling.