Poplar and PopLibs
A model of an IPU, used to create an IPUModel Device. The IPU Model simulates the behaviour of the IPU hardware.
#include <IPUModel.hpp>
Public Types | |
enum class | RelativeSyncDelayType |
A function that returns the number of cycles before the specified tile is released from sync, relative to the first tile that is released from sync. | |
Public Member Functions | |
Device | createDevice (OptionFlags opts={}, bool accurateHalf=false, unsigned deviceManagerId=std::numeric_limits< unsigned >::max()) |
Create a device that runs code on the CPU and models the performance that would be achieved on an IPU. | |
Public Attributes | |
std::string | IPUVersion |
Valid values for IPUVersion are "ipu1" and "ipu2" (for Mk1 and Mk2 IPU architectures respectively) | |
unsigned | numIPUs |
The number of IPUs. | |
unsigned | tilesPerSuperTile |
The number of tiles per supertile. | |
unsigned | tilesPerIPU |
The number of tiles per IPU. | |
unsigned | numWorkerContexts |
The number of worker contexts per tile. | |
unsigned | memoryBytesPerTile |
Memory bytes per tile. | |
double | tileClockFrequency |
Clock frequency in Hz. | |
unsigned | exchangeBytesPerCycle |
The bandwidth of internal IPU exchange in bytes per cycle. | |
unsigned | memcpyBytesPerCycle |
The number of bytes per cycle that can be copied from one location to another using a memcpy. | |
unsigned | instructionBytes |
The size of an instruction in bytes. | |
bool | supportsSuperTileSendReceive |
Whether a tile in a supertile can use all the exchange bandwidth of the supertile to send or receive, when the other tile is idle or receiving the same data. | |
unsigned | interleavedMemoryElementIndex |
Index in the memoryElementOffsets table (returned by Target::getMemoryElementOffsets) which gives the start of the interleaved memory region. | |
unsigned | minIPUSyncDelay |
The IPU sync delay for the tile that is closest to the sync controller. | |
unsigned | globalSyncCycles |
The number of clock cycles required to synchronize all IPUs. | |
std::vector< GlobalExchangeConstraint > | globalExchangeConstraints |
Set of constraints that provide a lower bound on the time it takes to send data between IPUs. | |
unsigned | globalExchangePacketBytes |
Size, in bytes, of the packet used to transfer data between tiles. | |
unsigned | tileLocalSyncSyncDelay |
Number of cycles from issuing a sync instruction to the earliest time that instructions can resume. | |
unsigned | tileLocalSyncExitDelay |
Number of cycles after a worker has issued its exit instruction that the supervisor can resume. | |
unsigned | numStrideBits |
Number of stride bits. | |
unsigned | dataPathWidth |
The width of the load/store data path within the tile. | |
unsigned | fp8ConvUnitMaxPipelineDepth |
The maximum pipeline depth of the convolution units within the tile for fp8. | |
unsigned | fp16ConvUnitMaxPipelineDepth |
The maximum pipeline depth of the convolution units within the tile for fp16. | |
unsigned | fp32ConvUnitMaxPipelineDepth |
The maximum pipeline depth of the convolution units within the tile for fp32. | |
unsigned | fp8ConvUnitInputLoadElemsPerCycle |
The number of input elements loaded per cycle for fp8 convolution. | |
unsigned | fp16ConvUnitInputLoadElemsPerCycle |
The number of input elements loaded per cycle for fp16 convolution. | |
unsigned | fp32ConvUnitInputLoadElemsPerCycle |
The number of input elements loaded per cycle for fp32 convolution. | |
unsigned | fp16InFp16OutConvUnitsPerTile |
The number of convolution units in the tile that can be used when partial results are output as 16 bits and inputs are 16 bits. | |
unsigned | fp16InFp32OutConvUnitsPerTile |
The number of convolution units in the tile that can be used when partial results are output as 32 bits and inputs are 16 bits. | |
unsigned | fp32InFp32OutConvUnitsPerTile |
The number of convolution units in the tile that can be used when accumulating to 32-bit values. | |
unsigned | fp8InFp16OutConvUnitsPerTile |
The number of convolution units in the tile that can be used when partial results are output as 16 bits and inputs are 8 bits. | |
unsigned | convUnitCoeffLoadBytesPerCycle |
The number of convolutional weights that can be loaded in a cycle. | |
unsigned | supervisorInstrFetchDelay |
Number of bytes ahead of the current PC from which a supervisor context may be loading instructions from memory. | |
unsigned | workerInstrFetchDelay |
Number of bytes ahead of the current PC from which a worker context may be loading instructions from memory. | |
unsigned | maxImmediateOffsetInRunInstr |
Maximum range of the immediate operand in the run instruction. The zimm16 operand is implicitly multiplied by 4 when it is added to the register operand. | |
unsigned | atomicStoreGranularity |
The atomic store granularity. | |
bool | compileIPUCode |
Whether or not to actually compile real IPU code for modelling. | |
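The scaling described for maxImmediateOffsetInRunInstr can be illustrated with a small arithmetic sketch. It assumes that zimm16 is a 16-bit zero-extended immediate, which this page does not state, and the helper names are hypothetical. It does not claim that the largest scaled value computed here is the value stored in the attribute.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: zimm16 is a 16-bit zero-extended immediate (not stated above).
constexpr std::uint32_t kZimm16Max = 0xFFFFu;
// Implicit scale applied to the immediate, as described for the run instruction.
constexpr std::uint32_t kImmScale = 4u;

// Hypothetical helper: address formed from a register operand plus the
// implicitly scaled immediate.
inline std::uint32_t runTargetAddress(std::uint32_t reg, std::uint32_t zimm16) {
  assert(zimm16 <= kZimm16Max); // the immediate must fit in 16 bits
  return reg + zimm16 * kImmScale;
}

// Largest byte offset reachable through the immediate under the assumption above.
inline std::uint32_t maxScaledImmediate() { return kZimm16Max * kImmScale; }
```

With these definitions, an immediate of 3 added to a register value of 0x100 gives 0x10C.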
A model of an IPU, used to create an IPUModel Device. The IPU Model simulates the behaviour of the IPU hardware.
It does not completely implement every aspect of a real IPU.
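As a rough illustration of how some of the public attributes relate to each other, the sketch below derives a few aggregate figures. ModelParams is a hypothetical stand-in that copies a handful of attribute names so the example compiles without the Poplar SDK; it is not the poplar::IPUModel class. The bytes-per-second conversion assumes that the cycle in exchangeBytesPerCycle refers to the tile clock.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in mirroring a few poplar::IPUModel attribute names.
// The caller supplies the values; none are defaults from the library.
struct ModelParams {
  unsigned numIPUs;
  unsigned tilesPerIPU;
  unsigned memoryBytesPerTile;
  double tileClockFrequency;      // Hz
  unsigned exchangeBytesPerCycle; // bytes per cycle
};

// Total number of tiles across all modelled IPUs.
inline std::uint64_t totalTiles(const ModelParams &p) {
  return static_cast<std::uint64_t>(p.numIPUs) * p.tilesPerIPU;
}

// Total tile memory across all modelled IPUs, in bytes.
inline std::uint64_t totalMemoryBytes(const ModelParams &p) {
  return totalTiles(p) * p.memoryBytesPerTile;
}

// Internal exchange bandwidth in bytes per second, assuming the cycle
// count is measured against the tile clock.
inline double exchangeBytesPerSecond(const ModelParams &p) {
  assert(p.tileClockFrequency > 0.0);
  return p.exchangeBytesPerCycle * p.tileClockFrequency;
}
```

In real code the same arithmetic would read the fields directly from an IPUModel instance before calling createDevice.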
unsigned poplar::IPUModel::fp32ConvUnitMaxPipelineDepth |
The maximum pipeline depth of the convolution units within the tile for fp32.
Only a maximum of a 4-cycle AMP loop is allowed.
unsigned poplar::IPUModel::interleavedMemoryElementIndex |
Index in the memoryElementOffsets table (returned by Target::getMemoryElementOffsets) which gives the start of the interleaved memory region.
Any value greater than or equal to the size of the offsets table is interpreted as the machine not having interleaved memory elements. Note that, by definition, interleaved memory is always in the upper part of memory.
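The out-of-range rule above can be sketched as a bounds check on the offsets table. The function names are hypothetical, and the use of std::vector<unsigned> for the table is an assumption about what Target::getMemoryElementOffsets returns.

```cpp
#include <cassert>
#include <vector>

// True when the index points inside the offsets table, meaning the modelled
// machine has an interleaved memory region.
inline bool hasInterleavedMemory(const std::vector<unsigned> &memoryElementOffsets,
                                 unsigned interleavedMemoryElementIndex) {
  return interleavedMemoryElementIndex < memoryElementOffsets.size();
}

// Start offset of the interleaved region. Callers must check
// hasInterleavedMemory first; an out-of-range index means no such region.
inline unsigned interleavedRegionStart(
    const std::vector<unsigned> &memoryElementOffsets,
    unsigned interleavedMemoryElementIndex) {
  assert(hasInterleavedMemory(memoryElementOffsets, interleavedMemoryElementIndex));
  return memoryElementOffsets[interleavedMemoryElementIndex];
}
```

For a table with three entries, an index of 2 selects the last entry as the start of the interleaved region, while an index of 3 or more signals that there is none.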