5. PopDist C++ API reference

#include <popdist/backend.hpp>
namespace popdist

Functions

void registerDefaultBackend()

Automatically registers the default backend for PopDist.

PopDist tries to locate the default backend automatically and register it. An error is thrown if the backend cannot be found. The default backend for PopDist is based on OpenMPI.

void registerBackend(const std::string &file_path_backend)

Registers the provided shared object as the backend for PopDist.

PopDist backends allow for system-specific features, such as shared buffers between instances and replicas.

Parameters

file_path_backend – The path to the .so file containing the backend.

void initializeBackend()

Initialize PopDist backend.

Calls the initialize() function defined by the PopDist backend. Depending on the backend implementation, calling it multiple times (or simultaneously) can cause erroneous behavior. We recommend using the Python API (popdist.initializeBackend()) whenever possible.

bool isBackendInitialized()

Checks whether the backend has been initialized.

Returns

true if the backend is initialized and false otherwise.

void finalizeBackend()

Finalize PopDist backend.

Calls the finalize() function defined by the PopDist backend. Depending on the backend implementation, calling it multiple times (or simultaneously) can cause erroneous behavior. We recommend using the Python API (popdist.finalizeBackend()) whenever possible.
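Because repeated or concurrent calls to initializeBackend() and finalizeBackend() can misbehave, a C++ caller may want to guard them. The following is a minimal sketch in standard C++ only. OnceGuard is a hypothetical helper, not part of PopDist; in a real program the guarded action would call popdist::initializeBackend() or popdist::finalizeBackend(). It only prevents repeat calls within one process and does not coordinate between instances.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

// Hypothetical user-side helper (not part of PopDist): runs an action at most
// once per process, even when several threads call run() concurrently.
class OnceGuard {
 public:
  // Runs `action` if it has not completed before. Returns true if this call
  // ran it. If `action` throws, the guard stays open so a later call can retry.
  bool run(const std::function<void()> &action) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (done_) return false;
    action();
    done_ = true;
    return true;
  }

 private:
  std::mutex mutex_;
  bool done_ = false;
};
```

Use one OnceGuard for initialization and a separate one for finalization, so that finalizing cannot be skipped because initialization already ran.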

constexpr const char *defaultCommunicatorId()

A default communicator ID that can be passed to functions accepting a communicator_id argument.

void registerCommunicator(const std::set<uint32_t> &participants, const std::string &communicator_id = defaultCommunicatorId())

Register a new communicator that will be used for collectives/synchronization.

This is a collective function for registering new communicators, so all instances must call it with the same arguments. The instances listed in participants can then call collectives or synchronize() using the same communicator_id and participants arguments; instances not listed must not take part in that collective or synchronization call. This function only needs to be called when registering communicators for subsets of instances. A collective call that needs a custom communicator_id but involves all instances can pass the new ID to the collective function directly, without registering it first.

Parameters
  • participants – A subset of instances which will use the newly registered communicator. Default value indicates that all instances will use the communicator.

  • communicator_id – An ID associated with the new communicator.
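The participants arguments on this page share a convention: an empty set, which is the default for synchronize(), run() and the collectives, means all instances. The sketch below models that rule with the standard library only. effectiveParticipants() and shouldParticipate() are hypothetical helpers, and numInstances stands for a value such as getNumInstances().

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Models the documented convention: an empty participants set means
// "all instances"; otherwise only the listed instances take part.
std::set<uint32_t> effectiveParticipants(const std::set<uint32_t> &participants,
                                         uint32_t numInstances) {
  if (!participants.empty()) return participants;
  std::set<uint32_t> all;
  for (uint32_t i = 0; i < numInstances; ++i) all.insert(i);
  return all;
}

// True if `instance` should make the collective/synchronization call that
// uses this participants set. Non-members must not make the call.
bool shouldParticipate(uint32_t instance, const std::set<uint32_t> &participants,
                       uint32_t numInstances) {
  return effectiveParticipants(participants, numInstances).count(instance) == 1;
}
```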

void synchronize(const std::string &communicator_id = defaultCommunicatorId(), const std::set<uint32_t> &participants = {})

Synchronizes code execution over instances.

Creates a barrier across the selected instances: each instance that reaches the barrier is halted until all selected instances have reached it. This function is not thread-safe and must be called from only one thread per instance.

Parameters
  • communicator_id – An ID of the communicator used for this call.

  • participants – A subset of instances which participate in the barrier. Default value indicates that all instances participate.
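The barrier semantics described above can be modeled within one process, with std::thread standing in for instances. This is a minimal sketch of the behavior only, not how a PopDist backend implements synchronize(); the Barrier class and everyoneArrivedBeforeRelease() are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// A reusable barrier for `count` threads: nobody leaves until everyone
// has arrived, which is the behavior synchronize() documents.
class Barrier {
 public:
  explicit Barrier(std::size_t count) : count_(count), waiting_(0), generation_(0) {}

  void arriveAndWait() {
    std::unique_lock<std::mutex> lock(mutex_);
    const std::size_t gen = generation_;
    if (++waiting_ == count_) {
      // Last arrival: release everyone and reset for the next round.
      waiting_ = 0;
      ++generation_;
      cv_.notify_all();
    } else {
      cv_.wait(lock, [&] { return generation_ != gen; });
    }
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  const std::size_t count_;
  std::size_t waiting_;
  std::size_t generation_;
};

// Spawns `n` threads; each records its arrival, waits at the barrier, then
// checks that all `n` arrivals happened before it was released.
bool everyoneArrivedBeforeRelease(std::size_t n) {
  Barrier barrier(n);
  std::atomic<std::size_t> arrived{0};
  std::atomic<bool> ok{true};
  std::vector<std::thread> threads;
  for (std::size_t i = 0; i < n; ++i) {
    threads.emplace_back([&] {
      arrived.fetch_add(1);
      barrier.arriveAndWait();
      if (arrived.load() != n) ok = false;
    });
  }
  for (auto &t : threads) t.join();
  return ok.load();
}
```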

void initializeSharedBuffers(size_t buffer_size)

Initializes shared buffers for the local replicas of the calling instance.

This function initializes a raw buffer for each local replica of the instance it is called from. The buffer size is provided by the caller. This function is not thread-safe and must be called from only one thread per instance.

Parameters

buffer_size – Size of the shared buffer.

void writeToSharedBuffer(uint8_t instance_id, uint8_t local_replica_id, const void *data, size_t data_size)

Writes a provided buffer to the shared buffer.

Parameters
  • instance_id – The instance id of the shared buffer to write to.

  • local_replica_id – The local replica id of the shared buffer to write to.

  • data – The raw buffer.

  • data_size – The size of the shared buffer.

void deallocateSharedBuffers()

Deallocates all shared buffers for the instance it is called from.

void *readSharedBuffer(uint8_t instance_id, uint8_t local_replica_id, size_t *buffer_size)

Reads the shared buffer for a specific instance/replica combination.

This function reads the shared buffer, copies it to locally accessible memory and returns the memory address to the caller. The size of the buffer is written to the buffer_size parameter.

Parameters
  • instance_id – The identifier of the instance.

  • local_replica_id – The identifier of the local replica.

  • buffer_size – Output parameter that receives the size of the buffer in bytes.

Returns

The memory address of the first byte of the shared buffer.
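The shared-buffer calls above form a sequence: initialize, write, read, deallocate. The in-process model below mirrors that sequence with standard containers, as a sketch under stated assumptions. SharedBufferModel is hypothetical; its initialize() takes the instance explicitly (the real initializeSharedBuffers() acts on the calling instance); it rejects writes larger than the buffer (this reference does not say what the real call does); and read() returns a std::vector because this reference does not say who owns the memory returned by readSharedBuffer().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

// In-process model: one raw byte buffer per (instance_id, local_replica_id).
class SharedBufferModel {
 public:
  using Key = std::pair<uint8_t, uint8_t>;  // (instance_id, local_replica_id)

  // Like initializeSharedBuffers(): one buffer per local replica of `instance`.
  void initialize(uint8_t instance, uint8_t numLocalReplicas, std::size_t bufferSize) {
    for (uint8_t r = 0; r < numLocalReplicas; ++r)
      buffers_[Key(instance, r)] = std::vector<uint8_t>(bufferSize, 0);
  }

  // Like writeToSharedBuffer(): copies `dataSize` bytes into the target buffer.
  void write(uint8_t instance, uint8_t replica, const void *data, std::size_t dataSize) {
    std::vector<uint8_t> &buf = buffers_.at(Key(instance, replica));
    if (dataSize > buf.size()) throw std::length_error("data larger than shared buffer");
    std::memcpy(buf.data(), data, dataSize);
  }

  // Like readSharedBuffer(): returns a local copy and reports its size.
  std::vector<uint8_t> read(uint8_t instance, uint8_t replica, std::size_t *bufferSize) const {
    const std::vector<uint8_t> &buf = buffers_.at(Key(instance, replica));
    *bufferSize = buf.size();
    return buf;  // a copy, standing in for "locally accessible memory"
  }

  // Like deallocateSharedBuffers(): drops every buffer owned by `instance`.
  void deallocate(uint8_t instance) {
    for (auto it = buffers_.begin(); it != buffers_.end();) {
      if (it->first.first == instance) it = buffers_.erase(it);
      else ++it;
    }
  }

  bool has(uint8_t instance, uint8_t replica) const {
    return buffers_.count(Key(instance, replica)) == 1;
  }

 private:
  std::map<Key, std::vector<uint8_t>> buffers_;
};
```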

void run(poplar::Engine &engine, uint32_t program_id = 0, const std::string &debug_name = "", const std::string &communicator_id = defaultCommunicatorId(), const std::set<uint32_t> &participants = {})

Calls engine.run() in a synchronized setting.

This function synchronizes the selected instances before calling engine.run(). If two instances call popdist::run() with different program identifiers, an exception is thrown. These checks are skipped when the program is not launched through PopRun. The function is thread-safe and can be called from multiple threads within a single instance. Threads on different instances that take part in the same synchronized run must use the same communicator_id, but each thread within a single instance must use a different communicator_id.

Parameters
  • engine – The Poplar engine to run.

  • program_id – The Poplar program id.

  • debug_name – The Poplar debug name.

  • communicator_id – An ID of the communicator used for this call. Must be unique across threads within a single instance.

  • participants – A subset of instances which participate in the synchronization. Default value indicates that all instances participate.
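The communicator_id rule for multi-threaded callers can be met with a simple naming scheme: worker thread k uses the same ID on every instance, and different threads on one instance use different IDs. The sketch below shows one such scheme; the "<base>-<k>" format and both helper names are hypothetical, not part of PopDist.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical scheme: worker thread k uses "<base>-k" on every instance, so
// thread k on instance A and thread k on instance B share an ID, while the
// threads of a single instance all get different IDs.
std::string threadCommunicatorId(const std::string &base, unsigned threadIndex) {
  return base + "-" + std::to_string(threadIndex);
}

// All IDs one instance would use for `numThreads` worker threads.
std::set<std::string> threadCommunicatorIds(const std::string &base, unsigned numThreads) {
  std::set<std::string> ids;
  for (unsigned k = 0; k < numThreads; ++k) ids.insert(threadCommunicatorId(base, k));
  return ids;
}
```

Thread k would then pass threadCommunicatorId("run", k) as the communicator_id argument of popdist::run(). Because the scheme is deterministic, every instance derives the same ID for the same thread index without any extra communication.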

#include <popdist/context.hpp>
namespace popdist

Functions

unsigned getNumTotalReplicas()

Get the total number of replicas.

Tries to infer the total number of replicas from environment variables. Defaults to 1 if no environment variables are set.

unsigned getNumIpusPerReplica()

Get the number of IPUs per replica.

Tries to infer the number of IPUs per replica from environment variables. Defaults to 1 if no environment variables are set.

bool checkNumIpusPerReplica(unsigned expected)

Check whether the number of IPUs per replica in the context matches the expected number.

Returns false if the environment variables are set and do not match the given value, and true otherwise.

bool isUniformReplicasPerInstance()

Gets whether the number of replicas per instance is uniform.

Tries to infer this from environment variables. Defaults to false if no environment variables are set.

unsigned getNumLocalReplicas()

Get the number of local replicas.

Tries to infer the number of local replicas from environment variables. Defaults to 1 if no environment variables are set.

unsigned getReplicaIndexOffset()

Get the replica index offset.

Tries to infer the replica index offset from environment variables. Defaults to 0 if no environment variables are set.

unsigned getLocalInstanceIndex()

Get the local instance index.

The relative index of an instance within a host.

Tries to infer the local instance index from environment variables. Defaults to 0 if no environment variables are set.
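The getters above share one pattern: read a value from the environment if it is set, otherwise fall back to a documented default. A minimal sketch of that pattern follows. readUnsignedEnv() is hypothetical, and so is any variable name passed to it; this reference does not list the environment variables PopRun actually sets.

```cpp
#include <cassert>
#include <cstdlib>

// Reads an unsigned integer from environment variable `name`, or returns
// `fallback` when the variable is unset, empty, or cannot be fully parsed
// as a decimal number.
unsigned readUnsignedEnv(const char *name, unsigned fallback) {
  const char *value = std::getenv(name);
  if (value == nullptr || *value == '\0') return fallback;
  char *end = nullptr;
  const unsigned long parsed = std::strtoul(value, &end, 10);
  if (*end != '\0') return fallback;
  return static_cast<unsigned>(parsed);
}
```

For example, readUnsignedEnv("HYPOTHETICAL_NUM_LOCAL_REPLICAS", 1) mirrors the default documented for getNumLocalReplicas(), and a fallback of 0 mirrors getReplicaIndexOffset().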

unsigned getInstanceIndex()

Gets the index of the current instance.

Can only be used with a uniform number of replicas per instance.

Returns

The index of the current instance.

unsigned getNumInstances()

Gets the total number of instances.

Can only be used with a uniform number of replicas per instance.

Returns

The total number of instances.
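The restriction to a uniform number of replicas per instance suggests how the instance index and instance count relate to the replica getters. The sketch below shows one such derivation; it is an assumption about the arithmetic, not PopDist's documented implementation, and ContextValues and the helper functions are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>

// Context values as the getters above would report them.
struct ContextValues {
  unsigned numTotalReplicas;    // getNumTotalReplicas()
  unsigned numLocalReplicas;    // getNumLocalReplicas()
  unsigned replicaIndexOffset;  // getReplicaIndexOffset()
};

// Assumed derivation, valid only when every instance has the same number of
// local replicas (the restriction getInstanceIndex() and getNumInstances() state).
unsigned instanceIndex(const ContextValues &c) {
  if (c.numLocalReplicas == 0) throw std::invalid_argument("no local replicas");
  return c.replicaIndexOffset / c.numLocalReplicas;
}

unsigned numInstances(const ContextValues &c) {
  if (c.numLocalReplicas == 0 || c.numTotalReplicas % c.numLocalReplicas != 0)
    throw std::invalid_argument("replicas are not uniform across instances");
  return c.numTotalReplicas / c.numLocalReplicas;
}

// Global index of local replica `r` on this instance.
unsigned globalReplicaIndex(const ContextValues &c, unsigned r) {
  return c.replicaIndexOffset + r;
}
```

Under these assumptions, 8 total replicas with 2 per instance and an offset of 6 make this instance index 3 of 4, and its local replica 1 is global replica 7.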

#include <popdist/collectives.hpp>
namespace popdist
namespace collectives

Functions

void allGather(const void *data, void *destination, size_t num_elements, const poplar::Type &type, const std::string &communicator_id = defaultCommunicatorId(), bool inplace = false, const std::set<uint32_t> &participants = {})

Allgather collective operation.

This function gathers data from the selected instances and distributes the result back to them. It is thread-safe and can be called from multiple threads within a single instance. Threads on different instances that take part in the same collective operation must use the same communicator_id, but each thread within a single instance must use a different communicator_id.

Parameters
  • data – Pointer to the first element of the data being gathered.

  • destination – Pointer to the buffer where the result of the operation is written to.

  • num_elements – Number of elements in data and destination.

  • type – Type of the elements in data and destination.

  • communicator_id – An ID of the communicator used for this call. Must be unique across threads within a single instance. May be reused once the collective operation using that ID has completed.

  • inplace – If true, the allGather is performed in place. Defaults to false. When true, pass nullptr as the data parameter; each instance's input data must already be located in the part of the destination buffer where that instance's data will be written.

  • participants – A subset of instances which participate in the collective call. Default value indicates that all instances participate.
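The data movement of allGather can be modeled in one process. The parameter descriptions leave the layout of destination open; this sketch assumes, as in MPI_Allgather, that each instance contributes num_elements elements and that destination holds one such block per participating instance, ordered by instance index. allGatherModel() and inPlaceOffset() are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Single-process model: inputs[i] is instance i's contribution.
// Returns the gathered buffer that every instance would receive.
std::vector<float> allGatherModel(const std::vector<std::vector<float>> &inputs) {
  if (inputs.empty()) return {};
  const std::size_t n = inputs.front().size();
  std::vector<float> destination(n * inputs.size());
  for (std::size_t i = 0; i < inputs.size(); ++i) {
    if (inputs[i].size() != n) throw std::invalid_argument("unequal contributions");
    for (std::size_t j = 0; j < n; ++j) destination[i * n + j] = inputs[i][j];
  }
  return destination;
}

// In-place variant (assumed layout): data is nullptr and instance i's input
// already sits in its own block, starting this many elements into destination.
std::size_t inPlaceOffset(std::size_t instance, std::size_t n) { return instance * n; }
```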

void allReduceSum(void *data, size_t num_elements, const poplar::Type &type, const std::string &communicator_id = defaultCommunicatorId(), const std::set<uint32_t> &participants = {})

Allreduce collective operation.

This function sums values from selected instances and distributes the result back to the selected instances.

Parameters
  • data – Pointer to the first element of the data being reduced.

  • num_elements – Number of elements in data.

  • type – Type of the elements in data.

  • communicator_id – An ID of the communicator used for this call. Must be unique across threads within a single instance. May be reused once the collective operation using that ID has completed.

  • participants – A subset of instances which participate in the collective call. Default value indicates that all instances participate.
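A single-process model of what allReduceSum computes: each inner vector stands for one instance's data buffer, and afterwards every buffer holds the element-wise sum. Since allReduceSum() takes only a data pointer, the sketch assumes the result overwrites the input in place. allReduceSumModel() is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// buffers[i] is instance i's `data`. After the call, every buffer holds the
// element-wise sum over all instances, written back in place.
void allReduceSumModel(std::vector<std::vector<float>> &buffers) {
  if (buffers.empty()) return;
  const std::size_t n = buffers.front().size();
  std::vector<float> sum(n, 0.0f);
  for (const auto &buf : buffers) {
    if (buf.size() != n) throw std::invalid_argument("num_elements must match");
    for (std::size_t j = 0; j < n; ++j) sum[j] += buf[j];
  }
  for (auto &buf : buffers) buf = sum;
}
```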

void broadcast(void *data, size_t num_elements, const poplar::Type &type, const uint32_t root = 0, const std::string &communicator_id = defaultCommunicatorId(), const std::set<uint32_t> &participants = {})

Broadcast collective operation.

This function broadcasts data from the root instance to all other selected instances.

Parameters
  • data – Pointer to the first element of the data being broadcast.

  • num_elements – Number of elements in data.

  • type – Type of the elements in data.

  • root – Rank of the instance broadcasting data.

  • communicator_id – An ID of the communicator used for this call. Must be unique across threads within a single instance. May be reused once the collective operation using that ID has completed.

  • participants – A subset of instances which participate in the collective call. Default value indicates that all instances participate.
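Broadcast can be modeled the same way, by copying the root's buffer into every other buffer. broadcastModel() is a hypothetical helper that mirrors only the data movement, with the vector index standing in for the instance rank.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// buffers[i] is instance i's `data`. Afterwards every buffer holds a copy of
// the root's buffer.
void broadcastModel(std::vector<std::vector<int>> &buffers, uint32_t root = 0) {
  if (root >= buffers.size()) throw std::out_of_range("root is not an instance");
  const std::vector<int> source = buffers[root];
  for (auto &buf : buffers) {
    if (buf.size() != source.size()) throw std::invalid_argument("num_elements must match");
    buf = source;
  }
}
```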

#include <popdist/popdist_poplar.hpp>
namespace popdist

Functions

poplar::Graph createGraph(poplar::TargetType targetType, unsigned ipusPerReplica = 0)

Create a Poplar graph that works in the PopDist context.

The created graph will have the appropriate replication factor set. If no context is present (no environment variables are set), then the graph will have no replication factor.

Parameters
  • targetType – The required targetType. Unless the context is a simple single-replica system, this type must be TargetType::IPU.

  • ipusPerReplica – The expected ipusPerReplica that the graph should be created over. If this does not match the PopDist context then an exception will be thrown. If zero, the value is obtained from the PopDist context.
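The ipusPerReplica rule above can be expressed as a small check. The sketch covers only the case where a PopDist context is present: contextValue stands for the value read from that context, and resolveIpusPerReplica() is a hypothetical helper, not the library's code. This reference does not name the exception type PopDist throws; std::invalid_argument is used here for illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// 0 means "use the context value"; any other value must match the context.
unsigned resolveIpusPerReplica(unsigned requested, unsigned contextValue) {
  if (requested == 0) return contextValue;
  if (requested != contextValue)
    throw std::invalid_argument("ipusPerReplica " + std::to_string(requested) +
                                " does not match PopDist context value " +
                                std::to_string(contextValue));
  return requested;
}
```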

void setEngineOptions(poplar::OptionFlags &opt)

Set the Poplar engine options to match the PopDist context.

If no context is present (no environment variables are set) then the option flags are not changed.

Parameters

opt – The option flags to be updated.

poplar::Device getDevice(poplar::TargetType targetType, unsigned ipusPerReplica, const poplar::OptionFlags &opt = {})

Get a device that works in the PopDist context.

If no context is present (no environment variables are set), then a suitable available device is still returned.

Parameters
  • targetType – The required targetType. Unless the context is a simple single-replica system, this type must be TargetType::IPU.

  • ipusPerReplica – The expected ipusPerReplica that the device should respect. If this does not match the PopDist context then an exception is thrown.

  • opt – Option flags for the target creation.

void prepareParentDevice()

Prepare the current parent device for PopDist execution.

This needs to be called before every engine load in order to reset the IPU state for loading a new executable. PopRun does the initial preparation, so this only needs to be called by the application when loading its second engine onwards.

Only the first instance does the actual preparation (the others do nothing), so it is safe to call this function from all the instances.

Note that all the instances must detach from their respective child devices before calling this function. This can be achieved by detaching and then performing a global synchronization barrier before the call to this function. A barrier may also be needed after the call to ensure that the preparation is complete before attempting to re-attach to the child devices.

All the necessary information is read from PopDist environment variables.
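The reload sequence described above for a second engine runs in phases: every instance detaches from its child device (for example with poplar::Device::detach()), a barrier such as popdist::synchronize(), prepareParentDevice() on every instance, another barrier, then re-attaching (for example with poplar::Device::attach()). The sketch below models only the ordering guarantees by running each phase for all instances before the next one starts; reloadSequence() is hypothetical and simply records the phases as strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sequential model of the reload sequence for `numInstances` instances.
// Finishing each loop before starting the next plays the role of a barrier.
std::vector<std::string> reloadSequence(unsigned numInstances) {
  std::vector<std::string> log;
  for (unsigned i = 0; i < numInstances; ++i)
    log.push_back("detach " + std::to_string(i));  // detach from child device
  // Barrier: every instance has detached.
  for (unsigned i = 0; i < numInstances; ++i)
    if (i == 0) log.push_back("prepare");  // all call it; only the first acts
  // Barrier: preparation is complete.
  for (unsigned i = 0; i < numInstances; ++i)
    log.push_back("attach " + std::to_string(i));  // re-attach, load next engine
  return log;
}
```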

void prepareDevice(poplar::Device &device, unsigned ipusPerReplica, unsigned numReplicas)

Prepare the given device for PopDist execution.

Unlike prepareParentDevice(), all the necessary information must be passed in as arguments.

unsigned getDeviceId(unsigned ipusPerReplica = 0)

Returns the Poplar ID of the parent device.

Parameters

ipusPerReplica – The number of IPUs per replica. If 0 is provided, the value will be read from the environment variable set by PopRun.

Returns

The Poplar ID of the parent device.