7. Experimental features

7.1. Distributed execution

PopTorch supports distributed execution on IPU-POD systems using IPU over Fabric (IPUoF). Please refer to the popdist documentation for examples.

If you run without using poprun, the only change needed in your code is to set the ID of the current process and the total number of processes across which execution is distributed. You do this with configureProcessId(), as shown in Listing 7.1.
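Without poprun, each process has to obtain its own ID and the process count by some other means. The sketch below assumes, purely for illustration, that they arrive as the hypothetical command-line flags `--process-id` and `--num-processes`; the flag names and the helper `parse_process_args()` are not part of PopTorch. It only parses and validates the two values that configureProcessId() expects.

```python
import argparse


def parse_process_args(argv=None):
    """Parse the process ID and process count for a run launched without poprun.

    Hypothetical helper: the flag names are an assumption, not a PopTorch API.
    """
    parser = argparse.ArgumentParser(description="Distributed run (sketch)")
    parser.add_argument("--process-id", type=int, default=0)
    parser.add_argument("--num-processes", type=int, default=1)
    args = parser.parse_args(argv)

    # Every process needs a distinct ID in the range [0, num_processes).
    if args.num_processes < 1:
        parser.error("--num-processes must be at least 1")
    if not 0 <= args.process_id < args.num_processes:
        parser.error("--process-id must be in the range [0, num_processes)")
    return args.process_id, args.num_processes
```

Each process would then be started with its own ID, for example `--process-id 0 --num-processes 2` and `--process-id 1 --num-processes 2`, and the returned pair passed to the process() function in Listing 7.1 as `process(*parse_process_args())`.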

Listing 7.1 Changes required for distributed execution
    import poptorch

    # `model`, `ExampleDataset` and `model_batch_size` are defined elsewhere
    # in the full example and are not shown here.


    def process(process_id=0, num_processes=1):
        # Create a poptorch.Options instance to override default options
        opts = poptorch.Options()

        # Run a 400 iteration loop on the IPU, fetching a new batch each time
        opts.deviceIterations(400)

        # Replicate the graph across 2 IPUs in each process.
        opts.replicationFactor(2)

        # Set the ID of the current process and the total number of processes.
        opts.Distributed.configureProcessId(process_id, num_processes)

        # Accumulate the gradient 8 times before applying it.
        opts.Training.gradientAccumulation(8)

        # Only needed if shuffle=True is used for the DataLoader: all the
        # processes must then use the same seed.
        opts.randomSeed(42)

        # Each process automatically loads a different subset of the dataset.
        training_data = poptorch.DataLoader(opts,
                                            dataset=ExampleDataset(
                                                shape=[3, 2], length=100000),
                                            batch_size=model_batch_size,
                                            shuffle=True,
                                            drop_last=True)

        # Wrap the model in a PopTorch training wrapper
        poptorch_model = poptorch.trainingModel(model, options=opts)

        # Each step of this loop consumes one combined batch of
        # model_batch_size * 400 * 2 * 8 samples in this process.
        for batch_number, (data, labels) in enumerate(training_data):
            # Execute a 400 iteration loop on the device across the 2 replicas
            # of this process. "output" and "loss" will be the respective
            # output and loss of the final batch of each replica (the default
            # AnchorMode).
            output, loss = poptorch_model(data, labels)
            print(f"{batch_number} {labels[-1]}, {output}, {loss}")
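The options in the listing multiply together to set how many samples a single call to poptorch_model consumes. The sketch below assumes the combined-batch formula `batch_size * device_iterations * replication_factor * gradient_accumulation` used by poptorch.DataLoader, that replicationFactor() counts the replicas inside one process, and a model_batch_size of 2 (taken from the "batchsize 2" wording in earlier versions of this example, not stated in the listing). The helper `samples_per_call()` is hypothetical.

```python
def samples_per_call(model_batch_size, device_iterations, replication_factor,
                     gradient_accumulation, num_processes=1):
    """Return (per-process, global) sample counts for one model call.

    Assumes combined batch = batch_size * device_iterations
    * replication_factor * gradient_accumulation, with replication_factor
    counting the replicas in a single process.
    """
    per_process = (model_batch_size * device_iterations * replication_factor
                   * gradient_accumulation)
    # Every process consumes its own combined batch on each call.
    return per_process, per_process * num_processes


# Values from the listing; model_batch_size=2 is an assumption.
per_process, total = samples_per_call(2, 400, 2, 8, num_processes=2)
print(per_process, total)  # 12800 25600
```

Under these assumptions, a dataset of 100000 samples yields only a handful of loop iterations per process, so large deviceIterations() values trade fewer host round trips for coarser progress reporting.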


The poptorch.DataLoader automatically selects a different subset of the dataset for each process, based on its process ID.
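PopTorch performs this split internally, and its exact scheme is not described here. As a rough illustration only, the hypothetical helper below shows one way a process ID can map to a disjoint subset: the index range is cut into contiguous, near-equal blocks, with any leftover samples going to the lowest IDs.

```python
def shard_indices(dataset_length, process_id, num_processes):
    """Return the contiguous block of dataset indices owned by one process.

    Hypothetical sketch, not PopTorch's implementation: blocks never overlap
    and together cover range(dataset_length).
    """
    base, remainder = divmod(dataset_length, num_processes)
    # The first `remainder` processes each take one extra sample.
    start = process_id * base + min(process_id, remainder)
    size = base + (1 if process_id < remainder else 0)
    return range(start, start + size)


print(list(shard_indices(10, 1, 3)))  # [4, 5, 6]
```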


All the processes must use the same seed, set with opts.randomSeed() as in Listing 7.1, if shuffle=True is used for the DataLoader.
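One way to see why a shared seed matters is a shuffle-then-shard scheme, sketched below under the assumption that every process shuffles the full index list and then keeps its own slice. This is an illustration, not PopTorch's implementation, and `shuffled_shard()` is a hypothetical helper. With a common seed, all processes build the same permutation, so their slices are disjoint. With per-process seeds, each slice comes from a different permutation, so slices can overlap and some samples may never be seen in an epoch.

```python
import random


def shuffled_shard(dataset_length, process_id, num_processes, seed):
    """Indices seen by one process when every process shuffles, then shards.

    Hypothetical sketch: builds a permutation from `seed`, then keeps every
    num_processes-th index starting at this process's ID.
    """
    order = list(range(dataset_length))
    random.Random(seed).shuffle(order)
    return order[process_id::num_processes]


# Same seed everywhere: the shards are disjoint and together cover the dataset.
shards = [shuffled_shard(10, pid, 2, seed=42) for pid in range(2)]
assert sorted(shards[0] + shards[1]) == list(range(10))

# Different seeds per process: shards come from different permutations.
a = set(shuffled_shard(10, 0, 2, seed=1))
b = set(shuffled_shard(10, 1, 2, seed=2))
print(len(a & b), "samples appear in both shards")
```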