7. Experimental features
7.1. Distributed execution
PopTorch supports distributed execution on an IPU-POD using IPU over Fabric (IPUoF). Please refer to the popdist documentation for examples.
If you run without using poprun, the only change needed in your code is to set the ID of the current process and the total number of processes the execution is distributed across, using configureProcessId():
def process(process_id=0, num_processes=1):
    # Create a poptorch.Options instance to override default options
    opts = poptorch.Options()

    # Run a 400 iteration loop on the IPU, fetching a new batch each time
    opts.deviceIterations(400)

    # Replicate the graph across 2 IPUs in each process.
    opts.replicationFactor(2)

    # Set the id of the current process and the total number of processes.
    opts.Distributed.configureProcessId(process_id, num_processes)

    # Accumulate the gradient 8 times before applying it.
    opts.Training.gradientAccumulation(8)

    # Optional: All the processes must use the same seed if shuffle=True is
    # used for the DataLoader.
    opts.randomSeed(42)

    training_data = poptorch.DataLoader(opts,
                                        dataset=ExampleDataset(shape=[3, 2],
                                                               length=100000),
                                        batch_size=model_batch_size,
                                        shuffle=True,
                                        drop_last=True)

    # Wrap the model in a PopTorch training wrapper
    poptorch_model = poptorch.trainingModel(model, options=opts)

    # Run over the training data: each DataLoader batch covers all the device
    # iterations, replicas and gradient accumulation steps of one execution.
    for batch_number, (data, labels) in enumerate(training_data):
        # Each call runs a 400 iteration loop across the 2 replicas in this
        # process. "output" and "loss" will be the respective output and loss
        # of the final batch of each replica (the default AnchorMode).
        output, loss = poptorch_model(data, labels)
        print(f"{batch_number} {labels[-1]}, {output}, {loss}")
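The snippet below is a minimal launch sketch, not part of the PopTorch API: it assumes the process() function above is defined in the same script and that an external launcher (for example Open MPI's mpirun) sets the OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE environment variables. Substitute whatever variables your own launcher provides.

import os

if __name__ == "__main__":
    # Rank and world size as set by Open MPI's mpirun; adapt the variable
    # names if you use a different launcher.
    process_id = int(os.environ.get("OMPI_COMM_WORLD_RANK", "0"))
    num_processes = int(os.environ.get("OMPI_COMM_WORLD_SIZE", "1"))

    # Every process runs the same training function with its own ID.
    process(process_id=process_id, num_processes=num_processes)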
Note
The DataLoader will automatically select a different subset of the dataset based on the process ID.
Warning
All the processes must use the same seed if shuffle=True is used for the DataLoader.
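As a conceptual illustration only (this is not PopTorch's implementation), the sketch below shows why a shared seed matters: if every process shuffles the indices with the same seed, taking every num_processes-th index yields disjoint shards, whereas different seeds would make the shards overlap so some samples are seen more than once per epoch. The function shard_indices is hypothetical.

import random

def shard_indices(dataset_length, process_id, num_processes, seed):
    # Identical shuffle in every process because the seed is shared
    indices = list(range(dataset_length))
    random.Random(seed).shuffle(indices)
    # Interleaved slices are disjoint across processes
    return indices[process_id::num_processes]

# With the same seed, process 0 and process 1 see non-overlapping halves
print(shard_indices(10, 0, 2, seed=42))
print(shard_indices(10, 1, 2, seed=42))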