9. Experimental features
9.1. Distributed execution without PopRun
PopTorch supports distributed execution on an IPU-POD using IPU over Fabric (IPUoF).
If you use your own distributed processing tool instead of PopRun, the only change needed in your code is to set the id of the current process and the total number of processes the execution is distributed across, using configureProcessId().
Note that replicationFactor() should be used to set the number of local replicas (per host), not the total (global) number of replicas.
import poptorch

# model, model_batch_size and ExampleDataset are assumed to be defined elsewhere.


def process(process_id=0, num_processes=1):
    # Create a poptorch.Options instance to override default options
    opts = poptorch.Options()

    # Run a 400 iteration loop on the IPU, fetching a new batch each time
    opts.deviceIterations(400)

    # Replicate the graph across 2 IPUs in each process.
    opts.replicationFactor(2)

    # Set the id of the current process and the total number of processes.
    opts.Distributed.configureProcessId(process_id, num_processes)

    # Accumulate the gradient 8 times before applying it.
    opts.Training.gradientAccumulation(8)

    # If shuffle=True is used for the DataLoader, all the processes must use the same seed.
    opts.randomSeed(42)

    training_data = poptorch.DataLoader(opts,
                                        dataset=ExampleDataset(shape=[3, 2],
                                                               length=100000),
                                        batch_size=model_batch_size,
                                        shuffle=True,
                                        drop_last=True)

    # Wrap the model in a PopTorch training wrapper
    poptorch_model = poptorch.trainingModel(model, options=opts)

    # Each step consumes model_batch_size * 400 * 2 * 8 samples in this process
    # (batch size * device iterations * local replicas * gradient accumulation).
    for batch_number, (data, labels) in enumerate(training_data):
        # Run the 400 iteration loop on the device across the 2 local
        # replicas, with model_batch_size samples per replica per iteration.
        # "output" and "loss" will be the respective output and loss of the
        # final batch of each replica (the default AnchorMode).
        output, loss = poptorch_model(data, labels)
        print(f"{batch_number} {labels[-1]}, {output}, {loss}")
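The values passed to process() normally come from whichever tool launches the processes. The following is a minimal sketch, assuming Open MPI's mpirun is the launcher (it exports OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE to each process). The helper names read_rank_and_size and global_replicas are hypothetical and are not part of PopTorch; other launchers export differently named variables.

```python
import os


def read_rank_and_size(env=os.environ):
    """Return (process_id, num_processes) from Open MPI's environment.

    Falls back to a single-process setup when the variables are absent.
    Assumption: the launcher is Open MPI; adapt the names for other tools.
    """
    process_id = int(env.get("OMPI_COMM_WORLD_RANK", 0))
    num_processes = int(env.get("OMPI_COMM_WORLD_SIZE", 1))
    if not 0 <= process_id < num_processes:
        raise ValueError(
            f"rank {process_id} is out of range for {num_processes} processes")
    return process_id, num_processes


def global_replicas(local_replicas, num_processes):
    """Total replica count, given the per-process value set with replicationFactor()."""
    return local_replicas * num_processes


# Simulated environment for the second of two processes.
rank, size = read_rank_and_size(
    {"OMPI_COMM_WORLD_RANK": "1", "OMPI_COMM_WORLD_SIZE": "2"})
print(rank, size, global_replicas(2, size))  # 1 2 4
```

With such a helper, each process could call process(*read_rank_and_size()). With replicationFactor(2) and two processes, four replicas run in total even though each process only configures two.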
Note
The DataLoader will automatically select a different subset of the dataset based on the process id.
Warning
All the processes must use the same seed if shuffle=True is used for the DataLoader.
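To see why the seed must match, consider a simplified model of sharding in which every process shuffles the full index list and then keeps its own slice. This is an illustration only and is not PopTorch's actual implementation; shard_indices is a hypothetical helper.

```python
import random


def shard_indices(dataset_len, process_id, num_processes, seed):
    """Shuffle all indices with `seed`, then keep every num_processes-th one.

    Illustration only: not how PopTorch's DataLoader is implemented.
    """
    indices = list(range(dataset_len))
    random.Random(seed).shuffle(indices)
    return indices[process_id::num_processes]


# Same seed in both processes: the shards are disjoint and cover the dataset.
a = shard_indices(10, 0, 2, seed=42)
b = shard_indices(10, 1, 2, seed=42)
print(sorted(a + b) == list(range(10)))  # True

# Different seeds: each process slices a different permutation, so some
# samples may be seen by both processes and others by neither.
c = shard_indices(10, 1, 2, seed=7)
print(len(set(a) | set(c)), "distinct samples out of 10")
```

When the seeds agree, every process slices the same permutation, so the subsets partition the dataset. When they differ, nothing guarantees this.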
9.2. torch.nn.CTCLoss
The CTCLoss operator is supported, with the following limitations:
1. The zero_infinity parameter must be set to False.
2. The reduction parameter must be set to either sum or mean.
3. The targets tensor must be 2D, corresponding to the stacked, padded layout.
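In PyTorch, torch.nn.CTCLoss accepts targets either as a 1D concatenation of all label sequences or as a 2D tensor of shape (N, S), in which each sequence is padded to the length of the longest one. Only the 2D form is supported here. The following sketch shows one way to build that layout from variable-length sequences in plain Python; pad_targets is a hypothetical helper, and the padding value is arbitrary because target_lengths tells the loss which entries are valid.

```python
def pad_targets(sequences, pad_value=0):
    """Stack variable-length label sequences into a 2D (batch, max_len) list.

    Returns the padded rows and the original length of each sequence.
    """
    if not sequences:
        raise ValueError("at least one target sequence is required")
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths)
    padded = [list(seq) + [pad_value] * (max_len - len(seq))
              for seq in sequences]
    return padded, lengths


targets, target_lengths = pad_targets([[1, 2, 3], [4, 5], [6]])
print(targets)         # [[1, 2, 3], [4, 5, 0], [6, 0, 0]]
print(target_lengths)  # [3, 2, 1]
```

The two lists can then be converted with torch.tensor() and passed, together with the log-probabilities and input lengths, to a loss created as torch.nn.CTCLoss(reduction="mean", zero_infinity=False).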