3. From PyTorch to PopTorch
This page will introduce the key features that enable training on the IPU, and how they differ from native PyTorch.
3.1. Preparing your data
Data loading in PyTorch is typically handled using torch.utils.data.DataLoader.
PopTorch extends PyTorch’s DataLoader with a poptorch.DataLoader
to enable efficient data batching with respect to PopTorch’s underlying machine learning framework, PopART.
Instantiation is almost identical to PyTorch, but you must remember to pass an instance of poptorch.Options.
PyTorch:

training_data = torch.utils.data.DataLoader(ExampleDataset(shape=[1],
                                                           length=20000),
                                            batch_size=10,
                                            shuffle=True,
                                            drop_last=True)
PopTorch:

# Set device iterations so the DataLoader loads enough data
# for 10 iterations at each step.
opts = poptorch.Options()
opts.deviceIterations(10)
training_data = poptorch.DataLoader(options=opts,
                                    dataset=ExampleDataset(shape=[1],
                                                           length=20000),
                                    batch_size=10,
                                    shuffle=True,
                                    drop_last=True)
For more information about how to set poptorch.Options, see Efficient data batching.
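The snippets above assume an ExampleDataset; its definition is not part of this section, but a minimal sketch of a dataset with that interface (purely illustrative, not part of the PopTorch API) could be:

import torch

class ExampleDataset(torch.utils.data.Dataset):
    # A toy dataset producing random inputs and integer class targets.
    def __init__(self, shape, length):
        self.length = length
        self.data = torch.rand(length, *shape)
        self.labels = torch.randint(0, 2, (length,), dtype=torch.long)

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        return self.data[index], self.labels[index]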
3.2. Creating your model
3.2.1. Training
If you want to create a model for training on the IPU, all you need to do is instantiate a trainingModel(), passing in your PyTorch model, Options and optimizer.
PyTorch:

model = ExampleModelWithLoss()
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=0.001)

PopTorch:

model = ExampleModelWithLoss()
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=0.001)
# Wrap the model in a PopTorch training wrapper
poptorch_model = poptorch.trainingModel(model,
                                        options=opts,
                                        optimizer=optimizer)
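The ExampleModelWithLoss used here is a regular PyTorch module that computes its own loss in forward when a target is supplied; its definition is not shown in this section, but a minimal sketch (an illustrative assumption consistent with the snippets above) could be:

import torch

class ExampleModelWithLoss(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(1, 2)
        self.loss = torch.nn.CrossEntropyLoss()

    def forward(self, x, target=None):
        out = self.fc(x)
        # Return the loss alongside the output so the training wrapper can use it.
        if target is not None:
            return out, self.loss(out, target)
        return out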
3.2.2. Inference
For inference, it’s even easier. Just instantiate an inferenceModel()
by passing your PyTorch model.
poptorch_model = poptorch.inferenceModel(model)
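Calling the wrapped model then runs the forward pass on the IPU. A short usage sketch, assuming the ExampleModelWithLoss above and leaving out the target so that only the output is returned:

model.eval()  # switch the PyTorch model to evaluation mode before wrapping
poptorch_model = poptorch.inferenceModel(model)
predictions = poptorch_model(torch.rand(10, 1))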
3.3. The training loop
A simple training loop in PyTorch will typically consist of:
Setting the gradients to zero
Performing a forward pass with the model (and obtaining the loss)
Performing the backward pass with respect to the loss
Updating the weights with the optimizer
In PopTorch, these steps are combined into a single step performed by the wrapped model.
PyTorch:

for batch, target in training_data:
    # Zero gradients
    optimizer.zero_grad()
    # Run model.
    _, loss = model(batch, target)
    # Back propagate the gradients.
    loss.backward()
    # Update the weights.
    optimizer.step()

PopTorch:

for batch, target in training_data:
    # Performs forward pass, loss function evaluation,
    # backward pass and weight update in one go on the device.
    _, loss = poptorch_model(batch, target)
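Putting the earlier snippets together, a full PopTorch training run is just the data and model setup followed by this loop. A sketch assembled from the pieces above (the epoch count is an arbitrary illustrative choice):

for epoch in range(5):
    for batch, target in training_data:
        # Forward pass, loss, backward pass and weight update all happen on the IPU.
        output, loss = poptorch_model(batch, target)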
3.4. Multiple/custom losses
If using multiple losses, or when creating a custom loss, the final loss must be marked explicitly using identity_loss().
PyTorch:

def custom_loss(output, target):
    loss1 = torch.nn.functional.nll_loss(output, target)
    loss2 = torch.nn.functional.nll_loss(output, target) * 5.0
    return loss1 + loss2

PopTorch:

def custom_loss(output, target):
    loss1 = torch.nn.functional.nll_loss(output, target)
    loss2 = torch.nn.functional.nll_loss(output, target) * 5.0
    return poptorch.identity_loss(loss1 + loss2, reduction='none')
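Such a custom loss is typically computed inside the model's forward method so that the training wrapper can return it. A minimal sketch, assuming a small classifier along the lines of ExampleModelWithLoss above:

import torch
import poptorch

class ExampleModelWithCustomLoss(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(1, 2)

    def forward(self, x, target=None):
        out = torch.nn.functional.log_softmax(self.fc(x), dim=-1)
        if target is not None:
            loss1 = torch.nn.functional.nll_loss(out, target)
            loss2 = torch.nn.functional.nll_loss(out, target) * 5.0
            # Mark the combined value as the final loss for PopTorch.
            return out, poptorch.identity_loss(loss1 + loss2, reduction='none')
        return out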
3.5. Optimizers
One important thing to note about using optimizers in PopTorch is that the optimizer state is encapsulated within the PopTorch model.
As such, any change made to the optimizer outside of the model must be followed by a call to poptorch_model.setOptimizer, passing in the updated optimizer.
PyTorch:

if momentum_loss is None:
    momentum_loss = loss
else:
    momentum_loss = momentum_loss * 0.95 + loss * 0.05

if momentum_loss < 0.1:
    optimizer = torch.optim.AdamW(model.parameters(), lr=0.0001)

PopTorch:

if momentum_loss is None:
    momentum_loss = loss
else:
    momentum_loss = momentum_loss * 0.95 + loss * 0.05

# Optimizer can be updated via setOptimizer.
if momentum_loss < 0.1:
    poptorch_model.setOptimizer(
        torch.optim.AdamW(model.parameters(), lr=0.0001))
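The same pattern works for simple learning-rate schedules: adjust or rebuild the optimizer on the host, then hand it back to the wrapped model with setOptimizer. A sketch, where the decay factor and epoch count are arbitrary choices for illustration:

lr = 0.001
for epoch in range(10):
    for batch, target in training_data:
        _, loss = poptorch_model(batch, target)
    # Decay the learning rate on the host, then push the change to the IPU.
    lr *= 0.9
    poptorch_model.setOptimizer(torch.optim.AdamW(model.parameters(), lr=lr))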
3.6. Going further
For a more detailed introduction to getting started with PopTorch, see the tutorial that walks through training an MNIST model on the IPU: https://github.com/graphcore/examples/tree/master/tutorials/pytorch/tut1_basics