3. From PyTorch to PopTorch
This page will introduce the key features that enable training on the IPU, and how they differ from native PyTorch.
3.1. Preparing your data
Data loading in PyTorch is typically handled using torch.utils.data.DataLoader.
PopTorch extends PyTorch's DataLoader with its own poptorch.DataLoader to enable efficient data batching with respect to PopTorch's underlying machine learning framework, PopART.
Instantiation is almost identical to PyTorch, but you must remember to pass an instance of poptorch.Options.
PyTorch:

training_data = torch.utils.data.DataLoader(ExampleDataset(shape=[1],
                                                           length=20000),
                                            batch_size=10,
                                            shuffle=True,
                                            drop_last=True)
PopTorch:

# Set up the DataLoader to load that much data at each iteration
opts = poptorch.Options()
opts.deviceIterations(10)

training_data = poptorch.DataLoader(options=opts,
                                    dataset=ExampleDataset(shape=[1],
                                                           length=20000),
                                    batch_size=10,
                                    shuffle=True,
                                    drop_last=True)
For more information about how to set Options, see Section 5, Efficient data batching.
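As an illustrative sketch only (appropriate values depend on your model and on how many IPUs are available), a few commonly used settings can be applied to the same Options instance:

opts = poptorch.Options()

# Process 10 batches of data in each call to the model.
opts.deviceIterations(10)

# Replicate the model across 2 IPUs (assumes 2 IPUs are available).
opts.replicationFactor(2)

# Accumulate gradients over 4 micro-batches before each weight update
# (only relevant when training).
opts.Training.gradientAccumulation(4)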
3.2. Creating your model
3.2.1. Training
If you want to create a model for training on the IPU, you first need to wrap your model in another PyTorch model whose forward method returns a tuple containing two elements: the output of the model and the loss.
PopTorch:

class ExampleModelWithLoss(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.model = ExampleModel()

    def forward(self, input, target):
        out = self.model(input)
        return (torch.nn.functional.softmax(out),
                torch.nn.CrossEntropyLoss(reduction="mean")(out, target))
Then all you need to do is instantiate a trainingModel() by passing in your new PyTorch model, your Options instance and the optimizer.
PyTorch:

model = ExampleModelWithLoss()
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=0.001)
PopTorch:

model = ExampleModelWithLoss()
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=0.001)

# Wrap the model in a PopTorch training wrapper
poptorch_model = poptorch.trainingModel(model,
                                        options=opts,
                                        optimizer=optimizer)
3.2.2. Inference
For inference, it's even easier. Just instantiate an inferenceModel() by passing in your PyTorch model.
poptorch_model = poptorch.inferenceModel(model)
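The wrapped model is then called exactly like the original PyTorch module. A minimal usage sketch, assuming a model that takes a single input tensor of shape [1] (such as the underlying ExampleModel); the input shape and batch size below are chosen purely for illustration:

inference_model = ExampleModel()
inference_model.eval()

poptorch_inference = poptorch.inferenceModel(inference_model)

# Hypothetical input batch of 10 samples, shape [10, 1].
input = torch.randn(10, 1)

# The first call compiles the program and loads it onto the IPU;
# subsequent calls simply execute it.
output = poptorch_inference(input)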
3.3. The training loop
A simple training loop in PyTorch will typically consist of:
- Setting gradients to zero
- Performing a forwards pass with the model (and obtaining the loss)
- Performing the backwards pass with respect to the loss, and updating weights
- Updating the optimizer
In PopTorch, these steps are combined into a single step.
PyTorch:

for batch, target in training_data:
    # Zero gradients
    optimizer.zero_grad()

    # Run model.
    _, loss = model(batch, target)

    # Back propagate the gradients.
    loss.backward()

    # Update the weights.
    optimizer.step()
PopTorch:

for batch, target in training_data:
    # Performs forward pass, loss function evaluation,
    # backward pass and weight update in one go on the device.
    _, loss = poptorch_model(batch, target)
3.4. Multiple/custom losses
If using multiple losses, or when creating a custom loss, the final loss must be marked explicitly using identity_loss().
PyTorch:

def custom_loss(output, target):
    loss1 = torch.nn.functional.nll_loss(output, target)
    loss2 = torch.nn.functional.nll_loss(output, target) * 5.0
    return loss1 + loss2
PopTorch:

def custom_loss(output, target):
    loss1 = torch.nn.functional.nll_loss(output, target)
    loss2 = torch.nn.functional.nll_loss(output, target) * 5.0
    return poptorch.identity_loss(loss1 + loss2, reduction='none')
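A sketch of how such a custom loss might be wired in, following the ExampleModelWithLoss pattern above. ExampleModelWithCustomLoss and the log_softmax call are illustrative assumptions (nll_loss expects log-probabilities), not part of the PopTorch API:

class ExampleModelWithCustomLoss(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.model = ExampleModel()

    def forward(self, input, target):
        out = self.model(input)
        log_probs = torch.nn.functional.log_softmax(out, dim=1)
        # Return the output together with the explicitly marked final loss.
        return log_probs, custom_loss(log_probs, target)

model = ExampleModelWithCustomLoss()
model.train()

poptorch_model = poptorch.trainingModel(
    model,
    options=opts,
    optimizer=torch.optim.AdamW(model.parameters(), lr=0.001))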
3.5. Optimizers
One important thing to note about using optimizers in PopTorch is that the optimizer state is encapsulated within the PopTorch model.
As such, any change made to the optimizer outside of the model must be followed by a call to poptorch_model.setOptimizer(), passing in the updated optimizer.
Warning
PopTorch does not directly use the Python implementations of the optimizers; built-in implementations are used in their place. This means that you cannot currently use custom optimizers. Subclassing a built-in optimizer will generate a warning, and any custom behaviour is unlikely to take effect beyond setting the values of existing attributes.
PyTorch:

if momentum_loss is None:
    momentum_loss = loss
else:
    momentum_loss = momentum_loss * 0.95 + loss * 0.05

if momentum_loss < 0.1:
    optimizer = torch.optim.AdamW(model.parameters(), lr=0.0001)
PopTorch:

if momentum_loss is None:
    momentum_loss = loss
else:
    momentum_loss = momentum_loss * 0.95 + loss * 0.05

# Optimizer can be updated via setOptimizer.
if momentum_loss < 0.1:
    poptorch_model.setOptimizer(
        torch.optim.AdamW(model.parameters(), lr=0.0001))
Note
PopTorch also provides its own set of optimizers that can be accessed via poptorch.optim. These are wrapper classes which have several advantages over the native PyTorch optimizers: they embed constant attributes for performance/memory savings, and they allow you to specify additional parameters such as loss scaling and velocity scaling.
See Section 11.6, Optimizers for more information.
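For example, a poptorch.optim optimizer can be passed to trainingModel() in the same way as a torch.optim one; the loss_scaling value below is purely illustrative, not a recommendation:

# poptorch.optim.AdamW mirrors torch.optim.AdamW but also accepts
# IPU-specific parameters such as loss_scaling.
optimizer = poptorch.optim.AdamW(model.parameters(),
                                 lr=0.001,
                                 loss_scaling=1000.0)

poptorch_model = poptorch.trainingModel(model,
                                        options=opts,
                                        optimizer=optimizer)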
3.6. Going further
For a more detailed example of getting started with PopTorch, see the PyTorch basics tutorial, which walks through training an MNIST model on the IPU.