3. From PyTorch to PopTorch

This page will introduce the key features that enable training on the IPU, and how they differ from native PyTorch.

Note

PopTorch compiles a torch.nn.Module for the IPU when it is wrapped in either trainingModel() or inferenceModel(), as appropriate. This provides functionality similar to torch.compile, but with more flexibility to generate optimal code for the IPU. In addition, torch.compile does not pass options to a custom compiler backend.

For these reasons, we do not currently support torch.compile. Any calls to torch.compile should be replaced by wrapping the model with either trainingModel() or inferenceModel(). These functions perform static compilation of the whole graph to produce optimized code to run on the IPU. The compilation of multiple partial graphs is not supported.

3.1. Preparing your data

Data loading in PyTorch is typically handled using torch.utils.data.DataLoader.

PopTorch extends PyTorch’s DataLoader with its own DataLoader, which batches data efficiently for PopART, the machine learning framework underlying PopTorch. Instantiation is almost identical to PyTorch, but you must remember to pass an instance of Options.

PyTorch

training_data = torch.utils.data.DataLoader(ExampleDataset(shape=[1],
                                                           length=20000),
                                            batch_size=10,
                                            shuffle=True,
                                            drop_last=True)

PopTorch

# Set the number of device iterations so that, at each step, the DataLoader
# loads enough data for 10 iterations on the device (10 batches of 10 samples)
opts = poptorch.Options()
opts.deviceIterations(10)
training_data = poptorch.DataLoader(options=opts,
                                    dataset=ExampleDataset(shape=[1],
                                                           length=20000),
                                    batch_size=10,
                                    shuffle=True,
                                    drop_last=True)

For more information about how to set Options, see Section 5, Efficient data batching.
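As a rough guide to what these options do, the sketch below computes how many samples the poptorch.DataLoader hands to the model on each step. It assumes that the host-side batch is batch_size multiplied by the number of device iterations, the replication factor and the gradient accumulation factor. The helper names combined_batch_size and steps_per_epoch are hypothetical and not part of the PopTorch API.

```python
def combined_batch_size(batch_size, device_iterations=1,
                        replication_factor=1, gradient_accumulation=1):
    """Hypothetical helper: samples consumed per call to the PopTorch model,
    assuming the host-side batch is the product of these factors."""
    return (batch_size * device_iterations
            * replication_factor * gradient_accumulation)


def steps_per_epoch(dataset_length, combined, drop_last=True):
    """Hypothetical helper: host-side steps in one pass over the dataset."""
    if drop_last:
        return dataset_length // combined  # incomplete final batch dropped
    return -(-dataset_length // combined)  # ceiling division


combined = combined_batch_size(batch_size=10, device_iterations=10)
print(combined)                          # 100
print(steps_per_epoch(20000, combined))  # 200
```

With the settings above (batch_size=10 and deviceIterations(10)), each step consumes 100 samples, so the 20000-sample dataset gives 200 steps per epoch with drop_last=True.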

3.2. Creating your model

3.2.1. Training

To train a model on the IPU, you first need to wrap it in a torch.nn.Module whose forward() returns a tuple of two elements: the outputs of the model and the loss.

PopTorch

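class ExampleModelWithLoss(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.model = ExampleModel()
        # Create the loss once rather than on every forward pass.
        self.loss = torch.nn.CrossEntropyLoss(reduction="mean")

    def forward(self, input, target):
        out = self.model(input)

        # Return (outputs, loss); softmax needs an explicit dim.
        return (torch.nn.functional.softmax(out, dim=1),
                self.loss(out, target))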
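
If you have several models and losses, the same pattern can be written once as a generic wrapper. This is a minimal sketch in plain PyTorch: WithLoss is a hypothetical name, and torch.nn.Linear stands in for ExampleModel, which is not defined on this page.

```python
import torch


class WithLoss(torch.nn.Module):
    """Hypothetical generic wrapper returning (output, loss) for any model."""

    def __init__(self, model, loss_fn):
        super().__init__()
        self.model = model
        self.loss_fn = loss_fn

    def forward(self, input, target):
        out = self.model(input)
        return out, self.loss_fn(out, target)


# Stand-in model: 4 input features, 3 classes, batch of 8.
wrapped = WithLoss(torch.nn.Linear(4, 3), torch.nn.CrossEntropyLoss())
out, loss = wrapped(torch.randn(8, 4), torch.randint(0, 3, (8,)))
print(out.shape, loss.shape)  # torch.Size([8, 3]) torch.Size([])
```

Because the model is registered as a submodule, wrapped.parameters() still yields the model's weights, which is what the optimizer below needs.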


Then all you need to do is instantiate a trainingModel(), passing it your new PyTorch model, the Options instance and the optimizer.

PyTorch

model = ExampleModelWithLoss()
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=0.001)

PopTorch

model = ExampleModelWithLoss()
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=0.001)

# Wrap the model in a PopTorch training wrapper
poptorch_model = poptorch.trainingModel(model,
                                        options=opts,
                                        optimizer=optimizer)

3.2.2. Inference

For inference, it’s even easier: instantiate an inferenceModel() by passing your PyTorch model. You can also pass an Options instance if you need one.

poptorch_model = poptorch.inferenceModel(model)

3.3. The training loop

A simple training loop in PyTorch will typically consist of:

  • Setting gradients to zero

  • Performing a forward pass with the model (and obtaining the loss)

  • Performing the backward pass to compute the gradients of the loss

  • Stepping the optimizer to update the weights

In PopTorch, these steps are combined into a single call to the wrapped model, which runs all of them on the device.

PyTorch

for batch, target in training_data:
    # Zero gradients
    optimizer.zero_grad()

    # Run model.
    _, loss = model(batch, target)

    # Back propagate the gradients.
    loss.backward()

    # Update the weights.
    optimizer.step()

PopTorch

for batch, target in training_data:
    # Performs forward pass, loss function evaluation,
    # backward pass and weight update in one go on the device.
    _, loss = poptorch_model(batch, target)
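
For comparison, the work bundled into one poptorch_model(batch, target) call can be written as a single plain-PyTorch function. This is a sketch for illustration only: train_step and TinyModelWithLoss are hypothetical names, and the sketch ignores replication and device iterations (with deviceIterations(n), one call is expected to run n such steps on the device).

```python
import torch


def train_step(model, optimizer, batch, target):
    """Hypothetical helper: one plain-PyTorch training step, mirroring the
    four steps that a single poptorch_model(batch, target) call combines."""
    optimizer.zero_grad()           # 1. reset the gradients
    _, loss = model(batch, target)  # 2. forward pass and loss
    loss.backward()                 # 3. backward pass: compute gradients
    optimizer.step()                # 4. update the weights
    return loss.detach()


class TinyModelWithLoss(torch.nn.Module):
    """Hypothetical stand-in for ExampleModelWithLoss: 4 features, 3 classes."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 3)
        self.loss = torch.nn.CrossEntropyLoss()

    def forward(self, input, target):
        out = self.linear(input)
        return out, self.loss(out, target)


torch.manual_seed(0)
model = TinyModelWithLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
batch, target = torch.randn(16, 4), torch.randint(0, 3, (16,))

# Repeated steps on one fixed batch should drive its loss down.
losses = [train_step(model, optimizer, batch, target).item()
          for _ in range(20)]
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```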

3.4. Multiple/custom losses

If you use multiple losses, or create a custom loss, you must mark the final loss explicitly using identity_loss().

PyTorch

def custom_loss(output, target):
    loss1 = torch.nn.functional.nll_loss(output, target)
    loss2 = torch.nn.functional.nll_loss(output, target) * 5.0
    return loss1 + loss2

PopTorch

def custom_loss(output, target):
    loss1 = torch.nn.functional.nll_loss(output, target)
    loss2 = torch.nn.functional.nll_loss(output, target) * 5.0
    # Mark the combined value as the final loss for PopTorch.
    return poptorch.identity_loss(loss1 + loss2, reduction='none')
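
You can check the combined loss in plain PyTorch before moving it to PopTorch. The sketch below assumes output holds log-probabilities, which is what nll_loss expects. It only verifies that custom_loss already yields a single scalar (six times the NLL loss), so passing it through identity_loss with reduction='none' still leaves a scalar loss; poptorch.identity_loss itself is not called here.

```python
import torch
import torch.nn.functional as F


def custom_loss(output, target):
    # output is assumed to hold log-probabilities, as nll_loss expects.
    loss1 = F.nll_loss(output, target)
    loss2 = F.nll_loss(output, target) * 5.0
    return loss1 + loss2


torch.manual_seed(0)
log_probs = F.log_softmax(torch.randn(8, 3), dim=1)  # batch of 8, 3 classes
target = torch.randint(0, 3, (8,))

total = custom_loss(log_probs, target)
print(total.dim())  # 0: the combined loss is already a scalar
```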

3.5. Optimizers

One important thing to note about using optimizers in PopTorch is that the optimizer state is encapsulated within the PopTorch model. As such, any change you make to the optimizer outside the model must be followed by a call to poptorch_model.setOptimizer(), passing in the updated optimizer.

Warning

PopTorch does not directly use the Python implementation of the optimizers. Built-in implementations are used in their place. This means that you cannot currently use custom optimizers. Subclassing a built-in optimizer will generate a warning. Any custom behaviour in a custom optimizer is unlikely to take effect, other than simply setting the existing attributes.

PyTorch

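    # Use loss.item() so the running average does not keep the autograd
    # graph of every previous iteration alive.
    if momentum_loss is None:
        momentum_loss = loss.item()
    else:
        momentum_loss = momentum_loss * 0.95 + loss.item() * 0.05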

    if momentum_loss < 0.1:
        optimizer = torch.optim.AdamW(model.parameters(), lr=0.0001)

PopTorch

    if momentum_loss is None:
        momentum_loss = loss
    else:
        momentum_loss = momentum_loss * 0.95 + loss * 0.05

    # Optimizer can be updated via setOptimizer.
    if momentum_loss < 0.1:
        poptorch_model.setOptimizer(
            torch.optim.AdamW(model.parameters(), lr=0.0001))
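
Both snippets smooth the loss with an exponential moving average (decay 0.95) and, once the smoothed loss falls below 0.1, replace the optimizer on every subsequent iteration. If you only want to switch the learning rate once, you can track that explicitly. The pure-Python sketch below isolates the scheduling logic; ema and first_switch_step are hypothetical helper names, and the loss curve is synthetic.

```python
def ema(previous, value, decay=0.95):
    """Exponential moving average, as used in the snippets above."""
    if previous is None:
        return value
    return previous * decay + value * (1.0 - decay)


def first_switch_step(losses, threshold=0.1, decay=0.95):
    """Index of the first step whose smoothed loss is below threshold,
    or None if that never happens."""
    smoothed = None
    for step, loss in enumerate(losses):
        smoothed = ema(smoothed, loss, decay)
        if smoothed < threshold:
            return step
    return None


# Synthetic loss curve decaying geometrically from 1.0.
losses = [0.9 ** i for i in range(100)]
print(first_switch_step(losses))
```

In a training loop you could apply the same idea by setting a flag after the first poptorch_model.setOptimizer(...) call, so that later iterations skip it.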

Note

PopTorch also provides its own set of optimizers that can be accessed via poptorch.optim. These are wrapper classes which have several advantages over the native PyTorch optimizers. They embed constant attributes for performance/memory savings and allow you to specify additional parameters such as loss scaling and velocity scaling. See Section 11.6, Optimizers for more information.
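
Loss scaling, one of the parameters mentioned above, exists because small gradients can underflow to zero in float16. The idea can be illustrated without an IPU: multiply the loss (and therefore every gradient) by a scale factor before the backward pass, then divide the gradients by the same factor before the update. The pure-Python sketch below uses the standard struct module's half-precision format to round values to float16. The gradient value and the scale of 1024 are arbitrary choices for illustration, and this is not how poptorch.optim implements loss scaling internally.

```python
import struct


def to_float16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


loss_scaling = 1024.0
# Well below half the smallest float16 subnormal (2**-24, about 6e-8).
true_grad = 1e-8

# Without scaling, the gradient rounds to zero in float16.
unscaled = to_float16(true_grad)

# With scaling, the scaled gradient is representable in float16;
# dividing by the scale in higher precision recovers an approximation.
recovered = to_float16(true_grad * loss_scaling) / loss_scaling

print(unscaled)   # 0.0
print(recovered)  # close to 1e-8
```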

3.6. Going further

For a more detailed example of getting started with PopTorch, see the PyTorch basics tutorial, which walks through training an MNIST model on the IPU.