3. From PyTorch to PopTorch
This page will introduce the key features that enable training on the IPU, and how they differ from native PyTorch.
3.1. Preparing your data
Data loading in PyTorch is typically handled using torch.utils.data.DataLoader.
PopTorch extends PyTorch’s DataLoader with a poptorch.DataLoader
to enable efficient data batching with respect to PopTorch’s underlying machine learning framework, PopART.
Instantiation is almost identical to PyTorch, but you must remember to pass an instance of poptorch.Options.
PyTorch:

training_data = torch.utils.data.DataLoader(ExampleDataset(shape=[1],
                                                           length=20000),
                                            batch_size=10,
                                            shuffle=True,
                                            drop_last=True)
PopTorch:

# Set device iterations so the DataLoader loads enough data
# for 10 iterations at each step.
opts = poptorch.Options()
opts.deviceIterations(10)
training_data = poptorch.DataLoader(options=opts,
                                    dataset=ExampleDataset(shape=[1],
                                                           length=20000),
                                    batch_size=10,
                                    shuffle=True,
                                    drop_last=True)
For more information about how to set poptorch.Options, see Efficient data batching.
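The snippets above assume an ExampleDataset; its definition is not part of this section, but a minimal sketch of a dataset with that interface (purely illustrative, not part of the PopTorch API) could be:

import torch

class ExampleDataset(torch.utils.data.Dataset):
    # A toy dataset producing random inputs and integer class targets.
    def __init__(self, shape, length):
        self.length = length
        self.data = torch.rand(length, *shape)
        self.labels = torch.randint(0, 2, (length,), dtype=torch.long)

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        return self.data[index], self.labels[index]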
3.2. Creating your model
3.2.1. Training
If you want to create a model for training on the IPU, all you need to do is instantiate a trainingModel(), passing in your PyTorch model, Options and optimizer.
PyTorch:

model = ExampleModelWithLoss()
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=0.001)

PopTorch:

model = ExampleModelWithLoss()
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=0.001)
# Wrap the model in a PopTorch training wrapper
poptorch_model = poptorch.trainingModel(model,
                                        options=opts,
                                        optimizer=optimizer)
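The ExampleModelWithLoss used here is a regular PyTorch module that computes its own loss in forward when a target is supplied; its definition is not shown in this section, but a minimal sketch (an illustrative assumption consistent with the snippets above) could be:

import torch

class ExampleModelWithLoss(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(1, 2)
        self.loss = torch.nn.CrossEntropyLoss()

    def forward(self, x, target=None):
        out = self.fc(x)
        # Return the loss alongside the output so the training wrapper can use it.
        if target is not None:
            return out, self.loss(out, target)
        return out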
3.2.2. Inference
For inference, it’s even easier. Just instantiate an inferenceModel()
by passing your PyTorch model.
poptorch_model = poptorch.inferenceModel(model)
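Calling the wrapped model then runs the forward pass on the IPU. A short usage sketch, assuming the ExampleModelWithLoss above and leaving out the target so that only the output is returned:

model.eval()  # switch the PyTorch model to evaluation mode before wrapping
poptorch_model = poptorch.inferenceModel(model)
predictions = poptorch_model(torch.rand(10, 1))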
3.3. The training loop
A simple training loop in PyTorch will typically consist of:
Setting the gradients to zero
Performing a forward pass with the model (and obtaining the loss)
Performing the backward pass with respect to the loss
Updating the weights with the optimizer
In PopTorch, these steps are combined into a single step performed by the wrapped model.
PyTorch:

for batch, target in training_data:
    # Zero gradients
    optimizer.zero_grad()
    # Run model.
    _, loss = model(batch, target)
    # Back propagate the gradients.
    loss.backward()
    # Update the weights.
    optimizer.step()

PopTorch:

for batch, target in training_data:
    # Performs forward pass, loss function evaluation,
    # backward pass and weight update in one go on the device.
    _, loss = poptorch_model(batch, target)
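Putting the earlier snippets together, a full PopTorch training run is just the data and model setup followed by this loop. A sketch assembled from the pieces above (the epoch count is an arbitrary illustrative choice):

for epoch in range(5):
    for batch, target in training_data:
        # Forward pass, loss, backward pass and weight update all happen on the IPU.
        output, loss = poptorch_model(batch, target)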
3.4. Multiple/custom losses
If using multiple losses, or when creating a custom loss, the final loss must be marked explicitly using identity_loss().
PyTorch:

def custom_loss(output, target):
    loss1 = torch.nn.functional.nll_loss(output, target)
    loss2 = torch.nn.functional.nll_loss(output, target) * 5.0
    return loss1 + loss2

PopTorch:

def custom_loss(output, target):
    loss1 = torch.nn.functional.nll_loss(output, target)
    loss2 = torch.nn.functional.nll_loss(output, target) * 5.0
    return poptorch.identity_loss(loss1 + loss2, reduction='none')
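Such a custom loss is typically computed inside the model's forward method so that the training wrapper can return it. A minimal sketch, assuming a small classifier along the lines of ExampleModelWithLoss above:

import torch
import poptorch

class ExampleModelWithCustomLoss(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(1, 2)

    def forward(self, x, target=None):
        out = torch.nn.functional.log_softmax(self.fc(x), dim=-1)
        if target is not None:
            loss1 = torch.nn.functional.nll_loss(out, target)
            loss2 = torch.nn.functional.nll_loss(out, target) * 5.0
            # Mark the combined value as the final loss for PopTorch.
            return out, poptorch.identity_loss(loss1 + loss2, reduction='none')
        return out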
3.5. Optimizers
One important thing to note about using optimizers in PopTorch is that the optimizer state is encapsulated within the PopTorch model.
As such, any change made to the optimizer outside of the model must be followed by a call to poptorch_model.setOptimizer, passing in the updated optimizer.
PyTorch:

if momentum_loss is None:
    momentum_loss = loss
else:
    momentum_loss = momentum_loss * 0.95 + loss * 0.05

if momentum_loss < 0.1:
    optimizer = torch.optim.AdamW(model.parameters(), lr=0.0001)

PopTorch:

if momentum_loss is None:
    momentum_loss = loss
else:
    momentum_loss = momentum_loss * 0.95 + loss * 0.05

# Optimizer can be updated via setOptimizer.
if momentum_loss < 0.1:
    poptorch_model.setOptimizer(
        torch.optim.AdamW(model.parameters(), lr=0.0001))
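The same pattern works for simple learning-rate schedules: adjust or rebuild the optimizer on the host, then hand it back to the wrapped model with setOptimizer. A sketch, where the decay factor and epoch count are arbitrary choices for illustration:

lr = 0.001
for epoch in range(10):
    for batch, target in training_data:
        _, loss = poptorch_model(batch, target)
    # Decay the learning rate on the host, then push the change to the IPU.
    lr *= 0.9
    poptorch_model.setOptimizer(torch.optim.AdamW(model.parameters(), lr=lr))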
3.6. Going further
For a more detailed introduction to getting started with PopTorch, see the tutorial that walks through training an MNIST model on the IPU: https://github.com/graphcore/examples/tree/master/tutorials/pytorch/tut1_basics