7. Debugging your model

7.1. Inspecting tensors

PopTorch allows you to inspect arbitrary tensors in both inference and training models. This is very useful for debugging conditions such as overflows, underflows or vanishing gradients.

Numerous tensors are generated during model compilation. To inspect their values, you first need to know their names. You can retrieve the complete list of tensor names in your model by calling getTensorNames(). Note that the model must be compiled first; compilation happens implicitly on the first call to the model.

Listing 7.1 Retrieving the list of tensor names
import torch
import poptorch

input = torch.rand(10, 10)
label = torch.rand(10, 10)

model = Model()  # Model is a torch.nn.Module defined elsewhere
poptorch_model = poptorch.trainingModel(model)
poptorch_model(input, label)  # the first call compiles the model

tensor_names = poptorch_model.getTensorNames()
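
The returned names follow PopTorch's internal naming scheme. As a minimal sketch, assuming gradient tensors are prefixed with 'Gradient___' (the convention used by the tensor names in the next section), you can filter the list to find the tensors you are interested in:

# A sketch only: filter the name list for gradient tensors.
# The 'Gradient___' prefix matches the names used in Listing 7.2,
# but the exact naming scheme may vary between models and SDK versions.
grad_tensor_names = [n for n in tensor_names if n.startswith('Gradient___')]
print(grad_tensor_names)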

7.2. Anchoring tensors

Once you have chosen a few tensors of interest, the next step is to create anchors. Anchoring enables a tensor to be observed by the application without it having to be a model output.

You can create an anchor by calling anchorTensor(). It takes two mandatory string parameters: a convenient user-defined name for the anchor and the name of the chosen tensor. Optionally, you may also specify the output mode and the output return period. For these settings to take effect, they must be set before model compilation.

In the example below, two anchors are created: one for a bias gradient tensor and one for the updated weights of a linear layer.

Listing 7.2 Anchoring tensors
opts = poptorch.Options()
opts.anchorTensor('grad_bias', 'Gradient___fc2.bias')
opts.anchorTensor('update_weight', 'UpdatedVar___fc2.weight')
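
The optional parameters are sketched below. This assumes a PopTorch release that provides the poptorch.OutputMode enumeration (older releases expose poptorch.AnchorMode instead); check your SDK documentation for the exact names.

# A sketch of the optional parameters (assumes poptorch.OutputMode).
# Return the bias gradient for every batch:
opts.anchorTensor('grad_bias', 'Gradient___fc2.bias',
                  poptorch.OutputMode.All)
# Return the updated weights every 4th batch:
opts.anchorTensor('update_weight', 'UpdatedVar___fc2.weight',
                  poptorch.OutputMode.EveryN, 4)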

7.3. Retrieving tensors

The anchored tensors will be updated after every model invocation. You can retrieve their values using getAnchoredTensor(). The function takes a single parameter: the user-defined anchor name.

In the example below, we execute one training run and retrieve the values of the two tensors we have anchored previously.

Listing 7.3 Retrieving anchored tensors
poptorch_model = poptorch.trainingModel(model, opts)
poptorch_model(input, label)

grad = poptorch_model.getAnchoredTensor('grad_bias')
update = poptorch_model.getAnchoredTensor('update_weight')

For a more practical guide to observing tensors, Graphcore’s examples repository contains a tutorial that covers anchoring and generating a gradient histogram: PopTorch tutorial: Observing tensors.
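
As a minimal sketch of that idea, assuming the grad tensor retrieved in Listing 7.3, you can bucket the absolute gradient values with torch.histc to check for vanishing or exploding gradients:

# A sketch only: build a histogram of the anchored bias gradient.
# torch.histc buckets the absolute gradient values into 20 bins,
# which makes vanishing or exploding gradients easier to spot.
abs_grad = grad.abs().float()
hist = torch.histc(abs_grad, bins=20, min=0.0, max=float(abs_grad.max()))
print(hist)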

7.4. Inspecting optimiser state

You can inspect the optimiser state without using anchoring. After you instantiate a trainingModel(), the optimiser’s state_dict() function returns the internal optimiser’s state. This state dictionary is populated when the training model is compiled and is updated after each training step.

Listing 7.4 Inspecting optimiser state
optim = poptorch.optim.SGD(model.parameters(), lr=0.01)
poptorch_model = poptorch.trainingModel(model, opts, optim)
poptorch_model(input, label)

state = optim.state_dict()
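
As a minimal sketch, you can then iterate over the returned dictionary to examine its entries. The exact keys and nesting depend on the optimiser in use and, as the note below explains, may differ from PyTorch's layout.

# A sketch only: print the name and shape (or type) of each state entry.
for key, value in state.items():
    if isinstance(value, torch.Tensor):
        print(key, tuple(value.shape))
    else:
        print(key, type(value).__name__)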

Note

The entries in PopTorch’s optimiser state_dict() may differ from those in PyTorch in both name and structure.