7. Debugging your model
7.1. Inspecting tensors
PopTorch allows you to inspect arbitrary tensors in both inference and training models. This is very useful for debugging conditions such as overflows, underflows or vanishing gradients.
Numerous tensors are generated during model compilation. To inspect their values, you first have to find their names. You can retrieve the complete list of tensor names in your model by calling getTensorNames(). Note that the model must first be compiled.
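import torch
import poptorch

input = torch.rand(10, 10)
label = torch.rand(10, 10)
model = Model()  # Model is a user-defined torch.nn.Module
poptorch_model = poptorch.trainingModel(model)
# The first call compiles the model, so the tensor names are available afterwards.
poptorch_model(input, label)
tensor_names = poptorch_model.getTensorNames()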
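The list returned by getTensorNames() can be long, so it often helps to narrow it down before choosing which tensors to anchor. The sketch below is a plain-Python helper and not part of PopTorch: filter_tensor_names is a hypothetical name, and the sample list only imitates the naming pattern used in the anchoring example in this chapter (Gradient___fc2.bias, UpdatedVar___fc2.weight). The names in your own model may differ.

```python
def filter_tensor_names(tensor_names, prefix="", contains=""):
    """Return the names that start with `prefix` and contain `contains`.

    Hypothetical helper, not a PopTorch API. Both filters default to the
    empty string, which matches every name.
    """
    return [
        name for name in tensor_names
        if name.startswith(prefix) and contains in name
    ]


# Hypothetical sample list; in practice use poptorch_model.getTensorNames().
sample_names = [
    "input",
    "fc1.weight",
    "Gradient___fc1.weight",
    "Gradient___fc2.bias",
    "UpdatedVar___fc2.weight",
]

# All gradient tensors, assuming they share the "Gradient___" prefix.
gradient_names = filter_tensor_names(sample_names, prefix="Gradient___")

# Every tensor related to the layer called "fc2".
fc2_names = filter_tensor_names(sample_names, contains="fc2")
```

Printing the filtered lists is usually enough to spot the exact names to pass on to the anchoring step.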
7.2. Anchoring tensors
Once you have chosen a few tensors of interest, the next step is to create anchors. Anchoring enables a tensor to be observed by the application without it having to be a model output.
You can create an anchor by calling anchorTensor(). It takes two mandatory string parameters: a convenient user-defined name for the anchor and the name of the chosen tensor. Optionally, you may specify the output mode as well as the output return period. In order for these option settings to take effect, they must be set before model compilation.
In the example below, two anchors are created: one for a bias gradient tensor and one for the updated weights of a linear layer.
opts = poptorch.Options()
opts.anchorTensor('grad_bias', 'Gradient___fc2.bias')
opts.anchorTensor('update_weight', 'UpdatedVar___fc2.weight')
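When many tensors are of interest, the anchors can be created in a loop instead of one call per tensor. The following is a minimal sketch under stated assumptions: anchor_by_prefix and RecordingOptions are hypothetical names, the "Gradient___" prefix is taken from the example above, and the only PopTorch call relied on is anchorTensor() with its two mandatory string parameters. Because tensor names are only known after compilation while anchors must be set before it, one possible workflow (an assumption, not something this chapter prescribes) is to compile once to list the names, then build a fresh Options object and a new training model with the anchors in place.

```python
def anchor_by_prefix(opts, tensor_names, prefix="Gradient___", short_prefix="grad_"):
    """Anchor every tensor whose name starts with `prefix`.

    `opts` is expected to provide anchorTensor(anchor_name, tensor_name),
    as poptorch.Options does. Returns a dict mapping each anchor name to
    the tensor name it observes. Hypothetical helper, not a PopTorch API.
    """
    anchors = {}
    for tensor_name in tensor_names:
        if not tensor_name.startswith(prefix):
            continue
        # Derive a short anchor name: "Gradient___fc2.bias" -> "grad_fc2_bias"
        suffix = tensor_name[len(prefix):].replace(".", "_")
        anchor_name = short_prefix + suffix
        opts.anchorTensor(anchor_name, tensor_name)
        anchors[anchor_name] = tensor_name
    return anchors


class RecordingOptions:
    """Stand-in for poptorch.Options so the sketch runs without PopTorch."""

    def __init__(self):
        self.calls = []

    def anchorTensor(self, short_name, long_name):
        self.calls.append((short_name, long_name))


demo_opts = RecordingOptions()
demo_anchors = anchor_by_prefix(
    demo_opts,
    ["fc2.bias", "Gradient___fc2.bias", "Gradient___fc1.weight"],
)
```

With the real library, pass a poptorch.Options() instance as opts and keep the returned dictionary, since its keys are the anchor names needed when retrieving the tensors.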
7.3. Retrieving tensors
The anchored tensors are updated after every model invocation. You can retrieve their values using getAnchoredTensor(). The function takes a single parameter: the user-defined anchor name.
In the example below, we execute one training step and retrieve the values of the two tensors we anchored previously.
poptorch_model = poptorch.trainingModel(model, opts)
poptorch_model(input, label)
grad = poptorch_model.getAnchoredTensor('grad_bias')
update = poptorch_model.getAnchoredTensor('update_weight')
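Once the values are on the host, a few summary statistics are often enough to spot the conditions mentioned at the start of this chapter: overflows, underflows and vanishing gradients. The sketch below is plain Python and makes several assumptions: summarise_values is a hypothetical helper, the threshold tiny=1e-7 is an arbitrary illustrative choice, and the input is a flat sequence of floats.

```python
import math


def summarise_values(values, tiny=1e-7):
    """Summarise a flat sequence of floats for debugging.

    Reports how many entries are non-finite (inf or NaN, a sign of
    overflow), the largest finite magnitude, and the fraction of finite
    entries whose magnitude is below `tiny` (a hint of underflow or
    vanishing gradients). Hypothetical helper, not a PopTorch API.
    """
    values = list(values)
    finite = [v for v in values if math.isfinite(v)]
    tiny_count = sum(1 for v in finite if abs(v) < tiny)
    return {
        "count": len(values),
        "non_finite": len(values) - len(finite),
        "max_abs": max((abs(v) for v in finite), default=0.0),
        "tiny_fraction": tiny_count / len(finite) if finite else 0.0,
    }


# Illustrative input containing one inf, one NaN and one exact zero.
report = summarise_values([0.5, -2.0, 0.0, float("inf"), float("nan")])
```

Assuming the retrieved value is a torch.Tensor, it can be flattened first, for example summarise_values(grad.flatten().tolist()).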
For a hands-on walkthrough, the Graphcore GitHub examples repository contains a tutorial on observing tensors, which covers anchoring and generating a gradient histogram: PopTorch tutorial: Observing tensors.
7.4. Inspecting optimiser state
You can inspect the optimiser state without using anchoring. After you instantiate a trainingModel(), the optimiser’s state_dict() function will return the internal optimiser’s state. This state dictionary is populated when the training model is compiled, and is updated after each training step.
optim = poptorch.optim.SGD(model.parameters(), lr=0.01)
poptorch_model = poptorch.trainingModel(model, opts, optim)
poptorch_model(input, label)
state = optim.state_dict()
Note
The entries in PopTorch’s optimiser state_dict() may differ from those in PyTorch in both name and structure.
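Because the layout of the state dictionary is not guaranteed to match PyTorch, a generic walker that lists keys and value types is a convenient first step before relying on any particular entry. The sketch below assumes only that the state is built from nested dictionaries, lists or tuples. describe_state is a hypothetical helper, and demo_state is an invented layout used purely for illustration, not the structure PopTorch actually returns.

```python
def _describe_leaf(value):
    """Describe a leaf value by type, plus shape if it has one (e.g. a tensor)."""
    shape = getattr(value, "shape", None)
    if shape is not None:
        return f"{type(value).__name__} shape={tuple(shape)}"
    return f"{type(value).__name__} {value!r}"


def describe_state(state, indent=0):
    """Return one indented line per entry of a nested dict/list structure."""
    pad = "  " * indent
    if isinstance(state, dict):
        items = state.items()
    elif isinstance(state, (list, tuple)):
        items = enumerate(state)
    else:
        return [pad + _describe_leaf(state)]

    lines = []
    for key, value in items:
        if isinstance(value, (dict, list, tuple)):
            # Container: print the key, then recurse one level deeper.
            lines.append(f"{pad}{key}:")
            lines.extend(describe_state(value, indent + 1))
        else:
            lines.append(f"{pad}{key}: {_describe_leaf(value)}")
    return lines


# Invented layout for illustration only; use optim.state_dict() in practice.
demo_state = {"state": {"scalar": {"lr": 0.01}}, "param_groups": [{"lr": 0.01}]}
demo_lines = describe_state(demo_state)
```

Calling print("\n".join(describe_state(optim.state_dict()))) after a training step gives a quick overview of which entries exist and what they hold.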