7. Debugging your model
7.1. Inspecting tensors
PopTorch allows you to inspect arbitrary tensors in both inference and training models. This is very useful for debugging conditions such as overflows, underflows or vanishing gradients.
Numerous tensors are generated during model compilation. To inspect their
values, you first need to find their names. You can retrieve the complete
list of tensor names in your model by calling getTensorNames(). Note that
the model must first be compiled.
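import torch
import poptorch

input = torch.rand(10, 10)
label = torch.rand(10, 10)
model = Model()  # Model is your torch.nn.Module, defined elsewhere
poptorch_model = poptorch.trainingModel(model)
# The first call compiles the model, which makes the tensor names available.
poptorch_model(input, label)
tensor_names = poptorch_model.getTensorNames()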
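The full list of names can be long, so it often helps to narrow it down before choosing tensors to observe. The following is a minimal sketch in plain Python, not part of the PopTorch API: select_tensor_names is a hypothetical helper, and sample_names is made-up data standing in for the list returned by getTensorNames(). The Gradient___ and UpdatedVar___ prefixes are taken from the anchoring example in this chapter; the names in your own model may look different.

```python
def select_tensor_names(tensor_names, prefix="", contains=""):
    """Return the names that start with `prefix` and contain `contains`."""
    return [
        name for name in tensor_names
        if name.startswith(prefix) and contains in name
    ]


# Hypothetical sample standing in for poptorch_model.getTensorNames().
sample_names = [
    "input",
    "fc1.weight",
    "Gradient___fc1.weight",
    "Gradient___fc2.bias",
    "UpdatedVar___fc2.weight",
]

# All gradient tensors, then everything related to the fc2 layer.
gradient_names = select_tensor_names(sample_names, prefix="Gradient___")
fc2_names = select_tensor_names(sample_names, contains="fc2")
print(gradient_names)
print(fc2_names)
```

With a real model you would pass tensor_names from the example above in place of sample_names.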
7.2. Anchoring tensors
Once you have chosen a few tensors of interest, the next step is to create anchors. Anchoring enables a tensor to be observed by the application without it having to be a model output.
You can create an anchor by calling anchorTensor(). It takes two mandatory
string parameters: a convenient user-defined name for the anchor and the name
of the chosen tensor. Optionally, you may specify the output mode as well as
the output return period. For these options to take effect, they must be set
before model compilation.
In the example below, two anchors are created: one for a bias gradient tensor and one for the updated weights of a linear layer.
opts = poptorch.Options()
opts.anchorTensor('grad_bias', 'Gradient___fc2.bias')
opts.anchorTensor('update_weight', 'UpdatedVar___fc2.weight')
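Before creating anchors, it can be worth checking that each tensor name you intend to anchor really appears in the list of known tensor names. The sketch below is plain Python and does not call PopTorch: check_anchor_requests and known_names are hypothetical, and the sample data only mirrors the names used in the example above. It assumes that you collected the names from an already compiled model and will apply the anchors to the options of a model that has not been compiled yet.

```python
def check_anchor_requests(requests, available_names):
    """Split {anchor name: tensor name} into found and missing entries."""
    available = set(available_names)
    found = {a: t for a, t in requests.items() if t in available}
    missing = {a: t for a, t in requests.items() if t not in available}
    return found, missing


# Hypothetical list of names, as returned earlier by getTensorNames().
known_names = ["Gradient___fc2.bias", "UpdatedVar___fc2.weight"]

requests = {
    "grad_bias": "Gradient___fc2.bias",
    "update_weight": "UpdatedVar___fc2.weight",
    "typo": "Gradient___fc3.bias",  # deliberately wrong, to show the check
}

found, missing = check_anchor_requests(requests, known_names)
print("can anchor:", found)
print("unknown tensors:", missing)

# With PopTorch available, the valid requests could then be applied as:
#     for anchor_name, tensor_name in found.items():
#         opts.anchorTensor(anchor_name, tensor_name)
```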
7.3. Retrieving tensors
The anchored tensors will be updated after every model invocation. You can
retrieve their values using getAnchoredTensor(). The function takes a single
parameter: the user-defined anchor name.
In the example below, we execute one training run and retrieve the values of the two tensors we have anchored previously.
poptorch_model = poptorch.trainingModel(model, opts)
poptorch_model(input, label)
grad = poptorch_model.getAnchoredTensor('grad_bias')
update = poptorch_model.getAnchoredTensor('update_weight')
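Once you have the values, a quick numeric summary can point to the conditions mentioned at the start of this chapter: overflows show up as infinities or NaNs, and vanishing gradients as a large share of near-zero entries. The following is a minimal sketch in plain Python. summarise_values and the tiny threshold are hypothetical choices, and the sketch assumes you have converted the retrieved tensor to a flat list of floats first, for example with grad.flatten().tolist() if the returned value is a torch.Tensor.

```python
import math


def summarise_values(values, tiny=1e-8):
    """Summarise a flat list of floats for overflow and vanishing checks."""
    finite = [v for v in values if math.isfinite(v)]
    tiny_count = sum(1 for v in finite if abs(v) < tiny)
    return {
        "count": len(values),
        "nan": sum(1 for v in values if math.isnan(v)),
        "inf": sum(1 for v in values if math.isinf(v)),
        # Largest magnitude among the finite entries (0.0 if there are none).
        "max_abs": max((abs(v) for v in finite), default=0.0),
        # Share of finite entries whose magnitude is below `tiny`.
        "tiny_fraction": tiny_count / len(finite) if finite else 0.0,
    }


# Made-up values standing in for a flattened anchored gradient tensor.
sample = [0.5, -2.0, 0.0, 1e-12, float("inf"), float("nan")]
report = summarise_values(sample)
print(report)
```

A non-zero nan or inf count suggests an overflow somewhere upstream, while a tiny_fraction close to 1.0 is a hint that gradients may be vanishing. The right threshold depends on your model and data type.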
For a hands-on walkthrough, Graphcore’s examples repository contains a tutorial that covers observing tensors, using anchoring and generating a gradient histogram: PopTorch tutorial: Observing tensors.
7.4. Inspecting optimiser state
You can inspect optimiser state without using anchoring. After you instantiate
a trainingModel(), the optimiser’s state_dict() function returns the internal
optimiser’s state. This state dictionary is populated when the training model
is compiled, and is updated after each training step.
optim = poptorch.optim.SGD(model.parameters(), lr=0.01)
poptorch_model = poptorch.trainingModel(model, opts, optim)
poptorch_model(input, label)
state = optim.state_dict()
Note
The entries in PopTorch’s optimiser state_dict() may differ from those in PyTorch in both name and structure.
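Because the names and structure of the entries are not guaranteed to match PyTorch, it can be easier to list what the dictionary actually contains than to rely on fixed keys. The sketch below is plain Python: describe_state is a hypothetical helper, and sample_state is an invented dictionary used only for illustration, not the layout PopTorch produces.

```python
def describe_state(obj, prefix=""):
    """Yield (path, type name) pairs for each leaf in nested dicts and lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}/{key}" if prefix else str(key)
            yield from describe_state(value, path)
    elif isinstance(obj, (list, tuple)):
        for index, value in enumerate(obj):
            yield from describe_state(value, f"{prefix}[{index}]")
    else:
        yield prefix, type(obj).__name__


# Invented example; a real state_dict() will have different entries.
sample_state = {
    "param_groups": [{"lr": 0.01}],
    "state": {"step": 3},
}

layout = list(describe_state(sample_state))
for path, type_name in layout:
    print(f"{path}: {type_name}")
```

With a real optimiser you would pass optim.state_dict() in place of sample_state. Tensor-valued entries are reported by type name only, so the listing stays short.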