7. Error handling
This section describes how Model Runtime handles Poplar recoverable errors which are raised during the execution of a model. A recoverable error is raised when a running program fails due to a system error that is likely to be transient.
A full description of all Poplar errors can be found in the Exceptions section of the Poplar and PopLibs API Reference.
Model Runtime handles errors as follows:
application_runtime_error
If auto_reset is true, then the IPU is automatically reset before the next inference.
Any new requests will be processed after the IPU reset is complete.
If auto_reset is false, then an exception is raised. The error message contains the reason why the error occurred.
All requests which have already been enqueued before the exception occurred will raise the same error.
recoverable_runtime_error
If poplar::RecoveryAction is IPU_RESET and auto_reset is true, then the IPU is automatically reset before the next inference.
Any new requests will be processed after the IPU reset is complete.
If poplar::RecoveryAction is not IPU_RESET or auto_reset is false, then an exception is raised. The error message contains the reason why the error occurred.
All requests which have already been enqueued before the exception occurred will raise the same error.
Unknown runtime errors
An exception is raised.
The error message might contain the reason why the error occurred.
When these errors occur, manual intervention is required before the system is operational again.
The IPU will not be reset and all requests will raise the same error.
All other runtime errors
An exception is raised.
The error message might contain the reason why the error occurred.
When these errors occur, manual intervention might be required before the system is operational again.
The error message might contain a required recovery action.
The IPU will not be reset and all requests will raise the same error.
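The rules above can be summarised as a small decision function. The following is a minimal sketch only: ErrorKind, RecoveryAction, Outcome and decide are hypothetical stand-ins written for illustration, not part of the Poplar or Model Runtime API. It simply encodes which combinations lead to an automatic IPU reset and which lead to an exception.

```cpp
// Hypothetical stand-in types for illustration only.
// These are NOT the Poplar or Model Runtime types.
enum class ErrorKind { Application, Recoverable, Unknown, Other };
enum class RecoveryAction { IpuReset, OtherAction };
enum class Outcome { ResetThenContinue, RaiseException };

// Encodes the handling rules described in this section:
//  - application error: reset only when auto_reset is true
//  - recoverable error: reset only when the recovery action is an IPU
//    reset AND auto_reset is true
//  - unknown and all other errors: always raise, never reset
constexpr Outcome decide(ErrorKind kind, RecoveryAction action,
                         bool autoReset) {
  return (kind == ErrorKind::Application && autoReset)
             ? Outcome::ResetThenContinue
         : (kind == ErrorKind::Recoverable &&
            action == RecoveryAction::IpuReset && autoReset)
             ? Outcome::ResetThenContinue
             : Outcome::RaiseException;
}
```

In this sketch, the recovery action argument is ignored for every error kind except Recoverable, which mirrors the text: only recoverable_runtime_error is described as carrying a poplar::RecoveryAction that influences the outcome.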