7. Error handling
This section describes how Model Runtime handles Poplar recoverable errors which are raised during the execution of a model. A recoverable error is raised when a running program fails due to a system error that is likely to be transient.
A full description of all Poplar errors can be found in the Exceptions section of the Poplar and PopLibs API Reference.
Model Runtime handles errors as follows:
application_runtime_error
If auto_reset is true, the IPU is automatically reset before the next inference.
Any new requests will be processed after the IPU reset is complete.
If auto_reset is false, an exception is raised.
The error message contains the reason why the error occurred.
All requests which have already been enqueued before the exception occurred will raise the same error.
recoverable_runtime_error
If poplar::RecoveryAction is IPU_RESET and auto_reset is true, the IPU is automatically reset before the next inference.
Any new requests will be processed after the IPU reset is complete.
If poplar::RecoveryAction is not IPU_RESET or auto_reset is false, an exception is raised.
The error message contains the reason why the error occurred.
All requests which have already been enqueued before the exception occurred will raise the same error.
Unknown runtime errors
An exception is raised.
The error message might contain the reason why the error occurred.
When these errors occur, manual intervention is required before the system is operational again.
The IPU will not be reset and all requests will raise the same error.
All other runtime errors
An exception is raised.
The error message might contain the reason why the error occurred.
When these errors occur, manual intervention might be required before the system is operational again.
The error message might contain a required recovery action.
The IPU will not be reset and all requests will raise the same error.
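The rules above can be summarised as a small decision function. The following is a minimal sketch only: ErrorKind, RecoveryAction, Outcome, Intervention, decide and intervention are hypothetical names introduced for illustration and are not part of the Poplar or Model Runtime APIs. The RecoveryAction enum here is a two-value stand-in for poplar::RecoveryAction, and the autoReset parameter stands in for the auto_reset setting.

```cpp
// Hypothetical model of the error-handling rules in this section.
// None of these types are the real Poplar or Model Runtime types.

enum class ErrorKind {
  ApplicationRuntime, // application_runtime_error
  RecoverableRuntime, // recoverable_runtime_error
  Unknown,            // unknown runtime errors
  OtherRuntime        // all other runtime errors
};

// Stand-in for poplar::RecoveryAction: only "is it IPU_RESET?" matters here.
enum class RecoveryAction { IpuReset, Other };

enum class Outcome {
  ResetIpuThenContinue, // IPU is reset; new requests run after the reset
  RaiseToEnqueued,      // exception; already-enqueued requests get the same error
  RaiseToAll            // exception; no reset; all requests get the same error
};

enum class Intervention { NotStated, MightBeRequired, Required };

// Maps an error and the auto_reset setting to the behaviour described above.
constexpr Outcome decide(ErrorKind kind, RecoveryAction action, bool autoReset) {
  switch (kind) {
  case ErrorKind::ApplicationRuntime:
    // The recovery action is not consulted for application errors.
    return autoReset ? Outcome::ResetIpuThenContinue : Outcome::RaiseToEnqueued;
  case ErrorKind::RecoverableRuntime:
    // Both conditions must hold for an automatic reset.
    return (autoReset && action == RecoveryAction::IpuReset)
               ? Outcome::ResetIpuThenContinue
               : Outcome::RaiseToEnqueued;
  case ErrorKind::Unknown:
  case ErrorKind::OtherRuntime:
    return Outcome::RaiseToAll;
  }
  return Outcome::RaiseToAll; // unreachable for valid enum values
}

// Whether this section says manual intervention is needed after the error.
constexpr Intervention intervention(ErrorKind kind) {
  switch (kind) {
  case ErrorKind::Unknown:
    return Intervention::Required;
  case ErrorKind::OtherRuntime:
    return Intervention::MightBeRequired;
  default:
    return Intervention::NotStated;
  }
}
```

For example, decide(ErrorKind::RecoverableRuntime, RecoveryAction::Other, true) yields Outcome::RaiseToEnqueued, because auto_reset on its own is not enough when the recovery action is not IPU_RESET.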