Details
Bug
Status: Resolved
Resolution: Fixed
Description
A Thor instance reports its status and exceptions to its agent differently in the cloud than on bare metal.
On bare metal, a persistent socket connection relays the result and/or any exception; in the cloud, the result is communicated via the workunit.
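The difference between the two channels can be sketched as a small in-process model. This is a hypothetical illustration, not HPCC code: `Workunit`, `BareMetalThor`, `CloudThor` and `drain` are invented names, the socket is modeled as a queue, and the assumption that the cloud path only publishes exceptions to the workunit during a graceful wind-down is ours, chosen to reproduce the symptom the issue describes.

```python
from dataclasses import dataclass, field
from queue import SimpleQueue


@dataclass
class Workunit:
    """Hypothetical stand-in for the workunit record the agent can read."""
    exceptions: list = field(default_factory=list)


class BareMetalThor:
    """Relays each exception over a persistent connection (modeled as a queue)."""

    def __init__(self, socket: SimpleQueue) -> None:
        self.socket = socket

    def report_exception(self, msg: str) -> None:
        self.socket.put(msg)  # the agent can read it immediately

    def crash(self) -> None:
        pass  # anything already sent has reached the agent's side of the socket


class CloudThor:
    """Assumption: exceptions are held in memory and only written to the
    workunit when the job winds down gracefully."""

    def __init__(self, wu: Workunit) -> None:
        self.wu = wu
        self.pending: list = []

    def report_exception(self, msg: str) -> None:
        self.pending.append(msg)

    def finish(self) -> None:
        self.wu.exceptions.extend(self.pending)  # graceful wind-down publishes results
        self.pending.clear()

    def crash(self) -> None:
        self.pending.clear()  # process exits; unpublished exceptions are lost


def drain(q: SimpleQueue) -> list:
    """Collect everything the agent has received on the socket."""
    out = []
    while not q.empty():
        out.append(q.get())
    return out


# Same failure in both deployments: report an exception, then die without a clean abort.
sock = SimpleQueue()
bm = BareMetalThor(sock)
bm.report_exception("disk full")
bm.crash()
print(drain(sock))    # ['disk full']

wu = Workunit()
cloud = CloudThor(wu)
cloud.report_exception("disk full")
cloud.crash()
print(wu.exceptions)  # []
```

Under these assumptions the bare-metal agent still receives the exception, while the cloud agent finds nothing in the workunit.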
This works as long as the job winds down gracefully. However, if the job tries to report an exception early, before or while it is aborting (for example, when the job does not abort cleanly), the exception/status never reaches the agent.
The Thor instance eventually exits, and the agent sees only that the k8s instance died, with no reason given.
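The issue is marked Fixed but does not describe the fix. One possible mitigation, sketched here purely as a hypothetical, is for Thor to persist each exception to the workunit as soon as it is raised, and for the agent to consult the workunit when it notices the pod has exited. All names (`Workunit`, `CloudThor`, `agent_on_pod_exit`) and the exit-code handling are illustrative assumptions, not HPCC or Kubernetes APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Workunit:
    """Hypothetical stand-in for the shared workunit record."""
    exceptions: list = field(default_factory=list)


class CloudThor:
    """Hypothetical mitigation: publish each exception to the workunit as soon
    as it is raised, rather than waiting for a graceful wind-down."""

    def __init__(self, wu: Workunit) -> None:
        self.wu = wu

    def report_exception(self, msg: str) -> None:
        self.wu.exceptions.append(msg)  # persisted before any abort is attempted

    def crash(self) -> None:
        pass  # nothing is held only in memory, so nothing is lost


def agent_on_pod_exit(wu: Workunit, exit_code: int) -> str:
    """Called by the (hypothetical) agent once it sees the Thor pod has exited."""
    if wu.exceptions:
        return "Thor failed: " + "; ".join(wu.exceptions)
    if exit_code != 0:
        return f"Thor pod exited with code {exit_code} (no exception recorded)"
    return "Thor completed"


wu = Workunit()
thor = CloudThor(wu)
thor.report_exception("disk full")
thor.crash()
print(agent_on_pod_exit(wu, exit_code=1))          # Thor failed: disk full
print(agent_on_pod_exit(Workunit(), exit_code=1))  # Thor pod exited with code 1 (no exception recorded)
```

The design choice here is to make the workunit write the point of no return: once `report_exception` returns, the message survives however the process later dies, so the agent only falls back to an unexplained exit when nothing was recorded.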