Retry on daemon failure #11
Open
Labels: story (New feature or enhancement)
Description
User Story
As a developer, I would like to retry a different daemon if one fails so that the job can proceed uninterrupted.
Detailed Description
From the client's perspective, failures can be separated into the following categories:
- "Successfully delivered" failure: the remote task completed but returned a non-zero exit code. This should not trigger a retry: the task itself failed, so the entire graph has transitively failed.
- Failure to execute the task: the daemon reports to the client that the job could not be run, for example because the Docker container could not be started. The client should retry with a different daemon.
- Timeout: this should be interpreted as a network failure. The client should retry with a different daemon, but should not be surprised if a result later arrives from the daemon it gave up on.
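The three categories above can be told apart with a small classification step that decides whether to retry. The following Python sketch is illustrative only: `Outcome`, `DaemonReply`, `classify`, and `should_retry` are hypothetical names, and it assumes a request that timed out is represented as `None`, since the issue does not specify the client's protocol.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Outcome(Enum):
    SUCCESS = auto()
    TASK_FAILED = auto()       # ran, but non-zero exit code: do not retry
    EXECUTION_FAILED = auto()  # daemon could not run the task: retry elsewhere
    TIMEOUT = auto()           # no reply in time: treat as network failure, retry elsewhere


@dataclass
class DaemonReply:
    """Hypothetical reply shape; the issue does not specify the wire protocol."""
    executed: bool       # False if, e.g., the Docker container could not be started
    exit_code: int = 0   # only meaningful when executed is True
    diagnostic: str = ""


def classify(reply: Optional[DaemonReply]) -> Outcome:
    """Map a daemon reply to an outcome; None stands for 'no reply before the timeout'."""
    if reply is None:
        return Outcome.TIMEOUT
    if not reply.executed:
        return Outcome.EXECUTION_FAILED
    if reply.exit_code != 0:
        return Outcome.TASK_FAILED
    return Outcome.SUCCESS


def should_retry(outcome: Outcome) -> bool:
    """Only infrastructure failures are retried on another daemon."""
    return outcome in (Outcome.EXECUTION_FAILED, Outcome.TIMEOUT)
```

Keeping classification separate from the retry loop lets each category be unit-tested without a real daemon.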
Tasks
- Detect the first failure above.
- Detect the second failure above.
- Detect the third failure above, keeping track of the number of retries.
- Write unit tests.
Acceptance Criteria
- Given the first failure happens, then the client emits an error, which is propagated to the user.
- Given the second failure happens, then the user sees only a diagnostic message and the client tries the next daemon.
- Given the third failure happens, then the user sees a diagnostic message and the client tries the next daemon.
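A minimal Python sketch of the retry loop together with unit tests mirroring the acceptance criteria. It assumes a hypothetical transport `send(daemon, timeout)` that returns `(executed, exit_code)` or raises `TimeoutError`; `run_with_retry`, `TaskFailedError`, `AllDaemonsFailedError`, `fake_send`, and the `client` logger name are illustrative, not taken from any existing codebase. Diagnostic messages are emitted as log warnings, while a non-zero exit code is raised so it propagates to the user.

```python
import logging
import unittest
from typing import Callable, Dict, Optional, Sequence, Tuple

log = logging.getLogger("client")

# Hypothetical transport: send(daemon, timeout) -> (executed, exit_code),
# raising TimeoutError when no reply arrives in time.
Send = Callable[[str, float], Tuple[bool, int]]


class TaskFailedError(Exception):
    """The task ran but exited non-zero: propagate to the user, do not retry."""


class AllDaemonsFailedError(Exception):
    """Every daemon either failed to execute the task or timed out."""


def run_with_retry(daemons: Sequence[str], send: Send, timeout: float = 30.0) -> Tuple[str, int]:
    """Try daemons in order; return (daemon that succeeded, number of retries)."""
    retries = 0
    for daemon in daemons:
        try:
            executed, exit_code = send(daemon, timeout)
        except TimeoutError:
            # Treated as a network failure; the abandoned daemon may still reply later.
            log.warning("daemon %s timed out; trying next daemon", daemon)
            retries += 1
            continue
        if not executed:
            log.warning("daemon %s could not execute the task; trying next daemon", daemon)
            retries += 1
            continue
        if exit_code != 0:
            raise TaskFailedError(f"task exited with code {exit_code} on daemon {daemon}")
        return daemon, retries
    raise AllDaemonsFailedError(f"all {len(daemons)} daemons failed")


def fake_send(replies: Dict[str, Optional[Tuple[bool, int]]]) -> Send:
    """Test double: maps daemon -> (executed, exit_code), or None to simulate a timeout."""
    def send(daemon: str, timeout: float) -> Tuple[bool, int]:
        reply = replies[daemon]
        if reply is None:
            raise TimeoutError(daemon)
        return reply
    return send


class RetryTests(unittest.TestCase):
    def test_nonzero_exit_is_propagated_without_retry(self):
        send = fake_send({"a": (True, 1), "b": (True, 0)})
        with self.assertRaises(TaskFailedError):
            run_with_retry(["a", "b"], send)

    def test_execution_failure_moves_to_next_daemon(self):
        send = fake_send({"a": (False, 0), "b": (True, 0)})
        with self.assertLogs("client", level="WARNING") as logs:
            self.assertEqual(run_with_retry(["a", "b"], send), ("b", 1))
        self.assertIn("could not execute", logs.output[0])

    def test_timeout_moves_to_next_daemon_and_counts_retry(self):
        send = fake_send({"a": None, "b": None, "c": (True, 0)})
        with self.assertLogs("client", level="WARNING") as logs:
            self.assertEqual(run_with_retry(["a", "b", "c"], send), ("c", 2))
        self.assertEqual(len(logs.output), 2)
```

The tests can be run with `python -m unittest`. Returning the retry count alongside the daemon keeps the retry bookkeeping observable in tests; a real client would additionally need to decide how to handle a late result from a daemon it has already abandoned.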
Estimated points
3