Failure Handling and Retries #244

@DerDennisOP

Build failures should also be classified as transient, permanent, or timeout:
Transient: OOM, disk full, substitution network timeout, builder crash. The build is marked as FailedTransient and an identical new job is queued.
Permanent: the build exits with a non-zero status (build error), or the number of FailedTransient jobs has already reached 3 - 1. In the latter case the job moves directly to FailedPermanent instead of being retried.
Timeout: when meta.timeout or meta.maxSilent is exceeded, the build is moved to FailedTimeout.
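
A minimal sketch of the classification above, in Python for illustration only. The names `FailureCause`, `BuildStatus`, and `classify` are hypothetical and not taken from the codebase. It also assumes that timeouts are never auto-retried, which the description does not state explicitly.

```python
from enum import Enum

MAX_ATTEMPTS = 3  # retry cap proposed in this issue


class BuildStatus(Enum):
    FAILED_TRANSIENT = "FailedTransient"
    FAILED_PERMANENT = "FailedPermanent"
    FAILED_TIMEOUT = "FailedTimeout"


class FailureCause(Enum):  # hypothetical cause tags, one per case listed above
    OOM = "oom"
    DISK_FULL = "disk_full"
    SUBSTITUTION_TIMEOUT = "substitution_timeout"
    BUILDER_CRASH = "builder_crash"
    BUILD_ERROR = "build_error"        # non-zero exit status
    TIMEOUT = "timeout"                # meta.timeout exceeded
    MAX_SILENT = "max_silent"          # meta.maxSilent exceeded


TRANSIENT_CAUSES = {
    FailureCause.OOM,
    FailureCause.DISK_FULL,
    FailureCause.SUBSTITUTION_TIMEOUT,
    FailureCause.BUILDER_CRASH,
}
TIMEOUT_CAUSES = {FailureCause.TIMEOUT, FailureCause.MAX_SILENT}


def classify(cause: FailureCause, prior_transient_failures: int):
    """Return (status for the failed build, whether to queue an identical job)."""
    if cause in TIMEOUT_CAUSES:
        # Assumption: timeouts are not retried automatically.
        return BuildStatus.FAILED_TIMEOUT, False
    if cause in TRANSIENT_CAUSES:
        # With 3 - 1 earlier transient failures this was the last allowed attempt.
        if prior_transient_failures >= MAX_ATTEMPTS - 1:
            return BuildStatus.FAILED_PERMANENT, False
        return BuildStatus.FAILED_TRANSIENT, True
    # Build errors (non-zero exit status) are never retried.
    return BuildStatus.FAILED_PERMANENT, False
```

Under this reading, a job that hits OOM three times in a row ends as FailedTransient, FailedTransient, FailedPermanent.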

The frontend maps all three states (FailedPermanent, FailedTransient, and FailedTimeout) to the text "Failed". API queries to Entry Points ignore only builds that are FailedTransient.
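
The display mapping and the Entry Points filter could look roughly like this. This is a sketch in Python with hypothetical names (`display_text`, `visible_to_entry_points`) and an assumed dict shape for builds, not the actual frontend or API code.

```python
FAILED_STATES = {"FailedPermanent", "FailedTransient", "FailedTimeout"}


def display_text(status: str) -> str:
    """Collapse the three failure states into one label for the frontend."""
    return "Failed" if status in FAILED_STATES else status


def visible_to_entry_points(builds: list[dict]) -> list[dict]:
    """Drop only FailedTransient builds; a retry job supersedes them."""
    return [b for b in builds if b["status"] != "FailedTransient"]
```

Filtering out only FailedTransient means an Entry Point shows the queued retry rather than the superseded attempt, while permanent and timeout failures stay visible.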

Auto-retry should be capped at 3 attempts. It is currently unclear how to detect an OOM (maybe via the exit code), but we can already detect some issues, such as nar push errors.
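
One possible exit-code heuristic for the OOM case, sketched in Python. It relies on the fact that the Linux OOM killer terminates processes with SIGKILL (signal 9), which a shell reports as exit status 137 and Python's `subprocess` reports as return code -9. The function name is hypothetical, and SIGKILL can come from other sources too, so this is only a hint and not a reliable detector.

```python
SIGKILL = 9  # signal number used by the Linux OOM killer


def looks_like_oom_kill(returncode: int) -> bool:
    """Heuristic: True if the process was ended by SIGKILL.

    Covers both conventions: -9 (Python subprocess) and 128 + 9 = 137 (shell).
    """
    return returncode in (-SIGKILL, 128 + SIGKILL)
```

A stronger signal would have to come from outside the exit code, for example kernel logs or cgroup memory events, which is beyond what this sketch covers.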

Labels: code-quality (Code quality / DRY / structure), enhancement (New feature or request), medium (Medium severity)
