No retry/reconnect on controller connection failure #52

@Bellk17

Description

Noticed a race condition when the operator starts before spurctld is ready. The operator needs retry logic so that it eventually connects:

The node watcher and job controller in main.rs are spawned as tasks that exit permanently on the first transport error. If spurctld isn't ready when the operator starts, those tasks die and never come back. Requires wrapping them in retry loops.
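A rough sketch of what such a retry loop could look like, using exponential backoff with a cap. Everything here is hypothetical: `run_with_retry`, `next_backoff`, the task name, and the delays are made up for illustration and are not taken from the operator's code. It uses only the standard library and a blocking sleep to stay self-contained; in spawned async tasks the same loop would presumably await a timer instead of blocking the thread.

```rust
use std::fmt::Display;
use std::thread;
use std::time::Duration;

/// Doubles the current delay, capped at `max` (hypothetical helper).
fn next_backoff(current: Duration, max: Duration) -> Duration {
    current.saturating_mul(2).min(max)
}

/// Calls `task` until it returns `Ok`, sleeping between failed attempts.
/// Returns the successful value and the number of attempts made.
/// Hypothetical helper: the name and signature are assumptions.
fn run_with_retry<T, E, F>(name: &str, initial: Duration, max: Duration, mut task: F) -> (T, u32)
where
    E: Display,
    F: FnMut() -> Result<T, E>,
{
    let mut delay = initial;
    let mut attempts = 0u32;
    loop {
        attempts += 1;
        match task() {
            Ok(value) => return (value, attempts),
            Err(err) => {
                // Log and wait instead of letting the task exit for good.
                eprintln!("{name}: attempt {attempts} failed: {err}; retrying in {delay:?}");
                thread::sleep(delay);
                delay = next_backoff(delay, max);
            }
        }
    }
}

fn main() {
    // Stand-in for connecting to spurctld: fails twice, then succeeds.
    let mut calls = 0;
    let (conn, attempts) = run_with_retry(
        "node-watcher",
        Duration::from_millis(10),
        Duration::from_millis(40),
        || {
            calls += 1;
            if calls < 3 {
                Err("transport error: connection refused")
            } else {
                Ok("connected")
            }
        },
    );
    println!("{conn} after {attempts} attempts");
}
```

For a long-running watcher, the same loop would likely also restart the task after a clean exit and reset the delay once a connection has been healthy for a while; both are left out here to keep the sketch short.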
