Scheduling Queue and Event-driven Scheduling Attempts #421

@Varsius

Description

Introduce a scheduling queue in Cortex that keeps track of unschedulable workloads. These workloads should be re-evaluated when the cluster state changes (e.g. when nodes are added or workloads complete), rather than relying only on exponential backoff.

Objectives

  • Implement a pending queue for unschedulable pods, ordered by a priority (e.g. submission time)
  • Register and handle cluster events to trigger scheduling attempts for pending pods
  • Integrate queueing hints to selectively retry only affected workloads
  • Documentation
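
To make the objectives above concrete, here is a minimal sketch of how a pending queue, cluster events, and queueing hints could fit together. It is not Cortex's design: the names `SchedulingQueue`, `Pending`, `on_event`, the event names, and the `free_cpus` payload field are all hypothetical, and Python is used only for illustration. It assumes that ordering is by submission time and that a queueing hint is a per-event predicate that decides whether an event could make a workload schedulable.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical event names; the issue does not define an event vocabulary.
NODE_ADDED = "NodeAdded"
WORKLOAD_COMPLETED = "WorkloadCompleted"

# A queueing hint answers: could this event make the workload schedulable?
Hint = Callable[[dict], bool]


@dataclass
class Pending:
    name: str
    submitted_at: float  # ordering key: earlier submissions are retried first
    hints: Dict[str, Hint] = field(default_factory=dict)


class SchedulingQueue:
    def __init__(self) -> None:
        # Workloads that failed scheduling and are waiting for a relevant event.
        self._unschedulable: Dict[str, Pending] = {}
        # Min-heap of workloads that are ready for another scheduling attempt.
        self._active: List[Tuple[float, int, Pending]] = []
        # Unique counter so that ties on submitted_at never compare Pending objects.
        self._tiebreak = itertools.count()

    def add_unschedulable(self, workload: Pending) -> None:
        self._unschedulable[workload.name] = workload

    def on_event(self, kind: str, payload: dict) -> List[str]:
        """Move only the workloads whose hint accepts this event; return their names."""
        moved = []
        for name, workload in list(self._unschedulable.items()):
            hint = workload.hints.get(kind)
            if hint is not None and hint(payload):
                del self._unschedulable[name]
                entry = (workload.submitted_at, next(self._tiebreak), workload)
                heapq.heappush(self._active, entry)
                moved.append(name)
        return moved

    def pop(self) -> Optional[Pending]:
        """Return the next workload to attempt, or None if nothing is ready."""
        if not self._active:
            return None
        return heapq.heappop(self._active)[2]


queue = SchedulingQueue()
queue.add_unschedulable(Pending("job-b", 2.0, {NODE_ADDED: lambda e: e["free_cpus"] >= 4}))
queue.add_unschedulable(Pending("job-a", 1.0, {NODE_ADDED: lambda e: e["free_cpus"] >= 2}))
queue.add_unschedulable(Pending("job-c", 0.5, {WORKLOAD_COMPLETED: lambda e: True}))

# A node with 4 free CPUs appears: job-a and job-b become candidates, job-c stays parked.
moved = queue.on_event(NODE_ADDED, {"free_cpus": 4})
first = queue.pop()  # job-a, because it was submitted before job-b
```

In this sketch a workload with no hint registered for an event is never woken by that event, which is what keeps retries selective. A real implementation would likely still keep a backoff or periodic flush as a fallback for events that the hints miss.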

Acceptance Criteria

  • Unschedulable workloads are queued and retried when relevant cluster events occur
  • Basic e2e tests

Dependencies

N/A

Additional Notes
