Implement Gang-scheduling #393

@auhlig

Description

Cortex should support gang-/co-scheduling for Kubernetes workloads.
A job starts only when all pods in the gang and their required resources are available at the same time.
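To make the all-or-nothing rule concrete, here is a minimal Go sketch of a gang feasibility check. It assumes a simplified two-dimensional resource model (CPU millicores and memory MiB) and uses a greedy first-fit-decreasing heuristic. `Resources` and `PlaceGang` are hypothetical names, not part of Cortex or Kubernetes, and in practice the placement decision would come from Cortex itself.

```go
package main

import "sort"

// Resources is a simplified resource vector (CPU in millicores, memory in MiB).
// This model is an assumption for illustration, not Cortex's data model.
type Resources struct {
	MilliCPU int64
	MemMiB   int64
}

// fits reports whether req fits into the remaining capacity r.
func (r Resources) fits(req Resources) bool {
	return req.MilliCPU <= r.MilliCPU && req.MemMiB <= r.MemMiB
}

// minus returns the capacity left after subtracting req.
func (r Resources) minus(req Resources) Resources {
	return Resources{MilliCPU: r.MilliCPU - req.MilliCPU, MemMiB: r.MemMiB - req.MemMiB}
}

// PlaceGang (hypothetical) tries to place every pod of a gang at once.
// It returns a pod-index -> node-name assignment and true only if ALL pods fit;
// otherwise it returns nil and false, so no partial placement leaks out.
// First-fit decreasing is a greedy heuristic and may miss a feasible placement.
func PlaceGang(pods []Resources, free map[string]Resources) (map[int]string, bool) {
	// Work on a copy so a failed attempt leaves the caller's map untouched.
	remaining := make(map[string]Resources, len(free))
	names := make([]string, 0, len(free))
	for n, r := range free {
		remaining[n] = r
		names = append(names, n)
	}
	sort.Strings(names) // deterministic node order

	// Place the largest pods first (by CPU, then by memory).
	order := make([]int, len(pods))
	for i := range order {
		order[i] = i
	}
	sort.SliceStable(order, func(a, b int) bool {
		pa, pb := pods[order[a]], pods[order[b]]
		if pa.MilliCPU != pb.MilliCPU {
			return pa.MilliCPU > pb.MilliCPU
		}
		return pa.MemMiB > pb.MemMiB
	})

	assignment := make(map[int]string, len(pods))
	for _, i := range order {
		placed := false
		for _, n := range names {
			if remaining[n].fits(pods[i]) {
				remaining[n] = remaining[n].minus(pods[i])
				assignment[i] = n
				placed = true
				break
			}
		}
		if !placed {
			return nil, false // one pod does not fit: reject the whole gang
		}
	}
	return assignment, true
}
```

Because the function works on a copy of the free capacity and returns nothing unless every pod fits, a rejected gang never holds part of the cluster. A greedy heuristic can reject a gang that an exact bin-packing search would place.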

Objectives

  • Implement a controller that groups pods into a gang. Consider prior art on how gangs are defined.
  • Track gang readiness and resource feasibility
  • Submit a joint scheduling request to Cortex. Consider either a single request or Cortex's reservation feature for allocating resources.
  • Bind all pods only after Cortex confirms full gang placement
  • Provide minimal metrics and logs
  • Write documentation
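One way the objectives above could fit together is sketched below in Go, loosely following the PodGroup/`minMember` idea from prior art such as the Kubernetes scheduler-plugins coscheduling plugin. Everything here is an assumption: the label key `gang.example.com/name`, the `Pod`, `Gang` and `Phase` types, and the `GroupPods`/`ConfirmPlacement` helpers are hypothetical, and the call to Cortex is represented only by the `placement` map it would return.

```go
package main

import (
	"log"
	"sort"
)

// GangLabel is a hypothetical label key used to assign pods to a gang.
const GangLabel = "gang.example.com/name"

// Pod is a minimal stand-in for a Kubernetes Pod.
type Pod struct {
	Name   string
	Labels map[string]string
}

// Phase models the gang lifecycle assumed in this sketch.
type Phase string

const (
	Waiting Phase = "Waiting" // fewer than MinMember pods observed (or MinMember unset)
	Ready   Phase = "Ready"   // enough pods; a joint request may be sent to Cortex
	Bound   Phase = "Bound"   // Cortex confirmed full placement; pods were bound
)

// Gang is a hypothetical in-memory view of one gang.
type Gang struct {
	Name      string
	MinMember int
	Pods      []string
	Phase     Phase
}

// GroupPods groups labeled pods into gangs and marks each gang Ready once it has
// at least MinMember pods. Pods without GangLabel are ignored.
func GroupPods(pods []Pod, minMember map[string]int) map[string]*Gang {
	gangs := map[string]*Gang{}
	for _, p := range pods {
		name, ok := p.Labels[GangLabel]
		if !ok {
			continue
		}
		g, ok := gangs[name]
		if !ok {
			g = &Gang{Name: name, MinMember: minMember[name], Phase: Waiting}
			gangs[name] = g
		}
		g.Pods = append(g.Pods, p.Name)
	}
	for _, g := range gangs {
		sort.Strings(g.Pods)
		if g.MinMember > 0 && len(g.Pods) >= g.MinMember {
			g.Phase = Ready
		}
	}
	return gangs
}

// ConfirmPlacement binds a Ready gang only if placement (pod name -> node)
// covers every pod. A partial answer leaves the gang unbound and returns false.
func ConfirmPlacement(g *Gang, placement map[string]string, bind func(pod, node string)) bool {
	if g.Phase != Ready {
		return false
	}
	// Validate the whole answer before binding anything.
	for _, p := range g.Pods {
		if _, ok := placement[p]; !ok {
			log.Printf("gang %s: placement incomplete (missing %s); not binding", g.Name, p)
			return false
		}
	}
	for _, p := range g.Pods {
		bind(p, placement[p])
	}
	g.Phase = Bound
	log.Printf("gang %s: bound %d pods", g.Name, len(g.Pods))
	return true
}
```

Validating the full placement before the first bind keeps binding all-or-nothing with respect to Cortex's answer. A real controller would still need to handle a bind call that fails partway through, for example by releasing any reservation and retrying the whole gang.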

Acceptance Criteria

  • Gang scheduling implemented
  • Pods of a gang schedule only when the full gang can be placed
  • Basic end-to-end (e2e) tests

Dependencies

N/A

Additional Notes

N/A

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
