feat: Expected Runtime Plugin for Soft Eviction via Requeue Action #941
rich7420 wants to merge 6 commits into NVIDIA:main
cc @itsomri, @romanbaron
docs/plugins/expectedruntime.md
Outdated
- [Expected Runtime Plugin Design](../designs/expected-runtime-requeue/expected-runtime-plugin.md)
- [Requeue Flow Design](../designs/expected-runtime-requeue/expected-runtime-requeue-flow.md)
These don't exist yet.
I think we should first design the requeue action so that we will be able to design this plugin, and only then implement it. Right now it is a bit "up in the air".
For example, I am not sure we will need "requeue-delay": it depends on the requeue action design, and I think it should be introduced there. I am also not sure why we need both cooldown and expected runtime; are they not the same? And shouldn't we also look at the queue fair share to make sure we are not removing jobs that should keep running?
I think all those questions and more should be asked and discussed before we implement it.
Agreed, those links pointed to design docs that don’t exist yet. I’ve removed them from “See Also” in this PR.
I agree we should design the requeue action first (when it runs, try/commit/rollback, victim selection, how/whether to set requeue-not-before and cooldown), then align this plugin with that. This PR only adds the nomination API + this plugin; the Requeue action itself isn’t implemented here.
About cooldown and expected runtime, I think they’re different. Expected runtime answers “when does this job become eligible for eviction?” (measured as time since start). Cooldown answers “after we evicted it once, how long before we can nominate it again?” (which stops thrashing).
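To make the distinction concrete, here is a minimal sketch of the two checks as separate predicates. It is not the code in this PR: the helper names (`runtimeExceeded`, `cooldownExpired`, `nominatable`, `mustTime`) are hypothetical, and it assumes `kai.scheduler/expected-runtime` holds a Go duration string and `kai.scheduler/requeue-not-before` holds an RFC 3339 timestamp, neither of which is confirmed here.

```go
package main

import (
	"fmt"
	"time"
)

// Annotation keys as named in the PR description.
const (
	expectedRuntimeKey  = "kai.scheduler/expected-runtime"
	requeueNotBeforeKey = "kai.scheduler/requeue-not-before"
)

// runtimeExceeded answers "is this job eligible for eviction yet?":
// the time since its last start must be at least the expected runtime.
func runtimeExceeded(lastStart, now time.Time, expected time.Duration) bool {
	return now.Sub(lastStart) >= expected
}

// cooldownExpired answers "may we nominate it again?".
// A zero notBefore means no cooldown was ever set.
func cooldownExpired(notBefore, now time.Time) bool {
	return notBefore.IsZero() || !now.Before(notBefore)
}

// nominatable combines both checks from a job's annotations.
// ASSUMPTION: duration-string and RFC 3339 value formats.
func nominatable(annotations map[string]string, lastStart, now time.Time) (bool, error) {
	raw, ok := annotations[expectedRuntimeKey]
	if !ok {
		return false, nil // opt-in: no annotation, never nominated
	}
	expected, err := time.ParseDuration(raw)
	if err != nil {
		return false, fmt.Errorf("invalid %s: %w", expectedRuntimeKey, err)
	}
	if expected <= 0 {
		return false, fmt.Errorf("invalid %s: must be positive", expectedRuntimeKey)
	}

	var notBefore time.Time
	if raw, ok := annotations[requeueNotBeforeKey]; ok {
		notBefore, err = time.Parse(time.RFC3339, raw)
		if err != nil {
			return false, fmt.Errorf("invalid %s: %w", requeueNotBeforeKey, err)
		}
	}
	return runtimeExceeded(lastStart, now, expected) && cooldownExpired(notBefore, now), nil
}

// mustTime parses an RFC 3339 timestamp and panics on bad input (demo helper).
func mustTime(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	start := mustTime("2024-01-01T00:00:00Z")
	now := mustTime("2024-01-01T03:00:00Z")
	annotations := map[string]string{
		expectedRuntimeKey:  "2h",
		requeueNotBeforeKey: "2024-01-01T04:00:00Z",
	}
	// The job ran 3h against an expected 2h, but its cooldown has not ended.
	ok, err := nominatable(annotations, start, now)
	fmt.Println(ok, err)
}
```

With this split, a job can be past its expected runtime and still be skipped because it was requeued recently, which is the case the two settings are meant to tell apart.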
> I think all those questions and more should be asked and discussed before we implement it.

You're right, thanks!
Description
Adds the Expected Runtime plugin: running jobs that exceed their configured expected runtime get nominated as requeue candidates. The plugin only does nomination; eviction is done by the Requeue action (elsewhere). Soft eviction: jobs become eligible when runtime ≥ expected, but are only evicted when a higher-priority workload needs the slot.
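The split between nomination and eviction can be sketched roughly as follows. This is an illustration only, not the code in this PR: `Job`, `NominationFn`, `Session`, `AddNominationFn`, `CollectCandidates`, and `pickVictim` are simplified, hypothetical stand-ins, and their signatures are assumptions rather than the actual scheduler API. The sketch shows plugins registering nomination functions, candidates being deduplicated by PodGroup UID, and a candidate being chosen only when a pending workload has higher priority.

```go
package main

import "fmt"

// Job is a simplified stand-in for a running PodGroup (hypothetical type).
type Job struct {
	PodGroupUID string
	Priority    int
}

// NominationFn returns the running jobs a plugin considers requeue candidates.
// ASSUMPTION: the real nomination function signature may differ.
type NominationFn func(running []Job) []Job

// Session holds the nomination functions registered by plugins.
type Session struct {
	nominationFns []NominationFn
}

// AddNominationFn registers one plugin's nomination function.
func (s *Session) AddNominationFn(fn NominationFn) {
	s.nominationFns = append(s.nominationFns, fn)
}

// CollectCandidates runs every registered function and deduplicates the
// results by PodGroup UID, keeping first-seen order.
func (s *Session) CollectCandidates(running []Job) []Job {
	seen := make(map[string]struct{})
	var out []Job
	for _, fn := range s.nominationFns {
		for _, job := range fn(running) {
			if _, dup := seen[job.PodGroupUID]; dup {
				continue
			}
			seen[job.PodGroupUID] = struct{}{}
			out = append(out, job)
		}
	}
	return out
}

// pickVictim models the "soft" part, which would live in the Requeue action:
// a candidate is chosen only if the pending workload has strictly higher priority.
func pickVictim(candidates []Job, pendingPriority int) (Job, bool) {
	for _, c := range candidates {
		if pendingPriority > c.Priority {
			return c, true
		}
	}
	return Job{}, false
}

func main() {
	s := &Session{}
	// Two plugins nominating the same jobs should not produce duplicates.
	nominateAll := func(running []Job) []Job { return running }
	s.AddNominationFn(nominateAll)
	s.AddNominationFn(nominateAll)

	running := []Job{{PodGroupUID: "pg-a", Priority: 1}, {PodGroupUID: "pg-b", Priority: 5}}
	candidates := s.CollectCandidates(running)
	fmt.Println("candidates:", len(candidates))

	// No contention from a higher-priority workload: nothing is requeued.
	_, evict := pickVictim(candidates, 1)
	fmt.Println("requeue with equal-priority pending workload:", evict)
}
```

Keeping nomination side-effect free means the plugin can be merged and observed through its metrics before any eviction behavior is wired in.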
Why: Time-aware fairness (requeue only when there’s contention), opt-in via `kai.scheduler/expected-runtime`, cooldown via `kai.scheduler/requeue-not-before` to avoid thrashing.

What changed:
- `expectedruntime`: registers `RequeueCandidateNominationFn`, nominates jobs that pass checks (running, preemptible, valid expected-runtime, runtime ≥ expected, cooldown expired).
- `RequeueCandidateNominationFn`, `AddRequeueCandidateNominationFn`, `CollectRequeueCandidates()` (dedup by PodGroup UID).
- `kai.scheduler/expected-runtime`, `requeue-delay`, `requeue-not-before`.
- `kai_requeue_nominations_total`, `kai_requeue_nomination_skipped_total` (prefix from `--metrics-namespace`, default `kai`).
- `expectedruntime` in default plugin list; docs in `docs/plugins/expectedruntime.md`.

Uses existing `LastStartTimestamp`; MinRuntime stays in Requeue action filters.

Related Issues
Closes #904
Checklist
Breaking Changes
Additional Notes