Skip to content

fix(slinky): Skip exec scontrol show partition for DynamicNodes#319

Merged
ravisoundar merged 1 commit into
mainfrom
rs-skip-exec
May 13, 2026
Merged

fix(slinky): Skip exec scontrol show partition for DynamicNodes#319
ravisoundar merged 1 commit into
mainfrom
rs-skip-exec

Conversation

@ravisoundar
Copy link
Copy Markdown
Collaborator

Skip exec scontrol show partition when useDynamicNodes is set

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • All commits are signed off per DCO (git commit -s).

@ravisoundar ravisoundar requested a review from dmitsh as a code owner May 9, 2026 00:50
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 9, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 9, 2026

Greptile Summary

This PR fixes a crash that occurred when UseDynamicNodes=true and a topology entry had a Partition configured but no explicit nodes — previously the code would attempt to exec scontrol show partition inside a pod, which is unnecessary (and could fail) in dynamic-node mode.

  • engine.go: Adds a guard at the top of getPartitionNodes that, when UseDynamicNodes=true, skips the scontrol exec and returns the sentinel value "\tNodes=NONE".
  • slurm.go: Extends parsePartitionNodes with a match[1] == "NONE" check that converts the sentinel into an empty node slice, cleanly propagating a no-nodes result without triggering the existing "partition has no nodes" error path.

Confidence Score: 5/5

Safe to merge; the fix correctly skips the scontrol exec when UseDynamicNodes is enabled and the sentinel value threads through the existing regex parser without error.

Both changed lines are algorithmically sound: the sentinel Nodes=NONE satisfies the regex, parsePartitionNodes captures NONE and returns an empty slice cleanly. The only gap is a missing unit-test case for the new branch.

pkg/engines/slurm/slurm_test.go — the new NONE branch in parsePartitionNodes has no dedicated test case.

Important Files Changed

Filename Overview
pkg/engines/slinky/engine.go Adds dynamicShowPartitionNodes sentinel constant and an early-return guard in getPartitionNodes that correctly skips scontrol when UseDynamicNodes=true; guard is untested by existing integration tests (all test topologies supply a PodSelector, so getPartitionNodes is never reached).
pkg/engines/slurm/slurm.go Adds a match[1] == "NONE" branch in parsePartitionNodes that returns an empty slice; regex correctly matches "\tNodes=NONE" but there is no TestParsePartitionNodes case covering this new branch.

Reviews (3): Last reviewed commit: "fix(slinky): Skip exec scontrol show par..." | Re-trigger Greptile

Comment thread pkg/engines/slinky/engine.go
Comment thread pkg/engines/slinky/engine.go
…lag is set

Signed-off-by: Ravi Shankar <ravish@nvidia.com>
@ravisoundar ravisoundar merged commit 97c0aa3 into main May 13, 2026
2 checks passed
@ravisoundar ravisoundar deleted the rs-skip-exec branch May 13, 2026 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants