Skip to content

fix(slurm/slinky):#297

Merged
dmitsh merged 1 commit into
mainfrom
ds-unfake
Apr 27, 2026
Merged

fix(slurm/slinky):#297
dmitsh merged 1 commit into
mainfrom
ds-unfake

Conversation

@dmitsh
Copy link
Copy Markdown
Collaborator

@dmitsh dmitsh commented Apr 25, 2026

Description

  • deprecate fake nodes
  • preserve provided blockSizes
  • set blockSizes as an array in topology requests
  • use camelCase in topology request

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • All commits are signed off per DCO (git commit -s).

@dmitsh dmitsh marked this pull request as draft April 25, 2026 05:11
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Apr 25, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 25, 2026

Greptile Summary

This PR deprecates fake-node support, migrates BlockSizes from a comma-separated string to a typed []int (decoded directly by mapstructure), and renames the API parameter key from "block_sizes" to "blockSizes" (camelCase) across slurm, slinky, docs, tests, and payload fixtures. The getBlockSize validation logic is replaced by a simpler getBlockSizes that passes caller-provided values through unchanged and falls back to auto-computed power-of-two sizes.

Confidence Score: 5/5

Safe to merge; changes are internally consistent and all tests are updated. Both findings are P2.

No P0 or P1 issues found. The float64→int mapstructure coercion works correctly via the native numeric decoder without WeaklyTypedInput. The block_sizes → blockSizes rename is intentional and fully applied across the codebase.

pkg/engines/slurm/slurm.go (backward compat), pkg/engines/slurm/slurm_test.go (test coverage gap)

Important Files Changed

Filename Overview
pkg/engines/slurm/slurm.go Removes fake-node support and string-based block-size parsing; BaseParams.BlockSizes is now []int decoded directly via mapstructure (tag renamed block_sizesblockSizes). Silent backward-incompatibility exists for old clients.
pkg/translate/block.go Replaces getBlockSize (with validation and fake-node handling) with simpler getBlockSizes that passes caller-provided sizes through unchanged; auto-computation logic is preserved.
pkg/engines/slinky/engine.go Adds intToStr helper for the []int → annotation string conversion; BlockSizes field updated to []int consistently with slurm changes.
pkg/translate/topology.go Removes FakeNodePool from Config struct; otherwise unchanged.
pkg/topology/topology.go Renames KeyBlockSizes constant from "block_sizes" to "blockSizes" — single source of truth for the API key rename throughout the codebase.
pkg/translate/fake.go File deleted — fake-node functionality fully removed along with its tests.
pkg/engines/slurm/slurm_test.go Removes TestParseFakeNodes, updates TestGetTranslateConfig and TestGenerateOutput for the new []int block-size type; happy-path with a concrete blockSizes array is not covered at the unit level in TestGenerateOutput.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["HTTP Request\n{blockSizes: [2,4]}"] --> B["JSON parse\n→ []interface{}{float64(2),float64(4)}"]
    B --> C["config.Decode / mapstructure\n(float64→int coercion OK)"]
    C --> D["BaseParams.BlockSizes []int"]
    D --> E["GetTranslateConfig"]
    E --> F["translate.Config.BlockSizes []int"]
    F --> G["getBlockSizes()"]
    G -->|"len > 0"| H["Return as-is"]
    G -->|"len == 0"| I["Auto-compute\n[D, 2D, 4D, ...]"]
    H --> J["toBlockTopology\nBlockSizes=2,4"]
    I --> J
    J --> K["Slurm topology.conf\nBlockSizes=2,4"]
Loading

Reviews (2): Last reviewed commit: "fix(slurm/slinky):" | Re-trigger Greptile

Comment thread pkg/engines/slurm/slurm_test.go Outdated
Comment thread pkg/translate/block.go
Comment on lines 36 to 51
@@ -88,15 +51,7 @@ func getBlockSize(blocks []*blockInfo, requestedBlockSizes []int, useFake bool)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 No validation on caller-provided requestedBlockSizes

The old getBlockSize validated that values were > 0 and followed the 2^n * base pattern, falling back to auto-computed sizes on failure. The new implementation passes any caller-supplied slice through unchanged. A client submitting blockSizes: [0] or blockSizes: [-5, 3] will now produce a Slurm config line like BlockSizes=0,-5,3, which is silently invalid and will likely cause scontrol reconfigure to reject the topology. Consider adding a basic guard, e.g. reject non-positive values and return an error instead of propagating them to the config file.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.13%. Comparing base (1875ab8) to head (d383a20).
⚠️ Report is 30 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #297      +/-   ##
==========================================
- Coverage   68.46%   68.13%   -0.33%     
==========================================
  Files          82       81       -1     
  Lines        4842     4805      -37     
==========================================
- Hits         3315     3274      -41     
  Misses       1395     1395              
- Partials      132      136       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

 - deprecate fake nodes
 - preserve provided blockSizes
 - set blockSizes as an array in topology requests
 - use camelCase in topology request

Signed-off-by: Dmitry Shmulevich <dshmulevich@nvidia.com>
@dmitsh dmitsh marked this pull request as ready for review April 27, 2026 16:44
@github-actions
Copy link
Copy Markdown

Copy link
Copy Markdown
Collaborator

@ravisoundar ravisoundar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@dmitsh dmitsh merged commit 0fc0745 into main Apr 27, 2026
7 checks passed
@dmitsh dmitsh deleted the ds-unfake branch April 27, 2026 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants