Fix issues #51-#56: scheduler, images, attach, CLI, K8s #57

Merged
powderluv merged 1 commit into main from fix/issues-51-56 on Apr 8, 2026


@powderluv
Collaborator

Summary

Fixes 6 open issues (3 reopened from PR #49, 3 new):

Test plan

  • 743 tests pass, 0 failures (+13 new)
  • cargo fmt --check clean
  • New tests: scheduler edge cases (num_nodes=0, constraint mismatch, exclusive, single idle node), CLI show dispatch, K8s address resolution, retry backoff, image fallback, attach raw bytes
  • CI: fmt, clippy, build-and-test, cluster tests
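
The image fallback covered by the new tests follows the three tiers named in the commit message (env → system dir if exists → ~/.spur/images). A minimal sketch of that lookup, not the code from this PR: the function name `resolve_image_dir`, its parameters, and the choice to treat an empty override as unset are assumptions made for illustration. Only `/var/spool/spur/images` and `~/.spur/images` come from the PR text.

```rust
use std::path::{Path, PathBuf};

/// System-wide image directory named in the commit message.
pub const SYSTEM_IMAGE_DIR: &str = "/var/spool/spur/images";

/// Hypothetical helper: pick the image directory using a 3-tier fallback.
/// Inputs are passed in explicitly so the logic can be tested without
/// touching the real environment or filesystem layout.
pub fn resolve_image_dir(env_override: Option<&str>, system_dir: &Path, home: &Path) -> PathBuf {
    // Tier 1: an explicit override wins (empty string treated as unset; an assumption).
    if let Some(dir) = env_override.filter(|d| !d.is_empty()) {
        return PathBuf::from(dir);
    }
    // Tier 2: use the system directory only if it actually exists.
    if system_dir.is_dir() {
        return system_dir.to_path_buf();
    }
    // Tier 3: fall back to a per-user directory under the home directory.
    home.join(".spur").join("images")
}
```

A caller would feed it the real inputs, for example `resolve_image_dir(std::env::var("SPUR_IMAGE_DIR").ok().as_deref(), Path::new(SYSTEM_IMAGE_DIR), &home)`, where `SPUR_IMAGE_DIR` is a placeholder name; the PR does not state which environment variable is used.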

Closes #51 #52 #53 #54 #55 #56

🤖 Generated with Claude Code

…h hang, CLI show, K8s retry/address

Fixes:
- #56: Scheduler crash recovery via catch_unwind; safety check for num_nodes=0;
  update_pending_reasons now checks constraint/exclusive/fully-consumed (matching
  find_suitable_nodes) so Reason accurately reflects why a job can't be scheduled
- #55: Agent image_dir() now uses 3-tier fallback matching CLI (env → system dir
  if exists → ~/.spur/images) instead of hardcoding /var/spool/spur/images
- #54: sattach uses per-byte reads instead of line-buffered; channel buffer
  increased 32→256 to prevent deadlock; graceful task shutdown instead of abort
- #53: `spur show node X` now dispatches as `scontrol show node X` by inserting
  implicit show subcommand (docs said `spur show node` but required `spur show show node`)
- #52: K8s operator wraps background tasks (node watcher, job controller, health)
  in retry loops with exponential backoff (1s→60s cap)
- #51: K8s operator adds --address flag with POD_IP env var fallback; Pod hostname
  is no longer used as default (unroutable from spurctld)
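
The retry behaviour described for #52 (exponential backoff, 1s→60s cap) can be sketched as follows. This is an illustration under assumptions, not the operator's code: the names `backoff_delay` and `retry_with_backoff`, the doubling factor, the synchronous shape, and the `max_attempts` bound are all hypothetical. The real operator wraps long-running background tasks and may retry indefinitely.

```rust
use std::time::Duration;

/// Hypothetical helper: delay before retry number `attempt` (0-based).
/// Starts at 1s, doubles each attempt (assumed factor), and never exceeds 60s.
pub fn backoff_delay(attempt: u32) -> Duration {
    const CAP_SECS: u64 = 60;
    // checked_shl returns None when the shift amount is >= 64 bits;
    // treat that case as already past the cap.
    let secs = 1u64.checked_shl(attempt).unwrap_or(CAP_SECS);
    Duration::from_secs(secs.min(CAP_SECS))
}

/// Hypothetical wrapper: run `task` until it succeeds or `max_attempts` is
/// reached, calling `sleep` with the backoff delay between failed attempts.
/// `sleep` is injected so tests can record delays instead of waiting.
pub fn retry_with_backoff<F, S>(mut task: F, mut sleep: S, max_attempts: u32) -> Result<(), String>
where
    F: FnMut() -> Result<(), String>,
    S: FnMut(Duration),
{
    let mut last_err = String::from("no attempts made");
    for attempt in 0..max_attempts {
        match task() {
            Ok(()) => return Ok(()),
            Err(e) => {
                last_err = e;
                // Skip the sleep after the final attempt; nothing follows it.
                if attempt + 1 < max_attempts {
                    sleep(backoff_delay(attempt));
                }
            }
        }
    }
    Err(last_err)
}
```

In production the `sleep` argument would be something like `std::thread::sleep` (or an async timer in an async runtime); keeping it a parameter is only a testing convenience in this sketch.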

Tests: 743 passed, 0 failed (+13 new tests)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Labeled worker node showing as down*
