From ec708d34ecb73fcd6f155404003dcaa5f0768f6c Mon Sep 17 00:00:00 2001
From: Claude
Date: Sat, 22 Nov 2025 22:23:31 +0000
Subject: [PATCH 01/11] Add SLURM runner support for workload manager integration

This commit adds comprehensive SLURM (Simple Linux Utility for Resource
Management) support to the FZ framework, enabling users to run calculations
on SLURM-managed HPC clusters.

Features:
- Support for slurm://[user@host[:port]:]partition/script URI format
- Local SLURM execution using srun command
- Remote SLURM execution via SSH with automatic file transfer
- SLURM partition specification for job scheduling
- Interrupt handling (Ctrl+C terminates SLURM jobs)
- Timeout support for long-running jobs

Implementation:
- Added parse_slurm_uri() function to parse SLURM URIs
- Added run_slurm_calculation() main entry point
- Added _run_local_slurm_calculation() for local execution
- Added _run_remote_slurm_calculation() for remote execution
- Added _execute_remote_slurm_command() for remote job control
- Updated _validate_calculator_uri() to support "slurm" scheme
- Updated run_calculation() to route slurm:// URIs

Testing:
- Comprehensive URI parsing tests for various formats
- Integration tests for calculator resolution and validation
- All tests passing

Documentation:
- Updated README.md with SLURM calculator section
- Updated CLAUDE.md with SLURM implementation details
- Added usage examples and requirements

URI Examples:
- slurm://compute/script.sh (local)
- slurm://user@cluster.edu:gpu/script.sh (remote)
- slurm://user@cluster.edu:2222:gpu/script.sh (custom port)
---
 CLAUDE.md                  |   3 +-
 README.md                  |  40 ++
 fz/runners.py              | 747 ++++++++++++++++++++++++++++++++++++-
 tests/test_slurm_runner.py | 200 ++++++++++
 4 files changed, 986 insertions(+), 4 deletions(-)
 create mode 100644 tests/test_slurm_runner.py

diff --git a/CLAUDE.md b/CLAUDE.md
index 6fc328f..21c0438 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -78,9 +78,10 @@ The codebase is organized into functional modules (~5700 lines total):
   - Support for default values: `${var~default}`
   - Multi-line function definitions in formulas
 
-- **`fz/runners.py`** (1345 lines) - Calculator execution engines
+- **`fz/runners.py`** (~1900 lines) - Calculator execution engines
   - **Local shell execution** (`sh://`) - runs commands in temporary directories
   - **SSH remote execution** (`ssh://`) - remote HPC/cluster support with file transfer
+  - **SLURM workload manager** (`slurm://`) - local or remote SLURM cluster execution with partition scheduling
   - **Cache calculator** (`cache://`) - reuses previous results by input hash matching
   - Host key validation, authentication handling, timeout management
 
diff --git a/README.md b/README.md
index 4e7ca57..4e840d2 100644
--- a/README.md
+++ b/README.md
@@ -989,6 +989,46 @@ calculators = "ssh://user@server.com:2222/bash /absolutepath/to/calc.sh"
 - Warning for password-based auth
 - Environment variable for auto-accepting host keys: `FZ_SSH_AUTO_ACCEPT_HOSTKEYS=1`
 
+### SLURM Workload Manager
+
+Execute calculations on SLURM clusters (local or remote):
+
+```python
+# Local SLURM execution
+calculators = "slurm://compute/bash script.sh"
+
+# Remote SLURM execution via SSH
+calculators = "slurm://user@cluster.edu:gpu/bash script.sh"
+
+# With custom SSH port
+calculators = "slurm://user@cluster.edu:2222:gpu/bash script.sh"
+
+# Multiple partitions for parallel execution
+calculators = [
+    "slurm://user@hpc.edu:compute/bash calc.sh",
+    "slurm://user@hpc.edu:gpu/bash calc.sh"
+]
+```
+
+**URI Format**: `slurm://[user@host[:port]:]partition/script`
+
+**How it works**:
+1. Local execution: Uses `srun --partition=