feat(spider-execution-manager): Add scheduler, storage, and liveness client traits. by LinZhihao-723 · Pull Request #327 · y-scope/spider

LinZhihao-723 · 2026-05-20T16:56:06Z

Description

This PR depends on #326.

This PR adds the network-client trait surface for the execution manager. Three async traits cover the EM's outbound traffic to the rest of the cluster; concrete implementations bind to the corresponding remote services and will land in follow-up PRs.

Checklist

The PR satisfies the contribution guidelines.
This is a breaking change and that has been indicated in the PR title, OR this isn't a
breaking change.
Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Ensure all workflows pass.

Summary by CodeRabbit

Release Notes

New Features
- Added execution manager component with client interfaces for liveness monitoring, task scheduling, and storage operations
- Added subprocess-based task executor with inter-process communication protocol support
- Added process pool for managing executor lifecycle, including automatic respawn on crash or timeout
- Added comprehensive integration test suite validating executor operations and error handling

coderabbitai · 2026-05-20T16:56:34Z

Walkthrough

This PR introduces the spider-execution-manager component, a process supervisor that spawns and manages a long-lived spider-task-executor subprocess for distributed task execution. It includes the wire protocol for inter-process communication, the executor binary implementation, a process pool for request serialization and timeout handling, client trait interfaces, and comprehensive integration tests with overhead instrumentation.

Changes

Task Executor and Execution Manager System

Layer / File(s)	Summary
Wire Protocol and Type System `components/spider-task-executor/src/protocol.rs`, `components/spider-task-executor/src/error.rs`	Request/Response/ExecutorOutcome enums with serde support; ExecutorError variants store serializable String messages for cross-process transport.
Package Manager Return Type Update `components/spider-task-executor/src/manager.rs`	TdlPackageManager::load returns a reference to the loaded TdlPackage instead of a cloned name string.
Task Executor Binary Implementation `components/spider-task-executor/Cargo.toml`, `components/spider-task-executor/src/bin/spider_task_executor.rs`, `components/spider-task-executor/src/lib.rs`	Single-threaded Tokio binary that reads framed bincode Request messages from stdin, loads/caches TDL packages, executes tasks, and writes Response messages with elapsed microseconds to stdout.
Execution Manager Client Traits `components/spider-execution-manager/src/client.rs`, `components/spider-execution-manager/src/client/liveness.rs`, `components/spider-execution-manager/src/client/scheduler.rs`, `components/spider-execution-manager/src/client/storage.rs`	LivenessClient (register/heartbeat), SchedulerClient (next_task), and StorageClient (register_task_instance/report_success/report_failure) async trait interfaces with error contracts.
Process Pool Supervisor and Request Execution `components/spider-execution-manager/Cargo.toml`, `components/spider-execution-manager/src/lib.rs`, `components/spider-execution-manager/src/process_pool.rs`	ProcessPool spawns initial executor and manages concurrent request serialization via mutex; auto-respawns on timeout or crash; encodes TaskContext and inputs into Request::Execute and awaits Response::Result with hard-timeout enforcement.
Crate Exports and Workspace Setup `Cargo.toml`	Workspace members extended to register spider-execution-manager and test crates.

Integration Test Infrastructure

Layer / File(s)	Summary
Integration Test Task Package `tests/huntsman/integration-test-tasks/Cargo.toml`, `tests/huntsman/integration-test-tasks/src/lib.rs`	TDL package defining fibonacci (recursive), always_fail (TdlError), always_panic (panic), and instrument (sleep/echo) tasks; shared INSTRUMENT_SLEEP_US constant.
Test Harness and Environment Helpers `tests/huntsman/task-executor/Cargo.toml`, `tests/huntsman/task-executor/src/lib.rs`	ExecutorHandle spawns executor subprocess and provides async framed I/O (send/recv/try_recv/shutdown_clean/wait_for_exit); helpers for environment discovery, TaskContext encoding, single-input/no-input/single-output wire formats, and Request construction.
Task Executor Binary Integration Tests `tests/huntsman/task-executor/tests/test_executor.rs`	Three end-to-end tests: fibonacci correctness, task error reporting, and process crash detection.
Process Pool Integration Tests `tests/huntsman/task-executor/tests/test_process_pool.rs`	Four tests validating successful execution, in-task failure, executor crash recovery with respawn, and timeout trigger with respawn.
Overhead Instrumentation Benchmark `tests/huntsman/task-executor/tests/overhead_instrument.rs`	Measures latency (end-to-end, executor internal, IPC overhead) for the instrument task, outputs Markdown table to SPIDER_TEST_INSTRUMENT_OUTPUT_DIR.
Test Build and Staging Configuration `taskfiles/test.yaml`	Updates spider-huntsman-unit-tests-executor task to build huntsman-complex, integration-test-tasks, and spider-task-executor; stages .so artifacts under G_TDL_PACKAGES_DIR with subdirectory layout; sets SPIDER_TDL_PACKAGE_DIR and SPIDER_TASK_EXECUTOR_BIN environment variables.
Existing Test Compatibility `tests/huntsman/tdl-integration/tests/complex.rs`	Updates load_and_query_name test to work with TdlPackageManager::load returning a reference.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: adding scheduler, storage, and liveness client traits to the spider-execution-manager component.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@components/spider-task-executor/src/bin/spider_task_executor.rs`:
- Around line 73-77: Validate the incoming package string before using it in
filesystem joins: ensure the `package` argument contains only an allowed pattern
(e.g., alphanumerics, underscores, hyphens) and does not contain path separators
or path traversal segments like ".."; perform this check in the same scope where
`manager.get(package)` / `manager.load(&path)` are used (referencing `package`,
`pkg_dir`, `manager.load`) and return an error (or reject) early if validation
fails. Use explicit checks (reject if package contains '/' or '\\' or if it
matches ".." or if normalization yields components outside a single final file
name) and then continue to build the path and call `manager.load(&path)` only
when the name is validated.

In `@tests/huntsman/integration-test-tasks/src/lib.rs`:
- Around line 8-9: The doc comment incorrectly links to INSTRUMENT_SLEEP while
the exported constant is named INSTRUMENT_SLEEP_US; update the intra-doc link in
the comment referencing task_decl::instrument to point to the actual constant
name INSTRUMENT_SLEEP_US (or rename the constant to match the doc link),
ensuring the rustdoc intra-doc link uses the exact symbol INSTRUMENT_SLEEP_US so
the link resolves correctly.

In `@tests/huntsman/task-executor/tests/overhead_instrument.rs`:
- Around line 101-104: After reading INSTRUMENT_OUTPUT_DIR_ENV into output_dir,
ensure the directory exists by calling std::fs::create_dir_all(&output_dir) (or
equivalent) immediately after the PathBuf::from(...) call and before any
File::create usage; propagate or panic on errors consistently with the
surrounding code so File::create cannot fail due to a missing directory
(reference symbols: INSTRUMENT_OUTPUT_DIR_ENV, output_dir, and the subsequent
File::create call).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f86863eb-c4e1-4ebe-b8b5-5dccb17e72a6

📥 Commits

Reviewing files that changed from the base of the PR and between aadb9eb and fd93247.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (23)

Cargo.toml
components/spider-execution-manager/Cargo.toml
components/spider-execution-manager/src/client.rs
components/spider-execution-manager/src/client/liveness.rs
components/spider-execution-manager/src/client/scheduler.rs
components/spider-execution-manager/src/client/storage.rs
components/spider-execution-manager/src/lib.rs
components/spider-execution-manager/src/process_pool.rs
components/spider-task-executor/Cargo.toml
components/spider-task-executor/src/bin/spider_task_executor.rs
components/spider-task-executor/src/error.rs
components/spider-task-executor/src/lib.rs
components/spider-task-executor/src/manager.rs
components/spider-task-executor/src/protocol.rs
taskfiles/test.yaml
tests/huntsman/integration-test-tasks/Cargo.toml
tests/huntsman/integration-test-tasks/src/lib.rs
tests/huntsman/task-executor/Cargo.toml
tests/huntsman/task-executor/src/lib.rs
tests/huntsman/task-executor/tests/overhead_instrument.rs
tests/huntsman/task-executor/tests/test_executor.rs
tests/huntsman/task-executor/tests/test_process_pool.rs
tests/huntsman/tdl-integration/tests/complex.rs

coderabbitai · 2026-05-20T17:02:03Z

+    let pkg = if let Some(pkg) = manager.get(package) {
+        pkg
+    } else {
+        let path = pkg_dir.join(package).join(format!("lib{package}.so"));
+        manager.load(&path)?


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate package names before filesystem joins to block path traversal.

package is joined into a load path without validation. A crafted value containing separators or .. can escape SPIDER_TDL_PACKAGE_DIR and load an unintended .so.

Suggested fix

+fn is_valid_package_name(package: &str) -> bool { + !package.is_empty() + && package + .bytes() + .all(|b| b.is_ascii_alphanumeric() || b == b'_' || b == b'-') +} + fn run_task( manager: &mut TdlPackageManager, pkg_dir: &Path, package: &str, @@ ) -> Result<Vec<u8>, ExecutorError> { + if !is_valid_package_name(package) { + return Err(ExecutorError::InvalidLibrary(format!( + "invalid package name: {package}" + ))); + } + let pkg = if let Some(pkg) = manager.get(package) { pkg } else { let path = pkg_dir.join(package).join(format!("lib{package}.so")); manager.load(&path)?

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

let pkg = if let Some(pkg) = manager.get(package) {

pkg

} else {

let path = pkg_dir.join(package).join(format!("lib{package}.so"));

manager.load(&path)?

fn is_valid_package_name(package: &str) -> bool {

!package.is_empty()

&& package

.bytes()

.all(|b| b.is_ascii_alphanumeric() || b == b'_' || b == b'-')

}

fn run_task(

manager: &mut TdlPackageManager,

pkg_dir: &Path,

package: &str,

) -> Result<Vec<u8>, ExecutorError> {

if !is_valid_package_name(package) {

return Err(ExecutorError::InvalidLibrary(format!(

"invalid package name: {package}"

)));

}

let pkg = if let Some(pkg) = manager.get(package) {

pkg

} else {

let path = pkg_dir.join(package).join(format!("lib{package}.so"));

manager.load(&path)?

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@components/spider-task-executor/src/bin/spider_task_executor.rs` around lines 73 - 77, Validate the incoming package string before using it in filesystem joins: ensure the `package` argument contains only an allowed pattern (e.g., alphanumerics, underscores, hyphens) and does not contain path separators or path traversal segments like ".."; perform this check in the same scope where `manager.get(package)` / `manager.load(&path)` are used (referencing `package`, `pkg_dir`, `manager.load`) and return an error (or reject) early if validation fails. Use explicit checks (reject if package contains '/' or '\\' or if it matches ".." or if normalization yields components outside a single final file name) and then continue to build the path and call `manager.load(&path)` only when the name is validated.

coderabbitai · 2026-05-20T17:02:04Z

+//! * [`task_decl::instrument`] — fixed-cost task: sleeps for a known [`INSTRUMENT_SLEEP`] duration
+//!   then echoes its `Vec<String>` payload back. Used by the overhead bench so the non-sleep


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the rustdoc symbol link to the exported constant.

The doc comment references INSTRUMENT_SLEEP, but the declared constant is INSTRUMENT_SLEEP_US (Line 17), so the intra-doc link is inconsistent.

Proposed fix

-//! * [`task_decl::instrument`] — fixed-cost task: sleeps for a known [`INSTRUMENT_SLEEP`] duration +//! * [`task_decl::instrument`] — fixed-cost task: sleeps for a known [`INSTRUMENT_SLEEP_US`] duration

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

//! * [`task_decl::instrument`] — fixed-cost task: sleeps for a known [`INSTRUMENT_SLEEP`] duration

//! then echoes its `Vec<String>` payload back. Used by the overhead bench so the non-sleep

//! * [`task_decl::instrument`] — fixed-cost task: sleeps for a known [`INSTRUMENT_SLEEP_US`] duration

//! then echoes its `Vec<String>` payload back. Used by the overhead bench so the non-sleep

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/huntsman/integration-test-tasks/src/lib.rs` around lines 8 - 9, The doc comment incorrectly links to INSTRUMENT_SLEEP while the exported constant is named INSTRUMENT_SLEEP_US; update the intra-doc link in the comment referencing task_decl::instrument to point to the actual constant name INSTRUMENT_SLEEP_US (or rename the constant to match the doc link), ensuring the rustdoc intra-doc link uses the exact symbol INSTRUMENT_SLEEP_US so the link resolves correctly.

coderabbitai · 2026-05-20T17:02:04Z

+    let output_dir = std::env::var_os(INSTRUMENT_OUTPUT_DIR_ENV).map_or_else(
+        || panic!("{INSTRUMENT_OUTPUT_DIR_ENV} env var not set"),
+        PathBuf::from,
+    );


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Ensure the output directory exists before file creation.

At Line 179, File::create will fail when SPIDER_TEST_INSTRUMENT_OUTPUT_DIR points to a path that does not already exist. Create the directory once right after parsing the env var.

Proposed fix

async fn instrument_overhead() { let output_dir = std::env::var_os(INSTRUMENT_OUTPUT_DIR_ENV).map_or_else( || panic!("{INSTRUMENT_OUTPUT_DIR_ENV} env var not set"), PathBuf::from, ); + std::fs::create_dir_all(&output_dir) + .unwrap_or_else(|err| panic!("create {} failed: {err}", output_dir.display())); let mut handle = ExecutorHandle::spawn();

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

let output_dir = std::env::var_os(INSTRUMENT_OUTPUT_DIR_ENV).map_or_else(

|| panic!("{INSTRUMENT_OUTPUT_DIR_ENV} env var not set"),

PathBuf::from,

);

let output_dir = std::env::var_os(INSTRUMENT_OUTPUT_DIR_ENV).map_or_else(

|| panic!("{INSTRUMENT_OUTPUT_DIR_ENV} env var not set"),

PathBuf::from,

);

std::fs::create_dir_all(&output_dir)

.unwrap_or_else(|err| panic!("create {} failed: {err}", output_dir.display()));

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/huntsman/task-executor/tests/overhead_instrument.rs` around lines 101 - 104, After reading INSTRUMENT_OUTPUT_DIR_ENV into output_dir, ensure the directory exists by calling std::fs::create_dir_all(&output_dir) (or equivalent) immediately after the PathBuf::from(...) call and before any File::create usage; propagate or panic on errors consistently with the surrounding code so File::create cannot fail due to a missing directory (reference symbols: INSTRUMENT_OUTPUT_DIR_ENV, output_dir, and the subsequent File::create call).

LinZhihao-723 added 10 commits May 14, 2026 19:36

Implementation done.

320c2d0

Commit integration tests.

d25131c

gg, forgot to apply linters.

777cff5

Done with process pool implementation.

47a004f

Rename test source.

e8253e9

Merge branch 'task-executor-impl' into process-pool-basic

86bb2ed

Fix.

5946d18

Merge branch 'task-executor-impl' into process-pool-basic

cf6fdf8

Rename test binary.

c614fd3

Add client trait implement.

fd93247

LinZhihao-723 requested review from a team and sitaowang1998 as code owners May 20, 2026 16:56

coderabbitai Bot reviewed May 20, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(spider-execution-manager): Add scheduler, storage, and liveness client traits.#327

feat(spider-execution-manager): Add scheduler, storage, and liveness client traits.#327
LinZhihao-723 wants to merge 10 commits into
y-scope:mainfrom
LinZhihao-723:client-traits

LinZhihao-723 commented May 20, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 20, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 20, 2026

Uh oh!

coderabbitai Bot May 20, 2026

Uh oh!

coderabbitai Bot May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

-    let pkg = if let Some(pkg) = manager.get(package) {
-        pkg
-    } else {
-        let path = pkg_dir.join(package).join(format!("lib{package}.so"));
-        manager.load(&path)?
+fn is_valid_package_name(package: &str) -> bool {
+    !package.is_empty()
+        && package
+            .bytes()
+            .all(|b| b.is_ascii_alphanumeric() || b == b'_' || b == b'-')
+}
+fn run_task(
+    manager: &mut TdlPackageManager,
+    pkg_dir: &Path,
+    package: &str,
+) -> Result<Vec<u8>, ExecutorError> {
+    if !is_valid_package_name(package) {
+        return Err(ExecutorError::InvalidLibrary(format!(
+            "invalid package name: {package}"
+        )));
+    }
+    let pkg = if let Some(pkg) = manager.get(package) {
+        pkg
+    } else {
+        let path = pkg_dir.join(package).join(format!("lib{package}.so"));
+        manager.load(&path)?

		//! * [`task_decl::instrument`] — fixed-cost task: sleeps for a known [`INSTRUMENT_SLEEP`] duration
		//! then echoes its `Vec<String>` payload back. Used by the overhead bench so the non-sleep

Conversation

LinZhihao-723 commented May 20, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Checklist

Validation performed

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

LinZhihao-723 commented May 20, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 20, 2026 •

edited

Loading