Skip to content

feat(spider-execution-manager): Add scheduler, storage, and liveness client traits.#327

Open
LinZhihao-723 wants to merge 10 commits into
y-scope:mainfrom
LinZhihao-723:client-traits
Open

feat(spider-execution-manager): Add scheduler, storage, and liveness client traits.#327
LinZhihao-723 wants to merge 10 commits into
y-scope:mainfrom
LinZhihao-723:client-traits

Conversation

@LinZhihao-723
Copy link
Copy Markdown
Member

@LinZhihao-723 LinZhihao-723 commented May 20, 2026

Description

This PR depends on #326.

This PR adds the network-client trait surface for the execution manager. Three async traits cover the EM's outbound traffic to the rest of the cluster; concrete implementations bind to the corresponding remote services and will land in follow-up PRs.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Ensure all workflows pass.

Summary by CodeRabbit

Release Notes

  • New Features
    • Added execution manager component with client interfaces for liveness monitoring, task scheduling, and storage operations
    • Added subprocess-based task executor with inter-process communication protocol support
    • Added process pool for managing executor lifecycle, including automatic respawn on crash or timeout
    • Added comprehensive integration test suite validating executor operations and error handling

Review Change Stack

@LinZhihao-723 LinZhihao-723 requested review from a team and sitaowang1998 as code owners May 20, 2026 16:56
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 20, 2026

Walkthrough

This PR introduces the spider-execution-manager component, a process supervisor that spawns and manages a long-lived spider-task-executor subprocess for distributed task execution. It includes the wire protocol for inter-process communication, the executor binary implementation, a process pool for request serialization and timeout handling, client trait interfaces, and comprehensive integration tests with overhead instrumentation.

Changes

Task Executor and Execution Manager System

Layer / File(s) Summary
Wire Protocol and Type System
components/spider-task-executor/src/protocol.rs, components/spider-task-executor/src/error.rs
Request/Response/ExecutorOutcome enums with serde support; ExecutorError variants store serializable String messages for cross-process transport.
Package Manager Return Type Update
components/spider-task-executor/src/manager.rs
TdlPackageManager::load returns a reference to the loaded TdlPackage instead of a cloned name string.
Task Executor Binary Implementation
components/spider-task-executor/Cargo.toml, components/spider-task-executor/src/bin/spider_task_executor.rs, components/spider-task-executor/src/lib.rs
Single-threaded Tokio binary that reads framed bincode Request messages from stdin, loads/caches TDL packages, executes tasks, and writes Response messages with elapsed microseconds to stdout.
Execution Manager Client Traits
components/spider-execution-manager/src/client.rs, components/spider-execution-manager/src/client/liveness.rs, components/spider-execution-manager/src/client/scheduler.rs, components/spider-execution-manager/src/client/storage.rs
LivenessClient (register/heartbeat), SchedulerClient (next_task), and StorageClient (register_task_instance/report_success/report_failure) async trait interfaces with error contracts.
Process Pool Supervisor and Request Execution
components/spider-execution-manager/Cargo.toml, components/spider-execution-manager/src/lib.rs, components/spider-execution-manager/src/process_pool.rs
ProcessPool spawns initial executor and manages concurrent request serialization via mutex; auto-respawns on timeout or crash; encodes TaskContext and inputs into Request::Execute and awaits Response::Result with hard-timeout enforcement.
Crate Exports and Workspace Setup
Cargo.toml
Workspace members extended to register spider-execution-manager and test crates.

Integration Test Infrastructure

Layer / File(s) Summary
Integration Test Task Package
tests/huntsman/integration-test-tasks/Cargo.toml, tests/huntsman/integration-test-tasks/src/lib.rs
TDL package defining fibonacci (recursive), always_fail (TdlError), always_panic (panic), and instrument (sleep/echo) tasks; shared INSTRUMENT_SLEEP_US constant.
Test Harness and Environment Helpers
tests/huntsman/task-executor/Cargo.toml, tests/huntsman/task-executor/src/lib.rs
ExecutorHandle spawns executor subprocess and provides async framed I/O (send/recv/try_recv/shutdown_clean/wait_for_exit); helpers for environment discovery, TaskContext encoding, single-input/no-input/single-output wire formats, and Request construction.
Task Executor Binary Integration Tests
tests/huntsman/task-executor/tests/test_executor.rs
Three end-to-end tests: fibonacci correctness, task error reporting, and process crash detection.
Process Pool Integration Tests
tests/huntsman/task-executor/tests/test_process_pool.rs
Four tests validating successful execution, in-task failure, executor crash recovery with respawn, and timeout trigger with respawn.
Overhead Instrumentation Benchmark
tests/huntsman/task-executor/tests/overhead_instrument.rs
Measures latency (end-to-end, executor internal, IPC overhead) for the instrument task, outputs Markdown table to SPIDER_TEST_INSTRUMENT_OUTPUT_DIR.
Test Build and Staging Configuration
taskfiles/test.yaml
Updates spider-huntsman-unit-tests-executor task to build huntsman-complex, integration-test-tasks, and spider-task-executor; stages .so artifacts under G_TDL_PACKAGES_DIR with subdirectory layout; sets SPIDER_TDL_PACKAGE_DIR and SPIDER_TASK_EXECUTOR_BIN environment variables.
Existing Test Compatibility
tests/huntsman/tdl-integration/tests/complex.rs
Updates load_and_query_name test to work with TdlPackageManager::load returning a reference.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: adding scheduler, storage, and liveness client traits to the spider-execution-manager component.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@components/spider-task-executor/src/bin/spider_task_executor.rs`:
- Around line 73-77: Validate the incoming package string before using it in
filesystem joins: ensure the `package` argument contains only an allowed pattern
(e.g., alphanumerics, underscores, hyphens) and does not contain path separators
or path traversal segments like ".."; perform this check in the same scope where
`manager.get(package)` / `manager.load(&path)` are used (referencing `package`,
`pkg_dir`, `manager.load`) and return an error (or reject) early if validation
fails. Use explicit checks (reject if package contains '/' or '\\' or if it
matches ".." or if normalization yields components outside a single final file
name) and then continue to build the path and call `manager.load(&path)` only
when the name is validated.

In `@tests/huntsman/integration-test-tasks/src/lib.rs`:
- Around line 8-9: The doc comment incorrectly links to INSTRUMENT_SLEEP while
the exported constant is named INSTRUMENT_SLEEP_US; update the intra-doc link in
the comment referencing task_decl::instrument to point to the actual constant
name INSTRUMENT_SLEEP_US (or rename the constant to match the doc link),
ensuring the rustdoc intra-doc link uses the exact symbol INSTRUMENT_SLEEP_US so
the link resolves correctly.

In `@tests/huntsman/task-executor/tests/overhead_instrument.rs`:
- Around line 101-104: After reading INSTRUMENT_OUTPUT_DIR_ENV into output_dir,
ensure the directory exists by calling std::fs::create_dir_all(&output_dir) (or
equivalent) immediately after the PathBuf::from(...) call and before any
File::create usage; propagate or panic on errors consistently with the
surrounding code so File::create cannot fail due to a missing directory
(reference symbols: INSTRUMENT_OUTPUT_DIR_ENV, output_dir, and the subsequent
File::create call).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f86863eb-c4e1-4ebe-b8b5-5dccb17e72a6

📥 Commits

Reviewing files that changed from the base of the PR and between aadb9eb and fd93247.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (23)
  • Cargo.toml
  • components/spider-execution-manager/Cargo.toml
  • components/spider-execution-manager/src/client.rs
  • components/spider-execution-manager/src/client/liveness.rs
  • components/spider-execution-manager/src/client/scheduler.rs
  • components/spider-execution-manager/src/client/storage.rs
  • components/spider-execution-manager/src/lib.rs
  • components/spider-execution-manager/src/process_pool.rs
  • components/spider-task-executor/Cargo.toml
  • components/spider-task-executor/src/bin/spider_task_executor.rs
  • components/spider-task-executor/src/error.rs
  • components/spider-task-executor/src/lib.rs
  • components/spider-task-executor/src/manager.rs
  • components/spider-task-executor/src/protocol.rs
  • taskfiles/test.yaml
  • tests/huntsman/integration-test-tasks/Cargo.toml
  • tests/huntsman/integration-test-tasks/src/lib.rs
  • tests/huntsman/task-executor/Cargo.toml
  • tests/huntsman/task-executor/src/lib.rs
  • tests/huntsman/task-executor/tests/overhead_instrument.rs
  • tests/huntsman/task-executor/tests/test_executor.rs
  • tests/huntsman/task-executor/tests/test_process_pool.rs
  • tests/huntsman/tdl-integration/tests/complex.rs

Comment on lines +73 to +77
let pkg = if let Some(pkg) = manager.get(package) {
pkg
} else {
let path = pkg_dir.join(package).join(format!("lib{package}.so"));
manager.load(&path)?
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate package names before filesystem joins to block path traversal.

package is joined into a load path without validation. A crafted value containing separators or .. can escape SPIDER_TDL_PACKAGE_DIR and load an unintended .so.

Suggested fix
+fn is_valid_package_name(package: &str) -> bool {
+    !package.is_empty()
+        && package
+            .bytes()
+            .all(|b| b.is_ascii_alphanumeric() || b == b'_' || b == b'-')
+}
+
 fn run_task(
     manager: &mut TdlPackageManager,
     pkg_dir: &Path,
     package: &str,
@@
 ) -> Result<Vec<u8>, ExecutorError> {
+    if !is_valid_package_name(package) {
+        return Err(ExecutorError::InvalidLibrary(format!(
+            "invalid package name: {package}"
+        )));
+    }
+
     let pkg = if let Some(pkg) = manager.get(package) {
         pkg
     } else {
         let path = pkg_dir.join(package).join(format!("lib{package}.so"));
         manager.load(&path)?
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let pkg = if let Some(pkg) = manager.get(package) {
pkg
} else {
let path = pkg_dir.join(package).join(format!("lib{package}.so"));
manager.load(&path)?
fn is_valid_package_name(package: &str) -> bool {
!package.is_empty()
&& package
.bytes()
.all(|b| b.is_ascii_alphanumeric() || b == b'_' || b == b'-')
}
fn run_task(
manager: &mut TdlPackageManager,
pkg_dir: &Path,
package: &str,
) -> Result<Vec<u8>, ExecutorError> {
if !is_valid_package_name(package) {
return Err(ExecutorError::InvalidLibrary(format!(
"invalid package name: {package}"
)));
}
let pkg = if let Some(pkg) = manager.get(package) {
pkg
} else {
let path = pkg_dir.join(package).join(format!("lib{package}.so"));
manager.load(&path)?
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/spider-task-executor/src/bin/spider_task_executor.rs` around lines
73 - 77, Validate the incoming package string before using it in filesystem
joins: ensure the `package` argument contains only an allowed pattern (e.g.,
alphanumerics, underscores, hyphens) and does not contain path separators or
path traversal segments like ".."; perform this check in the same scope where
`manager.get(package)` / `manager.load(&path)` are used (referencing `package`,
`pkg_dir`, `manager.load`) and return an error (or reject) early if validation
fails. Use explicit checks (reject if package contains '/' or '\\' or if it
matches ".." or if normalization yields components outside a single final file
name) and then continue to build the path and call `manager.load(&path)` only
when the name is validated.

Comment on lines +8 to +9
//! * [`task_decl::instrument`] — fixed-cost task: sleeps for a known [`INSTRUMENT_SLEEP`] duration
//! then echoes its `Vec<String>` payload back. Used by the overhead bench so the non-sleep
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the rustdoc symbol link to the exported constant.

The doc comment references INSTRUMENT_SLEEP, but the declared constant is INSTRUMENT_SLEEP_US (Line 17), so the intra-doc link is inconsistent.

Proposed fix
-//! * [`task_decl::instrument`] — fixed-cost task: sleeps for a known [`INSTRUMENT_SLEEP`] duration
+//! * [`task_decl::instrument`] — fixed-cost task: sleeps for a known [`INSTRUMENT_SLEEP_US`] duration
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
//! * [`task_decl::instrument`] — fixed-cost task: sleeps for a known [`INSTRUMENT_SLEEP`] duration
//! then echoes its `Vec<String>` payload back. Used by the overhead bench so the non-sleep
//! * [`task_decl::instrument`] — fixed-cost task: sleeps for a known [`INSTRUMENT_SLEEP_US`] duration
//! then echoes its `Vec<String>` payload back. Used by the overhead bench so the non-sleep
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/huntsman/integration-test-tasks/src/lib.rs` around lines 8 - 9, The doc
comment incorrectly links to INSTRUMENT_SLEEP while the exported constant is
named INSTRUMENT_SLEEP_US; update the intra-doc link in the comment referencing
task_decl::instrument to point to the actual constant name INSTRUMENT_SLEEP_US
(or rename the constant to match the doc link), ensuring the rustdoc intra-doc
link uses the exact symbol INSTRUMENT_SLEEP_US so the link resolves correctly.

Comment on lines +101 to +104
let output_dir = std::env::var_os(INSTRUMENT_OUTPUT_DIR_ENV).map_or_else(
|| panic!("{INSTRUMENT_OUTPUT_DIR_ENV} env var not set"),
PathBuf::from,
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Ensure the output directory exists before file creation.

At Line 179, File::create will fail when SPIDER_TEST_INSTRUMENT_OUTPUT_DIR points to a path that does not already exist. Create the directory once right after parsing the env var.

Proposed fix
 async fn instrument_overhead() {
     let output_dir = std::env::var_os(INSTRUMENT_OUTPUT_DIR_ENV).map_or_else(
         || panic!("{INSTRUMENT_OUTPUT_DIR_ENV} env var not set"),
         PathBuf::from,
     );
+    std::fs::create_dir_all(&output_dir)
+        .unwrap_or_else(|err| panic!("create {} failed: {err}", output_dir.display()));
 
     let mut handle = ExecutorHandle::spawn();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let output_dir = std::env::var_os(INSTRUMENT_OUTPUT_DIR_ENV).map_or_else(
|| panic!("{INSTRUMENT_OUTPUT_DIR_ENV} env var not set"),
PathBuf::from,
);
let output_dir = std::env::var_os(INSTRUMENT_OUTPUT_DIR_ENV).map_or_else(
|| panic!("{INSTRUMENT_OUTPUT_DIR_ENV} env var not set"),
PathBuf::from,
);
std::fs::create_dir_all(&output_dir)
.unwrap_or_else(|err| panic!("create {} failed: {err}", output_dir.display()));
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/huntsman/task-executor/tests/overhead_instrument.rs` around lines 101 -
104, After reading INSTRUMENT_OUTPUT_DIR_ENV into output_dir, ensure the
directory exists by calling std::fs::create_dir_all(&output_dir) (or equivalent)
immediately after the PathBuf::from(...) call and before any File::create usage;
propagate or panic on errors consistently with the surrounding code so
File::create cannot fail due to a missing directory (reference symbols:
INSTRUMENT_OUTPUT_DIR_ENV, output_dir, and the subsequent File::create call).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant