feat(spider-execution-manager): Add scheduler, storage, and liveness client traits.#327
feat(spider-execution-manager): Add scheduler, storage, and liveness client traits.#327LinZhihao-723 wants to merge 10 commits into
Conversation
WalkthroughThis PR introduces the ChangesTask Executor and Execution Manager System
Integration Test Infrastructure
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@components/spider-task-executor/src/bin/spider_task_executor.rs`:
- Around line 73-77: Validate the incoming package string before using it in
filesystem joins: ensure the `package` argument contains only an allowed pattern
(e.g., alphanumerics, underscores, hyphens) and does not contain path separators
or path traversal segments like ".."; perform this check in the same scope where
`manager.get(package)` / `manager.load(&path)` are used (referencing `package`,
`pkg_dir`, `manager.load`) and return an error (or reject) early if validation
fails. Use explicit checks (reject if package contains '/' or '\\' or if it
matches ".." or if normalization yields components outside a single final file
name) and then continue to build the path and call `manager.load(&path)` only
when the name is validated.
In `@tests/huntsman/integration-test-tasks/src/lib.rs`:
- Around line 8-9: The doc comment incorrectly links to INSTRUMENT_SLEEP while
the exported constant is named INSTRUMENT_SLEEP_US; update the intra-doc link in
the comment referencing task_decl::instrument to point to the actual constant
name INSTRUMENT_SLEEP_US (or rename the constant to match the doc link),
ensuring the rustdoc intra-doc link uses the exact symbol INSTRUMENT_SLEEP_US so
the link resolves correctly.
In `@tests/huntsman/task-executor/tests/overhead_instrument.rs`:
- Around line 101-104: After reading INSTRUMENT_OUTPUT_DIR_ENV into output_dir,
ensure the directory exists by calling std::fs::create_dir_all(&output_dir) (or
equivalent) immediately after the PathBuf::from(...) call and before any
File::create usage; propagate or panic on errors consistently with the
surrounding code so File::create cannot fail due to a missing directory
(reference symbols: INSTRUMENT_OUTPUT_DIR_ENV, output_dir, and the subsequent
File::create call).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: f86863eb-c4e1-4ebe-b8b5-5dccb17e72a6
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (23)
Cargo.tomlcomponents/spider-execution-manager/Cargo.tomlcomponents/spider-execution-manager/src/client.rscomponents/spider-execution-manager/src/client/liveness.rscomponents/spider-execution-manager/src/client/scheduler.rscomponents/spider-execution-manager/src/client/storage.rscomponents/spider-execution-manager/src/lib.rscomponents/spider-execution-manager/src/process_pool.rscomponents/spider-task-executor/Cargo.tomlcomponents/spider-task-executor/src/bin/spider_task_executor.rscomponents/spider-task-executor/src/error.rscomponents/spider-task-executor/src/lib.rscomponents/spider-task-executor/src/manager.rscomponents/spider-task-executor/src/protocol.rstaskfiles/test.yamltests/huntsman/integration-test-tasks/Cargo.tomltests/huntsman/integration-test-tasks/src/lib.rstests/huntsman/task-executor/Cargo.tomltests/huntsman/task-executor/src/lib.rstests/huntsman/task-executor/tests/overhead_instrument.rstests/huntsman/task-executor/tests/test_executor.rstests/huntsman/task-executor/tests/test_process_pool.rstests/huntsman/tdl-integration/tests/complex.rs
| let pkg = if let Some(pkg) = manager.get(package) { | ||
| pkg | ||
| } else { | ||
| let path = pkg_dir.join(package).join(format!("lib{package}.so")); | ||
| manager.load(&path)? |
There was a problem hiding this comment.
Validate package names before filesystem joins to block path traversal.
package is joined into a load path without validation. A crafted value containing separators or .. can escape SPIDER_TDL_PACKAGE_DIR and load an unintended .so.
Suggested fix
+fn is_valid_package_name(package: &str) -> bool {
+ !package.is_empty()
+ && package
+ .bytes()
+ .all(|b| b.is_ascii_alphanumeric() || b == b'_' || b == b'-')
+}
+
fn run_task(
manager: &mut TdlPackageManager,
pkg_dir: &Path,
package: &str,
@@
) -> Result<Vec<u8>, ExecutorError> {
+ if !is_valid_package_name(package) {
+ return Err(ExecutorError::InvalidLibrary(format!(
+ "invalid package name: {package}"
+ )));
+ }
+
let pkg = if let Some(pkg) = manager.get(package) {
pkg
} else {
let path = pkg_dir.join(package).join(format!("lib{package}.so"));
manager.load(&path)?📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let pkg = if let Some(pkg) = manager.get(package) { | |
| pkg | |
| } else { | |
| let path = pkg_dir.join(package).join(format!("lib{package}.so")); | |
| manager.load(&path)? | |
| fn is_valid_package_name(package: &str) -> bool { | |
| !package.is_empty() | |
| && package | |
| .bytes() | |
| .all(|b| b.is_ascii_alphanumeric() || b == b'_' || b == b'-') | |
| } | |
| fn run_task( | |
| manager: &mut TdlPackageManager, | |
| pkg_dir: &Path, | |
| package: &str, | |
| ) -> Result<Vec<u8>, ExecutorError> { | |
| if !is_valid_package_name(package) { | |
| return Err(ExecutorError::InvalidLibrary(format!( | |
| "invalid package name: {package}" | |
| ))); | |
| } | |
| let pkg = if let Some(pkg) = manager.get(package) { | |
| pkg | |
| } else { | |
| let path = pkg_dir.join(package).join(format!("lib{package}.so")); | |
| manager.load(&path)? |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@components/spider-task-executor/src/bin/spider_task_executor.rs` around lines
73 - 77, Validate the incoming package string before using it in filesystem
joins: ensure the `package` argument contains only an allowed pattern (e.g.,
alphanumerics, underscores, hyphens) and does not contain path separators or
path traversal segments like ".."; perform this check in the same scope where
`manager.get(package)` / `manager.load(&path)` are used (referencing `package`,
`pkg_dir`, `manager.load`) and return an error (or reject) early if validation
fails. Use explicit checks (reject if package contains '/' or '\\' or if it
matches ".." or if normalization yields components outside a single final file
name) and then continue to build the path and call `manager.load(&path)` only
when the name is validated.
| //! * [`task_decl::instrument`] — fixed-cost task: sleeps for a known [`INSTRUMENT_SLEEP`] duration | ||
| //! then echoes its `Vec<String>` payload back. Used by the overhead bench so the non-sleep |
There was a problem hiding this comment.
Fix the rustdoc symbol link to the exported constant.
The doc comment references INSTRUMENT_SLEEP, but the declared constant is INSTRUMENT_SLEEP_US (Line 17), so the intra-doc link is inconsistent.
Proposed fix
-//! * [`task_decl::instrument`] — fixed-cost task: sleeps for a known [`INSTRUMENT_SLEEP`] duration
+//! * [`task_decl::instrument`] — fixed-cost task: sleeps for a known [`INSTRUMENT_SLEEP_US`] duration📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| //! * [`task_decl::instrument`] — fixed-cost task: sleeps for a known [`INSTRUMENT_SLEEP`] duration | |
| //! then echoes its `Vec<String>` payload back. Used by the overhead bench so the non-sleep | |
| //! * [`task_decl::instrument`] — fixed-cost task: sleeps for a known [`INSTRUMENT_SLEEP_US`] duration | |
| //! then echoes its `Vec<String>` payload back. Used by the overhead bench so the non-sleep |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/huntsman/integration-test-tasks/src/lib.rs` around lines 8 - 9, The doc
comment incorrectly links to INSTRUMENT_SLEEP while the exported constant is
named INSTRUMENT_SLEEP_US; update the intra-doc link in the comment referencing
task_decl::instrument to point to the actual constant name INSTRUMENT_SLEEP_US
(or rename the constant to match the doc link), ensuring the rustdoc intra-doc
link uses the exact symbol INSTRUMENT_SLEEP_US so the link resolves correctly.
| let output_dir = std::env::var_os(INSTRUMENT_OUTPUT_DIR_ENV).map_or_else( | ||
| || panic!("{INSTRUMENT_OUTPUT_DIR_ENV} env var not set"), | ||
| PathBuf::from, | ||
| ); |
There was a problem hiding this comment.
Ensure the output directory exists before file creation.
At Line 179, File::create will fail when SPIDER_TEST_INSTRUMENT_OUTPUT_DIR points to a path that does not already exist. Create the directory once right after parsing the env var.
Proposed fix
async fn instrument_overhead() {
let output_dir = std::env::var_os(INSTRUMENT_OUTPUT_DIR_ENV).map_or_else(
|| panic!("{INSTRUMENT_OUTPUT_DIR_ENV} env var not set"),
PathBuf::from,
);
+ std::fs::create_dir_all(&output_dir)
+ .unwrap_or_else(|err| panic!("create {} failed: {err}", output_dir.display()));
let mut handle = ExecutorHandle::spawn();📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let output_dir = std::env::var_os(INSTRUMENT_OUTPUT_DIR_ENV).map_or_else( | |
| || panic!("{INSTRUMENT_OUTPUT_DIR_ENV} env var not set"), | |
| PathBuf::from, | |
| ); | |
| let output_dir = std::env::var_os(INSTRUMENT_OUTPUT_DIR_ENV).map_or_else( | |
| || panic!("{INSTRUMENT_OUTPUT_DIR_ENV} env var not set"), | |
| PathBuf::from, | |
| ); | |
| std::fs::create_dir_all(&output_dir) | |
| .unwrap_or_else(|err| panic!("create {} failed: {err}", output_dir.display())); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/huntsman/task-executor/tests/overhead_instrument.rs` around lines 101 -
104, After reading INSTRUMENT_OUTPUT_DIR_ENV into output_dir, ensure the
directory exists by calling std::fs::create_dir_all(&output_dir) (or equivalent)
immediately after the PathBuf::from(...) call and before any File::create usage;
propagate or panic on errors consistently with the surrounding code so
File::create cannot fail due to a missing directory (reference symbols:
INSTRUMENT_OUTPUT_DIR_ENV, output_dir, and the subsequent File::create call).
Description
This PR depends on #326.
This PR adds the network-client trait surface for the execution manager. Three async traits cover the EM's outbound traffic to the rest of the cluster; concrete implementations bind to the corresponding remote services and will land in follow-up PRs.
Checklist
breaking change.
Validation performed
Summary by CodeRabbit
Release Notes