feat(script): Autonomous Script Protocol #38

Merged: Mingye-Lu merged 16 commits into main from feat/script-protocol on Jun 9, 2026

@Mingye-Lu (Owner)

Summary

Adds a new Autonomous Script Protocol that lets the LLM (or an external MCP client) execute deterministic multi-step browser automation without per-step LLM round-trips. For repetitive page patterns this removes one LLM call per step, which makes runs faster and cheaper.

What's new

New crate: crates/script/

  • AST grammar (ScriptDefinition, ScriptNode, Expression)
  • Parser + validator (parse_script, validate_script)
  • Persistence layer (save_script_to_disk, load_script_from_disk, list_scripts_on_disk)

7 new tools (exposed via agent loop and MCP server):

| Tool | Description |
| --- | --- |
| run_script | Execute an inline script or load one by name; returns script_id immediately |
| wait_for_scripts | Block until script(s) complete; returns full ScriptResult |
| script_status | Non-blocking poll of running script state |
| cancel_script | Abort a running script |
| save_script | Persist a script definition to ~/.acrawl/scripts/<name>.json |
| list_scripts | List all saved scripts with ISO 8601 timestamps |
| read_script | Read back a saved script definition |

Script nodes: tool_call, assign, collect, yield, for_loop, for_each, while_loop, if_else, try_catch, parallel

Execution engine (crates/agent/src/script_executor/):

  • Step counter, wall-clock timeout, per-step timeout, output byte limit, cancellation token
  • Parallel branches share step counter + cancel token; each branch opens its own browser page
  • errors_caught and output_bytes propagated back from parallel branches to parent
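The sharing and merge-back behaviour described in the bullets above can be sketched with standard-library threads. This is a minimal illustration under stated assumptions, not the code in crates/agent/src/script_executor/: the `Step` enum, the `Shared` struct, `take_step`, `run_branch`, and `run_parallel` are hypothetical names, OS threads stand in for the real async branches, and no browser pages are opened. Only `ParallelBranchResult`, `errors_caught`, and `output_bytes` are names taken from the PR.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// One simulated step inside a branch (hypothetical stand-in for script nodes).
pub enum Step {
    /// A step that emits this many output bytes.
    Emit(usize),
    /// A step whose error was caught and recovered by a try_catch node.
    CaughtError,
}

/// Per-branch counters that the parent merges after the branches finish.
#[derive(Debug, Default, PartialEq)]
pub struct ParallelBranchResult {
    pub errors_caught: usize,
    pub output_bytes: usize,
}

/// State shared by every branch of one script run (assumed layout).
#[derive(Clone)]
pub struct Shared {
    pub steps: Arc<AtomicUsize>,
    pub cancelled: Arc<AtomicBool>,
    pub max_steps: usize,
}

impl Shared {
    pub fn new(max_steps: usize) -> Self {
        Shared {
            steps: Arc::new(AtomicUsize::new(0)),
            cancelled: Arc::new(AtomicBool::new(false)),
            max_steps,
        }
    }

    /// Claim one step from the shared budget, or fail if cancelled or exhausted.
    fn take_step(&self) -> Result<(), &'static str> {
        if self.cancelled.load(Ordering::SeqCst) {
            return Err("cancelled");
        }
        if self.steps.fetch_add(1, Ordering::SeqCst) >= self.max_steps {
            return Err("step limit exceeded");
        }
        Ok(())
    }
}

/// Run one branch, charging every step against the shared counter.
fn run_branch(shared: &Shared, steps: Vec<Step>) -> Result<ParallelBranchResult, &'static str> {
    let mut result = ParallelBranchResult::default();
    for step in steps {
        shared.take_step()?;
        match step {
            Step::Emit(bytes) => result.output_bytes += bytes,
            Step::CaughtError => result.errors_caught += 1,
        }
    }
    Ok(result)
}

/// Run all branches concurrently, then merge their counters into one total.
pub fn run_parallel(
    shared: &Shared,
    branches: Vec<Vec<Step>>,
) -> Result<ParallelBranchResult, &'static str> {
    let handles: Vec<_> = branches
        .into_iter()
        .map(|steps| {
            let shared = shared.clone();
            thread::spawn(move || run_branch(&shared, steps))
        })
        .collect();

    let mut total = ParallelBranchResult::default();
    for handle in handles {
        let branch = handle.join().map_err(|_| "branch panicked")??;
        total.errors_caught += branch.errors_caught;
        total.output_bytes += branch.output_bytes;
    }
    Ok(total)
}
```

Because every branch draws from the same atomic counter, the step limit applies to the script as a whole rather than per branch, and setting the shared flag stops all branches at their next step.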

Bugs fixed (post code-review)

| Severity | Fix |
| --- | --- |
| Critical | Expression serde tag changed from internally-tagged to adjacently-tagged (#[serde(tag="kind", content="value")]); Literal, Variable, JsEval now deserialize correctly from JSON |
| Important | run_script ToolSpec schema now exposes name, save_as, limits; removed internal __load_from_disk |
| Important | max_output_bytes enforced via push_extracted/push_yielded helpers |
| Important | cleanup_completed() removed from spawn_script; completed scripts survive until explicitly wait_for_scripts-ed |
| Important | validate_script_name consolidated into persistence.rs; rejects leading dash, dots, path traversal |
| Important | list_scripts modified_at now returns ISO 8601 UTC (e.g. 2026-06-09T13:39:09Z) |
| Bug | spawn_script in MCP server wrapped in rt.block_on; it was panicking with "no reactor running" and killing the server process |

Test coverage

  • 17 parser unit tests (including parse_script_expression_round_trip covering all 5 Expression variants through serde round-trip)
  • 31 script executor unit tests (including collect_over_output_byte_limit_fails, yield_over_output_byte_limit_fails)
  • 1 script manager unit test (completed_script_survives_subsequent_spawn_check)
  • 11 script integration tests
  • 1 MCP stdio integration test (stdio_server_run_script_returns_script_id_and_survives)

E2E verified (live MCP session)

All 7 tools exercised end-to-end against the running MCP server:

run_script  → script_id returned, server alive        ✓
wait_for_scripts → status:Completed, extracted_data:[42], yielded_data:["done"]  ✓
run_script (navigate + literal + variable) → extracted_data:["Example Domain"]  ✓
script_status → live step/items_collected/elapsed_secs  ✓
cancel_script → status:Cancelled, items_collected:0  ✓
save_script → saved to disk  ✓
list_scripts → ISO 8601 modified_at  ✓
read_script → full definition round-tripped including field_access expression  ✓

Mingye-Lu and others added 16 commits June 8, 2026 21:01
Change Expression enum from internally-tagged (#[serde(tag = "kind")]) to
adjacently-tagged (#[serde(tag = "kind", content = "value")]) so that newtype
variants (Literal, Variable, JsEval) deserialize correctly from JSON.

Update expression_to_value in parser.rs to wrap FieldAccess and ArrayIndex
struct fields under a value key, matching the new wire format.

Add parse_script_expression_round_trip test covering all five Expression
variants through serde_json::to_value -> parse_script.

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
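To make the adjacently-tagged shape from the commit above concrete, here is a hand-rolled sketch that renders an expression as a `kind` + `value` JSON object. It deliberately avoids serde so that it stays self-contained. Several details are assumptions and not taken from the crate: the variant payload types, the field names `object`, `field`, `array`, and `index`, the PascalCase tag strings, and the `to_wire` function itself. Rust's `{:?}` string formatting is used as a stand-in for JSON string escaping, which only matches JSON for simple strings.

```rust
/// Simplified Expression mirror; payload types and field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum Expression {
    Literal(i64),
    Variable(String),
    JsEval(String),
    FieldAccess { object: Box<Expression>, field: String },
    ArrayIndex { array: Box<Expression>, index: usize },
}

/// Render an expression in adjacently-tagged form:
/// the variant name goes under "kind", the payload under "value".
pub fn to_wire(expr: &Expression) -> String {
    match expr {
        // Newtype variants: the payload sits directly under "value".
        Expression::Literal(n) => format!(r#"{{"kind":"Literal","value":{n}}}"#),
        Expression::Variable(name) => format!(r#"{{"kind":"Variable","value":{name:?}}}"#),
        Expression::JsEval(src) => format!(r#"{{"kind":"JsEval","value":{src:?}}}"#),
        // Struct variants: the fields are wrapped in an object under "value".
        Expression::FieldAccess { object, field } => format!(
            r#"{{"kind":"FieldAccess","value":{{"object":{},"field":{field:?}}}}}"#,
            to_wire(object)
        ),
        Expression::ArrayIndex { array, index } => format!(
            r#"{{"kind":"ArrayIndex","value":{{"array":{},"index":{index}}}}}"#,
            to_wire(array)
        ),
    }
}
```

With an internally-tagged enum, the tag has to be merged into the payload object itself, which cannot work when a newtype variant holds a bare number or string. Moving the payload under a separate `value` key removes that constraint.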
…mestamp

Strengthen persistence::validate_script_name to reject leading dashes,
dots, and non-normal path components (matching the stricter rules
previously duplicated in save_script.rs and read_script.rs).

Remove the duplicated local validate_script_name from save_script.rs and
read_script.rs; both now delegate to script::persistence::validate_script_name.

Fix format_system_time in list_scripts.rs: was using Debug format
({:?}) producing unreadable output; now returns Unix epoch seconds.

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
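The stricter name rules from the commit above could look roughly like the following. This is a sketch under assumptions, not the code in persistence.rs: the exact rule set and the error messages are guesses, and "dots" is interpreted here as rejecting any `.` in the name, which may be stricter than the real validator.

```rust
use std::path::{Component, Path};

/// Accept only names that map to a single plain file stem.
/// The precise rules here are assumptions based on the commit message.
pub fn validate_script_name(name: &str) -> Result<(), String> {
    if name.is_empty() {
        return Err("script name must not be empty".to_string());
    }
    if name.starts_with('-') {
        return Err("script name must not start with a dash".to_string());
    }
    if name.contains('.') {
        return Err("script name must not contain dots".to_string());
    }
    if name.contains('/') || name.contains('\\') {
        return Err("script name must not contain path separators".to_string());
    }
    // Defence in depth: the name must be exactly one normal path component,
    // which excludes roots, prefixes, and parent-directory references.
    let mut components = Path::new(name).components();
    match (components.next(), components.next()) {
        (Some(Component::Normal(_)), None) => Ok(()),
        _ => Err("script name must be a single path component".to_string()),
    }
}
```

Rejecting a leading dash keeps a saved name from being mistaken for a command-line flag, and the separator and dot checks keep the resulting `<name>.json` path inside the scripts directory.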
The run_script handler already accepted name (load saved script),
save_as (persist after run), and limits (override defaults), but the
ToolSpec input_schema only declared script and the internal
__load_from_disk field, hiding the other params from the LLM.

Replace __load_from_disk with the three user-facing properties and
update the instructions field to reference name instead of the internal
marker.

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
…helpers

ScriptLimits::max_output_bytes was set by effective_limits() but never
checked during execution; extracted_data and yielded_data could grow
without bound.

Add output_bytes: usize field to ScriptExecutor. Replace the inline
Collect/Yield node handling in mod.rs with calls to push_extracted and
push_yielded from data.rs; both helpers now check and accumulate the byte
count, returning ScriptExecutionError::ToolError on overflow.

Remove the #[allow(dead_code)] on the data.rs impl block and delete the
three helpers that were never called (store_variable, variables,
extracted_data). Fix the doc-comment Expression examples to match the
adjacently-tagged wire format (kind + value).

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
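The byte accounting described in the commit above can be illustrated as follows. This is a simplified sketch, not the data.rs implementation: the `Output` struct and the private `charge` helper are hypothetical, items are modelled as already-serialized strings, and the size of an item is taken to be its UTF-8 byte length. The names `push_extracted`, `push_yielded`, `output_bytes`, `max_output_bytes`, and `ScriptExecutionError::ToolError` come from the PR.

```rust
#[derive(Debug, PartialEq)]
pub enum ScriptExecutionError {
    ToolError(String),
}

/// Collected output plus a running byte count (assumed layout).
pub struct Output {
    pub extracted_data: Vec<String>,
    pub yielded_data: Vec<String>,
    pub output_bytes: usize,
    pub max_output_bytes: usize,
}

impl Output {
    pub fn new(max_output_bytes: usize) -> Self {
        Output {
            extracted_data: Vec::new(),
            yielded_data: Vec::new(),
            output_bytes: 0,
            max_output_bytes,
        }
    }

    /// Add the item's size to the running total, refusing it if the budget would be exceeded.
    fn charge(&mut self, item: &str) -> Result<(), ScriptExecutionError> {
        let next = self.output_bytes.saturating_add(item.len());
        let max = self.max_output_bytes;
        if next > max {
            return Err(ScriptExecutionError::ToolError(format!(
                "output size limit exceeded: {next} > {max} bytes"
            )));
        }
        self.output_bytes = next;
        Ok(())
    }

    /// Used by collect nodes: check the budget first, then store the item.
    pub fn push_extracted(&mut self, item: String) -> Result<(), ScriptExecutionError> {
        self.charge(&item)?;
        self.extracted_data.push(item);
        Ok(())
    }

    /// Used by yield nodes: same budget, separate list.
    pub fn push_yielded(&mut self, item: String) -> Result<(), ScriptExecutionError> {
        self.charge(&item)?;
        self.yielded_data.push(item);
        Ok(())
    }
}
```

Routing both node types through one checked helper means the limit covers extracted and yielded data together, and a rejected item leaves the counter and both lists unchanged.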
…eted spawn race

Two executor-layer safety fixes:

1. ParallelBranchResult now carries errors_caught and output_bytes.
   After all branches complete the parent merges both counters back,
   so TryCatch nodes inside parallel branches correctly contribute to
   the overall errors_caught tally and the output byte budget.

2. Remove the cleanup_completed() call from spawn_script(). It was
   removing finished scripts from the map before the caller could
   retrieve results via wait_for_scripts, producing spurious NotFound
   errors for fast-completing scripts. check_can_spawn() already
   counts only running (non-finished) handles so the concurrent cap
   is unaffected.

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
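The retention behaviour from the second fix above can be sketched with a plain map. This is a synchronous toy model, not the real ScriptManager: the `u64` ids, `mark_completed`, and the non-blocking `wait_for_script` are hypothetical simplifications, and the real manager tracks task handles where this sketch tracks a bare status.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum ScriptStatus {
    Running,
    Completed,
}

/// Toy manager: finished entries stay in the map until a caller collects them.
pub struct ScriptManager {
    scripts: HashMap<u64, ScriptStatus>,
    next_id: u64,
    max_concurrent: usize,
}

impl ScriptManager {
    pub fn new(max_concurrent: usize) -> Self {
        ScriptManager { scripts: HashMap::new(), next_id: 0, max_concurrent }
    }

    /// Only running scripts count toward the concurrency cap.
    pub fn check_can_spawn(&self) -> Result<(), String> {
        let running = self
            .scripts
            .values()
            .filter(|status| **status == ScriptStatus::Running)
            .count();
        if running >= self.max_concurrent {
            return Err(format!("too many running scripts: {running}"));
        }
        Ok(())
    }

    /// Register a new script; note that nothing is cleaned up here.
    pub fn spawn_script(&mut self) -> Result<u64, String> {
        self.check_can_spawn()?;
        let id = self.next_id;
        self.next_id += 1;
        self.scripts.insert(id, ScriptStatus::Running);
        Ok(id)
    }

    pub fn mark_completed(&mut self, id: u64) {
        if let Some(status) = self.scripts.get_mut(&id) {
            *status = ScriptStatus::Completed;
        }
    }

    pub fn get_status(&self, id: u64) -> Option<&ScriptStatus> {
        self.scripts.get(&id)
    }

    /// Collecting a finished script is the only operation that removes it.
    /// Unlike the real tool, this sketch returns None without blocking
    /// when the script is still running or unknown.
    pub fn wait_for_script(&mut self, id: u64) -> Option<ScriptStatus> {
        if self.scripts.get(&id) == Some(&ScriptStatus::Completed) {
            self.scripts.remove(&id)
        } else {
            None
        }
    }
}
```

Since the cap is computed from running entries only, keeping completed entries around does not block new spawns; it only ensures that a fast-finishing script is still there when the caller asks for its result.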
The wait_for_subagents bullet inside section_parallel_exploration had
extra leading spaces (9) vs its sibling bullets (7), producing slightly
misaligned output. Normalize to the consistent 7-space indent.

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
…mpleted race

Add tests that directly exercise the fixed behaviours, which cannot be
driven through the MCP server in a headless shell (run_script requires
an active Playwright browser):

collect_over_output_byte_limit_fails / yield_over_output_byte_limit_fails
  Verify that push_extracted / push_yielded return ScriptStatus::Failed
  with an 'output size limit exceeded' message when the accumulated
  output exceeds ScriptLimits::max_output_bytes.

completed_script_survives_subsequent_spawn_check
  Pre-populates ScriptManager with a finished entry, calls
  check_can_spawn (formerly spawn_script would call cleanup_completed
  here), and asserts the completed entry is still retrievable via
  get_status — proving the cleanup race is gone.

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
…me context

spawn_script calls tokio::task::spawn internally but was being invoked
from synchronous code outside any block_on, causing an immediate panic:

  'there is no reactor running, must be called from the context of
   a Tokio 1.x runtime'

Wrap the call in rt.block_on(async { ... }) to enter the runtime
context before spawning, matching the pattern already used by
wait_for_scripts.

Add stdio integration test (stdio_server_run_script_returns_script_id_and_survives)
that drives the full run_script -> wait_for_scripts flow through the
MCP binary, verifying the server stays alive and returns correct
extracted_data / yielded_data.

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
…ix epoch

format_system_time previously returned a bare Unix epoch integer
(e.g. '1780991949'). Now uses time::OffsetDateTime + Rfc3339 to
return a human-readable UTC timestamp (e.g. '2026-06-09T13:39:09Z').

Uses i64::try_from to avoid the clippy::cast_possible_wrap lint on
u64->i64 conversion.

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
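For readers without the time crate at hand, the conversion in the commit above can be approximated with the standard library alone. This sketch swaps in a hand-written days-to-civil-date conversion (the widely used algorithm published by Howard Hinnant) in place of time::OffsetDateTime and Rfc3339, so it is not the code in list_scripts.rs. `format_epoch_secs` and `civil_from_days` are hypothetical helpers; `format_system_time` and the `i64::try_from` conversion follow the commit message.

```rust
use std::convert::TryFrom;
use std::time::{SystemTime, UNIX_EPOCH};

/// Convert days since 1970-01-01 into a (year, month, day) civil date.
fn civil_from_days(days: i64) -> (i64, i64, i64) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day within the cycle
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year
    let mp = (5 * doy + 2) / 153; // March-based month index, 0 = March
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Format Unix epoch seconds as an RFC 3339 / ISO 8601 UTC timestamp.
pub fn format_epoch_secs(secs: i64) -> String {
    let days = secs.div_euclid(86_400);
    let rem = secs.rem_euclid(86_400);
    let (year, month, day) = civil_from_days(days);
    format!(
        "{:04}-{:02}-{:02}T{:02}:{:02}:{:02}Z",
        year,
        month,
        day,
        rem / 3_600,
        rem % 3_600 / 60,
        rem % 60
    )
}

/// Format a file modification time; times before the epoch fall back to 0.
pub fn format_system_time(t: SystemTime) -> String {
    let secs = t
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    // Checked u64 -> i64 conversion instead of an `as` cast that could wrap.
    format_epoch_secs(i64::try_from(secs).unwrap_or(i64::MAX))
}
```

In production code a maintained date-time crate, as the commit uses, is the safer choice; the sketch is only meant to show what the conversion involves.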
… test

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Mingye-Lu merged commit 0eb6c8f into main on Jun 9, 2026 (4 checks passed)
Mingye-Lu deleted the feat/script-protocol branch on June 9, 2026 08:25