Merged
15 commits
8370d6e
feat(agent): add dynamic system prompt infrastructure and optimizatio…
Mingye-Lu Jun 10, 2026
fd5936f
feat(agent): add HTML diff mode, loop detection, page fingerprinting,…
Mingye-Lu Jun 10, 2026
2689427
feat(agent): add failure classification, self-healing selectors, acti…
Mingye-Lu Jun 10, 2026
8f94f78
feat(browser,runtime): add component enrichment, cleaning profiles, b…
Mingye-Lu Jun 11, 2026
f699f7f
chore: update CHANGELOG.md for optimization features
Mingye-Lu Jun 11, 2026
7e4b9a5
fix: resolve clippy --all-targets warnings (raw strings, float_cmp, t…
Mingye-Lu Jun 11, 2026
231b2f9
docs: document optimization settings and performance features
Mingye-Lu Jun 11, 2026
ec90e16
fix(runtime): thread session model through struct and re-export
Mingye-Lu Jun 11, 2026
8e35b8a
fix(runtime): use model-specific pricing for cumulative cost and per-…
Mingye-Lu Jun 11, 2026
169b34b
fix(agent): wire failure_classification flag and remove execute_js fr…
Mingye-Lu Jun 11, 2026
de85a99
fix(agent): correct execute() pipeline order and make DynamicPromptCo…
Mingye-Lu Jun 11, 2026
cd1b101
fix(agent): wire page fingerprinting and content-aware profile select…
Mingye-Lu Jun 11, 2026
c28a9dc
fix(ui,tui,mcp): propagate session model to child agents and fix assi…
Mingye-Lu Jun 11, 2026
cfebe8d
fix(mcp): persist CrawlState across direct-tool calls to enable html_…
Mingye-Lu Jun 11, 2026
2151b91
style: cargo fmt mcp-server execute_browser_tool call site
Mingye-Lu Jun 11, 2026
44 changes: 43 additions & 1 deletion AGENTS.md
@@ -8,7 +8,7 @@

```bash
cargo build --release # produce ./target/release/acrawl
cargo test --workspace # run full test suite (~770 tests)
cargo test --workspace # run full test suite (~1,100 tests)
cargo test -p <crate> <test_name> # run a single test (e.g. -p agent mvp_tool_specs_contains_expected_21_tools)
cargo clippy --workspace --all-targets -- -D warnings # lints must be clean (workspace lints set pedantic = warn)
cargo fmt --check # format check
@@ -78,6 +78,48 @@ Default model comes from the `default_model` field in the active provider's `Sto

`agent::mvp_tool_specs()` returns the canonical 21-tool list with JSON schemas and required permission. When you add or rename a tool, update `mvp_tool_specs`, add a handler in `tools/mod.rs`, and adjust the count assertion in `crates/agent/src/lib.rs` tests.

## Optimization layer

14 vendor-derived optimizations live in `crates/agent/src/` and `crates/runtime/src/`. All are gated by `settings.optimization.*` fields, and every one defaults to OFF. Each optimization follows the same pattern, built on the shared infrastructure below.

### Shared infrastructure (must understand before touching any optimization)

**`DynamicPromptContext`** (`crates/agent/src/prompt.rs`) — four optional string fields (`stagnation_alert`, `planning_guidance`, `budget_warning`, `loop_nudge`). `build_system_prompt(specs, Some(&ctx))` appends the context as section 9 of the system prompt.
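
As a rough illustration of the struct described above, here is a minimal sketch with the four optional fields. The `render_section` helper, its heading, and its labels are hypothetical and only show how unset fields could be skipped; the real `build_system_prompt` formatting is not shown in this change.

```rust
/// Hypothetical mirror of the four optional fields described above.
#[derive(Debug, Default, Clone)]
pub struct DynamicPromptContext {
    pub stagnation_alert: Option<String>,
    pub planning_guidance: Option<String>,
    pub budget_warning: Option<String>,
    pub loop_nudge: Option<String>,
}

impl DynamicPromptContext {
    /// Render only the fields that are set. Returns `None` when every field
    /// is empty so the caller can skip the section entirely.
    /// (Assumed helper: name, heading, and labels are illustrative.)
    pub fn render_section(&self) -> Option<String> {
        let entries = [
            ("Stagnation alert", self.stagnation_alert.as_deref()),
            ("Planning guidance", self.planning_guidance.as_deref()),
            ("Budget warning", self.budget_warning.as_deref()),
            ("Loop nudge", self.loop_nudge.as_deref()),
        ];
        let lines: Vec<String> = entries
            .iter()
            .filter_map(|(label, value)| value.map(|v| format!("- {label}: {v}")))
            .collect();
        if lines.is_empty() {
            None
        } else {
            Some(format!("## Dynamic context\n{}", lines.join("\n")))
        }
    }
}
```

Returning `None` for an empty context keeps the base prompt unchanged when no optimization has anything to report.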

**Arc slot pattern**: `CrawlerAgent` and `ConversationRuntime` share three Arc slots created in `run_with_system_prompt()`:
- `prompt_override: Arc<Mutex<Option<Vec<String>>>>` — agent writes a new full system prompt here after any tool execution; runtime applies it before the next API call in `prepare_iteration()`.
- `last_assistant_text: Arc<Mutex<Option<String>>>` — runtime writes the latest assistant response text here; agent reads it for confidence parsing.
- `cumulative_cost: Arc<AtomicU64>` (millicents) — runtime updates it after each usage record; agent reads it for budget enforcement.

All three slots are internal to `ConversationRuntime` (not constructor parameters) but accessible via getters. The agent gets the cost counter via `runtime.cumulative_cost_counter()` after construction.
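
The slot pattern above can be sketched with standard-library types only. `SharedSlots` and the four free functions are hypothetical names for illustration; in particular, having the runtime `take()` the override so it is applied once is an assumption, not something this change confirms.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical bundle of the three shared slots. In the real code the
/// slots live on `ConversationRuntime` and are exposed through getters.
#[derive(Clone, Default)]
pub struct SharedSlots {
    pub prompt_override: Arc<Mutex<Option<Vec<String>>>>,
    pub last_assistant_text: Arc<Mutex<Option<String>>>,
    pub cumulative_cost: Arc<AtomicU64>, // millicents
}

/// Agent side: publish a replacement system prompt after a tool call.
pub fn agent_publish_prompt(slots: &SharedSlots, sections: Vec<String>) {
    *slots.prompt_override.lock().unwrap() = Some(sections);
}

/// Runtime side: fetch the pending override before the next API call.
/// `take()` leaves `None` behind, so an override is applied only once
/// (assumed behavior for this sketch).
pub fn runtime_take_prompt(slots: &SharedSlots) -> Option<Vec<String>> {
    slots.prompt_override.lock().unwrap().take()
}

/// Runtime side: add the cost of one usage record, in millicents.
pub fn runtime_record_cost(slots: &SharedSlots, millicents: u64) {
    slots.cumulative_cost.fetch_add(millicents, Ordering::Relaxed);
}

/// Agent side: read the running total for budget checks.
pub fn agent_read_cost(slots: &SharedSlots) -> u64 {
    slots.cumulative_cost.load(Ordering::Relaxed)
}
```

Cloning `SharedSlots` clones the `Arc` handles, not the data, so the agent and the runtime each hold a handle to the same underlying values.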

### Per-optimization modules

| Module | Location | What it adds to `CrawlState` / `CrawlerAgent` |
|--------|----------|-----------------------------------------------|
| `page_fingerprint` | `crates/agent/src/page_fingerprint.rs` | `CrawlState.page_fingerprints: Vec<PageFingerprint>` |
| `tools/html_diff` | `crates/agent/src/tools/html_diff.rs` | `CrawlState.html_diff_tracker: Option<HtmlDiffTracker>` |
| `loop_detector` | `crates/agent/src/loop_detector.rs` | `CrawlState.loop_detector: Option<LoopDetector>` |
| `failure_classifier` | `crates/agent/src/failure_classifier.rs` | (pure function — no state) |
| `self_healing` | `crates/agent/src/self_healing.rs` | (pure function — no state) |
| `action_cache` | `crates/agent/src/action_cache.rs` | `CrawlState.action_cache: Option<ActionCache>` |
| `confidence` | `crates/agent/src/confidence.rs` | `CrawlerAgent.confidence_tracker: Option<ConfidenceTracker>` |
| `budget` | `crates/runtime/src/budget.rs` | `CrawlerAgent.cumulative_cost_slot: SharedCostCounter` |

### Where optimizations run

All optimization logic runs inside `CrawlerAgent::execute()` in `crates/agent/src/implementation/mod.rs`. The execution order (each guarded by its settings flag):
1. **Action cache lookup** — before the tool runs (returns cached result if hit)
2. **Tool execution** — normal handler dispatch
3. **Self-healing retry** — on SelectorNotFound/SelectorAmbiguous
4. **Loop detection** — records action + fingerprint, writes nudge to prompt_override_slot
5. **Planning interval** — injects planning/execution guidance at step N
6. **Confidence tracking** — reads last_assistant_text slot, parses `[confidence: ...]`
7. **Budget enforcement** — reads cumulative_cost_slot, warns or blocks
8. **Action cache store** — stores result after successful read-only tool call
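
The ordering above can be illustrated with a heavily simplified sketch. `MiniAgent`, its fields, and the `trace` vector are hypothetical and exist only to make the stage order visible; stages 3 to 7 are collapsed into a single placeholder, and the cache key ignores the page fingerprint.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified stand-in for the agent's per-call state.
#[derive(Default)]
pub struct MiniAgent {
    pub action_caching: bool,           // stands in for the settings flag
    pub cache: HashMap<String, String>, // key: tool name + input
    pub trace: Vec<&'static str>,       // records which stages ran
}

impl MiniAgent {
    pub fn execute<F>(
        &mut self,
        tool: &str,
        input: &str,
        read_only: bool,
        run_tool: F,
    ) -> Result<String, String>
    where
        F: FnOnce(&str) -> Result<String, String>,
    {
        let key = format!("{tool}:{input}");

        // 1. Action cache lookup: in this sketch a hit returns immediately.
        if self.action_caching && read_only {
            if let Some(hit) = self.cache.get(&key) {
                self.trace.push("cache_hit");
                return Ok(hit.clone());
            }
        }

        // 2. Tool execution.
        self.trace.push("tool");
        let result = run_tool(input);

        // 3-7. Self-healing, loop detection, planning, confidence, and budget
        // hooks would run here, each behind its own flag (omitted).
        self.trace.push("post_hooks");

        // 8. Action cache store: only successful read-only calls are stored.
        if self.action_caching && read_only {
            if let Ok(output) = &result {
                self.cache.insert(key, output.clone());
                self.trace.push("cache_store");
            }
        }
        result
    }
}
```

Placing the lookup first and the store last means a cache hit never pays for the tool call, while a failed or mutating call never pollutes the cache.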

`CrawlState` fields are ephemeral (never persisted to session files). Adding a new field requires no serde changes.

## Conventions specific to this repo

- **Always run `cargo fmt` before committing.** CI checks formatting with `cargo fmt --check` — commits that fail this check will be rejected.
19 changes: 19 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,25 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added

- **HTML Diff Mode** (`optimization.html_diff_mode`) — on repeated visits to the same URL, only changed content sections are returned with `[unchanged: N sections]` markers, reducing token usage 50–70% on multi-turn sessions.
- **Action Loop Detection** (`optimization.loop_detection`) — rolling-window action hash detects repeated identical actions with escalating nudges (soft at 5, medium at 8, strong at 12 repeats); page stagnation detection after 5 consecutive identical page fingerprints.
- **Page Fingerprinting** (`optimization.page_fingerprinting`) — lightweight FNV-1a fingerprint (url + element_count + first-1000-char text hash) stored in CrawlState; used by loop detection and action caching for cache invalidation.
- **Planning Interval** (`optimization.planning_interval`) — every N steps injects planning-checkpoint or execution-mode guidance into the dynamic prompt; disabled by default (interval=0).
- **Failure Classification** (`optimization.failure_classification`) — 16-category keyword-based error taxonomy (zero LLM cost); `classify()` maps error messages to SelectorNotFound, CaptchaDetected, RateLimited, etc.; `retry_strategy()` returns RetryWithHealing, RetryWithDelay, NoRetry, or ResetAndRetry per category.
- **Self-Healing Selectors** (`optimization.self_healing`) — on SelectorNotFound/SelectorAmbiguous, fetches a fresh page_map and text-matches to the correct element ref; logs `[healed: @eOLD → @eNEW]`; zero LLM calls; max retries configurable (default 2).
- **Action Caching** (`optimization.action_caching`) — in-memory FNV-1a keyed cache for read-only tools (`page_map`, `read_content`, `list_resources`, `execute_js`); invalidated on page fingerprint change; TTL-based expiry (default 30s); interaction tools never cached.
- **Confidence Tracking** (`optimization.confidence_tracking`) — parses `[confidence: HIGH/MEDIUM/LOW]` from assistant responses; 2+ consecutive LOWs triggers stagnation alert via DynamicPromptContext; advisory only, never blocks.
- **Compound Component Enrichment** (`optimization.compound_enrichment`) — extends interactive element JSON with an `enrichment` field for complex form controls: date format hints, range min/max/step/value, number bounds, select option lists (max 20 + overflow count), file accept types, textarea maxlength. Max 200 bytes/element.
- **Content-Aware Cleaning Profiles** (`optimization.content_aware_profiles`) — `CleaningProfile` enum (Default/Minimal/Aggressive/ReadingMode) auto-selected by task keyword and content size; `select_profile()` picks ReadingMode for extraction tasks, Minimal for interaction tasks, Aggressive for content > 50KB.
- **Budget Enforcement** (`optimization.budget_max_session_cost_usd`, `optimization.budget_enforcement`) — `BudgetEnforcer` with Warn/Block modes; Warn injects budget warning into the dynamic prompt at configurable threshold (default 80%); Block terminates the agent loop cleanly when the cost limit is reached.
- **Per-Agent Cost Attribution** (`optimization.per_agent_cost_tracking`) — `build_cost_breakdown()` walks flat child sessions and reconstructs per-child cost via UsageTracker; `/cost` command shows per-agent breakdown when flag is ON.
- **Dynamic System Prompt Infrastructure** — `DynamicPromptContext` struct with four optional fields (stagnation_alert, planning_guidance, budget_warning, loop_nudge); injected as section 9 of the system prompt via a shared `Arc<Mutex<>>` slot; all optimizations write to this slot, runtime picks up on the next iteration.
- **Optimization Settings Schema** — nested `OptimizationSettings` struct in `Settings` with 18 fields, all `Option<T>` and defaulting to OFF for backward compatibility; 18 `settings_get_*` getter functions.
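
To make the Page Fingerprinting entry concrete, the following sketch hashes the three inputs it names with 64-bit FNV-1a. The constants are the standard FNV-1a parameters; the function names, the 64-bit width, and the way the fields are fed into the hash are assumptions, since the actual `page_fingerprint.rs` is not part of this excerpt.

```rust
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Feed bytes into a running 64-bit FNV-1a state:
/// XOR each byte into the hash, then multiply by the FNV prime.
pub fn fnv1a_update(mut hash: u64, bytes: &[u8]) -> u64 {
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hypothetical fingerprint over url + element count + the first
/// 1000 characters of page text.
pub fn page_fingerprint(url: &str, element_count: usize, text: &str) -> u64 {
    let mut hash = fnv1a_update(FNV_OFFSET_BASIS, url.as_bytes());
    hash = fnv1a_update(hash, &(element_count as u64).to_le_bytes());
    // Cut at a character boundary so multi-byte UTF-8 text cannot panic.
    let prefix_end = text
        .char_indices()
        .nth(1000)
        .map_or(text.len(), |(idx, _)| idx);
    fnv1a_update(hash, text[..prefix_end].as_bytes())
}
```

Because only a text prefix is hashed, two pages that differ solely after the first 1000 characters produce the same fingerprint; that is the trade-off for keeping the fingerprint cheap.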

## [0.9.1] - 2026-06-10

### Changed
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

62 changes: 61 additions & 1 deletion README.md
@@ -722,6 +722,29 @@ Created with defaults on first run.
| `browser_backend` | `null` | Active browser backend: `"extension"` or `null` (CloakBrowser) |
| `extension_bridge_port` | `19876` | Port for Chrome extension bridge WebSocket server |

All fields are optional; omitting a field uses the default. The `optimization` block accepts a nested object with the following fields. Every feature is off by default (`false`, `0`, or `null`), tuning fields use the defaults listed in the table, and the whole block is safe to omit:

| Field | Default | Description |
|-------|---------|-------------|
| `html_diff_mode` | `false` | On repeated visits to the same URL, returns only changed content sections with `[unchanged: N sections]` markers. 50 to 70% token reduction on multi-turn sessions. No behavior change on first visit. |
| `content_aware_profiles` | `false` | Auto-selects a cleaning profile from the task keywords and content size: ReadingMode for extraction tasks, Minimal for interaction tasks, Aggressive for content > 50KB. |
| `loop_detection` | `false` | Detects repeated identical actions and injects escalating nudges (soft, medium, strong). Also detects page stagnation. |
| `loop_detection_window` | `20` | Rolling window size for action hash comparison. |
| `loop_nudge_threshold` | `5` | Number of repeated actions before first nudge fires. |
| `page_fingerprinting` | `false` | Enables lightweight page fingerprints used by loop detection and action caching. |
| `failure_classification` | `false` | Classifies errors into 16 categories (SelectorNotFound, CaptchaDetected, RateLimited, etc.) using keyword matching. Zero LLM cost. |
| `self_healing` | `false` | On SelectorNotFound/SelectorAmbiguous, fetches a fresh page_map and text-matches to a replacement element ref. Logs `[healed: @eOLD -> @eNEW]`. Zero LLM calls. |
| `self_healing_max_retries` | `2` | Max healing attempts per failed action. |
| `action_caching` | `false` | Caches results of read-only tools (`page_map`, `read_content`, `list_resources`, `execute_js`) keyed by tool + input + page fingerprint. Cache is invalidated when the page changes. |
| `action_cache_ttl_secs` | `30` | Cache entry TTL in seconds. |
| `planning_interval` | `0` | Every N steps, injects a planning checkpoint into the system prompt. 0 = disabled. |
| `confidence_tracking` | `false` | Asks the LLM to self-report confidence after each action (`[confidence: HIGH/MEDIUM/LOW]`). Two consecutive LOWs trigger a stagnation alert. |
| `compound_enrichment` | `false` | Adds `enrichment` metadata to complex form controls in page_map: date format hints, range min/max/value, select option lists (max 20 + overflow count), file accept types, textarea maxlength. Max 200 bytes per element. |
| `budget_max_session_cost_usd` | `null` | Session cost limit in USD. Null = no limit. |
| `budget_enforcement` | `null` | How to enforce the budget: `warn` injects a warning into the prompt; `block` terminates the session when the limit is reached. |
| `budget_warn_threshold_pct` | `80` | Percentage of budget at which warnings start. |
| `per_agent_cost_tracking` | `false` | When ON, `/cost` shows a per-child-agent cost breakdown. |
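
The interaction between `loop_detection_window` and `loop_nudge_threshold` can be sketched as a rolling window of action hashes. `LoopDetectorSketch` and `Nudge` are hypothetical names, and counting how often the latest hash occurs in the window is an assumed definition of "repeated"; the three thresholds are passed in explicitly rather than derived from a single setting.

```rust
use std::collections::VecDeque;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Nudge {
    None,
    Soft,
    Medium,
    Strong,
}

/// Hypothetical rolling-window detector over action hashes.
pub struct LoopDetectorSketch {
    window: VecDeque<u64>,
    window_size: usize,
    soft_at: usize,
    medium_at: usize,
    strong_at: usize,
}

impl LoopDetectorSketch {
    pub fn new(window_size: usize, soft_at: usize, medium_at: usize, strong_at: usize) -> Self {
        Self {
            window: VecDeque::new(),
            window_size: window_size.max(1), // guard against a zero-sized window
            soft_at,
            medium_at,
            strong_at,
        }
    }

    /// Record one action and report the nudge level it triggers.
    pub fn record(&mut self, action_hash: u64) -> Nudge {
        // Evict the oldest entry once the window is full.
        if self.window.len() >= self.window_size {
            self.window.pop_front();
        }
        self.window.push_back(action_hash);

        // Count occurrences of the latest action inside the window.
        let repeats = self.window.iter().filter(|&&h| h == action_hash).count();
        if repeats >= self.strong_at {
            Nudge::Strong
        } else if repeats >= self.medium_at {
            Nudge::Medium
        } else if repeats >= self.soft_at {
            Nudge::Soft
        } else {
            Nudge::None
        }
    }
}
```

With a window of 20 and thresholds 5, 8, and 12, the fifth identical action in a row yields `Nudge::Soft`, the eighth `Nudge::Medium`, and the twelfth `Nudge::Strong`. A window smaller than a threshold can never reach it, so the window size bounds which nudges are possible.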

### Environment Variables

| Variable | Description |
@@ -730,6 +753,43 @@ Created with defaults on first run.

Provider-specific env vars (see [provider table](#24-llm-providers) above) are read as fallbacks when no `credentials.json` entry exists.

### Performance Optimizations

acrawl ships 14 vendor-derived optimizations (sourced from browser-use, Stagehand, crawl4ai, Skyvern, Spider, nanobrowser, and ZeroClaw). All are **disabled by default**; enable them selectively via `settings.json`.

Example `settings.json` with a cost-optimized profile:

```json
{
  "optimization": {
    "html_diff_mode": true,
    "action_caching": true,
    "page_fingerprinting": true,
    "loop_detection": true,
    "self_healing": true,
    "budget_max_session_cost_usd": 0.50,
    "budget_enforcement": "warn"
  }
}
```
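
For the budget fields in the profile above, the sketch below shows one way a USD limit could be checked against a millicent counter (1 USD = 100,000 millicents). `BudgetSketch`, `EnforcementMode`, and `BudgetDecision` are hypothetical names, not the real `BudgetEnforcer` API, and emitting warnings in block mode before the limit is reached is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EnforcementMode {
    Warn,
    Block,
}

#[derive(Debug, PartialEq)]
pub enum BudgetDecision {
    Ok,
    Warn(String),
    Block,
}

/// Hypothetical budget check working in integer millicents.
pub struct BudgetSketch {
    pub limit_millicents: u64,
    pub warn_threshold_pct: u64,
    pub mode: EnforcementMode,
}

impl BudgetSketch {
    pub fn from_usd(limit_usd: f64, warn_threshold_pct: u64, mode: EnforcementMode) -> Self {
        Self {
            // 1 USD = 100 cents = 100_000 millicents.
            limit_millicents: (limit_usd * 100_000.0).round() as u64,
            warn_threshold_pct,
            mode,
        }
    }

    pub fn check(&self, spent_millicents: u64) -> BudgetDecision {
        // Block mode stops the session once the limit is reached.
        if self.mode == EnforcementMode::Block && spent_millicents >= self.limit_millicents {
            return BudgetDecision::Block;
        }
        // Both modes warn from the threshold onward (assumed for this sketch).
        let warn_at = self.limit_millicents * self.warn_threshold_pct / 100;
        if spent_millicents >= warn_at {
            let pct = spent_millicents * 100 / self.limit_millicents.max(1);
            return BudgetDecision::Warn(format!("{pct}% of session budget used"));
        }
        BudgetDecision::Ok
    }
}
```

With `budget_max_session_cost_usd` set to `0.50` and the default 80% threshold, this sketch starts warning at 40,000 millicents; in warn mode it keeps warning past the limit instead of stopping.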

| Optimization | Flag | Benefit |
|--------------|------|---------|
| **HTML Diff Mode** | `html_diff_mode` | Reduces tokens by 50 to 70% on repeated visits by returning only changed content. |
| **Content-Aware Profiles** | `content_aware_profiles` | Auto-selects cleaning profiles (ReadingMode, Minimal, Aggressive) based on task. |
| **Loop Detection** | `loop_detection` | Helps break action loops by detecting repeated actions and injecting escalating nudges. |
| **Page Fingerprinting** | `page_fingerprinting` | Generates lightweight page fingerprints for loop detection and action caching. |
| **Failure Classification** | `failure_classification` | Classifies errors into 16 categories using keyword matching with zero LLM cost. |
| **Self-Healing** | `self_healing` | Automatically heals broken selectors using text-matching with zero LLM calls. |
| **Action Caching** | `action_caching` | Caches read-only tool results to avoid redundant tool executions. |
| **Planning Interval** | `planning_interval` | Injects periodic planning checkpoints to keep the agent focused. |
| **Confidence Tracking** | `confidence_tracking` | Tracks LLM self-reported confidence to alert on stagnation. |
| **Compound Enrichment** | `compound_enrichment` | Enriches complex form controls in the page map with metadata. |
| **Budget Limit** | `budget_max_session_cost_usd` | Sets a hard session cost limit in USD to prevent runaway costs. |
| **Budget Enforcement** | `budget_enforcement` | Controls whether to warn or block when the session budget is reached. |
| **Budget Warning** | `budget_warn_threshold_pct` | Triggers warnings when a percentage of the budget is consumed. |
| **Per-Agent Cost Tracking** | `per_agent_cost_tracking` | Breaks down costs per child agent in the `/cost` command. |
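
As an illustration of how action caching and page fingerprinting fit together, here is a minimal in-memory cache keyed by tool, input, and fingerprint with TTL expiry. `ActionCacheSketch` is a hypothetical type, not the real `ActionCache`; the clock is passed in as a parameter purely to keep the sketch deterministic.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

struct Entry {
    output: String,
    stored_at: Instant,
}

/// Hypothetical cache keyed by (tool, input, page fingerprint).
pub struct ActionCacheSketch {
    ttl: Duration,
    entries: HashMap<(String, String, u64), Entry>,
}

impl ActionCacheSketch {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Store the output of a successful read-only tool call.
    pub fn store(&mut self, tool: &str, input: &str, fingerprint: u64, output: String, now: Instant) {
        let key = (tool.to_string(), input.to_string(), fingerprint);
        self.entries.insert(key, Entry { output, stored_at: now });
    }

    /// Return the cached output if the page fingerprint matches and the
    /// entry is younger than the TTL.
    pub fn lookup(&self, tool: &str, input: &str, fingerprint: u64, now: Instant) -> Option<&str> {
        let key = (tool.to_string(), input.to_string(), fingerprint);
        let entry = self.entries.get(&key)?;
        // A changed page yields a different key, so stale pages never hit.
        if now.saturating_duration_since(entry.stored_at) < self.ttl {
            Some(entry.output.as_str())
        } else {
            None
        }
    }
}
```

Putting the fingerprint in the key means a page change needs no explicit invalidation step, at the cost of leaving old entries in memory until something evicts them.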

## Known Limitations

acrawl works well on most public web content, but some situations are outside what the agent can reliably handle:
@@ -780,7 +840,7 @@ crates/
commands/ 17 slash commands with resume-safety annotations
```

11 crates, ~38K lines of Rust, 770 tests.
11 crates, ~40K lines of Rust, 1,097 tests.

## Development

1 change: 1 addition & 0 deletions crates/agent/Cargo.toml
@@ -14,6 +14,7 @@ regex = "1"
runtime = { path = "../runtime" }
script = { path = "../script" }
serde_json = "1"
sha2 = "0.10"
time = { version = "0.3", features = ["formatting"] }
tokio = { version = "1", features = ["sync", "time", "fs"] }
tokio-util = { version = "0.7", default-features = false }