PredicateSystems
Files changed:
  .dockerignore  (8 additions, 0 deletions)
  .gitignore     (6 additions, 0 deletions)
  Dockerfile     (40 additions, 0 deletions)
  README.md      (165 additions, 109 deletions)

diff --git a/.dockerignore b/.dockerignore
@@ -0,0 +1,8 @@
+node_modules
+dist
+test-output
+*.log
+.git
+.github
+.env
+.env.*
diff --git a/.gitignore b/.gitignore
@@ -27,3 +27,9 @@ coverage/
 
 # npm pack output
 *.tgz
+
+# Temp extension copy for Docker build
+extension/
+
+# Test output
+test-output/
diff --git a/Dockerfile b/Dockerfile
@@ -0,0 +1,40 @@
+# Dockerfile for testing predicate-snapshot skill with OpenClaw
+FROM mcr.microsoft.com/playwright:v1.58.2-noble
+
+WORKDIR /app
+
+# Install Node.js 20
+RUN curl -fsSL https://deb.nodesource.com/setup_20.x | bash - && \
+    apt-get install -y nodejs
+
+# Install OpenClaw globally
+RUN npm install -g openclaw
+
+# Create OpenClaw skills directory
+RUN mkdir -p /root/.openclaw/skills/predicate-snapshot
+
+# Copy package files
+COPY package*.json ./
+
+# Install dependencies
+RUN npm ci
+
+# Copy source code
+COPY . .
+
+# Build TypeScript
+RUN npm run build
+
+# Copy built skill to OpenClaw skills directory
+RUN cp -r dist /root/.openclaw/skills/predicate-snapshot/ && \
+    cp package.json /root/.openclaw/skills/predicate-snapshot/ && \
+    cp SKILL.md /root/.openclaw/skills/predicate-snapshot/ && \
+    cp README.md /root/.openclaw/skills/predicate-snapshot/ && \
+    cd /root/.openclaw/skills/predicate-snapshot && npm install --omit=dev
+
+# Copy test scripts
+COPY test-skill.ts ./
+COPY test-openclaw-integration.sh ./
+
+# Default command - run the login demo
+CMD ["npm", "run", "demo:login"]
diff --git a/README.md b/README.md
@@ -143,6 +143,63 @@ You might wonder: "Isn't 50 elements vs 24,567 elements comparing apples to oran
 
 **Future:** OpenClaw may add configuration to set Predicate as the default snapshot provider.
 
+---
+
+## ⚡ Usage with Autonomous Agents
+
+> **Important:** OpenClaw agents work autonomously; they don't wait for you to type slash commands. The options below show how to get an agent to use Predicate snapshots on its own.
+
+### Option 1: Include in Task Instructions (Recommended)
+
+Add Predicate snapshot instructions directly in your task prompt:
+
+```
+Navigate to amazon.com and find the cheapest laptop under $500.
+
+IMPORTANT: For page observation, use /predicate-snapshot instead of the
+default accessibility tree. Use /predicate-act to interact with elements
+by their ID from the snapshot.
+```
+
+### Option 2: Modify Agent System Prompt
+
+For consistent usage across all tasks, add to your agent's system prompt:
+
+```
+## Browser Observation
+When observing web pages, always use /predicate-snapshot instead of the
+default accessibility tree. This provides ML-ranked elements optimized
+for efficient decision-making (~500 tokens vs ~18,000 tokens).
+
+To interact with page elements:
+1. Call /predicate-snapshot to get ranked elements with IDs
+2. Call /predicate-act <action> <element_id> to perform actions
+```
+
+### Option 3: OpenClaw Config (Future)
+
+OpenClaw may add support for setting the default snapshot provider:
+
+```yaml
+# ~/.openclaw/config.yaml (proposed future feature)
+browser:
+  snapshot_provider: predicate-snapshot
+```
+
+### Why This Matters
+
+Without explicit instructions, the agent will use OpenClaw's default accessibility tree, which:
+- Sends ~18,000 tokens per page observation
+- Includes thousands of irrelevant elements
+- Costs more per LLM call, because every observation is billed as input tokens
+
+By instructing the agent to use `/predicate-snapshot`, you get:
+- ~500 tokens per observation (97% reduction)
+- Only the 50 most relevant elements
+- Lower LLM cost per observation and less noise for the model to sift through
+
+---
+
 ## Usage
 
 ### Capture Snapshot
@@ -203,7 +260,78 @@ Each ML-powered snapshot consumes 1 credit. Local snapshots are free.
 
 ## Development
 
-### Run Demo
+### Run in Docker (Recommended for Safe Testing)
+
+Docker provides an **isolated environment** for testing browser automation, keeping the test browser away from your local machine, browser profiles, and credentials.
+
+```bash
+cd predicate-snapshot-skill
+
+# Run the skill MCP tools test (no API keys required)
+./docker-test.sh skill
+
+# Run the login demo (requires an LLM API key)
+./docker-test.sh demo:login
+```
+
+**Test options:**
+
+| Command | What it tests | API Keys Required? |
+|---------|---------------|-------------------|
+| `./docker-test.sh` | Skill MCP tools & browser integration | No |
+| `./docker-test.sh skill` | Same as above (explicit) | No |
+| `./docker-test.sh openclaw` | OpenClaw full runtime integration | No |
+| `./docker-test.sh demo:login` | Full 6-step login workflow | Yes (LLM) |
+| `./docker-test.sh demo` | Basic token comparison | Yes (LLM) |
+
+**Passing API Keys:**
+
+The `demo:login` and `demo` tests require at least one LLM API key (OpenAI or Anthropic) for element selection:
+
+```bash
+# Option 1: Export environment variables
+export OPENAI_API_KEY="sk-..."          # OpenAI API key
+export ANTHROPIC_API_KEY="sk-ant-..."   # OR Anthropic API key
+export PREDICATE_API_KEY="sk-..."       # Optional: for ML-ranked snapshots
+
+./docker-test.sh demo:login
+
+# Option 2: Inline (single command)
+OPENAI_API_KEY="sk-..." PREDICATE_API_KEY="sk-..." ./docker-test.sh demo:login
+
+# Option 3: Using docker-compose with .env file
+# Create a .env file with your keys:
+echo "OPENAI_API_KEY=sk-..." >> .env
+echo "PREDICATE_API_KEY=sk-..." >> .env
+docker-compose up demo-login
+```
+
+| API Key | Required? | Purpose |
+|---------|-----------|---------|
+| `OPENAI_API_KEY` | One of these required | LLM for element selection |
+| `ANTHROPIC_API_KEY` | One of these required | LLM for element selection |
+| `PREDICATE_API_KEY` | Optional | ML-ranked snapshots (reduces noise & tokens) |
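+
+The demo's own key-handling code is not part of this diff. As a hedged illustration of the rule in the table above (one LLM key required, `PREDICATE_API_KEY` optional), here is a hypothetical TypeScript helper. The name `resolveDemoKeys` and the OpenAI-first precedence when both LLM keys are set are assumptions, not the demo's confirmed behavior.
+
+```typescript
+// Hypothetical sketch, not the demo's actual code.
+// Assumption: OpenAI wins when both LLM keys are set.
+type LlmProvider = "openai" | "anthropic";
+
+interface DemoKeys {
+  llmProvider: LlmProvider;
+  llmApiKey: string;
+  predicateApiKey?: string; // optional: enables ML-ranked snapshots
+}
+
+function resolveDemoKeys(env: Record<string, string | undefined>): DemoKeys {
+  // Treat empty or whitespace-only values as unset.
+  const openai = env.OPENAI_API_KEY?.trim();
+  const anthropic = env.ANTHROPIC_API_KEY?.trim();
+  const predicateApiKey = env.PREDICATE_API_KEY?.trim() || undefined;
+
+  if (openai) return { llmProvider: "openai", llmApiKey: openai, predicateApiKey };
+  if (anthropic) return { llmProvider: "anthropic", llmApiKey: anthropic, predicateApiKey };
+  throw new Error("Set OPENAI_API_KEY or ANTHROPIC_API_KEY to run demo:login or demo");
+}
+```
+
+In a Node script this would be called as `resolveDemoKeys(process.env)`.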
+
+**Why Docker is safer:**
+
+| Concern | Docker Isolation |
+|---------|------------------|
+| Browser profile | Fresh Chromium instance, no cookies or history |
+| Network traffic | Own network namespace; outbound requests still leave through the host's network |
+| File system | Only `./test-output/` is mounted |
+| Credentials | None stored; the test site uses fake credentials |
+
+**Using docker-compose:**
+
+```bash
+docker-compose up skill-test      # Skill MCP tools test
+docker-compose up openclaw-test   # OpenClaw full runtime test
+docker-compose up demo-login      # Login demo
+```
+
+The login demo uses a purpose-built test site (`https://www.localllamaland.com/login`) with fake credentials (`testuser` / `password123`), so no real accounts are involved.
+
+### Run Demo (Local)
 
 Compare token usage between accessibility tree and Predicate snapshot:
 
@@ -315,134 +443,62 @@ This demo compares A11y Tree vs Predicate Snapshot across **all 6 steps**, measu
 
 #### Key Observations
 
-| Metric | A11y Tree | Predicate Snapshot | Delta |
+| Metric | OpenClaw A11y Tree Snapshot | Predicate Snapshot | Delta |
 |--------|-----------|-------------------|-------|
-| **Steps Completed** | 3/6 (failed at step 4) | **6/6** | Predicate wins |
-| **Token Savings** | baseline | **70-74% per step** | Significant |
-| **SPA Hydration** | No built-in wait | **`check().eventually()` handles it** | More reliable |
-
-**Why A11y Tree Failed at Step 4:**
-
-The A11y (accessibility tree) approach failed to click the login button because:
-
-1. **Element ID mismatch**: The A11y tree assigns sequential IDs based on DOM traversal order, which can change between snapshots as the SPA re-renders. The LLM selected element 47 ("Sign in"), but that ID no longer pointed to the button after form state changed.
+| **Steps Completed** | 6/6 | **6/6** | Both pass |
+| **Total Tokens** | 5,366 | **1,565** | **-71%** |
+| **Token Savings** | baseline | **67-74% per step** | Significant |
+| **Total Latency** | 51,675 ms | 75,555 ms | +46% |
 
-2. **No stable identifiers**: Unlike Predicate's `data-predicate-id` attributes (injected by the browser extension), A11y IDs are ephemeral and not anchored to the actual DOM elements.
+**Why Predicate Snapshot is better:**
 
-3. **SPA state changes**: After filling both form fields, the button transitioned from disabled → enabled. This state change can cause the A11y tree to re-order elements, invalidating the LLM's element selection.
-
-**Predicate Snapshot succeeded because:**
-- `data-predicate-id` attributes are stable across re-renders
-- ML-ranking surfaces the most relevant elements (button with "Sign in" text)
-- `runtime.check().eventually()` properly waits for SPA hydration
+1. **Dramatic token reduction**: 71% fewer tokens across the entire workflow (5,366 → 1,565 tokens)
+2. **ML-ranked elements**: Only the most relevant interactable elements are included, each with enough context to identify it, which reduces noise
+3. **Stable identifiers**: `data-predicate-id` attributes survive SPA re-renders
+4. **`runtime.check().eventually()`**: Properly waits for SPA hydration before capturing snapshots
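+
+The list above relies on `runtime.check().eventually()`, whose implementation is not shown in this diff. The general technique is a poll-until-true wait with a deadline. The sketch below illustrates it in TypeScript; `waitUntil` and `WaitOptions` are hypothetical names, and this is not the Predicate runtime's code.
+
+```typescript
+// Illustrative "eventually"-style wait (hypothetical, not the Predicate SDK).
+interface WaitOptions {
+  timeoutMs: number;  // give up after this long
+  intervalMs: number; // pause between polls
+}
+
+async function waitUntil(
+  condition: () => boolean | Promise<boolean>,
+  { timeoutMs, intervalMs }: WaitOptions,
+): Promise<boolean> {
+  const deadline = Date.now() + timeoutMs;
+  for (;;) {
+    if (await condition()) return true;       // condition holds, e.g. form hydrated
+    if (Date.now() >= deadline) return false; // timed out
+    await new Promise((resolve) => setTimeout(resolve, intervalMs));
+  }
+}
+```
+
+With Playwright, the condition could be `() => signInButton.isEnabled()` for a `Locator` pointing at the sign-in button.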
 
 #### Raw Demo Logs
 
+Full Docker demo output: [pastebin.com/ksETcQ4C](https://pastebin.com/ksETcQ4C)
+
 <details>
-<summary>Click to expand full demo output</summary>
+<summary>Click to expand results summary</summary>
 
 ```
-======================================================================
- LOGIN + PROFILE CHECK: A11y Tree vs. Predicate Snapshot
-======================================================================
-Using OpenAI provider
-Model: gpt-4o-mini
-Running in headed mode (visible browser window)
-Overlay enabled: elements will be highlighted with green borders
-Predicate snapshots: REAL (ML-ranked)
-======================================================================
-
-======================================================================
- Running with A11Y approach
-======================================================================
-
-[2026-02-25 01:14:50] Step 1: Wait for login form hydration
-  Waiting for form to hydrate using runtime.check().eventually()...
-  Button initially disabled: false
-  PASS (11822ms) | Found 19 elements
-
-[2026-02-25 01:15:02] Step 2: Fill username field
-  Snapshot: 45 elements, 1241 tokens
-  LLM chose element 37: "Username"
-  PASS (6771ms) | Typed "testuser"
-  Tokens: prompt=1241 total=1251
-
-[2026-02-25 01:15:08] Step 3: Fill password field
-  LLM chose element 42: "Password"
-  Waiting for login button to become enabled...
-  PASS (12465ms) | Button enabled: true
-  Tokens: prompt=1295 total=1305
-
-[2026-02-25 01:15:21] Step 4: Click login button
-  LLM chose element 47: "Sign in"
-  FAIL (7801ms) | Navigated to https://www.localllamaland.com/login
-  Tokens: prompt=1367 total=1377
-
-======================================================================
- Running with PREDICATE approach
-======================================================================
-
-[2026-02-25 01:15:29] Step 1: Wait for login form hydration
-  Waiting for form to hydrate using runtime.check().eventually()...
-  Button initially disabled: false
-  PASS (10586ms) | Found 19 elements
-
-[2026-02-25 01:15:40] Step 2: Fill username field
-  Snapshot: 19 elements, 351 tokens
-  LLM chose element 23: "username"
-  PASS (12877ms) | Typed "testuser"
-  Tokens: prompt=351 total=361
-
-[2026-02-25 01:15:53] Step 3: Fill password field
-  LLM chose element 25: "Password"
-  Waiting for login button to become enabled...
-  PASS (17886ms) | Button enabled: true
-  Tokens: prompt=352 total=362
-
-[2026-02-25 01:16:10] Step 4: Click login button
-  LLM chose element 29: "Sign in"
-  PASS (12690ms) | Navigated to https://www.localllamaland.com/profile
-  Tokens: prompt=346 total=356
-
-[2026-02-25 01:16:23] Step 5: Navigate to profile page
-  PASS (1ms) | Already on profile page
-
-[2026-02-25 01:16:23] Step 6: Extract username from profile
-  Waiting for profile card to load...
-  Found username: testuser@localllama.land
-  Found email: Profile testuser testuser@localllama.lan
-  PASS (20760ms) | username=testuser@localllama.land
-  Tokens: prompt=480 total=480
-
 ======================================================================
  RESULTS SUMMARY
 ======================================================================
 
 +-----------------------------------------------------------------------+
 | Metric              | A11y Tree        | Predicate        | Delta     |
 +-----------------------------------------------------------------------+
-| Total Tokens        |             3933 |             1559 | -60%      |
-| Total Latency (ms)  |            38859 |            74800 | +92%      |
-| Steps Passed        |              3/6 |              6/6 |           |
+| Total Tokens        |             5366 |             1565 | -71%      |
+| Total Latency (ms)  |            51675 |            75555 | +46%      |
+| Steps Passed        |              6/6 |              6/6 |           |
 +-----------------------------------------------------------------------+
 
-Key Insight: Predicate snapshots use 60% fewer tokens
+Key Insight: Predicate snapshots use 71% fewer tokens
 for a multi-step login workflow with form filling.
 
 Step-by-step breakdown:
 ----------------------------------------------------------------------
 Step 1: Wait for login form hydration
-  A11y: 0 tokens, 11822ms, PASS
-  Pred: 0 tokens, 10586ms, PASS (0% savings)
+  A11y: 0 tokens, 12060ms, PASS
+  Pred: 0 tokens, 10792ms, PASS (0% savings)
 Step 2: Fill username field
-  A11y: 1251 tokens, 6771ms, PASS
-  Pred: 361 tokens, 12877ms, PASS (71% savings)
+  A11y: 1251 tokens, 7613ms, PASS
+  Pred: 361 tokens, 12324ms, PASS (71% savings)
 Step 3: Fill password field
-  A11y: 1305 tokens, 12465ms, PASS
-  Pred: 362 tokens, 17886ms, PASS (72% savings)
+  A11y: 1305 tokens, 13691ms, PASS
+  Pred: 362 tokens, 18410ms, PASS (72% savings)
 Step 4: Click login button
-  A11y: 1377 tokens, 7801ms, FAIL
-  Pred: 356 tokens, 12690ms, PASS (74% savings)
+  A11y: 1377 tokens, 7909ms, PASS
+  Pred: 362 tokens, 13233ms, PASS (74% savings)
+Step 5: Navigate to profile page
+  A11y: 0 tokens, 1ms, PASS
+  Pred: 0 tokens, 0ms, PASS (0% savings)
+Step 6: Extract username from profile
+  A11y: 1433 tokens, 10401ms, PASS
+  Pred: 480 tokens, 20796ms, PASS (67% savings)
 ```
 
 </details>
@@ -451,15 +507,15 @@ Step 4: Click login button
 
 | Step | A11y Tree | Predicate Snapshot | Token Savings |
 |------|-----------|-------------------|---------------|
-| Step 1: Navigate to localllamaland.com/login | PASS | PASS | - |
+| Step 1: Wait for login form hydration | PASS | PASS | - |
 | Step 2: Fill username | 1,251 tokens, PASS | 361 tokens, PASS | **71%** |
 | Step 3: Fill password | 1,305 tokens, PASS | 362 tokens, PASS | **72%** |
-| Step 4: Click login | 1,377 tokens, **FAIL** | 356 tokens, PASS | **74%** |
-| Step 5: Navigate to profile | (not reached) | PASS | - |
-| Step 6: Extract username | (not reached) | 480 tokens, PASS | - |
-| **Total** | **3,933 tokens, 3/6 steps** | **1,559 tokens, 6/6 steps** | **60%** |
+| Step 4: Click login | 1,377 tokens, PASS | 362 tokens, PASS | **74%** |
+| Step 5: Navigate to profile | PASS | PASS | - |
+| Step 6: Extract username | 1,433 tokens, PASS | 480 tokens, PASS | **67%** |
+| **Total** | **5,366 tokens, 6/6 steps** | **1,565 tokens, 6/6 steps** | **71%** |
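+
+The per-step percentages follow from savings = 1 - (Predicate tokens / A11y tokens). A short TypeScript sketch that recomputes them from the counts in the table above (the `StepTokens` shape is illustrative only; steps 1 and 5 use 0 tokens and are left out):
+
+```typescript
+// Token counts copied from the step table; StepTokens is an illustrative type.
+interface StepTokens {
+  step: string;
+  a11y: number;
+  predicate: number;
+}
+
+const steps: StepTokens[] = [
+  { step: "Step 2: Fill username", a11y: 1251, predicate: 361 },
+  { step: "Step 3: Fill password", a11y: 1305, predicate: 362 },
+  { step: "Step 4: Click login", a11y: 1377, predicate: 362 },
+  { step: "Step 6: Extract username", a11y: 1433, predicate: 480 },
+];
+
+// Savings = 1 - predicate / a11y, rounded to a whole percent.
+const savingsPct = (a11y: number, predicate: number): number =>
+  Math.round((1 - predicate / a11y) * 100);
+
+for (const s of steps) {
+  console.log(`${s.step}: ${savingsPct(s.a11y, s.predicate)}%`);
+}
+
+const totalA11y = steps.reduce((sum, s) => sum + s.a11y, 0);      // 5366
+const totalPred = steps.reduce((sum, s) => sum + s.predicate, 0); // 1565
+console.log(`Total: ${totalA11y} -> ${totalPred} tokens (${savingsPct(totalA11y, totalPred)}%)`);
+```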
 
-> **Key Insight:** Predicate Snapshot not only reduces tokens by 70%+ per step, but also **improves automation reliability** on SPAs with automatic wait for hydration via `runtime.check().eventually()`. The stable element IDs survive React/Next.js re-renders that break A11y tree-based approaches.
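+> **Key Insight:** Predicate Snapshot reduces tokens by **67-74% per step** while maintaining the same pass rate (6/6). For multi-step workflows, this translates to significant LLM cost savings. End-to-end latency was higher in this run (75,555 ms vs. 51,675 ms, +46%), so the gain is in tokens, not wall-clock time.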
 
 ### Build