Commit b41cb5e

Merge pull request #4 from PredicateSystems/verify_local
Verify skill in docker container
2 parents b34a8f3 + dbb2f8c commit b41cb5e

12 files changed

Lines changed: 1393 additions & 112 deletions

.dockerignore

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+node_modules
+dist
+test-output
+*.log
+.git
+.github
+.env
+.env.*

.gitignore

Lines changed: 6 additions & 0 deletions
@@ -27,3 +27,9 @@ coverage/
 
 # npm pack output
 *.tgz
+
+# Temp extension copy for Docker build
+extension/
+
+# Test output
+test-output/

Dockerfile

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+# Dockerfile for testing predicate-snapshot skill with OpenClaw
+FROM mcr.microsoft.com/playwright:v1.58.2-noble
+
+WORKDIR /app
+
+# Install Node.js 20
+RUN curl -fsSL https://deb.nodesource.com/setup_20.x | bash - && \
+    apt-get install -y nodejs
+
+# Install OpenClaw globally
+RUN npm install -g openclaw
+
+# Create OpenClaw skills directory
+RUN mkdir -p /root/.openclaw/skills/predicate-snapshot
+
+# Copy package files
+COPY package*.json ./
+
+# Install dependencies
+RUN npm ci
+
+# Copy source code
+COPY . .
+
+# Build TypeScript
+RUN npm run build
+
+# Copy built skill to OpenClaw skills directory
+RUN cp -r dist /root/.openclaw/skills/predicate-snapshot/ && \
+    cp package.json /root/.openclaw/skills/predicate-snapshot/ && \
+    cp SKILL.md /root/.openclaw/skills/predicate-snapshot/ && \
+    cp README.md /root/.openclaw/skills/predicate-snapshot/ && \
+    cd /root/.openclaw/skills/predicate-snapshot && npm install --omit=dev
+
+# Copy test scripts
+COPY test-skill.ts ./
+COPY test-openclaw-integration.sh ./
+
+# Default command - run the login demo
+CMD ["npm", "run", "demo:login"]

README.md

Lines changed: 165 additions & 109 deletions
@@ -143,6 +143,63 @@ You might wonder: "Isn't 50 elements vs 24,567 elements comparing apples to oran
 
 **Future:** OpenClaw may add configuration to set Predicate as the default snapshot provider.
 
+---
+
+## ⚡ Usage with Autonomous Agents
+
+> **Important:** OpenClaw agents work autonomously—they don't wait for manual slash commands. Here's how to integrate Predicate snapshots into autonomous workflows.
+
+### Option 1: Include in Task Instructions (Recommended)
+
+Add Predicate snapshot instructions directly in your task prompt:
+
+```
+Navigate to amazon.com and find the cheapest laptop under $500.
+
+IMPORTANT: For page observation, use /predicate-snapshot instead of the
+default accessibility tree. Use /predicate-act to interact with elements
+by their ID from the snapshot.
+```
+
+### Option 2: Modify Agent System Prompt
+
+For consistent usage across all tasks, add to your agent's system prompt:
+
+```
+## Browser Observation
+When observing web pages, always use /predicate-snapshot instead of the
+default accessibility tree. This provides ML-ranked elements optimized
+for efficient decision-making (~500 tokens vs ~18,000 tokens).
+
+To interact with page elements:
+1. Call /predicate-snapshot to get ranked elements with IDs
+2. Call /predicate-act <action> <element_id> to perform actions
+```
+
+### Option 3: OpenClaw Config (Future)
+
+OpenClaw may add support for setting the default snapshot provider:
+
+```yaml
+# ~/.openclaw/config.yaml (proposed future feature)
+browser:
+  snapshot_provider: predicate-snapshot
+```
+
+### Why This Matters
+
+Without explicit instructions, the agent will use OpenClaw's default accessibility tree, which:
+- Sends ~18,000 tokens per page observation
+- Includes thousands of irrelevant elements
+- Costs more and runs slower
+
+By instructing the agent to use `/predicate-snapshot`, you get:
+- ~500 tokens per observation (97% reduction)
+- Only the 50 most relevant elements
+- Faster, cheaper, more accurate automation
+
+---
+
 ## Usage
 
 ### Capture Snapshot
@@ -203,7 +260,78 @@ Each ML-powered snapshot consumes 1 credit. Local snapshots are free.
 
 ## Development
 
-### Run Demo
+### Run in Docker (Recommended for Safe Testing)
+
+Docker provides an **isolated environment** for testing browser automation—no risk to your local machine, browser profiles, or credentials.
+
+```bash
+cd predicate-snapshot-skill
+
+# Run the skill MCP tools test (no API keys required)
+./docker-test.sh skill
+
+# Run the login demo (requires LLM API key)
+./docker-test.sh demo:login
+```
+
+**Test options:**
+
+| Command | What it tests | API Keys Required? |
+|---------|---------------|-------------------|
+| `./docker-test.sh` | Skill MCP tools & browser integration | No |
+| `./docker-test.sh skill` | Same as above (explicit) | No |
+| `./docker-test.sh openclaw` | OpenClaw full runtime integration | No |
+| `./docker-test.sh demo:login` | Full 6-step login workflow | Yes (LLM) |
+| `./docker-test.sh demo` | Basic token comparison | Yes (LLM) |
+
+**Passing API Keys:**
+
+The `demo:login` and `demo` tests require at least one LLM API key (OpenAI or Anthropic) for element selection:
+
+```bash
+# Option 1: Export environment variables
+export OPENAI_API_KEY="sk-..." # OpenAI API key
+export ANTHROPIC_API_KEY="sk-ant-..." # OR Anthropic API key
+export PREDICATE_API_KEY="sk-..." # Optional: for ML-ranked snapshots
+
+./docker-test.sh demo:login
+
+# Option 2: Inline (single command)
+OPENAI_API_KEY="sk-..." PREDICATE_API_KEY="sk-..." ./docker-test.sh demo:login
+
+# Option 3: Using docker-compose with .env file
+# Create a .env file with your keys:
+echo "OPENAI_API_KEY=sk-..." >> .env
+echo "PREDICATE_API_KEY=sk-..." >> .env
+docker-compose up demo-login
+```
+
+| API Key | Required? | Purpose |
+|---------|-----------|---------|
+| `OPENAI_API_KEY` | One of these required | LLM for element selection |
+| `ANTHROPIC_API_KEY` | One of these required | LLM for element selection |
+| `PREDICATE_API_KEY` | Optional | ML-ranked snapshots (reduces noise & tokens) |
+
+**Why Docker is safer:**
+
+| Concern | Docker Isolation |
+|---------|------------------|
+| Browser profile | Fresh Chromium instance, no cookies or history |
+| Network traffic | Contained, won't trigger corporate firewalls |
+| File system | Only `./test-output/` is mounted |
+| Credentials | None stored—test site uses fake credentials |
+
+**Using docker-compose:**
+
+```bash
+docker-compose up skill-test # Skill MCP tools test
+docker-compose up openclaw-test # OpenClaw full runtime test
+docker-compose up demo-login # Login demo
+```
+
+The test uses a purpose-built test site (`https://www.localllamaland.com/login`) with fake credentials (`testuser` / `password123`)—no real accounts involved.
+
+### Run Demo (Local)
 
 Compare token usage between accessibility tree and Predicate snapshot:
 
@@ -315,134 +443,62 @@ This demo compares A11y Tree vs Predicate Snapshot across **all 6 steps**, measu
 
 #### Key Observations
 
-| Metric | A11y Tree | Predicate Snapshot | Delta |
+| Metric | OpenClaw A11y Tree Snapshot | Predicate Snapshot | Delta |
 |--------|-----------|-------------------|-------|
-| **Steps Completed** | 3/6 (failed at step 4) | **6/6** | Predicate wins |
-| **Token Savings** | baseline | **70-74% per step** | Significant |
-| **SPA Hydration** | No built-in wait | **`check().eventually()` handles it** | More reliable |
-
-**Why A11y Tree Failed at Step 4:**
-
-The A11y (accessibility tree) approach failed to click the login button because:
-
-1. **Element ID mismatch**: The A11y tree assigns sequential IDs based on DOM traversal order, which can change between snapshots as the SPA re-renders. The LLM selected element 47 ("Sign in"), but that ID no longer pointed to the button after form state changed.
+| **Steps Completed** | 6/6 | **6/6** | Both pass |
+| **Total Tokens** | 5,366 | **1,565** | **-71%** |
+| **Token Savings** | baseline | **67-74% per step** | Significant |
 
-2. **No stable identifiers**: Unlike Predicate's `data-predicate-id` attributes (injected by the browser extension), A11y IDs are ephemeral and not anchored to the actual DOM elements.
+**Why Predicate Snapshot is better:**
 
-3. **SPA state changes**: After filling both form fields, the button transitioned from disabled → enabled. This state change can cause the A11y tree to re-order elements, invalidating the LLM's element selection.
-
-**Predicate Snapshot succeeded because:**
-- `data-predicate-id` attributes are stable across re-renders
-- ML-ranking surfaces the most relevant elements (button with "Sign in" text)
-- `runtime.check().eventually()` properly waits for SPA hydration
+1. **Dramatic token reduction**: 71% fewer tokens across the entire workflow (5,366 → 1,565 tokens)
+2. **ML-ranked elements**: Only the most relevant interactable elements are included with enough context, reducing noise
+3. **Stable identifiers**: `data-predicate-id` attributes survive SPA re-renders
+4. **`runtime.check().eventually()`**: Properly waits for SPA hydration before capturing snapshots
 
 #### Raw Demo Logs
 
+Full Docker demo output: [pastebin.com/ksETcQ4C](https://pastebin.com/ksETcQ4C)
+
 <details>
-<summary>Click to expand full demo output</summary>
+<summary>Click to expand results summary</summary>
 
 ```
-======================================================================
-LOGIN + PROFILE CHECK: A11y Tree vs. Predicate Snapshot
-======================================================================
-Using OpenAI provider
-Model: gpt-4o-mini
-Running in headed mode (visible browser window)
-Overlay enabled: elements will be highlighted with green borders
-Predicate snapshots: REAL (ML-ranked)
-======================================================================
-
-======================================================================
-Running with A11Y approach
-======================================================================
-
-[2026-02-25 01:14:50] Step 1: Wait for login form hydration
-Waiting for form to hydrate using runtime.check().eventually()...
-Button initially disabled: false
-PASS (11822ms) | Found 19 elements
-
-[2026-02-25 01:15:02] Step 2: Fill username field
-Snapshot: 45 elements, 1241 tokens
-LLM chose element 37: "Username"
-PASS (6771ms) | Typed "testuser"
-Tokens: prompt=1241 total=1251
-
-[2026-02-25 01:15:08] Step 3: Fill password field
-LLM chose element 42: "Password"
-Waiting for login button to become enabled...
-PASS (12465ms) | Button enabled: true
-Tokens: prompt=1295 total=1305
-
-[2026-02-25 01:15:21] Step 4: Click login button
-LLM chose element 47: "Sign in"
-FAIL (7801ms) | Navigated to https://www.localllamaland.com/login
-Tokens: prompt=1367 total=1377
-
-======================================================================
-Running with PREDICATE approach
-======================================================================
-
-[2026-02-25 01:15:29] Step 1: Wait for login form hydration
-Waiting for form to hydrate using runtime.check().eventually()...
-Button initially disabled: false
-PASS (10586ms) | Found 19 elements
-
-[2026-02-25 01:15:40] Step 2: Fill username field
-Snapshot: 19 elements, 351 tokens
-LLM chose element 23: "username"
-PASS (12877ms) | Typed "testuser"
-Tokens: prompt=351 total=361
-
-[2026-02-25 01:15:53] Step 3: Fill password field
-LLM chose element 25: "Password"
-Waiting for login button to become enabled...
-PASS (17886ms) | Button enabled: true
-Tokens: prompt=352 total=362
-
-[2026-02-25 01:16:10] Step 4: Click login button
-LLM chose element 29: "Sign in"
-PASS (12690ms) | Navigated to https://www.localllamaland.com/profile
-Tokens: prompt=346 total=356
-
-[2026-02-25 01:16:23] Step 5: Navigate to profile page
-PASS (1ms) | Already on profile page
-
-[2026-02-25 01:16:23] Step 6: Extract username from profile
-Waiting for profile card to load...
-Found username: testuser@localllama.land
-Found email: Profile testuser testuser@localllama.lan
-PASS (20760ms) | username=testuser@localllama.land
-Tokens: prompt=480 total=480
-
 ======================================================================
 RESULTS SUMMARY
 ======================================================================
 
 +-----------------------------------------------------------------------+
 | Metric | A11y Tree | Predicate | Delta |
 +-----------------------------------------------------------------------+
-| Total Tokens | 3933 | 1559 | -60% |
-| Total Latency (ms) | 38859 | 74800 | +92% |
-| Steps Passed | 3/6 | 6/6 | |
+| Total Tokens | 5366 | 1565 | -71% |
+| Total Latency (ms) | 51675 | 75555 | +46% |
+| Steps Passed | 6/6 | 6/6 | |
 +-----------------------------------------------------------------------+
 
-Key Insight: Predicate snapshots use 60% fewer tokens
+Key Insight: Predicate snapshots use 71% fewer tokens
 for a multi-step login workflow with form filling.
 
 Step-by-step breakdown:
 ----------------------------------------------------------------------
 Step 1: Wait for login form hydration
-A11y: 0 tokens, 11822ms, PASS
-Pred: 0 tokens, 10586ms, PASS (0% savings)
+A11y: 0 tokens, 12060ms, PASS
+Pred: 0 tokens, 10792ms, PASS (0% savings)
 Step 2: Fill username field
-A11y: 1251 tokens, 6771ms, PASS
-Pred: 361 tokens, 12877ms, PASS (71% savings)
+A11y: 1251 tokens, 7613ms, PASS
+Pred: 361 tokens, 12324ms, PASS (71% savings)
 Step 3: Fill password field
-A11y: 1305 tokens, 12465ms, PASS
-Pred: 362 tokens, 17886ms, PASS (72% savings)
+A11y: 1305 tokens, 13691ms, PASS
+Pred: 362 tokens, 18410ms, PASS (72% savings)
 Step 4: Click login button
-A11y: 1377 tokens, 7801ms, FAIL
-Pred: 356 tokens, 12690ms, PASS (74% savings)
+A11y: 1377 tokens, 7909ms, PASS
+Pred: 362 tokens, 13233ms, PASS (74% savings)
+Step 5: Navigate to profile page
+A11y: 0 tokens, 1ms, PASS
+Pred: 0 tokens, 0ms, PASS (0% savings)
+Step 6: Extract username from profile
+A11y: 1433 tokens, 10401ms, PASS
+Pred: 480 tokens, 20796ms, PASS (67% savings)
 ```
 
 </details>
@@ -451,15 +507,15 @@ Step 4: Click login button
 
 | Step | A11y Tree | Predicate Snapshot | Token Savings |
 |------|-----------|-------------------|---------------|
-| Step 1: Navigate to localllamaland.com/login | PASS | PASS | - |
+| Step 1: Wait for login form hydration | PASS | PASS | - |
 | Step 2: Fill username | 1,251 tokens, PASS | 361 tokens, PASS | **71%** |
 | Step 3: Fill password | 1,305 tokens, PASS | 362 tokens, PASS | **72%** |
-| Step 4: Click login | 1,377 tokens, **FAIL** | 356 tokens, PASS | **74%** |
-| Step 5: Navigate to profile | (not reached) | PASS | - |
-| Step 6: Extract username | (not reached) | 480 tokens, PASS | - |
-| **Total** | **3,933 tokens, 3/6 steps** | **1,559 tokens, 6/6 steps** | **60%** |
+| Step 4: Click login | 1,377 tokens, PASS | 362 tokens, PASS | **74%** |
+| Step 5: Navigate to profile | PASS | PASS | - |
+| Step 6: Extract username | 1,433 tokens, PASS | 480 tokens, PASS | **67%** |
+| **Total** | **5,366 tokens, 6/6 steps** | **1,565 tokens, 6/6 steps** | **71%** |
 
-> **Key Insight:** Predicate Snapshot not only reduces tokens by 70%+ per step, but also **improves automation reliability** on SPAs with automatic wait for hydration via `runtime.check().eventually()`. The stable element IDs survive React/Next.js re-renders that break A11y tree-based approaches.
+> **Key Insight:** Predicate Snapshot reduces tokens by **67-74% per step** while maintaining the same pass rate. For multi-step workflows, this translates to significant cost savings and faster LLM inference.
 
 ### Build
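
Review note: the updated README figures can be cross-checked against the per-step token counts in the new results summary. The sketch below is not part of the commit. It recomputes the totals and savings percentages from those counts, assuming the demo rounds to the nearest whole percent (the rounding rule and the `savingsPercent` helper are assumptions for illustration).

```typescript
// Per-step token counts copied from the "Step-by-step breakdown" added in this commit.
// Steps 1 and 5 report 0 tokens on both sides, so they are omitted here.
interface StepTokens {
  step: string;
  a11y: number;
  predicate: number;
}

const steps: StepTokens[] = [
  { step: "Fill username", a11y: 1251, predicate: 361 },
  { step: "Fill password", a11y: 1305, predicate: 362 },
  { step: "Click login", a11y: 1377, predicate: 362 },
  { step: "Extract username", a11y: 1433, predicate: 480 },
];

// Hypothetical helper: percentage saved relative to the baseline,
// rounded to the nearest whole percent (assumed rounding rule).
function savingsPercent(baseline: number, reduced: number): number {
  if (baseline <= 0) return 0;
  return Math.round((1 - reduced / baseline) * 100);
}

const totalA11y = steps.reduce((sum, s) => sum + s.a11y, 0);
const totalPredicate = steps.reduce((sum, s) => sum + s.predicate, 0);
const perStep = steps.map((s) => savingsPercent(s.a11y, s.predicate));
const totalSavings = savingsPercent(totalA11y, totalPredicate);

// Total latency figures (ms) from the same summary table.
const latencyDeltaPercent = Math.round((75555 / 51675 - 1) * 100);

console.log({ totalA11y, totalPredicate, perStep, totalSavings, latencyDeltaPercent });
```

Under that rounding assumption the recomputed values match the README: totals of 5,366 and 1,565 tokens, per-step savings of 71%, 72%, 74%, and 67%, and 71% overall. The same summary reports total latency 46% higher for the Predicate run, so in this run the token savings did not come with lower wall-clock time.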
