
# cocapn-traps

Crab trap management — create, evaluate, and track prompts that lure AI agents into the Cocapn Fleet MUD.

Version: 1.0.0 | Tests: 10 passing | Lines: ~700 | Deps: zero


## What

The fleet needs agents to explore and produce tiles. Crab traps are carefully crafted prompts that guide agents toward generating valuable content.

This package makes traps:

- **Measurable** — score agent runs on tile count, quality, and format
- **Comparable** — track success rates across traps over time
- **Loadable** — define traps in simple markdown files with frontmatter
- **Runnable** — execute against agent endpoints or evaluate local tile output

## Install

```bash
pip install cocapn-traps
```

## Trap Format

Traps are markdown files with a simple frontmatter header:

```markdown
---
id: scholar-harbor
target: scholar
difficulty: 5
tags: [harbor, exploration]
expected_output: "explored|visited|found"
min_tiles: 3
max_tiles: 8
---

You are a scholar exploring the Harbor room of the Cocapn Fleet MUD.
Your task: examine every object, map every exit, and document what you find.
Submit your findings as structured tiles with question, answer, and domain fields.
```

### Frontmatter Fields

| Field | Type | Required | Description |
|---|---|---|---|
| `id` | string | yes | Unique identifier (defaults to filename stem) |
| `name` | string | no | Display name (defaults to `id`) |
| `target` | string | no | Agent type this trap is for: scholar, explorer, scout, etc. |
| `difficulty` | int | no | 1-10 scale (default: 3) |
| `tags` | list | no | Categories for filtering |
| `expected_output` | string | no | Regex pattern for validating agent output |
| `min_tiles` | int | no | Minimum tiles expected (default: 1) |
| `max_tiles` | int | no | Maximum tiles before the run is considered spam (default: 10) |
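
Because the loader avoids a YAML dependency (see Design Decisions), the frontmatter is plain `key: value` lines with inline lists. A minimal sketch of how such a parser could look, assuming this format (illustrative only; the shipped `loader.py` may differ):

```python
from pathlib import Path

def parse_trap_file(path: str) -> dict:
    """Illustrative stdlib-only frontmatter parser (not the package's actual code)."""
    text = Path(path).read_text(encoding="utf-8")
    # Split off the frontmatter block delimited by '---' markers
    _, header, body = text.split("---", 2)
    fields: dict = {"prompt": body.strip()}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if value.startswith("[") and value.endswith("]"):
            # Inline list, e.g. tags: [harbor, exploration]
            fields[key] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        elif value.isdigit():
            fields[key] = int(value)
        else:
            fields[key] = value.strip('"')
    fields.setdefault("id", Path(path).stem)  # id defaults to the filename stem
    return fields
```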

## CLI

```bash
# List all traps
cocapn-traps list

# Filter by target
cocapn-traps list --target scholar

# Filter by tag
cocapn-traps list --tag harbor

# Filter by difficulty
cocapn-traps list --min-difficulty 5

# Evaluate tiles against a trap
cocapn-traps eval --trap traps/scholar.md --tiles output.jsonl

# Run trap against agent endpoint
cocapn-traps run --trap traps/scholar.md --agent-url http://agent:8080/run

# Show trap statistics
cocapn-traps stats

# Show stats for a specific trap
cocapn-traps stats --trap-id scholar-harbor
```
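
For `cocapn-traps eval`, the `--tiles` file is presumably JSON Lines (the `.jsonl` extension suggests one tile object per line), with the tile fields used throughout this README. The exact file layout here is an assumption:

```jsonl
{"question": "What is the harbor?", "answer": "A coordination hub with many rooms.", "domain": "harbor", "agent": "scholar"}
{"question": "How to navigate?", "answer": "Use the map and follow signs.", "domain": "harbor", "agent": "scholar"}
```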

## Programmatic API

### Create and Register Traps

```python
from cocapn_traps.trap import Trap, TrapRegistry
from cocapn_traps.loader import load_from_directory

# Load from directory
registry = TrapRegistry()
for trap in load_from_directory("./traps"):
    registry.register(trap)

# Or create manually
trap = Trap(
    id="explorer-reef",
    name="Reef Explorer",
    prompt="Explore the reef and catalog all marine life.",
    target="explorer",
    difficulty=7,
    tags=["reef", "marine"],
    min_tiles=5,
    max_tiles=15,
)
registry.register(trap)

# Query registry
print(registry.targets())               # ['explorer', 'scholar', 'scout']
print(registry.tags())                  # ['harbor', 'reef', 'marine']
print(registry.list(target="scholar"))  # Filter by target
print(registry.list(tag="marine"))      # Filter by tag
```

### Evaluate a Run

```python
from cocapn_traps.evaluator import evaluate_trap, update_trap_stats

# Good run: 3 tiles, all fields present
tiles = [
    {"question": "What is the harbor?", "answer": "A coordination hub with many rooms.", "domain": "harbor", "agent": "scholar"},
    {"question": "How to navigate?", "answer": "Use the map and follow signs.", "domain": "harbor", "agent": "scholar"},
    {"question": "Who manages it?", "answer": "CCC, the fleet I&O officer.", "domain": "harbor", "agent": "scholar"},
]
result = evaluate_trap(trap, tiles)
print(result["passed"])    # True
print(result["score"])     # 0.85
print(result["feedback"])  # "Good run"

# Update trap statistics
update_trap_stats(trap, result)
print(trap.stats)  # {'runs': 1, 'successes': 1, 'avg_score': 0.85, 'total_tiles': 3}
```
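
`update_trap_stats` keeps running averages across runs. A minimal sketch of the incremental update, assuming `stats` is the plain dict shown above (an assumption about the bookkeeping, not the package's verbatim implementation):

```python
def update_stats(stats: dict, score: float, passed: bool, n_tiles: int) -> None:
    """Fold one run's result into a trap's running statistics (illustrative sketch)."""
    runs = stats.get("runs", 0)
    avg = stats.get("avg_score", 0.0)
    stats["runs"] = runs + 1
    stats["successes"] = stats.get("successes", 0) + int(passed)
    # Incremental mean: new_avg = old_avg + (x - old_avg) / new_n
    stats["avg_score"] = avg + (score - avg) / (runs + 1)
    stats["total_tiles"] = stats.get("total_tiles", 0) + n_tiles
```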

### Run Against Agent

```python
from cocapn_traps.runner import run_trap

# Local tiles
result = run_trap(trap, local_tiles=tiles)

# Remote agent
result = run_trap(trap, agent_url="http://agent:8080/run")
```

## Scoring System

Each trap run is scored on four dimensions:

| Dimension | Weight | What |
|---|---|---|
| Tile count | 30% | Within `min_tiles` and `max_tiles` bounds |
| Tile quality | 40% | Average of per-tile completeness (question, answer, domain, agent) |
| Format correct | 20% | All tiles have required fields (question, answer, domain) |
| Pattern match | 10% | Agent output matches the `expected_output` regex |

**Pass threshold:** `score ≥ 0.6` AND `count_ok` AND `format_correct`
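
Putting the weights and threshold together: a worked sketch of the composite score and pass decision. The weights and the 0.6 threshold come from above; the function shape and argument names are assumptions:

```python
WEIGHTS = {"count": 0.30, "quality": 0.40, "format": 0.20, "pattern": 0.10}

def composite_score(count_ok: bool, avg_quality: float,
                    format_correct: bool, pattern_match: bool) -> tuple[float, bool]:
    """Combine the four dimensions into one score plus a pass/fail decision."""
    score = (WEIGHTS["count"] * float(count_ok)
             + WEIGHTS["quality"] * avg_quality
             + WEIGHTS["format"] * float(format_correct)
             + WEIGHTS["pattern"] * float(pattern_match))
    passed = score >= 0.6 and count_ok and format_correct
    return score, passed

# In-bounds count, average tile quality 0.75, correct format, regex matched:
# 0.30 + 0.40*0.75 + 0.20 + 0.10 = 0.90 -> passed
```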

### Per-Tile Quality

Each tile scores 0.0–1.0 based on field completeness (see the sketch after this list):

- `question` present and > 10 chars: +0.25
- `answer` present and > 20 chars: +0.25
- `domain` present and not `"general"`: +0.25
- `agent` present and not `"unknown"`: +0.25
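
These rules translate directly into code. A minimal sketch, assuming tiles are plain dicts (the shipped `evaluator.py` may phrase it differently):

```python
def tile_quality(tile: dict) -> float:
    """Score one tile 0.0-1.0 from field completeness, per the rules above."""
    score = 0.0
    if len(tile.get("question", "")) > 10:
        score += 0.25
    if len(tile.get("answer", "")) > 20:
        score += 0.25
    if tile.get("domain") and tile["domain"] != "general":
        score += 0.25
    if tile.get("agent") and tile["agent"] != "unknown":
        score += 0.25
    return score
```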

## Architecture

```
cocapn_traps/
├── src/cocapn_traps/
│   ├── trap.py       # Trap dataclass + TrapRegistry
│   ├── evaluator.py  # Score runs, update statistics
│   ├── loader.py     # Parse markdown frontmatter
│   ├── runner.py     # Execute against agents
│   └── cli.py        # Command-line interface
└── tests/
    └── test_traps.py # 10 tests
```

## Tests

```bash
cd cocapn-traps
PYTHONPATH=src pytest tests/ -v
# 10 passed in 0.07s
```

| Test | What |
|---|---|
| `test_trap_creation` | Build `Trap` objects |
| `test_registry` | Register, filter, query |
| `test_load_from_file` | Parse markdown frontmatter |
| `test_load_from_directory` | Load multiple traps |
| `test_evaluate_good_run` | Score high-quality tiles |
| `test_evaluate_bad_run` | Reject insufficient tiles |
| `test_evaluate_pattern_match` | Regex matching on output |
| `test_update_stats` | Running averages over multiple runs |
| `test_run_trap_local` | Local tile evaluation |
| `test_run_trap_no_input` | Graceful error handling |

## Integration with cocapn-plato

```python
from cocapn_plato.sdk.fleet import Fleet
from cocapn_traps.trap import TrapRegistry
from cocapn_traps.loader import load_from_directory
from cocapn_traps.runner import run_trap

fleet = Fleet("http://147.224.38.131:8847")
registry = TrapRegistry()

for trap in load_from_directory("./traps"):
    registry.register(trap)

# Run trap, submit tiles to PLATO
# ('trap' here is the last trap loaded above; pick a specific one via the registry in real use)
result = run_trap(trap, agent_url="http://agent:8080/run")
if result["passed"]:
    for tile in result.get("tiles", []):
        fleet.submit(
            agent=trap.target,
            domain=tile["domain"],
            question=tile["question"],
            answer=tile["answer"],
        )
```

## Design Decisions

| Decision | Rationale |
|---|---|
| Markdown frontmatter | Human-readable, version-controllable, no YAML dependency |
| No external parser | Simple `key: value` frontmatter, handles lists inline |
| Score dimensions | Separates "did it produce enough" from "was it good" |
| Running averages | Trap statistics refine as runs accumulate |
| Zero dependencies | Same stdlib-only philosophy as the rest of the fleet |

## Fleet

Built by CCC (🦀) for the Cocapn Fleet.

Part of the Cocapn Fleet ecosystem.
