Commit 34a3c37

feat: add plugin lifecycle CLI and release packaging assets
1 parent b268f23 commit 34a3c37

24 files changed

Lines changed: 3442 additions & 20 deletions

README.md

Lines changed: 119 additions & 2 deletions
@@ -2,13 +2,14 @@

 Turn any website into a REST API by scraping it live with Playwright.

-Web2API loads recipe folders from `recipes/` at startup. Each recipe defines endpoints with selectors, actions, fields, and pagination in YAML. Optional Python scrapers handle interactive or complex sites. Drop a folder — get an API.
+Web2API loads recipe folders from `recipes/` at startup. Each recipe defines endpoints with selectors, actions, fields, and pagination in YAML. Optional Python scrapers handle interactive or complex sites. Optional plugin metadata can declare external dependencies and required env vars. Drop a folder — get an API.

 ## Features

 - **Arbitrary named endpoints** — recipes define as many endpoints as needed (not limited to read/search)
 - **Declarative YAML recipes** with selectors, actions, transforms, and pagination
 - **Custom Python scrapers** for interactive sites (e.g. typing text, waiting for dynamic content)
+- **Optional plugin metadata** (`plugin.yaml`) for recipe-specific dependency requirements
 - **Shared browser/context pool** for concurrent Playwright requests
 - **In-memory response cache** with stale-while-revalidate
 - **Unified JSON response schema** across all recipes and endpoints
@@ -29,6 +30,82 @@ curl -s http://localhost:8010/health | jq
 curl -s http://localhost:8010/api/sites | jq
 ```

+## CLI
+
+Web2API ships with a management CLI:
+
+```bash
+web2api --help
+```
+
+### Plugin Commands
+
+```bash
+# List all recipe folders with plugin readiness
+web2api plugins list
+
+# Check missing env vars/commands/packages
+web2api plugins doctor
+web2api plugins doctor x
+web2api plugins doctor x --no-run-healthchecks
+web2api plugins doctor x --allow-untrusted
+
+# Install plugin recipe from source
+web2api plugins add ./my-recipe
+web2api plugins add https://github.com/acme/web2api-recipes.git --ref v1.2.0 --subdir recipes/news
+
+# Update managed plugin from recorded source
+web2api plugins update x --yes
+web2api plugins update x --ref v1.3.0 --subdir recipes/x --yes
+
+# Install plugin recipe from catalog
+web2api plugins catalog list
+web2api plugins catalog add hackernews --yes
+
+# Install declared dependencies for a plugin recipe (host)
+web2api plugins install x --yes
+web2api plugins install x --apt --yes # include apt packages
+
+# Generate Dockerfile snippet for plugin dependencies
+web2api plugins install x --target docker --apt
+
+# Remove plugin recipe + manifest record
+web2api plugins uninstall x --yes
+
+# Disable/enable a recipe (writes/removes recipes/<slug>/.disabled)
+web2api plugins disable x --yes
+web2api plugins enable x
+```
+
+`plugins install` does not run `apt` installs unless `--apt` is explicitly passed.
+Install-state records are stored in `recipes/.web2api_plugins.json`.
+Default catalog path is `plugins/catalog.yaml` in a source checkout, with a bundled fallback
+inside the installed package.
+`plugins update` works only for plugins tracked in the manifest.
+
+Plugins installed from untrusted sources (for example git URLs) are blocked from executing
+install/healthcheck commands unless `--allow-untrusted` is passed.
+
+### Self Update Commands
+
+```bash
+# Show current version + recommended update method
+web2api self update check
+
+# Apply update using auto-detected method (pip/git/docker)
+web2api self update apply --yes
+
+# Pin explicit method or target version/ref
+web2api self update apply --method pip --to 0.1.0 --yes
+web2api self update apply --method git --to v0.1.0 --yes
+```
+
+For `--method git`, `self update apply` checks out a tag:
+- if `--to` is provided, that tag/ref is used
+- if `--to` is omitted, the latest sortable git tag is used
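
Reviewer note: the README does not define "sortable". The sketch below shows one plausible reading and is not taken from the commit. It assumes a sortable tag is an optional `v` followed by numeric `major.minor.patch`, and that tag names were already collected (for example from `git tag --list`). The name `latest_sortable_tag` is hypothetical.

```python
from __future__ import annotations

import re

# Assumption: a "sortable" tag looks like v1.2.3 or 1.2.3; anything else is ignored.
_TAG_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def latest_sortable_tag(tags: list[str]) -> str | None:
    """Return the tag with the highest numeric version, or None if none parse."""
    parsed: list[tuple[tuple[int, ...], str]] = []
    for raw in tags:
        tag = raw.strip()
        match = _TAG_RE.match(tag)
        if match:
            version = tuple(int(part) for part in match.groups())
            parsed.append((version, tag))
    if not parsed:
        return None
    # Compare on the numeric tuple only, so 0.10.0 ranks above 0.9.0.
    return max(parsed, key=lambda item: item[0])[1]
```

Comparing integer tuples rather than strings matters here: a plain string sort would rank `v0.9.0` above `v0.10.0`.
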
+
+After `self update apply`, the CLI automatically runs `web2api plugins doctor`.
+
 ## Discover Recipes

 Recipe availability is dynamic. Use discovery endpoints instead of relying on a static README list.
@@ -137,11 +214,13 @@ recipes/
   <slug>/
     recipe.yaml # required — endpoint definitions
     scraper.py # optional — custom Python scraper
+    plugin.yaml # optional — dependency metadata and runtime checks
     README.md # optional — documentation
 ```

 - Folder name must match `slug`
 - `slug` cannot be a reserved system route (`api`, `health`, `docs`, `openapi`, `redoc`)
+- Recipe folders containing `.disabled` are skipped by discovery
 - Restart the service to pick up new or changed recipes
 - Invalid recipes are skipped with warning logs

@@ -254,6 +333,43 @@ class Scraper(BaseScraper):
 - `params` also includes validated extra query params (for example `count`)
 - Endpoints not handled by the scraper fall back to declarative YAML

+### Plugin Metadata (Optional)
+
+Use `plugin.yaml` to declare install/runtime requirements for a recipe:
+
+```yaml
+version: "1.0.0"
+web2api:
+  min: "0.2.0"
+  max: "1.0.0"
+requires_env:
+  - BIRD_AUTH_TOKEN
+  - BIRD_CT0
+dependencies:
+  commands:
+    - bird
+  python:
+    - httpx
+  apt:
+    - nodejs
+  npm:
+    - "@steipete/bird"
+healthcheck:
+  command: ["bird", "--version"]
+```
+
+Version bounds in `web2api.min` / `web2api.max` use numeric `major.minor.patch` format.
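
Reviewer note: a minimal sketch of how numeric `major.minor.patch` bounds could be checked, written for illustration and not copied from the commit. It assumes both bounds are inclusive and that either may be omitted, which the README does not state. `parse_version` and `within_bounds` are hypothetical names.

```python
from __future__ import annotations

import re

# Strictly numeric major.minor.patch, as the README line above requires.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse_version(text: str) -> tuple[int, int, int]:
    """Parse 'major.minor.patch' into a tuple of ints, or raise ValueError."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"expected numeric major.minor.patch, got {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def within_bounds(
    current: str, minimum: str | None = None, maximum: str | None = None
) -> bool:
    """Check min <= current <= max; inclusive bounds are an assumption."""
    version = parse_version(current)
    if minimum is not None and version < parse_version(minimum):
        return False
    if maximum is not None and version > parse_version(maximum):
        return False
    return True
```

With the example metadata above, `within_bounds("0.10.0", "0.2.0", "1.0.0")` holds because tuples compare numerically, so `0.10.0` is newer than `0.2.0`.
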
+
+`GET /api/sites` now includes a `plugin` block (or `null`) with:
+
+- declared metadata from `plugin.yaml`
+- computed `status.ready` plus missing env vars/commands/python packages
+- unverified package declarations (`apt`, `npm`) for operators
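
Reviewer note: the readiness computation described in the list above might look roughly like the sketch below. This is an illustration under assumptions, not the commit's implementation. The `checks.env.missing` and `checks.commands.missing` shape follows the assertions in `tests/integration/test_api.py`; the `python` key, the function name `compute_status`, and the idea that a declared Python package name equals its import name are all assumptions.

```python
from __future__ import annotations

import importlib.util
import os
import shutil
from collections.abc import Mapping, Sequence


def compute_status(
    requires_env: Sequence[str],
    commands: Sequence[str],
    python_packages: Sequence[str],
    environ: Mapping[str, str] | None = None,
) -> dict:
    """Report which declared requirements are missing on this host."""
    env = os.environ if environ is None else environ
    # An unset or empty variable counts as missing.
    missing_env = [name for name in requires_env if not env.get(name)]
    # shutil.which returns None when the command is not on PATH.
    missing_commands = [cmd for cmd in commands if shutil.which(cmd) is None]
    # find_spec returns None when a top-level module cannot be imported.
    missing_python = [
        pkg for pkg in python_packages if importlib.util.find_spec(pkg) is None
    ]
    return {
        "ready": not (missing_env or missing_commands or missing_python),
        "checks": {
            "env": {"missing": missing_env},
            "commands": {"missing": missing_commands},
            "python": {"missing": missing_python},
        },
    }
```

`apt` and `npm` declarations are left out on purpose, matching the README's description of them as unverified.
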
+
+Compatibility enforcement:
+- `PLUGIN_ENFORCE_COMPATIBILITY=false` (default): incompatible plugins are loaded but reported as not ready.
+- `PLUGIN_ENFORCE_COMPATIBILITY=true`: incompatible plugins are skipped at discovery time.
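
Reviewer note: the two enforcement modes above reduce to a small decision table, sketched here for illustration only. How the commit actually parses the flag is not visible in this diff, so accepting only a case-insensitive `true` is an assumption, and `enforcement_enabled` and `discovery_decision` are hypothetical names.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def enforcement_enabled(environ: Mapping[str, str] | None = None) -> bool:
    """Read PLUGIN_ENFORCE_COMPATIBILITY; the default is false per the README."""
    env = os.environ if environ is None else environ
    raw = env.get("PLUGIN_ENFORCE_COMPATIBILITY", "false")
    # Assumption: only the literal "true" (any case) enables enforcement.
    return raw.strip().lower() == "true"


def discovery_decision(compatible: bool, enforce: bool) -> str:
    """Map version compatibility and the flag to a discovery outcome."""
    if compatible:
        return "load"
    # Incompatible: skipped when enforcing, otherwise loaded but not ready.
    return "skip" if enforce else "load-not-ready"
```
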
+
 ## Configuration

 Environment variables (with defaults):
@@ -270,7 +386,8 @@ Environment variables (with defaults):
 | `CACHE_TTL_SECONDS` | 30 | Fresh cache duration in seconds |
 | `CACHE_STALE_TTL_SECONDS` | 120 | Stale-while-revalidate window in seconds |
 | `CACHE_MAX_ENTRIES` | 500 | Maximum cached request variants |
-| `RECIPES_DIR` | `./recipes` | Path to recipes directory |
+| `RECIPES_DIR` | `./recipes` (or bundled defaults in installed package) | Path to recipes directory |
+| `PLUGIN_ENFORCE_COMPATIBILITY` | false | Skip plugin recipes outside declared `web2api` version bounds |
 | `BIRD_AUTH_TOKEN` | empty | X/Twitter auth token for `x` recipe |
 | `BIRD_CT0` | empty | X/Twitter ct0 token for `x` recipe |

plugins/catalog.yaml

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+plugins:
+  hackernews:
+    description: "Built-in Hacker News recipe."
+    source: "../recipes/hackernews"
+    trusted: true
+
+  deepl:
+    description: "Built-in DeepL translation recipe."
+    source: "../recipes/deepl"
+    trusted: true
+
+  x:
+    description: "Built-in X/Twitter recipe (requires bird CLI and auth env vars)."
+    source: "../recipes/x"
+    trusted: true

pyproject.toml

Lines changed: 15 additions & 0 deletions
@@ -13,6 +13,7 @@ dependencies = [
     "playwright>=1.50,<2.0",
     "pydantic>=2.10,<3.0",
     "pyyaml>=6.0,<7.0",
+    "typer>=0.12,<1.0",
     "uvicorn[standard]>=0.34,<1.0",
 ]

@@ -26,9 +27,23 @@ dev = [
     "ruff>=0.9,<1.0",
 ]

+[project.scripts]
+web2api = "web2api.cli:main"
+
+[tool.setuptools]
+include-package-data = true
+
 [tool.setuptools.packages.find]
 include = ["web2api*"]

+[tool.setuptools.package-data]
+web2api = [
+    "templates/*.html",
+    "bundled/plugins/*.yaml",
+    "bundled/recipes/*/*.yaml",
+    "bundled/recipes/*/*.py",
+]
+
 [tool.ruff]
 line-length = 100
 target-version = "py312"

recipes/x/plugin.yaml

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+version: "1.0.0"
+web2api:
+  min: "0.1.0"
+requires_env:
+  - BIRD_AUTH_TOKEN
+  - BIRD_CT0
+dependencies:
+  commands:
+    - bird
+  apt:
+    - nodejs
+  npm:
+    - "@steipete/bird"
+healthcheck:
+  command:
+    - bird
+    - --version

recipes/x/scraper.py

Lines changed: 3 additions & 2 deletions
@@ -89,12 +89,13 @@ async def scrape(self, endpoint: str, page: Page, params: dict[str, Any]) -> Scr

         items: list[dict[str, Any]] = []
         for tweet in tweets_data[:count]:
+            author_username = tweet.get("author", {}).get("username", username)
             items.append({
                 "text": tweet.get("text", ""),
-                "author": tweet.get("author", {}).get("username", username),
+                "author": author_username,
                 "author_name": tweet.get("author", {}).get("name", ""),
                 "timestamp": tweet.get("createdAt", ""),
-                "url": f"https://x.com/{tweet.get('author', {}).get('username', username)}/status/{tweet.get('id', '')}",
+                "url": f"https://x.com/{author_username}/status/{tweet.get('id', '')}",
                 "replies": tweet.get("replyCount"),
                 "reposts": tweet.get("retweetCount"),
                 "likes": tweet.get("likeCount"),

tests/integration/test_api.py

Lines changed: 38 additions & 11 deletions
@@ -51,6 +51,7 @@ def _write_recipe(
     recipes_dir: Path,
     slug: str,
     endpoints: dict[str, dict] | None = None,
+    plugin: dict[str, object] | None = None,
 ) -> None:
     if endpoints is None:
         endpoints = {
@@ -75,6 +76,11 @@ def _write_recipe(
         ),
         encoding="utf-8",
     )
+    if plugin is not None:
+        (recipe_dir / "plugin.yaml").write_text(
+            yaml.safe_dump(plugin),
+            encoding="utf-8",
+        )


 def _success_response(
@@ -147,20 +153,31 @@ async def test_api_routes_and_index(
     caplog: pytest.LogCaptureFixture,
 ) -> None:
     recipes_dir = tmp_path / "recipes"
+    missing_env = "WEB2API_TEST_ALPHA_TOKEN_UNLIKELY"
+    monkeypatch.delenv(missing_env, raising=False)

-    _write_recipe(recipes_dir, "alpha", endpoints={
-        "read": {
-            "url": "https://example.com/items?page={page}",
-            "items": {"container": ".item", "fields": {"title": {"selector": ".title"}}},
-            "pagination": {"type": "page_param", "param": "page"},
+    _write_recipe(
+        recipes_dir,
+        "alpha",
+        endpoints={
+            "read": {
+                "url": "https://example.com/items?page={page}",
+                "items": {"container": ".item", "fields": {"title": {"selector": ".title"}}},
+                "pagination": {"type": "page_param", "param": "page"},
+            },
+            "search": {
+                "url": "https://example.com/search?q={query}&page={page}",
+                "requires_query": True,
+                "items": {"container": ".item", "fields": {"title": {"selector": ".title"}}},
+                "pagination": {"type": "page_param", "param": "page"},
+            },
         },
-        "search": {
-            "url": "https://example.com/search?q={query}&page={page}",
-            "requires_query": True,
-            "items": {"container": ".item", "fields": {"title": {"selector": ".title"}}},
-            "pagination": {"type": "page_param", "param": "page"},
+        plugin={
+            "version": "1.0.0",
+            "requires_env": [missing_env],
+            "dependencies": {"commands": ["missing-web2api-plugin-command"]},
         },
-    })
+    )
     _write_recipe(recipes_dir, "beta") # read only

     async def fake_scrape(
@@ -251,6 +268,16 @@ async def fake_scrape(
         alpha_site = next(s for s in sites_resp.json() if s["slug"] == "alpha")
         ep_names = {ep["name"] for ep in alpha_site["endpoints"]}
         assert ep_names == {"read", "search"}
+        assert alpha_site["plugin"]["version"] == "1.0.0"
+        assert alpha_site["plugin"]["status"]["ready"] is False
+        assert missing_env in alpha_site["plugin"]["status"]["checks"]["env"]["missing"]
+        assert (
+            "missing-web2api-plugin-command"
+            in alpha_site["plugin"]["status"]["checks"]["commands"]["missing"]
+        )
+
+        beta_site = next(s for s in sites_resp.json() if s["slug"] == "beta")
+        assert beta_site["plugin"] is None

         # Health check
         health_resp = await client.get("/health")
