markmaton is a lightweight HTML-to-Markdown parser core built for agent workflows.
It solves the last-mile parsing problem in a web pipeline: you already have page HTML,
but it is still too noisy and awkward for downstream agent use. Feed markmaton
HTML from a fetcher or browser layer and get back cleaner Markdown, metadata, links,
images, and quality signals.
Note
markmaton is a general parser, not a crawler.
Feed it HTML from Playwright, fetch, Firecrawl, or another upstream page-visit tool.
- Raw page HTML is usually not directly useful for downstream agent workflows.
- Modern pages often mix the real content with navigation, overlays, cards, and app shell chrome.
- markmaton keeps that cleanup and conversion step deterministic and separate from crawling.
- The project stays narrow by design: no crawling, browser control, network, or LLM features.
- The user-facing entrypoint is a Python CLI and API wrapped around a fast Go engine.
    pip install markmaton

    uv tool install markmaton

Tip
The installed package works through plain pip.
Local development uses uv with Python 3.12.
    markmaton convert \
      --html-file page.html \
      --url https://example.com/article \
      --output-format markdown

To get the full structured response:
    markmaton convert \
      --html-file page.html \
      --url https://example.com/article \
      --output-format json

The same conversion through the Python API:

    from markmaton import ConvertOptions, ConvertRequest, convert_html

    html = "<article><h1>Hello</h1><p>World</p></article>"

    # Convert the HTML, keeping only the main content of the page.
    response = convert_html(
        ConvertRequest(
            html=html,
            url="https://example.com/article",
            options=ConvertOptions(only_main_content=True),
        )
    )

    print(response.markdown)
    print(response.metadata.title)

Tip
Pass url whenever you can.
markmaton uses it as parsing context for canonical metadata and absolute link resolution.
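To illustrate what absolute link resolution means here, the sketch below uses Python's standard `urllib.parse.urljoin` to resolve relative hrefs against a page URL. It is not markmaton's implementation (the engine is written in Go), and the helper name `resolve_links` is hypothetical; it only shows why a parser needs the page URL to turn relative links into absolute ones.

```python
from urllib.parse import urljoin


def resolve_links(page_url: str, hrefs: list[str]) -> list[str]:
    """Resolve each href against page_url (hypothetical helper, not part of markmaton)."""
    return [urljoin(page_url, href) for href in hrefs]


page_url = "https://example.com/article"
hrefs = ["/about", "images/cover.png", "https://other.example/page"]

# Relative hrefs become absolute; already-absolute hrefs pass through unchanged.
for absolute in resolve_links(page_url, hrefs):
    print(absolute)
```

Without a page URL there is no base to resolve against, so relative links can only be passed through as they appear in the HTML.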
JSON mode returns markdown, html_clean, metadata, links, images, and quality. See response shape for details.
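As a rough sketch of consuming JSON mode from a script, the example below builds the documented `markmaton convert` command and checks for the top-level fields listed above. The helper names (`build_convert_command`, `present_fields`, `run_convert`) are hypothetical, and it assumes the CLI writes the JSON document to standard output; nothing is assumed about the inner shape of each field.

```python
import json
import subprocess

# Top-level fields that JSON mode is documented to return.
EXPECTED_FIELDS = ("markdown", "html_clean", "metadata", "links", "images", "quality")


def build_convert_command(html_file: str, url: str) -> list[str]:
    """Build the argv for the documented `markmaton convert` JSON invocation."""
    return [
        "markmaton", "convert",
        "--html-file", html_file,
        "--url", url,
        "--output-format", "json",
    ]


def present_fields(raw_json: str) -> list[str]:
    """Return the documented top-level fields that appear in a JSON response."""
    data = json.loads(raw_json)
    return [name for name in EXPECTED_FIELDS if name in data]


def run_convert(html_file: str, url: str) -> dict:
    """Run the CLI and parse its output (assumption: JSON is written to stdout)."""
    result = subprocess.run(
        build_convert_command(html_file, url),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

With markmaton installed, `run_convert("page.html", "https://example.com/article")` would return the parsed response as a dictionary, under the stdout assumption stated above.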
- Go engine: cmd/markmaton-engine
- Python wrapper and CLI: markmaton/
- Parser fixtures and golden files: testdata/
- Research, benchmark, and release docs: docs/
- Documentation index
- Usage guide
- Packaging layout
- PyPI release path
- Benchmark workflow
- Benchmark matrix
- AI agent skill: for using markmaton inside an agent workflow
Set up the local development environment:

    uv sync --group dev

Run the core test suites:

    uv run python -m unittest discover -s tests -p 'test_*.py'
    go test ./...

For a manual end-to-end smoke:
The repo is pinned to:
- Python 3.12 via .python-version
- a committed uv.lock
Important
Automated tests are unit-test-first. Live page visits and benchmarks are manual.