markmaton

markmaton is a lightweight HTML-to-Markdown parser core built for agent workflows.

It solves the last-mile parsing problem in a web pipeline: you already have page HTML, but it is still too noisy and awkward for downstream agent use. Feed markmaton HTML from a fetcher or browser layer and get back cleaner Markdown, metadata, links, images, and quality signals.

Note

markmaton is a general parser, not a crawler. Feed it HTML from Playwright, fetch, Firecrawl, or another upstream page-visit tool.

Why it exists

Raw page HTML is usually not directly useful for downstream agent workflows.
Modern pages often mix the real content with navigation, overlays, cards, and app shell chrome.
markmaton keeps that cleanup and conversion step deterministic and separate from crawling.
The project stays narrow by design: no crawling, browser control, network, or LLM features.
The user-facing entrypoint is a Python CLI and API wrapped around a fast Go engine.
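To make "cleanup" concrete, here is a toy, standard-library-only sketch of one deterministic step: dropping app-shell elements such as `nav`, `header`, `footer`, and `aside` and keeping the remaining text. It only illustrates the idea. It is not markmaton's Go engine, `CHROME_TAGS`, `ChromeStripper`, and `strip_chrome` are hypothetical names, and real main-content detection is far more involved.

```python
from html.parser import HTMLParser

# Hypothetical set of "app shell" tags treated as noise in this toy example.
CHROME_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class ChromeStripper(HTMLParser):
    """Collect visible text while skipping anything nested inside CHROME_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside at least one chrome element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in CHROME_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in CHROME_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def strip_chrome(html: str) -> str:
    """Return the non-chrome text of html, joined by single spaces."""
    parser = ChromeStripper()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


print(strip_chrome(
    "<nav>Menu</nav><article><h1>Hello</h1><p>World</p></article><footer>(c)</footer>"
))
# → Hello World
```

The same input always yields the same output, which is the property that keeps this step separable from crawling.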

Install

`pip`

pip install markmaton

`uv tool`

uv tool install markmaton

Tip

The installed package works through plain pip. Local development uses uv with Python 3.12.

Quickstart

CLI

markmaton convert \
  --html-file page.html \
  --url https://example.com/article \
  --output-format markdown

To get the full structured response:

markmaton convert \
  --html-file page.html \
  --url https://example.com/article \
  --output-format json
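If another Python process needs to drive the CLI, the flags above can be assembled into an argument list and run with `subprocess`. This is a minimal sketch assuming `markmaton` is on `PATH` and that the converted result is written to stdout; `build_convert_argv` and `run_convert` are hypothetical helper names.

```python
import subprocess


def build_convert_argv(html_file: str, url: str, output_format: str = "json") -> list[str]:
    """Assemble a `markmaton convert` command from the documented flags."""
    return [
        "markmaton", "convert",
        "--html-file", html_file,
        "--url", url,
        "--output-format", output_format,
    ]


def run_convert(html_file: str, url: str, output_format: str = "json") -> str:
    """Run the CLI and return its stdout (assumes the result is printed there)."""
    result = subprocess.run(
        build_convert_argv(html_file, url, output_format),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Usage, under the same assumptions: `json.loads(run_convert("page.html", "https://example.com/article"))`. For in-process use, the Python API avoids the subprocess entirely.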

Python API

from markmaton import ConvertOptions, ConvertRequest, convert_html

html = "<article><h1>Hello</h1><p>World</p></article>"

response = convert_html(
    ConvertRequest(
        html=html,
        url="https://example.com/article",
        options=ConvertOptions(only_main_content=True),
    )
)

print(response.markdown)
print(response.metadata.title)

Tip

Pass url whenever you can. markmaton uses it as parsing context for canonical metadata and absolute link resolution.
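Why the URL matters for links: relative `href` values only become usable once resolved against the page's own address. The standard-library `urllib.parse.urljoin` shows the general idea. This illustrates the concept and makes no claim about how markmaton's engine resolves links internally.

```python
from urllib.parse import urljoin

page_url = "https://example.com/article"  # the value you would pass as url

hrefs = ["/about", "images/cover.png", "https://other.org/feed"]
resolved = {href: urljoin(page_url, href) for href in hrefs}

for href, absolute in resolved.items():
    print(f"{href} -> {absolute}")
# /about -> https://example.com/about
# images/cover.png -> https://example.com/images/cover.png
# https://other.org/feed -> https://other.org/feed
```

Without the page URL, the first two links stay relative and are ambiguous to a downstream agent.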

Output

JSON mode returns markdown, html_clean, metadata, links, images, and quality. See response shape for details.
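A downstream consumer can guard against a partial or malformed response by checking for those top-level fields before using them. This is a minimal sketch: `missing_fields` and the sample payload are hypothetical, and it checks only field names because the nested structure is described in the response shape doc rather than here.

```python
import json

# Top-level fields listed for JSON mode; nested shapes are not assumed here.
EXPECTED_FIELDS = ("markdown", "html_clean", "metadata", "links", "images", "quality")


def missing_fields(raw_json: str) -> list[str]:
    """Return any documented top-level field absent from a JSON-mode response."""
    payload = json.loads(raw_json)
    return [name for name in EXPECTED_FIELDS if name not in payload]


sample = '{"markdown": "# Hello", "metadata": {}, "links": []}'  # hypothetical, truncated
print(missing_fields(sample))
# → ['html_clean', 'images', 'quality']
```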

Project shape

Go engine: cmd/markmaton-engine
Python wrapper and CLI: markmaton/
Parser fixtures and golden files: testdata/
Research, benchmark, and release docs: docs/

Documentation

Documentation index
Usage guide
Packaging layout
PyPI release path
Benchmark workflow
Benchmark matrix
AI agent skill — for using markmaton inside an agent workflow

Development

Set up the local development environment:

uv sync --group dev

Run the core test suites:

uv run python -m unittest discover -s tests -p 'test_*.py'
go test ./...
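Because testdata/ holds parser fixtures and golden files, a golden test comes down to comparing converter output with a stored expected file and showing a diff on mismatch. The helper below is a hypothetical sketch using only the standard library (`unittest`, `difflib`, `pathlib`); it is not taken from the repository's tests, and `golden_diff` and `GoldenTestCase` are placeholder names.

```python
import difflib
import unittest
from pathlib import Path


def golden_diff(actual: str, golden_path: Path) -> str:
    """Return a unified diff between the golden file and actual ('' if equal)."""
    expected = golden_path.read_text(encoding="utf-8")
    return "".join(
        difflib.unified_diff(
            expected.splitlines(keepends=True),
            actual.splitlines(keepends=True),
            fromfile=str(golden_path),
            tofile="actual",
        )
    )


class GoldenTestCase(unittest.TestCase):
    """Base class: subclasses pass converter output and a golden file path."""

    def assertMatchesGolden(self, actual: str, golden_path: Path) -> None:
        diff = golden_diff(actual, golden_path)
        if diff:
            self.fail(f"output differs from golden file:\n{diff}")
```

A test module named `test_*.py` under tests/ that subclasses `GoldenTestCase` would be picked up by the `unittest discover` command above.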

For a manual end-to-end smoke test, follow the Local smoke flow guide.

The repo is pinned to:

Python 3.12 via .python-version
a committed uv.lock

Important

Automated tests are unit-test-first. Live page visits and benchmarks are manual.
