# markgrab
> Universal web content extraction — URL to LLM-ready markdown
- `pip install markgrab` / `markgrab[browser]` / `markgrab[youtube]` / `markgrab[pdf]` / `markgrab[all]`
- Python >=3.12 | core deps: httpx, beautifulsoup4, markdownify
- Auto-detects URL type: HTML, YouTube, PDF, DOCX
- Playwright browser fallback for JS-heavy pages, anti-bot stealth
- Configurable truncation (default 50k chars), proxy support, language detection
- Exports: `extract()`, `ExtractResult`
```python
import asyncio
from markgrab import extract

async def main():
    result = await extract("https://example.com", max_chars=50_000)
    print(result.title, result.markdown, result.word_count)

asyncio.run(main())
```
- CLI: `markgrab URL [--browser] [--format json|text|markdown] [--max-chars N]`
- MCP: `markgrab-mcp` server with `extract_url`, `extract_multiple` tools
- [Docs](https://github.com/QuartzUnit/markgrab) | [PyPI](https://pypi.org/project/markgrab/) | [Full API](/llms-full.txt)
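
The URL type auto-detection listed above can be pictured with a small sketch. This is not markgrab's actual logic, which this summary does not describe: `guess_url_kind` is a hypothetical helper, and it assumes detection from the hostname and path suffix alone. A real implementation may also inspect response headers such as `Content-Type`.

```python
from urllib.parse import urlparse


def guess_url_kind(url: str) -> str:
    """Hypothetical helper (not part of markgrab): guess a content type from the URL alone."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    path = parsed.path.lower()  # the query string is excluded from .path

    # Assumption: YouTube links are recognised by hostname.
    if host in ("youtu.be", "youtube.com") or host.endswith(".youtube.com"):
        return "youtube"
    # Assumption: documents are recognised by file extension.
    if path.endswith(".pdf"):
        return "pdf"
    if path.endswith(".docx"):
        return "docx"
    # Everything else is treated as a regular HTML page.
    return "html"


if __name__ == "__main__":
    for url in (
        "https://www.youtube.com/watch?v=abc",
        "https://example.com/paper.pdf",
        "https://example.com/report.docx?download=1",
        "https://example.com",
    ):
        print(url, "->", guess_url_kind(url))
```

A suffix check like this is cheap but misses URLs that serve a PDF without a `.pdf` extension, which is one reason a header check is a common complement.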