Convert any documentation site into an llms.md file —
structured context ready for AI coding agents.
AI models hallucinate APIs from new or obscure libraries because they were never trained on that documentation. When you ask a coding agent to use a recent SDK, it invents function names, parameters, and behaviors that do not exist.
Prumo solves this by turning live documentation into a compact, structured llms.md file that you can drop into any agent's context window.
URL or GitHub repo
        │
        ▼
┌──────────────┐     ┌──────────────┐     ┌────────────┐
│   Crawler    │────▶│   Exporter   │────▶│  llms.md   │
│(Static/GitHub│     │  (Gemini or  │     │ (Markdown) │
│ /Playwright) │     │   Claude)    │     │            │
└──────────────┘     └──────────────┘     └────────────┘
Crawler operates in three modes:
- Default: navigates static HTML, follows internal links, and strips navigation noise.
- GitHub mode (--github): reads .md/.mdx files directly from a repository via the GitHub API, so JavaScript-rendered sites (Docusaurus, VitePress, Next.js) never need to be rendered.
- JS mode (--js): renders pages with a headless browser (Playwright) for JavaScript-heavy sites with no useful GitHub markdown source.
Exporter sends the cleaned content to an LLM, which generates an llms.md file with the full documentation content in Markdown.
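To make the default crawl mode more concrete, here is a minimal sketch of a breadth-first walk that follows only internal links and stops at a page cap. It is not Prumo's actual implementation: the function names, the in-memory `links` map (a stand-in for real HTTP fetches), and the rule that "internal" means same host and same path prefix are all assumptions made for illustration.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def is_internal(root: str, candidate: str) -> bool:
    """True when candidate shares the root's host and lives under its path (assumed rule)."""
    r, c = urlparse(root), urlparse(candidate)
    return c.netloc == r.netloc and c.path.startswith(r.path)


def crawl_order(root: str, links: dict[str, list[str]], max_pages: int = 50) -> list[str]:
    """Breadth-first walk over a pre-fetched link map (hypothetical stand-in for fetching pages)."""
    seen = {root}
    queue = deque([root])
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)
        for href in links.get(page, []):
            # Resolve relative links and drop #fragments so anchors do not create duplicates.
            target, _fragment = urldefrag(urljoin(page, href))
            if target not in seen and is_internal(root, target):
                seen.add(target)
                queue.append(target)
    return visited
```

External links and pages outside the root path are skipped, and the `max_pages` argument plays the same role as the --max-pages option.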
Prumo is a CLI tool. The recommended way to install it is with pipx, which installs it in an isolated environment and makes it globally available in your terminal:
pipx install prumo

Don't have pipx? Install it first:

# macOS
brew install pipx && pipx ensurepath

# Ubuntu / Debian
sudo apt install pipx && pipx ensurepath

# Windows
scoop install pipx

Alternative, using pip inside a virtual environment:

pip install prumo

Alternative, using uv:

uv tool install prumo

Then run the setup wizard:

prumo init

The wizard will ask for your Gemini or Claude API key, and optionally a GitHub token for --github mode.
# Standard mode — static HTML
prumo fetch https://docs.example.com
# GitHub mode — reads .md/.mdx directly from the repository
prumo fetch https://github.com/some/repo --github
# JavaScript mode — renders with Playwright
prumo fetch https://docs.stellar.org --js --max-pages 30
# Remap GitHub blob links to the published documentation URLs
prumo fetch https://github.com/stellar/stellar-docs \
--github \
--docs-base-url https://developers.stellar.org/docs

JS mode needs the optional js extra and a Chromium build for Playwright:

pip install "prumo[js]"
playwright install chromium

When --js is used, Prumo prints this warning:
Warning: JavaScript rendering mode enabled.
This launches a headless browser that may execute untrusted code.
Use only with trusted sites.
The result is written to ./output/llms.md by default.
Current release version: 0.1.2.
Prumo generates an llms.md following the llmstxt.org standard:
# FastAPI
> Modern, fast web framework for building APIs with Python.
## Getting Started
- [Installation](https://fastapi.tiangolo.com/tutorial/): How to install and create the first endpoint.
- [First Steps](https://fastapi.tiangolo.com/tutorial/first-steps/): Basic structure of a FastAPI application.
## Request Handling
- [Path Parameters](https://fastapi.tiangolo.com/tutorial/path-params/): Dynamic URL parameters with automatic type validation.

prumo init is an interactive wizard that creates a local .env file with your credentials.
Options:
--force, -f Overwrite an existing .env without prompting
prumo fetch crawls a documentation site and generates llms.md.
| Option | Default | Description |
|---|---|---|
| url | required | Root URL of the docs site or GitHub repository |
| --output, -o | ./output | Output directory |
| --provider, -p | gemini | LLM provider: gemini or claude |
| --api-key, -k | env var | LLM provider API key |
| --max-pages, -m | 50 | Maximum pages or files to crawl |
| --github | false | Use the GitHub API to read .md/.mdx files directly |
| --github-token | env var | GitHub Personal Access Token |
| --js | false | Use Playwright to render JS-heavy docs |
| --docs-base-url, -d | none | Remap GitHub blob links to the published docs URL |
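The --docs-base-url remapping can be pictured with a small sketch. The exact rule Prumo applies is not documented here, so the mapping below is an assumption: it drops the `<owner>/<repo>/blob/<branch>` prefix, drops a leading `docs` directory, and strips the `.md`/`.mdx` extension. The function name, the `docs_dir` parameter, and the example file path in the usage note are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def remap_blob_url(blob_url: str, docs_base_url: str, docs_dir: str = "docs") -> str:
    """Map a GitHub blob URL onto a published docs URL (assumed layout, not Prumo's confirmed rule)."""
    parts = urlparse(blob_url).path.strip("/").split("/")
    # Expected shape: <owner>/<repo>/blob/<branch>/<path...>
    if len(parts) < 5 or parts[2] != "blob":
        return blob_url  # not a blob link, leave it untouched
    rel = parts[4:]
    if rel[0] == docs_dir:
        rel = rel[1:]  # assumption: the docs directory maps to the base URL itself
    base = docs_base_url.rstrip("/")
    if not rel:
        return base
    stem, ext = posixpath.splitext(rel[-1])
    if ext in {".md", ".mdx"}:
        rel[-1] = stem  # published pages usually omit the source extension
    return "/".join([base, *rel])
```

Under these assumptions, a made-up link such as `https://github.com/stellar/stellar-docs/blob/main/docs/build/guides/intro.mdx` would map to `https://developers.stellar.org/docs/build/guides/intro`. Branch names containing slashes would need extra handling that this sketch leaves out.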
For each secret, Prumo tries in this order and stops at the first match:
--api-key / --github-token flag → .env file → shell environment variable → error
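The lookup order above can be sketched as a first-match search. This is an illustration under stated assumptions rather than Prumo's code: the helper names, the simple KEY=VALUE parsing of .env, and the variable name `GEMINI_API_KEY` used in the usage note are all hypothetical.

```python
from typing import Mapping, Optional


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_secret(
    name: str,
    flag_value: Optional[str],
    dotenv: Mapping[str, str],
    environ: Mapping[str, str],
) -> str:
    """Return the first non-empty value in the order: flag, .env file, shell environment."""
    for candidate in (flag_value, dotenv.get(name), environ.get(name)):
        if candidate:
            return candidate
    raise RuntimeError(f"{name} not found in flag, .env file, or environment")
```

For example, `resolve_secret("GEMINI_API_KEY", None, parse_dotenv(open(".env").read()), os.environ)` (after `import os`) would fall through to the .env file first and to the shell environment only if the file has no entry.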
| Limitation | Details |
|---|---|
| JS-rendered sites | Standard mode may return empty pages on JS-heavy docs. Prefer --github first; use --js when no markdown source is available. |
| Playwright mode costs | --js is slower and uses more CPU/RAM because it launches a headless browser for rendering. |
| Large documentation | Crawling is capped at --max-pages to avoid bloated API calls. Increase it if the generated file feels incomplete. |
| Output quality | Depends on the LLM provider and the structure of the source documentation. Gemini 2.5 Flash is the default and works well for most cases. |
| GitHub API rate limits | Authenticated requests are limited to 5,000 per hour. A large repository with --max-pages 200 can consume several hundred requests. |
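Before a large --github crawl, it can help to check how much of the hourly GitHub quota is left. The sketch below queries GitHub's `/rate_limit` REST endpoint and compares the remaining core requests against a planned budget. It is a standalone helper, not part of Prumo; the function names are hypothetical, and the number of requests a crawl will need is something you have to estimate yourself.

```python
import json
import urllib.request
from typing import Optional


def fetch_rate_limit(token: Optional[str] = None) -> dict:
    """Fetch the current rate-limit status from the GitHub REST API."""
    request = urllib.request.Request(
        "https://api.github.com/rate_limit",
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def remaining_core_requests(payload: dict) -> int:
    """Extract the remaining request count for the core REST API bucket."""
    return int(payload["resources"]["core"]["remaining"])


def has_budget(payload: dict, planned_requests: int) -> bool:
    """True when the remaining quota covers the planned number of requests."""
    return remaining_core_requests(payload) >= planned_requests
```

A call such as `has_budget(fetch_rate_limit(token), 500)` gives a quick yes/no before starting a crawl with a high --max-pages value.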
git clone https://github.com/Dione-b/prumo.git
cd prumo
uv sync
cp .env.example .env # fill in your keys

uv run ruff check .
uv run mypy prumo/
uv run pytest tests/ -v

Contributions are welcome. Before opening a pull request:
- Keep changes focused on a single concern.
- Add or update tests for any behavior changes.
- Make sure ruff, mypy, and pytest all pass locally.
- Describe the motivation and user-facing impact in the PR description.
If you find a bug or want to propose a feature, open an issue first.
MIT — see LICENSE for details.