Codebase → LLM‑friendly Markdown report. Single file, zero‑dependency core, zero bloat.
Install: `uv tool install atlas` (or `pip install .` from a clone).
atlas # interactive setup
atlas . # all sections, non‑interactive
atlas structure contents ./src # only these sections
atlas -o reports/ --no-file . # stdout only; nothing written to disk
atlas owner/repo # GitHub-only for remote repos (requires gh)

| Flag | Effect |
|---|---|
| `-C <path>` | chdir first (git-style) |
| `-o <dir>` | Output directory (default `./atlas/`) |
| `--no-file` | Skip file output entirely |
| `-v` | Force stdout even in tty |
| `-q` | Suppress stderr (level 0) |
| `--debug` | Verbose stderr (level 2) |
| `--no-color` | Strip ANSI from stderr |
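
As a rough illustration of how the flags in the table could map onto `argparse`, here is a minimal sketch. The `dest` names, the catch-all `args` positional, and making `-q` and `--debug` mutually exclusive are assumptions for illustration, not the actual parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the flags listed in the table."""
    p = argparse.ArgumentParser(prog="atlas")
    p.add_argument("-C", dest="chdir", metavar="<path>", default=None,
                   help="chdir first (git-style)")
    p.add_argument("-o", dest="output_dir", metavar="<dir>", default="./atlas/",
                   help="output directory")
    p.add_argument("--no-file", action="store_true", help="skip file output")
    p.add_argument("-v", dest="force_stdout", action="store_true",
                   help="force stdout even in a tty")
    # -q and --debug both write to one `level` value: 0 quiet, 1 normal, 2 debug.
    level = p.add_mutually_exclusive_group()
    level.add_argument("-q", dest="level", action="store_const", const=0)
    level.add_argument("--debug", dest="level", action="store_const", const=2)
    p.set_defaults(level=1)
    p.add_argument("--no-color", action="store_true", help="strip ANSI from stderr")
    # Section names and the target path are left unparsed here (assumption).
    p.add_argument("args", nargs="*")
    return p


ns = build_parser().parse_args(["-o", "reports/", "--no-file", "."])
```

Here `ns.output_dir` is `"reports/"`, `ns.no_file` is `True`, and `ns.level` stays at the default `1`.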
A codebase is the territory: ground truth, irregular, vast.
An LLM dropped into fifty thousand lines of raw source with no topology is a traveller without a compass.
Atlas produces maps. A map is a deliberate abstraction of the territory for a specific purpose. It is not better than the code; it is useful because it omits. The traveller requests ground truth via tools when needed; the map provides the topology to know where to look.
Current zoom levels:
| Map type | Section | What it is |
|---|---|---|
| Cartographic outline | `structure` | File tree. Shape of the land. |
| Full survey | `contents` | Every line of every file. Complete territory. |
Future zoom levels:
| Map type | Section | What it is |
|---|---|---|
| Interface map | `signatures` | Function/class signatures, public APIs, type stubs. The index of the territory. |
| Relation map | `graph` | Import graph, call graph, class hierarchy. Mermaid/DOT. The roads of the territory. |
| Chronicle map | `chronicle` | Commit history as narrative. The geological layers. |
You choose the resolution. Run atlas structure for a 500‑token briefing, or atlas contents for a 50k‑token deep dive. An LLM with a good map knows which files to request when it needs the territory itself.
Atlas is not a framework. No decorators, no metaclasses, no plugin loader, no BaseSection inheritance. Indirection is the enemy of readability; the registry is the only extension point.
`SECTIONS: dict[str, Callable[[Path], str]] = {}`

Every section is a pure function `f(path: Path) -> str`. Register it and the CLI, the interactive picker, and the assembler recognize it automatically.
To add a section:
- Write the generator (e.g. `generate_graph(path) -> str`).
- Register it: `SECTIONS["graph"] = generate_graph`.
- Gate it in `applicable_sections()` if it depends on external tools or file markers.
- Wire it into `generate_report()` (see below).
That is the entire ceremony.
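
A minimal sketch of the first two steps, assuming a placeholder body for `generate_graph` (the real generator does not exist yet; it is listed under future zoom levels):

```python
from pathlib import Path
from typing import Callable

# Registry as described above: section name -> pure generator function.
SECTIONS: dict[str, Callable[[Path], str]] = {}


def generate_graph(path: Path) -> str:
    """Hypothetical generator: returns Markdown only, no I/O side effects."""
    return f"## Graph\n\n(import graph for `{path.name}` would go here)\n"


# Registration is a plain dict assignment; no decorator or base class.
SECTIONS["graph"] = generate_graph

# Any caller can now look the section up by name.
report_fragment = SECTIONS["graph"](Path("src"))
```

Gating and wiring into the assembler remain separate, explicit edits.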
| Layer | Symbol(s) | Responsibility |
|---|---|---|
| Generators | `generate_*()` | Markdown only. No notion of files, stdout, or CLI. |
| Assembler | `generate_report()` | Orchestrates generators into the final document. |
| I/O dispatch | `write_files()`, stdout logic in `main()` | Directories, tty detection, `-o`, `--no-file`. |
| CLI / interactive | `main()`, `interactive()` | Argparse, user prompts. |
| Messaging | `Messenger` | stderr only. stdout is sacred for machine output. |
We rejected an automatic loop like `"\n\n".join(SECTIONS[s](path) for s in sections)` because ordering and grouping matter. The Project section must coalesce the `github` and `git` subsections under a single `## Project` heading before `## Structure`. Explicit `if` blocks in `generate_report` make the document structure obvious at a glance and allow sub-section grouping without hacks.
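
The shape of such an assembler might look like the sketch below. The generator bodies are stand-ins and the `generate_report(path, sections)` signature is an assumption; only the grouping under `## Project` and the explicit `if` blocks come from the description above.

```python
from pathlib import Path
from typing import Callable

# Stand-in generators (hypothetical bodies; the real ones inspect the repo).
SECTIONS: dict[str, Callable[[Path], str]] = {
    "github": lambda path: "### GitHub\n\n(remote metadata)",
    "git": lambda path: "### Git\n\n(commit metadata)",
    "structure": lambda path: "## Structure\n\n- README.md",
    "contents": lambda path: "## Contents\n\n(file blocks)",
}


def generate_report(path: Path, sections: list[str]) -> str:
    parts: list[str] = []

    # github and git are grouped under one "## Project" heading, emitted once.
    project: list[str] = []
    if "github" in sections:
        project.append(SECTIONS["github"](path))
    if "git" in sections:
        project.append(SECTIONS["git"](path))
    if project:
        parts.append("## Project\n\n" + "\n\n".join(project))

    # Remaining sections appear in a fixed, readable order.
    if "structure" in sections:
        parts.append(SECTIONS["structure"](path))
    if "contents" in sections:
        parts.append(SECTIONS["contents"](path))

    return "\n\n".join(parts) + "\n"
```

The order of the `if` blocks is the order of the document, regardless of the order in which sections were requested.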
Directory traversal in _walk() and _walk_contents() uses a single, explicit tuple sort key. Priority is hardcoded and obvious:
- `README.md` (case-insensitive)
- `SPEC.md`
- `pyproject.toml`
- Files before directories
- Alphabetical within each group
If you change what "matters" in a repo, change this tuple.
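
A sketch of that tuple as a standalone key function. The name `sort_key` and the helper `ordered_names` are hypothetical, and matching `SPEC.md` and `pyproject.toml` exactly (rather than case-insensitively) is an assumption.

```python
from pathlib import Path


def sort_key(entry: Path) -> tuple[int, int, int, bool, str]:
    """Priority tuple: README, SPEC, pyproject, files before dirs, then A-Z."""
    return (
        0 if entry.name.lower() == "readme.md" else 1,
        0 if entry.name == "SPEC.md" else 1,
        0 if entry.name == "pyproject.toml" else 1,
        entry.is_dir(),        # False sorts before True, so files come first
        entry.name.lower(),    # alphabetical within each group
    )


def ordered_names(root: Path) -> list[str]:
    """Names of the direct children of `root` in traversal order."""
    return [e.name for e in sorted(root.iterdir(), key=sort_key)]
```

Tuples compare element by element, so each earlier slot strictly outranks every later one.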
write_files(report, sections, output_dir, path, msg) splits the report into full.md plus {section}.md per section. It does this by re‑invoking generators, not by parsing full.md. This keeps file writing dumb and generator‑centric; the assembler and dispatch layers never leak into each other.
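
A minimal sketch of that behaviour, assuming stand-in generators, a `None` return value, and an optional `msg` object with an `info` method:

```python
from pathlib import Path
from typing import Callable

# Hypothetical stand-in generators; the real ones walk the repository.
SECTIONS: dict[str, Callable[[Path], str]] = {
    "structure": lambda path: "## Structure\n\n- README.md\n",
    "contents": lambda path: "## Contents\n\n(file blocks)\n",
}


def write_files(report: str, sections: list[str], output_dir: Path,
                path: Path, msg=None) -> None:
    """Write full.md plus one {section}.md per requested section."""
    output_dir.mkdir(parents=True, exist_ok=True)
    (output_dir / "full.md").write_text(report, encoding="utf-8")
    for name in sections:
        # Re-invoke the generator instead of slicing `report` apart.
        body = SECTIONS[name](path)
        (output_dir / f"{name}.md").write_text(body, encoding="utf-8")
        if msg is not None:
            msg.info(f"wrote {name}.md")
```

Re-running a generator costs a second traversal, but the writer never needs to know how the assembler lays out headings.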
Single‑threaded, sequential. Generators are independent and could be parallelised later, but the complexity is not yet justified for the current scale.
| Section | Condition | Data source |
|---|---|---|
| `structure` | Always | `_walk()` traversal |
| `contents` | Always | `_walk_contents()` + `_file_content_block()` |
| `git` | `.git/` exists and `git` in `$PATH` | `fastgit` preferred, falls back to `git` subprocess; cached via `@lru_cache` |
| `github` | `.git/` remote is GitHub or `owner/repo` arg, and `gh` in `$PATH` | `ghapi` preferred, falls back to `gh api` subprocess |
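
The conditions in the table can be sketched as a gating function. This is not the real `applicable_sections(path, msg)`: the name, the `remote_url` and `owner_repo_given` parameters, the injectable `which`, and the returned order are all assumptions made so the logic is easy to test.

```python
from __future__ import annotations

import shutil
from pathlib import Path
from typing import Callable, Optional


def applicable_sections_sketch(
    path: Path,
    remote_url: Optional[str] = None,
    owner_repo_given: bool = False,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return section keys whose preconditions hold for `path`."""
    sections: list[str] = []
    has_repo = (path / ".git").exists()
    remote_is_github = remote_url is not None and "github.com" in remote_url

    # github: GitHub remote or explicit owner/repo argument, and gh on $PATH.
    if (remote_is_github or owner_repo_given) and which("gh"):
        sections.append("github")
    # git: a .git/ directory and the git binary on $PATH.
    if has_repo and which("git"):
        sections.append("git")
    # structure and contents are always available.
    sections += ["structure", "contents"]
    return sections
```

Passing `which` as a parameter keeps the check deterministic in tests; by default it is `shutil.which`.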
Traversal & formatting
- `_walk(path, prefix="")`: recursive tree bullets. Excludes `EXCLUDED_NAMES`. Returns `list[str]`.
- `_walk_contents(path, root)`: recursive content blocks. Sort key: `(0 if readme else 1, 0 if spec else 1, 0 if pyproject else 1, e.is_dir(), e.name.lower())`.
- `_file_content_block(path, root)`: line numbering (`width` right-aligned + `│`), binary detection via null-byte sniff in first 8 KB, dynamic fence depth via `_max_backtick_run(text) + 1`.
- `lang_tag(path)`: strips `text/x-` prefix from `mimetypes`. Falls back to file extension or `"text"`.
- `is_binary(path)`: null byte in first 8 KB; treats `OSError`/`PermissionError` as binary.
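
Two of these helpers are small enough to sketch. The names below drop the leading underscore to mark them as illustrations; `fence_for` is hypothetical, and the floor of three backticks is an assumption added because Markdown fences need at least three.

```python
import re
from pathlib import Path


def max_backtick_run(text: str) -> int:
    """Length of the longest run of consecutive backticks in `text`."""
    return max((len(run) for run in re.findall(r"`+", text)), default=0)


def fence_for(text: str) -> str:
    """A fence one backtick longer than any run inside `text` (minimum 3)."""
    return "`" * max(3, max_backtick_run(text) + 1)


def is_binary(path: Path) -> bool:
    """Null-byte sniff over the first 8 KB; unreadable files count as binary."""
    try:
        with path.open("rb") as fh:
            return b"\0" in fh.read(8192)
    except OSError:  # PermissionError is a subclass of OSError
        return True
```

A file that itself contains a four-backtick fence is wrapped in five, so the outer block cannot be closed early.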
Git & GitHub
- `_git_data(path)`: `@functools.lru_cache(maxsize=1)`. Dict of first/last commit (ISO `%cI`), total count, branch, remote URL, raw `git shortlog -sn` string.
- `_github_data(owner, repo)`: tries `ghapi`, falls back to `gh api` JSON. Returns empty dict if both fail (graceful degradation).
- `_parse_owner_repo(path_or_ref)`: splits `owner/repo` strings. Guards against local paths that happen to contain `/`.
- `run_cmd(cmd, cwd=None)`: `subprocess.run` with 30s timeout. Returns `str | None`.
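
A sketch of the two lowest-level helpers. Returning `None` on a non-zero exit code, stripping the output, and the specific path guards in `parse_owner_repo` are assumptions; the source only fixes the 30-second timeout, the `str | None` return type, and the intent of the guard.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def run_cmd(cmd: list[str], cwd: Path | None = None) -> str | None:
    """Run `cmd`, returning stripped stdout or None on any failure."""
    try:
        proc = subprocess.run(
            cmd, cwd=cwd, capture_output=True, text=True, timeout=30, check=False
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # binary missing, not executable, or too slow
    if proc.returncode != 0:
        return None
    return proc.stdout.strip()


def parse_owner_repo(ref: str) -> tuple[str, str] | None:
    """Split 'owner/repo'; return None for anything that looks like a path."""
    if ref.startswith((".", "/", "~")) or Path(ref).exists():
        return None  # local paths win over remote references
    parts = ref.split("/")
    if len(parts) != 2 or not all(parts):
        return None
    return parts[0], parts[1]
```

Callers can then treat `None` uniformly as "data unavailable" and skip the affected section.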
I/O & messaging
- `write_files(...)`: writes `full.md` + `{section}.md` per section by re-invoking generators.
- `Messenger(level, color)`: `0`=quiet, `1`=normal, `2`=debug. Methods: `info`, `warn`, `error`, `debug`. Writes to `sys.stderr`; color via ANSI if `isatty()` and enabled.
- `applicable_sections(path, msg)`: returns ordered list of section keys available for the given path/context.
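
A minimal sketch of a messenger with those levels. The `stream` parameter, the text labels, the ANSI codes, and silencing `error` at level 0 are assumptions for illustration.

```python
import sys


class Messenger:
    """stderr-only messenger: 0 = quiet, 1 = normal, 2 = debug."""

    def __init__(self, level: int = 1, color: bool = True, stream=None) -> None:
        self.level = level
        # `stream` is a hypothetical hook for tests; the default is stderr.
        self.stream = sys.stderr if stream is None else stream
        # Color only when enabled and the stream is a terminal.
        self.color = color and self.stream.isatty()

    def _emit(self, min_level: int, ansi: str, label: str, text: str) -> None:
        if self.level < min_level:
            return
        tag = f"\033[{ansi}m{label}\033[0m" if self.color else label
        print(f"{tag} {text}", file=self.stream)

    def info(self, text: str) -> None:
        self._emit(1, "36", "info:", text)

    def warn(self, text: str) -> None:
        self._emit(1, "33", "warn:", text)

    def error(self, text: str) -> None:
        self._emit(1, "31", "error:", text)

    def debug(self, text: str) -> None:
        self._emit(2, "2", "debug:", text)
```

Because nothing here touches `sys.stdout`, piping the report to another program stays clean at every verbosity level.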
- Nested fence collision: `_file_content_block` already counts max consecutive backticks via `_max_backtick_run`, but stress-test against nested code blocks of depth > 3.
- Interface map (`signatures`): parse JS/Python with `ast` or `tree-sitter` to emit signature stubs without bodies.
- Relation map (`graph`): static import analysis → Mermaid `graph TD`.
- Directory exclusion: currently hardcoded `EXCLUDED_NAMES`; should accept `.atlasignore` or CLI `--exclude`.
- Multiple repo inputs: `atlas owner/repo1 owner/repo2` for comparative reports.
- Owner/Name inference: currently defaults to `git config user.name`; should prefer GitHub `owner` from remote URL when available.
Apache‑2.0