1iis/atlas-alpha


atlas

Codebase → LLM‑friendly Markdown report. Single file, zero‑dependency core, zero bloat.

Install: uv tool install atlas (or pip install . from clone).


30‑second usage

atlas                            # interactive setup
atlas .                          # all sections, non‑interactive
atlas structure contents ./src   # only these sections
atlas -o reports/ --no-file .    # stdout only; nothing written to disk
atlas owner/repo                 # GitHub‑only for remote repos (requires gh)

| Flag | Effect |
| --- | --- |
| -C <path> | chdir first (git‑style) |
| -o <dir> | Output directory (default ./atlas/) |
| --no-file | Skip file output entirely |
| -v | Force stdout even in tty |
| -q | Suppress stderr (level 0) |
| --debug | Verbose stderr (level 2) |
| --no-color | Strip ANSI from stderr |
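
The flags above could be declared with argparse roughly as follows. This is a minimal sketch, not the project's actual parser: the function name build_parser, the dest names, and the single catch-all positional targets are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flag table; dest names are assumed."""
    parser = argparse.ArgumentParser(prog="atlas")
    # Section names and the path (or owner/repo) arrive as plain positionals.
    parser.add_argument("targets", nargs="*")
    parser.add_argument("-C", dest="chdir", metavar="<path>")
    parser.add_argument("-o", dest="output_dir", metavar="<dir>", default="./atlas/")
    parser.add_argument("--no-file", action="store_true")
    parser.add_argument("-v", dest="force_stdout", action="store_true")
    # -q and --debug share one verbosity level: 0 quiet, 1 normal, 2 debug.
    parser.add_argument("-q", dest="level", action="store_const", const=0, default=1)
    parser.add_argument("--debug", dest="level", action="store_const", const=2, default=1)
    parser.add_argument("--no-color", action="store_true")
    return parser
```

Sharing one dest between -q and --debug keeps the verbosity a single integer, which matches the 0/1/2 levels described for Messenger.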

Philosophy: map and territory

A codebase is the territory: ground truth, irregular, vast.
An LLM dropped into fifty thousand lines of raw source with no topology is a traveller without a compass.

Atlas produces maps. A map is a deliberate abstraction of the territory for a specific purpose. It is not better than the code; it is useful because it omits. The traveller requests ground truth via tools when needed; the map provides the topology to know where to look.

Current zoom levels:

| Map type | Section | What it is |
| --- | --- | --- |
| Cartographic outline | structure | File tree. Shape of the land. |
| Full survey | contents | Every line of every file. Complete territory. |

Future zoom levels:

| Map type | Section | What it is |
| --- | --- | --- |
| Interface map | signatures | Function/class signatures, public APIs, type stubs. The index of the territory. |
| Relation map | graph | Import graph, call graph, class hierarchy. Mermaid/DOT. The roads of the territory. |
| Chronicle map | chronicle | Commit history as narrative. The geological layers. |

You choose the resolution. Run atlas structure for a 500‑token briefing, or atlas contents for a 50k‑token deep dive. An LLM with a good map knows which files to request when it needs the territory itself.


Programming model: how to extend

Atlas is not a framework. No decorators, no metaclasses, no plugin loader, no BaseSection inheritance. Indirection is the enemy of readability; the registry is the only extension point.

The registry

SECTIONS: dict[str, Callable[[Path], str]] = {}

Every section is a pure function f(path: Path) -> str. Register it and the CLI, the interactive picker, and the assembler recognize it automatically.

To add a section:

  1. Write the generator (e.g. generate_graph(path) -> str).
  2. Register it: SECTIONS["graph"] = generate_graph.
  3. Gate it in applicable_sections() if it depends on external tools or file markers.
  4. Wire it into generate_report() — see below.

That is the entire ceremony.
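
Steps 1 and 2 can be sketched as below. The body of generate_graph is a hypothetical placeholder (it only lists Python files); a real graph section would do actual import analysis.

```python
from pathlib import Path
from typing import Callable

# The registry as described above: section key -> pure generator function.
SECTIONS: dict[str, Callable[[Path], str]] = {}


def generate_graph(path: Path) -> str:
    """Hypothetical generator: lists top-level Python files as a stand-in."""
    names = sorted(p.name for p in path.glob("*.py"))
    body = "\n".join(f"- {name}" for name in names) or "- (no Python files)"
    return f"## Graph\n\n{body}"


# Registration is a plain dict assignment; no decorator or base class involved.
SECTIONS["graph"] = generate_graph
```

Because the generator only takes a Path and returns a string, it can be exercised in isolation, without the CLI or the assembler.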

Architectural boundaries

| Layer | Symbol(s) | Responsibility |
| --- | --- | --- |
| Generators | generate_*() | Markdown only. No notion of files, stdout, or CLI. |
| Assembler | generate_report() | Orchestrates generators into the final document. |
| I/O dispatch | write_files(), stdout logic in main() | Directories, tty detection, -o, --no-file. |
| CLI / interactive | main(), interactive() | Argparse, user prompts. |
| Messaging | Messenger | stderr only. stdout is sacred for machine output. |

Why explicit assembly?

We rejected an automatic loop like "\n\n".join(SECTIONS[s](path) for s in sections) because ordering and grouping matter. The Project section must coalesce github and git subsections under a single ## Project heading before ## Structure. Explicit if blocks in generate_report make the document structure obvious at a glance and allow sub‑section grouping without hacks.
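
A minimal sketch of what explicit assembly could look like. The four generators are hypothetical stubs, and the ### level for the git and github subsections is an assumption; only the single ## Project heading ahead of ## Structure comes from the description above.

```python
from pathlib import Path


# Hypothetical stubs standing in for the real generators.
def generate_github(path: Path) -> str:
    return "### GitHub\n\n(stub)"


def generate_git(path: Path) -> str:
    return "### Git\n\n(stub)"


def generate_structure(path: Path) -> str:
    return "## Structure\n\n(stub)"


def generate_contents(path: Path) -> str:
    return "## Contents\n\n(stub)"


def generate_report(path: Path, sections: list[str]) -> str:
    parts: list[str] = []

    # github and git are grouped under one heading, whichever are requested.
    project: list[str] = []
    if "github" in sections:
        project.append(generate_github(path))
    if "git" in sections:
        project.append(generate_git(path))
    if project:
        parts.append("## Project\n\n" + "\n\n".join(project))

    # Document order is fixed here, not by the order of the request.
    if "structure" in sections:
        parts.append(generate_structure(path))
    if "contents" in sections:
        parts.append(generate_contents(path))

    return "\n\n".join(parts) + "\n"
```

The order of the if blocks is the order of the document, so a caller passing sections in any order still gets Project, then Structure, then Contents.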

Sorting and traversal

Directory traversal in _walk() and _walk_contents() uses a single, explicit tuple sort key. Priority is hardcoded and obvious:

  1. README.md (case‑insensitive)
  2. SPEC.md
  3. pyproject.toml
  4. Files before directories
  5. Alphabetical within each group

If you change what "matters" in a repo, change this tuple.
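
The priority list maps onto a tuple key like the one below. This is a sketch using pathlib; the helper names sort_key and ordered_entries are hypothetical, and the exact-case match for SPEC.md and pyproject.toml is an assumption (only README.md is stated to be case‑insensitive).

```python
from pathlib import Path


def sort_key(entry: Path) -> tuple[int, int, int, bool, str]:
    """Tuple compared left to right; lower values sort first."""
    name = entry.name
    return (
        0 if name.lower() == "readme.md" else 1,  # 1. README.md, any case
        0 if name == "SPEC.md" else 1,            # 2. SPEC.md
        0 if name == "pyproject.toml" else 1,     # 3. pyproject.toml
        entry.is_dir(),                           # 4. files (False) before dirs (True)
        name.lower(),                             # 5. alphabetical within each group
    )


def ordered_entries(directory: Path) -> list[Path]:
    """Hypothetical helper: directory children in report order."""
    return sorted(directory.iterdir(), key=sort_key)
```

Promoting another file means adding one more element ahead of the is_dir() term.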

File output

write_files(report, sections, output_dir, path, msg) splits the report into full.md plus {section}.md per section. It does this by re‑invoking generators, not by parsing full.md. This keeps file writing dumb and generator‑centric; the assembler and dispatch layers never leak into each other.
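
A sketch of that behaviour under a few assumptions: the stub registry stands in for the real generators, msg is treated as optional, and returning the list of written paths is added here only to make the example easy to check.

```python
from pathlib import Path
from typing import Callable

# Stub registry for the example; the real SECTIONS holds the actual generators.
SECTIONS: dict[str, Callable[[Path], str]] = {
    "structure": lambda path: "## Structure\n\n(stub)\n",
    "contents": lambda path: "## Contents\n\n(stub)\n",
}


def write_files(report: str, sections: list[str], output_dir: Path,
                path: Path, msg=None) -> list[Path]:
    output_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []

    # The assembled report is written verbatim.
    full = output_dir / "full.md"
    full.write_text(report, encoding="utf-8")
    written.append(full)

    # Per-section files come from calling the generator again,
    # so nothing here needs to understand the layout of full.md.
    for name in sections:
        target = output_dir / f"{name}.md"
        target.write_text(SECTIONS[name](path), encoding="utf-8")
        written.append(target)

    if msg is not None:
        msg.info(f"wrote {len(written)} files to {output_dir}")
    return written
```

The cost is that each generator runs twice when files are written, which is the trade the paragraph above accepts in exchange for never parsing full.md.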

Concurrency

Single‑threaded, sequential. Generators are independent and could be parallelised later, but the complexity is not yet justified for the current scale.


Section reference

| Section | Condition | Data source |
| --- | --- | --- |
| structure | Always | _walk() traversal |
| contents | Always | _walk_contents() + _file_content_block() |
| git | .git/ exists and git in $PATH | fastgit preferred, falls back to git subprocess; cached via @lru_cache |
| github | .git/ remote is GitHub or owner/repo arg, and gh in $PATH | ghapi preferred, falls back to gh api subprocess |
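
The conditions in this table could be gated along the following lines. It is a simplified sketch: the GitHub check just looks for github.com in .git/config, the owner/repo argument case is left out, and the returned order (github, git, structure, contents) is an assumption.

```python
import shutil
from pathlib import Path


def applicable_sections(path: Path, msg=None) -> list[str]:
    """Sketch of section gating; msg is accepted but optional here."""
    sections: list[str] = []
    git_dir = path / ".git"

    if git_dir.is_dir():
        config = git_dir / "config"
        # Simplification: any mention of github.com counts as a GitHub remote.
        remote_is_github = (
            config.is_file()
            and "github.com" in config.read_text(encoding="utf-8", errors="replace")
        )
        if remote_is_github and shutil.which("gh"):
            sections.append("github")
        if shutil.which("git"):
            sections.append("git")
        elif msg is not None:
            msg.warn("git not found in PATH; skipping git section")

    # structure and contents have no preconditions.
    sections += ["structure", "contents"]
    return sections
```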

Internal symbol quick‑reference

Traversal & formatting

  • _walk(path, prefix="") — recursive tree bullets. Excludes EXCLUDED_NAMES. Returns list[str].
  • _walk_contents(path, root) — recursive content blocks. Sort key: (0 if readme else 1, 0 if spec else 1, 0 if pyproject else 1, e.is_dir(), e.name.lower()).
  • _file_content_block(path, root) — line numbering (width right‑aligned + ), binary detection via null‑byte sniff in first 8KB, dynamic fence depth via _max_backtick_run(text) + 1.
  • lang_tag(path) — strips text/x- prefix from mimetypes. Falls back to file extension or "text".
  • is_binary(path) — null byte in first 8KB; treats OSError / PermissionError as binary.

Git & GitHub

  • _git_data(path) — @functools.lru_cache(maxsize=1). Dict of first/last commit (ISO %cI), total count, branch, remote URL, raw git shortlog -sn string.
  • _github_data(owner, repo) — tries ghapi, falls back to gh api JSON. Returns empty dict if both fail (graceful degradation).
  • _parse_owner_repo(path_or_ref) — splits owner/repo strings. Guards against local paths that happen to contain /.
  • run_cmd(cmd, cwd=None) — subprocess.run with 30s timeout. Returns str | None.

I/O & messaging

  • write_files(...) — writes full.md + {section}.md per section by re‑invoking generators.
  • Messenger(level, color) — 0=quiet, 1=normal, 2=debug. Methods: info, warn, error, debug. Writes to sys.stderr; color via ANSI if isatty() and enabled.
  • applicable_sections(path, msg) — returns ordered list of section keys available for the given path/context.
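
A minimal sketch of a Messenger with the levels and methods listed above. The "label: text" line format, the ANSI color codes, and the private _emit helper are assumptions; that level 0 silences even errors follows from -q being described as suppressing stderr.

```python
import sys


class Messenger:
    """stderr-only messaging; 0=quiet, 1=normal, 2=debug."""

    def __init__(self, level: int = 1, color: bool = True) -> None:
        self.level = level
        self.color = color

    def _emit(self, min_level: int, label: str, ansi: str, text: str) -> None:
        if self.level < min_level:
            return
        stream = sys.stderr  # looked up per call; stdout is never touched
        if self.color and stream.isatty():
            label = f"\033[{ansi}m{label}\033[0m"
        print(f"{label}: {text}", file=stream)

    def info(self, text: str) -> None:
        self._emit(1, "info", "36", text)

    def warn(self, text: str) -> None:
        self._emit(1, "warn", "33", text)

    def error(self, text: str) -> None:
        self._emit(1, "error", "31", text)

    def debug(self, text: str) -> None:
        self._emit(2, "debug", "2", text)
```

Keeping every message on stderr means the report on stdout can be piped into another tool without filtering.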

Known gaps & next steps

  1. Nested fence collision — _file_content_block already counts max consecutive ` via _max_backtick_run, but stress‑test against nested code blocks of depth > 3.
  2. Interface map (signatures) — parse JS/Python with ast or tree‑sitter to emit signature stubs without bodies.
  3. Relation map (graph) — static import analysis → Mermaid graph TD.
  4. Directory exclusion — currently hardcoded EXCLUDED_NAMES; should accept .atlasignore or CLI --exclude.
  5. Multiple repo inputs — atlas owner/repo1 owner/repo2 for comparative reports.
  6. Owner/Name inference — currently defaults to git config user.name; should prefer GitHub owner from remote URL when available.

License

Apache‑2.0
