memorialiste

La mémorialiste visits your repository, reads what changed since its last visit, writes the missing chapters of your project's story, and leaves a merge request behind.

A one-shot CLI tool that keeps documentation up-to-date with source code changes. Each run computes a git diff since the last documentation update, calls an OpenAI-compatible LLM to rewrite the affected docs, and opens a Merge/Pull Request.

How it works (and why the manifest matters)

memorialiste does NOT regenerate docs from scratch every time. Instead it keeps each doc file in sync with a specific slice of the source tree, and regenerates a doc only when its slice has changed.

The link between docs and code lives in a single file: docs/.docstructure.yaml. It lists every doc file you want managed by the tool, and tells it which source paths each doc cares about:

docs:
  - path: docs/user/guide.md
    audience: end users
    covers: [cmd/, cliconfig/]
    description: User-facing CLI guide.       # human-only note, NOT sent to LLM
    prompt: |                                 # this IS sent to LLM
      Audience: engineers who want to USE the tool. Do NOT document internal
      packages, contributor workflow, or release process.

  - path: docs/architecture.md
    audience: developers
    covers: [context/, generate/, output/, platform/]
    description: Internal architecture: packages, abstractions, data flow.
    prompt: |
      Audience: developers extending memorialiste. Include a Mermaid component
      diagram. Do NOT include user-facing CLI flags or installation steps.

Note on description: this field is a human-only annotation. It is parsed but not sent to the LLM. To steer the model (scope rules, negative prompts, "do not include X"), use the per-doc prompt: field (appended as an extra user message after the diff) or system_prompt: (replaces the built-in system prompt). See Per-Doc Overrides.

Each run, for every entry, memorialiste:

1. Reads the generated_at watermark from the doc file's frontmatter (the commit SHA the doc was last regenerated against).
2. Computes a git diff filtered to that entry's covers paths only.
3. If the diff is empty, skips this doc entirely: no LLM call, no wasted tokens.
4. Otherwise feeds the diff and the current doc body to the LLM, then writes the refreshed body back with the bumped watermark.
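The skip decision in the steps above can be sketched in Go. This is a minimal illustration, not memorialiste's implementation: it assumes covers entries are matched as plain path prefixes (the real matching rules may differ), and the entry type, the changedUnder helper, and the docs/other.md entry are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// entry mirrors one manifest item; only the fields used here are modelled.
type entry struct {
	Path   string
	Covers []string
}

// changedUnder returns the changed files that sit under any covers prefix.
// Assumption: covers are simple path prefixes such as "cmd/".
func changedUnder(changed, covers []string) []string {
	var hits []string
	for _, f := range changed {
		for _, c := range covers {
			if strings.HasPrefix(f, c) {
				hits = append(hits, f)
				break
			}
		}
	}
	return hits
}

func main() {
	// Files touched since the watermark commit (sample data).
	changed := []string{"cmd/main.go", "platform/gitlab.go"}
	docs := []entry{
		{Path: "docs/user/guide.md", Covers: []string{"cmd/", "cliconfig/"}},
		{Path: "docs/architecture.md", Covers: []string{"context/", "generate/", "output/", "platform/"}},
		{Path: "docs/other.md", Covers: []string{"vendor/"}}, // hypothetical entry
	}
	for _, d := range docs {
		hits := changedUnder(changed, d.Covers)
		if len(hits) == 0 {
			fmt.Printf("%s: skip, no LLM call\n", d.Path)
			continue
		}
		fmt.Printf("%s: regenerate from %v\n", d.Path, hits)
	}
}
```

With this sample data the user guide only sees the cmd/ change, the architecture doc only sees the platform/ change, and the third entry is skipped.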

This is why the manifest is non-negotiable: without per-doc covers, every doc would see every change, the LLM would burn tokens deciding what's relevant, and the user guide would get rewritten whenever you edit internal plumbing.

The audience field is also used to name the auto-created branch (docs/memorialiste-developers, docs/memorialiste-end-users, …) so MR lists stay readable.
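As a rough sketch of how such branch names could be derived, the Go snippet below joins the branch prefix with a slug of the audience. The slug rule (lower-case, whitespace turned into hyphens) is an assumption inferred from the two example names above, and branchName is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// branchName joins the branch prefix and a slug of the audience.
// Assumption: the slug is the lower-cased audience with runs of
// whitespace collapsed into single hyphens.
func branchName(prefix, audience string) string {
	slug := strings.Join(strings.Fields(strings.ToLower(audience)), "-")
	return prefix + slug
}

func main() {
	fmt.Println(branchName("docs/memorialiste-", "developers"))
	fmt.Println(branchName("docs/memorialiste-", "end users"))
}
```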

Installation

docker pull idconstruct/memorialiste:latest

Pin a specific version for reproducibility:

docker pull idconstruct/memorialiste:v0.3.1

Usage

GitLab CI

update-docs:
  image: idconstruct/memorialiste:latest
  variables:
    MEMORIALISTE_AST_CONTEXT: "true"
  script:
    - memorialiste
      --provider-url "$OLLAMA_URL"
      --model "qwen3-coder:30b"
      --platform gitlab
      --platform-token "$GITLAB_TOKEN"
      --project-id "$CI_PROJECT_ID"
      --dry-run=false
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

GitHub Actions

- name: Update docs
  run: |
    docker run --rm --network=host \
      -v ${{ github.workspace }}:/repo \
      -e MEMORIALISTE_PLATFORM_TOKEN=${{ secrets.GITHUB_TOKEN }} \
      idconstruct/memorialiste:latest \
      --repo /repo \
      --provider-url "$OLLAMA_URL" \
      --model qwen3-coder:30b \
      --platform github \
      --project-id "${{ github.repository }}" \
      --dry-run=false \
      --ast-context

Local dry-run

docker run --rm --network=host --user $(id -u):$(id -g) \
  -v $(pwd):/repo \
  idconstruct/memorialiste:latest \
  --repo /repo \
  --provider-url http://localhost:11434 \
  --model qwen3-coder:30b \
  --ast-context

Using Claude, Gemini, GPT-4 and other models

memorialiste talks to LLMs exclusively via the OpenAI-compatible /v1/chat/completions API. There is no native Anthropic / Google / OpenAI SDK. To use any non-Ollama model, run an OpenAI-compatible proxy that translates requests to the target provider's native API. memorialiste itself needs zero changes — you point --provider-url at the proxy and adjust --model.

Self-hosted: LiteLLM

LiteLLM supports ~100 models (Claude, Gemini, Bedrock, Vertex AI, etc.) and runs as a Docker sidecar.

# docker-compose.yml
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    ports: ["4000:4000"]
    environment:
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
      OPENAI_API_KEY:    ${OPENAI_API_KEY}

memorialiste --provider-url http://litellm:4000 --model claude-3-5-sonnet-20241022

Self-hosted: one-api

one-api is an aggregator with a web UI that exposes the same OpenAI-compatible surface. Point --provider-url at its base URL.

Cloud: OpenRouter

OpenRouter routes to Claude, GPT-4, Gemini and many others via a single OpenAI-compat endpoint.

memorialiste \
  --provider-url https://openrouter.ai/api/v1 \
  --api-key "$OPENROUTER_API_KEY" \
  --model anthropic/claude-3.5-sonnet

The --api-key value is sent as Authorization: Bearer <key> to the provider. Combine with --model-params to tune temperature, top_p, etc.
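To make the header behaviour concrete, here is a small Go sketch that builds (without sending) a chat-completions request carrying a Bearer token. The helper names and the sample key are hypothetical, the full endpoint URL is passed in explicitly because how memorialiste joins --provider-url with the path is not specified here, and omitting the header for an empty key is an assumption.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// newChatRequest builds (but does not send) a chat-completions POST.
// A non-empty apiKey is attached as "Authorization: Bearer <key>".
func newChatRequest(endpoint, apiKey string, body []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if apiKey != "" { // assumption: no header when the key is empty
		req.Header.Set("Authorization", "Bearer "+apiKey)
	}
	return req, nil
}

// authHeader returns the Authorization header such a request would carry.
func authHeader(endpoint, apiKey string) string {
	req, err := newChatRequest(endpoint, apiKey, []byte(`{}`))
	if err != nil {
		return ""
	}
	return req.Header.Get("Authorization")
}

func main() {
	// Hypothetical key; a real run would take it from --api-key.
	fmt.Println(authHeader("https://openrouter.ai/api/v1/chat/completions", "sk-example"))
}
```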

CLI Flags & Environment Variables

All flags can be set via environment variables (uppercase snake_case with MEMORIALISTE_ prefix). Flags take precedence over env vars.

Flag	Env var	Default	Description
`--provider-url`	`MEMORIALISTE_PROVIDER_URL`	`http://localhost:11434`	OpenAI-compatible base URL
`--model`	`MEMORIALISTE_MODEL`	`qwen3-coder:30b`	Model tag
`--model-params`	`MEMORIALISTE_MODEL_PARAMS`	`""`	Extra model params JSON (e.g. `{"temperature":0.2}`)
`--system-prompt`	`MEMORIALISTE_SYSTEM_PROMPT`	built-in	System prompt literal OR `@path/to/file`
`--prompt`	`MEMORIALISTE_PROMPT`	`""`	Additional user prompt literal OR `@path/to/file` (appended after diff)
`--language`	`MEMORIALISTE_LANGUAGE`	`english`	Output language; substituted into `{language}` placeholder
`--api-key`	`MEMORIALISTE_API_KEY`	`""`	Bearer token for the LLM provider
`--doc-structure`	`MEMORIALISTE_DOC_STRUCTURE`	`docs/.docstructure.yaml`	Path to the doc structure manifest
`--repo`	`MEMORIALISTE_REPO`	`.`	Local git repository root
`--token-budget`	`MEMORIALISTE_TOKEN_BUDGET`	`12000`	Max diff tokens before summarisation kicks in
`--dry-run`	`MEMORIALISTE_DRY_RUN`	`true`	Write files locally; skip branch+commit+platform
`--branch-prefix`	`MEMORIALISTE_BRANCH_PREFIX`	`docs/memorialiste-`	Branch name prefix
`--ast-context`	`MEMORIALISTE_AST_CONTEXT`	`false`	Enable AST-enriched diff context via grep-ast
`--code-search`	`MEMORIALISTE_CODE_SEARCH`	`false`	Expose the AST `search_code` tool to the LLM (function calling)
`--code-search-max-turns`	`MEMORIALISTE_CODE_SEARCH_MAX_TURNS`	`10`	Max tool-call turns before aborting
`--repo-meta`	`MEMORIALISTE_REPO_META`	`basic`	Repo metadata level: `basic` or `extended`
`--watermarks-file`	`MEMORIALISTE_WATERMARKS_FILE`	`""`	Sidecar YAML file storing `generated_at` watermarks; when empty, watermarks live in doc frontmatter
`--llm-timeout`	`MEMORIALISTE_LLM_TIMEOUT`	`5m`	Per-request timeout for LLM provider HTTP calls (e.g. `5m`, `30s`, `10m30s`). Overridable per-doc via manifest `llm_timeout`.
`--validate`	`MEMORIALISTE_VALIDATE`	`warn`	Post-generation Markdown validator: `off`, `warn`, `fix`, or `strict`. See Output Validation.
`--validate-batch`	`MEMORIALISTE_VALIDATE_BATCH`	`warn`	Cross-doc / source-of-truth validator: `off`, `warn`, or `strict`. See Batch rules.
`--platform-timeout`	`MEMORIALISTE_PLATFORM_TIMEOUT`	`60s`	Per-request timeout for GitLab/GitHub HTTP calls and git push
`--ast-parse-timeout`	`MEMORIALISTE_AST_PARSE_TIMEOUT`	`5s`	Per-file timeout for Go AST parsing inside the `search_code` tool
`--platform`	`MEMORIALISTE_PLATFORM`	`gitlab`	`gitlab` or `github`
`--platform-url`	`MEMORIALISTE_PLATFORM_URL`	platform default	Base URL for self-hosted instances
`--platform-token`	`MEMORIALISTE_PLATFORM_TOKEN`	required (non-dry-run)	Platform access token
`--project-id`	`MEMORIALISTE_PROJECT_ID`	required (non-dry-run)	GitLab project ID or `owner/repo`
`--base-branch`	`MEMORIALISTE_BASE_BRANCH`	`main`	Target branch for the opened MR/PR
`--version`	—	—	Print version and exit
`--help`	—	—	Show grouped help
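The flag-to-variable naming rule used throughout the table (uppercase snake_case with the MEMORIALISTE_ prefix) can be expressed as a short Go function. This is only an illustration of the naming convention; envName is a hypothetical helper, not part of the tool.

```go
package main

import (
	"fmt"
	"strings"
)

// envName maps a long flag such as "--provider-url" to its
// environment variable name, e.g. "MEMORIALISTE_PROVIDER_URL".
func envName(flag string) string {
	name := strings.TrimLeft(flag, "-")       // drop the leading dashes
	name = strings.ReplaceAll(name, "-", "_") // kebab-case to snake_case
	return "MEMORIALISTE_" + strings.ToUpper(name)
}

func main() {
	for _, f := range []string{"--provider-url", "--code-search-max-turns", "--dry-run"} {
		fmt.Printf("%s -> %s\n", f, envName(f))
	}
}
```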

Watermark Format

Every generated doc file carries YAML frontmatter:

---
generated_at: abc1234def5
---

# Your Doc Title
...

The tool reads generated_at to compute the diff since the last run. A file without frontmatter is treated as never generated (full-repo diff scoped to the entry's covers paths).
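A minimal Go sketch of reading the watermark is shown below. It scans the leading frontmatter by hand rather than using a YAML parser, which is a simplification chosen to keep the example dependency-free; the watermark function is hypothetical and the tool's own parsing may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// watermark returns the generated_at value from leading frontmatter,
// or "" when the document has no frontmatter or no such key
// (the "never generated" case).
func watermark(doc string) string {
	lines := strings.Split(doc, "\n")
	if strings.TrimSpace(lines[0]) != "---" {
		return "" // no frontmatter at all
	}
	for _, line := range lines[1:] {
		line = strings.TrimSpace(line)
		if line == "---" {
			break // end of frontmatter
		}
		if strings.HasPrefix(line, "generated_at:") {
			v := strings.TrimSpace(strings.TrimPrefix(line, "generated_at:"))
			return strings.Trim(v, `"'`) // tolerate quoted values
		}
	}
	return ""
}

func main() {
	doc := "---\ngenerated_at: abc1234def5\n---\n\n# Your Doc Title\n"
	fmt.Println(watermark(doc))
	fmt.Println(watermark("# No frontmatter\n") == "")
}
```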

Sidecar watermarks (clean Markdown)

To keep generated Markdown free of frontmatter, set watermarks_file in the manifest (globally under defaults: or per-entry):

defaults:
  watermarks_file: .memorialiste-watermarks.yaml
docs:
  - path: docs/architecture.md
    covers: [internal/]

In sidecar mode each doc file is written verbatim and the generated_at SHA per file is stored in the sidecar YAML:

- path: docs/architecture.md
  generated_at: abc1234def5

Migration between modes is bidirectional and lossless in one transition run: if a doc has frontmatter but the sidecar lacks a record, the frontmatter is used; if the doc has no frontmatter but another entry's sidecar holds a record, that one is used. The next write places the watermark in the canonical location declared by the manifest.

Per-Doc Overrides

Entries (and an optional defaults: block) may override the following fields, which are otherwise taken from CLI flags or env vars:

model, model_params, language, system_prompt, prompt, ast_context, code_search, code_search_max_turns, repo_meta, token_budget, watermarks_file, llm_timeout, validate.

Precedence (lowest → highest): kong defaults < manifest defaults < manifest per-doc entry < env var (MEMORIALISTE_*) < explicit CLI flag.
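The layering can be pictured as "last non-empty value wins" when the layers are listed from lowest to highest precedence. The Go sketch below assumes an empty string means "not set", which is a simplification (the real tool distinguishes unset from empty through its flag parser); resolve is a hypothetical helper.

```go
package main

import "fmt"

// resolve returns the value from the highest-precedence layer that set one.
// Layers are passed lowest to highest; "" stands for "not set".
func resolve(layers ...string) string {
	value := ""
	for _, l := range layers {
		if l != "" {
			value = l
		}
	}
	return value
}

func main() {
	// Order: kong default, manifest defaults, per-doc entry, env var, CLI flag.
	fmt.Println(resolve("qwen3-coder:30b", "gpt-oss:120b", "gemma3:27b", "", ""))
	// Passing --model on the command line overrides the per-doc entry.
	fmt.Println(resolve("qwen3-coder:30b", "gpt-oss:120b", "gemma3:27b", "", "qwen3:14b"))
}
```

The first call yields the per-doc model, the second yields the CLI value.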

Steering the LLM per doc

Three fields actually reach the model:

Field	What it does	When to use
`system_prompt`	Replaces the built-in system prompt entirely. Supports `@path/to/file.md` to load from disk.	You want full control over the model's role and output format (e.g. CONTRIBUTING.md style, AI-readable JSON).
`prompt`	Appended as an extra user message after the diff context. Built-in system prompt stays. Supports `@path/to/file.md` to load from disk — handy for sharing scope rules across docs.	Scope rules, negative prompts ("do NOT include env vars"), audience hints. This is what you want 90% of the time.
`description`	Human-only annotation. Not sent to the LLM.	Documenting the manifest for the next reader.

Mixing per-doc models with prompt: lets you tune cost vs. quality per audience — e.g. small model for the lightweight user guide, large model for the architecture doc:

defaults:
  model: gpt-oss:120b

docs:
  - path: docs/user-guide.md
    model: gemma3:27b              # cheaper, terse user-facing prose
    covers: [cmd/, cliconfig/]
    prompt: |
      Audience: end users. Do NOT include env vars, internal package names,
      or architecture diagrams.

  - path: docs/architecture.md
    # inherits model from defaults
    covers: [context/, generate/, output/, platform/]
    prompt: |
      Include a Mermaid sequenceDiagram for the generation pipeline. List
      every package's public surface. Do NOT cover CLI usage.

Gotcha: the --model CLI flag wins over manifest per-doc model: (CLI explicit beats manifest entry, see precedence above). If you want per-doc models from the manifest, do not pass --model on the command line and unset MEMORIALISTE_MODEL.

Output Validation

LLMs regularly leak Markdown artifacts that the system prompt can't fully prevent. The built-in validator runs after generation and before write.

Modes (CLI --validate=MODE, env MEMORIALISTE_VALIDATE, or per-doc validate.mode in the manifest):

Mode	Behaviour
`off`	Skip validation.
`warn` (default)	Log every finding; continue.
`fix`	Apply safe auto-fixes (currently: LaTeX inline-math macros → Unicode); log residual findings; continue.
`strict`	Same as `warn`, but exit non-zero if any finding remains.

Built-in rules:

ID	What it catches	Auto-fix
`markdown.bold-balanced`	Odd number of `**` markers on a line.	—
`markdown.fence-closed`	Unclosed fenced code block.	—
`markdown.table-columns`	Table row with a different column count from the header.	—
`latex.no-inline-math`	`$...$` LaTeX inline math.	Known macros (`\to`, `\rightarrow`, `\leftarrow`, `\cdot`, `\Rightarrow`) → Unicode glyph.
`latex.no-display-math`	`$$...$$` LaTeX display math.	—
`markdown.no-html-line-break`	Raw `<br>` / `<br/>` in text.	—
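To give a feel for what two of these rules check, here is a simplified Go sketch of markdown.bold-balanced and markdown.fence-closed. It is an approximation under stated assumptions, not the validator's code: it does not skip bold markers inside code fences, it only recognises backtick fences, and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// fence is the three-backtick code fence marker.
var fence = strings.Repeat("`", 3)

// boldUnbalanced reports 1-based line numbers with an odd count of "**".
func boldUnbalanced(md string) []int {
	var bad []int
	for i, line := range strings.Split(md, "\n") {
		if strings.Count(line, "**")%2 == 1 {
			bad = append(bad, i+1)
		}
	}
	return bad
}

// fenceUnclosed reports whether a code fence is still open at end of input.
func fenceUnclosed(md string) bool {
	open := false
	for _, line := range strings.Split(md, "\n") {
		if strings.HasPrefix(strings.TrimSpace(line), fence) {
			open = !open // every fence line toggles the state
		}
	}
	return open
}

func main() {
	doc := "**ok**\n**broken\n" + fence + "go\nx := 1\n"
	fmt.Println(boldUnbalanced(doc), fenceUnclosed(doc))
}
```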

Per-doc override (mode + list of disabled rules):

docs:
  - path: docs/architecture.md
    covers: [internal/]
    validate:
      mode: fix
      disable: [markdown.no-html-line-break]

Batch rules

Per-doc rules can't catch semantic drift between docs of a single run (e.g. INJECTION_DETECTION_THRESHOLD = 0.90 in admin.md and 0.8 in security.md) or doc-vs-source mismatch. After all docs are generated, a separate batch pass runs:

ID	What it catches
`defaults.cross-doc-consistency`	The same env var has different defaults in two or more docs of the run.
`defaults.source-of-truth`	A doc's default disagrees with the Go source (`env:"NAME"` + `envDefault:"X"` / `default:"X"` struct tags; supports inhuman/config and kelseyhightower/envconfig).
`manifest.uncovered-changes`	A file exists in the repo that no entry's `covers:` matches — typically "added a new module / Terraform resource folder but forgot to wire it into the manifest".

The two defaults.* rules extract (env, default) pairs from Markdown tables only; prose and fenced code blocks are ignored to avoid false positives on example snippets.
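The comparison step behind defaults.cross-doc-consistency can be sketched as follows. The Go example assumes the (env, default) pairs have already been extracted from each doc's tables and are compared as plain strings; the conflicts function and the input shape are hypothetical, and the sample values reuse the example above.

```go
package main

import (
	"fmt"
	"sort"
)

// conflicts returns the env var names whose documented default
// differs between at least two docs, sorted alphabetically.
// Input shape: doc path -> env var name -> documented default.
func conflicts(defaults map[string]map[string]string) []string {
	seen := map[string]string{} // env var -> first default encountered
	bad := map[string]bool{}
	for _, pairs := range defaults {
		for env, def := range pairs {
			if prev, ok := seen[env]; !ok {
				seen[env] = def
			} else if prev != def {
				bad[env] = true
			}
		}
	}
	names := make([]string, 0, len(bad))
	for env := range bad {
		names = append(names, env)
	}
	sort.Strings(names)
	return names
}

func main() {
	fmt.Println(conflicts(map[string]map[string]string{
		"admin.md":    {"INJECTION_DETECTION_THRESHOLD": "0.90"},
		"security.md": {"INJECTION_DETECTION_THRESHOLD": "0.8"},
	}))
}
```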

Modes (CLI --validate-batch, env MEMORIALISTE_VALIDATE_BATCH, or defaults.validate.batch_mode in the manifest):

Mode	Behaviour
`off`	Skip batch pass entirely.
`warn` (default)	Log findings; continue.
`strict`	Exit non-zero if any finding remains — recommended for CI.

There is intentionally no fix mode for batch — when the doc and source disagree, only a human can decide which one is wrong.

defaults.validate.uncovered_ignore silences manifest.uncovered-changes on intentionally-uncovered paths. Items are gitignore-style patterns; prefix @ to load patterns from a file (.gitignore / .dockerignore / custom):

defaults:
  validate:
    batch_mode: strict
    uncovered_ignore:
      - "@.gitignore"      # reuse existing ignore rules
      - "@.dockerignore"
      - "vendor/"
      - ".github/"
      - "*.gen.go"
      - "README.md"

Per-doc opt-out (e.g. to exclude a changelog from cross-doc checks):

defaults:
  validate:
    batch_mode: strict
docs:
  - path: docs/changelog.md
    covers: [.]
    validate:
      disable: [defaults.cross-doc-consistency]

Vocabulary Extraction

Small local LLMs invent plausible-sounding identifier names — fake Prometheus metrics, hallucinated OTel spans, made-up env-var enum values. The vocabulary feature stops this by extracting the real identifiers from the source tree and (a) prepending them to the system prompt, and (b) flagging the doc afterwards if it references anything else.

How it works

For each doc entry the tool walks covers: paths, applies the configured regex rules, builds a per-category list of literal identifiers, and prepends a ## VERIFIED VOCABULARY block to the system prompt:

## VERIFIED VOCABULARY

Use ONLY these literal strings — do NOT invent variants, prefixes, or
plural/singular forms. If a name you want is not listed below, omit it.

### prometheus_metrics
- jarvis_agent_events_writer_buffer_used
- jarvis_forced_summary_invocations_total

### otel_spans
- llm.call
- mcp.tool

Built-in defaults target a typical Go project (metrics, OTel spans, env vars, CLI flags, registered handlers). They live in vocabulary/defaults.yaml and are embedded in the binary.

Configuration

defaults:
  vocabulary:
    enabled: true              # default true
    inject_into_prompt: true   # default true
    rules:                     # REPLACES built-in defaults wholesale
      - name: prometheus_metrics
        scope: '*/metrics.go'
        pattern: 'Name:\s*"([a-z_][a-z0-9_]*)"'
      - name: env_vars
        scope: 'cliconfig/*.go'
        pattern: '`env:"([A-Z_][A-Z0-9_]*)".*envDefault:"([^"]*)"`'
        format: '{1} (default: {2})'
        max_entries: 100
    extra_rules:               # APPENDED to defaults
      - name: mcp_tools
        scope: 'internal/mcp/*.go'
        pattern: 'd\.Register\("([a-z][a-z0-9_-]+)"'

Each rule needs name, scope (glob), and pattern (RE2 regex with ≥1 capture group). Optional: capture (default 1), format (default {1}), max_entries (default 200).

When a category exceeds max_entries, identifiers from files changed since the doc's last watermark come first; the rest are alphabetical. The tool warns once per truncated category and once per zero-match rule.
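How a rule's pattern and format might combine can be illustrated with Go's regexp package, which uses RE2 syntax. The sketch below applies the env_vars pattern from the configuration above to a sample struct field; the sample field, the constants, and the extract helper are hypothetical, and the {N} substitution is an assumption about how format is interpreted.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// envPattern is the env_vars rule pattern, including its literal backticks.
const envPattern = "`env:\"([A-Z_][A-Z0-9_]*)\".*envDefault:\"([^\"]*)\"`"

// sample is a made-up Go struct field carrying env tags.
const sample = "Model string `env:\"MEMORIALISTE_MODEL\" envDefault:\"qwen3-coder:30b\"`"

// extract applies pattern to src and renders every match through format,
// where {1}, {2}, ... are replaced by the corresponding capture groups.
func extract(pattern, format, src string) []string {
	re := regexp.MustCompile(pattern)
	var out []string
	for _, m := range re.FindAllStringSubmatch(src, -1) {
		s := format
		for i := 1; i < len(m); i++ {
			s = strings.ReplaceAll(s, "{"+strconv.Itoa(i)+"}", m[i])
		}
		out = append(out, s)
	}
	return out
}

func main() {
	for _, id := range extract(envPattern, "{1} (default: {2})", sample) {
		fmt.Println(id)
	}
}
```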

Unknown-identifier validator

The batch rule doc.unknown-identifiers runs in the --validate-batch pipeline (off, warn, or strict). After generation it extracts identifier-shaped tokens from the doc (backtick-wrapped lowercase identifiers, UPPER_SNAKE, and dotted word.word span names) and flags any that are neither in the verified vocabulary nor present as a whole-word match in any covers: file.

defaults:
  validate:
    batch_mode: strict
    unknown_identifiers:
      enabled: true        # opt-in (default false)
      min_length: 6        # default 6
      ignore:              # gitignore-style + @file refs
        - '@.gitignore'
        - 'foo_*'
      categories:          # optional allowlist
        - prometheus_metrics
        - otel_spans

Use validate.disable: [doc.unknown-identifiers] on a specific doc to exempt it (e.g. changelogs).

In CI: pair vocabulary.enabled: true with unknown_identifiers.enabled: true + batch_mode: strict to block MR/PR creation when the LLM ships hallucinated identifiers.
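A much-reduced Go sketch of the unknown-identifier check follows. It covers only backtick-wrapped lowercase identifiers and only the vocabulary lookup, leaving out the UPPER_SNAKE and dotted shapes and the whole-word search in covers: files. The regular expression, the unknown function, and the second metric name in the sample are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// backticked matches a backtick-wrapped lowercase identifier.
// Assumption: this approximates one of the token shapes described above.
var backticked = regexp.MustCompile("`([a-z][a-z0-9_]*)`")

// unknown returns identifiers of at least minLen characters that appear
// in doc but not in the verified vocabulary, in order of first appearance.
func unknown(doc string, vocab map[string]bool, minLen int) []string {
	var out []string
	seen := map[string]bool{}
	for _, m := range backticked.FindAllStringSubmatch(doc, -1) {
		id := m[1]
		if len(id) < minLen || vocab[id] || seen[id] {
			continue
		}
		seen[id] = true
		out = append(out, id)
	}
	return out
}

func main() {
	vocab := map[string]bool{"jarvis_agent_events_writer_buffer_used": true}
	// jarvis_writer_queue_depth is a made-up name standing in for a hallucination.
	doc := "Watch `jarvis_agent_events_writer_buffer_used` and `jarvis_writer_queue_depth`; see `go` docs."
	fmt.Println(unknown(doc, vocab, 6))
}
```

Here the short token go falls under min_length and the first metric is in the vocabulary, so only the invented name is reported.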

Repository Metadata

The LLM receives a compact metadata block prepended to its prompt so it can write accurate version numbers:

=== Repository metadata ===
Latest tag: v0.3.1
HEAD: 53ebb4b...
Short SHA: 53ebb4b
=== End metadata ===

--repo-meta=extended adds remote URL (token-redacted), branch, last 5 tags with dates — useful for CHANGELOG / release-notes documents.

AST-Enriched Context

--ast-context runs every changed file through grep-ast's TreeContext renderer, so the model sees enclosing function signatures and surrounding code structure instead of raw +/- lines. This significantly improves output quality for code-heavy docs.

AST Code Search

--code-search exposes a search_code function-calling tool to the LLM. Mid-generation the model may ask for any Go declaration in the repo by regex name match; the tool returns the matched function, method, type, const, or var bodies with file paths and line ranges. Useful when the diff alone lacks context (e.g. a doc covering one package references symbols defined in another).

Bounded by --code-search-max-turns (default 10) and a per-file parse timeout (--ast-parse-timeout, default 5s). The provider must implement OpenAI-style function calling and emit proper tool_calls (not stringified JSON in content). Verified working on local Ollama: qwen3:14b, qwen3.6:35b, gpt-oss:120b.

Models that return finish_reason: stop with a JSON blob in content (e.g. qwen2.5-coder:7b, sometimes qwen3-coder:30b with large contexts) do not follow the API correctly. In that case memorialiste prints a "WARNING — the model did not call any tools" line and proceeds with diff-only output. If the provider rejects a tools-shaped request entirely, memorialiste fails fast with an actionable error suggesting --code-search=false.

Tip — when to combine with --ast-context: AST context already embeds the enclosing function/method around every changed line, so tool-capable models often skip search_code entirely when AST is on. Use --code-search ALONE when you want the model to pull in declarations referenced by the diff but defined far away from it; use both flags together for the most thorough context (the model picks what it needs).

Architecture Diagrams

The built-in system prompt encourages the LLM to emit Mermaid diagrams (```mermaid fenced blocks) when the diff touches architecture, data flow, or component relationships. GitLab and GitHub render Mermaid natively in Markdown previews. No rendering toolchain required.

Runtime Dependencies

The Docker image bundles:

Tool	Version	Purpose
`grep-ast`	0.5.0	AST-enriched diff context (`--ast-context`)
`tree-sitter`	0.20.4	Required by grep-ast
`tree-sitter-languages`	1.10.2	Language grammars for grep-ast

These are only invoked when --ast-context is enabled.

Examples

See examples/ for ready-to-run scenarios:

Scenario	What it shows
`01-user-guide`	Plain end-user guide; built-in prompt; minimal config
`02-architecture`	Developer-facing architecture overview with AST + Mermaid
`03-developer-onboarding`	Custom system prompt for contributor onboarding
`04-ai-readable`	Dense LLM-readable project context (think `CLAUDE.md`)
`05-russian-docs`	`--language russian` (works for any language)
`06-changelog`	CHANGELOG via `--repo-meta=extended`
`07-codesearch`	`--code-search` — model pulls Go declarations via function calling
`ci-gitlab`	Drop-in `.gitlab-ci.yml`
`ci-github`	Drop-in GitHub Actions workflow

Every doc-scenario folder contains an executable run.sh that you can invoke locally against a running Ollama.

Library Usage

memorialiste is also a Go library: use the manifest, context, generate, output, and platform packages directly. See package godoc.

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/inhuman/memorialiste/manifest"
    mctx "github.com/inhuman/memorialiste/context"
)

func main() {
    m, err := manifest.Parse("docs/.docstructure.yaml") // load the doc manifest
    if err != nil {
        log.Fatal(err)
    }
    dc, err := mctx.Assemble(context.Background(), m.Docs[0], mctx.Options{ // first entry only
        RepoPath:    ".",
        ASTContext:  true,
        TokenBudget: 12000,
    })
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(dc.Diff)
}

Contributing

Bug reports, feature requests, and pull requests are welcome.

Bug? Use the bug report template — capture the exact CLI invocation, env vars, and log lines.
Feature idea? Use the feature request template — describe the scenario, not just the wish.
Want to send a PR? Read CONTRIBUTING.md first — it covers the dev loop, project conventions, commit-message style, and the pre-release local Docker smoke that every release tag must pass.

Quick dev loop:

git clone https://github.com/inhuman/memorialiste.git
cd memorialiste
go test ./...
go vet ./...
docker build -t memorialiste:dev --build-arg VERSION=dev .
