[STORY] Wiki Cache Migration to PostgreSQL #429

@jsbattig

Description

Part of: #408

[Conversation Reference: "wiki cache: move to postgres, render once, read from cache always"]

Story Overview

Objective: Move the rendered wiki cache from local filesystem storage to PostgreSQL so that all cluster nodes share the same rendered wiki content. When a wiki page is rendered (markdown to HTML), the result is stored in PostgreSQL and served from there on all subsequent requests from any node. Render once, read from cache always.

User Value: Wiki pages are rendered once (on whichever node handles the first request) and all subsequent requests from any cluster node serve the cached result instantly. No duplicate rendering across nodes, consistent wiki content cluster-wide.

Acceptance Criteria

AC1: Wiki Cache Table in PostgreSQL

Scenario: Rendered wiki content is stored in a PostgreSQL table.

Given a wiki page is rendered from markdown source
When the rendered HTML is cached
Then it is stored in a wiki_cache table in PostgreSQL
And the table stores: repo_alias, page_path, rendered_html, source_hash, rendered_at
And the source_hash allows detecting when the markdown source has changed
And the rendered_html is served on subsequent requests without re-rendering

Technical Requirements:

  • wiki_cache table in PostgreSQL schema (add to migration files)
  • Columns: repo_alias (TEXT), page_path (TEXT), rendered_html (TEXT), source_hash (TEXT), rendered_at (TIMESTAMPTZ), metadata (JSONB)
  • Composite primary key: (repo_alias, page_path)
  • WikiCacheBackend Protocol and PostgreSQL implementation
  • Source hash: SHA256 of the markdown source content
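The story names a `WikiCacheBackend` Protocol but does not spell out its shape. A minimal sketch of what it might look like, assuming hypothetical `get`/`put` method names and a `WikiCacheEntry` row type mirroring the columns above:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Protocol, runtime_checkable


@dataclass
class WikiCacheEntry:
    """One row of wiki_cache (mirrors the column list above)."""
    repo_alias: str
    page_path: str
    rendered_html: str
    source_hash: str
    rendered_at: datetime
    metadata: Optional[dict] = None


@runtime_checkable
class WikiCacheBackend(Protocol):
    """Shared interface for the PostgreSQL and SQLite implementations."""

    def get(self, repo_alias: str, page_path: str,
            source_hash: str) -> Optional[str]:
        """Return cached rendered_html if an entry with this exact hash exists."""
        ...

    def put(self, entry: WikiCacheEntry) -> None:
        """Insert or update, keyed on the (repo_alias, page_path) primary key."""
        ...
```

Both backends satisfy this structurally (no inheritance needed), which is what lets the factory in Story 9 swap them freely.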

AC2: Cache-First Read Pattern

Scenario: Wiki page requests check cache before rendering.

Given a request for wiki page /repo-alias/docs/getting-started.md
When the wiki service handles the request
Then it first checks the wiki_cache table for (repo_alias, page_path)
And if a cached entry exists and source_hash matches current source
Then it returns the cached rendered_html (no rendering)
And if no cache entry exists or source_hash differs
Then it renders the markdown, stores result in wiki_cache, and returns it

Technical Requirements:

  • Cache lookup: SELECT rendered_html FROM wiki_cache WHERE repo_alias=%s AND page_path=%s AND source_hash=%s
  • Cache miss: render markdown, INSERT...ON CONFLICT DO UPDATE into wiki_cache
  • Source hash comparison ensures stale cache is detected when source changes
  • Rendering only happens once per source version, regardless of which node
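The read path above can be sketched backend-agnostically; `cache` and `render` are stand-ins, not names from the codebase:

```python
import hashlib
from typing import Callable


def get_wiki_html(cache, repo_alias: str, page_path: str,
                  markdown_source: str,
                  render: Callable[[str], str]) -> str:
    """Cache-first read: return cached HTML on a hash match,
    otherwise render, store, and return."""
    source_hash = hashlib.sha256(markdown_source.encode("utf-8")).hexdigest()
    cached = cache.get(repo_alias, page_path, source_hash)
    if cached is not None:
        return cached  # cache hit: no rendering
    html = render(markdown_source)
    cache.put(repo_alias, page_path, html, source_hash)  # upsert on the PK
    return html
```

Because the lookup includes `source_hash`, a stale row behaves exactly like a miss, so one code path covers both "no entry" and "source changed".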

AC3: Cache Invalidation on Source Change

Scenario: When wiki source files change (repo refresh), cache entries are invalidated.

Given a golden repo is refreshed and wiki source files have changed
When a query requests a wiki page whose source has changed
Then the source_hash in the cache no longer matches the new source
And the page is re-rendered and the cache is updated
And subsequent requests get the new rendered content

Technical Requirements:

  • Source hash computed from current file content on each request (cheap, <1ms for typical markdown)
  • No explicit invalidation needed -- hash mismatch triggers re-render
  • Old cache entries for deleted pages cleaned up periodically
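The invalidation scheme rests entirely on the hash function: any edit to the source yields a new digest, so stale rows simply stop matching. A sketch of AC1's SHA-256 requirement:

```python
import hashlib


def source_hash(markdown: str) -> str:
    """SHA-256 hex digest of the markdown source. Any change to the
    content changes the digest, so no explicit invalidation is needed."""
    return hashlib.sha256(markdown.encode("utf-8")).hexdigest()
```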

AC4: Bulk Cache Cleanup

Scenario: Stale cache entries for deleted repos or pages are cleaned up.

Given wiki cache entries exist for a repository that has been removed
When the periodic cleanup runs
Then cache entries for non-existent repos are deleted
And cache entries for non-existent pages within active repos are deleted

Technical Requirements:

  • Cleanup: DELETE FROM wiki_cache WHERE repo_alias NOT IN (SELECT alias_name FROM global_repos)
  • Page-level cleanup during repo refresh (optional; hash mismatch already covers changed pages, and rows for deleted pages are caught by the bulk cleanup)
  • Cleanup integrated with DataRetentionScheduler or separate periodic task
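The bulk-cleanup DELETE can be exercised end-to-end. A self-contained sketch using SQLite in place of PostgreSQL; table and column names follow the story (`wiki_cache`, `global_repos.alias_name`), the rest is illustrative:

```python
import sqlite3


def cleanup_wiki_cache(conn: sqlite3.Connection) -> int:
    """Delete cache rows whose repo no longer exists in global_repos.
    Returns the number of rows removed."""
    cur = conn.execute(
        "DELETE FROM wiki_cache "
        "WHERE repo_alias NOT IN (SELECT alias_name FROM global_repos)"
    )
    conn.commit()
    return cur.rowcount
```

The same statement runs unchanged on PostgreSQL, which is what makes it a natural fit for the DataRetentionScheduler.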

AC5: SQLite Backend for Standalone Mode

Scenario: In standalone mode, wiki cache uses SQLite or local storage.

Given the server is running in standalone mode
When wiki caching is used
Then it uses a SQLite-backed WikiCacheBackend
And behavior is identical to the PostgreSQL version
And the Protocol interface is the same

Technical Requirements:

  • WikiCacheSqliteBackend implementation
  • Same WikiCacheBackend Protocol as PostgreSQL version
  • Factory (Story 9) creates appropriate backend based on mode
  • wiki_cache table in local SQLite database
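A minimal `WikiCacheSqliteBackend` sketch; the upsert uses SQLite's `INSERT ... ON CONFLICT`, mirroring the PostgreSQL version, though the exact class shape in the codebase may differ:

```python
import sqlite3
from typing import Optional


class WikiCacheSqliteBackend:
    """SQLite-backed wiki cache for standalone mode."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS wiki_cache (
                   repo_alias TEXT NOT NULL,
                   page_path TEXT NOT NULL,
                   rendered_html TEXT NOT NULL,
                   source_hash TEXT NOT NULL,
                   rendered_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
                   metadata TEXT,
                   PRIMARY KEY (repo_alias, page_path)
               )"""
        )

    def get(self, repo_alias: str, page_path: str,
            source_hash: str) -> Optional[str]:
        row = self.conn.execute(
            "SELECT rendered_html FROM wiki_cache "
            "WHERE repo_alias=? AND page_path=? AND source_hash=?",
            (repo_alias, page_path, source_hash),
        ).fetchone()
        return row[0] if row else None

    def put(self, repo_alias: str, page_path: str,
            rendered_html: str, source_hash: str) -> None:
        self.conn.execute(
            "INSERT INTO wiki_cache "
            "(repo_alias, page_path, rendered_html, source_hash) "
            "VALUES (?, ?, ?, ?) "
            "ON CONFLICT (repo_alias, page_path) DO UPDATE SET "
            "rendered_html=excluded.rendered_html, "
            "source_hash=excluded.source_hash, "
            "rendered_at=CURRENT_TIMESTAMP",
            (repo_alias, page_path, rendered_html, source_hash),
        )
        self.conn.commit()
```

`TIMESTAMPTZ` and `JSONB` have no SQLite equivalents, so `rendered_at` and `metadata` degrade to TEXT; the Protocol surface stays identical to the PostgreSQL backend.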

Implementation Status

  • Core implementation complete
  • Unit tests passing
  • Integration tests passing
  • E2E tests passing
  • Code review approved
  • Manual E2E testing completed
  • Documentation updated

Technical Implementation Details

Schema

CREATE TABLE wiki_cache (
    repo_alias TEXT NOT NULL,
    page_path TEXT NOT NULL,
    rendered_html TEXT NOT NULL,
    source_hash TEXT NOT NULL,
    rendered_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    metadata JSONB,
    PRIMARY KEY (repo_alias, page_path)
);

CREATE INDEX idx_wiki_cache_repo ON wiki_cache (repo_alias);

File Structure

src/code_indexer/server/storage/postgres/
    wiki_cache_backend.py      # WikiCachePostgresBackend

src/code_indexer/server/storage/
    protocols.py               # Add WikiCacheBackend Protocol

Cache Flow

request for wiki page
    |
    v
compute source_hash from current markdown file
    |
    v
SELECT from wiki_cache WHERE repo_alias=? AND page_path=? AND source_hash=?
    |
    +-- found? --> return cached rendered_html
    |
    +-- not found? --> render markdown to HTML
                           |
                           v
                    INSERT/UPDATE wiki_cache
                           |
                           v
                    return rendered_html

Migration File

Add wiki_cache table to migration system (new migration file if 001 already applied, or include in 001 if not yet deployed).

Testing Requirements

  • Automated: Cache stores rendered HTML and returns it on subsequent requests.
  • Automated: Source hash change triggers re-render and cache update.
  • Automated: Cache miss triggers render and cache store.
  • Automated: Bulk cleanup removes entries for deleted repos.
  • Automated: Both SQLite and PostgreSQL backends satisfy the Protocol.
  • Manual E2E: In cluster mode, request a wiki page on Node A (triggers render), request same page on Node B (should be instant cache hit from PostgreSQL).

Definition of Done

  • wiki_cache table in PostgreSQL schema
  • WikiCachePostgresBackend satisfies WikiCacheBackend Protocol
  • WikiCacheSqliteBackend for standalone mode
  • Render-once-serve-everywhere pattern verified across nodes
  • Source hash invalidation works on content change
  • Bulk cleanup for deleted repos/pages
  • All tests pass
