jagoff/ChromaInspector

ChromaInspector

Minimal, YAML-configured Streamlit admin for a local ChromaDB (persistent mode). Browse collections, filter records by metadata, inspect documents by ID, and watch a real-time dashboard fed by obsidian-rag telemetry.

Built to work alongside a live Chroma writer (e.g. rag index, rag watch): it opens the SQLite file with ?mode=ro&immutable=1, so reads take no locks and never block the writer.
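As a rough illustration of the read-only open described above, here is a minimal sketch using Python's standard sqlite3 module. The helper name open_readonly is hypothetical and is not taken from src/chroma_client.py; only the ?mode=ro&immutable=1 query string comes from this README.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path: str) -> sqlite3.Connection:
    """Open a SQLite file read-only, without taking any file locks."""
    # as_uri() produces file:///abs/path with special characters escaped.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro&immutable=1"
    # uri=True makes sqlite3 interpret the string as a URI, not a filename.
    return sqlite3.connect(uri, uri=True)
```

Any write attempted through such a connection raises sqlite3.OperationalError. SQLite's documentation notes that immutable=1 disables locking and change detection, so results can be stale or inconsistent if the file changes mid-read; reopening the connection gives a fresh view.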

Run

cd ChromaInspector
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
streamlit run app.py

Opens at http://localhost:8501 (or 8502 if 8501 is busy).

Features

Dashboard (landing page)

Four-column KPIs — collections, total embeddings, dimension, DB size on disk. Per-collection bar chart with empty-collection filtering. Deep-dive: metadata key coverage + top string values.

Chat performance (auto-refresh 3 s)

Parsed from ~/.local/share/obsidian-rag/web.log [chat-timing] lines:

  • p50 / p95 / p99 of total, ttft, retrieve, prefill, decode
  • Tokens/sec throughput (p50, p95)
  • Latency line chart + stacked retrieve/prefill/decode bars
  • Model usage distribution
  • Slowest-10 expander
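For readers unfamiliar with the p50 / p95 / p99 figures, the sketch below computes them with the nearest-rank method. The sample latencies are made up, and the dashboard may use a different percentile definition (for example, linear interpolation).

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    # Multiply before dividing to keep the rank exact for whole-number pct.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical total latencies (seconds) for ten chat turns.
totals = [0.8, 1.1, 0.9, 4.2, 1.0, 1.3, 0.7, 2.5, 1.2, 0.95]
summary = {f"p{p}": percentile(totals, p) for p in (50, 95, 99)}
```

With only ten samples, p95 and p99 both land on the slowest turn, which is why tail percentiles are most informative over larger windows.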

Retrieval quality (auto-refresh 30 s)

From eval.jsonl — hit@5, MRR, chain_success for both singles and chains, with trend line chart.
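The two standard metrics named above can be sketched as follows. This uses the textbook definitions of hit@k and MRR; the eval.jsonl schema is not shown in this README, so the inputs here are plain Python lists with made-up note paths, and chain_success is not covered.

```python
def hit_at_k(ranked: list[str], relevant: set[str], k: int = 5) -> float:
    """1.0 if any relevant item appears in the top k results, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean(xs: list[float]) -> float:
    return sum(xs) / len(xs) if xs else 0.0


# Hypothetical eval cases: (retrieved paths in rank order, expected paths).
cases = [
    (["a.md", "b.md", "c.md"], {"b.md"}),  # first relevant hit at rank 2
    (["x.md", "y.md"], {"z.md"}),          # miss
    (["n.md"], {"n.md"}),                  # hit at rank 1
]
hit5 = mean([hit_at_k(r, rel) for r, rel in cases])
mrr = mean([reciprocal_rank(r, rel) for r, rel in cases])
```

Here two of three cases hit within the top 5, and MRR averages 1/2, 0 and 1.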

User feedback (auto-refresh 10 s)

From feedback.jsonl — 👍 / 👎 counts + ratio, top 👎 reasons, expander with recent negative ratings.

Query quality (auto-refresh 5 s)

From queries.jsonl — top_score distribution histogram, command usage, hot notes (most-retrieved paths), query gaps (weak queries with top_score < 0.10).
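The query-gap rule (top_score < 0.10) can be sketched as a small JSONL filter. Only the top_score field and the 0.10 threshold come from this README; the q field in the sample records is a hypothetical name for the query text.

```python
import json

GAP_THRESHOLD = 0.10  # weak-query cutoff stated in the README


def find_gaps(lines: list[str], threshold: float = GAP_THRESHOLD) -> list[dict]:
    """Return parsed records whose top_score is below the threshold."""
    gaps = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a line the logger has not finished writing
        score = rec.get("top_score") if isinstance(rec, dict) else None
        if isinstance(score, (int, float)) and score < threshold:
            gaps.append(rec)
    return gaps


# Hypothetical log lines; "q" is an assumed field name.
sample = [
    '{"q": "tax deadline", "top_score": 0.04}',
    '{"q": "project roadmap", "top_score": 0.62}',
    '{"q": "half-written',
]
gaps = find_gaps(sample)
```

Skipping unparseable lines rather than raising keeps the panel usable while the log is being appended to.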

System resources (auto-refresh 5 s)

From rag_{cpu,memory}.jsonl — area charts of CPU % and memory (MB) by category, plus top-memory-consumers table.

Per-view modes

Each view in config.yaml exposes three modes in the sidebar:

  • Browse — paginate records with an optional where JSON filter.
  • Filter — metadata-only filter (semantic search only works if the config's embedding function matches the collection's).
  • Inspect — fetch a single document by ID.
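To make the where JSON filter concrete, the sketch below evaluates a Chroma-style where clause against plain metadata dicts. It supports the usual operators ($eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $and, $or). It is an illustration only, not the app's actual filtering code, and the folder and words metadata keys are hypothetical.

```python
import json
import operator
from typing import Any

_OPS = {
    "$eq": operator.eq, "$ne": operator.ne,
    "$gt": operator.gt, "$gte": operator.ge,
    "$lt": operator.lt, "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(metadata: dict[str, Any], where: dict[str, Any]) -> bool:
    """True if the metadata dict satisfies every condition in where."""
    for key, cond in where.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in cond):
                return False
        else:
            # Simplification: a record lacking the key never matches.
            if key not in metadata:
                return False
            if not isinstance(cond, dict):
                cond = {"$eq": cond}  # {"k": v} is shorthand for $eq
            for op, operand in cond.items():
                if not _OPS[op](metadata[key], operand):
                    return False
    return True


# What a user might paste into the Browse filter box (hypothetical keys).
where = json.loads('{"$and": [{"folder": "projects"}, {"words": {"$gte": 200}}]}')
records = [
    {"folder": "projects", "words": 350},
    {"folder": "projects", "words": 40},
    {"folder": "journal", "words": 900},
]
kept = [r for r in records if matches(r, where)]
```

Only the first record satisfies both conditions, so kept contains a single entry.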

Config

Everything lives in config.yaml. Edit it, save, refresh the page.

  • chroma.mode: persistent — points at ~/.local/share/obsidian-rag/chroma by default.
  • chroma.mode: http — swap to hit a remote Chroma server.
  • views[] — each entry becomes a sidebar item. display_fields picks which metadata columns show up in tables.
  • saved_queries[] — one-click presets for the filter view.
  • features — write paths (allow_delete, allow_add) are off by default.
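As a rough picture of how these keys fit together, the sketch below shows the kind of dictionary a YAML loader might produce from config.yaml, plus a helper that treats missing feature flags as off. Only chroma.mode, views, display_fields, saved_queries, features, allow_delete and allow_add come from this README; every other key and value (path, name, collection, where, the field names) is a placeholder assumption, not the bundled config.

```python
from typing import Any

config: dict[str, Any] = {
    "chroma": {
        "mode": "persistent",  # or "http" for a remote Chroma server
        "path": "~/.local/share/obsidian-rag/chroma",  # key name assumed
    },
    "views": [
        {
            "name": "Notes",                      # assumed key: sidebar label
            "collection": "my_collection",        # assumed key, placeholder value
            "display_fields": ["title", "path"],  # placeholder field names
        }
    ],
    "saved_queries": [
        {"name": "Inbox", "where": {"folder": "inbox"}},  # assumed shape
    ],
    "features": {},  # nothing set, so write paths stay off
}


def feature_enabled(cfg: dict[str, Any], flag: str) -> bool:
    """Treat a missing features section or flag as disabled."""
    return bool((cfg.get("features") or {}).get(flag, False))
```

Defaulting to False means a write path such as allow_delete has to be switched on explicitly before the UI exposes it.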

Architecture

  • app.py — Streamlit entrypoint + sidebar navigation.
  • src/chroma_client.py — SQLite-direct reader for chroma.sqlite3 (no ChromaDB client, avoids the single-process constraint of PersistentClient).
  • src/metrics.py — tailing parsers for obsidian-rag telemetry (web.log, queries.jsonl, eval.jsonl, feedback.jsonl, rag_{cpu,memory}.jsonl).
  • src/views/ — one module per page (collections.py, dashboard.py, browse.py, search.py, inspect.py).

Real-time updates are driven by @st.fragment(run_every=...): only the telemetry fragments re-read logs on each tick, while ChromaDB itself is opened once (snapshot read).
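One way a telemetry fragment could re-read a log cheaply on each tick is to remember a byte offset and return only the lines appended since the previous call. The Tail class below is a hypothetical sketch of that idea, not the code in src/metrics.py.

```python
import os


class Tail:
    """Return only the complete lines appended since the previous call."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.offset = 0  # byte position of the next unread line

    def read_new(self) -> list[str]:
        try:
            size = os.path.getsize(self.path)
        except FileNotFoundError:
            return []
        if size < self.offset:
            self.offset = 0  # file was truncated or rotated: start over
        with open(self.path, "rb") as fh:
            fh.seek(self.offset)
            chunk = fh.read()
        # Stop at the last newline so a half-written line waits for the next tick.
        end = chunk.rfind(b"\n") + 1
        self.offset += end
        return chunk[:end].decode("utf-8", errors="replace").splitlines()
```

A function decorated with @st.fragment(run_every=...) could keep one Tail per log file in session state and call read_new() on every tick, so each refresh costs only the newly appended bytes.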

Notes

  • The bundled config.yaml targets obsidian-rag's collections (obsidian_notes_v9_*, obsidian_urls_v1_*). Adjust paths / collection names for other Chroma deployments.
  • Chroma's default embedder has a different dimension than bge-m3 (what obsidian-rag uses), so query_texts-based semantic search won't work against those collections from this UI. Use metadata filters here; use the RAG CLI for semantic queries.
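The dimension mismatch can be made concrete: Chroma's default embedding model (all-MiniLM-L6-v2) produces 384-dimensional vectors, while bge-m3 dense embeddings have 1024 dimensions. The guard below is a hypothetical sketch of the check a UI could run before attempting a vector query; the function name is illustrative.

```python
DEFAULT_CHROMA_DIM = 384  # all-MiniLM-L6-v2, Chroma's default embedder
BGE_M3_DIM = 1024         # bge-m3 dense embeddings


def check_dimension(query_embedding: list[float], collection_dim: int) -> None:
    """Raise if a query vector cannot be compared with the stored vectors."""
    if len(query_embedding) != collection_dim:
        raise ValueError(
            f"query has {len(query_embedding)} dimensions, "
            f"collection expects {collection_dim}"
        )
```

A vector from the default embedder would fail this check against a bge-m3 collection, which is why the UI falls back to metadata filters.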
