Minimal, YAML-configured Streamlit admin for a local ChromaDB (persistent mode). Browse collections, filter records by metadata, inspect documents by ID, and watch a real-time dashboard fed by obsidian-rag telemetry.
Built to work alongside a live Chroma writer (e.g. rag index, rag watch): it opens the SQLite database with ?mode=ro&immutable=1, so concurrent reads take no locks and never block the writer.
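For illustration, a connection with those URI parameters can be opened from Python's standard library as sketched below. This is not the project's own code: the helper name `open_readonly` and the throwaway demo database are hypothetical, and only the `mode=ro&immutable=1` parameters come from the text above.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_readonly(db_path: str) -> sqlite3.Connection:
    """Open a SQLite file read-only with locking disabled (immutable=1)."""
    # as_uri() yields a file:// URI and percent-encodes special characters.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro&immutable=1"
    return sqlite3.connect(uri, uri=True)


# Demo against a throwaway database (a stand-in for chroma.sqlite3).
demo_db = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")
writer = sqlite3.connect(demo_db)
writer.execute("CREATE TABLE t (x INTEGER)")
writer.execute("INSERT INTO t VALUES (1)")
writer.commit()
writer.close()

reader = open_readonly(demo_db)
rows = reader.execute("SELECT x FROM t").fetchall()

try:
    reader.execute("INSERT INTO t VALUES (2)")
    write_blocked = False
except sqlite3.OperationalError:
    # The connection is read-only, so any write attempt is rejected.
    write_blocked = True
```

Because `immutable=1` tells SQLite to assume the file does not change, a reader opened this way sees whatever is on disk at read time and does not coordinate with the writer.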
cd ChromaInspector
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
streamlit run app.py
Opens at http://localhost:8501 (or 8502 if 8501 is busy).
Four-column KPIs — collections, total embeddings, dimension, DB size on disk. Per-collection bar chart with empty-collection filtering. Deep-dive: metadata key coverage + top string values.
Parsed from ~/.local/share/obsidian-rag/web.log [chat-timing] lines:
- p50 / p95 / p99 of `total`, `ttft`, `retrieve`, `prefill`, `decode`
- Tokens/sec throughput (p50, p95)
- Latency line chart + stacked retrieve/prefill/decode bars
- Model usage distribution
- Slowest-10 expander
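As a rough illustration of the percentile figures listed above, the sketch below computes p50 / p95 / p99 with the nearest-rank method. The function names and sample latencies are hypothetical, and the dashboard may well use a different (for example interpolating) percentile definition.

```python
import math


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank into the sorted samples
    return ordered[rank - 1]


def summarize(values: list[float]) -> dict[str, float]:
    """Return the three percentiles shown on the dashboard for one timing field."""
    return {f"p{p}": percentile(values, p) for p in (50, 95, 99)}


# Hypothetical `total` latencies in seconds.
total_s = [0.8, 1.1, 0.9, 4.2, 1.0, 1.3, 0.7, 1.2, 2.5, 1.4]
summary = summarize(total_s)
```

With only ten samples, p95 and p99 both land on the slowest request, which is why tail percentiles are only meaningful once enough log lines have accumulated.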
From eval.jsonl — hit@5, MRR, chain_success for both singles and chains, with trend line chart.
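For readers unfamiliar with the retrieval metrics named above, here is a minimal sketch of hit@5 and MRR using their standard definitions. The case structure and document IDs are hypothetical, the actual layout of eval.jsonl is not shown in this README, and chain_success is left out because its definition is not given here.

```python
def hit_at_k(ranked: list[str], relevant: set[str], k: int = 5) -> float:
    """1.0 if any relevant document appears in the top k results, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(cases: list[tuple[list[str], set[str]]]) -> dict[str, float]:
    """Average both metrics over (ranked results, relevant set) pairs."""
    n = len(cases)
    return {
        "hit@5": sum(hit_at_k(r, rel) for r, rel in cases) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in cases) / n,
    }


# Hypothetical evaluation cases.
cases = [
    (["a", "b", "c"], {"a"}),                      # relevant at rank 1
    (["x", "y", "z", "w", "v", "target"], {"target"}),  # relevant at rank 6, outside top 5
    (["p", "q"], {"q"}),                           # relevant at rank 2
]
report = evaluate(cases)
```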
From feedback.jsonl — 👍 / 👎 counts + ratio, top 👎 reasons, expander with recent negative ratings.
From queries.jsonl — top_score distribution histogram, command usage, hot notes (most-retrieved paths), query gaps (weak queries with top_score < 0.10).
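The query-gap rule above (top_score < 0.10) could be applied to JSONL records roughly as follows. This is a sketch under assumptions: `top_score` is taken to be a numeric field on each record, while the `q` field and the sample lines are hypothetical.

```python
import json

GAP_THRESHOLD = 0.10  # threshold stated in the README


def find_query_gaps(jsonl_lines, threshold: float = GAP_THRESHOLD) -> list[dict]:
    """Return records whose top_score falls below the threshold."""
    gaps = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a last line that is still being written
        score = record.get("top_score")
        if isinstance(score, (int, float)) and score < threshold:
            gaps.append(record)
    return gaps


# Hypothetical log lines; the final one simulates a half-written record.
sample = [
    '{"q": "how do I rotate API keys", "top_score": 0.04}',
    '{"q": "weekly review template", "top_score": 0.62}',
    '{"q": "tax deadline 2019", "top_score": 0.09}',
    '{"q": "truncated',
]
gaps = find_query_gaps(sample)
```

Skipping undecodable lines keeps the view usable while another process is appending to the file.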
From rag_{cpu,memory}.jsonl — area charts of CPU % and memory (MB) by category, plus top-memory-consumers table.
Each view in config.yaml exposes three modes in the sidebar:
- Browse: paginate records with an optional `where` JSON filter.
- Filter: metadata-only filter (semantic search only works if the config's embedding function matches the collection's).
- Inspect: fetch a single document by ID.
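To make the `where` JSON filter concrete, the sketch below evaluates a filter written in Chroma's documented operator syntax (`$eq`, `$gte`, `$and`, and so on) against plain metadata dicts. It is an illustration only: the metadata fields `folder` and `word_count` are hypothetical, this is not the app's reader code, and which operators this UI actually accepts is an assumption.

```python
import json
from typing import Any

_COMPARE = {
    "$eq": lambda a, b: a == b,
    "$ne": lambda a, b: a != b,
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
    "$in": lambda a, b: a in b,
    "$nin": lambda a, b: a not in b,
}


def matches(metadata: dict[str, Any], where: dict[str, Any]) -> bool:
    """Return True if the metadata satisfies the where filter."""
    for key, cond in where.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in cond):
                return False
        elif isinstance(cond, dict):
            # Simplification in this sketch: a missing key never matches.
            if key not in metadata:
                return False
            if not all(_COMPARE[op](metadata[key], arg) for op, arg in cond.items()):
                return False
        elif metadata.get(key) != cond:  # shorthand {"field": value} means equality
            return False
    return True


# A filter as it might be typed into the Browse view.
where = json.loads(
    '{"$and": [{"folder": {"$eq": "projects"}}, {"word_count": {"$gte": 200}}]}'
)
records = [
    {"id": "n1", "folder": "projects", "word_count": 540},
    {"id": "n2", "folder": "projects", "word_count": 80},
    {"id": "n3", "folder": "journal", "word_count": 900},
]
selected = [r["id"] for r in records if matches(r, where)]
```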
Everything lives in config.yaml. Edit it, save, refresh the page.
- `chroma.mode: persistent`: points at `~/.local/share/obsidian-rag/chroma` by default.
- `chroma.mode: http`: swap to hit a remote Chroma server.
- `views[]`: each entry becomes a sidebar item. `display_fields` picks which metadata columns show up in tables.
- `saved_queries[]`: one-click presets for the filter view.
- `features`: write paths (`allow_delete`, `allow_add`) are off by default.
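The sketch below shows one way the parsed config could look as a Python dict, and how the "off by default" rule for write paths can be enforced when the `features` block is missing or partial. Only the key names listed above come from this README; the nesting, the `path`, `name`, `collection`, and `where` keys, the sample values, and the `resolve_features` helper are all hypothetical.

```python
DEFAULT_FEATURES = {"allow_delete": False, "allow_add": False}


def resolve_features(config: dict) -> dict:
    """Merge the config's features block over safe defaults (write paths off)."""
    merged = dict(DEFAULT_FEATURES)
    merged.update(config.get("features") or {})
    return merged


# Hypothetical parsed form of config.yaml; the real layout may differ.
config = {
    "chroma": {
        "mode": "persistent",
        "path": "~/.local/share/obsidian-rag/chroma",  # assumed key name
    },
    "views": [
        {
            "name": "Notes",                  # assumed key: sidebar label
            "collection": "my_collection",    # assumed key and placeholder value
            "display_fields": ["path", "title"],
            "saved_queries": [
                {"name": "Projects only", "where": {"folder": "projects"}},
            ],
        }
    ],
    # No "features" block: both write paths stay disabled.
}

features = resolve_features(config)
```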
- `app.py`: Streamlit entrypoint + sidebar navigation.
- `src/chroma_client.py`: SQLite-direct reader for `chroma.sqlite3` (no ChromaDB client, which avoids the single-process constraint of `PersistentClient`).
- `src/metrics.py`: tailing parsers for obsidian-rag telemetry (`web.log`, `queries.jsonl`, `eval.jsonl`, `feedback.jsonl`, `rag_{cpu,memory}.jsonl`).
- `src/views/`: one module per page (`collections.py`, `dashboard.py`, `browse.py`, `search.py`, `inspect.py`).
Real-time is driven by @st.fragment(run_every=...) — only telemetry fragments re-query logs on each tick; ChromaDB itself is opened once (snapshot read).
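A tick-driven refresh of this kind usually pairs the fragment with an incremental reader, so each tick only parses what was appended since the last one. The sketch below shows such a reader in plain Python, with the Streamlit wiring left as comments because it needs a running app. The function name `read_new_lines`, the 5-second interval, and the session-state key are assumptions, not the project's code.

```python
import os
import tempfile


def read_new_lines(path: str, offset: int) -> tuple[list[str], int]:
    """Return complete lines appended since `offset`, plus the new offset."""
    if not os.path.exists(path):
        return [], 0
    if os.path.getsize(path) < offset:
        offset = 0  # file was truncated or rotated; start over
    with open(path, "rb") as fh:
        fh.seek(offset)
        chunk = fh.read()
    # Consume only up to the last newline so a half-written line is retried next tick.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


# Possible wiring inside the app (requires streamlit; not executed here):
#
#   @st.fragment(run_every=5)  # rerun only this fragment every 5 seconds
#   def telemetry_panel():
#       offset = st.session_state.get("log_offset", 0)
#       lines, st.session_state["log_offset"] = read_new_lines(LOG_PATH, offset)
#       ...  # parse `lines` and update the charts

# Demo with a temporary log file.
log_path = os.path.join(tempfile.mkdtemp(), "demo.log")
with open(log_path, "w", encoding="utf-8") as fh:
    fh.write("a\nb\npart")
first, pos = read_new_lines(log_path, 0)

with open(log_path, "a", encoding="utf-8") as fh:
    fh.write("ial\n")
second, pos2 = read_new_lines(log_path, pos)
```

Keeping the offset outside the fragment function (for example in session state) is what lets each rerun pick up where the previous one stopped.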
- The bundled `config.yaml` targets obsidian-rag's collections (`obsidian_notes_v9_*`, `obsidian_urls_v1_*`). Adjust paths / collection names for other Chroma deployments.
- Chroma's default embedder has a different dimension than bge-m3 (what obsidian-rag uses), so `query_texts`-based semantic search won't work against those collections from this UI. Use metadata filters here; use the RAG CLI for semantic queries.
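The dimension mismatch can be made explicit with a small guard like the one below. The numbers are background facts rather than something stated in this README: Chroma's default embedding function (all-MiniLM-L6-v2) produces 384-dimensional vectors, while bge-m3 dense embeddings have 1024 dimensions. The function name is hypothetical.

```python
DEFAULT_CHROMA_DIM = 384   # all-MiniLM-L6-v2, Chroma's default embedder
BGE_M3_DIM = 1024          # bge-m3 dense embeddings


def check_query_dimension(query_embedding: list[float], collection_dim: int) -> None:
    """Raise if a query vector cannot be compared with the collection's vectors."""
    if len(query_embedding) != collection_dim:
        raise ValueError(
            f"query embedding has {len(query_embedding)} dimensions, "
            f"but the collection stores {collection_dim}-dimensional vectors"
        )


# A vector from the default embedder does not fit a bge-m3 collection.
try:
    check_query_dimension([0.0] * DEFAULT_CHROMA_DIM, BGE_M3_DIM)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```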