krzychdre · Jun 21, 2026
diff --git a/ai_plans/2026-06-21_backend-metrics-page.md b/ai_plans/2026-06-21_backend-metrics-page.md
@@ -0,0 +1,142 @@
+# Backend web metrics page (tokens / cost / duration / models / modes)
+
+**Branch:** `feature/web-metrics-page` (stacked off `feature/self-hosted-remote-task-control`)
+**Date:** 2026-06-21
+
+## Goal
+
+Add a metrics/analytics page to the self-hosted cloud web view (`self-hosted-cloudapi`)
+that shows the logged-in user's usage, filterable by period:
+
+- **Tokens** used: input / output / cache-read / cache-write
+- **Cost**
+- **Session duration**
+- **Models** used (dimension)
+- **Modes** used (dimension)
+- (bonus) **Providers** used
+
+## Evidence — where the data lives (verified against the live `stork_code` DB)
+
+The aggregation source is **`telemetry_events`**, NOT `task_messages`.
+
+- `task_messages` only holds _shared/live_ tasks (22 rows). `api_req_started` JSON
+  carries tokens/cost but **no model and no mode** (`ClineApiReqInfo` =
+  `tokensIn/tokensOut/cacheWrites/cacheReads/cost/apiProtocol`).
+- `telemetry_events` has 387 rows. The **`LLM Completion`** event
+  (`TelemetryEventName.LLM_COMPLETION = "LLM Completion"`) carries every dimension:
+
+    ```json
+    {
+    	"mode": "code",
+    	"apiProvider": "openrouter",
+    	"modelId": "nvidia/nemotron-3-super-120b-a12b:free",
+    	"taskId": "019eeb06-...",
+    	"inputTokens": 27633,
+    	"outputTokens": 1752,
+    	"cacheReadTokens": 0,
+    	"cacheWriteTokens": 0,
+    	"cost": 0
+    }
+    ```
+
+- `telemetry_events.user_id` == web-session `user.id`
+  (both `user_2c8fdf212b024808aa7a1ba1a`) → scope aggregation to
+  `TelemetryEvent.user_id == user["user_id"]`. `organization_id` is null
+  (single self-hosted user), so user-scoping is sufficient.
+- Properties are stored as **TEXT** (JSON string). Tests run on **SQLite**
+  (no jsonb operators) → aggregate in **Python** after loading rows, mirroring the
+  existing `_compute_metrics` server-side pattern. Volume is modest.
+
+## Decisions (confirmed with user)
+
+- **Session duration** = per-`taskId` span (max−min event ts), summed across tasks;
+  also surface the task count.
+- **Charts**: real charts via a **vendored** library (consistent with
+  `static/vendor/{marked,purify,socket.io}.min.js` — no CDN). Use **Chart.js**
+  (single UMD file, no deps): per-day bars (tokens + cost) and doughnuts
+  (tokens by model / by mode).
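The session-duration decision above can be sketched in Python as follows. `session_duration_ms` is a hypothetical helper taking `(taskId, created_at)` pairs; it assumes the datetimes are all timezone-aware (or all naive) so they compare, and it skips rows with no `taskId`.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


def session_duration_ms(events: Iterable[tuple[Optional[str], datetime]]) -> tuple[int, int]:
    """Return (total_duration_ms, task_count) as the sum of per-task (last - first) spans."""
    bounds: dict[str, tuple[datetime, datetime]] = {}
    for task_id, created_at in events:
        if not task_id:
            continue  # completions without a taskId cannot be attributed to a session
        if task_id in bounds:
            first, last = bounds[task_id]
            bounds[task_id] = (min(first, created_at), max(last, created_at))
        else:
            bounds[task_id] = (created_at, created_at)
    # timedelta // timedelta yields an exact integer number of milliseconds.
    total_ms = sum((last - first) // timedelta(milliseconds=1) for first, last in bounds.values())
    return total_ms, len(bounds)
```

Under the max−min definition a task with a single completion contributes 0 ms, so `task_count` can be non-zero while the summed duration is 0.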
+
+## Changes
+
+### 1. `src/services/metrics_service.py` (new)
+
+- `LLM_COMPLETION_EVENT = "LLM Completion"`.
+- `PERIODS` map: `today`, `7d`, `30d`, `90d`, `all` → start `datetime` (UTC).
+  (`today` = start of current UTC day.)
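A sketch of the period-to-start mapping, written as a function rather than a literal map because the start has to be computed relative to the current time. The names `PERIOD_DAYS` and `period_start` are assumptions; `None` stands for `all` (no lower bound).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Rolling look-back windows; "today" and "all" are handled separately.
PERIOD_DAYS = {"7d": 7, "30d": 30, "90d": 90}


def period_start(period: str, now: Optional[datetime] = None) -> Optional[datetime]:
    """Return the inclusive UTC lower bound for `period`, or None for "all"."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    if period == "all":
        return None
    if period == "today":
        # Start of the current UTC day.
        return now.replace(hour=0, minute=0, second=0, microsecond=0)
    if period in PERIOD_DAYS:
        return now - timedelta(days=PERIOD_DAYS[period])
    raise ValueError(f"unknown period: {period!r}")
```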
+- `async def compute_user_metrics(db, user_id, period) -> dict`:
+    - Select `TelemetryEvent` where `user_id == user_id`,
+      `event_type == LLM Completion`, `created_at >= start` (if not `all`),
+      order by `created_at`.
+    - Parse `properties` JSON per row; coerce numbers via a local `_num`.
+    - Accumulate:
+        - totals: `input/output/cache_read/cache_write` tokens, `cost`,
+          `completions` (row count).
+        - `by_model[modelId]`, `by_mode[mode]`, `by_provider[apiProvider]`:
+          tokens (in+out), cost, count.
+        - `by_day[YYYY-MM-DD]`: tokens (in+out), cost — for the time series.
+        - per-`taskId`: first/last ts → duration; sum → `total_duration_ms`,
+          `task_count`.
+    - Return a JSON-serializable dict: totals, sorted breakdown lists
+      (desc by tokens), `by_day` (chronological), `duration`, `task_count`,
+      `period`, and a `chart` payload (labels + datasets) ready for Chart.js.
+- Reuse `_fmt_tokens` / `_fmt_duration` (move them from `web.py` into this
+  service, or import). Keep formatting helpers shared.
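The accumulation step could look roughly like the following. This is a sketch, not the service itself: `aggregate` is a hypothetical name, rows arrive as `(YYYY-MM-DD, properties)` pairs whose values are already numeric (the `_num` coercion is assumed to have run), and a missing dimension falls into an `"unknown"` bucket by assumption.

```python
from collections import defaultdict
from typing import Any, Iterable

# Breakdown name -> property key in the LLM Completion event.
DIMENSIONS = {"by_model": "modelId", "by_mode": "mode", "by_provider": "apiProvider"}


def aggregate(rows: Iterable[tuple[str, dict[str, Any]]]) -> dict[str, Any]:
    totals = {"input": 0, "output": 0, "cache_read": 0, "cache_write": 0,
              "cost": 0.0, "completions": 0}
    groups = {name: defaultdict(lambda: {"tokens": 0, "cost": 0.0, "count": 0})
              for name in DIMENSIONS}
    by_day = defaultdict(lambda: {"tokens": 0, "cost": 0.0})

    for day, props in rows:
        tokens_in = int(props.get("inputTokens") or 0)
        tokens_out = int(props.get("outputTokens") or 0)
        cost = float(props.get("cost") or 0)
        totals["input"] += tokens_in
        totals["output"] += tokens_out
        totals["cache_read"] += int(props.get("cacheReadTokens") or 0)
        totals["cache_write"] += int(props.get("cacheWriteTokens") or 0)
        totals["cost"] += cost
        totals["completions"] += 1
        # Breakdown buckets count tokens as input + output, as the plan specifies.
        for name, prop in DIMENSIONS.items():
            bucket = groups[name][props.get(prop) or "unknown"]
            bucket["tokens"] += tokens_in + tokens_out
            bucket["cost"] += cost
            bucket["count"] += 1
        by_day[day]["tokens"] += tokens_in + tokens_out
        by_day[day]["cost"] += cost

    result: dict[str, Any] = {"totals": totals}
    for name, group in groups.items():
        rows_out = [{"name": key, **vals} for key, vals in group.items()]
        # Descending by tokens; the sort is stable, so ties keep first-seen order.
        result[name] = sorted(rows_out, key=lambda r: r["tokens"], reverse=True)
    # ISO YYYY-MM-DD strings sort chronologically.
    result["by_day"] = [{"day": d, **vals} for d, vals in sorted(by_day.items())]
    return result
```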
+
+### 2. `src/routers/web.py`
+
+- `GET /app/metrics?period=7d`:
+    - redirect to `/app/login` if no user.
+    - validate `period` (default `7d`, fall back to `7d` on unknown).
+    - call `compute_user_metrics`, render `metrics.html` with the dict +
+      `chart_json = json.dumps(chart_payload)` + the list of period options for the
+      selector.
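A minimal sketch of the `period` validation and the option list passed to the selector. `resolve_period`, `PERIOD_OPTIONS` and the human-readable labels are placeholders, not names from the codebase.

```python
from typing import Optional

DEFAULT_PERIOD = "7d"
# (value, label) pairs for the segmented control; labels are placeholders.
PERIOD_OPTIONS = [
    ("today", "Today"),
    ("7d", "7 days"),
    ("30d", "30 days"),
    ("90d", "90 days"),
    ("all", "All time"),
]
_VALID = {value for value, _label in PERIOD_OPTIONS}


def resolve_period(raw: Optional[str]) -> str:
    """Normalize the ?period= query value, falling back to the default when unknown."""
    candidate = (raw or "").strip().lower()
    return candidate if candidate in _VALID else DEFAULT_PERIOD
```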
+
+### 3. `src/web/templates/metrics.html` (new, extends `base.html`)
+
+- Period selector: links `?period=…` styled as a segmented control; active one
+  highlighted.
+- Summary stat cards: total tokens (with in/out/cache breakdown), total cost,
+  session duration, task count, completion count.
+- Two chart canvases: per-day bar (tokens & cost on dual axis) + two doughnuts
+  (tokens by model, by mode).
+- Breakdown tables: by model, by mode, by provider (tokens / cost / count).
+- Empty state when no events in the period.
+- `{% block scripts %}`: `<script src="/static/vendor/chart.umd.min.js">` +
+  `<script src="/static/metrics.js?v=...">` + a `#metrics-data` JSON island.
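Because the `#metrics-data` island embeds JSON inside a `<script>` element, a model or mode name containing `</script>` would otherwise end the element early. The sketch below (hypothetical `json_island` helper) escapes `<`, `>` and `&` as `\u` sequences, which `JSON.parse` decodes back to the original characters; Jinja's built-in `tojson` filter applies similar escaping and may be the simpler option inside the template.

```python
import json
from typing import Any


def json_island(payload: Any) -> str:
    """Serialize `payload` for a <script type="application/json"> element.

    In JSON these characters can only occur inside strings, where \\u escapes
    are valid, so the parsed value is unchanged.
    """
    text = json.dumps(payload, ensure_ascii=False)
    return (
        text.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )
```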
+
+### 4. `src/web/static/metrics.js` (new)
+
+- Read `#metrics-data` JSON, instantiate Chart.js bar + doughnut charts with the
+  VS Code dark palette (read CSS vars / hardcode accent colors). Guard if
+  `window.Chart` is missing (best-effort, like live.js).
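For reference, the `chart` payload that `metrics.js` reads could be shaped like Chart.js `data` objects (`labels` plus `datasets`), sketched here in Python. The keys `daily`, `by_model`, `by_mode` and the helper names are assumptions. For the dual-axis bar chart, each dataset's `yAxisID` has to match a scale id configured under `options.scales` in `metrics.js`.

```python
from typing import Any


def _doughnut(rows: list[dict[str, Any]]) -> dict[str, Any]:
    # A single dataset; Chart.js pairs data[i] with labels[i].
    return {
        "labels": [r["name"] for r in rows],
        "datasets": [{"label": "Tokens", "data": [r["tokens"] for r in rows]}],
    }


def build_chart_payload(by_day: list[dict[str, Any]],
                        by_model: list[dict[str, Any]],
                        by_mode: list[dict[str, Any]]) -> dict[str, Any]:
    return {
        "daily": {
            "labels": [d["day"] for d in by_day],
            "datasets": [
                # Tokens on the left axis ("y"), cost on a second axis ("y1").
                {"label": "Tokens", "data": [d["tokens"] for d in by_day], "yAxisID": "y"},
                {"label": "Cost", "data": [round(d["cost"], 4) for d in by_day], "yAxisID": "y1"},
            ],
        },
        "by_model": _doughnut(by_model),
        "by_mode": _doughnut(by_mode),
    }
```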
+
+### 5. `src/web/static/vendor/chart.umd.min.js` (new, vendored)
+
+- Download Chart.js v4 UMD build into vendor/ (same as other vendored libs).
+
+### 6. `src/web/static/app.css`
+
+- Add: `.metrics-nav`/segmented period control, `.stat-grid`/`.stat-card`,
+  `.chart-grid`/`.chart-card`, `.breakdown` tables. Reuse existing CSS vars.
+
+### 7. `src/web/templates/base.html`
+
+- Add a primary nav (Tasks · Metrics) in the topbar so users can switch between the two pages.
+  Active-link styling via a `nav_active` context var (set per route).
+
+### 8. Tests — `tests/test_web_and_share.py`
+
+- Helper `_seed_event(db, user_id, event_type, properties, created_at)`.
+- Seed several `LLM Completion` events (2 models, 2 modes, known token/cost),
+  then:
+    - `GET /app/metrics` 200 for a logged-in user; redirect when anonymous.
+    - Totals reflect summed tokens/cost; model & mode names appear in the HTML.
+    - `period` filtering excludes out-of-range events.
+    - Unit-test `compute_user_metrics` directly for exact aggregate numbers,
+      duration (per-task span), and breakdown ordering.
+
+## Out of scope
+
+- Org-wide / multi-user rollups (single self-hosted user; org_id null).
+- Custom date-range picker (fixed period presets only for now).
+- Editing/retention of telemetry events.
diff --git a/ai_plans/2026-06-21_compact-token-counts-web-summary.md b/ai_plans/2026-06-21_compact-token-counts-web-summary.md
@@ -0,0 +1,34 @@
+# Compact token counts in cloud web task summary
+
+**Date:** 2026-06-21
+**Branch:** feature/self-hosted-remote-task-control
+
+## Problem
+
+The task summary header shows raw token counts (`96 941`, `3365`, `29 385`, `1 000 000`)
+for tokens in/out and context used. Large numbers are hard to scan; the user wants
+human-readable abbreviations (M for millions, k for thousands).
+
+## Root cause / location
+
+All three header values — `hdr-tokens-in`, `hdr-tokens-out`, `hdr-context` — are
+formatted through a single helper `fmt()` in
+[self-hosted-cloudapi/src/web/static/live.js](../self-hosted-cloudapi/src/web/static/live.js#L52),
+used by both the persisted-metrics path (`updateFromConversation`) and the live
+snapshot path. It previously called `Number(n).toLocaleString()`. Cost is rendered
+by a separate formatter, so it is unaffected.
+
+## Fix
+
+Replace `fmt()` body with a compact formatter:
+
+- `>= 1e9` → `B`, `>= 1e6` → `M`, `>= 1e3` → `k`, one decimal, trailing `.0` stripped.
+- `< 1000` → plain integer string.
+- `null` / non-finite → `—` (unchanged).
+
+Examples: `1 000 000 → 1M`, `96 941 → 96.9k`, `3365 → 3.4k`, `29 385 → 29.4k`.
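The rules above, mirrored in Python for illustration only: the real change lives in the JavaScript `fmt()` in `live.js`, and `fmt_compact` is a hypothetical name. `Decimal` with half-up rounding is used so the one-decimal rounding does not depend on binary float representation.

```python
import math
from decimal import ROUND_HALF_UP, Decimal
from typing import Any

PLACEHOLDER = "\u2014"  # the dash shown for null / non-finite values
_UNITS = ((10**9, "B"), (10**6, "M"), (10**3, "k"))


def fmt_compact(n: Any) -> str:
    if n is None:
        return PLACEHOLDER
    try:
        value = float(n)
    except (TypeError, ValueError):
        return PLACEHOLDER
    if not math.isfinite(value):
        return PLACEHOLDER
    for size, suffix in _UNITS:
        if abs(value) >= size:
            # One decimal, half-up, then strip a trailing ".0".
            scaled = (Decimal(str(n)) / size).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
            text = str(scaled)
            if text.endswith(".0"):
                text = text[:-2]
            return text + suffix
    return str(int(value))  # below 1000: plain integer (token counts are integers)
```

One consequence of these rules: a value just under a unit boundary, such as 999 950, rounds to `1000k` rather than `1M`.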
+
+## Scope
+
+Single-function change, no markup/CSS changes (the `/` separator for
+`context / window` is built outside `fmt`, so `29.4k / 1M` renders correctly).
diff --git a/ai_plans/2026-06-21_fix-duplicate-command-output-rows-web-view.md b/ai_plans/2026-06-21_fix-duplicate-command-output-rows-web-view.md
@@ -0,0 +1,72 @@
+# Fix duplicated command-output blocks on the web task view
+
+**Date:** 2026-06-21
+**Branch:** `feature/self-hosted-remote-task-control`
+**Symptom (user's words, with screenshot):** a command's output renders twice — the first
+"OUTPUT" block shows only the first line and keeps a spinner (looks active forever), and a second
+"OUTPUT" block below shows the full output.
+
+---
+
+## Root cause (proven by code trace, not assumed)
+
+`ExecuteCommandTool` streams terminal output with `task.say("command_output", text, …, partial)`
+(`src/core/tools/ExecuteCommandTool.ts:294-311`, scheduled at :313-333, finalized at :398-401).
+On the **first** output line it ALSO issues `await task.ask("command_output", "")`
+(`ExecuteCommandTool.ts:371`) — the "view output?" ask.
+
+That ask is appended to `clineMessages` right after the first partial say. The next partial say then
+hits `TaskAskSay.say()` where `isUpdatingPreviousPartial` requires
+`clineMessages.at(-1)` to be the partial say (`TaskAskSay.ts:493-494`). It is now the **ask**, so the
+check fails and a **new** partial say is created with a **new ts** (`TaskAskSay.ts:512-521`).
+
+Net message stream for one command:
+
+- `say command_output` **A** — ts `T1`, text = first chunk, stays `partial:true` forever (orphaned).
+- `ask command_output` **B** — ts `T2`, empty text (not rendered; `classify` returns null on no text).
+- `say command_output` **C** — ts `T3`, finalized full output (`partial:false`).
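The failure mode can be reproduced with a toy Python model of the `clineMessages.at(-1)` check. Everything here (`Msg`, `say`, `ask`) is hypothetical and far simpler than the real `TaskAskSay.say()`; it only shows why an ask between two partial says yields three messages with distinct ts.

```python
from dataclasses import dataclass, field
from itertools import count

_next_ts = count(1)  # monotonically increasing stand-in for message timestamps


@dataclass
class Msg:
    type: str  # "say" or "ask"
    kind: str  # e.g. "command_output"
    text: str
    partial: bool
    ts: int = field(default_factory=lambda: next(_next_ts))


def say(messages: list, kind: str, text: str, partial: bool) -> None:
    # Update in place only if the LAST message is a partial say of the same kind;
    # otherwise append a new message, which gets a new ts.
    last = messages[-1] if messages else None
    if last is not None and last.type == "say" and last.kind == kind and last.partial:
        last.text, last.partial = text, partial
    else:
        messages.append(Msg("say", kind, text, partial))


def ask(messages: list, kind: str, text: str) -> None:
    messages.append(Msg("ask", kind, text, False))


msgs: list = []
say(msgs, "command_output", "line 1", True)                   # A: first chunk
ask(msgs, "command_output", "")                               # B: the "view output?" ask
say(msgs, "command_output", "line 1\nline 2", True)           # C: last message is B, so new ts
say(msgs, "command_output", "line 1\nline 2\nline 3", False)  # finalizes C; A stays partial
```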
+
+VS Code never shows the duplicate because the chat runs `consolidateCommands`
+(`packages/core/src/message-utils/consolidateCommands.ts`): it folds every `command_output` (ask
+_and_ say) into the preceding command card, dedups equal-text pairs, and drops all standalone
+`command_output` rows. The web renderer `render.js` applies **no** such consolidation — it renders
+each `command_output` say as its own "Output" row, keyed by its own `ts`. Two different ts (`T1`,
+`T3`) → two rows; `T1` is `partial:true` → stuck spinner. Exactly the screenshot.
+
+This belongs to the same duplicate-row class noted in
+`2026-06-21_fix-stuck-partial-spinners-duplicate-task-messages.md`, but those duplicates were _same-ts_
+races, fixed by the unique index + history-no-animate. This case is _different-ts_, so the unique index
+cannot merge it: the messages are genuinely distinct.
+
+## Fix — bring the web renderer to parity with VS Code (frontend only)
+
+`self-hosted-cloudapi/src/web/static/render.js`, inside `mountConversation`:
+
+All `command_output` messages that follow one `command` represent **one logical output block**.
+Collapse them onto a single row owned by the most recent command, showing the latest content — the
+finalized say `C` replaces the orphaned partial `A` in place (and clears its spinner).
+
+- Track `lastCommandTs` = ts of the most recently classified `command` message.
+- Introduce `keyOf(m)`: the row-identity key (was implicitly `m.ts`). For `command_output` return
+  `"cmdout@" + lastCommandTs` (fallback to own ts if no command seen yet); otherwise return `m.ts`.
+  `m.ts` stays the numeric value used for step-duration math — only the dedup/DOM identity changes.
+- `upsert` keys `byTs` / `rawByTs` / `activeByTs` by `key`; duration/`tail` keep numeric `ts`.
+  `tail` also remembers `key` so in-place replacement of the output row is detected.
+- `activeByTs[key]` becomes `{ ts, label }` so `getActivity()` can still rank by numeric recency
+  even though command-output keys are non-numeric strings. `markResolved`/ask paths use real ts,
+  which equals `keyOf` for asks, so they are unaffected.
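A toy Python model of the keying described above, assuming messages are dicts with `kind`, `ts`, `text` and `partial`. `key_of` and `mount` are hypothetical stand-ins for the `keyOf`/`upsert` logic in `render.js`. Writing to an existing dict key keeps its position, which plays the role of replacing the row in place.

```python
from typing import Any, Optional, Union

Key = Union[int, str]


def key_of(m: dict[str, Any], last_command_ts: Optional[int]) -> Key:
    # All command_output messages after a command share one row key.
    if m["kind"] == "command_output":
        return f"cmdout@{last_command_ts}" if last_command_ts is not None else m["ts"]
    return m["ts"]


def mount(messages: list[dict[str, Any]]) -> dict[Key, dict[str, Any]]:
    rows: dict[Key, dict[str, Any]] = {}
    last_command_ts: Optional[int] = None
    for m in messages:
        if m["kind"] == "command":
            last_command_ts = m["ts"]
        if m["kind"] == "command_output" and not m["text"]:
            continue  # empty ask: classify() yields no row
        rows[key_of(m, last_command_ts)] = m  # a later message replaces the row in place
    return rows
```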
+
+Why frontend, not backend/extension: the backend stores raw `ClineMessage[]` to support live relay
+and faithful replay; consolidating at storage loses fidelity and complicates streaming. The web
+renderer is the direct analog of the VS Code chat view, which is exactly where consolidation lives.
+
+## Verification
+
+- Reload a finished task that ran a multi-line command → one OUTPUT block, full text, no spinner.
+- Drive a live command → the single OUTPUT row streams (spinner) and clears on completion.
+- Unrelated rows (reasoning, tools, api_req) unchanged; token/cost header unchanged.
+
+## Out of scope
+
+- The upstream `ExecuteCommandTool` orphaned-partial behaviour (VS Code masks it; changing it is
+  broad and risky). We match VS Code's presentation instead.
diff --git a/ai_plans/2026-06-21_persist-workspace-path-task-list.md b/ai_plans/2026-06-21_persist-workspace-path-task-list.md
@@ -0,0 +1,137 @@
+# Persist `workspacePath` on tasks → show project/worktree in cloud web view
+
+Date: 2026-06-21
+Branch: stacks on `feature/self-hosted-remote-task-control` (depends on the
+unmerged "task list on cloud web view" commit `82e4b0a1b`; main does not have it).
+
+## Problem / evidence
+
+The cloud web view never shows which project/worktree a task belongs to.
+
+Traced the data flow end to end:
+
+- The extension **does** capture the worktree root: `vscode.workspace.workspaceFolders[0].uri.fsPath`
+  → `workspacePath` (src/extension/bridge.ts:93), emitted on `extension:register`
+  (packages/cloud/src/bridge/BridgeOrchestrator.ts:127; schema packages/types/src/cloud.ts:411).
+- Server stores it **in-memory only**, per user, newest-wins:
+  ConnectionRegistry `_instance_by_user[user_id]["workspacePath"]`
+  (self-hosted-cloudapi/src/realtime/hub.py:56-62, accessor :76). Never persisted.
+- The `tasks` table has no workspace/cwd/path column at all
+  (self-hosted-cloudapi/src/models/task.py).
+- The two Task-creation paths both create `Task(id, user_id)` with nothing else:
+    - live bridge: `upsert_task_message` (services/telemetry_service.py:117),
+      called from realtime/sio.py:190 (has `user_id`).
+    - share/backfill: `backfill_messages` (services/telemetry_service.py:48),
+      called from routers/events.py:70.
+- The backfill `properties` form field (TS getTelemetryProperties) carries only
+  `gitProperties` — `repositoryName` is identical across worktrees of one repo,
+  so it cannot identify a worktree. The absolute `workspacePath` is the correct key.
+
+So the only authoritative server-side source of the worktree path is the
+registry instance for the user, which is populated by the bridge.
+
+## Design decision (chosen)
+
+Source `workspace_path` with an explicit-first, registry-fallback strategy:
+
+- **Live bridge** (`upsert_task_message`): from `registry.instance(user_id)["workspacePath"]`.
+  This is the only available source, and it is populated whenever events arrive, since
+  events only flow while the bridge is connected. The registry is newest-wins per user,
+  so the value is that of the most recently registered extension instance.
+- **Share/backfill** (`backfill_messages`): **explicit client field first** — the
+  extension sends `workspacePath` in the backfill FormData — with the registry as
+  a **fallback** for older clients that don't send it. This gives 100% coverage
+  even when the bridge is OFF at share time.
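The explicit-first, registry-fallback choice reduces to a small resolver. This is a sketch: `resolve_workspace_path` is a hypothetical name, `instance` stands for the dict returned by `registry.instance(user_id)`, and treating blank strings as missing is an assumption.

```python
from typing import Mapping, Optional


def resolve_workspace_path(explicit: Optional[str],
                           instance: Optional[Mapping[str, object]]) -> Optional[str]:
    """Explicit client field first, then the registry instance, else None."""
    if explicit and explicit.strip():
        return explicit
    candidate = (instance or {}).get("workspacePath")
    if isinstance(candidate, str) and candidate.strip():
        return candidate
    return None
```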
+
+This is consistent with what already crosses to the self-hosted server (the bridge
+already sends the absolute `workspacePath`), and `getRooCodeApiUrl()` points at the
+self-hosted cloud API in this fork.
+
+Implementation surfaces for the explicit field (small, backward-compatible):
+
+- packages/types/src/telemetry.ts: add OPTIONAL `getTelemetryWorkspacePath?(): string | undefined`
+  to `TelemetryPropertiesProvider` (optional → no break for other implementers).
+- src/core/webview/ClineProvider.ts: implement it returning `this.cwd`
+  (`currentWorkspacePath || getWorkspacePath()`), already defined.
+- packages/cloud/src/TelemetryClient.ts `backfillMessages`: append `workspacePath`
+  to the FormData from `this.providerRef?.deref()?.getTelemetryWorkspacePath?.()`
+  (only when non-empty). NOT added to general telemetry `properties` — kept out of
+  the per-event payload to avoid leaking an absolute path into every event.
+
+Write semantics: set `workspace_path` when it is currently NULL (on Task create,
+or on a later event for a pre-existing task that predates this feature). Never
+overwrite a non-null value — a task does not change worktrees.
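A sketch of the write rule, using a plain dataclass in place of the SQLAlchemy `Task` model (`TaskRow` and `stamp_workspace_path` are hypothetical names).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskRow:  # stand-in for the SQLAlchemy Task model
    id: str
    workspace_path: Optional[str] = None


def stamp_workspace_path(task: TaskRow, workspace_path: Optional[str]) -> bool:
    """Set workspace_path only while it is NULL and a non-empty value is known.

    Returns True when the row was changed; a non-null value is never overwritten.
    """
    if task.workspace_path is not None or not workspace_path:
        return False
    task.workspace_path = workspace_path
    return True
```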
+
+## Changes
+
+1. **Model** — self-hosted-cloudapi/src/models/task.py
+   Add `workspace_path = Column(String, nullable=True)` to `Task`.
+
+2. **Migration** — new alembic/versions/e5f6a7b8c9d0_task_workspace_path.py
+   `down_revision = "d4e5f6a7b8c9"` (current head).
+   upgrade: `op.add_column("tasks", sa.Column("workspace_path", sa.String(), nullable=True))`
+   downgrade: `op.drop_column("tasks", "workspace_path")`.
+
+3. **Ingestion** — services/telemetry_service.py
+
+    - Add optional `workspace_path: str | None = None` param to `upsert_task_message`
+      and `backfill_messages`.
+    - On get-or-create, set `task.workspace_path = workspace_path` when creating.
+    - For an existing task whose `workspace_path` is NULL and a value is now known,
+      set it (one-time backfill of legacy rows). Guard: only when non-empty.
+
+4. **Callers**
+
+    - realtime/sio.py (`on_task_event`, ~:188): resolve
+      `ws = (registry.instance(user_id) or {}).get("workspacePath")` and pass to
+      `upsert_task_message(..., workspace_path=ws)`.
+    - routers/events.py (`backfill_events_endpoint`, ~:70): resolve the same from
+      `registry.instance(current_user["user_id"])` and pass to `backfill_messages`.
+      (Import the `registry` singleton from src.realtime.sio / hub.)
+
+5. **Web view** — routers/web.py
+
+    - Add a small helper `_workspace_label(path)` → basename for compact display
+      (full path kept for the tooltip/header).
+    - task_list (~:219): add `"workspace": task.workspace_path` and
+      `"workspace_label": _workspace_label(task.workspace_path)` to each item dict.
+    - task_detail (~:300): pass `task` already in context (template can read
+      `task.workspace_path`); add a derived label to context for the header.
+
+6. **Templates / static**
+
+    - templates/tasks_list.html: render a `badge badge-muted` with the basename and
+      `title="{{ t.workspace }}"` (full path on hover) in `.task-meta`, when present.
+    - templates/task_detail.html: show the worktree path in the header block
+      (full path; truncate with CSS if needed).
+    - static/app.css: minor style for the new label if needed (reuse existing
+      `.badge`/`.task-date` styling; avoid new classes unless necessary).
+
+7. **Tests** — tests/test_web_and_share.py
+    - Live path: simulate a registered extension instance with a `workspacePath`,
+      drive a task event, assert the persisted Task row has `workspace_path` and the
+      `/app` + detail pages render the basename/full path.
+    - Backfill path: register instance, POST /api/events/backfill, assert persisted
+      `workspace_path`.
+    - Null path: a task created with no registry instance → `workspace_path` NULL,
+      page renders without the badge (no crash).
+    - One-time backfill: pre-existing NULL row gets populated on a later event.
+
+## Verification (done)
+
+- `python -m pytest` in self-hosted-cloudapi: **73 passed**, including 4 new tests (live
+  stamp, legacy-NULL backfill + no-overwrite, explicit backfill field, registry fallback)
+  and 2 new web-render tests (list badge + full-path detail, null-renders-clean).
+- Migration upgrade/downgrade roundtrip proven on SQLite in isolation (adds then
+  drops `tasks.workspace_path`); `alembic heads` shows a single head `e5f6a7b8c9d0`.
+  (Full-chain SQLite upgrade is blocked by a pre-existing Postgres-only timezone
+  migration, unrelated to this change.)
+- `turbo check-types` for tumble-code + @roo-code/types + @roo-code/cloud: clean.
+- `@roo-code/cloud` vitest: **278 passed** (TelemetryClient suite now 26, +2 for the
+  explicit workspacePath field present/absent).
+
+## Out of scope
+
+- Live-cockpit header display of `instance.workspacePath` (that data already
+  reaches the browser in the join ack; separate "option 1" branch).
+- Multi-root workspaces: only `workspaceFolders[0]` is captured (extension-side,
+  pre-existing limitation).