142 changes: 142 additions & 0 deletions ai_plans/2026-06-21_backend-metrics-page.md
@@ -0,0 +1,142 @@
# Backend web metrics page (tokens / cost / duration / models / modes)

**Branch:** `feature/web-metrics-page` (stacked off `feature/self-hosted-remote-task-control`)
**Date:** 2026-06-21

## Goal

Add a metrics/analytics page to the self-hosted cloud web view (`self-hosted-cloudapi`)
showing, for the logged-in user, with a period filter:

- **Tokens** used: input / output / cache-read / cache-write
- **Cost**
- **Session duration**
- **Models** used (dimension)
- **Modes** used (dimension)
- (bonus) **Providers** used

## Evidence — where the data lives (verified against the live `stork_code` DB)

The aggregation source is **`telemetry_events`**, NOT `task_messages`.

- `task_messages` only holds _shared/live_ tasks (22 rows). `api_req_started` JSON
carries tokens/cost but **no model and no mode** (`ClineApiReqInfo` =
`tokensIn/tokensOut/cacheWrites/cacheReads/cost/apiProtocol`).
- `telemetry_events` has 387 rows. The **`LLM Completion`** event
(`TelemetryEventName.LLM_COMPLETION = "LLM Completion"`) carries every dimension:

```json
{
"mode": "code",
"apiProvider": "openrouter",
"modelId": "nvidia/nemotron-3-super-120b-a12b:free",
"taskId": "019eeb06-...",
"inputTokens": 27633,
"outputTokens": 1752,
"cacheReadTokens": 0,
"cacheWriteTokens": 0,
"cost": 0
}
```

- `telemetry_events.user_id` == web-session `user.id`
(both `user_2c8fdf212b024808aa7a1ba1a`) → scope aggregation to
`TelemetryEvent.user_id == user["user_id"]`. `organization_id` is null
(single self-hosted user), so user-scoping is sufficient.
- Properties are stored as **TEXT** (JSON string). Tests run on **SQLite**
(no jsonb operators) → aggregate in **Python** after loading rows, mirroring the
existing `_compute_metrics` server-side pattern. Volume is modest.

## Decisions (confirmed with user)

- **Session duration** = per-`taskId` span (max−min event ts), summed across tasks;
also surface the task count.
- **Charts**: real charts via a **vendored** library (consistent with
`static/vendor/{marked,purify,socket.io}.min.js` — no CDN). Use **Chart.js**
(single UMD file, no deps): per-day bars (tokens + cost) and doughnuts
(tokens by model / by mode).
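
The session-duration rule above can be sketched as follows. This is a minimal illustration, not the planned implementation: the function name `total_duration_ms` and the `(task_id, created_at)` input shape are assumptions made for the example.

```python
from datetime import datetime


def total_duration_ms(events):
    """Sum per-task spans (last event ts minus first event ts).

    `events` is an iterable of (task_id, created_at) pairs (hypothetical shape).
    Returns (total_ms, task_count).
    """
    spans = {}  # task_id -> [first_ts, last_ts]
    for task_id, ts in events:
        if task_id is None:
            continue  # an event without a taskId cannot be attributed to a task
        span = spans.setdefault(task_id, [ts, ts])
        span[0] = min(span[0], ts)
        span[1] = max(span[1], ts)
    total = sum((last - first).total_seconds() * 1000 for first, last in spans.values())
    return int(total), len(spans)
```

Under this rule a task with a single completion contributes a span of zero but still counts toward the task total, which is one reason to surface the task count next to the duration.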

## Changes

### 1. `src/services/metrics_service.py` (new)

- `LLM_COMPLETION_EVENT = "LLM Completion"`.
- `PERIODS` map: `today`, `7d`, `30d`, `90d`, `all` → start `datetime` (UTC).
(`today` = start of current UTC day.)
- `async def compute_user_metrics(db, user_id, period) -> dict`:
  - Select `TelemetryEvent` where `TelemetryEvent.user_id == user_id`,
    `event_type == "LLM Completion"`, `created_at >= start` (if not `all`),
    order by `created_at`.
  - Parse `properties` JSON per row; coerce numbers via a local `_num`.
  - Accumulate:
    - totals: `input/output/cache_read/cache_write` tokens, `cost`,
      `completions` (row count).
    - `by_model[modelId]`, `by_mode[mode]`, `by_provider[apiProvider]`:
      tokens (in+out), cost, count.
    - `by_day[YYYY-MM-DD]`: tokens (in+out), cost (for the time series).
    - per-`taskId`: first/last ts → duration; sum → `total_duration_ms`,
      `task_count`.
  - Return a JSON-serializable dict: totals, sorted breakdown lists
    (desc by tokens), `by_day` (chronological), `duration`, `task_count`,
    `period`, and a `chart` payload (labels + datasets) ready for Chart.js.
- Reuse `_fmt_tokens` / `_fmt_duration` (move them from `web.py` into this
  service, or import). Keep formatting helpers shared.
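
A rough sketch of the in-Python aggregation described in this section, kept free of any database access so it can be read in isolation. The name `aggregate`, the `(created_at, properties_text)` row shape, the `"unknown"` bucket for a missing dimension, and the choice to skip rows whose JSON does not parse are all assumptions for illustration; only the property names come from the `LLM Completion` sample above.

```python
import json
import math
from collections import defaultdict
from datetime import datetime
from typing import Iterable

# Breakdown name -> property name in the LLM Completion payload.
DIMENSIONS = {"by_model": "modelId", "by_mode": "mode", "by_provider": "apiProvider"}


def _num(value):
    """Coerce a JSON value to a finite number; anything unusable counts as 0."""
    if isinstance(value, bool):
        return 0
    try:
        number = float(value)
    except (TypeError, ValueError):
        return 0
    if not math.isfinite(number):
        return 0
    return int(number) if number.is_integer() else number


def aggregate(rows: Iterable[tuple[datetime, str]]) -> dict:
    totals = {"input": 0, "output": 0, "cache_read": 0, "cache_write": 0,
              "cost": 0, "completions": 0}
    breakdowns = {name: defaultdict(lambda: {"tokens": 0, "cost": 0, "count": 0})
                  for name in DIMENSIONS}
    by_day = defaultdict(lambda: {"tokens": 0, "cost": 0})

    for created_at, text in rows:
        try:
            props = json.loads(text or "{}")
        except ValueError:
            continue  # assumption: unparseable rows are skipped, not counted
        if not isinstance(props, dict):
            continue
        tokens_in = _num(props.get("inputTokens"))
        tokens_out = _num(props.get("outputTokens"))
        cost = _num(props.get("cost"))

        totals["input"] += tokens_in
        totals["output"] += tokens_out
        totals["cache_read"] += _num(props.get("cacheReadTokens"))
        totals["cache_write"] += _num(props.get("cacheWriteTokens"))
        totals["cost"] += cost
        totals["completions"] += 1

        buckets = [by_day[created_at.strftime("%Y-%m-%d")]]
        for name, prop in DIMENSIONS.items():
            bucket = breakdowns[name][str(props.get(prop) or "unknown")]
            bucket["count"] += 1
            buckets.append(bucket)
        for bucket in buckets:
            bucket["tokens"] += tokens_in + tokens_out
            bucket["cost"] += cost

    result = {"totals": totals}
    for name, groups in breakdowns.items():
        # Breakdown lists are sorted descending by tokens.
        result[name] = sorted(({"name": key, **vals} for key, vals in groups.items()),
                              key=lambda g: g["tokens"], reverse=True)
    # ISO dates sort lexicographically, which gives chronological order.
    result["by_day"] = [{"day": day, **by_day[day]} for day in sorted(by_day)]
    return result
```

Keeping the aggregation as a pure function over already-loaded rows would make the "exact aggregate numbers" unit test independent of SQLite versus Postgres JSON support.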

### 2. `src/routers/web.py`

- `GET /app/metrics?period=7d`:
  - redirect to `/app/login` if no user.
  - validate `period` (default `7d`, fall back to `7d` on unknown).
  - call `compute_user_metrics`, render `metrics.html` with the dict +
    `chart_json = json.dumps(chart_payload)` + the list of period options for the
    selector.
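
Period validation and the start-of-window computation might look like the sketch below. The helper names are hypothetical, and treating `7d` / `30d` / `90d` as rolling windows counted back from "now" is an assumption, since the plan only pins down `today` as the start of the current UTC day.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_PERIOD = "7d"
PERIOD_OPTIONS = ("today", "7d", "30d", "90d", "all")
_ROLLING_DAYS = {"7d": 7, "30d": 30, "90d": 90}


def normalize_period(raw):
    """Fall back to the default when the query value is missing or unknown."""
    return raw if raw in PERIOD_OPTIONS else DEFAULT_PERIOD


def period_start(period, now=None):
    """Return the inclusive UTC lower bound, or None for "all" (no filter)."""
    now = now or datetime.now(timezone.utc)
    if period == "all":
        return None
    if period == "today":
        return now.replace(hour=0, minute=0, second=0, microsecond=0)
    return now - timedelta(days=_ROLLING_DAYS[period])
```

Accepting an explicit `now` keeps the period-filter tests deterministic.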

### 3. `src/web/templates/metrics.html` (new, extends `base.html`)

- Period selector: links `?period=…` styled as a segmented control; active one
highlighted.
- Summary stat cards: total tokens (with in/out/cache breakdown), total cost,
session duration, task count, completion count.
- Two chart canvases: per-day bar (tokens & cost on dual axis) + two doughnuts
(tokens by model, by mode).
- Breakdown tables: by model, by mode, by provider (tokens / cost / count).
- Empty state when no events in the period.
- `{% block scripts %}`: `<script src="/static/vendor/chart.umd.min.js">` +
`<script src="/static/metrics.js?v=...">` + a `#metrics-data` JSON island.
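
One way the `chart` payload and the `#metrics-data` island could be produced server-side is sketched here. The function names and the `y` / `y1` axis ids are illustrative assumptions; `labels`, `datasets` and `yAxisID` follow the Chart.js data structure. Escaping `<` is an extra precaution not mentioned in the plan: model ids are free-form strings, and a value containing `</script>` would otherwise end the element early.

```python
import json


def build_chart_payload(by_day):
    """by_day: chronological list of {"day", "tokens", "cost"} dicts."""
    return {
        "labels": [d["day"] for d in by_day],
        "datasets": [
            # Tokens and cost share the x axis but use separate y axes.
            {"label": "Tokens", "data": [d["tokens"] for d in by_day], "yAxisID": "y"},
            {"label": "Cost", "data": [d["cost"] for d in by_day], "yAxisID": "y1"},
        ],
    }


def to_json_island(payload):
    """Serialize for embedding inside a JSON <script> element."""
    # "\u003c" decodes back to "<" in any JSON parser, so the data is unchanged.
    return json.dumps(payload).replace("<", "\\u003c")
```

The template would then need to emit that string without HTML-escaping it, and `metrics.js` can read it back with `JSON.parse`.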

### 4. `src/web/static/metrics.js` (new)

- Read `#metrics-data` JSON, instantiate Chart.js bar + doughnut charts with the
VS Code dark palette (read CSS vars / hardcode accent colors). Guard if
`window.Chart` is missing (best-effort, like live.js).

### 5. `src/web/static/vendor/chart.umd.min.js` (new, vendored)

- Download Chart.js v4 UMD build into vendor/ (same as other vendored libs).

### 6. `src/web/static/app.css`

- Add: `.metrics-nav`/segmented period control, `.stat-grid`/`.stat-card`,
`.chart-grid`/`.chart-card`, `.breakdown` tables. Reuse existing CSS vars.

### 7. `src/web/templates/base.html`

- Add a primary nav (Tasks · Metrics) in the topbar so users can switch.
Active-link styling via a `nav_active` context var (set per route).

### 8. Tests — `tests/test_web_and_share.py`

- Helper `_seed_event(db, user_id, event_type, properties, created_at)`.
- Seed several `LLM Completion` events (2 models, 2 modes, known token/cost),
  then:
  - `GET /app/metrics` 200 for a logged-in user; redirect when anonymous.
  - Totals reflect summed tokens/cost; model & mode names appear in the HTML.
  - `period` filtering excludes out-of-range events.
- Unit-test `compute_user_metrics` directly for exact aggregate numbers,
  duration (per-task span), and breakdown ordering.

## Out of scope

- Org-wide / multi-user rollups (single self-hosted user; org_id null).
- Custom date-range picker (fixed period presets only for now).
- Editing/retention of telemetry events.
34 changes: 34 additions & 0 deletions ai_plans/2026-06-21_compact-token-counts-web-summary.md
@@ -0,0 +1,34 @@
# Compact token counts in cloud web task summary

**Date:** 2026-06-21
**Branch:** feature/self-hosted-remote-task-control

## Problem

The task summary header shows raw token counts (`96 941`, `3365`, `29 385`, `1 000 000`)
for tokens in/out and context used. Large numbers are hard to scan; the user wants
human-readable abbreviations (M for millions, k for thousands).

## Root cause / location

All three header values — `hdr-tokens-in`, `hdr-tokens-out`, `hdr-context` — are
formatted through a single helper `fmt()` in
[self-hosted-cloudapi/src/web/static/live.js](../self-hosted-cloudapi/src/web/static/live.js#L52),
used by both the persisted-metrics path (`updateFromConversation`) and the live
snapshot path. It previously called `Number(n).toLocaleString()`. Cost is rendered
by a separate formatter, so it is unaffected.

## Fix

Replace `fmt()` body with a compact formatter:

- `>= 1e9` → `B`, `>= 1e6` → `M`, `>= 1e3` → `k`, one decimal, trailing `.0` stripped.
- `< 1000` → plain integer string.
- `null` / non-finite → `—` (unchanged).

Examples: `1 000 000 → 1M`, `96 941 → 96.9k`, `3365 → 3.4k`, `29 385 → 29.4k`.
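
The real helper is the JavaScript `fmt()` in `live.js`; the Python rendition below only models the rules listed above so the examples can be checked. The constant names are made up for this sketch, and accepting numeric strings (as `Number(n)` would) is an assumption.

```python
import math

PLACEHOLDER = "\u2014"  # shown for null / non-finite input

# Largest unit first so 1e9 is not reported as "1000M".
_UNITS = ((1e9, "B"), (1e6, "M"), (1e3, "k"))


def fmt(n):
    if n is None:
        return PLACEHOLDER
    try:
        value = float(n)
    except (TypeError, ValueError):
        return PLACEHOLDER
    if not math.isfinite(value):
        return PLACEHOLDER
    for threshold, suffix in _UNITS:
        if value >= threshold:
            text = f"{value / threshold:.1f}"  # one decimal
            if text.endswith(".0"):
                text = text[:-2]  # strip a trailing ".0"
            return text + suffix
    return str(int(round(value)))  # below 1000: plain integer string
```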

## Scope

Single-function change, no markup/CSS changes (the `/` separator for
`context / window` is built outside `fmt`, so `29.4k / 1M` renders correctly).
72 changes: 72 additions & 0 deletions ai_plans/2026-06-21_fix-duplicate-command-output-rows-web-view.md
@@ -0,0 +1,72 @@
# Fix duplicated command-output blocks on the web task view

**Date:** 2026-06-21
**Branch:** `feature/self-hosted-remote-task-control`
**Symptom (user's words, with screenshot):** a command's output renders twice — the first
"OUTPUT" block shows only the first line and keeps a spinner (looks active forever), and a second
"OUTPUT" block below shows the full output.

---

## Root cause (proven by code trace, not assumed)

`ExecuteCommandTool` streams terminal output with `task.say("command_output", text, …, partial)`
(`src/core/tools/ExecuteCommandTool.ts:294-311`, scheduled at :313-333, finalized at :398-401).
On the **first** output line it ALSO issues `await task.ask("command_output", "")`
(`ExecuteCommandTool.ts:371`) — the "view output?" ask.

That ask is appended to `clineMessages` right after the first partial say. The next partial say then
hits `TaskAskSay.say()` where `isUpdatingPreviousPartial` requires
`clineMessages.at(-1)` to be the partial say (`TaskAskSay.ts:493-494`). It is now the **ask**, so the
check fails and a **new** partial say is created with a **new ts** (`TaskAskSay.ts:512-521`).

Net message stream for one command:

- `say command_output` **A** — ts `T1`, text = first chunk, stays `partial:true` forever (orphaned).
- `ask command_output` **B** — ts `T2`, empty text (not rendered; `classify` returns null on no text).
- `say command_output` **C** — ts `T3`, finalized full output (`partial:false`).

VS Code never shows the duplicate because the chat runs `consolidateCommands`
(`packages/core/src/message-utils/consolidateCommands.ts`): it folds every `command_output` (ask
_and_ say) into the preceding command card, dedups equal-text pairs, and drops all standalone
`command_output` rows. The web renderer `render.js` applies **no** such consolidation — it renders
each `command_output` say as its own "Output" row, keyed by its own `ts`. Two different ts (`T1`,
`T3`) → two rows; `T1` is `partial:true` → stuck spinner. Exactly the screenshot.

This is the existing-duplicate-row class noted in
`2026-06-21_fix-stuck-partial-spinners-duplicate-task-messages.md`, but those were _same-ts_ races
fixed by the unique index + history-no-animate. This case is _different-ts_ and the unique index
cannot merge it — the messages are genuinely distinct.

## Fix — bring the web renderer to parity with VS Code (frontend only)

`self-hosted-cloudapi/src/web/static/render.js`, inside `mountConversation`:

All `command_output` messages that follow one `command` represent **one logical output block**.
Collapse them onto a single row owned by the most recent command, showing the latest content — the
finalized say `C` replaces the orphaned partial `A` in place (and clears its spinner).

- Track `lastCommandTs` = ts of the most recently classified `command` message.
- Introduce `keyOf(m)`: the row-identity key (was implicitly `m.ts`). For `command_output` return
`"cmdout@" + lastCommandTs` (fallback to own ts if no command seen yet); otherwise return `m.ts`.
`m.ts` stays the numeric value used for step-duration math — only the dedup/DOM identity changes.
- `upsert` keys `byTs` / `rawByTs` / `activeByTs` by `key`; duration/`tail` keep numeric `ts`.
`tail` also remembers `key` so in-place replacement of the output row is detected.
- `activeByTs[key]` becomes `{ ts, label }` so `getActivity()` can still rank by numeric recency
even though command-output keys are non-numeric strings. `markResolved`/ask paths use real ts,
which equals `keyOf` for asks, so they are unaffected.
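
The keying change lives in the JavaScript renderer; the Python model below only illustrates how the three-message stream A / B / C collapses onto one row. The message shape (`type`, `kind`, `ts`, `text`, `partial`) and the function name `mount` are simplifications invented for this sketch, as is applying the shared key to `say` messages only.

```python
def mount(messages):
    """Return rendered rows keyed by row identity, in first-seen order."""
    rows = {}  # dicts keep insertion order; re-assigning a key updates in place
    last_command_ts = None

    def key_of(m):
        # All command_output says after one command share a single row key.
        if m["type"] == "say" and m["kind"] == "command_output":
            if last_command_ts is not None:
                return f"cmdout@{last_command_ts}"
        return m["ts"]  # fallback: the message's own ts

    for m in messages:
        if m["kind"] == "command":
            last_command_ts = m["ts"]
        elif not m.get("text"):
            continue  # mirrors classify() returning null when there is no text
        rows[key_of(m)] = {
            "ts": m["ts"],  # numeric ts is kept for duration / recency math
            "text": m.get("text", ""),
            "partial": bool(m.get("partial")),
        }
    return rows
```

Feeding it the stream from the root-cause section (command, partial say at `T1`, empty ask at `T2`, final say at `T3`) yields two rows: the command, and one output row carrying the `T3` text with `partial` false.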

Why frontend, not backend/extension: the backend stores raw `ClineMessage[]` to support live relay
and faithful replay; consolidating at storage loses fidelity and complicates streaming. The web
renderer is the direct analog of the VS Code chat view, which is exactly where consolidation lives.

## Verification

- Reload a finished task that ran a multi-line command → one OUTPUT block, full text, no spinner.
- Drive a live command → the single OUTPUT row streams (spinner) and clears on completion.
- Unrelated rows (reasoning, tools, api_req) unchanged; token/cost header unchanged.

## Out of scope

- The upstream `ExecuteCommandTool` orphaned-partial behaviour (VS Code masks it; changing it is
broad and risky). We match VS Code's presentation instead.
137 changes: 137 additions & 0 deletions ai_plans/2026-06-21_persist-workspace-path-task-list.md
@@ -0,0 +1,137 @@
# Persist `workspacePath` on tasks → show project/worktree in cloud web view

Date: 2026-06-21
Branch: stacks on `feature/self-hosted-remote-task-control` (depends on the
unmerged "task list on cloud web view" commit `82e4b0a1b`; main does not have it).

## Problem / evidence

The cloud web view never shows which project/worktree a task belongs to.

Traced the data flow end to end:

- The extension **does** capture the worktree root: `vscode.workspace.workspaceFolders[0].uri.fsPath`
→ `workspacePath` (src/extension/bridge.ts:93), emitted on `extension:register`
(packages/cloud/src/bridge/BridgeOrchestrator.ts:127; schema packages/types/src/cloud.ts:411).
- Server stores it **in-memory only**, per user, newest-wins:
ConnectionRegistry `_instance_by_user[user_id]["workspacePath"]`
(self-hosted-cloudapi/src/realtime/hub.py:56-62, accessor :76). Never persisted.
- The `tasks` table has no workspace/cwd/path column at all
(self-hosted-cloudapi/src/models/task.py).
- The two Task-creation paths both create `Task(id, user_id)` with nothing else:
  - live bridge: `upsert_task_message` (services/telemetry_service.py:117),
    called from realtime/sio.py:190 (has `user_id`).
  - share/backfill: `backfill_messages` (services/telemetry_service.py:48),
    called from routers/events.py:70.
- The backfill `properties` form field (TS getTelemetryProperties) carries only
`gitProperties` — `repositoryName` is identical across worktrees of one repo,
so it cannot identify a worktree. The absolute `workspacePath` is the correct key.

So the only authoritative server-side source of the worktree path is the
registry instance for the user, which is populated by the bridge.

## Design decision (chosen)

Source `workspace_path` with an explicit-first, registry-fallback strategy:

- **Live bridge** (`upsert_task_message`): from `registry.instance(user_id)["workspacePath"]`.
This is the only available source and is authoritative — events only flow while
the bridge is connected, so the registry is always populated here.
- **Share/backfill** (`backfill_messages`): **explicit client field first** — the
extension sends `workspacePath` in the backfill FormData — with the registry as
a **fallback** for older clients that don't send it. This gives 100% coverage
even when the bridge is OFF at share time.
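
The explicit-first, registry-fallback rule can be written as one small function. This is a sketch under assumptions: `resolve_workspace_path` is a hypothetical name, and `registry_instance` stands for whatever `registry.instance(user_id)` returns (a dict or `None`).

```python
def resolve_workspace_path(explicit, registry_instance):
    """Explicit client field first, registry instance as fallback, else None."""
    fallback = (registry_instance or {}).get("workspacePath")
    for candidate in (explicit, fallback):
        # Empty or whitespace-only strings are treated as "not provided".
        if isinstance(candidate, str) and candidate.strip():
            return candidate
    return None
```

The live-bridge path would call it with `explicit=None`, so both callers could share the same resolution code.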

This is consistent with what already crosses to the self-hosted server (the bridge
already sends the absolute `workspacePath`), and `getRooCodeApiUrl()` points at the
self-hosted cloud API in this fork.

Implementation surfaces for the explicit field (small, backward-compatible):

- packages/types/src/telemetry.ts: add OPTIONAL `getTelemetryWorkspacePath?(): string | undefined`
to `TelemetryPropertiesProvider` (optional → no break for other implementers).
- src/core/webview/ClineProvider.ts: implement it returning `this.cwd`
(`currentWorkspacePath || getWorkspacePath()`), already defined.
- packages/cloud/src/TelemetryClient.ts `backfillMessages`: append `workspacePath`
to the FormData from `this.providerRef?.deref()?.getTelemetryWorkspacePath?.()`
(only when non-empty). NOT added to general telemetry `properties` — kept out of
the per-event payload to avoid leaking an absolute path into every event.

Write semantics: set `workspace_path` when it is currently NULL (on Task create,
or on a later event for a pre-existing task that predates this feature). Never
overwrite a non-null value — a task does not change worktrees.
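
A minimal sketch of these write semantics, using a plain dataclass as a stand-in for the ORM `Task` model; `stamp_workspace_path` is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:  # stand-in for the SQLAlchemy model, for illustration only
    id: str
    user_id: str
    workspace_path: Optional[str] = None


def stamp_workspace_path(task, workspace_path):
    """Set workspace_path only when it is NULL and a non-empty value is known.

    Returns True when the task was modified, so the caller knows to flush.
    """
    if task.workspace_path is None and workspace_path:
        task.workspace_path = workspace_path
        return True
    return False
```

The same call covers both cases in the plan: a freshly created task and a legacy row that receives a later event.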

## Changes

1. **Model** — self-hosted-cloudapi/src/models/task.py
Add `workspace_path = Column(String, nullable=True)` to `Task`.

2. **Migration** — new alembic/versions/e5f6a7b8c9d0_task_workspace_path.py
`down_revision = "d4e5f6a7b8c9"` (current head).
upgrade: `op.add_column("tasks", sa.Column("workspace_path", sa.String(), nullable=True))`
downgrade: `op.drop_column("tasks", "workspace_path")`.

3. **Ingestion** — services/telemetry_service.py

- Add optional `workspace_path: str | None = None` param to `upsert_task_message`
and `backfill_messages`.
- On get-or-create, set `task.workspace_path = workspace_path` when creating.
- For an existing task whose `workspace_path` is NULL and a value is now known,
set it (one-time backfill of legacy rows). Guard: only when non-empty.

4. **Callers**

- realtime/sio.py (`on_task_event`, ~:188): resolve
`ws = (registry.instance(user_id) or {}).get("workspacePath")` and pass to
`upsert_task_message(..., workspace_path=ws)`.
- routers/events.py (`backfill_events_endpoint`, ~:70): use the explicit
`workspacePath` form field when the client sends it; otherwise resolve the same
from `registry.instance(current_user["user_id"])`. Pass the result to
`backfill_messages`. (Import the `registry` singleton from src.realtime.sio / hub.)

5. **Web view** — routers/web.py

- Add a small helper `_workspace_label(path)` → basename for compact display
(full path kept for the tooltip/header).
- task_list (~:219): add `"workspace": task.workspace_path` and
`"workspace_label": _workspace_label(task.workspace_path)` to each item dict.
- task_detail (~:300): pass `task` already in context (template can read
`task.workspace_path`); add a derived label to context for the header.

6. **Templates / static**

- templates/tasks_list.html: render a `badge badge-muted` with the basename and
`title="{{ t.workspace }}"` (full path on hover) in `.task-meta`, when present.
- templates/task_detail.html: show the worktree path in the header block
(full path; truncate with CSS if needed).
- static/app.css: minor style for the new label if needed (reuse existing
`.badge`/`.task-date` styling; avoid new classes unless necessary).

7. **Tests** — tests/test_web_and_share.py
- Live path: simulate a registered extension instance with a `workspacePath`,
drive a task event, assert the persisted Task row has `workspace_path` and the
`/app` + detail pages render the basename/full path.
- Backfill path: register instance, POST /api/events/backfill, assert persisted
`workspace_path`.
- Null path: a task created with no registry instance → `workspace_path` NULL,
page renders without the badge (no crash).
- One-time backfill: pre-existing NULL row gets populated on a later event.

## Verification (done)

- `python -m pytest` in self-hosted-cloudapi: **73 passed** (incl. 4 new — live
stamp, legacy-NULL backfill + no-overwrite, explicit backfill field, registry
fallback; 2 new web-render: list badge + full-path detail, and null-renders-clean).
- Migration upgrade/downgrade roundtrip proven on SQLite in isolation (adds then
drops `tasks.workspace_path`); `alembic heads` shows a single head `e5f6a7b8c9d0`.
(Full-chain SQLite upgrade is blocked by a pre-existing Postgres-only timezone
migration, unrelated to this change.)
- `turbo check-types` for tumble-code + @roo-code/types + @roo-code/cloud: clean.
- `@roo-code/cloud` vitest: **278 passed** (TelemetryClient suite now 26, +2 for the
explicit workspacePath field present/absent).

## Out of scope

- Live-cockpit header display of `instance.workspacePath` (that data already
reaches the browser in the join ack; separate "option 1" branch).
- Multi-root workspaces: only `workspaceFolders[0]` is captured (extension-side,
pre-existing limitation).