
bug(web): session enrichment uses project default agent instead of persisted session agent #1995

@yyovil


Bug

The web dashboard's metadata-enrichment path uses the current project/default agent to enrich every session in a project instead of the agent persisted on each session record. In mixed-agent projects, an older goose, gemini, grok, or other non-Codex session can be handed to the Codex plugin whenever the project default is currently Codex.

That causes two bad outcomes:

  1. Performance: Codex receives a non-Codex session with no codexThreadId, falls back to cwd-based discovery, and scans ~/.codex/sessions.
  2. Correctness: because the cwd is shared, Codex can find an unrelated recent Codex rollout and attribute that rollout's model, thread, and cost to the non-Codex session.

Source: local AO dogfooding and live debugging with ao_timeout_again.har and Next.js OOM logs.
Reported by: @yyovil
Date: 2026-05-22
Analyzed against: upstream/main d5249228498fc5cb8f14609fcd81baa3630e5076; installed #1994 worktree commit 63ad054566e706eed989d1d9ce0b696bda58a4b6. The root checkout was dirty and ahead of upstream, so the latest upstream was fetched but not pulled.
AO version: 0.8.0
Environment: macOS Darwin 25.4.0 arm64, Node v22.22.2, zsh.
Confidence: High — reproduced locally with one concrete session metadata file and traced to exact web serialization code.

Concrete local example

Problematic session metadata file:

~/.agent-orchestrator/projects/agent-orchestrator_48321dec7a/sessions/ao-22.json

Relevant fields:

{
  "agent": "goose",
  "worktree": "/Users/tanishqpalandurkar/Projects/agent-orchestrator",
  "createdAt": "2026-05-20T12:48:10.866Z",
  "status": "stuck",
  "lifecycle": {
    "session": {
      "state": "terminated",
      "reason": "runtime_lost"
    }
  }
}

This file has no codexThreadId because it is not a Codex session.

Current local project state:

project sessions: 69
worker sessions: 68
sessions needing summary and missing codexThreadId: 43
~/.codex/sessions JSONL files: 982
largest JSONL observed: 92 MB

The problematic rows are recent, not ancient metadata. They were created between:

2026-05-20T12:48:10Z and 2026-05-21T09:28:37Z

Local reproduction evidence

Direct project listing is not the slow part:

sessionManager.list(project): 243ms, RSS ~79 MB

But forcing Codex enrichment on a non-Codex session reproduces the slow and wrong path:

session: ao-22
persisted agent: goose
codexThreadId: undefined
workspacePath: /Users/tanishqpalandurkar/Projects/agent-orchestrator

codex.getSessionInfo(ao-22): 7681ms

The result returned unrelated Codex metadata:

{
  "summary": "Codex session (gpt-5.5)",
  "agentSessionId": "019e4b8d-b055-7501-845e-b60da93cb526",
  "metadata": {
    "codexThreadId": "019e4b8d-b055-7501-845e-b60da93cb526",
    "codexModel": "gpt-5.5"
  },
  "cost": {
    "inputTokens": 51287022,
    "outputTokens": 134644,
    "estimatedCostUsd": 129.563995
  }
}

That thread ID belongs to the current/recent Codex orchestrator, not to ao-22's stored goose session: the cwd fallback matched the shared repo path and selected a recent Codex rollout.
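The misattribution can be illustrated with a simplified model of a cwd-based fallback. This is a hypothetical sketch, not the Codex plugin's actual code: `Rollout`, `SessionRecord`, `findRolloutByCwd`, and their fields are illustrative names, and it assumes the fallback picks the most recently modified rollout whose recorded cwd equals the session's workspace path. The point it shows is that nothing in such a lookup consults the session's persisted `agent`.

```typescript
// Hypothetical shape of a Codex rollout discovered under ~/.codex/sessions.
interface Rollout {
  threadId: string;
  cwd: string;
  model: string;
  mtimeMs: number;
}

// Hypothetical shape of the session row being enriched.
interface SessionRecord {
  id: string;
  agent: string;
  workspacePath: string;
  codexThreadId?: string;
}

// Simplified lookup (assumed behavior): exact match on codexThreadId when present,
// otherwise the newest rollout whose cwd matches the session's workspace path.
function findRolloutByCwd(session: SessionRecord, rollouts: Rollout[]): Rollout | undefined {
  if (session.codexThreadId) {
    return rollouts.find((r) => r.threadId === session.codexThreadId);
  }
  const matches = rollouts.filter((r) => r.cwd === session.workspacePath);
  matches.sort((a, b) => b.mtimeMs - a.mtimeMs); // newest first
  return matches[0];
}

const repo = "/Users/example/Projects/agent-orchestrator"; // placeholder path
const rollouts: Rollout[] = [
  { threadId: "old-thread", cwd: repo, model: "gpt-5", mtimeMs: 1_000 },
  { threadId: "orchestrator-thread", cwd: repo, model: "gpt-5.5", mtimeMs: 2_000 },
];

// A goose session in the same repo: the fallback ignores `agent`,
// so it is matched to the orchestrator's Codex rollout purely by cwd.
const gooseSession: SessionRecord = { id: "ao-22", agent: "goose", workspacePath: repo };
console.log(findRolloutByCwd(gooseSession, rollouts)?.threadId); // orchestrator-thread
```

In the real plugin each such fallback also has to read and parse rollout files, which is where the ~/.codex/sessions scan cost comes from.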

Root Cause

Core session loading already preserves enough information to know each session's intended agent. In session-manager.ts, the persisted session selection is resolved before core enrichment:

const selection = resolveSelectionForSession(project, sessionId, repaired.raw);
const effectiveAgentName = selection.agentName;
const plugins = resolvePlugins(project, effectiveAgentName);

But the web serialization/enrichment layer does not use that persisted session agent. It resolves the project and picks the current project/default agent:

const agentName = projects[i]?.agent ?? config.defaults.agent;
const agent = registry.get<Agent>("agent", agentName);
return enrichSessionAgentSummary(dashboardSessions[i], core, agent);

So if the project default is now codex, every session without a summary can be passed to Codex, regardless of whether its metadata says agent: goose, agent: gemini, agent: grok, or something else.
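A minimal sketch of the intended precedence, assuming the loaded session exposes its persisted agent as an optional `agent` field (the fix proposed below; the current `Session` type may not carry it). `resolveEnrichmentAgentName` and the config shapes are hypothetical, not existing AO code.

```typescript
interface ProjectConfig { agent?: string }
interface Defaults { agent: string }
// `agent` is the persisted session agent (assumed field, see lead-in).
interface LoadedSession { id: string; agent?: string }

// Persisted session agent wins; the project agent and then the global default
// are only fallbacks for legacy records that never stored an agent.
function resolveEnrichmentAgentName(
  session: LoadedSession,
  project: ProjectConfig | undefined,
  defaults: Defaults,
): string {
  return session.agent ?? project?.agent ?? defaults.agent;
}

const project: ProjectConfig = { agent: "codex" };
const defaults: Defaults = { agent: "gemini" };

console.log(resolveEnrichmentAgentName({ id: "ao-22", agent: "goose" }, project, defaults)); // goose
console.log(resolveEnrichmentAgentName({ id: "legacy-1" }, project, defaults)); // codex
console.log(resolveEnrichmentAgentName({ id: "legacy-2" }, undefined, defaults)); // gemini
```

In serialize.ts this would take the place of `projects[i]?.agent ?? config.defaults.agent` in the web snippet quoted above, so that `registry.get<Agent>("agent", agentName)` would return the goose plugin for ao-22.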

Why this caused the observed timeout/OOM family

This bug interacts with #1991 and #1855:

  • The session-detail page calls fresh session-list endpoints for sidebar/project zone counts.
  • /api/sessions?fresh=true bypasses listCached() and then web enrichment runs over many worker sessions.
  • Web enrichment selects Codex for non-Codex rows because the project default is Codex.
  • Codex sees no codexThreadId, falls back to cwd-based discovery under ~/.codex/sessions.
  • Multiple rows share the same repo cwd, so enrichment on the request path can repeatedly scan and parse unrelated Codex history.
  • Client aborts/timeouts do not necessarily cancel server-side work already started.

This explains why #1992/#1994 improved real Codex sessions with codexThreadId, but did not fully fix the reload timeout/OOM: many rows are not Codex sessions at all, and the web layer is routing them into Codex anyway.

Reproduction

  1. Use a project where the current default/project agent is codex.
  2. Have existing session metadata files in that project whose persisted agent is not codex, e.g. agent: goose, and which have no persisted summary.
  3. Open the session detail page or call the project sessions API path that triggers web metadata enrichment.
  4. Observe enrichSessionsMetadata(...) choosing the project/default Codex agent and calling codex.getSessionInfo(...) for the non-Codex session.
  5. On machines with a large ~/.codex/sessions directory, each fallback can take seconds (7681ms for ao-22 above) and can produce wrong Codex summary/cost metadata for non-Codex sessions.

Fix

  • Preserve/expose the session's persisted agent on the Session object, or provide a core helper that resolves the effective agent for a loaded session.
  • In packages/web/src/lib/serialize.ts, use the persisted session agent for session-specific summary enrichment before falling back to project/default agent for truly legacy records.
  • Do not call Codex getSessionInfo() for a session whose persisted agent is known and is not Codex.
  • Add regression tests with a mixed-agent project:
    • default/project agent is codex
    • stored session metadata has agent: goose
    • enrichSessionsMetadata(...) must not call Codex getSessionInfo() for that row
  • Consider skipping agent summary enrichment entirely for terminal/runtime-lost rows unless persisted native metadata is already present.
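The regression test described above could take roughly this shape. It is a self-contained sketch: `enrichSessionsMetadata` here is a hypothetical, simplified stand-in for the real function in packages/web/src/lib/serialize.ts, and `fakeAgent`, `DashboardSession`, and the registry shape are illustrative, not AO's actual types or test helpers. It checks that Codex is never asked about the goose row while a legacy row with no persisted agent still falls back to the project agent.

```typescript
// Minimal stand-ins; the real types live in AO core and will differ.
interface SessionInfo { summary: string }
interface Agent {
  name: string;
  getSessionInfo(sessionId: string): Promise<SessionInfo | null>;
}
interface DashboardSession { id: string; agent?: string; summary?: string }

// Fake agent plugin that records which sessions it was asked to enrich.
function fakeAgent(name: string): Agent & { calls: string[] } {
  const calls: string[] = [];
  return {
    name,
    calls,
    async getSessionInfo(sessionId: string) {
      calls.push(sessionId);
      return { summary: `${name} session` };
    },
  };
}

// Hypothetical simplified enrichment loop using the persisted-agent-first rule.
async function enrichSessionsMetadata(
  sessions: DashboardSession[],
  projectAgent: string,
  registry: Record<string, Agent>,
): Promise<void> {
  for (const session of sessions) {
    if (session.summary) continue; // already summarized
    const agent = registry[session.agent ?? projectAgent];
    if (!agent) continue; // unknown agent: skip rather than guess
    const info = await agent.getSessionInfo(session.id);
    if (info) session.summary = info.summary;
  }
}

const codex = fakeAgent("codex");
const goose = fakeAgent("goose");
const registry: Record<string, Agent> = { codex, goose };

// Mixed-agent project whose project/default agent is codex.
const sessions: DashboardSession[] = [
  { id: "ao-22", agent: "goose" }, // persisted non-Codex agent
  { id: "ao-legacy" }, // legacy record with no persisted agent
];

const done = enrichSessionsMetadata(sessions, "codex", registry).then(() => {
  console.log(codex.calls.join(",")); // ao-legacy
  console.log(goose.calls.join(",")); // ao-22
});
```

The same harness could be extended to cover the terminal/runtime-lost skip by adding a lifecycle field to `DashboardSession` and asserting that no agent is called for such rows.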

Impact

  • Dashboard/session-detail can time out or OOM on mixed-agent projects with a large Codex history.
  • Non-Codex sessions can display incorrect Codex summaries/costs because cwd fallback may attach the wrong Codex rollout.
  • This affects recent sessions too; it is not limited to pre-codexThreadId legacy Codex metadata.
