
Auto context condensing unreliable on vscode-lm: UI shows real ~128K window but condense trigger uses inflated advertised window (936K-1M) #10

Description

@simurg79

Summary

On the vscode-lm (GitHub Copilot) provider, automatic context condensing is unreliable — it "sometimes kicks in but mostly ignores the limit." Root cause: the UI and the condense trigger read two different sources of truth for the model's context window.

  • The UI context bar shows the real, conservative window (~128K).
  • The condense/truncation trigger uses the provider's advertised window (~936K–1M for copilot/claude-opus-4.8).

Because the trigger divides by the inflated number, at ~128K of real usage it computes ~14% full (128K / 936K ≈ 13.7%) and never fires. The request then grows past the real Copilot backend cap (~128K), the Copilot→Anthropic proxy trims messages from the front to fit, and the trim is not tool-pairing-aware, which produces the recurring Anthropic 400:

messages.0.content.0: unexpected `tool_use_id` found in `tool_result` blocks: toolu_…
Each `tool_result` block must have a corresponding `tool_use` block in the previous message.

Provider: vscode-lm, model copilot/claude-opus-4.8.


Two sources of truth (the core bug)

A. UI bar → static model table (shows ~128K, correct)

The webview derives the window from useSelectedModel, which for vscode-lm returns the static vscodeLlmModels entry, falling back to sane defaults when the family is unknown:

  • webview-ui/src/components/chat/TaskHeader.tsx: const contextWindow = model?.contextWindow || 1
  • webview-ui/src/components/ui/hooks/useSelectedModel.ts (case "vscode-lm") — returns { ...openAiModelInfoSaneDefaults, ...info, supportsImages: false }
  • packages/types/src/providers/vscode-llm.ts — static table tops out at contextWindow: 128000 (claude-4-sonnet); claude-3.5-sonnet is 81638. claude-opus-4.8 is NOT in this table.
  • packages/types/src/providers/openai.ts: openAiModelInfoSaneDefaults.contextWindow = 128_000

⇒ For claude-opus-4.8, the table lookup misses, the code falls back to openAiModelInfoSaneDefaults, and the UI always shows 128K. This matches the observed UI value.
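
The fallback on the UI path can be sketched as below. This is a simplified stand-in, not the actual hook: the table keys, the trimmed-down ModelInfo type, and the function name uiContextWindow are assumptions for illustration; only the two window values and the spread/fallback shape come from the files listed above.

```typescript
// Simplified stand-in for the UI path; names and types are illustrative only.
interface ModelInfo {
  contextWindow: number;
  supportsImages?: boolean;
}

// Mirrors openAiModelInfoSaneDefaults.contextWindow = 128_000.
const saneDefaults: ModelInfo = { contextWindow: 128_000 };

// Two entries quoted in this issue; the key spelling is assumed.
const staticTable: Record<string, Partial<ModelInfo>> = {
  "claude-4-sonnet": { contextWindow: 128_000 },
  "claude-3.5-sonnet": { contextWindow: 81_638 },
};

function uiContextWindow(family: string): number {
  // A missing family yields undefined, so only the defaults survive the spread.
  const info: Partial<ModelInfo> | undefined = staticTable[family];
  const model = { ...saneDefaults, ...info, supportsImages: false };
  return model.contextWindow || 1;
}

const opusWindow = uiContextWindow("claude-opus-4.8"); // table miss, falls back to 128000
const sonnetWindow = uiContextWindow("claude-3.5-sonnet"); // table hit, 81638
```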

B. Condense trigger → live provider value (uses ~936K, inflated)

The extension host computes the window dynamically from the live VS Code LM client:

  • src/api/providers/vscode-lm.ts (getModel()):
    contextWindow:
      typeof this.client.maxInputTokens === "number"
        ? Math.max(0, this.client.maxInputTokens)   // ~936K–1M for copilot/claude
        : openAiModelInfoSaneDefaults.contextWindow,
  • src/core/task/Task.ts (attemptApiRequest): const contextWindow = modelInfo.contextWindow → passed into the trigger.
  • src/core/context-management/index.ts (manageContext and willManageContext):
    const contextPercent = (100 * prevContextTokens) / contextWindow
    if (contextPercent >= effectiveThreshold || prevContextTokens > allowedTokens) { /* condense */ }
    // allowedTokens = contextWindow * (1 - TOKEN_BUFFER_PERCENTAGE) - reservedTokens

⇒ Both the percentage path and the absolute fallback (allowedTokens ≈ 0.9 × 936K) scale with the inflated window, so neither fires near the real ~128K wall.
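
To make the arithmetic concrete, here is a minimal sketch of the trigger condition evaluated against both windows. It is not the code from src/core/context-management/index.ts: the function name shouldCondense, the 10% buffer (implied by "0.9 × 936K"), and the 8,192-token reservation are assumptions chosen for illustration.

```typescript
// Assumed value, implied by "allowedTokens ≈ 0.9 × 936K" above.
const TOKEN_BUFFER_PERCENTAGE = 0.1;

interface TriggerInput {
  prevContextTokens: number;
  contextWindow: number;
  reservedTokens: number; // output reservation; the value used below is hypothetical
  effectiveThreshold: number; // percent, 100 by default
}

interface TriggerResult {
  contextPercent: number;
  allowedTokens: number;
  fires: boolean;
}

// Same two-path condition as quoted above: percentage OR absolute fallback.
function shouldCondense(input: TriggerInput): TriggerResult {
  const { prevContextTokens, contextWindow, reservedTokens, effectiveThreshold } = input;
  const contextPercent = (100 * prevContextTokens) / contextWindow;
  const allowedTokens = contextWindow * (1 - TOKEN_BUFFER_PERCENTAGE) - reservedTokens;
  const fires = contextPercent >= effectiveThreshold || prevContextTokens > allowedTokens;
  return { contextPercent, allowedTokens, fires };
}

const usage = { prevContextTokens: 128_000, reservedTokens: 8_192, effectiveThreshold: 100 };

// Advertised window: about 13.7% full, allowedTokens around 834K, so nothing fires.
const advertised = shouldCondense({ ...usage, contextWindow: 936_000 });

// Real window: 100% full and above allowedTokens (around 107K), so both paths fire.
const real = shouldCondense({ ...usage, contextWindow: 128_000 });
```

With identical usage, only the window differs between the two calls, which is why the behavior flips with the provider rather than with the conversation.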

The models capability dump from the failing session confirms what Copilot advertises to the VS Code LM API:

family                  max_prompt_tokens   context window
claude-opus-4.8                   936,000        1,000,000
claude-opus-4.7-high              167,997          200,000
claude-opus-4.5                   167,997          200,000

Why it's "sometimes, but mostly not"

  1. Provider-dependent (dominant). Models whose advertised window is realistic condense correctly; vscode-lm/Copilot advertises 936K–1M while the real budget is ~128K, so it under-fires. Switching model/profile flips the behavior.
  2. Default threshold is 100%. If the global slider was never changed, autoCondenseContextPercent defaults to 100, so the percentage path fires only once the window is already full:
    • webview-ui/src/context/ExtensionStateContext.tsx: autoCondenseContextPercent: 100
    • src/core/task/Task.ts: destructure default autoCondenseContextPercent = 100
      The per-profile map also falls back to this global default when the active profile has no entry (src/core/context-management/index.ts, threshold resolution).
  3. Measurement lag (secondary). prevContextTokens comes from the last completed api_req_started record (packages/core/src/message-utils/consolidateTokenUsage.ts) plus only the single last message's tokens (manageContext). A turn that injects a large jump (big tool results / multiple file reads) isn't reflected until the next request — so the request that blows the limit can do so before the check ever sees the overage.

Downstream symptom: orphaned tool_result 400

With condensing not firing, Roo sends the full (well-formed) history. The Copilot→Anthropic proxy trims from the front to fit the real cap; the cut lands between an assistant tool_use message and its following user tool_result message (worsened by parallel tool calls, where one assistant message holds several tool_use blocks and the next user message holds all the matching tool_results). The surviving tool_result becomes messages[0].content[0] with no preceding tool_use, and Anthropic rejects it. Evidence from a captured report: Roo's stored history was clean (messages[0] was the task text; the tool_use/tool_result pair was matched; no summary present ⇒ condense never ran), yet the sent request was rejected — confirming the corruption is introduced downstream, after Roo hands off.

Note: the Copilot proxy is closed-source, so "front-trim" is inferred from the message-shape math (stored-clean + sent-corrupted, Anthropic request_id), not verified line-by-line. The in-repo defect (trusting the advertised window) is the fixable enabling cause.
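
The inferred failure shape can be reproduced in a few lines. This sketch does not describe the proxy's real logic, which is not visible: frontTrim is a hypothetical stand-in that simply keeps the last N messages, and the block types are a simplified Anthropic-style shape assumed for illustration.

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string }
  | { type: "tool_result"; tool_use_id: string };

interface Message {
  role: "user" | "assistant";
  content: Block[];
}

// Hypothetical stand-in for a pairing-unaware trim: keep only the last `keep` messages.
function frontTrim(messages: Message[], keep: number): Message[] {
  return messages.slice(Math.max(0, messages.length - keep));
}

// tool_result ids in messages[0]; nothing precedes them, so every one is an orphan.
function leadingOrphans(messages: Message[]): string[] {
  const ids: string[] = [];
  const first = messages[0];
  if (!first) return ids;
  for (const block of first.content) {
    if (block.type === "tool_result") ids.push(block.tool_use_id);
  }
  return ids;
}

// Well-formed stored history with one parallel tool call (two tool_use blocks).
const history: Message[] = [
  { role: "user", content: [{ type: "text", text: "task" }] },
  { role: "assistant", content: [{ type: "tool_use", id: "toolu_a" }, { type: "tool_use", id: "toolu_b" }] },
  { role: "user", content: [{ type: "tool_result", tool_use_id: "toolu_a" }, { type: "tool_result", tool_use_id: "toolu_b" }] },
  { role: "assistant", content: [{ type: "text", text: "done" }] },
];

// The cut lands between the assistant tool_use message and its tool_result message.
const sent = frontTrim(history, 2);
const orphansInStored = leadingOrphans(history); // []
const orphansInSent = leadingOrphans(sent); // ["toolu_a", "toolu_b"]
```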


Findings checklist

  • F1 (primary): vscode-lm getModel() reports client.maxInputTokens (936K–1M) as contextWindow, which is not honored end-to-end on the Copilot path — src/api/providers/vscode-lm.ts.
  • F2: Condense trigger scales both percentage and absolute-fallback by that inflated window — src/core/context-management/index.ts, src/core/task/Task.ts.
  • F3 (inconsistency): UI bar uses a different source (static vscodeLlmModels / openAiModelInfoSaneDefaults = 128K) than the trigger — useSelectedModel.ts, TaskHeader.tsx, packages/types/src/providers/vscode-llm.ts, packages/types/src/providers/openai.ts.
  • F4: claude-opus-4.8 (and other newer Copilot families) are absent from the static vscodeLlmModels table, so the UI silently falls back to the 128K sane default.
  • F5: Default autoCondenseContextPercent is 100%, so even with a correct window the trigger only fires at the very top — ExtensionStateContext.tsx, Task.ts.
  • F6 (secondary): Token-usage measurement lags one request and only adds the last message, so a single large turn can overshoot before the gate sees it — consolidateTokenUsage.ts, manageContext.
  • F7 (downstream): No tool-pairing-safe guard at the provider boundary, so an externally-trimmed array can present an orphaned leading tool_result (src/api/transform/vscode-lm-format.ts, src/api/providers/vscode-lm.ts).

Suggested fixes

  1. Single source of truth for the window. Make the condense trigger and the UI agree. Preferred: clamp the vscode-lm effective contextWindow to the real working budget instead of trusting client.maxInputTokens (evidence ⇒ ~128K). Ideally a configurable per-provider cap defaulting to 128K for vscode-lm. This fixes F1–F4 at once and makes the bar truthful for the trigger too.
  2. Add the missing Copilot families (e.g. claude-opus-4.8) to vscodeLlmModels with realistic windows, so the UI/static path stops falling back to the generic 128K default and both paths read the same number.
  3. Lower the default autoCondenseContextPercent below 100% (e.g. 75–80%) so condensing has headroom even when the window is correct (F5).
  4. (Defense in depth) Sanitize the outgoing message array at the vscode-lm boundary so messages[0] can never begin with an orphaned tool_result (strip/repair leading orphans after any trimming/merging) — closes F7 even if a downstream trim still happens.
  5. (Optional) Tighten the pre-send token estimate to include the just-built turn so the gate doesn't lag by one request (F6).
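
Fixes 1 and 4 could look roughly like the sketch below. Everything named here is hypothetical (DEFAULT_VSCODE_LM_CAP, effectiveContextWindow, stripLeadingOrphans), and the message shape is a simplified Anthropic-style stand-in rather than the types used in vscode-lm-format.ts. The orphan repair only handles messages[0] and does not re-validate role ordering afterwards.

```typescript
// Hypothetical per-provider cap; 128K is the working budget observed in this issue.
const DEFAULT_VSCODE_LM_CAP = 128_000;

// Fix 1: clamp the advertised value instead of trusting it as-is.
function effectiveContextWindow(maxInputTokens: unknown, cap: number = DEFAULT_VSCODE_LM_CAP): number {
  if (typeof maxInputTokens !== "number" || !Number.isFinite(maxInputTokens)) return cap;
  return Math.min(Math.max(0, maxInputTokens), cap);
}

type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string }
  | { type: "tool_result"; tool_use_id: string };

interface Message {
  role: "user" | "assistant";
  content: Block[];
}

// Fix 4: drop tool_result blocks from messages[0]; drop the message if nothing is left.
function stripLeadingOrphans(messages: Message[]): Message[] {
  const [first, ...rest] = messages;
  if (!first) return [];
  const kept = first.content.filter((block) => block.type !== "tool_result");
  if (kept.length === first.content.length) return messages; // already clean
  return kept.length > 0 ? [{ ...first, content: kept }, ...rest] : rest;
}

const clamped = effectiveContextWindow(936_000); // 128000
const untouched = effectiveContextWindow(81_638); // 81638, already under the cap

const trimmed: Message[] = [
  { role: "user", content: [{ type: "tool_result", tool_use_id: "toolu_a" }] },
  { role: "assistant", content: [{ type: "text", text: "done" }] },
];
const repaired = stripLeadingOrphans(trimmed); // only the assistant message remains
```

Clamping changes what the trigger sees, so it addresses the enabling cause; the orphan strip is only a backstop for arrays that were already cut.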

Reproduction

  1. Use the vscode-lm provider with copilot/claude-opus-4.8 (advertised window ~936K–1M).
  2. Leave auto-condense enabled at the default threshold.
  3. Drive the conversation past ~128K tokens (e.g. several large file reads / parallel tool calls).
  4. Observe: the UI context bar reads ~128K (correct), no condense fires, and the request fails with the Anthropic messages.0.content.0 … unexpected tool_use_id 400.

Environment

  • Provider: vscode-lm (GitHub Copilot), model copilot/claude-opus-4.8
  • Symptom onset: ~128K tokens (the real Copilot backend cap, far below the advertised 936K–1M)
