On the vscode-lm (GitHub Copilot) provider, automatic context condensing is unreliable — it "sometimes kicks in but mostly ignores the limit." Root cause: the UI and the condense trigger read two different sources of truth for the model's context window.
The UI context bar shows the real, conservative window (~128K).
The condense/truncation trigger uses the provider's advertised window (~936K–1M for copilot/claude-opus-4.8).
Because the trigger divides by the inflated number, at ~128K of real usage it computes ~14% full and never fires. The request then grows past the real Copilot backend cap (~128K), the Copilot→Anthropic proxy trims messages from the front to fit, and the trim is not tool-pairing-aware — producing the recurring Anthropic 400:
messages.0.content.0: unexpected `tool_use_id` found in `tool_result` blocks: toolu_…
Each `tool_result` block must have a corresponding `tool_use` block in the previous message.
Provider: vscode-lm, model copilot/claude-opus-4.8.
Two sources of truth (the core bug)
A. UI bar → static model table (shows ~128K, correct)
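The webview derives the window from useSelectedModel, which for vscode-lm returns the static vscodeLlmModels entry, falling back to sane defaults when the family is unknown:
webview-ui/src/components/chat/TaskHeader.tsx: const contextWindow = model?.contextWindow || 1
webview-ui/src/components/ui/hooks/useSelectedModel.ts (case "vscode-lm"): returns { ...openAiModelInfoSaneDefaults, ...info, supportsImages: false }
packages/types/src/providers/vscode-llm.ts: static table tops out at contextWindow: 128000 (claude-4-sonnet); claude-3.5-sonnet is 81638. claude-opus-4.8 is NOT in this table.
packages/types/src/providers/openai.ts: openAiModelInfoSaneDefaults.contextWindow = 128_000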
⇒ For claude-opus-4.8, the table lookup misses, the code falls back to openAiModelInfoSaneDefaults, and the UI always shows 128K. This matches the observed UI value.
B. Condense trigger → live provider value (uses ~936K, inflated)
The extension host computes the window dynamically from the live VS Code LM client:
src/api/providers/vscode-lm.ts (getModel()):
contextWindow:
  typeof this.client.maxInputTokens === "number"
    ? Math.max(0, this.client.maxInputTokens) // ~936K–1M for copilot/claude
    : openAiModelInfoSaneDefaults.contextWindow,
src/core/task/Task.ts (attemptApiRequest): const contextWindow = modelInfo.contextWindow → passed into the trigger.
src/core/context-management/index.ts (manageContext and willManageContext):
⇒ Both the percentage path and the absolute fallback (allowedTokens ≈ 0.9 × 936K) scale with the inflated window, so neither fires near the real ~128K wall.
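The arithmetic behind this can be sketched in a few lines. This is a minimal illustration, not the code in src/core/context-management/index.ts: the function and type names are hypothetical, and the 0.9 factor for the absolute fallback is taken from the approximation quoted above. It evaluates the same 128K of real usage against the inflated window and against a realistic one.

```typescript
// Hypothetical sketch of the trigger arithmetic; none of these names come from the repository.

interface TriggerInput {
	contextTokens: number // tokens currently in the conversation
	contextWindow: number // window the trigger believes it has
	thresholdPercent: number // e.g. the autoCondenseContextPercent setting
}

interface TriggerResult {
	percentFull: number
	allowedTokens: number
	firesOnPercent: boolean
	firesOnAbsolute: boolean
}

// Assumption: the absolute fallback allows roughly 90% of the window.
const ABSOLUTE_FRACTION = 0.9

function evaluateTrigger(input: TriggerInput): TriggerResult {
	const percentFull = (100 * input.contextTokens) / input.contextWindow
	const allowedTokens = ABSOLUTE_FRACTION * input.contextWindow
	return {
		percentFull,
		allowedTokens,
		firesOnPercent: percentFull >= input.thresholdPercent,
		firesOnAbsolute: input.contextTokens > allowedTokens,
	}
}

// Same real usage (128K), two different beliefs about the window.
const inflated = evaluateTrigger({ contextTokens: 128_000, contextWindow: 936_000, thresholdPercent: 100 })
const realistic = evaluateTrigger({ contextTokens: 128_000, contextWindow: 128_000, thresholdPercent: 100 })

console.log(Math.round(inflated.percentFull), inflated.firesOnPercent, inflated.firesOnAbsolute) // 14 false false
console.log(Math.round(realistic.percentFull), realistic.firesOnPercent, realistic.firesOnAbsolute) // 100 true true
```

With the inflated window the conversation looks about 14% full and neither path fires; with a 128K window both paths fire at the same usage.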
The models capability dump from the failing session confirms what Copilot advertises to the VS Code LM API:
| family | max_prompt_tokens | context window |
| --- | --- | --- |
| claude-opus-4.8 | 936,000 | 1,000,000 |
| claude-opus-4.7-high | 167,997 | 200,000 |
| claude-opus-4.5 | 167,997 | 200,000 |
Why it's "sometimes, but mostly not"
Provider-dependent (dominant). Models whose advertised window is realistic condense correctly; vscode-lm/Copilot advertises 936K–1M while the real budget is ~128K, so it under-fires. Switching model/profile flips the behavior.
Default threshold is 100%. If the global slider was never changed, autoCondenseContextPercent defaults to 100:
webview-ui/src/context/ExtensionStateContext.tsx (autoCondenseContextPercent: 100)
src/core/task/Task.ts (destructure default autoCondenseContextPercent = 100)
The per-profile map also falls back to this global default when the active profile has no entry (src/core/context-management/index.ts, threshold resolution).
Measurement lag (secondary). prevContextTokens comes from the last completed api_req_started record (packages/core/src/message-utils/consolidateTokenUsage.ts) plus only the single last message's tokens (manageContext). A turn that injects a large jump (big tool results / multiple file reads) isn't reflected until the next request, so the request that blows the limit can do so before the check ever sees the overage.
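The measurement lag can be illustrated with a small sketch. The names and token counts below are made up for illustration and are not taken from consolidateTokenUsage.ts or manageContext; the point is only that an estimate built from the last reported usage plus one message can sit under the budget while the request actually being sent is over it.

```typescript
// Hypothetical names and numbers throughout; this is not the repository's code.

interface Turn {
	reportedContextTokens: number // usage reported when the previous request completed
	lastMessageTokens: number // estimate for the single most recent message
	otherNewTokens: number // everything else added since that report (e.g. extra tool results)
}

// Lagging estimate: last completed request's usage plus only the last message.
function laggingEstimate(t: Turn): number {
	return t.reportedContextTokens + t.lastMessageTokens
}

// Tighter estimate: include all content built since the last report.
function preSendEstimate(t: Turn): number {
	return t.reportedContextTokens + t.lastMessageTokens + t.otherNewTokens
}

const turn: Turn = { reportedContextTokens: 90_000, lastMessageTokens: 5_000, otherNewTokens: 40_000 }
const budget = 128_000

console.log(laggingEstimate(turn) > budget) // false: the gate sees 95,000
console.log(preSendEstimate(turn) > budget) // true: the request carries 135,000
```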
Downstream symptom: orphaned tool_result 400
With condensing not firing, Roo sends the full (well-formed) history. The Copilot→Anthropic proxy trims from the front to fit the real cap; the cut lands between an assistant tool_use message and its following user tool_result message (worsened by parallel tool calls, where one assistant message holds several tool_use blocks and the next user message holds all the matching tool_results). The surviving tool_result becomes messages[0].content[0] with no preceding tool_use, and Anthropic rejects it. Evidence from a captured report: Roo's stored history was clean (messages[0] was the task text; the tool_use/tool_result pair was matched; no summary present ⇒ condense never ran), yet the sent request was rejected — confirming the corruption is introduced downstream, after Roo hands off.
Note: the Copilot proxy is closed-source, so "front-trim" is inferred from the message-shape math (stored-clean + sent-corrupted, Anthropic request_id), not verified line-by-line. The in-repo defect (trusting the advertised window) is the fixable enabling cause.
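The message-shape argument can be reproduced with a toy model. The sketch below is an assumption-laden illustration: the block types are simplified, the tool IDs are invented, naiveFrontTrim stands in for the inferred (unverified) proxy behaviour, and stripLeadingOrphans is a hypothetical guard rather than existing Roo code.

```typescript
// Simplified, Anthropic-style content blocks; illustrative only.
type Block =
	| { type: "text"; text: string }
	| { type: "tool_use"; id: string }
	| { type: "tool_result"; tool_use_id: string }

interface Message {
	role: "user" | "assistant"
	content: Block[]
}

// Assumed downstream behaviour: keep only the last `keep` messages,
// with no regard for tool_use/tool_result pairing.
function naiveFrontTrim(messages: Message[], keep: number): Message[] {
	return messages.slice(Math.max(0, messages.length - keep))
}

// True when the first message carries a tool_result, which by position
// cannot have a preceding tool_use.
function hasLeadingOrphan(messages: Message[]): boolean {
	const first = messages[0]
	return first !== undefined && first.content.some((b) => b.type === "tool_result")
}

// Hypothetical guard: remove leading tool_result blocks, dropping the
// message entirely if nothing else remains in it.
function stripLeadingOrphans(messages: Message[]): Message[] {
	const result = [...messages]
	while (result.length > 0 && hasLeadingOrphan(result)) {
		const kept = result[0].content.filter((b) => b.type !== "tool_result")
		if (kept.length > 0) {
			result[0] = { ...result[0], content: kept }
		} else {
			result.shift()
		}
	}
	return result
}

// Parallel tool calls: one assistant message, two tool_use blocks (IDs are invented).
const history: Message[] = [
	{ role: "user", content: [{ type: "text", text: "task" }] },
	{ role: "assistant", content: [{ type: "tool_use", id: "toolu_a" }, { type: "tool_use", id: "toolu_b" }] },
	{ role: "user", content: [{ type: "tool_result", tool_use_id: "toolu_a" }, { type: "tool_result", tool_use_id: "toolu_b" }] },
	{ role: "assistant", content: [{ type: "text", text: "done reading" }] },
]

const trimmed = naiveFrontTrim(history, 2)
console.log(hasLeadingOrphan(history)) // false
console.log(hasLeadingOrphan(trimmed)) // true
console.log(hasLeadingOrphan(stripLeadingOrphans(trimmed))) // false
```

The guard only removes the orphan. It does not restore the lost context, so it complements rather than replaces condensing before the real cap is reached.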
Findings checklist
F1 (primary): vscode-lm getModel() reports client.maxInputTokens (936K–1M) as contextWindow, which is not honored end-to-end on the Copilot path (src/api/providers/vscode-lm.ts).
F2: Condense trigger scales both percentage and absolute-fallback by that inflated window — src/core/context-management/index.ts, src/core/task/Task.ts.
F3 (inconsistency): UI bar uses a different source (static vscodeLlmModels / openAiModelInfoSaneDefaults = 128K) than the trigger — useSelectedModel.ts, TaskHeader.tsx, packages/types/src/providers/vscode-llm.ts, packages/types/src/providers/openai.ts.
F4: claude-opus-4.8 (and other newer Copilot families) are absent from the static vscodeLlmModels table, so the UI silently falls back to the 128K sane default.
F5: Default autoCondenseContextPercent is 100%, so even with a correct window the trigger only fires at the very top — ExtensionStateContext.tsx, Task.ts.
F6 (secondary): Token-usage measurement lags one request and only adds the last message, so a single large turn can overshoot before the gate sees it — consolidateTokenUsage.ts, manageContext.
F7 (downstream): No tool-pairing-safe guard at the provider boundary, so an externally-trimmed array can present an orphaned leading tool_result — src/api/transform/vscode-lm-format.ts, src/api/providers/vscode-lm.ts.
Suggested fixes
Single source of truth for the window. Make the condense trigger and the UI agree. Preferred: clamp the vscode-lm effective contextWindow to the real working budget instead of trusting client.maxInputTokens (evidence ⇒ ~128K). Ideally a configurable per-provider cap defaulting to 128K for vscode-lm. This fixes F1–F4 at once and makes the bar truthful for the trigger too.
Add the missing Copilot families (e.g. claude-opus-4.8) to vscodeLlmModels with realistic windows, so the UI/static path stops falling back to the generic 128K default and both paths read the same number.
Lower the default autoCondenseContextPercent below 100% (e.g. 75–80%) so condensing has headroom even when the window is correct (F5).
(Defense in depth) Sanitize the outgoing message array at the vscode-lm boundary so messages[0] can never begin with an orphaned tool_result (strip/repair leading orphans after any trimming/merging) — closes F7 even if a downstream trim still happens.
(Optional) Tighten the pre-send token estimate to include the just-built turn so the gate doesn't lag by one request (F6).
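A minimal sketch of the first fix, assuming a per-provider cap that defaults to 128K for vscode-lm. The function name, the cap table, and the constant are hypothetical; in a real change the cap would presumably come from settings, and the same value would need to feed both the UI bar and the condense trigger.

```typescript
// Hypothetical sketch; names and the cap table are assumptions, not repository code.

const DEFAULT_CONTEXT_WINDOW = 128_000 // mirrors the 128K sane default cited in this report

// Assumed per-provider cap on the usable window.
const providerWindowCaps: Record<string, number> = {
	"vscode-lm": 128_000,
}

// Clamp the advertised window to the provider's cap, if one is configured.
function effectiveContextWindow(provider: string, advertised: unknown): number {
	const reported =
		typeof advertised === "number" && Number.isFinite(advertised)
			? Math.max(0, advertised)
			: DEFAULT_CONTEXT_WINDOW
	const cap = providerWindowCaps[provider]
	return cap === undefined ? reported : Math.min(reported, cap)
}

console.log(effectiveContextWindow("vscode-lm", 936_000)) // 128000
console.log(effectiveContextWindow("vscode-lm", 81_638)) // 81638
console.log(effectiveContextWindow("vscode-lm", undefined)) // 128000
console.log(effectiveContextWindow("other", 200_000)) // 200000
```

Using Math.min keeps models that advertise a smaller window at their own value, so the cap only affects providers that over-report.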
Reproduction
Use the vscode-lm provider with copilot/claude-opus-4.8 (advertised window ~936K–1M).
Leave auto-condense enabled at the default threshold.
Drive the conversation past ~128K tokens (e.g. several large file reads / parallel tool calls).
Observe: the UI context bar reads ~128K (correct), no condense fires, and the request fails with the Anthropic messages.0.content.0 … unexpected tool_use_id 400.
Environment
Provider: vscode-lm (GitHub Copilot), model copilot/claude-opus-4.8
Symptom onset: ~128K tokens (the real Copilot backend cap, far below the advertised 936K–1M)