Auto context condensing unreliable on vscode-lm: UI shows real ~128K window but condense trigger uses inflated advertised window (936K-1M)

## Summary

On the `vscode-lm` (GitHub Copilot) provider, automatic context condensing is unreliable — it "sometimes kicks in but mostly ignores the limit." Root cause: **the UI and the condense trigger read two different sources of truth for the model's context window.**

- The **UI context bar** shows the real, conservative window (~**128K**).
- The **condense/truncation trigger** uses the provider's **advertised** window (~**936K–1M** for `copilot/claude-opus-4.8`).

Because the trigger divides by the inflated number, ~128K of real usage computes to only ~14% full (128K / 936K), so the trigger never fires. The request then grows past the **real** Copilot backend cap (~128K) and the Copilot→Anthropic proxy trims messages from the front to fit. That trim is not tool-pairing-aware, which produces the recurring Anthropic `400`:

```
messages.0.content.0: unexpected `tool_use_id` found in `tool_result` blocks: toolu_…
Each `tool_result` block must have a corresponding `tool_use` block in the previous message.
```

Provider: `vscode-lm`, model `copilot/claude-opus-4.8`.

---

## Two sources of truth (the core bug)

### A. UI bar → static model table (shows ~128K, correct)
The webview derives the window from `useSelectedModel`, which for `vscode-lm` returns the **static** `vscodeLlmModels` entry, falling back to sane defaults when the family is unknown:

- `webview-ui/src/components/chat/TaskHeader.tsx` — `const contextWindow = model?.contextWindow || 1`
- `webview-ui/src/components/ui/hooks/useSelectedModel.ts` (`case "vscode-lm"`) — returns `{ ...openAiModelInfoSaneDefaults, ...info, supportsImages: false }`
- `packages/types/src/providers/vscode-llm.ts` — static table tops out at `contextWindow: 128000` (`claude-4-sonnet`); `claude-3.5-sonnet` is `81638`. **`claude-opus-4.8` is NOT in this table.**
- `packages/types/src/providers/openai.ts` — `openAiModelInfoSaneDefaults.contextWindow = 128_000`

⇒ For `claude-opus-4.8`, the table lookup misses, the code falls back to `openAiModelInfoSaneDefaults`, and the UI **always shows 128K**. This matches the observed UI value.
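
The fallback described above can be illustrated with a small sketch. It is not the code in `useSelectedModel.ts`: `resolveUiModelInfo`, `staticModels`, and `saneDefaults` are hypothetical stand-ins, the table holds only the two entries quoted in this issue, and the `supportsImages` default is a placeholder.

```ts
// Hypothetical, trimmed-down stand-in for the real model info type.
interface ModelInfo {
	contextWindow: number
	supportsImages: boolean
}

// Assumed defaults: contextWindow is the value quoted in this issue;
// supportsImages is a placeholder, not taken from the real defaults.
const saneDefaults: ModelInfo = { contextWindow: 128_000, supportsImages: true }

// Only the two entries quoted in this issue; the real static table has more.
const staticModels: Record<string, Partial<ModelInfo>> = {
	"claude-4-sonnet": { contextWindow: 128_000 },
	"claude-3.5-sonnet": { contextWindow: 81_638 },
}

// Same spread order as the `case "vscode-lm"` branch quoted above:
// defaults first, then the table entry (if any), then the forced override.
function resolveUiModelInfo(family: string): ModelInfo {
	const info = staticModels[family] // undefined when the family is not in the table
	return { ...saneDefaults, ...info, supportsImages: false }
}

const knownFamily = resolveUiModelInfo("claude-3.5-sonnet") // table hit
const unknownFamily = resolveUiModelInfo("claude-opus-4.8") // table miss, defaults apply
```

Under these assumptions, `knownFamily.contextWindow` is 81,638 and `unknownFamily.contextWindow` is 128,000, which is why a family missing from the table silently shows the generic window.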

### B. Condense trigger → live provider value (uses ~936K, inflated)
The extension host computes the window dynamically from the live VS Code LM client:

- `src/api/providers/vscode-lm.ts` (`getModel()`):
  ```ts
  contextWindow:
    typeof this.client.maxInputTokens === "number"
      ? Math.max(0, this.client.maxInputTokens)   // ~936K–1M for copilot/claude
      : openAiModelInfoSaneDefaults.contextWindow,
  ```
- `src/core/task/Task.ts` (`attemptApiRequest`): `const contextWindow = modelInfo.contextWindow` → passed into the trigger.
- `src/core/context-management/index.ts` (`manageContext` and `willManageContext`):
  ```ts
  const contextPercent = (100 * prevContextTokens) / contextWindow
  if (contextPercent >= effectiveThreshold || prevContextTokens > allowedTokens) { /* condense */ }
  // allowedTokens = contextWindow * (1 - TOKEN_BUFFER_PERCENTAGE) - reservedTokens
  ```

⇒ Both the percentage path **and** the absolute fallback (`allowedTokens ≈ 0.9 × 936K`) scale with the inflated window, so neither fires near the real ~128K wall.
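
To make the arithmetic concrete, here is a minimal sketch of the gate quoted above, evaluated once with the advertised window and once with a 128K window. `shouldCondense` and `TriggerInput` are hypothetical names; `TOKEN_BUFFER_PERCENTAGE = 0.1` is assumed from the "0.9 × 936K" figure, and the threshold (75) and `reservedTokens` (8,192) are illustrative values only.

```ts
// Assumed constant, implied by the "0.9 x 936K" figure in this issue.
const TOKEN_BUFFER_PERCENTAGE = 0.1

// Hypothetical input shape for this sketch.
interface TriggerInput {
	prevContextTokens: number
	contextWindow: number
	effectiveThreshold: number // percent, 0-100
	reservedTokens: number
}

// Re-states the two conditions from the snippet above so they can be evaluated in isolation.
function shouldCondense(input: TriggerInput): { contextPercent: number; allowedTokens: number; fires: boolean } {
	const { prevContextTokens, contextWindow, effectiveThreshold, reservedTokens } = input
	const contextPercent = (100 * prevContextTokens) / contextWindow
	const allowedTokens = contextWindow * (1 - TOKEN_BUFFER_PERCENTAGE) - reservedTokens
	const fires = contextPercent >= effectiveThreshold || prevContextTokens > allowedTokens
	return { contextPercent, allowedTokens, fires }
}

// Illustrative usage at the point where the real backend cap is reached.
const usage = { prevContextTokens: 128_000, effectiveThreshold: 75, reservedTokens: 8_192 }
const withAdvertisedWindow = shouldCondense({ ...usage, contextWindow: 936_000 })
const withRealWindow = shouldCondense({ ...usage, contextWindow: 128_000 })
```

With these assumed numbers, the advertised window yields about 13.7% and neither condition is met, while the 128K window yields 100% and the gate fires.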

The model capability dump from the failing session confirms what Copilot advertises to the VS Code LM API (all values in tokens):

| family | max_prompt_tokens | context window |
|---|---|---|
| `claude-opus-4.8` | 936,000 | 1,000,000 |
| `claude-opus-4.7-high` | 167,997 | 200,000 |
| `claude-opus-4.5` | 167,997 | 200,000 |

---

## Why it's "sometimes, but mostly not"

1. **Provider-dependent (dominant).** Models whose advertised window is realistic condense correctly; `vscode-lm`/Copilot advertises 936K–1M while the real budget is ~128K, so it under-fires. Switching model/profile flips the behavior.
2. **Default threshold is 100%.** If the global slider was never changed, `autoCondenseContextPercent` defaults to **100**:
   - `webview-ui/src/context/ExtensionStateContext.tsx` — `autoCondenseContextPercent: 100`
   - `src/core/task/Task.ts` — destructure default `autoCondenseContextPercent = 100`
   The per-profile map also falls back to this global default when the active profile has no entry (`src/core/context-management/index.ts`, threshold resolution).
3. **Measurement lag (secondary).** `prevContextTokens` is the token count from the **last completed** `api_req_started` record (`packages/core/src/message-utils/consolidateTokenUsage.ts`) plus the tokens of only the **single last** message (`manageContext`). A turn that adds a large jump (big tool results, multiple file reads) is not reflected until the *next* request, so the request that exceeds the limit can be sent before the check ever sees the overage.
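
The measurement lag in item 3 can be modelled with a toy example. This is a simplified model of the behavior as described in this issue, not the code in `consolidateTokenUsage.ts` or `manageContext`; `PendingMessage`, `estimatePrevContextTokens`, and `actualContextTokens` are hypothetical names and the token counts are made up.

```ts
// Hypothetical record for a message added since the last completed request.
interface PendingMessage {
	label: string
	estimatedTokens: number
}

// Models the estimate as described: last completed request's usage plus the final message only.
function estimatePrevContextTokens(lastCompletedRequestTokens: number, pending: PendingMessage[]): number {
	const last = pending[pending.length - 1]
	return lastCompletedRequestTokens + (last ? last.estimatedTokens : 0)
}

// What the next request would actually carry: everything added since the last request.
function actualContextTokens(lastCompletedRequestTokens: number, pending: PendingMessage[]): number {
	return pending.reduce((sum, message) => sum + message.estimatedTokens, lastCompletedRequestTokens)
}

// Illustrative turn: two large file reads followed by a short message.
const pendingTurn: PendingMessage[] = [
	{ label: "file read A", estimatedTokens: 20_000 },
	{ label: "file read B", estimatedTokens: 20_000 },
	{ label: "short follow-up", estimatedTokens: 5_000 },
]
const estimatedTokens = estimatePrevContextTokens(100_000, pendingTurn)
const actualTokens = actualContextTokens(100_000, pendingTurn)
```

In this toy case the estimate is 105,000 tokens while the request would carry 145,000, so a gate sized for a 128K window would pass a request that is already over the cap.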

---

## Downstream symptom: orphaned `tool_result` 400

With condensing not firing, Roo sends the full (well-formed) history. The Copilot→Anthropic proxy trims from the front to fit the real cap, and the cut lands between an assistant `tool_use` message and the user `tool_result` message that follows it. Parallel tool calls make this worse: one assistant message holds several `tool_use` blocks and the next user message holds all the matching `tool_result`s. The surviving `tool_result` becomes `messages[0].content[0]` with no preceding `tool_use`, and Anthropic rejects it.

Evidence from a captured report: Roo's stored history was clean (`messages[0]` was the task text, the `tool_use`/`tool_result` pair was matched, and **no summary was present**, so condense never ran), yet the *sent* request was rejected. This confirms that the corruption is introduced downstream, after Roo hands off.

> Note: the Copilot proxy is closed-source, so "front-trim" is inferred from the message-shape math (stored-clean + sent-corrupted, Anthropic `request_id`), not verified line-by-line. The in-repo defect (trusting the advertised window) is the fixable enabling cause.
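
The inferred failure shape can be reproduced with a toy front-trim. This sketch is only an illustration of the hypothesis: `naiveFrontTrim` is a hypothetical function, not the proxy's code, and the message types and token counts are invented for the example.

```ts
// Hypothetical, simplified content and message shapes for this sketch.
type ContentBlock =
	| { type: "text"; text: string }
	| { type: "tool_use"; id: string }
	| { type: "tool_result"; tool_use_id: string }

interface ChatMessage {
	role: "user" | "assistant"
	content: ContentBlock[]
	tokens: number
}

// Drops the oldest messages until the total fits, with no awareness of tool pairing.
function naiveFrontTrim(messages: ChatMessage[], budget: number): ChatMessage[] {
	const kept = [...messages]
	let total = kept.reduce((sum, message) => sum + message.tokens, 0)
	while (kept.length > 1 && total > budget) {
		const dropped = kept.shift()
		if (dropped) total -= dropped.tokens
	}
	return kept
}

// A tool_result in the first message has no preceding tool_use by definition.
function startsWithOrphanToolResult(messages: ChatMessage[]): boolean {
	const first = messages[0]
	return first !== undefined && first.content.some((block) => block.type === "tool_result")
}

// Well-formed stored history: task text, parallel tool calls, then their results.
const storedHistory: ChatMessage[] = [
	{ role: "user", content: [{ type: "text", text: "task" }], tokens: 10_000 },
	{ role: "assistant", content: [{ type: "tool_use", id: "toolu_a" }, { type: "tool_use", id: "toolu_b" }], tokens: 30_000 },
	{ role: "user", content: [{ type: "tool_result", tool_use_id: "toolu_a" }, { type: "tool_result", tool_use_id: "toolu_b" }], tokens: 100_000 },
]
const sentMessages = naiveFrontTrim(storedHistory, 128_000)
```

Here the stored history is clean, but after trimming to a 128K budget only the `tool_result` message survives, so `startsWithOrphanToolResult(sentMessages)` is `true`. That matches the stored-clean, sent-corrupted pattern in the captured report.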

---

## Findings checklist

- [ ] **F1 (primary):** `vscode-lm` `getModel()` reports `client.maxInputTokens` (936K–1M) as `contextWindow`, which is not honored end-to-end on the Copilot path — `src/api/providers/vscode-lm.ts`.
- [ ] **F2:** Condense trigger scales both percentage and absolute-fallback by that inflated window — `src/core/context-management/index.ts`, `src/core/task/Task.ts`.
- [ ] **F3 (inconsistency):** UI bar uses a **different** source (static `vscodeLlmModels` / `openAiModelInfoSaneDefaults` = 128K) than the trigger — `useSelectedModel.ts`, `TaskHeader.tsx`, `packages/types/src/providers/vscode-llm.ts`, `packages/types/src/providers/openai.ts`.
- [ ] **F4:** `claude-opus-4.8` (and other newer Copilot families) are absent from the static `vscodeLlmModels` table, so the UI silently falls back to the 128K sane default.
- [ ] **F5:** Default `autoCondenseContextPercent` is 100%, so even with a correct window the trigger only fires at the very top — `ExtensionStateContext.tsx`, `Task.ts`.
- [ ] **F6 (secondary):** Token-usage measurement lags one request and only adds the last message, so a single large turn can overshoot before the gate sees it — `consolidateTokenUsage.ts`, `manageContext`.
- [ ] **F7 (downstream):** No tool-pairing-safe guard at the provider boundary, so an externally-trimmed array can present an orphaned leading `tool_result` — `src/api/transform/vscode-lm-format.ts`, `src/api/providers/vscode-lm.ts`.

---

## Suggested fixes

1. **Single source of truth for the window.** Make the condense trigger and the UI agree. Preferred: clamp the `vscode-lm` effective `contextWindow` to the real working budget (~128K, per the evidence above) instead of trusting `client.maxInputTokens`, ideally via a configurable per-provider cap that defaults to 128K for `vscode-lm`. This addresses F1–F4 at once and makes the trigger use the same number the bar shows.
2. **Add the missing Copilot families** (e.g. `claude-opus-4.8`) to `vscodeLlmModels` with realistic windows, so the UI/static path stops falling back to the generic 128K default and both paths read the same number.
3. **Lower the default `autoCondenseContextPercent`** below 100% (e.g. 75–80%) so condensing has headroom even when the window is correct (F5).
4. **(Defense in depth)** Sanitize the outgoing message array at the `vscode-lm` boundary so `messages[0]` can never begin with an orphaned `tool_result` (strip/repair leading orphans after any trimming/merging) — closes F7 even if a downstream trim still happens.
5. **(Optional)** Tighten the pre-send token estimate to include the just-built turn so the gate doesn't lag by one request (F6).
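
As a rough sketch of fixes 1 and 4, the two helpers below show one possible shape. Both function names, the `cap` and `fallback` parameters, and the message types are hypothetical; nothing here exists in the repository as written, and a real guard would probably also need to consider role ordering after a leading message is dropped.

```ts
// Hypothetical, simplified content and message shapes for this sketch.
type ContentBlock =
	| { type: "text"; text: string }
	| { type: "tool_use"; id: string }
	| { type: "tool_result"; tool_use_id: string }

interface ChatMessage {
	role: "user" | "assistant"
	content: ContentBlock[]
}

// Fix 1 (sketch): never report more than a configurable cap, assumed to default to 128K.
function effectiveContextWindow(advertisedMaxInputTokens: unknown, cap = 128_000, fallback = 128_000): number {
	if (typeof advertisedMaxInputTokens !== "number") return fallback
	return Math.min(Math.max(0, advertisedMaxInputTokens), cap)
}

// Fix 4 (sketch): remove tool_result blocks from the leading message, since nothing
// precedes it that could hold the matching tool_use. Drops the message if it becomes empty.
function stripLeadingOrphanToolResults(messages: ChatMessage[]): ChatMessage[] {
	const result = [...messages]
	while (result.length > 0) {
		const first = result[0]
		if (first === undefined) break
		const remaining = first.content.filter((block) => block.type !== "tool_result")
		if (remaining.length === first.content.length) break // no orphaned blocks
		if (remaining.length > 0) {
			result[0] = { ...first, content: remaining } // keep the non-orphaned blocks
			break
		}
		result.shift() // message held only orphaned tool_results
	}
	return result
}

// Illustrative usage with the advertised window from this issue.
const clampedWindow = effectiveContextWindow(936_000)
const repaired = stripLeadingOrphanToolResults([
	{ role: "user", content: [{ type: "tool_result", tool_use_id: "toolu_a" }] },
	{ role: "assistant", content: [{ type: "text", text: "done" }] },
])
```

With the assumed default cap, `clampedWindow` is 128,000, and `repaired` contains only the assistant message because the leading message held nothing but an orphaned `tool_result`.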

---

## Reproduction

1. Use the `vscode-lm` provider with `copilot/claude-opus-4.8` (advertised window ~936K–1M).
2. Leave auto-condense enabled at the default threshold.
3. Drive the conversation past ~128K tokens (e.g. several large file reads / parallel tool calls).
4. Observe: the UI context bar reads ~128K (correct), no condense fires, and the request fails with the Anthropic `messages.0.content.0 … unexpected tool_use_id` 400.

## Environment

- Provider: `vscode-lm` (GitHub Copilot), model `copilot/claude-opus-4.8`
- Symptom onset: ~128K tokens (the real Copilot backend cap, far below the advertised 936K–1M)

