Skip to content

feat: add XML tool calling support as provider setting#11973

Open
jthweny wants to merge 113 commits intoRooCodeInc:mainfrom
jthweny:feat/xml-tool-calling
Open

feat: add XML tool calling support as provider setting#11973
jthweny wants to merge 113 commits intoRooCodeInc:mainfrom
jthweny:feat/xml-tool-calling

Conversation

@jthweny
Copy link
Copy Markdown

@jthweny jthweny commented Mar 21, 2026

Adds a useXmlToolCalling provider toggle. When enabled, the system prompt includes XML formatting instructions and native tool parameters (tools/tool_choice) are omitted from Anthropic/Vertex API requests, forcing the model to use text-based XML tool calling parsed by the existing TagMatcher. 12 new tests, all passing.

Interactively review PR in Roo Code Cloud

Add a useXmlToolCalling boolean toggle to provider settings that enables
text-based XML tool calling instead of native function calling.

Phase 1 - System Prompt:
- Add useXmlToolCalling to baseProviderSettingsSchema in provider-settings.ts
- Modify getSharedToolUseSection() to return XML formatting instructions
  when useXmlToolCalling is true
- Make getToolUseGuidelinesSection() XML-aware with conditional steps
- Thread useXmlToolCalling through SYSTEM_PROMPT(), generateSystemPrompt(),
  and Task.getSystemPrompt()
- Add UI toggle checkbox in ApiOptions.tsx settings panel
- Add i18n string for the toggle label

Phase 2 - Transport Layer:
- Add useXmlToolCalling to ApiHandlerCreateMessageMetadata interface
- Conditionally omit native tools/tool_choice from Anthropic API requests
  when useXmlToolCalling is enabled
- Same conditional omission for Anthropic Vertex provider
- Thread useXmlToolCalling from provider settings into API request metadata
  in Task.attemptApiRequest()

The existing TagMatcher-based text parsing in presentAssistantMessage()
automatically handles XML tool calls when the model outputs them as raw
text (which occurs when native tools are omitted from the request).

Tests: 9 new tool-use.spec.ts tests + 3 new anthropic.spec.ts tests, all passing.
Copilot AI review requested due to automatic review settings March 21, 2026 09:32
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. Enhancement New feature or request labels Mar 21, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a provider setting (useXmlToolCalling) intended to switch Anthropic/Vertex from native tool calling to XML-in-text tool calling, by updating the system prompt and omitting native tool parameters from certain provider API requests.

Changes:

  • Add useXmlToolCalling to provider settings schema and expose it in the webview “Advanced settings” UI.
  • Thread useXmlToolCalling into system prompt generation to emit XML tool-calling instructions.
  • Update Anthropic + Anthropic Vertex request building to omit tools / tool_choice when useXmlToolCalling is enabled, with new tests asserting omission.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
webview-ui/src/i18n/locales/en/settings.json Adds UI strings for the new advanced setting.
webview-ui/src/components/settings/ApiOptions.tsx Adds the “Use XML tool calling” checkbox in Advanced settings.
src/core/webview/generateSystemPrompt.ts Threads the new toggle into system prompt preview generation.
src/core/task/Task.ts Threads the toggle into runtime prompt generation and API handler metadata.
src/core/prompts/system.ts Passes the toggle into tool-use prompt sections.
src/core/prompts/sections/tool-use.ts Adds XML-mode tool-use instructions section.
src/core/prompts/sections/tool-use-guidelines.ts Adds XML-mode reinforcement to tool-use guidelines.
src/core/prompts/sections/tests/tool-use.spec.ts Adds/extends unit tests for XML vs native prompt sections.
src/api/providers/anthropic.ts Omits native tool params when XML mode is enabled.
src/api/providers/anthropic-vertex.ts Omits native tool params when XML mode is enabled.
src/api/providers/tests/anthropic.spec.ts Adds tests asserting omission/presence of tool params based on the toggle.
src/api/index.ts Adds useXmlToolCalling to handler metadata interface and documents intended behavior.
packages/types/src/provider-settings.ts Adds useXmlToolCalling to provider settings schema.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 9 to 15
: ""

return `# Tool Use Guidelines

1. Assess what information you already have and what information you need to proceed with the task.
2. Choose the most appropriate tool based on the task and the tool descriptions provided. Assess if you need additional information to proceed, and which of the available tools would be most effective for gathering this information. For example using the list_files tool is more effective than running a command like \`ls\` in the terminal. It's critical that you think about each available tool and use the one that best fits the current step in the task.
3. If multiple actions are needed, you may use multiple tools in a single message when appropriate, or use tools iteratively across messages. Each tool use should be informed by the results of previous tool uses. Do not assume the outcome of any tool use. Each step must be informed by the previous step's result.
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In XML mode, step 3 still says multiple tools may be used in a single message, but other XML instructions reinforce one-at-a-time tool usage. Please reconcile the XML-mode guidance so it consistently reflects the intended/implemented behavior (single-tool vs multi-tool) to avoid conflicting instructions in the system prompt.

Suggested change
: ""
return `# Tool Use Guidelines
1. Assess what information you already have and what information you need to proceed with the task.
2. Choose the most appropriate tool based on the task and the tool descriptions provided. Assess if you need additional information to proceed, and which of the available tools would be most effective for gathering this information. For example using the list_files tool is more effective than running a command like \`ls\` in the terminal. It's critical that you think about each available tool and use the one that best fits the current step in the task.
3. If multiple actions are needed, you may use multiple tools in a single message when appropriate, or use tools iteratively across messages. Each tool use should be informed by the results of previous tool uses. Do not assume the outcome of any tool use. Each step must be informed by the previous step's result.
: "";
const step3Guideline = useXmlToolCalling
? `3. If multiple actions are needed, use tools iteratively across messages, making at most one XML tool call per assistant message. Each tool use should be informed by the results of previous tool uses. Do not assume the outcome of any tool use. Each step must be informed by the previous step's result.`
: `3. If multiple actions are needed, you may use multiple tools in a single message when appropriate, or use tools iteratively across messages. Each tool use should be informed by the results of previous tool uses. Do not assume the outcome of any tool use. Each step must be informed by the previous step's result.`;
return `# Tool Use Guidelines
1. Assess what information you already have and what information you need to proceed with the task.
2. Choose the most appropriate tool based on the task and the tool descriptions provided. Assess if you need additional information to proceed, and which of the available tools would be most effective for gathering this information. For example using the list_files tool is more effective than running a command like \`ls\` in the terminal. It's critical that you think about each available tool and use the one that best fits the current step in the task.
${step3Guideline}

Copilot uses AI. Check for mistakes.
Comment on lines +804 to +813
<div>
<Checkbox
checked={apiConfiguration?.useXmlToolCalling ?? false}
onChange={handleInputChange("useXmlToolCalling", noTransform)}>
{t("settings:advancedSettings.useXmlToolCalling")}
</Checkbox>
<div className="text-sm text-vscode-descriptionForeground ml-6">
{t("settings:advancedSettings.useXmlToolCallingDescription")}
</div>
</div>
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The UI exposes useXmlToolCalling for all providers, but the backend behavior (omitting native tools/tool_choice) is only implemented for Anthropic/Anthropic-Vertex. Enabling this for other providers will still modify the system prompt to demand XML, while requests continue using native tool calling—creating a prompt/API mismatch and likely breaking tool use. Consider gating this checkbox (and/or the prompt toggle) to only providers that actually honor useXmlToolCalling, or implement the request-side behavior for all providers.

Suggested change
<div>
<Checkbox
checked={apiConfiguration?.useXmlToolCalling ?? false}
onChange={handleInputChange("useXmlToolCalling", noTransform)}>
{t("settings:advancedSettings.useXmlToolCalling")}
</Checkbox>
<div className="text-sm text-vscode-descriptionForeground ml-6">
{t("settings:advancedSettings.useXmlToolCallingDescription")}
</div>
</div>
{(selectedProvider === "anthropic" || selectedProvider === "vertex") && (
<div>
<Checkbox
checked={apiConfiguration?.useXmlToolCalling ?? false}
onChange={handleInputChange("useXmlToolCalling", noTransform)}>
{t("settings:advancedSettings.useXmlToolCalling")}
</Checkbox>
<div className="text-sm text-vscode-descriptionForeground ml-6">
{t("settings:advancedSettings.useXmlToolCallingDescription")}
</div>
</div>
)}

Copilot uses AI. Check for mistakes.
Comment on lines +90 to +92
${getSharedToolUseSection(useXmlToolCalling)}${toolsCatalog}

${getToolUseGuidelinesSection()}
${getToolUseGuidelinesSection(useXmlToolCalling)}
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

useXmlToolCalling is threaded into the system prompt unconditionally. Since only some providers currently change their API request behavior based on this flag, the prompt can instruct XML tool calls while the selected provider still expects native tool calling. Scope the XML prompt sections to providers that actually support this mode (or ensure all providers handle useXmlToolCalling consistently).

Copilot uses AI. Check for mistakes.
Comment on lines +78 to +86
// When useXmlToolCalling is enabled, omit native tool definitions from the API request.
// The model will rely on XML tool documentation in the system prompt instead,
// and output tool calls as raw XML text parsed by TagMatcher.
const nativeToolParams = metadata?.useXmlToolCalling
? {}
: {
tools: convertOpenAIToolsToAnthropic(metadata?.tools ?? []),
tool_choice: convertOpenAIToolChoiceToAnthropic(metadata?.tool_choice, metadata?.parallelToolCalls),
}
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When metadata.useXmlToolCalling is true, this omits tools/tool_choice from the Anthropic request, but the codebase currently executes tools via native tool_use blocks with ids/nativeArgs (XML/legacy tool calls are explicitly rejected in presentAssistantMessage/BaseTool). Without an XML-to-ToolUse parser (and tool schema documentation) this will prevent any tool execution. Either implement the XML parsing + tool catalog path end-to-end, or keep sending native tools for Anthropic.

Suggested change
// When useXmlToolCalling is enabled, omit native tool definitions from the API request.
// The model will rely on XML tool documentation in the system prompt instead,
// and output tool calls as raw XML text parsed by TagMatcher.
const nativeToolParams = metadata?.useXmlToolCalling
? {}
: {
tools: convertOpenAIToolsToAnthropic(metadata?.tools ?? []),
tool_choice: convertOpenAIToolChoiceToAnthropic(metadata?.tool_choice, metadata?.parallelToolCalls),
}
// Always send native tool definitions for Anthropic so that tool_use blocks are produced.
// The useXmlToolCalling flag is currently ignored here because the rest of the codebase
// expects native tool_use events and does not support XML-based tool calling.
const nativeToolParams = {
tools: convertOpenAIToolsToAnthropic(metadata?.tools ?? []),
tool_choice: convertOpenAIToolChoiceToAnthropic(metadata?.tool_choice, metadata?.parallelToolCalls),
}

Copilot uses AI. Check for mistakes.
Comment on lines +78 to +86
// When useXmlToolCalling is enabled, omit native tool definitions from the API request.
// The model will rely on XML tool documentation in the system prompt instead,
// and output tool calls as raw XML text parsed by TagMatcher.
const nativeToolParams = metadata?.useXmlToolCalling
? {}
: {
tools: convertOpenAIToolsToAnthropic(metadata?.tools ?? []),
tool_choice: convertOpenAIToolChoiceToAnthropic(metadata?.tool_choice, metadata?.parallelToolCalls),
}
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as Anthropic: omitting native tools/tool_choice when useXmlToolCalling is true will leave the system without a working tool-call execution path unless XML tool calls are parsed into ToolUse blocks with ids/nativeArgs. As-is, this will likely break tool use for Anthropic Vertex. Either implement the XML parsing + tool documentation path, or continue sending native tool params.

Suggested change
// When useXmlToolCalling is enabled, omit native tool definitions from the API request.
// The model will rely on XML tool documentation in the system prompt instead,
// and output tool calls as raw XML text parsed by TagMatcher.
const nativeToolParams = metadata?.useXmlToolCalling
? {}
: {
tools: convertOpenAIToolsToAnthropic(metadata?.tools ?? []),
tool_choice: convertOpenAIToolChoiceToAnthropic(metadata?.tool_choice, metadata?.parallelToolCalls),
}
// Always send native tool definitions to the API request so that tool calling
// continues to work even when XML-based tool documentation is used elsewhere.
const nativeToolParams = {
tools: convertOpenAIToolsToAnthropic(metadata?.tools ?? []),
tool_choice: convertOpenAIToolChoiceToAnthropic(metadata?.tool_choice, metadata?.parallelToolCalls),
}

Copilot uses AI. Check for mistakes.
Comment on lines +91 to +93
* The model relies solely on XML tool documentation in the system prompt
* and outputs tool calls as raw XML text, which the existing TagMatcher
* in presentAssistantMessage() parses into ToolUse objects.
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc comment claims XML tool calls are parsed by TagMatcher in presentAssistantMessage(), but presentAssistantMessage currently treats missing tool_use.id as an invalid legacy/XML tool call and rejects it, and tools generally require nativeArgs. Please update this comment to reflect the actual execution/parsing flow, or add the missing XML parsing implementation and adjust this description accordingly.

Suggested change
* The model relies solely on XML tool documentation in the system prompt
* and outputs tool calls as raw XML text, which the existing TagMatcher
* in presentAssistantMessage() parses into ToolUse objects.
* The model is expected to rely solely on XML tool documentation in the system prompt
* and may output tool calls as raw XML (or XML-like) text.
*
* This flag only affects how the request is constructed; any parsing of XML tool
* calls into ToolUse objects must be handled by higher-level consumer code.

Copilot uses AI. Check for mistakes.

TOOL USE

You have access to a set of tools that are executed upon the user's approval. You can use one tool per message, and will receive the result of that tool use in the user's response. You use tools step-by-step to accomplish a given task, with each tool use informed by the result of the previous tool use.
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In XML mode this section says "You can use one tool per message", but the general tool-use guidelines (and native mode) explicitly allow multiple tools per message. This internal inconsistency can confuse the model and cause unpredictable tool behavior. Align the XML instructions with the actual supported behavior (either document single-tool restriction everywhere for XML mode, or remove the single-tool claim here).

Suggested change
You have access to a set of tools that are executed upon the user's approval. You can use one tool per message, and will receive the result of that tool use in the user's response. You use tools step-by-step to accomplish a given task, with each tool use informed by the result of the previous tool use.
You have access to a set of tools that are executed upon the user's approval. You can use one or more tools per message, and will receive the result of those tool uses in the user's response. You use tools step-by-step to accomplish a given task, with each tool use informed by the result of the previous tool use.

Copilot uses AI. Check for mistakes.
jthweny added 3 commits March 21, 2026 09:50
When useXmlToolCalling is enabled, omit native tool definitions
(tools, tool_choice, parallel_tool_calls) from API requests across
all 22 providers. The model relies on XML tool documentation in the
system prompt instead, fixing 400 errors with servers like vLLM that
don't support tool_choice: auto.

Providers updated:
- OpenAI-style: openai, deepseek, base-openai-compatible-provider,
  openai-compatible, lm-studio, lite-llm, xai, qwen-code, openrouter,
  requesty, unbound, vercel-ai-gateway, roo, zai
- Responses API: openai-native, openai-codex
- Custom formats: bedrock, gemini, minimax, mistral

Tests: 5 new tests in openai.spec.ts, 800 total passed
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Mar 21, 2026
@averagebird007
Copy link
Copy Markdown

Pull request overview

Adds a provider setting (useXmlToolCalling) intended to switch Anthropic/Vertex from native tool calling to XML-in-text tool calling, by updating the system prompt and omitting native tool parameters from certain provider API requests.

Changes:

  • Add useXmlToolCalling to provider settings schema and expose it in the webview “Advanced settings” UI.
  • Thread useXmlToolCalling into system prompt generation to emit XML tool-calling instructions.
  • Update Anthropic + Anthropic Vertex request building to omit tools / tool_choice when useXmlToolCalling is enabled, with new tests asserting omission.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
webview-ui/src/i18n/locales/en/settings.json Adds UI strings for the new advanced setting.
webview-ui/src/components/settings/ApiOptions.tsx Adds the “Use XML tool calling” checkbox in Advanced settings.
src/core/webview/generateSystemPrompt.ts Threads the new toggle into system prompt preview generation.
src/core/task/Task.ts Threads the toggle into runtime prompt generation and API handler metadata.
src/core/prompts/system.ts Passes the toggle into tool-use prompt sections.
src/core/prompts/sections/tool-use.ts Adds XML-mode tool-use instructions section.
src/core/prompts/sections/tool-use-guidelines.ts Adds XML-mode reinforcement to tool-use guidelines.
src/core/prompts/sections/tests/tool-use.spec.ts Adds/extends unit tests for XML vs native prompt sections.
src/api/providers/anthropic.ts Omits native tool params when XML mode is enabled.
src/api/providers/anthropic-vertex.ts Omits native tool params when XML mode is enabled.
src/api/providers/tests/anthropic.spec.ts Adds tests asserting omission/presence of tool params based on the toggle.
src/api/index.ts Adds useXmlToolCalling to handler metadata interface and documents intended behavior.
packages/types/src/provider-settings.ts Adds useXmlToolCalling to provider settings schema.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

image

or open system.ts and change the native to xml .. easy

- Add XmlToolCallParser with streaming XML detection and partial tag handling
- Add hand-crafted tool descriptions for attempt_completion and ask_followup_question
- Support multiple follow_up formats: JSON arrays, <suggest> tags, comma-less objects
- Strip <thinking> tags before XML parsing to prevent hallucination loops
- Normalize Meta/Llama tool_call format to standard XML
- Prevent XML tags from leaking into chat UI during streaming
- Add XML-aware retry messages and missing parameter errors
- Graceful degradation: text-only responses shown as followup questions
- Compact XML tool descriptions to save context window space
- Match Kilo Code/Cline system prompt conventions for better model compliance

Made-with: Cursor
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Mar 21, 2026
jthweny added 8 commits March 22, 2026 00:02
Update tool-use.spec.ts and xml-tool-catalog.spec.ts to match the new
compact XML prompt format. Update system prompt snapshots.

Made-with: Cursor
Update presentAssistantMessage tests to match the current error message
"missing tool_use.id" instead of the old "XML tool calls are no longer
supported" text.

Made-with: Cursor
Comprehensive design for a continuous learning system that analyzes
user conversations to build a dynamically updating user profile,
powered by SQLite storage with tiered scoring and an LLM analysis agent.

Made-with: Cursor
…ilter, dedup algorithm

Resolves all critical and important review items:
- Switch from better-sqlite3 to sql.js (WASM) for zero native dep packaging
- Add schema_meta table and migration runner
- Add rule-based PII post-filter as defense in depth
- Specify concrete Jaccard similarity dedup algorithm
- Add garbage collection with 90-day + score threshold + 500 entry cap
- Stabilize workspace identity via SHA-256 hash of git remote + folder name
- Move memory config to global SettingsView (not per-mode ModesView)
- Handle invalid entry ID references from analysis agent
- Add session-end analysis trigger for short conversations
- Document multi-window safety model
- Specify tiktoken o200k_base for token counting

Made-with: Cursor
16 tasks with TDD workflow, covering types, scoring, preprocessor,
SQLite store, memory writer, prompt compiler, analysis agent,
orchestrator, settings, system prompt integration, and UI toggle.

Made-with: Cursor
…mplementation

- memory-data-layer: Types, scoring, SQLite store, memory writer (Tasks 1,2,4,5)
- memory-pipeline: Preprocessor, analysis agent, prompt compiler, orchestrator (Tasks 3,6,7,8)
- memory-frontend: Settings types, system prompt, extension host, UI toggle, settings view (Tasks 9-13)

Made-with: Cursor
jthweny added 30 commits March 22, 2026 22:10
Instruments the handler, orchestrator.execute(), and ClineProvider
.getMultiOrchestrator() with granular console.log statements to trace:
- whether the orchestrator instance is created or reused
- raw vs resolved values for maxAgents, planReview, mergeMode
- providerSettings.apiProvider / apiModelId / apiKey presence
- every onStateChange callback invocation with phase + agent count
- .execute() promise resolution vs rejection with full stack traces

Made-with: Cursor
TokenUsage type now requires contextTokens field after upstream schema
change. Adds contextTokens: 0 to the mock helper to fix TS2741.

Made-with: Cursor
…ycle

AgentCoordinator.startAll():
- Replace pointless Promise.all(sync-wrapped-promises) with direct
  synchronous loop — start() is fire-and-forget, not async
- Add try/catch around each start() so a throw doesn't skip remaining
  agents or leave the failed agent unaccounted (causing waitForAll hang)
- Mark agents with undefined getCurrentTask() as failed immediately
  instead of silently skipping them
- Deduplicate completion tracking: replace completionCount with
  completedSet<string> to guard against double-counted events
- Add vacuous-truth guard in allComplete() (empty agent map ≠ complete)
- Add timeout to waitForAll() (default 10min) with diagnostic message
  listing pending agents on timeout

orchestrator.ts:
- Move agentCompleted/agentFailed event handlers BEFORE startAll() so
  early completions during the synchronous start loop are never missed
- Add pre-start guard: throw if coordinator has 0 registered agents
  instead of entering a waitForAll() that can never resolve

Made-with: Cursor
…mentation

- Empty tasks array is now rejected (not treated as valid plan)
- Trailing garbage is handled via brace-matching extraction
- Plain ``` fences are now stripped (regex makes json tag optional)
- Prompt uses "Max agents available" instead of "Max parallel tasks"
- Architect mode is now also filtered from available modes in prompt

Made-with: Cursor
… content area

The panels were imported and had message listeners but were rendered
in a dead zone between the button bar and QueuedMessages, squeezed to
zero height because no `task` existed to open the Virtuoso scroll area.

- Add a dedicated multi-orchestrator content section that takes `grow`
  flex space when `mode === "multi-orchestrator"` and state is present
- Hide the home screen (RooHero/tips/history) when panels are active
  so they aren't competing for flex height
- Remove the old panel placement that was invisible

Made-with: Cursor
… spawn, and auto-approval bugs

Made-with: Cursor
The word-count heuristic (lines 120-125) forcibly sliced any plan to 2
tasks when the user's message was under 20 words, ignoring the user's
explicit maxAgents selection. The hard cap in parsePlanResponse already
enforces maxAgents correctly.

Made-with: Cursor
Replace sequential for-loops with Promise.all in both panel-spawner.ts
and orchestrator.ts. Panels can be created in parallel (each uses a
different ViewColumn) and tasks can be created in parallel (each targets
a different ClineProvider). This eliminates the ~15-30s per-agent
sequential initialization delay.

- Extract spawnSinglePanel private method from spawnPanels
- Convert spawnPanels loop to Promise.all over spawnSinglePanel calls
- Convert task creation loop in executeFromPlan to Promise.all
- Preserve error handling: failed panels skip gracefully, all-fail throws

Made-with: Cursor
…vestigation

Agent F investigation: tasks reported as "failed" after ~15 seconds.

Root cause analysis reveals TaskCompleted is only emitted by
AttemptCompletionTool (not Task.ts), and Task.start() fires startTask()
as fire-and-forget with no catch — errors become unhandled rejections
that silently kill the task without emitting either TaskCompleted or
TaskAborted.

Added console.log/trace instrumentation to:
- Task.start() / startTask() / abortTask() / TaskAborted emission
- AgentCoordinator.registerAgent / startAll / handleAgentFinished

Full investigation written to:
docs/superpowers/specs/2026-03-22-agent-f-investigation.md

Made-with: Cursor
…verrides

The root cause: ContextProxy is a singleton shared by ALL ClineProvider
instances. When the multi-orchestrator called setValues(autoApprovalConfig),
those values were written to the shared ContextProxy — but any concurrent
provider activity (main sidebar, mode switches, other panels) could
overwrite them before the Task's checkAutoApproval() read them back via
provider.getState().

Fix: Add a per-provider _autoApprovalOverrides field to ClineProvider that
is held in instance memory (not ContextProxy). These overrides are merged
LAST in getState(), so they always win regardless of ContextProxy mutations.

The orchestrator now calls provider.setAutoApprovalOverrides() before
createTask(), instead of passing a configuration object that gets lost
in the shared ContextProxy.

Made-with: Cursor
…l spawn with delay

Panels now spawn beside each other (to the right) instead of at fixed
ViewColumns 1-3 which overlapped existing editors. Sequential creation
with 200ms delay between panels lets VS Code settle its layout.

Made-with: Cursor
The LLM was ignoring the maxAgents count and returning fewer tasks.
Changed prompt from "SHOULD use up to N" to "MUST create EXACTLY N".
User's explicit agent count selection is now respected.

Made-with: Cursor
Added multiOrchForceApproveAll flag that short-circuits the entire
auto-approval decision tree. Spawned agents now approve everything
unconditionally — tool use, commands, followup questions, outside
workspace reads/writes, protected files. Nobody is watching these
panels to click approve, so every ask must pass automatically.

Also enabled alwaysAllowReadOnlyOutsideWorkspace and
alwaysAllowWriteOutsideWorkspace since agents may work in directories
outside the current workspace.

Made-with: Cursor
…s from force-approve

When multiOrchForceApproveAll auto-approved resume_completed_task,
it restarted finished tasks causing an infinite completion loop.
Now excludes resume_completed_task and resume_task from force-approve
so completed agents stay completed.

Made-with: Cursor
…ctories

Added setWorkingDirectory() to ClineProvider so the orchestrator can
point each spawned agent at its own git worktree. Each agent's cwd is
now isolated — file reads/writes go to the worktree directory, not the
shared workspace. This prevents agents from colliding on file operations.

Made-with: Cursor
Three changes:

1. Task completion loop fix: AgentCoordinator now calls abortTask()
   on the provider's current task when TaskCompleted fires. This sets
   task.abort=true which breaks the while(!abort) loop, preventing
   the agent from making another API request after attempt_completion.

2. New agent-system-prompt.ts: Separate system prompt section for
   multi-orchestrator spawned agents. Injected as a prefix to each
   agent's task description. Includes:
   - Parallel execution context (other agents, assigned files)
   - Git worktree isolation status
   - Instruction to provide DETAILED completion summaries
   - Instruction not to ask questions (autonomous mode)

3. Updated auto-approval comments for clarity.

Made-with: Cursor
1. Agent panels now close 2 seconds after orchestration completes,
   giving the user a moment to see final state before cleanup.

2. Coordinator now captures each agent's completion_result message
   as their completionReport before aborting the task. This report
   feeds into aggregateReports() for the orchestrator's final summary.

Made-with: Cursor
…l arrangement

Rewrote PanelSpawner to:
1. Save the current editor layout before spawning
2. Call vscode.setEditorLayout with N equal-width columns
3. Place each panel into its assigned ViewColumn (1-indexed)
4. Use preserveFocus:true so panels don't steal focus from each other
5. Restore the original layout when panels are closed

This ensures all agent panels appear simultaneously in equal-width
columns without overlapping existing editors or each other.

Made-with: Cursor
Comprehensive single source of truth covering:
- Full architecture and flow
- Complete file map with status of every component
- 20+ verified working features
- 5 active bugs with root cause analysis and fix guidance
- 5 not-yet-implemented features with specifications
- VS Code API constraints and workarounds
- Agent assignment template for targeted fixes

This is a living document — updated as bugs are fixed and features added.

Made-with: Cursor
startAll() previously called currentTask.start() sequentially inside
the for-loop, causing Agent 1 to begin 1-3 seconds before Agent N.

Now we collect all start thunks into an array first, then fire them
all in a tight loop after preparation is complete. This removes the
per-agent preparation delay between dispatches, so all agents are
started back-to-back in a single pass.

Error handling preserved: agents whose provider has no current task
are still marked failed immediately, and start() exceptions are
caught per-agent without blocking the others.
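
A minimal sketch of the collect-then-fire pattern described above. `SketchAgent`, `AgentStatus`, and `startAllTogether` are hypothetical names, and `start()` is modelled as synchronous for brevity, which may not match the real task API.

```typescript
type AgentStatus = "running" | "failed";

// Hypothetical agent shape: an agent may or may not have a current task.
interface SketchAgent {
  id: string;
  currentTask?: { start(): void };
}

function startAllTogether(agents: readonly SketchAgent[]): Map<string, AgentStatus> {
  const status = new Map<string, AgentStatus>();
  const thunks: { id: string; start: () => void }[] = [];

  // Phase 1: preparation. Agents without a task are marked failed immediately.
  for (const agent of agents) {
    const task = agent.currentTask;
    if (!task) {
      status.set(agent.id, "failed");
      continue;
    }
    thunks.push({ id: agent.id, start: () => task.start() });
  }

  // Phase 2: tight dispatch loop. A throwing start() only fails its own agent.
  for (const { id, start } of thunks) {
    try {
      start();
      status.set(id, "running");
    } catch {
      status.set(id, "failed");
    }
  }
  return status;
}
```

Separating preparation from dispatch keeps any slow setup work out of the loop that actually starts the agents.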

Made-with: Cursor
…ect columns

Instead of relying on explicit ViewColumn numbers (1, 2, 3...) which
don't always map to VS Code's internal editor group indices after a
programmatic setEditorLayout, we now:

1. Wait 500ms after setEditorLayout for the layout to settle
2. Focus the first editor group explicitly
3. Create the first panel at ViewColumn.Active (leftmost group)
4. For each subsequent panel, call workbench.action.focusNextGroup
   to advance focus to the next column, then create at ViewColumn.Active

This guarantees each panel lands in the correct column regardless of
VS Code's internal group indexing.

Made-with: Cursor
File edits from multi-orchestrator agents were appearing in the wrong
editor column because VS Code's showTextDocument/vscode.diff commands
default to the ACTIVE editor group. The fix threads the ViewColumn
from PanelSpawner → ClineProvider → Task → DiffViewProvider, so all
file operations target the correct agent column.

Changes:
- DiffViewProvider: accept optional viewColumn param, use it in all
  showTextDocument and vscode.diff calls (open, saveChanges,
  saveDirectly, revertChanges, openDiffEditor)
- ClineProvider: add public viewColumn property
- PanelSpawner: set provider.viewColumn when spawning panels, add
  viewColumn to SpawnedPanel interface
- Task: pass provider.viewColumn to DiffViewProvider constructor

Made-with: Cursor
…t panels

When multiOrchForceApproveAll is enabled, the webview rendered approve/deny
buttons briefly before the backend auto-approval processed the ask. This
caused a yellow flash in agent panels.

Fix: expose multiOrchForceApproveAll via extension state to the webview,
then suppress button rendering and keyboard-triggered approval when the
flag is true.

Files changed:
- packages/types: add multiOrchForceApproveAll to ExtensionState
- ClineProvider: include flag in getStateToPostToWebview()
- ChatView: gate areButtonsVisible and Enter-key handler on flag

Made-with: Cursor
After all parallel agents complete and reports are collected, an optional
verification agent is spawned in "debug" mode to review changed files for
bugs, inconsistencies, missing error handling, and integration issues.

Changes:
- Add `multiOrchVerifyEnabled` boolean to global settings schema
- Add `verifying` phase and `VerificationFinding` type to orchestrator types
- Implement `executeVerificationPhase()` in MultiOrchestrator that spawns
  a single verification panel, feeds it all completion reports + changed
  files, waits for it to finish, and captures its findings
- Update `aggregateReports()` to include verification findings with
  severity-based icons (🟢 info / 🟡 warning / 🔴 error) in final report
- Add verification toggle to Settings → Multi-Orchestrator section
- Wire `verifyEnabled` through webviewMessageHandler for both initial
  execute and plan-approval resume paths
- Add "verifying" status icon to MultiOrchStatusPanel
- Mirror new types in webview-side type definitions
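
The severity-based rendering in `aggregateReports()` might look roughly like the sketch below. The `VerificationFinding` fields (`severity`, `message`, optional `file`) and the `formatFindings` helper are assumptions for illustration; only the type name and the three icons come from the description above.

```typescript
type Severity = "info" | "warning" | "error";

// Assumed field layout; the actual VerificationFinding type may differ.
interface VerificationFinding {
  severity: Severity;
  message: string;
  file?: string;
}

const SEVERITY_ICON: Record<Severity, string> = {
  info: "🟢",
  warning: "🟡",
  error: "🔴",
};

// Renders one line per finding, prefixed with its severity icon.
function formatFindings(findings: readonly VerificationFinding[]): string {
  if (findings.length === 0) return "No verification findings.";
  return findings
    .map((f) => `${SEVERITY_ICON[f.severity]} ${f.file ? `${f.file}: ` : ""}${f.message}`)
    .join("\n");
}
```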

Made-with: Cursor
- e2e.spec.ts: Add `abortTask` and `clineMessages` to mock task object
  so the agent-coordinator's TaskCompleted handler doesn't throw
- plan-generator.spec.ts: Update expected prompt text from
  "Max agents available:" to "Number of agents requested:" to match
  the updated plan-generator prompt
- vscode-extension-host.ts: Add `multiOrchVerifyEnabled` to
  ExtensionState type union so webview-ui can reference it
- ClineProvider.ts: Thread `multiOrchVerifyEnabled` through getState()
  and postStateToWebview() so the settings toggle works end-to-end

Made-with: Cursor
The panels were created with ViewColumn.Active (the symbolic value -1),
and that value was stored in provider.viewColumn. When DiffViewProvider
used it, VS Code interpreted -1 as "open in the currently active group"
rather than the group where the panel lives.

Now reads panel.viewColumn AFTER creation to get the real column number
(1, 2, 3...) and stores that. Also tracks viewColumn changes via
onDidChangeViewState so the value stays correct if the panel moves.
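
A small sketch of the idea, assuming a hypothetical `ColumnTracker` helper and a reduced `PanelLike` shape: only concrete, positive column numbers are stored, so symbolic values such as -1 never overwrite the last known real column.

```typescript
// Reduced view of a webview panel: only the property this sketch reads.
interface PanelLike {
  viewColumn: number | undefined;
}

// Remembers the last concrete column (1, 2, 3...) a panel was seen in.
class ColumnTracker {
  private column: number | undefined;

  // Call once right after creating the panel, and again from a
  // view-state change handler, passing the panel each time.
  update(panel: PanelLike): number | undefined {
    const candidate = panel.viewColumn;
    if (candidate !== undefined && candidate > 0) {
      this.column = candidate;
    }
    return this.column;
  }

  get current(): number | undefined {
    return this.column;
  }
}
```

In an extension, `update` would presumably be wired to the panel's `onDidChangeViewState` event, for example `panel.onDidChangeViewState((e) => tracker.update(e.webviewPanel))`.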

Made-with: Cursor
…panels

Two high-impact fixes:

1. API rate limiting: Changed startAll() from simultaneous to staggered
   with 2-second gaps between agent starts. Prevents all N agents from
   hitting the same API provider simultaneously, which caused "Provider
   ended the request: terminated" cascades.

2. Diff view chaos: Enabled PREVENT_FOCUS_DISRUPTION experiment for all
   spawned agents via auto-approval overrides. File edits now save
   directly to disk without opening diff editor views. This prevents
   diff views from fighting with the agent's webview panel for the
   same ViewColumn, eliminating layout disruption.
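
The staggered start in item 1 can be sketched as follows. `staggerOffsets` and `startAllStaggered` are hypothetical helpers; the 2000 ms default reflects the 2-second gap mentioned above, and the injectable `sleep` parameter is an assumption added to keep the sketch easy to test.

```typescript
// Start offsets in milliseconds for each agent: 0, gap, 2*gap, ...
function staggerOffsets(agentCount: number, gapMs = 2000): number[] {
  return Array.from({ length: agentCount }, (_, i) => i * gapMs);
}

// Starts agents one at a time, waiting `gapMs` between consecutive starts.
// A throwing start callback does not prevent later agents from starting.
async function startAllStaggered(
  starts: ReadonlyArray<() => void>,
  gapMs = 2000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<number> {
  let started = 0;
  let first = true;
  for (const start of starts) {
    if (!first) await sleep(gapMs);
    first = false;
    try {
      start();
      started++;
    } catch {
      // Per-agent failure; continue with the remaining agents.
    }
  }
  return started;
}
```

With three agents and the default gap, the last agent starts 4 seconds after the first, which trades startup latency for fewer simultaneous requests against the same provider.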

Made-with: Cursor
… handoff

700+ line living document covering:
- 20 bugs with root cause analysis, fix attempts, and recommendations
- Complete architecture overview with data flow
- Full file map with line numbers and status
- Every attempted fix that didn't work and why
- VS Code API constraints and workarounds
- 4 architectural root causes identified
- Prioritized fix strategy for next session
- 6 unimplemented features with specifications
- Test coverage status and commands

This is the definitive handoff document for continuing development.

Made-with: Cursor
…ulti-orchestrator (regression)

Made-with: Cursor

Labels

- Enhancement: New feature or request
- size:XXL: This PR changes 1000+ lines, ignoring generated files.

3 participants