feat: Computer Use v2 - cross-platform Executor + Python Bridge + GUI accessibility #137
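Computer Use on three platforms (macOS + Windows + Linux), with Windows-specific enhancements.
- MCP server: full implementation across 12 files, including toolCalls/tools/executor/mcpServer
- Platform abstraction layer: platforms/{win32,darwin,linux}.ts
- Cross-platform executor: executorCrossPlatform.ts
- CHICAGO_MCP + VOICE_MODE feature flags enabled
- windowMessage.ts: SendMessageW (WM_CHAR Unicode + clipboard paste)
- windowBorder.ts: window border built from 4 overlay windows (tracked at 30 fps)
- uiAutomation.ts: UI Automation element tree / click / set value
- accessibilitySnapshot.ts: accessibility snapshot → lets the model perceive the GUI
- bridge.py + bridgeClient.ts: long-running Python process (replaces per-call PowerShell, "PS")
- window_management: min/max/restore/close/focus (Win32 API)
- click_element / type_into_element: operate on elements by name (no coordinates needed)
- Screenshots automatically include an Accessibility Snapshot
- 17 methods, JSON communication over stdin/stdout
- Window enumeration: 1.5 ms vs. 500 ms with PS; screenshot: 360 ms vs. 800 ms with PS
- Dependencies: mss + Pillow + pywinauto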
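The description above only says that the bridge speaks JSON over stdin/stdout. As a rough sketch of what the client side of such a channel could look like, the class below assumes newline-delimited JSON with `id`, `method`, `params`, `result`, and `error` fields. Those field names, the class name `LineProtocol`, and its methods are assumptions made for illustration and are not taken from bridgeClient.ts.

```typescript
// Hypothetical wire format: one JSON object per line, e.g. {"id": 1, "result": ...}.
// The PR only states "JSON over stdin/stdout"; every field name here is an assumption.
type BridgeResponse = { id: number; result?: unknown; error?: string };

type Pending = {
  resolve: (value: unknown) => void;
  reject: (reason: Error) => void;
};

class LineProtocol {
  private buffer = "";
  private nextId = 1;
  readonly pending = new Map<number, Pending>();

  /** Serialize a request as one line and remember its callbacks by id. */
  encode(method: string, params: unknown, callbacks: Pending): string {
    const id = this.nextId++;
    this.pending.set(id, callbacks);
    return JSON.stringify({ id, method, params }) + "\n";
  }

  /** Feed a raw stdout chunk; only complete lines are parsed and dispatched. */
  feed(chunk: string): void {
    this.buffer += chunk;
    let newline = this.buffer.indexOf("\n");
    while (newline !== -1) {
      const line = this.buffer.slice(0, newline).trim();
      this.buffer = this.buffer.slice(newline + 1);
      if (line) this.dispatch(line);
      newline = this.buffer.indexOf("\n");
    }
  }

  private dispatch(line: string): void {
    let msg: BridgeResponse;
    try {
      msg = JSON.parse(line) as BridgeResponse;
    } catch {
      return; // ignore non-JSON noise on stdout
    }
    const entry = this.pending.get(msg.id);
    if (!entry) return;
    this.pending.delete(msg.id);
    if (msg.error !== undefined) entry.reject(new Error(msg.error));
    else entry.resolve(msg.result);
  }

  /** Reject everything still in flight, e.g. when the subprocess is stopped. */
  failAll(reason: string): void {
    this.pending.forEach((entry) => entry.reject(new Error(reason)));
    this.pending.clear();
  }
}
```

Buffering partial chunks matters because a pipe can split a response anywhere, and rejecting in-flight requests on shutdown keeps callers from waiting forever.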
📝 Walkthrough
Refactors the Computer Use architecture for cross-platform support by introducing a unified platform abstraction layer (
Changes
Sequence Diagram
sequenceDiagram
participant App as Application
participant Executor as createCliExecutor()
participant Dispatcher as loadPlatform()
participant Backend as Platform Backend<br/>(darwin/win32/linux)
participant Win32Bridge as Python Bridge<br/>(Windows only)
participant Win32Utils as Win32 Utils<br/>(windowMessage, etc.)
App->>Executor: initialize computer use
Executor->>Dispatcher: loadPlatform()
Dispatcher->>Backend: require platform module
Backend-->>Dispatcher: return platform instance
Dispatcher-->>Executor: platform object
Executor->>Backend: screenshot(), click(), type(), etc.
alt Darwin
Backend->>Backend: delegate to `@ant` modules
Backend-->>Executor: result
else Windows (Bound Window)
Backend->>Win32Bridge: ensure bridge subprocess
Win32Bridge-->>Backend: bridge ready
Backend->>Win32Utils: resolveInputHwnd()
Win32Utils-->>Backend: target hwnd
Backend->>Win32Bridge: sendClick(hwnd, x, y)
Win32Bridge->>Win32Utils: PostMessage(hwnd, WM_LBUTTONDOWN)
Win32Utils-->>Win32Bridge: success
Win32Bridge-->>Backend: result
Backend-->>Executor: result
else Linux
Backend->>Backend: use xdotool/scrot/wmctrl
Backend-->>Executor: result
end
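The dispatch step in the diagram (the executor asks a loader for a backend, then calls it through one interface) can be sketched as follows. This is only an illustration: `loadPlatformSketch`, the `PlatformBackend` shape, and the stand-in backends are hypothetical and do not reproduce the PR's `loadPlatform()`.

```typescript
type PlatformName = "darwin" | "win32" | "linux";

// Hypothetical minimal backend interface; the real one exposes
// screenshot(), click(), type(), and more.
interface PlatformBackend {
  name: PlatformName;
  describe(): string;
}

// Stand-in factories; the real modules live in platforms/{win32,darwin,linux}.ts.
const registry: Record<PlatformName, () => PlatformBackend> = {
  darwin: () => ({ name: "darwin", describe: () => "delegates to native modules" }),
  win32: () => ({ name: "win32", describe: () => "uses the Python bridge and Win32 utils" }),
  linux: () => ({ name: "linux", describe: () => "uses xdotool/scrot/wmctrl" }),
};

/** Pick a backend for the given platform string, or fail loudly. */
function loadPlatformSketch(platform: string): PlatformBackend {
  if (platform === "darwin" || platform === "win32" || platform === "linux") {
    return registry[platform]();
  }
  throw new Error(`Unsupported platform: ${platform}`);
}
```

In a real executor the argument would presumably come from `process.platform`; it is a parameter here so the sketch stays testable on any machine.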
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 7
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
packages/@ant/computer-use-swift/src/index.ts (1)
8-17: ⚠️ Potential issue | 🟠 Major
Import types from `types.ts` instead of `darwin.js`.
Lines 8-17 attempt to re-export types from `./backends/darwin.js`, but `darwin.ts` doesn't export those types; it only uses them for type annotations. The type definitions exist in `types.ts`. Update the import path:

    export type {
      DisplayGeometry,
      PrepareDisplayResult,
      AppInfo,
      InstalledApp,
      RunningApp,
      ScreenshotResult,
      ResolvePrepareCaptureResult,
      WindowDisplayInfo,
    } from './types.js'

Also update line 21 similarly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/@ant/computer-use-swift/src/index.ts` around lines 8 - 17, The file incorrectly re-exports type symbols (DisplayGeometry, PrepareDisplayResult, AppInfo, InstalledApp, RunningApp, ScreenshotResult, ResolvePrepareCaptureResult, WindowDisplayInfo) from './backends/darwin.js' but those types live in './types.js'; update the two export type blocks that reference './backends/darwin.js' (the ones exporting the listed type names) to import/export from './types.js' instead so the type exports resolve to the correct source.
src/utils/computerUse/executor.ts (1)
100-106: ⚠️ Potential issue | 🟠 Major
Potential command injection in PowerShell Set-Clipboard.
The `text.replace(/'/g, "''")` escaping only handles single quotes. Characters like `$`, `` ` ``, `"`, and newlines can still be interpreted by PowerShell, leading to command injection or unexpected behavior.
🛡️ Safer approach: use stdin piping instead of embedding in command string

     async function writeClipboardViaPbcopy(text: string): Promise<void> {
       if (process.platform === 'win32') {
    -    const { code } = await execFileNoThrow('powershell', ['-NoProfile', '-Command', `Set-Clipboard -Value '${text.replace(/'/g, "''")}'`], {
    +    const { code } = await execFileNoThrow('powershell', ['-NoProfile', '-Command', 'Set-Clipboard -Value ([Console]::In.ReadToEnd())'], {
    +      input: text,
           useCwd: false,
         })
         if (code !== 0) {
           throw new Error(`PowerShell Set-Clipboard exited with code ${code}`)
         }
         return
       }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/executor.ts` around lines 100 - 106, The current execFileNoThrow call injects user text into the PowerShell command using string escaping (text.replace(...)) which can still allow PowerShell metacharacters to be interpreted; instead run PowerShell with a command that reads from stdin and pass the text via the child-process stdin stream. Update the execFileNoThrow invocation in executor.ts (the call that currently uses `Set-Clipboard -Value '${text.replace(/'/g, "''")}'`) to use a safe command like `Set-Clipboard -Value (Get-Content -Raw -)` (or `Set-Clipboard -Value ([Console]::In.ReadToEnd())`) and provide the plain `text` via the execFileNoThrow stdin option, removing the inline escaping logic entirely.
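The idea behind the stdin approach is that the PowerShell script is a fixed string and the user's text travels only as process input, so nothing in the text is ever parsed as a command. A minimal sketch of that separation follows; `buildClipboardInvocation`, `writeClipboardWindows`, and the injected `Runner` type are hypothetical names for this illustration, and the runner stands in for whatever process helper the codebase uses (`execFileNoThrow` in the comment above).

```typescript
// Fixed script: the text to copy never appears on the command line.
const CLIPBOARD_SCRIPT = "Set-Clipboard -Value ([Console]::In.ReadToEnd())";

interface Invocation {
  file: string;
  args: string[];
  input: string; // delivered on stdin, never interpolated into args
}

// Hypothetical runner signature: spawns the process, writes `input` to its
// stdin, and returns the exit code.
type Runner = (file: string, args: string[], input: string) => number;

function buildClipboardInvocation(text: string): Invocation {
  return {
    file: "powershell",
    args: ["-NoProfile", "-NonInteractive", "-Command", CLIPBOARD_SCRIPT],
    input: text,
  };
}

function writeClipboardWindows(text: string, run: Runner): void {
  const { file, args, input } = buildClipboardInvocation(text);
  const code = run(file, args, input);
  if (code !== 0) {
    throw new Error(`PowerShell Set-Clipboard exited with code ${code}`);
  }
}
```

Because the argument list is constant, no escaping logic is needed at all. Non-ASCII text may additionally depend on the console input encoding, which this sketch does not handle.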
🟠 Major comments (19)
src/utils/computerUse/win32/virtualCursor.ts-190-260 (1)
190-260: ⚠️ Potential issue | 🟠 Major
Capture the old cursor state before scheduling async cleanup.
The timeout closes over mutable globals. After `showVirtualCursor()` reassigns them, the delayed cleanup can kill the newly spawned cursor; if nothing is reassigned, the old process/files never get cleaned up.
Suggested fix

     export function hideVirtualCursor(): void {
    -  if (cursorStopFile) {
    +  const proc = cursorProc
    +  const stopFile = cursorStopFile
    +  const scriptFile = cursorScriptFile
    +  const posFile = stopFile ? `${stopFile}.pos` : null
    +
    +  cursorProc = null
    +  cursorStopFile = null
    +  cursorScriptFile = null
    +
    +  if (stopFile) {
         try {
    -      fs.writeFileSync(cursorStopFile, 'STOP', 'utf-8')
    +      fs.writeFileSync(stopFile, 'STOP', 'utf-8')
         } catch {}
         setTimeout(() => {
           try {
    -        cursorProc?.kill()
    +        proc?.kill()
           } catch {}
           try {
    -        if (cursorScriptFile) fs.unlinkSync(cursorScriptFile)
    +        if (scriptFile) fs.unlinkSync(scriptFile)
           } catch {}
           try {
    -        if (cursorStopFile) fs.unlinkSync(cursorStopFile)
    +        if (stopFile) fs.unlinkSync(stopFile)
    +      } catch {}
    +      try {
    +        if (posFile) fs.unlinkSync(posFile)
           } catch {}
         }, 2000)
       }
    -  cursorProc = null
    -  cursorStopFile = null
    -  cursorScriptFile = null
     }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/win32/virtualCursor.ts` around lines 190 - 260, The timeout-based cleanup in hideVirtualCursor closes over mutable globals (cursorProc, cursorScriptFile, cursorStopFile) which can be reassigned by showVirtualCursor and cause wrong process/files to be killed or left behind; fix by capturing the current values into local constants (e.g. const proc = cursorProc, const script = cursorScriptFile, const stop = cursorStopFile) before scheduling setTimeout, clear the module globals immediately (set cursorProc/cursorScriptFile/cursorStopFile = null) and then use the captured locals inside the delayed cleanup to kill the correct process and unlink the correct files. Ensure moveVirtualCursor still checks the global stop variable behavior as intended.
src/utils/computerUse/win32/windowBorder.ts-19-66 (1)
19-66: ⚠️ Potential issue | 🟠 Major
Track marked HWNDs so `cleanupAllBorders()` can actually clear them.
`markBound()`/`unmarkBound()` are stateless, so this cleanup hook has nothing to act on. If any window is still marked when this path runs, the green border is left behind.
Suggested fix

    +const markedWindows = new Set<string>()
    +
     /**
      * Set green border on bound window via DWM.
      */
     export function markBound(hwnd: string): boolean {
       hwnd = validateHwnd(hwnd)
       // DWMWA_BORDER_COLOR = 34, COLORREF = 0x00BBGGRR
       // Green: R=0, G=200, B=0 → 0x0000C800
       const hr = ps(
    @@
     [CuDwm]::DwmSetWindowAttribute([IntPtr]::new([long]${hwnd}), 34, [ref]$color, 4)`,
       )
    +  if (hr === '0') markedWindows.add(hwnd)
       return hr === '0'
     }
    @@
     export function unmarkBound(hwnd: string): boolean {
       hwnd = validateHwnd(hwnd)
       // DWMWA_COLOR_DEFAULT = 0xFFFFFFFF
       const hr = ps(
    @@
     [CuDwm]::DwmSetWindowAttribute([IntPtr]::new([long]${hwnd}), 34, [ref]$color, 4)`,
       )
    +  if (hr === '0') markedWindows.delete(hwnd)
       return hr === '0'
     }
    @@
     export function cleanupAllBorders(): void {
    -  // DWM border color is a window attribute — it resets automatically
    -  // when the process exits or the window closes. No cleanup needed.
    +  for (const hwnd of markedWindows) {
    +    try {
    +      unmarkBound(hwnd)
    +    } catch {}
    +  }
    +  markedWindows.clear()
     }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/win32/windowBorder.ts` around lines 19 - 66, The cleanupAllBorders function is a no-op because markBound/unmarkBound are stateless; add a module-level Set (e.g., markedHwnds) to track successfully marked HWNDs in markBound (only add when DwmSetWindowAttribute succeeds) and remove from that Set in unmarkBound when unmarking succeeds, then implement cleanupAllBorders to iterate over markedHwnds and call the same unmark logic (or call unmarkBound) for each HWND and clear the Set; keep using validateHwnd in both markBound and unmarkBound and ensure cleanupAllBorders handles failures gracefully (log or ignore) and clears entries that were reset.
src/utils/computerUse/win32/shared.ts-40-47 (1)
40-47: ⚠️ Potential issue | 🟠 Major
Consume `stderr` and fail on non-zero exits in `psAsync()`.
The function should check the exit code and report errors. Additionally, while Bun's `Bun.spawn` does not document blocking behavior on unconsumed pipes (unlike Node.js), consuming both stdout and stderr follows the recommended pattern and ensures error messages aren't silently lost.
Suggested fix

     export async function psAsync(script: string): Promise<string> {
       const proc = Bun.spawn(
         ['powershell', '-NoProfile', '-NonInteractive', '-Command', script],
         { stdout: 'pipe', stderr: 'pipe' },
       )
    -  const out = await new Response(proc.stdout).text()
    -  await proc.exited
    +  const [out, err, exitCode] = await Promise.all([
    +    new Response(proc.stdout).text(),
    +    new Response(proc.stderr).text(),
    +    proc.exited,
    +  ])
    +  if (exitCode !== 0) {
    +    throw new Error(err.trim() || `PowerShell exited with code ${exitCode}`)
    +  }
       return out.trim()
     }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/win32/shared.ts` around lines 40 - 47, psAsync currently only reads stdout and doesn't check the child exit code or consume stderr; update psAsync to concurrently read both proc.stdout and proc.stderr (e.g., via new Response(proc.stdout).text() and new Response(proc.stderr).text()), await proc.exited to get the exit status, and if the exit code is non-zero throw or return an error that includes stderr (and optionally stdout) so errors are not silently lost; reference the psAsync function, the proc variable from Bun.spawn, and the proc.exited promise when implementing this change.
src/utils/computerUse/win32/bridgeClient.ts-176-186 (1)
176-186: ⚠️ Potential issue | 🟠 Major
`stopBridge()` leaves pending promises hanging.
Calling `pendingRequests.clear()` without rejecting pending promises means callers of `call()` will never resolve/reject, causing memory leaks and potential deadlocks.
🐛 Proposed fix: reject pending requests before clearing

     export function stopBridge(): void {
       if (bridgeProc) {
         try {
           bridgeProc.stdin.end()
           bridgeProc.kill()
         } catch {}
         bridgeProc = null
       }
    +  // Reject all pending requests to prevent hangs
    +  for (const [id, { reject }] of pendingRequests) {
    +    reject(new Error('Bridge stopped'))
    +  }
       pendingRequests.clear()
       outputBuffer = ''
     }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/win32/bridgeClient.ts` around lines 176 - 186, stopBridge() currently clears pendingRequests without rejecting them, leaving callers of call() hanging; update stopBridge() to iterate pendingRequests (the map/collection used by call()), reject each pending promise with a clear Error (e.g., "bridge stopped" or include bridgeProc state), then clear the collection and reset outputBuffer and bridgeProc as before; reference the stopBridge(), pendingRequests, and call() symbols so you reject each pendingRequests entry before calling pendingRequests.clear().
src/utils/computerUse/platforms/win32.ts-637-711 (1)
637-711: ⚠️ Potential issue | 🟠 Major
Snapshot visible HWNDs before `Start-Process`.
Lines 637-676 can start the target app before Line 681 captures `beforeHwnds`. Fast-starting apps can create their window in that gap, so it lands in the baseline and the later “new window” loop never finds it, falling through to `HWND_NOT_FOUND`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/platforms/win32.ts` around lines 637 - 711, The baseline of visible windows ($beforeHwnds via [CuLaunch]::GetAllVisibleHwnds) is taken after Start-Process calls so fast apps can create windows that end up in the baseline; move the snapshot so it occurs before any Start-Process invocation (i.e. capture $beforeHwnds once just before the first attempt to launch the target and reuse that baseline for the subsequent new-window detection loop that uses [CuLaunch]::GetAllVisibleHwnds(), $proc, and GetPidForHwnd).
src/utils/computerUse/platforms/types.ts-26-31 (1)
26-31: ⚠️ Potential issue | 🟠 Major
Add virtual-screen origin to `DisplayInfo`.
`executorCrossPlatform.ts` currently has to synthesize `originX`/`originY` as `0` because this type only exposes width/height/scale. That makes screenshot-to-click mapping wrong on any non-primary monitor whose origin is not `(0, 0)`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/platforms/types.ts` around lines 26 - 31, The DisplayInfo interface is missing virtual-screen origin fields, causing executorCrossPlatform.ts to synthesize originX/originY as 0 and break screenshot-to-click mapping on non-primary monitors; update DisplayInfo to include originX: number and originY: number (or similarly named numeric properties) and then update any consumers (notably executorCrossPlatform.ts) to read these fields instead of assuming 0, ensuring callers constructing DisplayInfo provide the correct monitor origin values.
packages/@ant/computer-use-mcp/src/toolCalls.ts-3890-3895 (1)
3890-3895: ⚠️ Potential issue | 🟠 Major
`right` scroll is routed as a negative wheel delta.
Line 3893 makes every direction except `"up"` negative. The Windows wheel helpers interpret positive horizontal deltas as right, so bound-window `scroll_direction="right"` currently scrolls left.
Suggested fix

     if (coord) return handleMouseWheel(adapter, {
       coordinate: coord,
    -  delta: a.scroll_direction === "up" ? (a.scroll_amount ?? 3) : -(a.scroll_amount ?? 3),
    +  delta:
    +    a.scroll_direction === "up" || a.scroll_direction === "right"
    +      ? (a.scroll_amount ?? 3)
    +      : -(a.scroll_amount ?? 3),
       direction: (a.scroll_direction === "left" || a.scroll_direction === "right") ? "horizontal" : "vertical",
     });

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/@ant/computer-use-mcp/src/toolCalls.ts` around lines 3890 - 3895, In the "scroll" case the delta sign is inverted for horizontal right: change the delta calculation in the case "scroll" that calls handleMouseWheel so it determines orientation from a.scroll_direction and computes sign = +1 when scrolling up or right and -1 when scrolling down or left, then sets delta = sign * (a.scroll_amount ?? 3); keep direction = (a.scroll_direction === "left" || a.scroll_direction === "right") ? "horizontal" : "vertical". This fixes handleMouseWheel being called with the wrong sign for right/left while preserving existing defaults.
src/utils/computerUse/executorCrossPlatform.ts-717-735 (1)
717-735: ⚠️ Potential issue | 🟠 Major
`show` never recreates a hidden indicator.
After `hideIndicator()` clears the backing process/files, Line 726 only calls `updateIndicator(message)`, which is a no-op once `msgFile` is gone. The tool reports success, but nothing reappears until the window is rebound.
Suggested fix

     const ind = require('./win32/inputIndicator.js') as typeof import('./win32/inputIndicator.js')
     if (action === 'show' && message) {
    +  const hwnd = getBoundHwndStr()
    +  if (!hwnd || !ind.showIndicator(hwnd)) {
    +    return { active: false }
    +  }
       ind.updateIndicator(message)
       return { active: true, message }
     }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/executorCrossPlatform.ts` around lines 717 - 735, The show branch never recreates the indicator after hideIndicator() removed its backing files; updateIndicator() becomes a no-op. In statusIndicator, when action === 'show' and message is provided, ensure you recreate/initialize the indicator before calling updateIndicator — call the module's creation/init function (e.g., ind.createIndicator() or ind.initIndicator(), whichever exists) or otherwise reinitialize the indicator object after requiring './win32/inputIndicator.js', then call ind.updateIndicator(message); keep the hide and status behavior using ind.hideIndicator() and isBound() unchanged.
src/utils/computerUse/win32/bridge.py-456-470 (1)
456-470: ⚠️ Potential issue | 🟠 Major
Pack wheel messages as Win32 words, not raw pointer shifts.
Lines 467–469 pack `WPARAM`/`LPARAM` with raw shifts into `ctypes.c_void_p`. When bound windows are offscreen at negative coordinates (e.g., `-32000,-32000`), `ClientToScreen()` returns negative screen coordinates here. Python's integer shifts on negative values sign-extend the bits; when cast to `ctypes.c_void_p`, this produces incorrect `WM_MOUSEWHEEL` payloads. Mask `delta` and both coordinate axes to 16-bit values before packing:

    wparam = ctypes.c_void_p((wheel_delta & 0xFFFF) << 16)
    lparam = ctypes.c_void_p(((pt.y & 0xFFFF) << 16) | (pt.x & 0xFFFF))

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/win32/bridge.py` around lines 456 - 470, In send_mouse_wheel(), WPARAM/LPARAM are packed using raw shifts which misbehaves for negative screen coordinates; change the packing to mask values into 16-bit words before shifting so sign-extended negatives don't corrupt the payload: mask wheel_delta with 0xFFFF when placing it into the high word for wparam, and mask both pt.x and pt.y with 0xFFFF when building lparam (use these masked values when creating the ctypes pointer arguments for wparam and lparam) so WM_MOUSEWHEEL/WM_MOUSEHWHEEL receives correct 16-bit word-packed data.
packages/@ant/computer-use-mcp/src/toolCalls.ts-2156-2157 (1)
2156-2157: ⚠️ Potential issue | 🟠 Major
Atomic screenshots never include `accessibilityText`.
This branch rebuilds `shot` from `result` without copying `result.accessibilityText`, so Line 2157 is always false when `resolvePrepareCapture()` is used. Bound-window screenshots taken through the atomic path will never include the GUI tree.
Suggested fix

     const shot: ScreenshotResult = {
       base64: result.base64,
       width: result.width,
       height: result.height,
       displayWidth: result.displayWidth,
       displayHeight: result.displayHeight,
       displayId: result.displayId,
       originX: result.originX,
       originY: result.originY,
    +  accessibilityText: result.accessibilityText,
     };

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/@ant/computer-use-mcp/src/toolCalls.ts` around lines 2156 - 2157, The atomic capture path rebuilds the local "shot" object from "result" but never copies result.accessibilityText, so the condition checking shot.accessibilityText (used when building the accessibility "text" node) is always false; update the code that constructs/assigns "shot" in the atomic path (where resolvePrepareCapture is used) to include result.accessibilityText (e.g., set shot.accessibilityText = result.accessibilityText or include it in the spread) so bound-window/atomic screenshots include the GUI tree.
src/utils/computerUse/win32/inputIndicator.ts-193-216 (1)
193-216: ⚠️ Potential issue | 🟠 Major
Capture the cleanup handles before nulling the globals.
Line 198 schedules cleanup against `indicatorProc`, `scriptFile`, `stopFile`, and `msgFile`, but Lines 213-216 clear those globals immediately. The timeout then either becomes a no-op or, if `showIndicator()` runs again before the 2s delay expires, kills/deletes the newer indicator instance instead of the old one.
Suggested fix

     export function hideIndicator(): void {
    -  if (stopFile) {
    +  const proc = indicatorProc
    +  const sf = stopFile
    +  const script = scriptFile
    +  const msg = msgFile
    +
    +  if (sf) {
         try {
    -      fs.writeFileSync(stopFile, 'STOP', 'utf-8')
    +      fs.writeFileSync(sf, 'STOP', 'utf-8')
         } catch {}
         setTimeout(() => {
           try {
    -        indicatorProc?.kill()
    +        proc?.kill()
           } catch {}
           try {
    -        if (scriptFile) fs.unlinkSync(scriptFile)
    +        if (script) fs.unlinkSync(script)
           } catch {}
           try {
    -        if (stopFile) fs.unlinkSync(stopFile)
    +        if (sf) fs.unlinkSync(sf)
           } catch {}
           try {
    -        if (msgFile) fs.unlinkSync(msgFile)
    +        if (msg) fs.unlinkSync(msg)
           } catch {}
         }, 2000)
       }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/win32/inputIndicator.ts` around lines 193 - 216, In hideIndicator(), capture the current globals (indicatorProc, scriptFile, stopFile, msgFile) into local constants before scheduling the setTimeout so the delayed cleanup operates on the original handles; then use those locals inside the setTimeout callback and only afterwards set the globals (indicatorProc, scriptFile, stopFile, msgFile) to null, ensuring the timeout does not affect a newly started indicator or become a no-op.
src/utils/computerUse/win32/comWord.ts-238-245 (1)
238-245: ⚠️ Potential issue | 🟠 Major
`paraIndex` is off by one for callers using `openWord().paragraphs`.
The public API exposes paragraphs as a zero-based JS array, but this forwards `paraIndex` straight into the COM collection. Passing the first array index (`0`) will fail, and every later index inserts before the next paragraph instead of the requested one.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/win32/comWord.ts` around lines 238 - 245, The insertText function uses paraIndex (from openWord().paragraphs zero-based array) directly in the COM call $doc.Paragraphs.Item(...), causing off-by-one errors; update the COM index to paraIndex + 1 when constructing the body string (i.e., call $doc.Paragraphs.Item(paraIndex + 1)) and add a guard to validate paraIndex is >= 0 (or throw/return false) before calling psEscape and building the command so callers using openWord().paragraphs map correctly to the COM collection.
src/utils/computerUse/win32/comWord.ts-43-49 (1)
43-49: ⚠️ Potential issue | 🟠 Major
Don’t trim raw document text on the transport path.
`runPs()` strips all leading/trailing whitespace, so `readText()` loses real content such as leading spaces and Word’s trailing paragraph newline. Only trim JSON/status responses; preserve stdout for text reads.
🔧 Suggested change

    -function runPs(script: string): string {
    +function runPs(script: string, opts?: { trim?: boolean }): string {
       const result = Bun.spawnSync({
         cmd: ['powershell', '-NoProfile', '-NonInteractive', '-Command', script],
         stdout: 'pipe',
         stderr: 'pipe',
       })
    -  return new TextDecoder().decode(result.stdout).trim()
    +  const stdout = new TextDecoder().decode(result.stdout)
    +  return opts?.trim === false ? stdout : stdout.trim()
     }
    @@
     export async function readText(filePath: string): Promise<string> {
       const script = wrapWordScript(
         `Write-Output $doc.Content.Text`,
         filePath,
       )
    -  return runPs(script)
    +  return runPs(script, { trim: false })
     }

Also applies to: 194-199
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/win32/comWord.ts` around lines 43 - 49, runPs currently calls .trim() on the spawned PowerShell stdout which strips meaningful leading/trailing whitespace (used by readText); remove the .trim() so runPs returns the raw stdout (i.e., return new TextDecoder().decode(result.stdout)) and move any trimming to the callers that expect JSON/status responses (e.g., where you parse/handle status or JSON results—apply .trim() there). Update the other occurrence referenced (lines ~194-199) similarly so only JSON/status parsing paths trim, while readText and raw text paths preserve whitespace.
src/utils/computerUse/win32/comWord.ts-255-260 (1)
255-260: ⚠️ Potential issue | 🟠 Major
Reject empty `find` values before entering the replace flow.
An empty search string can match zero-width positions, which makes the counting loop and replace-all path behave unpredictably. Guard this at the TypeScript boundary before constructing the script.
🛡️ Suggested change

     export async function findReplace(
       filePath: string,
       find: string,
       replace: string,
       replaceAll?: boolean,
     ): Promise<number> {
    +  if (find.length === 0) return 0
    +
       // wdReplaceAll=2, wdReplaceOne=1
       const replaceConst = replaceAll !== false ? 2 : 1

Also applies to: 281-289
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/win32/comWord.ts` around lines 255 - 260, Reject empty search strings at the TypeScript boundary in findReplace by validating the find parameter before any script construction or replace logic: if find is empty or only whitespace, throw an Error or return a rejected Promise (with a clear message) to prevent zero-width matches and unpredictable counting/replaceAll behavior; apply the same guard to the related replace routine referenced around the 281-289 region so both functions validate find early.
src/utils/computerUse/win32/windowMessage.ts-133-135 (1)
133-135: ⚠️ Potential issue | 🟠 Major
Don't cache a miss permanently.
Caching `null` here means a bind that happens before the real input child exists will never re-probe, even after the control appears. On freshly launched WinUI/UWP/Terminal windows, `resolveInputHwnd()` can stay pinned to the wrong HWND until manual unbind.
♻️ Suggested change

    -const editChildCache = new Map<string, string | null>()
    +const editChildCache = new Map<string, string>()
    @@
    -  if (editChildCache.has(parentHwnd)) {
    -    return editChildCache.get(parentHwnd)!
    -  }
    +  const cachedChild = editChildCache.get(parentHwnd)
    +  if (cachedChild) {
    +    return cachedChild
    +  }
    @@
    -  editChildCache.set(parentHwnd, null)
       return null

Also applies to: 217-218
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/win32/windowMessage.ts` around lines 133 - 135, The cache logic in editChildCache (used by resolveInputHwnd / the parentHwnd lookup) is storing negative results (null/undefined) and thus permanently preventing later re-probing when the child control appears; change the behavior so that you only write to editChildCache when a non-null HWND is found (i.e., cache hits only), and do not persist misses—alternatively implement a short TTL or remove the cached entry when a null is returned; apply the same fix to the analogous caching at the other location referenced (the lines around 217-218).
src/utils/computerUse/win32/windowMessage.ts-324-330 (1)
324-330: ⚠️ Potential issue | 🟠 Major
Keep the long-text fallback in the hot path.
This always inlines one `WM_CHAR` call per codepoint into a single script, so larger `type` requests get disproportionately slow and eventually fail once the generated payload gets too large. `CLIPBOARD_THRESHOLD` and `pasteViaClipboard()` already exist for this case but are bypassed here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/win32/windowMessage.ts` around lines 324 - 330, sendText always inlines a WM_CHAR per codepoint which causes huge scripts and failures; change sendText to keep the long-text clipboard fallback by checking text size (or generated script length) against CLIPBOARD_THRESHOLD and, when exceeded, call pasteViaClipboard(targetHwnd, text) instead of building/invoking buildWmCharLines/runPs; for small texts continue to use buildWmCharLines, WINMSG_TYPE and runPs as before so the hot path remains fast.
src/utils/computerUse/win32/windowMessage.ts-591-595 (1)
591-595: ⚠️ Potential issue | 🟠 Major
Translate `(x, y)` before re-targeting mouse input to a child HWND.
These helpers document `(x, y)` as coordinates relative to the bound window, then call `resolveInputHwnd()` and reuse the same coordinates for the child window. If the child is inset inside the top-level client area, clicks/drags/wheels land in the wrong place because `lParam` and screen-coordinate conversion are now using the wrong origin.
Also applies to: 609-625, 635-640, 663-694
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/win32/windowMessage.ts` around lines 591 - 595, The (x,y) coords must be converted from the original bound window's client origin to the target child HWND's client origin before re-targeting input: compute the point in screen coordinates from the original hwnd (ClientToScreen equivalent) then, after resolveInputHwnd(hwnd) returns the child HWND, convert that screen point into the child HWND's client coords (ScreenToClient equivalent) and use those translated x/y when building lParam and any screen-coordinate conversions; update all helpers that call resolveInputHwnd() and use x/y (the blocks around the shown hwnd/downMsg/upMsg and the other ranges 609-625, 635-640, 663-694) to perform this two-step translation so clicks/drags/wheels land correctly in inset child windows.
src/utils/computerUse/win32/windowMessage.ts-339-350 (1)
339-350: ⚠️ Potential issue | 🟠 Major
Route single-key events through `resolveInputHwnd()` too.
`sendText()` and `sendKeys()` already retarget to the real edit/input child, but `sendKey()` still posts to the original HWND. On WinUI 3/UWP/edit-child windows that makes press/release flows diverge from combo/type behavior and miss the actual input surface.
🔧 Suggested change

     export function sendKey(
       hwnd: string,
       vk: number,
       action: 'down' | 'up',
     ): boolean {
    -  hwnd = validateHwnd(hwnd)
    +  hwnd = resolveInputHwnd(hwnd)
       const msg = action === 'down' ? '0x0100' : '0x0101'
       const lParamFn = action === 'down' ? 'KeyDownLParam' : 'KeyUpLParam'

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/win32/windowMessage.ts` around lines 339 - 350, sendKey currently posts to the original HWND (it calls validateHwnd but not resolveInputHwnd), which diverges from sendText/sendKeys that retarget to the real edit/input child; update sendKey to call resolveInputHwnd(hwnd) after validateHwnd and use the resolved hwnd when composing the PostMessage script so key down/up events are routed to the actual input surface (ensure resolveInputHwnd is imported/available and preserve existing vk/action handling).
src/utils/computerUse/win32/windowMessage.ts-73-80 (1)
73-80: ⚠️ Potential issue | 🟠 Major
Set the extended-key flag (bit 24) in lParam for extended virtual keys.
The `KeyDownLParam` and `KeyUpLParam` helpers construct lParam by setting bit 0 (repeat count) and bits 16-23 (scan code) but leave bit 24 unset for all virtual keys. Per Microsoft's WM_KEYDOWN/WM_KEYUP documentation, bit 24 must be set for extended keys: arrow keys, Home, End, Delete, Insert, Page Up, Page Down, right Ctrl, and right Alt. Without this flag, these keys are posted as ordinary keys, breaking prompt navigation and terminal shortcuts in applications that distinguish extended keys.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/win32/windowMessage.ts` around lines 73 - 80, The helpers KeyDownLParam and KeyUpLParam must set the extended-key flag (bit 24) for extended virtual keys; update these functions to detect when the provided vk is an extended key (e.g., arrow keys, Home, End, Delete, Insert, PageUp, PageDown, RightCtrl, RightAlt) and include (1u << 24) in the constructed lParam when true, otherwise leave it unset; keep the existing MapVirtualKeyW(scanCode) logic and existing bits for repeat, scan code, previous state and transition in KeyDownLParam/KeyUpLParam but OR in the extended flag for the listed VKs (or use a small helper IsExtendedKey(vk) called from both functions) so extended keys have bit 24 set.
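As a rough illustration of the bit layout this comment describes, here is a minimal TypeScript sketch. The function names `isExtendedKey`, `keyDownLParam`, and `keyUpLParam` are hypothetical; the PR's real `KeyDownLParam`/`KeyUpLParam` helpers are not shown in this review and obtain the scan code via `MapVirtualKeyW`, whereas this sketch takes the scan code as a parameter. The extended-key list covers only the keys named above.

```typescript
// Hypothetical sketch, not the PR's implementation.
// Virtual-key codes (winuser.h values) for the extended keys named in the review.
const EXTENDED_VKS: ReadonlySet<number> = new Set([
  0x21, // VK_PRIOR (Page Up)
  0x22, // VK_NEXT (Page Down)
  0x23, // VK_END
  0x24, // VK_HOME
  0x25, // VK_LEFT
  0x26, // VK_UP
  0x27, // VK_RIGHT
  0x28, // VK_DOWN
  0x2d, // VK_INSERT
  0x2e, // VK_DELETE
  0xa3, // VK_RCONTROL
  0xa5, // VK_RMENU (right Alt)
]);

function isExtendedKey(vk: number): boolean {
  return EXTENDED_VKS.has(vk);
}

// Builds a WM_KEYDOWN lParam: repeat count 1, scan code in bits 16-23,
// and bit 24 set only for extended keys.
function keyDownLParam(vk: number, scanCode: number): number {
  let lParam = 1; // bits 0-15: repeat count = 1
  lParam |= (scanCode & 0xff) << 16; // bits 16-23: scan code
  if (isExtendedKey(vk)) lParam |= 1 << 24; // bit 24: extended-key flag
  return lParam >>> 0; // keep the value as an unsigned 32-bit number
}

// Builds a WM_KEYUP lParam: same as key-down plus previous-state (bit 30)
// and transition-state (bit 31).
function keyUpLParam(vk: number, scanCode: number): number {
  return (keyDownLParam(vk, scanCode) | (1 << 30) | (1 << 31)) >>> 0;
}
```

Sharing one `isExtendedKey` check between both builders keeps key-down and key-up consistent, which is what the prompt above asks for.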
🟡 Minor comments (12)
.gitignore-18-20 (1)
18-20: ⚠️ Potential issue | 🟡 Minor
Align BMP ignore pattern with the “root only” intent.
Line 20 currently ignores `*.bmp` repo-wide, which conflicts with the comment and may keep BMP assets in subfolders from being tracked. Use `/*.bmp` if root-only is intended.
Proposed fix
     # Binary / screenshot files (root only)
     /*.png
    -*.bmp
    +/*.bmp
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.gitignore around lines 18 - 20, The .gitignore currently has a root-only PNG rule ('/*.png') but a repo-wide BMP rule ('*.bmp'); change the BMP pattern to root-only by replacing '*.bmp' with '/*.bmp' so it matches the comment intent and aligns with the existing '/*.png' rule.
DEV-LOG.md-42-42 (1)
42-42: ⚠️ Potential issue | 🟡 Minor
Remove blank line for formatting consistency.
This blank line between the section header (line 40) and the content (line 43) is inconsistent with the formatting pattern used throughout the rest of the file. All other sections have the content immediately following the `##` header without a blank line (e.g., lines 3→5, 28→30, 68→70, 111→113).
📝 Proposed fix
     ## Computer Use Windows 增强:窗口绑定截图 + UI Automation + OCR (2026-04-03)
    -
     在三平台基础实现之上,利用 Windows 原生 API 增强 Computer Use 的 Windows 专属能力。
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@DEV-LOG.md` at line 42, Remove the single blank line immediately following the affected "##" section header so the content follows the header on the next line (matching the other sections). Locate the specific "##" header noted in the comment and delete the empty line that separates it from its content to ensure consistent formatting.
src/utils/computerUse/win32/uiAutomation.ts-20-60 (1)
20-60: ⚠️ Potential issue | 🟡 Minor
Preserve case-insensitive `controlType` matching.
This new allowlist is case-sensitive, so inputs like `button` or `menuitem` now return `null` even though they are only casing variants of valid control types. Canonicalize before lookup instead of rejecting on casing alone.
Suggested fix
     const VALID_CONTROL_TYPES = new Set([
       'Button', 'Calendar', 'CheckBox', 'ComboBox', 'Custom', 'DataGrid',
       'DataItem', 'Document', 'Edit', 'Group', 'Header', 'HeaderItem',
       'Hyperlink', 'Image', 'List', 'ListItem', 'Menu', 'MenuBar', 'MenuItem',
       'Pane', 'ProgressBar', 'RadioButton', 'ScrollBar', 'Separator', 'Slider',
       'Spinner', 'SplitButton', 'StatusBar', 'Tab', 'TabItem', 'Table', 'Text',
       'Thumb', 'TitleBar', 'ToolBar', 'ToolTip', 'Tree', 'TreeItem', 'Window',
     ])
    +
    +const CONTROL_TYPES_BY_LOWER = new Map(
    +  [...VALID_CONTROL_TYPES].map(type => [type.toLowerCase(), type] as const),
    +)

     if (query.controlType) {
    -  if (!VALID_CONTROL_TYPES.has(query.controlType)) {
    +  const canonicalType = CONTROL_TYPES_BY_LOWER.get(
    +    query.controlType.trim().toLowerCase(),
    +  )
    +  if (!canonicalType) {
         return null // Invalid control type
       }
    -  const v = query.controlType.replace(/'/g, "''")
    +  const v = canonicalType.replace(/'/g, "''")
       conditions.push(
         `[System.Windows.Automation.PropertyCondition]::new([System.Windows.Automation.AutomationElement]::ControlTypeProperty, [System.Windows.Automation.ControlType]::${v})`,
       )
     }
Also applies to: 180-184
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/win32/uiAutomation.ts` around lines 20 - 60, The allowlist check is currently case-sensitive (VALID_CONTROL_TYPES) which rejects valid inputs like "button" or "menuitem"; normalize controlType before lookup by trimming and lowercasing it and compare against a derived lowercase set (e.g., build VALID_CONTROL_TYPES_LOWER from VALID_CONTROL_TYPES.map(s => s.toLowerCase())). Update every place where controlType is validated (references near the existing VALID_CONTROL_TYPES usage and the other occurrences around lines ~180-184) to use the normalized value (controlType?.trim().toLowerCase()) and lookup against VALID_CONTROL_TYPES_LOWER instead of the original set.
src/utils/computerUse/win32/shared.ts-68-70 (1)
68-70: ⚠️ Potential issue | 🟡 Minor
Keep modifier aliases in sync with `VK_MAP`.
`VK_MAP` accepts `menu` as Alt, but `MODIFIER_KEYS` doesn't, so combos using that alias will be treated as a normal key instead of a held modifier.
Suggested fix
     export const MODIFIER_KEYS = new Set([
       'shift',
       'lshift',
       'rshift',
       'control',
       'ctrl',
       'lcontrol',
       'rcontrol',
       'alt',
    +  'menu',
       'option',
       'lalt',
       'ralt',
       'win',
       'meta',
Also applies to: 110-127
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/win32/shared.ts` around lines 68 - 70, The MODIFIER_KEYS set is missing the 'menu' alias that VK_MAP treats as Alt, causing 'menu' combos to be treated as normal keys; update MODIFIER_KEYS in shared.ts to include the same alias entries present in VK_MAP (e.g., add 'menu' alongside 'alt'/'option') and mirror any other missing aliases between the MODIFIER_KEYS block and VK_MAP (see the MODIFIER_KEYS and VK_MAP symbols and the repeated alias ranges around the 68–70 and 110–127 areas) so modifier aliases stay in sync.
src/utils/computerUse/win32/accessibilitySnapshot.ts-170-196 (1)
170-196: ⚠️ Potential issue | 🟡 Minor
Escape control characters before formatting snapshot text.
Accessible names and values can contain quotes or newlines. Writing them verbatim here can split one node into multiple prompt lines and corrupt the model-facing format.
Suggested fix
     function formatForModel(
       nodes: AccessibilityNode[],
       indent: number = 0,
     ): string {
    @@
       for (const node of nodes) {
         let line = `${pad}[${node.role}]`
    -    if (node.name) line += ` "${truncate(node.name, 40)}"`
    +    if (node.name) line += ` "${escapeForModel(node.name, 40)}"`
         line += ` (${node.bounds.x},${node.bounds.y} ${node.bounds.w}x${node.bounds.h})`
         if (!node.enabled) line += ' DISABLED'
    -    if (node.value) line += ` value="${truncate(node.value, 30)}"`
    -    if (node.automationId) line += ` id=${node.automationId}`
    +    if (node.value) line += ` value="${escapeForModel(node.value, 30)}"`
    +    if (node.automationId)
    +      line += ` id="${escapeForModel(node.automationId, 40)}"`
         lines.push(line)
    @@
     }
    +function escapeForModel(s: string, max: number): string {
    +  return truncate(s.replace(/\r?\n/g, ' ').replace(/"/g, '\\"'), max)
    +}
    +
     function truncate(s: string, max: number): string {
       return s.length > max ? s.slice(0, max - 1) + '…' : s
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/win32/accessibilitySnapshot.ts` around lines 170 - 196, The formatted snapshot can be corrupted by unescaped control characters (quotes, newlines) in node.name and node.value; update formatForModel to escape control characters before embedding and truncate the escaped string rather than the raw string. Add or reuse a helper (e.g., escapeForSnapshot or extend truncate) and call it when inserting node.name and node.value in formatForModel so quotes become \" and newlines become \n (and other non-printables are escaped), then apply the max-length truncation to the escaped result to keep output lines intact for the model.
src/utils/computerUse/win32/virtualCursor.ts-30-31 (1)
30-31: ⚠️ Potential issue | 🟡 Minor
Escape apostrophes in the embedded stop-file path.
`$stopFile` is inserted into a single-quoted PowerShell string (line 79), so paths containing `'` will break the generated script. In PowerShell single-quoted strings, apostrophes must be escaped by doubling them (`''`).
Suggested fix
     function buildCursorScript(hwnd: string, stopFile: string): string {
    -  const stopFileEscaped = stopFile.replace(/\\/g, '\\\\')
    +  const stopFileEscaped = stopFile
    +    .replace(/'/g, "''")
    +    .replace(/\\/g, '\\\\')
       return `
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/win32/virtualCursor.ts` around lines 30 - 31, The code in buildCursorScript currently only escapes backslashes for stopFile (stopFileEscaped) but fails to escape single quotes, which will break the PowerShell single-quoted string; update buildCursorScript so after replacing backslashes you also replace any single quote characters in stopFile with two single quotes (i.e., stopFileEscaped = stopFileEscaped.replace(/'/g, "''")) before embedding it into the generated script to ensure PowerShell-safe single-quoted strings.
src/utils/computerUse/win32/comExcel.ts-32-34 (1)
32-34: ⚠️ Potential issue | 🟡 Minor
Error handling may silently swallow non-zero exit codes.
When `exitCode !== 0` but `stderr` is empty, no error is thrown and the function returns potentially invalid/empty stdout. This could lead to silent failures.
🛡️ Proposed fix
    -  if (result.exitCode !== 0 && stderr) {
    -    throw new Error(`PowerShell error: ${stderr}`)
    +  if (result.exitCode !== 0) {
    +    throw new Error(`PowerShell error (exit ${result.exitCode}): ${stderr || 'no stderr'}`)
       }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/win32/comExcel.ts` around lines 32 - 34, The current check only throws when exitCode !== 0 AND stderr is present, which lets non-zero exit codes slip through; update the error handling around the PowerShell execution in comExcel.ts to throw whenever result.exitCode !== 0, even if stderr is empty—include useful context (exitCode, stderr, and stdout) in the Error message so callers can diagnose failures; look for the block using result.exitCode, stderr, and stdout and change the condition to always throw on non-zero exitCode and format the error to contain exitCode, stderr (if any), and stdout.
src/utils/computerUse/win32/appDispatcher.ts-102-104 (1)
102-104: ⚠️ Potential issue | 🟡 Minor
Path fallback `/tmp` doesn't exist on Windows.
The fallback `'/tmp'` combined with `\\cu_new_...` creates an invalid path like `/tmp\cu_new_xxx.xlsx` on Windows. Use a Windows-appropriate fallback.
🐛 Fix path handling
    -  const tmpPath = `${process.env.TEMP || '/tmp'}\\cu_new_${Date.now()}.xlsx`
    +  const tmpDir = process.env.TEMP || process.env.TMP || 'C:\\Windows\\Temp'
    +  const tmpPath = `${tmpDir}\\cu_new_${Date.now()}.xlsx`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/win32/appDispatcher.ts` around lines 102 - 104, The code builds tmpPath using process.env.TEMP with a hardcoded '/tmp' fallback which produces invalid mixed separators on Windows; change tmp path creation in appDispatcher.ts to use the OS tmp directory (os.tmpdir() or process.env.TEMP/process.env.TMP when present) and build the filename with path.join/path.resolve to ensure correct separators, then pass that resulting tmpPath to createExcel and return it as before (refer to the tmpPath variable and createExcel call).
src/utils/computerUse/win32/appDispatcher.ts-92-99 (1)
92-99: ⚠️ Potential issue | 🟡 Minor
Error silently swallowed when opening existing Excel file.
The catch block returns the same result as success, making it impossible for callers to know if the file was actually opened. Consider propagating the error or returning a status flag.
💡 Consider adding success indicator
     export interface OpenResult {
       type: AppType
       /** HWND for text/browser/generic apps (SendMessage target) */
       hwnd?: string
       /** File path for COM-controlled apps (Excel/Word) */
       filePath?: string
    +  /** Whether the open operation succeeded */
    +  success?: boolean
     }
Then set `success: true` on successful open, `success: false` in catch.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/win32/appDispatcher.ts` around lines 92 - 99, The current branch handling existing files in the isExisting block hides errors because the catch returns the same result as success; update the logic in the isExisting branch (where openExcel(nameOrPath) is called) to either rethrow the caught error or return an object that includes a success flag (e.g., { type: 'excel', filePath: nameOrPath, success: false, error: err }) on failure and { type: 'excel', filePath: nameOrPath, success: true } on success so callers can distinguish open failures; refer to the openExcel call and the surrounding isExisting handling to implement this change.
src/utils/computerUse/platforms/linux.ts-153-154 (1)
153-154: ⚠️ Potential issue | 🟡 Minor
Hardcoded temp file paths may cause race conditions.
If multiple instances of the app run concurrently, they'll overwrite each other's temp files. Consider using unique filenames.
💡 Use unique temp files
    -const SCREENSHOT_TMP = '/tmp/cu-screenshot-tmp.png'
    -const SCREENSHOT_JPG = '/tmp/cu-screenshot.jpg'
    +function getTempPath(ext: string): string {
    +  return `/tmp/cu-screenshot-${process.pid}-${Date.now()}${ext}`
    +}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/platforms/linux.ts` around lines 153 - 154, The constants SCREENSHOT_TMP and SCREENSHOT_JPG use hardcoded paths which can collide across concurrent runs; update the code in linux.ts to generate unique temp filenames (for example using os.tmpdir()/fs.mkdtemp or appending process.pid + timestamp/UUID) and replace usages of SCREENSHOT_TMP/SCREENSHOT_JPG with the generated path variables so each invocation writes to its own temp file and cleans it up after use; ensure functions that call these constants (e.g., the screenshot capture/convert routines) accept or derive the unique path rather than relying on the hardcoded constants.
src/utils/computerUse/win32/appDispatcher.ts-51-57 (1)
51-57: ⚠️ Potential issue | 🟡 Minor
Overly broad app name matching may cause false positives.
`baseName.includes('word')` matches "password", "keyword", "wordpad", etc. Consider exact match or starts-with check.
💡 Tighter matching
    -  if (
    -    baseName === 'winword' ||
    -    baseName === 'word' ||
    -    baseName.includes('word')
    -  )
    -    return 'word'
    +  if (baseName === 'winword' || baseName === 'word' || baseName.startsWith('word'))
    +    return 'word'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/win32/appDispatcher.ts` around lines 51 - 57, The matching in appDispatcher.ts is too broad: baseName.includes('word') will match "password", "keyword", etc.; update the logic in the dispatcher (the block that checks baseName for 'winword'/'word') to use a stricter check—e.g., exact equality (baseName === 'word'), startsWith('word') for prefixes you intend (like 'wordpad' only if desired), or a word-boundary regex to match whole words only—replace the includes('word') condition with one of these tighter checks to avoid false positives.
packages/@ant/computer-use-mcp/src/tools.ts-459-477 (1)
459-477: ⚠️ Potential issue | 🟡 Minor
click_element schema allows calls with no element selector.
With `required: []`, the tool can be invoked without `name`, `role`, or `automationId`, which would make it impossible to locate the target element. Consider requiring at least one selector.
💡 Suggested fix
           automationId: {
             type: "string",
             description: "Exact automationId from the accessibility snapshot. Most precise selector.",
           },
         },
    -    required: [],
    +    required: [], // Consider: anyOf/oneOf to require at least one of name/role/automationId
       },
     } as Tool,
Alternatively, add validation in the handler to return a clear error if no selectors are provided.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/@ant/computer-use-mcp/src/tools.ts` around lines 459 - 477, The inputSchema for the tool (the object typed as Tool with key inputSchema) currently has required: [] which allows calling the tool without any selector (name, role, automationId); update the inputSchema to require at least one selector (e.g., make required include a discriminator or enforce oneOf semantics requiring name or role or automationId) or, if schema-level change is undesirable, add explicit validation in the click handler that inspects the parsed input and returns a clear error when all of name, role, and automationId are missing; reference the inputSchema object and the handler that processes the tool inputs to implement this guard.
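The handler-side guard mentioned above could look roughly like the sketch below. Everything here is an assumption for illustration: `hasElementSelector`, `validateClickElementArgs`, and `AT_LEAST_ONE_SELECTOR` are hypothetical names, and the real click_element handler is not shown in this review. The `anyOf` fragment is standard JSON Schema; whether the MCP client in use accepts it has not been verified here.

```typescript
// Hypothetical sketch, not the PR's handler code.
// Selector fields named in the review comment.
const SELECTOR_KEYS = ["name", "role", "automationId"] as const;

// True when at least one selector is a non-empty string.
function hasElementSelector(args: Record<string, unknown>): boolean {
  return SELECTOR_KEYS.some((key) => {
    const value = args[key];
    return typeof value === "string" && value.trim().length > 0;
  });
}

// Returns an error message for the caller, or null when the args are usable.
function validateClickElementArgs(args: Record<string, unknown>): string | null {
  return hasElementSelector(args)
    ? null
    : "click_element requires at least one of: name, role, automationId.";
}

// Schema-level alternative: a JSON Schema fragment that could sit next to
// `properties` in inputSchema to require at least one selector.
const AT_LEAST_ONE_SELECTOR = {
  anyOf: SELECTOR_KEYS.map((key) => ({ required: [key] })),
};
```

A runtime guard gives the model a readable error even when the client does not enforce the schema, so the two approaches can coexist.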
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 87a62de6-945a-49c3-8950-9201057f4e64
📒 Files selected for processing (36)
- .gitignore
- DEV-LOG.md
- build.ts
- docs/features/computer-use-architecture-v2.md
- docs/features/computer-use-tools-reference.md
- docs/features/computer-use.md
- packages/@ant/computer-use-input/src/index.ts
- packages/@ant/computer-use-mcp/src/executor.ts
- packages/@ant/computer-use-mcp/src/toolCalls.ts
- packages/@ant/computer-use-mcp/src/tools.ts
- packages/@ant/computer-use-swift/src/backends/darwin.ts
- packages/@ant/computer-use-swift/src/index.ts
- src/utils/computerUse/common.ts
- src/utils/computerUse/executor.ts
- src/utils/computerUse/executorCrossPlatform.ts
- src/utils/computerUse/hostAdapter.ts
- src/utils/computerUse/platforms/darwin.ts
- src/utils/computerUse/platforms/index.ts
- src/utils/computerUse/platforms/linux.ts
- src/utils/computerUse/platforms/types.ts
- src/utils/computerUse/platforms/win32.ts
- src/utils/computerUse/swiftLoader.ts
- src/utils/computerUse/win32/accessibilitySnapshot.ts
- src/utils/computerUse/win32/appDispatcher.ts
- src/utils/computerUse/win32/bridge.py
- src/utils/computerUse/win32/bridgeClient.ts
- src/utils/computerUse/win32/comExcel.ts
- src/utils/computerUse/win32/comWord.ts
- src/utils/computerUse/win32/inputIndicator.ts
- src/utils/computerUse/win32/ocr.ts
- src/utils/computerUse/win32/shared.ts
- src/utils/computerUse/win32/uiAutomation.ts
- src/utils/computerUse/win32/virtualCursor.ts
- src/utils/computerUse/win32/windowBorder.ts
- src/utils/computerUse/win32/windowEnum.ts
- src/utils/computerUse/win32/windowMessage.ts
| // Windows/Linux: operations go through SendMessage (HWND-bound) or platform | ||
| // abstraction, not global input to the foreground. The frontmost gate is a | ||
| // macOS safety net for global CGEvent input — on other platforms, skip it | ||
| // when the platform's screenshotFiltering is 'none' (no per-app filtering, | ||
| // meaning no hide/defocus, meaning frontmost is meaningless). | ||
| if (adapter.executor.capabilities.screenshotFiltering === 'none') { | ||
| return null; // pass — non-macOS platform, frontmost irrelevant | ||
| } |
Don't short-circuit the whole gate stack on screenshotFiltering === "none".
Line 442 returns before the allowlist/tier/clipboard logic runs, and Line 577 skips the hit-test entirely. On any executor advertising "none", mouse actions now bypass request_access enforcement instead of just bypassing the macOS-specific frontmost checks. Keep the grant checks, and special-case only the screen-based checks that truly do not apply.
Also applies to: 573-578
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/@ant/computer-use-mcp/src/toolCalls.ts` around lines 437 - 444, The
early return on adapter.executor.capabilities.screenshotFiltering === 'none'
incorrectly short-circuits the entire request_access flow; instead, remove the
return and special-case only the macOS frontmost/screen-based checks so that
allowlist/tier/clipboard grant checks and the hit-test still run. Concretely,
update the logic around request_access and the frontmost-gate to: if
screenshotFiltering === 'none' skip only the platform/frontmost-specific branch
(the code that checks frontmost window or screen hit-test), but always execute
the generic grant checks (allowlist, tier, clipboard) and invoke the
hit-test/target resolution code paths used by request_access so enforcement
still applies for executors advertising 'none'. Ensure references to
adapter.executor.capabilities.screenshotFiltering, request_access, the
allowlist/tier/clipboard checks, and the hit-test/target-resolution are the
places you modify.
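One possible shape for the separation this comment asks for is sketched below. All names (`GateContext`, `GateFailure`, `runInputGates`) are hypothetical stand-ins, the real gate code in toolCalls.ts is not reproduced in this review, and the two boolean fields only summarize checks whose actual logic lives elsewhere.

```typescript
// Hypothetical sketch, not the code in toolCalls.ts.
interface GateContext {
  // Executor capability; "none" is the only value named in the review.
  screenshotFiltering: string;
  // Summary of the generic grant checks (allowlist / tier / clipboard).
  targetAppGranted: boolean;
  // Summary of the macOS-style frontmost / screen hit-test check.
  frontmostAppAllowed: boolean;
}

type GateFailure = "not_granted" | "frontmost_not_allowed";

// Returns null when the action may proceed, otherwise the failing gate.
function runInputGates(ctx: GateContext): GateFailure | null {
  // Grant checks always run, regardless of platform capabilities.
  if (!ctx.targetAppGranted) return "not_granted";

  // Only the frontmost check is skipped when there is no per-app filtering.
  const frontmostApplies = ctx.screenshotFiltering !== "none";
  if (frontmostApplies && !ctx.frontmostAppAllowed) return "frontmost_not_allowed";

  return null;
}
```

The point of the ordering is that an executor advertising `"none"` skips only the second check, so `request_access` enforcement still applies.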
| async function handleVirtualMouse( | ||
| adapter: ComputerUseHostAdapter, | ||
| args: Record<string, unknown>, | ||
| ): Promise<CuCallToolResult> { | ||
| if (!adapter.executor.virtualMouse) { | ||
| return errorResult("virtual_mouse is only available on Windows with a bound window.", "feature_unavailable"); | ||
| } | ||
| const action = requireString(args, "action"); | ||
| if (action instanceof Error) return errorResult(action.message, "bad_args"); | ||
| const coord = args.coordinate; | ||
| if (!Array.isArray(coord) || coord.length < 2) { | ||
| return errorResult("coordinate [x, y] is required.", "bad_args"); | ||
| } | ||
| const validActions = new Set(["click", "double_click", "right_click", "move", "drag", "down", "up"]); | ||
| if (!validActions.has(action)) { | ||
| return errorResult(`Invalid action "${action}". Valid: ${[...validActions].join(", ")}`, "bad_args"); | ||
| } | ||
| const startCoord = Array.isArray(args.start_coordinate) ? args.start_coordinate : undefined; | ||
| const ok = await adapter.executor.virtualMouse({ | ||
| action: action as any, | ||
| x: coord[0], y: coord[1], | ||
| startX: startCoord?.[0], startY: startCoord?.[1], | ||
| }); | ||
| if (!ok) { | ||
| return errorResult("No window is currently bound.", "bad_args"); | ||
| } | ||
| const desc: Record<string, string> = { | ||
| click: `Click at (${coord[0]},${coord[1]})`, | ||
| double_click: `Double-click at (${coord[0]},${coord[1]})`, | ||
| right_click: `Right-click at (${coord[0]},${coord[1]})`, | ||
| move: `Moved to (${coord[0]},${coord[1]})`, | ||
| drag: `Dragged ${startCoord ? `(${startCoord[0]},${startCoord[1]})` : "current"} → (${coord[0]},${coord[1]})`, | ||
| down: `Button down at (${coord[0]},${coord[1]})`, | ||
| up: `Button up at (${coord[0]},${coord[1]})`, | ||
| }; | ||
| return okText(desc[action] ?? action); | ||
| } | ||
|
|
||
| async function handleVirtualKeyboard( | ||
| adapter: ComputerUseHostAdapter, | ||
| args: Record<string, unknown>, | ||
| ): Promise<CuCallToolResult> { | ||
| if (!adapter.executor.virtualKeyboard) { | ||
| return errorResult("virtual_keyboard is only available on Windows with a bound window.", "feature_unavailable"); | ||
| } | ||
| const action = requireString(args, "action"); | ||
| if (action instanceof Error) return errorResult(action.message, "bad_args"); | ||
| const text = requireString(args, "text"); | ||
| if (text instanceof Error) return errorResult(text.message, "bad_args"); | ||
|
|
||
| const validActions = new Set(["type", "combo", "press", "release", "hold"]); | ||
| if (!validActions.has(action)) { | ||
| return errorResult(`Invalid action "${action}". Valid: ${[...validActions].join(", ")}`, "bad_args"); | ||
| } | ||
|
|
||
| const duration = typeof args.duration === "number" ? args.duration : undefined; | ||
| const repeat = typeof args.repeat === "number" ? args.repeat : undefined; | ||
|
|
||
| const ok = await adapter.executor.virtualKeyboard({ | ||
| action: action as any, | ||
| text, | ||
| duration, | ||
| repeat, | ||
| }); | ||
|
|
||
| if (!ok) { | ||
| return errorResult("No window is currently bound. Use open_application or bind_window first.", "bad_args"); | ||
| } | ||
|
|
||
| const desc: Record<string, string> = { | ||
| type: `Typed "${text.length > 40 ? text.slice(0, 40) + "..." : text}"`, | ||
| combo: `Sent ${text}`, | ||
| press: `Pressed ${text} (holding)`, | ||
| release: `Released ${text}`, | ||
| hold: `Held ${text} for ${duration ?? 1}s`, | ||
| }; | ||
|
|
||
| return okText(`${desc[action]}${repeat && repeat > 1 ? ` ×${repeat}` : ""}`); | ||
| } | ||
|
|
||
| async function handleStatusIndicator( | ||
| adapter: ComputerUseHostAdapter, | ||
| args: Record<string, unknown>, | ||
| ): Promise<CuCallToolResult> { | ||
| if (!adapter.executor.statusIndicator) { | ||
| return errorResult("status_indicator is only available on Windows.", "feature_unavailable"); | ||
| } | ||
| const action = requireString(args, "action"); | ||
| if (action instanceof Error) return errorResult(action.message, "bad_args"); | ||
| if (!["show", "hide", "status"].includes(action)) { | ||
| return errorResult(`Invalid action "${action}". Valid: show, hide, status.`, "bad_args"); | ||
| } | ||
| const message = typeof args.message === "string" ? args.message : undefined; | ||
| if (action === "show" && !message) { | ||
| return errorResult("'show' requires a message parameter.", "bad_args"); | ||
| } | ||
| const result = await adapter.executor.statusIndicator(action as any, message); | ||
| if (action === "status") { | ||
| return okText(result.active ? "Indicator is active on the bound window." : "Indicator is not active (no window bound)."); | ||
| } | ||
| if (action === "show") { | ||
| return okText(`Indicator showing: "${message}"`); | ||
| } | ||
| return okText("Indicator hidden."); | ||
| } | ||
|
|
||
| async function handleMouseWheel( | ||
| adapter: ComputerUseHostAdapter, | ||
| args: Record<string, unknown>, | ||
| ): Promise<CuCallToolResult> { | ||
| if (!adapter.executor.mouseWheel) { | ||
| return errorResult("mouse_wheel is only available on Windows with a bound window.", "feature_unavailable"); | ||
| } | ||
| const coord = args.coordinate; | ||
| if (!Array.isArray(coord) || coord.length < 2) { | ||
| return errorResult("coordinate must be [x, y] array.", "bad_args"); | ||
| } | ||
| const delta = typeof args.delta === "number" ? args.delta : undefined; | ||
| if (delta === undefined) { | ||
| return errorResult("delta is required (positive=up, negative=down).", "bad_args"); | ||
| } | ||
| const horizontal = args.direction === "horizontal"; | ||
| const ok = await adapter.executor.mouseWheel(coord[0], coord[1], delta, horizontal); | ||
| if (!ok) { | ||
| return errorResult("No window is currently bound. Use open_application or bind_window first.", "bad_args"); | ||
| } | ||
| return okText( | ||
| `Mouse wheel: ${horizontal ? "horizontal" : "vertical"} scroll ${delta > 0 ? "up" : "down"} ${Math.abs(delta)} click(s) at (${coord[0]},${coord[1]}).`, | ||
| ); | ||
| } | ||
|
|
||
| async function handleActivateWindow( | ||
| adapter: ComputerUseHostAdapter, | ||
| args: Record<string, unknown>, | ||
| ): Promise<CuCallToolResult> { | ||
| if (!adapter.executor.activateWindow) { | ||
| return errorResult("activate_window is only available on Windows with a bound window.", "feature_unavailable"); | ||
| } | ||
| const clickX = typeof args.click_x === "number" ? args.click_x : undefined; | ||
| const clickY = typeof args.click_y === "number" ? args.click_y : undefined; | ||
| const ok = await adapter.executor.activateWindow(clickX, clickY); | ||
| if (!ok) { | ||
| return errorResult("No window is currently bound. Use open_application or bind_window first.", "bad_args"); | ||
| } | ||
| return okText("Window activated and focused. Ready for input."); | ||
| } | ||
|
|
||
| async function handlePromptRespond( | ||
| adapter: ComputerUseHostAdapter, | ||
| args: Record<string, unknown>, | ||
| ): Promise<CuCallToolResult> { | ||
| if (!adapter.executor.respondToPrompt) { | ||
| return errorResult("prompt_respond is only available on Windows with a bound window.", "feature_unavailable"); | ||
| } | ||
| const responseType = requireString(args, "response_type"); | ||
| if (responseType instanceof Error) return errorResult(responseType.message, "bad_args"); | ||
|
|
||
| const validTypes = new Set(["yes", "no", "enter", "escape", "select", "type"]); | ||
| if (!validTypes.has(responseType)) { | ||
| return errorResult(`Invalid response_type "${responseType}". Valid: ${[...validTypes].join(", ")}`, "bad_args"); | ||
| } | ||
|
|
||
| if (responseType === "select" && typeof args.arrow_count !== "number") { | ||
| return errorResult("'select' requires arrow_count parameter.", "bad_args"); | ||
| } | ||
| if (responseType === "type" && typeof args.text !== "string") { | ||
| return errorResult("'type' requires text parameter.", "bad_args"); | ||
| } | ||
|
|
||
| const ok = await adapter.executor.respondToPrompt({ | ||
| responseType: responseType as any, | ||
| arrowDirection: typeof args.arrow_direction === "string" ? args.arrow_direction as any : undefined, | ||
| arrowCount: typeof args.arrow_count === "number" ? args.arrow_count : undefined, | ||
| text: typeof args.text === "string" ? args.text : undefined, | ||
| }); | ||
|
|
||
| if (!ok) { | ||
| return errorResult("No window is currently bound. Use open_application or bind_window first.", "bad_args"); | ||
| } | ||
|
|
||
| const descriptions: Record<string, string> = { | ||
| yes: "Sent 'y' + Enter.", | ||
| no: "Sent 'n' + Enter.", | ||
| enter: "Sent Enter.", | ||
| escape: "Sent Escape.", | ||
| select: `Navigated ${args.arrow_direction ?? "down"} ${args.arrow_count ?? 1} time(s) + Enter.`, | ||
| type: `Typed "${args.text}" + Enter.`, | ||
| }; | ||
|
|
||
| return okText(`Prompt responded: ${descriptions[responseType] ?? responseType}. Take a screenshot to verify.`); | ||
| } | ||
|
|
||
| async function handleOpenTerminal( | ||
| adapter: ComputerUseHostAdapter, | ||
| args: Record<string, unknown>, | ||
| ): Promise<CuCallToolResult> { | ||
| if (!adapter.executor.openTerminal) { | ||
| return errorResult("open_terminal is only available on Windows.", "feature_unavailable"); | ||
| } | ||
| const agent = requireString(args, "agent"); | ||
| if (agent instanceof Error) return errorResult(agent.message, "bad_args"); | ||
|
|
||
| const validAgents = new Set(["claude", "codex", "gemini", "custom"]); | ||
| if (!validAgents.has(agent)) { | ||
| return errorResult(`Invalid agent "${agent}". Valid: claude, codex, gemini, custom.`, "bad_args"); | ||
| } | ||
| if (agent === "custom" && typeof args.command !== "string") { | ||
| return errorResult("agent='custom' requires 'command' parameter.", "bad_args"); | ||
| } | ||
|
|
||
| const result = await adapter.executor.openTerminal({ | ||
| agent: agent as any, | ||
| command: typeof args.command === "string" ? args.command : undefined, | ||
| terminal: typeof args.terminal === "string" ? args.terminal as any : undefined, | ||
| workingDirectory: typeof args.working_directory === "string" ? args.working_directory : undefined, | ||
| }); | ||
|
|
||
| if (!result) { | ||
| return errorResult( | ||
| "Failed to open terminal. Windows Terminal (wt.exe) may not be installed.", | ||
| "launch_failed", | ||
| ); | ||
| } | ||
|
|
||
| if (!result.launched) { | ||
| return okText( | ||
| `Terminal opened (hwnd=${result.hwnd}, "${result.title}") but no command was sent. Window is now bound.`, | ||
| ); | ||
| } | ||
|
|
||
| const agentNames: Record<string, string> = { | ||
| claude: "Claude Code", codex: "Codex", gemini: "Gemini", | ||
| custom: args.command as string, | ||
| }; | ||
|
|
||
| return okText( | ||
| `Terminal opened and ${agentNames[agent] ?? agent} launched.\n` + | ||
| `Window: hwnd=${result.hwnd} "${result.title}"\n` + | ||
| `Command: '${agent === "custom" ? args.command : agent}' + Enter\n` + | ||
| `Status: bound to this terminal. Take a screenshot to verify the agent started.`, | ||
| ); | ||
| } | ||
|
|
||
| async function handleBindWindow( | ||
| adapter: ComputerUseHostAdapter, | ||
| args: Record<string, unknown>, | ||
| ): Promise<CuCallToolResult> { | ||
| const action = requireString(args, "action"); | ||
| if (action instanceof Error) return errorResult(action.message, "bad_args"); | ||
|
|
||
| switch (action) { | ||
| case "list": { | ||
| if (!adapter.executor.listVisibleWindows) { | ||
| return errorResult("bind_window is only available on Windows.", "feature_unavailable"); | ||
| } | ||
| const windows = await adapter.executor.listVisibleWindows(); | ||
| if (windows.length === 0) return okText("No visible windows found."); | ||
| const lines = windows.map( | ||
| (w) => `hwnd=${w.hwnd} pid=${w.pid} "${w.title}"`, | ||
| ); | ||
| return okText(`Visible windows (${windows.length}):\n${lines.join("\n")}`); | ||
| } | ||
| case "status": { | ||
| if (!adapter.executor.getBindingStatus) { | ||
| return errorResult("bind_window is only available on Windows.", "feature_unavailable"); | ||
| } | ||
| const status = await adapter.executor.getBindingStatus(); | ||
| if (!status || !status.bound) { | ||
| return okText("No window is currently bound. Use bind_window(action='list') to see available windows, then bind_window(action='bind', title='...') to bind."); | ||
| } | ||
| let text = `Bound to: hwnd=${status.hwnd}`; | ||
| if (status.title) text += ` "${status.title}"`; | ||
| if (status.pid) text += ` pid=${status.pid}`; | ||
| if (status.rect) text += ` rect=(${status.rect.x},${status.rect.y} ${status.rect.width}x${status.rect.height})`; | ||
| return okText(text); | ||
| } | ||
| case "bind": { | ||
| if (!adapter.executor.bindToWindow) { | ||
| return errorResult("bind_window is only available on Windows.", "feature_unavailable"); | ||
| } | ||
| const title = typeof args.title === "string" ? args.title : undefined; | ||
| const hwnd = typeof args.hwnd === "string" ? args.hwnd : undefined; | ||
| const pid = typeof args.pid === "number" ? args.pid : undefined; | ||
| if (!title && !hwnd && !pid) { | ||
| return errorResult("Specify at least one of: title, hwnd, or pid.", "bad_args"); | ||
| } | ||
| const result = await adapter.executor.bindToWindow({ hwnd, title, pid }); | ||
| if (!result) { | ||
| return errorResult( | ||
| `No window found matching: ${[title && `title="${title}"`, hwnd && `hwnd=${hwnd}`, pid && `pid=${pid}`].filter(Boolean).join(", ")}. Use bind_window(action='list') to see available windows.`, | ||
| "element_not_found", | ||
| ); | ||
| } | ||
| return okText(`Bound to window: hwnd=${result.hwnd} pid=${result.pid} "${result.title}". All subsequent screenshot/click/type operations target this window.`); | ||
| } | ||
| case "unbind": { | ||
| if (!adapter.executor.unbindFromWindow) { | ||
| return errorResult("bind_window is only available on Windows.", "feature_unavailable"); | ||
| } | ||
| await adapter.executor.unbindFromWindow(); | ||
| return okText("Window binding released. Operations now target the full screen."); | ||
| } | ||
| default: | ||
| return errorResult(`Unknown bind_window action "${action}". Valid: list, bind, unbind, status.`, "bad_args"); | ||
| } | ||
| } | ||
|
|
||
| async function handleClickElement( | ||
| adapter: ComputerUseHostAdapter, | ||
| args: Record<string, unknown>, | ||
| ): Promise<CuCallToolResult> { | ||
| if (!adapter.executor.clickElement) { | ||
| return errorResult( | ||
| "click_element is only available on Windows with a bound window.", | ||
| "feature_unavailable", | ||
| ); | ||
| } | ||
| const name = typeof args.name === "string" ? args.name : undefined; | ||
| const role = typeof args.role === "string" ? args.role : undefined; | ||
| const automationId = typeof args.automationId === "string" ? args.automationId : undefined; | ||
| if (!name && !role && !automationId) { | ||
| return errorResult("At least one of name, role, or automationId is required.", "bad_args"); | ||
| } | ||
| const ok = await adapter.executor.clickElement({ name, role, automationId }); | ||
| if (!ok) { | ||
| return errorResult( | ||
| `Element not found: ${[name && `name="${name}"`, role && `role=${role}`, automationId && `id=${automationId}`].filter(Boolean).join(", ")}. Take a screenshot to see current GUI elements.`, | ||
| "element_not_found", | ||
| ); | ||
| } | ||
| return okText(`Clicked element: ${[name && `"${name}"`, role, automationId].filter(Boolean).join(" ")}`); | ||
| } | ||
|
|
||
| async function handleTypeIntoElement( | ||
| adapter: ComputerUseHostAdapter, | ||
| args: Record<string, unknown>, | ||
| ): Promise<CuCallToolResult> { | ||
| if (!adapter.executor.typeIntoElement) { | ||
| return errorResult( | ||
| "type_into_element is only available on Windows with a bound window.", | ||
| "feature_unavailable", | ||
| ); | ||
| } | ||
| const text = requireString(args, "text"); | ||
| if (text instanceof Error) return errorResult(text.message, "bad_args"); | ||
| const name = typeof args.name === "string" ? args.name : undefined; | ||
| const role = typeof args.role === "string" ? args.role : undefined; | ||
| const automationId = typeof args.automationId === "string" ? args.automationId : undefined; | ||
| const ok = await adapter.executor.typeIntoElement({ name, role, automationId }, text); | ||
| if (!ok) { | ||
| return errorResult( | ||
| `Could not type into element: ${[name && `name="${name}"`, role && `role=${role}`, automationId && `id=${automationId}`].filter(Boolean).join(", ")}. The element was not found or doesn't support text input.`, | ||
| "element_not_found", | ||
| ); | ||
| } | ||
| return okText(`Typed ${text.length} chars into: ${[name && `"${name}"`, role, automationId].filter(Boolean).join(" ")}`); | ||
| } | ||
|
|
||
| async function handleWindowManagement( | ||
| adapter: ComputerUseHostAdapter, | ||
| args: Record<string, unknown>, | ||
| ): Promise<CuCallToolResult> { | ||
| const action = requireString(args, "action"); | ||
| if (action instanceof Error) return errorResult(action.message, "bad_args"); | ||
|
|
||
| const VALID_ACTIONS = new Set([ | ||
| "minimize", "maximize", "restore", "close", "focus", "move_offscreen", "move_resize", "get_rect", | ||
| ]); | ||
| if (!VALID_ACTIONS.has(action)) { | ||
| return errorResult( | ||
| `Unknown window_management action "${action}". Valid: ${[...VALID_ACTIONS].join(", ")}`, | ||
| "bad_args", | ||
| ); | ||
| } | ||
|
|
||
| if (!adapter.executor.manageWindow) { | ||
| return errorResult( | ||
| "window_management is only available on Windows with a bound window.", | ||
| "feature_unavailable", | ||
| ); | ||
| } | ||
|
|
||
| // get_rect: just return the current window position and size | ||
| if (action === "get_rect") { | ||
| if (!adapter.executor.getWindowRect) { | ||
| return errorResult("getWindowRect not available.", "feature_unavailable"); | ||
| } | ||
| const rect = await adapter.executor.getWindowRect(); | ||
| if (!rect) { | ||
| return errorResult("No window is currently bound. Call open_application first.", "bad_args"); | ||
| } | ||
| return okText( | ||
| `Window rect: x=${rect.x}, y=${rect.y}, width=${rect.width}, height=${rect.height}`, | ||
| ); | ||
| } | ||
|
|
||
| // move_resize: requires x, y (width/height optional) | ||
| if (action === "move_resize") { | ||
| const x = typeof args.x === "number" ? args.x : undefined; | ||
| const y = typeof args.y === "number" ? args.y : undefined; | ||
| if (x === undefined || y === undefined) { | ||
| return errorResult("move_resize requires x and y parameters.", "bad_args"); | ||
| } | ||
| const width = typeof args.width === "number" ? args.width : undefined; | ||
| const height = typeof args.height === "number" ? args.height : undefined; | ||
| const ok = await adapter.executor.manageWindow(action, { x, y, width, height }); | ||
| if (!ok) { | ||
| return errorResult("No window is currently bound. Call open_application first.", "bad_args"); | ||
| } | ||
| return okText( | ||
| width && height | ||
| ? `Moved window to (${x}, ${y}) and resized to ${width}×${height}.` | ||
| : `Moved window to (${x}, ${y}).`, | ||
| ); | ||
| } | ||
|
|
||
| // All other actions: minimize, maximize, restore, close, focus, move_offscreen | ||
| const ok = await adapter.executor.manageWindow(action); | ||
| if (!ok) { | ||
| return errorResult( | ||
| "No window is currently bound. Call open_application first.", | ||
| "bad_args", | ||
| ); | ||
| } | ||
|
|
||
| const descriptions: Record<string, string> = { | ||
| minimize: "Window minimized (ShowWindow SW_MINIMIZE).", | ||
| maximize: "Window maximized (ShowWindow SW_MAXIMIZE).", | ||
| restore: "Window restored (ShowWindow SW_RESTORE).", | ||
| close: "Window closed (SendMessage WM_CLOSE). The window binding has been released.", | ||
| focus: "Window brought to front (SetForegroundWindow).", | ||
| move_offscreen: "Window moved offscreen (-32000,-32000). Still usable via SendMessage/PrintWindow.", | ||
| }; | ||
|
|
||
| return okText(descriptions[action] ?? `Action "${action}" completed.`); | ||
| } |
The bound-window tools bypass the approval and shortcut model.
These handlers never consult overrides/subGates, and the auto-routing block returns into them before runInputActionGates, runHitTestGate, or isSystemKeyCombo. After bind_window or open_terminal, the model can bind an arbitrary HWND and then click/type/send blocked shortcuts without a matching request_access grant or tier check.
Also applies to: 3843-3889
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/@ant/computer-use-mcp/src/toolCalls.ts` around lines 2835 - 3270,
The handlers allow binding/opening a window and then performing privileged input
actions without consulting approval gates; fix by enforcing the same
gate/override checks before creating a binding and before executing input
actions. Specifically: in handleBindWindow and handleOpenTerminal, consult the
existing overrides/subGates logic and require the appropriate
request_access/tier grant (or call the same gate helpers) before accepting a
bind or marking the window as bound; in each input/action handler
(handleVirtualMouse, handleVirtualKeyboard, handleClickElement,
handleTypeIntoElement, handleMouseWheel, handleActivateWindow,
handlePromptRespond, handleWindowManagement, etc.) call the shared gate
functions (runInputActionGates, runHitTestGate, isSystemKeyCombo or equivalent)
and respect overrides/subGates before calling adapter.executor.* so that
shortcuts and privileged actions are blocked unless explicitly allowed. Ensure
you reference and reuse the same gate helper functions used elsewhere so
behavior is consistent across bind/open and subsequent actions.
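One way to read the suggestion above is as a wrapper that runs every gate before a handler reaches the executor. The following is a minimal sketch only: `Gate`, `withGates`, `requireGrant`, and the `grantedHwnds` set are hypothetical names invented for illustration, and the real helpers (`runInputActionGates`, `runHitTestGate`, `isSystemKeyCombo`) are not shown in this review, so their signatures are not assumed here. The sketch is synchronous for brevity, whereas the real handlers are async.

```typescript
// Hypothetical result and gate shapes; the real types live in toolCalls.ts.
type ToolResult = { ok: boolean; text: string; kind?: string };
// A gate returns a denial result, or null when the call may proceed.
type Gate = (toolName: string, args: Record<string, unknown>) => ToolResult | null;
type Handler = (args: Record<string, unknown>) => ToolResult;

// Wrap a handler so that every gate runs before the executor is touched.
function withGates(toolName: string, gates: Gate[], handler: Handler): Handler {
  return (args) => {
    for (const gate of gates) {
      const denied = gate(toolName, args);
      if (denied) return denied; // the first failing gate short-circuits
    }
    return handler(args);
  };
}

// Illustrative gate: refuse input unless a grant covers the target window.
const grantedHwnds = new Set<string>(["0x1234"]);
const requireGrant: Gate = (_toolName, args) =>
  typeof args.hwnd === "string" && grantedHwnds.has(args.hwnd)
    ? null
    : { ok: false, text: "No request_access grant for this window.", kind: "bad_args" };

// A stand-in for handleClickElement, now unreachable without passing the gate.
const gatedClickElement = withGates("click_element", [requireGrant], (args) => ({
  ok: true,
  text: `Clicked element in ${String(args.hwnd)}`,
}));
```

Applying the same wrapper to the bind/open handlers and to every input handler would keep the checks in one place, which is the consistency the comment asks for.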
| return errorResult( | ||
| "Failed to open terminal. Windows Terminal (wt.exe) may not be installed.", | ||
| "launch_failed", | ||
| ); |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -n 'type CuErrorKind|launch_failed|element_not_found' \
packages/@ant/computer-use-mcp/src/toolCalls.ts
Repository: claude-code-best/claude-code
Length of output: 228
🏁 Script executed:
# Read the CuErrorKind type definition (lines 75-91)
sed -n '75,91p' packages/@ant/computer-use-mcp/src/toolCalls.ts
# Also read the error branches at the specified locations
echo "--- Lines 3053-3056 ---"
sed -n '3053,3056p' packages/@ant/computer-use-mcp/src/toolCalls.ts
echo "--- Lines 3123-3125 ---"
sed -n '3123,3125p' packages/@ant/computer-use-mcp/src/toolCalls.ts
echo "--- Lines 3160-3162 ---"
sed -n '3160,3162p' packages/@ant/computer-use-mcp/src/toolCalls.ts
echo "--- Lines 3185-3187 ---"
sed -n '3185,3187p' packages/@ant/computer-use-mcp/src/toolCalls.ts
Repository: claude-code-best/claude-code
Length of output: 1936
Use CuErrorKind values that exist in the union.
The code passes "launch_failed" at line 3055 and "element_not_found" at lines 3125, 3162, and 3187, but CuErrorKind defines neither of these. Update the union type definition or replace these with valid error kinds from the CuErrorKind union.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/@ant/computer-use-mcp/src/toolCalls.ts` around lines 3053 - 3056,
The code is returning errorResult with invalid CuErrorKind values
("launch_failed" and "element_not_found"); update those calls to use one of the
existing CuErrorKind union members (or add the missing members to the
CuErrorKind type). Locate the errorResult invocations in toolCalls.ts (e.g., the
return wrapping the terminal open failure and the other occurrences) and replace
"launch_failed" and "element_not_found" with a valid CuErrorKind constant (or
extend the CuErrorKind union declaration to include these exact names) so the
returned kind matches the defined union.
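If the union is extended rather than the call sites changed, the result might look like the sketch below. The member list is an assumption: only `bad_args` and `feature_unavailable` are visible in the handlers above, the real `CuErrorKind` has more members, and `CuErrorResult` is a hypothetical shape used to keep the example self-contained.

```typescript
// Assumed subset of the union; the real CuErrorKind in toolCalls.ts is larger.
type CuErrorKind =
  | "bad_args"
  | "feature_unavailable"
  | "launch_failed" // added: open_terminal could not start a terminal
  | "element_not_found"; // added: no matching window or UI element

// Hypothetical result shape, for illustration only.
interface CuErrorResult {
  isError: true;
  kind: CuErrorKind;
  text: string;
}

function errorResult(text: string, kind: CuErrorKind): CuErrorResult {
  return { isError: true, kind, text };
}

// Type-checks only because "launch_failed" is now a member of the union.
const launchError = errorResult(
  "Failed to open terminal. Windows Terminal (wt.exe) may not be installed.",
  "launch_failed",
);
```

Keeping the parameter typed as `CuErrorKind` (rather than `string`) is what lets the compiler catch a misspelled or undeclared kind at the call site.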
| return result.split('\n').filter(Boolean).map(line => { | ||
| const [path, name] = line.split('|', 2) | ||
| const displayName = (name ?? '').replace(/\.app$/, '') | ||
| return { | ||
| - bundleId: bundleId ?? '', | ||
| - displayName: displayName ?? '', | ||
| + bundleId: `com.app.${displayName.toLowerCase().replace(/\s+/g, '-')}`, | ||
| + displayName, | ||
| path: path ?? '', | ||
| } | ||
| }) |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for usages of listInstalled() and bundleId from installed apps
rg -n 'listInstalled|\.bundleId' --type=ts -C3 packages/@ant/computer-use src/utils/computerUse
Repository: claude-code-best/claude-code
Length of output: 8822
🏁 Script executed:
# Search for how bundleId is used in opening apps or matching against known IDs
rg -n 'apps\.open|tell application id|ALWAYS_KEEP_BUNDLE_IDS|bundleId.*open' --type=ts --type=tsx -C2 src/utils/computerUse/
Repository: claude-code-best/claude-code
Length of output: 99
🏁 Script executed:
# Check the ALWAYS_KEEP_BUNDLE_IDS definition to understand what apps are expected
rg -n 'ALWAYS_KEEP_BUNDLE_IDS|const ALWAYS_KEEP' --type=ts -A 5 src/utils/computerUse/
Repository: claude-code-best/claude-code
Length of output: 909
🏁 Script executed:
# Search for app opening logic in executor to see if bundleId is used with "tell application id"
rg -n 'open.*bundleId|tell application id|launch.*bundle' --type=ts -C3 src/utils/computerUse/
Repository: claude-code-best/claude-code
Length of output: 1140
🏁 Script executed:
# Check the Swift backend file for apps.open() implementation
rg -n 'func open|apps\.open' packages/@ant/computer-use-swift/src/backends/ --type=ts -A 5
Repository: claude-code-best/claude-code
Length of output: 54
🏁 Script executed:
# Also check if there's any bundleId matching logic in the Swift backend
rg -n 'bundleId|tell application id' packages/@ant/computer-use-swift/ --type=ts -C 2
Repository: claude-code-best/claude-code
Length of output: 13634
Synthetic bundle IDs will break app launching and permission filtering.
The implementation returns synthetic IDs like `com.app.safari` instead of real macOS bundle IDs like `com.apple.Safari`. This breaks:
- App opening: `open(bundleId)` executes `tell application id "${bundleId}" to activate`, which will fail with a non-existent bundle ID.
- Permission grants: The app filter at `appNames.ts:177` checks `ALWAYS_KEEP_BUNDLE_IDS.has(app.bundleId)` against real IDs (`com.apple.Safari`, `com.google.Chrome`, etc.), which will never match the synthetic format.
Meanwhile, `listRunning()` retrieves real bundle IDs via `app.bundleIdentifier()`, creating an inconsistency. The synthetic ID approach from AppleScript output parsing is unsuitable for use cases requiring actual bundle IDs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/@ant/computer-use-swift/src/backends/darwin.ts` around lines 173 -
181, The mapping in darwin.ts that builds synthetic bundleId values must be
replaced so bundleId is the app's real CFBundleIdentifier (not `com.app.*`);
change the mapper that returns { bundleId, displayName, path } to query the app
at path (the `path` value returned from the AppleScript parse) and read its
CFBundleIdentifier (e.g., read Contents/Info.plist CFBundleIdentifier via
defaults/plutil or use `osascript -e 'id of application "<path>"'`) and set that
as bundleId; keep displayName and path unchanged and ensure this new real
bundleId is compatible with listRunning()/app.bundleIdentifier() and
ALWAYS_KEEP_BUNDLE_IDS checks.
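As a rough illustration of the `defaults`-based option mentioned above, the sketch below reads `CFBundleIdentifier` from an app bundle's `Info.plist`. `readBundleId` and the injectable `Runner` are hypothetical names, not part of the PR; the only external command assumed is macOS's `/usr/bin/defaults read <plist> <key>`. The runner is injectable so the parsing logic can be exercised without a Mac.

```typescript
import { execFileSync } from "node:child_process";
import * as path from "node:path";

// Runs a command and returns its stdout; injectable for testing off macOS.
type Runner = (cmd: string, args: string[]) => string;

const defaultRunner: Runner = (cmd, args) =>
  execFileSync(cmd, args, { encoding: "utf8" });

// Returns the real bundle identifier for an .app path, or null when the
// plist is missing, unreadable, or has no CFBundleIdentifier key.
function readBundleId(appPath: string, run: Runner = defaultRunner): string | null {
  const plist = path.join(appPath, "Contents", "Info.plist");
  try {
    const out = run("/usr/bin/defaults", ["read", plist, "CFBundleIdentifier"]).trim();
    return out.length > 0 ? out : null;
  } catch {
    return null; // `defaults` exits non-zero when the file or key is absent
  }
}
```

In the `listInstalled()` mapper this would replace the synthetic `com.app.*` value, for example `bundleId: readBundleId(path) ?? ''`. Spawning one process per app has a cost, so caching the result per path may be worth considering.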
| const cu = requireComputerUseSwift() | ||
| const accessibility = (cu as any).tcc.checkAccessibility() | ||
| const screenRecording = (cu as any).tcc.checkScreenRecording() | ||
| return accessibility && screenRecording | ||
| ? { granted: true } | ||
| : { granted: false, accessibility, screenRecording } |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if tcc is exported from computer-use-swift
rg -n 'export.*tcc|tcc.*=' packages/@ant/computer-use-swift/src/ --type=ts
Repository: claude-code-best/claude-code
Length of output: 54
🏁 Script executed:
#!/bin/bash
# First, check if the computer-use-swift package exists
find . -type d -name "*computer-use*" 2>/dev/null | head -20
Repository: claude-code-best/claude-code
Length of output: 174
🏁 Script executed:
#!/bin/bash
# Find where requireComputerUseSwift is defined
rg -n 'requireComputerUseSwift' --type ts --type tsx
Repository: claude-code-best/claude-code
Length of output: 99
🏁 Script executed:
#!/bin/bash
# Read the full hostAdapter.ts file to understand context
wc -l src/utils/computerUse/hostAdapter.ts
Repository: claude-code-best/claude-code
Length of output: 111
🏁 Script executed:
#!/bin/bash
# Find requireComputerUseSwift definition - fixed syntax
rg -n 'requireComputerUseSwift' --type ts
Repository: claude-code-best/claude-code
Length of output: 2028
🏁 Script executed:
#!/bin/bash
# Read the full hostAdapter.ts file
cat -n src/utils/computerUse/hostAdapter.ts
Repository: claude-code-best/claude-code
Length of output: 3408
🏁 Script executed:
#!/bin/bash
# Explore the computer-use-swift package structure
find packages/@ant/computer-use-swift -type f | head -30
Repository: claude-code-best/claude-code
Length of output: 375
🏁 Script executed:
#!/bin/bash
# Read swiftLoader.ts to see what ComputerUseAPI is
cat -n src/utils/computerUse/swiftLoader.ts
Repository: claude-code-best/claude-code
Length of output: 1050
🏁 Script executed:
#!/bin/bash
# Check the computer-use-swift package types
cat -n packages/@ant/computer-use-swift/src/types.ts
Repository: claude-code-best/claude-code
Length of output: 2523
🏁 Script executed:
#!/bin/bash
# Check the computer-use-swift package index to see what's exported
cat -n packages/@ant/computer-use-swift/src/index.ts
Repository: claude-code-best/claude-code
Length of output: 2496
🏁 Script executed:
#!/bin/bash
# Check the darwin backend file to see if tcc is defined there
cat -n packages/@ant/computer-use-swift/src/backends/darwin.ts
Repository: claude-code-best/claude-code
Length of output: 10241
Missing tcc property will cause runtime error on macOS.
The code calls (cu as any).tcc.checkAccessibility() and (cu as any).tcc.checkScreenRecording() without accessing an actual property. The ComputerUseAPI class returned by requireComputerUseSwift() exports only apps, display, screenshot, and resolvePrepareCapture — there is no tcc property in the API definition. This will throw TypeError: Cannot read properties of undefined (reading 'checkAccessibility') when ensureOsPermissions is called on macOS.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/computerUse/hostAdapter.ts` around lines 49 - 54, The code assumes
(cu as any).tcc exists and calls its methods, causing a TypeError; update the
ensureOsPermissions logic in hostAdapter.ts to guard against missing tcc: after
calling requireComputerUseSwift(), verify that cu.tcc is defined and that
cu.tcc.checkAccessibility and cu.tcc.checkScreenRecording are functions before
invoking them; if tcc or either method is missing, return a safe failure object
(e.g., { granted: false, accessibility: false, screenRecording: false } or
include undefined flags) instead of calling into undefined. Ensure you reference
requireComputerUseSwift() and the checkAccessibility/checkScreenRecording method
names in your changes so callers get a clear, non-throwing result.
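The guard described above could be sketched as follows. `checkOsPermissions`, `TccLike`, and `PermissionResult` are hypothetical names chosen for this example; the result shape mirrors the `{ granted, accessibility, screenRecording }` object in the snippet under review, and treating a missing `tcc` as "not granted" is one possible policy, not necessarily the one the authors want.

```typescript
type PermissionResult =
  | { granted: true }
  | { granted: false; accessibility: boolean; screenRecording: boolean };

// Loose shape: the review found no `tcc` on ComputerUseAPI, so nothing is assumed.
interface TccLike {
  checkAccessibility?: unknown;
  checkScreenRecording?: unknown;
}

function checkOsPermissions(cu: { tcc?: TccLike }): PermissionResult {
  const checkAccessibility = cu.tcc?.checkAccessibility;
  const checkScreenRecording = cu.tcc?.checkScreenRecording;
  // Fail closed instead of throwing when the native module lacks tcc.
  if (typeof checkAccessibility !== "function" || typeof checkScreenRecording !== "function") {
    return { granted: false, accessibility: false, screenRecording: false };
  }
  const accessibility = Boolean(checkAccessibility.call(cu.tcc));
  const screenRecording = Boolean(checkScreenRecording.call(cu.tcc));
  return accessibility && screenRecording
    ? { granted: true }
    : { granted: false, accessibility, screenRecording };
}
```

Accepting a structural `{ tcc?: TccLike }` parameter also removes the need for the `(cu as any)` casts, so a future rename of the native API would surface as a type error rather than a runtime one.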
| if (boundHwnd) { | ||
| const result = this.captureWindow?.(String(boundHwnd)) | ||
| if (result) return result | ||
| } |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -n 'interface ScreenshotPlatform|captureWindow\?' \
src/utils/computerUse/platforms/types.ts
rg -n 'captureScreen|captureWindow' \
src/utils/computerUse/platforms/win32.ts
Repository: claude-code-best/claude-code
Length of output: 414
🏁 Script executed:
sed -n '397,425p;427,445p' src/utils/computerUse/platforms/win32.ts
Repository: claude-code-best/claude-code
Length of output: 1593
Make captureWindow return a promise.
ScreenshotPlatform.captureWindow is declared as Promise<ScreenshotResult | null> in the interface, but the implementation at line 427 is synchronous and returns a raw ScreenshotResult | null. The method is not async, and usage at lines 400 and 421 consumes the result synchronously (if (result) return result), which would fail if the result were actually a Promise. Change the implementation to either mark it async or explicitly wrap the return value in a Promise to match the interface.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/computerUse/platforms/win32.ts` around lines 399 - 402, The
implementation of ScreenshotPlatform.captureWindow must return a Promise to
match its declared signature; update the captureWindow implementation to be
async (or explicitly return Promise.resolve(...)) and update its call sites here
(the call using boundHwnd and the other call at the later branch) to await the
result (e.g., const result = await this.captureWindow?.(String(boundHwnd)))
before checking if (result) return result so you handle the Promise correctly;
reference symbols: captureWindow, ScreenshotPlatform.captureWindow,
ScreenshotResult, boundHwnd.
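A minimal sketch of the async shape described above, assuming the interface signature quoted in the comment. The `ScreenshotResult` fields, `makePlatform`, and the stubbed capture logic are hypothetical; only the `async`/`await` pattern is the point.

```typescript
// Hypothetical fields; the real ScreenshotResult is defined in platforms/types.ts.
interface ScreenshotResult {
  base64: string;
  width: number;
  height: number;
}

interface ScreenshotPlatform {
  captureScreen(): Promise<ScreenshotResult>;
  captureWindow?(hwnd: string): Promise<ScreenshotResult | null>;
}

// Builds a stub platform; boundHwnd stands in for the module-level binding state.
function makePlatform(boundHwnd: string | null): ScreenshotPlatform {
  return {
    // Marked async so the return value matches Promise<ScreenshotResult | null>.
    async captureWindow(hwnd) {
      return hwnd === "0" ? null : { base64: "", width: 800, height: 600 };
    },
    async captureScreen() {
      if (boundHwnd) {
        // Await before the truthiness check; an un-awaited Promise is always truthy.
        const result = await this.captureWindow?.(boundHwnd);
        if (result) return result;
      }
      return { base64: "", width: 1920, height: 1080 }; // full-screen fallback
    },
  };
}
```

The comment in `captureScreen` is the practical risk: if `captureWindow` ever becomes truly asynchronous while the call site stays synchronous, `if (result)` would pass for every call and a `Promise` would be returned where a `ScreenshotResult` is expected.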
| export function ensureBridge(): boolean { | ||
| if (bridgeProc) return true | ||
| try { | ||
| const scriptPath = path.join(__dirname, 'bridge.py') |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if __dirname is used elsewhere and how it's resolved
rg -n '__dirname' src/utils/computerUse/
Repository: claude-code-best/claude-code
Length of output: 280
🏁 Script executed:
cat -n src/utils/computerUse/win32/bridgeClient.ts | head -60
Repository: claude-code-best/claude-code
Length of output: 2092
🏁 Script executed:
ls -la src/utils/computerUse/win32/ | grep -E '\.(py|ts|tsx)$'
Repository: claude-code-best/claude-code
Length of output: 1044
🏁 Script executed:
stat src/utils/computerUse/win32/bridge.py 2>&1 || echo "File not found"
Repository: claude-code-best/claude-code
Length of output: 129
🏁 Script executed:
sed -n '42p' src/utils/computerUse/win32/bridgeClient.ts | wc -c
Repository: claude-code-best/claude-code
Length of output: 74
🏁 Script executed:
sed -n '153p' src/utils/computerUse/win32/bridgeClient.ts | wc -c
Repository: claude-code-best/claude-code
Length of output: 74
🌐 Web query:
Bun ESM __dirname support availability
💡 Result:
Bun does not natively support the __dirname global in strict ES modules (ESM). Instead, Bun provides equivalent functionality through import.meta:
- import.meta.dir gives the absolute path to the directory containing the current module (equivalent to __dirname).
- import.meta.dirname is an alias for import.meta.dir for Node.js compatibility.
- import.meta.path gives the absolute path to the current file (equivalent to __filename).
This is the standard approach in ESM environments, matching Node.js behavior where __dirname is not available in ESM without flags or polyfills. Bun's official documentation lists __dirname as a Node.js global it implements, but issues and comments confirm it throws ReferenceError in ESM contexts (e.g., bundled ESM code), requiring replacement with import.meta.dir. For code using __dirname (often from CommonJS dependencies), manual replacement or polyfills are needed, especially when bundling with bun build --target=bun, where __dirname gets hardcoded. As of 2026-04-05 (Bun ~1.2+ based on release notes), no native ESM __dirname global polyfill exists; use import.meta.dir.
Example usage: console.log(import.meta.dir); // /path/to/project
Compatibility note: Works in Bun runtime for unbundled ESM. Bundled code requires handling during build.
Citations:
- 1: https://bun.com/docs/runtime/module-resolution
- 2: https://bun.sh/docs/runtime/modules
- 3: https://bun.sh/docs/api/import-meta
- 4: Bundling with --target=bun changes __dirname references to hardcoded values oven-sh/bun#4216
- 5: ReferenceError: Can't find variable: __dirname oven-sh/bun#4301
Replace __dirname with import.meta.dir for Bun ESM compatibility.
Bun does not support the __dirname global in ESM modules—this will throw a ReferenceError at runtime. Use import.meta.dir instead:
Suggested fix (lines 42 & 153):
const scriptPath = path.join(import.meta.dir, 'bridge.py')
The bridge.py file is co-located in the correct directory, but the path resolution method must be updated.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/computerUse/win32/bridgeClient.ts` at line 42, Replace uses of the
CommonJS __dirname with the ESM-compatible import.meta.dir when building the
bridge.py path: update the scriptPath assignment that calls path.join(__dirname,
'bridge.py') (and any other path.join(__dirname, 'bridge.py') occurrence such as
the second usage around bridge execution) to use import.meta.dir instead so Bun
ESM modules can resolve the co-located bridge.py; keep the same path.join call
and variable name (scriptPath) but swap the dirname source to import.meta.dir.
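Whichever way the `__dirname` question resolves for a given Bun version, deriving the directory from `import.meta` avoids depending on it. The sketch below is one hedged way to do that portably: `moduleDir` and `bridgeScriptPath` are hypothetical helpers, and the `import.meta` values are passed in as plain arguments so the logic works the same under Bun (which provides `import.meta.dir`) and under standard ESM (which provides only `import.meta.url`).

```typescript
import * as path from "node:path";
import { fileURLToPath } from "node:url";

// Resolve a module's directory from its import.meta fields.
// bunDir is import.meta.dir (Bun only); metaUrl is import.meta.url (standard ESM).
function moduleDir(metaUrl: string, bunDir?: string): string {
  return bunDir ?? path.dirname(fileURLToPath(metaUrl));
}

// Path of bridge.py, assumed to sit next to the calling module.
function bridgeScriptPath(metaUrl: string, bunDir?: string): string {
  return path.join(moduleDir(metaUrl, bunDir), "bridge.py");
}

// In bridgeClient.ts the call might look like:
//   bridgeScriptPath(import.meta.url, (import.meta as { dir?: string }).dir)
```

Note that this only helps for unbundled runs; if the client is bundled, `bridge.py` still has to be shipped alongside the output and the resolved directory will be that of the bundle.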
Relationship to PR #98
This PR is an enhanced version of #98 and substantially extends it. #98 delivered basic three-platform support (macOS + Windows + Linux); this PR adds the following on top of it:
New (not in #98)
- executorCrossPlatform.ts (1150 lines)
- platforms/{win32,darwin,linux,types,index}.ts (5 files)
- bridge.py + bridgeClient.ts (716 lines)
- accessibilitySnapshot.ts (225 lines)
- windowMessage.ts (696 lines)
- windowBorder.ts (66 lines)
- virtualCursor.ts (268 lines)
- inputIndicator.ts (254 lines)
- comExcel.ts + comWord.ts (770 lines)
- appDispatcher.ts (129 lines)
- computer-use-architecture-v2.md (325 lines)
- computer-use-tools-reference.md (496 lines)
Enhanced (improvements to what #98 already had)
- toolCalls.ts: name-based tools such as click_element, type_into_element, and window_management
- tools.ts
- executor.ts (MCP)
- ocr.ts
- uiAutomation.ts
- windowEnum.ts
Performance comparison
Summary
Test plan
- bun run dev → Computer Use tools are available
- click_element / type_into_element operate on elements by name
- bun run build succeeds
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Refactor