feat: Computer Use v2 - Cross-Platform Executor + Python Bridge + GUI Accessibility by amDosion · Pull Request #137 · claude-code-best/claude-code

amDosion · 2026-04-05T11:54:09Z

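Relationship to PR #98

This PR is an enhanced version of #98 and extends it substantially. #98 delivered basic three-platform support (macOS + Windows + Linux); this PR adds the following on top of it:

New in this PR (not in #98)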

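Module	File	Description
Cross-platform Executor	`executorCrossPlatform.ts` (1150 lines)	Unified tool-call execution layer, replacing the per-platform logic scattered across #98
Platform abstraction layer	`platforms/{win32,darwin,linux,types,index}.ts` (5 files)	Type-safe platform interface with an independent implementation per platform
Python Bridge	`bridge.py` + `bridgeClient.ts` (716 lines)	Long-lived process replacing per-call PowerShell; 17 methods; up to ~300x faster (window enumeration)
GUI accessibility	`accessibilitySnapshot.ts` (225 lines)	Screenshots automatically include an accessibility snapshot so the model can perceive GUI elements
SendMessageW input	`windowMessage.ts` (696 lines)	Native Win32 Unicode input, replacing the PowerShell SendKeys used in #98
Window border	`windowBorder.ts` (66 lines)	Four overlaid transparent windows providing 30 fps border tracking
Virtual cursor	`virtualCursor.ts` (268 lines)	Visual cursor feedback during operations
Input status indicator	`inputIndicator.ts` (254 lines)	Status indicator for keyboard/mouse operations
COM automation	`comExcel.ts` + `comWord.ts` (770 lines)	Native COM automation for Excel/Word
App dispatcher	`appDispatcher.ts` (129 lines)	Dispatches operation strategies by application type
Architecture doc	`computer-use-architecture-v2.md` (325 lines)	v2 architecture design document
Tool reference	`computer-use-tools-reference.md` (496 lines)	API reference for all tools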

Enhancements (improvements to what #98 already had)

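File	#98	This PR	Difference
`toolCalls.ts`	3649-line base implementation	+558 lines	Adds name-based tools such as `click_element`, `type_into_element`, and `window_management`
`tools.ts`	706 lines	+349 lines	Adds tool definitions (interaction tools that need no coordinates)
`executor.ts` (MCP)	111 lines	+57 lines	Extends the executor to support the new tool types
`ocr.ts`	257 lines	Refactored	Slimmed down; OCR logic optimized
`uiAutomation.ts`	308 lines	Refactored	Improved element lookup and interaction
`windowEnum.ts`	86 lines	Refactored	Optimized enumeration logic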

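Performance comparison

Operation	#98 (PowerShell)	This PR (Python Bridge)
Window enumeration	~500ms	~1.5ms
Screenshot	~800ms	~360ms
Text input	SendKeys (unreliable)	SendMessageW (native Unicode)
GUI interaction	Click by coordinates	Click by name (no coordinates needed)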
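
The PR describes the bridge as a long-lived Python process that exchanges JSON over stdin/stdout instead of spawning PowerShell per call. A minimal sketch of how a client might frame such messages, assuming newline-delimited JSON with a numeric `id` for correlation. The `BridgeFramer` class, the message shapes, and any method names passed to it are illustrative and are not taken from `bridgeClient.ts`:

```typescript
// Hypothetical message shapes; the real bridge protocol may differ.
interface BridgeRequest {
  id: number
  method: string
  params: Record<string, unknown>
}
interface BridgeResponse {
  id: number
  result?: unknown
  error?: string
}

class BridgeFramer {
  private nextId = 1
  private buffer = ''

  /** Serialize one request as a single newline-terminated JSON line. */
  encode(method: string, params: Record<string, unknown> = {}): { id: number; line: string } {
    const req: BridgeRequest = { id: this.nextId++, method, params }
    return { id: req.id, line: JSON.stringify(req) + '\n' }
  }

  /** Feed a raw stdout chunk; returns the responses completed by this chunk. */
  feed(chunk: string): BridgeResponse[] {
    this.buffer += chunk
    const lines = this.buffer.split('\n')
    // The last element is an incomplete line (or ''), so keep it buffered.
    this.buffer = lines.pop() ?? ''
    return lines
      .filter(line => line.trim() !== '')
      .map(line => JSON.parse(line) as BridgeResponse)
  }
}
```

With this shape the process start-up cost is paid once, and each later call costs only one write and one read on the existing pipes.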

Summary

36 files changed, +8331/-385
19 files added, 17 files modified
Core improvements: unified cross-platform Executor layer, high-performance Python Bridge backend, GUI accessibility awareness

Test plan

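Windows: bun run dev → Computer Use tools are available
Windows: screenshots, mouse clicks, and keyboard input work
Windows: click_element / type_into_element operate by name
Windows: Python Bridge starts automatically; window enumeration < 5ms
macOS: existing functionality unaffected (darwin backend has formatting-only changes)
bun run build succeeds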

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added cross-platform Computer Use support for Windows and Linux (alongside macOS).
- Added Windows window binding, management, and input capabilities.
- Added Windows virtual keyboard and virtual mouse for bound-window interactions.
- Added UI automation features: element-level clicking/typing and accessibility snapshots.
- Added window and terminal management tools.
- New platform abstraction layer enabling unified Computer Use API across operating systems.
Refactor
- Restructured Computer Use module architecture for cross-platform consistency.
- Updated code formatting and style throughout build configuration.

Three-platform Computer Use (macOS + Windows + Linux), with Windows-specific enhancements.
- MCP server: full implementation across 12 files (toolCalls/tools/executor/mcpServer, etc.)
- Platform abstraction layer: platforms/{win32,darwin,linux}.ts
- Cross-platform executor: executorCrossPlatform.ts
- CHICAGO_MCP + VOICE_MODE feature flags enabled
- windowMessage.ts: SendMessageW (WM_CHAR Unicode + clipboard paste)
- windowBorder.ts: four overlay window borders (30 fps tracking)
- uiAutomation.ts: UI Automation element tree / click / value writing
- accessibilitySnapshot.ts: accessibility snapshot → the model perceives the GUI
- bridge.py + bridgeClient.ts: long-lived Python process (replaces per-call PS)
- window_management: min/max/restore/close/focus (Win32 API)
- click_element / type_into_element: operate by name (no coordinates needed)
- Screenshots automatically include an Accessibility Snapshot
- 17 methods, stdin/stdout JSON communication
- Window enumeration 1.5ms vs PS 500ms, screenshot 360ms vs PS 800ms
- Dependencies: mss + Pillow + pywinauto

coderabbitai · 2026-04-05T11:54:24Z

📝 Walkthrough


Refactors Computer Use architecture to cross-platform support by introducing a unified platform abstraction layer (src/utils/computerUse/platforms/) with platform-specific implementations for darwin/win32/linux, adding comprehensive Windows backend with Python bridge and UI automation utilities, removing platform dispatch from @ant packages, and updating executors to load platform backends dynamically.

Changes

Cohort / File(s)	Summary
Platform Abstraction Layer `src/utils/computerUse/platforms/index.ts`, `src/utils/computerUse/platforms/types.ts`, `src/utils/computerUse/swiftLoader.ts`	New cross-platform dispatcher and interface definitions; `swiftLoader` now enforces macOS-only constraint; `loadPlatform()` dynamically loads darwin/win32/linux backends.
Platform Implementations - macOS `src/utils/computerUse/platforms/darwin.ts`, `packages/@ant/computer-use-swift/src/index.ts`, `packages/@ant/computer-use-input/src/index.ts`	Delegates to `@ant/*` modules; `@ant/computer-use-swift` and `@ant/computer-use-input` now macOS-only with removed cross-platform dispatch and throwing stubs.
Platform Implementation - Linux `src/utils/computerUse/platforms/linux.ts`	New Linux backend using `xdotool`, `scrot`, `wmctrl`, `xrandr` for input/screenshot/display/app operations with fallback behaviors.
Platform Implementation - Windows `src/utils/computerUse/platforms/win32.ts`, `src/utils/computerUse/win32/bridge.py`, `src/utils/computerUse/win32/bridgeClient.ts`	New Windows backend with long-lived Python bridge subprocess communicating via JSON; HWND-bound window mode with window-targeted input via `SendMessage`/PowerShell; display enumeration, screenshot capture, app launching, and window management.
Windows Utilities `src/utils/computerUse/win32/shared.ts`, `src/utils/computerUse/win32/windowEnum.ts`, `src/utils/computerUse/win32/windowMessage.ts`, `src/utils/computerUse/win32/virtualCursor.ts`, `src/utils/computerUse/win32/inputIndicator.ts`, `src/utils/computerUse/win32/windowBorder.ts`, `src/utils/computerUse/win32/accessibilitySnapshot.ts`	New Win32 helper modules: PowerShell execution, window enumeration, window-targeted message-based input injection, virtual cursor overlay, input indicator overlay, DWM border marking, and UI Automation accessibility snapshot capture.
Windows Office Automation `src/utils/computerUse/win32/appDispatcher.ts`, `src/utils/computerUse/win32/comExcel.ts`, `src/utils/computerUse/win32/comWord.ts`	New Windows app detection and COM automation for Excel/Word via PowerShell-driven headless operations (read/write cells/ranges, formulas, create/save workbooks/documents).
OCR & UI Automation `src/utils/computerUse/win32/ocr.ts`, `src/utils/computerUse/win32/uiAutomation.ts`	Refactored to use new shared PowerShell execution; added UI element control-type validation; OCR remains Windows-specific region/window capture via PowerShell.
Executor Updates `src/utils/computerUse/executor.ts`, `src/utils/computerUse/executorCrossPlatform.ts`, `src/utils/computerUse/hostAdapter.ts`, `src/utils/computerUse/common.ts`	macOS executor now loads `executorCrossPlatform` for non-darwin; new cross-platform executor implements display, screenshot, app, input, clipboard, and window management via platform abstraction; `hostAdapter` updated to use dynamic typing for TCC checks; `common.ts` platform mapping updated to distinguish win32/linux/darwin.
MCP Tool Interface `packages/@ant/computer-use-mcp/src/executor.ts`, `packages/@ant/computer-use-mcp/src/tools.ts`, `packages/@ant/computer-use-mcp/src/toolCalls.ts`	`ComputerExecutor` gains 15+ new optional Windows-only methods (window management, virtual input, element-targeted actions); tools schema extended with Windows-only tools (window_management, bind_window, virtual_keyboard, virtual_mouse, etc.); tool dispatch logic includes bound-window auto-routing and platform-gated tool generation.
Documentation `docs/features/computer-use.md`, `docs/features/computer-use-architecture-v2.md`, `docs/features/computer-use-tools-reference.md`, `DEV-LOG.md`	New architecture documentation describing platform abstraction layer design; detailed tool reference for 37 tools across general/Windows/teaching categories; high-level execution phases; removed prior end-user guide content from main computer-use.md; cleanup of dev log entries.
Config & Build `.gitignore`, `build.ts`	Added rules for AI tool runtime directories and Python artifacts; updated `build.ts` to use single-quoted strings and remove trailing semicolons (formatting only).

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant Executor as createCliExecutor()
    participant Dispatcher as loadPlatform()
    participant Backend as Platform Backend<br/>(darwin/win32/linux)
    participant Win32Bridge as Python Bridge<br/>(Windows only)
    participant Win32Utils as Win32 Utils<br/>(windowMessage, etc.)

    App->>Executor: initialize computer use
    Executor->>Dispatcher: loadPlatform()
    Dispatcher->>Backend: require platform module
    Backend-->>Dispatcher: return platform instance
    Dispatcher-->>Executor: platform object
    Executor->>Backend: screenshot(), click(), type(), etc.
    
    alt Darwin
        Backend->>Backend: delegate to `@ant` modules
        Backend-->>Executor: result
    else Windows (Bound Window)
        Backend->>Win32Bridge: ensure bridge subprocess
        Win32Bridge-->>Backend: bridge ready
        Backend->>Win32Utils: resolveInputHwnd()
        Win32Utils-->>Backend: target hwnd
        Backend->>Win32Bridge: sendClick(hwnd, x, y)
        Win32Bridge->>Win32Utils: PostMessage(hwnd, WM_LBUTTONDOWN)
        Win32Utils-->>Win32Bridge: success
        Win32Bridge-->>Backend: result
        Backend-->>Executor: result
    else Linux
        Backend->>Backend: use xdotool/scrot/wmctrl
        Backend-->>Executor: result
    end
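
The diagram above has `loadPlatform()` pick a darwin, win32, or linux backend at run time. A rough sketch of that dispatch, assuming a simple lookup keyed on the platform string. `backendModuleFor` and the module paths are illustrative; the actual loader in `platforms/index.ts` may be structured differently:

```typescript
type SupportedPlatform = 'darwin' | 'win32' | 'linux'

// Hypothetical mapping from platform name to backend module path.
const BACKEND_MODULES: Record<SupportedPlatform, string> = {
  darwin: './darwin.js',
  win32: './win32.js',
  linux: './linux.js',
}

/** Type guard that ignores inherited keys such as "toString". */
function isSupported(platform: string): platform is SupportedPlatform {
  return Object.prototype.hasOwnProperty.call(BACKEND_MODULES, platform)
}

/** Resolve the backend module for a platform, failing loudly on unknown ones. */
function backendModuleFor(platform: string): string {
  if (!isSupported(platform)) {
    throw new Error(`Computer Use is not supported on platform: ${platform}`)
  }
  return BACKEND_MODULES[platform]
}
```

A caller would pass `process.platform` and then load the returned module, so unsupported platforms fail at one well-defined point rather than inside a backend.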

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

feat: enable Computer Use with macOS + Windows + Linux support #98 — Directly implements the Computer Use cross-platform architecture; overlaps significantly with platform dispatcher, backends, executors, and win32/linux file additions.
Feature/computer use/mac support #108 — Modifies overlapping Computer Use files (swift backends, hostAdapter, toolCalls) for app enumeration, permission handling, and tool resolution.

Suggested labels

v6

Suggested reviewers

KonghaYao

Poem

🐰 twitches whiskers excitedly

From Darwin's den to Windows wide,
Linux joins the cross-platform ride,
One abstraction layer, unified flow,
Three backends bloom where platforms grow! 🌱

hops merrily through the architecture ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 72.46% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: introducing Computer Use v2 with cross-platform executor, Python bridge, and GUI accessibility features.


coderabbitai

Actionable comments posted: 7

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

packages/@ant/computer-use-swift/src/index.ts (1)

8-17: ⚠️ Potential issue | 🟠 Major

Import types from types.ts instead of darwin.js.

Lines 8-17 attempt to re-export types from ./backends/darwin.js, but darwin.ts doesn't export those types—it only uses them for type annotations. The type definitions exist in types.ts. Update the import path:

export type {
  DisplayGeometry,
  PrepareDisplayResult,
  AppInfo,
  InstalledApp,
  RunningApp,
  ScreenshotResult,
  ResolvePrepareCaptureResult,
  WindowDisplayInfo,
} from './types.js'

Also update line 21 similarly.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/@ant/computer-use-swift/src/index.ts` around lines 8 - 17, The file
incorrectly re-exports type symbols (DisplayGeometry, PrepareDisplayResult,
AppInfo, InstalledApp, RunningApp, ScreenshotResult,
ResolvePrepareCaptureResult, WindowDisplayInfo) from './backends/darwin.js' but
those types live in './types.js'; update the two export type blocks that
reference './backends/darwin.js' (the ones exporting the listed type names) to
import/export from './types.js' instead so the type exports resolve to the
correct source.

src/utils/computerUse/executor.ts (1)

100-106: ⚠️ Potential issue | 🟠 Major

Potential command injection in PowerShell Set-Clipboard.

The text.replace(/'/g, "''") escaping only handles single quotes. Characters like $, `, ", and newlines can still be interpreted by PowerShell, leading to command injection or unexpected behavior.

🛡️ Safer approach: use stdin piping instead of embedding in command string

 async function writeClipboardViaPbcopy(text: string): Promise<void> {
   if (process.platform === 'win32') {
-    const { code } = await execFileNoThrow('powershell', ['-NoProfile', '-Command', `Set-Clipboard -Value '${text.replace(/'/g, "''")}'`], {
+    const { code } = await execFileNoThrow('powershell', ['-NoProfile', '-Command', 'Set-Clipboard -Value ([Console]::In.ReadToEnd())'], {
+      input: text,
       useCwd: false,
     })
     if (code !== 0) {
       throw new Error(`PowerShell Set-Clipboard exited with code ${code}`)
     }
     return
   }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/executor.ts` around lines 100 - 106, The current
execFileNoThrow call injects user text into the PowerShell command using string
escaping (text.replace(...)) which can still allow PowerShell metacharacters to
be interpreted; instead run PowerShell with a command that reads from stdin and
pass the text via the child-process stdin stream. Update the execFileNoThrow
invocation in executor.ts (the call that currently uses `Set-Clipboard -Value
'${text.replace(/'/g, "''")}'`) to use a fixed command that reads stdin, such as
`Set-Clipboard -Value ([Console]::In.ReadToEnd())`, and provide the plain `text`
via the execFileNoThrow stdin option, removing the inline escaping logic
entirely.
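
To illustrate the stdin approach suggested above, here is a minimal sketch that separates the fixed PowerShell script from the user-supplied text. `buildSetClipboardInvocation` and the `Invocation` shape are hypothetical helpers for this example and are not part of `executor.ts`; how the `stdin` field is actually delivered depends on the process-spawning helper in use:

```typescript
interface Invocation {
  command: string
  args: string[]
  stdin: string
}

/**
 * Build a PowerShell invocation whose script is a constant string.
 * The user text travels only through stdin, never through the command line,
 * so no quoting or escaping of the text is needed.
 */
function buildSetClipboardInvocation(text: string): Invocation {
  return {
    command: 'powershell',
    args: [
      '-NoProfile',
      '-NonInteractive',
      '-Command',
      'Set-Clipboard -Value ([Console]::In.ReadToEnd())',
    ],
    stdin: text,
  }
}
```

One caveat worth checking in practice: non-ASCII text read this way may depend on the console input encoding.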

🟠 Major comments (19)

src/utils/computerUse/win32/virtualCursor.ts-190-260 (1)

190-260: ⚠️ Potential issue | 🟠 Major

Capture the old cursor state before scheduling async cleanup.

The timeout closes over mutable globals. After showVirtualCursor() reassigns them, the delayed cleanup can kill the newly spawned cursor; if nothing is reassigned, the old process/files never get cleaned up.

Suggested fix

 export function hideVirtualCursor(): void {
-  if (cursorStopFile) {
+  const proc = cursorProc
+  const stopFile = cursorStopFile
+  const scriptFile = cursorScriptFile
+  const posFile = stopFile ? `${stopFile}.pos` : null
+
+  cursorProc = null
+  cursorStopFile = null
+  cursorScriptFile = null
+
+  if (stopFile) {
     try {
-      fs.writeFileSync(cursorStopFile, 'STOP', 'utf-8')
+      fs.writeFileSync(stopFile, 'STOP', 'utf-8')
     } catch {}
     setTimeout(() => {
       try {
-        cursorProc?.kill()
+        proc?.kill()
       } catch {}
       try {
-        if (cursorScriptFile) fs.unlinkSync(cursorScriptFile)
+        if (scriptFile) fs.unlinkSync(scriptFile)
       } catch {}
       try {
-        if (cursorStopFile) fs.unlinkSync(cursorStopFile)
+        if (stopFile) fs.unlinkSync(stopFile)
+      } catch {}
+      try {
+        if (posFile) fs.unlinkSync(posFile)
       } catch {}
     }, 2000)
   }
-  cursorProc = null
-  cursorStopFile = null
-  cursorScriptFile = null
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/win32/virtualCursor.ts` around lines 190 - 260, The
timeout-based cleanup in hideVirtualCursor closes over mutable globals
(cursorProc, cursorScriptFile, cursorStopFile) which can be reassigned by
showVirtualCursor and cause wrong process/files to be killed or left behind; fix
by capturing the current values into local constants (e.g. const proc =
cursorProc, const script = cursorScriptFile, const stop = cursorStopFile) before
scheduling setTimeout, clear the module globals immediately (set
cursorProc/cursorScriptFile/cursorStopFile = null) and then use the captured
locals inside the delayed cleanup to kill the correct process and unlink the
correct files. Ensure moveVirtualCursor still checks the global stop variable
behavior as intended.
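
The race described above can be reproduced in isolation. The sketch below is a simplified model, not the code from `virtualCursor.ts`: a manual `defer` queue stands in for `setTimeout`, and a `Handle` object stands in for the cursor process and its files. All names here are illustrative:

```typescript
interface Handle {
  id: number
  killed: boolean
}

// Manual stand-in for setTimeout so the ordering is explicit.
const deferred: Array<() => void> = []
const defer = (fn: () => void): void => {
  deferred.push(fn)
}
const runDeferred = (): void => {
  while (deferred.length > 0) deferred.shift()!()
}

let current: Handle | null = null
let nextHandleId = 1

function show(): Handle {
  current = { id: nextHandleId++, killed: false }
  return current
}

/** Buggy variant: the callback reads the global when it finally runs. */
function hideBuggy(): void {
  defer(() => {
    if (current) current.killed = true
  })
  current = null
}

/** Fixed variant: capture the handle first, then clear the global. */
function hideFixed(): void {
  const captured = current
  current = null
  defer(() => {
    if (captured) captured.killed = true
  })
}
```

Calling `show()`, `hideBuggy()`, `show()` and then running the deferred work kills the second handle and leaks the first; the same sequence with `hideFixed()` kills only the first.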

src/utils/computerUse/win32/windowBorder.ts-19-66 (1)

19-66: ⚠️ Potential issue | 🟠 Major

Track marked HWNDs so cleanupAllBorders() can actually clear them.

markBound()/unmarkBound() are stateless, so this cleanup hook has nothing to act on. If any window is still marked when this path runs, the green border is left behind.

Suggested fix

+const markedWindows = new Set<string>()
+
 /**
  * Set green border on bound window via DWM.
  */
 export function markBound(hwnd: string): boolean {
   hwnd = validateHwnd(hwnd)
   // DWMWA_BORDER_COLOR = 34, COLORREF = 0x00BBGGRR
   // Green: R=0, G=200, B=0 → 0x0000C800
   const hr = ps(
@@
 [CuDwm]::DwmSetWindowAttribute([IntPtr]::new([long]${hwnd}), 34, [ref]$color, 4)`,
   )
+  if (hr === '0') markedWindows.add(hwnd)
   return hr === '0'
 }
@@
 export function unmarkBound(hwnd: string): boolean {
   hwnd = validateHwnd(hwnd)
   // DWMWA_COLOR_DEFAULT = 0xFFFFFFFF
   const hr = ps(
@@
 [CuDwm]::DwmSetWindowAttribute([IntPtr]::new([long]${hwnd}), 34, [ref]$color, 4)`,
   )
+  if (hr === '0') markedWindows.delete(hwnd)
   return hr === '0'
 }
@@
 export function cleanupAllBorders(): void {
-  // DWM border color is a window attribute — it resets automatically
-  // when the process exits or the window closes. No cleanup needed.
+  for (const hwnd of markedWindows) {
+    try {
+      unmarkBound(hwnd)
+    } catch {}
+  }
+  markedWindows.clear()
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/win32/windowBorder.ts` around lines 19 - 66, The
cleanupAllBorders function is a no-op because markBound/unmarkBound are
stateless; add a module-level Set (e.g., markedHwnds) to track successfully
marked HWNDs in markBound (only add when DwmSetWindowAttribute succeeds) and
remove from that Set in unmarkBound when unmarking succeeds, then implement
cleanupAllBorders to iterate over markedHwnds and call the same unmark logic (or
call unmarkBound) for each HWND and clear the Set; keep using validateHwnd in
both markBound and unmarkBound and ensure cleanupAllBorders handles failures
gracefully (log or ignore) and clears entries that were reset.

src/utils/computerUse/win32/shared.ts-40-47 (1)

40-47: ⚠️ Potential issue | 🟠 Major

Consume stderr and fail on non-zero exits in psAsync().

The function should check the exit code and report errors. Additionally, while Bun's Bun.spawn does not document blocking behavior on unconsumed pipes (unlike Node.js), consuming both stdout and stderr follows the recommended pattern and ensures error messages aren't silently lost.

Suggested fix

 export async function psAsync(script: string): Promise<string> {
   const proc = Bun.spawn(
     ['powershell', '-NoProfile', '-NonInteractive', '-Command', script],
     { stdout: 'pipe', stderr: 'pipe' },
   )
-  const out = await new Response(proc.stdout).text()
-  await proc.exited
+  const [out, err, exitCode] = await Promise.all([
+    new Response(proc.stdout).text(),
+    new Response(proc.stderr).text(),
+    proc.exited,
+  ])
+  if (exitCode !== 0) {
+    throw new Error(err.trim() || `PowerShell exited with code ${exitCode}`)
+  }
   return out.trim()
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/win32/shared.ts` around lines 40 - 47, psAsync
currently only reads stdout and doesn't check the child exit code or consume
stderr; update psAsync to concurrently read both proc.stdout and proc.stderr
(e.g., via new Response(proc.stdout).text() and new
Response(proc.stderr).text()), await proc.exited to get the exit status, and if
the exit code is non-zero throw or return an error that includes stderr (and
optionally stdout) so errors are not silently lost; reference the psAsync
function, the proc variable from Bun.spawn, and the proc.exited promise when
implementing this change.

src/utils/computerUse/win32/bridgeClient.ts-176-186 (1)

176-186: ⚠️ Potential issue | 🟠 Major

stopBridge() leaves pending promises hanging.

Calling pendingRequests.clear() without rejecting pending promises means callers of call() will never resolve/reject, causing memory leaks and potential deadlocks.

🐛 Proposed fix: reject pending requests before clearing

 export function stopBridge(): void {
   if (bridgeProc) {
     try {
       bridgeProc.stdin.end()
       bridgeProc.kill()
     } catch {}
     bridgeProc = null
   }
+  // Reject all pending requests to prevent hangs
+  for (const [id, { reject }] of pendingRequests) {
+    reject(new Error('Bridge stopped'))
+  }
   pendingRequests.clear()
   outputBuffer = ''
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/win32/bridgeClient.ts` around lines 176 - 186,
stopBridge() currently clears pendingRequests without rejecting them, leaving
callers of call() hanging; update stopBridge() to iterate pendingRequests (the
map/collection used by call()), reject each pending promise with a clear Error
(e.g., "bridge stopped" or include bridgeProc state), then clear the collection
and reset outputBuffer and bridgeProc as before; reference the stopBridge(),
pendingRequests, and call() symbols so you reject each pendingRequests entry
before calling pendingRequests.clear().
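
As a standalone illustration of the point above, this sketch shows a pending-request registry that settles every outstanding promise before forgetting it. `PendingRequests` is a hypothetical class for this example; the real `bridgeClient.ts` keeps its own map and may store different fields:

```typescript
interface Pending {
  resolve: (value: unknown) => void
  reject: (reason: Error) => void
}

class PendingRequests {
  private readonly pending = new Map<number, Pending>()

  /** Register a request and return the promise its caller will await. */
  register(id: number): Promise<unknown> {
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject })
    })
  }

  /** Settle one request when its response arrives. */
  resolve(id: number, value: unknown): boolean {
    const entry = this.pending.get(id)
    if (!entry) return false
    this.pending.delete(id)
    entry.resolve(value)
    return true
  }

  /** Reject everything still in flight, then forget it. Returns the count. */
  rejectAll(reason: string): number {
    const count = this.pending.size
    this.pending.forEach(({ reject }) => reject(new Error(reason)))
    this.pending.clear()
    return count
  }

  get size(): number {
    return this.pending.size
  }
}
```

Clearing the map without calling `reject` would drop the only references to the settle functions, so the awaiting callers could never continue.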

src/utils/computerUse/platforms/win32.ts-637-711 (1)

637-711: ⚠️ Potential issue | 🟠 Major

Snapshot visible HWNDs before Start-Process.

Lines 637-676 can start the target app before Line 681 captures beforeHwnds. Fast-starting apps can create their window in that gap, so it lands in the baseline and the later “new window” loop never finds it, falling through to HWND_NOT_FOUND.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/platforms/win32.ts` around lines 637 - 711, The
baseline of visible windows ($beforeHwnds via [CuLaunch]::GetAllVisibleHwnds) is
taken after Start-Process calls so fast apps can create windows that end up in
the baseline; move the snapshot so it occurs before any Start-Process invocation
(i.e. capture $beforeHwnds once just before the first attempt to launch the
target and reuse that baseline for the subsequent new-window detection loop that
uses [CuLaunch]::GetAllVisibleHwnds(), $proc, and GetPidForHwnd).
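
The ordering issue can be sketched without any Win32 calls. In the example below, `listVisible` and `launch` are injected stand-ins for window enumeration and `Start-Process`; the function names and the string-typed handles are assumptions made for illustration:

```typescript
type Hwnd = string

/** Return handles present in `after` but absent from the `before` baseline. */
function newWindows(before: readonly Hwnd[], after: readonly Hwnd[]): Hwnd[] {
  const baseline = new Set(before)
  return after.filter(hwnd => !baseline.has(hwnd))
}

/**
 * Take the baseline first, then launch, then diff.
 * Snapshotting before the launch means a window created immediately by a
 * fast-starting app still shows up as new.
 */
function launchAndDetect(listVisible: () => Hwnd[], launch: () => void): Hwnd[] {
  const before = listVisible()
  launch()
  return newWindows(before, listVisible())
}
```

If the baseline were taken after `launch()`, a window created during the launch would already be in `before` and the diff would come back empty.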

src/utils/computerUse/platforms/types.ts-26-31 (1)

26-31: ⚠️ Potential issue | 🟠 Major

Add virtual-screen origin to DisplayInfo.

executorCrossPlatform.ts currently has to synthesize originX/originY as 0 because this type only exposes width/height/scale. That makes screenshot-to-click mapping wrong on any non-primary monitor whose origin is not (0, 0).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/platforms/types.ts` around lines 26 - 31, The
DisplayInfo interface is missing virtual-screen origin fields, causing
executorCrossPlatform.ts to synthesize originX/originY as 0 and break
screenshot-to-click mapping on non-primary monitors; update DisplayInfo to
include originX: number and originY: number (or similarly named numeric
properties) and then update any consumers (notably executorCrossPlatform.ts) to
read these fields instead of assuming 0, ensuring callers constructing
DisplayInfo provide the correct monitor origin values.
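
To show why the origin matters, here is a small sketch of mapping a screenshot pixel to virtual-screen coordinates. The `DisplayInfoWithOrigin` shape and `screenshotToScreen` are hypothetical; the mapping assumes the screenshot covers the whole display and may be downscaled relative to it:

```typescript
interface DisplayInfoWithOrigin {
  width: number // display width in screen units
  height: number // display height in screen units
  originX: number // left edge of the display on the virtual screen
  originY: number // top edge of the display on the virtual screen
}

/**
 * Map a pixel in a screenshot of size shotWidth x shotHeight to a point
 * on the virtual screen: scale to display units, then add the origin.
 */
function screenshotToScreen(
  px: number,
  py: number,
  shotWidth: number,
  shotHeight: number,
  display: DisplayInfoWithOrigin,
): { x: number; y: number } {
  return {
    x: display.originX + Math.round((px * display.width) / shotWidth),
    y: display.originY + Math.round((py * display.height) / shotHeight),
  }
}
```

For a 1920x1080 monitor placed to the left of the primary one (origin -1920, 0), the centre of a half-size screenshot maps to (-960, 540); with the origin forced to 0 the same click would land on the primary monitor instead.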

packages/@ant/computer-use-mcp/src/toolCalls.ts-3890-3895 (1)

3890-3895: ⚠️ Potential issue | 🟠 Major

right scroll is routed as a negative wheel delta.

Line 3893 makes every direction except "up" negative. The Windows wheel helpers interpret positive horizontal deltas as right, so bound-window scroll_direction="right" currently scrolls left.

Suggested fix

         if (coord) return handleMouseWheel(adapter, {
           coordinate: coord,
-          delta: a.scroll_direction === "up" ? (a.scroll_amount ?? 3) : -(a.scroll_amount ?? 3),
+          delta:
+            a.scroll_direction === "up" || a.scroll_direction === "right"
+              ? (a.scroll_amount ?? 3)
+              : -(a.scroll_amount ?? 3),
           direction: (a.scroll_direction === "left" || a.scroll_direction === "right") ? "horizontal" : "vertical",
         });

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/@ant/computer-use-mcp/src/toolCalls.ts` around lines 3890 - 3895, In
the "scroll" case the delta sign is inverted for horizontal right: change the
delta calculation in the case "scroll" that calls handleMouseWheel so it
determines orientation from a.scroll_direction and computes sign = +1 when
scrolling up or right and -1 when scrolling down or left, then sets delta = sign
* (a.scroll_amount ?? 3); keep direction = (a.scroll_direction === "left" ||
a.scroll_direction === "right") ? "horizontal" : "vertical". This fixes
handleMouseWheel being called with the wrong sign for right/left while
preserving existing defaults.
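
The sign rule described above fits in one small pure function. This is a sketch under the convention stated in the review (positive means up or right); `wheelArgs` is an illustrative name and not a function in `toolCalls.ts`:

```typescript
type ScrollDirection = 'up' | 'down' | 'left' | 'right'

interface WheelArgs {
  delta: number
  direction: 'horizontal' | 'vertical'
}

/** Translate a scroll direction and amount into a signed wheel delta and axis. */
function wheelArgs(direction: ScrollDirection, amount: number = 3): WheelArgs {
  // Up and right are positive; down and left are negative.
  const sign = direction === 'up' || direction === 'right' ? 1 : -1
  const axis = direction === 'left' || direction === 'right' ? 'horizontal' : 'vertical'
  return { delta: sign * amount, direction: axis }
}
```

Keeping the sign and the axis as two separate decisions makes it harder to reintroduce the original bug, where every direction other than "up" was treated as negative.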

src/utils/computerUse/executorCrossPlatform.ts-717-735 (1)

717-735: ⚠️ Potential issue | 🟠 Major

show never recreates a hidden indicator.

After hideIndicator() clears the backing process/files, Line 726 only calls updateIndicator(message), which is a no-op once msgFile is gone. The tool reports success, but nothing reappears until the window is rebound.

Suggested fix

         const ind =
           require('./win32/inputIndicator.js') as typeof import('./win32/inputIndicator.js')
         if (action === 'show' && message) {
+          const hwnd = getBoundHwndStr()
+          if (!hwnd || !ind.showIndicator(hwnd)) {
+            return { active: false }
+          }
           ind.updateIndicator(message)
           return { active: true, message }
         }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/executorCrossPlatform.ts` around lines 717 - 735, The
show branch never recreates the indicator after hideIndicator() removed its
backing files; updateIndicator() becomes a no-op. In statusIndicator, when
action === 'show' and message is provided, ensure you recreate/initialize the
indicator before calling updateIndicator — call the module's creation/init
function (e.g., ind.createIndicator() or ind.initIndicator(), whichever exists)
or otherwise reinitialize the indicator object after requiring
'./win32/inputIndicator.js', then call ind.updateIndicator(message); keep the
hide and status behavior using ind.hideIndicator() and isBound() unchanged.

src/utils/computerUse/win32/bridge.py-456-470 (1)

456-470: ⚠️ Potential issue | 🟠 Major

Pack wheel messages as Win32 words, not raw pointer shifts.

Lines 467–469 pack WPARAM/LPARAM with raw shifts into ctypes.c_void_p. When bound windows are offscreen at negative coordinates (e.g., -32000,-32000), ClientToScreen() returns negative screen coordinates here. Python's integer shifts on negative values sign-extend the bits; when cast to ctypes.c_void_p, this produces incorrect WM_MOUSEWHEEL payloads. Mask delta and both coordinate axes to 16-bit values before packing:

wparam = ctypes.c_void_p((wheel_delta & 0xFFFF) << 16)
lparam = ctypes.c_void_p(((pt.y & 0xFFFF) << 16) | (pt.x & 0xFFFF))

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/win32/bridge.py` around lines 456 - 470, In
send_mouse_wheel(), WPARAM/LPARAM are packed using raw shifts which misbehaves
for negative screen coordinates; change the packing to mask values into 16-bit
words before shifting so sign-extended negatives don't corrupt the payload: mask
wheel_delta with 0xFFFF when placing it into the high word for wparam, and mask
both pt.x and pt.y with 0xFFFF when building lparam (use these masked values
when creating the ctypes pointer arguments for wparam and lparam) so
WM_MOUSEWHEEL/WM_MOUSEHWHEEL receives correct 16-bit word-packed data.
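
The same word packing can be checked outside the bridge. The sketch below is written in TypeScript rather than the bridge's Python, purely for illustration; the function names are hypothetical. JavaScript bitwise operators work on 32-bit integers, so `>>> 0` is used to read the packed result as an unsigned value:

```typescript
/** Pack a signed wheel delta into the high word of a 32-bit WPARAM. */
function packWheelWParam(delta: number, keyFlags: number = 0): number {
  return (((delta & 0xffff) << 16) | (keyFlags & 0xffff)) >>> 0
}

/** Pack signed screen coordinates into a 32-bit LPARAM (y high, x low). */
function packPointLParam(x: number, y: number): number {
  return (((y & 0xffff) << 16) | (x & 0xffff)) >>> 0
}

/** Recover a signed 16-bit value from a word, as a receiver would. */
function toSigned16(word: number): number {
  return (word << 16) >> 16
}
```

Masking each field to 16 bits keeps a negative coordinate or delta inside its own word, and the receiver can still recover the sign by reinterpreting that word as a signed 16-bit value.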

packages/@ant/computer-use-mcp/src/toolCalls.ts-2156-2157 (1)

2156-2157: ⚠️ Potential issue | 🟠 Major

Atomic screenshots never include accessibilityText.

This branch rebuilds shot from result without copying result.accessibilityText, so Line 2157 is always false when resolvePrepareCapture() is used. Bound-window screenshots taken through the atomic path will never include the GUI tree.

Suggested fix

     const shot: ScreenshotResult = {
       base64: result.base64,
       width: result.width,
       height: result.height,
       displayWidth: result.displayWidth,
       displayHeight: result.displayHeight,
       displayId: result.displayId,
       originX: result.originX,
       originY: result.originY,
+      accessibilityText: result.accessibilityText,
     };

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/@ant/computer-use-mcp/src/toolCalls.ts` around lines 2156 - 2157,
The atomic capture path rebuilds the local "shot" object from "result" but never
copies result.accessibilityText, so the condition checking
shot.accessibilityText (used when building the accessibility "text" node) is
always false; update the code that constructs/assigns "shot" in the atomic path
(where resolvePrepareCapture is used) to include result.accessibilityText (e.g.,
set shot.accessibilityText = result.accessibilityText or include it in the
spread) so bound-window/atomic screenshots include the GUI tree.

src/utils/computerUse/win32/inputIndicator.ts-193-216 (1)

193-216: ⚠️ Potential issue | 🟠 Major

Capture the cleanup handles before nulling the globals.

Line 198 schedules cleanup against indicatorProc, scriptFile, stopFile, and msgFile, but Lines 213-216 clear those globals immediately. The timeout then either becomes a no-op or, if showIndicator() runs again before the 2s delay expires, kills/deletes the newer indicator instance instead of the old one.

Suggested fix

 export function hideIndicator(): void {
-  if (stopFile) {
+  const proc = indicatorProc
+  const sf = stopFile
+  const script = scriptFile
+  const msg = msgFile
+
+  if (sf) {
     try {
-      fs.writeFileSync(stopFile, 'STOP', 'utf-8')
+      fs.writeFileSync(sf, 'STOP', 'utf-8')
     } catch {}
     setTimeout(() => {
       try {
-        indicatorProc?.kill()
+        proc?.kill()
       } catch {}
       try {
-        if (scriptFile) fs.unlinkSync(scriptFile)
+        if (script) fs.unlinkSync(script)
       } catch {}
       try {
-        if (stopFile) fs.unlinkSync(stopFile)
+        if (sf) fs.unlinkSync(sf)
       } catch {}
       try {
-        if (msgFile) fs.unlinkSync(msgFile)
+        if (msg) fs.unlinkSync(msg)
       } catch {}
     }, 2000)
   }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/win32/inputIndicator.ts` around lines 193 - 216, In
hideIndicator(), capture the current globals (indicatorProc, scriptFile,
stopFile, msgFile) into local constants before scheduling the setTimeout so the
delayed cleanup operates on the original handles; then use those locals inside
the setTimeout callback and only afterwards set the globals (indicatorProc,
scriptFile, stopFile, msgFile) to null, ensuring the timeout does not affect a
newly started indicator or become a no-op.

src/utils/computerUse/win32/comWord.ts-238-245 (1)

238-245: ⚠️ Potential issue | 🟠 Major

paraIndex is off by one for callers using openWord().paragraphs.

The public API exposes paragraphs as a zero-based JS array, but this forwards paraIndex straight into the COM collection. Passing the first array index (0) will fail, and every later index inserts before the next paragraph instead of the requested one.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/win32/comWord.ts` around lines 238 - 245, The
insertText function uses paraIndex (from openWord().paragraphs zero-based array)
directly in the COM call $doc.Paragraphs.Item(...), causing off-by-one errors;
update the COM index to paraIndex + 1 when constructing the body string (i.e.,
call $doc.Paragraphs.Item(paraIndex + 1)) and add a guard to validate paraIndex
is >= 0 (or throw/return false) before calling psEscape and building the command
so callers using openWord().paragraphs map correctly to the COM collection.
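
A minimal sketch of the index conversion and guard, assuming a hypothetical helper named `toComIndex` (it does not appear in the PR). It only covers the arithmetic; building the PowerShell body is left to the existing code.

```typescript
// Hypothetical helper: maps a zero-based JS array index to the one-based
// index expected by a COM collection such as $doc.Paragraphs.Item(n).
function toComIndex(paraIndex: number): number {
  if (!Number.isInteger(paraIndex) || paraIndex < 0) {
    throw new RangeError(
      `paraIndex must be a non-negative integer, got ${paraIndex}`,
    )
  }
  return paraIndex + 1
}
```

Validating before the script is built keeps a bad index from turning into an opaque COM failure.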

src/utils/computerUse/win32/comWord.ts-43-49 (1)

43-49: ⚠️ Potential issue | 🟠 Major

Don’t trim raw document text on the transport path.

runPs() strips all leading/trailing whitespace, so readText() loses real content such as leading spaces and Word’s trailing paragraph newline. Only trim JSON/status responses; preserve stdout for text reads.

🔧 Suggested change

-function runPs(script: string): string {
+function runPs(script: string, opts?: { trim?: boolean }): string {
   const result = Bun.spawnSync({
     cmd: ['powershell', '-NoProfile', '-NonInteractive', '-Command', script],
     stdout: 'pipe',
     stderr: 'pipe',
   })
-  return new TextDecoder().decode(result.stdout).trim()
+  const stdout = new TextDecoder().decode(result.stdout)
+  return opts?.trim === false ? stdout : stdout.trim()
 }
@@
 export async function readText(filePath: string): Promise<string> {
   const script = wrapWordScript(
     `Write-Output $doc.Content.Text`,
     filePath,
   )
-  return runPs(script)
+  return runPs(script, { trim: false })
 }

Also applies to: 194-199

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/win32/comWord.ts` around lines 43 - 49, runPs currently
calls .trim() on the spawned PowerShell stdout which strips meaningful
leading/trailing whitespace (used by readText); remove the .trim() so runPs
returns the raw stdout (i.e., return new TextDecoder().decode(result.stdout))
and move any trimming to the callers that expect JSON/status responses (e.g.,
where you parse/handle status or JSON results—apply .trim() there). Update the
other occurrence referenced (lines ~194-199) similarly so only JSON/status
parsing paths trim, while readText and raw text paths preserve whitespace.

src/utils/computerUse/win32/comWord.ts-255-260 (1)

255-260: ⚠️ Potential issue | 🟠 Major

Reject empty find values before entering the replace flow.

An empty search string can match zero-width positions, which makes the counting loop and replace-all path behave unpredictably. Guard this at the TypeScript boundary before constructing the script.

🛡️ Suggested change

 export async function findReplace(
   filePath: string,
   find: string,
   replace: string,
   replaceAll?: boolean,
 ): Promise<number> {
+  if (find.length === 0) return 0
+
   // wdReplaceAll=2, wdReplaceOne=1
   const replaceConst = replaceAll !== false ? 2 : 1

Also applies to: 281-289

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/win32/comWord.ts` around lines 255 - 260, Reject empty
search strings at the TypeScript boundary in findReplace by validating the find
parameter before any script construction or replace logic: if find is empty or
only whitespace, throw an Error or return a rejected Promise (with a clear
message) to prevent zero-width matches and unpredictable counting/replaceAll
behavior; apply the same guard to the related replace routine referenced around
the 281-289 region so both functions validate find early.

src/utils/computerUse/win32/windowMessage.ts-133-135 (1)

133-135: ⚠️ Potential issue | 🟠 Major

Don't cache a miss permanently.

Caching null here means a bind that happens before the real input child exists will never re-probe, even after the control appears. On freshly launched WinUI/UWP/Terminal windows, resolveInputHwnd() can stay pinned to the wrong HWND until manual unbind.

♻️ Suggested change

-const editChildCache = new Map<string, string | null>()
+const editChildCache = new Map<string, string>()
@@
-  if (editChildCache.has(parentHwnd)) {
-    return editChildCache.get(parentHwnd)!
-  }
+  const cachedChild = editChildCache.get(parentHwnd)
+  if (cachedChild) {
+    return cachedChild
+  }
@@
-  editChildCache.set(parentHwnd, null)
   return null

Also applies to: 217-218

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/win32/windowMessage.ts` around lines 133 - 135, The
cache logic in editChildCache (used by resolveInputHwnd / the parentHwnd lookup)
is storing negative results (null/undefined) and thus permanently preventing
later re-probing when the child control appears; change the behavior so that you
only write to editChildCache when a non-null HWND is found (i.e., cache hits
only), and do not persist misses—alternatively implement a short TTL or remove
the cached entry when a null is returned; apply the same fix to the analogous
caching at the other location referenced (the lines around 217-218).
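
The short-TTL alternative mentioned above could look roughly like the following. This is a sketch under assumptions: `MISS_TTL_MS`, `CacheEntry`, the `probe` callback, and the injectable `now` parameter are all hypothetical and not taken from `windowMessage.ts`.

```typescript
interface CacheEntry {
  child: string | null
  // Hits never expire in this sketch; misses expire after MISS_TTL_MS.
  expiresAt: number
}

// Hypothetical value; tune it against the real cost of a probe.
const MISS_TTL_MS = 500

const editChildCache = new Map<string, CacheEntry>()

// `probe` stands in for the real child-window lookup.
function resolveEditChild(
  parentHwnd: string,
  probe: (hwnd: string) => string | null,
  now: number = Date.now(),
): string | null {
  const entry = editChildCache.get(parentHwnd)
  if (entry && now < entry.expiresAt) return entry.child

  const child = probe(parentHwnd)
  editChildCache.set(parentHwnd, {
    child,
    expiresAt: child === null ? now + MISS_TTL_MS : Infinity,
  })
  return child
}
```

Compared with caching hits only, this bounds the number of probes on a window that never exposes an input child, at the cost of a short delay before a late-appearing control is picked up.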

src/utils/computerUse/win32/windowMessage.ts-324-330 (1)

324-330: ⚠️ Potential issue | 🟠 Major

Keep the long-text fallback in the hot path.

This always inlines one WM_CHAR call per codepoint into a single script, so larger type requests get disproportionately slow and eventually fail once the generated payload gets too large. CLIPBOARD_THRESHOLD and pasteViaClipboard() already exist for this case but are bypassed here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/win32/windowMessage.ts` around lines 324 - 330,
sendText always inlines a WM_CHAR per codepoint which causes huge scripts and
failures; change sendText to keep the long-text clipboard fallback by checking
text size (or generated script length) against CLIPBOARD_THRESHOLD and, when
exceeded, call pasteViaClipboard(targetHwnd, text) instead of building/invoking
buildWmCharLines/runPs; for small texts continue to use buildWmCharLines,
WINMSG_TYPE and runPs as before so the hot path remains fast.
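
One way to express the size check is a small strategy selector. This is only a sketch: the threshold value below is a placeholder (the real `CLIPBOARD_THRESHOLD` value is not shown in the review), and `chooseTextStrategy` is a hypothetical name.

```typescript
// Placeholder value; the real constant lives in windowMessage.ts.
const CLIPBOARD_THRESHOLD = 200

type TextStrategy = 'wm_char' | 'clipboard'

// Decide how to deliver text before any script is generated.
function chooseTextStrategy(
  text: string,
  threshold: number = CLIPBOARD_THRESHOLD,
): TextStrategy {
  // Count code points rather than UTF-16 units, since the review describes
  // one WM_CHAR call per code point.
  const codePoints = Array.from(text).length
  return codePoints > threshold ? 'clipboard' : 'wm_char'
}
```

`sendText` could then branch on the result, calling `pasteViaClipboard` for `'clipboard'` and the existing WM_CHAR path otherwise.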

src/utils/computerUse/win32/windowMessage.ts-591-595 (1)

591-595: ⚠️ Potential issue | 🟠 Major

Translate (x, y) before re-targeting mouse input to a child HWND.

These helpers document (x, y) as coordinates relative to the bound window, then call resolveInputHwnd() and reuse the same coordinates for the child window. If the child is inset inside the top-level client area, clicks/drags/wheels land in the wrong place because lParam and screen-coordinate conversion are now using the wrong origin.

Also applies to: 609-625, 635-640, 663-694
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/win32/windowMessage.ts` around lines 591 - 595, The
(x,y) coords must be converted from the original bound window's client origin to
the target child HWND's client origin before re-targeting input: compute the
point in screen coordinates from the original hwnd (ClientToScreen equivalent)
then, after resolveInputHwnd(hwnd) returns the child HWND, convert that screen
point into the child HWND's client coords (ScreenToClient equivalent) and use
those translated x/y when building lParam and any screen-coordinate conversions;
update all helpers that call resolveInputHwnd() and use x/y (the blocks around
the shown hwnd/downMsg/upMsg and the other ranges 609-625, 635-640, 663-694) to
perform this two-step translation so clicks/drags/wheels land correctly in inset
child windows.
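
The two-step translation is plain arithmetic once each window's client origin is known. In the sketch below the origins are passed in as parameters (in the real code they would come from something equivalent to `ClientToScreen` on each HWND); `Point`, `translateToChild`, and `makeMouseLParam` are hypothetical names.

```typescript
interface Point {
  x: number
  y: number
}

// parentOrigin / childOrigin: screen position of each window's client-area
// top-left corner. Supplied by the caller so the sketch needs no Win32 calls.
function translateToChild(
  p: Point,
  parentOrigin: Point,
  childOrigin: Point,
): Point {
  // Step 1: parent client coordinates -> screen coordinates.
  const screen = { x: p.x + parentOrigin.x, y: p.y + parentOrigin.y }
  // Step 2: screen coordinates -> child client coordinates.
  return { x: screen.x - childOrigin.x, y: screen.y - childOrigin.y }
}

// Pack client coordinates into a mouse-message lParam:
// low word = x, high word = y.
function makeMouseLParam(p: Point): number {
  return (((p.y & 0xffff) << 16) | (p.x & 0xffff)) >>> 0
}
```

For a child inset by (10, 40) inside its parent, a click at parent-relative (50, 80) becomes child-relative (40, 40), and that is the point that should be packed into lParam.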

src/utils/computerUse/win32/windowMessage.ts-339-350 (1)

339-350: ⚠️ Potential issue | 🟠 Major

Route single-key events through resolveInputHwnd() too.

sendText() and sendKeys() already retarget to the real edit/input child, but sendKey() still posts to the original HWND. On WinUI 3/UWP/edit-child windows that makes press/release flows diverge from combo/type behavior and miss the actual input surface.

🔧 Suggested change

 export function sendKey(
   hwnd: string,
   vk: number,
   action: 'down' | 'up',
 ): boolean {
-  hwnd = validateHwnd(hwnd)
+  hwnd = resolveInputHwnd(validateHwnd(hwnd))
   const msg = action === 'down' ? '0x0100' : '0x0101'
   const lParamFn = action === 'down' ? 'KeyDownLParam' : 'KeyUpLParam'

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/win32/windowMessage.ts` around lines 339 - 350, sendKey
currently posts to the original HWND (it calls validateHwnd but not
resolveInputHwnd), which diverges from sendText/sendKeys that retarget to the
real edit/input child; update sendKey to call resolveInputHwnd(hwnd) after
validateHwnd and use the resolved hwnd when composing the PostMessage script so
key down/up events are routed to the actual input surface (ensure
resolveInputHwnd is imported/available and preserve existing vk/action
handling).

src/utils/computerUse/win32/windowMessage.ts-73-80 (1)

73-80: ⚠️ Potential issue | 🟠 Major

Set the extended-key flag (bit 24) in lParam for extended virtual keys.

The KeyDownLParam and KeyUpLParam helpers construct lParam by setting bit 0 (repeat count) and bits 16-23 (scan code) but leave bit 24 unset for all virtual keys. Per Microsoft's WM_KEYDOWN/WM_KEYUP documentation, bit 24 must be set for extended keys: arrow keys, Home, End, Delete, Insert, Page Up, Page Down, right Ctrl, and right Alt. Without this flag, these keys are posted as ordinary keys, breaking prompt navigation and terminal shortcuts in applications that distinguish extended keys.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/win32/windowMessage.ts` around lines 73 - 80, The
helpers KeyDownLParam and KeyUpLParam must set the extended-key flag (bit 24)
for extended virtual keys; update these functions to detect when the provided vk
is an extended key (e.g., arrow keys, Home, End, Delete, Insert, PageUp,
PageDown, RightCtrl, RightAlt) and include (1u << 24) in the constructed lParam
when true, otherwise leave it unset; keep the existing MapVirtualKeyW(scanCode)
logic and existing bits for repeat, scan code, previous state and transition in
KeyDownLParam/KeyUpLParam but OR in the extended flag for the listed VKs (or use
a small helper IsExtendedKey(vk) called from both functions) so extended keys
have bit 24 set.
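
The bit layout can be illustrated in TypeScript. This is a sketch, not the PR's helpers: the real `KeyDownLParam`/`KeyUpLParam` obtain the scan code via `MapVirtualKeyW`, whereas here it is passed in, and `isExtendedKey` covers only the keys listed in this comment.

```typescript
// Virtual-key codes for the extended keys named in the review.
const EXTENDED_VKS = new Set<number>([
  0x21, // VK_PRIOR (Page Up)
  0x22, // VK_NEXT (Page Down)
  0x23, // VK_END
  0x24, // VK_HOME
  0x25, // VK_LEFT
  0x26, // VK_UP
  0x27, // VK_RIGHT
  0x28, // VK_DOWN
  0x2d, // VK_INSERT
  0x2e, // VK_DELETE
  0xa3, // VK_RCONTROL
  0xa5, // VK_RMENU (right Alt)
])

function isExtendedKey(vk: number): boolean {
  return EXTENDED_VKS.has(vk)
}

// WM_KEYDOWN lParam: repeat count 1 (bits 0-15), scan code (bits 16-23),
// extended-key flag (bit 24).
function keyDownLParam(vk: number, scanCode: number): number {
  const extended = isExtendedKey(vk) ? 1 << 24 : 0
  return (1 | ((scanCode & 0xff) << 16) | extended) >>> 0
}

// WM_KEYUP lParam: same fields plus previous-state (bit 30) and
// transition-state (bit 31), both always 1 for key-up.
function keyUpLParam(vk: number, scanCode: number): number {
  return (keyDownLParam(vk, scanCode) | (1 << 30) | (1 << 31)) >>> 0
}
```

The `>>> 0` keeps the result an unsigned 32-bit value, which matters once bit 31 is set.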

🟡 Minor comments (12)

.gitignore-18-20 (1)

18-20: ⚠️ Potential issue | 🟡 Minor

Align BMP ignore pattern with the “root only” intent.

Line 20 currently ignores *.bmp repo-wide, which conflicts with the comment and may hide tracked assets in subfolders. Use /*.bmp if root-only is intended.
Proposed fix
 # Binary / screenshot files (root only)
 /*.png
-*.bmp
+/*.bmp
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gitignore around lines 18 - 20, The .gitignore currently has a root-only
PNG rule ('/*.png') but a repo-wide BMP rule ('*.bmp'); change the BMP pattern
to root-only by replacing '*.bmp' with '/*.bmp' so it matches the comment intent
and aligns with the existing '/*.png' rule.

DEV-LOG.md-42-42 (1)

42-42: ⚠️ Potential issue | 🟡 Minor

Remove blank line for formatting consistency.

This blank line between the section header (line 40) and the content (line 43) is inconsistent with the formatting pattern used throughout the rest of the file. All other sections have the content immediately following the ## header without a blank line (e.g., lines 3→5, 28→30, 68→70, 111→113).
📝 Proposed fix
 ## Computer Use Windows 增强：窗口绑定截图 + UI Automation + OCR (2026-04-03)
-
 在三平台基础实现之上，利用 Windows 原生 API 增强 Computer Use 的 Windows 专属能力。
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DEV-LOG.md` at line 42, Remove the single blank line immediately following
the affected "##" section header so the content follows the header on the next
line (matching the other sections). Locate the specific "##" header noted in the
comment and delete the empty line that separates it from its content to ensure
consistent formatting.

src/utils/computerUse/win32/uiAutomation.ts-20-60 (1)

20-60: ⚠️ Potential issue | 🟡 Minor

Preserve case-insensitive controlType matching.

This new allowlist is case-sensitive, so inputs like button or menuitem now return null even though they are only casing variants of valid control types. Canonicalize before lookup instead of rejecting on casing alone.

Suggested fix

 const VALID_CONTROL_TYPES = new Set([
   'Button',
   'Calendar',
   'CheckBox',
   'ComboBox',
   'Custom',
   'DataGrid',
   'DataItem',
   'Document',
   'Edit',
   'Group',
   'Header',
   'HeaderItem',
   'Hyperlink',
   'Image',
   'List',
   'ListItem',
   'Menu',
   'MenuBar',
   'MenuItem',
   'Pane',
   'ProgressBar',
   'RadioButton',
   'ScrollBar',
   'Separator',
   'Slider',
   'Spinner',
   'SplitButton',
   'StatusBar',
   'Tab',
   'TabItem',
   'Table',
   'Text',
   'Thumb',
   'TitleBar',
   'ToolBar',
   'ToolTip',
   'Tree',
   'TreeItem',
   'Window',
 ])
+
+const CONTROL_TYPES_BY_LOWER = new Map(
+  [...VALID_CONTROL_TYPES].map(type => [type.toLowerCase(), type] as const),
+)

   if (query.controlType) {
-    if (!VALID_CONTROL_TYPES.has(query.controlType)) {
+    const canonicalType = CONTROL_TYPES_BY_LOWER.get(
+      query.controlType.trim().toLowerCase(),
+    )
+    if (!canonicalType) {
       return null // Invalid control type
     }
-    const v = query.controlType.replace(/'/g, "''")
+    const v = canonicalType.replace(/'/g, "''")
     conditions.push(
       `[System.Windows.Automation.PropertyCondition]::new([System.Windows.Automation.AutomationElement]::ControlTypeProperty, [System.Windows.Automation.ControlType]::${v})`,
     )
   }

Also applies to: 180-184

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/win32/uiAutomation.ts` around lines 20 - 60, The
allowlist check is currently case-sensitive (VALID_CONTROL_TYPES) which rejects
valid inputs like "button" or "menuitem"; normalize controlType before lookup by
trimming and lowercasing it and compare against a derived lowercase set (e.g.,
build VALID_CONTROL_TYPES_LOWER from VALID_CONTROL_TYPES.map(s =>
s.toLowerCase())). Update every place where controlType is validated (references
near the existing VALID_CONTROL_TYPES usage and the other occurrences around
lines ~180-184) to use the normalized value (controlType?.trim().toLowerCase())
and lookup against VALID_CONTROL_TYPES_LOWER instead of the original set.

src/utils/computerUse/win32/shared.ts-68-70 (1)

68-70: ⚠️ Potential issue | 🟡 Minor

Keep modifier aliases in sync with VK_MAP.

VK_MAP accepts menu as Alt, but MODIFIER_KEYS doesn't, so combos using that alias will be treated as a normal key instead of a held modifier.

Suggested fix

 export const MODIFIER_KEYS = new Set([
   'shift',
   'lshift',
   'rshift',
   'control',
   'ctrl',
   'lcontrol',
   'rcontrol',
   'alt',
+  'menu',
   'option',
   'lalt',
   'ralt',
   'win',
   'meta',

Also applies to: 110-127

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/win32/shared.ts` around lines 68 - 70, The
MODIFIER_KEYS set is missing the 'menu' alias that VK_MAP treats as Alt,
causing 'menu' combos to be treated as normal keys; update MODIFIER_KEYS in
shared.ts to include the same alias names present in VK_MAP (e.g., add 'menu'
alongside 'alt'/'option') and mirror any other missing aliases between the
MODIFIER_KEYS block and VK_MAP (see the MODIFIER_KEYS and VK_MAP symbols and the
repeated alias ranges around the 68–70 and 110–127 areas) so modifier aliases
stay in sync.

src/utils/computerUse/win32/accessibilitySnapshot.ts-170-196 (1)

170-196: ⚠️ Potential issue | 🟡 Minor

Escape control characters before formatting snapshot text.

Accessible names and values can contain quotes or newlines. Writing them verbatim here can split one node into multiple prompt lines and corrupt the model-facing format.

Suggested fix

 function formatForModel(
   nodes: AccessibilityNode[],
   indent: number = 0,
 ): string {
@@
   for (const node of nodes) {
     let line = `${pad}[${node.role}]`
-    if (node.name) line += ` "${truncate(node.name, 40)}"`
+    if (node.name) line += ` "${escapeForModel(node.name, 40)}"`
     line += ` (${node.bounds.x},${node.bounds.y} ${node.bounds.w}x${node.bounds.h})`
     if (!node.enabled) line += ' DISABLED'
-    if (node.value) line += ` value="${truncate(node.value, 30)}"`
-    if (node.automationId) line += ` id=${node.automationId}`
+    if (node.value) line += ` value="${escapeForModel(node.value, 30)}"`
+    if (node.automationId)
+      line += ` id="${escapeForModel(node.automationId, 40)}"`
     lines.push(line)
@@
 }
 
+function escapeForModel(s: string, max: number): string {
+  return truncate(s.replace(/\r?\n/g, ' ').replace(/"/g, '\\"'), max)
+}
+
 function truncate(s: string, max: number): string {
   return s.length > max ? s.slice(0, max - 1) + '…' : s
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/win32/accessibilitySnapshot.ts` around lines 170 - 196,
The formatted snapshot can be corrupted by unescaped control characters (quotes,
newlines) in node.name and node.value; update formatForModel to escape control
characters before embedding and truncate the escaped string rather than the raw
string. Add or reuse a helper (e.g., escapeForSnapshot or extend truncate) and
call it when inserting node.name and node.value in formatForModel so quotes
become \" and newlines become \n (and other non-printables are escaped), then
apply the max-length truncation to the escaped result to keep output lines
intact for the model.

src/utils/computerUse/win32/virtualCursor.ts-30-31 (1)

30-31: ⚠️ Potential issue | 🟡 Minor

Escape apostrophes in the embedded stop-file path.

$stopFile is inserted into a single-quoted PowerShell string (line 79), so paths containing ' will break the generated script. In PowerShell single-quoted strings, apostrophes must be escaped by doubling them ('').

Suggested fix

 function buildCursorScript(hwnd: string, stopFile: string): string {
-  const stopFileEscaped = stopFile.replace(/\\/g, '\\\\')
+  const stopFileEscaped = stopFile
+    .replace(/'/g, "''")
+    .replace(/\\/g, '\\\\')
   return `

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/win32/virtualCursor.ts` around lines 30 - 31, The code
in buildCursorScript currently only escapes backslashes for stopFile
(stopFileEscaped) but fails to escape single quotes, which will break the
PowerShell single-quoted string; update buildCursorScript so after replacing
backslashes you also replace any single quote characters in stopFile with two
single quotes (i.e., stopFileEscaped = stopFileEscaped.replace(/'/g, "''"))
before embedding it into the generated script to ensure PowerShell-safe
single-quoted strings.

src/utils/computerUse/win32/comExcel.ts-32-34 (1)

32-34: ⚠️ Potential issue | 🟡 Minor

Error handling may silently swallow non-zero exit codes.

When exitCode !== 0 but stderr is empty, no error is thrown and the function returns potentially invalid/empty stdout. This could lead to silent failures.

🛡️ Proposed fix

-  if (result.exitCode !== 0 && stderr) {
-    throw new Error(`PowerShell error: ${stderr}`)
+  if (result.exitCode !== 0) {
+    throw new Error(`PowerShell error (exit ${result.exitCode}): ${stderr || 'no stderr'}`)
   }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/win32/comExcel.ts` around lines 32 - 34, The current
check only throws when exitCode !== 0 AND stderr is present, which lets non-zero
exit codes slip through; update the error handling around the PowerShell
execution in comExcel.ts to throw whenever result.exitCode !== 0, even if stderr
is empty—include useful context (exitCode, stderr, and stdout) in the Error
message so callers can diagnose failures; look for the block using
result.exitCode, stderr, and stdout and change the condition to always throw on
non-zero exitCode and format the error to contain exitCode, stderr (if any), and
stdout.

src/utils/computerUse/win32/appDispatcher.ts-102-104 (1)

102-104: ⚠️ Potential issue | 🟡 Minor

Path fallback /tmp doesn't exist on Windows.

The fallback '/tmp' combined with \\cu_new_... creates an invalid path like /tmp\cu_new_xxx.xlsx on Windows. Use a Windows-appropriate fallback.

🐛 Fix path handling

-      const tmpPath = `${process.env.TEMP || '/tmp'}\\cu_new_${Date.now()}.xlsx`
+      const tmpDir = process.env.TEMP || process.env.TMP || 'C:\\Windows\\Temp'
+      const tmpPath = `${tmpDir}\\cu_new_${Date.now()}.xlsx`

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/win32/appDispatcher.ts` around lines 102 - 104, The
code builds tmpPath using process.env.TEMP with a hardcoded '/tmp' fallback
which produces invalid mixed separators on Windows; change tmp path creation in
appDispatcher.ts to use the OS tmp directory (os.tmpdir() or
process.env.TEMP/PROCESS.ENV.TMP when present) and build the filename with
path.join/path.resolve to ensure correct separators, then pass that resulting
tmpPath to createExcel and return it as before (refer to the tmpPath variable
and createExcel call).
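
The `os.tmpdir()` plus `path.join` approach from the prompt above might be sketched like this. `newExcelTempPath` is a hypothetical helper name, and the injectable `now` parameter exists only to make the result predictable.

```typescript
import * as os from 'node:os'
import * as path from 'node:path'

// Hypothetical helper: builds the temp workbook path with the platform's
// own temp directory and separator instead of a hardcoded '/tmp' and '\\'.
function newExcelTempPath(now: number = Date.now()): string {
  return path.join(os.tmpdir(), `cu_new_${now}.xlsx`)
}
```

The returned path can then be handed to `createExcel` in place of the string-concatenated one.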

src/utils/computerUse/win32/appDispatcher.ts-92-99 (1)

92-99: ⚠️ Potential issue | 🟡 Minor

Error silently swallowed when opening existing Excel file.

The catch block returns the same result as success, making it impossible for callers to know if the file was actually opened. Consider propagating the error or returning a status flag.

💡 Consider adding success indicator

 export interface OpenResult {
   type: AppType
   /** HWND for text/browser/generic apps (SendMessage target) */
   hwnd?: string
   /** File path for COM-controlled apps (Excel/Word) */
   filePath?: string
+  /** Whether the open operation succeeded */
+  success?: boolean
 }

Then set success: true on successful open, success: false in catch.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/win32/appDispatcher.ts` around lines 92 - 99, The
current branch handling existing files in the isExisting block hides errors
because the catch returns the same result as success; update the logic in the
isExisting branch (where openExcel(nameOrPath) is called) to either rethrow the
caught error or return an object that includes a success flag (e.g., { type:
'excel', filePath: nameOrPath, success: false, error: err }) on failure and {
type: 'excel', filePath: nameOrPath, success: true } on success so callers can
distinguish open failures; refer to the openExcel call and the surrounding
isExisting handling to implement this change.

src/utils/computerUse/platforms/linux.ts-153-154 (1)

153-154: ⚠️ Potential issue | 🟡 Minor

Hardcoded temp file paths may cause race conditions.

If multiple instances of the app run concurrently, they'll overwrite each other's temp files. Consider using unique filenames.

💡 Use unique temp files

-const SCREENSHOT_TMP = '/tmp/cu-screenshot-tmp.png'
-const SCREENSHOT_JPG = '/tmp/cu-screenshot.jpg'
+function getTempPath(ext: string): string {
+  return `/tmp/cu-screenshot-${process.pid}-${Date.now()}${ext}`
+}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/platforms/linux.ts` around lines 153 - 154, The
constants SCREENSHOT_TMP and SCREENSHOT_JPG use hardcoded paths which can
collide across concurrent runs; update the code in linux.ts to generate unique
temp filenames (for example using os.tmpdir()/fs.mkdtemp or appending
process.pid + timestamp/UUID) and replace usages of
SCREENSHOT_TMP/SCREENSHOT_JPG with the generated path variables so each
invocation writes to its own temp file and cleans it up after use; ensure
functions that call these constants (e.g., the screenshot capture/convert
routines) accept or derive the unique path rather than relying on the hardcoded
constants.
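
A per-invocation temp directory is one way to get unique paths together with simple cleanup. The sketch below assumes hypothetical names (`ScreenshotPaths`, `createScreenshotPaths`, `removeScreenshotPaths`) and fixed file names inside the directory; none of these come from `linux.ts`.

```typescript
import * as fs from 'node:fs'
import * as os from 'node:os'
import * as path from 'node:path'

interface ScreenshotPaths {
  dir: string
  png: string
  jpg: string
}

// mkdtempSync appends random characters to the prefix, so concurrent
// processes (or concurrent captures) never share a directory.
function createScreenshotPaths(): ScreenshotPaths {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cu-screenshot-'))
  return {
    dir,
    png: path.join(dir, 'capture.png'),
    jpg: path.join(dir, 'capture.jpg'),
  }
}

// Remove the directory and anything written into it.
function removeScreenshotPaths(paths: ScreenshotPaths): void {
  fs.rmSync(paths.dir, { recursive: true, force: true })
}
```

A capture routine would call `createScreenshotPaths()` at the start and `removeScreenshotPaths()` in a `finally` block, so a failed conversion does not leave files behind.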

src/utils/computerUse/win32/appDispatcher.ts-51-57 (1)

51-57: ⚠️ Potential issue | 🟡 Minor

Overly broad app name matching may cause false positives.

baseName.includes('word') matches "password", "keyword", "wordpad", etc. Consider an exact match or a starts-with check; note that a starts-with check still matches "wordpad".

💡 Tighter matching

-  if (
-    baseName === 'winword' ||
-    baseName === 'word' ||
-    baseName.includes('word')
-  )
-    return 'word'
+  if (baseName === 'winword' || baseName === 'word' || baseName.startsWith('word'))
+    return 'word'

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/win32/appDispatcher.ts` around lines 51 - 57, The
matching in appDispatcher.ts is too broad: baseName.includes('word') will match
"password", "keyword", etc.; update the logic in the dispatcher (the block that
checks baseName for 'winword'/'word') to use a stricter check—e.g., exact
equality (baseName === 'word'), startsWith('word') for prefixes you intend (like
'wordpad' only if desired), or a word-boundary regex to match whole words
only—replace the includes('word') condition with one of these tighter checks to
avoid false positives.
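
If exact matching is preferred, an allowlist of base names avoids the prefix problem entirely. The sketch below is an assumption about how `baseName` is derived (last path segment, lowercased, `.exe` stripped); `WORD_BASENAMES` and `isWordApp` are hypothetical names.

```typescript
// Base names that should be treated as Microsoft Word.
const WORD_BASENAMES = new Set(['winword', 'word'])

function isWordApp(nameOrPath: string): boolean {
  // Take the last path segment, accepting both separators.
  const lastSegment = nameOrPath.split(/[\\/]/).pop() ?? ''
  // Normalize case and drop a trailing .exe before the exact lookup.
  const baseName = lastSegment.toLowerCase().replace(/\.exe$/, '')
  return WORD_BASENAMES.has(baseName)
}
```

Names such as "wordpad", "password", and "keyword" fall through to whatever generic handling the dispatcher already has.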

packages/@ant/computer-use-mcp/src/tools.ts-459-477 (1)

459-477: ⚠️ Potential issue | 🟡 Minor

click_element schema allows calls with no element selector.

With required: [], the tool can be invoked without name, role, or automationId, which would make it impossible to locate the target element. Consider requiring at least one selector.

💡 Suggested fix

         automationId: {
           type: "string",
           description: "Exact automationId from the accessibility snapshot. Most precise selector.",
         },
       },
-      required: [],
+      required: [],  // Consider: anyOf/oneOf to require at least one of name/role/automationId
     },
   } as Tool,

Alternatively, add validation in the handler to return a clear error if no selectors are provided.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/@ant/computer-use-mcp/src/tools.ts` around lines 459 - 477, The
inputSchema for the tool (the object typed as Tool with key inputSchema)
currently has required: [] which allows calling the tool without any selector
(name, role, automationId); update the inputSchema to require at least one
selector (e.g., make required include a discriminator or enforce oneOf semantics
requiring name or role or automationId) or, if schema-level change is
undesirable, add explicit validation in the click handler that inspects the
parsed input and returns a clear error when all of name, role, and automationId
are missing; reference the inputSchema object and the handler that processes the
tool inputs to implement this guard.
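
The handler-side guard could be sketched as follows, returning an `Error` value in the same style as the `requireString` calls in `toolCalls.ts`. `ElementSelector` and `parseElementSelector` are hypothetical names, and treating blank strings as missing is an assumption.

```typescript
interface ElementSelector {
  name?: string
  role?: string
  automationId?: string
}

// Collect the selectors that were actually provided and reject the call
// when none of them is usable.
function parseElementSelector(
  args: Record<string, unknown>,
): ElementSelector | Error {
  const selector: ElementSelector = {}
  for (const key of ['name', 'role', 'automationId'] as const) {
    const value = args[key]
    if (typeof value === 'string' && value.trim() !== '') {
      selector[key] = value
    }
  }
  if (Object.keys(selector).length === 0) {
    return new Error(
      'click_element requires at least one of name, role, or automationId.',
    )
  }
  return selector
}
```

The handler can then map an `Error` result to a `bad_args` response before any element lookup runs.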

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 87a62de6-945a-49c3-8950-9201057f4e64

📥 Commits

Reviewing files that changed from the base of the PR and between ba97889 and c17edcb.

📒 Files selected for processing (36)

.gitignore
DEV-LOG.md
build.ts
docs/features/computer-use-architecture-v2.md
docs/features/computer-use-tools-reference.md
docs/features/computer-use.md
packages/@ant/computer-use-input/src/index.ts
packages/@ant/computer-use-mcp/src/executor.ts
packages/@ant/computer-use-mcp/src/toolCalls.ts
packages/@ant/computer-use-mcp/src/tools.ts
packages/@ant/computer-use-swift/src/backends/darwin.ts
packages/@ant/computer-use-swift/src/index.ts
src/utils/computerUse/common.ts
src/utils/computerUse/executor.ts
src/utils/computerUse/executorCrossPlatform.ts
src/utils/computerUse/hostAdapter.ts
src/utils/computerUse/platforms/darwin.ts
src/utils/computerUse/platforms/index.ts
src/utils/computerUse/platforms/linux.ts
src/utils/computerUse/platforms/types.ts
src/utils/computerUse/platforms/win32.ts
src/utils/computerUse/swiftLoader.ts
src/utils/computerUse/win32/accessibilitySnapshot.ts
src/utils/computerUse/win32/appDispatcher.ts
src/utils/computerUse/win32/bridge.py
src/utils/computerUse/win32/bridgeClient.ts
src/utils/computerUse/win32/comExcel.ts
src/utils/computerUse/win32/comWord.ts
src/utils/computerUse/win32/inputIndicator.ts
src/utils/computerUse/win32/ocr.ts
src/utils/computerUse/win32/shared.ts
src/utils/computerUse/win32/uiAutomation.ts
src/utils/computerUse/win32/virtualCursor.ts
src/utils/computerUse/win32/windowBorder.ts
src/utils/computerUse/win32/windowEnum.ts
src/utils/computerUse/win32/windowMessage.ts

coderabbitai · 2026-04-05T12:11:52Z

packages/@ant/computer-use-mcp/src/toolCalls.ts

+  // Windows/Linux: operations go through SendMessage (HWND-bound) or platform
+  // abstraction, not global input to the foreground. The frontmost gate is a
+  // macOS safety net for global CGEvent input — on other platforms, skip it
+  // when the platform's screenshotFiltering is 'none' (no per-app filtering,
+  // meaning no hide/defocus, meaning frontmost is meaningless).
+  if (adapter.executor.capabilities.screenshotFiltering === 'none') {
+    return null; // pass — non-macOS platform, frontmost irrelevant
+  }


⚠️ Potential issue | 🔴 Critical

Don't short-circuit the whole gate stack on screenshotFiltering === "none".

Line 442 returns before the allowlist/tier/clipboard logic runs, and Line 577 skips the hit-test entirely. On any executor advertising "none", mouse actions now bypass request_access enforcement instead of just bypassing the macOS-specific frontmost checks. Keep the grant checks, and special-case only the screen-based checks that truly do not apply.

Also applies to: 573-578

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/@ant/computer-use-mcp/src/toolCalls.ts` around lines 437 - 444, The early return on adapter.executor.capabilities.screenshotFiltering === 'none' incorrectly short-circuits the entire request_access flow; instead, remove the return and special-case only the macOS frontmost/screen-based checks so that allowlist/tier/clipboard grant checks and the hit-test still run. Concretely, update the logic around request_access and the frontmost-gate to: if screenshotFiltering === 'none' skip only the platform/frontmost-specific branch (the code that checks frontmost window or screen hit-test), but always execute the generic grant checks (allowlist, tier, clipboard) and invoke the hit-test/target resolution code paths used by request_access so enforcement still applies for executors advertising 'none'. Ensure references to adapter.executor.capabilities.screenshotFiltering, request_access, the allowlist/tier/clipboard checks, and the hit-test/target-resolution are the places you modify.

coderabbitai · 2026-04-05T12:11:52Z

packages/@ant/computer-use-mcp/src/toolCalls.ts

+async function handleVirtualMouse(
+  adapter: ComputerUseHostAdapter,
+  args: Record<string, unknown>,
+): Promise<CuCallToolResult> {
+  if (!adapter.executor.virtualMouse) {
+    return errorResult("virtual_mouse is only available on Windows with a bound window.", "feature_unavailable");
+  }
+  const action = requireString(args, "action");
+  if (action instanceof Error) return errorResult(action.message, "bad_args");
+  const coord = args.coordinate;
+  if (!Array.isArray(coord) || coord.length < 2) {
+    return errorResult("coordinate [x, y] is required.", "bad_args");
+  }
+  const validActions = new Set(["click", "double_click", "right_click", "move", "drag", "down", "up"]);
+  if (!validActions.has(action)) {
+    return errorResult(`Invalid action "${action}". Valid: ${[...validActions].join(", ")}`, "bad_args");
+  }
+  const startCoord = Array.isArray(args.start_coordinate) ? args.start_coordinate : undefined;
+  const ok = await adapter.executor.virtualMouse({
+    action: action as any,
+    x: coord[0], y: coord[1],
+    startX: startCoord?.[0], startY: startCoord?.[1],
+  });
+  if (!ok) {
+    return errorResult("No window is currently bound.", "bad_args");
+  }
+  const desc: Record<string, string> = {
+    click: `Click at (${coord[0]},${coord[1]})`,
+    double_click: `Double-click at (${coord[0]},${coord[1]})`,
+    right_click: `Right-click at (${coord[0]},${coord[1]})`,
+    move: `Moved to (${coord[0]},${coord[1]})`,
+    drag: `Dragged ${startCoord ? `(${startCoord[0]},${startCoord[1]})` : "current"} → (${coord[0]},${coord[1]})`,
+    down: `Button down at (${coord[0]},${coord[1]})`,
+    up: `Button up at (${coord[0]},${coord[1]})`,
+  };
+  return okText(desc[action] ?? action);
+}
+
+async function handleVirtualKeyboard(
+  adapter: ComputerUseHostAdapter,
+  args: Record<string, unknown>,
+): Promise<CuCallToolResult> {
+  if (!adapter.executor.virtualKeyboard) {
+    return errorResult("virtual_keyboard is only available on Windows with a bound window.", "feature_unavailable");
+  }
+  const action = requireString(args, "action");
+  if (action instanceof Error) return errorResult(action.message, "bad_args");
+  const text = requireString(args, "text");
+  if (text instanceof Error) return errorResult(text.message, "bad_args");
+
+  const validActions = new Set(["type", "combo", "press", "release", "hold"]);
+  if (!validActions.has(action)) {
+    return errorResult(`Invalid action "${action}". Valid: ${[...validActions].join(", ")}`, "bad_args");
+  }
+
+  const duration = typeof args.duration === "number" ? args.duration : undefined;
+  const repeat = typeof args.repeat === "number" ? args.repeat : undefined;
+
+  const ok = await adapter.executor.virtualKeyboard({
+    action: action as any,
+    text,
+    duration,
+    repeat,
+  });
+
+  if (!ok) {
+    return errorResult("No window is currently bound. Use open_application or bind_window first.", "bad_args");
+  }
+
+  const desc: Record<string, string> = {
+    type: `Typed "${text.length > 40 ? text.slice(0, 40) + "..." : text}"`,
+    combo: `Sent ${text}`,
+    press: `Pressed ${text} (holding)`,
+    release: `Released ${text}`,
+    hold: `Held ${text} for ${duration ?? 1}s`,
+  };
+
+  return okText(`${desc[action]}${repeat && repeat > 1 ? ` ×${repeat}` : ""}`);
+}
+
+async function handleStatusIndicator(
+  adapter: ComputerUseHostAdapter,
+  args: Record<string, unknown>,
+): Promise<CuCallToolResult> {
+  if (!adapter.executor.statusIndicator) {
+    return errorResult("status_indicator is only available on Windows.", "feature_unavailable");
+  }
+  const action = requireString(args, "action");
+  if (action instanceof Error) return errorResult(action.message, "bad_args");
+  if (!["show", "hide", "status"].includes(action)) {
+    return errorResult(`Invalid action "${action}". Valid: show, hide, status.`, "bad_args");
+  }
+  const message = typeof args.message === "string" ? args.message : undefined;
+  if (action === "show" && !message) {
+    return errorResult("'show' requires a message parameter.", "bad_args");
+  }
+  const result = await adapter.executor.statusIndicator(action as any, message);
+  if (action === "status") {
+    return okText(result.active ? "Indicator is active on the bound window." : "Indicator is not active (no window bound).");
+  }
+  if (action === "show") {
+    return okText(`Indicator showing: "${message}"`);
+  }
+  return okText("Indicator hidden.");
+}
+
+async function handleMouseWheel(
+  adapter: ComputerUseHostAdapter,
+  args: Record<string, unknown>,
+): Promise<CuCallToolResult> {
+  if (!adapter.executor.mouseWheel) {
+    return errorResult("mouse_wheel is only available on Windows with a bound window.", "feature_unavailable");
+  }
+  const coord = args.coordinate;
+  if (!Array.isArray(coord) || coord.length < 2) {
+    return errorResult("coordinate must be [x, y] array.", "bad_args");
+  }
+  const delta = typeof args.delta === "number" ? args.delta : undefined;
+  if (delta === undefined) {
+    return errorResult("delta is required (positive=up, negative=down).", "bad_args");
+  }
+  const horizontal = args.direction === "horizontal";
+  const ok = await adapter.executor.mouseWheel(coord[0], coord[1], delta, horizontal);
+  if (!ok) {
+    return errorResult("No window is currently bound. Use open_application or bind_window first.", "bad_args");
+  }
+  return okText(
+    `Mouse wheel: ${horizontal ? "horizontal" : "vertical"} scroll ${delta > 0 ? "up" : "down"} ${Math.abs(delta)} click(s) at (${coord[0]},${coord[1]}).`,
+  );
+}
+
+async function handleActivateWindow(
+  adapter: ComputerUseHostAdapter,
+  args: Record<string, unknown>,
+): Promise<CuCallToolResult> {
+  if (!adapter.executor.activateWindow) {
+    return errorResult("activate_window is only available on Windows with a bound window.", "feature_unavailable");
+  }
+  const clickX = typeof args.click_x === "number" ? args.click_x : undefined;
+  const clickY = typeof args.click_y === "number" ? args.click_y : undefined;
+  const ok = await adapter.executor.activateWindow(clickX, clickY);
+  if (!ok) {
+    return errorResult("No window is currently bound. Use open_application or bind_window first.", "bad_args");
+  }
+  return okText("Window activated and focused. Ready for input.");
+}
+
+async function handlePromptRespond(
+  adapter: ComputerUseHostAdapter,
+  args: Record<string, unknown>,
+): Promise<CuCallToolResult> {
+  if (!adapter.executor.respondToPrompt) {
+    return errorResult("prompt_respond is only available on Windows with a bound window.", "feature_unavailable");
+  }
+  const responseType = requireString(args, "response_type");
+  if (responseType instanceof Error) return errorResult(responseType.message, "bad_args");
+
+  const validTypes = new Set(["yes", "no", "enter", "escape", "select", "type"]);
+  if (!validTypes.has(responseType)) {
+    return errorResult(`Invalid response_type "${responseType}". Valid: ${[...validTypes].join(", ")}`, "bad_args");
+  }
+
+  if (responseType === "select" && typeof args.arrow_count !== "number") {
+    return errorResult("'select' requires arrow_count parameter.", "bad_args");
+  }
+  if (responseType === "type" && typeof args.text !== "string") {
+    return errorResult("'type' requires text parameter.", "bad_args");
+  }
+
+  const ok = await adapter.executor.respondToPrompt({
+    responseType: responseType as any,
+    arrowDirection: typeof args.arrow_direction === "string" ? args.arrow_direction as any : undefined,
+    arrowCount: typeof args.arrow_count === "number" ? args.arrow_count : undefined,
+    text: typeof args.text === "string" ? args.text : undefined,
+  });
+
+  if (!ok) {
+    return errorResult("No window is currently bound. Use open_application or bind_window first.", "bad_args");
+  }
+
+  const descriptions: Record<string, string> = {
+    yes: "Sent 'y' + Enter.",
+    no: "Sent 'n' + Enter.",
+    enter: "Sent Enter.",
+    escape: "Sent Escape.",
+    select: `Navigated ${args.arrow_direction ?? "down"} ${args.arrow_count ?? 1} time(s) + Enter.`,
+    type: `Typed "${args.text}" + Enter.`,
+  };
+
+  return okText(`Prompt responded: ${descriptions[responseType] ?? responseType}. Take a screenshot to verify.`);
+}
+
+async function handleOpenTerminal(
+  adapter: ComputerUseHostAdapter,
+  args: Record<string, unknown>,
+): Promise<CuCallToolResult> {
+  if (!adapter.executor.openTerminal) {
+    return errorResult("open_terminal is only available on Windows.", "feature_unavailable");
+  }
+  const agent = requireString(args, "agent");
+  if (agent instanceof Error) return errorResult(agent.message, "bad_args");
+
+  const validAgents = new Set(["claude", "codex", "gemini", "custom"]);
+  if (!validAgents.has(agent)) {
+    return errorResult(`Invalid agent "${agent}". Valid: claude, codex, gemini, custom.`, "bad_args");
+  }
+  if (agent === "custom" && typeof args.command !== "string") {
+    return errorResult("agent='custom' requires 'command' parameter.", "bad_args");
+  }
+
+  const result = await adapter.executor.openTerminal({
+    agent: agent as any,
+    command: typeof args.command === "string" ? args.command : undefined,
+    terminal: typeof args.terminal === "string" ? args.terminal as any : undefined,
+    workingDirectory: typeof args.working_directory === "string" ? args.working_directory : undefined,
+  });
+
+  if (!result) {
+    return errorResult(
+      "Failed to open terminal. Windows Terminal (wt.exe) may not be installed.",
+      "launch_failed",
+    );
+  }
+
+  if (!result.launched) {
+    return okText(
+      `Terminal opened (hwnd=${result.hwnd}, "${result.title}") but no command was sent. Window is now bound.`,
+    );
+  }
+
+  const agentNames: Record<string, string> = {
+    claude: "Claude Code", codex: "Codex", gemini: "Gemini",
+    custom: args.command as string,
+  };
+
+  return okText(
+    `Terminal opened and ${agentNames[agent] ?? agent} launched.\n` +
+    `Window: hwnd=${result.hwnd} "${result.title}"\n` +
+    `Command: '${agent === "custom" ? args.command : agent}' + Enter\n` +
+    `Status: bound to this terminal. Take a screenshot to verify the agent started.`,
+  );
+}
+
+async function handleBindWindow(
+  adapter: ComputerUseHostAdapter,
+  args: Record<string, unknown>,
+): Promise<CuCallToolResult> {
+  const action = requireString(args, "action");
+  if (action instanceof Error) return errorResult(action.message, "bad_args");
+
+  switch (action) {
+    case "list": {
+      if (!adapter.executor.listVisibleWindows) {
+        return errorResult("bind_window is only available on Windows.", "feature_unavailable");
+      }
+      const windows = await adapter.executor.listVisibleWindows();
+      if (windows.length === 0) return okText("No visible windows found.");
+      const lines = windows.map(
+        (w) => `hwnd=${w.hwnd} pid=${w.pid} "${w.title}"`,
+      );
+      return okText(`Visible windows (${windows.length}):\n${lines.join("\n")}`);
+    }
+    case "status": {
+      if (!adapter.executor.getBindingStatus) {
+        return errorResult("bind_window is only available on Windows.", "feature_unavailable");
+      }
+      const status = await adapter.executor.getBindingStatus();
+      if (!status || !status.bound) {
+        return okText("No window is currently bound. Use bind_window(action='list') to see available windows, then bind_window(action='bind', title='...') to bind.");
+      }
+      let text = `Bound to: hwnd=${status.hwnd}`;
+      if (status.title) text += ` "${status.title}"`;
+      if (status.pid) text += ` pid=${status.pid}`;
+      if (status.rect) text += ` rect=(${status.rect.x},${status.rect.y} ${status.rect.width}x${status.rect.height})`;
+      return okText(text);
+    }
+    case "bind": {
+      if (!adapter.executor.bindToWindow) {
+        return errorResult("bind_window is only available on Windows.", "feature_unavailable");
+      }
+      const title = typeof args.title === "string" ? args.title : undefined;
+      const hwnd = typeof args.hwnd === "string" ? args.hwnd : undefined;
+      const pid = typeof args.pid === "number" ? args.pid : undefined;
+      if (!title && !hwnd && !pid) {
+        return errorResult("Specify at least one of: title, hwnd, or pid.", "bad_args");
+      }
+      const result = await adapter.executor.bindToWindow({ hwnd, title, pid });
+      if (!result) {
+        return errorResult(
+          `No window found matching: ${[title && `title="${title}"`, hwnd && `hwnd=${hwnd}`, pid && `pid=${pid}`].filter(Boolean).join(", ")}. Use bind_window(action='list') to see available windows.`,
+          "element_not_found",
+        );
+      }
+      return okText(`Bound to window: hwnd=${result.hwnd} pid=${result.pid} "${result.title}". All subsequent screenshot/click/type operations target this window.`);
+    }
+    case "unbind": {
+      if (!adapter.executor.unbindFromWindow) {
+        return errorResult("bind_window is only available on Windows.", "feature_unavailable");
+      }
+      await adapter.executor.unbindFromWindow();
+      return okText("Window binding released. Operations now target the full screen.");
+    }
+    default:
+      return errorResult(`Unknown bind_window action "${action}". Valid: list, bind, unbind, status.`, "bad_args");
+  }
+}
+
+async function handleClickElement(
+  adapter: ComputerUseHostAdapter,
+  args: Record<string, unknown>,
+): Promise<CuCallToolResult> {
+  if (!adapter.executor.clickElement) {
+    return errorResult(
+      "click_element is only available on Windows with a bound window.",
+      "feature_unavailable",
+    );
+  }
+  const name = typeof args.name === "string" ? args.name : undefined;
+  const role = typeof args.role === "string" ? args.role : undefined;
+  const automationId = typeof args.automationId === "string" ? args.automationId : undefined;
+  if (!name && !role && !automationId) {
+    return errorResult("At least one of name, role, or automationId is required.", "bad_args");
+  }
+  const ok = await adapter.executor.clickElement({ name, role, automationId });
+  if (!ok) {
+    return errorResult(
+      `Element not found: ${[name && `name="${name}"`, role && `role=${role}`, automationId && `id=${automationId}`].filter(Boolean).join(", ")}. Take a screenshot to see current GUI elements.`,
+      "element_not_found",
+    );
+  }
+  return okText(`Clicked element: ${[name && `"${name}"`, role, automationId].filter(Boolean).join(" ")}`);
+}
+
+async function handleTypeIntoElement(
+  adapter: ComputerUseHostAdapter,
+  args: Record<string, unknown>,
+): Promise<CuCallToolResult> {
+  if (!adapter.executor.typeIntoElement) {
+    return errorResult(
+      "type_into_element is only available on Windows with a bound window.",
+      "feature_unavailable",
+    );
+  }
+  const text = requireString(args, "text");
+  if (text instanceof Error) return errorResult(text.message, "bad_args");
+  const name = typeof args.name === "string" ? args.name : undefined;
+  const role = typeof args.role === "string" ? args.role : undefined;
+  const automationId = typeof args.automationId === "string" ? args.automationId : undefined;
+  const ok = await adapter.executor.typeIntoElement({ name, role, automationId }, text);
+  if (!ok) {
+    return errorResult(
+      `Could not type into element: ${[name && `name="${name}"`, role && `role=${role}`, automationId && `id=${automationId}`].filter(Boolean).join(", ")}. The element was not found or doesn't support text input.`,
+      "element_not_found",
+    );
+  }
+  return okText(`Typed ${text.length} chars into: ${[name && `"${name}"`, role, automationId].filter(Boolean).join(" ")}`);
+}
+
+async function handleWindowManagement(
+  adapter: ComputerUseHostAdapter,
+  args: Record<string, unknown>,
+): Promise<CuCallToolResult> {
+  const action = requireString(args, "action");
+  if (action instanceof Error) return errorResult(action.message, "bad_args");
+
+  const VALID_ACTIONS = new Set([
+    "minimize", "maximize", "restore", "close", "focus", "move_offscreen", "move_resize", "get_rect",
+  ]);
+  if (!VALID_ACTIONS.has(action)) {
+    return errorResult(
+      `Unknown window_management action "${action}". Valid: ${[...VALID_ACTIONS].join(", ")}`,
+      "bad_args",
+    );
+  }
+
+  if (!adapter.executor.manageWindow) {
+    return errorResult(
+      "window_management is only available on Windows with a bound window.",
+      "feature_unavailable",
+    );
+  }
+
+  // get_rect: just return the current window position and size
+  if (action === "get_rect") {
+    if (!adapter.executor.getWindowRect) {
+      return errorResult("getWindowRect not available.", "feature_unavailable");
+    }
+    const rect = await adapter.executor.getWindowRect();
+    if (!rect) {
+      return errorResult("No window is currently bound. Call open_application first.", "bad_args");
+    }
+    return okText(
+      `Window rect: x=${rect.x}, y=${rect.y}, width=${rect.width}, height=${rect.height}`,
+    );
+  }
+
+  // move_resize: requires x, y (width/height optional)
+  if (action === "move_resize") {
+    const x = typeof args.x === "number" ? args.x : undefined;
+    const y = typeof args.y === "number" ? args.y : undefined;
+    if (x === undefined || y === undefined) {
+      return errorResult("move_resize requires x and y parameters.", "bad_args");
+    }
+    const width = typeof args.width === "number" ? args.width : undefined;
+    const height = typeof args.height === "number" ? args.height : undefined;
+    const ok = await adapter.executor.manageWindow(action, { x, y, width, height });
+    if (!ok) {
+      return errorResult("No window is currently bound. Call open_application first.", "bad_args");
+    }
+    return okText(
+      width && height
+        ? `Moved window to (${x}, ${y}) and resized to ${width}×${height}.`
+        : `Moved window to (${x}, ${y}).`,
+    );
+  }
+
+  // All other actions: minimize, maximize, restore, close, focus, move_offscreen
+  const ok = await adapter.executor.manageWindow(action);
+  if (!ok) {
+    return errorResult(
+      "No window is currently bound. Call open_application first.",
+      "bad_args",
+    );
+  }
+
+  const descriptions: Record<string, string> = {
+    minimize: "Window minimized (ShowWindow SW_MINIMIZE).",
+    maximize: "Window maximized (ShowWindow SW_MAXIMIZE).",
+    restore: "Window restored (ShowWindow SW_RESTORE).",
+    close: "Window closed (SendMessage WM_CLOSE). The window binding has been released.",
+    focus: "Window brought to front (SetForegroundWindow).",
+    move_offscreen: "Window moved offscreen (-32000,-32000). Still usable via SendMessage/PrintWindow.",
+  };
+
+  return okText(descriptions[action] ?? `Action "${action}" completed.`);
+}


⚠️ Potential issue | 🔴 Critical

The bound-window tools bypass the approval and shortcut model.

These handlers never consult overrides/subGates, and the auto-routing block returns into them before runInputActionGates, runHitTestGate, or isSystemKeyCombo. After bind_window or open_terminal, the model can bind an arbitrary HWND and then click/type/send blocked shortcuts without a matching request_access grant or tier check.

Also applies to: 3843-3889

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/@ant/computer-use-mcp/src/toolCalls.ts` around lines 2835 - 3270, The handlers allow binding/opening a window and then performing privileged input actions without consulting approval gates; fix by enforcing the same gate/override checks before creating a binding and before executing input actions. Specifically: in handleBindWindow and handleOpenTerminal, consult the existing overrides/subGates logic and require the appropriate request_access/tier grant (or call the same gate helpers) before accepting a bind or marking the window as bound; in each input/action handler (handleVirtualMouse, handleVirtualKeyboard, handleClickElement, handleTypeIntoElement, handleMouseWheel, handleActivateWindow, handlePromptRespond, handleWindowManagement, etc.) call the shared gate functions (runInputActionGates, runHitTestGate, isSystemKeyCombo or equivalent) and respect overrides/subGates before calling adapter.executor.* so that shortcuts and privileged actions are blocked unless explicitly allowed. Ensure you reference and reuse the same gate helper functions used elsewhere so behavior is consistent across bind/open and subsequent actions.
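One way to keep every bound-window handler behind the same checks is to wrap handlers in a shared gate runner. The sketch below is an assumption-heavy illustration: `Gate`, `firstDenial`, `gated`, `requireGrant`, and the `alt+f4` rule are hypothetical stand-ins, and the real `runInputActionGates` / `isSystemKeyCombo` signatures are not shown in this thread.

```typescript
type ToolResult = { isError: boolean; text: string };
type Args = Record<string, unknown>;
type Handler = (args: Args) => Promise<ToolResult>;
// Hypothetical gate shape: return a denial message, or null to allow.
type Gate = (args: Args) => string | null;

// Run gates in order and report the first denial, if any.
function firstDenial(gates: Gate[], args: Args): string | null {
  for (const gate of gates) {
    const denial = gate(args);
    if (denial !== null) return denial;
  }
  return null;
}

// Wrap a handler so it can only run after every gate has allowed the call.
function gated(gates: Gate[], handler: Handler): Handler {
  return async (args) => {
    const denial = firstDenial(gates, args);
    return denial !== null ? { isError: true, text: denial } : handler(args);
  };
}

// Stand-in for a request_access grant lookup.
const requireGrant = (granted: Set<string>): Gate => (args) =>
  typeof args.app === "string" && granted.has(args.app)
    ? null
    : "No request_access grant for this window.";

// Stand-in for a system-shortcut check; the blocked combo is illustrative.
const blockSystemCombo: Gate = (args) =>
  args.action === "combo" && args.text === "alt+f4"
    ? "System key combo is blocked."
    : null;

const virtualKeyboard = gated(
  [requireGrant(new Set(["notepad"])), blockSystemCombo],
  async (args) => ({ isError: false, text: `Sent ${String(args.text)}` }),
);
```

Because the wrapper is applied where the handler is registered, a new bound-window tool cannot reach `adapter.executor.*` without passing through the gates.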

coderabbitai · 2026-04-05T12:11:52Z

packages/@ant/computer-use-mcp/src/toolCalls.ts

+    return errorResult(
+      "Failed to open terminal. Windows Terminal (wt.exe) may not be installed.",
+      "launch_failed",
+    );


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -n 'type CuErrorKind|launch_failed|element_not_found' \
  packages/@ant/computer-use-mcp/src/toolCalls.ts

Repository: claude-code-best/claude-code

Length of output: 228

🏁 Script executed:

# Read the CuErrorKind type definition (lines 75-91)
sed -n '75,91p' packages/@ant/computer-use-mcp/src/toolCalls.ts
# Also read the error branches at the specified locations
echo "--- Lines 3053-3056 ---"
sed -n '3053,3056p' packages/@ant/computer-use-mcp/src/toolCalls.ts
echo "--- Lines 3123-3125 ---"
sed -n '3123,3125p' packages/@ant/computer-use-mcp/src/toolCalls.ts
echo "--- Lines 3160-3162 ---"
sed -n '3160,3162p' packages/@ant/computer-use-mcp/src/toolCalls.ts
echo "--- Lines 3185-3187 ---"
sed -n '3185,3187p' packages/@ant/computer-use-mcp/src/toolCalls.ts

Repository: claude-code-best/claude-code

Length of output: 1936

Use CuErrorKind values that exist in the union.

The code passes "launch_failed" at line 3055 and "element_not_found" at lines 3125, 3162, and 3187, but CuErrorKind defines neither of these. Update the union type definition or replace these with valid error kinds from the CuErrorKind union.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/@ant/computer-use-mcp/src/toolCalls.ts` around lines 3053 - 3056, The code is returning errorResult with invalid CuErrorKind values ("launch_failed" and "element_not_found"); update those calls to use one of the existing CuErrorKind union members (or add the missing members to the CuErrorKind type). Locate the errorResult invocations in toolCalls.ts (e.g., the return wrapping the terminal open failure and the other occurrences) and replace "launch_failed" and "element_not_found" with a valid CuErrorKind constant (or extend the CuErrorKind union declaration to include these exact names) so the returned kind matches the defined union.
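If the chosen fix is to extend the union, it might look like the sketch below. The member list is an assumption: the real `CuErrorKind` (lines 75-91 of toolCalls.ts) is not reproduced in this thread, so only the kinds visible in the diff are listed, and this `errorResult` is a simplified stand-in for the real helper.

```typescript
// Assumed subset of the union; the real CuErrorKind has other members too.
const CU_ERROR_KINDS = [
  "bad_args",
  "feature_unavailable",
  "launch_failed",     // new: open_terminal could not start a terminal
  "element_not_found", // new: no window or UI element matched the query
] as const;

// Deriving the type from the array keeps the runtime list and the union in sync.
type CuErrorKind = (typeof CU_ERROR_KINDS)[number];

// Simplified stand-in for the errorResult helper used by the handlers.
function errorResult(
  message: string,
  kind: CuErrorKind,
): { isError: true; kind: CuErrorKind; message: string } {
  return { isError: true, kind, message };
}

// Both calls type-check once the two kinds are part of the union.
const launchError = errorResult("Failed to open terminal.", "launch_failed");
const lookupError = errorResult("Element not found.", "element_not_found");
```

A misspelled kind such as `"launch_fail"` would then be rejected at compile time instead of slipping through to callers.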

coderabbitai · 2026-04-05T12:11:52Z

packages/@ant/computer-use-swift/src/backends/darwin.ts

+      return result.split('\n').filter(Boolean).map(line => {
+        const [path, name] = line.split('|', 2)
+        const displayName = (name ?? '').replace(/\.app$/, '')
        return {
-          bundleId: bundleId ?? '',
-          displayName: displayName ?? '',
+          bundleId: `com.app.${displayName.toLowerCase().replace(/\s+/g, '-')}`,
+          displayName,
          path: path ?? '',
        }
      })


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for usages of listInstalled() and bundleId from installed apps
rg -n 'listInstalled|\.bundleId' --type=ts -C3 packages/@ant/computer-use src/utils/computerUse

Repository: claude-code-best/claude-code

Length of output: 8822

🏁 Script executed:

# Search for how bundleId is used in opening apps or matching against known IDs
rg -n 'apps\.open|tell application id|ALWAYS_KEEP_BUNDLE_IDS|bundleId.*open' --type=ts --type=tsx -C2 src/utils/computerUse/

Repository: claude-code-best/claude-code

Length of output: 99

🏁 Script executed:

# Check the ALWAYS_KEEP_BUNDLE_IDS definition to understand what apps are expected
rg -n 'ALWAYS_KEEP_BUNDLE_IDS|const ALWAYS_KEEP' --type=ts -A 5 src/utils/computerUse/

Repository: claude-code-best/claude-code

Length of output: 909

🏁 Script executed:

# Search for app opening logic in executor to see if bundleId is used with "tell application id"
rg -n 'open.*bundleId|tell application id|launch.*bundle' --type=ts -C3 src/utils/computerUse/

Repository: claude-code-best/claude-code

Length of output: 1140

🏁 Script executed:

# Check the Swift backend file for apps.open() implementation
rg -n 'func open|apps\.open' packages/@ant/computer-use-swift/src/backends/ --type=ts -A 5

Repository: claude-code-best/claude-code

Length of output: 54

🏁 Script executed:

# Also check if there's any bundleId matching logic in the Swift backend
rg -n 'bundleId|tell application id' packages/@ant/computer-use-swift/ --type=ts -C 2

Repository: claude-code-best/claude-code

Length of output: 13634

Synthetic bundle IDs will break app launching and permission filtering.

The implementation returns synthetic IDs like com.app.safari instead of real macOS bundle IDs like com.apple.Safari. This breaks:

App opening: open(bundleId) executes tell application id "${bundleId}" to activate, which fails when the bundle ID does not exist.

Permission grants: The app filter at appNames.ts:177 checks ALWAYS_KEEP_BUNDLE_IDS.has(app.bundleId) against real IDs (com.apple.Safari, com.google.Chrome, etc.), which the synthetic format will never match.

Meanwhile listRunning() retrieves real bundle IDs via app.bundleIdentifier(), creating an inconsistency. The synthetic ID approach from AppleScript output parsing is unsuitable for use cases requiring actual bundle IDs.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/@ant/computer-use-swift/src/backends/darwin.ts` around lines 173 - 181, The mapping in darwin.ts that builds synthetic bundleId values must be replaced so bundleId is the app's real CFBundleIdentifier (not `com.app.*`); change the mapper that returns { bundleId, displayName, path } to query the app at path (the `path` value returned from the AppleScript parse) and read its CFBundleIdentifier (e.g., read Contents/Info.plist CFBundleIdentifier via defaults/plutil or use `osascript -e 'id of application "<path>"'`) and set that as bundleId; keep displayName and path unchanged and ensure this new real bundleId is compatible with listRunning()/app.bundleIdentifier() and ALWAYS_KEEP_BUNDLE_IDS checks.
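The mismatch is easy to reproduce in isolation. The sketch below copies the string transformation from the diff into a hypothetical helper, `syntheticBundleId`, and checks its output against a two-entry stand-in for `ALWAYS_KEEP_BUNDLE_IDS` (the real set is larger; these two IDs are the ones named in the review).

```typescript
// Mirrors the mapping in the diff: derive an ID from the .app file name.
function syntheticBundleId(appFileName: string): string {
  const displayName = appFileName.replace(/\.app$/, "");
  return `com.app.${displayName.toLowerCase().replace(/\s+/g, "-")}`;
}

// Stand-in for the real set, limited to the two IDs named in the review.
const ALWAYS_KEEP_BUNDLE_IDS = new Set(["com.apple.Safari", "com.google.Chrome"]);

const safari = syntheticBundleId("Safari.app");        // "com.app.safari"
const chrome = syntheticBundleId("Google Chrome.app"); // "com.app.google-chrome"

// Neither synthetic ID can match a real bundle identifier.
const safariKept = ALWAYS_KEEP_BUNDLE_IDS.has(safari); // false
const chromeKept = ALWAYS_KEEP_BUNDLE_IDS.has(chrome); // false
```

The vendor segment and the original casing are both lost in the transformation, so no amount of normalization on the comparison side can recover the real identifier; it has to be read from the app bundle itself.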

coderabbitai · 2026-04-05T12:11:52Z

src/utils/computerUse/hostAdapter.ts

+      const cu = requireComputerUseSwift()
+      const accessibility = (cu as any).tcc.checkAccessibility()
+      const screenRecording = (cu as any).tcc.checkScreenRecording()
      return accessibility && screenRecording
        ? { granted: true }
        : { granted: false, accessibility, screenRecording }


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if tcc is exported from computer-use-swift
rg -n 'export.*tcc|tcc.*=' packages/@ant/computer-use-swift/src/ --type=ts

Repository: claude-code-best/claude-code

Length of output: 54

🏁 Script executed:

#!/bin/bash
# First, check if the computer-use-swift package exists
find . -type d -name "*computer-use*" 2>/dev/null | head -20

Repository: claude-code-best/claude-code

Length of output: 174

🏁 Script executed:

#!/bin/bash
# Find where requireComputerUseSwift is defined
rg -n 'requireComputerUseSwift' --type ts --type tsx

Repository: claude-code-best/claude-code

Length of output: 99

🏁 Script executed:

#!/bin/bash
# Read the full hostAdapter.ts file to understand context
wc -l src/utils/computerUse/hostAdapter.ts

Repository: claude-code-best/claude-code

Length of output: 111

🏁 Script executed:

#!/bin/bash
# Find requireComputerUseSwift definition - fixed syntax
rg -n 'requireComputerUseSwift' --type ts

Repository: claude-code-best/claude-code

Length of output: 2028

🏁 Script executed:

#!/bin/bash
# Read the full hostAdapter.ts file
cat -n src/utils/computerUse/hostAdapter.ts

Repository: claude-code-best/claude-code

Length of output: 3408

🏁 Script executed:

#!/bin/bash
# Explore the computer-use-swift package structure
find packages/@ant/computer-use-swift -type f | head -30

Repository: claude-code-best/claude-code

Length of output: 375

🏁 Script executed:

#!/bin/bash
# Read swiftLoader.ts to see what ComputerUseAPI is
cat -n src/utils/computerUse/swiftLoader.ts

Repository: claude-code-best/claude-code

Length of output: 1050

🏁 Script executed:

#!/bin/bash
# Check the computer-use-swift package types
cat -n packages/@ant/computer-use-swift/src/types.ts

Repository: claude-code-best/claude-code

Length of output: 2523

🏁 Script executed:

#!/bin/bash
# Check the computer-use-swift package index to see what's exported
cat -n packages/@ant/computer-use-swift/src/index.ts

Repository: claude-code-best/claude-code

Length of output: 2496

🏁 Script executed:

#!/bin/bash
# Check the darwin backend file to see if tcc is defined there
cat -n packages/@ant/computer-use-swift/src/backends/darwin.ts

Repository: claude-code-best/claude-code

Length of output: 10241

Missing tcc property will cause runtime error on macOS.

The code calls (cu as any).tcc.checkAccessibility() and (cu as any).tcc.checkScreenRecording(), but the as any cast hides the fact that no such property exists. The ComputerUseAPI class returned by requireComputerUseSwift() exposes only apps, display, screenshot, and resolvePrepareCapture; there is no tcc property in the API definition. This will throw TypeError: Cannot read properties of undefined (reading 'checkAccessibility') when ensureOsPermissions is called on macOS.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/hostAdapter.ts` around lines 49 - 54, The code assumes (cu as any).tcc exists and calls its methods, causing a TypeError; update the ensureOsPermissions logic in hostAdapter.ts to guard against missing tcc: after calling requireComputerUseSwift(), verify that cu.tcc is defined and that cu.tcc.checkAccessibility and cu.tcc.checkScreenRecording are functions before invoking them; if tcc or either method is missing, return a safe failure object (e.g., { granted: false, accessibility: false, screenRecording: false } or include undefined flags) instead of calling into undefined. Ensure you reference requireComputerUseSwift() and the checkAccessibility/checkScreenRecording method names in your changes so callers get a clear, non-throwing result.
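A guard along the lines this comment suggests could be sketched as below. `TccLike`, `hasTcc`, and this standalone `ensureOsPermissions(cu)` signature are hypothetical; the real function obtains `cu` from `requireComputerUseSwift()` rather than taking it as a parameter.

```typescript
type PermissionResult =
  | { granted: true }
  | { granted: false; accessibility: boolean; screenRecording: boolean };

// Hypothetical shape of the tcc surface the caller expects.
interface TccLike {
  checkAccessibility(): boolean;
  checkScreenRecording(): boolean;
}

// Type guard: true only if `cu.tcc` exists and exposes both check functions.
function hasTcc(cu: unknown): cu is { tcc: TccLike } {
  if (typeof cu !== "object" || cu === null) return false;
  const tcc = (cu as { tcc?: unknown }).tcc;
  if (typeof tcc !== "object" || tcc === null) return false;
  const t = tcc as Record<string, unknown>;
  return (
    typeof t.checkAccessibility === "function" &&
    typeof t.checkScreenRecording === "function"
  );
}

function ensureOsPermissions(cu: unknown): PermissionResult {
  // Missing tcc: report "not granted" instead of throwing a TypeError.
  if (!hasTcc(cu)) {
    return { granted: false, accessibility: false, screenRecording: false };
  }
  const accessibility = cu.tcc.checkAccessibility();
  const screenRecording = cu.tcc.checkScreenRecording();
  return accessibility && screenRecording
    ? { granted: true }
    : { granted: false, accessibility, screenRecording };
}
```

The guard only prevents the crash. Whether a backend without `tcc` should count as "not granted" or should gain a real `tcc` implementation is a separate decision for the PR author.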

coderabbitai · 2026-04-05T12:11:52Z

src/utils/computerUse/platforms/win32.ts

+    if (boundHwnd) {
+      const result = this.captureWindow?.(String(boundHwnd))
+      if (result) return result
+    }


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -n 'interface ScreenshotPlatform|captureWindow\?' \
  src/utils/computerUse/platforms/types.ts
rg -n 'captureScreen|captureWindow' \
  src/utils/computerUse/platforms/win32.ts

Repository: claude-code-best/claude-code

Length of output: 414

🏁 Script executed:

sed -n '397,425p;427,445p' src/utils/computerUse/platforms/win32.ts

Repository: claude-code-best/claude-code

Length of output: 1593

Make captureWindow return a promise.

ScreenshotPlatform.captureWindow is declared as Promise<ScreenshotResult | null> in the interface, but the implementation at line 427 is synchronous and returns a raw ScreenshotResult | null. The method is not async, and usage at lines 400 and 421 consumes the result synchronously (if (result) return result), which would fail if the result were actually a Promise. Change the implementation to either mark it async or explicitly wrap the return value in a Promise to match the interface.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/platforms/win32.ts` around lines 399 - 402, The implementation of ScreenshotPlatform.captureWindow must return a Promise to match its declared signature; update the captureWindow implementation to be async (or explicitly return Promise.resolve(...)) and update its call sites here (the call using boundHwnd and the other call at the later branch) to await the result (e.g., const result = await this.captureWindow?.(String(boundHwnd))) before checking if (result) return result so you handle the Promise correctly; reference symbols: captureWindow, ScreenshotPlatform.captureWindow, ScreenshotResult, boundHwnd.
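The shape of the fix can be sketched in a few lines. Everything here except the `captureWindow` / `captureScreen` names and the `boundHwnd` check is an assumption: the `ScreenshotResult` fields, the class name, and the placeholder capture logic are illustrative only.

```typescript
// Hypothetical result shape; the real ScreenshotResult fields may differ.
interface ScreenshotResult {
  base64: string;
  width: number;
  height: number;
}

class Win32ScreenshotSketch {
  private readonly boundHwnd: number | null;

  constructor(boundHwnd: number | null) {
    this.boundHwnd = boundHwnd;
  }

  // async makes the return type Promise<ScreenshotResult | null>,
  // matching the interface declaration described in the review.
  async captureWindow(hwnd: string): Promise<ScreenshotResult | null> {
    // Placeholder capture: pretend hwnd "0" cannot be captured.
    return hwnd === "0" ? null : { base64: "", width: 800, height: 600 };
  }

  async captureScreen(): Promise<ScreenshotResult> {
    if (this.boundHwnd) {
      // Await before the truthiness test; a bare Promise is always truthy.
      const result = await this.captureWindow(String(this.boundHwnd));
      if (result) return result;
    }
    // Placeholder full-screen capture used as the fallback.
    return { base64: "", width: 1920, height: 1080 };
  }
}
```

The `await` matters more than the `async` keyword: without it, `if (result)` would pass for every call and the caller would receive a Promise where a `ScreenshotResult` is expected.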

coderabbitai · 2026-04-05T12:11:52Z

src/utils/computerUse/win32/bridgeClient.ts

+export function ensureBridge(): boolean {
+  if (bridgeProc) return true
+  try {
+    const scriptPath = path.join(__dirname, 'bridge.py')


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if __dirname is used elsewhere and how it's resolved
rg -n '__dirname' src/utils/computerUse/

Repository: claude-code-best/claude-code

Length of output: 280

🏁 Script executed:

cat -n src/utils/computerUse/win32/bridgeClient.ts | head -60

Repository: claude-code-best/claude-code

Length of output: 2092

🏁 Script executed:

ls -la src/utils/computerUse/win32/ | grep -E '\.(py|ts|tsx)$'

Repository: claude-code-best/claude-code

Length of output: 1044

🏁 Script executed:

stat src/utils/computerUse/win32/bridge.py 2>&1 || echo "File not found"

Repository: claude-code-best/claude-code

Length of output: 129

🏁 Script executed:

sed -n '42p' src/utils/computerUse/win32/bridgeClient.ts | wc -c

Repository: claude-code-best/claude-code

Length of output: 74

🏁 Script executed:

sed -n '153p' src/utils/computerUse/win32/bridgeClient.ts | wc -c

Repository: claude-code-best/claude-code

Length of output: 74

🌐 Web query:

Bun ESM __dirname support availability

💡 Result:

Bun does not natively support the __dirname global in strict ES modules (ESM). Instead, Bun provides equivalent functionality through import.meta:

- import.meta.dir gives the absolute path to the directory containing the current module (equivalent to __dirname).
- import.meta.dirname is an alias for import.meta.dir for Node.js compatibility.
- import.meta.path gives the absolute path to the current file (equivalent to __filename).

This is the standard approach in ESM environments, matching Node.js behavior where __dirname is not available in ESM without flags or polyfills. Bun's official documentation lists __dirname as a Node.js global it implements, but issues and comments confirm it throws ReferenceError in ESM contexts (e.g., bundled ESM code), requiring replacement with import.meta.dir.

For code using __dirname (often from CommonJS dependencies), manual replacement or polyfills are needed, especially when bundling with bun build --target=bun, where __dirname gets hardcoded. As of 2026-04-05 (Bun ~1.2+ based on release notes), no native ESM __dirname global polyfill exists; use import.meta.dir.

Example usage: console.log(import.meta.dir); // /path/to/project

Compatibility note: Works in Bun runtime for unbundled ESM. Bundled code requires handling during build.

Citations:

1: https://bun.com/docs/runtime/module-resolution

2: https://bun.sh/docs/runtime/modules

3: https://bun.sh/docs/api/import-meta

4: Bundling with --target=bun changes __dirname references to hardcoded values oven-sh/bun#4216

5: ReferenceError: Can't find variable: __dirname oven-sh/bun#4301

Replace __dirname with import.meta.dir for Bun ESM compatibility.

Bun does not reliably provide the __dirname global in ESM modules. Per the issues cited above, it can throw a ReferenceError at runtime or be hardcoded at build time, notably in bundled ESM output. Use import.meta.dir instead:

Suggested fix (lines 42 & 153)

const scriptPath = path.join(import.meta.dir, 'bridge.py')

The bridge.py file is co-located in the correct directory, but the path resolution method must be updated.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/win32/bridgeClient.ts` at line 42, Replace uses of the CommonJS __dirname with the ESM-compatible import.meta.dir when building the bridge.py path: update the scriptPath assignment that calls path.join(__dirname, 'bridge.py') (and any other path.join(__dirname, 'bridge.py') occurrence such as the second usage around bridge execution) to use import.meta.dir instead so Bun ESM modules can resolve the co-located bridge.py; keep the same path.join call and variable name (scriptPath) but swap the dirname source to import.meta.dir.
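
As a rough sketch of the path resolution above: the helper below takes the module's import.meta object and builds the bridge.py path from it. resolveBridgeScript is a hypothetical name, not a function in the PR, and the fallback to import.meta.url for runtimes without import.meta.dir is an assumption added here for portability rather than something the review asked for.

```typescript
import * as path from 'node:path'
import { fileURLToPath } from 'node:url'

// Hypothetical helper: resolve bridge.py next to the calling module without __dirname.
// `dir` mirrors Bun's import.meta.dir; `url` is the standard import.meta.url.
function resolveBridgeScript(meta: { dir?: string; url: string }): string {
  // Prefer Bun's directory field; otherwise derive the directory from the file URL.
  const moduleDir = meta.dir ?? path.dirname(fileURLToPath(meta.url))
  return path.join(moduleDir, 'bridge.py')
}
```

Inside bridgeClient.ts the call would be resolveBridgeScript(import.meta), so both occurrences flagged by the review (lines 42 and 153) could share one helper instead of repeating the path.join call.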

unraid added 2 commits April 5, 2026 15:38

chore: add .agents/.codex/.omx to .gitignore

7a2ade0

coderabbitai bot reviewed Apr 5, 2026

claude-code-best merged commit 2b84333 into claude-code-best:main Apr 5, 2026
4 checks passed

coderabbitai bot mentioned this pull request Apr 6, 2026

feat: enable Computer Use on Windows and Linux #145

Merged
