feat: Computer Use Windows: cross-platform executor + Python Bridge + GUI accessibility #136

Closed
amDosion wants to merge 2 commits into claude-code-best:main from amDosion:feat/computer-use-windows

amDosion (Contributor) commented Apr 5, 2026

Summary

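Building on the three-platform Computer Use work already merged in PR #98, this PR substantially strengthens the Windows-specific capabilities:

  • Cross-platform executor (executorCrossPlatform.ts, 1143 lines): a unified tool-call execution layer that replaces the previously scattered per-platform logic
  • Python Bridge (bridge.py + bridgeClient.ts): a long-lived Python process replaces per-call PowerShell invocations; window enumeration takes 1.5 ms vs 500 ms, screenshots 360 ms vs 800 ms
  • GUI accessibility enhancements: an Accessibility Snapshot is attached to screenshots automatically; click_element / type_into_element operate on elements by name, with no coordinates needed
  • Win32 native modules: SendMessageW Unicode input, UI Automation element tree, OCR, window border tracking, virtual cursor
  • Full MCP server implementation: toolCalls (4205 lines) / tools (1052 lines) / executor / mcpServer, 12 files in total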
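
To illustrate what operating "by name" could look like, here is a minimal sketch that resolves an element name to a click point inside an accessibility tree. The `UiElement` shape, `findByName`, and `clickPointFor` are hypothetical names made up for this example; the actual snapshot format and lookup rules in accessibilitySnapshot.ts and uiAutomation.ts are not shown in this PR description.

```typescript
// Hypothetical shapes for illustration only; not the PR's real snapshot format.
interface Rect { x: number; y: number; width: number; height: number }

interface UiElement {
  name: string;
  controlType: string;
  rect: Rect;
  children: UiElement[];
}

// Depth-first search for the first element whose name matches (case-insensitive).
function findByName(root: UiElement, name: string): UiElement | undefined {
  const wanted = name.trim().toLowerCase();
  const stack: UiElement[] = [root];
  for (;;) {
    const el = stack.pop();
    if (!el) return undefined;
    if (el.name.trim().toLowerCase() === wanted) return el;
    // Push children reversed so they are visited in their original order.
    stack.push(...[...el.children].reverse());
  }
}

// Resolve a name to a click point: the centre of the element's bounding box.
function clickPointFor(root: UiElement, name: string): { x: number; y: number } | undefined {
  const el = findByName(root, name);
  if (!el) return undefined;
  return {
    x: Math.round(el.rect.x + el.rect.width / 2),
    y: Math.round(el.rect.y + el.rect.height / 2),
  };
}

// A made-up snapshot of a "Save As" dialog.
const snapshot: UiElement = {
  name: "Save As",
  controlType: "Window",
  rect: { x: 0, y: 0, width: 800, height: 600 },
  children: [
    { name: "File name", controlType: "Edit", rect: { x: 100, y: 500, width: 400, height: 24 }, children: [] },
    { name: "Save", controlType: "Button", rect: { x: 620, y: 540, width: 80, height: 30 }, children: [] },
  ],
};
```

With this sketch, `clickPointFor(snapshot, "save")` yields the centre of the Save button, so the caller never has to supply pixel coordinates itself.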

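New files (22)

| Module | File(s) | Description |
|---|---|---|
| Cross-platform | executorCrossPlatform.ts | Unified executor with platform dispatch |
| Cross-platform | platforms/{win32,darwin,linux,types,index}.ts | Platform abstraction layer |
| Win32 | win32/windowMessage.ts | SendMessageW + clipboard paste |
| Win32 | win32/bridge.py + bridgeClient.ts | Python Bridge, 17 methods |
| Win32 | win32/uiAutomation.ts | UI Automation element tree |
| Win32 | win32/accessibilitySnapshot.ts | Accessibility snapshot |
| Win32 | win32/ocr.ts | Windows.Media.Ocr |
| Win32 | win32/windowBorder.ts | Window border drawn with 4 overlay windows, 30 fps |
| Win32 | win32/windowEnum.ts | EnumWindows window enumeration |
| Win32 | win32/virtualCursor.ts | Virtual cursor rendering |
| Win32 | win32/inputIndicator.ts | Input state indicator |
| Win32 | win32/comExcel.ts / comWord.ts | COM automation |
| Win32 | win32/shared.ts / appDispatcher.ts | Shared utilities |
| MCP | toolCalls.ts / tools.ts / executor.ts / mcpServer.ts | Full MCP server |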
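
The platform abstraction layer listed above can be pictured as a lookup from a platform string to a backend. The following is a minimal sketch, not the PR's code: `PlatformBackend`, `backends`, `isSupported`, and `selectBackend` are hypothetical names, and the `mechanisms` lists only restate what this page says each backend relies on.

```typescript
// Hypothetical interface; the real one lives in platforms/types.ts and is not shown here.
interface PlatformBackend {
  readonly id: "win32" | "darwin" | "linux";
  // What each backend relies on, as described on this page.
  readonly mechanisms: readonly string[];
}

const backends: Record<PlatformBackend["id"], PlatformBackend> = {
  win32: { id: "win32", mechanisms: ["Python bridge", "UI Automation", "PowerShell", "COM"] },
  darwin: { id: "darwin", mechanisms: ["native packages"] },
  linux: { id: "linux", mechanisms: ["xdotool", "xrandr", "wmctrl"] },
};

// Type guard: narrows an arbitrary platform string to a supported backend id.
function isSupported(platform: string): platform is PlatformBackend["id"] {
  return Object.prototype.hasOwnProperty.call(backends, platform);
}

// Pick the backend for a platform string, failing loudly on anything unsupported.
function selectBackend(platform: string): PlatformBackend {
  if (!isSupported(platform)) {
    throw new Error("Computer Use is not supported on " + platform);
  }
  return backends[platform];
}
```

In a Node or Bun process the argument would typically be `process.platform`; it is passed in explicitly here so the sketch stays free of runtime-specific globals.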

Modified files (14)

Mainly removing macOS hard-coding, extending platform support, and adding feature flags.

Test plan

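  • Windows: bun run dev → Computer Use tools are available
  • Windows: screenshots, mouse clicks, and keyboard input work
  • Windows: click_element / type_into_element operate by name
  • Windows: Python Bridge starts automatically; window enumeration < 5 ms
  • macOS: existing functionality is unaffected
  • bun run build succeeds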

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Computer Use now supports Windows and Linux platforms (previously macOS-only).
    • Added window binding capability on Windows for focused interactions.
    • Introduced virtual keyboard and mouse input for bound windows.
    • Added UI Automation element targeting and interaction on Windows.
    • Introduced accessibility snapshots showing window elements and structure.
    • Added Excel and Word document automation on Windows.
    • Enabled terminal interaction support.
    • Added window management actions (minimize, maximize, close, move, resize).
  • Documentation

    • Added comprehensive Computer Use architecture and implementation guides.
    • Added detailed tools reference documentation.

unraid added 2 commits April 5, 2026 15:38
Three-platform Computer Use (macOS + Windows + Linux), with Windows-specific enhancements.

- MCP server: full implementation across 12 files (toolCalls/tools/executor/mcpServer, etc.)
- Platform abstraction layer: platforms/{win32,darwin,linux}.ts
- Cross-platform executor: executorCrossPlatform.ts
- CHICAGO_MCP + VOICE_MODE feature flags enabled

- windowMessage.ts: SendMessageW (WM_CHAR Unicode + clipboard paste)
- windowBorder.ts: window border drawn with 4 overlay windows (tracked at 30 fps)
- uiAutomation.ts: UI Automation element tree / click / set value
- accessibilitySnapshot.ts: accessibility snapshot → lets the model perceive the GUI
- bridge.py + bridgeClient.ts: long-lived Python process (replaces per-call PS)

- window_management: min/max/restore/close/focus (Win32 API)
- click_element / type_into_element: operate by name (no coordinates needed)
- Screenshots automatically include an Accessibility Snapshot

- 17 methods, JSON communication over stdin/stdout
- Window enumeration 1.5 ms vs 500 ms with PS, screenshot 360 ms vs 800 ms with PS
- Dependencies: mss + Pillow + pywinauto
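
The commit message only says the bridge speaks JSON over stdin/stdout. One common way to frame such traffic is one JSON object per line, and the sketch below assumes that framing. The field names (`id`, `method`, `params`, `result`, `error`), the `encodeRequest` helper, and the `LineDecoder` class are all hypothetical, chosen for illustration rather than taken from bridge.py or bridgeClient.ts.

```typescript
// Assumed wire format: one JSON object per line. Field names are hypothetical.
interface BridgeRequest { id: number; method: string; params: Record<string, unknown> }
interface BridgeResponse { id: number; result?: unknown; error?: string }

// Serialise a request for the child's stdin.
function encodeRequest(req: BridgeRequest): string {
  // JSON.stringify without an indent argument never emits a raw newline,
  // so "\n" is a safe message delimiter.
  return JSON.stringify(req) + "\n";
}

// Accumulates stdout chunks and returns the responses completed so far.
// A chunk may end in the middle of a line, so the tail is kept for the next push.
class LineDecoder {
  private buffer = "";

  push(chunk: string): BridgeResponse[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last piece is an incomplete line, or "" if the chunk ended on a newline.
    this.buffer = lines.pop() ?? "";
    return lines
      .filter((line) => line.trim().length > 0)
      .map((line) => JSON.parse(line) as BridgeResponse);
  }
}
```

Keeping the process alive means the interpreter and libraries such as mss or pywinauto load once; each later call pays only for one line written and one line read, which is consistent with the latency gap reported above.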

coderabbitai bot commented Apr 5, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

This PR refactors Computer Use to support cross-platform operation (macOS, Windows, Linux) by introducing a platform abstraction layer, making native packages macOS-only, implementing Windows/Linux backends with platform-specific APIs, and extending the executor interface with Windows-specific capabilities like window binding, UI Automation, and virtual input routing.
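
As a rough illustration of optional Windows-only capabilities combined with virtual input routing, the sketch below models an executor whose extra methods may be absent and routes generic typing to a bound window when one exists. `ExecutorSketch`, `boundWindow`, `virtualType`, and `routeType` are hypothetical names, and the functions return strings describing the chosen path instead of performing real input.

```typescript
// Hypothetical, trimmed-down executor shape; not the real ComputerExecutor interface.
interface ExecutorSketch {
  // Generic input path available on every platform.
  typeText(text: string): string;
  // Optional capabilities: only a Windows executor would provide these.
  boundWindow?: () => string | undefined; // HWND as a string, if a window is bound
  virtualType?: (hwnd: string, text: string) => string;
}

// Route generic typing to the bound window when possible,
// otherwise fall back to the platform-wide input path.
function routeType(executor: ExecutorSketch, text: string): string {
  const hwnd = executor.boundWindow?.();
  if (hwnd !== undefined && executor.virtualType) {
    return executor.virtualType(hwnd, text);
  }
  return executor.typeText(text);
}

// An executor without the optional capabilities (for example on macOS).
const macExecutor: ExecutorSketch = {
  typeText: (text) => "global:" + text,
};

// An executor that has a window bound (for example on Windows).
const winExecutor: ExecutorSketch = {
  typeText: (text) => "global:" + text,
  boundWindow: () => "0x00010F2A",
  virtualType: (hwnd, text) => "window " + hwnd + ":" + text,
};
```

Declaring the Windows-only members as optional lets the same calling code run against every executor: callers probe for the capability at run time rather than branching on the platform name.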

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Configuration & Metadata | .gitignore, DEV-LOG.md | Updated .gitignore to exclude agent/tool runtime directories and Python cache; removed Voice Mode entry from DEV-LOG.md. |
| Build & Formatting | build.ts | Reformatted string quotes, semicolons, and arrow function formatting without changing control flow or build logic. |
| Documentation | docs/features/computer-use-architecture-v2.md, docs/features/computer-use-tools-reference.md, docs/features/computer-use.md | Added comprehensive architecture and tools reference documentation; replaced user guide with phased implementation plan for cross-platform support. |
| Package Interfaces | packages/@ant/computer-use-input/src/index.ts, packages/@ant/computer-use-swift/src/index.ts | Restricted to macOS (darwin) only; moved type definitions inline, removed multi-platform dispatch, removed unsupported fallbacks, returning undefined for unsupported platforms. |
| Executor Interface | packages/@ant/computer-use-mcp/src/executor.ts | Extended ComputerExecutor with optional Windows-only capabilities: window binding/management, UI Automation, virtual input, terminal operations, accessibility snapshots. |
| Tool Definitions & Dispatch | packages/@ant/computer-use-mcp/src/tools.ts, packages/@ant/computer-use-mcp/src/toolCalls.ts | Added Windows-specific tool schemas; implemented new handlers for bound-window mode (virtual mouse/keyboard, UI Automation, window management); auto-route generic input to virtual input when window bound. |
| App Enumeration | packages/@ant/computer-use-swift/src/backends/darwin.ts | Switched app enumeration from Spotlight metadata to AppleScript-based /Applications scanning with deterministic bundle ID synthesis. |
| Platform Abstraction Layer | src/utils/computerUse/platforms/types.ts, src/utils/computerUse/platforms/index.ts, src/utils/computerUse/platforms/darwin.ts, src/utils/computerUse/platforms/linux.ts, src/utils/computerUse/platforms/win32.ts | Introduced unified cross-platform interface with per-platform backends; darwin delegates to native packages; linux uses xdotool/xrandr/wmctrl; Windows uses COM, UI Automation, PowerShell, and Python bridge. |
| Cross-Platform Executor | src/utils/computerUse/executorCrossPlatform.ts | New cross-platform executor with platform abstraction integration, HWND binding support, coordinate translation, window-bound input routing, and fallback paths. |
| Configuration & Setup | src/utils/computerUse/common.ts, src/utils/computerUse/swiftLoader.ts, src/utils/computerUse/hostAdapter.ts, src/utils/computerUse/executor.ts | Updated capability detection for win32/linux; enforced macOS-only guards on Swift loader; delegated non-macOS to cross-platform executor; simplified permission checks. |
| Windows Utilities | src/utils/computerUse/win32/shared.ts, src/utils/computerUse/win32/windowEnum.ts, src/utils/computerUse/win32/windowMessage.ts, src/utils/computerUse/win32/windowBorder.ts, src/utils/computerUse/win32/virtualCursor.ts, src/utils/computerUse/win32/inputIndicator.ts | Core Windows helpers: HWND validation, PowerShell wrappers, virtual-key mappings, window enumeration with string HWNDs, window-message-based input injection, DWM border marking, virtual cursor overlay, input indicator overlay. |
| Windows UIA & Accessibility | src/utils/computerUse/win32/uiAutomation.ts, src/utils/computerUse/win32/accessibilitySnapshot.ts | UI Automation element finding with control-type allowlist; accessibility tree snapshot via PowerShell with JSON parsing and text representation for models. |
| Windows App & File Automation | src/utils/computerUse/win32/appDispatcher.ts, src/utils/computerUse/win32/comExcel.ts, src/utils/computerUse/win32/comWord.ts | App type detection and dispatch; headless Excel workbook operations via COM (read/write cells, formulas, save); headless Word document operations via COM (read paragraphs, find/replace, insert text/tables, save/PDF). |
| Windows Python Bridge | src/utils/computerUse/win32/bridge.py, src/utils/computerUse/win32/bridgeClient.ts, src/utils/computerUse/win32/ocr.ts | Long-lived Python subprocess for screenshot capture (GDI PrintWindow, mss), window enumeration/management, target-window input dispatch, accessibility snapshot (pywinauto), and text pasting; TypeScript RPC client with request/response tracking and timeout handling; OCR utility using shared Python bridge. |
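
The "request/response tracking and timeout handling" mentioned for the RPC client can be sketched as a map of in-flight request ids, each with its own timer. This is an assumption about the general technique, not a copy of bridgeClient.ts; `PendingRequests`, `create`, and `settle` are hypothetical names.

```typescript
// Bookkeeping for one in-flight request.
type Settle = {
  resolve: (value: unknown) => void;
  reject: (err: Error) => void;
  timer: ReturnType<typeof setTimeout>;
};

class PendingRequests {
  private nextId = 1;
  private readonly pending = new Map<number, Settle>();

  // Register a request; the promise rejects if no response arrives within timeoutMs.
  create(timeoutMs: number): { id: number; promise: Promise<unknown> } {
    const id = this.nextId++;
    const promise = new Promise<unknown>((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(id);
        reject(new Error("bridge request " + id + " timed out after " + timeoutMs + " ms"));
      }, timeoutMs);
      this.pending.set(id, { resolve, reject, timer });
    });
    return { id, promise };
  }

  // Called once per decoded response; returns false for unknown or already-settled ids.
  settle(id: number, result?: unknown, error?: string): boolean {
    const entry = this.pending.get(id);
    if (!entry) return false;
    clearTimeout(entry.timer); // the response won the race against the timeout
    this.pending.delete(id);
    if (error !== undefined) entry.reject(new Error(error));
    else entry.resolve(result);
    return true;
  }

  // Number of requests still waiting for a response.
  get size(): number {
    return this.pending.size;
  }
}
```

A caller would write the request with the returned `id` to the child's stdin, then `await` the promise; the stdout reader calls `settle` for each response it decodes. Matching on ids rather than on arrival order keeps a late reply to a timed-out call from being mistaken for the answer to a newer one.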

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • feat: enable Computer Use with macOS + Windows + Linux support #98: Both PRs add overlapping cross-platform Computer Use architecture changes—platform dispatch, executor refactoring, packages becoming macOS-only (packages/@ant/computer-use-input and @ant/computer-use-swift), and introduction of multi-platform backends.
  • Feature/computer use/mac support #108: Both PRs modify shared Computer Use modules including app enumeration (packages/@ant/computer-use-swift/src/backends/darwin.ts), permission checks (src/utils/computerUse/hostAdapter.ts), and app resolution logic (toolCalls resolveRequestedApps).

Suggested reviewers

  • KonghaYao

Poem

🐰 Cross-platform dreams take flight,
From macOS roots to Linux bright!
Windows joins the morning light,
With abstractions nested right—
Computer Use spreads its wings tonight! 🚀

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.03% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the primary changes: cross-platform executor, Python Bridge integration, and GUI accessibility improvements for Windows Computer Use. |


amDosion closed this Apr 5, 2026