- AI-Ready: purpose-built JSON app structure for LLM agents (See → Think → Act loop)
- Remote Automation: drive UI on any Windows machine from anywhere via gRPC
- Language Agnostic: any language with a gRPC client can automate Windows apps
- MCP Server: plug-and-play integration with Model Context Protocol clients
- Full UI Control: click, type, scroll, screenshot, read properties, navigate trees
- Layered Security: TLS encryption, Bearer-token authentication, app + interaction whitelist/blacklist, key-input filtering
- Element Caching: live-validated cache with scoped invalidation by process or app name

UiAutomationGRPC gives AI agents structured vision into Windows applications. Instead of relying on screenshots and pixel coordinates, the agent receives a semantic JSON tree of every UI element (names, types, automation IDs, and bounding rectangles) and acts on elements by ID.
| Agent | Integration | Notes |
|---|---|---|
| Claude (Anthropic) | MCP Server / Skill | First-class MCP tool support + pre-built Skill definition |
| Google Antigravity | MCP Server / Skill | MCP tools + pre-built Skill definition |
| OpenAI Codex / ChatGPT | MCP / Programmatic | Via MCP bridge or direct gRPC SDK |
| Cursor | MCP Server | Native MCP client support |
| Windsurf (Codeium) | MCP Server | Native MCP client support |
| Custom Agents | gRPC SDK | Any language: Python, TypeScript, Go, Rust, … |
graph LR
A["See<br/>get_app_structure"] --> B["Think<br/>LLM analyzes JSON"]
B --> C["Act<br/>perform_action_with_structure"]
C --> A
classDef step fill:#1a365d,stroke:#64ffda,stroke-width:2px,color:#fff;
class A,B,C step;
- See: `get_app_structure` returns the full UI hierarchy as a JSON tree
- Think: the LLM identifies target elements by name, type, or automation ID
- Act: `perform_action_with_structure` executes the action and returns the refreshed tree in one call
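
The lookup that happens in the Think step can be sketched as a depth-first search over the JSON tree. This is a minimal illustration only: the property names `automationId`, `name`, and `children`, the `FindByAutomationId` helper, and the sample tree are assumptions made for the example, not the server's documented schema.

```csharp
using System;
using System.Text.Json;

// Hypothetical tree shape: "name", "automationId" and "children" are assumed
// property names for illustration; check the real get_app_structure output.
const string SampleTree = """
{
  "name": "Calculator",
  "automationId": "",
  "children": [
    {
      "name": "Number pad",
      "automationId": "NumberPad",
      "children": [
        { "name": "Four", "automationId": "num4Button", "children": [] },
        { "name": "Two", "automationId": "num2Button", "children": [] }
      ]
    },
    { "name": "Multiply", "automationId": "multiplyButton", "children": [] }
  ]
}
""";

// Depth-first search: returns the first node whose automationId matches, or null.
static JsonElement? FindByAutomationId(JsonElement node, string automationId)
{
    if (node.ValueKind != JsonValueKind.Object) return null;

    if (node.TryGetProperty("automationId", out var id)
        && id.ValueKind == JsonValueKind.String
        && id.GetString() == automationId)
    {
        return node;
    }

    if (node.TryGetProperty("children", out var children)
        && children.ValueKind == JsonValueKind.Array)
    {
        foreach (var child in children.EnumerateArray())
        {
            var match = FindByAutomationId(child, automationId);
            if (match is not null) return match;
        }
    }

    return null;
}

using var doc = JsonDocument.Parse(SampleTree);
var hit = FindByAutomationId(doc.RootElement, "multiplyButton");
Console.WriteLine(hit?.GetProperty("name").GetString() ?? "not found"); // prints "Multiply"
```

In practice the LLM does this matching itself from the raw JSON; a helper like this is mainly useful for scripted checks around the agent.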
User: "Open Calculator and compute 42 × 7"
1. open_app(app_name="calc")
2. get_app_structure(app_name="calc") → JSON tree
3. LLM: "I see buttons: Four, Two, Multiply, Seven, Equals"
4. perform_action_with_structure("num4Button", "INVOKE") → click 4, get new tree
5. perform_action_with_structure("num2Button", "INVOKE") → click 2
6. perform_action_with_structure("multiplyButton", "INVOKE")
7. perform_action_with_structure("num7Button", "INVOKE")
8. perform_action_with_structure("equalButton", "INVOKE")
9. LLM reads result from updated structure: "294"
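
The click sequence in steps 4 to 8 follows directly from the expression, one button per character. The sketch below shows that mapping. It assumes every digit button follows the `num<digit>Button` pattern seen in the walkthrough, which the source only shows for a few digits. `ToButtonIds` is a hypothetical helper, not part of the SDK.

```csharp
using System;
using System.Collections.Generic;

// Maps a simple multiplication expression to the automation IDs to invoke.
// Assumption: digit buttons are named "num<digit>Button", as in the walkthrough.
static List<string> ToButtonIds(string expression)
{
    var ids = new List<string>();
    foreach (char c in expression)
    {
        if (char.IsWhiteSpace(c)) continue;

        if (c >= '0' && c <= '9')
            ids.Add($"num{c}Button");
        else if (c == '×' || c == '*')
            ids.Add("multiplyButton");
        else
            throw new ArgumentException($"Unsupported character '{c}'", nameof(expression));
    }

    ids.Add("equalButton"); // finish by pressing Equals
    return ids;
}

Console.WriteLine(string.Join(", ", ToButtonIds("42 × 7")));
// prints: num4Button, num2Button, multiplyButton, num7Button, equalButton
```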
For traditional scripting and test automation, the .NET SDK provides a full async API.
using UiAutomationGRPC.Library;
await using var driver = new UiAutomationDriver("http://127.0.0.1:50051", insecureMode: true);
// Launch an application
var (success, message, processId) = await driver.OpenAppAsync("calc");
// Find an element by AutomationId
var element = await driver.FindElementAsync(new FindElementRequest
{
Condition = new Condition
{
PropertyCondition = new PropertyCondition
{
PropertyName = "AutomationId",
PropertyValue = "num9Button"
}
},
Scope = TreeScope.Descendants
});
// Interact
await driver.PerformActionAsync(element.RuntimeId, ActionType.Invoke);
// Virtual input helpers
var keyboard = new VirtualKeyboard(driver);
await keyboard.SendWaitAsync("2+2=");
var mouse = new VirtualMouse(driver);
await mouse.LeftClickAsync(element.RuntimeId);

dotnet add package UiAutomationGRPC

graph TD
subgraph Clients
Script["Automation Script"]
LLM["LLM / AI Agent"]
end
subgraph SDK
Library["UiAutomationGRPC.Library<br/>.NET 8 SDK"]
MCP["MCP Server<br/>.NET 8"]
Skill["Skill<br/>gRPCurl"]
end
Script --> Library
LLM --> MCP
LLM --> Skill
Library -->|gRPC| Server["UiAutomationGRPC.Server<br/>.NET 8 (Windows)"]
MCP -->|gRPC| Server
Skill -->|gRPC| Server
Server -->|Windows UIA| Target["Target Application"]
classDef client fill:#0d548c,stroke:#64ffda,stroke-width:2px,color:#fff;
classDef sdk fill:#2d6a4f,stroke:#64ffda,stroke-width:2px,color:#fff;
classDef server fill:#4c381e,stroke:#64ffda,stroke-width:2px,color:#fff;
class Script,LLM client;
class Library,MCP,Skill sdk;
class Server,Target server;
| Component | Description | Target |
|---|---|---|
| UiAutomationGRPC.Server | Core gRPC service: exposes Windows UI Automation over the network | net8.0-windows |
| UiAutomationGRPC.Library | .NET client SDK: `UiAutomationDriver`, `VirtualMouse`, `VirtualKeyboard` | net8.0-windows |
| UiAutomationGRPC.AI | AI integration: MCP Server + Skill definitions for LLM agents | net8.0 |
| UiAutomationGRPC.Client | Sample console app: Calculator automation reference implementation | net8.0-windows |
| | Direct Element Work | App Structure (LLM-Friendly) |
|---|---|---|
| API | `FindElement`, `GetChildren`, `PerformAction` | `GetAppStructure`, `PerformActionWithStructure` |
| Best For | Scripts, known UI hierarchies | LLMs, dynamic exploration |
| Overhead | Low | Higher (builds JSON tree) |
| State | Per-element | Full application |
UI Automation pattern actions (require an element RuntimeId):
| Action | Description |
|---|---|
| `Invoke` | Click / activate via the UIA InvokePattern |
| `Toggle` | Checkboxes, switches |
| `SetValue` | Set text in input fields |
| `Select` | Select list items |
| `SetFocus` | Focus an element |
| `ExpandCollapse` | Expand / collapse tree nodes, menus (pass `collapse` to collapse) |
| `LeftClick` / `RightClick` / `DoubleClick` | Simulated mouse click at the element's clickable point |
| `MoveTo` | Move the cursor to the element's center |
Simulated input actions (no element required; driven by VirtualMouse / VirtualKeyboard):
| Action | Description |
|---|---|
| `Move` | Move the cursor to absolute screen coordinates |
| `LeftClick` / `RightClick` / `MouseMiddleClick` | Click at the current cursor position |
| `LeftDown` / `LeftUp` / `RightDown` / `RightUp` | Press / release a mouse button |
| `MouseWheelScroll` | Scroll the mouse wheel |
Separate RPCs cover SendKeys (keyboard input, including modifier combinations like ^s) and TakeScreenshot (capture an element or window).
| Component | Requirement |
|---|---|
| Server | Windows, .NET 8 SDK, Administrator privileges |
| Library | .NET 8 (Windows) |
| MCP | .NET 8 SDK |
cd UiAutomationGRPC.Server
dotnet run

Default endpoint: localhost:50051
await using var driver = new UiAutomationDriver("http://127.0.0.1:50051", insecureMode: true);
var (success, message, processId) = await driver.OpenAppAsync("notepad");

Build the MCP server binary first:

dotnet build UiAutomationGRPC.AI/MCP/UiAutomationGRPC.LLM.csproj

Claude Code (VS Code extension): a .mcp.json file is already present at the repo root. Reload VS Code and approve the uiautomation server when prompted.
Other MCP clients (Claude Desktop, Cursor, Windsurf): configure your client to run:
dotnet run --no-build --project UiAutomationGRPC.AI/MCP

The agent can immediately start the See → Think → Act loop.
See MCP README for full setup details and troubleshooting.
Three security modes, configured in appsettings.json:
| Mode | Encryption | Authentication | Use Case |
|---|---|---|---|
| Insecure (default) | None (plain HTTP) | None | Local development |
| HTTPS | TLS | None | Encrypted communication |
| HTTPS + Token | TLS | Bearer token | Production |
// Development: plain HTTP, no authentication
await using var driver = new UiAutomationDriver("http://127.0.0.1:50051", insecureMode: true);
// Production: TLS + Bearer token (use this instead of the development line, not alongside it)
await using var driver = new UiAutomationDriver("https://127.0.0.1:50051", authToken: "your-token");

Additional access controls (all configured in appsettings.json):
- App Whitelist / Blacklist: restrict which applications `OpenApp` can launch (with per-app argument filtering).
- Interaction restrictions: gate element interactions, structure reads, and process termination against the same allow/deny lists.
- Key-input filtering: whitelist/blacklist what `SendKeys` may send.
See Server README for full configuration details.
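
To make the allow/deny idea concrete, here is a minimal sketch of how such a check could be evaluated. The semantics are assumptions for illustration only: the blacklist wins, an empty whitelist means "allow anything not blacklisted", and matching ignores case. The server's real rules and configuration keys are described in the Server README, and `IsAppAllowed` is a hypothetical helper, not part of the project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed semantics (not confirmed by the source):
//   1. a blacklist match always denies;
//   2. an empty whitelist allows everything that is not blacklisted;
//   3. names are compared case-insensitively.
static bool IsAppAllowed(
    string appName,
    IReadOnlyCollection<string> whitelist,
    IReadOnlyCollection<string> blacklist)
{
    var comparer = StringComparer.OrdinalIgnoreCase;

    if (blacklist.Contains(appName, comparer)) return false;
    return whitelist.Count == 0 || whitelist.Contains(appName, comparer);
}

var allow = new[] { "calc", "notepad" };
var deny = new[] { "cmd" };

Console.WriteLine(IsAppAllowed("calc", allow, deny));  // prints "True"
Console.WriteLine(IsAppAllowed("cmd", allow, deny));   // prints "False"
Console.WriteLine(IsAppAllowed("paint", allow, deny)); // prints "False"
```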
| Document | Contents |
|---|---|
| Server README | API reference, security, configuration, installation |
| Library README | SDK usage guide, API reference, input helpers |
| AI README | AI/LLM integration overview |
| MCP README | MCP server setup & tool documentation |
| Client README | Calculator automation walkthrough |
This project is licensed under the Apache License 2.0.
