
Donsezan/UiAutomationGRPC


UiAutomationGRPC

Drive any Windows desktop application through AI agents or code, over gRPC



✨ Key Features

  • AI-Ready: purpose-built JSON app structure for LLM agents (See → Think → Act loop)
  • Remote Automation: drive the UI of any Windows machine running the server from any host that can reach it over gRPC
  • Language Agnostic: any language with a gRPC client can automate Windows apps
  • MCP Server: plug-and-play integration with Model Context Protocol clients
  • Full UI Control: click, type, scroll, screenshot, read properties, navigate trees
  • Layered Security: TLS encryption, Bearer-token authentication, app and interaction whitelist/blacklist, key-input filtering
  • Element Caching: live-validated cache with scoped invalidation by process or app name
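To make the caching bullet concrete, here is a minimal in-memory sketch of a live-validated cache with scoped invalidation. It is not the server's code: the dictionary keyed by runtime id, the `isAlive` delegate standing in for a real UI Automation liveness check, and the helper names `TryGetLive` and `Invalidate` are all hypothetical.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element cache: runtime id -> owning process id and app name.
var cache = new Dictionary<string, (int ProcessId, string AppName)>();

// Return a cached entry only if it still resolves to a live element; otherwise evict it.
// isAlive stands in for a real UI Automation check (an assumption of this sketch).
static bool TryGetLive(Dictionary<string, (int ProcessId, string AppName)> entries,
                       string runtimeId, Func<string, bool> isAlive,
                       out (int ProcessId, string AppName) entry)
{
    if (entries.TryGetValue(runtimeId, out entry) && isAlive(runtimeId))
        return true;
    entries.Remove(runtimeId); // missing or stale: the caller must re-resolve the element
    entry = default;
    return false;
}

// Scoped invalidation: evict every entry matching the scope predicate, return how many were removed.
static int Invalidate(Dictionary<string, (int ProcessId, string AppName)> entries,
                      Func<(int ProcessId, string AppName), bool> inScope)
{
    var stale = entries.Where(kv => inScope(kv.Value)).Select(kv => kv.Key).ToList();
    foreach (var runtimeId in stale)
        entries.Remove(runtimeId);
    return stale.Count;
}

cache["42.1"] = (1200, "calc");
cache["42.2"] = (1200, "calc");
cache["77.1"] = (3400, "notepad");

// Invalidate everything owned by process 1200.
Console.WriteLine(Invalidate(cache, e => e.ProcessId == 1200)); // 2
Console.WriteLine(cache.Count);                                 // 1
```

Scoping by app name is the same call with a different predicate, for example `e => string.Equals(e.AppName, "notepad", StringComparison.OrdinalIgnoreCase)`; whether the real server compares names case-insensitively is not stated here.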

🤖 LLM Agent Integration

UiAutomationGRPC gives AI agents structured vision into Windows applications. Instead of relying on screenshots and pixel coordinates, the agent receives a semantic JSON tree of every UI element (names, types, automation IDs, and bounding rectangles) and acts on elements by ID.

Supported Agents

| Agent | Integration | Notes |
|---|---|---|
| Claude (Anthropic) | MCP Server / Skill | First-class MCP tool support + pre-built Skill definition |
| Google Antigravity | MCP Server / Skill | MCP tools + pre-built Skill definition |
| OpenAI Codex / ChatGPT | MCP / Programmatic | Via MCP bridge or direct gRPC SDK |
| Cursor | MCP Server | Native MCP client support |
| Windsurf (Codeium) | MCP Server | Native MCP client support |
| Custom Agents | gRPC SDK | Any language: Python, TypeScript, Go, Rust, … |

How It Works: See → Think → Act

```mermaid
graph LR
    A["🔍 See<br/>get_app_structure"] --> B["🧠 Think<br/>LLM analyzes JSON"]
    B --> C["⚑ Act<br/>perform_action_with_structure"]
    C --> A

    classDef step fill:#1a365d,stroke:#64ffda,stroke-width:2px,color:#fff;
    class A,B,C step;
```
  1. See: get_app_structure returns the full UI hierarchy as a JSON tree.
  2. Think: the LLM identifies target elements by name, type, or automation ID.
  3. Act: perform_action_with_structure executes the action and returns the refreshed tree in the same call.
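In the loop above the LLM does the matching itself, but a scripted agent (or a tool that trims the tree before handing it to a model) can do the same lookup in code. A minimal sketch using System.Text.Json, assuming each node carries `Name`, `ControlType`, `AutomationId` and a `Children` array; those property names and the helper `FindAutomationId` are placeholders, not the documented get_app_structure schema.

```csharp
#nullable enable
using System;
using System.Text.Json;

// Depth-first search for the first element whose Name matches, returning its AutomationId.
// Property names (Name, ControlType, AutomationId, Children) are assumed, not a documented schema.
static string? FindAutomationId(JsonElement node, string name)
{
    if (node.ValueKind != JsonValueKind.Object)
        return null;

    if (node.TryGetProperty("Name", out var nodeName) && nodeName.ValueKind == JsonValueKind.String &&
        nodeName.GetString() == name &&
        node.TryGetProperty("AutomationId", out var id) && id.ValueKind == JsonValueKind.String)
        return id.GetString();

    if (node.TryGetProperty("Children", out var children) && children.ValueKind == JsonValueKind.Array)
    {
        foreach (var child in children.EnumerateArray())
        {
            var found = FindAutomationId(child, name);
            if (found is not null)
                return found;
        }
    }
    return null;
}

// A tiny hand-written tree in the assumed shape.
const string sample = """
    {
      "Name": "Calculator", "ControlType": "Window", "AutomationId": "",
      "Children": [
        { "Name": "Four",  "ControlType": "Button", "AutomationId": "num4Button", "Children": [] },
        { "Name": "Seven", "ControlType": "Button", "AutomationId": "num7Button", "Children": [] }
      ]
    }
    """;

using var doc = JsonDocument.Parse(sample);
Console.WriteLine(FindAutomationId(doc.RootElement, "Seven")); // num7Button
```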

Example (MCP Tool Calls)

User: "Open Calculator and compute 42 × 7"

```
1. open_app(app_name="calc")
2. get_app_structure(app_name="calc")                       → JSON tree
3. LLM: "I see buttons: Four, Two, Multiply, Seven, Equals"
4. perform_action_with_structure("num4Button", "INVOKE")      → click 4, get new tree
5. perform_action_with_structure("num2Button", "INVOKE")      → click 2
6. perform_action_with_structure("multiplyButton", "INVOKE")
7. perform_action_with_structure("num7Button", "INVOKE")
8. perform_action_with_structure("equalButton", "INVOKE")
9. LLM reads result from updated structure: "294"
```
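Steps 4 to 8 follow a fixed pattern, so a scripted client could derive the button sequence from the expression instead of hard-coding it. The sketch below uses a hypothetical helper `ButtonSequence` and covers only the AutomationIds that appear in the example above (`numNButton`, `multiplyButton`, `equalButton`); any other operator would need its real id confirmed from get_app_structure first.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: map "42 × 7" to the Calculator AutomationIds to INVOKE, in order.
// Only ids used in the README example are covered; other characters are rejected.
static List<string> ButtonSequence(string expression)
{
    var ids = new List<string>();
    foreach (char c in expression)
    {
        if (char.IsWhiteSpace(c))
            continue;
        ids.Add(c switch
        {
            >= '0' and <= '9' => $"num{c}Button",
            '*' or '×' => "multiplyButton",
            _ => throw new ArgumentException($"No button mapping for '{c}'", nameof(expression)),
        });
    }
    ids.Add("equalButton"); // finish with "=" to produce the result
    return ids;
}

Console.WriteLine(string.Join(" ", ButtonSequence("42 × 7")));
// num4Button num2Button multiplyButton num7Button equalButton
```

Each id would then be passed to perform_action_with_structure(id, "INVOKE"), exactly as in steps 4 to 8.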

💻 Programmatic Automation

For traditional scripting and test automation, the .NET SDK provides a full async API.

Quick Example

```csharp
using System;
using UiAutomationGRPC.Library;

await using var driver = new UiAutomationDriver("http://127.0.0.1:50051", insecureMode: true);

// Launch an application and stop early if the server refused or failed to start it
var (success, message, processId) = await driver.OpenAppAsync("calc");
if (!success)
    throw new InvalidOperationException($"OpenApp failed: {message}");

// Find an element by AutomationId
var element = await driver.FindElementAsync(new FindElementRequest
{
    Condition = new Condition
    {
        PropertyCondition = new PropertyCondition
        {
            PropertyName = "AutomationId",
            PropertyValue = "num9Button"
        }
    },
    Scope = TreeScope.Descendants
});

// Interact
await driver.PerformActionAsync(element.RuntimeId, ActionType.Invoke);

// Virtual input helpers.
// Assuming SendKeys-style syntax (as with ^s), '+' means Shift, so a literal plus is escaped as {+}.
var keyboard = new VirtualKeyboard(driver);
await keyboard.SendWaitAsync("2{+}2=");

var mouse = new VirtualMouse(driver);
await mouse.LeftClickAsync(element.RuntimeId);
```

Install via NuGet

```shell
dotnet add package UiAutomationGRPC
```

πŸ—οΈ Architecture

```mermaid
graph TD
    subgraph Clients
        Script["📝 Automation Script"]
        LLM["🤖 LLM / AI Agent"]
    end

    subgraph SDK
        Library["UiAutomationGRPC.Library<br/>.NET 8 SDK"]
        MCP["MCP Server<br/>.NET 8"]
        Skill["Skill<br/>gRPCurl"]
    end

    Script --> Library
    LLM --> MCP
    LLM --> Skill
    Library -->|gRPC| Server["UiAutomationGRPC.Server<br/>.NET 8 (Windows)"]
    MCP -->|gRPC| Server
    Skill -->|gRPC| Server
    Server -->|Windows UIA| Target["🖥️ Target Application"]

    classDef client fill:#0d548c,stroke:#64ffda,stroke-width:2px,color:#fff;
    classDef sdk fill:#2d6a4f,stroke:#64ffda,stroke-width:2px,color:#fff;
    classDef server fill:#4c381e,stroke:#64ffda,stroke-width:2px,color:#fff;

    class Script,LLM client;
    class Library,MCP,Skill sdk;
    class Server,Target server;
```

📦 Project Structure

| Component | Description | Target |
|---|---|---|
| UiAutomationGRPC.Server | Core gRPC service: exposes Windows UI Automation over the network | net8.0-windows |
| UiAutomationGRPC.Library | .NET client SDK: UiAutomationDriver, VirtualMouse, VirtualKeyboard | net8.0-windows |
| UiAutomationGRPC.AI | AI integration: MCP Server + Skill definitions for LLM agents | net8.0 |
| UiAutomationGRPC.Client | Sample console app: Calculator automation reference implementation | net8.0-windows |

Two Automation Approaches

| | Direct Element Work | App Structure (LLM-Friendly) |
|---|---|---|
| API | FindElement, GetChildren, PerformAction | GetAppStructure, PerformActionWithStructure |
| Best For | Scripts, known UI hierarchies | LLMs, dynamic exploration |
| Overhead | Low | Higher (builds JSON tree) |
| State | Per-element | Full application |

Available Actions

UI Automation pattern actions (require an element RuntimeId):

| Action | Description |
|---|---|
| Invoke | Click / activate via the UIA InvokePattern |
| Toggle | Toggle checkboxes and switches |
| SetValue | Set text in input fields |
| Select | Select list items |
| SetFocus | Focus an element |
| ExpandCollapse | Expand / collapse tree nodes and menus (pass the value collapse to collapse) |
| LeftClick / RightClick / DoubleClick | Simulated mouse click at the element's clickable point |
| MoveTo | Move the cursor to the element's center |

Simulated input actions (no element required; driven by VirtualMouse / VirtualKeyboard):

| Action | Description |
|---|---|
| Move | Move the cursor to absolute screen coordinates |
| LeftClick / RightClick / MouseMiddleClick | Click at the current cursor position |
| LeftDown / LeftUp / RightDown / RightUp | Press / release a mouse button |
| MousWeelScroll | Scroll the mouse wheel |
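If the server implements Move and MousWeelScroll on top of Win32 `SendInput` (an assumption; this README does not say how input is injected), the values it sends look roughly like the sketch below. It shows arithmetic only: `SendInput` with `MOUSEEVENTF_ABSOLUTE` expects coordinates normalized to 0..65535 across the primary screen, and one wheel notch is `WHEEL_DELTA` (120). `ToNormalizedAbsolute` and `ToWheelDelta` are hypothetical names, and the pixel mapping shown is one common choice, not the only one.

```csharp
using System;

// Map a pixel coordinate to SendInput's normalized absolute range, where the first pixel
// maps to 0 and the last pixel to 65535 (one common mapping; an assumption of this sketch).
static int ToNormalizedAbsolute(int pixel, int screenPixels) =>
    (int)Math.Round(pixel * 65535.0 / (screenPixels - 1));

// One wheel notch is WHEEL_DELTA (120); positive values rotate away from the user (scroll up).
static int ToWheelDelta(int notches)
{
    const int WheelDelta = 120;
    return notches * WheelDelta;
}

Console.WriteLine(ToNormalizedAbsolute(1919, 1920)); // 65535
Console.WriteLine(ToWheelDelta(-3));                 // -360
```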

Separate RPCs cover SendKeys (keyboard input, including modifier combinations like ^s) and TakeScreenshot (capture an element or window).
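The `^s` notation matches the `System.Windows.Forms.SendKeys` convention: `^` is Ctrl, `+` is Shift, `%` is Alt, and braces wrap a named key or a literal modifier character. Assuming the server's SendKeys follows the same convention, a chord breaks down as in this sketch; `ParseChord` is a hypothetical helper that handles only a single trailing key, not the full SendKeys grammar.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Split a SendKeys-style chord ("^s", "^+s", "%{F4}") into modifier names and the final key.
// Convention assumed: ^ = Ctrl, + = Shift, % = Alt; braces wrap a named or literal key.
static (List<string> Modifiers, string Key) ParseChord(string chord)
{
    var modifiers = new List<string>();
    int i = 0;
    while (i < chord.Length)
    {
        string? modifier = chord[i] switch { '^' => "Ctrl", '+' => "Shift", '%' => "Alt", _ => null };
        if (modifier is null)
            break;
        modifiers.Add(modifier);
        i++;
    }

    string key = chord[i..];
    if (key.Length >= 3 && key[0] == '{' && key[^1] == '}')
        key = key[1..^1]; // "{+}" -> "+", "{F4}" -> "F4"
    else if (key.Length != 1)
        throw new ArgumentException($"Expected exactly one key after the modifiers in '{chord}'", nameof(chord));

    return (modifiers, key);
}

var parsed = ParseChord("^+s");
Console.WriteLine($"{string.Join("+", parsed.Modifiers)}+{parsed.Key}"); // Ctrl+Shift+s
```

Under this convention a bare `+` inside ordinary text is a Shift prefix, which is why a literal plus has to be written `{+}`.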


🚀 Getting Started

1. Prerequisites

| Component | Requirement |
|---|---|
| Server | Windows, .NET 8 SDK, Administrator privileges |
| Library | .NET 8 (Windows) |
| MCP | .NET 8 SDK |

2. Start the Server

```shell
cd UiAutomationGRPC.Server
dotnet run
```

Default endpoint: localhost:50051

3a. Use via .NET SDK

```csharp
await using var driver = new UiAutomationDriver("http://127.0.0.1:50051", insecureMode: true);
var (success, message, processId) = await driver.OpenAppAsync("notepad");
```

3b. Use via MCP (AI Agents)

Build the MCP server binary first:

```shell
dotnet build UiAutomationGRPC.AI/MCP/UiAutomationGRPC.LLM.csproj
```

Claude Code (VS Code extension): a .mcp.json file is already present at the repo root. Reload VS Code and approve the uiautomation server when prompted.

Other MCP clients (Claude Desktop, Cursor, Windsurf): configure your client to run:

```shell
dotnet run --no-build --project UiAutomationGRPC.AI/MCP
```

The agent can then start the See → Think → Act loop immediately.

See MCP README for full setup details and troubleshooting.


🔒 Security

Three security modes, configured in appsettings.json:

| Mode | Encryption | Authentication | Use Case |
|---|---|---|---|
| Insecure (default) | ❌ HTTP | ❌ None | Local development |
| HTTPS | ✅ TLS | ❌ None | Encrypted communication |
| HTTPS + Token | ✅ TLS | ✅ Bearer token | Production |

```csharp
// Development: plain HTTP, no authentication
await using var devDriver = new UiAutomationDriver("http://127.0.0.1:50051", insecureMode: true);

// Production: TLS plus Bearer token (a real program would normally create only one of these)
await using var prodDriver = new UiAutomationDriver("https://127.0.0.1:50051", authToken: "your-token");
```
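Hard-coding the token, as in the production line above, is fine for a demo. A minimal sketch of resolving the endpoint and token from the environment instead: the variable names `UIA_GRPC_URL` and `UIA_GRPC_TOKEN` and the helper `ResolveConnection` are hypothetical, and refusing to pair a token with a plain `http://` endpoint is a design choice of this sketch, not documented server behavior.

```csharp
#nullable enable
using System;

// Hypothetical helper: pick connection settings from the environment instead of source code.
// UIA_GRPC_URL and UIA_GRPC_TOKEN are illustrative names only.
static (string Url, string? AuthToken, bool InsecureMode) ResolveConnection(Func<string, string?> getEnv)
{
    string url = getEnv("UIA_GRPC_URL") ?? "http://127.0.0.1:50051";
    string? token = getEnv("UIA_GRPC_TOKEN");
    bool isHttps = url.StartsWith("https://", StringComparison.OrdinalIgnoreCase);

    // A Bearer token sent over plain HTTP travels in clear text, so refuse that combination.
    if (token is not null && !isHttps)
        throw new InvalidOperationException("Use an https:// endpoint when sending an auth token.");

    return (url, token, InsecureMode: !isHttps);
}

var settings = ResolveConnection(Environment.GetEnvironmentVariable);
Console.WriteLine($"{settings.Url} insecure={settings.InsecureMode}");
```

When `InsecureMode` is true the values fit the development constructor (`insecureMode: true`); when a token is present they fit the production form (`authToken: ...`). The HTTPS-without-token mode is configured on the server, so check the Server README for the matching client options.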

Additional access controls (all configured in appsettings.json):

  • App Whitelist / Blacklist: restrict which applications OpenApp can launch (with per-app argument filtering).
  • Interaction restrictions: gate element interactions, structure reads, and process termination against the same allow/deny lists.
  • Key-input filtering: whitelist/blacklist what SendKeys may send.
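All three controls boil down to checking a value against an allow list and a deny list. The sketch below shows one plausible precedence (deny always wins; an empty allow list means "anything not denied") with case-insensitive matching; both rules are assumptions, so confirm the actual semantics in the Server README. `IsAllowed` is a hypothetical helper, and the same check could be applied to app names or to SendKeys strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical precedence: the deny list always wins, and an empty allow list means
// "allow anything not denied". Matching is case-insensitive (also an assumption).
static bool IsAllowed(string name, IReadOnlyCollection<string> allow, IReadOnlyCollection<string> deny)
{
    var comparer = StringComparer.OrdinalIgnoreCase;
    if (deny.Contains(name, comparer))
        return false;
    return allow.Count == 0 || allow.Contains(name, comparer);
}

var allowList = new[] { "calc", "notepad" };
var denyList = new[] { "cmd", "powershell" };

Console.WriteLine(IsAllowed("Notepad", allowList, denyList)); // True
Console.WriteLine(IsAllowed("cmd", allowList, denyList));     // False
Console.WriteLine(IsAllowed("mspaint", allowList, denyList)); // False (not on the allow list)
```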

See Server README for full configuration details.


📖 Documentation

| Document | Contents |
|---|---|
| Server README | API reference, security, configuration, installation |
| Library README | SDK usage guide, API reference, input helpers |
| AI README | AI/LLM integration overview |
| MCP README | MCP server setup & tool documentation |
| Client README | Calculator automation walkthrough |

📄 License

This project is licensed under the Apache License 2.0.
