joue-zero/computer-use-agent

Computer Use Agent — Backend

Author: Youssef Elnaggar

A production-quality FastAPI backend that replaces the Streamlit interface in Anthropic's computer-use-demo with a scalable, concurrent session management API.


Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        Browser UI (HTML/JS)                     │
│  Sidebar (sessions) │  VNC iframe (noVNC) │  Chat + SSE stream  │
└───────────┬─────────────────────────┬───────────────────────────┘
            │ REST (create/list/del)  │ REST + SSE (per-session)
            ▼                         ▼
┌──────────────────────┐    ┌────────────────────────────────────┐
│   Orchestrator API   │    │   Session Runtime Container (N)    │
│   (FastAPI, :7000)   │    │   (FastAPI, :8000 inside container)│
│                      │    │                                    │
│  POST /sessions  ────┼───►│  /run     → agent loop (thread)    │
│  GET  /sessions      │    │  /history → persisted messages     │
│  DEL  /sessions/:id  │    │  /events/stream → SSE              │
│                      │    │  /session → session status         │
└──────────┬───────────┘    └──────────┬──────────┬──────────────┘
           │                           │          │
           ▼                           ▼          ▼
     ┌──────────┐              ┌────────────┐  ┌────────────────┐
     │ Postgres │              │ Xvfb + VNC │  │  Anthropic API │
     │ (shared) │              │   noVNC    │  │ (claude-sonnet)│
     └──────────┘              └────────────┘  └────────────────┘

Key design decisions

| Decision | Rationale |
| --- | --- |
| One Docker container per session | True parallel execution: each session has its own Xvfb desktop, with no shared VM lock |
| SSE (Server-Sent Events) | Simple, reliable, browser-native streaming; no WebSocket upgrade needed |
| Postgres for persistence | Shared database lets all containers write chat history and events to one place |
| threading.Lock per session | Prevents a second HTTP request from submitting a new task while the agent is running, without any external dependency |
| Full history replay | Every call to /run rebuilds the Anthropic messages list from the DB, giving the agent proper multi-turn context |
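
The full-history-replay decision can be illustrated with a minimal sketch. It assumes each persisted row carries a role, plain-text content, and a creation timestamp; `StoredMessage` and `build_messages` are hypothetical names for illustration, not the project's actual ORM model or helper. The output uses the list-of-`{"role", "content"}` shape that the Anthropic Messages API accepts.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class StoredMessage:
    """Hypothetical shape of one persisted chat row (not the real ORM model)."""
    role: str            # "user" or "assistant"
    content: str
    created_at: datetime


def build_messages(rows: list[StoredMessage]) -> list[dict]:
    """Rebuild the messages list for the next API call from stored rows."""
    # Sort by timestamp so the replayed conversation is in chronological order.
    ordered = sorted(rows, key=lambda row: row.created_at)
    return [
        {"role": row.role, "content": [{"type": "text", "text": row.content}]}
        for row in ordered
    ]
```

Because the list is rebuilt on every call, a container restart would lose no conversational context as long as the rows are still in Postgres.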

Project layout

.
├── orchestrator/               # Orchestrator service
│   ├── main.py                 # FastAPI app — sessions CRUD + Docker spawn
│   ├── config.py               # Settings (docker, db, image names)
│   ├── db.py                   # Postgres ORM — session_runtimes table
│   ├── docker_manager.py       # DockerSessionSpawner (spawn / stop_and_remove)
│   ├── schemas.py              # Pydantic response models
│   └── static/                 # Browser UI (Tailwind, vanilla JS)
│       ├── index.html
│       └── app.js
│
├── app/                        # Session runtime (runs inside each container)
│   ├── runtime_main.py         # FastAPI app — /run, /history, /events/stream, /session
│   ├── core/config.py          # Runtime settings (model, db url, api key)
│   ├── schemas/session.py      # Shared Pydantic schemas
│   └── services/
│       ├── session_manager.py  # threading.Lock-based session locking
│       ├── runner.py           # Background thread: sampling loop + SSE events
│       ├── database.py         # SQLAlchemy ORM (sessions, messages, events)
│       ├── event_bus.py        # Thread-safe in-process SSE publisher
│       └── anthropic_adapter.py # Imports sampling_loop from quickstart repo
│
├── Dockerfile.orchestrator     # Slim Python image for the orchestrator
├── Dockerfile.session          # Built from ubuntu:22.04; copies loop+tools from computer-use-demo/
├── session_entrypoint.sh       # Container startup: Xvfb → VNC → noVNC → uvicorn
├── docker-compose.yml          # Orchestrator + Postgres + session-runtime build target
└── pyproject.toml

Quick start

Prerequisites

1 — Configure environment

cp .env.example .env
# Edit .env and set ANTHROPIC_API_KEY

2 — Build the session runtime image

Built once. The orchestrator spawns fresh containers from it on demand.

docker compose build session-runtime

3 — Start everything

docker compose up -d

Open http://localhost:7000 in your browser.


API reference

The orchestrator exposes a small REST API at http://localhost:7000.

Sessions

| Method | Path | Description |
| --- | --- | --- |
| POST | /sessions | Create a session; spawns a new VM container |
| GET | /sessions | List all active sessions |
| GET | /sessions/{id} | Get a single session |
| DELETE | /sessions/{id} | Stop container and remove session |

POST /sessions response

{
  "session_id": "uuid",
  "api_base_url": "http://127.0.0.1:49200",
  "vnc_web_url":  "http://127.0.0.1:49201/vnc.html",
  "vnc_raw_url":  "vnc://127.0.0.1:49202"
}

Session runtime API (per-container, accessed via api_base_url)

| Method | Path | Description |
| --- | --- | --- |
| POST | /run | Submit a task; returns 202 immediately |
| GET | /history | All persisted user + assistant messages |
| GET | /events | All stored events (used to reconstruct full history on page reload) |
| GET | /events/stream | SSE stream of real-time agent events |
| GET | /session | Session status and current state |

POST /run request / response

// Request
{ "content": "Search the weather in Dubai" }

// Response 202
{
  "id": "uuid",
  "session_id": "uuid",
  "role": "user",
  "content": "Search the weather in Dubai",
  "created_at": "2026-05-01T19:00:00Z"
}

// Response 409 — agent already busy
{ "detail": "Session is currently processing a task — wait for it to finish" }
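
A client for the request and responses above might look roughly like the following sketch, which uses only the standard library. The helper names `build_run_request` and `describe_run_status` are made up for illustration; only the `/run` path, the `content` field, and the 202/409 status codes come from this README.

```python
import json
import urllib.request


def build_run_request(api_base_url: str, content: str) -> urllib.request.Request:
    """Build the POST /run request for one session runtime."""
    body = json.dumps({"content": content}).encode("utf-8")
    return urllib.request.Request(
        url=api_base_url.rstrip("/") + "/run",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def describe_run_status(status: int) -> str:
    """Map the documented status codes to a short client-side outcome."""
    if status == 202:
        return "accepted"   # task queued; progress arrives over SSE
    if status == 409:
        return "busy"       # agent loop still running; retry later
    return "unexpected"
```

When the request is sent with `urllib.request.urlopen`, a 409 surfaces as a `urllib.error.HTTPError` whose `code` attribute is 409, so the caller would catch that exception and pass `err.code` to `describe_run_status`.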

SSE event types

| Event | Payload |
| --- | --- |
| session_started | "processing" |
| assistant_chunk | Incremental text from the model |
| tool_use | {"name": "...", "input": {...}} |
| tool_result | {"tool_use_id": "...", "output": "...", "base64_image": "<base64 or null>", "error": null} |
| completed | "waiting next input" |
| error | Error message string |
| heartbeat | ISO timestamp (keep-alive) |
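
For non-browser clients, the event types above could be consumed with a small parser like the sketch below. It assumes the runtime uses the standard text/event-stream framing, with the event name in an `event:` field and the payload in one or more `data:` fields; this README does not spell out the wire format, so treat that as an assumption. `parse_sse` is a hypothetical helper name.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event_type, data) pairs from text/event-stream lines."""
    event_type, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends one event; dispatch it only if it carried data.
            if data_lines:
                yield event_type, "\n".join(data_lines)
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, ignored by the SSE format
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "event":
                event_type = value
            elif field == "data":
                data_lines.append(value)
```

The `data` string is returned undecoded; a caller would typically run `json.loads` on the `tool_use` and `tool_result` payloads and leave plain-string payloads such as `completed` as they are.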

Sequence diagram

Browser                 Orchestrator            Docker          Session Runtime
   │                        │                      │                   │
   │── POST /sessions ─────►│                      │                   │
   │                        │── spawn container ──►│                   │
   │                        │◄── host ports ───────│                   │
   │◄── {api_base_url, ...}─│                      │                   │
   │                        │                      │                   │
   │── EventSource(api/events/stream) ────────────────────────────────►│
   │◄─────────────── heartbeat ────────────────────────────────────────│
   │                        │                      │                   │
   │── POST api/run ──────────────────────────────────────────────────►│
   │◄─────────────── 202 Accepted ─────────────────────────────────────│
   │                        │                      │  agent loop runs  │
   │◄─── SSE: session_started ─────────────────────────────────────────│
   │◄─── SSE: tool_use (screenshot) ───────────────────────────────────│
   │◄─── SSE: tool_result ─────────────────────────────────────────────│
   │◄─── SSE: assistant_chunk (text) ──────────────────────────────────│
   │◄─── SSE: completed ───────────────────────────────────────────────│
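
The hand-off from the agent thread to the SSE stream in the diagram above could be implemented with an in-process publisher along these lines. This is a sketch of one common pattern (a lock-protected subscriber list, one `queue.Queue` per subscriber), not the contents of event_bus.py; the `EventBus` class and its method names are assumptions.

```python
import queue
import threading


class EventBus:
    """Fan out events from the agent thread to any number of SSE subscribers."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._subscribers: list[queue.Queue] = []

    def subscribe(self) -> queue.Queue:
        """Register a new subscriber and return its private queue."""
        q: queue.Queue = queue.Queue()
        with self._lock:
            self._subscribers.append(q)
        return q

    def unsubscribe(self, q: queue.Queue) -> None:
        """Stop delivering events to a subscriber (e.g. on client disconnect)."""
        with self._lock:
            if q in self._subscribers:
                self._subscribers.remove(q)

    def publish(self, event_type: str, payload: object) -> None:
        """Deliver one event to every current subscriber."""
        # Copy under the lock, then put outside it so publishing never
        # blocks subscribe/unsubscribe calls from request handlers.
        with self._lock:
            targets = list(self._subscribers)
        for q in targets:
            q.put((event_type, payload))
```

Each SSE request handler would call `subscribe()`, drain its queue while the connection is open, and call `unsubscribe()` when the client disconnects.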

Concurrent sessions sequence

Browser A          Browser B          Orchestrator       Docker
   │                   │                   │                │
   │── POST /sessions ────────────────────►│                │
   │                   │                   │── spawn ctr A─►│
   │◄── api_url_A ─────────────────────────│                │
   │                   │                   │                │
   │                   │── POST /sessions─►│                │
   │                   │                   │── spawn ctr B─►│  ← starts immediately, no wait
   │                   │◄── api_url_B ─────│                │
   │                   │                   │                │
   │── POST A/run ─────────────────────────────────────────►│ Container A
   │                   │── POST B/run ─────────────────────►│ Container B  ← runs in parallel
   │◄── SSE A stream ──│                                    │
   │                   │◄── SSE B stream ───────────────────│

Concurrency model

Each POST /sessions call launches a new Docker container with a fully isolated desktop environment (Xvfb + VNC + noVNC + uvicorn). Two sessions never share a desktop, a display number, or a process — they run completely in parallel.
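
As a rough illustration of the per-session spawn, the sketch below builds keyword arguments that could be passed to the Docker SDK for Python's `client.containers.run(**kwargs)`. The defaults mirror the environment variables documented in this README; the container name scheme and the container-side ports 6080 (noVNC) and 5900 (VNC) are assumptions, and `build_container_kwargs` is a hypothetical helper rather than the project's DockerSessionSpawner.

```python
def build_container_kwargs(
    session_id: str,
    api_key: str,
    image: str = "computeruse-session-runtime:latest",
    network: str = "computeruse_default",
    cpu_limit: float = 2.0,
    memory_limit: str = "4g",
) -> dict:
    """Keyword arguments for docker-py's client.containers.run()."""
    return {
        "image": image,
        "name": f"session-{session_id}",  # hypothetical naming scheme
        "detach": True,                   # return immediately with a Container
        "network": network,
        "environment": {"ANTHROPIC_API_KEY": api_key},
        "nano_cpus": int(cpu_limit * 1_000_000_000),  # 1 CPU = 1e9 nano-CPUs
        "mem_limit": memory_limit,
        # A value of None asks Docker to pick a free host port for each
        # container port. 8000 is the runtime API; 6080 and 5900 are assumed.
        "ports": {"8000/tcp": None, "6080/tcp": None, "5900/tcp": None},
    }
```

Letting Docker assign host ports is one way to avoid collisions between sessions; the assigned ports can then be read back from the container's attributes to build the URLs returned by POST /sessions.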

Within a single session, an in-memory threading.Lock prevents two simultaneous /run requests from spawning two agent loops for the same session. The second request receives an HTTP 409.
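
The per-session guard described above can be sketched with a non-blocking lock acquisition. `SessionGuard` and `SessionBusyError` are hypothetical names; the real session_manager.py may be structured differently.

```python
import threading


class SessionBusyError(Exception):
    """Raised when a task is submitted while the agent loop is still running."""


class SessionGuard:
    def __init__(self) -> None:
        self._lock = threading.Lock()

    def start_task(self, work) -> threading.Thread:
        """Run work() in a background thread, or fail fast if one is running."""
        # Non-blocking acquire: reject the request instead of queueing it.
        if not self._lock.acquire(blocking=False):
            raise SessionBusyError("Session is currently processing a task")

        def _run() -> None:
            try:
                work()
            finally:
                # A plain Lock may be released from a different thread
                # than the one that acquired it.
                self._lock.release()

        thread = threading.Thread(target=_run, daemon=True)
        try:
            thread.start()
        except RuntimeError:
            self._lock.release()  # do not leave the session stuck as busy
            raise
        return thread
```

In a FastAPI handler, catching `SessionBusyError` and raising `HTTPException(status_code=409)` would produce the 409 response shown in the API reference.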

There is no global queue, no shared VM lock, and no fixed concurrency cap — the system dynamically spawns a new worker for every incoming session request.


Demo usage (Usage Case 2 — parallel sessions)

  1. Open http://localhost:7000 in two separate browser tabs.
  2. In tab A, click New Session → send "Search the weather in Tokyo".
  3. In tab B, click New Session → send "Search the weather in New York".
  4. Verify both VNC panels show Firefox opening and searching simultaneously.

Environment variables

Orchestrator

| Variable | Default | Description |
| --- | --- | --- |
| ANTHROPIC_API_KEY | (required) | Anthropic API key passed to each session container |
| DATABASE_URL | (required) | Postgres connection URL for the orchestrator |
| DOCKER_BASE_URL | unix:///var/run/docker.sock | Docker daemon endpoint (socket mounted via compose) |
| SESSION_RUNTIME_IMAGE | computeruse-session-runtime:latest | Image spawned per session |
| DOCKER_NETWORK | computeruse_default | Docker network containers join |
| SESSION_MAX_AGE_HOURS | 2 | Sessions older than this are auto-stopped |
| CLEANUP_INTERVAL_SECONDS | 60 | How often the cleanup task runs |
| CONTAINER_CPU_LIMIT | 2.0 | vCPU cap per session container |
| CONTAINER_MEMORY_LIMIT | 4g | Memory cap per session container |
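
To show how a few of these variables might be consumed, here is a minimal standard-library sketch that loads them and applies the SESSION_MAX_AGE_HOURS rule. `OrchestratorSettings` and `is_expired` are illustrative names; the project's config.py may use a different settings mechanism.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Mapping


@dataclass(frozen=True)
class OrchestratorSettings:
    anthropic_api_key: str
    database_url: str
    session_max_age_hours: float = 2.0
    cleanup_interval_seconds: int = 60

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "OrchestratorSettings":
        """Read settings from a mapping; required keys raise KeyError if absent."""
        return cls(
            anthropic_api_key=env["ANTHROPIC_API_KEY"],
            database_url=env["DATABASE_URL"],
            session_max_age_hours=float(env.get("SESSION_MAX_AGE_HOURS", "2")),
            cleanup_interval_seconds=int(env.get("CLEANUP_INTERVAL_SECONDS", "60")),
        )


def is_expired(created_at: datetime, now: datetime,
               settings: OrchestratorSettings) -> bool:
    """True when a session has outlived SESSION_MAX_AGE_HOURS."""
    return now - created_at > timedelta(hours=settings.session_max_age_hours)
```

A cleanup task would then wake every `cleanup_interval_seconds`, call `is_expired` for each session row, and stop the containers of expired sessions.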

Postgres

| Variable | Default | Description |
| --- | --- | --- |
| POSTGRES_USER | app | Database username |
| POSTGRES_PASSWORD | app | Database password |
| POSTGRES_DB | computer_use | Database name |

Session runtime (per-container)

| Variable | Default | Description |
| --- | --- | --- |
| ANTHROPIC_API_KEY | (injected by orchestrator) | Key used by the agent loop |
| AGENT_MODEL | claude-sonnet-4-5-20250929 | Claude model used by the agent |
| AGENT_TOOL_VERSION | computer_use_20250124 | Computer-use tool version string |
| MAX_TOKENS | 4096 | Max tokens per API call |
| DATABASE_URL | postgresql+psycopg://app:app@postgres:5432/computer_use | Postgres connection URL for the session runtime |
