neuhai/WebServ

WEBSERV: A Browser-Server Environment for Efficient Training of Reinforcement Learning-based Web Agents at Scale

Paper: arXiv · License: MIT

Yuxuan Lu, Jing Huang, Hui Liu, Jiri Gesi, Yan Han, Shihan Fu, Tianqi Zheng, Dakuo Wang

Overview

WEBSERV is a scalable and efficient environment for training and evaluating Reinforcement Learning (RL) web agents. The system addresses key limitations in existing environments by providing:

  • Compact, site-agnostic browser environment that balances context and action complexity without overwhelming policy models with excessive, noisy context
  • Scalable RL environment that launches and resets web servers efficiently, enabling parallel RL rollouts at scale
  • Realistic and robust browser-side interaction with controllable server-side state
  • Deterministic action execution that waits for UI and network stabilization
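The last point, waiting for the page to settle before returning an observation, can be pictured as a polling loop that only reports success once the page has stayed idle for a full quiet window. The sketch below is a generic illustration under stated assumptions, not WEBSERV's code: wait_until_stable and its default timings are hypothetical, and the is_idle callback stands in for whatever UI and network checks the real environment performs (for example, no pending requests and no recent DOM mutations).

```python
import time
from typing import Callable, Optional


def wait_until_stable(
    is_idle: Callable[[], bool],
    quiet_period: float = 0.5,
    timeout: float = 10.0,
    poll_interval: float = 0.05,
) -> bool:
    """Return True once is_idle() has stayed True for quiet_period seconds.

    Hypothetical helper: is_idle is a stand-in for the real UI/network probe.
    Returns False if the timeout expires before the page settles.
    """
    deadline = time.monotonic() + timeout
    idle_since: Optional[float] = None
    while time.monotonic() < deadline:
        now = time.monotonic()
        if is_idle():
            # Start the quiet window, or finish if it has lasted long enough.
            if idle_since is None:
                idle_since = now
            elif now - idle_since >= quiet_period:
                return True
        else:
            # Any activity (e.g. a new request) resets the quiet window.
            idle_since = None
        time.sleep(poll_interval)
    return False
```

Resetting the quiet window on any activity means a late burst of network traffic delays the observation instead of yielding a half-loaded page.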

WEBSERV achieves state-of-the-art single-prompt success rates on WebArena tasks while cutting launch latency by ~5x and storage needs by ~240x at a comparable memory footprint, enabling 200+ concurrent containers on a single host.

For more details, see our paper.

Quick Start

Installation

# Two-stage installation: base dependencies first, then the compile and webarena extras
uv sync
uv sync --extra compile --extra webarena

Basic Usage

The batch_agent.py entrypoint allows you to run multiple WebAgent tasks concurrently with fine-grained control over concurrency, retry logic, and agent types.
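A rough model of how those controls could interact is sketched below with asyncio. It is not the entrypoint's implementation: run_batch, the launch_browser and run_agent callbacks, the use of an "error" key to signal failure, and the reading of retry_count as extra attempts after the first (1 + retry_count attempts in total) are all assumptions. One semaphore caps tasks in flight (max_concurrent) and a narrower one throttles browser launches (max_concurrent_launch).

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def run_batch(
    task_ids: List[int],
    launch_browser: Callable[[int], Awaitable[object]],
    run_agent: Callable[[int, object], Awaitable[dict]],
    max_concurrent: int = 3,
    max_concurrent_launch: int = 1,
    retry_count: int = 3,
) -> Dict[int, dict]:
    """Hypothetical batch runner mirroring the CLI's concurrency and retry knobs."""
    task_sem = asyncio.Semaphore(max_concurrent)           # tasks in flight
    launch_sem = asyncio.Semaphore(max_concurrent_launch)  # simultaneous launches

    async def run_one(task_id: int) -> dict:
        async with task_sem:
            attempts = 0
            result: dict = {}
            # First attempt plus up to retry_count retries.
            while attempts <= retry_count:
                attempts += 1
                async with launch_sem:
                    # Browser cleanup is omitted in this sketch.
                    browser = await launch_browser(task_id)
                result = await run_agent(task_id, browser)
                if "error" not in result:
                    break
            return {**result, "attempts": attempts}

    results = await asyncio.gather(*(run_one(t) for t in task_ids))
    return dict(zip(task_ids, results))
```

Keeping launches on their own semaphore lets already-launched agents keep working while new browsers start one at a time, which is one plausible reason for the separate, lower default.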

Run multiple tasks by task ID:

python -m rl_web_agent.entrypoints.batch_agent --task_ids 1,2,3,4,5

Run tasks by site (filters tasks that match the specified sites):

python -m rl_web_agent.entrypoints.batch_agent --sites shopping,reddit
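One plausible reading of how --task_ids and --sites select tasks from --tasks_dir is sketched below. It assumes each task is a JSON file with WebArena-style task_id and sites fields, that a task matches --sites when any of its sites is listed, and that passing both flags keeps only tasks satisfying both; the real loader may differ. parse_csv, load_tasks, and select_tasks are hypothetical names.

```python
import json
from pathlib import Path
from typing import Iterable, List, Optional


def parse_csv(value: Optional[str]) -> List[str]:
    """Split a comma-separated CLI value, dropping blanks and whitespace."""
    if not value:
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


def load_tasks(tasks_dir: str) -> List[dict]:
    """Load every *.json task file in tasks_dir (assumed layout)."""
    return [json.loads(p.read_text()) for p in sorted(Path(tasks_dir).glob("*.json"))]


def select_tasks(
    tasks: Iterable[dict],
    task_ids: Optional[str] = None,
    sites: Optional[str] = None,
) -> List[dict]:
    """Filter tasks by --task_ids and/or --sites (hypothetical semantics)."""
    wanted_ids = {int(t) for t in parse_csv(task_ids)}
    wanted_sites = set(parse_csv(sites))
    if not wanted_ids and not wanted_sites:
        raise ValueError("pass --task_ids or --sites")
    selected = []
    for task in tasks:
        if wanted_ids and task["task_id"] not in wanted_ids:
            continue
        # A task matches if it uses at least one of the requested sites.
        if wanted_sites and not wanted_sites & set(task.get("sites", [])):
            continue
        selected.append(task)
    return selected
```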

Command-Line Options

| Option | Description | Default |
| --- | --- | --- |
| --task_ids | Comma-separated list of task IDs to run | Required (or use --sites) |
| --sites | Comma-separated list of sites to filter tasks by | Required (or use --task_ids) |
| --tasks_dir | Directory containing task JSON files | dataset/train_webarena |
| --output_dir | Output directory for results and traces | results |
| --max_concurrent | Maximum number of concurrent tasks | 3 |
| --max_concurrent_launch | Maximum number of concurrent browser launches | 1 |
| --retry_count | Number of retries when the agent returns an error | 3 |
| --agent_type | Agent type: regular or tool | regular |
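The table maps naturally onto an argparse parser. The version below is reconstructed from the table alone, so treat it as a sketch: the flag names and defaults come from the table, while build_parser, parse_args, the choices restriction on --agent_type, and the check that at least one of --task_ids or --sites is given are inferred rather than taken from the module.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run WebAgent tasks concurrently (sketch).")
    parser.add_argument("--task_ids", help="Comma-separated list of task IDs to run")
    parser.add_argument("--sites", help="Comma-separated list of sites to filter tasks by")
    parser.add_argument("--tasks_dir", default="dataset/train_webarena")
    parser.add_argument("--output_dir", default="results")
    parser.add_argument("--max_concurrent", type=int, default=3)
    parser.add_argument("--max_concurrent_launch", type=int, default=1)
    parser.add_argument("--retry_count", type=int, default=3)
    parser.add_argument("--agent_type", choices=["regular", "tool"], default="regular")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.task_ids and not args.sites:
        # The table marks each flag as required unless the other is given.
        parser.error("one of --task_ids or --sites is required")
    return args
```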

Agent Types

The batch agent supports two agent types:

Regular Agent (default):

python -m rl_web_agent.entrypoints.batch_agent --task_ids 1,2,3 --agent_type regular

Tool Agent (uses function calling interface):

python -m rl_web_agent.entrypoints.batch_agent --task_ids 1,2,3 --agent_type tool
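With the tool agent, the model's reply carries a structured function call rather than free-form text. The sketch below pulls such calls out of an OpenAI-style chat message, in which each call's arguments arrive as a JSON string. The message layout and the parse_tool_calls helper are assumptions for illustration; step_browser is the browser function named in the tool configuration described below.

```python
import json
from typing import Any, Dict, List, Tuple


def parse_tool_calls(message: Dict[str, Any]) -> List[Tuple[str, Dict[str, Any]]]:
    """Extract (function_name, arguments) pairs from an assistant message.

    Assumes the OpenAI-style layout:
    {"tool_calls": [{"function": {"name": ..., "arguments": "<json string>"}}]}
    """
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        raw = fn.get("arguments") or "{}"
        # Arguments are serialized JSON in this layout; decode them to a dict.
        args = json.loads(raw) if isinstance(raw, str) else raw
        calls.append((fn["name"], args))
    return calls
```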

WebArena Training Example

The examples/webarena.sh script demonstrates how to run RL training with VERL on WebArena tasks using the tool agent configuration.

Prerequisites

  • 8x H100 GPUs (or adjust the GPU configuration accordingly)
  • A working VERL training environment

Usage

# Make sure you're in the project root directory
cd /path/to/webserv

# Run the training script
bash examples/webarena.sh

Configuration

The script uses examples/webarena_tool_config.yaml which defines the browser tool interface for function calling. This configuration:

  • Defines the step_browser function with all browser actions (click, type, hover, select, etc.)
  • Configures the tool to use native browser interactions
  • Sets the maximum number of parallel tool executions
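For a concrete picture, here is what a step_browser schema could look like in the common JSON-Schema function-calling layout, written as a Python dict with a small argument validator. Only the function name and the four actions listed above come from this README; the parameter names action, element_id, and text, the rule that type and select need text, and validate_step_browser are hypothetical. The authoritative definition is examples/webarena_tool_config.yaml.

```python
from typing import Any, Dict

# Hypothetical schema; the real one lives in examples/webarena_tool_config.yaml.
STEP_BROWSER_SCHEMA: Dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "step_browser",
        "description": "Perform one browser action and return the new observation.",
        "parameters": {
            "type": "object",
            "properties": {
                "action": {"type": "string", "enum": ["click", "type", "hover", "select"]},
                "element_id": {"type": "string", "description": "Target element identifier."},
                "text": {"type": "string", "description": "Text to type or option to select."},
            },
            "required": ["action"],
        },
    },
}

# Actions that, in this sketch, need a text argument.
_NEEDS_TEXT = {"type", "select"}


def validate_step_browser(arguments: Dict[str, Any]) -> None:
    """Raise ValueError if arguments do not fit the sketch schema above."""
    params = STEP_BROWSER_SCHEMA["function"]["parameters"]
    for key in params["required"]:
        if key not in arguments:
            raise ValueError(f"missing required argument: {key}")
    unknown = set(arguments) - set(params["properties"])
    if unknown:
        raise ValueError(f"unknown arguments: {sorted(unknown)}")
    if arguments["action"] not in params["properties"]["action"]["enum"]:
        raise ValueError(f"unsupported action: {arguments['action']}")
    if arguments["action"] in _NEEDS_TEXT and "text" not in arguments:
        raise ValueError(f"action {arguments['action']!r} requires 'text'")
```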

Browser Tool Implementation

The rl_web_agent/tools/browser_tool.py module provides a reference tool implementation for VERL. It implements the BaseTool interface from VERL and integrates WebAgentEnv for browser automation.

Key Features:

  • Tool Lifecycle: Implements create(), execute(), calc_reward(), and release() methods required by VERL
  • Parallel Execution Control: Uses a class-level semaphore to limit concurrent browser operations (configurable via max_parallel)
  • WebAgentEnv Integration: Wraps the WebAgentEnv class to provide browser actions as function-calling tools
  • Reward Calculation: Extracts evaluation scores from WebAgentEnv observations for RL training
  • Rollout Tracing: Integrates with VERL's rollout trace system for debugging and analysis
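The parallel-execution control works because the semaphore is stored on the class rather than on each instance, so every tool instance in the process draws from one shared budget. A minimal sketch of that pattern follows; ThrottledBrowserTool, _run_action, the lazy semaphore construction, and the peak-tracking counters are illustrative assumptions, not the module's code.

```python
import asyncio
from typing import ClassVar, Optional


class ThrottledBrowserTool:
    """Sketch of a class-level concurrency cap shared by all instances."""

    max_parallel: ClassVar[int] = 4
    _semaphore: ClassVar[Optional[asyncio.Semaphore]] = None
    # Bookkeeping so the cap can be observed.
    _active: ClassVar[int] = 0
    peak_active: ClassVar[int] = 0

    @classmethod
    def _get_semaphore(cls) -> asyncio.Semaphore:
        # Created once, on first use, and stored on the class so instances share it.
        if cls._semaphore is None:
            cls._semaphore = asyncio.Semaphore(cls.max_parallel)
        return cls._semaphore

    async def execute(self, action: str) -> str:
        async with self._get_semaphore():
            cls = type(self)
            cls._active += 1
            cls.peak_active = max(cls.peak_active, cls._active)
            try:
                return await self._run_action(action)
            finally:
                cls._active -= 1

    async def _run_action(self, action: str) -> str:
        # Placeholder for a real browser call (e.g. driving WebAgentEnv).
        await asyncio.sleep(0.01)
        return f"done: {action}"
```

Creating the semaphore lazily, inside a running event loop, avoids building asyncio primitives at import time; set max_parallel before the first execute() call, because later changes do not resize an existing semaphore.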

Tool Methods:

  • create(): Initializes a browser environment instance with a task configuration and returns the initial observation
  • execute(): Executes browser actions (click, type, navigate, etc.) and returns formatted observations
  • calc_reward(): Returns the evaluation score from the environment (0.0 to 1.0)
  • release(): Cleans up browser environment instances
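The four methods form a per-rollout lifecycle keyed by an instance ID. The sketch below walks through that lifecycle with an in-memory registry and a fake environment, so it runs without a browser or VERL installed. The method names follow the list above; the signatures, FakeEnv, the score field in observations, and the clamping of rewards to [0.0, 1.0] are assumptions, and VERL's own BaseTool signatures may differ between releases.

```python
import asyncio
import uuid
from typing import Any, Dict, Optional, Tuple


class FakeEnv:
    """Hypothetical stand-in for WebAgentEnv: succeeds once 'submit' is clicked."""

    def __init__(self, task: Dict[str, Any]):
        self.task = task
        self.score = 0.0

    async def reset(self) -> Dict[str, Any]:
        return {"page": f"start page for task {self.task['task_id']}", "score": self.score}

    async def step(self, action: Dict[str, Any]) -> Dict[str, Any]:
        if action.get("element_id") == "submit":
            self.score = 1.0
        return {"page": f"after {action['action']}", "score": self.score}

    async def close(self) -> None:
        pass


class BrowserToolSketch:
    """Hypothetical create/execute/calc_reward/release lifecycle."""

    def __init__(self) -> None:
        self._instances: Dict[str, Dict[str, Any]] = {}

    async def create(self, task: Dict[str, Any], instance_id: Optional[str] = None) -> Tuple[str, str]:
        instance_id = instance_id or str(uuid.uuid4())
        env = FakeEnv(task)
        obs = await env.reset()
        self._instances[instance_id] = {"env": env, "last_obs": obs}
        return instance_id, obs["page"]  # initial observation

    async def execute(self, instance_id: str, parameters: Dict[str, Any]) -> str:
        state = self._instances[instance_id]
        state["last_obs"] = await state["env"].step(parameters)
        return state["last_obs"]["page"]  # formatted observation for the model

    async def calc_reward(self, instance_id: str) -> float:
        score = float(self._instances[instance_id]["last_obs"].get("score", 0.0))
        return min(max(score, 0.0), 1.0)  # keep the reward in [0.0, 1.0]

    async def release(self, instance_id: str) -> None:
        state = self._instances.pop(instance_id, None)
        if state is not None:
            await state["env"].close()


async def _demo() -> None:
    tool = BrowserToolSketch()
    iid, first_obs = await tool.create({"task_id": 7})
    await tool.execute(iid, {"action": "click", "element_id": "submit"})
    print(first_obs, await tool.calc_reward(iid))
    await tool.release(iid)


if __name__ == "__main__":
    asyncio.run(_demo())
```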

The tool is configured in examples/webarena_tool_config.yaml and used by the VERL training pipeline to provide agents with a structured function-calling interface to the browser environment.
