WEBSERV: A Browser-Server Environment for Efficient Training of Reinforcement Learning-based Web Agents at Scale

Yuxuan Lu, Jing Huang, Hui Liu, Jiri Gesi, Yan Han, Shihan Fu, Tianqi Zheng, Dakuo Wang

Overview

WEBSERV is a scalable and efficient environment for training and evaluating Reinforcement Learning (RL) web agents. The system addresses key limitations in existing environments by providing:

Compact, site-agnostic browser environment that balances context and action complexity, avoiding overwhelming policy models with excessive and noisy context
Scalable RL environment that launches and resets web servers efficiently, enabling parallel RL rollouts at scale
Realistic and robust browser-side interaction with controllable server-side state
Deterministic action execution that waits for UI and network stabilization
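
To make the last point concrete, the following is a minimal sketch of one way to wait for stabilization: poll a pending-request counter and proceed only after it has stayed at zero for a short quiet period. It is not WEBSERV's implementation. The names `wait_until_stable`, `get_pending`, `quiet_period`, `timeout`, and `poll` are hypothetical and chosen for illustration only.

```python
import asyncio
import time


async def wait_until_stable(get_pending, quiet_period=0.05, timeout=2.0, poll=0.01):
    """Return True once get_pending() has stayed at 0 for quiet_period seconds.

    Returns False if that does not happen before the timeout. (Hypothetical helper.)
    """
    deadline = time.monotonic() + timeout
    quiet_since = None
    while time.monotonic() < deadline:
        if get_pending() == 0:
            now = time.monotonic()
            if quiet_since is None:
                quiet_since = now  # start of the quiet window
            elif now - quiet_since >= quiet_period:
                return True
        else:
            quiet_since = None  # activity resumed, restart the window
        await asyncio.sleep(poll)
    return False


async def _demo():
    pending = {"count": 1}  # one simulated in-flight request

    async def finish_request():
        await asyncio.sleep(0.03)
        pending["count"] = 0

    finisher = asyncio.create_task(finish_request())
    stable = await wait_until_stable(lambda: pending["count"])
    await finisher
    # A counter that never reaches zero should hit the timeout.
    never_stable = await wait_until_stable(lambda: 1, timeout=0.1)
    return stable, never_stable


stable, never_stable = asyncio.run(_demo())
```

Requiring a quiet window, rather than a single zero reading, avoids acting in the gap between two back-to-back requests.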

WEBSERV achieves state-of-the-art single-prompt success rates on WebArena tasks while cutting launch latency by ~5x and storage requirements by ~240x, with a comparable memory footprint, enabling 200+ concurrent containers on a single host.

For more details, see our paper.

Quick Start

Installation

# two-stage installation
uv sync
uv sync --extra compile --extra webarena

Basic Usage

The batch_agent.py entrypoint allows you to run multiple WebAgent tasks concurrently with fine-grained control over concurrency, retry logic, and agent types.

Run multiple tasks by task ID:

python -m rl_web_agent.entrypoints.batch_agent --task_ids 1,2,3,4,5

Run tasks by site (filters tasks that match the specified sites):

python -m rl_web_agent.entrypoints.batch_agent --sites shopping,reddit
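
As a rough illustration of site filtering, the sketch below keeps every task whose site list overlaps the requested sites. It assumes each task is a dictionary with `task_id` and `sites` fields, and that a task matches when any of its sites is requested. Both are assumptions, not documented behavior of batch_agent.py, and `filter_tasks_by_site` is a hypothetical helper.

```python
def filter_tasks_by_site(tasks, wanted_sites):
    """Keep tasks whose `sites` list overlaps the requested sites (assumed semantics)."""
    wanted = set(wanted_sites)
    return [task for task in tasks if wanted & set(task.get("sites", []))]


# Toy task records, invented for this example.
tasks = [
    {"task_id": 1, "sites": ["shopping"]},
    {"task_id": 2, "sites": ["gitlab"]},
    {"task_id": 3, "sites": ["reddit", "gitlab"]},
]

# Mirrors the comma-separated form of the --sites option.
selected = filter_tasks_by_site(tasks, "shopping,reddit".split(","))
selected_ids = [task["task_id"] for task in selected]
print(selected_ids)  # [1, 3]
```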

Command-Line Options

| Option | Description | Default |
|---|---|---|
| `--task_ids` | Comma-separated list of task IDs to run | Required (or use `--sites`) |
| `--sites` | Comma-separated list of sites to filter tasks | Required (or use `--task_ids`) |
| `--tasks_dir` | Directory containing task JSON files | `dataset/train_webarena` |
| `--output_dir` | Output directory for results and traces | `results` |
| `--max_concurrent` | Maximum number of concurrent tasks | `3` |
| `--max_concurrent_launch` | Maximum number of concurrent browser launches | `1` |
| `--retry_count` | Number of retries when agent returns error | `3` |
| `--agent_type` | Agent type: `regular` or `tool` | `regular` |
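
The interplay of `--max_concurrent`, `--max_concurrent_launch`, and `--retry_count` can be pictured with two semaphores and a retry loop. This is a sketch under stated assumptions, not the code in batch_agent.py: `run_batch`, `launch_browser`, and `run_task` are hypothetical names, and it assumes `retry_count` counts retries on top of the first attempt.

```python
import asyncio


async def run_batch(task_ids, run_task, launch_browser,
                    max_concurrent=3, max_concurrent_launch=1, retry_count=3):
    """Run tasks with bounded concurrency, bounded launches, and retries (illustrative)."""
    task_sem = asyncio.Semaphore(max_concurrent)           # caps tasks in flight
    launch_sem = asyncio.Semaphore(max_concurrent_launch)  # caps simultaneous launches

    async def run_one(task_id):
        async with task_sem:
            last_error = None
            # Assumption: one initial attempt plus `retry_count` retries.
            for _ in range(1 + retry_count):
                try:
                    async with launch_sem:
                        browser = await launch_browser(task_id)
                    return await run_task(task_id, browser)
                except Exception as exc:
                    last_error = exc
            return {"task_id": task_id, "ok": False, "error": str(last_error)}

    return await asyncio.gather(*(run_one(task_id) for task_id in task_ids))


async def _demo():
    attempts = {}

    async def fake_launch(task_id):
        await asyncio.sleep(0)  # stands in for a slow browser launch
        return f"browser-{task_id}"

    async def fake_run(task_id, browser):
        attempts[task_id] = attempts.get(task_id, 0) + 1
        if task_id == 2 and attempts[task_id] == 1:
            raise RuntimeError("transient failure")  # first try of task 2 fails
        return {"task_id": task_id, "ok": True}

    results = await run_batch([1, 2, 3], fake_run, fake_launch)
    return results, attempts


results, attempts = asyncio.run(_demo())
```

Keeping the launch limit separate from the task limit lets many tasks run at once while the expensive launch step stays serialized.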

Agent Types

The batch agent supports two agent types:

Regular Agent (default):

python -m rl_web_agent.entrypoints.batch_agent --task_ids 1,2,3 --agent_type regular

Tool Agent (uses function calling interface):

python -m rl_web_agent.entrypoints.batch_agent --task_ids 1,2,3 --agent_type tool

WebArena Training Example

The examples/webarena.sh script demonstrates how to run RL training with VERL on WebArena tasks using the tool agent configuration.

Prerequisites

8x H100 GPUs (or adjust GPU configuration)
VERL training environment set up

Usage

# Make sure you're in the project root directory
cd /path/to/webserv

# Run the training script
bash examples/webarena.sh

Configuration

The script uses examples/webarena_tool_config.yaml which defines the browser tool interface for function calling. This configuration:

Defines the step_browser function with all browser actions (click, type, hover, select, etc.)
Configures the tool to use native browser interactions
Sets maximum parallel tool executions

Browser Tool Implementation

The rl_web_agent/tools/browser_tool.py module provides a reference tool implementation for VERL. It implements the BaseTool interface from VERL and integrates WebAgentEnv for browser automation.

Key Features:

Tool Lifecycle: Implements create(), execute(), calc_reward(), and release() methods required by VERL
Parallel Execution Control: Uses a class-level semaphore to limit concurrent browser operations (configurable via max_parallel)
WebAgentEnv Integration: Wraps the WebAgentEnv class to provide browser actions as function-calling tools
Reward Calculation: Extracts evaluation scores from WebAgentEnv observations for RL training
Rollout Tracing: Integrates with VERL's rollout trace system for debugging and analysis
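
The class-level semaphore mentioned above can be sketched as follows. This is an illustrative pattern, not the code in browser_tool.py: `BrowserToolSketch` and its counters are hypothetical, and only the `max_parallel` name comes from the description above.

```python
import asyncio


class BrowserToolSketch:
    """Illustrative only: all instances share one semaphore stored on the class."""

    max_parallel = 2
    _semaphore = None  # created lazily so it binds to the running event loop
    _active = 0        # browser operations currently in flight
    peak_active = 0    # highest concurrency observed, kept for the demo

    @classmethod
    def _get_semaphore(cls):
        if cls._semaphore is None:
            cls._semaphore = asyncio.Semaphore(cls.max_parallel)
        return cls._semaphore

    async def execute(self, action):
        cls = type(self)
        async with cls._get_semaphore():
            cls._active += 1
            cls.peak_active = max(cls.peak_active, cls._active)
            try:
                await asyncio.sleep(0.01)  # stands in for a real browser action
                return f"did {action}"
            finally:
                cls._active -= 1


async def _demo():
    tools = [BrowserToolSketch() for _ in range(6)]
    return await asyncio.gather(
        *(tool.execute(f"click-{i}") for i, tool in enumerate(tools))
    )


outputs = asyncio.run(_demo())
```

Because the semaphore lives on the class rather than on each instance, the limit applies across every tool instance in the process.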

Tool Methods:

create(): Initializes a browser environment instance with a task configuration and returns the initial observation
execute(): Executes browser actions (click, type, navigate, etc.) and returns formatted observations
calc_reward(): Returns the evaluation score from the environment (0.0 to 1.0)
release(): Cleans up browser environment instances
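
The four-method lifecycle can be illustrated with a toy stand-in. The sketch below does not import VERL or WebAgentEnv. `FakeEnv`, `LifecycleToolSketch`, the task fields, and the simplified synchronous signatures are all assumptions made for illustration, and the real BaseTool signatures may differ.

```python
import uuid


class FakeEnv:
    """Hypothetical stand-in for WebAgentEnv; scores 1.0 once the goal action is taken."""

    def __init__(self, task):
        self.task = task
        self.score = 0.0

    def observe(self):
        return {"url": self.task["start_url"], "score": self.score}

    def step(self, action):
        if action == self.task["goal_action"]:
            self.score = 1.0
        return self.observe()


class LifecycleToolSketch:
    """Toy tool with the create/execute/calc_reward/release lifecycle."""

    def __init__(self):
        self._envs = {}  # instance_id -> environment

    def create(self, task):
        instance_id = str(uuid.uuid4())
        env = FakeEnv(task)
        self._envs[instance_id] = env
        return instance_id, env.observe()

    def execute(self, instance_id, action):
        obs = self._envs[instance_id].step(action)
        return f"url={obs['url']} score={obs['score']}"

    def calc_reward(self, instance_id):
        return self._envs[instance_id].score

    def release(self, instance_id):
        self._envs.pop(instance_id, None)


tool = LifecycleToolSketch()
iid, first_obs = tool.create(
    {"start_url": "http://example.test", "goal_action": "click(submit)"}
)
tool.execute(iid, "hover(menu)")
reward_before = tool.calc_reward(iid)
tool.execute(iid, "click(submit)")
reward_after = tool.calc_reward(iid)
tool.release(iid)
```

Keying environments by instance ID lets one tool object serve many rollouts while keeping each rollout's state separate.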

The tool is configured in examples/webarena_tool_config.yaml and used by the VERL training pipeline to provide agents with a structured function-calling interface to the browser environment.
