WEBSERV: A Browser-Server Environment for Efficient Training of Reinforcement Learning-based Web Agents at Scale
Yuxuan Lu, Jing Huang, Hui Liu, Jiri Gesi, Yan Han, Shihan Fu, Tianqi Zheng, Dakuo Wang
WEBSERV is a scalable and efficient environment for training and evaluating Reinforcement Learning (RL) web agents. The system addresses key limitations in existing environments by providing:
- Compact, site-agnostic browser environment that balances context and action complexity, avoiding overwhelming policy models with excessive and noisy context
- Scalable RL environment that launches and resets web servers efficiently, enabling parallel RL rollouts at scale
- Realistic and robust browser-side interaction with controllable server-side state
- Deterministic action execution that waits for UI and network stabilization
WEBSERV achieves state-of-the-art single-prompt success rates on WebArena tasks while cutting launch latency by ~5x and storage requirements by ~240x. Its memory footprint is comparable to existing environments, which allows 200+ concurrent containers on a single host.
For more details, see our paper.
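This README does not describe how WEBSERV detects UI and network stabilization. As a rough illustration of the general idea only, the sketch below waits until a caller-supplied count of pending activity has stayed at zero for a quiet period. The name `wait_until_stable`, the `pending_count` callback, and all parameters are hypothetical and not part of WEBSERV.

```python
import asyncio
import time


async def wait_until_stable(pending_count, quiet_s=0.5, timeout_s=10.0, poll_s=0.05):
    """Return True once pending_count() has stayed at 0 for quiet_s seconds.

    Hypothetical helper: pending_count is any zero-argument callable that
    reports in-flight work (for example, open network requests).
    Returns False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout_s
    quiet_since = None  # time at which the current quiet streak began
    while time.monotonic() < deadline:
        now = time.monotonic()
        if pending_count() == 0:
            if quiet_since is None:
                quiet_since = now
            if now - quiet_since >= quiet_s:
                return True
        else:
            quiet_since = None  # activity resumed, restart the quiet streak
        await asyncio.sleep(poll_s)
    return False
```

Requiring a sustained quiet period, rather than a single idle reading, is what keeps the next observation from being taken while a page is still between two bursts of requests.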
Installation is a two-stage process:

    uv sync
    uv sync --extra compile --extra webarena

The `batch_agent.py` entrypoint lets you run multiple WebAgent tasks concurrently, with fine-grained control over concurrency, retry logic, and agent type.
Run multiple tasks by task ID:

    python -m rl_web_agent.entrypoints.batch_agent --task_ids 1,2,3,4,5

Run tasks by site (this filters for tasks that match the specified sites):

    python -m rl_web_agent.entrypoints.batch_agent --sites shopping,reddit

| Option | Description | Default |
|---|---|---|
| `--task_ids` | Comma-separated list of task IDs to run | Required (or use `--sites`) |
| `--sites` | Comma-separated list of sites to filter tasks | Required (or use `--task_ids`) |
| `--tasks_dir` | Directory containing task JSON files | `dataset/train_webarena` |
| `--output_dir` | Output directory for results and traces | `results` |
| `--max_concurrent` | Maximum number of concurrent tasks | 3 |
| `--max_concurrent_launch` | Maximum number of concurrent browser launches | 1 |
| `--retry_count` | Number of retries when agent returns error | 3 |
| `--agent_type` | Agent type: `regular` or `tool` | `regular` |
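To make the interaction between `--max_concurrent`, `--max_concurrent_launch`, and `--retry_count` concrete, here is a minimal sketch of one way such limits can be combined with `asyncio`. It is not the code in `batch_agent.py`: `run_batch`, `launch`, and `run_task` are hypothetical names, and treating `retry_count` as "retries after the first attempt" is an assumption.

```python
import asyncio


async def run_batch(task_ids, run_task, launch,
                    max_concurrent=3, max_concurrent_launch=1, retry_count=3):
    """Run tasks with bounded concurrency, bounded launches, and retries.

    Hypothetical sketch: launch(task_id) and run_task(env, task_id) are
    caller-supplied coroutines standing in for browser start-up and the agent.
    """
    task_sem = asyncio.Semaphore(max_concurrent)           # tasks in flight
    launch_sem = asyncio.Semaphore(max_concurrent_launch)  # browsers starting

    async def run_one(task_id):
        async with task_sem:
            last_error = None
            # Assumption: one initial attempt plus retry_count retries.
            for _ in range(1 + retry_count):
                async with launch_sem:
                    env = await launch(task_id)
                try:
                    return task_id, await run_task(env, task_id)
                except Exception as exc:
                    last_error = exc
            # All attempts failed: report the last error as the result.
            return task_id, last_error

    results = await asyncio.gather(*(run_one(t) for t in task_ids))
    return dict(results)
```

Using a separate, smaller limit for launches reflects the defaults in the table: start-up is throttled to one browser at a time while up to three tasks run concurrently.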
The batch agent supports two agent types:
Regular Agent (default):

    python -m rl_web_agent.entrypoints.batch_agent --task_ids 1,2,3 --agent_type regular

Tool Agent (uses the function-calling interface):

    python -m rl_web_agent.entrypoints.batch_agent --task_ids 1,2,3 --agent_type tool

The `examples/webarena.sh` script demonstrates how to run RL training with VERL on WebArena tasks using the tool agent configuration.
Prerequisites:

- 8x H100 GPUs (or adjust the GPU configuration)
- VERL training environment set up

To start training:

    # Make sure you're in the project root directory
    cd /path/to/webserv
    # Run the training script
    bash examples/webarena.sh

The script uses `examples/webarena_tool_config.yaml`, which defines the browser tool interface for function calling. This configuration:
- Defines the `step_browser` function with all browser actions (click, type, hover, select, etc.)
- Configures the tool to use native browser interactions
- Sets the maximum number of parallel tool executions
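For readers unfamiliar with function-calling schemas, the sketch below shows the general shape such a definition can take, written as a Python dict in the widely used JSON-schema style. Only the function name `step_browser` and the four listed actions come from this README; the parameter names (`action`, `target`, `text`) and the `validate_call` helper are hypothetical, and the actual schema in `examples/webarena_tool_config.yaml` may differ.

```python
# Hypothetical schema: parameter names are illustrative, not taken from the
# real examples/webarena_tool_config.yaml.
STEP_BROWSER_SCHEMA = {
    "type": "function",
    "function": {
        "name": "step_browser",
        "description": "Execute one browser action and return the new observation.",
        "parameters": {
            "type": "object",
            "properties": {
                "action": {
                    "type": "string",
                    "enum": ["click", "type", "hover", "select"],
                },
                "target": {"type": "string", "description": "Element to act on."},
                "text": {"type": "string", "description": "Text for 'type' actions."},
            },
            "required": ["action"],
        },
    },
}


def validate_call(arguments, schema=STEP_BROWSER_SCHEMA):
    """Check a tool call's arguments against the sketch schema above."""
    params = schema["function"]["parameters"]
    missing = [k for k in params["required"] if k not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    unknown = [k for k in arguments if k not in params["properties"]]
    if unknown:
        raise ValueError(f"unknown arguments: {unknown}")
    if arguments["action"] not in params["properties"]["action"]["enum"]:
        raise ValueError(f"unsupported action: {arguments['action']}")
    return arguments
```

Restricting `action` to an enumerated set keeps the action space small and lets malformed calls be rejected before they reach the browser.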
The `rl_web_agent/tools/browser_tool.py` module provides a reference tool implementation for VERL. It implements VERL's `BaseTool` interface and integrates `WebAgentEnv` for browser automation.
Key Features:
- Tool Lifecycle: Implements the `create()`, `execute()`, `calc_reward()`, and `release()` methods required by VERL
- Parallel Execution Control: Uses a class-level semaphore to limit concurrent browser operations (configurable via `max_parallel`)
- WebAgentEnv Integration: Wraps the `WebAgentEnv` class to provide browser actions as function-calling tools
- Reward Calculation: Extracts evaluation scores from `WebAgentEnv` observations for RL training
- Rollout Tracing: Integrates with VERL's rollout trace system for debugging and analysis
Tool Methods:
- `create()`: Initializes a browser environment instance with a task configuration and returns the initial observation
- `execute()`: Executes browser actions (click, type, navigate, etc.) and returns formatted observations
- `calc_reward()`: Returns the evaluation score from the environment (0.0 to 1.0)
- `release()`: Cleans up browser environment instances
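The lifecycle above, together with the class-level semaphore, can be sketched without VERL or a real browser. The code below is an illustration only: `SketchBrowserTool` and `FakeEnv` are hypothetical stand-ins, the method signatures are simplified and do not match VERL's `BaseTool`, and the observation format is invented for the example.

```python
import asyncio
import uuid


class FakeEnv:
    """Hypothetical stand-in for WebAgentEnv; the real API is not shown here."""

    def __init__(self, task):
        self.task = task
        self.closed = False

    async def reset(self):
        return {"text": f"start: {self.task}", "score": 0.0}

    async def step(self, action):
        done = action.get("action") == "finish"
        return {"text": f"did {action.get('action')}", "score": 1.0 if done else 0.0}

    async def close(self):
        self.closed = True


class SketchBrowserTool:
    _semaphore = None  # class-level: shared by every instance of the tool

    def __init__(self, max_parallel=4):
        self.max_parallel = max_parallel
        self._envs = {}
        self._last_obs = {}

    def _sem(self):
        # Created lazily so it is built inside the running event loop.
        cls = type(self)
        if cls._semaphore is None:
            cls._semaphore = asyncio.Semaphore(self.max_parallel)
        return cls._semaphore

    async def create(self, task):
        instance_id = str(uuid.uuid4())
        env = FakeEnv(task)
        async with self._sem():
            obs = await env.reset()
        self._envs[instance_id] = env
        self._last_obs[instance_id] = obs
        return instance_id, obs["text"]

    async def execute(self, instance_id, action):
        async with self._sem():  # limit concurrent browser operations
            obs = await self._envs[instance_id].step(action)
        self._last_obs[instance_id] = obs
        return obs["text"]

    async def calc_reward(self, instance_id):
        # Reward is read from the most recent observation's score (0.0 to 1.0).
        return float(self._last_obs[instance_id]["score"])

    async def release(self, instance_id):
        env = self._envs.pop(instance_id)
        self._last_obs.pop(instance_id, None)
        await env.close()
```

Keeping the semaphore on the class rather than on each instance means the `max_parallel` limit applies across all rollouts in the process, which is the behaviour a class-level semaphore implies.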
The tool is configured in `examples/webarena_tool_config.yaml` and used by the VERL training pipeline to provide agents with a structured function-calling interface to the browser environment.