Merged
* Add Docker site environments and integration tests

  - Add Dockerfiles and configuration for 6 sites: gitlab, map, reddit, shopping, shopping_admin, wikipedia
  - Add docker-compose.yml for orchestrating all services
  - Add integration tests with Playwright for each site
  - Add dev utilities for logging, git, network, and path operations
  - Add environment settings and tasks for building/managing containers
  - Move contributing code to dev directory

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Reorganize dev tasks and clean up structure

  - Split dev tasks into category files: code_tasks, data_tasks, docs_tasks, env_tasks
  - Move docker_build to top-level task in tasks.py
  - Move monitoring config to assets/environments/monitoring/
  - Remove template-dependent tasks and dev/templates/
  - Add Docker sites CI workflow

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Update environment Docker documentation

  - Fix site names from hyphens to underscores (shopping_admin not shopping-admin)
  - Update Available Sites table with all 6 sites including Map
  - Add Env-Ctrl ports column to tables
  - Fix image names to current convention (am1n3e/webarena-verified-<site>)
  - Update directory structure from contributing/ to dev/environments/
  - Add Docker Compose quick start instructions
  - Add Data Management commands (data-download, setup)
  - Update Base Image Pipeline with correct script names
  - Add Environment Variables reference for Docker Compose

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Simplify invoke task names by removing redundant prefixes

  Renamed tasks to avoid namespace-prefixed names:
  - dev.docs.docs-serve → dev.docs.serve
  - dev.docs.docs-build → dev.docs.build
  - dev.docs.docs-deploy → dev.docs.deploy
  - dev.code.code-format-and-check → dev.code.format
  - dev.data.data-format → dev.data.format
  - dev.env.env-init → dev.env.init
  - demo.demo-gitlab-start → demo.gitlab-start
  - demo.demo-gitlab-stop → demo.gitlab-stop

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add per-environment documentation files

  - Create dedicated doc pages for each environment (shopping_admin, shopping, reddit, gitlab, wikipedia, map)
  - Move shared Docker info to index.md (size improvements, env vars, commands)
  - Add announcement about Docker images availability to README
  - Update map.md to explain single-container optimization vs original 5 containers
  - Remove docker_images.md (content redistributed)

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add CI tasks and reorganize dev README files

  - Add dev/ci_tasks.py for CI-related invoke tasks
  - Add Dockerfile.ci for Wikipedia environment
  - Move site README files from docker_overrides/ to sites/ level
  - Update GitHub workflow and gitignore
  - Update tasks.py imports

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Reorder Quick Start to prioritize uvx over Docker and pip

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Update README and disable Docker sites workflow

  - Add section showing how to run WebArena environments with docker run
  - Remove map NOTES.md
  - Disable test-docker-sites.yml workflow temporarily

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add CI test data support for Wikipedia and Map sites

  - Add dev.ci.setup-wikipedia task to download small Ray Charles ZIM (~2.7MB)
  - Add dev.ci.generate-map-data task to generate Monaco test data
  - Add --data-dir parameter to envs.docker.test for mounting CI data
  - Update Wikipedia tests to work with both small and full ZIM files
  - Add Map CI tests for Monaco data
  - Split CI workflows into one per site (only wikipedia enabled for testing)
  - Remove Wikipedia Dockerfile.ci (use normal build with data mount)
  - Store CI data in data/ directory at repo root

  Usage:
    inv dev.ci.setup-wikipedia
    inv envs.docker.build --site=wikipedia --tag=test
    inv envs.docker.test --site=wikipedia --tag=test --data-dir=data/wikipedia

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add container management and setup CLI commands

  Port container start/stop functionality from dev/ to src/ and add new setup commands for Docker volume management.

  New CLI commands:
  - `env start/stop/status --site <name>` - Manage Docker containers
  - `env start --port/--env-ctrl-port` - Custom port mapping
  - `env setup init --site --data-dir` - Download data and create volumes
  - `env setup clean --site --force` - Remove Docker volumes

  New modules:
  - environments/container/ - ContainerManager, defaults, utilities
  - environments/setup/ - Volume setup orchestration, Docker operations

  Config changes:
  - Added ContainerConfig, ContainerSetupConfig, ContainerVolumeSpec types
  - Added optional `container` field to EnvironmentConfig

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Refactor container management based on PR review

  - Create ContainerBackend Protocol with DockerBackend implementation
  - Move container status types to types/container.py as Pydantic models
  - Move DEFAULT_CONTAINER_CONFIGS to environments/container/config.py
  - Use pre-computed volume names (webarena_verified_*) instead of suffix
  - Use keyword-only arguments (*,) throughout container APIs
  - Add hostname parameter to ContainerManager with default "localhost"
  - Simplify defaults.py to re-export from config.py

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Split backend into folder with one file per class

  - Create backend/protocol.py with ContainerBackend Protocol
  - Create backend/docker.py with DockerBackend implementation
  - Create backend/__init__.py with re-exports and get_default_backend
  - Remove defaults.py, import directly from config.py

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Remove __all__ from non-__init__ files

  Keep __all__ only in __init__.py files per convention.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Remove patches directory and related infrastructure

  The PatchManager class and patches directory are no longer needed as patching functionality has been moved to the container initialization process. This removes dead code and simplifies the codebase.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Remove Gatus monitoring service

  The health monitoring dashboard added complexity without providing sufficient value for local development workflows.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add CI workflows for shopping, shopping_admin, and reddit

  Enables automated testing when changes are made to these Docker environment sites or their integration tests.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Address PR review comments and enable GitLab CI

  Fixes from Copilot review:
  - Fix _get_service_url() to read port from env vars
  - Fix Shopping Admin port in READMEs (6680 -> 7780)
  - Add --map_env_ctrl_url option and add map to env-ctrl tests
  - Fix Wikipedia search test to assert visibility
  - Add timeout to map playwright tests
  - Remove overly broad HTTPError handling in map test
  - Fix gitlab entrypoint.sh set -u crash on early signal
  - Fix SO_REUSEADDR order in network_utils (before bind)
  - Fix race condition in container port allocation (let Docker assign)

  CI changes:
  - Enable test-docker-gitlab.yml workflow
  - Remove obsolete .disabled workflow files

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Refactor container startup: simplify wait logic and fix config duplication

  - Add env-ctrl init to gitlab entrypoint (was missing)
  - Change wait default to True, add --no-wait flag
  - Pass WA_ENV_CTRL_EXTERNAL_SITE_URL env var to docker run (no double init)
  - Replace _wait_and_configure with _wait_for_ready (just polls, no init call)
  - Add health_check_path to ContainerConfig for external URL polling
  - Remove duplicate volumes field from ContainerConfig (derive from setup.volumes)
  - Make port and env_ctrl_port mandatory in manager.start()
  - Accept 4xx responses as "site is up" in external URL polling
  - Fix ruff/ty issues

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Update README with CLI commands for environment management

  - Add CLI examples for env start/stop/status commands
  - Keep Docker direct commands as alternative
  - Update port mappings to include env-ctrl ports

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Update environment docs with CLI commands

  - Replace invoke/docker-compose examples with webarena-verified CLI
  - Keep Docker direct commands as alternative
  - Update Quick Start sections for all sites
  - Simplify data setup instructions for wikipedia and map

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Updated entrypoint

* Mark Map site as beta in announcement

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
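
Two of the fixes named in the commit message are easy to get wrong, so here is a minimal Python sketch of the general techniques. It is not the code from this PR: `bind_listener` and `is_site_up` are hypothetical names, and treating every status below 500 as "up" is an assumption that goes slightly beyond the "accept 4xx" wording above. The first function sets `SO_REUSEADDR` before `bind()` and asks for port 0 so the OS picks a free port, which avoids the check-then-bind race that the "let Docker assign" fix addresses on the container side.

```python
import socket


def bind_listener(*, host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Return a listening TCP socket (hypothetical helper).

    port=0 asks the OS to choose a free port atomically, so there is no
    window between "find a free port" and "bind to it".
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # SO_REUSEADDR only influences a bind() that happens after it is set,
        # so the option has to come first.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock


def is_site_up(status_code: int) -> bool:
    """Readiness check for URL polling (hypothetical helper).

    Assumption: any response below 500, including 4xx, shows that the web
    server is answering; only 5xx counts as "not ready yet".
    """
    return 100 <= status_code < 500
```

A caller would read the assigned port back with `sock.getsockname()[1]` instead of choosing one in advance.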
Summary
Changes