Repository Guidelines

Repo: https://github.com/platonai/Browser4

Table of Contents

Project Overview
Quick Start
Project Structure
Key APIs and Concepts
Code Style Guidelines
Testing Guidelines
Configuration
Development Principles
Definition of Done
Common Issues & Troubleshooting
Documentation References
Claude-Specific Guidance

Project Overview

Browser4 is a lightning-fast, coroutine-safe browser engine for AI agents. It provides:

Browser Agents — Fully autonomous browser agents that reason, plan, and execute end-to-end tasks
Browser Automation — High-performance automation for workflows, navigation, and data extraction
Machine Learning Agent — Learns field structures across complex pages without consuming tokens
Extreme Performance — Fully coroutine-safe; supports 100k ~ 200k complex page visits per machine per day
Data Extraction — Hybrid of LLM, ML, and selectors for clean data across chaotic pages

Quick Start

Prerequisites

Java 17+
Latest Google Chrome

Build Commands

Linux/macOS:

chmod +x mvnw
./mvnw -q -DskipTests

Windows (PowerShell):

.\mvnw.cmd -q -D"skipTests"

Windows (cmd):

mvnw.cmd -q -DskipTests

Run Tests

Core module tests (Linux/macOS):

./mvnw -pl browser4-core -am test -Dsurefire.failIfNoSpecifiedTests=false

Core module tests (Windows PowerShell):

.\mvnw.cmd -pl browser4-core -am test -D"surefire.failIfNoSpecifiedTests=false"

Recommended Build Scripts

Windows: bin/build.ps1 [-test]
Linux/macOS: bin/build.sh [-test]

Project Structure

| Module | Description |
| --- | --- |
| `browser4-core` | Core engine: sessions, scheduling, DOM, browser control |
| `browser4-agentic` | AI agents implementation, MCP, skills registration |
| `browser4-rest` | Spring Boot REST layer & command endpoints |
| `sdks/*` | Browser4 CLI + skill assets (`sdks/browser4-cli`, `sdks/skill`) |
| `browser4/*` | Product packaging (`browser4/browser4-agents`) |
| `examples/*` | Runnable examples (`examples/browser4-examples`) |
| `browser4-tests` | E2E, heavy integration, and scenario tests |
| `browser4-tests-common` | Shared test base classes and utilities |
| `pulsar-benchmarks` | JMH benchmarks |

Key APIs and Concepts

Sessions

// Create a session
val session = AgenticContexts.createSession()

// Create an agent
val agent = AgenticContexts.getOrCreateAgent()

Core API Classes

WebDriver — Browser control interface with human-like behaviors
PulsarSession → AgenticSession — Page loading, parsing, and extraction
LoadOptions — CLI-style URL parameters for page loading
BrowserPerceptiveAgent — AI agent implementation

Load Options

URL parameters control page loading behavior:

val page = session.load(url, "-expires 1d -refresh -parse")

Key options:

-expires <duration> — Page expiration time
-refresh — Force page refresh
-parse — Activate parsing subsystem
-outLink <selector> — Extract links matching selector
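
Because the options are passed as a plain string, they can also be assembled in code. The following is a minimal sketch, assuming a hypothetical helper `buildLoadArgs` that is not part of the Browser4 API; it only joins the four options listed above and does not quote selectors that contain spaces.

```kotlin
// Hypothetical helper, not part of the Browser4 API: it only assembles
// the CLI-style argument string from the options listed above.
fun buildLoadArgs(
    expires: String? = null,   // duration text such as "1d"
    refresh: Boolean = false,
    parse: Boolean = false,
    outLink: String? = null    // selector for links to extract
): String {
    val parts = mutableListOf<String>()
    if (expires != null) parts += "-expires $expires"
    if (refresh) parts += "-refresh"
    if (parse) parts += "-parse"
    if (outLink != null) parts += "-outLink $outLink"
    return parts.joinToString(" ")
}
```

With this sketch, `buildLoadArgs(expires = "1d", refresh = true, parse = true)` yields the same string used in the `session.load` call above.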

Code Style Guidelines

Kotlin Conventions

Prefer immutable data classes
Use explicit return types
Apply null-safety patterns (require/check/?:)
Public APIs require KDoc documentation
Store AI-generated task docs in docs-dev/copilot/

KDoc Template:

/**
 * Brief description of what the function does.
 *
 * @param paramName Description of the parameter.
 * @return Description of the return value.
 * @throws ExceptionType When this exception is thrown.
 */
fun functionName(paramName: Type): ReturnType {
    require(paramName.isValid) { "paramName must be valid" }
    // implementation
}
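
As an illustration of how the conventions and the KDoc template fit together, here is a small sketch. `TaskResult` and `toTaskResult` are hypothetical names chosen for the example, not Browser4 types.

```kotlin
/**
 * Immutable result of a finished task.
 *
 * @property taskId Non-blank task identifier.
 * @property costMillis Elapsed time in milliseconds; never negative.
 */
data class TaskResult(val taskId: String, val costMillis: Long)

/**
 * Builds a [TaskResult] from loosely typed input.
 *
 * @param taskId Task identifier; null or blank falls back to "unknown".
 * @param costMillis Elapsed time in milliseconds.
 * @return An immutable [TaskResult].
 * @throws IllegalArgumentException When [costMillis] is negative.
 */
fun toTaskResult(taskId: String?, costMillis: Long): TaskResult {
    // require() guards arguments; ?: supplies a default for the null case.
    require(costMillis >= 0) { "costMillis must not be negative" }
    val id: String = taskId?.takeIf { it.isNotBlank() } ?: "unknown"
    return TaskResult(id, costMillis)
}
```

The explicit return type and the `require` guard keep the contract visible at the call site, and the data class has only `val` properties, so instances cannot be mutated after creation.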

Logging

Use placeholder-style logging (avoid string concatenation):

logger.info("Task {} finished in {} ms", taskId, cost)

Testing Guidelines

Minimal Test Policy (default)

To keep iteration fast, don’t run full test suites by default.

Default: mvnw compile with tests skipped
Then: run the smallest relevant test scope (module/class) when logic changes
Upgrade scope when risk increases (cross-module, public API/DTO/serialization, Spring wiring, dependency bumps, concurrency/I/O, browser/CDP lifecycle)

See TESTING.md for details and trade-offs.

Test Commands in This Repository

Use bin/test.ps1 on Windows for scoped runs: fast, it, e2e, rest, skills, mcp, cli, browser4
Maven profile switches in root pom.xml are property-driven: -DrunITs=true, -DrunE2ETests=true, -DrunSDKTests=true, -DrunCoreTests=true, -DrunRestTests=true
sdks/browser4-cli/tests/e2e.rs: all e2e scenarios must start and depend on Browser4.jar; this includes single-scenario runs via --scenario.

Test Location

Module unit tests: src/test/kotlin/...
Centralized integration/E2E: browser4-tests/
Shared utilities: browser4-tests-common/

Naming Conventions

Unit tests: <ClassName>Test.kt
Integration tests: <ClassName>IT.kt
E2E tests: <ClassName>E2ETest.kt
Method names: Use camelCase (NOT backtick naming)
- ✅ testUserLoginWithValidCredentials() + @DisplayName("test user login with valid credentials")
- ❌ `test user login with valid credentials`
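
The file-name suffixes overlap: `E2ETest.kt` also ends with `Test.kt`. A sketch of a classifier that respects this, assuming a hypothetical helper `classifyTestFile` (the repository may select tests differently, for example through Maven profiles):

```kotlin
enum class TestKind { UNIT, INTEGRATION, E2E }

// Hypothetical helper: classifies a test file by the suffix conventions above.
// Returns null when the file name matches none of them.
fun classifyTestFile(fileName: String): TestKind? = when {
    // Check the E2E suffix first, because it also ends with "Test.kt".
    fileName.endsWith("E2ETest.kt") -> TestKind.E2E
    fileName.endsWith("IT.kt") -> TestKind.INTEGRATION
    fileName.endsWith("Test.kt") -> TestKind.UNIT
    else -> null
}
```

Ordering the branches from most specific to least specific avoids reporting an E2E test as a unit test.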

Test Performance Targets

Unit tests: <100ms
Integration tests: <5s
E2E tests: <30s

Coverage Targets

Global: ≥70%
Core logic: ≥80%
Utilities: ≥90%
Controllers: ≥85%

Configuration

Application Port

Default: 8182

Configuration Files

application.properties — Main configuration
application-*.properties — Profile-specific overrides
application-private.properties — Private overrides (ignored by Git); set secrets here or via environment variables

Key Configuration Properties

# LLM API Key
openrouter.api.key=your-api-key

# Browser context mode: DEFAULT | SYSTEM_DEFAULT | SEQUENTIAL | TEMPORARY
browser.context.mode=DEFAULT

# Display mode: GUI | HEADLESS | SUPERVISED
browser.display.mode=GUI
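
In the standard `.properties` format, `#` starts a comment only at the beginning of a line; text after a value on the same line becomes part of the value. The sketch below shows one way to read and validate the display mode, assuming plain `java.util.Properties` parsing. `readDisplayMode` is a hypothetical helper, and Browser4's own configuration loading may differ.

```kotlin
import java.io.StringReader
import java.util.Properties

// Display modes as listed above.
enum class DisplayMode { GUI, HEADLESS, SUPERVISED }

// Hypothetical helper: returns the configured display mode, or null when the
// key is absent. An unrecognized value fails fast with IllegalArgumentException.
fun readDisplayMode(propertiesText: String): DisplayMode? {
    val props = Properties().apply { load(StringReader(propertiesText)) }
    val raw = props.getProperty("browser.display.mode")?.trim() ?: return null
    return requireNotNull(DisplayMode.values().find { it.name == raw }) {
        "Unsupported browser.display.mode: $raw"
    }
}
```

Given `browser.display.mode=HEADLESS`, the helper returns `DisplayMode.HEADLESS`. Given `browser.display.mode=GUI  # GUI | HEADLESS`, it throws, because the trailing text is read as part of the value.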

Development Principles

Minimal Changes — Make the smallest possible modifications
Preserve Style — Match existing code patterns
Clear Logging — Use structured, placeholder-based logging
Test Coverage — Include tests for new/changed logic
Documentation — Update docs for public API changes

Definition of Done (PR Checklist)

Build and related tests pass
No new high-noise logs or warnings
New/changed logic has tests (main path + edge case)
No secrets or private endpoints committed
No arbitrary version changes (follow parent BOM)
Documentation updated for public behavior changes
Performance impact assessed if significant (>5%)

Common Issues & Troubleshooting

| Issue | Solution |
| --- | --- |
| `mvnw` no execute permission | `chmod +x mvnw` |
| JDK version mismatch | Ensure JDK 17+ in `JAVA_HOME` |
| Windows parameter escaping | Use `-D"key.with.dots=value"` |
| Port 8182 in use | Override `server.port` or use root `application.properties` |
| CDP retry log storms | Use existing retry utilities, lower log level |

Documentation References

Claude-Specific Guidance

Understanding Browser4 Architecture

Browser4 is built around three core concepts:

Sessions - Main interface to manage page loading, fetching, parsing, extracting, AI chatting, page state, persistence, and more
Agents - Autonomous browser agents with reasoning capabilities
WebDrivers - Low-level browser control with human-like behaviors

Task Planning and Execution

When given a task, Claude should:

Analyze Requirements - Break down the task into minimal changes
Explore First - Use grep/glob or explore agent to understand relevant code
Make Minimal Changes - Preserve existing style and patterns
Test Incrementally - Run targeted tests after each change
Document Changes - Update relevant documentation

Common Task Patterns

Adding a New Feature

Identify the relevant module (browser4-core, browser4-agentic, browser4-rest)
Check existing similar features for patterns
Add interface/API in appropriate package
Implement with proper error handling and logging
Add tests (unit + integration if needed)
Update documentation

Fixing a Bug

Reproduce the issue with a test
Use grep to find related code
Make minimal fix
Verify test passes
Check for similar patterns elsewhere

Refactoring Code

Ensure tests exist for current behavior
Make incremental changes
Run tests after each step
Preserve public API contracts
Update KDoc if API changes

Adding a `browser4-cli` Command

Add a CommandDef in sdks/browser4-cli/src/commands.rs; keep the CLI command name kebab-case, use a browser_-prefixed snake_case MCP tool name, and map args/options to JSON in tool_params_fn
Add the frontend alias in browser4-rest/.../MCPToolController.kt so names like browser_my_tool resolve to the internal tool name such as my_tool
Reuse existing backend tools when possible; if a new browser capability is required, add an @MCP method in WebDriver.kt, implement it in the concrete driver, and only add an explicit BrowserTabToolExecutor case when parameter mapping is non-trivial
Update sdks/browser4-cli/src/main.rs only when the command needs custom dispatch, dynamic tool-name selection, stale-session recovery, or inclusion in no_snapshot_commands() for read-only behavior
Update sdks/skill/SKILL.md for user-facing command documentation; CLI help is generated from CommandDef, so avoid hand-editing help infrastructure
Cover the change with the smallest relevant tests: sdks/browser4-cli/src/commands.rs unit tests, browser4-rest controller mapping tests, sdks/browser4-cli/tests/e2e.rs, and browser4-tests/browser4-rest-tests/.../MCPToolControllerE2ETest.kt when the command changes the end-to-end flow
Watch the common failure points: missing backend alias, omitted sessionId in custom handlers, forgetting no_snapshot_commands() for read-only commands, mismatched element-ref parameter names, and snake_case/camelCase argument normalization
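
The naming rules above can be sketched as three small conversions. These helpers are hypothetical and only mirror the conventions described; the real mapping lives in `commands.rs` and `MCPToolController.kt`, and the direction of argument normalization is an assumption.

```kotlin
// "my-tool" (kebab-case CLI command) -> "browser_my_tool" (MCP tool name)
fun cliToMcpToolName(cliName: String): String =
    "browser_" + cliName.replace('-', '_')

// "browser_my_tool" (MCP tool name) -> "my_tool" (internal tool name)
fun mcpToInternalToolName(mcpName: String): String =
    mcpName.removePrefix("browser_")

// "session_id" -> "sessionId" (assumed direction of argument normalization)
fun snakeToCamel(name: String): String =
    name.split('_')
        .filter { it.isNotEmpty() }
        .mapIndexed { index, part ->
            if (index == 0) part else part.replaceFirstChar { it.uppercaseChar() }
        }
        .joinToString("")
```

Keeping the three names derivable from one another makes a missing backend alias easier to spot: if `mcpToInternalToolName(cliToMcpToolName("my-tool"))` does not match a registered tool, the alias is likely absent.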

Browser Automation Specifics

Key Classes to Know:

WebDriver - Main browser control interface
PageHandler - Page lifecycle management
ClickableDOM - DOM interaction utilities
LoadOptions - Page loading parameters

Common Patterns:

// Obtain a session, its companion agent, and a driver bound to the session
val session = AgenticContexts.getOrCreateSession()
val agent = session.companionAgent
val driver = session.getOrCreateBoundDriver()

// Open a page, parse it, and extract fields with selectors
var page = session.open(url)
var document = session.parse(page)
var fields = session.extract(document, mapOf("title" to "#title"))

// Perform single natural-language actions
var result = agent.act("scroll to the bottom")
result = agent.act("scroll to the top")
result = agent.act("enter 'pulsar' into the search box and submit the form (RESULTS will display in the same page)")
result = agent.act("click search button")

// Read text from the live page
var content = driver.selectFirstTextOrNull("body")

// Run a multi-step task
var history = agent.run("find the search box, type 'web scraping' and submit the form (RESULTS will display in the same page)")

// Capture the driver's current page, then parse and extract again
page = session.capture(driver)
document = session.parse(page)
fields = session.extract(document, mapOf("title" to "#title"))

MCP (Model Context Protocol) Integration

Browser4 integrates with MCP for tool calling:

// Define a tool
class CustomTool : MCPTool {
    override val name = "custom_action"
    override val description = "Performs a custom action"

    override fun execute(params: Map<String, Any>): ToolResult {
        TODO("Implement the custom action")
    }
}

// Register the tool
skillRegistry.register(CustomTool())

Performance Considerations

Coroutine Safety - All operations must be coroutine-safe
Resource Cleanup - Always close sessions/drivers in finally blocks
Batch Operations - Use parallel processing for multiple pages
Caching - Respect page expiration settings
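
The resource-cleanup rule can be sketched with a stand-in type. `FakeSession` and `loadAll` are hypothetical and exist only for this example; the real Browser4 session and driver types are not used here.

```kotlin
// Stand-in for a session or driver; not a Browser4 type.
class FakeSession(val name: String) : AutoCloseable {
    var closed: Boolean = false
        private set

    fun load(url: String): String {
        check(!closed) { "session $name is closed" }
        require(url.isNotBlank()) { "url must not be blank" }
        return "content of $url"
    }

    override fun close() {
        closed = true
    }
}

// Loads every URL and closes the session even when a load fails.
fun loadAll(session: FakeSession, urls: List<String>): List<String> =
    try {
        urls.map { session.load(it) }
    } finally {
        session.close()
    }
```

The `finally` block runs on both the success path and the failure path, so the session is released even if one of the URLs is rejected.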

Security Best Practices

Input Validation - Always validate URLs and user inputs
API Keys - Never hardcode, use configuration
XSS Prevention - Sanitize extracted content
CDP Security - Handle Chrome DevTools Protocol errors gracefully
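
For input validation, one possible approach is to accept only absolute `http` or `https` URLs before passing them to a session. This is a sketch with a hypothetical helper `isLoadableUrl`; the set of allowed schemes is an assumption, and Browser4 may already provide its own URL checks.

```kotlin
import java.net.URI
import java.net.URISyntaxException

// Hypothetical validator: accepts only absolute http/https URLs with a host.
fun isLoadableUrl(input: String): Boolean {
    val uri = try {
        URI(input.trim())
    } catch (e: URISyntaxException) {
        return false // malformed input, e.g. contains spaces
    }
    val scheme = uri.scheme?.lowercase() ?: return false
    return scheme in setOf("http", "https") && !uri.host.isNullOrBlank()
}
```

Under these rules, `https://example.com/a?b=1` is accepted, while `javascript:alert(1)`, `file:///etc/passwd`, and a bare `example.com` are rejected.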

Debugging with Claude

For Build Issues:

# Check Maven output
./mvnw clean compile -X

# Verify dependencies
./mvnw dependency:tree

For Test Failures:

# Run specific test
./mvnw -pl browser4-core test -Dtest=SpecificTest

# With debug output
./mvnw -pl browser4-core test -Dtest=SpecificTest -X

For Runtime Issues:

Check logs in logs/ directory
Enable trace logging for specific packages
Use -diagnose LoadOption for page loading issues

Working with Agents

Browser4's agentic capabilities allow autonomous task execution:

val agent = AgenticContexts.getOrCreateAgent()

// Simple task
val result = agent.run("Go to example.com and find the latest news")

// Complex multi-step task
val productResult = agent.run("""
    1. Navigate to shopping site
    2. Search for 'laptops under $1000'
    3. Filter by rating > 4 stars
    4. Extract top 5 products with specs
    5. Return as JSON
""")

Agent Best Practices:

Provide clear, step-by-step instructions
Use structured output formats (JSON, tables)
Handle errors gracefully
Set appropriate timeouts
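
The first two practices can be combined in a small prompt builder. This is a sketch with a hypothetical helper `buildTaskPrompt`, not a Browser4 API; the resulting string would be passed to `agent.run`.

```kotlin
// Hypothetical helper: numbers the steps and appends an explicit
// output-format instruction as the final step.
fun buildTaskPrompt(steps: List<String>, outputFormat: String): String {
    require(steps.isNotEmpty()) { "at least one step is required" }
    val numbered = steps.mapIndexed { index, step -> "${index + 1}. ${step.trim()}" }
    val last = "${steps.size + 1}. Return the result as $outputFormat"
    return (numbered + last).joinToString("\n")
}
```

Building the prompt from a list keeps the step numbering consistent and makes the expected output format hard to forget.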

Code Review Checklist

Before submitting changes, verify:

Code follows Kotlin conventions (immutable, explicit types)
Public APIs have KDoc documentation
Logging uses placeholders, not concatenation
Tests cover main path and at least one edge case
No hardcoded values (use configuration)
Changes are minimal and focused
Existing tests still pass
No new warnings or deprecations

Getting Help

Check docs/ for detailed guides
Review examples/ for usage patterns
Look in browser4-tests/ for test examples
See docs-dev/copilot/ for development notes

Last updated: 2026-03-14
