Repo: https://github.com/platonai/Browser4
Table of Contents
- Project Overview
- Quick Start
- Project Structure
- Key APIs and Concepts
- Code Style Guidelines
- Testing Guidelines
- Configuration
- Development Principles
- Definition of Done
- Common Issues & Troubleshooting
- Documentation References
- Claude-Specific Guidance
Browser4 is a lightning-fast, coroutine-safe browser engine for AI agents. It provides:
- Browser Agents — Fully autonomous browser agents that reason, plan, and execute end-to-end tasks
- Browser Automation — High-performance automation for workflows, navigation, and data extraction
- Machine Learning Agent — Learns field structures across complex pages without consuming tokens
- Extreme Performance — Fully coroutine-safe; supports 100k ~ 200k complex page visits per machine per day
- Data Extraction — Hybrid of LLM, ML, and selectors for clean data across chaotic pages
- Java 17+
- Latest Google Chrome
Linux/macOS:
chmod +x mvnw
./mvnw -q -DskipTests
Windows (PowerShell):
.\mvnw.cmd -q -D"skipTests"
Windows (cmd):
mvnw.cmd -q -DskipTests
Core module tests (Linux/macOS):
./mvnw -pl browser4-core -am test -Dsurefire.failIfNoSpecifiedTests=false
Core module tests (Windows PowerShell):
.\mvnw.cmd -pl browser4-core -am test -D"surefire.failIfNoSpecifiedTests=false"
- Windows: `bin/build.ps1 [-test]`
- Linux/macOS: `bin/build.sh [-test]`
| Module | Description |
|---|---|
| `browser4-core` | Core engine: sessions, scheduling, DOM, browser control |
| `browser4-agentic` | AI agents implementation, MCP, skills registration |
| `browser4-rest` | Spring Boot REST layer & command endpoints |
| `sdks/*` | Browser4 CLI + skill assets (`sdks/browser4-cli`, `sdks/skill`) |
| `browser4/*` | Product packaging (`browser4/browser4-agents`) |
| `examples/*` | Runnable examples (`examples/browser4-examples`) |
| `browser4-tests` | E2E, heavy integration, and scenario tests |
| `browser4-tests-common` | Shared test base classes and utilities |
| `pulsar-benchmarks` | JMH benchmarks |
// Create a session
val session = AgenticContexts.createSession()
// Create an agent
val agent = AgenticContexts.getOrCreateAgent()
- `WebDriver` - Browser control interface with human-like behaviors
- `PulsarSession` → `AgenticSession` - Page loading, parsing, and extraction
- `LoadOptions` - CLI-style URL parameters for page loading
- `BrowserPerceptiveAgent` - AI agent implementation
URL parameters control page loading behavior:
val page = session.load(url, "-expires 1d -refresh -parse")
Key options:
- `-expires <duration>`: Page expiration time
- `-refresh`: Force page refresh
- `-parse`: Activate parsing subsystem
- `-outLink <selector>`: Extract links matching selector
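As a rough illustration of how a CLI-style option string maps onto typed settings, the sketch below parses the four options listed above into a small data class. `SketchLoadOptions`, `parseSketchOptions`, and `parseDuration` are hypothetical names for this example, the duration suffixes (`s`, `m`, `h`, `d`) are an assumed subset, and none of this is Browser4's actual `LoadOptions` implementation.

```kotlin
import java.time.Duration

// Hypothetical holder for the options listed above; NOT Browser4's LoadOptions class.
data class SketchLoadOptions(
    val expires: Duration? = null,
    val refresh: Boolean = false,
    val parse: Boolean = false,
    val outLink: String? = null
)

// Parses durations such as "30s", "10m", "2h", "1d" (an assumed subset of formats).
fun parseDuration(text: String): Duration {
    require(text.length >= 2) { "Duration too short: '$text'" }
    val amount = text.dropLast(1).toLongOrNull()
        ?: throw IllegalArgumentException("Invalid duration amount: '$text'")
    return when (text.last()) {
        's' -> Duration.ofSeconds(amount)
        'm' -> Duration.ofMinutes(amount)
        'h' -> Duration.ofHours(amount)
        'd' -> Duration.ofDays(amount)
        else -> throw IllegalArgumentException("Unknown duration unit in '$text'")
    }
}

// Walks the whitespace-separated tokens; flags that take a value consume the next token.
// Limitation of this sketch: a selector containing spaces would need quoting support.
fun parseSketchOptions(args: String): SketchLoadOptions {
    val tokens = args.trim().split(Regex("\\s+")).filter { it.isNotEmpty() }
    var options = SketchLoadOptions()
    var i = 0
    while (i < tokens.size) {
        when (val flag = tokens[i]) {
            "-refresh" -> options = options.copy(refresh = true)
            "-parse" -> options = options.copy(parse = true)
            "-expires" -> {
                val value = tokens.getOrNull(++i)
                    ?: throw IllegalArgumentException("-expires needs a value")
                options = options.copy(expires = parseDuration(value))
            }
            "-outLink" -> {
                val value = tokens.getOrNull(++i)
                    ?: throw IllegalArgumentException("-outLink needs a value")
                options = options.copy(outLink = value)
            }
            else -> throw IllegalArgumentException("Unknown option: $flag")
        }
        i++
    }
    return options
}
```

Calling `parseSketchOptions("-expires 1d -refresh -parse")` yields an object with a one-day expiry and both flags set. Unknown flags fail fast in this sketch; how Browser4 itself treats them is not covered here.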
- Prefer immutable `data class`
- Use explicit return types
- Apply null-safety patterns (`require`/`check`/`?:`)
- Public APIs require KDoc documentation
- Store AI generated task docs in `docs-dev/copilot/`
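The null-safety patterns named above can be sketched as follows. `FetchTask` and `TaskRunner` are hypothetical types invented for this illustration and do not exist in Browser4; only `require`, `check`, and `?:` are standard Kotlin.

```kotlin
// Hypothetical immutable value type, for illustration only.
data class FetchTask(val url: String, val retries: Int, val label: String?)

// Hypothetical class with a small piece of state to validate.
class TaskRunner {
    private var started = false

    fun start() {
        started = true
    }

    /** Returns a short description of [task]; fails fast on bad input or bad state. */
    fun describe(task: FetchTask): String {
        // require: validates arguments, throws IllegalArgumentException on failure
        require(task.url.isNotBlank()) { "url must not be blank" }
        require(task.retries >= 0) { "retries must be >= 0, was ${task.retries}" }
        // check: validates object state, throws IllegalStateException on failure
        check(started) { "TaskRunner must be started before use" }
        // ?: supplies a default when the nullable value is absent
        val label = task.label ?: "unlabeled"
        return "$label -> ${task.url} (retries=${task.retries})"
    }
}
```

The split matters for callers: an `IllegalArgumentException` points at the input they passed, while an `IllegalStateException` points at the order in which they used the object.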
KDoc Template:
/**
* Brief description of what the function does.
*
* @param paramName Description of the parameter.
* @return Description of the return value.
* @throws ExceptionType When this exception is thrown.
*/
fun functionName(paramName: Type): ReturnType {
    require(paramName.isValid) { "paramName must be valid" }
    // implementation
}
Use placeholder-style logging (avoid string concatenation):
logger.info("Task {} finished in {} ms", taskId, cost)
To keep iteration fast, don’t run full test suites by default.
- Default: `mvnw` compile with tests skipped
- Then: run the smallest relevant test scope (module/class) when logic changes
- Upgrade scope when risk increases (cross-module, public API/DTO/serialization, Spring wiring, dependency bumps, concurrency/I/O, browser/CDP lifecycle)
See TESTING.md for details and trade-offs.
- Use `bin/test.ps1` on Windows for scoped runs: `fast`, `it`, `e2e`, `rest`, `skills`, `mcp`, `cli`, `browser4`
- Maven profile switches in root `pom.xml` are property-driven: `-DrunITs=true`, `-DrunE2ETests=true`, `-DrunSDKTests=true`, `-DrunCoreTests=true`, `-DrunRestTests=true`
- `sdks/browser4-cli/tests/e2e.rs`: all e2e scenarios must start and depend on Browser4.jar; this includes single-scenario runs via `--scenario`.
- Module unit tests: `src/test/kotlin/...`
- Centralized integration/E2E: `browser4-tests/`
- Shared utilities: `browser4-tests-common/`
- Unit tests: `<ClassName>Test.kt`
- Integration tests: `<ClassName>IT.kt`
- E2E tests: `<ClassName>E2ETest.kt`
- Method names: Use camelCase (NOT backtick naming)
  - ✅ `testUserLoginWithValidCredentials()` + `@DisplayName("test user login with valid credentials")`
  - ❌ `test user login with valid credentials`
- Unit tests: <100ms
- Integration tests: <5s
- E2E tests: <30s
- Global: ≥70%
- Core logic: ≥80%
- Utilities: ≥90%
- Controllers: ≥85%
Default port: 8182
- `application.properties`: Main configuration
- `application-*.properties`: Profile-specific overrides
- `application-private.properties`: Private overrides (ignored by Git); set secrets here or via environment variables
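One detail worth keeping in mind when editing these files: in the standard Java properties format, `#` starts a comment only at the beginning of a line, so text placed after a value on the same line becomes part of that value. The sketch below demonstrates this with `java.util.Properties` and a strict parser for `browser.display.mode`. It assumes the files are read with standard properties semantics; `loadProperties` and `displayModeOf` are hypothetical helpers, not Browser4 APIs.

```kotlin
import java.io.StringReader
import java.util.Properties

// Display modes named in this guide's configuration example.
enum class DisplayMode { GUI, HEADLESS, SUPERVISED }

// Loads properties from in-memory text using the standard JDK parser.
fun loadProperties(text: String): Properties =
    Properties().apply { load(StringReader(text)) }

// Hypothetical helper: reads browser.display.mode and fails fast on unknown values.
fun displayModeOf(props: Properties): DisplayMode {
    val raw = requireNotNull(props.getProperty("browser.display.mode")) {
        "browser.display.mode is not set"
    }
    return DisplayMode.values().firstOrNull { it.name == raw.trim() }
        ?: throw IllegalArgumentException("Unsupported browser.display.mode: '$raw'")
}

// Inline "comment": the whole tail of the line is kept as the value.
val inlineComment = loadProperties("browser.display.mode=HEADLESS # GUI | HEADLESS | SUPERVISED")

// Comment on its own line: the value is just the mode name.
val separateComment = loadProperties("# GUI | HEADLESS | SUPERVISED\nbrowser.display.mode=HEADLESS")
```

With these inputs, `displayModeOf(separateComment)` resolves to `DisplayMode.HEADLESS`, while `displayModeOf(inlineComment)` throws because the stored value still contains the trailing text.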
# LLM API Key
openrouter.api.key=your-api-key
# Browser context mode: DEFAULT | SYSTEM_DEFAULT | SEQUENTIAL | TEMPORARY
browser.context.mode=DEFAULT
# Display mode: GUI | HEADLESS | SUPERVISED
browser.display.mode=GUI
- Minimal Changes - Make the smallest possible modifications
- Preserve Style - Match existing code patterns
- Clear Logging - Use structured, placeholder-based logging
- Test Coverage - Include tests for new/changed logic
- Documentation - Update docs for public API changes
- Build and related tests pass
- No new high-noise logs or warnings
- New/changed logic has tests (main path + edge case)
- No secrets or private endpoints committed
- No arbitrary version changes (follow parent BOM)
- Documentation updated for public behavior changes
- Performance impact assessed if significant (>5%)
| Issue | Solution |
|---|---|
| `mvnw` no execute permission | `chmod +x mvnw` |
| JDK version mismatch | Ensure JDK 17+ in JAVA_HOME |
| Windows parameter escaping | Use -D"key.with.dots=value" |
| Port 8182 in use | Override server.port or use root application.properties |
| CDP retry log storms | Use existing retry utilities, lower log level |
- Configuration Guide
- Build Guide
- Testing Taxonomy
- Browser4 CLI Skills Development Guide
- Advanced Guide
- REST API Examples
- Concepts
- X-SQL
- AI Products Guidance
Browser4 is built around three core concepts:
- Sessions - Main interface to manage page loading, fetching, parsing, extracting, AI chatting, page state, persistence, and more
- Agents - Autonomous browser agents with reasoning capabilities
- WebDrivers - Low-level browser control with human-like behaviors
When given a task, Claude should:
- Analyze Requirements - Break down the task into minimal changes
- Explore First - Use grep/glob or explore agent to understand relevant code
- Make Minimal Changes - Preserve existing style and patterns
- Test Incrementally - Run targeted tests after each change
- Document Changes - Update relevant documentation
- Identify the relevant module (browser4-core, browser4-agentic, browser4-rest)
- Check existing similar features for patterns
- Add interface/API in appropriate package
- Implement with proper error handling and logging
- Add tests (unit + integration if needed)
- Update documentation
- Reproduce the issue with a test
- Use grep to find related code
- Make minimal fix
- Verify test passes
- Check for similar patterns elsewhere
- Ensure tests exist for current behavior
- Make incremental changes
- Run tests after each step
- Preserve public API contracts
- Update KDoc if API changes
- Add a `CommandDef` in `sdks/browser4-cli/src/commands.rs`; keep the CLI command name kebab-case, use a `browser_`-prefixed snake_case MCP tool name, and map args/options to JSON in `tool_params_fn`
- Add the frontend alias in `browser4-rest/.../MCPToolController.kt` so names like `browser_my_tool` resolve to the internal tool name such as `my_tool`
- Reuse existing backend tools when possible; if a new browser capability is required, add an `@MCP` method in `WebDriver.kt`, implement it in the concrete driver, and only add an explicit `BrowserTabToolExecutor` case when parameter mapping is non-trivial
- Update `sdks/browser4-cli/src/main.rs` only when the command needs custom dispatch, dynamic tool-name selection, stale-session recovery, or inclusion in `no_snapshot_commands()` for read-only behavior
- Update `sdks/skill/SKILL.md` for user-facing command documentation; CLI help is generated from `CommandDef`, so avoid hand-editing help infrastructure
- Cover the change with the smallest relevant tests: `sdks/browser4-cli/src/commands.rs` unit tests, `browser4-rest` controller mapping tests, `sdks/browser4-cli/tests/e2e.rs`, and `browser4-tests/browser4-rest-tests/.../MCPToolControllerE2ETest.kt` when the command changes the end-to-end flow
- Watch the common failure points: missing backend alias, omitted `sessionId` in custom handlers, forgetting `no_snapshot_commands()` for read-only commands, mismatched element-ref parameter names, and snake_case/camelCase argument normalization
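The naming conventions in the steps above can be pictured with a small sketch. It assumes the MCP tool name is derived mechanically from the CLI command name and that arguments are normalized from snake_case to camelCase; both are assumptions for illustration. All function names here are hypothetical, and the real mapping lives in `commands.rs` and `MCPToolController.kt`.

```kotlin
// Prefix used for MCP tool names, as described in the steps above.
const val MCP_PREFIX = "browser_"

/** "my-tool" (kebab-case CLI command) -> "browser_my_tool" (MCP tool name). Assumed derivation. */
fun cliCommandToMcpToolName(command: String): String {
    require(Regex("[a-z0-9]+(-[a-z0-9]+)*").matches(command)) {
        "CLI command must be kebab-case: '$command'"
    }
    return MCP_PREFIX + command.replace('-', '_')
}

/** "browser_my_tool" -> "my_tool"; names without the prefix pass through unchanged. */
fun mcpToolNameToInternal(toolName: String): String = toolName.removePrefix(MCP_PREFIX)

/** "element_ref" -> "elementRef"; names that are already camelCase are left as they are. */
fun snakeToCamel(name: String): String =
    name.split('_')
        .filter { it.isNotEmpty() }
        .mapIndexed { index, part ->
            if (index == 0) part else part.replaceFirstChar { it.uppercaseChar() }
        }
        .joinToString("")

/** Normalizes every argument key so handlers see one spelling per parameter. */
fun normalizeArgs(args: Map<String, Any?>): Map<String, Any?> =
    args.mapKeys { snakeToCamel(it.key) }
```

A mapping test in this spirit is cheap to write and catches two of the failure points listed above (a missing alias and mismatched argument spelling) before an end-to-end run is needed.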
Key Classes to Know:
- `WebDriver` - Main browser control interface
- `PageHandler` - Page lifecycle management
- `ClickableDOM` - DOM interaction utilities
- `LoadOptions` - Page loading parameters
Common Patterns:
val session = AgenticContexts.getOrCreateSession()
val agent = session.companionAgent
val driver = session.getOrCreateBoundDriver()
var page = session.open(url)
var document = session.parse(page)
var fields = session.extract(document, mapOf("title" to "#title"))
var result = agent.act("scroll to the bottom")
result = agent.act("scroll to the top")
result = agent.act("enter 'pulsar' into the search box and submit the form (RESULTS will display in the same page)")
result = agent.act("click search button")
var content = driver.selectFirstTextOrNull("body")
content = driver.selectFirstTextOrNull("body")
var history = agent.run("find the search box, type 'web scraping' and submit the form (RESULTS will display in the same page)")
page = session.capture(driver)
document = session.parse(page)
fields = session.extract(document, mapOf("title" to "#title"))
Browser4 integrates with MCP for tool calling:
// Define a tool
class CustomTool : MCPTool {
    override val name = "custom_action"
    override val description = "Performs a custom action"
    override fun execute(params: Map<String, Any>): ToolResult {
        TODO("Implementation")
    }
}
// Register the tool
skillRegistry.register(CustomTool())
- Coroutine Safety - All operations must be coroutine-safe
- Resource Cleanup - Always close sessions/drivers in finally blocks
- Batch Operations - Use parallel processing for multiple pages
- Caching - Respect page expiration settings
- Input Validation - Always validate URLs and user inputs
- API Keys - Never hardcode, use configuration
- XSS Prevention - Sanitize extracted content
- CDP Security - Handle Chrome DevTools Protocol errors gracefully
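Two of the points above, input validation and resource cleanup, combine naturally: validate first, acquire the resource second, and release it in `finally`. The sketch below shows that ordering with a hypothetical `SketchSession` stand-in (any `AutoCloseable` fits); `validateUrl` and `loadOnce` are illustrative names, not Browser4 APIs, and the accepted schemes are an assumption.

```kotlin
import java.net.URI
import java.net.URISyntaxException

// Hypothetical stand-in for a session or driver; NOT a Browser4 class.
class SketchSession : AutoCloseable {
    var closed = false
        private set

    fun load(url: String): String {
        check(!closed) { "session is closed" }
        return "loaded $url"
    }

    override fun close() {
        closed = true
    }
}

/** Accepts only absolute http/https URLs with a host (assumed policy); returns the parsed URI. */
fun validateUrl(input: String): URI {
    val uri = try {
        URI(input.trim())
    } catch (e: URISyntaxException) {
        throw IllegalArgumentException("Malformed URL: '$input'", e)
    }
    val scheme = uri.scheme?.lowercase()
    require(scheme == "http" || scheme == "https") { "Unsupported scheme in '$input'" }
    require(!uri.host.isNullOrBlank()) { "Missing host in '$input'" }
    return uri
}

/** Validates before acquiring anything, then always closes the session it opened. */
fun loadOnce(rawUrl: String, openSession: () -> SketchSession): String {
    val uri = validateUrl(rawUrl)
    val session = openSession()
    try {
        return session.load(uri.toString())
    } finally {
        session.close() // runs on success and on failure
    }
}
```

For a single `AutoCloseable`, Kotlin's `use { }` gives the same guarantee as the explicit `try`/`finally` shown here.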
For Build Issues:
# Check Maven output
./mvnw clean compile -X
# Verify dependencies
./mvnw dependency:tree
For Test Failures:
# Run specific test
./mvnw -pl browser4-core test -Dtest=SpecificTest
# With debug output
./mvnw -pl browser4-core test -Dtest=SpecificTest -X
For Runtime Issues:
- Check logs in the `logs/` directory
- Enable trace logging for specific packages
- Use the `-diagnose` LoadOption for page loading issues
Browser4's agentic capabilities allow autonomous task execution:
val agent = AgenticContexts.getOrCreateAgent()
// Simple task
val result = agent.run("Go to example.com and find the latest news")
// Complex multi-step task
val complexResult = agent.run("""
1. Navigate to shopping site
2. Search for 'laptops under $1000'
3. Filter by rating > 4 stars
4. Extract top 5 products with specs
5. Return as JSON
""")
Agent Best Practices:
- Provide clear, step-by-step instructions
- Use structured output formats (JSON, tables)
- Handle errors gracefully
- Set appropriate timeouts
Before submitting changes, verify:
- Code follows Kotlin conventions (immutable, explicit types)
- Public APIs have KDoc documentation
- Logging uses placeholders, not concatenation
- Tests cover main path and at least one edge case
- No hardcoded values (use configuration)
- Changes are minimal and focused
- Existing tests still pass
- No new warnings or deprecations
- Check `docs/` for detailed guides
- Review `examples/` for usage patterns
- Look in `browser4-tests/` for test examples
- See `docs-dev/copilot/` for development notes
Last updated: 2026-03-14