7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -16,6 +16,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added

- **Multi-worker concurrency** — two layers, available identically across CLI, web, and the Python API:
  - *Page-level*: `TestConfig.workers` (CLI `--workers`, web `workers` in the `POST /api/run` body) tests multiple pages of a single run in parallel, each worker driving its own browser/context. Defaults to `1` (sequential, unchanged behaviour); capped at 16. Authentication is performed once and replicated to every worker via Playwright `storage_state`.
  - *Session-level*: `BatchRunner` (`from qa_agent import BatchRunner`) runs multiple independent sessions through a bounded thread pool. The CLI exposes it via `--batch-file`/`--pool-size`; the web server now uses it instead of an unbounded thread-per-job model (`QA_AGENT_JOB_POOL_SIZE`, default 4).
- **Expanded public API** — `from qa_agent import QAAgent, TestConfig, BatchRunner, …` now re-exports the full public surface for library use.

## [0.2.3] - 2026-05-22

### Fixed
111 changes: 111 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,111 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Setup

```bash
pip install -e ".[dev,web,pdf]"
playwright install chromium
```

The package must be installed (editable is fine) for `python -m qa_agent`, the
`qa-agent`/`qa-agent-web` entry points, and subprocess-based tests (e.g.
`tests/_cli_exit_helper.py`) to find the `qa_agent` module. If packaging tests
fail with `ModuleNotFoundError: No module named 'qa_agent'` or version
mismatches, run `pip install -e .` first — check with `pip show qa-agent`.

## Commands

```bash
# Unit tests (fast, no browser)
pytest -v -m "not integration and not network"

# Single test
pytest tests/test_agent.py::TestClassName::test_name -v

# Integration tests (real Playwright against local fixture server)
pytest -v -m integration --no-cov

# Lint / format / type-check
ruff check .
ruff format .
mypy qa_agent

# Build
rm -rf build/ dist/ && python -m build
```

Coverage is enforced at 70% via `--cov-fail-under=70` in `pyproject.toml`
(applies to default `pytest` invocations). Running a small subset of tests
without `--no-cov` will fail on the coverage gate even if the tests
themselves pass — use `-p no:cacheprovider -o addopts=""` or `--no-cov` to
bypass when checking a few tests in isolation.

Integration tests serve fixtures from `tests/fixtures/test-target/` (a
73-page HTML fixture site driven by `manifest.json`, which is the source of
truth for parametrized integration tests — each entry maps a fixture file to
an expected finding title/category). Start the fixture server manually for
debugging:

```bash
cd tests/fixtures/test-target && python3 -m http.server 8181
```

## Architecture

Request flow: `cli.py` parses args into a `TestConfig` (`config.py`) →
if `--instructions`/`--instructions-file` is set, `ai_planner.py` calls
`llm_client.py` (Anthropic/OpenAI via stdlib `urllib`, no SDK deps) to
produce a `TestPlan`, cached on disk by `plan_cache.py` (24h TTL) →
`agent.py` (`QAAgent`) launches Playwright, iterates/crawls target URLs, and
runs each enabled tester from `testers/` against every page, collecting
`Finding` objects → reporters in `reporters/` consume the resulting
`TestSession` and write console/markdown/json/pdf output.

- **Concurrency**: `concurrency.py` implements page-level worker pools
(`--workers`, max 16) within a single run, and `batch.py` (`BatchRunner`)
runs multiple independent `TestConfig` sessions concurrently with a bounded
pool (`--pool-size`/`--batch-file`, max 8). Total live browsers ≈
`pool_size × workers`.
- **Rate limiting**: `rate_limiter.py` (`HostRateLimiter`) paces
`page.goto()` navigations per-hostname (`--rate-limit`, default 3 req/s,
`0` disables). One shared instance per `QAAgent` run covers all its
workers; `BatchRunner` can hold a single shared instance passed to every
`QAAgent` it constructs so concurrent batch jobs hitting the same host
share one budget.
- **Testers** (`testers/`) all extend `BaseTester` (`testers/base.py`),
receive a Playwright `Page` + `TestConfig`, and return `list[Finding]`.
`custom.py` runs AI-generated steps from the cached `TestPlan`.
`wcag_compliance.py` is opt-in (`--wcag-compliance`) and excluded from
coverage.
- **Reporters** (`reporters/`) all extend `BaseReporter` and consume a
`TestSession`; JSON is always written regardless of `--output` (web UI
relies on it for session discovery).
- **Web UI** (`web/`): Flask app (`server.py`) with SSE streaming for live
run output; templates/static assets are in `web/templates/` and
`web/static/`. No auth — local/internal use only.
- **Models** (`models.py`): `Finding`, `FindingCategory`, `Severity`,
`PageAnalysis`, `TestSession`, `TestPlan` — the shared data contracts
between testers, the agent, and reporters.
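
The bounded-pool idea behind `BatchRunner` can be sketched with the standard library. This is an illustration only, not the code in `batch.py`: `run_bounded` is a hypothetical helper, and returning results in submission order is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def run_bounded(jobs, pool_size=4):
    """Run zero-argument callables with at most `pool_size` in flight.

    Hypothetical helper: results come back in submission order, and a job
    that raises contributes its exception instead of aborting the batch.
    """
    results = []
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [pool.submit(job) for job in jobs]
        for future in futures:
            try:
                results.append(future.result())
            except Exception as exc:  # keep going; report the failure in place
                results.append(exc)
    return results
```

Capping `max_workers` is what keeps the number of concurrent sessions bounded, in contrast to starting one thread per job.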

### Adding a new tester

1. New module in `testers/` extending `BaseTester`; implement
   `run() -> list[Finding]`.
2. Export from `testers/__init__.py`.
3. Add a `test_*` bool to `TestConfig` (`config.py`).
4. Wire into `agent.py` `_test_page()`.
5. Add `--skip-*`/opt-in flag in `cli.py` if needed.
6. Add tests in `tests/testers/`.
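
As a rough illustration of step 1, here is a hypothetical `MissingTitleTester`. `BaseTester`, `Finding`, and `Severity` below are simplified stand-ins so the snippet runs on its own; the real classes in `testers/base.py` and `models.py` may take different constructor arguments and fields. `FakePage` mimics only the `title()` method and `url` property of a Playwright `Page`.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):  # stand-in for the real Severity enum
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    INFO = "info"


@dataclass
class Finding:  # stand-in; the real Finding likely carries more fields
    title: str
    severity: Severity
    url: str


class BaseTester:  # stand-in for testers/base.py
    def __init__(self, page, config):
        self.page = page
        self.config = config

    def run(self) -> list[Finding]:
        raise NotImplementedError


class MissingTitleTester(BaseTester):  # hypothetical tester
    """Flags pages whose <title> is empty or whitespace."""

    def run(self) -> list[Finding]:
        if self.page.title().strip():
            return []
        return [Finding("Page has no title", Severity.LOW, self.page.url)]


class FakePage:  # test double for the two Page members used above
    def __init__(self, url: str, title: str):
        self.url = url
        self._title = title

    def title(self) -> str:
        return self._title
```

A small page double like `FakePage` keeps the unit test for a tester of this shape in the fast, browser-free test run.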

### Severity levels

`CRITICAL` (security/data loss) · `HIGH` (major usability blockers) ·
`MEDIUM` (UX/accessibility) · `LOW` (minor/best-practice) · `INFO`.

### Exit codes (CLI)

`0` no critical/high findings · `1` critical/high findings found · `2` error
during run · `130` interrupted (Ctrl+C). Covered by
`tests/test_packaging.py::TestExitCodeSmoke` via `tests/_cli_exit_helper.py`.
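
The mapping above can be written as a small function. This is a sketch, not the code in `cli.py`: `exit_code_for` is a hypothetical name, severities are passed as plain strings, and the precedence (interrupt, then error, then findings) is an assumption.

```python
BLOCKING = {"critical", "high"}  # severities that make the run exit with 1


def exit_code_for(severities, error=False, interrupted=False) -> int:
    """Map a run's outcome to the CLI exit codes listed above."""
    if interrupted:  # Ctrl+C
        return 130
    if error:  # the run itself failed
        return 2
    if any(s.lower() in BLOCKING for s in severities):
        return 1
    return 0
```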
63 changes: 63 additions & 0 deletions README.md
@@ -127,6 +127,25 @@ print(f"Pages tested: {len(session.pages_tested)}")
print(f"Total findings: {session.total_findings}")
```

Set `workers` to test pages in parallel, and use `BatchRunner` to run several
independent sessions concurrently with a bounded pool:

```python
from qa_agent import BatchRunner, TestConfig

configs = [
    TestConfig(urls=["https://example.com"], workers=4),
    TestConfig(urls=["https://other.test"]),
]

with BatchRunner(pool_size=4) as runner:
    for result in runner.run_all(configs):
        if isinstance(result, Exception):
            print(f"session failed: {result}")
        else:
            print(f"{result.session_id}: {result.total_findings} findings")
```

→ [Full Python API Reference](https://github.com/billrichards/qa-agent/blob/main/docs/api-reference.md) — all classes, methods, and configuration options.

---
@@ -241,6 +260,50 @@ qa-agent --mode focused https://example.com # default — test only given URLs
qa-agent --mode explore https://example.com # crawl and test discovered pages
```

### Concurrency

Test multiple pages in parallel with cooperating workers. Each worker drives its
own browser, so memory and CPU scale with the worker count (capped at 16).

```bash
qa-agent --workers 4 --mode explore https://example.com # 4 pages at a time
```

Run several independent sessions concurrently from a JSON spec file. Each entry
needs `urls` plus optional per-run overrides (`mode`, `max_depth`, `max_pages`,
`instructions`, `workers`); all other settings come from the command-line flags.

```bash
qa-agent --batch-file runs.json --pool-size 4
```

```json
[
{"urls": ["https://example.com"], "mode": "explore", "workers": 4},
{"urls": ["https://other.test/login"], "instructions": "Check the checkout flow"}
]
```
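
To check a spec file before a long batch run, a small validator along these lines may help. It is a sketch based only on the description above: `load_batch_spec` is a hypothetical helper, and rejecting unknown keys is an assumption, since the CLI may simply ignore them.

```python
import json

# Per-run override keys named in the README; `urls` is the only required key.
ALLOWED_OVERRIDES = {"mode", "max_depth", "max_pages", "instructions", "workers"}


def load_batch_spec(text: str) -> list[dict]:
    """Parse a batch spec and check each entry's keys (hypothetical helper)."""
    entries = json.loads(text)
    if not isinstance(entries, list):
        raise ValueError("batch spec must be a JSON array")
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict) or not entry.get("urls"):
            raise ValueError(f"entry {i}: 'urls' is required")
        unknown = set(entry) - ALLOWED_OVERRIDES - {"urls"}
        if unknown:
            raise ValueError(f"entry {i}: unsupported keys {sorted(unknown)}")
    return entries
```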

| Flag | Default | Description |
|---|---|---|
| `--workers N` | `1` | Concurrent page-workers per run (max 16) |
| `--batch-file FILE` | — | JSON file of multiple runs to execute concurrently |
| `--pool-size N` | `4` | Max concurrent runs for `--batch-file` (max 8) |
| `--rate-limit N` | `3.0` | Max page navigations/sec to any single host (0 = unlimited) |

> Total live browsers ≈ `pool-size × workers`, so size both with that
> multiplicative cost in mind. The web API accepts the same `workers` value in
> the `POST /api/run` body, and the pool size is set server-side via the
> `QA_AGENT_JOB_POOL_SIZE` environment variable.
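
As a quick sizing aid, the estimate can be computed directly. The helper below is hypothetical, and clamping out-of-range values to the documented maxima is an assumption; the tool may reject them instead.

```python
MAX_WORKERS = 16   # documented cap for --workers
MAX_POOL_SIZE = 8  # documented cap for --pool-size


def estimated_browsers(pool_size: int, workers: int) -> int:
    """Upper-bound estimate of live browsers: pool_size x workers."""
    pool_size = max(1, min(pool_size, MAX_POOL_SIZE))
    workers = max(1, min(workers, MAX_WORKERS))
    return pool_size * workers
```

With `--pool-size 4 --workers 4` this gives 16 browsers; at both caps it gives 128.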

By default, navigations to any single host are throttled to 3 requests/second
across all workers and batch jobs, so that many concurrent browsers do not
overwhelm dev/staging servers (for example, with "too many connections"
errors). Raise the limit or disable it with `--rate-limit` (e.g.
`--rate-limit 10`, or `--rate-limit 0` for unlimited). The limit applies only
to page navigations (`page.goto()`), not to in-page interactions such as clicks
or form fills. The web server uses the same 3 req/s default, overridable via
the `QA_AGENT_RATE_LIMIT` environment variable.
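
For intuition, per-host pacing of this kind can be sketched as a shared limiter that hands out evenly spaced time slots per hostname. This is not the project's `HostRateLimiter`; `PerHostLimiter` and its `acquire` method are hypothetical, and the injectable `clock`/`sleep` arguments exist only to make the sketch testable.

```python
import threading
import time
from urllib.parse import urlsplit


class PerHostLimiter:  # hypothetical; not the real HostRateLimiter
    def __init__(self, rate_per_sec: float, clock=time.monotonic, sleep=time.sleep):
        # A rate of 0 disables pacing, mirroring `--rate-limit 0`.
        self.interval = 1.0 / rate_per_sec if rate_per_sec > 0 else 0.0
        self._next_slot: dict[str, float] = {}
        self._lock = threading.Lock()
        self._clock = clock
        self._sleep = sleep

    def acquire(self, url: str) -> float:
        """Block until the URL's host may be navigated to; return seconds waited."""
        if self.interval == 0.0:
            return 0.0
        host = urlsplit(url).hostname or ""
        with self._lock:  # reserve the next free slot for this host
            now = self._clock()
            slot = max(now, self._next_slot.get(host, now))
            self._next_slot[host] = slot + self.interval
        wait = slot - now
        if wait > 0:
            self._sleep(wait)  # sleep outside the lock so other hosts proceed
        return wait
```

A worker would call `limiter.acquire(url)` just before `page.goto(url)`; sharing one limiter across workers and batch jobs is what makes them draw on a single per-host budget.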

### Exploration (explore mode)

| Flag | Default | Description |
31 changes: 31 additions & 0 deletions qa_agent/__init__.py
@@ -12,3 +12,34 @@
except PackageNotFoundError:
    # Package not installed (e.g. running from source without install)
    __version__ = "0.2.3"

from .agent import QAAgent  # noqa: E402
from .batch import BatchJob, BatchRunner  # noqa: E402
from .config import (  # noqa: E402
    AuthConfig,
    OutputFormat,
    RecordingConfig,
    ScreenshotConfig,
    TestConfig,
    TestMode,
)
from .llm_client import LLMProvider  # noqa: E402
from .models import Finding, PageAnalysis, Severity, TestSession  # noqa: E402

__all__ = [
    "QAAgent",
    "BatchRunner",
    "BatchJob",
    "TestConfig",
    "AuthConfig",
    "ScreenshotConfig",
    "RecordingConfig",
    "TestMode",
    "OutputFormat",
    "LLMProvider",
    "TestSession",
    "PageAnalysis",
    "Finding",
    "Severity",
    "__version__",
]