Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 38 additions & 0 deletions .github/workflows/doc-path-audit.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
name: Doc-path audit

# Verifies every relative file-path reference in CLAUDE.md, README.md,
# skills.md, TRIAGE.md, TESTS.md, and docs/*.md points at a file that
# actually exists in the repo.
#
# Failure mode this catches: a file gets renamed, moved, or archived,
# and a prose reference somewhere in the doc set still points at the
# old location. The bidirectional spec-coverage audit handles drift for
# tests ↔ specs; this one handles drift for prose ↔ files.
#
# Same lightweight shape as spec-coverage.yml — pure-bash script, no
# network, no node setup.

on:
pull_request:
paths:
- '**/*.md'
- 'scripts/check-doc-paths.sh'
- '.github/workflows/doc-path-audit.yml'
push:
branches: [main]
workflow_dispatch:

permissions:
contents: read

jobs:
doc-path-audit:
runs-on: ubuntu-latest
permissions:
contents: read
steps:
- name: Checkout
uses: actions/checkout@v6

- name: Run doc-path audit
run: bash scripts/check-doc-paths.sh
4 changes: 2 additions & 2 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ enforced, …) not a single product. Apps are upstream; we don't own their UI.
2. [`docs/spec-coverage.md`](./docs/spec-coverage.md) — audit table: which test pins which requirement
3. [`docs/spec-coverage-deferred.md`](./docs/spec-coverage-deferred.md) — gap pile + categories
4. [`skills.md`](./skills.md) — findings (F1–F11), link-coverage rules, per-app admin gating narrative, adding-a-new-app guide, bug-staging workflow
5. [`TESTS.md`](./TESTS.md) — per-test catalog (hand-maintained, may lag — `git log -- tests/` is the truth)
5. To answer "what does test X do?" — read the test file's head comment + `@spec` tag, OR run `grep -nA1 "// @spec" tests/` for the full inventory. The retired per-test catalog (`TESTS.md`) was hand-maintained and went stale; it's archived at `docs/archive/TESTS.md` for provenance.

When CI is red, [`TRIAGE.md`](./TRIAGE.md) is the 2-minute "what is
this and what do I do" runbook — failure pattern → cause → action.
Expand Down Expand Up @@ -113,7 +113,7 @@ structure and how to update it.
| **IDP / mPass** | The unauthenticated landing on the IDP host is a **method picker** ("QR Code Login" / "Password Login") — NOT a direct username/password form. `cognitoLogin()` clicks "Password Login" internally before filling fields. Tests calling `cognitoLogin()` shouldn't pre-assert on `input[type="password"]` visibility before the helper runs — assert on the method picker (e.g. `getByText(/password login/i)`) or skip the pre-assertion and trust `cognitoLogin` to find the right form. |
| **Multi-tab login mutex** | Only **one** OAuth login flow can be in progress per browser at a time (FOSSSMBBUN-88 / `session-lifecycle#single-flight`). The portal Login button reads a non-HttpOnly `mpass_login_lock` cookie set by `/authorize`; mpass-auth-proxy returns **409** as a server-side backstop. TTL = 600s. Tests that start a login but **don't complete it** must either complete or close the context before starting a second concurrent flow — otherwise the second `/authorize` 409s. The lock cookie is cleared on successful `/mpass-callback`, on failed `/mpass-callback` (any path), and on `/mpass/logout`. |
| **Identities** | Two SSO users with an **asymmetric workspace surface**, NOT a role-split. Both auto-join the shared SMB workspace `fossarbisoft` as Member; FOSS_USER also belongs to additional workspaces/teams that NORMAL_USER doesn't. Verified via direct API probes against the live sandbox. **`FOSS_USER` ("User A", multi-workspace surface):** Plane Member of `fossarbisoft` + Admin of `aa` (`FOSS_USER_PRIVATE_WORKSPACE_SLUG`); Outline workspace admin of own team (UUID `1a2e0bad-…` per `OUTLINE_TEAM_ID`); Penpot Owner of personal "Default" team (UUID `c16a7502-…` per `PENPOT_TEAM_ID`) plus Editor on the SMB `fossarbisoft` team; SurfSense Owner of search space id 1 (`SURFSENSE_SEARCH_SPACE_ID`); Twenty `canAccessFullAdminPanel=true`. **`NORMAL_USER` ("User B", SMB-workspace-only surface):** Plane Member of `fossarbisoft` only (no `aa`); Outline workspace admin of own (different) team; Penpot Owner of own personal Default; no broader admin scope. **Use cases:** Where FOSS_USER and NORMAL_USER overlap in `fossarbisoft` they have the SAME role (both Member) — so the per-app `*-admin` tests are about WORKSPACE-LEVEL admin (FOSS_USER has it on `aa`, NORMAL_USER doesn't); about each user's own-team admin (vacuously equal); or about cross-workspace isolation (NORMAL_USER's UI/API access to `aa` must be refused). `tests/apps/pm-workspace-isolation.spec.ts` pins the isolation contract for Plane. `PLANE_ADMIN_USER`/`PLANE_ADMIN_PASS` is the local-creds god-mode admin (separate from SSO). `foss-server-bundle`'s `scripts/provision-admin/*` (PR #63) can promote a chosen email to SMB-workspace admin in each app; not currently run with FOSS_USER's email. |
| **Spec source — vendored** | The SSO openspec lives in-tree at `vendor/openspec/specs/`. The audit reads from there — no network or token at CI time. Updates land as normal PR diffs against the markdown files. `vendor/openspec/skills/` carries the per-app admin contracts (one `SKILL.md` per app) + the devstack `app-rules/RULES.md` — vendored as reference; the audit doesn't parse them yet (they don't use the `### Requirement:` format). |
| **Spec source — vendored** | The SSO openspec lives in-tree at `vendor/openspec/specs/`. The audit reads from there — no network or token at CI time. Updates land as normal PR diffs against the markdown files. `vendor/openspec/skills/` carries the per-app admin contracts (one `SKILL.md` per app) + the devstack `vendor/openspec/skills/app-rules/RULES.md` — vendored as reference; the audit doesn't parse them yet (they don't use the `### Requirement:` format). |

## What to work on (live)

Expand Down
7 changes: 5 additions & 2 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -34,8 +34,11 @@ typecheck: ## Run TypeScript type-check
audit: ## Run the spec-coverage audit (needs SPEC_DIR or SPEC_REPO_TOKEN)
bash scripts/check-spec-coverage.sh

pre-commit: typecheck audit ## Run typecheck + audit before pushing. Add a fast test subset locally if useful.
@echo "✓ typecheck + spec-coverage audit clean"
audit-paths: ## Audit doc cross-references — every relative file path in CLAUDE/README/skills/TRIAGE/docs must resolve
bash scripts/check-doc-paths.sh

pre-commit: typecheck audit audit-paths ## Run typecheck + audits before pushing. Add a fast test subset locally if useful.
@echo "✓ typecheck + spec-coverage + doc-path audits clean"

test: ## Run full test suite
@# Guard against the common footgun: someone runs `make test ID=FOSSSMBBUN-112 FORCE=1`
Expand Down
7 changes: 4 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,8 @@
# FOSS E2E — Playwright Test Suite

End-to-end tests for the FOSS platform. See **[`TESTS.md`](./TESTS.md)** for
a top-to-bottom catalog of every test in the suite. Highlights: SSO chain,
End-to-end tests for the FOSS platform. See
**[`docs/spec-coverage.md`](./docs/spec-coverage.md)** for the
requirement-by-requirement audit table. Highlights: SSO chain,
multi-app session sharing, cookie expiry bounds, session lifecycle
(logout / invalidation / replay / deletion), per-app link coverage, the
Plane god-mode admin escape hatch, Outline's admin `/settings/*` SSO-gating
Expand Down Expand Up @@ -290,7 +291,7 @@ tests/
`FOSS_BASE_URL`.
`skills.md` — local invariant contract (what every app must satisfy).
[vendored openspec](./vendor/openspec/) — canonical edge-layer +
per-app rules organised by capability spec; the `security/` and
per-app rules organised by capability spec; the `tests/security/` and
`identity-consistency` tests verify these on the live deployment.

## Auth architecture
Expand Down
2 changes: 1 addition & 1 deletion TRIAGE.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ Don't triage the skipped tests. They're not the problem. Fix the smoke.
| `per-app-login-smoke[<app>]` returns 2xx but no usable email | App middleware authed the request but didn't synthesise the email — likely `DEFAULT_EMAIL_DOMAIN` misconfig | Check the app container's env. See `vendor/openspec/skills/app-rules/RULES.md` for the canonical pattern. |
| `outline-admin.spec.ts /settings/<X>` failed once, passed on retry | Outline's chunk-loader 429 burst flake | Known flake. Documented in `CLAUDE.md` → Deployment gotchas → Outline. Not actionable. |
| `concurrent-first-login.spec.ts Outline: 3 parallel first-time visits resolve to ONE identity` | The race-condition test occasionally needs FOSS_USER to be a *fresh* user; if they're already provisioned in Outline, the test can flake | Re-run; if it stays red across re-runs, it's a real regression. |
| `security/headers.spec.ts` portal CSP / COOP / CORP / Server | Bundle-side; portal nginx config | Waiting on `foss-server-bundle#66`. Suite-side fix already in (`tests/security/headers.spec.ts` scoped to portal only). |
| `tests/security/headers.spec.ts` portal CSP / COOP / CORP / Server | Bundle-side; portal nginx config | Waiting on `foss-server-bundle#66`. Suite-side fix already in (`tests/security/headers.spec.ts` scoped to portal only). |
| `tests/meta/playwright-practices.spec.ts` failed | Convention violation in a spec file | Read the failure — it names the file, line, and rule. Fix the spec, not the meta. |
| `tests/meta/...` "every contract-bearing spec carries at least one @spec tag" failed | A new test file lacks an `@spec` tag and isn't in `UNTAGGED_ALLOWLIST` | Add `// @spec <module>#<requirement-slug>` pointing at a vendored requirement, OR add a new `### Requirement:` to the matching SKILL.md / spec.md and tag against it, OR add the file to `UNTAGGED_ALLOWLIST` in the meta spec with a one-line rationale. |
| `spec-coverage` check failed with `❌ Missing` rows | New requirement in `vendor/openspec/` but no test pins it | Either add a `// @spec module#slug` tag above the test that covers it, or add it to `docs/spec-coverage-deferred.md` with a category + rationale. |
Expand Down
File renamed without changes.
85 changes: 85 additions & 0 deletions docs/archive/test-review-2026-05-22.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,85 @@
# Test-suite review — 2026-05-22

Scope: `tests/` (excluding `tests/bugs/`) — **44 files reviewed**, **306 expanded tests** (83 `test()` declarations × parameterization).

## Bottom line

The suite is in **good shape**. One Important finding worth fixing in a hygiene PR; no Critical findings. The 306-test count is justified — it expands from 83 distinct `test()` declarations, each parameterized for per-app + per-payload precision. Removing the multipliers would lose the "which exact attack vector / which exact app" failure-localisation signal.

## Critical

(none)

## Important

- **`tests/auth/proxy-short-circuit.spec.ts:34`** — hardcoded `_oauth2_proxy` literal
- **Why:** Convention (Mechanical check 2 from this skill): import `AUTH_COOKIE` from `constants.ts` rather than hardcoding the cookie name. The only files allowed to use the literal are `cookie-tampering` and `session-fixation` (which assert tampering signals).
- **Fix:** Replace `const SSO_COOKIE_NAMES = new Set(["_oauth2_proxy"]);` with `const SSO_COOKIE_NAMES = new Set([AUTH_COOKIE]);` and add `AUTH_COOKIE` to the existing `constants.ts` import.

## Nit

(none)

## Findings discarded (false positives worth recording so future runs skip them)

- **S2 (login choreography duplication):** 5 grep hits, all legitimate.
- `tests/security/sso-mode-no-local-login.spec.ts` looks for password inputs to assert they DON'T exist (the test's whole point).
- `tests/auth/session-lifecycle.spec.ts` checks if a password input is visible (re-auth detection signal).
- `tests/apps/pm-godmode.spec.ts` uses local creds at `/god-mode/` — explicitly NOT SSO, per CLAUDE.md.
- **S4 (sleep / `delay()` / `node:timers/promises`):** all 30+ hits are `test.setTimeout(...)` or `raw.setTimeout(...)` — Playwright's per-test timeout setter, not a sleep call. Grep pattern needs to be narrower (`page.waitForTimeout` is the actual ban, already covered by Mechanical check 3).
- **S6 (cookie surgery):** all hits are in tests where cookie state IS the system-under-test:
- `cookie-tampering.spec.ts` + `session-fixation.spec.ts` — explicitly listed as allowed in CLAUDE.md.
- `layer2-re-establish.spec.ts` — tests session re-establishment when Layer 1 cookies/storage are cleared. The cookie surgery is the setup, not a shortcut around the contract.
- `session-lifecycle.spec.ts` — tests session lifecycle by clearing/restoring `AUTH_COOKIE`. Same reasoning.
- **S3 (bare `toBe(true|false)`):** 27 files flagged by raw count, but all sampled cases use the `expect(value, "descriptive message").toBe(...)` form — the message arg is on the line above. The skill's grep matches the closing parens only and gives a noisy signal. Skip this check on future runs, or rewrite the grep to detect missing-message-arg directly (e.g. AST-based, since `expect(X).toBe(Y)` vs `expect(X, "msg").toBe(Y)` is hard to distinguish with regex).

## On the "do we have too many tests?" question

| Concern | Reality |
|---|---|
| Raw count | 306 tests across 44 files |
| Source count | 83 `test()` declarations |
| Expansion ratio | 3.7× average, driven by **per-app loops** (×5) and **per-payload loops** (×4 for redirects, ×5 for OIDC state values) |

### Highest-multiplier files

| File | Tests | Pattern |
|---|---|---|
| `apps/outline-admin.spec.ts` | 39 | 13 Outline settings paths × 3 contracts (cold-bounce / non-admin role / admin role) |
| `security/headers.spec.ts` | 27 | Targets × `HEADER_RULES` + the new HTML-hardening block |
| `security/oidc-state-integrity.spec.ts` | 25 | 5 apps × 5 invalid-state payloads |
| `security/bypass-surface.spec.ts` | 21 | Bypass paths × apps × statuses |
| `security/open-redirect.spec.ts` | 20 | 5 apps × 4 evasion payloads |
| `security/open-redirect-crlf.spec.ts` | 20 | 5 apps × 4 CRLF payloads |
| `security/http-method-tampering.spec.ts` | 20 | 5 apps × 4 methods |

Each line in this table is "one test per attack vector × per app." Collapsing them (e.g. one test that loops internally and aggregates failures) would hide which specific app/payload combination failed — the most useful signal during a security regression.

### Specs to leave alone (high count is justified)

All of them.

### Specs to consider trimming (none currently)

- A reasonable rule-of-thumb threshold is `tests-per-file > 40` for non-security specs. The only file approaching that is `outline-admin.spec.ts` at 39, but its three contracts × 13 paths shape is intentional and reviewed (see the file's leading comment about Outline's chunk-loader 429 flake handling).

## Summary

| Severity | Count |
|---|---|
| Critical | 0 |
| Important | 1 |
| Nit | 0 |

**Action items:**

1. Open a hygiene PR for the single Important finding (`proxy-short-circuit.spec.ts:34` — use `AUTH_COOKIE` import).

That's it. No further action needed; the suite size and shape are correct.

## Cross-references

- `CLAUDE.md` — suite conventions (the rulebook this review measured against)
- `skills.md` §1 / §2 — invariant-based test organisation
- `scripts/check-spec-coverage.sh` — separate audit for `@spec` coverage (last run: 27 ✅ / 25 ⚠️ / 0 ❌ / 52 total)
- `.claude/skills/test-reviewer/SKILL.md` — the skill that produced this report
Loading
Loading