Pressingly · awais786 · Jun 3, 2026 · Jun 3, 2026 · Jun 3, 2026 · Jun 3, 2026
diff --git a/.github/workflows/doc-path-audit.yml b/.github/workflows/doc-path-audit.yml
@@ -0,0 +1,38 @@
+name: Doc-path audit
+
+# Verifies every relative file-path reference in CLAUDE.md, README.md,
+# skills.md, TRIAGE.md, TESTS.md, and docs/*.md points at a file that
+# actually exists in the repo.
+#
+# Failure mode this catches: a file gets renamed, moved, or archived,
+# and a prose reference somewhere in the doc set still points at the
+# old location. The bidirectional spec-coverage audit handles drift for
+# tests ↔ specs; this one handles drift for prose ↔ files.
+#
+# Same lightweight shape as spec-coverage.yml — pure-bash script, no
+# network, no node setup.
+
+on:
+  pull_request:
+    paths:
+      - '**/*.md'
+      - 'scripts/check-doc-paths.sh'
+      - '.github/workflows/doc-path-audit.yml'
+  push:
+    branches: [main]
+  workflow_dispatch:
+
+permissions:
+  contents: read
+
+jobs:
+  doc-path-audit:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v6
+
+      - name: Run doc-path audit
+        run: bash scripts/check-doc-paths.sh
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -18,7 +18,7 @@ enforced, …) not a single product. Apps are upstream; we don't own their UI.
 2. [`docs/spec-coverage.md`](./docs/spec-coverage.md) — audit table: which test pins which requirement
 3. [`docs/spec-coverage-deferred.md`](./docs/spec-coverage-deferred.md) — gap pile + categories
 4. [`skills.md`](./skills.md) — findings (F1–F11), link-coverage rules, per-app admin gating narrative, adding-a-new-app guide, bug-staging workflow
-5. [`TESTS.md`](./TESTS.md) — per-test catalog (hand-maintained, may lag — `git log -- tests/` is the truth)
+5. To answer "what does test X do?" — read the test file's head comment + `@spec` tag, OR run `grep -nA1 "// @spec" tests/` for the full inventory. The retired per-test catalog (`TESTS.md`) was hand-maintained and went stale; it's archived at `docs/archive/TESTS.md` for provenance.
 
 When CI is red, [`TRIAGE.md`](./TRIAGE.md) is the 2-minute "what is
 this and what do I do" runbook — failure pattern → cause → action.
@@ -113,7 +113,7 @@ structure and how to update it.
 | **IDP / mPass** | The unauthenticated landing on the IDP host is a **method picker** ("QR Code Login" / "Password Login") — NOT a direct username/password form. `cognitoLogin()` clicks "Password Login" internally before filling fields. Tests calling `cognitoLogin()` shouldn't pre-assert on `input[type="password"]` visibility before the helper runs — assert on the method picker (e.g. `getByText(/password login/i)`) or skip the pre-assertion and trust `cognitoLogin` to find the right form. |
 | **Multi-tab login mutex** | Only **one** OAuth login flow can be in progress per browser at a time (FOSSSMBBUN-88 / `session-lifecycle#single-flight`). The portal Login button reads a non-HttpOnly `mpass_login_lock` cookie set by `/authorize`; mpass-auth-proxy returns **409** as a server-side backstop. TTL = 600s. Tests that start a login but **don't complete it** must either complete or close the context before starting a second concurrent flow — otherwise the second `/authorize` 409s. The lock cookie is cleared on successful `/mpass-callback`, on failed `/mpass-callback` (any path), and on `/mpass/logout`. |
 | **Identities** | Two SSO users with an **asymmetric workspace surface**, NOT a role-split. Both auto-join the shared SMB workspace `fossarbisoft` as Member; FOSS_USER also belongs to additional workspaces/teams that NORMAL_USER doesn't. Verified via direct API probes against the live sandbox. **`FOSS_USER` ("User A", multi-workspace surface):** Plane Member of `fossarbisoft` + Admin of `aa` (`FOSS_USER_PRIVATE_WORKSPACE_SLUG`); Outline workspace admin of own team (UUID `1a2e0bad-…` per `OUTLINE_TEAM_ID`); Penpot Owner of personal "Default" team (UUID `c16a7502-…` per `PENPOT_TEAM_ID`) plus Editor on the SMB `fossarbisoft` team; SurfSense Owner of search space id 1 (`SURFSENSE_SEARCH_SPACE_ID`); Twenty `canAccessFullAdminPanel=true`. **`NORMAL_USER` ("User B", SMB-workspace-only surface):** Plane Member of `fossarbisoft` only (no `aa`); Outline workspace admin of own (different) team; Penpot Owner of own personal Default; no broader admin scope. **Use cases:** Where FOSS_USER and NORMAL_USER overlap in `fossarbisoft` they have the SAME role (both Member) — so the per-app `*-admin` tests are about WORKSPACE-LEVEL admin (FOSS_USER has it on `aa`, NORMAL_USER doesn't); about each user's own-team admin (vacuously equal); or about cross-workspace isolation (NORMAL_USER's UI/API access to `aa` must be refused). `tests/apps/pm-workspace-isolation.spec.ts` pins the isolation contract for Plane. `PLANE_ADMIN_USER`/`PLANE_ADMIN_PASS` is the local-creds god-mode admin (separate from SSO). `foss-server-bundle`'s `scripts/provision-admin/*` (PR #63) can promote a chosen email to SMB-workspace admin in each app; not currently run with FOSS_USER's email. |
-| **Spec source — vendored** | The SSO openspec lives in-tree at `vendor/openspec/specs/`. The audit reads from there — no network or token at CI time. Updates land as normal PR diffs against the markdown files. `vendor/openspec/skills/` carries the per-app admin contracts (one `SKILL.md` per app) + the devstack `app-rules/RULES.md` — vendored as reference; the audit doesn't parse them yet (they don't use the `### Requirement:` format). |
+| **Spec source — vendored** | The SSO openspec lives in-tree at `vendor/openspec/specs/`. The audit reads from there — no network or token at CI time. Updates land as normal PR diffs against the markdown files. `vendor/openspec/skills/` carries the per-app admin contracts (one `SKILL.md` per app) + the devstack `vendor/openspec/skills/app-rules/RULES.md` — vendored as reference; the audit doesn't parse them yet (they don't use the `### Requirement:` format). |
 
 ## What to work on (live)
 

diff --git a/Makefile b/Makefile
@@ -34,8 +34,11 @@ typecheck: ## Run TypeScript type-check
 audit: ## Run the spec-coverage audit (needs SPEC_DIR or SPEC_REPO_TOKEN)
 	bash scripts/check-spec-coverage.sh
 
-pre-commit: typecheck audit ## Run typecheck + audit before pushing. Add a fast test subset locally if useful.
-	@echo "✓ typecheck + spec-coverage audit clean"
+audit-paths: ## Audit doc cross-references — every relative file path in CLAUDE/README/skills/TRIAGE/docs must resolve
+	bash scripts/check-doc-paths.sh
+
+pre-commit: typecheck audit audit-paths ## Run typecheck + audits before pushing. Add a fast test subset locally if useful.
+	@echo "✓ typecheck + spec-coverage + doc-path audits clean"
 
 test: ## Run full test suite
 	@# Guard against the common footgun: someone runs `make test ID=FOSSSMBBUN-112 FORCE=1`

diff --git a/README.md b/README.md
@@ -1,7 +1,8 @@
 # FOSS E2E — Playwright Test Suite
 
-End-to-end tests for the FOSS platform. See **[`TESTS.md`](./TESTS.md)** for
-a top-to-bottom catalog of every test in the suite. Highlights: SSO chain,
+End-to-end tests for the FOSS platform. See
+**[`docs/spec-coverage.md`](./docs/spec-coverage.md)** for the
+requirement-by-requirement audit table. Highlights: SSO chain,
 multi-app session sharing, cookie expiry bounds, session lifecycle
 (logout / invalidation / replay / deletion), per-app link coverage, the
 Plane god-mode admin escape hatch, Outline's admin `/settings/*` SSO-gating
@@ -290,7 +291,7 @@ tests/
 `FOSS_BASE_URL`.
 `skills.md` — local invariant contract (what every app must satisfy).
 [vendored openspec](./vendor/openspec/) — canonical edge-layer +
-per-app rules organised by capability spec; the `security/` and
+per-app rules organised by capability spec; the `tests/security/` and
 `identity-consistency` tests verify these on the live deployment.
 
 ## Auth architecture

diff --git a/TRIAGE.md b/TRIAGE.md
@@ -38,7 +38,7 @@ Don't triage the skipped tests. They're not the problem. Fix the smoke.
 | `per-app-login-smoke[<app>]` returns 2xx but no usable email | App middleware authed the request but didn't synthesise the email — likely `DEFAULT_EMAIL_DOMAIN` misconfig | Check the app container's env. See `vendor/openspec/skills/app-rules/RULES.md` for the canonical pattern. |
 | `outline-admin.spec.ts /settings/<X>` failed once, passed on retry | Outline's chunk-loader 429 burst flake | Known flake. Documented in `CLAUDE.md` → Deployment gotchas → Outline. Not actionable. |
 | `concurrent-first-login.spec.ts Outline: 3 parallel first-time visits resolve to ONE identity` | The race-condition test occasionally needs FOSS_USER to be a *fresh* user; if they're already provisioned in Outline, the test can flake | Re-run; if it stays red across re-runs, it's a real regression. |
-| `security/headers.spec.ts` portal CSP / COOP / CORP / Server | Bundle-side; portal nginx config | Waiting on `foss-server-bundle#66`. Suite-side fix already in (`tests/security/headers.spec.ts` scoped to portal only). |
+| `tests/security/headers.spec.ts` portal CSP / COOP / CORP / Server | Bundle-side; portal nginx config | Waiting on `foss-server-bundle#66`. Suite-side fix already in (`tests/security/headers.spec.ts` scoped to portal only). |
 | `tests/meta/playwright-practices.spec.ts` failed | Convention violation in a spec file | Read the failure — it names the file, line, and rule. Fix the spec, not the meta. |
 | `tests/meta/...` "every contract-bearing spec carries at least one @spec tag" failed | A new test file lacks an `@spec` tag and isn't in `UNTAGGED_ALLOWLIST` | Add `// @spec <module>#<requirement-slug>` pointing at a vendored requirement, OR add a new `### Requirement:` to the matching SKILL.md / spec.md and tag against it, OR add the file to `UNTAGGED_ALLOWLIST` in the meta spec with a one-line rationale. |
 | `spec-coverage` check failed with `❌ Missing` rows | New requirement in `vendor/openspec/` but no test pins it | Either add a `// @spec module#slug` tag above the test that covers it, or add it to `docs/spec-coverage-deferred.md` with a category + rationale. |

diff --git a/TESTS.md → docs/archive/TESTS.md b/TESTS.md → docs/archive/TESTS.md
diff --git a/docs/archive/test-review-2026-05-22.md b/docs/archive/test-review-2026-05-22.md
@@ -0,0 +1,85 @@
+# Test-suite review — 2026-05-22
+
+Scope: `tests/` (excluding `tests/bugs/`) — **44 files reviewed**, **306 expanded tests** (83 `test()` declarations × parameterization).
+
+## Bottom line
+
+The suite is in **good shape**. One Important finding worth fixing in a hygiene PR; no Critical findings. The 306-test count is justified — it expands from 83 distinct `test()` declarations, each parameterized for per-app + per-payload precision. Removing the multipliers would lose the "which exact attack vector / which exact app" failure-localisation signal.
+
+## Critical
+
+(none)
+
+## Important
+
+- **`tests/auth/proxy-short-circuit.spec.ts:34`** — hardcoded `_oauth2_proxy` literal
+  - **Why:** Convention (Mechanical check 2 from this skill): import `AUTH_COOKIE` from `constants.ts` rather than hardcoding the cookie name. The only files allowed to use the literal are `cookie-tampering` and `session-fixation` (which assert tampering signals).
+  - **Fix:** Replace `const SSO_COOKIE_NAMES = new Set(["_oauth2_proxy"]);` with `const SSO_COOKIE_NAMES = new Set([AUTH_COOKIE]);` and add `AUTH_COOKIE` to the existing `constants.ts` import.
+
+## Nit
+
+(none)
+
+## Findings discarded (false positives worth recording so future runs skip them)
+
+- **S2 (login choreography duplication):** 5 grep hits, all legitimate.
+  - `tests/security/sso-mode-no-local-login.spec.ts` looks for password inputs to assert they DON'T exist (the test's whole point).
+  - `tests/auth/session-lifecycle.spec.ts` checks if a password input is visible (re-auth detection signal).
+  - `tests/apps/pm-godmode.spec.ts` uses local creds at `/god-mode/` — explicitly NOT SSO, per CLAUDE.md.
+- **S4 (sleep / `delay()` / `node:timers/promises`):** all 30+ hits are `test.setTimeout(...)` or `raw.setTimeout(...)` — Playwright's per-test timeout setter, not a sleep call. Grep pattern needs to be narrower (`page.waitForTimeout` is the actual ban, already covered by Mechanical check 3).
+- **S6 (cookie surgery):** all hits are in tests where cookie state IS the system-under-test:
+  - `cookie-tampering.spec.ts` + `session-fixation.spec.ts` — explicitly listed as allowed in CLAUDE.md.
+  - `layer2-re-establish.spec.ts` — tests session re-establishment when Layer 1 cookies/storage are cleared. The cookie surgery is the setup, not a shortcut around the contract.
+  - `session-lifecycle.spec.ts` — tests session lifecycle by clearing/restoring `AUTH_COOKIE`. Same reasoning.
+- **S3 (bare `toBe(true|false)`):** 27 files flagged by raw count, but all sampled cases use the `expect(value, "descriptive message").toBe(...)` form — the message arg is on the line above. The skill's grep matches the closing parens only and gives a noisy signal. Skip this check on future runs, or rewrite the grep to detect missing-message-arg directly (e.g. AST-based, since `expect(X).toBe(Y)` vs `expect(X, "msg").toBe(Y)` is hard to distinguish with regex).
+
+## On the "do we have too many tests?" question
+
+| Concern | Reality |
+|---|---|
+| Raw count | 306 tests across 44 files |
+| Source count | 83 `test()` declarations |
+| Expansion ratio | 3.7× average, driven by **per-app loops** (×5) and **per-payload loops** (×4 for redirects, ×5 for OIDC state values) |
+
+### Highest-multiplier files
+
+| File | Tests | Pattern |
+|---|---|---|
+| `apps/outline-admin.spec.ts` | 39 | 13 Outline settings paths × 3 contracts (cold-bounce / non-admin role / admin role) |
+| `security/headers.spec.ts` | 27 | Targets × `HEADER_RULES` + the new HTML-hardening block |
+| `security/oidc-state-integrity.spec.ts` | 25 | 5 apps × 5 invalid-state payloads |
+| `security/bypass-surface.spec.ts` | 21 | Bypass paths × apps × statuses |
+| `security/open-redirect.spec.ts` | 20 | 5 apps × 4 evasion payloads |
+| `security/open-redirect-crlf.spec.ts` | 20 | 5 apps × 4 CRLF payloads |
+| `security/http-method-tampering.spec.ts` | 20 | 5 apps × 4 methods |
+
+Each line in this table is "one test per attack vector × per app." Collapsing them (e.g. one test that loops internally and aggregates failures) would hide which specific app/payload combination failed — the most useful signal during a security regression.
+
+### Specs to leave alone (high count is justified)
+
+All of them.
+
+### Specs to consider trimming (none currently)
+
+- A reasonable rule-of-thumb threshold is `tests-per-file > 40` for non-security specs. The only file approaching that is `outline-admin.spec.ts` at 39, but its three contracts × 13 paths shape is intentional and reviewed (see the file's leading comment about Outline's chunk-loader 429 flake handling).
+
+## Summary
+
+| Severity | Count |
+|---|---|
+| Critical | 0 |
+| Important | 1 |
+| Nit | 0 |
+
+**Action items:**
+
+1. Open a hygiene PR for the single Important finding (`proxy-short-circuit.spec.ts:34` — use `AUTH_COOKIE` import).
+
+That's it. No further action needed; the suite size and shape are correct.
+
+## Cross-references
+
+- `CLAUDE.md` — suite conventions (the rulebook this review measured against)
+- `skills.md` §1 / §2 — invariant-based test organisation
+- `scripts/check-spec-coverage.sh` — separate audit for `@spec` coverage (last run: 27 ✅ / 25 ⚠️ / 0 ❌ / 52 total)
+- `.claude/skills/test-reviewer/SKILL.md` — the skill that produced this report