CI Hardening Phase 3: Deep analysis — post #699/#700/#701

## CI Hardening Phase 3: Deep Analysis — Post #699/#700/#701

**Author:** Flight (Lead)  
**Date:** 2026-03-30  
**Scope:** Every workflow, action, test, and operational pattern in bradygaster/squad CI  
**Refs:** diberry/squad#121 (original 19-item analysis), bradygaster/squad#699 (workarounds), #700 (Phase 1), #701 (Phase 2)

---

## What's Been Done (Phases 1-2)

### Phase 1 — PR #700 (MERGED) ✅
5 quick wins implemented:
1. **Retry logic on npm install** — 3-attempt wrapper on all npm install/ci calls
2. **Job timeout tuning** — timeout-minutes on every job (3-15 min)
3. **npm cache optimization** — cache npm on setup-node where missing
4. **Conditional docs quality** — path filter skips docs-quality on code-only PRs
5. **Publish secret validation** — fail-fast check for NPM_TOKEN

### Phase 2 — PR #701 (OPEN, pending merge) 🔄
5 short-term items + security hardening:
6. **Lockfile stability** — drift detection in preflight (prevents v0.9.1-style incidents)
7. **Composite action** — DRY setup-node + npm install wrapper (`.github/actions/setup-squad-node/`)
8. **Ralph cron audit** — documented "no cron" as intentional design decision
9. **API rate limit monitoring** — warns <100, fails <10 remaining (state-only logging for security)
10. **npm registry health check** — `npm ping` + namespace check before publish

Security fixes: shell injection prevention in composite action, lockfile integrity hash validation, rate limit count redaction from public logs.

### PR #699 (MERGED) ✅
Combined CI workarounds for @github/copilot-sdk ESM bug (vscode-jsonrpc/node module resolution).

---

## Current CI Health Snapshot (2026-03-30)

| Metric | Value |
|--------|-------|
| **Workflows** | 15 active |
| **Tests** | 5,038+ passing across 196 test files |
| **CI duration** | ~4-5 min (main), ~2-3 min (publish) |
| **Success rate (last 50)** | Squad CI: 15/22 success, 4 action_required, 3 failure |
| **Actions minutes** | ~145 min/week (~$0.35/month) |
| **Platforms tested** | Ubuntu only |
| **Node versions** | 22 (main), 20 (insider) |

---

## Phase 3: Deep Analysis — 30 Items Across 6 Categories

### Category A: Reliability & Resilience

#### A1. Concurrency controls missing on most workflows
- **Problem:** Only `squad-docs.yml` has concurrency settings. All other workflows can run multiple instances simultaneously — if 3 PRs push in quick succession, 3 `squad-ci` runs compete for resources.
- **Impact:** HIGH — resource waste, potential race conditions on label/triage workflows
- **Effort:** LOW (YAML-only)
- **Fix:** Add `concurrency: { group: '${{ github.workflow }}-${{ github.ref }}', cancel-in-progress: true }` to `squad-ci.yml`, `squad-heartbeat.yml`, `squad-triage.yml`, `squad-label-enforce.yml`, `squad-issue-assign.yml`
- **Priority:** P1
- **Owner:** Booster

#### A2. Promote workflow uses dangerous merge strategy
- **Problem:** `squad-promote.yml` uses `--no-commit --no-ff -X theirs` to merge dev→preview. This auto-resolves ALL conflicts by taking dev's version without human review.
- **Impact:** HIGH — bad code can silently override preview/main
- **Effort:** MEDIUM
- **Fix:** Switch to `--no-ff` without `-X theirs`. Fail on conflicts and require manual resolution. Add dry-run diff output showing exactly what would change.
- **Priority:** P1
- **Owner:** Procedures + Flight

#### A3. No rollback mechanism for docs deployments
- **Problem:** `squad-docs.yml` deploys directly to GitHub Pages with no build output validation and no rollback. Broken docs go live instantly.
- **Impact:** MEDIUM — user-facing docs broken until next push to main
- **Effort:** MEDIUM
- **Fix:** Add post-deploy smoke check (HTTP status on key pages). Store previous deployment artifact for rollback. Consider deployment environments with approval gates.
- **Priority:** P2
- **Owner:** PAO + Booster

#### A4. Insider publish creates inconsistent state on partial failure
- **Problem:** `squad-insider-publish.yml` publishes SDK and CLI sequentially. If CLI publish fails after SDK succeeds, npm has mismatched SDK/CLI versions on the `insider` tag.
- **Impact:** MEDIUM — insider users get broken dependency graph
- **Effort:** MEDIUM
- **Fix:** Add post-publish verification step. If CLI fails, emit clear error with manual recovery instructions. Consider publish-both-or-neither pattern with npm unpublish fallback (risky but atomic).
- **Priority:** P2
- **Owner:** Surgeon + Booster

#### A5. Release/publish workflow coordination gap
- **Problem:** `squad-release.yml` (push to main) and `squad-npm-publish.yml` (release event) can trigger near-simultaneously. Release creates tag + GitHub Release, which triggers npm publish. No explicit handoff or dependency check.
- **Impact:** MEDIUM — timing-sensitive; if release workflow slow, npm publish may start before tag exists
- **Effort:** LOW
- **Fix:** Add `workflow_run` trigger on `squad-npm-publish.yml` that waits for `squad-release.yml` completion instead of relying on release event timing.
- **Priority:** P2
- **Owner:** Booster

#### A6. Tag deduplication gaps in release workflows
- **Problem:** Both `squad-release.yml` and `squad-insider-release.yml` create tags. If a tag already exists, `git push` fails. `squad-release.yml` checks for existing tags, but `squad-insider-release.yml` does not.
- **Impact:** LOW — insider releases would fail on re-push; not catastrophic
- **Effort:** LOW
- **Fix:** Add tag existence check to `squad-insider-release.yml` matching the pattern in `squad-release.yml`.
- **Priority:** P3
- **Owner:** Booster

---

### Category B: Testing Depth

#### B1. No cross-platform CI (Windows/macOS)
- **Problem:** All CI runs on `ubuntu-latest` only. SDK/CLI published on npm for all platforms but never tested on Windows or macOS. Known platform-specific skips exist in tests (storage-provider, resolution, doctor).
- **Impact:** HIGH — Windows CLI bugs ship undetected
- **Effort:** MEDIUM (2-3 hours)
- **Fix:** Create weekly `squad-cross-platform.yml` with matrix: `{os: [ubuntu-latest, windows-latest, macos-latest], node: [20, 22]}`. Gate behind `workflow_dispatch` + weekly cron. Opt-in via repo variable.
- **Priority:** P1
- **Owner:** Booster + FIDO
- **Refs:** bradygaster/squad#582 (flaky tests under load), #677 (Docker skip guards)

#### B2. Docker-dependent tests need skip guards
- **Problem:** Some tests require Docker (e.g., Aspire integration). These fail in environments without Docker, creating false failures.
- **Impact:** MEDIUM — blocks cross-platform CI (B1) since Windows/macOS runners may not have Docker
- **Effort:** LOW
- **Fix:** Already tracked as bradygaster/squad#677. Add `describe.skipIf(!hasDocker)` guards. Must ship before B1.
- **Priority:** P1 (prerequisite for B1)
- **Owner:** FIDO
- **Refs:** bradygaster/squad#677

#### B3. Flaky test quarantine system
- **Problem:** No mechanism to detect, track, or quarantine flaky tests. 3/22 recent Squad CI failures — unclear if flakes or real issues. Issue #582 documents flaky tests under full-suite load.
- **Impact:** HIGH — flakes erode trust in CI, developers ignore failures
- **Effort:** HIGH (4-6 hours)
- **Fix:** Add vitest `--reporter=json` output + post-test analysis script. Track per-test pass/fail across last N runs. Quarantine tests with >20% but <100% failure rate. Post flake report to PR comment.
- **Priority:** P2
- **Owner:** FIDO + Booster
- **Refs:** bradygaster/squad#582

#### B4. Acceptance tests (Gherkin BDD) not visible as separate CI gate
- **Problem:** 8 Gherkin feature files exist in `test/acceptance/` with full step definitions and terminal harness. But they run as part of the monolithic `npm test` — no visibility into acceptance test health vs unit test health.
- **Impact:** LOW — informational gap, not a failure gap
- **Effort:** LOW
- **Fix:** Add separate vitest workspace config for acceptance tests. Report as distinct CI step in `squad-ci.yml`.
- **Priority:** P3
- **Owner:** FIDO

#### B5. Sample tests only run when SDK source changes
- **Problem:** `samples-build` job in `squad-ci.yml` is gated on `packages/squad-sdk/src/` changes. If a sample's own code breaks (e.g., bad import), CI won't catch it unless SDK is also changed.
- **Impact:** MEDIUM — sample rot goes undetected
- **Effort:** LOW
- **Fix:** Expand path filter to include `samples/**` in trigger condition. Also add weekly cron run of samples-build.
- **Priority:** P2
- **Owner:** Booster

#### B6. No test sharding for faster execution
- **Problem:** All 196 test files run in a single job (~4-5 min). As test count grows, this becomes a bottleneck.
- **Impact:** LOW (currently fast enough)
- **Effort:** MEDIUM
- **Fix:** Use vitest `--shard=1/N` flag with matrix strategy when test count exceeds threshold (~300 files). Not needed now.
- **Priority:** P3 (future)
- **Owner:** Booster

#### B7. E2E CLI smoke test expansion
- **Problem:** `squad-npm-publish.yml` has CLI packaging smoke test, but it only validates `npm pack`. No test of actual global install + command execution.
- **Impact:** MEDIUM — global install breakage ships undetected
- **Effort:** LOW (1-2 hours)
- **Fix:** Extend smoke test to: `npm install -g @bradygaster/squad-cli`, `squad --version`, `squad init --help`, `squad status`.
- **Priority:** P2
- **Owner:** Booster + EECOM

---

### Category C: Security & Supply Chain

#### C1. Third-party actions not pinned to SHAs
- **Problem:** All first-party actions (`actions/checkout`, `actions/setup-node`) use `@v4` tags. Third-party actions in `squad-docs-links.yml` use tags: `lycheeverse/lychee-action@v2`, `peter-evans/create-issue-from-file@v5`. Tags can be force-pushed by upstream maintainers.
- **Impact:** HIGH — supply chain attack vector
- **Effort:** LOW
- **Fix:** Pin ALL actions (first-party and third-party) to full SHA commits. Use `npx pin-github-action` or Dependabot's action pinning feature. Example: `actions/checkout@b4ffde65f...` instead of `@v4`.
- **Priority:** P1
- **Owner:** Booster

#### C2. No dependency review on PRs
- **Problem:** No Dependabot, Renovate, or `dependency-review-action` configured. New dependencies added via PR are not automatically audited for vulnerabilities or license issues.
- **Impact:** HIGH — malicious/vulnerable transitive deps go unreviewed
- **Effort:** LOW
- **Fix:** Add `.github/dependabot.yml` for automated PR creation on dependency updates. Add `actions/dependency-review-action@v4` to PR CI pipeline. Enable GitHub's built-in dependency graph.
- **Priority:** P1
- **Owner:** Booster + FIDO

#### C3. No SAST/security scanning in CI
- **Problem:** No CodeQL, Snyk, SonarQube, or `npm audit` in CI. TypeScript strict mode and ESLint are the only static analysis.
- **Impact:** HIGH — security regressions, known CVEs in deps go undetected
- **Effort:** MEDIUM (3-4 hours)
- **Fix:** Add `npm audit --audit-level=moderate` as CI gate. Add CodeQL analysis workflow for JavaScript/TypeScript. Consider `github/codeql-action@v3` with auto-build.
- **Priority:** P1
- **Owner:** FIDO + Booster

#### C4. No CODEOWNERS file
- **Problem:** No `.github/CODEOWNERS`. PRs don't auto-assign reviewers based on file paths. Anyone can modify workflow files without required review from CI owners.
- **Impact:** MEDIUM — workflow changes not automatically routed to right reviewer
- **Effort:** LOW (15 min)
- **Fix:** Create `.github/CODEOWNERS` with: `.github/workflows/* @bradygaster`, `packages/squad-sdk/* @eecom-owner`, `packages/squad-cli/* @eecom-owner`, `docs/* @pao-owner`.
- **Priority:** P2
- **Owner:** Flight + Procedures

#### C5. Team.md parsing is a fragile injection surface
- **Problem:** 6 workflows parse `.squad/team.md` or `.ai-team/team.md` using string splitting on `|`. The parsed values are used in GitHub API calls (label names, comments, assignments). A malicious team.md edit could inject unexpected values.
- **Impact:** MEDIUM — limited to repos with write access, but violates defense-in-depth
- **Effort:** MEDIUM
- **Fix:** Extract team.md parsing into a single composite action with input validation (A7 recommendation). Sanitize all parsed values before use in API calls. Add regex validation for member names.
- **Priority:** P2
- **Owner:** Booster + RETRO

#### C6. No license compliance scanning
- **Problem:** No automated check for license compatibility of npm dependencies. SDK/CLI are published packages — transitive dependency licenses matter.
- **Impact:** MEDIUM — legal compliance risk for downstream consumers
- **Effort:** LOW
- **Fix:** Add `license-checker` or `licensee` check to CI. Maintain allowlist of approved licenses (MIT, Apache-2.0, ISC, BSD-*).
- **Priority:** P3
- **Owner:** Procedures

---

### Category D: Performance & Cost

#### D1. No build artifact caching between jobs
- **Problem:** Multiple jobs in `squad-ci.yml` run `npm ci` and `npm run build` independently. The `test`, `exports-map-check`, `export-smoke-test`, and `samples-build` jobs all rebuild from scratch.
- **Impact:** MEDIUM — ~2 min wasted per run on redundant builds
- **Effort:** MEDIUM
- **Fix:** Use `actions/cache` or `actions/upload-artifact` + `actions/download-artifact` to share build output between jobs. Build once in a `build` job, download in dependent jobs.
- **Priority:** P2
- **Owner:** Booster

#### D2. Playwright browser install is unconditional
- **Problem:** `squad-ci.yml` always installs Playwright browsers (~30s) even if no Playwright tests would run.
- **Impact:** LOW — 30 seconds per run
- **Effort:** LOW
- **Fix:** Already identified in original analysis (item 2.4). Cache Playwright browsers. Skip install if only doc changes.
- **Priority:** P3
- **Owner:** Booster

#### D3. `action_required` conclusions need investigation
- **Problem:** 4 out of 22 recent Squad CI runs show `action_required` conclusion. These are likely first-time fork contributors needing workflow approval. But the pattern creates noise in CI history.
- **Impact:** LOW — informational
- **Effort:** LOW
- **Fix:** Document that `action_required` is expected for fork PRs. Consider auto-approve for known contributors. Add to CI runbook.
- **Priority:** P3
- **Owner:** Procedures

#### D4. Expand Node version matrix for insider builds
- **Problem:** `squad-insider-publish.yml` uses a matrix strategy but only tests Node 20. Main CI only tests Node 22. Node 18 (LTS until April 2025) is untested.
- **Impact:** MEDIUM — Node version incompatibilities ship undetected
- **Effort:** LOW
- **Fix:** Expand insider matrix to `[20, 22]`. Add Node 22 validation to insider builds (currently only Node 20). Align with `engines` field (`>=22.5.0`).
- **Priority:** P2
- **Owner:** Booster

---

### Category E: Developer Experience

#### E1. No `.nvmrc` or `.node-version` file
- **Problem:** `package.json` specifies `engines.node: ">=22.5.0"` but there's no `.nvmrc` or `.node-version` file. Contributors must read `package.json` to know which Node version to use.
- **Impact:** LOW — friction for new contributors
- **Effort:** LOW (1 min)
- **Fix:** Create `.nvmrc` with `22` or `.node-version` with `22.5.0`.
- **Priority:** P3
- **Owner:** Booster

#### E2. No local CI simulation (act config)
- **Problem:** Workflows have excellent inline documentation for running individual gates locally, but there's no way to run the full CI pipeline locally. No `act` config, no Makefile.
- **Impact:** MEDIUM — contributors push to test, wasting CI minutes and time
- **Effort:** MEDIUM (2-3 hours)
- **Fix:** Add `Makefile` or `scripts/ci-local.sh` that runs all CI gates in sequence: build, test, lint, changelog check, exports check, samples build. Optional: add `.actrc` for GitHub Actions local runner.
- **Priority:** P2
- **Owner:** Booster + EECOM

#### E3. CI failure messages could be more actionable
- **Problem:** Some CI gates output generic failure messages. For example, the large deletion guard just says "PR deletes more than 50 files" without showing which files. The exports map check doesn't show which export is missing.
- **Impact:** LOW — developers can dig into logs, but friction is real
- **Effort:** LOW
- **Fix:** Improve error messages in each gate to include: what failed, what the expected state is, and how to fix it. Use `::error file=...` annotations for file-specific failures.
- **Priority:** P3
- **Owner:** Booster

#### E4. Status check names not consistently prefixed
- **Problem:** CI jobs have names like `test`, `docs-quality`, `changelog-gate` — but in the GitHub Checks UI, they appear under the workflow name "Squad CI". The individual job names aren't immediately recognizable in the PR status list.
- **Impact:** LOW — cosmetic
- **Effort:** LOW
- **Fix:** Prefix job names consistently: "Squad CI / Build & Test", "Squad CI / Changelog Gate", etc. This makes the Checks list scannable.
- **Priority:** P3
- **Owner:** Booster

---

### Category F: Observability & Monitoring

#### F1. No CI metrics collection
- **Problem:** No tracking of CI pass rate, duration trends, or flake rate over time. Can't answer "is CI getting slower?" or "what's our flake rate this month?"
- **Impact:** MEDIUM — can't detect gradual degradation
- **Effort:** MEDIUM (3-4 hours)
- **Fix:** Add post-run step that writes metrics to a JSON artifact. Create weekly cron workflow that aggregates metrics via `gh run list` API and posts summary to a discussion or issue.
- **Priority:** P2
- **Owner:** Booster + Flight

#### F2. No alerts for CI degradation
- **Problem:** If CI starts failing consistently, no one is notified. The team discovers failures when they check GitHub.
- **Impact:** MEDIUM — delayed response to CI breakage
- **Effort:** LOW
- **Fix:** Add a daily/weekly cron workflow that checks recent CI pass rate via `gh run list`. If pass rate drops below 80%, create an issue or send notification. Could integrate with existing Ralph heartbeat.
- **Priority:** P2
- **Owner:** Booster + Ralph

#### F3. No scheduled link checking
- **Problem:** `squad-docs-links.yml` is manual-only. External links rot silently.
- **Impact:** LOW — docs quality degrades over time
- **Effort:** LOW (5 min)
- **Fix:** Add weekly cron trigger to `squad-docs-links.yml`: `schedule: [{cron: '0 9 * * 1'}]`.
- **Priority:** P3
- **Owner:** Booster

#### F4. No build time regression detection
- **Problem:** No tracking of build/test duration. If a PR adds 60 seconds to test time, no one notices until it accumulates.
- **Impact:** LOW (CI is fast now)
- **Effort:** MEDIUM
- **Fix:** Record build/test duration in CI artifacts. Add threshold alert if total CI time exceeds 8 minutes (currently ~4-5 min). Post warning to PR if duration increases >20%.
- **Priority:** P3 (future)
- **Owner:** Booster

---

## Summary Table

| # | Category | Item | Impact | Effort | Priority | Owner |
|---|----------|------|--------|--------|----------|-------|
| A1 | Reliability | Concurrency controls | HIGH | LOW | **P1** | Booster |
| A2 | Reliability | Promote merge strategy fix | HIGH | MEDIUM | **P1** | Procedures + Flight |
| A3 | Reliability | Docs deployment rollback | MEDIUM | MEDIUM | P2 | PAO + Booster |
| A4 | Reliability | Insider publish atomicity | MEDIUM | MEDIUM | P2 | Surgeon + Booster |
| A5 | Reliability | Release/publish coordination | MEDIUM | LOW | P2 | Booster |
| A6 | Reliability | Insider tag deduplication | LOW | LOW | P3 | Booster |
| B1 | Testing | Cross-platform CI | HIGH | MEDIUM | **P1** | Booster + FIDO |
| B2 | Testing | Docker skip guards | MEDIUM | LOW | **P1** | FIDO |
| B3 | Testing | Flaky test quarantine | HIGH | HIGH | P2 | FIDO + Booster |
| B4 | Testing | Acceptance test visibility | LOW | LOW | P3 | FIDO |
| B5 | Testing | Sample test trigger expansion | MEDIUM | LOW | P2 | Booster |
| B6 | Testing | Test sharding | LOW | MEDIUM | P3 | Booster |
| B7 | Testing | E2E CLI smoke expansion | MEDIUM | LOW | P2 | Booster + EECOM |
| C1 | Security | SHA-pin all actions | HIGH | LOW | **P1** | Booster |
| C2 | Security | Dependency review on PRs | HIGH | LOW | **P1** | Booster + FIDO |
| C3 | Security | SAST/security scanning | HIGH | MEDIUM | **P1** | FIDO + Booster |
| C4 | Security | CODEOWNERS file | MEDIUM | LOW | P2 | Flight + Procedures |
| C5 | Security | Team.md parsing hardening | MEDIUM | MEDIUM | P2 | Booster + RETRO |
| C6 | Security | License compliance | MEDIUM | LOW | P3 | Procedures |
| D1 | Performance | Build artifact caching | MEDIUM | MEDIUM | P2 | Booster |
| D2 | Performance | Conditional Playwright install | LOW | LOW | P3 | Booster |
| D3 | Performance | Document action_required | LOW | LOW | P3 | Procedures |
| D4 | Performance | Node version matrix expansion | MEDIUM | LOW | P2 | Booster |
| E1 | DX | .nvmrc file | LOW | LOW | P3 | Booster |
| E2 | DX | Local CI simulation | MEDIUM | MEDIUM | P2 | Booster + EECOM |
| E3 | DX | Actionable failure messages | LOW | LOW | P3 | Booster |
| E4 | DX | Status check name consistency | LOW | LOW | P3 | Booster |
| F1 | Observability | CI metrics collection | MEDIUM | MEDIUM | P2 | Booster + Flight |
| F2 | Observability | CI degradation alerts | MEDIUM | LOW | P2 | Booster + Ralph |
| F3 | Observability | Scheduled link checking | LOW | LOW | P3 | Booster |
| F4 | Observability | Build time regression detection | LOW | MEDIUM | P3 | Booster |

---

## Recommended Implementation Order

### Wave 1: Security + Reliability Foundation (P1, ~6-8 hours)
Ship these first — they prevent real incidents:

1. **C1** SHA-pin all actions (30 min) — supply chain protection
2. **C2** Dependency review on PRs (1 hour) — add dependabot.yml + dependency-review-action
3. **C3** npm audit + CodeQL (3-4 hours) — security scanning baseline
4. **A1** Concurrency controls (30 min) — prevent race conditions
5. **A2** Fix promote merge strategy (1-2 hours) — prevent silent conflict resolution
6. **B2** Docker skip guards (prerequisite for B1) — already tracked as #677

### Wave 2: Testing Coverage + DX (P1-P2, ~8-10 hours)
Ship after Wave 1 — expands what CI catches:

7. **B1** Cross-platform CI (2-3 hours) — requires B2 first
8. **B5** Sample test trigger expansion (30 min)
9. **B7** E2E CLI smoke expansion (1-2 hours)
10. **E2** Local CI simulation (2-3 hours)
11. **C4** CODEOWNERS file (15 min)

### Wave 3: Observability + Polish (P2-P3, ~6-8 hours)
Ship when bandwidth allows:

12. **F1** CI metrics collection (3-4 hours)
13. **F2** CI degradation alerts (1 hour)
14. **B3** Flaky test quarantine (4-6 hours)
15. **D1** Build artifact caching (2 hours)
16. **A3** Docs deployment rollback (2 hours)
17. **A5** Release/publish coordination (1 hour)

### Backlog (P3, defer until needed):
18-30. All P3 items — implement when relevant or as quick wins during other work.

---

## Agent Routing Summary

| Agent | Items | Rationale |
|-------|-------|-----------|
| **Booster** | A1, A3-A6, B1, B5-B7, C1-C2, D1-D4, E1-E4, F1-F4 | CI/DevOps engineer — owns workflow files |
| **FIDO** | B1-B4, C2-C3 | Quality/Reliability — owns test infrastructure |
| **Procedures** | A2, C4, C6, D3 | Governance + process documentation |
| **Flight** | A2, C4, F1 | Architecture decisions, CODEOWNERS routing |
| **RETRO** | C5 | Security review of parsing surfaces |
| **PAO** | A3 | Docs deployment ownership |
| **Surgeon** | A4 | Publish pipeline ownership |
| **EECOM** | B7, E2 | SDK/CLI integration testing |
| **Ralph** | F2 | Heartbeat integration for CI alerts |

---

## Notes

- **#701 is still open** — once merged, Phase 2 items (6-10) are complete. This issue covers everything beyond that.
- **Issue #582** (flaky tests under load) and **#677** (Docker skip guards) are prerequisites for several items above. Prioritize accordingly.
- **Cost is not a concern** — 145 min/week is well within GitHub's free tier. Cross-platform CI (Wave 2) will add ~50-100 min/week, still manageable.
- **The original 19-item analysis** lives at `docs/proposals/ci-hardening-opportunities.md` in the bradygaster/squad repo. This Phase 3 goes deeper with 30 new items.
- **All items marked "Booster"** — this is the CI/DevOps agent role. If no Booster exists, route to whoever owns `.github/workflows/`.

---

*Filed by Flight. Dina: route from Wave 1 down. Each wave is independently shippable.*

Metric	Value
Workflows	15 active
Tests	5,038+ passing across 196 test files
CI duration	~4-5 min (main), ~2-3 min (publish)
Success rate (last 50)	Squad CI: 15/22 success, 4 action_required, 3 failure
Actions minutes	~~145 min/week (~~$0.35/month)
Platforms tested	Ubuntu only
Node versions	22 (main), 20 (insider)

#	Category	Item	Impact	Effort	Priority	Owner
A1	Reliability	Concurrency controls	HIGH	LOW	P1	Booster
A2	Reliability	Promote merge strategy fix	HIGH	MEDIUM	P1	Procedures + Flight
A3	Reliability	Docs deployment rollback	MEDIUM	MEDIUM	P2	PAO + Booster
A4	Reliability	Insider publish atomicity	MEDIUM	MEDIUM	P2	Surgeon + Booster
A5	Reliability	Release/publish coordination	MEDIUM	LOW	P2	Booster
A6	Reliability	Insider tag deduplication	LOW	LOW	P3	Booster
B1	Testing	Cross-platform CI	HIGH	MEDIUM	P1	Booster + FIDO
B2	Testing	Docker skip guards	MEDIUM	LOW	P1	FIDO
B3	Testing	Flaky test quarantine	HIGH	HIGH	P2	FIDO + Booster
B4	Testing	Acceptance test visibility	LOW	LOW	P3	FIDO
B5	Testing	Sample test trigger expansion	MEDIUM	LOW	P2	Booster
B6	Testing	Test sharding	LOW	MEDIUM	P3	Booster
B7	Testing	E2E CLI smoke expansion	MEDIUM	LOW	P2	Booster + EECOM
C1	Security	SHA-pin all actions	HIGH	LOW	P1	Booster
C2	Security	Dependency review on PRs	HIGH	LOW	P1	Booster + FIDO
C3	Security	SAST/security scanning	HIGH	MEDIUM	P1	FIDO + Booster
C4	Security	CODEOWNERS file	MEDIUM	LOW	P2	Flight + Procedures
C5	Security	Team.md parsing hardening	MEDIUM	MEDIUM	P2	Booster + RETRO
C6	Security	License compliance	MEDIUM	LOW	P3	Procedures
D1	Performance	Build artifact caching	MEDIUM	MEDIUM	P2	Booster
D2	Performance	Conditional Playwright install	LOW	LOW	P3	Booster
D3	Performance	Document action_required	LOW	LOW	P3	Procedures
D4	Performance	Node version matrix expansion	MEDIUM	LOW	P2	Booster
E1	DX	.nvmrc file	LOW	LOW	P3	Booster
E2	DX	Local CI simulation	MEDIUM	MEDIUM	P2	Booster + EECOM
E3	DX	Actionable failure messages	LOW	LOW	P3	Booster
E4	DX	Status check name consistency	LOW	LOW	P3	Booster
F1	Observability	CI metrics collection	MEDIUM	MEDIUM	P2	Booster + Flight
F2	Observability	CI degradation alerts	MEDIUM	LOW	P2	Booster + Ralph
F3	Observability	Scheduled link checking	LOW	LOW	P3	Booster
F4	Observability	Build time regression detection	LOW	MEDIUM	P3	Booster

Agent	Items	Rationale
Booster	A1, A3-A6, B1, B5-B7, C1-C2, D1-D4, E1-E4, F1-F4	CI/DevOps engineer — owns workflow files
FIDO	B1-B4, C2-C3	Quality/Reliability — owns test infrastructure
Procedures	A2, C4, C6, D3	Governance + process documentation
Flight	A2, C4, F1	Architecture decisions, CODEOWNERS routing
RETRO	C5	Security review of parsing surfaces
PAO	A3	Docs deployment ownership
Surgeon	A4	Publish pipeline ownership
EECOM	B7, E2	SDK/CLI integration testing
Ralph	F2	Heartbeat integration for CI alerts

CI Hardening Phase 3: Deep analysis — post #699/#700/#701 #122

Description

CI Hardening Phase 3: Deep Analysis — Post bradygaster#699/bradygaster#700/bradygaster#701

What's Been Done (Phases 1-2)

Phase 1 — PR bradygaster#700 (MERGED) ✅

Phase 2 — PR bradygaster#701 (OPEN, pending merge) 🔄

PR bradygaster#699 (MERGED) ✅

Current CI Health Snapshot (2026-03-30)

Phase 3: Deep Analysis — 30 Items Across 6 Categories

Category A: Reliability & Resilience

A1. Concurrency controls missing on most workflows

A2. Promote workflow uses dangerous merge strategy

A3. No rollback mechanism for docs deployments

A4. Insider publish creates inconsistent state on partial failure

A5. Release/publish workflow coordination gap

A6. Tag deduplication gaps in release workflows

Category B: Testing Depth

B1. No cross-platform CI (Windows/macOS)

B2. Docker-dependent tests need skip guards

B3. Flaky test quarantine system

B4. Acceptance tests (Gherkin BDD) not visible as separate CI gate

B5. Sample tests only run when SDK source changes

B6. No test sharding for faster execution

B7. E2E CLI smoke test expansion

Category C: Security & Supply Chain

C1. Third-party actions not pinned to SHAs

C2. No dependency review on PRs

C3. No SAST/security scanning in CI

C4. No CODEOWNERS file

C5. Team.md parsing is a fragile injection surface

C6. No license compliance scanning

Category D: Performance & Cost

D1. No build artifact caching between jobs

D2. Playwright browser install is unconditional

D3. action_required conclusions need investigation

D4. Expand Node version matrix for insider builds

Category E: Developer Experience

E1. No .nvmrc or .node-version file

E2. No local CI simulation (act config)

E3. CI failure messages could be more actionable

E4. Status check names not consistently prefixed

Category F: Observability & Monitoring

F1. No CI metrics collection

F2. No alerts for CI degradation

F3. No scheduled link checking

F4. No build time regression detection

Summary Table

Recommended Implementation Order

Wave 1: Security + Reliability Foundation (P1, ~6-8 hours)

Wave 2: Testing Coverage + DX (P1-P2, ~8-10 hours)

Wave 3: Observability + Polish (P2-P3, ~6-8 hours)

Backlog (P3, defer until needed):

Agent Routing Summary

Notes

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

D3. `action_required` conclusions need investigation

E1. No `.nvmrc` or `.node-version` file