PRD: Versioned file layout manifest - single source of truth for directory structure #670

@diberry

PRD: Versioned File Layout Manifest — Single Source of Truth for Directory Structure

Status: Approved (Flight)
Requested by: Dina Berry
Last Updated: 2026-03-28


Executive Summary

Squad's CLI has five independent code paths that hardcode file locations with no centralized source of truth. This inconsistency has already caused two confirmed bugs (skills discovery #77, squad.agent.md silent deletion #730) and will continue degrading reliability. We propose a versioned JSON manifest that defines every directory Squad manages, its canonical location per version, read priority order, write target, and deprecated paths. All five code paths will reference this manifest instead of hardcoding paths independently.


Problem Statement

Root Cause: Five Independent Code Paths Without Single Source of Truth

Squad manages 33+ template files and dozens of directories, but has no centralized registry for file layout across versions. Instead, five separate code paths each make independent assumptions about where files live:

  1. init.ts — Creates directories at init time; hardcodes paths like .squad/, .copilot/agents/, .github/workflows/
  2. migrations.ts — Moves files between locations on upgrade; implements custom logic for each version transition
  3. upgrade.ts — Syncs template files to target directories; references TEMPLATE_MANIFEST but lacks runtime read-order logic
  4. squad.agent.md — Tells agents where to read/write skills, decisions, and history; contains hardcoded paths that drift from runtime behavior
  5. skill-source.ts, resolver.ts — Runtime reads from hardcoded paths using either/or logic (checks one location, stops on first match)

Evidence: Two Independent Bugs with Same Root Cause

Bug 1: Skills Discovery (#77)

  • Symptom: When users upgrade Squad, the migration moves skills from .squad/skills/ to .copilot/skills/, but runtime still only checks one location
  • Impact: Skills become unavailable after upgrade; users see "skill not found" errors
  • Root cause: skill-source.ts and resolver.ts use either/or pattern (returns first match only), while migration didn't update both paths consistently
  • Related work: PR #669 (fix(sdk): merge skills from both .copilot/skills/ and .squad/skills/) proposed a fix that merges both directories, but it wasn't merged upstream
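
The difference between the either/or lookup and the merge proposed in #669 can be sketched as follows. This is a minimal illustration, not the actual code in skill-source.ts or resolver.ts: the in-memory `SkillDirs` map, the example file names, and the functions `findFirst` and `mergeAll` are hypothetical stand-ins.

```typescript
// Hypothetical in-memory stand-in for the filesystem: directory -> skill file names.
type SkillDirs = Record<string, string[]>;

// Either/or pattern: return the contents of the first directory that has any skills.
function findFirst(dirs: SkillDirs, readOrder: string[]): string[] {
  for (const dir of readOrder) {
    const skills = dirs[dir];
    if (skills && skills.length > 0) {
      return skills.map(skill => dir + skill);
    }
  }
  return [];
}

// Merge pattern: visit every directory in readOrder; on a name clash the earlier directory wins.
function mergeAll(dirs: SkillDirs, readOrder: string[]): string[] {
  const byName = new Map<string, string>();
  for (const dir of readOrder) {
    for (const skill of dirs[dir] ?? []) {
      if (!byName.has(skill)) byName.set(skill, dir + skill);
    }
  }
  return Array.from(byName.values());
}

// Assumed scenario: one skill exists in both places, one was left behind in .squad/.
const exampleDirs: SkillDirs = {
  ".copilot/skills/": ["review.md"],
  ".squad/skills/": ["review.md", "release-notes.md"],
};
const exampleOrder = [".copilot/skills/", ".squad/skills/"];

console.log(findFirst(exampleDirs, exampleOrder)); // stops at .copilot/skills/, misses release-notes.md
console.log(mergeAll(exampleDirs, exampleOrder));  // also picks up release-notes.md from .squad/skills/
```

With the merge variant, a skill that exists only in the older directory stays discoverable, and a skill present in both resolves to the copy that comes first in the read order.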

Bug 2: squad.agent.md Silent Deletion (#730)

  • Symptom: After squad upgrade, the file .github/agents/squad.agent.md silently disappears, breaking Copilot's ability to discover Squad as an agent
  • Three failing code paths, covered by PR #731 (fix(#730): prevent squad.agent.md silent deletion):
    • upgrade.ts:496-504 — Template source existence check with no else clause; silent skip if missing
    • init.ts:257-261 — SDK init stamps version but silently skips if file wasn't created
    • doctor.ts:~392 — Detects empty file but reports warn instead of fail
  • Impact: Squad stops working on any machine that pulls the repo; no error, no warning, no diagnostic trail
  • Root cause: Same as #77 (hardcoded paths with no validation, silent skips instead of errors)

Systemic Pattern: Silent Degradation

These bugs share a common pattern:

  • Write operation is guarded by a source-existence check with no else clause
  • Health check downgrades severity below actual impact
  • No post-operation validation confirms the expected outcome

The TEMPLATE_MANIFEST defines 33 template files, ~23 marked overwriteOnUpgrade: true. Any of these could silently disappear through the same pattern.

Customer Impact

  • Every squad init repo is affected by this gap
  • Skills discovery failures happen when migrations and runtime path resolution drift
  • On upgrade, either/or logic in resolver.ts can miss files entirely
  • Version-to-version maintenance is error-prone and requires manual audits of 5 independent code paths
  • Silent degradation means bugs go undetected until users report broken functionality

Proposed Solution: Versioned File Layout Manifest

Core Concept

A single versioned JSON manifest that defines:

  • Every directory Squad manages and its purpose
  • Canonical location per version (e.g., v0.8: .squad/skills/, v0.9+: .copilot/skills/)
  • Read priority order (which locations to check, in what order)
  • Write target (where new content goes)
  • Deprecated paths (old locations still valid for reading, but no longer written to)
  • Migration path (what moves where on upgrade)
  • Criticality tier (which files must exist for Squad to function)

All five code paths will reference this manifest instead of hardcoding paths independently.

Benefits

  • Version-to-version changes tracked in ONE place — manifest is the single source of truth
  • Migrations auto-derived from manifest diffs — no manual step generation
  • Runtime read logic generated from manifest — no more either/or bugs
  • Agent instructions reference manifest — correct write paths guaranteed
  • squad doctor validates layout — catch drift before it becomes bugs
  • Backward compatibility baked in — deprecated paths + readOrder enable smooth upgrades
  • New code can't introduce silent-skip bugs — CI gates catch the pattern

Manifest Schema

Version Inventory

Based on historical analysis, Squad's directory structure has evolved through these versions:

| Version Range | Layout Changes | Affected Files |
|---|---|---|
| ≤ v0.7 | .ai-team/ era | agents, decisions, skills, history |
| v0.8–v0.8.x | Migrated to .squad/ | All user-owned content moved |
| v0.9+ | Migrated to .copilot/ | Skills, decisions, agents moved; .squad/ deprecated but supported for reading |

Schema Definition

{
  "schemaVersion": "1.0.0",
  "comment": "Defines all directories Squad manages, indexed by purpose",
  
  "layout": {
    "skills": {
      "purpose": "Reusable patterns and process knowledge",
      "canonical": ".copilot/skills/",
      "readOrder": [".copilot/skills/", ".squad/skills/", ".ai-team/skills/"],
      "writeTarget": ".copilot/skills/",
      "deprecated": [".squad/skills/", ".ai-team/skills/"],
      "since": "0.9.0",
      "tier": "important",
      "validate": "dir-exists && non-empty"
    },
    
    "decisions": {
      "purpose": "Recorded team decisions and ADRs",
      "canonical": ".copilot/decisions/",
      "readOrder": [".copilot/decisions/", ".squad/decisions/", ".ai-team/decisions/"],
      "writeTarget": ".copilot/decisions/",
      "deprecated": [".squad/decisions/", ".ai-team/decisions/"],
      "since": "0.9.0",
      "tier": "important",
      "validate": "dir-exists"
    },
    
    "agents": {
      "purpose": "Agent configurations and team roster",
      "canonical": ".copilot/agents/",
      "readOrder": [".copilot/agents/", ".squad/agents/", ".ai-team/agents/"],
      "writeTarget": ".copilot/agents/",
      "deprecated": [".squad/agents/", ".ai-team/agents/"],
      "since": "0.9.0",
      "tier": "critical",
      "validate": "dir-exists"
    },
    
    "config": {
      "purpose": "Squad configuration and team metadata",
      "canonical": ".squad/",
      "readOrder": [".squad/"],
      "writeTarget": ".squad/",
      "deprecated": [],
      "since": "0.8.0",
      "tier": "critical",
      "validate": "dir-exists"
    },
    
    "agent-md": {
      "purpose": "GitHub Copilot agent discovery file (critical for Copilot integration)",
      "canonical": ".github/agents/squad.agent.md",
      "readOrder": [".github/agents/squad.agent.md"],
      "writeTarget": ".github/agents/squad.agent.md",
      "deprecated": [],
      "since": "0.8.0",
      "tier": "critical",
      "validate": "file-exists && non-empty && contains-markers",
      "markers": ["Squad", "agent", "system"]
    },
    
    "ci-config": {
      "purpose": "Squad CI workflow configuration",
      "canonical": ".github/workflows/squad-ci.yml",
      "readOrder": [".github/workflows/squad-ci.yml"],
      "writeTarget": ".github/workflows/squad-ci.yml",
      "deprecated": [],
      "since": "0.8.0",
      "tier": "important",
      "validate": "file-exists && non-empty"
    }
  },
  
  "migrations": {
    "0.7-to-0.8": [
      { "from": ".ai-team/skills", "to": ".squad/skills", "action": "move" },
      { "from": ".ai-team/decisions", "to": ".squad/decisions", "action": "move" }
    ],
    "0.8-to-0.9": [
      { "from": ".squad/skills", "to": ".copilot/skills", "action": "move" },
      { "from": ".squad/decisions", "to": ".copilot/decisions", "action": "move" },
      { "from": ".squad/agents", "to": ".copilot/agents", "action": "move" }
    ]
  }
}
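
One way to keep entries like these consistent is to type them and check a few invariants that the example entries above already satisfy: writeTarget equals canonical, canonical comes first in readOrder, and every deprecated path still appears in readOrder. The `LayoutEntry` shape below mirrors the fields shown in the schema; the `checkEntry` helper and the specific rules it enforces are assumptions for illustration, not part of the proposal.

```typescript
type Tier = "critical" | "important" | "scaffolding";

// Mirrors the fields of one "layout" entry in the schema above.
interface LayoutEntry {
  purpose: string;
  canonical: string;
  readOrder: string[];
  writeTarget: string;
  deprecated: string[];
  since: string;
  tier: Tier;
  validate: string;
  markers?: string[];
}

// Hypothetical invariant check; returns human-readable problems (empty array = consistent).
function checkEntry(entryName: string, e: LayoutEntry): string[] {
  const problems: string[] = [];
  if (e.writeTarget !== e.canonical) {
    problems.push(`${entryName}: writeTarget differs from canonical`);
  }
  if (e.readOrder[0] !== e.canonical) {
    problems.push(`${entryName}: canonical is not first in readOrder`);
  }
  for (const d of e.deprecated) {
    if (e.readOrder.indexOf(d) === -1) {
      problems.push(`${entryName}: deprecated path ${d} is missing from readOrder`);
    }
    if (d === e.writeTarget) {
      problems.push(`${entryName}: deprecated path ${d} is still the writeTarget`);
    }
  }
  return problems;
}

// The "skills" entry, copied from the schema above.
const skillsEntry: LayoutEntry = {
  purpose: "Reusable patterns and process knowledge",
  canonical: ".copilot/skills/",
  readOrder: [".copilot/skills/", ".squad/skills/", ".ai-team/skills/"],
  writeTarget: ".copilot/skills/",
  deprecated: [".squad/skills/", ".ai-team/skills/"],
  since: "0.9.0",
  tier: "important",
  validate: "dir-exists && non-empty",
};

console.log(checkEntry("skills", skillsEntry)); // no problems reported for this entry
```

A check of this kind could run in CI against lib/manifest.json so that a deprecated path dropped from readOrder is caught before release.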

Tier Definitions

| Tier | Definition | Validation | On Missing | Examples |
|---|---|---|---|---|
| critical | Product non-functional without it | fail in doctor | Block operation | squad.agent.md, squad-ci.yml, .copilot/agents/ |
| important | Feature degraded without it | warn in doctor | Continue with warning | casting-registry.json, skill templates, decisions |
| scaffolding | Convenience, recreatable | info in doctor | Note in log only | charter.md template, history.md template |

Architecture Principles

Fail-Loud Policy

Operations on critical files must NEVER silently skip. Every code path that writes, copies, or modifies a critical file must have an explicit else clause:

// ❌ Current pattern — silent degradation
if (storage.existsSync(source)) {
  storage.copySync(source, dest);
}

// ✅ Required pattern — fail-loud
if (storage.existsSync(source)) {
  storage.copySync(source, dest);
} else {
  warn(`Template source missing for critical file: ${dest}`);
  warnings.push({ file: dest, reason: 'template-source-missing' });
}

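The else branch above shows the minimum acceptable handling: the skip is logged and recorded. For critical tier files, a missing source should be an error rather than a warning. For important and scaffolding tiers, a warning suffices.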
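
A tier-aware version of the fail-loud pattern might look like the sketch below. The `FileStore` interface is an assumed minimal subset of the storage API used in the snippets above, and `copyTemplate` and `memoryStore` are hypothetical helpers written only for this illustration.

```typescript
type Tier = "critical" | "important" | "scaffolding";

// Assumed minimal storage interface, mirroring the existsSync/copySync calls used above.
interface FileStore {
  existsSync(filePath: string): boolean;
  copySync(from: string, to: string): void;
}

interface CopyWarning {
  file: string;
  reason: string;
}

// Copies source to dest and never skips silently:
// a missing source throws for the critical tier and records a warning for other tiers.
function copyTemplate(
  store: FileStore,
  source: string,
  dest: string,
  tier: Tier,
  warnings: CopyWarning[],
): boolean {
  if (store.existsSync(source)) {
    store.copySync(source, dest);
    return true;
  }
  if (tier === "critical") {
    throw new Error(`Template source missing for critical file: ${dest}`);
  }
  warnings.push({ file: dest, reason: "template-source-missing" });
  return false;
}

// In-memory store for demonstration only.
function memoryStore(files: Record<string, string>): FileStore {
  return {
    existsSync: filePath => filePath in files,
    copySync: (from, to) => {
      const content = files[from];
      if (content === undefined) throw new Error(`No such file: ${from}`);
      files[to] = content;
    },
  };
}
```

Returning a boolean and collecting warnings in a caller-owned array keeps the non-critical path observable, while the thrown error makes the critical path impossible to ignore.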

Empty = Missing

Existence checks must also verify non-empty for critical files. An empty squad.agent.md is functionally identical to a missing one (Copilot can't discover the agent), so it must be treated as missing rather than passing the check.

// ❌ Insufficient — passes for empty files
expect(storage.existsSync(agentPath)).toBe(true);

// ✅ Required — catches empty files
expect(storage.existsSync(agentPath)).toBe(true);
const content = storage.readSync(agentPath);
expect(content.trim().length).toBeGreaterThan(0);

Post-Operation Validation

After any operation that modifies the repo structure (init, upgrade, migrate, doctor), validate all critical files:

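function validateCriticalFiles(projectRoot: string): ValidationResult {
  const results: FileValidation[] = [];
  for (const [name, entry] of Object.entries(manifest.layout)) {
    // Only critical-tier entries are validated here; other tiers are left to doctor.
    if (entry.tier !== 'critical') continue;

    const fullPath = path.join(projectRoot, entry.canonical);
    const exists = storage.existsSync(fullPath);
    // Directory entries (canonical path ends with "/") only need to exist;
    // file entries must also be non-empty and contain any required markers.
    const isDir = entry.canonical.endsWith('/');
    const content = exists && !isDir ? storage.readSync(fullPath) : null;
    const nonEmpty = isDir ? exists : content !== null && content.trim().length > 0;
    const markersPresent = entry.markers?.every(m => content?.includes(m)) ?? true;

    results.push({
      name,
      path: entry.canonical,
      tier: entry.tier,
      valid: exists && nonEmpty && markersPresent,
    });
  }
  return { results, allValid: results.every(r => r.valid) };
}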

This runs as the final step in init(), upgrade(), and migrate() — after all file operations complete.

Recovery Cascade

When post-operation validation fails for a critical file, attempt recovery before erroring:

  1. Try restore from template: re-copy from template source
  2. Try restore from git: git show HEAD:<path> to recover from last commit
  3. Error with clear message: if both fail, surface a specific, actionable error

import { execFileSync } from 'node:child_process';
import path from 'node:path';

async function recoverCriticalFile(entry: LayoutEntry, projectRoot: string): Promise<boolean> {
  const dest = path.join(projectRoot, entry.canonical);

  // Attempt 1: Restore from template
  // (templateSource is assumed to be provided on the manifest entry)
  const templatePath = resolveTemplatePath(entry.templateSource);
  if (templatePath && storage.existsSync(templatePath)) {
    storage.copySync(templatePath, dest);
    warn(`Recovered ${entry.canonical} from template`);
    return true;
  }

  // Attempt 2: Restore from git
  // execFileSync passes the revision path as a plain argument, so no shell quoting is involved.
  try {
    const content = execFileSync('git', ['show', `HEAD:${entry.canonical}`], {
      cwd: projectRoot,
      encoding: 'utf8',
    });
    if (content.trim().length > 0) {
      storage.writeSync(dest, content);
      warn(`Recovered ${entry.canonical} from git history`);
      return true;
    }
  } catch { /* file not in git history */ }

  // Attempt 3: Error with actionable message
  throw new Error(`Critical file missing and unrecoverable: ${entry.canonical}. Reinstall Squad or manually restore this file.`);
}

Doctor Severity = Actual Impact

squad doctor severity must match actual user impact:

| Condition | Correct Severity | Rationale |
|---|---|---|
| File missing, product broken | fail | User can't use Squad |
| File empty, product broken | fail | Functionally identical to missing |
| File exists but malformed | warn | May partially work |
| File missing, product works | warn | Degraded but functional |
| File missing, convenience only | info | No impact on core functionality |

Bug fix: Empty squad.agent.md must report fail, not warn.
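
The severity rules above could be expressed as a single function of file state and tier, roughly as sketched here. The `FileState` type and the `doctorSeverity` function are hypothetical names. Mapping "product broken" to the critical tier, "product works" to important, and "convenience only" to scaffolding is an assumption based on the tier definitions, as is treating empty like missing for non-critical tiers.

```typescript
type Tier = "critical" | "important" | "scaffolding";
type FileState = "ok" | "missing" | "empty" | "malformed";
type Severity = "fail" | "warn" | "info";

// Returns the severity doctor should report, or null when there is nothing to report.
function doctorSeverity(state: FileState, tier: Tier): Severity | null {
  if (state === "ok") return null;
  // The file exists and has content, so the product may partially work.
  if (state === "malformed") return "warn";
  // "missing" and "empty" are handled identically (Empty = Missing).
  switch (tier) {
    case "critical":
      return "fail";
    case "important":
      return "warn";
    case "scaffolding":
      return "info";
  }
}

console.log(doctorSeverity("empty", "critical"));      // "fail"
console.log(doctorSeverity("missing", "scaffolding")); // "info"
```

Deriving severity from the tier in one place avoids the situation in #730, where an individual check chose warn for a state that breaks the product.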


Implementation Strategy: Strangler Fig Pattern

We will not freeze the product. Instead, we will wrap the old system incrementally:

Step 1: Manifest Describes Current Reality

  • Write manifest JSON that documents the current layout
  • No behavior changes; manifest is purely descriptive
  • Ship and gather feedback
  • Risks: none — read-only

Step 2: Path Resolver Reads from Manifest

  • New resolvePathFromManifest() function references the manifest
  • Old paths still work; resolver logs warnings on drift
  • New code (future features) uses manifest resolver from day 1
  • Risks: low — old paths unchanged
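
A resolver for this step could be as small as the sketch below. Only the name resolvePathFromManifest comes from this PRD; the inlined `demoLayout` object (a cut-down copy of two schema entries), the optional second parameter, and the `checkDrift` helper are assumptions made so the example is self-contained.

```typescript
interface LayoutEntry {
  readOrder: string[];
  writeTarget: string;
  deprecated: string[];
}
type Layout = Record<string, LayoutEntry>;

// Cut-down copy of two entries from the schema; a real resolver would load the manifest file.
const demoLayout: Layout = {
  skills: {
    readOrder: [".copilot/skills/", ".squad/skills/", ".ai-team/skills/"],
    writeTarget: ".copilot/skills/",
    deprecated: [".squad/skills/", ".ai-team/skills/"],
  },
  config: { readOrder: [".squad/"], writeTarget: ".squad/", deprecated: [] },
};

function getEntry(entryName: string, layout: Layout): LayoutEntry {
  const entry = layout[entryName];
  if (!entry) throw new Error(`Unknown manifest entry: ${entryName}`);
  return entry;
}

// Returns every location to read from, highest priority first.
function resolvePathFromManifest(entryName: string, layout: Layout = demoLayout): string[] {
  return getEntry(entryName, layout).readOrder.slice();
}

// Drift check for legacy call sites: compares a hardcoded path with the manifest
// and returns a warning message, or null when the path is the current write target.
function checkDrift(entryName: string, hardcoded: string, layout: Layout = demoLayout): string | null {
  const entry = getEntry(entryName, layout);
  if (entry.readOrder.indexOf(hardcoded) === -1) {
    return `${hardcoded} is not a known location for "${entryName}"`;
  }
  if (entry.deprecated.indexOf(hardcoded) !== -1) {
    return `${hardcoded} is deprecated for "${entryName}"; prefer ${entry.writeTarget}`;
  }
  return null;
}

console.log(resolvePathFromManifest("skills"));
console.log(checkDrift("skills", ".squad/skills/"));
```

Because the drift check only reports and never rewrites paths, existing call sites keep working while the warnings show where hardcoded paths remain.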

Step 3: New Code Uses Manifest

  • When touching existing files for unrelated work, swap hardcoded paths to manifest lookups
  • No refactor of entire codebase at once
  • Incremental migration reduces regression risk

Step 4: Migrate Remaining Paths

  • Migrate remaining hardcoded paths when natural opportunities arise (other PRs, bug fixes)
  • All five code paths eventually reference the manifest
  • Strangler fig: old paths can be removed once all callers migrated

Implementation Phases

Phase 1: Foundation — Manifest Schema + Path Resolver + Doctor Integration

Objective: Establish the foundation for the Squad manifest-driven architecture.

Deliverables:

  • Manifest JSON schema defined at lib/manifest.json with 6+ directory entries
  • Path resolver function resolvePathFromManifest(name: string): string[] implemented in lib/path-resolver.ts
  • doctor.ts refactored to use CriticalFileRegistry from manifest; critical files report fail on missing/empty
  • Manifest validation run as final step in init(), upgrade(), and doctor
  • 20+ new unit tests covering manifest resolution and doctor severity alignment
  • Documentation: how to add new managed directories to the manifest

Files Modified:

  • lib/manifest.json (new)
  • lib/path-resolver.ts (new)
  • packages/squad-cli/src/cli/commands/doctor.ts (update severity logic)
  • packages/squad-cli/src/cli/core/init.ts (add validation step)
  • packages/squad-cli/src/cli/core/upgrade.ts (add validation step)
  • test/ (add 20+ tests)

Risks & Mitigations:

  • Risk: Manifest schema missing entries → Mitigation: Enumerate all entries from existing code paths first
  • Risk: Doctor false positives → Mitigation: Test against real repos (forks, samples)
  • Risk: Performance regression (extra validation step) → Mitigation: Cache manifest during operation

Definition of Done:

  • All tests pass
  • No silent-skip patterns remain in doctor
  • Manifest accurately describes all currently managed files
  • New feature work can immediately adopt manifest resolver

Phase 2: Code Migration — Audit Skill + Runtime Integration + CI Gate

Objective: Migrate the five code paths to use the manifest; add CI gates preventing regression.

Deliverables:

  • skill-source.ts refactored to resolve paths from manifest (reads all directories in readOrder, merges results)
  • resolver.ts refactored to use manifest (returns all matches, not just first)
  • upgrade.ts migration logic refactored to derive migrations from manifest diffs
  • migrations.ts updated to use manifest resolver for backward-compatible path lookups
  • squad.agent.md updated with manifest-derived paths (calls to resolve functions)
  • PR #669 skills merge fix integrated (merge both directories, .copilot/ wins on conflicts)
  • PR #731 failing tests fixed (3 code paths in upgrade/init/doctor now fail-loud)
  • New CI job: critical-file-check — validates all critical files exist and are non-empty after init
  • Test matrix: 11 scenarios covering all version transitions + edge cases
  • 40+ new tests for migration logic and path resolution

Files Modified:

  • packages/squad-cli/src/runtime/skill-source.ts (migrate to manifest)
  • packages/squad-cli/src/runtime/resolver.ts (migrate to manifest)
  • packages/squad-cli/src/cli/core/upgrade.ts (migration derivation)
  • packages/squad-cli/src/cli/core/migrations.ts (use manifest resolver)
  • templates/squad.agent.md.template (reference manifest)
  • .github/workflows/squad-ci.yml (add critical-file-check job)
  • test/ (add 40+ tests)

Related Work:

Test Matrix: 11 Core Scenarios

  1. Init on v0.7 repo (legacy .ai-team/ layout) → reads from deprecated paths
  2. Init on v0.8 repo (.squad/ layout) → reads from deprecated paths, writes to canonical
  3. Init on v0.9 repo (.copilot/ layout) → reads and writes to canonical
  4. Upgrade v0.7 → v0.8 → verify migration moves files
  5. Upgrade v0.8 → v0.9 → verify migration moves files + canonical paths used
  6. Upgrade v0.9 → v0.9 (same version) → agent file preserved
  7. Skills resolution: both .squad/ and .copilot/ present → reads both, .copilot/ wins on conflicts
  8. Skills resolution: only .squad/ present → reads .squad/
  9. Skills resolution: only .copilot/ present → reads .copilot/
  10. Upgrade with missing template source for critical file → fail-loud, not silent skip
  11. Doctor: empty agent file → reports fail, not warn

Risks & Mitigations:

  • Risk: Backward compatibility breaks for .squad/ users → Mitigation: readOrder includes deprecated paths
  • Risk: Merge conflicts in skills when both directories present → Mitigation: .copilot/ wins deterministically
  • Risk: Migration corrupts files → Mitigation: pre-migration backup, rollback on validation failure
  • Risk: Large-scale refactor introduces regressions → Mitigation: phased rollout, feature flag

Definition of Done:


Phase 3: Automation — Migration Generator + Doctor Repair

Objective: Automate future layout changes and enable self-service CLI troubleshooting.

Deliverables:

  • Migration code generator: generateMigrationFromManifest(fromVersion, toVersion) returns migration steps
  • Manifest backfill tool: for users on v0.7/v0.8, automatically generate manifest from observed file layout
  • squad doctor --repair command: auto-fixes common issues (missing critical files, deprecated path usage, empty files)
  • Manifest versioning: support multiple manifest versions in git history for diff/review
  • Migration testing CLI: squad test-migration --from-version X --to-version Y simulates upgrade without modifying repo
  • 20+ tests covering migration generation, backfill, and repair logic
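
Deriving migration steps from a manifest diff could work along these lines. The function name generateMigrationFromManifest comes from the deliverable above, but everything else is assumed: the schema in this PRD lists migrations explicitly, so the per-version `layoutByVersion` table below is a hypothetical input shape chosen to make the diff computable.

```typescript
interface MigrationStep {
  from: string;
  to: string;
  action: "move";
}

// Hypothetical per-version snapshots: directory purpose -> canonical path.
type Snapshot = Record<string, string>;

const layoutByVersion: Record<string, Snapshot> = {
  "0.8": { skills: ".squad/skills", decisions: ".squad/decisions", agents: ".squad/agents", config: ".squad/" },
  "0.9": { skills: ".copilot/skills", decisions: ".copilot/decisions", agents: ".copilot/agents", config: ".squad/" },
};

// Emits one "move" step for every purpose whose canonical path changed between the two versions.
function generateMigrationFromManifest(fromVersion: string, toVersion: string): MigrationStep[] {
  const before = layoutByVersion[fromVersion];
  const after = layoutByVersion[toVersion];
  if (!before || !after) {
    throw new Error(`No layout snapshot for ${fromVersion} or ${toVersion}`);
  }
  const steps: MigrationStep[] = [];
  // Sort the keys so the generated steps are deterministic.
  for (const purpose of Object.keys(after).sort()) {
    const oldPath = before[purpose];
    const newPath = after[purpose];
    if (oldPath !== undefined && newPath !== undefined && oldPath !== newPath) {
      steps.push({ from: oldPath, to: newPath, action: "move" });
    }
  }
  return steps;
}

console.log(generateMigrationFromManifest("0.8", "0.9"));
```

For the snapshots above this yields the same three moves as the hand-written "0.8-to-0.9" list in the schema, in alphabetical order by purpose; config is unchanged and produces no step.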

Files Modified:

  • lib/migration-generator.ts (new)
  • lib/manifest-backfill.ts (new)
  • packages/squad-cli/src/cli/commands/doctor.ts (add --repair flag)
  • packages/squad-cli/src/cli/commands/test.ts (new subcommand: test-migration)
  • .github/workflows/squad-ci.yml (add manifest version sanity check)
  • test/ (add 20+ tests)

Risks & Mitigations:

  • Risk: Auto-repair makes wrong decisions → Mitigation: --dry-run flag, manual confirmation for destructive ops
  • Risk: Migration generator produces incorrect steps → Mitigation: validate generated migrations against manifest schema
  • Risk: Backfill creates incorrect manifest for unknown old layouts → Mitigation: require --confirm flag, log inferred structure

Definition of Done:

  • Future layout changes can be described in manifest only; migration code auto-generated
  • Users can self-service repair broken layouts with squad doctor --repair
  • Migration testing available for validation before shipping

What's Complete vs What's Remaining

Done (from related work)

Remaining

Immediate Actions

  1. Land PR #731 (failing tests for #730) to unblock Phase 1 deliverables
  2. Merge or re-implement PR #669 (skills merge fix) in Phase 2
  3. Create upstream Phase 1/2/3 issues (fork issues #80-82 don't exist upstream)

Success Criteria

| Criterion | Phase | Measurable |
|---|---|---|
| All #731 tests pass | 2 | CI green on PR #731 |
| No critical file can silently disappear during any CLI operation | 2 | validateCriticalFiles() runs after every init/upgrade/migrate |
| squad doctor accurately reports all broken states as fail | 2 | Doctor severity derived from CriticalFileRegistry tier |
| CI blocks PRs that introduce new silent-skip patterns | 2 | check-critical-files job in squad-ci.yml green |
| Skills discovery works with both .squad/ and .copilot/ present | 2 | All 11 test scenarios pass |
| Recovery cascade restores critical files from template or git | 2 | Round-trip tests (init → delete → upgrade → verify) pass |
| Users can migrate old layouts with squad doctor --repair | 3 | Repair command works on v0.7/v0.8 repos |
| Future layout changes require only manifest update | 3 | Migration code can be auto-generated from manifest diff |

Related Work & References

| # | Repo | Type | Title | Status | Relationship |
|---|---|---|---|---|---|
| #670 | upstream | Issue | PRD: Versioned file layout manifest | OPEN (THIS) | Main proposal |
| #730 | upstream | Issue | squad.agent.md silently disappears | OPEN | Example of root cause; Phase 2 deliverable |
| #731 | upstream | PR | Failing tests for #730 | OPEN | Needs Phase 1 foundation to pass |
| #732 | upstream | Issue | Critical file resilience framework | CLOSED | Rolled into #670 |
| #669 | upstream | PR | Fix: merge skills from both directories | CLOSED (not merged) | Needed for Phase 2 |
| #77 | fork | Issue | Skills discovery bug | OPEN | Root cause that motivated #670 |
| #78–82 | fork | Issues | Phases 1–3 architecture + implementation | OPEN | Fork investigation; needs upstream issues |

Backward Compatibility & Customer Impact

Guarantee to Existing Users

  • .ai-team/ repos (v0.7): Manifest includes .ai-team/ in readOrder; users can upgrade without moving files manually
  • .squad/ repos (v0.8): Manifest includes .squad/ in readOrder and deprecated; automatic migration on upgrade moves files to .copilot/, old paths still readable during transition
  • .copilot/ repos (v0.9+): Canonical layout; no changes
  • No breaking changes: All upgrades are forward-compatible; deprecated paths remain readable for 1–2 versions

Upgrade Communication Plan

  1. Release notes: "Squad now unifies directory structure across versions. Your layout will auto-migrate on squad upgrade."
  2. Migration log: squad upgrade output includes: "Migrated skills from .squad/skills/ to .copilot/skills/" for transparency
  3. Doctor report: squad doctor reports any deprecated paths still in use, with clear migration guidance
  4. Rollback story: If migration fails, recover with squad doctor --repair or revert to previous version

Out of Scope

  • User-owned files (overwriteOnUpgrade: false): team.md, routing.md, decisions/, agent histories, identity files. These are user content — the framework doesn't overwrite or validate them.
  • Runtime agent behavior: This proposal covers CLI file operations only (init, upgrade, migrate, doctor). Agent runtime logic (how agents read/write at runtime) is a separate concern.
  • Non-file invariants: Config validation, schema enforcement, and other non-filesystem concerns are outside this proposal's scope.

Open Questions & Decisions Needed

  1. Manifest versioning: Should we store multiple manifest versions in git history, or just the latest? (Recommendation: latest only; migrations derive from code history)
  2. Manifest distribution: Should users commit the manifest to their repos, or is it CLI-only? (Recommendation: Users don't need to commit; it's shipped with CLI)
  3. Manifest auto-update on CLI upgrade: Should squad upgrade fetch latest manifest from CLI release? (Recommendation: Yes, like TEMPLATE_MANIFEST)
  4. Concurrent access: How do we handle two Squad sessions (e.g., git worktrees) reading/writing manifest simultaneously? (Recommendation: Use git locks; out of scope for Phase 1)

Implementation Notes for Teams

For Flight (Lead)

  • Validate that manifest schema covers all 5 code paths
  • Approve architecture principles (fail-loud, post-op validation, recovery cascade)
  • Confirm phases are properly sequenced and non-blocking

For EECOM (Core Dev)

  • Technical feasibility of manifest-driven path resolution in init.ts, resolver.ts, skill-source.ts
  • Performance impact of post-operation validation (should be negligible)
  • Code review for Phase 1 & 2 implementation

For FIDO (Quality Owner)

  • Test plan adequacy; ensure 11 scenarios cover all version transitions
  • Negative path coverage mandate: every critical file write must have "template missing" test
  • CI gates (silent-skip grep check, template coverage check, doctor severity audit)
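
The silent-skip check could start as a simple text heuristic like the one sketched here. This is an assumed approach, not an existing Squad tool: `findSilentSkips` is a hypothetical function that looks for an `if (...existsSync(...)) { ... }` block with no `else` after its closing brace. It counts braces naively, so braces inside strings or comments can confuse it, and a production check would more likely use a TypeScript AST.

```typescript
// Returns the 1-based line numbers of existsSync-guarded if-blocks that have no else branch.
function findSilentSkips(source: string): number[] {
  const hits: number[] = [];
  // Matches a single-line guard such as: if (storage.existsSync(source)) {
  const guard = /if\s*\([^\n{]*existsSync\([^\n{]*\)\s*\{/g;
  let match: RegExpExecArray | null;
  while ((match = guard.exec(source)) !== null) {
    // Walk forward from the opening brace to its matching closing brace.
    let depth = 0;
    let i = match.index + match[0].length - 1; // index of the opening "{"
    for (; i < source.length; i++) {
      if (source[i] === "{") {
        depth++;
      } else if (source[i] === "}" && --depth === 0) {
        break;
      }
    }
    // The guard is a silent skip when the text after the block does not start with "else".
    const rest = source.slice(i + 1);
    if (!/^\s*else\b/.test(rest)) {
      hits.push(source.slice(0, match.index).split("\n").length);
    }
  }
  return hits;
}

const silentSkipExample = [
  "if (storage.existsSync(source)) {",
  "  storage.copySync(source, dest);",
  "}",
].join("\n");

console.log(findSilentSkips(silentSkipExample)); // [1]
```

A CI job could run this over the CLI sources and fail when the returned list is non-empty for files that write critical-tier paths.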

For Procedures (Prompt Engineer)

  • Impact on squad.agent.md agent instructions
  • Update instructions to reference manifest-driven paths
  • Ensure agent doesn't hardcode deprecated paths in new generated content

For PAO (DevRel)

  • Upgrade communication plan (release notes, docs, migration guide)
  • Blog post: "How Squad Now Manages File Layout"
  • FAQ: "How do I migrate my .squad/ repo to .copilot/?"
