FACET 6: .git-as-files conflict/corruption safety for G5/TIN-1620 #506

Draft
Jesssullivan wants to merge 1 commit into main from facet6/dotgit-conflict-corruption-harness

@Jesssullivan

Owner

FACET 6 — .git-as-files conflict / corruption + flip-flop safety

Read-only research + a docs/scripts precision layer for the extant large-workdir ladder. This is not new work: the repo-roam "single live repo" target is Phase 2 of docs/ops/large-workdir-onboarding-design-2026-05-25.md and Gate G5 / TIN-1620 of docs/ops/large-workdir-daily-driver-sequencing-2026-05-30.md. G5 already names the T10/T11 conflict rows + rollback. This PR adds the .git-specific git fsck / half-applied-ref checks on top of those rows, for the operator's chosen git_sync_mode = "raw" (.git-as-files) + conflict_mode = "auto" posture. It does not duplicate the git-repo-canary.sh / neo-honey-conflict-demo.sh scaffolds and does not move goalposts.

Findings (grounded in code)

  1. Concurrent .git writes under conflict_mode=auto CAN corrupt git. AutoResolver (crates/tcfs-sync/src/conflict.rs) resolves per .git/* path by lexicographic device tie-break with no .git grouping; crates/tcfsd/src/daemon.rs:1792 applies it per inbound file. The harness reproduces the interleave and git fsck reports error: refs/heads/main: invalid sha1 pointer — a real half-applied ref. Aggravator: raw collection (engine.rs:3331) checks git_is_safe once then streams .git/* over seconds (TOCTOU), and acquire_git_lock (already implemented) is never called on the upload path.
  2. The SAFE flip-flop is the right primitive. cmd_unsync_directory walks children_with_prefix(repo) (incl. every .git/*) and refuses the whole dir if any child is dirty — so unsync-on-neo → work-on-honey → rehydrate-on-neo removes neo as a .git writer before honey edits. Caveats: dehydration is non-atomic (per-file stub+remove loop, crash → partially-stubbed .git) and unsync takes no git lock / git_is_safe, so it must quiesce git first.
  3. Mid-sync .git fails git fsck (torn upload snapshot or partial restore). There is no git fsck assertion anywhere in the test suite; the only .git roundtrip test covers bundle mode (the path the operator did not choose). Raw-mode .git correctness was untested.
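
Finding 1 can be illustrated in isolation. The following is a minimal sketch, not the PR's harness: it builds a throwaway repo, overwrites the branch ref with a well-formed SHA-1 whose object never arrived (standing in for a peer's ref file being applied before the objects it points to), and checks that git fsck reports the pointer. The function names are hypothetical, and the sketch assumes git is on PATH, the default loose-ref "files" backend, and SHA-1 object IDs.

```shell
# Create a throwaway repo with a single empty commit and print its path.
make_canary() {
  local dir
  dir="$(mktemp -d)" || return 1
  git init -q "$dir" || return 1
  git -C "$dir" -c user.name=canary -c user.email=canary@example.invalid \
    -c commit.gpgsign=false commit -q --allow-empty -m base || return 1
  printf '%s\n' "$dir"
}

# Simulate a half-applied ref: the ref file lands, its objects do not.
# Assumes loose refs under .git/refs/heads (not packed-refs or reftable).
apply_ref_without_objects() {
  local repo="$1" branch
  branch="$(git -C "$repo" symbolic-ref --short HEAD)" || return 1
  # A syntactically valid SHA-1 that no object in this repo has.
  printf '%s\n' "1111111111111111111111111111111111111111" \
    > "$repo/.git/refs/heads/$branch"
}

# Exit 0 when fsck reports a ref pointing at a missing object.
# LC_ALL=C keeps the message untranslated so the grep is stable.
fsck_reports_invalid_pointer() {
  LC_ALL=C git -C "$1" fsck --full 2>&1 | grep -q 'invalid sha1 pointer'
}
```

A possible invocation is `repo="$(make_canary)"; apply_ref_without_objects "$repo"; fsck_reports_invalid_pointer "$repo" && echo "half-applied ref detected"`. It should only ever be pointed at the path returned by make_canary, never at a real working tree.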

What the test must check (new G5-git-1..5 rows, in the doc + harness)

  • The peer's git fsck --full is clean.
  • Repos with an index.lock or an in-progress operation are skipped, not torn.
  • A clean flip-flop is fsck-clean and byte-exact.
  • Unsync of a repo with a dirty .git child is refused.
  • The concurrent-write conflict row must detect the half-applied ref (an expected-fail gate until resolution is made .git-aware).
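
The "skipped, not torn" row depends on recognising a repo that git is still writing to. As a rough sketch of such a check (a hypothetical helper, not the logic of git_is_safe, which is not shown in this PR), the function below looks for the lock files and state markers git leaves in .git during a write, merge, rebase, cherry-pick, revert, or bisect. The marker list is an assumption and may be incomplete.

```shell
# Return 0 when no git operation appears to be in progress in "$1/.git",
# 1 when a lock or in-progress marker is present, 2 when there is no .git dir.
repo_looks_quiescent() {
  local gitdir="$1/.git" marker
  [ -d "$gitdir" ] || return 2
  for marker in index.lock HEAD.lock MERGE_HEAD CHERRY_PICK_HEAD REVERT_HEAD \
                BISECT_LOG rebase-merge rebase-apply; do
    if [ -e "$gitdir/$marker" ]; then
      # Report which marker caused the skip; the caller decides what to do.
      printf 'skip: %s present\n' "$marker" >&2
      return 1
    fi
  done
  return 0
}
```

Because a single check before a multi-second collection is exactly the TOCTOU window described in finding 1, a caller would probably want to run this both before and after copying .git/* and discard the snapshot if the second call fails.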

Safety / scope

  • scripts/git-dotgit-fsck-conflict-harness.sh (+ test): local-only, daemon-free, fleet-safe — throwaway canary repo + disposable prefix, refuses real ~/git trees and non-disposable remotes, never reconcile --execute against a real root. --run-push is a thin pointer to the existing git-repo-canary.sh.
  • Wired into lazy:check (bash -n + shellcheck) and a new lazy:test-git-dotgit-fsck-conflict task. Shellcheck-clean; tests pass.
  • Docs/scripts only. Deliberately does not include the unrelated in-worktree TIN-1899 crates/tcfs-cli/src/main.rs change (left unstaged).

Validation

bash scripts/test-git-dotgit-fsck-conflict-harness.sh → all pass, incl. assertion that the conflict interleave yields invalid sha1 pointer.

Adds the precision layer the extant large-workdir ladder (Phase 2
"single live repo" / Gate G5 / TIN-1620) already asks for, covering the
operator's chosen git_sync_mode = "raw" (.git-as-files) + conflict_mode
= "auto" posture. Grounded in the existing T10/T11 conflict rows and the
git-repo-canary / neo-honey-conflict-demo scaffolds; does not duplicate
them and does not move goalposts.

- docs/ops/dotgit-as-files-conflict-corruption-2026-06-08.md: read-only
  analysis. AutoResolver (conflict.rs) resolves per .git/* path by
  lexicographic device tie-break with no .git grouping; daemon.rs applies
  it per-file under conflict_mode=auto; raw collection (engine.rs) checks
  git_is_safe once then streams .git/* (TOCTOU) and never takes the
  existing acquire_git_lock. Documents corruption risk (half-applied
  refs), the safe whole-repo unsync flip-flop incl. dirty-child refusal,
  non-atomic dehydration, mid-sync .git atomicity, and the G5-git-1..5
  test rows the live-repo packet must add.

- scripts/git-dotgit-fsck-conflict-harness.sh (+ test wrapper): local-only,
  daemon-free, fleet-safe harness on a THROWAWAY canary repo and disposable
  prefix. Proves the full .git-as-files mirror is fsck-clean and exact,
  records the index.lock skip contract, and empirically reproduces the
  per-file .git conflict interleave so git fsck flags the half-applied ref
  (error: refs/heads/main: invalid sha1 pointer). Refuses real source trees
  and non-disposable remotes. The --run-push stage is a thin pointer to the
  existing git-repo-canary.sh, not a duplicate.

- Taskfile.yaml: wire both scripts into lazy:check bash -n + shellcheck and
  add lazy:test-git-dotgit-fsck-conflict.

Jesssullivan added a commit that referenced this pull request Jun 8, 2026

Three must-fixes for the dev-env zero-diff fingerprint (PR #507):

1. capture is now genuinely read-only. Drop the `git write-tree` call from
   head.env — it wrote tree objects into <repo>/.git/objects and touched the
   index, breaking the read-only contract that the live R0 step relies on when
   pointing capture at real expendable repos. The staged/index identity is
   already captured by `git ls-files -s` (index-blobs.txt) + the
   `git diff --cached` hash (diff-cached.sha256), so write-tree was redundant.
   Header + runbook read-only claims corrected to be TRUE.

2. Tighten the fsck corruption grep to genuine signals only:
   ^error: / ^fatal: / invalid sha1 pointer / broken link. Drop the broad
   `missing` / `dangling.*commit` matches that false-positive on healthy repos
   with gc'd / expired reflogs (a lone `dangling commit <sha>` notice was
   wrongly flipping fsck=dirty).

3. Make the [PR] vs [LIVE] boundary unmissable in the runbook: a green
   self-test only proves the assertion engine is internally consistent on one
   host — it is NOT proof of flip-flop zero-diff in either direction or of live
   .git corruption catching. Those are delegated to the [LIVE] R2/R3/R5 steps
   and the Facet-6 harness (PR #506).
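
The tightened grep in item 2 can be sketched as a small classifier over git fsck output. This is an illustration using only the four signals named in that item; the function name is hypothetical, and the actual script's expression may differ.

```shell
# Read `git fsck` output on stdin and print "dirty" when a genuine
# corruption signal appears, otherwise "clean".
# Deliberately does NOT match bare "missing" or "dangling ... commit" lines.
classify_fsck_output() {
  if grep -Eq '^error: |^fatal: |invalid sha1 pointer|broken link'; then
    printf 'dirty\n'
  else
    printf 'clean\n'
  fi
}
```

For example, `LC_ALL=C git -C "$repo" fsck --full 2>&1 | classify_fsck_output` would feed it real output; a lone `dangling commit <sha>` notice then leaves the result at clean.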