
git status shows all files as modified on fresh blobless-clone mount #16

@ThoolooExpress

After following the Quick Start instructions (add-repo + daemon), git status inside the mount shows a large number of files as modified. git diff shows no actual content differences for most of these files, and git update-index --refresh fixes everything.

Reproduced against cloudflare/workers-sdk:

$ git -C /tmp/workers-sdk status | grep -c "modified:"
3954
$ git -C /tmp/workers-sdk diff | wc -l
0

Root cause

Seems to be a combination of 3 separate issues:

1. batchResolveSizes silently drops all missing OIDs, leaving most files with size=0

gitstore/gitstore.go:177–219: batchResolveSizes resolves blob sizes by running git cat-file --batch-check with GIT_NO_LAZY_FETCH=1. On a --filter=blob:none clone the server sends no blob content, so virtually every blob OID is missing from the local object store. The "missing" lines in the output (<oid> missing) have only two fields; the parser skips them (len(fields) < 3 → continue), so the corresponding BaseNode.SizeBytes stays 0 and SizeState stays "unknown".

// gitstore/gitstore.go:201–207
scan := bufio.NewScanner(&outBuf)
for scan.Scan() {
    fields := strings.Fields(scan.Text())
    if len(fields) < 3 {
        continue          // ← "missing" lines dropped silently
    }
    ...
}

2. FUSE Getattr returns SizeBytes directly without guarding on SizeState

fusefs/merged.go:94: Resolver.Getattr returns n.Base.SizeBytes regardless of n.Base.SizeState:

return mode, n.Base.SizeBytes, n.Base.Type, mt, nil

So ~2900 files appear as 0-byte files via every stat(2) call made to the kernel.

3. Atime and Ctime are never set in inodeAttrs

fusefs/fuse_unix.go:570–596: inodeAttrs sets Mtime but leaves Atime and Ctime as zero time.Time{}. The Go zero time serializes to a deeply negative Unix timestamp (year ~1754 as seen by stat), so the value written into the git index when the mount is first stat-ed is guaranteed to mismatch the kernel-visible ctime on every subsequent call. With core.trustctime=true (default), git's stat-cache treats every file as "potentially modified" even when size and mtime agree.

return fuseops.InodeAttributes{
    Size:  size,
    Nlink: 1,
    Mode:  m,
    Uid:   ...,
    Gid:   ...,
    Mtime: mtime,
    // Atime and Ctime intentionally omitted → zero → garbage on stat
}

Putting it all together

  • File reports size=0 → git stat-cache sees mismatch vs index → git reads 0 bytes → hashes empty blob → hash mismatches blob in HEAD → "modified".
  • Even for files that do have known size, bogus ctime forces a re-hash on every git status invocation (slow, and the result is only cached until the next stat-cache miss).
  • Hydrator.EnsureHydrated (hydrator.go:106) can hydrate a blob on demand when a read(2) arrives, but because the reported size is 0 the kernel never asks for any bytes. The read path is never taken, and the snapshot row stays SizeState="unknown" forever.
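
To illustrate the first bullet: a file reported as 0 bytes hashes to git's well-known empty-blob ID, which cannot match any non-empty blob in HEAD. The sketch below assumes a SHA-1 repository (not the SHA-256 object format); gitBlobID is a helper name made up for this example.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// gitBlobID computes the SHA-1 object ID git assigns to a blob:
// sha1("blob <len>\x00" + content).
func gitBlobID(content []byte) string {
	h := sha1.New()
	fmt.Fprintf(h, "blob %d\x00", len(content))
	h.Write(content)
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	// What git computes when the mount hands it 0 bytes.
	fmt.Println(gitBlobID(nil)) // e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

	// Any real content yields a different ID, hence "modified".
	fmt.Println(gitBlobID([]byte("package main\n")) != gitBlobID(nil))
}
```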
