3 changes: 2 additions & 1 deletion AGENTS.md
@@ -33,7 +33,8 @@ Read this before changing `artifact-fs`.
## Non-obvious CLI/runtime behavior

- `ARTIFACT_FS_ROOT` is the state root. `artifact-fs daemon --root` is the mount root. They are different things.
- `add-repo` is one-shot: register repo, clone blobless, build the initial snapshot, then exit. It does not mount FUSE or start background goroutines.
- `add-repo` is one-shot by default: register repo, clone blobless, build the initial snapshot, then exit. It does not mount FUSE or start background goroutines.
- `add-repo --async` only registers prepare state. The daemon mounts a gated placeholder, prepares clone/fetch and snapshot in the background, then opens the gate and starts watcher/refresh.
- `daemon` is long-running: it mounts registered repos and starts watcher, refresh, and hydrator workers.
- `git.CloneBlobless` already populates the git index with `read-tree HEAD`; be careful about extra index resets because they can discard staged state.

44 changes: 41 additions & 3 deletions README.md
@@ -48,7 +48,7 @@ Quick start against a public repo:
```bash
export ARTIFACT_FS_ROOT=/tmp/artifact-fs-test

# Register and clone (returns immediately)
# Register, clone, and build the initial snapshot
./artifact-fs add-repo \
--name workers-sdk \
--remote https://github.com/cloudflare/workers-sdk.git \
@@ -102,6 +102,44 @@ Use `--hydration-concurrency` to control the number of parallel blob-fetch worke
./artifact-fs daemon --root /tmp --hydration-concurrency 8
```
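
As a rough illustration of what a bounded pool of parallel fetch workers can look like, here is a minimal Go sketch. It is not ArtifactFS's hydrator: `hydrateAll`, the `fetch` callback, and the object IDs are hypothetical names chosen for the example, and the real worker logic may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// hydrateAll calls fetch once per ID using at most `concurrency` goroutines.
// fetch is a stand-in for a real blob fetch (hypothetical, for illustration).
// The returned slice holds one result per input ID, in input order.
func hydrateAll(ids []string, concurrency int, fetch func(id string) error) []error {
	if concurrency < 1 {
		concurrency = 1
	}
	jobs := make(chan int)
	errs := make([]error, len(ids))

	var wg sync.WaitGroup
	for w := 0; w < concurrency; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each index is handed to exactly one worker, so writes to
			// errs[i] never overlap.
			for i := range jobs {
				errs[i] = fetch(ids[i])
			}
		}()
	}
	for i := range ids {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return errs
}

func main() {
	ids := []string{"a", "b", "c", "d"}
	errs := hydrateAll(ids, 2, func(id string) error {
		if id == "c" {
			return fmt.Errorf("fetch %s: simulated failure", id)
		}
		return nil
	})
	for i, err := range errs {
		fmt.Println(ids[i], err)
	}
}
```

With a pool like this, the concurrency value only caps how many fetches are in flight at once; it does not change which objects are fetched.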

## Async repo preparation

By default, `add-repo` waits for the blobless clone and initial snapshot before returning. Use `--async` when the daemon should prepare the repo in the background:

```bash
./artifact-fs add-repo \
--name workers-sdk \
--remote https://github.com/cloudflare/workers-sdk.git \
--branch main \
--mount-root /tmp \
--async
```

The daemon mounts a placeholder immediately. Operations inside that repo mount, such as `ls`, `less`, or `git -C /tmp/workers-sdk status`, block until the clone or fetch has finished and the snapshot has been published. If preparation fails, those operations return an I/O error until preparation is retried:

```bash
./artifact-fs status --name workers-sdk
./artifact-fs prepare --name workers-sdk
```

Async HTTPS remotes must use ambient credentials, such as a configured Git credential helper or repo-local Git config. Inline credentials in the remote URL are rejected for async repositories.
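
One way to detect inline credentials is to parse the remote and look for a userinfo component. The sketch below uses Go's standard `net/url` package; `hasInlineCredentials` is a hypothetical helper and the exact check ArtifactFS performs is not shown in this document.

```go
package main

import (
	"fmt"
	"net/url"
)

// hasInlineCredentials reports whether remote is an HTTP(S) URL that embeds
// a username or password, as in https://user:token@host/repo.git.
// Remotes that do not parse as URLs (for example scp-style SSH remotes)
// are reported as false.
func hasInlineCredentials(remote string) bool {
	u, err := url.Parse(remote)
	if err != nil {
		return false
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return false
	}
	return u.User != nil
}

func main() {
	for _, r := range []string{
		"https://github.com/cloudflare/workers-sdk.git",
		"https://user:token@github.com/cloudflare/workers-sdk.git",
	} {
		fmt.Println(r, hasInlineCredentials(r))
	}
}
```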

For workflows that create the gitdir separately, `--prepared-gitdir` makes the async step fetch and prepare an existing gitdir instead of running `git clone`:

```bash
git init --separate-git-dir /tmp/workers-sdk.git --initial-branch main /tmp/workers-sdk
git -C /tmp/workers-sdk remote add origin https://github.com/cloudflare/workers-sdk.git

./artifact-fs add-repo \
--name workers-sdk \
--branch main \
--mount-root /tmp \
--async \
--prepared-gitdir \
--git-dir /tmp/workers-sdk.git \
--fetch-ref main
```

## Sandboxes and Containers

[`examples/Dockerfile`](examples/Dockerfile) builds artifact-fs and starts a FUSE-mounted repo inside a container. The container requires `--cap-add SYS_ADMIN --device /dev/fuse` for FUSE access.
@@ -129,7 +167,7 @@ On hosts with AppArmor enabled (Ubuntu default), add `--security-opt apparmor:un

## Architecture

ArtifactFS has two distinct phases: a one-shot **setup** (`add-repo`) that performs a fast blobless clone of a repo, and a long-running **daemon** that mounts it via FUSE and serves file operations.
ArtifactFS has two distinct phases: a one-shot **setup** (`add-repo`) that registers the repo and, by default, prepares a fast blobless clone, and a long-running **daemon** that mounts it via FUSE and serves file operations. With `add-repo --async`, setup only registers the repo; the daemon performs clone/fetch and snapshot publishing while FUSE operations wait behind a readiness gate.

```
┌─────────────────────────────────────────────────┐
@@ -174,7 +212,7 @@ ArtifactFS has two distinct phases: a one-shot **setup** (`add-repo`) that perfo

### Data flow

1. **Clone** -- `add-repo` runs `git clone --filter=blob:none` (blobless). Only commits, trees, and refs are fetched. No file content is downloaded.
1. **Clone/fetch** -- `add-repo` runs `git clone --filter=blob:none` (blobless) unless `--async` is used. In async mode, the daemon performs either the blobless clone or a fetch into a prepared gitdir. Only commits, trees, and refs are fetched. No file content is downloaded.

2. **Index** -- `git ls-tree -r -t -z HEAD` enumerates every path in the tree. Sizes are resolved locally via `git cat-file --batch-check` with `GIT_NO_LAZY_FETCH=1` to avoid network round-trips. The result is bulk-inserted into a SQLite `base_nodes` table as a new generation.

232 changes: 232 additions & 0 deletions e2e_async_test.go
@@ -0,0 +1,232 @@
//go:build !windows

package main

import (
"context"
"log/slog"
"os"
"os/exec"
"path/filepath"
"testing"
"time"

"github.com/cloudflare/artifact-fs/internal/daemon"
"github.com/cloudflare/artifact-fs/internal/logging"
"github.com/cloudflare/artifact-fs/internal/model"
)

func TestE2EAsyncPreparedGitDirBlocksUntilReady(t *testing.T) {
if os.Getenv("AFS_RUN_E2E_TESTS") != "1" {
t.Skip("skipping e2e tests (set AFS_RUN_E2E_TESTS=1 to run)")
}
skipIfNoFUSE(t)

remoteURL := os.Getenv("AFS_E2E_REPO")
if remoteURL == "" {
remoteURL = createLocalTestRepo(t)
}
preparedGitDir, _ := createPreparedGitDir(t, remoteURL)

unblock := filepath.Join(t.TempDir(), "unblock-fetch")
installBlockingGitFetchWrapper(t, unblock)

repo := newAsyncPreparedE2ERepo(t, preparedGitDir, "main")

waitForCondition(t, 10*time.Second, "async repo preparing", func() (bool, string) {
st, err := repo.svc.Status(context.Background(), repoName)
if err != nil {
return false, err.Error()
}
if st.State == model.PrepareStatePreparing {
return true, ""
}
return false, "state=" + st.State
})

done := make(chan error, 1)
go func() {
entries, err := os.ReadDir(repo.mountPath)
if err == nil && len(entries) == 0 {
err = os.ErrNotExist
}
done <- err
}()

select {
case err := <-done:
t.Fatalf("ReadDir returned before async prepare was released: %v", err)
case <-time.After(500 * time.Millisecond):
}

if err := os.WriteFile(unblock, []byte("go\n"), 0o644); err != nil {
t.Fatal(err)
}
select {
case err := <-done:
if err != nil {
t.Fatalf("ReadDir after prepare release: %v", err)
}
case <-time.After(30 * time.Second):
t.Fatal("ReadDir did not unblock after async prepare completed")
}

entries := lsDir(t, repo.mountPath)
assertContains(t, entries, ".git")
assertContains(t, entries, "README.md")
assertGitStatus(t, repo.mountPath, map[string]string{})
}

func TestE2EAsyncPreparedGitDirFailureThenRetry(t *testing.T) {
if os.Getenv("AFS_RUN_E2E_TESTS") != "1" {
t.Skip("skipping e2e tests (set AFS_RUN_E2E_TESTS=1 to run)")
}
skipIfNoFUSE(t)

remoteURL := os.Getenv("AFS_E2E_REPO")
if remoteURL == "" {
remoteURL = createLocalTestRepo(t)
}
preparedGitDir, preparedWorktree := createPreparedGitDir(t, "file://"+filepath.Join(t.TempDir(), "missing.git"))
repo := newAsyncPreparedE2ERepo(t, preparedGitDir, "main")

waitForCondition(t, 10*time.Second, "async prepare failure", func() (bool, string) {
st, err := repo.svc.Status(context.Background(), repoName)
if err != nil {
return false, err.Error()
}
if st.State == model.PrepareStateFailed && st.PrepareError != "" {
return true, ""
}
return false, "state=" + st.State + " prepare_error=" + st.PrepareError
})

if _, err := os.ReadDir(repo.mountPath); err == nil {
t.Fatal("ReadDir unexpectedly succeeded after prepare failure")
}

gitCmd(t, preparedWorktree, "remote", "set-url", "origin", remoteURL)
if err := repo.svc.Prepare(context.Background(), repoName); err != nil {
t.Fatalf("prepare retry: %v", err)
}

waitForCondition(t, 30*time.Second, "async prepare retry ready", func() (bool, string) {
st, err := repo.svc.Status(context.Background(), repoName)
if err != nil {
return false, err.Error()
}
if st.State == "mounted" && st.PrepareError == "" {
return true, ""
}
return false, "state=" + st.State + " prepare_error=" + st.PrepareError
})

entries := lsDir(t, repo.mountPath)
assertContains(t, entries, "README.md")
assertGitStatus(t, repo.mountPath, map[string]string{})
}

func createPreparedGitDir(t *testing.T, remoteURL string) (gitDir string, worktree string) {
t.Helper()
tmp := t.TempDir()
gitDir = filepath.Join(tmp, "prepared.git")
worktree = filepath.Join(tmp, "prepared")
run(t, "", "git", "init", "--separate-git-dir", gitDir, "--initial-branch", "main", worktree)
run(t, worktree, "git", "remote", "add", "origin", remoteURL)
return gitDir, worktree
}

func installBlockingGitFetchWrapper(t *testing.T, unblockPath string) {
t.Helper()
realGit, err := exec.LookPath("git")
if err != nil {
t.Fatal(err)
}
wrapperDir := t.TempDir()
wrapperPath := filepath.Join(wrapperDir, "git")
script := `#!/bin/sh
for arg in "$@"; do
if [ "$arg" = "fetch" ]; then
while [ ! -f "$AFS_ASYNC_GIT_UNBLOCK" ]; do
sleep 0.05
done
break
fi
done
exec "$AFS_REAL_GIT" "$@"
`
if err := os.WriteFile(wrapperPath, []byte(script), 0o755); err != nil {
t.Fatal(err)
}
t.Setenv("AFS_REAL_GIT", realGit)
t.Setenv("AFS_ASYNC_GIT_UNBLOCK", unblockPath)
t.Setenv("PATH", wrapperDir+string(os.PathListSeparator)+os.Getenv("PATH"))
}

func newAsyncPreparedE2ERepo(t *testing.T, preparedGitDir string, fetchRef string) *mountedE2ERepo {
t.Helper()
root, err := os.MkdirTemp("", "artifact-fs-e2e-async-root-*")
if err != nil {
t.Fatal(err)
}
mountDir, err := os.MkdirTemp("", "artifact-fs-e2e-async-mount-*")
if err != nil {
_ = os.RemoveAll(root)
t.Fatal(err)
}
mountPath := filepath.Join(mountDir, repoName)
if err := os.MkdirAll(mountPath, 0o755); err != nil {
_ = os.RemoveAll(mountDir)
_ = os.RemoveAll(root)
t.Fatal(err)
}

ctx, cancel := context.WithCancel(context.Background())
logger := logging.NewJSONLogger(os.Stderr, slog.LevelWarn)
svc, err := daemon.New(ctx, root, logger)
if err != nil {
cancel()
t.Fatal(err)
}
svc.SetMountRoot(mountDir)

cfg := model.RepoConfig{
Name: repoName,
ID: model.RepoID(repoName),
Branch: "main",
RefreshInterval: 5 * time.Minute,
MountRoot: mountDir,
GitDir: preparedGitDir,
PreparedGitDir: true,
FetchRef: fetchRef,
Enabled: true,
}
if err := svc.AddRepoWithOptions(ctx, cfg, daemon.AddRepoOptions{Async: true}); err != nil {
cancel()
_ = svc.Close()
t.Fatalf("add-repo async prepared-gitdir: %v", err)
}

errCh := make(chan error, 1)
go func() { errCh <- svc.Start(ctx) }()

if !waitForMount(t, mountPath, 60*time.Second) {
cancel()
_ = svc.Close()
t.Fatal("FUSE mount did not appear within timeout")
}

repo := &mountedE2ERepo{
root: root,
mountDir: mountDir,
mountPath: mountPath,
svc: svc,
cancel: cancel,
errCh: errCh,
}
t.Cleanup(func() {
repo.close(t)
})
return repo
}
14 changes: 13 additions & 1 deletion e2e_setup_darwin_test.go
@@ -8,12 +8,24 @@ import (
"testing"
)

// skipIfNoFUSE skips the test if macFUSE is not installed.
// skipIfNoFUSE skips the test if macFUSE is not installed or the
// mount helper cannot be executed by this test process. Some sandboxed
// environments expose the macFUSE bundle but deny exec of mount_macfuse;
// without this preflight every e2e case waits for its mount timeout.
func skipIfNoFUSE(t *testing.T) {
t.Helper()
if _, err := os.Stat("/Library/Filesystems/macfuse.fs"); err != nil {
t.Skip("skipping: macFUSE not installed")
}
helper := "/Library/Filesystems/macfuse.fs/Contents/Resources/mount_macfuse"
cmd := exec.Command(helper, "--help")
out, err := cmd.CombinedOutput()
if err == nil {
return
}
if strings.Contains(err.Error(), "operation not permitted") || strings.Contains(string(out), "operation not permitted") {
t.Skipf("skipping: macFUSE mount helper is not executable in this environment: %v", err)
}
}

// configGitSafeDir adds safe.directory entries for the mount path.