Seekable Remote Files #1342 (Open)

rydrman wants to merge 5 commits into main from seekable-remote-files

Conversation


@rydrman rydrman commented Mar 20, 2026

Co-Authored with Claude

The idea here is to support seeking of remote blobs in FUSE, avoiding the segfaults that we keep seeing in ILM.

There are two paths here:

  1. An in-memory cache of bytes. This is an LRU cache with config options for the maximum total cache size as well as the maximum single-blob size that decides which blobs can be held in memory this way. Primarily this is for small files.

  2. For larger files that ultimately need some kind of disk cache, we allow the regular stream handle to be upgraded to a stored blob if seek is invoked. In these cases, we write the blob to the local fs repo and then re-open it as a regular, seekable file.

    There are reasonable conversations to be had here about performance (the first seek call can potentially be really heavy) as well as cleanup processes and policies, but I wanted something simple enough to be understood easily and iterated on only as needed. Seeking large remote files is inherently not going to be a good use case anyway, and converting the remote object to a true local one instead of a temp file allows us to clean up cached objects through normal spfs clean processes and optimistically puts them in a place that avoids the remote lookups altogether if the environment is used again.
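The small/large split above hinges on a single size threshold. Here is a minimal sketch, assuming a hypothetical choose_open_path helper and the 64 MiB per-blob default from the config; the real spfs-vfs types and names differ:

```rust
// Illustrative only: names and constant are assumptions, not the spfs-vfs API.
const BLOB_CACHE_MAX_SINGLE_BYTES: u64 = 64 * 1024 * 1024; // per-blob in-memory limit

#[derive(Debug, PartialEq)]
enum OpenPath {
    /// Small blob: download fully into the in-memory LRU cache.
    InMemory,
    /// Large blob: stream from the remote, promoting to the local
    /// fs repo only if a seek is ever requested.
    StreamWithPromotion,
}

fn choose_open_path(blob_size: u64) -> OpenPath {
    if blob_size <= BLOB_CACHE_MAX_SINGLE_BYTES {
        OpenPath::InMemory
    } else {
        OpenPath::StreamWithPromotion
    }
}
```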

@rydrman rydrman self-assigned this Mar 20, 2026

codecov bot commented Mar 20, 2026

Codecov Report

❌ Patch coverage is 27.58621% with 21 lines in your changes missing coverage. Please review.

Files with missing lines                    Patch %   Lines missing
crates/spfs-vfs/src/fuse.rs                   0.00%   19 ⚠️
crates/spfs-cli/cmd-fuse/src/cmd_fuse.rs      0.00%    2 ⚠️


rydrman and others added 4 commits March 20, 2026 15:02
Introduce blob_cache_max_bytes (256 MiB) and blob_cache_max_single_bytes
(64 MiB) in spfs::config::Fuse, spfs_vfs::fuse::Config, and wire them
together in cmd_fuse so operators can tune cache behaviour from the spfs
config file.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remote blobs served through the FUSE BlobStream handle were read with
positional offsets ignored, causing data corruption and segfaults for
any application that used mmap or pread on a remote-backed file.

Introduce a byte-bounded in-memory BlobCache (HashMap + VecDeque,
insertion-order eviction) on the Filesystem.  When a remote blob is
opened:

* If its size is within blob_cache_max_single_bytes (default 64 MiB)
  the payload is fully downloaded into Arc<bytes::Bytes> and inserted
  into the cache keyed by digest.  Subsequent opens of the same blob
  share the Arc without re-downloading.  The new BlobCached handle
  variant services read() with a direct slice at the requested offset
  and lseek() via arithmetic on the data length, making both
  operations correct and O(1).

* If the blob is larger than the threshold it is committed to the local
  FS repository via commit_blob() so that it is tracked by the object
  graph and eligible for cleanup by spfs-clean.  The local file is then
  opened as an ordinary seekable BlobFile.  Future opens find the blob
  in the local FS repo before reaching the remote repo arm.

The BlobStream fallback is retained only when no local FS repository
is present in the stack, and the FOPEN_NONSEEKABLE | FOPEN_STREAM
flags continue to mark those handles as non-seekable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
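The byte-bounded cache described in this commit can be sketched with std types only. String keys and Vec<u8> payloads stand in for digests and Arc<bytes::Bytes>; this is illustrative, not the real spfs-vfs code:

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::Arc;

/// Sketch of a byte-bounded blob cache with insertion-order eviction
/// (HashMap + VecDeque), as described above.
struct BlobCache {
    max_bytes: usize,
    current_bytes: usize,
    blobs: HashMap<String, Arc<Vec<u8>>>,
    order: VecDeque<String>, // insertion order, oldest first
}

impl BlobCache {
    fn new(max_bytes: usize) -> Self {
        Self { max_bytes, current_bytes: 0, blobs: HashMap::new(), order: VecDeque::new() }
    }

    /// Insert a blob, evicting the oldest entries until it fits the budget.
    /// Duplicate inserts share the existing Arc without re-downloading.
    fn insert(&mut self, digest: String, data: Vec<u8>) -> Arc<Vec<u8>> {
        if let Some(existing) = self.blobs.get(&digest) {
            return Arc::clone(existing);
        }
        while self.current_bytes + data.len() > self.max_bytes {
            let Some(oldest) = self.order.pop_front() else { break };
            if let Some(evicted) = self.blobs.remove(&oldest) {
                self.current_bytes -= evicted.len();
            }
        }
        let arc = Arc::new(data);
        self.current_bytes += arc.len();
        self.order.push_back(digest.clone());
        self.blobs.insert(digest, Arc::clone(&arc));
        arc
    }

    fn get(&self, digest: &str) -> Option<Arc<Vec<u8>>> {
        self.blobs.get(digest).map(Arc::clone)
    }
}
```

A read() against a cached entry is then just a slice of the shared buffer at the requested offset, which is what makes both read and lseek O(1).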
Replace the eager commit_blob() call in open() for large remote files
with a new BlobRemote handle variant that streams sequentially from the
remote and only downloads to the local FS repository the first time a
non-sequential read or meaningful seek is detected.

Sequential reads never touch disk; only files that are actually seeked
incur the one-time promotion cost.  If no local FS repo is present the
existing BlobStream (non-seekable) fallback is retained.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
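The promote-on-seek behaviour can be sketched as a small state machine. The names mirror the commit text, but the code is illustrative, not the actual implementation:

```rust
/// Sketch of the BlobRemote handle states described above.
enum BlobRemoteInner {
    /// Streaming sequentially from the remote; stream_pos tracks where
    /// the underlying stream cursor currently is.
    Streaming { stream_pos: u64 },
    /// Promoted: the payload lives in the local fs repo and the handle
    /// is an ordinary seekable file.
    Local,
}

impl BlobRemoteInner {
    /// A read at `offset` can be served from the stream only if it
    /// continues exactly where the stream left off; anything else forces
    /// a one-time promotion to local storage.
    fn needs_promotion_for_read(&self, offset: u64) -> bool {
        match self {
            BlobRemoteInner::Streaming { stream_pos } => offset != *stream_pos,
            BlobRemoteInner::Local => false,
        }
    }
}
```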
Tests cover:
- BlobCache insert/get, LRU eviction, and Arc sharing for duplicate inserts
- promote_remote_to_local succeeds when the payload already exists on disk
- promote_remote_to_local is idempotent when the handle is already Local
- promote_remote_to_local returns an error when no local FS repo is present

Also adds tempfile and tokio/macros to dev-dependencies to support async tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@rydrman rydrman force-pushed the seekable-remote-files branch from 0fe97a6 to 3d613e9 on March 20, 2026 22:02
Remove the BlobStream fallback variant (used when no local FS repo was
available) and always use BlobRemote with FOPEN_STREAM. If promotion
fails (e.g. no local FS repo), the handler returns EIO rather than
silently serving ESPIPE via FOPEN_NONSEEKABLE.

Fix a bug in the lseek handler for BlobRemote in Streaming state: the
old code updated the tracked position arithmetically without moving the
stream cursor, so a subsequent read at the new offset would incorrectly
appear sequential and return bytes from the wrong position.  The new
logic answers SEEK_HOLE/SEEK_DATA trivially, is a no-op when the
position is unchanged, and promotes to a local file before performing
any position-changing seek so the stream cursor is always consistent.

Rename BlobRemoteInner::Streaming.position to stream_pos for clarity.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
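The corrected lseek logic can be sketched as follows, assuming Linux whence values and a hole-free blob. LseekAction and lseek_streaming are illustrative names, not the real handler:

```rust
// Linux whence values; illustrative sketch only.
const SEEK_SET: i32 = 0;
const SEEK_CUR: i32 = 1;
const SEEK_END: i32 = 2;
const SEEK_DATA: i32 = 3;
const SEEK_HOLE: i32 = 4;

#[derive(Debug, PartialEq)]
enum LseekAction {
    /// Answerable without moving the stream cursor.
    Answer(u64),
    /// Any position-changing seek must first promote to a local file so
    /// the cursor used by later reads is actually at the new offset.
    PromoteThenSeek(u64),
}

fn lseek_streaming(stream_pos: u64, size: u64, offset: u64, whence: i32) -> LseekAction {
    match whence {
        // Blobs are dense: data starts at the offset, the "hole" is at EOF.
        SEEK_DATA => LseekAction::Answer(offset),
        SEEK_HOLE => LseekAction::Answer(size),
        _ => {
            let target = match whence {
                SEEK_SET => offset,
                SEEK_CUR => stream_pos + offset,
                SEEK_END => size + offset,
                _ => offset,
            };
            if target == stream_pos {
                // No-op seek: nothing to promote.
                LseekAction::Answer(target)
            } else {
                LseekAction::PromoteThenSeek(target)
            }
        }
    }
}
```

The buggy version returned the new position without ever moving the cursor, so the next read looked sequential to the stream and served bytes from the wrong place.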
@rydrman rydrman requested review from dcookspi and jrray March 21, 2026 01:02
@jrray jrray left a comment (Collaborator)

I haven't done a real review yet; the idea seems reasonable enough, but I had ideas on doing this without requiring local storage.

The gist is to use a moka cache and populate requested blocks of files on demand. There needs to be a way to fetch a byte range of a remote payload.

Consider an app that opens a 100 GB file, seeks somewhere into the middle of the file, reads one byte, and closes the file. You only need to fetch one block over the network to handle this, instead of persisting the whole payload locally.
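As a rough illustration of the block-on-demand idea: map each read request onto the fixed-size blocks that would be fetched and cached. The block size and helper name are assumptions, and the real design would use a moka cache keyed by digest and block index:

```rust
// Illustrative 4 MiB block size; an assumption, not a proposed value.
const BLOCK_SIZE: u64 = 4 * 1024 * 1024;

/// Returns the inclusive range of block indices covering [offset, offset + len).
/// Only these blocks need to be fetched over the network and cached.
fn blocks_for_read(offset: u64, len: u64) -> std::ops::RangeInclusive<u64> {
    let first = offset / BLOCK_SIZE;
    let last = (offset + len.max(1) - 1) / BLOCK_SIZE;
    first..=last
}
```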

}
}
let stream = fresh_stream.ok_or(spfs::Error::UnknownObject(*digest))?;
let stored = opened.commit_blob(stream).await?;

Collaborator:
This blob would be treated as garbage and deleted by spfs clean.

Collaborator (Author):

Yes, that's kind of the whole point here: keep things simple. It's a cache for some time but cleaned up as a separate process. Anything else seems like it would balloon this PR.

Collaborator:

I would like to see something along the lines of adding a reference to these locally created payloads to the spfs runtime associated with this fuse session, as a way to keep these alive for the duration of that runtime.


rydrman commented Mar 22, 2026

Consider an app that opens a 100 GB file, seeks somewhere into the middle of the file, reads one byte, and closes the file. You only need to fetch one block over the network to handle this, instead of persisting the whole payload locally.

I thought about this, but it doesn't feel like a common use case, or at least there's no evidence that it's a case we should optimize for...

To be clear though, I'm not against changing around all of this behaviour in the future. My goal here was to resolve the bug with as little effort and complexity as possible as a first pass.
