Conversation
Introduce blob_cache_max_bytes (256 MiB) and blob_cache_max_single_bytes (64 MiB) in spfs::config::Fuse and spfs_vfs::fuse::Config, and wire them together in cmd_fuse so operators can tune cache behaviour from the spfs config file.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remote blobs served through the FUSE BlobStream handle ignored the requested read offset, causing data corruption and segfaults in any application that used mmap or pread on a remote-backed file. Introduce a byte-bounded in-memory BlobCache (HashMap + VecDeque, insertion-order eviction) on the Filesystem. When a remote blob is opened:

* If its size is within blob_cache_max_single_bytes (default 64 MiB), the payload is fully downloaded into an Arc<bytes::Bytes> and inserted into the cache keyed by digest. Subsequent opens of the same blob share the Arc without re-downloading. The new BlobCached handle variant services read() with a direct slice at the requested offset and lseek() via arithmetic on the data length, making both operations correct and O(1).
* If the blob is larger than the threshold, it is committed to the local FS repository via commit_blob() so that it is tracked by the object graph and eligible for cleanup by spfs-clean. The local file is then opened as an ordinary seekable BlobFile. Future opens find the blob in the local FS repo before reaching the remote repo arm.

The BlobStream fallback is retained only when no local FS repository is present in the stack, and the FOPEN_NONSEEKABLE | FOPEN_STREAM flags continue to mark those handles as non-seekable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
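The in-memory path above can be sketched roughly as follows. This is a minimal illustration, not the real spfs code: the actual cache keys entries by spfs digest and stores Arc<bytes::Bytes>, simplified here to String keys and Arc<Vec<u8>> payloads, with insertion-order eviction via a VecDeque.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::Arc;

/// Byte-bounded cache with insertion-order eviction (sketch).
pub struct BlobCache {
    max_bytes: usize,
    used_bytes: usize,
    entries: HashMap<String, Arc<Vec<u8>>>,
    // insertion order, oldest entry at the front
    order: VecDeque<String>,
}

impl BlobCache {
    pub fn new(max_bytes: usize) -> Self {
        Self {
            max_bytes,
            used_bytes: 0,
            entries: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Insert a payload, evicting the oldest entries until it fits.
    /// Payloads over the limit are never cached.
    pub fn insert(&mut self, digest: String, data: Arc<Vec<u8>>) {
        if data.len() > self.max_bytes || self.entries.contains_key(&digest) {
            return;
        }
        while self.used_bytes + data.len() > self.max_bytes {
            match self.order.pop_front() {
                Some(oldest) => {
                    if let Some(evicted) = self.entries.remove(&oldest) {
                        self.used_bytes -= evicted.len();
                    }
                }
                None => break,
            }
        }
        self.used_bytes += data.len();
        self.order.push_back(digest.clone());
        self.entries.insert(digest, data);
    }

    /// Later opens of the same blob share the Arc without re-downloading.
    pub fn get(&self, digest: &str) -> Option<Arc<Vec<u8>>> {
        self.entries.get(digest).cloned()
    }

    /// Service read() with a direct slice at the requested offset: correct
    /// for any offset, unlike the old stream-based handle.
    pub fn read_at(&self, digest: &str, offset: usize, size: usize) -> Option<Vec<u8>> {
        let data = self.get(digest)?;
        let start = offset.min(data.len());
        let end = offset.saturating_add(size).min(data.len());
        Some(data[start..end].to_vec())
    }
}
```

Because every read is an offset slice into an already-resident buffer, mmap and pread work correctly regardless of access order.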
Replace the eager commit_blob() call in open() for large remote files with a new BlobRemote handle variant that streams sequentially from the remote and only downloads to the local FS repository the first time a non-sequential read or meaningful seek is detected. Sequential reads never touch disk; only files that are actually seeked incur the one-time promotion cost. If no local FS repo is present the existing BlobStream (non-seekable) fallback is retained.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
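The lazy-promotion state machine can be sketched like this. Names are illustrative, not the real spfs types, and a Cursor over in-memory bytes stands in for both the remote payload stream and the promoted local file:

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};

/// Sketch of a handle that streams sequentially until the first
/// non-sequential read forces a one-time promotion to a local copy.
pub enum RemoteBlobHandle {
    Streaming { remote: Cursor<Vec<u8>>, stream_pos: u64 },
    Local { file: Cursor<Vec<u8>> },
}

impl RemoteBlobHandle {
    pub fn streaming(payload: Vec<u8>) -> Self {
        RemoteBlobHandle::Streaming {
            remote: Cursor::new(payload),
            stream_pos: 0,
        }
    }

    pub fn read_at(&mut self, offset: u64, size: usize) -> Vec<u8> {
        loop {
            match self {
                RemoteBlobHandle::Streaming { remote, stream_pos } if *stream_pos == offset => {
                    // sequential read: serve straight from the stream,
                    // never touching disk
                    let mut buf = vec![0u8; size];
                    let n = remote.read(&mut buf).expect("in-memory read");
                    buf.truncate(n);
                    *stream_pos += n as u64;
                    return buf;
                }
                RemoteBlobHandle::Streaming { remote, .. } => {
                    // non-sequential read detected: promote once to a
                    // seekable local copy (stands in for commit_blob()
                    // plus re-opening from the local FS repo)
                    let mut all = Vec::new();
                    remote.set_position(0);
                    remote.read_to_end(&mut all).expect("in-memory read");
                    *self = RemoteBlobHandle::Local { file: Cursor::new(all) };
                    // loop around and serve from the Local arm
                }
                RemoteBlobHandle::Local { file } => {
                    file.seek(SeekFrom::Start(offset)).expect("in-memory seek");
                    let mut buf = vec![0u8; size];
                    let n = file.read(&mut buf).expect("in-memory read");
                    buf.truncate(n);
                    return buf;
                }
            }
        }
    }
}
```

A purely sequential consumer stays in the Streaming arm for its whole lifetime; the Local arm only ever exists after a consumer has actually jumped around in the file.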
Tests cover:

- BlobCache insert/get, LRU eviction, and Arc sharing for duplicate inserts
- promote_remote_to_local succeeds when the payload already exists on disk
- promote_remote_to_local is idempotent when the handle is already Local
- promote_remote_to_local returns an error when no local FS repo is present

Also adds tempfile and tokio/macros to dev-dependencies to support async tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Force-pushed from 0fe97a6 to 3d613e9.
Remove the BlobStream fallback variant (used when no local FS repo was available) and always use BlobRemote with FOPEN_STREAM. If promotion fails (e.g. no local FS repo), the handler returns EIO rather than silently serving ESPIPE via FOPEN_NONSEEKABLE.

Fix a bug in the lseek handler for BlobRemote in the Streaming state: the old code updated the tracked position arithmetically without moving the stream cursor, so a subsequent read at the new offset would incorrectly appear sequential and return bytes from the wrong position. The new logic answers SEEK_HOLE/SEEK_DATA trivially, is a no-op when the position is unchanged, and promotes to a local file before performing any position-changing seek so the stream cursor is always consistent.

Rename BlobRemoteInner::Streaming.position to stream_pos for clarity.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
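The corrected lseek decision for a still-streaming handle can be sketched as follows. This is illustrative, not the real spfs handler (error cases such as SEEK_DATA past EOF and negative SEEK_CUR offsets are omitted); the whence constants match Linux values:

```rust
// Linux whence values
pub const SEEK_SET: i32 = 0;
pub const SEEK_CUR: i32 = 1;
pub const SEEK_DATA: i32 = 3;
pub const SEEK_HOLE: i32 = 4;

/// What the handler should do for an lseek on a Streaming handle.
#[derive(Debug, PartialEq)]
pub enum LseekAction {
    /// Answer immediately with this position; the stream cursor is untouched.
    Answer(u64),
    /// Download to the local FS repo first, then perform the seek on the
    /// real file so the cursor and reported position stay consistent.
    PromoteThenSeek,
}

pub fn lseek_streaming(whence: i32, offset: u64, stream_pos: u64, blob_size: u64) -> LseekAction {
    match whence {
        // a plain blob is one run of data followed by the implicit hole at EOF
        SEEK_DATA => LseekAction::Answer(offset.min(blob_size)),
        SEEK_HOLE => LseekAction::Answer(blob_size),
        // no-op seeks never need a seekable file
        SEEK_SET if offset == stream_pos => LseekAction::Answer(stream_pos),
        SEEK_CUR if offset == 0 => LseekAction::Answer(stream_pos),
        // any position-changing seek promotes first, instead of (incorrectly)
        // updating the tracked position without moving the stream cursor
        _ => LseekAction::PromoteThenSeek,
    }
}
```

The key difference from the buggy version is that a position-changing seek is never answered arithmetically while the handle is still streaming.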
jrray
left a comment
I haven't done a real review yet; the idea seems reasonable enough, but I had some ideas on doing this without requiring local storage.
The gist is to use a moka cache and populate requested blocks of files on demand. There needs to be a way to fetch a byte range of a remote payload.
Consider an app that opens a 100 GB file, seeks somewhere into the middle of the file, reads one byte, and closes the file. You only need to fetch one block over the network to handle this, instead of persisting the whole payload locally.
        }
    }
    let stream = fresh_stream.ok_or(spfs::Error::UnknownObject(*digest))?;
    let stored = opened.commit_blob(stream).await?;
This blob would be treated as garbage and deleted by spfs clean.
Yes, that's kind of the whole point here, to keep things simple: it's a cache for some time but cleaned up by a separate process. Anything else seems like it would balloon this PR.
I would like to see something along the lines of adding a reference to these locally created payloads to the spfs runtime associated with this fuse session, as a way to keep these alive for the duration of that runtime.
I thought about this, but it doesn't feel like a common use case, or at least there's no evidence that it's the case we should optimize for... To be clear, though, I'm not against changing all of this behaviour in the future. My goal here was to resolve the bug with as little effort and complexity as possible as a first pass.
The idea here is to support seeking of remote blobs in fuse, avoiding the segfaults that we keep seeing in ILM.
There are two paths here:

1. An in-memory cache of bytes. This is an LRU cache with config options for both the maximum total cache size and the maximum size of an individual blob that can be held in memory this way. Primarily this is for small files.
2. For larger files that ultimately need some kind of disk cache, we allow the regular stream handle to be upgraded to a stored blob if seek is invoked. In these cases, we write the blob to the local fs repo and then re-open it as a regular, seekable file.
There are reasonable conversations to be had here about performance (the first seek call is potentially really heavy), as well as about cleanup processes and policies. But I wanted something simple enough to be understood easily and iterated on only as needed. Seeking large remote files is inherently never going to be a great use case anyway, and converting the remote object into a true local one, rather than a temp file, lets us clean up cached objects through normal spfs clean processes and optimistically puts them in a place that avoids the remote lookups altogether if the environment is used again.