fix: arg/kwarg collision for local numpy vars in caching#9751
Merged
Contributor
No issues found across 2 files
Architecture diagram
sequenceDiagram
participant UserCode as User Cell
participant Cache as @mo.cache Decorator
participant Hash as _is_memoizable()
participant Graph as Cell Graph
participant ContentHash as Content Hash
Note over UserCode,ContentHash: Cache memoization check flow
UserCode->>Cache: Call f(a=ndarray) with args/kwargs
Cache->>Hash: Check memoizability(local_ref, value)
Hash->>Graph: Look up definitions.get(local_ref)
Graph-->>Hash: Return set of defining cell IDs (or empty set)
alt local_ref defined in same cell
Note over Hash: defs = {self.cell_id}
Hash->>Hash: bool(defs) is True, self.cell_id in defs
Hash-->>Cache: Return False (not memoizable)
Cache->>ContentHash: Recompute content hash
ContentHash-->>Cache: New hash from runtime value
Cache-->>UserCode: cache miss, execute function
else local_ref defined in different cell
Note over Hash: defs = {other_cell_id}
Hash->>Hash: bool(defs) is True, self.cell_id not in defs
Hash-->>Cache: Return True (memoizable)
Cache->>ContentHash: Use cached content hash
ContentHash-->>Cache: Existing hash from storage
Cache-->>UserCode: cache hit, return cached result
else local_ref not defined in any cell (local variable)
Note over Hash: defs = empty set
Hash->>Hash: bool(defs) is False
Hash-->>Cache: Return False (not memoizable)
Cache->>ContentHash: Recompute content hash
ContentHash-->>Cache: New hash from runtime value
Cache-->>UserCode: cache miss, execute function
end
Note over UserCode,ContentHash: Key fix: distinguish local vars (empty defs) from cross-cell refs
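The three branches of the diagram reduce to a single predicate. The following is a minimal sketch of that decision, not the code in `marimo/_save/hash.py`: the free function `is_memoizable`, its parameters, and the plain `dict` standing in for the cell graph's definitions are all assumptions made for illustration.

```python
from typing import Dict, Set


def is_memoizable(ref: str, cell_id: str, definitions: Dict[str, Set[str]]) -> bool:
    """Return True only when `ref` is defined by some cell other than `cell_id`.

    `definitions` is a hypothetical stand-in for the graph lookup in the
    diagram: it maps a variable name to the set of cell IDs that define it.
    """
    defs = definitions.get(ref, set())
    # An empty set means the name is a local variable (for example a function
    # argument), so its digest must be recomputed from the runtime value.
    # A set containing `cell_id` means the name is defined in the current cell.
    return bool(defs) and cell_id not in defs


# Hypothetical graph: "x" is defined in cell_b, "y" in cell_a, "a" in no cell.
definitions = {"x": {"cell_b"}, "y": {"cell_a"}}
cross_cell = is_memoizable("x", "cell_a", definitions)
same_cell = is_memoizable("y", "cell_a", definitions)
local_only = is_memoizable("a", "cell_a", definitions)
```

Only the cross-cell reference qualifies for a memoized hash; the same-cell and local cases both fall through to recomputation, matching the first and third branches of the diagram.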
Contributor
Pull request overview
This PR fixes incorrect content-hash memoization during @mo.cache calls. Locally scoped values (not defined by any cell) could be memoized by name, so a later call with a different ndarray argument (positional or keyword) reused the earlier digest and produced a false cache hit.
Changes:
- Restrict content-hash memoization to references that are actually defined by (other) cells in the notebook graph.
- Add a regression test ensuring distinct NumPy ndarray arguments to a cached function never collide across calls (positional/keyword forms).
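To make the collision concrete, here is a toy model of name-keyed digest memoization. It is a sketch under stated assumptions, not marimo's implementation: `ToyHasher`, `digest_for`, and the `restrict_to_graph_refs` flag are hypothetical names, and raw `bytes` stand in for ndarray contents so the example needs only the standard library. The unrestricted mode is a simplified stand-in for the pre-fix behavior described in the overview.

```python
import hashlib
from typing import Dict, Set


def content_digest(value: bytes) -> str:
    """Hash the runtime value itself (bytes stand in for ndarray contents)."""
    return hashlib.sha256(value).hexdigest()


class ToyHasher:
    """Hypothetical model of memoizing content digests by variable name."""

    def __init__(
        self,
        cell_id: str,
        definitions: Dict[str, Set[str]],
        restrict_to_graph_refs: bool,
    ) -> None:
        self.cell_id = cell_id
        self.definitions = definitions
        self.restrict_to_graph_refs = restrict_to_graph_refs
        self._memo: Dict[str, str] = {}

    def _is_memoizable(self, ref: str) -> bool:
        if not self.restrict_to_graph_refs:
            # Assumed pre-fix behavior: any name may be memoized.
            return True
        defs = self.definitions.get(ref, set())
        return bool(defs) and self.cell_id not in defs

    def digest_for(self, ref: str, value: bytes) -> str:
        if self._is_memoizable(ref):
            # Keyed by name only, so a later value bound to the same
            # name silently reuses the first digest.
            if ref not in self._memo:
                self._memo[ref] = content_digest(value)
            return self._memo[ref]
        return content_digest(value)


# "a" is a function argument, so no cell in the graph defines it.
unrestricted = ToyHasher("cell_a", {}, restrict_to_graph_refs=False)
collides = unrestricted.digest_for("a", b"\x01") == unrestricted.digest_for("a", b"\x02")

restricted = ToyHasher("cell_a", {}, restrict_to_graph_refs=True)
distinct = restricted.digest_for("a", b"\x01") != restricted.digest_for("a", b"\x02")
```

In the unrestricted mode the second call inherits the first call's digest even though the bytes differ, which is the kind of false hit the regression test guards against. With the restriction in place, an argument name that no cell defines is always hashed from its current value.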
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| marimo/_save/hash.py | Tightens _is_memoizable so only graph-defined refs are eligible for hash memoization, preventing memo collisions for function args/kwargs. |
| tests/_save/test_hash.py | Adds a NumPy regression test validating cached function calls with distinct ndarray inputs don’t produce erroneous cache hits. |
📝 Summary
We do not properly detect local (defined in the same cell) arg/kwarg definitions, which causes invalid cache collisions.
Came across this when doing some benchmarking.