Closed
45 commits
5b3e553
feat: add marufs kernel module — CXL shared-memory filesystem
moonchan-park Apr 10, 2026
c352b9c
fix: address PR #41 review — C1 claim_entry guard, C2 FIND_NAME perm,…
moonchan-park Apr 10, 2026
5e61549
feat: superblock CRC32 checksum + remove active_nodes field
moonchan-park Apr 10, 2026
cfcc845
fix: remove format mount option from fstab/systemd autoload
moonchan-park Apr 10, 2026
b20c9e9
fix: i_size_write without i_rwsem in read_iter and getattr
moonchan-park Apr 10, 2026
1ce23dd
fix: address PR #41 review — H6 mmap ref, H7 alloc CAS, H9 TOCTOU, H1…
moonchan-park Apr 10, 2026
f96183b
fix: address PR #41 review — M12 GC stop, M14 alloc sleep, M15 arch-p…
moonchan-park Apr 10, 2026
2b724f4
fix: add compat_ioctl for 32-bit userspace support
moonchan-park Apr 10, 2026
defc2ca
docs: add architecture documentation links to README
moonchan-park Apr 10, 2026
c97540d
feat: NRHT shard lock for insert serialization + TENTATIVE state
moonchan-park Apr 10, 2026
2471194
feat: index.c TENTATIVE state + DRAM shard lock for insert serialization
moonchan-park Apr 10, 2026
dca995b
feat: add sysfs deleg_info, gc_trigger all-mounts, 5 new test binaries
moonchan-park Apr 10, 2026
80d263f
fix: harden permission checks for destructive/admin sysfs operations
moonchan-park Apr 14, 2026
0cb146a
fix: address jooho review — security, overflow, compat, page cache
moonchan-park Apr 14, 2026
b406863
fix: setattr reject non-SIZE attrs gracefully + gc_thread stop check
moonchan-park Apr 14, 2026
88d3fc6
docs: add userspace API guide to README
moonchan-park Apr 14, 2026
4b2123c
refactor(layout): add uuid field to superblock with proper alignment
moonchan-park Apr 22, 2026
3eeb0b4
docs: add user-facing guide for multi-node mount, API, and security
moonchan-park Apr 22, 2026
d478071
feat: introduce ME (Mutual Exclusion) abstraction with pluggable stra…
moonchan-park Apr 22, 2026
a55ff48
feat(me): doorbell slot protocol + magic-tagged struct validation
moonchan-park Apr 22, 2026
2dd1d6f
docs: document ME strategy mount option and NRHT_JOIN pre-warm
moonchan-park Apr 22, 2026
67b30c7
feat(me): poll-thread cost counters via sysfs + bench integration
moonchan-park Apr 23, 2026
b2559bd
feat(me): shard-level scan skip via per-node pending bitmap
moonchan-park Apr 23, 2026
7e1cc3b
feat(me): eliminate CB hot-polling in poll_cycle
moonchan-park Apr 23, 2026
027d969
refactor(me): consolidate per-shard arrays into struct marufs_me_shard
moonchan-park Apr 23, 2026
b083657
refactor(me): bind me->shards[x] to local sh pointer for readability
moonchan-park Apr 23, 2026
d77aea0
perf(me): DRAM is_holder fast path + ME_{BECOME,LOSE,IS}_HOLDER macros
moonchan-park Apr 23, 2026
5d441a8
refactor(me): consolidate CB reads via me_cb_snapshot, drop unused tr…
moonchan-park Apr 24, 2026
2eab9b0
feat(stats): per-CPU fine-grained ME + NRHT counters with sysfs + bench
moonchan-park Apr 24, 2026
17c0b4e
feat(stats): wait_fast_hit counter + poll-phase breakdown in sweep
moonchan-park Apr 24, 2026
5688510
fix(me): observer-local counter-based liveness probe
moonchan-park Apr 24, 2026
b51401a
docs(me): restructure ME protocol architecture doc
moonchan-park Apr 24, 2026
5aececb
refactor(sysfs): split sysfs.c into per-domain modules
moonchan-park Apr 27, 2026
6634e8d
refactor(headers): split marufs.h / marufs_layout.h into per-domain h…
moonchan-park Apr 27, 2026
ce5f52c
test(multinode): integrate test_me_crash.sh as Section 30
moonchan-park Apr 27, 2026
99aca74
feat(bootstrap): add auto-mount slot election with chaos tests
moonchan-park Apr 28, 2026
c7f219c
fix(me): sync poll_last_slot_seq in wait_for_token success
moonchan-park Apr 28, 2026
b0270d8
feat(nrht): per-entry ref/pin counter ioctls
moonchan-park Apr 28, 2026
29fef35
docs(user_guide): add §4.3 ref/pin counter ioctls
moonchan-park Apr 28, 2026
08d3755
fix(bootstrap): bound dump_slots to caller-provided bufsize
moonchan-park Apr 29, 2026
387616f
refactor(me): split me.h into api/inline/layout headers
moonchan-park Apr 30, 2026
ed3028b
refactor(acl): centralize ioctl perm check + ANY API
moonchan-park Apr 30, 2026
a236713
fix(acl): re-check ADMIN inside ME for chown/perm_set_default
moonchan-park Apr 30, 2026
2bec39c
feat(marufs): vm_ops wrapper for mprotect enforcement; remove daxheap
moonchan-park May 8, 2026
5477f21
feat(marufs): post-exec privilege retention defense
moonchan-park May 11, 2026
2 changes: 2 additions & 0 deletions .gitignore
@@ -86,3 +86,5 @@ CLAUDE.md
docs/_build/
internal_docs/
.worktrees/
# Added by code-review-graph
.code-review-graph/
194 changes: 194 additions & 0 deletions docs/source/design_doc/marufs_kernel_module_architecture.md
@@ -0,0 +1,194 @@
# marufs Kernel Module Architecture

## Module Components

```mermaid
graph TB
subgraph VFS["VFS Layer"]
super["super.c<br/>mount / umount / format"]
dir["dir.c<br/>readdir / lookup<br/>create / unlink"]
inode["inode.c<br/>iget / new_inode / evict"]
file["file.c<br/>mmap / ftruncate / ioctl"]
end

subgraph Data["Data Layer"]
index["index.c<br/>hash index<br/>insert / lookup / delete"]
region["region.c<br/>RAT allocator<br/>alloc / free"]
nrht_m["nrht.c<br/>Name-Ref Hash Table<br/>name_offset / find_name"]
end

subgraph Security["Security Layer"]
acl["acl.c<br/>perm check / grant<br/>chown / set_default"]
end

subgraph Maint["Maintenance"]
gc["gc.c<br/>4-phase GC sweep"]
sysfs["sysfs.c<br/>/sys/fs/marufs/ stats"]
end

super --> dir & inode & file
dir --> index
file --> index & region & acl & nrht_m
inode --> index
gc -.-> index & region & nrht_m

style VFS fill:#1B4F72,stroke:#1B4F72,stroke-width:2px,color:#fff
style Data fill:#145A32,stroke:#145A32,stroke-width:2px,color:#fff
style Security fill:#78281F,stroke:#78281F,stroke-width:2px,color:#fff
style Maint fill:#2C3E50,stroke:#2C3E50,stroke-width:2px,color:#fff
```

| Layer | File | Role |
|-------|------|------|
| VFS | `super.c` | Module init, mount/umount, DAX device setup, mkfs (format) |
| VFS | `dir.c` | Directory operations: readdir, lookup, create, unlink, d_revalidate |
| VFS | `inode.c` | Inode lifecycle: iget (from CXL index), new_inode, evict |
| VFS | `file.c` | File operations: mmap (DAX fault), ftruncate (region alloc), ioctl dispatch |
| Data | `index.c` | Global partitioned index: CAS-based insert/lookup/delete, hash chain walk |
| Data | `region.c` | RAT (Region Allocation Table): contiguous space finder, alloc/free entries |
| Data | `nrht.c` | Name-Ref Hash Table: name_offset, find_name, batch operations |
| Security | `acl.c` | Permission enforcement: delegation table check, perm_grant, chown |
| Maintenance | `gc.c` | Background GC: 4-phase sweep (dead process, stale index, local tracker, NRHT) |
| Maintenance | `sysfs.c` | sysfs interface: `/sys/fs/marufs/` stats and configuration |

## CXL Memory Layout

```mermaid
block-beta
columns 4

sb_label["◼ Global Superblock"]:4
sb["Superblock (256B)"]:4

space:4

gi_label["◼ Global Index"]:4
sh0["Shard Header 0 (64B)"]
sh1["Shard Header 1 (64B)"]
sh2["Shard Header 2 (64B)"]
sh3["Shard Header 3 (64B)"]
bk0["Buckets 0 (256 × 4B)"]
bk1["Buckets 1 (256 × 4B)"]
bk2["Buckets 2 (256 × 4B)"]
bk3["Buckets 3 (256 × 4B)"]
en0["Entries 0 (256 × 64B)"]
en1["Entries 1 (256 × 64B)"]
en2["Entries 2 (256 × 64B)"]
en3["Entries 3 (256 × 64B)"]

space:4

rat_label["◼ Region Allocation Table"]:4
rat_hdr["RAT Header (128B)"]:4
r0["RAT Entry 0 (2 KB)"]
r1["RAT Entry 1 (2 KB)"]
r_dot["... (× 253)"]
r255["RAT Entry 255 (2 KB)"]

space:4

rg["Region 0, 1, 2, ... (2 MB aligned each)"]:4

style sb_label fill:#1B4F72,color:#fff,font-weight:bold
style sb fill:#2E86C1,color:#fff
style gi_label fill:#145A32,color:#fff,font-weight:bold
style sh0 fill:#1E8449,color:#fff
style sh1 fill:#1E8449,color:#fff
style sh2 fill:#1E8449,color:#fff
style sh3 fill:#1E8449,color:#fff
style bk0 fill:#27AE60,color:#fff
style bk1 fill:#27AE60,color:#fff
style bk2 fill:#27AE60,color:#fff
style bk3 fill:#27AE60,color:#fff
style en0 fill:#52BE80,color:#fff
style en1 fill:#52BE80,color:#fff
style en2 fill:#52BE80,color:#fff
style en3 fill:#52BE80,color:#fff
style rat_label fill:#78281F,color:#fff,font-weight:bold
style rat_hdr fill:#C0392B,color:#fff
style r0 fill:#E74C3C,color:#fff
style r1 fill:#E74C3C,color:#fff
style r_dot fill:#E74C3C,color:#fff
style r255 fill:#E74C3C,color:#fff
style rg fill:#2C3E50,color:#fff
```

| Block | Size | Description |
|-------|------|-------------|
| Superblock | 256B (4 CL) | FS geometry, shard count, offsets, mounted node bitmask (`active_nodes`) |
| Shard Header | 64B (1 CL) × 4 | Per-shard bucket/entry array offsets (immutable after format) |
| Buckets | 4B × 256 per shard | Hash chain head pointers (`head_entry_idx` or `BUCKET_END`) |
| Entries | 64B (1 CL) × 256 per shard | Index entries: state, name_hash, region_id, next_in_bucket |
| RAT Header | 128B (2 CL) | max_entries, alloc_lock (CAS spinlock), allocation stats |
| RAT Entry | 2 KB (32 CL) × 256 | CL0: phys_offset/size, CL1: name, CL2: ACL, CL3-31: delegation |
| Region Data | 2 MB aligned each | Actual file data, variable size |
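The sizes in the table can be cross-checked with a small offset calculation. This sketch assumes the Global Index blocks are laid out in the row order the diagram shows (all shard headers, then all bucket arrays, then all entry arrays) with no padding between blocks; the real offsets are whatever the superblock and shard headers record. `struct rat_entry` only mirrors the per-cache-line split from the table, not the actual field definitions.

```c
#include <assert.h>
#include <stdint.h>

#define CL_SIZE           64u
#define NUM_SHARDS        4u
#define BUCKETS_PER_SHARD 256u
#define ENTRIES_PER_SHARD 256u
#define RAT_ENTRIES       256u
#define REGION_ALIGN      (2u << 20) /* 2 MB */

#define ALIGN_UP(x, a) (((x) + (a) - 1) / (a) * (a))

/* One RAT entry = 32 cache lines; field contents are elided. */
struct rat_entry {
    uint8_t cl0[CL_SIZE];       /* phys_offset, size */
    uint8_t cl1[CL_SIZE];       /* name */
    uint8_t cl2[CL_SIZE];       /* ACL: owner + default_perms */
    uint8_t deleg[29][CL_SIZE]; /* CL3-31: delegation table */
};
_Static_assert(sizeof(struct rat_entry) == 2048, "RAT entry must be 2 KB (32 CL)");

struct layout {
    uint64_t superblock, shard_hdrs, buckets, entries, rat_hdr, rat_entries, region0;
};

static struct layout compute_layout(void)
{
    struct layout l;
    uint64_t off = 0;

    l.superblock  = off; off += 4 * CL_SIZE;                              /* 256 B */
    l.shard_hdrs  = off; off += NUM_SHARDS * CL_SIZE;                     /* 4 x 64 B */
    l.buckets     = off; off += NUM_SHARDS * BUCKETS_PER_SHARD * 4;       /* 4 x 1 KB */
    l.entries     = off; off += NUM_SHARDS * ENTRIES_PER_SHARD * CL_SIZE; /* 4 x 16 KB */
    l.rat_hdr     = off; off += 2 * CL_SIZE;                              /* 128 B */
    l.rat_entries = off; off += RAT_ENTRIES * sizeof(struct rat_entry);   /* 512 KB */
    l.region0     = ALIGN_UP(off, REGION_ALIGN); /* first 2 MB boundary */

    assert(l.region0 % REGION_ALIGN == 0);
    return l;
}
```

Under these assumptions the metadata ends at 594,560 bytes (about 580 KiB), so the first 2 MB-aligned region starts at offset 2 MB.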

## ACL (Access Control List)

```mermaid
flowchart TD
subgraph GI_path["Global Index"]
gi_path["filename → Index Entry<br/>(hash → shard → bucket → chain)"]
end

subgraph RAT_path["RAT Entry (via region_id)"]
cl0["CL0: phys_offset, size"]
cl2["CL2: Owner + default_perms"]
cl3["CL3-31: Delegation Table<br/>(up to 29 entries)"]
end

subgraph RD_path["Region Data"]
region["mmap / read / write / unlink<br/>(open is always allowed)"]
end

gi_path -->|"region_id"| RAT_path
cl0 -->|"phys_offset"| region
cl2 -.->|"owner / default"| region
cl3 -.->|"delegated perms"| region

style GI_path fill:#145A32,stroke:#145A32,stroke-width:2px,color:#fff
style RAT_path fill:#1B4F72,stroke:#1B4F72,stroke-width:2px,color:#fff
style RD_path fill:#2C3E50,stroke:#2C3E50,stroke-width:2px,color:#fff
```

- **Data path** (solid): `filename → Index Entry → region_id → CL0.phys_offset → Region`
- **Permission path** (dotted): Owner (implicitly holds all permissions) → default_perms (baseline for non-owners) → Delegation Table (per-node/pid grants)
- `open()` is always allowed; permissions are checked at mmap / read / write / unlink
- Delegations are stored in CXL memory, so they are immediately visible to other nodes

## NRHT (Name-Ref Hash Table)

```mermaid
flowchart TD
subgraph GI["Global Index"]
gi_entry["filename → region_id<br/>(file-level lookup)"]
end

subgraph NRHT_F["NRHT File"]
nrht_entry["name → (offset, target_region_id)<br/>(application-level reference)"]
end

subgraph Regions["Region files"]
r0["Region 0 (data)"]
r1["Region 1 (data)"]
r2["Region 2 (data)"]
end

gi_entry -->|"region_id=0"| r0
gi_entry -->|"region_id=1"| r1
gi_entry -->|"region_id=2"| r2
gi_entry -->|"region_id=5 (NRHT)"| NRHT_F

nrht_entry -->|"target=0, offset=0x1000"| r0
nrht_entry -->|"target=1, offset=0x2000"| r1
nrht_entry -->|"target=0, offset=0x3000"| r0

style GI fill:#145A32,stroke:#145A32,stroke-width:2px,color:#fff
style NRHT_F fill:#1B4F72,stroke:#1B4F72,stroke-width:2px,color:#fff
style Regions fill:#2C3E50,stroke:#2C3E50,stroke-width:2px,color:#fff
```

- **Global Index**: `filename → region_id` — filesystem-level file lookup
- **NRHT**: `name → (offset, target_region_id)` — application-level intra-region references (e.g., KV cache keys)
- A single NRHT can freely reference **multiple regions** (N:M relationship)
- NRHT files are regular regions registered in the Global Index, each with its own RAT entry
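To make the `name → (offset, target_region_id)` mapping concrete, the sketch below models an NRHT as a fixed-size open-addressing table with linear probing and resolves an entry to an absolute address from a per-region base offset. The capacity, name length limit, FNV-1a hash, probing scheme, and `region_base` array are all hypothetical; the example entries mirror the diagram, with two names pointing into region 0 and one into region 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NRHT_SLOTS    64u /* hypothetical capacity */
#define NRHT_NAME_MAX 32u /* hypothetical max name length incl. NUL */

struct nrht_slot {
    bool     used;
    char     name[NRHT_NAME_MAX];
    uint32_t target_region_id; /* region the name points into */
    uint64_t offset;           /* byte offset inside that region */
};

struct nrht {
    struct nrht_slot slots[NRHT_SLOTS];
};

/* FNV-1a 32-bit, a stand-in for whatever hash the module uses. */
static uint32_t nrht_hash(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Insert or update name -> (target_region_id, offset) using linear probing. */
static bool nrht_put(struct nrht *t, const char *name, uint32_t region_id, uint64_t offset)
{
    size_t len = strlen(name);
    if (len >= NRHT_NAME_MAX)
        return false;

    uint32_t h = nrht_hash(name);
    for (uint32_t i = 0; i < NRHT_SLOTS; i++) {
        struct nrht_slot *s = &t->slots[(h + i) % NRHT_SLOTS];
        if (!s->used || strcmp(s->name, name) == 0) {
            s->used = true;
            memcpy(s->name, name, len + 1);
            s->target_region_id = region_id;
            s->offset = offset;
            return true;
        }
    }
    return false; /* table full */
}

static const struct nrht_slot *nrht_get(const struct nrht *t, const char *name)
{
    uint32_t h = nrht_hash(name);
    for (uint32_t i = 0; i < NRHT_SLOTS; i++) {
        const struct nrht_slot *s = &t->slots[(h + i) % NRHT_SLOTS];
        if (!s->used)
            return NULL; /* no deletions here, so an empty slot ends the probe */
        if (strcmp(s->name, name) == 0)
            return s;
    }
    return NULL;
}

/* Absolute address = base of the target region (e.g. its RAT CL0 phys_offset) + offset. */
static uint64_t nrht_resolve(const struct nrht_slot *s, const uint64_t *region_base)
{
    assert(s != NULL);
    return region_base[s->target_region_id] + s->offset;
}
```

Because each entry carries its own `target_region_id`, one table can point into any number of regions, which is the N:M relationship described above.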