ROX-30437: basic inode tracking for host path resolution #166
Conversation
Force-pushed 221a623 to 7774aea.
This is a very basic implementation of inode tracking meant to be used in the upcoming release and improved upon in the near future. The current implementation performs a scan of the paths that are configured to be monitored, using the inode and device numbers as a key to two maps:

- A BPF hash map for kernelspace, so we know when an inode triggers an event.
- A HashMap in userspace that maps the inode to the path we found the inode at.

With these two maps we are able to confidently emit events for files that are mounted into containers under paths that don't necessarily match the prefixes configured for monitoring and, at a later stage in userspace, add the host path to the event itself. The implemented approach is far from complete: it only works as long as the files found during the initial scan are not moved, deleted, or replaced by a different file. Future patches will extend the functionality of both kernel and userspace to catch more corner cases.
Added tests validate that events generated on an overlayfs file properly show the event on the upper layer and the access to the underlying FS. They also validate that a mounted path in a container resolves to the correct host path. While developing these tests, it became painfully obvious that getting the information of the process running inside the container is not straightforward. Because containers tend to be fairly static, we should be able to create the information statically in the test and still have everything work correctly. In order to minimize the amount of changes to existing tests, the default `Process` constructor now takes fields directly, and a `from_proc` class method builds a new `Process` object from /proc. Additionally, getting the pid of a process in a container is virtually impossible, so we make the pid check optional.
Force-pushed 7774aea to ce0b5fc.
Thanks, a few comments below. On top of that I'm testing it on OCP, and the device information seems to be off for the container case:
```
# container file
$ stat /etc/test
  File: /etc/test
  Size: 4           Blocks: 8          IO Block: 4096   regular file
Device: 0,651       Inode: 94375102    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2025-12-10 14:49:19.215500519 +0000
Modify: 2025-12-10 14:50:53.170938879 +0000
Change: 2025-12-10 14:50:53.170938879 +0000
 Birth: 2025-12-10 14:49:19.215500519 +0000

# container event
{"timestamp":1765381528692290158,"hostname":"xxxxxxx","process":{"comm":"bash","args":["bash"],"exe_path":"/usr/bin/bash","container_id":"4b5a83a0f180","uid":0,"username":"root","gid":0,"login_uid":4294967295,"pid":1404238,"in_root_mount_ns":false,"lineage":[{"uid":0,"exe_path":"/usr/bin/conmon"},{"uid":0,"exe_path":"/usr/lib/systemd/systemd"}]},"file":{"Open":{"filename":"/etc/test","host_file":"","inode":{"inode":94375102,"dev":2097291}}}}

# overlay event
{"timestamp":1765381528692350071,"hostname":"xxxxxxx","process":{"comm":"bash","args":["bash"],"exe_path":"/usr/bin/bash","container_id":"4b5a83a0f180","uid":0,"username":"root","gid":0,"login_uid":4294967295,"pid":1404238,"in_root_mount_ns":false,"lineage":[{"uid":0,"exe_path":"/usr/bin/conmon"},{"uid":0,"exe_path":"/usr/lib/systemd/systemd"}]},"file":{"Open":{"filename":"/etc/test","host_file":"","inode":{"inode":94375102,"dev":2052}}}}
```
The latter event seems to have a correct dev id of the host overlay, but the former is weird.
```c
unsigned long magic = inode->i_sb->s_magic;
switch (magic) {
case BTRFS_SUPER_MAGIC:
```
From what I understand, btrfs support is mostly useful for development on Fedora, but OCP doesn't use it. In that case, what do you think about turning it into a set of if/else conditions, wrapped in likely/unlikely macros as needed, to make sure that we account for the most anticipated result?
With the check right below this line:

```c
if (bpf_core_type_exists(struct btrfs_inode)) {
```

My understanding is that the verifier will remove the block entirely when btrfs is not used on the system, because it is dead code. In that case we should never enter this branch of the switch-case, since no event will have the btrfs magic number, and there will also be no additional code because the verifier removed it. Do we need to complicate it further than this?
If this was a kernel module, I would 100% agree with you, but I believe letting the verifier remove the code is the way to go here.
The verifier will drop the btrfs part, but the switch condition with its overhead will stay.
But in this particular case it probably doesn't matter, indeed. The likely/unlikely is mostly a signal to the compiler to adapt the code layout for locality -- and since the current layout is rather simple, it looks like clang doesn't change anything that much:
```asm
// original
; switch (magic) {
     18c:  cmpq   %rsi, %rdi
     18f:  jne    0x1e2
     191:  movl   $1, %edi
; if (bpf_core_type_exists(struct btrfs_inode)) {
     196:  testl  %edi, %edi
     198:  je     0x1e2

// with the macros
; if (unlikely(magic == BTRFS_SUPER_MAGIC)) {
     18c:  cmpq   %rsi, %rdi
     18f:  jne    0x1e8
     191:  xorl   %edi, %edi
     193:  movl   $1, %esi
; if (bpf_core_type_exists(struct btrfs_inode)) {
     198:  testl  %esi, %esi
     19a:  je     0x21e
```
We can twist the condition a bit more to look something like this:

```c
if (bpf_core_type_exists(struct btrfs_inode) && unlikely(magic == BTRFS_SUPER_MAGIC)) {
    struct btrfs_inode* btrfs_inode = container_of(inode, struct btrfs_inode, vfs_inode);
    key.inode = inode->i_ino;
    key.dev = BPF_CORE_READ(btrfs_inode, root, anon_dev);
} else {
    key.inode = inode->i_ino;
    key.dev = inode->i_sb->s_dev;
}
```

I'm not 100% sure how to get the same output you got, but this might make the compiler output something a bit better while still letting the verifier remove the first check when btrfs is configured, or the entire block when it is not?
```c
__always_inline static long inode_remove(struct inode_key_t* inode) {
    return bpf_map_delete_elem(&inode_map, inode);
```
No need for null check?
This method is currently only used like this:

```c
inode_remove(&inode_key);
```

which means the key will never be null. But you are right, I will add the null check.
```rust
fn update_entry(&self, path: &Path) -> anyhow::Result<()> {
    if !path.exists() {
```
Looks like there are no debug outputs on this path, so there is no way to see what was actually added. Does it make sense to log it for debugging?
I had them for testing and it can get quite verbose very fast. I will add them back in, and if it becomes a problem we can knock them down to trace in the future.
```c
    MONITORED,
} inode_monitored_t;

__always_inline static inode_monitored_t inode_is_monitored(const inode_value_t* inode) {
```
That looks overly verbose for what seems to be a null check under the hood. Any reason why this is needed?
The current state is very minimal, but the next step to properly do inode tracking will require us to keep track of directories: when a file is accessed, we will need to know whether the parent directory is monitored, in case the file is being created and should be tracked. This function is in preparation for extending the inode_monitored_t enum with a PARENT_MONITORED value that will be returned when the current inode is not monitored but the parent directory is (the function will also need to take in the inode of the parent to check this condition).
If you feel this is too much for the current implementation, I'm happy to roll back to a simple null check and add everything back in the next stage of the implementation.
No, it's fine by me, just add a short doc string to this function explaining this, please.
```c
    unsigned long long dev;
} inode_key_t;

typedef char inode_value_t;
```
Any particular reason for char instead of bool?
I believe the only reason I chose this is because the conversion on the rust side was easier this way, I'll double check and come back to this.
I remember now: using bool requires us to include vmlinux.h. If we include vmlinux.h in this file, then the bindings generated by our build script get massive, since everything in vmlinux.h gets a binding.
There's probably a better way to go about this; for now I've gone with just using char.
fact-ebpf/src/bpf/types.h (outdated)
```c
} process_t;

typedef struct inode_key_t {
    unsigned long long inode;
```
Should they be unsigned long long? From what I see both inode and dev are 32 bits wide.
I can knock them down to unsigned long; I think it will require some casting on the Rust side because the MetadataExt trait gives u64 values for these for whatever reason.
Actually, it just worked 🤷🏻♂️
```diff
 use tokio::{
     io::unix::AsyncFd,
-    sync::{broadcast, watch},
+    sync::{mpsc, watch},
```
Can you describe what this is and how it changes the synchronization primitives used for event processing?
Sure. mpsc stands for multi-producer single-consumer, so we are going from a broadcast channel (which is essentially multi-producer multi-consumer) to this type of channel. The logic behind this change comes from how the data will flow through the components:

- The `bpf` module will grab a reference to an event coming from the kernel in the ringbuffer and turn it into an `Event` value by copying trivial types like integers and allocating heap memory for things like strings. Once the `Event` is constructed, it gets moved into an mpsc channel via its `Sender`. Because this is done via a move operation it should be fast, and because we have a single producer writing to the channel (the bpf module), there should be no contention to insert the value (unless the channel becomes full).
- The `Receiver` end of this `mpsc` channel is held by the `host_scanner` module. This will receive the `Event` values and check whether they need to have their host path added to them (in future updates, this module may need to react in other ways, like scanning a directory after a rename puts it under a monitored path). Once this is done, the `Event` gets wrapped in an `Arc` (atomic reference counter) and forwarded via a `broadcast` channel to the `output` module.
- The `output` module is essentially two tasks at the moment, one printing events in JSON format to stdout and another sending the events via gRPC. An `Arc` is used because receiving from a broadcast channel performs a `clone` of the value, since this is the only way multiple consumers can receive it; wrapping the value in an `Arc` prevents all the underlying heap-allocated values from being copied on receive.

More information on these channels can be found in the tokio docs:
- https://docs.rs/tokio/latest/tokio/sync/broadcast/index.html
- https://docs.rs/tokio/latest/tokio/sync/mpsc/index.html
I also have a draft on how we could extend this approach a bit further to do pipelining in a similar fashion: #169
Hopefully this answers the question, I'm happy to provide more clarifications if needed.
```diff
@@ -0,0 +1,143 @@
+use std::{
```
I think it's a good idea to start any new module with a short description of what it does and a few words about its internals and architecture.
Yeah, I've been dropping the ball hard on documentation. I will add them to the new module here and get to working on updating docs across the project when I get a minute.
fact/src/host_scanner.rs (outdated)
```rust
    dev: metadata.st_dev(),
};

self.kernel_inode_map.borrow_mut().insert(inode, 0, 0)?;
```
In the future we might want to use bulk insert, does aya support it?
I looked into this for the implementation; it'd be nice at startup to be able to populate the regular map and then bulk-update the kernel one. Unfortunately, aya doesn't seem to support this, but we can probably do something with aya-obj and calling the BPF syscall directly.
Overlayfs has some pretty fancy logic for handling device numbers because it needs to handle the upper/lower layers and merged paths for all files. Since there is no logic for handling overlayfs specifically implemented yet, the device number will be wrong there. I haven't bothered implementing it yet because we are not doing inode tracking on container files that I know of.
* Added a missing null check.
* Added missing doc strings.
* Downgraded inode and dev numbers to 32 bits.
* Added some logging statements.