Conversation

@Felixoid
Contributor

@Felixoid Felixoid commented Dec 18, 2025

This is an attempt to fix #35.

It adds the SCCACHE_BASEDIRS environment variable and the basedirs configuration parameter, along with new tests to validate the behavior.

@codecov-commenter

codecov-commenter commented Dec 18, 2025

Codecov Report

❌ Patch coverage is 96.84729% with 32 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.90%. Comparing base (cd7dcd5) to head (7a966d4).
⚠️ Report is 8 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/cache/cache.rs | 76.00% | 18 Missing ⚠️ |
| src/util.rs | 97.54% | 4 Missing ⚠️ |
| src/config.rs | 99.08% | 3 Missing ⚠️ |
| src/server.rs | 78.57% | 3 Missing ⚠️ |
| src/compiler/preprocessor_cache.rs | 97.64% | 2 Missing ⚠️ |
| tests/oauth.rs | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2521      +/-   ##
==========================================
+ Coverage   71.04%   71.90%   +0.86%     
==========================================
  Files          64       64              
  Lines       35369    36403    +1034     
==========================================
+ Hits        25128    26176    +1048     
+ Misses      10241    10227      -14     

@Felixoid Felixoid force-pushed the add-basedir-configuration branch from bfba6ec to 9eb3241 Compare December 18, 2025 23:45
@Felixoid Felixoid force-pushed the add-basedir-configuration branch from e818064 to d2e6edd Compare December 22, 2025 10:47
@Felixoid
Contributor Author

I got an idea: basedirs should be added to the stats command as well.

@Felixoid Felixoid force-pushed the add-basedir-configuration branch from ca0a0db to 6f47f36 Compare December 23, 2025 13:57
@Felixoid Felixoid changed the title from "Add SCCACHE_BASEDIR support" to "Add SCCACHE_BASEDIRS support" Dec 23, 2025
@Felixoid Felixoid force-pushed the add-basedir-configuration branch from b5a7d22 to cf0b871 Compare December 23, 2025 23:03
@AJIOB
Contributor

AJIOB commented Dec 24, 2025

> I got an idea: basedirs should be added to the stats command as well.

I think we can also provide stats about base dir translation usage:

  • Number of base dir applied requests
  • Number of base dir skipped requests

That way we can see whether we need to provide more or better base dirs.

@Felixoid
Contributor Author

>   • Number of base dir applied requests
>   • Number of base dir skipped requests

It's a tricky one. After taking a look, the number of successful substitutions is relatively easy to implement, although the counter would have to be threaded through to ServerStats.

But how do we count skipped directories? What does that mean: a base_directory that didn't match anything in the output at all?

@AJIOB
Contributor

AJIOB commented Dec 24, 2025

>   • Number of base dir applied requests
>   • Number of base dir skipped requests
>
> It's a tricky one. After taking a look, the number of successful substitutions is relatively easy to implement, although the counter would have to be threaded through to ServerStats.
>
> But how do we count skipped directories? What does that mean: a base_directory that didn't match anything in the output at all?

No, just to check how many cache requests were converted using any of the base dirs and how many were kept as absolute paths.
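
To make the clarified request concrete, here is a minimal sketch of the two counters. `BasedirStats` and `record` are hypothetical names invented for illustration; they are not part of sccache's `ServerStats`, and the sketch says nothing about how the counter would be threaded through the real code.

```rust
/// Hypothetical counters (not sccache's real `ServerStats`).
#[derive(Debug, Default, PartialEq)]
struct BasedirStats {
    /// Requests whose paths were rewritten using any of the base dirs.
    applied: u64,
    /// Requests whose paths were kept as absolute paths.
    kept_absolute: u64,
}

impl BasedirStats {
    /// Record one cache request; `was_stripped` says whether any base dir matched.
    fn record(&mut self, was_stripped: bool) {
        if was_stripped {
            self.applied += 1;
        } else {
            self.kept_absolute += 1;
        }
    }
}
```

A high `kept_absolute` count relative to `applied` would suggest that the configured base dirs do not cover most builds.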

@AJIOB
Contributor

AJIOB commented Dec 24, 2025

About paths: we can try to normalize paths via https://docs.rs/normpath/latest/normpath/ or something similar instead of doing one more implementation.

@Felixoid
Contributor Author

> About paths: we can try to normalize paths via https://docs.rs/normpath/latest/normpath/ or something similar instead of doing one more implementation.

It looks like https://doc.rust-lang.org/stable/std/path/fn.absolute.html is what I should use for both cases. It normalizes slashes, but keeps the case as is.

However, see my comment regarding preprocessor_output: normalize_path is applied not only to the needles from basedirs, but to the haystack as well. Both https://doc.rust-lang.org/stable/std/path/fn.absolute.html and https://docs.rs/normpath/latest/normpath/trait.PathExt.html#tymethod.normalize_virtually work on Path-like objects. As far as I can tell, they don't fit the purpose of plaintext normalization.

Felixoid and others added 2 commits December 29, 2025 18:11
@Felixoid Felixoid force-pushed the add-basedir-configuration branch from af7c5a2 to 50a4ceb Compare December 29, 2025 18:26
@Felixoid Felixoid force-pushed the add-basedir-configuration branch from a97218c to bccc8f4 Compare December 29, 2025 23:13
@Felixoid
Contributor Author

> No, just to check how many cache requests were converted using any of the base dirs and how many were kept as absolute paths.

I am afraid it would require changes in too many places, including the signatures of hash_key and other significant components. I'd like to cut this task from the PR, if you don't mind; it's already quite large.

@Felixoid Felixoid force-pushed the add-basedir-configuration branch from 61d0ed7 to 0887cea Compare December 30, 2025 12:09
@Felixoid Felixoid force-pushed the add-basedir-configuration branch from 0887cea to e42a1f6 Compare December 30, 2025 12:12
src/util.rs Outdated
    for basedir_path in basedirs.iter() {
        let basedir_str = basedir_path.to_string_lossy();
        let basedir = basedir_str
            .trim_end_matches('/')
Contributor

nit: why do we trim on every call, rather than once while loading from config?

Contributor Author

On one hand, it's a good point.

On the other hand, I feel it's more reliable than hoping that the directories are cleaned during config generation.

Probably, we can move this logic there AND test that strip_basedirs works on the output from Config.

Felixoid and others added 2 commits December 30, 2025 14:39
Co-authored-by: Alex Overchenko <aleksandr9809@gmail.com>

* `SCCACHE_ALLOW_CORE_DUMPS` to enable core dumps by the server
* `SCCACHE_CONF` configuration file path
* `SCCACHE_BASEDIRS` base directory (or directories) to strip from paths for cache key computation. This is similar to ccache's `CCACHE_BASEDIR` and enables cache hits across different absolute paths when compiling the same source code. Multiple directories can be separated by `;` on Windows hosts and by `:` on any other. When multiple directories are specified, the longest matching prefix is used. Path matching is **case-insensitive** on Windows and **case-sensitive** on other operating systems. Environment variable takes precedence over file configuration. Only absolute paths are supported; relative paths will cause an error and prevent the server from start.
Contributor

Suggested change
- * `SCCACHE_BASEDIRS` base directory (or directories) to strip from paths for cache key computation. This is similar to ccache's `CCACHE_BASEDIR` and enables cache hits across different absolute paths when compiling the same source code. Multiple directories can be separated by `;` on Windows hosts and by `:` on any other. When multiple directories are specified, the longest matching prefix is used. Path matching is **case-insensitive** on Windows and **case-sensitive** on other operating systems. Environment variable takes precedence over file configuration. Only absolute paths are supported; relative paths will cause an error and prevent the server from start.
+ * `SCCACHE_BASEDIRS` base directory (or directories) to strip from paths for cache key computation. This is similar to ccache's `CCACHE_BASEDIR` and enables cache hits across different absolute paths when compiling the same source code. Multiple directories can be separated by `;` on Windows hosts and by `:` on any other operating system. When multiple directories are specified, the longest matching prefix is used. Path matching is **case-insensitive** on Windows and **case-sensitive** on other operating systems. Environment variable takes precedence over file configuration. Only absolute paths are supported; relative paths will cause an error and prevent the server from start.
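
As an illustration of the documented rules (platform separator, absolute paths only), here is a minimal sketch of parsing such a value. It is not the PR's code: `parse_basedirs` is a hypothetical helper, and skipping empty entries is an assumption of this sketch. It relies on `std::env::split_paths`, which splits on `;` on Windows and `:` on Unix.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Hypothetical helper: split a `SCCACHE_BASEDIRS`-style value with the
/// platform separator and reject relative entries.
fn parse_basedirs(raw: &OsStr) -> Result<Vec<PathBuf>, String> {
    let mut dirs = Vec::new();
    for dir in env::split_paths(raw) {
        // Assumption of this sketch: empty entries (for example from a
        // trailing separator) are skipped rather than treated as errors.
        if dir.as_os_str().is_empty() {
            continue;
        }
        if !dir.is_absolute() {
            return Err(format!("basedir must be absolute: {}", dir.display()));
        }
        dirs.push(dir);
    }
    Ok(dirs)
}
```

On a Unix host, a value such as `/home/user/project:/opt/src` would yield two entries, while `relative/dir` would be rejected.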

export SCCACHE_BASEDIRS=/home/user/project
```

You can also specify multiple base directories by separating them by `;` on Windows hosts and by `:` on any other. When multiple directories are provided, the longest matching prefix is used:
Contributor

Suggested change
- You can also specify multiple base directories by separating them by `;` on Windows hosts and by `:` on any other. When multiple directories are provided, the longest matching prefix is used:
+ You can also specify multiple base directories by separating them by `;` on Windows hosts and by `:` on any other operating system. When multiple directories are provided, the longest matching prefix is used:
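
The "longest matching prefix" rule can be sketched as follows. This is an illustration only: `longest_matching_basedir` is a hypothetical name, the comparison here is case-sensitive, and it works on `Path` values, whereas the PR matches against plaintext preprocessor output.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: return the longest basedir that is a prefix of `path`.
fn longest_matching_basedir<'a>(path: &Path, basedirs: &'a [PathBuf]) -> Option<&'a PathBuf> {
    basedirs
        .iter()
        // `Path::starts_with` compares whole components, so `/home/user`
        // does not match `/home/username`.
        .filter(|dir| path.starts_with(dir))
        // Among the matching candidates, prefer the longest one.
        .max_by_key(|dir| dir.as_os_str().len())
}
```

With base dirs `/home/user` and `/home/user/project`, a file under `/home/user/project/src` would resolve to the second, longer entry.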

Comment on lines +18 to +19
# When multiple paths are provided, the longest matching prefix
# is applied.
Contributor

@copilot

Suggested change
- # When multiple paths are provided, the longest matching prefix
- # is applied.
+ # When multiple matching paths are provided, the longest prefix
+ # is used.

.unwrap();

let tempdir = tempfile::Builder::new()
.prefix("sccache_test_readonly_basedirs")
Contributor

Suggested change
- .prefix("sccache_test_readonly_basedirs")
+ .prefix("readonly_storage_forwards_basedirs")

?

Comment on lines 1814 to 1817
    let basedirs = [
        PathBuf::from("/home/user1/project"),
        PathBuf::from("/home/user2/project"),
    ];
Contributor

Can a case be added where the basedirs given to the two hash_key calls differ? I want to make sure that basedirs itself is not included in the hash.
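
The property being asked for can be sketched with a toy stand-in. `toy_hash_key` is hypothetical and far simpler than the real `hash_key`; it only shows that when the basedir is used for stripping and never fed to the hasher, two checkouts under different basedirs produce the same key.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the real `hash_key`: hash the source path with
/// the basedir prefix removed. The basedir itself is never hashed.
fn toy_hash_key(source_path: &str, basedir: &str) -> u64 {
    let relative = match source_path.strip_prefix(basedir) {
        // Only accept the match on a path-component boundary.
        Some(rest) if rest.starts_with('/') => &rest[1..],
        _ => source_path,
    };
    let mut hasher = DefaultHasher::new();
    relative.hash(&mut hasher);
    hasher.finish()
}
```

A test along these lines would call the function twice with different basedirs and assert that the keys are equal, and once with a non-matching basedir to confirm the key changes.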

/// in the format found in preprocessor output (e.g., `# 1 "/path/to/file"`).
pub fn strip_basedirs(preprocessor_output: &[u8], basedirs: &[PathBuf]) -> Vec<u8> {
    if basedirs.is_empty() || preprocessor_output.is_empty() {
        return preprocessor_output.to_vec();
Contributor

Why re-allocate instead of returning a Cow if the contents are unchanged?
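
For illustration, a `Cow`-returning variant might look like the sketch below. It is simplified to a single basedir given as bytes, and `strip_all` is a hypothetical name; the point is only that the unchanged case borrows the input instead of copying it.

```rust
use std::borrow::Cow;

/// Hypothetical variant: remove every occurrence of `basedir` from `output`,
/// borrowing the input when nothing matches.
fn strip_all<'a>(output: &'a [u8], basedir: &[u8]) -> Cow<'a, [u8]> {
    if basedir.is_empty() || output.len() < basedir.len() {
        return Cow::Borrowed(output);
    }
    // `Vec::new` does not allocate until the first push.
    let mut result: Vec<u8> = Vec::new();
    let mut copied_up_to = 0;
    let mut i = 0;
    while i + basedir.len() <= output.len() {
        if &output[i..i + basedir.len()] == basedir {
            // Copy the bytes before the match, then skip the match itself.
            result.extend_from_slice(&output[copied_up_to..i]);
            i += basedir.len();
            copied_up_to = i;
        } else {
            i += 1;
        }
    }
    if copied_up_to == 0 {
        // No match: hand the caller the original slice without allocating.
        return Cow::Borrowed(output);
    }
    result.extend_from_slice(&output[copied_up_to..]);
    Cow::Owned(result)
}
```

Callers that need an owned buffer can still call `into_owned()`, which copies only in the borrowed case.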

src/util.rs Outdated
    for basedir_path in basedirs.iter() {
        let basedir_str = basedir_path.to_string_lossy();
        let basedir = basedir_str
            .trim_end_matches(|c| c == '/' || c == '\\')
Contributor

\ should not be trimmed on non-Windows.

Contributor Author

#2521 (comment)

So: strip both \ and / on Windows, only / on other OSes, and then add the trailing /.
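
A minimal sketch of that normalization, assuming string input. `normalize_basedir` is a hypothetical name and the PR's actual implementation may differ; the sketch only shows the platform split done with `cfg`.

```rust
/// Hypothetical normalizer: drop trailing separators, then add exactly one `/`.
fn normalize_basedir(basedir: &str) -> String {
    // Backslash is a separator only on Windows; elsewhere it is an ordinary
    // file-name character and must be preserved.
    #[cfg(windows)]
    let trimmed = basedir.trim_end_matches(|c: char| c == '/' || c == '\\');
    #[cfg(not(windows))]
    let trimmed = basedir.trim_end_matches('/');
    format!("{}/", trimmed)
}
```

With this shape, `/home/user/project` and `/home/user/project//` normalize to the same value, so the matching code never has to consider the trailing separator again.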

        // Copy everything before the match
        result.extend_from_slice(&preprocessor_output[current_pos..match_pos]);
        // Replace the basedir with "."
        result.push(b'.');
Contributor

Perhaps use __SCCACHE_BASEDIR__ so that it doesn't collide with a reference to a relative path that also lives under a basedir?

Imagine:

#line "./relative_path.h"
#line "/a/basedir/match/relative_path.h"

I worry that this can cause false matches (though the contents are likely to be very different, it might be __FILE__ that is more likely to collide in practice).

Contributor Author

# 0 "test.c"                           ||| # 0 "/tmp/test.c"
# 0 "<built-in>"                       ||| # 0 "<built-in>"
# 0 "<command-line>"                   ||| # 0 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4 ||| # 1 "/usr/include/stdc-predef.h" 1 3 4
# 0 "<command-line>" 2                 ||| # 0 "<command-line>" 2
# 1 "test.c"                           ||| # 1 "/tmp/test.c"
# 1 "/usr/include/stdio.h" 1 3 4       ||| # 1 "/usr/include/stdio.h" 1 3 4
# 28 "/usr/include/stdio.h" 3 4        ||| # 28 "/usr/include/stdio.h" 3 4

I like the idea. But instead, I'd like to delete the match along with the following slash. For that, we should actually normalize the basedirs by adding a trailing slash. And it sounds OK to have the current dir ./ there by default.
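
The effect of deleting the match together with its trailing slash can be sketched on a single line of preprocessor output. `strip_basedir_from_line` is a hypothetical helper using naive substring replacement; unlike a real implementation, it does not check what precedes the match.

```rust
/// Hypothetical helper: remove every occurrence of `basedir` (which must end
/// in `/`) from one line of preprocessor output.
fn strip_basedir_from_line(line: &str, basedir: &str) -> String {
    assert!(
        basedir.ends_with('/'),
        "basedir must be normalized with a trailing slash"
    );
    // Naive replacement: this sketch does not verify that the match starts
    // at the beginning of a path.
    line.replace(basedir, "")
}
```

With basedir `/tmp/`, the line `# 1 "/tmp/test.c"` becomes `# 1 "test.c"`, which matches the left-hand column of the comparison above, while system include lines are left untouched.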

src/util.rs Outdated
fn normalize_path(path: &[u8]) -> Vec<u8> {
    path.iter()
        .map(|&b| match b {
            b'A'..=b'Z' => b + 'a' - 'A',
Contributor

Suggested change
- b'A'..=b'Z' => b + 'a' - 'A',
+ c @ b'A'..=b'Z' => c.to_ascii_lowercase(),

Contributor

Maybe without any condition? Just do it for every char, not for every byte.

We can have non-English locales too.

Contributor Author

The question is how to make it universal and safe.

For example, take Turkish, where ı and i are different letters whose capitals are I and İ respectively; that's already bad enough.

Cyrillic would work, I think, but it should be UTF, right?

Contributor

At that point, you need to ask the platform for its APIs to do it. The codepage in use might not be Unicode at all.
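
For reference, the ASCII-only behaviour discussed in this thread can be sketched as below. `normalize_ascii` is a hypothetical name and shows only the case-folding part: `u8::to_ascii_lowercase` maps `A`-`Z` to `a`-`z` and leaves every other byte, including the bytes of multi-byte UTF-8 characters such as `İ`, unchanged.

```rust
/// Hypothetical ASCII-only normalizer: lowercase A-Z, leave all other bytes.
fn normalize_ascii(path: &[u8]) -> Vec<u8> {
    path.iter().map(|b| b.to_ascii_lowercase()).collect()
}
```

This avoids locale-dependent results, at the cost of treating non-ASCII paths as case-sensitive even on Windows.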

@Felixoid Felixoid force-pushed the add-basedir-configuration branch 3 times, most recently from fae67f8 to 50a0ea6 Compare December 30, 2025 22:20
@Felixoid Felixoid force-pushed the add-basedir-configuration branch from 50a0ea6 to 7a966d4 Compare December 30, 2025 22:28
bytes
}
};
basedirs.push(normalized);
Contributor Author

Tests for trailing slashes must be added here to check how this works with strip_basedirs.

Development

Successfully merging this pull request may close these issues.

Implement an equivalent to CCACHE_BASEDIR

5 participants