Add SCCACHE_BASEDIRS support
#2521
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2521      +/-   ##
==========================================
+ Coverage   71.04%   71.90%   +0.86%
==========================================
  Files          64       64
  Lines       35369    36403    +1034
==========================================
+ Hits        25128    26176    +1048
+ Misses      10241    10227      -14
I have an idea: basedirs should be added to the stats command as well.
I think we can also provide a stat about how often base dirs are used for path translation:
With that we can see whether we need to provide more or better base dirs.
It's a tricky one. After taking a look, the number of successful substitutions is relatively easy to implement, although the counter has to be threaded through to ServerStats. But how do we count the number of skipped directories? What would that be? If a
No, just to check how many cache requests were converted using any of the base dirs and how many were kept as absolute ones.
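A minimal sketch of the two counters described above. The `BasedirStats` struct and its method names are hypothetical and are not part of sccache's `ServerStats`; they only illustrate what "converted" versus "kept absolute" could mean per cache request.

```rust
/// Hypothetical counters; the names are illustrative, not sccache's API.
#[derive(Debug, Default, PartialEq)]
pub struct BasedirStats {
    /// Requests where at least one basedir was stripped.
    pub converted: u64,
    /// Requests whose paths were kept absolute.
    pub kept_absolute: u64,
}

impl BasedirStats {
    /// Record the outcome of one cache request.
    pub fn record(&mut self, any_basedir_matched: bool) {
        if any_basedir_matched {
            self.converted += 1;
        } else {
            self.kept_absolute += 1;
        }
    }

    /// Fraction of requests that used a basedir, or `None` without requests.
    pub fn conversion_ratio(&self) -> Option<f64> {
        let total = self.converted + self.kept_absolute;
        (total > 0).then(|| self.converted as f64 / total as f64)
    }
}
```

A low ratio would hint that the configured base dirs do not cover the paths actually seen by the compiler.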
About paths: we can try to normalize paths via https://docs.rs/normpath/latest/normpath/ or something similar instead of writing yet another implementation.
It looks like https://doc.rust-lang.org/stable/std/path/fn.absolute.html is what I should use for both cases. It normalizes slashes but keeps the case as is. Although see my comment regarding the
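For reference, a small sketch of what `std::path::absolute` does on a POSIX host. The wrapper name `normalize_for_matching` is hypothetical; only `std::path::absolute` itself (stable since Rust 1.79) is a real API.

```rust
use std::io;
use std::path::{self, Path, PathBuf};

/// Normalize a path with `std::path::absolute`.
/// The call does not access the filesystem, so symlinks are not resolved,
/// and on POSIX platforms `..` components are kept as they are.
pub fn normalize_for_matching(p: &Path) -> io::Result<PathBuf> {
    path::absolute(p)
}
```

On a POSIX host, `/foo//test/.././bar.rs` becomes `/foo/test/../bar.rs`: repeated separators and `.` are dropped, while `..` and letter case stay untouched. An empty path is rejected with an error.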
Co-authored-by: whisperity <whisperity@gmail.com>
I am afraid it causes changes in too many places, including the signatures of
src/util.rs
Outdated
for basedir_path in basedirs.iter() {
    let basedir_str = basedir_path.to_string_lossy();
    let basedir = basedir_str
        .trim_end_matches('/')
nit: why do we do this trimming every time, instead of once while loading it from config?
On one hand, it's a good point.
On the other hand, I feel it's more reliable than hoping that the directories are cleaned during config generation.
We can probably move this logic there AND test that strip_basedirs works on the output from Config.
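A rough sketch of what load-time normalization could look like, so that the hot path no longer has to trim on every call. The function name `normalize_basedirs` and the `String` error type are assumptions for illustration, not the PR's actual code.

```rust
use std::path::PathBuf;

/// Hypothetical load-time normalization: reject relative paths and drop
/// trailing `/` separators once, so later code can assume clean input.
pub fn normalize_basedirs(raw: &[PathBuf]) -> Result<Vec<PathBuf>, String> {
    raw.iter()
        .map(|dir| {
            if !dir.is_absolute() {
                return Err(format!("basedir must be absolute: {}", dir.display()));
            }
            let s = dir.to_string_lossy();
            let trimmed = s.trim_end_matches('/');
            // Keep the root directory itself as "/".
            Ok(PathBuf::from(if trimmed.is_empty() { "/" } else { trimmed }))
        })
        .collect()
}
```

A test that feeds the output of this step into `strip_basedirs` would cover the "hope the config is clean" concern raised above.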
Co-authored-by: Alex Overchenko <aleksandr9809@gmail.com>

* `SCCACHE_ALLOW_CORE_DUMPS` to enable core dumps by the server
* `SCCACHE_CONF` configuration file path
* `SCCACHE_BASEDIRS` base directory (or directories) to strip from paths for cache key computation. This is similar to ccache's `CCACHE_BASEDIR` and enables cache hits across different absolute paths when compiling the same source code. Multiple directories can be separated by `;` on Windows hosts and by `:` on any other. When multiple directories are specified, the longest matching prefix is used. Path matching is **case-insensitive** on Windows and **case-sensitive** on other operating systems. Environment variable takes precedence over file configuration. Only absolute paths are supported; relative paths will cause an error and prevent the server from start.
- * `SCCACHE_BASEDIRS` base directory (or directories) to strip from paths for cache key computation. This is similar to ccache's `CCACHE_BASEDIR` and enables cache hits across different absolute paths when compiling the same source code. Multiple directories can be separated by `;` on Windows hosts and by `:` on any other. When multiple directories are specified, the longest matching prefix is used. Path matching is **case-insensitive** on Windows and **case-sensitive** on other operating systems. Environment variable takes precedence over file configuration. Only absolute paths are supported; relative paths will cause an error and prevent the server from start.
+ * `SCCACHE_BASEDIRS` base directory (or directories) to strip from paths for cache key computation. This is similar to ccache's `CCACHE_BASEDIR` and enables cache hits across different absolute paths when compiling the same source code. Multiple directories can be separated by `;` on Windows hosts and by `:` on any other operating system. When multiple directories are specified, the longest matching prefix is used. Path matching is **case-insensitive** on Windows and **case-sensitive** on other operating systems. Environment variable takes precedence over file configuration. Only absolute paths are supported; relative paths will cause an error and prevent the server from start.
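The separator rule in this documentation line matches what `std::env::split_paths` already does (`;` on Windows, `:` elsewhere). The sketch below shows one way the value could be parsed; `parse_basedirs` is a hypothetical helper, and skipping empty entries is an assumption rather than documented behavior.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Split a `SCCACHE_BASEDIRS`-style value with the platform separator
/// and reject relative entries.
pub fn parse_basedirs(value: &OsStr) -> Result<Vec<PathBuf>, String> {
    let mut dirs = Vec::new();
    for dir in env::split_paths(value) {
        if dir.as_os_str().is_empty() {
            // Assumption: empty entries (e.g. a trailing separator) are skipped.
            continue;
        }
        if !dir.is_absolute() {
            return Err(format!("relative basedir not allowed: {}", dir.display()));
        }
        dirs.push(dir);
    }
    Ok(dirs)
}
```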
export SCCACHE_BASEDIRS=/home/user/project
```

You can also specify multiple base directories by separating them by `;` on Windows hosts and by `:` on any other. When multiple directories are provided, the longest matching prefix is used:
- You can also specify multiple base directories by separating them by `;` on Windows hosts and by `:` on any other. When multiple directories are provided, the longest matching prefix is used:
+ You can also specify multiple base directories by separating them by `;` on Windows hosts and by `:` on any other operating system. When multiple directories are provided, the longest matching prefix is used:
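To make "longest matching prefix" concrete, here is a minimal sketch. The function name is hypothetical, and it assumes the basedirs were already normalized to end with `/`, so that `/home/user/` cannot match `/home/username/...`.

```rust
/// Return the longest basedir that is a byte prefix of `path`, if any.
/// Basedirs are assumed to be pre-normalized with a trailing `/`.
pub fn longest_matching_basedir<'a>(path: &[u8], basedirs: &'a [Vec<u8>]) -> Option<&'a [u8]> {
    basedirs
        .iter()
        .map(|b| b.as_slice())
        .filter(|b| path.starts_with(b))
        .max_by_key(|b| b.len())
}
```

With `/home/user/` and `/home/user/project/` both configured, a file under the project directory is matched by the second, more specific entry.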
# When multiple paths are provided, the longest matching prefix
# is applied.
- # When multiple paths are provided, the longest matching prefix
- # is applied.
+ # When multiple matching paths are provided, the longest prefix
+ # is used.
    .unwrap();

let tempdir = tempfile::Builder::new()
    .prefix("sccache_test_readonly_basedirs")
-     .prefix("sccache_test_readonly_basedirs")
+     .prefix("readonly_storage_forwards_basedirs")
?
let basedirs = [
    PathBuf::from("/home/user1/project"),
    PathBuf::from("/home/user2/project"),
];
Can a case be added where the basedirs given to the two hash_key calls differ? I want to make sure that basedirs itself is not included in the hash.
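A sketch of the kind of test being requested. `strip` and `toy_hash_key` are hypothetical stand-ins, not sccache's `hash_key`; the point is only that two calls with different basedirs produce the same key when the stripped text is identical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for stripping: removes every occurrence of each basedir.
fn strip(text: &str, basedirs: &[&str]) -> String {
    basedirs
        .iter()
        .fold(text.to_owned(), |acc, dir| acc.replace(*dir, ""))
}

/// Hypothetical stand-in for `hash_key`: hashes only the stripped text,
/// never the basedirs list itself.
pub fn toy_hash_key(preprocessed: &str, basedirs: &[&str]) -> u64 {
    let mut hasher = DefaultHasher::new();
    strip(preprocessed, basedirs).hash(&mut hasher);
    hasher.finish()
}
```

If the basedirs list were hashed as well, the first two keys in such a test would differ and the cross-directory cache hit would be lost.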
/// in the format found in preprocessor output (e.g., `# 1 "/path/to/file"`).
pub fn strip_basedirs(preprocessor_output: &[u8], basedirs: &[PathBuf]) -> Vec<u8> {
    if basedirs.is_empty() || preprocessor_output.is_empty() {
        return preprocessor_output.to_vec();
Why re-allocate instead of returning a Cow if the contents are unchanged?
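One possible shape of the `Cow` approach, reduced to a single basedir for brevity. The function `strip_one_basedir` is a hypothetical illustration and not the PR's `strip_basedirs`; it allocates only once a match is actually found.

```rust
use std::borrow::Cow;

/// Remove every occurrence of `basedir` from `input`,
/// borrowing the input when nothing matches.
pub fn strip_one_basedir<'a>(input: &'a [u8], basedir: &[u8]) -> Cow<'a, [u8]> {
    if basedir.is_empty() || input.len() < basedir.len() {
        return Cow::Borrowed(input);
    }
    let mut out: Option<Vec<u8>> = None; // allocated lazily on the first match
    let mut copied_up_to = 0;
    let mut i = 0;
    while i + basedir.len() <= input.len() {
        if input[i..].starts_with(basedir) {
            let buf = out.get_or_insert_with(|| Vec::with_capacity(input.len()));
            // Copy the unmatched bytes before this occurrence, then skip it.
            buf.extend_from_slice(&input[copied_up_to..i]);
            i += basedir.len();
            copied_up_to = i;
        } else {
            i += 1;
        }
    }
    match out {
        Some(mut buf) => {
            buf.extend_from_slice(&input[copied_up_to..]);
            Cow::Owned(buf)
        }
        None => Cow::Borrowed(input),
    }
}
```

Callers that need a `Vec<u8>` can still call `.into_owned()`, paying for the copy only when they require ownership.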
src/util.rs
Outdated
for basedir_path in basedirs.iter() {
    let basedir_str = basedir_path.to_string_lossy();
    let basedir = basedir_str
        .trim_end_matches(|c| c == '/' || c == '\\')
\ should not be trimmed on non-Windows.
So: strip `\` and `/` on Windows, only `/` on other OSes, and then add the trailing `/`.
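The rule agreed on above could be sketched as follows. The function name is hypothetical, and `windows` is passed as a parameter only so both behaviors can be exercised on any host; real code would more likely branch on `cfg!(windows)`.

```rust
/// Trim trailing separators and append exactly one `/`.
pub fn normalize_trailing_separator(basedir: &str, windows: bool) -> String {
    let trimmed = if windows {
        basedir.trim_end_matches(|c: char| c == '/' || c == '\\')
    } else {
        // A backslash is a legal filename character on POSIX, so keep it.
        basedir.trim_end_matches('/')
    };
    format!("{trimmed}/")
}
```

On a non-Windows host, a directory literally named `dir\` therefore keeps its backslash and only gains the trailing `/`.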
// Copy everything before the match
result.extend_from_slice(&preprocessor_output[current_pos..match_pos]);
// Replace the basedir with "."
result.push(b'.');
Perhaps use __SCCACHE_BASEDIR__ so that it doesn't collide with a reference to a relative path that also lives under a basedir?
Imagine:
#line "./relative_path.h"
#line "/a/basedir/match/relative_path.h"
I worry that this can cause false matches (though the contents are likely to be very different, it might be __FILE__ that is more likely to collide in practice).
# 0 "test.c" ||| # 0 "/tmp/test.c"
# 0 "<built-in>" ||| # 0 "<built-in>"
# 0 "<command-line>" ||| # 0 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4 ||| # 1 "/usr/include/stdc-predef.h" 1 3 4
# 0 "<command-line>" 2 ||| # 0 "<command-line>" 2
# 1 "test.c" ||| # 1 "/tmp/test.c"
# 1 "/usr/include/stdio.h" 1 3 4 ||| # 1 "/usr/include/stdio.h" 1 3 4
# 28 "/usr/include/stdio.h" 3 4 ||| # 28 "/usr/include/stdio.h" 3 4
I like the idea. But instead, I'd like to delete the match along with the following slash. For that, we should actually normalize the basedirs by adding a trailing slash. And it sounds OK to have the current dir ./ there by default.
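The three replacement strategies discussed in this thread can be compared with a toy helper. `rewrite_marker` is hypothetical and works on a single line of text rather than on the real preprocessor byte stream.

```rust
/// Replace every occurrence of `basedir` in a line marker with `replacement`.
pub fn rewrite_marker(line: &str, basedir: &str, replacement: &str) -> String {
    line.replace(basedir, replacement)
}
```

Applied to `#line "/a/basedir/match/relative_path.h"`: replacing `/a/basedir/match` with `.` yields exactly `#line "./relative_path.h"`, which is the collision the reviewer describes; replacing it with `__SCCACHE_BASEDIR__` keeps the two lines distinct; deleting `/a/basedir/match/` including the slash yields `#line "relative_path.h"`, the same form the compiler prints for a relative input such as `test.c`.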
src/util.rs
Outdated
fn normalize_path(path: &[u8]) -> Vec<u8> {
    path.iter()
        .map(|&b| match b {
            b'A'..=b'Z' => b + 'a' - 'A',
-             b'A'..=b'Z' => b + 'a' - 'A',
+             c @ b'A'..=b'Z' => c.to_ascii_lowercase(),
Maybe without any condition? Just do it for every char, not for every byte.
We can have non-English locales too.
The question is how to make it universal and safe.
For example, once we get to Turkish, where ı and i are different letters with I and İ as their respective capitals, it's already bad enough.
Cyrillic would work, I think, but it should be UTF, right?
At that point, you need to ask the platform for its APIs to do it. The codepage in use might not be Unicode at all.
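A small sketch contrasting the two folding options from this thread, using only standard-library calls. The function names are hypothetical. Rust's `str::to_lowercase` is locale-independent, so it cannot reproduce Turkish rules, and it can change the byte length of the path.

```rust
/// ASCII-only folding: bytes outside `A..=Z` are left untouched.
pub fn fold_ascii(path: &[u8]) -> Vec<u8> {
    path.to_ascii_lowercase()
}

/// Unicode-aware folding; locale-independent, and may change the byte length.
pub fn fold_unicode(path: &str) -> String {
    path.to_lowercase()
}
```

For example, `fold_unicode("İ")` gives `i` followed by the combining dot U+0307 (three bytes instead of two), and `fold_unicode("I")` gives `i`, whereas a Turkish locale would expect `ı`. ASCII folding avoids both surprises at the cost of ignoring non-ASCII letters entirely.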
        bytes
    }
};
basedirs.push(normalized);
Tests for trailing slashes must be added here to check how this works with strip_basedirs.
This is an attempt to fix #35.
The environment variable `SCCACHE_BASEDIRS` and the configuration parameter `basedirs` are added, as well as new tests to validate the behavior.