[pull] master from git:master#44
Conversation
* ps/reftable-sans-compat-util: Makefile: skip reftable library for Coccinelle reftable: decouple from Git codebase by pulling in "compat/posix.h" git-compat-util.h: split out POSIX-emulating bits compat/mingw: split out POSIX-related bits reftable/basics: introduce `REFTABLE_UNUSED` annotation reftable/basics: stop using `SWAP()` macro reftable/stack: stop using `sleep_millisec()` reftable/system: introduce `reftable_rand()` reftable/reader: stop using `ARRAY_SIZE()` macro reftable/basics: provide wrappers for big endian conversion reftable/basics: stop using `st_mult()` in array allocators reftable: stop using `BUG()` in trivial cases reftable/record: don't `BUG()` in `reftable_record_cmp()` reftable/record: stop using `BUG()` in `reftable_record_init()` reftable/record: stop using `COPY_ARRAY()` reftable/blocksource: stop using `xmmap()` reftable/stack: stop using `write_in_full()` reftable/stack: stop using `read_in_full()`
The license headers used across the reftable library doesn't follow our typical coding style for multi-line comments. Fix it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `struct reftable_reader` subsystem encapsulates a table that has been read from the disk. As such, the current name of that structure is somewhat hard to understand as it only talks about the fact that we read something from disk, without really giving an indicator _what_ that is. Furthermore, this naming schema doesn't really fit well into how the other structures are named: `reftable_merged_table`, `reftable_stack`, `reftable_block` and `reftable_record` are all named after what they encapsulate. Rename the subsystem to `reftable_table`, which directly gives a hint that the data structure is about handling the individual tables part of the stack. While this change results in a lot of churn, it prepares for us exposing the APIs to third-party callers now that the reftable library is a standalone library that can be linked against by other projects. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code that implements block sources is distributed across a couple of files. Consolidate all of it into "reftable/blocksource.c" and its accompanying header so that it is easier to locate and more self contained. While at it, rename some of the functions to have properly scoped names. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Restart points record the location of reftable records that do not use
prefix compression and are used to perform a binary search inside of a
block. These restart points are encoded at the end of a block, between
the record data and the footer of a table.
The block structure contains three different variables related to these
restart points:
- The block length contains the length of the reftable block up to the
restart points.
- The restart count contains the number of restart points contained in
the block.
- The restart bytes variable tracks where the restart point data
begins.
Tracking all three of these variables is unnecessary though as the data
can be derived from one another: the block length without restart points
is the exact same as the offset of the restart count data, which we
already track via the `restart_bytes` data.
Refactor the code so that we track the location of restart bytes not as
a pointer, but instead as an offset. This allows us to trivially get rid
of the `block_len` variable as described above. This avoids having the
confusing `block_len` variable and allows us to do less bookkeeping
overall.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The logic to read blocks from a reftable is scattered across both the table and the block subsystems. Besides causing somewhat fuzzy responsibilities, it also means that we have to awkwardly pass around the ownership of blocks between the subsystems. Refactor the code so that we stop passing the block when initializing a reader, but instead by passing in the block source plus the offset at which we're supposed to read a block. Like this, the ownership of the block itself doesn't need to get handed over as the block reader is the one owning the block right from the start. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `reftable_block` structure associates a byte slice with a block source. As such it only holds the data of a reftable block without actually encoding any of the details for how to access that data. Rename the structure to instead be called `reftable_block_data`. Besides clarifying that this really only holds data, it also allows us to rename the `reftable_block_reader` to `reftable_block` in the next commit, as this is the structure that actually encapsulates access to the reftable blocks. Rename the `struct reftable_block_reader::block` member accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `block_reader` structure is used to access parsed data of a reftable block. The structure is currently treated as an internal implementation detail and not exposed via our public interfaces. The functionality provided by the structure is useful to external users of the reftable library though, for example when implementing consistency checks that need to scan through the blocks manually. Rename the structure to `reftable_block` now that the name has been made available in the preceding commit. This name is in line with the naming schema used for other data structures like `reftable_table` in that it describes the underlying entity that it provides access to. The new data structure isn't yet exposed via the public interface, which is left for a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Throughout the Git codebase we're using the typedeffed version of `z_stream`, which maps to `struct z_stream_s`. By using a typedef instead of the struct it becomes somewhat harder to predeclare the symbol so that headers depending on the struct can do so without having to pull in "zlib-compat.h". We don't yet have users that would really care about this: the only users that declare `z_stream` as a pointer are in "reftable/block.h", which is a header that is internal to the reftable library. But in the next step we're going to expose the `struct reftable_block` publicly, and that struct does contain a pointer to `z_stream`. And as the public header shouldn't depend on "reftable/system.h", which is an internal implementation detail, we won't have the typedef for `z_stream` readily available. Prepare for this change by using `struct z_stream_s` throughout our code base. In case zlib-ng is used we use a define to map from `z_stream_s` to `zng_stream_s`. Drop the pre-declaration of `struct z_stream` while at it. This struct does not exist in the first place, and the declaration wasn't needed because "reftable/block.h" already includes "reftable/basics.h" which transitively includes "reftable/system.h" and thus "git-zlib.h". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
While users of the reftable library wouldn't generally require access to individual blocks in a reftable table, there are valid usecases where one may require low-level access to them. One such upcoming usecase in the Git codebase is to implement consistency checks for the reftable library where we want to verify each block individually. Create a public interface for reading blocks. The interface isn't yet complete and lacks e.g. a way to read individual records from a block. Such missing functionality will be backfilled in subsequent commits. Note that this change also requires us to expose `reftable_buf`, which is used by the `reftable_block_first_key()` function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The block iterator requires access to a bunch of data from the
underlying `reftable_block` that it is iterating over. This data is
stored by copying over relevant data into a separate set of variables.
This has multiple downsides:
- We require more storage space than necessary. This is more of a
theoretical issue as we shouldn't ever have many blocks.
- We have to perform more bookkeeping, and the variable names are
inconsistent across the two data structures. This can lead to some
confusion.
- The lifetime of the block iterator is tied to the block anyway, but
we hide that a bit by only storing pointers pointing into the block.
There isn't really any good reason why we rip out parts of the block
instead of storing a pointer to the block itself.
Refactor the code to do so. Despite being simpler, it also allows us to
decouple the lifetime of the block iterator from seeking in a subsequent
commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the block iterators so that initialization and seeking are different from one another. This makes the iterator trivially reseekable by storing the pointer to the block at initialization time, which we can then reuse on every seek. This refactoring prepares the code for exposing a `reftable_iterator` interface for blocks in a subsequent commit. Callsites are adjusted accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Expose a generic iterator over reftable records and expose it via the public interface. Together with an upcoming iterator for reftable blocks contained in a table this will allow users to trivially iterate through blocks and their respective records individually. This functionality will be used to implement consistency checks for the reftable backend, which requires more fine-grained control over how we read data. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `reftable_table` interface is an internal implementation detail that callers have no access to. Having direct access to this structure is important though for a subsequent patch series that will implement consistency checks for the reftable backend. Move the structure into "reftable-table.h" so that it part of the public interface. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new iterator that allows the caller to iterate through all blocks contained in a table. This gives users more fine-grained control over how exactly those blocks are being read and exposes information to callers that was previously inaccessible. This iterator will be required by a future patch series that adds consistency checks for the reftable backend. In addition to that though we will also reimplement `reftable_table_print_blocks()` on top of this new iterator in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that reftable blocks can be read individually via the public interface it becomes necessary for callers to be able to distinguish the different types of blocks. Expose the relevant constants. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The logic to print individual blocks in a table is hosted in the reftable library. This is only the case due to historical reasons though because users of the library had no interfaces to read blocks one by one. Otherwise, printing individual blocks has no place in the reftable library given that the format will not be generic in the first place. We have now grown a public interface to iterate through blocks contained in a table, and thus we can finally move the logic to print them into the test helper. Move over the logic and refactor it accordingly. Note that the iterator also trivially allows us to access index sections, which we previously didn't print at all. This omission wasn't intentional though, so start dumping those sections as well so that we can assert that indices are written as expected. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/test-wo-perl-prereq: t5703: refactor test to not depend on Perl t5316: refactor `max_chain()` to not depend on Perl t0210: refactor trace2 scrubbing to not use Perl t0021: refactor `generate_random_characters()` to not depend on Perl t/lib-httpd: refactor "one-time-perl" CGI script to not depend on Perl t/lib-t6000: refactor `name_from_description()` to not depend on Perl t/lib-gpg: refactor `sanitize_pgp()` to not depend on Perl t: refactor tests depending on Perl for textconv scripts t: refactor tests depending on Perl to print data t: refactor tests depending on Perl substitution operator t: refactor tests depending on Perl transliteration operator Makefile: stop requiring Perl when running tests meson: stop requiring Perl when tests are enabled t: adapt existing PERL prerequisites t: introduce PERL_TEST_HELPERS prerequisite t: adapt `test_readlink()` to not use Perl t: adapt `test_copy_bytes()` to not use Perl t: adapt character translation helpers to not use Perl t: refactor environment sanitization to not use Perl t: skip chain lint when PERL_PATH is unset
While git-filter-branch(1) is written as a shell script, the
`--state-branch` feature depends on Perl to save and extract the object
ID mappings. This can lead to subtle breakage though:
- We execute `perl` directly without respecting the `PERL_PATH`
configured by the distribution. As such, it may happen that we use
the wrong version of Perl.
- We install the script unchanged even if Perl isn't available at all
on the system, so using `--state-branch` would lead to failure
altogether in that case.
Fix this by dropping Perl and instead implementing the feature with
shell scripting exclusively.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While git-request-pull(1) is written as a shell script, for it to function we depend on Perl being available. The script gets installed unconditionally though, regardless of whether or not Perl is even available on the system. When it's not available, the `@PERL_PATH@` variable may be substituted with a nonexistent executable path and thus cause the script to fail. Refactor the script so that it does not depend on Perl at all anymore. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "fix-texi.perl" script is used to fix up the output of `docbook2x-texi`: - It changes the filename to be "git.info". - It changes the directory category and entry. The script is written in Perl, but it can be rather trivially converted to a shell script. Do so to remove the dependency on Perl for building the user manual. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "cmd-list.perl" script is used to extract the list of commands part
of a specific category and extracts the description of each command from
its respective manpage. The generated output is then included in git(1)
to list all Git commands.
The script is written in Perl. Refactor it to use shell scripting
exclusively so that we can get rid of the mandatory dependency on Perl
to build our documentation.
The converted script is slower compared to its Perl implementation. But
by being careful and not spawning external commands in `format_one ()`
we can mitigate the performance hit to a reasonable level:
Benchmark 1: Perl
Time (mean ± σ): 10.3 ms ± 0.2 ms [User: 7.0 ms, System: 3.3 ms]
Range (min … max): 10.0 ms … 11.1 ms 200 runs
Benchmark 2: Shell
Time (mean ± σ): 74.4 ms ± 0.4 ms [User: 48.6 ms, System: 24.7 ms]
Range (min … max): 73.1 ms … 75.5 ms 200 runs
Summary
Perl ran
7.23 ± 0.13 times faster than Shell
While a sevenfold slowdown is significant, the benefit of not requiring
Perl for a fully-functioning Git installation outweighs waiting a couple
of milliseconds longer during the build process.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Above the declaration of git_work_tree_cfg, we have: /* This is set by setup_git_dir_gently() and/or git_default_config() */ char *git_work_tree_cfg; It can be verified that there is no function called 'setup_git_dir_gently' by running grep on the codebase: $ grep -R setup_git_dir_gently . ./environment.c:/* This is set by setup_git_dir_gently() and/or git_default_config() */ The comment, introduced in e90fdc3 (Clean up work-tree handling), is the only occurrence of the name 'setup_git_dir_gently'. It probably meant 'setup_git_directory_gently' as that is a name of a real function in setup.c. Correct it. Signed-off-by: Abhijeet Sonar <abhijeet.nkt@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
A common way to run Git's performance benchmarks on repositories other than Git's own repository (which is not exactly large when compared to actually large repositories) is to run them like this: GIT_PERF_LARGE_REPO=/path/to/my/large/repo \ ./p1234-*.sh -ivx Contrary to developers' common expectations, this failed to work when Git was built with a different `GIT_PERF_LARGE_REPO` value specified at build time: That build-time option would have been written to the `GIT-BUILD-OPTIONS` file, which in turn would have been sourced by `test-lib.sh`, which in turn would have been sourced by `perf-lib.sh`, which in turn would have been sourced by the perf test script, _overriding_ the environment variable specified in the way illustrated above. Since perf tests are not run as part of the build, this most likely unintended behavior was not caught and certainly not fixed, as the `GIT_PERF_*` values would have been empty at build-time. However, in 4638e88 (Makefile: use common template for GIT-BUILD-OPTIONS, 2024-12-06), a subtle change of behavior was introduced: Whereas before, a couple of build-time options (the `GIT_PERF_*` ones included) were written to `GIT-BUILD-OPTIONS` only when their values were non-empty. With this commit, they are also written when they are empty. The consequence is that above-mentioned way to run the perf tests will not only fail to pick up the desired `GIT_PERF_*` settings when they were specified differently while building Git, instead the desired settings will be only respected when specified _while building_ Git. Let's work around the original issue, i.e. let `GIT_PERF_*` environment variables override what is recorded in `GIT-BUILD-OPTIONS`. Note that this is just the tip of the iceberg, there are a couple of `GIT_TEST_*` options that may want a similar fix in `test-lib.sh`. Due to time constraints on my side, this here patch focuses exclusively on the `GIT_PERF_*` settings. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since we already teach the `repo_config()` in "f29f1990b5 (config: teach repo_config to allow `repo` to be NULL, 2025-03-08)" to allow `repo` to be NULL, no need to check if `repo` is NULL before calling `repo_config()`. Suggested-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since we already teach the `repo_config()` in "f29f1990b5 (config: teach repo_config to allow `repo` to be NULL, 2025-03-08)" to allow `repo` to be NULL, no need to check if `repo` is NULL before calling `repo_config()`. Suggested-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our tests for git-p4(1) depend on the p4d(1) and p4(1) executables to exist. As we require specific versions of those binaries which typically aren't available on common distributions, we install them manually via "ci/install-dependencies.sh". This script will put the binaries into "$CUSTOM_PATH", which gets defined by "ci/lib.sh" -- if not explicitly overridden, its value will be set to "$HOME/path". This causes issues though when running our tests as unprivileged user, as we do both in GitLab CI and GitHub Actions, because "$HOME" will be different when installing dependencies and when running the tests. Consequently, the downloaded binaries will not be found unless "$CUSTOM_PATH" is overridden to a common location. We already do this for GitLab CI, where it points to "/custom". Let's do the same for GitHub Actions so that Perforce-based tests are executed again. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The shell completion scripts in "contrib/completion" are being tested, but none of our build systems support installing them. This is somewhat confusing for Meson, where users can explicitly enable building these scripts via `-Dcontrib=completion`. This option only controlls whether the completions are built and tested against, where "building" is a bit of an euphemism for "copying them into the build directory". Teach both our Makefile and Meson to install our Bash completion script. For now, this is the only completion script that we're installing given that Bash completions "just work" with a canonical well-known location nowadays. Other completion scripts, like for example the one for zsh, don't have a well-known location and/or require extra steps by the user to make them available. As such, we skip installing these scripts for now, but we may do so in the future if we ever figure out a proper way to do this. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This perf script creates a midx by running "git multi-pack-index write" with the "--stdin-packs" option. We feed that stdin by running "find" on .git/objects/pack, using sed to strip off everything but the basename. But that sed invocation also does something peculiar: it adds a "+" to the start of each pack name. This causes the multi-pack-index command to barf. The modified name does not match any pack it knows about, so it ends up with an empty list of packs to put in the midx. And thus nothing matches the --preferred-pack option we pass, which causes it die(). The fix is to remove the extra "+" (which also lets us simplify the sed invocation a bit, as it is now just stripping the leading directories). But that leaves the mystery of why it was ever there in the first place. The answer is that an earlier iteration of the patch series had a concept of "disjoint" packs in the midx. And one of its patches here: https://lore.kernel.org/git/c52d7e7b27a9add4f58b8334db4fe4498af1c90f.1701198172.git.me@ttaylorr.com/ taught read_packs_from_stdin() to treat a leading "+" as marking a disjoint pack. But in the second version of the series, which was ultimately merged, that disjoint concept went away, and the code to parse "+" did likewise. The regular regression tests were adjusted to match, but this case in t/perf was forgotten. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the log_reencode field from struct rev-info, as it is not used. This field was introduced in 52883fb, but it hasn't been used since its introduction. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using the launchctl scheduler, the weekly job runs daily, and the daily job runs on the first six days of each month. This appears to be due to specifying "Day" in the calendar intervals, which according to launchd.plist(5) is for specifying days of the month rather than days of the week. The behaviour of running a job on the 0th day is undocumented, but in my testing appears to be the same as not specifying "Day" in the calendar interval, in which case the job will run daily. Use "Weekday" in the calendar intervals, which is the correct way to schedule jobs to run on specific days of the week. Signed-off-by: Josh Heinrichs <joshiheinrichs@gmail.com> Acked-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Developer support fix.. * js/git-perf-env-override: perf: do allow `GIT_PERF_*` to be overridden again
Since a call to repo_config() can be called with repo set to NULL these days, a command that is marked as RUN_SETUP in the builtin command table does not have to check repo with NULL before making the call. * ua/call-repo-config-with-possibly-null-repository: builtin/difftool: remove unnecessary if statement builtin/add: remove unnecessary if statement
Typofix. * as/typofix-in-env-h-header: environment: fix typo: 'setup_git_directory_gently'
Code clean-up. * az/tighten-string-array-constness: global: mark usage strings and string tables const
Fix for scheduled maintenance tasks on platforms using launchctl. * jh/gc-launchctl-schedule-fix: maintenance: fix launchctl calendar intervals
Overhaul of the reftable API. * ps/reftable-api-revamp: reftable/table: move printing logic into test helper reftable/constants: make block types part of the public interface reftable/table: introduce iterator for table blocks reftable/table: add `reftable_table` to the public interface reftable/block: expose a generic iterator over reftable records reftable/block: make block iterators reseekable reftable/block: store block pointer in the block iterator reftable/block: create public interface for reading blocks git-zlib: use `struct z_stream_s` instead of typedef reftable/block: rename `block_reader` to `reftable_block` reftable/block: rename `block` to `block_data` reftable/table: move reading block into block reader reftable/block: simplify how we track restart points reftable/blocksource: consolidate code into a single file reftable/reader: rename data structure to "table" reftable: fix formatting of the license header
Reduce requirement for Perl in our documentation build and a few scripts. * ps/fewer-perl: Documentation: stop depending on Perl to generate command list Documentation: stop depending on Perl to massage user manual request-pull: stop depending on Perl filter-branch: stop depending on Perl
Code clean-up. * lo/remove-log-reencode-from-rev-info: revision: remove log_reencode field from rev_info
A test fix. * jk/p5332-testfix: p5332: drop "+" from --stdin-packs input
Build update to install bash (but not zsh) completion script. * ps/install-bash-completion: contrib/completion: install Bash completion
CI fix. * ps/ci-resurrect-p4-on-github: ci: fix p4d executable not being found on GitHub Actions
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
Mode: paranoid | Total findings: 18 | Considered vulnerability: 18 Insecure Processing of Data (18)
More info on how to fix Insecure Processing of Data in C/C++. 👉 Go to the dashboard for detailed results. 📥 Happy? Share your feedback with us. |
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.1)
Can you help keep this open source service alive? 💖 Please sponsor : )