Improving CVISE for large projects #483

Open
RRr89 wants to merge 6 commits into marxin:master from RRr89:interface_changes

RRr89 commented Mar 19, 2026

This PR focuses on large projects:
(1) The default number of threads currently does not take the available disk space into consideration. This PR takes disk space as well as CPU count into account.
(2) Currently there is only a max_improvement. For large projects, however, one does not want to waste time on passes that make little progress. Therefore, this PR introduces a min_improvement.
(3) Currently, files are ordered by size. For large projects (>500 files), this always touches the same files and leaves the other files unaffected. This PR randomizes the file order.
(4) In large projects, not all files may be necessary for reproducing a bug. This PR introduces a clear pass that simply clears a file.
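
To illustrate point (3), here is a minimal sketch of shuffling the file list instead of sorting it by size. It is not the code from this PR; `order_test_cases` and its `seed` parameter are hypothetical names chosen for the example.

```python
import random


def order_test_cases(paths, seed=None):
    """Return the test-case paths in random order (hypothetical helper).

    A fixed seed makes the order reproducible, which can help when
    comparing two reduction runs.
    """
    shuffled = list(paths)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    return shuffled
```

With a size-based order, the same few files are always visited first; a shuffled order gives every file a chance to be reduced early.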

Some bugs were also fixed. In addition, Ctrl-C now prints statistics before exiting. The statistics include the improvement per run in bytes.

RRr89 added 6 commits June 3, 2025 16:20
Some of the command-line flags do not provide enough information, e.g.,
what is the default timeout?
Others don't work as expected, e.g., --list-passes fails when no
TEST_CASE is provided.

In addition to fixing those, this change tries to warn about likely
errors right at the start, such as:
* Not enough disk space available. The default parallel setting (`-n`/`--n`)
  does not take the disk space into account. For large input files of several
  GB, we may run out of disk before we run out of CPU. Therefore, this change
  provides a parallel setting calculation that takes disk space into account.
  If the user sets parallelism through the command line, the disk space
  requirement is checked. If that check fails, a warning is printed on the
  command line, but the execution continues.
* The interestingness test already exceeds the timeout. This change
  measures the duration of the initial interestingness test run. If that
  already exceeds the given timeout, a warning is issued, but the execution
  continues.
* Creating backups may fail. When *.orig files already exist, TEST_CASEs are
  not copied. This change prints a warning if this is the case.

All warnings described above may be switched off by setting `-w` (similar to
the gcc `-w` flag).
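
The disk-space check described above could look roughly like the following sketch. It assumes that every parallel job needs one full copy of the test case on disk; that assumption, and the names `default_parallelism` and `disk_warning`, are illustrative and not taken from the PR.

```python
import os
import shutil


def default_parallelism(test_case_bytes, work_dir=".", cpu_count=None, free_bytes=None):
    """Pick a job count limited by both CPU count and free disk space.

    Assumption: each job needs one copy of the test case on disk.
    """
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    free = free_bytes if free_bytes is not None else shutil.disk_usage(work_dir).free
    if test_case_bytes <= 0:
        return max(1, cpus)
    by_disk = free // test_case_bytes  # how many copies fit on disk
    return max(1, min(cpus, by_disk))


def disk_warning(jobs, test_case_bytes, free_bytes):
    """Return a warning string if a user-chosen job count may not fit on disk."""
    needed = jobs * test_case_bytes
    if needed > free_bytes:
        return f"warning: {jobs} jobs may need {needed} bytes, but only {free_bytes} are free"
    return None
```

The warning is returned rather than raised, matching the described behaviour that execution continues after a failed check.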
This pass clears a file and checks whether that file is required at all for reproducing the observed bug. This may be helpful when building libraries with hundreds of files.
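
A minimal sketch of what such a clear pass does, assuming the interestingness test is available as a callable. The function name `try_clear` and the `is_interesting` callback are hypothetical and do not reflect the pass interface used in cvise.

```python
from pathlib import Path


def try_clear(path, is_interesting):
    """Empty the file; keep it empty only if the bug still reproduces.

    `is_interesting` is a hypothetical zero-argument callable that runs
    the interestingness test and returns True when the bug is still observed.
    """
    path = Path(path)
    original = path.read_bytes()
    path.write_bytes(b"")  # clear the whole file
    if is_interesting():
        return True  # the file was not needed for the reproduction
    path.write_bytes(original)  # restore the file, it is required
    return False
```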
When clangbinarysearch fails due to a timeout, the current implementation simply retries. For large projects, this may take minutes to hours.
This change introduces a max timeout count (currently set to 20) and a current timeout count. Once clangbinarysearch has failed `max timeout count` times due to a timeout, it stops.
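
The timeout cap can be sketched as a simple counter. This is a simplified linear loop rather than the actual binary search; `first_passing` and the string results returned by `check` are assumptions made for the example, and only the limit of 20 comes from the description above.

```python
MAX_TIMEOUT_COUNT = 20  # value stated in the commit message


def first_passing(candidates, check, max_timeouts=MAX_TIMEOUT_COUNT):
    """Return the first candidate that passes, or None.

    `check(candidate)` is a hypothetical callable returning "ok", "fail",
    or "timeout". The loop gives up once `max_timeouts` timeouts occurred.
    """
    timeouts = 0  # the "current timeout count"
    for candidate in candidates:
        result = check(candidate)
        if result == "timeout":
            timeouts += 1
            if timeouts >= max_timeouts:
                return None  # stop instead of retrying indefinitely
        elif result == "ok":
            return candidate
    return None
```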
By throwing an exception on KeyboardInterrupt, cvise can still print statistics when the user decides to cancel the run.
When printing an overview at the end of the run, it makes sense to also include the progress per run (in bytes). That way, the user can explicitly exclude passes that take a lot of time but make little progress.
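
The following sketch shows how per-pass statistics with bytes per run could still be printed after Ctrl-C. `PassStatistics` and `reduce_with_report` are hypothetical names, and the report format is invented for illustration; it is not the output format of cvise.

```python
from collections import defaultdict


class PassStatistics:
    """Collect bytes removed and number of runs per pass (illustrative)."""

    def __init__(self):
        self.bytes_removed = defaultdict(int)
        self.runs = defaultdict(int)

    def record(self, pass_name, size_before, size_after):
        self.runs[pass_name] += 1
        self.bytes_removed[pass_name] += size_before - size_after

    def report(self):
        lines = []
        for name in sorted(self.runs):
            runs = self.runs[name]
            removed = self.bytes_removed[name]
            lines.append(
                f"{name}: {runs} runs, {removed} bytes removed, "
                f"{removed / runs:.1f} bytes/run"
            )
        return "\n".join(lines)


def reduce_with_report(run_reduction, stats, out=print):
    """Run the reduction and print statistics even when interrupted."""
    try:
        run_reduction(stats)
    except KeyboardInterrupt:
        out("Interrupted; statistics so far:")
    finally:
        out(stats.report())
```

A pass with many runs but a low bytes/run figure is a candidate for exclusion on the next invocation.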
Adding a parameter for min_improvement:
Currently, only max_improvement is supported. However, in large projects it may be undesirable to spend days on a pass that removes only a few bytes per run. With min_improvement, passes that make little progress are skipped sooner and passes that make more progress are executed sooner.
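
One possible reading of min_improvement, as a sketch: a pass is abandoned as soon as a single round removes fewer than `min_improvement` bytes. The exact semantics in the PR may differ; `run_pass` and the `step` callback are hypothetical.

```python
def run_pass(step, size, min_improvement=0):
    """Repeat a pass until one round gains less than `min_improvement` bytes.

    `step(size)` is a hypothetical callable returning the test-case size
    after one round of the pass.
    """
    while True:
        new_size = step(size)
        gained = size - new_size
        if gained <= 0:
            break  # no progress at all, the pass is done
        size = new_size  # accept the smaller test case
        if gained < min_improvement:
            break  # progress is too small, move on to the next pass
    return size
```

With `min_improvement=0` the loop runs until the pass makes no progress, which corresponds to the behaviour without the new parameter under this reading.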

Also, there seems to be a bug in how the improvement of a pass on a file is accounted for. I fixed that.
@emaxx-google
Collaborator

Hello, thank you for publishing this. This is a big PR that squashes together many functional changes and bugfixes. Please split it into separate PRs, one per logically self-contained chunk of changes, so that they can be reviewed individually and the project's commit history remains clear.

There are also merge conflicts - those will need to be fixed. Thanks.
