Improving CVISE for large projects #483
Open
RRr89 wants to merge 6 commits into
Some of the command-line flags do not provide enough information (e.g., what is the default timeout?). Others do not work as expected (e.g., `--list-passes` fails when no TEST_CASE is provided). In addition to fixing those, this change tries to hint at possible errors right at the beginning of a run, such as:

* Not enough disk space available. The default parallel setting (`-n`/`--n`) does not take disk space into account. For large input files of several GB, we may run out of disk before we run out of CPU. Therefore, this change provides a parallelism calculation that takes disk space into account. If the user sets parallelism on the command line, the disk-space requirement is checked; if that check fails, a warning is printed, but execution continues.
* The interestingness test already exceeds the timeout. This change measures the initial check of the interestingness test. If that check already exceeds the given timeout, a warning is issued, but execution continues.
* Creating backups may fail. When `*.orig` files already exist, TEST_CASEs are not copied. This change prints a warning if this is the case.

All warnings described above can be switched off with `-w` (similar to the gcc `-w` flag).
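A disk-aware parallelism limit can be sketched as follows. This is not the PR's code: the helper names `suggest_parallel_tests` and `check_user_parallelism`, and the `safety_factor` headroom, are hypothetical. The sketch assumes each parallel worker needs its own copy of all test cases in the temporary directory.

```python
import os
import shutil


def suggest_parallel_tests(test_case_paths, tmp_dir, safety_factor=2.0):
    """Return a worker count limited by both CPU count and free disk space.

    Hypothetical helper: assumes every parallel worker needs its own copy of
    all test cases in tmp_dir, plus `safety_factor` headroom for artifacts
    produced by the interestingness test.
    """
    cpu_limit = os.cpu_count() or 1
    bytes_per_worker = sum(os.path.getsize(p) for p in test_case_paths) * safety_factor
    if bytes_per_worker <= 0:
        return cpu_limit  # nothing to copy, so disk space is not the bottleneck
    free_bytes = shutil.disk_usage(tmp_dir).free
    disk_limit = int(free_bytes // bytes_per_worker)
    return max(1, min(cpu_limit, disk_limit))


def check_user_parallelism(n, test_case_paths, tmp_dir, warnings_enabled=True):
    """Warn, but do not abort, if a user-provided -n exceeds the disk-space limit."""
    limit = suggest_parallel_tests(test_case_paths, tmp_dir)
    if warnings_enabled and n > limit:
        print(f"WARNING: -n {n} may exhaust disk space in {tmp_dir}; "
              f"about {limit} parallel tests fit.")
    return n  # execution continues with the user's choice
```

Taking the minimum of the CPU and disk limits means the disk only matters once the test cases are large enough to be the tighter constraint.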
This pass clears a file and checks whether that file is required for reproducing the observed bug at all. This can be helpful when building libraries with hundreds of files.
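The idea behind such a clear pass can be sketched in a few lines. The function `try_clear_file` and the `.clear-backup` file name are hypothetical. As in C-Vise, the interestingness test is assumed to exit with status 0 when the test case is still interesting; here it is passed as an argv list.

```python
import shutil
import subprocess
from pathlib import Path


def try_clear_file(path, test_command, timeout=300):
    """Empty `path`; keep the change only if the test still reproduces the bug.

    Hypothetical sketch: `test_command` is the interestingness test as an argv
    list, expected to exit with 0 when the test case is interesting.
    Returns True if the file stays cleared.
    """
    path = Path(path)
    backup = path.with_name(path.name + ".clear-backup")  # hypothetical backup name
    shutil.copy2(path, backup)
    path.write_text("")  # clear the file
    try:
        result = subprocess.run(list(test_command), timeout=timeout)
        still_interesting = result.returncode == 0
    except subprocess.TimeoutExpired:
        still_interesting = False  # treat a timeout as "not interesting"
    if still_interesting:
        backup.unlink()  # the file is not needed to reproduce the bug
        return True
    backup.replace(path)  # the file is needed: restore its original contents
    return False
```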
When clangbinarysearch fails due to a timeout, the current implementation simply retries. For large projects, this may take minutes to hours. This change introduces a maximum timeout count (currently set to 20) and a current timeout count: once clangbinarysearch has failed `max timeout count` times due to a timeout, it stops.
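A timeout budget of this kind can be sketched as a small counter. `TimeoutBudget` and `run_with_retries` are illustrative names, not C-Vise's actual code; the default of 20 mirrors the value stated above, and `attempt()` is assumed to raise `TimeoutError` on a timeout.

```python
class TimeoutBudget:
    """Hypothetical helper: give up after too many timeouts."""

    def __init__(self, max_timeouts=20):
        self.max_timeouts = max_timeouts
        self.timeouts = 0  # current timeout count

    def record_timeout(self):
        """Count one timeout; return True if the caller should stop now."""
        self.timeouts += 1
        return self.timeouts >= self.max_timeouts


def run_with_retries(attempt, budget):
    """Call attempt() until it succeeds or the timeout budget is spent.

    attempt() returns a result, or raises TimeoutError on a timeout.
    Returns None if the budget runs out.
    """
    while True:
        try:
            return attempt()
        except TimeoutError:
            if budget.record_timeout():
                return None  # stop instead of retrying forever
```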
By throwing an exception on KeyboardInterrupt, cvise can still print statistics when the user decides to cancel the run.
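One common Python pattern for this behaviour is to catch `KeyboardInterrupt` (raised on Ctrl-C) around the main loop and print statistics in a `finally` block. The function `run_passes` and its `(name, callable)` pass list are hypothetical stand-ins for cvise's internals.

```python
def run_passes(passes, stats):
    """Run each (name, callable) pass; on Ctrl-C, stop early but still report statistics."""
    try:
        for name, pass_fn in passes:
            stats[name] = pass_fn()
    except KeyboardInterrupt:
        print("Interrupted by user; printing statistics collected so far.")
    finally:
        # Runs on normal completion and after Ctrl-C alike.
        for name, value in stats.items():
            print(f"{name}: {value}")
    return stats
```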
When printing an overview at the end of the run, it makes sense to also report the progress per run (in bytes), so that the user can explicitly exclude passes that take a lot of time but make little progress.
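A per-pass overview with a bytes-per-run column could look like the sketch below. `PassStats` and `format_overview` are hypothetical names and the column layout is illustrative. Sorting by bytes per run puts the least productive passes first, which makes candidates for exclusion easy to spot.

```python
from dataclasses import dataclass


@dataclass
class PassStats:
    """Hypothetical per-pass record for the end-of-run overview."""
    name: str
    runs: int = 0
    bytes_removed: int = 0
    seconds: float = 0.0

    @property
    def bytes_per_run(self):
        return self.bytes_removed / self.runs if self.runs else 0.0


def format_overview(stats):
    """Return one line per pass, least productive (fewest bytes per run) first."""
    lines = []
    for s in sorted(stats, key=lambda s: s.bytes_per_run):
        lines.append(f"{s.name:<24} runs={s.runs:<6} removed={s.bytes_removed:<10} "
                     f"bytes/run={s.bytes_per_run:.1f} time={s.seconds:.1f}s")
    return lines
```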
Adding a parameter for min_improvement: currently, only max_improvement is supported. However, in large projects it may be undesirable to spend days on a pass that removes only a few bytes per run. With min_improvement, passes that make little progress are skipped sooner, and passes that make larger progress are executed sooner. There also seems to be a bug in how the improvement of a pass on a file is accounted for; I fixed that.
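The exact semantics of min_improvement are not spelled out here, so the following is only one plausible reading: keep repeating a pass while each run shrinks the test case by at least `min_improvement` bytes, and move on once a run falls below that threshold. `run_pass_until_stalled` and the `run_once(size) -> new_size` interface are hypothetical.

```python
def run_pass_until_stalled(run_once, size, min_improvement):
    """Repeat a pass while each run removes at least min_improvement bytes.

    Hypothetical interface: run_once(size) returns the new size in bytes.
    Returns the final size and the list of per-run improvements.
    """
    if min_improvement < 1:
        raise ValueError("min_improvement must be at least 1 byte")
    improvements = []
    while True:
        new_size = run_once(size)
        improvement = max(0, size - new_size)  # bytes removed by this run
        improvements.append(improvement)
        size = min(size, new_size)  # accept any shrinking result
        if improvement < min_improvement:
            return size, improvements  # too little progress: skip to the next pass
```

Requiring `min_improvement >= 1` guarantees termination, since every continued run must remove at least one byte.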
Collaborator
Hello, thank you for publishing this. This is a big PR that squashes together many functional changes and bugfixes. Please split it into separate PRs, one per logically self-contained chunk of changes, so that they can be reviewed individually and the project's commit history remains clear. There are also merge conflicts - those will need to be fixed. Thanks.
This PR focuses on large projects:
(1) The default number of threads does not currently take the available disk space into account. This PR takes disk space as well as CPU count into account.
(2) Currently, there is only a max_improvement; however, for large projects, one does not want to waste time on passes that make little progress. Therefore, this PR introduces a min_improvement.
(3) Currently, files are ordered by size. However, for large projects (>500 files), this always touches the same files and leaves other files unaffected. This PR randomizes the file order.
(4) In large projects, not all files may be necessary for reproducing a bug. This PR introduces a clear pass that simply clears a file.
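Randomizing the file order, as in point (3), can be sketched with Python's `random.Random.shuffle`. The function `shuffled_order` and its `seed` parameter are hypothetical; a seed is one way to keep runs reproducible, but the PR only states that the order is randomized.

```python
import random


def shuffled_order(paths, seed=None):
    """Return a shuffled copy of `paths` so passes spread their work over all files.

    `seed` is a hypothetical option for reproducible runs; the input list is
    left unchanged.
    """
    order = list(paths)
    random.Random(seed).shuffle(order)
    return order
```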
Some bugs were fixed. Also, Ctrl-C now prints statistics, including the improvement per run in bytes, before exiting.