Introduce bfoptimize to optimize the generate bytecode using LLMs#489
Introduce bfoptimize to optimize the generate bytecode using LLMs#489qdeslandes wants to merge 7 commits intofacebook:mainfrom
bfoptimize to optimize the generate bytecode using LLMs#489Conversation
bfbencher copies the source directory into a temporary directory, to prevent modifying the existing Git tree. For worktrees, there is not full .git directory, but a link to it, so the main Git repository (from which the worktree has been created) will be modified nonetheless (modifying the actual worktree instead of the copy). Detect if the repository is a worktree, and detach it if so.
Restructure the CLI around subcommands to leave room for a future compare mode. All existing options and behaviour are unchanged; they now live under the history subcommand. Options that will be shared with future subcommands (host, cache, report paths, hardware isolation) are defined on a shared parent parser inherited by each subcommand.
Add a compare subcommand that benchmarks exactly two commits and reports the direct performance difference between them. Unlike the history subcommand which tracks trends over a sliding window, compare produces a side-by-side table showing base/ref absolute values and deltas for both runtime and instruction count. The benchmark loop and executor setup are factored into shared helpers (_create_executor, _benchmark_commits) used by both subcommands. An optional --json-output flag writes the results to a structured JSON file for consumption by external tools.
Add --fail-on-benchmark-error option to return non-zero from bfbencher if one of the benchmark run has failed.
Iteratively improves bpfilter's BPF bytecode generation using Claude. Each iteration, an Opus 4.6 planning call (with adaptive thinking) proposes one concrete optimization to src/bpfilter/cgen/. A Claude sub-agent then implements it directly in the repo, builds, runs the full test suite, and commits if tests pass. bfbencher compare validates the change; if the mean runtime delta across all benchmarks is negative the commit is kept and becomes the new baseline, otherwise it is reverted. Attempt history is persisted in .cache/bfoptimize/history.json across sessions so the LLM never retries a previously attempted optimization. Thinking blocks from each planning call are saved to .cache/bfoptimize/<id>-thinking.txt for post-hoc inspection.
Claude review of PR #489 (b58470e)Must fix
Suggestions
Nits
|
| global _process, _log_lines, _cache_dir, _sources_dir | ||
| if _process is not None and _process.returncode is None: | ||
| raise HTTPException(409, "Already running") | ||
| _cache_dir = pathlib.Path(body.cache_dir) |
There was a problem hiding this comment.
Claude: must-fix: _cache_dir is set directly from the user-controlled body.cache_dir POST field with no validation. The DELETE /history endpoint (line 176) then calls shutil.rmtree(p) for every entry in _cache_dir. Even though the server binds to 127.0.0.1, any local process (or browser tab with localhost access) can set cache_dir to an arbitrary path (e.g., /home/user) and then call DELETE /history to recursively delete its contents.
Consider validating that _cache_dir resolves under a known safe prefix, or use a fixed cache directory that ignores the request body.
| if delta is None or delta >= 0: | ||
| delta_str = f"{delta:+.1f}%" if delta is not None else "N/A" | ||
| console.log(f"[red]Rejected (bench): mean delta {delta_str}[/red]") | ||
| repo.git.reset("--hard", baseline_sha) |
There was a problem hiding this comment.
Claude: suggestion: repo.git.reset("--hard", baseline_sha) unconditionally discards all uncommitted changes. If a developer runs bfoptimize in a repository with staged or unstaged work, those changes are silently destroyed on the first rejected benchmark.
Consider adding a repo.is_dirty(untracked_files=True) check at the top of run_optimize() and aborting with a message asking the user to commit or stash first.
| prompt=prompt, | ||
| options=ClaudeAgentOptions( | ||
| cwd=str(abs_sources), | ||
| permission_mode="bypassPermissions", |
There was a problem hiding this comment.
Claude: suggestion: permission_mode="bypassPermissions" grants the Claude agent unrestricted Bash access. While the prompt constrains changes to CGEN_DIR, prompt-based constraints are not enforceable — a model error or prompt injection via crafted source could cause commands outside cgen/. Consider documenting this risk prominently (e.g., "this tool should only be run in a dedicated worktree or disposable environment"), or restricting allowed_tools to exclude Bash and rely on Read/Edit/Write only.
| @@ -32,7 +32,6 @@ include(GNUInstallDirs) | |||
| option(NO_DOCS "Disable documentation generation" 0) | |||
| option(NO_TESTS "Disable unit, end-to-end, and integration tests" 0) | |||
There was a problem hiding this comment.
Claude: suggestion: NO_TESTS now also disables benchmarks (since the benchmark CMake target moved under tests/), but the option description still reads "Disable unit, end-to-end, and integration tests". Users who previously used -DNO_BENCHMARKS independently need to know this. Update to something like "Disable unit, end-to-end, integration tests, and benchmarks".
|
|
||
| - ``-DNO_DOCS``: disable the documentation, including the coverage and benchmarks report. | ||
| - ``-DNO_TESTS``: disable unit tests, end-to-end tests, and integration tests. | ||
| - ``-DNO_TESTS``: disable all tests. |
There was a problem hiding this comment.
Claude: suggestion: The NO_TESTS description was simplified to "disable all tests" which, while technically correct, hides the fact that benchmarks (previously controlled by the now-removed NO_BENCHMARKS) are also disabled. Consider: -DNO_TESTS: disable all tests, including benchmarks." to help users migrating from the old flag.
| no_preempt: bool = False | ||
| cpu_pin: int | None = None | ||
| slice: str | None = None | ||
| model: str = "claude-opus-4-6" |
There was a problem hiding this comment.
Claude: suggestion: RunRequest.model and RunRequest.effort are plain str fields with no validation. While bfoptimize's argparse rejects invalid values, the web UI should validate at the API boundary to fail fast with a 422. Use Literal["claude-opus-4-6", "claude-sonnet-4-6"] for model and Literal["low", "medium", "high", "max"] for effort.
|
I'm testing a new version of Claude PR reviewer, it found some more stuff: pzmarzly/claude-pr-review#14 (comment)
|
Create a tool (
tools/bfoptimize) to optimize the BPF bytecode generation to produce faster BPF programs, using LLMs.bfoptimizewill loop for a fixed number of iterations, and will, for each iteration:If the benchmarks improved, the changes are committed; otherwise they are discarded. Then the whole process starts again.
tools/bfoptimize-webis a web interface to trackbfoptimizeprogress, it's not required to usebfoptimize.This PR also contains supporting changes:
historyandcomparemode for bfbenchertestsinstead oftools