Introduce `bfoptimize` to optimize the generate bytecode using LLMs by qdeslandes · Pull Request #489 · facebook/bpfilter

qdeslandes · 2026-03-25T18:57:04Z

Create a tool (tools/bfoptimize) to optimize the BPF bytecode generation to produce faster BPF programs, using LLMs. bfoptimize will loop for a fixed number of iterations, and will, for each iteration:

Use Claude to look for improvements in the bytecode generation logic
Implement an optimization
Validate the code using the tests
Run the benchmarks and compare the results with a baseline

If the benchmarks improved, the changes are committed; otherwise they are discarded. Then the whole process starts again.

tools/bfoptimize-web is a web interface to track bfoptimize progress, it's not required to use bfoptimize.

This PR also contains supporting changes:

Support Git worktrees in bfbencher
Benchmark flavor-specific prologue duration
New history and compare mode for bfbencher
Move benchmarks into tests instead of tools

bfbencher copies the source directory into a temporary directory, to prevent modifying the existing Git tree. For worktrees, there is not full .git directory, but a link to it, so the main Git repository (from which the worktree has been created) will be modified nonetheless (modifying the actual worktree instead of the copy). Detect if the repository is a worktree, and detach it if so.

Restructure the CLI around subcommands to leave room for a future compare mode. All existing options and behaviour are unchanged; they now live under the history subcommand. Options that will be shared with future subcommands (host, cache, report paths, hardware isolation) are defined on a shared parent parser inherited by each subcommand.

Add a compare subcommand that benchmarks exactly two commits and reports the direct performance difference between them. Unlike the history subcommand which tracks trends over a sliding window, compare produces a side-by-side table showing base/ref absolute values and deltas for both runtime and instruction count. The benchmark loop and executor setup are factored into shared helpers (_create_executor, _benchmark_commits) used by both subcommands. An optional --json-output flag writes the results to a structured JSON file for consumption by external tools.

Add --fail-on-benchmark-error option to return non-zero from bfbencher if one of the benchmark run has failed.

Iteratively improves bpfilter's BPF bytecode generation using Claude. Each iteration, an Opus 4.6 planning call (with adaptive thinking) proposes one concrete optimization to src/bpfilter/cgen/. A Claude sub-agent then implements it directly in the repo, builds, runs the full test suite, and commits if tests pass. bfbencher compare validates the change; if the mean runtime delta across all benchmarks is negative the commit is kept and becomes the new baseline, otherwise it is reverted. Attempt history is persisted in .cache/bfoptimize/history.json across sessions so the LLM never retries a previously attempted optimization. Thinking blocks from each planning call are saved to .cache/bfoptimize/<id>-thinking.txt for post-hoc inspection.

github-actions · 2026-03-25T18:57:15Z

Claude review of PR #489 (`b58470e`)

Must fix

Path traversal via cache_dir in bfoptimize-web — tools/bfoptimize-web:119 — User-controlled cache_dir from POST body flows unchecked into DELETE /history which calls shutil.rmtree on all entries, allowing deletion of arbitrary directories on localhost

Suggestions

reset --hard without dirty-tree guard — tools/bfoptimize:542 — repo.git.reset("--hard", baseline_sha) silently destroys uncommitted work; add is_dirty() check at startup
bypassPermissions in agent SDK call — tools/bfoptimize:259 — Unrestricted Bash access for the Claude agent; consider documenting that this tool should run in a disposable worktree, or restrict allowed_tools
NO_TESTS description is stale — CMakeLists.txt:33 — Now also controls benchmarks but description still says "unit, end-to-end, and integration tests"
NO_TESTS doc description should mention benchmarks — doc/developers/build.rst:98 — Changed to "disable all tests" but doesn't mention benchmarks are included
Validate model/effort in RunRequest — tools/bfoptimize-web:48 — Plain str fields with no validation; use Literal types

Nits

Commit scope mismatch — tools/bfoptimize — Commit 1b13b30 introduces both bfoptimize and bfoptimize-web but message only mentions the optimization loop

Workflow run

github-actions · 2026-03-25T19:08:49Z

tools/bfoptimize-web

+    global _process, _log_lines, _cache_dir, _sources_dir
+    if _process is not None and _process.returncode is None:
+        raise HTTPException(409, "Already running")
+    _cache_dir = pathlib.Path(body.cache_dir)


Claude: must-fix: _cache_dir is set directly from the user-controlled body.cache_dir POST field with no validation. The DELETE /history endpoint (line 176) then calls shutil.rmtree(p) for every entry in _cache_dir. Even though the server binds to 127.0.0.1, any local process (or browser tab with localhost access) can set cache_dir to an arbitrary path (e.g., /home/user) and then call DELETE /history to recursively delete its contents.

Consider validating that _cache_dir resolves under a known safe prefix, or use a fixed cache directory that ignores the request body.

github-actions · 2026-03-25T19:08:50Z

tools/bfoptimize

+        if delta is None or delta >= 0:
+            delta_str = f"{delta:+.1f}%" if delta is not None else "N/A"
+            console.log(f"[red]Rejected (bench): mean delta {delta_str}[/red]")
+            repo.git.reset("--hard", baseline_sha)


Claude: suggestion: repo.git.reset("--hard", baseline_sha) unconditionally discards all uncommitted changes. If a developer runs bfoptimize in a repository with staged or unstaged work, those changes are silently destroyed on the first rejected benchmark.

Consider adding a repo.is_dirty(untracked_files=True) check at the top of run_optimize() and aborting with a message asking the user to commit or stash first.

github-actions · 2026-03-25T19:08:51Z

tools/bfoptimize

+        prompt=prompt,
+        options=ClaudeAgentOptions(
+            cwd=str(abs_sources),
+            permission_mode="bypassPermissions",


Claude: suggestion: permission_mode="bypassPermissions" grants the Claude agent unrestricted Bash access. While the prompt constrains changes to CGEN_DIR, prompt-based constraints are not enforceable — a model error or prompt injection via crafted source could cause commands outside cgen/. Consider documenting this risk prominently (e.g., "this tool should only be run in a dedicated worktree or disposable environment"), or restricting allowed_tools to exclude Bash and rely on Read/Edit/Write only.

github-actions · 2026-03-25T19:08:52Z

CMakeLists.txt

@@ -32,7 +32,6 @@ include(GNUInstallDirs)
 option(NO_DOCS "Disable documentation generation" 0)
 option(NO_TESTS "Disable unit, end-to-end, and integration tests" 0)


Claude: suggestion: NO_TESTS now also disables benchmarks (since the benchmark CMake target moved under tests/), but the option description still reads "Disable unit, end-to-end, and integration tests". Users who previously used -DNO_BENCHMARKS independently need to know this. Update to something like "Disable unit, end-to-end, integration tests, and benchmarks".

github-actions · 2026-03-25T19:08:53Z

doc/developers/build.rst


 - ``-DNO_DOCS``: disable the documentation, including the coverage and benchmarks report.
- ``-DNO_TESTS``: disable unit tests, end-to-end tests, and integration tests.
+- ``-DNO_TESTS``: disable all tests.


Claude: suggestion: The NO_TESTS description was simplified to "disable all tests" which, while technically correct, hides the fact that benchmarks (previously controlled by the now-removed NO_BENCHMARKS) are also disabled. Consider: -DNO_TESTS: disable all tests, including benchmarks." to help users migrating from the old flag.

github-actions · 2026-03-25T19:08:53Z

tools/bfoptimize-web

+    no_preempt: bool = False
+    cpu_pin: int | None = None
+    slice: str | None = None
+    model: str = "claude-opus-4-6"


Claude: suggestion: RunRequest.model and RunRequest.effort are plain str fields with no validation. While bfoptimize's argparse rejects invalid values, the web UI should validate at the API boundary to fail fast with a 422. Use Literal["claude-opus-4-6", "claude-sonnet-4-6"] for model and Literal["low", "medium", "high", "max"] for effort.

pzmarzly · 2026-03-25T19:53:23Z

I'm testing a new version of Claude PR reviewer, it found some more stuff: pzmarzly/claude-pr-review#14 (comment)

Issue	New bot
`daemon: cgen:` commit prefix is stale — daemon was removed from project (`bfoptimize:249`)	must-fix (5/5 reviewers)
`-DNO_TESTS=1` contradicts `make unit e2e integration` — test targets won't exist (`bfoptimize:246`)	must-fix (5/5 reviewers)
`results.html.j2` deleted instead of moved — `bfbencher history --report-path` broken at runtime (`bfbencher:39`)	must-fix (4/5 reviewers)
`DELETE /history` crashes with `FileNotFoundError` if cache dir doesn't exist yet (`bfoptimize-web:180`)	suggestion
`--fail-on-benchmark-error` dead code for `history` subcommand (`bfbencher:1298`)	suggestion (3/5 reviewers)
CI build job may fail — `find_package(benchmark REQUIRED)` now runs without benchmark dep (`ci.yaml:85`)	suggestion
`git commit -am` stages all tracked files, not just `CGEN_DIR/` (`bfoptimize:249`)	suggestion
No subprocess output captured on benchmark failure (`bfoptimize:238`)	nit

qdeslandes added 7 commits March 25, 2026 18:02

tools: benchmarks: add option to fail bfbencher when a benchmark fails

21a9ef3

Add --fail-on-benchmark-error option to return non-zero from bfbencher if one of the benchmark run has failed.

tools: benchmarks: move benchmarks into tests/

8818223

tests: benchmarks: add per-hook prologue benchmarks

b58470e

qdeslandes requested a review from jordalgo as a code owner March 25, 2026 18:57

meta-cla bot added the cla signed label Mar 25, 2026

github-actions bot reviewed Mar 25, 2026

View reviewed changes

pzmarzly mentioned this pull request Mar 25, 2026

review: switch to team-of-5 independent reviewers pzmarzly/claude-pr-review#14

Merged

pzmarzly mentioned this pull request Mar 25, 2026

tools: claude: use new review prompt to catch more issues #490

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce `bfoptimize` to optimize the generate bytecode using LLMs#489

Introduce `bfoptimize` to optimize the generate bytecode using LLMs#489
qdeslandes wants to merge 7 commits intofacebook:mainfrom
qdeslandes:bfoptimize

qdeslandes commented Mar 25, 2026

Uh oh!

github-actions bot commented Mar 25, 2026 •

edited

Loading

Uh oh!

github-actions bot Mar 25, 2026

Uh oh!

github-actions bot Mar 25, 2026

Uh oh!

github-actions bot Mar 25, 2026

Uh oh!

github-actions bot Mar 25, 2026

Uh oh!

github-actions bot Mar 25, 2026

Uh oh!

github-actions bot Mar 25, 2026

Uh oh!

pzmarzly commented Mar 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		@@ -32,7 +32,6 @@ include(GNUInstallDirs)
		option(NO_DOCS "Disable documentation generation" 0)
		option(NO_TESTS "Disable unit, end-to-end, and integration tests" 0)

Conversation

qdeslandes commented Mar 25, 2026

Uh oh!

github-actions bot commented Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Claude review of PR #489 (b58470e)

Must fix

Suggestions

Nits

Uh oh!

github-actions bot Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

pzmarzly commented Mar 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions bot commented Mar 25, 2026 •

edited

Loading

Claude review of PR #489 (`b58470e`)