Skip to content

Introduce bfoptimize to optimize the generate bytecode using LLMs#489

Open
qdeslandes wants to merge 7 commits intofacebook:mainfrom
qdeslandes:bfoptimize
Open

Introduce bfoptimize to optimize the generate bytecode using LLMs#489
qdeslandes wants to merge 7 commits intofacebook:mainfrom
qdeslandes:bfoptimize

Conversation

@qdeslandes
Copy link
Copy Markdown
Contributor

Create a tool (tools/bfoptimize) to optimize the BPF bytecode generation to produce faster BPF programs, using LLMs. bfoptimize will loop for a fixed number of iterations, and will, for each iteration:

  • Use Claude to look for improvements in the bytecode generation logic
  • Implement an optimization
  • Validate the code using the tests
  • Run the benchmarks and compare the results with a baseline

If the benchmarks improved, the changes are committed; otherwise they are discarded. Then the whole process starts again.

tools/bfoptimize-web is a web interface to track bfoptimize progress, it's not required to use bfoptimize.

This PR also contains supporting changes:

  • Support Git worktrees in bfbencher
  • Benchmark flavor-specific prologue duration
  • New history and compare mode for bfbencher
  • Move benchmarks into tests instead of tools

bfbencher copies the source directory into a temporary directory, to
prevent modifying the existing Git tree. For worktrees, there is not
full .git directory, but a link to it, so the main Git repository (from
which the worktree has been created) will be modified nonetheless
(modifying the actual worktree instead of the copy).

Detect if the repository is a worktree, and detach it if so.
Restructure the CLI around subcommands to leave room for a future
compare mode. All existing options and behaviour are unchanged; they
now live under the history subcommand. Options that will be shared
with future subcommands (host, cache, report paths, hardware isolation)
are defined on a shared parent parser inherited by each subcommand.
Add a compare subcommand that benchmarks exactly two commits and
reports the direct performance difference between them. Unlike the
history subcommand which tracks trends over a sliding window, compare
produces a side-by-side table showing base/ref absolute values and
deltas for both runtime and instruction count.

The benchmark loop and executor setup are factored into shared helpers
(_create_executor, _benchmark_commits) used by both subcommands.
An optional --json-output flag writes the results to a structured JSON
file for consumption by external tools.
Add --fail-on-benchmark-error option to return non-zero from bfbencher
if one of the benchmark run has failed.
Iteratively improves bpfilter's BPF bytecode generation using Claude.
Each iteration, an Opus 4.6 planning call (with adaptive thinking) proposes
one concrete optimization to src/bpfilter/cgen/. A Claude sub-agent then
implements it directly in the repo, builds, runs the full test suite, and
commits if tests pass. bfbencher compare validates the change; if the mean
runtime delta across all benchmarks is negative the commit is kept and
becomes the new baseline, otherwise it is reverted.

Attempt history is persisted in .cache/bfoptimize/history.json across
sessions so the LLM never retries a previously attempted optimization.
Thinking blocks from each planning call are saved to
.cache/bfoptimize/<id>-thinking.txt for post-hoc inspection.
@qdeslandes qdeslandes requested a review from jordalgo as a code owner March 25, 2026 18:57
@meta-cla meta-cla bot added the cla signed label Mar 25, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 25, 2026

Claude review of PR #489 (b58470e)

Must fix

  • Path traversal via cache_dir in bfoptimize-webtools/bfoptimize-web:119 — User-controlled cache_dir from POST body flows unchecked into DELETE /history which calls shutil.rmtree on all entries, allowing deletion of arbitrary directories on localhost

Suggestions

  • reset --hard without dirty-tree guardtools/bfoptimize:542repo.git.reset("--hard", baseline_sha) silently destroys uncommitted work; add is_dirty() check at startup
  • bypassPermissions in agent SDK calltools/bfoptimize:259 — Unrestricted Bash access for the Claude agent; consider documenting that this tool should run in a disposable worktree, or restrict allowed_tools
  • NO_TESTS description is staleCMakeLists.txt:33 — Now also controls benchmarks but description still says "unit, end-to-end, and integration tests"
  • NO_TESTS doc description should mention benchmarksdoc/developers/build.rst:98 — Changed to "disable all tests" but doesn't mention benchmarks are included
  • Validate model/effort in RunRequesttools/bfoptimize-web:48 — Plain str fields with no validation; use Literal types

Nits

  • Commit scope mismatchtools/bfoptimize — Commit 1b13b30 introduces both bfoptimize and bfoptimize-web but message only mentions the optimization loop

Workflow run

global _process, _log_lines, _cache_dir, _sources_dir
if _process is not None and _process.returncode is None:
raise HTTPException(409, "Already running")
_cache_dir = pathlib.Path(body.cache_dir)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: must-fix: _cache_dir is set directly from the user-controlled body.cache_dir POST field with no validation. The DELETE /history endpoint (line 176) then calls shutil.rmtree(p) for every entry in _cache_dir. Even though the server binds to 127.0.0.1, any local process (or browser tab with localhost access) can set cache_dir to an arbitrary path (e.g., /home/user) and then call DELETE /history to recursively delete its contents.

Consider validating that _cache_dir resolves under a known safe prefix, or use a fixed cache directory that ignores the request body.

if delta is None or delta >= 0:
delta_str = f"{delta:+.1f}%" if delta is not None else "N/A"
console.log(f"[red]Rejected (bench): mean delta {delta_str}[/red]")
repo.git.reset("--hard", baseline_sha)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: repo.git.reset("--hard", baseline_sha) unconditionally discards all uncommitted changes. If a developer runs bfoptimize in a repository with staged or unstaged work, those changes are silently destroyed on the first rejected benchmark.

Consider adding a repo.is_dirty(untracked_files=True) check at the top of run_optimize() and aborting with a message asking the user to commit or stash first.

prompt=prompt,
options=ClaudeAgentOptions(
cwd=str(abs_sources),
permission_mode="bypassPermissions",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: permission_mode="bypassPermissions" grants the Claude agent unrestricted Bash access. While the prompt constrains changes to CGEN_DIR, prompt-based constraints are not enforceable — a model error or prompt injection via crafted source could cause commands outside cgen/. Consider documenting this risk prominently (e.g., "this tool should only be run in a dedicated worktree or disposable environment"), or restricting allowed_tools to exclude Bash and rely on Read/Edit/Write only.

@@ -32,7 +32,6 @@ include(GNUInstallDirs)
option(NO_DOCS "Disable documentation generation" 0)
option(NO_TESTS "Disable unit, end-to-end, and integration tests" 0)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: NO_TESTS now also disables benchmarks (since the benchmark CMake target moved under tests/), but the option description still reads "Disable unit, end-to-end, and integration tests". Users who previously used -DNO_BENCHMARKS independently need to know this. Update to something like "Disable unit, end-to-end, integration tests, and benchmarks".


- ``-DNO_DOCS``: disable the documentation, including the coverage and benchmarks report.
- ``-DNO_TESTS``: disable unit tests, end-to-end tests, and integration tests.
- ``-DNO_TESTS``: disable all tests.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: The NO_TESTS description was simplified to "disable all tests" which, while technically correct, hides the fact that benchmarks (previously controlled by the now-removed NO_BENCHMARKS) are also disabled. Consider: -DNO_TESTS: disable all tests, including benchmarks." to help users migrating from the old flag.

no_preempt: bool = False
cpu_pin: int | None = None
slice: str | None = None
model: str = "claude-opus-4-6"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: RunRequest.model and RunRequest.effort are plain str fields with no validation. While bfoptimize's argparse rejects invalid values, the web UI should validate at the API boundary to fail fast with a 422. Use Literal["claude-opus-4-6", "claude-sonnet-4-6"] for model and Literal["low", "medium", "high", "max"] for effort.

@pzmarzly
Copy link
Copy Markdown
Contributor

I'm testing a new version of Claude PR reviewer, it found some more stuff: pzmarzly/claude-pr-review#14 (comment)

Issue New bot
daemon: cgen: commit prefix is stale — daemon was removed from project (bfoptimize:249) must-fix (5/5 reviewers)
-DNO_TESTS=1 contradicts make unit e2e integration — test targets won't exist (bfoptimize:246) must-fix (5/5 reviewers)
results.html.j2 deleted instead of movedbfbencher history --report-path broken at runtime (bfbencher:39) must-fix (4/5 reviewers)
DELETE /history crashes with FileNotFoundError if cache dir doesn't exist yet (bfoptimize-web:180) suggestion
--fail-on-benchmark-error dead code for history subcommand (bfbencher:1298) suggestion (3/5 reviewers)
CI build job may fail — find_package(benchmark REQUIRED) now runs without benchmark dep (ci.yaml:85) suggestion
git commit -am stages all tracked files, not just CGEN_DIR/ (bfoptimize:249) suggestion
No subprocess output captured on benchmark failure (bfoptimize:238) nit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants