docs: refresh README benchmarks to the 0.5.0 release run #92

Merged
heyoub merged 1 commit into main from docs/refresh-bench-0.5.0
Jul 1, 2026

heyoub commented Jul 1, 2026

Collaborator

The README benchmark snapshot was stale (run 28413170218 / commit eef9b38, still version 0.4.1). Re-pinned to run 28506238242 — the truth-linux run on the actual v0.5.0 commit 7d457936 (passed, 15m42s).

Real 0.5.0 numbers now in the README: 7/7 hard gates green; stream parse+patch overhead moved from +0.67% → -1.54% (SSE overflow added no measurable cost).

Generated via refresh-bench-snapshot.ts + docs:gen — README bench block + benchmarks/readme-snapshot.json only.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation
    • Refreshed the benchmark snapshot in the README with the latest CI artifact reference, commit, date, and performance metrics.
    • Updated the benchmark snapshot data to reflect newer runtime, hard-gated pair, and diagnostic watch results.

Greptile Summary

This PR refreshes the README benchmark snapshot for the v0.5.0 CI run. The main changes are:

  • Updated the generated README benchmark block to CI run 28506238242.
  • Refreshed the hard-gated benchmark values and diagnostic watch numbers.
  • Updated benchmarks/readme-snapshot.json, which feeds the generated README block.

Confidence Score: 5/5

The change is limited to generated benchmark documentation and its snapshot data, with no runtime code modified.

The diff scope is narrow and matches the described documentation refresh.

T-Rex Logs

What T-Rex did

  • Ran the base benchmark snapshot extraction and captured the initial results, including the stale CI run, duration, and stream overhead values.
  • Ran the head snapshot extraction and performed docs generation with pnpm run docs:gen; JSON checks passed and docs were unchanged, but the README commit token extraction did not find 7d457936 (the README renders the 7-character form 7d45793).
  • Fetched GitHub API metadata for run 28506238242, confirming the full head SHA begins with 7d45793.

Ran code and verified through T-Rex

Comments Outside Diff (1)

  1. General comment

    P2 README benchmark block renders the refreshed commit one character shorter than the contract specifies

    • Bug
      • The requested head contract says the README and snapshot should contain commit 7d457936. The JSON snapshot has source.commit: "7d457936", and the GitHub API confirms run 28506238242 is for full SHA 7d4579366efceb256f8d3313d1edbb1d27aba372. However, README.md line 248 renders commit 7d45793, and the executed head extraction reported no README match for 7d457936. pnpm run docs:gen exits successfully and leaves the README unchanged, so this mismatch is in the generated committed documentation block rather than a local dirty state.
    • Cause
      • The README generation path appears to shorten the snapshot commit to 7 characters, while the refreshed snapshot/validation contract expects the 8-character commit token 7d457936.
    • Fix
      • Update the README generation logic or committed README benchmark block so the rendered commit matches the snapshot contract (7d457936), then rerun pnpm run docs:gen and commit the resulting README if changed.
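
The mismatch described above looks like a difference in abbreviation length rather than a different commit. A minimal sketch of that distinction follows. The `extractCommitToken` and `abbreviates` helpers and the README line wording are hypothetical and are not the repo's actual extraction logic; only the commit values come from this comment.

```typescript
// Pull the hex token that follows the word "commit" out of a line of README text.
// The line format is an assumption; the real README block may be worded differently.
function extractCommitToken(readmeText: string): string | null {
  const match = /commit\s+`?([0-9a-f]{7,40})`?/i.exec(readmeText);
  return match?.[1] ?? null;
}

// True when `token` is an abbreviation (prefix) of `fullSha`.
function abbreviates(token: string, fullSha: string): boolean {
  return fullSha.toLowerCase().startsWith(token.toLowerCase());
}

const fullSha = "7d4579366efceb256f8d3313d1edbb1d27aba372"; // full SHA quoted in the review
const snapshotCommit = "7d457936"; // source.commit in the JSON snapshot
const readmeLine = "CI run 28506238242, commit 7d45793"; // hypothetical README wording

const rendered = extractCommitToken(readmeLine);

// A strict string comparison fails because the lengths differ (7 vs 8 characters).
const exactMatch = rendered === snapshotCommit;

// Both tokens still abbreviate the same full SHA, so they name the same commit.
const sameCommit =
  rendered !== null &&
  abbreviates(rendered, fullSha) &&
  abbreviates(snapshotCommit, fullSha);

console.log({ rendered, exactMatch, sameCommit });
```

Under these assumptions a strict equality check reports a mismatch while a prefix check passes, so whichever comparison the validation contract uses determines whether the 7-character README token counts as a failure.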

    Ran code and verified through T-Rex

Reviews (1): Last reviewed commit: "docs: refresh README benchmarks to the 0..."

The README bench snapshot was pinned to run 28413170218 / commit eef9b38, which
was still at version 0.4.1 — stale numbers for a 0.5.0 release. Re-pinned to run
28506238242, the truth-linux run on the actual v0.5.0 commit 7d45793 (version
0.5.0, passed, 15m42s). Regenerated the README bench block via docs:gen.

Real 0.5.0 numbers: all 7 hard gates green; `stream` parse+patch overhead moved
from +0.67% to -1.54% (SSE overflow work added no measurable cost).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01KxU3Y8XueHqfteVGA4KdEh

coderabbitai Bot commented Jul 1, 2026

📝 Walkthrough

This PR refreshes the benchmark snapshot artifacts: README.md's gauntlet benchmark section is updated with new CI run reference, commit hash, date, and performance metrics, while benchmarks/readme-snapshot.json is updated with matching source provenance, duration, hard-gated pair medians, and diagnostic watch values.

Changes

Benchmark data refresh

| Layer / File(s) | Summary |
| --- | --- |
| Snapshot JSON data update: `benchmarks/readme-snapshot.json` | Updated source metadata (runId, commit, capturedAt), gauntlet.durationFormatted, hardGatedPairs median directive/baseline/overhead values, and diagnosticWatch fields for llm-runtime-steady. |
| README benchmark snapshot text update: `README.md` | Updated snapshot attribution line, refreshed gauntlet:full/bench:gate/package:smoke results and hard-gated pair table, and updated the Diagnostic watch paragraph. |
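
For orientation, the fields named in the summary above could be modelled roughly as follows. This is a sketch under stated assumptions, not the actual schema of `benchmarks/readme-snapshot.json`: only `source.runId`, `source.commit`, `source.capturedAt`, `gauntlet.durationFormatted`, `hardGatedPairs`, and `diagnosticWatch` are named in this PR, and every other field name, type, and nesting choice below is hypothetical.

```typescript
// Hypothetical shape of the README benchmark snapshot.
interface BenchSnapshot {
  source: {
    runId: number; // assumed numeric; the real file may store a string
    commit: string;
    capturedAt: string; // assumed to be a timestamp string
  };
  gauntlet: { durationFormatted: string };
  hardGatedPairs: Array<{
    name: string; // assumed field name
    medianDirectiveNs: number; // assumed field name
    medianBaselineNs: number; // assumed field name
    overheadPct: number; // assumed field name
  }>;
  diagnosticWatch: Record<string, unknown>; // assumed to be keyed by watch name
}

// Render the provenance portion of a snapshot as a single line.
function describeProvenance(s: BenchSnapshot): string {
  return `run ${s.source.runId} / commit ${s.source.commit} (${s.gauntlet.durationFormatted})`;
}

// Values quoted in this PR; capturedAt is a placeholder because the PR does not state it.
const example: BenchSnapshot = {
  source: { runId: 28506238242, commit: "7d457936", capturedAt: "unknown" },
  gauntlet: { durationFormatted: "15m42s" },
  hardGatedPairs: [
    {
      name: "llm-promoted-startup-shared",
      medianDirectiveNs: 244095,
      medianBaselineNs: 251687,
      overheadPct: 2.58,
    },
  ],
  diagnosticWatch: { "llm-runtime-steady": {} },
};

console.log(describeProvenance(example));
```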

Estimated code review effort: 1 (Trivial) | ~3 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: refreshing the README benchmark snapshot for the 0.5.0 release run. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 675da755bb


Comment thread README.md
| `llm` text chunk parse | 823,421ns | 775,031ns | 6.20% | 15% |
| `worker` fallback eval | 2,982ns | 2,832ns | 5.33% | 15% |
| `llm-startup-shared` | 76,604ns | 75,313ns | 1.96% | 25% |
| `llm-promoted-startup-shared` | 244,095ns | 251,687ns | 2.58% | 25% |

P2 Make the promoted-startup overhead internally consistent

For this refreshed snapshot, this row reports a 244,095ns median directive against a 251,687ns median baseline but still shows 2.58% overhead. The displayed medians imply roughly -3.02%, so the public release benchmark table now says this path is both faster and slower than baseline. If 2.58% is the intended median-of-per-replicate overhead, the row needs enough context, or values from the same aggregate, so operators do not read the 0.5.0 snapshot as a contradictory result.
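
The arithmetic behind the -3.02% figure, and one way a positive overhead can coexist with it, can be sketched as below. The first calculation uses the two medians from the row. The replicate data in the second part is invented purely for illustration and is not from the benchmark run; whether the README's 2.58% is actually a median of per-replicate overheads is the reviewer's hypothesis, not something this PR confirms.

```typescript
// Overhead of a directive measurement relative to its baseline, in percent.
function overheadPct(directiveNs: number, baselineNs: number): number {
  return ((directiveNs - baselineNs) / baselineNs) * 100;
}

// Median of a list of numbers (mean of the two middle values for even lengths).
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Medians taken from the `llm-promoted-startup-shared` row.
const fromMedians = overheadPct(244095, 251687);
console.log(fromMedians.toFixed(2)); // "-3.02"

// Hypothetical replicates (NOT real benchmark data) showing that the overhead of
// the medians and the median of per-replicate overheads can have opposite signs.
const replicates = [
  { directiveNs: 100, baselineNs: 110 },
  { directiveNs: 105, baselineNs: 100 },
  { directiveNs: 120, baselineNs: 115 },
];

const overheadOfMedians = overheadPct(
  median(replicates.map((r) => r.directiveNs)),
  median(replicates.map((r) => r.baselineNs)),
);
const medianOfOverheads = median(
  replicates.map((r) => overheadPct(r.directiveNs, r.baselineNs)),
);

console.log(overheadOfMedians < 0, medianOfOverheads > 0); // true true
```

If the table does mix the two aggregates, labelling the overhead column with the aggregate it uses would remove the apparent contradiction without changing any numbers.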

heyoub merged commit 75bf50b into main Jul 1, 2026
11 checks passed
heyoub deleted the docs/refresh-bench-0.5.0 branch July 1, 2026 12:06