
publish: add leaderboard.json from T2 benchmark run #7

Merged
Bender1011001 merged 2 commits into main from publish/leaderboard-T2 on Feb 18, 2026

Bender1011001 (Owner)

Summary

  • Adds docs/api/leaderboard.json and docs/api/top10.json so the live site at fbabench.com displays the T2 benchmark results
  • These files were gitignored but never force-added after the benchmark completed, so the frontend showed "No data yet."
  • Built from public_results/agentic/openrouter_tier_runs/t2/summary.json (run run-20260210-085941)
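The script that turns summary.json into the published files is not part of this PR. The following is a minimal Python sketch of what such a build step might look like. It assumes a hypothetical summary.json schema (a `results` list whose entries carry `model`, `net_profit`, `roi_pct`, and `status` fields) and a hypothetical output shape (`run_id`, `last_updated`, `rows`); the real files may use different field names. Completed runs are ranked by net profit, failed runs are appended after them, and top10.json holds the first ten rows.

```python
import json
from pathlib import Path


def build_leaderboard(summary: dict, last_updated: str) -> dict:
    """Rank models by net profit, listing failed runs after every completed run.

    Assumes a HYPOTHETICAL summary schema:
    {"run_id": str,
     "results": [{"model": str, "net_profit": float or None,
                  "roi_pct": float or None, "status": "ok" or "failed"}]}
    """
    completed = [r for r in summary["results"] if r["status"] != "failed"]
    failed = [r for r in summary["results"] if r["status"] == "failed"]
    # Highest net profit first; failed runs keep their input order.
    completed.sort(key=lambda r: r["net_profit"], reverse=True)

    rows = [
        {
            "rank": rank,
            "model": r["model"],
            "net_profit": r.get("net_profit"),
            "roi_pct": r.get("roi_pct"),
            "status": r["status"],
        }
        for rank, r in enumerate(completed + failed, start=1)
    ]
    return {"run_id": summary["run_id"], "last_updated": last_updated, "rows": rows}


def write_leaderboard_files(summary_path: Path, out_dir: Path, last_updated: str) -> None:
    """Write leaderboard.json (all rows) and top10.json (first ten rows) into out_dir."""
    board = build_leaderboard(json.loads(summary_path.read_text()), last_updated)
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "leaderboard.json").write_text(json.dumps(board, indent=2))
    top10 = {**board, "rows": board["rows"][:10]}
    (out_dir / "top10.json").write_text(json.dumps(top10, indent=2))
```

Under these assumptions, `write_leaderboard_files(Path("public_results/agentic/openrouter_tier_runs/t2/summary.json"), Path("docs/api"), "2026-02-18")` would regenerate both files at the paths named above.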

Rankings (agentic / net profit):

  1. anthropic/claude-opus-4.6 → +$12,488 (ROI 124.9%) 🥇
  2. google/gemini-3-pro-preview → +$8,512 (ROI 85.1%) 🥈
  3. x-ai/grok-4.1-fast → -$4,874 (ROI -48.7%) 🥉
  4. meta-llama/llama-3.3-70b → -$9,576 (ROI -95.8%)
  5. deepseek/deepseek-r1 → failed
  6. openai/gpt-5.2 → failed
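ROI here appears to be net profit divided by starting capital. The PR does not state the starting capital, but every reported pair is consistent with $10,000 (for example, 12,488 / 10,000 = 124.88%, which rounds to 124.9%). The sketch below checks the four completed rows against that inferred value; the $10,000 figure is an assumption, not something this PR documents.

```python
# ASSUMPTION: starting capital inferred from the reported numbers, not stated in the PR.
ASSUMED_STARTING_CAPITAL = 10_000.0

# (net profit in USD, reported ROI in percent) for the completed T2 runs.
REPORTED = {
    "anthropic/claude-opus-4.6": (12_488, 124.9),
    "google/gemini-3-pro-preview": (8_512, 85.1),
    "x-ai/grok-4.1-fast": (-4_874, -48.7),
    "meta-llama/llama-3.3-70b": (-9_576, -95.8),
}


def roi_pct(net_profit: float, starting_capital: float) -> float:
    """ROI as a percentage of starting capital, rounded to one decimal place."""
    return round(100.0 * net_profit / starting_capital, 1)


def roi_mismatches(reported: dict, starting_capital: float) -> list:
    """Return the models whose reported ROI differs from net_profit / starting_capital."""
    return [
        model
        for model, (profit, roi) in reported.items()
        if roi_pct(profit, starting_capital) != roi
    ]


if __name__ == "__main__":
    print(roi_mismatches(REPORTED, ASSUMED_STARTING_CAPITAL))  # prints []
```

An empty result means all four rows agree with the inferred capital; the two failed runs have no profit figure and are left out.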

Test plan

  • Merge to main
  • Visit https://fbabench.com and confirm the table shows 6 model rows
  • Rank 1 should be claude-opus-4.6 with ~+$12,489 profit
  • "Last updated" card should show February 18, 2026
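The data-level parts of this test plan could also be checked against the JSON before visiting the site. Below is a minimal sketch that assumes leaderboard.json exposes a `rows` list with `rank` and `model` fields and a `last_updated` ISO date string; these field names are hypothetical and may need adjusting to the real schema.

```python
import json
from pathlib import Path


def check_leaderboard(board: dict, expected_rows: int = 6,
                      expected_leader: str = "anthropic/claude-opus-4.6",
                      expected_date: str = "2026-02-18") -> list:
    """Return human-readable problems; an empty list means every check passed.

    Field names ("rows", "rank", "model", "last_updated") are HYPOTHETICAL.
    """
    problems = []
    rows = board.get("rows", [])
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(rows)}")
    # Find the row ranked first, if any.
    leader = next((r for r in rows if r.get("rank") == 1), None)
    leader_model = leader.get("model") if leader else None
    if leader_model != expected_leader:
        problems.append(f"rank 1 is {leader_model!r}, expected {expected_leader!r}")
    if board.get("last_updated") != expected_date:
        problems.append(f"last_updated is {board.get('last_updated')!r}, expected {expected_date!r}")
    return problems


def check_file(path: Path) -> list:
    """Load a leaderboard JSON file from disk and run check_leaderboard on it."""
    return check_leaderboard(json.loads(path.read_text()))
```

For example, `check_file(Path("docs/api/leaderboard.json"))` returns `[]` when the file matches the plan. This only validates the data; the rendered table on fbabench.com still needs the visual check.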

🤖 Generated with Claude Code

Generates docs/api/leaderboard.json and top10.json so the live site
at fbabench.com displays the benchmark results. Previously these files
were missing (gitignored but never force-added), causing the frontend
to show "No data yet."

Rankings (agentic / net profit, T2):
1. anthropic/claude-opus-4.6   +$12,488 (ROI 124.9%)
2. google/gemini-3-pro-preview +$8,512  (ROI  85.1%)
3. x-ai/grok-4.1-fast          -$4,874  (ROI -48.7%)
4. meta-llama/llama-3.3-70b    -$9,576  (ROI -95.8%)
5. deepseek/deepseek-r1        failed
6. openai/gpt-5.2              failed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cloudflare-workers-and-pages Bot commented Feb 18, 2026

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Name: fba
Latest commit: c0f33c8
Status: ❌ Deployment failed (View logs)
Updated (UTC): Feb 18 2026, 06:56 AM

cloudflare-workers-and-pages Bot commented Feb 18, 2026

Deploying fba-bench-enterprise with Cloudflare Pages

Latest commit: c0f33c8
Status: ✅ Deploy successful!
Preview URL: https://28a8f9fc.fba-bench-enterprise.pages.dev
Branch Preview URL: https://publish-leaderboard-t2.fba-bench-enterprise.pages.dev

View logs

Removes docs/api/*.json and the explicit leaderboard.json/top10.json
entries from .gitignore so the published leaderboard data can be
committed without force-adding. Runtime-only files (live.json,
sim_theater_live.json) remain ignored.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
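The effect of this .gitignore change can be illustrated with a simplified matcher. This is a sketch only: it handles `*` globs and the gitignore rule that a slash at the start or in the middle of a pattern anchors it to the repository root, but it does not implement negation (`!`), `**`, or directory-only patterns. The BEFORE and AFTER pattern lists are hypothetical reconstructions from the commit message, not the repository's actual .gitignore. For an authoritative answer, `git check-ignore -v <path>` reports which rule, if any, ignores a path.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# HYPOTHETICAL pattern lists reconstructed from the commit message.
BEFORE = ["docs/api/*.json", "leaderboard.json", "top10.json",
          "live.json", "sim_theater_live.json"]
AFTER = ["live.json", "sim_theater_live.json"]


def is_ignored(path: str, patterns: list) -> bool:
    """Simplified .gitignore check (no '!', no '**', no directory-only rules)."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        # A slash at the start or in the middle anchors the pattern to the repo root.
        anchored = "/" in pattern.rstrip("/")
        pat_parts = PurePosixPath(pattern.strip("/")).parts
        if anchored:
            # Anchored: every path component must match, and '*' never crosses '/'.
            if len(pat_parts) == len(parts) and all(
                fnmatchcase(name, glob) for name, glob in zip(parts, pat_parts)
            ):
                return True
        elif fnmatchcase(parts[-1], pat_parts[-1]):
            # Unanchored: match the file name at any depth.
            return True
    return False
```

Under the assumed AFTER list, docs/api/leaderboard.json and docs/api/top10.json are no longer ignored, so a plain `git add` picks them up instead of requiring `git add --force`, while live.json and sim_theater_live.json stay ignored wherever they appear.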
Bender1011001 merged commit 4e0217f into main on Feb 18, 2026
12 of 23 checks passed
Bender1011001 added a commit that referenced this pull request Feb 26, 2026
publish: add leaderboard.json from T2 benchmark run

Labels

None yet

Projects

None yet

1 participant