Codoki AI Benchmarks (2025)

This repository serves as the index for Codoki's reproducible AI code-review benchmarks.
We recreated real-world bugs across multiple open-source projects, opened fresh PRs, and measured how well Codoki catches them, alongside published results for other tools.

Methodology

Dataset.
50 real bugs across Sentry (Python), Grafana (Go), Cal.com (TypeScript), Keycloak (Java), and Discourse (Ruby).
Each bug maps to an upstream PR that fixed a production defect.

Procedure.
We recreated the original bug PRs and ran Codoki on the same diffs and repository context.
A bug counted as “caught” only when Codoki identified the fault in a line-level PR comment with actionable guidance.
A bug mentioned only in a PR summary did not count.
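
The "caught" rule above can be sketched as a small scoring function. This is an illustrative sketch only, not the scoring code used for the benchmark: the `ReviewComment` type, its field names, and the bug IDs are hypothetical, and the `identifies_fault` and `actionable` flags are assumed to be filled in by a human reviewer.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ReviewComment:
    """One comment left by a review tool on a recreated PR (hypothetical shape)."""
    kind: str               # "line" for a line-level PR comment, "summary" for a PR summary
    identifies_fault: bool  # human judgment: does the comment point at the recreated bug?
    actionable: bool        # human judgment: does it give guidance the author can act on?


def is_caught(comments: List[ReviewComment]) -> bool:
    """A bug is caught only by a line-level comment that names the fault
    and gives actionable guidance. Summary-only mentions never count."""
    return any(
        c.kind == "line" and c.identifies_fault and c.actionable
        for c in comments
    )


def catch_rate(results: Dict[str, List[ReviewComment]]) -> float:
    """Fraction of bugs caught, given a mapping of bug ID to the tool's comments."""
    if not results:
        return 0.0
    caught = sum(1 for comments in results.values() if is_caught(comments))
    return caught / len(results)


if __name__ == "__main__":
    # Made-up example data: one bug flagged inline, one mentioned only in the summary.
    example = {
        "bug-a": [ReviewComment("line", True, True)],
        "bug-b": [ReviewComment("summary", True, True)],
    }
    print(catch_rate(example))  # prints 0.5
```

Keeping the two judgment flags separate from the comment kind makes the rule easy to audit: a reviewer records what the comment said, and the function applies the counting rule the same way for every tool.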

Competitor scores.
Results for Greptile, Cursor, GitHub Copilot, CodeRabbit, and Graphite are reproduced from Greptile’s public benchmark dataset (accessed September 2025).
We did not re-run the competitor tools.

Dataset Index

Repository	Language	Dataset / Recreated PRs
Sentry	Python	sentry-codoki
Cal.com	TypeScript	calcom-codoki
Grafana	Go	grafana-codoki
Keycloak	Java	keycloak-codoki
Discourse	Ruby	discourse-codoki

Why This Matters

Engineering leaders need transparent, reproducible benchmarks to make informed adoption decisions.
This dataset helps teams compare tools on realistic production bugs — not trivial synthetic demos.

Contributing

Found a bug we should add to the dataset? Open an issue or PR.
Want to replicate this benchmark with your own tool? Fork this repo and run the same PR set.
