Pull requests: sanity-labs/agent-e2e-evals

Pull requests list

feat: add Gemini agent evals
#42 opened Jun 2, 2026 by jonahsnider (Contributor)
fix: hide rules tools in MCP evals
#41 opened Jun 2, 2026 by jonahsnider (Contributor)
fix: increase timeout duration and trial count for all evals
#39 opened May 29, 2026 by jonahsnider (Contributor)
fix: mark runs that time out as failures
#38 opened May 29, 2026 by jonahsnider (Contributor)
fix: improve robustness of GROQ query assertions
#37 opened May 29, 2026 by jonahsnider (Contributor)
fix: reduce memory usage
#35 opened May 28, 2026 by jonahsnider (Contributor)