[WIP] Get autoevals to work with trace scoring #173

Closed
Alex Z (CLowbrow) wants to merge 2 commits into main from alex/autoeval-thread

@CLowbrow
Contributor

No description provided.

@github-actions
github-actions bot commented Feb 18, 2026

Braintrust eval report

Autoevals (alex/autoeval-thread-1771457276)

| Score | Average | Improvements | Regressions |
| --- | --- | --- | --- |
| NumericDiff | 74.7% (+1pp) | 5 🟢 | 7 🔴 |
| Time_to_first_token | 2.52tok (+1.09tok) | - | 119 🔴 |
| Llm_calls | 1.55 (+0) | - | - |
| Tool_calls | 0 (+0) | - | - |
| Errors | 0 (+0) | - | - |
| Llm_errors | 0 (+0) | - | - |
| Tool_errors | 0 (+0) | - | - |
| Prompt_tokens | 279.25tok (+0tok) | - | - |
| Prompt_cached_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_tokens | 0tok (+0tok) | - | - |
| Completion_tokens | 18.48tok (-0.12tok) | 16 🟢 | 16 🔴 |
| Completion_reasoning_tokens | 0tok (+0tok) | - | - |
| Total_tokens | 297.73tok (-0.12tok) | 16 🟢 | 16 🔴 |
| Estimated_cost | 0$ (+0$) | - | 119 🔴 |
| Duration | 3.62s (+0.12s) | 60 🟢 | 159 🔴 |
| Llm_duration | 3.87s (+0.97s) | 5 🟢 | 114 🔴 |

Alex Z (CLowbrow) marked this pull request as draft on February 18, 2026 at 23:28