Collaborator

@cemde commented Dec 22, 2025

Description

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Code quality improvement (refactoring, formatting, etc.)

Checklist

Contribution

Documentation

  • Added/updated docstrings for new/modified functions as instructed in CONTRIBUTING.md
  • Updated relevant documentation in docs/ (if applicable)
  • Tagged the GitHub issue with this PR (if applicable)

Changelog

  • Added entry to CHANGELOG.md under [Unreleased] section
    • Use Added section for new features
    • Use Changed section for modifications to existing functionality
    • Use Fixed section for bug fixes
    • Use Removed section for deprecated/removed features
  • OR this is a documentation-only change (no changelog needed)

Example:
- Support for multi-agent tracing (PR:#123)

Architecture (if applicable)

  • Core/Interface separation: Changes in maseval/core/ do NOT import from maseval/interface/
  • Dependencies: New core dependencies added sparingly; framework integrations go to optional dependencies
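The core/interface separation rule above could be checked mechanically. The following is a minimal sketch, not part of the repository: the helper name `forbidden_imports` is hypothetical, and it only inspects absolute imports in a source string, so relative imports such as `from ..interface import x` are not detected.

```python
import ast

# Package that code under maseval/core/ must not import (taken from the checklist).
FORBIDDEN_PREFIX = "maseval.interface"


def forbidden_imports(source: str, forbidden: str = FORBIDDEN_PREFIX) -> list[str]:
    """Return the sorted forbidden module names that `source` imports absolutely."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a.b.c` -> candidate "a.b.c"
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # `from a import b` -> candidates "a" and "a.b"
            names = [node.module]
            names += [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            # Not an import, or a relative import (out of scope for this sketch).
            continue
        for name in names:
            if name == forbidden or name.startswith(forbidden + "."):
                found.add(name)
    return sorted(found)
```

One way to use it would be to read each `.py` file under maseval/core/ and fail a test when the returned list is non-empty.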

Additional Notes

github-actions bot commented Dec 22, 2025

Coverage report

File | Statements | Missing | Coverage | Coverage (new stmts) | Lines missing
  maseval/benchmark/macs
  data_loader.py 255, 297, 548-569
  macs.py
  maseval/benchmark/tau2
  data_loader.py 84-85, 100-104, 141, 150, 163, 168, 252, 297, 344, 349, 427, 481
  environment.py 85, 132, 140, 182-191, 260-262, 301, 318-322
  evaluator.py 221, 234-236, 249-250, 273, 284-285, 315, 323, 328, 335-337, 353, 412, 421, 533, 619, 646
  tau2.py 321, 324, 552, 708, 728, 730, 732, 738, 812-815
  utils.py 169-170, 174-179, 204-209
  maseval/benchmark/tau2/domains
  base.py 74, 82, 162-170, 271-275, 287, 328, 333-342
  maseval/benchmark/tau2/domains/airline
  db.py
  models.py
  tools.py 61, 69, 77, 91-94, 108, 112, 138, 150-162, 179, 182, 184, 330-339, 399, 440, 444, 475, 480, 495, 560, 662, 666
  maseval/benchmark/tau2/domains/retail
  models.py
  tools.py 65, 86, 104, 124, 217, 243, 311, 367-368, 416, 426, 430, 442, 542, 551, 555, 566, 577-578, 584, 626, 632-665, 746, 752
  maseval/benchmark/tau2/domains/telecom
  db.py
  models.py
  tools.py 74, 83, 87, 92, 96, 101, 110, 126, 147, 160-180, 229, 255, 277, 501, 563-616, 634-640
  user_models.py 255-257
  user_tools.py 46, 52, 82, 84, 86, 90-93, 120-121, 125, 152, 154, 156, 160, 170-177, 232, 270-272, 333, 427-428, 511, 586, 603, 631, 663-664, 680-688, 702-703, 708-713, 732-740, 752-753, 765-766, 778-779
  maseval/core
  benchmark.py
  simulator.py
  task.py
  user.py 447, 450-455, 465, 539-544
  maseval/core/callbacks
  result_logger.py
  maseval/interface/inference
  anthropic.py
  google_genai.py 178-188, 193
Project Total  

The report is truncated to 25 files out of 31. To see the full report, please visit the workflow summary page.

This report was generated by python-coverage-comment-action

Collaborator Author

@cemde left a comment

ok

@cemde merged commit 85547ae into main Dec 30, 2025
20 checks passed

2 participants