Releases: parameterlab/MASEval
v0.2.0
[0.2.0] - 2025-12-05
Added
Exceptions and Error Classification
- Added `AgentError`, `EnvironmentError`, `UserError` exception hierarchy in `maseval.core.exceptions` for classifying execution failures by responsibility (PR: #13)
- Added `TaskExecutionStatus.AGENT_ERROR`, `ENVIRONMENT_ERROR`, `USER_ERROR`, `UNKNOWN_EXECUTION_ERROR` for fine-grained error classification enabling fair scoring (PR: #13)
- Added validation helpers for tool implementers: `validate_argument_type()`, `validate_required_arguments()`, `validate_no_extra_arguments()`, `validate_arguments_from_schema()` (PR: #13)
- Added `ToolSimulatorError` and `UserSimulatorError` exception subclasses that carry simulator-specific context while inheriting the proper classification (PR: #13)
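The entries above only name the validation helpers. To illustrate the kind of checks such helpers perform, here is a minimal, self-contained sketch. The function bodies, the parameter lists, the `ToolArgumentError` class, and the `{"name": {"type": ..., "required": ...}}` schema format are all hypothetical and are not MASEval's actual signatures.

```python
from typing import Any, Dict, Iterable


class ToolArgumentError(ValueError):
    """Hypothetical error raised when a tool receives invalid arguments."""


def validate_required_arguments(args: Dict[str, Any], required: Iterable[str]) -> None:
    # Report every missing argument at once rather than only the first one.
    missing = [name for name in required if name not in args]
    if missing:
        raise ToolArgumentError(f"missing required arguments: {missing}")


def validate_no_extra_arguments(args: Dict[str, Any], allowed: Iterable[str]) -> None:
    # Reject arguments the tool does not declare.
    extra = sorted(set(args) - set(allowed))
    if extra:
        raise ToolArgumentError(f"unexpected arguments: {extra}")


def validate_argument_type(name: str, value: Any, expected: type) -> None:
    if not isinstance(value, expected):
        raise ToolArgumentError(
            f"argument {name!r} must be {expected.__name__}, got {type(value).__name__}"
        )


def validate_arguments_from_schema(args: Dict[str, Any], schema: Dict[str, Dict[str, Any]]) -> None:
    # Hypothetical schema format: {"arg_name": {"type": int, "required": True}}
    validate_no_extra_arguments(args, schema)
    required = [name for name, spec in schema.items() if spec.get("required", False)]
    validate_required_arguments(args, required)
    for name, value in args.items():
        validate_argument_type(name, value, schema[name]["type"])
```

For example, with `schema = {"city": {"type": str, "required": True}, "days": {"type": int}}`, the call `validate_arguments_from_schema({"city": "Berlin", "days": 3}, schema)` passes silently, while omitting `city` raises `ToolArgumentError`.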
Documentation
- Added Exception Handling guide explaining error classification, fair scoring, and rerunning failed tasks (PR: #13)
Benchmarks
- MACS Benchmark: Multi-Agent Collaboration Scenarios benchmark (PR: #13)
Benchmark
- Added `execution_loop()` method to `Benchmark` base class enabling iterative agent-user interaction (PR: #13)
- Added `max_invocations` constructor parameter to `Benchmark` (default: 1 for backwards compatibility) (PR: #13)
- Added abstract `get_model_adapter(model_id, **kwargs)` method to `Benchmark` base class as a universal model factory to be used throughout the benchmarks (PR: #13)
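A minimal sketch of how an agent-user execution loop bounded by `max_invocations` could work. `ScriptedUser`, its `respond()` method, and the `run_agents` callable are hypothetical stand-ins, not MASEval's API; only the idea of alternating agent and user turns up to a budget is taken from the entries above.

```python
from typing import Callable, List


class ScriptedUser:
    """Hypothetical stand-in for a user simulator that replies from a fixed script."""

    def __init__(self, replies: List[str]) -> None:
        self.replies = list(replies)

    def get_initial_query(self) -> str:
        return "Plan my meals for today."

    def respond(self, agent_answer: str) -> str:
        # A real simulator would react to agent_answer; this one just reads its script.
        return self.replies.pop(0)

    def is_done(self) -> bool:
        return not self.replies


def execution_loop(
    run_agents: Callable[[str], str], user: ScriptedUser, max_invocations: int = 1
) -> List[str]:
    """Alternate agent and user turns until the user is done or the budget is spent."""
    answers: List[str] = []
    query = user.get_initial_query()
    for _ in range(max_invocations):
        answer = run_agents(query)  # the agent system receives the current query
        answers.append(answer)
        if user.is_done():
            break
        query = user.respond(answer)
    return answers
```

With the default `max_invocations=1` the agent runs exactly once on the initial query, which matches the single-shot behavior kept for backwards compatibility.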
User
- Added `max_turns` and `stop_token` parameters to `User` base class for multi-turn support with early stopping. The same applies to `UserLLMSimulator`. (PR: #13)
- Added `is_done()`, `_check_stop_token()`, and `increment_turn()` methods to `User` base class (PR: #13)
- Added `get_initial_query()` method to `User` base class for LLM-generated initial messages (PR: #13)
- Added `initial_query` parameter to `User` base class to trigger the agentic system (PR: #13)
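The turn counting and stop-token check described above could look roughly like this. `TurnLimitedUser` is a hypothetical illustration; the attribute names and the choice to treat reaching `max_turns` as "done" are assumptions, not the library's documented behavior.

```python
from typing import Optional


class TurnLimitedUser:
    """Hypothetical sketch of multi-turn support with early stopping."""

    def __init__(self, max_turns: int = 1, stop_token: Optional[str] = None) -> None:
        self.max_turns = max_turns
        self.stop_token = stop_token
        self.turn = 0
        self._stopped = False

    def increment_turn(self) -> None:
        self.turn += 1

    def _check_stop_token(self, message: str) -> bool:
        # Stop early once the simulated user's message contains the stop token.
        if self.stop_token is not None and self.stop_token in message:
            self._stopped = True
        return self._stopped

    def is_done(self) -> bool:
        # Done either by explicit stop token or by exhausting the turn budget.
        return self._stopped or self.turn >= self.max_turns
```

Keeping the two stopping conditions separate lets a benchmark cap conversation length while still ending early when the simulated user signals that it is satisfied.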
Environment
- Added `Environment.get_tool(name)` method for single-tool lookup (PR: #13)
Interface
- LlamaIndex integration: `LlamaIndexAgentAdapter` and `LlamaIndexUser` for evaluating LlamaIndex workflow-based agents (PR: #7)
- The `logs` property inside `SmolAgentAdapter` and `LanggraphAgentAdapter` is now properly filled (PR: #3)
Examples
- Added a new example: `5_a_day_benchmark` (PR: #10)
Changed
Exception Handling
- Benchmark now classifies execution errors into `AGENT_ERROR` (agent's fault), `ENVIRONMENT_ERROR` (tool/infrastructure failure), `USER_ERROR` (user simulator failure), or `UNKNOWN_EXECUTION_ERROR` (unclassified) instead of the generic `TASK_EXECUTION_FAILED` (PR: #13)
- `ToolLLMSimulator` now raises `ToolSimulatorError` (classified as `ENVIRONMENT_ERROR`) on failure (PR: #13)
- `UserLLMSimulator` now raises `UserSimulatorError` (classified as `USER_ERROR`) on failure (PR: #13)
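To make the classification concrete, here is a self-contained sketch that maps exceptions to statuses with `isinstance`, so that the simulator-specific subclasses inherit the status of their base class. The class bodies, the enum values, `classify()`, and the `counts_against_agent()` scoring policy are hypothetical; MASEval's own hierarchy lives in `maseval.core.exceptions` and may differ.

```python
from enum import Enum


class TaskExecutionStatus(Enum):
    # Hypothetical values; only the member names come from the changelog.
    AGENT_ERROR = "agent_error"
    ENVIRONMENT_ERROR = "environment_error"
    USER_ERROR = "user_error"
    UNKNOWN_EXECUTION_ERROR = "unknown_execution_error"


class AgentError(Exception): ...
class EnvironmentError(Exception): ...  # shadows Python's builtin EnvironmentError (an OSError alias) in this module
class UserError(Exception): ...
class ToolSimulatorError(EnvironmentError): ...  # assumed base class, matching its ENVIRONMENT_ERROR classification
class UserSimulatorError(UserError): ...  # assumed base class, matching its USER_ERROR classification


def classify(exc: BaseException) -> TaskExecutionStatus:
    # isinstance checks let subclasses inherit their base class's status.
    if isinstance(exc, AgentError):
        return TaskExecutionStatus.AGENT_ERROR
    if isinstance(exc, EnvironmentError):
        return TaskExecutionStatus.ENVIRONMENT_ERROR
    if isinstance(exc, UserError):
        return TaskExecutionStatus.USER_ERROR
    return TaskExecutionStatus.UNKNOWN_EXECUTION_ERROR


def counts_against_agent(status: TaskExecutionStatus) -> bool:
    # Hypothetical fair-scoring policy: only failures the agent caused lower its score.
    return status is TaskExecutionStatus.AGENT_ERROR
```

Under such a policy, a task that failed because a tool simulator broke could be rerun or excluded instead of being scored as an agent failure.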
Environment
- `Environment.create_tools()` now returns `Dict[str, Any]` instead of `list` (PR: #13)
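A short sketch of why a name-keyed dictionary is convenient: single-tool lookup becomes a dictionary access. `WeatherEnvironment` and its `get_weather` tool are hypothetical, and the error raised for unknown names is an assumption about how such a lookup might behave.

```python
from typing import Any, Dict


class WeatherEnvironment:
    """Hypothetical environment that keeps its tools in a name-keyed dict."""

    def __init__(self) -> None:
        self.tools: Dict[str, Any] = self.create_tools()

    def create_tools(self) -> Dict[str, Any]:
        # Tool name -> callable; a list would force a linear search by name.
        return {"get_weather": lambda city: f"Sunny in {city}"}

    def get_tool(self, name: str) -> Any:
        try:
            return self.tools[name]
        except KeyError:
            raise KeyError(f"unknown tool {name!r}; available: {sorted(self.tools)}") from None
```

Code that previously iterated over the returned list can iterate over `tools.values()` instead.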
Benchmark
- `Benchmark.run_agents()` signature changed: added `query: str` parameter (PR: #13)
- `Benchmark.run()` now uses `execution_loop()` internally to handle agent-user interaction cycles (PR: #13)
- `Benchmark` class now has a `fail_on_setup_error` flag that raises errors observed during task setup (PR: #10)
Callback
- `FileResultLogger` now accepts a `pathlib.Path` for the `output_dir` argument and has an `overwrite` argument to prevent overwriting existing log files.
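The `overwrite` guard can be illustrated with a standalone helper. `write_result()` is hypothetical and is not `FileResultLogger`'s API; it only shows the pattern of accepting either `str` or `pathlib.Path` and refusing to replace an existing file unless explicitly asked to.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def write_result(
    output_dir: Union[str, Path], filename: str, result: Dict[str, Any], overwrite: bool = False
) -> Path:
    # Normalize str or Path input to a Path.
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / filename
    # Refuse to replace earlier logs unless the caller opts in.
    if path.exists() and not overwrite:
        raise FileExistsError(f"{path} already exists; pass overwrite=True to replace it")
    path.write_text(json.dumps(result))
    return path
```

Failing loudly by default protects results from a previous run when a benchmark is restarted with the same output directory.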
Evaluator
- The `Evaluator` class now has a `filter_traces` base method to conveniently adapt the same evaluator to different entities in the traces (PR: #10)
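One way to read the `filter_traces` design: the scoring logic lives in the base class, and subclasses only change which slice of the traces is scored. The trace layout (`agents` and `tools` keys) and both evaluator classes below are hypothetical.

```python
from typing import Any, Dict, List


class MessageCountEvaluator:
    """Hypothetical evaluator: scoring lives here, selection in filter_traces."""

    def __init__(self, agent_name: str = "orchestrator") -> None:
        self.agent_name = agent_name

    def filter_traces(self, traces: Dict[str, Any]) -> List[Any]:
        # Default selection: the message list of one agent.
        return traces["agents"][self.agent_name]["messages"]

    def __call__(self, traces: Dict[str, Any]) -> Dict[str, int]:
        return {"count": len(self.filter_traces(traces))}


class ToolCallCountEvaluator(MessageCountEvaluator):
    """Reuses the same scoring but points it at a tool's invocation log."""

    def __init__(self, tool_name: str) -> None:
        super().__init__()
        self.tool_name = tool_name

    def filter_traces(self, traces: Dict[str, Any]) -> List[Any]:
        return traces["tools"][self.tool_name]["invocations"]
```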
Simulator
- The `LLMSimulator` now raises an exception when JSON cannot be decoded instead of returning the error message as text to the agent (PR: #13)
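The behavior change can be sketched as a parser that raises instead of handing the decoding error back as text. `SimulatorJSONError` and `parse_simulator_output()` are hypothetical names used only for illustration.

```python
import json
from typing import Any


class SimulatorJSONError(Exception):
    """Hypothetical error raised when an LLM simulator response is not valid JSON."""


def parse_simulator_output(raw: str) -> Any:
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Raise instead of returning the error text, so the failure is recorded
        # and classified rather than silently passed to the agent as a tool result.
        raise SimulatorJSONError(f"could not decode simulator output: {exc.msg}") from exc
```

Chaining with `from exc` keeps the original `JSONDecodeError` (with its position information) available as `__cause__` for debugging.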
Other
- Documentation formatting improved. Added dark mode and links to GitHub (PR: #11)
- Improved Quick Start Guide in `docs/getting-started/quickstart.md` (PR: #10)
- `maseval.interface.agents` structure changed. Tools requiring framework imports (beyond just typing) now live in `<framework>_optional.py` and are imported dynamically from `<framework>.py` (PR: #12)
- Various formatting improvements in the documentation (PR: #12)
- Added documentation for the View Source Code pattern in `CONTRIBUTING.md` and the `_optional.py` pattern in the interface README (PR: #12)
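The `_optional.py` split keeps framework imports out of module import time. Below is a minimal sketch of the dynamic-import side using the standard library's `importlib.import_module`; `load_optional()` and its error message are hypothetical, not MASEval's implementation.

```python
import importlib
from types import ModuleType


def load_optional(module_name: str, hint: str) -> ModuleType:
    """Import a framework-dependent module only when it is actually needed."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Turn a bare ImportError into an actionable message for the user.
        raise ImportError(
            f"{module_name} needs an optional dependency that is not installed ({hint})"
        ) from exc
```

Inside `<framework>.py`, such a helper could be called from a factory function or from a module-level `__getattr__` (PEP 562), so the framework is imported only when an adapter is first requested.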
Fixed
Interface
- `LlamaIndexAgentAdapter` now supports multiple LlamaIndex agent types, including `ReActAgent` (workflow-based), `FunctionAgent`, and legacy agents, by checking for `.chat()`, `.query()`, and `.run()` methods in priority order (PR: #10)
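The priority-order check can be sketched with `getattr` duck typing. `invoke_agent()` is hypothetical; it assumes each entry point takes the query as its single positional argument and runs synchronously, whereas real LlamaIndex agents may differ in argument names and in whether these methods are async, which an actual adapter has to handle.

```python
from typing import Any


def invoke_agent(agent: Any, query: str) -> Any:
    """Call the first supported entry point, checked in priority order."""
    for method_name in ("chat", "query", "run"):
        method = getattr(agent, method_name, None)
        if callable(method):
            return method(query)
    raise TypeError(f"{type(agent).__name__} has no chat(), query(), or run() method")
```

Checking capabilities rather than concrete classes lets one adapter cover several agent types without importing each of them.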
Other
- Consistent naming of agent `adapter` over `wrapper` (PR: #3)
- Fixed an issue where the `LiteLLM` interface and Mixins were not shown properly in the documentation (PR: #12)
Removed
- Removed `set_message_history`, `append_message_history` and `clear_message_history` from `AgentAdapter` and its subclasses (PR: #3)
v0.1.2
Full Changelog: v0.1.1...v0.1.2
Initial Release
This is the initial code release. The library is under active development, and the API may change at any time.
v0.1.0-alpha
fixed email