Releases: OpenMLRL/CoMLRL
v1.3.6
This release (v1.3.6) provides the code used in Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic.
See CoMLRL GitHub and Docs for more details about the latest version.
Core Features
- MAGRPO, IAC, MAAC Trainers for Decentralized LLM Collaboration
- Environments including Writing, Coding, and Minecraft Collaboration.
Changelog
- #60 Fixed a critical error in heterogeneous model loading and allowed more flexible critics and agents (see [docs/model-loading](https://openmlrl.github.io/CoMLRL/docs/user-guide/model-loading/)).
- Change downstream repos' interfaces accordingly (align them with v1.3.6) and polish the docs.
v1.3.5
Changelog
- Add unit tests for hyperparameter constraints.
- Clean legacy interfaces.
v1.3.4
Changelog
- Fix a bug in loading heterogeneous models and rework the loading logic.
- Enable mini-batch gradient descent (MBGD) in MAGRPO to align with MAAC and IAC.
- Remove redundant and legacy hyperparameters (e.g., model kwargs, patching hyperparameters).
- Clean up multi-device legacy options, such as drop_last and num_workers.
- Add unit tests for model loading and separate it from CI as a badge.
- Clean short functions.
- Reorganize the docs and align the parameters.
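
As a rough illustration of the MBGD entry above, the sketch below splits a batch of samples into mini-batches and takes one gradient step per mini-batch on a toy squared-error objective. It assumes MBGD means mini-batch gradient descent. The names `iter_minibatches` and `minibatch_gd` are hypothetical and are not CoMLRL APIs, and the toy objective stands in for the actual policy loss.

```python
def iter_minibatches(samples, batch_size):
    """Yield consecutive slices of `samples` with at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


def minibatch_gd(samples, batch_size, lr=0.1, epochs=1, w=0.0):
    """Fit a scalar w to minimize mean((w - x)^2), one update per mini-batch.

    Toy objective for illustration only (hypothetical, not the CoMLRL loss).
    """
    for _ in range(epochs):
        for batch in iter_minibatches(samples, batch_size):
            # Gradient of the mean squared error over this mini-batch.
            grad = sum(2.0 * (w - x) for x in batch) / len(batch)
            w -= lr * grad
    return w
```

With a batch size equal to the number of samples this reduces to full-batch gradient descent, and with a batch size of 1 it reduces to per-sample updates.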
v1.3.3
Changelog
- Compact the MAREINFORCETrainer derivation and move it to the new folder.
- Unify the interface for different trainers.
- Remove redundant patches and wrappers.
- Reorganize the variables in the config yamls.
v1.3.2
Warning
Deprecated: v1.3.6 provides more flexible interfaces for loading heterogeneous LLMs.
This release (v1.3.2) provides the code used in Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic.
See CoMLRL GitHub and Docs for more details about the latest version.
Core Features
- MAGRPO, IAC, MAAC Trainers (off-policy) for Decentralized LLM Collaboration
- Environments including Writing, Coding, and Minecraft Collaboration.
Changelog
- Fixed wandb logging issue in MAGRPOTrainer.
- Align all environment repos with version 1.3.2.
v1.3.1
Changelog
- Allow batch training in MAGRPOTrainer, IACTrainer and MAACTrainer
- Allow multi-turn training in IACTrainer and MAACTrainer
- Change the x-axis from data_step to env_step
- Pair with LLM_Collab_Code_Generation v1.3.1

v1.3.0
Changelog
- Use TD loss for the critic update.
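
The TD-loss entry above can be illustrated with a minimal sketch, assuming a one-step TD target and a mean-squared error. The release note does not say which TD variant is used, and the names `td_targets` and `td_loss` are hypothetical, not CoMLRL APIs.

```python
def td_targets(rewards, next_values, dones, gamma=0.99):
    """One-step TD targets r_t + gamma * V(s_{t+1}); no bootstrap on terminal steps."""
    return [
        r + gamma * nv * (0.0 if done else 1.0)
        for r, nv, done in zip(rewards, next_values, dones)
    ]


def td_loss(rewards, values, next_values, dones, gamma=0.99):
    """Mean squared TD error between the critic's values and the TD targets."""
    targets = td_targets(rewards, next_values, dones, gamma)
    errors = [t - v for t, v in zip(targets, values)]
    return sum(e * e for e in errors) / len(errors)
```

In a real trainer the targets would be treated as constants (no gradient flows through them) and the loss would be computed on tensors rather than Python lists.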
v1.2.9
v1.2.8
v1.2.7
Changelog
- Change IPPO to IAC, since it's on-policy.




