13 Feb 22:47

v1.3.6 Latest

This release (v1.3.6) provides the code used in Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic.
See CoMLRL GitHub and Docs for more details about the latest version.

Core Features

MAGRPO, IAC, MAAC Trainers for Decentralized LLM Collaboration
Environments including Writing, Coding, and Minecraft Collaboration.

Changelog

- #60: Fix a critical error in heterogeneous model loading and allow more flexible critics and agents (see [docs/model-loading](https://openmlrl.github.io/CoMLRL/docs/user-guide/model-loading/)).
- Update downstream repos' interfaces accordingly (aligning them with v1.3.6) and polish the docs.

07 Feb 22:27

v.1.3.5

Changelog

Add unit tests for hyperparameter constraints.
Clean legacy interfaces.

07 Feb 03:34

v.1.3.4

Changelog

Fix the bug in loading heterogeneous models and rework the loading logic.
Enable mini-batch gradient descent (MBGD) in MAGRPO to align with MAAC and IAC.
Remove redundant and legacy hyperparameters (e.g., model kwargs, patching hyperparameters).
Clean up multi-device legacy options, such as drop_last and num_workers.
Add unit tests for model loading and separate it from CI as a badge.
Clean short functions.
Reorganize the docs and align the parameters.
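
For readers unfamiliar with the MBGD item above, the following is a minimal, generic sketch of mini-batch gradient descent on a toy scalar problem. It is not CoMLRL's implementation: the names `iter_minibatches` and `mbgd_epoch`, the squared-error objective, and all parameters are hypothetical and chosen only for illustration.

```python
from typing import Iterator, Sequence


def iter_minibatches(samples: Sequence[float], minibatch_size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of `samples`; the last slice may be shorter."""
    if minibatch_size <= 0:
        raise ValueError("minibatch_size must be positive")
    for start in range(0, len(samples), minibatch_size):
        yield samples[start:start + minibatch_size]


def mbgd_epoch(theta: float, samples: Sequence[float], minibatch_size: int, lr: float) -> float:
    """Run one pass over `samples`, taking one gradient step per mini-batch.

    Toy objective (an assumption for this sketch): mean of (theta - x)^2.
    """
    for batch in iter_minibatches(samples, minibatch_size):
        # Gradient of the mean squared error over this mini-batch only.
        grad = sum(2.0 * (theta - x) for x in batch) / len(batch)
        theta -= lr * grad
    return theta


if __name__ == "__main__":
    # Two mini-batches of size 2, so the parameter is updated twice in one pass.
    print(mbgd_epoch(0.0, [1.0, 1.0, 3.0, 3.0], minibatch_size=2, lr=0.5))
```

The point of the sketch is only that the parameter is updated once per mini-batch rather than once per full batch.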

05 Feb 22:18

v.1.3.3

Changelog

Compact the MAREINFORCETrainer derivation and move it to the new folder.
Unify the interface for different trainers.
Remove redundant patches and wrappers.
Reorganize the variables in the config yamls.

29 Jan 14:45

v.1.3.2

Warning

Deprecated: v1.3.6 provides more flexible interfaces for loading heterogeneous LLMs.

This release (v1.3.2) provides the code used in Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic.
See CoMLRL GitHub and Docs for more details about the latest version.

Core Features

MAGRPO, IAC, MAAC Trainers (off-policy) for Decentralized LLM Collaboration
Environments including Writing, Coding, and Minecraft Collaboration.

Changelog

Fixed wandb logging issue in MAGRPOTrainer.
Align all environment repos with version 1.3.2.

30 Dec 18:20

v1.3.1

Changelog

Allow batch training in MAGRPOTrainer, IACTrainer and MAACTrainer
Allow multi-turn training in IACTrainer and MAACTrainer
Change the x-axis from data_step to env_step
Pair with LLM_Collab_Code_Generation v1.3.1

20 Dec 03:01

v1.3.0

Changelog

Use TD loss for Critic update
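
As a rough illustration of what a TD loss for a value critic usually looks like, here is a minimal one-step TD(0) sketch in plain Python. It is an assumption-laden example, not the code shipped in this release: the function names `td_targets` and `td_loss`, the mean-squared-error form, and the `gamma` and `dones` handling are all hypothetical.

```python
from typing import List, Sequence


def td_targets(
    rewards: Sequence[float],
    next_values: Sequence[float],
    dones: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """One-step TD targets: r + gamma * V(s'), with no bootstrap at terminal steps."""
    return [
        r + (0.0 if done else gamma * v_next)
        for r, v_next, done in zip(rewards, next_values, dones)
    ]


def td_loss(
    values: Sequence[float],
    rewards: Sequence[float],
    next_values: Sequence[float],
    dones: Sequence[bool],
    gamma: float = 0.99,
) -> float:
    """Mean squared TD error between the critic's V(s) and the TD targets."""
    targets = td_targets(rewards, next_values, dones, gamma)
    errors = [target - value for target, value in zip(targets, values)]
    return sum(e * e for e in errors) / len(errors)


if __name__ == "__main__":
    print(td_loss([0.5, 1.0], [1.0, 0.0], [1.0, 2.0], [False, True], gamma=0.5))
```

In a real trainer the targets would normally be treated as constants (no gradient flows through `next_values`); that detail is omitted here because the sketch uses plain floats.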

01 Dec 20:05

v1.2.9

Changelog

Add MAAC for single-turn training.

29 Nov 16:14

v1.2.8

Changelog

Make IAC's critic estimate V rather than Q.

22 Nov 02:45

v1.2.7

Changelog

Rename IPPO to IAC, since it's on-policy.
