CollabSim: A CSCW-Grounded Methodology for Investigating Collaborative Competence of LLM Agents through Controlled Multi-Agent Experiments
Jiaju Chen, Bo Sun, Yuxuan Lu, Yun Wang, Dakuo Wang, Bingsheng Yao
CollabSim is a configurable simulation framework for systematically assessing LLM agents' collaborative competence under controlled interaction conditions. It draws on experimental paradigms from Computer-Supported Cooperative Work (CSCW) research on distributed human teams, where collaborators coordinate through structured text-based channels without co-presence.
Researchers can vary task constraints such as communication bandwidth, information visibility, and team size, so that the effect of each interaction condition on collaborative behavior can be examined in isolation. CollabSim also includes a probing module that, after agent actions, elicits each agent's self-reported mental model of the task state, its partners' intentions, and its own reasoning. This enables analysis of internal collaborative states beyond observable behavior.
CollabSim implements four CSCW-inspired tasks that reflect core process-level coordination challenges:
| Task | CSCW paradigm | Challenge |
|---|---|---|
| Shape Factory | Resource coordination (Bos et al., 2004) | Coordinating production and trade under cost asymmetry |
| DayTrader | Social-dilemma negotiation (Bos et al., 2002) | Balancing individual and collective incentives |
| Hidden Profile | Information pooling (Stasser & Titus, 1985) | Integrating distributed private information |
| Map Task | Referential grounding (Anderson et al., 1991) | Establishing shared spatial understanding through language alone |
Each task is fully configurable via YAML. Example configs live in configs/, and the study conditions used in the paper are in configs/study_conditions/.
Pre-configured condition sweeps vary one interaction dimension at a time:
| Task | Conditions |
|---|---|
| Shape Factory | baseline, bandwidth_1_msg_per_sim_min, awareness_dashboard, group_size_6, group_size_8, group_size_10 |
| DayTrader | baseline, bandwidth_1_msg_per_5_actions, group_size_6, group_size_9 |
| Hidden Profile | baseline, bandwidth_max_words_5 |
| Map Task | baseline, bandwidth_max_words_5, canvas_visibility |
Condition names map directly to YAML files under configs/study_conditions/<task>/.
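Because condition names are file stems, a run's config path can be built from the task directory and the condition name. The sketch below assumes every condition file uses the .yml extension, as the baseline.yml paths in this README do. The helper names are hypothetical, and the Map Task folder name is not given here, so check configs/study_conditions/ for the actual directory names.

```python
from pathlib import Path

# Assumed layout: configs/study_conditions/<task_dir>/<condition>.yml
STUDY_ROOT = Path("configs/study_conditions")


def condition_config_path(task_dir: str, condition: str, root: Path = STUDY_ROOT) -> Path:
    """Build the YAML path for one condition of one task (hypothetical helper)."""
    return root / task_dir / f"{condition}.yml"


def list_conditions(task_dir: str, root: Path = STUDY_ROOT) -> list[str]:
    """Return the condition names (file stems) found in a task directory."""
    return sorted(p.stem for p in (root / task_dir).glob("*.yml"))


# Example: config path for a DayTrader bandwidth condition
print(condition_config_path("daytrader", "bandwidth_1_msg_per_5_actions"))
```

Run from the repository root, list_conditions("shapefactory") should return the condition names from the table above, provided the folders follow this layout.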
This project uses uv for dependency management.
git clone https://github.com/neuhai/CollabSim.git
cd CollabSim
uv sync
Optional dev dependencies (includes pytest):
uv sync --group dev
Run tests:
uv run pytest
Copy the example environment file and fill in your credentials:
cp .env.example .env
Never commit .env or API keys. The repo .gitignore already excludes .env.
Set these environment variables to control the LLM for all agents without editing each YAML:
| Variable | Description | Example |
|---|---|---|
| COLLABSIM_MODEL_PROVIDER | Agent backend | litellm, openai, azure, sglang |
| COLLABSIM_MODEL_NAME | Model or deployment ID | gpt-4o |
| COLLABSIM_MODEL_TEMPERATURE | Sampling temperature (optional) | 0.0 |
CLI flags (--model-provider, --model-name, --model-temperature) override the corresponding env var for that run only.
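The precedence rule above can be sketched as a small resolver. It encodes only what this README states, namely that a CLI flag beats the environment variable. Treating the YAML model block as the final fallback is an assumption, and the function and key names are hypothetical rather than CollabSim's actual code.

```python
import os
from typing import Mapping, Optional

# Environment variables documented in the table above.
ENV_VARS = {
    "provider": "COLLABSIM_MODEL_PROVIDER",
    "name": "COLLABSIM_MODEL_NAME",
    "temperature": "COLLABSIM_MODEL_TEMPERATURE",
}


def resolve_model_settings(
    cli: Mapping[str, Optional[str]],
    yaml_model: Mapping[str, object],
    env: Mapping[str, str] = os.environ,
) -> dict:
    """Resolve each setting: CLI flag, then env var, then YAML (assumed fallback)."""
    resolved: dict = {}
    for key, var in ENV_VARS.items():
        if cli.get(key) is not None:   # e.g. --model-name, for this run only
            value = cli[key]
        elif env.get(var):             # shared setting for all agents
            value = env[var]
        else:                          # assumption: fall back to the YAML model block
            value = yaml_model.get(key)
        if key == "temperature" and value is not None:
            value = float(value)
        resolved[key] = value
    return resolved


settings = resolve_model_settings(
    cli={"name": "gpt-4o-mini"},
    yaml_model={"provider": "openai", "name": "gpt-4o"},
    env={"COLLABSIM_MODEL_PROVIDER": "litellm"},
)
print(settings)  # {'provider': 'litellm', 'name': 'gpt-4o-mini', 'temperature': None}
```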
export COLLABSIM_MODEL_PROVIDER=litellm
export COLLABSIM_MODEL_NAME=gpt-4o
uv run python -m src.cli configs/study_conditions/shapefactory/baseline.yml --print-actions
LiteLLM (recommended default; supports OpenAI, Anthropic, Azure, Bedrock, and OpenAI-compatible endpoints):
export OPENAI_API_KEY=your-key-here
# Optional proxy:
# export LITELLM_API_BASE=http://127.0.0.1:4000
# export LITELLM_API_KEY=your-proxy-key
See configs/litellm_example.yml.
OpenAI:
export OPENAI_API_KEY=your-key-here
Azure OpenAI:
export AZURE_OPENAI_API_KEY=your-key-here
export AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
export AZURE_OPENAI_DEPLOYMENT=your-deployment-name # optional if set in YAML model.name
export AZURE_OPENAI_API_VERSION=2024-02-15-preview # optional
See configs/hidden_profile_azure.yml.
SGLang (local open-source models):
# Terminal 1 — start inference server
uv run python scripts/start_sglang_server.py meta-llama/Llama-3.1-8B-Instruct --tp 1
# Terminal 2 — run experiment
export SGLANG_HOST=127.0.0.1
export SGLANG_PORT=30000
export COLLABSIM_MODEL_PROVIDER=sglang
export COLLABSIM_MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
uv run python -m src.cli configs/sglang_example.yml --print-actions
See configs/sglang_example.yml for a full example.
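Before launching the experiment in the second terminal, it can help to confirm that the SGLang server is accepting connections. This hypothetical helper reads SGLANG_HOST and SGLANG_PORT and falls back to the values exported above; those fallbacks are an assumption, not documented CollabSim defaults. It only checks that a TCP connection succeeds and does not confirm that the model has finished loading.

```python
import os
import socket
from typing import Mapping


def sglang_address(env: Mapping[str, str] = os.environ) -> tuple[str, int]:
    """Read SGLANG_HOST/SGLANG_PORT, defaulting to the values used in the example."""
    host = env.get("SGLANG_HOST", "127.0.0.1")
    port = int(env.get("SGLANG_PORT", "30000"))
    return host, port


def server_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    host, port = sglang_address()
    print(f"SGLang at {host}:{port} reachable:", server_is_listening(host, port))
```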
Run a single experiment:
uv run python -m src.cli configs/study_conditions/daytrader/baseline.yml \
--run-id my_run \
--print-actions
Useful flags:
- --validate-only: check the config without running
- --max-steps N: cap simulation length (useful for smoke tests)
- --output-dir PATH: override the log output directory
- --collaboration: append collaboration-priming instructions (prompts/collaboration_module.md) to each agent's initial prompt
- --wandb: enable Weights & Biases logging (optional; requires wandb login)
For a quick smoke test, cap the number of steps:
uv run python -m src.cli configs/study_conditions/hidden_profile/baseline.yml \
--max-steps 10 --print-actions
Run all conditions for one task, or for all four tasks:
# All four tasks
./configs/study_conditions/run_task_batch.sh all
# One task
./configs/study_conditions/run_task_batch.sh shapefactory
# Smoke mode (10 steps per condition)
./configs/study_conditions/run_task_batch.sh all smoke
# With collaboration priming (results saved to <condition>_collab folders)
./configs/study_conditions/run_task_batch.sh all --collaboration
Batch runner options:
| Flag | Description |
|---|---|
| --jobs N | Number of conditions to run in parallel (default: all pending) |
| --force | Re-run even if results already exist |
| --retry-failed | Re-run incomplete conditions |
| --list-failed | List incomplete conditions without running them |
| --conditions a,b | Run only the named conditions |
| --no-wandb-upload | Skip the W&B artifact upload after the batch |
Results are written under experiments/study_conditions/. When W&B is enabled, set WANDB_PROJECT to choose the project name (default: collabsim).
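This README does not document exactly how the runner decides that a condition is incomplete. The rough sketch below makes two assumptions. First, results follow the experiments/study_conditions/<task>/<condition> layout (for example, experiments/study_conditions/daytrader/baseline), with a _collab suffix for primed runs. Second, a run counts as complete once it has written metrics.json. Under those assumptions, pending conditions could be listed like this (the helper names are hypothetical):

```python
from pathlib import Path

RESULTS_ROOT = Path("experiments/study_conditions")


def is_complete(run_dir: Path) -> bool:
    # Assumption: a finished run leaves a metrics.json in its output folder.
    return (run_dir / "metrics.json").is_file()


def incomplete_conditions(
    task_dir: str,
    conditions: list[str],
    collaboration: bool = False,
    root: Path = RESULTS_ROOT,
) -> list[str]:
    """Return the conditions whose result folder lacks metrics.json."""
    suffix = "_collab" if collaboration else ""  # primed runs go to <condition>_collab
    return [c for c in conditions if not is_complete(root / task_dir / f"{c}{suffix}")]


print(incomplete_conditions("daytrader", ["baseline", "group_size_6", "group_size_9"]))
```

The batch runner's own --list-failed output remains the authoritative answer.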
Launch a minimal Flask UI for uploading configs and inspecting runs:
uv run flask --app src.gui.app run --debug
Runs are saved to experiments/gui_runs/.
Each run produces structured logs under the configured logging.output_dir:
- events.jsonl: full event trace (actions, messages, state changes)
- actions.jsonl: agent action records
- probes.jsonl: mental-state probe responses
- metrics.json: task outcome metrics
- run_manifest.json: experiment metadata and config snapshot
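For custom analysis beyond analysis.report, these files can be loaded directly. This minimal sketch uses only the standard library and makes no assumptions about the fields inside each record; those are described in the event, action, and metrics schemas in docs/. The helper names load_run and read_jsonl are hypothetical.

```python
import json
from pathlib import Path


def read_jsonl(path: Path) -> list[dict]:
    """Parse a JSON Lines file (one JSON object per line), skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def load_run(run_dir: Path) -> dict:
    """Load whichever of the documented log files exist in one run directory."""
    run: dict = {}
    for name in ("events", "actions", "probes"):
        p = run_dir / f"{name}.jsonl"
        run[name] = read_jsonl(p) if p.is_file() else []
    for name in ("metrics", "run_manifest"):
        p = run_dir / f"{name}.json"
        run[name] = json.loads(p.read_text(encoding="utf-8")) if p.is_file() else None
    return run


run = load_run(Path("experiments/study_conditions/daytrader/baseline"))
# Record counts per log file (all zero if the run directory does not exist yet)
print({name: len(records) for name, records in run.items() if isinstance(records, list)})
```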
Analyze one or more run directories:
# Single run
uv run python -m analysis.report --run experiments/study_conditions/daytrader/baseline
# All runs under a folder
uv run python -m analysis.report --runs experiments/study_conditions/daytrader
By default, this writes CSVs to analysis_output/: task_metrics.csv, probe_metrics.csv, and combined.csv.
Additional plotting scripts are in analysis/ (e.g. plot_probe_confidence_curves.py, plot_persona_heatmaps.py).
CollabSim is designed to be extended through YAML configuration and pluggable components.
Edit controls in a config YAML to vary collaboration constraints:
- Communication bandwidth: controls.communication.max_messages_per_turn, max_message_words, min_sim_interval_between_communicate_sec
- Communication mode: controls.communication.mode (broadcast or direct)
- Information visibility: controls.information_distribution, controls.visibility_defaults, controls.visibility_map
- Team size: add or remove entries under agents
See docs/config_fields.md and docs/config_schema.md for the full field reference.
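To make the bandwidth fields concrete, the sketch below checks a single outgoing message against them. It is an illustration only, built on three assumptions: words are counted by splitting on whitespace, the interval is measured in simulated seconds since the sender's last message, and the check returns a reason string. This README does not say whether CollabSim rejects, truncates, or otherwise handles a message that exceeds a limit. The function name is hypothetical.

```python
from typing import Optional


def message_violation(
    text: str,
    comm: dict,
    sent_this_turn: int = 0,
    now_sim_sec: float = 0.0,
    last_sent_sim_sec: Optional[float] = None,
) -> Optional[str]:
    """Return why a message breaks the limits in `comm` (a
    controls.communication mapping), or None if it is allowed."""
    max_msgs = comm.get("max_messages_per_turn")
    if max_msgs is not None and sent_this_turn >= max_msgs:
        return f"message cap reached ({max_msgs} per turn)"
    min_gap = comm.get("min_sim_interval_between_communicate_sec")
    if min_gap is not None and last_sent_sim_sec is not None and now_sim_sec - last_sent_sim_sec < min_gap:
        return f"must wait {min_gap} simulated seconds between messages"
    max_words = comm.get("max_message_words")
    if max_words is not None and len(text.split()) > max_words:  # whitespace word count (assumed)
        return f"message exceeds {max_words} words"
    return None


# Settings resembling a five-word bandwidth condition (values illustrative)
comm = {"mode": "broadcast", "max_message_words": 5}
print(message_violation("the landmark is north of the lake", comm))  # message exceeds 5 words
```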
Prompt templates live in prompts/. To override template paths for a single experiment, set them under the prompts: key in the YAML (e.g. task, action, probe, persona_profiles). Persona profiles are defined in prompts/persona_profiles.json.
Configure probes under probe: in the YAML:
probe:
  cadence: per_action  # or per_turn, per_agent_n_actions, on_event
  templates: ["grounding_v1", "coordination_v1"]
  questions_path: prompts/interview_questions_hidden_profile.json
Template definitions are in configs/probe_templates.yml. See docs/probe_templates.md and docs/probe_response_schema.md.
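The cadence names suggest when a probe fires. The following sketch shows one hypothetical reading of them: per_agent_n_actions probes after every n-th action by the same agent, and on_event probes when a listed event type occurs. The parameter n, the event set, and the ProbeTrigger type are invented for illustration. This README does not say where those values are configured, so treat docs/probe_templates.md as the reference for the actual semantics.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ProbeTrigger:
    """A point in the simulation where a probe could fire (hypothetical type)."""
    kind: str                    # "action", "turn_end", or "event"
    agent_action_count: int = 0  # actions by the acting agent, including this one
    event_type: Optional[str] = None


def should_probe(
    cadence: str,
    trigger: ProbeTrigger,
    n: int = 1,
    events: FrozenSet[str] = frozenset(),
) -> bool:
    """One possible interpretation of the four cadence values."""
    if cadence == "per_action":
        return trigger.kind == "action"
    if cadence == "per_turn":
        return trigger.kind == "turn_end"
    if cadence == "per_agent_n_actions":
        if n < 1:
            raise ValueError("n must be >= 1")
        return trigger.kind == "action" and trigger.agent_action_count % n == 0
    if cadence == "on_event":
        return trigger.kind == "event" and trigger.event_type in events
    raise ValueError(f"unknown cadence: {cadence!r}")


# Which of an agent's first seven actions would be probed with n=3
print([a for a in range(1, 8) if should_probe("per_agent_n_actions", ProbeTrigger("action", a), n=3)])  # [3, 6]
```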
Pass --collaboration (or set experiment.collaboration: true) to inject CSCW-grounded collaboration instructions from prompts/collaboration_module.md into each agent's system prompt. This tests whether explicit collaboration guidance improves agent behavior.
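The following minimal sketch shows what the priming step amounts to. It assumes the module text is appended after the agent's existing prompt with a blank line in between; the exact placement and separator in CollabSim may differ, and the function names are hypothetical.

```python
from pathlib import Path

COLLAB_MODULE = Path("prompts/collaboration_module.md")


def collaboration_enabled(cli_flag: bool, config: dict) -> bool:
    """True if --collaboration was passed or experiment.collaboration is set."""
    return cli_flag or bool(config.get("experiment", {}).get("collaboration", False))


def prime_prompt(base_prompt: str, module_text: str) -> str:
    """Append the collaboration instructions after the agent's prompt (assumed layout)."""
    return f"{base_prompt.rstrip()}\n\n{module_text.strip()}"


def load_collaboration_module(path: Path = COLLAB_MODULE) -> str:
    """Read the collaboration module from the repository's prompts/ folder."""
    return path.read_text(encoding="utf-8")


# Typical use, with `config` loaded from the experiment YAML (not shown):
#   if collaboration_enabled(args.collaboration, config):
#       system_prompt = prime_prompt(system_prompt, load_collaboration_module())
```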
To add a new task:
- Implement the task logic in src/tasks/<your_task>.py (implement init_state, step, and apply_action).
- Register it in src/tasks/registry.py.
- Add task instructions and return-format prompts under prompts/.
- Create a YAML config that sets task.type: <your_task>.
Reference implementations: src/tasks/shapefactory.py, src/tasks/daytrader.py, src/tasks/hidden_profile.py, src/tasks/maptask.py.
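The hook names in the steps above suggest a task shaped like the toy example below. Everything here is hypothetical: TASK_REGISTRY and register_task stand in for whatever src/tasks/registry.py actually provides, and the hook signatures are guesses. Follow the reference implementations for the real interface.

```python
from typing import Any, Callable

# Hypothetical stand-in for src/tasks/registry.py
TASK_REGISTRY: dict[str, dict[str, Callable[..., Any]]] = {}


def register_task(name: str, *, init_state, step, apply_action) -> None:
    """Record a task's hooks under its task.type name."""
    if name in TASK_REGISTRY:
        raise ValueError(f"task {name!r} already registered")
    TASK_REGISTRY[name] = {"init_state": init_state, "step": step, "apply_action": apply_action}


# --- toy task: agents jointly count up to a target ---
def init_state(config: dict) -> dict:
    return {"count": 0, "target": config.get("target", 3), "done": False}


def apply_action(state: dict, agent_id: str, action: dict) -> dict:
    # agent_id is unused here, but a real task would check turn order or permissions.
    if action.get("type") == "increment" and not state["done"]:
        state = {**state, "count": state["count"] + 1}
    return state


def step(state: dict) -> dict:
    # Advance the simulation clock; here it only checks for completion.
    return {**state, "done": state["count"] >= state["target"]}


register_task("toy_counter", init_state=init_state, step=step, apply_action=apply_action)
```

A config for this toy task would then set task.type: toy_counter.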
To add a new study condition, copy an existing condition YAML in configs/study_conditions/<task>/, modify the dimension you want to vary, and update logging.output_dir. Run it directly, or include it in a batch via --conditions.
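The same workflow can be scripted when generating many variants. This sketch operates on the dict produced by loading a condition YAML (e.g. with PyYAML's yaml.safe_load, not shown). It applies dotted-key overrides and points logging.output_dir at a sibling folder named after the new condition. The function name, the override format, and the sibling-folder convention are assumptions.

```python
import copy
from pathlib import PurePosixPath


def derive_condition(baseline: dict, condition_name: str, overrides: dict) -> dict:
    """Return a new condition config derived from `baseline` (hypothetical helper).

    `overrides` maps dotted keys such as "controls.communication.max_message_words"
    to new values; the baseline dict itself is left untouched.
    """
    cfg = copy.deepcopy(baseline)
    for dotted_key, value in overrides.items():
        node = cfg
        *parents, leaf = dotted_key.split(".")
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    # Assumed convention: each condition logs to a sibling of the baseline's folder.
    old_dir = PurePosixPath(cfg["logging"]["output_dir"])
    cfg["logging"]["output_dir"] = str(old_dir.parent / condition_name)
    return cfg
```

The result could then be written with yaml.safe_dump to a new file in configs/study_conditions/<task>/ and run via --conditions.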
CollabSim/
├── configs/ # Experiment YAML configs and study conditions
├── prompts/ # Agent prompt templates and persona profiles
├── src/
│ ├── agents/ # LLM agent backends (LiteLLM, OpenAI, Azure, SGLang)
│ ├── controller/ # Simulation controller and stepping logic
│ ├── probe/ # Mental-state probing module
│ ├── tasks/ # Task implementations (Shape Factory, DayTrader, etc.)
│ └── cli.py # CLI entrypoint
├── analysis/ # Post-run metrics and plotting
├── docs/ # Schema and design documentation
└── scripts/ # Utility scripts (SGLang server, persona generation)
Detailed schemas and examples are in docs/:
- Config fields · Config schema · Config examples
- Event schema · Action schema · Metrics
- Probe templates · Agent lifecycle
If you use CollabSim in your research, please cite our paper:
@article{chen2026collabsim,
title = {CollabSim: A CSCW-Grounded Methodology for Investigating Collaborative Competence of LLM Agents through Controlled Multi-Agent Experiments},
author = {Chen, Jiaju and Sun, Bo and Lu, Yuxuan and Wang, Yun and Wang, Dakuo and Yao, Bingsheng},
year = {2026},
eprint = {2606.06399},
archivePrefix = {arXiv},
primaryClass = {cs.HC},
url = {https://arxiv.org/abs/2606.06399}
}
This project is licensed under the MIT License. See LICENSE for details.