Can language models truly act like specific humans — not just produce humanlike text, but reflect individual values, opinions, and communication styles? HumanLM tackles this challenge by aligning LMs to internal user states (stances, beliefs) rather than merely imitating surface-level responses.
We provide end-to-end tooling for collecting raw data from six sources and processing it into train/val/test splits with LLM-generated user personas. See `humanual_datasets/README.md` for full instructions.
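The collection and processing scripts live in that directory; as a rough illustration of the final step, here is a minimal sketch of partitioning each user's records into train/val/test splits. The file name, field names, and split ratios are all hypothetical and are not the repo's actual pipeline.

```python
# Hypothetical sketch of a per-user train/val/test split; the real pipeline
# lives in humanual_datasets/ and may differ in fields, ratios, and ordering.
import json
import random

def split_user_records(records, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle one user's records and cut them into train/val/test."""
    rng = random.Random(seed)
    records = records[:]
    rng.shuffle(records)
    n_train = int(ratios[0] * len(records))
    n_val = int(ratios[1] * len(records))
    return {
        "train": records[:n_train],
        "val": records[n_train:n_train + n_val],
        "test": records[n_train + n_val:],
    }

# Group raw rows by user id, then split each user independently
# (file and field names are illustrative).
with open("raw_records.jsonl") as f:
    rows = [json.loads(line) for line in f]

by_user = {}
for row in rows:
    by_user.setdefault(row["user_id"], []).append(row)

splits = {uid: split_user_records(recs) for uid, recs in by_user.items()}
```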
The user study interface lets annotators compare their own responses against model-generated ones on Reddit posts.
```bash
# Start the required vLLM model servers
vllm serve Qwen/Qwen3-8B --dtype auto --host 0.0.0.0 --port 8000 --tensor-parallel-size 3 --max-model-len 7168
vllm serve snap-stanford/humanlm-opinions --dtype auto --host 0.0.0.0 --port 63456 --tensor-parallel-size 2 --max-model-len 7168

# Launch the Gradio annotation interface
cd user_study
python gradio_app.py  # add --debug to skip validation constraints
```
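Before launching the interface, you can sanity-check that a server is reachable through vLLM's OpenAI-compatible API. The snippet below is only an illustrative smoke test (the prompt and token limit are arbitrary); it queries the base model on port 8000 with the `openai` client:

```python
# Illustrative smoke test against the vLLM OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM does not check the key

response = client.chat.completions.create(
    model="Qwen/Qwen3-8B",  # must match the served model name
    messages=[{"role": "user", "content": "Reply with one short sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```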
The VERL recipe for HumanLM training is maintained as a git submodule at `humanlm_train/verl-recipe-humanlm`.

If you cloned this repository without submodules, run:

```bash
git submodule update --init --recursive
```

For first-time setup, run:

```bash
git clone --recurse-submodules https://github.com/zou-group/humanlm.git
```

The HumanLM-specific training code and setup instructions are in `humanlm_train/verl-recipe-humanlm/humanlm/README.md`.
```bibtex
@article{wu2026humanlm,
  title={HUMANLM: Simulating Users with State Alignment Beats Response Imitation},
  url={https://humanlm.stanford.edu/},
  author={Wu, Shirley and Choi, Evelyn and Khatua, Arpandeep and
          Wang, Zhanghan and He-Yueya, Joy and Weerasooriya, Tharindu Cyril and
          Wei, Wei and Yang, Diyi and Leskovec, Jure and Zou, James},
  year={2026}
}
```