Official implementation of uLLSAM. Our paper is available at https://arxiv.org/abs/2505.10769.
2025/04/21: The first version of uLLSAM has been released. The model weights are also available; see README.md. Currently we support InternLM-1.8B.
2025/12/15: We now support Qwen3-2B as the LLM.
# clone the repository to your disk
cd ./uLLSAM
conda env create -f environment.yml
conda activate ullsam

If you encounter unexpected errors, you can also refer to the installation instructions of InternVL and SAM to set up your own environment.
Please follow the README.md in the checkpoints folder.
python app.py

You can then visit the application at localhost:9996 in your browser. Chrome is recommended.

To reproduce uLLSAM, train the model with ./data/train_seg_all.jsonl. You need to prepare 9 datasets first.
Refer to torch_em for downloading and preparing the datasets.
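
Before launching a long training run, it can help to confirm that every image listed in the training file is actually on disk. The following is a minimal sketch, not part of the uLLSAM codebase: the helper name `find_missing_images` and the `root` parameter are hypothetical, and it assumes that relative `image_path` values are resolved against the directory you pass as `root`.

```python
import json
import os


def find_missing_images(jsonl_path, root="."):
    """Return (total, missing) for a JSONL training file.

    `missing` is a list of (line_number, image_path) pairs whose image
    file could not be found. Blank lines are skipped.
    """
    total = 0
    missing = []
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            total += 1
            image_path = record["image_path"]
            # Assumption: relative paths are resolved against `root`.
            if os.path.isabs(image_path):
                full_path = image_path
            else:
                full_path = os.path.join(root, image_path)
            if not os.path.isfile(full_path):
                missing.append((line_no, image_path))
    return total, missing


if __name__ == "__main__":
    train_file = "./data/train_seg_all.jsonl"
    if os.path.isfile(train_file):
        total, missing = find_missing_images(train_file)
        print(f"{total} records, {len(missing)} missing images")
        # Show only the first few problems to keep the output short.
        for line_no, path in missing[:20]:
            print(f"  line {line_no}: {path}")
    else:
        print(f"{train_file} not found; run this from the repository root")
```

If the script reports missing images, the corresponding dataset has probably not been downloaded or is stored under a different directory than the paths in the JSONL file expect.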
bash ./scripts/train_all_joint_v2.sh

If you want to finetune on your custom data, follow the data structure in ./data/train_seg_all.jsonl.
Specifically, each line of the JSONL file is a JSON object structured as {"image_path": "...", "conversation": [{"role": "user", "content": "Describe the image in detail\n"}, {"role": "assistant", "content": ""}]}
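
As an illustration of this structure, the sketch below builds and validates records for a custom dataset. The helper names (`make_record`, `validate_record`, `write_jsonl`) and the example image path are hypothetical and not part of uLLSAM. The field names and the default prompt are copied from the example line above. The assistant content is left empty as in that example, since what the training script expects there is not stated here.

```python
import json


def make_record(image_path, prompt="Describe the image in detail\n", answer=""):
    """Build one training record with a single user/assistant exchange."""
    return {
        "image_path": image_path,
        "conversation": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": answer},
        ],
    }


def validate_record(record):
    """Raise ValueError if a record does not match the documented layout."""
    if not isinstance(record.get("image_path"), str):
        raise ValueError("image_path must be a string")
    conversation = record.get("conversation")
    if not isinstance(conversation, list) or not conversation:
        raise ValueError("conversation must be a non-empty list")
    for turn in conversation:
        if not isinstance(turn, dict):
            raise ValueError("each conversation turn must be an object")
        if turn.get("role") not in ("user", "assistant"):
            raise ValueError("role must be 'user' or 'assistant'")
        if not isinstance(turn.get("content"), str):
            raise ValueError("content must be a string")


def write_jsonl(records, out_path):
    """Validate each record and write it as one JSON object per line."""
    with open(out_path, "w", encoding="utf-8") as f:
        for record in records:
            validate_record(record)
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    # Hypothetical image path, for illustration only.
    example = make_record("path/to/your/image.png")
    validate_record(example)
    print(json.dumps(example))
```

Calling `write_jsonl(records, "your_custom_data.jsonl")` with a list of such records produces a file in the same one-object-per-line layout as ./data/train_seg_all.jsonl.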
uLLSAM is heavily inspired by many outstanding prior works.
We thank the authors of these projects for open-sourcing their assets!