We introduce the Generative Universal Verifier, a novel concept and plugin for next-generation multimodal reasoning in vision-language models and unified multimodal models, providing the fundamental capability of reflecting on and refining visual outcomes during the reasoning and generation process.
- ViVerBench: a comprehensive benchmark spanning 16 categories of critical tasks for evaluating visual outcomes in multimodal reasoning.
- OmniVerifier-7B: the first omni-capable generative verifier, trained on large-scale visual verification data for universal visual verification, achieving notable gains on ViVerBench (+8.3).
- OmniVerifier-TTS, a sequential test-time scaling paradigm that leverages the universal verifier to bridge image generation and editing within unified models, enhancing the upper bound of generative ability through iterative fine-grained optimization.
OmniVerifier advances both reliable reflection during generation and scalable test-time refinement, marking a step toward more trustworthy and controllable next-generation reasoning systems.
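Conceptually, sequential OmniVerifier-TTS is a verify-then-edit loop: generate an initial image, ask the verifier whether it is strictly aligned with the prompt, and, if not, edit the image based on the verifier's feedback. The snippet below is a toy sketch of that loop, not the released implementation — `generate`, `verify`, and `edit` are hypothetical stand-ins for the unified model (e.g. Qwen-Image) and OmniVerifier-7B:

```python
# Toy sketch of sequential test-time scaling (TTS) with a generative verifier.
# generate(), verify(), and edit() are hypothetical placeholders, not real APIs.

def generate(prompt):
    # Stand-in for the unified model's text-to-image call; we mock an image
    # as a dict carrying its remaining misalignments with the prompt.
    return {"prompt": prompt, "flaws": ["wrong color", "missing object"]}

def verify(image, prompt):
    # Stand-in for OmniVerifier: returns (aligned?, explanation/feedback).
    if image["flaws"]:
        return False, image["flaws"][0]
    return True, "image is strictly aligned with the prompt"

def edit(image, feedback):
    # Stand-in for the unified model's image-editing call: fix one flaw.
    fixed = dict(image)
    fixed["flaws"] = [f for f in image["flaws"] if f != feedback]
    return fixed

def sequential_tts(prompt, max_steps=4):
    image = generate(prompt)              # step-0 image
    for _ in range(max_steps):
        aligned, feedback = verify(image, prompt)
        if aligned:                       # verifier accepts: stop early
            break
        image = edit(image, feedback)     # fine-grained iterative refinement
    return image

result = sequential_tts("a red cube on a blue table")
print(result["flaws"])  # prints []
```

Each iteration repairs one verifier-flagged flaw, so the loop terminates either when the verifier accepts the image or when the step budget is exhausted.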
[2025.11] Inference code of two automated pipelines for visual verifier data construction is released.
[2025.10] Inference code of Sequential OmniVerifier-TTS (based on Qwen-Image) is released.
[2025.10] Evaluation code of ViVerBench is released.
[2025.10] Training code of OmniVerifier is released.
- Two automated data construction pipelines
- Sequential OmniVerifier-TTS on different backbones
- Parallel OmniVerifier-TTS
```shell
git clone https://github.com/Cominclip/OmniVerifier.git
cd OmniVerifier
pip install -e .
```

Use the following command to test OmniVerifier-7B on a generated image:
```shell
python inference.py
```

Please modify `image_path` and `prompt` to your own settings.
The model will output both an answer and an explanation indicating whether the image is strictly aligned with the given prompt.
We provide two evaluation approaches: rule-based and model-based. As a first step, store the model outputs in a JSON file such as your_model.json.
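The exact schema of this file is defined by the evaluation scripts; as a purely hypothetical illustration (field names assumed, adjust to match the scripts), a response file might be written like this:

```python
# Hypothetical layout of your_model.json — the actual field names are
# defined by the ViVerBench evaluation scripts; adjust accordingly.
import json

responses = [
    {
        "id": "viverbench_0001",  # question identifier (assumed field)
        "response": "No. The object count does not match the prompt.",
    },
]

with open("your_model.json", "w") as f:
    json.dump(responses, f, indent=2)
```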
For rule-based evaluation:
```shell
python viverbench_eval_rule_based.py --model_response your_model.json
```

For model-based evaluation, we use GPT-4.1 as the judge model:
```shell
python viverbench_eval_model_based.py --model_response your_model.json
```

We apply DAPO to directly train Qwen2.5VL-7B without a cold start:
```shell
bash examples/qwen2_5_vl_7b_dapo.sh
```

After training, merge the checkpoint into Hugging Face format:
```shell
python3 scripts/model_merger.py --local_dir checkpoints/omniverifier/exp_name/global_step_1/actor
```

We provide the code for sequential OmniVerifier-TTS using Qwen-Image. First generate the step-0 image, then use this script to iteratively self-refine:
```shell
python sequential_omniverifier_tts.py
```

```bibtex
@article{zhang2025generative,
  title={Generative Universal Verifier as Multimodal Meta-Reasoner},
  author={Zhang, Xinchen and Zhang, Xiaoying and Wu, Youbin and Cao, Yanbin and Zhang, Renrui and Chu, Ruihang and Yang, Ling and Yang, Yujiu},
  journal={arXiv preprint arXiv:2510.13804},
  year={2025}
}
```
OmniVerifier is built upon several solid works. Thanks to EasyR1 and veRL for their wonderful work and codebases!