Generative Universal Verifier as Multimodal Meta-Reasoner

[ICLR 2026 Oral]

Paper on arXiv: https://arxiv.org/abs/2510.13804

Introduction

We introduce the Generative Universal Verifier, a novel concept and plugin for next-generation multimodal reasoning in vision-language models and unified multimodal models. It provides the fundamental capability to reflect on and refine visual outcomes during reasoning and generation.

  • ViVerBench: a comprehensive benchmark spanning 16 categories of critical tasks for evaluating visual outcomes in multimodal reasoning.
  • OmniVerifier-7B: the first omni-capable generative verifier for universal visual verification. Trained on large-scale visual verification data, it achieves notable gains on ViVerBench (+8.3).
  • OmniVerifier-TTS: a sequential test-time scaling paradigm that leverages the universal verifier to bridge image generation and editing within unified models, raising the upper bound of generative ability through iterative fine-grained optimization.

OmniVerifier advances both reliable reflection during generation and scalable test-time refinement, marking a step toward more trustworthy and controllable next-generation reasoning systems.

New Updates

[2025.11] Inference code for the two automated visual verifier data construction pipelines is released.

[2025.10] Inference code for sequential OmniVerifier-TTS (based on Qwen-Image) is released.

[2025.10] Evaluation code for ViVerBench is released.

[2025.10] Training code for OmniVerifier is released.

TODO

  • Two automated data construction pipelines
  • Sequential OmniVerifier-TTS on different backbones
  • Parallel OmniVerifier-TTS

Installation

git clone https://github.com/Cominclip/OmniVerifier.git
cd OmniVerifier
pip install -e .

Quick Start: Generated Image Verification

Use the following command to test OmniVerifier-7B on a generated image:

python inference.py

Please modify image_path and prompt in inference.py to your own settings.

The model will output both an answer and an explanation indicating whether the image is strictly aligned with the given prompt.
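
For reference, the sketch below shows what such a verification call can look like, assuming OmniVerifier-7B follows the standard Qwen2.5-VL chat interface in transformers. The Hugging Face model ID and the verification prompt template are assumptions, not the repo's exact code; inference.py remains the authoritative entry point.

# Minimal sketch of a verification call, assuming OmniVerifier-7B follows
# the standard Qwen2.5-VL chat interface. The model ID and prompt template
# below are assumptions; see inference.py for the exact implementation.
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "Cominclip/OmniVerifier-7B"  # assumed Hugging Face model ID
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image_path = "generated.png"            # your generated image
prompt = "A red cube on a blue table"   # the text-to-image prompt to verify

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image_path},
        {"type": "text", "text": (
            f"Is this image strictly aligned with the prompt '{prompt}'? "
            "Answer yes or no, then explain."
        )},
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
images, videos = process_vision_info(messages)
inputs = processor(text=[text], images=images, videos=videos,
                   padding=True, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
print(processor.batch_decode(output_ids[:, inputs.input_ids.shape[1]:],
                             skip_special_tokens=True)[0])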

Part 1: ViVerBench Evaluation

We provide two evaluation approaches: rule-based and model-based. As a first step, store the model outputs in a JSON file such as your_model.json.
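
The snippet below sketches one way to assemble that file. The exact schema expected by the evaluation scripts is defined in the repo; the field names here ("id", "response") and the load_viverbench_samples and run_model helpers are purely illustrative assumptions.

# Hypothetical sketch for collecting responses into your_model.json.
# The field names and helper functions are illustrative assumptions;
# check the evaluation scripts for the real schema.
import json

results = []
for sample in load_viverbench_samples():   # hypothetical data loader
    results.append({
        "id": sample["id"],
        "response": run_model(sample),     # your model's answer for this sample
    })

with open("your_model.json", "w") as f:
    json.dump(results, f, indent=2, ensure_ascii=False)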

For rule-based evaluation:

python viverbench_eval_rule_based.py --model_response your_model.json

For model-based evaluation, we use GPT-4.1 as the judge model:

python viverbench_eval_model_based.py --model_response your_model.json

Part 2: OmniVerifier RL Training

We apply DAPO to train Qwen2.5-VL-7B directly, without a cold start:

bash examples/qwen2_5_vl_7b_dapo.sh

After training, merge the checkpoint into Hugging Face format:

python3 scripts/model_merger.py --local_dir checkpoints/omniverifier/exp_name/global_step_1/actor
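
Once merged, the checkpoint should load like any Hugging Face model. In the sketch below, the output directory is an assumption; it depends on where model_merger.py writes the merged weights.

# Hypothetical load of the merged checkpoint; the directory below is an
# assumption about where model_merger.py writes its output.
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor

merged_dir = "checkpoints/omniverifier/exp_name/global_step_1/actor/huggingface"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(merged_dir, torch_dtype="auto")
processor = AutoProcessor.from_pretrained(merged_dir)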

Part 3: OmniVerifier-TTS

We provide the code for sequential OmniVerifier-TTS based on Qwen-Image. First generate the step-0 image, then use this script for iterative self-refinement:

python sequential_omniverifier_tts.py
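
Conceptually, the sequential loop alternates generation, verification, and editing. The sketch below is illustrative only: generate, verify, and edit are hypothetical placeholders for Qwen-Image generation, OmniVerifier-7B verification, and Qwen-Image editing; see sequential_omniverifier_tts.py for the actual implementation.

# Conceptual sketch of sequential OmniVerifier-TTS. The generate, verify,
# and edit helpers are hypothetical placeholders, not the repo's API.
def sequential_tts(prompt, max_steps=4):
    image = generate(prompt)                       # step-0 image (Qwen-Image)
    for _ in range(max_steps):
        aligned, feedback = verify(image, prompt)  # OmniVerifier-7B judgment
        if aligned:
            break                                  # stop once the image passes
        image = edit(image, feedback)              # targeted fix via editing
    return image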

Citation

@article{zhang2025generative,
  title={Generative Universal Verifier as Multimodal Meta-Reasoner},
  author={Zhang, Xinchen and Zhang, Xiaoying and Wu, Youbin and Cao, Yanbin and Zhang, Renrui and Chu, Ruihang and Yang, Ling and Yang, Yujiu},
  journal={arXiv preprint arXiv:2510.13804},
  year={2025}
}

Acknowledgements

OmniVerifier is built upon several solid works. Thanks to EasyR1 and veRL for their wonderful work and codebases!
