Skip to content

JiHoonLee9898/RVCD

Repository files navigation

This repository contains code for our ACL 2025 findings accepted paper:

arXiv Retrieval Visual Contrastive Decoding to Mitigate Object Hallucinations in Large Vision-Language Models

main

Glimpse

If your goal here is to briefly verify the implementation of RVCD's overall concept, observe the following file:
/MAIN_CODES/rvcd_generation_chair_bleu.py

Install

git clone https://github.com/JiHoonLee9898/RVCD.git
cd RVCD
conda env create -f environment.yml
conda activate RVCD
cd MAIN_CODES

LVLM backbones setting

Download https://huggingface.co/liuhaotian/llava-v1.5-7b and specify it on line 14 of the following files:

MAIN_CODES/eval_configs/prior_decoding_yamls/not_rvcd_llava.yaml
MAIN_CODES/eval_configs/llava-1.5_eval.yaml

Download https://huggingface.co/Vision-CAIR/vicuna-7b and specify it on line 18 of the following files:

MAIN_CODES/minigpt4/configs/models/minigpt4_vicuna0.yaml
MAIN_CODES/minigpt4/configs/models/not_rvcd_minigpt4_vicuna0.yaml

Download https://huggingface.co/MAGAer13/mplug-owl2-llama2-7b and specify it on line 14 of the following files:

MAIN_CODES/minigpt4/configs/models/mplug_owl2_llama2.yaml
MAIN_CODES/minigpt4/configs/models/not_rvcd_mplug_owl2_llama2.yaml
MAIN_CODES/eval_configs/mplug-owl2_eval.yaml
MAIN_CODES/eval_configs/prior_decoding_yamls/not_rvcd_mplug_owl2.yaml

Specify the absolute path to RVCD/MAIN_CODES/prerained_minigpt4_7b.pth on line 8 of the following files:

MAIN_CODES/eval_configs/minigpt4_eval.yaml
MAIN_CODES/eval_configs/prior_decoding_yamls/not_rvcd_mini_gpt4_vicuna0.yaml

DINO for HALC

export CUDA_HOME=$CONDA_PREFIX
cd decoder_zoo/GroundingDINO
pip install -e .
cd ../..

Save the file downloaded from https://drive.google.com/drive/folders/1UaMJga-BKju88CXAdonbiQujBKkdcVGX to:

MAIN_CODES/decoder_zoo/GroundingDINO/weights/groundingdino_swint_ogc.pth

Arguments

Refer to the example in RVCD/MAIN_CODES/run_example.sh. Each block in the file (a total of 6) generates output captions for evaluating CHAIR/BLEU, POPE, and MME for RVCD and prior methods. Please note that this is just an example; in practice, you need to provide the absolute paths specific to your environment for each argument.

RVCD Arguments detail

Argument Default Description
--model llava-1.5 Model to use. Options: [llava-1.5, minigpt4, mplug-owl2].

--ref_folder_path
The absolute path to:

RVCD/DB_single_concept_images_flux_generated/generated_images

--data_path
The absolute path to:

/coco2014 

--data_path is [COCO_DIR].

Note that [COCO_DIR] is expected to contain both images and annotation files within the annotations subfolder. In other words, [COCO_DIR] must follow the structure:

COCO_DIR (val2014 for example)
├── annotations
│   ├── captions_val2014.json
│   ├── instances_train2014.json
│   ├── instances_val2014.json
│   ├── person_keypoints_train2014.json
│   ├── person_keypoints_val2014.json
├── COCO_val2014_000000000042.jpg
├── COCO_val2014_000000000073.jpg
...

--coco_path
For required case (rvcd for mme), likewise, the absolute path to /coco2014 ([COCO_DIR]).

--chair_cache_path
The absolute path to:

RVCD/MAIN_CODES/eval/CHAIR_CACHE/chair.pkl

--yolo_version
The default detector is 'yolov8x.pt' from ultralytics(https://github.com/ultralytics).

Argument Default Description
--rvcd_alpha 1 Negative logits regulatory parameter. default: 1
--rvcd_beta 0.1 Positive logits recovery parameter. default: 0.1

Prior decoding methods Arguments detail

Argument Description
--model Choose one from: [not_rvcd_llava, not_rvcd_mini_gpt4, not_rvcd_mplug_owl2].
-d Choose one from: [greedy, dola, halc, opera, vcd, beam].

--data_paths Note the difference from --data_path. Provide the absolute path to the MME benchmark dataset downloaded from (https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation).

HALC Additional Arguments

Argument Default Description
--k-candidate-num 4 Number of generative focal fields for local search.
--expand-ratio 0.6 The growing factor of focal fields.
--detector dino Detector to use. Options: [dino, owlv2].
--box_threshold 0.4 The threshold for bounding box in GroundingDino.

OPERA Additional Arguments

Argument Default Description
--scale_factor 50 The scale factor to scale up the self-attention weights.
--threshold 15 The threshold for attending retrospection.
--num_attn_candidates 5 The number of candidates per beam.
--penalty_weights 1 The weight of penalty term in decoding.
--beam 3 beam size.

VCD Additional Arguments

Argument Default Description
--cd-alpha 1 Amplification factor.
--cd-beta 0.1 Truncation factor for adaptive plausibility constraint.
--noise-step 500 Number of steps to add diffusion noise.

EVALUATION

Refer to the specific examples in RVCD/MAIN_CODES/run_example.sh. Each block within the file (a total of 6) generates output captions for evaluating CHAIR/BLEU, POPE, and MME for both RVCD and prior methods. The evaluation assumes the existence of the generated captions.

CHAIR/BLEU EVALUATION

First, run the following command to check the path of your activated RVCD virtual environment:

echo $CONDA_PREFIX

Then, navigate to the following path by appending the output path with:

/lib/python3.9/site-packages/pycocoevalcap/eval.py

Once you have opened eval.py, modify lines 40~46 as follows. This modification is intended to evaluate only Chair and BLEU, reducing the evaluation time:

scorers = [
            (Bleu(4), ["Bleu_1", "Bleu_2", "Bleu_3", "Bleu_4"]),
            # (Meteor(),"METEOR"),
            # (Rouge(), "ROUGE_L"),
            # (Cider(), "CIDEr"),
            # (Spice(), "SPICE")
        ]

Place the generated CHAIR/BLEU caption JSONL file under the [absolute path of eval/test_folder]. Then, run the following command to generate the _chair.json file.

python eval/caption_to_chair2.py --gt-caption-path [absolute path to coco2014/annotations/captions_val2014.json] -c [absolute path of eval/test_folder]

Let the path to the generated _chair.json file be referred to as [chair_path].

Finally, perform the evaluation by running:

python eval/eval_hallucination.py -v --metric chair --chair_input_path `[chair_path]`

POPE EVALUATION

The POPE evaluation results are saved alongside the path where the POPE captions are generated.

MME EVALUATION

Let the folder location where the generated MME captions are stored be referred to as [mme_path]. Run the following command to perform the evaluation:

python eval/mme_tool/calculation.py --results_dir [absolute path to MAIN_CODES/eval/mme_tool/my_final_results] --captions_dir [mme_path]

License

This repository is under BSD 3-Clause License(https://github.com/salesforce/LAVIS/blob/main/LICENSE.txt). Many codes are based on Lavis(https://github.com/salesforce/LAVIS), and https://github.com/BillChan226/HALC.

About

[ACL findings 2025] "Retrieval Visual Contrastive Decoding to Mitigate Object Hallucinations in Large Vision-Language Models"

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors