VIGA: Visual-Text Interaction with Guided Attention Model for Multimodal Zero-shot Anomaly Detection

Introduction

Zero-shot anomaly detection (ZSAD) methods can effectively address the difficulty and scarcity of data collection in industrial scenarios. However, single-modal detection is not comprehensive, as it fails to capture complementary information across different modalities. Hence, we propose the Visual-Text Interaction with Guided Attention model (VIGA), a multimodal zero-shot anomaly detection (MM-ZSAD) method that identifies anomalies using diverse data sources. In this framework, VIGA introduces a Tripartite Interactive Prompt (TIP) module that reduces redundancy and enables adaptive alignment of multi-view and multimodal features. Meanwhile, we facilitate the interaction between global and local visual features and their respective textual prompts, thereby further refining the alignment between vision and language. To meet the challenge of attention dispersion inherent in unconstrained learning, we propose a Mask Guided Attention Shaping (MGAS) strategy, which incorporates prior semantic knowledge to provide explicit guidance and enhance model focus. VIGA achieves state-of-the-art performance on the MM-ZSAD task across the MVTec3D-AD and Eyecandies datasets, demonstrating its strength in detecting unseen object categories.

Motivation

Framework of VIGA

How to Run

Prepare your dataset

Download the dataset below:

We prepared the rendered images of MVTec3D-AD and Eyecandies following the method proposed in PointAD.

Dataset	Original version	Rendering version (BaiDu Disk)	Rendering version (Google Drive)
MVTec3D-AD	Ori	BaiDu Disk	Google Drive
Eyecandies	Ori	BaiDu Disk	Google Drive

Generate the dataset JSON

Take MVTec3D-AD as an example (it has multiple anomaly categories).

Structure of the MVTec3D-AD folder:

mvtec3d-ad/
├── bagel/
│   ├── test/
│   │   ├── combined/
│   │   │   ├── 2d_3d_cor    # point-to-pixel correspondence
│   │   │   │   ├── 000
│   │   │   │   ├── 001
│   │   │   │   └── ...
│   │   │   ├── 2d_gt        # generated 2D ground truth
│   │   │   ├── 2d_rendering # generated 2D renderings
│   │   │   ├── gt           # 3D ground truth (png format)
│   │   │   ├── gt_pcd       # 3D ground truth (pcd format)
│   │   │   ├── pcd          # 3D point cloud (pcd format)
│   │   │   ├── rgb          # RGB information (pcd format)
│   │   │   └── xyz          # 3D point cloud (tiff format)
│   │   └── crack/
│   │       └── ...
│   └── ...
└── ...

(Optional) We also provide the rendering script here if you want to render point clouds into your customized 2D renderings.

Generate the class-specific JSON for training, and the JSON of all classes for testing. The JSON can be found in the corresponding dataset folder.

cd generate_dataset_json
python mvtec_3d_anomaly_mvtect_3d_ad_whole.py
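As a rough illustration of what this step does, the sketch below walks the folder layout shown above and writes one record per file found under each `rgb` folder. It is not the repository's script: the field names (`cls_name`, `anomaly_type`, `rgb_path`), the output file name `meta_sketch.json`, and the helpers `collect_test_samples` and `write_meta` are all hypothetical, and the real JSON schema may differ.

```python
import json
from pathlib import Path


def collect_test_samples(root):
    """List one record per file under <root>/<class>/test/<anomaly>/rgb.

    The record fields are illustrative assumptions, not the schema
    used by the repository's own generation script.
    """
    root = Path(root)
    records = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        test_dir = class_dir / "test"
        if not test_dir.is_dir():
            continue
        samples = []
        for anomaly_dir in sorted(p for p in test_dir.iterdir() if p.is_dir()):
            rgb_dir = anomaly_dir / "rgb"
            if not rgb_dir.is_dir():
                continue
            for item in sorted(rgb_dir.iterdir()):
                samples.append({
                    "cls_name": class_dir.name,
                    "anomaly_type": anomaly_dir.name,
                    "rgb_path": item.relative_to(root).as_posix(),
                })
        records[class_dir.name] = samples
    return records


def write_meta(root, out_name="meta_sketch.json"):
    """Write the collected records into the dataset folder as JSON."""
    meta = {"test": collect_test_samples(root)}
    out_path = Path(root) / out_name
    out_path.write_text(json.dumps(meta, indent=2))
    return out_path
```

Calling `write_meta("mvtec3d-ad")` on a folder laid out as above would produce a single JSON file covering all classes; a class-specific file could be obtained by filtering the returned dictionary on one class name.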

Run VIGA

Quick start (one_vs_rest)

bash test.sh

Quick start (cross_dataset)

bash test_cross_dataset.sh

Main results

We evaluate VIGA in two zero-shot settings:

(1) One-vs-Rest

We train VIGA on a single class from the dataset and test its performance on the remaining classes. To ensure completeness of the result, we train VIGA three times using three distinct classes and report the averaged detection and segmentation performance.
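The protocol above can be sketched as a small loop. This is only an outline under stated assumptions: `train_and_eval` is a hypothetical callback standing in for the actual training and testing code, and it is assumed to return a dictionary of metric names to values for one run.

```python
from statistics import mean


def one_vs_rest(classes, train_classes, train_and_eval):
    """Average metrics over several one-vs-rest runs.

    classes:        all class names in the dataset
    train_classes:  the classes used for training, one per run
    train_and_eval: hypothetical callback (train_cls, test_classes) -> dict
                    of metric name to value for that run
    """
    if not train_classes:
        raise ValueError("at least one training class is required")
    per_run = []
    for train_cls in train_classes:
        # Test on every class except the one used for training.
        test_classes = [c for c in classes if c != train_cls]
        per_run.append(train_and_eval(train_cls, test_classes))
    # Report the mean of each metric across runs.
    return {k: mean(run[k] for run in per_run) for k in per_run[0]}
```

With three entries in `train_classes`, the returned dictionary holds the metrics averaged over the three runs, matching the reporting described above.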

(2) Cross-Dataset

We train VIGA on one class from one dataset and test its performance on a completely different dataset with no overlap in class semantics.
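A minimal sketch of how such a split could be constructed is given below. The helper `cross_dataset_split` is hypothetical and not part of the repository; its overlap check compares class names only, which is a crude proxy for the semantic overlap the setting is meant to rule out.

```python
def cross_dataset_split(datasets, train_dataset, train_class):
    """Build a cross-dataset split.

    datasets: dict mapping dataset name -> list of class names
    Returns the (dataset, class) pair used for training and the list of
    (dataset, class) pairs from every other dataset used for testing.
    """
    if train_class not in datasets[train_dataset]:
        raise ValueError(f"{train_class!r} is not a class of {train_dataset!r}")
    test_pairs = [
        (name, cls)
        for name, classes in datasets.items()
        if name != train_dataset
        for cls in classes
    ]
    # Name-based check only: identical class names across datasets are
    # rejected, but semantically similar classes are not detected.
    train_names = {c.lower() for c in datasets[train_dataset]}
    overlap = sorted(c for _, c in test_pairs if c.lower() in train_names)
    if overlap:
        raise ValueError(f"class names shared across datasets: {overlap}")
    return (train_dataset, train_class), test_pairs
```

For example, with the illustrative subsets `{"MVTec3D-AD": ["bagel", "carrot"], "Eyecandies": ["GummyBear", "Lollipop"]}`, training on `("MVTec3D-AD", "bagel")` yields the two Eyecandies classes as the test set.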

We thank the authors of the PointAD and AnomalyCLIP code repositories.

