handsomewhy/VIGA

VIGA: Visual-Text Interaction with Guided Attention Model for Multimodal Zero-Shot Anomaly Detection

Introduction

Zero-shot anomaly detection (ZSAD) methods can effectively address the difficulty and scarcity of data collection in industrial scenarios. However, single-modal detection is not comprehensive, as it fails to capture complementary information across different modalities. Hence, we propose the Visual-Text Interaction with Guided Attention model (VIGA), a multimodal zero-shot anomaly detection (MM-ZSAD) method that identifies anomalies from diverse data sources. In this framework, VIGA introduces a Tripartite Interactive Prompt (TIP) module that reduces redundancy and enables adaptive alignment of multi-view and multimodal features. Meanwhile, we facilitate interaction between global and local visual features and their respective textual prompts, further refining the alignment between vision and language. To address the attention dispersion inherent in unconstrained learning, we propose a Mask Guided Attention Shaping (MGAS) strategy, which incorporates prior semantic knowledge to provide explicit guidance and enhance model focus. VIGA achieves state-of-the-art performance on the MM-ZSAD task across the MVTec3D-AD and Eyecandies datasets, demonstrating its superiority in detecting unseen object categories.

(Figure: visualization)

Motivation

(Figure: analysis)

Framework of VIGA

(Figure: overview)

How to Run

Prepare your dataset

Download the dataset below:

We prepare the rendered images of MVTec3D-AD and Eyecandies following the method proposed in PointAD.

| Dataset | Original version | Rendering version (BaiDu Disk) | Rendering version (Google Drive) |
|---|---|---|---|
| MVTec3D-AD | Ori | BaiDu Disk | [Google Drive] |
| Eyecandies | Ori | BaiDu Disk | [Google Drive] |

Generate the dataset JSON

Take MVTec3D-AD as an example (with multiple anomaly categories).

Structure of the MVTec3D-AD folder:

mvtec3d-ad/
├── bagel/
│   ├── test/
│   │   ├── combined/
│   │   │   ├── 2d_3d_cor    # point-to-pixel correspondence
│   │   │   │   ├── 000
│   │   │   │   ├── 001
│   │   │   │   └── ...
│   │   │   ├── 2d_gt        # generated 2D ground truth
│   │   │   ├── 2d_rendering # generated 2D renderings
│   │   │   ├── gt           # 3D ground truth (png format)
│   │   │   ├── gt_pcd       # 3D ground truth (pcd format)
│   │   │   ├── pcd          # 3D point cloud (pcd format)
│   │   │   ├── rgb          # RGB information (pcd format)
│   │   │   └── xyz          # 3D point cloud (tiff format)
│   │   └── crack/
│   │       └── ...
│   └── ...
└── ...
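
Before generating the JSON, it can help to confirm that the downloaded data matches the folder layout above. The following is a minimal sketch, not part of the VIGA codebase: the helper names `missing_subdirs` and `check_dataset` are hypothetical, and it assumes that every anomaly-category folder (`combined/`, `crack/`, ...) contains the same eight sub-folders shown for `combined/`.

```python
from pathlib import Path

# Sub-folders listed in the tree above for an anomaly category.
# Assumption: every anomaly category under <class>/test/ has the same set.
EXPECTED_SUBDIRS = (
    "2d_3d_cor", "2d_gt", "2d_rendering",
    "gt", "gt_pcd", "pcd", "rgb", "xyz",
)


def missing_subdirs(category_dir):
    """Return the expected sub-folder names that are absent from category_dir."""
    category_dir = Path(category_dir)
    return [name for name in EXPECTED_SUBDIRS
            if not (category_dir / name).is_dir()]


def check_dataset(root, split="test"):
    """Map 'class/split/anomaly' to its missing sub-folders.

    Only incomplete anomaly folders appear in the result, so an empty
    dict means the layout matches the expected structure.
    """
    root = Path(root)
    report = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        split_dir = class_dir / split
        if not split_dir.is_dir():
            continue  # class folder without the requested split
        for anomaly_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            missing = missing_subdirs(anomaly_dir)
            if missing:
                key = f"{class_dir.name}/{split}/{anomaly_dir.name}"
                report[key] = missing
    return report
```

Calling `check_dataset("mvtec3d-ad")` on a complete download should return an empty dict under the assumption above; any entries point to folders that still need to be rendered or copied.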

(Optional) We also provide the rendering script here if you want to render point clouds into your customized 2D renderings.

Generate the class-specific JSON for training, and the JSON of all classes for testing. The JSON can be found in the corresponding dataset folder.

cd generate_dataset_json
python mvtec_3d_anomaly_mvtect_3d_ad_whole.py

Run VIGA

  • Quick start (one_vs_rest)
bash test.sh
  • Quick start (cross_dataset)
bash test_cross_dataset.sh

Main results

(Figure: visualization)

We evaluate VIGA in two zero-shot settings:

(1) One-vs-Rest

We train VIGA on a single class from the dataset and test its performance on the remaining classes. To ensure completeness of the result, we train VIGA three times using three distinct classes and report the averaged detection and segmentation performance.
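
The one-vs-rest protocol described above can be sketched as follows. This is an illustrative outline only and does not reproduce what `test.sh` does: the function names `one_vs_rest_splits` and `average_runs` are hypothetical, and the metric dictionaries are assumed to be produced by whatever evaluation code is in use.

```python
from statistics import mean


def one_vs_rest_splits(classes, train_classes):
    """Yield (train_class, test_classes): train on one class, test on the rest."""
    for train_class in train_classes:
        if train_class not in classes:
            raise ValueError(f"unknown class: {train_class}")
        yield train_class, [c for c in classes if c != train_class]


def average_runs(run_metrics):
    """Average each metric across runs.

    run_metrics is a list of {metric_name: value} dicts, one per training
    class; all dicts are assumed to share the same keys.
    """
    names = run_metrics[0].keys()
    return {name: mean(run[name] for run in run_metrics) for name in names}


if __name__ == "__main__":
    # Illustrative subset of class names; substitute the dataset's full list.
    classes = ["bagel", "carrot", "cookie", "peach"]
    for train_class, test_classes in one_vs_rest_splits(classes, classes[:3]):
        print(f"train on {train_class}, test on {test_classes}")
```

With three training classes, `average_runs` would receive three metric dictionaries and return one averaged dictionary, matching the averaged detection and segmentation numbers reported for this setting.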

(Figure: industrial)

(2) Cross-Dataset

We train VIGA on one class and test its performance on a completely different dataset with no overlap in class semantics.
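
As a rough illustration of the cross-dataset setting, the sketch below builds an evaluation plan in which a single class of one dataset is used for training and every class of the other datasets is used for testing. The function name `cross_dataset_plan` and the dictionary layout are assumptions made for this example; they are not taken from `test_cross_dataset.sh`.

```python
def cross_dataset_plan(datasets, train_dataset, train_class):
    """Return (train_spec, test_specs) for the cross-dataset setting.

    datasets maps a dataset name to its list of class names.
    train_spec is (dataset, class); test_specs lists (dataset, class)
    pairs for every class of every *other* dataset.
    """
    if train_class not in datasets[train_dataset]:
        raise ValueError(f"{train_class!r} is not a class of {train_dataset!r}")
    test_specs = [
        (name, cls)
        for name, class_list in datasets.items()
        if name != train_dataset  # never test on the training dataset
        for cls in class_list
    ]
    return (train_dataset, train_class), test_specs


if __name__ == "__main__":
    # Illustrative subsets of class names only.
    datasets = {
        "MVTec3D-AD": ["bagel", "carrot"],
        "Eyecandies": ["CandyCane", "GummyBear"],
    }
    train_spec, test_specs = cross_dataset_plan(datasets, "MVTec3D-AD", "bagel")
    print("train:", train_spec)
    print("test:", test_specs)
```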

(Figure: industrial)
