This repository provides a dataset of synthetic visual concepts generated using zero-shot Text-to-Image (T2I) models, designed to support research in concept-based Explainable Artificial Intelligence (XAI).
Concept-based XAI methods aim to interpret deep learning models through human-understandable visual concepts (e.g., textures, object parts). However, these approaches typically rely on large, manually curated datasets, which limits scalability.
To address this, we explore the use of synthetic concept datasets generated via T2I models as a scalable alternative.
The dataset contains:
- 🏷️ Real concept images gathered from various datasets and search engines
- 🎨 Synthetic concept images generated from predefined textual prompts
- 🔁 Multiple samples per concept to enable variability analysis
Each concept is designed to approximate a human-interpretable visual feature, such as:
- textures (e.g., striped, dotted)
- object parts (e.g., wings, wheels)
- materials or patterns
The root of the repository is `concepts/`. Alongside the helper script `analysis.py`, each concept has its own dedicated folder, containing one subfolder of real images plus one subfolder per T2I model:
```
concepts/
│
├── analysis.py              # Helper script for the stats
│
├── asparagus/
│   ├── asparagus/           # Real asparagus images
│   ├── asparagus_flux/      # Asparagus concept generated by Flux 1.1
│   ├── asparagus_gpti1/     # Asparagus concept generated by GPT-Image 1
│   └── asparagus_sd35/      # Asparagus concept generated by Stable Diffusion 3.5
│
├── ...                      # Other concepts follow the same pattern
```
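This layout is easy to traverse programmatically. Below is a minimal sketch of enumerating concepts and their real/synthetic variants; `list_concept_variants` is a hypothetical helper for illustration, not part of `analysis.py`:

```python
from pathlib import Path


def list_concept_variants(root: str) -> dict[str, list[str]]:
    """Map each concept name (e.g. 'asparagus') to its image subfolders:
    one of real images plus one per T2I model (e.g. 'asparagus_flux')."""
    variants = {}
    for concept_dir in sorted(Path(root).iterdir()):
        if not concept_dir.is_dir():
            continue  # skip top-level files such as analysis.py
        variants[concept_dir.name] = sorted(
            sub.name for sub in concept_dir.iterdir() if sub.is_dir()
        )
    return variants
```

From the returned mapping you can, for example, pair each `<concept>/` folder of real images with its `<concept>_<model>/` counterparts when building evaluation splits.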
This dataset is intended for:
- Evaluating concept-based XAI methods
- Studying representation similarity between synthetic and real concepts
- Testing intra-concept consistency across generated samples
- Supporting downstream explanation tasks
- Analyzing the effect of concept removal on model explanations
The dataset supports four key analyses:
- **Concept Representation Similarity**: compare embeddings of synthetic vs. real concept images
- **Intra-Concept Similarity**: measure consistency across subsets of the same concept
- **Downstream Explanation Performance**: evaluate usefulness in explaining class predictions
- **Concept Removal Impact**: assess how removing a concept affects explanation behavior
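For illustration, the first two analyses reduce to cosine similarity over image embeddings (from any pretrained encoder). A minimal sketch, with hypothetical helper names and embeddings assumed to be precomputed:

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def mean_pairwise_similarity(embeddings: np.ndarray) -> float:
    """Average cosine similarity over all distinct pairs of embeddings --
    a simple intra-concept consistency score (higher = more consistent)."""
    n = len(embeddings)
    sims = [
        cosine_similarity(embeddings[i], embeddings[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return float(np.mean(sims))
```

The same `cosine_similarity` can compare the mean embedding of a synthetic concept folder against that of its real counterpart for the representation-similarity analysis.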
While synthetic data offers scalability, this dataset highlights some challenges:
- ❗ Potential mismatch between synthetic and real-world concepts
- 🤖 Biases introduced by the generative model
These limitations should be carefully considered when using synthetic data for interpretability.
```bash
git clone https://github.com/DataSciencePolimi/ZeroShot-T2I-Concepts.git
cd ZeroShot-T2I-Concepts
```

After downloading or cloning the repo, you can run the bundled script to analyze the dataset:

```bash
python analysis.py
```

Explore the dataset structure and integrate it into your XAI pipelines.
If you use this dataset, please cite:
```bibtex
@InProceedings{ZeroShot-T2I-Concepts,
  author    = {Astolfi, Giacomo and Bianchi, Matteo and Campi, Riccardo and De Santis, Antonio and Brambilla, Marco},
  title     = {A Framework for Evaluating Zero-Shot Image Generation in Concept-based Explainability},
  booktitle = {2026 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}
```

Full author list (equal contribution noted):
Giacomo Astolfi\*, Matteo Bianchi\*, Riccardo Campi\*, Antonio De Santis, Marco Brambilla
Contributions, issues, and discussions are welcome! Feel free to open a PR or start a discussion.