Custom Vision Transformer (ViT) fine-tuned for your own classes using google/vit-base-patch16-224. A dynamic framework that supports any number of classes—no hardcoded labels.
This project:
- Uses the pre-trained `google/vit-base-patch16-224` model
- Dynamically infers classes by scanning `./data`: each subfolder name becomes a class (e.g. `my_cat`, `my_dog`, `my_car`, …)
- Modifies the model from 1000 ImageNet classes to N custom classes (N = number of subfolders)
- Trains on your images with augmentation, stratified train/val split, and frozen backbone
- Tests with confidence scores, uncertainty detection, and prediction overlays
- Provides a Gradio web UI for interactive inference
You can use 5 classes, 10 classes, or any number—just add one folder per class under ./data.
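The folder-to-class mapping can be sketched roughly like this (`discover_classes` is an illustrative name, not necessarily the project's actual API; the class list below is just an example):

```python
from pathlib import Path

def discover_classes(data_dir):
    """Each subfolder of data_dir becomes one class; sorted for a stable label order."""
    return sorted(p.name for p in Path(data_dir).iterdir() if p.is_dir())

# Given e.g. classes = discover_classes("./data"):
classes = ["my_car", "my_cat", "my_dog", "my_house", "my_phone"]
id2label = {i: name for i, name in enumerate(classes)}   # 0 -> "my_car", ...
label2id = {name: i for i, name in id2label.items()}     # "my_car" -> 0, ...
```

Sorting the subfolder names keeps label indices stable across runs, so a saved model and a later inference script agree on which index means which class.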
- Data augmentation — Training uses `RandomResizedCrop(224)`, `RandomHorizontalFlip`, and `ColorJitter`. Validation uses deterministic `Resize((224, 224))` only. Transforms are applied in the dataset when loading each image.
- Stratified 80/20 train/val split — Uses `sklearn.model_selection.train_test_split` with `stratify=labels` so train and validation keep the same class proportions. The split is done on file paths before building datasets.
- Confidence scores and uncertainty detection — Inference applies softmax and reports `label: XX.X%`. When the top confidence is below 90%, the script prints the top 2 classes and their percentages so you can see the ambiguity.
- Gradio web UI — Run `python main.py` for a browser interface: upload an image, get a label with confidence score and an image with the prediction overlay. Example images from `data/` are preloaded for quick testing.
After auditing the dataset, I removed roughly 50 noisy images from the `my_house` and `my_phone` folders. This cleaning step improved overall validation accuracy from about 51% to about 80%.
With the cleaned dataset, precision on the `my_phone` class reached 1.0, and the workflow moved into the Gradio UI for easier experimentation and quick visual checks.
Install dependencies:

```bash
pip install -r requirements.txt
```

Organize images in one folder per class under `./data`. Folder names = class names.
data/
my_cat/
image1.jpg
...
my_dog/
...
my_car/
my_house/
my_phone/
Supported formats: .jpg, .jpeg, .png, .bmp, .gif. Add as many classes as you want.
This scans ./data and builds a model with one output per class:
```bash
python model_custom.py
```

This creates `./custom_vit_model` with N classes (N = number of subfolders in `./data`). No code change is needed when you add or remove classes.
```bash
python train.py --data_dir ./data --epochs 5 --batch_size 8
```

Options: `--data_dir`, `--model_path`, `--output_dir`, `--epochs`, `--batch_size`, `--learning_rate`. Training uses an 80% train / 20% validation stratified split and reports validation accuracy and per-class metrics at the end of each epoch.
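The frozen-backbone setup amounts to disabling gradients everywhere except the classification head. A minimal sketch, using a `TinyViT` stand-in (hypothetical, same shape as the real model) and an assumed learning rate:

```python
import torch
from torch import nn, optim

class TinyViT(nn.Module):
    """Stand-in for the real model: a backbone plus a linear classification head."""
    def __init__(self, n_classes=5):
        super().__init__()
        self.backbone = nn.Linear(768, 768)      # stands in for the ViT encoder
        self.classifier = nn.Linear(768, n_classes)
    def forward(self, x):
        return self.classifier(self.backbone(x))

model = TinyViT()
for p in model.backbone.parameters():            # freeze the backbone...
    p.requires_grad = False
optimizer = optim.Adam(                          # ...and optimize only the head
    (p for p in model.parameters() if p.requires_grad), lr=2e-4
)
```

Freezing the backbone keeps the pretrained features intact and makes each epoch cheap, since only the head's weight and bias receive updates.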
Single image (prints confidence, top-2 if uncertain, saves overlay):
```bash
python test.py --image my_photo.jpg
```

Directory of images:

```bash
python test.py --directory ./my_test_photos
```

Custom overlay path:

```bash
python test.py --image photo.jpg --output result.jpg
```

Launch the Gradio app (loads model from `./trained_model`):

```bash
python main.py
```

Open the URL shown in the terminal (e.g. http://127.0.0.1:7860). Upload an image to get:
- Label with Confidence Score (e.g. `my_cat: 98.5%`)
- Image with prediction overlay (label + confidence drawn on the image)
Use the Examples (one image per class from data/) to try the model immediately.
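The confidence reporting could be approximated like this; `report`, its output formatting, and the 90% threshold mirror the description above but are assumptions, not the project's exact code:

```python
import torch

def report(logits, id2label, threshold=0.90):
    """Softmax -> confidence; below the threshold, also surface the runner-up."""
    probs = torch.softmax(logits, dim=-1)
    top = torch.topk(probs, k=2)
    lines = [f"{id2label[top.indices[0].item()]}: {top.values[0].item() * 100:.1f}%"]
    if top.values[0].item() < threshold:   # uncertain: show the top-2 candidates
        lines.append(f"{id2label[top.indices[1].item()]}: {top.values[1].item() * 100:.1f}%")
    return lines

id2label = {0: "my_cat", 1: "my_dog", 2: "my_car"}
print(report(torch.tensor([4.0, 1.0, 0.5]), id2label))   # confident: one line
print(report(torch.tensor([1.0, 0.8, 0.5]), id2label))   # ambiguous: top-2 lines
```

Reporting the runner-up only when confidence is low keeps the common case terse while still exposing ambiguous predictions.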
| Command | Description |
|---|---|
| `python model_custom.py` | Build custom model from `./data` class folders |
| `python train.py [--data_dir ./data] [--epochs 5] ...` | Train with stratified 80/20 split |
| `python test.py --image <path>` | Single-image test + overlay saved as `prediction_output.jpg` |
| `python test.py --directory <dir>` | Batch test; confidence and top-2 when uncertain |
| `python main.py` | Start Gradio web UI for interactive inference |
- README.md — This file (overview, features, usage)
- USER_GUIDE.md — Step-by-step guide and troubleshooting
- COMPREHENSIVE_RESULTS.md — Test results and analysis
huggingface-image-project/
├── main.py # Entry point (launches Gradio UI)
├── app.py # Wrapper (backward-compatible: python app.py)
├── model_custom.py # Wrapper (backward-compatible: python model_custom.py)
├── train.py # Wrapper (backward-compatible: python train.py)
├── test.py # CLI testing (confidence, overlay, top-2)
├── requirements.txt # Dependencies
├── src/ # Modular code
│ ├── __init__.py
│ ├── api/
│ │ ├── __init__.py
│ │ └── inference.py # Shared inference & overlay logic
│ ├── models/
│ │ ├── __init__.py
│ │ ├── model_custom.py # Dynamic model creation (N classes from ./data)
│ │ └── train.py # Training (augmentation, stratified split, frozen backbone)
│ ├── web/
│ │ ├── __init__.py
│ │ └── app.py # Gradio UI (imports from src.api.inference)
│ └── utils/
│ ├── __init__.py
│ ├── paths.py # Project root/data/model path helpers
│ └── download_images_loremflickr.py
├── README.md # This file
├── USER_GUIDE.md # Detailed user guide
├── COMPREHENSIVE_RESULTS.md # Results and analysis
├── .gitignore
├── custom_vit_model/ # Created by model_custom.py (not in git)
├── trained_model/ # Created by train.py (not in git)
└── data/ # Your images, one subfolder per class (not in git)
├── my_cat/
├── my_dog/
├── my_car/
├── my_house/
└── my_phone/
```bash
pip install -r requirements.txt
python model_custom.py
python train.py --data_dir ./data --epochs 5
python test.py --image my_photo.jpg
python main.py   # optional: web UI
```

- Base model: `google/vit-base-patch16-224` (ViT, 224×224, 768-d)
- Change: Final layer `Linear(768, 1000)` → `Linear(768, N)`; `id2label`/`label2id` built from class names
- Training: Backbone frozen; only the classification head is trained
- Data: Stratified 80% train / 20% validation; training augmentation, validation resize-only
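The head replacement amounts to swapping one linear layer. With `transformers`, the same effect is usually achieved by passing `num_labels=N` (and `ignore_mismatched_sizes=True`) to `from_pretrained`; here is a `torch`-only sketch, with `ViTStandIn` as a hypothetical stand-in for the real model:

```python
from torch import nn

class ViTStandIn(nn.Module):
    """Minimal stand-in for ViTForImageClassification."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Identity()             # pretrained 768-d backbone (kept as-is)
        self.classifier = nn.Linear(768, 1000)   # original ImageNet head

model = ViTStandIn()
N = 5                                            # number of subfolders under ./data
model.classifier = nn.Linear(768, N)             # Linear(768, 1000) -> Linear(768, N)
```

The new head starts from random weights, which is why training is needed even though the backbone is already pretrained.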
- Use at least 50–100 images per class when possible
- Keep similar proportions across classes for best stratified split
- Reduce `--batch_size` (e.g. 4 or 8) if you run out of memory
- No images found — Ensure `data/<class_name>/` exists and filenames use supported extensions.
- Model not found — Run `python model_custom.py` first; then train so `./trained_model` exists before running `test.py` or `main.py`.
- Out of memory — Use a smaller `--batch_size` in `train.py`.
- Python 3.8+
- See `requirements.txt` (PyTorch, Transformers, Gradio, scikit-learn, Pillow, etc.)
This project uses the google/vit-base-patch16-224 model from Hugging Face.