arham2003/Image-Similarity-Analyzer-CVPR

Image Matching Challenge - CVPR 2025

Overview

This repository contains our submission to the Image Matching Challenge for CVPR 2025. We've developed a dual-approach system that combines deep learning-based similarity measurement with traditional computer vision techniques to provide robust image matching capabilities.

Dataset

The dataset for this challenge is available for download from Kaggle: Image Matching Challenge 2025

Model Architecture

MobileNetV2-Based Embedding Network

Our primary model utilizes a fine-tuned MobileNetV2 architecture with the following enhancements:

  • Pre-trained ImageNet weights as the foundation
  • Fine-tuned top layers (last 30 layers unfrozen for training)
  • Embedding layer design:
    • Global average pooling
    • Dropout regularization (30%)
    • Two dense layers (256 → 128 neurons) with ReLU activation
    • L2 regularization for weight decay
    • L2 normalization for embedding stability
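A minimal sketch of how the layers listed above could be assembled in Keras. The function name `build_embedding_model`, the L2 factor of 1e-4, the layer order, and the ReLU on the final 128-unit layer are assumptions for illustration; the repository's training script may differ.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers


def build_embedding_model(weights="imagenet", input_shape=(224, 224, 3), l2_factor=1e-4):
    """Hypothetical builder for a MobileNetV2-based embedding network."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=weights
    )
    # Unfreeze only the last 30 layers of the backbone; freeze everything before them.
    base.trainable = True
    for layer in base.layers[:-30]:
        layer.trainable = False

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.3)(x)
    # l2_factor is an assumed value; the README does not state the regularization strength.
    x = layers.Dense(256, activation="relu",
                     kernel_regularizer=regularizers.l2(l2_factor))(x)
    x = layers.Dense(128, activation="relu",
                     kernel_regularizer=regularizers.l2(l2_factor))(x)
    # Project embeddings onto the unit sphere so distances are comparable across images.
    outputs = layers.Lambda(lambda t: tf.math.l2_normalize(t, axis=1))(x)
    return tf.keras.Model(inputs, outputs)
```

Passing `weights=None` builds the same graph without downloading the ImageNet weights, which is convenient for a quick shape check.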

The model is trained using semi-hard triplet loss, which effectively learns a metric space where similar images are clustered together while dissimilar images are pushed apart.
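To make the selection rule concrete, here is a simplified NumPy sketch of semi-hard triplet loss. For each anchor-positive pair it picks the closest negative that is farther than the positive but still inside the margin. The function name, the margin of 1.0, and the choice to skip pairs with no semi-hard negative are assumptions; library implementations (for example the one in tensorflow-addons) handle those cases differently.

```python
import numpy as np


def semi_hard_triplet_loss(embeddings, labels, margin=1.0):
    """Simplified semi-hard triplet loss over one batch (illustrative only)."""
    embeddings = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances, shape (N, N).
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    losses = []
    n = len(labels)
    for a in range(n):
        neg_mask = labels != labels[a]
        for p in range(n):
            if p == a or labels[p] != labels[a]:
                continue
            d_ap = dist[a, p]
            # Semi-hard negatives: farther than the positive, but within the margin.
            semi_hard = neg_mask & (dist[a] > d_ap) & (dist[a] < d_ap + margin)
            if not semi_hard.any():
                continue  # assumption: pairs without a semi-hard negative are skipped
            d_an = dist[a][semi_hard].min()
            losses.append(d_ap - d_an + margin)
    return float(np.mean(losses)) if losses else 0.0
```

With one-dimensional embeddings `[[0.0], [0.5], [0.8]]` and labels `[0, 0, 1]`, only the pair (0, 1) has a semi-hard negative, giving a loss of 0.5 - 0.8 + 1.0 = 0.7.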

Data Augmentation Pipeline

To improve generalization, we implemented a comprehensive augmentation strategy:

  • Random brightness adjustments
  • Contrast variation
  • Hue and saturation shifts
  • Horizontal flips
  • Random zoom and rotation
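The colour and flip augmentations above map directly onto `tf.image` functions. The sketch below is one possible arrangement; the function name `augment` and all magnitudes (0.2, 0.8 to 1.2, 0.05) are assumed values, and zoom and rotation are left out because they are usually applied with Keras preprocessing layers rather than `tf.image`.

```python
import tensorflow as tf


def augment(image):
    """Apply random colour jitter and a horizontal flip.

    `image` is expected to be a float32 tensor of shape (H, W, 3) in [0, 1].
    All magnitudes below are illustrative assumptions.
    """
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    # Clip before the HSV-based ops so they receive values in the valid range.
    image = tf.clip_by_value(image, 0.0, 1.0)
    image = tf.image.random_hue(image, max_delta=0.05)
    image = tf.image.random_saturation(image, lower=0.8, upper=1.2)
    image = tf.image.random_flip_left_right(image)
    return tf.clip_by_value(image, 0.0, 1.0)
```

In a `tf.data` pipeline this would typically be applied with `dataset.map(augment)` on the training split only.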

Training Approach

The training process incorporates:

  • Early stopping with patience
  • Learning rate reduction when plateauing
  • Graceful GPU memory handling with CPU fallback
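The training safeguards above correspond to standard Keras callbacks and TensorFlow device configuration. This is a sketch under assumed settings: the helper names, the patience values, the reduction factor of 0.5, and the monitored metric `val_loss` are not taken from the repository.

```python
import tensorflow as tf


def make_callbacks(patience=5):
    """Early stopping plus learning-rate reduction on plateau (assumed settings)."""
    return [
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=patience, restore_best_weights=True
        ),
        tf.keras.callbacks.ReduceLROnPlateau(
            monitor="val_loss", factor=0.5, patience=max(1, patience // 2), min_lr=1e-6
        ),
    ]


def configure_device():
    """Enable GPU memory growth if a GPU is present; otherwise fall back to the CPU."""
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        try:
            # Allocate GPU memory on demand instead of reserving it all up front.
            tf.config.experimental.set_memory_growth(gpu, True)
        except RuntimeError:
            # Memory growth can only be set before the GPU has been initialized.
            pass
    return "/GPU:0" if gpus else "/CPU:0"
```

Training would then run inside `with tf.device(configure_device()):` and pass `callbacks=make_callbacks()` to `model.fit`.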

Web Application

Our Flask-based web application provides an intuitive interface for image similarity analysis:

[Screenshot: Web Application Interface]

Features

  • Dual analysis methods: Deep learning model + SIFT feature matching
  • Interactive threshold adjustment for similarity determination
  • Detailed visualizations:
    • Keypoint detection
    • Feature matching between images
    • Similarity percentages and distance metrics
  • Responsive design with Bootstrap and animated transitions

Technical Implementation

  • GPU-accelerated inference with CPU fallback
  • Session-based result management
  • Asynchronous processing with visual feedback
  • SIFT (Scale-Invariant Feature Transform) implementation for traditional CV comparison

Performance

Our model achieved approximately 30% validation accuracy on a small, imbalanced dataset. This is modest, but it indicates that the model learned a usable signal despite:

  1. Limited training data
  2. Class imbalance challenges
  3. High variability in image content

The combined approach of deep learning + SIFT provides complementary strengths:

  • The neural network captures high-level semantic similarities
  • SIFT identifies specific matching features between images

Usage

Requirements

tensorflow>=2.5.0
opencv-python>=4.5.3
flask>=2.0.1
matplotlib>=3.4.2
numpy>=1.19.5
tensorflow-addons>=0.13.0
scikit-learn>=0.24.2

Running the Web Application

python app.py

The application will be available at http://localhost:5000

Using the Backend Model Directly

from tensorflow import keras
import numpy as np

# Load the saved embedding model (compile=False skips restoring the training loss)
model = keras.models.load_model('image_similarity_model', compile=False)

def embed(path):
    """Load an image, scale pixels to [0, 1], and return its embedding."""
    img = keras.preprocessing.image.load_img(path, target_size=(224, 224))
    arr = keras.preprocessing.image.img_to_array(img) / 255.0
    return model.predict(np.expand_dims(arr, axis=0))

# Calculate embeddings for both images
emb1 = embed('path/to/image1.jpg')
emb2 = embed('path/to/image2.jpg')

# Euclidean distance between the embeddings (smaller means more similar)
distance = np.linalg.norm(emb1 - emb2)
# Convert the distance to a percentage score
similarity = np.exp(-distance / 5.0) * 100
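When choosing a threshold, it helps to know the range this score can take. Assuming the embeddings are L2-normalized as described in the architecture section, the Euclidean distance between two embeddings lies in [0, 2], so exp(-distance / 5.0) * 100 can only fall between roughly 67% and 100%. The helper name `similarity_percent` below is illustrative.

```python
import numpy as np


def similarity_percent(distance, scale=5.0):
    """Map a Euclidean distance to a percentage using the formula above."""
    return float(np.exp(-distance / scale) * 100.0)


# Distances between unit-norm embeddings range from 0 (identical) to 2 (opposite).
for d in (0.0, 1.0, 2.0):
    print(f"distance={d:.1f} -> similarity={similarity_percent(d):.2f}%")
# distance=0.0 -> similarity=100.00%
# distance=1.0 -> similarity=81.87%
# distance=2.0 -> similarity=67.03%
```

A similarity threshold below about 67% would therefore classify every pair as similar under this assumption.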

Future Improvements

  • Implement hard negative mining for more challenging triplets
  • Incorporate attention mechanisms to focus on discriminative regions
  • Expand dataset with more diverse image pairs
  • Explore ensemble approaches combining multiple backbone architectures
  • Implement cross-batch normalization for better feature normalization
