Bachelor's Thesis Project (Ural Federal University + SKB Kontur)
Automatic document orientation detection using CNN transfer learning
This repository contains the implementation of a document orientation classification system built by fine-tuning pre-trained CNN architectures (ResNet, EfficientNet, MobileNet). Reported accuracy ranges from about 88% (MobileNet) to about 98% (EfficientNet), with ResNet-18 at about 95%.
- Clone the repository:
git clone https://github.com/infernaltiger/document-orientation-cnn.git
cd document-orientation-cnn
- Install dependencies:
pip install -r requirements.txt

├── train_dpi100/ # Training dataset (100 DPI)
├── val_dpi100/ # Validation dataset (100 DPI)
├── testing/ # Test samples
├── model_plots/ # Generated metrics visualizations
│
├── config.py # Configuration parameters
├── data_refactor.py # PDF/JPG preprocessing utilities
├── dataset_mod.py # Dataset modification tools
├── graf_plot.py # Metrics visualization
│
├── resnet_testing.py # ResNet-18 model & testing
├── efficientnet_testing.py # EfficientNet model & testing
├── mobilenet_testing.py # MobileNet model & testing
│
├── resnet18_20min.pth # Pre-trained ResNet weights
├── effnet_98.pth # Pre-trained EfficientNet weights
├── mobilenet_88.pth # Pre-trained MobileNet weights
│
├── resnet_results.csv # ResNet performance metrics
├── effnet_12ep_88.csv # EfficientNet metrics (12 epochs)
├── mobilenet_res_v1_8min.csv # MobileNet metrics
├── merged.csv # Aggregated results
│
└── requirements.txt # Dependencies
Edit config.py to set hyperparameters:
# Example configuration
EPOCHS = 12
BATCH_SIZE = 32
LEARNING_RATE = 0.001
IMAGE_SIZE = 224

Run architecture-specific training script:
# ResNet-18
python resnet_testing.py
# EfficientNet
python efficientnet_testing.py
# MobileNet
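python mobilenet_testing.py

Load pre-trained models:
import torch
from resnet_testing import ResNetModel

# Build the architecture, then load the saved weights.
# map_location='cpu' lets the checkpoint load on machines without a GPU.
model = ResNetModel()
model.load_state_dict(torch.load('resnet18_20min.pth', map_location='cpu'))
model.eval()  # switch to inference mode (fixes dropout and batch-norm behavior)

Generate plots from CSV results:
python graf_plot.py

Expected directory structure: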
dataset/
├── normal/
│   └── *.jpg # Correctly oriented documents
└── flipped/
    └── *.jpg # Rotated/flipped documents
Use data_refactor.py to convert PDFs or reorganize raw images.
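
The two-folder layout above maps naturally onto a binary labelling. As a minimal sketch (not the repository's own loader), the snippet below walks a dataset root and returns `(path, label)` pairs. The function name `index_dataset` and the mapping `normal -> 0`, `flipped -> 1` are assumptions made for illustration; the actual scripts may encode classes differently.

```python
from pathlib import Path

# Hypothetical label mapping; the repository may encode classes differently.
CLASS_TO_LABEL = {"normal": 0, "flipped": 1}


def index_dataset(root):
    """Return a list of (image_path, label) pairs found under root."""
    samples = []
    for class_name, label in CLASS_TO_LABEL.items():
        class_dir = Path(root) / class_name
        if not class_dir.is_dir():
            raise FileNotFoundError(f"missing class folder: {class_dir}")
        # Sort file names so the sample order is reproducible across runs.
        for image_path in sorted(class_dir.glob("*.jpg")):
            samples.append((image_path, label))
    return samples
```

Calling `index_dataset("dataset")` fails early with a clear error if either class folder is missing, which tends to be easier to debug than an empty training set.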
| Model | Accuracy | Weights File |
|---|---|---|
| ResNet-18 | ~95% | resnet18_20min.pth |
| EfficientNet | ~98% | effnet_98.pth |
| MobileNet | ~88% | mobilenet_88.pth |
Detailed metrics available in corresponding .csv files.
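
To pull a single headline number out of one of these logs, something like the following could be used. This is only a sketch: the column names `epoch` and `val_accuracy` are assumptions, since the CSV schema is not documented here, and the sample values are made up for illustration.

```python
import csv
import io


def best_epoch(csv_text, metric="val_accuracy"):
    """Return (epoch, value) for the row with the highest metric value."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("no data rows found")
    best = max(rows, key=lambda row: float(row[metric]))
    return int(best["epoch"]), float(best[metric])


# Made-up numbers for illustration only, not results from this project.
example = "epoch,val_accuracy\n1,0.50\n2,0.75\n3,0.70\n"
print(best_epoch(example))  # (2, 0.75)
```

For a real file, read its text first (for example with `pathlib.Path(...).read_text()`) and adjust the column names to match the actual header row.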
| File | Purpose |
|---|---|
| config.py | Hyperparameters and configuration settings |
| data_refactor.py | PDF/JPG preprocessing, dataset creation utilities |
| dataset_mod.py | Dataset modification and augmentation tools |
| graf_plot.py | Metrics visualization and plotting functions |

| File | Description |
|---|---|
| resnet_testing.py | ResNet-18 architecture, training loop, evaluation |
| efficientnet_testing.py | EfficientNet implementation and testing |
| mobilenet_testing.py | MobileNet implementation and testing |
- CSV Files: Training logs, accuracy metrics, loss curves
- PTH Files: Saved model checkpoints
- model_plots/: Generated visualizations (accuracy/loss curves, confusion matrices)
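
For readers unfamiliar with the confusion matrices mentioned above, here is a small dependency-free sketch of how one is counted for the two-class case and how accuracy follows from it. It is not taken from graf_plot.py; the label encoding (0 = normal, 1 = flipped) and the toy predictions are assumptions for illustration.

```python
def confusion_matrix(y_true, y_pred, num_classes=2):
    """Count predictions: matrix[t][p] is how often class t was predicted as p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Toy labels for illustration (assumed: 0 = normal, 1 = flipped); not project data.
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred)
print(cm, accuracy(cm))  # [[1, 1], [1, 2]] 0.6
```

The off-diagonal cells show which kind of mistake dominates: `cm[0][1]` counts correctly oriented pages flagged as flipped, and `cm[1][0]` counts flipped pages that were missed.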