infernaltiger/diplom_code

Bachelor's Thesis Project (Ural Federal University + SKB Kontur)
Automatic document orientation detection using CNN transfer learning

This repository contains the implementation of a document orientation classifier built on pre-trained CNN architectures (ResNet-18, EfficientNet, MobileNet). The best model, EfficientNet, reaches about 98% accuracy; ResNet-18 reaches about 95% and MobileNet about 88%.

Installation

  1. Clone the repository:
git clone https://github.com/infernaltiger/document-orientation-cnn.git
cd document-orientation-cnn
  2. Install dependencies:
pip install -r requirements.txt

Project Structure

├── train_dpi100/              # Training dataset (100 DPI)
├── val_dpi100/                # Validation dataset (100 DPI)
├── testing/                   # Test samples
├── model_plots/               # Generated metrics visualizations
│
├── config.py                  # Configuration parameters
├── data_refactor.py          # PDF/JPG preprocessing utilities
├── dataset_mod.py            # Dataset modification tools
├── graf_plot.py              # Metrics visualization
│
├── resnet_testing.py         # ResNet-18 model & testing
├── efficientnet_testing.py   # EfficientNet model & testing
├── mobilenet_testing.py      # MobileNet model & testing
│
├── resnet18_20min.pth        # Pre-trained ResNet weights
├── effnet_98.pth             # Pre-trained EfficientNet weights
├── mobilenet_88.pth          # Pre-trained MobileNet weights
│
├── resnet_results.csv        # ResNet performance metrics
├── effnet_12ep_88.csv        # EfficientNet metrics (12 epochs)
├── mobilenet_res_v1_8min.csv # MobileNet metrics
├── merged.csv                # Aggregated results
│
└── requirements.txt          # Dependencies

Usage

Configuration

Edit config.py to set hyperparameters:

# Example configuration
EPOCHS = 12
BATCH_SIZE = 32
LEARNING_RATE = 0.001
IMAGE_SIZE = 224
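
As a rough illustration of how these values interact, the sketch below derives the number of optimizer steps and the input batch shape from the example configuration. The helper names `training_budget` and `batch_tensor_shape`, the dataset size of 1000 images, and the 3-channel input are assumptions made for illustration; they are not taken from config.py.

```python
import math

# Values copied from the example configuration above.
EPOCHS = 12
BATCH_SIZE = 32
LEARNING_RATE = 0.001
IMAGE_SIZE = 224


def training_budget(num_images, epochs=EPOCHS, batch_size=BATCH_SIZE):
    """Return (steps per epoch, total steps) for a dataset of num_images.

    Hypothetical helper: assumes the last, partially filled batch is kept.
    """
    steps_per_epoch = math.ceil(num_images / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


def batch_tensor_shape(batch_size=BATCH_SIZE, image_size=IMAGE_SIZE, channels=3):
    """Return the (N, C, H, W) shape of one input batch, assuming RGB images."""
    return (batch_size, channels, image_size, image_size)


if __name__ == "__main__":
    # 1000 is an arbitrary example size, not the size of the real dataset.
    print(training_budget(1000))
    print(batch_tensor_shape())
```

Doubling BATCH_SIZE roughly halves the number of steps per epoch, which is one reason the learning rate is often retuned together with the batch size.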

Training

Run the training script for the architecture you want:

# ResNet-18
python resnet_testing.py

# EfficientNet
python efficientnet_testing.py

# MobileNet
python mobilenet_testing.py

Inference

Load pre-trained models:

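import torch
from resnet_testing import ResNetModel

# Build the architecture, then load the saved weights onto the CPU
model = ResNetModel()
model.load_state_dict(torch.load('resnet18_20min.pth', map_location='cpu'))
model.eval()  # switch dropout and batch-norm layers to inference mode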
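
Once a model is loaded, its raw output still has to be turned into an orientation label. The sketch below shows one way to do that in plain Python, assuming the model emits one logit per class and that class indices follow alphabetical folder order ("flipped" = 0, "normal" = 1), as torchvision's `ImageFolder` would assign them. Both the class order and the helper `decode_prediction` are assumptions; the repository's scripts may encode labels differently.

```python
import math

# Assumed class order (alphabetical folder names); verify against the training code.
CLASS_NAMES = ("flipped", "normal")


def softmax(logits):
    """Convert raw logits to probabilities, subtracting the max for stability."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def decode_prediction(logits, class_names=CLASS_NAMES):
    """Return (label, confidence) for one sample's logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


if __name__ == "__main__":
    # With a real model, the logits would come from something like
    # model(batch)[0].tolist(); the values here are made up.
    label, confidence = decode_prediction([0.0, 2.0])
    print(label, round(confidence, 3))
```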

Visualization

Generate plots from CSV results:

python graf_plot.py

Dataset Format

Expected directory structure:

dataset/
├── normal/
│   └── *.jpg   # Correctly oriented documents
└── flipped/
    └── *.jpg   # Rotated/flipped documents

Use data_refactor.py to convert PDFs or reorganize raw images.
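
Before training, it can help to confirm that a dataset folder matches the layout above. The following is a minimal sketch using only the standard library; the function name `count_images` is hypothetical and is not part of data_refactor.py.

```python
from pathlib import Path

# Class folders named in the expected directory structure.
EXPECTED_CLASSES = ("normal", "flipped")


def count_images(dataset_root, classes=EXPECTED_CLASSES, pattern="*.jpg"):
    """Count images per class folder, failing early if a folder is missing."""
    root = Path(dataset_root)
    counts = {}
    for name in classes:
        class_dir = root / name
        if not class_dir.is_dir():
            raise FileNotFoundError(f"missing class folder: {class_dir}")
        counts[name] = sum(1 for _ in class_dir.glob(pattern))
    return counts


if __name__ == "__main__":
    # "train_dpi100" is the training folder listed under Project Structure.
    print(count_images("train_dpi100"))
```

A large gap between the two counts would indicate class imbalance, which can make plain accuracy misleading.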


Models & Results

Model         Accuracy  Weights File
ResNet-18     ~95%      resnet18_20min.pth
EfficientNet  ~98%      effnet_98.pth
MobileNet     ~88%      mobilenet_88.pth

Detailed metrics are available in the corresponding .csv files.
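
The .csv logs can be summarized with a few lines of standard-library Python. The sketch below picks the epoch with the highest value of a chosen metric. The column names `epoch` and `val_accuracy` are assumptions, since the actual headers of the result files are not documented here; adjust them to match the real files.

```python
import csv


def best_epoch(csv_path, metric="val_accuracy", epoch_column="epoch"):
    """Return (epoch, value) for the row with the highest metric value.

    Column names are hypothetical defaults, not confirmed by the repository.
    """
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    if not rows:
        raise ValueError(f"no data rows in {csv_path}")
    best = max(rows, key=lambda row: float(row[metric]))
    return int(best[epoch_column]), float(best[metric])


if __name__ == "__main__":
    # File name taken from the project structure; headers may differ.
    print(best_epoch("resnet_results.csv"))
```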


File Descriptions

Core Scripts

File              Purpose
config.py         Hyperparameters and configuration settings
data_refactor.py  PDF/JPG preprocessing, dataset creation utilities
dataset_mod.py    Dataset modification and augmentation tools
graf_plot.py      Metrics visualization and plotting functions

Model Scripts

File                     Description
resnet_testing.py        ResNet-18 architecture, training loop, evaluation
efficientnet_testing.py  EfficientNet implementation and testing
mobilenet_testing.py     MobileNet implementation and testing

Results

  • CSV Files: Training logs, accuracy metrics, loss curves
  • PTH Files: Saved model checkpoints
  • model_plots/: Generated visualizations (accuracy/loss curves, confusion matrices)

About

Thesis project code
