
Video Frame Enhancer

A GAN-based diffusion architecture that enhances low-bitrate, low-resolution video frames into high-fidelity, temporally stable video sequences.

🎯 Overview

alen-vfe combines latent diffusion models, adversarial training, and temporal frame interpolation for high-quality video enhancement.

Key Features

  • 🚀 Fast Inference: few-step (1–4) DDIM sampling
  • 💾 Memory Efficient: LoRA fine-tuning keeps VRAM requirements low
  • 🎨 High Quality: Combined loss (MSE + LPIPS + Adversarial) ensures sharp, realistic results
  • 🎬 Temporal Stability: RIFE integration eliminates flickering

πŸ—οΈ Architecture

┌─────────────────────────────────────────────┐
│            Input: Low-Res Video             │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│      Generator (Stable Diffusion v1.5)      │
│             + LoRA Fine-tuning              │
│          (1–4 step DDIM inference)          │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│          Discriminator (PatchGAN)           │
│      Evaluates realism of enhancements      │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│           Smoothing Layer (RIFE)            │
│        Temporal Frame Interpolation         │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│           Output: High-Res Video            │
└─────────────────────────────────────────────┘

Components

  1. Generator: Lightweight Latent Diffusion Model (Stable Diffusion v1.5)

    • Fine-tuned with LoRA (Low-Rank Adaptation)
    • Optimized inference using DDIM
  2. Discriminator: Pre-trained PatchGAN

    • Evaluates high-frequency detail realism
    • Provides adversarial feedback during training
  3. Smoothing Layer: RIFE (Real-Time Intermediate Flow Estimation)

    • Optical flow-based frame interpolation
    • Ensures temporal consistency
    • Eliminates flickering between frames
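In the inference path the discriminator plays no role (it only provides adversarial feedback during training), so the pipeline reduces to per-frame enhancement followed by frame interpolation. A minimal, framework-free sketch of that control flow, with `generate` and `interpolate` as illustrative stand-ins for the diffusion generator and RIFE (not the repository's actual API):

```python
# Sketch of the enhance-then-interpolate control flow (illustrative only;
# `generate` stands in for the diffusion generator, `interpolate` for RIFE).
def enhance_sequence(frames, generate, interpolate):
    enhanced = [generate(f) for f in frames]       # per-frame enhancement
    smoothed = [enhanced[0]]
    for prev, nxt in zip(enhanced, enhanced[1:]):
        smoothed.append(interpolate(prev, nxt))    # RIFE-style in-between frame
        smoothed.append(nxt)
    return smoothed
```

Interpolating between already-enhanced frames is what suppresses frame-to-frame flicker: each synthesized in-between frame is constrained by both neighbors.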

📦 Installation

Prerequisites

  • Python 3.8+
  • CUDA 11.7+ (for NVIDIA GPUs) or an Apple Silicon Mac with MPS support
  • FFmpeg (for video processing)

Setup

# Clone the repository
git clone https://github.com/yourusername/alen-vfe.git
cd alen-vfe

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download RIFE pretrained model
python scripts/download_rife.py

🚀 Quick Start

Inference (Enhance a Video)

from inference.enhancer import VideoEnhancer
from omegaconf import OmegaConf

# Load configuration
config = OmegaConf.load("config/inference_config.yaml")

# Initialize enhancer
enhancer = VideoEnhancer(config)

# Enhance video
enhancer.enhance_video(
    input_path="input_video.mp4",
    output_path="enhanced_video.mp4",
    scale_factor=4
)

Command Line

python inference/enhance.py \
    --input input_video.mp4 \
    --output enhanced_video.mp4 \
    --checkpoint checkpoints/best_model.pth \
    --scale 4 \
    --enable-rife
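To enhance a whole directory of clips, the CLI above can be wrapped in a small shell loop. A sketch, with `inputs/` and `outputs/` as placeholder directories:

```shell
# Batch-enhance every .mp4 under inputs/ (directory names are illustrative).
mkdir -p outputs
for f in inputs/*.mp4; do
  [ -e "$f" ] || continue   # skip if the glob matched nothing
  python inference/enhance.py \
      --input "$f" \
      --output "outputs/$(basename "${f%.mp4}")_enhanced.mp4" \
      --checkpoint checkpoints/best_model.pth \
      --scale 4 \
      --enable-rife
done
```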

🎓 Training

Dataset Preparation

We use the Vimeo-90K dataset for training (DIV2K, described in the Dataset section below, is a lighter alternative for quick experiments):

# Download dataset
python data/download.py --dataset vimeo90k --output ./data

# Prepare training data
python data/prepare_dataset.py \
    --dataset vimeo90k \
    --downscale-factor 4 \
    --output ./data/processed
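Once preparation finishes, training needs matched high-res/low-res frame pairs. A hedged sketch of enumerating such pairs, assuming `HR/` and `LR_x4/` subdirectories with same-named files (the actual layout produced by prepare_dataset.py may differ):

```python
# Enumerate (low-res, high-res) training pairs by filename.
# Directory names HR/ and LR_x4/ are assumptions, not the script's verified output.
from pathlib import Path

def list_pairs(root):
    hr_dir, lr_dir = Path(root) / "HR", Path(root) / "LR_x4"
    pairs = []
    for hr in sorted(hr_dir.glob("*.png")):
        lr = lr_dir / hr.name          # LR counterpart shares the filename
        if lr.exists():
            pairs.append((lr, hr))
    return pairs
```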

Training on Kaggle

  1. Upload the project to Kaggle
  2. Open notebooks/train_kaggle.ipynb
  3. Ensure GPU accelerator is enabled (T4 recommended)
  4. Run all cells

Training Locally

python training/train.py \
    --config config/training_config.yaml \
    --output-dir ./checkpoints

📊 Dataset

DIV2K

  • Size: ~7GB (perfect for quick start!)
  • Images: 800 training + 100 validation
  • Resolution: Up to 2K high-quality images
  • Download: Official Link
  • Why: Much smaller than Vimeo-90K, faster downloads, great for testing

Vimeo-90K (Optional - for Production)

  • Size: ~82GB
  • Sequences: 89,800 triplets (3 frames each)
  • Resolution: 448×256
  • Download: Official Link
  • Why: Video-specific data, more data for production models

🔧 Configuration

Training Configuration

Edit config/training_config.yaml:

model:
  generator:
    lora_rank: 8          # Higher = more capacity, more VRAM
    inference_steps: 4     # 1-4 steps for fast inference

training:
  batch_size: 8
  num_epochs: 100
  learning_rate:
    generator: 1.0e-5
    discriminator: 4.0e-4

loss:
  weights:
    mse: 1.0
    lpips: 0.5
    adversarial: 0.1

Inference Configuration

Edit config/inference_config.yaml:

enhancement:
  scale_factor: 4
  enable_rife: true
  target_fps_multiplier: 2

video:
  batch_size: 10
  output_codec: "libx264"
  output_crf: 18
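The `target_fps_multiplier` key suggests that RIFE inserts multiplier − 1 frames between each consecutive pair of enhanced frames (an assumption from the key name, not verified against the code). Under that assumption the output frame count works out as:

```python
# Illustrative arithmetic for `target_fps_multiplier` (behavior assumed, not verified).
def interpolated_frame_count(n_frames: int, fps_multiplier: int) -> int:
    # (fps_multiplier - 1) new frames per consecutive pair, originals kept
    return (n_frames - 1) * fps_multiplier + 1
```

For example, a 2× multiplier turns a 100-frame clip into 199 frames, doubling the effective frame rate.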

πŸ“ Project Structure

alen-vfe/
├── config/                 # Configuration files
│   ├── training_config.yaml
│   └── inference_config.yaml
├── data/                   # Dataset utilities
│   ├── __init__.py
│   ├── dataset.py
│   ├── download.py
│   └── prepare_dataset.py
├── models/                 # Model architectures
│   ├── __init__.py
│   ├── generator.py        # Stable Diffusion + LoRA
│   ├── discriminator.py    # PatchGAN
│   └── rife.py             # RIFE integration
├── training/               # Training infrastructure
│   ├── __init__.py
│   ├── losses.py
│   ├── trainer.py
│   └── utils.py
├── inference/              # Inference pipeline
│   ├── __init__.py
│   ├── enhancer.py
│   ├── enhance.py          # CLI script
│   └── video_utils.py
├── notebooks/              # Jupyter notebooks for experiments and training
├── runs/                   # TensorBoard event logs for training visualization
├── outputs/                # Enhanced video outputs and sample results
├── checkpoints/            # Model checkpoints (ignored by git)
├── dataset/                # Training datasets (ignored by git)
├── requirements.txt
└── README.md

📂 Folders

  • notebooks/: Contains Jupyter notebooks for exploratory data analysis, experimental training runs, and Kaggle-specific setup.
  • runs/: Stores TensorBoard event files. You can visualize training progress by running tensorboard --logdir runs/.
  • outputs/: This is where all enhanced videos, preview images, and test results are stored.
  • checkpoints/: Directory for saving model weights during training.
  • dataset/: Local storage for training data like DIV2K or Vimeo-90K.

🧪 Testing

# Run unit tests
pytest tests/

# Test inference pipeline
python tests/test_pipeline.py --checkpoint checkpoints/best_model.pth

# Benchmark performance
python tests/benchmark.py --device cuda

πŸ“ Loss Functions

The model uses a combined loss function:

L_total = λ₁·L_MSE + λ₂·L_LPIPS + λ₃·L_ADV
  • L_MSE: Pixel-wise Mean Squared Error (structural accuracy)
  • L_LPIPS: Learned Perceptual Image Patch Similarity (perceptual quality)
  • L_ADV: Adversarial Loss (realism)

Default weights: λ₁=1.0, λ₂=0.5, λ₃=0.1
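Since the component losses are combined by a plain weighted sum, the default weighting can be sketched in a few lines (scalar stand-ins for the actual loss tensors):

```python
# Weighted-sum total loss with the default weights from training_config.yaml.
# Inputs here are plain scalars for illustration; in training they are tensors.
def total_loss(l_mse, l_lpips, l_adv, w_mse=1.0, w_lpips=0.5, w_adv=0.1):
    return w_mse * l_mse + w_lpips * l_lpips + w_adv * l_adv
```

The small adversarial weight (0.1) keeps the discriminator's feedback from overwhelming the pixel and perceptual terms.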

🧪 Experimental Status & Results

Important

This project is currently in an experimental phase.

  • Fine-tuning: We are experimenting with adapting a text-to-image model for video fine-tuning, which is a non-optimal approach and may produce unexpected results.
  • Resources: Due to limited training resources (GPU time/memory), the current model outputs may not yet reach production-grade quality.
  • Outputs: The latest experimental outputs are shared in the outputs/ folder for review.

Sample Result

Latest enhancement preview: see the outputs/ folder for full video results.


📄 License

MIT License - see LICENSE file for details

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

📧 Contact

For questions or issues, please open a GitHub issue.

