A GAN-based diffusion architecture for enhancing low-bitrate, low-resolution video frames into high-fidelity, temporally stable video sequences.
alen-vfe combines Latent Diffusion Models, adversarial training, and temporal frame interpolation to deliver state-of-the-art video enhancement.
- Fast Inference: few-step DDIM sampling
- Memory Efficient: LoRA fine-tuning
- High Quality: combined loss (MSE + LPIPS + Adversarial) ensures sharp, realistic results
- Temporal Stability: RIFE integration eliminates flickering
```
┌──────────────────────────────────────────────────────────────┐
│                     Input: Low-Res Video                     │
└──────────────────────────────┬───────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│              Generator (Stable Diffusion v1.5)               │
│                      + LoRA Fine-tuning                      │
│                   (1-step DDIM inference)                    │
└──────────────────────────────┬───────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│                   Discriminator (PatchGAN)                   │
│               Evaluates realism of enhancements              │
└──────────────────────────────┬───────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│                    Smoothing Layer (RIFE)                    │
│                 Temporal Frame Interpolation                 │
└──────────────────────────────┬───────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│                    Output: High-Res Video                    │
└──────────────────────────────────────────────────────────────┘
```
Generator: Lightweight Latent Diffusion Model (Stable Diffusion v1.5)
- Fine-tuned with LoRA (Low-Rank Adaptation)
- Optimized inference using DDIM

Discriminator: Pre-trained PatchGAN
- Evaluates high-frequency detail realism
- Provides adversarial feedback during training

Smoothing Layer: RIFE (Real-Time Intermediate Flow Estimation)
- Optical flow-based frame interpolation
- Ensures temporal consistency
- Eliminates flickering between frames
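At inference time the components above chain per frame, with RIFE inserting intermediate frames for temporal smoothness. A minimal sketch with stub modules (`StubGenerator`, `rife_interpolate`, and `enhance_frames` are illustrative names, not the repository's API; the PatchGAN discriminator only supplies training-time feedback, so it does not appear here):

```python
import torch
import torch.nn as nn

class StubGenerator(nn.Module):
    """Stand-in for the real generator (SD v1.5 UNet + LoRA, 1-step DDIM).
    Here a plain bicubic 4x upscale takes its place."""
    def forward(self, lr_frame):
        return nn.functional.interpolate(lr_frame, scale_factor=4, mode="bicubic")

def rife_interpolate(f0, f1):
    """Stand-in for RIFE: the real model predicts optical flow between
    frames; a naive midpoint blend is used here for illustration."""
    return 0.5 * (f0 + f1)

def enhance_frames(frames, gen):
    """Enhance each frame, then insert interpolated frames between
    consecutive outputs (the 2x fps-multiplier case)."""
    hires = [gen(f) for f in frames]          # per-frame enhancement
    smoothed = [hires[0]]
    for prev, cur in zip(hires, hires[1:]):
        smoothed.append(rife_interpolate(prev, cur))  # temporal smoothing
        smoothed.append(cur)
    return smoothed
```

With 3 input frames this yields 5 output frames: the 3 enhanced frames plus 2 interpolated in-betweens.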
- Python 3.8+
- CUDA 11.7+ (for NVIDIA GPUs) or Mac M4 with MPS support
- FFmpeg (for video processing)
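Since both CUDA and Apple-silicon MPS backends are supported, a small helper (illustrative, not part of the codebase) can select whichever accelerator is available:

```python
import torch

def pick_device():
    """Prefer CUDA (NVIDIA), then MPS (Apple silicon), else fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```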
```bash
# Clone the repository
git clone https://github.com/yourusername/alen-vfe.git
cd alen-vfe

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download RIFE pretrained model
python scripts/download_rife.py
```

```python
from inference.enhancer import VideoEnhancer
from omegaconf import OmegaConf

# Load configuration
config = OmegaConf.load("config/inference_config.yaml")

# Initialize enhancer
enhancer = VideoEnhancer(config)

# Enhance video
enhancer.enhance_video(
    input_path="input_video.mp4",
    output_path="enhanced_video.mp4",
    scale_factor=4,
)
```

```bash
python inference/enhance.py \
    --input input_video.mp4 \
    --output enhanced_video.mp4 \
    --checkpoint checkpoints/best_model.pth \
    --scale 4 \
    --enable-rife
```

We use the Vimeo-90K dataset for training:
```bash
# Download dataset
python data/download.py --dataset vimeo90k --output ./data

# Prepare training data
python data/prepare_dataset.py \
    --dataset vimeo90k \
    --downscale-factor 4 \
    --output ./data/processed
```

- Upload the project to Kaggle
- Open `notebooks/train_kaggle.ipynb`
- Ensure GPU accelerator is enabled (T4 recommended)
- Run all cells
```bash
python training/train.py \
    --config config/training_config.yaml \
    --output-dir ./checkpoints
```

DIV2K:
- Size: ~7GB (perfect for a quick start!)
- Images: 800 training + 100 validation
- Resolution: up to 2K high-quality images
- Download: Official Link
- Why: much smaller than Vimeo-90K, faster downloads, great for testing

Vimeo-90K:
- Size: ~82GB
- Sequences: 89,800 triplets (3 frames each)
- Resolution: 448×256
- Download: Official Link
- Why: video-specific data, more data for production models
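Dataset preparation amounts to bicubically downscaling each high-resolution frame by the chosen factor to form LR/HR training pairs. A sketch of that step (the function name and output layout are illustrative; the real logic lives in `data/prepare_dataset.py`):

```python
from pathlib import Path
from PIL import Image

def make_lr_hr_pair(hr_path, out_dir, factor=4):
    """Save a bicubically downscaled copy of a high-res frame alongside the
    original, mirroring what the --downscale-factor flag does (sketch)."""
    hr = Image.open(hr_path).convert("RGB")
    w, h = hr.size
    lr = hr.resize((w // factor, h // factor), Image.BICUBIC)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stem = Path(hr_path).stem
    lr.save(out / f"{stem}_lr.png")
    hr.save(out / f"{stem}_hr.png")
    return lr.size
```

For a 448×256 Vimeo-90K frame and factor 4, this yields a 112×64 low-res input.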
Edit `config/training_config.yaml`:

```yaml
model:
  generator:
    lora_rank: 8        # Higher = more capacity, more VRAM
    inference_steps: 4  # 1-4 steps for fast inference

training:
  batch_size: 8
  num_epochs: 100
  learning_rate:
    generator: 1.0e-5
    discriminator: 4.0e-4

loss:
  weights:
    mse: 1.0
    lpips: 0.5
    adversarial: 0.1
```

Edit `config/inference_config.yaml`:
```yaml
enhancement:
  scale_factor: 4
  enable_rife: true
  target_fps_multiplier: 2

video:
  batch_size: 10
  output_codec: "libx264"
  output_crf: 18
```

```
alen-vfe/
├── config/                 # Configuration files
│   ├── training_config.yaml
│   └── inference_config.yaml
├── data/                   # Dataset utilities
│   ├── __init__.py
│   ├── dataset.py
│   ├── download.py
│   └── prepare_dataset.py
├── models/                 # Model architectures
│   ├── __init__.py
│   ├── generator.py        # Stable Diffusion + LoRA
│   ├── discriminator.py    # PatchGAN
│   └── rife.py             # RIFE integration
├── training/               # Training infrastructure
│   ├── __init__.py
│   ├── losses.py
│   ├── trainer.py
│   └── utils.py
├── inference/              # Inference pipeline
│   ├── __init__.py
│   ├── enhancer.py
│   ├── enhance.py          # CLI script
│   └── video_utils.py
├── notebooks/              # Jupyter notebooks for experiments and training
├── runs/                   # TensorBoard event logs for training visualization
├── outputs/                # Enhanced video outputs and sample results
├── checkpoints/            # Model checkpoints (ignored by git)
├── dataset/                # Training datasets (ignored by git)
├── requirements.txt
└── README.md
```
- `notebooks/`: Jupyter notebooks for exploratory data analysis, experimental training runs, and Kaggle-specific setup.
- `runs/`: TensorBoard event files. Visualize training progress by running `tensorboard --logdir runs/`.
- `outputs/`: All enhanced videos, preview images, and test results.
- `checkpoints/`: Model weights saved during training.
- `dataset/`: Local storage for training data such as DIV2K or Vimeo-90K.
```bash
# Run unit tests
pytest tests/

# Test inference pipeline
python tests/test_pipeline.py --checkpoint checkpoints/best_model.pth

# Benchmark performance
python tests/benchmark.py --device cuda
```

The model uses a combined loss function:
L_total = λ₁·L_MSE + λ₂·L_LPIPS + λ₃·L_ADV
- L_MSE: Pixel-wise Mean Squared Error (structural accuracy)
- L_LPIPS: Learned Perceptual Image Patch Similarity (perceptual quality)
- L_ADV: Adversarial Loss (realism)
Default weights: λ₁ = 1.0, λ₂ = 0.5, λ₃ = 0.1
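In code, the weighted sum can be sketched as follows (a simplified stand-in for `training/losses.py`; the function name and the optional `lpips_fn` hook are illustrative — plug in e.g. `lpips.LPIPS(net="vgg")` for the perceptual term):

```python
import torch
import torch.nn.functional as F

def combined_loss(fake, real, d_fake_logits,
                  w_mse=1.0, w_lpips=0.5, w_adv=0.1, lpips_fn=None):
    """Weighted sum of pixel, perceptual, and adversarial generator losses."""
    l_mse = F.mse_loss(fake, real)                                   # structural accuracy
    l_lpips = lpips_fn(fake, real).mean() if lpips_fn else fake.new_zeros(())  # perceptual
    # Generator objective: push the discriminator's logits on fakes toward "real" (1)
    l_adv = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))
    return w_mse * l_mse + w_lpips * l_lpips + w_adv * l_adv
```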
Important
This project is currently in an experimental phase.
- Fine-tuning: We are experimenting with using a text-to-image model for video fine-tuning, which is a suboptimal approach and may lead to unexpected results.
- Resources: Due to limited training resources (GPU time/memory), the current model outputs may not yet reach production-grade quality.
- Outputs: The latest experimental outputs are shared in the `outputs/` folder for review.
Latest enhancement preview. See the `outputs/` folder for full video results.
- Stable Diffusion by Stability AI
- RIFE by Megvii Research
- pix2pix for PatchGAN architecture
- LPIPS for perceptual loss
MIT License - see LICENSE file for details
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
For questions or issues, please open a GitHub issue.