🔍 Visual Inference is a deepfake detection system that uses Xception-based feature extraction and Transformer-based sequence modeling to classify images as real or fake. This project provides a Dockerized setup to simplify model deployment and inference.
✔️ Xception-based feature extraction for frame analysis
✔️ Transformer Encoder for sequence modeling
✔️ Cross-Attention Mechanism to refine embeddings
✔️ Docker support for easy deployment
✔️ Inference statistics to analyze model performance
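The README does not specify what the cross-attention step attends to or how it is parameterized. As a minimal sketch of the mechanism only, the pure-Python function below implements single-head scaled dot-product cross-attention: one sequence (assumed here to be per-frame embeddings) supplies the queries, and a second sequence supplies the keys and values. The names `cross_attention`, `frame_embeddings`, and `context` are hypothetical, and the learned query/key/value projections that a real layer would include are omitted.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attention(
    queries: List[Vector], keys: List[Vector], values: List[Vector]
) -> List[Vector]:
    """Single-head scaled dot-product cross-attention (illustrative sketch,
    not the project's implementation; no learned projections)."""
    d_k = len(keys[0])
    dim = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        weights = softmax([dot(q, k) / math.sqrt(d_k) for k in keys])
        # Output is the attention-weighted sum of the value vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)])
    return outputs


# Hypothetical toy inputs: two 2-d frame embeddings attending over a 2-item context.
frame_embeddings = [[1.0, 0.0], [0.0, 1.0]]
context = [[1.0, 0.0], [0.0, 1.0]]
refined = cross_attention(frame_embeddings, context, context)
```

In PyTorch, `torch.nn.MultiheadAttention` can be called with a query tensor from one sequence and key/value tensors from another, which gives the same behavior with learned projections and multiple heads.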
git clone git@github.com:gocenalper/visual-inference.git
cd visual-inference
pip install torch torchvision timm tqdm pillow
We provide a lightweight Docker image for running the model without manually installing dependencies.
docker build -t dfdc-inference .
Run the following command from the repository root to mount your code and dataset (the current directory) at /app inside the container:
docker run --rm -it -v "$(pwd)":/app dfdc-inference
If your machine has CUDA-enabled NVIDIA GPUs and the NVIDIA Container Toolkit installed, use:
docker run --gpus all --rm -it -v "$(pwd)":/app dfdc-inference
The dataset should be organized in the following layout, with one folder of frames per video:
/DFDC/
├── REAL/
│ ├── TRAIN/
│ │ ├── video_0001/
│ │ │ ├── frame_01.jpg
│ │ │ ├── frame_02.jpg
│ │ ├── video_0002/
│ ├── TEST/
│ ├── VAL/
├── FAKE/
│ ├── TRAIN/
│ │ ├── video_0003/
│ │ │ ├── frame_01.jpg
│ │ │ ├── frame_02.jpg
│ ├── TEST/
│ ├── VAL/
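With this layout, each video folder takes its label from its `REAL/` or `FAKE/` ancestor. The standard-library sketch below collects the sorted frame paths for every video in one split. The `index_split` name and the `0 = REAL, 1 = FAKE` encoding are assumptions for illustration, not taken from the project's code.

```python
from pathlib import Path
from typing import List, Tuple

# Hypothetical label encoding; the project's own encoding is not documented here.
LABELS = {"REAL": 0, "FAKE": 1}


def index_split(root: str, split: str) -> List[Tuple[List[Path], int]]:
    """Return (sorted frame paths, label) for every video folder in one split,
    e.g. index_split("/DFDC", "TEST")."""
    samples = []
    for class_name, label in LABELS.items():
        split_dir = Path(root) / class_name / split
        if not split_dir.is_dir():
            continue  # this class has no folder for the requested split
        for video_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            # Sorting by filename keeps frame_01.jpg, frame_02.jpg, ... in order.
            frames = sorted(video_dir.glob("*.jpg"))
            if frames:
                samples.append((frames, label))
    return samples
```

Sorting frames by filename assumes zero-padded names like `frame_01.jpg`, as in the tree above; unpadded numbering would need a numeric sort key.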
Once the Docker container is running, the model processes the test images, printing a result for each image as it goes, followed by summary statistics:
🔍 Running Untrained Model Inference on Test Data...
📌 Image 1: Predicted = FAKE, Actual = REAL, ❌ Incorrect
📌 Image 2: Predicted = REAL, Actual = FAKE, ❌ Incorrect
📌 Image 3: Predicted = REAL, Actual = REAL, ✅ Correct
📌 Image 4: Predicted = FAKE, Actual = FAKE, ✅ Correct
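Each line above pairs a predicted label with the ground-truth label. A minimal sketch of producing such lines, assuming a binary encoding (0 = REAL, 1 = FAKE) and a hypothetical 0.5 threshold on a predicted probability of FAKE; the project's actual decision rule is not documented in this README.

```python
CLASS_NAMES = ("REAL", "FAKE")  # assumed encoding: 0 = REAL, 1 = FAKE


def to_label(prob_fake: float, threshold: float = 0.5) -> int:
    """Hypothetical decision rule: predict FAKE when P(fake) >= threshold."""
    return 1 if prob_fake >= threshold else 0


def format_result(index: int, predicted: int, actual: int) -> str:
    """Build one per-image log line in the format shown above."""
    verdict = "✅ Correct" if predicted == actual else "❌ Incorrect"
    return (
        f"📌 Image {index}: Predicted = {CLASS_NAMES[predicted]}, "
        f"Actual = {CLASS_NAMES[actual]}, {verdict}"
    )


# Example: a frame scored P(fake) = 0.8 whose ground truth is REAL (label 0).
print(format_result(1, to_label(0.8), 0))
# → 📌 Image 1: Predicted = FAKE, Actual = REAL, ❌ Incorrect
```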
📊 **Inference Statistics (Before Training)**
🔹 Total Images Processed: 1000
🟢 Real Predictions: 500 (50.0%)
🔴 Fake Predictions: 500 (50.0%)
✅ Correct Predictions: 495 (49.5%) Accuracy
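The summary block can be reproduced from the lists of predicted and true labels. This sketch assumes a hypothetical 0 = REAL, 1 = FAKE encoding. Applied to the four example images above (predicted FAKE, REAL, REAL, FAKE; actual REAL, FAKE, REAL, FAKE), it gives 2 correct out of 4, or 50% accuracy.

```python
from typing import Dict, Sequence

REAL, FAKE = 0, 1  # assumed label encoding


def summarize(predictions: Sequence[int], labels: Sequence[int]) -> Dict[str, float]:
    """Counts and percentages for an "Inference Statistics" summary."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    total = len(predictions)
    real_preds = sum(1 for p in predictions if p == REAL)
    fake_preds = sum(1 for p in predictions if p == FAKE)
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)

    def pct(n: int) -> float:
        # Guard against division by zero on an empty test set.
        return 100.0 * n / total if total else 0.0

    return {
        "total": total,
        "real_predictions": real_preds,
        "real_pct": pct(real_preds),
        "fake_predictions": fake_preds,
        "fake_pct": pct(fake_preds),
        "correct": correct,
        "accuracy_pct": pct(correct),
    }


# The four example images from the log above.
stats = summarize(predictions=[FAKE, REAL, REAL, FAKE], labels=[REAL, FAKE, REAL, FAKE])
```

For a binary task, random guessing has an expected accuracy of 50%, so a figure like 49.5% before training is consistent with an untrained model.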