Towards Zero-Emission AI: Ultra-Lightweight Transformer Architectures using Dynamic Sparsification for Edge Devices
This repository contains the official implementation and research framework for developing Ultra-Lightweight Transformer Architectures. The primary goal of this project is to drastically reduce the carbon footprint, memory usage, and energy consumption of Large Language Models (LLMs) via Dynamic Sparsification, so that they can be deployed on resource-constrained edge devices (e.g., smartphones, IoT nodes) while preserving downstream task performance.
Modern Transformer-based architectures deliver state-of-the-art results but come with a massive computational and environmental cost. This project introduces a novel neural network pruning and structural optimization framework:
- Dynamic Sparsification: Evaluates token and attention-head importance in real time during inference and skips the matrix computations judged redundant.
- Hardware-Aware Optimization: Tailors the sparse computational graph specifically to compile efficiently on mobile CPUs and Edge NPUs (Neural Processing Units).
- Green AI Evaluation: Quantifies success not just by Accuracy/F1-Score, but by tracking Energy Consumption (Joules) and Carbon Footprint ($CO_2$ emissions).
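
The Green AI metrics above come down to simple unit arithmetic. The sketch below is illustrative only and is not taken from this repository: the function names are hypothetical, and the grid carbon intensity is a placeholder you would replace with a value for your own region. The only fixed fact it relies on is that 1 kWh equals 3.6 MJ.

```python
# Illustrative sketch: convert measured energy (Joules) into kWh and grams of CO2.
# All names here are hypothetical and are not part of this repository's API.

JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ by definition


def joules_to_kwh(energy_joules: float) -> float:
    """Convert an energy reading in Joules to kilowatt-hours."""
    if energy_joules < 0:
        raise ValueError("energy must be non-negative")
    return energy_joules / JOULES_PER_KWH


def co2_grams(energy_joules: float, grid_intensity_g_per_kwh: float) -> float:
    """Estimate emissions in grams of CO2 for a given energy use.

    grid_intensity_g_per_kwh is an assumption supplied by the caller
    (grams of CO2 emitted per kWh of electricity in the local grid).
    """
    return joules_to_kwh(energy_joules) * grid_intensity_g_per_kwh


def joules_per_inference(total_joules: float, num_inferences: int) -> float:
    """Average energy cost of a single inference call."""
    if num_inferences <= 0:
        raise ValueError("num_inferences must be positive")
    return total_joules / num_inferences


if __name__ == "__main__":
    total = 7200.0     # made-up reading: Joules used by a benchmark run
    intensity = 400.0  # placeholder grid intensity, g CO2 per kWh
    print(f"{joules_per_inference(total, 1000):.2f} J per inference")
    print(f"{co2_grams(total, intensity):.3f} g CO2 for the run")
```

Reporting Joules per inference alongside accuracy makes models of different sizes comparable on the same hardware, while the CO2 figure additionally depends on where the hardware is powered.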
- Dynamic Attention Pruning: A custom attention layer that zeroes out low-weight attention scores on the fly.
- Weight Quantization: Mixed-precision training with FP16-to-INT8/INT4 weight conversion, optimized for edge deployment.
- Comprehensive Benchmarking: Direct comparison against standard baselines (e.g., BERT-mini, MobileBERT, TinyLlama) on the GLUE and SuperGLUE benchmarks.
- Telemetry Tools: Built-in integration with CodeCarbon to measure absolute power draw during training and inference.
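
To make the dynamic attention pruning idea listed above concrete, here is a minimal pure-Python sketch for a single query. It is not the layer implemented in `src/models/`: the fixed `threshold`, the renormalization of surviving weights, and the fallback to the single largest weight are all assumptions chosen for illustration.

```python
# Illustrative sketch of threshold-based attention pruning for one query vector.
# Function names and the default threshold are hypothetical.
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prune_attention(weights: List[float], threshold: float) -> List[float]:
    """Zero out weights below `threshold` and renormalize the rest to sum to 1."""
    kept = [w if w >= threshold else 0.0 for w in weights]
    total = sum(kept)
    if total == 0.0:
        # Nothing survived the threshold: keep only the largest weight.
        best = max(range(len(weights)), key=weights.__getitem__)
        return [1.0 if i == best else 0.0 for i in range(len(weights))]
    return [w / total for w in kept]


def sparse_attend(
    query: List[float],
    keys: List[List[float]],
    values: List[List[float]],
    threshold: float = 0.05,
) -> Tuple[List[float], int]:
    """Scaled dot-product attention that skips pruned positions.

    Returns the output vector and the number of value rows that were skipped.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = prune_attention(softmax(scores), threshold)

    out = [0.0] * len(values[0])
    skipped = 0
    for w, row in zip(weights, values):
        if w == 0.0:
            skipped += 1  # no multiply-accumulate for pruned positions
            continue
        for j, x in enumerate(row):
            out[j] += w * x
    return out, skipped


if __name__ == "__main__":
    q = [1.0, 0.0]
    K = [[10.0, 0.0], [0.0, 0.0], [-10.0, 0.0]]
    V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    output, n_skipped = sparse_attend(q, K, V)
    print(output, n_skipped)
```

In this sketch the savings come from the skipped multiply-accumulate work in the weighted sum. A production layer would operate on batched tensors and would need sparsity patterns the target hardware can actually exploit.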
├── src/
│ ├── models/ # Custom Dynamic Sparse Transformer architectures
│ ├── training/ # Pruning-aware training and fine-tuning pipelines
│ ├── quantization/ # Quantization scripts for edge deployment (ONNX/TFLite)
│ └── utils/ # Telemetry, carbon tracking, and data loaders
├── data/ # GLUE benchmark processing scripts
├── benchmarks/ # Scripted evaluations for latency, memory, and energy
├── notebooks/ # Exploratory analysis and structural pruning visualization
├── Literature_Review/ # Research matrix and BibTeX files of reference papers
└── README.md