Towards Zero-Emission AI: Ultra-Lightweight Transformer Architectures using Dynamic Sparsification for Edge Devices

This repository contains the official implementation and research framework for Ultra-Lightweight Transformer Architectures. The project aims to reduce the carbon footprint, memory usage, and energy consumption of Large Language Models (LLMs) through Dynamic Sparsification, so that they can be deployed on resource-constrained edge devices (e.g., smartphones, IoT nodes) while preserving downstream task performance.

📌 Research Abstract & Core Concept

Modern Transformer-based architectures deliver state-of-the-art results, but at a high computational and environmental cost. This project introduces a neural network pruning and structural optimization framework built on three ideas:

Dynamic Sparsification: Scores the importance of each token and attention head at inference time and skips the matrix computations for those judged redundant.
Hardware-Aware Optimization: Tailors the sparse computational graph specifically to compile efficiently on mobile CPUs and Edge NPUs (Neural Processing Units).
Green AI Evaluation: Quantifies success not just by Accuracy/F1-Score, but by tracking Energy Consumption (Joules) and Carbon Footprint ($CO_2$ emissions).
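
To make the token-level part of Dynamic Sparsification concrete, here is a minimal sketch in plain Python. It assumes one common importance heuristic (the average attention a token receives from all queries) and a fixed keep ratio; the repository may use a different criterion. The names `token_importance`, `select_tokens`, and `keep_ratio` are hypothetical and are not taken from the codebase.

```python
import math
from typing import List


def token_importance(attn: List[List[float]]) -> List[float]:
    """Score each token by the mean attention it receives.

    `attn[q][k]` is the attention weight from query q to key k, so the
    score of token k is the mean of column k. This heuristic is an
    assumption made for illustration.
    """
    n_queries = len(attn)
    n_keys = len(attn[0])
    return [sum(attn[q][k] for q in range(n_queries)) / n_queries
            for k in range(n_keys)]


def select_tokens(attn: List[List[float]], keep_ratio: float) -> List[int]:
    """Return the indices of the tokens to keep, in their original order."""
    scores = token_importance(attn)
    # Keep at least one token and never more than the sequence length.
    n_keep = min(len(scores), max(1, math.ceil(keep_ratio * len(scores))))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])
```

Later layers would then compute attention and feed-forward outputs only for the returned indices, which is where the saved matrix computations come from. Head-level importance could be handled the same way by scoring heads instead of tokens.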

🛠️ Key Features & Methodology

Dynamic Attention Pruning: A custom attention layer that zeroes out low-weight attention scores on the fly during inference.
Weight Quantization: Mixed-precision training in FP16, followed by quantization to INT8/INT4 for edge deployment.
Comprehensive Benchmarking: Direct comparison against standard baselines (e.g., BERT-mini, MobileBERT, TinyLlama) on the GLUE and SuperGLUE benchmarks.
Telemetry Tools: Built-in integration with CodeCarbon to track power draw and estimate energy consumption and CO2-equivalent emissions during training and inference.
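
The first two features above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the repository's implementation: the pruning step is assumed to apply a fixed threshold to softmax weights and then renormalize, and the quantization step is assumed to be symmetric per-tensor INT8. The function names `prune_attention_row`, `quantize_int8`, and `dequantize` are hypothetical.

```python
import math
from typing import List, Tuple


def prune_attention_row(scores: List[float], threshold: float) -> List[float]:
    """Softmax one row of raw scores, zero small weights, renormalize."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    kept = [w if w >= threshold else 0.0 for w in weights]
    kept_total = sum(kept)
    if kept_total == 0.0:
        # The threshold removed everything: keep only the largest weight.
        best = weights.index(max(weights))
        return [1.0 if i == best else 0.0 for i in range(len(weights))]
    return [w / kept_total for w in kept]


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization of a non-empty list to [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Map INT8 codes back to approximate floating-point values."""
    return [q * scale for q in quantized]
```

Zeroed attention weights let a sparse kernel skip the matching value rows, and the single `scale` factor is all that has to be stored alongside the INT8 codes. INT4 would follow the same pattern with the range [-7, 7], at the price of a larger rounding error.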

📂 Repository Structure

├── src/
│   ├── models/             # Custom Dynamic Sparse Transformer architectures
│   ├── training/           # Pruning-aware training and fine-tuning pipelines
│   ├── quantization/       # Quantization scripts for edge deployment (ONNX/TFLite)
│   └── utils/              # Telemetry, carbon tracking, and data loaders
├── data/                   # GLUE benchmark processing scripts
├── benchmarks/             # Scripted evaluations for latency, memory, and energy
├── notebooks/              # Exploratory analysis and structural pruning visualization
├── Literature_Review/      # Research matrix and BibTeX files of reference papers
└── README.md
