TAP-ViTs: Task-Adaptive Pruning for On-Device Deployment of Vision Transformers

Zhibo Wang, Zuoyuan Zhang, Xiaoyi Pang, Qile Zhang, Xuanyi Hao, Shuguo Zhuo, Peng Sun

[Paper (arXiv)] [PDF]

Overview

Vision Transformers (ViTs) deliver strong performance across vision tasks but are too heavy for mobile and edge devices. Existing pruning methods either produce one-size-fits-all pruned models (ignoring device heterogeneity) or require fine-tuning with private on-device data (infeasible due to resource and privacy constraints).

TAP-ViTs is a task-adaptive pruning framework that generates device-specific pruned ViTs without accessing any raw local data. It addresses two core challenges:

  1. How to understand each device's task without seeing its data? → Each device fits a lightweight Gaussian Mixture Model (GMM) on its private data and uploads only the GMM parameters. The cloud uses these to select distribution-consistent public samples as a proxy metric dataset.

  2. How to prune effectively for each device's specific task? → A dual-granularity importance evaluation strategy jointly measures composite neuron importance and adaptive layer importance, enabling fine-grained, task-aware pruning tailored to each device's computational budget.
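As a minimal sketch of the first step, assuming each device summarizes its private feature distribution with a diagonal-covariance GMM and the cloud ranks public samples by log-likelihood under the uploaded parameters (the paper's exact GMM configuration and selection rule may differ):

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of samples x under a diagonal-covariance GMM.

    x: (n, d) public samples; weights: (k,); means/variances: (k, d).
    """
    n, d = x.shape
    # Per-component log densities: log N(x | mu_k, diag(var_k))
    diff = x[:, None, :] - means[None, :, :]                 # (n, k, d)
    log_det = np.sum(np.log(variances), axis=1)              # (k,)
    mahal = np.sum(diff**2 / variances[None, :, :], axis=2)  # (n, k)
    log_comp = -0.5 * (d * np.log(2 * np.pi) + log_det[None, :] + mahal)
    # Log-sum-exp over components, weighted by the mixture weights
    log_weighted = log_comp + np.log(weights)[None, :]
    m = log_weighted.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_weighted - m).sum(axis=1, keepdims=True))).ravel()

def select_proxy_samples(public_x, weights, means, variances, k):
    """Pick the k public samples most consistent with the device's GMM."""
    scores = gmm_log_likelihood(public_x, weights, means, variances)
    return np.argsort(scores)[::-1][:k]

# Toy example: device distribution centered at +2, distractors at -5.
rng = np.random.default_rng(0)
public_x = np.vstack([rng.normal(2.0, 1.0, size=(50, 4)),
                      rng.normal(-5.0, 1.0, size=(50, 4))])
idx = select_proxy_samples(public_x,
                           weights=np.array([1.0]),
                           means=np.array([[2.0, 2.0, 2.0, 2.0]]),
                           variances=np.array([[1.0, 1.0, 1.0, 1.0]]),
                           k=20)
```

Only the GMM parameters (weights, means, variances) cross the network, never the raw samples, and the selected indices point into the public dataset the cloud already holds.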

Key Results

  • Consistently outperforms state-of-the-art pruning methods (UPop, SAViT, UPDP, etc.) under comparable compression ratios across multiple ViT backbones (DeiT-S, DeiT-B, Swin-T, Swin-S)
  • Maintains accuracy under aggressive pruning while respecting privacy constraints (no raw data leaves the device)
  • Works across heterogeneous devices with different computational budgets

Framework

┌──────────────────────────────────────────────────┐
│                  Cloud Server                    │
│                                                  │
│  Public Dataset ──→ GMM-based Sample Selection   │
│                         ↓                        │
│        Task-representative Metric Dataset        │
│                         ↓                        │
│      Dual-granularity Importance Evaluation      │
│         (Neuron-level + Layer-level)             │
│                         ↓                        │
│          Device-specific Pruned ViT              │
└─────────────────────────┬────────────────────────┘
                          │ Deploy
             ┌────────────┼────────────┐
             ▼            ▼            ▼
         Device A     Device B     Device C
        (GMM params) (GMM params) (GMM params)
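One way to read the dual-granularity step in the pipeline above, as a sketch (the composite neuron score and adaptive layer score here are placeholders; the paper defines its own formulations): combine each neuron's importance with its layer's importance, then keep the globally top-scoring neurons up to the device's budget.

```python
import numpy as np

def dual_granularity_prune(neuron_scores, layer_scores, keep_ratio):
    """Rank neurons by neuron-level score * layer-level score; keep top fraction.

    neuron_scores: list of (n_l,) arrays, one per layer (composite neuron importance).
    layer_scores: (L,) array of adaptive layer importance.
    keep_ratio: fraction of neurons to keep, set by the device's budget.
    Returns one boolean keep-mask per layer.
    """
    combined = [s * w for s, w in zip(neuron_scores, layer_scores)]
    flat = np.concatenate(combined)
    k = int(np.ceil(keep_ratio * flat.size))
    threshold = np.sort(flat)[::-1][k - 1]   # k-th largest combined score
    return [c >= threshold for c in combined]

# Toy example: two layers; layer 1 deemed twice as important as layer 0,
# so its neurons survive more easily under the same budget.
masks = dual_granularity_prune(
    neuron_scores=[np.array([0.9, 0.1, 0.5]), np.array([0.4, 0.6])],
    layer_scores=np.array([1.0, 2.0]),
    keep_ratio=0.6,
)
```

Because the threshold is global, a layer judged unimportant loses proportionally more neurons, which is the intuition behind combining the two granularities rather than pruning each layer by a fixed ratio.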

My Contributions

  • Designed federated optimization methods integrating pruned ViTs with LoRA parameter sharing for on-device deployment
  • Applied Adaptive Personalized Federated Learning (APFL) for client personalization
  • Customized BOHB-based Bayesian hyperparameter search strategy
  • Refactored the full codebase for scalability and reproducibility
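The LoRA parameter sharing mentioned above follows the standard low-rank adaptation formulation (this is a generic sketch of that idea, not the repository's implementation): freeze the pretrained weight W and train only a rank-r update (alpha/r) * B @ A, so each client shares r*(d_in + d_out) values instead of a full d_out x d_in matrix.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha/r) * B @ A.

    In a federated setting only A and B need to be communicated,
    not the full d_out x d_in weight matrix.
    """
    def __init__(self, weight, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.weight = weight                          # frozen pretrained weight
        d_out, d_in = weight.shape
        self.A = rng.normal(0, 0.01, size=(r, d_in))  # trainable down-projection
        self.B = np.zeros((d_out, r))                 # trainable up-projection, zero-init
        self.scale = alpha / r

    def __call__(self, x):
        # y = x W^T + scale * x A^T B^T
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(np.eye(3), r=2)
x = np.ones((1, 3))
y = layer(x)  # with B zero-initialized, the LoRA branch starts inactive
```

Zero-initializing B makes the adapted layer exactly match the frozen layer at the start of training, which keeps federated rounds stable while shrinking the per-round payload.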

Code

Code release is pending approval and will be available soon. Star/watch this repo to get notified!

Citation

@article{wang2026tapvits,
  title={TAP-ViTs: Task-Adaptive Pruning for On-Device Deployment of Vision Transformers},
  author={Wang, Zhibo and Zhang, Zuoyuan and Pang, Xiaoyi and Zhang, Qile and Hao, Xuanyi and Zhuo, Shuguo and Sun, Peng},
  journal={arXiv preprint arXiv:2601.02437},
  year={2026}
}

Acknowledgments

This work was conducted at Zhejiang University under the supervision of Prof. Zhibo Wang.
