Zhibo Wang, Zuoyuan Zhang, Xiaoyi Pang, Qile Zhang, Xuanyi Hao, Shuguo Zhuo, Peng Sun
Vision Transformers (ViTs) deliver strong performance across vision tasks but are too heavy for mobile and edge devices. Existing pruning methods either produce one-size-fits-all pruned models (ignoring device heterogeneity) or require fine-tuning with private on-device data (infeasible due to resource and privacy constraints).
TAP-ViTs is a task-adaptive pruning framework that generates device-specific pruned ViTs without accessing any raw local data. It addresses two core challenges:
- How to understand each device's task without seeing its data? → Each device fits a lightweight Gaussian Mixture Model (GMM) on its private data and uploads only the GMM parameters. The cloud uses these parameters to select distribution-consistent public samples as a proxy metric dataset.
- How to prune effectively for each device's specific task? → A dual-granularity importance evaluation strategy jointly measures composite neuron importance and adaptive layer importance, enabling fine-grained, task-aware pruning tailored to each device's computational budget.
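The cloud-side selection step above only needs the uploaded GMM parameters, never the raw samples. Below is a minimal NumPy sketch of that idea: score public feature vectors by their log-density under a diagonal-covariance GMM and keep the top-k as the proxy metric dataset. Function names, the diagonal-covariance assumption, and the top-k rule are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def gmm_log_density(x, weights, means, covs):
    """Log-density of points x under a diagonal-covariance GMM.

    weights: (K,), means: (K, D), covs: (K, D) diagonal variances --
    the only statistics a device would upload; no raw data leaves it.
    """
    x = np.atleast_2d(x)                              # (N, D)
    diff = x[:, None, :] - means[None, :, :]          # (N, K, D)
    log_norm = -0.5 * np.log(2 * np.pi * covs).sum(-1)          # (K,)
    log_prob = log_norm - 0.5 * (diff ** 2 / covs[None]).sum(-1)  # (N, K)
    return np.logaddexp.reduce(np.log(weights) + log_prob, axis=-1)

def select_metric_dataset(public_feats, weights, means, covs, k):
    """Cloud side (hypothetical): keep the k public samples most
    consistent with the device's uploaded GMM."""
    scores = gmm_log_density(public_feats, weights, means, covs)
    return np.argsort(scores)[-k:][::-1]              # indices, best first
```

Because only `(weights, means, covs)` cross the network, the device's privacy constraint is preserved while the cloud still obtains a task-representative evaluation set.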
- Consistently outperforms state-of-the-art pruning methods (UPop, SAViT, UPDP, etc.) under comparable compression ratios across multiple ViT backbones (DeiT-S, DeiT-B, Swin-T, Swin-S)
- Maintains accuracy under aggressive pruning while respecting privacy constraints (no raw data leaves the device)
- Works across heterogeneous devices with different computational budgets
┌─────────────────────────────────────────────────┐
│ Cloud Server │
│ │
│ Public Dataset ──→ GMM-based Sample Selection │
│ ↓ │
│ Task-representative Metric Dataset │
│ ↓ │
│ Dual-granularity Importance Evaluation │
│ (Neuron-level + Layer-level) │
│ ↓ │
│ Device-specific Pruned ViT │
└──────────────────────┬──────────────────────────┘
│ Deploy
┌────────────┼────────────┐
▼ ▼ ▼
Device A Device B Device C
(GMM params) (GMM params) (GMM params)
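To make the dual-granularity step in the diagram concrete, here is a hypothetical NumPy sketch: each layer's importance is summarized from its neuron scores (assumed here to be a composite of statistics measured on the metric dataset), the global keep budget is tilted toward more important layers, and the lowest-scoring neurons are pruned within each layer. This is an illustrative allocation rule, not the paper's exact formulation.

```python
import numpy as np

def dual_granularity_prune(neuron_scores, keep_ratio):
    """neuron_scores: list of 1-D arrays, one per layer; higher = more
    important. keep_ratio: global fraction of neurons to retain.
    Returns one boolean keep-mask per layer."""
    layer_imp = np.array([s.mean() for s in neuron_scores])   # layer level
    sizes = np.array([len(s) for s in neuron_scores], dtype=float)
    # Size-weighted mean importance; layers above it get a larger budget.
    weighted_mean = (sizes * layer_imp).sum() / sizes.sum()
    ratios = np.minimum(keep_ratio * layer_imp / weighted_mean, 1.0)
    masks = []
    for s, r in zip(neuron_scores, ratios):
        n = int(round(r * len(s)))                            # neuron level
        mask = np.zeros(len(s), dtype=bool)
        if n > 0:
            mask[np.argsort(s)[-n:]] = True                   # keep top-n
        masks.append(mask)
    return masks
```

The clipping to 1.0 means the realized budget can undershoot slightly for very skewed layer importances; a production version would redistribute the slack.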
- Designed federated optimization methods integrating pruned ViTs with LoRA parameter sharing for on-device deployment
- Applied Adaptive Personalized Federated Learning (APFL) for client personalization
- Customized BOHB-based Bayesian hyperparameter search strategy
- Refactored the full codebase for scalability and reproducibility
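The LoRA parameter sharing mentioned above can be sketched as follows: the pruned backbone weight stays frozen on-device, and only the low-rank factors are trained and averaged across clients. Class and function names, shapes, and the plain-FedAvg aggregation are assumptions for illustration, not the project's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

class LoRALinear:
    """Minimal LoRA-augmented linear layer: y = x (W + (alpha/r) B A)^T.
    W is the frozen (pruned) weight; only A and B are communicated."""
    def __init__(self, d_in, d_out, rank=4, alpha=8):
        self.W = rng.standard_normal((d_out, d_in))     # frozen backbone
        self.A = rng.standard_normal((rank, d_in)) * 0.01
        self.B = np.zeros((d_out, rank))                # zero-init: no drift at start
        self.scale = alpha / rank

    def forward(self, x):
        return x @ (self.W + self.scale * self.B @ self.A).T

def fedavg_lora(layers):
    """Hypothetical federated step: average only the LoRA factors;
    the frozen backbones never leave their devices."""
    A = np.mean([l.A for l in layers], axis=0)
    B = np.mean([l.B for l in layers], axis=0)
    for l in layers:
        l.A, l.B = A.copy(), B.copy()
```

Communicating only `A` and `B` keeps the per-round payload at `rank * (d_in + d_out)` values per layer, far below the cost of sharing full pruned weights.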
Code release is pending approval and will be available soon. Star/watch this repo to get notified!
@article{wang2026tapvits,
title={TAP-ViTs: Task-Adaptive Pruning for On-Device Deployment of Vision Transformers},
author={Wang, Zhibo and Zhang, Zuoyuan and Pang, Xiaoyi and Zhang, Qile and Hao, Xuanyi and Zhuo, Shuguo and Sun, Peng},
journal={arXiv preprint arXiv:2601.02437},
year={2026}
}

This work was conducted at Zhejiang University under the supervision of Prof. Zhibo Wang.