Adaptive Rank Pruning: Dynamic Low-Rank Model Merging and Compression for Efficient AI Deployment

Open Access

Author(s) - M Vedhanth, S. Mahadevi, Anil Kumar

Publication year - 2025

Publication title - IEEE Access

Language(s) - English

Resource type - Magazines

SCImago Journal Rank - 0.587

H-Index - 127

eISSN - 2169-3536

DOI - 10.1109/access.2025.3619975

Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation

Deploying large pretrained models on resource-limited devices remains a fundamental challenge in machine learning. Model merging and low-rank compression are two common remedies, but they generally rely on static techniques, such as factorization with a fixed rank (e.g., singular value decomposition) or weight averaging, which degrade performance to some extent. This work introduces Adaptive Rank Pruning (ARP), which dynamically optimizes the rank of each layer during merging using a variance-thresholding criterion, yielding a unified, high-quality approach to compression and merging. ARP requires no retraining and is rigorously compared against both a classical method (SVD) and modern state-of-the-art baselines (LoRA and QLoRA). Extensive experiments on vision (ResNet) and language (BERT) tasks show that ARP achieves a better accuracy–compression trade-off, reducing model size by up to 2.5× with less than 4% accuracy loss. We further demonstrate ARP on edge hardware (Raspberry Pi 4, Google Pixel 6), validating its ability to reduce inference latency and energy consumption relative to alternative methods. Our results establish ARP as a robust and effective approach for deploying adaptable AI in real-world constrained environments.
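
The abstract names the ingredients of ARP (merging, a per-layer rank, and a variance threshold) but not the algorithm itself. The following is a minimal NumPy sketch of one plausible reading, not the authors' implementation. It assumes that merging is a weighted average of two weight matrices and that "variance" means the share of squared singular-value energy retained. The names select_rank and merge_and_compress, and the parameters variance_threshold and alpha, are hypothetical.

```python
import numpy as np


def select_rank(singular_values, variance_threshold=0.95):
    """Return the smallest rank whose squared singular values retain at
    least `variance_threshold` of the total energy (assumed criterion)."""
    energy = np.square(np.asarray(singular_values, dtype=float))
    total = energy.sum()
    if total == 0.0:
        return 1  # degenerate all-zero layer: keep a single component
    cumulative = np.cumsum(energy) / total
    # First index at which the cumulative share reaches the threshold.
    index = int(np.searchsorted(cumulative, variance_threshold, side="left"))
    return min(index + 1, len(energy))


def merge_and_compress(w_a, w_b, variance_threshold=0.95, alpha=0.5):
    """Average two layers' weights (assumed merge rule), then factor the
    result at a rank chosen from its own singular-value spectrum."""
    merged = alpha * w_a + (1.0 - alpha) * w_b
    u, s, vt = np.linalg.svd(merged, full_matrices=False)
    rank = select_rank(s, variance_threshold)
    left = u[:, :rank] * s[:rank]  # shape (m, rank), singular values folded in
    right = vt[:rank, :]           # shape (rank, n)
    return left, right, rank


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_a = rng.standard_normal((64, 32))
    w_b = rng.standard_normal((64, 32))
    left, right, rank = merge_and_compress(w_a, w_b, variance_threshold=0.9)
    dense_params = w_a.size
    factored_params = left.size + right.size
    print(f"rank={rank}, parameters {dense_params} -> {factored_params}")
```

Because the rank is derived from each layer's own spectrum, applying merge_and_compress layer by layer gives different ranks to different layers, which is the contrast with fixed-rank factorization drawn in the abstract. Under this sketch, a factored layer of shape (m, n) stores rank × (m + n) values, so it only saves memory when the selected rank is below m × n / (m + n).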
