QuantEdge: A Hybrid Quantization Approach for Optimized AI Deployment Across Edge Devices
Author(s) - Rasim Mahmudov, Deok-Hwan Kim
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3609798
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
Deploying artificial intelligence (AI) models on edge devices introduces significant challenges due to limited computational resources, strict latency requirements, and energy constraints. These limitations hinder the performance of traditional deep learning models in real-time applications. This study addresses the pressing problem of optimizing AI inference for heterogeneous and resource-constrained edge environments by introducing QuantEdge, a hybrid quantization approach that combines post-training quantization (PTQ) and quantization-aware training (QAT). The proposed method dynamically adapts model precision and computational load based on device-specific constraints, making it suitable for a wide spectrum of hardware from low-power IoT nodes to advanced embedded systems. Experiments conducted on devices such as Jetson AGX Xavier, Asus Tinker Edge T, Raspberry Pi, and AGX clusters show that QuantEdge reduces inference latency by up to 31.8% while maintaining high accuracy. Additionally, it significantly improves energy efficiency and memory usage. The research is motivated by the growing demand for efficient on-device AI in real-world domains such as autonomous vehicles, mobile health diagnostics, smart surveillance, and edge-enabled IoT. QuantEdge presents a robust solution to real-time AI deployment challenges by tailoring quantization dynamically to hardware capabilities, thus enhancing the practicality and scalability of edge AI systems.
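The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of choosing a precision from device constraints and then applying post-training-style uniform quantization. It is not the authors' QuantEdge algorithm. The `DeviceProfile` fields, the thresholds in `choose_bit_width`, and all function names are hypothetical, and the quantization-aware training half of the hybrid approach is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DeviceProfile:
    """Hypothetical description of an edge device (fields are assumptions)."""
    name: str
    memory_mb: int
    supports_int8: bool


def choose_bit_width(device: DeviceProfile) -> int:
    """Pick a weight bit-width from device constraints.

    The thresholds are illustrative only; they are not taken from the paper.
    """
    if not device.supports_int8:
        return 16  # fall back to a wider format on hardware without int8 support
    return 8 if device.memory_mb >= 1024 else 4


def quantize(weights: List[float], bits: int) -> Tuple[List[int], float]:
    """Symmetric uniform quantization of floats to signed integers."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map quantized integers back to approximate float values."""
    return [v * scale for v in q]


if __name__ == "__main__":
    device = DeviceProfile(name="low-power-node", memory_mb=512, supports_int8=True)
    bits = choose_bit_width(device)
    weights = [0.5, -1.0, 0.25, 0.0]
    q, scale = quantize(weights, bits)
    print(bits, q, dequantize(q, scale))
```

In this sketch the reconstruction error of each weight is bounded by half the quantization step (`scale / 2`), so a lower bit-width trades accuracy for a smaller memory footprint, which is the kind of trade-off the abstract describes adapting per device.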