
Knowledge-Distilled Multi-Task Model with Enhanced Transformer and Bidirectional Mamba2 for Air Quality Forecasting
Author(s) - Zi-Ang Xie, Chee-Onn Chow, Joon Huang Chuah, Wong Jee Keen Raymond
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3595679
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
Accurate air quality forecasting is crucial for public health and environmental policy, yet existing deep learning models often suffer from high computational costs and inadequate modeling of temporal dependencies and multi-pollutant interactions. This paper proposes a novel deep learning framework that integrates an Enhanced Transformer with Bidirectional Mamba2, optimized through multi-task learning and knowledge distillation. Using a teacher–student paradigm, the teacher model captures rich temporal semantics, while the lightweight student model retains predictive accuracy with reduced inference costs. Key innovations include a hybrid architecture combining multi-scale global-local attention and long-range dependency modeling, a regression-specific knowledge distillation approach with soft target smoothing and intermediate representation transfer, and an end-to-end multi-task design for joint forecasting of multiple pollutants. Extensive experiments on real-world datasets from Guangzhou, Chengdu, and Beijing (2018–2022) across four key pollutants (PM2.5, PM10, NO₂, SO₂) demonstrate that our model significantly outperforms baseline methods, with the student model maintaining accuracy within 5% of the teacher while requiring fewer parameters. These results highlight the framework's potential for accurate, scalable, and efficient air quality forecasting in real-time and resource-constrained environments.
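The abstract names a regression-specific distillation objective with soft target smoothing and intermediate representation transfer, applied jointly across several pollutant targets, but does not give its exact form. The Python sketch below is one plausible reading under stated assumptions, not the authors' formulation, and it covers only the loss, not the Enhanced Transformer or Bidirectional Mamba2 architecture. Assumed here: "soft target smoothing" is modeled as a convex blend of the teacher's prediction and the ground truth with a hypothetical weight `alpha`; "intermediate representation transfer" is a mean squared error between student and teacher feature vectors that are assumed to be already projected to the same width; `w_soft`, `w_feat`, and `task_weights` are hypothetical hyperparameters. The helper `within_relative_gap` assumes "within 5% of the teacher" means a relative gap on an error metric, which the abstract does not specify.

```python
"""Hypothetical sketch of a multi-task regression distillation loss.

Assumptions (not taken from the paper): "soft target smoothing" is a convex
blend of teacher output and ground truth; "intermediate representation
transfer" is an MSE between feature vectors already projected to equal width.
"""
from typing import Dict, List, Optional

Series = List[float]


def mse(a: Series, b: Series) -> float:
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def smoothed_soft_target(teacher: Series, truth: Series, alpha: float) -> Series:
    """Hypothetical smoothing: pull teacher predictions toward the ground truth."""
    return [alpha * t + (1.0 - alpha) * y for t, y in zip(teacher, truth)]


def distillation_loss(
    student_pred: Dict[str, Series],
    teacher_pred: Dict[str, Series],
    truth: Dict[str, Series],
    student_feat: Series,
    teacher_feat: Series,
    alpha: float = 0.7,   # hypothetical smoothing weight
    w_soft: float = 0.5,  # hypothetical weight of the soft-target term
    w_feat: float = 0.1,  # hypothetical weight of the feature-transfer term
    task_weights: Optional[Dict[str, float]] = None,
) -> float:
    """Sum of per-pollutant hard and soft losses plus a feature-matching term."""
    total = 0.0
    for task, y in truth.items():  # one task per pollutant, e.g. "PM2.5"
        w = 1.0 if task_weights is None else task_weights.get(task, 1.0)
        hard = mse(student_pred[task], y)  # fit the observed concentrations
        soft_target = smoothed_soft_target(teacher_pred[task], y, alpha)
        soft = mse(student_pred[task], soft_target)  # fit the smoothed teacher
        total += w * (hard + w_soft * soft)
    # Intermediate representation transfer: match the teacher's hidden features.
    return total + w_feat * mse(student_feat, teacher_feat)


def within_relative_gap(student_err: float, teacher_err: float, tol: float = 0.05) -> bool:
    """True if the student's error exceeds the teacher's by at most `tol` (relative)."""
    return student_err <= teacher_err * (1.0 + tol)


if __name__ == "__main__":
    truth = {"PM2.5": [1.0, 1.0]}
    loss = distillation_loss({"PM2.5": [0.0, 0.0]}, {"PM2.5": [1.0, 1.0]},
                             truth, [0.0], [2.0])
    print(loss)  # roughly 1.9: hard 1.0 + 0.5 * soft 1.0 + 0.1 * feature 4.0
    print(within_relative_gap(1.04, 1.0))  # student error 4% above teacher
```

Blending toward the ground truth is only one way to "smooth" regression targets; temporal smoothing of the teacher's output sequence would be an equally plausible reading, and the paper itself should be consulted for the actual definition.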