Federated Learning with Dual-end Gradient Correction and Proxy-free Self-Distillation
Author(s) -
Haomin Wei,
YuJiang Luo,
Tao Xie,
Yug Yang
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3610089
Subject(s) - aerospace; bioengineering; communication, networking and broadcast technologies; components, circuits, devices and systems; computing and processing; engineered materials, dielectrics and plasmas; engineering profession; fields, waves and electromagnetics; general topics for engineers; geoscience; nuclear engineering; photonics and electrooptics; power, energy and industry applications; robotics and control systems; signal processing and analysis; transportation
Under non-independent and identically distributed (non-IID) data and limited client participation, repeated local training iterations in federated learning exacerbate model drift, introducing directional bias into global aggregation. Existing single-end optimization approaches, such as client-side regularization or server-side control variates, struggle to maintain global-local consistency. To address this, we introduce the Federated Learning with Dual-end Gradient Correction and Proxy-free Self-Distillation (FedDGC-PSD) algorithm. On the client side, a self-distillation mechanism based on historical models constrains local prediction consistency via a Kullback-Leibler (KL) divergence loss, mitigating overfitting without external data; concurrently, a gradient correction term uses historical gradients to align local updates with the global direction. On the server side, dynamic cross-round gradient correction terms weight historical gradients to reduce variance accumulation. Extensive experiments were conducted on five datasets against four benchmark algorithms. On CIFAR-100, FedDGC-PSD outperforms FedAvg, FedProx, SCAFFOLD, and FedDyn by 13.41%, 14.05%, 20.23%, and 3.41%, respectively. In a large-scale scenario with 1,000 clients and 1.5% participation, FedDGC-PSD maintains stable convergence on CIFAR-10, outperforming FedDyn by 17.62%. Ablation studies show that combining the Dual-end Gradient Correction (DGC) and Proxy-free Self-Distillation (PSD) modules improves global model accuracy by about 0.71%, 2.26%, 4.50%, and 6.16% on MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100, respectively, compared with using the DGC module alone.
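As a rough illustration of the client-side objective described in the abstract, the PyTorch sketch below combines a cross-entropy loss, a KL-divergence self-distillation term against the client's historical model, and a linear gradient-correction term. This is a minimal sketch, not the authors' reference code: the temperature T, the weights lambda_kd and lambda_gc, and the exact (linear) form of the correction are illustrative assumptions, and the paper's precise formulation may differ.

import torch
import torch.nn.functional as F

def client_loss(model, hist_model, x, y, global_grad,
                T=2.0, lambda_kd=0.5, lambda_gc=0.1):
    """Cross-entropy plus KL self-distillation against the client's historical
    model, plus a linear term nudging the update toward the stored global
    gradient direction (one plausible reading of 'gradient correction')."""
    logits = model(x)
    ce = F.cross_entropy(logits, y)

    with torch.no_grad():                      # the historical model is frozen
        hist_logits = hist_model(x)
    kd = F.kl_div(F.log_softmax(logits / T, dim=1),
                  F.softmax(hist_logits / T, dim=1),
                  reduction="batchmean") * T * T

    # Inner product of the current parameters with the stored global gradient,
    # a SCAFFOLD/FedDyn-style linear correction; assumed here for illustration.
    gc = sum((p * g).sum() for p, g in zip(model.parameters(), global_grad))

    return ce + lambda_kd * kd + lambda_gc * gc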