
Guaranteed distributed machine learning: Privacy-preserving empirical risk minimization
Author(s) -
Kwabena Owusu-Agyemang,
Benjamin Appiah,
Hu Xiong
Publication year - 2021
Publication title -
Mathematical Biosciences and Engineering
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.451
H-Index - 45
eISSN - 1551-0018
pISSN - 1547-1063
DOI - 10.3934/mbe.2021243
Subject(s) - differential privacy , computer science , empirical risk minimization , secure multi-party computation , information privacy , perturbation , data mining , algorithm , machine learning , artificial intelligence , computer security
Distributed learning over data from sensor-based networks makes it possible to train models collaboratively on sensitive data without privacy leakage. We present a distributed learning framework that integrates secure multi-party computation with differential privacy. For the differential privacy component, we explore both output perturbation and gradient perturbation and advance the state of the art of both techniques in the distributed learning domain. In our multi-scheme output perturbation algorithm (MS-OP), data owners combine their local classifiers within a secure multi-party computation and then inject carefully calibrated statistical noise into the aggregate model before it is revealed. In our adaptive iterative gradient perturbation (MS-GP) method, data providers collaboratively train a global model; during each iteration, the data owners aggregate their locally trained models within the secure multi-party domain. Because naive conversions of differentially private algorithms to this setting waste the privacy budget, we improve on them by meticulously calibrating the budget spent in each iteration. As the model parameters approach their optimal values, the gradients shrink and therefore require more accurate measurement. We therefore add a line-search capability that lets the MS-GP algorithm decide exactly when a more accurate measurement of the gradient is indispensable. Validation on three real-world datasets shows that our algorithms outperform existing state-of-the-art privacy-preserving approaches in the distributed setting.
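
The abstract gives no pseudocode for MS-OP, so the following is a minimal sketch of the generic output-perturbation pattern it describes, assuming L2-regularized empirical risk minimization so that the classical 2/(n*lambda) sensitivity bound applies. The secure multi-party aggregation is simulated here by a plain average, and all names and parameters are illustrative assumptions rather than the authors' API.

```python
# Minimal sketch of output perturbation over aggregated local models.
# Assumption: each owner trained an L2-regularized ERM classifier, so the
# parameter vector has sensitivity on the order of 2 / (n * lam).
import numpy as np

def ms_op_release(local_weights, n_min, lam, epsilon, rng=None):
    """local_weights : list of 1-D parameter vectors, one per data owner
    n_min   : smallest number of records held by any single owner
    lam     : L2 regularization strength used in local training
    epsilon : differential privacy budget for the one-shot release"""
    rng = np.random.default_rng() if rng is None else rng
    # In the paper this average is computed inside secure multi-party
    # computation; averaging in the clear here is purely for illustration.
    w_avg = np.mean(local_weights, axis=0)
    # Sensitivity bound for regularized ERM output perturbation.
    sensitivity = 2.0 / (n_min * lam)
    # Per-coordinate Laplace noise is a simplification; the exact mechanism
    # depends on the norm used in the sensitivity analysis.
    noise = rng.laplace(scale=sensitivity / epsilon, size=w_avg.shape)
    return w_avg + noise
```

Releasing a single noisy aggregate keeps the privacy cost independent of the number of training iterations, which is the usual appeal of output perturbation over gradient perturbation.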
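Similarly, the sketch below shows one way the MS-GP loop could look: a per-iteration privacy budget drawn from a fixed schedule, Laplace noise added to the aggregated gradient, and a crude noise-dominance test standing in for the line-search trigger that decides when a more accurate gradient measurement is worth extra budget. The budget schedule, the threshold, and the re-measurement rule are all assumptions for illustration, not the paper's algorithm.

```python
# Minimal sketch of iterative gradient perturbation with per-iteration
# budgets and a line-search-style accuracy test, loosely following the
# MS-GP description. In the paper the aggregation happens inside secure
# MPC; grad_fn is called in the clear here for illustration only.
import numpy as np

def ms_gp_train(grad_fn, w0, budgets, clip, n, lr=0.1, rng=None):
    """grad_fn(w): average of per-example gradients, each clipped to norm <= clip
    budgets : per-iteration epsilon values (their sum bounds the total budget)
    n       : total number of training examples across all data owners"""
    rng = np.random.default_rng() if rng is None else rng
    w = np.asarray(w0, dtype=float)
    for eps in budgets:
        g = grad_fn(w)
        # Noise scale from an assumed 2 * clip / n sensitivity bound.
        scale = 2.0 * clip / (n * eps)
        g_noisy = g + rng.laplace(scale=scale, size=g.shape)
        # Near the optimum the true gradient shrinks; if noise dominates
        # the measurement, re-measure with a larger (assumed doubled)
        # budget, which the overall accounting would have to charge for.
        if np.linalg.norm(g_noisy) < scale * np.sqrt(g.size):
            scale = 2.0 * clip / (n * 2.0 * eps)
            g_noisy = g + rng.laplace(scale=scale, size=g.shape)
        w = w - lr * g_noisy
    return w
```

Spending a small budget on early, large gradients and reserving accuracy for late, small gradients is the intuition behind the adaptive calibration the abstract describes.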