Open Access
Scalable Nonparametric Supervised Learning for Streaming and Massive Data: Applications in Healthcare Monitoring and Credit Risk
Author(s) - Mohamed Chaouch, Omama M. Al-Hamed
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3591883
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
This paper introduces novel nonparametric supervised learning techniques for classifying massive datasets, addressing key limitations of existing methods in Big Data and streaming-data settings. We propose an offline kernel-based classifier that uses batch Principal Component Analysis (PCA) for dimensionality reduction to mitigate the “curse of dimensionality”. For streaming data, we also develop an online classifier that combines online PCA with a kernel-based recursive classifier updated by a stochastic approximation algorithm. In an application to fetal well-being monitoring, the online classifier achieves a competitive median misclassification rate (11.92%), comparable to those of the offline classifier (11.54%) and Random Forest (11.31%), while requiring only 1/15th of the offline classifier’s computation time. Receiver Operating Characteristic (ROC) analysis shows that the offline classifier attains a higher Area Under the Curve (AUC), but at a significant computational cost. A second study, on a larger credit-scoring database, confirms these findings: the online classifier achieves an F1-score of 96.40% and an accuracy of 93.08%, closely matching neural networks (96.46%, 93.22%) and boosting (96.51%, 93.31%). Notably, it does so with a CPU time of only 0.87 seconds per classification, over 600 times faster than neural networks, which demonstrates its effectiveness for high-frequency, real-time financial decision-making.
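The abstract does not give the estimators' formulas, so the following is only a minimal sketch of the two ideas under our own assumptions. We assume a Gaussian kernel. For the offline classifier, we use batch PCA via SVD followed by a kernel vote with one global bandwidth. For the online classifier, we use a recursive mean/covariance update that is re-diagonalised at prediction time, as a stand-in for the authors' online PCA. This is paired with a Wolverton-Wagner-type recursive kernel estimator, in which each observation keeps the bandwidth `h_n = h0 * n**(-a)` it received on arrival. With step size `gamma_n = 1/n`, the stochastic-approximation recursion `f_n = (1 - gamma_n) f_{n-1} + gamma_n K_{h_n}(. - X_n)` unrolls to an average of per-observation kernels. As a result, the class score `sum_{i: y_i = c} h_i^{-k} K((z - Z_i)/h_i)` is proportional to the prior times the class density. All names (`offline_classify`, `OnlineKernelClassifier`, `h0`, `a`) and the synthetic data are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def gaussian_kernel(u):
    """Product Gaussian kernel evaluated row-wise on an (n, k) array."""
    k = u.shape[1]
    return np.exp(-0.5 * np.sum(u * u, axis=1)) / (2 * np.pi) ** (k / 2)


def offline_classify(X, y, X_new, k=2, h=0.5):
    """Assumed offline variant: batch PCA, then a kernel-weighted class vote."""
    mean = X.mean(axis=0)
    # Top-k principal directions from the SVD of the centred training data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    V = Vt[:k].T
    Z, Z_new = (X - mean) @ V, (X_new - mean) @ V
    classes = np.unique(y)
    preds = []
    for z in Z_new:
        w = gaussian_kernel((z - Z) / h)  # one global bandwidth h
        preds.append(classes[int(np.argmax([w[y == c].sum() for c in classes]))])
    return np.array(preds)


class OnlineKernelClassifier:
    """Hypothetical online variant: recursive PCA statistics + recursive kernel scores."""

    def __init__(self, dim, k=2, h0=1.0, a=0.2):
        self.k, self.h0, self.a, self.n = k, h0, a, 0
        self.mean = np.zeros(dim)
        self.cov = np.zeros((dim, dim))
        self.X, self.y, self.h = [], [], []

    def partial_fit(self, x, label):
        x = np.asarray(x, dtype=float).copy()
        self.n += 1
        gamma = 1.0 / self.n  # stochastic-approximation step size
        delta = x - self.mean
        self.mean += gamma * delta
        # Welford-type recursive update of the (population) covariance.
        self.cov += gamma * (np.outer(delta, x - self.mean) - self.cov)
        # The bandwidth is fixed on arrival; old terms are never recomputed.
        self.X.append(x)
        self.y.append(label)
        self.h.append(self.h0 * self.n ** (-self.a))

    def predict(self, x):
        # Current top-k eigenvectors (eigh returns eigenvalues in ascending order).
        _, vecs = np.linalg.eigh(self.cov)
        V = vecs[:, ::-1][:, : self.k]
        Z = (np.asarray(self.X) - self.mean) @ V
        z = (np.asarray(x, dtype=float) - self.mean) @ V
        h = np.asarray(self.h)[:, None]
        # Each stored point contributes K((z - Z_i) / h_i) / h_i^k.
        w = gaussian_kernel((z - Z) / h) / h[:, 0] ** self.k
        y = np.asarray(self.y)
        classes = np.unique(y)
        return classes[int(np.argmax([w[y == c].sum() for c in classes]))]


# Synthetic demo (not the paper's data): class 1 is shifted by 4 along coordinate 0.
rng = np.random.default_rng(0)
d, n_train, n_test = 10, 400, 200


def make_data(n):
    labels = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, d))
    X[:, 0] += 4.0 * labels
    return X, labels


X_tr, y_tr = make_data(n_train)
X_te, y_te = make_data(n_test)
acc_offline = np.mean(offline_classify(X_tr, y_tr, X_te) == y_te)

clf = OnlineKernelClassifier(dim=d)
for x, label in zip(X_tr, y_tr):  # a single pass over the stream
    clf.partial_fit(x, label)
acc_online = np.mean([clf.predict(x) == t for x, t in zip(X_te, y_te)])
print(f"offline accuracy {acc_offline:.3f}, online accuracy {acc_online:.3f}")
```

In this sketch an update costs one rank-one covariance correction plus an append, so no past term is revisited. Past points are still stored, however, so prediction cost grows with the stream length. A production version would need a bounded-memory scheme, which the abstract does not describe.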
