Semi-Supervised Deep Fuzzy C-Mean Clustering for Software Fault Prediction | Zendy

Ali Arshad | Zendy; Saman Riaz | Zendy; Licheng Jiao | Zendy; Aparna Murthy | Zendy

AI Assistant Blog Pricing

Home ZAIA Blog

Open Access

Semi-Supervised Deep Fuzzy C-Mean Clustering for Software Fault Prediction

Author(s) -

Ali Arshad,

Saman Riaz,

Licheng Jiao,

Aparna Murthy

Publication year - 2018

Publication title -

ieee access

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.587

H-Index - 127

ISSN - 2169-3536

DOI - 10.1109/access.2018.2835304

Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation

Software fault prediction is a consequential research area in software quality promise. In this paper, we propose a semi-supervised deep fuzzy C-mean (DFCM) clustering for software fault prediction, which is the cumulation of semi-supervised DFCM clustering and feature compression techniques. Deep is utilized for the feature-based multi clusters of unlabeled and labeled data sets along with their labeled classes. In our approach, for the training model, we simultaneously deal with the unsupervised data and supervised data to exploit the obnubilated information from unlabeled data to labeled data to support the construction of the precise model. We utilize DFCM clustering to handle the class imbalance problem and withal fuzzy theory logic is very akin to human logic and it is facile to comprehend. We further ameliorate the prediction performance with the coalescence of feature learning techniques-feature extraction and feature selection (using random-under sampling) to generate good features and remove irrelevant and redundant features to reduce the noisy data for classification. However, by the performance of the model results, the amalgamation of deep multi clusters and feature techniques work better due to their ability to identify and amalgamation essential information in data feature. The classification model is predicted on the maximum homogeneous between the features of labeled and unlabeled data, the model is trained on the un-noisy data set obtained by the deep coalescence of multi clusters and feature techniques. To check the efficacy of our approach, we chose data sets from real-world software project (NASA & Eclipse), and then we compared our approach with a number of latest classical base-line methods, and investigate the performance by using performance measures such as probability of detection, F-measure, and area under the curve.

The content you want is available to Zendy users.

Already have an account? Click here to sign in.

Having issues? You can contact us here

Accelerating Research