Weighted KNN using grey relational analysis for cross-project defect prediction
Author(s) - D. I. Ulumi, Daniel Siahaan
Publication year - 2019
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1230/1/012062
Subject(s) - computer science , random forest , feature selection , data mining , naive bayes classifier , feature (linguistics) , software , domain (mathematical analysis) , machine learning , artificial intelligence , selection (genetic algorithm) , process (computing) , focus (optics) , support vector machine , mathematics , mathematical analysis , philosophy , linguistics , physics , optics , programming language , operating system
Defect prediction plays an important role in detecting vulnerable components within a software system. Researchers have tried to improve the accuracy of software defect prediction so that it helps developers better manage resources (people, cost, and time), but they focus on building defect prediction models only for a specific domain. To our knowledge, research on cross-project domains has not been carried out before. This research developed a method to predict software defects across project domains, where the domains contain datasets with different numbers of features. To complete the shorter datasets, the method estimates the values of the absent features; weighted KNN is employed to fill in these missing values. The completed datasets are then classified using Naive Bayes and Random Forest. This research also conducted a feature selection process to select features relevant to defect detection, by means of a comparative analysis of feature selection methods. For the experiments, this research used seven NASA MDP public datasets. The results show that for imbalanced data, Naive Bayes combined with information gain (IG) or symmetric uncertainty (SU) feature selection produced the best balance, i.e. 0.4975. For balanced data, Random Forest combined with gain ratio (GR) produced the best balance, i.e. 0.7795. In general, the developed method performed comparably to the previous method, which classifies only a specific domain, i.e. 0.4975, and even outperformed the previous method on dataset PC2, i.e. 0.4033.
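
The pipeline described in the abstract (weighted-KNN imputation of the absent features, followed by feature selection and classification) can be sketched roughly as follows. This is a minimal, assumed reconstruction in Python, not the authors' actual implementation: it uses grey relational analysis as the similarity measure driving the weighted KNN, synthetic data in place of the NASA MDP projects, and scikit-learn's mutual_info_classif and GaussianNB as stand-ins for the paper's IG/SU filters and Naive Bayes classifier. All function names, parameters (e.g. ZETA, k), and data are illustrative.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import GaussianNB

ZETA = 0.5  # distinguishing coefficient commonly used in grey relational analysis


def grey_relational_grade(reference, candidates):
    """Grey relational grade of each candidate row with respect to the reference row.

    Rows are compared only over the columns passed in (the features the
    reference instance actually has).
    """
    diff = np.abs(candidates - reference)                    # point-wise deviations
    d_min, d_max = diff.min(), diff.max()
    coeff = (d_min + ZETA * d_max) / (diff + ZETA * d_max)   # relational coefficients
    return coeff.mean(axis=1)                                # grade = mean coefficient per row


def impute_weighted_knn(target, source, k=5):
    """Fill NaN columns of `target` rows from the k most grey-related rows of
    `source`, weighting neighbour values by their relational grades."""
    filled = target.copy()
    for i, row in enumerate(target):
        missing = np.isnan(row)
        if not missing.any():
            continue
        shared = ~missing
        grades = grey_relational_grade(row[shared], source[:, shared])
        nn = np.argsort(grades)[-k:]                         # k neighbours with highest grade
        w = grades[nn] / grades[nn].sum()                    # normalised weights
        filled[i, missing] = w @ source[np.ix_(nn, np.where(missing)[0])]
    return filled


# Toy usage on synthetic data (a stand-in for one source and one target project).
rng = np.random.default_rng(0)
source_X = rng.random((100, 10))                             # complete source-project data
y = rng.integers(0, 2, size=100)                             # defect labels for the source
target_X = rng.random((30, 10))
target_X[rng.random(target_X.shape) < 0.2] = np.nan          # simulate absent features

target_filled = impute_weighted_knn(target_X, source_X, k=5)

# Downstream: filter-based feature selection plus Naive Bayes, mirroring the
# IG/SU + Naive Bayes combination reported in the abstract.
selector = SelectKBest(mutual_info_classif, k=5).fit(source_X, y)
clf = GaussianNB().fit(selector.transform(source_X), y)
predictions = clf.predict(selector.transform(target_filled))
```

In practice, features would typically be min-max normalised before computing the grey relational coefficients, and the same selection and classification would be repeated with Random Forest and the GR filter for the balanced-data setting; those steps are omitted here for brevity.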
