Using discriminative feature in software entities for relevance identification of code changes
Author(s) - Huang Yuan, Chen Xiangping, Liu Zhiyong, Luo Xiaonan, Zheng Zibin
Publication year - 2017
Publication title - Journal of Software: Evolution and Process
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.371
H-Index - 29
eISSN - 2047-7481
pISSN - 2047-7473
DOI - 10.1002/smr.1859
Subject(s) - commit , discriminative model , computer science , artificial intelligence , relevance (law) , identification (biology) , feature (linguistics) , machine learning , code (set theory) , source code , data mining , pattern recognition (psychology) , programming language , database , linguistics , philosophy , botany , set (abstract data type) , political science , law , biology
Abstract Developers often bundle unrelated changes (e.g., a bug fix and a feature addition) in a single commit and then submit this “poorly cohesive” commit to the version control system. Such a commit consists of multiple independent code changes and makes code review harder. If code changes can be identified as related or unrelated before they are committed, the “cohesiveness” of a commit can be guaranteed. Inspired by the effectiveness of machine learning techniques in classification, we model the relevance identification of code changes as a binary classification problem (i.e., related and unrelated changes) and propose discriminative features in software entities to characterize the relevance of code changes. In particular, to quantify the discriminative features, 21 coupling rules and 4 cochanged type relationships are extracted from software entities to construct a related changes vector (RCV). The 21 coupling rules, at the granularities of class, attribute, and method, capture the relevance of code changes along the structural coupling dimension, and the 4 cochanged type relationships capture the change type combinations of software entities that may cause related changes. Based on the RCV, machine learning algorithms are applied to identify the relevance of code changes. The experimental results show that the probabilistic neural network and the general regression neural network provide statistically significant improvements in the accuracy of relevance identification over the other 4 machine learning algorithms. The related changes vector with 72 dimensions (RCV72) outperforms the other 2 RCVs with fewer dimensions.
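As a rough illustration of the binary classification setup described in the abstract, the sketch below shows how a probabilistic neural network (in its simplest Parzen-window form) could label a feature vector as related or unrelated. It is not the authors' implementation: the function name `pnn_predict`, the smoothing parameter `sigma`, and the 4-dimensional toy vectors are all assumptions made for illustration, and the vectors do not correspond to the paper's actual coupling rules or cochanged type relationships.

```python
import math


def pnn_predict(train, labels, x, sigma=0.5):
    """Minimal Parzen-window PNN sketch: return the class whose training
    vectors give the largest average Gaussian kernel response at x."""
    scores = {}
    counts = {}
    for vec, label in zip(train, labels):
        # Squared Euclidean distance between a training vector and x.
        sq_dist = sum((a - b) ** 2 for a, b in zip(vec, x))
        kernel = math.exp(-sq_dist / (2 * sigma ** 2))
        scores[label] = scores.get(label, 0.0) + kernel
        counts[label] = counts.get(label, 0) + 1
    # Average the kernel responses per class, then pick the largest.
    return max(scores, key=lambda c: scores[c] / counts[c])


# Hypothetical toy data: each position stands in for one rule or
# relationship being present (1) or absent (0). The real RCVs described
# in the abstract have up to 72 dimensions.
train = [
    [1, 1, 0, 1],  # labelled related
    [1, 0, 1, 1],  # labelled related
    [0, 0, 0, 1],  # labelled unrelated
    [0, 0, 1, 0],  # labelled unrelated
]
labels = ["related", "related", "unrelated", "unrelated"]

prediction = pnn_predict(train, labels, [1, 1, 1, 1])
print(prediction)
```

In this toy setting, a query vector that is close to the related examples is assigned the related class. The choice of `sigma` controls how quickly a training example's influence decays with distance and would need tuning on real data.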