Impact of structural weighting on a latent Dirichlet allocation–based feature location technique
Author(s) - Eddy Brian P., Kraft Nicholas A., Gray Jeff
Publication year - 2018
Publication title - Journal of Software: Evolution and Process
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.371
H-Index - 29
eISSN - 2047-7481
pISSN - 2047-7473
DOI - 10.1002/smr.1892
Subject(s) - weighting, latent Dirichlet allocation, computer science, feature (linguistics), data mining, artificial intelligence, pattern recognition (psychology), topic model, medicine, linguistics, philosophy, radiology
Abstract - Text retrieval–based feature location techniques (FLTs) use the information carried by the terms present in a system's classes and methods. However, relevant terms originating from certain locations (e.g., method names) often make up only a small part of a method's lexicon, so FLTs should benefit from techniques that make greater use of this information. The primary objective of this study was to investigate how weighting terms from different locations in source code can improve a latent Dirichlet allocation (LDA)–based FLT. We conducted an empirical study of 4 subject software systems and 372 features. For each subject system, we trained 1024 different LDA models with new weighting schemes applied to leading comments, method names, parameters, body comments, and local variables. We performed both quantitative and qualitative analyses to identify the effects of the weighting schemes on the performance of the LDA-based FLT, evaluating each scheme using the mean reciprocal rank and the spread of the effectiveness measures. In addition, we conducted a factorial analysis to identify which locations have a main effect on the results of the FLT. We then examined the effects of adding information from class comments, class names, and fields to the top 10 configurations for each system, which resulted in an additional 640 LDA models per system. From our results, we identified a significant effect on the performance of the LDA-based FLT when our weighting schemes were applied. Furthermore, we found that adding information from each method's containing class can improve the effectiveness of an LDA-based FLT. Finally, we identified a set of recommendations for choosing better weighting schemes for LDA.
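The abstract mentions two mechanisms that are easy to misread: weighting terms by the structural location they come from before topic modeling, and scoring a configuration by mean reciprocal rank. The Python sketch below illustrates both under assumed names and weight values; it is not the authors' tool, and the locations, weights, and the weighted_document/mean_reciprocal_rank helpers are illustrative only.

```python
# Minimal sketch of structural term weighting for an LDA-based FLT.
# All identifiers and weight values here are illustrative assumptions,
# not the configuration studied in the paper.

from collections import Counter

# A method "document" keyed by the structural location its terms came from.
method_lexicon = {
    "leading_comment": ["draw", "shape", "canvas"],
    "method_name":     ["draw", "shape"],
    "parameters":      ["graphics", "context"],
    "body_comments":   ["render", "outline"],
    "local_variables": ["point", "stroke", "width"],
}

# One candidate weighting configuration: how many times terms from each
# location are replicated before topic modeling (hypothetical values).
weights = {
    "leading_comment": 2,
    "method_name":     4,
    "parameters":      1,
    "body_comments":   1,
    "local_variables": 1,
}

def weighted_document(lexicon, weights):
    """Replicate each term according to the weight of its location."""
    bag = Counter()
    for location, terms in lexicon.items():
        for term in terms:
            bag[term] += weights.get(location, 1)
    return bag  # term -> weighted count, ready to feed an LDA trainer

def mean_reciprocal_rank(ranks):
    """MRR over the rank of the first relevant method per feature query."""
    return sum(1.0 / r for r in ranks) / len(ranks)

if __name__ == "__main__":
    print(weighted_document(method_lexicon, weights))
    # e.g., ranks of the first relevant method for three feature queries
    print(mean_reciprocal_rank([1, 3, 10]))
```

A weighted bag of counts like this could be handed to any LDA implementation that accepts (term, count) pairs, which is how many candidate weighting configurations could be trained and then compared by their mean reciprocal rank and spread of effectiveness measures.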