Impact of structural weighting on a latent Dirichlet allocation–based feature location technique
Author(s) - Eddy Brian P., Kraft Nicholas A., Gray Jeff
Publication year - 2018
Publication title - Journal of Software: Evolution and Process
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.371
H-Index - 29
eISSN - 2047-7481
pISSN - 2047-7473
DOI - 10.1002/smr.1892
Subject(s) - weighting, latent Dirichlet allocation, computer science, feature (linguistics), data mining, artificial intelligence, pattern recognition (psychology), topic model, medicine, linguistics, philosophy, radiology
Abstract - Text retrieval–based feature location techniques (FLTs) use the information carried by the terms present in a system's classes and methods. However, relevant terms originating from certain locations (e.g., method names) often make up only a small part of a method's lexicon, so FLTs should benefit from techniques that make greater use of this information. The primary objective of this study was to investigate how weighting terms from different locations in source code can improve a latent Dirichlet allocation (LDA)–based FLT. We conducted an empirical study of 4 subject software systems and 372 features. For each subject system, we trained 1024 different LDA models with new weighting schemes applied to leading comments, method names, parameters, body comments, and local variables. We performed both quantitative and qualitative analyses to identify the effects of the weighting schemes on the performance of the LDA-based FLT, evaluating each scheme using the mean reciprocal rank and the spread of the effectiveness measures. In addition, we conducted a factorial analysis to identify which locations have a main effect on the results of the FLT. We then examined the effects of adding information from class comments, class names, and fields to the top 10 configurations for each system, which resulted in an additional 640 LDA models per system. From our results, we identified a significant effect on the performance of the LDA-based FLT when our weighting schemes were applied. Furthermore, we found that adding information from each method's containing class can improve the effectiveness of an LDA-based FLT. Finally, we identified a set of recommendations for choosing better weighting schemes for LDA.
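The abstract mentions two mechanisms that are easy to misread: weighting terms by the structural location they come from before topic modeling, and scoring a configuration by mean reciprocal rank. The Python sketch below illustrates both under assumed names and weight values; it is not the authors' tool, and the locations, weights, and the weighted_document/mean_reciprocal_rank helpers are illustrative only.

```python
# Minimal sketch of structural term weighting for an LDA-based FLT.
# All identifiers and weight values here are illustrative assumptions,
# not the configuration studied in the paper.

from collections import Counter

# A method "document" keyed by the structural location its terms came from.
method_lexicon = {
    "leading_comment": ["draw", "shape", "canvas"],
    "method_name":     ["draw", "shape"],
    "parameters":      ["graphics", "context"],
    "body_comments":   ["render", "outline"],
    "local_variables": ["point", "stroke", "width"],
}

# One candidate weighting configuration: how many times terms from each
# location are replicated before topic modeling (hypothetical values).
weights = {
    "leading_comment": 2,
    "method_name":     4,
    "parameters":      1,
    "body_comments":   1,
    "local_variables": 1,
}

def weighted_document(lexicon, weights):
    """Replicate each term according to the weight of its location."""
    bag = Counter()
    for location, terms in lexicon.items():
        for term in terms:
            bag[term] += weights.get(location, 1)
    return bag  # term -> weighted count, ready to feed an LDA trainer

def mean_reciprocal_rank(ranks):
    """MRR over the rank of the first relevant method per feature query."""
    return sum(1.0 / r for r in ranks) / len(ranks)

if __name__ == "__main__":
    print(weighted_document(method_lexicon, weights))
    # e.g., ranks of the first relevant method for three feature queries
    print(mean_reciprocal_rank([1, 3, 10]))
```

A weighted bag of counts like this could be handed to any LDA implementation that accepts (term, count) pairs, which is how many candidate weighting configurations could be trained and then compared by their mean reciprocal rank and spread of effectiveness measures.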