
Mining Stack Overflow for API class recommendation using DOC2VEC and LDA
Author(s) - Lee Wai Keat, Su Moon Ting
Publication year - 2021
Publication title - IET Software
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.305
H-Index - 43
eISSN - 1751-8814
pISSN - 1751-8806
DOI - 10.1049/sfw2.12023
Subject(s) - computer science, java, class (philosophy), plug in, benchmarking, application programming interface, code (set theory), interface (matter), latent dirichlet allocation, programming language, information retrieval, world wide web, artificial intelligence, topic model, operating system, set (abstract data type), bubble, marketing, maximum bubble pressure method, business
To address the lexical gaps between natural language (NL) queries and Application Programming Interface (API) documentation, and between NL queries and program code, this study developed a novel approach for recommending Java API classes that are relevant to the programming tasks described in NL queries. A Doc2Vec model was trained on question titles mined from Stack Overflow and then used to find question titles that are semantically similar to a query. Latent Dirichlet Allocation (LDA) topic modelling was applied to the Java API classes extracted from code snippets in the accepted answers of these similar questions, yielding a single topic comprising the Top‐10 Java API classes relevant to the query. Benchmarking the proposed approach against two state‐of‐the‐art approaches, RACK and NLP2API, using four performance metrics shows that comparable API recommendation results can be produced with a less complex approach built on basic machine learning models, namely Doc2Vec and LDA. The approach was implemented in a Java API class recommender with an Eclipse IDE plug‐in serving as the front end.
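The shape of the pipeline described in the abstract (query, similar question titles, API classes from accepted-answer code, ranked Top‐10 list) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: bag-of-words cosine similarity stands in for the trained Doc2Vec model, and simple class-frequency counting stands in for the single-topic LDA step. The toy corpus `POSTS`, the function names, the capitalised-identifier heuristic for spotting class names, and the `top_titles` cut-off are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: (question title, code snippet from the accepted answer).
POSTS = [
    ("How to read a text file line by line in Java",
     "BufferedReader br = new BufferedReader(new FileReader(path));"),
    ("Read file contents into a String in Java",
     "String s = new String(Files.readAllBytes(Paths.get(path)));"),
    ("How to parse JSON in Java",
     "JSONObject obj = new JSONObject(text);"),
    ("Java read large text file efficiently",
     "BufferedReader br = Files.newBufferedReader(Paths.get(path));"),
]


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters.

    Stand-in for Doc2Vec similarity; it captures word overlap only,
    not the semantic similarity a trained embedding would provide.
    """
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def extract_classes(snippet):
    """Crude heuristic (assumption): capitalised identifiers are class names."""
    return sorted(set(re.findall(r"\b[A-Z][A-Za-z0-9]*\b", snippet)))


def recommend(query, posts, top_titles=3, top_classes=10):
    """Rank API classes found in the answers of the most similar titles."""
    q = Counter(tokenize(query))
    ranked = sorted(posts,
                    key=lambda p: cosine(q, Counter(tokenize(p[0]))),
                    reverse=True)
    counts = Counter()
    for _title, snippet in ranked[:top_titles]:
        # Count each class once per post; frequency stands in for LDA weights.
        counts.update(extract_classes(snippet))
    return [cls for cls, _ in counts.most_common(top_classes)]


if __name__ == "__main__":
    print(recommend("read text file in java", POSTS))
```

In this sketch the JSON question is the least similar to the query and is dropped, so only classes from the three file-reading posts are ranked. A closer reproduction would replace `cosine` with vectors inferred from a trained Doc2Vec model and replace the frequency count with the top-weighted terms of a one-topic LDA model.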