
Reusable Component Retrieval from a Large Repository Using Word2Vec with Continuous Bag of Words
Author(s) -
Krishna Chythanya Nagaraju,
Cherku Ramesh Kumar Reddy
Publication year - 2021
Publication title -
Ingénierie des Systèmes d'Information
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.161
H-Index - 8
eISSN - 2116-7125
pISSN - 1633-1311
DOI - 10.18280/isi.260504
Subject(s) - word2vec , component (thermodynamics) , computer science , word embedding , word (group theory) , code (set theory) , process (computing) , representation (politics) , embedding , artificial intelligence , information retrieval , natural language processing , data mining , programming language , linguistics , philosophy , physics , set (abstract data type) , politics , political science , law , thermodynamics
A reusable code component is one that can be used with little or no adaptation in the application being developed. A major concern in this process is maintaining these reusable components in a single place, called a 'repository', so that the code components can be effectively identified and reused. Word embeddings allow textual information to be represented numerically. They have become so pervasive that almost all Natural Language Processing projects make use of them. In this work, we use Word2Vec to find vector representations of the features of a reusable component. The features of a reusable component, in the form of a sequence of words, are input to the Word2Vec network. Our method, which uses Word2Vec with Continuous Bag of Words, outperforms the existing methods on the market. The proposed methodology achieved an accuracy of 94.8% in identifying existing reusable components.
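The idea in the abstract, feeding each component's feature words to a Continuous Bag of Words (CBOW) network and matching components by their vectors, can be illustrated with a minimal sketch. This is not the authors' implementation: the component names, the feature words, the hyperparameters, the averaging of word vectors into one component vector, and the cosine-similarity ranking are all assumptions made for illustration, and the tiny full-softmax CBOW trainer below stands in for a production Word2Vec library.

```python
import math
import random


def train_cbow(sentences, dim=8, window=2, epochs=50, lr=0.05, seed=0):
    """Train a tiny CBOW model with a full softmax; returns {word: vector}."""
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    size = len(vocab)
    # Input (embedding) matrix and output (prediction) matrix.
    w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in range(size)]
    w_out = [[0.0] * dim for _ in range(size)]
    for _ in range(epochs):
        for sent in sentences:
            ids = [index[w] for w in sent]
            for pos, target in enumerate(ids):
                context = ids[max(0, pos - window):pos] + ids[pos + 1:pos + 1 + window]
                if not context:
                    continue
                # Hidden layer: average of the context word vectors.
                h = [sum(w_in[c][k] for c in context) / len(context) for k in range(dim)]
                scores = [sum(w_out[j][k] * h[k] for k in range(dim)) for j in range(size)]
                top = max(scores)
                exps = [math.exp(s - top) for s in scores]
                total = sum(exps)
                # Cross-entropy gradient w.r.t. scores: softmax - one_hot(target).
                err = [e / total for e in exps]
                err[target] -= 1.0
                grad_h = [sum(err[j] * w_out[j][k] for j in range(size)) for k in range(dim)]
                for j in range(size):
                    for k in range(dim):
                        w_out[j][k] -= lr * err[j] * h[k]
                for c in context:
                    for k in range(dim):
                        w_in[c][k] -= lr * grad_h[k] / len(context)
    return {w: w_in[index[w]] for w in vocab}


def embed(words, vectors):
    """Average the vectors of known words; unknown words are skipped."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[k] for v in known) / len(known) for k in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, repository, vectors):
    """Return (component name, similarity) for the best match to the query words."""
    q = embed(query, vectors)
    scored = [(name, cosine(q, embed(feats, vectors))) for name, feats in repository.items()]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical repository: component names and feature words are invented.
repository = {
    "QuickSort": ["sort", "array", "ascending", "quick"],
    "BinarySearch": ["search", "array", "sorted", "binary"],
    "CsvReader": ["read", "file", "csv", "parse"],
    "JsonWriter": ["write", "file", "json", "serialize"],
}
vectors = train_cbow(list(repository.values()))
best, score = retrieve(["sort", "array", "ascending", "quick"], repository, vectors)
print(best, round(score, 3))
```

In this sketch a query that repeats a stored component's feature words maps to the same averaged vector, so that component is returned with a similarity of about 1. Partial or paraphrased queries depend on how well the embeddings were trained, which a toy corpus of this size cannot demonstrate.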