
Development of Cross Language Clone Detector for C, C++ & Java Repositories using Natural Language Processing
Author(s) -
Sanjay B. Ankali,
Latha Parthiban
Publication year - 2019
Publication title -
International Journal of Engineering and Advanced Technology
Language(s) - English
Resource type - Journals
ISSN - 2249-8958
DOI - 10.35940/ijeat.b3612.129219
Subject(s) - porting, computer science, programming language, java, software, software maintenance, code (set theory), java annotation, java modeling language, natural language, operating system, natural language processing, software development, real time java, set (abstract data type)
Reusing code, with or without modification, is a common practice in building large system-software codebases such as Linux, gcc, and the JDK. This practice is referred to as software cloning or forking. When porting a large codebase from one language to another, developers often struggle to carry bug fixes across the ported code. Many approaches exist for identifying software clones within a single language, but they offer little help to developers involved in porting, hence the need for a cross-language clone detector. This paper takes a Natural Language Processing (NLP) approach, using latent semantic analysis (LSA) based on singular value decomposition (SVD), to find clones across neighboring languages for all four clone types. The detector takes a code fragment (C, C++, or Java) as input and matches it against all neighboring code clones in a static repository in terms of the frequency of lines matched.
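
The LSA-with-SVD matching described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The token-level term-document matrix, the choice of k = 2 latent dimensions, ranking by cosine similarity (rather than the line-frequency matching the paper reports), the power-iteration eigensolver used in place of a library SVD routine, and all names (`lsa_rank`, `REPO`, `QUERY`, the file names) are assumptions made for illustration. The sketch relies on the fact that the eigenvectors of AᵀA are the right singular vectors of the term-document matrix A, with eigenvalues equal to the squared singular values.

```python
import math
import re
from collections import Counter


def tokenize(code):
    """Split source text into lowercase identifier/keyword tokens."""
    return [t.lower() for t in re.findall(r"[A-Za-z_]\w*", code)]


def term_document_matrix(docs):
    """Rows are terms, columns are documents, entries are raw token counts."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return [[c[t] for c in counts] for t in vocab]


def gram(matrix, n_docs):
    """Compute A^T A, the dot products between document columns."""
    return [[sum(row[i] * row[j] for row in matrix) for j in range(n_docs)]
            for i in range(n_docs)]


def top_eigenpairs(sym, k, iters=200):
    """Power iteration with deflation on a small symmetric PSD matrix."""
    n = len(sym)
    m = [row[:] for row in sym]
    pairs = []
    for idx in range(k):
        v = [1.0 + 0.1 * ((i + idx) % n) for i in range(n)]  # fixed start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract after deflation
                break
            v = [x / norm for x in w]
        lam = max(sum(v[i] * m[i][j] * v[j]
                      for i in range(n) for j in range(n)), 0.0)
        pairs.append((lam, v))
        # Deflate: remove the component just found.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def lsa_rank(query, repo, k=2):
    """Rank repository files by latent-space similarity to the query.

    Assumption: the query is appended as an extra document column instead
    of being folded into a precomputed latent space.
    """
    names = list(repo)
    docs = [repo[name] for name in names] + [query]
    a = term_document_matrix(docs)
    pairs = top_eigenpairs(gram(a, len(docs)), k)
    # Document j has coordinates sigma_r * V[j][r], with sigma_r = sqrt(lambda_r).
    coords = [[math.sqrt(lam) * v[j] for lam, v in pairs]
              for j in range(len(docs))]
    q = coords[-1]
    scored = [(name, cosine(coords[i], q)) for i, name in enumerate(names)]
    return sorted(scored, key=lambda p: p[1], reverse=True)


# Hypothetical static repository with C and Java files.
REPO = {
    "fact.c": "int factorial(int n) { int result = 1; "
              "for (int i = 2; i <= n; i++) result *= i; return result; }",
    "Fact.java": "static int factorial(int n) { int result = 1; "
                 "for (int i = 2; i <= n; i++) { result *= i; } return result; }",
    "Greet.java": "static String greet(String name) "
                  "{ return \"Hello \" + name; }",
}
# Hypothetical C++ query fragment.
QUERY = ("int factorial(int n) { int result = 1; "
         "for (int i = 2; i <= n; ++i) result *= i; return result; }")

ranked = lsa_rank(QUERY, REPO)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Because the sketch compares token counts only, it ignores structure such as nesting and control flow. A detector that covers all four clone types would need richer features than this.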