Large‐scale inter‐system clone detection using suffix trees and hashing
Author(s) - Rainer Koschke
Publication year - 2014
Publication title - Journal of Software: Evolution and Process
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.371
H-Index - 29
eISSN - 2047-7481
pISSN - 2047-7473
DOI - 10.1002/smr.1592
Subject(s) - computer science , code (set theory) , scalability , data mining , hash function , domain (mathematical analysis) , suffix , index (typography) , source code , filter (signal processing) , relevance (law) , tree (set theory) , artificial intelligence , machine learning , information retrieval , database , programming language , mathematical analysis , linguistics , philosophy , mathematics , set (abstract data type) , political science , law , computer vision
SUMMARY Detecting similar code between two systems has various applications, such as comparing two software variants or versions or finding potential license violations. Techniques for detecting suspiciously similar code must scale to very large code corpora in terms of the resources they need, and they must have high precision because a human has to inspect the results. This paper demonstrates how suffix trees can be used to obtain a scalable comparison. The evaluation is carried out on very large code corpora. It shows that our approach is faster than index‐based techniques when the analysis is run only once; if the analysis is to be conducted multiple times, creating an index pays off. We also report how much code can be filtered out of the analysis using an index‐based filter. In addition, this paper proposes a method to improve precision through user feedback: a user validates a sample of the found clone candidates, and an automated data mining technique learns a decision tree on the basis of the user's decisions and different code metrics. We investigate the relevance of several metrics and whether criteria learned from one application domain can be generalized to other domains. Copyright © 2013 John Wiley & Sons, Ltd.
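The summary contrasts the suffix-tree approach with index-based techniques. The sketch below illustrates only the index-based side, and it does not reproduce the paper's algorithm. It assumes both systems have already been lexed into token lists. It hashes every window of `k` tokens of one system into an index, scans the other system, and extends each hit into a maximal common run of at least `k` tokens. The names `build_index` and `inter_system_clones` and the parameter `k` (minimum clone length in tokens) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Token = str


def build_index(tokens: Sequence[Token], k: int) -> Dict[Tuple[Token, ...], List[int]]:
    """Hypothetical helper: map every window of k consecutive tokens to its start positions."""
    index: Dict[Tuple[Token, ...], List[int]] = defaultdict(list)
    for i in range(len(tokens) - k + 1):
        index[tuple(tokens[i:i + k])].append(i)
    return index


def inter_system_clones(a: Sequence[Token], b: Sequence[Token], k: int) -> List[Tuple[int, int, int]]:
    """Hypothetical sketch: return (start_in_a, start_in_b, length) for every
    maximal common token run of length >= k between systems a and b."""
    index = build_index(a, k)  # built once; could be reused for other systems b
    clones: List[Tuple[int, int, int]] = []
    for j in range(len(b) - k + 1):
        for i in index.get(tuple(b[j:j + k]), ()):
            # Skip hits that continue a match starting one token further left;
            # that earlier start is found and extended on its own.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend the k-token seed to the right as long as tokens agree.
            length = k
            while i + length < len(a) and j + length < len(b) and a[i + length] == b[j + length]:
                length += 1
            clones.append((i, j, length))
    return clones


if __name__ == "__main__":
    system_a = "int x = 0 ; x ++ ; return x ;".split()
    system_b = "y = 1 ; x ++ ; return x ; }".split()
    print(inter_system_clones(system_a, system_b, 4))
```

Because the index is built only from system `a`, it can be stored and queried repeatedly. This reflects the summary's observation that creating an index pays off when the analysis is run multiple times. The sketch does no normalization of identifiers or literals, and highly repetitive token streams can make the candidate lists long.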
