IBFET: Index‐based features extraction technique for scalable code clone detection at file level granularity

Author(s) - Akram Junaid, Mumtaz Majid, Luo Ping



Publication year - 2020

Publication title - Software: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.437

H-Index - 70

eISSN - 1097-024X

pISSN - 0038-0644

DOI - 10.1002/spe.2759

Subject(s) - computer science, granularity, scalability, preprocessor, search engine indexing, code (set theory), data mining, set (abstract data type), clone (java method), source code, parallel computing, operating system, artificial intelligence, programming language, dna, biology, genetics

Summary - Many techniques have been developed over the years to detect code clones in software systems, in part to help maintain security. These techniques often need the source code so that the subject system can be compared against a very large data set of big code. This paper presents the index‐based features extraction technique (IBFET), which detects code clones at file‐level granularity and scales to billions of LOC. We performed preprocessing, indexing, and clone detection for more than 324 billion LOC in a Hadoop distributed environment; the approach is faster and more efficient than existing distributed indexing and clone detection techniques, and it detects all three types of clones. The MapReduce divide‐and‐conquer model is used to count and retrieve similar features between different systems. We evaluated the execution time, scalability, precision, and recall of IBFET using the well‐known clone detection data sets IJaDataset and BigCloneBench, and compared the results with other state‐of‐the‐art tools. Our approach is fast, flexible, and scalable, provides accurate and reliable results, and can be deployed at large scale.
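The idea of counting and retrieving similar features through an index in a MapReduce style can be illustrated with a small in-memory sketch. This is not the authors' implementation: the summary does not say what a feature is, so the whitespace-stripped-line extractor `extract_features`, the overlap score, the `threshold` value, and every function name below are assumptions made for illustration only. A real deployment would run the map and reduce steps as Hadoop jobs rather than as Python functions.

```python
from collections import defaultdict
from itertools import combinations


def extract_features(source):
    """Hypothetical feature extractor: one feature per non-empty line,
    with all whitespace removed so formatting differences are ignored."""
    return {"".join(line.split()) for line in source.splitlines() if line.strip()}


def map_phase(files):
    """Map step: emit one (feature, file_id) pair per distinct feature."""
    for file_id, source in files.items():
        for feature in extract_features(source):
            yield feature, file_id


def shuffle(pairs):
    """Group file ids by feature, which yields an inverted index."""
    index = defaultdict(set)
    for feature, file_id in pairs:
        index[feature].add(file_id)
    return index


def reduce_phase(index):
    """Reduce step: count the features shared by each pair of files."""
    shared = defaultdict(int)
    for file_ids in index.values():
        for a, b in combinations(sorted(file_ids), 2):
            shared[(a, b)] += 1
    return shared


def clone_candidates(files, threshold=0.7):
    """Return file pairs whose shared-feature ratio reaches the threshold.
    The ratio (shared / smaller feature set) is an assumed similarity score."""
    sizes = {fid: len(extract_features(src)) for fid, src in files.items()}
    shared = reduce_phase(shuffle(map_phase(files)))
    result = {}
    for (a, b), count in shared.items():
        score = count / min(sizes[a], sizes[b])
        if score >= threshold:
            result[(a, b)] = score
    return result


if __name__ == "__main__":
    files = {
        "A.java": "int a = 1;\nreturn a;",
        "B.java": "int a=1;\n  return a;",
        "C.java": "print(x);",
    }
    print(clone_candidates(files))
```

Only files that share at least one feature are ever paired, so the index avoids comparing every file against every other file. That property is what makes an index-based design plausible at large scale.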
