Uncovering source code reuse in large‐scale academic environments
Author(s) -
Flores Enrique,
Barrón‐Cedeño Alberto,
Moreno Lidia,
Rosso Paolo
Publication year - 2015
Publication title -
Computer Applications in Engineering Education
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.478
H-Index - 29
eISSN - 1099-0542
pISSN - 1061-3773
DOI - 10.1002/cae.21608
Subject(s) - reuse, computer science, source code, identifier, code (set theory), code reuse, software engineering, obfuscation, plagiarism detection, scale (ratio), data science, world wide web, programming language, information retrieval, software, computer security, engineering, physics, set (abstract data type), quantum mechanics, waste management
The advent of the Internet has increased content reuse, including the reuse of source code. This research aims to uncover potential cases of source code reuse in large‐scale environments. A good example is academia, where massive courses are taught to students who must demonstrate that they have acquired the knowledge. The need to detect content reuse in quasi real time encourages the development of automatic systems, such as the source code reuse detector described in this paper. Our approach compares programs at the character level and can find potential cases of reuse across a huge number of assignments. It achieved better results than JPlag, the most widely used online system for finding similarities among multiple sets of source code. The most common obfuscation operations we found were changes to identifier names, comments, and indentation. © 2014 Wiley Periodicals, Inc. Comput Appl Eng Educ 23:383–390, 2015; View this article online at wileyonlinelibrary.com/journal/cae; DOI 10.1002/cae.21608
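The abstract says only that programs are compared at character level; it does not give the representation or the similarity measure. As a rough illustration of what such a comparison can look like, the sketch below assumes character 3-grams, lowercasing, whitespace removal, and cosine similarity. All of these choices, and the names char_ngrams, cosine_similarity, and similarity, are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def char_ngrams(source: str, n: int = 3) -> Counter:
    """Count character n-grams after lowercasing and dropping all whitespace.

    Assumption for this sketch: whitespace is discarded, so changes in
    indentation alone do not alter the profile.
    """
    text = "".join(source.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile shares nothing with any other profile
    return dot / (norm_a * norm_b)


def similarity(source_a: str, source_b: str, n: int = 3) -> float:
    """Character-level similarity of two programs (hypothetical helper)."""
    return cosine_similarity(char_ngrams(source_a, n), char_ngrams(source_b, n))


# Toy inputs, invented for illustration only.
original = "int total = 0;\nfor (int i = 0; i < n; i++) {\n    total += values[i];\n}\n"
reindented = "int total = 0;\nfor (int i = 0; i < n; i++) {\n\ttotal += values[i];\n}\n"
unrelated = "print('hello world')\n"

if __name__ == "__main__":
    print(similarity(original, reindented))
    print(similarity(original, unrelated))
```

In this sketch a pair that differs only in indentation scores as near-identical, because whitespace is stripped before the n-grams are counted. Renamed identifiers and edited comments, the other obfuscations the abstract mentions, would lower the score only in proportion to the n-grams they touch; handling them explicitly would need further preprocessing that is not shown here.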