An intelligent decision support system for software plagiarism detection in academia
Author(s) -
Ullah Farhan,
Jabbar Sohail,
Mostarda Leonardo
Publication year - 2021
Publication title -
International Journal of Intelligent Systems
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.291
H-Index - 87
eISSN - 1098-111X
pISSN - 0884-8173
DOI - 10.1002/int.22399
Subject(s) - computer science , source code , plagiarism detection , feature (linguistics) , semantics (computer science) , lexical analysis , weighting , cosine similarity , noise (video) , latent semantic analysis , information retrieval , artificial intelligence , data mining , machine learning , programming language , pattern recognition (psychology) , medicine , philosophy , linguistics , radiology , image (mathematics)
Source code plagiarism is an academic offense that undermines students' learning habits. Online services allow students to hire professional developers to complete their regular programming assignments, which makes plagiarism easier to commit. This work proposes an intelligent decision support system that detects such plagiarism in five stages. First, raw source code is cleaned of noisy data to extract the meaningful code, since the underlying logic is what matters most to programmers. Second, tokenization‐based preprocessing converts the filtered code into meaningful tokens, breaking it into small instances whose numbers of occurrences are recorded as frequencies. Third, a local and global weighting scheme estimates the significance of each feature within an individual document and across the document collection, highlighting how useful each feature is for the next phase. Fourth, singular value decomposition (SVD) reduces the dimensionality of these features while preserving the actual semantics of the source code; this removes overloaded noise and retains only the features most effective for plagiarism detection. Fifth, latent semantic analysis (LSA) mines the actual semantics of the source code in the form of latent variables. The LSA features are then fed to a cosine‐similarity measure to compute the degree of plagiarism among different source codes. To validate the proposed approach, a topic‐modeling step groups the related features into distinct topics.
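The early stages of this pipeline can be sketched in plain Python. The snippet below is a minimal illustration, not the authors' implementation: it assumes a simple regex tokenizer, uses term frequency as the local weight and inverse document frequency as the global weight, and compares documents directly with cosine similarity (the SVD/LSA reduction step is omitted for brevity).

```python
import math
import re
from collections import Counter

def tokenize(code):
    """Break source code into identifier/keyword tokens, dropping
    punctuation and other noisy characters (a simple regex stand-in
    for the paper's preprocessing stage)."""
    return re.findall(r"[A-Za-z_]\w*", code)

def tfidf_vectors(docs):
    """Weight each token by local frequency (TF) times a global
    inverse-document-frequency (IDF) factor across the collection."""
    n = len(docs)
    df = Counter()                      # document frequency per token
    for toks in docs:
        df.update(set(toks))
    vectors = []
    for toks in docs:
        tf = Counter(toks)              # local weight: raw frequency
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse token-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy corpus: the second snippet is the first with identifiers renamed,
# the third is unrelated code.
docs = [tokenize(s) for s in (
    "int sum(int a, int b) { return a + b; }",
    "int add(int x, int y) { return x + y; }",
    "void hello() { printf(\"hi\"); }",
)]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

In a fuller reproduction, the TF-IDF matrix would be factored with a truncated SVD before the cosine step, so that similarity is computed in the latent semantic space rather than on raw token weights.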
