An intelligent decision support system for software plagiarism detection in academia
Author(s) -
Ullah Farhan,
Jabbar Sohail,
Mostarda Leonardo
Publication year - 2021
Publication title -
International Journal of Intelligent Systems
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.291
H-Index - 87
eISSN - 1098-111X
pISSN - 0884-8173
DOI - 10.1002/int.22399
Subject(s) - computer science , source code , plagiarism detection , feature (linguistics) , semantics (computer science) , lexical analysis , weighting , cosine similarity , noise (video) , latent semantic analysis , information retrieval , artificial intelligence , data mining , machine learning , programming language , pattern recognition (psychology) , medicine , philosophy , linguistics , radiology , image (mathematics)
Source code plagiarism is an academic offense that undermines students' learning habits. Online services allow students to hire professional developers to complete their regular programming assignments, which makes plagiarism easier to commit. This work proposes an intelligent decision support system that detects such plagiarism in five stages. First, raw source code is cleaned of noisy data to extract the meaningful code, since the underlying logic is what matters most to programmers. Second, tokenization‐based preprocessing converts the filtered code into meaningful tokens, breaking it into small instances whose numbers of occurrences are recorded as frequencies. Third, a local and global weighting scheme estimates the significance of each feature within an individual document and across the document collection, highlighting how useful each feature is for the next phase. Fourth, singular value decomposition (SVD) reduces the dimensionality of these features while preserving the actual semantics of the source code; this removes overloaded noise and retains only the features most effective for plagiarism detection. Fifth, latent semantic analysis (LSA) mines the actual semantics of the source code in the form of latent variables. The LSA features are then fed to a cosine‐similarity measure to compute the degree of plagiarism among different source codes. To validate the proposed approach, a topic‐modeling step groups the related features into distinct topics.
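The early stages of this pipeline can be sketched in plain Python. The snippet below is a minimal illustration, not the authors' implementation: it assumes a simple regex tokenizer, uses term frequency as the local weight and inverse document frequency as the global weight, and compares documents directly with cosine similarity (the SVD/LSA reduction step is omitted for brevity).

```python
import math
import re
from collections import Counter

def tokenize(code):
    """Break source code into identifier/keyword tokens, dropping
    punctuation and other noisy characters (a simple regex stand-in
    for the paper's preprocessing stage)."""
    return re.findall(r"[A-Za-z_]\w*", code)

def tfidf_vectors(docs):
    """Weight each token by local frequency (TF) times a global
    inverse-document-frequency (IDF) factor across the collection."""
    n = len(docs)
    df = Counter()                      # document frequency per token
    for toks in docs:
        df.update(set(toks))
    vectors = []
    for toks in docs:
        tf = Counter(toks)              # local weight: raw frequency
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse token-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy corpus: the second snippet is the first with identifiers renamed,
# the third is unrelated code.
docs = [tokenize(s) for s in (
    "int sum(int a, int b) { return a + b; }",
    "int add(int x, int y) { return x + y; }",
    "void hello() { printf(\"hi\"); }",
)]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

In a fuller reproduction, the TF-IDF matrix would be factored with a truncated SVD before the cosine step, so that similarity is computed in the latent semantic space rather than on raw token weights.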
