Open Access

Arabic Text Copy Detection using Full, Reduced and Unique Syntactical Structures

Author(s) - Mohamed Taybe

Publication year - 2016

Publication title - International Journal of Computer Applications

Language(s) - English

Resource type - Journals

ISSN - 0975-8887

DOI - 10.5120/ijca2016912088

Subject(s) - computer science, Arabic, natural language processing, artificial intelligence, information retrieval, linguistics, philosophy

This paper investigates combining Part of Speech (POS) tagging with a minimum edit operations algorithm to measure the similarity between pairs of Arabic text documents. The level of similarity can indicate that a document's content has been duplicated in full or in part. Text is first converted into POS tags, which are then fed to the string similarity algorithm to compare pairs of documents. A normalized score is calculated and used to rank documents. Documents that score above a selected threshold are considered similar and may be near or complete duplicates. The experiments compare results across a set of selected common subsequences obtained by translating text into a sequence of syntactical units. The strings are first produced from the full text (FULL). These are refined to produce a REDUCED string, in which repeated consecutive characters are reduced to a single character and a number, and refined further to produce a UNIQUE string, in which all repeating characters are replaced by a single character. Syntactical features of the text serve as a structural representation of the documents' content. Results obtained from the experiments using the
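
The pipeline described in the abstract (tag string, REDUCED and UNIQUE refinements, normalized edit-distance score, threshold) can be illustrated with a minimal Python sketch. This is not the paper's implementation. It assumes that each POS tag has already been mapped to a single character (for example N, V, P), that REDUCED appends a count only to runs longer than one, that UNIQUE collapses each run of consecutive repeats to one character, that the edit operations are the standard Levenshtein insert, delete and substitute, and that the score is normalized by the longer string's length. The function names and the 0.8 threshold are hypothetical.

```python
from itertools import groupby


def reduced(tags: str) -> str:
    """REDUCED form (assumed): run-length encode consecutive repeats.

    A run longer than one becomes the character followed by its count;
    a single character is left unchanged.
    """
    parts = []
    for ch, run in groupby(tags):
        n = len(list(run))
        parts.append(ch if n == 1 else f"{ch}{n}")
    return "".join(parts)


def unique(tags: str) -> str:
    """UNIQUE form (assumed): collapse each run of repeats to one character."""
    return "".join(ch for ch, _ in groupby(tags))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using two rolling rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized score in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def is_near_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical decision rule: flag pairs scoring at or above the threshold."""
    return similarity(a, b) >= threshold
```

For a FULL tag string such as `"NNNVVPN"`, `reduced` gives `"N3V2PN"` and `unique` gives `"NVPN"`. Any of the three forms can be passed to `similarity`; in the REDUCED form the count digits are compared as ordinary symbols, which is a simplification of this sketch.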
