An Efficient Text Representation for Searching and Retrieving Classical Diacritical Arabic Text
Author(s) -
Saqib Hakak,
Amirrudin Kamsin,
Palaiahnakote Shivakumara,
Omar Tayan,
Mohd Yamani Idna Idris,
Gulshan Amin Gilkar
Publication year - 2018
Publication title -
Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2018.10.470
Subject(s) - computer science, Arabic, natural language processing, representation (politics), artificial intelligence, field (mathematics), information retrieval, linguistics, mathematics, politics, political science, pure mathematics, law, philosophy
Due to the rapid growth of the Internet and advanced technologies, storing Arabic diacritical data and extracting it from an Arabic corpus in real time have become vital issues in the field of information retrieval. In this paper, we propose a new way of representing Arabic diacritical text in the corpus so that search engines can reduce the time needed to retrieve the desired text with high precision. To achieve this goal, we segment the Arabic diacritical sentences/verses into individual characters together with their diacritics, which are necessary for interpreting the meaning. We then propose a new data structure that represents the data using the segmented alphabets. To verify the corpus representation, the proposed approach uses the Boyer-Moore algorithm to search for given verses of Arabic diacritical data. The proposed representation reduces the worst-case search time from O(m*n) to O(1+m), where m denotes the length of the diacritical verse to be searched and n denotes the total number of diacritical verses. Experimental results on a popular corpus show that the proposed method outperforms existing search methods in terms of time complexity.
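The segmentation and search steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' data structure, which the abstract does not specify. It assumes that a "segmented character" is a base letter plus the Unicode combining marks that follow it, and it uses the Horspool simplification of Boyer-Moore (bad-character shifts only) rather than the full algorithm. The function names `segment` and `horspool_search` are hypothetical.

```python
import unicodedata


def segment(text):
    """Split text into units: a base character plus any combining marks after it.

    Assumption: Arabic diacritics (fatha, kasra, sukun, ...) are identified by a
    non-zero Unicode combining class.
    """
    units = []
    for ch in text:
        if unicodedata.combining(ch) and units:
            units[-1] += ch  # attach the diacritic to the preceding letter
        else:
            units.append(ch)
    return units


def horspool_search(haystack, needle):
    """Return the index of the first match of needle in haystack, or -1.

    Both arguments are lists of segmented units. This is the Horspool
    variant of Boyer-Moore, using only a bad-character shift table.
    """
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    if m > n:
        return -1
    # For each unit except the last, how far the window may shift when that
    # unit is aligned with the window's final position.
    shift = {unit: m - 1 - i for i, unit in enumerate(needle[:-1])}
    pos = 0
    while pos <= n - m:
        if haystack[pos:pos + m] == needle:
            return pos
        pos += shift.get(haystack[pos + m - 1], m)
    return -1


# "bismi" written with diacritics: ba+kasra, sin+sukun, mim+kasra
corpus = segment("\u0628\u0650\u0633\u0652\u0645\u0650")
query = segment("\u0633\u0652\u0645\u0650")  # sin+sukun, mim+kasra
print(horspool_search(corpus, query))
```

Because each unit carries its diacritics, a query for an undiacritized letter does not match its diacritized form in this sketch, which reflects the abstract's point that diacritics are needed to interpret meaning.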