A collocation extraction tool and two language resources for MSA
Author(s) - Abdulmohsen Al-Thubaity, Ibtehal Baazeem
Publication year - 2017
Publication title - Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2017.10.090
Subject(s) - computer science , collocation (remote sensing) , natural language processing , artificial intelligence , node (physics) , phrase , selection (genetic algorithm) , arabic , linguistics , machine learning , philosophy , structural engineering , engineering
Collocation extraction from corpora, whether complete or according to specific criteria, plays a significant role in computational linguistics, corpus linguistics, and natural language processing. In this paper, we present Musaheb, an Arabic collocation extraction tool that has been designed and implemented to overcome the limitations of existing collocation extraction tools. Musaheb can extract n-gram collocations (for n ≤ 5) in addition to extracting the collocates of specific word types within a window size of zero to 15 words. Moreover, it provides eight collocation statistics that may be used to calculate collocation strength, and permits the input of various constraints during node selection (that is, the determination of the word or phrase whose collocates we wish to search for) and collocate extraction. Based on the user preferences for the node, concordance, and collocate selection, Musaheb saves all nodes and their associated collocates in an XML file, allowing easy conversion to different formats. Two language resources for Modern Standard Arabic developed via Musaheb are presented in this paper: 1) a 2-gram language model, and 2) a 2-gram collocation list based on an Arabic newspaper corpus comprising more than 20 million words.
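
To make the two extraction modes in the abstract concrete, the following is a minimal Python sketch of window-based collocate counting around a node word and of scoring 2-gram collocations. It is not Musaheb's implementation: the function names `window_collocates` and `bigram_pmi` are hypothetical, the toy token list stands in for a real corpus, and pointwise mutual information (PMI) is used only as one commonly cited association measure. The abstract does not name the eight statistics the tool provides, so the choice of PMI here is an assumption.

```python
import math
from collections import Counter


def window_collocates(tokens, node, window=2):
    """Count words that co-occur with `node` within `window` tokens on either side.

    Hypothetical helper for illustration; not part of Musaheb.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node occurrence itself
                counts[tokens[j]] += 1
    return counts


def bigram_pmi(tokens):
    """Return {(w1, w2): PMI} for every adjacent word pair in `tokens`.

    PMI = log2( P(w1, w2) / (P(w1) * P(w2)) ), estimated from raw counts.
    Assumed measure; the source does not say which statistics Musaheb uses.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total_bigrams = n - 1
    scores = {}
    for (w1, w2), c in bigrams.items():
        p_xy = c / total_bigrams
        p_x = unigrams[w1] / n
        p_y = unigrams[w2] / n
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


if __name__ == "__main__":
    # Toy token sequence; a real run would use a tokenized corpus.
    toy = "a b a b c".split()
    print(window_collocates(toy, node="a", window=1))
    # Rank 2-grams from strongest to weakest association.
    for pair, score in sorted(bigram_pmi(toy).items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 3))
```

In practice, raw PMI tends to overrate rare pairs, so a frequency threshold or a different measure is usually applied before ranking; the sketch omits this to stay short.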
