Open Access

Splitting Arabic Texts into Elementary Discourse Units

Author(s) - Iskandar Keskes, Farah Benamara, Lamia Hadrich Belguith

Publication year - 2014

Publication title - ACM Transactions on Asian Language Information Processing

Language(s) - English

Resource type - Journals

eISSN - 1558-3430

pISSN - 1530-0226

DOI - 10.1145/2601401

Subject(s) - treebank, computer science, natural language processing, artificial intelligence, punctuation, feature (linguistics), scheme (mathematics), set (abstract data type), representation (politics), segmentation, annotation, process (computing), parsing, newspaper, arabic, linguistics, mathematics, mathematical analysis, philosophy, politics, political science, advertising, law, business, programming language, operating system

In this article, we present the first study of the feasibility of segmenting Arabic discourse into elementary discourse units (EDUs) within the Segmented Discourse Representation Theory (SDRT) framework. We first describe our annotation scheme, which defines a set of principles to guide the segmentation process. Two corpora have been annotated according to this scheme: elementary school textbooks and newspaper documents extracted from the syntactically annotated Arabic Treebank. We then propose a multiclass supervised learning approach that predicts nested units. Our approach uses a combination of punctuation, morphological, lexical, and shallow syntactic features. We investigate how each feature contributes to the learning process. We show that an extensive morphological analysis is crucial to achieving good results on both corpora. In addition, we show that adding chunks does not boost the performance of our system.
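
As a rough illustration of how nested segmentation can be cast as a multiclass labeling problem, the sketch below converts a bracketed segmentation into one label per token. The four-label set (NONE, BEGIN, END, BEGIN_END), the function name encode, and the English example sentence are assumptions made for illustration only; the abstract does not specify the class inventory or the features in this detail, so this should not be read as the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical label set: the abstract only states that the task is multiclass.
LABELS = ("NONE", "BEGIN", "END", "BEGIN_END")


def encode(bracketed: str) -> List[Tuple[str, str]]:
    """Turn a bracketed segmentation into (token, label) pairs.

    "[" marks the start of a unit and "]" marks its end; units may nest.
    A token that opens a unit is BEGIN, one that closes a unit is END,
    one that does both is BEGIN_END, and every other token is NONE.
    """
    pairs: List[Tuple[str, str]] = []
    pending_open = False
    # Pad the brackets with spaces so they split off as separate pieces.
    for piece in bracketed.replace("[", " [ ").replace("]", " ] ").split():
        if piece == "[":
            pending_open = True
        elif piece == "]":
            if not pairs:
                raise ValueError("closing bracket before any token")
            token, label = pairs[-1]
            # The closing bracket attaches to the most recent token.
            closes_own_start = label in ("BEGIN", "BEGIN_END")
            pairs[-1] = (token, "BEGIN_END" if closes_own_start else "END")
        else:
            pairs.append((piece, "BEGIN" if pending_open else "NONE"))
            pending_open = False
    return pairs


if __name__ == "__main__":
    for token, label in encode("[The boy [who was tired] went home .]"):
        print(token, label)
```

A classifier trained on such labels would take per-token features (for example punctuation, morphological, and lexical cues, as the abstract lists) and predict one label per token. Note that this simplified encoding does not record how many units open or close at the same token, so deeper nesting on a single token would need a richer label set.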
