Open Access
Hybrid method for automatic extraction of multiword expressions
Author(s) - Shaishav Agrawal, Ratna Sanyal, Sudip Sanyal
Publication year - 2018
Publication title - International Journal of Engineering and Technology
Language(s) - English
Resource type - Journals
ISSN - 2227-524X
DOI - 10.14419/ijet.v7i2.6.10063
Subject(s) - computer science, similarity (geometry), artificial intelligence, natural language processing, context (archaeology), association (psychology), dice, pattern recognition (psychology), latent semantic analysis, speech recognition, mathematics, statistics, paleontology, philosophy, epistemology, image (mathematics), biology
A three-phase hybrid method for automatic extraction of English multiword expressions (MWEs) is proposed. The method is based on linguistic patterns and on the association and context similarity between the constituent words of an MWE. First, candidate expressions are extracted from the raw text as N-grams and filtered using well-defined linguistic patterns. Next, these candidates are filtered again using the association score and the context similarity score between their constituent words. Two association measures, Dice's coefficient and pointwise mutual information (PMI), are used to calculate the association score. Context similarity between words is calculated using Latent Semantic Analysis (LSA). Choosing the best cut-off threshold is a common problem in statistical methods; a two-phase method for deciding the threshold from a training dataset is proposed and employed in this work. A detailed performance analysis was carried out on a manually annotated dataset, and a significant gain in performance was observed for various types of multiword expressions.
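
The association-score filter described in the abstract can be illustrated with a minimal sketch for two-word candidates. This is not the authors' implementation: it uses the standard textbook definitions of Dice's coefficient and PMI, and the function names, the tokenisation, and the rule of requiring both scores to pass a threshold are assumptions made for illustration. The abstract does not state how the two measures are combined or what threshold values are used.

```python
import math
from collections import Counter


def bigram_association(tokens):
    """Return {(w1, w2): (dice, pmi)} for every adjacent word pair.

    Standard definitions (assumed, not taken from the paper):
      Dice(w1, w2) = 2 * f(w1 w2) / (f(w1) + f(w2))
      PMI(w1, w2)  = log2( P(w1 w2) / (P(w1) * P(w2)) )
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), f12 in bigrams.items():
        dice = 2.0 * f12 / (unigrams[w1] + unigrams[w2])
        p12 = f12 / n_bi
        p1 = unigrams[w1] / n_uni
        p2 = unigrams[w2] / n_uni
        pmi = math.log2(p12 / (p1 * p2))
        scores[(w1, w2)] = (dice, pmi)
    return scores


def filter_candidates(scores, dice_min, pmi_min):
    """Keep pairs whose Dice and PMI both reach the given thresholds.

    Requiring both is a hypothetical combination rule; the thresholds
    are placeholders, not values reported by the authors.
    """
    return [
        pair
        for pair, (dice, pmi) in scores.items()
        if dice >= dice_min and pmi >= pmi_min
    ]


if __name__ == "__main__":
    text = "new york is big and new york is old"
    scores = bigram_association(text.split())
    for pair in filter_candidates(scores, dice_min=0.9, pmi_min=2.0):
        print(pair, scores[pair])
```

On a corpus this small the scores are not meaningful, since rare pairs such as a single co-occurrence of two one-off words receive high values under both measures. That sensitivity is one reason a threshold chosen on training data, as the abstract proposes, matters in practice.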
