Open Access
Discriminative Lexical Semantic Segmentation with Gaps: Running the MWE Gamut
Author(s) -
Nathan Schneider,
Emily Danchik,
Chris Dyer,
Noah A. Smith
Publication year - 2014
Publication title -
Transactions of the Association for Computational Linguistics
Language(s) - English
Resource type - Journals
ISSN - 2307-387X
DOI - 10.1162/tacl_a_00176
Subject(s) - computer science , discriminative model , artificial intelligence , segmentation , natural language processing , crfs , sentence , sequence labeling , identification (biology) , representation (politics) , feature (linguistics) , conditional random field , task (project management) , chunking (psychology) , pattern recognition (psychology) , linguistics , philosophy , botany , management , politics , political science , law , economics , biology
We present a novel representation, evaluation measure, and supervised models for the task of identifying the multiword expressions (MWEs) in a sentence, resulting in a lexical semantic segmentation. Our approach generalizes a standard chunking representation to encode MWEs containing gaps, thereby enabling efficient sequence tagging algorithms for feature-rich discriminative models. Experiments on a new dataset of English web text offer the first linguistically driven evaluation of MWE identification with truly heterogeneous expression types. Our statistical sequence model greatly outperforms a lookup-based segmentation procedure, achieving nearly 60% F1 for MWE identification.
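To make the idea of a gap-aware chunking representation concrete, here is a minimal Python sketch of one way BIO tags could be extended. It is an illustration under stated assumptions, not the authors' implementation: the convention that lowercase tags (o, b, i) mark tokens inside the gap of an enclosing MWE, and the helper names encode_gappy, decode_gappy and is_valid, are hypothetical. MWEs are assumed to be given as disjoint lists of token indices, with gaps nesting at most one level, so an expression sitting inside a gap cannot have a gap of its own.

```python
import re

# Hypothetical gap-aware extension of BIO chunk tags (an assumption for
# illustration): uppercase O/B/I for tokens outside any gap, lowercase
# o/b/i for tokens that fall inside the gap of an enclosing MWE.
VALID = re.compile(r"(?:O|B(?:[oI]|bi+)*I)+")


def encode_gappy(n, groups):
    """Tag n tokens given MWEs as disjoint lists of token indices."""
    in_gap = [False] * n
    member = [False] * n
    starts = set()
    for group in groups:
        group = sorted(group)
        starts.add(group[0])
        for t in group:
            member[t] = True
        # Tokens strictly between the MWE's first and last word that are
        # not part of the MWE lie in its gap.
        for t in range(group[0] + 1, group[-1]):
            if t not in group:
                in_gap[t] = True
    tags = []
    for t in range(n):
        tag = "O" if not member[t] else ("B" if t in starts else "I")
        tags.append(tag.lower() if in_gap[t] else tag)
    return tags


def decode_gappy(tags):
    """Recover MWE index groups from a well-formed tag sequence."""
    groups, outer, inner = [], None, None
    for t, tag in enumerate(tags):
        if tag == "B":      # start a top-level MWE
            outer = [t]
            groups.append(outer)
        elif tag == "I":    # continue the top-level MWE
            outer.append(t)
        elif tag == "b":    # start an MWE nested inside a gap
            inner = [t]
            groups.append(inner)
        elif tag == "i":    # continue the nested MWE
            inner.append(t)
    return groups


def is_valid(tags):
    """True if the sequence obeys the assumed gappy-BIO constraints."""
    return VALID.fullmatch("".join(tags)) is not None


if __name__ == "__main__":
    # "turn the coffee maker off": turn...off has a gap that contains
    # the compound "coffee maker".
    tags = encode_gappy(5, [[0, 4], [2, 3]])
    print(tags)                                       # ['B', 'o', 'b', 'i', 'I']
    print(decode_gappy(tags))                         # [[0, 4], [2, 3]]
    print(is_valid(tags), is_valid(["O", "o", "I"]))  # True False
```

In this sketch's scheme, which tags may follow a given tag depends only on that tag, so the same constraints could be imposed as a set of permitted tag bigrams. That is the property that would let a first-order sequence model, decoded with a Viterbi-style algorithm, output only well-formed segmentations, including ones with gaps.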
