Fast and Robust Wrapper Method for $N$-gram Feature Template Induction in Structured Prediction

Yulin Ren; Dehua Li

Open Access

Author(s) - Yulin Ren, Dehua Li

Publication year - 2017

Publication title - IEEE Access

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.587

H-Index - 127

ISSN - 2169-3536

DOI - 10.1109/access.2017.2753832

Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation

N-gram feature templates, which capture consecutive contextual information, are an important family of feature templates in structured prediction. Some previous studies considered the n-gram feature selection problem, but they focused on one or a few types of features in specific tasks, e.g., consecutive words in a text categorization task. In this paper, we propose a fast and robust bottom-up wrapper method that automatically induces n-gram feature templates and can induce any type of n-gram feature for any structured prediction task. Using the significance distribution of n-gram feature templates with respect to the n-gram and the bias (offset), the proposed method first determines the n-gram that achieves the best tradeoff between the severity of the sparse data problem and the richness of the corresponding contextual information. It then combines the best n-gram with lower-order n-gram templates in an extremely efficient manner. In addition, our method uses a template pair, i.e., two symmetrical templates, rather than a single template as its basic unit, so a pair is included or excluded as a whole. Thus, when the training data change slightly, our method is robust to the fluctuation and provides a more consistent induction result than the template-based method. Experimental results on three tasks, namely Chinese word segmentation, named entity recognition, and text chunking, demonstrate the effectiveness, efficiency, and robustness of the proposed method.
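The two-stage idea in the abstract (pick the best n-gram first, then add lower-order templates, always working with symmetrical template pairs) can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify how the significance distribution is computed, so the sketch substitutes a plain greedy wrapper. The names `ngram_templates`, `mirror`, `template_pairs`, `apply_template`, and `induce`, the representation of a template as a tuple of relative offsets, and the `score` callback (which in practice would train and evaluate a model on held-out data) are all assumptions made for illustration.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

# Assumption: a template is a tuple of consecutive relative offsets,
# e.g. (-1, 0) is the bigram "previous token + current token".
Template = Tuple[int, ...]
TemplatePair = FrozenSet[Template]  # a template together with its mirror image


def ngram_templates(n: int, window: int) -> List[Template]:
    """All runs of n consecutive offsets that fit inside [-window, window]."""
    return [tuple(range(s, s + n)) for s in range(-window, window - n + 2)]


def mirror(t: Template) -> Template:
    """Reflect a template around the current position (offset 0)."""
    return tuple(sorted(-o for o in t))


def template_pairs(n: int, window: int) -> List[TemplatePair]:
    """Group the n-gram templates into symmetrical pairs, without duplicates."""
    pairs: List[TemplatePair] = []
    for t in ngram_templates(n, window):
        pair = frozenset({t, mirror(t)})
        if pair not in pairs:
            pairs.append(pair)
    return pairs


def apply_template(tokens: Sequence[str], i: int, t: Template, pad: str = "<PAD>") -> str:
    """Instantiate a template at position i, padding outside the sequence."""
    return "|".join(tokens[i + o] if 0 <= i + o < len(tokens) else pad for o in t)


def induce(score: Callable[[List[TemplatePair]], float], max_n: int, window: int):
    """Greedy bottom-up wrapper (illustrative stand-in, not the paper's procedure)."""
    # Stage 1: choose the n whose full set of template pairs scores best.
    best_n = max(range(1, max_n + 1), key=lambda n: score(template_pairs(n, window)))
    selected = list(template_pairs(best_n, window))
    best = score(selected)
    # Stage 2: try lower-order pairs, keeping a pair only if it improves the score.
    for n in range(best_n - 1, 0, -1):
        for pair in template_pairs(n, window):
            candidate = score(selected + [pair])
            if candidate > best:
                selected.append(pair)
                best = candidate
    return selected, best
```

Because a pair is always added or dropped as a unit, a small change in the data cannot keep a template such as (-1, 0) while discarding its mirror (0, 1), which is the kind of consistency the abstract attributes to pair-based induction.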
