Algorithms for Minimum Risk Chunking
Author(s) - Martin Jansche
Publication year - 2006
Publication title - Lecture Notes in Computer Science
Language(s) - English
Resource type - Book series
SCImago Journal Rank - 0.249
H-Index - 400
eISSN - 1611-3349
pISSN - 0302-9743
DOI - 10.1007/11780885_11
Subject(s) - substring, computer science, decoding methods, chunking (psychology), algorithm, lexical analysis, measure (data warehouse), automaton, string (physics), artificial intelligence, theoretical computer science, data mining, mathematics, data structure, mathematical physics, programming language
Stochastic finite automata are useful for identifying substrings (chunks) within larger units of text. Relevant applications include tokenization, base-NP chunking, named entity recognition, and other information extraction tasks. For a given input string, a stochastic automaton represents a probability distribution over strings of labels encoding the location of chunks. For chunking and extraction tasks, the quality of predictions is evaluated in terms of precision and recall of the chunked/extracted phrases when compared against some gold standard. However, traditional methods for estimating the parameters of a stochastic finite automaton and for decoding the best hypothesis do not pay attention to the evaluation criterion, which we take to be the well-known F-measure. We are interested in methods that remedy this situation, both in training and decoding. Our main result is a novel algorithm for efficiently evaluating expected F-measure. We present the algorithm and discuss its applications for utility/risk-based parameter estimation and decoding.
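
To make the quantity concrete, the following Python sketch (function names and toy data are illustrative, not taken from the paper) computes chunk-level F-measure and a brute-force expected F-measure by explicitly enumerating a distribution over chunk hypotheses. This enumeration is exponential in the input length in general; the paper's contribution is an algorithm that evaluates expected F-measure efficiently without it.

```python
def f_measure(gold_chunks, predicted_chunks):
    """Balanced F-measure (F1) of predicted chunks against gold chunks.

    Chunks are represented as sets of (start, end) spans; a predicted
    span counts as correct only if it matches a gold span exactly.
    """
    if not gold_chunks or not predicted_chunks:
        return 0.0
    true_pos = len(gold_chunks & predicted_chunks)
    precision = true_pos / len(predicted_chunks)
    recall = true_pos / len(gold_chunks)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def expected_f(gold_chunks, hypotheses):
    """Expected F-measure of a distribution over chunk hypotheses.

    `hypotheses` is a list of (probability, chunk_set) pairs whose
    probabilities sum to 1. Enumerating the support like this is
    exponential in general; the point of the paper's algorithm is
    to avoid exactly this enumeration.
    """
    return sum(p * f_measure(gold_chunks, chunks) for p, chunks in hypotheses)


# Toy example: two candidate chunkings of the same sentence.
gold = {(0, 2), (4, 5)}
hypotheses = [
    (0.7, {(0, 2), (4, 5)}),  # both chunks correct: F = 1.0
    (0.3, {(0, 2), (3, 5)}),  # one of two correct: F = 0.5
]
print(expected_f(gold, hypotheses))  # 0.7 * 1.0 + 0.3 * 0.5 = 0.85
```

A minimum-risk decoder in this setting would select the hypothesis that maximizes such an expectation under the model distribution, rather than simply the single most probable label string.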
