Treebank‐Based Probabilistic Phrase Structure Parsing
Author(s) - Cahill Aoife
Publication year - 2008
Publication title - Language and Linguistics Compass
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.619
H-Index - 44
ISSN - 1749-818X
DOI - 10.1111/j.1749-818x.2007.00046.x
Subject(s) - treebank, computer science, natural language processing, artificial intelligence, parsing, probabilistic logic, phrase, computational linguistics, mathematics
Probabilistic phrase structure parsing has been a central and active field of computational linguistics. More generally, stochastic methods have become very popular in natural language processing as more and more resources have become available. One of the main advantages of probabilistic parsing lies in disambiguation: a probabilistic parser can return a ranked list of potential syntactic analyses for a string rather than an unranked set. In this article, we introduce probabilistic context-free grammars (PCFGs) and outline some of their strengths and weaknesses. We concentrate on the automatic extraction of stochastic grammars from treebanks (large collections of hand-corrected syntactic structures). We describe the current state of the field and ongoing research on improving the basic PCFG model, including lexicalized, history-based and generative models. Finally, we briefly mention some research into probabilistic phrase structure parsing for domains other than traditional treebank text and for languages other than English (Chinese, Arabic, German and French).
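The abstract describes extracting stochastic grammars from treebanks and using their probabilities to rank competing analyses. The following Python sketch is an illustration only and is not taken from the article. It reads context-free rules off a tiny hypothetical treebank, estimates each rule's probability by relative frequency, P(A → β) = count(A → β) / count(A), and scores two hand-built parses of a PP-attachment ambiguity. The nested-tuple tree encoding, the three toy trees and the helper names `productions`, `extract_pcfg`, `tree_prob`, `np_`, `pp`, `vp_attach` and `np_attach` are all assumptions made for this example.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical tree encoding: (label, child, child, ...); a leaf is a plain string (a word).

def productions(tree):
    """Yield every (lhs, rhs) rule used in a tree, including preterminal -> word rules."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)

def extract_pcfg(treebank):
    """Relative-frequency (maximum-likelihood) estimate: P(A -> beta) = count(A -> beta) / count(A)."""
    rule_counts = Counter()
    for tree in treebank:
        rule_counts.update(productions(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: Fraction(n, lhs_counts[rule[0]]) for rule, n in rule_counts.items()}

def tree_prob(tree, pcfg):
    """Probability of a tree: the product of its rule probabilities (0 if any rule is unseen)."""
    p = Fraction(1)
    for rule in productions(tree):
        p *= pcfg.get(rule, Fraction(0))
    return p

# Hypothetical helpers for building toy trees.
def np_(noun):
    return ("NP", ("DT", "the"), ("NN", noun))

def pp(noun):
    return ("PP", ("P", "with"), np_(noun))

def vp_attach(subj, obj, pobj):  # "SUBJ saw the OBJ with the POBJ", PP attached to the verb phrase
    return ("S", ("NP", ("PRP", subj)), ("VP", ("V", "saw"), np_(obj), pp(pobj)))

def np_attach(subj, obj, pobj):  # same words, PP attached to the object noun phrase
    return ("S", ("NP", ("PRP", subj)), ("VP", ("V", "saw"), ("NP", np_(obj), pp(pobj))))

# A toy, hand-made treebank of three sentences (hypothetical data).
treebank = [vp_attach("I", "man", "telescope"),
            np_attach("I", "man", "dog"),
            vp_attach("she", "dog", "telescope")]
pcfg = extract_pcfg(treebank)

# Two competing analyses of "I saw the man with the telescope", ranked by probability.
candidates = [np_attach("I", "man", "telescope"), vp_attach("I", "man", "telescope")]
ranked = sorted(candidates, key=lambda t: tree_prob(t, pcfg), reverse=True)
for t in ranked:
    print(t[2][0], "->", " ".join(c[0] for c in t[2][1:]), tree_prob(t, pcfg))
# prints:
# VP -> V NP PP 2/375
# VP -> V NP 1/3750
```

With this toy treebank the verb-attachment reading is ranked first because `VP -> V NP PP` occurs in two of the three trees. A realistic system would also smooth the estimates for unseen rules and would find candidate trees with a chart parser such as CKY rather than enumerating them by hand. Lexicalized models of the kind the article surveys refine this basic scheme by conditioning rule probabilities on richer context, such as head words.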