Open Access
Sparse Information Extraction: Unsupervised Language Models to the Rescue
Author(s) - Doug Downey, Stefan Schoenmackers, Oren Etzioni
Publication year - 2007
Publication title - CiteSeerX (The Pennsylvania State University)
Language(s) - English
Resource type - Reports
DOI - 10.21236/ada534427
Subject(s) - REALM , computer science , correctness , language model , relationship extraction , scalability , artificial intelligence , information extraction , natural language processing , n-gram , machine learning , algorithm , database , political science , law
Even in a massive corpus such as the Web, a substantial fraction of extractions appear infrequently. This paper shows how to assess the correctness of sparse extractions by utilizing unsupervised language models. The REALM system, which combines HMM-based and n-gram-based language models, ranks candidate extractions by the likelihood that they are correct. Our experiments show that REALM reduces extraction error by 39%, on average, when compared with previous work. Because REALM pre-computes language models based on its corpus and does not require any hand-tagged seeds, it is far more scalable than approaches that learn models for each individual relation from hand-tagged data. Thus, REALM is ideally suited for open information extraction, where the relations of interest are not specified in advance and their number is potentially vast.
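The core idea of ranking candidate extractions by their likelihood under an unsupervised language model can be illustrated with a minimal sketch. The following is not REALM's actual implementation (the paper's system combines HMM-based and n-gram-based models with pre-computed corpus statistics); it is a hypothetical stand-in that uses only a smoothed bigram model trained on unlabeled text, with all function names and the toy corpus invented for illustration:

```python
import math
from collections import Counter

def train_bigram_model(corpus):
    """Count unigrams and bigrams over tokenized sentences from an unlabeled corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_likelihood(tokens, unigrams, bigrams, vocab_size, alpha=1.0):
    """Add-alpha smoothed bigram log-likelihood of a token sequence."""
    tokens = ["<s>"] + tokens + ["</s>"]
    ll = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        num = bigrams[(prev, cur)] + alpha
        den = unigrams[prev] + alpha * vocab_size
        ll += math.log(num / den)
    return ll

def rank_extractions(candidates, corpus):
    """Return candidate extraction strings sorted by descending LM likelihood,
    so that candidates resembling corpus usage rank above implausible ones."""
    unigrams, bigrams = train_bigram_model(corpus)
    vocab_size = len(unigrams)
    scored = [(log_likelihood(c.split(), unigrams, bigrams, vocab_size), c)
              for c in candidates]
    return [c for _, c in sorted(scored, reverse=True)]
```

Because the model is trained once on the corpus and needs no hand-tagged seeds per relation, the same scoring function applies to any candidate string, which is the property that makes this style of assessment attractive for open information extraction.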
