Predicting Document Coverage for Relation Extraction

Open Access

Author(s) - Sneha Singhania, Simon Razniewski, Gerhard Weikum

Publication year - 2022

Publication title - Transactions of the Association for Computational Linguistics

Language(s) - English

Resource type - Journals

ISSN - 2307-387X

DOI - 10.1162/tacl_a_00456

Subject(s) - computer science, relationship extraction, tuple, rank (graph theory), relation (database), task (project management), information retrieval, natural language processing, artificial intelligence, language model, information extraction, predictive power, data mining, philosophy, mathematics, management, epistemology, discrete mathematics, combinatorics, economics

This paper presents a new task of predicting the coverage of a text document for relation extraction (RE): Does the document contain many relational tuples for a given entity? Coverage predictions are useful in selecting the best documents for knowledge base construction with large input corpora. To study this problem, we present a dataset of 31,366 diverse documents for 520 entities. We analyze the correlation of document coverage with features like length, entity mention frequency, Alexa rank, language complexity, and information retrieval scores. Each of these features has only moderate predictive power. We employ methods combining features with statistical models like TF-IDF and language models like BERT. The model combining features and BERT, HERB, achieves an F1 score of up to 46%. We demonstrate the utility of coverage predictions on two use cases: KB construction and claim refutation.
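To make the task concrete, here is a minimal Python sketch of a coverage score and two of the surface features the abstract names (length and entity mention frequency). It is not the authors' implementation. The definition of coverage as the fraction of an entity's known tuples that one document yields is an assumption made for illustration, and the function names `coverage` and `surface_features` are hypothetical.

```python
import re


def coverage(doc_tuples, all_tuples):
    """Share of an entity's known tuples that a single document yields.

    Assumption for illustration: coverage = |doc tuples ∩ known tuples| / |known tuples|.
    The paper may define and threshold coverage differently.
    """
    known = set(all_tuples)
    if not known:
        return 0.0
    return len(set(doc_tuples) & known) / len(known)


def surface_features(text, entity):
    """Two simple document features named in the abstract, plus their ratio."""
    lowered = text.lower()
    tokens = re.findall(r"\w+", lowered)
    # Count exact (case-insensitive) occurrences of the entity name.
    mentions = len(re.findall(re.escape(entity.lower()), lowered))
    return {
        "length": len(tokens),
        "entity_mentions": mentions,
        "mention_density": mentions / len(tokens) if tokens else 0.0,
    }


# Toy data, invented for this example only.
known_tuples = {
    ("Marie Curie", "born_in", "Warsaw"),
    ("Marie Curie", "field", "physics"),
    ("Marie Curie", "award", "Nobel Prize"),
    ("Marie Curie", "spouse", "Pierre Curie"),
}
doc_tuples = {
    ("Marie Curie", "born_in", "Warsaw"),
    ("Marie Curie", "field", "physics"),
}
doc_text = "Marie Curie was born in Warsaw. Marie Curie studied physics."

score = coverage(doc_tuples, known_tuples)
features = surface_features(doc_text, "Marie Curie")
```

A coverage predictor would take features like these as input, together with text representations such as TF-IDF vectors or BERT embeddings, and try to estimate the score without running relation extraction on the document.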
