From French Wikipedia to Erudit: A test case for cross‐domain open information extraction




Author(s) - Gotti Fabrizio, Langlais Philippe

Publication year - 2018

Publication title - Computational Intelligence

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.353

H-Index - 52

eISSN - 1467-8640

pISSN - 0824-7935

DOI - 10.1111/coin.12120

Subject(s) - computer science , classifier (uml) , pipeline (software) , information extraction , entity linking , information retrieval , open domain , domain (mathematical analysis) , natural language processing , task (project management) , artificial intelligence , named entity recognition , precision and recall , question answering , knowledge base , mathematics , programming language , mathematical analysis , management , economics

In this paper, we describe an open information extraction pipeline, based on ReVerb, for extracting knowledge from French text. We put it to the test by using the extracted information triples to build an entity classifier, i.e., a system able to label a given instance with its type (for instance, Michel Foucault is a philosopher). The classifier requires little supervision. One novel aspect of this study is that we show how general‐domain information triples (extracted from French Wikipedia) can be used to derive new knowledge from domain‐specific documents unrelated to Wikipedia, in our case scholarly articles in the humanities. We believe the present study is the first to focus on such a cross‐domain, recall‐oriented approach to open information extraction. While our system's performance leaves room for improvement, manual assessments show that the task is quite hard, even for a human, partly because of the cross‐domain nature of the problem we tackle.
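To make the idea of "using triples to build an entity classifier" concrete, here is a minimal sketch under stated assumptions; it is not the authors' classifier. It assumes each entity is represented by the bag of relation phrases it occurs with as a subject in (subject, relation, object) triples, and that a type is assigned by cosine similarity to per-type prototypes built from a handful of labeled seed entities (a nearest-centroid scheme, chosen here only to illustrate a low-supervision setup). The triples, seed labels, and function names (`entity_features`, `build_prototypes`, `classify`) are hypothetical and illustrative, not data or code from the paper.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical (subject, relation, object) triples of the kind an OIE system
# such as ReVerb might emit from French text. Illustrative only.
TRIPLES = [
    ("Michel Foucault", "a écrit", "Surveiller et punir"),
    ("Michel Foucault", "a enseigné à", "Collège de France"),
    ("Gilles Deleuze", "a écrit", "Différence et répétition"),
    ("Gilles Deleuze", "a enseigné à", "Université Paris 8"),
    ("Claude Debussy", "a composé", "La Mer"),
    ("Maurice Ravel", "a composé", "Boléro"),
]

# Hypothetical seed labels: the only supervision this sketch uses.
SEEDS = {"Gilles Deleuze": "philosopher", "Maurice Ravel": "composer"}


def entity_features(triples):
    """Map each subject to a bag (Counter) of the relation phrases it occurs with."""
    feats = defaultdict(Counter)
    for subj, rel, _obj in triples:
        feats[subj][rel] += 1
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_prototypes(feats, seeds):
    """Sum the feature bags of the seed entities of each type into one prototype."""
    protos = defaultdict(Counter)
    for entity, label in seeds.items():
        protos[label].update(feats.get(entity, Counter()))
    return protos


def classify(entity, feats, protos):
    """Return the most similar type, or None if the entity shares no feature with any prototype."""
    vec = feats.get(entity, Counter())
    scores = {label: cosine(vec, proto) for label, proto in protos.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


feats = entity_features(TRIPLES)
protos = build_prototypes(feats, SEEDS)

if __name__ == "__main__":
    for name in ("Michel Foucault", "Claude Debussy", "Érudit"):
        print(name, "->", classify(name, feats, protos))
```

Relation phrases alone are used as features so that an entity seen only in one corpus can still match a type learned from another, which loosely mirrors the cross‐domain setting; a real system would need far richer features and normalization than this toy example.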
