Targeting precision: A hybrid scientific relation extraction pipeline for improved scholarly knowledge organization

Jiang Ming; D'Souza Jennifer; Auer Sören; Downie J. Stephen

Publication year - 2020

Publication title - Proceedings of the Association for Information Science and Technology

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.193

H-Index - 14

ISSN - 2373-9231

DOI - 10.1002/pra2.303

Subject(s) - computer science, relationship extraction, relation (database), classifier (uml), pipeline (software), precision and recall, information retrieval, benchmark (surveying), knowledge graph, data science, f1 score, information extraction, artificial intelligence, natural language processing, data mining, geodesy, programming language, geography

Knowledge graphs have been successfully built from unstructured texts in general domains such as newswire by leveraging distant supervision relation signals from linked data repositories such as DBpedia. In contrast, the lack of a comprehensive ontology of scholarly relations makes it difficult to similarly adopt distant supervision to create knowledge graphs over scholarly articles. In light of this difficulty, we propose a hybrid approach to extract scientific concept relations from scholarly publications by: (a) utilizing syntactic rules as a form of distant supervision to link related scientific term pairs; and (b) training a classifier to further identify the relation type per pair. Our system targets a high‐precision performance objective as opposed to high recall, aiming to reduce the noisy results albeit at the cost of extracting fewer relations when building scholarly knowledge graphs over massive‐scale publications. Results on two benchmark datasets show that our hybrid system surpasses the state‐of‐the‐art with an overall 60% F1 score led by the nearly 15% precision boost in identifying related scientific concepts. We further achieved an overall F1 in the range 34.1% to 51.2%, on relation classification, per experimental dataset.
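
The two-stage design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the connective patterns, the relation labels, and the function names `link_pairs`, `classify`, and `extract_relations` are all hypothetical, and a keyword lookup stands in for the trained classifier. It only shows how a strict rule-based first stage trades recall for precision before any relation typing happens.

```python
import re
from typing import List, Tuple

# Hypothetical connectives standing in for the syntactic rules of stage (a).
# The abstract does not list the actual rules used in the paper.
LINK_PATTERN = re.compile(
    r"^\s*(?:is|are|was|were)?\s*(used for|applied to|part of|compared with)\s*$"
)

# Hypothetical label set; a lookup table stands in for the trained classifier.
CONNECTIVE_TO_LABEL = {
    "used for": "USAGE",
    "applied to": "USAGE",
    "part of": "PART_WHOLE",
    "compared with": "COMPARE",
}


def link_pairs(sentence: str, terms: List[str]) -> List[Tuple[str, str, str]]:
    """Stage (a): keep only ordered term pairs joined by a whitelisted connective."""
    lowered = sentence.lower()
    pairs = []
    for first in terms:
        for second in terms:
            if first == second:
                continue
            start_first = lowered.find(first.lower())
            start_second = lowered.find(second.lower())
            # Both terms must occur, with `first` preceding `second`.
            if start_first == -1 or start_second == -1 or start_first >= start_second:
                continue
            between = lowered[start_first + len(first):start_second]
            match = LINK_PATTERN.match(between)
            if match:
                pairs.append((first, second, match.group(1)))
    return pairs


def classify(connective: str) -> str:
    """Stage (b): assign a relation type to a linked pair (placeholder logic)."""
    return CONNECTIVE_TO_LABEL.get(connective, "OTHER")


def extract_relations(sentence: str, terms: List[str]) -> List[Tuple[str, str, str]]:
    """Run both stages: link related pairs first, then type each linked pair."""
    return [(a, b, classify(c)) for a, b, c in link_pairs(sentence, terms)]


linked = extract_relations(
    "Conditional random fields are used for sequence labeling.",
    ["conditional random fields", "sequence labeling", "newswire"],
)
unlinked = extract_relations(
    "Sequence labeling and parsing appear in the corpus.",
    ["sequence labeling", "parsing"],
)
```

In this sketch the second sentence yields no relation because "and" is not a whitelisted connective. That is the trade-off the abstract names: pairs the rules cannot vouch for are dropped rather than passed to the classifier, so fewer relations are extracted but fewer of them are noise.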
