Open Access
GROTOAP2 — The Methodology of Creating a Large Ground Truth Dataset of Scientific Articles
Author(s) - Dominika Tkaczyk, Paweł Szostek, Łukasz Bolikowski
Publication year - 2014
Publication title - D-Lib Magazine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.208
H-Index - 52
ISSN - 1082-9873
DOI - 10.1045/november14-tkaczyk
Subject(s) - ground truth, common ground, data science, computer science, artificial intelligence, psychology, social psychology
Scientific literature analysis improves knowledge propagation and plays a key role in understanding and assessing scholarly communication in the scientific world. In recent years, many tools and services for analysing the content of scientific articles have been developed. One of the most important tasks in this research area is understanding the roles of different parts of a document. It is impossible to build effective solutions for problems related to document fragment classification, or to evaluate their performance, without a reliable test set that contains both input documents and the expected classification results. In this paper we present GROTOAP2, a large dataset of ground truth files containing labelled fragments of scientific articles in PDF format, useful for training and evaluating document content analysis solutions. GROTOAP2 was successfully used for training CERMINE, our system for extracting metadata and content from scientific articles. The dataset is based on articles from the PubMed Central Open Access Subset and is published under an Open Access license. The semi-automatic method used to construct GROTOAP2 is scalable and can be adjusted for building large datasets from other data sources. The article presents the content of GROTOAP2, describes the entire creation process, and reports the evaluation methodology and results.
