A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications
Author(s) - Qiuzi Zhang, Qikai Cheng, Yong Huang, Wei Lu
Publication year - 2016
Publication title - Journal of Data and Information Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.605
H-Index - 8
eISSN - 2543-683X
pISSN - 2096-157X
DOI - 10.20309/jdis.201606
Subject(s) - computer science , bootstrapping (finance) , data extraction , data mining , task (project management) , selection (genetic algorithm) , representation (politics) , information extraction , information retrieval , originality , artificial intelligence , management , medline , politics , creativity , political science , financial economics , law , economics
Purpose - Our study proposes a bootstrapping-based method to automatically extract data-usage statements from academic texts.
Design/methodology/approach - The method starts with seed entities and iteratively learns patterns and data-usage statements from unlabeled text. In each iteration, new patterns are constructed, scored, and added to the pattern list according to their scores. The paper also proposes three seed-selection strategies.
Findings - The method's performance is verified through experiments on real data collected from computer science journals. The results show that the method achieves satisfactory performance in both extraction precision and the extensibility of the obtained patterns.
Research limitations - Although the triple representation of sentences is effective and efficient for extracting data-usage statements, it cannot handle complex sentences. Future work should therefore explore additional features that can address complex sentences.
Practical implications - Extracting data-usage statements benefits data-repository construction and facilitates research on data-usage tracking, dataset-based scholar search, and dataset evaluation.
Originality/value - To the best of our knowledge, this paper is among the first to address the important task of automatically extracting data-usage statements from real data.
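The abstract describes the bootstrapping loop only at a high level: start from seeds, learn patterns, keep the patterns whose score is high enough, and harvest new statements. The Python sketch below illustrates that general loop under explicit assumptions. Each sentence is assumed to be pre-reduced to a hypothetical (subject, verb, object) triple, a pattern is assumed to be the (subject, verb) pair, and candidate patterns are scored with the RlogF metric from Riloff's pattern-learning work as a stand-in, because the abstract does not give the paper's actual scoring formula. The names `bootstrap`, `rlogf`, `min_score`, and the example triples are illustrative and do not come from the paper.

```python
import math
from collections import defaultdict

# Hypothetical representation: every sentence is pre-reduced to a
# (subject, verb, object) triple. The paper's actual triple scheme may differ.
Triple = tuple[str, str, str]
Pattern = tuple[str, str]  # hypothetical pattern: the (subject, verb) pair


def rlogf(matched: int, total: int) -> float:
    """RlogF score, used here as a stand-in for the paper's pattern score:
    the share of a pattern's matches that hit known entities, times log2 of
    the number of such hits."""
    if matched == 0 or total == 0:
        return 0.0
    return (matched / total) * math.log2(matched)


def bootstrap(triples: list[Triple], seeds: set[str],
              iterations: int = 3, min_score: float = 0.5):
    known = set(seeds)                    # dataset entities known so far
    patterns: dict[Pattern, float] = {}   # accepted pattern list with scores
    statements: set[Triple] = set()       # extracted data-usage statements
    for _ in range(iterations):
        # 1. Count, per candidate pattern, all matches and matches whose
        #    object is an already-known dataset entity.
        hits: dict[Pattern, int] = defaultdict(int)
        totals: dict[Pattern, int] = defaultdict(int)
        for subj, verb, obj in triples:
            totals[(subj, verb)] += 1
            if obj in known:
                hits[(subj, verb)] += 1
        # 2. Score candidates; add or update those above the threshold.
        for pattern, matched in hits.items():
            score = rlogf(matched, totals[pattern])
            if score >= min_score:
                patterns[pattern] = score
        # 3. Apply accepted patterns to harvest statements and new entities.
        found = set()
        for subj, verb, obj in triples:
            if (subj, verb) in patterns:
                statements.add((subj, verb, obj))
                found.add(obj)
        if found <= known:
            break  # no new entities: the bootstrapping loop has converged
        known |= found
    return patterns, statements, known


# Illustrative usage on made-up triples (not data from the paper).
triples = [
    ("we", "use", "MovieLens"),
    ("we", "use", "Netflix Prize"),
    ("we", "use", "DBLP"),
    ("experiments", "run on", "MovieLens"),
    ("experiments", "run on", "DBLP"),
    ("experiments", "run on", "Yelp"),
    ("the model", "outperforms", "baselines"),
]
patterns, statements, known = bootstrap(triples, {"MovieLens", "Netflix Prize"})
print(sorted(patterns))
print(sorted(known))
```

With the two seeds, the first iteration accepts only ("we", "use"), which surfaces DBLP; the newly known DBLP then lifts ("experiments", "run on") above the threshold in the second iteration, which surfaces Yelp, while ("the model", "outperforms") never qualifies. The score threshold is what keeps a weak pattern from being admitted early and pulling unrelated entities into later iterations.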