Mining chemical patents with an ensemble of open systems
Author(s) -
Robert Leaman,
Chih-Hsuan Wei,
Cherry Zou,
Zhiyong Lu
Publication year - 2016
Publication title -
database
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.406
H-Index - 62
ISSN - 1758-0463
DOI - 10.1093/database/baw065
Subject(s) - named entity recognition , computer science , task (project management) , recall , natural language processing , training set , biomedical text mining , precision and recall , f1 score , artificial intelligence , information retrieval , text mining , engineering , philosophy , linguistics , systems engineering
The significant amount of medicinal chemistry information contained in patents makes them an attractive target for text mining. In this manuscript, we describe systems for named entity recognition (NER) of chemicals and genes/proteins in patents, using the CEMP (for chemicals) and GPRO (for genes/proteins) corpora provided by the CHEMDNER task at BioCreative V. Our chemical NER system is an ensemble of five open systems, including both versions of tmChem, our previous work on chemical NER. Their output is combined using a machine learning classification approach. Our chemical NER system obtained 0.8752 precision and 0.9129 recall, for 0.8937 f-score on the CEMP task. Our gene/protein NER system is an extension of our previous work for gene and protein NER, GNormPlus. This system obtained a performance of 0.8143 precision and 0.8141 recall, for 0.8137 f-score on the GPRO task. Both systems achieved the highest performance in their respective tasks at BioCreative V. We conclude that an ensemble of independently-created open systems is sufficiently diverse to significantly improve performance over any individual system, even when they use a similar approach.Database URL: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/.
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom