Named Entity Recognition for Bacterial Type IV Secretion Systems
Author(s) -
Sophia Ananiadou,
Dan Sullivan,
William J. Black,
Gina-Anne Levow,
Joseph J. Gillespie,
Chunhong Mao,
Sampo Pyysalo,
BalaKrishna Kolluru,
Jun’ichi Tsujii,
Bruno Sobral
Publication year - 2011
Publication title - PLoS ONE
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.99
H-Index - 332
ISSN - 1932-6203
DOI - 10.1371/journal.pone.0014780
Subject(s) - terminology , named entity recognition , function (biology) , computer science , computational biology , biology , gene nomenclature , identification (biology) , artificial intelligence , genetics , nomenclature , taxonomy (biology) , linguistics , philosophy , botany , management , economics , task (project management)
Research on specialized biological systems is often hampered by a lack of consistent terminology, especially across species. In bacterial Type IV secretion systems (T4SS), genes within one set of orthologs may have over a dozen different names. Classifying research publications by biological process, cellular component, molecular function, and microorganism species should improve the precision and recall of literature searches, allowing researchers to keep up with the exponentially growing literature through resources such as the Pathosystems Resource Integration Center (PATRIC, patricbrc.org). We developed named entity recognition (NER) tools for four entities related to Type IV secretion systems: 1) bacteria names, 2) biological processes, 3) molecular functions, and 4) cellular components. These four entities are important to pathogenesis and virulence research but have received less attention than other entities, e.g., genes and proteins. The recognizers combine an annotated corpus, large domain terminological resources, and machine learning techniques. High accuracy rates (>80%) are achieved for bacteria, biological processes, and molecular functions. Contrastive experiments highlighted the effectiveness of alternate recognition strategies; results of term extraction on contrasting document sets demonstrated the utility of these classes for identifying T4SS-related documents.
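To make the terminology-driven side of such recognizers concrete, the following is a minimal sketch of dictionary lookup over the four entity classes named in the abstract. It is not the authors' system: the lexicon entries, the label names, and the functions `build_patterns` and `tag_entities` are hypothetical, and the machine learning component described in the abstract is not represented at all.

```python
import re
from typing import Dict, List, Pattern, Tuple

# Hypothetical toy lexicon for illustration only; the paper's actual
# terminological resources are far larger and are not reproduced here.
LEXICON: Dict[str, List[str]] = {
    "BACTERIA": ["Agrobacterium tumefaciens", "Helicobacter pylori"],
    "BIOLOGICAL_PROCESS": ["protein secretion", "conjugation"],
    "MOLECULAR_FUNCTION": ["ATPase activity"],
    "CELLULAR_COMPONENT": ["inner membrane", "pilus"],
}

# (start offset, end offset, entity label, matched surface string)
Hit = Tuple[int, int, str, str]


def build_patterns(lexicon: Dict[str, List[str]]) -> Dict[str, Pattern[str]]:
    """Compile one case-insensitive regex per entity class."""
    patterns: Dict[str, Pattern[str]] = {}
    for label, terms in lexicon.items():
        # Longest terms first, so a longer term wins over one that is its prefix.
        ordered = sorted(terms, key=len, reverse=True)
        alternation = "|".join(re.escape(term) for term in ordered)
        patterns[label] = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return patterns


def tag_entities(text: str, patterns: Dict[str, Pattern[str]]) -> List[Hit]:
    """Return every lexicon match in the text, sorted by start offset."""
    hits: List[Hit] = []
    for label, pattern in patterns.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), match.end(), label, match.group(0)))
    return sorted(hits)


if __name__ == "__main__":
    sentence = "Helicobacter pylori uses a pilus for protein secretion."
    for start, end, label, surface in tag_entities(sentence, build_patterns(LEXICON)):
        print(f"{label}\t{start}-{end}\t{surface}")
```

Exact lookup of this kind only finds strings already in the lexicon, which is one reason inconsistent gene and process names across species make a purely dictionary-based approach insufficient on its own.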
