pGenN, a Gene Normalization Tool for Plant Genes and Proteins in Scientific Literature

Ruoyao Ding, Cecilia Arighi, JungYoun Lee, Cathy Wu, K. VijayShanker

Open Access

Publication year - 2015

Publication title - PLOS ONE

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.99

H-Index - 332

ISSN - 1932-6203

DOI - 10.1371/journal.pone.0135305

Subject(s) - normalization (sociology), gene nomenclature, computer science, annotation, information retrieval, gene annotation, heuristics, gene, named entity recognition, computational biology, natural language processing, data mining, database, bioinformatics, artificial intelligence, genome, biology, genetics, taxonomy (biology), botany, management, sociology, anthropology, nomenclature, economics, task (project management), operating system

Background: Automatically detecting gene/protein names in the literature and connecting them to database records, a task known as gene normalization, provides a means to structure the information buried in free-text literature. Gene normalization is critical for improving the coverage of annotation in databases and is an essential component of many text mining systems and database curation pipelines.

Methods: In this manuscript, we describe a gene normalization system specifically tailored for plant species, called pGenN (pivot-based Gene Normalization). The system consists of three steps: dictionary-based gene mention detection, species assignment, and intra-species normalization. We have developed new heuristics to improve each of these steps.

Results: We evaluated the performance of pGenN on an in-house, expert-annotated corpus of 104 plant-relevant abstracts. Our system achieved an F-value of 88.9% (Precision 90.9% and Recall 87.2%) on this corpus, outperforming the state-of-the-art systems presented in BioCreative III. We have processed over 440,000 plant-related Medline abstracts using pGenN. The gene normalization results are stored in a local database that can be queried directly from the pGenN web interface (proteininformationresource.org/pgenn/). The annotated literature corpus is also publicly available through the PIR text mining portal (proteininformationresource.org/iprolink/).
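To make the three-step structure named in the abstract concrete, here is a minimal Python sketch of a dictionary-based normalization pipeline. It is not pGenN's implementation and does not reproduce its pivot-based heuristics. The gene names (`tg1`, `tg2`), the `TOY:` identifiers, the two-species list, and the "most recently mentioned species" rule are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass

# Toy dictionary: lowercase gene name -> {species: identifier}.
# Every name and identifier below is a hypothetical placeholder.
GENE_DICT = {
    "tg1": {"arabidopsis thaliana": "TOY:AT-0001", "oryza sativa": "TOY:OS-0001"},
    "tg2": {"arabidopsis thaliana": "TOY:AT-0002"},
}
SPECIES_NAMES = ["arabidopsis thaliana", "oryza sativa"]


@dataclass
class Mention:
    text: str
    start: int
    end: int


def detect_mentions(text):
    """Step 1: dictionary-based detection (whole-word, case-insensitive)."""
    mentions = []
    for name in GENE_DICT:
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            mentions.append(Mention(m.group(0), m.start(), m.end()))
    return sorted(mentions, key=lambda m: m.start)


def assign_species(text, mention):
    """Step 2: assumed rule, take the species named most recently before the mention."""
    lowered = text.lower()
    best, best_pos = None, -1
    for species in SPECIES_NAMES:
        pos = lowered.rfind(species, 0, mention.start)
        if pos > best_pos:
            best, best_pos = species, pos
    return best  # None when no species precedes the mention


def normalize(text):
    """Step 3: intra-species normalization via lookup of (gene name, species)."""
    results = []
    for mention in detect_mentions(text):
        species = assign_species(text, mention)
        gene_id = GENE_DICT[mention.text.lower()].get(species)
        if gene_id is not None:
            results.append((mention.text, species, gene_id))
    return results


if __name__ == "__main__":
    abstract = ("In Arabidopsis thaliana, TG1 represses TG2. "
                "In Oryza sativa, TG1 is induced by cold.")
    for row in normalize(abstract):
        print(row)
```

In this sketch the same surface name `TG1` resolves to different identifiers depending on the species assigned in step 2, which is why species assignment sits between detection and identifier lookup. A real system would also need to handle name variants and ambiguous names within a single species, which this toy lookup ignores.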
