Gene name errors: Lessons not learned

Mandhri Abeysooriya; Megan Soria; Mary Sravya Kasu; Mark Ziemann


Open Access


Author(s) - Mandhri Abeysooriya, Megan Soria, Mary Sravya Kasu, Mark Ziemann

Publication year - 2021

Publication title - PLOS Computational Biology

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 2.628

H-Index - 182

eISSN - 1553-7358

pISSN - 1553-734X

DOI - 10.1371/journal.pcbi.1008984

Subject(s) - gene nomenclature, computer science, point (geometry), gene prediction, gene, software, computational biology, data mining, bioinformatics, biology, genetics, genome, programming language, mathematics, taxonomy (biology), botany, geometry, nomenclature

Erroneous conversion of gene names into dates and other data types has been a frustration for computational biologists for years. We hypothesized that such errors in supplementary files might diminish after a report in 2016 highlighted the extent of the problem. To assess this, we scanned supplementary files published in PubMed Central from 2014 to 2020. Overall, gene name errors continued to accumulate unabated in the period after 2016. Improved scanning software that we developed identified gene name errors in 30.9% (3,436/11,117) of articles with supplementary Excel gene lists, a figure significantly higher than previously estimated. This is because gene names are converted not just to dates and floating-point numbers, but also to an internal date format (five-digit numbers). These findings further reinforce that spreadsheets are ill-suited to use with large genomic data.
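The three kinds of conversion named in the abstract (dates, floating-point numbers, and five-digit internal date numbers) can be illustrated with a minimal Python sketch. This is not the authors' scanning software: the regular expressions, the function names `classify_cell` and `suspicious_cells`, and the category labels are all assumptions made for illustration. The example cell values reflect commonly cited cases, such as SEPT2 appearing as "2-Sep" and the identifier 2310009E13 appearing as "2.31E+13".

```python
import re

MONTHS = r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"

# Hypothetical patterns chosen for illustration; a real scanner would need
# to handle more date layouts and locale-specific month names.
DATE_LIKE = re.compile(
    rf"^(\d{{1,2}}[-/]{MONTHS}|{MONTHS}[-/]\d{{1,2}})$", re.IGNORECASE
)
SCI_FLOAT = re.compile(r"^\d(\.\d+)?E[+-]?\d+$", re.IGNORECASE)
FIVE_DIGIT = re.compile(r"^\d{5}$")


def classify_cell(value: str) -> str:
    """Label a gene-column cell as 'date', 'float', 'serial', or 'ok'."""
    text = value.strip()
    if DATE_LIKE.match(text):
        return "date"      # e.g. "2-Sep" where SEPT2 was intended
    if SCI_FLOAT.match(text):
        return "float"     # e.g. "2.31E+13" where 2310009E13 was intended
    if FIVE_DIGIT.match(text):
        return "serial"    # five-digit number, possibly an internal date value
    return "ok"


def suspicious_cells(column):
    """Return (cell, label) pairs for every cell that does not look like a gene name."""
    flagged = []
    for cell in column:
        label = classify_cell(cell)
        if label != "ok":
            flagged.append((cell, label))
    return flagged


if __name__ == "__main__":
    genes = ["TP53", "2-Sep", "BRCA1", "2.31E+13", "43346"]
    for cell, label in suspicious_cells(genes):
        print(f"{cell}\t{label}")
```

A pattern-based check like this only flags candidates. A five-digit number in a gene column is suspicious but not proof of conversion, so flagged cells would still need manual review.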
