
Recognition of Latin scientific names using artificial neural networks
Author(s) - Little, Damon P.
Publication year - 2020
Publication title - Applications in Plant Sciences
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.64
H-Index - 23
ISSN - 2168-0450
DOI - 10.1002/aps3.11378
Subject(s) - executable , search engine indexing , computer science , bloom filter , artificial neural network , information retrieval , classifier (uml) , artificial intelligence , natural language processing , computer network , operating system
Premise: The automated recognition of Latin scientific names within vernacular text has many applications, including text mining, search indexing, and automated processing of specimen labels. Most published solutions are computationally inefficient, cannot run within a web browser, and focus on English-language texts, thus omitting a substantial portion of the biodiversity literature.

Methods and Results: This paper presents Quaesitor, an open-source solution that runs within a web browser. It combines pattern matching (regular expressions) with an ensemble classifier to identify Latin scientific names automatically in the 16 languages most commonly used for biodiversity articles. The ensemble comprises an inclusion dictionary search (implemented as a Bloom filter), three complementary neural networks that differ in how they encode text, and word length.

Conclusions: In combination, the classifiers can recognize Latin scientific names either in isolation or embedded within the languages used for more than 96% of biodiversity literature titles. Across three data sets, they achieved a recall of 0.80–0.97 and a precision of 0.69–0.84, processing text at a rate of 8.6 ms per word.
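The first two stages of such a pipeline, regular-expression candidate extraction followed by a dictionary check against a Bloom filter, can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not Quaesitor's code. The candidate pattern (capitalized genus or abbreviated initial plus a lowercase epithet, ASCII letters only), the Bloom filter sizing and double-hashing scheme, the toy four-word lexicon, and the names `BloomFilter`, `CANDIDATE`, and `find_latin_names` are all hypothetical. The three neural networks and the word-length component of the ensemble are not reproduced here.

```python
import hashlib
import math
import re
from typing import Iterator


class BloomFilter:
    """Probabilistic set membership: no false negatives, tunable false-positive rate."""

    def __init__(self, capacity: int, error_rate: float = 0.01) -> None:
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions.
        self.size = max(1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.hash_count = max(1, round(self.size / capacity * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.hash_count):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Hypothetical candidate pattern: a capitalized genus (or an abbreviated
# initial such as "Q.") followed by a lowercase epithet. ASCII letters only.
CANDIDATE = re.compile(r"\b([A-Z][a-z]+|[A-Z]\.)\s+([a-z]{2,})\b")


def find_latin_names(text: str, lexicon: BloomFilter) -> list[str]:
    """Keep regex candidates whose words pass the dictionary (Bloom filter) check."""
    names = []
    for match in CANDIDATE.finditer(text):
        genus, epithet = match.group(1), match.group(2)
        # An abbreviated genus cannot be looked up, so only its epithet is checked.
        genus_ok = genus.endswith(".") or genus.lower() in lexicon
        if genus_ok and epithet in lexicon:
            names.append(f"{genus} {epithet}")
    return names


if __name__ == "__main__":
    lexicon = BloomFilter(capacity=1000)
    for word in ["quercus", "robur", "fagus", "sylvatica"]:  # toy lexicon
        lexicon.add(word)
    text = "Leaves of Quercus robur and Fagus sylvatica were sampled; Q. robur dominated."
    print(find_latin_names(text, lexicon))
    # ['Quercus robur', 'Fagus sylvatica', 'Q. robur']
```

In the example, "Leaves of" matches the candidate pattern but is rejected by the dictionary check. A Bloom filter fits this role because it stores a large word list in a compact bit array and never rejects a word that was inserted. Its occasional false positives, together with the crudeness of a single regular expression (which ignores authorities, hybrids, and infraspecific ranks), are reasons to weigh it alongside other evidence, such as the neural-network and word-length classifiers in the ensemble described above.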