Machine vision automated species identification scaled towards production levels
Author(s) - Favret, Colin; Sieracki, Jeffrey M.
Publication year - 2016
Publication title - Systematic Entomology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.552
H-Index - 66
eISSN - 1365-3113
pISSN - 0307-6970
DOI - 10.1111/syen.12146
Subject(s) - biology, identification (biology), classifier (uml), artificial intelligence, support vector machine, taxon, machine learning, scalability, pattern recognition (psychology), computer science, ecology, database
Computer‐automated identification of insect species has long been sought to support activities such as environmental monitoring, forensics, pest diagnostics, border security and vector epidemiology, to name just a few. In order to succeed, an automated identification programme capable of addressing the needs of the end user should be able to classify hundreds of taxa, if not thousands, and is expected to distinguish closely related and hence morphologically similar species. However, it remains unknown how automated identification methods might handle an increase in data quantity, be it in reference imagery or taxonomic diversity. We sought to test the scalability of an automated identification method in terms of the number of reference specimens used to train the classifier and the number of taxa into which the classifier should assign unknown specimens. Is there an optimal number of reference images, where the cost of acquiring more images becomes greater than the marginal increase in identification success? Does increasing taxonomic diversity affect identification success, whether negatively or positively? In order to test the scalability of the automated insect identification enterprise, we used a sparse processing technique and support vector machine to test the largest dataset to date: 72 species of fruit flies (Diptera: Tephritidae) and 76 species of mosquitoes (Diptera: Culicidae). We found that: (i) machine vision methods are capable of correctly classifying large numbers of closely related species; (ii) when the misclassification of a specimen occurs at the species level, it is often classified in the correct genus; (iii) classification success increases asymptotically as new training images are added to the dataset; (iv) broad taxon sampling outside a focal group can increase classification success within it.
